58
Malware Analysis Machine Learning Approach DeepSec 14-17 Nov 2017 – Vienna, Austria Chiheb Chebbi TEK-UP University

Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

  • Upload
    others

  • View
    25

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Malware Analysis Machine Learning Approach

DeepSec 14-17 Nov 2017 – Vienna, Austria

Chiheb ChebbiTEK-UP University

Page 2: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

<whoami/>

• Computer science engineering student @TEK-UP

• Cyber security leadership program fellow @Kaspersky_Lab

• Author / Technical Reviewer @Packt_Publishing UK

• Invited as a speaker to:

Besides Tempa Florida2017, BH Europe 2016,NASA SAC ...

Page 3: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Source: State of Malware Report 2017- MalwareBytes LABS

Page 4: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Source: State of Malware Report 2017- MalwareBytes LABS

Page 5: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Ransomware 49 % Android Malware 31 % Adware 37 %

Source: State of Malware Report 2017- MalwareBytes LABS

Page 6: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Malware Analysis TechniquesStatic Analysis

the examination of the malware samplewithout executing

Dynamic AnalysisDynamic analysis techniques track all themalware activities

Memory Analysis

the act of analyzing a dumped memoryimage from a targeted machine afterexecuting the malware

Page 7: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Source: McAfee Labs, 2017.

Page 8: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 9: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Machine Learning

Artificial Intelligence

Ability to perform tasks normally requiringhuman intelligence, such as visual perception,speech recognition

Machine Learning

the study and the creation of algorithms that can learn from data and make prediction on them

Page 10: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 11: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 12: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Machine Learning Models

Supervised Learningwe have input variables (I) and an output variable (O) and we need to map the functionDecision Trees, Nave Bayes Classification,Support Vector Machines

Unsupervised learningwe only have input data (X)

Reinforcementthe agent or the system is improving its performance based ona reward function

Page 13: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 14: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 15: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 16: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 17: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Hot DogOR

Not Hot Dog

Page 18: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 19: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 20: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Machine Learning Workflow

Page 21: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Malware Datasets

Malware Analysis Process Entry Points:

• File• URL• PCAP• Memory Image

Page 22: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Hidden Markov Models

Markov process or what we call a Markov chain is a stochastic modelused for any random system that change its states according to fixed probabilities

In probability theory and related fields, a stochastic or random process is a mathematical object usually defined as a collection of random variables

Page 23: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Hidden Markov Models

• The Hidden Markov Model is a Markov Process where we are unable to directly observe the state of the system.

Each state has a fixed probability of ”emitting”.

p is a sequence of states (AKA a path).

Each p i takes a value from set Q.

We do not observe p

Page 24: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Hidden Markov Models

Page 25: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 26: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Hidden Markov Models

Page 27: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Classic Problems of Hidden Markov Model

• Problem 1: State Estimation Given a model λ = ( A , B , Π ) and an observation sequence O, we need to find P(O—λ).That is to determine the likelihood and check the wellness of the given model.

• Problem 2: Decoding or Most Probable Path (MPP): Given a model λ = ( A , B , Π ) and ,and an observation sequence O, to determine the optimal state sequence Q for the given model

• Problem 3: Training/Learning HMM: Given O, N, M, we can find a model that maximizes probability of O and learn the two HMM parameters A and B.

Page 28: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Solutions

• Forward-Backward technique• Viterbi Decoding technique• Baum-Welch (Expectation Maximization) technique

Page 29: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Profile Hidden Markov Model

• By definition a profile is a pattern of conservation.

The Profile Hidden Markov Model is a probabilistic approach that was developed specially for modeling sequence similarity occurring in biological sequences such as proteins and DNA.

• Profile HMM is a modified implementation of HMM.

Page 30: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

• HMMER is an open source implementation of Profile Hidden Markov

Models. It is basically built to build HMM models for protein sequences and alignment but in our case we are going to adopt it to build models for malware behaviour sequences.

Page 31: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 32: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Machine learning Model Evaluation Metrics

tp = True Positivefp= False Positivetn = True Negativefn = False Negative

Confusion Matrix

Page 33: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 34: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Low Detection Rate :'(

Page 35: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 36: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 37: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 38: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 39: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

One Algorithm Hypothesis

• There is some evidence that the human brain uses essentiallythe same algorithm to understand many different inputmodalities.

•Ferret experiments, in which the “input” for vision was pluggedinto auditory part of brain, and the auditory cortex learns to“see.”[Roe et al., 1992]

Page 40: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

“Look deep into nature,

and then you will understand everything better.”

Albert Einstein

Page 41: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 42: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 43: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

• The artificial model of a neuron is called perceptron

Page 44: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 45: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 46: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Backpropagation

Backpropagation is the process of trying to keep the error asdown as possible.

Stochastic Gradient Descent

Page 47: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Microsoft Malware Classification Challenge (BIG 2015)

10K Malware 500 GB

Page 48: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 49: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

•Accurately detects malware at > 90%

Page 50: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Well documented and open source frameworks

Page 51: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 52: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Deep learning life-cycle

• Network Definition

• Network Compiling

• Network Fitting

• Network Evaluation

• Prediction

Page 53: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image
Page 54: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Machine Learning vs Deep Learning

Page 55: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

Gartner report: “Intelligent and Automated Security Controls Impact the Future of the Security Market”, Oct 2015

Page 56: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

•Machine learning in cybersecurity will enormously booster spending in big data, intelligence and analytics, reaching as much as $96 billion (£71.9 billion) by 2021.

Page 57: Malware Analysis Machine Learning Approach - DeepSec · Malware Analysis Machine Learning Approach ... malware activities Memory Analysis the act of analyzing a dumped memory image

References

[1] Defeating Machine Learning What Your Security Vendor is Not Telling You – Blackhat USA 2015[2] Deep Learning for Malware Analysis Machine Learning for Computer Security Hugo Gascón[3] State of the art MalwareBytes Report 2017[4] Deep Machine Learning Meets Cybersecurity[5] How to build a malware classifier [that doesn't suck on real-world data]