#RSAC

Applied Machine Learning: Defeating Modern Malicious Documents

Session ID: HT-W02

Evan Gaustad
Sr. Manager, CSIRT, Target Corporation

Agenda

Office Macro Use and Abuse

Malicious Documents in the Attack Lifecycle

Machine Learning for Malware Detection

Demo Project: Malicious Macro Bot

Conclusion

Macro-Enabled Microsoft Office Documents

An Office macro is code that automates tasks in Office documents, for example:
• Automatically fill out forms
• Update graphs and display data
• Make web requests
• Perform computations

Macros are written in Visual Basic for Applications (VBA).

VBA support is built into Microsoft Office.

99.7% of documents used in attachment-based campaigns relied on social engineering and macros rather than exploits. (Proofpoint [1])
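Before any classification, a quick triage question is whether a document carries a VBA project at all. A minimal sketch using only Python's standard library, assuming the ZIP-based Office Open XML formats (.docm, .xlsm, .pptm), in which the VBA project is typically stored in a part named vbaProject.bin. Legacy binary .doc/.xls files use the OLE2 container instead and would need a dedicated parser such as oletools; the helper names below are illustrative.

```python
import io
import zipfile

# In macro-enabled OOXML files (.docm, .xlsm, .pptm) the VBA project is
# typically stored in a part such as word/vbaProject.bin or xl/vbaProject.bin.
# Legacy binary formats (.doc, .xls) are OLE2 files and are NOT handled here.
VBA_PART_NAME = "vbaProject.bin"


def ooxml_has_vba(data: bytes) -> bool:
    """Return True if a ZIP-based Office file contains a VBA project part."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return any(name.endswith(VBA_PART_NAME) for name in zf.namelist())
    except zipfile.BadZipFile:
        # Not a ZIP container (e.g. a legacy OLE2 document). False here means
        # "not detected by this check", not "macro-free".
        return False


def make_fake_ooxml(part_names) -> bytes:
    """Build a tiny in-memory ZIP that stands in for an Office file (illustration only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in part_names:
            zf.writestr(name, b"placeholder")
    return buf.getvalue()


docm = make_fake_ooxml(["[Content_Types].xml", "word/document.xml", "word/vbaProject.bin"])
docx = make_fake_ooxml(["[Content_Types].xml", "word/document.xml"])
print(ooxml_has_vba(docm))  # True
print(ooxml_has_vba(docx))  # False
```

A False result only means this particular check found nothing; it is not evidence that a file is safe.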

Attacker Motivation for Malicious Office Docs

The barrier to entry is very low

Uses built-in, cross-platform features

“Exploit” reliability is high

Can implement sandbox evasion

Easy to update to evade AV signatures

Malicious Macro-Enabled Office Documents

Used by an attacker to gain code execution on the targeted system(s)

Common attacker VBA techniques:
• Download and execute a malicious payload
• Drop and execute embedded payloads or scripts
• Obfuscation to hide intent
• Sandbox evasion techniques
• Payload targeting

“98% of Office-targeted threats use macros.” (Microsoft [2])
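Several of the techniques above leave recognizable keywords in the VBA source: auto-run entry points, Shell, download APIs, Chr-based string building. A minimal sketch of a keyword indicator scan, assuming a small hypothetical keyword list chosen for illustration; tools such as olevba from oletools maintain much larger, curated lists.

```python
import re

# Hypothetical, hand-picked indicators for illustration only; real tools
# such as olevba (oletools) maintain far larger and more careful lists.
INDICATORS = {
    "auto_exec": ["AutoOpen", "Document_Open", "Workbook_Open"],
    "execute": ["Shell", "WScript.Shell"],
    "create_object": ["CreateObject", "GetObject"],
    "download": ["URLDownloadToFile", "MSXML2.XMLHTTP"],
    "obfuscation": ["Chr", "StrReverse"],
}


def scan_vba(code: str) -> dict:
    """Return {category: [keywords found]} for indicator keywords present in VBA source."""
    hits = {}
    for category, keywords in INDICATORS.items():
        found = [
            kw for kw in keywords
            # VBA is case-insensitive; word boundaries stop "Shell" matching "Shellfish".
            if re.search(r"\b" + re.escape(kw) + r"\b", code, re.IGNORECASE)
        ]
        if found:
            hits[category] = found
    return hits


sample = 'Sub AutoOpen()\n    x = Shell("cmd /c calc", 0)\nEnd Sub'
print(scan_vba(sample))  # {'auto_exec': ['AutoOpen'], 'execute': ['Shell']}
```

Keyword matching like this is brittle: splitting a string such as "Word.Application" into fragments is enough to hide it from the scan.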

Example: Maldocs in the Attack Lifecycle

1) Phishing email with attachment (“Invoice Past Due”)
2) Victim opens the file and allows macros to run
3) Malicious macro executes
4) Macro downloads or drops executables or PowerShell
5) Additional malware is installed, e.g. Pony, Hancitor, Vawtrak
6) Attacker steals credentials and data, maintains persistence, and exercises command and control

Diagram: victim and attacker, with command and control via http://.../gate.php

Detecting Malicious Macros

How hard is it to create a malicious macro that runs an executable on the victim’s machine and evades AV?

Some easy-to-find tools:
• CrunchCode [7]
• MacroShop [8]
• Veil Framework [9]
• Generate-Macro [10]

Criminals also sell their own.

Answer: really easy.
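Part of why evasion is so easy: static signatures match literal strings, and trivial obfuscation removes the literal. The Document #1 excerpt in the Feature Engineering section builds "Word.Application" from fragments plus Chr(102 + 8), which is the character n. The deliberately naive folder below is a sketch only: it handles Chr(<int>), Chr(<int> + <int>) and Chr(<int> - <int>), simple Name = "literal" assignments, and "a" & "b" concatenation within a line, and it ignores VBA's doubled-quote escapes and every other trick.

```python
import re

# Chr(<int>) or Chr(<int> + <int>) / Chr(<int> - <int>); Chr$ is also accepted.
CHR_CALL = re.compile(r"Chr\$?\(\s*(\d+)\s*(?:([+-])\s*(\d+)\s*)?\)", re.IGNORECASE)
# A line that only assigns a string literal to a name, e.g.  X = "n"
STR_ASSIGN = re.compile(r'^\s*(\w+)\s*=\s*"([^"\n]*)"\s*$')
# Two adjacent string literals joined by &, e.g.  "Wor" & "d."
CONCAT = re.compile(r'"([^"\n]*)"\s*&\s*"([^"\n]*)"')


def _chr_to_literal(m):
    """Replace a Chr(...) call with the quoted character it produces."""
    value = int(m.group(1))
    if m.group(2) == "+":
        value += int(m.group(3))
    elif m.group(2) == "-":
        value -= int(m.group(3))
    return '"' + chr(value) + '"'


def fold_vba_strings(code: str) -> str:
    """Naively fold Chr() calls, string constants and & concatenation in VBA source."""
    lines = [CHR_CALL.sub(_chr_to_literal, line) for line in code.splitlines()]
    # Remember simple string constants such as  BHJASD = "n".
    consts = {}
    for line in lines:
        m = STR_ASSIGN.match(line)
        if m:
            consts[m.group(1)] = m.group(2)
    folded = []
    for line in lines:
        if not STR_ASSIGN.match(line):
            for name, value in consts.items():
                # A function replacement avoids re.sub treating backslashes specially.
                line = re.sub(r"\b" + re.escape(name) + r"\b",
                              lambda _m, v=value: '"' + v + '"', line)
        # Join adjacent literals until nothing changes.
        previous = None
        while previous != line:
            previous = line
            line = CONCAT.sub(lambda m: '"' + m.group(1) + m.group(2) + '"', line)
        folded.append(line)
    return "\n".join(folded)


SAMPLE = (
    'BHJASD = Chr(102 + 8)\n'
    'Set uHhdBhd = CreateObject("" & "W" & "" & "or" & "d." & "Applicatio" & BHJASD)'
)
print("Word.Application" in SAMPLE)                    # False
print("Word.Application" in fold_vba_strings(SAMPLE))  # True
```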

Why Machine Learning?

Existing anti-virus and sandbox techniques can be subverted

Automates extracting insight from file samples

Generalizes better to unknown variations

Reduces human analysis time

Project Approach

Goals:
• Triage: determine whether a new Microsoft Office document contains a malicious or benign macro
• Detection: provide useful detection when signature-based methods fail
• Threat intelligence: identify phishing campaigns

Guiding principles:
• Supervised machine learning (classification)
• Well-thought-out features
• Generalized and interpretable model output

Applied Machine Learning Steps

1) Collect labeled data: benign files and malicious files
2) Feature extraction: each file is converted into a row of numeric feature values, for example:

    5.7   10   98   …
    1.2   23   15   …
    0.7   57   20   …
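Each row of that matrix is one file's feature vector, and each column must mean the same thing for every file. One simple source of such columns is counting VBA built-in keywords such as Shell, Dim, and If. A minimal sketch, assuming a tiny hypothetical vocabulary (the Malicious Macro Bot project uses over a thousand features):

```python
import re
from collections import Counter

# Hypothetical mini-vocabulary of VBA keywords; a real feature set would be far larger.
VOCAB = ["shell", "dim", "if", "createobject", "chr", "sub", "function"]
# Identifier-like tokens; note this also picks up words inside strings and comments.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def vectorize(vba: str, vocab=VOCAB) -> list:
    """Count each vocabulary keyword (case-insensitively), in a fixed column order."""
    counts = Counter(token.lower() for token in IDENTIFIER.findall(vba))
    return [counts[word] for word in vocab]


docs = {
    "doc1": "Dim jjz As Variant\njjz = Shell(qqa, 0)",
    "doc2": "If ActiveCell.Row > 1000 Then GoTo TidyUp",
}
matrix = [vectorize(code) for code in docs.values()]
print(matrix)  # [[1, 1, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0]]
```

Fixing the vocabulary order up front is what makes rows from different files comparable column by column.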

“Feature Engineering”

Document #1 (excerpt):

    …
    BHJASD = Chr(102 + 8)
    Set uHhdBhd = CreateObject("" & "W" & "" & "or" & "d." & "Applicatio" & BHJASD)
    uHhdBhd.Documents.Open(FFFNNNF)
    Module1.Tyryka (2)
    HYUASGD = Module1.Girow(WOIEW)
    Module1.Tyryka (3)
    uHhdBhd.Quit
    Set uHhdBhd = Nothing
    End Sub
    Public Function Girow(qqa As String)
        Dim jjz As Variant
        jjz = Shell(qqa, 0)
    …

Document #2 (excerpt):

    …
    '#############################
    '# Code to Add Total Value Formula #
    '##############################

    'Go to the top of the Price column
    Range("H10").Select

    'Find the bottom value - there are no values in the Non Stock Items
    Selection.End(xlDown).Select
    'Check to see if still in the order form range - if not there were no Standard Items Selected
    If ActiveCell.Row > 1000 Then GoTo TidyUp
    …

Which one is malicious?

Why?

How would you measure that?

    Feature                 Doc1   Doc2
    # Lines of code           74    584
    # Comments                 8    161
    # Functions                9     14
    # Shell instructions       1      0
    Entropy                  4.3    3.8
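A minimal sketch of how features like those in the table might be computed from VBA source. The exact definitions behind the table are not given, so these are assumptions: lines of code counts non-blank lines, comments are lines starting with ' or Rem, functions are Sub/Function declarations, Shell instructions are occurrences of the Shell keyword, and entropy is the Shannon entropy of the character distribution in bits per character. It is not expected to reproduce the table's exact values.

```python
import math
import re
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


# Sub/Function declarations, optionally with Public/Private/Friend and Static.
FUNC_DECL = re.compile(
    r"^\s*(?:(?:Public|Private|Friend)\s+)?(?:Static\s+)?(?:Sub|Function)\s+\w+",
    re.IGNORECASE | re.MULTILINE,
)
SHELL_WORD = re.compile(r"\bShell\b", re.IGNORECASE)


def extract_features(vba: str) -> dict:
    """Compute a few hand-crafted code heuristics for one VBA module (assumed definitions)."""
    lines = [ln.strip() for ln in vba.splitlines() if ln.strip()]
    return {
        "loc": len(lines),
        # VBA comments start with ' or the Rem keyword.
        "comments": sum(1 for ln in lines if ln.startswith("'") or ln.lower().startswith("rem ")),
        "functions": len(FUNC_DECL.findall(vba)),
        "shell": len(SHELL_WORD.findall(vba)),
        "entropy": round(shannon_entropy(vba), 2),
    }


SNIPPET = (
    "Public Function Girow(qqa As String)\n"
    "    Dim jjz As Variant\n"
    "    jjz = Shell(qqa, 0)\n"
    "End Function\n"
)
print(extract_features(SNIPPET))
```

Entropy rises when code is dominated by random-looking identifiers and string fragments, which is one plausible reason it separates the two documents above.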

Applied Machine Learning Steps

1) Collect labeled data (benign files, malicious files)
2) Feature extraction
3) Train and test a classification model

Choose and Test Model
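Testing a model means holding out labeled samples it never saw during training and comparing its predictions with the true labels. A minimal stdlib sketch, assuming "malicious" is the positive class; the helper names are illustrative, and in practice a library such as scikit-learn provides equivalents (train_test_split, precision_score, recall_score). Precision and recall are reported separately because in triage a false positive (benign file flagged) and a false negative (malicious file missed) carry different costs.

```python
import random


def holdout_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle reproducibly, then hold out a fraction of the samples for testing."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    n_test = max(1, int(len(order) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return ([rows[i] for i in train_idx], [labels[i] for i in train_idx],
            [rows[i] for i in test_idx], [labels[i] for i in test_idx])


def evaluate(y_true, y_pred, positive="malicious"):
    """Confusion counts plus precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "precision": precision, "recall": recall}


y_true = ["malicious", "malicious", "benign", "benign"]
y_pred = ["malicious", "benign", "malicious", "benign"]
print(evaluate(y_true, y_pred))
# {'tp': 1, 'fp': 1, 'fn': 1, 'tn': 1, 'precision': 0.5, 'recall': 0.5}
```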

Simple Decision Tree Model

    entropy <= 4.27 (samples = 88)
      True:  class = benign (samples = 47)
      False: # comments <= 39.0 (samples = 41)
        True:  class = malicious
        False: class = benign

    Feature       Doc1   Doc2
    Entropy        4.3    3.8
    # Comments       8    161

Doc #1: entropy 4.3 > 4.27, then # comments 8 <= 39.0, so class = malicious

Doc #2: entropy 3.8 <= 4.27, so class = benign
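Because the tree is so small, it can be written out by hand, which shows what "interpretable" means here: every prediction is a short chain of readable threshold checks. The sketch below hard-codes the two thresholds from the slide; which leaf under the # comments split is malicious is an assumption inferred from the Doc #1 / Doc #2 walk-through. In practice the tree would be learned from labeled data, for example with scikit-learn's DecisionTreeClassifier.

```python
def classify(entropy: float, comments: int) -> str:
    """Hand-coded version of the two-split tree above (thresholds from the slide)."""
    if entropy <= 4.27:
        return "benign"
    # Assumption: for high-entropy code, the "few comments" branch is the malicious leaf.
    if comments <= 39.0:
        return "malicious"
    return "benign"


for name, entropy, comments in [("Doc #1", 4.3, 8), ("Doc #2", 3.8, 161)]:
    print(name, classify(entropy, comments))
# Doc #1 malicious
# Doc #2 benign
```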

Applied Machine Learning Steps

1) Collect labeled data
2) Feature extraction
3) Train and test the classification model
4) Deploy the model: new files are classified as “Benign” or “Malicious”

Malicious Macro Bot Project

The model was trained on over 20,000 samples

Over 121,000 samples from 7 months of VirusTotal data were analyzed

Over a thousand features:
• VBA built-in language semantics for the base language, e.g. Shell, Dim, If, …
• Code heuristics, e.g. LOC, # functions, entropy, …

Uses a Random Forest classifier:
• Fits many decision trees on many random subsets of the dataset
• Combines the predictions of all the trees (an “ensemble”)
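A toy, stdlib-only sketch of the random-forest idea: each "tree" here is a depth-1 stump trained on a bootstrap sample (rows drawn with replacement) using a randomly chosen feature subset, and predictions are combined by majority vote. This is a simplification: real random forests grow deeper trees and draw a random feature subset at every split, and a library implementation such as scikit-learn's RandomForestClassifier would normally be used. The training data below is made up.

```python
import random
from collections import Counter


def majority(labels, default=None):
    """Most common label, or `default` for an empty list."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X, y, features):
    """Best one-threshold split (row[f] <= t) over the candidate feature indices."""
    fallback = majority(y)
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_label, right_label = majority(left, fallback), majority(right, fallback)
            errors = sum(lab != left_label for lab in left) + sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # rows with replacement
        feats = rng.sample(range(len(X[0])), max_features)  # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all stumps in the ensemble."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)


# Made-up training data: [entropy, # comments] per document.
X = [[3.6, 150], [3.7, 120], [3.8, 161], [3.9, 140], [4.0, 110],
     [4.5, 8], [4.6, 5], [4.7, 2], [4.8, 10], [4.9, 0]]
y = ["benign"] * 5 + ["malicious"] * 5
forest = fit_forest(X, y)
print(predict(forest, [3.0, 200]), predict(forest, [5.5, 0]))
```

Averaging over many trees trained on different resamples tends to smooth out the quirks of any single tree, which is the point of the ensemble.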

Demo: Malicious Macro Bot Project

Demonstrate classification

Gain insight from machine learning features

Identify phishing campaigns through featureprints

Search and visualize in Elasticsearch / Kibana
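One way to read "featureprints": documents from the same phishing campaign are likely to be generated from one template, so their structural features may be identical even when file hashes, filenames, or payload URLs differ. A minimal sketch under that assumption: hash a chosen subset of feature values and group documents that share the hash. The feature keys, sample names, and values are hypothetical, and the project's own featureprint definition may differ.

```python
import hashlib
import json
from collections import defaultdict

# Hypothetical choice of "structural" features that tend to stay fixed within
# one campaign's documents; the project's own featureprint definition may differ.
PRINT_KEYS = ("loc", "comments", "functions", "shell")


def featureprint(features: dict, keys=PRINT_KEYS) -> str:
    """Short, stable hash over a chosen subset of a document's feature values."""
    subset = {k: features[k] for k in keys}
    canonical = json.dumps(subset, sort_keys=True)  # same values -> same string
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def group_by_featureprint(samples: dict) -> dict:
    """Map each featureprint to the sorted names of the samples that share it."""
    groups = defaultdict(list)
    for name, feats in samples.items():
        groups[featureprint(feats)].append(name)
    return {fp: sorted(names) for fp, names in groups.items()}


# Made-up samples: two near-identical "invoice" documents and one unrelated file.
samples = {
    "invoice_1.docm": {"loc": 120, "comments": 0, "functions": 5, "shell": 1, "entropy": 4.61},
    "invoice_2.docm": {"loc": 120, "comments": 0, "functions": 5, "shell": 1, "entropy": 4.63},
    "order_form.xlsm": {"loc": 584, "comments": 161, "functions": 14, "shell": 0, "entropy": 3.8},
}
for fp, names in group_by_featureprint(samples).items():
    print(fp, names)
```

Storing the featureprint as a keyword field when indexing results into Elasticsearch would let an analyst search for, and chart in Kibana, every document that shares one.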

Conclusion

Project uses:
• Threat intelligence: identify new phishing campaigns
• Detection: fill traditional detection gaps
• Incident response: rapid triage of Office documents

Prevention would still be best

Thank You!

https://github.com/egaus/MaliciousMacroBot

References

[1] Proofpoint, “Human Factor Report”, 2016. https://www.proofpoint.com/sites/default/files/human-factor-report-2016.pdf
[2] Microsoft, “New feature in Office 2016 can block macros and help prevent infection”, Mar. 22, 2016. https://blogs.technet.microsoft.com/mmpc/2016/03/22/new-feature-in-office-2016-can-block-macros-and-help-prevent-infection/
[3] Proofpoint, “The Cybercrime Economics of Malicious Macros”, 2016. https://www.proofpoint.com/sites/default/files/documents/bnt_download/pp-macroeconomics-rr.pdf
[4] Ankit Anubhav, Dileep Kumar Jallepalli, “Hancitor (aka Chanitor) Observed Using Multiple Attack Approaches”, FireEye, Sept. 23, 2016. https://www.fireeye.com/blog/threat-research/2016/09/hancitor_aka_chanit.html
[5] Damballa, “PonyUp: Tracing Pony’s Threat Cycle and Multi-Stage Infection Chain”, Aug. 2015. https://www.damballa.com/wp-content/uploads/2015/08/Damballa_PonyUp.pdf
[6] Minerva Labs Research Team, “New Hancitor: Pimp my Downloader”, Aug. 19, 2016. http://www.minerva-labs.com/post/new-hancitor-pimp-my-downloader
[7] CrunchCode. http://www.crunchcode.de/en/index.html
[8] MacroShop. https://github.com/khr0x40sh/MacroShop
[9] Veil Evasion Framework. https://github.com/Veil-Framework/Veil-Evasion
[10] Generate-Macro. https://github.com/enigma0x3/Generate-Macro
[11] scikit-learn algorithm cheat sheet. http://scikit-learn.org/stable/tutorial/machine_learning_map/

Questions?

Offline Demo

Identifying Phishing Campaigns