Self-Learning Anti-Virus Scanner

Arun Lakhotia, Professor
Andrew Walenstein, Assistant Professor
University of Louisiana at Lafayette
www.cacs.louisiana.edu/labs/SRL

2008 AVAR (New Delhi)



Introduction

  • Director, Software Research Lab
  • Lab’s focus: Malware Analysis
  • Graduate level course on Malware Analysis
  • Six years of AV related research
  • Issues investigated:
      • Metamorphism
      • Obfuscation
  • Alumni in AV industry:
      • Prabhat Singh, Nitin Jyoti, Aditya Kapoor, Rachit Kumar: McAfee AVERT
      • Erik Uday Kumar: Authentium
      • Moinuddin Mohammed: Microsoft
      • Prashant Pathak: Ex-Symantec
  • Funded by: Louisiana Governor’s IT Initiative

Outline

  • Attack of Variants
      • AV vulnerability: Exact match
  • Information Retrieval Techniques
      • Inexact match
  • Adapting IR to AV
      • Account for code permutation
  • Vilo: System using IR for AV
  • Integrating Vilo into AV Infrastructure
  • Self-Learning AV using Vilo

ATTACK OF VARIANTS

Variants vs Family

[Chart: Total Variants and Total Family per half year, 02-I through 07-I; vertical axis from 0 to 250000. Total Family values shown: 141, 184, 164, 171, 170, 104, 101]

Source: Symantec Internet Threat Report, XI

Analysis of attacker strategy

  • Purpose of attack of variants
      • Denial of Service on AV infrastructure
      • Increase odds of passing through
  • Weakness exploited
      • AV systems use exact match over an extract
  • Attack strategy
      • Generate just enough variation to beat exact match
  • Attacker cost
      • Cost of generating and distributing variants

Analyzing attacker cost

  • Payload creation is expensive
      • Must reuse payload
  • Need thousands of variants
      • Must be automated
  • “General” transformers are expensive
      • Specialized, limited transformers
      • Hence packers/unpackers

Attacker vulnerability

  • Automated transformers
      • Limited capability
      • Machine generated, must have regular pattern
  • Exploiting attacker vulnerability
      • Detect patterns of similarities
  • Approach
      • Information Retrieval (this presentation)
      • Markov Analysis (other work)

Information Retrieval

IR Basics

  • Basis of Google, Bioinformatics
  • Organizing very large corpus of data
  • Key idea
      • Inexact match over whole
  • Contrast with AV
      • Exact match over extract

IR Problem

[Diagram: a query (keywords or a document) is submitted to IR over a document collection, which returns related documents]

IR Steps

Step 1: Convert documents to vectors
  • 1a. Define a method to identify “features”
      • Example: k-consecutive words
  • 1b. Extract all features from all documents
  • 1c. Count features, make feature vector

Example document: “Have you wondered / When is a rose a rose?”

Feature               Count
Have you wondered       1
You wondered when       1
Wondered when rose      1
When rose rose          1
How about onions        0
Onion smell stinks      0

Feature vector: [1, 1, 1, 1, 0, 0]
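Step 1 can be sketched in a few lines of Python. This is only an illustration, not the presenters' code: the function names and the stopword list are assumptions. Dropping "is" and "a" is an assumption made so that the example sentence yields the feature "when rose rose" shown on the slide.

```python
import re

# Assumption: short function words are dropped before features are formed,
# which is how "When is a rose a rose?" can yield "when rose rose".
STOPWORDS = {"is", "a"}

def word_kgrams(text, k=3):
    """Return the k-consecutive-word features of text, in order (step 1a/1b)."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]

def feature_vector(text, vocabulary, k=3):
    """Count how often each vocabulary feature occurs in text (step 1c)."""
    found = word_kgrams(text, k)
    return [found.count(feature) for feature in vocabulary]

doc = "Have you wondered when is a rose a rose?"
# The last two features come from other documents in the corpus.
vocabulary = word_kgrams(doc) + ["how about onions", "onion smell stinks"]
vector = feature_vector(doc, vocabulary)
print(vector)  # [1, 1, 1, 1, 0, 0]
```

The vocabulary is the union of features over the whole corpus, so features that a document lacks appear as zeros in its vector.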

IR Steps

Step 2: Compute feature vectors
  • Take into account features in entire corpus
  • Classical method: W = TF x IDF
      • TF = Term Frequency
      • DF = # documents containing the feature
      • IDF = Inverse of DF

Feature               DF    IDF    TF(v1)    w1 = TFxIDF(v1)
You wondered when      5    1/5      1         1/5
Wondered when rose     7    1/7      2         2/7
When rose rose         8    1/8      5         5/8
How about onions       6    1/6      3         3/6
Onion smell stinks     3    1/3      0         0/3
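The weighting can be checked with exact fractions. The sketch below uses the DF and TF(v1) values from the slide and the slide's simplified IDF = 1/DF; the function name is an assumption. Classical IR systems usually use a log-scaled IDF instead of the plain reciprocal.

```python
from fractions import Fraction

features = ["you wondered when", "wondered when rose", "when rose rose",
            "how about onions", "onion smell stinks"]
df = [5, 7, 8, 6, 3]      # number of documents containing each feature
tf_v1 = [1, 2, 5, 3, 0]   # occurrences of each feature in document v1

def tf_idf(tf, df):
    """Weight each term frequency by the inverse document frequency 1/DF."""
    return [Fraction(t, d) for t, d in zip(tf, df)]

w1 = tf_idf(tf_v1, df)
for feature, weight in zip(features, w1):
    print(f"{feature:20s} {weight}")
```

A feature that occurs in many documents (large DF) gets a small weight, so common features contribute little to the comparison in the next step.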

IR Steps

Step 3: Compare vectors
  • Cosine similarity

\[ \mathrm{sim}(w_1, w_2) = \frac{w_1 \cdot w_2}{|w_1|\,|w_2|} \]

Example:

w1 = [0.33, 0.25, 0.66, 0.50]
w2 = [0.44, 0.63, 0.33, 0.00]

\[ \mathrm{sim}(w_1, w_2) = \frac{0.33 \cdot 0.44 + 0.25 \cdot 0.63 + 0.66 \cdot 0.33 + 0.50 \cdot 0.00}{\sqrt{0.33^2 + 0.25^2 + 0.66^2 + 0.50^2}\,\sqrt{0.44^2 + 0.63^2 + 0.33^2 + 0.00^2}} \approx 0.67 \]
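The cosine computation can be sketched as follows. The first vector is the one printed on the slide; the second, [0.44, 0.63, 0.33, 0.00], is read off the worked computation on the slide. The function name is an assumption.

```python
import math

def cosine_similarity(w1, w2):
    """Dot product of the two vectors divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(w1, w2))
    norm1 = math.sqrt(sum(a * a for a in w1))
    norm2 = math.sqrt(sum(b * b for b in w2))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm1 * norm2)

w1 = [0.33, 0.25, 0.66, 0.50]
w2 = [0.44, 0.63, 0.33, 0.00]
sim = cosine_similarity(w1, w2)
print(round(sim, 2))  # 0.67
```

Because the weights are non-negative, the result lies between 0 (no shared features) and 1 (same direction).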

IR Steps

Step 4: Document Ranking
  • Using similarity measure

[Diagram: a new document is submitted to IR over the document collection; matching documents are returned with similarity scores 0.90, 0.82, 0.76, 0.30]
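Ranking is then a sort by similarity. A minimal sketch, assuming the weight vectors have already been computed; the document names and vectors here are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query, collection):
    """Return (name, similarity) pairs, most similar document first."""
    scored = [(name, cosine(query, vec)) for name, vec in collection.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical weight vectors over a three-feature vocabulary.
collection = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [1.0, 1.0, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
ranking = rank([1.0, 0.0, 0.0], collection)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A real system would avoid scoring every document, for example by using an inverted index from features to documents; the linear scan is kept here for clarity.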

Adapting IR for AV

Adapting IR for AV

Step 0: Mapping program to document
  • Extract sequence of operations

Variant 1:

    l2D2: push  ecx
          push  4
          pop   ecx
          push  ecx
    l2D7: rol   edx, 8
          mov   dl, al
          and   dl, 3Fh
          shr   eax, 6
          loop  l2D7
          pop   ecx
          call  s319
          xchg  eax, edx
          stosd
          xchg  eax, edx
          inc   [ebp+v4]
          cmp   [ebp+v4], 12h
          jnz   short l305

Variant 2:

    l144: push  ecx
          push  4
          pop   ecx
          push  ecx
    l149: mov   dl, al
          and   dl, 3Fh
          rol   edx, 8
          shr   ebx, 6
          loop  l149
          pop   ecx
          call  s52F
          xchg  ebx, edx
          stosd
          xchg  ebx, edx
          inc   [ebp+v4]
          cmp   [ebp+v4], 12h
          jnz   short l18

Extracted operation sequences:

Variant 1: push push pop push rol mov and shr loop pop call xchg stosd xchg inc cmp jnz
Variant 2: push push pop push mov and rol shr loop pop call xchg stosd xchg inc cmp jnz
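Step 0 can be sketched as a small text filter over a disassembly listing. This assumes the listing has one instruction per line with an optional leading "label:"; the function name is an assumption, and a real system would take its input from a disassembler rather than from text.

```python
def operation_sequence(listing):
    """Return the mnemonic of each instruction, dropping labels and operands."""
    ops = []
    for line in listing.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop a leading "label:" if the first token carries one.
        if ":" in line.split()[0]:
            line = line.split(":", 1)[1].strip()
        if line:
            ops.append(line.split()[0])
    return ops

fragment = """
l2D2: push ecx
      push 4
      pop ecx
      push ecx
l2D7: rol edx, 8
      mov dl, al
      and dl, 3Fh
      shr eax, 6
      loop l2D7
"""
sequence = operation_sequence(fragment)
print(" ".join(sequence))  # push push pop push rol mov and shr loop
```

Discarding operands and labels is what makes the two variants on the slide look alike: register renaming (eax versus ebx) and relocated addresses disappear, leaving only the instruction reordering.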

Adapting IR for AV

Step 1a: Defining features: k-perm
  • Feature = Permutation of k operations

Operation sequences, one letter per operation (P = push, O = pop, R = rol, M = mov, A = and, S = shr or stosd, L = loop, C = call or cmp, X = xchg, I = inc, J = jnz):

Virus 1: P P O P R M A S L O C X S X I C J
Virus 2: P P O P M A R S L O C X S X I C J

The two sequences differ only in the order of R M A versus M A R.
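One way to read "permutation of k operations" is that a feature is a window of k consecutive operations with the order inside the window ignored. The sketch below works under that assumption and represents each window by its sorted contents; the function name and the `ordered` flag are illustrative only.

```python
from collections import Counter

def window_counts(ops, k, ordered=False):
    """Count windows of k consecutive operations.

    With ordered=True these are ordinary k-grams. With ordered=False the
    window is sorted first, so every reordering maps to the same feature.
    """
    counts = Counter()
    for i in range(len(ops) - k + 1):
        window = ops[i:i + k]
        key = tuple(window) if ordered else tuple(sorted(window))
        counts[key] += 1
    return counts

virus1 = "PPOPRMASLOCXSXICJ"
virus2 = "PPOPMARSLOCXSXICJ"

# Counter intersection keeps the minimum count of each shared feature.
shared_grams = window_counts(virus1, 3, ordered=True) & window_counts(virus2, 3, ordered=True)
shared_perms = window_counts(virus1, 3) & window_counts(virus2, 3)
print(sum(shared_grams.values()), sum(shared_perms.values()))  # 10 11
```

The extra shared feature is the window R M A in Virus 1, which matches M A R in Virus 2 once order is ignored.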

Adapting IR for AV

Step 1: Example of 3-perm

Virus 1: P P O P R M A S L O C X S X I C J
Virus 2: P P O P M A R S L O C X S X I C J

[Diagram: the sequences of Virus 1, Virus 2 and Virus 3 split into the blocks P P O P, M A R S L, O C X S X and I C J, which appear in different orders across the variants]

Adapting IR for AV

Step 2: Construct feature vectors (4-perms)

Sequences:

1: P O P R M A S L
2: P O P M A R S L
3: M A R S L P O P

Feature vectors:

     POPR  OPRM  PRMA  RMAS  MASL  POPM  OPMA  ARSL  RSLP  SLPO  LPOP
1      1     1     1     1     1     0     0     0     0     0     0
2      0     0     1     1     0     1     1     1     0     0     0
3      0     0     0     1     0     0     0     1     1     1     1

PMAR (in sequence 2) is a permutation of PRMA, and MARS (in sequences 2 and 3) is a permutation of RMAS, so they are counted under those features.
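The feature vectors for the three example sequences can be computed mechanically. As before, this sketch assumes a 4-perm is a window of four consecutive operations with order ignored; the helper name is an assumption.

```python
from collections import Counter

def kperm_keys(ops, k):
    """Return the k-perm of every window, as a sorted tuple, in window order."""
    return [tuple(sorted(ops[i:i + k])) for i in range(len(ops) - k + 1)]

samples = {1: "POPRMASL", 2: "POPMARSL", 3: "MARSLPOP"}

# One column per distinct 4-perm, in order of first appearance.
columns = []
for seq in samples.values():
    for key in kperm_keys(seq, 4):
        if key not in columns:
            columns.append(key)

vectors = {}
for name, seq in samples.items():
    counts = Counter(kperm_keys(seq, 4))
    vectors[name] = [counts[key] for key in columns]

for name, vec in vectors.items():
    print(name, vec)
```

With ordered 4-grams, sequence 2 would share no features with sequence 1. With 4-perms, the windows PMAR and MARS fall into the same columns as PRMA and RMAS, so the vectors overlap.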

Adapting IR for AV

Step 3: Compare vectors
  • Cosine similarity (as before)

Step 4: Match new sample
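Steps 3 and 4 can be put together as a nearest-match lookup. This is a sketch under stated assumptions, not Vilo's implementation: it uses raw 4-perm counts rather than TF x IDF weights, it scans the whole collection linearly, and the "unrelated" sequence is made up for contrast.

```python
import math
from collections import Counter

def kperm_counts(ops, k=4):
    """Count windows of k consecutive operations, ignoring order in each window."""
    return Counter(tuple(sorted(ops[i:i + k])) for i in range(len(ops) - k + 1))

def cosine(c1, c2):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(c1[key] * c2[key] for key in c1)  # missing keys count as 0
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def best_match(sample, collection, k=4):
    """Return (similarity, name) of the collection entry closest to sample."""
    query = kperm_counts(sample, k)
    return max((cosine(query, kperm_counts(seq, k)), name)
               for name, seq in collection.items())

collection = {
    "virus1": "PPOPRMASLOCXSXICJ",
    "unrelated": "MMMMAAAACCCCJJJJ",   # hypothetical sequence
}
score, name = best_match("PPOPMARSLOCXSXICJ", collection)
print(name, round(score, 2))  # virus1 0.71
```

The new sample shares 10 of its 14 four-operation windows with virus1 once order is ignored, which gives 10/14. A deployment would compare that score against a threshold to decide whether it counts as a match.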

Vilo: System using IR for AV

Vilo Functional View

[Diagram: a new sample is submitted to Vilo over the malware collection; malware matches are returned with similarity scores 0.90, 0.82, 0.76, 0.30]

Vilo in Action: Query Match

Vilo: Performance

  • Response time vs Database size
  • Search on generic desktop: in seconds
  • Contrast with:
      • Behavior match: in minutes
      • Graph match: in minutes

Vilo Match Accuracy

[Figure: ROC curve; horizontal axis False Positive, vertical axis True Positive]

Vilo in AV Product

Vilo in AV Product

  • AV Systems: Composed of classifiers
  • Introduce Vilo as a Classifier

[Diagram: an AV scanner built from several classifiers, with Vilo added as one more classifier]

Self-Learning AV Product

  • How to get malware collection?
  • Solution 1: Collect malware detected by the Product.

[Diagram: the product's classifiers alongside Vilo]
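The feedback idea behind Solution 1 can be sketched as follows. Everything in this block is hypothetical: the class names, the threshold of 0.6, the use of an exact-match set to stand in for the product's existing classifiers, and the use of 4-perm cosine similarity to stand in for Vilo.

```python
import math
from collections import Counter

def kperm_counts(ops, k=4):
    """Count windows of k consecutive operations, ignoring order in each window."""
    return Counter(tuple(sorted(ops[i:i + k])) for i in range(len(ops) - k + 1))

def cosine(c1, c2):
    dot = sum(c1[key] * c2[key] for key in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

class SimilarityClassifier:
    """Hypothetical stand-in for Vilo inside the product."""
    def __init__(self, threshold=0.6):
        self.threshold = threshold
        self.collection = []          # feature counts of known malware

    def learn(self, ops):
        self.collection.append(kperm_counts(ops))

    def detects(self, ops):
        query = kperm_counts(ops)
        return any(cosine(query, known) >= self.threshold
                   for known in self.collection)

class Scanner:
    """Hypothetical product: exact-match signatures plus the similarity classifier."""
    def __init__(self, signatures):
        self.signatures = set(signatures)
        self.vilo = SimilarityClassifier()

    def scan(self, ops):
        detected = ops in self.signatures or self.vilo.detects(ops)
        if detected:
            self.vilo.learn(ops)      # feedback: every detection grows the collection
        return detected

scanner = Scanner(signatures={"PPOPRMASLOCXSXICJ"})
before = scanner.scan("PPOPMARSLOCXSXICJ")   # variant, nothing learned yet
scanner.scan("PPOPRMASLOCXSXICJ")            # caught by exact match, then learned
after = scanner.scan("PPOPMARSLOCXSXICJ")    # the same variant again
print(before, after)  # False True
```

The variant slips past the exact-match signature at first, but is caught once the original sample has been detected and added to the similarity classifier's collection.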

Self-Learning AV Product

  • How to get malware collection?
  • Solution 2: Collect and learn in the cloud

[Diagram: the product's classifiers and Vilo, connected to a Vilo instance in the Internet cloud]

Learning in the Cloud

  • How to get malware collection?
  • Solution 2: Collect and learn in the cloud

[Diagram: the product's classifiers and the Vilo Classifier, connected to a Vilo Learner in the Internet cloud]

Experience with Vilo-Learning

  • Vilo-in-the-cloud holds promise
      • Can utilize cluster of workstations, like Google
      • Take advantage of increasing bandwidth and compute power
  • Engineering issues to address
      • Control growth of database
          • Forget samples
          • Use “signature” feature vector(s) for family
          • Be “selective” about features to use
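One possible reading of a "signature" feature vector for a family is a single vector that summarizes all of the family's samples, so the individual samples can be forgotten. The slides do not say how such a signature is computed; the sketch below assumes an element-wise mean, and the function name and sample vectors are illustrative.

```python
def family_signature(vectors):
    """Summarize a family by the element-wise mean of its feature vectors.

    Assumption: the mean is only one of several plausible summaries.
    """
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Hypothetical family of three samples over an 11-feature vocabulary.
family = [
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1],
]
signature = family_signature(family)
print([round(x, 2) for x in signature])
```

New samples would then be compared against one signature per family instead of every stored sample, which keeps the database size proportional to the number of families rather than the number of variants.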

Summary

  • Weakness of current AV system
      • Exact match over extract
      • Exploited by creating large number of variants
  • Information Retrieval research strengths
      • Inexact match over whole
  • VILO demonstrates IR techniques have promise
  • Architecture of Self-Learning AV System
      • Integrate VILO into existing AV systems
      • Create feedback mechanism to drive learning