Arun Lakhotia, Professor
Andrew Walenstein, Assistant Professor
University of Louisiana at Lafayette
www.cacs.louisiana.edu/labs/SRL

AVAR 2008 (New Delhi)
Self-Learning Anti-Virus Scanner
Introduction

- Director, Software Research Lab
- Lab's focus: malware analysis
- Graduate-level course on malware analysis
- Six years of AV-related research
- Issues investigated:
  - Metamorphism
  - Obfuscation
- Alumni in the AV industry:
  - Prabhat Singh, Nitin Jyoti, Aditya Kapoor, Rachit Kumar (McAfee AVERT)
  - Erik Uday Kumar (Authentium)
  - Moinuddin Mohammed (Microsoft)
  - Prashant Pathak (ex-Symantec)
- Funded by: Louisiana Governor's IT Initiative
Outline

- Attack of variants
  - AV vulnerability: exact match
- Information retrieval (IR) techniques
  - Inexact match
- Adapting IR to AV
  - Accounting for code permutation
- Vilo: a system using IR for AV
- Integrating Vilo into the AV infrastructure
- Self-learning AV using Vilo
ATTACK OF VARIANTS
Variants vs Family

[Chart: total variants vs total malware families per half-year, 02-I through
07-I. Variant counts grow by orders of magnitude (y-axis up to 250,000),
while family counts stay roughly flat: 141, 184, 164, 171, 170, 104, 101.]

Source: Symantec Internet Threat Report, XI
Analysis of attacker strategy

- Purpose of the attack of variants:
  - Denial of service on the AV infrastructure
  - Increase the odds of passing through
- Weakness exploited:
  - AV systems use exact match over an extract
- Attack strategy:
  - Generate just enough variation to beat exact match
- Attacker cost:
  - Cost of generating and distributing variants
Analyzing attacker cost

- Payload creation is expensive
  - Must reuse the payload
- Need thousands of variants
  - Must be automated
- "General" transformers are expensive
  - Specialized, limited transformers
  - Hence packers/unpackers
Attacker vulnerability

- Automated transformers:
  - Limited capability
  - Machine-generated, so must have regular patterns
- Exploiting the attacker vulnerability:
  - Detect patterns of similarities
- Approach:
  - Information retrieval (this presentation)
  - Markov analysis (other work)
Information Retrieval
IR Basics

- Basis of Google, bioinformatics
- Organizes very large corpora of data
- Key idea: inexact match over the whole
- Contrast with AV: exact match over an extract
IR Problem

[Diagram: a query (keywords or a document) goes into an IR system over a
document collection; the system returns related documents.]
IR Steps

Step 1: Convert documents to vectors
  1a. Define a method to identify "features"
      Example: k consecutive words
  1b. Extract all features from all documents
  1c. Count features, make a feature vector

Example documents:
  Doc 1: "Have you wondered when is a rose a rose?"
  Doc 2: "How about onions? Onion smell stinks."

Features of Doc 1 (3-word windows): "have you wondered", "you wondered when",
"wondered when rose", "when rose rose".

Counting each feature of the combined vocabulary in Doc 1 gives a feature
vector such as [1, 1, 1, 1, 0, 0], where the trailing zeros are features
that occur only in Doc 2.
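The three sub-steps above can be sketched in code. This is a minimal illustration, not the authors' implementation; unlike the slide's example, it keeps stop words such as "is" and "a" rather than dropping them.

```python
from typing import List

def word_kgrams(text: str, k: int = 3) -> List[str]:
    # Step 1a/1b: a feature is k consecutive words.
    words = text.lower().replace("?", "").replace(".", "").split()
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]

def feature_vector(doc: str, vocabulary: List[str], k: int = 3) -> List[int]:
    # Step 1c: count each vocabulary feature in the document.
    grams = word_kgrams(doc, k)
    return [grams.count(f) for f in vocabulary]

doc1 = "Have you wondered when is a rose a rose?"
doc2 = "How about onions? Onion smell stinks."
vocab = sorted(set(word_kgrams(doc1)) | set(word_kgrams(doc2)))
v1 = feature_vector(doc1, vocab)
v2 = feature_vector(doc2, vocab)
```

Each document becomes a vector of counts over the shared vocabulary, so documents with no features in common get orthogonal vectors.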
IR Steps

Step 2: Weight the feature vectors
- Take into account features in the entire corpus
- Classical method: W = TF x IDF
  - TF  = term frequency (count of the feature in the document)
  - DF  = number of documents containing the feature
  - IDF = inverse of DF

Example over five features (corpus includes documents such as "you wondered
when", "wondered when rose", "when rose rose", "how about onions", "onion
smell stinks"):

  DF:            5     7     8     6     3
  IDF:           1/5   1/7   1/8   1/6   1/3
  TF(v1):        1     2     5     3     0
  w1 = TFxIDF:   1/5   2/7   5/8   3/6   0/3
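The weighting above is an element-wise computation; a short sketch using the slide's simplified IDF = 1/DF (no logarithmic damping, unlike the classical formulation):

```python
def tfidf_weights(tf, df):
    # w_i = TF_i x IDF_i, with the slide's IDF_i = 1 / DF_i.
    return [t / d for t, d in zip(tf, df)]

df = [5, 7, 8, 6, 3]      # documents containing each feature
tf_v1 = [1, 2, 5, 3, 0]   # term frequencies in document v1
w1 = tfidf_weights(tf_v1, df)
```

A feature that appears in many documents (high DF) is down-weighted, so rare features dominate the comparison.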
IR Steps

Step 3: Compare vectors using cosine similarity

  sim(w1, w2) = (w1 . w2) / (||w1|| ||w2||)

Example:
  w1 = [0.33, 0.25, 0.66, 0.50]
  w2 = [0.44, 0.63, 0.33, 0.00]

  sim(w1, w2) = (0.33x0.44 + 0.25x0.63 + 0.66x0.33 + 0.50x0.00)
                / (sqrt(0.33^2 + 0.25^2 + 0.66^2 + 0.50^2)
                   x sqrt(0.44^2 + 0.63^2 + 0.33^2 + 0.00^2))
              ~ 0.67
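The cosine formula as a sketch. The second example vector here is reconstructed from the slide's partly garbled arithmetic, so treat the exact numbers as illustrative:

```python
import math

def cosine_sim(w1, w2):
    # sim(w1, w2) = (w1 . w2) / (||w1|| * ||w2||)
    dot = sum(a * b for a, b in zip(w1, w2))
    n1 = math.sqrt(sum(a * a for a in w1))
    n2 = math.sqrt(sum(b * b for b in w2))
    return dot / (n1 * n2) if n1 and n2 else 0.0

w1 = [0.33, 0.25, 0.66, 0.50]
w2 = [0.44, 0.63, 0.33, 0.00]
```

Because only the angle between vectors matters, two documents of very different lengths can still score as highly similar.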
IR Steps

Step 4: Document ranking using the similarity measure

[Diagram: a new document is compared against the document collection;
matching documents are ranked by similarity, e.g. 0.90, 0.82, 0.76, 0.30.]
Adapting IR for AV
Adapting IR for AV
Variant 1:

  l2D2: push ecx
        push 4
        pop  ecx
        push ecx
  l2D7: rol  edx, 8
        mov  dl, al
        and  dl, 3Fh
        shr  eax, 6
        loop l2D7
        pop  ecx
        call s319
        xchg eax, edx
        stosd
        xchg eax, edx
        inc  [ebp+v4]
        cmp  [ebp+v4], 12h
        jnz  short l305

Variant 2:

  l144: push ecx
        push 4
        pop  ecx
        push ecx
  l149: mov  dl, al
        and  dl, 3Fh
        rol  edx, 8
        shr  ebx, 6
        loop l149
        pop  ecx
        call s52F
        xchg ebx, edx
        stosd
        xchg ebx, edx
        inc  [ebp+v4]
        cmp  [ebp+v4], 12h
        jnz  short l18
Step 0: Map a program to a document by extracting its sequence of operations:

  Variant 1: push push pop push rol mov and shr loop pop call xchg stosd xchg inc cmp jnz
  Variant 2: push push pop push mov and rol shr loop pop call xchg stosd xchg inc cmp jnz
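Step 0 can be sketched as a small mnemonic extractor over a disassembly listing. This is a simplification (real listings need more careful parsing), applied to Variant 1 above:

```python
def op_sequence(listing):
    # Keep only each instruction's mnemonic; drop labels and operands.
    ops = []
    for line in listing.strip().splitlines():
        line = line.split(":", 1)[-1].strip()  # drop "l2D2:"-style labels
        if line:
            ops.append(line.split()[0])        # first token is the mnemonic
    return ops

variant1 = """
l2D2: push ecx
      push 4
      pop  ecx
      push ecx
l2D7: rol  edx, 8
      mov  dl, al
      and  dl, 3Fh
      shr  eax, 6
      loop l2D7
      pop  ecx
      call s319
      xchg eax, edx
      stosd
      xchg eax, edx
      inc  [ebp+v4]
      cmp  [ebp+v4], 12h
      jnz  short l305
"""
ops = op_sequence(variant1)
```

The resulting operation string is what plays the role of a "document" in the IR steps that follow.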
Adapting IR for AV

Step 1a: Defining features: k-perms

  Virus 1: P P O P R M A S L O C X S X I C J
  Virus 2: P P O P M A R S L O C X S X I C J

A feature is a permutation of k operations: the two variants differ only in
the permuted block (R M A vs M A R).
Adapting IR for AV

Step 1 example with 3-perms:

  Virus 1: P P O P R M A S L O C X S X I C J
  Virus 2: P P O P M A R S L O C X S X I C J
  Virus 3: P P O P M A R S L O C X S X I C J P O P
Adapting IR for AV

Step 2: Construct feature vectors (4-perms)

  Sequences:
    1: P O P R M A S L
    2: P O P M A R S L
    3: M A R S L P O P

  Feature: POPR OPRM PRMA RMAS MASL POPM OPMA PMAR MARS ARSL RSLP SLPO LPOP
  1:         1    1    1    1    1    0    0    0    0    0    0    0    0
  2:         0    0    0    0    0    1    1    1    1    1    0    0    0
  3:         0    0    0    0    0    0    0    0    1    1    1    1    1
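The table above can be reproduced with contiguous 4-grams over the operation letters. Note this sketch uses plain k-grams only; Vilo's k-perm features additionally match permutations of the k operations, which this simplification omits.

```python
def kgrams(seq, k=4):
    # Contiguous k-grams over a string of operation letters.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

seqs = {1: "POPRMASL", 2: "POPMARSL", 3: "MARSLPOP"}
vocab = sorted({g for s in seqs.values() for g in kgrams(s)})
vectors = {i: [1 if g in set(kgrams(s)) else 0 for g in vocab]
           for i, s in seqs.items()}
```

Sequences 2 and 3 overlap in exactly two features (MARS and ARSL), while sequences 1 and 2, despite being near-identical programs, share no contiguous 4-gram; this is precisely why permutation-aware features matter.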
Step 3: Compare vectors with cosine similarity (as before)
Step 4: Match the new sample against the collection
Vilo: System using IR for AV
Vilo Functional View

[Diagram: a new sample is compared by Vilo against the malware collection;
matches are ranked by similarity, e.g. 0.90, 0.82, 0.76, 0.30.]
Vilo in Action: Query Match
Vilo: Performance

- Response time vs database size
- Search on a generic desktop: seconds
- Contrast with:
  - Behavior match: minutes
  - Graph match: minutes
Vilo Match Accuracy

[ROC curve: true-positive rate vs false-positive rate]
Vilo in AV Product

[Diagram: an AV scanner composed of multiple classifiers, with Vilo inserted
as one of them.]

- AV systems are composed of classifiers
- Introduce Vilo as a classifier
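The architecture on this slide might be sketched as follows. All names here (Scanner, ViloClassifier, the 0.8 threshold, the family name) are hypothetical illustrations, not Vilo's actual API:

```python
import math

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class ViloClassifier:
    # Inexact matcher: flag a sample that is close enough to a known family.
    def __init__(self, families, threshold=0.8):
        self.families = families      # {family name: reference feature vector}
        self.threshold = threshold    # assumed similarity cutoff
    def classify(self, vector):
        for name, ref in self.families.items():
            if cosine(vector, ref) >= self.threshold:
                return name
        return None

class Scanner:
    # The slide's view: an AV system as a sequence of classifiers tried in turn.
    def __init__(self, classifiers):
        self.classifiers = classifiers
    def scan(self, vector):
        for clf in self.classifiers:
            verdict = clf.classify(vector)
            if verdict is not None:
                return verdict
        return "clean"

scanner = Scanner([ViloClassifier({"W32.Example": [1, 1, 1, 0]})])
```

A slightly perturbed variant vector still matches the family, which is exactly the inexact-match property the exact-signature classifiers lack.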
Self-Learning AV Product

How to get the malware collection?

Solution 1: Collect the malware detected by the product.
Self-Learning AV Product

[Diagram: the product's classifiers, including Vilo, connected to the
Internet cloud.]

How to get the malware collection?

Solution 2: Collect and learn in the cloud.
Learning in the Cloud

[Diagram: a Vilo learner in the Internet cloud feeds the product's Vilo
classifier alongside the other classifiers.]
Experience with Vilo-Learning

- Vilo-in-the-cloud holds promise
  - Can utilize a cluster of workstations (like Google)
  - Takes advantage of increasing bandwidth and compute power
- Engineering issues to address:
  - Control growth of the database
  - Forget samples
  - Use "signature" feature vector(s) for a family
  - Be "selective" about which features to use
Summary

- Weakness of current AV systems: exact match over an extract
  - Exploited by creating large numbers of variants
- Information retrieval's strength: inexact match over the whole
- Vilo demonstrates that IR techniques have promise
- Architecture of a self-learning AV system:
  - Integrate Vilo into existing AV systems
  - Create a feedback mechanism to drive learning