HUNTING FOR METAMORPHIC ENGINES

HUNTING FOR METAMORPHIC ENGINES

Mark Stamp&

Wing Wong

September 13, 2006

Outline

I. Metamorphic software Both good and evil uses

II. Metamorphic virus construction kitsIII. How effective are metamorphic engines?

How to compare two pieces of code? Similarity within and between virus families Similarity to non-viral code

IV. Can we detect metamorphic viruses? Commercial virus scanners Hidden Markov models (HMMs) Similarity index

V. Conclusion

PART I

Metamorphic Software

What is Metamorphic Software?

Software is metamorphic provided All copies do the same thing Internal structure of copies differs

Today almost all software is cloned “Good” metamorphic software…

Mitigate buffer overflow attacks “Bad” metamorphic software…

Avoid virus/worm signature detection

Metamorphic Software for Good?

Suppose program has a buffer overflow If we clone the program

One attack breaks every copy Break once, break everywhere (BOBE)

If instead, we have metamorphic copies Each copy still has a buffer overflow One attack does not work against every copy BOBE-resistant Analogous to genetic diversity in biology

A little metamorphism does a lot of good!

Metamorphic Software for Evil?

Cloned virus/worm can be detected Common signature on every copy Detect once, detect everywhere (DODE?)

If instead virus/worm is metamorphic Each copy has different signature Same detection does not work against every copy Provides DODE-resistance Analogous to genetic diversity in biology

But, effective use of metamorphism here is tricky!

Crypto Analogy

In information security, almost everything that consistently works is either Crypto, or Has a crypto analogy…

Consider WWII ciphers German Enigma

Broken by Polish and British cryptanalysts Design was (mostly) known to cryptanalysts

Japanese Purple Broken by American cryptanalysts Design was (mostly) unknown to cryptanalysts

Crypto Analogy

Cryptanalysis break a (known) cipher Diagnosis determine how an unknown

cipher works (from ciphertext) Which was the greater achievement,

breaking Enigma or Purple? Cryptanalysis of Enigma was harder Diagnosis of Purple was harder

Can make a reasonable case for either…

Crypto Analogy

What does this have to do with metamorphic software?

Suppose we (the good guys) generate metamorphic copies of our software

Bad guys can attack individual copies Can bad guys attack all copies?

Bad guys can try to diagnose our metamorphic generator

Crypto Analogy

How to diagnose metamorphic generator (from exe’s)? Reverse engineer many copies, look at

differences, etc., etc. Lots of work

Diagnosis problem is hard If good guys can force bad guys to solve a

diagnosis problem, the good guys win Security by obscurity? Violates (spirit of)

Kerckhoffs’ Principle? Yes, but still may be valuable in the real world

Crypto Analogy

What about case where bad guys write metamorphic code? Metamorphic viruses, for example

Do good guys need to solve diagnosis problem? If so, good guys are in trouble

Not if good guys “only” need to detect the metamorphic code (not diagnose)

Not claiming the good guys job is easy Just claiming that there is hope…

Virus Evolution

Viruses first appeared in the 1980s Fred Cohen

Viruses must avoid signature detection Virus can alter its “appearance”

Techniques employed encryption polymorphic metamorphic

Virus Evolution - Encryption

Virus consists of decrypting module (decryptor) encrypted virus body

Different encryption key different virus body signature

Weakness decryptor can be detected

Virus Evolution – Polymorphism

Try to hide signature of decryptor Can use code emulator to decrypt

putative virus dynamically Decrypted virus body is constant

Signature detection is possible

Virus Evolution – Metamorphism

Change virus body Mutation

techniques: permutation of

subroutines insertion of

garbage/jump instructions

substitution of instructions

PART II

Virus Construction Kits

Virus Construction Kits – PS-MPC

According to Peter Szor:“… PS-MPC [Phalcon/Skism Mass-Produced Code generator] uses a generator that effectively works as a code-morphing engine…… the viruses that PS-MPC generates are not [only] polymorphic, but their decryption routines and structures change in variants…”

Virus Construction Kits – G2

From the documentation of G2 (Second Generation virus generator):

“… different viruses may be generated from identical configuration files…”

Virus Construction Kits - NGVCK

From the documentation for NGVCK (Next Generation Virus Creation Kit):

“… all created viruses are completely different in structure and opcode…… impossible to catch all variants with one or more scanstrings.…… nearly 100% variability of the entire code”

Oh, really?

PART III

How Effective Are Metamorphic Engines?

How We Compare Two Pieces of Code

Opcode sequences Score

0 call1 pop2 mov3 sub

… m-1 m-1

… score = n-1 jmp average

% match

0 push 0 n-1 0 n-11 mov2 sub3 and

…

…m-1 retn

Program X

Graph of real matches

Program Y Program Y

(lines with length > 5)(matching 3 opcodes)Assembly programs

Program X

Graph of matches

Program X

Program Y

Virus Families – Test Data

Four generators, 45 viruses 20 viruses by NGVCK 10 viruses by G2 10 viruses by VCL32 5 viruses by MPCGEN

20 normal utility programs from the Cygwin bin directory

Similarity within Virus Families – Results

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

0 50 100 150 200

Comparison number

Similarity score

NGVCK viruses

Normal files


NGVCK G2 VCL32 MPCGEN Normalmin 0.01493 0.62845 0.34376 0.44964 0.13603max 0.21018 0.84864 0.92907 0.96568 0.93395average 0.10087 0.74491 0.60631 0.62704 0.34689

Minimum, maximum, and average similarity scores


Size of bubble = average similarity

NGVCK

G2VCL32 MPCGENNormal

0

0.2

0.4

0.6

0.8

1

1.2

-0.2 0 0.2 0.4 0.6 0.8

Minmum similarity score

Maximum similarity score

NGVCK

G2

VCL32

MPCGEN

Normal


IDA_ NGVCK0-

IDA_ NGVCK8 (11.9%)

IDA_G4- IDA_G7 (75.2%)


IDA_VCL0-

IDA_VCL9

(60.2%)

IDA_MPC1-

IDA_MPC3

(58.0%)

NGVCK Similarity to Virus Families

NGVCK versus other viruses 0% similar to G2 and MPCGEN viruses 0 – 5.5% similar to VCL32 viruses (43

out of 100 comparisons have score > 0) 0 – 1.2% similar to normal files (only 8

out of 400 comparisons have score > 0)

NGVCK Metamorphism/Similarity

NGVCK By far the highest degree of

metamorphism of any kit tested Virtually no similarity to other viruses

or normal programs Undetectable???

PART IV

Can Metamorphic Viruses Be Detected?

Commercial Virus Scanners

Tested three virus scanners eTrust version 7.0.405 avast! antivirus version 4.7 AVG Anti-Virus version 7.1

Each scanned 37 files 10 NGVCK viruses 10 G2 viruses 10 VCL32 viruses 7 MPCGEN viruses

Commercial Virus Scanners

ResultseTrust and avast! detected 17

(G2 and MPCGEN)AVG detected 27 viruses (G2,

MPCGEN and VCL32)none of NGVCK viruses detected

by the scanners tested

Hidden Markov Models (HMMs)

state machines transitions between states have fixed

probabilities each state has a probability distribution for

observing a set of observation symbols can “train” an HMM to represent a set of

data (in the form of observation sequences) states = features of the input data transition and the observation probabilities

= statistical properties of features

HMM Example – the Occasionally Dishonest Casino

1: 1/6 0.05 1: 1/100.95 2: 1/6 2: 1/10 0.9

3: 1/6 3: 1/104: 1/6 4: 1/105: 1/6 5: 1/106: 1/6 0.1 6: 1/2

Fair Loaded

HMM Example – the Occasionally Dishonest Casino

2 states: fair/loaded The switch between dice is a Markov

process Outcomes of a roll have different

probabilities in each state If we can only see a sequence of rolls, the

state sequence is hidden want to understand the underlying

Markov process from the observations

HMMs – the Three Problems

1. Find the likelihood of seeing an observation sequence O given a model , i.e. P(O | )

2. Find an optimal state sequence that could have generated a sequence O

3. Find the model parameters given a sequence O, i.e. find transition and observation probabilities that maximize the probability of observing O

There exist efficient algorithms to solve the three problems

HMM Application – Determining the Properties of English Text

Given: a large quantity of written English text

Input: a long sequence of observations consisting of 27 symbols (the 26 lower-case letters and the word space)

Train a model to find the most probable parameters (i.e., solve Problem 3)

Use trained model to score any unknown sequence of letters (and spaces) to determine whether it corresponds to English text. (i.e., solve Problem 1)

HMM Application – Initial and Final Observation Probability Distributions

HMM Application - Results

Observation probabilities converged, each letter belongs to one of the two hidden states

The two states correspond to consonants and vowels

Note: no a priori assumption was made HMM effectively recovered the statistically

significant feature inherent in English

HMM Application - Results

Probabilities can be sensibly interpreted for up to n = 12 hidden states

Trained model could be used to detect English text, even if the text is “disguised” by, say, a simple substitution cipher or similar transformation

Virus Detection with HMMs

Use hidden Markov models (HMMs) to represent statistical properties of a set of metamorphic virus variants Train the model on family of

metamorphic viruses Use trained model to determine

whether a given program is similar to the viruses the HMM represents


A trained HMM maximizes the probabilities of

observing the training sequence assigns high probabilities to sequences

similar to the training sequence represents the “average” behavior if

trained on multiple sequences represents an entire virus family, as

opposed to individual viruses

Virus Detection with HMMs – Data

Data set 200 NGVCK viruses (160 for training, 40

for testing) Comparison set

40 normal exes from Cygwin 25 other “non-family” viruses (G2,

MPCGEN and VCL32) 25 HMM models generated and

tested

Virus Detection with HMMs – MethodologyTraining:

(1)Training set(160 files) (2) Training (4)

Threshold

(3)

Data Set

(1) Test set Normal programs(40 files) (40 files)

Other viruses(25 files)

Comparison Set

Classifying:

(3) Scoring

(1) Scoring LLPO > Threshold ?

HMM

Scores (LLPO) virus0 -2.0 virus1 -2.3 :

:

random0 -11.3 :

other0 -8.9

HMMProgram A

Virus Detection with HMMs – Results

Test set 0, N = 2

-160

-140

-120

-100

-80

-60

-40

-20

0

0 10 20 30 40

File number

Score (LLPO)

family viruses

normal files

Virus Detection with HMMs – Results

Detect some other viruses “for free”

Test set 0, N = 3

-180

-160

-140

-120

-100

-80

-60

-40

-20

0

0 10 20 30 40

File number

Score (LLPO)

familyviruses

non-familyviruses

normalfiles


Summary of experimental results All normal programs distinguished VCL32 viruses had scores close to

NGVCK family viruses With proper threshold, 17 HMM models

had 100% detection rate and 10 models had 0% false positive rate

No significant difference in performance between HMMs with 3 or more hidden states

Virus Detection with HMMs – Trained Models

Converged probabilities in HMM matrices may give insight into the features of the represented viruses

We observe opcodes grouped into “hidden” states most opcodes in one state only

What does this mean? We are not sure…

HMMs – The Trained Models

popretnpushjbrcljnbjadivadcrorshrrol

addsar

boundcmpsbretfmovxordecnotimul

movsbstosdlodswlodsdlodsbin

repemovsdfnstenv

cmcjns jle clc rcr fildout

0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

observation probability

opcode

state 0

state 1

state 2

Detection via Similarity Index

Straightforward similarity index can be used as detector To determine whether a program belongs

to the NGVCK virus family, compare it to any randomly chosen NGVCK virus

NGVCK similarity to non-NGVCK code is small

Can use this fact to detect metamorphic NGVCK variants


Threshold determination:Pairwise comparison Scoring

Virus V Subset of D (randomly (randomly

chosen) chosen)Virus 0Virus 1

Data set D :Virus X

Classifying:Scoring

Similarity score > Threshold ? Yes => family virus

Virus V No => not family virus

Similarity scores Virus 0 0.035 Virus 1 0.041 : : Virus X 0.189

Program A


Experiment compare 105 programs to one

selected NGVCK virus Results

100% detection, 0% false positive Does not depend on specific

NGVCK virus selected

PART V

Conclusion

Conclusion

Metamorphic generators vary a lot NGVCK has highest metamorphism

(10% similarity on average) Other generators far less effective (60%

similarity on average) Normal files 35% similar, on average

But, NGVCK viruses can be detected! NGVCK viruses too different from other

viruses and normal programs

Conclusion

NGVCK viruses not detected by commercial scanners we tested

Hidden Markov model (HMM) detects NGVCK (and other) viruses with high accuracy

NGVCK viruses also detectable by similarity index

Conclusion

All metamorphic viruses tested were detectable because High similarity within family and/or Too different from normal programs

Effective use of metamorphism by virus/worm requires A high degree of metamorphism and

similarity to other programs This is not trivial!

Conclusion

How practical is our detection method?

We “cheat” in several ways Use IDA to disassemble Viruses not embedded in other code Limited testing, small no. of files, etc.

But results appear to be robust If so, we can be “sloppy” (i.e., more

efficient) and still get good results

The Bottom Line

Metamorphism for “good” For example, buffer overflow mitigation A little metamorphism does a lot of good

Metamorphism for “evil” For example, try to evade virus/worm

signature detection Requires high degree of metamorphism

and similarity to normal programs Not impossible, but not easy…

The Bottom Bottom Line

For metamorphic software, perhaps the inherent advantage lies with the good guys rather than the bad guys

All-too-often in information security, the advantage lies with the bad guys

References

X. Gao, Metamorphic software for buffer overflow mitigation, MS thesis, Dept. of CS, SJSU, 2005

P. Szor, The Art of Computer Virus Research and Defense, Addison-Wesley, 2005

M. Stamp, Information Security: Principles and Practice, Wiley InterScience, 2005

W. Wong, Analysis and detection of metamorphic computer viruses, MS thesis, Dept. of CS, SJSU, 2006

W. Wong and M. Stamp, Hunting for metamorphic engines, to appear in Journal in Computer Virology

Appendix

Extra Materials

HMMs – Run Time of Training Process

5 to 38 minutes, depending on number of states n.

0

500

1000

1500

2000

2500

1 2 3 4 5 6 7

Number of states N

Training time (seconds)

500 iterations

800 iterations

HMMs – Run Time of Classifying Process 0.008 to 0.4 milliseconds, depending on N and number of opcodes

T .

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0 500 1000 1500

Length of observation sequence T

Scoring time (milliseconds)

N = 2

N = 3

N = 4

N = 5

N = 6

AVG Anti-Virus Scanning Result

Documents

HUNTING FOR METAMORPHIC ENGINES