Data Mining Techniques for Malware Detection

R. K. Agrawal

School of Computer and Systems Sciences, Jawaharlal Nehru University

New Delhi-110067


Outline

• Data Mining

• Classification

• Clustering

• Association Rules

• Experimental Results

• Conclusion and Future Work

Motivation: “Necessity is the Mother of Invention”

• Data explosion problem

– Automated data collection tools lead to tremendous amounts of data stored in databases and other information repositories

• We are drowning in data, but starving for knowledge!

• Solution: data mining

– Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases

Commercial Viewpoint

• Lots of data is being collected and warehoused

– Web data, e-commerce

– Purchases at department/grocery stores

– Bank/credit card transactions

• Computers have become cheaper and more powerful

• Competitive pressure is strong

– Provide better, customized services for an edge (e.g., in Customer Relationship Management)

Scientific Viewpoint

• Data collected and stored at enormous speeds (GB/hour)

– Remote sensors on a satellite

– Network-related log files

– Microarrays generating gene expression data

– Scientific simulations generating terabytes of data

• Traditional techniques are infeasible for raw data

• Data mining may help scientists

– in classifying and segmenting data

– in hypothesis formation

What Is Data Mining?

• Data mining (knowledge discovery in databases):

– Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases

• Alternative names:

– Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, business intelligence, etc.

Data Mining Tasks

• Prediction tasks

– Use some variables to predict unknown or future values of other variables

• Description tasks

– Find human-interpretable patterns that describe the data

• Common data mining tasks

– Classification [Predictive]

– Clustering [Descriptive]

– Association Rule Discovery [Descriptive]

– Sequential Pattern Discovery [Descriptive]

– Regression [Predictive]

– Deviation Detection [Predictive]

Classification: Definition

• Given a collection of records (training set)

– Each record contains a set of attributes; one of the attributes is the class label.

• Find a model for the class attribute as a function of the values of the other attributes.

• Goal: previously unseen records should be assigned a class as accurately as possible.

$$q : X \to Y$$

Classification—A Two-Step Process

• Model construction: describing a set of predetermined classes

– Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute

– The set of tuples used for model construction is the training set

– The model is represented as classification rules, decision trees, or mathematical formulae

• Model usage: for classifying future or unknown objects

– Estimate the accuracy of the model

• The known label of each test sample is compared with the classified result from the model

• Accuracy rate is the percentage of test set samples that are correctly classified by the model

– If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known

Process (1): Model Construction

Training Data

NAME    RANK            YEARS  TENURED
Rahul   Assistant Prof  3      no
Mohan   Assistant Prof  7      yes
Dev     Professor       2      yes
Kirti   Associate Prof  7      yes
Sudhir  Assistant Prof  6      no
Arun    Associate Prof  3      no

Classification Algorithms

Classifier (Model)

IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’

Process (2): Using the Model in Prediction

Classifier

Testing Data

NAME    RANK            YEARS  TENURED
Tarun   Assistant Prof  2      no
Manish  Associate Prof  7      no
Jolly   Professor       5      yes
Harsh   Assistant Prof  7      yes

Unseen Data

(Jolly, Professor, 5)

Tenured?
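The two steps can be traced with a minimal Python sketch. It encodes the rule from the model-construction slide, applies it to the testing data, and reports the accuracy rate. The function and variable names are illustrative assumptions, not part of the slides.

```python
# Rule from the model-construction slide:
# IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
def predict_tenured(rank, years):
    return "yes" if rank == "Professor" or years > 6 else "no"

# Testing data from the slide: (name, rank, years, known label)
testing_data = [
    ("Tarun", "Assistant Prof", 2, "no"),
    ("Manish", "Associate Prof", 7, "no"),
    ("Jolly", "Professor", 5, "yes"),
    ("Harsh", "Assistant Prof", 7, "yes"),
]

def accuracy(rows):
    # Fraction of test samples whose predicted label equals the known label.
    correct = sum(1 for _, rank, years, label in rows
                  if predict_tenured(rank, years) == label)
    return correct / len(rows)

test_accuracy = accuracy(testing_data)
unseen_prediction = predict_tenured("Professor", 5)  # the unseen tuple
```

Under this rule, Manish (7 years, not tenured) is the only misclassified test sample, so the accuracy rate on the testing data is 3 out of 4.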


Classification: Application

• Malware Detection

– Goal: Predict whether a given binary is malware or not.

– Approach:

• Use both kinds of binaries (normal and malware)

• Learn a model for the class of the binaries.

• Use this model to detect malware by observing a binary.

Clustering Definition

• Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that

– Data points in one cluster are more similar to one another.

– Data points in separate clusters are less similar to one another.

• Similarity measures:

– Euclidean distance if attributes are continuous.

– Other problem-specific measures.
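As an illustration of Euclidean-distance-based clustering, here is a small k-means sketch. The slides do not name a clustering algorithm, so k-means is an assumption chosen for brevity. The sample points and the choice of the first k points as initial centroids are also illustrative.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iterations=10):
    # Illustrative choice: take the first k points as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return labels, centroids

# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.9, 8.3), (0.9, 1.1), (8.2, 7.7)]
labels, centroids = kmeans(points, k=2)
```

Points near (1, 1) end up in one cluster and points near (8, 8) in the other, which is the intended effect: small intracluster distances and large intercluster distances.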


Illustrating Clustering

Euclidean Distance Based Clustering in 3-D space.

Intracluster distances are minimized

Intercluster distances are maximized

Clustering: Application

• Binaries segmentation:

– Goal: subdivide a given set of binaries into distinct subsets of binaries

Association Rule Discovery: Definition

• Given a set of records, each of which contains some number of items from a given collection:

– Produce dependency rules which will predict the occurrence of an item based on occurrences of other items.

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules discovered:

{Bread} --> {Milk}

{Diaper} --> {Beer}
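One way to quantify such rules is with support and confidence. These are the standard association-rule measures, but the slides do not define them, so their use here is an assumption. The sketch below computes both for the two rules over the five transactions in the table.

```python
# The five transactions from the table, one set of items per TID.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def support(itemset):
    # Fraction of transactions that contain every item in itemset.
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    # Of the transactions containing the antecedent, the fraction that
    # also contain the consequent.
    return support(antecedent | consequent) / support(antecedent)

bread_milk = (support({"Bread", "Milk"}), confidence({"Bread"}, {"Milk"}))
diaper_beer = (support({"Diaper", "Beer"}), confidence({"Diaper"}, {"Beer"}))
```

On this table both rules have support 2/5 and confidence 2/3: for example, Bread appears in three transactions and Milk accompanies it in two of them.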


The Sad Truth About Diapers and Beer

• So, don’t be surprised if you find six-packs stacked next to diapers!


Association Rule Discovery: Application

• Malware rules

– Goal: identify activities that happen together in a given malware.

Sequential Pattern Discovery: Definition

• Given a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events:

– In telecommunications alarm logs,

• (Inverter_Problem Excessive_Line_Current) (Rectifier_Alarm) --> (Fire_Alarm)

– In point-of-sale transaction sequences,

• Computer bookstore: (Intro_To_Visual_C) (C++_Primer) --> (Perl_for_dummies)

• Athletic apparel store: (Shoes) (Racket, Racketball) --> (Sports_Jacket)

Classification Example

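Features: $\mathbf{x} = (x_1, x_2) = (\text{height}, \text{weight}) \in X \subseteq \mathbb{R}^2$

Training examples: $\{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_l, y_l)\}$

Class labels: $y = J$ or $y = H$

Linear classifier:

$$q(\mathbf{x}) = \begin{cases} H & \text{if } (\mathbf{w} \cdot \mathbf{x}) + b \ge 0 \\ J & \text{if } (\mathbf{w} \cdot \mathbf{x}) + b < 0 \end{cases}$$

Decision boundary: $(\mathbf{w} \cdot \mathbf{x}) + b = 0$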

Classification Techniques

• Decision Trees

• Naïve Bayes

• Support Vector Machines

• Neural Networks

• Parzen Window

• K-nearest neighbor

Issues: Data Preparation

• Data cleaning

– Preprocess data in order to reduce noise and handle missing values

• Relevance analysis (feature selection)

– Remove the irrelevant or redundant attributes

• Data transformation

– Generalize and/or normalize data
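Two of these preparation steps can be sketched in a few lines of Python. The sketch fills missing values with the column mean and applies min-max normalization. Both are common choices but are assumptions here, since the slides do not prescribe specific methods. The feature column is hypothetical.

```python
def fill_missing(values):
    # Data cleaning: replace missing entries (None) with the mean of the rest.
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]

def min_max_normalize(values):
    # Data transformation: rescale values linearly into the range [0, 1].
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

raw_counts = [4, None, 10, 6]   # hypothetical feature column with a gap
cleaned = fill_missing(raw_counts)
normalized = min_max_normalize(cleaned)
```

After cleaning, the smallest value maps to 0 and the largest to 1, so features measured on different scales become comparable.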


Issues: Evaluating Classification Methods

• Accuracy

– classifier accuracy: predicting class label

– predictor accuracy: guessing value of predicted attributes

• Speed

– time to construct the model (training time)

– time to use the model (classification/prediction time)

• Robustness: handling noise and missing values

• Scalability: efficiency in disk-resident databases

• Interpretability

– understanding and insight provided by the model

• Other measures, e.g., goodness of rules, such as decision tree size or compactness of classification rules

Decision Tree Induction: Training Dataset

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31…40   high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31…40   low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31…40   medium  no       excellent      yes
31…40   high    yes      fair           yes
>40     medium  no       excellent      no

This follows an example of Quinlan’s ID3


A Decision Tree for “buys_computer”

age?

– <=30: student?

• no: buys_computer = no

• yes: buys_computer = yes

– 31..40: buys_computer = yes

– >40: credit rating?

• excellent: buys_computer = no

• fair: buys_computer = yes

Algorithm for Decision Tree Induction

• Basic algorithm (a greedy algorithm)

– Tree is constructed in a top-down recursive divide-and-conquer manner

– At start, all the training examples are at the root

– Attributes are categorical (if continuous-valued, they are discretized in advance)

– Examples are partitioned recursively based on selected attributes

– Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)

• Conditions for stopping partitioning

– All samples for a given node belong to the same class

– There are no remaining attributes for further partitioning; majority voting is employed for classifying the leaf

– There are no samples left
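The greedy procedure can be sketched as a short recursive function. This is a simplified ID3-style sketch run on the buys_computer training data, not Quinlan's full implementation. All function names are illustrative, and the age range is written as "31..40" in ASCII.

```python
import math
from collections import Counter

ATTRIBUTES = ["age", "income", "student", "credit_rating"]
TARGET = "buys_computer"
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
DATA = [dict(zip(ATTRIBUTES + [TARGET], row)) for row in ROWS]

def entropy(rows):
    counts = Counter(r[TARGET] for r in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows))
                for c in counts.values())

def info_gain(rows, attr):
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        part = [r for r in rows if r[attr] == value]
        remainder += len(part) / len(rows) * entropy(part)
    return entropy(rows) - remainder

def majority(rows):
    return Counter(r[TARGET] for r in rows).most_common(1)[0][0]

def build_tree(rows, attrs):
    labels = {r[TARGET] for r in rows}
    if len(labels) == 1:      # stop: all samples belong to the same class
        return labels.pop()
    if not attrs:             # stop: no attributes left, use majority voting
        return majority(rows)
    best = max(attrs, key=lambda a: info_gain(rows, a))
    rest = [a for a in attrs if a != best]
    # Branch only on values present in rows, so no partition is ever empty.
    return {best: {v: build_tree([r for r in rows if r[best] == v], rest)
                   for v in sorted({r[best] for r in rows})}}

tree = build_tree(DATA, ATTRIBUTES)
```

On this data the sketch selects age at the root, student under "<=30", and credit_rating under ">40", matching the decision tree for buys_computer.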


Attribute Selection Measure: Information Gain (ID3/C4.5)

Select the attribute with the highest information gain

Let $p_i$ be the probability that an arbitrary tuple in $D$ belongs to class $C_i$, estimated by $|C_{i,D}|/|D|$

Expected information (entropy) needed to classify a tuple in $D$:

$$Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$

Information needed (after using $A$ to split $D$ into $v$ partitions) to classify $D$:

$$Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times I(D_j)$$

Information gained by branching on attribute $A$:

$$Gain(A) = Info(D) - Info_A(D)$$

Attribute Selection: Information Gain

Class P: buys_computer = “yes”

Class N: buys_computer = “no”

$$Info(D) = I(9,5) = -\frac{9}{14}\log_2\left(\frac{9}{14}\right) - \frac{5}{14}\log_2\left(\frac{5}{14}\right) = 0.940$$

age     p_i  n_i  I(p_i, n_i)
<=30    2    3    0.971
31…40   4    0    0
>40     3    2    0.971

$$Info_{age}(D) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) = 0.694$$

$\frac{5}{14} I(2,3)$ means “age <=30” has 5 out of 14 samples, with 2 yes’es and 3 no’s. Hence

$$Gain(age) = Info(D) - Info_{age}(D) = 0.246$$

Similarly,

$$Gain(income) = 0.029$$

$$Gain(student) = 0.151$$

$$Gain(credit\_rating) = 0.048$$
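The arithmetic for the age attribute can be checked with a few lines of Python. The helper name `info` is an assumption. It computes the entropy I(...) of a class distribution given as counts.

```python
import math

def info(*counts):
    """I(c1, c2, ...): entropy of a class distribution given by its counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# 9 "yes" and 5 "no" tuples overall.
info_d = info(9, 5)

# The three age partitions hold (2 yes, 3 no), (4 yes, 0 no), (3 yes, 2 no).
info_age = 5 / 14 * info(2, 3) + 4 / 14 * info(4, 0) + 5 / 14 * info(3, 2)

gain_age = info_d - info_age
```

The three values agree with the 0.940, 0.694, and 0.246 on the slide to within rounding in the third decimal place.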


Computing Information-Gain for Continuous-Value Attributes

• Let attribute A be a continuous-valued attribute

• Must determine the best split point for A

– Sort the values of A in increasing order

– Typically, the midpoint between each pair of adjacent values is considered as a possible split point

• $(a_i + a_{i+1})/2$ is the midpoint between the values of $a_i$ and $a_{i+1}$

– The point with the minimum expected information requirement for A is selected as the split-point for A

• Split:

– D1 is the set of tuples in D satisfying A ≤ split-point, and D2 is the set of tuples in D satisfying A > split-point
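A minimal sketch of this split-point search follows. The feature values and labels are hypothetical and the function names are illustrative. The sketch evaluates the midpoint of each pair of adjacent distinct values and keeps the one with the minimum expected information requirement.

```python
import math

def entropy(labels):
    total = len(labels)
    return -sum(labels.count(c) / total * math.log2(labels.count(c) / total)
                for c in set(labels))

def best_split(values, labels):
    pairs = sorted(zip(values, labels))        # sort by attribute value
    distinct = sorted({v for v, _ in pairs})
    best_point, best_info = None, float("inf")
    for a, b in zip(distinct, distinct[1:]):
        point = (a + b) / 2                    # midpoint of adjacent values
        d1 = [lab for v, lab in pairs if v <= point]   # A <= split-point
        d2 = [lab for v, lab in pairs if v > point]    # A > split-point
        expected = (len(d1) * entropy(d1) + len(d2) * entropy(d2)) / len(pairs)
        if expected < best_info:
            best_point, best_info = point, expected
    return best_point, best_info

# Hypothetical numeric feature with one class label per sample.
values = [3, 12, 5, 14, 4, 11]
labels = ["benign", "malware", "benign", "malware", "benign", "malware"]
split_point, split_info = best_split(values, labels)
```

Here the midpoint between 5 and 11 separates the two classes completely, so it is selected with an expected information requirement of zero.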


Linear Classifiers

f(x, w, b) = sign(w · x + b)

(Figure: two classes of points, labeled +1 and -1, with several candidate separating lines.)

Any of these would be fine...

...but which is best?

Support Vector Machine

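(Figure: the planes w · x + b = +1, w · x + b = 0, and w · x + b = -1 separate the “Predict Class = +1” zone from the “Predict Class = -1” zone; x+ and x- are points on the two outer planes.)

What we know:

• $w \cdot x^{+} + b = +1$

• $w \cdot x^{-} + b = -1$

• $w \cdot (x^{+} - x^{-}) = 2$

$$M = \frac{(x^{+} - x^{-}) \cdot w}{\|w\|} = \frac{2}{\|w\|}$$

M = margin width

Support vectors are those datapoints that the margin pushes up against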
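The margin relation can be verified numerically. The hyperplane and the two points below are hypothetical values chosen so that x+ and x- lie exactly on the two margin planes. Treating a score of exactly zero as class +1 is a convention assumed here.

```python
import math

def decision(w, b, x):
    """sign(w . x + b), returned as +1 or -1 (a score of 0 maps to +1)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

def margin_width(w):
    """M = 2 / ||w|| for the planes w . x + b = +1 and w . x + b = -1."""
    return 2 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical separating hyperplane in 2-D.
w, b = (3.0, 4.0), -5.0
x_plus = (2.0, 0.0)    # w . x_plus + b = +1
x_minus = (0.0, 1.0)   # w . x_minus + b = -1

# Margin computed from the two points: (x+ - x-) . w / ||w||
diff = [p - m for p, m in zip(x_plus, x_minus)]
margin_from_points = sum(d * wi for d, wi in zip(diff, w)) / math.sqrt(
    sum(wi * wi for wi in w))
```

With ||w|| = 5, both routes give a margin width of 0.4, illustrating why maximizing the margin amounts to minimizing ||w||.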


Linear SVM Mathematically

• Goal:

1) Correctly classify all training data

$$w \cdot x_i + b \ge 1 \quad \text{if } y_i = +1$$

$$w \cdot x_i + b \le -1 \quad \text{if } y_i = -1$$

$$y_i (w \cdot x_i + b) \ge 1 \quad \text{for all } i$$

2) Maximize the margin $M = \frac{2}{\|w\|}$, which is the same as minimizing $\frac{1}{2} w^{t} w$

• We can formulate a Quadratic Optimization Problem and solve for w and b

Minimize $\Phi(w) = \frac{1}{2} w^{t} w$ subject to $y_i (w \cdot x_i + b) \ge 1 \quad \forall i$

Linear SVM. Cont.

• Requiring the derivatives with respect to w, b to vanish yields:

$$\text{maximize} \quad \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$$

$$\text{subject to:} \quad \sum_{i=1}^{m} \alpha_i y_i = 0, \qquad \alpha_i \ge 0 \quad \forall i$$

• KKT conditions yield:

$$b = y_i - \langle w, x_i \rangle \quad \text{for any } \alpha_i \ne 0$$

• Where:

$$w = \sum_{i=1}^{m} \alpha_i y_i x_i$$


Linear SVM. Cont.

• The resulting separating function is:

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{m} \alpha_i y_i \langle x_i, x \rangle + b \right)$$

• Notes:

– The points with α=0 do not affect the solution.

– The points with α≠0 are called support vectors.

– The equality conditions hold true only for the support vectors.

Non-separable case

• The modifications yield the following problem:

$$\text{maximize} \quad \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$$

$$\text{subject to:} \quad \sum_{i=1}^{m} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C \quad \forall i$$

Non Linear SVM

• Note that the training data appears in the solution only in inner products.

• If we pre-map the data into a higher and sparser space we can get more separability and a stronger separation family of functions.

• The pre-mapping might make the problem infeasible.

• We want to avoid pre-mapping and still have the same separation ability.

• If we have a simple function that operates on two training points and implements an inner product of their pre-mappings, then we achieve better separation with no added cost.

Non-linear SVMs: Feature spaces

• General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is separable:

Φ: x → φ(x)


The “Kernel Trick”

• The linear classifier relies on the inner product between vectors: $K(x_i, x_j) = x_i^T x_j$

• If every datapoint is mapped into a high-dimensional space via some transformation $\Phi: x \to \varphi(x)$, the inner product becomes:

$$K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$$

• A kernel function is a function that is equivalent to an inner product in some feature space.

• Example: 2-dimensional vectors $x = [x_1 \;\; x_2]$; let $K(x_i, x_j) = (1 + x_i^T x_j)^2$.

Need to show that $K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$:

$$K(x_i, x_j) = (1 + x_i^T x_j)^2 = 1 + x_{i1}^2 x_{j1}^2 + 2 x_{i1} x_{j1} x_{i2} x_{j2} + x_{i2}^2 x_{j2}^2 + 2 x_{i1} x_{j1} + 2 x_{i2} x_{j2}$$

$$= [1 \;\; x_{i1}^2 \;\; \sqrt{2} x_{i1} x_{i2} \;\; x_{i2}^2 \;\; \sqrt{2} x_{i1} \;\; \sqrt{2} x_{i2}]^T \, [1 \;\; x_{j1}^2 \;\; \sqrt{2} x_{j1} x_{j2} \;\; x_{j2}^2 \;\; \sqrt{2} x_{j1} \;\; \sqrt{2} x_{j2}]$$

$$= \varphi(x_i)^T \varphi(x_j), \quad \text{where } \varphi(x) = [1 \;\; x_1^2 \;\; \sqrt{2} x_1 x_2 \;\; x_2^2 \;\; \sqrt{2} x_1 \;\; \sqrt{2} x_2]$$

• Thus, a kernel function implicitly maps data to a high-dimensional space (without the need to compute each φ(x) explicitly).
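The identity in this example can be checked numerically. The sketch below evaluates the kernel directly in the original 2-dimensional space and again as an inner product of the explicit 6-dimensional mappings. The two test vectors are arbitrary choices.

```python
import math

def kernel(xi, xj):
    """K(xi, xj) = (1 + xi^T xj)^2 for 2-dimensional vectors."""
    return (1 + xi[0] * xj[0] + xi[1] * xj[1]) ** 2

def phi(x):
    """Explicit feature map from the example: 2-D input to 6-D features."""
    r2 = math.sqrt(2)
    return [1, x[0] ** 2, r2 * x[0] * x[1], x[1] ** 2, r2 * x[0], r2 * x[1]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

xi, xj = (1.0, 2.0), (3.0, -1.0)    # arbitrary example vectors
k_direct = kernel(xi, xj)           # computed in the original 2-D space
k_mapped = dot(phi(xi), phi(xj))    # computed in the 6-D feature space
```

Both routes give the same value (4 for these vectors, up to floating-point rounding), while the direct route never builds the 6-dimensional vectors.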


Mercer Kernels

• A Mercer kernel is a function:

$$k : X^d \times X^d \to \mathbb{R}$$

for which there exists a function:

$$\varphi : X^d \to H$$

such that:

$$\forall x, y \in X^d: \quad k(x, y) = \langle \varphi(x), \varphi(y) \rangle$$

• A function k(.,.) is a Mercer kernel if for any function g(.), such that:

$$\int g^2(x) \, dx < \infty$$

the following holds true:

$$\iint g(x) \, g(y) \, k(x, y) \, dx \, dy \ge 0$$

Commonly used Mercer Kernels

• Homogeneous Polynomial Kernels:

$$k(x, y) = \langle x, y \rangle^{p}$$

• Non-homogeneous Polynomial Kernels:

$$k(x, y) = (\langle x, y \rangle + 1)^{p}$$

• Radial Basis Function (RBF) Kernels:

$$k(x, y) = \exp\left(-\gamma \|x - y\|^2\right)$$

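Solution of non-linear SVM

• The problem:

$$\text{maximize} \quad \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j)$$

$$\text{subject to:} \quad \sum_{i=1}^{m} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C \quad \forall i$$

• The separating function:

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{m} \alpha_i y_i \, k(x_i, x) + b \right)$$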
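The separating function can be evaluated directly once the multipliers are known. In the sketch below, the support vectors, multipliers, and bias are made-up values for illustration, not the output of solving the optimization problem. The RBF kernel with a `gamma` parameter is an assumed parameterization.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    # k(x, y) = exp(-gamma * ||x - y||^2); the gamma value is an assumption.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def svm_decision(x, support_vectors, labels, alphas, b, kernel=rbf_kernel):
    """f(x) = sgn(sum_i alpha_i * y_i * k(x_i, x) + b), as +1 or -1."""
    score = sum(a * y * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1

# Hypothetical model: values chosen by hand so that sum(alpha_i * y_i) = 0.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
labels = [-1, +1]
alphas = [1.0, 1.0]
b = 0.0

near_positive = svm_decision((1.9, 2.1), support_vectors, labels, alphas, b)
near_negative = svm_decision((0.1, -0.2), support_vectors, labels, alphas, b)
```

Only the support vectors enter the sum, and the training data appears only through kernel evaluations, so the feature mapping is never computed explicitly.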


Multi-Class SVM

• Approaches:

– One against One (K(K-1)/2 binary classifiers required): outputs of the classifiers are aggregated to make the final decision.

– One against All (K binary classifiers required): it trains K binary classifiers, each of which separates one class from the other (K-1) classes. Given a data point X, the binary classifier with the largest output determines the class of X.
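Both schemes can be sketched briefly. The class names and classifier outputs below are hypothetical. The sketch enumerates the pairs that One against One would train and shows the One against All decision rule of picking the largest output.

```python
from itertools import combinations

def one_vs_one_pairs(classes):
    # One binary classifier per unordered pair of classes: K(K-1)/2 in total.
    return list(combinations(classes, 2))

def one_vs_all_predict(scores):
    """scores maps each class to the output of its binary classifier;
    the class whose classifier gives the largest output wins."""
    return max(scores, key=scores.get)

classes = ["benign", "trojan", "worm", "virus"]   # hypothetical class labels
pairs = one_vs_one_pairs(classes)

# Made-up outputs of the K = 4 one-against-all classifiers for one data point.
scores = {"benign": -0.7, "trojan": 0.4, "worm": 1.3, "virus": -0.2}
predicted = one_vs_all_predict(scores)
```

With K = 4 classes, One against One needs 6 classifiers while One against All needs 4.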


Why Is SVM Effective on High Dimensional Data?

• The complexity of the trained classifier is characterized by the number of support vectors rather than the dimensionality of the data

• The support vectors are the essential or critical training examples: they lie closest to the decision boundary (the maximum marginal hyperplane, MMH)

• If all other training examples are removed and the training is repeated, the same separating hyperplane would be found

• The number of support vectors found can be used to compute an (upper) bound on the expected error rate of the SVM classifier, which is independent of the data dimensionality

• Thus, an SVM with a small number of support vectors can have good generalization, even when the dimensionality of the data is high

Experiments

• Source of data: preprocessed data in terms of API calls, taken from data collected from C-DAC Mohali.

• Description of data

           Sample Space  Training set  Testing set
Benign     534           50            484
Malicious  168           50            118
Total      702           100           602

Classifier Accuracy Measures

• Performance measures

sensitivity = t-pos/pos /* true positive recognition rate */

specificity = t-neg/neg /* true negative recognition rate */

accuracy = sensitivity * pos/(pos + neg) + specificity * neg/(pos + neg)

Confusion matrix (rows: actual class, columns: predicted class)

      C1              C2
C1    True positive   False negative
C2    False positive  True negative
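These measures follow directly from the four confusion-matrix cells. The counts in the sketch below are hypothetical and are not results from the experiments. The function names are illustrative.

```python
def sensitivity(t_pos, f_neg):
    # True positive recognition rate: t-pos / pos
    return t_pos / (t_pos + f_neg)

def specificity(t_neg, f_pos):
    # True negative recognition rate: t-neg / neg
    return t_neg / (t_neg + f_pos)

def accuracy(t_pos, f_neg, f_pos, t_neg):
    pos, neg = t_pos + f_neg, f_pos + t_neg
    return (sensitivity(t_pos, f_neg) * pos / (pos + neg)
            + specificity(t_neg, f_pos) * neg / (pos + neg))

# Hypothetical confusion-matrix counts for illustration only.
t_pos, f_neg, f_pos, t_neg = 90, 10, 30, 170

sens = sensitivity(t_pos, f_neg)
spec = specificity(t_neg, f_pos)
acc = accuracy(t_pos, f_neg, f_pos, t_neg)
```

The weighted form of accuracy reduces to (t-pos + t-neg)/(pos + neg), here 260/300, which shows how a large negative class can dominate the overall figure.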


Experimental Results

Classifier  sensitivity (%)        specificity (%)
            K=5    K=6    K=7      K=5    K=6    K=7
C4.5        70.86  71.23  69.68    68.62  69.96  61.05
SVM         75.26  76.79  75.18    73.54  78.34  74.46

Observations

• The SVM classifier performs better than C4.5 in both sensitivity and specificity for every value of K tested.

• The performance depends on the size of the feature set.

• SVM requires fewer training samples than C4.5. Hence, SVM is a better choice, as collecting malicious samples is difficult.

Conclusion & Future Work

• SVM is a better classification technique than C4.5 for the detection of malware.

• Constructing a better feature representation needs attention, for better generalization.

• How to extend it to the multi-class malware problem

References

• C. J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2): 121-168, 1998.

• J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.

• P. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Addison Wesley, 2005.

• I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques, 2ed. Morgan Kaufmann, 2005.

• Han and Kamber. Data Mining Concepts.

• B. Zhang, J. Yin, J. Hao, D. Zhang, and S. Wang. Using Support Vector Machine to detect unknown computer viruses. Int. Journal of Computational Intelligence Research, vol. 2, no. 1, pp. 100-104, 2006.

• G. Szappanos. Are There Any Polymorphic Macro Viruses at ALL (and What to Do with Them). In Proceedings of the 12th International Virus Bulletin Conference, 2001.

• S. Forrest, S. A. Hofmeyr, and A. Somayaji. Computer immunology. Communications of the ACM, 10, pp. 88–96, 1997.

• W. Lee and X. Dong. Information-theoretic measures for anomaly detection. In R. Needham and M. Abadi (eds.), Proceedings of the 2001 IEEE Symposium on Security and Privacy, Oakland, CA. IEEE Computer Society Press, pp. 130-143, 2001.

• LIBSVM. http://www.csie.ntu.edu.tw/~cjlin/.

• S. M. Weiss and C. A. Kulikowski. Computer Systems that Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems. Morgan Kaufmann, 1991.

• S. M. Weiss and N. Indurkhya. Predictive Data Mining. Morgan Kaufmann, 1997.

• X. Yin and J. Han. CPAR: Classification based on predictive association rules. SDM'03.

• H. Yu, J. Yang, and J. Han. Classifying large data sets using SVM with hierarchical clusters. KDD'03.