10
International Journal of Fuzzy Logic Systems (IJFLS) Vol.3, No2, April 2013 DOI : 10.5121/ijfls.2013.3205 63 SOFT-COMPUTING VIRUS IDENTIFICATION SYSTEM *Obi J.C 1 and Okpor D.M 2 Department of Computer Science, University of Benin, Benin City. Nigeria 1 . Department of Computer Science, Delta State, Polytechnic, Ozoro 2 [email protected] 1 &[email protected] 2 ABSTRACT Virus is one of the world’s deadliest malicious software. One of the known attribute of virus is the ability of reproducing itself and spreading from host to host. Several approaches in recognizing virus has not been completely objective in nature. In other to achieve this, developing a soft-computing system will help bring precision, learning ability and optimal solution which is the focal point of this research. KEYWORDS: Virus, fuzzy-logic, genetic Algorithm, Neural Network 1.0 INTRODUCTION A computer virus is an executable computer program that can replicate itself and spread from one computer to another. The term "virus" is also commonly, but erroneously, used to refer to other types of malware, including but not limited to adware and spyware programs that do not have a reproductive ability (Hkust, 2002 and Wikipedia, 2013). Virus usually reside on a computer program disrupting computer activities without replicating itself while a computer worm must replicate itself consuming computer resources such as memory space and cpu processing time. Examples of computer virus are malware, worms, and key-loggers etc. The majority of active malware threats are usually trojans or worms rather than viruses (Hkust, 2002). Malware such as Trojan horses and worms is sometimes confused with viruses, which are technically different: a worm can exploit security vulnerabilities to spread itself automatically to other computers through networks, while a trojan-horse is a program that appears harmless but hides malicious functions. Worms and trojan-horses, like viruses, may harm a computer system's data or performance. The criteria for identifying a virus are usually noticeable through abnormal system behaviors (Hkust, 2012). Virus can cause considerable harm and damage to a computer system, which could affect the hard disk, the main operating system and other software relevant to system functionalities. Virus can be prevented by strong anti-virus and regular update to harness the recent patches to prevent

Ijfls05

  • Upload
    ijfls

  • View
    207

  • Download
    0

Embed Size (px)

DESCRIPTION

 

Citation preview

Page 1: Ijfls05

International Journal of Fuzzy Logic Systems (IJFLS) Vol.3, No2, April 2013

DOI : 10.5121/ijfls.2013.3205 63

SOFT-COMPUTING VIRUS

IDENTIFICATION SYSTEM

*Obi J.C

1 and Okpor D.M

2

Department of Computer Science, University of Benin, Benin City. Nigeria1.

Department of Computer Science, Delta State, Polytechnic, Ozoro2

[email protected]&[email protected]

ABSTRACT

Virus is one of the world’s deadliest malicious software. One of the known attribute of

virus is the ability of reproducing itself and spreading from host to host. Several

approaches in recognizing virus has not been completely objective in nature. In other to

achieve this, developing a soft-computing system will help bring precision, learning

ability and optimal solution which is the focal point of this research.

KEYWORDS:

Virus, fuzzy-logic, genetic Algorithm, Neural Network

1.0 INTRODUCTION A computer virus is an executable computer program that can replicate itself and spread from one

computer to another. The term "virus" is also commonly, but erroneously, used to refer to other

types of malware, including but not limited to adware and spyware programs that do not have a

reproductive ability (Hkust, 2002 and Wikipedia, 2013). Virus usually reside on a computer

program disrupting computer activities without replicating itself while a computer worm must

replicate itself consuming computer resources such as memory space and cpu processing time.

Examples of computer virus are malware, worms, and key-loggers etc. The majority of active

malware threats are usually trojans or worms rather than viruses (Hkust, 2002). Malware such as

Trojan horses and worms is sometimes confused with viruses, which are technically different: a

worm can exploit security vulnerabilities to spread itself automatically to other computers

through networks, while a trojan-horse is a program that appears harmless but hides malicious

functions. Worms and trojan-horses, like viruses, may harm a computer system's data or

performance. The criteria for identifying a virus are usually noticeable through abnormal system

behaviors (Hkust, 2012).

Virus can cause considerable harm and damage to a computer system, which could affect the hard

disk, the main operating system and other software relevant to system functionalities. Virus can

be prevented by strong anti-virus and regular update to harness the recent patches to prevent

Page 2: Ijfls05

International Journal of Fuzzy Logic Systems (IJFLS) Vol.3, No2, April 2013

64

exploitation of system vulnerabilities. There are many types of computer viruseswhich include

(Hkust, 2002):

a. File virus: Virus which attaches itself to program file (executable file) is known as file

virus).

b. Boot sector virus: Virus which affects system sector is known as boot virus. These system

sectors includes hard-disk, floppy disk etc).

c. Macro Virus: Virus writtenuses a language built into a particular application

d. Virus Hoax: A virus in the form of a compulsory non-existent message intended in

bending the user to carry a particular.

A complete soft-computing system for the recognition of computer virus utilizing genetic

algorithm, fuzzy logic and neural network drives this research hunt. Genetic algorithm will help

optimize the fuzzy set to arrive at a floor or roof value representing the clusters of the

membership function. The parameters for identifying computer virus are usually vague therefore

the adoption of fuzzy to set a boundary case while neural network provide the self-learning ability

(Hkust, 2002).

2.0 REVIEW OF RELATED LITERATURES The research work of (Nitesh et al., 2012) provides a general overview on computer viruses and

defensive techniques. They focused on reviewing the techniques utilized by anti-virus experts

design in and develop new antivirus (Babak et al., 2011) which was utilized in proposing their

own work.

The research work on the regulation of virus by (Mathias, 2003) took a critical look phenomena

of computer virus computer virus helping legislator both locally and internationally prohibit it

spread and influence. It also exploits the alternative use of virus for which a legislator should be

aware of when processing the legal status of computer code.

The criteria’s/parameters for recognizing and identifying virus are (Arnold, 2012)

a. Decline system speed: The speed of most system is closely linked with virus infection or

not. Most system infected with virus usually run slower disrupting relevant daily operation.

If the user disconnects from the internet and the system performs improve tremendously

and perform decline gradually while connected to the internet, it is a sure sign of virus

infection.

b. Numerous system errors: When a user experiences strange errors or just more errors that

disrupt daily operation such asclose your programs or causing the user to restart your

computer, there is a high possibility of virus infection. Sometimes errors are normal, but

constant occurrence is problematic. By keeping a log of errors experiences, when it

occurred and the severity can help mitigate the issues.

c. Persistent Logoff: when the computer system continuously and persistently logoff without

the user initiating it indicates the present of virus.

Page 3: Ijfls05

International Journal of Fuzzy Logic Systems (IJFLS) Vol.3, No2, April 2013

65

d. Disable security software: When updated anti-virus software, firewall or window updates

has been change or disabled not by the user, virus is sequentially at work. Lack of updated

anti-virus, open the way for to exploit system vulnerabilities.

e. Abnormal Internet system Behaviors: The abnormal behavior of system when connected

to the Internet is another sign of virus infection. Virus is malicious codes therefore,

compromising sensitive document are usually one of their main purpose but this is only

possible when online.

f. Obvious desktop changes: Desktop icons are the small graphics on most computers

desktop. Under normal circumstances, a desktop icon only appears when a piece of

software is loaded, ask the program to load an icon, or when it is dragged from the icon of a

program list and drop it to on the desktop. The present of virus introduce desktop icons that

were not requested for.

g. Continuous Hard Drive noise: The retrieval of information from the system hard disk,

disk drive and external hard disk is usually accompanied by a slow processing noise which

is normal for any performing system. The continuous performs of this noise with or without

computer operation highlights computer virus infection.

Zadeh introduced fuzzy logic from which fuzzy set steams out as a means of handling

imprecision in real work data (Zadeh, 1965). Crisp values which propagate true or false values,

yes or no, and 0 or 1. Fuzzy provides ranges of values streaming from 0.00 through 1.00 Fuzzy

sets provide a means of representing and manipulating data that are not precise, but rather fuzzy

(Cengiz et al., 2007). Fuzzy logic is a superset of conventional (Boolean) logic that has been

extended to handle the concept of partial truth - truth values between "completely true" and

"completely false"(Abdelnassir, 2005; Kasabov, 1998; Robert, 1995; Christos and Dimitros,

2008, Leondes, 2010). Fuzzy logic works closely with neural network because they provide the

rules necessary for neural net trainer to adapt to in carrying out a particular action. They also

learn for close observations.

Integrated neurons form a neural net (Simon, 1999). This neuron is program construct mimicking

the biological parallel working of the human brain. There are different type of neural work such

supervised learning, unsupervised and reinforced learning rolling along several topology such as

feed-forward, feedback, adaptive resonant theory kohen self-organizing and counter-propagation.

The internal and external information flowing through the network results in continuous system

refinement leading optimal system results.

Genetic Algorithm is a search and optimization techniques made up of four main components

which include:

a. The fitness function, also called evaluation function, is the genetic algorithm component

that rates potential solution by calculating how good they are relative to the current

problem domain (Christian and Gustav 2003).

b. Selection enhances the movement of string of bit toward optimal genetic search space.

The individual bit is cross with other individual bit aim order to arrive at our optimal

fitness function.A lower selection pressure is recommended in the start of the genetic

search; while a higher selection pressure is recommended at the end in order to narrow

Page 4: Ijfls05

International Journal of Fuzzy Logic Systems (IJFLS) Vol.3, No2, April 2013

66

c. Mutation operator changes one or more gene values in a chromosome. Mutation means

taking, one or more random chromosomes from the DNA string and changing it (for

example, 0 become 1 and vice versa).

d. Crossover, involves exchanging chromosome portions of genetic material. The result is

that a large variety of genetic combinations will be produced. In other words, crossover

takes two of the fittest genetic strings in a population and combines the two of them to

generate new genetic string (Christian and Gustav 2003).

3.0 MATERIAL METHOD, RESULT AND DISCUSSION

Seven (7) basic and major parameters were utilized in achieving our soft computing virus

identification system; these are presented in Table 1.

Table 1: Parameters for Virus Identification

Code Parameters

P01 Decline system speed

P02 Numerous system errors

P03 Persistent logoff

P04 Disable security software

P05 Abnormal Internet system behaviors

P06 Obvious desktop changes

P07 Continuous Hard Drive noise

Page 5: Ijfls05

International Journal of Fuzzy Logic Systems (IJFLS) Vol.3, No2, April 2013

67

OUTPUT

The fuzzy partition for each input feature consists of the parameters of virus

identification. The rules that can be generated from the initial fuzzy partitions of the

classification of are thus

Knowledge-base

Database Engine

Neural Network

Fuzzy Logic

Decline system speed, numerous

system errors, persistent logoff,

Disable security software, abnormal

Internet system behaviors, obvious

desktop changes, and Continuous

Hard Drive noise

Structured

information

Unstructured

information

Genetic

Optimization

Page 6: Ijfls05

International Journal of Fuzzy Logic Systems (IJFLS) Vol.3, No2, April 2013

68

a. Virus Infected System (Class: C1)

b. Moderately Secured system (Class: C2)

c. Secured System (Class: C3)

If the system is exhibiting less than or equal to two (2) of the parameters of virus

detection THEN (C1), if the system is exhibiting three (3) of the parameters of virus

identification THEN (C2) and if the system is exhibiting four (4) or more parameters of

virus identification THEN (C3)

The Fuzzy IF-THEN Rules (Ri) for Virus identification is thus:

R1: IF the system is exhibiting any of the parameters for virus identification such as

decline system speed THEN it is in class C1. R2: IF the system is exhibiting two of the parameters for virus identification such as

decline system speed and numerous stem errors THEN it is in class C1.

R3: IF the system is exhibiting three of the parameters for virus identification such as

decline system speed, numerous stem errors and persistent logoff THEN it is in

class C2.

R4: IF the system is exhibiting four of the parameters for virus identification such as

decline system speed, numerous stem errors, persistent logoff and disable security

software THEN it is in class C3.

R5: IF the system is exhibiting five of the parameters for virus identification such as

decline system speed, numerous stem errors, persistent logoff, disable security

software and obvious desktop change THEN it is in class C3. R6: IF the system is exhibiting six of the parameters for virus identification such as

decline system speed, numerous stem errors, persistent logoff, disable security

software, obvious desktop change and abnormal internet system behaviors THEN

it is in class C3. R7: IF the system is exhibiting seven of the parameters for virus identification such as

decline system speed, numerous stem errors, persistent logoff, disable security

software, obvious desktop change, abnormal internet system behaviors and

continuous hard change THEN it is in class C3.

A typical data set that contains the seven parameters is presented in Table 2. This shows

the degree of intensity (membership) of virus identification.

Page 7: Ijfls05

International Journal of Fuzzy Logic Systems (IJFLS) Vol.3, No2, April 2013

69

Table 2:Data Set showing the Degree of membership of virus identification

The Genetic algorithm utilizes the following conditions to determine when to stop: Generations

or Fitness limit. In this case, we used the number of generation (4th generation) to determine the

stopping criterion.

Genetic Algorithm Inference:

R1: IF R01 THEN C1 = 0.50

R2: IF R01 AND R02 THEN C2 = 0.18

R3: IF R01, R02 AND R03 THEN C2 = 0.38

R4: IF R01, R02, R03 AND R04 THEN C3 = 0.44

R5: IF R01, R02, R03, R04 AND R05 THEN C3 = 0.37

R6: IF R01, R02, R03, R04, R05 AND R06 THEN C3 = 0.46

R7: IF R01, R02, R03, R04, R05, R06 AND RO7 THEN C3 = 0.46

We then convert these resolved values into whole numbers and imply them to be the

fitness function (f) of the initial generation (Parents)

R1: 50, R2: 18, R3: 38 R4: 44 R5: 37 R6: 46

R7: 46

Parameters Or Fuzzy Sets

Of virus identification

Codes

Degree of Membership

Cluster 1

(C1)

Cluster 2

(C2)

Cluster 3

(C3)

Decline system speed R01 0.50 0.15 0.35

Numerous stem errors R02 0.20 0.20 0.60

Persistent logoff R03 0.10 0.80 0.10

Disable security software R04 0.20 0.10 0.70

Obvious desktop change R05 0.30 0.60 0.10

Abnormal internet system behaviors R06 0.05 0.05 0.90

Continuous hard change R07 0.00 0.50 0.50

Page 8: Ijfls05

International Journal of Fuzzy Logic Systems (IJFLS) Vol.3, No2, April 2013

70

S/N Selection Chromosomes (Binary; 0 or 1) Fitness

function Parent (1

st Gen) Crossover Parent (2

nd Gen)

1 50 110010 1 & 6 110101 53

2 46 101110 2 & 4 101100 44

3 46 101110 Mutation 101100 44

4 44 101100 2 & 4 101110 46

5 38 100110 5 & 7 100010 34

6 37 100101 1 & 6 100010 34

7 18 010010 5 & 7 010110 22

Table 3: 1st and 2

nd Generation Table

Table 4: 2nd

and 3rd

Generation Table

Table 5: 3rd

and 4th

Generation Table

To create our 2nd

and 3rd

generation from the parents (1st generation) we chose the third bit from

the left to be our crossover point. In the 4th generation each bold bit signifies the cross-over bits, a

S/N

Selection Chromosomes (Binary; 0 or 1) Fitness

function Parent (2

nd Gen) Crossover Parent (3

rd Gen)

1 53 110101 1 & 3 110100 52

2 46 101110 2 & 6 101010 42

3 44 101100 1 & 3 101101 45

4 44 101100 4 & 5 101010 42

5 34 100010 4 & 5 100100 36

6 34 100010 2 & 6 100110 38

7 22 010110 Mutation 010100 20

S/N

Selection Chromosomes (Binary; 0 or 1) Fitness

function Parent (3rd

Gen) Crossover Parent (4th

Gen)

1 52 110100 Mutation 110110 54

2 45 101101 2 & 3 101110 46

3 42 101010 2 & 3 101001 41

4 42 101010 6 & 4 101000 40

5 38 100110 5 & 7 100100 40

6 36 100100 6 & 4 100110 38

7 20 010100 5 & 7 010110 22

Page 9: Ijfls05

International Journal of Fuzzy Logic Systems (IJFLS) Vol.3, No2, April 2013

71

single bold bit signifies mutation of that bit and an italicized chromosomes signifies elitism. The

best fourth generation (stopping criterion) is that with the best fitness function, 54. This implies

that the clusters of the various parameters has been searched and optimized to 0.54. Therefore

combination of parameters which produce a membership function < 0.54 = C1, 0.54 = C2 and ≥

0.54 = C3 as presented in Figure 4.

Table 6: Data Set showing the Degree of membership of Operating System Process

4.0 CONCLUSION

A complete soft-computing client-side system for the identification of virus has been developed.

The developed system will indicate the recent state of a user system based on the aforementioned

parameters for the recognition of virus. Full Implementation of such system will go a long way in

curbing the spread of virus on computer systems.

REFERENCE [1] AbdelnassirA. (2005), “Torque Ripple Minimization in Direct Torque Control of Induction

Machines” A Thesis Presented to The Graduate Faculty of the University of Akron In Partial

Fulfillment of the Requirements for the Degree Master of Science, retrieved online from

http://etd.ohiolink.edu/send-pdf.cgi/Abdalla%2520Abdelnassir.pdf?akron1116121267

[2] Arnold A. (2012), “Computer Virus Symptom List”, retrieved online from

http://www.ehow.com/list_6594924_computer-virus-symptomlist.html#ixzz2Mrmvxn6o.

Parameters Or Fuzzy Sets

Of Operating System Process

Codes

Degree of Membership

Cluster 1

(C1)

Cluster 2

(C2)

Cluster 3

(C3)

Decline system speed R01 0.50 0.15 0.35

Numerous stem errors R02 0.20 0.20 0.60

Persistent logoff R03 0.10 0.80 0.10

Disable security software R04 0.20 0.10 0.70

Obvious desktop change R05 0.30 0.60 0.10

Abnormal internet system behaviors R06 0.05 0.05 0.90

Continuous hard change R07 0.00 0.50 0.50

Result Virus

Infected

System

Moderatel

y Secured

System

Secure

System

Page 10: Ijfls05

International Journal of Fuzzy Logic Systems (IJFLS) Vol.3, No2, April 2013

72

[3] Babak B. R.., Maslin M. and Suhaimi I. (2011), “Evolution of Computer Virus Concealment and Anti-

Virus Techniques: A Short Survey”,International Journal of Computer Science Issues (IJCSI), Vol. 8, Issue 1, pp.

113-121.

[4] Cengiz K., Sezi C., Nufer Y.A. and Murat G. (2007), “Fuzzy multi-criteria evaluation of industrial

robotic systems” , Computers & Industrial Engineering, Vol. 52, Issue 4, Pages 414–433.

[5] Christian J. and Gustav E. (2003), “Optimizing Genetic Algorithms for Time Critical

Problems”,Department of Software Engineering and Computer Sciencelekinge Institute of

Technology Box 520 SE ä 372 25 Ronneby Sweden.

[6] Christos S. and Dimitros S. (2008) “Neural Network”, retrieved from

http://www.docstoc.com/docs/15050/neural-networks.

[7] Larry A. (2013), “How to Identify Computer Virus Symptoms” retrieved online from

http://www.ehow.com/how_4903119_identify-computer-virus-symptoms.html#ixzz2Mr

[8] Hkust,(2002), Types of Virus”, retrieved online from

http://www.ust.hk/itsc/antivirus/general/types.html

[9] Kasabov N. K. (1998), “Foundations of neural networks, fuzzy systems, and knowledge

engineering”, A Bradford Book, The MIT Press Cambridge, Massachusetts London, England, ISBN

0-262-11212-4.

[10] Leondes C (2010), “The Technology of Fuzzy Logic Algorithm retrieved from

Suite101.com/examples-of-expert-System-application-in-artificial Intelligence.

[11] Mathias K. (2003), “A Critical Look at the regulations of Computer Viruses”, International Journal

of Law and Information technology, Retrieved online from http://www.digital-rights.net/wp-

content/uploads/2008/01/klang_virus_ijlit.pdf.

[12] Nitesh K. D., Lokesh m., Mahendra S. C. and Bhabesh K. D., (2012), “The New Age

OfComputer Virus And Their Detection”, International Journal of Network Security & Its

Applications (IJNSA), Vol.4, No.3, retrieved on from http://airccse.org/journal/jnsa12_current.html

[13] Robert F. (1995), “Neural Fuzzy Systems”retrieved online from

http://faculty.petra.ac.id/resmana/basiclab/fuzzy/fuzzy_book.pdf.

[14] Simon S.H. (1999), “Neural Network: A comprehensive Foundation, Pretice Hall, Chapter 1-15, Pp.

1-889)

[15] Tom B. (2002), “Retail Market Basket Analysis: A Quantitative Modelling Approach” Dissertation

submitted to obtain the degree of Doctor in Applied Economic Sciences at the Limburg University

Center.

[16] Wikipedia (2013), “Computer virus” retrieved online from en.wikipedia.org/wiki/

[17] Zadeh L. A. (1965), “Fuzzy sets, Information and Control”, Vol.8, pp.338-353.