Minor Project Ppt

Intrusion Detection Using Data Mining

Problem Definition

• An Intrusion Detection System is an important part of the Security Management system for computers and networks that tries to detect break-ins or break-in attempts.

• Approaches to Solution
– Signature-Based
– Anomaly-Based

Abstract

Due to the widespread proliferation of computer networks, attacks on computer systems are increasing day by day. Preventive measures can stop these attacks to some extent, but they are not fully effective for various reasons. This led to the development of intrusion detection as a second line of defense. In Information Security, intrusion detection is the act of detecting actions that attempt to compromise the confidentiality, integrity or availability of a resource. Intrusion detection does not, in general, include prevention of intrusions. In this paper, we focus on the data mining techniques that are being used for such purposes. We discuss the advantages and disadvantages of these techniques. Finally, we present a new idea on how data mining can aid IDSs in real time.

Types of Intrusion Detection

• Classification I
– Real Time
– After-the-fact (offline)

• Classification II
– Network Based
– Host Based

Approaches to IDS

Signature Based
– Concept: Models well-known attacks and uses these known patterns to identify intrusions.
– Pros and Cons: Specific to known attacks; cannot extend to unknown intrusion patterns (false negatives).

Anomaly Based
– Concept: Is trained using the normal behavior of the system and tries to flag deviations from the normal pattern as intrusions.
– Pros and Cons: Usual changes, such as those due to traffic, may lead to a higher number of false alarms.
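The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signature patterns, the single "requests per minute" feature, and the 3-standard-deviation threshold are all hypothetical choices made for the example, not part of the project.

```python
from statistics import mean, stdev

# Hypothetical signatures: byte patterns taken from already-known attacks.
KNOWN_SIGNATURES = [b"/etc/passwd", b"' OR '1'='1"]


def signature_alert(payload: bytes) -> bool:
    """Signature based: alert only if the payload contains a known pattern."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)


def fit_normal_profile(normal_rates):
    """Anomaly based: summarize normal behavior by its mean and std deviation."""
    return mean(normal_rates), stdev(normal_rates)


def anomaly_alert(rate, profile, k=3.0):
    """Alert when a value deviates more than k std deviations from normal."""
    mu, sigma = profile
    return abs(rate - mu) > k * sigma
```

An attack that matches no stored pattern passes `signature_alert` unnoticed (a false negative), while a harmless traffic spike can trip `anomaly_alert` (a false alarm), which mirrors the trade-off listed above.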

Approaches for IDS

Network-Based
• Are installed on network switches.
• Detect some attacks that host-based systems do not, e.g. DoS and fragmented packets.

Host-Based
• Are installed locally on host machines.

SURVEY OF APPLIED TECHNIQUES

Machine Learning is the study of computer algorithms that improve automatically through experience. Applications range from data mining programs that discover general rules in large data sets, to information filtering systems that automatically learn users’ interests.

Classification Techniques

In a classification task in machine learning, the task is to take each instance of a dataset and assign it to a particular class. A classification based IDS attempts to classify all traffic as either normal or malicious.
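As a minimal sketch of the normal-versus-malicious classification task, the Python below trains a nearest-centroid classifier on labeled examples. The choice of nearest-centroid and the two-number feature vectors are assumptions made for illustration; the slides do not prescribe a particular classifier at this point.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_centroids(samples):
    """samples: list of (feature_vector, label). Returns label -> mean vector."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Assign the instance to the class whose centroid is closest."""
    return min(centroids, key=lambda label: squared_distance(centroids[label], features))
```

Unlike clustering, this requires every training instance to carry a label.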

Neural networks provide a solution to the problem of modeling the users’ behavior in anomaly detection because they do not require any explicit user model. While previous works have addressed the anomaly detection problem by analyzing the audit records produced by the operating system, in this approach, anomalies are detected by looking at the usage of network protocols.

Fuzzy Logic: Fuzzy logic is derived from fuzzy set theory dealing with reasoning that is approximate rather than precisely deduced from classical predicate logic.
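To make "approximate reasoning" concrete, the sketch below evaluates one fuzzy rule in Python using the standard min/max/complement operators. The rule itself, the feature names `syn_rate` and `ack_ratio`, and the breakpoints are hypothetical values chosen for illustration.

```python
def ramp_membership(x, low, high):
    """Degree in [0, 1] to which x belongs to a fuzzy set such as 'high'."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def fuzzy_and(a, b):
    return min(a, b)


def fuzzy_not(a):
    return 1.0 - a


def suspicion(syn_rate, ack_ratio):
    """Rule: IF syn_rate is high AND ack_ratio is NOT high THEN suspicious."""
    high_syn = ramp_membership(syn_rate, 100, 200)   # assumed breakpoints
    high_ack = ramp_membership(ack_ratio, 0.2, 0.8)  # assumed breakpoints
    return fuzzy_and(high_syn, fuzzy_not(high_ack))
```

Instead of a hard yes/no, the rule returns a degree of suspicion between 0 and 1, which an IDS could compare against a tunable alert threshold.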

Support Vector Machine: Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. SVMs attempt to separate data into multiple classes.

Clustering is the classification of similar objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure. Machine learning typically regards data clustering as a form of unsupervised learning. Clustering is useful in intrusion detection as malicious activity should cluster together, separating itself from non-malicious activity.

Clustering provides some significant advantages over the classification techniques already discussed, in that it does not require the use of a labeled data set for training.
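The following Python sketch shows one common clustering method, k-means, grouping unlabeled records by distance. K-means is an assumption here (the slides do not name a specific clustering algorithm), and seeding the centroids with the first k points is a simplification that keeps the example deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance, used as the distance measure."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids
```

No labels are passed in; which cluster corresponds to malicious activity still has to be decided afterwards, for example by treating the smaller cluster as suspect.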

EXISTING SYSTEMS

1. The MINDS System: The Minnesota Intrusion Detection System (MINDS) uses data mining techniques to automatically detect attacks against computer networks and systems. While the long-term objective of MINDS is to address all aspects of intrusion detection, the system currently focuses on two specific issues:

2. EMERALD (SRI): EMERALD is a software-based solution that utilizes lightweight sensors distributed over a network or series of networks for real-time detection of anomalous or suspicious activity. EMERALD sensors monitor activity both on host servers and network traffic streams. By using highly distributed surveillance and response monitors, EMERALD provides a wide range of information security coverage, real-time monitoring and response, and protection of informational assets.

3. IDSs in the Open Market: Various systems that employ data mining techniques have already been released as parts of commercial security packages.
– DShield, RealSecure SiteProtector

PROPOSED MODEL

The idea is to use a new data-mining based technique for intrusion detection using an ensemble of binary classifiers with feature selection and multiboosting simultaneously. We are making changes in the classifying part. Our model employs feature selection so that the binary classifier for each type of attack can be more accurate, which improves the detection of attacks that occur less frequently in the training data. Based on the accurate binary classifiers, our model applies a new ensemble approach which aggregates each binary classifier’s decisions for the same input and decides which class is most suitable for a given input. During this process, the potential bias of a certain binary classifier can be alleviated by the other binary classifiers’ decisions. Our model also makes use of multiboosting to reduce both variance and bias.

In this model, for each trial i, i = 1…T, where T is the total number of trials:

(1) A sample training set is generated by a multibooster using wagging (as specified in Webb’s multiboosting algorithm [15]).

(2) Binary classifiers are generated for each class of event using the relevant features for the class and the classification algorithm. Binary classifiers are derived from the training sample by considering all classes other than the current class as other, e.g., Cnormal will consider two classes: normal and other. The purpose of this phase is to select different features for different classes by applying the information gain [18] or gain ratio [13] in order to identify relevant features for each binary classifier. Moreover, applying the information gain or gain ratio will return all the features that contain more information for separating the current class from all other classes. The output of this ensemble of binary classifiers will be decided using an arbitration function based on the confidence level of the output of the individual binary classifiers.

(3) The ensemble classifier is used by the multibooster to calculate the classification error and derive the next training set.

(4) After T trials, the final committee is formed, and it will be used by our intrusion detection system.
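Two pieces of step (2) can be sketched in Python: relabeling the data as "current class versus other" and scoring a feature by information gain, followed by a simple arbitration between binary classifiers. This is a hedged illustration only. The function names are hypothetical, features are assumed to be discrete, and picking the class with the highest confidence is an assumed arbitration rule, since the slides do not specify the exact function. Wagging and the multiboosting loop are not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting the labels on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def one_vs_rest(labels, target):
    """Relabel a multi-class problem as target vs. 'other' for one binary classifier."""
    return [lab if lab == target else "other" for lab in labels]


def arbitrate(confidences):
    """confidences: class -> confidence reported by that class's binary classifier.
    Assumed rule: choose the class whose classifier is most confident."""
    return max(confidences, key=confidences.get)
```

Ranking every feature by `information_gain` against the `one_vs_rest` labels of each class yields a separate feature subset per binary classifier, which is the effect the model relies on for rare attack types.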

Attack Simulation

• Types of attacks
– NIDS: SYN-Flood Attack
– HIDS: ssh Daemon attack

Preprocessing on tcpdump

• From the tcpdump data we extracted the following fields:
– src_ip, dst_ip
– src_port, dst_port
– num_packets_src_dest / num_packets_dest_src
– num_ack_src_dst / num_ack_dst_src
– num_bytes_src_dst / num_bytes_dst_src
– num_retransmit_src_dst / num_retransmit_dst_src
– num_pushed_src_dst / num_pushed_dst_src
– num_syn_src_dst / num_syn_dst_src
– num_fin_src_dst / num_fin_dst_src
– connection status
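One way such per-connection fields could be accumulated is sketched below in Python. The input format is an assumption: each packet is taken to be an already-parsed dictionary with hypothetical keys `src_ip`, `src_port`, `dst_ip`, `dst_port`, `flags` (a string of TCP flag letters such as "SA") and `length`. Parsing raw tcpdump output, retransmission counting, and the connection status field are left out.

```python
from dataclasses import dataclass


@dataclass
class ConnectionRecord:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    num_packets_src_dest: int = 0
    num_packets_dest_src: int = 0
    num_ack_src_dst: int = 0
    num_ack_dst_src: int = 0
    num_bytes_src_dst: int = 0
    num_bytes_dst_src: int = 0
    num_pushed_src_dst: int = 0
    num_pushed_dst_src: int = 0
    num_syn_src_dst: int = 0
    num_syn_dst_src: int = 0
    num_fin_src_dst: int = 0
    num_fin_dst_src: int = 0


def build_records(packets):
    """Group packets into connections; the first packet seen defines 'src'."""
    records = {}
    for pkt in packets:
        forward = (pkt["src_ip"], pkt["src_port"], pkt["dst_ip"], pkt["dst_port"])
        reverse = (pkt["dst_ip"], pkt["dst_port"], pkt["src_ip"], pkt["src_port"])
        flags, size = pkt["flags"], pkt["length"]
        if reverse in records:
            # Packet travels from the connection's destination back to its source.
            rec = records[reverse]
            rec.num_packets_dest_src += 1
            rec.num_bytes_dst_src += size
            rec.num_ack_dst_src += "A" in flags
            rec.num_pushed_dst_src += "P" in flags
            rec.num_syn_dst_src += "S" in flags
            rec.num_fin_dst_src += "F" in flags
        else:
            if forward not in records:
                records[forward] = ConnectionRecord(
                    pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"])
            rec = records[forward]
            rec.num_packets_src_dest += 1
            rec.num_bytes_src_dst += size
            rec.num_ack_src_dst += "A" in flags
            rec.num_pushed_src_dst += "P" in flags
            rec.num_syn_src_dst += "S" in flags
            rec.num_fin_src_dst += "F" in flags
    return list(records.values())
```

Each resulting record is one feature vector for the classifiers; a connection with many SYN packets in one direction and no ACKs in return, for instance, is the kind of pattern a SYN flood would leave behind.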

Literature Survey

• Types of attacks (Host and Network Based)

• Techniques
– Association rules and Frequent Episode Rules over host based and network based
– Outlier Detection using clustering
– Classification
