5
International Journal of Research Studies in Science, Engineering and Technology [IJRSSET] Volume 1, Issue 1, April 2014, PP 93-97 ©IJRSSET 93 Survey on Data Mining Techniques for Intrusion Detection System Miss.Kavita Patond CSE Deept,PR pote(patil)COET, Name &SGBAU, Amravati, India Prof. Pranjali Deshmukh CSE Deept, PR pote(patil)COET, Name &SGBAU, Amravati, India Abstract: Today, Intrusion Detection Systems have been employed by majority of the organizations to safeguard the security of information systems. Firewalls that are used for intrusion detection possess certain drawbacks which are overcome by various data mining approaches. Data mining techniques play a vital role in intrusion detection by analyzing the large volumes of network data and classifying it as normal or anomalous. Several data mining techniques like Classification, Clustering and Association rules are widely used to enhance intrusion detection. Among them clustering is preferred over classification since it does not require manual labelling of the training data and the system need not be aware of the new attacks. This paper discusses three different clustering algorithms namely K-Means Clustering, Y-Means Clustering and Fuzzy C-Means Clustering. K-Means clustering results in degeneracy and is not suitable for large databases. Y-Means is an improvement over K-means that eliminates empty clusters. Four issues such as Classification of Data, High Level of Human Interaction, Lack of Labelled Data, and Effectiveness of Distributed Denial of Service Attack are being solved using the algorithms like EDADT algorithm, Hybrid IDS model, Semi-Supervised Approach and Varying HOPERAA Algorithm respectively. Keywords: HybridIDS, AnomolyDetecton, Clustering, K-means 1. INTRODUCTION In this modern world intrusion occurs in a fraction of seconds. Intruders cleverly use the modified version of command and thereby erasing their footprints in audit and log files. Successful IDS intellectually differentiate both intrusive and non intrusive records. Data mining based IDS can efficiently identify these data of user interest and also predicts the results that can be utilized in the future. Data mining or knowledge discovery in databases has gained a great deal of attention in IT industry as well as in the society. Data mining has been involved to analyze the useful information from large volumes of data that are noisy, fuzzy and dynamic. Figure 1 illustrates the overall architecture of IDS. It has been placed centrally to capture all the incoming packets that are transmitted over the network. Data are collected and send for pre-processing to remove the noise; irrele-vant and missing attributes are replaced. Then the pre-processed data are analyzed and classified according to their severity measures. If the record is normal, then it does not re-quire any more change or else it send for report generation to raise alarms. Based on the state of the data, alarms are raised to make the administrator to handle the situation in advance. The attack is modeled so as to enable the classification of net- work data. All the above process continues as soon as the transmission starts. Figure1: Overall structure of Intrusion Detection System 2. LITERATURE SURVEY G.V. Nadiammai, M. Hemalatha [1], With the help of varying clock drift, the client can easily communicate with the server with minimum contact initiation trails and the improved max- imum delivery latency has been achieved. Varying HOPERAA algorithm shows 99% as receiving capacity at 100 ms and slightly deviates

15.pdf

  • Upload
    suchi87

  • View
    226

  • Download
    5

Embed Size (px)

Citation preview

International Journal of Research Studies in Science, Engineering and Technology [IJRSSET] Volume 1, Issue 1, April 2014, PP 93-97 IJRSSET 93 Survey on Data Mining Techniques for Intrusion Detection System Miss.Kavita Patond CSE Deept,PRpote(patil)COET, Name &SGBAU, Amravati, India Prof. Pranjali Deshmukh CSE Deept, PR pote(patil)COET,Name &SGBAU, Amravati, India Abstract: Today,Intrusion DetectionSystemshavebeenemployedbymajorityoftheorganizationstosafeguardthe securityofinformation systems.Firewallsthatareusedfor intrusiondetectionpossesscertaindrawbackswhichareovercomeby variousdataminingapproaches.Dataminingtechniquesplayavitalroleinintrusiondetection by analyzingthelargevolumesofnetworkdata andclassifyingitasnormaloranomalous. Severaldataminingtechniqueslike Classification, Clustering and Association rulesarewidelyusedtoenhance intrusiondetection.Amongthemclusteringispreferredoverclassificationsinceitdoesnotrequire manuallabellingofthe trainingdataandthesystem need not be aware of the new attacks.ThispaperdiscussesthreedifferentclusteringalgorithmsnamelyK-MeansClustering,Y-MeansClusteringandFuzzyC-MeansClustering.K-Meansclusteringresults indegeneracyandisnotsuitableforlargedatabases.Y-MeansisanimprovementoverK-meansthateliminatesemptyclusters. FourissuessuchasClassificationofData,HighLevelofHumanInteraction,LackofLabelledData,and Effectiveness of Distributed Denial of Service Attack are being solved using the algorithms like EDADT algorithm, Hybrid IDS model, Semi-Supervised Approach and Varying HOPERAA Algorithm respectively. Keywords: HybridIDS, AnomolyDetecton, Clustering, K-means 1.INTRODUCTION In this modern world intrusion occurs in a fraction ofseconds.Intruderscleverlyusethemodified versionofcommandandtherebyerasingtheir footprintsinauditandlogfiles.SuccessfulIDS intellectuallydifferentiatebothintrusiveandnon intrusiverecords.DataminingbasedIDScan efficientlyidentifythesedataofuserinterestand alsopredictsthe resultsthat can be utilized in the future.Dataminingorknowledgediscoveryin databases has gained a great deal of attention in IT industry as well as in the society. Data mining has beeninvolvedtoanalyzetheusefulinformation fromlargevolumesofdatathatarenoisy,fuzzy anddynamic.Figure1illustratestheoverall architecture of IDS. It has been placed centrally to capturealltheincomingpacketsthatare transmittedoverthenetwork.Dataarecollected andsendforpre-processingtoremovethenoise; irrele-vantandmissingattributesarereplaced. Thenthepre-processeddataareanalyzedand classifiedaccordingtotheirseveritymeasures.If the recordisnormal, then it doesnot re-quire any morechangeorelseitsendforreportgeneration toraisealarms.Basedonthestateofthedata, alarmsareraisedtomaketheadministratorto handlethesituationinadvance.Theattackis modeledsoastoenabletheclassificationofnet-workdata.Alltheaboveprocesscontinuesas soon as the transmission starts. Figure1:OverallstructureofIntrusionDetection System 2.LITERATURE SURVEY G.V.Nadiammai,M.Hemalatha[1],Withthe helpofvaryingclockdrift,theclientcaneasily communicatewiththeserverwithminimum contactinitiationtrailsandtheimprovedmax-imumdeliverylatencyhasbeenachieved. VaryingHOPERAAalgorithmshows99%as receiving capacity at 100 ms and slightly deviates Survey on Data Mining Techniques for Intrusion Detection System International Journal of Research Studies in Science, Engineering and Technology [IJRSSET]94 for40msratewithbetterthroughputcapac-ity. Our experimental results proved that the proposed algo-rithms solves the above mentioned issues and detectstheattacksinaneffectivemanner comparedwithotherexistingworks. Thus, it will pavethewayforaneffectivemeansofintrusion detectionwithbetteraccuracyandreducedfalse alarm rates. ShakibaKhademolqorani,AliZeinalHamadani [2],Nowadays,decisionmakersinvariablyneed tousedecisionsupporttechnologyinorderto tackle complexdecision making problems. In this area,datamininghasanimportantroletoextract valuableinformation.Also,thesuccessful applicationofdataminingtechnologyrequires thatonepossessspecificDMdecision-makingskills.Forinstance,theeffectiveapplicationofa data mining process is littered with many difficult andtechnicaldecisions(i.e.datacleansing, featuretransformations,algorithms,parameters, evaluation,etc.).Consequently,theuseofdata mininganddecisionsupportmethods,including novelvisualizationmethods,canleadtobetter performanceindecisionmaking,improvethe effectivenessofdevelopedsolutionsandenablestacklingofnewtypesofproblemsthathavenot beenaddressedbefore.Ontheotherhand,the MCDMmethoddealswiththevastareaof decisionmaking; choosing the bestoption among variousalternatives,andoptimizationofgoal amongmulti-objectivesituations.Therefore,the decisionsupportsystems(DSS),datamining (DM)andmultiplecriteriadecisionmaking (MCDM)arecomplementarymethodsfor decisionmakingprocess,andtheyhavealsoa strong link with expert systems. ZubairMd.Fadlullah,etl[3],Inthisarticle,theyhighlightedtheimportanceondesigning appropriateintrusiondetection systemsto combat attacks against cognitive radio networks. Also, we proposed a simple yet effective IDS, which can be easilyimplementedinthesecondaryusers cognitiveradiosoftware.OurproposedIDSuses non-parametriccusumalgorithm,whichoffers anomalydetection.Bylearningthenormalmode ofoperationsandsys-temparametersofaCRN, the proposed IDS is able to detect suspicious (i.e., anomalousor abnormal) behavior arisingfrom an attack. In particular, we presented an example of a jammingattackagainstaCRNsecondaryuser, and demonstrated how our proposed IDS is able to detecttheattackwithlowdetectionlatency.In future,ourworkwillperformfurther investigationsonhowtoenhancethedetection sensitivity of the IDS. MueenUddin,etl[4],Thispaperhasfocusedon theefficiencyandperformanceofthenewIDS: calledsignature-basedmulti-layerIDSusing mobileagents.Itthendiscussesthedevelopment of a new signature- based ID using mobile agents. Theproposedsystemusesmobileagentsto transferrule-basedsignaturesfromlargecomplementarydatabasetosmallsignature databaseandthenregularlyupdatethosedatabases with new signatures detected. SahilpreetSingh,MeenakshiBansa[5],This studyisapproachedtodiscoverthebest classification for the application machine learning to intrusion detection.For this, we have presented differentneuralbaseddataminingclassifier algorithmstoclassifyattacksinanefficientmanner. After doing experimental work, It is clear thatMultilayerPerceptronfeedforwardneural networkhashighestclassificationaccuracyand lowesterrorrateascomparedtootherneural classifieralgorithmnetwork.Toenhancetheresults the feature reduction techniques is applied. TheneuralalgorithmsareappliedtoNSLKDD datasetbyreducingitsattributesand implementedusingWEKAmachinelearning tool.Theyshowedthatmachinelearningisan effectivemethodologywhichcanbeusedinthe fieldofintrusiondetection.Infuture,Theywill proposeanewalgorithmwhichwillintegrate MultilayerPerceptionNetworkwithfuzzy inference rules to improve the performance PaulDokas,etl[6]Severalintrusiondetection schemesfordetectingnet-workintrusionsare proposedinthispaper.WhenappliedtoKDDCup99dataset,developedalgorithmsfor learn-ing from rare class weremore successfulin detectingnetworkattacksthanstandarddata miningtechniques.Experimentalresults performedonDARPA98andrealnetworkdata indicatethattheLOFapproachwasthemost promising technique for detecting novel intrusions Whenperformingexperimentson DARPA98data,theunsupervisedSVMswere verypromisingindetectingnewintrusionsbut theyhadveryhighfalsealarmrate.Therefore, futureworkisneededinordertokeephigh detection rate while lowering the false alarm rate. In addition, for the Mahal anobis based approach, wearecurrentlyinvestigatingtheideaodefining several types of normal behavior and measuring thedistancetoeachoftheminordertoidentify the anomalies. 3.TECHNIQUES FOR IDS Eachmaliciousactivityorattackhasaspecific pattern.ThepatternsofonlysomeoftheMiss.Kavita Patond & Prof. Pranjali Deshmukh International Journal of Research Studies in Science, Engineering and Technology [IJRSSET]95 attacksareknownwhereastheotherattacksonlyshowsomedeviationfromthenormal patterns. Therefore, thetechniquesusedfordetectingintrusionsarebasedonwhetherthepatternsoftheattacksareknownorunknown.The two main techniques used are: 3.1. Anomaly Detection Itisbasedontheassumptionthatintrusionsalwaysreflectsomedeviationsfromnormalpatterns.Thenormalstateofthenetwork,trafficload,breakdown,protocolandpacketsizearedefinedbythesystemadministratorinadvance.Thus,anomalydetectorcomparesthecurrentstateofthenetworkk tothenormalbehaviourandlooksformaliciousbehaviour.Itcandetectboth known and unknown attacks. 3.2. Misuse Detection Itisbasedontheknowledgeofknownpatternsofpreviousattacksandsystemvulnerabilities.Misusedetectioncontinuously comparescurrentactivitytoknownintrusionpatternstoensurethatanyattackerisnotattemptingto exploit knownvulnerabilities.Toaccomplishthistask,itis requir ed todescribeeachintrusionpatternindetail It cannot detect unknown attacks. 4.REVIEWOFDATAMININGTECHNIQUEIN INTRUSION DETECTION DataMiningreferstotheprocessofextractinghidden,previouslyunknownandusefulinformationfromlargedatabases.ItextractspatternsandconcentratesonissuesrelatingtotheirItisaconvenientwayofextractingpatternsandfocusesonissuesrelatingtotheirfeasibility,utility,efficiencyandscalability.Thusdataminingtechniques helptodetectpatternsinthedatasetanusethesepatternstodetectfutureintrusionsinsimilardata.Thefollowingareafewspecificthingsthatmaketheuseofdataminingimportantin an intrusion detection system: 1.Manage firewall rules for anomaly detection.2.Analy ze large volumes of network data. 3.Samedataminingtoolcanbeappliedto different data sources. 4.Performsdatasummarizationand visualization. 5.Differentiatesdatathatcanbeusedfor deviation analysis.6.Cluster sthedataintogroupssuchthatit possesshighintra-classsimilarityandlowinter-class similarity . Dataminingtechniquesplayanimportantrolein intrusiondetectionsystems.Differentdataminingtechniqueslike classification,clustering, associationruleminingareusedfrequentlytoacquireinformationaboutintrusionsbyobservingandanalyzingthenetworkdata[1].Thefollowingdescribesthedifferent data mining techniques. A.Classification Itisasupervisedlearningtechnique.A classificationbasedIDSwillclassifyallthe networktrafficintoeithernormalor malicious. Classificationtechniqueismostlyusedfor anomaly detection. The classification process is as follows:

1.It accepts collection of items as input. 2.Mapstheitemsintopredefinedgroupsor classes defined by some attributes. 3.Aftermapping,itoutputsaclassifierthatcanaccuratelypredict the classtowhichanew item belongs. B.Association Rule Thistechniquesearchesafrequentlyoccurring itemsetfromalargedataset.Associationruleminingdeterminesassociation[6]rulesand/or correlationrelationshipsamonglargesetofdataitems.Theminingprocessofassociation rule can be divided into two steps as follows: 1.FrequentItemsetGenerationGenerates allsetofitemswhosesupportisgreaterthanthe specified threshold called as min support. 2.AssociationRuleGenerationFromthepreviouslygeneratedfrequentitemsets,it generatestheassociationrules intheformofifthenstatementsthathaveconfidenceSurvey on Data Mining Techniques for Intrusion Detection System International Journal of Research Studies in Science, Engineering and Technology [IJRSSET]96 greaterthanthespecifiedthresholdcalledasmin confidence. 3.Thenetworkdataisarrangedintoadatabasetablewhereeachrowrepresentsanauditrecordandeachcolumnisafieldofthe audit records. 4.Theintrusions anduseractivitiesshowsfrequentcorrelationsamongthenetworkdata.Consistentbehaviorsinthenetworkdatacanbe capturedin associationrules. 5. Rulesbasedonnetworkdatacan continuouslymergethe rulesfrom a new r un toaggregate rule set of all previous runs. 6.Thuswiththeassociationrule,wegetthecapabilitytocapturebehaviorforcorrectly detectinintrusionsandhenceloweringthefalse alarmrate. C. Clustering Itisanunsupervisedmachinelearning mechanismfordiscoveringpatternsinunlabeled data. It is used tolabeldata andassignitintoclusterswhereeachclusterconsistsofmembersthatarequitesimilar. Membersfrom differentclustersaredifferentfromeachother.Henceclusteringmethodscanbeusefulforclassifyingnetworkdatafordetectingintrusions.Clusteringcanbeappliedonboth AnomalydetectionandMisusedetection.The basicstepsinvolvedinidentifyingintrusionare follows [3]: 1.Findthelargestcluster,whichconsistsofmaximumnumberofinstancesand labelitas normal. 2.Sort theremainingclusters in an ascendingorder of their distances to the largest cluster. 3.SelectthefirstK1clusterssothatthenumberofdatainstancesintheseclusterssumupto`Nand labelthemasnor mal,where`isthepercentageofnormal instances. 4.Label all other clusters as malicious. 5.Afterclustering,heuristicsareusedtoautomaticallylabel each clusteras eithernor malormalicious. The self-labelledcluster s arethenusedtodetectattacksinaseparate test dataset. Fromthethreedataminingtechniquesdiscussedaboveclusteringiswidelyusedforintrusiondetectionbecauseofthefollowing advantages overthe other techniques: 1.Doesnot require the use of a labeleddata setfor training.2.Nomanualclassificationoftrainingdata needs to be done.3.Neednothavetobeawareofnewtypesof intrusionsinorderfor thesystemtobe ableto detect them.5.CLUSTERING TECHNIQUES USED IN IDS Severalclusteringalgorithmshavebeenusedfor intrusion detection.Allthesealgorithmsreduce thefalsepositiverateandincreasethedetectionrateoftheintrusions.Thedetectionrateisdefinedasthenumberofintrusioninstancesdetectedbythesystemdividedbythetotalnumberofintrusion instancespresentinthedataset.The falsepositiverateisdefinedastotalnumberofnormalinstancesthatwereincorrectlyclassifiedasintrusions definedby thetotalnumberofnor malinstances.Some of the clustering techniques suchasK-MeansClustering, Y-MeansClustering andFuzzyC-MeansClusteringarediscussed below: A.K-Means ClusteringK-Meansalgorithmisahardpartitionedclusteringalgorithmwidelyusedduetoitssimplicityandspeed.ItusesEuclideandistance as the similarity measure.Hardclusteringmeansthat anitemina datasetcanbelongtooneandonlyoneclusteratatime.Itisaclusteringanalysis algorithmthat groups items basedontheirfeaturevaluesintoKdisjointcluster ssuchthat theitemsinthesame clusterhavesimilarattributesandthoseindifferent clustershavedifferentattributes. The Euclidean distancefunctionusedtocomputethe distance (i.e.Similarity)between two itemsis given follows ...............................................................1 where,p=(p1,p2,,pm)andq=(q1,q2,,qm)arethetwoinputvectorswithmquantitative attributes.Thealgorithm is appliedto trainingdatasetswhichmaycontainnormaland abnormal trafficwithoutbeing labeledpreviousl y. Themainideaofthisapproachisbasedontheassumptionthatnormal andabnormaltrafficfor m different cluster s.Thedatamay alsocontainoutliers, whicharethedataitemsthatareverydifferent fromtheotheritemsintheclusterandhencedonotbelongtoanycluster.Anoutlieris foundby comparingthe radiusesofthedataitems;thatis,iftheMiss.Kavita Patond & Prof. Pranjali Deshmukh International Journal of Research Studies in Science, Engineering and Technology [IJRSSET]97 radiusof a dataitemisgreaterthanagive threshold,itis considered asan outlier. Butthis doesnotdisturbtheK-meansclusteringprocessaslongasthenumberofoutliersissmall.[2] K-Means Clustering Algorithm is as follows: i)Define the number of clusters K.For example, ifK=2,weassumethatnormalandabnormaltrafficinthetrainingdataformtwodifferent clusters. ii)Initialize the K cluster centroids.This canbedonebyrandomlyselectingKdata itemsfromthedata set. iii)Computethe distance from each itemtothecentroidsofalltheclusterbyusingtheEuclideandistancemetricwhichisusedtofind thesimilarity between the items in data set. iv) Assign each item to the cluster with the nearest centroid.Inthiswayalltheitemswillbe assignedtodifferentclusterssuchthateachcluster will have items with similar attributes. v)Afteralltheitemshavebeenassignedto differentclustersre-calculatethemeansofmodifiedclusters Thenewly calculatedmeanisassignedas thenew centroid. vi)Repeatstep(iii)untiltheclustercentroidsdo not change.vii)Labeltheclusterasnormalandabnor mal dependingon thenumberofdata itemsineach cluster. 6.CONCLUSION Dataminingtechniquesarewidelyusedbecauseoftheircapabilitytodrasticallyimprovetheperformanceandusabilityofintrusiondetectionsystems.Differentdata miningtechniques like classification,clusteringand association rulemining areveryhelpfulinanalyzingthenetworkdata.Sincelargeamountofnetworktrafficneedstobecollectedfor intrusiondetection,clusteringismoresuitablethanclassificationinthedomainofintrusiondetectionasitdoesnotrequire labeleddatasettherebyreducingmanual efforts. Datamining techniquescan detectknown aswellasunknownattacks.Datamining technologyhelpstounderstandnormalbehaviorinsidethedataandusethisknowledgefordetectingunknownintrusions. ThreeclusteringalgorithmsnamelyK-means,Y-meansandFuzzyC-meanshavebeendiscussed.Eachofthesehasbothadvantagesanddisadvantagesandareanimprovementovertheother.AmongtheseFuzzyC-Meansclusteringcanbeconsideredasanefficientalgorithmforintrusion detectionsinceitallowsan itemtobelong tomore thanoneclusterand also measures the quality of partitioning.Thetechniquecan be used for largedatasets aswell asdatasetsthathaveoverlapping items REFERENCES [1]G.V.Nadiammai,M.Hemalatha.Effective approachtowardIntrusionDetectionSystem usingdataminingtechniquesReceived22 May 2013; revised 29 August 2013; accepted 27October2013Egyptioninformatics journal. Elsevier [2]ShakibaKhademolqorani,AliZeinal HamadaniAnAdjustedDecisionSupport SystemthroughDataMiningandMultiple CriteriaDecisionMakingThe2nd InternationalConferenceonIntegrated Information Elsevier 2013 [3]ZubairMd.Fadlullah,HirokiNishiyama, AnIntrusionDetectionSystem(IDS)for CombatingAttacksAgainstCognitiveRadio Networks 2013IEEE [4]MueenUddin,AzizahAbdulRehmanletl Signature-basedMulti-LayerDistributed IntrusionDetectionSystemusingMobile AgentsInternationalJournalofNetwork Security, Vol.15, No.1, PP.79-87, Jan. 2013 [5]SahilpreetSinghMeenakshiBansa, Improvementof Intrusion Detection System inDataMiningusingNeuralNetwork Volume3,Issue9,September2013IJARCSSE[6]Paul Dokas, Levent Ertoz,, Data Mining for Network Intrusion Detection [7]Pise,P.J.,1982,Laterallyloadedpilesina two-layersoilsystem.,J.Geotech.Engrg. Div., 108(9), 11771181. [8]Poulos,H.G.,1971,Behavioroflaterally loadedpiles-I:Singlepiles.,J.SoilMech. and Found. Div., 97(5), 711731. [9]Reese,L.C.,andMatlock,H.,1956,Non-dimensionalsolutionsforlaterallyloaded piles with soil modulus assumed proportional todepth.,Proc.,8thTexasConf.onSoil MechanicsandFoundationEngineering, Austin, Texas, 123 [10] Reese, L. C., and Welch, R. C., 1975, Lateral loadingofdeepfoundationsinstiffclay.,J. Geotech. Engrg. Div., 101(7), 633649.