Data Mining Approach for Network Intrusion Detection

Zhen Zhang

Advisor: Dr. Chung-E Wang

04/24/2002

Department of Computer Science

California State University, Sacramento

Outline

Background
– Intrusion Detection: promises and challenges
– Data Mining in IDS: how can it help
Motivation
Approaches, tasks, problems and my contributions
Results
Conclusion and future work

Intrusion Detection - Building a Secure Network

Primary assumptions
– System activities are observable
– Normal and intrusive activities leave distinct evidence
Main techniques
– Misuse detection: matching patterns of well-known attacks
– Anomaly detection: flagging deviation from normal usage
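
To make the two techniques concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than part of this work: the signature table (two hypothetical (destination port, flags) patterns), the baseline of connections per 2-second window, and the 3-standard-deviation threshold. Misuse detection only recognizes what is already in its table, while anomaly detection flags any value far from the normal baseline, including attacks that have no signature.

```python
from statistics import mean, stdev
from typing import List

# Hypothetical signature table: (destination port, TCP flags) pairs assumed to mark known attacks.
KNOWN_ATTACK_SIGNATURES = {(31337, "S"), (12345, "S")}


def misuse_alert(dst_port: int, flags: str) -> bool:
    """Misuse detection: alert only when the event matches a known attack pattern."""
    return (dst_port, flags) in KNOWN_ATTACK_SIGNATURES


def anomaly_alert(value: float, normal_history: List[float], k: float = 3.0) -> bool:
    """Anomaly detection: alert when `value` lies more than `k` standard
    deviations from the mean of a baseline of normal observations."""
    mu, sigma = mean(normal_history), stdev(normal_history)
    return abs(value - mu) > k * sigma


# Hypothetical baseline: connections observed per 2-second window under normal usage.
baseline = [4, 5, 6, 5, 4, 6, 5]
print(misuse_alert(80, "S"))        # False: a burst to port 80 matches no signature
print(anomaly_alert(40, baseline))  # True: 40 connections per window is far from normal
```

The pairing illustrates the false-negative problem: the burst is invisible to the signature table but stands out against the baseline.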

Data Mining in IDS

Shortfalls of current IDS (mostly misuse detection)
– Variants: intrusions change easily and frequently.
– False positives: real intrusions are hard to pick out among false alarms.
– False negatives: attacks for which there are no known signatures go undetected.
– Data overload: the amount of data grows rapidly.

What is Data Mining

Data Mining: take data and pull patterns or deviations from it.
Many different types of algorithms: decision tree, link analysis, clustering, association, rule abduction, deviation analysis, and sequence analysis.
Software and tools:
– MS SQL Server 2000
– Ripper and many others

How can Data Mining help

Variants
– With anomaly detection, variants of an exploit's code are of little concern.
False positives
– Identify recurring sequences of alarms to help recognize valid network activity.
False negatives
– Attacks for which signatures have not yet been developed might still be detected.
Data overload
– Data mining plays a vital role in handling the growing volume of data.

Summary of my work

Identify the objective
– Distinguish network attacks from normal traffic
– New area: several research projects, no commercial products
– Focus on the principles and a basic implementation of the concepts
Data collection
Data pre-processing on the tcpdump dataset
Apply data mining to the processed data
Investigate the results
Software packages used: Visual Basic, Microsoft SQL Server 2000 with Analysis Server, Tcpdump

Data Collection

Tcpdump data (http://iris.cs.uml.edu:8080/)
– Tcpdump was run on the gateway to capture traffic between the LAN and external hosts, plus broadcast packets within the LAN
– Only headers were captured, no user data
– Filters were applied so that only TCP and UDP packets were kept
– One baseline dataset and 4 simulated attack datasets

Tcpdump data format

TCP packet
– Time stamp
– Source IP address
– Source port
– Destination IP address
– Destination port
– Flags (SYN, FIN, PUSH, RST, or .)
– Data sequence number of this packet
– Data sequence number of the data expected in return
– Number of bytes of receive buffer space available
– Indication of whether or not the data is urgent
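
A Python sketch of reading one TCP line into the fields above. It assumes the classic tcpdump text layout "time src.port > dst.port: flags [first:last(len)] [ack N] [win N] [urg N]" with numeric addresses (as printed with tcpdump's -n option). Output details differ between tcpdump versions, so the regular expression is an assumption to adapt; the sample line is made up, and lines that do not fit are returned as None.

```python
import re
from typing import NamedTuple, Optional


class TcpRecord(NamedTuple):
    timestamp: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    flags: str             # e.g. "S", "F", "P", "R" or "."
    seq: Optional[str]     # "first:last(bytes)" when printed
    ack: Optional[int]     # sequence number expected in return
    window: Optional[int]  # receive buffer space available
    urgent: bool           # True when an "urg" field is present


# Assumed layout: time src.port > dst.port: flags [first:last(len)] [ack N] [win N] [urg N]
TCP_LINE = re.compile(
    r"(?P<ts>\S+) "
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): "
    r"(?P<flags>[SFPRU.]+)"
    r"(?: (?P<seq>\d+:\d+\(\d+\)))?"
    r"(?: ack (?P<ack>\d+))?"
    r"(?: win (?P<win>\d+))?"
    r"(?P<urg> urg \d+)?"
)


def parse_tcp_line(line: str) -> Optional[TcpRecord]:
    """Parse one tcpdump TCP line; return None if it does not fit the assumed layout."""
    m = TCP_LINE.match(line.strip())
    if m is None:
        return None
    return TcpRecord(
        timestamp=m["ts"],
        src_ip=m["src"],
        src_port=int(m["sport"]),
        dst_ip=m["dst"],
        dst_port=int(m["dport"]),
        flags=m["flags"],
        seq=m["seq"],
        ack=int(m["ack"]) if m["ack"] else None,
        window=int(m["win"]) if m["win"] else None,
        urgent=m["urg"] is not None,
    )


print(parse_tcp_line("07:37:31.123456 192.168.1.10.1234 > 10.0.0.5.80: S 12345:12345(0) win 512"))
```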

Tcpdump data format

UDP packet
– Time stamp
– Source IP address
– Source port
– Destination IP address
– Destination port
– Length of the packet
Example data

Example tcpdump data

Data Pre-processing (80% ~ 90% of the work)

Convert packet-level information to connection-level records
– Group packets by the same source/destination IP and port
– Use flags and ACKs to determine the status of the connection
» SF, REJ, S0, S1, S2, S3, S4, RSTOSn, RSTRSn, SS, SH, SHR, OOS1, OOS2
– Record start time, duration, protocol
– Calculate bytes in, bytes out, resent rate
– UDP is connectionless, so each UDP packet is simply treated as a connection
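
The packet-to-connection step can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure used in this work: the Packet fields are hypothetical, the sender of the first packet is taken as the originator, only three of the listed states (S0, REJ, SF) are detected with "OTH" as a catch-all invented for the sketch, resent rate is not computed, and UDP packets (one connection each) are not handled.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Packet:
    ts: float    # time stamp in seconds
    src: str
    sport: int
    dst: str
    dport: int
    flags: str   # tcpdump-style flags: "S", "F", "P", "R" or "."
    nbytes: int  # payload bytes


Endpoint = Tuple[str, int]


def to_connections(packets: List[Packet]) -> List[dict]:
    """Group TCP packets into connections and derive connection-level fields."""
    groups: Dict[Tuple[Endpoint, Endpoint], List[Packet]] = defaultdict(list)
    for p in sorted(packets, key=lambda p: p.ts):
        a, b = (p.src, p.sport), (p.dst, p.dport)
        groups[(min(a, b), max(a, b))].append(p)  # both directions share one key

    records = []
    for pkts in groups.values():
        first = pkts[0]
        orig = (first.src, first.sport)  # assumption: the first sender opened the connection
        sent = [p for p in pkts if (p.src, p.sport) == orig]
        recv = [p for p in pkts if (p.src, p.sport) != orig]
        resp_flags = "".join(p.flags for p in recv)
        all_flags = "".join(p.flags for p in pkts)
        if "S" in first.flags and not recv:
            state = "S0"   # SYN sent, no reply at all
        elif "S" in first.flags and "R" in resp_flags and "S" not in resp_flags:
            state = "REJ"  # SYN met by RST
        elif "S" in first.flags and "F" in all_flags:
            state = "SF"   # opened and closed normally
        else:
            state = "OTH"  # catch-all added for this sketch
        records.append({
            "start": first.ts,
            "duration": pkts[-1].ts - first.ts,
            "protocol": "tcp",
            "src": first.src,
            "dst": first.dst,
            "service": first.dport,
            "bytes_out": sum(p.nbytes for p in sent),
            "bytes_in": sum(p.nbytes for p in recv),
            "state": state,
        })
    return records
```

Keying on the two endpoints keeps both directions of a conversation together; a real implementation would also need to split repeated connections that reuse the same port pair.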

First round of processing

Intrinsic Features

Establish more information

Count_per_dest: # of connections to this destination IP
REJ_count_per_dest: # of connections that get the flag “REJ”
S01_count_per_dest: # of connections that send a SYN packet but never get the ACK packet (S0), or receive an ACK on a SYN they never sent (S1)
Diff_Services_per_dest: # of unique services
Diff_Service_Rate: Diff_Services_per_dest / Count_per_dest

Same Destination Temporal and Statistical Attributes (last 2 seconds)
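
A minimal Python sketch of the same-destination features for one connection. The record layout (dicts with "start", "dst", "service" and "state" keys, sorted by start time) is an assumption, as are two details the slide does not pin down: the window includes the current connection, and a connection starting exactly 2 seconds earlier still counts.

```python
from typing import Dict, List


def same_dest_features(conns: List[Dict], i: int, window: float = 2.0) -> Dict[str, float]:
    """Features for conns[i] over connections to the same destination IP that
    started within the last `window` seconds (conns[i] included).
    `conns` must be sorted by start time."""
    t, dst = conns[i]["start"], conns[i]["dst"]
    recent = []
    for j in range(i, -1, -1):           # walk backwards from the current connection
        if conns[j]["start"] < t - window:
            break                         # everything earlier is outside the window
        if conns[j]["dst"] == dst:
            recent.append(conns[j])
    count = len(recent)                   # at least 1, since conns[i] itself is included
    diff_services = len({c["service"] for c in recent})
    return {
        "Count_per_dest": count,
        "REJ_count_per_dest": sum(c["state"] == "REJ" for c in recent),
        "S01_count_per_dest": sum(c["state"] in ("S0", "S1") for c in recent),
        "Diff_Services_per_dest": diff_services,
        "Diff_Service_Rate": diff_services / count,
    }
```

A high Diff_Service_Rate with many REJ or S0/S1 connections to one host within 2 seconds is the kind of pattern a port scan would produce.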

Establish more information

Count_per_service: # of connections to this type of service
REJ_count_per_service: # of connections that get the flag “REJ” (SYN met by RST)
S01_count_per_service: # of connections that send a SYN packet but never get the ACK packet (S0), or receive an ACK on a SYN they never sent (S1)
Diff_Hosts_per_service: # of unique destination hosts
Diff_Hosts_Rate: Diff_Hosts_per_service / Count_per_service

Same Service Temporal and Statistical Attributes (last 2 seconds)
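
The same-service features can also be computed in a single pass. This Python sketch keeps one sliding window (a deque) per service; like the other sketches it is illustrative: connection records are hypothetical dicts with "start", "dst", "service" and "state" keys, they must arrive in start-time order, and each window includes the current connection.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator


def same_service_features(conns: Iterable[Dict], window: float = 2.0) -> Iterator[Dict[str, float]]:
    """Yield same-service features for each connection, in input order,
    using one sliding window per service."""
    windows: Dict[object, Deque[Dict]] = defaultdict(deque)
    for c in conns:
        dq = windows[c["service"]]
        dq.append(c)
        while dq[0]["start"] < c["start"] - window:
            dq.popleft()                  # evict connections older than the window
        count = len(dq)                   # at least 1: c itself is never evicted
        diff_hosts = len({x["dst"] for x in dq})
        yield {
            "Count_per_service": count,
            "REJ_count_per_service": sum(x["state"] == "REJ" for x in dq),
            "S01_count_per_service": sum(x["state"] in ("S0", "S1") for x in dq),
            "Diff_Hosts_per_service": diff_hosts,
            "Diff_Hosts_Rate": diff_hosts / count,
        }
```

Evicting old connections as new ones arrive means each window only ever holds the last 2 seconds of one service, instead of rescanning the whole dataset for every connection.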

Second round of processing

Same Destination Temporal and Statistical Attributes

Final round of processing

Final, but important
– Reduce the amount of data
– Remove noise or trivial information
– Reorganize the data, adding new features if necessary
Challenges
– Hard to tell which data to reduce or remove
– Requires tremendous domain knowledge
– Needs experiments and adjustments
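
The slide leaves the reduction rules open. As one possible starting point, and only an assumption about what "trivial information" might mean, this Python sketch drops columns whose value never changes and then drops exact duplicate records; real reductions would still need the domain knowledge and experiments listed above. Field values are assumed to be hashable.

```python
from typing import Dict, List


def reduce_records(records: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Drop constant columns, then drop exact duplicate records (first copy kept)."""
    if not records:
        return []
    columns = list(records[0].keys())
    # A column with a single value across every record cannot help separate classes.
    informative = [c for c in columns if len({r[c] for r in records}) > 1]
    seen = set()
    reduced = []
    for r in records:
        row = tuple(r[c] for c in informative)
        if row in seen:  # exact duplicate: adds volume, no new information
            continue
        seen.add(row)
        reduced.append({c: r[c] for c in informative})
    return reduced


sample = [
    {"protocol": "tcp", "state": "SF", "count": 1},
    {"protocol": "tcp", "state": "SF", "count": 1},
    {"protocol": "tcp", "state": "REJ", "count": 5},
]
print(reduce_records(sample))
```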

Data Mining

Decision tree algorithm in Microsoft SQL Server 2000 Analysis Server
Steps:
– Use 80% of the baseline (normal) dataset as training data
– Use the remaining 20% as validation data and compute the misclassification rate
– Use 20% of each of the four intrusion datasets as prediction data and compute the misclassification rate
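
The evaluation steps can be sketched in Python. This covers only the 80/20 split and the misclassification count; the decision tree itself was built in Analysis Server, so the sketch substitutes a placeholder majority-class predictor, which is NOT a decision tree. The record layout (dicts whose "state" key holds the connection's final state) and the fixed shuffle seed are assumptions.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, object]


def split_80_20(records: Sequence[Record], seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Shuffle with a fixed seed, then return (first 80%, remaining 20%)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def misclassification(predict: Callable[[Record], object],
                      records: Sequence[Record],
                      target: str = "state") -> Tuple[int, int, float]:
    """Return (wrong, total, rate) for predictions of `target` over non-empty `records`."""
    wrong = sum(1 for r in records if predict(r) != r[target])
    return wrong, len(records), wrong / len(records)


def majority_predictor(train: Sequence[Record], target: str = "state") -> Callable[[Record], object]:
    """Placeholder model, not a decision tree: always predict the most common training value."""
    most_common = Counter(r[target] for r in train).most_common(1)[0][0]
    return lambda r: most_common


# Hypothetical normal data: 90 SF connections and 10 REJ connections.
normal = [{"state": "SF"}] * 90 + [{"state": "REJ"}] * 10
train, validation = split_80_20(normal)
model = majority_predictor(train)
print(misclassification(model, validation))
```

The 20% part returned by split_80_20 could likewise serve as the prediction sample drawn from each intrusion dataset.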

Dependency Network

Decision Tree

Apply Data Mining Model to Validate/Predict

Results

% misclassification (by final state)
Normal: 149/1510 = 9.86%
Intrusion1: 443/2324 = 19.06%
Intrusion2: 376/1968 = 19.10%
Intrusion3: 386/2011 = 19.19%
Intrusion4: 437/2298 = 19.01%
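
The percentages in the results list can be recomputed from the counts. In this small Python check, truncating to two decimal places reproduces every listed figure, which suggests the slide truncated rather than rounded; conventional rounding would give 9.87% for Normal, 19.11% for Intrusion2 and 19.02% for Intrusion4. Integer arithmetic avoids floating-point surprises.

```python
# (misclassified, total) per dataset, copied from the results list.
RESULTS = {
    "Normal": (149, 1510),
    "Intrusion1": (443, 2324),
    "Intrusion2": (376, 1968),
    "Intrusion3": (386, 2011),
    "Intrusion4": (437, 2298),
}


def truncated_percent(wrong: int, total: int) -> str:
    """Percentage truncated (not rounded) to two decimals, using integer arithmetic."""
    hundredths = wrong * 10000 // total
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


for name, (wrong, total) in RESULTS.items():
    print(f"{name}: {wrong}/{total} = {truncated_percent(wrong, total)}")
```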

Conclusion and future improvement

Accuracy
– Preliminary experiments using data mining on the tcpdump data showed promising results: the intrusion datasets were misclassified at about twice the rate of the normal data (about 19% vs. 9.86%)
– Accuracy depends on sufficient training data and the right feature set
Performance
– 6 hours on one dataset (628,775 records)
Size of time window
– 2 seconds or larger?
Automated process
– Call MS SQL Server data mining and DTS procedures from within VB
– Real-time monitoring and alarms

References

Intrusion Detection, Rebecca Gurley Bace, Macmillan Technical Publishing, 2000
Data Mining: Concepts and Techniques, Jiawei Han and Micheline Kamber, Morgan Kaufmann Publishers, 2001
Data Mining with Microsoft SQL Server 2000, Claude Seidman, Microsoft Press, 2001
http://www.cs.columbia.edu/~sal/hpapers/USENIX/usenix.html
http://iris.cs.uml.edu:8080/network.html
http://www-nrg.ee.lbl.gov/, Network Research Group (NRG) of the Information and Computing Sciences Division (ICSD) at Lawrence Berkeley National Laboratory (LBNL) in Berkeley, California

Thank You!