bharatsvnit
Data Mining for
Security Applications
• Overview of Data Mining
• Security Threats
• Data Mining for Cyber security applications
– Intrusion Detection
– Data Mining for Firewall Policy Management
– Data Mining for Worm Detection
• Data Mining for Counter-terrorism
• Surveillance
• Advantages
• Conclusion
Data Mining - Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases [Han and Kamber 2005].
Data mining is used to sort through the tremendous amounts of data stored by automated data collection tools.
Extracts rules, regularities, patterns, and constraints from databases.
Figure: Threat Types. Labels: Natural Disasters; Human Errors; Non-Information related threats; Information Related threats; Biological, Chemical, Nuclear Threats; Critical Infrastructure Threats.
Data mining is being applied to problems such as intrusion detection and auditing. For example:
• Anomaly detection techniques could be used to detect unusual patterns and behaviors.
• Link analysis may be used to trace self-propagating malicious code to its authors.
• Classification may be used to group various cyber attacks and then use the profiles to detect an attack when it occurs.
• Prediction may be used to determine potential future attacks, based in part on information learned about terrorists through email and phone conversations.
An intrusion can be defined as “any set of actions that attempt to compromise the integrity, confidentiality, or availability of a resource”.
Attacks are:
• Host-based attacks
• Network-based attacks
Intrusion detection systems are split into two groups:
• Anomaly detection systems
• Misuse detection systems
Data mining can help automate the process of investigating intrusion detection alarms.
Data mining on historical audit data and intrusion detection alarms can reduce future false alarms.
Anomaly detection
• Build models of normal data
• Detect any deviation from normal data
• Flag deviation as suspect
• Identify new types of intrusions as deviation from normal behavior
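The anomaly detection steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method used by any particular intrusion detection system: the feature (logins per hour), the z-score model, and the threshold of 3.0 are all hypothetical choices.

```python
from statistics import mean, stdev

def build_normal_model(normal_values):
    """Summarize normal behavior as (mean, sample standard deviation)."""
    return mean(normal_values), stdev(normal_values)

def is_suspect(value, model, threshold=3.0):
    """Flag a value whose z-score against the normal model exceeds threshold."""
    mu, sigma = model
    if sigma == 0:
        # Degenerate model: any value other than the mean is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical feature: logins per hour observed during normal operation.
normal_logins = [4, 5, 6, 5, 4, 6, 5, 5]
model = build_normal_model(normal_logins)

# New observations: two ordinary values and one large deviation.
flags = [is_suspect(v, model) for v in [5, 7, 40]]
print(flags)
```

Because the model is built only from normal data, a previously unseen kind of intrusion can still be flagged as long as it shifts the monitored feature far enough from normal behavior.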
Misuse detection
• Label all instances in the data set (“normal” or “intrusion”)
• Run learning algorithms over the labeled data to generate classification rules
• Automatically retrain intrusion detection models on different input data
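As a rough illustration of the misuse detection steps above, the sketch below learns classification rules from labeled records with OneR, a deliberately simple rule learner chosen here for brevity. The slides do not name a specific algorithm, and the feature names (`protocol`, `failed_logins`) and records are hypothetical.

```python
from collections import Counter, defaultdict

def one_r(records, labels):
    """Learn a one-feature rule set (OneR) from labeled records.

    records: list of dicts mapping feature name -> categorical value
    labels:  list of "normal" / "intrusion" strings
    Returns (best_feature, {value: predicted_label}, default_label).
    """
    default = Counter(labels).most_common(1)[0][0]
    best = None
    for feature in records[0]:
        # Count class labels for each value of this feature.
        counts = defaultdict(Counter)
        for rec, lab in zip(records, labels):
            counts[rec[feature]][lab] += 1
        # One rule per value: predict the majority class for that value.
        rule = {val: c.most_common(1)[0][0] for val, c in counts.items()}
        errors = sum(rule[rec[feature]] != lab
                     for rec, lab in zip(records, labels))
        if best is None or errors < best[0]:
            best = (errors, feature, rule)
    return best[1], best[2], default

def classify(record, model):
    """Apply the learned rules; unseen values fall back to the default label."""
    feature, rule, default = model
    return rule.get(record[feature], default)

# Hypothetical labeled audit records.
records = [
    {"protocol": "tcp", "failed_logins": "low"},
    {"protocol": "tcp", "failed_logins": "high"},
    {"protocol": "udp", "failed_logins": "low"},
    {"protocol": "udp", "failed_logins": "high"},
    {"protocol": "tcp", "failed_logins": "low"},
]
labels = ["normal", "intrusion", "normal", "intrusion", "normal"]

model = one_r(records, labels)
print(model[0], model[1])
print(classify({"protocol": "udp", "failed_logins": "high"}, model))
```

Retraining on different input data amounts to calling `one_r` again with the new labeled records.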
Misuse detection
• Classification Model
– Bayesian classifier
– Decision tree
– Association rule
– Support vector machine
– Learning from rare class
Anomaly detection
• Anomaly Detection Model
– Association rule
– Neural network
– Unsupervised SVM
– Outlier detection
Analysis of Firewall Policy Rules Using Data Mining Techniques
• Firewall is the de facto core technology of today’s network security
• First line of defense against external network attacks and threats
• Firewall controls or governs network access by allowing or denying the incoming or outgoing network traffic according to firewall policy rules.
• Manual definition of rules often results in anomalies in the policy
• Detecting and resolving these anomalies manually is a tedious and error-prone task
Anomaly detection:
• Theoretical framework for the resolution of anomalies
• A new algorithm will simultaneously detect and resolve any anomaly that is present in the policy rules
Traffic Mining:
• Mine the traffic and detect anomalies
• Analyzing traffic and packet logs bridges the gap between what is written in the firewall policy rules and what is actually observed in the network
• Network traffic trends may show that some rules are outdated or have not been used recently
Figure: firewall traffic-mining workflow. Boxes: Firewall Log File; Mining Log File Using Frequency; Filtering; Rule Generalization; Generic Rules; Identify Decaying & Dominant Rules; Edit Firewall Rules; Firewall Policy Rule.
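The frequency-based step of this workflow, identifying decaying and dominant rules from the firewall log, might look roughly like the sketch below. The log format (`(timestamp, rule_id)` pairs), the rule names, and both thresholds are assumptions made for illustration; real firewall logs need parsing first, and the slides do not define the exact criteria.

```python
from collections import Counter

def rule_usage(log_entries, policy_rule_ids, recent_after, dominant_share=0.5):
    """Split policy rules into dominant and decaying sets from log hits.

    log_entries:    iterable of (timestamp, rule_id) pairs (hypothetical format)
    recent_after:   rules with no hit at or after this timestamp are "decaying"
    dominant_share: minimum fraction of all hits for a rule to be "dominant"
    """
    total = Counter()
    recent = Counter()
    for ts, rule_id in log_entries:
        total[rule_id] += 1
        if ts >= recent_after:
            recent[rule_id] += 1
    n = sum(total.values())
    dominant = {r for r in policy_rule_ids
                if n and total[r] / n >= dominant_share}
    decaying = {r for r in policy_rule_ids if recent[r] == 0}
    return dominant, decaying

# Hypothetical log and policy.
log = [
    (1, "allow-http"), (2, "allow-http"), (3, "allow-ssh"),
    (8, "allow-http"), (9, "allow-http"), (10, "deny-telnet"),
]
policy = ["allow-http", "allow-ssh", "deny-telnet", "allow-ftp"]

dominant, decaying = rule_usage(log, policy, recent_after=5)
print(sorted(dominant), sorted(decaying))
```

Rules reported as decaying are candidates for review or removal when editing the firewall rules, while dominant rules are candidates for being moved earlier in the policy.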
What are worms?
• Self-replicating program
• Exploits software vulnerability on a victim
• Remotely infects other victims
Goals of worm detection
• Real-time detection
Issues
• Substantial volume of identical traffic, random probing
Methods for worm detection
• Count number of sources/destinations
• Count number of failed connection attempts
Worm Types
• Email worms, Instant Messaging worms, Internet worms, IRC worms, File-sharing Networks worms
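The two counting methods listed above can be sketched as follows. This is an illustrative simplification: the connection tuple format, the example addresses, and the thresholds are hypothetical, and a real detector would count within a sliding time window rather than over the whole input.

```python
from collections import defaultdict

def suspicious_sources(connections, max_destinations=3, max_failures=3):
    """Return sources that exceed either counting threshold.

    connections: iterable of (source, destination, succeeded) tuples
    """
    destinations = defaultdict(set)   # source -> distinct destinations contacted
    failures = defaultdict(int)       # source -> failed connection attempts
    for src, dst, succeeded in connections:
        destinations[src].add(dst)
        if not succeeded:
            failures[src] += 1
    return sorted(
        src for src in destinations
        if len(destinations[src]) > max_destinations
        or failures[src] > max_failures
    )

# Hypothetical traffic: one host probing many addresses, one behaving normally.
connections = [("10.0.0.5", f"10.0.1.{i}", False) for i in range(5)]
connections += [
    ("10.0.0.9", "10.0.1.1", True),
    ("10.0.0.9", "10.0.1.2", True),
]

flagged = suspicious_sources(connections)
print(flagged)
```

Random probing tends to hit many distinct and often unreachable addresses, which is why both counts rise together for an infected host.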
Figure: the email worm detection model. Boxes: Outgoing Emails; Feature extraction; Training data; Test data; Machine Learning; Classifier; Clean or Infected?
Task: given some training instances of both “normal” and “viral” emails, induce a hypothesis to detect “viral” emails.
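One way to induce such a hypothesis is a Naive Bayes classifier over extracted email features. The sketch below is a minimal illustration, not the model from the slides: the feature tokens (for example `attachment:exe`) and the four training emails are invented, and feature extraction is assumed to have already happened.

```python
import math
from collections import Counter

def train(emails, labels):
    """Fit a multinomial Naive Bayes model over email feature tokens."""
    class_counts = Counter(labels)
    token_counts = {c: Counter() for c in class_counts}
    for tokens, label in zip(emails, labels):
        token_counts[label].update(tokens)
    vocab = {t for counts in token_counts.values() for t in counts}
    return class_counts, token_counts, vocab

def predict(tokens, model):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    class_counts, token_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        denom = sum(token_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for t in tokens:
            if t in vocab:  # ignore tokens never seen in training
                score += math.log((token_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training instances of "viral" and "normal" outgoing emails.
emails = [
    ["attachment:exe", "many_recipients", "subject:empty"],
    ["attachment:exe", "many_recipients"],
    ["attachment:pdf", "single_recipient"],
    ["no_attachment", "single_recipient"],
]
labels = ["viral", "viral", "normal", "normal"]

model = train(emails, labels)
print(predict(["attachment:exe", "many_recipients"], model))
print(predict(["no_attachment", "single_recipient"], model))
```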
Data Mining for Counter-terrorism
• Data Mining for Non real-time Threats: gather data, build terrorist profiles, mine data, prune results
• Data Mining for Real-time Threats: gather data in real-time, build real-time models, mine data, report results
• Gather data from multiple sources
– Information on terrorist attacks: who, what, where, when, how
– Personal and business data: place of birth, ethnic origin, religion, education, work history, finances, criminal record, relatives, friends and associates, travel history, . . .
– Unstructured data: newspaper articles, video clips, speeches, emails, phone records, . . .
• Integrate the data, build warehouses and federations
• Develop profiles of terrorists, activities/threats
• Mine the data to extract patterns of potential terrorists and predict future activities and targets
• Find the “needle in the haystack” - suspicious needles?
• Data integrity is important
Figure: non-real-time data mining workflow. Boxes: Data sources with information about terrorists and terrorist activities; Integrate data sources; Clean/modify data sources; Build profiles of terrorists and activities; Mine the data; Examine results/Prune results; Report final results.
• Nature of data
– Data arriving from sensors and other devices
– Continuous data streams
– Breaking news, video releases, satellite images
– Some critical data may also reside in caches
• Rapidly sift through the data and set aside data that is not needed immediately for later use and analysis (non-real-time data mining)
• Data mining techniques need to meet timing constraints
• Quality of service (QoS) tradeoffs among timeliness, precision and accuracy
• Presentation of results, visualization, real-time alerts and triggers
Figure: real-time data mining workflow. Boxes: Data sources with information about terrorists and terrorist activities; Integrate data sources in real-time; Rapidly sift through data and discard irrelevant data; Build real-time models; Mine the data; Examine results in real-time; Report final results.
Data Mining Outcomes and Techniques
• Association: John and James often seen together after an attack
• Link Analysis: Follow chain from A to B to C to D
• Clustering: Divide population; people from country X of a certain religion; people from country Y interested in airplanes
• Classification: Build profiles of terrorists and classify terrorists
• Anomaly Detection: John registers at flight school, but does not care about takeoff or landing
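The link analysis outcome, following a chain from A to B to C to D, can be illustrated as a shortest-path search over a graph of observed links. This is a minimal sketch assuming links are undirected pairs; the node names and the extra A–E link are hypothetical.

```python
from collections import deque

def follow_chain(links, start, target):
    """Breadth-first search for a shortest chain of links from start to target."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    previous = {start: None}          # node -> node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through predecessors to rebuild the chain.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None                       # no chain connects start and target

# Hypothetical observed links between entities.
links = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]

chain = follow_chain(links, "A", "D")
print(chain)
```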
Huge amounts of surveillance and video data available in the security domain
Analysis is being done off-line usually using “Human Eyes”
Need for tools to aid human analysts (pointing out areas in video where unusual activity occurs)
Event Representation
• Estimate distribution of pixel intensity change
Event Comparison
• Contrast the event representation of different video sequences to determine if they contain similar semantic event content.
Event Detection
• Using manually labeled training video sequences to classify unlabeled video sequences
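Event representation and event comparison can be sketched with a histogram of per-pixel intensity change between two frames and a distance between histograms. This is one simple possibility under stated assumptions, not the specific method behind the slides: frames are assumed to be small grayscale grids with values 0 to 255, and the bin count and L1 distance are arbitrary choices.

```python
def change_histogram(frame_a, frame_b, bins=4, max_value=256):
    """Normalized histogram of absolute per-pixel intensity change."""
    counts = [0] * bins
    width = max_value / bins
    total = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for p, q in zip(row_a, row_b):
            # Clamp to the last bin so the maximum change stays in range.
            counts[min(int(abs(p - q) / width), bins - 1)] += 1
            total += 1
    return [c / total for c in counts]

def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms (0 = identical)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

# Hypothetical 2x2 grayscale frames: a nearly static pair and a changing pair.
still_a, still_b = [[10, 10], [10, 10]], [[12, 9], [10, 11]]
moving_a, moving_b = [[10, 10], [10, 10]], [[200, 10], [10, 250]]

still_repr = change_histogram(still_a, still_b)
moving_repr = change_histogram(moving_a, moving_b)
print(still_repr, moving_repr, histogram_distance(still_repr, moving_repr))
```

A small distance suggests two sequences contain similar activity, and a large distance suggests different activity. For event detection, the same representations computed on manually labeled sequences could serve as features for a classifier.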
Law enforcement: Data mining can aid law enforcers in identifying criminal suspects as well as apprehending these criminals by examining trends in location, crime type, habit, and other patterns of behaviors.
Researchers: Data mining can assist researchers by speeding up data analysis, allowing them more time to work on other projects.
Various data mining techniques have been proposed to enhance the security of different applications.
This presentation covered the ways in which data mining has been known to aid intrusion detection, firewall policy management, worm detection and counter-terrorism, and the ways in which the various techniques have been applied and evaluated.
Thank you