A Hybrid Anomaly Detection Model using G-LDA
Bhavesh Kasliwal, Shraey Bhatia, Shubham Saini, I. Sumaiya Thaseen, Ch. Aswani Kumar
VIT University – Chennai
Typical IDS
Data Collection
Data Pre-Processing
Intrusion Identification
Response
This work focuses mainly on Intrusion Identification
Architecture
Attribute Selection
“With more data, the simpler solution can be more accurate than the sophisticated solution.”
Selection process based on the means and modes of numeric attributes
Selected attributes show a contrast between the mode values of anomaly and normal patterns, with the corresponding means inclined towards the modes
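The selection rule above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function name `select_attributes`, the `closeness` tolerance, and the toy rows are assumptions introduced here. An attribute is kept when its mode differs between anomaly and normal packets and each class mean lies close to that class's mode.

```python
from statistics import mean, mode

def select_attributes(anomaly_rows, normal_rows, attributes, closeness=0.25):
    """Keep numeric attributes whose modes contrast between the two classes
    and whose class means lie near the class modes.

    `closeness` is a hypothetical tolerance: each mean must fall within this
    fraction of the gap between the two modes.
    """
    selected = []
    for name in attributes:
        anomaly_values = [row[name] for row in anomaly_rows]
        normal_values = [row[name] for row in normal_rows]
        anomaly_mode, normal_mode = mode(anomaly_values), mode(normal_values)
        if anomaly_mode == normal_mode:
            continue  # no contrast between the modes, discard
        gap = abs(anomaly_mode - normal_mode)
        mean_near_mode = (
            abs(mean(anomaly_values) - anomaly_mode) <= closeness * gap
            and abs(mean(normal_values) - normal_mode) <= closeness * gap
        )
        if mean_near_mode:
            selected.append(name)
    return selected

# Toy rows (made up for illustration, not taken from KDD Cup '99).
anomaly_rows = [
    {"serror_rate": 1.0, "duration": 0},
    {"serror_rate": 1.0, "duration": 0},
    {"serror_rate": 0.9, "duration": 5},
]
normal_rows = [
    {"serror_rate": 0.0, "duration": 0},
    {"serror_rate": 0.0, "duration": 0},
    {"serror_rate": 0.1, "duration": 3},
]
chosen = select_attributes(anomaly_rows, normal_rows, ["serror_rate", "duration"])
```

In this toy data `duration` is discarded because both classes share the same mode, while `serror_rate` is kept.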
Selected Attributes
logged_in
serror_rate
srv_serror_rate
same_srv_rate
diff_srv_rate
dst_host_serror_rate
dst_host_srv_serror_rate
A strong contrast is visible between the trends of a selected attribute and a discarded attribute
Training Set Selection (using LDA)
Latent Dirichlet Allocation is a generative model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar.
Apply LDA (separately on anomaly and normal packets) to obtain 200 sets of 10 packets each. Each set is dominated by a particular packet type.
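As a rough illustration of this step, the sketch below runs a small collapsed Gibbs sampler for LDA and lists the top records per topic. It is written under assumptions the slides do not state: each packet record is treated as one token, records are grouped into hypothetical "documents" (for example by time window), and the hyperparameters `alpha` and `beta`, the iteration count, and the function names are illustrative only.

```python
import random
from collections import Counter

def lda_gibbs(docs, n_topics, n_iters=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-topic token counts."""
    rng = random.Random(seed)
    vocab_size = len({token for doc in docs for token in doc})
    doc_topic = [[0] * n_topics for _ in docs]        # tokens of doc d in topic k
    topic_token = [Counter() for _ in range(n_topics)]  # count of token w in topic k
    topic_total = [0] * n_topics
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assign = []
        for token in doc:
            k = rng.randrange(n_topics)
            doc_assign.append(k)
            doc_topic[d][k] += 1
            topic_token[k][token] += 1
            topic_total[k] += 1
        assignments.append(doc_assign)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, token in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_token[k][token] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_token[t][token] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_token[k][token] += 1
                topic_total[k] += 1
    return topic_token

def top_records(topic_token, n_top):
    """The n_top most frequent records of each topic (zero counts dropped)."""
    return [
        [token for token, count in counts.most_common(n_top) if count > 0]
        for counts in topic_token
    ]
```

With the numbers on the slide, one would call `lda_gibbs(docs, n_topics=200)` on the anomaly records and on the normal records separately, then `top_records(counts, 10)` to obtain 200 sets of 10 packets each.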
Sample LDA Output
Topic 0th:
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.26,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.26,0,0,0,0,anomaly
0,tcp,telnet,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,125,13,1,1,0,0,0.1,0.06,0,255,0.03,0.07,0,0,1,1,0,0,anomaly
0,tcp,uucp,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,135,9,1,1,0,0,0.07,0.06,0,255,0.04,0.07,0,0,1,1,0,0,anomaly
0,tcp,vmnet,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,258,10,1,1,0,0,0.04,0.05,0,255,0.04,0.05,0,0,1,1,0,0,anomaly
Topic 1th:
0,tcp,finger,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,14,3,1,1,0,0,0.21,0.29,0,255,0.25,0.02,0.01,0,1,1,0,0,anomaly
0,tcp,finger,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,246,20,1,1,0,0,0.08,0.06,0,255,0.08,0.07,0,0,1,1,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.55,0.01,0.55,0,0,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.56,0.02,0.56,0,0.01,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.6,0.01,0.6,0,0,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.64,0.02,0.64,0,0,0,0.02,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.64,0.02,0.64,0,0,0,0,0,anomaly
………………
Genetic Algorithm
Applied on normal and anomaly packets separately
Threshold value taken for providing a negative weight
Run for 3 generations
Top 3 values for anomaly and normal packets used
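The slides do not give the chromosome encoding, fitness function, or genetic operators, so the following is only one possible reading, with every such detail an assumption: a chromosome holds three candidate values of one attribute, a value's fitness is its relative frequency in its own class, and a negative weight (`penalty`) is applied when its frequency in the other class exceeds `threshold`. The population evolves for 3 generations through truncation selection, one-point crossover, and mutation.

```python
import random
from collections import Counter

def value_fitness(value, own_freq, other_freq, threshold, penalty):
    """Own-class frequency, penalised if the value is common in the other class."""
    score = own_freq.get(value, 0.0)
    if other_freq.get(value, 0.0) > threshold:
        score -= penalty * other_freq[value]  # the assumed "negative weight"
    return score

def evolve_top_values(own_values, other_values, threshold=0.1, penalty=1.0,
                      generations=3, pop_size=20, n_genes=3, seed=0):
    """Return up to n_genes values of one attribute favoured by the GA."""
    rng = random.Random(seed)
    own_freq = {v: c / len(own_values) for v, c in Counter(own_values).items()}
    other_freq = {v: c / len(other_values) for v, c in Counter(other_values).items()}
    pool = sorted(own_freq)  # candidate genes: values seen in the own class

    def gene_score(value):
        return value_fitness(value, own_freq, other_freq, threshold, penalty)

    def fitness(chromosome):
        # Duplicated genes are counted once.
        return sum(gene_score(v) for v in set(chromosome))

    population = [[rng.choice(pool) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]       # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)         # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:                  # mutation
                child[rng.randrange(n_genes)] = rng.choice(pool)
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return sorted(set(best), key=gene_score, reverse=True)
```

Running this once on the anomaly values and once on the normal values of an attribute would give the "top 3 values" for each class, under the assumptions above.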
Identifying nature of incoming packet
For each selected attribute value Fi in incoming packet
◦ If Fi ∈ Vi
   Si = (A × Frequency of Fi in Anomaly) − (Frequency of Fi in Normal)
◦ Else Si = 0
C = Σ Si
If C > 0
◦ Then Anomaly
Else Normal
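The scoring rule above translates almost directly into code. In this sketch the data layout is an assumption: `top_values[attr]` stands for the set Vi (taken here to be the values retained for attribute i), and `anomaly_freq` / `normal_freq` map each attribute to a dictionary of relative frequencies. The names and numbers are illustrative only.

```python
def packet_score(packet, top_values, anomaly_freq, normal_freq, weight):
    """C = sum of Si over the selected attributes of one packet."""
    total = 0.0
    for attr, value in packet.items():
        if attr in top_values and value in top_values[attr]:
            # Si = (A * frequency in anomaly) - (frequency in normal)
            total += (weight * anomaly_freq[attr].get(value, 0.0)
                      - normal_freq[attr].get(value, 0.0))
        # else Si = 0, so nothing is added
    return total

def classify(packet, top_values, anomaly_freq, normal_freq, weight=1.0):
    """Label a packet 'anomaly' when its score C is positive, else 'normal'."""
    score = packet_score(packet, top_values, anomaly_freq, normal_freq, weight)
    return "anomaly" if score > 0 else "normal"

# Made-up frequencies for two of the selected attributes.
top_values = {"logged_in": {0, 1}, "serror_rate": {0.0, 1.0}}
anomaly_freq = {"logged_in": {0: 0.9, 1: 0.1}, "serror_rate": {1.0: 0.7, 0.0: 0.2}}
normal_freq = {"logged_in": {0: 0.3, 1: 0.7}, "serror_rate": {1.0: 0.01, 0.0: 0.9}}

label_a = classify({"logged_in": 0, "serror_rate": 1.0},
                   top_values, anomaly_freq, normal_freq)
label_b = classify({"logged_in": 1, "serror_rate": 0.0},
                   top_values, anomaly_freq, normal_freq)
```

A packet whose values all fall outside Vi scores exactly 0 and is therefore labelled normal, which follows from the strict `C > 0` test.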
Additional Weight
Multiplied to the anomaly frequency
Why?
• Generic anomalies have diverse values, unlike the normal packets, which contain values in a particular range
• Trade-off between the accuracy and the false positive rate required
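To make the trade-off concrete, the sketch below applies the same score with two different weights to a handful of made-up packets. Each packet is reduced to the (anomaly frequency, normal frequency) pairs of its matched attribute values; the function name and all numbers are assumptions for illustration, not measurements from the work.

```python
def rates_for_weight(weight, anomaly_packets, normal_packets):
    """Detection rate and false positive rate for a given additional weight."""
    def flagged(pairs):
        # Same rule as the classifier: sum of (A * f_anomaly - f_normal) > 0
        return sum(weight * fa - fn for fa, fn in pairs) > 0

    detection = sum(flagged(p) for p in anomaly_packets) / len(anomaly_packets)
    false_positive = sum(flagged(p) for p in normal_packets) / len(normal_packets)
    return detection, false_positive

# Anomalies spread over many values, so their per-value frequencies are low.
anomaly_packets = [[(0.6, 0.1)], [(0.2, 0.5)], [(0.1, 0.4)]]
normal_packets = [[(0.05, 0.8)], [(0.1, 0.7)], [(0.3, 0.35)]]

unweighted = rates_for_weight(1, anomaly_packets, normal_packets)
weighted = rates_for_weight(3, anomaly_packets, normal_packets)
```

On this toy data, raising the weight from 1 to 3 catches one more anomaly but also flags one normal packet, which is the accuracy versus false positive trade-off the slide refers to.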
Results
Tested against 50000 anomaly and 50000 normal packets from the KDD Cup ’99 dataset.
88.5% accuracy with a 6% false positive rate (FPR)
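The reported figures imply a confusion matrix, which can be recovered with simple arithmetic. The sketch assumes that FPR means false positives divided by all normal packets and that accuracy is measured over all 100000 packets; the resulting counts and detection rate are derived here, not reported in the slides.

```python
def confusion_from_rates(n_anomaly, n_normal, accuracy, fpr):
    """Recover TP/FN/FP/TN counts from accuracy and false positive rate."""
    fp = round(fpr * n_normal)                 # normal packets flagged as anomaly
    tn = n_normal - fp
    correct = round(accuracy * (n_anomaly + n_normal))
    tp = correct - tn                          # correct = TP + TN
    fn = n_anomaly - tp
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}

counts = confusion_from_rates(50000, 50000, accuracy=0.885, fpr=0.06)
detection_rate = counts["TP"] / (counts["TP"] + counts["FN"])
```

Under these assumptions, 3000 normal packets are misclassified and 41500 of the 50000 anomalies are detected, a detection rate of 83%.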
Future Work
Focus on specific anomaly types
Better attribute selection algorithm?
◦ oneR
◦ Entropy based
◦ Chi-squared
◦ randomForest
Better classification technique?
◦ Clustering – Hierarchical, K-Means
◦ Decision Trees
QUESTIONS?