A Hybrid Anomaly Detection Model using G-LDA
Bhavesh Kasliwal, Shraey Bhatia, Shubham Saini, I. Sumaiya Thaseen, Ch. Aswani Kumar
VIT University – Chennai

Typical IDS

• Data Collection
• Data Pre-Processing
• Intrusion Identification
• Response

This work mainly focused on Intrusion Identification.

Architecture

Attribute Selection

“With more data, the simpler solution can be more accurate than the sophisticated solution.”

• Selection process based on the means and modes of numeric attributes
• Criterion: a contrast between the mode values of anomaly and normal patterns, with the corresponding means inclined towards the modes
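The selection rule above could be sketched roughly as follows. This is a minimal Python illustration, not the authors' implementation: the function name `select_attributes`, the row format (a dict per packet), and the reading of "means inclined towards the modes" as "each class mean lies closer to its own class mode than to the other class's mode" are all assumptions.

```python
from statistics import mean, mode


def select_attributes(anomaly_rows, normal_rows, attributes):
    """Keep attributes whose class modes differ and whose class means lean to their own mode.

    anomaly_rows / normal_rows: lists of dicts mapping attribute name -> numeric value
    (a hypothetical layout chosen for this sketch).
    """
    selected = []
    for attr in attributes:
        a_vals = [row[attr] for row in anomaly_rows]
        n_vals = [row[attr] for row in normal_rows]
        a_mode, n_mode = mode(a_vals), mode(n_vals)
        if a_mode == n_mode:
            continue  # no contrast between the two classes
        a_mean, n_mean = mean(a_vals), mean(n_vals)
        # Assumed interpretation: each mean is nearer its own mode than the other one.
        a_leans = abs(a_mean - a_mode) < abs(a_mean - n_mode)
        n_leans = abs(n_mean - n_mode) < abs(n_mean - a_mode)
        if a_leans and n_leans:
            selected.append(attr)
    return selected


# Toy data, invented for illustration only.
anomaly = [{"serror_rate": 1.0, "duration": 0},
           {"serror_rate": 1.0, "duration": 0},
           {"serror_rate": 0.9, "duration": 5}]
normal = [{"serror_rate": 0.0, "duration": 0},
          {"serror_rate": 0.0, "duration": 0},
          {"serror_rate": 0.1, "duration": 3}]
print(select_attributes(anomaly, normal, ["serror_rate", "duration"]))
```

In the toy data, `duration` has the same mode in both classes and is dropped, while `serror_rate` shows contrasting modes and is kept.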

Selected Attributes

• logged_in
• serror_rate
• srv_serror_rate
• same_srv_rate
• diff_srv_rate
• dst_host_serror_rate
• dst_host_srv_serror_rate

A strong contrast is visible between the trends of a selected attribute and a discarded attribute.

Training Set Selection (using LDA)

• Latent Dirichlet Allocation is a generative model that explains sets of observations through unobserved groups, which account for why some parts of the data are similar.
• Apply LDA (separately on anomaly and normal packets) to obtain 200 sets of 10 packets each, with each set dominated by a particular packet type.
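One possible reading of this step: if every packet record is treated as a vocabulary item and a fitted LDA model gives a topic-by-packet probability matrix, then each "set" is simply the 10 most probable packets of one topic. The Python sketch below covers only that final extraction; the matrix `phi`, the helper name `top_packets_per_topic`, and the packet-as-word encoding are assumptions made for illustration, and fitting the LDA model itself is not shown.

```python
def top_packets_per_topic(phi, packets, k=10):
    """Return, for each topic, the k packets with the highest probability.

    phi[t][j] is the (assumed) probability of packet j under topic t.
    """
    sets = []
    for topic_probs in phi:
        ranked = sorted(range(len(packets)),
                        key=lambda j: topic_probs[j],
                        reverse=True)
        sets.append([packets[j] for j in ranked[:k]])
    return sets


# Toy inputs, invented for illustration: 2 topics over 3 abbreviated packet records.
packets = ["icmp,eco_i,SF", "tcp,telnet,S0", "tcp,finger,S0"]
phi = [[0.7, 0.2, 0.1],
       [0.1, 0.3, 0.6]]
for t, members in enumerate(top_packets_per_topic(phi, packets, k=2)):
    print(f"Topic {t}: {members}")
```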

Sample LDA Output

Topic 0th:

0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly

0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly

0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly

0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly

0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly

0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.26,0,0,0,0,anomaly

0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.26,0,0,0,0,anomaly

0,tcp,telnet,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,125,13,1,1,0,0,0.1,0.06,0,255,0.03,0.07,0,0,1,1,0,0,anomaly

0,tcp,uucp,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,135,9,1,1,0,0,0.07,0.06,0,255,0.04,0.07,0,0,1,1,0,0,anomaly

0,tcp,vmnet,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,258,10,1,1,0,0,0.04,0.05,0,255,0.04,0.05,0,0,1,1,0,0,anomaly

Topic 1th:

0,tcp,finger,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,14,3,1,1,0,0,0.21,0.29,0,255,0.25,0.02,0.01,0,1,1,0,0,anomaly

0,tcp,finger,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,246,20,1,1,0,0,0.08,0.06,0,255,0.08,0.07,0,0,1,1,0,0,anomaly

0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.55,0.01,0.55,0,0,0,0,0,anomaly

0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.56,0.02,0.56,0,0.01,0,0,0,anomaly

0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.6,0.01,0.6,0,0,0,0,0,anomaly

0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.64,0.02,0.64,0,0,0,0.02,0,anomaly

0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.64,0.02,0.64,0,0,0,0,0,anomaly

………………

Genetic Algorithm

• Applied on normal and anomaly packets separately
• Threshold value taken for providing a negative weight
• Run for 3 generations
• Top 3 values for anomaly and normal packets used
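The slides do not spell out the chromosome encoding or the fitness function, so the following Python sketch is only one way the bullets above might fit together. It uses a stripped-down evolutionary loop (selection plus random replacement, no crossover) over the values of a single attribute. The name `evolve_top_values`, the fitness definition, and the `threshold`, `penalty` and `pop_size` parameters are all hypothetical.

```python
import random
from collections import Counter


def evolve_top_values(own_values, other_values, threshold=1, penalty=1.0,
                      generations=3, pop_size=6, top_k=3, seed=0):
    """Return up to top_k attribute values that score best for one class.

    Assumed fitness: frequency in the class itself, with a negative weight
    applied when the value's frequency in the other class exceeds threshold.
    """
    rng = random.Random(seed)
    own_counts, other_counts = Counter(own_values), Counter(other_values)
    candidates = sorted(own_counts)

    def fitness(value):
        score = own_counts[value]
        if other_counts[value] > threshold:
            score -= penalty * other_counts[value]
        return score

    population = [rng.choice(candidates) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]           # selection
        newcomers = [rng.choice(candidates)               # random replacement
                     for _ in range(pop_size - len(survivors))]
        population = survivors + newcomers
    ranked = sorted(set(population), key=fitness, reverse=True)
    return ranked[:top_k]


# Toy values of one attribute, invented for illustration.
anomaly_vals = [1.0] * 5 + [0.5] * 3 + [0.0] * 2
normal_vals = [0.0] * 6 + [0.5]
print(evolve_top_values(anomaly_vals, normal_vals))
```

Running the function once with the anomaly packets as `own_values` and once with the normal packets would mirror the "applied separately" bullet.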

Identifying nature of incoming packet

For each selected attribute value Fi in incoming packet
◦ If Fi ∈ Vi
   Si = (A * Frequency of Fi in Anomaly) - (Frequency of Fi in Normal)
◦ Else
   Si = 0

C = Σ Si

If C > 0
◦ Then Anomaly
Else
◦ Normal
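The scoring rule above translates fairly directly into code. The Python sketch below is illustrative only: the function name `classify_packet`, the dictionary layout of the frequency tables, and the interpretation of Vi as the set of values retained for attribute i are assumptions, and the frequencies in the example are made up.

```python
def classify_packet(packet, retained_values, anomaly_freq, normal_freq, weight):
    """Sum S_i over the selected attributes and label the packet by the sign of C.

    retained_values[attr] plays the role of V_i (assumed: values kept for attr).
    anomaly_freq[attr][value] / normal_freq[attr][value] are value frequencies.
    weight is the additional weight A applied to the anomaly frequency.
    """
    total = 0.0  # C
    for attr, allowed in retained_values.items():
        value = packet[attr]  # F_i
        if value in allowed:
            total += (weight * anomaly_freq[attr].get(value, 0)
                      - normal_freq[attr].get(value, 0))
        # otherwise S_i = 0 and nothing is added
    return "anomaly" if total > 0 else "normal"


# Toy tables, invented for illustration.
retained = {"logged_in": {0, 1}, "serror_rate": {0.0, 1.0}}
a_freq = {"logged_in": {0: 0.8, 1: 0.2}, "serror_rate": {1.0: 0.6, 0.0: 0.3}}
n_freq = {"logged_in": {0: 0.3, 1: 0.7}, "serror_rate": {0.0: 0.9, 1.0: 0.05}}

print(classify_packet({"logged_in": 0, "serror_rate": 1.0}, retained, a_freq, n_freq, weight=1))
print(classify_packet({"logged_in": 1, "serror_rate": 0.0}, retained, a_freq, n_freq, weight=1))
```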

Additional Weight

• Multiplied to the anomaly frequency
• Why? Generic anomalies have diverse values, unlike normal packets, which contain values in a particular range
• A trade-off between the accuracy and the false positive rate is required
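To see how the weight A shifts this trade-off, one could sweep A over a labelled sample and record accuracy and false positive rate at each setting. The Python sketch below assumes each packet has already been reduced to its summed anomaly and normal frequencies, so that C = A * fa - fn. The name `sweep_weight` and the sample numbers are hypothetical.

```python
def sweep_weight(packets, weights):
    """Return {A: (accuracy, false_positive_rate)} for the rule A * fa - fn > 0.

    packets: list of (anomaly_freq_sum, normal_freq_sum, is_anomaly) triples;
    must contain at least one normal packet.
    """
    normals = sum(1 for _, _, is_anomaly in packets if not is_anomaly)
    results = {}
    for a in weights:
        correct = 0
        false_pos = 0
        for fa, fn, is_anomaly in packets:
            predicted = a * fa - fn > 0
            if predicted == is_anomaly:
                correct += 1
            if predicted and not is_anomaly:
                false_pos += 1
        results[a] = (correct / len(packets), false_pos / normals)
    return results


# Toy sample, invented for illustration.
sample = [(0.9, 0.2, True), (0.3, 0.5, True), (0.2, 0.9, False), (0.4, 0.6, False)]
for a, (acc, fpr) in sweep_weight(sample, [1, 2]).items():
    print(f"A={a}: accuracy={acc:.2f}, FPR={fpr:.2f}")
```

In this toy sample, raising A from 1 to 2 catches one more anomaly but also flags one normal packet, which is the kind of trade-off the slide refers to.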

Results

• Tested against 50000 anomaly and 50000 normal packets from the KDDCup’99 dataset
• 88.5% accuracy with 6% false positive rate (FPR)
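For readers who want the confusion-matrix counts behind these two figures, they can be back-calculated, assuming the usual definitions accuracy = (TP + TN) / total and FPR = FP / (FP + TN). The slides do not state the counts or the detection rate; the Python sketch below only derives them under those assumed definitions, and `confusion_from_summary` is a hypothetical helper name.

```python
def confusion_from_summary(n_anomaly, n_normal, accuracy, fpr):
    """Recover TP/FN/FP/TN counts from class sizes, accuracy and FPR."""
    fp = round(fpr * n_normal)                      # normal packets flagged as anomaly
    tn = n_normal - fp
    tp = round(accuracy * (n_anomaly + n_normal)) - tn
    fn = n_anomaly - tp
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn}


counts = confusion_from_summary(50000, 50000, accuracy=0.885, fpr=0.06)
detection_rate = counts["tp"] / 50000
print(counts)                                       # tp=41500, fn=8500, fp=3000, tn=47000
print(f"implied detection rate: {detection_rate:.2f}")
```

Under these assumptions, the reported figures imply that roughly 83% of the anomaly packets were detected.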

Future Work

• Focus on specific anomaly types
• Better attribute selection algorithm?
◦ oneR
◦ Entropy based
◦ Chi-squared
◦ randomForest
• Better classification technique?
◦ Clustering – Hierarchical, K-Means
◦ Decision Trees

QUESTIONS?