Anomaly Detection, Jubatus Casual Talks #2. Preferred Infrastructure, Inc. Researcher & Jubatus Team Leader, HIDO Shohei

Jubatus Casual Talks #2: Introduction to Anomaly Detection

DESCRIPTION

Slides for "Introduction to Anomaly Detection", presented at Jubatus Casual Talks #2 on 2013/12/14.

Citation preview

  • 1. Jubatus Casual Talks #2 Preferred Infrastructure Jubatus
  • 2. HIDO Shohei (Twitter ID: @sla). 2006-2012: IBM. 2012-: Jubatus. 2013-: Preferred Infrastructure America, Inc., Chief Research Officer
  • 4. Agenda
  • 5. [Figure: scatter plot with both axes from -5 to 5; one point labeled "Outlier"]
  • 6. (1/2)
  • 7. (2/2) DDoS; Twitter
  • 8. Error, Rare, Novelty, Defect, Outlier, Deviation, Noise, Anomaly, Fault, Fraud, Intrusion
  • 9. [Figure: scatter plot with both axes from -5 to 5, annotated with the terms Noise, Error, Rare, Outlier, Intrusion, Defect, Novelty, Deviation, Fault, Fraud]
  • 10. Grubbs, 1969: "An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs." Hawkins, 1980: "An observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism." Barnett & Lewis, 1994: "An observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data."
  • 11. HDD; HDD, DDoS
  • 12. 1. (Outlier detection) 2. (Change point detection) 3. (Anomaly detection, etc)
  • 13. i.i.d. [Figure: scatter plot with both axes from -5 to 5]
  • 14. [Figure: plot with x-axis from 0 to 30 and y-axis from -5 to 5]
  • 15. [Figure: two plots, each with x-axis from 0 to 30 and y-axis from -5 to 5]
  • 16. Agenda
  • 17. i.i.d.; Unix
  • 18. (1/3) Minimum Volume Ellipsoid estimation [Rousseeuw, 1985]; Minimum Covariance Determinant [Rousseeuw, 1999]
  • 19. (2/3) RIPPER [Cohen, Fast effective rule induction, Machine Learning, 1995]
  • 21. Iris33, USPS10256
  • 22. (3/3) One-class SVM; Local Outlier Factor (LOF)
  • 23. One-class SVM (OC-SVM) [Schoelkopf et al., 1999]; Support Vector Machine (SVM); Manevitz et al., One-Class SVMs for Document Classification, 2001
  • 24. LOF: Identifying Density-Based Local Outliers [Breunig, SIGMOD 2000]. [Figure: LOF for example points X1, X2, X3]
  • 25. normal: [, 2009]; faulty
  • 26. [Sugiyama & Borgwardt, NIPS 2013] K=4; K (5100); K=20; CR. https://github.com/mahito-sugiyama/sampling-outlier-detection/
  • 27. Agenda
  • 28. (1/3) R; S; CRAN. LOF: dprep (lofactor), Rlof (lof). One-class SVM: kernlab, LIBSVM (one-svc). ADM3 (ADM3). cpm (detectChangePoint). OutlierDC. mvoutlier
  • 29. (2/3) OSS. Weka: Java. SHOGUN: SVM; One-class SVM. Scikits-learn: Python; One-class SVM, EllipticEnvelop. ELKI: AGPL; Distance Based & Spatial outlier detection; LOF, DB-outlier, LOCI, LDOF, OPTICS-OF, EM-Outlier. RapidMiner: YALE, Weka, R; LOF, DB-outlier, Class Outlier Factor. SAS
  • 30. (3/3) NEC: Smart Sifter, Change Detector; Malheur; Naive Bayes; FICO Falcon Fraud Manager: Neural network, Multi-layered self-calibrating analytics; IBM, SAS, Oracle, NEC
  • 31. Agenda
  • 32. Machine Learning that Matters, Kiri L. Wagstaff, ICML, 2012.
  • 33. Edge-heavy data: exhaust data. Edge-Heavy Data: CPS, GICTF 2012, http://www.gictf.jp/doc/20120709GICTF.pdf
  • 34. Jubatus
  • 35. Agenda
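
The Local Outlier Factor cited on slide 24 [Breunig, SIGMOD2000] can be sketched in plain Python roughly as follows. This is a minimal illustration added to the transcript, not code from the talk or from any of the libraries listed on the tool slides. The function name lof_scores, the parameter k, and the sample points are all assumptions chosen for the example. It follows the usual definitions (k-distance, reachability distance, local reachability density) and computes brute-force pairwise distances, so it is only suitable for small inputs.

```python
import math


def lof_scores(points, k=3):
    """Return one Local Outlier Factor score per point (brute force).

    Scores near 1 mean the point is about as dense as its neighbours;
    scores well above 1 mean it sits in a sparser region than they do.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # Pairwise Euclidean distances (math.dist needs Python 3.8+).
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_distance = []  # distance from each point to its k-th nearest neighbour
    neighbours = []  # indices within that distance (ties included)
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]
        k_distance.append(kd)
        neighbours.append([j for j in order if dist[i][j] <= kd])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(i, j) = max(k_distance[j], dist[i][j]).
    lrd = []
    for i in range(n):
        total = sum(max(k_distance[j], dist[i][j]) for j in neighbours[i])
        lrd.append(len(neighbours[i]) / total if total > 0 else math.inf)

    # LOF: mean density of the neighbours divided by the point's own density.
    scores = []
    for i in range(n):
        if math.isinf(lrd[i]):
            # Assumption of this sketch: exact duplicates count as inliers.
            scores.append(1.0)
            continue
        mean_neighbour_lrd = sum(lrd[j] for j in neighbours[i]) / len(neighbours[i])
        scores.append(mean_neighbour_lrd / lrd[i])
    return scores


if __name__ == "__main__":
    # Hypothetical data: a tight cluster plus one isolated point at (5, 5).
    sample = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
    for point, score in zip(sample, lof_scores(sample, k=3)):
        print(point, round(score, 2))
```

On this sample the isolated point (5, 5) should receive by far the largest score, while the clustered points stay close to 1. The choice of k trades off sensitivity to small local clusters against stability of the density estimate.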