1
Negative selection algorithms: from the thymus to V-detector
Dissertation defense
Zhou Ji
Major professor: Prof. Dasgupta
Advisory committee: Dr. Lin, Dr. McCauley, Dr. Phan
2
Outline
• Background of the research area
• V-detector: a new algorithm
• Experiments
• Discussion on applicability and others
• Conclusions
4
Related research areas
Artificial Intelligence
  … …
  Biology-inspired methods
    Neural network
    Evolutionary computation
    Artificial immune system (AIS)
      Immune network
      Clonal selection
      Negative selection algorithms
      Other models
    … …
  … …
7
Biological metaphor: negative selection in the thymus
How T cells mature in the thymus
The immature T cells have diversified receptors.
Those that recognize self are eliminated. The rest can become mature T cells.
8
Basic idea of negative selection algorithms:
The problem to solve: anomaly detection or one-class classification
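The idea can be illustrated with a minimal sketch of a real-valued negative selection algorithm with constant-size detectors. This is an illustration under stated assumptions, not the dissertation's implementation: the data are assumed to lie in the unit hypercube, and the function names, the fixed radius, and the retry cap `max_tries` are all hypothetical.

```python
import math
import random

def generate_detectors(self_samples, n_detectors, radius, dim, rng, max_tries=100_000):
    """Random generation + censoring: keep only candidates that match no self sample."""
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        # Assumption: the data space is the unit hypercube [0, 1]^dim.
        candidate = tuple(rng.random() for _ in range(dim))
        # Censoring step: discard any candidate within `radius` of a self sample.
        if all(math.dist(candidate, s) > radius for s in self_samples):
            detectors.append(candidate)
    return detectors

def is_anomalous(point, detectors, radius):
    """A point is flagged as nonself if at least one detector matches it."""
    return any(math.dist(point, d) <= radius for d in detectors)
```

Only self (normal) samples are used during training, which is what makes this a one-class classification setup: anything matched by a surviving detector is reported as an anomaly.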
11
Components that make up a negative selection algorithm
• Data and detector representation
  • Binary (or string) representation
  • Real-valued representation; detectors as hyperspheres or hyper-rectangles
  • Hybrid representation
• Generation/elimination mechanism
  • Random generation + censoring
  • Genetic algorithm
  • Greedy algorithm or other deterministic algorithm
• Matching rule
  • Rcb (r contiguous bits) for binary representation
  • Euclidean distance-based for real-valued representation
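As an example of a matching rule, the r-contiguous-bits rule can be sketched as follows: a detector matches a sample when the two strings agree in at least r consecutive positions. The function name `rcb_match` and the choice of Python strings of '0'/'1' as the representation are assumptions made for illustration.

```python
def rcb_match(detector: str, sample: str, r: int) -> bool:
    """Return True if detector and sample agree in at least r contiguous positions."""
    if len(detector) != len(sample):
        raise ValueError("detector and sample must have the same length")
    if r < 1:
        raise ValueError("r must be at least 1")
    run = 0  # length of the current run of agreeing positions
    for a, b in zip(detector, sample):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False
```

For instance, "10110" and "00111" agree at positions 2 to 4 (counting from 1 would be positions 2, 3, 4), so they match for r = 3 but not for r = 4.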
12
Major issues in negative selection algorithms
• Number of detectors: affects the efficiency of generation and detection
• Detector coverage: affects the accuracy of detection
• Algorithm for generating detectors: linked to the efficiency and quality of the detector set
14
V-detector: new development in NSA
Important features of V-detector:
1. Variable-sized detectors
2. Estimation of detector coverage
3. Boundary-aware interpretation of self samples
4. A generic algorithm
15
Detectors can be just a point
Detectors in their basic form (constant size)
Feature 1: variable size
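A hedged sketch of how variable-sized detectors might be generated: each random candidate is given the largest radius that keeps it clear of the self region, taken here as the distance to the nearest self sample minus a self radius. The names `make_detector`, `generate_v_detectors`, `self_radius`, and `max_tries`, and the unit-hypercube data space, are assumptions for illustration rather than details taken from the slides.

```python
import math
import random

def make_detector(candidate, self_samples, self_radius):
    """Return (center, radius), or None if the candidate lies in the self region."""
    nearest = min(math.dist(candidate, s) for s in self_samples)
    radius = nearest - self_radius  # grow until touching the nearest self hypersphere
    if radius <= 0:
        return None
    return candidate, radius

def generate_v_detectors(self_samples, self_radius, dim, max_detectors, rng, max_tries=10_000):
    """Generate variable-sized detectors from random candidates in [0, 1]^dim."""
    detectors = []
    for _ in range(max_tries):
        if len(detectors) >= max_detectors:
            break
        candidate = tuple(rng.random() for _ in range(dim))
        # Skip candidates already covered by an existing detector.
        if any(math.dist(candidate, c) <= r for c, r in detectors):
            continue
        detector = make_detector(candidate, self_samples, self_radius)
        if detector is not None:
            detectors.append(detector)
    return detectors
```

Because a detector far from any self sample receives a large radius, large empty regions can be covered by a few detectors, while small detectors fill the gaps near the self region.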
17
How many detectors to generate:
Approach in earlier works
V-detector’s approach
Feature 2: coverage estimate
18
How to estimate the coverage:
Feature 2: coverage estimate
A random point may be
• in the self region
• in the nonself region, but already covered
• in the nonself region, not covered yet
The more consecutive “already covered” points are sampled, the more coverage has been achieved.
1. An intuitive estimate
2. Hypothesis testing (Is the target coverage achieved?)
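One way to make the "consecutive covered points" argument concrete, offered as a sketch and not necessarily the test used in the dissertation: if the true covered fraction of the nonself region is p, then m independent random nonself points are all covered with probability p^m. If p were below a target coverage c0, that probability would be below c0^m, so choosing m with c0^m ≤ α bounds the chance of stopping too early by α. The function name and the parameters `target_coverage` and `alpha` are hypothetical.

```python
import math

def required_consecutive_hits(target_coverage, alpha):
    """Smallest m such that target_coverage ** m <= alpha.

    Observing m consecutive covered random nonself points then rejects
    "coverage < target_coverage" at significance level alpha.
    """
    if not (0.0 < target_coverage < 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("target_coverage and alpha must lie strictly between 0 and 1")
    return math.ceil(math.log(alpha) / math.log(target_coverage))
```

Under these assumptions, a 90% coverage target with α = 0.05 requires 29 consecutive covered points; higher targets require longer runs, which is why demanding more coverage costs more generation time.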
19
What does one self sample point mean?
Point-wise interpretation of self samples
Feature 3: boundary-aware
Smaller matching threshold Large matching threshold
20
“The whole is more than the sum of its parts.”
Self sample could be near the boundary. The neighboring points provide the hint.
Feature 3: boundary-aware
21
How to be boundary-aware by using detectors:
Feature 3: boundary-aware
Point-wise interpretation
Boundary-aware interpretation
Large threshold
Small threshold
22
V-detector as a generic algorithm
Components that can be plugged in:
• Data representation
• Distance measure
• Matching rule
The other three features are available for different customized variations.
Feature 4: a generic algorithm
23
Example: generalized Euclidean distance
Minkowski distance of order m (m-norm distance or L-m distance)
Feature 4: a generic algorithm
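For reference, the Minkowski distance of order m between points x and y is D_m(x, y) = (Σ_i |x_i − y_i|^m)^(1/m); m = 2 gives the Euclidean distance and m = 1 the Manhattan distance. A small sketch of such a pluggable distance measure follows. The function name `minkowski_distance` is an assumption, and how the distance is wired into V-detector is not shown here.

```python
def minkowski_distance(x, y, m):
    """Minkowski (L-m) distance between two equal-length sequences, for m >= 1."""
    if m < 1:
        raise ValueError("order m must be at least 1")
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension")
    return sum(abs(a - b) ** m for a, b in zip(x, y)) ** (1.0 / m)
```

Changing m changes the shape of a detector's "hypersphere": a diamond for m = 1, a ball for m = 2, and a shape approaching a hypercube as m grows.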
25
V-detector's advantages
• Efficiency: fewer detectors, fast generation
• Coverage confidence (reliability)
• Applicable to more applications
27
Extensive experiments
• Synthetic 2-D data
• Real-world data
  • Famous iris data
  • Air pollution
  • Biomedical data
  • Gene expression
  • Indian Telugu
  • Ball bearing measurement
  • KDD cup data
  • Dental image
28
2-D synthetic data
Training points (1000) Test data (1000 points) and the ‘real shape’ we try to learn
30
Actual detectors generated
Detector set based on 1000 training points
Detector set based on 100 training points
33
Comparison with other negative selection algorithms: iris data
Training data     Algorithm     Detection rate (%)    False alarm rate (%)    Number of detectors
                                Mean      SD          Mean      SD            Mean      SD
Setosa 100%       MILA          95.16     1.79        0         0             1000*     0
                  NSA           100       0           0         0             1000      0
                  V-detector    99.98     0.14        0         0             20        7.87
Setosa 50%        MILA          94.02     2.44        8.42      1.56          1000*     0
                  NSA           100       0           11.18     2.17          1000      0
                  V-detector    99.97     0.17        1.32      0.95          16.44     5.63
Versicolor 100%   MILA          84.37     2.79        0         0             1000*     0
                  NSA           95.67     0.69        0         0             1000      0
                  V-detector    85.95     2.44        0         0             153.24    38.8
Versicolor 50%    MILA          84.46     2.70        19.60     2.00          1000*     0
                  NSA           96        0.45        22.2      1.25          1000      0
                  V-detector    88.3      2.77        8.42      2.12          110.08    22.61
Virginica 100%    MILA          75.75     2.01        0         0             1000*     0
                  NSA           92.51     0.74        0         0             1000      0
                  V-detector    81.87     2.78        0         0             218.36    66.11
Virginica 50%     MILA          88.96     2.04        24.98     2.56          1000*     0
                  NSA           97.18     0.71        33.26     0.96          1000      0
                  V-detector    93.58     2.33        13.18     3.24          108.12    30.74
38
Comparison with SVM
• On disconnected 2-D self region
• On reduced representation of dental images
40
NSA’s applicability
• Applicable scenario
  • Large amount of self (normal) samples
  • Rare or no abnormal samples
• Another possible usage: “negative database”
• When it is not appropriate: for example, when the number of self samples is small
41
Comparison with other methods
• Other negative selection algorithms
• SVM (support vector machines)
  • One-class SVM is comparable.
  • The kernel function is very important for SVM.
42
Conclusions
• Review of negative selection algorithms
• V-detector: a new development
  • High efficiency
  • Generic algorithm
  • Real-world application
• Prospect of NSA and AIS in general
43
My publications for this dissertation
• Dasgupta, Ji, Gonzalez, Artificial immune system (AIS) research in the last five years, IEEE CEC 2003
• Ji, Dasgupta, Augmented negative selection algorithm with variable-coverage detectors, IEEE CEC 2004
• Ji, Dasgupta, Real-valued negative selection algorithm with variable-sized detectors, GECCO 2004
• Ji, Dasgupta, Estimating the detector coverage in a negative selection algorithm, GECCO 2005
• Ji, A boundary-aware negative selection algorithm, ASC 2005
• Ji, Dasgupta, Applicability Issues of the real-valued negative selection algorithms, GECCO 2006
• Ji, Dasgupta, Analysis of Dental Images using Artificial Immune Systems, IEEE CEC 2006
• Ji, Dasgupta, Revisiting negative selection algorithms, revised submission to the Evolutionary Computation Journal