Graph-based Iterative Hybrid Feature Selection
Erheng Zhong† Sihong Xie† Wei Fan‡ Jiangtao Ren† Jing Peng# Kun Zhang$
†Sun Yat-sen University  ‡IBM T. J. Watson Research Center
#Montclair State University  $Xavier University of Louisiana
Where we are
Supervised feature selection
Unsupervised feature selection
Semi-supervised feature selection
Hybrid: supervised selection to include key features, then improve with a semi-supervised approach
Supervised Feature Selection
Sufficient labeled data -> feature selection -> effective features
Insufficient labeled data (the sample selection bias problem) -> feature selection -> ineffective features
With biased labeled data, only feature 2 will be selected, even though feature 1 is also useful!
Toy example (1)
Labeled data: A(1,1,1,1; red), B(1,-1,1,-1; blue). Unlabeled data: C(0,1,1,1; red), D(0,-1,1,1; red).
Both features 2 and 4 are correlated with the class based on A and B, so both are selected by supervised feature selection.
Semi-supervised Feature Selection
A few labeled data + many distinct unlabeled data -> feature selection -> effective features
Many unlabeled data, but indistinctive -> feature selection -> ineffective features
Toy example (2)
A semi-supervised approach, spectral-based feature selection: features are ranked according to the smoothness between data points and the consistency with label information.
Feature 2 will be selected if only one feature is desired.
Underlying assumption: instances that are closer should be in the same cluster.
[Figure: the clustering induced by the selected feature violates the label information.]
Solution: Hybrid
Labeled data insufficient -> sample selection bias -> supervised feature selection fails
Unlabeled data indistinct -> data from different classes are not separated -> semi-supervised feature selection fails
Both have disadvantages; how to address them? Combine!
Hybrid Feature Selection [IteraGraph_FS]
A few labeled data -> supervised feature selection -> most critical features -> better distance measure
Better distance measure -> many unlabeled data become distinct! -> semi-supervised feature selection -> effective features -> good distance measure
The two stages iterate: each round's selected features refine the distance measure for the next round.
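A minimal sketch of the iterative loop, assuming hypothetical helpers supervised_select, propagate_labels, and semi_supervised_select (the names and signatures are ours, not the paper's):

```python
import numpy as np

def itera_graph_fs(X_lab, y_lab, X_unlab, n_iters=5, s=0.1):
    """Sketch of the hybrid loop. supervised_select, propagate_labels and
    semi_supervised_select are hypothetical helpers standing in for any
    concrete supervised selector, label-propagation step and
    semi-supervised selector."""
    feats = supervised_select(X_lab, y_lab)          # most critical features
    for _ in range(n_iters):
        # propagate labels using distances in the selected subspace
        conf, y_pred = propagate_labels(X_lab[:, feats], y_lab,
                                        X_unlab[:, feats])
        # keep only the top s% most confident predictions
        top = np.argsort(conf)[-max(1, int(s * len(conf))):]
        X_new = np.vstack([X_lab, X_unlab[top]])
        y_new = np.concatenate([y_lab, y_pred[top]])
        feats = semi_supervised_select(X_new, y_new)  # refine the feature set
    return feats
```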
Toy example (3)
Feature 2 & 4 are selected based on A and B using a supervised approach
A(1,1;Red) B(-1,-1;Blue)C(1,1;Red) D(-1,1;Red)
Dimension Reduction
Prediction
A(1,1;Red) B(-1,-1;Blue)C(1,1;Red) D(-1,1;Red)
Feature Selection
Only feature 4 is useful
Supervised feature selection
Semi-supervised feature selection
Properties of feature selection
The distance between any two examples is approximately the same under a high-dimensional feature space. [Theorem 3.1]
Feature selection can obtain a more distinguishable distance measure, which leads to a better confidence estimate. [Theorem 3.2]
Theorems 3.1 and 3.2
3.1: As dimensionality increases, the nearest neighbor approaches the farthest neighbor.
3.2: A more distinguishable similarity measure yields a better classification confidence matrix.
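Theorem 3.1 mirrors the well-known distance-concentration effect. The quick simulation below (uniform random data, our choice of setup) shows the nearest-to-farthest distance ratio approaching 1 as dimensionality grows:

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000, 10000):
    X = rng.random((500, d))                  # 500 uniform points in [0,1]^d
    q = rng.random(d)                         # a random query point
    dist = np.linalg.norm(X - q, axis=1)
    print(f"d={d:>5}: nearest/farthest = {dist.min() / dist.max():.3f}")
# The ratio climbs toward 1: the nearest neighbor approaches the farthest.
```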
[Figure: a nearest-neighbor example over four points. Before feature selection, all pairwise distances equal 2, so the confidences for unlabeled points 2 and 4 are uninformative: 2: (0.5 vs 0.5), 4: (0.5 vs 0.5). After feature selection the distances become distinguishable (1, 1, 2, 2, 3), and the confidences become 2: (0.67 vs 0.33) and 4: (0.33 vs 0.67).]
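One plausible way to reproduce the figure's numbers is inverse-distance neighbor weighting (our assumption about how the confidences are computed):

```python
def confidence(d_red, d_blue):
    """Class confidence from inverse-distance weights of the two neighbors.
    (Assumed weighting scheme, chosen to match the figure's numbers.)"""
    w_red, w_blue = 1.0 / d_red, 1.0 / d_blue
    z = w_red + w_blue
    return round(w_red / z, 2), round(w_blue / z, 2)

print(confidence(2, 2))  # (0.5, 0.5)   -- equal distances carry no signal
print(confidence(1, 2))  # (0.67, 0.33) -- distinguishable distances do
```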
Semi-supervised Feature Selection
Graph-based [label propagation]: expand the labeled set by adding unlabeled data, together with their predicted labels, whose prediction confidence is in the top s%.
Then perform feature selection on the new labeled set.
[Figure: label propagation grows the labeled set from 4 to 6 labeled points.]
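A compact sketch of this expansion step, using the standard iterative propagation F <- alpha*S*F + (1-alpha)*Y of Zhou et al. (our choice; the paper's propagation rule may differ):

```python
import numpy as np

def propagate_and_expand(W, y, s=0.2, alpha=0.9, n_steps=50):
    """W: (n, n) affinity matrix; y: int labels in {0, 1}, -1 if unlabeled.
    Returns y with the top s% most confident unlabeled points filled in."""
    n = len(y)
    d = W.sum(axis=1)
    S = W / (np.sqrt(np.outer(d, d)) + 1e-12)  # symmetric normalization
    Y = np.zeros((n, 2))
    Y[y == 0, 0] = 1.0
    Y[y == 1, 1] = 1.0
    F = Y.copy()
    for _ in range(n_steps):                   # F <- alpha*S*F + (1-alpha)*Y
        F = alpha * (S @ F) + (1 - alpha) * Y
    P = F / (F.sum(axis=1, keepdims=True) + 1e-12)  # rows -> confidences
    unlab = np.where(y == -1)[0]
    top = unlab[np.argsort(P[unlab].max(axis=1))[-max(1, int(s * len(unlab))):]]
    y_new = y.copy()
    y_new[top] = P[top].argmax(axis=1)         # adopt high-confidence labels
    return y_new
```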
Confidence and Margin (Lemma 3.2)
[Figure: under a bad distance measure, a point's near hit and near miss are similarly far away, giving low confidence; under a better distance measure they are well separated, giving high confidence.]
A larger margin can be achieved via distance manipulation.
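Lemma 3.2's intuition can be expressed with the Relief-style hypothesis margin (our framing, not necessarily the paper's exact definition): the gap between a point's distance to its near miss and to its near hit:

```python
import numpy as np

def hypothesis_margin(X, y, i):
    """Margin of point i: distance to its near miss (closest other-class
    point) minus distance to its near hit (closest same-class point).
    A larger margin means a more confident classification of i."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf                        # exclude the point itself
    near_hit = d[y == y[i]].min()
    near_miss = d[y != y[i]].min()
    return near_miss - near_hit
```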
Selection Strategy Comparison (Theorem 3.3)
Random selection: low average confidence, small margin.
Our confidence-based strategy: high average confidence, larger margin (by Lemma 3.2).
Experiment setup
Data sets: handwritten digit recognition, biomedical and gene expression data, text documents [Reuters-21578].
Compared approaches: supervised feature selection (SFFS); semi-supervised approach (sSelect) [SDM07].
Data Set Description
Feature Quality Study
Conclusions
Labeled information: yields the critical features and better confidence estimates.
Unlabeled data: improve the chosen feature set.
Flexible: can incorporate many feature selection methods that aim at revealing the relationship between data points.