
Page 1:

Graph-based Iterative Hybrid Feature Selection

Erheng Zhong† Sihong Xie† Wei Fan‡ Jiangtao Ren† Jing Peng# Kun Zhang$

† Sun Yat-sen University   ‡ IBM T. J. Watson Research Center

# Montclair State University   $ Xavier University of Louisiana

Page 2:

Where we are

Supervised Feature Selection
Unsupervised Feature Selection
Semi-supervised Feature Selection
Hybrid: supervised selection to include key features, improved with a semi-supervised approach

Page 3:

Supervised Feature Selection

Sufficient labeled data → feature selection → effective features

Insufficient labeled data (sample selection bias problem) → feature selection → ineffective features

Only feature 2 will be selected, but feature 1 is also useful!

Page 4:

Toy example (1)

Labeled data: A(1,1,1,1; red), B(1,-1,1,-1; blue). Unlabeled data: C(0,1,1,1; red), D(0,-1,1,1; red).

Both features 2 and 4 are correlated with the class based on A and B, so both are selected by supervised feature selection.
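To make this concrete, here is a minimal sketch that scores each feature by its correlation with the class on A and B (a simple stand-in for the supervised selector; the paper's actual criterion may differ):

```python
import numpy as np

# Labeled toy points A and B with their classes (red = +1, blue = -1).
X = np.array([[1,  1, 1,  1],   # A
              [1, -1, 1, -1]])  # B
y = np.array([1, -1])

def label_correlation(col, y):
    """Absolute correlation of one feature with the class; constant features score 0."""
    if col.std() == 0:
        return 0.0
    return abs(np.corrcoef(col, y)[0, 1])

scores = [label_correlation(X[:, j], y) for j in range(X.shape[1])]
print(scores)  # features 2 and 4 (indices 1 and 3) score 1.0; the rest score 0.0
```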

Page 5:

Semi-supervised Feature Selection

A few labeled data + many distinct unlabeled data → feature selection → effective features

Many unlabeled data, but indistinct → feature selection → ineffective features

Page 6:

Toy example (2)

A semi-supervised approach, “Spectral Based Feature Selection”: features are ranked according to their smoothness between data points and their consistency with the label information.

Feature 2 will be selected if only one feature is desired.

Instances that are closer should be in the same cluster.

[Figure: graph over the data points; the clustering induced by the selected feature violates the label information.]
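One concrete instance of such a spectral criterion is a Laplacian-score-style smoothness ranking; the sketch below is an assumption in that spirit, not necessarily the exact formula the slide refers to:

```python
import numpy as np

def spectral_feature_scores(X, sigma=1.0):
    """Score each feature by its smoothness over an RBF similarity graph:
    a small f' L f / f' D f means the feature varies little between
    similar points (Laplacian-score style; lower is better)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    W = np.exp(-sq / (2 * sigma ** 2))                   # similarity graph
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))                           # degree matrix
    L = D - W                                            # graph Laplacian
    scores = []
    for j in range(X.shape[1]):
        f = X[:, j] - X[:, j].mean()
        denom = f @ D @ f
        scores.append((f @ L @ f) / denom if denom > 0 else np.inf)
    return np.array(scores)
```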

Page 7:

Solution: Hybrid

Labeled data insufficient → sample selection bias → supervised selection fails

Unlabeled data indistinct → data from different classes are not separated → semi-supervised selection fails

Both have disadvantages; how to address them? Combine!

Page 8:

Hybrid Feature Selection [IteraGraph_FS]

A few labeled data → supervised feature selection → most critical features → good distance measure

Good distance measure → many unlabeled data become distinct → semi-supervised feature selection → effective features → better distance measure → iterate
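The loop can be sketched with off-the-shelf components. Below is a minimal reconstruction using scikit-learn stand-ins (SelectKBest/f_classif for the supervised step, LabelPropagation for the graph-based step); it follows the slide's cycle, not the paper's exact IteraGraph_FS pseudocode:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.semi_supervised import LabelPropagation

def itera_graph_fs(X, y, k=10, n_iters=5, s=0.1):
    """Iterative hybrid feature selection (sketch).
    y uses -1 for unlabeled points, following sklearn's convention."""
    y = y.copy()
    for _ in range(n_iters):
        labeled = y != -1
        # Supervised step: pick the k most critical features on the labeled set.
        selector = SelectKBest(f_classif, k=k).fit(X[labeled], y[labeled])
        Xr = selector.transform(X)  # reduced space = better distance measure
        if not (~labeled).any():
            break
        # Semi-supervised step: propagate labels on a graph in the reduced space.
        lp = LabelPropagation(kernel='rbf').fit(Xr, y)
        conf = lp.predict_proba(Xr[~labeled]).max(axis=1)
        # Move the top s% most confident unlabeled points into the labeled set.
        n_add = max(1, int(s * (~labeled).sum()))
        idx = np.where(~labeled)[0][np.argsort(-conf)[:n_add]]
        y[idx] = lp.transduction_[idx]
    return selector.get_support(indices=True), y
```

Each pass both grows the labeled set and refreshes the selected features, so the distance measure used to build the graph improves from one iteration to the next.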

Page 9:

Toy example (3)

Features 2 and 4 are selected based on A and B using a supervised approach.

Dimension reduction to (feature 2, feature 4): A(1,1; red), B(-1,-1; blue), C(1,1; red), D(-1,1; red).

Prediction assigns labels to the unlabeled points; semi-supervised feature selection on the reduced data then shows that only feature 4 is useful.

Page 10:

Properties of feature selection

The distance between any two examples is approximately the same under a high-dimensional feature space. [Theorem 3.1]

Feature selection can obtain a more distinguishable distance measure, which leads to a better confidence estimate. [Theorem 3.2]
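A quick numerical illustration of the distance-concentration effect behind Theorem 3.1 (an illustration only, not the theorem's proof):

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 20, 200, 2000):
    X = rng.random((100, d))
    # Distances from the first point to all others.
    dist = np.linalg.norm(X[1:] - X[0], axis=1)
    # As d grows, this ratio approaches 1: the nearest
    # neighbor approaches the farthest neighbor.
    print(d, round(dist.min() / dist.max(), 3))
```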

Page 11:

Theorems 3.1 and 3.2

3.1: As dimensionality increases, the nearest neighbor approaches the farthest neighbor.

3.2: A more distinguishable similarity measure yields a better classification confidence matrix.

[Figure: a four-node graph (nodes 1–4, with nodes 1 and 3 labeled). Before feature selection all edge distances equal 2, so the confidence is (0.5 vs 0.5) for node 2 and (0.5 vs 0.5) for node 4. After feature selection the distances become distinguishable (1, 1, 2, 2, 3), giving confidence (0.67 vs 0.33) for node 2 and (0.33 vs 0.67) for node 4.]
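The confidences in the figure are consistent with a simple inverse-distance vote between the two labeled neighbors; the weighting below is an assumption for illustration, and the paper's propagation may compute it differently:

```python
import numpy as np

def confidence(d_same, d_other):
    """Inverse-distance vote between the nearest labeled
    neighbor of each class."""
    w = np.array([1.0 / d_same, 1.0 / d_other])
    return w / w.sum()

print(confidence(2, 2))  # before selection: [0.5, 0.5]
print(confidence(1, 2))  # after selection:  [0.667, 0.333]
```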

Page 12:

Semi-supervised Feature Selection

Graph-based [label propagation]: expand the labeled set by adding unlabeled points whose predicted labels have high confidence (top s%).

Perform feature selection on the new labeled set.

[Figure: label propagation grows the labeled set from 4 to 6 points.]

Page 13:

Confidence and Margin (Lemma 3.2)

Bad distance measure → near hit and near miss are about equally far → low confidence

Better distance measure → near hit close, near miss far → high confidence

A larger margin can be achieved via distance manipulation.
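Lemma 3.2 ties confidence to the hypothesis margin; here is a minimal Relief-style margin computation for illustration (the paper's formal definition may differ):

```python
import numpy as np

def hypothesis_margin(x, y_x, X, y):
    """Relief-style margin: distance to the nearest point of a different
    class (near miss) minus distance to the nearest point of the same
    class (near hit). A better distance measure enlarges this margin."""
    d = np.linalg.norm(X - x, axis=1)
    near_hit = d[(y == y_x) & (d > 0)].min()
    near_miss = d[y != y_x].min()
    return near_miss - near_hit
```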

Page 14:

Selection Strategy Comparison (Theorem 3.3)

Random selection → low average confidence → small margin

Our confidence-based strategy → high average confidence → larger margin (by Lemma 3.2)
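A toy comparison on synthetic confidence scores illustrates the premise: selecting the top s% by confidence yields a higher average confidence than random selection (illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
conf = rng.random(1000)   # predicted confidences for the unlabeled pool
n = 100                   # number of points to add (s% of the pool)

top = np.sort(conf)[-n:]                    # confidence-based strategy
rand = rng.choice(conf, n, replace=False)   # random selection
print(top.mean(), rand.mean())  # top-s% average confidence is higher
```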

Page 15:

Experiment setup

Data sets: handwritten digit recognition, biomedical and gene expression data, text documents [Reuters-21578]

Baselines: supervised feature selection (SFFS); semi-supervised approach (sSelect [SDM07])

Page 16:

Data Set Description

Page 17:

Feature Quality Study

Page 18:

Conclusions

Labeled information → critical features and better confidence estimates

Unlabeled data → improve the chosen feature set

Flexible → can incorporate many feature selection methods that aim at revealing the relationships between data points.