Feature Selection with Conditional Mutual
Information Maximin in Text Categorization
Department of Computer Science,
Hong Kong University of Science and Technology
Outline
Introduction
Information Theory Review
Conditional Mutual Information Maximin (CMIM)
Experimental Results
Conclusion
Introduction
The text categorization pipeline:
Preprocessing
Feature selection (the important step)
filter method
wrapper method
embedded method
Training the classifier
Testing
Classic Feature Selection Methods
Ranking Criterion
Information gain
Mutual information
χ² test
Drawback: each feature is scored on its own, so relationships among features are ignored
Documents  w1  w2  w3  class
d1          2   2   0   c1
d2          1   1   0   c1
d3          0   0   0   c2
d4          0   0   2   c2
(w1 and w2 are perfectly correlated: a ranking criterion scores them identically and would select both, although the second adds nothing over the first.)
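To make the drawback concrete, here is a minimal sketch (my own illustration, not from the slides) that ranks the table's features by their empirical mutual information with the class: w1 and w2 receive identical scores, so a top-2 ranking keeps two perfectly redundant features.

```python
from collections import Counter
from math import log2

# The toy table above: (w1, w2, w3, class) for documents d1..d4.
docs = [
    (2, 2, 0, "c1"),
    (1, 1, 0, "c1"),
    (0, 0, 0, "c2"),
    (0, 0, 2, "c2"),
]

def mutual_information(xs, ys):
    """Empirical I(X;Y) in bits, from paired samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

labels = [d[3] for d in docs]
for j, name in enumerate(("w1", "w2", "w3")):
    print(name, round(mutual_information([d[j] for d in docs], labels), 3))
# w1 and w2 both score 1.0 bit, so ranking selects both, even though w2 is
# a duplicate of w1; a conditional score would catch this: I(w2; C | w1) = 0.
```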
Information Theory Review
Entropy: H(X) = -Σ_x p(x) log p(x) = -E[log p(X)]
Mutual Information: I(X;Y) = E[log (p(X,Y) / (p(X) p(Y)))]
Conditional MI: I(X;Y|Z) = E[log (p(X,Y|Z) / (p(X|Z) p(Y|Z)))]
[Venn diagram of H(X) and H(Y), showing H(X|Y), I(X;Y), and I(X;Y|Z)]
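These quantities can be estimated from discrete samples via joint entropies, using the identities I(X;Y) = H(X) + H(Y) - H(X,Y) and I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z). A minimal plug-in estimator sketch (my own illustration, not from the slides):

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Empirical H(X) in bits from a sequence of discrete samples."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def mi(xs, ys):
    """Empirical I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def cmi(xs, ys, zs):
    """Empirical I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (entropy(list(zip(xs, zs))) + entropy(list(zip(ys, zs)))
            - entropy(list(zip(xs, ys, zs))) - entropy(zs))
```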
Intuition of CMIM Algorithm
The joint conditional mutual information is too hard to estimate directly, so CMIM approximates it by minima over ever-smaller conditioning sets:

I(F*; C | F1,…,Fk) ≈ min I(F*; C | Fi,…,Fj)  over conditioning sets of size k-1
I(F*; C | F1,…,Fk) ≈ min I(F*; C | Fi,…,Fj)  over conditioning sets of size k-2
…
I(F*; C | F1,…,Fk) ≈ min I(F*; C | Fi, Fj)  over pairs
I(F*; C | F1,…,Fk) ≈ min I(F*; C | Fi)  over single features

CMIM uses the last, cheapest form: the minimum over single conditioning features.
CMIM Algorithm
Input:  n, the number of features to be selected
        v, the total number of features
Output: F, the set of selected features

1. Set F to be empty
2. m = 1
3. Add Fi to F, where Fi = argmax_{i=1..v} I(Fi; C)
4. Repeat
5.   m = m + 1
6.   Add Fi to F, where Fi = argmax_{i=1..v, Fi ∉ F} { min_{Fj ∈ F} I(Fi; C | Fj) }
7. Until m = n
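A direct Python sketch of steps 1-7 (my own rendering, under the assumption that the argmax in step 6 ranges over features not yet in F; the CMI estimator is the plug-in one from the information theory slide):

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Empirical H(X) in bits from a sequence of discrete samples."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def mi(x, c):
    """Empirical I(X;C) = H(X) + H(C) - H(X,C)."""
    return entropy(x) + entropy(c) - entropy(list(zip(x, c)))

def cmi(x, c, z):
    """Empirical I(X;C|Z) = H(X,Z) + H(C,Z) - H(X,C,Z) - H(Z)."""
    return (entropy(list(zip(x, z))) + entropy(list(zip(c, z)))
            - entropy(list(zip(x, c, z))) - entropy(z))

def cmim(features, labels, n):
    """Greedy CMIM: `features` is a list of v discrete value vectors
    (one per feature, aligned with `labels`); returns n selected indices."""
    remaining = set(range(len(features)))
    # Steps 1-3: start with the feature of highest mutual information with C.
    first = max(remaining, key=lambda i: mi(features[i], labels))
    selected = [first]
    remaining.remove(first)
    # Steps 4-7: repeatedly add the candidate whose worst-case score,
    # min over already-selected Fj of I(Fi; C | Fj), is largest.
    while len(selected) < n and remaining:
        best = max(remaining,
                   key=lambda i: min(cmi(features[i], labels, features[j])
                                     for j in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

The min makes the criterion pessimistic about redundancy: a candidate that duplicates any already-selected feature scores 0 (in the slide-4 table, I(w2; C | w1) = 0), so it cannot be chosen while genuinely complementary features remain.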
Experiment Setup
Datasets:
WebKB: 4199 pages, 4 categories
NewsGroups: 20000 pages, 10 categories
Feature selection criteria:
CMIM
Information gain (IG)
Classifiers:
Naïve Bayes
Support vector machine
Results for WebKB
[Plots: micro-averaged and macro-averaged accuracy for SVM and NB]
Results for NewsGroups
[Plots: micro-averaged and macro-averaged accuracy for SVM and NB]
Discussion
Feature size
Acc_CMIM >> Acc_IG when the number of selected features is small
Category number
Acc_CMIM >> Acc_IG when the number of categories is small
Category deviation
MicroAcc_CMIM >> MicroAcc_IG when the deviation in category sizes is large
Conclusion
Uses simple triplets (Fi, C, Fj) to approximate the joint conditional mutual information
The CMIM algorithm tries to reduce the correlation among selected features
Complexity is O(NV³)