Feature Selection with Conditional Mutual Information Maximin in Text Categorization


Department of Computer Science,
Hong Kong University of Science and Technology


Outline

Introduction

Information Theory Review

Conditional Mutual Information Maximin (CMIM)

Experimental Results

Conclusion


Introduction

Text categorization pipeline:

Preprocessing

Feature selection: filter method, wrapper method, or embedded method (the important step here; a small sketch of the three families follows this slide)

Training the classifier

Testing
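The three families deserve a concrete anchor. Below is a minimal scikit-learn sketch of one representative per family; the specific scorers, estimators, and k=100 are illustrative assumptions, not the slides' setup. CMIM itself is a filter criterion.

```python
from sklearn.feature_selection import SelectKBest, RFE, SelectFromModel, chi2
from sklearn.svm import LinearSVC

# Filter: score each feature on its own, independently of any classifier
# (chi^2 here; information gain and mutual information are other common filter scores,
# and CMIM is also a filter criterion).
filter_selector = SelectKBest(score_func=chi2, k=100)

# Wrapper: search feature subsets by repeatedly retraining a classifier.
wrapper_selector = RFE(estimator=LinearSVC(), n_features_to_select=100)

# Embedded: selection happens inside the training objective itself (L1 sparsity).
embedded_selector = SelectFromModel(LinearSVC(penalty="l1", dual=False))
```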


Classic Feature Selection Methods

Ranking criteria:

Information gain

Mutual information

χ² test

Drawback: each feature is scored on its own, regardless of the relationships among features (see the sketch after the table below).

Example (term counts per document; w1 and w2 carry the same information):

Document  w1  w2  w3  class
d1         2   2   0  c1
d2         1   1   0  c1
d3         0   0   0  c2
d4         0   0   2  c2
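To see the drawback numerically, here is a small sketch that scores the toy counts from the table above with scikit-learn's mutual_info_classif, used as a stand-in for the ranking criteria listed on this slide.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Toy term counts from the table above; w1 and w2 are identical columns.
X = np.array([[2, 2, 0],
              [1, 1, 0],
              [0, 0, 0],
              [0, 0, 2]])
y = np.array([0, 0, 1, 1])  # classes c1, c1, c2, c2

scores = mutual_info_classif(X, y, discrete_features=True)
print(dict(zip(["w1", "w2", "w3"], scores.round(3))))
# w1 and w2 tie for the top score, so a pure ranking keeps both of them,
# even though w2 adds nothing once w1 is chosen: I(w2; C | w1) = 0.
```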


Information Theory Review

Entropy:

H(X) = -Σ_x p(x) log p(x) = -E[log p(X)]

Mutual information:

I(X;Y) = E[log ( p(X,Y) / (p(X) p(Y)) )]

Conditional mutual information:

I(X;Y|Z) = E[log ( p(X,Y|Z) / (p(X|Z) p(Y|Z)) )]

[Venn diagram relating H(X), H(Y), H(X|Y), I(X;Y), and I(X;Y|Z).]
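A small numerical sketch of these quantities, using the equivalent identity I(X;Y) = H(X) + H(Y) - H(X,Y) and base-2 logarithms; the XOR example at the end illustrates why conditional mutual information can differ sharply from plain mutual information.

```python
import numpy as np

def entropy(p):
    """H(X) = -sum_x p(x) log2 p(x), in bits, for any probability table p."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(pxy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), from a joint table pxy[x, y]."""
    return entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0)) - entropy(pxy)

def conditional_mi(pxyz):
    """I(X;Y|Z) = sum_z p(z) * I(X;Y | Z=z), from a joint table pxyz[x, y, z]."""
    cmi = 0.0
    for z in range(pxyz.shape[2]):
        pz = pxyz[:, :, z].sum()
        if pz > 0:
            cmi += pz * mutual_information(pxyz[:, :, z] / pz)
    return cmi

# Toy check: X and Y are independent fair bits, Z = X XOR Y.
# Then I(X;Y) = 0 although I(X;Y|Z) = 1 bit.
pxyz = np.zeros((2, 2, 2))
for x in (0, 1):
    for y in (0, 1):
        pxyz[x, y, x ^ y] = 0.25
print(mutual_information(pxyz.sum(axis=2)))  # ~0.0
print(conditional_mi(pxyz))                  # ~1.0
```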


Intuition of the CMIM Algorithm

Idea: approximate the conditional mutual information given all k selected features by the minimum over smaller conditioning sets:

I(F*; C | F1, ..., Fk) ≈ min I(F*; C | Fi, ..., Fj)   (conditioning sets of size k-1)

I(F*; C | F1, ..., Fk) ≈ min I(F*; C | Fi, ..., Fj)   (conditioning sets of size k-2)

...

I(F*; C | F1, ..., Fk) ≈ min I(F*; C | Fi, Fj)   (pairs)

I(F*; C | F1, ..., Fk) ≈ min I(F*; C | Fi)   (single features)


CMIM Algorithm

Input:  n, the number of features to be selected
        v, the total number of features
Output: F, the set of selected features

1. Set F to be empty
2. m = 1
3. Add Fi to F, where Fi = argmax_{i=1..v} I(Fi; C)
4. Repeat
5.   m++
6.   Add Fi to F, where Fi = argmax_{i=1..v} { min_{Fj in F} I(Fi; C | Fj) }
7. Until m = n
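A straightforward Python reading of the pseudocode above, not the authors' implementation: cond_mi estimates I(Fi; C | Fj) from discrete data, and cmim follows steps 1-7, with the small practical change that features already in F are skipped in step 6. A real implementation would cache partial scores rather than recompute every conditional mutual information.

```python
import numpy as np

def cond_mi(x, c, z):
    """Empirical I(X; C | Z) in nats for discrete 1-D arrays x, c, z."""
    cmi = 0.0
    for zv in np.unique(z):
        m = (z == zv)
        pz = m.mean()
        xs, cs = x[m], c[m]
        for xv in np.unique(xs):
            for cv in np.unique(cs):
                pxc = ((xs == xv) & (cs == cv)).mean()
                if pxc > 0:
                    cmi += pz * pxc * np.log(pxc / ((xs == xv).mean() * (cs == cv).mean()))
    return cmi

def mi(x, c):
    """I(X; C), computed as conditional MI given a constant variable."""
    return cond_mi(x, c, np.zeros(len(c), dtype=int))

def cmim(X, y, n_select):
    """Greedy CMIM over a (documents x features) matrix of discrete values."""
    v = X.shape[1]
    selected = [int(np.argmax([mi(X[:, i], y) for i in range(v)]))]   # step 3
    while len(selected) < n_select:                                   # steps 4-7
        best, best_score = None, -np.inf
        for i in range(v):
            if i in selected:          # skip features already chosen
                continue
            score = min(cond_mi(X[:, i], y, X[:, j]) for j in selected)
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```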


Experiment Setup

Datasets:

WebKB: 4199 pages, 4 categories

NewsGroups: 20000 pages, 10 categories

Feature selection criteria:

CMIM

Information gain (IG)

Classifiers:

Naïve Bayes (NB)

Support vector machine (SVM)
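For orientation only, a minimal scikit-learn version of a comparable pipeline. It relies on sklearn's own fetch_20newsgroups loader (whose split need not match the slides' 20000-page, 10-category corpus), uses mutual_info_classif as an IG-style filter score, and reports balanced accuracy as a stand-in for macro-averaged accuracy; WebKB is not bundled with scikit-learn and would have to be loaded separately. The CMIM selector sketched earlier would simply replace SelectKBest.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, balanced_accuracy_score

train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")

# Bag-of-words counts, then an IG-style filter down to 500 features.
vec = CountVectorizer(stop_words="english")
Xtr, Xte = vec.fit_transform(train.data), vec.transform(test.data)
sel = SelectKBest(mutual_info_classif, k=500)
Xtr_s, Xte_s = sel.fit_transform(Xtr, train.target), sel.transform(Xte)

for clf in (MultinomialNB(), LinearSVC()):
    pred = clf.fit(Xtr_s, train.target).predict(Xte_s)
    print(type(clf).__name__,
          "micro acc:", round(accuracy_score(test.target, pred), 3),
          "macro acc:", round(balanced_accuracy_score(test.target, pred), 3))
```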


Results for WebKB

[Figure: micro-averaged and macro-averaged accuracy for the SVM and NB classifiers.]


Results for NewsGroups

[Figure: micro-averaged and macro-averaged accuracy for the SVM and NB classifiers.]


Discussion

Feature size: Acc_CMIM >> Acc_IG when the number of selected features is small

Category number: Acc_CMIM >> Acc_IG when the number of categories is small

Category deviation: MicroAcc_CMIM >> MicroAcc_IG when the deviation across categories is large


Conclusion

Use simple triplets (a candidate feature, the class, and one selected feature) to approximate the joint conditional mutual information

The CMIM algorithm tries to reduce the correlation (redundancy) among the selected features

Complexity is O(NV³)