High-Dimensional Classification Methods for Sparse Signals and Their Applications in Gene Expression Data

Dawit Tadesse, Ph.D.
Department of Mathematical Sciences, University of Cincinnati

Biostatistics Epidemiology & Research Design Monthly Seminar Series
Cincinnati Children's Hospital Medical Center
November 11, 2014



Contents

◮ 1. Introduction

◮ 2. Classification with Sparse Signals

◮ 3. Feature Selection

◮ 4. Simulation Results

◮ 5. Applications to Gene Expression Data

◮ 6. Conclusion

◮ 7. Selected Bibliography

1. Introduction

◮ High-dimensional classification arises in many contemporary statistical problems.

◮ • Bioinformatics: disease classification using microarray, proteomics, and fMRI data.

◮ • Document or text classification: e-mail spam detection.

◮ • Voice recognition, handwriting recognition, etc.

1. Introduction

Well-known classification methods include:

◮ ♠ Logistic regression

◮ ♠ Fisher discriminant analysis

◮ ♠ Naive Bayes classifier

For high-dimensional data (i.e., when p ≫ n), these methods do not work well.

◮ Bickel and Levina (2004) showed that the Fisher rule breaks down in high dimensions and suggested the naive Bayes rule instead.

◮ Fan and Fan (2008) showed that, even for naive Bayes, using all the features increases the error rate, and suggested FAIR (Features Annealed Independence Rules).

1. Introduction

◮ Fan and Fan (2008) showed that the two-sample t-test can select the important features.

◮ Fan et al. (2012) showed that naive Bayes error rates increase when there is correlation among the features.

◮ My work:

• I will show that, even under high correlation, naive Bayes can perform better than Fisher.

• I propose a generalized test statistic and give the conditions under which it selects the important features.

2. Classification with Sparse Signals

Fisher discriminant rule:

δ_F(X, µ_d, µ_a, Σ) = 1{ µ_d^T Σ^{-1} (X − µ_a) > 0 },   (1)

with corresponding misclassification error rate

W(δ_F, θ) = Φ̄( (µ_d^T Σ^{-1} µ_d)^{1/2} / 2 ).   (2)

2. Classification with Sparse Signals

Naive Bayes rule:

δ_NB(X, µ_d, µ_a, D) = 1{ µ_d^T D^{-1} (X − µ_a) > 0 },   (3)

whose misclassification error rate is

W(δ_NB, θ) = Φ̄( µ_d^T D^{-1} µ_d / [2 (µ_d^T D^{-1} Σ D^{-1} µ_d)^{1/2}] ).   (4)
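Rules (1) and (3) and their error rates (2) and (4) can be sketched numerically. The following is a minimal illustration assuming known parameters, Gaussian classes with equal priors, and an equicorrelation covariance; all sizes and values are illustrative, not from the talk.

```python
import numpy as np
from math import erfc, sqrt

def phibar(x):
    # Upper Gaussian tail: Φ̄(x) = 1 − Φ(x)
    return 0.5 * erfc(x / sqrt(2.0))

p, rho = 50, 0.3
mu1, mu0 = np.zeros(p), np.zeros(p)
mu1[:5] = 1.0                                          # sparse signal in 5 features
Sigma = np.full((p, p), rho) + (1 - rho) * np.eye(p)   # equicorrelation, unit variances
D = np.diag(np.diag(Sigma))                            # diagonal part used by naive Bayes

mu_d = mu1 - mu0                                       # mean difference µ_d
mu_a = (mu1 + mu0) / 2                                 # average mean µ_a

def fisher_rule(X):
    # Eq. (1): classify to class 1 iff the indicator equals 1
    return (np.atleast_2d(X - mu_a) @ np.linalg.solve(Sigma, mu_d) > 0).astype(int)

def naive_bayes_rule(X):
    # Eq. (3): same form, with Σ replaced by its diagonal D
    return (np.atleast_2d(X - mu_a) @ np.linalg.solve(D, mu_d) > 0).astype(int)

# Theoretical misclassification rates, eqs. (2) and (4)
w_fisher = phibar(sqrt(mu_d @ np.linalg.solve(Sigma, mu_d)) / 2)
Dinv_mu = np.linalg.solve(D, mu_d)
w_nb = phibar((mu_d @ Dinv_mu) / (2 * sqrt(Dinv_mu @ Sigma @ Dinv_mu)))
```

With the true parameters, Fisher is the Bayes rule, so w_fisher ≤ w_nb here; the point of the talk is that this ordering can reverse once the parameters must be estimated with p ≫ n.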

2. Classification with Sparse Signals

Definition: Suppose that µ_d = (α_1, α_2, …, α_s, 0, …, 0)^T is the p × 1 mean difference vector, where α_j ∈ ℝ\{0}, j = 1, 2, …, s. We say that µ_d is sparse if s = o(p). The signal is defined as

C_s = µ_d^T D^{-1} µ_d = Σ_{j=1}^{s} α_j² / σ_j²,

where σ_j² is the common variance of feature j in the two classes.

Examples of sparse situations in real life:

◮ ⋆ Gene expression data (e.g., p genes measured in leukemia and normal samples, of which only s distinguish leukemia from normal).

◮ ⋆ Author identification (e.g., two novels by two authors, with only s words distinguishing them).
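As a quick numerical illustration of the definition above (the values of p, s, α_j, and σ_j² below are made up):

```python
import numpy as np

p, s = 1000, 5                                   # s = o(p): sparse
alpha = np.array([1.0, -0.8, 0.5, 1.2, -0.6])    # the s nonzero entries of µ_d
mu_d = np.zeros(p)
mu_d[:s] = alpha
sigma2 = np.ones(p)                              # per-feature common variances σ_j²

# Signal C_s = µ_dᵀ D⁻¹ µ_d = Σ_{j=1}^{s} α_j² / σ_j²
C_s = float(np.sum(mu_d**2 / sigma2))
```

Only the s nonzero coordinates contribute to C_s; the remaining p − s features add noise but no signal.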

2. Classification with Sparse Signals

Theorem 2.1: If m ≤ s, µ_d^(m) = (α, α, …, α)^T = α1 with α ≠ 0, and Σ^(m) is the truncated m × m equicorrelation matrix, then

W(δ_F, θ^(m)) = W(δ_NB, θ^(m)),

where θ^(m) is the truncated parameter.

We define ρ̄^(m) and ρ_max^(m) to be the equicorrelation matrices whose off-diagonal entries are, respectively, the mean of the correlation coefficients and the largest of the absolute values of the correlation coefficients.
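Theorem 2.1 can be verified numerically: the sketch below evaluates error formulas (2) and (4) for an equicorrelation matrix with unit variances and equal mean differences (m, α, and ρ are arbitrary illustrative values, not from the talk).

```python
import numpy as np
from math import erfc, sqrt

def phibar(x):
    # Upper Gaussian tail Φ̄(x) = 1 − Φ(x)
    return 0.5 * erfc(x / sqrt(2.0))

m, alpha, rho = 5, 1.0, 0.4
mu_d = np.full(m, alpha)                               # µ_d = α·1
Sigma = np.full((m, m), rho) + (1 - rho) * np.eye(m)   # equicorrelation, unit variances
D = np.eye(m)                                          # diagonal of Σ

w_fisher = phibar(sqrt(mu_d @ np.linalg.solve(Sigma, mu_d)) / 2)         # eq. (2)
Dinv_mu = np.linalg.solve(D, mu_d)
w_nb = phibar((mu_d @ Dinv_mu) / (2 * sqrt(Dinv_mu @ Sigma @ Dinv_mu)))  # eq. (4)
```

Both expressions reduce to Φ̄(α√m / (2√(1 + (m − 1)ρ))), since 1 is an eigenvector of the equicorrelation matrix, so the two error rates coincide.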

2. Classification with Sparse Signals

Theorem 2.2: Suppose ρ^(m) is an m × m correlation matrix and µ_d^(m) is an m × 1 mean difference vector.

(a)

Φ̄( [(µ_d^(m))^T (D^(m))^{-1} µ_d^(m)]^{1/2} / [2 √λ_min(ρ^(m))] ) ≤ W(δ_w, θ^(m)) ≤ Φ̄( [(µ_d^(m))^T (D^(m))^{-1} µ_d^(m)]^{1/2} / [2 √λ_max(ρ^(m))] )

(b) Suppose, further, that λ_min(ρ^(m)) ≥ λ_min(ρ̄^(m)) = 1 − ρ̄. Then

Φ̄( [(µ_d^(m))^T (D^(m))^{-1} µ_d^(m)]^{1/2} / [2 √(1 − ρ̄)] ) ≤ W(δ_w, θ^(m)) ≤ Φ̄( [(µ_d^(m))^T (D^(m))^{-1} µ_d^(m)]^{1/2} / [2 √(1 + (m − 1)ρ_max)] ),

where w = F or w = NB, for the truncated parameter θ^(m).
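The part (a) bounds can be sanity-checked numerically for the Fisher rule; below is a sketch on a randomly generated covariance matrix (the construction of Σ and all sizes are my illustrative choices).

```python
import numpy as np
from math import erfc, sqrt

def phibar(x):
    # Upper Gaussian tail Φ̄(x) = 1 − Φ(x)
    return 0.5 * erfc(x / sqrt(2.0))

rng = np.random.default_rng(1)
m = 6
A = rng.standard_normal((m, 2 * m))
Sigma = A @ A.T / (2 * m)                        # random positive-definite covariance
D = np.diag(np.diag(Sigma))
d_inv_half = np.diag(1 / np.sqrt(np.diag(Sigma)))
rho = d_inv_half @ Sigma @ d_inv_half            # implied correlation matrix ρ
mu_d = rng.standard_normal(m)

lam = np.linalg.eigvalsh(rho)                    # eigenvalues of ρ, ascending
signal = mu_d @ np.linalg.solve(D, mu_d)         # µ_dᵀ D⁻¹ µ_d
w_fisher = phibar(sqrt(mu_d @ np.linalg.solve(Sigma, mu_d)) / 2)   # eq. (2)

lower = phibar(sqrt(signal) / (2 * sqrt(lam[0])))    # λ_min bound
upper = phibar(sqrt(signal) / (2 * sqrt(lam[-1])))   # λ_max bound
```

The same bounds hold for the naive Bayes rule (w = NB) via error rate (4), since λ_min‖v‖² ≤ vᵀρv ≤ λ_max‖v‖² for v = D^{-1/2}µ_d.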

3. Feature Selection

Goal of feature selection: How do I pick the best markers? Which method? It is like finding a needle in a haystack.

3. Feature Selection

Two-sample t-test

For unequal sample sizes and unequal variances, the absolute value of the two-sample t-statistic for feature j is defined as

T_j = |X̄_1j − X̄_0j| / √(S²_1j/n_1 + S²_0j/n_0),   j = 1, …, p.   (5)

Fan and Fan (2008) gave conditions under which the two-sample t-test selects all the important features with probability tending to 1.

In this talk we use the two-sample t-test as the feature selection method.
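Statistic (5) is easy to compute for all p features at once; here is a sketch on simulated Gaussian data (the sizes and the shift of 3.0 are illustrative, smaller than the talk's p = 4500):

```python
import numpy as np

rng = np.random.default_rng(2)
p, s, n1, n0 = 200, 10, 30, 30
X1 = rng.standard_normal((n1, p))
X1[:, :s] += 3.0                      # first s features carry the signal
X0 = rng.standard_normal((n0, p))

num = np.abs(X1.mean(axis=0) - X0.mean(axis=0))
den = np.sqrt(X1.var(axis=0, ddof=1) / n1 + X0.var(axis=0, ddof=1) / n0)
T = num / den                         # eq. (5): one statistic per feature

top_s = np.argsort(T)[::-1][:s]       # indices of the s largest statistics
```

With a signal this strong, ranking features by T_j recovers the s signal features; the simulations below study how this probability degrades as the mean difference shrinks.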

3. Feature Selection

They stated their theorem as follows, assuming µ_d is sparse:

Theorem 3.1: Let s be a sequence such that log(p − s) = o(n^γ) and log s = o(n^{1/2−γ} β_n) for some β_n → ∞ and 0 < γ < 1/3. Suppose that

min_{1≤j≤s} |µ_{d,j}| / √(σ²_1j + σ²_0j) = n^{−γ} β_n,

where µ_{d,j} is the mean difference of the j-th feature. Then, for x ∼ c n^{γ/2} with c some positive constant, we have

P( min_{j≤s} T_j ≥ x and max_{j>s} T_j < x ) → 1.

Note that asymptotically the two-sample t-test picks up all the important features. However, we are interested in the probability of selecting all the important features in the short run.

3. Feature Selection: Simulation Results

We take p = 4500, s = 90, n_1 = n_0 = 30, Σ equicorrelation, and µ_d with equal mean differences. The simulation results give the probability of capturing all s important features among the first s and the first 2s largest t-statistics, respectively.

[Figure: two panels plotting the probability of getting all the important s features (vertical axis, 0 to 1) against the mean difference (horizontal axis, 0 to 3).]

3. Feature Selection

Generalized Feature Selection

The two-sample t-test depends on an (approximately) normal distribution.

Our test statistic T_j for feature j is defined as

T_j = ( Σ_{k=1}^{n_1} w_1kj − Σ_{k=1}^{n_0} w_0kj ) / SE( Σ_{k=1}^{n_1} w_1kj − Σ_{k=1}^{n_0} w_0kj ),   (6)

where w_ikj, i = 0, 1, is the statistic for feature j in class i for sample k.
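For instance, taking w_ikj = X_ikj / n_i with the usual standard-error estimate reduces (6) to the (signed) two-sample t-statistic; a sketch for a single feature (the particular SE estimator is my assumption):

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n0 = 30, 30
x1 = rng.standard_normal(n1) + 1.0               # class 1 sample for one feature
x0 = rng.standard_normal(n0)                     # class 0 sample

w1, w0 = x1 / n1, x0 / n0                        # per-sample statistics w_ikj
diff = w1.sum() - w0.sum()                       # numerator of (6): X̄₁ − X̄₀
se = np.sqrt(x1.var(ddof=1) / n1 + x0.var(ddof=1) / n0)
T_general = diff / se                            # eq. (6) with this choice of w

T_t = (x1.mean() - x0.mean()) / se               # the usual t-statistic
```

Other choices of w_ikj (e.g., ranks, or 0/1 indicators) would give the other tests mentioned below.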

3. Feature Selection

Our test statistic includes the two-sample t-test, the Wilcoxon–Mann–Whitney test, and the two-sample proportion test as special cases.

Theorem 3.2: Assume that the vector µ_d = µ_1 − µ_0 is sparse and, without loss of generality, only the first s entries are nonzero. Let s be a sequence such that log(p − s) = o(n^γ) and log s = o(n^γ) for some 0 < γ < 1/3. Suppose min_{1≤j≤s} |η_j| = n^{−γ} C_n such that C_n / n^{3γ/2} → c*. Then, for t ∼ c n^{γ/2} with some constant 0 < c < c*/2, we have

P( min_{j≤s} |T_j| ≥ t and max_{j>s} |T_j| < t ) → 1.

4. Simulation Results

We use validation data to determine the optimal number of features.

We take:

♦ p = 4500, s = 90
♦ Training: n_1 = n_0 = 30
♦ Validation: n_1 = n_0 = 30
♦ Testing: n_1 = n_0 = 50

Page 52: High-Dimensional Classification Methods for Sparse …...4. Simulation Results Dawit Tadesse, Ph.D. Department of Mathematical Sciences University of CincinnatiHigh-Dimensional Classification

4. Simulation Results

NB dominates Fisher

ρ (α = 1)     Stat     m_NB     m_F    Emp. Err. NB   Emp. Err. F
0.1           Q1       31.75    9.00       0.0375        0.1200
              Median   63.00   13.00       0.0700        0.1400
              Mean     79.98   16.42       0.0693        0.1448
              Q3      122.20   23.00       0.1000        0.1700
0.5           Q1        9.00    4.00       0.0475        0.240
              Median   53.50   10.00       0.2100        0.280
              Mean     78.96   15.72       0.1852        0.267
              Q3      155.50   25.00       0.2800        0.300
Ran. Corr.    Q1       15.00   18.75       0.0100        0.0200
              Median   20.00   21.00       0.0200        0.0300
              Mean     22.46   24.55       0.0221        0.0394
              Q3       26.25   29.25       0.0300        0.0500
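For reference, the two rules being compared can be sketched as follows (an illustrative implementation under the standard Gaussian model with pooled covariance S; naive Bayes uses only the diagonal of S, Fisher uses the full matrix):

```python
import numpy as np

def fit_rules(X1, X0):
    """Fit the naive Bayes (independence) rule and Fisher's rule on the
    same training data.  Both classify x to class 1 when
    w . (x - (m1 + m0)/2) > 0; they differ only in the weight vector w."""
    n1, n0 = X1.shape[0], X0.shape[0]
    m1, m0 = X1.mean(axis=0), X0.mean(axis=0)
    d = m1 - m0
    # pooled sample covariance
    S = (np.cov(X1, rowvar=False) * (n1 - 1)
         + np.cov(X0, rowvar=False) * (n0 - 1)) / (n1 + n0 - 2)
    w_nb = d / np.diag(S)          # naive Bayes: diagonal covariance only
    w_f = np.linalg.solve(S, d)    # Fisher: full pooled covariance
    mid = (m1 + m0) / 2
    make = lambda w: (lambda X: (X - mid) @ w > 0)
    return make(w_nb), make(w_f)   # (NB classifier, Fisher classifier)
```

When p is close to or larger than n the pooled S is ill-conditioned or singular, which is why Fisher's rule degrades; in practice both rules are applied only to the m selected features.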


Simulations for equicorrelation and equal mean difference with p = 4500, s = 90, ρ = 0.5. Balanced (n1 = n0 = 30) and unbalanced (n1 = 30, n0 = 60), respectively. The testing sample sizes are n1 = n0 = 50 for both.


[Figure: two panels of testing misclassification error (0 to 0.5). Left panel: error vs. α; right panel: error vs. mean difference. Curves: Naive Bayes and Fisher, each for m = 10, 30, 45.]


Similar simulation to the balanced case, except we use random correlation: we randomly generate the eigenvalues of Σ in the interval [0.5, 45.5].
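One plausible way to realize such a covariance matrix (an assumption on my part; the slides do not specify the construction): draw the eigenvalues uniformly in [0.5, 45.5] and conjugate by a random orthogonal matrix.

```python
import numpy as np

def random_cov(p, lo=0.5, hi=45.5, rng=None):
    """Random covariance matrix with eigenvalues drawn uniformly in
    [lo, hi] and a uniformly random eigenbasis (via QR of a Gaussian)."""
    if rng is None:
        rng = np.random.default_rng()
    eig = rng.uniform(lo, hi, size=p)
    Q, _ = np.linalg.qr(rng.standard_normal((p, p)))  # random orthogonal Q
    return (Q * eig) @ Q.T                            # Q diag(eig) Q^T
```

Sampling then proceeds through the Cholesky factor of the resulting matrix, since the one-factor shortcut used for equicorrelation no longer applies.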


[Figure: testing misclassification error rate (0 to 0.5) vs. mean difference under random correlation. Curves: Naive Bayes and Fisher, each for m = 10, 30, 45.]


5. Applications to Gene Expression Data

Leukemia Data (p = 7129, n = 72).

Training: n1 = 24 from class ALL and n0 = 13 from class AML.

Validation: n1 = 23 from class ALL and n0 = 12 from class AML.


[Figure: testing misclassification error (0 to 0.5) vs. number of genes m, for Naive Bayes and Fisher, on the leukemia data.]

For NB the optimal number of genes is 43 with min. error 2/35.
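The validation-based choice of the number of genes can be sketched as below (hypothetical helper; `fit` stands for any classifier trainer, such as the naive Bayes rule, that returns a predictor mapping a feature matrix to boolean "class 1" labels):

```python
import numpy as np

def choose_m(T, train, valid, fit, grid):
    """Pick the number m of top-|T_j| features minimizing the balanced
    misclassification error on the validation split."""
    order = np.argsort(-np.abs(T))          # features ranked by |T_j|
    (X1, X0), (V1, V0) = train, valid
    best_m, best_err = None, np.inf
    for m in grid:
        idx = order[:m]
        clf = fit(X1[:, idx], X0[:, idx])
        # average of the two class-conditional error rates
        err = ((~clf(V1[:, idx])).mean() + clf(V0[:, idx]).mean()) / 2
        if err < best_err:
            best_m, best_err = m, err
    return best_m, best_err
```

This is the generic tuning loop; the optimum of 43 genes quoted above is what it produced for naive Bayes on the leukemia split.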


Atopic Dermatitis (AD) Data (p = 54675, n = 72).

Training: n1 = 24 from class AD and n0 = 15 from class non-AD.

Validation: n1 = 25 from class AD and n0 = 8 from class non-AD.


[Figure: testing misclassification error (0 to 0.5) vs. number of genes used, for Naive Bayes and Fisher, on the AD data.]

For NB the optimal number of genes is 34 with min. error 0.03.


5. Applications to Text Data

NASA flight data set (p = 26694, n = 4567). Training: n1 = 1081, n0 = 1486; Validation: n1 = n0 = 500; Testing: n1 = n0 = 500.


[Figure: testing misclassification error (0 to 0.5) vs. m, for Naive Bayes and Fisher, on the NASA text data.]

For the NB classifier the optimal number of features selected using the validation data set is 148, with corresponding testing error rate 0.116. For Fisher the optimum is 48 features, with corresponding testing error > 0.20.


6. Conclusion

In this talk we considered a binary classification problem when the feature dimension p is much larger than the sample size n. The following are the main results:


◮ We have given conditions under which Naive Bayes is optimal for the population model.

◮ Through theory, simulation, and data analysis we have shown that Naive Bayes is a more practical method than Fisher for high-dimensional data.

◮ In designing binary classification experiments, Fisher requires the full correlation structure, but under an equicorrelation structure we can design the experiment using Naive Bayes.

◮ Through simulation we have shown that the two-sample t-test can pick up all the important features as long as the signal is not too low.


8. Selected Bibliography

• Bickel, P. J. and Levina, E. (2004). Some theory for Fisher's linear discriminant function, "naive Bayes", and some alternatives when there are many more variables than observations. Bernoulli 10, 989-1010.

• Cao, H. (2007). Moderate deviations for two-sample t-statistics. ESAIM: Probability and Statistics 11, 264-271.

• Fan, J. and Fan, Y. (2008). High-dimensional classification using features annealed independence rules. Ann. Statist. 36, 2605-2637.

• Fan, J., Feng, Y., and Tong, X. (2012). A road to classification in high dimensional space: the regularized optimal affine discriminant. J. R. Statist. Soc. B 74, 745-771.

• Johnson, R. A. and Wichern, D. W. (2007). Applied Multivariate Statistical Analysis, 6th edition. Pearson Prentice Hall.


Thank You For Listening!
