Page 1:

Regularized Double Nearest Neighbor Feature Extraction for Hyperspectral Image Classification

Hsiao-Yun Huang

Department of Statistics and Information Science, Fu-Jen University

Page 2:

Hyperspectral Image Introduction 1

(image credit: AFRL)

Page 3:

Hyperspectral Image Introduction 2

(image credit: AFRL)

Page 4:

Applications of Hyperspectral Images

- Military: military equipment detection.
- Commercial: mineral exploration, agriculture, and forest production.
- Ecology: chlorophyll, leaf water, cellulose, lignin.
- Agriculture: identifying the illness or type of plants.

Page 5:

Classification of Hyperspectral Image Pixels

How to distinguish different land cover types precisely and automatically in hyperspectral images is an interesting and important research problem.

Generally, each pixel in a hyperspectral image consists of hundreds or even thousands of bands. This makes the discrimination among pixels a high-dimensional classification problem.

Page 6:

High-Dimensional Data Analysis

“We can say with complete confidence that in the coming century, high-dimensional data analysis will be a very significant activity, and completely new methods of high-dimensional data analysis will be developed; …” (David L. Donoho, lecture “Math Challenges of the 21st Century” to the American Mathematical Society, August 8, 2000)

Page 7:

Blessing: The Power of Increasing Dimensionality

[Figure: class-conditional densities of two classes along each of the features x1, x2, and x3, together with the 3-D scatter views, illustrating how class separability can increase as dimensions are added.]

Page 8:

Curse: Hughes Phenomenon

[Figure: Hughes phenomenon. Mean recognition accuracy (0.50 to 0.75) versus measurement complexity n (total discrete values), with one curve per training sample size m = 2, 5, 10, 20, 50, 100, 200, 500, 1000; for finite m, accuracy peaks and then declines as n grows.]

Page 9:

The Curse of Dimensionality

In statistics, it refers to the situation in which the convergence of any estimator to the true value of a smooth function defined on a high-dimensional space is very slow; that is, an extremely large number of observations is needed. (Bellman, 1961) http://www.stat.ucla.edu/~sabatti/statarray/textr/node5.html

Page 10:

The Challenge

Unfortunately, in hyperspectral image classification, the p > N case is the usual situation, because access to training samples (ground-truth data) can be very difficult and expensive.

This large-dimension, few-samples problem can make the accuracy of hyperspectral image classification unsatisfactory.

Page 11:

Dimensionality Reduction

One common way to deal with the curse of dimensionality is to reduce the number of dimensions.

Two major reduction ideas:
- Feature Selection
- Feature Extraction

Page 12:

[Figure: schematic mapping of the measurements x_1, ..., x_p to features f_1, f_2 under each approach.]

Feature selection: select l out of the p measurements.

Feature extraction: map the p measurements to l new measurements.
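A minimal numpy sketch of this contrast, using toy data (the band indices and the matrix A are arbitrary assumptions for illustration, not part of any method here):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # 100 pixels, p = 5 bands (toy data)
l = 2

# Feature selection: keep l of the original p measurements.
selected = X[:, [0, 3]]         # e.g. choose bands 0 and 3

# Feature extraction: map all p measurements to l new features.
A = rng.normal(size=(5, l))     # p-by-l transformation matrix
extracted = X @ A               # each new feature mixes every band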

Page 13:

Feature Extraction vs. Feature Selection

[Figure: histograms of the two classes along a selected original feature and along an extracted feature, together with the 2-D scatter plot showing the selection and extraction directions; the extracted direction separates the classes better.]

Page 14:

Basic Ideas of Feature Extraction

Feature extraction consists of choosing those features which are most effective for preserving class separability.

Class Separability depends not only on the class distributions but also on the classifier to be used.

We seek the minimum feature set with reference to the Bayes classifier; this will result in the minimum error for the given distributions. Therefore, the Bayes error is the optimum measure of feature effectiveness.

Page 15:

One Consideration

A major disadvantage of the Bayes error as a criterion is that an explicit mathematical expression is not available except for a very few special cases; therefore, we cannot expect a great deal of theoretical development.

Page 16:

Practical Alternatives

Two types of criteria that have explicit mathematical expressions and are frequently used in practice:

- Functions of scatter matrices (not related to the Bayes error): conceptually simple and give systematic algorithms.
- Bhattacharyya-distance-type criteria (give upper bounds on the Bayes error): only for two-class problems, and based on a normality assumption.

Page 17:

Discriminant Analysis Feature Extraction (DAFE or Fisher’s LDA)

The feature transformation matrix of DAFE is composed of the eigenvectors of (S_w^{DA})^{-1} S_b^{DA}, where

S_b^{DA} = \sum_{i=1}^{L} P_i (m_i - m_0)(m_i - m_0)^T = \sum_{i=1}^{L-1} \sum_{j=i+1}^{L} P_i P_j (m_i - m_j)(m_i - m_j)^T   (S_b in pairwise structure, with m_0 = \sum_i P_i m_i the global mean),

S_w^{DA} = \sum_{i=1}^{L} P_i \Sigma_i   (within-class scatter),

and L is the number of classes.

Note: the number of extracted features is min{p, L-1}, where p is the dimension of the mean vector.
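A minimal numpy sketch of DAFE under the definitions above (a toy implementation; it assumes priors are estimated from class frequencies and uses the pairwise S_b):

import numpy as np

def dafe(X, y, n_features):
    """Toy DAFE: leading eigenvectors of inv(Sw) @ Sb with pairwise Sb."""
    classes = np.unique(y)
    L, p = len(classes), X.shape[1]
    P = np.array([np.mean(y == c) for c in classes])             # estimated priors
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    Sw = sum(P[i] * np.cov(X[y == classes[i]].T) for i in range(L))
    Sb = np.zeros((p, p))
    for i in range(L - 1):
        for j in range(i + 1, L):
            d = means[i] - means[j]
            Sb += P[i] * P[j] * np.outer(d, d)
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-vals.real)
    return vecs[:, order[:n_features]].real                      # p x n_features

# At most min(p, L - 1) eigenvalues are nonzero, matching the note above.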

Page 18:

DAFE vs. PCA

[Figure: projections of the same data found by PCA and by DAFE.]

Page 19:

Drawbacks of Fisher's LDA (1)

In some situations, J_{LDA} = tr((S_w^{DA})^{-1} S_b^{DA}) is not a good measure of class separability:

- Classes share the same mean: there is no scatter of M1 and M2 around M0, so the between-class scatter vanishes.
- Multimodal classes: more than L-1 features are needed.

[Figure: three cases: unimodal classes sharing the same mean; multimodal classes sharing the same mean; multimodal classes.]

Page 20:

Drawbacks of Fisher's LDA (2)

The unbiased estimate S (pooled covariance estimate) of the within-class scatter matrix is adopted in LDA. If it is singular, the performance will be poor.

When dim >> n, S loses its full rank as a growing number of eigenvalues become zero; S is then not positive definite and cannot be inverted.

[Figure: eigenvalue spectrum for dim/n = 10; solid line: true eigenvalues; dashed line: eigenvalues of Sw, many of which collapse to zero as the dimension grows.]

Page 21:

Feature Extraction Methods with Other Measures of Separability

- Nonparametric Discriminant Analysis (NDA; Fukunaga and Mantock, 1983)
- Nonparametric Weighted Feature Extraction (NWFE; Bor-Chen Kuo and Landgrebe, 2004)
- Regularized Double Nearest Proportion Feature Extraction (RDNP; Hsiao-Yun Huang and Bor-Chen Kuo, submitted)

Page 22:

The Idea of Nonparametric Discriminant Analysis (NDA; Fukunaga and Mantock, 1983)

[Figure: classes i and j with means Mi and Mj. Instead of separating the means, as LDA does, NDA tries to separate the boundary between the two classes.]

Page 23:

Nearest Neighbor Structure

[Figure: a point x_k^{(i)} of class i together with its k nearest neighbors in class i and its k nearest neighbors in class j.]

Page 24:

Pairwise Between-Class Scatter Matrix

[Figure: two class-i points. The point x_k^{(i)} near class j, with local mean M_j(x_k^{(i)}), receives a large weight; the point x_h^{(i)} deep inside class i, with local mean M_j(x_h^{(i)}), receives a small weight.]

w_l^{(i,j)} = \frac{\min\{ d(x_l^{(i)}, x_{kNN}^{(i)}), \; d(x_l^{(i)}, x_{kNN}^{(j)}) \}}{d(x_l^{(i)}, x_{kNN}^{(i)}) + d(x_l^{(i)}, x_{kNN}^{(j)})}

Page 25:

NDA

S_b^{NDA} = \sum_{i=1}^{L} \frac{P_i}{n_i} \sum_{j=1, j \ne i}^{L} \sum_{l=1}^{n_i} w_l^{(i,j)} \, (x_l^{(i)} - M_j(x_l^{(i)}))(x_l^{(i)} - M_j(x_l^{(i)}))^T,

where the weight is

w_l^{(i,j)} = \frac{\min\{ d^{\alpha}(x_l^{(i)}, x_{kNN}^{(i)}), \; d^{\alpha}(x_l^{(i)}, x_{kNN}^{(j)}) \}}{d^{\alpha}(x_l^{(i)}, x_{kNN}^{(i)}) + d^{\alpha}(x_l^{(i)}, x_{kNN}^{(j)})},

\alpha is a control parameter between zero and infinity, and d(x_l^{(i)}, x_{kNN}^{(j)}) is the distance from x_l^{(i)} to its kNN point in class j.

The feature transformation matrix of NDA is composed of the eigenvectors of (S_w^{DA})^{-1} S_b^{NDA}.
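A sketch of the class-i contribution to S_b^{NDA} for one ordered class pair, following these formulas (numpy; k and alpha are assumed tuning values, distances are Euclidean, and rows are assumed distinct):

import numpy as np

def knn_mean_and_dist(x, Xc, k):
    """Local kNN mean of x within sample Xc, and distance to its kth NN."""
    d = np.linalg.norm(Xc - x, axis=1)
    idx = np.argsort(d)[:k]
    return Xc[idx].mean(axis=0), d[idx[-1]]

def nda_Sb_pair(Xi, Xj, k=3, alpha=1.0):
    """Class-i contribution to S_b^{NDA} against class j."""
    p = Xi.shape[1]
    Sb = np.zeros((p, p))
    for x in Xi:
        others = Xi[~np.all(Xi == x, axis=1)]     # class i without x itself
        _,  di = knn_mean_and_dist(x, others, k)  # d(x, x_kNN^{(i)})
        Mj, dj = knn_mean_and_dist(x, Xj, k)      # local mean and d(x, x_kNN^{(j)})
        w = min(di**alpha, dj**alpha) / (di**alpha + dj**alpha)   # boundary weight
        e = x - Mj
        Sb += w * np.outer(e, e)
    return Sb / len(Xi)   # the full S_b sums P_i times this over ordered pairs (i, j)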

Page 26:

The Properties of NDA

The between-class scatter matrix S_b is usually of full rank, so the restriction that only min(#classes - 1, dim) features can be extracted is lifted.

Since the parametric S_b is replaced by the nonparametric S_b, which preserves important boundary information for classification, NDA is more robust.

Page 27:

Some Considerations about NDA When Overlap Occurs (1)

Based on NDA's definition of the boundary (the focused portion of the distribution), points whose distances to the two considered groups are similar are regarded as boundary points.

This definition of the boundary fails when overlap occurs, because the points around and within the overlap region all tend to receive the same weight.

Page 28:

The Boundary of NDA When Overlap Occurs

[Figure: two overlapping classes; the points x_l^{(i)} marked as boundary points (+) lie around and inside the overlap region, so the projection direction implied by them is ambiguous.]

Page 29:

Some Considerations about NDA When Overlap Occurs (2)

In NDA, kNN is adopted to measure the "local" between-class scatter, so the selected k is a very small integer, as is usual in kNN methods (all the experiments shown in Fukunaga's paper and book use either k = 1 or k = 3).

Such a small k can make a data point and its local mean very similar (close). Consequently, the entries of that point's contribution to S_b will be very close to zero, which cancels out the effect of the weight, or gives that contribution even less influence in the overall S_b.

Page 30:

Some Considerations about NDA When Overlap Occurs (3)

Also, in the S_b of NDA, only one data point represents one group, while the kNN mean represents the other local group.

As a result, S_b may not measure the scatter between "groups" very well, and it is easily influenced by outliers.

S_b^{NDA} = \sum_{i=1}^{L} \frac{P_i}{n_i} \sum_{j=1, j \ne i}^{L} \sum_{l=1}^{n_i} w_l^{(i,j)} \, (x_l^{(i)} - M_j(x_l^{(i)}))(x_l^{(i)} - M_j(x_l^{(i)}))^T

Page 31:

Another Consideration

In NDA, the boundary is estimated from the sample. Even when the sample distributions do not overlap, the estimated boundary can lie too close to the edge of a class, since k is small and only one point x_l^{(i)} represents one of the groups in S_b.

As with the hard-margin SVM, a boundary (support vectors) estimated too sharply from the sample can perform poorly because of overfitting.

Page 32:

The Singularity Problem

In NDA, the unbiased covariance estimate S is still adopted; thus, the singularity problem still exists in NDA.

Page 33:

Nonparametric Weighted Feature Extraction (NWFE)

[Figure: classes i and j; each point x_l^{(i)} has weighted local means M_i(x_l^{(i)}) and M_j(x_l^{(i)}); a point close to the other class (small x_l^{(i)} - M_j(x_l^{(i)})) receives a large weight, a distant point a light weight.]

\lambda_l^{(i,j)} = \frac{dist(x_l^{(i)}, M_j(x_l^{(i)}))^{-1}}{\sum_{k=1}^{n_i} dist(x_k^{(i)}, M_j(x_k^{(i)}))^{-1}}

Page 34:

Nonparametric Weighted Feature Extraction (NWFE; Kuo & Landgrebe, 2002, 2004)

S_b^{NW} = \sum_{i=1}^{L} P_i \sum_{j=1, j \ne i}^{L} \sum_{k=1}^{n_i} \frac{\lambda_k^{(i,j)}}{n_i} (x_k^{(i)} - M_j(x_k^{(i)}))(x_k^{(i)} - M_j(x_k^{(i)}))^T,

S_w^{NW} = \sum_{i=1}^{L} P_i \sum_{k=1}^{n_i} \frac{\lambda_k^{(i,i)}}{n_i} (x_k^{(i)} - M_i(x_k^{(i)}))(x_k^{(i)} - M_i(x_k^{(i)}))^T,

M_j(x_k^{(i)}) = \sum_{l=1}^{n_j} w_{kl}^{(i,j)} x_l^{(j)}, where n_j is the number of training samples of class j,

\lambda_k^{(i,j)} = \frac{dist(x_k^{(i)}, M_j(x_k^{(i)}))^{-1}}{\sum_{l=1}^{n_i} dist(x_l^{(i)}, M_j(x_l^{(i)}))^{-1}},

w_{kl}^{(i,j)} = \frac{dist(x_k^{(i)}, x_l^{(j)})^{-1}}{\sum_{l=1}^{n_j} dist(x_k^{(i)}, x_l^{(j)})^{-1}}.

The feature transformation matrix of NWFE is composed of the eigenvectors of [0.5 S_w^{NW} + 0.5 diag(S_w^{NW})]^{-1} S_b^{NW}.
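A numpy sketch of these NWFE quantities for one class pair (a toy version; it assumes all points are distinct so no pairwise distance is zero):

import numpy as np

def nwfe_local_means(Xi, Xj):
    """M_j(x_k^{(i)}): inverse-distance-weighted means of class j."""
    D = np.linalg.norm(Xi[:, None, :] - Xj[None, :, :], axis=2)  # pairwise distances
    W = 1.0 / D                                # w_{kl}^{(i,j)} before normalizing
    W /= W.sum(axis=1, keepdims=True)
    return W @ Xj

def nwfe_Sb_pair(Xi, Xj):
    """Class-i part of S_b^{NW} toward class j."""
    M = nwfe_local_means(Xi, Xj)
    lam = 1.0 / np.linalg.norm(Xi - M, axis=1)  # lambda_k^{(i,j)} before normalizing
    lam /= lam.sum()
    return sum(l * np.outer(e, e) for l, e in zip(lam, Xi - M))

# S_w^{NW} uses the same recipe with j = i, and NWFE then regularizes:
# Sw_reg = 0.5 * Sw + 0.5 * np.diag(np.diag(Sw))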

Page 35:

Double Nearest Proportion Structure

[Figure: for a point x_l^{(i)}, the self-class nearest proportion in class i (mean M_i^{(i)}) and the other-class nearest proportion in class j (mean M_j^{(i)}); the two proportion means serve as the weight reference.]

Page 36:

Robust Against the Overlap

[Figure: overlapping classes i and j; points x_l^{(i)} and x_t^{(i)} with their self-class means M_i(x_l^{(i)}), M_i(x_t^{(i)}) and their other-class nearest proportions ONP(x_l^{(i)}), ONP(x_t^{(i)}) with means M_j(x_l^{(i)}), M_j(x_t^{(i)}); the point nearer the overlap receives the larger weight, the other a smaller weight.]

Page 37:

The Improvement of the Estimation of Sw (1)

Regularized Discriminant Analysis (RDA; Friedman, 1989), an extension of LDA, also proposed an improved version of the S_w used in LDA. The generalized version of that estimate is

\hat{\Sigma}_{reg} = \lambda \hat{\Sigma} + (1 - \lambda) \hat{\sigma}^2 I, where \lambda is between 0 and 1.

The question is how to choose \lambda. (Friedman suggested using cross-validation.)
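A hedged sketch of one way to pick \lambda by validation (my illustration, not Friedman's exact procedure: it scores each candidate \lambda on a held-out set by Gaussian log-likelihood):

import numpy as np
from scipy.stats import multivariate_normal

def regularized_cov(S, lam):
    """Shrink the sample covariance toward a scaled identity target."""
    p = S.shape[0]
    sigma2 = np.trace(S) / p                     # average variance
    return lam * S + (1 - lam) * sigma2 * np.eye(p)

def choose_lambda(X_train, X_val, grid=np.linspace(0.05, 0.95, 19)):
    """Return the lambda whose regularized covariance best fits held-out data."""
    mu, S = X_train.mean(axis=0), np.cov(X_train.T)
    scores = [multivariate_normal(mu, regularized_cov(S, lam)).logpdf(X_val).sum()
              for lam in grid]
    return grid[int(np.argmax(scores))]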

Page 38:

The Improvement of the Estimation of Sw (2)

In NWFE, different ways to obtain the local means and weights of NDA were proposed. However, the most influential factor in the performance improvement is its proposed estimate of S_w:

S_w^{reg} = 0.5 \, S_w^{NW} + 0.5 \, diag(S_w^{NW})

Why 0.5?

Page 39:

The Shrinkage Estimation of Sw

Let Ψ denote the parameters of the unrestricted high-dimensional model, and Θ the matching parameters of a lower-dimensional restricted submodel. Also, let U be the estimate of Ψ and T the estimate of Θ. Then the shrinkage (regularized) estimate is

U* = \lambda T + (1 - \lambda) U, where \lambda is between 0 and 1.

\lambda can be determined analytically by the Ledoit and Wolf lemma (2003): once the target T is specified, \lambda can be calculated.
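A minimal sketch of analytic shrinkage toward a diagonal target, in the Schäfer-Strimmer form that also appears in the RDNP formulas below (\lambda* = \sum_{g \ne h} Var-hat(s_gh) / \sum_{g \ne h} s_gh^2; assumes rows of X are observations, n > 1):

import numpy as np

def shrink_to_diagonal(X):
    """U* = lam*T + (1-lam)*U with U the sample covariance, T = diag(U)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    U = (Xc.T @ Xc) / (n - 1)                         # unrestricted estimate U
    W = np.einsum('ng,nh->ngh', Xc, Xc)               # per-sample outer products
    var_U = W.var(axis=0, ddof=1) * n / (n - 1) ** 2  # Var-hat of each entry s_gh
    off = ~np.eye(p, dtype=bool)
    lam = min(1.0, max(0.0, var_U[off].sum() / (U[off] ** 2).sum()))
    return lam * np.diag(np.diag(U)) + (1 - lam) * U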

Page 40:

Some Targets

J. Schafer and K. Strimmer (2005) proposed six targets for the shrinkage estimate of the Sw.

Page 41:

RDNP Feature Extraction

The feature transformation matrix of RDNP is composed of the eigenvectors of (S_w^{RDNP})^{-1} S_b^{RDNP}, where

S_b^{RDNP} = \sum_{i=1}^{L} \frac{P_i}{N_i} \sum_{j=1, j \ne i}^{L} \sum_{l=1}^{N_i} d_l^{(i,j)} \, (M_i(x_l^{(i)}) - M_j(x_l^{(i)}))(M_i(x_l^{(i)}) - M_j(x_l^{(i)}))^T,

S_w^{RDNP} = \sum_{i=1}^{L} P_i \sum_{l=1}^{N_i} P_l^{(i)} \tilde{S}_l^{(i)}, with the shrinkage estimate \tilde{S}_l^{(i)} = \lambda_l^{(i)} T_l^{(i)} + (1 - \lambda_l^{(i)}) S_l^{(i)},

d_l^{(i,j)} = \frac{dist(M_i(x_l^{(i)}), M_j(x_l^{(i)}))^{-1}}{\sum_{t=1}^{N_i} dist(M_i(x_t^{(i)}), M_j(x_t^{(i)}))^{-1}},

\lambda_l^{(i)} = \frac{\sum_{g \ne h} \widehat{Var}(s_{gh}^{(i,l)})}{\sum_{g \ne h} (s_{gh}^{(i,l)})^2}.
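A numpy sketch of the double-nearest-proportion pieces for one class pair (the NP sizes are tuning parameters I assume here; the local shrinkage estimate \tilde{S}_l^{(i)} can reuse shrink_to_diagonal from the earlier sketch):

import numpy as np

def np_mean(x, Xc, NP):
    """Mean of the NP points in Xc nearest to x (a nearest proportion)."""
    idx = np.argsort(np.linalg.norm(Xc - x, axis=1))[:NP]
    return Xc[idx].mean(axis=0)

def rdnp_Sb_pair(Xi, Xj, NPi=5, NPj=5):
    """Class-i contribution to S_b^{RDNP} against class j."""
    Mi = np.array([np_mean(x, Xi, NPi) for x in Xi])  # self-class NP means
    Mj = np.array([np_mean(x, Xj, NPj) for x in Xi])  # other-class NP means
    d = 1.0 / np.linalg.norm(Mi - Mj, axis=1)         # closer means -> larger weight
    d /= d.sum()                                      # weights d_l^{(i,j)}
    return sum(w * np.outer(e, e) for w, e in zip(d, Mi - Mj))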

Page 42:

The Properties of RDNP (1)

RDNP is more likely to find the boundary when overlap occurs.

Because a proportion mean is used for each group, the between-group scatter is measured more properly: the entries of S_b are not so close to zero, the influence of outliers is reduced, and the estimated boundary is not too close to the edge.

Page 43:

The Properties of RDNP (2)

When NPi = Ni and NPj = Nj, it can easily be shown that the features extracted by RDNP are exactly the same as those extracted by Fisher's LDA. That is, LDA is a special case of RDNP.

Page 44:

Washington DC Mall Image

Page 45:

Indian Pine Site Image

Page 46:

Experiment Result 1 (Washington DC Mall, Classifier: 1NN, 6 features)

# of training samples   LDA      NDA      RDA      NWFE     RDNP
20                      0.5771   0.5825   0.8564   0.8851   0.9217
40                      0.8122   0.8160   0.8840   0.9231   0.9420
100                     0.8897   0.8979   0.9206   0.9347   0.9688

Page 47:

Experiment Result 2 (Washington DC Mall, Classifier: SVM, 6 features)

# of training samples   LDA      NDA      RDA      NWFE     RDNP
20                      0.5809   0.5990   0.8441   0.8933   0.9266
40                      0.8244   0.8067   0.8799   0.9243   0.9385
100                     0.8902   0.8922   0.9302   0.9330   0.9701

Page 48:

[Figure: classification maps for a portion of the DC data set: a color IR image, 1NN-NS (191 bands), RDA with 1NN, NWFE with 1NN, and RDNP with 1NN.]

Page 49:

Experiment Result 3 (Indian Pine Site, Classifier: 1NN, 8 features)

# of training samples   LDA      NDA      RDA      NWFE     RDNP
20                      0.5512   0.5825   0.7662   0.8012   0.8377
40                      0.5729   0.6060   0.7911   0.8331   0.8503
100                     0.6345   0.6495   0.8180   0.8452   0.8910

Page 50:

Experiment Result 4 (Indian Pine Site, Classifier: SVM, 8 features)

# of training samples   LDA      NDA      RDA      NWFE     RDNP
20                      0.5512   0.5825   0.7662   0.8012   0.8377
40                      0.5729   0.6060   0.7911   0.8331   0.8503
100                     0.6345   0.6495   0.8180   0.8452   0.8910

Page 51:
Page 52:

Other Applications

- Microarray Data Discrimination
- Quality Control
- EEG Signal Classification

Page 53:

The End

Thank you for listening.