3880 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 45, NO. 12, DECEMBER 2007

Border Vector Detection and Adaptation for Classification of Multispectral and Hyperspectral Remote Sensing Images

N. Gökhan Kasapoğlu, Student Member, IEEE, and Okan K. Ersoy, Fellow, IEEE

Manuscript received November 25, 2006; revised April 20, 2007. N. G. Kasapoğlu is with the Department of Electronics and Communication Engineering, Istanbul Technical University (ITU), Istanbul 34469, Turkey. O. K. Ersoy is with the Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907 USA. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TGRS.2007.900699

Abstract—Effective partitioning of the feature space for high classification accuracy with due attention to rare class members is often a difficult task. In this paper, the border vector detection and adaptation (BVDA) algorithm is proposed for this purpose. The BVDA consists of two parts. In the first part of the algorithm, some specially selected training samples are assigned as initial reference vectors called border vectors. In the second part of the algorithm, the border vectors are adapted by moving them toward the decision boundaries. At the end of the adaptation process, the border vectors are finalized. The method next uses the minimum distance to border vector rule for classification. In supervised learning, the training process should be unbiased to reach more accurate results in testing. In the BVDA, decision region borders are related to the initialization of the border vectors and the input ordering of the training samples. A consensus strategy can be applied with cross validation to reduce these dependencies. The performance of the BVDA and the consensual BVDA was studied in comparison to other classification algorithms, including a neural network with backpropagation learning, support vector machines, and some statistical classification techniques.

Index Terms—Border vector detection and adaptation (BVDA), consensual classification, data classification, decision region borders, remote sensing.

I. INTRODUCTION

THE PERFORMANCE of a classifier is strongly related to the number and quality of training samples in supervised learning [1], [2]. A desirable classifier is expected to achieve sufficient overall classification accuracy while rare class members are also correctly classified in the same process. Achieving this aim is not a trivial task, particularly when the training samples are limited in number. The lack of a sufficient number of training samples decreases the generalization performance of a classifier. In remote sensing in particular, collecting training samples is a costly and difficult process; therefore, a limited number of training samples is obtained in practice. A heuristic metric is that the number of training samples for each class should be at least 10–30 times the number of attributes (features per band) [3], [4]. This may be achievable for multispectral data classification. However, for hyperspectral data, which have at least 100–200 bands, a sufficient number of training samples cannot be collected. Normally, when the number of bands used in the classification process increases, more accurate class determination is expected. For a high-dimensional feature space, when a new feature is added to the data, the classification error decreases, but at the same time, the bias of the classification error increases [5]. If the increase in the bias of the classification error is larger than the reduction in the classification error, then the use of the additional feature actually decreases the performance of the classifier. This phenomenon is called the Hughes effect [6], and it may be much more harmful with hyperspectral data than with multispectral data.

Special attention can be given to the determination of significant samples, which are much more effective to use in forming the decision boundaries than other samples [7]. The structure of the discriminant functions used by classifiers can give important clues about the positions of the effective samples in the feature space. The training samples near the decision boundaries can be considered significant samples. The problem is then to specify the positions of these samples in the image. For example, in crop mapping applications, some samples near parcel borders (spatial boundaries in the image) are assumed to be samples with mixed spectral responses. Samples comprising mixed spectral responses can be taken into consideration to determine significant samples. Therefore, the same classification accuracy can be achieved by using a smaller number of significant samples instead of a larger number of samples collected from pure pixels [8]. Consequently, one major classifier design consideration should be the detection and usage of training samples near the decision boundaries [9].

It is known that the training stage is very important in supervised learning and affects the generalization capability of a classification algorithm. In some cases, not all training samples are useful, and some may even be detrimental to classification [10]. In such a case, some (noisy) samples may be discarded from the training set, or their intensity values may be filtered for noise reduction by using appropriate spatial filtering operations, such as mean filtering, to enhance the generalization capability of the classifier [11]. For example, this kind of spatial filtering with a small window size (1 × 2) has been applied to parcel borders in agricultural areas to find appropriate intensity values for the spectral mixture-type pixels [8].

The training process should not be biased. An equal number of training samples should be selected for each class if possible; in practice, this may not be possible. In addition, the training process may be affected by the order of the input training samples. To reduce such dependencies and to increase classification accuracy, a consensual rule [12], [13] can be applied to combine results obtained from a pool of classifiers. This process can also be combined with cross validation to improve the generalization capability of a classifier.

Our motivation in this paper is to overcome some of these general classification problems by developing a classification algorithm that is directly based on the detection of significant training samples without relying on the underlying statistical data distribution. Our proposed algorithm, i.e., the border vector detection and adaptation (BVDA) algorithm, uses detected border vectors near the decision boundaries, which are adapted to make a precise partitioning of the feature space by using a maximum margin principle.

Many supervised classification techniques have been used for multispectral and hyperspectral data classification, such as maximum likelihood (ML), neural networks (NNs), and support vector machines (SVMs). Practical implementation issues and computational load are additional factors used to evaluate classification algorithms.

Statistical classification algorithms are fast and reliable, but they assume that the data have a specific distribution. For real-world data, these kinds of assumptions may not be sufficiently accurate, particularly for low-probability classes. In a high-dimensional feature space, first-order and, particularly, second-order statistics (the mean and covariance matrix) cannot be accurately estimated. The total number of parameters in the covariance matrix is equal to the square of the feature size. Therefore, proper estimation of the covariance matrix is a particularly difficult challenge. To overcome such parameter estimation problems, some valuable methods have been introduced in the literature. Covariance matrix regularization is one method that can be applied to obtain a more accurate covariance matrix estimate [14], [15]. In this method, the sample and common covariance matrices are combined in some way to achieve more accurate covariance matrix estimation. Enhancing the statistics by iteratively using unlabeled samples is another method to reduce the effects of poor statistics; the expectation–maximization algorithm can be used for this purpose [16]. In hyperspectral data, neighboring bands are usually highly correlated. Methods such as discriminant analysis feature extraction (DAFE) [5] and decision boundary feature extraction (DBFE) [17] can be applied. Working directly in a high-dimensional feature space is also problematic with these methods. Therefore, subset feature selection via band grouping, such as parametric projection pursuit (PP) [18], can be used before DAFE and DBFE.
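As an illustration, one common form of such regularization (a generic formulation; the estimators of [14], [15] are more elaborate) blends the class sample covariance with the common covariance through a parameter λ:

$$\hat{\Sigma}_i(\lambda) = (1 - \lambda)\,\Sigma_i + \lambda\,\Sigma_{\mathrm{common}}, \qquad 0 \le \lambda \le 1.$$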

Some nonparametric classification methods such as SVMs are robust with both multispectral and hyperspectral data. Therefore, the Hughes effect can be less harmful with these nonparametric methods than with parametric methods. The K-nearest neighbor (KNN) rule is one of the simplest and most effective classification techniques in nonparametric pattern recognition and does not need knowledge of the distribution of the patterns [19], but it is sensitive to the presence of noise in the data when K is not chosen large enough. NNs are widely used in the analysis of remotely sensed data. A variety of network types are used in remote sensing, such as the multilayer perceptron or feedforward NN with backpropagation (NN-BP) learning [1]. There are also additional classification schemes that improve the classification performance of NNs by simplifying the complex classification problem through accepting or rejecting samples in a number of modules, such as parallel self-organizing hierarchical NNs (PSHNNs) [20]. By using parallel stages of NN modules, hard vectors are rejected to be processed in the succeeding stage modules, and this rejection scheme is effective in enhancing classification accuracy. Consensual classifiers are related to PSHNNs and also reach high classification accuracies [21]–[23].

In recent years, kernel methods such as SVMs have demonstrated good performance in multispectral and hyperspectral data classification [24]–[26]. Some of the drawbacks of SVMs are the necessity of choosing an appropriate kernel function and the time-intensive optimization. In addition, the assumptions made in the presence of samples that are not linearly separable are not necessarily optimal. It is also possible to use composite kernels [26] for remote sensing image classification to reach higher classification accuracies.

In this paper, a new classification algorithm that is well suited for the classification of remote sensing images is developed with a new approach to detecting and adapting border vectors with the training data. This approach is particularly effective when the information source has a limited number of data samples and the distribution of the data is not necessarily Gaussian. Training samples that are closer to class borders are more prone to generating misclassification and are therefore the significant samples to use for reducing classification errors. The proposed classification algorithm searches for such error-causing training samples in a special way and adapts them to generate border vectors to be used as labeled samples for classification.

The BVDA algorithm can be considered in two parts. The first part of the algorithm consists of defining initial border vectors using class centers and misclassified training samples. With this approach, a manageable number of border vectors is detected. The second part of the algorithm is the adaptation of the border vectors by using a technique that has some similarity with the learning vector quantization (LVQ) algorithm [27]. In this adaptation process, the border vectors are adaptively modified to support proper distances between them and the class centers and to increase the margins between neighboring border vectors with different class labels. The class centers are also adapted during this process. Subsequent classification is based on the labeled border vectors and class centers. With this approach, the proper number of vectors for each class is generated by the algorithm.

This paper consists of four sections. The BVDA and the consensual BVDA (C-BVDA) are presented in Section II. The data sets that were used and the experimental results obtained with them are presented in Section III. Conclusions and a discussion of future research are presented in Section IV.

II. BVDA

Partitioning the feature space by using some selected reference vectors from a training set is a well-known approach in pattern recognition [28]. In general, there is an optimal number of reference vectors that can be used; increasing the number of reference vectors beyond the optimal number may reduce the generalization performance. To avoid such performance reduction, additional effort should be taken to discard redundant reference vectors. An example of such a procedure is discussed in the grow-and-learn (GAL) algorithm [28].

We propose a new approach to reference vector selection called border vector detection. In developing such an approach, the selected reference vectors are required to satisfy certain geometric considerations. For example, a major property of SVMs is to optimize the margin between the hyperplanes characterizing different classes [9]; the training vectors on the hyperplanes (in a separable problem) are called support vectors. In the proposed algorithm, the same type of consideration leads to adapting the positions of the reference vectors selected from the training set, so that they better represent the decision boundaries while the reference vectors from different classes are kept as far away from each other as possible. These adapted reference vectors are called border vectors.

A. Border Vector Detection

The border vector detection algorithm is developed by considering three requirements.

1) Border vectors should be adapted so that they represent the decision boundaries as well as possible.

2) The initial selection and adaptation procedure should be automatic, with a reasonable number of initial border vectors.

3) Every class should be represented by an appropriate number of border vectors.

In order to choose the initial border vectors, the class centers are used. A particular class center is defined as the training sample nearest to its class mean. Using the class center instead of the class mean is a precaution for classes that are spread in a concave form in the feature space.

Assuming a labeled training data set {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where the training samples are x_i ∈ R^N, i = 1, ..., n, the class labels are y_i ∈ {1, 2, ..., m}, n is the total number of training samples, and m is the number of classes, the class means are calculated as follows:

$$\mathbf{m}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} \mathbf{x}_j, \quad \{\mathbf{x}_j \mid y_j = i\}, \quad i = 1, \ldots, m \tag{1}$$

where n_i is the total number of training samples for class i. The class center c_i for class i is defined as the training sample of class i that is nearest to the class mean, i.e.,

$$\mathbf{c}_i = \mathbf{x}_k, \quad k = \arg\min_j D_j, \quad D_j(\mathbf{m}_i, \mathbf{x}_j) = \|\mathbf{m}_i - \mathbf{x}_j\| = \sqrt{\sum_{d=1}^{N} \left( m_i(d) - x_j(d) \right)^2}, \quad \{\mathbf{x}_j \mid y_j = i\} \tag{2}$$

for 1 ≤ i ≤ m and 1 ≤ j ≤ n. Let B(t) be the set of border vectors in the feature space at step t. For t = 0, B(0) is the set of initial border vectors, chosen as the union of the initial border vector sets B_i, i.e.,

$$B(0) = \bigcup_{0 \le i \le m} B_i. \tag{3}$$
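For concreteness, (1) and (2) can be implemented in a few lines. The following is a minimal NumPy sketch; the array shapes and the function name are ours, not from the original implementation.

```python
import numpy as np

def class_means_and_centers(X, y, m):
    """Class means of (1) and class centers of (2). X: (n, N) samples;
    y: (n,) labels in {1, ..., m}. The center of a class is the training
    sample of that class nearest (in Euclidean distance) to the class mean."""
    means = np.zeros((m, X.shape[1]))
    centers = np.zeros((m, X.shape[1]))
    for i in range(1, m + 1):
        Xi = X[y == i]                          # samples of class i
        means[i - 1] = Xi.mean(axis=0)          # eq. (1)
        d = np.linalg.norm(Xi - means[i - 1], axis=1)
        centers[i - 1] = Xi[np.argmin(d)]       # eq. (2)
    return means, centers
```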

B_0 is chosen as the set of initial class centers. They can be written together with their class labels as

$$B_0 = \{(\mathbf{c}_1, y_1), (\mathbf{c}_2, y_2), \ldots, (\mathbf{c}_m, y_m)\} = \{(\mathbf{b}_1, y_1), (\mathbf{b}_2, y_2), \ldots, (\mathbf{b}_m, y_m)\}. \tag{4}$$

The number of members of the set B_0 is m_0 = m. Additionally, B_i, i = 1, ..., m, is chosen as the set of initial border vectors detected for class i, as discussed next. Assume that the total number of detected border vectors for class i is m_i. In this assignment procedure, R_i = B_0 ∪ B_i is called the reference set for class i, and the number of members of the reference set is m_0 + m_i. At the beginning of the detection procedure for every class, B_i(t = 0) = ∅, and therefore, R_i(t = 0) = B_0. During the detection process for class i = q, every training sample belonging to class q is randomly selected exactly once as an input. Assume that (x_k, y_k = q) is selected. Then, the Euclidean distances between this sample and the current reference set members are given by

$$D_j(\mathbf{x}_k, \mathbf{b}_j) = \|\mathbf{x}_k - \mathbf{b}_j\|, \quad j = 1, \ldots, (m_0 + m_q). \tag{5}$$

The winning border vector is chosen by

$$w = \arg\min_j \{D_j\}. \tag{6}$$

If the label of the winning border vector b_w is y_w ≠ y_k = q, i.e., the sample is misclassified, then (x_k, y_k = q) is chosen as a new reference vector for class q and added to the reference vector set. This can be written as R_q(t) = R_q(t − 1) ∪ {(x_k, y_k = q)}. This procedure is somewhat similar to the ART1 algorithm [29]. The procedure for selecting border vectors is applied to all the classes.
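Under this description, one detection pass might be sketched as follows; the single randomized visit per sample and the misclassification test follow (5) and (6), while the data structures and names are illustrative.

```python
import numpy as np

def detect_border_vectors(X, y, centers, m, seed=0):
    """One detection pass per class: each training sample of class q is
    visited once in random order; if it is misclassified by the current
    reference set R_q (all class centers plus the border vectors already
    found for class q), it becomes a new border vector."""
    rng = np.random.default_rng(seed)
    borders, border_labels = [], []
    for q in range(1, m + 1):
        ref = [c for c in centers]              # R_q starts as B_0 (centers)
        ref_labels = list(range(1, m + 1))
        for k in rng.permutation(np.flatnonzero(y == q)):
            d = np.linalg.norm(np.asarray(ref) - X[k], axis=1)   # eq. (5)
            w = int(np.argmin(d))                                # eq. (6)
            if ref_labels[w] != q:              # misclassified: keep as border
                ref.append(X[k]); ref_labels.append(q)
                borders.append(X[k]); border_labels.append(q)
    return np.array(borders), np.array(border_labels)
```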

We define b as the total number of border vectors and m_i, i = 1, ..., m, as the number of detected border vectors for class i, with m_0 = m being the number of classes. Then, the following holds:

$$b = \sum_{i=0}^{m} m_i = m + \sum_{i=1}^{m} m_i. \tag{7}$$

As an example, a binary classification problem in a 2-D feature space is depicted in Fig. 1. In this figure, the training samples shown with the symbols + and × belong to classes 1 and 2, respectively. The samples detected as initial border vectors are shown as circles. The initial decision boundary based on only the class centers, i.e., B_0, is shown as a line. The border vectors other than the class centers are selected from the misclassified samples, as shown in Fig. 1.

Fig. 1. Binary classification problem. Class centers and selected initial border vectors are depicted as circles, together with the initial border line between classes when the decision is made based on only the class centers.

Fig. 2. Partitioning of the 2-D feature space by using the initial border vectors obtained at the end of the border vector detection procedure.

In Fig. 2, all the detected border vectors B(0) are used to partition the feature space. The next step is to adapt the border vectors so that they more accurately represent the class boundaries. Additionally, if the need for a new border vector arises during the adaptation procedure, additional border vectors are added to the border vector set.

B. Adaptation Procedure

In the adaptation process, competitive learning principles are applied as follows. The initial border vectors B(0) are adaptively modified to support maximum distance between the border vectors and their means and to increase the margins between neighboring border vectors with different class labels.

Fig. 3. Flow graph of the adaptation stage of the BVDA.

The means of the border vectors to be used during adaptation are given by

$$\mathbf{m}_i = \frac{1}{m_i + 1} \sum_{j=1}^{b} \mathbf{b}_j, \quad \{\mathbf{b}_j \mid y_j = i\}, \quad i = 1, \ldots, m \tag{8}$$

$$M_0 = \{(\mathbf{m}_1, y_1), (\mathbf{m}_2, y_2), \ldots, (\mathbf{m}_m, y_m)\}. \tag{9}$$

The means of the border vectors are not taken into account in the final decision process. Hence, at the end of the adaptation process, the means of the border vectors are redundant. During the adaptation process, they are used to decide whether new border vectors should be generated. They are also adapted during learning due to the changes of the border vectors.

The strategy of adaptation can be explained as follows. The nearest border vector b_w(t), when it causes a wrong decision, should be moved farther away from the current training sample. On the other hand, the nearest border vector b_l(t) with the correct class label should be moved closer to the current training sample. The corresponding adaptation process has some similarity with the LVQ algorithm [27]. The adaptation procedure is depicted as a flow graph in Fig. 3.

Let x_j be one of the training samples, with label y_j. Assume that b_w(t) is the nearest border vector to x_j, with label y_{b_w}. If y_j ≠ y_{b_w}, then the adaptation is applied as follows:

$$\mathbf{b}_w(t+1) = \mathbf{b}_w(t) - \eta(t) \cdot \left( \mathbf{x}_j - \mathbf{b}_w(t) \right) \tag{10}$$

$$\mathbf{m}_{y_{b_w}}(t+1) = \left( m_{y_{b_w}} \cdot \mathbf{m}_{y_{b_w}}(t) - \eta(t) \cdot \left( \mathbf{x}_j - \mathbf{b}_w(t) \right) \right) \big/ m_{y_{b_w}}. \tag{11}$$


Fig. 4. Partitioning of the 2-D feature space by using the final border vectors obtained at the end of the adaptation procedure.

On the other hand, if b_l(t) is the nearest border vector to x_j with label y_{b_l} and y_j = y_{b_l}, then

$$\mathbf{b}_l(t+1) = \mathbf{b}_l(t) + \eta(t) \cdot \left( \mathbf{x}_j - \mathbf{b}_l(t) \right) \tag{12}$$

$$\mathbf{m}_{y_{b_l}}(t+1) = \left( m_{y_{b_l}} \cdot \mathbf{m}_{y_{b_l}}(t) + \eta(t) \cdot \left( \mathbf{x}_j - \mathbf{b}_l(t) \right) \right) \big/ m_{y_{b_l}} \tag{13}$$

where η(t) is a decreasing function of time called the learning rate. A good choice for it is given by

$$\eta(t) = \eta_0 e^{-t/\tau}. \tag{14}$$
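A possible reading of one adaptation step, combining (10)–(14), is sketched below. Whether the repel and attract updates are both applied for the same input sample is our interpretation of the flow graph in Fig. 3, and the container types are assumptions.

```python
import numpy as np

def adapt_step(x, yx, B, yB, mean, count, t, eta0=0.1, tau=1000.0):
    """One adaptation step for training sample x with label yx, following
    (10)-(13). B: (b, N) border vectors; yB: (b,) labels; mean, count: dicts
    keyed by class label holding each class's border-vector mean and
    border-vector count. B and mean are modified in place."""
    eta = eta0 * np.exp(-t / tau)                      # eq. (14)
    d = np.linalg.norm(B - x, axis=1)
    w = int(np.argmin(d))                              # nearest border vector
    if yB[w] != yx:                                    # wrong label: repel it
        c = int(yB[w])
        step = eta * (x - B[w])                        # uses b_w(t)
        B[w] -= step                                   # eq. (10)
        mean[c] = (count[c] * mean[c] - step) / count[c]          # eq. (11)
        same = np.flatnonzero(yB == yx)
        if same.size:                                  # attract nearest correct one
            l = int(same[np.argmin(d[same])])
            step = eta * (x - B[l])
            B[l] += step                               # eq. (12)
            mean[yx] = (count[yx] * mean[yx] + step) / count[yx]  # eq. (13)
    else:                                              # nearest is correct: attract
        step = eta * (x - B[w])
        B[w] += step
        mean[yx] = (count[yx] * mean[yx] + step) / count[yx]
```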

During training, after a predefined number of iterations t′, the combination of M(t) and B(t) is used as reference nodes to classify the input training samples. If the nearest node to a selected training sample x_j with label y_j is one of the means of the border vectors, m_w(t > t′) with label y_{m_w}, and if y_j ≠ y_{m_w}, then the wrongly classified training sample x_j is added as an additional border vector, i.e.,

$$B(t+1) = B(t) \cup \{(\mathbf{x}_j, y_j)\}, \quad (t > t'). \tag{15}$$

The corresponding mean vector is also adapted as follows:

$$\mathbf{m}_{y_j}(t+1) = \left( m_{y_j}(t) \cdot \mathbf{m}_{y_j}(t) + \mathbf{x}_j \right) \big/ \left( m_{y_j}(t) + 1 \right) \tag{16}$$

where m_{y_j}(t) is the number of border vectors belonging to class y_j at iteration t. Therefore, m_{y_j}(t + 1) is the number of border vectors in class y_j after the addition of the new border vector.
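The growth step of (15) and (16) can be sketched as follows, continuing the conventions of the previous sketch; treating "the nearest node is a mean" as a comparison of minimum distances is our reading of the text.

```python
import numpy as np

def maybe_add_border_vector(x, yx, B, yB, mean, count):
    """Growth step of (15)-(16), used after t' iterations: if the node
    nearest to x among the border-vector means and the border vectors is a
    mean with the wrong label, x becomes a new border vector."""
    labels = sorted(mean)
    M = np.array([mean[c] for c in labels])
    dM = np.linalg.norm(M - x, axis=1)
    dB = np.linalg.norm(B - x, axis=1)
    if dM.min() < dB.min():                       # nearest node is a mean
        c = labels[int(np.argmin(dM))]
        if c != yx:                               # sample wrongly classified
            B = np.vstack([B, x])                 # eq. (15)
            yB = np.append(yB, yx)
            k = count[yx]
            mean[yx] = (k * mean[yx] + x) / (k + 1)   # eq. (16)
            count[yx] = k + 1
    return B, yB
```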

To illustrate the theory, the result on synthetic data for the chosen binary classification problem in the 2-D space is depicted in Fig. 4. After the adaptation process, the final border vectors, shown as circles, and the final decision boundary, formed as a combination of partial line segments, can be observed in Fig. 4.

During testing with the testing data set, classification is currently based on the minimum distance rule, with the border vectors determined at the end of the adaptation procedure.
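This decision rule amounts to a 1-nearest-neighbor search over the final border vectors, e.g.:

```python
import numpy as np

def classify(X_test, B, yB):
    """Minimum distance to border vector rule: each test sample takes the
    label of its nearest final border vector (a 1-NN search over B)."""
    d = np.linalg.norm(X_test[:, None, :] - B[None, :, :], axis=2)
    return yB[np.argmin(d, axis=1)]
```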

Fig. 5. Block scheme of consensus strategy with k-fold cross validation.

C. Consensus Strategy With Cross Validation

In supervised learning, the training process should be unbiased to reach more accurate results in testing. In the BVDA, accuracy is related to the initialization of the border vectors and the input ordering of the training samples. These dependencies make the classifier a biased decision maker. A consensus strategy can be applied with cross validation to reduce these dependencies. The cross-validation fold number f should be chosen large enough when only a limited number of training samples is available. The block scheme of the consensus strategy with k-fold cross validation is depicted in Fig. 5.

There are a variety of consensual rules that can be applied to combine the k individual results into an improved classification. The reliability of the kth BVDA classifier's result is depicted as a weight λ_k in Fig. 5. This reliability factor can be specified by the consensual rule that is applied. For the majority voting (MV) rule, the weights are chosen equal, and the majority label is taken as the final label. It is also possible to use a nonequal voting structure (qualified MV, QMV) based on training accuracies [30]. By using cross validation as part of the consensual strategy, part of the training samples is used for validation, and the reliability factors can be assigned more precisely based on validation performance. Once the reliability factors are determined, the consensual classification result is obtained by applying a maximum rule with the reliability factors. Additionally, optimal reliability factors (weights λ_k) can be obtained by least squares estimation (LSE) [12].
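For illustration, plain MV and the weighted maximum rule can be sketched for a single sample as follows; the weight vector stands in for whichever reliability factors (QMV or LSE based) are used.

```python
import numpy as np

def consensus_label(votes, weights=None, m=None):
    """Combine the labels assigned to one sample by k classifiers.
    votes: (k,) labels in {1, ..., m}; weights: (k,) reliability factors
    lambda_k. Equal weights reproduce plain majority voting (MV)."""
    votes = np.asarray(votes)
    m = int(m if m is not None else votes.max())
    weights = np.ones(len(votes)) if weights is None else np.asarray(weights)
    score = np.zeros(m)
    for v, w in zip(votes, weights):
        score[v - 1] += w                 # accumulate weighted votes per class
    return int(np.argmax(score)) + 1      # maximum rule over weighted votes
```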

III. EXPERIMENTAL RESULTS

Reliable data sets that are well known in the literature are more suitable for evaluating the performance of the proposed BVDA algorithm than data sets that have not been tested before. Two well-known data sets that are widely encountered in the literature were used in the experiments for this purpose [31], [32]. One additional data set from Turkey [33] was also used to allow proper comparison and to show the robustness of the proposed algorithm. As a consequence, three different data sets, one of them having four different combinations of training samples and corresponding classes, were used in the experiments to demonstrate a large number of results obtained with the BVDA. Additionally, a total of eight different training sample sets, which have many fewer training samples than their originating data sets, were used to illustrate the sensitivity of the methods' performance in the presence of very few training samples for both multispectral and hyperspectral data classification. We observed that the overall classification accuracies obtained with the BVDA are satisfactory. We also observed that rare class members are more accurately classified than with some other classification methods and that the Hughes effect [6] is less harmful with the BVDA than with other conventional statistical methods. This means that the performance of the BVDA with a limited number of training samples is generally higher than the performances of conventional classifiers.

The performance of the BVDA was compared with other classification algorithms, including NN-BP learning [7], SVMs [24], [25], and some statistical classification techniques, such as ML and Fisher linear likelihood (FLL) [34]. The data analysis software called Multispec [31] was used to perform the statistical classification methods. The linear SVM and the SVM with a radial basis kernel function were implemented in Matlab using SVMlight [35] and its Matlab interface [36]. A one-against-one (OAO) multiclassification scheme was adopted in the experiments to compare the SVM's performance to the BVDA's. Only spectral features were taken into account in the comparison of the BVDA with other classification techniques.

A. Choice of Parameters

How to choose the parameters for the BVDA is an important concern. Three parameters need to be assigned: the learning rate η, the time constant τ, and the predefined number of iterations t′. For fast convergence, η = 0.1 and τ = 1000 were found to be satisfactory. Faster training is suitable for relatively less complex classification problems. For more complex classification problems, finer tuning may be necessary, and η = 0.2 and τ = 6750 can be chosen. Parameter selection for the BVDA also has some similarity to that of the self-organizing map [27]. Additionally, extra border vector requirements are controlled by using the predefined number of iterations t′ during the adaptation process; this situation occurs particularly for complex classification problems. In the experiments, t′ = 5000 was chosen. During the training process, a validation set can be used with a pocket algorithm to avoid overfitting.

Determination of proper parameters is also an important concern for most other classification algorithms, such as the SVM classifiers. The SVM is a binary classifier, and the OAO strategy was used to generate a multiclass SVM classifier in this paper. For the OAO strategy, C and γ should be chosen for every binary class combination. We assigned common parameters for each binary SVM classifier empirically, based on the training samples. High overall classification accuracies can be obtained by using common-parameter selection. A similar common-parameter assignment was applied in [24]. One drawback may occur with data sets that have unbalanced numbers of training samples. In this situation, although high overall classification accuracy may be obtained, the accuracies of the rare classes may be lower than the overall classification accuracy. It is possible to use a multiclass SVM classifier by reducing the classification to a single optimization problem. This approach may also require fewer support vectors than a multiclass classification approach based on the combined use of many binary SVMs [37].

NN-BP learning was chosen in the experiments as a well-known NN classifier. One hidden layer with 15 neurons was chosen empirically as the network structure, with a learning rate of 0.01 and a maximum iteration number of 1000. This choice was based on testing four variant network architectures containing 5, 10, or 15 neurons in one hidden layer and 20 neurons divided equally between two hidden layers.

For the KNN classifier that was also used in the experiments, the choice of K is related to the generalization performance of the classifier. Choosing a small K reduces the generalization capability of the KNN classifier, and K = 1 is the choice most sensitive to noisy samples. Therefore, K = 5 was chosen in the experiments.

B. Description of the Data Sets and the Experiments

Three different data sets were used in the experiments. The names of the experiments are chosen to be the same as the names of the data sets: Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) data [31], Satimage data [32], and Karacabey data [33].

1) AVIRIS Data Experiment: The AVIRIS image taken over the Indian Pine site in northwest Indiana in June 1992 [31] was used in the experiments. This is a well-known test image that has often been used for validating hyperspectral image classification techniques [34], [38]. Detailed comparisons were made by using the AVIRIS data set in this paper. We used the whole scene, consisting of the full 145 × 145 pixels, with two different class combinations and two different spectral band combinations. The training sample sets with 17 classes (pixels with class labels of mixture type were considered for classification) and nine classes (the more significant classes from the statistical viewpoint) were generated with different combinations of nine spectral bands (to illustrate multispectral data classification performance) and 190 spectral bands (30 channels were discarded from the original 220 spectral channels because of atmospheric problems). The nine spectral bands used in data sets 1 and 3 were obtained by using parametric PP based on subset feature selection via band grouping. Table I shows the numbers of training and testing samples for the 17- and 9-class sets used in the experiments. Data sets 1 and 2 contain background and building–grass–tree classes, which are of mixture type. Therefore, these two classification experiments involve more complex classification problems than the other data sets. Additionally, data sets 1 and 2 have rare class members, which have a limited number of training samples (alfalfa, oats, stone–steel towers, etc.). Statistically meaningful classes were chosen for data sets 3 and 4. Additionally, four training sample sets with fewer training samples were chosen from each of data sets 1 and 4 to illustrate the performance of the classification methods with fewer training samples for multispectral and hyperspectral data, respectively.


TABLE I. NUMBERS OF TRAINING AND TESTING SAMPLES USED IN AVIRIS DATA EXPERIMENTS

TABLE II. REDUCED NUMBERS OF TRAINING AND TESTING SAMPLES USED IN AVIRIS DATA TO ILLUSTRATE THE PERFORMANCE OF THE CLASSIFICATION ALGORITHMS WITH FEWER TRAINING SAMPLES

The numbers of training samples used in these experiments are depicted in Table II. These additional training sample sets were named data sets 1.1–1.4 and data sets 4.1–4.4, respectively.

In the BVDA, classification is currently based on the minimum distance rule with the finalized border vectors for the testing data. This rule can be thought of as a 1-nearest-neighbor rule with border vectors. The aim of the BVDA is to occupy the feature space by using a minimum number of border vectors. Therefore, the nearest border vectors typically have different class labels, causing K > 1 to yield worse results. For example, the classification accuracies for data set 1 are shown in Fig. 6 for different K values. The highest accuracies were obtained with K = 1, as expected.

The average testing accuracies obtained are shown in Fig. 7 for the AVIRIS data experiments. The ML classifier (MLC) results were obtained only for data sets 1 and 3 because of the requirement of additional training samples to accurately estimate the sample covariance matrix for every class. A higher number of training samples is needed for proper sample covariance matrix estimation in the high-dimensional feature space (data sets 2 and 4).

Fig. 6. Training accuracies obtained by the BVDA with various K values for AVIRIS data sets 1–4 when KNN with border vectors was applied after adaptation.

Fig. 7. Average testing accuracies for AVIRIS data sets 1–4.

The FLL algorithm was also used as an example of statistical classifiers. The inverse of the common covariance matrix is used in the discriminant function of the FLL. Therefore, all the classes are assumed to have the same variance and correlation structure.

The results obtained with the KNN were satisfactory in the lower dimensional feature space (data sets 1 and 3).


Fig. 8. Accuracies obtained by the C-BVDA versus fold number for four different consensual rules.

Fig. 9. Processing times for the C-BVDA versus fold number for AVIRIS data sets 1–4.

Fig. 10. Processing times for AVIRIS data sets 1–4 for the RBF-SVM, C-BVDA, and BVDA.

In the original KNN algorithm, all the training set members are used as reference vectors. Therefore, the testing time is longer than with other conventional classification techniques. However, spatial search methods such as kd-trees [39], [40], R-trees [41], metric trees [42], and ball trees [43] can be used to reduce the computational cost of the KNN and render faster implementation possible without resorting to approximation. Additionally, fast KNN classification using the cluster-space approach has been reported, with some improvement of classification performance while achieving a large reduction of processing time [44].

For the complex classification problems (data sets 1 and 2), we obtained relatively poor classification results with NN-BP learning. For the statistically meaningful data sets (data sets 3 and 4), we obtained satisfactory results with the NN-BP. This suggests that an approximately equal number of training samples should be selected for each class to achieve better results with the NN-BP.

TABLE III. AVERAGE NUMBER OF BORDER VECTORS OBTAINED WITH THE BVDA

TABLE IV. CLASS-BY-CLASS ACCURACIES OBTAINED WITH AVIRIS DATA SET 1

TABLE V. CLASS-BY-CLASS ACCURACIES OBTAINED WITH AVIRIS DATA SET 4

In general, the differences between the accuracies for data sets 2 and 1 and for data sets 4 and 3 illustrate the robustness of the algorithms with respect to the Hughes effect (see Fig. 7). Based on this consideration, the radial basis function SVM (RBF-SVM), C-BVDA, BVDA, and linear SVM are observed to be the most robust algorithms with respect to the Hughes effect. As observed in Fig. 7, these algorithms also produce case-independent results.


Fig. 11. (a) Ground truth of the AVIRIS data set with 17 classes. (b) Thematic map of the BVDA result with data set 1. (c) Thematic map of the C-BVDA result with data set 2. (d) Thematic map of the BVDA result with data set 3. (e) Thematic map of the C-BVDA result with data set 4.

In the low-dimensional feature space, we obtained higher classification accuracies with the BVDA and the C-BVDA. The performance of the BVDA in the high-dimensional feature space was also satisfactory, as shown in Fig. 7. In the high-dimensional feature space, we obtained the highest classification accuracies with the RBF-SVM and the C-BVDA. The accuracies obtained with the C-BVDA versus fold number k are shown in Fig. 8 for data set 1 with four different consensual rules. These rules are MV; QMV based on the overall classification accuracies obtained on each validation set (every fold) (QMV-1); QMV based on the class-by-class accuracies obtained on each validation set (QMV-2); and optimal weight selection based on LSE. The best accuracies were obtained with f = 10 and the MV rule for data set 1.

Processing time is also an important concern for our proposed consensual strategy and is related to the fold number f. The total processing times for the C-BVDA versus fold number f are shown in Fig. 9 for data sets 1–4. A more complex classification problem (data set 2) needs much more time compared with the other data sets. This is directly related to the number of classes and the feature size. The total processing times are shown in Fig. 10 for the RBF-SVM, BVDA, and C-BVDA. The RBF-SVM used more processing time than the C-BVDA and the BVDA. As observed in Fig. 10, the processing time of the BVDA is reasonable.

The average number of border vectors used in the BVDA algorithm is shown in Table III, together with the processing times for data sets 1–4. The border vector detection procedure supplies the number of border vectors required, which depends on the complexity of the problem. For detailed analysis, class-by-class accuracies are also given for data sets 1 and 4 in Tables IV and V, respectively.

The relatively low accuracy of class 1 (the background class, which is of mixture type) in data set 1 reduced the overall classification accuracy for the MLC, as observed in Table IV. The reduction effects of the mixture-type classes are much more harmful with the FLL in data set 1. We can conclude that the presence of mixture-type class members reduces the overall classification accuracy of the statistical classifiers, as observed with data set 1.

NN-BP learning is based on minimizing the overall squared error. As a result, the rare class members are not detected in data set 1, as expected. The performance of the NN-BP was satisfactory for data set 4, which has only statistically meaningful classes, as observed in Table V. One interesting observation was the very low accuracy of class 16 (building–grass–tree, which is a mixture-type rare class member) in data set 1 with the RBF-SVM classifier, as observed in Table IV. The common-parameter assignment for each binary SVM classifier may have caused this result.

With the BVDA and the C-BVDA, we obtained very satisfactory results for both data sets 1 and 4, as observed in Tables IV and V. The BVDA reached high overall classification accuracy while correctly classifying rare class members, as observed in Table IV. Furthermore, the C-BVDA was used to enhance the classification accuracy of the single BVDA classifier by using cross validation in the consensual strategy with reasonable processing time, as observed in Tables IV and V. The thematic maps obtained with the BVDA and the C-BVDA are depicted in Fig. 11(b) and (c) for data sets 1 and 2 and in Fig. 11(d) and (e) for data sets 3 and 4, respectively.

Additional training sample sets (Table II) originating from data sets 1 and 4 and containing fewer training samples were used to demonstrate the sensitivity of the classification methods' performance to the presence of a limited number of training samples.

Fig. 12. Average testing accuracies for AVIRIS data sets 1 and 1.1–1.4.

Fig. 13. Average testing accuracies for AVIRIS data sets 4 and 4.1–4.4.

The average testing accuracies for data sets 1 and 1.1–1.4, which contain 17 classes with nine spectral bands, are shown in Fig. 12. The maximum testing accuracies were obtained with the C-BVDA (73.06%, 71.20%, 70.53%, and 67.46% for data sets 1 and 1.1–1.3, respectively), except for data set 1.4. We obtained a satisfactory dynamic range of accuracies and results with the RBF-SVM in this experiment. With data set 1.4, which contains the smallest total number of training samples (661), we obtained the best result with the RBF-SVM (66.11%) and satisfactory results with the C-BVDA (64.32%). Similar effects were observed with the RBF-SVM, BVDA, and C-BVDA when using data sets 1 and 1.1–1.3. Similar results were obtained with the NN-BP (59.5%), MLC (59.32%), and KNN (58.95%) when using data set 1.4, and the worst result was obtained with the FLL (46.33%). The average testing accuracies for data sets 4 and 4.1–4.4, which contain nine classes with 190 spectral bands, are shown in Fig. 13. We obtained higher accuracies with the RBF-SVM and C-BVDA when using data sets 4 and 4.1–4.4, demonstrating robust performance in the presence of a limited number of training samples of hyperspectral data. The highest classification accuracy was obtained with the RBF-SVM (79.79%) when using data set 4.4, which contains the fewest training samples (405). We also obtained a satisfactory result with the C-BVDA (78.29%) when using data set 4.4 and a relatively high accuracy with the BVDA (75.41%). The other classification methods used in the experiments performed similarly with data sets 4 and 4.1–4.3, as shown in Fig. 13: FLL (69.25%), KNN (71.40%), and NN-BP (72.24%). They were more affected by the limited number of training samples (data set 4.4).


TABLE VI. CLASS-BY-CLASS ACCURACIES OBTAINED WITH THE SATIMAGE DATA SET

Fig. 14. (a) Color composite image of the Karacabey data set for bands 2–4. (b) Ground truth of the Karacabey data set with nine classes. (c) Thematic map obtained with the BVDA and the Karacabey data set.

2) Satimage Data Experiment: The Satimage data set is a part of the Landsat MSS data and contains six different classes. Training and testing sets of 4435 and 2000 labeled samples, respectively, were obtained from the Statlog web site [32]. The training set contains a statistically meaningful number of samples for each class, as shown in Table VI. Four spectral bands were used with a neighboring-pixel feature extraction method. As a result, 4 × 9 = 36 features were assigned to each pixel.
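The 4 × 9 = 36 count suggests the nine values of a 3 × 3 neighborhood per band; a sketch under that assumption (the window size and feature layout are not stated here and are assumed):

```python
import numpy as np

def neighborhood_features(img):
    """img: (H, W, 4) four-band image. Returns (H-2, W-2, 36) features: the
    flattened 3x3 neighborhood of each interior pixel, across all bands."""
    H, W, bands = img.shape
    out = np.empty((H - 2, W - 2, 9 * bands))
    for dy in range(3):
        for dx in range(3):
            k = dy * 3 + dx
            out[:, :, k * bands:(k + 1) * bands] = img[dy:H - 2 + dy, dx:W - 2 + dx, :]
    return out
```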

The highest accuracy in previous works with this data set was obtained with the RBF-SVM [37]. In this experiment, the RBF-SVM classifier, the NN-BP, and the MLC were used for comparison with the BVDA and the C-BVDA. The aim of this experiment was to demonstrate the robustness of the results obtained with the BVDA and to illustrate the performance of the BVDA on additional types of remotely sensed data in comparison with other methods.


TABLE VII. NUMBER OF SAMPLES FOR TRAINING, TESTING, AND THE WHOLE SCENE

The parameters of the BVDA were chosen as η = 0.2 for the learning rate and τ = 6750 for the time constant in this experiment. This parameter selection makes slow convergence and fine tuning possible. The classification accuracy of the RBF-SVM classifier (C = 16 and γ = 1) with the OAO strategy was reported as 91.3% for the Satimage testing data set in [37]. In this experiment, C = 6 and γ = 1.5 were chosen by using a pattern search algorithm with a validation set. Fifteen neurons in one hidden layer were chosen empirically, with a learning rate of 0.01, as the network parameters for NN-BP learning. The activation function of the NN was chosen as the sigmoid function. In comparison, the testing results obtained with the C-BVDA and the RBF-SVM were almost the same and satisfactory for the Satimage data set, as observed in Table VI. Approximately 4% lower average accuracy was obtained for the NN-BP compared with the average of the results obtained by the RBF-SVM, the BVDA, and the C-BVDA. Additionally, a much less accurate result (27.48%) was obtained for the class damp gray soil by the MLC; the MLC was thus not sufficient for detailed class discrimination in this experiment. The accuracies obtained for this problematic class were 54.04%, 66.82%, 67.29%, and 68.72% with the NN-BP, RBF-SVM, BVDA, and C-BVDA, respectively. Therefore, the most accurate results for this specific class were obtained by the BVDA and the C-BVDA.

3) Karacabey Data Experiment: The Karacabey data set is a Landsat-7 Enhanced Thematic Mapper Plus (ETM+) image taken over the Karacabey region, Bursa, in northwest Turkey, in July 2000 [33]. Six visible and infrared bands (bands 1–5 and 7) with 30-m resolution were used as spectral features. Previous work was used as auxiliary information for the extraction of the ground reference data [33]. A color composite of the subimage is shown in Fig. 14(a), and the ground truth map used in our experiments is depicted in Fig. 14(b).

Nine classes were utilized, while the background and parcel boundaries [depicted as w0 and w10 in the ground truth map; see Fig. 14(b)] were discarded from the evaluation. The discarding of parcel boundaries supports the selection of pure pixels. Therefore, pixels containing a mixture of spectral responses were discarded in this experiment. The same selection process was applied as in [33].

The descriptions of the classes and the numbers of class samples used in the experiment are given in Table VII.

TABLE VIII. AVERAGE CLASSIFICATION RESULTS WITH THE KARACABEY DATA SET

The average training and testing accuracies, as well as the accuracies obtained on the whole scene, are shown in Table VIII. Balanced numbers of training and testing samples were selected randomly. In this experiment, we compared the BVDA with the SVM classifiers and the MLC.

As we observe in Table VIII, the results obtained with the BVDA and the C-BVDA (67.41% and 68.80%, respectively, for the whole scene) are satisfactory in comparison with the MLC (63.80%). We obtained a better result (69.20% for the whole scene) with the RBF-SVM classifier in this experiment. The average classification accuracies are less than 70%; using only one set of multispectral data is not sufficient for discriminating detailed class types. In the previous work [33], three different scenes acquired over a period of approximately one month were used for classification. This indicates that multitemporal data classification can be used to improve the classification accuracy further. The thematic map of the BVDA result for the Karacabey data set is depicted in Fig. 14(c).

IV. CONCLUSION AND FUTURE WORK

In this paper, we have proposed a new algorithm for the classification of remote sensing images. The method first makes use of detected border vectors as part of an adaptation process aimed at better describing the classes and then uses the minimum distance to border vector rule for classification. The concept of border vectors proposed in this paper has some similarity with support vectors in SVM classifiers. However, the procedure for the initialization of the border vectors and the subsequent adaptation process to find the final border vectors is completely different. The competitive learning principle is applied during the adaptation procedure. In this sense, the adaptation algorithm has some similarity with the LVQ algorithm. The reason for this adaptation strategy is to satisfy the maximum margin principle adaptively. It may be useful to mention some other classification algorithms that have some similarity to the BVDA. The GAL algorithm randomly chooses a subset of training samples to satisfy a predefined training accuracy until reaching a predefined iteration number, without any geometric consideration; the border vectors chosen in the BVDA are different. The KNN algorithm uses the whole training set directly as reference vectors. In the BVDA algorithm, the adaptation procedure changes the initial values of the border vectors, which are selected from the original training samples via the border vector detection procedure. Therefore, the adaptation procedure improves the generalization capability of the BVDA classifier.

The BVDA, which utilizes training samples near decision boundaries, is a nonparametric classifier that is robust against the Hughes effect and well suited for remote sensing applications. The C-BVDA, which combines the individual results of BVDAs under a consensual rule via cross validation, was introduced in order to improve the performance of the individual BVDA classifier. Additionally, appropriate safe rejection schemes [20] can be applied to the BVDA to reach higher classification accuracies.

In the spatial domain, there are also a variety of applications suitable for processing with the BVDA, such as target detection and contour specification. In conclusion, the BVDA can be applied in various suitable applications in remote sensing, image processing, and other classification problems.

ACKNOWLEDGMENT

The authors would like to thank the Global Land Cover Facility for providing the Landsat-7 ETM+ data used as the Karacabey data set in the experiments.

REFERENCES

[1] P. M. Atkinson and A. R. L. Tatnall, “Neural networks in remote sensing,”Int. J. Remote Sens., vol. 18, no. 4, pp. 699–709, Mar. 1997.

[2] G. M. Foody and M. K. Arora, “An evaluation of some factors affectingthe accuracy of classification by an artificial neural network,” Int. J.Remote Sens., vol. 18, no. 4, pp. 799–810, Mar. 1997.

[3] P. M. Mather, Computer Processing of Remotely Sensed Images, 2nd ed. Chichester, U.K.: Wiley, 1999.

[4] T. G. Van Niel and B. Datt, “On the relationship between training sample size and data dimensionality of broadband multi-temporal classification,” Remote Sens. Environ., vol. 98, pp. 416–425, 2005.

[5] K. Fukunaga, Introduction to Statistical Pattern Recognition. San Diego, CA: Academic, 1990, pp. 99–109.

[6] G. F. Hughes, “On the mean accuracy of statistical pattern recognizers,” IEEE Trans. Inf. Theory, vol. IT-14, no. 1, pp. 55–63, Jan. 1968.

[7] G. M. Foody, “The significance of border training patterns in classification by a feedforward neural network using back propagation learning,” Int. J. Remote Sens., vol. 20, no. 18, pp. 3549–3562, Dec. 1999.

[8] G. M. Foody and A. Mathur, “The use of small training sets containing mixed pixels for accurate hard image classification: Training on mixed spectral responses for classification by a SVM,” Remote Sens. Environ., vol. 103, no. 2, pp. 179–189, Jul. 2006.

[9] V. N. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.

[10] M. S. Dawson and M. T. Manry, “Surface parameter retrieval using fast learning neural networks,” Remote Sens. Rev., vol. 7, no. 1, pp. 1–18, 1993.

[11] H. Karakahya, B. Yazgan, and O. K. Ersoy, “A spectral-spatial classification algorithm for multispectral remote sensing data,” in Proc. 13th Int. Conf. Artif. Neural Netw., Istanbul, Turkey, Jun. 2003, pp. 1011–1017.

[12] J. A. Benediktsson, J. R. Sveinsson, O. K. Ersoy, and P. H. Swain, “Parallel consensual neural networks,” IEEE Trans. Neural Netw., vol. 8, no. 1, pp. 54–64, Jan. 1997.

[13] J. Lee and O. Ersoy, “Consensual and hierarchical classification of remotely sensed multispectral images,” in Proc. IGARSS, Denver, CO, Jul. 31–Aug. 4, 2006, pp. 3915–3918.

[14] J. P. Hoffbeck and D. A. Landgrebe, “Covariance matrix estimation and classification with limited training data,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 18, no. 7, pp. 763–767, Jul. 1996.

[15] S. Tadjudin and D. A. Landgrebe, “Covariance estimation with limited training samples,” IEEE Trans. Geosci. Remote Sens., vol. 37, no. 4, pp. 2113–2118, Jul. 1999.

[16] T. K. Moon, “The expectation-maximization algorithm,” IEEE Signal Process. Mag., vol. 13, no. 6, pp. 47–60, Nov. 1996.

[17] C. Lee and D. A. Landgrebe, “Feature extraction based on decision boundaries,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, no. 4, pp. 388–400, Apr. 1993.

[18] L. O. Jimenez and D. A. Landgrebe, “Hyperspectral data analysis and supervised feature reduction via projection pursuit,” IEEE Trans. Geosci. Remote Sens., vol. 37, no. 6, pp. 2653–2667, Nov. 1999.

[19] T. M. Cover and P. E. Hart, “Nearest neighbor pattern classification,” IEEE Trans. Inf. Theory, vol. IT-13, no. 1, pp. 21–27, Jan. 1967.

[20] S. Cho, O. K. Ersoy, and M. R. Lehto, “Parallel, self-organizing, hierarchical neural networks with competitive learning and safe rejection schemes,” IEEE Trans. Circuits Syst., vol. 40, no. 9, pp. 556–567, Sep. 1993.

[21] J. A. Benediktsson, J. R. Sveinsson, and P. H. Swain, “Hybrid consensus theoretic classification,” IEEE Trans. Geosci. Remote Sens., vol. 35, no. 4, pp. 833–843, Jul. 1997.

[22] J. A. Benediktsson and I. Kanellopoulos, “Classification of multisource and hyperspectral data based on decision fusion,” IEEE Trans. Geosci. Remote Sens., vol. 37, no. 3, pp. 1367–1376, May 1999.

[23] H.-M. Chee and O. K. Ersoy, “A statistical self-organizing learning system for remote sensing classification,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 8, pp. 1890–1900, Aug. 2005.

[24] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machines,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004.

[25] G. M. Foody and A. Mathur, “A relative evaluation of multiclass image classification by support vector machines,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 6, pp. 1335–1343, Jun. 2004.

[26] G. C. Valls, L. G. Chova, J. M. Mari, J. V. Frances, and J. C. Marivilla, “Composite kernels for hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett., vol. 3, no. 1, pp. 93–97, Jan. 2006.

[27] T. Kohonen, “The self-organizing map,” Proc. IEEE, vol. 78, no. 9, pp. 1464–1480, Sep. 1990.

[28] E. Alpaydin, “GAL: Networks that grow when they learn and shrink when they forget,” Int. Comput. Sci. Inst., Berkeley, CA, Tech. Rep. TR 91-032, 1991.

[29] G. A. Carpenter and S. Grossberg, “A massively parallel architecture for a self-organizing neural pattern recognition machine,” Comput. Vis. Graph. Image Process., vol. 37, no. 1, pp. 54–115, 1987.

[30] L. O. Jimenez, A. M. Morell, and A. Creus, “Classification of hyperdimensional data based on feature and decision fusion approaches using projection pursuit, majority voting, and neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 37, no. 3, pp. 1360–1366, May 1999.

[31] D. Landgrebe and L. Biehl, MultiSpec and AVIRIS NW Indiana’s Indian Pines 1992 data set. [Online]. Available: http://www.ece.purdue.edu/~biehl/MultiSpec/index.html

[32] Satimage Data Set. [Online]. Available: http://www.liacc.up.pt/ML/old/statlog/datasets.html

[33] M. Arikan, “Parcel based crop mapping through multi-temporal masking classification of Landsat 7 images in Karacabey, Turkey,” in Proc. ISPRS Symp., Istanbul, Turkey, 2004, Int. Archives Photogramm., Remote Sens. Spatial Inf. Sci., vol. 34, p. 1085.

[34] D. A. Landgrebe, Signal Theory and Methods in Multispectral Remote Sensing. Hoboken, NJ: Wiley, 2003.

[35] T. Joachims, “Making large-scale SVM learning practical,” in Advances in Kernel Methods—Support Vector Learning, B. Schölkopf, C. Burges, and A. Smola, Eds. Cambridge, MA: MIT Press, 1999.

[36] A. Schwaighofer, MATLAB Interface to SVM Light. Graz, Austria: Inst. Theoretical Comput. Sci., Graz Univ. Technol. [Online]. Available: http://www.cis.tugraz.at/igi/aschwaig/software.html

[37] C.-W. Hsu and C.-J. Lin, “A comparison of methods for multiclass support vector machines,” IEEE Trans. Neural Netw., vol. 13, no. 2, pp. 415–425, Mar. 2002.

[38] P. K. Varshney and M. K. Arora, Advanced Image Processing Techniques for Remotely Sensed Hyperspectral Data. New York: Springer-Verlag, 2004.

[39] J. H. Friedman, J. L. Bentley, and R. A. Finkel, “An algorithm for finding best matches in logarithmic expected time,” ACM Trans. Math. Softw., vol. 3, no. 3, pp. 209–226, Sep. 1977.

[40] F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction. New York: Springer-Verlag, 1985.

[41] A. Guttman, “R-trees: A dynamic index structure for spatial searching,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, Jun. 1984, pp. 47–57.

[42] J. K. Uhlmann, “Satisfying general proximity/similarity queries with metric trees,” Inf. Process. Lett., vol. 40, no. 4, pp. 175–179, Nov. 1991.

[43] S. M. Omohundro, “Bumptrees for efficient function, constraint, and classification learning,” in Advances in Neural Information Processing Systems 3, R. P. Lippmann, J. E. Moody, and D. S. Touretzky, Eds. San Mateo, CA: Morgan Kaufmann, 1991.

[44] X. Jia and J. A. Richards, “Fast k-NN classification using the cluster-space approach,” IEEE Geosci. Remote Sens. Lett., vol. 2, no. 2, pp. 225–228, Apr. 2005.


N. Gökhan Kasapoğlu (S’01) received the B.Sc. degree from Yıldız Technical University, Istanbul, Turkey, in 1995 and the M.Sc. degree from Istanbul Technical University (ITU), Istanbul, in 2000, both in electronics and communication engineering.

From 1999 to 2005, he was a Senior System Engineer with the Center for Satellite Communication and Remote Sensing (ITU-CSCRS). He is currently a Research Assistant in the Department of Electronics and Communication Engineering, ITU. His research interests include pattern recognition in remote sensing, neural networks, synthetic aperture radar (SAR) processing, and SAR raw data compression.

Okan K. Ersoy (M’86–SM’90–F’00) received the B.S.E.E. degree from Robert College (currently Bogaziçi University), Istanbul, Turkey, in 1967 and the M.S. degree and the second M.S. and Ph.D. degrees from the University of California, Los Angeles, in 1968 and 1972, respectively.

He is currently a Professor of electrical and computer engineering with Purdue University, West Lafayette, IN. His current research interests include statistical and computational intelligence, digital signal/image processing and recognition, transform and time-frequency methods, and their applications in remote sensing, bioinformatics, imaging, diffractive optics, management, and distant learning. He has published approximately 240 papers in his areas of research. He is the holder of four patents.

Prof. Ersoy is a Fellow of the Optical Society of America.