10
Using k-nn and discriminant analyses to classify rain forest types in a Landsat TM image over northern Costa Rica Sirpa Thessler a, , Steven Sesnie b,c , Zayra S. Ramos Bendaña c , Kalle Ruokolainen d , Erkki Tomppo a , Bryan Finegan c a Finnish Forest Research Institute (Metla), Unioninkatu 40 A, 00170 Helsinki, Finland b University of Idaho, College of Natural Resources, Forest Resources Department, PO Box 1133, Moscow, Idaho 83844-1133, United States c Tropical Agricultural Research and Higher Education Center (CATIE), 7170 Turrialba, Costa Rica d University of Turku, Department of Biology, 20014 Turku, Finland Received 29 March 2007; received in revised form 22 October 2007; accepted 18 November 2007 Abstract Conservation and land use planning in humid tropical lowland forests urgently need accurate remote sensing techniques to distinguish among floristically different forest types. We investigated the degree to which floristically and structurally defined Costa Rican lowland rain forest types can be accurately discriminated by a non-parametric k nearest neighbors (k-nn) classifier or linear discriminant analysis. Pixel values of Landsat Thematic Mapper (TM) image and Shuttle Radar Topography Mission (SRTM) elevation model extracted from segments or from 5×5 pixel windows were employed in the classifications. 104 field plots were classified into three floristic and one structural type of forest (regrowth forest). Three floristically defined forest types were formed through clustering the old-growth forest plots (n = 52) by their species specific importance values. An error assessment of the image classification was conducted via cross-validation and error matrices, and overall percent accuracy and Kappa scores were used as measures of accuracy. Image classification of the four forest types did not adequately distinguish two old-growth forest classes, so they were merged into a single forest class. The resulting three forest classes were most accurately classified by the k-nn classifier using segmented image data (overall accuracy 91%). The second best method, with respect to accuracy, was the k-nn with 5 × 5 pixel windows data (89% accuracy), followed by the canonical discriminant analysis using the 5×5 pixel window data (86%) and the segment data (82%). We conclude the k-nn classifier can accurately distinguish floristically and structurally different rain forest types. The classification accuracies were higher for the k-nn classifier than for the canonical discriminant analysis, but the differences in Kappa scores were not statistically significant. The segmentation did not increase classification accuracy in this study. © 2008 Elsevier Inc. All rights reserved. Keywords: Discriminant analysis; Forest classification; k nearest neighbours; Landsat TM 5; Remote sensing; Satellite image; Segmentation; Tropical rain forest 1. Introduction Satellite images and aerial photographs are a primary source of vegetation data in remote tropical regions where other data sources, such as maps of vegetation, soil and topography are typically unavailable (Ferrier, 2002). Floristically diverse tropical rain forest areas often lack extensive field data due to logistical problems and difficulty in describing extremely diverse or poorly known flora (Ruokolainen et al., 1997). Once related to species composition or structural conditions on the ground, satellite images provide a low-cost source of data for distinguishing and mapping rain forest types, a prerequisite for conservation planning and sustainable forest management (Favrichon, 1998; Ferrier & Guisan, 2006; Margules & Pressey, 2000). Tropical rain forest types are usually classified by broad physiognomic and structural features (FAO, 1996; IBGE 1989; Available online at www.sciencedirect.com Remote Sensing of Environment 112 (2008) 2485 2494 www.elsevier.com/locate/rse Corresponding author. MTT Agrifood Research Finland, Luutnantintie 13, 00410 Helsinki, Finland. Tel.: +358 9 5608 6201. E-mail addresses: [email protected] (S. Thessler), [email protected] (S. Sesnie), [email protected] (Z.S. Ramos Bendaña), [email protected] (K. Ruokolainen), [email protected] (E. Tomppo), [email protected] (B. Finegan). 0034-4257/$ - see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.rse.2007.11.015

Using k-nn and discriminant analyses to classify rain ... · Kappa scores were used as measures of accuracy. Image classification of the four forest types did not adequately distinguish

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Using k-nn and discriminant analyses to classify rain ... · Kappa scores were used as measures of accuracy. Image classification of the four forest types did not adequately distinguish

Available online at www.sciencedirect.com

t 112 (2008) 2485–2494www.elsevier.com/locate/rse

Remote Sensing of Environmen

Using k-nn and discriminant analyses to classify rain forest types in aLandsat TM image over northern Costa Rica

Sirpa Thessler a,⁎, Steven Sesnie b,c, Zayra S. Ramos Bendaña c, Kalle Ruokolainen d,Erkki Tomppo a, Bryan Finegan c

a Finnish Forest Research Institute (Metla), Unioninkatu 40 A, 00170 Helsinki, Finlandb University of Idaho, College of Natural Resources, Forest Resources Department, PO Box 1133, Moscow, Idaho 83844-1133, United States

c Tropical Agricultural Research and Higher Education Center (CATIE), 7170 Turrialba, Costa Ricad University of Turku, Department of Biology, 20014 Turku, Finland

Received 29 March 2007; received in revised form 22 October 2007; accepted 18 November 2007

Abstract

Conservation and land use planning in humid tropical lowland forests urgently need accurate remote sensing techniques to distinguish amongfloristically different forest types. We investigated the degree to which floristically and structurally defined Costa Rican lowland rain forest typescan be accurately discriminated by a non-parametric k nearest neighbors (k-nn) classifier or linear discriminant analysis. Pixel values of LandsatThematic Mapper (TM) image and Shuttle Radar Topography Mission (SRTM) elevation model extracted from segments or from 5×5 pixelwindows were employed in the classifications. 104 field plots were classified into three floristic and one structural type of forest (regrowth forest).Three floristically defined forest types were formed through clustering the old-growth forest plots (n=52) by their species specific importancevalues. An error assessment of the image classification was conducted via cross-validation and error matrices, and overall percent accuracy andKappa scores were used as measures of accuracy. Image classification of the four forest types did not adequately distinguish two old-growth forestclasses, so they were merged into a single forest class. The resulting three forest classes were most accurately classified by the k-nn classifier usingsegmented image data (overall accuracy 91%). The second best method, with respect to accuracy, was the k-nn with 5×5 pixel windows data(89% accuracy), followed by the canonical discriminant analysis using the 5×5 pixel window data (86%) and the segment data (82%). Weconclude the k-nn classifier can accurately distinguish floristically and structurally different rain forest types. The classification accuracies werehigher for the k-nn classifier than for the canonical discriminant analysis, but the differences in Kappa scores were not statistically significant. Thesegmentation did not increase classification accuracy in this study.© 2008 Elsevier Inc. All rights reserved.

Keywords: Discriminant analysis; Forest classification; k nearest neighbours; Landsat TM 5; Remote sensing; Satellite image; Segmentation; Tropical rain forest

1. Introduction

Satellite images and aerial photographs are a primary sourceof vegetation data in remote tropical regions where other data

⁎ Corresponding author. MTT Agrifood Research Finland, Luutnantintie 13,00410 Helsinki, Finland. Tel.: +358 9 5608 6201.

E-mail addresses: [email protected] (S. Thessler),[email protected] (S. Sesnie), [email protected] (Z.S. Ramos Bendaña),[email protected] (K. Ruokolainen), [email protected] (E. Tomppo),[email protected] (B. Finegan).

0034-4257/$ - see front matter © 2008 Elsevier Inc. All rights reserved.doi:10.1016/j.rse.2007.11.015

sources, such as maps of vegetation, soil and topography aretypically unavailable (Ferrier, 2002). Floristically diverse tropicalrain forest areas often lack extensive field data due to logisticalproblems and difficulty in describing extremely diverse or poorlyknown flora (Ruokolainen et al., 1997). Once related to speciescomposition or structural conditions on the ground, satellite imagesprovide a low-cost source of data for distinguishing and mappingrain forest types, a prerequisite for conservation planning andsustainable forest management (Favrichon, 1998; Ferrier &Guisan, 2006; Margules & Pressey, 2000).

Tropical rain forest types are usually classified by broadphysiognomic and structural features (FAO, 1996; IBGE 1989;

Page 2: Using k-nn and discriminant analyses to classify rain ... · Kappa scores were used as measures of accuracy. Image classification of the four forest types did not adequately distinguish

2486 S. Thessler et al. / Remote Sensing of Environment 112 (2008) 2485–2494

Prance, 1989), because they often appear structurally relativelyhomogeneous at landscape scale. In addition, the field data isusually too coarse for distinguishing floristic differences causedby sampling error from differences which might be due to someenvironmental factors. Consequently, only a few categorieshave been used to discriminate rain forest types using remotesensing methods (Achard et al., 2001; Kleinn et al., 2002;Mayaux et al., 1999; Sader & Joyce, 1988; Skole & Tucker,1993; Stibig et al., 2003). More recently, several studies haveshown that rain forests have a lot of soil-related fine-grained(resolution of ca. 1 km and less) floristic and structuralvariation, which is not displayed by the existing vegetationmaps (Clark et al., 1998, 1999; Paoli et al., 2006; Phillips et al.,2003; Potts et al., 2002; Ruokolainen et al., 1998, 2007;Tuomisto & Poulsen, 1996; Tuomisto et al., 1995, 2002,2003b).

Relatively few studies have attempted to link fine-grainedrain forest composition with remotely sensed data. Early studiesusing Landsat TM imagery have discriminated swamp and

Fig. 1. Location of the field plot sites in the Northern Co

floodplain forest, but not upland rain forest types (Foody & Hill,1996; Hill, 1999). More recent studies suggest that spectral datafrom satellite images have the potential to be used to mappreviously undetected floristic differences in tropical rain forests(Foody & Cutler, 2003, 2006; Ruokolainen et al., 1998;Salovaara et al., 2005; Thenkabail et al., 2004; Thessler et al.,2005; Tuomisto et al., 2003a,b; Vieira et al., 2003).

The level of forest classification accuracy from a givensatellite sensor's data depends on the classification algorithm(Kleinn et al., 2002) and the resolution (pixel window orsegment size) applied in the process (Hill & Foody, 1994; Loboet al., 1998). Moreover, clearly defined forest categories arenecessary to evaluate classifiers for thematic accuracy (Kleinnet al., 2002). A commonly used supervised classification methodis discriminant analysis (e.g. Lobo et al., 1998; Thenkabail et al.,2004). Discriminant analysis searches for the linear combinationof variables (e.g., spectral features) that best discriminatesamong classes (Legendre & Legendre, 1998). A non-parametricalternative to discriminant analysis is the k nearest neighbours

sta Rica overlaid in Landsat TM image (band TM5).

Page 3: Using k-nn and discriminant analyses to classify rain ... · Kappa scores were used as measures of accuracy. Image classification of the four forest types did not adequately distinguish

2487S. Thessler et al. / Remote Sensing of Environment 112 (2008) 2485–2494

(k-nn) classifier. The k-nn technique has frequently been used toestimate forest inventory variables from satellite imagery, such astotal volume and basal area for temperate and boreal zones(Eisele, 1998; Gjertsen et al., 2000; Koukal et al., 2007;McInerney et al., 2005; McRoberts et al., 2007; Nilsson, 1997;Tomppo, 1991, 1996; Tomppo et al., 1999, 2001), however, it isless commonly used for image based classification of forest types(but see Franco-Lopez et al., 2001; Haapanen et al., 2004). In thetropics it has been tested once, with promising results, forestimating continuous floristic differences (Thessler et al., 2005).One advantage of the non-parametric k-nn classifier is that itmakes no distributional assumptions about the variables used(Tomppo, 1996). In k-nn, the pixel whose class is unknown isassigned to a class according to its spectrally nearest neighbours(e. g. spectrally most similar pixels) whose class identities areknown.

Classification accuracy may potentially be enhanced using themean pixel values computed within pixel windows or imagesegments rather than single pixel values. Use of pixel windows orsegments reduces the negative effects of satellite image rectifica-tion errors and uncertainty of field plot locations (Hill & Foody,1994; Lobo et al., 1998). Conversely, large pixel windows maydecrease classification accuracy in heterogeneous landscapeswhere two or more land use types are contained in a singlewindow (Hill, 1999). This problem can be alleviated bysegmenting the image into regions (segments) that consist ofspatially contiguous pixels with similar image features (Narendra& Goldberg, 1980).

Our principle objective is to investigate whether discriminantanalysis and k-nn classifiers can accurately classify Costa Ricanrain forest types. Comparisons are made using feature extractionsfrom image segments and from pixel windows to identify themost suitable classification approach via accuracy assessment.

2. Materials

2.1. Field data

The study area is located in northern Costa Rica (Fig. 1).Forest plots were located within, or close to, the BiologicalCorridor of Rio San Juan–La Selva, which connects protectedforests in Costa Rica's Central Mountains to mainly undisturbedforests in south-eastern Nicaragua (Chassot & Monge Arias,2002). Rainfall averages 4000 mm per year with no distinct dryseason and only one month with less than 100 mm rainfall. Theaverage monthly temperature is approximately 26 °C with littlevariation among months (Sanford et al., 1994). Forest cover inthe study area is highly fragmented outside protected areas.

We compiled three different field data sets in lowland old-growth forest (e.g., late successional closed canopy forest) andforest regrowth areas previously used for agriculture:

1) Forty-one 0.25 ha (50 m×50 m) plots were establishedbetween May and October 2003 in old-growth forest patcheslarger than 40 ha. A plot size of 50-m square was used toaccommodate plot locations offset from the centre ofLandsat TM pixels (30 m). Plots were established in old-

growth forests patches which were easily accessible and forwhich the land-owners gave us permission. Within a forestpatch, the plots were located at least 300 m from each otherand at least 150 m from forest edge. Diameter at breast height(137 cm, dbh) was measured for trees greater than 30 cm dbhand palms greater than 10 cm dbh and they were identifiedby genus and species names. For trees whose species wereunidentified in the field, a herbarium specimen was collectedand identified by an experienced botanist, Nelson Zamora, atthe National Biodiversity Institute in San José, Costa Rica.The plots were geolocated in the field using a Garmin GPSIII and non-differentially corrected signals (Ramos Bendaña,2004). Old-growth forest patches bore some evidence ofselective logging.

2) Eleven permanent plots in old-growth forests were sampledin 1996 and 1997 using the same dimensions and tree mea-surements as above (Finegan & Camacho, 1999). Plots weregeolocated using unpublished maps of 1:2000 constructed byCentro de Investigaciones Agronomicas at the University ofCosta Rica.

3) Fifty-two field plots were geolocated in the field betweenJanuary and April of 2004 using Garmin GPS III for primarilyyounger forest regrowth areas (younger than 20 years old)regenerated from pasture, larger than 2 ha in size. Forest areaswere characterized as a regrowth stage based on tree height andcanopy closure. No species-specific data was collected fromregenerating forest areas and all plots were grouped into asingle early successional structural class termed “regrowth”.

One should note that the field sampling was not random orsystematic, so predictions of the characteristics of unvisitedsites in the study area may be biased (Stehman & Czaplewski,1998). However, our objective was not to classify unvisitedsites, but to compare the performances of different methods inestablishing a predictive link between field observations andremotely sensed data.

Three floristically defined old-growth forest types wereidentified from hierarchical cluster analysis using species-specific importance values per plot determined by relativeabundance, basal area and frequency of occurrence within theplot. Classification procedures and forest types categorized bytree species composition are described in Ramos Bendaña(2004). The floristic classification of old-growth forestsproduced three forest types: Pentaclethra–Carapa forest(PeOG), palm forest (PaOG), and mixed old-growth forest(MOG, Table 1). The first two forest types were floristicallysimilar and the mixed old-growth forest was floristically moredistinct. The regrowth forests type (RG) was a fourth forest type(Table 1).

2.2. Remotely sensed data

We used a sufficiently cloud-free (cloud cover less than 10%)Landsat Thematic Mapper (TM) 5 satellite image (WRS path 15and row 53) from January 2001 that was closest to the date offield sampling. We used information from the bands TM1,TM4, TM5 and TM7. Bands TM2 and TM3 were excluded

Page 4: Using k-nn and discriminant analyses to classify rain ... · Kappa scores were used as measures of accuracy. Image classification of the four forest types did not adequately distinguish

Table 1The classification of Costa Rican lowland rain forests into four types

Forest type⁎ Description Nbr ofplots

Pentaclethra-Carapa forest(PeOG)

Homogeneous old growth forests dominatedby Pentaclethra macroloba with associateabundant tree species Carapa guianensis.Relatively low abundance of palms. Foresttype occurs mainly in relatively fertile andmoisture soils (Inceptisols).

25

Mixed old-growth forest(MOG)

Mature forest with several dominant trees(Qualea paraensis, Dialium guianense, Coumamacrocarpa). Pentaclethra macroloba was rare.Plots of this forest type concentrate to the well-drained soils with low fertility and to thenorthwest corner of the study area.

16

Palm forest(PaOG)

Pentaclethra macroloba dominated matureforest that had high number of palms of thespecies Socratea exorrhiza, Iriartea deltoidea,Welfia georgii, and Euterpe precatoria. Thedistribution was found not tightly related toany single soil type.

11

Re-growthforest (RG)

Young (b20 years old) regowth forest.Mainly forests that have spontaneouslyreturned to abandoned pastures butsometimes also forests of native treespecies in abandoned tree plantations.

52

⁎In the classification of the forests into three forest types, PeOG and PaOG werecombined into Pentaclethra forest type (POG).

2488 S. Thessler et al. / Remote Sensing of Environment 112 (2008) 2485–2494

because they had strong striping and banding. Thermal bandTM6 was excluded because it is less informative for vegetationclassification and has a larger pixel size than the other bands.The sun zenith angle was 45° and the sun azimuth angle 134°.

We used the Shuttle Radar Topography Mission digitalelevation model (SRTM DEM), which is made from the C —band radar data and has a spatial resolution of 90 m. The SRTMDEM was downloaded from the United States GeologicalSurvey website July of 2003 (http://srtm.usgs.gov/). The verticalerror of the SRTMDEM in our study area is approximately 16 m(Hofton et al., 2006), but the error can be decreased usingaverage values of multiple pixels (Kellndorfer et al., 2004).

The satellite image was rectified with 50 points extracted fromthe Costa Rican National Geographic Institute's digital 1:50 000topographical sheets using a linear polynomial model. Rectifica-tion error was 12.7 m (x=11.4, y=8.9). Atmospheric correctionwas not considered useful, because only a single Landsat TMscene was used (Song et al., 2001) and no information onvariation of atmospheric conditions was available.

Topographic normalization was implemented using the non-Lambertian Minnaert correction method (Minnaert & Szeicz,1961). Prior to normalisation, the 90 m SRTM DEM wasresampled to the resolution of Landsat TM (30 m) with a cubicconvolution resampling method. A constant k of Minnaertcorrection, which describes the roughness of the surface and ishigh when topography affects pixel values, was calculatedseparately for each band. The normalised radiance values werecalculated by Backward Minnaert Correction method (Colby,1991). Normalisation markedly decreased correlation coeffi-cients between incidence angle (angle between ground normal

and sun rays) and radiance values compared to the originalimage, but the influence on classification accuracy was onlymarginal. In a preliminary linear discriminant analysis of thethree old-growth forest types (PeOG, PaOG and MOG) the non-corrected image produced an overall accuracy of 75%. Thenormalized image yielded an increase in accuracy of 2%.

3. Methods

3.1. Segmentation

Image segments were delineated from the topographicallynormalised Landsat TM image, which was smoothed by aGaussian filter. A pixel window of 7×7 pixels and width para-meter of 3.5 were applied in the filter. The spectral featureswere, however, extracted from the unfiltered satellite image.

We applied the modified implementation of “Segmentationwith directed trees” (Narendra & Goldberg, 1980; Pekkarinen,2002), which has shown good results for delineating forest standsand in extracting spectral features for forest inventory applications(Mäkelä & Pekkarinen, 2001; Pekkarinen, 2002). The methodstarts by dividing an image into edge and plateau areas using theedge value (e) and the maximum difference in edge values(G) computed over all the bands. The edge value (e) for each pixelof the image (I) is determined using the neighbourhood of theeight spatially nearest pixels N8(i,j) and the original spectralvalues. Let us denote by Pb(i,j) the spectral value of pixel (i,j) onband b. The edge value is

e i;jð Þ ¼Xnb¼1

Xl;mð ÞaN8 i; jð Þ

jPb i; jð Þ � Pb l;mð Þj0@

1A; i; jð Þa I ð1Þ

where n is the number of image bands.The maximum difference in edge values (G) between the

pixel and its neighbourhood (Eq. 2) is calculated to form anedge gradient image which is used to divide the pixels into edge,plateau and root pixels by applying a user-defined thresholdvalue.

G i; jð Þ ¼ maxl;mð ÞaN8 i; jð Þ

e i; jð Þ � e l;mð Þ� �

; i; jð Þa I ð2Þ

The pixel is classified as a plateau pixel if the absolute valueof G of the pixel is less than or equal to the user-definedthreshold value. Otherwise, pixels having a negative value of Gare classified as root pixels and all other pixels are classified asedge pixels. Then the edge pixels are connected in the directiongiving maximum positive edge gradient (G), i.e. the deepestslope in the edge values.

Segments were constructed from the clusters using a con-nected component labelling as follows: all connected (spatiallyadjacent) plateau and root pixels are given a unique label (i.e. anumber starting from 1). The edge pixels are labelled according tothe links established in the previous step. If the difference betweenthe edge value of a pixel and the neighbouring pixel is smallerthan the given threshold, the equivalency of the labels is recorded.Finally, all the equivalencies are resolved and all the connected

Page 5: Using k-nn and discriminant analyses to classify rain ... · Kappa scores were used as measures of accuracy. Image classification of the four forest types did not adequately distinguish

2489S. Thessler et al. / Remote Sensing of Environment 112 (2008) 2485–2494

components are assigned a unique label (for details see Narendra& Goldberg, 1980; Pekkarinen, 2002).

An initial segmentation was fine-tuned using the nearestneighbours region merging algorithm (Narendra & Goldberg,1980; Pekkarinen, 2002; Tomppo, 1992) which uses minimumsegment size and distance parameters in merging the segments.The distance parameter is calculated as Euclidean distance inthe spectral space between neighbouring segments. If the size ordistance parameter between neighbouring segments is smallerthan a user-defined value, the segments are merged.

We iteratively tested combinations of the size and distancethreshold parameters for the merging routine. A minimum seg-ment size of 3 pixels and distance threshold of 1 was chosenbased on visual inspection and was used for further analyses.

3.2. Feature extraction

Two data sets were constructed by extracting spectral valuesfrom segments (termed the “segment dataset”) and from 5×5pixel windows (termed the “pixel window dataset”). Eachsegment and pixel window contained a single forest plot. Whentwo field plots fell in the same segment, the vegetation class ofthe plot that was closer to the centre of the segment wasselected. The segment data set included 102 segments with fieldplot data. The 5×5 pixel windows were centered on individualforest plots. Because different field data sets were combined,two field plots were less than 150 m apart, resulting in the pixelwindows corresponding to these plots being partly overlapping.This was avoided by dropping one of the two plots. A total of103 field plots were used in the pixel window data set.

A total of 15 different features were extracted for both segmentand pixel window data sets. Features from the topographicallynormalized Landsat TM image included: means and variances ofbands TM1, TM4, TM5 and TM7 and six band ratios (TM1/TM4,TM1/TM5, TM1/TM7, TM4/TM5, TM4/TM7 and TM5/TM7)calculated either within a 5×5 pixel window or segment.Variances of respective spectral bands were extracted becausethey were expected to contain information on structural char-acteristics of forest types. SRTM DEM has a pixel size of 90 m,where only a single elevation value is contained in a 5×5 pixelwindow used in feature extraction. Therefore, an elevation valuefrom oneDEMpixel was used to characterize a feature in the pixelwindow dataset. For each segment, elevation was calculated as themean of the DEM pixels belonging to that segment.

3.3. Classification methods

The k-nn classifier is a non-parametric method that can be usedto predict continuous or discrete variables for un-sampled locations,once calibratedwith field reference data (Franco-Lopez et al., 2001;Haapanen et al., 2004; McRoberts et al., 2002; McRoberts &Tomppo, 2007; Tomppo, 1991, 1996). The k-nn classifier can beapplied in several ways, and it is flexible because it can use differentdistance metrics, different weighting of neighbours and differentk values (McRoberts et al., 2002; McRoberts & Tomppo, 2007).

In the present study, Euclidean distance was employed in thespace of satellite image spectral bands and elevation, and the

neighbours were weighted inversionally proportional to thesquared Euclidean distance. First, k-nn classifier calculatedEuclidean distance (dpi, p) in the space of the spectral andelevation features from pixel (p) to each pixel (pi) whose forestclass was known. Thereafter, k nearest neighbours for pixel pwere selected in feature space, and the predicted forest type of pwas the type having the greatest sum of weights among k nearestneighbours i1(p), ..., ik(p), where the weight was defined as

wi;p ¼1

d2pi;p=Xkj¼1

1d2p jð Þ;p

if ia i1 pð Þ; N ; ik pð Þf g0 otherwise

8><>:

ð3Þ

(Tomppo, 1991, 1996). In the case of ties, a forest type wasrandomly selected from those having an equal sum of weights.The risk of ties is low, because the sum of weights is acontinuous variable.

The weights of all feature variables in the distance metricwere determined using a Genetic algorithm (GA) based opti-mization following the methods by Tomppo & Halme (2004).The algorithm was applied to a categorical variable (Tomppo,2006). The fitness function to be minimized in GAwas 1-Kappawhere (un-weighted) Kappawas calculated from the error matrix(Congalton & Green, 1999). All the spectral variables andelevation were applied in GA. An advantage of using GA is thatno assumptions about the distribution of the feature variables areneeded. A disadvantage is that GA often ends up being a localoptimum instead of a global one and several runs are needed.Thus, we ranGA 5 times and selected the weights resulting in thehighest Kappa score. GA requires a lot of computing time.

We selected three nearest neighbours (k=3) after testing thek values from 1 to 5 with the results that 1-Kappa levelled off atk=3. A small value of k can also be justified based on the smallnumber of field plots.

Classification accuracy for the k-nn classifier was comparedto that for a canonical discriminant analysis. Canonical discri-minant analysis first forms linear combinations of spectral andelevation features, called canonical variables that best summarizebetween-class variation. Canonical variables were standardizedso that means are zero and pooled, and within-class varianceequals one. In the next step, it searches for the linear function ofthe canonical variables that best separates forest types (Legendre& Legendre, 1998). All the features that significantly contributedto discriminating among forest types, according to the F test(pb0.05), were included in the canonical discriminant analysis.The final pixel window data set included the following features:means of bands TM1, TM4, TM5 and TM7, all 6 band ratios, andvariance of band TM7. In addition, the elevation, as a mean ofelevations for pixels within a segment, was selected from thesegment data set. The number of canonical variables was one lessthan the number of the classes (i.e. two or three canonicalvariables). The canonical variables were approximately normallydistributed within classes, the only exception was the firstcanonical variable in the class MOG.We also tested if a quadraticfunction of canonical variables would increase the number ofcorrectly classified observations compared to the linear function,but it did not improve classification accuracy.

Page 6: Using k-nn and discriminant analyses to classify rain ... · Kappa scores were used as measures of accuracy. Image classification of the four forest types did not adequately distinguish

Fig. 2. Border lines of segments in Landsat TM-image (band TM5). The modifiedimplementation of Narenda and Goldberg's method “Segmentation with directedtrees” was applied in segmentation. Minimum segment size of 3 pixels andminimumdistance of 1 between segment'smean pixel values were used inmerginginitial segments.

2490 S. Thessler et al. / Remote Sensing of Environment 112 (2008) 2485–2494

A leave-one-out cross-validation was employed in theaccuracy assessment of both classification methods. In cross-validation, one plot was removed at time, and the rest of the plotswere used to predict the forest type of the removed plot. Separatetest data sets were also used to assess the classification accuracy.The segment and pixel window data sets were divided intoseparate training and test data sets (N=25) by systematicallyexcluding 1/4 of the plots in each forest class for the test data set.The spectral and elevation features were the same in training andtest data sets. In k-nn, weights for the features were estimated byGA using only the training data set and these weights were thenused in classification of test set data. In the canonical discriminantanalysis, the whole data set was used in computing the canonicalvariables. A linear discriminant function was computed fromtraining data only and then applied to classify the test set data.

We compared the classifiers and the data sets by computingerror matrices with producer's and user's accuracies, overallclassification accuracies and Kappa scores (Eq. 4). Overall ac-curacy was calculated from the error matrix by dividing thenumber of correctly classified plots by the total number of plots.Also the Kappa score (Eq. 4) was computed from the error matrix.

j ¼NPri¼1

Xii�Pri¼1

xiþ 4xþ ið Þ

N 2 �Pri¼1

xiþ 4xþ ið Þ; ð4Þ

where r is the number of rows in the error matrix, xii is the numberof observations in the cell ii (row i and column i), xi+ are themarginal totals of row i, x+i are the marginal totals of column iand N is the total number of observations.

Kappa is a measure of the difference between the actual agree-ment between the reference data and a remotely sensed classifica-tion (indicated by the major diagonal of the error matrix) and thechance agreement between the reference data and a randomclassification (indicated by the marginals of the error matrix)(Congalton &Green, 1999). Kappa scores vary from −1 to +1, butscores closer to +1 are expected with +1 being the perfect agree-ment. We tested if Kappa scores significantly differed from zerousing the asymptotic test in SASversion 9.1 (SASOnlineDocs 9.1).

The Kappa scores generated from error matrices of the k-nnand the canonical disriminant analysis were compared. Teststatistics were computed using

ZAB ¼ jKA � KBjffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffid ̂KA þ d ̂KB

q ð5Þ

where A and B refer to two different error matrices, δ̂ is thevariance of the Kappa score (Congalton & Green, 1999) andZAB is N(0,1) distributed.

4. Results

4.1. Segmentation

The segmentation procedure (Fig. 2) resulted in 173,514segments with a mean size of 26.3 pixels. The field plots fell in

102 segments, and the mean size of those segments was35.3 pixels (minimum 5, maximum 257 andmedian 27.5 pixels).According to a visual inspection, the segmentation results cor-responded quite well with land cover classes. Agricultural andforested land were clearly separated into different segments.Within forests, each segment visually captured internally homo-geneous patches of pixels. Often these patches corresponded toshadows of hills, despite the topographic normalisation.

4.2. Classification of forest plots

The classification accuracies of the four forest types (Table 2)varied between 75.5% and 82.5%. The k-nn classification of thefour forest types was 5–8%more accurate than the classificationbased on canonical discriminant analysis (Table 2). The mostaccurate classification (overall classification accuracy of 82.5%)resulted from the k-nn classification of the pixel window dataset. The producer's accuracies varied between 73% (for PaOG)and 88% (for PeOG). The lowest and highest user's accuracieswere 69% for PeOG and 95% for RG. PeOG and PaOG wereconfused with each other; 8%–20% of plots belonging to PeOGclass were misclassified to PaOG and 0%–27% of plots assignedto class PaOG were misclassified to class PeOG in fourclassifications (k-nn and discriminant analysis of two datasets). These two classes were also floristically the two mostsimilar ones. Therefore we combined these two forest types intoa single Pentaclethra forest type (POG). The comparison of thewithin-class means and standard deviations of the canonicalvariables supported this decision. It showed that distributions of

Page 7: Using k-nn and discriminant analyses to classify rain ... · Kappa scores were used as measures of accuracy. Image classification of the four forest types did not adequately distinguish

Table 2The overall classification accuracies and Kappa scores assessed by leave-one-out cross validation

Data set Canonical discriminantanalysis

k-nn

Classificationaccuracy (%)

Kappa(%)⁎⁎

Classificationaccuracy (%)

Kappa(%)⁎⁎

Four forest types⁎

5×5 pixel data 77.7 65.6 82.5 73.6Segment data 75.5 64.2 80.3 71.3Three forest types⁎

5×5 pixel data 86.4 77.6 89.3 82.3Segment data 82.4 71.6 91.2 85.8

⁎See Table 1 for an explanation for three and four forest types.⁎⁎All Kappa values differed significantly from zero (pb0.0001).

Table 3Error matrices resulting from classification of the three forest types using leave-one-out cross-validation

A. 5×5 pixel data Predicted class⁎ bydiscriminant analysis

Producer'saccuracy(%)

Class⁎ POG MOG RG Total

POG 31 1 4 36 86MOG 2 13 0 15 87RG 5 2 45 52 87Total 38 16 49User's accuracy (%) 82 81 92

B. 5×5 pixel data Predicted class⁎ byk-nn

Producer'saccuracy(%)

Class⁎ POG MOG RG Total

POG 33 1 2 36 92MOG 2 12 1 15 80RG 3 2 47 52 90Total 38 15 50User's accuracy (%) 87 80 94

C. Segment data Predicted class⁎ bydiscriminant analysis

Producer'saccuracy(%)

Class⁎ POG MOG RG Total

POG 30 1 5 36 83MOG 1 14 1 16 88RG 6 4 40 50 80Total 37 19 46User's accuracy (%) 81 74 87

D. Segment data Predicted class⁎ byk-nn

Producer'saccuracy(%)

Class⁎ POG MOG RG Total

POG 34 1 1 36 94MOG 1 15 0 16 94RG 3 3 44 50 88Total 38 19 45User's accuracy (%) 89 79 98

⁎See Table 1 for an explanation for three and four forest types.

2491S. Thessler et al. / Remote Sensing of Environment 112 (2008) 2485–2494

the first and second canonical variables between forest typesPeOG and PaOG were mostly overlapping. The discriminatingpower of the third canonical variable was low (results notshown). Accordingly, the results reported below are based onthree forest types; Pentaclethra forest (POG), mixed old-growthforest (MOG) and regrowth forest (RG).

The resulting three forest types (POG, MOG, RG) weredistinguished with high overall accuracies (82–91%, Table 2)when assessed via cross-validation. The k-nn classifier yielded3–8% higher accuracy and 4–14% higher Kappa scores thancanonical discriminant analysis. The highest classificationaccuracy (91%) was reached using segmentation and the k-nn(Table 2). The differences in Kappa scores between the k-nn andthe canonical discriminant analysis were not, however,statistically significant.

The cross-validation showed that all the classes were dis-tinguished with a producer's accuracy greater than 80%, in-dependently of data set or classification method (Table 3). Theuser's accuracies for the different forest types varied between71% and 92%, when the canonical discriminant analysis wasused. With k-nn, the user's accuracies were greater than 79%.The user's accuracy was greatest in the regrowth forest class(RG) in all error matrices (Table 3).

Validation of the three-forest type classification using separatetest data followed the results of the cross-validation. The Kappascores for the classification of the pixel window data set (79% forthe canonical discriminant analysis and 86% for k-nn) wereslightly greater, whereas, the accuracies for the classification ofthe segment data set (68% and 80%, respectively) were less thanthe accuracies obtained from the cross-validation. It should benoted that the number of field plots in the test data sets was verysmall (N=25).

The use of segments did not increase overall accuracies andKappa scores compared to the 5×5 pixel data set. One exceptionwas when the k-nn classifier was used with the segmentationprocedure and increased Kappa score by 4% (Table 2).

In the k-nn classification, the weighting of the features had agreat influence on classification accuracy. In the three-classclassification of the 5×5 pixel data, the Kappa score increasedfrom 60.5% to 82.8% due to the weighting. For the segmentdata set, the Kappa score increased from 60.6% to 85.8%. In the

canonical discriminant analysis, the selection of features andcomputation of canonical variables also increased the Kappascores compared to the linear discriminant analysis of all theextracted features. In the pixel window data, the Kappa scoreincreased from 68.1% to 77.6%, but with the segment data theincrease in the Kappa score was only marginal, from 70% to71.6%. In general, the mean features and the band ratios hadhigher discriminating power than the variance features. Of thevariance features, only the variance of the band TM7 showedstatistically significantly differences in class means (univariatetest using F statistics, p=0.005).

5. Discussion

Our results agree with other recent studies reporting theutility of Landsat TM or ETM+ images in distinguishingfloristic variation within lowland tropical rain forests (Foody& Cutler, 2003, 2006; Salovaara et al., 2005; Thessler et al.,

Page 8: Using k-nn and discriminant analyses to classify rain ... · Kappa scores were used as measures of accuracy. Image classification of the four forest types did not adequately distinguish

2492 S. Thessler et al. / Remote Sensing of Environment 112 (2008) 2485–2494

2005; Vieira et al., 2003). The forest types showing floristicdifferences in lowland rain forest in Costa Rica are notcommonly delineated on maps of tropical rainforest cover.Our study demonstrates that these forest types can be reliablydistinguished using readily available Landsat TM data.

Naturally, it is also important to acknowledge the problemsand constraints in mapping rain forest classes from satelliteimagery. We were not able to successfully distinguish betweenthe two Pentaclethra macroloba dominated forest types (PeOGand PaOG) with Landsat TM data. Only by combining thesetwo floristically similar classes into one category did classifica-tion accuracies reach an acceptable and comparable level tothose achieved for other rain forest types at this scale (Foody &Cutler, 2003, 2006; Salovaara et al., 2005). Undoubtedly, atsome point floristic differences become so small that it will befundamentally impossible to detect them using remote sensinginformation. However, there are still many possibilities forimproving the performance of remote sensing classification.

Other remotely sensed data could improve the ability todiscriminate field-verified floristic differences. For example,secondary forest classification in the tropics has reachedaccuracies of over 96% when hyperspectral Hyperion wasapplied (Thenkabail et al., 2004). It is also quite clear that thefield sampling procedure could be improved. A floristically andspatially representative sample of a species-rich lowland rainforest is difficult to obtain within reasonable time and costlimits, because the number of rare species is high and multipleenvironmental gradients are present. Topographical shadowsalso contribute to errors. The impact of topography on radiancevalues can eventually be removed when more accurate elevationdata becomes available.

Our results are promising for the aim of classifying rainforests by remote sensing in a systematic and credible fashion,which accounts for the floristic variation in lowland rain forests(Clark et al., 1998, 1999; Paoli et al., 2006; Phillips et al., 2003;Potts et al., 2002; Ruokolainen et al., 1998; Tuomisto et al.,1995; Tuomisto & Poulsen, 1996; Tuomisto et al., 2002, 2003b;Vormisto et al., 2004). Where knowledge of differencesbetween rain forest types is important for conservationactivities, cost-effective remotely sensed data and the statisticalmethods examined here may be used. Against this perspective,our results demonstrate that the k-nn classification, especiallyafter the weighting of features, attained higher classificationaccuracy (though not a significantly better Kappa score) thanthe canonical discriminant analysis. Higher accuracy of k-nn forthis study may be attributed to its flexibility in determining acomplex relationship between field variables and spectralfeatures. Whenever the relationship between floristic andremotely sensed data is so complex that it can not be describedby a linear model, the k-nn classifier, or other non-linearmethod, will likely perform better. In our data, topography andshadows probably induced confusion in the classifications, butthis confusion will also appear in a flat terrain because of thegradual darkening of Landsat TM images towards the east(Toivonen et al., 2006). It is also quite possible that therelationship between floristic and remotely sensed data isinherently non-linear. A disadvantage of k-nn is that it requires a

relatively extensive field dataset that encompasses the spectralvariation of the satellite image and the variation of fieldvariables (Katila & Tomppo, 2001).

Segmentation did not increase classification accuracy in ourstudy. This contrasts with earlier studies (Hill, 1999; Lobo et al.,1998) that have reported higher classification accuracies whensegmentation is used instead of pixel windows or single pixelvalues. We did anticipate that segmentation would improve theclassification result because the forests in the study area arehighly fragmented. Therefore, pixel windows would frequentlyinclude two or more land cover types. However in our study, thefield plots were located at least 150 m from forest edge, and thisapparently diminished mixing the land cover types within the5×5 pixel windows. Also, the topographical shadows inducedartificial boundaries in the segments and therefore mostprobably decreased their relevance for the floristic classifica-tion. The mean segment size (26.3 pixels) was also close to thesize of the pixel window (25 pixels).

Wise land use practices are needed for tropical areas to retainnative floristic diversity (Pitman et al., 2002) and to facilitatesustainable use of forest resources. Reliable and economicallyfeasible tools and techniques for mapping rain forest variation areessential to this process. Fast and efficient field inventorymethods(Higgins & Ruokolainen, 2004; Ruokolainen et al., 2007, 1997)are one part of this package and the use of remotely sensed data iscertainly another part. Our case study shows that the k-nnclassifier has a potential to accurately distinguish rain forest typesthat are defined by floristic and structural differences.

6. Conclusions

Our results were promising for classifying rain forests typesusing remotely sensed data and field observations in lowlandrain forest area. Three floristically and structurally distinctforest types were distinguished with high overall accuracies(82–91%). The k-nn classifier attained higher classificationaccuracy (though not a significantly better Kappa score) thanthe canonical discriminant analysis, especially after optimisingthe weights of the features in the k-nn classification. Bandmeans and ratios were more powerful features than bandvariances for discriminating the forest types, but the use ofsegments in feature extraction did not increase overallaccuracies and Kappa scores compared to the window of 5×5pixels.

Acknowledgements

We thank Anssi Pekkarinen for help with the segmentationand for his valuable comments on the manuscript. We areindebted to Edwin Pereira, Leonel Coto, Gustavo Basil andothers of the field group for their efforts during the fieldinventories. Warm thanks go to GIS laboratory of CATIE and toJeffrey Jones for offering the facilities for image analysis. HugoBrenes kindly helped to compile the floristic data from differentsources. We also thank Daisy E. Duursma for correcting thelanguage of the manuscript. The study was partly funded byAcademy of Finland and Emil Aaltonen Foundation.

Page 9: Using k-nn and discriminant analyses to classify rain ... · Kappa scores were used as measures of accuracy. Image classification of the four forest types did not adequately distinguish

2493S. Thessler et al. / Remote Sensing of Environment 112 (2008) 2485–2494

References

Achard, F., Eva, H., & Mayaux, P. (2001). Tropical forest mapping from coarsespatial resolution satellite data: production and accuracy assessment issues.International Journal of Remote Sensing, 22, 2741−2762.

Chassot, O., &MongeArias, G. (2002).Corredor Biólogico San Juan - La Selva.Costa Rica: Centro Scientifico Tropical (80 pp.).

Clark, D. B., Clark, D. A., & Read, J. M. (1998). Edaphic variation and themesoscale distribution of tree species in a neotropical rain forest. Journal ofEcology, 86, 101−112.

Clark, D. B., Palmer, M. W., & Clark, D. A. (1999). Edaphic factors and the land-scape-scale distributions of tropical rain forest trees. Ecology, 80, 2662−2675.

Colby, R. (1991). Defining and mapping vegetation types in mega-diversetropical forests. Photogrammetric Engineering and Remote Sensing, 57,531−537.

Congalton, R. G., & Green, K. (1999). Assessing the accuracy of remotelysensed data: Principles and practices.Boca Raton: Lewis (137 pp.).

Eisele, F. (1998). Practical benefits of multi-source inventories in CentralEurope. Managing the resources of the world's forests. Symposia pro-ceedings (pp. 71−86). Falun: Marcus Wallenberg Foundation.

FAO (1996). Forest resources assessment 1990: Survey of tropical forest coverand study of change processes.Rome: FAO (152 pp.).

Favrichon, V. (1998). Modeling the dynamics and species composition of atropical mixed-species uneven-aged natural forest: effects of alternativecutting regimes. Forest Science, 44, 113−124.

Ferrier, S. (2002). Mapping spatial pattern in biodiversity for regional conservationplanning: where to from here? Systematic Biology, 51, 331−363.

Ferrier, S., & Guisan, A. (2006). Spatial modelling of biodiversity at thecommunity level. Journal of Applied Ecology, 43, 393−404.

Finegan, B., & Camacho, M. (1999). Stand dynamics in a logged andsilviculturally treated Costa Rican rain forest, 1988–1996. Forest Ecologyand Management, 121, 177−189.

Foody, G. M., & Cutler, M. E. J. (2003). Tree biodiversity in protected andlogged Bornean tropical rain forests and its measurement by satellite remotesensing. Journal of Biogeography, 30, 1053−1066.

Foody, G. M., & Cutler, M. E. J. (2006). Mapping the species richness andcomposition of tropical forests from remotely sensed data with neuralnetworks. Ecological Modelling, 195, 37−42.

Foody, G. M., & Hill, R. A. (1996). Classification of tropical forest classes fromLandsat TM data. International Journal of Remote Sensing, 17, 2353−2367.

Franco-Lopez, H., Ek, A. R., & Bauer, M. E. (2001). Estimation and mapping offorest stand density, volume, and cover type using the k-nearest neighborsmethod. Remote Sensing of Environment, 77, 251−274.

Gjertsen, A. K., Tomter, S., & Tomppo, E. (2000). Combined use of NFI sampleplots Ad Landsat TM data to provide forest information on municipalitylevel. Proceedings of conference on remote sensing and forest monitoring,June 1–3, 1999, Rogow, Poland (pp. 167−174).

Haapanen, R., Ek, A. R., Bauer, M. E., & Finley, A. O. (2004). Delineation offorest/nonforest land use classes using nearest neighbor methods. RemoteSensing of Environment, 89, 265−271.

Higgins, M. A., & Ruokolainen, K. (2004). Rapid tropical forest inventory: acomparison of techniques based on inventory data from western Amazonia.Conservation Biology, 18, 799−811.

Hill, R. A. (1999). Image segmentation for humid tropical forest classification inLandsat TM data. International Journal of Remote Sensing, 20, 1039−1044.

Hill, R. A., & Foody, G. M. (1994). Separability of tropical rain-forest types inthe Tambopata–Candamo Reserved Zone, Peru. International Journal ofRemote Sensing, 15, 2687−2693.

Hofton, M., Dubayah, R., Blair, J. B., & Rabine, D. (2006). Validation of SRTMelevations over vegetated and non-vegetated terrain using medium footprintLidar. Photogrammetric Engineering and Remote Sensing, 72, 279−285.

IBGE (1989). Mapa de vegetação da Amazônia Legal. Scale 1:2,500,000.Instituto Brasileiro do Meio Ambiente e dos Recursos Naturais Renováveis(IBAMA), Brasília.

Katila, M., & Tomppo, E. (2001). Selecting estimation parameters for theFinnish multisource National Forest Inventory. Remote Sensing of Environ-ment, 76, 16−32.

Kellndorfer, J., Walker, W., Pierce, L., Dobson, C., Fites, J. A., Hunsaker, C., et al.(2004). Vegetation height estimation from shuttle radar topography missionand national elevation datasets. Remote Sensing of Environment, 93, 339−358.

Kleinn, C., Corrales, L., & Morales, D. (2002). Forest area in Costa Rica: acomparative study of tropical forest cover estimates over time. Environ-mental Monitoring and Assessment, 73, 17−40.

Koukal, T., Suppan, F., & Schneider, W. (2007). The impact of relativeradiometric calibration on the accuracy of kNN–predictions of forestattributes. Remote Sensing of Environment, 110, 431−437.

Legendre, P., & Legendre, L. (1998). Numerical Ecology.Amsterdam: Elsevier(853 pp.).

Lobo, A., & Gullison, R. E. (1998). Mapping the tropical landscape of Beni(Bolivia) from Landsat-TM imagery: beyond the ‘forest/non-forest’ legend.In F. Dallmeier, & J. A. Comiskey (Eds.), Forest biodiversity research,monitoring and modeling: Conseptual background and old world casestudies (pp. 159−181). Paris: UNESCO.

Mäkelä, H., & Pekkarinen, A. (2001). Estimation of timber volume at the sampleplot level by means of image segmentation and Landsat TM imagery. Re-mote Sensing of Environment, 77, 66−75.

Margules, C. R., & Pressey, R. L. (2000). Systematic conservation planning.Nature, 405, 243−253.

Mayaux, P., Richards, T., & Janodet, E. (1999). Avegetation map of Central Africaderived from satellite imagery. Journal of Biogeography, 26, 353−366.

McInerney, D., Pekkarinen, A., & Haakana, M. (2005). Combining LandsatETM+ with field data for Ireland's national forest inventory— a pilot studyfor Co. Clare. Proceendings of ForestSat 2005, May 31–June 3, Borås,Sweden (pp. 12−16).

McRoberts, R. E., Nelson, M. D., & Wendt, D. G. (2002). Stratified estimationof forest area using satellite imagery, inventory data, and the k-nearestneighbors technique. Remote Sensing of Environment, 82, 457−468.

McRoberts, R. E., & Tomppo, E. (2007). Remote sensing support for nationalforest inventories. Remote Sensing of Environment, 110, 412−419.

McRoberts, R. E., Tomppo, E. O., Finley, A. O., & Heikkinen, J. (2007).Estimating areal means and variances of forest attributes using the k-nearestneighbours technique and satellite imagery. Remote Sensing of Environment,111, 466−480.

Minnaert, J. L., & Szeicz, G. (1961). The reciprocity principle in lunarphotometry. Astrophysics Journal, 93, 403−410.

Narendra, P. M., & Goldberg, M. (1980). Image segmentation with directedtrees. IEEE transactions on pattern analysis and machine intelligence,PAMI-2 (pp. 185−191).

Nilsson, M. (1997). Estimation of forest variables using satellite image data andairborne Lidar. Acta Universitatis Agriculturae Sueciae. Silvestria, 17, 1−32.

Paoli, G. D., Curran, L. M., & Zak, D. R. (2006). Soil nutrients and betadiversity in the Bornean Dipterocarpaceae: evidence for niche partitioningby tropical rain forest trees. Journal of Ecology, 94, 157−170.

Pekkarinen, A. (2002). Image segment-based spectral features in the estimationof timber volume. Remote Sensing of Environment, 82, 349−359.

Phillips, O. L., Vargas, P. N., Monteagudo, A. L., Cruz, A. P., Zans, M. E. C.,Sanchez, W. G., et al. (2003). Habitat association among Amazonian treespecies: a landscape-scale approach. Journal of Ecology, 91, 757−775.

Pitman, N. C. A., Jørgensen, P. M., Williams, R. S. R., León-Yánez, S., &Valencia, R. (2002). Extinction-rate estimates for a modern Neotropicalflora. Conservation Biology, 16, 1427−1431.

Potts, M. D., Ashton, P. S., Kaufman, L. S., & Plotkin, J. B. (2002). Habitatpatterns in tropical rain forests: a comparison of 105 plots in NorthwestBorneo. Ecology, 83, 2782−2797.

Prance, G. T. (1989). American tropical forests. In H. Lieth, & M. J. A. Werger(Eds.), Tropical rain forest ecosystems: Biogeographical and ecologicalstudies (pp. 99−132). New York: Elsevier.

Ramos Bendaña, Z.S. (2004). Estructura y composición de un paisaje boscosofragmentado: herramienta para el diseño de estrategias de conservación de labiodiversidad. MSc thesis. (pp. 1-114). Costa Rica: CATIE.

Ruokolainen, K., Linna, A., & Tuomisto, H. (1997). Use of Melastomataceaeand pteridophytes for revealing phytogeographical patterns in Amazonianrain forests. Journal of Tropical Ecology, 13, 243−256.

Ruokolainen, K., & Tuomisto, H. (1998). Vegetación natural de la zona deIquitos. In R. Kalliola, & S. Flores Paitán (Eds.), Geoecología y desarrollo

Page 10: Using k-nn and discriminant analyses to classify rain ... · Kappa scores were used as measures of accuracy. Image classification of the four forest types did not adequately distinguish

2494 S. Thessler et al. / Remote Sensing of Environment 112 (2008) 2485–2494

amazónico estudio integrado en la zona de Iquitos, Perú (pp. 253−365).Turku: University of Turku.

Ruokolainen, K., Tuomisto, H., Macía, M. J., Higgins, M. A., & Yli-Halla, M.(2007). Are floristic and edaphic patterns in Amazonian rain forestscongruent for trees, pteridophytes and Melastomataceae? Journal ofTropical Ecology, 23, 13−25.

Sader, S. A., & Joyce, A. T. (1988). Deforestation rates and trends in Costa Rica,1940 to 1983. Biotropica, 20, 11−19.

Salovaara, K. J., Thessler, S., Malik, R. N., & Tuomisto, H. (2005). Classi-fication of Amazonian primary rain forest vegetation using Landsat ETMplus satellite imagery. Remote Sensing of Environment, 97, 39−51.

Sanford, R. L., Jr, Paaby, P., Luvall, J. C., & Phillips, E. (1994). Climate,geomorphology, and aquatic systems. In L. A. McDade, K. S. Bawa, H. A.Hespenheide, & G. S. Hartshorn (Eds.), La Selva: Ecology and naturalhistory of a neotropical rain forest (pp. 19−33). Chicago, Illinois, USA:University of Chicago Press.

SAS OnlineDoc 9.1. http://support.sas.com/91doc/docMainpage.jsp. SASInstitute Inc., NC, USA.

Skole, D., & Tucker, C. (1993). Tropical deforestation and habitat fragmentationin the Amazon: satellite data from 1978 to 1988. Science, 260, 1905−1910.

Song, C., Woodcock, C. E., Seto, K. C., Lenney, M. P., & Macomber, S. A.(2001). Classification and change detection using Landsat TM data: whenand how to correct atmospheric effects? Remote Sensing of Environment,75, 230−244.

Stehman, S. V., & Czaplewski, R. L. (1998). Design and analysis for thematicmap accuracy assessment: fundamental principles. Remote Sensing ofEnvironment, 64, 331−344.

Stibig, H. J., Beuchle, R., & Achard, F. (2003). Mapping of the tropical forestcover of insular Southeast Asia from SPOT4-Vegetation images. Interna-tional Journal of Remote Sensing, 24, 3651−3662.

Thenkabail, P. S., Enclona, E. A., Ashton, M. S., Legg, C., & De Dieu, M. J.(2004). Hyperion, IKONOS, ALI, and ETM plus sensors in the study ofAfrican rainforests. Remote Sensing of Environment, 90, 23−43.

Thessler, S., Ruokolainen, K., Tuomisto, H., & Tomppo, E. (2005). Mappinggradual landscape-scale floristic changes in Amazonian primary rain forestsby combining ordination and remote sensing. Global Ecology andBiogeography, 14, 315−325.

Toivonen, T., Kalliola, R., Ruokolainen, K., & Malik, R. N. (2006). Across-pathDN gradient in Landsat TM imagery of Amazonian forests: a challenge forimage interpretation and mosaicking. Remote Sensing of Environment, 100,550−562.

Tomppo, E. (1991). Satellite image-based national forest inventory of Finland.International Archives of Photogrammetry and Remote Sensing, 28, 419−424.

Tomppo, E. (1992). Satellite image aided forest site fertility estimation for forestincome taxation. Acta Forestalia Fennica, 229, 1−70.

Tomppo, E. (1996). Application of remote sensing in Finnish National ForestInventory. Proceedings of international workshop of application of remotesensing in European forest monitoring, 14th–16th October, Vienna, Austria(pp. 377−388).

Tomppo, E. (2006). Computing feature weights by means of a genetic algorithmwhen predicting categorial variables with k-NN. Manuscript. : FinnishForest Research Institute.

Tomppo, E., Goulding, C., & Katila, M. (1999). Adapting Finnish multi-sourceforest inventory techniques to the New Zealand preharvest inventory.Scandinavian Journal of Forest Research, 14, 182−192.

Tomppo, E., & Halme, M. (2004). Using coarse scale forest variables asancillary information and weighting of variables in k-nn estimation: agenetic algorithm approach. Remote Sensing of Environment, 92, 1−20.

Tomppo, E., Korhonen, K. T., Heikkinen, J., & Yli-Kojola, H. (2001). Multi-source inventory of the forests of the Hebei Forestry Bureau, Heilongjiang,China. Silva Fennica, 35, 309−328.

Tuomisto, H., & Poulsen, A. D. (1996). Influence of edaphic specialization onpteridophyte distribution in neotropical rain forests. Journal of Biogeo-graphy, 23, 283−293.

Tuomisto, H., Poulsen, A.D., Ruokolainen, K.,Moran, R.C.,Quintana, C., Celi, J.,et al. (2003). Linking floristic patterns with soil heterogeneity and satelliteimagery in Ecuadorian Amazonia. Ecological Applications, 13, 352−371.

Tuomisto, H., Ruokolainen, K., Aguilar, M., & Sarmiento, A. (2003). Floristicpatterns along a 43-km long transect in an Amazonian rain forest. Journal ofEcology, 91, 743−756.

Tuomisto,H., Ruokolainen, K., Kalliola, R., Linna, A., Danjoy,W.,&Rodrogiez, Z.(1995). Dissecting Amazonian biodiversity. Science, 269, 63−66.

Tuomisto, H., Ruokolainen, K., Poulsen, A. D., Moran, R. C., Quintana, C.,Canas, G., et al. (2002). Distribution and diversity of pteridophytes andMelastomataceae along edaphic gradients in Yasuni National Park,Ecuadorian Amazonia. Biotropica, 34, 516−533.

Vieira, I. C. G., de Almeida, A. S., Davidson, E. A., Stone, T. A., de Carvalho,C. J. R., & Guerrero, J. B. (2003). Classifying successional forests usingLandsat spectral properties and ecological characteristics in eastern Amazonia.Remote Sensing of Environment, 87, 470−481.

Vormisto, J., Svenning, J. C., Hall, P., & Balslev, H. (2004). Diversity anddominance in palm (Arecaceae) communities in terra firme forests in thewestern Amazon basin. Journal of Ecology, 92, 577−588.