
CLASSIFICATION OF PIXEL-LEVEL FUSED HYPERSPECTRAL AND LIDAR DATA USING DEEP CONVOLUTIONAL NEURAL NETWORKS

Saurabh Morchhale, V. Paul Pauca, Robert J. Plemmons, and Todd C. Torgersen

Wake Forest University
Departments of Computer Science and Mathematics

Winston-Salem, NC 27109

ABSTRACT

We investigate classification from pixel-level fusion of Hyperspectral (HSI) and Light Detection and Ranging (LiDAR) data using convolutional neural networks (CNN). HSI and LiDAR imaging are complementary modalities increasingly used together for geospatial data collection in remote sensing. HSI data is used to glean information about material composition, while LiDAR data provides information about the geometry of objects in the scene. Two key questions relative to classification performance are addressed: the effect of merging multi-modal data and the effect of uncertainty in the CNN training data. Two recent co-registered HSI and LiDAR datasets are used here to characterize performance. One was collected over Houston, TX, by the University of Houston National Center for Airborne Laser Mapping with NSF sponsorship; the other was collected over Gulfport, MS, by the Universities of Florida and Missouri with NGA sponsorship.

Index Terms— LiDAR and Hyperspectral Imaging, Convolutional Neural Networks, Data Fusion.

1. INTRODUCTION

Hyperspectral imaging (HSI) combines the power of digital imaging and spectroscopy. Imaging spectrometers gather data over a wide and continuous band of the electromagnetic spectrum, which can be used to accurately determine the composition of objects and ground cover in a scene. When the images are acquired at high spatial resolution and co-registered, the resulting data provide a robust and detailed characterization of the earth's surface and its constituent elements.

Light Detection and Ranging (LiDAR) is a remote sensing method that uses light in the form of a pulsed laser to measure ranges (variable distances) to objects in a scene. These light pulses, combined with other data recorded by the airborne system, generate precise, three-dimensional information about the elevation and shape of the Earth's surface.

In recent years, it has been shown that remote sensing tasks such as scene reconstruction, feature enhancement, and classification of targets are improved when co-registered HSI and LiDAR data are used jointly. This has spurred active research into methods that can reliably fuse and extract information from these complementary sensing modalities. A number of feature-level fusion techniques have been developed that combine features extracted individually to produce a new feature set that better represents the scene [1, 2, 3, 4]. For example, Dalponte, Bruzzone, and Gianelle [5] apply the sequential forward floating selection method to extract features from denoised hyperspectral data. These features are then integrated with corrected elevation and intensity models derived from the LiDAR data and classified using support vector machines and Gaussian maximum likelihood techniques. As part of the winning entry for the 2013 GRSS Data Fusion Contest, Debes et al. [1] combined abundance maps obtained through a spectral unmixing procedure with LiDAR data, providing topological information to the classification process. A flexible strategy based on morphological features and subspace multinomial logistic regression was presented in [2] for jointly classifying HSI and LiDAR data without the need for regularization parameters.

More recently, the use of deep convolutional neural networks (CNN) has been proposed for classification of hyperspectral imagery [6, 7]. In deep learning, neural networks of three or more layers are used to learn deep features of input data and can provide better approximations to nonlinear functions than single-layer classifiers, such as linear support vector machines. Inspired by visual mechanisms in living organisms, CNNs consist of layers of neurons whose outputs are combined through convolution. Applications of CNNs include material classification, object detection, and face and speech recognition.

Here, we are concerned with the classification performance of CNNs when HSI and LiDAR data are combined at the pixel level [8], that is, before feature extraction in the classification process. In this type of fusion, LiDAR elevation data is replicated and appended to the HSI data for each pixel in the scene. This combined data is then processed with a multilayer CNN similar to that proposed in [7] to learn the filters producing the strongest response to local input patterns. Pixel-level fusion can have an advantage over other techniques in that it tends to avoid loss of information that may occur during the feature extraction process [9].

This paper is motivated by the thesis work of Saurabh Morchhale [10]. We characterize classification performance by modifying the CNN parameters and investigate robustness to classification errors in the training data. We apply our techniques to sample classification problems using two well-known Hyperspectral and LiDAR datasets that have been recently developed for test purposes.

2. CLASSIFICATION FRAMEWORK

2.1. Data Fusion

We assume that the LiDAR and HSI datasets are geo-referenced and have been pre-processed to have the same spatial resolution, providing information for the same surface area over the Earth. Let the column vector h(x, y) ∈ R^{M_1} denote the spectral response over M_1 channels, and let d(x, y) ∈ R^{M_2} denote a column vector of components derived from the LiDAR data, such as elevation and LiDAR intensity, at each point (x, y) in a regularly spaced grid over the observed surface. Further, each component of d(x, y) is scaled to ensure balance among the fused data sources.

We define a new data vector for point (x, y) as follows:

    g(x, y) = [ h(x, y)^T , (d(x, y) ⊗ 1)^T ]^T,    (1)

where ⊗ is the Kronecker product, 1 is a column vector of ones, and hence d(x, y) ⊗ 1 is a repetition of the LiDAR components. The length of the vector 1 is an additional parameter that we include in the characterization of the performance of the CNN for fused LiDAR and HSI data. In general, the repetition of LiDAR data, relative to the length of the HSI vector, can be used as a form of weighting the desired influence of one modality over the other in the classification procedure.
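As a concrete illustration, the following is a minimal numpy sketch of the fusion in equation (1). The function name is ours, and the 144-band spectrum and 16-fold repetition are illustrative values taken from Section 3; this is not the authors' code.

    import numpy as np

    def fuse_pixel(h, d, reps):
        """Pixel-level fusion of equation (1): stack the HSI spectrum h (length M1)
        on top of the LiDAR-derived components d (length M2), each LiDAR component
        repeated `reps` times via the Kronecker product d (x) 1."""
        h = np.asarray(h, dtype=float)
        d = np.asarray(d, dtype=float)
        return np.concatenate([h, np.kron(d, np.ones(reps))])

    # Example: a 144-band HSI pixel fused with a single (scaled) DSM elevation
    # value repeated 16 times, i.e. about 11% of the HSI vector length.
    h = np.random.rand(144)       # placeholder spectral response
    d = np.array([0.47])          # placeholder scaled LiDAR elevation
    g = fuse_pixel(h, d, reps=16)
    print(g.shape)                # (160,)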

2.2. Convolutional Neural Networks

A convolutional neural network is a multilayer neural network inspired by the organization of neurons in the animal visual cortex [11]. It generally consists of one or more convolutional layers and intermediate subsampling layers, followed by at least one fully connected layer.

Here, we consider the CNN architecture recently developed in [7] for processing HSI data. In our case, the input layer consists of the fused vectors g(x, y) from equation (1). For simplicity, we use a single index i to enumerate these vectors in the spatial domain, i.e., g_i = g(x, y) for i ∈ {1, 2, ..., N}, where N is the total number of spatial points (x, y) in the dataset. The convolution layer consists of a set of K 1-D filter vectors, {f_k}, of fixed length. In this layer, each of these filters is convolved with g_i to produce u_{i,k} = tanh(g_i ⋆ f_k), where ⋆ is the convolution operation.

In the max pooling layer, the u_{i,k} are subsampled by taking the maxima over non-overlapping regions of length 2, producing vectors u^s_{i,k} of half the size. Next, the subsampled vectors u^s_{i,k} are stacked together as

    u^s_i = [ u^s_{i,1}, u^s_{i,2}, ..., u^s_{i,K} ]^T,

which is then used as input for the hidden neuron layer, producing the output vector y_i. This process is expressed as:

    y_i = f(W^{(h)} u^s_i) + b^{(h)},    (2)

where W^{(h)} is the weight matrix associated with the hidden neuron layer and b^{(h)} is a vector of unit bias. The number of rows, P, in W^{(h)} corresponds to the number of neurons in the layer. The function f(·) is the layer's activation function, defined as f(x) = tanh(x) applied element-wise to the input vector in equation (2). Finally, y_i is passed through the output layer to produce

    t_i = exp(W^{(o)} y_i + b^{(o)}),    (3)

    z_i = t_i / ||t_i||_1,    (4)

where the softmax function is applied to W^{(o)} y_i + b^{(o)}. Here, W^{(o)} is the weight matrix associated with the output layer, and b^{(o)} is a vector of unit bias. The number of rows, C, in W^{(o)} corresponds to the number of labeled classes specified during the training phase of the CNN. The final output vector z_i contains the estimated class probabilities for the classification of input vector g_i.
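To make the layer structure concrete, the following is a minimal numpy sketch of the single-pixel forward pass under our reading of equations (2)-(4); the function and variable names are ours and the toy shapes are illustrative, not the authors' implementation.

    import numpy as np

    def forward(g, filters, Wh, bh, Wo, bo):
        """Single-pixel forward pass: 1-D convolutions with tanh, max pooling over
        non-overlapping pairs, a tanh hidden layer (eq. 2), and a softmax output
        (eqs. 3-4)."""
        # Convolution layer: u_{i,k} = tanh(g_i * f_k) for each of the K filters.
        u = [np.tanh(np.convolve(g, f, mode='valid')) for f in filters]
        # Max pooling over non-overlapping regions of length 2.
        us = [v[: len(v) // 2 * 2].reshape(-1, 2).max(axis=1) for v in u]
        us = np.concatenate(us)                 # stacked vector u^s_i
        y = np.tanh(Wh @ us) + bh               # hidden layer, eq. (2)
        t = np.exp(Wo @ y + bo)                 # eq. (3)
        return t / np.sum(np.abs(t))            # softmax normalization, eq. (4)

    # Toy shapes: fused vector of length 160, K = 3 filters of length 21,
    # P = 5 hidden neurons, C = 4 classes.
    g = np.random.rand(160)
    filters = [np.random.randn(21) for _ in range(3)]
    pooled = 3 * ((160 - 21 + 1) // 2)          # length of u^s_i (3 * 70)
    Wh, bh = 0.1 * np.random.randn(5, pooled), np.zeros(5)
    Wo, bo = 0.1 * np.random.randn(4, 5), np.zeros(4)
    print(forward(g, filters, Wh, bh, Wo, bo))  # class probabilities, sums to 1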

2.3. Optimization of Class Probabilities

During the training phase of a CNN, an objective function measuring the classification error for the sample training data is minimized. Let {g_i}_{i=1}^{N_t} denote the set of N_t samples used for training and let {L_j}_{j=1}^{C} denote the set of C classification labels. Further, let x denote a vector of all the trainable parameters, specifically {f_k}, W^{(h)}, b^{(h)}, W^{(o)}, and b^{(o)}. Recall that z_i in equation (4) is a vector of length C; let z_{i,j} denote the j-th component of vector z_i.

We define the following objective function:

    J(x) = -(1/N_t) Σ_{i=1}^{N_t} Σ_{j=1}^{C} δ_{i,j} log(z_{i,j}),    (5)

where δ_{i,j} = 1 if g_i belongs to class L_j, and 0 otherwise.

Equation (5) is the well-known logarithmic loss function, which is used to maximize predictive accuracy by rewarding correct classifications that are made with a high probability, i.e., whenever z_{i,j} is close to 1 and sample g_i ∈ L_j.
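For reference, equation (5) amounts to the following short numpy computation; the array names and shapes are our illustrative choices.

    import numpy as np

    def log_loss(Z, labels):
        """Objective of equation (5): average negative log-probability assigned to
        the true class. Z is an (Nt, C) array of softmax outputs z_i; labels holds
        the true class index j for each training sample g_i."""
        Nt = Z.shape[0]
        return -np.mean(np.log(Z[np.arange(Nt), labels]))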

We minimize (5) using a standard gradient descent approach. Starting with an initial guess x_0 for a local minimum of J(x), we compute:

    x_{n+1} = x_n − α_n ∇J(x_n),   n ≥ 0,    (6)


where ∇J(x_n) is the gradient of J. In our implementation the step size α_n is kept constant with α_n = 0.08. We stop iterating when the relative error in the cost function is sufficiently small; this serves as a regularization constraint which tends to avoid overfitting.
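A minimal sketch of this training loop is given below. J and grad_J stand in for the CNN cost and its gradient (obtained by backpropagation in practice), and the tolerance and iteration cap are our illustrative choices; only the constant step size 0.08 is taken from the text.

    def gradient_descent(x0, J, grad_J, alpha=0.08, tol=1e-4, max_iter=1000):
        """Constant-step gradient descent of equation (6), stopped when the
        relative change in the cost function becomes sufficiently small."""
        x, prev = x0, J(x0)
        for _ in range(max_iter):
            x = x - alpha * grad_J(x)        # x_{n+1} = x_n - alpha * grad J(x_n)
            cur = J(x)
            if abs(prev - cur) / max(abs(prev), 1e-12) < tol:
                break
            prev = cur
        return x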

3. EXPERIMENTAL RESULTS

We first employ the 2013 IEEE GRSS DF Contest dataset [12] to characterize classification performance. This dataset consists of a hyperspectral image of 144 spectral bands and a Digital Surface Model (DSM) derived from airborne LiDAR at a 2.5 m spatial resolution. There are 1903 × 329 pixels in the scene, and true labels are known for only a small subset of these pixels. We utilized a total of 2832 known (labeled) pixels spread over all C = 15 classes to form our fused dataset {g_i}, with the LiDAR DSM value repeated 16 times, or 11% relative to the length of the HSI vector. Too small a recurrence of the LiDAR value yields no improvement, while too large a recurrence tends to decrease overall classification accuracy. We found that 11% to 14% LiDAR recurrence produced the best classification results.

The size of the training dataset relative to that of the test dataset is an important consideration of practical value. Too large a training dataset can lead to overfitting and is also unrealistic in most imaging applications. To avoid overfitting, we adopt the technique proposed in [13]. We partition the fused dataset {g_i} into three subsets: training, validation, and testing. Specifically, we choose 900 observation vectors (60 samples per class) for training, 900 observation vectors for validation, and 1032 observation vectors for testing, to characterize the classification accuracy of our CNN approach. In addition, we use K = 40 convolution filters and P = 60 neurons in the fully connected layer.
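The exact sampling procedure is not specified here; the sketch below shows one plausible per-class random partition consistent with the counts above (60 training and 60 validation pixels per class, the remainder for testing). Names, counts, and seed are illustrative.

    import numpy as np

    def split_per_class(labels, n_train=60, n_val=60, seed=0):
        """Draw n_train training and n_val validation indices per class; the
        remaining labeled pixels form the test set."""
        rng = np.random.default_rng(seed)
        train, val, test = [], [], []
        for c in np.unique(labels):
            idx = rng.permutation(np.flatnonzero(labels == c))
            train.extend(idx[:n_train])
            val.extend(idx[n_train:n_train + n_val])
            test.extend(idx[n_train + n_val:])
        return np.array(train), np.array(val), np.array(test)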

3.1. Classification Accuracy

Fig. 1. Comparison of classification accuracy, HSI vs. fused HSI-LiDAR, for the 2013 IEEE GRSS DF Contest dataset.

Figure 1 compares the classification accuracy obtained with HSI data alone and with fused HSI and LiDAR data. As can be observed, the accuracy in the CNN output is roughly 10% higher for the fused vectors relative to classification via the HSI data alone. Moreover, an accuracy of 80% is reached in 55 iterations (or epochs), compared to 160 for the HSI data. Table 1 shows the complete error matrix for all classes in the dataset. Notice that accuracies of over 98% are achieved for stressed grass, trees, soil, tennis court, and running track.

Fig. 2. Fused data vectors for pixels with similar hyperspectral response.

Pixel-level fusion can introduce additional variability across HSI classes, as can be observed in Figure 2. Notice how adding the LiDAR component significantly increases the distinction between the commercial buildings and highway spectral traces, and between the residential buildings and parking lot 1 spectral traces. The complementary nature of HSI and LiDAR enables this gain in variability across HSI classes.

3.2. Error in the Training Data

In this experiment, we consider the possibility of misclassification error in the training data due to human or pre-processing oversight. To do this, we randomly switch the labels for a percentage of the 900 data vectors g_i used for training of the CNN. We then classify the 1932 testing data vectors using the so-trained CNN. Table 2 shows the effect of up to 20% misclassification error in the training data on the true positive classification. Interestingly, the algorithm appears, in most cases, to be impervious to such error, except for classes with relatively low true positive classification. Corresponding results are also shown in Table 2 for classification using HSI data only.
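The label-switching step can be sketched as follows; the exact selection scheme is our assumption, since the text only states that labels are switched for a given percentage of the training vectors.

    import numpy as np

    def corrupt_labels(labels, frac, n_classes, seed=0):
        """Randomly reassign a fraction `frac` of training labels to a different,
        randomly chosen class (Section 3.2 experiment, our reading)."""
        rng = np.random.default_rng(seed)
        noisy = np.array(labels, copy=True)
        flip = rng.choice(len(labels), size=int(frac * len(labels)), replace=False)
        for i in flip:
            choices = [c for c in range(n_classes) if c != noisy[i]]
            noisy[i] = rng.choice(choices)
        return noisy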

3.3. Overall Image Classification

A visual comparison of the classification of all the pixels of the 2013_IEEE_GRSS_DF_Contest dataset is given in Figure 3. For these results, 100 known pixels per class were used for training of the CNN instead of 60 (a ratio of 1.13:1 between training and test datasets). This change in the size of the training data resulted in approximately 1% improvement relative to the error matrix results shown in Table 1.


Table 1. The classification accuracies (HSI | Fused) for the 2013 IEEE GRSS DF Contest dataset. Each row gives the per-class results; columns, in order: Healthy grass, Stressed grass, Synthetic grass, Trees, Soil, Water, Residential, Commercial, Road, Highway, Railway, Parking Lot 1, Parking Lot 2, Tennis Court, Running Track.

Healthy grass:  96% | 95%  4% | 5%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%
Stressed grass:  0% | 0%  100% | 100%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%
Synthetic grass:  0% | 0%  0% | 0%  98% | 95%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 1%  1% | 3%  0% | 1%  1% | 0%  0% | 0%  0% | 0%
Trees:  6% | 2%  3% | 0%  0% | 0%  91% | 98%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%
Soil:  0% | 0%  0% | 0%  0% | 0%  0% | 0%  98% | 98%  0% | 0%  0% | 0%  1% | 0%  0% | 1%  1% | 0%  0% | 0%  0% | 0%  0% | 1%  0% | 0%  0% | 0%
Water:  0% | 0%  0% | 0%  1% | 0%  0% | 0%  0% | 0%  91% | 93%  3% | 3%  0% | 0%  0% | 0%  2% | 0%  0% | 0%  1% | 2%  2% | 2%  0% | 0%  0% | 0%
Residential:  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  85% | 90%  1% | 4%  0% | 0%  0% | 0%  4% | 6%  7% | 0%  1% | 0%  2% | 0%  0% | 0%
Commercial:  0% | 0%  0% | 0%  0% | 0%  0% | 0%  2% | 0%  0% | 0%  1% | 1%  54% | 87%  10% | 0%  2% | 0%  0% | 1%  3% | 11%  28% | 0%  0% | 0%  0% | 0%
Road:  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 1%  0% | 0%  0% | 0%  2% | 0%  52% | 82%  37% | 3%  5% | 7%  3% | 2%  1% | 4%  0% | 1%  0% | 0%
Highway:  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  14% | 1%  62% | 85%  7% | 6%  17% | 8%  0% | 0%  0% | 0%  0% | 0%
Railway:  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  17% | 9%  0% | 0%  2% | 0%  3% | 3%  73% | 83%  5% | 5%  0% | 0%  0% | 0%  0% | 0%
Parking Lot 1:  0% | 0%  0% | 0%  0% | 0%  0% | 0%  6% | 0%  0% | 0%  2% | 0%  0% | 5%  32% | 32%  15% | 3%  3% | 1%  42% | 52%  0% | 7%  0% | 0%  0% | 0%
Parking Lot 2:  0% | 0%  0% | 0%  0% | 0%  0% | 0%  2% | 3%  0% | 0%  8% | 0%  9% | 0%  17% | 7%  5% | 2%  0% | 6%  11% | 9%  46% | 69%  1% | 2%  1% | 2%
Tennis Court:  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 1%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  100% | 99%  0% | 0%
Running Track:  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 2%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  0% | 0%  100% | 98%

Table 2. The classification accuracies (HSI | Fused) for the 2013 IEEE GRSS DF Contest dataset on introduction of noise. Rows give the percentage of label error introduced in the training data; columns, in order: Healthy grass, Stressed grass, Synthetic grass, Trees, Soil, Water, Residential, Commercial, Road, Highway, Railway, Parking Lot 1, Parking Lot 2, Tennis Court, Running Track.

No Error:  96% | 95%  100% | 100%  98% | 95%  91% | 98%  98% | 98%  91% | 93%  85% | 90%  54% | 87%  52% | 82%  62% | 85%  73% | 83%  42% | 52%  46% | 69%  100% | 99%  100% | 98%
1%:  98% | 95%  100% | 100%  98% | 95%  97% | 98%  98% | 99%  92% | 92%  81% | 85%  53% | 86%  59% | 85%  60% | 89%  79% | 83%  39% | 58%  53% | 59%  100% | 99%  100% | 98%
5%:  96% | 96%  100% | 100%  86% | 96%  82% | 99%  98% | 100%  91% | 93%  81% | 91%  53% | 86%  53% | 85%  54% | 82%  45% | 75%  50% | 44%  52% | 61%  100% | 99%  100% | 98%
10%:  97% | 96%  100% | 100%  98% | 100%  95% | 99%  99% | 100%  93% | 97%  71% | 93%  47% | 85%  68% | 84%  53% | 85%  69% | 78%  38% | 55%  49% | 52%  100% | 99%  100% | 98%
15%:  96% | 96%  100% | 100%  98% | 100%  95% | 97%  100% | 100%  94% | 96%  58% | 92%  54% | 91%  79% | 71%  47% | 85%  71% | 82%  37% | 48%  37% | 56%  100% | 99%  98% | 98%
20%:  93% | 97%  100% | 100%  99% | 100%  97% | 98%  100% | 100%  93% | 95%  70% | 91%  46% | 92%  56% | 65%  50% | 79%  69% | 75%  39% | 40%  69% | 59%  98% | 100%  100% | 98%

Fig. 3. 2013 IEEE GRSS DF Contest dataset: CNN using HSI data only (top) and fused HSI and LiDAR data (middle). Google map view (bottom).

3.4. MUUFL Gulfport Dataset

We also applied the algorithm to a subset of the MUUFL Gulfport dataset [14, 15], which consists of a hyperspectral image of 58 spectral bands and co-registered LiDAR elevation and intensity data. There are 320 × 360 pixels in this dataset. For training purposes we selected 60 pixels for each of the 12 known classes and used an additional 1620 labeled pixels for testing. Repetition of the LiDAR elevation and intensity components was set to 5. We use K = 20 convolution filters and P = 20 neurons in the fully connected layer. Classes labeled Targets 1-4 correspond to materials placed on the ground.

Fig. 4. MUUFL Gulfport dataset: CNN using HSI data only (left) and fused HSI and LiDAR data (middle). Google map view (right).

4. DISCUSSION AND FUTURE WORK

Our experimental results suggest that pixel-level data fusion can be an effective approach to improving the accuracy of convolutional neural network classifiers, by roughly 10% compared to classification via HSI alone. Similar results have been found relative to the improvement of classification due to fusion in non-CNN approaches [16]. Whether CNN provides better results compared to non-CNN methods is a topic of future research, though our results seem to suggest that higher accuracies can be achieved.

Future work includes exploitation of the natural correlation among nearby pixels. Nearby pixels may be expected to be highly correlated in their spectral signatures, their LiDAR elevations, and their LiDAR intensities. We believe the results presented here can be extended to use 3-D convolution to further improve classification accuracy, and we plan further experiments with the MUUFL Gulfport dataset. We will also consider the question of whether the convolution steps of our CNN will tolerate some registration errors between the two modalities.


Acknowledgements

The authors would like to thank the Hyperspectral Image Analysis group and the NSF Funded Center for Airborne Laser Mapping (NCALM) at the University of Houston for providing the data sets used in this study, and the IEEE GRSS Data Fusion Technical Committee for organizing the 2013 Data Fusion Contest. The authors also thank P. Gader, A. Zare, R. Close, J. Aitken, G. Tuell, the University of Florida, and the University of Missouri for sharing the "MUUFL Gulfport Hyperspectral and LiDAR Data Collection" acquired with NGA funding. In addition, the authors thank the reviewers for their helpful comments and suggestions.

This research was supported in part by the U.S. Air Force Office of Scientific Research (AFOSR) under Grant no. FA9550-15-1-0286.

5. REFERENCES

[1] C. Debes, A. Merentitis, R. Heremans, J. Hahn, N. Frangiadakis, T. van Kasteren, W. Liao, R. Bellens, A. Pizurica, S. Gautama, et al., "Hyperspectral and LiDAR data fusion: Outcome of the 2013 GRSS Data Fusion Contest," Selected Topics in Applied Earth Observations and Remote Sensing, IEEE Journal of, vol. 7, no. 6, pp. 2405–2418, 2014.

[2] M. Khodadadzadeh, J. Li, S. Prasad, and A. Plaza, "Fusion of hyperspectral and LiDAR remote sensing data using multiple feature learning," Selected Topics in Applied Earth Observations and Remote Sensing, IEEE Journal of, vol. 8, no. 6, pp. 2971–2983, 2015.

[3] D. Nikic, J. Wu, V. P. Pauca, R. Plemmons, and Q. Zhang, "A novel approach to environment reconstruction in LiDAR and HSI datasets," in Advanced Maui Optical and Space Surveillance Technologies Conference, 2012, vol. 1, p. 81.

[4] Q. Zhang, V. P. Pauca, R. J. Plemmons, and D. Nikic, "Detecting objects under shadows by fusion of hyperspectral and LiDAR data: A physical model approach," in Proc. 5th Workshop Hyperspectral Image Signal Process.: Evol. Remote Sens., 2013, pp. 1–4.

[5] M. Dalponte, L. Bruzzone, and D. Gianelle, "Fusion of hyperspectral and LiDAR remote sensing data for classification of complex forest areas," Geoscience and Remote Sensing, IEEE Transactions on, vol. 46, no. 5, pp. 1416–1427, 2008.

[6] Y. Chen, Z. Lin, X. Zhao, G. Wang, and Y. Gu, "Deep learning-based classification of hyperspectral data," Selected Topics in Applied Earth Observations and Remote Sensing, IEEE Journal of, vol. 7, no. 6, pp. 2094–2107, 2014.

[7] W. Hu, Y. Huang, L. Wei, F. Zhang, and H. Li, "Deep convolutional neural networks for hyperspectral image classification," Journal of Sensors, vol. 2015, 2015.

[8] C. Pohl and J. L. Van Genderen, "Review article: Multisensor image fusion in remote sensing: concepts, methods and applications," International Journal of Remote Sensing, vol. 19, no. 5, pp. 823–854, 1998.

[9] M. Mangolini, Apport de la fusion d'images satellitaires multicapteurs au niveau pixel en teledetection et photo-interpretation, Ph.D. thesis, Universite de Nice Sophia-Antipolis, 1994.

[10] S. Morchhale, "Deep convolutional neural networks for classification of fused hyperspectral and LiDAR data," M.S. thesis, Dept. of Computer Science, Wake Forest University, 2016, http://csweb.cs.wfu.edu/~pauca/MorchhaleThesis16.pdf.

[11] N. Kruger, P. Janssen, S. Kalkan, M. Lappe, A. Leonardis, J. Piater, A. Rodriguez-Sanchez, and L. Wiskott, "Deep hierarchies in the primate visual cortex: What can we learn for computer vision?," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 35, no. 8, pp. 1847–1871, 2013.

[12] IEEE Geoscience and Remote Sensing Society, "2013 Data Fusion Contest," http://www.grss-ieee.org/community/technical-committees/data-fusion.

[13] J. Friedman, T. Hastie, and R. Tibshirani, The Elements of Statistical Learning, Springer Series in Statistics, Springer, Berlin, 2001.

[14] P. Gader, A. Zare, R. Close, and G. Tuell, "Co-registered hyperspectral and LiDAR Long Beach, Mississippi data collection," University of Florida, University of Missouri, and Optech International, 2010.

[15] P. Gader, A. Zare, R. Close, J. Aitken, and G. Tuell, "MUUFL Gulfport hyperspectral and LiDAR airborne data set," Tech. Rep. REP-2013-570, University of Florida, Oct. 2013.

[16] P. Hao and Z. Niu, "Comparison of different LiDAR and hyperspectral data fusion strategies using SVM and ABNet," Remote Sensing Science, vol. 1, no. 3, 2013.