

Proceedings of the 4th International Symposium on Communications, Control and Signal Processing, ISCCSP 2010, Limassol, Cyprus, 3-5 March 2010

Sparse Representations for Automatic Target Classification in SAR Images

Jayaraman J. Thiagarajan, Karthikeyan N. Ramamurthy, Peter Knee, Andreas Spanias and Visar Berisha

Abstract-We propose a sparse representation approach for classifying different targets in Synthetic Aperture Radar (SAR) images. Unlike other feature-based approaches, the proposed method does not require explicit pose estimation or any pre-processing. The dictionary used in this setup is the collection of the normalized training vectors itself. Computing a sparse representation for the test data using this dictionary corresponds to finding a locally linear approximation with respect to the underlying class manifold. SAR images obtained from the Moving and Stationary Target Acquisition and Recognition (MSTAR) public database were used in the classification setup. Results show that the performance of the algorithm is superior to that of a support vector machine based approach with similar assumptions. Significant complexity reduction is obtained by reducing the dimensions of the data using random projections, for only a small loss in performance.

I. INTRODUCTION

Synthetic Aperture Radar (SAR) automatic sensors, first developed in the 1950's, have the ability to produce all-weather, 24-hour-a-day, high-resolution images with quality quickly approaching that of optical imaging systems [1]. Automatic Target Recognition (ATR) systems using SAR sensors continue to be developed for a wide variety of applications, particularly in the area of military defense. The goal of these ATR systems is to detect and classify military targets using various image and signal processing techniques. The conventional multistage ATR algorithm consists of three separate stages [2]: The pre-screener identifies local regions of interest using a Constant False Alarm Rate (CFAR) detector, allowing all targets and numerous false alarms to pass. It is followed by a one-class discriminator which aims to eliminate all natural false alarms, also referred to as clutter. Finally, the classifier receives all man-made objects and attempts to categorize each input image as a specific target type contained in the training set or to reject the object as man-made clutter.

In the commonly used template matching based algorithms, multiple templates for each target are generated at incrementally spaced aspect angles [3]. The addition of a target to the training set requires only the creation of an additional set of templates. However, these schemes become computationally intensive as the number of templates and target types increases. The other popular approach uses the entire set of training data to discriminatively train the classification system. The systems produced are typically nonlinear, for example the multilayer perceptron [4] and the Support Vector Machine (SVM) [5]. These feature-based classification approaches have been shown to outperform conventional template-based approaches, providing better generalization capabilities and increased rejection of target confusers. These pattern recognition approaches utilize a set of classifiers, each developed using target training images over a given range of aspect angles [4], [5], [6]. While these classification systems achieve reduced complexity, the approaches are reliant on an accurate pose angle estimate. Numerous pose angle estimation techniques have been developed, including the use of a 2-D continuous wavelet transform, but estimation errors are nearly ±10 degrees.

The authors are with the SenSIP Center, Arizona State University, Tempe, USA.

The radar research was sponsored in part by the ASU SenSIP Consortium and its industry member Raytheon.

978-1-4244-6287-2/10/$26.00 ©2010 IEEE

We propose a sparse representation based algorithm for classifying SAR images into different targets. This algorithm is closely related to the one proposed in [7]. In [7], the training images used for classification are assumed to be perfectly registered. However, we use the algorithm in a scenario where we have unknown poses and provide reasons for the working of the algorithm from the perspective of class manifolds. The proposed algorithm does not require any preprocessing of the images and there is no need for any pose estimation. It works by projecting the test data on a subset of training vectors from the class manifolds and computing the residual error with respect to each manifold. The best class estimate corresponds to the manifold that has the least residual error.

The publicly available portion of the Moving and Stationary Target Acquisition and Recognition (MSTAR) database [8] was used to test the classification setup. The three targets considered were the T72 tank, the BTR70 personnel carrier and the BMP2 tank. The log-intensity images collected at a 17-degree depression angle were used for training while the images collected at 15 degrees were used for testing. Results show that the classification performance is slightly better than a similar SVM based classification setup [5] for the same training and testing sets. To reduce the complexity, we use the theory of random projections to decrease the dimensionality of the data sets to about 11% of the original dimensions. Significant complexity reduction was achieved for only a small loss in classification performance.

The organization of this paper is as follows: in Section II we present a brief review of the theory of sparse representations and random projections. The proposed algorithm for classification is described in Section III. The results will be presented in Section IV, along with a comparison between the proposed approach and results obtained using an SVM based classifier in [5]. It is appropriate to directly compare these two sets of results since neither utilizes significant image pre-processing on the MSTAR images. Finally, Section V will present our conclusions, including a discussion on possible improvements.

II. BACKGROUND

A. Sparse Representations

Sparse representation problems aim to approximate a target signal using a linear combination of elementary signals drawn from a large collection. We work in a finite dimensional, real inner-product space $\mathbb{R}^N$, which is referred to as the signal space. A dictionary for the signal space, $\mathbf{D}$, is a finite collection of unit-norm elementary signals. If the atoms in the dictionary are linearly dependent, it is termed redundant or overcomplete. This means that every signal can have an infinite number of best approximations. Hence, the problem of signal representation can be formulated as a solution to an underdetermined system of equations

$$\mathbf{x} = \mathbf{D}\mathbf{a}, \tag{1}$$

where $\mathbf{D}$ is an $N \times K$ matrix with $N < K$ and it is assumed that $\mathrm{rank}(\mathbf{D}) = N$. For signal representation, the columns of $\mathbf{D}$ form an overcomplete set and $\mathbf{x}$ is a measurement vector or the target signal to be represented. The goal is to solve for the coefficients $\mathbf{a}$ in this underdetermined system. A requirement considered here is that the coefficient vector, $\mathbf{a}$, be sparse, i.e., only a few of its entries are non-zero. Relevant cost functions that can promote sparsity, such as the $\ell_0$ and the $\ell_1$ norms, are used. The code, $\mathbf{a}$, can be obtained by solving one of the following optimization problems,

$$\min_{\mathbf{a}} \|\mathbf{a}\|_0 \quad \text{subject to} \quad \|\mathbf{x} - \mathbf{D}\mathbf{a}\|_2 \le \epsilon \tag{2}$$

$$\min_{\mathbf{a}} \|\mathbf{a}\|_1 \quad \text{subject to} \quad \|\mathbf{x} - \mathbf{D}\mathbf{a}\|_2 \le \epsilon \tag{3}$$

where $\|\cdot\|_0$ and $\|\cdot\|_1$ refer to the $\ell_0$ norm and $\ell_1$ norm respectively. Here, the objective function is a measure of sparsity in the resulting code and the constraint is on the approximation error. When the $\ell_0$ norm is used as the cost function, exact determination of the sparsest representation is an NP-hard problem [10]. The $\ell_0$ norm optimization problem searches for the smallest subset of the dictionary atoms to represent the signal. The complexity of the search grows exponentially with the dictionary size. The problem in (3) is a convex optimization problem closest to (2).

The two most important approaches followed to find sparse representations are the greedy pursuit methods and the convex relaxation methods [9]. The greedy pursuit methods select one atom at a time from the dictionary and build the approximant. The selected atom is the one best correlated with the residual, and the current approximation is updated using the atom. At each stage, the residual is computed as the difference between the input signal and the contribution of all dictionary atoms picked up to the previous stage. Since the sparse approximation in (2) is a combinatorial problem, its related convex problem in (3) can be used. The convex optimization problem can be solved in polynomial time using standard software. Sparse representations have been successfully used in image compression, denoising, clustering [9] and classification [15].

B. Random Projections

Random projection is a non-adaptive universal dimensionality reduction technique that performs a low-distortion Euclidean embedding of the high dimensional data [13]. The number of dimensions after reduction using random projections does not depend on the original data dimensions, but is proportional to the logarithm of the number of data points. This can be formally stated using the Johnson-Lindenstrauss (JL) lemma. For any $0 < \epsilon < 1$ and any integer $S$, let $N$ be a positive integer such that $N \ge 4(\epsilon^2/2 - \epsilon^3/3)^{-1} \ln S$. Then, for any set $V$ of $S$ points in $\mathbb{R}^M$, there is a map $f : \mathbb{R}^M \to \mathbb{R}^N$ such that for all $\mathbf{u}, \mathbf{v} \in V$,

$$(1 - \epsilon)\|\mathbf{u} - \mathbf{v}\|^2 \le \|f(\mathbf{u}) - f(\mathbf{v})\|^2 \le (1 + \epsilon)\|\mathbf{u} - \mathbf{v}\|^2. \tag{4}$$

Furthermore, this map can be found in randomized polynomial time. Mappings that satisfy the JL lemma include random matrices whose entries are drawn independently from a zero mean uniform distribution or the normal distribution $\mathcal{N}(0, 1)$.

III. PROPOSED CLASSIFICATION ALGORITHM

The SAR images from the training sets of all the classes are vectorized and normalized to have unit $\ell_2$ norm and used as training vectors. Let us denote the normalized $i$th training vector from a class $j$ as $\mathbf{d}_{i,j}$ and let $\mathbf{D}_j = [\mathbf{d}_{i,j}]_{i=1}^{N_j}$, where $N_j$ is the number of training vectors in class $j$. The overall dictionary is given by $\mathbf{D} = [\mathbf{D}_j]_{j=1}^{J}$ and it contains the normalized training vectors of all classes. The algorithm for classification (Table I) is similar to the one given in [7], but we motivate and describe the working of the algorithm from the perspective of class manifolds.
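The greedy pursuit of Section II-A and the residual-based decision rule summarized in Table I can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `omp` and `classify`, the four-dimensional toy vectors and the noise level are all hypothetical stand-ins for vectorized SAR images.

```python
import numpy as np

def omp(D, x, L):
    """Greedy Orthogonal Matching Pursuit with L atoms; D has unit-norm columns."""
    residual, support, coef = x.copy(), [], np.zeros(0)
    for _ in range(L):
        corr = np.abs(D.T @ residual)      # correlation of each atom with the residual
        if support:
            corr[support] = -1.0           # never reselect an atom
        support.append(int(np.argmax(corr)))
        # Least-squares refit on all selected atoms, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    a = np.zeros(D.shape[1])
    a[support] = coef
    return a

def classify(D, labels, x, L):
    """Return the class whose selected atoms leave the smallest residual norm."""
    a = omp(D, x, L)
    classes = np.unique(labels)
    # "Projection" on class j keeps only the coefficients of class-j atoms.
    res = [np.linalg.norm(x - D[:, labels == j] @ a[labels == j]) for j in classes]
    return classes[int(np.argmin(res))]

# Toy data (assumption): two classes clustered around different directions.
rng = np.random.default_rng(0)
c0 = np.array([1.0, 0.2, 0.0, 0.0])
c1 = np.array([0.0, 0.0, 0.3, 1.0])
train = np.vstack([c0 + 0.05 * rng.standard_normal((5, 4)),
                   c1 + 0.05 * rng.standard_normal((5, 4))])
labels = np.array([0] * 5 + [1] * 5)
D = (train / np.linalg.norm(train, axis=1, keepdims=True)).T  # columns are atoms

x = c1 + 0.05 * rng.standard_normal(4)   # test vector drawn near class 1
x /= np.linalg.norm(x)
print(classify(D, labels, x, L=2))
```

The least-squares refit at every iteration is what distinguishes OMP from plain matching pursuit; the class decision then uses only the coefficients that fall on each class's atoms.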

A. Sparse Representations and Manifolds

It is known that SAR images of a given class lie in a manifold whose dimension is much lower than the actual dimension of the image [14]. The images lie in a special type of manifold called the Image Appearance Manifold (IAM), because the image of a target predominantly depends on the depression angle and the pose [11]. Therefore, classification of SAR images is equivalent to finding the manifold that is closest to the test image. The manifold of a given class is denoted as $\mathcal{M}_j$ and the training images are assumed to be samples drawn from the manifold. The manifold of a class is also highly non-differentiable because of the presence of edges in the image [11].

A linear representation can be provided for any non-linear manifold when only a small local region of the manifold is considered. This fact is exploited by the Locally Linear Embedding (LLE) algorithm [12] for manifold learning. We will use this fact to provide a sparse representation based classification setup for SAR images. In this setup, we will assume that for a given pose, changing the depression angle alone does not change the SAR image appreciably for the purpose of classification. Therefore, we will use the images taken at a depression angle of 17 degrees for training and at a depression angle of 15 degrees for testing.
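The local linearity argument can be illustrated with a toy example. The sketch below assumes a helix in three dimensions as a stand-in for a class manifold (the curve, the sample positions and the helper names `curve` and `residual` are hypothetical, chosen only for illustration). It compares how well a point is approximated by a linear combination of two neighbouring samples versus two distant samples.

```python
import numpy as np

def curve(t):
    """Point on a 1-D manifold (a helix) embedded in R^3."""
    return np.array([np.cos(t), np.sin(t), t])

def residual(x, samples):
    """Norm of x minus its least-squares projection onto the span of the samples."""
    A = np.column_stack(samples)
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)
    return np.linalg.norm(x - A @ coef)

x = curve(0.50)
near = residual(x, [curve(0.45), curve(0.55)])  # neighbours on the manifold
far = residual(x, [curve(0.00), curve(2.00)])   # distant samples on the manifold
print(near, far)
```

In this toy case the residual from the neighbouring samples is orders of magnitude smaller than the residual from the distant ones, which is the behaviour the sparse representation relies on when it selects a few nearby training vectors.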

Assuming that the test vector $\mathbf{x}$ lies close to the manifold of a class $j$, it can be represented as a locally linear combination of the training vectors of the class. This is given by,

$$\mathbf{x} = \mathbf{x}_j + \mathbf{r}_j, \tag{5}$$

where $\mathbf{x}_j$ is the "projection" of $\mathbf{x}$ on the manifold $\mathcal{M}_j$ and $\mathbf{r}_j$ is the residual. The "projection" is given by a local linear combination of training vectors given as,

$$\mathbf{x}_j = \sum_{k \in \Delta_x^j} a_{k,j}\, \mathbf{d}_{k,j}, \tag{6}$$

where $\Delta_x^j$ is the set containing the indices $k$ of the training vectors in class $j$ that participate in the linear combination and $a_{k,j}$ are the coefficients of the representation. We define $\Delta_x = \bigcup_{j=1}^{J} \Delta_x^j$ as the overall set of training vectors that represent the test vector.

For a given test vector, the set $\Delta_x$ and the coefficients are unknown. $L = |\Delta_x|$ is much smaller than the total number of training vectors, as only a few training vectors from each class manifold participate in the representation. This can be solved as a sparse approximation problem, i.e., computing $\mathbf{a}$ with a fixed number ($L$) of non-zero elements for the test vector $\mathbf{x}$ using the dictionary $\mathbf{D}$. $L$ is also referred to as the sparsity of the representation. We use a greedy algorithm, the Orthogonal Matching Pursuit (OMP) algorithm [9], to compute the coefficients. After estimating the coefficients, the residual energy $r_j = \|\mathbf{r}_j\|_2$ with respect to each manifold is computed. The best estimate of the class for $\mathbf{x}$ is $c = \arg\min_j r_j$. This means that the best estimate of the class of $\mathbf{x}$ is the one that can represent the maximum energy of $\mathbf{x}$ using a set of local training vectors. This idea is illustrated in Figure 1, where the test vector $\mathbf{x}$ is represented using 3 training samples from $\mathcal{M}_1$ and 1 training sample from $\mathcal{M}_2$. The residual energy of the representation is smaller for $\mathcal{M}_1$ and hence $\mathbf{x}$ belongs to class 1. The algorithm for classification is given in Table I. We also note that no explicit pose estimation is required in this algorithm, as we directly deal with the manifolds of a class of images. Locally linear approximation, solved as a sparse representation problem, is able to select the necessary training vectors to give a good discrimination across different classes.

Fig. 1. Classification of test data ($\mathbf{x}$) by computing locally linear projections ($\mathbf{x}_1$, $\mathbf{x}_2$) and residuals ($\mathbf{r}_1$, $\mathbf{r}_2$) on the manifolds ($\mathcal{M}_1$, $\mathcal{M}_2$) using sparse representation.

TABLE I
PROPOSED ALGORITHM FOR CLASSIFICATION OF SAR IMAGES USING SPARSE REPRESENTATIONS.

Input
- Training vectors, $\{\{\mathbf{y}_{i,j}\}_{i=1}^{N_j}\}_{j=1}^{J}$.
- Number of classes, $J$.
- Training vectors per class, $N_j$.
- Test vector, $\mathbf{x}$.
- Sparsity, $L$.

Initialization
- Normalize $\mathbf{y}_{i,j}$ and create $\mathbf{d}_{i,j}$.
- Overall dictionary $\mathbf{D} = [[\mathbf{d}_{i,j}]_{i=1}^{N_j}]_{j=1}^{J}$.

Proposed Algorithm
- Assume generative model $\mathbf{x} = \mathbf{D}\mathbf{a} + \mathbf{r}$.
- Compute the sparse representation $\mathbf{a}$ using OMP.
- Find supports $\Delta_x^j$ for each class $j$, where coefficients are non-zero.
- Compute "projections" $\mathbf{x}_j$ and residuals $\mathbf{r}_j$ using (5) and (6).
- $r_j = \|\mathbf{r}_j\|_2$ and $c = \arg\min_j r_j$.

Output
- Class estimate for $\mathbf{x}$, $c$.

B. Dimensionality Reduction using Random Projections

The algorithm proposed in Section III-A provides a good classification performance, even in the absence of preprocessing, as noted in Section IV. However, it is very expensive in terms of computations to directly use the training and test images in order to compute the sparse representation. Assuming that $\mathbf{d} \in \mathbb{R}^M$ and the number of elements in $\mathbf{D}$ is $K$, for a sparsity of $L$, computing $\mathbf{a}$ using the OMP algorithm incurs a complexity of $O(LKM)$. To reduce the complexity, we randomly project both the training and test vectors before performing sparse representation. We create a projection matrix $\mathbf{R} \in \mathbb{R}^{M \times N}$, whose elements are drawn from $\mathcal{N}(0, 1)$. Dimensionality reduction of $\mathbf{x}$ from $\mathbb{R}^M$ to $\mathbb{R}^N$, where $N \ll M$, is performed by,

$$\mathbf{x}_r = \frac{1}{N}\mathbf{R}^T\mathbf{x}, \tag{7}$$

where $\mathbf{x}_r$ is the randomly projected data.

If the reduced number of dimensions is taken as per the JL lemma, the distances and angles between the points are preserved approximately, i.e., a low-distortion Euclidean embedding of the manifolds can be performed. We use (7) to project both the training and test vectors to a random low dimensional space and perform classification using the algorithm given in Table I. The dominant complexity of classification is computing the representation and it reduces from $O(LKM)$ to $O(LKN)$ due to random projections.

IV. RESULTS

The classifier is tested with images from the MSTAR database. The images of targets T72 (SN_132), BTR70 (SN_C71) and BMP2 (SN_C21), taken at a 17 degree depression angle, are used for training. The total number of training vectors considered was 698. For testing, the images of targets T72 (SN_132, SN_812, SN_S7), BTR70 (SN_C71) and BMP2 (SN_C21, SN_C9566, SN_C9663), taken at a depression angle of 15 degrees, were used. The total number of testing vectors was 1365. Note that the symbols in parentheses indicate a particular variant of the target. The training and test sets are exactly the same as those used in [5]. The images were of size 128 x 128 and classification was performed with the algorithm given in Table I. Classification was also performed with reduced-dimensional training and test vectors. The dimensionality was reduced from 16384 to 1792 using random projections. In this case, the algorithm was executed 50 times, each time with a different random projection matrix, for each sparsity level and the results were averaged.

The classification performance with full dimensional and reduced dimensional data is given in Figure 2. The best classification performance with full dimensional data is obtained for a sparsity level of 5 and it is 91.43%. The confusion matrix with counts for this case is given in Table II. For the reduced dimensional data, the best classification performance is obtained for a sparsity level of 9 and it is 90.91%. The confusion matrix for this case is given in Table III. The confusion matrix has non-integer entries because it is an average over multiple iterations.

[Figure: two curves, "Full dimensional data" and "Reduced dimensional data", plotted against sparsity level; vertical axis ticks at 85, 90 and 95.]

Fig. 2. Classification performance in full and reduced dimensional cases.

TABLE II
CONFUSION MATRIX FOR FULL DIMENSIONAL DATA WITH L = 5.

        |  T72   | BTR70  |  BMP2
T72     |  529   |   0    |   38
BTR70   |   14   |  196   |   26
BMP2    |   39   |   0    |  523

TABLE III
CONFUSION MATRIX FOR REDUCED DIMENSIONAL DATA WITH L = 9.

        |  T72   | BTR70  |  BMP2
T72     | 535.78 |   0    |  46.66
BTR70   |  12.38 | 195.94 |  34.82
BMP2    |  33.84 |   0.06 | 505.52

V. DISCUSSION

The proposed algorithm performs slightly better than the SVM based approach proposed in [5], when a closed-set classification setup is considered. The training and test vector sets used in [5] are the same as those used in our setup. We note that we do not perform pose estimation and we also do not perform verification. The random projection based approach, which uses only about 11% of the original dimensions of the data, is only about 0.5% lower in classification performance when the best cases are compared. The proposed method has the potential for non-adaptive dimensionality reduction with a minimal loss in classification performance. It is also interesting to note that when a sparsity level of 1 is considered, the classification problem becomes similar to a nearest neighbor approach and the performance is still greater than 90% with both full and reduced dimensional data.
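The dimensionality reduction step and the figures quoted for it can be checked with a short NumPy sketch. It implements the projection in (7) as printed, re-derives the fraction of dimensions retained from the sizes given in Section IV, and recomputes the full dimensional accuracy from the counts in Table II. The function name `random_project` and the toy sizes (256 dimensions reduced to 32) are assumptions made to keep the demonstration light; they are not values from the paper.

```python
import numpy as np

def random_project(X, N, rng):
    """Project the columns of X (M x n) to N dimensions as in (7): X_r = R^T X / N."""
    M = X.shape[0]
    R = rng.standard_normal((M, N))   # entries drawn from N(0, 1)
    return R.T @ X / N

# Sizes reported in Section IV: 128 x 128 images reduced to 1792 values.
M_full, N_red = 128 * 128, 1792
fraction = N_red / M_full             # also the O(LKN) / O(LKM) cost ratio

# Table II counts; correct decisions lie on the diagonal.
table2 = np.array([[529, 0, 38],
                   [14, 196, 26],
                   [39, 0, 523]])
accuracy = np.trace(table2) / table2.sum()

# Toy-sized projection (hypothetical sizes).
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 3))
X_r = random_project(X, 32, rng)

print(f"{fraction:.4f}", f"{100 * accuracy:.2f}", X_r.shape)
# prints: 0.1094 91.43 (32, 3)
```

The retained fraction of 0.1094 corresponds to the "about 11%" quoted in the text, and the same ratio applies to the dominant cost of computing the representation.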

The power of our sparse representation based classification algorithm, which chooses the "best" locally linear subspace for the purpose of classification, is demonstrated in this paper. Further improvements in performance can be obtained if an open set classification setup is considered with outlier rejection. The target in a SAR image lies close to the middle of the image. Hence, identifying the presence of the target and using a cropped image is also likely to improve performance considerably.

REFERENCES

[1] W.G. Carrara, R.S. Goodman and R.M. Majewski, Spotlight Synthetic Aperture Radar: Signal Processing Algorithms, Artech House, Boston, MA, 1995.

[2] D.E. Kreithen, S.D. Halversen and G.J. Owirka, "Discriminating targets from clutter," Lincoln Laboratory Journal, vol. 6(1), pp. 25-52, 1993.

[3] G.J. Owirka, S.M. Verbout and L.M. Novak, "Template-based SAR ATR performance using different image enhancement techniques," Proc. SPIE, vol. 3721, pp. 302-319, 1999.

[4] S.J. Rogers et al., "Neural networks for automatic target recognition," Neural Networks, vol. 8, pp. 1153-1184, 1995.

[5] Q. Zhao et al., "Support Vector Machines for SAR Automatic Target Recognition," IEEE Transactions on Aerospace and Electronic Systems, vol. 37, no. 2, pp. 643-653, 2001.

[6] Q. Zhao et al., "Synthetic aperture radar automatic target recognition with three strategies of learning and representation," Optical Engineering, vol. 39, no. 5, pp. 1230-1244, 2000.

[7] J. Wright et al., "Robust Face Recognition via Sparse Representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, 2009.

[8] E.R. Keydel, "MSTAR Extended Operating Conditions," Proc. SPIE, vol. 2757, pp. 228-242, 1996.

[9] J.A. Tropp, Topics in Sparse Approximation, Ph.D. thesis, University of Texas, Austin, 2004.

[10] G. Davis, S. Mallat, and M. Avellaneda, "Greedy adaptive approximation," Journal of Constructive Approximation, vol. 13, pp. 57-98, 1997.

[11] M. Wakin, D. Donoho, H. Choi and R. Baraniuk, "The multiscale structure of non-differentiable image manifolds," SPIE Conference Series, pp. 413-429, 2005.

[12] S.T. Roweis and L.K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, pp. 2323-2326, 2000.

[13] S. Vempala, The Random Projection Method, Series in Discrete Mathematics and Theoretical Computer Science, American Mathematical Society, 2004.

[14] V. Berisha et al., "Sparse Manifold Learning with Applications to SAR Image Classification," Proc. ICASSP, vol. 3, pp. 1089-1092, 2007.

[15] J.J. Thiagarajan, Dictionary Learning Algorithms for Shift-Invariant Representation and Classification, M.S. thesis, Arizona State University, 2008.