
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.


Robust Optical-to-SAR Image Matching Based on Shape Properties

Yuanxin Ye, Member, IEEE, Li Shen, Ming Hao, Jicheng Wang, and Zhu Xu

Abstract— Although image matching techniques have been developed over the past decades, automatic optical-to-synthetic aperture radar (SAR) image matching is still a challenging task due to significant nonlinear intensity differences between such images. This letter addresses this problem by proposing a novel similarity metric for image matching using shape properties. A shape descriptor named dense local self-similarity (DLSS) is first developed based on self-similarities within images. Then a similarity metric (named DLSC) is defined using the normalized cross correlation (NCC) of the DLSS descriptors, followed by a template matching strategy to detect correspondences between images. DLSC is robust against significant nonlinear intensity differences because it captures the shape similarity between images, which is independent of intensity patterns. DLSC has been evaluated with four pairs of optical and SAR images. Experimental results demonstrate its advantage over state-of-the-art similarity metrics (such as NCC and mutual information) and show its superior matching performance.

Index Terms— Dense local self-similarity (DLSS), optical-to-synthetic aperture radar (SAR) image matching, shape properties, similarity metric.

I. INTRODUCTION

IMAGE matching aims to detect control points (CPs) or correspondences between images, and it is a crucial preliminary step for many subsequent remote sensing image processing tasks, such as image registration, image fusion, and change detection [1]. Acquired by different imaging modalities and from various spectra, optical and synthetic aperture radar (SAR) images usually have significant intensity differences; that is, the two types of images present very different intensity patterns even though they cover the same scene (Fig. 1). These differences make CP detection much more difficult than in the case of matching multisensor optical images. Therefore, the automatic matching of optical and SAR images is challenging.

Manuscript received January 25, 2017; accepted January 25, 2017. This work was supported in part by the National Key Research and Development Program of China under Grant 2016YFB0501403 and Grant 2016YFB0502603, in part by the National Natural Science Foundation of China under Grant 41401369 and Grant 41401374, in part by the Science and Technology Program of Sichuan under Grant 2015SZ0046, and in part by the Fundamental Research Funds for the Central Universities under Grant 2682016CX083. (Corresponding authors: Li Shen and Ming Hao.)

Y. Ye is with the Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 610031, China, and also with the Department of Information Engineering and Computer Science, University of Trento, 38123 Trento, Italy (e-mail: [email protected]).

L. Shen, J. Wang, and Z. Xu are with the Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 610031, China (e-mail: [email protected]).

M. Hao is with the School of Environment Science and Spatial Informatics, China University of Mining and Technology, Xuzhou 221116, China (e-mail: [email protected]).

Color versions of one or more of the figures in this letter are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/LGRS.2017.2660067

Fig. 1. Example of intensity differences between the optical and SAR images, where the red boxes include a small image region and the green boxes include a larger image region.


Methods of optical-to-SAR image matching can be generally classified into two categories [2]: feature-based and area-based methods. Feature-based methods use similarities of features extracted from images to obtain CPs between images. Such features mainly include point features [3], line or edge features [4], and contour or region features [5]. Recently, local invariant features have developed rapidly in computer vision. Some representative local invariant features, such as the scale-invariant feature transform and shape context, have been used and improved for optical-to-SAR image matching because of their invariance to image rotation and scale changes [6], [7]. However, these methods depend on detecting highly repeatable common features between images, which is a tough task for optical and SAR images due to significant intensity differences, thus degrading their matching performance [8].

As another type of matching method, area-based methods usually utilize similarity metrics to detect CPs between images through a template matching strategy. Compared with feature-based methods, area-based methods have the following advantages: 1) they avoid extracting common features between images and 2) they can detect CPs within a small search region, because most remote sensing images can be directly georeferenced using navigation instruments, which leaves images with position offsets of only several or dozens of pixels [9].

The selection of similarity metrics is crucial for area-based methods. Common similarity metrics include the normalized cross correlation (NCC), mutual information (MI) [8], and matching by tone mapping (MTM) [10]. NCC is not suited to the matching of optical and SAR images because it is vulnerable to nonlinear intensity differences. By comparison, MI is more robust to intensity changes and has been applied to optical-to-SAR image matching [8]. However, MI-based methods often produce many local extrema during matching, which may result in some mismatches.

1545-598X © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


MTM is a newer similarity metric from computer vision that handles intensity differences between images using a tone mapping approximated by a piecewise constant function. Nonetheless, a piecewise constant function cannot exactly fit the intensity relationship between optical and SAR images because of its complexity. In general, these similarity metrics cannot effectively handle nonlinear intensity differences between optical and SAR images because they only use intensity information to detect CPs. Compared with intensity information, the shape features or geometric structures of images are more robust to nonlinear intensity changes, as shown in Fig. 1: the shape features of the optical and SAR images are quite similar to some degree even though their intensity information is significantly different. Recently, Ye and Shen [11] and Ye et al. [12] proposed a similarity metric based on structural properties, which has been effectively applied to the matching of multisensor remote sensing images. However, this method requires computing multiscale and multidirection phase congruency features, resulting in a computationally time-consuming procedure. Differently, this letter builds a shape similarity metric on the basis of the local self-similarity (LSS) descriptor, which captures the local shape properties of images [13], [14].

We should note that it can be difficult to employ LSS for optical-to-SAR image matching directly, because the local shape properties of such images are different (Fig. 1). By comparison, the shape properties of a large image region are quite similar (Fig. 1). Accordingly, we develop a similarity metric that can capture the shape similarity between large regions of images. First, a feature descriptor named dense LSS (DLSS) is built by integrating the LSS descriptors of multiple small regions; DLSS aims to capture the shape features of a large image region. Then, a similarity metric named DLSC is developed using the NCC of the DLSS descriptors. Finally, DLSC is used to detect CPs between images by a template matching strategy. This letter extends our earlier work [15] by presenting a more principled derivation of this similarity metric and a more thorough evaluation using more experimental data. The code is available online.1

II. METHODOLOGY

This letter aims to build a similarity metric that is robust to significant nonlinear intensity differences between optical and SAR images. The proposed approach is based on the assumption that optical and SAR images share similar shape properties even though they have quite different intensity patterns. In this section, we develop a dense shape descriptor named DLSS and define a similarity metric based on this descriptor.

A. Dense Shape Descriptor

DLSS is inspired by the LSS descriptor, which captures the internal geometric layouts of local self-similarities within images. LSS describes the statistical co-occurrence of small image patches surrounding a pixel in an image region. An example of the generation of this descriptor is illustrated in Fig. 2.

1https://www.dropbox.com/s/o6f7j0igdiadlvs/DLSC%20publish%20code.rar?dl=0

Fig. 2. Example of the LSS descriptor generation. (a) Local image region. (b) Correlation surface of (a). (c) Log-polar transformed LSS descriptor.

First, a correlation surface S_q(x, y) is calculated from the sum of squared differences (SSD) between the patch centered at a pixel q and all surrounding image patches p_i in a local region. The SSD is normalized by the maximum of the image patch intensity variance and a noise constant that corresponds to acceptable photometric variations (in illumination or due to noise). It is defined as

$$ S_q(x, y) = \exp\left(-\frac{\mathrm{SSD}_q(x, y)}{\max\left(\mathrm{var}_{\mathrm{noise}},\ \mathrm{var}_{\mathrm{auto}}(q)\right)}\right). \tag{1} $$

Then, the correlation surface S_q(x, y) is transformed into a log-polar representation and partitioned into bins (e.g., 10 angles and 3 radial intervals). The maximal correlation value of each bin is used to form the LSS descriptor associated with the pixel q. Finally, the LSS descriptor is linearly stretched to the range [0, 1] to achieve invariance to the intensity variations of different patches in the local region.
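To make the construction concrete, the following is a minimal sketch of LSS extraction in Python (NumPy), assuming a float grayscale image. The helper name lss_descriptor, the border assumption, and the approximation of var_auto(q) by the maximal SSD among the patches immediately adjacent to q are ours, not taken from the paper's code release.

import numpy as np

def lss_descriptor(img, qy, qx, patch_radius=2, region_radius=20,
                   n_angles=9, n_radii=2, var_noise=25.0):
    # Assumes (qy, qx) lies at least region_radius + patch_radius
    # pixels away from every image border.
    p = patch_radius
    center = img[qy - p:qy + p + 1, qx - p:qx + p + 1]

    # Correlation surface: SSD between the central patch and every
    # surrounding patch in the local region, mapped through Eq. (1).
    size = 2 * region_radius + 1
    ssd = np.zeros((size, size))
    for dy in range(-region_radius, region_radius + 1):
        for dx in range(-region_radius, region_radius + 1):
            patch = img[qy + dy - p:qy + dy + p + 1,
                        qx + dx - p:qx + dx + p + 1]
            ssd[dy + region_radius, dx + region_radius] = np.sum((patch - center) ** 2)

    # var_auto(q): maximal SSD among the patches immediately adjacent
    # to q, used here as a proxy for acceptable photometric variation.
    var_auto = max(ssd[region_radius + dy, region_radius + dx]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    surface = np.exp(-ssd / max(var_noise, var_auto))

    # Log-polar binning: keep the maximal correlation value per bin.
    desc = np.zeros(n_angles * n_radii)
    for dy in range(-region_radius, region_radius + 1):
        for dx in range(-region_radius, region_radius + 1):
            r = np.hypot(dy, dx)
            if r == 0 or r > region_radius:
                continue
            a = int((np.arctan2(dy, dx) + np.pi) / (2 * np.pi) * n_angles) % n_angles
            ri = min(int(np.log1p(r) / np.log1p(region_radius) * n_radii), n_radii - 1)
            b = a * n_radii + ri
            desc[b] = max(desc[b], surface[dy + region_radius, dx + region_radius])

    # Linearly stretch to [0, 1] for invariance to local intensity changes.
    rng = desc.max() - desc.min()
    return (desc - desc.min()) / rng if rng > 0 else desc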

LSS can capture the shape properties of local image regions, but it cannot provide enough information for optical-to-SAR image matching because the local shape properties of such images are different (Fig. 1). In contrast, the shape properties of larger image regions are quite similar (Fig. 1). Accordingly, we develop a dense shape descriptor (named DLSS) to capture these properties by integrating the LSS descriptors of multiple local image regions over a dense sampling grid. Fig. 3 illustrates the main processing chain for extracting the DLSS descriptor; each step is as follows, and a code sketch is given after the list.

1) The first step is to select a template window in an image. All pixels within the template window are utilized to build the DLSS descriptor.

2) The second step is to divide the template window into spatial regions called "cells." Each cell contains n × n pixels and overlaps the neighboring cell by half a cell width. This process forms the fundamental framework of the DLSS descriptor.

3) The third step is to extract the LSS descriptor of each cell; each descriptor is normalized by its L2 norm to achieve better invariance to intensity changes.

4) Finally, DLSS is built by collecting the LSS descriptors from all cells within the dense overlapping grid covering the template window into a combined feature vector.
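A minimal sketch of the four steps above, reusing the lss_descriptor helper from the previous sketch; the function name dlss_descriptor and the assumption that the image is padded enough for the cells at the window border are illustrative.

import numpy as np

def dlss_descriptor(img, top, left, template_size=68, cell_size=7):
    # Step 1: the template window is (top, left) .. (top+template_size).
    # Step 2: cells of cell_size x cell_size pixels, half-cell overlap.
    step = cell_size // 2
    cells = []
    for cy in range(top, top + template_size - cell_size + 1, step):
        for cx in range(left, left + template_size - cell_size + 1, step):
            # Step 3: LSS descriptor at the cell center, L2-normalized.
            d = lss_descriptor(img, cy + cell_size // 2, cx + cell_size // 2)
            n = np.linalg.norm(d)
            cells.append(d / n if n > 0 else d)
    # Step 4: concatenate all cell descriptors into one feature vector.
    return np.concatenate(cells)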

B. Similarity Metric Based on Dense Shape Descriptor

As mentioned above, DLSS is a feature descriptor that captures the shape features of large image regions. Accordingly, this descriptor can be used to match two images with significant intensity differences as long as they have similar geometric shape layouts.


Fig. 3. Main processing chain for extracting the DLSS descriptor.

Fig. 4. DLSS descriptors of a pair of synthetic images in the corner and edge regions.

Fig. 4 shows the DLSS descriptors computed from a pair of synthetic images with nonlinear intensity differences and noise. One can clearly observe that the two DLSS descriptors are quite similar despite the different intensity patterns of the two images.

Considering the shape similarity of large regions between optical and SAR images, the NCC of the DLSS descriptors is used as the similarity metric (named DLSC) for image matching.
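A minimal sketch of the DLSC metric: the normalized cross correlation of two DLSS descriptor vectors. The function name dlsc is ours.

import numpy as np

def dlsc(desc_a, desc_b):
    # NCC of two descriptor vectors: zero-mean, then normalized dot product.
    a = desc_a - desc_a.mean()
    b = desc_b - desc_b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0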

To illustrate its advantage in matching multimodal images, DLSC is compared with NCC, MTM, and MI using similarity curves. A pair of high-resolution visible and SAR images is selected for the test; significant nonlinear intensity differences can be clearly observed between the two images. A template window with a size of 68 × 68 pixels is first selected from the visible image. Then, NCC, MTM, MI, and DLSC are each computed for horizontal translations (−10 to 10 pixels) within a search region of the SAR image.

Fig. 5 shows the similarity curves of NCC, MTM, MI, and DLSC. NCC fails to detect the CP, and MTM and MI also show location errors caused by the significant intensity differences. By comparison, DLSC not only detects the CP correctly but also presents a more distinguishable curve peak.

Fig. 5. Comparison of similarity curves of NCC, MTM, MI, and DLSC. (a) Optical image. (b) SAR image. (c) NCC similarity curve. (d) MTM similarity curve. (e) MI similarity curve. (f) DLSC similarity curve.

This example preliminarily illustrates that DLSC is more robust than the other similarity metrics to complex intensity changes. A detailed analysis of the performance of DLSC is given in Section III.

III. EXPERIMENTS

To evaluate the matching performance of DLSC, it is compared with three state-of-the-art similarity metrics: NCC, MTM, and MI. The test data sets, implementation details, and experimental analysis are as follows.


Fig. 6. Optical and SAR images used in the experiments. (a) Test 1. (b) Test 2. (c) Test 3. (d) Test 4.

A. Data Sets

Four pairs of optical and SAR images are selected as the test data and divided into two categories: high-resolution images and medium-resolution images. Fig. 6 shows the image pairs of the test sets. All of these image pairs have been systematically rectified by direct georeferencing and resampled to the same ground sample distance (GSD). Thus, there are almost no obvious rotation or scale differences, and only offsets of a few pixels exist between the reference and sensed images. However, different intensity characteristics can be clearly observed between these images due to the differences in their imaging mechanisms (Fig. 6). Table I describes the test sets; other characteristics of each set are described below.

High-resolution data sets: Test 1 and test 2 are two pairs of high-resolution optical and SAR images. Since they are located in urban areas, these images have rich shape and contour features. In addition, since there is a temporal difference of 14 months between the images of test 2, some ground objects changed during this period. These differences make it quite difficult to match the two images.

Medium-resolution data sets: Test 3 and test 4 are two pairs of medium-resolution optical and SAR images located in suburban areas. One can observe from Fig. 6 that the images of test 3 have more significant shape and structure features than those of test 4.

B. Implementation Details

In the experiments, the block-based Harris operator [14] is used to detect 200 evenly distributed interest points in the reference image; NCC, MTM, MI, and DLSC are then used to detect CPs within a search region of fixed size (20 × 20 pixels) in the sensed image using a template matching strategy. In the matching process, template windows of different sizes (from 36 × 36 to 124 × 124 pixels) are used to detect CPs in order to analyze the sensitivity of these similarity metrics to changes in template size.
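A minimal sketch of the template matching strategy described above for a single interest point, reusing the dlss_descriptor and dlsc sketches: scan a small search region of the sensed (SAR) image and keep the offset that maximizes DLSC. The function name match_point and the argument defaults are illustrative; a 20 × 20 search region corresponds to search=10.

import numpy as np

def match_point(opt_img, sar_img, py, px, template=68, search=10):
    # Descriptor of the template window centered on the interest point
    # in the reference (optical) image.
    half = template // 2
    ref = dlss_descriptor(opt_img, py - half, px - half, template_size=template)

    # Exhaustive scan of the search region in the sensed (SAR) image.
    best, best_score = (0, 0), -np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = dlss_descriptor(sar_img, py + dy - half, px + dx - half,
                                   template_size=template)
            score = dlsc(ref, cand)
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best, best_score   # CP offset and its DLSC value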

The correct matching ratio (CMR) is chosen as the evaluation criterion, where CMR = CM/C, CM is the number of correctly matched point pairs in the matching results, and C is the total number of matched point pairs. CM is determined by the following strategy.

TABLE I

DESCRIPTIONS OF TEST IMAGES

For each image pair, 40–60 evenly distributed points were manually selected as check points and used to calculate the residual error of each matched point pair under a projective transformation model. A point pair with a residual error of less than 1.5 pixels is regarded as a correct match. Moreover, the root-mean-square error (RMSE) [15] of the correctly matched point pairs is used to evaluate the matching accuracy.
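A minimal sketch of these evaluation criteria, assuming a 3 × 3 projective (homography) matrix H fitted to the manually selected check points; the function name cmr_and_rmse is ours, and the 1.5-pixel threshold follows the text.

import numpy as np

def cmr_and_rmse(matches, H, thresh=1.5):
    # matches: list of ((x, y) in the reference, (xs, ys) in the sensed image).
    residuals = []
    for (x, y), (xs, ys) in matches:
        px, py, pw = H @ np.array([x, y, 1.0])   # project the reference point
        residuals.append(np.hypot(px / pw - xs, py / pw - ys))
    residuals = np.asarray(residuals)
    correct = residuals < thresh                  # residual error < 1.5 pixels
    cmr = correct.mean()                          # CMR = CM / C
    rmse = np.sqrt(np.mean(residuals[correct] ** 2)) if correct.any() else np.nan
    return cmr, rmse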

There are three key parameters in DLSC: the cell size n × n pixels, the number of angles β, and the number of radial intervals α. Their influence was tested on ten pairs of optical and SAR images. The results showed that DLSC with a cell size of 7 × 7 pixels, 9 angles, and 2 radial intervals achieved the optimal CMR; these parameter settings are therefore used in the following experiments.

C. Experimental Analysis

Fig. 7 shows the CMR values of all the experiments. DLSC performs best, followed by MI and MTM. NCC achieves the lowest CMR values in most tests because it is quite sensitive to nonlinear intensity differences. MTM applies a piecewise constant function to fit the intensity relationship between images in order to handle intensity differences; however, the intensity relationship between optical and SAR images is too complicated to be fit exactly by such a function. Accordingly, MTM also cannot effectively detect CPs between optical and SAR images. In addition, it can be seen that DLSC's performance is less influenced by template size than that of MI, which is sensitive to template size changes.

The CMR values of MI are usually low when the template size is small (less than 60 × 60 pixels). This is because MI needs to calculate the joint entropy between images, which is very sensitive to the sample size (namely, the template size). By comparison, DLSC presents more stable performance as the template size changes and can achieve a substantial CMR value even with a small template size.

On the other hand, the four similarity metrics present different matching performance on the different test sets due to differences in image characteristics. For test 1 and test 2, consisting of high-resolution images covering urban areas, DLSC achieves much higher CMRs than the other similarity metrics [Fig. 7(a) and (b)]. This is because similar shape and contour features, such as buildings and roads, exist between the images. Therefore, DLSC, which captures the shape similarity between images, has an obvious advantage over the others.


Fig. 7. CMR values of NCC, MTM, MI, and DLSC with increasing template sizes. (a) Test 1. (b) Test 2. (c) Test 3. (d) Test 4.

Fig. 8. RMSEs of the four similarity metrics.

For test 3 and test 4, consisting of medium-resolution images covering suburban areas, DLSC's superiority over the other similarity metrics declines slightly in comparison with the first two tests. This may be attributed to the fact that the images of these two tests do not contain shape and structure information as rich as that of test 1 and test 2. Moreover, for test 4, whose images have quite poor shape and contour features, the CMR value of DLSC presents an obvious drop compared with the first three test sets. This indicates that the performance of DLSC depends on the shape properties of images and can decline when images contain few shape or contour features. Even so, DLSC still performs better than the other similarity metrics.

Fig. 8 shows the RMSEs of the correctly matched point pairs detected by the four similarity metrics with a template window of 100 × 100 pixels. One can clearly observe that DLSC achieves the highest matching accuracy. The above experiments demonstrate that DLSC is a robust similarity metric for optical-to-SAR image matching.

IV. CONCLUSION

In this letter, a novel similarity metric based on the shape properties of images is proposed for optical-to-SAR image matching to address the complex nonlinear intensity differences between such images. We first build the DLSS descriptor on the basis of LSS and then utilize the NCC of such descriptors to define a similarity metric (named DLSC) that represents the shape similarity between images. A template matching strategy is employed to detect CPs between images. DLSC has been evaluated using four pairs of optical and SAR images and compared with the traditional alternatives NCC, MTM, and MI. The experimental results demonstrate that DLSC is robust to nonlinear intensity differences between optical and SAR images and outperforms the other similarity metrics. However, it should be noted that the performance of DLSC may decline if images contain little shape or contour information, because it depends on the shape features of images. This issue will be addressed in future work.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their helpful comments and suggestions, and Chatfield et al. [16] for the public local self-similarity code.

REFERENCES

[1] M. Hao, W. Shi, H. Zhang, and C. Li, "Unsupervised change detection with expectation-maximization-based level set," IEEE Geosci. Remote Sens. Lett., vol. 11, no. 1, pp. 210–214, Jan. 2014.

[2] B. Zitová and J. Flusser, "Image registration methods: A survey," Image Vis. Comput., vol. 21, pp. 977–1000, Oct. 2003.

[3] J. Ma, J. C. W. Chan, and F. Canters, "Fully automatic subpixel image registration of multiangle CHRIS/Proba data," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 7, pp. 2829–2839, Jul. 2010.

[4] H. Sui, C. Xu, J. Liu, and F. Hua, "Automatic optical-to-SAR image registration by iterative line extraction and Voronoi integrated spectral point matching," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 11, pp. 6058–6072, Nov. 2015.

[5] H. Li, B. S. Manjunath, and S. K. Mitra, "A contour-based approach to multisensor image registration," IEEE Trans. Image Process., vol. 4, no. 3, pp. 320–334, Mar. 1995.

[6] B. Fan, C. Huo, C. Pan, and Q. Kong, "Registration of optical and SAR satellite images by exploring the spatial relationship of the improved SIFT," IEEE Geosci. Remote Sens. Lett., vol. 10, no. 4, pp. 657–661, Jul. 2013.

[7] L. Huang and Z. Li, "Feature-based image registration using the shape context," Int. J. Remote Sens., vol. 31, no. 8, pp. 2169–2177, 2010.

[8] S. Suri and P. Reinartz, "Mutual-information-based registration of TerraSAR-X and Ikonos imagery in urban areas," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 2, pp. 939–949, Feb. 2010.

[9] H. Gonçalves, J. A. Gonçalves, L. Corte-Real, and A. C. Teodoro, "CHAIR: Automatic image registration based on correlation and Hough transform," Int. J. Remote Sens., vol. 33, no. 24, pp. 7936–7968, 2012.

[10] Y. Hel-Or, H. Hel-Or, and E. David, "Matching by tone mapping: Photometric invariant template matching," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 2, pp. 317–330, Feb. 2014.

[11] Y. Ye and L. Shen, "HOPC: A novel similarity metric based on geometric structural properties for multi-modal remote sensing image matching," in Proc. ISPRS Ann. Photogramm., Remote Sens. Spatial Inf. Sci., Jun. 2016, pp. 9–16.

[12] Y. Ye, J. Shan, L. Bruzzone, and L. Shen, "Robust registration of multimodal remote sensing images based on structural similarity," IEEE Trans. Geosci. Remote Sens., to be published.

[13] E. Shechtman and M. Irani, "Matching local self-similarities across images and videos," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2007, pp. 1–8.

[14] Y. Ye and J. Shan, "A local descriptor based registration method for multispectral remote sensing images with non-linear intensity differences," ISPRS J. Photogramm. Remote Sens., vol. 90, pp. 83–95, Apr. 2014.

[15] Y. Ye, L. Shen, J. Wang, Z. Li, and Z. Xu, "Automatic matching of optical and SAR imagery through shape property," in Proc. IGARSS, Jul. 2015, pp. 1072–1075.

[16] K. Chatfield, V. Gulshan, J. Philbin, and A. Zisserman. Self-Similarity Descriptor Code, accessed Dec. 2016. [Online]. Available: https://www.robots.ox.ac.uk/~vgg/software/SelfSimilarity/