

Contextual and non-contextual performance evaluation of edge detectors

T.B. Nguyen, D. Ziou *

Dept. de Math. & d'Informatique, Faculté des Sciences, Université de Sherbrooke, Sherbrooke, Que., Canada J1K 2R1

Received 23 April 1999; received in revised form 29 May 2000

Abstract

This paper presents two new evaluation methods for edge detectors. The first is non-contextual and concerns the evaluation of edge detector performance in terms of detection errors. The second, contextual, method evaluates the performance of edge detectors in the context of image reconstruction. Both methods study the influence of image characteristics and edge detector properties on detector performance. Five detectors are evaluated and their performance is compared. © 2000 Published by Elsevier Science B.V. All rights reserved.

Keywords: Edge detection; Performance evaluation; Detector properties; Image characteristics

1. Introduction

Several edge detectors have been proposed, with different goals and mathematical and algorithmic properties (Ziou and Tabbone, 1998). Consequently, one problem encountered by vision systems developers is the selection of an edge detector to be used in a given application. This selection is primarily based on the definition of the influence of image characteristics and the properties of the detectors on their performance, a process we call edge detector performance evaluation (Ziou and Koukam, 1998). Several performance evaluation methods have already been proposed. Certain authors (Heath et al., 1997; Cho et al., 1997; Bowyer and Phillips, 1998; Dougherty et al., 1998) group the existing methods according to the presence or absence of ground truth. Such grouping is based on the complexity of the images (e.g., real images, synthetic images) and the performance criteria used. For example, without ground truth, it is difficult to measure the displacement of an edge from its true location. Existing methods that rely on ground truth use either synthetic images or simple real images for which it is easy to specify the ground truth. This grouping of existing evaluation methods takes into account neither the subsequent use of edges nor the intervention of humans (i.e., subjectivity vs. objectivity) during the evaluation process. Thus, we propose to group the existing work in two classes according to whether it considers the subsequent use of edges in a given application (contextual and non-contextual) and the type of performance criteria used (subjective, objective; with or without ground truth). Both contextual and non-contextual evaluation methods can be either objective or subjective.

www.elsevier.nl/locate/patrec

Pattern Recognition Letters 21 (2000) 805–816

* Corresponding author. Tel.: +1-819-821-3031; fax: +1-819-821-8200.

E-mail address: [email protected] (D. Ziou).

0167-8655/00/$ - see front matter © 2000 Published by Elsevier Science B.V. All rights reserved.

PII: S0167-8655(00)00045-3


The contextual method involves evaluating edge detectors taking into account the requirements of a particular application. It has been used by a few authors in the areas of object recognition (Heath et al., 1997; Sanocki et al., 1998) and motion (Shin et al., 1998). Non-contextual evaluation is carried out independently of any application.

Subjective methods (Nair et al., 1995; Heath et al., 1997), borrowed from the field of psychology, use human judgment to evaluate the performance of edge detectors. More precisely, these methods involve presenting a series of edge images to several individuals and asking them to assign scores on a given scale (Nair et al., 1995). Even if these methods seem easy to put into practice, they have some drawbacks. The number of characteristics a human eye can distinguish is limited. For example, the eye cannot differentiate between two gray levels that are slightly different. As well, the judgment depends on the individual's experience and attachment to the method, as well as on the image type (e.g., multi-spectral, X-ray).

The basic idea behind the objective methods (Abdou, 1978; Kitchen and Rosenfeld, 1981; Pratt, 1991; Kitchen and Venkatesh, 1992; Kanungo et al., 1995) is to measure the performance of the detector according to predefined criteria. This can be accomplished empirically or theoretically. In the presence of a ground truth, the criteria concern the difference between detected edges and the ground truth, measured by errors of omission, localization, multiple responses, sensitivity, orientation, continuity, and thinness. If there is no ground truth, then the methods evaluate the likelihood of a detected edge being a true edge and the scattering of its orientation and location. Usually the criteria concern the continuity, thinness, and variance of location and orientation of edges.

This paper presents two new performance evaluation methods for edge detectors, contextual and non-contextual. There is a significant difference between our methods and earlier ones concerning the way image features, detector properties, and the parameters used in the evaluation method are taken into account. The non-contextual method involves evaluating the performance of edge detectors in terms of detection errors. Detection errors include classical errors (of omission, localization, multiple responses, sensitivity, and orientation) as well as a new error related to false-edge suppression. The basic idea behind this method consists of running a given detector several times on an image with a known structure, varying the parameters of the detector and the image, and then measuring its performance. The drawback of this approach is that it does not completely characterize the performance of edge detectors; it seems important to take into account the subsequent use of edges, that is, to know whether they satisfy the requirements of a particular application. For this reason, we propose a contextual performance evaluation method. This method involves evaluating the performance of edge detectors in the context of image reconstruction. It consists of measuring the performance of an edge detector using the mean square difference between the reconstructed image and the original one. Both methods study the influence of image characteristics and detector properties on detector performance. This paper is divided into six sections. Section 2 describes the non-contextual method of performance evaluation. Section 3 describes the experimental results yielded by this evaluation method. Sections 4 and 5 are devoted to the contextual performance evaluation method and the experimental results obtained, and Section 6 summarizes the main results.

2. Non-contextual performance evaluation method

The non-contextual method consists of running an edge detector several times on a synthetic image, varying the image characteristics and the detector properties. We then determine the influence of these parameters on the performance of the edge detector. Detector performance is determined by comparing the obtained edges with the ideal edges, which are assumed to be known. For this purpose, a given edge pixel is assigned to one of the following four classes: ideal, unambiguous, ambiguous, or false. False edge pixels do not belong to the support region of the ideal edge (i.e., the pixels in the vicinity of the ideal edge). Fig. 1(a) presents an example of an ideal edge, identified in black. The pixels identified in gray belong to the support


region of the ideal edge. Among edge pixels detected within the support region, there are ambiguous edge pixels corresponding to multiple responses. All edge pixels detected within the support region which are not ambiguous are called unambiguous edge pixels. Among the multiple responses of a detector to an ideal edge, the detected edge closest to the ideal edge belongs to the unambiguous edge. The performance of an edge detector is defined by six types of errors: omission, localization, multiple-response, sensitivity, suppression, and orientation errors. Suppression and orientation errors concern only gradient detectors. A good detector must minimize all of these errors. Definitions of these performance measures are as follows:

· Omission error. This error occurs when the detector fails to find an ideal edge (Fig. 1(b)). The error is measured by dividing the total number of omitted edge pixels by the total number of ideal edge pixels.

· Localization error. This error occurs when the location of the unambiguous edge is different from the location of the ideal edge (Fig. 1(c)). The error is measured by dividing the total distance between unambiguous edge pixels and ideal edge pixels by the total number of unambiguous edge pixels.

· Multiple-response error. This error occurs when multiple edges are detected in the vicinity of an ideal edge (Fig. 1(d)). The error is defined by dividing the total number of ambiguous edge pixels by the total number of unambiguous edge pixels.

· Sensitivity error. This error occurs when the detector localizes edges which do not belong to the support region of the ideal edge (Fig. 1(e)). The error is defined by dividing the total number of false edge pixels by the total number of edge pixels detected.

· Suppression error. Usually false-edge suppression is done by a thresholding operation: the edges that have a gradient modulus below a given threshold are suppressed. However, the gradient modulus of an unambiguous edge may be lower than the gradient modulus of a false edge. Suppression errors occur when unambiguous edges are suppressed while false edges persist. Let us consider the distribution of the gradient modulus of false edges and the distribution of the gradient modulus of unambiguous edges; the suppression error is measured by the overlap between these distributions.

· Orientation error. This error occurs when the estimated orientation of the detected edge is not equal to the given orientation. The error is defined by dividing the sum of the absolute values of the differences between the estimated and given orientations of unambiguous edge pixels by the total number of unambiguous detected edge pixels.

The parameters influencing detector performance that were considered relate to detector properties, image characteristics, and the performance evaluation method. The parameter related to the performance evaluation method is the size of the edge support region. The parameters related to image characteristics concern edge characteristics such as type, sharpness, signal-to-noise ratio, and subpixel location. The edge types we considered are the step, staircase, and pulse (Fig. 2). Fig. 3 presents an example of a synthetic image containing

Fig. 1. Detection errors: (a) ideal edges, black pixels are in the ideal edge and grey pixels belong to the support region of the ideal edge; (b) omission error; (c) localization error; (d) multiple-response error; (e) sensitivity error. Suppression and orientation errors are not easy to depict.


256 × 256 pixels and 256 gray levels used in the evaluation. The image contains five edges, their types being (from left to right) step, ascending staircase, descending staircase, pulse and inverted pulse. The vertical step edge is determined by the following equation:

    I(x, y) = c (1 − (1/2) e^(l (x − Loc_edge)))    if x ≤ Loc_edge,
    I(x, y) = (c/2) e^(−l (x − Loc_edge))           if x > Loc_edge,

where c is the contrast, l the sharpness and Loc_edge the location of the edge. This location can be real-valued (subpixel). The staircase and pulse edges are formed by the combination of two steps, I(x, y) + a I(x − Δ, y), where a < 0 gives a pulse and a > 0 a staircase. To this image, we added white noise of a given standard deviation.
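The step-edge model above can be sketched numerically. This is a minimal 1D sketch; the sign convention inside the exponentials is our assumption (the printed equation is ambiguous in the scan), chosen so that the profile is bounded, continuous at the edge location, and equal to c/2 there.

```python
import math

def step_profile(x, c=100.0, l=2.0, loc=10.0):
    # Blurred step edge: contrast c, sharpness l, location loc.
    # Sign convention is our assumption; it keeps the profile bounded
    # (c on the left plateau, 0 on the right) and continuous at loc.
    if x <= loc:
        return c * (1.0 - 0.5 * math.exp(l * (x - loc)))
    return (c / 2.0) * math.exp(-l * (x - loc))

def combined_profile(x, a, d):
    # Combination of two steps: a < 0 yields a pulse, a > 0 a staircase.
    return step_profile(x) + a * step_profile(x - d)

# Continuity at the edge location and the asymptotic plateaus:
mid = step_profile(10.0)        # exactly c/2
far_left = step_profile(0.0)    # close to c
far_right = step_profile(20.0)  # close to 0
```

With a = 1 the two plateaus add up to a staircase; with a = −1 the shifted step carves a pulse out of the profile.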

The edge detectors used are the gradient of Gaussian (DGG) (Canny, 1983), gradient of Deriche (DGD) (Deriche, 1987), gradient of Shen (DGS) (Shen and Castan, 1992), Laplacian of Gaussian (DLG) (Marr and Hildreth, 1980), and Laplacian of Deriche (DLD). The parameters of these detectors are the scale, the order of the differentiation operator, and the smoothing filter. The scale for all of these detectors is controlled by one parameter. In the cases of DGG and DLG, the scale corresponds to the standard deviation of the Gaussian used as the smoothing filter. For DGD, DLD, and DGS the scale is equal to 2 divided by the filter parameter (Lacroix, 1990). Thus, it is easy to run all of these detectors at a similar scale.
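The scale relation described above can be captured in a small helper. The name `exponential_alpha` for the Deriche/Shen filter parameter is our convention, not the paper's notation:

```python
def matched_parameters(scale):
    # Map a common scale to per-detector filter parameters.  For DGG/DLG
    # the scale is the Gaussian standard deviation; for DGD/DLD/DGS the
    # text states scale = 2 / filter parameter, so the exponential
    # filter's parameter is 2 / scale.
    return {"gaussian_sigma": float(scale), "exponential_alpha": 2.0 / scale}

params = matched_parameters(2.0)  # sigma = 2.0, alpha = 1.0
```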

All of these detectors have analytical definitions and they obey the convolution theorem, because they can be written as the differentiation of the convolution of the image with the filter. Conceptually, they include three operations: smoothing, differentiation, and false-edge suppression. The importance of the last operation depends on the properties of the two others and on the image characteristics. For gradient detectors, the previously defined suppression criterion takes the false-edge suppression step into account. In other words, false edges can be cleaned easily for detectors having a low value for this criterion. The false-edge suppression step is omitted for Laplacian detectors. Because the five detectors share either a differentiation operator (gradient or Laplacian) or a filter (Gaussian or exponential), it is possible to characterize the effect of each operation on the performance measures independently of the other. This makes it possible to build an edge detector that fulfills given requirements by selecting the differentiation operator and the filter. For example, the difference in performance between the Laplacian of Gaussian and the gradient of Gaussian is due to the differentiation operator. Similarly, the difference in performance between the gradient of Deriche and the gradient

Fig. 2. Profiles of (a) step; (b) staircase; (c) pulse.

Fig. 3. Synthetic image.


of Shen is due to the filter. To reduce the effect of the implementation method, all detectors have been implemented using convolution masks.
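As an illustration of the convolution-mask implementation, here is a minimal 1D sketch of a gradient-of-Gaussian (DGG-style) detector: a sampled first-derivative-of-Gaussian mask applied to a signal, with the gradient magnitude peaking at the step transition. The 3σ mask radius and the border replication are our choices, not the paper's.

```python
import math

def dgg_mask(sigma, radius=None):
    # Sampled first derivative of a Gaussian, g'(x) = -x/sigma^2 * exp(-x^2 / 2 sigma^2).
    if radius is None:
        radius = int(math.ceil(3 * sigma))
    return [-x / (sigma ** 2) * math.exp(-x * x / (2 * sigma ** 2))
            for x in range(-radius, radius + 1)]

def convolve(signal, mask):
    # Sliding correlation with replicated borders; the mask is
    # antisymmetric, so orientation only flips the sign of the response.
    r = len(mask) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, m in enumerate(mask):
            j = min(max(i + k - r, 0), len(signal) - 1)
            acc += m * signal[j]
        out.append(acc)
    return out

# A clean step: the gradient magnitude should peak at the transition.
signal = [0.0] * 16 + [100.0] * 16
response = convolve(signal, dgg_mask(1.5))
peak = max(range(len(response)), key=lambda i: abs(response[i]))
```

On the flat regions far from the step the antisymmetric mask responds with (nearly) zero, which is what makes thresholding the modulus a plausible false-edge suppression step.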

The evaluation method can be summarized as follows. Recall that in order to obtain performance measures for the edge detectors, we ran them, varying the parameters mentioned above. Parameters that take their values in continuous intervals were sampled. We considered large intervals for the parameters and a small step for the sampling, i.e., subpixel ∈ [0.0, 0.5], sharpness ∈ [1, 10], signal-to-noise ratio ∈ [2, 5], scale ∈ [1, 2.5], and support region size ∈ [3, 11].

Each performance measure is a function of eight discrete variables: differentiation operator, filter, scale, edge type, sharpness, signal-to-noise ratio, subpixel, and size of the support region. There is no efficient way to analyze this function. Thus, to carry out the performance analysis, we chose to reduce the number of variables by using two comparison techniques. First, detector performance measures were compared by computing the correlation between them. Second, the mean performance measures were computed by varying the parameters over the entire intervals given above, and the detectors were ranked. The two techniques gave similar results. The quantity of data generated by the non-contextual method is overwhelming; only the part of the data related to the mean performance measures is given in this paper. The reader will find the complete results in (Nguyen, 1998).
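Under a simplified 1D setting, the count-based detection errors of Section 2 can be sketched as follows. The function name, the tie-breaking rule for unambiguous pixels, and the handling of empty denominators are our assumptions; the paper works on 2D edge maps.

```python
def detection_errors(ideal, detected, support_radius=1):
    # ideal / detected are lists of 1D edge positions (integers).
    # The support region of an ideal edge is the set of pixels within
    # support_radius of it.
    support = {p for i in ideal
               for p in range(i - support_radius, i + support_radius + 1)}
    in_support = [d for d in detected if d in support]
    false_edges = [d for d in detected if d not in support]
    # For each ideal edge, the closest in-support detection is unambiguous;
    # remaining in-support detections are ambiguous (multiple responses).
    unambiguous = set()
    for i in ideal:
        near = [d for d in in_support if abs(d - i) <= support_radius]
        if near:
            unambiguous.add(min(near, key=lambda d: abs(d - i)))
    ambiguous = [d for d in in_support if d not in unambiguous]
    omitted = [i for i in ideal
               if not any(abs(d - i) <= support_radius for d in in_support)]
    return {
        "omission": len(omitted) / len(ideal),
        "sensitivity": len(false_edges) / len(detected) if detected else 0.0,
        "multiple_response": len(ambiguous) / max(len(unambiguous), 1),
        "localization": sum(min(abs(d - i) for i in ideal)
                            for d in unambiguous) / max(len(unambiguous), 1),
    }

errors = detection_errors(ideal=[10, 20], detected=[10, 11, 21, 30])
```

In this example, detection 30 is false (sensitivity 1/4), detection 11 is a multiple response next to the unambiguous detection 10, and the edge at 20 is found one pixel off (localization 0.5).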

3. Experimental results

In this section, we present the general observations derived from the results for the non-contextual method. We will start by describing the mean performance measures, from which the effects of the filters and differentiation operators used are deduced. Then, we will present the effect of the other parameters considered on performance. The mean performance measures are computed separately for each edge type by varying the other parameters. Table 1 presents the experimental results obtained in the case of a step edge. The first number indicates the mean error. To facilitate comparison of detectors, we have normalized the errors by dividing each of them by the highest. The normalized errors are presented in brackets. For example, in Table 1, for DGS, the omission error is 0.05, the normalized omission error 0.56, the suppression error 0.26, and the normalized suppression error 0.93. Performances for staircase and pulse edges are given in Tables 2 and 3.

By analyzing the results obtained, we conclude that:

· The ranking of detectors is the same for multiple-response, sensitivity, suppression and orientation errors. The sensitivity error of all detectors is comparable. The DGG has the lowest multiple-response and sensitivity errors while the DLD has the highest. A detector with low multiple-response and sensitivity errors has a high omission error and vice versa.

· Performance is influenced by the differentiation operator. Laplacian detectors have lower omission errors than their corresponding gradient detectors. However, the latter have lower multiple-response and sensitivity errors than the corresponding Laplacian detectors. This explains why Laplacian detectors are not suitable for noisy or textured images. A Laplacian detector is more suitable for the localization of staircase and pulse edges, whereas a gradient detector is more suitable for the localization of step edges.

Table 1. Mean value of performance measures in the case of a step edge

Omission          Localization      Multiple-response   Sensitivity       Suppression       Orientation
DLD 0.02 (0.22)   DGS 0.70 (0.87)   DGG 1.21 (0.52)     DGG 0.93 (0.97)   DGG 0.18 (0.64)   DGG 31.22 (0.86)
DLG 0.04 (0.44)   DGD 0.73 (0.92)   DGD 1.58 (0.68)     DGD 0.94 (0.98)   DGD 0.21 (0.75)   DGD 33.42 (0.92)
DGS 0.05 (0.56)   DGG 0.77 (0.97)   DGS 1.65 (0.71)     DGS 0.94 (0.98)   DGS 0.26 (0.93)   DGS 35.12 (0.97)
DGD 0.06 (0.67)   DLG 0.78 (0.99)   DLG 1.75 (0.76)     DLG 0.95 (0.99)
DGG 0.09 (1.00)   DLD 0.79 (1.00)   DLD 2.31 (1.00)     DLD 0.96 (1.00)


· Performance is influenced by the filter. The filter that has lower suppression and orientation errors has a higher omission error and lower multiple-response and sensitivity errors. The filter of Shen has the lowest omission error for all the edges, followed by the filters of Deriche and Gauss. For the localization error, the ranking varies according to the scale and the edge type.

We will now deal with the influence of the parameters considered on the performance of a detector. As mentioned above, the quantity of data generated by the non-contextual method is overwhelming. In order to analyze the variations of performance, we decided to define a "language" to describe the behavior of the detectors. Fig. 4 shows the increasing curves (see Nguyen, 1998 for the decreasing curves). Fig. 4(a)–(d) present curves that increase linearly. Fig. 4(e) presents a curve that increases exponentially and Fig. 4(f) presents a curve that increases logarithmically. Borders 1 and 2 show the interval of variation of the error.

To complete our language, we needed to add three curves. The first represents quasi-linear measures (Fig. 5(a)) and the second, oscillating measures (Fig. 5(b)). Finally, it is possible that a measure may oscillate between the two borders, that is, that it is neither increasing nor decreasing (Fig. 5(c)).

Fig. 6 presents an example of detector performance as a function of the signal-to-noise ratio in the case of a step. The results below concern all types of edges, since the behavior of the detectors is the same for step, staircase and pulse edges:

· When the subpixel increases, the omission error oscillates for gradient detectors and increases linearly with oscillations for Laplacian detectors. The localization error increases linearly

Fig. 4. Increasing curves: (a), (b), (c) and (d) are linear; (e) is exponential; (f) is logarithmic.

Table 2. Mean value of performance measures in the case of a staircase edge

Omission          Localization      Multiple-response   Sensitivity       Suppression       Orientation
DLD 0.02 (0.18)   DLD 0.25 (0.36)   DGG 0.74 (0.43)     DGG 0.86 (0.96)   DGG 0.16 (0.57)   DGG 24.27 (0.77)
DLG 0.06 (0.55)   DLG 0.49 (0.70)   DGD 0.97 (0.57)     DGD 0.88 (0.98)   DGD 0.19 (0.68)   DGD 26.85 (0.85)
DGS 0.07 (0.64)   DGS 0.67 (0.96)   DGS 1.08 (0.63)     DGS 0.88 (0.98)   DGS 0.26 (0.93)   DGS 30.37 (0.96)
DGD 0.08 (0.73)   DGG 0.69 (0.99)   DLG 1.38 (0.81)     DLG 0.89 (0.99)
DGG 0.11 (1.00)   DGD 0.70 (1.00)   DLD 1.71 (1.00)     DLD 0.90 (1.00)

Table 3. Mean values of performance measures in the case of a pulse edge

Omission          Localization      Multiple-response   Sensitivity       Suppression       Orientation
DLD 0.02 (0.18)   DLD 0.28 (0.41)   DGG 0.74 (0.47)     DGG 0.86 (0.95)   DGG 0.16 (0.57)   DGG 24.27 (0.77)
DLG 0.03 (0.27)   DLG 0.50 (0.72)   DGD 0.98 (0.62)     DGD 0.88 (0.97)   DGD 0.20 (0.71)   DGD 27.17 (0.86)
DGS 0.07 (0.64)   DGS 0.67 (0.97)   DGS 1.08 (0.68)     DGS 0.88 (0.97)   DGS 0.26 (0.93)   DGS 30.43 (0.96)
DGD 0.08 (0.73)   DGG 0.68 (0.99)   DLG 1.17 (0.74)     DLG 0.89 (0.98)
DGG 0.11 (1.00)   DGD 0.69 (1.00)   DLD 1.59 (1.00)     DLD 0.91 (1.00)


with oscillations for all detectors. The multiple-response error oscillates for gradient detectors and decreases linearly for Laplacian detectors. The sensitivity error oscillates for all detectors. The suppression and orientation errors oscillate for all gradient detectors.

· When the sharpness increases, the omission error decreases exponentially for all detectors. The localization and multiple-response errors decrease exponentially for gradient detectors and decrease linearly with oscillations for Laplacian detectors. The sensitivity error decreases linearly with oscillations for all detectors. The suppression and orientation errors decrease exponentially for all gradient detectors.

· When the signal-to-noise ratio increases (Fig. 6), the omission and multiple-response errors decrease quasi-linearly for all detectors. The localization error decreases quasi-linearly for gradient detectors and decreases linearly with oscillations for Laplacian detectors. The sensitivity error decreases linearly with oscillations for all detectors. The suppression error decreases exponentially for DGG, DGD and DGS. The orientation error decreases quasi-linearly for all gradient detectors.

· When the size of the support region increases, the omission error decreases exponentially and the localization error increases logarithmically for all detectors. The multiple-response error increases linearly for DLG and DLD, increases quasi-linearly for DGD and DGS, and increases exponentially for DGG. The sensitivity error decreases linearly for all detectors.

Fig. 6. Performance of detectors as a function of the signal-to-noise ratio in the case of a step edge. Results obtained are rounded to two decimals; this explains why some borders are equal (e.g., in the sensitivity column).

Fig. 5. Other curves.


The suppression error increases quasi-linearly for all detectors, except for DGG, where it increases exponentially. The orientation error increases logarithmically for all detectors. We conclude that the non-contextual evaluation method is sensitive to the size of the support region.

· When the scale increases, the omission error increases logarithmically for gradient detectors and increases linearly for Laplacian detectors. The localization error increases exponentially for gradient detectors and increases logarithmically for Laplacian detectors. The multiple-response error decreases linearly for gradient detectors and decreases exponentially for Laplacian detectors. The sensitivity error decreases linearly for all detectors. The suppression error decreases linearly with oscillations and the orientation error increases logarithmically for DGG, DGD and DGS.

4. Contextual performance evaluation method

This method consists of measuring the performance of the detectors in the context of image reconstruction from edges. Carlsson (1988) proposed an algorithm for image compression from coded edges. The image-coding algorithm is based on the principle that important features like edges should be coded and reproduced as exactly as possible, and that no spurious features should be introduced in the image reconstruction process. The reconstructed image is smooth and is obtained as the solution to a heat diffusion equation. The drawback is that the decompressed image is degraded because there is a loss of information during the edge detection process. As we will show, the reconstructed image is influenced by the detector used and the image characteristics. More recently, the Carlsson algorithm has been used to reconstruct images from the representation of edges in scale space (Elder and Zucker, 1998). We are not interested in the image compression process; rather, our primary interest lies in using the reconstructed image to characterize the performance of the edge detector used. The performance evaluation method is applied in two steps. The first consists of obtaining edges by performing edge detection with a given detector. Fig. 7 presents the eight images used in the evaluation. These images have a size of 256 × 256 pixels and contain 256 gray levels. We also considered different types of edges in order to determine the influence of image characteristics on detector performance (Fig. 7(a)–(c)). The second step consists of reconstructing the

Fig. 7. Synthetic images: (a) step; (b) staircase; (c) pulse. Real images: (d) nuts; (e) glasses; (f) Lena; (g) back; (h) Sherbrooke.


original image from the edge image, using the diffusion process.
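A toy 1D analogue of this reconstruction step can illustrate the idea, assuming the usual Carlsson-style scheme of clamping gray levels at edge pixels and relaxing the rest under discrete heat diffusion; the iteration count and initial guess are arbitrary choices of ours.

```python
def reconstruct_1d(values, edge_positions, iterations=500):
    # Gray levels at edge pixels are clamped to their true values; all
    # other pixels repeatedly relax toward the average of their
    # neighbors (discrete heat diffusion with replicated borders).
    n = len(values)
    img = [sum(values) / n] * n  # arbitrary initial guess
    clamped = set(edge_positions)
    for p in clamped:
        img[p] = values[p]
    for _ in range(iterations):
        nxt = img[:]
        for i in range(n):
            if i in clamped:
                continue
            left = img[max(i - 1, 0)]
            right = img[min(i + 1, n - 1)]
            nxt[i] = 0.5 * (left + right)
        img = nxt
    return img

# A step signal reconstructed from the two pixels straddling its edge:
rec = reconstruct_1d([0.0] * 5 + [100.0] * 5, [4, 5])
```

Because the diffusion relaxes toward a harmonic (here piecewise-constant) function between the clamped edge pixels, the step is recovered almost exactly; with fewer or misplaced edge pixels the reconstruction degrades, which is precisely what the contextual measure quantifies.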

The edge detectors considered are DGG, DGD, DGS, DLG and DLD. The interval used for the scale is between 0.95 and 5.0. The performance of the detector is defined by the mean square difference between the reconstructed image I_rec and the original I_ori:

    E_quadratic = sqrt( Σ_x Σ_y (I_rec(x, y) − I_ori(x, y))² ) / n,

where n is the image size. When E_quadratic equals 0, the reconstructed image is identical to the original one. The greater the value of E_quadratic, the more degraded the reconstructed image.
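A direct transcription of this measure might look as follows, assuming that n denotes the number of pixels in the image:

```python
import math

def quadratic_error(reconstructed, original):
    # Contextual performance measure: root of the summed squared
    # differences, divided by the number of pixels n (our reading of
    # "image size"). Images are lists of rows of gray levels.
    n = sum(len(row) for row in original)
    sq = sum((r - o) ** 2
             for rrow, orow in zip(reconstructed, original)
             for r, o in zip(rrow, orow))
    return math.sqrt(sq) / n
```

An identical pair of images yields 0, and the value grows with the degradation of the reconstruction, matching the interpretation given in the text.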

5. Experimental results

This section presents the experimental results for the contextual method. To provide the reader with a reference point, Fig. 8 shows an example of a reconstructed image obtained by each of the five detectors we

Fig. 8. Image reconstruction: (a) and (b) edges obtained by DGS and the reconstructed image; (c) and (d) DGD; (e) and (f) DGG; (g) and (h) DLD; (i) and (j) DLG.


Fig. 9. Mean square errors. Synthetic images: (a) step; (b) staircase; (c) pulse. Real images: (d) nuts; (e) glasses; (f) Lena; (g) back; (h) Sherbrooke.


have used. The scale of all these detectors is two. As we mentioned earlier, the filter parameter of the Deriche and Shen filters is equal to two over the scale of the Gaussian filter. Edges used in the reconstruction process have not been cleaned. Experimentation showed that the cleaning operation has no effect on the rank of the five detectors, which allows us to avoid taking the threshold into account in our study. Subjectively, according to the sharpness of the reconstructed images, the five detectors are ranked as follows (Fig. 8): DGS, DGD, DLD, DGG, and DLG.

For each image, Fig. 9 presents the mean square difference between the reconstructed image and the original, as a function of the scale. We conclude that:

· The quadratic error depends on the scale. It increases when the scale increases, for all detectors. In fact, at a high scale there are few detected edges, so the diffusion process, which is iterative, has few edges to start from.

· Performance depends on the edge type. In the case of a step edge, the ranking for a small scale ∈ [1.0, 1.6] is DGG, DGD, DGS, DLG, and DLD (see Fig. 9(a)). The ranking for a larger scale ∈ [1.6, 5.0] is DGS, DGD, DLD, DGG, DLG. At a high scale, a gradient detector produces a lower error than the corresponding Laplacian detector. In the case of staircase and pulse edges, the ranking for a small scale ∈ [1.0, 1.6] is DGG, DGD, DGS, DLG and DLD (see Fig. 9(b) and (c)). We noticed that this ranking is similar to the one for a step edge. The ranking for a larger scale ∈ [1.6, 5.0] is DGS, DLD, DLG, DGG. In this case, a Laplacian detector has a lower error than the corresponding gradient detector.

· For Fig. 9(d) and (e), a gradient detector has a lower error than a Laplacian one. For Fig. 9(f) and (g), a gradient detector has a lower error than the corresponding Laplacian detector. For Fig. 9(h), a gradient detector has a lower error than the corresponding Laplacian detector, except for the Gaussian detectors. We conclude that performance is influenced by the differentiation operator.

· Performance is influenced by the filter. The filter of Shen gives the best results, followed by the filters of Deriche and Gauss.
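The contextual criterion used in this section, the mean square difference between the reconstructed image and the original, can be sketched as follows. This is a minimal NumPy illustration; the function name is ours, and only the error measure itself comes from the text.

```python
import numpy as np

def mean_square_error(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Mean square difference between the original image and the image
    reconstructed from its edges (the contextual performance measure).
    Pixels are cast to float so integer images do not overflow."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    return float(np.mean(diff ** 2))
```

Applied at each scale, this yields curves of the kind plotted in Fig. 9, from which the detector rankings above are read.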

6. Conclusion

In this paper, we have presented two methods for measuring the performance of edge detectors. The first one is non-contextual and evaluates the performance of edge detectors in terms of detection errors. The main features of this method are:
· The detection errors include classical errors (omission, localization, multiple responses, sensitivity, and orientation) as well as a new error related to false-edge suppression.

· The influence of image characteristics and of the properties of detectors on their performance is determined. The image characteristics used are edge type, subpixel, sharpness, and signal-to-noise ratio. The detector parameters are the scale, the order of the differentiation operator, and the filter. A last parameter, the size of the ideal-edge support region, is used to measure the performance of the detectors.

Most quantitative evaluation methods are non-contextual. However, these methods do not completely characterize the performance of an edge detector. It is important to take into account the subsequent use of the detector to know whether it satisfies the requirements of a particular application. This is why we proposed a second evaluation method for the performance of edge detectors in the context of image reconstruction. It involves measuring the performance of an edge detector according to the mean square difference between the reconstructed image and the original one. In both methods, we studied the influence of the image characteristics and detector properties on detector performance. The results of this study will be helpful in selecting an edge detector for a given application. However, several improvements can be made in order to make these performance evaluation methods more complete. These include defining a better synthesis of experimental results and considering other types of edges.

References

Abdou, I.E., 1978. Quantitative methods of edge detection. Technical Report No. 830, Image Processing Institute, University of Southern California.

Bowyer, K.W., Phillips, P.J., 1998. Empirical Evaluation Techniques in Computer Vision. IEEE Computer Society Press, Los Alamitos, CA.

Canny, J.F., 1983. Finding edges and lines in images. Technical Report No. 720, Massachusetts Institute of Technology.

Carlsson, S., 1988. Sketch based coding of grey level images. Signal Process. 15 (1), 57–83.

Cho, K., Meer, P., Cabrera, J., 1997. Performance assessment through bootstrap. IEEE Trans. Pattern Anal. Machine Intell. 19, 1185–1198.

Deriche, R., 1987. Using Canny's criteria to derive a recursively implemented optimal edge detector. Internat. J. Comput. Vision 1 (2), 167–187.

Dougherty, S., Bowyer, K.W., Kranenburg, C., 1998. ROC curve evaluation of edge detector performance. In: Proc. IEEE Internat. Conf. Image Process.

Elder, J.H., Zucker, S.W., 1998. Local scale control for edge detection and blur estimation. IEEE Trans. Pattern Anal. Machine Intell. 20 (7), 699–716.

Heath, M.D., Sarkar, S., Sanocki, T., Bowyer, K.W., 1997. A robust visual method for assessing the relative performance of edge-detection algorithms. IEEE Trans. Pattern Anal. Machine Intell. 19 (12), 1338–1359.

Kanungo, T., Jaisimha, M.Y., Palmer, J., Haralick, R.M., 1995. A methodology for quantitative performance evaluation of detection algorithms. IEEE Trans. Image Process. 4 (12), 1667–1674.

Kitchen, L., Rosenfeld, A., 1981. Edge evaluation using local edge coherence. IEEE Trans. Systems Man Cybernet. SMC-11 (9), 597–605.

Kitchen, L.J., Venkatesh, S., 1992. Edge evaluation using necessary components. CVGIP: Graphical Models and Image Processing 54 (1), 23–30.

Lacroix, V., 1990. Edge detection: what about rotation invariance? Pattern Recognition Letters 11, 797–802.

Marr, D., Hildreth, E.C., 1980. Theory of edge detection. In: Proc. Roy. Soc. London B207, pp. 187–217.

Nair, D., Mitiche, A., Aggarwal, J.K., 1995. On comparing the performance of object recognition systems. Internat. Conf. Image Process., 631–634.

Nguyen, T.B., 1998. Évaluation des algorithmes d'extraction de contours dans des images à niveaux de gris. Mémoire de Maîtrise, Université de Sherbrooke.

Pratt, W.K., 1991. Digital Image Processing, second ed. Wiley-Interscience, New York.

Sanocki, T., Bowyer, K.W., Heath, M.D., Sarkar, S., 1998. Are edges sufficient for object recognition? J. Exp. Psychol. 24 (1), 340–349.

Shen, J., Castan, S., 1992. An optimal linear operator for edge detection. CVGIP: Graphical Models and Image Processing 54 (2), 122–133.

Shin, M.C., Goldgof, D., Bowyer, K.W., 1998. An objective comparison methodology of edge detection algorithms using a structure from motion task. In: Bowyer, K.W., Phillips, P.J. (Eds.), Empirical Evaluation Techniques in Computer Vision. IEEE Computer Society Press, Los Alamitos, CA.

Ziou, D., Koukam, A., 1998. Knowledge-based assistant for the selection of edge detectors. Pattern Recognition 31 (5), 587–596.

Ziou, D., Tabbone, S., 1998. Edge detection techniques – an overview. Internat. J. Pattern Recognition Image Anal. 8, 537–559.