

Robust Real Time Pattern Matching using Bayesian Sequential Hypothesis Testing

Ofir Pele and Michael Werman

Abstract

This paper describes a method for robust real time pattern matching. We first introduce a family of image distance measures, the “Image Hamming Distance Family”. Members of this family are robust to occlusions, small geometrical transforms, light changes and non-rigid deformations. We then present a novel Bayesian framework for sequential hypothesis testing on finite populations. Based on this framework, we design an optimal rejection/acceptance sampling algorithm. This algorithm quickly determines whether two images are similar with respect to a member of the Image Hamming Distance Family. We also present a fast framework that designs a sub-optimal sampling algorithm. Extensive experimental results show that the sequential sampling algorithm performance is excellent. Implemented on a Pentium 4 3GHz processor, detection of a pattern with 2197 pixels, in 640x480 pixel frames, where in each frame the pattern rotated and was highly occluded, proceeds at only 0.022 seconds per frame.

Index Terms

Pattern matching, template matching, pattern detection, image similarity measures, Hamming distance, real time, sequential hypothesis testing, image statistics, Bayesian statistics, finite populations

I. INTRODUCTION

MANY applications in image processing and computer vision require finding a particular pattern in an image, pattern matching. To be useful in practice, pattern matching methods must be automatic, generic, fast and robust.

Pattern matching is typically performed by scanning the entire image, and evaluating a distance measure between the pattern and a local rectangular window. The method proposed in this paper is applicable to any shape of a pattern, even a non-contiguous one. We use the notion of “window” to cover all possible shapes.

First, we introduce a family of image distance measures called the “Image Hamming Distance Family”. A distance measure in this family is the number of non-similar corresponding elements between two images. Members of this family are robust to occlusions, small geometrical transforms, light changes and non-rigid deformations.

Second, we show how to quickly decide whether a window is similar to the pattern with respect to a member of the “Image Hamming Distance Family”. The trivial, but time consuming solution is to compute the exact distance between the pattern and the window by going over all the corresponding pixels. We present an algorithm that samples corresponding pixels and accumulates the number of non-similar pixels. The speed of this algorithm is based on the fact that the distance between two non-similar images is usually very large whereas the distance between two similar images is usually very small (see Fig. 1). Therefore, for non-similar windows the sum will grow extremely fast and we will be able to quickly decide that they are non-similar. As the event of similarity in pattern matching is so rare (see Fig. 1), we can afford to pay the price of going over all the corresponding pixels in similar windows. Note that the algorithm does not attempt to estimate the distances for non-similar windows. The algorithm only decides that these windows, with a very high probability (for example, 99.9%), are non-similar. The reduction in running time is due to the fact that this unnecessary information is not computed.
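The rejection idea can be sketched in code. The following is a simplified illustration only, not the optimal Bayesian design of Section IV: the per-step rejection thresholds are hypothetical inputs, and sim returns 1 for a non-similar element as in Section III.

```python
import random

def sequential_match(pattern, window, mask, sim, thresholds):
    """Sample corresponding pixels in random order and accumulate the number
    of non-similar ones; reject as soon as the count exceeds the current
    threshold.  thresholds[n] is the rejection bound after n samples."""
    order = list(mask)
    random.shuffle(order)
    mismatches = 0
    for n, xy in enumerate(order, start=1):
        mismatches += sim(pattern, window, xy)   # 1 if non-similar, else 0
        if mismatches > thresholds[n]:
            return "non-similar"                 # early rejection
    return "similar"   # similar windows pay the full |A| comparisons
```

With identical images the loop runs to completion; with a window that differs almost everywhere, rejection typically happens after very few samples, which is the source of the speedup.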

The idea of sequentially sampling a distance is not new [1]. The major contribution in our work is a novel efficient Bayesian framework for hypothesis testing on finite populations. Given acceptable bounds on the probability of error (false negatives and false positives) the framework designs a sampling


[Figure 1: two histograms of Hamming distance (x-axis) versus number of windows (y-axis); the right plot zooms in on the left part of the histogram.]

Fig. 1. The distance of the pattern to most windows in pattern matching is very high. A distance measure from the Image Hamming Distance Family was computed between the pattern and a sliding masked window in the video frames of Fig. 2. Above we see the resulting histogram. The left part of the histogram is zoomed in. We can see that most of the windows in the video were very far from the pattern. This is a typical histogram.

TABLE I
NOTATION TABLE

A, |A|  Mask that contains coordinates of pixels or pairs of coordinates of pixels. |A| is the size of this mask.
D       Random variable of the Hamming distance.
t       Image similarity threshold, i.e. if the Hamming distance of two images is smaller or equal to t, then the images are considered similar. Otherwise, the images are considered non-similar.
p       Pixel similarity threshold. Used in several of the members of the Image Hamming Distance Family.

algorithm that has the minimum expected running time. We also present a fast framework that designs a sub-optimal sampling algorithm. For comparison, we also present a framework that designs an optimal fixed size sampling algorithm. Theoretical and experimental results show that sequential sampling needs significantly fewer samples than fixed size sampling.

Sampling is frequently used in computer vision to reduce the time complexity caused by the size of the image data. Our work (like work by Matas et al. [2], [3]) shows that designing an optimal sampling scheme is important and can improve speed and accuracy significantly.

A typical pattern matching task is shown in Fig. 2. A non-rectangular pattern of 2197 pixels was sought in a sequence of 14, 640x480 pixel frames. Using the sampling algorithm the pattern was found in 9 out of 11 frames in which it was present, with an average of only 19.70 pixels examined per window instead of 2197 needed for the exact distance computation. On a Pentium 4 3GHz processor, it proceeds at only 0.022 seconds per frame. Other distances such as cross correlation (CC), normalized cross correlation (NCC), l1, l2, yielded poor results even though they were computed exactly (the computation took much longer).

This paper is organized as follows. Section II is an overview of previous work on fast pattern matching, block matching for motion estimation, Hamming distance in computer vision and sequential hypothesis testing. Section III introduces the Image Hamming Distance Family. Section IV describes the Bayesian framework. Section V discusses the issue of the prior. Section VI presents extensive experimental results. Finally, conclusions are drawn in Section VII. A notation table for the rest of the paper is given in Table I.


Fig. 2. Real time detection of a rotating and highly occluded pattern. (a) A non-rectangular pattern of 2197 pixels. Pixels not belonging to the mask are in black. (b) Three 640x480 pixel frames out of fourteen in which the pattern was sought. (c) The result. Most similar masked windows are marked in white. (d) Zoom in of the occurrences of the pattern in the frames. Pixels not belonging to the mask are in black. The sequential algorithm proceeds at only 0.022 seconds per frame. Offline running time (time spent on the parameterization of the sequential algorithm) was 0.067 seconds. Note that the distance is robust to out of plane rotations and occlusions. Using other distances such as CC, NCC, l2, l1 yielded poor results. In particular they all failed to find the pattern in the last frame. We emphasize that no motion consideration was taken into account in computation. The algorithm ran on all windows.

II. PREVIOUS WORK

A. Fast Pattern Matching

The distances most widely used for fast pattern matching are cross correlation and normalized cross correlation. Both can be computed relatively quickly in the Fourier domain [4], [5].

The main drawback of correlation, which is based on the Euclidean distance, is that it is specific to Gaussian noise. The difference between images of the same object comes from occlusions, geometrical transforms, light changes and non-rigid deformations. None of these can be modeled well with a Gaussian distribution. For a further discussion on Euclidean distance as a similarity measure see [6]–[9]. Note that


Fig. 3. Real time detection of a specific face in a noisy image of a crowd. (a) A rectangular pattern of 1089 pixels. (b) A noisy version of the original 640x480 pixel image. The pattern that was taken from the original image was sought in this image. The noise is Gaussian with a mean of zero and a standard deviation of 25.5. (c) The result image. All similar masked windows are marked in white. (d) The occurrence of the pattern in the zoomed in image. The sequential algorithm proceeds at only 0.019 seconds per frame. Offline running time (time spent on the parameterization of the sequential algorithm) was 0.018 seconds. Note that although the Hamming distance is not specific to Gaussian noise as the l2 norm is, it is robust to Gaussian noise. The image is copyright by Ben Schumin and was downloaded from: http://en.wikipedia.org/wiki/Image:July_4_crowd_at_Vienna_Metro_station.jpg

although the Hamming distance is not specific to Gaussian noise as the l2 norm is, it is robust to Gaussian noise (see Fig. 3). Normalized cross correlation is invariant to additive and multiplicative gray level changes. However, natural light changes include different effects, such as shading, spectral reflectance, etc. In addition, when a correlation is computed in the transform domain, it can only be used with rectangular patterns and usually the images are padded so that their height and width are dyadic.

Lucas and Kanade [10] employed the spatial intensity gradients of images to find a good match using a Newton-Raphson type of iteration. The method is based on the Euclidean distance and it assumes that the two images are already in approximate registration.

Local descriptors have been used recently for object recognition [11]–[15]. The matching is done by first extracting the descriptors and then matching them. Although fast, our approach is faster. In addition, there are cases where the features do not work (see Fig. 7). If one knows that the object view does not change drastically, the invariance of the features can hurt performance and robustness. In this work we decided to concentrate on pixel values or simple relations of pixels as features. Combining the sequential sampling algorithm approach with the features approach is an interesting extension of this work.

Recently there have been advances in the field of fast object detection using a cascade of rejectors [16]–[20]. Viola and Jones [18] demonstrated the advantages of such an approach. They achieved real time frontal face detection using a boosted cascade of simple features. Avidan and Butman [19] showed that instead of looking at all the pixels in the image, one can choose several representative pixels for fast rejection of non-face images. In this work we do not deal with classification problems but rather with a pattern matching approach. Our approach does not include a learning phase. The learning phase makes classification techniques impractical when many different patterns are sought or when the sought pattern is given online, e.g. in the case of patch-based texture synthesis [21], pattern matching in motion estimation, etc.

Hel-Or and Hel-Or [20] used a rejection scheme for fast pattern matching with projection kernels. Their method is applicable to any norm distance, and was demonstrated on the Euclidean distance. They compute the Walsh-Hadamard basis projections in a certain order. For the method to work fast, the first Walsh-Hadamard basis projections (according to the specific order) need to contain enough information to discriminate most images. The method is applicable only to rectangular dyadic patterns.

Cha [22] uses functions that are lower bounds to the sum of absolute differences, and are fast to compute. They are designed to eliminate non-similar images fast. The first function he suggests is the h-distance: sum_{k=0}^{r-1} | sum_{l=0}^{k} (G_l(Im1) − G_l(Im2)) |, where G_l(Im) is the number of pixels with gray level l in the intensity histogram of the image Im, and r is the number of gray levels, usually 256. The time complexity is O(r). The second function he suggests is the absolute value of the difference between sums of pixels: | sum Im1(x, y) − sum Im2(x, y) |. The method is restricted to the l1 norm and assumes that these

functions can reject most of the images fast. The method is again applicable only to rectangular images.

One of the first rejection schemes was proposed by Barnea and Silverman [1]. They suggested the Sequential Similarity Detection Algorithms - SSDA. The method accumulates the sum of absolute differences of the intensity values in both images and applies a threshold criterion - if the accumulated sum exceeds a threshold, which can increase with the number of pixels, they stop and return non-similar. The order of the pixels is chosen randomly. After n iterations, the algorithm stops and returns similar. They suggested three heuristics for finding the thresholds for the l1 norm. This method is very efficient but has one main drawback. None of the heuristics for choosing the thresholds guarantees a bound on the error rate. As a result the SSDA was said to be inaccurate [23]. Our work is a variation of the SSDA. We use a member of the Image Hamming Distance Family instead of the l1 norm. We also design a sampling scheme with proven error bounds and optimal running time. As the SSDA uses the l1 norm, in each figure where the l1 norm yields poor results (see Figs. 2, 4, 5 and 6) the SSDA also yields poor results.
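Cha's first lower-bound function, the h-distance, can be computed in O(r) time from the two intensity histograms. A minimal sketch, assuming 8-bit grayscale images given as flat lists of pixel values (the function name is ours):

```python
def h_distance(im1, im2, r=256):
    """h-distance: sum over k of the absolute cumulative histogram difference,
    sum_{k=0}^{r-1} | sum_{l=0}^{k} (G_l(Im1) - G_l(Im2)) |."""
    g1, g2 = [0] * r, [0] * r
    for v in im1:
        g1[v] += 1
    for v in im2:
        g2[v] += 1
    total, cum = 0, 0
    for k in range(r):
        cum += g1[k] - g2[k]   # running sum of G_l differences, l = 0..k
        total += abs(cum)
    return total
```

Building the histograms is linear in the image size, after which the bound itself costs only the r-term loop, which is what makes it usable as a cheap rejection test.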

Mascarenhas et al. [24], [25] used Wald’s Sequential Probability Ratio Test (SPRT) [26] as the sampling scheme. Two models were suggested for k, the random variable of the distance of the sample. The first converts the images to binary and then P(k) is binomially distributed. The second assumes that the images are Gaussian distributed; hence P(k) is also Gaussian distributed. The likelihood ratio is defined as: λ(k) = P(k | images are non-similar) / P(k | images are similar). The SPRT samples both images as long as: A < λ(k) < B. When λ(k) ≤ A the SPRT stops and returns similar. When λ(k) ≥ B the SPRT stops and returns non-similar. Let the bounds on the acceptable errors be P(false positive) = β, P(false negative) = α. Wald’s [26] approximation for A and B is: A = β/(1 − α), B = (1 − β)/α.
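As an illustration of the SPRT mechanics (not Mascarenhas et al.'s exact formulation), the binary-image case can be sketched with a binomial likelihood; the per-pixel mismatch probabilities p_s and p_ns under the two hypotheses are hypothetical parameters:

```python
def wald_bounds(alpha, beta):
    """Wald's approximate thresholds: A = beta/(1-alpha), B = (1-beta)/alpha."""
    return beta / (1 - alpha), (1 - beta) / alpha

def sprt_decision(k, n, p_s, p_ns, alpha, beta):
    """Decide after observing k mismatches in n samples.  Under a binomial
    model the binomial coefficients cancel in the likelihood ratio."""
    A, B = wald_bounds(alpha, beta)
    lr = (p_ns / p_s) ** k * ((1 - p_ns) / (1 - p_s)) ** (n - k)
    if lr <= A:
        return "similar"
    if lr >= B:
        return "non-similar"
    return "continue"      # keep sampling: A < lr < B
```

Note that the "continue" branch is exactly what allows the classical SPRT to run indefinitely, the problem addressed by the truncated and restricted-sample variants discussed next.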

There are four main problems with their method. Images are not binary and they are far from being Gaussian distributed [27]–[30]. Our method does not assume any prior on the images. Mascarenhas et al. assume that all similar images have exactly the same pairwise small distance, whereas any two non-similar images have exactly the same large distance, an assumption that is faulty. Our framework gets a prior on the distribution of image distances as input. Mascarenhas et al. fail to take into account the fact that the true state can be obtained by simply performing a full check, a feature our technique incorporates. The classical SPRT can go on infinitely. There are ways to truncate it [31], but they are not optimal. By contrast, we designed an optimal rejection/acceptance sampling scheme with a restricted number of samples.

B. Block Matching for Motion Estimation

Block motion compensation is one of the most commonly used video compression techniques. In this technique, the current image frame is first partitioned into fixed size rectangular blocks. Then, the motion vector for each block is estimated by finding the closest block of pixels in the previous frame according to a matching criterion. The matching criterion used is typically the l1 norm. This problem is similar to pattern matching although there are several key differences. First, the blocks are always rectangular and of a fixed small size (for example 16×16). Second, the search space is usually much smaller in the block matching problem. Finally, in block matching only the best match is sought, whereas in pattern matching all similar matches are sought. For a survey on block matching, see Huang et al. [32]. We review fast methods for block matching connected to our work below.

Chan and Siu [33] divide each block into sub-blocks and choose pixels that are different from theirneighbors from each sub-block. Wang et al. [34] improved this idea by choosing the edge pixels globallyinstead of locally. Bierling [35] applied a uniform 4:1 sub-sampling on each block. Liu and Zaccarin [36],[37] improved the algorithm by alternating four different 4:1 sub-sampling patterns.

Using a Bayesian sequential sampling approach on the problem of block matching for motion estimationwould be an interesting extension of our work.


C. Hamming Distance in Computer Vision

The Hamming distance in computer vision [38]–[44] has usually been applied to a binary image, ordinarily a binary transform of a gray level image. Ionescu and Ralescu’s crisp version of the “fuzzy Hamming distance” [42] is an exception, where a threshold function is applied to decide whether two colors are similar.

A comprehensive review of local binary features of images and their usage for 2D object detection and recognition can be found in Amit’s book [38]. Amit suggests using the Hough transform [45] to find arrangements of the local binary features. In Appendix II we show how the Hough transform can be used to compute the Hamming distance of a pattern with all windows of an image. We also show that the expected time complexity for each window is O(|A| − E[D]), where |A| is the number of pixels in the mask of the pattern and D is the random variable of the Hamming distance between a random window and a random pattern. For the pattern matching in Fig. 2, E[D] = 1736.64, |A| = 2197. Thus, the average work for each window using the Hough transform is 460.36, much higher than the 19.70 needed using our approach, but much less than comparing all the corresponding pixels.
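The per-window figure quoted for Fig. 2 is just the expected-complexity bound evaluated for this pattern:

```python
mask_size = 2197        # |A|: pixels in the pattern mask
expected_D = 1736.64    # E[D]: expected Hamming distance to a random window

hough_work = mask_size - expected_D   # |A| - E[D], Hough work per window
print(round(hough_work, 2))           # 460.36
```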

Bookstein et al. [43] proposed the “Generalized Hamming Distance” for object recognition on binary images. The distance extends the Hamming concept to give partial credit for near misses. They suggest a dynamic programming algorithm to compute it. The time complexity is O(|A| + sum(Im1) · sum(Im2)), where |A| is the number of pixels and sum(Im) is the number of ones in the binary image Im. Our method is sub-linear. Another disadvantage is that their method only copes with near misses in the horizontal direction. We suggest using the Local Deformations method (see Section III) to handle near misses in all directions.

D. Sequential Hypothesis Testing

Sequential tests are hypothesis tests in which the number of samples is not fixed but rather is a random variable. This area has been an active field of research in statistics since its initial development by Wald [26]. A mathematical review can be found in Siegmund [31].

There have been many applications of sampling in computer vision to reduce the time complexity caused by the size of the image data. However, most have been applied with a sample of fixed size. Exceptions are [1]–[3], [24], [25], [46]–[48]. The sampling schemes that were used are Wald’s SPRT [26] for simple hypotheses, a truncated version of the SPRT (which is not optimal), or estimation of the thresholds. The Matas and Chum method [2] for reducing the running time of RANSAC [49] is an excellent example of the importance of optimal design of sampling algorithms.

There are several differences between the above methods and the one presented here. The first is that in the pattern matching problem, the hypotheses are composite and not simple. Let D be the random variable of the Hamming distance between a random window and a random pattern. Instead of testing the simple hypothesis D = d1 against the simple hypothesis D = d2, we need to test the composite hypothesis D ≤ t against the composite hypothesis D > t. This problem is solved by using a prior on the distribution of the Hamming distance and developing a framework that designs an optimal sampling algorithm with respect to the prior. The second difference is that the efficiency of the design of the optimal sampling algorithm is also taken into consideration. In addition, we present a fast algorithm that designs a sub-optimal, but good, sampling scheme. Third, the cost of going over all the samples sequentially is usually lower than sampling in random order, a feature we take into account. Finally, as a by-product, our approach returns the expected running time and the expected error rate.

III. IMAGE HAMMING DISTANCE FAMILY

A distance measure from the Image Hamming Distance Family is the number of non-similar corresponding elements between two images, where the definition of an element and of similarity vary between members of the family. Below is a formal definition and several examples.


A. Formal Definition

sim(Im1, Im2, (x, y)^m) → {0, 1} is the similarity function; 1 is for non-similar, 0 is for similar, where Im1, Im2 are images and (x, y) are the spatial coordinates of a pixel. In all our examples m is 1 or 2. If m = 2 we are testing for similarity between pairs of pixels (see “Monotonic Relations” in Section III-B for an example). We usually omit m for simplicity.

HammingDistance_A(Im1, Im2) = sum_{(x,y)^m ∈ A} sim(Im1, Im2, (x, y)^m) is the Hamming distance between the mask A applied to the images Im1, Im2. Note that A does not need to be a rectangular window in the image. In fact it does not have to form a connected region.

B. Examples

“Thresholded Absolute Difference”

sim(Im1, Im2, (x, y)) = δ(|Im1(x, y) − Im2(x, y)| > p)

The distance is similar to Gharavi and Mills’s PDC distance [50].

“Thresholded l2 norm in L*a*b color space”

sim(Im1, Im2, (x, y)) = δ(||L*a*b(Im1(x, y)) − L*a*b(Im2(x, y))||2 > p)

The L*a*b color space was shown to be approximately perceptually uniform [51]. This means that colors which appear similar to an observer are located close to each other in the L*a*b coordinate system.

“Monotonic Relations”

An element here is a pair of pixels - [(x1, y1), (x2, y2)]. The pair [Im1(x1, y1), Im1(x2, y2)] is considered similar to [Im2(x1, y1), Im2(x2, y2)] if the same relation holds between them. For example the relation can be smaller, equal or greater:

sim(Im1, Im2, [(x1, y1), (x2, y2)]) =

0 if (Im1(x1, y1) > Im1(x2, y2)) & (Im2(x1, y1) > Im2(x2, y2))
0 if (Im1(x1, y1) = Im1(x2, y2)) & (Im2(x1, y1) = Im2(x2, y2))
0 if (Im1(x1, y1) < Im1(x2, y2)) & (Im2(x1, y1) < Im2(x2, y2))
1 otherwise

This distance is invariant to noises that preserve monotonic relations. Thus it is robust to light changes (see Figs. 4 and 5). The distance is equivalent to the Hamming distance on the Zabih and Woodfill census transform [39]. We suggest that for a specific pattern, a reasonable choice for A = [(x1, y1), (x2, y2)] are pairs of indices that correspond to edges; i.e. they are close, while the intensity difference is high. Such pairs are discriminative because of image smoothness.
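The three-way relation test can be implemented by comparing signs of the pairwise differences; a small sketch (images as dicts, a pair given as two coordinate tuples; the function names are ours):

```python
def relation(a, b):
    """-1, 0 or 1 according to whether a < b, a = b or a > b."""
    return (a > b) - (a < b)

def monotonic_sim(im1, im2, pair):
    """0 (similar) iff the same <, = or > relation holds in both images."""
    p1, p2 = pair
    return 0 if relation(im1[p1], im1[p2]) == relation(im2[p1], im2[p2]) else 1
```

Because only the ordering of the two gray values matters, any monotonic change of the intensities (e.g. a global brightening) leaves the result unchanged.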

Another kind of relation that can be used is equality:

sim(Im1, Im2, [(x1, y1), (x2, y2)]) =

0 if (Im1(x1, y1) = Im1(x2, y2)) & (Im2(x1, y1) = Im2(x2, y2))
0 if (Im1(x1, y1) ≠ Im1(x2, y2)) & (Im2(x1, y1) ≠ Im2(x2, y2))
1 otherwise

This kind of distance can be useful near occluding boundaries when the sign of the contrast along the boundary is possibly reversed. Darrell and Covell [52] used a cumulative similarity transform to deal with such areas.

“Local Deformations”

Local Deformations is an extension to distance measures of the Image Hamming Distance Family which makes them invariant to local deformations, e.g. non-rigid deformations (see Fig. 6). Let


Fig. 4. The Monotonic Relations Hamming distance is robust to light changes and small out of plane and in plane rotations. (a) A non-rectangular pattern of 7569 pixels (631 edge pixel pairs). Pixels not belonging to the mask are in black. (b) A 640x480 pixel image in which the pattern was sought. (c) The result image. All similar masked windows are marked in white. (d) The two found occurrences of the pattern in the image. Pixels not belonging to the mask are in black. The sequential algorithm proceeds at only 0.021 seconds. Offline running time (time spent on the parameterization of the sequential algorithm and finding the edge pixels) was 0.009 seconds. Note the substantial differences in shading between the pattern and its two occurrences in the image. Also note the out of plane (mostly the head) and in plane rotations of the Maras (the animals in the picture). Using other distances such as CC, NCC, l2, l1 yielded poor results. In particular the closest window using CC, l2, l1 was far from the Maras. Using NCC the closest window was near the right Mara but it found many false positives before finding the left Mara. The pairs that were used are pairs of pixels belonging to edges, i.e. pixels that have a neighbor pixel, where the absolute intensity value difference is greater than 80. Two pixels, (x2, y2), (x1, y1) are considered neighbors if their l∞ distance: max(|x1 − x2|, |y1 − y2|) is smaller or equal to 2. There are 631 such pairs in the pattern. Similar windows are windows where at least 25% of their pairs exhibit the same relation as in the pattern.


Fig. 5. The Monotonic Relations Hamming distance is robust to light changes and occlusion. (a) A non-rectangular pattern of 2270 pixels (9409 edge pixel pairs). Pixels not belonging to the mask are in black. (b) A 640x480 pixel image in which the pattern was sought. (c) The result image. All similar masked windows are marked in white. (d) The occurrences of the pattern in the image zoomed in. Pixels not belonging to the mask are in black. The sequential algorithm proceeds at only 0.037 seconds. Offline running time (time spent on the parameterization of the sequential algorithm and finding the edge pixels) was 1.219 seconds. Note the considerable differences in the light between the pattern and the occurrences of the pattern in the image, especially the specular reflection in the pattern. Also note the difference in the spotting of the frogs and the difference in the pose of the legs (the top right leg is not visible in the image). Using other distances such as CC, NCC, l2, l1 yielded poor results. In particular the closest window using CC, NCC, l2, l1 was far from the frog. The pairs that were used are pairs of pixels belonging to edges, i.e. pixels that have a neighbor pixel, where the absolute intensity value difference is greater than 80. Two pixels, (x2, y2), (x1, y1) are considered neighbors if their l∞ distance: max(|x1 − x2|, |y1 − y2|) is smaller or equal to 5. There are 9409 such pairs in the pattern. Similar windows are windows where at least 25% of their pairs exhibit the same relation as in the pattern.


sim(Im1, Im2, (x, y)) be the similarity function of the original Hamming distance measure. Let ε = (εx, εy) be a shift. Let (Im)_ε(x, y) = Im(x + εx, y + εy). We denote by Γ the set of accepted shifts. The Local Deformations variant similarity function of this Hamming distance measure is:

sim(Im1, Im2, (x, y)) = min_{ε ∈ Γ} sim(Im1, (Im2)_ε, (x, y))

Brunelli and Poggio [53] used a similar technique to make CC more robust.
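The Local Deformations wrapper can be sketched as a minimum over shifted lookups. This sketch assumes images as dicts from coordinates to gray values, a per-value similarity (here the Thresholded Absolute Difference test), and simply skips shifts that fall outside the image; all names are illustrative:

```python
def local_deformations(value_sim, shifts):
    """Wrap a per-value similarity in a min over the accepted shifts Γ:
    a pixel is similar (0) if it matches Im2 under ANY shift in Γ."""
    def sim(im1, im2, xy):
        x, y = xy
        results = [value_sim(im1[xy], im2[(x + ex, y + ey)])
                   for ex, ey in shifts if (x + ex, y + ey) in im2]
        return min(results) if results else 1
    return sim

# Thresholded Absolute Difference with threshold 20 over the 8-neighborhood:
tad = lambda a, b: 1 if abs(a - b) > 20 else 0
shifts = [(ex, ey) for ex in (-1, 0, 1) for ey in (-1, 0, 1)]
sim_ld = local_deformations(tad, shifts)
```

Taking the minimum means a single well-matching shifted neighbor is enough to declare the pixel similar, which is what buys the invariance to small local deformations.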


Fig. 6. Local Deformations is robust to non-rigid deformations. (a) A non-rectangular pattern (snake skin) of 714 pixels. Pixels not belonging to the mask are in black. (b) A 640x480 pixel image in which the pattern was sought. (c) The result image. All similar masked windows are marked in white. (d) The found occurrence of the pattern in the zoomed in image. Pixels not belonging to the mask are in black. The sequential algorithm proceeds at only 0.064 seconds. Offline running time (time spent on the parameterization of the sequential algorithm) was 0.007 seconds. Using other distances such as CC, NCC, l2, l1 yielded poor results. In particular the closest window using CC, NCC, l2, l1 was far from the snake skin. SIFT descriptor matching [12] also yielded poor results (see Fig. 7). The distance that was used is the Local Deformations variant of the Thresholded Absolute Difference distance with a threshold of 20. The set of shifts Γ is the 8-neighborhood, i.e. shifts (εx, εy) with |εx|, |εy| ≤ 1. Similar windows are windows where at least 5% of their pixels (or neighbors) have an l1 distance smaller or equal to 20.


Fig. 7. SIFT descriptor matching [12] on the pattern matching task of Fig. 6. The pattern in the left part of the figures is zoomed. (SIFT-1) The correspondences between the eleven SIFT descriptors in the pattern and the most similar SIFT descriptors in the image. Note that all correspondences are false. (SIFT-5) The correspondences between the eleven SIFT descriptors in the pattern and the five most similar SIFT descriptors in the image (each one-to-five correspondence group has a different symbol). Note that only one correspondence is true. It is the fifth most similar correspondence of the descriptor and is marked with a circle.


C. Advantages

Members of the Image Hamming Distance Family can be invariant to light changes, small deformations, etc. Invariance is achieved by “plugging in” the right similarity function, sim.

Members of the Image Hamming Distance Family have an inherent robustness to noise that changes the value of pixels completely, for example, out of plane rotation, shading, spectral reflectance, occlusion, etc. In the Hamming distance we simply regard pixels with this kind of noise as different, regardless of their intensity. Norms such as the Euclidean add irrelevant information; namely, the difference between the intensity values of such pixels and the image. For example, if we search for a window such that the distance between the window and the pattern is smaller than t, the search will be invariant up to t of such pixels.

The Euclidean norm is best suited to dealing with Gaussian noise. However, the difference between images of the same object comes from occlusions, geometrical transforms, light changes and non-rigid deformations. None of these can be modeled well with a Gaussian distribution.

Although it might seem that members of the Image Hamming Distance Family are not robust because the similarity function of a single element, sim, is a threshold function, the distance is in fact robust because it is a sum of such functions.

Finally, the simplicity of the Image Hamming Distance Family allows us to develop a tractable Bayesian framework that is used to design an optimal rejection/acceptance sampling algorithm. After we design the sampling algorithm offline, it can quickly determine whether two images are similar.

IV. SEQUENTIAL FRAMEWORK

We first present the sequential algorithm that assesses similarity by a sequential test. Then we evaluate its performance and show how to find the optimal parameters for the sequential algorithm. Finally, we illustrate how to quickly find sub-optimal but good parameters for the sequential algorithm.

A. Sequential algorithm

The sequential algorithm, Alg. 1, uses a decision matrix M. M[k, n] is the decision after sampling k non-similar corresponding pixels out of a total of n corresponding pixels. The decision can be NS = return non-similar, S = return similar, E = compute the exact distance and return the true answer, or C = continue sampling. The last column cannot be C, as the test has to end there; see the diagram in Fig. 8. We sample uniformly at random as we do not want to make any assumptions about the noise. We sample without replacement for efficiency.

The framework computes the optimal decision matrix offline. Then, the algorithm can quickly decide whether a pattern and a window are similar, i.e. whether their Hamming distance is smaller or equal to the image similarity threshold, t. Note that the optimal decision matrix does not have to be computed for each new pattern. It should be computed once for a given prior on the distribution of the distances, desired error bounds and pattern size. For example, if the task is finding 30 × 30 patterns, then it is enough to compute the decision matrix once.

B. Evaluating performance of a fixed decision matrix

The performance of the algorithm is defined by its expected running time and its error probabilities:

PM(false negative) = probability of returning non-similar on similar windows.
PM(false positive) = probability of returning similar on non-similar windows.
EM(running time) = expected running time.

We denote by In(k, n) the event of sampling k non-similar corresponding pixels out of a total of n corresponding pixels, in a specific order. As we sample without replacement we get:


Algorithm 1 sequentialM(pattern, window, A)
  k ⇐ 0
  for n = 0 to |A| do
    if M[k, n] = NS then return non-similar
    if M[k, n] = S then return similar
    if M[k, n] = E then return (HammingDistanceA(pattern, window) ≤ t)
    randomly sample, uniformly and without replacement, coordinates (x, y) from A
    k ⇐ k + sim(pattern, window, (x, y))    {add 1 if the pixels are non-similar}
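A minimal Python sketch of Alg. 1, assuming a Thresholded Absolute Difference similarity; the image representation (coordinate-to-intensity dicts), the function name and the threshold parameter p are our own illustrative choices, and the decision matrix M is assumed precomputed:

```python
import random

def sequential(pattern, window, coords, M, t, p=20):
    """Alg. 1 sketch: sample coordinates uniformly without replacement and
    consult the decision matrix M[k][n] after every sample.  Images are
    {(x, y): intensity} dicts; two pixels are non-similar when their
    absolute difference exceeds p (Thresholded Absolute Difference)."""
    def non_similar(xy):            # 1 if the corresponding pixels differ
        return 1 if abs(pattern[xy] - window[xy]) > p else 0

    order = random.sample(coords, len(coords))   # uniform, w/o replacement
    k = 0
    for n in range(len(coords) + 1):
        if M[k][n] == "NS":
            return False                         # non-similar
        if M[k][n] == "S":
            return True                          # similar
        if M[k][n] == "E":                       # exact distance on the rest
            k += sum(non_similar(xy) for xy in order[n:])
            return k <= t
        # M[k][n] == "C": continue sampling (never allowed in the last column)
        k += non_similar(order[n])
    return k <= t
```

Note that the E branch only finishes the count over the coordinates not yet sampled, exactly because sampling is without replacement.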

[Fig. 8 appears here: a grid of C, NS, S and E decision cells. Horizontal axis: n = #corresponding pixels sampled; vertical axis: k = #non-similar corresponding pixels sampled.]

Fig. 8. Graphical representation of the decision matrix M, which is used in the sequential algorithm (Alg. 1). In each step the algorithm samples corresponding pixels and goes right if they are similar, or right and up if they are non-similar. If the algorithm touches a red NS point, it returns non-similar, with the risk of a false negative error. If the algorithm touches a yellow E point, it performs the exact computation of the distance and returns the true answer. If the algorithm touches a green S point, it returns similar, with the risk of a false positive error. In this example, the size of the pattern, |A|, is 21 and the threshold for image similarity, t, is 9.

P(In(k, n) | D = d) =

  ∏_{i=0}^{k−1} (d − i)/(|A| − i) · ∏_{i=0}^{n−k−1} (|A| − d − i)/(|A| − k − i)   if (d ≥ k) and (|A| − d ≥ n − k)

  0                                                                              otherwise        (1)
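Eq. (1) translates directly into code; the helper name and argument order below are our own:

```python
def p_order(k, n, d, A):
    """Eq. (1): probability of a specific order of k non-similar and n - k
    similar pixels among the first n draws, sampling without replacement,
    when exactly d of the A corresponding pixels are non-similar."""
    if d < k or A - d < n - k:       # impossible sample composition
        return 0.0
    p = 1.0
    for i in range(k):               # the k non-similar draws
        p *= (d - i) / (A - i)
    for i in range(n - k):           # the n - k similar draws
        p *= (A - d - i) / (A - k - i)
    return p
```

As a sanity check, multiplying this ordered probability by the number of orders, C(n, k), recovers the hypergeometric distribution.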

The naive computation of P(In(k, n)|D = d) for each k, n and d runs in O(|A|^4). In order to reduce the time complexity to O(|A|^3), we use a dynamic programming algorithm to compute the intermediate sums ΩS[k, n] and ΩNS[k, n] (see Eq. 2) for each k and n, where P(D = d) is the prior on the distribution of the Hamming distance (see Section V).


ΩS[k, n] = ∑_{d=0}^{t} P(In(k, n)|D = d) P(D = d)

ΩNS[k, n] = ∑_{d=t+1}^{|A|} P(In(k, n)|D = d) P(D = d)        (2)
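The O(|A|^3) dynamic program can be sketched as follows: for each d we fill P(In(k, n)|D = d) column by column for one fixed sampling order (which suffices, since the probability is the same for every order) and accumulate the prior-weighted sums of Eq. (2). The names are illustrative:

```python
def omega_tables(A, t, prior):
    """Dynamic program for Eq. (2), O(|A|^3) overall.
    prior[d] = P(D = d), with len(prior) == A + 1."""
    oS = [[0.0] * (A + 1) for _ in range(A + 1)]    # Omega_S[k][n]
    oNS = [[0.0] * (A + 1) for _ in range(A + 1)]   # Omega_NS[k][n]
    for d, pd in enumerate(prior):
        target = oS if d <= t else oNS
        # P[k][n] = P(In(k,n) | D = d) for the order "k non-similar first"
        P = [[0.0] * (A + 1) for _ in range(A + 1)]
        P[0][0] = 1.0
        target[0][0] += pd
        for n in range(1, A + 1):
            for k in range(n + 1):
                if k < n:    # the n-th draw is a similar pixel
                    P[k][n] = P[k][n - 1] * max(0, A - d - (n - 1 - k)) / (A - n + 1)
                else:        # k == n: the n-th draw is a non-similar pixel
                    P[k][n] = P[k - 1][n - 1] * max(0, d - k + 1) / (A - n + 1)
                target[k][n] += P[k][n] * pd
    return oS, oNS
```

Since C(n, k) · P(In(k, n)|D = d) is the hypergeometric pmf, the prior-weighted tables must satisfy the partition identity ∑_k C(n, k)(ΩS[k, n] + ΩNS[k, n]) = 1 for every n.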

We denote by RI the running time of taking one sample and by RE(n) the running time of performing the exact computation. Note that as we sample without replacement, the exact computation can be carried out on the indices that we have not sampled yet. Thus, it depends on n.

In each step the algorithm samples coordinates (x, y) and adds sim(pattern, window, (x, y)) to the sample dissimilarity sum, k. We denote a specific run of the algorithm as a sequence of random variables s1, s2, . . ., where sn ∈ {0, 1} is the result of sim(pattern, window, (x, y)) in iteration number n. We denote by ΨM[k, n] the number of different sequences s1, s2, . . . , sn with k ones and n − k zeros which will not cause the sequential algorithm that uses the decision matrix M to stop at an iteration smaller than n. Graphically (see Fig. 8), ΨM[k, n] is the number of paths from the point (0, 0) to the point (k, n) that do not touch a stopping point (NS, S, E). Alg. 2 computes ΨM with a time complexity of O(|A|^2).

Algorithm 2 computeΨM
  Ψ[0 . . . |A|, 0 . . . |A|] ⇐ 0
  k ⇐ 0, n ⇐ 0
  while M[k, n] = C do
    Ψ[k, n] ⇐ 1
    n ⇐ n + 1
  Ψ[k, n] ⇐ 1
  for n = 1 to |A| do
    for k = n to 1 do
      if M[k, n − 1] = C then
        Ψ[k, n] ⇐ Ψ[k, n] + Ψ[k, n − 1]
      if M[k − 1, n − 1] = C then
        Ψ[k, n] ⇐ Ψ[k, n] + Ψ[k − 1, n − 1]
  return Ψ
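Alg. 2 can be sketched in Python (illustrative names; M is indexed as M[k][n] with string decision codes, and its last column must not contain C):

```python
def compute_psi(M, A):
    """Alg. 2 sketch: Psi[k][n] = number of 0/1 sequences of length n with
    k ones that do not make the sequential test stop before iteration n."""
    Psi = [[0] * (A + 1) for _ in range(A + 1)]
    n = 0
    while M[0][n] == "C":        # the all-zeros prefix survives while k = 0
        Psi[0][n] = 1
        n += 1
    Psi[0][n] = 1                # the first stopping cell is still reachable
    for n in range(1, A + 1):
        for k in range(n, 0, -1):
            if M[k][n - 1] == "C":        # arrived via a similar sample
                Psi[k][n] += Psi[k][n - 1]
            if M[k - 1][n - 1] == "C":    # arrived via a non-similar sample
                Psi[k][n] += Psi[k - 1][n - 1]
    return Psi
```

With a matrix that is C everywhere except the last column, every path survives and Psi[·][|A|] reduces to the binomial coefficients, which is a convenient correctness check.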

Now we can compute (see full derivation in Appendix III) the error probabilities and the expected running time explicitly, using a prior on the distribution of the Hamming distance, P(D = d) (see Section V):

PM(false negative) = ( ∑_{(k,n): M[k,n]=NS} Ψ[k, n] ΩS[k, n] ) / P(D ≤ t)        (3)

PM(false positive) = ( ∑_{(k,n): M[k,n]=S} Ψ[k, n] ΩNS[k, n] ) / P(D > t)        (4)

EM(running time) = EM(#samples) RI + PM(compute exact) RE(n)                      (5)
  = ∑_{(k,n): M[k,n]∈{S,NS,E}} Ψ[k, n] (ΩS[k, n] + ΩNS[k, n]) n RI
  + ∑_{(k,n): M[k,n]=E} Ψ[k, n] (ΩS[k, n] + ΩNS[k, n]) RE(n)


C. Finding the optimal decision matrix

Our goal is to find the decision matrix M that minimizes the expected running time given bounds on the error probabilities, α and β:

arg min_M EM(running time)   s.t.  PM(false negative) ≤ α,  PM(false positive) ≤ β        (6)

Instead of solving Eq. (6) directly, we assign two new weights: w0 for a false negative error event and w1 for a false positive error event; i.e. we now look for the decision matrix M that solves Eq. (7):

arg min_M loss(M, w0, w1)   where

loss(M, w0, w1) = EM(running time) + PM(false negative) P(D ≤ t) w0 + PM(false positive) P(D > t) w1        (7)

Following the solution of Eq. (7), we show how to use it to solve Eq. (6). We solve Eq. (7) using the backward induction technique [54]. The backward induction algorithm, Alg. 3, is based on the principle that the best decision in each step is the one with the smallest expected addition to the loss function. In Appendix IV we show how to explicitly compute the expected additive loss of each decision in each step.

Algorithm 3 backward(w0, w1)
  for k = 0 to |A| do
    M[k, |A|] ⇐ arg min_{decision ∈ {NS, S, E}} E[addLoss(decision) | k, |A|]
  for n = |A| − 1 to 0 do
    for k = 0 to n do
      M[k, n] ⇐ arg min_{decision ∈ {NS, S, E, C}} E[addLoss(decision) | k, n]
  return M

If we find error weights w0, w1 such that the decision matrix M that solves Eq. (7) has error probabilities PM(false negative) = α and PM(false positive) = β, then we have also found the solution to the original minimization problem, Eq. (6). See Appendix V, Theorem 1 for the proof.

In order to find the error weights w0, w1 which yield a solution with errors as close as possible to the requested errors (α for false negative and β for false positive), we perform a search (Alg. 4). The search can be done on the ranges w0 ∈ [0, RE(0)/(αP(D ≤ t))] and w1 ∈ [0, RE(0)/(βP(D > t))], as this guarantees that the error will be small enough (see Appendix V, Theorem 2).

Alg. 4 returns a decision matrix with minimum running time compared to all other decision matrices with fewer or equal error rates (see Appendix V, Theorem 1). However, as the search is on two parameters, the search for the requested errors can fail. In practice, the search always returns errors which are very close to the requested errors. In addition, if we restrict one of the errors to be zero, the search is on one parameter, hence a binary search returns a solution with errors as close as possible to the requested errors. If Alg. 4 fails to return a decision matrix with errors close to the requested errors, an exhaustive search of the error weights w0, w1 with high resolution can be performed.

One special case that allows us to find the optimal decision matrix faster is the case in which all similar images have a Hamming distance of zero, i.e. P(D ≤ t) = P(D = 0). Thus, once we sample one non-similar corresponding pixel we know that the images are non-similar. The optimal decision matrix can be of two types, and Eq. (6) can be solved by going over all possibilities of the decision matrix M (the number of C = continue sampling entries can be between zero and |A| − 1):

  ⎡ NS  · · ·  NS ⎤                 ⎡ NS  · · ·  NS ⎤
  ⎢  ⋮    ⋱    ⋮  ⎥       or        ⎢  ⋮    ⋱    ⋮  ⎥
  ⎣ C · · · C  E · · · E ⎦          ⎣ C · · · C  S · · · S ⎦

where the bottom row (k = 0) is a run of C entries followed by E entries (left matrix) or by S entries (right matrix), and all rows with k ≥ 1 are NS.


Algorithm 4 searchOpt(α, β)
  uw0 ⇐ RE(0)/(αP(D ≤ t)),  dw0 ⇐ 0,  uw1 ⇐ RE(0)/(βP(D > t)),  dw1 ⇐ 0
  repeat
    mw0 ⇐ (uw0 + dw0)/2,  mw1 ⇐ (uw1 + dw1)/2
    M ⇐ backward(mw0, mw1)
    PM(false negative) ⇐ (1/P(D ≤ t)) ∑_{(k,n): M[k,n]=NS} Ψ[k, n] ΩS[k, n]
    PM(false positive) ⇐ (1/P(D > t)) ∑_{(k,n): M[k,n]=S} Ψ[k, n] ΩNS[k, n]
    if PM(false negative) > α then dw0 ⇐ mw0 else uw0 ⇐ mw0
    if PM(false positive) > β then dw1 ⇐ mw1 else uw1 ⇐ mw1
  until |PM(false negative) − α| + |PM(false positive) − β| < ε
  return M

D. Finding a sub-optimal decision matrix using P-SPRT

Above, we showed how to find the optimal decision matrix. The search is done offline for each combination of desired error bound, pattern size and prior, and not for each sought pattern. However, this process is time consuming. In this section we describe an algorithm that quickly finds a sub-optimal decision matrix.

Our goal is again to find the decision matrix that minimizes the expected running time, given bounds on the error probabilities (see Eq. 6). However, we now discard the option of performing an exact computation of the Hamming distance. For this problem, we present a sub-optimal solution based on Wald's Sequential Probability Ratio Test (SPRT) [26]. We call this test the “Prior based Sequential Probability Ratio Test”, P-SPRT.

The classical SPRT [26] is a test between two simple hypotheses, i.e. hypotheses that specify the population distribution completely. For example, let D be the random variable of the Hamming distance between a random window and a random pattern. A test of simple hypotheses is D = d1 against D = d2. However, we need to test the composite hypothesis D ≤ t against the composite hypothesis D > t. This problem is solved by using a prior on the distribution of the Hamming distance, D.

We now define the likelihood ratio. We denote by In(k, n) the event of sampling k non-similar corresponding pixels out of a total of n corresponding pixels, in a specific order. The likelihood ratio, λ(In(k, n)), is:

λ(In(k, n)) = P(In(k, n) | D > t) / P(In(k, n) | D ≤ t)
            = ( ∑_{d=t+1}^{|A|} P(In(k, n), D = d | D > t) ) / ( ∑_{d=0}^{t} P(In(k, n), D = d | D ≤ t) )
            = ( ∑_{d=t+1}^{|A|} P(In(k, n), D = d) · P(D ≤ t) ) / ( ∑_{d=0}^{t} P(In(k, n), D = d) · P(D > t) )        (8)

The P-SPRT samples both images as long as A < λ(In(k, n)) < B. When λ(In(k, n)) ≤ A, the P-SPRT stops and returns similar. When λ(In(k, n)) ≥ B, the P-SPRT stops and returns non-similar. Let the bounds on the acceptable errors be P(false positive) = β and P(false negative) = α. Wald's [26] approximation for A and B is: A = β/(1 − α), B = (1 − β)/α.
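A minimal sketch of the P-SPRT stopping rule with Wald's approximate thresholds; the likelihood_ratio callback stands in for Eq. (8), and the toy ratio used in the example is purely illustrative:

```python
def wald_thresholds(alpha, beta):
    """Wald's approximate SPRT thresholds: accept (similar) at or below A,
    reject (non-similar) at or above B."""
    return beta / (1 - alpha), (1 - beta) / alpha

def p_sprt(samples, likelihood_ratio, alpha, beta):
    """P-SPRT sketch. `samples` yields 0 (similar pixel) or 1 (non-similar);
    `likelihood_ratio(k, n)` stands in for the prior-based ratio of Eq. (8)."""
    A, B = wald_thresholds(alpha, beta)
    k = 0
    for n, s in enumerate(samples, start=1):
        k += s
        lam = likelihood_ratio(k, n)
        if lam <= A:
            return "similar"
        if lam >= B:
            return "non-similar"
    return "similar"  # exhausted all samples without crossing either bound
```

For example, with a hypothetical ratio that grows with the fraction of non-similar samples, a run of non-similar pixels quickly crosses B, while a run of similar pixels quickly drops below A.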


The optimal character of the SPRT was first proved by Wald and Wolfowitz [55]. For an accessible proof see Lehmann [56]. The proof is for simple hypotheses. However, replacing the likelihood ratio in the Lehmann proof with the prior based likelihood ratio (see Eq. 8) demonstrates that the P-SPRT is a sub-optimal solution to Eq. 6. Note that the sub-optimality of the P-SPRT is unlike the sub-optimality of the classical SPRT. The classical SPRT for simple hypotheses is sub-optimal for all priors. The proposed P-SPRT is sub-optimal only with respect to the given prior, P(D = d).

The SPRT and P-SPRT are sub-optimal because of the “overshoot” effect, i.e. because the sampling is of discrete quantities, and finding a P-SPRT with the desired error rates may not be possible. In our experiments, Wald's approximations gave slightly lower error rates and a slightly larger expected sample size. An improvement can be made by searching A and B for an error closer to the desired error bound. This can be done with O(|A|^2) time complexity and O(|A|) memory complexity for each step of the search (where |A| is the size of the mask). However, we have no bound on the number of steps that need to be made in the search. In practice, Wald's approximations give good results.

As we decline the option of performing an exact computation, the search for the optimal decision matrix is equivalent to a search for two monotonically increasing lines. The first is the line of acceptance (see Fig. 8, green S line); i.e. if the sequential algorithm touches this line it returns similar. The second is the line of rejection (see Fig. 8, red NS line); i.e. if the sequential algorithm touches this line it returns non-similar.

We now describe an algorithm (Alg. 5) that computes the line of rejection in O(|A|^2) time complexity and O(|A|) memory complexity. The computation of the line of acceptance is similar. For each number of samples, n, we test whether the height of the point of rejection can stay the same as it was for the last stage, or whether it should increase by one. For this purpose we need to compare the likelihood ratio with the threshold B. In order to compute the likelihood ratio fast, we keep a cache of the probability of being in the next rejection point and that the true distance is equal to d. The cache is stored in the array P[In(k, n), D = d] for each distance d. Thus its size is |A| + 1. The initialization of the cache is:

P[In(k = 1, n = 1), D = d] = P[In(k = 1, n = 1) | D = d] P[D = d] = (d/|A|) P[D = d]        (9)

The update of the cache, where b = 0 or 1, is:

P[In(k + b, n + 1), D = d] = P[D = d] P[In(k + b, n + 1) | D = d]
                           = P[D = d] P[In(k, n) | D = d] P[next b | In(k, n), D = d]
                           = P[In(k, n), D = d] P[next b | In(k, n), D = d]        (10)

where:

P[next 0 | In(k, n), D = d] = min(1, max(0, (|A| − d) − (n − k)) / (|A| − n))        (11)

P[next 1 | In(k, n), D = d] = min(1, max(0, d − k) / (|A| − n))                       (12)

For numerical stability, the cache in Alg. 5 can be normalized. In our implementation we store P(D = d | In(k, n)) instead of P(D = d, In(k, n)).

E. Implementation note

The fastest version of our algorithm is a version of the sequential algorithm that does not check its place in a decision matrix. Instead, it only checks whether the number of non-similar elements sampled so far is equal to the minimum row number in the decision matrix that is equal to NS. In other words,


Algorithm 5 computeRejectionLine(|A|, α, β, P[D])
  B ⇐ (1 − β)/α
  rejectionLine[0] ⇐ 1        {never reject after 0 samples}
  k ⇐ 1                       {try (1, 1) as the first rejection point}
  for d = 0 to |A| do
    P[In(k, n), D = d] ⇐ (d/|A|) P[D = d]
  for n = 1 to |A| do
    likelihoodRatio ⇐ ( ∑_{d=t+1}^{|A|} P(In(k, n), D = d) P(D ≤ t) ) / ( ∑_{d=0}^{t} P(In(k, n), D = d) P(D > t) )
    if likelihoodRatio > B then
      for d = 0 to |A| do
        P[In(k, n), D = d] ⇐ P[In(k, n), D = d] P[next 0 | In(k, n), D = d]
    else
      for d = 0 to |A| do
        P[In(k, n), D = d] ⇐ P[In(k, n), D = d] P[next 1 | In(k, n), D = d]
      k ⇐ k + 1
    rejectionLine[n] ⇐ k
  return rejectionLine

we simply check whether we have hit the upper rejection line (see Fig. 8, red line of NS). If we finish sampling all the corresponding elements and we have not touched the upper line, the window is unquestionably similar. In fact, we even compute the exact Hamming distance in such cases. There is a negligible increase in the average number of samples, as we do not stop on similar windows as soon as they are definitely similar. However, the event of similarity is so rare that the reduction in RI, the running time of processing each sample, reduces the total running time.
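The rejection-line-only variant described in this note can be sketched as follows (names are illustrative; rejection_line[n] is assumed to hold the smallest k that triggers rejection after n samples):

```python
import random

def rejection_line_test(pattern, window, coords, rejection_line, t, p=20):
    """Fast variant sketch: stop as soon as the count of non-similar
    samples, k, reaches rejection_line[n].  If the line is never touched we
    have sampled everything, so the exact distance k is compared with t."""
    order = random.sample(coords, len(coords))   # uniform, w/o replacement
    k = 0
    for n, xy in enumerate(order, start=1):
        if abs(pattern[xy] - window[xy]) > p:
            k += 1
        if k >= rejection_line[n]:
            return False       # non-similar (small false-negative risk)
    return k <= t              # exact Hamming distance of the full mask
```

The inner loop does a single comparison per sample, which is exactly the reduction in RI that the note argues pays for the few extra samples on similar windows.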

V. PRIOR

The proposed frameworks are Bayesian, i.e. they use a prior on the distribution of the distances between two natural images, P(D = d). The prior can be estimated, offline, by computing the exact distance between various patterns and windows. Another option is to use a non-informative prior, i.e. a uniform prior in which the probability of each possible distance is equal. Fig. 9 and Fig. 10 show that the true distribution of distances is not uniform. Nevertheless, Fig. 15 shows that even though we use an incorrect (uniform) prior to parameterize the algorithm, we obtain good results. It should be stressed that other fast methods assume certain characteristics of images. For example, Hel-Or and Hel-Or [20] assume that the first Walsh-Hadamard basis projections (according to their specific order) contain enough information to discriminate most images. Mascarenhas et al. [24], [25] assume that images are binary or Gaussian distributed. In addition, they assume that all pairs of similar images have exactly the same small distance, while all pairs of non-similar images have exactly the same large distance. By explicitly using a prior, our method is more general.

For each distance measure and pattern size, we estimated the prior using a database of 480 natural images. First, noise that changes the value of pixels completely was added to each image. To simulate such noise we chose a different image at random from the test database and replaced between 0% and 50% (the value was chosen uniformly), with replacement, of the original image pixels with pixels from the different image in the same relative place.
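The noise-simulation step above can be sketched as follows (images are represented as coordinate-to-intensity dicts for brevity; the helper name is our own):

```python
import random

def add_replacement_noise(img, other, rng=random):
    """Replace a uniformly chosen fraction (0%..50%) of the pixels of `img`,
    drawn with replacement, by the pixels of `other` at the same positions.
    Images are {(x, y): intensity} dicts with identical coordinate sets."""
    noisy = dict(img)
    n = int(rng.uniform(0.0, 0.5) * len(img))   # fraction chosen uniformly
    positions = list(img)
    for _ in range(n):                          # with replacement, as in the text
        xy = rng.choice(positions)
        noisy[xy] = other[xy]
    return noisy
```

Because positions are drawn with replacement, the number of actually changed pixels can be slightly below the drawn fraction, matching the sampling scheme described in the text.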

For each image we computed the set of distances between two patterns (each a not too smooth, randomly chosen 2D window from the image before the addition of the noise) and a sliding window over the noisy image. The prior that was used is a mixture model of this histogram and a uniform prior (with a very small probability for uniformity). We used a mixture model as we had almost no observations of small distances.

Fig. 9 shows that priors of the same Hamming distance for different pattern sizes are similar. Fig. 10 shows that the distances in color images are higher. In addition, it shows that as the distance measure becomes more invariant, the distances are smaller.


Fig. 9. Estimated cumulative distributions of priors of the Thresholded Absolute Difference Hamming distance with threshold 20 (pixels with an intensity difference greater than 20 are considered non-similar) for pattern sizes: (a) 15 × 15 (b) 30 × 30 (c) 45 × 45 (d) 60 × 60. Note that the shapes of the priors are similar.


Fig. 10. Estimated cumulative distributions of priors of the Thresholded l2 norm in L*a*b color space Hamming distance for 60 × 60 patterns, with thresholds: (a) 100 (b) 300 (c) 500 (d) 1000. Note that the distances in color images are higher. Also note that as the distance measure becomes more invariant (with a higher threshold), the distances are smaller.

VI. EXPERIMENTAL RESULTS

The proposed frameworks were tested on real images and patterns. The results show that the sequential algorithm is fast and accurate, with or without noise.

We set the false positive error (the percentage of non-similar windows that the algorithm returns as similar) to zero in all experiments. Setting it to a higher value decreases the running time mostly for similar windows. As it is assumed that similarity between pattern and image is a rare event, the speedup caused by a higher bound on the false positive is negligible. We set the false negative error bound (the percentage of similar windows that the algorithm is expected to classify as non-similar) to 0.1%; i.e. out of 1000 similar windows, only one is expected to be classified as non-similar.

A typical pattern matching task is shown in Fig. 2. A non-rectangular pattern of 2197 pixels was sought in a sequence of 14 640x480 pixel frames. We searched for windows with a Thresholded Absolute Difference Hamming distance lower than 0.4 × 2197, i.e. less than 40% noise that changes the value of pixels completely, such as out of plane rotation, shading, spectral reflectance, occlusion, etc. Two pixels were considered non-similar if their absolute intensity difference was greater than 20, i.e. p = 20. The sequential algorithm was parameterized with P-SPRT (see Section IV-D), a uniform prior and a false negative error bound of 0.1%. Using the parameterized sequential algorithm, the pattern is found in 9 out of the 11 frames in which it was present, with an average of only 19.70 pixels examined per window instead of the 2197 needed for the exact distance computation. On a Pentium 4 3GHz processor, detection of the pattern proceeds at 0.022 seconds per frame. The false positive error rate was 0%. The false negative


TABLE II
SUMMARY OF FIGURE RESULTS

Fig. | (a) Distance type | (b) |A| = Mask Size | (c) Max Diff (%) | (d) False Negative (%) | (e) Average Elements Sampled | (f) Offline Time (seconds) | (g) Online Time (seconds)
2    | 20TAD(1)          | 2197                | 40               | 0.25                   | 19.70                        | 0.067                      | 0.022
3    | 20TAD(1)          | 1089                | 40               | 0.39                   | 12.07                        | 0.018                      | 0.019
4    | MR(2)             | 631                 | 25               | 0.07                   | 35.28                        | 0.009                      | 0.021
5    | MR(2)             | 9409                | 25               | 0.17                   | 39.98                        | 1.219                      | 0.037
6    | LD-20TAD(3)       | 714                 | 5                | 0.034                  | 17.23                        | 0.007                      | 0.064

(a) Distance types: 1) 20TAD - Thresholded Absolute Difference, with a threshold (p) of 20. 2) MR - Monotonic Relations. 3) LD-20TAD - Local Deformations variant of Thresholded Absolute Difference, with a threshold (p) of 20.
(b) Number of pixels in Thresholded Absolute Difference distances, or number of pairs of pixels in the Monotonic Relations distance.
(c) Maximum percentage of pixels, or pairs of pixels, that can be different in similar windows. For example, in Fig. 2, similar windows have a Hamming distance of less than (40/100) × 2197 = 878.
(d) The false negative error rate (percentage of similar windows that the algorithm returned as non-similar). For example, in Fig. 2, on average out of 10000 similar windows, 25 were missed. Note that due to image smoothness, there were several similar windows in each image near each sought object. The errors were mostly due to missing one of these windows.
(e) Average number of pixels sampled in Thresholded Absolute Difference distances, or average number of pairs of pixels sampled in Monotonic Relations distances.
(f) Running time of the parameterization of the sequential algorithm. In addition, in Monotonic Relations distances it also includes the running time of finding the pairs of pixels that belong to edges.
(g) Running time of pattern detection using the sequential algorithm, where each image is 640x480 pixels in size.

error rate was 0.25%. Note that due to image smoothness, there are several similar windows in each frame near the sought object. The errors were mostly due to missing one of these windows. Although we used an incorrect (uniform) prior to parameterize the algorithm, we obtained excellent results. Other distances such as cross correlation (CC), normalized cross correlation (NCC), l1 and l2 yielded poor results even though they were computed exactly (the computation took much longer).

More results are given in Figs. 2, 3, 4, 5 and 6. All of these results are on 640x480 pixel images and use the sequential algorithm that was parameterized with P-SPRT (see Section IV-D), a uniform prior and a false negative error bound of 0.1%. These results are also summarized in Table II. Note that the parameters (the pixel similarity threshold p and the relative image similarity threshold t/|A|) are the same for each kind of distance. These parameters were chosen as they experimentally yield good performance for images. They do not necessarily give the best results. For example, on Fig. 3, using the Thresholded Absolute Difference distance with a pixel threshold p of 100 and an image similarity threshold t of 0, the sequential algorithm ran in only 0.013 seconds. The average number of pixels examined per window was only 2.85 instead of the 1089 needed for the exact distance computation. The false negative error rate was 0%.

To illustrate the performance of Bayesian sequential sampling, we also conducted extensive random tests. The random tests were conducted mainly to illustrate the characteristics of the sequential algorithm and to compare its parameterization methods. A test database (different from the training database that was used to estimate priors) of 480 natural images was used. We consider similar windows as windows with a Hamming distance smaller than or equal to 50% of their size; e.g. a 60 × 60 window is considered similar to a 60 × 60 pattern if the Hamming distance between them is smaller than or equal to 1800. In all of the random tests, we set RI = 1 and RE(n) = |A| − n; i.e. we tried to minimize the average number of samples taken. For comparison we also developed an optimal fixed size sampling algorithm, fixedsize (see Appendix I).

Each test of the fixedsize algorithm or the sequential algorithm in Figs. 13, 14 and 15 was conducted using a different combination of members of the Image Hamming Distance Family and different sizes of patterns. For each such combination a prior was estimated (see Section V). In order to parameterize the fixedsize and sequential algorithms, we used either the estimated prior or a uniform prior.

Each test of the parameterized algorithms was conducted by performing 9600 iterations (20 for each image) as follows:

• A random, not too smooth, 2D window pattern was chosen from one of the images, Im, from the test database.

• Noise that changes the value of pixels completely was added to the image Im. To simulate such noise we chose a different image at random from the test database and replaced between 0% and 50% (the value was chosen uniformly), with replacement, of the original image (i.e. Im) pixels with pixels from the different image in the same relative place.

• The pattern was sought in the noisy image, using the parameterized sequential algorithm or the parameterized fixedsize algorithm.

In each test the false negative error rate and the average number of pixels examined per window were calculated. Overall, the results can be summarized as follows:

1) Even with very noisy images the sequential algorithm is very fast and accurate. For example, the average number of pixels sampled for pattern matching on 60 × 60 patterns with additive noise of up to 20 (each pixel gray value change can range from -20 to +20) and noise that changes the value of pixels completely of up to 50% was only 92.9, instead of 3600. The false negative error rate was only 0.09% (as mentioned above, the false positive error rate was always 0%).

2) The sequential algorithm is much faster than the fixedsize algorithm, with the same error rates. In addition, the sequential algorithm is usually less sensitive to incorrect priors (see Fig. 13).

3) The performance of the sub-optimal solution, P-SPRT, is good (see Fig. 14).

4) The average number of pixels examined per window is slightly smaller with the uniform prior. However, the error rate is higher (although still small). Thus, there is not a substantial difference in performance when using an incorrect (uniform) prior (see Fig. 15).

To further illustrate the robustness of the method we conducted another kind of experiment. Five image transformations were evaluated: small rotation, small scale change, image blur, JPEG compression and illumination. The names of the datasets used are rotation, scale, blur, jpeg and light, respectively. The blur, jpeg and light datasets were from the Mikolajczyk and Schmid paper [13]. Our method is robust to small but not large geometrical transforms. Thus, it did not perform well on the geometrical change datasets from the Mikolajczyk and Schmid paper [13]. We created two datasets with small geometrical transforms: a scale dataset that contains 22 images with an artificial scale change from 0.9 to 1.1 in jumps of 0.01, and a rotation dataset that contains 22 images with an artificial in-plane rotation from -10 to +10 degrees in jumps of 1 degree (see for example Fig. 12).

For each collection, ten rectangular patterns were chosen from the image with no transformation. The pairs that were used in the mask of each pattern were pairs of pixels belonging to edges, i.e. pixels that had a neighbor pixel where the absolute intensity value difference was greater than 80. Two pixels, (x2, y2), (x1, y1), are considered neighbors if their l∞ distance, max(|x1 − x2|, |y1 − y2|), is smaller or equal to 2. We searched for windows with a Monotonic Relations Hamming distance lower than 0.25 × |A|. In each image we considered only the window with the minimum distance as similar, because we knew that the pattern occurred only once in the image. As in the figure experiments, the sequential algorithm was parameterized using P-SPRT (see Section IV-D) with a uniform prior and a false negative error bound of 0.1%. We repeated each search of a pattern in an image 1000 times.

We defined two new notions of performance: miss detection error rate and false detection error rate. As we know the true homographies between the images, we know where the pattern pixels are in the transformed image. We denote a correct match as one that covers at least 80% of the transformed pattern pixels. A false match is one that covers less than 80% of the transformed pattern pixels. Note that there is also an event of no detection at all, if the sequential algorithm does not find any window with a Monotonic Relations Hamming distance lower than 0.25 × |A|. The miss detection error rate is the percentage of searches of a pattern in an image that does not yield a correct match. The false detection error rate is the percentage of searches of a pattern in an image that yields a false match. Note that in the random tests that illustrated the performance of the Bayesian sequential sampling, it was not possible to use these error notions. In those tests we used a large number of patterns that were chosen randomly, thus we could not guarantee that the patterns did not occur more than once in the test images.

In the light and jpeg tests, the performance was perfect; i.e. 0% miss detection rate and 0% false detection rate. In the blur test, only one pattern was not found correctly in the most blurred image (see Fig. 12). The miss detection rate and false detection rate for this specific case was 99.6%. For all other patterns and images in the blur test, the miss detection rate and false detection rate was 0%. In the scale test, there was only one pattern with false detection, in two images with scale 0.9 and 0.91. In the rotation test, there was only one pattern with false detection, in images with rotation smaller than -2° or bigger than +2°. Miss detection rates in the scale and rotation tests (see Fig. 11) were dependent on the pattern. If the scale change or rotation was not too big, the pattern was found correctly.

The average number of pairs of pixels that the sequential algorithm sampled per window was not larger than 45 in any of the above tests. The average was 29.38 and the standard deviation was 4.22. In general, the number of samples decreased with image smoothness; e.g. it decreased with image blur, lack of light and JPEG compression (see for example Fig. 11).


Fig. 11. (a) Miss detection error rates on the scale test. (b) Miss detection error rates on the rotation test. (c) Average number of pairs of pixels that the sequential algorithm sampled per window in the jpeg test.

VII. CONCLUSIONS

This paper introduced the "Image Hamming Distance Family". We also presented a Bayesian framework for sequential hypothesis testing on finite populations that designs optimal sampling algorithms. Finally, we detailed a framework that quickly designs a sub-optimal sampling algorithm. We showed that the combination of an optimal sampling algorithm and members of the Image Hamming Distance Family gives a robust, real time, pattern matching method.

Extensive random tests show that the sequential algorithm performance is excellent. The sequential algorithm is much faster than the fixed size algorithm with the same error rates. In addition, the sequential algorithm is less sensitive to incorrect priors. The performance of the sub-optimal solution, P-SPRT, is good. It is noteworthy that performance using an incorrect (uniform) prior to parameterize the sequential algorithm is still quite good.

The technique explained in this paper was described in an image pattern matching context. However, we emphasize that this is an example application. Sequential hypothesis tests on finite populations are


Fig. 12. (a) The single false detection event on the blur test. (b) An example of detection on the rotation test. The image is artificially in-plane rotated by 5°.


Fig. 13. Comparing the fixed size algorithm with the sequential algorithm. (a),(e) Results on 30 × 30 patterns where the algorithms were parameterized using an estimated prior. (b),(f) Results on 30 × 30 patterns where the algorithms were parameterized using a uniform prior. (c),(g) Results on 60 × 60 patterns where the algorithms were parameterized using an estimated prior. (d),(h) Results on 60 × 60 patterns where the algorithms were parameterized using a uniform prior. The sequential algorithm is much faster than the fixed size algorithm, with the same error rates. In addition, the sequential algorithm is less sensitive to incorrect priors. In (a)-(d) we see that the average number of pixels examined per window was smaller using the sequential algorithm. In (e)-(h) we see that the error rate was the same in both algorithms. We can also see that the sequential algorithm is less sensitive to incorrect priors (note that when the pixel similarity threshold is equal to 0, 3 and 5, the number of samples using the fixed size algorithm increases with an incorrect prior). All tests were conducted using the Thresholded Absolute Difference Hamming distance with various pixel thresholds.



Fig. 14. Comparing the two parameterizations of the sequential algorithm: optimal and P-SPRT. All parameterizations were done with the estimated prior. (3a),(3e) Results for Thresholded Absolute Difference on 30 × 30 patterns. (3b),(3f) Results for Thresholded l2 norm in L*a*b color space on 30 × 30 patterns. (6c),(6g) Results for Thresholded Absolute Difference on 60 × 60 patterns. (6d),(6h) Results for Thresholded l2 norm in L*a*b color space on 60 × 60 patterns. P-SPRT samples a little more, with slightly smaller error rates.

used in quality control, sequential mastery testing and possibly more fields. Thus the method can be used as is to produce optimal sampling schemes in these fields.

The project homepage is at: http://www.cs.huji.ac.il/∼ofirpele/hs

REFERENCES

[1] D. I. Barnea and H. F. Silverman, "A class of algorithms for fast digital image registration," IEEE Trans. Computers, vol. 21, no. 2, pp. 179–186, Feb. 1972.

[2] J. Matas and O. Chum, "Randomized RANSAC with sequential probability ratio test," in Proc. IEEE International Conference on Computer Vision (ICCV), vol. 2, October 2005, pp. 1727–1732.

[3] J. Sochman and J. Matas, "WaldBoost - learning for time constrained sequential detection," in Proc. of Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, June 2005, pp. 150–157.

[4] P. E. Anuta, "Spatial registration of multispectral and multitemporal digital imagery using fast Fourier transform," IEEE Trans. Geoscience Electronics, vol. 8, pp. 353–368, 1970.

[5] J. P. Lewis, "Fast normalized cross-correlation," Sept. 02 1995. [Online]. Available: http://www.idiom.com/∼zilla/Work/nvisionInterface/nip.pdf

[6] A. J. Ahumada, "Computational image quality metrics: A review," Society for Information Display International Symposium, vol. 24, pp. 305–308, 1998.

[7] B. Girod, "What's wrong with the mean-squared error?" in Digital Images and Human Vision. MIT Press, 1993, ch. 15.

[8] A. M. Eskicioglu and P. S. Fisher, "Image quality measures and their performance," IEEE Trans. Communications, vol. 43, no. 12, pp. 2959–2965, 1995.

[9] S. Santini and R. C. Jain, "Similarity measures," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 21, no. 9, pp. 871–883, Sept. 1999.

[10] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Image Understanding Workshop, 1981, pp. 121–130.

[11] J. Matas, O. Chum, M. Urban, and T. Pajdla, "Robust wide baseline stereo from maximally stable extremal regions," in Proceedings of the British Machine Vision Conference, vol. 1, London, UK, September 2002, pp. 384–393.

[12] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vision, vol. 60, no. 2, pp. 91–110, 2004.

[13] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615–1630, 2005.



Fig. 15. Comparing optimal parameterization of the sequential algorithm with the estimated prior against optimal parameterization with a uniform prior. (3a),(3e) Results for Thresholded Absolute Difference on 30 × 30 patterns. (3b),(3f) Results for Thresholded l2 norm in L*a*b color space on 30 × 30 patterns. (6c),(6g) Results for Thresholded Absolute Difference on 60 × 60 patterns. (6d),(6h) Results for Thresholded l2 norm in L*a*b color space on 60 × 60 patterns. In (3a)-(3b) and in (6c)-(6d) we see that the average number of pixels examined per window was slightly smaller with the uniform prior. However, in (3e)-(3f) and (6g)-(6h) we see that the error rate was higher with the uniform prior. Although higher, the error rate is still small. To conclude, the performance using an incorrect (uniform) prior is still quite good.

[14] H. Bay, T. Tuytelaars, and L. J. V. Gool, "SURF: Speeded up robust features," in ECCV (1), 2006, pp. 404–417.

[15] V. Lepetit and P. Fua, "Keypoint recognition using randomized trees," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 9, pp. 1465–1479, 2006.

[16] D. Keren, M. Osadchy, and C. Gotsman, "Antifaces: A novel, fast method for image detection," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 7, pp. 747–761, 2001.

[17] S. Romdhani, P. H. S. Torr, B. Scholkopf, and A. Blake, "Computationally efficient face detection," in International Conference on Computer Vision, 2001, pp. II:695–700. [Online]. Available: http://dx.doi.org/10.1109/ICCV.2001.937694

[18] P. Viola and M. J. Jones, "Rapid object detection using a boosted cascade of simple features," in IEEE Computer Vision and Pattern Recognition (CVPR), 2001, pp. I:511–518.

[19] S. Avidan and M. Butman, "The power of feature clustering: An application to object detection," Neural Information Processing Systems (NIPS), Dec. 2004. [Online]. Available: http://www.nips.cc/

[20] Y. Hel-Or and H. Hel-Or, "Real-time pattern matching using projection kernels," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 9, pp. 1430–1445, Sept. 2005. [Online]. Available: http://dx.doi.org/10.1109/TPAMI.2005.184

[21] L. Liang, C. Liu, Y.-Q. Xu, B. Guo, and H.-Y. Shum, "Real-time texture synthesis by patch-based sampling," ACM Trans. Graph., vol. 20, no. 3, pp. 127–150, 2001.

[22] S.-H. Cha, "Efficient algorithms for image template and dictionary matching," J. Math. Imaging Vis., vol. 12, no. 1, pp. 81–90, 2000.

[23] B. Zitova and J. Flusser, "Image registration methods: a survey," Image and Vision Computing, vol. 21, no. 11, pp. 977–1000, Oct. 2003.

[24] J. A. G. Pereira and N. D. A. Mascarenhas, "Digital image registration by sequential analysis," Computers and Graphics, vol. 8, pp. 247–253, 1984.

[25] N. D. A. Mascarenhas and G. J. Erthal, "Image registration by sequential tests of hypotheses: The relationship between Gaussian and binomial models," Computers and Graphics, vol. 16, no. 3, pp. 259–264, 1992.

[26] A. Wald, Sequential Analysis. Wiley, New York, 1947.

[27] A. B. Lee, D. Mumford, and J. Huang, "Occlusion models for natural images: A statistical study of a scale-invariant dead leaves model," International Journal of Computer Vision, vol. 41, no. 1/2, pp. 35–59, 2001.

[28] E. Simoncelli and B. Olshausen, "Natural image statistics and neural representation," Annual Review of Neuroscience, vol. 24, pp. 1193–1216, May 2001.


[29] D. J. Field, "Relations between the statistics of natural images and the response properties of cortical cells," J. Opt. Soc. Am. A, vol. 4, no. 12, pp. 2379–2394, 1987.

[30] J. G. Daugman, "Entropy reduction and decorrelation in visual coding by oriented neural receptive fields," IEEE Trans. Biomedical Engineering, vol. 36, no. 1, pp. 107–114, 1989.

[31] D. Siegmund, Sequential Analysis: Tests and Confidence Intervals. Springer-Verlag, 1985.

[32] Y.-W. Huang, C.-Y. Chen, C.-H. Tsai, C.-F. Shen, and L.-G. Chen, "Survey on block matching motion estimation algorithms and architectures with new results," J. VLSI Signal Process. Syst., vol. 42, no. 3, pp. 297–320, 2006.

[33] Y.-L. Chan and W.-C. Siu, "New adaptive pixel decimation for block motion vector estimation," IEEE Trans. Circuits Syst. Video Techn., vol. 6, no. 1, 1996.

[34] Y. Wang, Y. Wang, and H. Kuroda, "A globally adaptive pixel-decimation algorithm for block-motion estimation," IEEE Trans. Circuits Syst. Video Techn., vol. 10, no. 6, pp. 1006–1011, 2000.

[35] M. Bierling, "Displacement estimation by hierarchical blockmatching," in Proc. SPIE, Visual Communications and Image Processing, Jan. 1988.

[36] A. Zaccarin and B. Liu, "New fast algorithms for the estimation of block motion vectors," IEEE Trans. Circuits Syst. Video Techn., vol. 3, no. 2, pp. 148–157, 1993.

[37] B. Liu and A. Zaccarin, "Fast algorithms for block motion estimation," in IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, 1992, pp. 449–452.

[38] Y. Amit, 2D Object Detection and Recognition: Models, Algorithms, and Networks. MIT Press, 2002.

[39] R. Zabih and J. Woodfill, "Non-parametric local transforms for computing visual correspondence," in ECCV, 1994, pp. 151–158.

[40] B. Cyganek, "Comparison of nonparametric transformations and bit vector matching for stereo correlation," in IWCIA, 2004, pp. 534–547.

[41] Q. Lv, M. Charikar, and K. Li, "Image similarity search with compact data structures," in CIKM. New York, NY, USA: ACM Press, 2004, pp. 208–217.

[42] M. Ionescu and A. Ralescu, "Fuzzy Hamming distance in a content-based image retrieval system," in FUZZ-IEEE, 2004.

[43] A. Bookstein, V. A. Kulyukin, and T. Raita, "Generalized Hamming distance," Inf. Retr., vol. 5, no. 4, pp. 353–375, 2002.

[44] A. Abhyankar, L. A. Hornak, and S. Schuckers, "Biorthogonal-wavelets-based iris recognition," A. K. Jain and N. K. Ratha, Eds., vol. 5779, no. 1. SPIE, 2005, pp. 59–67. [Online]. Available: http://link.aip.org/link/?PSI/5779/59/1

[45] P. Hough, "A method and means for recognizing complex patterns," U.S. Patent 3,069,654, 1962.

[46] S. D. Blostein and T. S. Huang, "Detecting small, moving objects in image sequences using sequential hypothesis testing," IEEE Trans. Signal Processing, vol. 39, no. 7, pp. 1611–1629, 1991.

[47] D. Shaked, O. Yaron, and N. Kiryati, "Deriving stopping rules for the probabilistic Hough transform by sequential analysis," Comput. Vis. Image Underst., vol. 63, no. 3, pp. 512–526, 1996.

[48] A. Amir and M. Lindenbaum, "A generic grouping algorithm and its quantitative analysis," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 2, pp. 168–185, 1998.

[49] M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Commun. ACM, vol. 24, no. 6, pp. 381–395, 1981.

[50] H. Gharavi and M. Mills, "Blockmatching motion estimation algorithms - new results," IEEE Trans. Circuits Syst. Video Techn., vol. 37, no. 5, pp. 649–651, 1990.

[51] G. Wyszecki and W. S. Stiles, Color Science: Concepts and Methods, Quantitative Data and Formulae. Wiley, 1982.

[52] T. Darrell and M. Covell, "Correspondence with cumulative similarity transforms," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 222–227, 2001.

[53] R. Brunelli and T. Poggio, "Face recognition: Features versus templates," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, no. 10, pp. 1042–1052, 1993.

[54] Z. Govindarajulu, Sequential Statistical Procedures. Academic Press, 1975, pp. 534–536.

[55] A. Wald and J. Wolfowitz, "Optimum character of the sequential probability ratio test," Ann. Math. Stat., vol. 19, pp. 326–339, 1948.

[56] E. L. Lehmann, Testing Statistical Hypotheses. John Wiley & Sons, 1959, pp. 104–110.


APPENDIX I
FIXED SIZE FRAMEWORK

We first present the fixed size algorithm that tests for similarity using a fixed size sample. Then we evaluate its performance. Finally we show how to find the optimal parameters for the fixed size algorithm.

A. The fixed size algorithm

The fixed size algorithm has threshold parameters l, u and a fixed sample size n. The framework computes the optimal l, u and n offline. Then, the algorithm can quickly decide whether a pattern and a window are similar; i.e. whether their Hamming distance is smaller than or equal to the image distance threshold, t.

The algorithm samples n corresponding pixels from the pattern and the window, computes their Hamming distance and decides according to the result whether to return similar, non-similar, or to compute the exact distance.

Algorithm 6 fixedsize_{l,u,n}(pattern, window)
  k ⇐ 0
  for i = 1 to n do
    sample uniformly and without replacement coordinates (x, y) from A
    k ⇐ k + sim(pattern, window, (x, y))   {add 1 if the pixels are non-similar}
  if k ≤ l then
    return similar
  if k ≥ u then
    return non-similar
  return (HammingDistance_A(pattern, window) ≤ t)
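As a rough sketch of Algorithm 6 (our own transcription, not the paper's implementation; `is_similar` and `exact_distance` stand in for the per-pixel sim test and the full Hamming distance computation):

```python
import random

def fixed_size_test(pattern, window, coords, l, u, n, t, is_similar, exact_distance):
    """Sample n coordinates without replacement and count non-similar pairs.
    Decide similar/non-similar by the thresholds l, u; fall back to the
    exact Hamming distance when the count lands strictly between them."""
    k = 0
    for (x, y) in random.sample(coords, n):
        # add 1 for every non-similar pair of corresponding pixels
        k += 0 if is_similar(pattern, window, (x, y)) else 1
    if k <= l:
        return True              # similar
    if k >= u:
        return False             # non-similar
    return exact_distance(pattern, window) <= t
```

With l = 1 and u = 8, an identical window stops at k = 0 (similar) and a completely different window stops as soon as 8 non-similar samples accumulate.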

B. Evaluating performance for fixed parameters l, u, n

The performance of the algorithm is defined by its expected running time and its error probabilities. The computation is similar to the one in the sequential algorithm (see Section IV):

P_{l,u,n}(\text{false negative}) = \frac{\sum_{k=u}^{n} \binom{n}{k} \Omega_S[k,n]}{P(D \le t)}    (13)

P_{l,u,n}(\text{false positive}) = \frac{\sum_{k=0}^{l} \binom{n}{k} \Omega_{NS}[k,n]}{P(D > t)}    (14)

E_{l,u,n}(\text{running time}) = \#\text{samples} \cdot R_I + P(\text{compute exact}) \, R_E(n)    (15)
= n R_I + \sum_{k=l+1}^{u-1} \binom{n}{k} \bigl(\Omega_S[k,n] + \Omega_{NS}[k,n]\bigr) R_E(n)

C. Finding optimal parameters l, u, n for the algorithm

We first find optimal thresholds l, u for a given sample size n. Our goal is to minimize the expected running time given bounds on the error probabilities, α, β:

\begin{aligned}
\arg\min_{l,u} \; & E_{l,u}(\text{running time}) \\
\text{s.t.:} \; & P_{l,u}(\text{false negative}) \le \alpha \\
& P_{l,u}(\text{false positive}) \le \beta \\
& \text{fixed number of samples } n
\end{aligned}    (16)


P_{l,u}(false negative) (Eq. 13) monotonically decreases with the threshold u (the number of non-negative summands decreases). P_{l,u}(false positive) (Eq. 14) monotonically increases with the threshold l (the number of non-negative summands increases). E_{l,u}(running time) (Eq. 15) monotonically increases with the threshold u and monotonically decreases with the threshold l, since the range l + 1, …, u − 1 over which the exact distance is computed widens with u and narrows with l. Thus, to minimize the running time we want l to be as large as possible and u to be as small as possible, subject to the error bounds.

The algorithm that chooses the optimal u, Alg. 7, starts by taking u = n + 1 and decreases it until P_{l,u}(false negative) becomes too high. The algorithm that chooses the optimal l, Alg. 8, starts by taking l = −1 and increases it until P_{l,u}(false positive) becomes too high.

Algorithm 7 opt_u(n, t, α, |A|, P)
  nErr ⇐ 0
  nChooseK ⇐ 1
  for k = n to 0 do
    nErr ⇐ nErr + nChooseK · Ω_S[k, n] / P(D ≤ t)
    if nErr > α then
      return k + 1
    nChooseK ⇐ nChooseK · k / (n − k + 1)
  return k

Algorithm 8 opt_l(n, t, β, |A|, P)
  nErr ⇐ 0
  nChooseK ⇐ 1
  for k = 0 to n do
    nErr ⇐ nErr + nChooseK · Ω_NS[k, n] / P(D > t)
    if nErr > β then
      return k − 1
    nChooseK ⇐ nChooseK · (n − k) / (k + 1)
  return k
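Algorithms 7 and 8 can be transcribed directly. In this sketch (function and variable names are our own) `omega_S` and `omega_NS` hold the columns Ω_S[·, n] and Ω_NS[·, n] for the given n, and `p_le_t`, `p_gt_t` are P(D ≤ t) and P(D > t); the binomial coefficient is updated incrementally as in the pseudocode:

```python
def opt_u(n, alpha, omega_S, p_le_t):
    """Alg. 7: smallest safe upper threshold u. Walks k from n down,
    accumulating C(n,k) * Omega_S[k,n] / P(D <= t), and stops when the
    false negative probability would exceed alpha."""
    n_err = 0.0
    n_choose_k = 1.0                      # C(n, n)
    for k in range(n, -1, -1):
        n_err += n_choose_k * omega_S[k] / p_le_t
        if n_err > alpha:
            return k + 1
        n_choose_k *= k / (n - k + 1)     # C(n, k) -> C(n, k - 1)
    return 0

def opt_l(n, beta, omega_NS, p_gt_t):
    """Alg. 8: largest safe lower threshold l. Walks k from 0 up,
    accumulating C(n,k) * Omega_NS[k,n] / P(D > t), and stops when the
    false positive probability would exceed beta."""
    n_err = 0.0
    n_choose_k = 1.0                      # C(n, 0)
    for k in range(n + 1):
        n_err += n_choose_k * omega_NS[k] / p_gt_t
        if n_err > beta:
            return k - 1
        n_choose_k *= (n - k) / (k + 1)   # C(n, k) -> C(n, k + 1)
    return n
```

For example, with n = 2 and C(2,k)·Ω[k,2] mass {0.25, 0.5, 0.25}, a tighter error bound forces the degenerate thresholds u = 3 and l = −1, while a looser bound admits u = 2 and l = 0.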

In order to find the optimal n we compute the optimal l, u and the expected running time for each n = 1, …, |A| − 1. Finally, we choose the n that minimizes the running time. Note that the search can be terminated as soon as the current minimum of the running time is smaller than nR_I.

The intermediate sums Ω_S[k, n] and Ω_NS[k, n] (see Eq. 2) are computed for all possible k and n with a dynamic programming algorithm with a time complexity of O(|A|^3). The algorithms that compute the optimal l, u for each n (Algs. 7, 8) have a time complexity of O(|A|). These algorithms run a maximum of |A| times. Thus, finding the optimal n, l, u has a time complexity of O(|A|^3). It should be noted that the search for the optimal parameters is done offline. The user can then employ the fixed size algorithm, parameterized with the optimal l, u, n, to quickly detect patterns in images.

APPENDIX II
HOUGH TRANSFORM COMPUTATION OF HAMMING DISTANCE

For simplicity we show how to use the Hough transform [45] to compute the Thresholded Absolute Difference Hamming distance (see Section III-B) between a pattern and all windows in a 256 gray level image. The generalization to other members of the Image Hamming Distance Family is easy. We also analyze its time complexity.

List of symbols:


• A = mask that contains coordinates of pixels. |A| is the size of this mask.
• R_Im = number of rows in the large image.
• C_Im = number of columns in the large image.
• L = a 256-entry array that contains, for each gray value, a list of all pixel coordinates in the pattern that are similar to that gray value.
• p = pixel similarity threshold of the Thresholded Absolute Difference Hamming distance.
• H = the Thresholded Absolute Difference Hamming distance map, i.e. H[r, c] is the Thresholded Absolute Difference Hamming distance between the pattern and the window of the image Im whose top left pixel is [r, c].

Algorithm 9 HoughThresholdedAbsoluteDifferenceHammingDistance(pattern, image, p)
  L[0 … 255] ⇐ empty lists of indices
  for (x, y) ∈ A do
    for g = (pattern[x, y] − p) to (pattern[x, y] + p) do
      L[g].insert([x, y])
  H[0 … R_Im, 0 … C_Im] ⇐ |A|
  for r = 1 to R_Im do
    for c = 1 to C_Im do
      for it = L[image[r, c]].begin to L[image[r, c]].end do
        H[r − it.r, c − it.c] ⇐ H[r − it.r, c − it.c] − 1

The first stage of Algorithm 9, which computes the array of lists L, has a time complexity of O(|A|p). The second stage, which computes the Hamming distance map H, has an expected time complexity of O(R_Im C_Im (|A| − E[D])), where D is the random variable of the Hamming distance. The total expected time complexity is O(|A|p + R_Im C_Im (|A| − E[D])). The average expected time complexity per window is O(|A|p / (C_Im R_Im) + |A| − E[D]). Since |A|p / (C_Im R_Im) is usually negligible, the average expected time complexity per window is O(|A| − E[D]).
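A direct, unoptimized transcription of Algorithm 9 (the function and variable names are our own; the pattern is passed as a list of coordinates and their gray values):

```python
import numpy as np

def hough_tad_hamming(pattern_vals, coords, image, p):
    """Thresholded Absolute Difference Hamming distance between the pattern
    and every window of `image`, via the voting scheme of Algorithm 9.
    pattern_vals[i] is the gray value of the pattern pixel at coords[i]."""
    # L[g] lists the pattern coordinates whose gray value is within p of g
    L = [[] for _ in range(256)]
    for (x, y), v in zip(coords, pattern_vals):
        for g in range(max(0, v - p), min(255, v + p) + 1):
            L[g].append((x, y))
    R, C = image.shape
    # start every window at the maximal distance |A| and vote matches down
    H = np.full((R, C), len(coords), dtype=np.int32)
    for r in range(R):
        for c in range(C):
            for (x, y) in L[image[r, c]]:
                if 0 <= r - x < R and 0 <= c - y < C:
                    H[r - x, c - y] -= 1
    return H
```

For a pattern cut directly from the image, the distance at the true offset is 0, while windows sharing no similar pixels keep the maximal distance |A|.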

APPENDIX III
COMPUTATION OF PROBABILITIES OF THE SEQUENTIAL ALGORITHM

List of symbols:
• M = sequential algorithm decision matrix. M[k, n] is the algorithm decision after sampling k non-similar corresponding pixels out of a total of n corresponding pixels. The decision can be NS = return non-similar, S = return similar, E = compute the exact distance and return the true answer, or C = continue sampling. See the graphical representation in Fig. 8.
• D = random variable of the Hamming distance.
• t = image similarity threshold; i.e. if the Hamming distance of two images is smaller than or equal to t, then the images are considered similar. Otherwise, the images are considered non-similar.
• Ψ_M[k, n] = number of paths from the point (0, 0) to the point (k, n) that do not touch a stopping point (S, NS, E) in Fig. 8 on page 11.
• In(k, n) = the event of sampling k non-similar corresponding pixels out of a total of n corresponding pixels, in a specific order. See Eq. 1 on page 11.
• Ω_S[k, n], Ω_NS[k, n] = intermediate sums defined in Eq. 2 on page 12.

P_M(\text{false negative}) = P_M(\text{return non-similar} \mid \text{images are similar})    (17)
= P_M(\text{return non-similar} \mid D \le t)    (18)
= \sum_{d=0}^{|A|} P_M(\text{return non-similar}, D = d \mid D \le t)    (19)
= \frac{1}{P(D \le t)} \sum_{d=0}^{t} P_M(\text{return non-similar}, D = d)    (20)
= \frac{1}{P(D \le t)} \sum_{d=0}^{t} \; \sum_{(k,n): M[k,n]=NS} \Psi[k,n] \, P(In(k,n), D = d)    (21)
= \frac{1}{P(D \le t)} \sum_{(k,n): M[k,n]=NS} \Psi[k,n] \sum_{d=0}^{t} P(In(k,n), D = d)    (22)
= \frac{1}{P(D \le t)} \sum_{(k,n): M[k,n]=NS} \Psi[k,n] \, \Omega_S[k,n]    (23)

P_M(\text{false positive}) = P_M(\text{return similar} \mid \text{images are non-similar})    (24)
= P_M(\text{return similar} \mid D > t)    (25)
= \sum_{d=0}^{|A|} P_M(\text{return similar}, D = d \mid D > t)    (26)
= \frac{1}{P(D > t)} \sum_{d=t+1}^{|A|} P_M(\text{return similar}, D = d)    (27)
= \frac{1}{P(D > t)} \sum_{d=t+1}^{|A|} \; \sum_{(k,n): M[k,n]=S} \Psi[k,n] \, P(In(k,n), D = d)    (28)
= \frac{1}{P(D > t)} \sum_{(k,n): M[k,n]=S} \Psi[k,n] \sum_{d=t+1}^{|A|} P(In(k,n), D = d)    (29)
= \frac{1}{P(D > t)} \sum_{(k,n): M[k,n]=S} \Psi[k,n] \, \Omega_{NS}[k,n]    (30)

E_M[\#\text{samples}] = \sum_{d=0}^{|A|} E_M[\#\text{samples}, D = d]    (31)
= \sum_{d=0}^{|A|} \; \sum_{(k,n): M[k,n] \in \{S, NS, E\}} \Psi[k,n] \, P(In(k,n), D = d) \, n    (32)
= \sum_{(k,n): M[k,n] \in \{S, NS, E\}} \Psi[k,n] \bigl(\Omega_S[k,n] + \Omega_{NS}[k,n]\bigr) \, n    (33)


P_M(\text{compute exact}) = \sum_{d=0}^{|A|} P_M(\text{compute exact}, D = d)    (34)
= \sum_{d=0}^{|A|} \; \sum_{(k,n): M[k,n]=E} \Psi[k,n] \, P(In(k,n), D = d)    (35)
= \sum_{(k,n): M[k,n]=E} \Psi[k,n] \sum_{d=0}^{|A|} P(In(k,n), D = d)    (36)
= \sum_{(k,n): M[k,n]=E} \Psi[k,n] \bigl(\Omega_S[k,n] + \Omega_{NS}[k,n]\bigr)    (37)

APPENDIX IV
COMPUTATION OF EXPECTED ADDITIVE LOSS IN THE BACKWARD INDUCTION ALGORITHM

Let w_1 and w_0 be the loss weights for a false positive error and a false negative error respectively. The expected additive loss for each decision, given that we sampled n samples out of which k were non-similar, is:

E[\text{addLoss}(S) \mid k, n] = P(D > t \mid In(k,n)) \, w_1    (38)
= \frac{P(D > t, In(k,n))}{P(In(k,n))} \, w_1    (39)
= \frac{\sum_{d=t+1}^{|A|} P(D = d, In(k,n))}{\sum_{d=0}^{|A|} P(D = d, In(k,n))} \, w_1    (40)
= \frac{\Omega_{NS}[k,n]}{\Omega_S[k,n] + \Omega_{NS}[k,n]} \, w_1    (41)

E[\text{addLoss}(NS) \mid k, n] = P(D \le t \mid In(k,n)) \, w_0    (42)
= \frac{P(D \le t, In(k,n))}{P(In(k,n))} \, w_0    (43)
= \frac{\sum_{d=0}^{t} P(D = d, In(k,n))}{\sum_{d=0}^{|A|} P(D = d, In(k,n))} \, w_0    (44)
= \frac{\Omega_S[k,n]}{\Omega_S[k,n] + \Omega_{NS}[k,n]} \, w_0    (45)

E[\text{addLoss}(E) \mid k, n] = R_E(n)    (46)

E[\text{addLoss}(C) \mid k, n] = R_I + P(\text{next element similar} \mid In(k,n)) \, loss(k, n+1)    (47)
\quad + P(\text{next element non-similar} \mid In(k,n)) \, loss(k+1, n+1)    (48)
= R_I + \frac{P(\text{next element similar}, In(k,n))}{P(In(k,n))} \, loss(k, n+1)    (49)
\quad + \frac{P(\text{next element non-similar}, In(k,n))}{P(In(k,n))} \, loss(k+1, n+1)    (50)
= R_I + \frac{P(In(k, n+1))}{P(In(k,n))} \, loss(k, n+1)    (51)
\quad + \frac{P(In(k+1, n+1))}{P(In(k,n))} \, loss(k+1, n+1)    (52)
= R_I + \frac{\Omega_S[k, n+1] + \Omega_{NS}[k, n+1]}{\Omega_S[k,n] + \Omega_{NS}[k,n]} \, loss(k, n+1)    (53)
\quad + \frac{\Omega_S[k+1, n+1] + \Omega_{NS}[k+1, n+1]}{\Omega_S[k,n] + \Omega_{NS}[k,n]} \, loss(k+1, n+1)    (54)
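The recursion can be evaluated by backward induction over the states (k, n), choosing at each state the decision with the smallest expected additive loss. This is a sketch under our own conventions (Ω_S and Ω_NS are passed as 2-D arrays indexed [k][n], R_E is a function of n; the tie-breaking and the handling of unreachable states are our own choices):

```python
def backward_induction(N, omega_S, omega_NS, w0, w1, R_I, R_E):
    """Compute a decision matrix M and the expected-loss table by backward
    induction. omega_S[k][n], omega_NS[k][n] are the intermediate sums
    of Eq. 2; w0, w1 are the error loss weights, R_I the cost of one
    sample and R_E(n) the cost of the exact computation after n samples."""
    loss = [[0.0] * (N + 1) for _ in range(N + 1)]
    M = [[None] * (N + 1) for _ in range(N + 1)]
    for n in range(N, -1, -1):
        for k in range(n, -1, -1):
            tot = omega_S[k][n] + omega_NS[k][n]
            if tot == 0:
                continue  # state (k, n) is unreachable
            choices = {
                'S':  w1 * omega_NS[k][n] / tot,   # Eq. (41)
                'NS': w0 * omega_S[k][n] / tot,    # Eq. (45)
                'E':  R_E(n),                      # Eq. (46)
            }
            if n < N:  # continuation is only possible before the last sample
                stay = omega_S[k][n + 1] + omega_NS[k][n + 1]
                move = omega_S[k + 1][n + 1] + omega_NS[k + 1][n + 1]
                choices['C'] = (R_I
                                + stay / tot * loss[k][n + 1]
                                + move / tot * loss[k + 1][n + 1])  # Eq. (54)
            M[k][n] = min(choices, key=choices.get)
            loss[k][n] = choices[M[k][n]]
    return M, loss
```

With a single pixel (|A| = 1), equal priors on D = 0 and D = 1, and one cheap sample that reveals the distance exactly, the induction chooses to continue at the root and then answers with certainty after the sample.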

APPENDIX V
BACKWARD INDUCTION SOLUTION THEOREMS

Theorem 1: Let M∗ be a decision matrix which is the solution to Eq. (7). Then it is the solution to the original minimization problem, Eq. (6), with α = P_{M∗}(false negative) and β = P_{M∗}(false positive).

Proof: Let M′ be another decision matrix of the same size with smaller or equal error probabilities. Then:

\begin{aligned}
loss(M^*, w_0, w_1) ={} & P_{M^*}(\text{false negative}) P(\text{similar}) w_0 \\
& + P_{M^*}(\text{false positive}) P(\text{non-similar}) w_1 \\
& + E_{M^*}[N] R_I + P_{M^*}(\text{compute exact}) R_E(n) \\
\le{} & P_{M'}(\text{false negative}) P(\text{similar}) w_0 && M^* \text{ is optimal (Eq. (6))} \\
& + P_{M'}(\text{false positive}) P(\text{non-similar}) w_1 \\
& + E_{M'}[N] R_I + P_{M'}(\text{compute exact}) R_E(n) \\
\le{} & P_{M^*}(\text{false negative}) P(\text{similar}) w_0 && P_{M'}(\text{false negative}) \le P_{M^*}(\text{false negative}) \\
& + P_{M^*}(\text{false positive}) P(\text{non-similar}) w_1 && P_{M'}(\text{false positive}) \le P_{M^*}(\text{false positive}) \\
& + E_{M'}[N] R_I + P_{M'}(\text{compute exact}) R_E(n)
\end{aligned}

Canceling the common error terms from the first and last expressions yields:

E_{M^*}[N] R_I + P_{M^*}(\text{compute exact}) R_E(n) \le E_{M'}[N] R_I + P_{M'}(\text{compute exact}) R_E(n)

i.e. M∗ has the smallest expected running time among all decision matrices with smaller or equal error probabilities.

Theorem 2: Let M∗ be the optimal decision matrix returned by Alg. 3 for some w_0, w_1. If w_0 = R_E(n) / (α P(similar)) then P_{M∗}(false negative) ≤ α. If w_1 = R_E(n) / (β P(non-similar)) then P_{M∗}(false positive) ≤ β.

Proof: Let M′ be the decision matrix with M′[0, 0] = E; i.e. Alg. 1 parameterized with M′ performs the exact computation of the distance immediately, taking zero samples. Then:

\begin{aligned}
R_E(n) &= Loss(M', w_0, w_1) && \text{price for the exact computation} \\
&\ge Loss(M^*, w_0, w_1) && M^* \text{ is optimal} \\
&\ge P_{M^*}(\text{false negative}) P(\text{similar}) w_0 && \text{part of a sum of non-negative terms} \\
&= P_{M^*}(\text{false negative}) P(\text{similar}) \frac{R_E(n)}{\alpha P(\text{similar})} && w_0 = \frac{R_E(n)}{\alpha P(\text{similar})}
\end{aligned}

Hence P_{M∗}(false negative) ≤ α. The proof that w_1 = R_E(n) / (β P(non-similar)) implies P_{M∗}(false positive) ≤ β is similar.