
A note on the computation of high-dimensional integral images


Pattern Recognition Letters 32 (2011) 197–201


Ernesto Tapia, Freie Universität Berlin, Institut für Informatik, Arnimallee 7, 14195 Berlin, Germany


Article history: Received 15 May 2009. Available online 27 October 2010. Communicated by Q. Ji.

Keywords: Integral image, Haar-based features, High-dimensional image, Möbius inversion formula, Neuroimaging

0167-8655/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.patrec.2010.10.007

E-mail address: [email protected]

Abstract. The integral image approach allows optimal computation of Haar-based features for real-time recognition of objects in image sequences. This paper describes a generalization of the approach to high-dimensional images and offers a formula for optimal computation of sums on high-dimensional rectangles.


1. Introduction

Viola and Jones (2004) introduced the integral image approach for real-time detection of objects in image sequences. They constructed a boosted cascade of simple classifiers based on Haar-like features that measure vertical, horizontal, central, and diagonal variations of pixel intensities. These features are differences between the sums of image values on two, three, or four rectangles (see Fig. 1). However, it must be noted that the sum of image values $i(x', y')$ on a given rectangle $(x_0, x_1] \times (y_0, y_1]$,

$$A = \sum_{x_0 < x' \le x_1} \; \sum_{y_0 < y' \le y_1} i(x', y') \qquad (1)$$

is computationally expensive, because its complexity is proportional to the number of pixels contained in the rectangle.

One of the key contributions of Viola and Jones is the use of the integral image (Crow, 1984; Viola and Jones, 2004) as an intermediate array representation to optimally compute the sum $A$. The integral image value at the pixel $(x, y)$ is defined as the sum

$$I(x, y) = \sum_{0 \le x' \le x} \; \sum_{0 \le y' \le y} i(x', y') \qquad (2)$$

of the original image values on the rectangle with corners $(0, 0)$ and $(x, y)$. They computed the integral image in one pass over the image using the recurrence

$$c(x, y) = c(x, y-1) + i(x, y), \qquad (3)$$

$$I(x, y) = I(x-1, y) + c(x, y) \qquad (4)$$

with

$$c(x, -1) = I(-1, y) = 0, \qquad (5)$$

where $c(x, y)$ is called the cumulative row sum (Fig. 2). Thus, they computed $A$ in constant time, using only four references to the integral image, with the formula

$$A = I(x_1, y_1) - I(x_1, y_0) - I(x_0, y_1) + I(x_0, y_0). \qquad (6)$$
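As a sketch, the recurrence (3)–(5) and the constant-time sum (6) translate directly into code. The following Python fragment is illustrative only: the function names are ours and NumPy is assumed; references with index −1 are treated as zero, matching (5).

```python
import numpy as np

def integral_image(img):
    """Compute I(x, y), the sum of img over rows 0..x and columns 0..y,
    in one pass using the cumulative-row-sum recurrence (3)-(5)."""
    h, w = img.shape
    I = np.zeros((h, w))
    c = np.zeros((h, w))  # cumulative row sums c(x, y)
    for x in range(h):
        for y in range(w):
            c[x, y] = (c[x, y - 1] if y > 0 else 0.0) + img[x, y]
            I[x, y] = (I[x - 1, y] if x > 0 else 0.0) + c[x, y]
    return I

def rect_sum(I, x0, y0, x1, y1):
    """Sum of image values on the rectangle (x0, x1] x (y0, y1],
    using the four references of formula (6). x0 and y0 may be -1."""
    def ref(x, y):
        # boundary convention (5): references outside the image are zero
        return I[x, y] if x >= 0 and y >= 0 else 0.0
    return ref(x1, y1) - ref(x1, y0) - ref(x0, y1) + ref(x0, y0)
```

A vectorized implementation would replace the double loop with `np.cumsum` along both axes; the loop form is kept here only to mirror the recurrence.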

Ke et al. (2005) extended the integral image approach to detect the motion and activity of persons in videos. They considered the image sequences as three-dimensional images and defined the integral video to compute volumetric features from the video's optical flow (see Fig. 3). The features are the sums of image values on parallelepipeds, and the sums are optimally computed using eight references to the integral video.

Many other high-dimensional image structures could benefit from this approach. Examples of these structures are flow through porous media in experimental fluid dynamics (Preusser and Rumpf, 2003) and functional magnetic resonance images (fMRI) in medical applications (Huettel et al., 2004). These three-dimensional images are formed with volumetric picture elements (voxels) $p$, which locate the values $i(p)$ in space. These structures can also be extended to dynamic four-dimensional images $i(p, t)$, where $p$ and $t$ are the discrete indices in space and time, respectively.

Neuroimaging is the most appealing application of the integral image approach, owing to the advances in magnetic resonance technology that have made real-time fMRI possible (deCharms, 2007; Weiskopf et al., 2004). Real-time analysis of dynamic fMRI data could allow for the development of methods for mind-event recognition, which can lead to new and practical applications, such

Fig. 1. Haar-based rectangular features used for face recognition. The features are the sum of the values on the gray region minus the sum of the values on the white region.

Fig. 2. Left: Integral image representation. Right: The four references used to compute the image values on the gray area.

as brain-computer interfaces, lie detection, and therapeutic applications (deCharms, 2008). Thus, a natural question is how we can generalize the integral image approach for real-time analysis of high-dimensional images.

We realized that generalization of the approach basically consists of adapting two main steps: one that computes an integral array in one pass, and another that computes the sum of pixels included in a high-dimensional rectangle in constant time, using only a few references to the integral array.

The next section states these generalization steps and begins with some useful notations and definitions.

2. Integral representation in high dimensions

We denote vectors with the usual notation

$$\mathbf{x} = (x_1, \ldots, x_d). \qquad (7)$$

Bold-faced scalars denote vectors whose entries are all equal to the scalar.

Fig. 3. Above: Volumetric features computed by the integral video. Below: The black parallelepiped $V$; the circles are the references used to compute the sum of the optical flow on the parallelepiped.

Superindices on vectors represent a labeling, which can be a scalar $m$, as in

$$\mathbf{x}^m = (x^m_1, \ldots, x^m_d), \qquad (8)$$

or a vector $\mathbf{n}$, as in

$$\mathbf{x}^{\mathbf{n}} = (x^{n_1}_1, \ldots, x^{n_d}_d). \qquad (9)$$

The vector $\mathbf{e}^m$ is a member of the canonical basis, where $e^m_m = 1$ and $e^m_n = 0$ for $n \ne m$.

A relation that plays a relevant role in this work is defined as follows:

Definition 1. The partial order $\preceq$ on the vectors is defined as

$$\mathbf{x} \preceq \mathbf{y} \iff x_i \le y_i, \quad i = 1, \ldots, d. \qquad (10)$$

Remark 1. The partial order lets us define intervals in analogy to one-dimensional intervals. For example, consider the semi-closed interval

$$(\mathbf{x}, \mathbf{y}] = \{\mathbf{z} : \mathbf{x} \prec \mathbf{z} \preceq \mathbf{y}\}. \qquad (11)$$

Similarly, we define the intervals $(\mathbf{x}, \mathbf{y})$, $[\mathbf{x}, \mathbf{y}]$, and $[\mathbf{x}, \mathbf{y})$. It must be noted that these intervals geometrically define high-dimensional rectangles. We will use interval or rectangle without distinction to denote such sets.
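Definition 1 and the interval (11) translate directly to code. The following is a minimal sketch for integer grid points; the function names are ours.

```python
def strictly_less(x, y):
    """x precedes y strictly: x < y in every coordinate (open lower limit)."""
    return all(xi < yi for xi, yi in zip(x, y))

def leq(x, y):
    """The partial order of Definition 1: x <= y componentwise."""
    return all(xi <= yi for xi, yi in zip(x, y))

def in_semi_closed(z, x, y):
    """Membership of z in the semi-closed interval (x, y] of Eq. (11)."""
    return strictly_less(x, z) and leq(z, y)
```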

Definition 2. A d-dimensional image is a real-valued function

$$i : [\mathbf{0}, \mathbf{u}] \to \mathbb{R}. \qquad (12)$$

The integral image $I : [\mathbf{0}, \mathbf{u}] \to \mathbb{R}$ of the image $i$ is defined as

$$I(\mathbf{x}) = \sum_{\mathbf{z} \in [\mathbf{0}, \mathbf{x}]} i(\mathbf{z}). \qquad (13)$$

2.1. Optimal computation of integral images

The first step in this approach is the computation of the integral image in one pass. This step is relatively easy to generalize: if the array has dimension $d$, then we have to maintain only $d - 1$ extra arrays and define a recursion similar to (3)–(5). We can formally state this idea as follows:

Proposition 1. The integral image $I$ is computed in one pass over the image $i$ using the arrays $c^m$, $m = 1, \ldots, d-1$, and the recurrence

$$I(\mathbf{x}) = I(\mathbf{x} - \mathbf{e}^1) + c^1(\mathbf{x}), \qquad (14)$$

$$c^1(\mathbf{x}) = c^1(\mathbf{x} - \mathbf{e}^2) + c^2(\mathbf{x}), \qquad (15)$$

$$\vdots \qquad (16)$$

$$c^{d-1}(\mathbf{x}) = c^{d-1}(\mathbf{x} - \mathbf{e}^d) + i(\mathbf{x}) \qquad (17)$$

with

$$c^m(\mathbf{x}) = I(\mathbf{x}) = 0 \qquad (18)$$

when

$$x_n < 0 \quad \text{for } m = 1, \ldots, d-1 \text{ and } n = 1, \ldots, d. \qquad (19)$$

Proof. By reordering the sum in the integral image, we have

$$I(\mathbf{x}) = \sum_{\mathbf{0} \preceq \mathbf{z} \preceq \mathbf{x}} i(\mathbf{z}) \qquad (20)$$

$$= \sum_{0 \le z_1 \le x_1} \cdots \sum_{0 \le z_d \le x_d} i(z_1, \ldots, z_d) \qquad (21)$$

$$= \sum_{0 \le z_1 \le x_1 - 1} \cdots \sum_{0 \le z_d \le x_d} i(z_1, \ldots, z_d) + \sum_{0 \le z_2 \le x_2} \cdots \sum_{0 \le z_d \le x_d} i(x_1, z_2, \ldots, z_d). \qquad (22)$$

If we define

$$c^1(\mathbf{x}) = \sum_{0 \le z_2 \le x_2} \cdots \sum_{0 \le z_d \le x_d} i(x_1, z_2, \ldots, z_d), \qquad (23)$$

then we have

$$I(\mathbf{x}) = I(\mathbf{x} - \mathbf{e}^1) + c^1(\mathbf{x}). \qquad (24)$$

Similarly, we define, for $n = 1, \ldots, d-1$,

$$c^n(\mathbf{x}) = c^n(\mathbf{x} - \mathbf{e}^{n+1}) + c^{n+1}(\mathbf{x}) \qquad (25)$$

with

$$c^{n+1}(\mathbf{x}) = \sum_{0 \le z_{n+1} \le x_{n+1}} \cdots \sum_{0 \le z_d \le x_d} i(x_1, \ldots, x_n, z_{n+1}, \ldots, z_d), \qquad (26)$$

where $c^d(\mathbf{x}) = i(\mathbf{x})$. It must be noted that recursions (24) and (25) are actually undefined if the entries of $\mathbf{x} - \mathbf{e}^1$ or $\mathbf{x} - \mathbf{e}^{n+1}$ are negative. For this reason, we define (18) and (19). □
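Proposition 1 can be sketched in Python as follows; this is an illustrative implementation (NumPy assumed, function names ours), not the paper's code. The indices are visited in lexicographic order, so every reference on the right-hand side of (14)–(17) has already been computed when it is used.

```python
import numpy as np

def integral_array(a):
    """One-pass d-dimensional integral array following Proposition 1:
    d-1 helper arrays c^1..c^{d-1} and recurrences (14)-(17),
    with zero boundary conditions (18)-(19)."""
    d = a.ndim
    if d == 1:
        return np.cumsum(a)  # no helper arrays needed in one dimension
    I = np.zeros(a.shape)
    c = [np.zeros(a.shape) for _ in range(d - 1)]  # c[n-1] stores c^n

    def shifted(arr, x, axis):
        # arr(x - e^{axis+1}), zero when the index leaves the array (18)-(19)
        if x[axis] == 0:
            return 0.0
        y = list(x)
        y[axis] -= 1
        return arr[tuple(y)]

    for x in np.ndindex(a.shape):
        c[d - 2][x] = shifted(c[d - 2], x, d - 1) + a[x]       # (17)
        for n in range(d - 3, -1, -1):                          # (15)-(16)
            c[n][x] = shifted(c[n], x, n + 1) + c[n + 1][x]
        I[x] = shifted(I, x, 0) + c[0][x]                       # (14)
    return I
```

The result can be checked against repeated cumulative sums along each axis, which produce the same array.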

The second step in the approach is the optimal computation of

$$A = \sum_{\mathbf{z} \in (\mathbf{x}^0, \mathbf{x}^1]} i(\mathbf{z}), \qquad (27)$$

given the image $i$, its integral image $I$, and the rectangle of interest $(\mathbf{x}^0, \mathbf{x}^1]$. For this purpose, we define the following concepts:

Fig. 4. (a) Integral image values. (b) Corners of the rectangle of interest.

Definition 3. The corners of the rectangle $(\mathbf{x}^0, \mathbf{x}^1]$ are the vectors

$$\mathbf{x}^{\mathbf{q}} = (x^{q_1}_1, \ldots, x^{q_d}_d), \qquad (28)$$

where $\mathbf{q} \in \{0,1\}^d$.

Geometrically, the corners are the points that limit the rectangle along the axes. Fig. 4 shows an example in two-dimensional space. In this case, there are four corners $\mathbf{x}^{(0,0)}$, $\mathbf{x}^{(1,0)}$, $\mathbf{x}^{(0,1)}$, and $\mathbf{x}^{(1,1)}$.

The binary labeling used in the limits of the rectangle $(\mathbf{x}^0, \mathbf{x}^1]$ naturally induces a bijective mapping of the corners to the binary vectors $\mathbf{q} \in \{0,1\}^d$. We use this bijection to define an important concept:

Definition 4. The binary representation of the rectangle $(\mathbf{x}^0, \mathbf{x}^1]$ consists of the sums defined on its corners

$$S(\mathbf{q}) = \sum_{\mathbf{z} \in [\mathbf{0}, \mathbf{x}^{\mathbf{q}}]} i(\mathbf{z}) \qquad (29)$$

and

$$A(\mathbf{q}) = \sum_{\mathbf{z} \in (\mathbf{x}^{\mathbf{q}-\mathbf{1}}, \mathbf{x}^{\mathbf{q}}]} i(\mathbf{z}), \qquad (30)$$

where $\mathbf{q} \in \{0,1\}^d$ and $x^{-1}_n = -1$ for $n = 1, \ldots, d$.

The binary representation offers the key to expressing sums on the rectangles in terms of the partial ordering of binary vectors:

$$S(1,1) = \sum_{\mathbf{q} \preceq (1,1)} A(\mathbf{q}) = A(1,1) + A(1,0) + A(0,1) + A(0,0). \qquad (31)$$

From Fig. 5, it can be observed that similar relations hold for all values of $S$ in the binary representation:

$$S(1,0) = \sum_{\mathbf{q} \preceq (1,0)} A(\mathbf{q}) = A(1,0) + A(0,0), \qquad (32)$$

$$S(0,1) = \sum_{\mathbf{q} \preceq (0,1)} A(\mathbf{q}) = A(0,1) + A(0,0), \qquad (33)$$

$$S(0,0) = \sum_{\mathbf{q} \preceq (0,0)} A(\mathbf{q}) = A(0,0). \qquad (34)$$

The values $A$ correspond to the sums on each of the four rectangles defined by the origin, the axes, and the corners, and the values $S$ correspond to the integral values on the rectangle's corners (see Fig. 5).

We can also find an expression similar to (31) for the sum $A(1,1)$ defined on the rectangle $(\mathbf{x}^0, \mathbf{x}^1]$:

$$A(1,1) = \sum_{\mathbf{q} \preceq (1,1)} \mu(\mathbf{q})\, S(\mathbf{q}) = S(1,1) - S(0,1) - S(1,0) + S(0,0), \qquad (35)$$

Fig. 5. Example of binary representation in two-dimensional space.

where $\mu(\mathbf{q})$ is a coefficient that also depends on the corner $(1,1)$. The other values of $A$ can also be written in terms of $S$ and the partial ordering. Note that (35) is the optimal formula (6) of Viola and Jones, expressed in terms of the binary representation. We can easily verify this visually by inspecting and comparing Figs. 4 and 5.

However, using only visual inspection to obtain optimal expressions for sums on a rectangle in dimension higher than two is very difficult, if not impossible. We actually need a general expression to compute sums on the rectangles in terms of the integral array. We will show that the generalization of (6) is algebraically possible by demonstrating that equations similar to (31) and (35) also hold in general, using the binary representation and the following result from combinatorial theory:

Proposition 2 (Möbius Inversion Formula). Let $f(q)$ be a real-valued function defined for $q$ ranging over a locally finite partially ordered set $Q$. Let there exist an element $m$ with the property that $f(q) = 0$ unless $q \ge m$. Suppose that

$$g(q) = \sum_{p \le q} f(p). \qquad (36)$$

Then

$$f(q) = \sum_{p \le q} \mu(p, q)\, g(p), \qquad (37)$$

where the function $\mu$ is called the Möbius function of the partially ordered set $Q$. The value $\mu(p, q)$ is computed recursively for $p \le q$ as

$$\mu(p, q) = \begin{cases} 1, & p = q, \\ -\sum_{p \le r < q} \mu(p, r), & p \ne q. \end{cases} \qquad (38)$$

Interested readers can refer to Rota (1964) for the proof of the Möbius Inversion Formula.
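Recursion (38) is easy to evaluate on the poset used below, the binary vectors $\{0,1\}^d$ under $\preceq$. The following small Python sketch (names are ours) computes $\mu$ directly from (38) and can be used to check the closed form derived later numerically.

```python
from functools import lru_cache
from itertools import product

def leq(p, q):
    """Partial order on binary vectors: p <= q componentwise."""
    return all(pi <= qi for pi, qi in zip(p, q))

@lru_cache(maxsize=None)
def mobius(p, q):
    """Möbius function of the poset {0,1}^d, via recursion (38):
    mu(p, q) = 1 if p == q, else -sum of mu(p, r) over p <= r < q."""
    if p == q:
        return 1
    return -sum(mobius(p, r)
                for r in product((0, 1), repeat=len(q))
                if leq(p, r) and leq(r, q) and r != q)
```

On this poset, the recursion reproduces the alternating sign pattern $(-1)^{\ell(q)-\ell(p)}$, which is exactly what Proposition 3 below establishes.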

Now, we can state an important result of this section.

Proposition 3. We can express the binary representation of the rectangle $(\mathbf{x}^0, \mathbf{x}^1]$ as

$$S(\mathbf{q}) = \sum_{\mathbf{p} \preceq \mathbf{q}} A(\mathbf{p}) \qquad (39)$$

and

$$A(\mathbf{q}) = \sum_{\mathbf{p} \preceq \mathbf{q}} (-1)^{\ell(\mathbf{q}) - \ell(\mathbf{p})}\, S(\mathbf{p}), \qquad (40)$$

where

$$\ell(\mathbf{q}) = \sum_{i=1}^{d} q_i. \qquad (41)$$

Proof. Eq. (39) is easily proved using

$$[\mathbf{0}, \mathbf{x}^{\mathbf{q}}] = \bigcup_{\mathbf{p} \preceq \mathbf{q}} (\mathbf{x}^{\mathbf{p}-\mathbf{1}}, \mathbf{x}^{\mathbf{p}}]. \qquad (42)$$

Now, let us prove (40). Observe that the element $m$ mentioned in Proposition 2 guarantees that the sum (36) is well defined. In our case, we do not have to prove the existence of this element, because the sum (39) runs over a finite number of indices and is thus already well defined. Hence, the partial order $\preceq$ and (39) satisfy the hypotheses of the Möbius Inversion Formula, and it can be concluded that

$$A(\mathbf{q}) = \sum_{\mathbf{p} \preceq \mathbf{q}} \mu(\mathbf{p}, \mathbf{q})\, S(\mathbf{p}). \qquad (43)$$

Finally, we have to prove that

$$\mu(\mathbf{p}, \mathbf{q}) = (-1)^{\ell(\mathbf{q}) - \ell(\mathbf{p})}. \qquad (44)$$

To this end, we use definition (38) of the Möbius function. Consider first $\mathbf{p} = \mathbf{q}$. Then $\ell(\mathbf{p}) = \ell(\mathbf{q})$, and thus

$$\mu(\mathbf{p}, \mathbf{q}) = 1 = (-1)^0 = (-1)^{\ell(\mathbf{q}) - \ell(\mathbf{p})}. \qquad (45)$$


Suppose that $\mathbf{p} \ne \mathbf{q}$ and that (44) is valid for $\mu(\mathbf{p}, \mathbf{r})$, with $\mathbf{p} \preceq \mathbf{r} \prec \mathbf{q}$. Using definition (38), we have

$$\mu(\mathbf{p}, \mathbf{q}) = -\sum_{\mathbf{p} \preceq \mathbf{r} \prec \mathbf{q}} \mu(\mathbf{p}, \mathbf{r}) \qquad (46)$$

$$= -\sum_{i=0}^{\ell(\mathbf{q})-\ell(\mathbf{p})-1} \; \sum_{\substack{\mathbf{p} \preceq \mathbf{r} \prec \mathbf{q} \\ \ell(\mathbf{r}) = \ell(\mathbf{p})+i}} \mu(\mathbf{p}, \mathbf{r}) \qquad (47)$$

$$= -\sum_{i=0}^{\ell(\mathbf{q})-\ell(\mathbf{p})-1} \; \sum_{\substack{\mathbf{p} \preceq \mathbf{r} \prec \mathbf{q} \\ \ell(\mathbf{r}) = \ell(\mathbf{p})+i}} (-1)^{\ell(\mathbf{r})-\ell(\mathbf{p})} \qquad (48)$$

$$= -\sum_{i=0}^{\ell(\mathbf{q})-\ell(\mathbf{p})-1} \; \sum_{\substack{\mathbf{p} \preceq \mathbf{r} \prec \mathbf{q} \\ \ell(\mathbf{r}) = \ell(\mathbf{p})+i}} (-1)^{i} \qquad (49)$$

$$= -\sum_{i=0}^{\ell(\mathbf{q})-\ell(\mathbf{p})-1} (-1)^{i}\, \bigl|\{\mathbf{p} \preceq \mathbf{r} \prec \mathbf{q} : \ell(\mathbf{r}) = \ell(\mathbf{p})+i\}\bigr| \qquad (50)$$

$$= -\sum_{i=0}^{\ell(\mathbf{q})-\ell(\mathbf{p})-1} (-1)^{i} \binom{\ell(\mathbf{q})-\ell(\mathbf{p})}{i} \qquad (51)$$

$$= -\sum_{i=0}^{\ell(\mathbf{q})-\ell(\mathbf{p})} (-1)^{i} \binom{\ell(\mathbf{q})-\ell(\mathbf{p})}{i} + (-1)^{\ell(\mathbf{q})-\ell(\mathbf{p})} \qquad (52)$$

$$= (-1)^{\ell(\mathbf{q})-\ell(\mathbf{p})}, \qquad (53)$$

where the last step uses the binomial identity $\sum_{i=0}^{n} (-1)^{i} \binom{n}{i} = 0$ for $n \ge 1$. □

The above-mentioned result lets us conclude the second step in the generalization of the approach:

Proposition 4. Given an image $i$, its integral image $I$, and the rectangle $(\mathbf{x}^0, \mathbf{x}^1]$, we can compute the sum $A$ of the image values on the rectangle using $2^d$ references to the integral image with the formula

$$A = \sum_{\mathbf{p} \in \{0,1\}^d} (-1)^{d - \ell(\mathbf{p})}\, I(\mathbf{x}^{\mathbf{p}}). \qquad (54)$$

Proof. Eq. (54) is an immediate consequence of Proposition 3 and

$$S(\mathbf{q}) = I(\mathbf{x}^{\mathbf{q}}), \qquad (55)$$

$$A(\mathbf{1}) = A, \qquad (56)$$

which are derived from the definition of the binary representation. □
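Formula (54) can be sketched as follows (illustrative Python, NumPy assumed, names ours). The integral array is obtained here by repeated cumulative sums, which is equivalent to Proposition 1, and references with a negative entry are taken as zero, consistent with the convention $x^{-1}_n = -1$ of Definition 4.

```python
import numpy as np
from itertools import product

def integral_array_cumsum(a):
    """Integral array via cumulative sums along every axis."""
    I = a.copy()
    for ax in range(a.ndim):
        I = np.cumsum(I, axis=ax)
    return I

def rect_sum_nd(I, x0, x1):
    """Sum of image values on the d-dimensional rectangle (x0, x1],
    computed with the 2^d signed references of formula (54).
    Entries of x0 equal to -1 mean the rectangle starts at the origin."""
    d = len(x0)
    total = 0.0
    for p in product((0, 1), repeat=d):
        corner = tuple(x1[n] if p[n] else x0[n] for n in range(d))
        if min(corner) < 0:
            continue  # I is zero for references outside [0, u]
        total += (-1) ** (d - sum(p)) * I[corner]
    return total
```

For d = 2 this reduces to the four-reference formula (6); for d = 3 it uses the eight references of the integral video of Ke et al. (2005).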

3. Concluding remarks

This paper gives a direction to generalize the integral image approach to d-dimensional images. The generalization consists of the computation of an integral array in one pass and the optimal computation of sums on rectangles using $2^d$ references to the integral array.

However, the generalization has some drawbacks for high $d$. The computation of the integral array uses $d - 1$ extra arrays, signifying a memory increase that many personal computers could not support for large images. Another problem is the curse of dimensionality. The boosting method used by Viola and Jones selects the best feature from all possible ones generated by scaling, rotating, and translating a base feature through the image. If we consider, for example, the first (and simplest) volumetric feature in Fig. 3, then the number of features is proportional to $n^{2d}$ for an image of dimension $n \times n \times \cdots \times n$. Despite these drawbacks, the results presented in this study seem to be an attractive starting point for boosting-based classification in high-dimensional imaging.

There is another generalization, not directly related to object recognition. If we use integration on the rectangle of interest instead of addition in Proposition 4, then we can informally state that (54) offers a generalization of the Fundamental Theorem of Calculus (Apostol, 1967). Remember that this theorem states that if $f$ is a continuous function on the interval $[x_0, x_1]$ and $F$ is an antiderivative of $f$, then

$$\int_{x_0}^{x_1} f(x)\, dx = F(x_1) - F(x_0). \qquad (57)$$

Note that the integral image can be regarded as an "antiderivative" of the original image, because it is an integral of the image with a variable upper limit, similar to the antiderivative of a function of one variable. Using this analogy, the generalization to several variables computes the integral of a function $f$ on the interval using its antiderivative $F$ evaluated at the rectangle's corners:

$$\int_{[\mathbf{x}^0, \mathbf{x}^1]} f(\mathbf{x})\, d\mathbf{x} = \sum_{\mathbf{p} \in \{0,1\}^d} (-1)^{d - \ell(\mathbf{p})}\, F(\mathbf{x}^{\mathbf{p}}). \qquad (58)$$

This is a "direct" generalization of the Fundamental Theorem of Calculus, if we compare it to the generalization given by Stokes' theorem, which involves specialized concepts such as manifolds and differential forms (Katz, 1979). A formal statement of generalization (58) needs the definition of integrability and antiderivative for functions of several variables, among other concepts. However, we are confident the proof of our generalization could follow the procedure that we developed in this study to demonstrate Proposition 4.
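As a quick numerical illustration of (58) in $d = 2$ (our own toy example, not from the paper): take $f(x, y) = xy$, whose "antiderivative" with variable upper limits is $F(x, y) = x^2 y^2 / 4$; the signed corner sum then reproduces the exact integral over any rectangle $[a, b] \times [c, d]$, which is $(b^2 - a^2)(d^2 - c^2)/4$.

```python
from itertools import product

def F(x, y):
    """'Antiderivative' of f(x, y) = x * y with variable upper limits."""
    return x * x * y * y / 4.0

def corner_integral(x0, x1):
    """Right-hand side of (58) for d = 2: signed sum of F over the corners."""
    d = 2
    total = 0.0
    for p in product((0, 1), repeat=d):
        corner = [x1[n] if p[n] else x0[n] for n in range(d)]
        total += (-1) ** (d - sum(p)) * F(*corner)
    return total
```

For the rectangle $[0.5, 2] \times [1, 3]$, for instance, both the corner sum and the closed-form integral evaluate to $7.5$.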

Acknowledgments

The author is very grateful to Marte Ramírez for his comments about the preliminary results of this study. Special thanks to Dr. Waldemar Barrera, Dr. Fernando Galaz-Fontes, and Dr. Luis Verde, as well as the anonymous reviewers, for their comments and corrections, which helped to improve this study.

References

Apostol, T.M., 1967. Calculus: One-Variable Calculus with an Introduction to Linear Algebra, vol. 1. John Wiley & Sons Inc.

Crow, F.C., 1984. Summed-area tables for texture mapping. In: SIGGRAPH '84: Proc. 11th Annual Conference on Computer Graphics and Interactive Techniques. ACM, New York, NY, USA, pp. 207–212.

deCharms, R.C., 2007. Reading and controlling human brain activation using real-time functional magnetic resonance imaging. Trends Cogn. Sci. 11 (11), 473–481.

deCharms, R.C., 2008. Applications of real-time fMRI. Nat. Rev. Neurosci. 9 (9), 720–729.

Huettel, S.A., Song, A.W., McCarthy, G., 2004. Functional Magnetic Resonance Imaging. Sinauer Associates, Sunderland, MA.

Katz, V.J., 1979. The history of Stokes' theorem. Math. Mag., 146–156.

Ke, Y., Sukthankar, R., Hebert, M., 2005. Efficient visual event detection using volumetric features. In: IEEE Internat. Conf. on Computer Vision, vol. 1, Washington, DC, USA, pp. 166–173.

Preusser, T., Rumpf, M., 2003. Extracting motion velocities from 3D image sequences and coupled spatio-temporal smoothing. In: SPIE Conf. on Visualization and Data Analysis, vol. 5009, pp. 181–192.

Rota, G.C., 1964. On the foundations of combinatorial theory I – theory of Möbius functions. Prob. Theory Relat. Fields 2 (4), 340–368.

Viola, P., Jones, M., 2004. Robust real-time object detection. Internat. J. Comput. Vision 57 (2), 137–154.

Weiskopf, N., Mathiak, K., Bock, S.W., Scharnowski, F., Veit, R., Grodd, W., Goebel, R., Birbaumer, N., 2004. Principles of a brain-computer interface (BCI) based on real-time functional magnetic resonance imaging (fMRI). IEEE Trans. Biomed. Eng. 51 (6), 966–970.