Pattern Recognition 38 (2005) 835–846
www.elsevier.com/locate/patcog

Semi-supervised statistical region refinement for color image segmentation

Richard Nock a,∗, Frank Nielsen b

a GRIMAAG-Département Scientifique Interfacultaire, Université des Antilles-Guyane, Campus de Schoelcher, BP 7209, 97275 Schoelcher, Martinique, France
b Sony Computer Science Laboratories, Inc., 3-14-13 Higashi Gotanda, Shinagawa-Ku, Tokyo 141-0022, Japan

Received 9 August 2004

Abstract

Some authors have recently devised adaptations of spectral grouping algorithms to integrate prior knowledge, as constrained eigenvalue problems. In this paper, we improve and adapt a recent statistical region merging approach to this task, as a non-parametric mixture model estimation problem. The approach appears to be attractive both for its theoretical benefits and its experimental results, as a slight bias brings dramatic improvements over unbiased approaches on challenging digital pictures.
© 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: Image segmentation; Semi-supervised grouping

1. Introduction

Grouping is the discovery of intrinsic clusters in data [1]. Image segmentation is a particular kind of grouping in which the data consists of an image, and the task is to extract as regions the objects a user may find conceptually distinct from each other. The automation and optimization of this task face computational issues [2] and an important conceptual issue: basically, segmentation has access only to the descriptions of pixels (e.g. color levels) and their spatial relationships, while a user always uses higher levels of knowledge to cluster the image objects. Without such significant prior world knowledge, the accuracy of grouping is not meant to be optimality or even near-optimality, but rather accurate candidacy, as segmentation should come up with partition(s) from a candidate segmentation set [3].

∗ Corresponding author. Tel.: +596 72 74 24; fax: +596 72 73 62.
E-mail addresses: [email protected] (R. Nock), [email protected] (F. Nielsen).
URLs: http://www.univ-ag.fr/∼rnock, http://www.csl.sony.co.jp/person/nielsen/.

0031-3203/$30.00 © 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
doi:10.1016/j.patcog.2004.11.009

With the advent of media making it easier and cheaper to collect and store digital images, unconstrained digital photographic images have raised this challenge even further towards both computational efficiency and robust processing. Consider for example the well-known benchmark image lena in Fig. 1. Users would certainly consider the hat of the girl as an object different from the blurred background, and most would consider her shoulder as different from her face. Nevertheless, due to the distribution of colors, it is virtually impossible for segmentation techniques based solely on low-level cues, such as the colors, to make a clean separation of these regions. The right image displays the result of our algorithm. The regions found have white borders. This result is presented in more depth in the experimental section (Fig. 2). Notice from the result the segmentation of the hat, cleanly separated from the background, and also the segmentation of the girl's chin, which is separated from her shoulder.


Fig. 1. Image lena (left), and our segmentation (right). In the segmentation's result, regions found are white bordered (see text for details).

Fig. 2. Image lena (upper left), and its segmentation by SRRB, without bias (w/o), and with bias (w/, see Fig. 1). In the segmentations' results, regions found are delimited with white borders. We have m = 4, |V1| = 2, |V2| = 2, |V3| = 2, |V4| = 2. The upper right table displays some of the largest regions extracted from the segmentation (Reg. #X). The bottom table shows how lena's face is built from the two models whose pixels, denoted by red triangles on the upper right image, have been pointed by the user (each model's color is random); the number below ×2000 is SRRB's iteration number.

Common grouping algorithms for image segmentation use a weighted neighborhood graph to formulate the spatial relationships among pixels [1–6] and then formulate the segmentation as a graph partitioning problem. An essential difference between these algorithms is the locality of the grouping process. Shi and Malik [3] and Yu and Shi [1,7] solve it from a global standpoint, whereas Felzenszwalb and Huttenlocher [4], Nock [2] and Nielsen and Nock [5] make greedy local decisions to merge the connex components of induced subgraphs. Since segmentation is a global optimization process, the former approach is a priori a good candidate to tackle the problem, even when it faces computational complexity issues [3]. However, strong global properties can be obtained for the latter approaches, such as qualitative bounds on the overall segmentation error [4,2], or even quantitative bounds [5].

Our approach to segmentation, which gives the results of Fig. 1, is based on a segmentation framework previously studied by Yu and Shi [1,7]: grouping with bias. It is particularly useful for domains in which the user may interact with the segmentation, by inputting constraints to bias its result: sensor models in MRF [8], human–computer interaction, spatial attention and others [1]. Grouping with bias is basically one step further towards the integration of the user in the loop, compared to the method of general expectations of a good segmentation integrated in non-purposive grouping [9]; it is solved by pointing in the image some pixels (the bias) that the user feels belong to identical/different objects, and then solving the segmentation as a constrained grouping problem: pixels with identical labels must belong to the same region in the segmentation's result, while pixels with different labels must not belong to the same region. The


solution for the global approach of Yu and Shi [1,7] is mathematically appealing, but it is computationally demanding, and it requires quite an extensive bias for good experimental results on small images. Furthermore, it makes it difficult to handle the constraints that some pixels must not belong to the same regions; that is why this technique is mainly used for the particular biased segmentation problem in which one wants to segregate some objects from their background.

In this paper, we propose a general solution to biased grouping, based on a local approach to image segmentation [2,5], which basically consists in using the bias for the estimation of non-parametric mixture models. Distribution-free processing techniques are useful, if not necessary, in grouping [10,2]. However, estimating clusters in data is already far from being trivial even when significant distribution assumptions are made [11]. This is where the bias is of huge interest when it comes to grouping, as bias defines partially labeled regions, with the constraint that different labels belong to different regions. The approach of Nock [2] and Nielsen and Nock [5] is also conceptually appealing for an adaptation to biased segmentation, because it considers that the observed image is the result of the sampling of a theoretical image, in which regions are statistical regions characterized by distributions. There is no distribution assumption on the statistical pixels of this theoretical image. The only assumption made is a homogeneity property, according to which the expectation of a single color is the same inside a statistical region, and it is different between two adjacent statistical regions. The segmentation problem is thus the problem of recognizing the partition of the statistical regions on the basis of the observed image. The biased grouping problem turns out to allow statistical regions to contain different statistical sub-regions, not necessarily connected, each satisfying independently the homogeneity property, and for which the user feels they all belong to the same perceptual object. Thus, it yields a significant generalization of the theoretical framework of Nock [2] and Nielsen and Nock [5].

Our contribution in this paper is twofold. First, it consists of two modifications and improvements to the unbiased segmentation algorithm of Nock [2] and Nielsen and Nock [5]. Their algorithm contains two stages. Informally, its first part is a procedure which orders a set of pairs of adjacent pixels, according to the increasing values of some real-valued function f. Its second part consists of a single pass on this order, in which it tests the merging of the regions to which the pixels belong, using a so-called merging predicate P. f and P are the cornerstones of the approach of Nock [2] and Nielsen and Nock [5]. We propose in this paper a better f, and an improved P relying on a slightly more sophisticated statistical analysis. Our second contribution is the extension of this algorithm to grouping with bias. Our extension keeps both fast processing and the theoretical bounds on the quality of the segmentation. Experimentally, the results appear to be very favorable when comparing them to those of the approach of Yu and Shi [1,7]. They also appear to lead to dramatic improvements over unbiased grouping, when comparing our mostly automated biased grouping process to the human segmentations obtained on images of the Berkeley segmentation data set and benchmark images [12].

Section 2 summarizes the unbiased approach of Nock [2] and Nielsen and Nock [5], and presents our modification to their algorithm in the unbiased setting. Section 3 presents our extension to biased grouping and some theoretical results. Section 4 presents experimental results, and Section 5 concludes the paper.

2. Grouping exploiting concentration phenomena

We recall here the basic facts of the model of Nock [2] and Nielsen and Nock [5]. Throughout this paper, "log" is the base-2 logarithm. The notation | · | denotes the number of pixels (cardinality) when applied to a region R, or to the observed image I. Each pixel of I contains three color levels (R, G, B), each of the three belonging to the set {1, 2, . . . , g}. The RGB setting is used to cast our results directly on the same setting as Nock [2] and Nielsen and Nock [5]; however, the versatile technique of Nock [2] and Nielsen and Nock [5] can be tailored to other numerical feature description spaces, and handles more complex formulations of the color gamuts, such as CIE L∗u∗v∗, HSI, etc., as well as channel sampling rates.

2.1. The model and algorithm of Nock [2] and Nielsen and Nock [5]

The image I is an observation of a perfect object (or "true region", or statistical region) scene I∗ we do not know of, and which we try to approximate through the observation of I. It is I∗ which captures the global properties of the scene: theoretical (or statistical) pixels are each represented by a set of Q distributions for each color level, from which each of the observed color levels is sampled. The statistical regions of I∗ satisfy a 4-connexity constraint, and the simple homogeneity constraint that the R (resp. G, B) expectation is the same inside a statistical region. In order to discriminate regions, we assume that between any two adjacent regions of I∗, at least one of the three expectations is different. It is important that, apart from an additional independency, no more assumptions are put on I∗: for instance, the distributions can all be different for all statistical pixels, thereby contrasting with usual statistical models used in image segmentation, involving hypotheses that can be quite restrictive [10]. Q is a parameter which quantifies the complexity of the scene, the generality of the model, and the statistical hardness of the task as well. If Q is small, the model gains in generality (Q = 1 brings the most general model), but segmenting the image is more difficult from a statistical standpoint. Experimentally, Q turns out to be a tunable parameter which controls the coarseness of the segmentation, even if one value in [2,5] (Q = 32) appears to be sufficient to obtain nice segmentations for a large body of images.


From this model, Nock [2] and Nielsen and Nock [5] obtain a merging predicate P(R, R′) based on concentration inequalities, to decide whether two observed regions R and R′ belong to the same statistical region of I∗, and thus have to be merged. Let R_a denote the observed average for color a in region R of I, and let R_l be the set of regions with l pixels. Let

b(R) = g √( ln(|R_{|R|}| / δ) / (2Q|R|) ).   (1)

Ref. [2] picks |R_l| = (l + 1)^g. The merging predicate is [2]

P(R, R′) = true iff ∀a ∈ {R, G, B}, |R′_a − R_a| ≤ b(R) + b(R′), and false otherwise.   (2)
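To make Eqs. (1)–(2) concrete, the following is a minimal Python sketch of this predicate (an illustration, not the authors' code). The constants g, Q and δ below are placeholders; δ is only fixed later, in the experiments (Eq. (8)).

```python
import math

# Illustrative placeholders: g color levels per channel, Q and delta as in the text.
G_LEVELS = 256
Q = 32
DELTA = 1e-6

def b(card):
    """Deviation bound b(R) of Eq. (1) for a region with `card` pixels,
    using |R_l| = (l + 1)^g as in Ref. [2]."""
    ln_Rl = G_LEVELS * math.log(card + 1)          # ln |R_l| = g ln(l + 1)
    return G_LEVELS * math.sqrt((ln_Rl + math.log(1.0 / DELTA)) / (2.0 * Q * card))

def merge_predicate_eq2(avg_R, avg_Rp, card_R, card_Rp):
    """Merging predicate of Eq. (2): per-channel differences of observed
    averages must not exceed b(R) + b(R')."""
    thr = b(card_R) + b(card_Rp)
    return all(abs(a - ap) <= thr for a, ap in zip(avg_R, avg_Rp))
```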

The description of the algorithm probabilistic-sorted image segmentation (PSIS) of Nock [2] is straightforward, as exposed in Algorithm 1. It basically consists in making a preliminary sorting over the set S_I of the pairs of adjacent pixels of the image, according to the increasing values of a real-valued function f(p, p′). f(·, ·) takes a pair of pixels p and p′ as input, and returns the maximum of the three color differences (R, G and B) in absolute value between p and p′:

f(p, p′) = max_{a ∈ {R,G,B}} |p′_a − p_a|.   (3)

In 4-connexity, the number of such pairs is |S_I| = 2 r_I c_I − r_I − c_I = O(|I|) if I has r_I rows and c_I columns. After ordering, the algorithm traverses this order only once, and tests for any pair (p_i, p′_i) the merging of the two regions to which they currently belong, R(p_i) and R(p′_i) (here, the subscript i denotes the rank of the couple in the order). Such an algorithm is called region-merging, since it gradually merges regions, pixels being taken as elementary regions.

Algorithm 1: PSIS(I)

Input: an image I
S′_I = Order_increasing(S_I, f);
for i = 1 to |S′_I| do
    if R(p_i) ≠ R(p′_i) and P(R(p_i), R(p′_i)) = true then
        Union(R(p_i), R(p′_i));
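The sorting and merging loop maps directly onto a union–find structure. Below is a minimal Python sketch of Algorithm 1 under the stated conventions (4-connexity, f of Eq. (3)); it is an illustration, not the authors' implementation. Region sizes and per-channel sums are maintained so that the observed averages needed by the merging predicate are available in constant time.

```python
def psis(image, predicate):
    """Sketch of PSIS: sort adjacent pixel pairs by f (Eq. (3)), then make a
    single pass of merge tests.  `image[y][x]` is an (R, G, B) triple and
    `predicate(avg1, avg2, n1, n2)` is a merging predicate such as Eq. (2).
    Returns, for each pixel index, the index of its region representative."""
    rows, cols = len(image), len(image[0])
    n = rows * cols
    idx = lambda y, x: y * cols + x

    # Union-find with per-region pixel counts and per-channel sums.
    parent = list(range(n))
    size = [1] * n
    sums = [list(image[y][x]) for y in range(rows) for x in range(cols)]

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u

    def union(u, v):                        # u, v are representatives
        if size[u] < size[v]:
            u, v = v, u
        parent[v] = u
        size[u] += size[v]
        sums[u] = [s + t for s, t in zip(sums[u], sums[v])]

    def f(p, q):                            # Eq. (3)
        return max(abs(ca - cb) for ca, cb in zip(p, q))

    # S_I: pairs of 4-connected pixels, ordered by increasing f.
    pairs = []
    for y in range(rows):
        for x in range(cols):
            if x + 1 < cols:
                pairs.append((idx(y, x), idx(y, x + 1)))
            if y + 1 < rows:
                pairs.append((idx(y, x), idx(y + 1, x)))
    pairs.sort(key=lambda e: f(image[e[0] // cols][e[0] % cols],
                               image[e[1] // cols][e[1] % cols]))

    for u, v in pairs:
        ru, rv = find(u), find(v)
        if ru != rv:
            avg_u = [s / size[ru] for s in sums[ru]]
            avg_v = [s / size[rv] for s in sums[rv]]
            if predicate(avg_u, avg_v, size[ru], size[rv]):
                union(ru, rv)
    return [find(i) for i in range(n)]
```

With the predicate sketched earlier, psis(img, merge_predicate_eq2) returns a region representative per pixel.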

This approach to segmentation is interesting from the algorithmic standpoint, because it is both simple and fast. The eigenvalue approach of Shi and Malik [3] admits a naive solution whose time complexity is O(|I|³), impractical even for moderate-sized images. Mathematical tricks can scale down the time complexity at the expense of greater implementation efforts, but it still lies somewhere in between O(|I|√|I|) and O(|I|³). In the case of Algorithm 1, the for...to loop has complexity O(|I|) (linear). Fortunately, the ordering is also cheap, as radix sorting with the f(·, ·) values as keys brings a time complexity of O(|I| log g). Since g can be considered constant, we get an overall approximately linear-time complexity for the whole algorithm using a Union-Find implementation [13].

We now concentrate on our modifications to f and the merging predicate P.

2.2. An improved f

Nielsen and Nock [5] have shown that f should theoretically be an estimator, as reliable as possible, of the local between-pixel gradients. Eq. (3) is the simplest way to compute these estimators, but there is another choice with which we have obtained yet better visual results. It consists in extending convolution kernels classically used in edge detection for pixel-wise gradient estimation. In 4-connexity, neighbor pixels are either horizontal or vertical. Thus, we only need Δx or Δy between neighbor pixels p and p′, for each color channel a ∈ {R, G, B}. A natural choice is to extend the Sobel convolution filter to the following kernels:

Δx:
[ −1   0   0   1 ]
[ −2   0   0   2 ]
[ −1   0   0   1 ]

Δy:
[  1   2   1 ]
[  0   0   0 ]
[  0   0   0 ]
[ −1  −2  −1 ]

For example, whenever neighbor pixels p and p′ are horizontal, Δx is used and the two pixels in the convolution filter are located in the second row, and the second and third columns. Only for the pixels for which the estimations with Δx and Δy cannot be done (i.e. those of the image border) do we keep the estimation of Eq. (3).
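As an illustration of this improved f, here is a Python sketch under one plausible reading of how the kernels are anchored on a neighbor pair (not the authors' exact implementation); it falls back to Eq. (3) near the image border.

```python
# Extended Sobel kernels of Section 2.2 (rows x columns).
DX = [[-1, 0, 0, 1],
      [-2, 0, 0, 2],
      [-1, 0, 0, 1]]     # horizontal pair: p, p' sit in the middle row, central columns
DY = [[ 1,  2,  1],
      [ 0,  0,  0],
      [ 0,  0,  0],
      [-1, -2, -1]]      # vertical pair: p, p' sit in the central rows, middle column

def f_improved(image, y, x, horizontal):
    """Gradient estimate for the pair (p, p'), with p = image[y][x] and p' its
    right (horizontal=True) or bottom (horizontal=False) 4-neighbor.  Returns
    the maximum absolute per-channel response; near the border, Eq. (3) is used."""
    rows, cols = len(image), len(image[0])
    kernel = DX if horizontal else DY
    kh, kw = len(kernel), len(kernel[0])
    y0, x0 = y - 1, x - 1                  # window anchored so the pair is central
    if y0 < 0 or x0 < 0 or y0 + kh > rows or x0 + kw > cols:
        q = image[y][x + 1] if horizontal else image[y + 1][x]
        return max(abs(ca - cb) for ca, cb in zip(image[y][x], q))   # Eq. (3)
    grads = [0, 0, 0]
    for i in range(kh):
        for j in range(kw):
            w = kernel[i][j]
            if w:
                px = image[y0 + i][x0 + j]
                for c in range(3):
                    grads[c] += w * px[c]
    return max(abs(gc) for gc in grads)
```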

2.3. An improved merging predicate P, and associated theoretical results

Our first result is based on the following theorem:

Theorem 1 (The independent bounded difference inequality, McDiarmid [14]). Let X = (X_1, X_2, . . . , X_n) be a family of independent r.v. with X_k taking values in a set A_k for each k. Suppose that the real-valued function h defined on ∏_k A_k satisfies |h(x) − h(x′)| ≤ c_k whenever vectors x and x′ differ only in the k-th coordinate. Let μ be the expected value of the r.v. h(X). Then for any τ ≥ 0,

Pr(h(X) − μ ≥ τ) ≤ exp( −2τ² / Σ_k (c_k)² ).   (4)

From this theorem, we obtain the following result on the deviation of observed differences between regions of I. Recall that R_a denotes the observed average of color a for region R.

Theorem 2. Consider a fixed couple (R, R′) of regions of I. ∀ 0 < δ ≤ 1, ∀ a ∈ {R, G, B}, the probability is no more than δ that

|(R_a − R′_a) − E(R_a − R′_a)| ≥ g √( (1/(2Q)) (1/|R| + 1/|R′|) ln(2/δ) ).

Proof. Suppose we shift the value of the outcome of one r.v. among the Q(|R| + |R′|) possible r.v. of the couple (R, R′). |R_a − R′_a| is subject to a variation of at most c_R = g/(Q|R|) when this modification affects region R (among Q|R| possible r.v.), and at most c_{R′} = g/(Q|R′|) for a change inside R′ (among Q|R′| possible r.v.). We get Σ_k (c_k)² = Q(|R|(c_R)² + |R′|(c_{R′})²) = (g²/Q)((1/|R|) + (1/|R′|)). Using the fact that the probability of a deviation with the absolute value is at most twice that without, and using Theorem 1 (solving for τ) brings our result. This ends the proof of Theorem 2. □
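For completeness, the omitted algebra is a one-line computation (a sketch in LaTeX, with τ the deviation of Theorem 1 and the factor 2 accounting for the absolute value):

```latex
\[
2\exp\!\left(-\frac{2\tau^{2}}{\sum_{k}c_{k}^{2}}\right)=\delta
\iff
\tau=\sqrt{\frac{1}{2}\Big(\sum_{k}c_{k}^{2}\Big)\ln\frac{2}{\delta}}
    =\sqrt{\frac{g^{2}}{2Q}\left(\frac{1}{|R|}+\frac{1}{|R'|}\right)\ln\frac{2}{\delta}}
    =g\sqrt{\frac{1}{2Q}\left(\frac{1}{|R|}+\frac{1}{|R'|}\right)\ln\frac{2}{\delta}},
\]
```

which is the deviation bound stated in Theorem 2.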

Fix some a ∈ {R, G, B}. Provided we have some way to evaluate the theoretical deviation of |R_a − R′_a|, a candidate merging predicate is straightforward. Nock [2] and Nielsen and Nock [5] use a concentration inequality on the deviation of the average color level around its expectation for any single arbitrary region R. Thus, they get for each region R probabilistic bounds on the deviation of |R_a − E(R_a)|. When two adjacent regions R and R′ come from the same true region in I∗, we have E(R_a) = E(R′_a); thus the triangular inequality implies that the deviation of |R_a − R′_a| is no more than the sum of the deviations of each region's color around its expectation, and we get the merging predicate of Nock [2] and Nielsen and Nock [5] in Eq. (2). However, the use of the triangular inequality weakens the concentration bound. Notice that when R and R′ come from the same true region in I∗, Theorem 2 directly yields a probabilistic bound on the deviation of |R_a − R′_a| (since E(R_a − R′_a) = 0), without a similar weakening.

However, Theorem 2 is a single event's concentration in that it considers a single couple of regions (R, R′), and one should extend this to the whole image, in order to obtain a convenient merging predicate. Fortunately, one can easily upperbound the probability that such a large deviation occurs in the observed image I, using the union bound. Nock [2] remarks that this is no more than the probability to occur in the set of all regions (whether present or absent from I). The cardinal of each subset containing fixed-size regions can be upperbounded by a degree-O(g) polynomial [2], but it yields a pretty large upperbound.

Another counting argument is possible, if we remark that the occurrence probability on I is also no more than that measured on the set of couples of I whose merging is tested, S′_I (see Algorithm 1). Its cardinal, |S′_I|, is comparatively small: for a single-pass algorithm, |S′_I| < |I|², and even |S′_I| = Θ(|I|) in 4-connexity. Thus, we get the following theorem.

Theorem 3. ∀ 0 < δ ≤ 1, there is probability at least 1 − (3|S′_I|δ) that all couples (R, R′) tested shall verify ∀a ∈ {R, G, B}, |(R_a − R′_a) − E(R_a − R′_a)| ≤ b(R, R′), with

b(R, R′) = g √( (1/(2Q)) (1/|R| + 1/|R′|) ln(2/δ) ).   (5)

b(R, R′) would lead to a very good theoretical merging predicate P instead of using the threshold b(R) + b(R′) in Eq. (2), provided we pick a δ small enough. This predicate would be much better than [2,5] from the theoretical standpoint, but slightly larger thresholds are possible that keep all the desirable theoretical properties we look for, and give much better visual results. Our merging predicate uses one such threshold, which turns out to be Õ(b(R, R′)) (the tilde upon the big-Oh notation authorizes to remove constants and log-terms). Remark that provided regions R and R′ are not empty, b(R, R′) ≤ √(b²(R) + b²(R′)) < b(R) + b(R′). This right quantity is that of Eq. (2). The center quantity is the one we use for our merging predicate. Notice that it is indeed Õ(b(R, R′)) provided a good upperbound on |R_l| is used. Our merging predicate is thus:

P(R, R′) = true iff ∀a ∈ {R, G, B}, |R′_a − R_a| ≤ √(b²(R) + b²(R′)), and false otherwise.   (6)

Let us now concentrate on |R_l|. Nock [2] and Nielsen and Nock [5] pick |R_l| = (l + 1)^g, considering that a region is an unordered bag of pixels (each color level is given 0, 1, . . . , l pixels). This bound counts numerous duplicates for each region: e.g. at least (l + 1)^{g−l} when l < g. Thus, we fix |R_l| = (l + 1)^{min{l,g}}.

As advocated before, the merging predicate in Eq. (6) has proven experimentally to be more interesting than that of Eq. (2). Let us briefly illustrate its theoretical interests, which encompass those of the merging predicate of Nock [2] and Nielsen and Nock [5] (due to the fact that it is tighter).

Suppose that we are able to make the merging tests through f in such a way that when any test between two (parts of) true regions occurs, all tests inside each of the two true regions have previously occurred. Let us name A this assumption. Informally, A stresses the need to make f as accurate as possible (and it has motivated our study of Section 2.2). Notice also that A does not postulate at all that we know where the statistical regions of I∗ are in I. Under assumption A, Nielsen and Nock [5] have shown that with high probability, the error of the segmentation is limited from both the qualitative and the quantitative standpoint. There are basically three kinds of errors a segmentation algorithm can suffer with respect to the optimal segmentation. First, under-merging represents the case where one or more regions obtained are strict subparts of statistical regions. Second, over-merging represents the case where some regions obtained strictly contain more than one statistical region. Third, there is the "hybrid" (and most probable) case where some regions obtained contain more than one strict subpart of true regions. Name for short s∗(I) the set of regions of the ideal segmentation of I following the statistical regions of I∗, and s(I) the set of regions in the segmentation we get. Then we have the following qualitative bound on the error.

Theorem 4. With probability ≥ 1 − O(|I|δ), the segmentation on I satisfying A is an over-merging of I∗, that is: ∀O ∈ s∗(I), ∃R ∈ s(I) : O ⊆ R.

Proof. From Theorem 3, with probability > 1 − (3|S′_I|δ) (thus > 1 − O(|I|δ) in 4-connexity), all regions R and R′ coming from the same statistical region of I∗, whose merging is tested, satisfy ∀a ∈ {R, G, B}, |R_a − R′_a| ≤ b(R, R′) ≤ √(b²(R) + b²(R′)) (thus, P(R, R′) = true). Since A holds, the segmentation obtained is an over-merging of I∗. □

Because of the choice of our P, it also satisfies the quantitative bounds of Nielsen and Nock [5], i.e. we can show that with high probability, the over-merging of Theorem 4 shall not result in too large an error. We refer the reader to Nielsen and Nock [5] for the explicit values of this error bound, not needed here. In the sequel, we shall refer to our modified version of PSIS as SRRB, which stands for Statistical Region Refinement (with Bias). We have chosen to use refinement rather than merging, because it shall be clear from the experiments that the bias may help to improve both the merging and the splitting of regions, even when SRRB formally belongs to region merging algorithms.
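In code, the modified predicate amounts to two small changes to the earlier sketch: the cardinality bound |R_l| = (l + 1)^{min{l,g}} inside b, and the quadratic combination of the two bounds (again an illustration, not the authors' code; the constants are placeholders).

```python
import math

def b_improved(card, Q=32, g=256, delta=1e-6):
    """b(R) of Eq. (1) with the tighter bound |R_l| = (l + 1)^{min(l, g)}
    of Section 2.3 (constants are illustrative placeholders)."""
    ln_Rl = min(card, g) * math.log(card + 1)
    return g * math.sqrt((ln_Rl + math.log(1.0 / delta)) / (2.0 * Q * card))

def merge_predicate_eq6(avg_R, avg_Rp, card_R, card_Rp, **kw):
    """Improved merging predicate of Eq. (6): per-channel differences of
    observed averages are compared to sqrt(b^2(R) + b^2(R'))."""
    thr = math.sqrt(b_improved(card_R, **kw) ** 2 + b_improved(card_Rp, **kw) ** 2)
    return all(abs(a - ap) <= thr for a, ap in zip(avg_R, avg_Rp))
```

Plugged into the PSIS sketch of Section 2.1, this gives an unbiased merging of the kind SRRB builds on.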

3. Grouping with bias

It shall be useful in this section to think of I as containing vertices instead of pixels, and the (4-)connexity as defining edges, so that I can be represented by a simple graph (V, E). Following Yu and Shi [1], we define a grouping bias to be a collection of user-defined disjoint subsets of V: 𝒱 = {V1, V2, . . . , Vm}. Any feasible solution to the constrained grouping problem is a partition of V into connex components (thus, a partition of the pixels of I into regions), such that

(i) any such connex component intersects at most one element of 𝒱, and
(ii) ∀ 1 ≤ i ≤ m, any element of Vi is included into one connex component.

The first condition states that no region in the segmentation of I may contain elements from two distinct subsets in 𝒱, and condition (ii) states that each vertex in 𝒱 belongs to a region.

For any region R of I, we define a model for the region to be a subset of R, without connexity constraints on its elements, containing one vertex of some element of 𝒱. The term model makes statistical sense because any Vi ∈ 𝒱 with |Vi| > 1 may represent a single object for the user, but composed of different statistical regions ("models") of I∗. Each element of Vi can thus represent one theoretical pixel of each of the different statistical sub-regions whose union makes the object perceived by the user. For example, the hat of lena in Fig. 1 visually contains two parts belonging to the same conceptual object (a bandeau and the hat itself). There is a significant gradient between these two parts of the hat. So, one may imagine creating some Vi ∈ 𝒱 by pointing two pixels, one on the top of the hat, and one somewhere on the bandeau (or on the bright hat's border whose color is similar to the bandeau). This indicates that these two parts with different colors define two models that belong to the same object, and should thus be considered as a single region in the segmentation's result (notice that this is indeed the case in our biased segmentation's result in Fig. 1).

Grouping with bias has another advantage, since it naturally handles occlusions, as the user may specify that two (or more) models that are not connected represent the same region. This could be the case in Fig. 1 for lena's hair, for example.

Any region R defined by some Vi ∈ 𝒱 is a partition of models, each represented by one element of Vi. We name regions without vertices from 𝒱 "model-free". There are therefore two types of regions in our segmentation: those that are model-free, and those being a partition of sub-regions, each defined by the elements of one subset in 𝒱. Our region merging algorithm keeps this as an invariant: thus, merging a model-free region and a model-based region results in the merging of the first region into one model of the second. The modification of the approach in Refs. [2,5] consists in first making each Vi defined by the user, and then, through the traversing of S′_I, replacing the merging stage (the if condition in the for...to loop of Algorithm 1) by the following new if condition (∀(p, p′) ∈ S′_I); a sketch of this decision procedure is given right after the rules below:

if R(p) and R(p′) are model-free, then we compute P(R(p), R(p′)) as in Algorithm 1, and eventually merge them;

else if both contain models, then we do not merge them. Indeed, in that case, either the models are defined by vertices of different subsets in 𝒱 (and we obviously do not have to merge them), or they are defined by vertices of the same subset of 𝒱; in the latter case, however, they have been defined by the user as different sub-regions of the same object, so we keep these sub-regions distinct until the end of the algorithm;

else consider without loss of generality that R(p) contains models and R(p′) does not. We first compute P(M(p), R(p′)), with M(p) the model of R(p) adjacent to R(p′) (notice that p ∈ M(p)):

    if it returns true, then a merge is done: we fold R(p′) into M(p); thus, M(p) grows (and not the other models of R(p)), as after this merging it integrates R(p′);

    else we search for the best matching model M′ of R(p) w.r.t. R(p′) (the one minimizing max_{a∈{R,G,B}} |M′_a − R(p′)_a|), and eventually fold R(p′) into M′ iff P(M′, R(p′)) returns true.
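To summarize the decision rules above in one place, here is a compact Python sketch of the biased merge test. The data layout and names are illustrative (only the decision logic follows this section); the caller is assumed to provide region/model statistics and to perform the actual unions.

```python
def biased_merge_decision(models_p, models_q, stats_p, stats_q,
                          adjacent_model, predicate):
    """Decide the action for a tested pair of regions (R(p), R(p')) in SRRB.

    models_p / models_q : lists of (avg, size) pairs for the models carried by
        each region (empty list = model-free); stats_p / stats_q are the
        (avg, size) of the whole regions; adjacent_model is the index, on the
        model-carrying side, of the model containing the tested pixel.
    Returns ("merge",), ("fold", side, model_index), or None.
    """
    if not models_p and not models_q:
        # Both model-free: behave exactly as in Algorithm 1.
        avg_p, n_p = stats_p
        avg_q, n_q = stats_q
        return ("merge",) if predicate(avg_p, avg_q, n_p, n_q) else None
    if models_p and models_q:
        # Both carry models: never merged during the main pass.
        return None
    # Exactly one side carries models; call it side "p" w.l.o.g.
    side = "p" if models_p else "q"
    models, free_stats = (models_p, stats_q) if side == "p" else (models_q, stats_p)
    free_avg, free_n = free_stats
    # First try the model adjacent to the tested pixel.
    avg_m, n_m = models[adjacent_model]
    if predicate(avg_m, free_avg, n_m, free_n):
        return ("fold", side, adjacent_model)
    # Otherwise try the best matching model (smallest max channel difference).
    best = min(range(len(models)),
               key=lambda i: max(abs(a - b) for a, b in zip(models[i][0], free_avg)))
    if predicate(models[best][0], free_avg, models[best][1], free_n):
        return ("fold", side, best)
    return None
```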


At the end of the algorithm, all models of each Vi are merged altogether in the segmentation's output.

The theoretical properties enjoyed by this extension to grouping with bias are the same as for the unbiased approach given in Section 2, provided we make the assumption that all the vertices of the subsets in 𝒱 come from different statistical regions of I∗. Recall that this is sound with the fact that the aim of the bias is precisely to make it possible for the observer to integrate in the same perceptual object different statistical (true) regions of I∗. For example, we have:

Theorem 5. Suppose that the vertices of 𝒱 come from Σ_{i=1}^{m} |Vi| different statistical regions of I∗. Then, with probability ≥ 1 − O(|I|δ), the segmentation on I satisfying A is an over-merging of I∗.

Quantitative bounds on the error are also possible, following Nielsen and Nock [5].

4. Experiments

We have run SRRB on a benchmark of digital pictures of various contents and difficulty levels, to test its ability to improve the segmentation quality over the unbiased approach, while using a bias as slight as possible. While looking at the experiments performed in Figs. 2–7, the reader may keep in mind that images are segmented as they are, i.e. without any preprocessing, and with the same values for the parameters, following Nock [2] and Nielsen and Nock [5]:

Q = 32,   (7)

δ = 1 / (3|I|²).   (8)

Thus, the quality of the results may be attributed only to SRRB, and not to any content-specific tuning or preprocessing optimization.
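With the sketches of Section 2, these settings translate, for a hypothetical RGB image img, into something like the following glue code (illustration only):

```python
from functools import partial

def segment_with_paper_parameters(img):
    """img: list of rows of (R, G, B) triples.  Applies Eqs. (7)-(8) to the
    earlier sketches; purely illustrative plumbing."""
    n_pixels = len(img) * len(img[0])
    delta = 1.0 / (3.0 * n_pixels ** 2)                                 # Eq. (8)
    predicate = partial(merge_predicate_eq6, Q=32, g=256, delta=delta)  # Q of Eq. (7)
    return psis(img, predicate)
```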

4.1. SRRB's model choice rules, and first experiments

Fig. 2 shows detailed results on the image lena, which have been previously outlined in the introduction. We have chosen this image because it is one of the most used benchmarks in image processing, and it has features making it difficult to segment, such as its blurred regions, its thin differences between perceptually distinct regions (e.g. her face and her shoulder), its single majority tone (reddish), and its strong gradients (e.g. her hat). The result of SRRB with bias displays four model-based regions. Each such region is defined by only two models. In the result of SRRB with bias, the symbols denote the pixels that have been pointed by the user to define each element of 𝒱. Different symbols denote different elements of 𝒱, and thus different perceptual regions for the user.

The result of SRRB with bias displays a very good segmentation of the image. The girl's hat and her shoulder, two very difficult regions to segment, are almost perfectly segmented. The quality of the segmentation is to be evaluated in the light of the bias given. The user has specified only eight pixels for the whole image. Obviously, these pixels have not been chosen at random. Two simple "rules of thumb" appear to be enough for a limited processing of most of the images, while obtaining the significant improvements over unbiased segmentation observed on lena, as well as on images such as those of Fig. 3.

The first and most important is a gradient rule: the sub-regions of smoothest gradients between two perceptually distinct regions are good places for specifying models. This prevents the order from merging distinct perceptual sub-regions in the early steps of SRRB, during which the statistical accuracy of the merging predicate is the smallest. In image lena, the two red triangles defining her face are approximately located in smooth gradient parts between the face and the hat, and between the face and the shoulder, respectively.

The second is a size rule: during the merging steps, no two models mix altogether into a single model, even when they belong to the same region. Furthermore, the merging of a model-free region into a model of another region does not necessarily imply that they are connected, as connexity is ensured only with the region containing the model. Thus, if the user specifies very small models in perceptually distinct regions, this may yield a significant over-merging for the regions to which such models belong. For example, if we had put a red triangle in lena's right eye, her hair would have been merged with her face. Such a mistake does not really represent an additional interaction burden, as putting a single different additional model in the hair would solve the problem. However, this simple rule of leaving too small regions model-free may save a significant number of models, and thus reduce the interaction time for the user.

For relatively "easier" pictures, such as those of Fig. 3, sparse and simple choices yield dramatic improvements over unbiased grouping: the stairs and the tower of castle-1 are almost perfectly segmented, and the castle of castle-2 is almost perfectly extracted as a whole (notice the model in the drainpipe, which prevents it from being merged with the bushy tree).

In the next subsection, we make further comparisons of SRRB with biased normalized cuts (NCuts).

4.2. SRRB vs. NCuts

Yu and Shi [1,7] have proposed a mathematically appealing extension of the original unbiased normalized cuts problem of Shi and Malik [3]. In the original problem, the image is transformed into a weighted graph, and the objective is to make a partition of this graph into a fixed number of connex components, so as to minimize the cut between the components and maximize an association (within-component) measure.


Fig. 3. Segmentation results on images castle-1 (m = 3, |V1| = 5, |V2| = 4, |V3| = 3) and castle-2 (m = 5, |V1| = 3, |V2| = 1, |V3| = 1, |V4| = 1, |V5| = 2). Conventions follow Fig. 2.

The solution of this problem, even when computationally intractable [3], can be fairly well approximated by the eigen decomposition of a stochastic matrix. The extension of this technique to biased grouping involves constrained eigenvalue problems. These problems make it difficult to handle non-transitive constraints, such as the constraint "must not belong to the same region", which belongs to the core of the biased grouping problem (see Section 3). In Refs. [1,7], the bias takes the form of transitive constraints, such as "must belong to the same region". In that case, nothing is assumed for the pixels with different labels. In order to make a fair comparison with NCuts, we have decided to study a particular biased grouping problem whose constraints fit in this category, and for which the NCuts give some of their best results [7]: the segregation of the foreground from the background of an image. This problem is addressed with NCuts by making a frame (10 pixels wide in most of our experiments) around an image, and then constraining the grouping to treat this frame as a single region.

Fig. 4 presents some results that have been obtained for three animal pictures, in which we want to segregate the animal from its background. Fig. 5 presents additional results on an animal, and on a flower.


Fig. 4. SRRB vs. NCuts on images leopard (m = 2, |V1| = 4, |V2| = 5), bat (m = 2, |V1| = 2, |V2| = 7) and badger (m = 2, |V1| = 10, |V2| = 4). For each image, the result of SRRB with bias and the largest regions found by SRRB and NCuts are shown. The remaining regions follow the convention of Fig. 2.

The results display that the NCuts perform reasonably well, but fail to make accurate segmentations of regions with deep localized contrasts, such as the head of the badger, the speckled coat of the leopard, or the flower. Because these contrasts make very small local cut values for NCuts, they tend to be selected as region frontiers, and transitive constraints do not seem to be enough to prevent their split. This is clearly a drawback that SRRB does not suffer from, as it manages very accurate segmentations of all animals. Even the bee of the flower image receives an accurate segmentation, which segregates the insect from both the background and the flower, while NCuts essentially succeed only in making an accurate segregation of the insect from the background. This flower image is an example of a difficult image for grouping algorithms based only on low-level cues: due to the distribution of colors, it is virtually impossible to make a single region out of the flower, whose colors are contrasted, while preventing the bee from being merged with the background. Eight models pointed are enough to solve this problem with SRRB.

Since our merging predicate relies on comparing observed averages, making segmentations without bias in SRRB faces the problem of under-merging for regions with strong gradients, such as the ceiling of the cave in the bat image of Fig. 4.


Fig. 5. More results on images chamois (m = 3, |V1| = 5, |V2| = 3, |V3| = 3) and flower (m = 3, |V1| = 3, |V2| = 3, |V3| = 2). Conventions follow Fig. 4.

Fig. 6. Comparison of SRRB and NCuts on the BSDB image woman (m = 2, |V1| = 3, |V2| = 3). The "human" result is a segmentation from a human, taken from the BSDB (regions are displayed white with black borders). Other conventions follow Fig. 4.


Fig. 7. Experiments on the BSDB. Segmentation conventions follow Fig. 6. The numbers below the results of SRRB (w/) are m (bold), and the cardinals of the Vi's.

Fortunately, the bias makes it possible to handle gradients in quite an efficient way. The strong gradient of the bat picture comes from the flash of the camera and the rocky irregular ceiling of the cave. The background contains very bright parts (upper left) up to very dark parts (lower right); in that case, only seven models have been necessary to segregate the bat.

All these results have to be appreciated in the light of the amount of bias imposed, and the execution times. SRRB has required no more than a total of 14 pixels pointed in each image. Furthermore, the execution times give a significant advantage to SRRB: each image was segmented in about a second with SRRB, while it took between 5 and 9 min with NCuts. The experiments were run on a Pentium IV 2 GHz PC with 512 MB RAM. Grouping with bias involves an interaction with the user to define the constraints, and a loop between the user and the machine for their optimization: in that case, a program returning its results almost instantly is clearly an advantage.

Fig. 6 shows a first experiment on the Berkeley Segmentation Data set and Benchmark (BSDB, [12]). This database

tation Data set and Benchmark (BSDB,[12]). This data base


presents, on a large number of images, the results of human segmentations. This is particularly relevant for biased grouping, which precisely involves the human in the loop. Again, the task of segregating the woman from her background is much more accurate with SRRB when compared to NCuts: SRRB fits the human segmentation very well, with only six pixels pointed in the image.

4.3. SRRB vs. human on the BSDB images

We have performed more experiments on the BSDB, to compare SRRB against human segmentations. The problems considered are more general than the background/foreground segregation, which has led us to somewhat leave NCuts aside for these experiments (whose best results were above average), and focus on the way the biased segmentation of SRRB could come close to that of a human. The results are presented in Fig. 7. Apart from image cave, we have not necessarily tried to fit the human segmentations observed in the BSDB. Rather, we have tried to fit the way we would segment the images, and then we have chosen in the BSDB a human segmentation close to the result obtained. Notice that the images chosen are, on average, quite difficult. A typically difficult example is the sand image, in which very little can be obtained from the colors only. In this image, the NCuts without bias have obtained almost exactly the same result as SRRB without bias. The biased segmentation, which makes extensive use of the gradient rule to choose the models (Section 4.1), has obtained a very good segmentation of the picture with few models, and the result closely follows the human segmentation. On image cave, we have tried to fit as exactly as possible a human segmentation of the BSDB (shown), using the fewest number of models. Without extensive tuning (this took only a few tries), we have almost exactly matched the human segmentation while using only 12 models.

5. Conclusion

In this paper, we have proposed a novel method for segmenting an image with a user-defined bias. The bias takes the form of pixels pointed by the user on the image, to define regions with distinctive sub-parts. The algorithm proposed relies on an unbiased segmentation algorithm [2,5], which we first modify and improve. The extension to biased segmentation keeps both the fast processing time and the theoretical properties of the former unbiased approaches onto which it is based. Experimental results show dramatic improvements of the biased segmentations over the unbiased results, and the biased results compare very favourably to previous approaches [1,7]. All these benefits come at a negligible additional computational cost when compared to unbiased grouping, while biased eigenvalue-based segmentation approaches suffer from slow-downs of orders of magnitude [1]. From an experimental standpoint, the running time of our biased approach is also an order of magnitude smaller than those of Yu and Shi [1,7], and it is compatible with the constraint that grouping with bias involves close interactions with the user. We also emphasize the slight bias we use, whose quality improvements make it a valuable companion well worth the try for the segmentation of complex images, such as those obtained with digital media.

Code availability: SRRB can be obtained from the authors' webpages.

References

[1] S.-X. Yu, J. Shi, Grouping with bias, in: Advances in Neural Information Processing Systems, vol. 14, 2001, pp. 1327–1334.

[2] R. Nock, Fast and reliable color region merging inspired by decision tree pruning, in: Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, IEEE Computer Society Press, Silver Spring, MD, 2001, pp. 271–276.

[3] J. Shi, J. Malik, Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 22 (2000) 888–905.

[4] P.F. Felzenszwalb, D.P. Huttenlocher, Image segmentation using local variations, in: Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, IEEE Computer Society Press, Silver Spring, MD, 1998, pp. 98–104.

[5] F. Nielsen, R. Nock, On region merging: the statistical soundness of fast sorting, with applications, in: Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, IEEE Computer Society Press, Silver Spring, MD, 2003, pp. 19–27.

[6] C.T. Zahn, Graph-theoretic methods for detecting and describing gestalt clusters, IEEE Trans. Comput. 20 (1971).

[7] S.-X. Yu, J. Shi, Segmentation given partial grouping constraints, IEEE Trans. Pattern Anal. Mach. Intell. 26 (2004) 173–183.

[8] S. Geman, D. Geman, Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images, IEEE Trans. Pattern Anal. Mach. Intell. 6 (1984) 721–741.

[9] J. Luo, C. Guo, Perceptual grouping of segmented regions in color images, Pattern Recognition 36 (2003) 2781–2792.

[10] D. Comaniciu, P. Meer, Robust analysis of feature spaces: color image segmentation, in: Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, IEEE Computer Society Press, Silver Spring, MD, 1997, pp. 750–755.

[11] S. DasGupta, Learning mixtures of Gaussians, in: Proceedings of the 40th IEEE Symposium on the Foundations of Computer Science, 1999, pp. 634–644.

[12] UC Berkeley vision group, The Berkeley segmentation dataset and benchmark, 2004, http://www.cs.berkeley.edu/projects/vision/grouping/segbench/.

[13] C. Fiorio, J. Gustedt, Two linear time Union-Find strategies for image processing, Theoret. Comput. Sci. 154 (1996) 165–181.

[14] C. McDiarmid, Concentration, in: M. Habib, C. McDiarmid, J. Ramirez-Alfonsin, B. Reed (Eds.), Probabilistic Methods for Algorithmic Discrete Mathematics, Springer, Berlin, 1998, pp. 1–54.