Patch-based Generative Shape Model andMDL Model Selection ...image.diku.dk/ganz/Publications/miccai2010_final.pdf · Patch-based Generative Shape Model andMDL Model Selection for

Patch-based Generative Shape Model and MDL Model Selection for Statistical Analysis of Archipelagos

Melanie Ganz1,2, Mads Nielsen1,2, and Sami Brandt2

1 DIKU, University of Copenhagen, Denmark, [email protected]
2 Nordic Bioscience Imaging, Herlev, Denmark

Abstract. We propose a statistical generative shape model for archipelago-like structures. These kinds of structures occur, for instance, in medical images, where our intention is to model the appearance and shapes of calcifications in x-ray radiographs. The generative model is constructed by (1) learning a patch-based dictionary for possible shapes, (2) building up a time-homogeneous Markov model to model the neighbourhood correlations between the patches, and (3) automatic selection of the model complexity by the minimum description length principle. The generative shape model is proposed as a probability distribution of a binary image where the model is intended to facilitate sequential simulation. Our results show that a relatively simple model is able to generate structures visually similar to calcifications. Furthermore, we used the shape model as a shape prior in the statistical segmentation of calcifications, where the area overlap with the ground truth shapes improved significantly compared to the case where the prior was not used.

1 Introduction

In the field of computer vision as well as medical imaging one of the essential tasks is to segment one or several objects from the background. In order to perform a segmentation task, it is often easier to build a model of the interesting objects including their shape and/or texture. Shape modeling can be performed via deformable contour or level set methods, while histogram learning or classification approaches can be applied in texture modeling. But in the case of shapes that, e.g., have archipelago structures, traditional methods for shape and texture modeling fail and one needs other methods, likely based on statistics, see Fig. 1. Similar research on modeling based on statistics has been done before in geostatistics [1, 2] as well as in computer vision [3, 4].

Many biological segmentation problems have to deal with archipelago-like structures, e.g., brain lesions as observed in MRI or calcified deposits in the arteries observed by x-ray or CT imaging methods. In this paper, we will therefore focus on an example in lumbar aortic x-ray projections, where our goal is to automatically segment lumbar aortic calcifications that are related to cardiovascular disease (CVD) and are good predictors of it [5–7]. The initial segmentation is performed by a pixel-wise classification algorithm, in our case random forests, trained by manual annotations of calcified lesions. The manual annotations are binary, where the value 1 equals the detection of a calcified pixel, while the value 0 corresponds to a background pixel. In order to improve the segmentation result by the pixel-wise classification we want to build a generative shape model for binary patches and use it as a prior model in the classification and shape analysis.

Fig. 1: Illustrations of archipelago-like structures (a) in nature and (b) in a lumbar aortic x-ray; (c) simulation results of archipelago-like structures in lumbar aortic x-rays.

2 Statistical Objective

We are interested in the general segmentation problem, where the likelihood of the pixel data is to be combined with a shape prior. The general solution for the segmentation problem will be the posterior distribution

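$$
\underbrace{p(\mathbf{u} \mid D)}_{\text{Posterior}} \propto \underbrace{p(D \mid \mathbf{u})}_{\text{Likelihood}} \, \underbrace{p(\mathbf{u})}_{\text{Prior}}, \tag{1}
$$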

where u is a vector of latent variables or pixel labels, and D is the observed image data. Our goal is to construct the shape prior p(u) that statistically models archipelago structures such as those shown in Fig. 1.

3 Generative Shape Model

To construct a prior model for archipelago-like structures, we will first build a shape code book (Section 3.1) that contains the patch prototypes which serve as the building blocks of the structures. The grammar that models the neighbourhood relations between the patches will be constructed as a time-homogeneous Markov model (Section 3.2). The patch size and number of patches in the code book will be selected by the minimum description length (MDL) principle (Section 3.3), which completes our prior model for archipelagos.

Fig. 2: The causal neighbourhood for the patch v, which is a subset of the image represented by u.

Fig. 3: The training set of lumbar aortic calcifications.

3.1 Shape Code Book

Let the matrix X contain the n training patches of size m × m, each stacked into a column vector, where the training patches are obtained by sliding a window of size m × m over the training images. The patches are to be summarized by the $m^2 \times k$ patch dictionary, D, containing the binary patch prototypes, which ideally minimize

$$
E = \| \mathbf{X} - \mathbf{D}\mathbf{A} \|_{\mathrm{fro}}^{2}, \tag{2}
$$

where, for a fixed j, $a_{ij} = 1$ for only one $i = i'$, while $a_{ij} = 0$ when $i \neq i'$ [?]. A has the size k × n and thus represents the sparse representation of X in terms of D. In general, we should minimize (2) over both D and A, but because it is a combinatorial discrete optimization problem, we are satisfied by approximating the solution. We thus divide the problem into two parts:

1. We find the code book D by finding patch prototypes via clustering the training patches by K-means [8], and thresholding the prototypes to binary.

2. We find the optimal A, given the code book D, by picking the prototype for each j that minimizes (2).

Clearly, the code book is not globally optimal, but it gives us a fair model class with varying patch sizes m × m and number of clusters k. The model selection, i.e., determining m and k, will be described in Section 3.3.
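The two-step approximation above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names (`extract_patches`, `learn_code_book`, `assign`), the plain Lloyd iteration, the random initialisation, and the 0.5 binarisation threshold are all assumptions made for the example.

```python
import numpy as np

def extract_patches(image, m):
    """Slide an m x m window over a binary image; return the patches as columns (m*m x n)."""
    rows, cols = image.shape
    patches = [
        image[r:r + m, c:c + m].reshape(-1)
        for r in range(rows - m + 1)
        for c in range(cols - m + 1)
    ]
    return np.array(patches, dtype=float).T

def learn_code_book(X, k, n_iter=20, seed=0):
    """Step 1: K-means (Lloyd iterations) on the columns of X, then threshold to binary."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # Assumed initialisation: k distinct training patches chosen at random.
    centres = X[:, rng.choice(n, size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance between every centre and every patch, shape (k, n).
        d2 = ((centres[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
        labels = d2.argmin(axis=0)
        for i in range(k):
            members = X[:, labels == i]
            if members.shape[1] > 0:  # leave empty clusters unchanged
                centres[:, i] = members.mean(axis=1)
    # Assumed threshold of 0.5 for turning the cluster means into binary prototypes.
    return (centres >= 0.5).astype(float)

def assign(X, D):
    """Step 2: for each column j pick the single prototype with the smallest squared error."""
    d2 = ((D[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    labels = d2.argmin(axis=0)
    A = np.zeros((D.shape[1], X.shape[1]))
    A[labels, np.arange(X.shape[1])] = 1.0  # exactly one non-zero entry per column
    return A, labels
```

With `X = extract_patches(img, m)`, `D = learn_code_book(X, k)` and `A, labels = assign(X, D)`, the error of Eq. (2) is `np.linalg.norm(X - D @ A) ** 2`.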

3.2 Time-Homogeneous Markov Model

The patch-based model does not yet describe the archipelago-like structures well, even though we could easily generate a random image that has a similar patch histogram. We could trivially count the occurrence of each patch in the training images and generate a random image by drawing random patches from the empirical patch distribution. The problem, however, is that the neighbouring patches are not independent, i.e., the neighbour patches significantly constrain the outcome of the patch. To take these neighbour correlations into account we suggest using a Markov model. This means that we assume that the patch probability depends only on its neighbours.

Another thing that needs to be taken into consideration when designing the prior model is that sampling from the model should be feasible. We do this by assuming a time-homogeneous Markov model, i.e., we assume that the current patch probability depends only on the neighbours that have been processed, i.e., the causal neighbours, see Fig. 2. In practice, the probability distribution of the patches $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$ becomes

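\begin{align}
p(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N) &= p(\mathbf{v}_1)\, p(\mathbf{v}_2 \mid \mathbf{v}_1) \cdots p(\mathbf{v}_N \mid \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{N-1}) \tag{3} \\
&= \prod_{i=1}^{N} p(\mathbf{v}_i \mid \mathcal{N}_{\mathbf{v}_i}(\mathbf{v})), \tag{4}
\end{align}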

where $\mathcal{N}_{\mathbf{v}_i}$ denotes the causal neighbourhood of $\mathbf{v}_i$, i = 1, 2, ..., N, and N is the total number of distinct patches of size m × m in the image. This construction allows sequential simulation of the patch distribution by first drawing the patch $\mathbf{v}_1$ from $p(\mathbf{v}_1)$, then $\mathbf{v}_2$ from $p(\mathbf{v}_2 \mid \mathbf{v}_1)$, and so on.
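The sequential simulation described above can be sketched as follows. The text does not spell out the causal neighbourhood, so this example assumes it consists of the left and the upper patch, visited in raster order. The label 0 used for neighbours outside the image, the pseudo-count `alpha`, and all function names are likewise assumptions for illustration.

```python
import numpy as np

def estimate_transitions(label_maps, k, alpha=1e-3):
    """Count H[left, up, current] over training label maps and normalise the counts."""
    H = np.zeros((k, k, k))
    for L in label_maps:
        rows, cols = L.shape
        for r in range(rows):
            for c in range(cols):
                left = L[r, c - 1] if c > 0 else 0  # assumed boundary label
                up = L[r - 1, c] if r > 0 else 0    # assumed boundary label
                H[left, up, L[r, c]] += 1
    # Assumed pseudo-count so that unseen contexts still give a valid distribution.
    P = H + alpha
    P /= P.sum(axis=2, keepdims=True)
    return H, P

def simulate(P, shape, seed=0):
    """Draw a label map patch by patch in raster order from p(current | left, up)."""
    rng = np.random.default_rng(seed)
    k = P.shape[0]
    L = np.zeros(shape, dtype=int)
    for r in range(shape[0]):
        for c in range(shape[1]):
            left = L[r, c - 1] if c > 0 else 0
            up = L[r - 1, c] if r > 0 else 0
            L[r, c] = rng.choice(k, p=P[left, up])
    return L

def render(L, D, m):
    """Tile the m x m prototype of each label (columns of D) onto the pixel grid."""
    rows, cols = L.shape
    img = D[:, L].reshape(m, m, rows, cols)
    return img.transpose(2, 0, 3, 1).reshape(rows * m, cols * m)
```

Here `label_maps` is a list of integer arrays holding the prototype index of each patch, and `render(simulate(P, (rows, cols)), D, m)` turns a simulated label map into a binary image.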

3.3 Model Selection

To use our proposed model on data we need to find the optimal cluster number k and optimal patch size m and estimate the transition probabilities for our Markov model. We decided to use MDL for the model selection due to its tangible definition of the model selection problem: the best model is defined to have the minimal lossless transmission code length. MDL fits our purpose exactly, since we are dealing with a binary problem for which it is easy to construct a compression model. Moreover, MDL provides a natural definition for noise, as noise is considered to be everything that cannot be compressed by the model [9].

Let us first derive the code length for our model using a two-part coding model. The total code length of our model in bits is

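$$
L = L_{\mathrm{par}} + L_{\mathrm{res}}, \tag{5}
$$

where $L_{\mathrm{par}} = L_D + L_A$ is the code length of the model parameters and $L_{\mathrm{res}}$ the code length of the residual. We choose to code D simply as a binary matrix, so one needs $m^2 \times k$ bits to encode it, hence

$$
L_D = m^2 \times k + \underbrace{\lceil \log_2(\max k) \rceil}_{k} + \underbrace{\lceil \log_2(\max m) \rceil}_{m}, \tag{6}
$$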

where the latter two terms, the code lengths for k and m, are constant and can thus be dropped. The content of A can be encoded by using the time-homogeneous Markov model as soon as the 3-dimensional histogram H of patch labels and their causal neighbourhoods is available. The histogram can be encoded either, if sparse, by storing its $N_{\mathrm{nnz}}$ non-zero bin indices and the counts in such bins, or otherwise by storing the counts in all the bins. In this way, assuming an ideal coding method,

$$
L_A = \underbrace{\min\Big( N_{\mathrm{nnz}} \cdot \lceil \log_2(n) \rceil \cdot 2 + \overbrace{\lceil \log_2(N_{\mathrm{nnz}}) \rceil}^{N_{\mathrm{nnz}}},\; k^3 \cdot \lceil \log_2(n) \rceil \Big)}_{H} + \underbrace{-\sum_{k} \log_2(p_k)}_{\text{data}}, \tag{7}
$$

where the conditional probability $p_k = p(\mathbf{v}_k \mid \mathcal{N}_{\mathbf{v}_k}(\mathbf{v}))$ of the patch k is computed from the histogram H.

Lastly, let us consider the residual encoding, where the residual of our model is ε = X − DA and each pixel can take only the values {−1, 0, 1}. We can thus code ε by transmitting only the indices of first the negative and then the positive entries of the residual. In this way the code length for ε in bits becomes

$$
L_{\mathrm{res}} = q \, \lceil \log_2(N_{\mathrm{pix}}) \rceil + \underbrace{\log_2 \lceil q \rceil}_{q}, \tag{8}
$$

where q is the number of non-zero residuals and $N_{\mathrm{pix}}$ is the number of pixels in the image. The latter term is bounded by $\log_2 \lceil N_{\mathrm{pix}} \rceil$ and can thus be dropped.
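As a rough illustration, the total code length of Eq. (5) can be evaluated as below once the droppable constant terms are omitted. The sketch reads the factor 2 in Eq. (7) as one index plus one count per non-zero histogram bin, which is an interpretation rather than something the text states. The function names and the argument layout are assumptions.

```python
import math
import numpy as np

def ceil_log2(x):
    """Number of bits needed for a value in a range of size x (0 bits when x <= 1)."""
    return math.ceil(math.log2(x)) if x > 1 else 0

def mdl_code_length(m, k, n, H, cond_probs, q, n_pix):
    """Two-part code length in bits, L = L_D + L_A + L_res, without the constant terms.

    H          : k x k x k histogram of patch labels and causal neighbours
    cond_probs : p_k for every coded patch, each strictly positive
    q          : number of non-zero residual pixels
    n_pix      : number of pixels in the image
    """
    # Eq. (6): the binary dictionary costs m^2 * k bits.
    L_D = m * m * k
    # Eq. (7): the histogram is stored sparsely or densely, whichever is shorter.
    n_nnz = int(np.count_nonzero(H))
    sparse = n_nnz * ceil_log2(n) * 2 + ceil_log2(n_nnz)
    dense = k ** 3 * ceil_log2(n)
    data = -float(np.sum(np.log2(cond_probs)))
    L_A = min(sparse, dense) + data
    # Eq. (8): one pixel index per non-zero residual.
    L_res = q * ceil_log2(n_pix)
    return L_D + L_A + L_res
```

Dividing the returned value by the number of pixels gives a per-pixel code length that can be compared across candidate pairs (m, k).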

4 Segmentation Using the Shape Prior

Our final goal is to use the prior model in segmentation by simulating the posterior (1) as

$$
p(\mathbf{u} \mid D) \equiv p(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N \mid D). \tag{9}
$$

Assuming a separable likelihood, we may use the same time-homogeneous construction for which the prior was designed. Hence, the posterior at the time n becomes

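$$
p(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n \mid D) = p(\mathbf{v}_1 \mid D)\, p(\mathbf{v}_2 \mid \mathbf{v}_1, D) \cdots p(\mathbf{v}_n \mid \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{n-1}, D), \tag{10}
$$

where

\begin{align}
p(\mathbf{v}_n \mid \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{n-1}, D) &\propto p(D \mid \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n)\, p(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n) \tag{11} \\
&\propto \Big( \prod_{k:\, k \in \mathbf{v}_n} P(U_k = 0)^{1 - u_k}\, P(U_k = 1)^{u_k} \Big)\, p(\mathbf{v}_n \mid \mathcal{N}_{\mathbf{v}_n}), \tag{12}
\end{align}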

where k denotes the element of the latent variable vector u and $P(U_k = 0)$ and $P(U_k = 1)$ are the probabilities of the pixel k having the label 0 or 1. These probabilities are given by the pixel classifier.

We thus assume that the posterior is similarly sequentially simulated by first drawing the patch $\mathbf{v}_1$ from $p(\mathbf{v}_1 \mid D)$, then $\mathbf{v}_2$ from $p(\mathbf{v}_2 \mid \mathbf{v}_1, D)$, etc.
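A single step of this sequential posterior simulation might look as follows. The sketch assumes that each patch is restricted to the code book prototypes, so the conditional posterior of Eq. (12) becomes a discrete distribution over the k columns of the dictionary. The function name, the clipping constant `eps`, and the log-domain evaluation are choices made for the example.

```python
import numpy as np

def patch_posterior(pixel_prob, D, prior, eps=1e-12):
    """Posterior over code book prototypes for one patch.

    pixel_prob : length m*m vector of classifier outputs P(U_k = 1) for the patch pixels
    D          : m*m x k binary dictionary, one prototype per column
    prior      : length k vector p(v_n | causal neighbours) from the Markov model
    """
    # Clip to avoid log(0); eps is an assumed numerical safeguard.
    q = np.clip(np.asarray(pixel_prob, dtype=float), eps, 1.0 - eps)
    # Log of prod_k P(U_k=0)^(1-u_k) P(U_k=1)^(u_k), evaluated for every prototype.
    log_lik = D.T @ np.log(q) + (1.0 - D).T @ np.log(1.0 - q)
    log_post = log_lik + np.log(np.clip(prior, eps, None))
    log_post -= log_post.max()  # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()
```

A patch label can then be drawn with `np.random.default_rng().choice(len(post), p=post)` before moving on to the next patch in the scan order.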


Fig. 4: The dictionary patches retrieved from training on all 18 calcification patches for m = 2 and k = 8.

Fig. 5: A simulated calcification image using the MDL-selected model dictionary patches for m = 2 and k = 8.

5 Experiments

5.1 The Generative Shape Model

In our experiments, we used a training set of 18 manually annotated calcifications from lumbar aortic X-ray radiographs (Fig. 3). We selected the model class from the set of all pairs {m, k} over which we optimized the compression code length, where m ∈ {2, 4, 6, 8, 10, 12} and k ∈ {2, 4, 8, 16, 32, 64}. The normalized code lengths are shown in Fig. 6. MDL selects the model with m = 2 and k = 8.

The dictionary of patches learned by K-means for the model m = 2 and k = 8 is displayed in Fig. 4. We made simulations with the selected model by drawing samples from the constructed shape distribution, as explained in Section 3.2. One sample image is shown in Fig. 5. It can be seen that the shapes are qualitatively similar to the calcification shapes shown in Fig. 3.

5.2 Statistical Segmentation

To complete the experiments we apply the generative shape model as a shape prior on a test set of 81 images displaying lumbar aortic X-ray data. In order to do this we use the shape prior as described in Section 4 combined with the likelihood function.

Fig. 6: The code length per pixel in bits.

m \ k    2      4      8      16     32     64
2        0.15   0.12   0.10   0.23   0.24   0.24
4        0.26   0.22   0.15   0.23   0.72   1.35
6        0.36   0.36   0.21   0.27   1.00   2.15
8        0.50   0.45   0.27   0.32   1.05   2.48
10       0.61   0.53   0.32   0.36   1.11   3.14
12       0.69   0.66   0.37   0.38   1.03   2.95

Color specs: Red 0.00-0.10, Magenta 0.11-0.20, Yellow 0.21-0.30, Green 0.31-0.50, Cyan 0.50-1.00, Blue above 1.00.

Fig. 7: Detailed result: (a) an annotation (ground truth), (b) the corresponding pixel-wise classification probabilities, (c) conditional mean, $\bar{\mathbf{u}} = \frac{1}{N} \sum \mathbf{u}^{(n)}$, of the posterior.

The pixel-wise likelihood was constructed from the pixel classification probabilities, where we used a random forests classifier with a set of 8 Gaussian derivative features. To measure the performance of our segmentation, we draw several samples $\mathbf{u}^{(n)}$, n = 1, 2, ..., from the posterior distribution $p(\mathbf{u} \mid D)$ and estimate the expected value of the scoring function $f_{\mathrm{eval}}(\mathbf{u}; \mathbf{u}_{\mathrm{ann}})$, where $\mathbf{u}_{\mathrm{ann}}$ denotes the ground truth annotation, or

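\begin{align}
E\{ f_{\mathrm{eval}}(\mathbf{u}; \mathbf{u}_{\mathrm{ann}}) \mid D \} &= \int f_{\mathrm{eval}}(\mathbf{u}; \mathbf{u}_{\mathrm{ann}})\, p(\mathbf{u} \mid D)\, d\mathbf{u} \tag{13} \\
&\approx \frac{1}{N} \sum_{n} f_{\mathrm{eval}}(\mathbf{u}^{(n)}; \mathbf{u}_{\mathrm{ann}}) = \overline{f_{\mathrm{eval}}}(\mathbf{u}^{(n)}; \mathbf{u}_{\mathrm{ann}}). \tag{14}
\end{align}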

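We compare the resulting mean score with the value of $f_{\mathrm{eval}}(\mathbf{u}_{\mathrm{ref}}; \mathbf{u}_{\mathrm{ann}})$, where $\mathbf{u}_{\mathrm{ref}}$ is the classification probability map thresholded at 0.5 according to training data balance. As evaluation function $f_{\mathrm{eval}}(\mathbf{u}; \mathbf{u}_{\mathrm{ann}})$ we use the Jaccard index [10]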

$$
f_{\mathrm{eval}}(\mathbf{u}; \mathbf{u}_{\mathrm{ann}}) = \frac{|I_{u} \cap I_{\mathrm{ann}}|}{|I_{u} \cup I_{\mathrm{ann}}|}, \tag{15}
$$

which measures the area overlap between the binary segmentation results and the manual annotation, which we assume to be our ground truth. The numerical results for $f_{\mathrm{eval}}(\mathbf{u}; \mathbf{u}_{\mathrm{ann}})$ and $f_{\mathrm{eval}}(\mathbf{u}_{\mathrm{ref}}; \mathbf{u}_{\mathrm{ann}})$ are given in Table 1. Our method improves the classification results on average by 30%. The improvement of our modelling over the simple thresholding is statistically significant: a Wilcoxon-Mann-Whitney test yields a significant difference with p = 0.0057. A closer look at a manual annotation compared to our result can be seen in Fig. 7.
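The evaluation can be sketched in a few lines. The Jaccard index and the sample average follow the definitions in the text. The function names and the convention of returning 1.0 when both masks are empty are assumptions, since the text does not cover that case.

```python
import numpy as np

def jaccard(u, u_ann):
    """Area overlap |I_u ∩ I_ann| / |I_u ∪ I_ann| between two binary masks."""
    u = np.asarray(u, dtype=bool)
    u_ann = np.asarray(u_ann, dtype=bool)
    union = np.logical_or(u, u_ann).sum()
    if union == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return np.logical_and(u, u_ann).sum() / union

def mean_score(samples, u_ann):
    """Monte Carlo estimate of the expected score from posterior samples u^(n)."""
    return float(np.mean([jaccard(u, u_ann) for u in samples]))
```

The reference score is obtained by calling `jaccard(prob_map >= 0.5, u_ann)`, where `prob_map` is a placeholder name for the classifier's probability map.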

Table 1: Evaluation of the segmentation results using the Jaccard index against the manual annotation.

For 81 test images        Mean
f_eval(u_ref; u_ann)      0.17
f_eval(u; u_ann)          0.22


6 Conclusion

In this paper, we have proposed a generative model and MDL model selection for shape distributions with structures resembling archipelagos. The model is based on a patch-based description of the shapes combined with a time-homogeneous Markov model that takes patch correlations into account. Our selection for the dictionary, the K-means-clustered patch prototypes, seems reasonable even though it is not strictly optimal in the Frobenius norm. However, searching for the optimal code book is itself a combinatorial optimization problem and less important in practice. As far as the Markov model is concerned, by summarizing blocks of the images by patches it is able to model longer interactions than those of neighbouring pixels. This is important when one wants to generate visually acceptable results with a relatively small amount of training data, which would not be possible with only pixel-based Markov models. In our experiments, the MDL principle yielded a simple model with eight 2 × 2 patches, while the generative model produced realistic structures by simulation. In addition, our segmentation results were promising, indicating that our shape model can be used as a prior distribution in statistical segmentation of calcifications on X-ray image data.

A possible direction in the future could be introducing an appropriate multi-resolution extension of the generative model that would be able to model even longer interactions between patches.

Acknowledgements. We gratefully acknowledge discussions with and assistance from Jesper Moeller and Rasmus P. Waagepetersen and their group.

References

1. Zhang, T., Switzer, P., Journel, A.: Filter-based classification of training image patterns for spatial simulation. Mathematical Geology 38(1) (2006) 63–80

2. Strebelle, S.: Conditional simulation of complex geological structures using multiple-point statistics. Mathematical Geology 34(1) (2002) 1–21

3. Mairal, J., et al.: Discriminative learned dictionaries for local image analysis. In: IEEE Conference on Computer Vision and Pattern Recognition, 2008. (2008)

4. Zhu, S., Wu, Y., Mumford, D.: Filters, random fields and maximum entropy (FRAME): Towards a unified theory for texture modeling. International Journal of Computer Vision 27(2) (1998) 107–126

5. Wilson, P., et al.: Abdominal aortic calcific deposits are an important predictor of vascular morbidity and mortality. Circulation 103(11) (2001) 1529

6. Witteman, J., Kok, F., van Saase, J., Valkenburg, H.: Aortic calcification as a predictor of cardiovascular mortality. The Lancet 2(8516) (1986) 1120–2

7. Bolland, M., et al.: Abdominal aortic calcification on vertebral morphometry images predicts incident myocardial infarction. Journal of Bone and Mineral Research 25 (2009) 1–28

8. MacKay, D.: Information theory, inference, and learning algorithms. Cambridge Univ Pr (2003)

9. Rissanen, J.: MDL denoising. IEEE Transactions on Information Theory 46(7) (2000) 2537–2543

10. Jaccard, P.: The distribution of the flora in the alpine zone. New Phytologist 11(2) (1912) 37–50
