

Image Analysis Project

Christoffer Cronström, Joel Sjöbom

10th December 2012

1 Background and Goals

Everyone loves comics. A large number of them are printed every day, and modern social media allow amateurs to draw and post their own webcomics online at low cost. Reading comics, however, has remained a similar experience ever since the "golden age" in the 1950s. Either the comics are printed on paper and you use your thumbs to flip pages manually; or they are static images on a screen, navigated using pagination links or pinch-and-drag touch interfaces mimicking the paper experience.

Modern technology does however have great potential for providing us with new and exciting ways to read our comics! If the device has a small screen, such as a mobile phone, it might be appropriate to show the panels one by one. On a slightly larger screen, such as a tablet, one might comfortably display two or four panels, whereas a desktop monitor might show more panels than we would normally find on a printed page.

There are other aspects as well, such as resizing speech bubbles and reflowing the text within to provide a better reading experience. Or inputting that text into a text-to-speech system, to synthesise a voice reading the comics for you. A context-aware system could even identify which character is speaking, and use a different synthetic voice for each character, maybe even adapting the voice to fit the physical characteristics of the character.

Due to the unlimited possibilities but limited time for this project, we have picked only one problem and focused on it: the segmentation of a comic into separate panels. The goal is to have an algorithm that can segment a given image into separate panels. It should be general enough to be capable of segmenting different kinds of comics, but can be guided by a priori knowledge of the comic (such as, for instance, the background colour used between panels).[1]

[1] This knowledge would in a production system be provided by a separate system, maybe using information gathered from sources other than the comic itself. One could imagine, for instance, the system learning certain parameters from previous pages in the same comic book, or even using data such as the comic URL or artist name. For this reason we feel comfortable tweaking some parameters manually.

2 Methods

Our first goal was to segment a basic comic, i.e. a comic without any difficult edges or speech bubbles between panels. This led us to the bw package in Matlab, where we spent a fair amount of time researching the different segmentation methods available. In the end we settled for bwconvhull, which finds the convex hull of all objects in a black-and-white picture, and bwlabel, which finds connected areas and numbers them. These methods form the backbone of our program and work great if the panels in the image are fully separated.
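To make the backbone concrete, here is a minimal sketch of this step, assuming a white inter-panel background; the file name and the threshold value are our illustrative choices, not taken from the report:

    img = imread('comic_page.png');       % hypothetical input page
    bw  = rgb2gray(img) < 240;            % non-background pixels; threshold is an assumption
    ch  = bwconvhull(bw, 'objects');      % convex hull of each object separately
    [panelMap, nPanels] = bwlabel(ch);    % connected areas numbered, one label per panel

Each pixel in panelMap then carries the index of the panel it belongs to, which matches the output format described in the Results section below.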

The next step was to deal with speech bubbles that overlap two panels, but this proved to take more time than anticipated. We tried a number of different ways to solve the problem and finally managed to solve it using machine learning. A number of features were used, mainly different derivatives. An image was convolved with the differentiation filters, and the pixel areas containing text were used as training data. This gave us the mean and standard deviation of the feature values for text, which could then be used to find the text in many different images. The areas that tested positive for text were replaced with the background colour of the image, which takes care of the segmentation problem. Unfortunately, it also gives rise to another problem, namely that text lying between panels is lost after the segmentation. This, however, is fairly easy to remedy by detecting which text overlaps the cut-away background and attaching it to the panel it mostly overlaps. This is not a perfect solution, but it works in most cases.
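The report does not show this code, so the following is only a rough sketch of the approach as we read it: derivative features are computed per pixel, and pixels whose features lie close to the learned text statistics (mean mu and standard deviation sigma, measured over hand-marked text regions) are painted over with the background colour. The filter choices, the 2-standard-deviation cut-off and all names are our assumptions; img is the page image from the sketch above:

    f   = im2double(rgb2gray(img));
    dx  = imfilter(f, [-1 0 1],  'replicate');             % horizontal derivative
    dy  = imfilter(f, [-1 0 1]', 'replicate');             % vertical derivative
    lap = imfilter(f, fspecial('laplacian'), 'replicate'); % second-order feature
    feats = cat(3, abs(dx), abs(dy), abs(lap));

    dist = zeros(size(f));
    for k = 1:3                                 % normalised distance to the text statistics
        dist = dist + ((feats(:,:,k) - mu(k)) ./ sigma(k)).^2;
    end
    isText = sqrt(dist) < 2;                    % within ~2 standard deviations of "text"
    img(repmat(isText, [1 1 3])) = 255;         % replace with the (white) background colour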

3 Results

The final product after processing is a matrix in which every pixel is assigned to a panel, together with some additional information about where in the image text was found and to which panel that text belongs.

In simple cases, such as figure 2, the text detection part of the algorithm is not necessary.

We tried segmenting an entire comic book episode (18 pages) using only knowledge of the speech bubbles on the first page. The results are shown in the appendix. The algorithm consistently finds speech bubbles, and to a fairly large degree manages to separate panels, even when they are linked together by speech bubbles. There are places where it failed, such as page six in figure 1e, but overall it has done a fairly good job of working everything out.


To test our program further we have segmented an entire comic (the result can be seen in the appendix). Below is a discussion of the different segmentations and ideas on how the program could be improved given more time.

Page two in the Avengers comic, found in figure 1a. Here we see that the speech bubbles divide a panel, which is then segmented into two different parts. This could be prevented by not allowing text bubbles to be the sole delimiter between two panels. There is also a difficult bubble that lies almost entirely in the wrong panel and is therefore naturally assigned to it instead of the right one. To fix this we would need a much more advanced method for determining which panel the bubble belongs to. One idea could be to find out which way the arrow of the bubble is pointing.

Page three in the Avengers comic, found in figure 1b. This page is perfectly segmented. It might be thanks to the fact that all the training data is taken from this image.

Page four in the Avengers comic, found in figure 1c. Here we can see that our program also detects the text on the newspaper.

Page six in the Avengers comic, found in figure 1e. Here two panels have been merged into one, because the edge of a speech bubble is not detected and the segmentation therefore failed. This could be resolved by dilating the text areas, although that could result in other difficulties. Another solution could be to include the speech bubbles' edges in the training data.

Page eight in the Avengers comic, found in figure 1g, has not been segmented properly. The reason is that two of the panels are embedded inside a larger panel, a case we have not focused on. It could definitely be resolved, but it is outside the scope of this document.

Page ten, in figure 1i, is largely unproblematic except for one small speech bubble joining two panels. This bubble was cleanly removed and the segmentation worked out fine. This indicates that the machine learning from page two is useful, and that the algorithm works as intended in simple cases.

Page eighteen in the Avengers comic, found in figure 1q. Here we can see that a bunch of speech bubbles have been merged into one. To solve this we would need to detect the edges between the different bubbles. This could possibly be done with the help of bwselect, which returns the connected area containing a given point.

4 Discussion

In the end we are quite happy with our results, even though we did not come as far as we had initially hoped. A lot of the time spent on the project went to getting to know the bw image segmentation package and realizing just how many special cases would have to be taken into account to make the perfect comic segmentation tool. In the end we decided to specialize in text between panels. One problem was that the method flagged many of the panel boundaries as text as well. This would not be a problem if we did not try to put the removed text back into its panels, but now that we do, we sometimes get large areas falsely flagged as speech bubbles. This should not be very hard to fix, but as we were tight on time it could not be fully addressed.

The biggest obstacle was to get the training data from one type of comic to work on another comic. We suspect it boils down to text fonts, mainly the font size, and possibly the fairly basic features we have chosen. We could have


used an off-the-shelf OCR (text recognition) package, but we wanted to try to solve it ourselves. It should definitely be possible to solve it the way we have started, given some more time to tweak the machine learning algorithm.

A Images

This appendix contains all the example comics referred to in the text.


[Figure 1: panels (a) through (q), the segmentations of pages 2-18.]

Figure 1: An entire episode of a superhero comic.



Figure 2: A simple comic and its segmentation.


Rikard Lindahl, F09

880106-4018

Supervisor: Petter Strandmark

2012-12-07


Introduction

In the project, 721 images of cells in different stages were provided. Each cell could be in one of six possible stages, and the task was to write a program for automatic classification of which stage a cell is in. The project became a kind of extension of the OCR assignment done in the image analysis course.

Method

First, a base for the OCR-style system was constructed. It consisted of one part that trained on data, one part that extracted features from the image, and one part that classified the cell. In the classification part, k-nearest-neighbour was used, utilising the mean and standard deviation from the training data: when a feature vector was to be classified, the mean was subtracted from it and it was divided by the standard deviation. In addition, the images were normalised by their size, since the images were of different sizes. Because the images were very diffuse, they were also histogram-equalised.
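A minimal sketch of the normalisation step described above (all variable names are ours):

    % trainFeats: one row per training image, one column per feature
    mu    = mean(trainFeats, 1);
    sigma = std(trainFeats, 0, 1);

    % normalising a new feature vector x (a 1-by-nFeatures row):
    xNorm = (x - mu) ./ sigma;    % subtract the mean, divide by the standard deviation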

For the remaining images, which were not part of the training data, a program was written to check how many images were classified correctly. This was used to see how the code could be improved (mainly the code describing the features), and most of the remaining time was spent on improving the features. Finally, code was written that classified one image at a time.

Features

First, the images were examined without thresholding. They were convolved with the discrete counterpart of the derivative, in both the x and y directions; convolution with a Laplacian filter was also performed. The total intensity of the resulting images was then summed in the three cases. Thereafter, thresholding at three levels was used: the number of white pixels in the image was summed, and the largest and smallest numbers of consecutive white pixels in the x and y directions were counted.

The images were also thresholded at three levels (chosen differently) while the number of objects in the image was counted. The same threshold levels were used when the statistical property -sum(p.*log2(p)) (entropy in MATLAB) of the image was computed (which relates to the structure), where p is the histogram counts of the image.

Finally, the images were thresholded in 5 steps while MATLAB's region-property functions were used to compute:

Area - the area of the objects in the image.
FilledArea - the filled area (where all holes in the objects are also counted).
EulerNumber - the number of objects minus the number of holes in the objects.
EquivDiameter - the diameter of a circle whose area equals the area of an object.
Orientation - the angle between the x-axis of the image and the major axis of an object.
Eccentricity - the distance between the focal points of an object's ellipse divided by its major axis.
Perimeter - the distance around an object.
ConvexArea - the area of the smallest convex polygon enclosing an object.
Solidity - the number of pixels in an object divided by the number of pixels in the smallest convex polygon enclosing the object.

Once the gray-level co-occurrence matrix (a matrix counting the number of times two pixels of given intensities occur next to each other horizontally) had been computed, properties of this matrix could also be examined:

Contrast - a measure of the intensity contrast over the whole image (computed over horizontal neighbours).
Correlation - the correlation between a pixel and its horizontal neighbour over the whole image.
Energy - the sum of the squared elements of the co-occurrence matrix.
Homogeneity - the closeness of the elements of the co-occurrence matrix to its diagonal.
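Assuming the properties above correspond to MATLAB's regionprops and graycoprops names (our reading of the list), the extraction might look like this sketch:

    bw    = im2bw(cellImg, 160/255);   % one of the threshold levels
    stats = regionprops(bw, 'Area', 'FilledArea', 'EulerNumber', ...
                        'EquivDiameter', 'Orientation', 'Eccentricity', ...
                        'Perimeter', 'ConvexArea', 'Solidity');

    glcm = graycomatrix(cellImg);      % horizontal gray-level co-occurrence matrix
    tex  = graycoprops(glcm, {'Contrast', 'Correlation', 'Energy', 'Homogeneity'});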

Results

In Figure 1 we see some typical images before histogram equalisation. Figure 2 shows the result of histogram equalisation of some images of cells in different stages, and the result of thresholding the same images.

Figure 1. Three examples of cell images without any processing.


The training data consisted of the first 400 images. The remaining 321 images were used to evaluate the k-nearest-neighbour method, which computes the distance (using the absolute value) between an unknown image's feature vector and all feature vectors in the training data. The k nearest neighbours then decided which class the image belonged to by counting votes. For example, if the three nearest neighbours were checked and all three gave the same vote (say class 1), the image was classified as class 1. If instead class 1 got two votes and class 2 got one vote, it was also classified as class 1. Finally, if three different classes got equally many votes, the image was classified as the nearest of these neighbours. Below, in Figure 3, the results for different numbers of neighbours in the classification are shown.
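A compact sketch of this voting scheme, assuming numeric class labels (the actual labels are names such as nucleolar) and the normalised feature vectors from the Method section; all variable names are ours:

    % trainFeats: nTrain-by-nFeat, trainLabels: nTrain-by-1, xNorm: 1-by-nFeat
    d = sum(abs(trainFeats - repmat(xNorm, size(trainFeats, 1), 1)), 2);  % absolute-value distance
    [~, order] = sort(d);
    votes   = trainLabels(order(1:k));        % classes of the k nearest neighbours
    classes = unique(votes);
    counts  = histc(votes, classes);
    winners = classes(counts == max(counts));
    if numel(winners) == 1
        label = winners;
    else
        % tie: pick the class of the nearest of the tied neighbours
        label = votes(find(ismember(votes, winners), 1));
    end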

Number of neighbours in k-nearest-neighbour    Correct classifications (%)
1                                              83.5
3                                              84.7
5                                              84.1

Figure 2. The first three columns are histogram-equalised images of the different stages of a cell (each row is one stage). The last three columns are the same images thresholded (with threshold >160).

Figure 3. The percentage of correct classifications when the 321 unknown images were classified with the k-nearest-neighbour method.


Two printouts of the result, when only a single image is passed in, are shown below in Figure 4, using the k-nearest-neighbour method.

Discussion

In the beginning, when features were only taken from convolutions and pixel sums, the number of correct guesses was just over 50%. This figure increased significantly when several different thresholds were used in the functions. When further properties were then brought in, the result at first worsened; only when a larger number of them were used did it lead to a better result. I find this interesting. It may be that when only one of the properties is extracted it becomes hard for the program to determine the stage in an image, but when several are used simultaneously the decision becomes a weighted combination of all of them. Some may then point towards a certain class, but a number of others tip the balance towards the correct class. This would explain the better result when more features are used.

Other things that affect the number of correct classifications are where one chooses to threshold the image and how many times the same image is thresholded. I discovered that the result could vary quite a lot depending on this; it was not always better to include many thresholds. This is not so strange, and it can surely vary a lot depending on what kind of images are examined. The hard part is to make qualified guesses here, and I mostly had to go by how the result changed as I changed these settings.

Another thing I started thinking about was the normalisation used to make all features equally important. This is certainly important, so that no single feature dominates when the classification is made. But what I concluded is that some features can actually be more important than others, and if one knew which these are, one could somehow mark them and give them more weight to improve the code. I can also imagine that problems arise if too many features are included (and all are counted as equally important): features that give misleading values will then still have too large an influence on the classification.

>> [gissning riktig] = cellIdentifier(401)    >> [gissning riktig] = cellIdentifier(403)

gissning =                                    gissning =
nucleolar                                     centromere

riktig =                                      riktig =
401;nucleolar                                 403;fine_speckled

Figure 4. The left command window shows the program making a correct guess; the right one shows the program guessing wrong.


One can note in Figure 3 that the number of correct classifications increases over the first two steps, which is logical. But it then decreases when 5 neighbours are included, which one might otherwise at first expect to give a slightly more reliable result. The decrease may be because the feature vector is not good enough, so that the uncertainty in the classification is negatively affected when more neighbours are included in the computation. This would mean that, with a better feature vector, k-nearest-neighbour would have improved as more neighbours were included. The amount of training data can also play a large role; more data gives greater certainty. One can also note that the results in Figure 3 do not differ appreciably between different numbers of neighbours. This may be because the better the feature vector and training data are, the more likely it is that the nearest neighbour alone classifies an unknown image correctly.

Unfortunately, I never reached a result I was really satisfied with. I would have needed more time to learn how certain properties work and how the feature vector could thereby be constructed in a better way. I hope to learn ...


Object recognition using SIFT and Bag Of Words

Project In Image Analysis 2012

Sverker Rasmuson, Jonna Hellström.

Supervisor: Erik Ask

December 10, 2012

1 Introduction

Our goal in this project is to try to use the Bag of Words model to distinguish between different objects in a traffic situation, e.g. cars, persons, motorbikes, trucks etc. The first step is to use the algorithm to decide whether a certain object appears in an image. If possible, we also want to try to localize the given object in the image, encircling it with a polygon.

2 Background

Bag of words is a commonly used method for text recognition. It can be used to categorize the contents of a web page as e.g. politics, economics, sports etc. It works by creating a huge vocabulary tree that takes words and places them on a certain leaf node in the tree. Common words belonging to a certain category will end up in the same leaf node; e.g. "mortgage", "stocks", "market" and "inflation" will probably end up in a node related to economics. When a new text is sent to the tree, all its words are sent to different leaf nodes, and the most common category among these nodes classifies the text.

This method can be generalized to object recognition in image analysis. Instead of words, local features in the image are used.

3 Theory

When expanding the method for object recognition purposes, local features describe an image in the same way as words describe a text. To extract representative features from an image, the SIFT algorithm is used.

3.1 SIFT

SIFT (Scale-Invariant Feature Transform) is a method for extracting local features from an image. These features are largely insensitive to scale, rotation and translation, and also partially invariant to illumination, perspective and other image distortions. The method works by first creating a scale space of differently blurred versions of the original image, obtained by convolution with Gaussian kernels of different standard deviations. These blurred images are then subtracted from each other, creating a band-pass filter, the so-called Difference of Gaussians; see Fig. 1.

Figure 1: Illustration of the scale space and Difference of Gaussians concepts used in SIFT.

Key points are chosen as extreme points in these images, after first removing edge responses and points with low contrast. Dominant directions in and around each point are computed from image gradients at different scales, making the key point invariant to rotation. To create further invariance to image conditions, a 16x16 region around the key point is considered. Histograms with 8 bins are created from the orientation and magnitude values of the 4x4 subregions in this area.


The final key point descriptor consists of the bin values from these 16 histograms, stored in a 16 * 8 = 128-dimensional array; see Fig. 2.

Figure 2: The final descriptor in SIFT is calculated using histograms of image orientations and magnitudes around the key point.

3.2 Clustering

After the features have been extracted, the tree structure mentioned earlier has to be constructed. For this purpose hierarchical K-means is used. K-means is a stochastic clustering algorithm: K cluster centers are chosen randomly as starting points, and each feature becomes a member of the cluster whose center is nearest. The centers are then updated and moved to the center of mass of their member features. This is repeated iteratively until the algorithm stabilizes. Each of the K clusters is then treated as the starting point of a new iteration of the K-means algorithm, creating a hierarchical tree structure; see Fig. 3. The method is then called hierarchical K-means.
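Since VLFeat was used for the hierarchical K-means (see Section 4), the clustering step may have looked roughly like this sketch; the parameter values are illustrative and the descriptor matrices are assumed to exist:

    % descrs: 128-by-n uint8 matrix of SIFT descriptors from the training images
    K       = 10;                                       % branch factor
    nLeaves = 10000;                                    % requested number of leaf clusters
    [tree, assign] = vl_hikmeans(descrs, K, nLeaves);   % build the vocabulary tree
    paths = vl_hikmeanspush(tree, testDescrs);          % leaf path of each new descriptor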


Figure 3: The hierarchical tree used in the Bag of Words model. It also shows an illustration of SIFT and inverted lists.

3.3 Forward lists

For each category, e.g. "cars", "houses" or "trees", there is an associated list, or array, which keeps track of how common that category is in each leaf node; see Fig. 4. This is later used to measure how similar a given test image is to a certain category, using e.g. the L1-norm or L2-norm, as sketched below.
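A sketch of that comparison, assuming the forward lists are stored as rows of already normalised leaf-node frequencies; all names are ours:

    % catLists: nCategories-by-nLeaves, testHist: 1-by-nLeaves leaf-node counts
    testHist = testHist / sum(testHist);    % compare distributions, not raw counts
    scores   = sum(abs(catLists - repmat(testHist, size(catLists, 1), 1)), 2);  % L1 distance
    [~, bestCategory] = min(scores);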


Figure 4: An example of how a forward list can be created for different categories.

3.4 Inverted lists

To know which images a certain leaf node represents, a so-called inverted list is used for each leaf node in the tree. This inverted list simply contains a reference for each occurrence of a feature represented by the node in a given image.

3.5 Weighting

The weighting can be implemented in many different ways, but generally the idea is to compensate for the difference in how many descriptors from different classes are used to build the tree. If e.g. 1000 images of cars and 2000 images of motorbikes are used to build the tree, the result may be biased if weighting is not used.
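One simple way to realise such a weighting, as an illustration rather than the report's exact scheme, is to normalise each class's leaf-node counts by the total number of descriptors that class contributed:

    % counts(c, l): how many descriptors of class c ended up in leaf node l
    balanced = bsxfun(@rdivide, counts, sum(counts, 2));   % each class now sums to one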

3.6 Object Localization

When an image has been decided to contain e.g. a car, it is also interesting to point out where in the image the car is. To do this, each feature has to be weighted according to how likely it is to be a "car feature". These features then have to be joined together to localize the car as a whole. The method used is to fit a Gaussian function on each feature, weighting it by how likely the feature is to belong to a certain category. All these Gaussian functions are then added together, creating a surface above the image. This surface is thresholded and further processed so that only a polygon encircling the object is left.
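A sketch of this localisation step, assuming each feature i has a position (x(i), y(i)) and a category likelihood w(i); the kernel width and the threshold value are our guesses:

    [H, W]  = size(grayImg);
    [X, Y]  = meshgrid(1:W, 1:H);
    surface = zeros(H, W);
    s = 20;                                    % Gaussian width in pixels (assumption)
    for i = 1:numel(w)
        surface = surface + w(i) * exp(-((X - x(i)).^2 + (Y - y(i)).^2) / (2*s^2));
    end
    mask = surface > 0.5 * max(surface(:));    % threshold the surface (assumption)
    B = bwboundaries(mask);                    % boundary polygon(s) around the object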

4 Implementation

The algorithm was implemented in MATLAB as described above. The open-source library VLFeat was used for the SIFT and hierarchical K-means functions. Some important parameter settings can be seen in Table 1.

Branch factor (K)               10
Number of leaf nodes            10001
Number of images in database    4222
Norm used to classify images    L1-norm
Weighting                       none

Table 1: Final parameter settings used in the project.

The image set used contains 1144 images of cars, 808 images of motorbikes and 2270 negative images. It was downloaded from The Pascal Object Recognition Database Collection.

5 Results

The success rate of the classification part of the algorithm was 25/30 = 83%. See Fig. 5-10 for results of the object localization.

Figure 5


Figure 6

Figure 7

Figure 8


Figure 9

Figure 10

6 Discussion

This turned out to be a more complex problem than we expected. During the development of this algorithm we tried a fair number of image sets, but even for fairly easy image sets it was still hard to get satisfying results. Our goals were probably set pretty high when starting out, and thus we soon had to restrict ourselves to only two classes: cars and motorbikes. It was also hard to find simple enough images with both cars and motorbikes in the same image (a traffic situation), so the final test images depicted either one of these classes.

One problem with the algorithm was the high number of parameters used. To find optimal results, a more formal approach to testing those values would be needed. In this project we used a rather ad hoc trial-and-error approach, since we had enough trouble getting any results at all.

For our simple image set, however, the final algorithm worked fairly well. According to our supervisor we were to expect about a 50% success rate, which we exceeded by a wide margin. The object localization part also worked pretty well overall, as seen in the Results section above. Sometimes, though, erroneous areas of interest were found, as in Fig. 4. Sometimes almost the whole image was included, as in Fig. 3, making the localization rather pointless. Generally we found that motorbikes were a lot easier to localize than cars, so obviously the character and viewing angle of the object affect how difficult the task becomes. Finally, it would have been interesting to test localization on larger images, and on images containing both cars and motorbikes.

Generally we used a pretty small image set of about 4000 images. The results would probably have been better using a larger image database.


Cartoons for smartphones

Project in Image Analysis FMA175

Martin Fredriksson

Alexander Quist

Supervisor: Olof Enquist

2012-12-09


Introduction

Cartoons in magazines or on the Internet are almost always created to be read from left to right and top to bottom. Today a lot of people use smartphones with relatively small screens, and to be able to read a cartoon strip the user has to scroll and zoom a lot, which can make the reading experience less enjoyable.

The aim of this project is to investigate the possibility of automatically dividing and cutting a cartoon strip into smaller images. By doing this correctly, each cartoon square in a strip would become a separate image, and it would be possible to show the new images in the correct succession without any need for zooming or scrolling to the sides. Since most cartoon strips separate their boxes in different ways, we are going to try to create an algorithm that works well for as many as possible. Our goal is to create an algorithm which can detect and follow the edges of an image. Even if many cartoon strips have different designs and separate the boxes differently, we believe that our algorithm will be able to detect and follow many different types of separations with just some small adjustments.

Almost every cartoon strip or magazine has its own characteristic way of separating the different boxes from each other, and the way of separating boxes differs even within the same strip; see Figure 1. The cartoon strip in Figure 1 fits our project well since we want to create an algorithm which can detect and follow a frame. It does not need to be a horizontal or vertical line, since the goal of our algorithm is to follow the frame and work out where to cut the new images.

Figure 1. An example of how the boxes of a cartoon strip are separated.

Method

In this project we have used our knowledge from the Image Analysis course to make an algorithm which can cut a comic strip into many smaller images. We used a scanned comic magazine as material and wrote our algorithm in Matlab. After completing the main structure of the algorithm, we used other comic strips to tune it so that the thresholds work for many different comic strips.


The algorithm

When we started discussing the overall design for our algorithm, we came to the conclusion that we would stick to the most basic commands in Matlab (like if, while and size). This meant that we would have to do the majority of the work ourselves instead of using already existing algorithms. But the extra workload felt reasonable since our experience with Matlab is somewhat limited, so our algorithm is mostly made up of a lot of if and while commands.

The algorithm works by splitting the image over and over again until it cannot find any more appropriate cuts; this means that we can handle any number of panels in one image as long as we can find appropriate cuts.

We think of the algorithm as a three-stage rocket: it applies three different functions to the image. The first function is a very simple one that checks for any possible horizontal cuts; since it is very simple it will only detect the very obvious ones (i.e. a completely white row). If this function cannot find a valid cut we go to function number two, which is basically the same as the previous one but works vertically. These two functions exist to improve the computation time of the algorithm, since the third part is somewhat computationally heavy.

The third and final stage of our imaginary rocket is without a doubt where we have spent most of our development time; it handles panels that are separated by a crooked/skewed white area. It works something like this (a sketch of the first, simplest stage follows after this list):

1. It searches the bottom of the image for the longest run of white pixels in a straight line; if it finds a column which has enough white pixels in a line, it goes on to the next step.

2. Using the position that was found in step 1, two cursors are created (one going right and one going left); these will traverse the image trying to find a proper path that we can use to cut the image. The main purpose of these two cursors is to find out, firstly, whether it is possible to cut the image at all and, secondly, where we are supposed to cut. The cursors work by following black lines. When these cursors have reached the top of the image they return their minimum and maximum x-coordinates (column numbers), which are used to cut the images.

3. These coordinates are checked by the program to see if they are reasonable, since they might be completely wrong or a cursor might even have failed and given up (i.e. returned coordinate 0). One cursor may copy the other's value if it has failed while the other succeeded.

4. The image is then split into two if at least one of the cursors above succeeded; otherwise the original image is kept as it is. Since the cut can pass close to the corners of the panels, a simple mean value of the cursor coordinates would cause us to lose pixels in both images.

We also have a tiny function that helps us get rid of excessive white areas around each panel, making them slimmer. This function is called for every new cut.
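As an illustration of the first, simplest stage, a completely white row can be found with only basic commands, in the spirit of the report; the whiteness threshold is our assumption:

    bw = rgb2gray(img) > 250;            % "white" pixels; threshold is an assumption
    [rows, cols] = size(bw);
    cutRow = 0;
    r = 2;
    while r < rows && cutRow == 0
        if sum(bw(r, :)) == cols         % a completely white row
            cutRow = r;
        end
        r = r + 1;
    end
    if cutRow > 0
        top    = img(1:cutRow, :, :);    % split; the algorithm then recurses on both halves
        bottom = img(cutRow+1:end, :, :);
    end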


Here is a simple example of how the algorithm splits a page into smaller and smaller pieces.

1. It will detect a horizontal cut and split the image into two.

2. It will then check the top half of the image without finding any possible cuts; it will however then move on to the second one, where there is a possible cut.

3. It will now check the middle image for possible cuts and find one more, so it is split again.

4. It will now check the two middle images without finding any possible cuts, so it moves on to the bottom. There it will find a crooked cut. After that it will go on to search for more cuts, but without any results, and the algorithm is finished.


Results

The results of our algorithm are good: we succeeded in cutting out every comic box from all the images without destroying or losing panels, text boxes or text balloons. One blemish, however, is that a small part of the neighbouring comic box ends up on the left or right hand side, depending on where the cut falls; see Figure 3c-3f. But otherwise a bit of the comic box itself would be lost, which we think is worse. In Figure 2 you can see an example of what the material we worked with looked like: a scanned page from a comic magazine.

Figure 2. An original page from a comic magazine we scanned in and used the algorithm on.


Figure 3. How the image in Figure 2 is cut into smaller images. The images are named a-h, starting from the top left. As observed, the three-stage rocket manages horizontal, vertical and crooked frames well.


Discussion

The algorithm we created in this project worked well for the kind of panel separation in the comic strips we investigated, that is, comic strips with white areas between the panels. We chose to investigate this specific design because most of the comics we found looked that way. One way to extend the algorithm so that it can cut comic strips that do not have pure white separation would be to estimate the colour value that separates the panels and find the panels by investigating where this value deviates too much.

While creating the algorithm we became aware that the different thresholds were really sensitive, especially when we tried the algorithm on other comics. A threshold that worked excellently on one strip totally ruined cuts in another comic. This is understandable, because different comics have different designs and e.g. the space between two panels can be really small (a few pixels in an image of about 3000x2000 pixels). Small changes in those sensitive thresholds made a major difference in the result. Another problem was that our algorithm investigates how many black pixels there are in a row or column to determine whether there is a separation between two panels, so if a panel had a text balloon along one side it could ruin the cut; our algorithm therefore had to be very precise.

To make the algorithm work on comic strips of different sizes we introduced a scale factor, which depended on the size (number of rows and columns) of the image. The thresholds were originally tuned on large scanned images of roughly 3000x2000 pixels, and the scale factor made them work on a small image (400x600) as well. One trouble we encountered was when we used our algorithm on a very light-coloured comic strip. One way of solving this problem could be to introduce a similar scale factor for the colour value of the image. Another way could be to calculate the colour value both inside the comic boxes and in the frame between each box, and then statistically calculate what threshold to use on the colour value.

When cutting crooked comic boxes the algorithm uses the maximum or minimum x-value from the left or the right cursor (see the explanation of the algorithm earlier). Which value(s) it chooses, and from which cursor, differs depending on whether it is the left or the right image. This means that a piece of the neighbouring panel ends up in the new image (as seen earlier, Figure 3c-3f). It has to be this way, because otherwise some information in the crooked box would disappear. One way of making it look ...

One disadvantage of our algorithm is that it has no way to check whether a cut was good or not. A solution could be to use Matlab's BWconnected function to investigate how many areas every image has and compare it to the number of images after the cutting. Only the human eye can really tell if the result is good or not.

Our algorithm currently only handles crooked lines that are vertical. Because of time limitations we never completed the horizontal version, since it would have required a minor overhaul of the already working vertical one. How the cursors move would be almost identical, but they would move from left to right instead of from bottom to top. The big problem would be to rewrite the part of the algorithm that checks whether the chosen path is valid or not, since it relies on the fact that we reach the top of the image without hitting the sides of the image.


Making the algorithm handle crooked horizontal lines would probably be our first priority if we had more time. We would probably also have tried to make it split panels even when there is some noise in between; this can be either noise from the scanner or things like arrows (that point from one panel to another).

Overall we would like to say that we are satisfied with the result of our algorithm. Achieving a 100% hit rate on all comics in the world is more or less only a dream; it would require a long, long time to make every comic smartphone friendly.



Image Analysis Project
Image retrieval and SIFT

December 11, 2012

Jakob Wiesmeyr, 880506P176

Supervisor: Linus Svärm

1 Introduction

In the project Image retrieval and SIFT we use the famous SIFT descriptors to obtain a simple image retrieval system using the toolbox vl_feat. I put most of the focus on the querying part, where different methods have been studied and implemented. Let us briefly recap how such an image retrieval system works. For an input image, often called the query image, we extract the SIFT descriptors. Next we determine which visual words (i.e. cluster centers of a k-means algorithm run on the whole data set) the SIFT vectors of the query image are close to. This is how we get the vocabulary of the image: if we have, for example, K different visual words, then the vocabulary of the image might be 1, 3, 7, ..., 23, .... Every image in the data set is likewise represented by its own vocabulary. There are then different methods for determining which images in the data set are similar to the query image. Since SIFT is a scale-invariant feature transform, we can for example find similar images of an object from different positions or situations (light, surroundings, ...). Only some minor parts of the code are displayed in this report; if you are interested in more detail, please send me an email at [email protected] and I will send you the different files.

2 Gaining the data

Here we briefly describe how we obtained the data. As already mentioned, we used the vl_feat toolbox, and furthermore an image bank from the internet (more details in the References). We used a total of N = 1600 images, mostly due to the fact that my laptop is not the newest and easily runs out of memory. For these images we calculate the SIFT descriptors in the following way.

    I = rgb2gray(bild);
    I = single(I);            % to single precision
    [~, D] = vl_sift(I);      % compute the SIFT vectors of the grayscale single image I
    % The 128-dimensional SIFT vectors are placed as columns in matrix D.


We do that for all the images and save the result in a matrix. The next step is to obtain the visual words, i.e. the cluster centers. We get them as follows: we run a hierarchical k-means (hik) on the whole SIFT data matrix obtained above, using the vl_feat function vl_hik. From this function we get a tree where all the data is saved. It is a little tricky to regain the cluster centers and the assignment vector (which tells us which vector in the data matrix is assigned to which cluster center); therefore we have a function HIK(data, k) which does this for us. We use K = 7^5 = 16807 cluster centers, and the function is used as follows.

    k = 7;                                       % 7^5 = 16807
    [visual_word, A] = HIK(class_data_mat, k);
    % The function returns both the cluster centers visual_word
    % and the data-to-cluster assignments A, using hierarchical k-means
    % and various other functions; the number of cluster centers is
    % always the input k^5.

Again we save this data for further use. This is the basis for the querying part, which is described in the next section.

3 Querying methods

In the following we discuss three different methods and also compare them. One function we use in all three methods is Im2voc(I), or a slightly changed version of it, Im2tf_idf(I), which takes an image I and gives back the vocabulary of the image. This is done by calculating the SIFT vectors of the image and finding out which visual words the SIFTs of the image are close to. Another crucial function in our implementation is hist_gen(cen, K). This function generates a histogram of an image concerning the occurrence of the visual words 1, ..., K in the image. From the function Im2voc(I) we get a vector cen which tells us how often the K visual words appear in the image; hist_gen then generates a vector of length K, where the entry at position j states how often visual word j appears in the image. An implementation of this function can be done as follows.

    function [hist_vec] = hist_gen(cen, K)
    % Generates a histogram vector for the occurrence of the visual words;
    % K is the number of cluster centers.

    censet   = unique(double(cen));          % the distinct visual words present
    n        = length(censet);
    count    = hist(double(cen), censet);    % how often each of them occurs
    hist_vec = zeros(K, 1);

    for k = 1:n
        hist_vec(censet(k)) = count(k);
    end

    hist_vec = sparse(hist_vec);             % most entries are zero, store sparsely
    end

Now we are in a position to discuss the different methods used for querying.


3.1 Normal querying

First of all we have to bring the data of the image bank into a useful structure; we have the visual words and the visual-word assignment vector. We calculate, for each of the N = 1600 images, its histogram using the function hist_gen explained above, and save it for further use in an N x K matrix called visual_mat. This means the row index indicates the image number and the column index the visual word. The normal querying, implemented in qu_normal(I, k), takes an image I and returns a vector S which contains the indices of the k most similar images. As the name already indicates, we use a simple querying method where we just calculate the norm between the histogram of the query image I and the histogram representations of the images in the data set (visual_mat in the code). Then we take the k images with the smallest distance from I (all of this is done in the histogram representation).

    for j = 1:s
        err_im(j) = norm(I_hist - visual_mat(j,:)');   % distance between query and image j
    end

This is a straightforward implementation, and it does not work too well. The results are discussed later in Section 4 (Results).

3.2 Inverted indices querying

We again use the data matrix visual_mat generated above. Up to now we used it in the following way: row j (i.e. image j) holds the different visual words the image has. It also makes sense to do it the other way round (invert it), i.e. to say that visual word i, i.e. the i-th column of the matrix, appears in the images I_{k_1}, ..., I_{k_s}, and a_{k_1}, ..., a_{k_s} times respectively. We use this as follows: as above, we get the vocabulary of the query image I using the function Im2voc, which means we know how often each visual word appears. Now we simply add up the columns of the matrix visual_mat, multiplied by the corresponding number of appearances of the visual word in the query image I.

    [vis] = Im2voc(I_in, visual_word);
    % Now we know which vocabulary is in the input image I_in.
    vis_un = unique(vis);
    n      = length(vis_un);
    count  = hist(vis, vis_un);    % number of occurrences of each visual word

    sum_vis = count(1) .* visual_mat(:, vis_un(1));
    for j = 2:n
        sum_vis = sum_vis + count(j) .* visual_mat(:, vis_un(j));
    end

Finally we simply take the images with the highest score. Compared to normal querying we do not have to calculate a norm, we just sum up, which saves time and improves speed, and we get quite a good result.


3.3 Tf-idf querying

Tf-idf stands for term frequency - inverse document frequency. This is an approach from text retrieval where every document is represented by a vector consisting of its word frequencies; to make the retrieval more efficient, a weighting is applied to the vector. We can use this nearly directly for our querying: the words are our visual words and the documents are our images. Let us assume that we have K different visual words, our vocabulary. Every image I is then represented by a K-vector v_I = (w_1, ..., w_K)^T with components

    w_j = \frac{n_{jd}}{n_d} \log \frac{D}{n_j},    (3.1)

where n_{jd} is the number of occurrences of visual word j in the image d, n_d is the total number of visual words in image d, n_j is the number of occurrences of visual word j in the image bank, and D is the number of images in the image bank. The first term in (3.1) corresponds to the frequency of a visual word within the image: if a certain visual word occurs often in an image, then this term is big. Nevertheless, there might be some visual words which are present in nearly all the images; these visual words are not helpful, so we want to weight them down, and this is done by the inverse image frequency, the log term in (3.1). In the querying stage we have the query vector v_q and the representation of each image in the image bank in tf-idf form, v_I. Now we simply calculate the cosine of the angle between v_q and all the v_I. This is done by

c_I = ⟨v_q, v_I⟩ / (‖v_q‖ ‖v_I‖).    (3.2)

We then take the k images with the biggest c_I, since a big c_I indicates that the angle between the query image and the image from the database is small, which tells us that they are similar. We implemented this as follows.

[tf_idf_im] = Im2tf_idf(I_in, visual_word, A);
% retrieval stage: the images are ranked by their normalised scalar product
% (cosine of the angle between the query image and all the data set vectors)
D = size(tf_idf_mat, 1);
vot = zeros(D, 1);
a = norm(tf_idf_im);
for j = 1:D
    vot(j) = (tf_idf_mat(j,:) * tf_idf_im) / (norm_tf_idf(j) * a);
end

In the code above the function Im2tf_idf is similar to the function Im2voc; we only added a few more lines to get a tf-idf representation of the query image. The object tf_idf_mat is again a matrix where every row corresponds to an image and the columns hold the tf-idf representation of that image. The calculation takes a little longer, but the result is really satisfying.
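To make the weighting of (3.1) concrete, a minimal MATLAB sketch is given below. It is not the author's Im2tf_idf; the variables counts (a 1 × K histogram of visual words for one image), doc_freq (the per-word occurrence counts n_j over the whole bank) and D are assumptions introduced for illustration.

nd = sum(counts);                  % n_d: total number of visual words in the image
tf = counts / nd;                  % term frequencies n_jd / n_d
idf = log(D ./ max(doc_freq, 1));  % inverse document frequencies log(D / n_j)
tf_idf = tf .* idf;                % the weight vector (w_1, ..., w_K) of (3.1)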


4 Results

Here we want to compare the different methods with regard to time and the results we are getting. First we look at the normal querying: here we get an elapsed time of approximately 60.834646 seconds using the function qu_normal, of which the querying part (i.e. calculating the norms) takes 9.110799 seconds. Additionally, we do not always get the k right similar images; right in the sense that our image bank always contains four images of the same object with different perspectives, lighting and arrangement. For example, with the function qu_normal we get the following result.

[Figure: five image panels: (a) input image, (b) output 1, (c) output 2, (d) output 3, (e) output 4]

Figure 1: Result of normal querying, searching for the k = 4 most similar images to the input.

We see here that in (e) we got a totally different image, which indicates that the norm as a measure of similarity does not seem to be a good choice. The computational work is also quite high, as can be seen from the resulting elapsed time, especially in the querying part. The result can be improved by using more visual words, but this also means more computation. This implementation is of little practical use, and was discussed and implemented to show the difference between good and bad approaches. Next we turn to the querying using inverted indices, for which we use the function qu_inv_ind. We get an elapsed time of 51.807626 seconds for the function qu_inv_ind, which is a big improvement compared to the normal querying. The querying part here is way faster: it takes 0.011633 seconds, a tiny fraction of the previous querying time. Furthermore, we get the right similar images for the input image above. But when different images share similar features we still have some problems finding the right images. Here again an example.


[Figure: five image panels: (a) input image, (b) output 1, (c) output 2, (d) output 3, (e) output 4]

Figure 2: Result of inverted index querying, searching for the k = 4 most similar images to the input.

Again we meet some difficulties in this case, but here an increased number of visual words would not affect the computation time much, and it would improve the results a lot. We tried it with more visual words and the result fitted nearly every time, but we kept the number the same in the results so that they are easier to compare, and I think this number is quite realistic for this many pictures. So we can say that this method is okay, but it would need some improvements to become more efficient. In general the advantage of this method is the fast querying, which needs little computational work and, with a higher number of cluster centers, still produces quite a satisfying result. Finally we look at the tf-idf method, which is implemented in the function qu_tf_idf. We get an elapsed time of 39.907271 seconds for this function, which is again an improvement of ten seconds (for the whole system). Looking closer, the querying part takes 8.946014 seconds, which is longer than in the inverted indices querying and nearly as long as in the normal querying. We also get the two previous examples right, and nearly all the other examples we tried. There are still some problems, but we have more hits than with the previous two methods. So we can draw the conclusion that this method works best in the sense of correctness and speed, which is mostly linked to the fact that we process the frequency vectors again and weight them. Nevertheless, we have to state that the querying part takes longer here, and the fact that the total elapsed time of the function qu_tf_idf is smaller is mostly due to the way we implemented this function. In a nutshell, this method has a big advantage concerning the result (getting the "right" images), but concerning speed it is way behind the inverted index querying. Here we give a final example of this method.

[Figure: five image panels: (a) input image, (b) output 1, (c) output 2, (d) output 3, (e) output 4]

Figure 3: Result of tf-idf querying, searching for the k = 4 most similar images to the input.

5 Summary

From the tests above we can conclude that the tf-idf method is the "best" in comparison with the other two methods (as they have been implemented here). It would be interesting to test these methods on larger data sets, i.e. more than 1600 images, for example N = 10000; due to the capacity of my laptop this was not possible. There is also still room to make the implementations faster and more effective, but for a first approach I am quite satisfied. Furthermore, another approach would be to also use the geometric information we get from vl_sift(I), and in that way improve the querying results and times. This project was quite interesting for me since I learned more about SIFTs, k-means and especially the querying part. Working with such huge data sets also shows how important good implementations are. I also have to thank Linus Svärm for his patience and help in understanding and implementing the topics of this project.


References

[1] Image bank. http://www.vis.uky.edu/~stewe/ukbench/, 2012.

[2] Sivic, J., and Zisserman, A. Video Google: A text retrieval approach to object matching in videos. Robotics Research Group, Department of Engineering Science (2003), 4.

[3] Vedaldi, A., and Fulkerson, B. VLFeat: An open and portable library of computer vision algorithms. http://www.vlfeat.org/, 2008.


Multi-label Image Segmentation Using a Continuous Max-Flow Approach

Project in Image Analysis FMA175

Viktor Larsson
Supervised by Niels Christian Overgaard

December 10, 2012

Abstract

In this project the continuous max-flow model of Yuan et al. [1] for multi-label image segmentation is considered. The model and algorithm are presented, and the connection to the continuous Potts model is derived. The metrication error usually present in graph based methods for the same problem is investigated. Finally, possible uses of recent accelerated methods for total variation minimization are briefly considered.


1 Introduction

In this project we consider the problem of multi-label image segmentation, which is partitioning an image into multiple disjoint regions. This can be seen as constructing an assignment ϕ : Ω → {1, ..., n} which for each pixel x assigns one of the n labels.

Typically the segmentation is driven by color or grayscale information, but other prior knowledge can be used as well. This information is captured in the cost functions f_i(x), which describe the cost of assigning pixel x to the label i. The cost functions f_i are often called the data terms. A common choice of cost functions is f_i(x) = |I(x) − c_i|², where the c_i are constants and I(x) denotes the image. The problem is then to assign each pixel x to one of the labels such that the total cost is minimized.

A naive approach is simply to assign each pixel to the label with the pointwise smallest cost. But due to noise or poorly defined edges in the data, this approach often results in very noisy segmentations with jagged boundaries. Figure 1 shows an example of this.

[Figure: three image panels: (a), (b), (c)]

Figure 1: a) Input image b) Naive segmentation c) Regularized segmentation

The segmentation can be regularized by adding a spatial dependence, where the labeling of a pixel depends not only on the data but also on which labels the neighboring pixels are assigned. This can be done by adding a cost penalizing the boundary length of the segmentation. This cost is called the regularization term.

1.1 Continuous Potts Model

A popular model for this problem is the Potts model [8]. The idea behind the Potts model is to find a partition {Ω_i}_{i=1}^n of the image domain Ω that minimizes the sum of the integrals of the corresponding cost functions inside each Ω_i. To get a smooth boundary, a term penalizing the boundary lengths |∂Ω_i| is added.

min_{ {Ω_i}_{i=1}^n }  Σ_{i=1}^n ∫_{Ω_i} f_i(x) dx + α Σ_{i=1}^n |∂Ω_i|    (1)

s.t. ∪_{i=1}^n Ω_i = Ω and Ω_k ∩ Ω_l = ∅, k ≠ l.

This can be rewritten by associating an indicator function u_i(x) with each set Ω_i, which takes the value 1 if x ∈ Ω_i and 0 otherwise. If Ω_i has a sufficiently smooth boundary it holds that

|∂Ω_i| = ∫_Ω |∇u_i(x)| dx.

Using these indicator functions we get the equivalent definition of the Potts model.

min_u  Σ_{i=1}^n ∫_Ω u_i(x) f_i(x) dx + α Σ_{i=1}^n ∫_Ω |∇u_i(x)| dx    (2)

s.t. u_i(x) ∈ {0, 1} and Σ_{i=1}^n u_i(x) = 1.

The problem is non-convex since we are minimizing over a non-convex set of functions. This isa hard problem. In fact it can be shown that the discrete version of this problem is NP-hardfor more than two labels. So we have little hope of finding an algorithm which is both fast andexact for this problem.


1.2 The Convex Relaxation of Potts Model and Truncation

To make the problem tractable we instead consider the convex relaxation of the Potts model, where we minimize over the convex set of functions u_i(x) ∈ [0, 1] with Σ_{i=1}^n u_i(x) = 1. The model then becomes

min_u  Σ_{i=1}^n ∫_Ω u_i(x) f_i(x) dx + α Σ_{i=1}^n ∫_Ω |∇u_i(x)| dx    (3)

s.t. u_i(x) ∈ [0, 1] and Σ_{i=1}^n u_i(x) = 1 for all x ∈ Ω,

which is a convex problem.

The problem is now easier to solve, but we pay for it by getting non-binary solutions. To handle this we simply take the closest binary segmentation. Let u* = (u*_1, ..., u*_n) be the optimal solution to (3). We then construct our segmentation ϕ(x) ∈ {1, ..., n} by taking the pointwise largest indicator function, i.e. ϕ(x) = argmax_i u*_i(x), which hopefully will be close to the solution of the original problem (2).

It is clear that if the optimal solution for this problem is binary, then it is also optimal for the original problem (2).

2 Continuous Max-Flow

Approximate solutions to the discrete Potts model can be found efficiently using graph cuts. Most algorithms for computing the graph cuts work with the dual problem of finding the maximum flow. Now we will consider a similar construction for the continuous Potts model, which was first presented by Yuan et al. in [1]. First we present the model, and then we derive the connection to the convex relaxation of the Potts model (3).

2.1 The Primal Model

In [1] Yuan et al. propose a continuous max-flow model which is analogous to the discrete max-flow model for this problem. The n-label problem is modeled with n copies of the image domain Ω. For each copy there is a spatial flow field q_i(x) which models the internal flow between points. There are also n sink flows p_i(x) which model the flow from each pixel to the sink. Finally, there is a single flow p_s(x), the flow from the source to the pixel nodes; note that this is shared between all copies of the image domain. This is illustrated in Figure 2.

Figure 2: Illustration of the continuous max-flow model of Yuan et al.

Similarly to the discrete case we have constraints on the sink flows and the spatial flows,

p_i(x) ≤ f_i(x),  |q_i(x)| ≤ α,  i = 1, ..., n.

We also have a constraint for the conservation of flow at each point:

(∇·q_i − p_s + p_i)(x) = 0,  i = 1, ..., n.


We then want to maximize the total flow from the source with the above constraints.

sup_{p_s, p, q}  ∫_Ω p_s dx    (4)

s.t. |q_i(x)| ≤ α,  p_i(x) ≤ f_i(x),
(∇·q_i − p_s + p_i)(x) = 0,  i = 1, ..., n,

where p = (p_1(x), ..., p_n(x)) and q = (q_1(x), ..., q_n(x)).

In [1] Yuan et al. show that this is the equivalent dual problem of (3).

2.2 The Primal-Dual Model

To show that the continuous max-flow model (4) is the dual problem of the convex relaxation of the Potts model (3), Yuan et al. [1] introduce Lagrange multipliers u_i(x) for the conservation constraints in (4):

sup_{p_s, p, q} inf_u  ∫_Ω p_s dx + Σ_{i=1}^n ∫_Ω u_i (∇·q_i − p_s + p_i) dx    (5)

s.t. |q_i(x)| ≤ α,  p_i(x) ≤ f_i(x),

where u = (u_1(x), ..., u_n(x)). They call this the primal-dual model. Note that there are no constraints on u. It is clear that this is an equivalent problem: for choices of p_s, p and q which do not satisfy the constraint, the functional becomes unbounded from below with respect to u and the inner infimum becomes −∞. This forces the supremum to choose p_s, p and q which satisfy the constraint.

2.3 The Dual Model

Since the energy in (5) is convex lower semi-continuous for fixed u and concave upper semi-continuous for fixed p_s, p and q, we can change the order of the sup and inf [10]. Doing this and rearranging the terms, we get

inf_u sup_{p_s, p, q}  ∫_Ω (1 − Σ_{i=1}^n u_i) p_s + Σ_{i=1}^n u_i ∇·q_i + Σ_{i=1}^n u_i p_i dx    (6)

s.t. |q_i(x)| ≤ α,  p_i(x) ≤ f_i(x).

In [1], Yuan et al. then make the following observations for (6),

inf_u sup_{p_s}  ∫_Ω (1 − Σ_{i=1}^n u_i) p_s dx  ⇒  Σ_{i=1}^n u_i = 1    (7)

and

inf_u sup_{p_i ≤ f_i}  ∫_Ω u_i p_i dx = inf_u ∫_Ω u_i f_i dx,  u_i(x) ≥ 0.    (8)

Note that if u_i were negative the supremum would be unbounded, since p_i could be chosen arbitrarily negative.

Combining (6), (7) and (8) together with the well-known identity for the total variation norm

sup_{|q_i(x)| ≤ α}  ∫_Ω u_i ∇·q_i dx = ∫_Ω α |∇u_i| dx,    (9)

we get the following optimization problem

inf_u  Σ_{i=1}^n ∫_Ω u_i f_i dx + Σ_{i=1}^n ∫_Ω α |∇u_i| dx    (10)

s.t. Σ_{i=1}^n u_i(x) = 1 and u_i(x) ∈ [0, 1],

which is the convex relaxation of the Potts model (3).


3 Maximizing the Flow

To maximize the flow in (4), Yuan et al. [1] propose using the Augmented Lagrangian method with respect to the flow conservation constraints. To do this we construct the Augmented Lagrangian function

L_c(p_s, p, q, u) = ∫_Ω p_s dx + Σ_{i=1}^n ∫_Ω u_i (∇·q_i − p_s + p_i) dx − (c/2) Σ_{i=1}^n ‖∇·q_i − p_s + p_i‖²,

where c > 0 is a constant. The Augmented Lagrangian method then consists of two steps which are iterated until convergence.

1. We maximize L_c with respect to p_s, p and q with fixed u:

   (p_s^{k+1}, p^{k+1}, q^{k+1}) = argmax_{p_s, p, q} L_c(p_s, p, q, u^k)

2. We then update the estimate of u with

   u_i^{k+1} = u_i^k − c (∇·q_i^{k+1} − p_s^{k+1} + p_i^{k+1}).

We now consider the optimization in the first step. This is performed by optimizing in each parameter independently. For updating the spatial flow q_i(x) we have the problem

sup_{|q_i(x)| ≤ α}  ∫_Ω u_i ∇·q_i − (c/2)(∇·q_i − p_s + p_i)² dx.

By completion of squares we get

u_i ∇·q_i − (c/2)(∇·q_i − p_s + p_i)² = u_i²/(2c) − u_i(p_i − p_s) − (c/2)(∇·q_i − p_s + p_i − u_i/c)²,

where the first two terms on the right-hand side are constant with respect to q_i.

Thus, to maximize L_c with respect to the spatial flow q_i we have to solve

sup_{|q_i(x)| ≤ α}  ∫_Ω −(c/2)(∇·q_i − p_s + p_i − u_i/c)² dx = inf_{|q_i(x)| ≤ α} ‖∇·q_i − p_s + p_i − u_i/c‖².    (11)

This is a non-linear projection onto a convex set and can be computed using Chambolle's algorithm for total variation minimization [6].

Since the spatial flows from this minimization are only used as an intermediate step, Yuan et al. [2] propose taking a single gradient descent step instead of performing the minimization to convergence. The update for q_i then becomes

q_i^{k+1}(x) = P_C( q_i^k(x) + γ ∇(∇·q_i^k − p_s^k + p_i^k − u_i^k/c) ),

where γ > 0 is the step length, chosen small enough that the method converges, and P_C is the projection onto the constraint |q_i(x)| ≤ α. This projection can be computed by

P_C(q)(x) = q(x) / max{1, α⁻¹ |q(x)|}.

For a derivation of the descent direction of (11) see the Appendix.

Now we consider the sink flow p_i(x). We have the optimization problem

sup_{p_i ≤ f_i}  ∫_Ω u_i p_i − (c/2)(∇·q_i − p_s + p_i)² dx.

As before we use completion of squares and get

u_i p_i − (c/2)(∇·q_i − p_s + p_i)² = u_i²/(2c) − u_i(∇·q_i − p_s) − (c/2)(∇·q_i − p_s + p_i − u_i/c)²,

where the first two terms on the right-hand side are constant with respect to p_i.


So our optimization problem becomes

sup_{p_i ≤ f_i}  ∫_Ω −(c/2)(∇·q_i − p_s + p_i − u_i/c)² dx = inf_{p_i ≤ f_i} ‖∇·q_i − p_s + p_i − u_i/c‖²,

which is minimized by setting p_i = −∇·q_i + p_s + u_i/c or, if that violates the constraint p_i ≤ f_i, by setting it to the closest feasible point. Thus the update for p_i becomes

p_i^{k+1}(x) = min{ f_i(x), −∇·q_i^{k+1}(x) + p_s^k(x) + u_i^k(x)/c }.

Finally, we consider the source flow p_s:

sup_{p_s}  ∫_Ω p_s − Σ_{i=1}^n u_i p_s − (c/2) Σ_{i=1}^n (∇·q_i − p_s + p_i)² dx.

We differentiate the integrand pointwise and get

∂/∂p_s ( p_s − Σ_{i=1}^n u_i p_s − (c/2) Σ_{i=1}^n (∇·q_i − p_s + p_i)² ) = 0
⇔  1 − Σ_{i=1}^n u_i + c Σ_{i=1}^n (∇·q_i − p_s + p_i) = 0
⇔  1 − nc · p_s + c Σ_{i=1}^n (∇·q_i + p_i − u_i/c) = 0
⇔  p_s = (1/(nc)) ( 1 + c Σ_{i=1}^n (∇·q_i + p_i − u_i/c) ).

So the update for the source flow becomes

p_s^{k+1} = (1/(nc)) ( 1 + c Σ_{i=1}^n (∇·q_i^{k+1} + p_i^{k+1} − u_i^k/c) ).

So to summarize the algorithm consists of the following steps which are iterated:

1. Update the spatial flows q_i, i = 1, ..., n:

   q_i^{k+1} = P_C( q_i^k + γ ∇(∇·q_i^k − p_s^k + p_i^k − u_i^k/c) )

2. Update the sink flows p_i, i = 1, ..., n:

   p_i^{k+1} = min{ f_i, −∇·q_i^{k+1} + p_s^k + u_i^k/c }

3. Update the source flow p_s:

   p_s^{k+1} = (1/(nc)) ( 1 + c Σ_{i=1}^n (∇·q_i^{k+1} + p_i^{k+1} − u_i^k/c) )

4. Update the multiplier estimates u_i, i = 1, ..., n:

   u_i^{k+1} = u_i^k − c (∇·q_i^{k+1} − p_s^{k+1} + p_i^{k+1})

Note that during the first two steps there is no data dependence between different labels, so they can be run in parallel. This allows the algorithm to be efficiently implemented on GPUs or in other highly parallel environments.
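The iteration above is compact enough to be sketched directly in MATLAB. The sketch below is schematic, not the author's implementation: grad and div stand for finite-difference gradient and divergence helpers (assumed), the flows q{i} are stored as pairs of component images, and the initialization of ps, p, q and u is omitted.

for k = 1:max_iter
    for i = 1:n
        % step 1: one gradient ascent step followed by the projection P_C
        g = grad(div(q{i}) - ps + p{i} - u{i}/c);  % grad returns {gx, gy} (assumed helper)
        q{i}{1} = q{i}{1} + gamma * g{1};
        q{i}{2} = q{i}{2} + gamma * g{2};
        nq = max(1, sqrt(q{i}{1}.^2 + q{i}{2}.^2) / alpha);
        q{i}{1} = q{i}{1} ./ nq;
        q{i}{2} = q{i}{2} ./ nq;
        % step 2: closed-form sink flow update
        p{i} = min(f{i}, -div(q{i}) + ps + u{i}/c);
    end
    % step 3: source flow update
    s = zeros(size(ps));
    for i = 1:n
        s = s + div(q{i}) + p{i} - u{i}/c;
    end
    ps = (1 + c*s) / (n*c);
    % step 4: multiplier update
    for i = 1:n
        u{i} = u{i} - c*(div(q{i}) - ps + p{i});
    end
end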


4 Results

Now we look at some segmentations performed using the presented algorithm. We only consider data terms of the form f_i(x) = |I(x) − c_i|^p, where p and c_i, i = 1, ..., n, are constants. In Figures 3, 4 and 5 we see segmentations with n = 6, n = 8 and n = 5 respectively.

[Figure: (a) Input, (b) Result]

Figure 3: Segmentation of Charlie Chaplin with n = 6.

[Figure: (a) Input, (b) Result]

Figure 4: Segmentation of some tulips with n = 8.

[Figure: (a) Original, (b) Noise added, (c) Result]

Figure 5: Reconstruction of a noisy image with n = 5.


5 Metrication Errors

In the discrete Potts model the length of the boundaries |∂Ω_i| is defined as the number of pairs of neighboring pixels with different labels [3]. This definition allows the energy to be minimized efficiently using graph cuts. The drawback is that the boundary regularization is anisotropic, i.e. it penalizes variation differently depending on its direction. In Figure 6 we can see two image regions with approximately the same Euclidean boundary length together with their boundary costs. We can see that the boundary lengths differ by about 30%.

Figure 6: Two image regions and their boundary costs. Left: |∂Ω| = 7. Right: |∂Ω| = 10.

In the continuous Potts model the labeling is regularized using the isotropic total variation semi-norm, which is a more ideal metric since it corresponds to the Euclidean length of the boundary. The error introduced by the anisotropic regularization is called the metrication error.

The metrication error in graph based methods can be reduced by expanding the neighborhood structure. This was studied by Boykov and Kolmogorov in [9]. This approach has the drawback of increased memory usage and processing time.

To investigate the behavior of the regularization term we consider the k-label inpainting problem. We performed the inpainting on the two images in Figure 7, where the gray area is the area to be inpainted and the colored areas are known.

[Figure: (a), (b)]

Figure 7: a) Input for the 2-label inpainting problem. b) Input for the 3-label inpainting problem.

To perform inpainting using the Potts model we choose the data terms f_i so that they dominate the expression for the known areas, while for the unknown areas they give no preference to any label, i.e. f_i = f_j. This means that only the regularization term will be used for deciding the labeling in the inpainted area; the optimal labeling will then minimize the length of the boundary in that area. A sketch of such data terms is given below.
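A minimal MATLAB sketch of inpainting data terms of this kind follows. The variable known (a label image with 0 marking the gray inpainting region) and the constant BIG are assumptions introduced for illustration, not part of the original code.

BIG = 1e6;                                % large constant dominating the energy
for i = 1:n
    f{i} = zeros(size(known));
    f{i}(known ~= 0 & known ~= i) = BIG;  % forbid wrong labels on known pixels
    % on unknown pixels (known == 0) all f{i} stay equal, so only the
    % regularization term decides the labeling there
end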

For the first image (Figure 7.a) it is clear that the optimal labeling will divide the image along


the diagonal, since the shortest curve between two points is a straight line in the Euclidean metric. For the second image (Figure 7.b) the minimal interface becomes three straight line segments joined at 120° angles.

In Figure 8 we can see the two-label inpainting problem solved using both continuous max-flow and regular discrete graph cuts. The graph cut is done both with the 4-neighborhood (GC4) and the 8-neighborhood (GC8). We can see that GC4 prefers cuts along the grid. For this particular input we see that we get metrication errors for GC4, while GC8 reaches the correct solution.

Note that for two labels the graph cut problem can be solved exactly. The metrication error originates from the model and not from the algorithm used to find the minimum cut.

[Figure: (a), (b), (c), (d)]

Figure 8: The two-label inpainting problem solved using both the continuous max-flow algorithm and graph cuts. a) Input b) Continuous max-flow c) GC4 d) GC8

In Figure 9 we have solved the three-label inpainting problem using the same methods. For this input image we get metrication errors for both of the graph based methods. The GC4 method has three line segments joined at 90°, 90° and 180°; the GC8 method has three line segments joined at 135°, 135° and 90°. We can see that with the 8-neighborhood the graph method prefers cuts along the diagonals as well as the grid.

5.1 Regularizing with Other Metrics

In Figures 8 and 9 it becomes clear that the graph based methods have a different boundary regularization than the continuous Potts model (2). Now we consider which continuous models they correspond to.

In the continuous Potts model we regularize the boundary with the total variation semi-norm

TV(u) = ∫_Ω |∇u| dx,


[Figure: (a), (b), (c), (d)]

Figure 9: The three-label inpainting problem solved using both the continuous max-flow algorithm and graph cuts. a) Input b) Continuous max-flow c) GC4 d) GC8

which we can minimize efficiently in the continuous max-flow due to its dual formulation

∫_Ω |∇u| dx = sup_{|q(x)| ≤ 1} ∫_Ω ∇u · q dx = sup_{|q(x)| ≤ 1} ∫_Ω u ∇·q dx.

But we can consider other possible regularizations such as

TV_1(u) = ∫_Ω |∇u|_1 dx = ∫_Ω |u_x| + |u_y| dx.

We can find a similar dual formulation for this

∫_Ω |u_x| + |u_y| dx = sup_{|q(x)|_∞ ≤ 1} ∫_Ω ∇u · q dx = sup_{|q(x)|_∞ ≤ 1} ∫_Ω u ∇·q dx,

and we can then minimize it using the same algorithm, but where we instead project onto the constraint |q(x)|_∞ ≤ 1. Another possible regularization is

TV_∞(u) = ∫_Ω |∇u|_∞ dx,

which then by symmetry corresponds to the constraint |q(x)|_1 ≤ 1.
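The pointwise projections for the first two constraints are simple enough to sketch in MATLAB; qx and qy below are assumed component images of a flow field. The l1-ball projection needed for TV_∞ is more involved (e.g. via a simplex projection) and is omitted here.

% projection for TV: |q(x)|_2 <= alpha (as used in the algorithm above)
nq = max(1, sqrt(qx.^2 + qy.^2) / alpha);
qx = qx ./ nq;
qy = qy ./ nq;
% projection for TV_1: |q(x)|_inf <= 1, i.e. clamp each component to [-1, 1]
qx = min(1, max(-1, qx));
qy = min(1, max(-1, qy));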

In Figure 10 we can see the three-label inpainting problem solved using the continuous max-flow algorithm with three different boundary regularizations: TV, TV_1 and TV_∞. Here we can clearly see that the GC4 method corresponds to the TV_1 regularization.


[Figure: (a), (b), (c), (d)]

Figure 10: The three-label inpainting problem solved with the continuous max-flow algorithm using different boundary regularizations. a) Input b) TV c) TV_1 d) TV_∞


6 Accelerated Methods for Total Variation Minimization

As an intermediate step of the continuous max-flow algorithm we need to compute a projection of the form

inf_q ‖∇·q − g‖²,  |q(x)| ≤ α.    (12)

In [6] Chambolle shows that if q* is the solution to (12), then u* = g − ∇·q* is the solution to the Rudin–Osher–Fatemi [7] problem

min_u ∫_Ω α |∇u(x)| dx + (1/2) ‖u − g‖².

In [4], [5] Beck and Teboulle propose accelerated methods for this problem which have better convergence speed than simple gradient descent methods. We experimented briefly with using the accelerated methods for the intermediate step in the continuous max-flow algorithm.

The result can be seen in Figure 11. We see that for small α, i.e. when the data term is strong, we get performance similar to the gradient descent method but at a higher computational cost. For larger α the accelerated methods perform better, which is reasonable since more weight is put on the regularization terms, so accuracy in the spatial fields becomes more important.


[Figure: three result panels: (a) α = 0.5, (b) α = 1.5, (c) α = 2.5]

Figure 11: Comparison of gradient descent and the accelerated methods for total variation minimization for different regularization costs α. Left: the input image, the correct segmentation and the resulting segmentations. Right: the L2 norm of the error plotted against iterations for the two methods.


References

[1] J. Yuan, E. Bae, X. Tai, Y. Boykov, A Continuous Max-Flow Approach to Potts Model, ECCV'10, 2010.

[2] J. Yuan, E. Bae, X. Tai, Y. Boykov, A Fast Continuous Max-Flow Approach to Potts Model, Computational and Applied Mathematics Reports, 2011.

[3] Y. Boykov, O. Veksler, R. Zabih, Fast approximate energy minimization via graph cuts, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001.

[4] A. Beck, M. Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM J. Imaging Sciences, 2009.

[5] A. Beck, M. Teboulle, Fast Gradient-Based Algorithms for Constrained Total Variation Image Denoising and Deblurring Problems, IEEE Transactions on Image Processing, 2009.

[6] A. Chambolle, An Algorithm for Total Variation Minimization and Applications, Journal of Mathematical Imaging and Vision, 2004.

[7] L. Rudin, S. Osher, E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D: Nonlinear Phenomena, 1992.

[8] R. Potts, Some generalized order-disorder transformations, In Proceedings of the Cambridge Philosophical Society, 1952.

[9] Y. Boykov, V. Kolmogorov, Computing Geodesics and Minimal Surfaces via Graph Cuts, ICCV, 2003.

[10] M. Sion, On general minimax theorems, Pacific J. Math., 1958.


Appendix: Descent Direction for Spatial Flows

The energy we are minimizing in (11) is of the form

E(q) = ∫_Ω (1/2)(∇·q − g)² dx.

The Gateaux derivative of a functional in a direction ϕ is defined as

δ_ϕ E(q) = d/dt E(q + tϕ) |_{t=0}.

For our functional we get

δ_ϕ E(q) = (1/2) ∫_Ω d/dt (∇·(q + tϕ) − g)² dx |_{t=0}
         = ∫_Ω (∇·ϕ)(∇·q + t ∇·ϕ − g) dx |_{t=0}
         = ∫_Ω (∇·ϕ)(∇·q − g) dx          (set u := ∇·q − g)
         = ∫_Ω u ∇·ϕ dx = ∫_Ω ∇·(uϕ) − ϕ·∇u dx
         = ∫_{∂Ω} u ϕ·n dS + ∫_Ω −ϕ·∇u dx    (by Gauss' theorem)

We can w.l.o.g. constrain ourselves to considering directions ϕ such that ϕ|∂Ω = 0.

δ_ϕ E(q) = ∫_Ω −ϕ·∇u dx = ⟨ϕ, −∇(∇·q − g)⟩ = ⟨ϕ, δ⟩,  where δ := −∇(∇·q − g).

Since the scaling of ϕ is arbitrary we can w.l.o.g. only consider ϕ that satisfy ‖ϕ‖ = ‖δ‖, assuming that ‖δ‖ ≠ 0. Cauchy–Schwarz gives us that

−‖δ‖² = −‖ϕ‖ · ‖δ‖ ≤ ⟨ϕ, δ⟩ ≤ ‖ϕ‖ · ‖δ‖ = ‖δ‖².

So it becomes clear that δ_ϕ E(q) is maximized for ϕ = δ and minimized for ϕ = −δ.

Thus the steepest descent direction of E is ϕ = ∇(∇ · q − g).


Project in Image Analysis: evaluation of images of moles at risk of skin cancer

Signe Sidwall Thygesen [email protected]

Supervisor: Anders Heyden
Lunds Tekniska Högskola

10 December 2012

Figure 1: Image of a mole with possible skin cancer (source: Anders Heyden)

Contents

1 Introduction and background

2 Method

3 Results

4 Discussion

5 Conclusion


1 Introduction and background

Skin cancer, such as malignant melanoma and other types, is the tumour group increasing the most in Sweden [6]. It is therefore of great importance to develop methods that make it possible to quickly find out whether a mole contains skin cancer or not. Early detection of skin cancer reduces the risk that the cancer spreads, and therefore gives much better chances that the patient survives [3]. There may also be occasions when an examination has to be made without a doctor nearby, and image analysis is therefore a good tool for giving a quick first diagnosis and screening out the patients with moles that may need treatment. This project has aimed at creating an algorithm, based on image analysis, that can automatically detect whether a mole is at risk of skin cancer or not.

The characteristic features of malignant melanoma in a mole can be defined by the so-called ABCD criteria [3]:

A Asymmetry

B Border (irregular edges)

C Colour (colour variations)

D Diameter (a diameter larger than 6 mm)

The starting point of this project was to use ordinary images of moles to distinguish those that contain skin cancer from those that are healthy. The project was limited to examining the moles with respect to asymmetry and irregular borders (characteristic features A and B).

2 Method

The images used were of three types: healthy moles (5), moles with a precursor of malignant melanoma (3) and moles with malignant melanoma (10 were finally used), together with an image of a black circular disc as a reference. (Images from Anders Heyden, Wikipedia and online image archives, among others [5] and [3].) Figures 2–5 below show some images from each group as well as the image of the disc.

Using the knowledge of what characterizes a mole with malignant melanoma (see Section 1), together with methods from the course 'Image Analysis' ([1]) and from articles ([2], [3], [4]), an algorithm was implemented in MATLAB:

1. An image is read in and cropped around the mole. This is done with the function imcrop.

2. Thresholding. An initial threshold T_0 is set to the mean intensity of all pixels in the cropped image, and using this all pixels are divided into two groups: those with intensity greater than T_0 and those with intensity less than T_0. The mean intensities of these two groups are then computed, and a new threshold is set to the mean of the two. The difference between the previous and the current threshold is computed; if it is larger than a certain tolerance T_tol, the pixels are divided into two groups again, and the procedure is repeated until the difference between the old and the new threshold falls below the tolerance (a minimal code sketch of this iteration is given at the end of this section):

T_0 → T_old  ⇒  mean_1, mean_2  ⇒  T_new = (mean_1 + mean_2)/2,  until |T_new − T_old| ≤ T_tol


Figure 2: Healthy moles.

Figure 3: Moles with a precursor of skin cancer.

Figure 4: Moles with skin cancer.

3. Remove small pixel groups outside the mole and fill any holes inside it. This is done with the functions bwareaopen and imfill.


Figure 5: Black circular disc, used as a reference to compare the results against.

4. Extract the edges of the mole. This is done with the function edge.

5. Find the centre of mass of the mole and compute the distances to the edge. The centre of mass is found in the segmented image, and the distances to the edge are obtained from the edge coordinates known from step 4. As a measure of how much the largest and smallest distances from the centre of mass to the edge differ, the quotient Q is computed as the maximal distance (MaxDist) divided by the minimal distance (MinDist):

Q = MaxDist / MinDist.    (1)

6. Examine how the edges vary. Transform the Euclidean coordinates (x, y) of the edge pixels to polar coordinates (θ, ρ). Sort these so that the distance from the centre of mass (ρ) becomes a function of the angle (θ), which varies between −π and π. The discrete derivative is obtained by convolution: Derivative = ρ ∗ [1; −1] (using the function conv in MATLAB). As a measure D of how much the derivative varies, the sum of the absolute values of the derivative is taken, normalized by the number of pixels in the edge (l) and the mean distance to the edge (ρ_mean):

D = Σ |Derivative| / (ρ_mean · l).    (2)

7. Compute the asymmetry. For a certain number of evenly distributed angles θ_i between −π rad and 0 rad (half the mole), the mole is divided into two parts by a line through the centre of mass at angle θ_i, and the difference in distance to the centre of mass between two mirrored points, |ρ_k − ρ_{k−l}|, is computed (see Figure 6). The absolute differences are summed over all mirrored points for a given θ_i and normalized by the mean distance from the centre of mass to the edge (ρ_mean); this is repeated for all θ_i. As a measure of the asymmetry of the mole, A, the smallest value of the normalized sum is taken (i.e. the θ_i giving the smallest sum is chosen), since the smallest of these values corresponds to the line through the centre of mass that comes closest to giving mirror symmetry for the mole; the size of A then determines whether the mole is symmetric or not:

A = min_{θ_i} Σ |ρ_k − ρ_{k−l}| / ρ_mean.    (3)


The number of points where the distance difference is computed is set to 20 for each θ.

Figure 6: θ_i varies from −π to 0 over 20 evenly distributed points. For each θ_i the distance between two mirrored points, |ρ_k − ρ_{k−l}|, is computed and summed. A measure of the asymmetry is given by Eq. (3).

For each input image we thus obtain the parameters Q, the quotient of the smallest and largest distance to the centre of mass; D, a measure of how much the derivative varies; and A, a measure of how asymmetric the mole is. These can then be combined into a final value, C, which is to determine whether the mole contains skin cancer or not:

C = Q + 15D + 2A.    (4)

The constants were set to values that gave a good separation between the healthy moles and the moles with skin cancer (or its precursor), and that gave the parameters Q, D and A roughly equal weight.

After examining which values of Q, D and A were obtained for the moles, a threshold C_0 = 4 was set for a mole to be classified as having skin cancer: moles with a value of C greater than or equal to C_0 fall into the class 'skin cancer', and those with a value of C below C_0 are classified as 'no skin cancer'. The images of moles with a precursor of malignant melanoma are thus also intended to be classified as 'skin cancer'.
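The iterative thresholding in step 2 can be sketched in a few lines of MATLAB. This is a minimal sketch under stated assumptions: im is the cropped grayscale image as a double array, Ttol is the tolerance from the text, and the mole is assumed to be darker than the surrounding skin.

T_old = mean(im(:));             % initial threshold T_0
while true
    m1 = mean(im(im >  T_old));  % mean intensity of the brighter group
    m2 = mean(im(im <= T_old));  % mean intensity of the darker group
    T_new = (m1 + m2) / 2;       % new threshold
    if abs(T_new - T_old) <= Ttol
        break                    % converged: difference below the tolerance
    end
    T_old = T_new;
end
bw = im <= T_new;                % segmented mole (assumed darker than the skin)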

3 Results

Figure 7 shows two of the moles after steps 1–5 of the method have been applied. Thresholding has thus produced an image from which the edges could be extracted (blue points in the figure) and the centre of mass found (a red point in the middle of each of the two moles in the figure).

Table 1 shows the results of computing Q, the quotient of the smallest and largest distance to the centre of mass; D, the measure of how much the derivative varies; A, the measure of how asymmetric the mole is; and C, the combined value (see Eq. 4), for the three image groups (healthy, precursor of skin cancer, and skin cancer), given as mean and standard deviation.

The results of computing Q, D, A and C for the black circular disc are shown in Table 2.


Figure 7: Two moles after steps 1–5 of the method, i.e. after thresholding has segmented the image and the edges (blue points) and the centre of mass (red points in the middle of the moles) have been extracted.

Table 1: Results for the three image types: healthy moles, precursor of skin cancer, and moles with skin cancer. The computed parameters Q, D and A and the combined value C are given as mean and standard deviation.

      HEALTHY             PRECURSOR           SKIN CANCER
      Mean     Std dev    Mean     Std dev    Mean     Std dev
Q     1.44     0.12       1.61     0.35       2.39     0.77
D     0.0203   0.0103     0.0455   0.0098     0.0363   0.0204
A     0.7562   0.1812     1.4700   0.6257     1.9081   0.7148
C     3.26     0.59       5.23     1.73       6.74     2.11

Table 2: Results of computing the parameters Q, D, A and C for a black circular disc.

CIRCULAR DISC
Q    1.01
D    0.0020
A    0.0022
C    1.04

After the threshold was set to C_0 = 4 and the images were classified, all images were classified correctly, i.e. the healthy ones were classified as 'no skin cancer' and those with skin cancer or a precursor of skin cancer were classified as 'skin cancer'.


4 Discussion

The method for the thresholding itself turned out to be effective and managed to frame where the mole was located (see Fig. 7). However, it could sometimes be difficult when there were reflections in the image or when the lighting was poor when the image was taken. A hint of the difficulty caused by reflections can be seen at the right edge of the left mole in Fig. 7 (some of the 'spots' that could arise were, however, removed with bwareaopen and imfill in step 3 of the method).

The parameter Q gave a reasonably good separation between healthy moles and moles with skin cancer or a precursor of it (see Table 1), even though it does not really have any direct connection to the characteristic features of skin cancer.

The parameter D, on the other hand, did not fully separate the healthy moles from those with skin cancer; the table shows that the standard deviation is relatively large. This may be because moles that actually had fairly even edges got uneven edges when these were extracted after the thresholding, due to poor resolution or because the thresholding was not entirely perfect; this makes the discrete derivative larger and so also the parameter D. This could possibly be counteracted by first smoothing the function that describes the edge (ρ) before the discrete derivative is computed.

The parameter A gave a good separation between healthy moles and the moles with skin cancer. The asymmetric property was also often quite prominent in the images of moles that had skin cancer.

The combination C of the parameters Q, D and A managed to classify all images correctly here. However, the constants in the expression for C are fairly arbitrary (see Eq. 4) and can surely be modified to get better results. The threshold C_0 = 4 can also be investigated further and set to a better value. (In any case it should rather be set too low than too high, so that a healthy mole may be classified as skin cancer rather than the other way around.)

The black circular disc (Fig. 5) should theoretically give the parameter values Q = 1, D = 0, A = 0, since the radius is constant, the edge is even and it is completely symmetric. The computed results shown in Table 2 are very close to these values, which suggests that the methods for computing Q, D and A are comparatively good.

This project was limited to examining only uneven edges and asymmetry of moles, but something that could also be examined is the colour variations of the mole (one of the characteristic features). This would provide yet another parameter for classifying whether the mole has skin cancer or not, and thus a more reliable method. That the diameter can be larger for moles with skin cancer could also be included in the computation, but that requires some known scale in the image, which can be difficult.

5 Conclusion

Overall, the derived parameters Q, D and A gave a good description of the examined moles. The images could be classified correctly with the method described here, but a much larger set of images would be needed to make the results of this project statistically reliable. It would be good if the detection program could be trained with a large set of images of skin cancer, so that the constants in the expression for C and the threshold C_0 could be set to better values, and the program's ability could then be tested on new images. Moreover, there are different types of skin cancer, all with slightly different characteristic features that can be more or less difficult to detect. Here mainly malignant melanoma was examined, but an extension to more types of skin cancer would be possible, where one would not only learn whether the mole contains cancer or not, but also of which type.


References

[1] R. Szeliski, Computer Vision: Algorithms and Applications, September 3, 2010 draft.

[2] B. Garcia Zapirain et al., Skin cancer parametrization algorithm based on epiluminiscence image processing, University of Deusto, 2009.

[3] Anthony et al., Early Detection and Treatment of Skin Cancer, Am Fam Physician, July 15, 2000.

[4] M. Emre Celebi et al., Unsupervised Border Detection of Skin Lesion Images, Information Technology: Coding and Computing, 2005.

[5] Universitetssjukhuset Örebro, Malignt melanom bildarkiv, http://www.orebroll.se/sv/uso/Patientinformation/Kliniker-och-enheter/Hudkliniken/Patientinformation/Bildarkivet/.

[6] Cancerfonden, Cancerfondsrapporten 2011, Edita Västra Aros, 2011.


Point Cloud from Images

Björn Hansson F09

Nariman Emamian L09

Course FMA175 – Image Analysis, project

11 December 2012

1 Summary

This project has involved a deep dive into the technique of creating point clouds from two images. The starting point has been two images of the Skansen Lejonet fortress in Gothenburg, for which we were given point matches in advance. Theory was then implemented in the form of MATLAB scripts to finally obtain a 3D model in the form of a point cloud.

2 Theory

2.1 The eight-point algorithm

The eight-point algorithm amounts to setting up the equation

x'ᵀ F x = 0    (1)

and solving for F, the fundamental matrix. The first part of the algorithm is a normalization of the image coordinates of the point matches. The reason is that a large spread in the distances between the points gives worse solutions. There are a number of different ways to normalize, with the common denominator that the points are gathered around the origin (see Figure 1).

One of these ways is to obtain the new points x from the point matches and the camera calibration matrix through the formula

x = K⁻¹ x_point    (2)

Here x' and x are point matches from the two images, and the requirement is that at least eight matches are used. Which points are matched is arbitrary, but the approximation of F can be degraded by points that lie within too short a radius of each other or on a line. Below, the set of point matches is shown; the red points are the ones that will enter our eight-point algorithm.


Figure 1: Point matches

In the next step of the eight-point algorithm, the problem x'ᵀFx = 0 is reformulated as Af = 0 instead, where f in this case is our fundamental matrix in column vector form. In matrix form the transition is given by

(x' y' 1) [ F11 F12 F13; F21 F22 F23; F31 F32 F33 ] (x y 1)ᵀ = 0,

which gives

[ x1'x1  x1'y1  x1'  y1'x1  y1'y1  y1'  x1  y1  1
    :      :     :     :      :     :    :   :  :
  xn'xn  xn'yn  xn'  yn'xn  yn'yn  yn'  xn  yn  1 ] f = 0.    (3)

The equation Af = 0 is finally solved through so-called singular value decomposition.

Finally, it should be added that more than eight points can be fed into the algorithm, provided that particularly bad matches have been sorted out.

2.2 Singular value decomposition

Singular value decomposition, or SVD, is a way of factorizing real and complex matrices. An m×n matrix A is factorized as A = USVᵀ, where

• U is a real or complex orthogonal m×m matrix

• S is a positive diagonal matrix containing the singular values


• V is an orthogonal n×n matrix

After performing this factorization, the solution x above is found in the last column vector of the V matrix. This vector is extracted and restructured to form a square matrix, which is the finished solution x to the problem Ax = 0.
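A minimal MATLAB sketch of this step, under the assumption that A is the n×9 matrix of (3):

[~, ~, V] = svd(A);
f = V(:, end);           % last column of V: least-squares null vector of A
F = reshape(f, 3, 3)';   % back to 3x3 matrix form; the reshape/transpose order
                         % must match the order in which A was built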

2.3 The essential matrix

Fundamental to the theory behind the creation of point clouds is the fundamental matrix F, which can be said to be the algebraic representation of the epipolar geometry. For normalized cameras, the fundamental matrix corresponding to them is called the essential matrix, E.

When the essential matrix E' is computed from point matches that are not exact, errors arise in its composition. Through a singular value decomposition we obtain the relation

E = USVᵀ    (4)

where the diagonal of the S matrix can be replaced by diag(1, 1, 0). This is best described as a projection of the obtained matrix onto the admissible set of essential matrices.

2.4 Constructing the camera matrices P and P'

By assuming that the camera matrix P is given by

P = [ I | 0 ]    (5)

the second camera matrix P' can be obtained through a singular value decomposition of the essential matrix E. Assume that the singular value decomposition is given by

E = U diag(1, 1, 0) Vᵀ    (6)

To compute P', the essential matrix must be factorized as

E = SR    (7)

where S is a skew-symmetric matrix and R is a rotation matrix. One can show that there are two possible factorizations, given by

E = SR₁,  E = SR₂,
R₁ = UWVᵀ,  R₂ = UWᵀVᵀ,  S = UZUᵀ    (8)

where

W = [ 0 −1 0; 1 0 0; 0 0 1 ]    (9)

Z = [ 0 1 0; −1 0 0; 0 0 0 ]    (10)

Furthermore, starting from the matrix S, one can show that the translation vector t is given by

t = U(0, 0, 1)ᵀ = u₃    (11)

But since the sign of the E matrix cannot be determined, the sign of t cannot be determined either; P' is therefore given by one of the following matrices:

[ UWVᵀ | +u₃ ],  [ UWVᵀ | −u₃ ],  [ UWᵀVᵀ | +u₃ ],  [ UWᵀVᵀ | −u₃ ]    (12)

Which of the solutions is the right one is determined after the triangulation, by realizing geometrically which of the obtained solutions matches reality.
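A minimal MATLAB sketch of forming the four candidates in (12); the scale and sign conventions are schematic rather than the authors' exact code:

[U, ~, V] = svd(E);
W  = [0 -1 0; 1 0 0; 0 0 1];
u3 = U(:, 3);
P2 = {[U*W*V',  u3], [U*W*V', -u3], ...
      [U*W'*V', u3], [U*W'*V', -u3]};   % the four candidates of (12)
% the correct candidate is the one for which the triangulated points
% end up in front of both cameras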

2.5 Triangulation

From image points x, x' and the corresponding camera matrices P, P', a point cloud can be created through so-called triangulation. For each image the relation x = PX holds, where X is the original point which is projected by the camera matrix to the image point x. Assuming that the origin lies in the camera corresponding to the matrix P, the image coordinates are obtained through the operation

(x₁, x₂, x₃) → (x₁/x₃, x₂/x₃, 1)    (13)

which is based on so-called homogeneous representation. This means that

X ∼ (x y 1)ᵀ    (14)

In any case, x = PX can be combined into yet another linear equation of the type AX = 0. This time A is built as below:

A = [ x p₃ᵀ − p₁ᵀ
      y p₃ᵀ − p₂ᵀ
      x' p'₃ᵀ − p'₁ᵀ
      y' p'₃ᵀ − p'₂ᵀ ]    (15)

where pᵢ are the row vectors of P and p'ᵢ those of P'. X is then solved for exactly as before, through singular value decomposition.
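A minimal MATLAB sketch of the triangulation of one point match, assuming normalized coordinates (x, y) and (xp, yp) and 3×4 camera matrices P and Pp:

A = [ x  * P(3,:)  - P(1,:);
      y  * P(3,:)  - P(2,:);
      xp * Pp(3,:) - Pp(1,:);
      yp * Pp(3,:) - Pp(2,:) ];
[~, ~, V] = svd(A);
X = V(:, end);       % homogeneous 3D point minimizing |A*X|
X = X / X(4);        % normalize the homogeneous coordinate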


Figure 2: The two images of Skansen Lejonet

3 Method

The material underlying this work consists of two images, from different angles, of one and the same object, in this case Skansen Lejonet in Gothenburg (see Figure 2). The camera used to take the images is known, as are the calibration, the image coordinates and the resolution. This is used as input, and the given calibration information is in the form of a 3×3 matrix (K in our MATLAB code). Furthermore, 2904 point matches between the images are known, with associated image coordinates. This means that for a point in one image its corresponding point in the other image is known, e.g. a corner of a window. It was also given that a large portion of strongly deviating and incorrect points had been removed in advance.

All material was processed in MATLAB, and we used the eight-point algorithm to obtain the essential matrix.

To obtain a better result, we also ran a script feeding 300 points into the eight-point algorithm.

To visualize the results more clearly, the colour in the original image was fetched for the corresponding point in the point cloud. This could be done since the image coordinates, as previously mentioned, were known, and the order of the obtained points in the point cloud was the same as in the originally obtained image points.

4 Results

The input is the two images mentioned above and the plotted point matches. The result of our MATLAB code is a 3D point cloud, shown in Figures 3–6 below. The code itself can be found in Appendix 1.


Figure 3: Point cloud seen from the front

Figure 4: Point cloud seen obliquely from above

5 Discussion

Being able to generate good 3D points presupposes a good foundation in the input data. Here we are talking about point matches, in our case 2904 of them. Determining each of them unambiguously is a time-consuming task, and we therefore start from automatically computed matches. It is unavoidable that some of them will be of lower quality, and this will consequently also show in the 3D counterpart of that point. The most important thing is that points of lower quality do not enter the eight-point algorithm, as that would result in consequential errors for all points. One way of working against this error source is to let more points enter the eight-point algorithm, so that individual errors do not have too much influence. Comparing Figures 5 and 6, one can see that Figure 5 is somewhat more skewed, but the difference is not remarkably large.

Another way of improving the code is to assume that the errors in the point matches are normally distributed and perform a so-called "bundle adjustment". However, there was not enough time to implement this.


Figure 5: Point cloud seen from above

Figure 6: Point cloud with more points fed into the eight-point algorithm

6 References

1. Haner, Sebastian; PhD student at the Department of Mathematics, LTH. Supervision during the period 2012-10-29 to 2012-12-10.

2. Hartley, Richard and Zisserman, Andrew, 2006. Multiple View Geometry in Computer Vision. 2nd ed., 3rd printing. Cambridge: Cambridge University Press.

3. Olsson, Carl; Assistant professor at the Department of Mathematics, LTH. Supervision during the period 2012-10-29 to 2012-12-10.


Image Analysis Project

— Cartoons for Smartphones —
Gustaf Waldemarson ∗

Faculty of Engineering (LTH), Lund University
Sweden

Abstract

The comic book format is a useful way of displaying a lot of information at the same time – it can do anything from simply displaying information to telling a compelling story. This medium has also been used for a very long time, but only recently are people able to display comics on digital equipment such as tablets and smartphones. With the introduction of these devices, however, a new nuisance has become apparent – the screens of these devices cannot fit the comics in a way that is pleasing to the eye! This paper investigates a way of segmenting the wanted information from the comics – such as the individual panels – and displaying it one piece at a time to the user.

Keywords: Image Analysis, Comic Book Panels, Segmentation, Panel Extraction

1 Introduction

A lot of people enjoy reading comics from time to time, be it in the form of webcomics such as XKCD¹ or digitally scanned or distributed comics such as Asterix or Iron Man. These days, however, a lot of people prefer to read them on devices the comics were not intended to be rendered on, such as smartphones and tablets. This almost always leads to a lot of annoying scrolling back and forth on the page in order to read the panels in the correct order.

The aim of this project was to develop an algorithm, or a set of algorithms, that could use image analysis to segment digital comic book pages into their individual panels. These segmented panels could then be rendered one at a time to the viewer, saving him or her from the annoying scrolling.

Furthermore, the applicability of this kind of algorithm, as well as of others, is discussed along with possible future extensions.

The actual implementation was constructed and tested in Matlab, due to past experience, ease of debugging, and the large number of image processing, image analysis and computer vision features readily available in the various Matlab toolboxes.

2 Anatomy of a Comic Book Page

One of the most challenging aspects of this project is the fact that it is supposed to solve a very general problem, whereas most image analysis algorithms only deal with a small set of restricted and simpler cases.

The general comic is any image which displays sub-images inside panels of any size and shape, which makes extracting these sub-images a difficult problem. To make matters worse, authors tend to add their own styles to these images as well, such as letting their characters break the fourth wall, i.e., leap out of their panels, or letting speech bubbles stretch between panels. We humans have no trouble recognizing the continuity between panels despite all of these kinds of discrepancies between comics or authors, and sometimes they are even useful in conveying certain elements of the comics.

∗e-mail: [email protected]

Figure 1: The Asterix comics follow the characteristics outlined in Section 2 very well.

From an image analysis standpoint, this makes designing a robust algorithm that can accurately segment these panels an absolute nightmare. Analysing the common look of comics, however, it is possible to identify some characteristics:

• Panels are almost always quadrilateral, and typically rectangular.

• A single page contains many panels, typically around 4–12.

• Panels are usually of the same, or a similar, style on the same page.

• Panels are often placed along lines, both vertically and horizontally.

• Panel borders are often sharp, or otherwise well defined.

A simple example of a comic displaying these characteristics can be seen in Figure 1.

Looking at this anatomy of a simple page, it is possible to design a simple algorithm that can segment the individual panels with a relatively high success rate, as long as the comic follows the outlined characteristics.

3 Testing Framework

It is important to point out that comic book pages are unusually large images to use in image analysis on normal computers, and potentially too large on smartphones. This means that the complexity of classical image analysis techniques leads to considerably longer runtimes and a greater amount of consumed memory.

To alleviate both the spatial and the computational complexity while prototyping, a simple framework was developed. It works in the following way:


Figure 2: Graphical illustration of the testing framework.

1. Collect all, or a subset, of the file paths of image files from a pre-determined directory on the hard drive and save them in a list.

2. For each file path in the list, load the image, execute an operation or a set of operations on the loaded image, and write the resulting image to an output directory.

A graphical illustration of the framework can be seen in Figure 2. This framework made it possible to test a wide variety of operations on a large set of comic book pages relatively quickly. In fact, the only limiting factor was the inability to quickly cancel a running batch for debugging purposes.
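In Matlab, the framework essentially reduces to a short batch loop. The following is a minimal sketch under assumed names (the directories, the file pattern and the greyscale conversion used as a stand-in operation are all illustrative, not the project code):

    % Step 1: collect image file paths from a pre-determined directory.
    files = dir(fullfile('inputDir', '*.png'));
    % Step 2: load each image, apply the operation(s), write the result.
    for k = 1:numel(files)
        im  = imread(fullfile('inputDir', files(k).name));
        out = rgb2gray(im);                               % stand-in operation
        imwrite(out, fullfile('outputDir', files(k).name));
    end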

4 Segmentation Algorithm Overview

Using the characteristics outlined in Section 2, the following low-level segmentation algorithm was devised to segment a given image of a comic book page:

1. Detect the color of the panel borders by collecting the 2 topmost rows of pixels and averaging them into a single color.

2. Construct a thresholded image as follows: if a pixel has a color value close to the detected border color it is set to 1, otherwise 0. In theory, the resulting image will contain only the panel borders and any noise of the same color as the borders.

3. Look along the vertical edge of the thresholded image for a row of pixels going horizontally across the whole image. When one is found, count the number of consecutive "empty" rows to find the width of the panel border. The row coordinate and width of the border are saved in a list. Repeat this until reaching the bottom of the page.

4. Segment the original image from the top to the first border in the list, then from the first to the second border, and so on until the bottom of the page. This creates n + 1 new images, where n is the number of row coordinates and widths stored in the list.

5. For each new image, repeat the 2 previous steps, but instead of going vertically, go horizontally along the edge of the image, and segment vertically.

6. For each new image, repeat the previous 4 steps in order to find irregularly stacked panels. Stop when no new images are produced.

A graphical representation of the algorithm can also be seen in Figure 3.

Figure 3: Graphical representation of the panel segmentation algorithm.
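As an illustration, steps 1-3 of a single vertical scan can be sketched in a few lines of Matlab (the file name and the color tolerance are assumptions of this sketch, and a full implementation would also recurse over the sub-images as in steps 4-6):

    page   = im2double(rgb2gray(imread('page.png')));   % hypothetical input page
    border = mean(mean(page(1:2, :)));           % step 1: average the 2 topmost rows
    bw     = abs(page - border) < 0.05;          % step 2: 1 where close to border color
    gapRows = all(bw, 2);                        % step 3: rows lying entirely on a border
    cuts    = find(diff(double(gapRows)) ~= 0);  % start/end coordinates of border bands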

4.1 Limitations

A majority of all comics have the characteristics outlined in Section 2. It is, however, not the only way comics are laid out. As is obvious from the construction of the algorithm, it cannot possibly segment comics whose panels do not lie along lines, such as the comics seen in Figure 4 or 5. Figure 4 in particular also shows us other problems:

• If the borders are made up of different colors, the thresholding will give unreliable results.

• Pages with a large amount of color similar to the panel border might also be segmented when in reality they should not be, such as in Figure 6.

5 Results

This algorithm can, for instance, correctly segment the comic book page seen in Figure 1.2 Similarly, any comic book with a similar layout will be correctly segmented as long as it follows the characteristics outlined in Section 2.

6 Discussion

Considering how general the comic book layout can be, it is unlikely that a segmentation algorithm capable of accurately segmenting all comic books will ever be developed. In Section 7.1 we discuss some possible extensions that could further increase the success rate of the segmentation.

6.1 Success rate and Ground Truth

The complete implementation of this application is intended to be usable by anyone, meaning that the success rate should be high,

2The resulting subimages making up the comic book panels have been omitted since they would occupy an unnecessarily large portion of the report.


Figure 4: A page from the Prototype comic, originally from the game of the same name. The panels are laid out in a complex pattern.

Figure 5: Another couple of panels from the Asterix comics. This L-shaped panel presents a difficult case for such a simple extraction algorithm.

in order not to disappoint the user. It is, however, not crucial that it segments correctly every single time, as long as the users have an option of turning off the segmentation when it fails.

Since we also have a user present at all times, it would be possible to ask the user, at least once per comic book, to point out predetermined features, such as the border in an image, or other useful data, in order to introduce some ground truth for the page and thus increase the chance of a successful segmentation.

6.2 Comic Book Features

As should be apparent from Section 2, every artist can create their own layout, but comic books belonging to the same series, e.g., The Punisher, Iron Man or XKCD, are typically laid out in the same, or at least a similar, way. This fact could be used to store features for each comic book series, such as the minimum panel size, the panel border structure, etc. These features could, for instance, be given to a classifier that decides on the most appropriate strategy for segmenting the given comic book. This strategy, however, requires some kind of control features. These must be generated before the segmentation can be done and stored in a database located either on the smartphone directly, or on a remote server that streams the data to the smartphone when needed. While this opens up an interesting approach to the segmentation, it raises a far more interesting question, namely that the whole locality of the segmentation can be altered:

6.3 Real-time or Offline Segmentation?

Instead of streaming the features to be used in a segmentation algorithm, it might be more beneficial to stream the results of the

Figure 6: A panel from the Punisher comic. In the middle of the image, a large black region stretches vertically across the whole panel, which will erroneously be segmented.

Figure 7: Illustration of the server solution.

segmentation. This means that a remote server can contain a database of, for instance, the sets of four spatial coordinates indicating the quadrilateral making up each panel in any given comic.

This also means that the segmentation could be done completely offline. A server could run the segmentation algorithm, saving the results for future use, and whenever it finds a difficult comic book page it could query a human operator for an accurate segmentation. This would ensure correct segmentations for the end user, as long as the comic is available in the database3; if not, a fallback algorithm such as the one seen in Section 4 could be used. See Figure 7 for a graphical overview of this offline segmentation strategy.

7 Future Research

7.1 Extensions

The algorithm outlined in Section 4 will work in the general case, but as pointed out in Section 4.1, it will need some more work in order to correct some of the more serious problems. Still, it will not be able to segment difficult pages such as the ones in Figure 4 or 5. For these cases, other approaches are needed, such as the one used by [Ho et al. 2012]. Simple extensions should, however, be able to correct problems such as the one seen in Figure 6 and solve problems such as overlapping speech bubbles.

7.1.1 Additional Input

When the image is loaded, instead of just generating the thresholded image, generate a set of new images containing the following:

SAT A Summed Area Table (i.e., an integral image).4

3The name and page number of a comic are typically part of the file name, and are thus often readily available. The user could also be queried to verify the name of the comic.

4An integral image is an image where each pixel contains the sum of all pixels above and to the left. See http://en.wikipedia.org/wiki/Integral_image and [Hensley et al. 2005] for more details.

Page 91: New Image Analysis Project · 2013. 12. 10. · Image Analysis Project Christoffer Cronstro¨m, Joel Sj¨obom 10th December 2012 1 Background and Goals Everyone loves comics. A large

Figure 8: Spiderman comic book panel with a difficult speech bub-ble.

Edges An image of the detected edges in the image, using e.g. a Canny edge detector, potentially slightly morphologically altered.

Filled Edges The edge image with all holes cascade filled.

Filled Threshold The thresholded image with all holes cascade filled.

Now, each time a panel is to be segmented, evaluate these new images in order to find additional information about the page, such as the area of the intended cut.
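As a sketch, all four auxiliary images can be produced with standard Matlab routines; here g is assumed to be the greyscale page and bw the thresholded image from Section 4:

    sat    = cumsum(cumsum(double(g), 1), 2);   % summed area table (integral image)
    edges  = edge(g, 'canny');                  % Canny edge image
    fEdges = imfill(edges, 'holes');            % edge image with all holes filled
    fBw    = imfill(bw, 'holes');               % thresholded image with holes filled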

7.1.2 Overlapping Speech Bubbles and Objects

It is not uncommon for objects and speech bubbles in comics to pop out of panels, as seen in Figure 8. The algorithm in Section 4 successfully segments these panels, but half of the bubble will be missing! Thus, it is not enough to simply segment the panels; a good implementation must also find the speech bubbles in the comics, or at the very least those that overlap several panels. If such an algorithm could be made, the application could even be extended with an on-the-fly OCR scanner able to extract text, and perhaps even translate the text in the panels.

7.1.3 Panel Ordering

Due to the straightforward implementation of the algorithm detailed in this paper, the actual order of the panels is assumed to always be correct when the segmentation is correct. With further extensions to the algorithm, or even with a completely new algorithm, we might have to reorder the segmented panels in order for the user to receive the panels in the correct chronological order. This is especially true in comics where authors use arrows to direct the reader through the page. Similarly, the order might be different for certain comic book styles, e.g., in manga.

7.2 Hardware Constraints

It is worth pointing out once again that the final implementation of this application is intended to run on smartphones. Because the application is supposed to run on a mobile platform, which is presumably powered by a battery, a set of software constraints must be considered during the implementation:

It is well known that memory accesses are computationally expensive operations; they do, however, also cost a considerable amount of electric power. Since comic book pages generally are very large images, a considerable amount of work and power must be spent simply to display one.

Depending on the complexity of the algorithm, a considerable amount of extra memory may be used. The extension in Section 7.1, for example, would need almost twice as much memory, thus using almost twice as much power. This might be unacceptable on a mobile platform.

For this reason alone, a server-based solution as seen in Section 6.3 would be appropriate, since the heavy calculations and


memory accesses would be executed on a server where power consumption can be largely disregarded. Although the majority of the calculations would be executed on a remote server, sending the resulting data would obviously also consume a certain amount of power; thus both approaches need to be researched.

7.3 Runtime and Threads

Even with a server solution such as the one proposed in Section 6.3, the application must be interactive: the user might tolerate some delay during the start of the application, but he or she will most likely not tolerate any considerable delay between pages. Thus, if the algorithm cannot process pages quickly enough, the user has little to no use for the application. This problem can be alleviated by making the application multi-threaded, so that it can display a possibly interactive panel to the user while simultaneously processing one or more pages for segmentation. This means that with proper multi-threading of the application, the final runtime of just the segmentation algorithm is less important.

7.4 Other Algorithms

In the wake of this project, a couple of interesting papers regarding this exact problem were found: [Ho et al. 2012; Chan et al. 2007]. Both propose interesting solutions capable of solving some of the issues experienced by the algorithm in Section 4, while still keeping the algorithm simple enough to run on a variety of hardware. [Ho et al. 2012] in particular display a great amount of detail in their implementation, but even they have to limit their scope.

8 Conclusion

Comic books convey a large amount of information, but the small screens of modern smartphones cannot possibly convey this information in a satisfying manner. This paper has researched and discussed a single implementation of a simple segmentation algorithm capable of segmenting the panels in the most common cases. Several extensions of this algorithm have also been discussed, as well as possible limitations of the intended platform.

Future research in this specialised segmentation approach should implement and compare a variety of different comic book segmentation techniques, such as those described in Section 7.4, with regard to both power consumption and execution time.5

References

CHAN, C., LEUNG, H., AND KOMURA, T. 2007. Automatic panel extraction of color comic images. In Advances in Multimedia Information Processing – PCM 2007, H.-S. Ip, O. Au, H. Leung, M.-T. Sun, W.-Y. Ma, and S.-M. Hu, Eds., vol. 4810 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 775–784.

ENNIS, G., LAROSA, L., AND PALMER, T. 2004. The Punisher – In the Beginning. No. 1 in Punisher MAX. MAX Comics, March.

GOSCINNY, R., AND UDERZO, A. 1961. Asterix the Gaul. No. 1 in Asterix. Dargaud.

GRAY, J., PALMIOTTI, J., ROBERTSON, D., AND JACOBS, M. 2009. Prototype 2. No. 2 in Prototype. Wildstorm Comics, July.

HENSLEY, J., SCHEUERMANN, T., COOMBE, G., SINGH, M., AND LASTRA, A. 2005. Fast summed-area table generation and its applications. Computer Graphics Forum 24, 547–555.

HO, A. K. N., BURIE, J.-C., AND OGIER, J.-M. 2012. Panel and speech balloon extraction from comic books. In Proc. IAPR International Workshop on Document Analysis Systems, 424–428.

5The comic book examples in this report were provided by e.g., [Gray et al. 2009; Goscinny and Uderzo 1961; Ennis et al. 2004].


Image Analysis, project

Registration of Medical Images using Graph Methods

Bartosz Malman

December 12, 2012

1 Abstract

In medicine, in order to diagnose and study diseases, there is often a need to examine samples of tissues, such as lung or kidney tissue. Such samples are cut into thin slices on which experiments are conducted. The experiments could involve staining the samples with different chemicals or markers. The purpose of such procedures could be to reveal relevant characteristics of the samples, for example ones that indicate evolving disease. Producing two such differently stained samples from two neighbouring tissue slices poses the problem of aligning them with each other in scale, rotation and translation, in order to extract additional medically valuable information.

Given high-resolution images of two corresponding samples, the problem is to estimate the transformation of one image relative to the other. Given the parameters of such a transformation, the two images could be aligned, for example to precisely identify corresponding characteristics in both images. An example of such corresponding samples can be seen in Figure 1. In this pair we could be interested in estimating the rotation and translation (which in this case are small) between the two sample images. To obtain such information, it is natural to try to obtain a set of matching features in the corresponding images and, based on their locations, estimate the parameters of the transformation.

2 Basics

2.1 Rotation

In general, given sets of points X = {x1, . . . , xn} and Y = {y1, . . . , yn} in the two images, together with correspondences (xi, yi), one could expect to approximately model the relation between the points as an affine transformation L, that is,


Figure 1: Two corresponding samples stained using different markers.

yi = L(xi) = Axi + c, where A is a 2×2 matrix and c a 2×1 column vector. Of course, most probably there is no L such that L(xi) = yi for all i. One could then instead seek a transformation that, in some sense, produces the best matching. A standard way of doing this is to minimize the sum

    \sum_{i=1}^{n} \| L(x_i) - y_i \|^2

for the standard Euclidean norm \| \cdot \|. Here we consider the points as vectors in R^2. This is commonly known as the "least squares solution".

The provided project data set of 88 corresponding sample images had the property that the matched pairs were similar in scale, and also rigid, in the sense that the corresponding samples were not "stretched out" relative to each other. This suggested that the search for the transformation L could be restricted to transformations which merely rotate and translate the points. Such transformations are of the form y = L(x) = Rx + c, where R is a rotation matrix. In this case the above sum can be minimized analytically.

2.2 Scale-invariant feature transform

Before any matching can be performed, we of course need the data sets X, Y and the correspondences. The Scale-Invariant Feature Transform (SIFT) algorithm was used for this purpose. SIFT, published by David Lowe in 1999, is a patented algorithm that detects and describes local features in images. With such sets of features from two images, one can match the features to produce correspondences between points of the images, that is, produce exactly what we need for the estimation of the transformation L.

SIFT is a general tool designed to produce acceptable matchings for any input. Trying the algorithm on the project data with different parameters, SIFT often produced either too few matchings for any transformation estimate to be statistically significant, or too many outliers. Clearly


not much can be done with a low number of correct matchings, but given a large set of possible matchings, one can try to filter out the outliers in different ways and produce a set of correct matchings of useful size. SIFT was powerful enough to provide, in (almost) all cases, a large set of possible matchings, of which a significant portion were correct. The problem could then be reduced to filtering the good from the bad.

The initial idea was to construct, in a sensible way, a graph representing the correspondences and to run algorithms on this graph to perform the filtering. No reasonable results were obtained by this method alone, and therefore heuristic methods were developed and applied to the problem.

The SIFT implementation used in the project was a slightly modified demo that is available for download from the creator's homepage [2].

3 Methods

3.1 Estimation of the transformation

We start with the easy problem. Assume we already have the sets of points X, Y and the correspondences (xi, yi). Let xi = (xi1, xi2), where xi1 is the point's horizontal coordinate and xi2 its vertical coordinate, and similarly for the points yi. We are trying to find a transformation L(x) = Rx + t, where R is an orthogonal matrix and t a vector in R^2. Solving the problem in the least squares sense means estimating the parameters of L, that is, the matrix R and the vector t, such that the sum

    \sum_{i=1}^{n} \| R x_i + t - y_i \|^2

is minimized. Consider the points x, y as vectors in R^2 with addition defined coordinate-wise, and define the centroids

    x_m = (1/n) \sum_{i=1}^{n} x_i,    y_m = (1/n) \sum_{i=1}^{n} y_i,

that is, the geometric centers of the corresponding sets X, Y. It can be shown that if R is the orthogonal matrix that appears in the pair (R, t) minimizing the sum above, then R is also the solution to the minimization of

    \sum_{i=1}^{n} \| R(x_i - x_m) - (y_i - y_m) \|^2

and vice versa. This sum estimates the rotation when both data sets are centered at the origin, which is a simple problem since it is an optimization problem in a single variable (the angle of the rotation). To obtain the vector t we compute where the rotation R takes the centroid x_m and subtract this from the centroid y_m, hence

    t = y_m - R x_m.
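In Matlab, one way to realise this closed-form estimate is sketched below; X and Y are assumed to be 2-by-n matrices of corresponding points (illustrative names, not the project code), and the angle formula follows from expanding the centered sum above into its dot- and cross-product terms:

    xm = mean(X, 2);  ym = mean(Y, 2);           % centroids of the two point sets
    Xc = X - repmat(xm, 1, size(X, 2));          % center both sets at the origin
    Yc = Y - repmat(ym, 1, size(Y, 2));
    th = atan2(sum(Xc(1,:).*Yc(2,:) - Xc(2,:).*Yc(1,:)), ...  % cross terms
               sum(sum(Xc .* Yc)));                           % dot terms
    R  = [cos(th) -sin(th); sin(th) cos(th)];    % estimated rotation
    t  = ym - R * xm;                            % translation, as derived above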


Given the transformation L(x) = Rx + t, we can now transform the images. Let the two images be I1, I2. We will paint a new image I3, which is I2 inverse-transformed relative to L. For each pixel in I1, we can use its coordinates x = (i, j) as an argument to L to obtain where the corresponding pixel in I2 is located. If the value of L at the pixel (rounded to integer coordinates) is within the range of I2, then we pick the pixel L(x) of I2 and put it in I3 at the location of the argument pixel, that is, at (i, j).
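A direct, if slow, Matlab rendition of this repainting loop might look as follows (a sketch only; I1, I2 and the estimated R, t are assumed to exist, with pixel coordinates used as vectors exactly as described above):

    I3 = zeros(size(I1));
    for i = 1:size(I1, 1)
        for j = 1:size(I1, 2)
            p = round(R * [i; j] + t);           % where pixel (i,j) lands in I2
            if p(1) >= 1 && p(1) <= size(I2, 1) && p(2) >= 1 && p(2) <= size(I2, 2)
                I3(i, j) = I2(p(1), p(2));       % copy that pixel back to (i,j)
            end
        end
    end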

With this problem out of the way, we can focus on constructing the sets X, Y and the correspondences (matchings).

3.2 SIFT distance ratio parameter

When running SIFT, there is an important parameter that drastically changes the number of possible matches that are returned. This parameter, which we call the distance ratio, can be set to a value in the interval from 0 to 1. For a given feature point x in the first image, we compute the similarity (Euclidean distance) to all feature points in the other image. If the best and the second-best match are such that the corresponding distances are far enough apart, then the best match is accepted. What counts as "far enough apart" is decided by the distance ratio. A value of the distance ratio closer to 1 will therefore accept more matches, and this is exactly the behaviour we are after, because we need as many matchings as possible.

The parameter in the demo implementation of SIFT used in the project was hard-coded to the value 0.6. This was raised to 0.8, and the result can be seen in figure 2. The first matching, with the parameter at 0.6, returns a very low number of correct matchings. The second returns a very large number of matches, most of which are wrong. As it turns out, there are many new correct matchings among them, but they are obscured by the new mismatches. This suggests that a method for filtering out the bad matches is in order.
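As a sketch of what the ratio test amounts to (not the actual demo code), consider one feature in the first image and a vector d of its descriptor distances to all features in the second image:

    [ds, idx] = sort(d);               % best and second-best distances first
    if ds(1) < distRatio * ds(2)       % accept only a sufficiently clear best match
        match = idx(1);                % index of the accepted match in image 2
    else
        match = 0;                     % no reliable match for this feature
    end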

3.3 Construction of the graph

We will now construct an undirected graph G = (V, E) on a set of nodes V, where we have one node for each matching returned by SIFT. The idea is to build the set of edges E in such a way that if e = (u, v) is an edge between nodes u and v, then the matches corresponding to these nodes are inconsistent with each other. The sense in which they are deemed inconsistent is the following. Let (x1, y1) be a matching between two points in the images, and let (x2, y2) be another matching, such that the points x1 and x2 are located close to each other in the first image. Then one could reasonably assume that the points y1 and y2 should also be close to each other in image 2, because we are assuming the transformation to be a rotation followed by a translation. If this is not the case, that is, if the locations of y1 and y2 are distant in image 2, then we say that the matches (x1, y1) and (x2, y2) are inconsistent, and add an edge to the graph between the corresponding nodes.
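A sketch of this edge construction in Matlab could look as follows; X and Y are 2-by-k matrices of matched points, and the two distance thresholds rNear and rFar are illustrative assumptions, since the report does not state the exact values used:

    k = size(X, 2);
    A = false(k);                                % adjacency matrix of G
    for i = 1:k
        for j = i+1:k
            if norm(X(:,i) - X(:,j)) < rNear && norm(Y(:,i) - Y(:,j)) > rFar
                A(i, j) = true;  A(j, i) = true; % close in image 1, distant in image 2
            end
        end
    end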


Figure 2: SIFT matching with the distance ratio parameter set to 0.6 and 0.8

3.4 Vertex cover and its application to our graph

We quickly define what we mean by a vertex cover of a graph. Let G = (V, E) be an undirected graph. A set S ⊆ V is a vertex cover of G if for each e = (u, v) ∈ E either u ∈ S or v ∈ S. In other words, S is a vertex cover of G if and only if every edge has an endpoint in S.

Now consider our graph G together with a vertex cover S for it. By definition, the nodes in V \ S have no edges between them, and hence the corresponding matchings are not inconsistent with each other. This is not equivalent to the remaining matchings being correct, but the hope is that it will give a reasonable result. For example, a node corresponding to a lone mismatch in some area of the image, with no other matches nearby, will never receive an edge, and therefore could not possibly be removed by the vertex cover filtering.

Taking S = V is always a solution to the vertex cover problem, but it is meaningless. The optimization version, finding a vertex cover of minimal size, is assumed not to be solvable in sub-exponential time. There is therefore no hope of finding an optimal cover for the moderately sized graphs related to this problem. Instead, one can use heuristic constructions or approximation algorithms. One well-known and very simple approximation algorithm produces a vertex cover of at most twice the size of a minimal one [1]. This algorithm has been implemented and tested on the graphs that appear in the project. A different idea is to greedily remove the nodes of the graph that have the most incident edges, until no edges remain. One can easily construct instances on which such an approach produces a vertex cover of far from optimal size, but it does fairly well on average, especially on dense graphs. Both the approximation algorithm and the greedy algorithm have a running time that is a linear function of the input size.
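The greedy variant is particularly short. A sketch, operating on the adjacency matrix A from the previous section:

    S = [];                                  % the vertex cover under construction
    while any(A(:))
        [~, v] = max(sum(A, 2));             % node with the most incident edges
        S = [S v];                           % put it in the cover
        A(v, :) = false;  A(:, v) = false;   % remove its edges from the graph
    end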

3.5 Voting

The graph method presented above is inherently such that we concern ourselves only with the mismatches. But say that we have a group of matchings in a small area, and that these are correct matchings. If a mismatch is placed in the same area, then our graph will contain an edge from the bad matching to each of the correct matches. An existing edge implies that one, the other, or both of its matches will be filtered out by the vertex cover filtering. It is therefore quite possible that one mismatch makes the algorithm remove several correct matches and, in the worst case, remove all the correct ones and leave the mismatch in place.

This suggests a look from a different perspective. Let (x1, y1) and (x2, y2) be two pairs of matched points in the two images. If x1, x2 are located close to each other, and similarly for y1, y2, then one can consider these matches as vouching for the correctness of each other. This is the idea behind the following selection criterion. Let m1, m2, . . . , mk be the set of k matchings that SIFT produced for our two images. If two matches mi, mj have the characteristics presented above, then we give both of them an upvote. Doing this over all pairs of matchings, we can produce a sorted list of the matches that received the most upvotes. A mismatch placed among many correct matches would not get any votes in this way. At the same time, the method promotes clusters of correct matches, where the matches are mutually voting for each other. This suggests that the criterion for a matching to be considered good should be a fixed number of upvotes, and not simply being among the matches with the most upvotes. This value can be fairly low; we do not expect it to be very probable that many mismatches align with each other. A characteristic property of a mismatch here would be that it did not get any upvotes from any other matching.
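A corresponding sketch of the voting scheme, reusing the notation from the graph construction (the closeness threshold and the acceptance level minVotes are assumptions):

    votes = zeros(1, k);
    for i = 1:k
        for j = i+1:k
            if norm(X(:,i) - X(:,j)) < rNear && norm(Y(:,i) - Y(:,j)) < rNear
                votes(i) = votes(i) + 1;     % the two matches vouch for
                votes(j) = votes(j) + 1;     % each other: mutual upvote
            end
        end
    end
    good = find(votes >= minVotes);          % matches with enough support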

3.6 Vertex cover and voting combined

After receiving a set of matches m1, . . . , mk from SIFT, we construct the graph G = (V, E) and obtain a vertex cover S through one of the algorithms presented above. We filter out the matches corresponding to the members of S. The remaining subset of matches has the property that no matchings are inconsistent with each other. In the sense of the definition of inconsistency above, it is still fairly probable that a mismatch is not inconsistent with any other match. Therefore, from this subset we keep only the matches that have received at least some set number of upvotes in the voting scheme. This should, statistically, remove the mismatches that passed the vertex cover filtering. This intersection of the solutions from both methods is what we declare as the final result.


Figure 3: First row: SIFT returning an infeasible set of matchings. Second row: the scaled-down version and the set of matchings.

3.7 Helping SIFT along the way

At times, SIFT provides us with so few correct matches, regardless of the distance ratio parameter, that nothing can really be done with them. One such example can be seen in figure 3, where we have a total of six matchings, of which two are incorrect, so running any kind of filtering is not meaningful. Even if the two outliers were filtered out, a matching of four points could be regarded as incidental. We would like more matches, to build confidence that our alignment is correct.

A way to obtain more matchings was constructed as follows. First the pair of images was scaled down to a fraction of the initial size, normally 1/10 or 1/5. Many of the details of the images were now gone, and only very distinctive features could be captured by SIFT. The set of matches returned for an example scaled image can be seen in figure 3.

The set is still too small, but at this scale one can be reasonably confident that the matches returned are correct; experimentally, this seems to be the case. Now, each matching of points (x, y) from the lower-scale image describes a correspondence between two segments of the large-scale image, namely a window of some fixed size centered at the point x in the first image and at y in the second image, where x in the full-sized image corresponds to x in the scaled image, and similarly for y. The advantage now is that we can take the subset of SIFT feature points from image 1 that is contained in the window at x, and the subset of SIFT


feature points contained in the window at y, and match these with each other. Given the limited size of the report, it is hard to go into details, but the way SIFT works makes it possible to obtain new matches in this local matching that could not be obtained through a matching of feature points on the global scale. This is exactly what we need to extract additional matchings. Hence, for each correspondence in the scaled-down image, we perform a SIFT matching on the image segments pointed out by that correspondence. Of course, we still use the vertex cover/voting scheme combination to filter out unwanted matches.

3.8 Summary

For any reader uninvolved in the project, I imagine it can be a real challenge to understand what I have written. I will therefore summarize what is done in the course of producing the required number of correct matchings for the image alignment to be statistically significant. The following operations are performed, in order:

(i) Run the SIFT feature extraction algorithm on the images I1, I2 of corresponding samples.

(ii) Scale down both images to a fraction of their original size; let S1, S2 be the scaled-down images.

(iii) Run the SIFT feature extraction and matching algorithms on S1, S2, followed by the described filtering procedures, to obtain a set of matchings mi = (xi, yi) between the scaled-down images.

(iv) For each matching mi = (xi, yi) obtained in (iii), run the SIFT matching algorithm on the subset of the features extracted in (i) that are contained in the windows around xi and yi. Use the filtering procedures for each window.

(v) Return the set of matches that is the union of the matches from each window.

Actually, there are a couple of intermediate steps that I do not have space to fully describe. In addition to the presented procedures, I use histograms, histogram equalization and grey-scale conversions, and I exploit a few characteristics of SIFT to obtain better results. Describing in detail why and where I use these ideas would blow up the size of this report, which I feel is already long enough.

4 Results

Of the two algorithms described for obtaining a vertex cover of the graphs, the greedy one has proven to work best in practice. There are several arguments for this. One is that the graphs produced are dense: one can then assume that vertex covers of minimal size are also large, and an approximation of twice that size often amounted to the worthless solution S = V.


In figure 4 we have the filtered matchings obtained for two sample images, and the resulting translation computed from those matchings. The second image has been artificially rotated by 20 degrees. In general, any rotation and translation seems to be handled, provided that SIFT returns a reasonable set of matchings on the scaled-down images.

Figure 4: Two corresponding samples, with an artificial rotation introduced, and the effect of the rotation/translation estimation.

Out of the 88 sample pairs of images that I received, I consider enough matches to have been obtained for 77 of them. Most, if not all, of the 11 failures are images with which SIFT already has immense trouble, returning next to no correct matches.

A nice example of a drastic performance increase can be seen in figure 5, which should be compared to the SIFT matching presented in figure 3. We have gone from a horrible set of 6 matches, of which 2 are wrong, to a quite reliable set of about 50 matches that are correct. In this example one can, in a way, see what has been going on throughout the algorithm: one easily detects several distinct clusters of matchings. Clearly the scaling down resulted in the cluster areas being matched to each other, and the local match search then produced additional matchings inside those areas.

The match filtering algorithm was also tested on a couple of images with artificially introduced affine distortions. Such deformations of samples often occur in practice. The results were not completely satisfactory, but they do show some promise. The initial step of scaling down and finding smaller windows to work in seems especially appealing in the case of existing deformations.


Figure 5: A successful attempt at producing new matches.

5 Conclusions

While I do not consider the project a complete success, I am nevertheless happy with the results. Many of the sample pairs of images that I worked with could be aligned with good precision. Work with non-rigid, or deformed, samples was less successful. The idea of working with matchings on a local scale is what I would explore further, given more time.

Personally, I have realised just how many tools from different disciplines areapplied in the field of medical imaging. Concepts, theories and tools from fieldslike computer vision, mathematics, computer science and optimization are usedto tackle the many challenging problems that are encountered. Often these toolsare not enough and have to be modified in order to obtain satisfactory results.This means that the researchers, scientists and engineers working in the field ofmedical imaging are required to have a working knowledge of several, sometimesunrelated, fields.

6 Acknowledgements

I thank my project supervisor, Olof Enqvist, for the tips and help that I received at almost all stages of the project.

References

[1] Kleinberg, J., Tardos, É. Algorithm Design. Addison Wesley, 2006.

[2] David Lowe, Demo Software: SIFT keypoint detection, http://www.cs.ubc.ca/~lowe/keypoints/


FMA170

Report: Project in Image Analysis

2012-12-10

"Classification of cells"

Author: Annie Luong

Supervisor: Petter Strandmark


Table of contents

1. Background
2. Implementation
2.1 Studying the data
2.1.1 Characteristics of the different cell types
2.2 Feature extraction
2.2.1 Normalisation
2.2.2 Conversion from greyscale to binary images
2.2.3 Segmentation (bwlabel)
2.2.4 Edges (edge)
2.2.5 Gray-level co-occurrence (graycoprops)
2.2.6 Corners (corner)
2.2.7 Entropy (entropy)
2.2.8 Standard deviation (std2)
2.2.9 Mean value (mean2)
2.2.10 Properties of image regions (regionprops)
2.2.11 Other features to explore
2.3 Training
2.4 Classification/Testing
3. Results
3.1 Classification
3.2 How can the accuracy be improved?
4. Discussion
5. References


1. Background

The aim of this project has been to use the techniques we learnt in the course Bildanalys to construct a system for automatic classification of images. To achieve this, the system was trained on example data from six different classes (in this case, images of cells at different stages). During training, the system extracted features for each class, so that the different classes can be told apart from one another.

The question, in other words, has been: given an image of a cell, how can one determine which stage the cell is in?

2. Implementation

2.1 Studying the data

During this stage, the focus was on studying the images and the accompanying data:

• 721 images with corresponding masks

• a CSV file listing the 6 classes

• a CSV file with information about each cell's intensity, width and height

I realised quite quickly that it would be cumbersome to inspect all the images manually. To make it easier to study the different cell types, I wrote a script that read the image files and sorted them according to the CSV file. The result was six cell matrices, each holding the data for one type of cell. In this way I could then study the appearance of the cells, their masks, intensities, widths and heights separately, something that was necessary for the next step, in which relevant features are extracted for the cells.

2.1.1 Characteristics of the different cell types

Manual inspection of the different cells led me to the following conclusions (the cell types are hereafter referred to according to the numbering below):

1. Homogeneous (class 1)

• even (homogeneous) surface
• round/oval shape

2. Coarse Speckled (class 2)

• grainy surface
• round/oval shape

3. Nucleolar (class 3)

• the mask is round in shape
• small "islands" of cells

4. Centromere (class 4)

• grainy surface (more intense than coarse speckled)
• the mask is round in shape

5. Fine Speckled (class 5)

• resembles homogeneous, but with dark regions/shadows
• round/oval shape

6. Cytoplasma (class 6)

• irregular shape

Figure 1: The first column from the left is cell type 1, the second cell type 2, and so on.

My conclusion was that some cell types are very similar and that the masks did not really provide much information. Something else I had to take into account was that the features should be invariant with respect to:

• Rotation, since an image may be taken upside down.

• Scale, since an image may be taken from different distances.

• Translation, since a cell need not be centered in the image.

2.2 Feature extraction

2.2.1 Normalisation

Some of the images in the training data were very dark (indicated by "intensity" being set to "intermediate" in the CSV file). To be able to compare the images as fairly as possible, a conversion to greyscale (a reduction from three dimensions to two) was performed first, followed by a normalisation of all images. After this process, all images consisted of values between 0 and 255. From here on, all features are extracted from the normalised images.
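A minimal sketch of this preprocessing in Matlab (scaling with mat2gray is one way to realise the normalisation; the exact method used in the project is not stated, and the file name is illustrative):

    g  = rgb2gray(imread('cell.png'));   % three dimensions down to two
    gn = uint8(255 * mat2gray(g));       % stretch the values to the range 0-255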

Figure 2: An image of cell type 3 before (left) and after (right) normalisation.

2.2.2 Conversion from greyscale to binary images

Since the training data was limited, I was given the tip to use several different threshold values between 0 and 1 (when converting an image from greyscale to binary) and then extract features from the resulting binary images.
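A sketch of the idea, with ten illustrative threshold levels as in Figure 3 (the exact levels used in the project are not stated):

    levels = linspace(0.05, 0.95, 10);   % ten thresholds between 0 and 1
    bws = cell(1, numel(levels));
    for i = 1:numel(levels)
        bws{i} = im2bw(gn, levels(i));   % one binary image per threshold level
    end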

Figure 3: An example of a cell converted to binary images at ten different threshold levels (increasing from left to right).

2.2.3 Segmentation (bwlabel)

While studying the appearance of the cells, I realised that segmentation could be used to find out which stage a cell belongs to, owing to the "unevenness" of the different cells: a homogeneous cell type should, in other words, result in fewer segments than an inhomogeneous one.

To carry this out, I reused parts of my Matlab code from Hand-in 1 [1]. In this case I was only interested in the number of segments that a given type of cell consists of.
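The segment count itself is a single call, where bw is one of the binary images above (a sketch of the feature, not the full Hand-in 1 code):

    [~, numSegments] = bwlabel(bw);      % number of connected segments in the cell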


Figure 4: An image of cell type 2 before (left) and after (right) segmentation.

2.2.4 Edges (edge)

After the segmentation, I could make further use of the binary image (returned by the function above) by counting the number of edges found in each cell.

Figure 5: A binary image to the left, and the same image with all edges marked to the right.

2.2.5 Gray-level co-occurrence (graycoprops)

Computes properties such as contrast, correlation, energy and homogeneity of a greyscale image.

2.2.6 Corners (corner)

Computes how many corners a binary image has.

2.2.7 Entropy (entropy)

Computes the entropy of a greyscale image, that is, the randomness, which can be used to characterise the type of texture a cell has.

2.2.8 Standard deviation (std2)

Computes the standard deviation of the matrix elements.

2.2.9 Mean value (mean2)

Computes the mean of all matrix elements.


2.2.10 Properties of image regions (regionprops)

• EulerNumber: the number of objects in the region minus the number of holes in those objects.

• Extent: the ratio between the number of pixels in the region and the number of pixels in its bounding box.

• EquivDiameter: the diameter of a circle with the same area as the region.

• Eccentricity: the ratio between the distance between the foci of the ellipse that has the same second central moments as the region, and the length of its major axis.

2.2.11 Other features to explore

• Aspect ratio: every image comes with a mask holding information about the cell's height and width; the ratio between them could form a new feature.

• Shape: some cells are, for example, rounder than others (see the masks).

• Centre of mass: compute the centre of mass of all cells.

2.3 Training

Once all of the features above had been extracted, a feature vector was formed for each class/cell stage, by calling the function train for each image sequence.

2.4 Classification/Testing

For the testing, "nearest neighbour" was used as the classification method. The algorithm first computes the distance to each class and then assigns the cell to the class whose distance is the shortest.

An automated test was written in which a feature vector is randomly drawn from the training data and then removed from that data.
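A sketch of one round of this test, using the knnclassify routine named in the discussion (trainSet, groups, sample and trueClass are illustrative names: an m-by-d matrix of feature vectors, their class labels, the held-out vector, and its known class):

    predicted = knnclassify(sample, trainSet, groups, 5);   % k = 5 nearest neighbours
    correct   = (predicted == trueClass);                   % compare with the known class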

3. Results

3.1 Classification

The test was run 100 000 times, with an accuracy of 86%.


Below follow histograms over the cells that were misclassified and the classes they were assigned to:

[Six histograms, one per cell type, each showing how the misclassified cells of that type were distributed over the assigned classes, together with that type's share of the misclassifications.]

3.2 How can the accuracy be improved?

• Add more (better) features

• Reduce the variance of each feature

• Study the misclassified cells

4. Discussion

First attempt

A lot of time was spent on studying the appearance of the cells and on how best to extract features from them. Unfortunately, these features did not lead to any good classification; the accuracy was around 40%. In my first attempt I had included a histogram among the features, which made the remaining features carry considerably less weight in the classification (since the histogram took up 255 positions in the feature vector).

Another cause of the poor result was that I had not normalised each vector. Since some values were very large in relation to others, the smaller values had no noticeable influence on the result. In addition, I had missed that Matlab has a built-in classification algorithm that I could make use of.

Second attempt

After a meeting with my supervisor, I was given the tip to generate more images with different threshold values and then extract features from those. This led to a better description of the image, and the results improved considerably.

Another tip I was given was to explore Matlab's toolbox for image analysis. There I found many interesting routines that I experimented with (some of which I have used as features).

Finding good features has been a bit of a challenge in this project. As mentioned earlier, it is important that the features are invariant, which has meant that I could not make use of certain routines and properties, such as size and the like. When choosing features, I have preferred features with as small a variance as possible.

Finally, I used Matlab's built-in classification routine (knnclassify), which uses the nearest-neighbour method for the classification. Here one can set a value for k, making the algorithm take the k nearest neighbours into account. I found that k = 5 was a good value for my classifications.

5. References

[1] Centre for Mathematical Sciences, Mathematics, LTH, September 2012, "Image Analysis, Hand-in 1", http://www.maths.lth.se/media/FMA170/2012/inl1.pdf. Retrieved 2012-11-28.
