


Automatic image annotation and semantic based image retrieval for medical domain

Dumitru Dan Burdescu, Cristian Gabriel Mihai, Liana Stanescu, Marius Brezovan

University of Craiova, Faculty of Automation, Computers and Electronics, Bvd. Decebal, No.107, Craiova, Romania

Article info

Keywords: Image annotation; Image segmentation; Relevance models; Ontologies; Content based image retrieval


Abstract

Automatic image annotation is the process of assigning meaningful words to an image taking into account its content. This process is of great interest as it allows indexing, retrieving, and understanding of large collections of image data. This paper presents a system used in the medical domain for three distinct tasks: image annotation, semantic based image retrieval and content based image retrieval. An original image segmentation algorithm based on a hexagonal structure was used to perform the segmentation of medical images. Image regions are described using a vocabulary of blobs generated from image features using the K-means clustering algorithm. The annotation and semantic based retrieval tasks are evaluated for two annotation models: the Cross Media Relevance Model and the Continuous-space Relevance Model. Semantic based image retrieval is performed using the methods provided by the annotation models. The ontology used by the annotation process was created in an original manner starting from the information content provided by the Medical Subject Headings (MeSH). The experiments were made using a database of color endoscopic images related to digestive diseases.

1. Introduction

The importance of automatic image annotation has increased with the growth of digital image collections. Generally speaking, users find it difficult to represent the content of an image using image features and then to perform non-textual queries based on, for example, color and texture. They prefer textual queries instead, and automatic annotation can satisfy this need. Image annotation is a difficult task for two main reasons: the semantic gap problem (it is hard to extract semantically meaningful entities using just low-level image features) and the lack of correspondence between the keywords and image regions in the training data.

A great number of annotation models have been proposed, such as the Co-occurrence Model [6], the Translation Model [3], the Cross Media Relevance Model (CMRM) [4] and the Continuous-space Relevance Model (CRM) [5], each of them trying to improve on a previous model.

The annotation process implemented in our system is based on the CMRM model, which uses the principles defined for relevance models, and on the CRM model. Both models use a set of color annotated images of digestive system diseases. The diseases are indicated in the images by color and texture changes.


There are no public data sets available containing manually annotated images of digestive system diseases that could be used as training sets for the annotation models. We have used the set of images provided in [9]; these were manually segmented and annotated with words from the ontology. For the CMRM model the system learns the joint distribution of blobs and words. The blobs represent clusters of image regions obtained using the K-means algorithm. Given the set of blobs, each image from the test set is represented as a discrete sequence of blob identifiers. The distribution is used to generate a set of words for a new image. In the CRM model every image is divided into regions, each described by a continuous-valued feature vector. Given a training set of images with annotations, a joint probabilistic model of image features and words is computed, which allows predicting the probability of generating a word given the image regions.
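As a minimal illustration of the blob construction step, the following Python sketch builds a blob vocabulary with K-means and maps an image's regions to blob identifiers. The function names are ours, and the 500-blob vocabulary size is only an assumption borrowed from the Translation Model setup in [3]; the paper does not state the size used here.

    import numpy as np
    from sklearn.cluster import KMeans

    def build_blob_vocabulary(region_features, n_blobs=500):
        # region_features: (num_regions, num_features) array pooled over the
        # whole training set; each cluster becomes one "blob"
        return KMeans(n_clusters=n_blobs, n_init=10, random_state=0).fit(region_features)

    def image_to_blob_sequence(vocabulary, image_region_features):
        # represent one image as the discrete sequence of its regions' blob ids
        return vocabulary.predict(np.asarray(image_region_features)).tolist()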

We have used two color spaces in our system: HSV, quantized to 166 colors, for extracting color histograms, and RGB for extracting texture features using co-occurrence matrices. Each image is segmented using a segmentation algorithm [17] which integrates pixels into a grid-graph. The usage of the hexagonal structure improves the time complexity of the methods used and the quality of the segmentation results.
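The paper states only that HSV is quantized to 166 colors; the binning in the sketch below (18 hues x 3 saturations x 3 values = 162 chromatic bins plus 4 gray levels) is one common 166-bin scheme and is assumed here purely for illustration.

    import numpy as np

    def hsv_histogram_166(hsv):
        # hsv: (num_pixels, 3) array, H in [0, 360), S and V in [0, 1]
        h, s, v = hsv[:, 0], hsv[:, 1], hsv[:, 2]
        hist = np.zeros(166)
        gray = s < 0.1                                    # low saturation -> achromatic
        g_bin = np.minimum((v[gray] * 4).astype(int), 3)  # 4 gray levels
        np.add.at(hist, 162 + g_bin, 1)
        h_bin = np.minimum((h[~gray] / 20).astype(int), 17)
        s_bin = np.minimum(((s[~gray] - 0.1) / 0.9 * 3).astype(int), 2)
        v_bin = np.minimum((v[~gray] * 3).astype(int), 2)
        np.add.at(hist, h_bin * 9 + s_bin * 3 + v_bin, 1)
        return hist / max(hist.sum(), 1)                  # normalized 166-bin histogram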

An annotation model annotates an image by providing a set of words that describe the semantic content of that image. Each word is retrieved from a controlled vocabulary, since not all words properly describe the images from a specific domain.


This constraint is mainly required for medical images. For example, an image containing details about an ulcer should be annotated using specific words that are related to digestive diseases. This requirement can be satisfied by using ontologies.

In the medical domain one can use ontologies like the Open Biological and Biomedical Ontologies [16] or custom ontologies created from a source of information in a specific domain. Existing ontologies are provided in formats that are not always easy to interpret and use. Sometimes it is very difficult to get a clear overview of the existing concepts and of the relationships between them. A high flexibility is obtained when an ontology is created from scratch using a custom approach. In this way the concepts and the relationships between them are identified directly and are made available to an application for a specific scope. In the context of image annotation, the application that performs this task receives as input the list of concepts and the existing relationships. Taking into account the drawbacks associated with existing ontologies, we have decided to create for our system an ontology based on a medical source of information, MeSH. MeSH represents the reference point since it is updated periodically; our ontology is only a derived product that is re-generated when a new version of MeSH is available. An extra argument for using a custom approach was the need to identify and store the hierarchical relationships. The hierarchical structure of the ontology is presented in a graphical manner in the application created for the evaluation of the system.

The Medical Subject Headings (MeSH) [11,12] are produced by the National Library of Medicine (NLM) and contain a high number of subject headings, also known as descriptors. The MeSH thesaurus is a vocabulary used for subject indexing and searching of journal articles in MEDLINE/PubMed [13]. MeSH has a hierarchical structure [18] and contains several top level categories like Anatomy, Diseases, Health Care, etc. Relationships among concepts [14] can be represented explicitly in the thesaurus as relationships within the descriptor class. Hierarchical relationships are seen as parent–child relationships, and associative relationships are represented by the "see related" cross reference.

CMRM provides two methods for semantic based image retrieval. Given a query word, the first method can be used to rank the images using a language modeling approach; this method is very useful for ranked retrieval. The second method corresponds to query expansion: the query word(s) are used to generate a set of blob probabilities from the joint distribution of blobs and words. This vector of blob probabilities is compared with the vector of blobs of each test image using the Kullback–Leibler (KL) divergence, and the resulting KL distance is used to rank the images. CRM computes a conditional probability for each test image J, and all images in the collection are ranked according to this conditional likelihood.
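To make the KL-based comparison concrete, here is a minimal sketch (data layout and names are illustrative): a lower divergence between the query's blob distribution and an image's blob distribution means a better match.

    import numpy as np

    def kl_divergence(p, q, eps=1e-12):
        # KL(p || q) between two discrete blob distributions; eps smooths zeros
        p = np.asarray(p, dtype=float) + eps
        q = np.asarray(q, dtype=float) + eps
        p, q = p / p.sum(), q / q.sum()
        return float(np.sum(p * np.log(p / q)))

    def rank_images(query_blob_dist, image_blob_dists):
        # image_blob_dists: {image_id: blob probability vector};
        # smaller divergence = better match, so sort ascending
        return sorted(image_blob_dists,
                      key=lambda i: kl_divergence(query_blob_dist, image_blob_dists[i]))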

The remainder of the paper is organized as follows: related work is discussed in Section 2; Section 3 presents error measures that can be used to evaluate segmentation algorithms and details of the segmentation algorithm used for the annotation task; Section 4 provides details about ontologies and the Medical Subject Headings; Section 5 presents the annotation models used by the system and the evaluation of the annotation task; Section 6 provides details about semantic based image retrieval and the evaluation of this task; Section 7 describes the content based image retrieval process; Section 8 describes the modules included in the system architecture; and Section 9 concludes the paper.

2. Related work

Object recognition and image annotation are very challenging tasks. For this reason a number of models using a discrete image vocabulary have been proposed for image annotation [1–6].


One approach to automatically annotating images is to look at the probability of associating words with image regions.

The first annotation model proposed was the Co-occurrence Model [6]. This model used the co-occurrence of words with image regions created using a regular grid. The image regions from the training data were clustered into a number of region clusters, and for each training image its keywords were propagated to each region. The major drawback of this model is that it assumes that if some keywords are annotated to an image, they are propagated to each region of this image with equal probabilities. To estimate the correct probabilities this model required large numbers of training samples.

Images have been described [3] using a vocabulary of blobs. Image regions were obtained using the Normalized-cuts segmentation algorithm. For each image region, 33 features covering color, texture, position and shape information were computed. The regions were clustered using the K-means clustering algorithm into 500 clusters called "blobs". This annotation model, called the Translation Model, was a substantial improvement over the Co-occurrence Model. It used the classical IBM statistical machine translation model [7], translating from the set of blobs associated with an image to the set of keywords for that image. This model does not propagate the keywords of an image to each region with equal probability. Instead, the association probability between a textual keyword and a visual word is taken as a hidden variable and estimated by the Expectation-Maximization (EM) algorithm [36].

The annotation process was viewed [4] as analogous to the cross-lingual retrieval problem, and a Cross Media Relevance Model was used to perform both image annotation and ranked retrieval. This model finds the training images which are similar to the test image and propagates their annotations to the test image. It is assumed that regions in an image can be described using a small vocabulary of blobs, generated from image features using clustering. Based on a training set of images with annotations, probabilistic models make it possible to predict the probability of generating a word given the blobs in an image. This model can be used to automatically annotate and retrieve images given a word as a query. CMRM is much more efficient in implementation than the above mentioned parametric models because it does not have a training stage to estimate model parameters. The experimental results have shown that the performance of this model on the same dataset was considerably better than that of the models proposed in [3,6].

The Continuous-space Relevance Model (CRM) [5] was based on a formalism that models the generation of annotated images. It is assumed that every image is divided into regions, each described by a continuous-valued feature vector. Given a training set of images with annotations, a joint probabilistic model of image features and words is computed, which allows predicting the probability of generating a word given the image regions. This can be used to automatically annotate and retrieve images given a word as a query. CMRM is a discrete model and cannot take advantage of continuous features, whereas CRM models continuous features directly; this is a significant difference between the two models. Another difference is that the CRM model directly associates continuous features with words and does not require an intermediate clustering stage.

The Correlation LDA model [2] extended the Latent Dirichlet Allocation model to words and images. The dependence of the textual words on the image regions is modeled explicitly. This model is estimated using the Expectation-Maximization algorithm and assumes that a Dirichlet distribution can be used to generate a mixture of latent factors. The approach relies on a hierarchical mixture representation of keyword classes, leading to a method that is computationally efficient on complex annotation tasks.

The real-time ALIPR image search engine [8] used multi-resolution 2D Hidden Markov Models to model concepts determined by a training set. Categorized images are used to train a dictionary of hundreds of statistical models, each representing a concept. Images of any given concept are regarded as instances of a stochastic process that characterizes the concept. To measure the extent of association between an image and the textual description of a concept, the likelihood of the occurrence of the image under the characterizing stochastic process is computed; a high likelihood indicates a strong association.

Image annotation systems are also used in the medical domain. I2Cnet (Image Indexing by Content network) [10] provides services for the content-based management of images in health care. Each I2Cnet server maintains an autonomous repository of medical images and related information. The annotation service provided by I2Cnet allows specialists to interact with the contents of the repository, adding comments or illustrations to medical images of interest. The annotations can be communicated to other users by e-mail or posted to I2Cnet for inclusion in its local repositories. Annotation objects created using the annotation service form the basis of collaboration with other specialists through the discussion and exchange of ideas and opinions.

Support Vector Machine (SVM)-based approaches can also be used for annotation. A hierarchical medical image annotation system based on SVMs is presented in [19]. The system evaluates the combination of three different methods using Support Vector Machines. Global image descriptors are concatenated with an interest-point Bag-of-Words to build a feature vector. An initial annotation of the data is performed using two known methods, disregarding the hierarchy of the IRMA code. The hierarchy is then taken into consideration by classifying its instances consecutively. At the end, pairwise majority voting is applied between the methods by simply summing strings in order to produce a final annotation.

A distributed image annotation architecture called Oxalis [20] allows the annotation of an image with diagnoses and pathologies. Oxalis enables a user to display a digital image, to annotate the image with diagnoses and pathologies using a freeform drawing tool, to group images for comparison and to assign images and groups to schematic templates for clarity. Images and annotations are stored in a central database where they can be accessed by multiple users simultaneously. The design of Oxalis enables developers to modify existing system components or add new ones, such as display capabilities for a new image format, without editing or recompiling the entire system. System components can be notified when data records are created, modified, or removed, and can access the most current system data at any point. Even though Oxalis was designed for ophthalmic images, it represents a generic architecture for image annotation applications.

A new-generation medical knowledge annotation and acquisition system called SENTIENT-MD (Semantic Annotation and Inference for Medical Knowledge Discovery) is presented in [21]. The system has a semantic annotation and inference platform for precise semantic annotation of medical knowledge in natural language text. Natural language parse trees are semantically annotated and transformed into annotated semantic networks for the purpose of inferring general knowledge from the text. Natural language processing techniques are used to abstract the text into a semantically meaningful representation guided by a domain ontology.

An ontology annotation tree (OAT) browser [24] was created to facilitate the analysis of gene lists. OAT includes multiple gene identifier sets, which are merged internally in the OAT database. For this system, novel MeSH annotations were generated by mapping accession numbers to MEDLINE entries. OAT harmonizes two ontologies, the Medical Subject Headings (MeSH) and the Gene Ontology (GO), to enable users to use knowledge both from the literature and from the annotation projects in the same tool.

An annotation system called M7MeDe [25] is used for surgical videos. This system creates descriptions in the MPEG-7 standard [37] using the MeSH classification, based on a mapping of MPEG-7 structural annotation classes onto categories of MeSH descriptors. Each video segment is described either using free text or by attaching keywords from the MeSH thesaurus to a selected MPEG-7 structured annotation category like What, What Object, What Action, etc.

A new approach for improving image retrieval accuracy by integrating semantic concepts is presented in [44]. Images are represented at various abstraction levels: at the lowest level they are represented with visual features; at the upper level they are represented with a set of very specific keywords; and at subsequent levels they are represented with more general keywords. Visual content together with keywords is used to create a hierarchical index. A probabilistic classification approach is proposed which allows the grouping of similar images into the same class. The index is exploited in order to define three retrieval mechanisms: the first is text-based, the second is content-based, and the third is a combination of both. Experiments have shown that the combination can be used to reduce the semantic gap encountered by most current image retrieval systems, to reduce the retrieval time and to improve retrieval accuracy.

A new framework which tries to improve the effectiveness of CBIR by integrating semantic concepts extracted from text is presented in [32]. The model is inspired by the Vector Space Model (VSM) [31] developed for information retrieval. Each image from the collection is represented by a vector of probabilities linking it to different keywords. In addition to the semantic content of images, these probabilities capture the user's preferences at each step of the relevance feedback. Relevance feedback enables the user to iteratively refine a query via the specification of relevant and irrelevant items. By including the user in the loop, better search performance can be achieved.

3. Segmentation error measures and image segmentation

Image segmentation is a difficult and challenging task in image processing, consisting of dividing an image into distinct and homogeneous regions. Image segmentation has an essential role in image analysis, pattern recognition and low-level vision. Since multiple segmentation algorithms exist in the literature, numerical evaluations are needed to quantify the consistency between them. Error measures can be used for this quantification because they allow a principled comparison between segmentation results on different images, with differing numbers of regions, generated by different algorithms with different parameters. The consistency between segmentations must be evaluated because no unique segmentation of an image exists. If two different segmentations arise from different perceptual organizations of the scene, then it is fair to declare the segmentations inconsistent [26].

Any error measure should have the following characteristics [26]: tolerance to refinement, independence of the coarseness of pixelation, robustness to noise along region boundaries, and tolerance of different segment counts between the two segmentations due to the complexity of the images.

When multiple segmentation algorithms are evaluated, some metrics are needed to establish which algorithm produces better results.


A segmentation error measure takes two segmentations S_1 and S_2 as input and produces a real-valued output in the range [0, 1], where zero signifies no error. For a given pixel p_i, consider the two segments, one from S_1 and one from S_2, containing that pixel. If one segment is a proper subset of the other, then the pixel lies in an area of refinement and the local error should be zero. If there is no subset relationship, the two regions overlap in an inconsistent manner and the local error should be non-zero. Let \setminus denote set difference and |x| the cardinality of set x. If R(S, p_i) is the set of pixels corresponding to the region in segmentation S that contains pixel p_i, the local refinement error is defined as

E(S_1, S_2, p_i) = \frac{ |R(S_1, p_i) \setminus R(S_2, p_i)| }{ |R(S_1, p_i)| }    (1)

This local error measure is not symmetric; it encodes a measure of refinement in one direction only. Given this local refinement error in each direction at each pixel, there are two natural ways to combine the values into an error measure for the entire image.

In [26] two metrics are proposed to evaluate the consistency of a pair of segmentations, the Global Consistency Error (GCE) and the Local Consistency Error (LCE), defined as

GCE(S_1, S_2) = \frac{1}{n} \min\Big\{ \sum_i E(S_1, S_2, p_i), \; \sum_i E(S_2, S_1, p_i) \Big\}    (2)

LCE(S_1, S_2) = \frac{1}{n} \sum_i \min\{ E(S_1, S_2, p_i), E(S_2, S_1, p_i) \}    (3)

GCE forces all local refinements to be in the same direction, while LCE allows refinement in different directions.

LCE \leq GCE for any two segmentations, so GCE is clearly a tougher measure than LCE. It is shown in [3] that, as expected, when pairs of human segmentations of the same image are compared, both the GCE and the LCE are low; conversely, when random pairs of human segmentations are compared, the resulting GCE and LCE are high. If the pixel-wise minimum is replaced by a maximum, a new measure named the Bidirectional Consistency Error (BCE) is obtained, which does not tolerate refinement. This measure is evaluated using

BCE(S_1, S_2) = \frac{1}{n} \sum_i \max\{ E(S_1, S_2, p_i), E(S_2, S_1, p_i) \}    (4)

To better understand how the GCE and LCE error metrics work, it is useful to consider what the metrics report in two extreme cases:

(a) a completely under-segmented image where every pixel has the same label; the segmentation contains only one region spanning the whole image;

(b) a completely over-segmented image in which every pixel has a different label.

Fig. 1. The grid-graph constructed on the hexagonal structure of an image.

From the definitions of the GCE and LCE it can be seen that both measures evaluate to 0 in both of these extreme situations, regardless of what segmentation they are being compared to. The reason for this lies in the tolerance of these measures to refinement: any segmentation is a refinement of the completely under-segmented image, while the completely over-segmented image is a refinement of any other segmentation. The BCE error measure was introduced to avoid this situation, being non-tolerant to refinement.
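The following sketch computes the local refinement errors of Eq. (1) and the GCE, LCE and BCE of Eqs. (2)–(4) for two segmentations given as integer label maps; it is an illustration of the definitions, not the evaluation code used in [38,39].

    import numpy as np
    from collections import Counter

    def consistency_errors(s1, s2):
        # s1, s2: integer label maps of equal shape
        a, b = s1.ravel().tolist(), s2.ravel().tolist()
        n = len(a)
        size1, size2 = Counter(a), Counter(b)
        inter = Counter(zip(a, b))            # joint region-overlap counts
        # Eq. (1): |R(S1,p)\R(S2,p)| / |R(S1,p)| at every pixel, both directions
        e12 = np.array([(size1[x] - inter[(x, y)]) / size1[x] for x, y in zip(a, b)])
        e21 = np.array([(size2[y] - inter[(x, y)]) / size2[y] for x, y in zip(a, b)])
        gce = min(e12.sum(), e21.sum()) / n   # Eq. (2)
        lce = np.minimum(e12, e21).sum() / n  # Eq. (3)
        bce = np.maximum(e12, e21).sum() / n  # Eq. (4)
        return gce, lce, bce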

For image segmentation we have used an efficient segmentation algorithm [17] based on color and some geometric features of an image. Using GCE and LCE we have evaluated this algorithm in [38,39] against other well-known segmentation algorithms: the Normalized Cuts segmentation algorithm, the Efficient Graph-Based segmentation algorithm, the Mean-Shift segmentation algorithm and the color set back-projection algorithm. The experimental results have shown that this algorithm produces good segmentations.

The efficiency of this algorithm concerns two main aspects:

(a) minimizing the running time – a hexagonal structure based on the image pixels is constructed and used in color and syntactic based segmentation;

(b) using an efficient method for segmentation of color images based on spanning trees and both color and syntactic features of regions.

Fig. 1 presents the hexagonal structure used by the segmentation algorithm. A particularity of this approach is the use of the hexagonal structure instead of individual color pixels: the hexagonal structure can be represented as a grid-graph G = (V, E), where each hexagon h in the structure has a corresponding vertex v \in V, as presented in Fig. 1.
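As a minimal sketch of how hexagon adjacency on such a grid can be enumerated, the function below uses an "odd-r" offset indexing; this layout is our assumption for illustration only, since the paper does not specify how hexagons are indexed.

    def hex_neighbors(row, col, rows, cols):
        # the six neighbors of hexagon (row, col) in an odd-r offset layout,
        # where odd rows are shifted half a cell to the right (assumed layout)
        if row % 2:
            shifts = [(-1, 0), (-1, 1), (0, -1), (0, 1), (1, 0), (1, 1)]
        else:
            shifts = [(-1, -1), (-1, 0), (0, -1), (0, 1), (1, -1), (1, 0)]
        for dr, dc in shifts:
            r, c = row + dr, col + dc
            if 0 <= r < rows and 0 <= c < cols:
                yield (r, c)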

Image segmentation is realized in two distinct steps:

(a) a pre-segmentation step – only color information is used to determine an initial segmentation;

(b) a syntactic-based segmentation step – color and geometric properties of regions are used.

The segmentation process uses the following methods to obtain the list of regions (a sketch of the traversal follows the list):

– SameVertexColor – used to determine the color of a hexagon;

– ExpandColourArea – used to determine the list of hexagons having the color of the hexagon used as a starting point; it runs in O(n) time, where n is the number of hexagons in a region with the same color;

– ListRegions – used to obtain the list of regions; it runs in O(n^2) time, where n is the number of hexagons in the hexagonal network.
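A minimal sketch of an ExpandColourArea/ListRegions-style traversal over the hexagon grid-graph; the accessors color and neighbors stand in for the grid structure (for example the hex_neighbors sketch above), and the names are illustrative, not the authors' implementation.

    from collections import deque

    def expand_colour_area(start, color, neighbors, visited):
        # grow the connected set of hexagons sharing start's color (O(region size))
        region, queue = [], deque([start])
        visited.add(start)
        while queue:
            h = queue.popleft()
            region.append(h)
            for nb in neighbors(h):                       # up to 6 adjacent hexagons
                if nb not in visited and color(nb) == color(start):
                    visited.add(nb)
                    queue.append(nb)
        return region

    def list_regions(hexagons, color, neighbors):
        # collect all same-color regions of the hexagonal network
        visited, regions = set(), []
        for h in hexagons:
            if h not in visited:
                regions.append(expand_colour_area(h, color, neighbors, visited))
        return regions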

4. Ontologies and Medical Subject Headings (MeSH)

4.1. Ontologies, a general overview

The term ontology originated as a science within philosophy but has evolved over time and is now used in various domains of computer science. Ontologies enable knowledge sharing and support for external reasoning. Ontologies can be used to improve the process of information retrieval, to solve the problem of heterogeneous information sources that use different representations, and to analyze, model and implement domain knowledge. A taxonomy represents a classification of the data in a domain. An ontology differs from a taxonomy in two important


respects: it has a richer internal structure, as it includes relations and constraints between the concepts, and it claims to represent a certain consensus about the knowledge in the domain. This consensus is among the intended users of the knowledge, for example doctors using a hospital ontology regarding a certain disease. Computational ontologies are a means to formally model the structure of a system, i.e., the relevant entities and relations that emerge from its observation [40]. The ontology engineer analyzes the relevant entities and organizes them into concepts and relations, represented by unary and binary predicates, respectively. The backbone of an ontology consists of a generalization/specialization hierarchy of concepts, a taxonomy. Ontologies can be very useful in improving the semantic information retrieval process by allowing an abstraction and an explicit representation of the information. Ontologies can possess inference functions, allowing more intelligent retrieval. According to their level of generality, ontologies can be categorized into top-level ontologies, domain and task ontologies, and application ontologies. Top-level ontologies describe very general concepts, independent of a particular problem or domain. Domain ontologies describe the vocabulary related to a generic domain. Task ontologies describe a generic task or activity, such as diagnosing or advertising. Domain and task ontologies inherit and specialize the terms introduced in the top-level ontology. Application ontologies describe concepts depending on both a particular domain and a task.

An ontology represents an explicit and formal specification of a conceptualization [35], containing a finite list of relevant terms and the relationships between them. A "conceptualization" is an abstract model of a phenomenon, created by identifying the relevant concepts of the phenomenon. The concepts, the relations between them and the constraints on their use are explicitly defined. "Formal" means that the ontology is machine-readable and excludes the use of natural language. In medical domains, the concepts are diseases and symptoms, the relations between them are causal, and a constraint is that a disease cannot cause itself. A "shared conceptualization" means that ontologies aim to represent consensual knowledge intended for the use of a group.

In [41] an ontology is a formal explicit description of concepts in a domain of discourse (classes, sometimes called concepts), properties of each concept describing various features and attributes of the concept (slots, sometimes called roles or properties), and restrictions on slots (facets, sometimes called role restrictions). Classes are the focus of most ontologies: they describe concepts in the domain, while slots describe properties of classes and instances. From a practical point of view, the development of an ontology includes defining classes in the ontology, arranging the classes in a taxonomic (subclass–superclass) hierarchy, defining slots and describing allowed values for these slots, and filling in the slot values for instances. For the ontology design process applied in our system we have taken into account three fundamental rules:

(a) there is no single correct way to model a domain; there are always viable alternatives;

(b) ontology development is necessarily an iterative process;

(c) concepts in the ontology should be close to objects (physical or logical) and relationships in the domain of interest.

The general process of iterative design used to obtain the ontology for our system contains several steps:

(a) Determining the domain and the scope of the ontology – to define the domain and the scope, a response should be given to the following questions: what is the domain covered by the ontology? For what purpose will the ontology be used? In our case the domain is the medical domain and the ontology is used for the annotation process.

(b) Reusing existing ontologies – it is a good approach to consider what someone else has done, to check whether something can be refined, and whether existing sources for our particular domain and task can be extended. Reusing existing ontologies can be a requirement if the system needs to interact with other applications that have already committed to particular ontologies or controlled vocabularies. Existing ontologies like the Open Biological and Biomedical Ontologies can have formats that are not always easy to interpret. For this reason we have decided to create a custom ontology.

(c) Enumerating important terms in the ontology – it is useful to write down a list of all terms we would like either to make statements about or to explain to a user. What are the terms we would like to talk about? What properties do those terms have? What would we like to say about those terms? The descriptors provided by MeSH represent the terms that should be taken into account.

(d) Defining the classes and the class hierarchy – there are several possible approaches to developing a class hierarchy [42]:
– A top-down development process starts with the definition of the most general concepts in the domain and the subsequent specialization of the concepts.
– A bottom-up development process starts with the definition of the most specific classes, the leaves of the hierarchy, with subsequent grouping of these classes into more general concepts.
– A combination development process combines the top-down and bottom-up approaches.
We have used a top-down development process for our ontology. The following classes were identified: concept, hierarchical, child, parent.

(e) Defining the properties of classes (slots) – once we have defined some of the classes, we must describe the internal structure of concepts. For example, the fields associated with a descriptor will be used to define the properties of the concept class.

(f) Defining the facets of the slots – slots can have different facets describing the value type, allowed values, the number of values (cardinality), and other features of the values the slot can take.

(g) Creating instances – the last step is creating individual instances of classes in the hierarchy. Defining an individual instance of a class requires choosing a class, creating an individual instance of that class, and filling in the slot values. Each descriptor will be represented as an instance of the concept class, and each hierarchical relationship existing between any two descriptors will be represented as an instance of the hierarchical class.

4.2. MeSH description

Medical Subject Headings (MeSH) is a comprehensive controlled vocabulary for the purpose of indexing journal articles and books in the life sciences; it can also serve as a thesaurus that facilitates searching. Created and updated by the United States National Library of Medicine (NLM), it is used by the MEDLINE/PubMed article database and by NLM's catalog of book holdings. In MEDLINE/PubMed, every journal article is indexed with some 10–15 headings or subheadings, with one or two of them designated as major and marked with an asterisk. When performing a MEDLINE search via PubMed, entry terms are automatically translated into the corresponding descriptors. The Medical Subject Headings staff continually revise and update the MeSH vocabulary; staff subject specialists are responsible for areas of the health sciences in which they have knowledge and expertise.


MeSH's structure contains a high number of subject headings, also known as descriptors. Most of these are accompanied by a short description or definition, links to related descriptors, and a list of synonyms or very similar terms known as entry terms. Because of these synonym lists, MeSH can also be viewed as a thesaurus. The descriptors or subject headings are arranged in a hierarchy, and a given descriptor may appear at several places in the hierarchical tree. The tree numbers indicate the places within the MeSH hierarchies, also known as the Tree Structures, in which the heading appears; thus, the numbers are the formal computable representation of the hierarchical relationships. The tree locations carry systematic labels known as tree numbers, and one descriptor may have several tree numbers. For example, the descriptor "Digestive System Neoplasms" has the tree numbers C06.301 and C04.588.274. The tree numbers of a given descriptor are subject to change as MeSH is updated. Every descriptor also carries a unique alphanumerical ID called DescriptorUI that never changes.

Two important relationship types are defined for MeSH content: hierarchical relationships and associative relationships [18]. Hierarchical relationships are fundamental components of a thesaurus, and MeSH has long formalized its hierarchical structure in an extensive tree structure, currently at nine levels, representing increasing levels of specificity. This structure enables browsing for the appropriately specific descriptor. Many examples of hierarchical relations are instances of the part/whole and class/subclass relationships, which are relatively well understood. Since its hierarchical relationships are between descriptors, a MeSH descriptor can have different children in different trees. Hierarchical relationships in the MeSH thesaurus are at the level of the descriptor and are seen as parent–child relationships. Associative relationships are used to point out the existence of other descriptors in the thesaurus which may be more appropriate for a particular purpose. They may point out distinctions made in the thesaurus or in the way the thesaurus has arranged descriptors hierarchically. Many associative relationships are represented by the "see related" cross reference; these categories of relationships are greater in number and certainly more varied than the hierarchical relationships. One attribute which can be thought of as an associative relationship within the MeSH thesaurus is the Pharmacologic Action; limited to chemicals, this relationship allows the aggregation of chemicals by actions or uses. MeSH content can be obtained from [19] and is offered as an xml file named desc2010.xml (2010 version) containing the descriptors and a txt file named mtrees2010.txt containing the hierarchical structure. Fig. 2 presents a reduced representation of two descriptors, the useful information

Fig. 2. Reduced representation of two MeSH descriptors.


being contained in the DescriptorUI, DescriptorName, TreeNumber

xml nodes:In Fig. 3 it is presented a sample from the mtrees2010.txt file:The hierarchical structure of each category can be established

based on the tree number. For example Colitis having theassociated tree number C06.405.20.265 has the following descen-dants: Colitis, Ischemic (C06.405.205.265.115), Colitis, Microscopic

(C06.405.205.265.173), Colitis, Ulcerative (C06.405.205.265.231),etc. This observation will be taken into account when establishingthe hierarchical relationships between concepts.

4.3. Mapping MeSH content to the ontology and graphical representation

The mapping of MeSH content to the ontology is made in an original manner, in several steps and using the options presented in Fig. 4:

1. Filtering the content of the desc2010.xml file – not all the information contained in this file is needed for our ontology, so we have filtered it, keeping only the xml nodes considered useful: DescriptorRecordSet, DescriptorRecord, DescriptorUI, DescriptorName, String, TreeNumberList, TreeNumber, ConceptList, Concept, TermList, Term.

2. Analyzing the content of the mtrees2010.txt file – this file is processed line by line in order to detect the hierarchical relationships.

3. Analyzing the content of the filtered xml file – the information associated with each descriptor is extracted. At the end of this step the list of descriptors and the list of related (associative) relationships are obtained. Each descriptor will be represented as a concept in the ontology, and each relationship (hierarchical, associative) will be represented as a relationship of the corresponding type.
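As an illustration of step 2, the sketch below derives parent–child pairs from mtrees2010.txt, assuming the usual "name;tree number" line format shown in Fig. 3; a node's parent is obtained by dropping the last segment of its tree number (e.g. C06.405.205.265.231 -> C06.405.205.265). The function name and return layout are ours.

    def parse_mtrees(path):
        # returns {tree_number: name} and {parent_tree_number: [child_tree_numbers]}
        name_of, children = {}, {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                if ";" not in line:
                    continue
                name, tree_number = line.strip().rsplit(";", 1)
                name_of[tree_number] = name
                parent = tree_number.rsplit(".", 1)[0]   # drop the last segment
                if parent != tree_number:                # top-level nodes have no parent
                    children.setdefault(parent, []).append(tree_number)
        return name_of, children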

The ontology is represented as a topic map [33] using the XTM syntax [34]. In a topic map a concept is represented by a topic and a relationship is represented as an association. Table 1 presents the mapping of two descriptors (the parent with D003092 as DescriptorUI and C06.405.205.265, C06.405.469.158.188 as tree numbers; the child with D003093 as DescriptorUI and C06.405.205.265.231, C06.405.205.731.249, C06.405.469.158.188.231, C06.405.469.432.249 as tree numbers) to two topic items, and the mapping of the hierarchical relationship between them to an association.

Fig. 3. Sample from the mtrees2010.txt file.


Fig. 4. Importing MeSH’s content.

Table 1. An example of descriptors and their hierarchical relationship mapping.

Descriptors (MeSH XML):

    <DescriptorRecord DescriptorClass="1">
      <DescriptorUI>D003092</DescriptorUI>
      <DescriptorName><String>Colitis</String></DescriptorName>
      <TreeNumberList>
        <TreeNumber>C06.405.205.265</TreeNumber>
        <TreeNumber>C06.405.469.158.188</TreeNumber>
      </TreeNumberList>
    </DescriptorRecord>

    <DescriptorRecord DescriptorClass="1">
      <DescriptorUI>D003093</DescriptorUI>
      <DescriptorName><String>Colitis, Ulcerative</String></DescriptorName>
      <TreeNumberList>
        <TreeNumber>C06.405.205.265.231</TreeNumber>
        <TreeNumber>C06.405.205.731.249</TreeNumber>
        <TreeNumber>C06.405.469.158.188.231</TreeNumber>
        <TreeNumber>C06.405.469.432.249</TreeNumber>
      </TreeNumberList>
    </DescriptorRecord>

Topics (XTM):

    <topic id="D003092">
      <instanceOf><topicRef xlink:href="#concept"/></instanceOf>
      <baseName><baseNameString>Colitis</baseNameString></baseName>
    </topic>

    <topic id="D003093">
      <instanceOf><topicRef xlink:href="#concept"/></instanceOf>
      <baseName><baseNameString>Colitis, Ulcerative</baseNameString></baseName>
    </topic>

Association (XTM):

    <association id="D003092-D003093">
      <instanceOf><topicRef xlink:href="#hierarchical"/></instanceOf>
      <member>
        <roleSpec><topicRef xlink:href="#parent"/></roleSpec>
        <topicRef xlink:href="#D003092"/>
      </member>
      <member>
        <roleSpec><topicRef xlink:href="#child"/></roleSpec>
        <topicRef xlink:href="#D003093"/>
      </member>
    </association>


The topic map contains:

– Topics – each descriptor is mapped to a topic item having as unique identifier the content of the DescriptorUI xml node. The base name of the topic is retrieved from the DescriptorName xml node.

– Associations defined between topics – our ontology contains two types of associations:
  1. hierarchical – generated using the hierarchical structure of the MeSH trees and the tree identifiers defined for each concept (used to identify the concepts implied in the association);
  2. related-to – a descriptor can be related to other descriptors. This information is mentioned in the descriptor content by a list of DescriptorUI values. In practice a disease can be caused by other diseases.

After the ontology is generated and represented as a topic map, it can be explored using the TMNav software [43]. TMNav is a Java application for browsing topic maps. The application can connect to and browse topic maps from any of the supported TM4J backends. The navigation is presented in both a standard Swing GUI and a dynamic graph GUI using the TouchGraph library. The goal of the TMNav project is to create a framework for developing topic map browsing and editing applications, together with a reference implementation of a topic map editor. TMNav provides three views:

– a List view containing a dropdown list that can be used for filtering; the user can select the option Topics, after which the list containing all topics is shown; other possible options are Associations, Topic Types, Association Types and Member Types. This view can be seen in the left part of Fig. 5.

– a Tree view containing details about a selected topic or association item. This view can be seen in the right part of Fig. 5.

– a Graph view containing details about a selected topic or association item. This view can be seen in Fig. 6.

An ontology can be very large, so software with browsing capabilities is needed. A graphical representation of an ontology offers an intuitive solution to a user who is not familiar with the details of the syntax used to represent the concepts and the relationships within the ontology. Topic maps are very powerful in their ability to organize information, but they may be very large. Intuitive visual user interfaces may significantly reduce users' cognitive load when working with these complex structures. Visualization is a promising technique both for enhancing users' perception of structure in large information spaces and for providing navigation facilities. It also enables people to use natural tools of observation and processing, their eyes as well as their brains, to extract knowledge more efficiently and to find insights.

Fig. 5. List view and tree view provided by TMNav.

Fig. 6. Graph view provided by TMNav.

5. The annotation process

5.1. Annotation models

5.1.1. CMRM model

The Cross Media Relevance Model is a non-parametric model for image annotation: it assigns words to the entire image and


not to specific blobs. A test image I is annotated by estimating the joint probability of a keyword w and a set of blobs:

P(w, b_1, \ldots, b_m) = \sum_{J \in T} P(J) \, P(w, b_1, \ldots, b_m \mid J)    (5)

For the annotation process the following assumptions are made:

(a) we are given a collection C of un-annotated images;

(b) each image I from C can be represented by a discrete set of blobs: I = {b_1, \ldots, b_m};

(c) there exists a training collection T of annotated images, where each image J from T has a dual representation in terms of both words and blobs: J = {b_1, \ldots, b_m; w_1, \ldots, w_n};

(d) P(J) is kept uniform over all images in T;

(e) the number of blobs and words in each image (m and n) may differ from image to image;

(f) no underlying one-to-one correspondence is assumed between the set of blobs and the set of words; it is only assumed that the set of blobs is related to the set of words.

P(w, b_1, \ldots, b_m \mid J) denotes the joint probability of keyword w and the set of blobs (b_1, \ldots, b_m) conditioned on the training image J. In CMRM it is assumed that, given image J, the events of observing a particular keyword w and any of the blobs (b_1, \ldots, b_m) are mutually independent. This means that P(w, b_1, \ldots, b_m \mid J) can be written as

P(w, b_1, \ldots, b_m \mid J) = P(w \mid J) \prod_{i=1}^{m} P(b_i \mid J)    (6)

P(w \mid J) = (1 - \alpha_J) \frac{\#(w, J)}{|J|} + \alpha_J \frac{\#(w, T)}{|T|}    (7)

P(b \mid J) = (1 - \beta_J) \frac{\#(b, J)}{|J|} + \beta_J \frac{\#(b, T)}{|T|}    (8)

P(J) = \frac{1}{|T|}    (9)

where

(a) P(w \mid J) and P(b \mid J) denote the probabilities of selecting the word w and the blob b from the model of the image J;

(b) \#(w, J) denotes the actual number of times the word w occurs in the caption of image J;

(c) \#(w, T) is the total number of times w occurs in all captions in the training set T;

(d) \#(b, J) reflects the actual number of times some region of the image J is labeled with blob b;

(e) \#(b, T) is the cumulative number of occurrences of blob b in the training set;

(f) |J| stands for the count of all words and blobs occurring in image J;

(g) |T| denotes the total size of the training set;

(h) the prior probabilities P(J) are kept uniform over all images in T and are estimated with Eq. (9). The smoothing parameters were set to \alpha = 0.1 and \beta = 0.9.
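Putting Eqs. (5)–(9) together, a minimal sketch of the CMRM scoring loop follows. The data layout and names are illustrative (training is a list of (words, blobs) pairs); the smoothing parameters are those stated above, and the prior P(J) is taken uniform over the training images as in item (h).

    from collections import Counter

    def cmrm_scores(test_blobs, training, vocabulary, alpha=0.1, beta=0.9):
        word_bg = Counter(w for ws, _ in training for w in ws)   # #(w, T)
        blob_bg = Counter(b for _, bs in training for b in bs)   # #(b, T)
        t_size = sum(len(ws) + len(bs) for ws, bs in training)   # |T|
        scores = {}
        for w in vocabulary:
            total = 0.0
            for words, blobs in training:
                j_size = len(words) + len(blobs)                 # |J|
                p_w = ((1 - alpha) * words.count(w) / j_size
                       + alpha * word_bg[w] / t_size)            # Eq. (7)
                p_b = 1.0
                for b in test_blobs:                             # Eq. (6)
                    p_b *= ((1 - beta) * blobs.count(b) / j_size
                            + beta * blob_bg[b] / t_size)        # Eq. (8)
                total += p_w * p_b / len(training)               # Eqs. (5), (9)
            scores[w] = total
        return scores  # rank the vocabulary by score to annotate the image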

5.1.2. CRM model

CRM is based on a statistical formalism that makes it possible to model a relationship between the contents of a given image and the annotation of that image. The approach learns a joint probability distribution over the regions of an image and the words in its annotation. Suppose T is the training set of annotated images, and let J be an element of T. J is represented as a set of image regions r_J = \{r_1, \ldots, r_n\} along with the corresponding annotation w_J = \{w_1, \ldots, w_m\}. It is assumed that the process that generated J is based on three distinct probability distributions. The words are a random sample from some underlying multinomial distribution P_V(\cdot \mid J); the regions r_J are produced from a corresponding set of generator vectors g_1, \ldots, g_n according to a process P_R(r_i \mid g_i) which is independent of J; finally, the generator vectors are themselves a random sample from some underlying multi-variate density function P_G(\cdot \mid J). The joint probability of observing an image defined by the regions r_A together with the annotation words w_B, P(r_A, w_B), is defined as

P(r_A, w_B) = \sum_{J \in T} P_T(J) \prod_{b=1}^{n_B} P_V(w_b \mid J) \prod_{a=1}^{n_A} \int_{\mathbb{R}^k} P_R(r_a \mid g_a) \, P_G(g_a \mid J) \, dg_a    (10)

where n_A represents the number of image regions and n_B represents the number of words.

5.2. Evaluation of the annotation task

Evaluation measures [23] are used to evaluate the annotation performance of an algorithm. Let T' represent a test set, J \in T' a test image, W_J its manual annotation set and W_J^a its automatic annotation set. The performance can be analyzed from two perspectives:

1. Annotation perspective: Two standard measures used for analyzing the performance from the annotation perspective are:

(a) Accuracy: The accuracy of the auto-annotated test images is measured as the percentage of correctly annotated words and, for a given test image J \in T', is defined as

accuracy = \frac{r}{|W_J|}    (11)

where r represents the number of correctly predicted words in J. The disadvantage of this measure is that it does not account for the number of wrongly predicted words with respect to the vocabulary size |W|.

(b) Normalized score (NS): It extends accuracy directly and penalizes wrong predictions. This measure is defined as

NS = \frac{r}{|W_J|} - \frac{r'}{|W| - |W_J|}    (12)

where r' denotes the number of wrongly predicted words in J.

2. Retrieval perspective: Retrieval performance measures can be used to evaluate the annotation quality. Auto-annotated test images are retrieved using keywords from the vocabulary, and the relevance of the retrieved images is verified by evaluating it against the manual annotations of the images. Precision and recall values are computed for every word in the test set. Precision is the percentage of retrieved images that are relevant; recall is the percentage of relevant images that are retrieved. For a given query word w_q, precision and recall are defined as

precision(w_q) = \frac{ |\{ J \in T' : w_q \in W_J^a \wedge w_q \in W_J \}| }{ |\{ J \in T' : w_q \in W_J^a \}| }    (13)

Table 2. Accuracy values obtained for both models.

Nr.  Diagnostic           CMRM_Accuracy  CRM_Accuracy
0    Duodenal ulcer       0.5            0.75
1    Ulcerative colitis   0.75           0.75
2    Gastric cancer       0.33           0.66
3    Gastric lymphoma     0.33           0.66
4    Stomach polyps       0.5            0.75
5    Colon cancer         0.25           0.75
6    Rectal cancer        0.33           0.33
7    Duodenal cancer      0.5            0.75
8    Stomach ulcer        0.33           0.33
9    Esophagus cancer     0.5            0.5


recall(w_q) = \frac{ |\{ J \in T' : w_q \in W_J^a \wedge w_q \in W_J \}| }{ |\{ J \in T' : w_q \in W_J \}| }    (14)

It can also be useful to measure the number of single-concept queries for which at least one relevant image can be retrieved using the automatic annotations. This metric complements average precision and recall by indicating how wide the range of concepts contributing to the average precision and recall is. It is defined as

|\{ w_q : precision(w_q) > 0 \wedge recall(w_q) > 0 \}|    (15)
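A direct transcription of Eqs. (11)–(15) in Python, for reference; auto and manual map each test image to its automatic and manual annotation sets (an illustrative layout, not the authors' code).

    def accuracy(predicted, manual):
        # Eq. (11): fraction of the manual annotation that was predicted
        r = len(set(predicted) & set(manual))
        return r / len(manual)

    def normalized_score(predicted, manual, vocab_size):
        # Eq. (12): accuracy penalized by wrong predictions
        r = len(set(predicted) & set(manual))
        r_wrong = len(set(predicted) - set(manual))
        return r / len(manual) - r_wrong / (vocab_size - len(manual))

    def precision_recall(word, auto, manual):
        # Eqs. (13) and (14)
        retrieved = {j for j in auto if word in auto[j]}
        relevant = {j for j in manual if word in manual[j]}
        hits = len(retrieved & relevant)
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        return precision, recall

    def concepts_with_hits(vocabulary, auto, manual):
        # Eq. (15): number of words with non-zero precision and recall
        return sum(1 for w in vocabulary
                   if min(precision_recall(w, auto, manual)) > 0)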


Annotation perspective: In order to evaluate the annotation taskwe have used a testing set of 400 images that were manuallyannotated and not included in the training set used for CMRMand CRM models. This set was segmented using the segmenta-tion algorithm described above. The two annotation modelswere applied and a list of concepts having the joint probabilitygreater than a threshold value was assigned to each image. Thenumber of relevant concepts automatically assigned wascompared against the number of concepts manually assigned.Using this approach it was computed an accuracy value forboth models. CMRM_Accuracy column contains the valuesobtained using CMRM model and CRM_Accuracy contains thevalues obtained using CRM model. The average accuracy valueobtained for the CMRM model was 0.46 and the averageaccuracy value obtained for the m CRM model was 0.54. Itcan be observed that from the accuracy point of view CRMmodel produces better results. Some results are presented inTable 2.

7 Duodenal cancer 0.5 0.75

8 Stomach ulcer 0.33 0.33

9 Esophagus cancer 0.5 0.5

Retrieval perspective: The precision and recall charts are pre-sented in Figs. 7 and 8. It was used the following convention todistinguish between the two models: the values correspondingto the concepts ending with underline (e.g. esophagitis_)belong to the CRM model and the other values belong to theCMRM model.

After computing the precision and recall values for all concepts(not all concepts were shown in the charts due to space limita-tion) it was computed a mean precision equal with 0.38 forCMRM and 0.56 for CRM. A mean recall was equal with 0.47 forCMRM and 0.66 for CRM. It can be observed that the valuescorresponding to the CRM model are always greater than thevalues of the CMRM model. Based on the experimental results itcan be concluded that the CRM model produces better annotationresults.
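The evaluation measures above are straightforward to implement. The sketch below computes the accuracy, normalized score and per-word precision and recall of Eqs. (11)-(14); the representation of annotations as word sets is an assumption of the sketch.

```python
def accuracy(manual, auto):
    """Eq. (11): fraction of the manual words that were predicted."""
    return len(manual & auto) / len(manual)

def normalized_score(manual, auto, vocab_size):
    """Eq. (12): accuracy minus a penalty for wrongly predicted words."""
    right = len(manual & auto)
    wrong = len(auto - manual)
    return right / len(manual) - wrong / (vocab_size - len(manual))

def precision_recall(word, test_set):
    """Eqs. (13)-(14) for one query word. `test_set` is a list of
    (manual_set, auto_set) pairs, one per test image (assumed layout)."""
    retrieved = [m for m, a in test_set if word in a]
    relevant_retrieved = sum(1 for m in retrieved if word in m)
    relevant_total = sum(1 for m, _ in test_set if word in m)
    precision = relevant_retrieved / len(retrieved) if retrieved else 0.0
    recall = relevant_retrieved / relevant_total if relevant_total else 0.0
    return precision, recall
```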

6. Semantic based image retrieval

6.1. Methods for semantic based image retrieval

The task of semantic image retrieval in this context is similar to the general ad hoc retrieval problem. Given a text query Q = w_1 ... w_k and a collection C of images, the goal is to retrieve the images that contain objects described by the keywords w_1 ... w_k or, more generally, to rank the images I by the likelihood that they are relevant to the query. Text retrieval systems cannot be used because the images I ∈ C are assumed to have no caption.

The Cross Media Relevance Model provides two methods for semantic based image retrieval:

(a) Probabilistic Annotation-based Cross-Media Relevance Model (PACMRM): given a query Q = w_1 ... w_k and the image I = {b_1, ..., b_m}, the probability of drawing Q from the model of I is defined as

$$P(Q \mid I) = \prod_{j=1}^{k} P(w_j \mid I) \qquad (16)$$

where P(w_j | I) is computed using Eqs. (5)–(8).

(b) Direct-Retrieval Cross-Media Relevance Model (DRCMRM): given a query Q = w_1 ... w_k and the image I = {b_1, ..., b_m}, it is assumed that there exists an underlying relevance model P(· | Q) such that the query itself is a random sample from that model. It is also assumed that images relevant to Q are random samples from P(· | Q). The query is converted into the language of blobs, and the probability of observing a given blob b from the query model can be expressed in terms of the joint probability of observing b from the same distribution as the query words w_1 ... w_k:

$$P(b \mid Q) \approx P(b \mid w_1 \ldots w_k) = \frac{P(b, w_1 \ldots w_k)}{P(w_1 \ldots w_k)} \qquad (17)$$




$$P(b, w_1, \ldots, w_k) = \sum_{J \in T} P(J)\, P(b \mid J) \prod_{i=1}^{k} P(w_i \mid J) \qquad (18)$$

Based on this approach, images are ranked according to the negative Kullback–Leibler divergence between the query model P(· | Q) and the image model P(· | I):

$$-KL(Q \,\|\, I) = \sum_{b \in B} P(b \mid Q) \log \frac{P(b \mid I)}{P(b \mid Q)} \qquad (19)$$

where P(b | Q) is estimated using Eqs. (17) and (18) and P(b | I) is estimated using Eq. (8).
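To make the direct-retrieval method concrete, the following sketch chains Eqs. (17)-(19): it builds the query model P(b|Q) from the training set and ranks images by negative KL divergence. The data structures and the uniform prior P(J) are illustrative assumptions.

```python
import numpy as np

def drcmrm_rank(query_words, image_models, training_set):
    """Sketch of DRCMRM ranking via Eqs. (17)-(19), assuming:
    - training_set: list of (blob_probs, word_probs) dicts per training
      image J, giving the smoothed P(b|J) and P(w|J) of Eqs. (5)-(8);
    - image_models: dict image_id -> {blob: P(b|I)}.
    Both layouts and the uniform prior P(J) are assumptions."""
    joint = {}   # accumulates P(b, w_1...w_k) of Eq. (18), up to 1/|T|
    norm = 0.0   # accumulates P(w_1...w_k) for the ratio in Eq. (17)
    for blob_probs, word_probs in training_set:
        word_term = np.prod([word_probs.get(w, 1e-12) for w in query_words])
        norm += word_term
        for blob, p in blob_probs.items():
            joint[blob] = joint.get(blob, 0.0) + p * word_term
    query_model = {b: v / norm for b, v in joint.items()}  # Eq. (17)

    def neg_kl(img_model):
        # Eq. (19): negative KL divergence between P(.|Q) and P(.|I).
        return sum(q * np.log(img_model.get(b, 1e-12) / q)
                   for b, q in query_model.items() if q > 0)

    return sorted(image_models, key=lambda i: neg_kl(image_models[i]),
                  reverse=True)
```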

For the CRM model, given a text query w_qry and a testing collection of un-annotated images, Eq. (10) is used to obtain, for each testing image J, the conditional probability P(w_qry | r_J). All images in the collection are ranked according to this conditional likelihood. An image is considered relevant to a given query if its manual annotation contains all of the query words.


6.2. Evaluation of the semantic based image retrieval task

The evaluation was made with queries containing one, two or three words in order to maintain good precision.

It can be observed from Table 3 that the average precision computed for the CRM model is greater than the values obtained for the CMRM model. It can be concluded that the CRM model produces better results for the semantic based image retrieval task. Table 4 lists some of the queries for which relevant images were retrieved.

Table 3
Different query sets and the relative performance of the retrieval methods in terms of average precision.

Query length                1 word   2 words   3 words
Number of queries           223      447       187
Relevant images             1523     1211      740
Average precision (PACMRM)  0.176    0.165     0.193
Average precision (DRCMRM)  0.192    0.174     0.212
Average precision (CRM)     0.241    0.252     0.321

Table 4
Relevant images retrieved for the queries: Esophagitis, Polyps, Rectocolitis, Ulcer, Ulcerated tumor, Esophagus cancer, Ulcerative colitis.

7. Content based image retrieval

7.1. General overview

The objective of the content-based visual query is to search for and retrieve, in an efficient manner, those images from the database that are most appropriate to the image considered by the user as the query. The content-based visual query differs from the usual query by the fact that it implies similarity search. Visual elements such as color, texture and shape, which directly describe the visual content, as well as high-level concepts (for example the significance of the objects), are used for retrieving images with similar content from the database. The usage of color and texture features in content-based image querying leads to better results for some diseases: there are diseases that are characterized by a change in the color and texture of the affected tissue, for example ulcer, colitis, esophagitis, polyps and ulcerous tumor.

In content-based visual querying on the color feature (color being the visual feature immediately perceived in an image), the color space used and the level of quantization, meaning the maximum number of colors, are of great importance. Color histograms represent the traditional method of describing the color properties of images. They have the advantage of easy computation and, up to a certain point, are insensitive to camera rotation, zooming and changes in image resolution [30]. The solution chosen was to represent the color information extracted from images using the HSV color space quantized to 166 colors. It has been shown that the HSV color system has the following properties



[29]: it is close to the human perception of colors, it is intuitive, and it is invariant to illumination intensity and camera direction.

The operation of color quantization is needed in order to reduce the number of colors used in the content-based visual query from millions to tens. The chosen solution was proposed by Smith, namely the quantization of the HSV space to 166 colors [28]. Because the hues represent the most important color feature, a more refined quantization is necessary for them. On the circle that represents hue, the primary colors red, green and blue are separated by 120°. A circular quantization with a 20-degree step sufficiently separates the colors, such that the primary colors and the yellow, magenta and cyan colors are each represented by three subdivisions. The saturation and the value are each quantized to three levels. This quantization produces 18 hues, three saturations, three values and four greys, in total 166 distinct colors in the HSV color space. Experimental studies made both on natural and medical images have shown that choosing the HSV color space quantized to 166 colors is one of the best choices for a content-based visual query process of good quality [27]. The 166-color histogram is used in the content-based visual query process.

Together with color, texture is a powerful characteristic of an image, present in both natural and medical images, where a disease can be indicated by changes in the color and texture of a tissue. There are many techniques used for texture extraction, but no single method can be considered the most appropriate, this depending on the application and the type of images taken into account [30]. One of the most representative methods for texture detection is the method based on co-occurrence matrices. For an image f(x, y), the co-occurrence matrix h_{d,φ}(i, j) is defined so that each entry (i, j) is equal to the number of pairs for which f(x_1, y_1) = i and f(x_2, y_2) = j, where (x_2, y_2) = (x_1, y_1) + (d cos φ, d sin φ). In the case of color images, one matrix is computed for each of the three channels (R, G, B). This leads to three quadratic matrices of dimension equal to the number of color levels present in an image (256 in our case) for each distance d and orientation φ. The classification of texture is based on the characteristics extracted from the co-occurrence matrix: energy, entropy, maximum probability, contrast, inverse difference moment and correlation. The three vectors of texture characteristics extracted from the three co-occurrence matrices are created using the six characteristics computed for d = 1 and φ = 0. This results in 18 values, used next in the content-based visual query.
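As an illustration of this texture description, the sketch below computes one co-occurrence matrix (d = 1, φ = 0) for a single channel and derives the six characteristics from it; the exact feature formulas follow common Haralick-style definitions and should be read as assumptions rather than as our system's precise implementation.

```python
import numpy as np

def cooccurrence_features(channel, levels=256):
    """Six texture characteristics for one color channel, using d = 1 and
    phi = 0 (horizontal neighbor). `channel` is a 2-D uint8 array."""
    m = np.zeros((levels, levels), dtype=np.float64)
    # Count neighbor pairs: (x2, y2) = (x1 + 1, y1).
    np.add.at(m, (channel[:, :-1].ravel(), channel[:, 1:].ravel()), 1)
    p = m / m.sum()                      # normalize counts to probabilities
    i, j = np.indices(p.shape)
    nz = p > 0
    energy = np.sum(p ** 2)
    entropy = -np.sum(p[nz] * np.log2(p[nz]))
    max_prob = p.max()
    contrast = np.sum((i - j) ** 2 * p)
    inv_diff = np.sum(p / (1.0 + (i - j) ** 2))
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    correlation = np.sum((i - mu_i) * (j - mu_j) * p) / (sd_i * sd_j)
    return [energy, entropy, max_prob, contrast, inv_diff, correlation]

# The 18-value texture vector concatenates the six features per channel:
# texture = sum((cooccurrence_features(img[:, :, c]) for c in range(3)), [])
```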

The system offers the possibility to build the content based visual query using the color characteristic, the texture characteristic or a combination of the two. The dissimilarity between images with respect to the color characteristic is calculated using the histogram intersection method, and for texture the Euclidean distance is used:

1. Euclidean distance for the texture feature:

$$d_t = \sum_{m=0}^{M-1} \left( |h_q[m] - h_t[m]| \right)^2 \qquad (20)$$

2. The intersection of the histograms for the color feature:

$$d_c = 1 - \frac{\sum_{m=0}^{M-1} \min(h_q[m], h_t[m])}{\min(|h_q|, |h_t|)} \qquad (21)$$

If both distances are used in the query, the total distance is the arithmetic mean of the two:

$$D = \frac{d_c + d_t}{2} \qquad (22)$$
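A minimal sketch of the three dissimilarity measures in Eqs. (20)-(22) follows; the normalization in the histogram-intersection denominator is reconstructed from the standard form of that measure and should be treated as an assumption.

```python
import numpy as np

def texture_distance(hq, ht):
    """Eq. (20): squared-difference distance between texture vectors."""
    return float(np.sum(np.abs(np.asarray(hq, float) - np.asarray(ht, float)) ** 2))

def color_distance(hq, ht):
    """Eq. (21): histogram-intersection dissimilarity between two 166-bin
    color histograms (normalization by the smaller histogram mass is an
    assumption of this sketch)."""
    hq, ht = np.asarray(hq, float), np.asarray(ht, float)
    return 1.0 - np.minimum(hq, ht).sum() / min(hq.sum(), ht.sum())

def combined_distance(q_color, t_color, q_tex, t_tex):
    """Eq. (22): arithmetic mean of the color and texture distances."""
    return (color_distance(q_color, t_color) +
            texture_distance(q_tex, t_tex)) / 2.0
```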

7.2. Evaluation of the content-based image retrieval task

The evaluation was made by observing the precision of the image retrieval: for a given query image, the returned list of result images was inspected and each image was judged as relevant or irrelevant.

8. System’s architecture

The system's architecture is presented in Fig. 11 and contains eight modules:

(a) Segmentation module: this module segments an image into regions using the segmentation algorithm presented above; it can be configured to segment all images from an existing image folder on the storage disk.

(b) Characteristics Extractor module: this module uses the regions detected by the Segmentation module. For each segmented region a feature vector is computed that contains visual information about the region, such as color (a color histogram with 166 bins), texture (maximum probability, inverse difference moment, entropy, energy, contrast, correlation), position (minimum bounding rectangle) and shape (area, perimeter, convexity, compactness). The components of each feature vector are stored in the database.

(c) Clustering module: K-means with a fixed number of 80 clusters (established during multiple tests) is used to quantize the feature vectors obtained from the training set and to generate blobs. After the quantization, each image in the training set is represented as a set of blob identifiers. For each blob a median feature vector is computed, together with the list of words belonging to the training images that have that blob in their representation. The clustering process is presented in Fig. 9.

Fig. 9. Clustering process.

(d) Annotation module: each region of a new image is assigned the blob that is closest to it in the cluster space, namely the blob with the minimum Euclidean distance between its median feature vector and the feature vector of the region. In this way the new image is represented by a set of blob identifiers. Having the set of blobs, and for each blob a list of words, a list of potential words that can be assigned to the image is determined. For each word the probability of being assigned to the image is computed, and the set of the n (a configurable value) words with the highest probability is used to annotate the image. We have used five words for each image. (A code sketch of modules (c) and (d) is given after this list.)

Fig. 10. Hierarchical structure of the ontology.

Fig. 11. System's architecture.

(e) Ontology creator module: this module takes as input the MeSH content, which can be obtained from [15] as an XML file named desc2010.xml (2010 version) containing the descriptors and a text file named mtrees2010.txt containing the hierarchical structure. The module generates the ontology and exports its content as a topic map [22] by generating an .xtm file using the XTM syntax. The hierarchical structure of the ontology is presented in Fig. 10.

(f) Manual annotation module: this module is used to obtain the training set of annotated images needed for the automatic annotation process. It has a graphical interface that allows the user to select images from the training set, to see the regions obtained after segmentation and to assign keywords from the created ontology to the selected image.

(g) Content based image retrieval module: this module computes the distance between the characteristic vector of the analyzed image and the existing characteristic vectors in the database. For each input image a list of similar images is returned, containing those whose distance is smaller than a configurable threshold value.

(h) Semantic based image retrieval module: this module uses the two types of semantic based image retrieval provided by CMRM against a query Q = w_1 ... w_k. The ranked list of the n images containing objects described by the words w_1 ... w_k is returned as a result.
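The following sketch summarizes the clustering and annotation steps of modules (c) and (d); the K-means routine, the frequency-based word score and the data layout are illustrative assumptions, and the K-means centroid stands in for the median feature vector used by the system.

```python
import numpy as np
from scipy.cluster.vq import kmeans2  # one possible K-means routine (assumed)

def build_blobs(train_vectors, train_region_words, k=80):
    """Module (c) sketch: quantize training region features into k = 80
    blobs and attach to each blob the annotation words of the images whose
    regions fall into it. `train_region_words` holds, per region, the word
    list of its source image (assumed layout)."""
    centroids, labels = kmeans2(np.asarray(train_vectors, float), k,
                                minit='points', seed=1)
    blob_words = {b: [] for b in range(k)}
    for label, words in zip(labels, train_region_words):
        blob_words[label].extend(words)
    return centroids, blob_words

def annotate(region_vectors, centroids, blob_words, n=5):
    """Module (d) sketch: map each region of a new image to the nearest
    blob (Euclidean distance to the blob vector), then keep the n words
    with the highest frequency-based score (a simple stand-in for the
    CMRM/CRM probability estimates)."""
    blobs = [int(np.argmin(np.linalg.norm(centroids - np.asarray(r, float),
                                          axis=1)))
             for r in region_vectors]
    scores = {}
    for b in blobs:
        words = blob_words[b]
        if not words:
            continue
        for w in set(words):
            scores[w] = scores.get(w, 0.0) + words.count(w) / len(words)
    return sorted(scores, key=scores.get, reverse=True)[:n]
```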

9. Conclusions and future work

In this paper we described a system used in the medical domain. The CMRM and CRM annotation models implemented by the system have been proven efficient by several studies. Because the quality of the image regions and the running time of the segmentation process are two important factors for the annotation process, we used a segmentation algorithm based on a hexagonal structure which was shown to satisfy both requirements: better quality and a smaller running time. In general, the words assigned to a medical image are retrieved from a controlled vocabulary, and the usage of ontologies satisfies this requirement. For our system we decided to create an ontology using a generally accepted source of information, MeSH. A time-consuming analysis was needed to generate the ontology starting from the MeSH content; a proper understanding of its structure helped to identify the concepts and the existing relationships, and only after this step was it possible to design the structure of the ontology. The ontology content can also be represented as a topic map, and our system can export it using the XTM syntax. In this process each concept is represented as a topic item and each relation between concepts as an association. Our system integrates the two methods provided by


CMRM for semantic based image retrieval, as well as the approach provided by CRM. Given a query of k words, the system retrieves the list of n images that contain objects described by the words specified in the query. The experimental results have shown that the CRM model produces better results than the CMRM model for both the image annotation and the semantic based image retrieval tasks. Our approach has several advantages: all steps involved in obtaining an ontology through a custom approach were presented clearly; the annotation system uses an original architecture whose modules were described in detail; and the experimental methodology used for evaluating the annotation and semantic retrieval tasks proved effective. Future work will include an evaluation of the system on larger data sets.
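As an illustration of the topic-map export step, the sketch below writes each concept as a topic and each concept relation as an association; the element layout is a simplified stand-in for the full XTM syntax, and the input structures are assumed.

```python
import xml.etree.ElementTree as ET

def export_topic_map(concepts, relations, path):
    """Sketch of the export described above, assuming `concepts` is a dict
    id -> name and `relations` a list of (parent_id, child_id) pairs
    extracted from the MeSH descriptor and tree files. The element layout
    is simplified, not the complete XTM 1.0 schema."""
    tm = ET.Element('topicMap', xmlns='http://www.topicmaps.org/xtm/1.0/')
    for cid, name in concepts.items():
        topic = ET.SubElement(tm, 'topic', id=cid)
        base = ET.SubElement(topic, 'baseName')
        ET.SubElement(base, 'baseNameString').text = name
    for parent, child in relations:
        assoc = ET.SubElement(tm, 'association')
        for ref in (parent, child):
            member = ET.SubElement(assoc, 'member')
            ET.SubElement(member, 'topicRef', href='#' + ref)
    ET.ElementTree(tm).write(path, encoding='utf-8', xml_declaration=True)
```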

References

[1] K. Barnard, P. Duygulu, N. de Freitas, D. Forsyth, D. Blei, M.I. Jordan, Matching words and pictures, J. Mach. Learning Res. 3 (2003) 1107–1135.
[2] D. Blei, M.I. Jordan, Modeling annotated data, in: Proceedings of the 26th International ACM SIGIR Conference, 2003, pp. 127–134.
[3] P. Duygulu, K. Barnard, N. de Freitas, D. Forsyth, Object recognition as machine translation: learning a lexicon for a fixed image vocabulary, in: Seventh European Conference on Computer Vision, 2002, pp. 97–112.
[4] J. Jeon, V. Lavrenko, R. Manmatha, Automatic image annotation and retrieval using cross-media relevance models, in: Proceedings of the 26th International ACM SIGIR Conference, 2003, pp. 119–126.
[5] V. Lavrenko, R. Manmatha, J. Jeon, A model for learning the semantics of pictures, in: Proceedings of the 16th Annual Conference on Neural Information Processing Systems, NIPS'03, 2004.
[6] Y. Mori, H. Takahashi, R. Oka, Image-to-word transformation based on dividing and vector quantizing images with words, in: MISRM'99 First International Workshop on Multimedia Intelligent Storage and Retrieval Management, 1999.
[7] P. Brown, S.D. Pietra, V.D. Pietra, R. Mercer, The mathematics of statistical machine translation: parameter estimation, Comput. Linguist. 19 (2) (1993) 263–311.
[8] J. Li, J. Wang, Automatic linguistic indexing of pictures by a statistical modeling approach, IEEE Trans. Pattern Anal. Mach. Intell. 25 (2003).
[9] http://www.gastrolab.net/
[10] E.C. Catherine, Z. Xenophon, C.O. Stelios, I2Cnet medical image annotation service, Med. Informatics 22 (4) (1997) 337–347 (special issue).
[11] http://www.nlm.nih.gov/
[12] http://en.wikipedia.org/wiki/Medical_Subject_Headings
[13] http://www.ncbi.nlm.nih.gov/pubmed
[14] http://www.nlm.nih.gov/mesh/meshrels.html
[15] http://www.nlm.nih.gov/mesh/filelist.html
[16] http://www.obofoundry.org/
[17] D.D. Burdescu, M. Brezovan, E. Ganea, L. Stanescu, A new method for segmentation of images represented in a HSV color space, Springer, Berlin/Heidelberg, 2009.

[18] http://www.nlm.nih.gov/mesh/2010/mesh_browser/MeSHtree.html
[19] F.A. Igor, C. Filipe, F. Joaquim, C. Pinto da, S.C. Jaime, Hierarchical medical image annotation using SVM-based approaches, in: Proceedings of the 10th IEEE International Conference on Information Technology and Applications in Biomedicine, 2010.
[20] E. Daniel, OXALIS: A Distributed, Extensible Ophthalmic Image Annotation System, Master of Science Thesis, 2003.
[21] L. Baoli, V.G. Ernest, R. Ashwin, Semantic annotation and inference for medical knowledge discovery, in: NSF Symposium on Next Generation of Data Mining (NGDM-07), Baltimore, MD, 2007.
[22] http://www.topicmaps.org/
[23] S. Biren, R. Benton, Z. Wu, V. Raghavan, Automatic and semi-automatic techniques for image annotation, in: Semantic-Based Visual Information Retrieval, IRM Press, 2007.
[24] A. Bresell, B. Servenius, B. Persson, Ontology annotation treebrowser: an interactive tool where the complementarity of medical subject headings and gene ontology improves the interpretation of gene lists, Appl. Bioinformatics 5 (4) (2006) 225–236.
[25] A.A. Kononowicz, Z. Wisniowski, MPEG-7 as a metadata standard for indexing of surgery videos in medical e-learning, in: ICCS 2008, Part III, Lecture Notes in Computer Science, vol. 5103, 2008, pp. 188–197.
[26] D. Martin, C. Fowlkes, D. Tal, J. Malik, A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, in: Proceedings of the Eighth International Conference on Computer Vision (ICCV-01), Vancouver, British Columbia, Canada, July 7–14, 2001, vol. 2, pp. 416–425.

[27] L. Stanescu, D.D. Burdescu, A. Ion, M. Brezovan, Content-based image query on color feature in the image databases obtained from DICOM files, in: International Multi-Conference on Computing in the Global Information Technology, Bucharest, Romania, 2006.
[28] J.R. Smith, Integrated Spatial and Feature Image Systems: Retrieval, Compression and Analysis, Ph.D. thesis, Graduate School of Arts and Sciences, Columbia University, 1997.
[29] T. Gevers, Image search engines: an overview, in: Emerging Topics in Computer Vision, Prentice Hall, 2004.
[30] A. Del Bimbo, Visual Information Retrieval, Morgan Kaufmann Publishers, San Francisco, USA, 2001.
[31] G. Salton, A. Wong, C.S. Yang, A vector space model for automatic indexing, Commun. ACM 18 (11) (1975) 613–620.
[32] M.L. Kherfi, D. Brahmi, D. Ziou, Combining visual features with semantics for a more effective image retrieval, in: Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), vol. 2, 2004, pp. 961–964.
[33] Topic Maps, http://www.topicmaps.org/
[34] XTM syntax, http://www.topicmaps.org/xtm/
[35] T.R. Gruber, A translation approach to portable ontology specifications, Knowl. Acquis. 5 (1993) 199–220.
[36] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Statist. Soc. Ser. B (Methodological) 39 (1) (1977) 1–38.

[37] MPEG-7 Overview, http://mpeg.chiariglione.org/standards/mpeg-7/mpeg-7.htm
[38] G. Mihai, A. Doringa, L. Stanescu, A graphical interface for evaluating three graph-based image segmentation algorithms, in: Proceedings of the International Multiconference on Computer Science and Information Technology, 2010, pp. 735–740.
[39] A. Iancu, B. Popescu, M. Brezovan, E. Ganea, Region-based measures for evaluation of color image segmentation, in: Proceedings of the International Multiconference on Computer Science and Information Technology, 2010, pp. 717–722.
[40] S. Staab, R. Studer (Eds.), Handbook on Ontologies, International Handbooks on Information Systems, 2nd ed., 2009.
[41] N.F. Noy, D.L. McGuinness, Ontology Development 101: A Guide to Creating Your First Ontology.
[42] M. Uschold, M. Gruninger, Ontologies: principles, methods and applications, Knowl. Eng. Rev. 11 (2) (1996).
[43] TMNav, http://tm4j.org/tmnav.html
[44] M.L. Kherfi, D. Ziou, A hierarchical classification technique for semantics-based image retrieval, in: Y.-J. Zhang (Ed.), Semantic-Based Visual Information Retrieval, IGI Global, 2006 (Chapter XV).

Dumitru Dan Burdescu is a professor at the University of Craiova, Romania. He is currently head of the Software Engineering Department and director of the Research Centre – Multimedia Application Development, which was authorized by the National University Research Council, Romanian Ministry of Education. For the last 10 years his research has focused on algorithms theory, multimedia databases and information retrieval algorithms with applications in the e-learning and multimedia domains. He received his diploma in engineering from the University of Craiova, Faculty of Automation and Computers, and a diploma in mathematics from the University of Craiova, Faculty of Mathematics. His thesis work focused on computer control of industrial processes. Since then he has worked both as an academic and as a computer professional. During his professional career he has written eight monographs (in Romanian) and six book chapters in computer science textbooks. His current research interests include the design of algorithms for visual information retrieval and tools for a Learner Adviser Service on e-learning platforms. He has been involved in several national and international research projects, is author and co-author of several journal articles and conference papers, and has served as track chair, session chair, program committee member and reviewer for some of the most important conferences in his field. He is General Chair of the Multimedia – Applications and Processing Symposium (MMAP). He is currently a senior member of the IEEE Computer Society, fellow of the International Academy, Research and Industry Association, member of the Association for Computing Machinery, member of the Romanian Society of Automation and Technical Informatics and fellow of The Institution of Analysts and Programmers.

Cristian Gabriel Mihai is a PhD student in Computers and Information Technology at the University of Craiova, Department of Software Engineering. His main fields of interest are digital image processing and algorithms for image segmentation and annotation. His scientific activity consists of over 14 articles. He has over four years of experience as a software developer, using .Net and Java, which are used for validating the theoretical results of his research activity.


Liana Stanescu is currently a professor at the Software Engineering Department, Faculty of Automation, Computers and Electronics, University of Craiova, and also a member of the Research Centre – Multimedia Application Development. She received her diploma in automatic control and computer engineering from the University of Craiova, and the Ph.D. degree in computer science from the University of Craiova with a thesis entitled "Multimedia databases – a study on some methods for content-based visual retrieval". Her research fields are databases, multimedia databases, content-based visual information retrieval, applications on medical imagery, image mining, e-learning and topic maps.

Her research work consists of over 120 scientific papers presented at prestigious international conferences, two monographs, two book chapters, five textbooks and laboratory guides, and membership in 14 research contracts and grants. Liana Stanescu has served as a member of many technical program committees and organizing committees.


Marius Brezovan received his diploma in automatic control and computer engineering from the University of Craiova, and the Ph.D. degree in automatic control from the University of Craiova. His thesis work focused on the control of flexible manufacturing systems using a class of high-level Petri nets.

He is currently a Professor at the Department of Software Engineering, University of Craiova, and a member of the Research Centre – Multimedia Application Development.

He has been actively involved with research in various aspects of formal methods in system modeling, knowledge based systems and image processing. This work is supported by research grants and contracts from various government and industrial organizations. He has published several books and papers in learned journals and conference proceedings.

He has served as a member of several technical program committees and organizing committees. His current research interests include image segmentation, image annotation, object-oriented Petri nets and knowledge management.
