
Image Annotation by Learning Rules from Regions Patterns

Stefan Udristoiu, Anca Loredana Ion
Software Engineering Department
Faculty of Automation, Computers and Electronics
Craiova, Romania

[email protected], [email protected]

Abstract—The modeling of multimedia content, and especially the bridging of the semantic gap between visual features and semantic concepts, has become an important research area due to the rapidly growing quantity of visual digital content. In this paper, the analysis and semantic annotation of images are studied. The main contribution of the paper is the development of learning-based methods for colour image annotation. The developed algorithms generate semantic pattern rules that identify high-level image concepts. A semantic pattern rule is a combination of image region patterns that identifies a semantic concept. Our methods are not limited to any specific domain and can be applied in any field.

Keywords- image annotation, semantic association rules, image segmentation, visual features, colour, texture, and shape.

I. INTRODUCTION

In this paper, an automatic method that generates annotations based on the visual features of image regions is described. The correlation between the low-level features and high-level concepts is a challenge due to the "semantic gap" [5].

The visual characteristics cannot be directly interpreted in terms of the high-level concepts understood by humans [3]. In other words, humans cannot interpret the visual features directly, because they are represented by multidimensional vectors. Also, visual similarity does not always imply semantic similarity. If a user queries a visual-based retrieval system for a "red rose", it is possible that images containing a "red ball" are also retrieved. This is the semantic gap between the visual characteristics and the semantic information.

The fundamental goal of image retrieval is to provide efficient and simple ways of searching image databases [2]. As mentioned above, this goal is difficult to reach with traditional image retrieval systems, which do not take the semantic aspects of images into account.

The developed prototype permits expert users to generate rules specific to their domain by submitting significant categorized images, from which the system learns semantic pattern rules. The semantic rules map combinations of visual characteristics (colour, texture, shape, position, etc.) to semantic concepts.

The semantic pattern rules capture an expert's understanding of a domain, namely which visual primitives are characteristic of the semantic concepts of an image category. They are represented in Prolog and can be shared and modified as the respective domain evolves.

The remainder of this paper is structured as follows. Section 2 presents the selection of visual features and the segmentation algorithm. Section 3 presents the mapping between visual features and semantic concepts. Section 4 details the generation of semantic pattern rules and the semantic classification of images, and discusses the experiments. Finally, Section 5 summarizes the conclusions of this study.

II. THE SELECTION OF IMAGE VISUAL CHARACTERISTICS

A. The image model

The model of knowledge representation constitutes the principal element of the annotation system. The image (I) objects are modeled at two levels: the physical and the logical one.

The physical representation RF(I) of an image is related to the image data sequence described by the pixel values. The logical representation RL(I) of the image is a generalization of the physical representation and is also divided into levels: at the low level of the hierarchy, the image is represented as a set of visual low-level features F{I}. In our chosen representation, each image is described as a set of colour regions R(I)={rij}. The transition from the region set to the recognition of semantic concepts is the main challenge. Semantic concept recognition is placed at the highest level of the representation hierarchy.

The selection of the visual feature set and of the image segmentation algorithm is a decisive stage for the semantic annotation of images [9]. After performing a large set of experiments on the computation of visual features, we concluded that semantic concepts are essential for establishing the similarity between images. Even if the semantic concepts are not directly related to the visual features (colour, texture, shape, position, dimension, etc.), these attributes capture information about the semantic meaning [10].

B. The image segmentation

The ability and efficiency of the colour feature for characterizing perceptual colour similarity is strongly influenced by the selection of the colour space and of the quantization scheme. A set of dominant colours determined for each image provides a compact description that is easy to implement. The HSV colour space quantized to 166 colours is used to represent the colour information.


Before segmentation, the images are transformed from the RGB to the HSV colour space and quantized to 166 colours. The colour region extraction is realized with the colour set back-projection algorithm [6]. This algorithm detects the regions of a single colour and has four stages:
1. Transformation, quantization, and filtering;
2. Colour set back-projection;
3. Labelling of the detected colour regions;
4. Extraction of region characteristics.

Figures 1 and 2 illustrate the segmentation of an image from the "cliff" category. For each colour region, the spatial coherency represents the spatial homogeneity of the region in the image, as in Figure 3. It is computed by identifying the 8-connected pixels of the same colour in a region.
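As an illustration of the colour quantization performed before segmentation, a minimal Python sketch that maps an RGB pixel to one of 166 HSV bins is given below; the 18 hue x 3 saturation x 3 value partition plus 4 grey levels used here is a common choice and is an assumption of the sketch, not necessarily the paper's exact scheme.

import colorsys

def quantize_hsv_166(r, g, b, grey_sat=0.1, grey_levels=4, h_bins=18, s_bins=3, v_bins=3):
    """Map an RGB pixel (0-255 channels) to a colour index in 0..165."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    if s < grey_sat:                              # nearly achromatic pixels go to the grey bins
        return min(int(v * grey_levels), grey_levels - 1)
    hi = min(int(h * h_bins), h_bins - 1)
    si = min(int(s * s_bins), s_bins - 1)
    vi = min(int(v * v_bins), v_bins - 1)
    return grey_levels + (hi * s_bins + si) * v_bins + vi   # 4 + 18*3*3 = 166 bins in total

print(quantize_hsv_166(200, 30, 40))   # a saturated red maps to one of the chromatic bins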


Figure 1. (a) Original image from category “sunset”. (b) Quantized image to 166 colours in HSV space.

Figure 2. The segmentation results.

Figure 3. Spatial coherency: weak (left) and strong (right).

C. The region texture

Texture is another feature taken into consideration for classifying and recognizing material surfaces. Texture describes surface attributes such as regularity, fineness, granularity, and roughness. In this paper, we use a method based on co-occurrence matrices. The co-occurrence matrix captures the repeated occurrence of certain configurations of pixel intensities in the image. These configurations vary rapidly for fine textures and more slowly for coarse textures. The co-occurrence matrices describe the occurrences of these intensity configurations; in the case of colour images, a matrix is computed for the colour indices of the region. The classification of texture is based on the characteristics extracted from the co-occurrence matrix (Haralick et al., 1973).

The vector of texture characteristics extracted from the co-occurrence matrix contains the following seven characteristics.

The maximum probability detects the most frequent motif.

The energy describes the uniformity of the texture. The energy of an image is high when the image is homogeneous.

The entropy measures the randomness of the elements in the matrix; when all elements of the matrix are maximally random, entropy has its highest value. So, a homogeneous image has lower entropy than a non-homogeneous image. In fact, when energy gets higher, entropy should get lower.

The contrast is larger for images with strong local intensity variation. Cluster shade and cluster prominence are measures of the skewness of the matrix, in other words of its lack of symmetry. When cluster shade and cluster prominence are high, the image is not symmetric.

Correlation measures the linear dependency between the elements of the matrix. When correlation is high, the image is more complex than when correlation is low.
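For illustration, these seven features can be computed from a co-occurrence matrix as in the following Python sketch (the exact Haralick formulations used by the authors may differ slightly):

import numpy as np

def texture_features(P):
    """Compute the seven texture features from a co-occurrence matrix P."""
    P = P / P.sum()                      # normalize P into a probability matrix
    i, j = np.indices(P.shape)
    mu_i, mu_j = (i * P).sum(), (j * P).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * P).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * P).sum())
    nz = P[P > 0]
    return {
        "max_probability": P.max(),
        "energy": (P ** 2).sum(),
        "entropy": -(nz * np.log2(nz)).sum(),
        "contrast": ((i - j) ** 2 * P).sum(),
        "cluster_shade": ((i + j - mu_i - mu_j) ** 3 * P).sum(),
        "cluster_prominence": ((i + j - mu_i - mu_j) ** 4 * P).sum(),
        "correlation": ((i - mu_i) * (j - mu_j) * P).sum() / (sd_i * sd_j),
    }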

D. The region shape

Shape is an important characteristic of an object. The goal of shape descriptors is to uniquely characterize the object shape. Two shape descriptors are used in our experiments:

• Eccentricity is the ratio between the lengths of the major and minor axes of the object; it is smaller for rounded shapes and larger for elongated ones.

• Compactness is the ratio between the length of the object's boundary and the object's area (both descriptors are sketched in code below).
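A minimal Python sketch of the two descriptors, computed from the region pixel coordinates; approximating the axis lengths from the second-order moments of the region is an assumption of this sketch, not necessarily the paper's exact formulation.

import numpy as np

def shape_descriptors(pixels, boundary_length):
    """pixels: (N, 2) array of (x, y) region coordinates; boundary_length: region perimeter."""
    pts = np.asarray(pixels, dtype=float)
    centred = pts - pts.mean(axis=0)
    # Eigenvalues of the covariance matrix approximate the squared lengths of the principal axes.
    eigvals = np.linalg.eigvalsh(np.cov(centred.T))
    minor, major = np.sqrt(np.maximum(eigvals, 1e-12))
    eccentricity = major / minor               # ratio of major to minor axis length
    compactness = boundary_length / len(pts)   # boundary length divided by area (pixel count)
    return eccentricity, compactness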

E. The spatial information

The spatial information of the region is also considered for image annotation. It provides the information needed for region indexing and semantic definition, such as upper left, upper right, centre, etc. The spatial information of each region is represented by two parameters: the centroid of the region Cx,y = (Xc, Yc) and the minimum bounding rectangle (Xl, Xr, Yt, Yb), where (Xl, Yt) are the coordinates of the upper left corner and (Xr, Yb) the coordinates of the bottom right corner.



F. The region indexing

In conformity with the defined characteristics, a region is described by:
• The colour characteristic is represented in the HSV colour space quantized to 166 colours. A region is represented by a colour index, which is an integer between 0 and 165. It is denoted as descriptor F1.

• The spatial coherency is the region descriptor that measures the spatial compactness of the pixels of the same colour. It is denoted as descriptor F2.

• A seven-dimensional vector (maximum probability, energy, entropy, contrast, cluster shade, cluster prominence, correlation) represents the texture characteristic. It is denoted as descriptor F3.
• The region dimension descriptor represents the number of pixels in the region. It is denoted as descriptor F4.
• The spatial information is represented by the centroid coordinates of the region and by the minimum bounding rectangle. It is denoted as descriptor F5.
• A two-dimensional vector (eccentricity, compactness) represents the shape feature. It is denoted as descriptor F6 (see the sketch below).
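As an illustration, a minimal Python record carrying the six descriptors F1-F6; the field names and types are assumptions of the sketch, not the paper's data structures.

from dataclasses import dataclass
from typing import Tuple

@dataclass
class Region:
    colour_index: int                        # F1: HSV colour index in 0..165
    spatial_coherency: float                 # F2: spatial compactness of same-colour pixels
    texture: Tuple[float, ...]               # F3: (max prob., energy, entropy, contrast,
                                             #      cluster shade, cluster prominence, correlation)
    dimension: int                           # F4: number of pixels in the region
    centroid: Tuple[float, float]            # F5a: (Xc, Yc)
    bounding_box: Tuple[int, int, int, int]  # F5b: (Xl, Xr, Yt, Yb)
    shape: Tuple[float, float]               # F6: (eccentricity, compactness)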

III. MAPPING THE VISUAL FEATURES TO SEMANTIC INDICATORS

The visual vocabulary developed for image annotation is based on the concept of semantic indicators, whereas the syntax captures the basic models of human perception of patterns and semantic categories.

In this study, the representation language is simple, because the syntax and vocabulary are elementary. The language words are limited to the names of the semantic indicators. Being visual elements, the semantic indicators are, for example, colour (colour-light-red), spatial coherency (spatial-coherency-weak, spatial-coherency-medium, spatial-coherency-strong), texture (energy-small, energy-medium, energy-big, etc.), dimension (dimension-small, dimension-medium, dimension-big, etc.), position (vertical-upper, vertical-center, vertical-bottom, horizontal-upper, etc.), and shape (eccentricity-small, compactness-small, etc.).

The syntax is represented by a model which describes the images in terms of semantic indicator values. The values of each semantic descriptor are mapped to a value domain which corresponds to the mathematical descriptor.

The value domains of visual mathematical characteristics are experimentally determined using images of WxH dimension.

The colour correspondence between the mathematical values and the semantic indicator values is determined based on experiments performed on a training image database. The colour correspondence is illustrated by the following examples:

• light-red (108), medium-red (122), dark-red (139)

• light-yellow (109), medium-yellow (125), dark-yellow (141)

• light-green (112), medium-green (130), dark-green (142)

• light-blue (116), medium-blue (136), dark-blue (148).

A hierarchy of values, which are mapped to semantic indicator values, is also determined for texture: texture-energy-big, texture-energy-medium, texture-energy-small, texture-entropy-big, texture-entropy-small, texture-entropy-medium, etc.

The texture correspondence is illustrated by the following examples:

• If 0 ≤ entropy ≤ 1, then the region has entropy-small,
• If 1 < entropy ≤ 1.67, then the region has entropy-medium,
• If entropy > 1.67, then the region has entropy-big,
• If 0 ≤ energy ≤ 0.22, then the region has energy-small,
• If 0.22 < energy ≤ 0.42, then the region has energy-medium,
• If 0.42 < energy ≤ 1, then the region has energy-big,
• If 0 < contrast ≤ 300, then the region has contrast-small,
• If 300 < contrast ≤ 900, then the region has contrast-medium,
• If contrast > 900, then the region has contrast-big.

A hierarchy of values, which are mapped to semantic indicator values, is also determined for the dimension feature: dimension-big, dimension-small, dimension-medium.

The correspondence between the mathematical and the semantic indicator values is also determined, based on experiments:

• If (PixelCnt≤WxH/10), then the region has dimension-small,

• If (WxH/10<PixelCnt≤WxH/5), then the region has dimension-medium,

• If (PixelCnt>WxH/5), then the region has dimension-big,

where PixelCnt is the number of region pixels, and W and H are the width and the height of the image, respectively.
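A minimal Python sketch of this mapping from mathematical values to semantic indicator values, using the thresholds listed above (the helper names are illustrative):

def bucket(value, thresholds, labels):
    """Map a numeric value to a semantic indicator value using increasing thresholds."""
    for limit, label in zip(thresholds, labels):
        if value <= limit:
            return label
    return labels[-1]

def texture_and_dimension_indicators(entropy, energy, contrast, pixel_cnt, w, h):
    return {
        "entropy":   bucket(entropy, [1, 1.67], ["entropy-small", "entropy-medium", "entropy-big"]),
        "energy":    bucket(energy, [0.22, 0.42], ["energy-small", "energy-medium", "energy-big"]),
        "contrast":  bucket(contrast, [300, 900], ["contrast-small", "contrast-medium", "contrast-big"]),
        "dimension": bucket(pixel_cnt, [w * h / 10, w * h / 5],
                            ["dimension-small", "dimension-medium", "dimension-big"]),
    }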

A hierarchy of values, which are mapped to semantic indicator values, is also determined for the position feature of region: position-horizontal-right, position-horizontal-centre, position-horizontal-left, position-vertical-upper, position-vertical-centre, and position-vertical-bottom.

The correspondence is illustrated by the following examples:

• If 0 < Cx ≤ W/4, then the region is situated horizontal-left,
• If W/4 < Cx ≤ 3W/4, then the region is situated horizontal-centre,
• If 3W/4 < Cx ≤ W, then the region is situated horizontal-right,
• If 0 < Cy ≤ H/4, then the region is situated vertical-upper,
• If H/4 < Cy ≤ 3H/4, then the region is situated vertical-centre,
• If 3H/4 < Cy ≤ H, then the region is situated vertical-bottom,

where Cx and Cy are the coordinates of the region centroid.



A hierarchy of values, which are mapped to semantic indicator values, is also determined for the spatial coherency feature. The correspondence between the mathematical and the semantic indicator values is also determined based on experiments:

• If PixelDif≤PixelCnt/8, then the region has spatial-coherency-big,

• If PixelCnt/8<PixelDif≤PixelCnt/4, then the region has spatial-coherency-medium,

• If PixelDif>PixelCnt/4, then the region has spatial-coherency-small,

where PixelDif is the number of region pixels whose 8 neighbours have a different colour, and PixelCnt is the total number of region pixels.
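A small Python sketch of the position and spatial-coherency mappings defined above (function and variable names are illustrative):

def position_indicators(cx, cy, w, h):
    """Map the region centroid (cx, cy) to horizontal/vertical position labels."""
    horizontal = ("horizontal-left" if cx <= w / 4
                  else "horizontal-centre" if cx <= 3 * w / 4
                  else "horizontal-right")
    vertical = ("vertical-upper" if cy <= h / 4
                else "vertical-centre" if cy <= 3 * h / 4
                else "vertical-bottom")
    return horizontal, vertical

def spatial_coherency_indicator(pixel_dif, pixel_cnt):
    """Map the number of non-coherent pixels to a spatial-coherency label."""
    if pixel_dif <= pixel_cnt / 8:
        return "spatial-coherency-big"
    if pixel_dif <= pixel_cnt / 4:
        return "spatial-coherency-medium"
    return "spatial-coherency-small"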

A hierarchy of values, which are mapped to semantic indicator values, is also determined for shape, as in Figures 4 and 5: eccentricity-big, eccentricity-medium, eccentricity-small, and compactness-small, compactness-medium, compactness-big.

At the end of the mapping process, an image is represented in Prolog by means of the term figure(ListofRegions), where ListofRegions is a list of image regions.

The term region(ListofDescriptors) is used for region representation, where the argument is a list of terms used to specify the semantic indicators. The term used to specify the semantic indicators is of the form:

descriptor(DescriptorName, DescriptorValue).

In the example below, the representation of an image from the cliff category can be observed:

figure([
  region([descriptor(colour, dark-brown),
          descriptor(horizontal-position, center),
          descriptor(vertical-position, center),
          descriptor(dimension, big),
          descriptor(shape-eccentricity, small),
          descriptor(texture-probability, medium),
          descriptor(texture-inversedifference, medium),
          descriptor(texture-entropy, small),
          descriptor(texture-energy, big),
          descriptor(texture-contrast, big),
          descriptor(texture-correlation, big)]),
  …

Figure 4. Correspondence between mathematical values of the visual shape feature (eccentricity) and semantic indicators

Figure 5. Correspondence between mathematical values of the visual shape feature (compactness) and semantic indicators

IV. THE GENERATION OF SEMANTIC PATTERN RULES FOR IMAGES

Knowledge discovery in databases, an important part of data mining, is defined as the automatic discovery of useful, previously unknown, and non-trivial information [4].

The main component in image discovery is the identification of similar image objects [7]. Association rules discover information about elements that occur frequently together. The formal representation of an association rule is the following. Consider I = {i1, i2, ..., im} a set of distinct elements and D a set of transactions, where each transaction T is a subset of I (T ⊆ I). A transaction T contains X if and only if X ⊆ T. An association rule is an implication A => B, where A ⊆ I, B ⊆ I, and A ∩ B = ∅. A and B are called the body and the head of the rule, respectively. The support of the rule is defined as the percentage of transactions that contain both the body (A) and the head (B); if a rule A => B has support s%, then s% of the transactions in D contain A ∪ B. The confidence is defined as the ratio between the number of transactions that contain both the body and the head, and the number of transactions that contain the body; if the rule A => B has confidence c%, then c% of the transactions in D that contain A also contain B. The support estimates the probability P(A ∪ B), and the confidence estimates the conditional probability P(B|A).
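For illustration, a minimal Python sketch of these two measures over a transaction set:

def support(transactions, itemset):
    """Fraction of transactions containing every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, body, head):
    """Estimate P(head | body) as support(body ∪ head) / support(body)."""
    return support(transactions, set(body) | set(head)) / support(transactions, body)

# Example: 3 of 4 transactions contain {a}, and 2 of those also contain {b}.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"c"}]
print(support(transactions, {"a", "b"}))       # 0.5
print(confidence(transactions, {"a"}, {"b"}))  # 0.666...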

Suppose we have an image database DB. In the learning/training phase, the set U = {S1, ..., Sn} is a subset of DB and contains n example images, labelled with semantic concepts, on which the system is trained and from which the semantic pattern rules are generated. The goal of the learning phase is to automatically generate, from the categorized example images, semantic rules R that identify the semantic concepts of images. A rule determines the set of semantic indicators that best identifies a semantic concept. In the testing/annotation phase, each image of the testing subset, namely the images from DB that are not in U, is labelled with one or more concepts using the generated semantic pattern rules.

The process of the automated generation of semantic pattern rules and image annotation is the following:

1. The learning phase: semantic pattern rule generation. A semantic rule is of the form:

semantic indicators -> category

The steps of the learning process are:
- relevant images for a semantic concept are used for training the system;



- each image is automatically segmented and the primitive visual features are computed, as described in Section 2;
- for each image, the primitive visual features are mapped to semantic indicators, as described in Section 3;
- the region patterns are determined for each image belonging to the transaction set, as described in Section 4;
- the rule generation algorithms are applied to produce semantic pattern rules, which identify each semantic category in the database.

2. The image testing/annotation phase aims at the automatic annotation of images:
- each new image is automatically segmented into regions;
- for each new image, the low-level characteristics are mapped to semantic indicators;
- the classification algorithm is applied to identify the image category/semantic concept.

In the developed system, the learning of semantic pattern rules is continuous: whenever a categorized image is added to the training database, the system generates additional semantic rules.

A. The description of the algorithm for rule generation

The method for generating the semantic pattern rules is based on the Apriori algorithm for finding frequent itemsets [1]. The purpose of image association rules is to find semantic relationships between image objects.

The rule generation algorithm takes all the region features into account. It is based on "region patterns" and requires a pre-processing phase in which the visual similarity between the image regions of the same category is determined.

In the pre-processing phase, the region patterns that appear in the images are determined: each image region Regij is compared with the other image regions from the same category. If the region Regij matches another region Regkm, having in common the features on positions n1, n2, ..., nc, then the generated region pattern is SRj(-, -, n1, n2, ..., nc, -, -), and the other features are ignored.

The generation of image region patterns is described in pseudo-code by the following algorithm:

Algorithm IV.1. Generation of region patterns for each image.
Input: a set of n images {I1, I2, ..., In} lying in the same category; each image Ii is a set of regions Regim.
Output: the set of region patterns of each image.
Method:
for i1 = 1, n-1 do
  for j1 = 1, Ii1.nregions do
    for i2 = i1+1, n do
      for j2 = 1, Ii2.nregions do
        if Regi1,j1 matches Regi2,j2 in the components n1, n2, ..., nc then
          RSi1 = RSi1 + (-, -, n1, n2, ..., nc, -, -)
          RSi2 = RSi2 + (-, -, n1, n2, ..., nc, -, -)
      end
    end
  end
end

A database with five images relevant for a category is used. For this category, an example of the representation of the images using region patterns can be observed in Table I.

By applying Algorithm IV.1 to the database from Table I, the results in Table II are obtained.

As can be observed, the number of region patterns is large, but it is reduced by the rule elimination process described in the next section.
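For illustration, a minimal Python sketch of Algorithm IV.1, assuming each region is given as a tuple of descriptor values and '-' marks an ignored position (the names are illustrative):

def region_patterns(images):
    """images: list of lists of regions (tuples of equal length) from one category.
    Returns one set of region patterns per image."""
    patterns = [set() for _ in images]
    for i1 in range(len(images) - 1):
        for r1 in images[i1]:
            for i2 in range(i1 + 1, len(images)):
                for r2 in images[i2]:
                    # Keep the components on which the two regions agree, mask the rest.
                    pattern = tuple(a if a == b else "-" for a, b in zip(r1, r2))
                    if any(c != "-" for c in pattern):   # they match on at least one component
                        patterns[i1].add(pattern)
                        patterns[i2].add(pattern)
    return patterns

# Example with the first two images of Table I:
imgs = [[("a", "b", "c", "d"), ("a'", "b'", "c'", "d'")],
        [("a", "b", "c", "f"), ("a'", "b'", "c'", "d'")]]
print(region_patterns(imgs)[0])  # {('a', 'b', 'c', '-'), ("a'", "b'", "c'", "d'")}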

The image modelling in terms of itemsets and transactions is the following:
- the transactions represent the sets of region patterns determined by the previous algorithm;
- the itemsets are represented by region patterns of the images lying in the same category;
- the frequent itemsets are the itemsets whose support is greater than the minimum support;
- the itemsets of cardinality between 1 and k are found iteratively;
- the frequent itemsets are used for rule generation.

The algorithm for rule generation based on region patterns is described in pseudo-code:

Algorithm IV.2. The generation of semantic pattern rules based on region patterns.
Input: the image set represented as I = (RS1, ..., RSk), where RSm is a region pattern.
Output: the set of semantic pattern rules.
Method:
Ck: the set of region patterns of length k.
Lk: the set of frequent region patterns of length k.
PatternRules: the set of pattern rules constructed from the frequent itemsets, for k > 1.
L1 = {frequent region patterns}
for (k = 1; Lk != null; k++) do
  Ck+1 = candidates generated from the set Lk
  foreach transaction t in the database do
    increment the count of every candidate from Ck+1 that appears in t
  end
  Lk+1 = candidates from Ck+1 whose support is greater than or equal to support_min
  PatternRules = PatternRules + {Lk+1 -> category}
end
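For illustration, a compact Python sketch of this Apriori-style step over region-pattern transactions; itemsets are modelled as frozensets of region patterns and support_min is a fraction. This is an illustrative reading, not the authors' implementation.

from itertools import combinations

def apriori_pattern_rules(transactions, category, support_min=0.6):
    """transactions: list of sets of region patterns (one per image of the category).
    Returns (frequent itemset, category, support) rules."""
    n = len(transactions)
    items = {frozenset([p]) for t in transactions for p in t}
    rules, level = [], items
    while level:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(c <= t for t in transactions) for c in level}
        frequent = {c for c, cnt in counts.items() if cnt / n >= support_min}
        rules += [(c, category, counts[c] / n) for c in frequent]
        # Generate candidates of the next length by joining frequent itemsets.
        level = {a | b for a, b in combinations(frequent, 2) if len(a | b) == len(a) + 1}
    return rules

# Usage with region patterns encoded as strings:
imgs = [{"R(a,b,c,-)", "R(a',b',c',-)"}, {"R(a,b,c,-)", "R(a,b,-,-)"}, {"R(a,b,c,-)", "R(a',b',c',-)"}]
print(apriori_pattern_rules(imgs, "cliff", support_min=0.6))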

TABLE I. RELEVANT IMAGES FOR A CERTAIN CATEGORY

ImgID Image Regions

1 R(a,b,c,d), R(a’,b’,c’,d’)

2 R(a,b,c,f), R(a’,b’, c’,d’)

3 R(a,b,c,f’’),R(a’,b’,c’,i)

4 R(a,b,c,f’),R(a’,b’,c’,i’)

5 R(a,b,e,d), R(m,b’,c’,d’)



TABLE II. REGION PATTERNS OF THE IMAGES OF A CERTAIN CATEGORY

ID Region Patterns

1 R(a,b,c,-),R(a,b,-,d),R(a’,b’,c’,d’), R(a’,b’,c’,-), R(-,b’,c’,d’)

2 R(a,b,c,-),R(a,b,-,-),R(a’,b’,c’,d’), R(a’,b’,c’,-), R(-,b’,c’,d’)

3 R(a,b,c,-), R(a,b,-,-), R(a’,b’,c’,-), R(-,b’,c’,-)

4 R(a,b,c,-), R(a,b,-,-),R(a’,b’,c’,-), R(-,b’,c’,-)

5 R(a,b,-,d),R(a,b,-,-), R(-,b’,c’,d’), R(-,b’,c’,-)

The category of an image is determined by the following semantic association rules:

R(a,b,c,-) and R(a',b',c',-) -> category
R(a,b,c,-) and R(a,b,-,-) -> category
R(a',b',c',-) and R(a,b,-,-) -> category
R(a,b,-,-) and R(-,b',c',-) -> category
R(a,b,c,-) and R(a',b',c',-) and R(a,b,-,-) -> category

The rules are represented in Prolog as facts of the form: PatternRule(Category, Score, ListofRegionPatterns).

The patterns from ListofRegionPatterns are terms of the form: regionPattern(ListofPatternDescriptors).

The patterns from the descriptors list specify the set of possible values for a certain descriptor name. The form of this term is:

descriptorPattern(descriptorName, ValueList).

One of the semantic pattern rules used to identify the cliff category is illustrated below. This semantic pattern rule has a score (confidence) equal to 100%.

PatternRule(cliff, 100,
  [regionPattern([
      descriptorPattern(colour, [dark-brown]),
      descriptorPattern(horizontal-position, [center, left, right]),
      descriptorPattern(vertical-position, [center, bottom]),
      descriptorPattern(dimension, [big]),
      descriptorPattern(eccentricity-shape, [small]),
      descriptorPattern(texture-probability, [medium]),
      descriptorPattern(texture-inversedifference, [medium]),
      descriptorPattern(texture-entropy, [big]),
      descriptorPattern(texture-energy, [big]),
      descriptorPattern(texture-contrast, [big]),
      descriptorPattern(texture-correlation, [big])])],
  [regionPattern([
      descriptorPattern(colour, [medium-brown]),
      descriptorPattern(horizontal-position, [center, left, right]),
      descriptorPattern(vertical-position, [center, bottom]),
      descriptorPattern(dimension, [small]),
      descriptorPattern(eccentricity-shape, [small]),
      descriptorPattern(texture-probability, [big]),
      descriptorPattern(texture-inversedifference, [big]),
      descriptorPattern(texture-entropy, [medium]),
      descriptorPattern(texture-energy, [big]),
      descriptorPattern(texture-contrast, [medium]),
      descriptorPattern(texture-correlation, [big])])],
  ….

B. Semantic pattern rules elimination

The number of semantic pattern rules that can be generated is usually large. In this case two problems exist: the first is that the rule set can contain noisy information; the second is that the large number of rules increases the classification time. This is an important issue in real applications, which require rapid responses. On the other hand, the elimination of rules can affect the classification accuracy. The elimination methods are the following:

- elimination of specific rules and keeping of the rules with high confidence,

-elimination of rules that can introduce errors in the classification process.

The following definitions introduce some notions used in this section [11].

Definition 1: Given two rules R1 => C and R2 => C, the first rule is called general if R1 ⊆ R2; the second one is called specific.

Definition 2: Given two rules R1 and R2, R1 is called stronger than R2 (or R2 weaker than R1):
(1) if R1 has greater confidence than R2;
(2) if the confidences are equal, but support(R1) is greater than support(R2);
(3) if the supports are equal, support(R1) = support(R2), and the confidences are equal, confidence(R1) = confidence(R2), but R1 has fewer attributes than R2.

The elimination of weak and specific semantic pattern rules is described in pseudo-code:

Algorithm IV.3. Elimination of weak and specific rules.
Input: the set of semantic pattern rules, S, generated for each category C.
Output: the set of rules which will be used for classification.
Method:
Sort the rules for category C in conformity with Definition 1.
foreach rule in S do
  find the most specific rules
  eliminate the rules with the smallest confidence
end
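One possible reading of this pruning step, sketched in Python, is to drop every rule that is more specific than, and not stronger than, another rule of the same category; this interpretation is an assumption, not the authors' exact procedure.

def prune_rules(rules):
    """rules: list of (antecedent_itemset, confidence, support) tuples for one category.
    Drop each rule that has a more general counterpart which is at least as strong."""
    kept = []
    for ante, conf, supp in rules:
        dominated = any(
            other_ante < ante and (other_conf, other_supp) >= (conf, supp)
            for other_ante, other_conf, other_supp in rules
        )
        if not dominated:
            kept.append((ante, conf, supp))
    return kept

# Example: the specific rule {A, B} is dropped because the general rule {A} is stronger.
rules = [(frozenset("A"), 0.9, 0.5), (frozenset("AB"), 0.8, 0.3)]
print(prune_rules(rules))  # [(frozenset({'A'}), 0.9, 0.5)]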

C. Semantic image classification

The classifier is the set of semantic pattern rules used to predict the category of the images from the test database. Given a new image, the classification process searches the rule set for the most appropriate category. Images are processed and represented by means of semantic indicators as Prolog facts. The semantic pattern rules are applied to the set of image facts using the Prolog inference engine.

A semantic pattern rule matches an image if all the characteristics that appear in the body of the rule also appear among the image characteristics.

The algorithm for image categorization is described in pseudo-code:



Algorithm IV.4. Semantic classification of an image.
Input: a new unclassified image I and the set of semantic pattern rules; each pattern rule Ri has the confidence Ri.conf.
Output: the classified image and the matching score.
Method:
S = ∅
foreach rule R in PatternRules do
  if R matches I then add R to S
end
divide S into one subset per category: S1, ..., Sn
foreach subset Sk of S do
  Sk.conf = sum of the confidences of all rules from Sk
end
add the image I to the category identified by the subset Sk with the greatest summed confidence
I.score = max(Sk.conf)
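A minimal Python sketch of this classification step, assuming each rule is reduced to a (category, confidence, required characteristics) triple (the names are illustrative):

from collections import defaultdict

def classify(image_characteristics, pattern_rules):
    """image_characteristics: set of semantic indicators of the image.
    pattern_rules: list of (category, confidence, required_characteristics) triples.
    Returns (best_category, score) or (None, 0.0) if no rule matches."""
    scores = defaultdict(float)
    for category, confidence, required in pattern_rules:
        if required <= image_characteristics:     # the rule matches the image
            scores[category] += confidence        # accumulate confidence per category
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best]

rules = [("cliff", 1.0, {"colour:dark-brown", "dimension:big"}),
         ("sunset", 0.8, {"colour:light-red", "vertical-position:upper"})]
print(classify({"colour:dark-brown", "dimension:big", "texture-energy:big"}, rules))  # ('cliff', 1.0)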

In the experiments performed in this study, two databases are used. The database used for learning contains 200 images from different nature categories and serves to learn the correlations between images and semantic concepts. All the images in the database are in JPEG format and have different dimensions. The learning database is categorized into 50 semantic concepts, and the system learns each concept from approximately 20 submitted images per category. The testing database contains 500 unclassified images.

The results of the semantic image retrieval for the "sunrise" category can be observed in Figure 6. These images from the test database are correctly classified by the algorithm.

The results of the semantic image retrieval for the "mountain" category can be observed in Figure 7. Two images from the test database are incorrectly classified, since they represent "clouds".

The performance metrics, precision and average normalized modified retrieval rate (ANMRR), are computed to evaluate the efficiency and accuracy of the semantic pattern rule generation and annotation methods.

The precision is defined as the ratio between the number of images correctly classified by the system and the total number of classified images. The precision lies in the range [0, 1], and greater values indicate better retrieval performance. The averaged normalized modified retrieval rate (ANMRR) is an overall performance measure calculated by averaging the result of each query [8]. The ANMRR lies in the range [0, 1], and smaller values indicate better retrieval performance.

These parameters are computed as averages for each image category, as in Table III.

Figure 6. The results of semantic image retrieval for the "sunrise" category.

Figure 7. The results of semantic image retrieval for the "mountains" category.



TABLE III. THE PRECISION AND ANMRR COMPUTED FOR EACH IMAGE CATEGORY.

Category Precision ANMRR

Fire 0.77 0.39

Iceberg 0.71 0.34

Tree 0.65 0.45

Sunset 0.89 0.14

Cliff 0.93 0.11

Desert 0.89 0.11

Red Rose 0.75 0.20

Elephant 0.65 0.43

Mountain 0.85 0.16

Sea 0.91 0.09

Flower 0.77 0.31

Also, for each category in the database, the percentage of images correctly classified by the system is computed, as in Figure 8.

As can be observed from the experiments, the results are strongly influenced by the complexity of each image category. Overall, the experimental results are very promising: they show a small average normalized modified retrieval rate and a good precision for the majority of the database categories, making the system more reliable.

V. CONCLUSION

In this study we propose methods for semantic image annotation based on visual content. For establishing correlations with semantic categories, we experimented with and selected several low-level visual characteristics of images. Thus, each category is translated into computable visual characteristics and into the objects that have the greatest probability of appearing in an image of that category.

Figure 8. Category vs. percent of images correctly classified.

The algorithm that generates the semantic pattern rules selects the image characteristics with the greatest probability of occurrence, which offers better generality.

Compared to other image annotation methods, the proposed methods have several advantages: the entire process is automated, and a great number of semantic concepts can be defined; the methods can be easily extended to any domain, because the visual features and semantic indicators remain unchanged, and the semantic pattern rules are generated from the set of labelled example images used for learning the semantic concepts; the spatial information is taken into account, offering rich semantic information about the relationships of the image colour regions (left, right, centre, bottom, and upper).

The proposed methods have the limitation that they cannot learn every semantic concept, because the segmentation algorithm is not capable of segmenting images into real objects. Improvements can be brought by using a segmentation method with greater semantic accuracy.

REFERENCES
[1] A. Berson and S. J. Smith, Data Warehousing, Data Mining, and OLAP, McGraw-Hill, New York, 1997.
[2] G. Carneiro, A. Chan, P. Moreno, and N. Vasconcelos, "Supervised learning of semantic classes for image annotation and retrieval," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29(3), 2007, pp. 394–410.
[3] A. Hoogs, J. Rittscher, G. Stein, and J. Schmiederer, "Video content annotation using visual analysis and a large semantic knowledge base," Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003, pp. 327–334.
[4] W. J. Frawley, G. Piatetsky-Shapiro, and C. J. Matheus, "Knowledge discovery in databases: an overview," in Knowledge Discovery in Databases, MIT Press, 1991.
[5] A. W. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-based image retrieval at the end of the early years," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22(12), 2000, pp. 1349–1380.
[6] J. R. Smith and S.-F. Chang, "VisualSEEk: a fully automated content-based image query system," Proc. Fourth ACM International Multimedia Conference, Boston, MA, USA, 1996.
[7] V. Mezaris, I. Kompatsiaris, and M. G. Strintz, "Region-based image retrieval using an object ontology and relevance feedback," EURASIP JASP, 2004.
[8] B. S. Manjunath, P. Salembier, and T. Sikora, Introduction to MPEG-7: Multimedia Content Description Standard, Wiley, New York, 2001.
[9] O. Marques and N. Barman, "Semi-automatic semantic annotation of images using machine learning techniques," in D. Fensel, K. Sycara, and J. Mylopoulos, eds., Proc. International Semantic Web Conference, Florida, 2003, pp. 550–565.
[10] N. Rasiwasia, P. J. Moreno, and N. Vasconcelos, "Bridging the gap: query by semantic example," IEEE Transactions on Multimedia, vol. 9(5), 2007, pp. 923–938.
[11] O. R. Zaiane, M. L. Antonie, and A. Coman, "Application of data mining techniques for medical image classification," Proc. Workshop on Multimedia Data Mining (MDM/KDD 2002), San Francisco, USA, 2002.
