Algorithms for Reducing the Semantic Gap in Image Retrieval Systems

Anca Loredana Ion†
†University of Craiova, Romania

HSI 2009, Catania, Italy, May 21-23, 2009

Abstract — In this paper we study the possibility of discovering correlations between primitive visual characteristics and the semantic concepts of images, that is, of extracting semantic meaning from an image database through learning. We approach the problem of automatically discovering semantic inference rules. A semantic rule is a combination of semantic indicator values (visual elements) that identifies the semantic concepts of images. The annotation procedure starts with the generation of semantic rules for each image category. The language used to represent the rules is Prolog; its advantages are flexibility and simplicity in the representation of rules. Our methods are not limited to any specific domain and can be applied in any field.

Keywords — image annotation, association rules, mining images, visual features.

I. INTRODUCTION

Our motivation to organize information is inherent. Over time, humans have learned that organization is the key to progressing without losing what they already know. When we begin to possess more than we can organize and arrange manually, we build tools for efficient, precise, and rapid information organization [9]. The interpretation of what we see is far harder to characterize, and to teach to a machine, so that visual information could be organized automatically.

In recent decades, ambitious attempts to train machines to learn, index, and annotate images have taken place, with great progress [3], [4]. The power of association rules is exploited in many domains, from the semantic web to image mining [7].

At present, no algorithmic model of human vision, and in particular of the understanding of image context, is universally accepted. It is therefore not surprising to see continuous efforts to develop new methods for modelling multimedia information, either by reworking previous approaches or by exploring innovative directions.

II. VISUAL CONTENT AND SEMANTIC GAP

In general, the semantic gap is due to the following problems:
- the difficulty of extracting complete semantic concepts from images, i.e., recognizing and understanding objects, known as the problem of semantic concept extraction [3];
- the complexity, ambiguity, and subjectivity of human interpretation, known as the problem of semantic concept interpretation [3].

The problem of semantic concept extraction arises because a real object can have several appearances, owing to physical aspects such as lighting conditions, to the parameters of the capture devices (focus, angle, distance), or to different expressions, positions, etc. As a result, the same real object may not always have the same colour, texture, or shape.

The problem of semantic concept interpretation is due to many factors, such as cultural differences and education, which affect the user's interpretation model. Moreover, human perception and judgement are not time invariant.

Finding semantic rules capable of discovering the semantic concepts of images is the challenge of this study. Even if semantic concepts are not directly related to visual features (colour, texture, shape, position, dimension, etc.), these attributes capture the understanding of semantic concepts. Visual descriptors are easier to select and compute on specialized databases, in which a priori knowledge can be used to describe image content. Our study is done on a general database, and the methods are domain independent.

From the experiments carried out in [1], we deduce the importance of semantic concepts for establishing image similarity. We experimented with various low-level image features to establish their correlation with semantic categories. Thus, each semantic category is translated into computed low-level image features. Each category is described in terms of the objects most likely to appear in the images of that category. For example, in images from the "cliff" category, the representative objects are sky, water, and cliff face.



III. SELECTION OF VISUAL FEATURES AND THEIR MAPPINGS TO SEMANTIC INDICATORS

The selection of the visual feature set and of the image segmentation algorithm is the decisive stage of the semantic image annotation process.

A. Image segmentation based on the colour feature

The ability and efficiency of the colour feature in characterizing perceptual colour similarity are strongly influenced by the choice of colour space and quantization scheme. A set of dominant colours determined for each image provides a compact description that is easy to implement. The HSV colour space quantized to 166 colours is used to represent the colour information. Before segmentation, the images are converted from the RGB to the HSV colour space and quantized to 166 colours. The colour regions are extracted with the colour set back-projection algorithm [1], [5], [8].
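The paper does not spell out the quantization step, but the 166-colour HSV partition of [5] uses 18 hue bins x 3 saturation bins x 3 value bins plus 4 grey levels (162 + 4 = 166). A minimal Prolog sketch of such a quantizer is given below; the exact bin boundaries are our assumptions, not values from the paper:

% quantizeHSV(+H, +S, +V, -Index): map a pixel with hue H in [0, 360)
% and saturation/value S, V in [0, 1] to one of 166 colour indices.
quantizeHSV(_H, S, V, Index) :-        % near-grey pixels: 4 grey bins (162-165)
    S < 0.05, !,
    Index is 162 + min(3, floor(V * 4)).
quantizeHSV(H, S, V, Index) :-         % chromatic pixels: 18 x 3 x 3 bins (0-161)
    HBin is floor(H / 20) mod 18,
    SBin is min(2, floor(S * 3)),
    VBin is min(2, floor(V * 3)),
    Index is HBin * 9 + SBin * 3 + VBin.

For example, quantizeHSV(0, 0.9, 0.9, I) binds I to 8, the bin of a saturated, bright red.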

Each region is described by the following visual characteristics:
- The colour characteristic is represented in the HSV colour space quantized to 166 colours. A region is represented by a colour index, which is an integer between 0 and 165. It is denoted descriptor F1.
- The spatial coherency is a region descriptor that measures the spatial compactness of the pixels of the same colour. It is denoted descriptor F2.
- A seven-dimensional vector (maximum probability, energy, entropy, contrast, cluster shade, cluster prominence, correlation) represents the texture characteristic. It is denoted descriptor F3.
- The region dimension descriptor is the number of pixels in the region. It is denoted descriptor F4.
- The spatial information is represented by the centroid coordinates of the region and by its minimum bounding rectangle. It is denoted descriptor F5.
- A two-dimensional vector (eccentricity and compactness) represents the shape feature. It is denoted descriptor F6.

B. Mapping low-level features to semantic indicators

Based on experiments, we construct a vocabulary for image annotation built on the concept of semantic indicators, whereas the syntax captures the basic models of human perception of patterns and semantic categories.

In this study, the representation language is simple, because the syntax and vocabulary are elementary. The words of the language are limited to the names of the semantic indicators. Being visual elements, the semantic indicators are, for example: colour (colour-light-red), spatial coherency (spatial-coherency-small, spatial-coherency-medium, spatial-coherency-big), texture (energy-small, energy-medium, energy-big, etc.), dimension (dimension-small, dimension-medium, dimension-big, etc.), position (vertical-upper, vertical-center, vertical-bottom, horizontal-upper, etc.), and shape (eccentricity-small, compactness-small, etc.).

The syntax is represented by the model, which describes the images in terms of semantic indicator values. The values of each semantic descriptor are mapped to a value domain, which corresponds to the mathematical descriptor.

The value domains for the visual characteristics were established through manual experiments on images of W×H dimension.

A figure is represented in Prolog by a term of the form figure(ListofRegions), where ListofRegions is a list of image regions. For the region representation, the term region(ListofDescriptors) is used; its argument is a list of terms specifying the descriptors (semantic indicators). For each semantic indicator the term has the form:

descriptor(DescriptorName, DescriptorValue).
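For illustration, a figure with two regions might be encoded as follows (all indicator names and values here are hypothetical, chosen only to show the shape of the terms):

% A hypothetical two-region figure: a large light-blue region at the top
% and a small dark-red region at the bottom.
figure([
    region([ descriptor(colour, blue_light),
             descriptor(spatial_coherency, big),
             descriptor(dimension, big),
             descriptor(vertical_position, upper) ]),
    region([ descriptor(colour, red_dark),
             descriptor(spatial_coherency, medium),
             descriptor(dimension, small),
             descriptor(vertical_position, bottom) ])
]).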

The mapping between the values of the low-level (mathematical) descriptors and the values of the semantic indicators is based on experiments carried out on images from different categories.

We use facts of the form:

mappingDescriptor(Name, SemanticValue, ListValues).

The argument Name is the name of the semantic indicator, SemanticValue is the value of the semantic indicator (descriptor), and ListValues is a list of mathematical values and closed intervals, the latter described by terms of the form interval(InferiorLimit, SuperiorLimit). For example, the facts:

mappingDescriptor(colour, red_medium, [144, 134]).
mappingDescriptor(colour, red_dark, [100, interval(105, 111), 123]).

state that the semantic indicator value red_medium corresponds to the colour values 144 and 134, and red_dark to the values 100 and 123 and to all values in the interval [105, 111].

The mapping mechanism has the following Prolog representation:

% mapDescriptor(+MathDescriptor, -SemanticDescriptor): translate a
% low-level descriptor value into its semantic indicator value.
mapDescriptor(descriptor(Name, MathematicalValue),
              descriptor(Name, SemanticValue)) :-
    mappingDescriptor(Name, SemanticValue, ListValues),
    containValue(ListValues, MathematicalValue).

% containValue(+ListValues, +Value): Value is an element of the list or
% lies inside one of its closed intervals.
containValue([Value|_], Value).
containValue([interval(InferiorLimit, SuperiorLimit)|_], Value) :-
    InferiorLimit =< Value,
    Value =< SuperiorLimit.
containValue([_|ListValues], Value) :-
    containValue(ListValues, Value).
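With the example facts above loaded, the mapping can be queried directly; e.g., in SWI-Prolog one would expect:

?- mapDescriptor(descriptor(colour, 107), D).
D = descriptor(colour, red_dark).

since 107 lies in the interval [105, 111] listed for red_dark.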

IV. ALGORITHMS FOR ASSOCIATION RULES GENERATION

In this paper, knowledge mining from images is used to define rules that convert the low-level primitives of images into high-level semantic concepts [11]. The methods used in this study bring important improvements related to the detailed


descriptions of images, which are necessary for defining relationships between:
- objects/regions,
- classes of visual characteristics,
- objects/regions and classes of visual characteristics.

The language used for rule representation is the declarative language Prolog, which has the following advantages: it is well suited to rule interpretation, rule generation in Prolog is easy, and the process is not time consuming.

The annotation method includes two phases:
o the learning/training phase, in which the rules are generated for each image category;
o the testing/annotation phase, in which new images are annotated using the semantic rules.

We suppose we have an image database DB. In the learning/training phase, the set U = {S1, ..., Sn} is a subset of DB containing n example images, labelled with the semantic concept, on which we train the system and generate semantic rules. The goal of the learning phase is to automatically generate, from the labelled example images, semantic rules R that identify the semantic concepts of images. A rule determines which set of semantic indicators best identifies a semantic concept. In the testing/annotation phase, for each image of the testing subset, namely the images from DB that are not in U, we use the generated semantic rules to label the image with one or more concepts.

The logic language Prolog is used to represent the semantic rules, and a Prolog inference mechanism is used to recognize the high-level semantic features.

A. Semantic rules generation

The algorithm for semantic rule generation is based on the Apriori algorithm for finding frequent itemsets. The choice of itemsets and transactions is a domain-dependent problem. In market-basket analysis, the items are products, and the transactions are itemsets bought together.

The scope of image association rules is to find semantic relationships between image objects. To use association rules for discovering the semantic information of images, the images must be modelled in terms of itemsets and transactions:
- the set of images of the same category represents the transactions;
- the items are the colours of the image regions;
- the frequent itemsets are the itemsets with support greater than or equal to the minimum support; every subset of a frequent itemset is also frequent;
- the itemsets of cardinality between 1 and k are found iteratively (k-length itemsets);
- the frequent itemsets are used for rule generation.

In our method, the Apriori algorithm is used to discover semantic association rules between the primitive characteristics extracted from images and the categories/semantic concepts to which the images belong. A semantic rule describes the frequent characteristics of each category, based on the Apriori rule generation algorithm [6]. The algorithm for semantic rule generation is described in pseudo-code:

Algorithm 3.2: generation of rules for each image category in the database.
Input: the set of images; each image I = (f1, ..., fk), where fm is the colour of image region m.
Output: the semantic association rules.
Method:
  Ck: the candidate colour sets of size k
  Lk: the frequent colour sets of size k
  Rules: the set of rules constructed from the frequent itemsets, for k > 1
  L1 = {frequent region colours};
  for (k = 1; Lk != empty; k++) do begin
    Ck+1 = candidates generated from Lk;
    foreach transaction t in the database do
      increment the count of each candidate in Ck+1 that appears in t
    end;
    Lk+1 = candidates from Ck+1 whose support is greater than or equal to min_support
  end;
  Rules = Rules + {Lk+1 -> category}.
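The core of each Apriori iteration is counting, for every candidate itemset, how many transactions contain it. A minimal SWI-Prolog sketch of this support computation is given below; the representation of transactions as lists of region colours is our assumption, not code from the paper:

% support(+Itemset, +Transactions, -Count): the number of transactions
% (each a list of region colours) that contain every item of Itemset.
support(Itemset, Transactions, Count) :-
    include(containsAll(Itemset), Transactions, Matching),
    length(Matching, Count).

containsAll(Itemset, Transaction) :-
    subset(Itemset, Transaction).

% frequent(+Itemset, +Transactions, +MinSupport): succeeds when the
% itemset reaches the minimum support threshold.
frequent(Itemset, Transactions, MinSupport) :-
    support(Itemset, Transactions, Count),
    Count >= MinSupport.

For instance, with hypothetical colour atoms, support([sky_blue, water_blue], [[sky_blue, water_blue, sand], [sky_blue, grass_green]], S) binds S to 1.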

In the second variant, the rule generation algorithm takes into account all the region features, not only the colour as in the first variant. This algorithm is based on "region patterns" and requires a pre-processing phase that determines the visual similarity between the image regions of the same category.

In the pre-processing phase, the region patterns that appear in the images are determined. Each image region Regij is compared with the other image regions from the same category. If the region Regij matches another region Regkm, having in common the features on positions n1, n2, ..., nc, then the generated region pattern is RSj = (-, -, n1, n2, ..., nc, -, -), and the other features are ignored.

The algorithm for generating the image region patterns is described in pseudo-code:

Algorithm 3.3: generation of region patterns for each image.
Input: the set of images {I1, I2, ..., In} belonging to the same category; each image I is a set of regions Regim.
Output: the set of region patterns for each image.
Method:
  for i1 = 1, n-1 do
    for j1 = 1, Ii1.nregions do
      for i2 = i1+1, n do
        for j2 = 1, Ii2.nregions do
          if Regi1j1 matches Regi2j2 in the components n1, n2, ..., nc then
            RSi1 = RSi1 + (-, -, n1, n2, ..., nc, -, -)
            RSi2 = RSi2 + (-, -, n1, n2, ..., nc, -, -)
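The pairwise comparison at the heart of this algorithm can be sketched in Prolog: two regions, represented as lists of feature values, are walked position by position, keeping the shared features and writing '-' where they differ (this list representation is our assumption):

% regionPattern(+Region1, +Region2, -Pattern): keep the features the two
% regions share and replace each differing position with '-'.
regionPattern([], [], []).
regionPattern([F|Fs], [F|Gs], [F|Ps]) :-
    regionPattern(Fs, Gs, Ps).
regionPattern([F|Fs], [G|Gs], ['-'|Ps]) :-
    F \= G,
    regionPattern(Fs, Gs, Ps).

For example, regionPattern([a, b, c, d], [a, b, c, f], P) binds P to [a, b, c, -], which is exactly the pattern R(a,b,c,-) obtained from images 1 and 2 in the tables below.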

We consider a database with five images relevant for a category. For this category, an example of the representation of the images using region patterns is the following:

TABLE 1: RELEVANT IMAGES FOR A CERTAIN CATEGORY.

ImgID  Image regions
1      R(a,b,c,d), R(a',b',c',d')
2      R(a,b,c,f), R(a',b',c',d')
3      R(a,b,c,f''), R(a',b',c',i)
4      R(a,b,c,f'), R(a',b',c',i')
5      R(a,b,e,d), R(m,b',c',d')

By applying the algorithm for region pattern generation, the following results are obtained:

TABLE 2: REGION PATTERNS OF THE IMAGES OF A CERTAIN CATEGORY.

ID  Region patterns
1   R(a,b,c,-), R(a,b,-,d), R(a',b',c',d'), R(a',b',c',-), R(-,b',c',d')
2   R(a,b,c,-), R(a,b,-,-), R(a',b',c',d'), R(a',b',c',-), R(-,b',c',d')
3   R(a,b,c,-), R(a,b,-,-), R(a',b',c',-), R(-,b',c',-)
4   R(a,b,c,-), R(a,b,-,-), R(a',b',c',-), R(-,b',c',-)
5   R(a,b,-,d), R(a,b,-,-), R(-,b',c',d'), R(-,b',c',-)

As can be observed, the number of region patterns is large, but it is pruned in the rule elimination phase described in the next section.

In this case, the modelling of the images in terms of itemsets and transactions is the following:
- the transactions are the sets of region patterns determined by the previous algorithm;
- the items are the region patterns of the images belonging to the same category;
- the frequent itemsets are the itemsets with support greater than the minimum support;
- the itemsets of cardinality between 1 and k are found iteratively;
- the frequent itemsets are used for rule generation.

The algorithm for rule generation based on region patterns is described in pseudo-code:

Algorithm 3.4: rule generation based on region patterns.
Input: the set of images represented as I = (RS1, ..., RSk), where RSm is a region pattern.
Output: the set of association rules.
Method:
  Ck: the set of region patterns of length k
  Lk: the set of frequent region patterns of length k
  Rules: the set of rules constructed from the frequent itemsets, for k > 1
  L1 = {frequent region patterns};
  for (k = 1; Lk != empty; k++) do begin
    Ck+1 = candidates generated from Lk;
    foreach transaction t in the database do
      increment the count of each candidate in Ck+1 that appears in t
    end;
    Lk+1 = candidates from Ck+1 whose support is greater than or equal to min_support
  end;
  Rules = Rules + {Lk+1 -> category}.

For the example above, the category of the image database is determined by the following semantic association rules:

R(a,b,c,-) and R(a',b',c',-) -> category
R(a,b,c,-) and R(a,b,-,-) -> category
R(a',b',c',-) and R(a,b,-,-) -> category
R(a,b,-,-) and R(-,b',c',-) -> category
R(a,b,c,-) and R(a',b',c',-) and R(a,b,-,-) -> category

B. Pruning the rules

The number of rules that can be generated is usually large. Two problems arise in this case: first, the rule set can contain noise, which affects the classification accuracy; second, the large number of rules can also affect the classification time. This is an important issue in real applications, which require rapid responses. On the other hand, eliminating rules can itself affect the classification accuracy. The elimination methods are the following:
- elimination of specific rules, keeping the rules with high confidence;
- elimination of rules that can introduce errors in the classification process.

The following definitions introduce some notions used in this section.

Definition 1: Given two rules R1 => C and R2 => C, the first rule is called general if R1 ⊆ R2; the second one is called specific.

Definition 2: Given two rules R1 and R2, R1 is called stronger than R2 (or R2 weaker than R1) if:
(1) R1 has greater confidence than R2; or
(2) the confidences are equal, and support(R1) is greater than support(R2); or
(3) both the supports and the confidences are equal, and R1 has fewer attributes than R2.

The algorithm for eliminating the weak and specific semantic rules is described in pseudo-code:

Algorithm: elimination of weak and specific rules.
Input: the set of semantic rules S generated for a category C.
Output: the set of rules to be used for classification.
Method:
  sort the rules for category C in conformity with Definition 1
  foreach rule in S do begin
    find the most specific rules
    eliminate the rules with the smallest confidence
  end.
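Definitions 1 and 2 translate directly into Prolog. In the sketch below, ruleStats(Confidence, Support, NAttributes) is a hypothetical summary term for a rule (not a representation used by the paper), and a rule antecedent is a list of attributes:

% moreGeneral(+Antecedent1, +Antecedent2): Definition 1, R1 is general
% with respect to R2 when R1 is a subset of R2.
moreGeneral(R1, R2) :-
    subset(R1, R2).

% stronger(+Stats1, +Stats2): Definition 2, checked clause by clause:
% higher confidence; equal confidence and higher support; or equal
% confidence and support and fewer attributes.
stronger(ruleStats(C1, _, _),  ruleStats(C2, _, _))  :- C1 > C2.
stronger(ruleStats(C,  S1, _), ruleStats(C,  S2, _)) :- S1 > S2.
stronger(ruleStats(C,  S, A1), ruleStats(C,  S, A2)) :- A1 < A2.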

V. IMAGE CLASSIFICATION

The classifier is the set of rules selected by the elimination process. It is used to predict the category of an image. Given a new image, the classification process uses the semantic inference rules to find its most appropriate semantic category. Before classification, the image is automatically


processed:
- the mathematical and semantic descriptors are generated; the semantic descriptors are saved as Prolog facts;
- the semantic rules are applied to the fact set, using the Prolog inference engine.

In this study, a new method for the semantic annotation/classification of images using semantic rules, called the "perfect match classification method", is proposed and developed. A semantic rule matches an image if all the characteristics that appear in the body of the rule also appear among the characteristics of the image.

Algorithm: perfect match classification.
Input: a new image I to be classified, and the rule set, each rule R having confidence R.conf.
Output: the category with which the image will be annotated and the matching score.
Method:
  S = Ø
  foreach rule R in Rules do begin
    if R matches I then
      keep R: S = S + {R}
  end.
  divide S into subsets based on the identified category: S1, S2, ..., Sn
  foreach subset Sk of S do
    sum the confidences of all rules in Sk
  end.
  add image I to the category identified by the rules of the subset with the greatest summed confidence:
  I.score = max over k of the sum of R.conf for R in Sk

The rules are represented in Prolog as facts:

rule(Category, Score, ListofRegionPatterns).

The patterns in ListofRegionPatterns are terms of the form:

regionPattern(ListofPatternDescriptors).

The patterns in the descriptor list specify the set of possible values for a certain descriptor name. The form of this term is:

descriptorPattern(DescriptorName, ValueList).

The value list has the same form as the argument used for mapping the semantic descriptors. The rule application mechanism is invoked by means of the predicate isCategory(Figure, Category, Score), which has a single clause:

isCategory(figure(ListofRegions), Category, Score) :-
    rule(Category, Score, ListofRegionPatterns),
    verify(ListofRegions, ListofRegionPatterns).

The predicate verify tests that for each pattern of the rule there is a region that matches the pattern:

verify(_, []).
verify(ListofRegions, [RegionPattern|ListofRegionPatterns]) :-
    containsRegion(ListofRegions, RegionPattern),
    verify(ListofRegions, ListofRegionPatterns).

The predicate containsRegion searches the list of regions describing a figure for a region that matches a region pattern:

containsRegion([region(ListofDescriptors)|_],
               regionPattern(ListofPatternDescriptors)) :-
    matches(ListofDescriptors, ListofPatternDescriptors).
containsRegion([_|Rest], RegionPattern) :-
    containsRegion(Rest, RegionPattern).

A region matches a region pattern if for each descriptor pattern of the region pattern there is a region descriptor whose value belongs to the value set of the descriptor pattern:

matches(_, []).
matches(ListofDescriptors, [PatternDescriptor|RestPatterns]) :-
    matchesPattern(ListofDescriptors, PatternDescriptor),
    matches(ListofDescriptors, RestPatterns).

The predicate matchesPattern tests a single descriptor pattern; it reuses the predicate containValue defined in Section III:

matchesPattern([descriptor(DescriptorName, DescriptorValue)|_],
               descriptorPattern(DescriptorName, ValueList)) :-
    containValue(ValueList, DescriptorValue).
matchesPattern([_|Descriptors], PatternDescriptor) :-
    matchesPattern(Descriptors, PatternDescriptor).
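The final steps of the perfect match algorithm (summing the confidences of the matching rules per category and choosing the maximum) can be sketched on top of isCategory/3. The predicate names below are our own, and aggregate_all/3 is an SWI-Prolog built-in:

% categoryScore(+Figure, ?Category, -Score): sum of the confidences of
% all rules of Category that match Figure (the sum over Sk above).
categoryScore(Figure, Category, Score) :-
    setof(C, S^P^rule(C, S, P), Categories),
    member(Category, Categories),
    aggregate_all(sum(S), isCategory(Figure, Category, S), Score),
    Score > 0.

% bestCategory(+Figure, -Category, -Score): the category whose matching
% rules accumulate the greatest total confidence.
bestCategory(Figure, Category, Score) :-
    aggregate_all(max(S, C), categoryScore(Figure, C, S), max(Score, Category)).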

VI. EXPERIMENTS AND RESULTS

A. Image databases for learning and testing

Applying the learning results (the semantic rules) to images other than the ones used in the learning process is much more difficult. In the experiments realized in this study, two databases are used for the learning and testing processes.

The database used for learning contains 200 images from different nature categories and is used to learn the correlations between images and semantic concepts. All the images in the database are in JPEG format and have different dimensions. The database used in the learning process is categorized into 50 semantic concepts.

The categories used for learning are illustrated in the next table:

TABLE 3: DATABASE CATEGORIES.

ID  Category   Category keywords
1   Fire       fire, night, light
2   Iceberg    iceberg, landscape, ocean
3   Tree       tree, sky, landscape, nature
4   Sunset     sunset, nature, landscape, sun, evening, sky
5   Cliff      cliff, rock, sea, nature, landscape, sky
6   Desert     desert, dry, sand
7   Red Rose   rose, flower, love
8   Elephant   elephant, animal, jungle
9   Mountain   mountain, lake, rock, sky, landscape
10  Sea        sea, sky, water, cliff, landscape
11  Flower     rose, daisy, clover

The keywords are manually added to each category to describe the images used in the learning process. The descriptions of these images range from simple concepts like "flower, mushrooms" to complex ones like "mountains, cliff, lake". On average, 3.5 keywords were used to describe each category. The manual annotation of the images used for learning the semantic rules took about 7 hours.

B. Performance study of the methods for semantic rule generation

To validate and test the efficiency of the image annotation process, an experimental framework was designed and implemented based on the methods described above. For learning a category, 20 images are used, and the annotation process is tested on a database of 500 images. An image is considered correctly annotated if the system predicts the appropriate image category.

Depending on the complexity of the image category, the learning process, which generates the semantic rules, can take between 30 and 60 seconds. The semantic rules of each category are stored as Prolog facts.

To test the efficiency and performance of rule generation (Algorithm 3.2 and Algorithm 3.4), the percentage of images correctly classified by each of the two algorithms is computed for every image category.

Fig. 1. Category vs. percentage of images correctly classified by the system using Algorithm 3.2 and Algorithm 3.4.

From the experiments, we deduce that Algorithm 3.4 achieves better results, because it selects the image characteristics with the greatest probability of occurrence and offers greater generality.

VII. CONCLUSION

In this study we propose methods for semantic image annotation based on visual content. To establish correlations with semantic categories, we experimented with and selected a number of low-level visual characteristics of images. Thus, each category is translated into visually computable characteristics and into terms of the objects that have the greatest probability of appearing in an image of that category. On the other hand, images are represented as lists of single-colour regions, which are mapped to semantic indicators.

Our methods have the limitation that they cannot learn every semantic concept, because the segmentation algorithm is not capable of segmenting images into real objects. Improvements can be achieved by using a segmentation method with greater semantic accuracy.

REFERENCES
[1] A. L. Ion, L. Stanescu, D. Burdescu, S. Udristoiu, "Improving an Image Retrieval System by Integrating Semantic Features", HSI 2008, IEEE Catalog Number: 08EX1995C, ISBN: 1-4244-1543-8, 2008, pp. 504-509.
[2] U. Srinivasan, S. Nepal, Managing Multimedia Semantics, IRM Press, 2005.
[3] V. Mezaris, I. Kompatsiaris, M. G. Strintzis, "Region-based Image Retrieval Using an Object Ontology and Relevance Feedback", EURASIP JASP, vol. 2004, no. 6, 2004, pp. 886-901.
[4] O. Marques, N. Barman, "Semi-automatic Semantic Annotation of Images Using Machine Learning Techniques", in D. Fensel, K. Sycara, and J. Mylopoulos, editors, Proceedings of the International Semantic Web Conference, 2003, pp. 550-565.
[5] J. R. Smith, S.-F. Chang, "VisualSEEk: a Fully Automated Content-Based Image Query System", The Fourth ACM International Multimedia Conference and Exhibition, Boston, MA, USA, 1996.
[6] W. J. Frawley, G. Piatetsky-Shapiro, C. J. Matheus, "Knowledge Discovery in Databases: An Overview", AI Magazine, vol. 13, no. 3, 1992, pp. 57-70.
[7] Y. Takama, S. Hattori, "Mining Association Rules for Adaptive Search Engine Based on RDF Technology", IEEE Transactions on Industrial Electronics, vol. 54, no. 2, 2007, pp. 790-796.
[8] Z. Stejic, Y. Takama, K. Hirota, "Relevance Feedback-Based Image Retrieval Interface Incorporating Region and Feature Saliency Patterns as Visualizable Image Similarity Criteria", IEEE Transactions on Industrial Electronics, vol. 50, no. 5, 2003, pp. 839-852.
[9] N. Rasiwasia, P. J. Moreno, N. Vasconcelos, "Bridging the Gap: Query by Semantic Example", IEEE Transactions on Multimedia, vol. 9, no. 5, 2007, pp. 923-938.
[10] A. W. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain, "Content-Based Image Retrieval at the End of the Early Years", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 12, 2000, pp. 1349-1380.
[11] G. Carneiro, A. Chan, P. Moreno, N. Vasconcelos, "Supervised Learning of Semantic Classes for Image Annotation and Retrieval", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 3, 2007, pp. 394-410.
[12] A. W. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain, "Content-Based Image Retrieval at the End of the Early Years", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 12, 2000, pp. 1349-1380.