
Pergamon / Information Processing & Management, Vol. 32, No. 5, pp. 573-600, 1996

Copyright © 1996 Elsevier Science Ltd. Printed in Great Britain. All rights reserved

0306-4573/96 $15+0.00

S0306-4573(96)00011-8

CONTEXT-SENSITIVE PROCESSING OF SEMANTIC QUERIES IN AN IMAGE DATABASE SYSTEM

HUSSAIN SABRI SHAKIR* and MAKOTO NAGAO
Department of Electrical Engineering, Faculty of Engineering, Kyoto University, Japan

(Received 1 May 1995; accepted in final form 25 January 1996)

Abstract--In an image database environment, an image can be retrieved using common names (labels) of entities that appear in it (such as door, book, car, etc.). In many cases, we specify further details of these entities and some relations between them. A semantic query is the formal expression method of label-based image retrieval requests. This paper shows how an image is abstracted into a hierarchy of entity names and features (such as brightness, length, etc.), and how relations are established between entities visible in the image. Semantic queries are also hierarchical. However, they are short and often overlook certain levels of the abstraction of the requested image. The core of this paper is a fuzzy matching technique that compares semantic queries to image abstractions by assessing the similarity of contexts between the query and the candidate image. An important objective of this matching technique is to distinguish between abstractions of different images that have the same labels but differ in context from each other. Each image is tagged with a matching degree (against the query) even when it does not provide an exact match of the query. Several experiments have been conducted to evaluate the strategy presented in this paper. Copyright © 1996 Elsevier Science Ltd

1. INTRODUCTION

Semantic queries in image database systems are alphanumeric expressions used to retrieve images from image databases (IDB). The following features make semantic queries distinguishable from conventional alphanumeric queries (such as those in relational database systems):

(1) Associative relations

Using the following semantic query 'Retrieve all images in which there are two human figures, one of them larger than the other', we can retrieve the image of Fig. 1(a). The label human is given to two regions of the image. In a generic semantic query (a graphic example is given in Fig. 9), we can specify any number of entities by giving their names and then specify any number of relations (associative and non-associative) between these names. In a relational database system, however, we can specify data only in records, which are strictly compliant with a schema and have a fixed number of attributes. The problem hence arises when we try to register a whole-part relationship (which is an associative relation) between the whole image (the whole) and these two human figures (the parts), since the record belonging to the whole image will have two values (two pointers to the records of the two human figures).

" To whom all correspondence should be addressed at: QMARK Technologies Inc., 1925 Leslie Street, North York, Ontario, Canada M3B 2M3.

Abbreviations: IDB: Image Database; ST: Semantic Tree; GUI: Graphic User Interface; SeQ: Semantic Query; LM: Local Matching function (produces the matrix LMA); CM: Context Matching function (produces the matrix CMA); GM: Gross Matching function (produces the matrix GMA); UMD: Universal Matching Degree function (produces a single value).



Fig. 1. Sample IDB images.

Many associative relations exist between the human figures of Fig. 1(a), such as 'mother of', 'daughter of', and 'holding'. Associative relations are difficult for relational database systems to handle due to the non-specificity of the number of relations linking the same two data entities (a data entity corresponds to a record in a relational database).

(2) Hierarchy

Using the following semantic query 'Retrieve all images with a lady wearing a kimono that has a wide sleeve', we can retrieve the image of Fig. 1(b). Entities in this query are organized in a major-to-minor fashion. The label lady in this example is a major component. The label kimono is a component of lady (the area of kimono totally falls within the area of lady). The label sleeve is a component of kimono. In general, when we describe an image in a semantic query, we specify a number of entity labels and also give some minor details about them.

Neither associative relations nor hierarchies are constraints in conventional alphanumeric database systems (such as relational database systems). In these systems, data is stored in records. Each record consists of several attributes that are independent of each other. When we wish to retrieve data records, we specify a number of key values, as in the following example of a bank database:

Retrieve full name, address, and full occupation of all customers whose surname is Tanaka and who have deposited more than 500,000 Yen.*

The database engine then searches for records that have all key values within their attributes. No relation is specified between these values.

Semantic query interpretation is influenced by its hierarchy and relations (Shakir & Nagao, 1994). On many occasions, an intuitive similarity exists between two queries although they differ in their hierarchy or relations. For example, let us compare the following three expressions with each other:

(Q1) Retrieve an image that has a head with a face with two eyes.
(Q2) Retrieve an image that has a head with two eyes.
(Q3) Retrieve an image that has a head with an eye and a face with an eye.

If we consider Q2 above as our model, the question arises: which of the other two expressions can be considered similar to our model? Intuitively, we would say it is the first, and not the third. The reason for this decision is that we consider Q2 to be an abstracted form of Q1, and the abstraction is acceptable here since all remaining components (head and two eyes) maintained their positions in the Q2 hierarchy after omitting face. The same cannot be said with respect to Q3, and it is therefore rejected. On the other hand, if we provide the same three labels in a relational database query, then Q1 will match Q3 and not Q2. This matching defies our perceptive judgment.

* Attribute names are underlined, while search keys are given in italics.

Several early and recent efforts took a moderate approach to handling pictorial semantics by using conventional database management strategies to cover the issues of acquisition and retrieval. Several extensions to the relational data model (Codd, 1979) were proposed (Tang, 1981; Chang & Liu, 1984). Further, other existing data specification/manipulation languages were extended. The well-known SQL was extended into a pictorial language called PSQL (Roussopoulos et al., 1988).

Other dedicated symbolic image retrieval methods were proposed (Chock et al., 1984; Chang et al., 1988; Benson & Zick, 1992; Gorsky & Mehrotra, 1992), as well as several object-oriented pictorial languages (Orenstein, 1986; Joseph & Cardenas, 1988; Orenstein & Manola, 1988). IDB systems that are based on these languages are mainly concerned with documenting non-pictorial information. Images are considered to be passive attributes in most of these languages. In an IDB of natural images, it is most probable that certain parts of a particular entity (such as a car) might be visible in one image but not in another (such as a headlight). The analysis of this entity (the car) therefore varies from one image to another. Documenting entities in the presence of picture-to-picture changes is very difficult using the methods above since they use a fixed number of components for each object (i.e. fixed data structures).

The object-oriented model (Kim et al., 1990) is basically a hierarchical architecture. Several IDB systems were developed according to this model such as VIMSYS (Gupta et al., 1991) and REMINDS (Gorsky & Mehrotra, 1992). These systems, like the languages mentioned above, work well with pre-defined object structures but suffer from the following problems:

(a) Over-specification: where several attributes are not visible in the image (e.g. when the image is of a car taken from the back, there are no headlights to be documented in the object-oriented structure of car).

(b) Under-specification: where a component is visible in the image but is not listed as a subclass of the parent object (e.g. when there is an image of a house standing on arches, these arches are not normally included in the structure of house).

From the above, it is evident that the issues related to semantic query processing are:

(1) Label spelling mistakes.
(2) Over-specification of an image entity.
(3) Under-specification of an image entity.
(4) Hierarchy errors at runtime (such as specifying the label Head as a part of Ears instead of the opposite).

Our strategy takes these problems into consideration. Besides matching labels with their instances in IDB images, a context check is carried out. This check sees whether labels linked to a particular query label (such as head and nose, which are linked to face) are also linked to an IDB instance of this label (face in this case). To address the first issue as well as several others (described later), a Graphic User Interface (GUI) for entering semantic queries is described.

2. IDB FEATURE EXTRACTION AND ORGANIZATION

Human perception of images is normally of a hierarchical nature. An image includes a number of entities, with each entity consisting of a number of components. The root of the image hierarchy is always the whole image, with prominent entities (such as man, car, house, etc.) appearing as child nodes of the root. Each entity is then analyzed into minor components (such as analyzing man into head, body, and limbs) until we reach a component that has no minor components of its own (such as hair). Each entity or component possesses data with respect to its contour, skeleton, area, and mean gray value. In addition, each entity or component possesses any number of labels (an entity that is called lady can also be called female, adult, and human). Our IDB system builds a vocabulary book that is retrievable by the system operator when requested, to assure consistency in labeling. To provide maximum flexibility, however, the system operator is free to enter as different, as many, or as few labels as she/he sees proper. The operator can even add labels that she/he did not enter for a similar entity in another image.

Our IDB data model, therefore, is a hierarchical one that emphasizes the decomposition of major image components into smaller components. As shown in Fig. 2, feature information extraction (during the IDB training phase) is a recursive operation. We developed a high level interface for feature extraction and hierarchy construction that proceeds as follows:

(a) Preliminary operations

(1) After an image is selected for processing [as in Fig. 3(a)], it is given a name (tree root name) and its edge diagram is detected. To generate the edge diagram, a smoothing function is used first to eliminate noise. The edge generating function then subtracts the gray level of each pixel from that of its neighbors. The result leaves only contrasting neighbors visible while the rest are reduced to extremely low values. The edge diagram of the image in Fig. 3(a) is given in Fig. 3(b).

(2) The edge diagram is provided to the system operator as a drawing background. A drawing tool is also provided to allow the user to outline the regions of interest.

(b) Regional feature extraction

(1) The system operator draws a loop around the area of interest as in Fig. 3(c). The edge diagram is used here as a drawing guide. The drawing tool is equipped with a loop completion function that is activated optionally to ensure that the area of interest is well defined.

(2) The selected area of interest (which is defined by the loop) is extracted from the original image using a segmentation algorithm that is sensitive only to the pen of the drawing tool, as in Fig. 3(d).

Fig. 2. Operational block diagram of the image hierarchy extraction module.

(3) Some features are extracted from the isolated region. Extracted features are written into the IDB structure in a record corresponding to this region of the image. This record includes the contour string (a string of (x, y) coordinate pairs), smallest encapsulating rectangle, region area, and hierarchy pointers (for parent and daughter tree nodes).

(4) A number of labels (such as Man, Adult, and Male) are entered for the isolated region by the system operator. These labels are also stored in the same IDB record as the features in (3) above. Figure 3(e) shows some of the stored values of extracted and assigned features for the region Statue in the image of Fig. 3(a).


(c) Decomposition of the current region into minor regions

(1) A menu appears on the screen that allows the following options:

(i) To outline a new region from within the region that was just extracted in (b) above (for example, this allows the user to outline the Head region from within the Statue region).

(ii) To outline a new region somewhere other than the area occupied by the region extracted in (b) above (this allows the user to extract the region Ball from the original image, for example).


Fig. 3. Semantic tree establishment. (a) Selected image; (b) edge diagram; (c) region outlining; (d) region extracted; (e) extracted and assigned features: (1) color distribution histogram (extracted), where the label 'Statue' of the smaller window is an assigned feature; (2) edge diagram and skeleton (extracted); (3) physical image region (extracted); (4) smallest encapsulating rectangle (extracted); (5) distance-from-boundary diagram (extracted); (f) new drawing background; (g) secondary region outlining; (h) secondary region extracted; (i) complete hierarchy of nodes.


(2) If the operator selected (i) above, the area of the edge diagram corresponding to the region extracted in (b) above is used as a new drawing background while the rest of the edge diagram is blanked as in Fig. 3(f). The region outlined now [as in Fig. 3(g)] will be registered [when extracted, as in Fig. 3(h)] as a sub-region of the region extracted in (b) above.

(3) If the operator selected (ii) above, a new region is outlined from an area that is not included in the region in (b) above.

(d) Relations establishment

(1) After the hierarchy of the image is completed, the system operator may (manually) link any two nodes of the hierarchy with an associative relation that has a suitable title [e.g. the operator may link the man and the ball in the image of Fig. 3(a) by the relation carrying]. To give this relation a name, the operator can either enter the name directly or recall the dictionary of the relations entered so far in the IDB.

(2) Comparisons are carried out between feature values of hierarchy nodes (each of which corresponds to an image region). The relation darker than, for example, is detected automatically when the mean gray value of one node is less than that of another.
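As an illustration of this automatic step, a minimal Python sketch is given below; the Region fields and relation names are only stand-ins for the feature record described earlier, not the system's actual data structures.

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass
class Region:
    name: str
    mean_gray: float   # mean gray value of the region
    area: float        # region area in pixels

def extract_relations(regions):
    """Return (source, relation, sink) triples detected by comparing feature values."""
    found = []
    for a, b in permutations(regions, 2):
        if a.mean_gray < b.mean_gray:
            found.append((a.name, "darker_than", b.name))
        if a.area > b.area:
            found.append((a.name, "bigger_than", b.name))
    return found

print(extract_relations([Region("Statue", 90.0, 5000.0), Region("Ball", 160.0, 800.0)]))
# [('Statue', 'darker_than', 'Ball'), ('Statue', 'bigger_than', 'Ball')]
```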

The tree given in Fig. 3(i) is the graphical (mouse-sensitive) interface we developed to manage the data retrieval of particular image regions (relation arcs are not shown here to improve visibility). This tree is the final result of the procedure described above. The system operator first outlined the region Statue from within the image that he calls Japanese Culture Dancer. From within the Statue region, the operator then outlined the regions Head, Belt, Lower Limbs, Robe, and Upper Limbs.* The operator then outlined the region Hair from within the region Head, and also outlined the region Mouth from within the region Head.

Hierarchy construction is controlled, as seen in (c) above, by the selections that the operator makes in a number of decision menus. The resulting tree is best perceived as an analytical structure of visual entities. Each composite visual entity (such as Head) is analyzed into secondary components (Hair, Mouth, Eyes and Nose). This tree describes the semantics of these entities. For this reason we call this tree the semantic tree (ST).

We should point out an important performance point. Although IDB data preparation using the method described above depends on the judgment of the system operator on the importance of each image component, the proposed method is extremely beneficial for retrieval. Such human knowledge cannot be applied when a mathematical method is used to analyze the image, as is the case in segmentation-based IDB systems.† Using the proposed method, we can retrieve images or parts of images using labels such as Woman, Warrior, etc. Once an acceptable match is achieved, all other labels registered in a particular semantic tree node for a specific image region are retrieved.

Furthermore, if the label index is used, we can trace any label that was assigned to a different image region jointly with one of the labels available in a particular node of some target image. We can also select and suppress features in the query to generate a wide range of selective outputs. This is the method we used to retrieve the images in Fig. 4(b) and (c). It is obvious that a segmentation-based IDB system cannot accept a semantic query.

* Due to space limitations in the GUI boxes, only parts of these labels are shown. Retrieval of the full label is done by a mouse click on the corresponding box.

† A segmentation-based IDB system is one in which a mathematical function (the segmentation function) is used to divide the image into many regions (usually hundreds or thousands for natural images). This function is sensitive to changes in color and texture between neighboring image pixel groups.


3. PARTIAL MATCHING OF LABELS AND HIERARCHIES

3.1. Node structure in semantic trees and query formalization

A semantic tree is a hierarchical structure where each node corresponds to a record. The node structure consists of the following attributes:

A1 - Raw region data.
A2 - Contour diagram of the region.
A3 - Edge diagram of the region.
A4 - Gray scale population histogram of the region.
A5 - Smallest rectangular shape enclosing the region.
A6 - Representative Cartesian coordinates.
A7 - Center of gravity of the region.
A8 - Region area.
A9 - Distance from boundary diagram.
A10 - Region skeleton.
A11 - Shape feature list:
    A11.1 - Convex space occupation;
    A11.2 - Circularity;
    A11.3 - Rectangularity;
    A11.4 - Perimeter.
A12 - Texture feature list:
    A12.1 - Mean gray value;
    A12.2 - Local extrema;
    A12.3 - Orientation;
    A12.4 - Texture edges.
A13 - List of descriptive region labels:
    A13.1 - Extractable: such as Dark, Bright, Long, Short;
    A13.2 - Assigned: such as Eye, Gown, Samurai.
A14 - List of relation descriptive labels:
    A14.1 - Extractable: such as Bigger_than, Brighter_than;
    A14.2 - Assigned: such as Holding, Carrying.
A15 - Pointer to the parent region out of which the current region was extracted.
A16 - Pointers to the regions whose union generates the current region (child nodes).

Fig. 4. Selective feature retrieval. (a) Original image retrieved; (b) mean gray values retrieved; (c) contours retrieved.

Attributes A13 to A16 are of particular interest to semantic query processing.* For example, the query Q1 below is the formal hierarchical semantic expression for the request 'Show all images with a long face that contains a wide mouth and a short nose'; it requires a semantic tree that satisfies the following conditions:

Q1:
Select: 0.A1 from ST Where:
    ('Long' in ST.N.A13.1) and ('Face' in ST.N.A13.2)
    and (('Wide' in ST.(ST.N.A16.X).A13.1) and ('Mouth' in ST.(ST.N.A16.X).A13.2))
    and (('Short' in ST.(ST.N.A16.Y).A13.1) and ('Nose' in ST.(ST.N.A16.Y).A13.2))

Where:
ST = the semantic tree.
ST.N = the Nth node (in sequence) of the semantic tree.
ST.N.A16.X = the Xth child node of the node N.

We can observe the following from Q1:

(1) The semantic query is organized hierarchically. Q1 has the labels Face and Long at its root node. The labels Wide and Mouth are provided in the first child node of the root, while the labels Short and Nose are in the second child node of the root.

(2) This query asks the IDB engine to search first for a semantic tree node with an instance of Face that has the label Long as well.

(3) When such a semantic tree node is found, its child nodes are searched to see whether one of them possesses both the labels Wide and Mouth. This is why the term N in ST.N.A13.2 is replaced by ST.N.A16.X in the third and fourth condition terms. Serial numbers of child nodes are written in attribute A16 of any semantic tree node. The symbol X in the third and fourth terms indicates that both Wide and Mouth must belong to the same child node in the candidate semantic tree.

(4) An analysis similar to the one in (3) above applies to the labels Short and Nose, which are mentioned in the fifth and sixth condition terms.

(5) If the conditions in (3) and (4) above are also met, the IDB engine retrieves the contents of the first attribute (A1) of the root node of the semantic tree. As mentioned in Section 2, the root node is always the whole image.

* Attributes A13.1 and A14.1 are in practice fuzzy parameters that use particular criteria to measure the satisfaction of the associated label. For example, the label 'long' is satisfied if the population of the skeleton (in pixels) divided by the length of the contour of the image region (also in pixels) exceeds a certain ratio. The label 'short' works in the opposite direction.
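To make the role of attributes A13-A16 concrete, the following Python sketch shows a stripped-down node record and a crisp (exact-match) reading of Q1. The field names and the helper function are illustrative, not the system's actual storage format or query engine; Section 4 replaces this all-or-nothing check with fuzzy matching.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class STNode:
    labels: set                                      # A13: descriptive labels of the region
    relations: list = field(default_factory=list)    # A14: (relation label, other node) pairs
    parent: Optional["STNode"] = None                # A15: parent region
    children: list = field(default_factory=list)     # A16: child regions

def matches_q1(node: STNode) -> bool:
    """Crisp reading of Q1: a Long Face whose children include a Wide Mouth and a Short Nose."""
    if not {"Face", "Long"} <= node.labels:
        return False
    has_wide_mouth = any({"Mouth", "Wide"} <= c.labels for c in node.children)
    has_short_nose = any({"Nose", "Short"} <= c.labels for c in node.children)
    return has_wide_mouth and has_short_nose

face = STNode({"Face", "Long"})
face.children = [STNode({"Mouth", "Wide"}, parent=face),
                 STNode({"Nose", "Short"}, parent=face)]
print(matches_q1(face))   # True
```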

3.2. Label-to-label similarity assessment

Q1 presented us with an interesting problem. If a semantic tree of an image was found with both of the labels Face and Long belonging to the same node and if one of the child nodes contained the labels Mouth and Wide while the other contained Nose but not Short, should we regard this image as totally mismatching or partially matching?

An extended semantic query contains not only two levels of condition nodes, but several more levels. If a semantic query consisted of 10 levels of specification, and if nine of these levels were matched against the semantic tree of a particular image, then we should provide a matching assessment that reflects this high matching rate. In order to produce proper matching results, we need to develop a fuzzy similarity measure that provides its output in the range [0,1] rather than providing an output from the set {0,1}, as is the case in conventional alphanumeric database systems (which have only two cases: full match and fail).

This similarity measure assesses labels related to extracted features (such as length, width, darkness, and so on). For example, the label Very Long is similar to Long but not to Very Short (although both Long and Very Short share one word with Very Long). Therefore, similarity of labels should be measured by their feature values and not by their alphabetic content. The following fuzzy measure serves this purpose.

Feature similarity measure (FS). Given:

(1) image regions g1 and g2, and
(2) the fuzzy criterion R that associates g1 and g2 with particular feature-related labels (0 ≤ R(gx) ≤ 1),

then FS(g1, g2) is a fuzzy measure that provides the rate at which the feature value of g2 (and its associated label) resembles the value of g1 (and its associated label). Thus:

    FS(g1, g2) = (1 - |R(g2) - R(g1)|)^SS        (1)

Where:
SS = similarity sharpness, SS ≥ 1.

Example: The following label set belongs to the length feature: LS = {Very Short, Short, Slightly Short, Normal, Slightly Long, Long, Very Long}. The criterion R which characterizes entity length (as in the FS definition above) is as follows:

    R(gx) = 2 × PR_gx / CNT_gx        (2)

Where:
gx = region corresponding to the entity in the image;
PR_gx = perimeter length of gx;
CNT_gx = contour length of gx.

Equation (2) provides a minimum value of 0 (PR_gx = 0) and a maximum value of 1 (PR_gx = CNT_gx/2). If we divide the interval [0,1] into seven equal sub-intervals, then each member of the set LS (of length labels) corresponds to a particular R range. For example, if all seven sub-intervals were equal, then:

IF 0 < R(gx) < 0.1428 Then gx is Very Short.
IF 0.1428 < R(gx) < 0.2857 Then gx is Short.
...
IF 0.8571 < R(gx) < 1 Then gx is Very Long.

Having established the rules above, we can apply eqn (1) to see how similar the labels of the set LS are to each other. According to eqn (1), the label Very Short is quite different from Very Long, since these two labels provide a low FS value. The value of the criterion R for each of these labels is the middle point of its interval. Therefore:

R(Very Short) = (0 + 0.1428)/2 = 0.0714
R(Very Long) = (0.8571 + 1)/2 = 0.9286
FS(Very Short, Very Long) = 1 - |0.0714 - 0.9286| = 0.1428        (SS = 1)

FS is commutative, i.e. FS(x, y) = FS(y, x). It should be noticed that labels can also be assigned to regions: labels such as Book, Door, Chair, etc. imply no particular feature. The value of the FS measure for these labels is, therefore, a special case of either zero or one.
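As a worked illustration of eqns (1) and (2), the following Python sketch reproduces the length-label example above (seven equal sub-intervals of [0,1], with R taken at each interval's midpoint). It is a sketch of the idea, not the authors' implementation.

```python
# Length labels in order of increasing R, as in the set LS above.
LENGTH_LABELS = ["Very Short", "Short", "Slightly Short", "Normal",
                 "Slightly Long", "Long", "Very Long"]

def r_value(label: str) -> float:
    """R value of a length label: the midpoint of its sub-interval of [0,1]."""
    i = LENGTH_LABELS.index(label)
    return (i + 0.5) / len(LENGTH_LABELS)

def fs(label_1: str, label_2: str, ss: float = 1.0) -> float:
    """Feature similarity of eqn (1): (1 - |R(g2) - R(g1)|) ** SS."""
    return (1.0 - abs(r_value(label_2) - r_value(label_1))) ** ss

print(round(fs("Very Short", "Very Long"), 4))   # ~0.1429, close to the 0.1428 of the example
print(round(fs("Long", "Very Long"), 4))         # neighbouring labels score a high similarity
```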

4. EVALUATING CANDIDATE SEMANTIC TREES

4.1. Strategy outline

A semantic query consists physically of the following components:

(a) A set of region labels (such as Man, Head, Face, etc.),
(b) A set of relations (such as carrying, bigger than, smaller than, etc.),
(c) Hierarchy relations (between Man and Head, Head and Face, Face and Eyes, etc.).

Semantic query errors were outlined at the end of the first section of this paper. To handle these errors, we conduct a two-stage matching operation as shown in Fig. 5. In the first stage, labels are directly matched (using the FS function if necessary). The second stage is the context matching. It involves matching labels of regions and relations that are linked to the two labels that were matched in the first stage. The second stage verifies whether the labels that were matched in the first stage have similar parent and child nodes. After that, matching results of these two stages are joined and abstracted to generate a universal matching result between the semantic query and the candidate semantic tree.

We shall look into these stages independently and then show how they are integrated. Section 6 provides experiments that test the concepts given in this section.

4.2. Local matching

The query Q1 in Section 3.1 provided us with an important feature of nodes in semantic queries and semantic trees. Each node in the hierarchy of Q1 had two labels to describe it (the labels Long and Face, for example, belong to the same node). In general, nodes in semantic queries and in semantic trees can have any number of labels. Our purpose in this stage of semantic query processing is to see whether all labels of a particular query node exist in the corresponding semantic tree node. Thus our matching is directed from the semantic query to the candidate semantic tree.

The following matching function uses the measure FS [which was introduced in eqn (1)] as its basis:

Local Matching (LM) function. Given:

(a) the set L_QNx = {l_QNx,1, l_QNx,2, ..., l_QNx,m} of labels that belong to the node QNx of a semantic query, and
(b) the set L_TNy = {l_TNy,1, l_TNy,2, ..., l_TNy,n} of labels that belong to the node TNy of a candidate semantic tree,

then LM(QNx, TNy) is the accumulated rate of matches achieved by QNx members in TNy:

    LM(QNx, TNy) = ( Σ_{a=1..m} ⋁_{b=1..n} FS(l_QNx,a, l_TNy,b) ) / m        (3)

Where:
⋁ = fuzzy OR [which is a Max operator (Klir & Folger, 1988)].

Fig. 5. Operational block diagram of the semantic query processing method.

The following observations can be made of the LM definition:

(1) Each member of L_QNx is matched against all L_TNy members, i.e. each L_QNx member is used n times (n is the population of L_TNy).
(2) The n matches in (1) above provide n matching degrees. Of these, only the highest is considered for later processing. The highest value is found by comparing these matching degrees with each other using the fuzzy OR.
(3) The average of the maximum matching values of QNx labels [which is generated in (2) above] is taken. This average indicates the local matching between QNx contents and TNy contents.
(4) LM is a fuzzy measure. Its maximum value is 1 (achieved when exact matches for all L_QNx members are found in L_TNy). The minimum value of LM is zero (when each of the L_QNx members scores an FS value of zero against all members of TNy).
(5) If there were j nodes in the semantic query and k nodes in the candidate semantic tree, the function LM produces a j × k (columns × rows) array that we call LMA. Each LMA column corresponds to local matching degrees between a particular semantic query node (QNx) and all nodes in the candidate semantic tree.
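A minimal Python sketch of the LM function and the LMA array follows; the exact-match label_similarity stub stands in for the FS measure of eqn (1), and all names are illustrative rather than the authors' code.

```python
def label_similarity(l1: str, l2: str) -> float:
    """Stand-in for FS of eqn (1): exact match scores 1, anything else 0."""
    return 1.0 if l1 == l2 else 0.0

def lm(query_labels, tree_labels) -> float:
    """LM of eqn (3): average, over query labels, of the best similarity in the tree node."""
    if not query_labels:
        return 0.0
    best = [max((label_similarity(q, t) for t in tree_labels), default=0.0)
            for q in query_labels]
    return sum(best) / len(query_labels)

def build_lma(query_nodes, tree_nodes):
    """LMA: one column per query node, one row per candidate tree node."""
    return [[lm(q, t) for q in query_nodes] for t in tree_nodes]

print(lm({"Face", "Long"}, {"Face", "Head"}))   # 0.5: only one of the two query labels matches
```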

4.3. Context matching

As was mentioned in Section 4.1, the context of each query node should be matched against the context of the corresponding node in the candidate semantic tree. Context matching is carried out to see:

(a) Whether these nodes have similar position (i.e. similar parent and similar child nodes) in their hierarchies

(b) Whether these nodes are connected with similar relations to similar other nodes in their own hierarchies.

The result of context matching is joined with local matching results to generate gross matching results. Gross matching is discussed in Section 4.4. Context matching proceeds as follows:

Context Matching (CM) function. Given:

(a) the set B_QNx = {(b_QNx,1, r_QNx,1), (b_QNx,2, r_QNx,2), ..., (b_QNx,m, r_QNx,m)} of semantic query nodes and relations that are linked to the node QNx. For example, the pairs ({Head}, {Parent node of}), ({Mouth}, {Child node of}), and ({Hand}, {Smaller than}) are found in B_QNx if QNx contained the label Face.*
(b) the set B_TNy = {(b_TNy,1, r_TNy,1), (b_TNy,2, r_TNy,2), ..., (b_TNy,n, r_TNy,n)} of semantic tree nodes and relations that are linked to the node TNy.

Then CM(QNx, TNy) is the accumulated rate of matches achieved by B_QNx members in B_TNy:

    CM(QNx, TNy) = ( Σ_{a=1..m} ⋁_{b=1..n} ( LM(b_QNx,a, b_TNy,b) ∧ LM(r_QNx,a, r_TNy,b) ) ) / m        (4)

Where:
LM = local matching function [of eqn (3)];
b_QNx,a = semantic query node; r_QNx,a = semantic query relation;
b_TNy,b = semantic tree node; r_TNy,b = semantic tree relation;
⋁ = fuzzy OR [Max operator (Klir & Folger, 1988)];
∧ = fuzzy AND [Min operator (Klir & Folger, 1988)].

* These three pairs are examples. In general, a query node or relation can have more than one label.

The following observations can be made of the CM definition:

(1) Each pair belonging to B_QNx is matched against all B_TNy pairs, i.e. each B_QNx member is used n times (n is the population of B_TNy).
(2) In each matching, relation labels and node labels are matched separately from each other. Results obtained from these two matches are then joined using a Min operator (the fuzzy AND function) to reflect the match of the whole link (link = relation + related node).
(3) The n matches in (2) above provide n matching degrees, out of which the highest matching degree is obtained using fuzzy OR.
(4) The average of the maximum matching values of B_QNx members [which is generated in (3) above] is taken. This average indicates the context matching degree between QNx neighbors and TNy neighbors.
(5) CM is a fuzzy measure. Its maximum value is 1 (achieved when exact matches for all B_QNx members are found in B_TNy). The minimum value of CM is zero (when each of the B_QNx members scores a joint LM value of zero against all members of B_TNy).
(6) If there were j nodes in the semantic query and k nodes in the candidate semantic tree, the function CM produces a j × k (columns × rows) array that we call CMA. Each CMA column corresponds to context matching degrees between a particular semantic query node (QNx) and all the nodes in the candidate semantic tree.
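The CM function of eqn (4) can be sketched along the same lines: each context entry pairs the labels of a linked node with the labels of the linking relation, and the two LM scores are combined with a fuzzy AND (min). The crisp lm stub below stands in for eqn (3); this is an illustrative sketch, not the system's code.

```python
def lm(query_labels, tree_labels) -> float:
    """Crisp stand-in for the LM function of eqn (3)."""
    if not query_labels:
        return 0.0
    return sum(1.0 if q in tree_labels else 0.0 for q in query_labels) / len(query_labels)

def cm(query_context, tree_context) -> float:
    """CM of eqn (4): average over query links of the best min(node match, relation match)."""
    if not query_context:
        return 0.0
    best = []
    for q_node, q_rel in query_context:
        scores = [min(lm(q_node, t_node), lm(q_rel, t_rel))   # fuzzy AND over the link
                  for t_node, t_rel in tree_context]
        best.append(max(scores, default=0.0))                 # fuzzy OR over candidate links
    return sum(best) / len(query_context)

query_ctx = [({"Head"}, {"Parent node of"}), ({"Mouth"}, {"Child node of"})]
tree_ctx = [({"Head"}, {"Parent node of"}), ({"Nose"}, {"Child node of"})]
print(cm(query_ctx, tree_ctx))   # 0.5: the Head link matches, the Mouth link does not
```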

4.4. Gross matching

Thus far we have been able to generate matrices of local and context similarity values that state the degree to which the semantic query resembles the candidate semantic tree. Results obtained from the functions LM and CM are organized into the matrices LMA and CMA, respectively.

In order to generate a single value that indicates the similarity degree between the semantic query as a whole and the candidate semantic tree as a whole (which we call the universal matching degree), we need first to abstract the matrices LMA and CMA into a single matrix. The gross similarity parameter is defined as follows:

Gross matching (GM) function. Given:

(a) a semantic query that has m nodes (QN1, QN2, ..., QNm),
(b) a candidate semantic tree that has n nodes (TN1, TN2, ..., TNn),
(c) the matrices:

    (i)  LMA = [ LM(QN1, TN1)  ...  LM(QNm, TN1)
                 ...
                 LM(QN1, TNn)  ...  LM(QNm, TNn) ]

    (ii) CMA = [ CM(QN1, TN1)  ...  CM(QNm, TN1)
                 ...
                 CM(QN1, TNn)  ...  CM(QNm, TNn) ]

then GM(QNi, TNj) is the aggregate matching degree that exists between query node QNi and candidate node TNj. The GM value is generated using a fuzzy composition operation (Klir & Folger, 1988):*

    GM(QNi, TNj) = LM(QNi, TNj) ∧ CM(QNi, TNj)        (5)

Where:
∧ = fuzzy AND function [fuzzy Min operator (Klir & Folger, 1988)].

* Fuzzy AND is used here since the composition operation works here on single values rather than matrices.

When we apply eqn (5) to all LMA and CMA members, and given (a) a semantic query that has m nodes (QN1, QN2, ..., QNm) and (b) a candidate semantic tree that has n nodes (TN1, TN2, ..., TNn), the following matrix is generated:

    GMA = [ GM(QN1, TN1)  ...  GM(QNm, TN1)
            ...
            GM(QN1, TNn)  ...  GM(QNm, TNn) ]

Using the information provided by this matrix, the universal matching degree (UMD) is generated as follows:


Universal matching degree (UMD). Given:

(a) a semantic query that has m nodes (QN1, QN2, ..., QNm),
(b) a candidate semantic tree that has n nodes (TN1, TN2, ..., TNn),
(c) the matrix GMA (described above),

then UMD is a function that summarizes GMA. UMD generates a single matching degree that indicates the similarity rate of the semantic query as a whole against the candidate semantic tree as a whole. UMD is an average value generated by summing the maximum similarity rates that each QN member generates against any TN member. Formally:

    UMD(SeQ, ST) = ( Σ_{i=1..m} ⋁_{j=1..n} GM(QNi, TNj) ) / m        (6)

Where:
⋁ = fuzzy OR function [a Max operator (Klir & Folger, 1988)].

The following points can be observed from eqn (6):

(1) Each column in the GMA matrix is searched for the highest GM value which was generated between a particular QN member and any TN member. Search is conducted using fuzzy OR comparisons.

(2) The average of the maximum matching values of each GMA column [which is generated in (1) above] is taken. This average indicates the universal gross matching between QN members and TN members.

(3) UMD is a fuzzy measure. Its maximum value is 1 (achieved when each GMA column has at least one member whose value is one). The minimum value of UMD is zero (when all GMA members are zeros).
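Putting eqns (5) and (6) together, GMA is the element-wise fuzzy AND (min) of LMA and CMA, and UMD averages the best value found in each GMA column. The small Python sketch below uses the same [tree node][query node] matrix layout as the earlier sketches; it is illustrative only.

```python
def build_gma(lma, cma):
    """Eqn (5): GM(QNi, TNj) = min(LM(QNi, TNj), CM(QNi, TNj)) for every cell."""
    return [[min(l, c) for l, c in zip(l_row, c_row)] for l_row, c_row in zip(lma, cma)]

def umd(gma) -> float:
    """Eqn (6): average, over query nodes (columns), of the highest GM value in each column."""
    if not gma or not gma[0]:
        return 0.0
    n_query = len(gma[0])
    column_maxima = [max(row[i] for row in gma) for i in range(n_query)]
    return sum(column_maxima) / n_query

lma = [[1.0, 0.0],   # two query nodes (columns) against two tree nodes (rows)
       [0.0, 0.5]]
cma = [[0.8, 0.0],
       [0.0, 1.0]]
print(umd(build_gma(lma, cma)))   # (0.8 + 0.5) / 2 = 0.65
```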

This is the terminal matching process in which this candidate semantic tree is used. In practice, when a semantic query is entered, its labels are used as search keys to access the label index of the IDB as shown in Fig. 5. Candidate semantic trees are retrieved using this index, and the matching operation described in this section is applied to them all.

5. GUI-BASED SEMANTIC QUERY ENTRY

As was shown in the query Q1 of Section 3.1, the involvement of parent-to-daughter pointers in specifying semantic queries makes it difficult to enter the semantic query using an alphanumeric interface. For example:

(1) When mentioning a label of the child node A of a child node B of the current node N (a two-level difference):

ST.(ST.(ST.N.A16.Y).A16.X).A13.2

(2) When mentioning a label of a node with a five-level difference:

ST.(ST.(ST.(ST.(ST.(ST.N.A16.E).A16.D).A16.C).A16.B).A16.A).A13.2

Due to the difficulty involved in specifying such terms, we developed a graphical user interface (GUI) to enter the semantic query. The operational diagram of this GUI is outlined in Fig. 6. Construction of semantic queries proceeds according to the following operations:

(1) When the GUI is called, a blank screen is shown to the user. The user then clicks the mouse anywhere in the blank screen to establish the root node of the query.

(2) When the action specified in (1) above is taken, a label entry window appears for the user to enter the first label of the root. This window then appears again and again until the user selects NO as a menu answer to the question 'Enter more Labels?'.

Fig. 6. Block diagram of the monitored SeQ entry module.

(3) To establish a child node, the user drags the mouse with a button pushed from the root node (or any other established node) to the open space in the GUI. When such an action is taken, the operation described in (2) above is repeated for the child node until no more labels are necessary. A parent-to-daughter link is noted by a small solid box at its end at the child node box, as in Fig. 7.

(4) To establish a relation, the user drags the mouse with a button pushed from one node to another. When this action is taken, the first node becomes the source of the relation, while the second becomes the sink of the relation. The user can then enter several relation labels as was the case in (2) and (3) above. The relation-to-source line is solid while the relation-to-sink line is dotted, as shown in Fig. 7.

(5) The user can add labels, delete labels, or modify existing labels by clicking the box of a particular node or relation.

(6) To ensure that the user will enter a label that exists in the IDB (in other words, to avoid spelling differences), the user can enter the starting part or the middle part of the desired label and press the TAB key. When this action is taken, a menu appears to ask the user whether the entered string occurs at the beginning of or in the middle of the required label. When the user makes his/her selection, a menu appears with labels that contain the entered string at the position specified by the previous menu. The user can then select any of these labels and add them to the edited node. An illustration of the operation described here is given in Fig. 7.

(7) There are two kinds of labels: extracted (such as Long, Brighter than, etc.) and assigned (such as Key, Holding, etc.). If the user enters an extracted label, an emphasis menu appears on the screen and asks the user to select a single option. This operation helps in improving matching quality by inserting an approximate value into the label. An illustration of the operation described here is given in Fig. 8.


r - " l con f tem Your Selection. I-----Ixetrleval of Full Ordered Head Pattern. ~ R e t r t e v a l of ~ull Ordered K~dou Position Pattern. [~ '-JEetrteval of Full Disordered Kandoe Posttlon Pattern. ~ -" ]Ret r leva l of Partial Dl£ordered Kando| PoGltlon Pattern. [----JCancel. the h t r t e v a l Operation.

[ - - 7 Ccmarm Your Selection.

[ ' ~ JAPAHESE AR~CRAFT

JAPANESE CULTURE DANCER

JAPANESE LUCK IdlASCOT

JAPANESE WORRIOR

JAPANESE HOUSEV~FE

TRADmONAL JAPANESE FEMALE FASHION

[ ' ~ JAPANES NOBLEMAH

JAPANESE TRADITIONAL MEN FASHION

[ ~ Select All.

~ 7 Dlsclrd All,

Fig. 7. Label checking.

6. CASE STUDY

6.1. Full query experiment

Over 100 images are stored in our experimental IDB. The graphic form of the semantic query listed below is shown in Fig. 9. It includes five specification levels and the following components:

Semantic query (SeQ):

(a) Hierarchy*

Node 0:  0 (population: 3): BALD MAN HOLDING A DARUMA
         Subnodes: 1 10
Node 1:  0 (population: 7): BALD PERSON; 1 (population: 37): MALE; 2 (population: 19): ADULT
         Subnodes: 2 3 4
Node 2:  0 (population: 107): HEAD
         Subnodes: 5 6 7 8
Node 3:  0 (population: 79): BODY
Node 4:  0 (population: 12): FEET
Node 5:  0 (population: 121): MOUTH
Node 6:  0 (population: 122): NOSE
Node 7:  0 (population: 120): EYES
         Subnodes: 9
Node 8:  0 (population: 29): MUSTACHE
Node 9:  0 (population: 146): LEFT EYE
Node 10: 0 (population: 19): DARUMA; 1 (population: 17): JAPANESE LUCK MASCOT
         Subnodes: 11 12
Node 11: 0 (population: 91): FACE
         Subnodes: 13 14
Node 12: 0 (population: 79): BODY
Node 13: 0 (population: 122): NOSE
         Subnodes: 15
Node 14: 0 (population: 120): EYES
Node 15: 0 (population: 121): MOUTH

* Population of the label in the IDB index is given in parentheses.

Fig. 8. Emphasis degree entry.

(b) Relations:

11 -> 2: Contacts
6 -> 13: Bigger_than
5 -> 2: At_the_bottom_left_of

We shall assess the compatibility between this semantic query and the semantic trees of the images given in Fig. 10.* Semantic trees of these images are given in the Appendix. To test the strategy presented in this paper, the following errors were deliberately made in the semantic query given above:

(1) None of the three relations is plausible.
(2) MOUTH (SeQ node 15) is inserted under NOSE rather than FACE.
(3) The region MUSTACHE (SeQ node 8) is registered under the region HEAD (SeQ node 2) rather than FACE (SeQ node 11).

The matching operation is carried out according to the strategy presented in Section 4. Results are generated as follows:

* Semantic trees of these images contain most of the labels of this semantic query.


(1) The array LMA is generated between the given semantic query and each of the candidate semantic trees. Table 1 provides LMA information with respect to the best matching image [of Fig. 10(b)].

(2) The array CMA is generated between the given semantic query and each of the candidate semantic trees. Table 2 provides CMA information with respect to the best matching image.

(3) The array GMA is generated from the arrays LMA and CMA. Table 3 provides GMA information with respect to the best matching image.

(4) The GMA array is abstracted into a number of maximum values. Maximum values of the GMA array listed in Table 3 are given in the bottom line of Table 3. Maximum values of semantic query nodes against all candidate semantic trees are given in Table 4.

Fig. 9. Experimental semantic query.


Fig. 10. Best matching images (a), (b), (c) and (d).

(5) Using the information in Table 4, UMD values for all images in Fig. 10 are given in Table 5.

6.2. Restricted query experiment

In this section, we experiment with a new query. This new query is generated by carrying out the following changes on the initial query which was the subject of Section 6.1 above:

(1) The region FEET (SeQ node 4) is eliminated.

Table 1. The array LMA

QN0 QN1 QN2 QN3 QN4 QN5 QN6 QN7 QN8 QN9 QN10 QN11 QN12 QN13 QN14 QN15

TNt~ 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN t 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 TN 2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ! .0 0.0 0.0 0.0 0.0 TNs 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN~ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TNs 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 TN 6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN7 0.0 0.0 0.0 0.0 0.0 0.0 1,0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 TN s 0.0 0.0 0.0 0.0 0.0 1.0 0,0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 TN,~ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN.a 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TNll 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN~2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0,0 TNI3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN,4 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 TNI5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN,~ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TNI7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TNIs 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TNI9 0.0 0.67 0.0 0.0 0.0 0.0 0.0 0.0 0,0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN2{ ~ 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN2~ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN2, - 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN2s 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN24 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN25 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1,0 0.0 0.0 0.0 0.0 0.0 0.0 TN2~ 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 010 0.0 0.0 TN2~ 0.0 0.0 0.0 0.0 0.0 I.O 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 l.O TN2s 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN29 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN~ 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0,0 0.0 0.0 0.0 0.0 0.0 TNs, 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0,0 0.0 0.0 0.0 0.0 0.0 TN32 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 TNss 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN~ 4 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 TNs5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0


(2) The region MOUTH (now SeQ node 14) is linked to its proper parent which is the region FACE (now SeQ node 10).

The new query is shown in Fig. 11. Both of the changes made above make a positive impact on the final UMD degree. The purpose of making these changes is to investigate:

(1) Whether UMD values of candidate semantic trees will increase.
(2) Whether this increase is proportional to the UMD values of Section 6.1.
(3) Whether the UMD margin between these candidates is maintained.

Table 6 shows maximum GM values (gross matching degrees) for each of the nodes of the new semantic query. Using this information, Table 7 shows the impact of the changes made on the final UMD output. We may observe the following points from Table 7:

(1) All matching rates showed some increase due to the corrections made to the original semantic query.

(2) The change was not proportional to the old matching rate (of Section 6.1). It can be seen that the images in Fig. 10(c) and (d) gained more than the best matching image [of Fig. 10(b)], while the gain of the image in Fig. 10(a) is far less than that of the rest of the images. Therefore, the UMD change (when the query is altered) is not uniform for all candidates.

(3) Due to the unbalanced changes in UMD values described in (2) above, the difference between the best matching image and the rest of the images shrank a little. Still, it can be said that the UMD gap is wide enough to consider the image in Fig. 10(b) as the only qualified output and neglect the images in Fig. 10(a), (c) and (d).

Table 2. The array CMA

QN0 QN1 QN2 QN3 QN4 QN5 QN6 QN7 QN8 QN9 QN10 QN11 QN12 QN13 QN14 QN15

TN~ 0.889 0.250 0.095 0.667 0,667 0.0 0.0 0.0 0.0 0.0 0.333 0,250 1.000 0.0 0.0 0.0 TN~ 0.333 0.500 0,143 0.0 0~0 0.0 0.0 0.0 0.0 0.0 1.000 0.0 0.0 0.333 1.000 0.0 TN, 0,333 0.0 0.714 0.0 0.0 0.0 0.0 0.0 0.0 1,000 0.0 0,750 1.000 0.333 0.0 1.000 TN~ 0.0 0.0 0.286 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0,0 0.0 0.333 0.0 0.0 TN4 0,0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0,0 0.0 0.0 0.0 0.0 TN~ 0.0 0.0 0.286 0.0 0.0 0.0 0.0 0.0 0.0 1.000 0.0 0,250 0.0 0.0 0.0 0.0 TN, 0.0 0.0 0.143 0.0 0.0 0.0 0.0 0.0 0.0 1.000 0.0 0.250 0.0 0.0 0.0 0.0 TN7 0.0 0.0 0.143 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.333 0.0 0.0 0.333 1.000 0.0 TN s 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.333 0.0 0.0 0.333 1.000 0.0 TN, 0.0 0.0 0.286 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN m 0.0 0.0 0.143 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN, 0.0 0.0 0.286 0.0 0.0 0.0 0,500 0.0 0.0 0.0 0.333 0.0 0.0 0.333 1,000 0.0 TN~,. 0.0 0.0 0.143 0.0 0.0 0.0 0.500 0.500 0.0 0.0 0.333 0.0 0.0 0.333 1.000 0.0 TN~ 0.0 0.0 0.143 0.0 0.0 0.0 0.500 0.0 0.0 0.0 0.333 0.0 0.0 0.333 1.000 0.0 TN~4 0.333 0.0 0.286 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.250 1,000 0.0 0.0 0.0 TN~ 0.0 0.250 0,286 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.333 0.0 0.0 0.0 0.0 0.0 TN~, 0.0 0.250 0.143 0.0 0.0 0.500 0.0 0.0 0.0 0.0 0.333 0.0 0.0 0.0 0.0 0.0 TN~7 0.0 0.250 0,143 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.333 0.0 0.0 0.0 0.0 0.0 TN~s 0.0 0.250 0.143 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.333 0.0 0.0 0.0 0.0 0.0 TN~ 0.333 0.750 0.0 0.0 0.0 0.500 0.500 0.500 1.000 0.0 0.667 0.250 0.0 0.0 0.0 0.0 TN,o 0.222 0.0 0.667 0.67 0.667 0.0 0.0 0.0 0.0 1.00 0.0 0.500 0.0 0,333 0.0 1.000 TN, L 0.0 0.250 0,143 0.0 0.0 0.500 0.500 0,500 1.000 0.0 0.0 0,250 0.0 0.0 0.0 0.0 TN22 0.0 0.0 0.143 0.0 0.0 0.500 0.0 0,0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN2s 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0,0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN,_~ 0.0 0.0 0.143 0.0 0.0 0.500 0.0 0,0 0.0 1.000 0.0 0.250 0.0 0.333 0.0 0.0 TN,~ 0.0 0.0 0.286 0.0 0.0 0.500 0.0 0.0 0.0 1.000 0.0 0,250 0.0 0.333 0.0 0.0 TN~ 0,0 0.250 0.0 0.0 0.0 1.000 0,500 0.500 1.000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN27 0,0 0,250 0.0 0.0 0.0 1.000 0.500 0.500 1.000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 TN,_, 0.0 0.0 0.143 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.250 0.0 0.0 0.0 0.0 TN~ 0,0 0.0 0.0 0.0 0.0 0.500 0.0 0.0 0.0 0.0 0.0 0.250 0.0 0.0 0,0 0.0 TN~o 0,0 0.250 0.143 0.0 0.0 0.500 0.500 0.500 1.000 0.0 0.0 0.250 0.0 0.0 0,0 0.0 TN~ 0,0 0.250 0.143 0.0 0.0 1.000 0.500 0.500 1.000 0.0 0.0 0.0 0.0 0.0 0,0 0.0 TNs~ " 0,0 0.250 0,143 0.0 0.0 1.000 1.000 1.000 1.000 0.0 0.0 0.0 0.0 0.0 0,0 0.0 TN33 0.0 0.250 0.143 0.0 0.0 1.000 0.500 0.500 1.000 0.0 0.0 0.250 0.0 0.0 0.0 0.0 TN~4 0.222 0.0 0.095 0.667 0.667 0.0 0.0 0.0 0.0 0.0 0.0 0.250 0.0 0.0 0.0 0.0 TN3~ 0.333 0.250 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.333 0.250 0.0 0.0 0.0 0.0


Table 3. The array GMA.

      QN0   QN1   QN2   QN3   QN4   QN5   QN6   QN7   QN8   QN9   QN10  QN11  QN12  QN13  QN14  QN15

TN0   0.889 0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN1   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   1.000 0.0   0.0   0.0   0.0   0.0
TN2   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.750 0.0   0.0   0.0   0.0
TN3   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN4   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN5   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   1.000 0.0   0.0   0.0   0.0   0.0   0.0
TN6   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN7   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.333 0.0   0.0
TN8   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN9   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN10  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN11  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN12  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.500 0.0   0.0   0.0   0.0   0.0   0.0   1.000 0.0
TN13  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN14  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   1.000 0.0   0.0   0.0
TN15  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN16  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN17  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN18  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN19  0.0   0.667 0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN20  0.0   0.0   0.667 0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN21  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN22  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN23  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN24  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN25  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   1.000 0.0   0.0   0.0   0.0   0.0   0.0
TN26  0.0   0.0   0.0   0.0   0.0   0.0   0.500 0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN27  0.0   0.0   0.0   0.0   0.0   1.000 0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN28  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN29  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN30  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN31  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN32  0.0   0.0   0.0   0.0   0.0   0.0   0.0   1.000 0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN33  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN34  0.0   0.0   0.0   0.667 0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
TN35  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0

Max   0.889 0.667 0.667 0.667 0.0   1.000 0.500 1.000 0.0   1.000 1.000 0.750 1.000 0.333 1.000 0.0

Table 4. Maximum matching rates of candidate nodes

Fig. 10  QN0   QN1   QN2   QN3   QN4   QN5   QN6   QN7   QN8   QN9   QN10  QN11  QN12  QN13  QN14  QN15

(a)      0.0   0.333 0.476 0.333 0.0   0.0   0.500 1.000 1.000 1.000 0.0   0.0   0.0   0.0   0.0   0.0
(b)      0.889 0.667 0.667 0.667 0.0   1.000 0.500 1.000 0.0   1.000 1.000 0.750 1.000 0.333 1.000 0.0
(c)      0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.500 0.0   1.000 0.500 0.625 0.500 0.333 1.000 0.0
(d)      0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.500 0.0   1.000 0.500 0.625 0.500 0.333 1.000 0.0

Table 5. Sorted UMD values.

Fig. 10   UMD      Ripple

(b)       0.7021   --
(c)       0.2305   0.4716
(d)       0.2305   0.4716
(a)       0.2266   0.4755


[Fig. 11. Modified query (second experiment): a graphical query whose recoverable node labels include BALD MAN, BALD PERSON, EYES, NOSE, MOUTH and LEFT EYE, linked by a Contact relation and hierarchical links.]

6.3 Output set threshold and final output generation

In our study cases above, as in all semantic queries, several semantic trees are expected to produce a significant similarity degree with the semantic query. In order to produce a final output, we need to sort these similarity degrees (the UMD values) in descending order into a table (as in Tables 5 and 7). Once such a sorted list is available, an automatic threshold function should be applied to eliminate all unsatisfactory solutions. The system user is therefore presented only with the images whose UMD values lie in the upper part of the filtered list.

Table 6. Maximum matching rates of candidate nodes

Fig. 10  QN0   QN1   QN2   QN3   QN4   QN5   QN6   QN7   QN8   QN9   QN10  QN11  QN12  QN13  QN14

(a)      0.0   0.333 0.476 0.333 0.0   0.500 1.000 1.000 1.000 0.0   0.0   0.0   0.0   0.0   0.0
(b)      0.889 0.667 0.524 0.667 0.500 0.500 1.000 0.0   1.000 1.000 0.800 1.000 0.500 1.000 1.000
(c)      0.0   0.0   0.0   0.0   0.0   0.0   0.500 0.0   1.000 0.500 0.700 0.500 0.500 1.000 1.000
(d)      0.0   0.0   0.0   0.0   0.0   0.0   0.500 0.0   1.000 0.500 0.700 0.500 0.500 1.000 1.000


Table 7. Sorted UMD values

Fig. 10   New UMD   Ripple   Old UMD   Change

(b)       0.7654    --       0.7021    0.0633
(c)       0.3028    0.4626   0.2305    0.0723
(d)       0.3028    0.4626   0.2305    0.0723
(a)       0.2388    0.5266   0.2266    0.0122

The threshold level should be variable and assigned according to the differences between the output UMD results. Based on the points mentioned above, our threshold function uses the UMD ripple to determine the position (from the top of the initial output list) after which members are eliminated. The UMD ripple of an image in the sorted list is the difference between the UMD of that image and the best UMD obtained.
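In symbols (our own shorthand; the paper states the ripple only in words), for a candidate image i in the sorted list:

\[ \mathrm{ripple}_i \;=\; \mathrm{UMD}_{\mathrm{best}} - \mathrm{UMD}_i \]

where UMD_best is the highest UMD in the list.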

Our threshold function searches for the first ripple value that exceeds a quarter of the best UMD value. This criterion is applied to the ripple columns in Tables 5 and 7. The final list, therefore, includes only the image of Fig. 10(b); a sketch of this cut-off procedure is given below.
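The following short sketch is our own illustration of that cut-off step (the function and variable names are hypothetical and are not taken from the paper's implementation): it sorts the candidate UMD values in descending order, computes each ripple against the best UMD, and cuts the list at the first ripple that exceeds a quarter of the best value.

    def threshold_output(umd_by_image):
        """Keep only the candidates that precede the first UMD ripple
        larger than a quarter of the best UMD (Section 6.3 criterion)."""
        ranked = sorted(umd_by_image.items(), key=lambda item: item[1], reverse=True)
        if not ranked:
            return []
        best_umd = ranked[0][1]
        accepted = []
        for image, umd in ranked:
            ripple = best_umd - umd           # distance below the best UMD
            if ripple > best_umd / 4.0:       # first over-threshold ripple: stop
                break
            accepted.append(image)
        return accepted

    # UMD values taken from Table 7 (second experiment)
    print(threshold_output({"(a)": 0.2388, "(b)": 0.7654, "(c)": 0.3028, "(d)": 0.3028}))
    # -> ['(b)']

Applied to Table 5, the same rule again returns only Fig. 10(b), since the first ripple there (0.4716) already exceeds 0.7021/4.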

7. CONCLUSIONS

Semantic queries of image databases allow expressing associative relations between the entity labels given in the query, i.e. they allow the query context to be defined. This feature makes the processing methods of conventional alphanumeric database systems of little use for semantic queries. This paper presented a new processing method that uses context matching to guarantee that the location of matched labels within their respective contexts is similar in both the query and the candidate semantic tree.

By using fuzzy rather than crisp matching operations, we were able to produce an overall matching degree that ranges from zero to one. Thus we based our method on partial matching rather than full matching. Partial matching is important in IDB queries since it resembles human perception of images: the human viewer provides only a partial description of the target image.

Although the experiments given in this paper provided an acceptable discrimination margin between matching and mismatching semantic trees, the method presented here has the following drawbacks:

(1) A successful response to the queries was generated because a highly matching image existed in the IDB [the image of Fig. 10(b)]. If no such image were available, we would have to take one of the following options:

(a) We can set a lower bound for acceptable UMD values, such as 0.5 or 0.6. Setting this bound is quite difficult since it is a fixed cut-off applied to a fuzzy UMD value. For example, setting such a bound at 0.5 will result in accepting a UMD value of 0.501 while a UMD of 0.499 is dismissed, although it is only 0.002 less.

(b) We can present the highest matching images as the output. In both of our experiments, if we exclude the image of Fig. 10(b), the best matching degree achieved was 0.3028. This matching degree is clearly not good enough for the corresponding image to qualify as a proper query response.

(2) Although we incorporated a label checker (as described in Section 5), there is no guarantee that a label entered by the user will match any IDB label. The use of an external thesaurus can improve this situation. However, the thesaurus is an external database that was not extracted from images; it is therefore likely to lack some of the labels already existing in the database. So the use of a thesaurus has its trade-offs.

Overall, the integration of local and context matching enhances the performance of the method presented in this paper when multi-level hierarchical queries are processed. With fuzzy matching involved, minor query errors can be tolerated and high matching rates can still be produced for qualified images.



APPENDIX

Hierarchy of the Images Discussed in Section 6

List 1. Label information of the image in Figure 10(a)

Hierarchy

Labels of node 0 are:  0: HUMAN DARUMA
  Subnodes: 1 2
Labels of node 1 are:  0: STATUE  1: HUMAN FIGURE  2: BALD PERSON
  Subnodes: 3 18 19
Labels of node 2 are:  0: BACKGROUND
Labels of node 3 are:  0: HEAD
  Subnodes: 10 11 12 13 14 15
Labels of node 4 are:  0: LEFT EAR
Labels of node 5 are:  0: RIGHT EAR
Labels of node 6 are:  0: LEFT EYEBROW
Labels of node 7 are:  0: RIGHT EYEBROW
Labels of node 8 are:  0: LEFT EYE
Labels of node 9 are:  0: RIGHT EYE
Labels of node 10 are: 0: NOSE
Labels of node 11 are: 0: MUSTACHE
Labels of node 12 are: 0: CHIN
Labels of node 13 are: 0: EARS
  Subnodes: 4 5
Labels of node 14 are: 0: EYEBROWS
  Subnodes: 6 7
Labels of node 15 are: 0: EYES
  Subnodes: 8 9
Labels of node 16 are: 0: LEFT ARM
Labels of node 17 are: 0: RIGHT ARM
Labels of node 18 are: 0: BODY
Labels of node 19 are: 0: ARMS
  Subnodes: 16 17

Relations

There are 853 relations registered for this image.

List 2. Label information of the image in Figure 10(b)

Hierarchy

Node 0:  0: BALD MAN HOLDING A DARUMA
  Subnodes: 1 19 35
Node 1:  0: DARUMA  1: JAPANESE LUCK MASCOT
  Subnodes: 2 14
Node 2:  0: FACE
  Subnodes: 7 8 11 12 13
Node 3:  0: LEFT EYEBROW
Node 4:  0: RIGHT EYEBROW
Node 5:  0: LEFT EYE  1: EYE
Node 6:  0: RIGHT EYE  1: EYE
Node 7:  0: NOSE
Node 8:  0: MOUTH
Node 9:  0: LEFT MUSTACHE PART
Node 10: 0: RIGHT MUSTACHE PART
Node 11: 0: EYEBROWS
  Subnodes: 3 4
Node 12: 0: EYES
  Subnodes: 5 6
Node 13: 0: MUSTACHE
  Subnodes: 9 10
Node 14: 0: BODY
  Subnodes: 15 16 17 18
Node 15: 0: TOP
Node 16: 0: RIGHT SIDE
Node 17: 0: LEFT SIDE
Node 18: 0: BOTTOM
Node 19: 0: BALD MAN  1: PERSON  2: MALE  3: ADULT
  Subnodes: 20 34
Node 20: 0: HEAD
  Subnodes: 21 26 27 30 31 32 33
Node 21: 0: HAIR
Node 22: 0: LEFT EYEBROW
Node 23: 0: RIGHT EYEBROW
Node 24: 0: RIGHT EYE
Node 25: 0: LEFT EYE
Node 26: 0: NOSE
Node 27: 0: MOUTH
Node 28: 0: LEFT CHEEK
Node 29: 0: RIGHT CHEEK
Node 30: 0: BALD SPOT
Node 31: 0: EYEBROWS
  Subnodes: 22 23
Node 32: 0: EYES
  Subnodes: 24 25
Node 33: 0: CHEEKS
  Subnodes: 28 29
Node 34: 0: BODY
Node 35: 0: BACKGROUND

Relations

There are 2648 relations registered for this image.

List 3. Label information of the image in Figure 10(c)

Hierarchy

Labels of node 0 are:  0: LAUGHING DARUMA  1: JAPANESE LUCK MASCOT
  Subnodes: 1 19
Labels of node 1 are:  0: DARUMA  1: ARTIFACT
  Subnodes: 2 17 18
Labels of node 2 are:  0: FACE  1: HUMAN FACE
  Subnodes: 3 8 12 13 14 15 16
Labels of node 3 are:  0: HAIR
Labels of node 4 are:  0: RIGHT EYEBROW
Labels of node 5 are:  0: LEFT EYEBROW
Labels of node 6 are:  0: RIGHT EYE
Labels of node 7 are:  0: LEFT EYE
Labels of node 8 are:  0: NOSE
  Subnodes: 9
Labels of node 9 are:  0: PIMPLE  1: SWELLING SPOT
Labels of node 10 are: 0: LEFT MUSTACHE SIDE
Labels of node 11 are: 0: RIGHT MUSTACHE PART
Labels of node 12 are: 0: MOUTH
Labels of node 13 are: 0: BEARD
Labels of node 14 are: 0: EYEBROWS
  Subnodes: 4 5
Labels of node 15 are: 0: EYES
  Subnodes: 6 7
Labels of node 16 are: 0: MUSTACHE
  Subnodes: 10 11
Labels of node 17 are: 0: BASE
Labels of node 18 are: 0: BODY
Labels of node 19 are: 0: BACKGROUND

Relations

There are 799 relations registered for this image.

List 4. Label information of the image in Figure 10(d)

Hierarchy

Labels of node 0 are:  0: SMILING YOUNG DARUMA  1: JAPANESE LUCK MASCOT
  Subnodes: 1 16
Labels of node 1 are:  0: DARUMA  1: STATUE  2: ARTIFACT
  Subnodes: 2 5
Labels of node 2 are:  0: FACE
  Subnodes: 6 7 10 11 14 15
Labels of node 3 are:  0: TOP
Labels of node 4 are:  0: FRONT
Labels of node 5 are:  0: BODY
  Subnodes: 3 4
Labels of node 6 are:  0: HAIR
Labels of node 7 are:  0: LEFT EYEBROW
Labels of node 8 are:  0: LEFT EYE
Labels of node 9 are:  0: RIGHT EYE
Labels of node 10 are: 0: NOSE
Labels of node 11 are: 0: SMILING MOUTH  1: MOUTH
Labels of node 12 are: 0: LEFT CHEEK
Labels of node 13 are: 0: RIGHT CHEEK  1: CHEEK
Labels of node 14 are: 0: EYES
  Subnodes: 8 9
Labels of node 15 are: 0: CHEEKS
  Subnodes: 12 13
Labels of node 16 are: 0: BACKGROUND

Relations

There are 588 relations registered for this image.