11
Parallel Network Organization Algorithm for Graph Matching and Subgraph Isomorphism Detection Keita Maehara Department of Computer and Systems Engineering, Faculty of Engineering, Kobe University, Kobe, Japan 657-8501 Kuniaki Uehara Research Center for Urban Safety and Security, Kobe University, Kobe, Japan 657-8501 SUMMARY Data representations using graphs are very flexible and are used in a wide variety of fields. The development of algo- rithms to perform basic processing at high speeds is vital for detecting subgraphs with important meaning from a graph set and for searching for subgraphs which match a given graph. However, as the number of graphs in question increases, the computation costs required for processing rise dramatically. In this paper, the authors describe an algorithm which detects isomorphisms which have several graphs in common from among a set of labeled directed graphs and then organizes them into a network based on the detected isomorphisms. This algo- rithm also provides greater processing speed through heuristics using the MDL (Minimum Description Length) principle. ' 2000 Scripta Technica, Syst Comp Jpn, 31(8): 6878, 2000 Key words: Graph matching; MDL principle; par- allel processing; organization algorithm. 1. Introduction Graph representations are used in a variety of fields, including circuit design, knowledge representation, and image recognition, due to their flexibility and applicabil- ity as a means to represent data. Using graph repre- sentations for databases is extremely valuable for efficient management. In addition, valuable information can be found in graph representations, making searches of graphs with particular structures a vital topic. For instance, if a structure which appears repeatedly in a circuit can be extracted, the circuit can be designed efficiently using the circuit element which corresponds to this structure. However, in general, the computation costs for algorithms used in graphic representations are high, and there is considerable need for a faster algo- rithm. In this paper, the authors detect an isomorphism which is shared by multiple graphs within a set of labeled directed graphs and then describe an algorithm to organ- ize the detected structure as a network. This algorithm provides faster processing through heuristics using the MDL (Minimum Description Length) principle [6]. 2. Basic Approach 2.1. Organization of a graph set Let us consider finding a graph with a particular isomorphism from within a given set of graphs. In this instance, matching must be performed for the isomor- ' 2000 Scripta Technica Systems and Computers in Japan, Vol. 31, No. 8, 2000 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J82-D-I, No. 1, January 1999, pp. 111120 Contract grant sponsor: Japanese Ministry of Education Grant-in-Aid for Scientific Research; Contract grant sponsor: Research for the Future Program of the Japan Society for the Promotion of Science. 68

Parallel network organization algorithm for graph matching and subgraph isomorphism detection

Embed Size (px)

Citation preview

Parallel Network Organization Algorithm for Graph Matching

and Subgraph Isomorphism Detection

Keita Maehara

Department of Computer and Systems Engineering, Faculty of Engineering, Kobe University, Kobe, Japan 657-8501

Kuniaki Uehara

Research Center for Urban Safety and Security, Kobe University, Kobe, Japan 657-8501

SUMMARY

Data representations using graphs are very flexible and

are used in a wide variety of fields. The development of algo-

rithms to perform basic processing at high speeds is vital for

detecting subgraphs with important meaning from a graph set

and for searching for subgraphs which match a given graph.

However, as the number of graphs in question increases, the

computation costs required for processing rise dramatically. In

this paper, the authors describe an algorithm which detects

isomorphisms which have several graphs in common from

among a set of labeled directed graphs and then organizes them

into a network based on the detected isomorphisms. This algo-

rithm also provides greater processing speed through heuristics

using the MDL (Minimum Description Length) principle.

© 2000 Scripta Technica, Syst Comp Jpn, 31(8): 68�78, 2000

Key words: Graph matching; MDL principle; par-

allel processing; organization algorithm.

1. Introduction

Graph representations are used in a variety of fields,

including circuit design, knowledge representation, and

image recognition, due to their flexibility and applicabil-

ity as a means to represent data. Using graph repre-

sentations for databases is extremely valuable for

efficient management. In addition, valuable information

can be found in graph representations, making searches

of graphs with particular structures a vital topic. For

instance, if a structure which appears repeatedly in a

circuit can be extracted, the circuit can be designed

efficiently using the circuit element which corresponds

to this structure. However, in general, the computation

costs for algorithms used in graphic representations are

high, and there is considerable need for a faster algo-

rithm. In this paper, the authors detect an isomorphism

which is shared by multiple graphs within a set of labeled

directed graphs and then describe an algorithm to organ-

ize the detected structure as a network. This algorithm

provides faster processing through heuristics using the

MDL (Minimum Description Length) principle [6].

2. Basic Approach

2.1. Organization of a graph set

Let us consider finding a graph with a particular

isomorphism from within a given set of graphs. In this

instance, matching must be performed for the isomor-

© 2000 Scripta Technica

Systems and Computers in Japan, Vol. 31, No. 8, 2000Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J82-D-I, No. 1, January 1999, pp. 111�120

Contract grant sponsor: Japanese Ministry of Education Grant-in-Aid for

Scientific Research; Contract grant sponsor: Research for the Future

Program of the Japan Society for the Promotion of Science.

68

phism with respect to all of the given graphs. However, as

the number of graphs becomes larger, the costs of match-

ing rise, making a search in a realistic amount of time

impossible. Therefore, in order to search efficiently, re-

dundant repetition within matching must be curtailed.

In expert systems, the RETE algorithm [3] has been

proposed in order to accelerate matching of elements in

working memory and conditional rules. The RETE algo-

rithm is a method which attempts acceleration by creat-

ing a RETE network from the various elements of the

conditional rules and then matching them to the elements

in working memory in only one pass. Here, if individual

graphs correspond to a production rule and the subgraphs

correspond to the elements in working memory, graph

matching can be made more efficient by using a method

like this one. For instance, given Graph1 and Graph2 in

a set of graphs as in Fig. 1, if the common subgraphs are

focused on, the network structure shown in the lower area

can be created.

If the common structure of a graph can be extracted

and organized, then search speeds can be raised. However,

when the appropriate common structures which a graph has

are not reflected in the network structure, searches cannot

be made more efficient. For instance, Fig. 2 shows a case

in which a separate network structure has been created from

the same graph. In this figure, the common isomorphisms

in the two graphs are not extracted as network nodes. As a

result, there will be no reduction in matching costs even if

organization is performed. As such, the following

three points must be considered when creating net-

works.

x Improvements in search efficiency

In order to increase search efficiency through organi-

zation, the network structures must be optimized. In order

to do this, the appropriate isomorphisms must be extracted

after understanding the structure of the whole graph, and

the organization must be performed based on the results.

x Efficient descriptions of graph sets

Being able to represent the original data with the

smallest amount of information is desirable from the stand-

point of the amount of memory space. In other words,

decreasing the description costs of the network structure

means increasing the efficiency of memory management.

As is the case when using common data compression

algorithms, in graph processing the expression of the origi-

nal data using a smaller amount of information must be

considered based on the structures that repeatedly appear in

multiple graphs.

x Discovery of the important isomorphisms

The isomorphisms that appear repeatedly in a graph

set are considered to have an important conceptual signifi-

cance. Therefore, if such structures can be extracted, they

will have great practical value. From this perspective,

paying attention to isomorphisms common among sev-

eral graphs is vital.

Fig. 1. Network structure for graph retrieval.

Fig. 2. Inefficient network structure.

69

3. Common Structures in a Graph Set

3.1. Discovering isomorphisms using the MDL

principle

Let us consider extracting the subgraph common to

several graphs from a given graph set. Figure 3 represents

an example of a case in which a different subgraph is given

attention with respect to the same graph set. Figure 3(a)

represents a case in which attention is given to a total of two

types of subgraphs, one that is common to Graph 1 and

Graph 4, the other common to Graph 2 and Graph 3. Figure

3(b) represents a case in which attention is given to a total

of four types of subgraphs, two of which are common to

Graph 1 and Graph 2, the other two of which are common

to Graph 3 and Graph 4.

Here, if the selected subgraph is represented as a

single vertex, the graph set can be described more effi-

ciently. For instance, given the four graphs shown in Fig. 3,

if the subgraphs considered in each of the figures are

represented using a single vertex, the number of vertices

and such for the graph set can be reduced (Fig. 4).

In Fig. 4(a), the subgraph considered in Fig. 3(a) is

replaced with one vertex, and the number of vertices for

each graph is reduced to three, and the number of edges is

reduced to two. In Fig. 4(b), the number of vertices for each

graph is reduced to two, and the number of edges to one. In

this fashion, there are numerous possibilities for a subgraph

which has several graphs in common. As a result, some kind

of criterion must be defined.

A policy in which a subgraph includes a larger num-

ber of graphs in common could represent one such criterion.

This would be ideal in the sense of being able to describe

many graphs using the same subgraph. However, the size

of a subgraph that contains many graphs in common is

generally small, and as a result this policy would not be

optimal. In addition, a policy in which a subgraph of a large

size is used could be considered. However, although this

would be ideal in terms of being able to describe the graph

using a small number of vertices, the number of graphs that

contain common subgraphs of a large size is few in general.

As a result, this policy cannot be considered optimal either.

The MDL principle adopted in the authors� re-

search is intermediate between these two policies. In

other words, the optimal model in the MDL principle is

a model for which the sum of the description length of

the model itself which explains the data and the descrip-

tion length for when the data are described using the

model is a minimum. If this is applied to the problem of

description in graph sets, a subgraph which keeps the

sum of the description length of a particular subgraph

and the description length for when the graph set is

described using that particular graph at a minimum

would be the ideal as a model.

In other words, when considering the graph set

GSet, if one combination of subgraphs which have in

common the graphs included in Gset is taken to be

SGList, then the evaluation function Eval�SGList� for

SGList is given by the following equation, according to

the MDL principle:

Note that DL�SGList� is the description length for SGList,

and DL�GSet|SGList� is the description length when GSet

is rewritten using SGList.

DL�SGList� can be defined as follows. When the

number of vertices for the subgraph SG is set as v, the

number of labels for each vertex is lv, the number of edges

is e, and the number of labels for the edges is le, then theFig. 3. Graphs and their subgraphs.

Fig. 4. Efficient description of graphs.

(1)

70

number of bits required to encode the vertices and the edges

are given by

respectively. In addition, the number of bits required to

encode the information as to what vertices the two edges

are connected to is given by

Therefore,

Furthermore, the description length DL�G|SG� for G when

the subgraph SG of the graph G is taken to be one vertex is

defined as follows. The subgraph equivalent to SG is taken

to include nSG elements in G, and after the exchange the

number of vertices becomes vc and the number of edges, ec.

At this time, if encoding is performed, then

without losing the information on the nodes to which both

ends of the edges of the graph are connected to.

The evaluation function Eval�SGList� is a measure of

how efficiently the graph set is represented as the value

becomes smaller. For instance, the description length of

Figs. 3(a) and 3(b) is 113.4 bits for both, according to Eq.

(5). However, the evaluation values when describing the

original graph by focusing on the subgraph are, based on

Eq. (1), 86.2 bits when using Fig. 4(a) and 59.0 bits when

using Fig. 4(b). Therefore, it is clear that from the stand-

point of the MDL principle, the four types of subgraphs in

Fig. 4(b) should be used in order to describe a given graph

set more efficiently.

3.2. Narrowing the search space

Even when using the MDL principle, every subgraph

must be evaluated in order to detect a means to represent

the original graph more efficiently. For instance, given a

graph with n vertices, a method to use nodes from among

them lies in ¦i 1n

nCi. Moreover, it is clear that when the

combination or linking patterns of nodes are considered, the

number of subgraphs to be evaluated rises dramatically. In

addition, in practice it is also necessary to consider combi-

nations of subgraphs that share several graphs. As the scale

of the problem increases, the number of possible combina-

tions explodes.

As a result, the authors attempted in their research to

accelerate processing by using heuristics with the idea that

if the description length is small when describing a graph

set using a particular graph, then the description length will

also be small when describing a graph set using a different

graph. For instance, Fig. 5 shows the state when the Sub-

graph1 common to Graph1, Graph2, and Graph3 is already

selected as the solution and Subgraph2 and Subgraph3 exist

as candidates to be selected next. Although the number of

vertices and the number of edges for Subgraph2 and Sub-

graph3 are the same, Subgraph2 is represented twice in the

graph set and Subgraph3 is represented three times. As a

result, the length of the description when describing the

graph set using its individual units is smaller for Subgraph3.

Therefore, Subgraph3 is used when describing the graph set

in combination with Subgraph1.

4. Network Generation Algorithm

In this section, the parallel network organization al-

gorithm for the graph set using the MDL principle is

described. A network is created using this algorithm in

order to search a given graph set efficiently. Each node

(2)

(3)

(4)

(5)

(6)

(7)

(8)

(9)

Fig. 5. Selection policy of subgraph.

71

within the network corresponds to a detected subgraph. In

addition, the nodes which correspond to subgraphs com-

mon to multiple graphs are shared in the network, and as a

result, duplication of matching can be avoided, and graph

searches can be performed more efficiently.

4.1. Details of the algorithm

Only one node exists in the initial state. This node is

called the input node. A given graph is always passed to this

input node first. The graph is broken down into node sets at

the input node, and the corresponding vertex nodes are

generated. The vertex nodes are linked to the input node as

a child node of the input node (Fig. 6).

Figure 7 shows this procedure using pseudo-code. In

the initial state, there are no child nodes for the input node

at all. As a result, a vertex node newNode which has the

same label as the branching vertex node vertex is created,

and an instance of the vertex vertex is stored.*

Next the internodes are generated. An internode rep-

resents a subgraph generated using a combination of vertex

nodes and/or other internodes. Figure 8 illustrates an exam-

ple of the generation of internodes. Because the evaluation

of the nodes themselves is performed independently in

various parts of the network, higher speeds can be realized

through parallel processing.

Figure 9 shows this procedure using pseudo-code.

Within this procedure, attention should be paid to instance1

and instance2, which are linked together under the same

graph, from among the instances held in the node in the

network at a particular point in time. Then, the description

length of the graph set for when the original graph set is

described is calculated using the subgraph when combining

these instances. Next, in accordance with heuristics based

on the MDL principle, the subgraph that decreases the

description length of the original graph set can be consid-

ered as making the description length even smaller when it

is combined with other subgraphs. As a result, a new node

named newNode is generated from a combination which is

evaluated more highly.

In the authors� research, several processors evaluate

the nodes independently and then actually generate the one

which is evaluated most highly. This is the same as search-

ing in parallel for all subgraphs included in the given graph

set and then selecting the one which is evaluated more

highly. For instance, in Fig. 8, the processes Process1 and

Process2 evaluate the graph which has edge p from vertex

y to vertex x, respectively, and the graph which has edge r

from vertex y to vertex z. As a result, when the evaluation

of the graph which has edge q from vertex y to vertex x is

high, a node corresponding to that graph is actually gener-

ated, as shown in Fig. 10.

Fig. 6. Generation of vertex nodes.Fig. 7. Vertex-node generation algorithm.

*When there is more than one vertex with the same label, the same numbers

of instances are stored in one vertex node. Fig. 8. Generation of internodes (a).

72

When a new node is generated, the instance corre-

sponding to the two parent nodes is eliminated and that

node is stored as a new single node. In Fig. 10, because the

evaluation of the graph which had edge q from vertex y to

vertex x was evaluated most highly, the instance stored at

vertex node y and vertex node x is eliminated, and is stored

as a new single instance at the newly generated node. This

instance then becomes a candidate when creating even more

new nodes. In Fig. 10, the generated node is evaluated using

Process1. In this fashion, new nodes are generated one after

the other in all parts of the network.

The nodes corresponding to the individual graphs in

the original graph set are ultimately generated in the low-

ermost part of the network by repeating the above process.

The combinatorial process is complete when the nodes

corresponding to all the graphs have been generated.

5. Experiments

5.1. Validity of heuristics based on the MDL

principle

In order to demonstrate the validity of the heuristics

described in Section 3.2, the three graph sets with a total

number of vertices of 12 and 14 and the four graph sets

shown in Fig. 3 (total number of vertices, 16) were input,

and an experiment was performed to search for a subgraph

combination that could describe the original graph set more

efficiently. Table 1 shows the processing time and evalu-

ation value from Eq. (1) for search methods that were

completely random, that used heuristics as described in

Section 3.2, and that took into consideration all combina-

Fig. 9. Internode generation algorithm.

Fig. 10. Generation of internodes (b).

Table 1. Comparison of three subgraph search methods

(number of vertices 12, 14, 16)

73

tions of subgraphs (brute force). Note that the results rep-

resent the means and variances for experiments performed

10 times.

When using random as the search method, first one

candidate was selected at random from among the sub-

graphs generated at first, and then it was evaluated using

Eq. (1). When this evaluation value was even smaller than

the description length of the original graph set, that candi-

date was considered to be a solution, and then another

candidate was selected at random. Next, the subgraph se-

lected as a solution and the newly selected candidate were

combined, with the result being evaluated again using Eq.

(1). If this evaluation value was even smaller than the

previous one, the newly selected candidate was added as a

solution. This procedure was performed until the evaluation

value ceased to become smaller. On the other hand, when

heuristics were used as the search method, selection of

candidates for which the evaluation value of the subgraph

was small according to Eq. (1) were given priority.

As can be seen in Table 1, all combinations are

considered when brute force is used. As a result, there is a

combinatorial explosion for 16 vertices, making measure-

ments impossible. Consequently, the same experiment was

performed using only random and heuristic searches for

five subgraphs with 257 vertices. Note that the results

represent the means and variances for experiments per-

formed 10 times. From these experimental results it can be

seen that if the above heuristics are used, the processing

time is compressed. In addition, as the respective variances

show, in contrast to the large dispersion in the processing

time for random searches, the processing time for heuristic

searches is kept within a comparatively narrow space. Fur-

thermore, in comparison with the evaluation performed at

random, the given graph set is described more efficiently.

However, in both cases there was a large scatter of the

evaluations, and it is difficult to say that an optimum sub-

structure has been detected.

5.2. Effects of parallelization

In order to evaluate the effects of using parallel proc-

essing, content description based on the Japanese children�s

story �Omusubi Kororin� was performed, and then repre-

sented in a graph using the conceptual dependency repre-

sentation [5] proposed by Schank. The data in question

were four types of graph sets composed of a total number

of vertices of 257, 514, 771, and 1285. Note that for the

experiment a Silicon Graphics Origin 2000 (CPU: R10000

u 8; main memory: 512 Mbytes; OS: IRIX 6.4) was used.

Figure 11 shows the relationship between the number of

processors used and the processing time. Each result repre-

sents the average value obtained by performing the experi-

ment 10 times. With the evaluation of the newly generated

network nodes done in parallel, a clear increase in process-

ing speed was seen with respect to the increase in the

number of processors (Fig. 11).

As has already been described, the processing which

generates the internodes in Fig. 9 consists of evaluation

processing of nodes which calculates the description length

of the graph set with respect to combinations of instances

linked at the edges of the original graph, selecting the

smallest evaluation value from the combinations of in-

stances evaluated in the node evaluation process, and then

determining and generating the node which is to generate

the internode using that combination.

Because the portion of the node evaluation process-

ing in this system is performed independently by each

processor, the evaluation of one node can be performed

many times. If such duplication can be eliminated, far

greater speeds can be achieved.

In the area which determines and generates the nodes,

each instance which is stored in the two parent nodes which

generate a node is eliminated, and one new instance is

delivered for the newly generated child node. However,

caution is required when several processors are performing

such processes, as is the case with this algorithm. For

instance, the dotted lines in Fig. 12 indicate the situation in

Table 2. Comparison of two subgraph search methods

(number of vertices 257)

Fig. 11. Experimental result of parallel network

organization algorithm.

74

which several processors simultaneously generate two

graphs for a graph which has edge r from vertex y to vertex

z.

As shown in Fig. 12, if before an instance is erased

from a parent node a separate processor generates a new

child node using the same instance, instances which were

originally one and the same (in this case, the instance in

which the vertex node corresponding to vertex y is held)

end up being duplicated. As a result, an element which

originally represented one node in the graph can end up

being represented as a network of several elements. To put

this a different way, the node evaluation processing can be

performed independently in each part of the network, but

there must not be more than one processor engaged in

determining and generating nodes. In other words, the

process of determining and generating nodes represents a

critical region of this algorithm.

Furthermore, although the number of combinations

as well as the processing time increase for node evaluation

processing when the scale of the graph rises, the processing

to determine and generate nodes generates only one node

from among the evaluated combinations. Consequently, it

is small enough that it can be ignored, regardless of the scale

of the graph.* Therefore, even when the scale of the graph

grows larger, the time required for critical region calcula-

tions is expected to remain small, and the effects on paral-

lelization minimal.

5.3. Detection of subgraph isomorphism

Figure 3 shows the results of applying this organizing

algorithm to the data described above. In the original graph

expression, the relationships among objects were complex,

and understanding the structure was very difficult. How-

ever, by organizing them, the data take on layers, and the

relationships between them become clearer. This is useful

for searches. Moreover, in Fig. 13, structures which corre-

spond to various scenes, from small ones such as �a scene

in which an old man holds chopsticks� to comparatively

large ones such as �a scene in which a mouse�s daughter

holds a dish and brings it to the old man from outside the

room,� can be identified. However, because only the de-

scription length of the graph set is given attention in this

algorithm, semantically significant elements are not neces-

sarily found. Methods to find semantically significant struc-

tures by using background knowledge as a standard for

using isomorphism, in addition to the MDL principle, have

been considered with respect to this problem.

6. Related Research

Several algorithms similar to the one proposed in this

paper have been proposed for organization based on iso-

morphism and detection of important structures from

among large amounts of data. Among these, SEQUITUR

[2] represents an algorithm that detects grammar rules in

symbol sequences by focusing on partial sequences that

appear repeatedly in symbol sequences scanned in se-

quence from their start. An example of SEQUITUR would

be the detection of a group of words with the same meaning

or structure from within one sentence written using many

words or the extraction of a phrase repeated in the lyrics of

a song.

SEQUITUR is designed with two restrictions for

identifying grammar rules: Diagram Uniqueness and Rule

Utility. Diagram Uniqueness is a restriction in which two

neighboring symbols cannot appear under several rules. If

such a partial sequence is detected, that partial sequence is

replaced with a new nonterminating symbol immediately.

This can be seen as being the same as the procedure to

�combine small structures that appear in several graphs and

detect larger structures� in the authors� research. Rule Util-

ity is a restriction in which a detected rule cannot be used

more than once. This restriction is set up in order to guar-

antee that the rules are used effectively. This can be seen as

being the same as the standard to �detect structures which

have more graphs in common� used in the authors� research.

Therefore, the authors� algorithm can be thought of as a

generalization of SEQUITUR from simple symbol se-

quences to graph representations.

SUBDUE [1] is an algorithm which detects layered

structures which have graphs using the MDL principle.

SUBDUE uses one vertex of a graph as an isomorphism

when evaluating isomorphisms. Then it performs evalu-

ations for each subgraph composed of that vertex and other

Fig. 12. Instance passing in the network.

*As the scale of the graph grows larger, the number of internodes created

increases. As a result, the processing time for node generation may rise

slightly due to instance processing. This is, however, small enough to be

ignored when compared to the overall processing time.

75

vertices linked to it. The one that is evaluated most highly

is adopted and becomes the new isomorphism. SUBDUE

uses a strategy in which the above procedure is repeated and

the one isomorphism included in a graph is expanded.

Therefore, the number of isomorphisms detected from a

graph under SUBDUE will always be 1. In addition,

searches of graphs are not considered, something else that

makes SUBDUE different from the algorithm described in

this paper.

NA [4] is an algorithm which, like the authors�,

organizes a graph set into a network by focusing on the

subgraphs which have several common graphs. NA is a

system which performs searches of graphs which have

structures identical to newly input graphs from among a

given graph set. NA is an incremental algorithm which

processes a given graph consecutively in sequence. NA

does not use a method to select appropriate structures from

among the many subgraph combinations, and as a result the

structure finally obtained is not necessarily an appropriate

isomorphism with respect to the original graph set. When

organization is completed without detecting an appropriate

structure, matching for the same isomorphism may be

performed more than once. Consequently, the matching

costs may not be reduced, regardless of what organization

is performed.

The GBI (Graph Based Induction) [8] method has

been proposed as a way to extract isomorphisms for graphs

based on the idea of �expanding and combining patterns

discovered in previous iterations� applied to automated user

modeling [7] and to perform classification code learning

from graphs. Under the sequential pair expansion algorithm

in the GBI method, the original graph set can be described

using a combination of several graphs, as is the case with

the authors� algorithm. However, the use of statistical stand-

ards in class rule learning as a standard for extracting

similar structures which appear in several graphs is differ-

ent from the authors� algorithm. Here, �similar� refers to

the frequency of occurrence in the data. In the MDL prin-

ciple, the evaluation criterion used for the authors� algo-

rithm, the final description efficiency represents a problem.

The effect errors in this criterion have on the isomorphisms

ultimately obtained represents a topic for future study.

7. Conclusions

In this paper the authors describe the usefulness of a

multiprocessor environment for their algorithm in addition

to explaining a parallel algorithm used to structure net-

works for graph searches from graph set isomorphisms

detected using the MDL principle.

Fig. 13. Interesting subgraphs discovered from network.

76

The load and distribution of the parallel algorithm are

not considered sufficiently under the present algorithm. As

was described in Section 5, each processor evaluates node

in a completely independent fashion. As a result, an evalu-

ation of one particular node may be performed several

times. The authors believe that far greater speeds can be

achieved by eliminating such duplication to the extent

possible and making the processing performed by each

processor more efficient.

Future topics of study will include expressions within

the data. In order to calculate the evaluation value when

generating new nodes from instances of two internodes

within a network, whether or not there is an edge connecting

these two instances must be checked. For instance, when

considering a case of an instance with m vertices and an

instance with n vertices, the number of combinations of

vertices to be checked is nm. At present, a representation of

a neighbor list requiring minimal storage area is used as an

internal representation of the graph. If the size of each graph

is not too large, the effects of the problem of the storage

area are not expected to be too great. Therefore, if each

process can be returned for bit calculations by using an

array expression, faster processing can be realized.

Acknowledgments. This project was supported in

part by a Japanese Ministry of Education Grant-in-Aid for

Scientific Research on Priority Area: �Research and Devel-

opment of Advanced Database Systems for Integration of

Media and User Environments,� and by Research for the

Future Program of the Japan Society for the Promotion of

Science under project �Researches on Advanced Multime-

dia Contents Processing.� The authors thank Mr. Yoshinori

Nakanishi, a doctoral candidate at Kobe University, for

supervising the follow-up experiments.

REFERENCES

1. Cook DJ, Holder LB. Substructure discovery using

minimum description length and background knowl-

edge. J Artif Intell Res 1994;1:231�255.

2. Nevill-Manning CG, Witten IH. Identifying hierar-

chical structure in sequences: A linear-time algo-

rithm. J Artif Intell Res 1997;7:67�82.

3. Forgy CL. RETE: A fast algorithm for the many

pattern/many object pattern match problem. Artif

Intell 1982;19:17�37.

4. Messmer BT, Bunke H. A network based approach to

exact and inexact graph matching. Tech Rep LAM-

93-021, University of Berne, 1993.

5. Schank RC, Riesbeck CK. Inside computer under-

standing: Five programs plus miniatures. Erlbaum;

1981.

6. Yamanishi K. Introduction to MDL from viewpoints

of computational learning theory. J Jpn Soc Artif

Intell 1992;7:435�442.

7. Yoshida K, Motoda H. Automated user modeling for

intelligent interface. Int J Hum Comput Interact

1996;8:237�258.

8. Yoshida K, Motoda K. Inductive inference by step-

wise pair expansion. J Jpn Soc Artif Intell

1997;12:58�67.

AUTHORS

Keita Maehara graduated from the Department of Computer and Systems Engineering of Kobe University in 1996. He

completed the first half of his doctoral studies there in 1998. Currently, he is working at Yamaha, Inc. While a student, he was

primarily pursuing research related to machine learning and multimedia databases.

77

AUTHORS (continued)

Kuniaki Uehara graduated from the Engineering and Information Department of Osaka University in 1978. He completed

the second half of his doctoral program in 1983. After serving as a lecturer and then an instructor at the Institute of Scientific

and Industrial Research at Osaka University, and subsequently an assistant professor in the Department of Computer and Systems

Engineering at Kobe University, he is now a professor at the Research Center for Urban Safety and Security at Kobe University.

He also serves in the Department of Computer and Systems Engineering at Kobe University. He was a visiting assistant professor

at Oregon State University in 1989 and 1990. He served as assistant director of the General Information Processing Center at

Kobe University from 1994 to 1996. He holds a D.Eng. degree. He is pursuing research related to artificial intelligence, in

particular machine learning, multimedia databases, and human interfaces using natural language. He was the recipient of the

1990 Japanese Society for Artificial Intelligence�s Research Scholarship. He is a member of the Japanese Society for Artificial

Intelligence, the Information Processing Society of Japan, the Mathematical Linguistic Society of Japan, the Japan Society for

Software Science and Technology, and AAAI.

78