[IEEE 2008 IFIP International Conference on Network and Parallel Computing (NPC) - Shanghai, China (2008.10.18-2008.10.21)] 2008 IFIP International Conference on Network and Parallel

Towards Mapping Large Scale Ontologies Based on RFCA with Attribute Reduction*

Pingli Gu, Jiuyun Xu1, Changbao Li, and Youxiang Duan

School of Computer & Communication Engineering, China University of Petroleum

[email protected];[email protected]; [email protected]

* This research work is supported by the graduate student innovation fund of China University of Petroleum. 1 Corresponding authors: Pingli Gu, [email protected]; Jiuyun Xu, [email protected]

Abstract

Ontology mapping is one of the most fundamental issues in achieving interoperability between heterogeneous and distributed ontologies. Many ontology mapping models have been proposed so far, and the RFCA mapping model is a promising one. However, establishing relationships among large-scale ontologies remains a challenge for the Semantic Web. This paper addresses the reduction of the formal context in the mapping process of the RFCA model so that the model can handle large-scale ontologies. To this end, we propose a method that uses attribute reduction to enhance the RFCA ontology mapping method. With attribute reduction, the RFCA method can be adapted to large-scale ontology mapping. A prototype has been implemented based on this method, and the experimental results indicate that it is a promising approach for large-scale ontology engineering.

Keywords: Ontology Mapping, RFCA, Attribute Reduction, Large Scale Ontology, Formal Concept Analysis.

1. Introduction

Ontology engineering is one of the infrastructures of the Semantic Web, and ontology mapping is a major challenge in making ontology engineering practical. As a formal method, an ontology is an explicit formal description of domain knowledge: it describes a domain explicitly in terms of concepts, properties of concepts, constraints on properties, and attributes. In general, an ontology includes many concepts, individual instances of those concepts, and the relationships among them. To support interoperation, OWL, the ontology language recommended by the W3C, is usually adopted as the ontology description language. Because local ontologies are distributed, different ontologies are built with different representations according to specific requirements.

For instance, ontologies of the same or overlapping domains may use different names for the same concept, or different structures for the same domain. Therefore, to achieve interoperability among distributed ontologies with different representations, mapping relations must first be established. Moreover, as ontology technology develops, more and more ontologies are being built for the Semantic Web, and they keep growing larger. Ontology mapping methods must therefore scale to large ontologies.

According to the systematic classification proposed by Rahm [1], ontology mapping methods build mapping models based on the schema, structure, instances, and other features of ontologies. For instance, the main idea of the Cupid method [2], proposed by Madhavan, Bernstein, and Rahm, is that two concepts tend to be similar if their sub-concepts are similar, and that two concepts are also considered similar if their ancestors are similar. The GLUE method [3], proposed by Doan et al., is based on multi-strategy learning. The Ratio model [4] uses the number of common features of entities to determine concept similarity, and the Feature Number model [5] uses the number of features to compute similarity. These two models consider only the number of features and ignore the structural relations between features, which reduces mapping accuracy. The FCA model [6], proposed by Souza and Davis, is based on the set of common features and the set of common structural elements of the concept lattice, which are called meet-irreducible elements.

To address the performance limitations of the FCA model proposed by Souza and Davis, Zhao et al. [7] suggested an ontology mapping model based on rough sets and formal concept analysis (RFCA). The RFCA model extends formal concepts with rough set theory, improving on the FCA model [6], which can only compute over formal concepts [7]. Compared with other similarity measures, the RFCA-based similarity model performs better and is more reliable, since more information is incorporated into the decision.

2008 IFIP International Conference on Network and Parallel Computing

978-0-7695-3354-4/08 $25.00 © 2008 IEEE

DOI 10.1109/NPC.2008.36

To make these mapping methods suitable for large-scale ontologies, this paper introduces attribute reduction to improve ontology mapping. Following the theory of attribute reduction in [8], the attributes of an ontology are divided into three groups: core attributes, relatively necessary attributes, and unnecessary attributes. The set of unnecessary attributes can be obtained with the attribute reduction method for formal contexts while preserving the structural properties of the original concepts. Removing these unnecessary attributes reduces the scale of the ontology and makes the mapping methods more suitable for large-scale ontologies. In this paper, we take the RFCA model as an example and add an attribute reduction step to it. The resulting model is an extension of the RFCA model that improves mapping efficiency and adapts the model to large-scale domain ontologies. This paper follows our earlier work [10]: it extends our model to handle the common attributes of the ontologies being mapped and reports more general experiments.

The remainder of this paper is organized as follows. Section 2 describes the RFCA mapping model. Section 3 introduces attribute reduction into the RFCA model, based on the theory of attribute reduction of formal contexts. Section 4 illustrates the implementation of attribute reduction with an example. Finally, Section 5 concludes the paper with a discussion.

2. Overview of RFCA mapping model

The RFCA model, based on rough sets and formal concept analysis, was proposed on the basis of the FCA model [6]. It extends formal concepts with the rough lower approximation, improving on the FCA model [6], which can only compute over formal concepts [7]. In this model, a reference concept lattice is first constructed by combining two normalized formal contexts. Rough set theory is then used to calculate the similarity between two ontology nodes, and the final mapping result is obtained with a specified threshold. The mapping process can be summarized in three steps:
1) extracting objects and attributes from the given ontologies to generate a formal context;
2) generating a concept lattice from the formal context;
3) computing the similarity between concept lattice nodes according to a formula.

The formal context in the first step is a notion from the theory of Formal Concept Analysis: a formal context is a triple (U, A, I), where U is a set of objects, A is a set of attributes, and I is a binary relation between U and A. For o ∈ U and a ∈ A, oIa expresses that object o has attribute a. A formal context can be represented as a two-dimensional table (a matrix) whose rows correspond to objects and whose columns correspond to attributes; a cross is marked where a row and a column meet if the object has that attribute. The concept lattice in the second step is the lattice of formal concepts, ordered by the partial order induced by the formal context.
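The cross-table view of a formal context can be made concrete with a small data structure. The Java sketch below is illustrative only: the class `FormalContext` and its methods are hypothetical names, and it adds the standard FCA derivation operators (X' for a set of objects, Y' for a set of attributes) together with the usual definition of a formal concept as a pair (X, Y) with X' = Y and Y' = X.

```java
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical sketch of a formal context (U, A, I) stored as a cross table:
 * row i is object i, column j is attribute j, and incidence[i][j] means "object i I attribute j".
 */
final class FormalContext {
    final List<String> objects;
    final List<String> attributes;
    private final boolean[][] incidence;

    FormalContext(List<String> objects, List<String> attributes, boolean[][] incidence) {
        this.objects = List.copyOf(objects);
        this.attributes = List.copyOf(attributes);
        this.incidence = incidence;
    }

    /** oIa: true if object o has attribute a. */
    boolean has(int o, int a) {
        return incidence[o][a];
    }

    /** X': the attributes shared by every object in X (all of A when X is empty). */
    BitSet intent(BitSet objs) {
        BitSet result = new BitSet();
        result.set(0, attributes.size());
        for (int o = objs.nextSetBit(0); o >= 0; o = objs.nextSetBit(o + 1)) {
            for (int a = 0; a < attributes.size(); a++) {
                if (!incidence[o][a]) {
                    result.clear(a);
                }
            }
        }
        return result;
    }

    /** Y': the objects having every attribute in Y (all of U when Y is empty). */
    BitSet extent(BitSet attrs) {
        BitSet result = new BitSet();
        result.set(0, objects.size());
        for (int a = attrs.nextSetBit(0); a >= 0; a = attrs.nextSetBit(a + 1)) {
            for (int o = 0; o < objects.size(); o++) {
                if (!incidence[o][a]) {
                    result.clear(o);
                }
            }
        }
        return result;
    }

    /** (X, Y) is a formal concept exactly when X' = Y and Y' = X. */
    boolean isConcept(BitSet objs, BitSet attrs) {
        return intent(objs).equals(attrs) && extent(attrs).equals(objs);
    }

    public static void main(String[] args) {
        // Toy context: o0 has {a0, a1}, o1 has {a1, a2}.
        FormalContext ctx = new FormalContext(
                List.of("o0", "o1"), List.of("a0", "a1", "a2"),
                new boolean[][] {{true, true, false}, {false, true, true}});
        BitSet both = new BitSet();
        both.set(0, 2);
        System.out.println(ctx.intent(both)); // {1}: only a1 is shared by o0 and o1
    }
}
```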

In the third step, the similarity value is calculated according to the following formula:

$$\mathrm{sim}(a,b)=\frac{|L_A(a\vee b)|}{|L_A(a\vee b)|+\alpha\,|L_Aa-L_Ab|+(1-\alpha)\,|L_Ab-L_Aa|}\qquad(1)$$

Here, a and b are formal concepts; $(a \vee b)$ denotes the set of meet-irreducible elements; $L_A(a \vee b)$ denotes the rough lower approximation of the set of meet-irreducible elements; $L_Aa - L_Ab$ denotes the attributes in $L_Aa$ but not in $L_Ab$; $L_Ab - L_Aa$ denotes those in $L_Ab$ but not in $L_Aa$; $\alpha \in [0, 1]$ weights the two difference terms; and $|\cdot|$ denotes the cardinality of a set. Given a specified similarity threshold ό, concepts whose similarity is greater than ό are considered to be mapped.
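Eq. (1) can be evaluated directly once the rough lower approximations are available. The following is a minimal Java sketch, assuming those three sets have already been computed (how they are obtained is outside its scope). The class and method names are hypothetical, and the threshold test follows the "greater than ό" rule above.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Sketch of Eq. (1). The three inputs are assumed to be already-computed rough lower
 * approximations: L_A(a ∨ b), L_A a and L_A b. Computing them is not shown here.
 */
final class RfcaSimilarity {
    static double sim(Set<String> lowerJoin, Set<String> lowerA, Set<String> lowerB, double alpha) {
        if (alpha < 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must lie in [0, 1]");
        }
        Set<String> aMinusB = new HashSet<>(lowerA); // L_A a − L_A b
        aMinusB.removeAll(lowerB);
        Set<String> bMinusA = new HashSet<>(lowerB); // L_A b − L_A a
        bMinusA.removeAll(lowerA);
        double common = lowerJoin.size();
        double denominator = common + alpha * aMinusB.size() + (1.0 - alpha) * bMinusA.size();
        return denominator == 0.0 ? 0.0 : common / denominator; // avoid division by zero
    }

    /** Concepts are mapped when their similarity is strictly greater than the threshold. */
    static boolean isMapped(double similarity, double threshold) {
        return similarity > threshold;
    }

    public static void main(String[] args) {
        double s = sim(Set.of("m1", "m2"), Set.of("m1", "m2", "x"), Set.of("m1", "m2", "y", "z"), 0.5);
        System.out.println(s); // 2 / (2 + 0.5 * 1 + 0.5 * 2) = 2 / 3.5, about 0.571
    }
}
```

With α = 0.5 the two difference terms are weighted equally. Other values of α make the measure asymmetric, penalizing attributes unique to a more or less heavily than those unique to b.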

3. RFCA Model with Attribute Reduction

Compared with other similarity measures, the RFCA mapping model performs better and is more reliable, since more information is incorporated into the decision. To further improve the performance of the RFCA-based similarity measure, this section introduces the theory of attribute reduction of formal contexts. The objective of attribute reduction in the RFCA mapping model is to improve mapping efficiency and the classification ability of the concept lattice by removing unnecessary attributes. For example, Table 1 in Section 4 shows that only one row, that is, one object, has the attribute Postweaning. Intuitively, Postweaning is not important in this ontology; correspondingly, in Figure 2 in Section 4, only one node has the attribute Postweaning. Such unnecessary attributes intuitively affect the classification ability of the concept lattice. The theory of reduction makes this intuition precise and identifies the unnecessary attributes formally.

Figure 1 shows a flowchart of the approach; the part drawn with broken lines is the improvement to the RFCA mapping model.

In the first step of the RFCA mapping process, we obtain the formal context of the given ontologies. As the ontologies grow, the formal context matrix grows as well, which makes the concept lattice built in the second step more complicated and reduces mapping efficiency. Moreover, some unnecessary attributes degrade the quality of the classification of the objects that make up the lattice in the second step.


Figure 1. Flowchart of the RFCA model with reduction

To address this problem, we use the theory of attribute reduction of formal contexts to reduce the attributes of the formal context produced in step 1). This theory yields a minimal attribute subset, removing unnecessary attributes while preserving the structural features of the original concepts.

Following the definition in [8], let (U, A, I) be a formal context and x, y ∈ U. Define

$$\mathrm{similar}(x,y) = f_A(x) \cap f_A(y),$$

which is called the similarity attribute set of x and y. Here, $f_A(x)$ denotes the set of attributes in A possessed by object x, and $f_A(y)$ the set of attributes in A possessed by object y. The matrix $\mathrm{Sim} = (\mathrm{similar}(x,y) : x, y \in U)$ is called the similarity attribute matrix of the formal context (U, A, I).

In other words, similar(x, y) is the set of attributes that objects x and y have in common.
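The similarity attribute matrix can be computed directly from the rows of the cross table. In this hypothetical Java sketch, each object x is represented only by its attribute set f_A(x), and the matrix is a list of rows of sets. The toy objects in `main` are invented for illustration and are not taken from the Beef Cattle or Dairy Cattle ontologies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Sketch of the similarity attribute matrix Sim = (similar(x, y) : x, y in U). */
final class SimilarityMatrix {
    /**
     * f.get(i) plays the role of f_A(x_i), the attributes of object x_i.
     * Entry [i][j] of the result is similar(x_i, x_j) = f_A(x_i) ∩ f_A(x_j).
     */
    static List<List<Set<String>>> build(List<Set<String>> f) {
        List<List<Set<String>>> sim = new ArrayList<>();
        for (Set<String> fx : f) {
            List<Set<String>> row = new ArrayList<>();
            for (Set<String> fy : f) {
                Set<String> common = new TreeSet<>(fx); // sorted copy, for readable output
                common.retainAll(fy);                   // intersection f_A(x) ∩ f_A(y)
                row.add(common);
            }
            sim.add(row);
        }
        return sim;
    }

    public static void main(String[] args) {
        // Hypothetical objects: x0 has {a, b}, x1 has {b, c}, x2 has {c}.
        List<Set<String>> f = List.of(Set.of("a", "b"), Set.of("b", "c"), Set.of("c"));
        List<List<Set<String>>> sim = build(f);
        System.out.println(sim.get(0).get(1)); // [b]
        System.out.println(sim.get(0).get(2)); // []
    }
}
```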

According to the theorem in [8], let (U, A, I) be a formal context and B ⊆ A. Then B is a consistent attribute set if and only if $B \cap \mathrm{similar}(x,y) \neq \emptyset$ for every pair x, y ∈ U with $\mathrm{similar}(x,y) \neq \emptyset$. Now define

$$\Delta = \bigwedge \Big\{ \bigvee \{ a_i : a_i \in \mathrm{similar}(x,y) \} : x, y \in U,\ \mathrm{similar}(x,y) \neq \emptyset \Big\}.$$

By the theorem, each conjunctive term of the minimal disjunctive normal form of Δ corresponds to an attribute reduct of the formal context. Therefore, the reducts, which contain only the essential attributes, are obtained from the minimal disjunctive normal form of Δ.
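The consistency condition in the theorem can be checked mechanically. The sketch below uses a hypothetical class name and toy data. It tests whether a candidate subset B meets every non-empty similar(x, y), illustrating the condition as stated here rather than reproducing any implementation from [8].

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of the consistency test from the theorem in [8], as stated above. */
final class ConsistencyCheck {
    /**
     * f.get(i) is f_A(x_i). Returns true if B meets every non-empty similar(x, y),
     * i.e. B ∩ similar(x, y) ≠ ∅ whenever similar(x, y) ≠ ∅.
     */
    static boolean isConsistent(List<Set<String>> f, Set<String> b) {
        for (int i = 0; i < f.size(); i++) {
            for (int j = i; j < f.size(); j++) { // similar(x, y) is symmetric, so j >= i suffices
                Set<String> similar = new HashSet<>(f.get(i));
                similar.retainAll(f.get(j));
                if (!similar.isEmpty() && Collections.disjoint(similar, b)) {
                    return false; // a non-empty similar(x, y) that B does not meet
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Set<String>> f = List.of(Set.of("a", "b"), Set.of("b", "c"), Set.of("c"));
        System.out.println(isConsistent(f, Set.of("b", "c"))); // true
        System.out.println(isConsistent(f, Set.of("a", "c")));  // false: misses similar(x0, x1) = {b}
    }
}
```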

Accordingly, reducing a formal context mainly involves two steps: 1) computing the similarity attribute sets of the formal context; 2) computing the minimal disjunctive normal form of Δ.

In step 1, the similarity attribute sets are calculated as follows:

Here, B is a temporary set, and the resulting set A is the set of similarity attribute sets: each element of A is the similarity attribute set of a pair of objects, and together these subsets make up A.
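One way to carry out step 1 is sketched below in Java. It is an illustration rather than the original procedure, it uses hypothetical names, and it represents each object only by its attribute set. Because the result is a set of sets, each distinct similarity attribute set is stored once. Empty sets are skipped because they contribute no clause to Δ.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Sketch of step 1: gather the distinct non-empty similarity attribute sets (the clauses of Δ). */
final class SimilarityAttributeSets {
    /** f.get(i) is f_A(x_i); pairs with x = y are included because the definition ranges over x, y ∈ U. */
    static Set<Set<String>> collect(List<Set<String>> f) {
        Set<Set<String>> result = new LinkedHashSet<>(); // a set of sets drops duplicate entries
        for (int i = 0; i < f.size(); i++) {
            for (int j = i; j < f.size(); j++) { // similar(x, y) is symmetric
                Set<String> similar = new TreeSet<>(f.get(i));
                similar.retainAll(f.get(j));
                if (!similar.isEmpty()) {
                    result.add(similar);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> f = List.of(Set.of("a", "b"), Set.of("b", "c"), Set.of("c"));
        System.out.println(collect(f)); // [[a, b], [b], [b, c], [c]]
    }
}
```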

In step 2, the minimal disjunctive normal form of Δ is computed with the absorption law and the distributive law. The main steps are:
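Step 2 can be illustrated with a small Java sketch. It multiplies out Δ clause by clause using the distributive law and prunes the result with the absorption law after each clause. This straightforward, potentially exponential procedure is chosen for clarity, and whether it matches the authors' procedure is an assumption. The names are hypothetical. Under the theorem above, each returned conjunctive term is an attribute reduct.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Sketch of step 2: minimal DNF of Δ via the distributive and absorption laws. */
final class MinimalDnf {
    /**
     * Each clause is one disjunction ∨{a_i : a_i ∈ similar(x, y)}; Δ is their conjunction.
     * Clauses are assumed non-empty. Each returned conjunctive term is an attribute reduct.
     */
    static List<Set<String>> reducts(List<? extends Set<String>> clauses) {
        List<Set<String>> terms = new ArrayList<>();
        terms.add(new TreeSet<>()); // the empty conjunction ("true")
        for (Set<String> clause : clauses) {
            List<Set<String>> expanded = new ArrayList<>();
            for (Set<String> t : terms) {
                if (!Collections.disjoint(t, clause)) {
                    expanded.add(t); // absorption: t ∧ (... ∨ a ∨ ...) = t when a ∈ t
                } else {
                    for (String a : clause) { // distribution: t ∧ (a1 ∨ a2) = (t ∧ a1) ∨ (t ∧ a2)
                        Set<String> next = new TreeSet<>(t);
                        next.add(a);
                        expanded.add(next);
                    }
                }
            }
            terms = absorb(expanded);
        }
        return terms;
    }

    /** Absorption x ∨ (x ∧ y) = x: drop duplicate terms and any term containing another term. */
    private static List<Set<String>> absorb(List<Set<String>> terms) {
        List<Set<String>> unique = new ArrayList<>(new LinkedHashSet<>(terms));
        List<Set<String>> kept = new ArrayList<>();
        for (int i = 0; i < unique.size(); i++) {
            boolean absorbed = false;
            for (int j = 0; j < unique.size() && !absorbed; j++) {
                // unique holds distinct sets, so containsAll here means a proper superset
                absorbed = i != j && unique.get(i).containsAll(unique.get(j));
            }
            if (!absorbed) {
                kept.add(unique.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Δ = (a ∨ b) ∧ (a ∨ c) has the minimal DNF a ∨ (b ∧ c)
        List<Set<String>> clauses = List.of(
                new TreeSet<String>(List.of("a", "b")), new TreeSet<String>(List.of("a", "c")));
        System.out.println(reducts(clauses)); // [[a], [b, c]]
    }
}
```

Absorbing after every clause keeps the intermediate term list small. Deferring absorption to the end would give the same reducts but could produce far more intermediate terms.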

In the following section, we give an example of the reduction, using the example provided with the RFCA model, and compare our result with that of the RFCA model.

4. Implementation


To verify the approach, we implemented the reduction in Java in the Eclipse environment and used the ConExp [9] tool. We use an example provided with the RFCA model to illustrate the reduction process.

The ontologies of Beef Cattle A and Dairy Cattle B form the example provided with the RFCA model. Table 1 shows part of the formal context of the two ontologies. In Table 1, rows correspond to objects and columns to attributes, and an X where a row and a column cross means that the object has the attribute.

Table 1. Part of the formal contexts before reduction

Table 1 shows that there are fifteen attributes, and that only a few objects have attributes such as Postweaning (the eleventh attribute column) and Preweaning (the twelfth attribute column).

Following our reduction steps, we first extract the concepts and attributes from the Beef Cattle A ontology and the Dairy Cattle B ontology separately, using the DOM API in Java. We then generate a formal context for each ontology and, following the steps described in Section 3, obtain the unnecessary attributes. The unnecessary attributes of the Beef Cattle A ontology are: sex (5th attribute column), Poaceae (6th column), GrazingSystems (8th column), DairyCattle (10th column), Postweaning (11th column), Preweaning (12th column), Pennisetum (13th column), and Braquiaria (15th column). The unnecessary attributes of the Dairy Cattle B ontology are: AnimalFeeding (3rd column), Growth (4th column), sex (5th column), BeefCattle (9th column), Postweaning (11th column), Preweaning (12th column), and Braquiaria (15th column).
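Once the reducts are known, the three attribute groups of [8] follow from how many reducts contain each attribute. The sketch below assumes the usual definitions: core attributes lie in every reduct, relatively necessary attributes in some but not all, and unnecessary attributes in none. The names and the toy reducts are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** The three attribute groups of [8]. */
enum AttributeKind { CORE, RELATIVELY_NECESSARY, UNNECESSARY }

/** Sketch: classify attributes by how many reducts contain them. */
final class AttributeClassifier {
    static Map<String, AttributeKind> classify(List<String> attributes, List<Set<String>> reducts) {
        Map<String, AttributeKind> kinds = new LinkedHashMap<>();
        for (String a : attributes) {
            long hits = reducts.stream().filter(r -> r.contains(a)).count();
            AttributeKind kind;
            if (hits == 0) {
                kind = AttributeKind.UNNECESSARY;          // in no reduct
            } else if (hits == reducts.size()) {
                kind = AttributeKind.CORE;                 // in every reduct
            } else {
                kind = AttributeKind.RELATIVELY_NECESSARY; // in some, but not all, reducts
            }
            kinds.put(a, kind);
        }
        return kinds;
    }

    public static void main(String[] args) {
        List<Set<String>> reducts = List.of(Set.of("a", "b"), Set.of("a", "c"));
        System.out.println(classify(List.of("a", "b", "c", "d"), reducts));
        // {a=CORE, b=RELATIVELY_NECESSARY, c=RELATIVELY_NECESSARY, d=UNNECESSARY}
    }
}
```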

Second, we combine the two reduced formal contexts. Table 2 shows the combined formal context of the Beef Cattle A and Dairy Cattle B ontologies after reduction.
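Removing the unnecessary attributes and merging the two contexts can be sketched as follows. The text does not specify the merge rule, so this hypothetical Java example assumes the simplest one: the union of the objects of both contexts, where an object appearing in both keeps the union of its attributes. Each context is represented as a map from object names to attribute sets.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical sketch: a formal context as a map object -> attribute set.
 * The merge rule (union of objects and of incidences) is an assumption.
 */
final class ContextMerge {
    /** Drops the unnecessary attributes (the removed columns) from every object's row. */
    static Map<String, Set<String>> reduce(Map<String, Set<String>> ctx, Set<String> unnecessary) {
        Map<String, Set<String>> out = new TreeMap<>();
        ctx.forEach((obj, attrs) -> {
            Set<String> kept = new TreeSet<>(attrs);
            kept.removeAll(unnecessary);
            out.put(obj, kept);
        });
        return out;
    }

    /** Assumed combination: union of the objects; an object present in both keeps the union of its attributes. */
    static Map<String, Set<String>> combine(Map<String, Set<String>> c1, Map<String, Set<String>> c2) {
        Map<String, Set<String>> out = new TreeMap<>();
        for (Map<String, Set<String>> c : List.of(c1, c2)) {
            c.forEach((obj, attrs) -> out.computeIfAbsent(obj, k -> new TreeSet<>()).addAll(attrs));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> a = reduce(Map.of("x", Set.of("p", "q")), Set.of("q"));
        Map<String, Set<String>> b = Map.of("y", Set.of("p", "r"));
        System.out.println(combine(a, b)); // {x=[p], y=[p, r]}
    }
}
```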

Table 2. Part of the formal contexts after reduction

Table 2 shows that the unnecessary attributes have been removed. From the formal context, the concept lattice is generated with the ConExp [9] tool.

Figures 2 and 3 show the Hasse diagrams of the concept lattices for Table 1 and Table 2, respectively.

Figure 2 is the concept lattice before reduction, and Figure 3 is the concept lattice after reduction. The lattice in Figure 3 has 24 nodes, compared with 27 in Figure 2, so the reduction removes 3 nodes; the classification of the objects is also clearer than before.


Figure 2. Hasse diagram of the concept lattice for Table 1 (before reduction)

Figure 3. Hasse diagram of the concept lattice for Table 2 (after reduction)
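Lattice node counts such as these can be checked without a drawing tool. Every concept intent is either the full attribute set or an intersection of object intents, and each intent determines exactly one concept. The naive Java sketch below counts concepts this way. It is a hypothetical illustration for small contexts, not how ConExp works internally, and its toy context is not Table 1 or Table 2.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: count the nodes of a concept lattice by enumerating concept intents. */
final class ConceptCounter {
    /**
     * Every concept intent is either the full attribute set A or an intersection of object intents,
     * and each intent belongs to exactly one concept, so counting intents counts lattice nodes.
     * Object intents are assumed to be subsets of allAttributes. Naive and potentially exponential.
     */
    static int countConcepts(Set<String> allAttributes, Collection<Set<String>> objectIntents) {
        Set<Set<String>> intents = new HashSet<>();
        intents.add(new TreeSet<>(allAttributes)); // intent of the empty extent
        for (Set<String> g : objectIntents) {
            List<Set<String>> snapshot = new ArrayList<>(intents);
            for (Set<String> intent : snapshot) {
                Set<String> meet = new TreeSet<>(intent);
                meet.retainAll(g); // close the family under intersection with g
                intents.add(meet);
            }
        }
        return intents.size();
    }

    public static void main(String[] args) {
        // o0 has {a, b}, o1 has {b, c}: the concept intents are {a,b,c}, {a,b}, {b,c}, {b}
        int n = countConcepts(Set.of("a", "b", "c"), List.of(Set.of("a", "b"), Set.of("b", "c")));
        System.out.println(n); // 4
    }
}
```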

5. Conclusion and Future Work

In this paper, attribute reduction of formal contexts is introduced to remove the unnecessary attributes in ontology mapping with the RFCA mapping model. While preserving the main structural properties of the original concepts, a minimal attribute subset is obtained and the unnecessary attributes are removed. Compared with the original model, RFCA with attribute reduction can be more readily applied to large-scale ontology mapping.

In future work, we will further investigate how to identify the unnecessary attributes automatically in this mapping model.

References

[1] H. H. Do and E. Rahm, COMA: A System for Flexible Combination of Schema Matching Approaches. In: Proc. of the 28th Int. Conf. on Very Large Data Bases, 2002, pp. 610-621.
[2] J. Madhavan, P. A. Bernstein, and E. Rahm, Generic Schema Matching with Cupid. In: Proc. of the 27th Int. Conf. on Very Large Data Bases, 2001, pp. 49-58.
[3] A. Doan et al., Learning to Map between Ontologies on the Semantic Web. In: Proc. of the World Wide Web Conf. (WWW 2002), 2002.
[4] M. A. Rodriguez and M. J. Egenhofer, Determining Semantic Similarity among Entity Classes from Different Ontologies. IEEE Transactions on Knowledge and Data Engineering, 15, 2003.
[5] A. Tversky, Features of Similarity. Psychological Review, 84, 1977.
[6] X. S. de Souza and J. Davis, Aligning Ontologies and Evaluating Concept Similarities. In: R. Meersman, Z. Tari (Eds.), Lecture Notes in Computer Science, vol. 3291, Springer-Verlag, 2004, pp. 1012-1029.
[7] Y. Zhao, X. Wang, and W. Halang, Ontology Mapping Based on Rough Formal Concept Analysis. In: Proc. of the Advanced Int. Conf. on Telecommunications and Int. Conf. on Internet and Web Applications and Services (AICT/ICIW 2006), 2006.
[8] T.-J. Li, W.-X. Zhang, and J.-M. Ma, Attribute Reductions and Attribute Features of Formal Contexts Based on a Type of Rough Sets. Computer Science, 33(9), 2006.
[9] Concept Explorer. http://www.sourceforge.net/projects/conexp.
[10] J. Xu, P. Gu, et al., Enhancing Rough Set and Formal Context Based Ontology Mapping Method with Attribute Reduction. Accepted at ICPCA 2008.
