
Kiu, C.-C., & Lee, C.-S. (2006). Ontology Mapping and Merging through OntoDNA for Learning Object Reusability. Educational Technology & Society, 9 (3), 27-42.

27 ISSN 1436-4522 (online) and 1176-3647 (print). © International Forum of Educational Technology & Society (IFETS). The authors and the forum jointly retain the copyright of the articles. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear the full citation on the first page. Copyrights for components of this work owned by others than IFETS must be honoured. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from the editors at [email protected].

Ontology Mapping and Merging through OntoDNA for Learning Object Reusability

Ching-Chieh Kiu
Faculty of Information Technology, Multimedia University, Jalan Multimedia, 63100 Cyberjaya, Malaysia
[email protected]

Chien-Sing Lee
Faculty of Information Technology, Multimedia University, Jalan Multimedia, 63100 Cyberjaya, Malaysia
[email protected]

ABSTRACT

The issue of structural and semantic interoperability among learning objects and other resources on the Internet increasingly points towards Semantic Web technologies in general and ontology in particular as a solution provider. An ontology defines an explicit formal specification of the domains of learning objects. However, the effectiveness of interoperating learning objects among various learning object repositories is often reduced because each repository annotates its learning objects with a different ontological scheme. Hence, structural differences and semantic heterogeneity between ontologies need to be resolved in order to generate a shared ontology that facilitates learning object reusability. This paper presents OntoDNA, an automated ontology mapping and merging tool. The significance of the study lies in: an algorithmic framework for mapping the attributes of concepts/learning objects and merging these concepts/learning objects from different ontologies based on the mapped attributes; identification of a suitable threshold value for mapping and merging; an easily scalable unsupervised data mining algorithm for modeling existing concepts and predicting the cluster to which a new concept/learning object should belong; and easy indexing, retrieval and visualization of concepts and learning objects based on the merged ontology.

Keywords

Learning object, Semantic interoperability, Ontology, Ontology mapping and merging, Semantic Web

Introduction

The borderless classroom revolves not only around the breaking down of geographical and physical borders but also of those of culture and knowledge. Realizing this vision, however, requires orienting next-generation e-learning systems to address three challenges: first, refocusing the development of e-learning systems on pedagogical foundations; second, personalizing human-computer interaction in a seamless manner amidst a seemingly faceless technological world; and third, increasing interoperability among learning objects and among technological devices (Sampson, Karagiannidis & Kinshuk, 2002). A corresponding ontological solution, which addresses all three issues, is Devedzic's (2003) GET-BITS framework for constructing intelligent Web-based systems. Ontology is applied at three levels. The first is to provide a means to specify and represent educational content (domain knowledge as well as pedagogical strategies). The second application of ontology is to formulate a common vocabulary for human-machine, human-human and software-agent-to-software-agent interaction in order to enable personalization in the presentation of learning materials, assessment and references adapted to different student models. The third application lies in the systematization of functions (Mizoguchi & Bourdeau, 2000) to enable intelligent help in learning systems, especially collaborative ones.

The concerns mentioned above point towards Semantic Web technologies as solutions. The layers of Semantic Web technologies are shown in Figure 1 (Arroyo, Ding, Lara, Stollberg & Fensel, 2004). These layers appear to provide the solution to the third aspect mentioned by Sampson et al. and Devedzic, which is also the focus of this paper. At the topmost layer is the Web of trust, which relies on the proof and logic layers.
The proof layer verifies the degree of trust assigned to a particular resource through security features such as digital signatures. The logic layer, on the other hand, is formalized by knowledge representation data, which provides ontological support for the formation of knowledge representation rules. These rules enable inferencing and the derivation of new knowledge. Basic descriptions of data (metadata), such as author, year, location and ISBN, are defined in the Dublin Core. The RDF (Resource Description Framework) schema and XML (eXtensible Markup Language) schema both provide means for standardizing metadata descriptions. RDF's object-attribute-value (OAV) building block creates a semantic net of concepts, whereas XML describes grammars to represent document structure through either document type definitions (DTDs) or XML schemas. Finally, all resources are tagged with uniform resource


identifiers to enable efficient retrieval. Unicode, a universal character-encoding standard, provides machine-readable codes for all languages.

Problem Definition and Background

Background to the Problem

The Semantic Web layers discussed above can be applied to the retrieval and reuse of learning objects as shown in Figure 2 (Qin & Finneran, 2002). According to Mohan and Greer (2003), "learning objects (LO) are digital learning resources which meet a single learning objective that may be reused in a different context". Basic metadata describing a learning object are defined in the Dublin Core, followed by contextual information that allows reuse of a learning object in different contexts. However, metadata expressed in XML addresses only lexical issues. XML does not allow interpretation of the data in the document; it allows only lexical interoperability.
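The contrast between lexical markup and explicit semantics can be sketched as follows. This is a hypothetical illustration (the record, element names and identifiers are invented), showing the same metadata once as an XML fragment and once as RDF-style object-attribute-value triples:

```python
# Hypothetical metadata record expressed as an XML fragment. The markup fixes
# the document's lexical structure but carries no machine-interpretable meaning.
xml_record = """
<record>
  <creator>C. Kiu</creator>
  <date>2006</date>
  <identifier>ISSN 1436-4522</identifier>
</record>
"""

# The same facts as (object, attribute, value) triples: each relation is an
# explicit, queryable statement rather than nested markup.
triples = [
    ("record_1", "creator", "C. Kiu"),
    ("record_1", "date", "2006"),
    ("record_1", "identifier", "ISSN 1436-4522"),
]

# A triple set can be queried by attribute without prior agreement on
# element names and nesting, which XML alone cannot guarantee.
creators = [v for (s, a, v) in triples if a == "creator"]
```

Two applications that agree only on the XML grammar still cannot tell whether `<creator>` and, say, `<author>` mean the same thing; the triple form makes the attribute itself a first-class, comparable object.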

Figure 1. The Semantic Web stack

Figure 2. Representation granularity

Furthermore, metadata schemas are user-defined. Owing to differences in metadata specifications, metadata standardization initiatives such as ADL's Sharable Content Object Reference Model (SCORM) and IEEE's Learning Object Metadata (LOM) have been introduced. However, LO reusability is complicated by differences among LO metadata standards. For example, LO metadata standards such as IEEE LOM and Dublin Core have RDF bindings, whereas SCORM has an XML binding. Hence, there are admirable efforts to map different learning object standards (Verbert, Gasevic, Jovanovic & Duval, 2005) in order to increase interoperability between learning object repositories. The significance of this line of work is the retrieval and assembly of parts of learning objects to create new learning objects suited to different learning objectives and contexts.


On a higher layer of abstraction, additional constraints on schemas are introduced through W3C's Web Ontology Language (OWL). OWL extends RDF's OAV schemas using DAML (DARPA Agent Markup Language) and OIL (Ontology Inference Layer). OWL creates a common vocabulary by adding constraints on class instances, property values, and the domain and range of an attribute (property), and by providing interoperability functions such as sameClassAs and differentFrom (Mizoguchi, 2000). However, there are also differences among ontologies. This paper deals with semantic and structural heterogeneity. Ontological semantic heterogeneity arises from two scenarios. In the first scenario, ontological concepts for a domain are described with different terminologies (synonymy). For instance, in Figure 3, the terms booking and reservation, and client and customer, are synonymous but termed differently. In the second scenario, different meanings are assigned to the same word in different contexts (polysemy). On the other hand, different taxonomies cause structural heterogeneity among ontologies (Euzenat et al., 2004; Noy & Musen, 2000; de-Diego, 2001; Ramos, 2001; Stumme & Maedche, 2001). Instances of structural differences in Figure 3 are between the concepts airline and duration.

Figure 3. Semantic differences and structural differences between ontologies

Hence, there is a need for ontology mapping and merging. Ontology mapping involves mapping the structure and semantics describing learning objects in different repositories, whereas ontology merging integrates the initial taxonomies into a common schematic taxonomy. [As such, in this paper the term merged ontology is used interchangeably with the term shared ontology.]

Problem Statement

Four major issues constrain efforts toward ontology mapping and merging:

1. Semantic differences: Some ontological concepts describe similar domains with different terminology (synonymy and polysemy). This results in overlapping domains. Thus, mapping tools are needed that interpret metadata descriptions from a lexical and semantic perspective to resolve the problems of synonymy and polysemy.

2. Structural differences: Structure in ontology mapping and merging refers to the structural taxonomy associating concepts (objects). Different creators annotate LOs with different ontological concepts. This creates syntactical differences, necessitating merging tools that can capture the different taxonomies and merge them into a common taxonomy.

3. Scalability in ontology mapping and merging: This is needed especially for large ontological repositories, where the growth of LO repositories over the network can be explosive.

4. Lack of prior knowledge: Prior knowledge is needed for ontology mapping and merging using supervised data mining methods. However, such knowledge is not always available. Thus, unsupervised methods are needed as an alternative in the absence of prior knowledge.


Research Questions

We aim to design an automated and dynamic ontology mapping and merging framework and algorithm to enable syntactical and semantic interoperability among ontologies. Our research questions are:

1. Semantic interoperability: How can we capture the attributes describing concepts (objects) in order to map the attributes of concepts from different ontologies? What is the best threshold value for automated mapping and merging between two ontological concepts, based on experiments on four different ontologies?

2. Structural interoperability: How can we use these captured attributes to create a shared (merged) taxonomy from different ontologies?

3. Scalability: Which unsupervised data mining technique is sufficiently easy to scale?

4. Lack of prior knowledge for knowledge management: Which unsupervised data mining technique is easy to use for modeling existing concepts in the database and for predicting the cluster into which a new concept should be categorized?

Significance of the Study

The contributions of this paper are:

1. An algorithmic framework for mapping attributes from concepts in different ontologies and for merging concepts from different ontologies based on the mapped attributes.

2. Determination of a suitable threshold value for automated mapping and merging.

3. An easily scalable unsupervised data mining technique that enables modeling of existing concepts in the database and predicting the cluster into which a new concept should be categorized, in order to enhance the management of knowledge.

4. Easy indexing and retrieval of LOs based on the merged ontology.

5. Easy visualization of the concept space based on formal context.

The outline of this paper is as follows: First, we present related work in ontology mapping and merging. This is followed by a discussion of the OntoDNA framework; the OntoDNA algorithm; simulation results on four different ontologies; and a comparison with Ehrig and Sure's (2004) integrated approach to ontology mapping in terms of precision, recall and f-measure. Next, we present how OntoDNA is applied in the CogMoLab integrated learning environment, in particular in the OntoVis authoring tool. Finally, we conclude with future directions.

Related Work on Ontology Mapping and Merging

Ontology mapping is a precursor to ontology merging. As mentioned earlier, ontology mapping is the process of finding the closest semantic and intrinsic relationships between corresponding entities (such as concepts, attributes, relations and instances) of two or more existing ontologies. Ontology merging, however, is the process of creating a new ontology from two or more existing ontologies with overlapping or similar parts (Klein, 2001). Given ontology A (the preferred ontology) and ontology B as illustrated in Figure 4, ontology mapping is used to discover the intrinsic semantics of ontological concepts between ontology A and ontology B. Concepts between ontologies may be related or unrelated. The ontology merging process examines the semantics between ontologies and restructures the concept-attribute relations between and among concepts in order to merge the different ontologies.

Figure 4. Ontology Mapping and Merging Process


Ontology mapping and merging methods can mainly be categorized into two approaches: the concept-based approach and the instance-based approach. Concept-based approaches are top-down approaches, which consider concept information such as names, taxonomies, constraints, relations and properties of concept elements for ontology merging. Instance-based approaches, on the other hand, are bottom-up approaches, which build up the structural hierarchy from instances of concepts and instances of relations (Gomez-Perez, Angele, Fernandez-Lopez, Christophides, Stutt, & Sure, 2002). Some ontology mapping and merging systems, namely Chimaera, PROMPT, FCA-Merge and ODEMerge, are described in this section.

Chimaera is a merging and diagnostic ontological environment for lightweight ontologies. Ontologies are automatically merged if a linguistic match is found. Otherwise, name resolution lists are generated, suggesting terms from the different ontologies to guide users in the merging process. The name resolution lists consist of the candidates to be merged and the taxonomic relationships that are yet to be merged into the existing ontology (McGuinness, Fikes, Rice, & Wilder, 2000). Similarly, PROMPT provides semi-automatic guided ontology merging. PROMPT is a plug-in for Protégé 2000. PROMPT's ontology merging process is interactive: it guides the user through the merging process, identifies matching class names and iteratively performs automatic updates. PROMPT also identifies conflicts and suggests to the user means of resolving them (Noy & Musen, 2000). A fully automated merging tool, ODEMerge, is integrated with WebODE. ODEMerge performs supervised merging of concepts, attributes and relationships from two different ontologies, using synonym and hypernym tables to generate the merged ontology; it merges ontologies with the help of corresponding information supplied by the user. The user can modify the results derived from the ODEMerge process (de-Diego, 2001; Ramos, 2001; Gomez et al., 2002). In contrast to Chimaera, PROMPT and ODEMerge, which are top-down approaches, FCA-Merge is a bottom-up ontology merging approach using formal concept analysis and natural language processing techniques. Given the source ontologies, FCA-Merge extracts instances from a given set of domain-specific text documents by applying natural language processing techniques. The concept lattice, the structural result of FCA-Merge, is derived from the extracted instances using formal concept analysis. The result is analyzed and merged with the existing ontology by the ontology engineer (Stumme & Maedche, 2001).

Differences between OntoDNA and Related Work

OntoDNA utilizes Formal Concept Analysis (FCA) to capture attributes and the inherent structural relationships among concepts (objects) in ontologies. FCA functions as a preprocessor to the ontological mapping and merging process. Semantic problems such as synonymy and polysemy are resolved because FCA captures the structure (taxonomy) of ontological concepts as background knowledge for resolving semantic interpretations in different contexts.

Table 1. Comparison between OntoDNA and four ontology mapping and merging systems

| Criterion | Chimaera | PROMPT | ODEMerge | FCA-Merge | OntoDNA |
| Problem addressed | Merging | Mapping and merging | Merging | Merging | Mapping and merging |
| Approach | Top-down | Top-down | Top-down | Bottom-up | Top-down |
| Integrated in another ontology tool? | Yes | Yes (PROMPT Suite) | Yes (WebODE) | No (it is a method) | No |
| Level of mapping | Concept definitions and slot values | Taxonomies | - | Instances of concepts | - |
| Knowledge representation supported | No | No | No | Lexicons | Lexicons (string matching) |
| Suggestions provided by the method | Name resolution lists are generated to guide users in the merging process | A list of suggested concepts to be merged | - | - | No |
| Methodology or techniques supported | - | - | - | Natural language processing and conceptual clustering (FCA) | Conceptual clustering (FCA); unsupervised data mining (SOM and k-means) |
| Type of output | Merged ontology | Merged ontology | Merged ontology | Merged ontology in a concept lattice | Merged ontology in a concept lattice |
| Level of user interaction | - | Adjusting the system's suggestions | Supplying synonym and polysemy files for merging | The produced merged ontology is fine-tuned by the user | - |
| Level of automaticity | Semi-automated | Semi-automated | Fully automated | Fully automated | Fully automated |

A hybrid unsupervised clustering technique, Self-Organizing Map (SOM) with k-means (both explained in Vesanto & Alhoniemi, 2000; Kiu & Lee, 2005), is employed by OntoDNA to organize data and reduce problem size prior to string matching, in order to address semantic heterogeneity in different contexts more efficiently. A priori knowledge is not required by unsupervised techniques, as the unsupervised clustering results are derived from the natural characteristics of the data set. SOM organizes the data, clustering similar objects together, while k-means is used to reduce the problem size of the SOM map for efficient discovery of semantic heterogeneity. Most of the tools mentioned above are based on syntactic and semantic matching heuristics, and user interaction is required during the merging process to generate the merged ontology. OntoDNA, in contrast, provides automated ontology mapping and merging, using unsupervised clustering techniques and a string matching metric to generate the merged ontology. Like the above tools, OntoDNA allows the user to modify system-generated choices if he or she wants to; if this option is not selected, the output from the system is updated automatically. Table 1 summarizes these comparisons in terms of the problem addressed, the type of approach, the level of mapping, the level of automation, the type of output and the presence or absence of knowledge-representation support in the ontology mapping and merging systems.

Figure 5. OntoDNA framework for ontology mapping and merging


OntoDNA Framework and Algorithm

OntoDNA Framework

The OntoDNA automated ontology mapping and merging framework is depicted in Figure 5. The OntoDNA framework enables learning object reuse through Formal Concept Analysis (FCA), Self-Organizing Map (SOM) and k-means, incorporated with string matching based on the Levenshtein edit distance.

Ontological Concept

An ontology is formalized as a tuple O := (C, SC, P, SP, A, I), where C is the set of concepts of the ontology and SC corresponds to the hierarchy of concepts. The relationships between the concepts are defined by the properties of the ontology P, whereas SP corresponds to the hierarchy of properties. A is the set of axioms used to infer knowledge from existing knowledge, and I is the set of instances of concepts (Ehrig & Sure, 2004).

Clustering Techniques

Formal Concept Analysis (FCA) is a conceptual clustering tool used for discovering conceptual structures in data (Ganter & Wille, 1997; Lee & Kiu, 2005). To allow meaningful data analysis, a formal context is first defined in FCA. The concept lattice is then depicted according to the context to represent the conceptual hierarchy of the data. As shown in Table 2, a formal context for learning objects is contextualized: the concepts of the ontology fill the matrix rows, the attributes fill the matrix columns, and a '1' in the matrix indicates the binary relation that concept g has attribute m. The source ontology and target ontology are first formalized using Formal Concept Analysis, followed by semantic discovery through string matching using the Levenshtein edit distance.
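As a toy illustration of how a formal context supports conceptual clustering, the following sketch (not the OntoDNA implementation; the bibliographic concept and attribute names are illustrative) computes the two FCA derivation operators, from which formal concepts are obtained:

```python
# Illustrative formal context: objects (ontological concepts) mapped to the
# sets of attributes they possess. Names are invented for this example.
context = {
    "InBook":        {"title", "author", "chapter", "publisher"},
    "InProceedings": {"title", "author", "booktitle", "publisher"},
    "PhdThesis":     {"title", "author", "school"},
}

def common_attributes(objects):
    """Attributes shared by every object in the set (derivation on objects)."""
    sets = [context[g] for g in objects]
    return set.intersection(*sets) if sets else set()

def objects_having(attributes):
    """Objects possessing every attribute in the set (derivation on attributes)."""
    return {g for g, m in context.items() if attributes <= m}

# A formal concept is a pair (extent A, intent B) with A' = B and B' = A;
# the concept lattice orders all such pairs by inclusion of extents.
intent = common_attributes({"InBook", "InProceedings"})
extent = objects_having(intent)
```

Here the pair ({InBook, InProceedings}, {title, author, publisher}) is a formal concept: applying the two derivations in sequence returns the same extent, which is the closure property the concept lattice is built on.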

Table 2. Example of a formal context for an ontology. The 30 attribute columns are: version, description, title, publishedOn, abstract, softCopyURI, softCopyFormat, softCopySize, institution, volume, organization, school, chapter, publisher, journal, counter, type, note, keyword, pages, number, booktitle, series, address, edition, author, firstAuthor, editor, relatedProject and softCopy. Each concept row holds a '1' where the concept has the attribute: in this excerpt, the concepts PhdThesis, Misc, TechReport, MastersThesis, InBook, InProceedings and InCollection are related to all 30 attributes, while SoftCopy is related to none.

To accelerate the ontology mapping and merging process, the source ontology is fed to the self-organizing map (SOM). The SOM is an unsupervised neural network used to cluster a data set according to similarity. SOM compresses complex, high-dimensional data onto a lower-dimensional grid, usually two-dimensional, grouping the most similar data together in the same cluster. The SOM clustering algorithm provides effective modeling, prediction and scalability for clustering data. An unsupervised clustering technique, k-means, is then applied to the learnt SOM to reduce the problem size of the SOM clusters. K-means iteratively divides a data set into a number of clusters while minimizing an error function. To compute the optimal number of clusters k for the data set, the Davies-Bouldin validity index is used (Vesanto & Alhoniemi, 2000; Kiu & Lee, 2004). Determining the best number of clusters k from the Davies-Bouldin index provides scalability to the modeling of ontological concepts. Figure 6 illustrates the trained SOM, the k-means clustering of the trained SOM based on k = 3, and the clustering of new ontological concepts to the SOM cluster that best matches them. New concepts are clustered into the same cluster as their best-matching units. Subsequently, the new concepts are merged based on the semantic similarity defined by the Levenshtein edit distance, and the older version of the source ontology is dynamically updated with the new concepts.
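The k-selection step can be sketched as follows. This is a from-scratch illustration, not the implementation the paper uses: the two-dimensional toy blobs stand in for SOM prototype vectors, and the deterministic centroid initialization is an assumption made for reproducibility.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain k-means with deterministic, evenly spaced initial centroids."""
    centroids = X[:: max(1, len(X) // k)][:k].copy()
    for _ in range(iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

def davies_bouldin(X, labels, centroids):
    """Davies-Bouldin index: lower means tighter, better-separated clusters."""
    k = len(centroids)
    scatter = np.array([np.linalg.norm(X[labels == j] - centroids[j], axis=1).mean()
                        for j in range(k)])
    worst = [max((scatter[i] + scatter[j]) /
                 np.linalg.norm(centroids[i] - centroids[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k

# Toy data: three well-separated blobs standing in for SOM prototype vectors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.2, size=(20, 2)) for c in [(0, 0), (5, 5), (0, 5)]])

# Choose k by minimizing the Davies-Bouldin index over candidate values.
scores = {}
for k in range(2, 6):
    labels, centroids = kmeans(X, k)
    scores[k] = davies_bouldin(X, labels, centroids)
best_k = min(scores, key=scores.get)
```

With three genuinely separated groups, the index penalizes both merging two blobs (large within-cluster scatter) and splitting one blob (two centroids very close together), so the minimum lands at k = 3.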

Page 8: Ontology Mapping and Merging through OntoDNA for Learning Object Reusability

34


Figure 6. (a) The trained SOM for the ontology, (b) K-Means clustering of trained SOM and (c) New Ontological Concepts to SOM’s BMU (boxed)

String Matching Metric

String matching based on the Levenshtein edit distance was found to provide the best semantic similarity metric, as demonstrated in the empirical experiment section. As such, we apply it in OntoDNA to measure the similarity between two strings. Given two ontological elements OE1 and OE2, the edit distance between OE1 and OE2 is calculated from simple editing operations such as delete, insert and replace. The result returned by string matching lies in the interval [0, 1], where 1 indicates an identical match and 0 a complete mismatch.

OntoDNA Algorithm

The terms used in the OntoDNA framework are first explained:

Source ontology OS: the local data repository ontology.

Target ontology OT: the non-local data repository ontology.

Formal contexts KS and KT: KS is the formal context representing the conceptual relationships of the source ontology OS, while KT is the formal context representing the conceptual relationships of the target ontology OT.

Reconciled formal contexts RKS and RKT: formal contexts whose intents (the properties of the source and target ontological concepts) have been normalized.

The prototypical implementation of the automated mapping and merging process illustrated in Figure 5 is explicated below:

Input: Two ontologies to be merged, OS (source ontology) and OT (target ontology).

Step 1: Ontological contextualization

The conceptual patterns of OS and OT are discovered using FCA. Given an ontology O := (C, SC, P, SP, A), OS and OT are contextualized using FCA with respect to the formal contexts KS and KT. The ontological concepts C are denoted as G (objects), and the remaining ontology elements SC, P, SP and A are denoted as M (attributes). The binary relation I ⊆ G x M of the formal context denotes which ontology elements SC, P, SP and A correspond to which ontological concepts C.
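Step 1 can be sketched as follows. This is an illustrative reading of the contextualization, with invented concept and property names; it is not the OntoDNA code:

```python
# Hypothetical mini-ontology: concepts C together with their superconcepts (SC)
# and properties (P), flattened into attribute labels for the formal context.
ontology = {
    "PhdThesis": {"SC:Publication", "P:author", "P:school"},
    "InBook":    {"SC:Publication", "P:author", "P:chapter"},
    "SoftCopy":  {"P:softCopyURI"},
}

# Build the formal context K as a binary incidence matrix: the objects G are
# the concepts, the attributes M are the collected ontology elements, and a 1
# records that the pair (g, m) is in the relation I.
G = sorted(ontology)
M = sorted(set().union(*ontology.values()))
K = [[1 if m in ontology[g] else 0 for m in M] for g in G]
```

The resulting rows play the same role as the rows of Table 2: each concept becomes a binary vector over the shared attribute set, which is exactly the form the SOM consumes in Step 3.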

Step 2: Pre-linguistic processing

A similarity calculation, the Levenshtein edit distance, is applied to discover correlations between the attributes (ontological properties) of KS and KT. A target attribute whose computed similarity to a source attribute is at or above the threshold value is retained under the source attribute; otherwise it is appended to the context as a new attribute. This reconciles the formal contexts KS and KT, and the resulting RKS and RKT are used as input for the next step.
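This reconciliation step can be sketched as below. The standard library's `difflib.SequenceMatcher` ratio stands in here for the Levenshtein-based lexical similarity, and all attribute names are illustrative.

```python
from difflib import SequenceMatcher
from typing import Dict, List

def reconcile_attributes(source_attrs: List[str],
                         target_attrs: List[str],
                         threshold: float = 0.8) -> Dict[str, str]:
    """Map each target attribute to its best-matching source attribute at or
    above the threshold; unmatched target attributes keep their own name
    (i.e. they are appended to the context as new attributes)."""
    mapping = {}
    for t in target_attrs:
        best, score = None, 0.0
        for s in source_attrs:
            r = SequenceMatcher(None, t.lower(), s.lower()).ratio()
            if r > score:
                best, score = s, r
        mapping[t] = best if score >= threshold else t
    return mapping

m = reconcile_attributes(["hasTitle", "hasAuthor"], ["hastitle", "hasPublisher"])
```

Here "hastitle" is reconciled to the source name "hasTitle", while "hasPublisher" falls below the threshold and is appended unchanged.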


Step 3: Contextual clustering

SOM and k-means are applied to discover the semantics of ontological concepts based on the conceptual patterns discovered in the formal contexts KS and KT. This process consists of two phases:

(a) Modeling and training

Firstly, SOM is used to model the formal context RKS to discover the intrinsic relationships between ontological concepts of the source ontology OS. Subsequently, k-means clustering is applied to the learnt SOM to reduce the problem size of the SOM clusters to the optimal number of clusters k, based on the Davies-Bouldin validity index.
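The k-selection in this phase can be sketched as follows. SOM training itself is omitted; plain k-means on toy 2-D data stands in for clustering the learnt SOM prototypes, and all data are illustrative. The Davies-Bouldin index is lower for better-separated, tighter clusters, so the k minimising it is chosen.

```python
import math, random

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    rnd = random.Random(seed)
    cents = rnd.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: dist(p, cents[c]))
        for c in range(k):
            members = [p for i, p in enumerate(points) if labels[i] == c]
            if members:
                cents[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return cents, labels

def davies_bouldin(points, cents, labels):
    """Davies-Bouldin validity index: average, over clusters, of the worst
    ratio of within-cluster scatter to between-centroid separation."""
    k = len(cents)
    scatter = []
    for c in range(k):
        members = [p for i, p in enumerate(points) if labels[i] == c]
        scatter.append(sum(dist(p, cents[c]) for p in members) / max(len(members), 1))
    worst = []
    for i in range(k):
        worst.append(max((scatter[i] + scatter[j]) /
                         max(dist(cents[i], cents[j]), 1e-12)
                         for j in range(k) if j != i))
    return sum(worst) / k

# Two well-separated toy clusters: k = 2 should minimise the index.
data = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
scores = {}
for k in (2, 3):
    cents, labels = kmeans(data, k)
    scores[k] = davies_bouldin(data, cents, labels)
best_k = min(scores, key=scores.get)
```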

(b) Testing and prediction

In this phase, new concepts from the target ontology OT are discovered through SOM's best-matching unit (BMU). The BMU assigns the formal concepts of RKT to their appropriate clusters without any need for prior knowledge of the internal ontological concepts.

Step 4: Post-linguistic processing

The clusters that contain target ontological concepts are evaluated with the Levenshtein edit distance to discover the semantic similarity between the ontological concepts within each cluster. If the similarity value between the ontological concepts is at or above the threshold value, the target ontological concepts are dropped from the context (since they are similar to the source ontological concepts) and the binary relations I ⊆ G × M are automatically updated in the formal context. Otherwise, the target ontological concept is merged with the source ontology. Finally, a compounded formal context is generated.

Output: A merged ontology in the form of a concept lattice.

Mapping and Merging of Ontology OT to Ontology OS

Mapping between the source ontology OS and the target ontology OT is needed prior to merging OT with OS. Mapping between ontological elements (ontological concepts or ontological properties) is required to resolve semantic overlaps between OS and OT. To perform ontology mapping between OS and OT, each ontological element in the source ontology OS is examined against each ontological element in the target ontology OT. Hence, the mapping algorithm of OntoDNA runs in O(nm) time, where n and m are the numbers of source and target ontological elements. The structure and naming conventions of the source ontology OS are preserved in the mapping and merging process. OntoDNA addresses two types of mapping: the matched case and the unmatched case (or non-exact match). A matched case occurs in one-to-one mapping, where a source ontological element correlates with exactly one target ontological element. Meanwhile, the unmatched case (or non-exact match) exists when:

1. there is no semantic overlap, i.e. a source ontological element has no correlation with any target ontological element (no mapping); or

2. there are semantic overlaps between many elements, i.e. a source ontological element correlates with more than one target ontological element (one-to-many mapping).

In OntoDNA, simple and complex mapping algorithms are adopted to map the target ontology OT to the source ontology OS. The simple mapping algorithm handles one-to-one mapping and cases where there is no semantic overlap; it uses lexical similarity to perform mapping and merging between ontologies. The simple mapping algorithm is outlined as follows:

Given a source ontological element OelementSi and a target ontological element OelementTj, apply the lexical similarity measure (LSM) to map the target ontology OT to the source ontology OS at threshold value t, where elements i and j = 1, 2, 3, …, n.

a) map(OelementTj → OelementSi), if LSM(OelementSi, OelementTj) ≥ t;
b) the target ontological element OelementTj is mapped to (integrated with) the source ontological element OelementSi, and the naming convention and structure of the source ontological element OelementSi are preserved;
c) merge(OelementTj → OS), if LSM(OelementSi, OelementTj) < t;
d) the target ontological element OelementTj is merged with (appended to) the source ontology, and the naming convention and structure of the target ontological element OelementTj are preserved.

A complex mapping algorithm is used to handle one-to-many mapping, where multiple matches of target ontological elements to a source ontological element are resolved based on the relative frequency of instances of the ontological elements. The complex mapping algorithm is outlined as follows:


Given a source ontological element OelementSi with instances IelementSi and a target ontological element OelementTj with instances IelementTj, the lexical similarity measure (LSM) and the relative frequency of instances (fI) are applied to map the target ontology OT to the source ontology OS at threshold value t, where ontological elements i and j = 1, 2, 3, …, n.

a) map(OelementTj → OelementSi), if LSM(OelementSi, OelementTj) ≥ t AND LSM(OelementSi, OelementTj+1) ≥ t, where LSM(OelementSi, OelementTj) = LSM(OelementSi, OelementTj+1), if fI(IelementSi, IelementTj) > fI(IelementSi, IelementTj+1); the target ontological element OelementTj is mapped to the source ontological element OelementSi, and the naming convention and structure of the source ontological element OelementSi are preserved.

For example, given threshold value t = 0.8, target ontological elements B, C, D and source ontological element X, the candidate mappings are map(X → B), map(X → C) and map(X → D). If LSM = 0.8678 (above the threshold value) for all three, and the relative frequency fI of ontological properties for each mapping is 7, 8 and 5 respectively, then C (the highest-frequency match) is mapped to the source ontological element X.
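Putting the simple and complex cases together, the map-or-merge decision can be sketched as below. The data structure holding precomputed (similarity, instance-frequency) pairs is an assumption of this sketch; in OntoDNA these values come from the lexical similarity measure and the ontologies' instances.

```python
from typing import Dict, Optional, Tuple

def map_or_merge(source: str,
                 candidates: Dict[str, Tuple[float, int]],
                 threshold: float = 0.8) -> Tuple[str, Optional[str]]:
    """candidates: {target_element: (similarity, instance_frequency)}.
    Returns ('map', target) for the best match at/above the threshold,
    breaking similarity ties by instance frequency (the complex case),
    or ('merge', None) when there is no semantic overlap (the element
    is appended to the source ontology)."""
    matched = {t: (sim, freq) for t, (sim, freq) in candidates.items()
               if sim >= threshold}
    if not matched:
        return ("merge", None)
    # One-to-many case: prefer higher similarity, then higher frequency.
    best = max(matched, key=lambda t: (matched[t][0], matched[t][1]))
    return ("map", best)

# The example above: B, C, D all match X with LSM = 0.8678; frequencies 7, 8, 5.
decision = map_or_merge("X", {"B": (0.8678, 7), "C": (0.8678, 8), "D": (0.8678, 5)})
```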

Empirical Experiment

The objectives of this experiment are:

a) to investigate which lexical measure, i.e. string matching, WordNet or a combination of string matching and WordNet, provides more accurate semantic similarity results. The Levenshtein edit distance measure is used for string matching (Cohen, Ravikumar & Fienberg, 2003), whereas the Leacock-Chodorow measure is used for WordNet linguistic matching (Pedersen, Patwardhan & Michelizzi, 2004); and

b) to identify the best threshold value for semantic similarity discovery to automate the ontology mapping and merging process.

The mapping results generated by OntoDNA are compared against human mapping to evaluate the precision of the system. For this experiment, we used threshold values from 0.6 to 1.0 for the evaluation. Other comparative experimental results highlighting OntoDNA's degree of accuracy are found in Kiu & Lee (in press).

Data Sets

Four pairs of ontologies were used for evaluation. These ontologies and the human mapping results can be obtained from http://www.aifb.uni-karlsruhe.de/WBS/meh/mapping/. The details of the paired ontologies are summarized in Table 3.

Pairs 1, 2 and 3 use SWRC (Semantic Web Research Community) ontologies, which describe the domain of universities and research. SWRC1a contains about 300 entities, including concepts, properties and instances, while SWRC1b, SWRC1c and SWRC1d are small ontologies, each containing about 20 entities.

Pair 4 ontologies describe Russia. Each ontology contains about 300 entities. The ontologies were created by students to represent the content of two independent travel websites about Russia.

Table 3. Ontologies used in the experiment

Experiment  Ontology 1  Ontology 2  # Concepts  # Properties  # Total  Manual Mapping
Pair 1      SWRC1a      SWRC1b      62          143           205      9
Pair 2      SWRC1a      SWRC1c      59          144           203      6
Pair 3      SWRC1a      SWRC1d      60          143           203      3
Pair 4      Russian2a   Russian2b   225         122           347      215

Evaluation Metrics

Information retrieval metrics, namely precision, recall and f-measure, are used to evaluate the mapping results from OntoDNA against human mapping (Do, Melnik, & Rahm, 2002). The objective and formula for each of these metrics are indicated below:


Precision: measures the number of correct mappings found against the total number of retrieved mappings.

precision = (number of correct mappings found) / (number of retrieved mappings)   Eq. 1

Recall: measures the number of correct mappings found against the total number of existing mappings.

recall = (number of correct mappings found) / (number of existing mappings)   Eq. 2

F-measure: combines precision and recall into a single efficiency measure.

f-measure = (2 × precision × recall) / (precision + recall)   Eq. 3
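Eq. 1-3 can be computed directly over sets of mappings, as in the following sketch. The concept names are illustrative; the counts mirror Pair 1 at t = 0.8 in Table 4 (2 retrieved mappings, both correct, against 9 manual mappings).

```python
def precision_recall_f1(found: set, gold: set):
    """found: mappings retrieved by the tool; gold: existing (human) mappings.
    Mappings are represented here as (source, target) pairs."""
    correct = len(found & gold)                         # correct mappings found
    precision = correct / len(found) if found else 0.0  # Eq. 1
    recall = correct / len(gold) if gold else 0.0       # Eq. 2
    f = (2 * precision * recall / (precision + recall)  # Eq. 3
         if precision + recall else 0.0)
    return precision, recall, f

# Illustrative mapping sets: 2 retrieved, both correct, 9 existing mappings.
found = {("Person", "person"), ("Topic", "topic")}
gold = found | {(f"s{i}", f"t{i}") for i in range(7)}
p, r, f = precision_recall_f1(found, gold)
```

This yields precision 1.0000, recall 0.2222 and f-measure 0.3636, the values reported for Pair 1 under string matching at t = 0.8.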

Results and Discussion

The result of each semantic similarity approach is presented in Table 4. In terms of recall and precision, the Levenshtein edit distance measure yields better results than the Leacock-Chodorow similarity measure and the combined WordNet-string matching approach in discovering correlations between two candidate mappings. String matching using Levenshtein edit distance achieves an accuracy rate of 93.33% compared with the human experts' assessment, as shown in Figure 7.

Table 4. Comparison of lexical measures for the 4 paired ontologies at thresholds 0.6 to 1.0

String Matching Measure (SM)
                    t=0.6   t=0.7   t=0.8   t=0.9   t=1.0
Pair 1   Precision  0.6667  0.8000  1.0000  1.0000  1.0000
         Recall     0.4444  0.4444  0.2222  0.5556  0.2222
         F-Measure  0.5333  0.5714  0.3636  0.7143  0.3636
Pair 2   Precision  0.7500  0.6667  0.7500  0.7500  0.7500
         Recall     0.5000  0.3333  0.5000  0.5000  0.5000
         F-Measure  0.6000  0.4444  0.6000  0.6000  0.6000
Pair 3   Precision  0.5000  1.0000  1.0000  1.0000  1.0000
         Recall     0.6667  0.6667  0.3333  0.3333  0.3333
         F-Measure  0.5714  0.8000  0.5000  0.5000  0.5000
Pair 4   Precision  1.0000  1.0000  0.9831  0.9752  0.9762
         Recall     0.1767  0.1814  0.5395  0.5488  0.5721
         F-Measure  0.3004  0.3071  0.6967  0.7024  0.7214
Average  Precision  0.7292  0.8667  0.9333  0.9313  0.9315
         Recall     0.4470  0.4065  0.3988  0.4844  0.4069
         F-Measure  0.5013  0.5307  0.5401  0.6292  0.5463

WordNet Matching Measure (WN)
                    t=0.6   t=0.7   t=0.8   t=0.9   t=1.0
Pair 1   Precision  0.3333  0.5714  0.6667  1.0000  1.0000
         Recall     0.2222  0.4444  0.4444  0.2222  0.4444
         F-Measure  0.2667  0.5000  0.5333  0.3636  0.6154
Pair 2   Precision  0.7500  0.8000  0.7500  0.6667  0.8000
         Recall     0.5000  0.6667  0.5000  0.3333  0.6667
         F-Measure  0.6000  0.7273  0.6000  0.4444  0.7273
Pair 3   Precision  0.2857  0.3333  0.2857  1.0000  1.0000
         Recall     0.6667  0.6667  0.6667  0.3333  0.3333
         F-Measure  0.4000  0.4444  0.4000  0.5000  0.5000
Pair 4   Precision  0.7778  0.7500  0.6667  0.9091  1.0000
         Recall     0.0326  0.0279  0.0186  0.0465  0.0465
         F-Measure  0.0625  0.0538  0.0362  0.0885  0.0889
Average  Precision  0.5367  0.6137  0.5923  0.8939  0.9500
         Recall     0.3554  0.4514  0.4074  0.2339  0.3727
         F-Measure  0.3323  0.4314  0.3924  0.3491  0.4829

Combined Measure (SM + WN)
                    t=0.6   t=0.7   t=0.8   t=0.9   t=1.0
Pair 1   Precision  0.5000  0.5714  0.5714  1.0000  1.0000
         Recall     0.4444  0.4444  0.4444  0.2222  0.2222
         F-Measure  0.4706  0.5000  0.5000  0.3636  0.3636
Pair 2   Precision  0.8000  0.8000  0.6000  0.7500  0.6667
         Recall     0.6667  0.6667  0.5000  0.5000  0.3333
         F-Measure  0.7273  0.7273  0.5455  0.6000  0.4444
Pair 3   Precision  0.2857  0.1667  0.2857  1.0000  1.0000
         Recall     0.6667  0.3333  0.6667  0.3333  0.6667
         F-Measure  0.4000  0.2222  0.4000  0.5000  0.8000
Pair 4   Precision  1.0000  0.9231  1.0000  0.9524  0.9643
         Recall     0.0465  0.0558  0.0372  0.1860  0.2512
         F-Measure  0.0889  0.1053  0.0717  0.3113  0.3985
Average  Precision  0.6464  0.6153  0.6143  0.9256  0.9077
         Recall     0.4561  0.3751  0.4121  0.3104  0.3683
         F-Measure  0.4217  0.3887  0.3793  0.4437  0.5017

[Figure: grouped bar chart of precision, recall and f-measure, averaged over the four ontology pairs, for the String Matching, WordNet Matching and Combined measures at thresholds t = 0.6 to t = 1.0]

Figure 7. Average values of lexical measures


Generally, a threshold value of 0.8 or above improves OntoDNA's precision and recall. Based on the individual ontology mapping performance, a threshold value of 0.8 provides the best measurement for ontology Pairs 1, 2 and 3 in terms of precision. However, the recall values for the mapping are most favourable at the threshold value 0.7, as evidenced by ontology Pairs 1 and 3. Graphical representations of the experimental results for the Levenshtein edit distance measure are shown in Figure 8, Figure 9 and Figure 10.

[Figure: five bar-chart panels (a)-(e) showing precision, recall and f-measure for Pairs 1-4 at thresholds 0.6, 0.7, 0.8, 0.9 and 1.0]

Figure 8. Precision, recall and f-measure of the paired ontologies at threshold values 0.6 to 1.0

[Figure: bar chart of precision, recall and f-measure for Pairs 1-4 and their average at thresholds t = 0.6 to t = 1.0]

Figure 9. Average mapping results based on the threshold value


[Figure: bar chart of matching accuracy (0%-70%) for Pairs 1-4 and their average at thresholds t = 0.6 to t = 1.0]

Figure 10. Improvement in matching accuracy of mapped ontologies

The average mapping results for all four pairs (Figure 9) show that the threshold value 0.8 generates the best precision and recall. It also contributes towards improvement in the f-measure and matching accuracy of the mapping (Figure 10). Therefore, the threshold value 0.8 will be adopted to automate the ontology mapping and merging process. However, we plan to perform more experiments to validate this threshold value for ontological domains other than the academic domain.

OntoDNA and Ontology Mapping (An Integrated Approach): Comparison of Evaluation Results

The statistics for Ehrig and Sure's (2004) ontology mapping on measures at cut-off, using a neural net similarity strategy, are extracted and compared with those of OntoDNA in terms of precision, recall and f-measure, as shown in Table 5 below. The statistics indicate the best results obtained in terms of precision among metric measures and similarity strategies.

Table 5. Summary of precision, recall and f-measure

                 OntoDNA                        Ontology Mapping (Ehrig & Sure, 2004)
         Precision  Recall  F-Measure      Precision  Recall  F-Measure
SWRC     0.9167     0.4630  0.6048         0.7500     0.6667  0.7059
Russia2  0.9752     0.5488  0.7024         0.7763     0.2822  0.4140

OntoDNA provides better precision than Ehrig and Sure's ontology mapping tool (Figure 11). OntoDNA shows significant improvement in terms of recall and f-measure for the Russia2 ontology. This is evidence that OntoDNA can effectively address structural complexities and differing ontological semantics. It is also noted that precision and recall are inversely related; an increase in precision tends to result in a decrease in recall, and vice versa (Soboroff, 2005). There is a tradeoff between precision and recall; hence, it is up to the designer to decide on a suitable level of tradeoff.

[Figure: bar chart comparing OntoDNA with Ehrig and Sure's (2004) ontology mapping on precision, recall and f-measure for the SWRC and Russia2 ontologies]

Figure 11. Comparison result between OntoDNA and Ehrig and Sure's ontology mapping


Learning Object Interoperability in CogMoLab with OntoDNA

CogMoLab is an integrated learning environment comprising the OntoID authoring tool (Lee & Chong, 2005), the OntoVis authoring/visualization tool (Lee & Lim, 2005), the Merlin agent-assisted collaborative concept map (Lee & Kuan, 2005) and the OntoDNA (formerly named OntoShare) ontology mapping and merging tool (Lee & Kiu, 2005). In CogMoLab, the student model (Lee, in press) forms the base component that provides intelligent adaptation within the applications, offering interactive environments to support students' learning tasks and instructors' design tasks. Currently, we plan to use OntoDNA to enable retrieval of resources from external repositories to enrich resources in OntoID, Merlin and OntoVis (Figure 12).

Figure 12. General view of learning objects interoperability with OntoDNA

Figure 13. Visualization of concepts through the OntoVis

Concepts and instances in OntoVis (Lim & Lee, 2005) are designed based on Formal Concept Analysis' formal context. Currently, concepts and attributes are keyed in manually by the instructor into OntoVis' formal context and visualized as shown in Figure 13. Hence, we also aim to enable visualization of the merged ontology through OntoVis.


Conclusion

This paper has described the OntoDNA framework for automated ontology mapping and merging and for dynamic update of new ontological concepts in an existing knowledge base or database. The utilization of OntoDNA to interoperate ontologies for learning object retrieval and reuse from local and external learning object repositories in CogMoLab has been explained. The merged ontology can be visualized through the OntoVis authoring/visualization tool in the form of a composited concept lattice. For future work, we will experiment with the discovery of instances from the ontological clusters through SOM-k-means to enable more efficient querying. We also plan to evaluate the performance of OntoDNA on other ontological domains in terms of matching accuracy and threshold value.

At the ASEAN seminar on e-learning, participating representatives agreed to share knowledge and expertise with regard to human resource development through Information and Communications Technologies (ICT). One of the means for sharing knowledge is the establishment of an ASEAN repository of learning objects. We hope that OntoDNA will be able to contribute towards interoperability among standards and schemas and enrich the teaching and learning process, especially with the sharing of various cultures, not only in ASEAN but also with communities of practice in other parts of the world.

References

Arroyo, S., Ding, Y., Lara, R., Stollberg, M., & Fensel, D. (2004). Semantic Web Languages - Strengths and Weakness. International Conference on Applied Computing (IADIS04), Lisbon, Portugal.

Cohen, W., Ravikumar, P., & Fienberg, S. (2003). A Comparison of String Distance Metrics for Name-matching Tasks. In IIWeb Workshop held in conjunction with IJCAI, retrieved June 20, 2006 from http://www.cs.cmu.edu/~wcohen/postscript/ijcai-ws-2003.pdf.

de-Diego, R. (2001). Método de mezcla de catálogos electrónicos [A method for merging electronic catalogues]. Final Year Project, Facultad de Informática de la Universidad Politécnica de Madrid, Spain.

Devedzic, V. (2003). Key issues in next-generation Web-based education. IEEE Transactions on Systems, Man, and Cybernetics, Part C, 33 (3), 339-349.

Do, H., Melnik, S., & Rahm, E. (2002). Comparison of schema matching evaluations. In Proceedings of the 2nd International Workshop on Web Databases (German Informatics Society).

Ehrig, M., & Sure, Y. (2004). Ontology Mapping - An Integrated Approach. Lecture Notes in Computer Science, 3053, 76-91.

Euzenat, J., Bach, T. L., Barrasa, J., Bouquet, P., Maynard, D., Stamou, G., Stuckenschmidt, H., Zaihrayeu, H., Hauswirth, M., Ehrig, M., Jarrar, M., Shvaiko, P., Dieng-Kuntz, R., Hernández, R. L., Tessaris, S., & Acker, S. V. (2004). D2.2.3: State of the art on current alignment techniques, retrieved April 5, 2006 from http://knowledgeweb.semanticweb.org/.

Ganter, B., & Wille, R. (1997). Applied Lattice Theory: Formal Concept Analysis, retrieved June 10, 2006 from http://citeseer.ist.psu.edu/rd/0%2C292404%2C1%2C0.25%2CDownload/http://citeseer.ist.psu.edu/cache/papers/cs/14648/http:zSzzSzwww.math.tu-dresden.dezSz%7EganterzSzconcept.pdf/ganter97applied.pdf.

Gomez-Perez, A., Angele, J., Fernandez-Lopez, M., Christophides, V., Stutt, A., & Sure, Y. (2002). A survey on ontology tools. OntoWeb deliverable 1.3, Universidad Politecnica de Madrid.

Kiu, C. C., & Lee, C. S. (2004). Discovering Ontological Semantics using FCA and SOM. In Proceedings of M2USIC 2004, Cyberjaya, Malaysia, 7-8 October 2004.

Kiu, C. C., & Lee, C. S. (2005). Discovering Ontological Semantics for Reuse and Sharing of Learning Objects in a Contextual Learning Environment. The 5th International Conference on Advanced Learning Technology (ICALT 2005), July 8-9, 2005, Kaohsiung, Taiwan.

Kiu, C. C., & Lee, C. S. (2006). A Data Mining Approach for Managing Shared Ontological Knowledge. The 6th International Conference on Advanced Learning Technology, July 5-7, 2006, Kerkrade, The Netherlands.


Klein, M. (2001). Combining and relating ontologies: an analysis of problems and solutions. In Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI-01), Workshop: Ontologies and Information Sharing, USA.

Lee, C. S. (in press). Diagnostic, predicting and compositional modeling with data mining in integrated learning environments. Computers & Education, Elsevier.

Lee, C. S., & Chong, H. R. (2005). Synergistic design considerations for continuing education: Refocusing on instructional design. WSEAS Transactions on Advances in Engineering Education, 2 (4), 294-304.

Lee, C. S., & Kiu, C. C. (2005). A concept-based graphical-neural approach to ontological interoperability. WSEAS Transactions on Information Systems and Applications, 2 (6), 761-770.

Lee, C. S., & Kuan, C. L. (2005). Intelligent scaffolds for collaborative concept mapping. WSEAS Transactions on Information Systems and Applications, 2 (8), 1157-1166.

Lee, C. S., & Lim, W. C. (2005). Visualization for course modeling, navigation and retrieval: Towards contextual learning and cognitive affordance. WSEAS Transactions on Advances in Engineering Education, 2 (4), 347-355.

Lim, W. C., & Lee, C. S. (2005). Knowledge discovery through composited visualization, navigation and retrieval. Lecture Notes in Artificial Intelligence, 3735, 376-378.

McGuinness, D. L., Fikes, R., Rice, J., & Wilder, S. (2000). An environment for merging and testing large ontologies. In Proceedings of the 7th International Conference on Principles of Knowledge Representation and Reasoning (KR'2000), Breckenridge, Colorado, USA, 483-493.

Mizoguchi, R., & Bordeau, J. (2000). Using ontological engineering to overcome AIED problems. Journal of Artificial Intelligence in Education, 11, 107-121.

Mohan, P., & Greer, J. (2003). Reusable Learning Objects: Current Status and Future Directions. In D. Lassner & C. McNaught (Eds.), Proceedings of ED-MEDIA 2003 World Conference on Educational Multimedia, Hypermedia and Telecommunication, Honolulu, Hawaii, USA.

Noy, N. F., & Musen, M. (2000). PROMPT: Algorithm and Tool for Automated Ontology Merging and Alignment. In Proceedings of the 17th National Conference on Artificial Intelligence (AAAI'00), Austin, USA.

Pedersen, T., Patwardhan, S., & Michelizzi, J. (2004). WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of the 19th National Conference on Artificial Intelligence (AAAI-04), San Jose, CA.

Qin, J., & Finneran, C. (2002). Ontological representation for learning objects. In Proceedings of the Workshop on Document Search Interface Design and Intelligent Access in Large-Scale Collections, JCDL'02, Portland, OR, July 18, 2002.

Ramos, J. A. (2001). Mezcla automática de ontologías y catálogos electrónicos [Automatic merging of ontologies and electronic catalogues]. Final Year Project, Facultad de Informática de la Universidad Politécnica de Madrid, Spain.

Sampson, D., Karagiannidis, C., & Kinshuk (2002). Personalized Learning: Educational, Technology and Standardization Perspective. Interactive Educational Multimedia, 4, 24-39.

Soboroff, I. (2005). IR Evaluation, retrieved December 2, 2005 from http://www.csee.umbc.edu/~ian/irF02/lectures/09Evaluation.pdf.

Stumme, G., & Maedche, A. (2001). FCA-Merge: A Bottom-Up Approach for Merging Ontologies. In IJCAI '01 - Proceedings of the 17th International Joint Conference on Artificial Intelligence, Morgan Kaufmann, USA.

Verbert, K., Gasevic, D., Jovanovic, J., & Duval, E. (2005). Ontology-based Learning Content Repurposing. WWW2005, Chiba, Japan, May 10-14, 2005.

Vesanto, J., & Alhoniemi, E. (2000). Clustering of the Self-Organizing Map. IEEE Transactions on Neural Networks, 11 (3), 586-600.