
SOLVING DIFFERENT LANGUAGES PROBLEM (PORTUGUESE, ENGLISH and BAHASA INDONESIA) IN DIGITAL LIBRARY WITH ONTOLOGY

Herlina JAYADIANTI a,b, Carlos Sousa PINTO b, Lukito Edi NUGROHO c, Paulus Insap SANTOSA d, Wahyu WIDAYAT e

a Universitas Gadjah Mada, Electrical Engineering and Information Technology, Yogyakarta, Indonesia, [email protected]

b Minho University, Information System Department, Campus Azurém, Guimaraes, Portugal, [email protected]

c,d Universitas Gadjah Mada, Electrical Engineering and Information Technology, Yogyakarta, Indonesia, {lukito, Insap}@mti.ugm.ac.id

e Universitas Gadjah Mada, Faculty of Economic and Development, Yogyakarta, Indonesia, [email protected]

ABSTRACT

In this paper we present how a digital library can support different languages, possibly across different repositories and different countries. Our work requires that the collections described by one metadata set be associated with the collections described by another, and that a relation be built between the metadata in each repository. We use three languages from three different repositories: Indonesian, English, and Portuguese. Making a connection between references in different languages (English, Portuguese and Indonesian) in a large metadata set is very important, and this is the aim of our work. Ambiguity, equivalence and semantic problems appear in this situation, and we try to solve them through this work.

Keywords: Ontology, Library, References, Different languages, Indonesia, English, Portuguese.

1 INTRODUCTION

A digital library is a repository of digital documents in different file formats, such as .pdf, .doc, .ppt or even plain .txt, which can be journals, newspapers, books, magazines, instruction manuals, presentations and other publications. Nowadays ontologies are very important for efficient searching in digital libraries [1], [2], [3], [4]. An ontology-based digital library should have the additional features of semantic-based accessing, querying and searching, using a reference ontology to reformulate the user query and extract only appropriate content from the library. In this section we present a future face of the digital library: one that supports different languages, possibly in different countries. In summary, our work requires that the collections available in one library (references in English) be associated with each collection in another library (references in Portuguese and Indonesian).

Figure 1. Metadata Architecture

6-08 Solving Different Languages Problem (portuguese, English And Bahasa Indonesia) In Digital Library With Ontology


As an illustration (Figure 1), there are three metadata repositories: in English, in Portuguese and in Indonesian. Our aim is to build a relation between the metadata in each repository. Indonesian readers often search for literature or references in both English and Indonesian. Similarly, in a country such as East Timor, people use more than three languages to communicate: Portuguese, English, Indonesian and local languages. Alongside Malay, Portuguese is a language that has been absorbed into Indonesian. It is therefore very important to make a connection between references in different languages (English (En), Portuguese (pt), Indonesian (Ina)) in a large metadata set. The term “reference” denotes a relation between objects in which one object connects or links to another object. The first object in this relation is said to refer to the second object. The second object, the one to which the first object refers, is called the referent of the first object. As an example:

Book (1): Artificial Intelligence: A Modern Approach by Author (1) Russell and Norvig (first object) refers to:

Book (2): Intelligent Machinery by Author (2) Alan Turing (second object).

Figure 2. Case Study Library

Based on Figure 2: Books (En) ≈ Livro (pt) ≈ Buku (Ina) and Title (En) ≈ Titulo (pt) ≈ Judul (Ina). The book “Artificial Intelligence” ≈ the book “Inteligência artificial” ≈ “Kecerdasan Buatan”. We describe this in more detail in Section 4. To get common English terms, we use terms from WordNet. WordNet (http://wordnet.princeton.edu/) is a large lexical database, or electronic dictionary, for English. WordNet supports measures of similarity and relatedness among terms [5] [6]. Measures of similarity use information found in an is-a hierarchy of concepts, and quantify how much concept A is similar to concept B.
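To make the idea of an is-a based similarity measure concrete, the following minimal sketch computes a path-based score over a toy hierarchy. The hierarchy, the term names and the scoring formula (one over one plus the number of is-a edges between two concepts) are assumptions chosen for illustration; they are not taken from WordNet data or from the measures cited in [5] [6].

```python
# Toy is-a hierarchy (child -> parent). Hypothetical, not taken from WordNet.
IS_A = {
    "book": "publication",
    "journal": "publication",
    "magazine": "publication",
    "publication": "document",
    "manual": "document",
    "document": "entity",
}


def ancestors(term):
    """Return the chain [term, parent, grandparent, ...] up to the root."""
    chain = [term]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def path_similarity(a, b):
    """Return 1 / (1 + number of is-a edges on the path between a and b)."""
    up_a = ancestors(a)
    up_b = ancestors(b)
    depth_in_b = {node: steps for steps, node in enumerate(up_b)}
    # The first ancestor of a that is also an ancestor of b is the
    # lowest common ancestor, because the hierarchy is a tree.
    for steps_a, node in enumerate(up_a):
        if node in depth_in_b:
            return 1.0 / (1 + steps_a + depth_in_b[node])
    return 0.0  # no common ancestor
```

In this toy hierarchy, "book" and "journal" share the direct parent "publication" and so score higher than "book" and "manual", which only meet at "document".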

2 SEMANTIC HETEROGENEITY

Semantic heterogeneity occurs when the same reality, modeled by two or more people, does not have the same model or representation [7], [8], [9]. In this research we consider different conceptualizations (sets of terms) of a library that cause a semantic heterogeneity problem. Section 1 (Introduction) describes the concept of a library and the different perceptions of it. Since the representations or models of a library are developed independently, they often have different structures, terminologies, or even interpretations, which is an obstacle to the semantic interoperation of those models. The semantic heterogeneity problem arises in naming, scaling and confounding [10], [11]. Semantic heterogeneity in naming includes problems with synonyms (the same concept or property expressed with different terms, e.g. Education and school background) and homonyms (the same term with different semantics, e.g. Worm as an animal, as a muscle under the tongue and as an infection in computers). Semantic heterogeneity in confounding occurs when one concept can refer to different realities, which affects the attribute values. For example, latestmeasuredtemperature does not always refer to one and the same instant.

3 ONTOLOGY INTEGRATIONS

An ontology consists of classes, data properties, object properties, and instances. Instances are objects which cannot be divided without losing their structural and functional characteristics. Data properties and object properties relate and operate among the various objects populating the ontology. Ontology integration is one way to solve the problem of semantic heterogeneity, and it can be done using several approaches, for example merging, matching or mapping. In our case, we decided to use a mapping process, because with mapping we can find the similarities and correspondences between the terms of the ontologies. Mapping works with logical axioms, typically expressing logical equivalence or inclusion among ontology terms. The integration of ontologies creates a new ontology by reusing other available ontologies through assembling, extending, or specializing operations. In integration processes the source ontologies and the resultant ontology can have different amounts of information [9].

The goal of ontology integration is to derive a more general domain ontology (common ontology) from several other ontologies in the same domain, combined into a consistent unit. The domain of both the integrated and the resulting ontologies is the same. Figure 3 shows an example with several source ontologies (Oen, Oina, Opt) and the integrated common ontology (CO - Oen).

The Proceedings of The 7th ICTS, Bali, May 15th-16th, 2013 (ISSN: 9772338185001)

Figure 3. Integration of ontologies

The ontology integration process involves several steps:

- Finding similarities and differences between ontologies in an automatic and semi-automatic way;
- Defining mappings between ontologies;
- Developing an ontology integration architecture;
- Composing mappings across different ontologies;
- Representing uncertainty and imprecision in mappings.

In ontology integration, some tasks must be performed to eliminate differences and conflicts between the ontologies. These tasks lie at two levels: the language level and the ontology level [12]. Based on Figure 3, Ontology En (Oen), which describes the library whose books are mostly English-language literature, is integrated with the Portuguese-language repository in Ontology Pt (Opt) and the Indonesian-language repository in Ontology Ina (Oina). Importing is one way to integrate ontologies. When an ontology imports another ontology, all the definitions of classes, properties and individuals of the imported ontology become available to the importing ontology. Here Ontology English (Oen) serves as the common ontology, because English terms are more widely understood by English, Portuguese and Indonesian users than Portuguese or Indonesian terms. Using the Web Ontology Language (OWL), a language for creating ontologies for the Web, we can implement this importing process. The code below shows the owl:imports mechanism, in which each imported ontology is located by its URI.

    <owl:Ontology rdf:about="http://www.semanticweb.org/Oen.owl">
        <owl:imports rdf:resource="http://www.semanticweb.org/Oina.owl"/>
        <owl:imports rdf:resource="http://www.semanticweb.org/Opt.owl"/>
    </owl:Ontology>
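As a rough illustration of how a tool might read such import declarations, the sketch below parses an ontology header and lists the URIs it imports. It uses only the Python standard library; the enclosing rdf:RDF element and the function name list_imports are assumptions added for the example, and the sketch does not fetch or load the imported ontologies.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The ontology header from the text, wrapped in an rdf:RDF root
# (the wrapper is an assumption so that the snippet parses on its own).
HEADER = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://www.semanticweb.org/Oen.owl">
    <owl:imports rdf:resource="http://www.semanticweb.org/Oina.owl"/>
    <owl:imports rdf:resource="http://www.semanticweb.org/Opt.owl"/>
  </owl:Ontology>
</rdf:RDF>"""


def list_imports(rdf_xml):
    """Map each declared ontology URI to the list of URIs it imports."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for onto in root.iter(f"{{{OWL}}}Ontology"):
        uri = onto.get(f"{{{RDF}}}about")
        result[uri] = [
            imp.get(f"{{{RDF}}}resource")
            for imp in onto.findall(f"{{{OWL}}}imports")
        ]
    return result


if __name__ == "__main__":
    for ontology, imported in list_imports(HEADER).items():
        print(ontology, "imports", imported)
```

A real OWL tool would go on to resolve each imported URI and merge its axioms; this sketch stops at reading the declarations.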

4 ACHIEVING A COMMON ONTOLOGY

The importing process, as explained before, allows us to obtain a new ontology, a Common Ontology (CO), consisting of common terms. A common term is a word recognized and used by different sets of people. In this project we use English for the common terms, because English is more widely used than Portuguese and Indonesian.

Label : [language : Indonesia]
Class : Buku
Buku SubClassOf Koleksi

Label : [language : pt]
Class : Livro
Livro SubClassOf Coleções

Label : [language : en]
Class : Books
Books SubClassOf Collection

Buku (indonesia) ≈ Books (en) ≈ Livro (pt)

Figure 4. Integrating Classes between Ontologies

Figure 4 shows the relationship scheme between the terms in the considered ontologies and the common terms in the CO. The class Editores (Opt), which represents publishers in Portuguese, is equivalent to the class Penerbit (Oina). If such differences in perception occur among a group of people, they can easily communicate with each other with help from a translator or a dictionary and agree on a common understanding across their different languages. But what happens when the communication is between machines? Suppose that we have three libraries and three ontologies, for example ontology English (Oen), ontology Portuguese (Opt) and ontology Indonesia (Oina), describing their respective perceptions of a library (see Figure 4 and Figure 5). All three ontologies (Oen, Opt and Oina) have a class referring to “collection”, which represents each collection record (see Figure 5).

Figure 5. Ontograph Visualization Between Class Books(Oen), Buku(Oina) And Livro(Opt)

The three ontologies look rather different. The information that they capture is roughly the same, but they use different languages: English, Portuguese and Indonesian. We can say that the class Buku belonging to the ontology Oina is equivalent to the class Livro in the ontology Opt; the classes Buku (Oina) and Livro (Opt) represent the same semantic value. Not only the classes are integrated: we also integrate the data properties, object properties and instances. Following the same example, we can say that the classes Artikel (Oina) and Artigos (Opt) are equivalent: they represent articles but use different terms.
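One way to picture these equivalences on the machine side is as groups of terms that are treated as interchangeable. The sketch below keeps such groups with a small union-find structure, using only class pairs named in the text. The EquivalenceIndex class and its method names are hypothetical, and the sketch is not the OWL reasoning used in the paper.

```python
class EquivalenceIndex:
    """Groups (ontology, term) pairs that have been declared equivalent."""

    def __init__(self):
        self._parent = {}

    def _find(self, term):
        # Register unseen terms as their own group, then walk to the root.
        self._parent.setdefault(term, term)
        while self._parent[term] != term:
            # Path halving keeps later lookups short.
            self._parent[term] = self._parent[self._parent[term]]
            term = self._parent[term]
        return term

    def declare_equivalent(self, a, b):
        """Merge the groups that contain terms a and b."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def equivalents(self, term):
        """Return every known term in the same group, sorted."""
        root = self._find(term)
        return sorted(t for t in list(self._parent) if self._find(t) == root)


# Equivalences stated in the text.
index = EquivalenceIndex()
index.declare_equivalent(("Oen", "Books"), ("Opt", "Livro"))
index.declare_equivalent(("Opt", "Livro"), ("Oina", "Buku"))
index.declare_equivalent(("Oina", "Artikel"), ("Opt", "Artigos"))
index.declare_equivalent(("Opt", "Editores"), ("Oina", "Penerbit"))
```

Asking for the equivalents of ("Oina", "Buku") then returns the Books, Buku and Livro entries together, because equivalence is transitive across the three ontologies.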

5 ONTOLOGY MAPPING

After the ontology integration process is complete, the next step is the execution of a mapping process. The main purpose of a mapping process is to find similarities between the source ontologies through logical axioms, expressing logical equivalences and inclusions among ontology terms. We can do “mapping” between classes, properties and individuals. Mapping can be done using automatic, semi-automatic or interactive reasoning. The results of “mapping” are used for various purposes such as data transformation, query answering and data integration. According to Noy [13] there are four dimensions of ontology mapping:

- Mapping discovery: given two ontologies, find similarities between them and determine which concepts and properties represent similar notions;
- Interactive specification of mappings: use tools that interactively allow users to define and compare ontologies and mappings with automatic or semi-automatic help;
- Declarative formal representation of mappings;
- Reasoning with mappings.

For instance, we can say that the class Books in Figures 4 to 6 is equivalent to the class Livro, and that the object property Has_Written is equivalent to the object property Menulis. Note that the object property Has_Written exists in Oen and the object property Menulis exists in Oina, but through the ontology mapping and importing process we can now integrate them in one ontology.

Figure 6. Object Properties Equivalence - owl:equivalentProperty

In this section, we present a test result of the ontology mapping, obtained with the following SPARQL query:

    PREFIX : <http://www.semanticweb.org/Oen.owl#>
    PREFIX oina: <http://www.semanticweb.org/Oina.owl#>
    PREFIX opt: <http://www.semanticweb.org/Opt.owl#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?Books ?Authors
    WHERE {
        ?Books :Written_by ?Authors .
        ?Authors :AuthorName ?Value .
        FILTER (?Value = 'Stuart Russell')
    }

Figure 7. OntoGraph visualization between Class Books(Oen), Buku(Oina) and Livro(Opt)

Based on Figure 7 we can see that the reference “Inteligência artificial” from Opt is equivalent to “Artificial intelligence” from Oen and “Kecerdasan Buatan” from Oina. So with only the one term “Books”, as in SELECT ?Books ?Authors, the system can understand what the user wants. The system gives a result not only from Oen (the book Artificial intelligence) but also from Opt and Oina.

Ontology mapping can be applied in various domains, not only libraries. We can use the ontology mapping process to find semantic correspondences between similar or different elements of different ontologies in any domain. In this paper we also focus on the SPARQL query process, which can be used to achieve interoperability in semantic information retrieval and/or knowledge discovery processes over interconnected RDF data sources. Formal mappings between different overlapping ontologies are exploited in order to rewrite initial user SPARQL queries, so that they can be evaluated over different RDF data sources in different places.
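The rewriting step can be sketched as a simple substitution of mapped property names, producing one query per data source. This is a minimal sketch under stated assumptions: only Written_by and AuthorName in Oen come from the paper, while the Opt and Oina property names (Escrito_por, NomeAutor, Ditulis_oleh, NamaPenulis), the namespace table and the function rewrite_for are hypothetical. The sketch does not escape the author name and is not a full SPARQL rewriter.

```python
# Property equivalences per source. Only the Oen names appear in the paper;
# the Opt and Oina names are hypothetical placeholders for illustration.
PROPERTY_MAP = {
    "Written_by": {"Oen": "Written_by", "Opt": "Escrito_por", "Oina": "Ditulis_oleh"},
    "AuthorName": {"Oen": "AuthorName", "Opt": "NomeAutor", "Oina": "NamaPenulis"},
}

# Namespace of each source ontology (assumed from the ontology URIs).
NAMESPACES = {
    "Oen": "http://www.semanticweb.org/Oen.owl#",
    "Opt": "http://www.semanticweb.org/Opt.owl#",
    "Oina": "http://www.semanticweb.org/Oina.owl#",
}


def rewrite_for(source, author_name):
    """Build the author query using the property names of one source."""
    ns = NAMESPACES[source]
    written_by = PROPERTY_MAP["Written_by"][source]
    name_prop = PROPERTY_MAP["AuthorName"][source]
    # Note: author_name is inserted as-is; a real system must escape it.
    return (
        f"PREFIX : <{ns}>\n"
        "SELECT ?Books ?Authors\n"
        "WHERE {\n"
        f"  ?Books :{written_by} ?Authors .\n"
        f"  ?Authors :{name_prop} ?Value .\n"
        f"  FILTER (?Value = '{author_name}')\n"
        "}"
    )


if __name__ == "__main__":
    # One rewritten query per source, ready to be sent to each repository.
    for source in NAMESPACES:
        print(rewrite_for(source, "Stuart Russell"))
        print()
```

An alternative design keeps a single query and relies on owl:equivalentProperty axioms plus a reasoner at query time; the substitution approach above trades that generality for simplicity.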

6 CONCLUSIONS

Reference services in different languages should be a primary concern in many central libraries in the world. The reference service affects the library service itself. If the library can accommodate the different semantics, different terms and different languages of different user queries, it will be very easy for users of different languages to find references related to their own knowledge and language.

7 ACKNOWLEDGEMENT

We would like to acknowledge the support of the Erasmus Mundus EuroAsia program for this research, and to thank Universidade do Minho, Portugal, and Universitas Gadjah Mada, Yogyakarta, Indonesia, for the collaboration.

8 REFERENCES

[1] A. Bénel, E. Egyed-Zsigmond, Y. Prié, S. Calabretto, A. Mille, A. Iacovella, and J. M. Pinon, “Truth in the digital library: From ontological to hermeneutical systems,” Research and Advanced Technology for Digital Libraries, pp. 914–914, 2001.

[2] S. Buckingham Shum, E. Motta, and J. Domingue, “ScholOnto: an ontology-based digital library server for research documents and discourse,” International Journal on Digital Libraries, vol. 3, no. 3, pp. 237–248, 2000.

[3] M. Doerr, J. Hunter, and C. Lagoze, “Towards a core ontology for information integration,” Journal of Digital Information, vol. 4, no. 1, 2003.

[4] L. Rajput and S. Shyam, “Ontology based digital library,” 2010. [Online]. Available: http://dl.acm.org/citation.cfm?id=1742233. [Accessed: 27-Jan-2013].

[5] C. Fellbaum, WordNet. Springer, 2010.

[6] T. Pedersen and V. Kolhatkar, “WordNet::SenseRelate::AllWords: a broad coverage word sense tagger that maximizes semantic relatedness,” in Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Demonstration Session, 2009, pp. 17–20.

[7] K. Janowicz, “The role of space and time for knowledge organization on the semantic web,” Semantic Web, vol. 1, no. 1, pp. 25–32, 2010.

[8] V. Morocho, F. Saltor, and L. Perez-Vidal, “Ontologies: Solving semantic heterogeneity in a federated spatial database system,” in Proceedings of 5th International Conference on Enterprise Information System, 2003.

[9] Y. Xue, “Ontological View-driven Semantic Integration in Open Environments,” The University of Western Ontario, 2010.

[10] I. Boukhari, L. Bellatreche, and S. Jean, “An ontological pivot model to interoperate heterogeneous user requirements,” in Leveraging Applications of Formal Methods, Verification and Validation. Applications and Case Studies, Springer, 2012, pp. 344–358.

[11] L. Bellatreche, G. Pierra, and E. Sardet, “Evolution Management of Data Integration Systems by the Means of Ontological Continuity Principle,” in Recent Trends in Information Reuse and Integration, Springer, 2012, pp. 77–96.

[12] N. F. Noy, M. Crubézy, R. W. Fergerson, H. Knublauch, S. W. Tu, J. Vendetti, and M. A. Musen, “Protégé-2000: An Open-Source Ontology-Development and Knowledge-Acquisition Environment: AMIA 2003 Open Source Expo,” in AMIA Annual Symposium Proceedings, 2003, vol. 2003, p. 953.

[13] N. F. Noy, “Ontology mapping,” Handbook on Ontologies, pp. 573–590, 2009.
