

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 41, NO. 1, JANUARY 2011 121

OWLPath: An OWL Ontology-Guided Query Editor

Rafael Valencia-García, Francisco García-Sánchez, Dagoberto Castellanos-Nieves, and Jesualdo Tomás Fernández-Breis

Abstract—Most Semantic Web technology-based applications need users to have a deep background in the formal underpinnings of ontology languages and some basic skills in these technologies. Generally, only experts in the field meet these requirements. In this paper, we present OWLPath, a natural language-query editor guided by multilanguage OWL-formatted ontologies. This application allows nonexpert users to easily create SPARQL queries that can be issued over most existing ontology storage systems. Our approach is a fully fledged solution backed by a proof-of-concept implementation and the empirical results of two challenging use cases: one in the domain of e-finance and the other in e-tourism.

Index Terms—Natural language interfaces (NLIs), ontology languages, query formulation, user interfaces.

I. INTRODUCTION

THE SEMANTIC Web [1] aims to extend the current Web standards and technology so that the semantics of the Web content is machine processable. However, the chicken-or-egg dilemma has accompanied the Semantic Web from its very conception: “Without substantial Semantic Web content, few tools will be written to consume it; without many such tools, there is little appeal to publish Semantic Web content” [2]. The knowledge representation technology used in the Semantic Web is the ontology, which formalizes such meaning and facilitates the search for contents and information [3]–[5], as well as improves crawling [6], [7]. Moreover, ontologies have become one of the main components in knowledge management [8], [9], e-learning [10], medical models [11], knowledge in diagnostic systems [12], and the Semantic Web. A further problem has been identified: the need for tools that help in accessing the already large number of ontology-based knowledge bases available. In fact, researchers have noticed that “the casual user is typically overwhelmed by the formal logic of the Semantic Web” [13]. This is because users, in order to use ontologies, have to be familiar with the following [14]: 1) the ontology syntax (e.g., Resource Description Framework (RDF) [15] and Web Ontology Language (OWL) [16]); 2) some formal query language (e.g., SPARQL Protocol and RDF Query Language (SPARQL) [17]); and 3) the structure and vocabulary of the target ontology. Consequently, filling the gap between the end user and this mathematically intensive background is fundamental to allow the general public to exploit the advantages of the Semantic Web.

Manuscript received November 15, 2008; revised May 22, 2009 and October 8, 2009; accepted January 13, 2010. Date of publication June 3, 2010; date of current version November 10, 2010. This work was supported in part by the Spanish Ministry for Science and Education under Project TSI2007-66575-C02-02 and in part by the Spanish Ministry for Industry, Tourism and Commerce under Projects TSI-020400-2009-127, TSI-0204000-2009-148, and TSI-020100-2009-263. This paper was recommended by Associate Editor M. Zhou.

The authors are with the Department of Informatics and Systems, University of Murcia, 30100 Espinardo, Spain (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSMCA.2010.2048029

The approach taken by most researchers to bridge this gap is the use of natural language interfaces (NLIs) [13], [14], [18]–[20]. NLIs aim to provide end users with a means to access knowledge in ontologies, hiding the formality of ontologies and query languages [21]. Thus, NLIs help users avoid the burden of learning any logic-based language, offering end users a familiar and intuitive way of query formulation.

However, the realization of NLIs involves several difficulties, one such problem being that of linguistic variability and ambiguity. In recent years, controlled natural language (CNL) [22], [23] has received much attention due to its ability to reduce ambiguity in natural language. CNLs are mainly characterized by two essential properties [24]: 1) their grammar is more restrictive than that of the general language, and 2) their vocabulary only contains a fraction of the words that are permissible in the general language. These restrictions aim at reducing or even eliminating both ambiguity and complexity.

In this paper, we present OWLPath, a CNL-based NLI that assists users in designing their queries. OWLPath suggests to the user how to complete a query by combining the knowledge of two ontologies, namely, the question and the domain ontologies. The question ontology plays the role of a grammar, providing the basic syntactic structure for building sentences. The domain ontology characterizes the structure of the application-domain knowledge in terms of concepts and relationships. The system then makes suggestions based on the content of the question ontology and its relationships with the domain ontology. Once the user has finished formulating the natural language query, OWLPath transforms it into a SPARQL query and issues it to the ontology repository. In the end, the results of the query are shown back to the user.

The rest of this paper is organized as follows. In Section II, related research projects and tools are analyzed. The architecture of our ontology-guided query editor is described in Section III. In Section IV, the application of OWLPath in two different application domains, namely, the stock exchange domain and e-tourism, is examined. The experiments carried out for measuring the performance of OWLPath are presented in Section V. Finally, some conclusions and future work are put forward in Section VI.

1083-4427/$26.00 © 2010 IEEE

II. RELATED WORK

In recent years, the utilization of NLIs and CNLs toward an effective human–computer interaction [25] has received much attention in the context of the Semantic Web. Several platforms have been developed to function as either natural language ontology editors or natural language query systems. Two good examples in the first category are CNL editor [18] (formerly OntoPath [20]) and Guided Input Natural Language Ontology Editor (GINO) [13]. OntoPath is, in fact, situated on the border between these two categories because it manages and creates RDF ontologies, and it is also capable of defining queries from natural language sentences. It is composed of three main components in a layered architecture: “OntoPath-Syntax” in the syntax layer, “OntoPath-Object” in the object layer, and “OntoPath-Semantic” in the semantic layer. In the upper layer, a knowledge engineer and a domain expert can work together to define the domain ontology by using “OntoPath-Semantic.” With this tool, it is possible to build a new ontology or edit a previously existing one. After defining a set of concepts and their corresponding relationships, the system returns the ontology in an RDF file. In the next layer, “OntoPath-Object” assists domain experts, who have no knowledge of ontologies, in graphically expressing natural language descriptions by using nodes and arcs that correspond to the elements in the ontology. This graphical description is then stored as RDF triples. Finally, in the lower layer, “OntoPath-Syntax” guides users in the query generation process through a simple visual interface. The query is formed from the knowledge available in an ontology and is translated into RDF.

The ontology-based CNL editor extends OntoPath by providing a context-free grammar with lexical dependence for defining grammars. By using defined grammars, the CNL editor enables the system to get structured data from writers' narratives with sophisticated, patternized, and informal expressions. All in all, the editor provides guidance on the proper choice of words and translates the results into RDF triples. The architecture of the CNL editor consists of five components: an interface, through which the system recommends proper next words to the writer; a parser, which processes an incoming sentence and determines the dependences; a predictor, which examines the relations in the domain ontology to make a recommendation; a lexicon pool, which sends the candidate next words to the interface; and a triple generator, which generates RDF triples when the sentence is completed.

GINO allows users to edit and query any OWL knowledge base using a guided input natural language akin to English. The user inputs a query or sentence into a free form text field, and based on the grammar, the system's incremental parser offers the possible completions of the user's entry by presenting the user with choice pop-up boxes. These pop-up menus offer suggestions on how to complete a current word or what the next word might be. The GINO architecture consists of four parts: a grammar compiler, which generates the necessary dynamic grammar rules to extend the static part of the grammar; a partially dynamically generated multilevel grammar, which is used to specify the complete set of parsable questions/sentences and to construct the SPARQL statements from entered sentences; an incremental parser, which maintains an in-memory structure representing all possible parse paths of the currently entered sequence of characters; and an ontology-access layer, which is implemented with Jena [26].

Fig. 1. OWLPath architecture.

In the category of natural language query systems, Portable nAtural laNguage inTerface to Ontologies (PANTO) [14] and NLP-Reduce [19] are two representative examples. PANTO is a system that takes ontologies and natural language queries as input and whose output is a series of SPARQL queries. When an ontology is selected as the underlying knowledge base, PANTO uses the so-called “Lexicon Builder” to automatically extract entities out of the ontology in order to build a lexicon. This lexicon is used to make sense of the words that appear in a natural language query. Once the user has entered a natural language query, PANTO produces a parse tree which is then translated into SPARQL. NLP-Reduce, on the other hand, is a domain-independent NLI for querying Semantic Web knowledge bases. Its architecture consists of five parts: an interface, which allows users to enter full natural language queries, sentence fragments, or just keywords; a lexicon, which is automatically built by extracting all explicit and inferred subject-property-object triples that exist in the knowledge base; an input query processor, which reduces a query by removing stop words and punctuation marks; a SPARQL query generator, which generates SPARQL queries from the input text; and an ontology access layer, which uses Jena and the Pellet reasoner [27].

In [21], other similar approaches are examined, and the usefulness of NLIs is analyzed. The authors came to the conclusion that “casual end users” strongly prefer querying using full sentences rather than keywords or any other means. In [23], several related systems are analyzed, and the exploitation of NLIs in a range of capabilities (e.g., the authoring of knowledge content, the retrieval of information from semantic repositories, and the generation of natural language texts from formal ontologies) is reviewed. In this report, the idea that CNLs could replace conventional Semantic Web ontologies was also explored but finally dismissed.

III. OWLPATH EDITOR

The global architecture of the proposed system is shown in Fig. 1. OWLPath is composed of five main components: the “Ajax interface,” the “Suggester,” the “Grammar checker,” the “SPARQL generator,” and the “Ontology manager.” It also has access to a knowledge base stored in an external ontology repository. In a nutshell, the system works as follows. The set of ontologies that makes up the underlying knowledge base is loaded when the application is launched. Thereafter, users interact with the system through the “Ajax interface.” End users build the query by selecting the desired terms from the list provided in a pop-up. Entries that are not in that pop-up are ungrammatical and, therefore, not accepted by the system. This list of terms is generated by the “Suggester,” which makes use of the “Grammar checker” to determine the potential next terms by combining the knowledge of both question and domain ontologies. To this end, the terms that are currently part of the query are considered. The question ontology comprises the grammar for all formulated queries, and the domain ontology contains the relevant knowledge about the application domain. Once the query is completed, the “SPARQL generator” translates it into SPARQL statements and issues the resulting query to the knowledge base through the “Ontology manager.” The results of the query are finally shown back to the user.

Fig. 2. Sample interface.

Next, the components that form part of the system are described in detail, and the control flow in OWLPath is illustrated.

A. Ajax Interface

This component constitutes the input text interface for the user. Through this interface, the system shows users the most appropriate terms that can follow in the elaboration of a sentence. The candidate terms are generated by the “Suggester” in accordance with a number of variables: the grammar defined in the question ontology, the imported ontologies, and the words that precede the new term in the query. The candidate terms are then presented to the user in a pop-up box (see Fig. 2). End users can navigate the pop-up and choose a highlighted proposed option. As stated before, entries that are not in the pop-up list are ungrammatical and not accepted by the system. In this way, OWLPath guides the user through a set of possible sentences, preventing the construction of nonvalid statements.

OWLPath provides a Web interface to the user based on Ajax technology. Ajax allows an easy implementation of sophisticated functions by using Web standards, thus becoming a real alternative for building powerful Web applications. Ajax uses asynchronous requests, allowing the client's Web browser interface to be more interactive and to respond quickly to inputs, thus improving the user experience. A suggestion request proceeds as follows. First, the preceding words are collected, and an HTTP request object is created. Second, the operation to be executed is set, and the parameters are indicated. Third, the request is sent to the server. Finally, the users see the pop-up boxes when the results are received.

B. Suggester

The main objective of this component is to determine which words can be inserted next in a sentence. Each user input generates a navigation action in the ontology, so all the possibilities are explored by this component. From the linguistic perspective, this means providing the list of linguistic expressions that might be used in a particular context. This context can be summarized by the current node in the navigated ontology. Then, the “Suggester” processes the user input by considering the semantic relations defined in the domain ontologies and the grammar implicitly contained in the question ontology. Hence, it generates the tree of all the possible grammatical options for the current input. The list of candidate expressions is then shown to the end user through the Ajax interface.

Let us suppose that the current user input is “View any COMMODITY has_quoted_price_in.” “COMMODITY” is a class of the ontology which has an object property called “has_quoted_price_in.” This property relates that concept with the concept “COMMODITY_MARKET” (see Fig. 3). Given this input, the “Suggester” would look for all the possibilities for the “COMMODITY_MARKET” class using the “Grammar checker.” In this case, the subclasses of “COMMODITY_MARKET” and the instances of such subclasses are shown in a tree. For example, “BMF,” “CME,” and “NYMEX” are “American_commodity_markets.”
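The lookup described in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ontology is replaced by hand-written dictionaries, the function names `suggest` and `suggestion_tree` are hypothetical, and the subclass name is assumed to be spelled `American_commodity_market`. It does not reproduce the actual OWLPath implementation.

```python
# Toy in-memory stand-in for the ontology; all structures and names here
# are illustrative assumptions, not part of OWLPath.
SUBCLASSES = {
    "COMMODITY_MARKET": ["American_commodity_market"],
    "American_commodity_market": [],
}
INSTANCES = {
    "COMMODITY_MARKET": [],
    "American_commodity_market": ["BMF", "CME", "NYMEX"],
}
# (class, object property) -> class that the property points to
PROPERTY_RANGE = {("COMMODITY", "has_quoted_price_in"): "COMMODITY_MARKET"}


def suggestion_tree(cls):
    """Return the class with its instances and, recursively, its subclasses."""
    return {
        "class": cls,
        "instances": sorted(INSTANCES.get(cls, [])),
        "subclasses": [suggestion_tree(sub) for sub in SUBCLASSES.get(cls, [])],
    }


def suggest(current_class, object_property):
    """Find the class reached through the property and expand it into a tree."""
    range_class = PROPERTY_RANGE[(current_class, object_property)]
    return suggestion_tree(range_class)


tree = suggest("COMMODITY", "has_quoted_price_in")
print(tree)
```

A real system would obtain the subclass and instance information from the loaded ontologies rather than from literals; the dictionaries only keep the sketch self-contained.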

Two methods, namely, showAssist and chooseSelection, control the behavior of the “Suggester” (see Fig. 10). The first one is responsible for getting, with the assistance of the “Grammar checker,” the list of possible entries that will be returned to the user. This method returns an XML document with the list of terms, which is shown to the end user through the “Ajax interface.” The second method is in charge of storing the terms chosen by the user in each iteration. Not only is the selected term itself stored but also the underlying RDF triples, which will be fundamental for translating the natural language sentence into a SPARQL query (as described in Section III-D). In the example shown in Fig. 2, the following RDF triples would be obtained so far: 1) ?subject rdf:type ?type FILTER(?type = :COMMODITY) (the user is looking for individuals of the class COMMODITY) and 2) ?subject ?cond ?object FILTER(?cond = :has_quoted_price_in) (the user is looking for entities related to any entity through the object property :has_quoted_price_in).
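As a rough illustration of the bookkeeping attributed to chooseSelection, the following Python sketch keeps the chosen terms together with the triple patterns behind them. The class `QueryState`, its method names, and the assignment of each triple to a particular term are assumptions made for this example; the paper does not show the original code.

```python
class QueryState:
    """Accumulates the chosen terms and the triple patterns behind them."""

    def __init__(self):
        self.terms = []
        self.triples = []  # list of (triple pattern, filter expression)

    def choose_selection(self, term, triple=None):
        """Store the selected term and, if it carries one, its triple."""
        self.terms.append(term)
        if triple is not None:
            self.triples.append(triple)

    def sentence(self):
        """Rebuild the natural language sentence entered so far."""
        return " ".join(self.terms)


state = QueryState()
state.choose_selection("View any")  # assumed to carry no triple
state.choose_selection(
    "COMMODITY", ("?subject rdf:type ?type", "?type = :COMMODITY")
)
state.choose_selection(
    "has_quoted_price_in",
    ("?subject ?cond ?object", "?cond = :has_quoted_price_in"),
)
print(state.sentence())
print(state.triples)
```

Keeping the triples next to the terms means that the later translation step needs no further parsing of the sentence.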

C. Grammar Checker

The main objective of the “Grammar checker” is to verify the grammatical correctness of the sentences generated. This component is also responsible for sending the list of the possible next entries to the “Suggester.” As shown in Section II, most guided input systems (e.g., GINO [13] and OntoPath [20]) are based on Backus-Naur-Form (BNF) grammars to define the grammatically correct sentences. These grammars are mainly based on rules containing references to classes or instances in a domain ontology. In our work, the grammar of the system is represented by means of an OWL ontology, namely, the question ontology. The question ontology imports ontological elements from the domain ontologies that the system has to take into account for building the query. Given that domain ontologies can contain many different ontological elements (classes, data-type properties, object properties, instances, etc.), it is desirable to restrict the elements that can be part of the queries.

Fig. 3. Design of a sample question ontology.

1) Question Ontology: This is the core of this module. It represents the grammar and models the queries that are grammatically correct by importing relevant elements from the domain ontologies available in the knowledge base. This ontology is implemented using the description logics (DL) variant of the OWL ontology language. Thus, the expressive power of the grammar, and so of the input query language, is only constrained by the expressivity of OWL-DL. In Fig. 3, an example of a question ontology defined in Protégé [28] is shown. It is possible to observe that new classes can be defined and that the classes that form part of the domain ontologies can be imported. In that figure, we show a simple example of the question ontology and its relationships with a domain ontology. The class named “Query” represents all possible queries that can be made about the referred concept. In this example, we could ask the system about “Assets.” In the domain ontology, the concept “Assets” is related to “Financial Market” through the relation “has_quoted_price_in.” The “Grammar checker” will take that relationship into account to allow queries such as the one shown in Fig. 2, i.e., “View any COMMODITY has_quoted_price_in BMF,” and others like “News about COMMODITY has_quoted_price_in American_commodity_market.”

Defining the grammar using an OWL ontology has two main advantages over traditional NLI tools and state-of-the-art CNL systems.

1) It allows the use of reasoners such as Pellet [27] or FaCT++ [29] for consistency checking and computing inferred types.

2) It is possible to include restrictions in the properties of the question ontology to restrict their range, i.e., the possible values that can be assigned through the ontology, their cardinality, and so on. Thus, the grammar can take into account these restrictions while the sentence is being entered.

Reasoning has been an intensively investigated research area in the last few years. Most of the techniques and inference engines developed for dealing with Semantic Web data focus on either reasoning over instances of an ontology or reasoning over ontology schemata (DL reasoning). By reasoning over instances of an ontology, it is possible to, for example, derive a certain value for an attribute applied to an object. These inference services are the equivalent of SQL query engines in databases, but they provide more “intelligent” support such as handling recursive rules. On the other hand, reasoning over concepts of an ontology makes it possible to automatically derive the correct hierarchical location of a new concept or to detect inconsistencies.

Regarding this last issue, the “Grammar checker” can deal, for example, with “allValuesFrom” constraints to restrict the range of an object property. In Fig. 3, the class “ASSETS” is related to the class “FINANCIAL_MARKET” through the object property “has_quoted_price_in.” The class “COMMODITY,” which is a subclass of “ASSETS,” inherits this property. However, a restriction has been introduced to represent the fact that the instances of the class “COMMODITY” can only be related to instances of the class “COMMODITY_MARKET” (subclass of “FINANCIAL_MARKET”). Thus, when we select the class “COMMODITY” in the query and also the object property “has_quoted_price_in,” this module processes the restriction and informs the “Suggester” that only the “COMMODITY_MARKET” class, its subclasses, and individuals must be shown. In this case, the rest of the branches descending from the “FINANCIAL_MARKET” taxonomy are ignored.
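The effect of such a restriction can be sketched as follows. This Python fragment is a deliberately simplified illustration: it assumes single inheritance, treats the nearest “allValuesFrom” restriction as overriding the declared range (a full OWL reasoner would intersect them), and uses hypothetical names such as `effective_range` and the sibling class `STOCK_MARKET`, none of which come from the paper.

```python
# Class hierarchy as child -> parent; STOCK_MARKET is a hypothetical sibling
# branch added only to show what gets excluded.
SUBCLASS_OF = {
    "COMMODITY": "ASSETS",
    "COMMODITY_MARKET": "FINANCIAL_MARKET",
    "STOCK_MARKET": "FINANCIAL_MARKET",
}
# Range declared on the property for the class where it is introduced.
DECLARED_RANGE = {("ASSETS", "has_quoted_price_in"): "FINANCIAL_MARKET"}
# allValuesFrom restrictions attached to more specific classes.
ALL_VALUES_FROM = {("COMMODITY", "has_quoted_price_in"): "COMMODITY_MARKET"}


def ancestors(cls):
    """Yield the class itself and then each superclass up to the root."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)


def effective_range(cls, prop):
    """Prefer the nearest allValuesFrom restriction, else the declared range."""
    for current in ancestors(cls):
        if (current, prop) in ALL_VALUES_FROM:
            return ALL_VALUES_FROM[(current, prop)]
    for current in ancestors(cls):
        if (current, prop) in DECLARED_RANGE:
            return DECLARED_RANGE[(current, prop)]
    return None


print(effective_range("COMMODITY", "has_quoted_price_in"))
print(effective_range("ASSETS", "has_quoted_price_in"))
```

With this lookup, selecting “COMMODITY” narrows the suggestions to the “COMMODITY_MARKET” branch, while the more general “ASSETS” still reaches the whole “FINANCIAL_MARKET” taxonomy.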

The “Grammar checker” can also deal with the different primitive data types (Boolean, float, int, string, date, dateTime, and time) defined in the range of the data-type properties of the classes contained in the question and domain ontologies (see example in Section IV-B).

Obtaining a domain ontology and customizing it to meet the requirements of a particular application domain can be an arduous but achievable task. In contrast, developing the question ontology that defines the permissible grammar in a certain domain is quite tricky, particularly for those software developers who are not familiar with Semantic Web technologies. The methodology that has been adopted to implement the question ontology involves the following steps.

1) Analysis of a representative set of queries that could be posed to the knowledge base and have to be supported by the system.

2) Determination of what elements in the queries are part of the domain knowledge and what elements must be represented in the question ontology.

3) Selection of the ontological elements (i.e., classes, object properties, data-type properties, etc.) to be included in the question ontology. Such elements correspond with the relevant items in the sentences analyzed.

4) Identification of the appropriate linkage points between the question ontology and the domain ontologies.

D. SPARQL Generator

Once the query has been fully defined, OWLPath has to translate it into SPARQL statements. The “SPARQL generator” is the component responsible for transforming natural language queries into SPARQL. The input to this component is the final sentence submitted by the user. This sentence is composed not only of the sentence words but also of the triples used by the “Suggester” to provide the optional entries.

The “SPARQL generator” has been implemented as a servlet. First, the component retrieves the whole query in its current status. As pointed out before, the query is represented as a set of triples. These are processed, and the method createSparql is invoked to generate the SPARQL query (see Fig. 10). Finally, the system returns both the SPARQL-formatted query and the results of the query.

Fig. 4 shows the final SPARQL query for a given natural language sentence. Such queries are generated using the following process. First, the prefixes necessary for the query are included. Then, the “SELECT” section is assembled by taking into account the identified subject. Finally, the triples and the constraints defined by the user are introduced in the “WHERE” section of the query through “FILTER” statements.

Fig. 4. SPARQL query for a sample natural language query.

Fig. 5. First suggestion to be provided.

Given that the triples are directly extracted from the schema of the domain ontologies in the repository, there is no need for further investigation (e.g., synonym search). The SPARQL query is issued to the knowledge base through the “Ontology manager,” and the results are sent back to the “Ajax interface.”
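The three-step assembly (prefixes, “SELECT” section, “WHERE” section with “FILTER” statements) might look roughly like the Python sketch below. The function `create_sparql` merely borrows its name from the createSparql method mentioned earlier; its signature, the layout of the generated text, and the pairing of each triple with a filter expression are assumptions for illustration, since the exact query of Fig. 4 is not reproduced here.

```python
# Prefixes of the finance example; the empty key stands for the default prefix.
PREFIXES = {
    "": "http://sonar.um.es/finance#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def create_sparql(subject_var, triples):
    """Build a SPARQL SELECT query from (pattern, filter) pairs."""
    # Step 1: the prefixes needed by the query.
    lines = [f"PREFIX {name}: <{uri}>" for name, uri in PREFIXES.items()]
    # Step 2: the SELECT section, based on the identified subject.
    lines.append(f"SELECT {subject_var}")
    # Step 3: the WHERE section with triple patterns and FILTER statements.
    lines.append("WHERE {")
    for pattern, filter_expr in triples:
        lines.append(f"  {pattern} .")
        if filter_expr:
            lines.append(f"  FILTER({filter_expr})")
    lines.append("}")
    return "\n".join(lines)


query = create_sparql(
    "?subject",
    [
        ("?subject rdf:type ?type", "?type = :COMMODITY"),
        ("?subject ?cond ?object", "?cond = :has_quoted_price_in"),
    ],
)
print(query)
```

Because every pattern comes straight from the ontology schema, the sketch performs no escaping or validation; code handling free-form user text would need both.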

1) From Natural Language Sentences to SPARQL Queries: A simple example will be discussed next to illustrate the process. To enable the beginning of the query formulation process, a number of classes in the question ontology are marked as root elements. These are the first elements that are proposed to end users when the application starts. As shown in Fig. 5, the labels that are shown to the user are associated with the uniform resource identifier (URI) of the corresponding ontology element. Thus, when the end user selects an element from the choice pop-up box, the system stores the URI of that element. In our example, only one root element is available in the question ontology.

Once the system has successfully stored the chosen element, the system generates a new list of suggested elements based on the previous selection (see Fig. 6). When this process ends, the “Selected elements” list contains all the elements that have been selected by the user during the design of the query (see Fig. 7). This list, which actually represents a set of triples, is the input of the “SPARQL generator.”
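The bookkeeping described above, in which each label shown in the pop-up box is bound to a URI and the chosen URIs accumulate in the “Selected elements” list, can be sketched as follows. This is only an illustration and not the OWLPath implementation: the names Suggestion and QuerySession, the label text, and the way the class URI is built from the finance namespace are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Suggestion:
    """One entry of the choice pop-up box: a display label bound to a URI."""
    label: str
    uri: str


@dataclass
class QuerySession:
    """Holds the URIs chosen so far (a stand-in for the "Selected elements" list)."""
    selected: list = field(default_factory=list)

    def choose(self, suggestions, label):
        # Find the URI behind the label the user picked and remember it.
        for suggestion in suggestions:
            if suggestion.label == label:
                self.selected.append(suggestion.uri)
                return suggestion.uri
        raise ValueError(f"label not among the suggestions: {label!r}")


# Hypothetical single root element, mirroring the one-root example in the text.
FINANCE_NS = "http://sonar.um.es/finance#"
roots = [Suggestion(label="Companies", uri=FINANCE_NS + "COMPANIES")]

session = QuerySession()
session.choose(roots, "Companies")
```

Storing URIs rather than labels keeps the later translation step independent of the display language of the labels.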

The query is always issued against the domain ontology. Thus, the first part of the SPARQL statement, which contains the prefixes, is predefined. In the example under question, it is as follows:

PREFIX : <http://sonar.um.es/finance#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>


Fig. 6. Recording the choice and generating new suggestions.

Fig. 7. Triple representation of the selected elements.

Fig. 8. Determining the subject of the query.

The system also assumes that the subject of the user query is the object in the first triple (each triple being given in the form subject–predicate–object). As a result, the “SELECT” clause of the statement and the first part of the “WHERE” section are generated from the first triple as shown in Fig. 8.

The remaining triples in the “Selected elements” list are used for defining further constraints over the subject of the query (see Fig. 9).
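The assembly steps just described (predefined prefixes, a “SELECT” clause derived from the object of the first triple, and the remaining triples and “FILTER” constraints in the “WHERE” section) can be sketched as below. This is a minimal sketch under stated assumptions, not the createSparql method itself: the function name build_sparql, the tuple layout of the triples, the variable naming, and the example namespace http://example.org/tourism# are all hypothetical, and the exact shape of the queries in Figs. 4, 8, and 9 may differ.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def build_sparql(namespace, triples, filters=()):
    """Assemble a SPARQL SELECT query from (subject, predicate, object) triples.

    Assumptions of this sketch: every name is a local name in `namespace`,
    an object starting with "?" is a variable, and `filters` holds
    (variable, operator, value) tuples.
    """
    if not triples:
        raise ValueError("at least one triple is required")
    target = triples[0][2]          # object of the first triple = query subject
    var = "?" + target.lower()      # variable that stands for the query subject
    where = [f"{var} rdf:type :{target} ."]
    for s, p, o in triples[1:]:
        subj = var if s == target else f":{s}"
        obj = o if o.startswith("?") else f":{o}"
        where.append(f"{subj} :{p} {obj} .")
    for variable, op, value in filters:
        where.append(f"FILTER ({variable} {op} {value})")
    body = "\n".join("  " + line for line in where)
    return (
        f"PREFIX : <{namespace}>\n"
        f"PREFIX rdf: <{RDF_NS}>\n"
        f"SELECT {var}\n"
        f"WHERE {{\n{body}\n}}"
    )


query = build_sparql(
    "http://example.org/tourism#",  # hypothetical namespace
    [
        ("QUERY", "Any", "ACCOMMODATION"),
        ("ACCOMMODATION", "Is_located_in", "Phuket"),
    ],
)
print(query)
```

Because the triples already carry ontology names, the translation is a direct rewriting step with no lookup or disambiguation.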

E. Ontology Repository

The “Ontology repository” represents the knowledge base of the system. It includes the domain ontologies that are going to be queried by the end user through our tool. The repository can contain several domain ontologies that will be imported by the question ontology as described before. This ontology repository can also be accessed through SPARQL queries to retrieve the instances that meet the corresponding criteria. To facilitate the evaluation of our approach (see Section IV), this repository has been implemented using the persistence capabilities of the Jena Semantic Web Framework [26] over a MySQL database. In fact, it is possible to use any OWL repository that supports SPARQL queries. Depending on the ontology repository to access, a customized implementation of the “Ontology manager” (see Section III-F) interface must be developed.

Fig. 9. Adding constraints to the SPARQL statement.

1) Domain Ontologies: As can be seen at the bottom of Fig. 1, the “Ontology repository” can contain several domain-related ontologies. These ontologies comprise the relevant knowledge about the application domain in which OWLPath is going to be applied, and are essential to both the query formulation and execution processes, as explained in Sections III-C and III-D. This approach makes OWLPath easily portable and domain independent.

F. Ontology Manager

This component allows the system to access the ontology repository through Jena. Strictly speaking, it provides an interface that can be implemented using Jena, the OWLAPI [30], Web services, and so on. It is capable of loading the ontologies needed by the system, and this includes the domain and question ontologies. The “Ontology manager” also allows OWLPath to handle distributed heterogeneous ontology repositories, providing an OWL interface for them.

This component is therefore responsible for providing the “Grammar checker” and the “SPARQL generator” with all the required ontological contents. The “Suggester” sends the options to the “Grammar checker,” and then, the “Ontology manager” provides the information for checking the correctness of the question ontology. Furthermore, when the user completes the query, and this has to be executed, the “Ontology manager” processes and executes the SPARQL query in the ontology repository.
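The idea of a repository-neutral interface with interchangeable implementations can be illustrated with a small sketch. The class and method names below (OntologyManager, load_ontology, execute, CannedManager) are hypothetical and do not reproduce the actual OWLPath interface; the toy implementation answers from a dictionary where a real one would delegate to Jena or another store.

```python
from abc import ABC, abstractmethod


class OntologyManager(ABC):
    """Hypothetical repository-neutral interface (names are assumptions)."""

    @abstractmethod
    def load_ontology(self, uri: str) -> None:
        """Make the ontology identified by `uri` available to the system."""

    @abstractmethod
    def execute(self, sparql: str) -> list:
        """Run a SPARQL query and return the matching instances."""


class CannedManager(OntologyManager):
    """Toy implementation that answers from a dictionary instead of a store."""

    def __init__(self, answers: dict):
        self.loaded = []
        self._answers = answers

    def load_ontology(self, uri: str) -> None:
        self.loaded.append(uri)

    def execute(self, sparql: str) -> list:
        return list(self._answers.get(sparql, []))


def run_query(manager: OntologyManager, sparql: str) -> list:
    # Callers depend only on the interface, not on the backing repository.
    return manager.execute(sparql)
```

Swapping the repository then amounts to providing another subclass, while the components that call run_query remain unchanged.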

1) Facing Knowledge Base Inconsistency: Additionally, the “Ontology manager” handles inconsistencies at two different levels. On the one hand, the ontologies included in the repository have to be internally consistent, and this is verified by using reasoners such as Pellet [27] or FaCT++ [29], which are capable of performing consistency checking of ontology models. The currently implemented prototype employed in the use-case scenarios of Section IV uses Pellet. However, this is not enough to guarantee the consistency of the knowledge base, since the consistency across the set of ontologies has to be kept. In fact, the question ontology can refer to different ontologies, which are potential sources of inconsistencies. This issue is handled by the “Ontology manager” in the context of the question ontology rather than for the whole knowledge base.

Fig. 10. Unified modeling language (UML) sequence diagram.

When the question ontology is modified, the “Ontology manager” checks for the consistency of the knowledge that can be accessed through it. For this purpose, the ontology inconsistency management framework developed in our research group (see [10]) was used. This framework detects inconsistencies based on the properties associated with the concepts of the ontologies. Thus, if two ontologies are considered inconsistent, the framework is capable of modifying them and making them consistent. Consequently, a new version of the ontologies is produced, stored in the repository, and used by the question ontology. Therefore, some ontologies used by the question ontology are the original versions, and some are amended. It should be pointed out that these new versions become invalid when the question ontology is modified, because the whole consistency checking process has to be executed again.

G. Control Flow

In Fig. 10, the execution flow of the application is shown. There are two clearly differentiated stages: 1) query formulation and 2) query execution. During the first phase, end users interact with the application through the “Ajax interface.” In particular, the task of the user is to decide which term, from the candidates pop-up list, is put next in the sentence. Once the user has chosen a term, the “Ajax interface” asks the “Suggester” for the next possible entries. For this, the “Suggester” must interact with the “Grammar checker,” which has access to the grammar, implicitly defined in the question ontology, and the semantic relations in the domain ontologies. With this information, the “Grammar checker” generates a list with all the possible candidates and sends it to the “Suggester.” The “Suggester” translates the tree-formatted candidates list into XML and returns it to the “Ajax interface,” which shows all the system suggestions in a choice pop-up box.
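The step in which a tree-formatted list of candidates is turned into XML for the browser can be sketched with the Python standard library. The shape of the candidate tree, the element and attribute names (suggestions, item, label, uri), and the example URIs are assumptions made for this sketch; the actual XML format exchanged with the “Ajax interface” is not specified in the text.

```python
import xml.etree.ElementTree as ET


def candidates_to_xml(tree):
    """Serialize a nested candidate tree to an XML string.

    `tree` maps a display label to {"uri": str, "children": {...}}; this
    shape and the element names are assumptions made for the sketch.
    """
    root = ET.Element("suggestions")

    def add(parent, nodes):
        for label, node in nodes.items():
            item = ET.SubElement(parent, "item", label=label, uri=node["uri"])
            add(item, node.get("children", {}))  # recurse into nested candidates

    add(root, tree)
    return ET.tostring(root, encoding="unicode")


# Hypothetical candidate tree with one class and one subclass.
candidates = {
    "Destination": {
        "uri": "http://example.org/tourism#DESTINATION",
        "children": {
            "Beach": {"uri": "http://example.org/tourism#BEACH"},
        },
    },
}
xml_payload = candidates_to_xml(candidates)
```

Nesting the item elements preserves the class hierarchy, so the client can render subclasses and individuals under their parent entry.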

When the user finishes formulating the query and presses the submit button, the second phase starts. During the second phase, the “Ajax interface” only interacts with the “SPARQL generator” component, which is responsible for the following: 1) translating the natural language query into SPARQL statements and 2) issuing the query to the knowledge base. This latter step is done with the participation of the “Ontology manager,” which is the only component capable of managing the ontology repository. When the results have been retrieved from the knowledge base, both the issued SPARQL-formatted query and the results are shown back to the user.

Fig. 11. Excerpt of the e-tourism ontology.

IV. USE-CASE SCENARIO

The application implemented has been tested in two use-case scenarios involving two different application domains: e-tourism and e-finance. In order to customize the system for a particular domain, the question and domain ontologies have to be defined, because they constitute the input of our CNL approach. Next, the way OWLPath behaves in each experiment is described.

A. OWLPath in E-Tourism

Motivated by the new advances and trends in information technologies, an increasing number of tourism operators offer their products and services to their customers through online Web services. Similarly, regional and local administrations publish tourism-related information (e.g., places of interest, hotels and restaurants, festivals, etc.) in world-accessible Web sites. Hence, the tourism industry is becoming information intensive, and both the market and information source heterogeneities are generating several problems for users, because finding the right information is becoming rather difficult. The Semantic Web enables better machine processing of information by structuring Web documents, such as tourism-related information, so that they become understandable by machines. In this sense, ontologies provide a formal and structured knowledge representation schema that is reusable and shareable.

Fig. 12. Excerpt of the question ontology for the tourism domain.

Thus, the successful application of these technologies highly depends on the availability of tourism ontologies, which would provide a standardized vocabulary and a semantic context. In the last few years, several tourism-related ontologies have been developed (see [31]–[33]). Considering the shortcomings of developing a new ontology from scratch, we have reused the ontology for e-tourism developed by Protégé [33], adding new restaurant-related classes and other properties from the OnTour ontology [31]. As a result, we have obtained an OWL ontology that contains all the touristic information that will be used by the system. An excerpt of the tourism ontology is shown in Fig. 11. There, the class “ACCOMMODATION” is related to “DESTINATION” by means of the object property “Is_located_in.” Moreover, the following are examples of accommodation instances: “Copacabana beach resort,” “Rio buzios beach resort,” “Le Meridien Phuket,” and “Katathani Phuket.” The first two accommodations are located in “Rio de Janeiro,” while the latter two instances are located in “Phuket.” Thus, if we query the system for accommodations in Phuket, at least these two instances must be returned.
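The expected outcome of the Phuket query can be checked on a toy in-memory version of the instance data. The instance names, the destinations, and the property name “Is_located_in” come from the example above; representing the knowledge base as a list of tuples and the helper accommodations_in are assumptions made only for this illustration, whereas the real system issues a SPARQL query against the repository.

```python
# (instance, property, value) facts taken from the running example.
FACTS = [
    ("Copacabana beach resort", "Is_located_in", "Rio de Janeiro"),
    ("Rio buzios beach resort", "Is_located_in", "Rio de Janeiro"),
    ("Le Meridien Phuket", "Is_located_in", "Phuket"),
    ("Katathani Phuket", "Is_located_in", "Phuket"),
]


def accommodations_in(destination, facts=FACTS):
    """Return, in alphabetical order, the instances located in `destination`."""
    return sorted(
        subject
        for subject, prop, value in facts
        if prop == "Is_located_in" and value == destination
    )


print(accommodations_in("Phuket"))
# prints ['Katathani Phuket', 'Le Meridien Phuket']
```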

On the other hand, the question ontology has been designed for querying information about accommodation, restaurants, and sightseeing places with some particular features such as activities, services, dishes, and so on. A partial view of this ontology is shown in Fig. 12. The question ontology only contains two elements: the class “QUERY” and the object property “Any.” This latter element imports the class “ACCOMMODATION,” giving access to all the ontological knowledge associated with this class in the domain ontology. For instance, the referred question ontology makes it possible to query for instances of “ACCOMMODATION” that satisfy certain criteria, such as being located in a particular place (e.g., Rio de Janeiro) or providing specific activities (e.g., golf). Notice that the ontology has two “allValuesFrom” constraints to restrict the range of the “Is_located_in” object property: The location of all the instances of the “BEACH_RESORT” class must belong to the class “BEACH,” while the instances in the “SKI_RESORT” class must be related to destinations belonging to the class “MOUNTAIN.”

Fig. 13. Guided input in the e-tourism domain.

Fig. 14. Guided input in the e-tourism domain. The location of beach resorts is restricted to beaches.

Fig. 15. SPARQL query for the natural language query.

Fig. 16. Query results in the e-tourism domain.

Once the domain and question ontologies have been customized to meet the needs of the tourism domain, OWLPath can be used to answer queries in this application domain. In Fig. 13, the design of a query in this domain is shown. There, the user is looking for resorts located in a particular destination. The system provides the user with the set of all the possible destinations stored in the knowledge base, thus assisting the user in the query formulation process. Another related example is shown in Fig. 14. In this case, the user focuses on resorts belonging to the “BEACH_RESORT” class. Consequently, when the “Is_located_in” link is selected, the system shows only the “BEACH” class, its subclasses, and individuals, taking into account the aforementioned restriction.
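The effect of the “allValuesFrom” restriction on the suggestions can be sketched with plain dictionaries. The class and property names come from the example, but the subclass links shown (for instance, that “BEACH” and “MOUNTAIN” are subclasses of “DESTINATION”), the data layout, and the function names are assumptions of this sketch; a real implementation would obtain this information from the ontology through a reasoner or an ontology API.

```python
# Hypothetical excerpt of the class hierarchy: child -> parent.
SUBCLASS_OF = {
    "BEACH_RESORT": "ACCOMMODATION",
    "SKI_RESORT": "ACCOMMODATION",
    "BEACH": "DESTINATION",
    "MOUNTAIN": "DESTINATION",
}
# General range of the object property.
PROPERTY_RANGE = {"Is_located_in": "DESTINATION"}
# allValuesFrom restrictions, keyed by (restricted class, property).
ALL_VALUES_FROM = {
    ("BEACH_RESORT", "Is_located_in"): "BEACH",
    ("SKI_RESORT", "Is_located_in"): "MOUNTAIN",
}


def descendants(cls):
    """Return `cls` followed by all of its direct and indirect subclasses."""
    found = [cls]
    for child, parent in SUBCLASS_OF.items():
        if parent == cls:
            found.extend(descendants(child))
    return found


def suggest_range(subject_class, prop):
    """Classes to offer after `prop` when the sentence is about `subject_class`."""
    # A restriction on the subject class narrows the general property range.
    restricted = ALL_VALUES_FROM.get((subject_class, prop))
    return descendants(restricted or PROPERTY_RANGE[prop])


print(suggest_range("BEACH_RESORT", "Is_located_in"))   # prints ['BEACH']
```

For the unrestricted “ACCOMMODATION” class, the same call falls back to the general range and returns “DESTINATION” together with its subclasses.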

When the query has been fully assembled, the user launches the execution process. The “SPARQL generator” translates the natural language query into a SPARQL sentence (see Fig. 15), and this query is issued to the knowledge base through the “Ontology manager.” In Fig. 16, the results of the user query are shown. Both the generated SPARQL query and the instances retrieved from the knowledge base are presented.

B. OWLPath in E-Finance

The need to manage financial data has been coming into increasingly sharp focus for some time. Years ago, these data sat in silos attached to specific applications in banks and financial companies. Then, the Web came into the arena, generating the availability of diverse data sets across applications, departments, and other financial entities. However, throughout these developments, a particular underlying problem has remained unsolved: Data reside in thousands of incompatible formats and cannot be systematically managed, integrated, unified, or cleansed. To make matters worse, this incompatibility is not only limited to the use of different data technologies or to the multiple different “flavors” of each technology (for example, the different relational databases in existence), but also affects the semantics. The Semantic Web aims to overcome these shortcomings by representing the relevant knowledge for each application domain by means of ontologies.

Therefore, we designed an ontology containing all the relevant concepts and relationships in the stock market domain, based on the financial ontology presented in [34]. Then, the ontology was populated by using (semi)automatic techniques from the “Ontology Population” field as described in [35]. In particular, several Web-based stock-exchange-related information sources were accessed, and their contents were semantically enriched, thereby creating the knowledge base, which is periodically updated. In order to manage and query this knowledge base, the Jena library [26] was used to implement the “Ontology manager” interface.

The domain ontology was also represented using OWL-DL [16]. This ontology, which is partially shown in Fig. 17, is mainly a taxonomy, and each class is defined through the following elements: name, synonyms, properties inherited from the taxonomic parent classes, specific properties, original data source, and links to external sources. By original data source, we refer to the origin of the information, namely, a database, an official Web page, etc., whereas external sources link to other databases or data repositories containing information about the element under question.

The question ontology, on the other hand, was designed to ask for companies that have quoted price information for assets in international markets and that are included in some particular stock exchange indexes for a time interval. A partial view of this ontology is shown in Fig. 18.

Fig. 17. Excerpt of the stock market ontology.

Once both ontologies have been customized to meet the needs of the stock market domain, OWLPath can be used to answer queries in this application domain. Figs. 19 and 20 show the elaboration of a sample query in the e-finance domain. In this process, the user asks for information about “COMPANIES” whose “STOCK_PRICE” is greater than $30 and that are included in the “DOW_JONES” stock market index in the defined time interval. In Fig. 19, the system detects that the element selected by the user (“STOCK_PRICE.Last_trade”) is a data-type property whose range is “xsd:float.” Thus, it shows a list of comparison operators for numbers (“is_equal_to,” “is_greater_than,” etc.). OWLPath identifies the range of the data-type properties defined in the domain ontology and suggests the appropriate operators. A further example is shown in Fig. 20. Here, the system knows that the range for the property “in” is associated with the data-type “xsd:dateTime.” Therefore, the system provides a list of time intervals that can be put next in the current sentence. The user can choose a particular time period (e.g., “Last_week”) among the suggestions or define a precise date from the calendar shown by the “Ajax interface.”
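The range-driven behavior described above, where the range of a data-type property determines which entries are suggested next, can be sketched as a simple lookup. The labels “is_equal_to,” “is_greater_than,” and “Last_week” appear in the text; “is_less_than,” “Last_month,” the set of numeric types, the function names, and the rendering of an operator as a “FILTER” expression are assumptions added for illustration.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Operator labels for numeric ranges mapped to SPARQL comparison symbols.
# "is_less_than" is an assumed label; the other two appear in the text.
NUMERIC_OPERATORS = {
    "is_equal_to": "=",
    "is_greater_than": ">",
    "is_less_than": "<",
}
NUMERIC_TYPES = {XSD + "float", XSD + "double", XSD + "decimal", XSD + "integer"}
# "Last_week" is from the text; "Last_month" is an assumed extra interval.
TIME_INTERVALS = ["Last_week", "Last_month"]


def suggest_for_range(range_uri):
    """Return the entries to offer after a data-type property with this range."""
    if range_uri in NUMERIC_TYPES:
        return list(NUMERIC_OPERATORS)
    if range_uri == XSD + "dateTime":
        return list(TIME_INTERVALS)
    return []


def to_filter(variable, operator_label, value):
    """Render a chosen numeric operator as a SPARQL FILTER expression."""
    return f"FILTER ({variable} {NUMERIC_OPERATORS[operator_label]} {value})"


print(suggest_for_range(XSD + "float"))
print(to_filter("?price", "is_greater_than", 30))   # prints FILTER (?price > 30)
```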

When the sentence is completed, the user must push the “Sonar Search” button. Then, the “SPARQL generator” translates the natural language query into SPARQL (see Fig. 21), and this query is issued to the knowledge base through the “Ontology manager.” In Fig. 22, a portion of the screenshot with the results of the query is shown.

V. EVALUATION

In this section, we provide the experimental results of the performance of OWLPath in terms of the time required to construct a valid sentence using the system and its performance depending on the size of the input ontologies. The major advantages over the current state of the art are also highlighted.

A. Performance Analysis

The behavior of our solution in the two scenarios described earlier has been evaluated in terms of performance and accuracy. The performance of OWLPath is closely related to the size of the involved ontologies. In order to customize OWLPath to make it work in a particular domain, software engineers have to set up both the question and domain ontologies. A brief description of the contents of the ontologies has been provided in the previous section. Some metrics of both ontologies are presented in Table I.

With these settings, the time elapsed between the selection of the next element of the query and the display of the next choice pop-up box has been measured. The performance analysis also takes into account the time spent in generating the SPARQL statement. Obviously, the values of both parameters strongly depend on the number of words of the current sentence. The results of the simulations are summarized in Table II.

Fig. 18. Excerpt of the question ontology for the financial domain.

Fig. 19. Guided input in the e-finance domain. Dealing with numbers.

The time provided is the average time over ten runs (from the second word onward) once both ontologies have been loaded in memory. In order to avoid the possible interference of Internet latency in our results, all the tests have been performed on a local machine. From the results, it is worth noting that the elapsed time does not change significantly for a larger number of words. This is due to the tradeoff between the reduced number of relations available, as the user chooses between options narrowing down the number of possible paths in the ontology, and the (possibly) increasing number of concept instances accessible. On the other hand, the reason for such a short time required for generating the SPARQL statements has been pointed out before. While setting up the sentence, the system stores the word chosen by the user along with the underlying RDF triples that entailed the generation of the chosen word. Thus, when the user finishes elaborating the sentence, the system already has all the ingredients that are necessary to build up the SPARQL query.
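The measurement procedure, averaging the elapsed time over ten runs on a local machine, can be sketched with a small timing helper. The helper name average_elapsed and the placeholder workload are assumptions of this sketch; it does not reproduce the authors' test harness, and no timing values are implied.

```python
import statistics
import time


def average_elapsed(action, runs=10):
    """Call `action` `runs` times and return the mean wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


def placeholder_suggestion_step():
    # Stand-in workload; a real measurement would request the next suggestions.
    sorted(range(1000), reverse=True)


mean_seconds = average_elapsed(placeholder_suggestion_step, runs=10)
```

Using a monotonic high-resolution clock such as time.perf_counter keeps the samples unaffected by system clock adjustments.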

B. User Experience Evaluation

OWLPath has been designed to construct queries based on the loaded ontologies. Thus, the queries resulting from this process are always valid in the context of such ontologies, and the accuracy is therefore generally close to 100%. As a result, the typical information extraction performance measures based on parameters such as precision and recall are not relevant. Instead, we have designed an experiment to show how OWLPath can potentially benefit users in elaborating queries to be issued to OWL-based knowledge bases.

Fig. 20. Guided input in the e-finance domain. Dealing with dates.

Fig. 21. SPARQL query for the natural language query.

Fig. 22. Query results in the e-finance domain.

Four Ph.D. students were asked to make ten queries related to a given tourism-based ontology. The queries were obtained from the ten most frequent questions in the tourism Web site of the Region of Murcia (http://www.murciaturistica.es). The students had a strong background in ontology development and SPARQL. They were asked to first create the queries in SPARQL manually and then define them using OWLPath. In Fig. 23, the time spent by the students in both tasks is graphically shown.

TABLE I. DETAILS ABOUT THE TOURISM AND FINANCIAL ONTOLOGIES

TABLE II. E-TOURISM AND E-FINANCE PERFORMANCE ANALYSES

Both sets of queries were then executed, and their results were compared. The first conclusion that can be drawn from these results is that both methods give nearly the same answer. Moreover, in most cases, the time required to generate a query through the OWLPath interface is shorter than doing it manually. The benefit would obviously be much more evident for users not familiar with SPARQL.

C. Contributions Beyond Previous Work

OWLPath improves the current state-of-the-art technologies and tools in several ways. The main features of OWLPath can be summarized as follows.

1) OWLPath provides a simple user-friendly Web interface for constructing SPARQL queries.

2) OWLPath exploits OWL knowledge bases.

3) OWLPath takes a CNL approach to minimize the problems associated with full NLI systems.

4) OWLPath is highly portable, meaning that it can be applied to any ontology or set of ontologies.

5) OWLPath makes use of ontologies to represent the underlying variable grammar. This approach provides three main benefits.

a) More sophisticated grammars: The expressivity of ontology languages allows one to define more powerful grammars.

b) Domain-independent system: The tool can be fully customized to suit the requirements of any given application domain by changing both the grammar and domain ontologies.

c) Semantic Web features: Specifying the grammar rules in an OWL-based syntax allows us to use consistency checking and other features offered by Semantic Web APIs and tools.

Fig. 23. Time involved in making the queries.

The main difference between OWLPath and full NLI systems is that the expressive power of OWLPath is constrained by the grammar and vocabulary defined in the loaded ontologies. Far from being a drawback, this allows OWLPath to ensure that all the queries and sentences “make sense” in the context of the loaded ontologies, and the answers to these queries are likely available in the knowledge base.

Because we were unable to download and test other related tools such as GINO and Ontopath, a comparison with these tools in terms of performance and efficiency cannot be provided. However, the results of our experiments show that the user experience is highly satisfactory, since the time required for elaborating the query is significantly reduced and the accuracy of the tool is close to 100%. Concerning functionality, the gist of our approach lies in the use of an ontology to represent the permissible grammar. The state-of-the-art CNL tools evaluated make use of a static set of grammar rules, mostly represented in BNF notation; by contrast, defining the grammar in the form of an ontology offers several advantages. The first benefit is the ability to easily adapt to changing scenarios and meet the requirements of different application domains. This dynamism is achieved by loading the grammar rules in the question ontology along with the application-specific concepts from the domain ontology at runtime. The second major advantage is that logic-based restrictions can be imposed on the ontological model of the grammar to express certain conditions that should be complied with in a particular application domain. Moreover, inference engines (i.e., reasoners) can be employed to verify the model consistency, check cardinality constraints and class membership, and create an inferred ontology model.

It is worth pointing out that OWLPath covers the full query cycle. It provides an ontology-guided natural-language-like interface for query generation. It processes the query and issues the request to the corresponding ontology repository. In the end, it presents the results to the user. Finally, OWLPath is being integrated in several projects (e.g., [35] and [36]) to provide a highly user-friendly interface to the end user.

VI. DISCUSSION AND CONCLUSION

The Semantic Web is about adding logic-based metadata to Web content with the premise that machines can systematically process it. The goal is to improve the access to, and the management and retrieval of, high-quality information on the ever-increasing, dynamic Web. However, when it comes to interacting with the user, the formal foundations of the Semantic Web might overwhelm untrained users, resulting in a gap between the semantically empowered data and users’ information needs. NLIs can help overcome this gap by allowing users to formulate their information needs using natural language. Still, NLIs have three major drawbacks that hamper their usability [13]: 1) It is hard to adapt them to new domains; 2) there is typically a mismatch between the user’s expectations and the capabilities of the natural language system, namely, the habitability problem; and 3) NLIs suffer from linguistic variability and ambiguities.

CNLs aim to overcome some of the shortcomings of NLIs. CNLs are subsets of natural languages whose grammars and dictionaries have been restricted to reduce or eliminate both ambiguity and complexity. We believe that the combined effects of NLIs and CNLs can help to bridge the gap between end users and the Semantic Web. To that end, we have developed OWLPath, an ontology-guided input natural language-query editor that accepts natural language queries and outputs the results retrieved from ontological knowledge bases. In OWLPath, the controlled language and grammar are determined by a question ontology. This limits the expressiveness of the user, although this restriction is not too severe since it helps in addressing two of the main flaws associated with NLIs: ambiguity and the habitability problem. Moreover, the use of ontologies for defining both the grammar and the knowledge about the application domain makes the platform portable and domain independent. Furthermore, the logical basis of ontology languages allows for the elaboration of more powerful grammars. Thus, for example, unlike other approaches such as in [13] and [18], the grammar in OWLPath can take into account different kinds of restrictions. In addition, in contrast with other related tools that work with RDF ontologies (e.g., [20]), OWLPath deals with more expressive (DL-variant) OWL ontologies.

The following limitations apply to the current version of OWLPath and will be the subject of further work: 1) scalability, since the ontologies used for test purposes are relatively small and are processed in-memory, and 2) expressivity, since the question ontologies only include the most common constructors to allow formulating typical queries in the experimental scenarios. Performance and scalability can be enhanced by using relational database management systems to store the ontology instances. On the other hand, different language constructs and connectors can be easily included to enrich the grammar expressivity defined in the domain-dependent question ontologies.

Also as future work, a tool to assist software developers in elaborating the question ontology is currently under construction. Indeed, in order to customize and make the platform work in a particular scenario, software developers need to define both the domain and the question ontologies. Developing a complex question ontology can be particularly tricky, and this is the focus of our current research. The tool to facilitate the definition of question ontologies must take into account the lessons learned from our previous experience in the development of these ontologies for the evaluation experiments. In particular, the methodology described in Section III-C1 constitutes the basis for this tool. We also plan to develop a search component to be integrated with the choice pop-up box in the Ajax interface. This component would help users in finding the desired classes or instances in the suggestions list. This component would be particularly useful when the number of choices is overwhelming.

REFERENCES

[1] N. Shadbolt, T. Berners-Lee, and W. Hall, “The semantic web revisited,”IEEE Intell. Syst., vol. 21, no. 3, pp. 96–101, Jan./Feb. 2006.

[2] D. Huynh, S. Mazzocchi, and D. R. Karger, “Piggy bank: Experience thesemantic web inside your web browser,” J. Web Semantics, vol. 5, no. 1,pp. 16–27, Mar. 2007.

[3] X. Jiang and A.-H. Tan, “Learning and inferencing in user ontology forpersonalized semantic web search,” Inf. Sci., vol. 179, no. 16, pp. 2794–2808, Jul. 2009.

[4] R. Marks and W. Dembski, “Conservation of information in search: Mea-suring the cost of success,” IEEE Trans. Syst., Man, Cybern. A, Syst.,Humans, vol. 39, no. 5, pp. 1051–1061, Sep. 2009.

[5] W. Mahoney, P. Hospodka, W. Sousan, R. Nickell, and Q. Zhu, “A co-herent measurement of web-search relevance,” IEEE Trans. Syst., Man,Cybern. A, Syst., Humans, vol. 39, no. 6, pp. 1176–1187, Nov. 2009.

[6] H.-T. Zheng, B.-Y. Kang, and H.-G. Kim, “An ontology-based approachto learnable focused crawling,” Inf. Sci., vol. 178, no. 23, pp. 4512–4522,Dec. 2008.

[7] M. A. Awad and L. R. Khan, “Web navigation prediction using multipleevidence combination and domain knowledge,” IEEE Trans. Syst., Man,Cybern. A, Syst., Humans, vol. 37, no. 6, pp. 1054–1062, Nov. 2007.

[8] L. F. Lai, “A knowledge engineering approach to knowledge manage-ment,” Inf. Sci., vol. 177, no. 19, pp. 4072–4094, Oct. 2007.

[9] L. Zhou and D. Zhang, “An ontology-supported misinformation model:Toward a digital misinformation library,” IEEE Trans. Syst., Man, Cybern.A, Syst., Humans, vol. 37, no. 5, pp. 804–813, Sep. 2007.

[10] J. T. Fernández-Breis, D. C. Nieves, and R. Valencia-Garcia, “Measuringindividual learning performance in group work from a knowledge integra-tion perspective,” Inf. Sci., vol. 179, no. 4, pp. 339–354, Feb. 2009.

[11] A. R. Tawil, M. Montebello, R. Bahsoon, W. A. Gray, and N. J. Fiddian,“Interschema correspondence establishment in a cooperative owl-basedmulti-information server grid environment,” Inf. Sci., vol. 178, no. 4,pp. 1011–1031, Feb. 2008.

[12] M. Ulieru, M. Hadzic, and E. Chang, “Soft computing agents for e-healthin application to the research and control of unknown diseases,” Inf. Sci.,vol. 176, no. 9, pp. 1190–1214, May 2006.

[13] A. Bernstein and E. Kaufmann, “Gino—A guided input natural lan-guage ontology editor,” in Proc. Int. Semantic Web Conf., vol. 4273,Lecture Notes in Computer Science, I. F. Cruz, S. Decker, D. Allemang,C. Preist, D. Schwabe, P. Mika, M. Uschold, and L. Aroyo, Eds., 2006,pp. 144–157.

[14] C. Wang, M. Xiong, Q. Zhou, and Y. Yu, “Panto: A portable naturallanguage interface to ontologies,” in Proc. ESWC, vol. 4519, LectureNotes in Computer Science, E. Franconi, M. Kifer, and W. May, Eds.,2007, pp. 473–487.

[15] F. Manola and E. Miller, “Rdf primer,” in Proc. World Wide WebConsortium, W3C Recommendation 10 February 2004, Tech. Rep., 2004.[Online]. Available: http://www.w3.org/TR/rdf-primer/

[16] D. L. McGuinness and F. van Harmelen, “Owl web ontology lan-guage. overview,” in Proc. World Wide Web Consortium, W3C Rec-ommendation 10 February 2004, Tech. Rep., 2004. [Online]. Available:http://www.w3.org/TR/owl-features/

[17] E. Prud’hommeaux and A. Seaborne, “Sparql Query Language for rdf,”in Proc. World Wide Web Consortium, W3C Recommendation 15 January2008, Tech. Rep., 2008. [Online]. Available: http://www.w3.org/TR/rdf-sparql-query/

[18] H. Namgoong and H.-G. Kim, “Ontology-based controlled natural lan-guage editor using CFG with lexical dependency,” in Proc. ISWC/ASWC,vol. 4825, Lecture Notes in Computer Science, K. Aberer, K.-S. Choi,N. F. Noy, D. Allemang, K.-I. Lee, L. J. B. Nixon, J. Golbeck, P. Mika,D. Maynard, R. Mizoguchi, G. Schreiber, and P. Cudré-Mauroux, Eds.,2007, pp. 353–366.

[19] E. Kaufmann, A. Bernstein, and L. Fischer, “Nlp-reduce: A ‘naive’ butdomain-independent natural language interface for querying ontologies,”in Proc. 4th ESWC, 2007, pp. 1–2.

[20] H.-G. Kim, B.-H. Ha, J.-I. Lee, and M.-K. Kim, “A multi-layered appli-cation for the gross description using semantic web technology,” Int. J.Med. Inf., vol. 74, no. 5, pp. 399–407, Jun. 2005.

[21] E. Kaufmann and A. Bernstein, “How useful are natural language inter-faces to the semantic web for casual end-users?” in Proc. ISWC/ASWC,vol. 4825, Lecture Notes in Computer Science, K. Aberer, K.-S. Choi,N. F. Noy, D. Allemang, K.-I. Lee, L. J. B. Nixon, J. Golbeck, P. Mika,D. Maynard, R. Mizoguchi, G. Schreiber, and P. Cudré-Mauroux, Eds.,2007, pp. 281–294.

[22] R. Schwitter, “A controlled natural language layer for the semantic web,”in Proc. Australian Conf. Artif. Intell., vol. 3809, Lecture Notes in Com-puter Science, S. Zhang and R. Jarvis, Eds., 2005, pp. 425–434.

Page 16: OWLPath: An OWL Ontology-Guided Query Editor 2011 Base Paper/OWLPath An OW… · IndexTerms—Natural language interfaces (NLIs), ontology lan-guages, query formulation, user interfaces

136 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 41, NO. 1, JANUARY 2011

[23] P. R. Smart, “Controlled natural languages and the semantic web,” SchoolElectron. Comput. Sci., Univ. Southampton, Southampton, U.K., Tech.Rep. ITA/P12/SemWebCNL, 2008.

[24] U. Muegge, “Controlled language: The next big thing in translation?”ClientSide News Mag., vol. 7, no. 7, pp. 21–24, 2007.

[25] E. Lughofer, J. Smith, M. Tahir, P. Caleb-Solly, C. Eitzinger, D. Sannen,and M. Nuttin, “Human–machine interaction issues in quality controlbased on online image classification,” IEEE Trans. Syst., Man, Cybern.A, Syst., Humans, vol. 39, no. 5, pp. 960–971, Sep. 2009.

[26] B. McBride, “Jena: A semantic web toolkit,” IEEE Internet Comput.,vol. 6, no. 6, pp. 55–59, Nov./Dec. 2002. [Online]. Available: http://jena.sourceforge.net/

[27] E. Sirin, B. Parsia, B. C. Grau, A. Kalyanpur, and Y. Katz, “Pellet: Apractical owl-dl reasoner,” J. Web Semantics, vol. 5, no. 2, pp. 51–53,Jun. 2007.

[28] T. Tudorache, N. F. Noy, S. Tu, and M. A. Musen, “Supporting col-laborative ontology development in protégé,” in Proc. Int. SemanticWeb Conf., vol. 5318, Lecture Notes in Computer Science, A. P. Sheth,S. Staab, M. Dean, M. Paolucci, D. Maynard, T. W. Finin, andK. Thirunarayan, Eds., 2008, pp. 17–32.

[29] D. Tsarkov and I. Horrocks, “Fact++ description logic reasoner: Sys-tem description,” in Proc. IJCAR, vol. 4130, Lecture Notes in ArtificialIntelligence, 2006, pp. 292–297.

[30] M. Horridge, S. Bechhofer, and O. Noppens, “Igniting the owl 1.1touch paper: The OWL API,” in Proc. OWLED, vol. 258, CEUR Work-shop Proceedings, C. Golbreich, A. Kalyanpur, and B. Parsia, Eds.,CEUR-WS.org, 2007. [Online]. Available: http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-258/paper19 pdf.

[31] K. Prantner, Ontour. The ontology, DERI Innsbruck, Innsbruck, Austria.[Online]. Available: http://e-tourism.deri.at/ont/docu2004/OnTour%20-%20The%20Ontology.pdf

[32] M. Gouveia and J. Cardoso, “Tourism information aggregation using anontology based approach,” in Proc. ICEIS, J. Cardoso, J. Cordeiro, andJ. Filipe, Eds., 2007, vol. 1, pp. 569–572.

[33] H. Knublauch, “Case study: Using protege to convert the travel ontologyto uml and owl,” in Proc. EON, vol. 87, CEUR Workshop Proceedings,Y. Sure and Ó Corcho, Eds., CEUR-WS.org, 2003. [Online]. Available:http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-87/EON2003_Knublauch.pdf

[34] M. Martinez-Montes, J.L. Bas, S. Bellido, O. Corcho, S. Losada,R. Benjamins, and J. Contreras, WP10: Case Study Ebanking. D10.3-Financial Ontology, Data, Inf. Process Integr. Semantic Web Services.[Online]. Available: http://dip.semanticweb.org/documents/D10.3.pdf

[35] J. M. Gómez, F. Garcia-Sánchez, R. Valencia-Garcia, I. Toma, andC. Garcia-Moreno, “Sonar: A semantically empowered financial searchengine,” in Proc. IWINAC, vol. 5601, Lecture Notes in Computer Science,J. M. Mira, J. M. Ferrández, J. R. Álvarez, F. de la Paz, andF. J. Toledo, Eds., Springer-Verlag, Santiago de Compostela, Spain,Jun. 22–26, 2009, pp. 405–414.

[36] M. Rico, F. Garcia-Sánchez, J. M. Gomez, R. Valencia-Garcia, andJ. T. Fernández-Breis, “Enabling intelligent service discovery withGGODO,” J. Inf. Sci. Eng., vol. 26, no. 4, Jul. 2010, to be published.

[37] “The Semantic Web,” in Proc. 6th ISWC+2nd ASWC, vol. 4825, Lec-ture Notes in Computer Science, K. Aberer, K.-S. Choi, N. F. Noy,D. Allemang, K.-I. Lee, L. J. B. Nixon, J. Golbeck, P. Mika, D. Maynard,R. Mizoguchi, G. Schreiber, and P. Cudré-Mauroux, Eds., Busan, Korea,Nov. 11–15, 2007.

Rafael Valencia-García received the B.A., M.Sc., and Ph.D. degrees in computer science from the University of Murcia, Espinardo, Spain.

He is currently an Associate Professor with the Department of Informatics and Systems, University of Murcia. His main research interests are natural language processing and the application of knowledge technologies. He has published over 40 articles in journals, conferences, and book chapters. He is the author or coauthor of several books.

Francisco García-Sánchez received the B.A., M.Sc., and Ph.D. degrees in computer science from the University of Murcia, Espinardo, Spain.

He is currently a Lecturer with the Department of Informatics and Systems, University of Murcia. His research interests include agent technology, service-oriented architectures, and the Semantic Web. He has conducted a number of research stays in world-leading research institutes in Ireland, Austria, the U.S., and Australia and has published over 30 articles in journals, international and national conferences, and workshops. He is currently a Prime Investigator of three national projects concerning the development of user interfaces to Semantic Web services execution environments and ontology-based intelligent systems to assist in accessing financial data sources.

Dagoberto Castellanos-Nieves received the B.A. and M.Sc. degrees in mechanical engineering from the University of Holguin, Holguin, Cuba, and the Ph.D. degree from the University of Murcia, Espinardo, Spain.

He is currently a Postdoc Researcher with the Department of Informatics and Systems, University of Murcia, and is currently collaborating in various research projects concerning the Semantic Web. His research interests include e-learning and the Semantic Web. He has published over ten articles in journals, conferences, and book chapters. He is the author or coauthor of several books.

Jesualdo Tomás Fernández-Breis received the B.A., M.Sc., and Ph.D. degrees in computer science from the University of Murcia, Espinardo, Spain.

He is currently an Associate Professor with the Department of Informatics and Systems, University of Murcia. His research interests include the development and application of knowledge technologies to different fields such as medicine, the Semantic Web, e-learning, and bioinformatics. He is currently a Prime Investigator of three national projects and has published over 70 articles in journals, conferences, and book chapters. He is the author or coauthor of several books.