
GlobeNet 2013

WEB 2013, The First International Conference on Building and Exploring Web Based Environments

January 27 - February 1, 2013 - Seville, Spain

Translating Natural Language Competency Questions into SPARQL Queries: A Case Study

Authors:

Leila ZEMMOUCHI-GHOMARI, Abdessamed Réda GHOMARI

LMCS Laboratory, National Superior School of Computer Science, Algiers, Algeria

http://www.esi.dz/


OUTLINE

1. MOTIVATION

2. RELATED WORK

3. PROPOSED TRANSLATION APPROACH

4. CASE STUDY

5. CONCLUSION AND FUTURE WORK

1. MOTIVATION

The context of this research work is a PhD thesis focused on an ontology engineering process.

Competency questions (CQs) are a well-known technique for determining the requirements that an ontology should fulfill.

To allow automatic evaluation, these questions must be expressed in a formal language, hence their translation into SPARQL queries.


2. RELATED WORK

To the best of our knowledge, the automatic translation of competency questions into SPARQL queries for the purpose of ontology validation has not yet been tackled by researchers. In the broader field of web Question Answering (QA), however, several approaches exist:

CNL: Ontology-based Controlled Natural Language Editor

OWLPATH: OWL Ontology-guided Query Editor

PANTO: Portable Natural Language Interface to Ontologies

DEANNA: Deep Answers for Naturally Asked Questions

Ben Abacha & Zweigenbaum approach, 2012: Translating Medical Questions into SPARQL Queries

Limitations:

• Scalability: their test ontologies are relatively small.

• Preliminary work is necessary before these approaches can be applied, such as building a mapping between the concepts in questions and the queried knowledge bases, which is difficult to carry out and to maintain.

• Some of them focus on particular types of questions and particular domains.

• There is no consensus in the web QA community on a single approach.


3. PROPOSED TRANSLATION APPROACH (1/3)

Our approach is a variation of the [Ben Abacha & Zweigenbaum, 2012] approach.

Why a variation? Their approach is specific to the medical field and is limited to a particular set of questions: WH questions, excluding complex ones (why and when).

How? The two approaches compare as follows.

Their approach:

1. Identifying the question type
2. Determining the expected answer type(s) for WH questions
3. Constructing the question’s affirmative and simplified form
4. Medical entity recognition (treatment, disease…)
5. Relation extraction
6. SPARQL query construction

Our approach:

1. Identifying the question type
2. Determining the expected answer
3. Entity extraction
4. Identifying the answer entity type and the entity location in the ontology
5. SPARQL query construction


3. PROPOSED TRANSLATION APPROACH (2/3)

Phase I: Identifying competency question categories according to the expected answer type:

a) Definition questions: questions that begin with “What is/are” or “What does ... mean”

b) Boolean or yes/no questions

c) Factual questions: the answer is a fact or a precise piece of information

d) List questions: the answer is a list of entities

e) Complex questions: questions that begin with “How” or “Why”


The question category determines the query’s result clause, which specifies the result form (in the HERO case study, ASK for the yes/no question and SELECT for the definition, factual and list questions).
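A minimal sketch of Phase I, assuming simple surface-form rules derived from the category descriptions above. The auxiliary-verb set, the regular expression and the function name `classify_question` are illustrative assumptions, not the authors’ procedure. The sketch also records the result form used for each category in the case study.

```python
import re

# Hypothetical surface-form rules for Phase I. The slides define the
# categories; these keyword lists and patterns are illustrative only.
AUXILIARIES = {"is", "are", "must", "can", "does", "do", "should",
               "has", "have", "will", "may"}

# "What is/are X?" with a short X (up to three words), or "What does X mean?"
DEFINITION = re.compile(
    r"^what (?:is|are) (?:an? |the )?\w+(?: \w+){0,2}\?$"
    r"|^what does .+ mean\?$"
)

def classify_question(question: str) -> str:
    """Return 'definition', 'yes/no', 'factual', 'list' or 'complex'."""
    # Drop a leading identifier such as "CQ59." and normalise case.
    text = re.sub(r"^CQ\d+\.\s*", "", question.strip()).lower()
    first_word = text.split()[0]
    if first_word in ("how", "why"):
        return "complex"
    if first_word in AUXILIARIES:
        return "yes/no"
    if DEFINITION.match(text):
        return "definition"
    if text.startswith(("what are", "which")):
        return "list"
    return "factual"

# Result form per category, as used in the HERO case study; no query is
# given there for the complex question CQ41, hence None.
RESULT_FORM = {"definition": "SELECT", "yes/no": "ASK", "factual": "SELECT",
               "list": "SELECT", "complex": None}

for cq in ["CQ59. What is a Credit?",
           "CQ3. Must a university teacher be a researcher?",
           "CQ44. What average size and duration have governing board?",
           "CQ1. What are the possible academic ranks of a teacher?",
           "CQ41.Why universities are organized into departments?"]:
    category = classify_question(cq)
    print(f"{cq} -> {category} ({RESULT_FORM[category]})")
```

Such rules are brittle: a “What is ...” question with a longer noun phrase falls through to the factual category, so thresholds like these would need checking against the full set of 81 CQs.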


Phase II: Determining the expected (perfect or ideal) answer.

Phase III: Extracting the entity or entities from the questions and from their corresponding expected answers identified in Phase II.

Phase IV: Identifying the answer entity type (class, data property, object property, annotation, axiom, instance) and the entity location in the ontology.

Phase V: Constructing the SPARQL query based on the question type identified in Phase I, the question/answer entities extracted in Phase III, and their corresponding entity types and locations in the ontology identified in Phase IV.
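Phase V can be sketched as template filling: the category from Phase I chooses the result form, and the triple patterns obtained from Phase IV fill the WHERE clause. The function `build_query` and its inputs are hypothetical, and PREFIX declarations are omitted, as in the queries on these slides.

```python
from typing import Optional, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as SPARQL terms

def build_query(category: str, patterns: Sequence[Triple],
                variables: Optional[Sequence[str]] = None) -> str:
    """Fill a SELECT or ASK template from a Phase I category and the
    triple patterns derived from the entity location (Phase IV)."""
    if category == "complex":
        raise ValueError("complex questions are not translated by this sketch")
    # Each triple pattern is written out and terminated with " .".
    where = " ".join(f"{s} {p} {o} ." for s, p, o in patterns)
    if category == "yes/no":
        return f"ASK {{ {where} }}"
    projection = " ".join(variables) if variables else "*"
    return f"SELECT {projection} WHERE {{ {where} }}"

print(build_query("yes/no", [("HERO:Teacher", "rdfs:subClassOf", "HERO:Researcher")]))
# ASK { HERO:Teacher rdfs:subClassOf HERO:Researcher . }
print(build_query("definition",
                  [("HERO:CourseCreditsNumber", "rdfs:comment", "?comment")],
                  ["?comment"]))
# SELECT ?comment WHERE { HERO:CourseCreditsNumber rdfs:comment ?comment . }
```

A fuller implementation would also emit PREFIX declarations and validate the terms it inserts; both are left out to keep the template visible.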


Figure: Mapping between question/answer entity and ontology entity

Example query (retrieving all instances of the class Teacher):

SELECT * WHERE { ?Teacher rdf:type HERO:Teacher . }


4. CASE STUDY: HERO

Translation of the competency questions of the HERO ontology (Higher Education Reference Ontology) into SPARQL queries.

HERO describes several aspects of the university domain, such as organizational structure, administration, staff, roles and incomes.

HERO aims to be a valuable tool for researchers and institutional employees interested in analyzing the higher education system as a whole.

The HERO ontology is available at: http://sourceforge.net/projects/heronto/?source=directory

The 81 competency questions and their corresponding queries are available at: http://herontology.esi.dz/content/downloads


Phase I: Identifying competency question categories according to expected answer types

CQ categories, with examples from the 81 CQs:

Definition questions: CQ59. What is a Credit?

Yes/No questions: CQ3. Must a university teacher be a researcher?

Factual questions: CQ44. What average size and duration have governing board?

List questions: CQ1. What are the possible academic ranks of a teacher?

Complex questions: CQ41. Why universities are organized into departments?


Phase II: Determining the expected answer

CQ59. What is a Credit?
Answer: Each course bears a specified number of credits. In general, the number of credits a course carries is determined by the number of class hours the course meets each week.

CQ3. Must a university teacher be a researcher?
Answer: Nearly all faculty members are expected to engage in research.

CQ44. What average size and duration have governing board?
Answer: The average size of public boards is approximately 10 people and the average size among independent (private) institutions is 30. The length of board members’ terms varies from three years to as long as 12 years.

CQ1. What are the possible academic ranks of a teacher?
Answer: Assistant Professor, Associate Professor, Full Professor, Professor Emeritus.

CQ41. Why universities are organized into departments?
Answer: The basic unit of academic organization in most institutions is the department (e.g., chemistry, political science). Every department belongs to an academic field.

Answer sources: academic reports, governmental websites, expert interviews, ...


Phase III: Extracting the entity or entities from competency questions and their corresponding expected answers identified in Phase II. This extraction is based on a mapping between relevant terms in question/answer pairs and their equivalent terms in the ontology.

Terms are extracted from the CQs and from the answers:

CQ59. What is a Credit?
Answer: Each course bears a specified number of credits. In general, the number of credits a course carries is determined by the number of class hours the course meets each week.

CQ44. What average size and duration has governing board?
Answer: The average size of public boards is approximately 10 people and the average size among independent (private) institutions is 30. The length of board members’ terms varies from three years to as long as 12 years.

CQ41. Why universities are organized into departments?
Answer: The basic unit of academic organization in most institutions is the department (e.g., chemistry, political science). Every department belongs to an academic field.
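The term-to-entity mapping can be approximated, as a minimal sketch, by matching word n-grams of a question or answer against labels derived from ontology entity names. The camel-case splitting, the naive plural stripping, the longest-match rule and the entity list `HERO_ENTITIES` (taken from names appearing in the Phase IV axioms) are assumptions for illustration; the described approach performs this step manually.

```python
import re

def camel_to_label(name: str) -> str:
    """'GoverningBoardSize' -> 'governing board size'."""
    return " ".join(re.findall(r"[A-Z][a-z]*|[a-z]+", name)).lower()

def extract_terms(text: str, entity_names: list[str]) -> list[str]:
    """Return ontology entity names whose label occurs in text,
    preferring the longest match at each token position."""
    labels = {camel_to_label(n): n for n in entity_names}
    max_n = max(len(label.split()) for label in labels)
    tokens = re.findall(r"[a-z]+", text.lower())
    found: list[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            words = tokens[i:i + n]
            candidates = [" ".join(words)]
            # Naive plural handling: "departments" -> "department".
            if words[-1].endswith("s"):
                candidates.append(" ".join(words[:-1] + [words[-1][:-1]]))
            match = next((labels[c] for c in candidates if c in labels), None)
            if match is not None:
                if match not in found:
                    found.append(match)
                i += n
                break
        else:
            i += 1  # no label starts at this token
    return found

HERO_ENTITIES = ["Course", "CourseCreditsNumber", "Teacher", "Researcher",
                 "GoverningBoard", "GoverningBoardSize",
                 "GoverningBoardDuration", "Department"]

print(extract_terms("CQ41. Why universities are organized into departments?",
                    HERO_ENTITIES))
# ['Department']
```

On CQ44 this sketch finds only GoverningBoard: the words “size” and “duration” on their own do not match the longer labels GoverningBoardSize and GoverningBoardDuration, so relating them still requires the user or a dedicated matching tool.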



Phase IV: Identifying answer entity type (class, data property, object property, annotation, axiom, instance) and entity location in the ontology

CQ59. Entity types: Class Course; Data property CourseCreditsNumber.
Location: CourseCreditsNumber Domain Course

CQ3. Entity types: Classes Teacher, Researcher.
Location: Teacher SubClassOf Researcher

CQ44. Entity types: Class Governing Board; Data properties Size, Duration.
Location: GoverningBoardSize Domain GoverningBoard; GoverningBoardDuration Domain GoverningBoard

CQ1. Entity types: Class Teacher; Data properties Rank, Assistant Professor, Associate Professor, Full Professor, Professor Emeritus.
Location: TeacherRank Domain Teacher; AssistantProfessor SubPropertyOf TeacherRank; AssociateProfessor SubPropertyOf TeacherRank; FullProfessor SubPropertyOf TeacherRank; ProfessorEmeritus SubPropertyOf TeacherRank

CQ41. Entity types: Classes Higher Education Organization, Department.
Location: Department SubClassOf Faculty; Faculty SubClassOf Role; Role SubClassOf HigherEducationOrganization; Department definition
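Phase IV can be sketched as a lookup over the declarations and axioms listed above, assuming part of them has been loaded into plain Python structures (`DECLARATIONS`, `AXIOMS`, `locate` and `to_pattern` are hypothetical names). The mapping from the axiom keywords to predicates (SubClassOf to `rdfs:subClassOf`, SubPropertyOf to `rdfs:subPropertyOf`, Domain to `rdfs:domain`) follows the standard RDFS vocabulary.

```python
from typing import List, NamedTuple, Tuple

class Axiom(NamedTuple):
    subject: str
    relation: str  # "SubClassOf", "SubPropertyOf" or "Domain"
    obj: str

# Hypothetical in-memory copy of part of the Phase IV slide.
DECLARATIONS = {
    "Course": "class", "CourseCreditsNumber": "data property",
    "Teacher": "class", "Researcher": "class",
    "TeacherRank": "data property", "AssistantProfessor": "data property",
    "GoverningBoard": "class", "GoverningBoardSize": "data property",
    "GoverningBoardDuration": "data property",
}
AXIOMS = [
    Axiom("CourseCreditsNumber", "Domain", "Course"),
    Axiom("Teacher", "SubClassOf", "Researcher"),
    Axiom("GoverningBoardSize", "Domain", "GoverningBoard"),
    Axiom("GoverningBoardDuration", "Domain", "GoverningBoard"),
    Axiom("TeacherRank", "Domain", "Teacher"),
    Axiom("AssistantProfessor", "SubPropertyOf", "TeacherRank"),
]

RDF_PREDICATE = {"SubClassOf": "rdfs:subClassOf",
                 "SubPropertyOf": "rdfs:subPropertyOf",
                 "Domain": "rdfs:domain"}

def locate(entity: str) -> Tuple[str, List[Axiom]]:
    """Return the entity's declared type and the axioms that mention it."""
    entity_type = DECLARATIONS.get(entity, "unknown")
    location = [a for a in AXIOMS if entity in (a.subject, a.obj)]
    return entity_type, location

def to_pattern(axiom: Axiom, prefix: str = "HERO:") -> Tuple[str, str, str]:
    """Rewrite an axiom as a SPARQL triple pattern using the RDFS vocabulary."""
    return (prefix + axiom.subject, RDF_PREDICATE[axiom.relation],
            prefix + axiom.obj)

print(locate("GoverningBoard"))
print(to_pattern(Axiom("Teacher", "SubClassOf", "Researcher")))
```

Applied to the axiom Teacher SubClassOf Researcher, `to_pattern` yields exactly the triple pattern used in the CQ3 ASK query.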


Phase V: Construction of SPARQL queries

CQ59. What is a Credit?
SELECT ?comment WHERE { HERO:CourseCreditsNumber rdfs:comment ?comment }

CQ3. Must a university teacher be a researcher?
ASK { HERO:Teacher rdfs:subClassOf HERO:Researcher . }

CQ44. What average size and duration have governing board?
SELECT ?university ?size WHERE { ?university rdf:type HERO:HigherEducationOrganization . ?y rdfs:subClassOf ?university . ?y HERO:GoverningBoardSize ?size }
SELECT ?university ?duration WHERE { ?university rdf:type HERO:HigherEducationOrganization . ?y rdfs:subClassOf ?university . ?y HERO:GoverningBoardDuration ?duration }

CQ1. What are the possible academic ranks of a teacher?
SELECT ?a ?b ?c ?d WHERE { ?a rdfs:subPropertyOf HERO:TeacherRank . ?b rdfs:subPropertyOf ?a . ?c rdfs:subPropertyOf ?b . ?d rdfs:subPropertyOf ?c . }

These queries can be checked using available online SPARQL endpoints or offline tools such as TWINKLE.
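As a small offline illustration of what such a check does, the sketch below evaluates basic graph patterns (conjunctions of triple patterns, the core of the WHERE clauses above) against a handful of triples transcribed from the Phase IV axioms. It is plain pattern matching with joins on shared variables, with no RDFS inference, FILTER or OPTIONAL; the names `match`, `ask` and `TRIPLES` are hypothetical, and the sketch does not replace an endpoint or TWINKLE.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# A few triples transcribed from the Phase IV axioms (assumption: stored as-is).
TRIPLES: List[Triple] = [
    ("HERO:Teacher", "rdfs:subClassOf", "HERO:Researcher"),
    ("HERO:CourseCreditsNumber", "rdfs:domain", "HERO:Course"),
    ("HERO:TeacherRank", "rdfs:domain", "HERO:Teacher"),
    ("HERO:AssistantProfessor", "rdfs:subPropertyOf", "HERO:TeacherRank"),
    ("HERO:AssociateProfessor", "rdfs:subPropertyOf", "HERO:TeacherRank"),
    ("HERO:FullProfessor", "rdfs:subPropertyOf", "HERO:TeacherRank"),
    ("HERO:ProfessorEmeritus", "rdfs:subPropertyOf", "HERO:TeacherRank"),
]

def match(patterns: List[Triple], triples: List[Triple],
          binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield every variable binding under which all patterns occur in triples."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable must agree with any value already bound to it.
                if extended.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:  # every position matched: join with the remaining patterns
            yield from match(rest, triples, extended)

def ask(patterns: List[Triple], triples: List[Triple]) -> bool:
    """ASK semantics: true if at least one solution exists."""
    return next(match(patterns, triples), None) is not None

print(ask([("HERO:Teacher", "rdfs:subClassOf", "HERO:Researcher")], TRIPLES))
# True
for solution in match([("?a", "rdfs:subPropertyOf", "HERO:TeacherRank")], TRIPLES):
    print(solution["?a"])
```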


5. CONCLUSION AND FUTURE WORK

• Summary

Intended users: ontology developers, who are familiar with the ontology language, the ontology structure and the query language. This familiarity helps in entity location (Phase IV) and query construction (Phase V).

Intended uses: ontology validation. Since competency questions are the starting point for extracting the relevant terms that later become ontology entities, the SPARQL queries translated from CQs directly target ontology entities. This helps in entity extraction (Phase III).



• Limitations

Two phases of the proposed approach are manual and depend on the user’s background knowledge: entity extraction from question/answer pairs, and mapping between relevant question/answer terms and ontology entities.

Complex questions are handled only weakly.

• Future Work

The best way to tackle the manual phases is to integrate natural language processing tools such as GATE into the term extraction phase, together with automatic matching systems such as COMA 3.0, whose efficiency has already been proven.


SOME REFERENCES

1. Competency questions: M. Gruninger and M. S. Fox, “Methodology for the design and evaluation of ontologies”, IJCAI-95 Workshop on Basic Ontological Issues in Knowledge Sharing, Montreal, 1995, pp. 6.1–6.10.

2. Web QA approach: A. Ben Abacha and P. Zweigenbaum, “Medical Question Answering: Translating Medical Questions into SPARQL Queries”, Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium, Miami, Florida, USA, 2012, pp. 41–50.

3. SPARQL: E. Della Valle and S. Ceri, “Querying the Semantic Web: SPARQL”, in Handbook of Semantic Web Technologies, Springer, 2011, pp. 299–363.

THANK YOU FOR YOUR ATTENTION
