ANSWERING CONTROLLED NATURAL LANGUAGE QUERIES USING ANSWER SET PROGRAMMING Syeed Ibn Faiz

Outline

Overview
Controlled Natural Language
Answer Set Programming
Transforming Queries into Programs
Producing Answers
Related Works
Conclusion

Overview

People need to write queries to extract information from biomedical ontologies.

Problem: Formal query languages are not suitable for many of them.

Solution: Natural language query?

Further problem: Ambiguities, complexities.

Solution: Controlled natural language!

How will it work?

Controlled natural language is unambiguous.

Therefore, a query can be easily and unambiguously translated into a logical form.

Then one can do reasoning with it!

Which drug cures Asthma?

which_drug(A) ← drug_cure_disease(A,asthma).

Answer: Formoterol
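The flow from question to rule to answer can be illustrated with a minimal Python sketch. It assumes a single fact, drug_cure_disease(formoterol, asthma), inferred only from the answer shown on this slide; the function name and the fact set are illustrative and not part of the described system.

```python
# Assumed fact, inferred from the slide's answer "Formoterol".
# Each pair stands for drug_cure_disease(Drug, Disease).
FACTS = {("formoterol", "asthma")}


def which_drug(disease, facts=FACTS):
    """Evaluate which_drug(A) <- drug_cure_disease(A, disease) over the facts."""
    return sorted(drug for drug, cured in facts if cured == disease)


if __name__ == "__main__":
    print(which_drug("asthma"))
```

A real answer set solver does far more than this lookup, but the sketch shows why an unambiguous logical form makes the question directly answerable.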

Controlled Natural Language

A subset of a natural language.

It has a restricted grammar and vocabulary.

It overcomes the ambiguity and complexity of natural language.

Example: Attempto Controlled English (University of Zurich)

Attempto Controlled English (ACE)

A subset of standard English with a restricted syntax and restricted semantics, described by a small set of:

Construction rules: grammar
Interpretation rules: remove ambiguities

A customer who enters a card manually types a code.

APE (the Attempto Parsing Engine) translates ACE text into a DRS.

Discourse Representation Structure (DRS)

The DRS derived from ACE text is returned as: drs(Domain, Conditions)

It uses a fixed number of predefined predicates: object, predicate, property, relation, modifier_pp, modifier_adv, query.

An example:

A man is mortal.

DRS Contd.

Query: What are the symptoms of the diseases that are related to ADRB1 or that are treated by Epinephrine?

Answer Set Programming (ASP)

ASP is a form of declarative programming oriented towards difficult search problems.

In ASP, a problem is posed as a logic program, and its solutions are given by the program's models, called answer sets.

It allows reasoning with incomplete information to be automated.

Answer set solvers: Smodels, Clasp, DLV, etc.

An Example

ide_drive :- hard_drive, not scsi_drive.
scsi_drive :- hard_drive, not ide_drive.
scsi_controller :- scsi_drive.
hard_drive.

M1 = {hard_drive, ide_drive}
M2 = {hard_drive, scsi_drive, scsi_controller}
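To see why this program has exactly these two answer sets, the following Python sketch enumerates every candidate set of atoms and keeps those that equal the least model of their own reduct (the standard Gelfond-Lifschitz definition). It is a brute-force illustration, not how solvers such as Clasp work, and it writes the disk atom as `hard_drive` throughout. The rule encoding as `(head, positive_body, negative_body)` tuples is an assumption made for this sketch.

```python
from itertools import combinations

# Each rule is (head, positive_body, negative_body).
RULES = [
    ("ide_drive", ["hard_drive"], ["scsi_drive"]),
    ("scsi_drive", ["hard_drive"], ["ide_drive"]),
    ("scsi_controller", ["scsi_drive"], []),
    ("hard_drive", [], []),
]


def least_model(positive_rules):
    """Least model of a negation-free program, computed by fixpoint iteration."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if head not in model and all(atom in model for atom in pos):
                model.add(head)
                changed = True
    return model


def answer_sets(rules):
    """Return every set of atoms that is a stable model of the program."""
    atoms = sorted({r[0] for r in rules} | {a for r in rules for a in r[1] + r[2]})
    found = []
    for size in range(len(atoms) + 1):
        for candidate in combinations(atoms, size):
            cand = set(candidate)
            # Reduct: drop rules whose "not" literals are contradicted by the
            # candidate, then strip the remaining "not" literals.
            reduct = [(h, pos) for h, pos, neg in rules
                      if not any(a in cand for a in neg)]
            if least_model(reduct) == cand:
                found.append(cand)
    return found


if __name__ == "__main__":
    for model in answer_sets(RULES):
        print(sorted(model))
```

The two rules with `not` block each other, so choosing one drive type removes the rule for the other from the reduct, which is what yields two separate models.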

Converting Query to Program

A three-step process:

Obtaining the DRS produced by APE
Parsing the DRS
Generating the answer set program

Parsing DRS

Grammar for DRS:

DRS → drs( Domain , Conditions )
Domain → [] | [ Referent {,Referent}* ]
Conditions → [ Condition {,Condition}* ]
Condition → Predicate | ComplexStructure
Predicate → Object | Property | Relation | Predicate | Modifier_pp | Modifier_adv | Query
ComplexStructure → Question | Negation | Disjunction .....

A Recursive Descent Parser

DRS → Parser → Internal Structure
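A recursive descent parser for the simplified DRS notation used on these slides might look like the Python sketch below, with one method per grammar production. It is not the author's parser: the class and function names are hypothetical, the output shape (a dict of referents and tuples) is an assumption, and complex structures such as questions, negation, and disjunction are not handled. Real APE output may carry additional annotations that this sketch does not parse.

```python
import re

# Identifiers, integers, and the punctuation used by Prolog-style terms.
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|\d+|[()\[\],])")
PUNCTUATION = {"(", ")", "[", "]", ","}


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class DrsParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise ValueError(f"expected {token!r}, got {self.peek()!r}")
        self.pos += 1

    def parse_drs(self):
        # DRS -> drs( Domain , Conditions )
        self.expect("drs")
        self.expect("(")
        domain = self.parse_list()
        self.expect(",")
        conditions = self.parse_list()
        self.expect(")")
        return {"domain": domain, "conditions": conditions}

    def parse_list(self):
        # [] | [ Term {, Term}* ]
        self.expect("[")
        items = []
        if self.peek() != "]":
            items.append(self.parse_term())
            while self.peek() == ",":
                self.pos += 1
                items.append(self.parse_term())
        self.expect("]")
        return items

    def parse_term(self):
        # Term -> List | name | name( Term {, Term}* )
        if self.peek() == "[":
            return self.parse_list()
        name = self.peek()
        if name is None or name in PUNCTUATION:
            raise ValueError(f"expected a name, got {name!r}")
        self.pos += 1
        if self.peek() != "(":
            return name
        self.pos += 1
        args = [self.parse_term()]
        while self.peek() == ",":
            self.pos += 1
            args.append(self.parse_term())
        self.expect(")")
        return (name, *args)


def parse_drs(text):
    parser = DrsParser(tokenize(text))
    result = parser.parse_drs()
    if parser.peek() is not None:
        raise ValueError(f"trailing input: {parser.peek()!r}")
    return result
```

For example, `parse_drs("drs([A],[query(A,which),object(A,drug,countable,na,eq,1)])")` returns the referent list and one tuple per condition, which later steps can traverse.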

Generating Answer Set Programs

A program consists of rules. A rule has two parts: Head :- Body

Generating rules:

Constructing the head atom
Constructing the bodies

Constructing Head Atom

Which Query

Which drug cures Asthma?

query(A,which)
object(A,drug,countable,na,eq,1)

which_drug(A)

What Query

What is the drug that cures Asthma?

query(A,what)
predicate(B,be,A,C)
object(C,drug,countable,na,eq,1)

what_be_drug(C)
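The two cases above suggest a simple procedure for building the head atom, sketched here in Python. The function name and the tuple representation of DRS conditions are assumptions for illustration; the slides show only the inputs and outputs, not the author's actual code.

```python
def head_atom(conditions):
    """Build the rule head from the query condition and the object it refers to.

    Conditions are tuples such as ("query", "A", "which") or
    ("object", "A", "drug", "countable", "na", "eq", "1").
    """
    # Map each object referent to its noun, e.g. {"A": "drug"}.
    objects = {c[1]: c[2] for c in conditions if c[0] == "object"}
    query = next((c for c in conditions if c[0] == "query"), None)
    if query is None:
        raise ValueError("no query condition found")
    _, referent, word = query

    # "Which drug ...": the queried referent is itself an object.
    if referent in objects:
        return f"{word}_{objects[referent]}({referent})"

    # "What is the drug ...": follow the predicate whose subject is the
    # queried referent to the object on its other side.
    for c in conditions:
        if c[0] == "predicate" and len(c) == 5:
            _, _, verb, subject, obj = c
            if subject == referent and obj in objects:
                return f"{word}_{verb}_{objects[obj]}({obj})"
    raise ValueError("no object linked to the query referent")


if __name__ == "__main__":
    which_query = [("query", "A", "which"),
                   ("object", "A", "drug", "countable", "na", "eq", "1")]
    print(head_atom(which_query))
```

The head is lowercased throughout because an ASP predicate name must start with a lowercase letter.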

Generating Bodies

What are the symptoms of the diseases
THAT are related to ADRB1
OR that are treated by Epinephrine?

Generating Bodies Contd.

Depth-first traversal of the internal DRS representation:

Generate a new body for each leaf.
Add an atom to the body for:
Each predicate-predicate: predicate(D,cure,C,named(Asthma)) gives drug_cure_disease(C, asthma)
Each relation-predicate: relation(A,of,B) gives symptom_of_disease(A, B)
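The body generation steps above can be sketched in Python as follows. The `Node` class is a hypothetical stand-in for the internal DRS representation, which the slides do not show, and the `types` lookup (including the knowledge that Asthma is a disease) is an assumption; the slides do not say how the system obtains entity types.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical internal node: atoms added here plus alternative branches."""
    atoms: list
    children: list = field(default_factory=list)


def bodies(node, prefix=()):
    """Depth-first traversal that yields one rule body per leaf."""
    path = prefix + tuple(node.atoms)
    if not node.children:
        yield list(path)
        return
    for child in node.children:
        yield from bodies(child, path)


def rules(head, root):
    """Pair the head atom with every generated body."""
    return [f"{head} :- {', '.join(body)}." for body in bodies(root)]


def atom_for(condition, types):
    """Name a body atom after the types of its two arguments."""
    def arg(term):
        # named(X) becomes a lowercase constant; a referent stays a variable.
        if isinstance(term, tuple) and term[0] == "named":
            return term[1].lower(), types[term[1]]
        return term, types[term]

    if condition[0] == "relation":        # relation(A, of, B)
        _, left, word, right = condition
    elif condition[0] == "predicate":     # predicate(D, cure, C, named(Asthma))
        _, _, word, left, right = condition
    else:
        raise ValueError(f"unsupported condition: {condition[0]}")
    (left_arg, left_type), (right_arg, right_type) = arg(left), arg(right)
    return f"{left_type}_{word}_{right_type}({left_arg},{right_arg})"
```

With a root node holding `symptom_of_disease(C,D)` and one child per alternative of the OR, `rules("what_be_symptom(C)", root)` produces one rule per branch, so a disjunction in the question becomes several rules with the same head.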

Examples

What are the symptoms of the diseases that are related to ADRB1 or that are treated by Epinephrine?

what_be_symptom(C) :- symptom_of_disease(C,D), disease_be_related_to_gene(D,adrb1).
what_be_symptom(C) :- symptom_of_disease(C,D), drug_treat_disease(epinephrine,D).

Which gene is related to a disease that causes Insomnia?

which_gene(A) :- disease_cause_symptom(B,insomnia), gene_be_related_to_disease(A,B).

Producing Answers

We need biomedical knowledge.
The knowledge must be encoded.
We need an answer set solver: Clasp.
We need an interface.

Biomedical Knowledge

Concepts: Gene, Drug, Disease, Symptom

PharmGKB database: Relationships between gene, drug and disease.

MedicineNet.com: Disease and symptom database.

Encoding Knowledge

Facts:
disease_symptom(asthma,cough).
gene_disease(adra1b,asthma).
gene_drug(adra1b,norepinephrine).
drug_disease(norepinephrine,hypertension).
…..

Rules:
drug_symptom(X,Y) :- drug_disease(X,Z), disease_symptom(Z,Y).
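The drug_symptom rule is a join of two fact tables on the shared disease variable Z. The Python sketch below shows that join; the function name is illustrative, and `some_drug` is a placeholder introduced only to make the rule fire, not a fact from the knowledge base.

```python
def derive_drug_symptom(drug_disease, disease_symptom):
    """drug_symptom(X,Y) :- drug_disease(X,Z), disease_symptom(Z,Y)."""
    return {
        (drug, symptom)
        for drug, disease in drug_disease
        for other_disease, symptom in disease_symptom
        if disease == other_disease
    }


if __name__ == "__main__":
    # Facts shown on the slide.
    disease_symptom = {("asthma", "cough")}
    drug_disease = {("norepinephrine", "hypertension")}
    # No symptom is listed for hypertension, so nothing is derived yet.
    print(derive_drug_symptom(drug_disease, disease_symptom))

    # Placeholder fact (hypothetical) linking a drug to asthma.
    drug_disease.add(("some_drug", "asthma"))
    print(derive_drug_symptom(drug_disease, disease_symptom))
```

With only the slide's facts the result is empty, which shows that the derived relation is only as complete as the encoded facts.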

System Architecture

[Figure: User Interface → Query Pre-processor → APE → DRS Parser → Translator → Post Processing → Clasp Interface → Clasp → User Interface]

Post Processing

Before                                  After
disease_be_related_to_gene(D,adrb1)     gene_disease(adrb1,D)
drug_treat_disease(epinephrine,D)       drug_disease(epinephrine,D)
disease_cause_symptom(B,insomnia)       disease_symptom(B,insomnia)
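The post-processing step maps each generated body atom onto a predicate that exists in the knowledge base, reordering arguments where needed. A minimal Python sketch follows, using a rewrite table inferred only from the three pairs shown here. The table format and function name are assumptions, and the actual system may derive the mapping differently.

```python
import re

# Generated predicate -> (knowledge base predicate, swap the two arguments?)
# Inferred from the three before/after pairs on this slide.
REWRITES = {
    "disease_be_related_to_gene": ("gene_disease", True),
    "drug_treat_disease": ("drug_disease", False),
    "disease_cause_symptom": ("disease_symptom", False),
}

# A binary atom such as name(arg1,arg2) with simple arguments.
ATOM = re.compile(r"^(\w+)\(([^,()]+),([^,()]+)\)$")


def post_process(atom):
    """Rewrite a generated body atom into its knowledge base form."""
    match = ATOM.match(atom.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a binary atom: {atom!r}")
    name, first, second = match.groups()
    if name not in REWRITES:
        return atom  # Leave atoms without a known mapping untouched.
    target, swap = REWRITES[name]
    if swap:
        first, second = second, first
    return f"{target}({first},{second})"


if __name__ == "__main__":
    print(post_process("disease_be_related_to_gene(D,adrb1)"))
```

The argument swap in the first entry is needed because the question names the disease first, while the knowledge base stores the gene first.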

Related Works

A Preliminary Report on Answering Complex Queries Related to Drug Discovery Using Answer Set Programming, by Olivier Bodenreider et al.

Transforming Controlled Natural Language Biomedical Queries into Answer Set Programs by Esra Erdem et al.

Conclusion

Limitations: Data, Language

Future Directions

Questions?

Thank You!