Towards ontology driven navigation of the lipid bibliosphere
Christopher J. O. Baker, Rajaraman Kanagasabai, Wee Tiong Ang, Anitha Veeramani, Hong-Sang Low, and Markus R. Wenk
International Conference on Bioinformatics 2007 (InCoB 2007), 27-31 August 2007


Page 1: Towards ontology driven navigation of the lipid bibliosphere

Towards ontology driven navigation of the lipid bibliosphere

Christopher J. O. Baker, Rajaraman Kanagasabai, Wee Tiong Ang, Anitha Veeramani, Hong-Sang Low, and Markus R. Wenk

International Conference on Bioinformatics 2007 (InCoB 2007)

27-31 August 2007

Page 2: Towards ontology driven navigation of the lipid bibliosphere

Motivation

Lipid research in the 21st century needs reliable and sensible integration of data from different sources.

Lipid nomenclature in the biomedical literature is highly heterogeneous.

Semantic data integration is necessary for lipid research, yet it is poorly achievable because there is no single unified, consistent, and universally accepted lipid classification system.

Page 3: Towards ontology driven navigation of the lipid bibliosphere

Objective

Develop a system that facilitates navigation of the lipid bibliosphere using a standardized lipid vocabulary with precise semantics.

Make use of the expressivity of a W3C-endorsed standard, the Web Ontology Language (OWL), to represent lipid nomenclature and hierarchy.

Page 4: Towards ontology driven navigation of the lipid bibliosphere

Ontologies

• Capture knowledge: the meaning of important vocabulary (classes, properties/relations, and instance data in a domain model).

• Provide a common terminology for a domain.

• Provide a basis for interoperability between information systems.

• Make the content in information sources explicit.

• Provide an index and query model to a repository of information.

Lipids

• Lipids have many properties and biologically related information that need to be systematically captured in a domain model.

• Lipids have no universally accepted nomenclature.

• Integration of lipid data is hampered by the lack of a unified classification system and the presence of multiple data formats.

• Lipid nomenclature is not always intuitive.

• Semantics of lipid terminology can be ambiguous, synonym-rich, and non-standard.

Page 5: Towards ontology driven navigation of the lipid bibliosphere

Lipid Ontology

Page 6: Towards ontology driven navigation of the lipid bibliosphere

Lipid Upper Ontology

Implemented in the OWL-DL language

Uses the LIPIDMAPS systematic lipid nomenclature

• 560 named classes
• 352 lipid subclasses
• 71 object properties
• 4 data properties
• Lipid instance: LIPIDMAPS systematic name
• Depth: 8 levels

Page 7: Towards ontology driven navigation of the lipid bibliosphere

Modeling lipid information

Multiple features of lipids are modeled in the Lipid_Specification concepts and are directly related to the lipid classification hierarchy found under the Lipid concept.

Page 8: Towards ontology driven navigation of the lipid bibliosphere

Linking lipids with other biological information

Lipid-Protein
• Modeled with the Protein concept
• Protein instance: protein name from Swiss-Prot
• The Lipid concept is linked to the Protein concept via the InteractsWith_Protein property

Lipid-Disease
• Modeled with the Disease concept
• Disease instance: disease name from the Disease Ontology
• The Lipid concept is linked to the Disease concept via the hasRole_In_Disease property

Page 9: Towards ontology driven navigation of the lipid bibliosphere

A lipid has many names

• Phosphatidylcholine is an important component of the mucus layer in the large intestine.

• The distribution of these pores was examined using 1,2-di-oleoyl-sn-glycero-3-phosphocholine (DOPC) phospholipid vesicles under a standard fluorescent microscope.

• Lecithin is usually used as a synonym for pure phosphatidylcholine, which is the major component isolated from egg yolk or soy beans.

2-[[(2R)-2,3-di(octadecanoyloxy)propoxy]-hydroxyphosphoryl]oxyethyl-trimethylazanium

Page 10: Towards ontology driven navigation of the lipid bibliosphere

Modelling Synonyms

4 types of name:

• LIPIDMAPS systematic name

• IUPAC systematic name

• Broad lipid name (non-systematic)

• Exact lipid name (non-systematic)

Instances of names are connected via the properties hasIUPAC_Synonym, hasLIPIDMAPS_Synonym, hasBroad_Lipid_Synonym, and hasExact_Lipid_Synonym.
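
The synonym properties can be pictured as a lookup from any name variant to a systematic name. The following is a minimal Python sketch, not the ontology's own implementation: a list of dictionaries stands in for the OWL individuals, the names come from the example sentences in this deck, and treating the DOPC full name as the LIPIDMAPS systematic name is an assumption made for illustration. The helpers `build_index` and `ground` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Synonym properties named on the slide (the LIPIDMAPS name is the anchor).
SYNONYM_PROPERTIES = (
    "hasIUPAC_Synonym",
    "hasExact_Lipid_Synonym",
    "hasBroad_Lipid_Synonym",
)

# Illustrative record only. Using this string as the LIPIDMAPS systematic
# name is an assumption; the real ontology holds the curated value.
LIPIDS = [
    {
        "lipidmaps_name": "1,2-di-oleoyl-sn-glycero-3-phosphocholine",
        "hasIUPAC_Synonym": [],
        "hasExact_Lipid_Synonym": ["DOPC"],
        "hasBroad_Lipid_Synonym": ["phosphatidylcholine", "lecithin"],
    }
]


def build_index(lipids: List[dict]) -> Dict[str, List[str]]:
    """Map every lower-cased name variant to its systematic name(s)."""
    index: Dict[str, List[str]] = defaultdict(list)
    for lipid in lipids:
        canonical = lipid["lipidmaps_name"]
        names = [canonical]
        for prop in SYNONYM_PROPERTIES:
            names.extend(lipid[prop])
        for name in names:
            key = name.lower()
            # A broad name may legitimately point at several lipids.
            if canonical not in index[key]:
                index[key].append(canonical)
    return dict(index)


def ground(name: str, index: Dict[str, List[str]]) -> List[str]:
    """Return the systematic names a synonym resolves to (empty if unknown)."""
    return index.get(name.strip().lower(), [])


INDEX = build_index(LIPIDS)
print(ground("Lecithin", INDEX))
print(ground("DOPC", INDEX))
```

Returning a list rather than a single name reflects that a broad name such as "phosphatidylcholine" covers a whole class of lipids, while an exact name should resolve to one.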

Page 11: Towards ontology driven navigation of the lipid bibliosphere

Literature Specification

Page 12: Towards ontology driven navigation of the lipid bibliosphere

Literature-driven, ontology-centric ...

Content Delivery Platform (automated)
• Document delivery from PubMed (PDF) / USPTO (HTML)
• Tools for conversion of documents to text-minable text

Text Mining (customized and automated)
• Regular expressions, named entities, relations, co-reference

Knowledge Engineering
• Ontology creation
• Domain modeling / customized / rapid prototype

Knowledge Navigation / Ontology Interrogation Tools
• Interactive visual query, natural language interfaces

Service platform for knowledge-intensive lipid navigation tasks

Page 13: Towards ontology driven navigation of the lipid bibliosphere

Lipid Ontology as a knowledge integration vehicle

OWL interrogation
• DL reasoning & inference
• nRQL (new RACER Query Language)
• Semantic query tools

Major knowledge sources
• Lipid Ontology
• NLP tagged text
• Database content

Knowledge navigation:

Page 14: Towards ontology driven navigation of the lipid bibliosphere

Ontology and Text Mining

1 Document Content

2 Sentence Extraction

3 Sentence Detection: lipid interaction protein

4 Entity Recognition: term identification / assign lipid class

5 Normalization: collapse lipid synonyms

6 Relation Extraction: Lipid-Protein or Lipid-Disease

7 Classification: identify ontology classes and specify relations for all sentences, proteins, lipid subclasses

8 Populate OWL ontology (JENA API)

Result: complete instantiated OWL-DL ontology

Term list DBs: lipid names, LIPIDMAPS, Lipid Bank, KEGG classifications, disease names, protein names

Stemmed interactions

Document and sentence metadata

"TLR4 binds to POPC", tagged as

<term category="protein">TLR4</term> binds to <term category="lipid">POPC</term>
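
The entity recognition and relation extraction steps can be sketched with a term-list tagger. This is an illustrative Python sketch, not the pipeline's actual code: `TERMS` is a hypothetical two-entry term list standing in for the term list databases, and pairing every lipid with every protein in a sentence is a deliberately simple co-occurrence stand-in for the relation extraction step.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical term list: surface form -> category.
TERMS: Dict[str, str] = {"TLR4": "protein", "POPC": "lipid"}

# Matches the <term category="..."> mark-up produced by tag_sentence.
TAG = re.compile(r'<term category="(\w+)">(.*?)</term>')


def tag_sentence(sentence: str, terms: Dict[str, str]) -> str:
    """Wrap every known term in a <term category="..."> element."""
    if not terms:
        return sentence
    # Longest names first, so a short name never splits a longer one.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b")

    def wrap(match: "re.Match[str]") -> str:
        name = match.group(1)
        return f'<term category="{terms[name]}">{name}</term>'

    return pattern.sub(wrap, sentence)


def lipid_protein_pairs(tagged: str) -> List[Tuple[str, str]]:
    """Pair each tagged lipid with each tagged protein in one sentence."""
    found = TAG.findall(tagged)
    lipids = [name for category, name in found if category == "lipid"]
    proteins = [name for category, name in found if category == "protein"]
    return [(lipid, protein) for lipid in lipids for protein in proteins]


tagged = tag_sentence("TLR4 binds to POPC", TERMS)
print(tagged)
print(lipid_protein_pairs(tagged))
```

Word-boundary matching is enough for short names like these; systematic lipid names containing commas, brackets, and hyphens would need a more careful matcher.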

Page 15: Towards ontology driven navigation of the lipid bibliosphere

Indexed Lipid Sentences

[Figure: indexed lipid sentences. Labels: Lipid Instance; Lipid Instance; Lipid Class.]

Page 16: Towards ontology driven navigation of the lipid bibliosphere

Knowledge integration pipeline

[Figure: pipeline from user query to end-user output. Labels: user input query “lipid interact* protein”; PubMed; 110 full-text papers; NLP tagging; 87 docs tagged with relevant named entities; 2 sec/doc; ontology instantiation; 123 lipids, 361 proteins, 920 lipid-protein interactions; “instantiated ontology”; knowledge navigation vehicle; output for end user; user.]

Specification
• Content acquisition pipeline:
  • Automated PubMed query
  • Text format converter
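
One way an automated PubMed query could be issued is through the NCBI E-utilities ESearch endpoint. The slides do not say which interface the authors used, so the endpoint choice, the helper `build_pubmed_query`, and the `retmax` value below are assumptions for illustration. The sketch only builds the request URL and performs no network call.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (assumed here; not named in the slides).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(term: str, retmax: int = 110) -> str:
    """Build an ESearch URL returning up to `retmax` PubMed IDs for `term`."""
    params = {
        "db": "pubmed",    # search the PubMed database
        "term": term,      # user query, URL-encoded by urlencode
        "retmax": retmax,  # cap on the number of IDs returned
        "retmode": "xml",
    }
    return f"{ESEARCH}?{urlencode(params)}"


url = build_pubmed_query("lipid interact* protein")
print(url)
```

Fetching the URL, parsing the returned ID list, and retrieving full text would follow as separate steps before the text format converter runs.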

Page 17: Towards ontology driven navigation of the lipid bibliosphere


Knowledge integration pipeline

Specification
• Text-mining & NLP:
  • BioText Suite for tokenization, part-of-speech tagging, named entity recognition, grounding, association mining

Page 18: Towards ontology driven navigation of the lipid bibliosphere


Knowledge integration pipeline

Specification
• Ontology instantiation pipeline:
  • Custom script based on the JENA API

Page 19: Towards ontology driven navigation of the lipid bibliosphere


Knowledge integration pipeline

Specification
• Knowledge navigation platform:
  • Knowledge navigator, or Knowlegator
  • RACER
  • nRQL

Page 20: Towards ontology driven navigation of the lipid bibliosphere

OWL-DL Query with nRQL

• nRQL queries are built on a Lisp syntax
• Elementary query atoms, combinable into highly expressive but syntactically complex A-box queries to derive assertions about instance data (individuals)

• Unary concept query (instance classification and retrieval)
  • Does this instance belong to this class?
  • What are the instances of class X?
  • To which classes does instance X belong?

• Binary role query
  • What instances are related by relation X?

• Binary role constraint query
• Unary has known successor (ancestor / descendant)
• Negation
• Intersect / conjunction
• Union / disjunction
• Combinations (and / union)
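
The query atoms listed above can be illustrated over a toy A-box. This Python sketch is not nRQL and involves no reasoner: it only mimics what unary concept queries, binary role queries, and a conjunction return. The individuals, the single subclass link, and the function names are assumptions chosen for illustration. RACER would derive class membership by description logic inference rather than by walking a hand-written parent table.

```python
from typing import Dict, List, Set, Tuple

# Toy T-box: child class -> parent class (illustrative only).
SUBCLASS_OF: Dict[str, str] = {"Glycerophospholipid": "Lipid"}

# Toy A-box: concept assertions (individual -> asserted class)
# and role assertions (subject, role, object).
CONCEPT_ASSERTIONS: Dict[str, str] = {
    "POPC": "Glycerophospholipid",
    "TLR4": "Protein",
}
ROLE_ASSERTIONS: Set[Tuple[str, str, str]] = {
    ("POPC", "InteractsWith_Protein", "TLR4"),
}


def classes_of(individual: str) -> List[str]:
    """Unary concept query: to which classes does this instance belong?"""
    result: List[str] = []
    cls = CONCEPT_ASSERTIONS.get(individual)
    while cls is not None:
        result.append(cls)
        cls = SUBCLASS_OF.get(cls)  # climb to the parent class, if any
    return result


def instances_of(cls: str) -> Set[str]:
    """Unary concept query: what are the instances of this class?"""
    return {ind for ind in CONCEPT_ASSERTIONS if cls in classes_of(ind)}


def related_by(role: str) -> Set[Tuple[str, str]]:
    """Binary role query: which instance pairs are related by this role?"""
    return {(s, o) for s, r, o in ROLE_ASSERTIONS if r == role}


def has_successor(role: str) -> Set[str]:
    """Instances with at least one known successor for this role."""
    return {s for s, _ in related_by(role)}


# Conjunction: lipids that interact with some protein.
interacting_lipids = instances_of("Lipid") & has_successor("InteractsWith_Protein")
print(classes_of("POPC"))
print(interacting_lipids)
```

Set intersection and union map directly onto the conjunction and disjunction atoms; negation would be a set difference against the known individuals.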

Mark-up language    Description               Query language
XML                 Structured document       XPath, XQuery
RDF                 Data model for objects    RDQL, RQL, Versa, Squish
OWL                 Data model + relations    nRQL, OWL-QL, JENA

Haarslev V., Moeller R., Wessel M. Querying the Semantic Web with Racer + nRQL. In: Sean Bechhofer, Volker Haarslev, Carsten Lutz, Ralf Moeller (Eds.), CEUR Workshop Proceedings of the KI-2004 Workshop on Applications of Description Logics (ADL 04), Ulm, Germany, Sep 24 2004.

The New Racer Query Language: www.cs.concordia.ca/~haarslev/racer/racer-queries.pdf

Page 21: Towards ontology driven navigation of the lipid bibliosphere

Knowledge Navigation Tool

[Figure: screenshot of the knowledge navigation tool. Labels: Query Composition Panel; Ontology Content; Results Panel; Query Syntax; Query Engine Dialogue; Concept; Properties; Overview.]

Page 22: Towards ontology driven navigation of the lipid bibliosphere

Lipid Ontology as a Query Model

Page 23: Towards ontology driven navigation of the lipid bibliosphere

Table                    Keys                                Other columns
Lipid                    PK Lipid_ID                         Lipid_Name ...
Protein                  PK Protein_ID                       Protein_Name ...
Sentence                 PK Sentence_ID                      Sentence_Text ...
Disease                  PK Disease_ID                       Disease_Name ...
Document                 PK Document_ID                      Title, Authors, Journal ...
interactsWith_Protein    FK1 Lipid_ID, FK2 Protein_ID
occursIn_Sentence        FK1 Lipid_ID, FK2 Sentence_ID
relatedTo_Disease        FK1 Lipid_ID, FK2 Disease_ID
occursIn_Document        FK1 Sentence_ID, FK2 Document_ID

Query: Find documents containing sentences where lipids interact with proteins and the lipids are related to a disease.
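
Read as a relational schema, the query above becomes a chain of joins. The sketch below uses Python's built-in sqlite3 module with the table and key names from the slide; the rows, the placeholder disease name, and the document titles are invented for illustration, and the real system poses description logic queries to the OWL-DL ontology rather than SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Lipid    (Lipid_ID INTEGER PRIMARY KEY, Lipid_Name TEXT);
    CREATE TABLE Protein  (Protein_ID INTEGER PRIMARY KEY, Protein_Name TEXT);
    CREATE TABLE Disease  (Disease_ID INTEGER PRIMARY KEY, Disease_Name TEXT);
    CREATE TABLE Sentence (Sentence_ID INTEGER PRIMARY KEY, Sentence_Text TEXT);
    CREATE TABLE Document (Document_ID INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE interactsWith_Protein (Lipid_ID INTEGER, Protein_ID INTEGER);
    CREATE TABLE occursIn_Sentence     (Lipid_ID INTEGER, Sentence_ID INTEGER);
    CREATE TABLE relatedTo_Disease     (Lipid_ID INTEGER, Disease_ID INTEGER);
    CREATE TABLE occursIn_Document     (Sentence_ID INTEGER, Document_ID INTEGER);
    """
)

# Invented sample rows: only lipid 1 has both a protein and a disease link.
conn.executemany("INSERT INTO Lipid VALUES (?, ?)", [(1, "POPC"), (2, "DOPC")])
conn.executemany("INSERT INTO Protein VALUES (?, ?)", [(1, "TLR4")])
conn.executemany("INSERT INTO Disease VALUES (?, ?)", [(1, "placeholder disease")])
conn.executemany(
    "INSERT INTO Sentence VALUES (?, ?)",
    [(1, "TLR4 binds to POPC"), (2, "A sentence that mentions DOPC only")],
)
conn.executemany(
    "INSERT INTO Document VALUES (?, ?)", [(1, "Document A"), (2, "Document B")]
)
conn.executemany("INSERT INTO interactsWith_Protein VALUES (?, ?)", [(1, 1)])
conn.executemany("INSERT INTO relatedTo_Disease VALUES (?, ?)", [(1, 1)])
conn.executemany("INSERT INTO occursIn_Sentence VALUES (?, ?)", [(1, 1), (2, 2)])
conn.executemany("INSERT INTO occursIn_Document VALUES (?, ?)", [(1, 1), (2, 2)])

# Documents -> sentences -> lipids that have a protein link and a disease link.
rows = conn.execute(
    """
    SELECT DISTINCT d.Document_ID, d.Title
    FROM Document d
    JOIN occursIn_Document     od ON od.Document_ID = d.Document_ID
    JOIN occursIn_Sentence     os ON os.Sentence_ID = od.Sentence_ID
    JOIN interactsWith_Protein ip ON ip.Lipid_ID    = os.Lipid_ID
    JOIN relatedTo_Disease     rd ON rd.Lipid_ID    = os.Lipid_ID
    ORDER BY d.Document_ID
    """
).fetchall()
print(rows)
```

In this schema the protein and disease links hang off the lipid, not the sentence, so the query returns documents whose sentences mention a lipid that has both links somewhere in the data.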

Page 24: Towards ontology driven navigation of the lipid bibliosphere
Page 25: Towards ontology driven navigation of the lipid bibliosphere
Page 26: Towards ontology driven navigation of the lipid bibliosphere

Summary

We build a lipid ontology in the Web Ontology Language (OWL) to represent the LIPIDMAPS classification hierarchy.

The ontology model resolves nomenclature inconsistencies by grounding lipid synonyms to individual lipid names.

We report a document delivery system that, in conjunction with a lipid-specific text mining platform, instantiates lipid sentences into the lipid ontology.

We facilitate navigation of lipid literature using a drag 'n' drop visual query composer which poses description logic queries to the OWL-DL ontology.

Lipid-disease and lipid-protein statements in the lipid literature can be readily queried and made easily available to lipid researchers.

Page 27: Towards ontology driven navigation of the lipid bibliosphere

Acknowledgement

A*STAR (Agency for Science, Technology and Research), Singapore Government.

National University of Singapore, Graduate Student Travel Grant.