Towards ontology driven navigation of the lipid bibliosphere
Christopher J. O. Baker, Rajaraman Kanagasabai, Wee Tiong Ang, Anitha Veeramani, Hong-Sang Low, and Markus R. Wenk
International Conference on Bioinformatics 2007 (InCoB 2007)
27-31 August 2007
Motivation
Lipid research in the 21st century is in need of reliable and sensible integration of data from different sources.
Lipid nomenclature in biomedical literature is highly heterogeneous.
Semantic data integration is necessary for lipid research, yet it is poorly achievable because there is no single unified, consistent, and universally accepted lipid classification system.
Objective
Develop a system that facilitates navigation of the lipid bibliosphere using a standardized lipid vocabulary with precise semantics.
Make use of the expressivity of a W3C-endorsed standard, the Web Ontology Language (OWL), for representing lipid nomenclature and hierarchy.
Lipids and Ontologies
Ontologies:
• Capture knowledge: the meaning of important vocabulary (classes, properties/relations, and instance data in a domain model).
• Provide a common terminology for a domain.
• Provide a basis for interoperability between information systems.
• Make the content in information sources explicit.
• Provide an index and query model to a repository of information.
Lipids:
• Lipids have many properties and biologically related information that need to be systematically captured in a domain model.
• Lipids have no universally accepted nomenclature.
• Integration of lipid data is hampered by the lack of a unified classification system and the presence of multiple data formats.
• Lipid nomenclature is not always intuitive.
• Semantics of lipid terminology can be ambiguous, synonym-rich, and non-standard.
Lipid Ontology
Lipid Upper Ontology
Implemented in OWL-DL language
Uses LIPIDMAPS systematic lipid nomenclature
• 560 named classes
• 352 lipid subclasses
• 71 object properties
• 4 data properties
• Lipid instance: LIPIDMAPS systematic name
• Depth: 8 levels
Modeling lipid information
Multiple features of lipids are modeled in the Lipid_Specification concepts and are directly related to the lipid classification hierarchy found under the Lipid concept
Linking lipids with other biological information
Lipid-Protein
• Modeled with the Protein concept
• Protein instance: protein name from Swiss-Prot
• The Lipid concept is linked to the Protein concept via the InteractsWith_Protein property
Lipid-Disease
• Modeled with the Disease concept
• Disease instance: disease name from the Disease Ontology
• The Lipid concept is linked to the Disease concept via the hasRole_In_Disease property
A LIPID has many names
• Phosphatidylcholine is an important component of the mucus layer in the large intestine.
• The distribution of these pores was examined using 1,2-di-oleoyl-sn-glycero-3-phosphocholine (DOPC) phospholipid vesicles under a standard fluorescent microscope.
• Lecithin is usually used as a synonym for pure phosphatidylcholine, which is the major component isolated from egg yolk or soy beans.
• 2-[[(2R)-2,3-di(octadecanoyloxy)propoxy]-hydroxyphosphoryl]oxyethyl-trimethylazanium
Modelling Synonyms
4 types of name:
• LIPIDMAPS systematic name
• IUPAC systematic name
• Broad lipid name (non-systematic)
• Exact lipid name (non-systematic)
Instances of names are connected via the properties:
• hasIUPAC_Synonym
• hasLIPIDMAPS_Synonym
• hasBroad_Lipid_Synonym
• hasExact_Lipid_Synonym
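The synonym modelling above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lookup table, the `ground` helper, and the placeholder individual `LIPID_A` are hypothetical, and the assignment of each name to a "broad" or "exact" synonym property is an assumption made only for illustration. Only the property names come from the slide.

```python
from typing import Optional, Tuple

# Hypothetical records: (name as written in text, synonym property, lipid individual).
# "LIPID_A" is a placeholder identifier, not a real LIPIDMAPS systematic name,
# and the broad/exact assignments below are assumptions for illustration.
SYNONYMS = [
    ("DOPC", "hasExact_Lipid_Synonym", "LIPID_A"),
    ("1,2-di-oleoyl-sn-glycero-3-phosphocholine", "hasExact_Lipid_Synonym", "LIPID_A"),
    ("phosphatidylcholine", "hasBroad_Lipid_Synonym", "LIPID_A"),
    ("lecithin", "hasBroad_Lipid_Synonym", "LIPID_A"),
]


def normalize(name: str) -> str:
    """Case-fold and trim a name so that surface variants share one key."""
    return name.strip().lower()


# Index from normalized name to (property, lipid individual).
INDEX = {normalize(name): (prop, lipid) for name, prop, lipid in SYNONYMS}


def ground(name: str) -> Optional[Tuple[str, str]]:
    """Return (synonym property, lipid individual) for a name, or None if unknown."""
    return INDEX.get(normalize(name))


print(ground("Lecithin"))
print(ground("cholesterol"))
```

Names that differ only in case or surrounding whitespace resolve to the same individual, and unknown names return `None` rather than being guessed.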
Literature Specification
Literature-driven, ontology-centric ...
Content Delivery Platform - Automated document delivery from Pubmed-PDF / USPTO-HTML; tools for conversion of docs to text-minable text
Text Mining - Customized and automated: regular expressions, named entities, relations, co-reference
Knowledge Engineering - Ontology creation: domain modeling / customized / rapid prototype
Knowledge Navigation / Ontology Interrogation Tools - Interactive visual query, natural language interfaces
Service platform for knowledge-intensive lipid navigation tasks
Lipid Ontology as a knowledge integration vehicle
Major Knowledge Sources
• Lipid Ontology
• NLP tagged text
• Database content
Knowledge navigation: OWL interrogation
• DL reasoning & inference
• nRQL (new RACER Query Language)
• Semantic query tools
Ontology and Text Mining
1 Document Content
2 Sentence Extraction
3 Sentence Detection: lipid interaction protein
4 Entity Recognition: term identification / assign lipid class
5 Normalization: collapse lipid synonyms
6 Relation Extraction: Lipid-Protein or Lipid-Disease
7 Classification: Identify ontology classes and specify relations for all sentences, proteins, lipid subclasses.
8 Populate OWL ontology (JENA API), giving a complete instantiated OWL-DL ontology
Term List DBs: Lipid names, LIPIDMAPS, Lipid Bank, KEGG classifications, Disease names, Protein names, Stemmed Interactions
Document and sentence metadata
Example: "TLR4 binds to POPC", tagged as "<term category="protein">TLR4</term> binds to <term category="lipid">POPC</term>"
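The entity recognition and relation extraction steps can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the BioText-based pipeline used in the work: the term list, the stem list, and the helper names `tag_terms` and `lipid_protein_pairs` are hypothetical, and real term lists would come from the databases named on the slide.

```python
import re

# Toy term list (assumption): surface name -> category.
TERMS = {"TLR4": "protein", "POPC": "lipid"}

# Hypothetical stemmed interaction terms.
STEMS = ("bind", "interact")


def build_pattern(terms):
    """Compile one regex that matches any known term as a whole word."""
    # Longest names first, so a longer name wins over a name it contains.
    alternatives = sorted(terms, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(re.escape(t) for t in alternatives) + r")\b")


def tag_terms(sentence, terms):
    """Wrap each recognized term in a <term category="..."> element."""
    def wrap(match):
        name = match.group(0)
        return f'<term category="{terms[name]}">{name}</term>'
    return build_pattern(terms).sub(wrap, sentence)


def lipid_protein_pairs(sentence, terms, stems=STEMS):
    """Return (lipid, protein) pairs co-occurring with an interaction stem."""
    if not any(stem in sentence.lower() for stem in stems):
        return []
    names = build_pattern(terms).findall(sentence)
    lipids = [n for n in names if terms[n] == "lipid"]
    proteins = [n for n in names if terms[n] == "protein"]
    return [(lipid, protein) for lipid in lipids for protein in proteins]


sentence = "TLR4 binds to POPC"
print(tag_terms(sentence, TERMS))
print(lipid_protein_pairs(sentence, TERMS))
```

Co-occurrence with an interaction stem is a deliberately crude stand-in for relation extraction; it would over-generate on sentences that mention several lipids and proteins.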
Indexed Lipid Sentences
[Figure labels: Lipid Instance, Lipid Class]
Knowledge integration pipeline
[Pipeline figure]
• User input query “lipid interact* protein”
• Pubmed: 110 full text papers
• NLP tagging: 87 docs tagged with relevant named entities
• Ontology instantiation
• “Instantiated ontology”: 123 lipids, 361 proteins, 920 lipid-protein interactions
• Knowledge Navigation vehicle
• Output for end user
• Processing rate shown: 2 sec/doc
Specification - Content Acquisition pipeline:
• Automated Pubmed query
• Text format converter
Specification - Text-mining & NLP:
• BioText Suite for tokenization, part of speech tagging, named entity recognition, grounding, association mining
Specification - Ontology Instantiation pipeline:
• Custom script based on JENA API
Specification - Knowledge Navigation platform:
• Knowledge navigator or Knowlegator
• RACER
• nRQL
OWL-DL Query with nRQL
• nRQL queries are built on a Lisp syntax
• Elementary query atoms, combinable into highly expressive but syntactically complex A-box queries to derive assertions about instance data (individuals).
• Unary concept query (instance classification and retrieval)
    • Does this instance belong to this class?
    • What are the instances of class X?
    • To which classes does instance X belong?
• Binary role query
    • What instances are related by relation X?
• Binary role constraint query
• Unary has known successor (Ancestor / Descendant)
• Negation
• Intersect / Conjunction
• Union / Disjunction
• Combinations (And / Union)
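The query types listed above can be mimicked over a toy in-memory A-box to show what each one asks. This Python sketch does not use RACER or nRQL and performs no description logic reasoning beyond following subclass links; the class name `Glycerophospholipid`, the individuals, and all helper names are assumptions chosen for illustration.

```python
# Toy T-box (assumption): subclass -> direct superclass.
SUBCLASS_OF = {"Glycerophospholipid": "Lipid"}

# Toy A-box (assumption): individual -> asserted class, plus role assertions.
TYPES = {"POPC": "Glycerophospholipid", "TLR4": "Protein"}
ROLES = {("POPC", "InteractsWith_Protein", "TLR4")}


def classes_of(individual):
    """Unary concept query: to which classes does this instance belong?"""
    result = []
    cls = TYPES.get(individual)
    while cls is not None:
        result.append(cls)
        cls = SUBCLASS_OF.get(cls)  # walk up the hierarchy
    return result


def instances_of(cls):
    """Unary concept query: what are the instances of this class?"""
    return sorted(i for i in TYPES if cls in classes_of(i))


def related_by(role):
    """Binary role query: which instance pairs are related by this role?"""
    return sorted((s, o) for s, r, o in ROLES if r == role)


def lipids_interacting_with_proteins():
    """Conjunction: subject is a Lipid AND object is a Protein AND role holds."""
    return sorted(
        s
        for s, o in related_by("InteractsWith_Protein")
        if "Lipid" in classes_of(s) and "Protein" in classes_of(o)
    )


print(classes_of("POPC"))
print(instances_of("Lipid"))
print(related_by("InteractsWith_Protein"))
print(lipids_interacting_with_proteins())
```

A reasoner would additionally infer class membership from property restrictions and handle negation; the sketch only covers asserted types and a single-parent hierarchy.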
Mark-up Language | Description | Query Language
XML | Structured Document | XPath, XQuery
RDF | Data Model for objects | RDQL, RQL, Versa, Squish
OWL | Data Model + Relations | nRQL, OWL-QL, JENA
Haarslev V., Moeller R., Wessel M. Querying the Semantic Web with Racer + nRQL. In: Sean Bechhofer, Volker Haarslev, Carsten Lutz, Ralf Moeller (Eds.), CEUR Workshop Proceedings of the KI-2004 Workshop on Applications of Description Logics (ADL 04), Ulm, Germany, Sep 24, 2004.
The New Racer Query Language: www.cs.concordia.ca/~haarslev/racer/racer-queries.pdf
Knowledge Navigation Tool
[Screenshot labels]
Query Composition Panel
Ontology Content
Results Panel
Query Syntax
Query Engine Dialogue
Concept
Properties
Overview
Lipid Ontology as a Query Model
Entities:
• Lipid (PK Lipid_ID; Lipid_Name, ...)
• Protein (PK Protein_ID; Protein_Name, ...)
• Sentence (PK Sentence_ID; Sentence_Text, ...)
• Disease (PK Disease_ID; Disease_Name, ...)
• Document (PK Document_ID; Title, Authors, Journal, ...)
Relations:
• interactsWith_Protein (FK1 Lipid_ID, FK2 Protein_ID)
• occursIn_Sentence (FK1 Lipid_ID, FK2 Sentence_ID)
• relatedTo_Disease (FK1 Lipid_ID, FK2 Disease_ID)
• occursIn_Document (FK1 Sentence_ID, FK2 Document_ID)
Query: Find documents containing sentences where lipids interact with proteins and the lipids are related to a disease.
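Read as a relational schema, the query model above can be exercised with Python's standard `sqlite3` module. This is a sketch of the relational analogy only, not how the system answers queries (the slides describe nRQL queries against an OWL-DL ontology). The column types, the reduced column set, and every row of data are invented for illustration.

```python
import sqlite3

# Tables follow the entity and relation names in the query model;
# column types and all rows are assumptions for illustration.
SCHEMA_AND_ROWS = """
CREATE TABLE Lipid (Lipid_ID INTEGER PRIMARY KEY, Lipid_Name TEXT);
CREATE TABLE Protein (Protein_ID INTEGER PRIMARY KEY, Protein_Name TEXT);
CREATE TABLE Sentence (Sentence_ID INTEGER PRIMARY KEY, Sentence_Text TEXT);
CREATE TABLE Disease (Disease_ID INTEGER PRIMARY KEY, Disease_Name TEXT);
CREATE TABLE Document (Document_ID INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE interactsWith_Protein (Lipid_ID INTEGER, Protein_ID INTEGER);
CREATE TABLE occursIn_Sentence (Lipid_ID INTEGER, Sentence_ID INTEGER);
CREATE TABLE relatedTo_Disease (Lipid_ID INTEGER, Disease_ID INTEGER);
CREATE TABLE occursIn_Document (Sentence_ID INTEGER, Document_ID INTEGER);

-- Toy rows: Lipid A has a protein partner and a disease link, Lipid B has no disease link.
INSERT INTO Lipid VALUES (1, 'Lipid A'), (2, 'Lipid B');
INSERT INTO Protein VALUES (1, 'Protein P');
INSERT INTO Disease VALUES (1, 'Disease X');
INSERT INTO Sentence VALUES (1, 'Protein P binds to Lipid A'), (2, 'Protein P binds to Lipid B');
INSERT INTO Document VALUES (1, 'Document 1'), (2, 'Document 2');
INSERT INTO interactsWith_Protein VALUES (1, 1), (2, 1);
INSERT INTO occursIn_Sentence VALUES (1, 1), (2, 2);
INSERT INTO relatedTo_Disease VALUES (1, 1);
INSERT INTO occursIn_Document VALUES (1, 1), (2, 2);
"""

# Documents whose sentences mention a lipid that both interacts with a
# protein and is related to a disease.
QUERY = """
SELECT DISTINCT d.Title
FROM Document AS d
JOIN occursIn_Document AS od ON od.Document_ID = d.Document_ID
JOIN occursIn_Sentence AS os ON os.Sentence_ID = od.Sentence_ID
JOIN interactsWith_Protein AS ip ON ip.Lipid_ID = os.Lipid_ID
JOIN relatedTo_Disease AS rd ON rd.Lipid_ID = os.Lipid_ID
ORDER BY d.Title
"""


def find_documents(conn):
    """Run the example query and return the matching document titles."""
    return [title for (title,) in conn.execute(QUERY).fetchall()]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_AND_ROWS)
titles = find_documents(conn)
print(titles)
```

Because the schema as drawn links interactions to lipids rather than to sentences, the query cannot require that the interaction is stated in the matched sentence itself; it only requires that the lipid in the sentence has some recorded protein partner and disease link.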
Summary
We build a lipid ontology in the Web Ontology Language (OWL) to represent the LIPIDMAPS classification hierarchy.
The ontology model resolves nomenclature inconsistencies by grounding lipid synonyms to individual lipid names.
We report a document delivery system that, in conjunction with a lipid-specific text mining platform, instantiates lipid sentences into the lipid ontology.
We facilitate navigation of lipid literature using a drag ‘n’ drop visual query composer which poses description logic queries to the OWL-DL ontology.
Lipid-disease and lipid-protein statements in the lipid literature can be readily queried and made easily available to lipid researchers.
Acknowledgement
A*STAR - Agency for Science and Technology, Singapore Government.
National University of Singapore, Graduate Student Travel Grant.