An Ontological Approach for Describing Phospho-proteins in Rhodococcus
Dept. of Computer Science, University of British Columbia
Dennis Wang, Gavin Ha, Jennifer Chen, Nancy Wang
CPSC 445, April 5th, 2007
What is an ontology?
Purpose: knowledge representation & reasoning; facilitates knowledge sharing and reuse
Definition: a data model that represents a set of concepts within a domain and the relationships between those concepts. It is used to reason about the objects within that domain.
An ontology describes individuals (instances), classes (concepts), attributes, relations and axioms.
Uses: AI, information architecture, semantic web, software engineering
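The following minimal Python sketch makes the definition concrete by treating an ontology as a data model. Classes are linked by subclass axioms, individuals carry attributes, and a query returns every instance of a class. The `Ontology` class, its methods and the small protein hierarchy in the example are hypothetical illustrations. They are not taken from any ontology toolkit or from Phosphabase, and the only axiom the sketch understands is "A is a subclass of B".

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical minimal ontology: classes, individuals, attributes, relations, axioms."""
    # Subclass axioms: class name -> names of its direct superclasses
    superclasses: dict[str, set[str]] = field(default_factory=dict)
    # Type assertions: individual name -> classes it was asserted to belong to
    types: dict[str, set[str]] = field(default_factory=dict)
    # Attributes: individual name -> {attribute name: value}
    attributes: dict[str, dict[str, str]] = field(default_factory=dict)
    # Relations between individuals as (subject, relation, object) triples
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def add_class(self, name: str, *parents: str) -> None:
        """Declare a class plus the axioms 'name is a subclass of each parent'."""
        self.superclasses.setdefault(name, set()).update(parents)
        for parent in parents:
            self.superclasses.setdefault(parent, set())

    def add_individual(self, name: str, cls: str, **attrs: str) -> None:
        """Assert that an individual belongs to cls, with optional attributes."""
        self.types.setdefault(name, set()).add(cls)
        self.attributes.setdefault(name, {}).update(attrs)

    def ancestors(self, cls: str) -> set[str]:
        """cls plus every class reachable by following subclass axioms upward."""
        seen: set[str] = set()
        stack = [cls]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.superclasses.get(current, ()))
        return seen

    def instances_of(self, cls: str) -> set[str]:
        """Individuals asserted in cls or in any subclass of cls."""
        return {ind for ind, asserted in self.types.items()
                if any(cls in self.ancestors(a) for a in asserted)}


# Illustrative hierarchy (not Phosphabase's actual structure)
onto = Ontology()
onto.add_class("Protein", "Macromolecule")
onto.add_class("Protein_Phosphatase", "Protein")
onto.add_class("Protein_Kinase", "Protein")
onto.add_individual("RHA1_ro01186", "Protein_Phosphatase", localization="Cytoplasmic")
onto.add_individual("RHA1_ro05453", "Protein_Kinase", localization="Cytoplasmic Membrane")

print(sorted(onto.instances_of("Protein")))         # ['RHA1_ro01186', 'RHA1_ro05453']
print(sorted(onto.instances_of("Protein_Kinase")))  # ['RHA1_ro05453']
```

The query for instances of `Protein` returns both individuals, even though neither was asserted to be a `Protein`. Each one's asserted class inherits from `Protein` through a subclass axiom, and this inherited membership is the simplest form of reasoning an ontology supports.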
Biology is knowledge-based: prior knowledge is used to infer new knowledge. It is also data-rich.
Biologists need extensive prior knowledge to analyze the data they obtain, but the pace of data production is beyond one's ability to acquire that knowledge.
We need an automated system to apply domain experts’ knowledge to biological data.
This requires a joint effort of biologists and computer scientists.
Build ontologies using domain knowledge
Rapidly classify large datasets
Query for instances of a class
Create controlled vocabularies for shared use across different biological and medical domains
In bioinformatics, an ontology can make knowledge available to the community and its applications.
“provides structured, controlled vocabularies and classifications that cover several domains of molecular biology”
Uses:
Annotation of large data sets
The ability to group gene products under a high-level term
Computational (putative) assignment of molecular function based on sequence similarity to annotated genes or sequences
Diagram: an unknown gene product is compared by sequence similarity with a sequence of known function in SWISS-PROT. The function inferred from that match is recorded as inferred from electronic annotation.
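The transfer step in the diagram can be sketched in a few lines of Python. This is a hypothetical simplification. The hit list, the identifiers `SP_A`, `SP_B` and `SP_C`, and the 40% identity cutoff are all made up for illustration, and real annotation pipelines typically rely on alignment statistics such as E-values rather than a single identity threshold. The evidence label `IEA` is the Gene Ontology evidence code for "Inferred from Electronic Annotation".

```python
from typing import Optional


def transfer_annotation(
    hits: list[tuple[str, float]],
    known_function: dict[str, str],
    min_identity: float = 40.0,  # assumed cutoff, for illustration only
) -> Optional[tuple[str, str, str]]:
    """Putatively assign a function to an unknown gene product.

    hits: (subject_id, percent_identity) pairs from a sequence-similarity search.
    Returns (function, evidence_code, source_id), or None if no usable hit.
    """
    # Keep only hits that have a known function and pass the identity cutoff
    usable = [(sid, pid) for sid, pid in hits
              if sid in known_function and pid >= min_identity]
    if not usable:
        return None
    best_id, _ = max(usable, key=lambda hit: hit[1])
    # "IEA" = Inferred from Electronic Annotation (GO evidence code)
    return known_function[best_id], "IEA", best_id


# Hypothetical annotated sequences and search hits
known = {
    "SP_A": "protein tyrosine phosphatase activity",
    "SP_B": "protein kinase activity",
}
hits = [("SP_A", 62.5), ("SP_B", 31.0), ("SP_C", 80.0)]

result = transfer_annotation(hits, known)
print(result)  # ('protein tyrosine phosphatase activity', 'IEA', 'SP_A')
```

The sketch ignores hits with no known annotation (`SP_C`) and hits below the cutoff (`SP_B`), then transfers the function of the best remaining hit. Because the result is only putative, it is tagged with the electronic-annotation evidence code rather than presented as an experimental finding.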
There is no standardized methodology for building ontologies, but there are efforts to develop more comprehensive guidelines.
In general, the process has two stages:
Informal stage: natural language
Formal stage: a formal knowledge representation language
The approach is inspired by software engineering.
User Model (Biologist):
#1) Identification of the purpose and scope of the ontology
#2) Acquisition of domain knowledge
Conceptualization Model (Bioinformatician/Biologist):
#3) Identifying key concepts in the domain
#4) Integration by using and incorporating other existing ontologies
Implementation Model (Bioinformatician):
#5) Representing concepts with a formal language
#6) Documenting informal and formal definitions
#7) Evaluation of the appropriateness of the ontology for its intended application
Diagram (Building): identify purpose and scope → knowledge acquisition → conceptualization → integrating existing ontologies → encoding → evaluation
Diagram (project overview): biologists / signal protein experts provide phosphatase & kinase background knowledge and proteomic experimental data. The bioinformatician builds the ontology (classes) using OWL-DL, combines it with the data (instances/individuals), and uses the Pellet reasoner to obtain results. Also shown: language & representation, available development tools.
Can we use the Phosphabase ontology to describe phospho-proteins discovered by the Rhodococcus Genome Project?
OWL:
XML syntax
OWL-DL (Description Logic): certain restrictions, based on description logic, that guarantee decidability
OWL uses the Resource Description Framework (RDF): subject, predicate, object
Basic components in OWL: classes, individuals, properties
Diagram (example): the class Professor is a subClassOf its superclass FacultyMember. The individual Anne Condon is an InstanceOf Professor and teaches the individual Jennifer Chen.
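Plain Python tuples can mimic the subject-predicate-object model and the Professor/FacultyMember example. This sketch rests on stated assumptions. It is not an RDF library, the identifiers `AnneCondon` and `JenniferChen` are made-up stand-ins for the two individuals, and `rdf:type` / `rdfs:subClassOf` serve only as labels. The function `infer_types` repeatedly applies one RDFS entailment rule (an instance of a subclass is also an instance of its superclass) until no new triples appear. The function `match` is a toy version of a triple-pattern query in which `None` acts as a wildcard.

```python
# Plain tuples standing in for RDF triples: (subject, predicate, object)
triples = {
    ("Professor", "rdfs:subClassOf", "FacultyMember"),
    ("AnneCondon", "rdf:type", "Professor"),
    ("AnneCondon", "teaches", "JenniferChen"),
}


def infer_types(facts: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Apply 'x type C, C subClassOf D => x type D' until nothing new is added."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        # Snapshot the current axioms and type facts before adding new triples
        subclass_axioms = [(c, d) for c, p, d in inferred if p == "rdfs:subClassOf"]
        type_facts = [(x, c) for x, p, c in inferred if p == "rdf:type"]
        for c, d in subclass_axioms:
            for x, cls in type_facts:
                if cls == c and (x, "rdf:type", d) not in inferred:
                    inferred.add((x, "rdf:type", d))
                    changed = True
    return inferred


def match(pattern: tuple, facts: set[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Return sorted triples matching (s, p, o); None in the pattern is a wildcard."""
    return sorted(t for t in facts
                  if all(q is None or q == v for q, v in zip(pattern, t)))


facts = infer_types(triples)
print(match(("AnneCondon", "rdf:type", None), facts))
# [('AnneCondon', 'rdf:type', 'FacultyMember'), ('AnneCondon', 'rdf:type', 'Professor')]
print(match((None, "teaches", None), facts))
# [('AnneCondon', 'teaches', 'JenniferChen')]
```

The data only asserts that Anne Condon is a Professor, yet the type query also returns FacultyMember because the subclass axiom entails it.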
Biological motivation: driven by protein domain architecture, to describe signalling protein families
Background knowledge required for construction:
Signal protein domains
Presence of protein domains within signal proteins
OWL ontology:
The ontology uses OWL-DL
Description logic can be applied to classify proteins using reasoners
There are many different ways to represent this knowledge in OWL
(Wolstencroft et al., 2006)
Diagram (class labels): Domain_Entity, Macromolecule, Protein_Phosphatase, Protein_Kinase
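Set containment gives a simple illustration of classification driven by domain architecture. In this hypothetical Python sketch, each class is defined by the set of domains a protein must contain. A protein therefore belongs to every class whose required domains it has, and one class subsumes another when its requirements are a subset of the other's. The domain names and the `Transmembrane_Kinase` class are invented for illustration and are not Phosphabase definitions. An OWL-DL reasoner handles far richer axioms, whereas this sketch only covers conjunctions of "has domain" conditions.

```python
# Hypothetical domain-based class definitions in the spirit of Phosphabase:
# a protein belongs to a class if it contains ALL of the listed domains.
# Class and domain names are illustrative, not taken from Phosphabase.
DEFINITIONS: dict[str, frozenset[str]] = {
    "Protein_Phosphatase": frozenset({"PhosphataseDomain"}),
    "Protein_Kinase": frozenset({"KinaseDomain"}),
    "Transmembrane_Kinase": frozenset({"KinaseDomain", "TransmembraneDomain"}),
}


def classify(domains: set[str]) -> list[str]:
    """Return every defined class whose required domains are all present."""
    return sorted(cls for cls, required in DEFINITIONS.items() if required <= domains)


def subsumers(cls: str) -> list[str]:
    """Other classes that subsume cls: their requirements are a subset of cls's."""
    required = DEFINITIONS[cls]
    return sorted(other for other, req in DEFINITIONS.items()
                  if other != cls and req <= required)


print(classify({"KinaseDomain", "TransmembraneDomain"}))
# ['Protein_Kinase', 'Transmembrane_Kinase']
print(subsumers("Transmembrane_Kinase"))  # ['Protein_Kinase']
print(classify({"PhosphataseDomain"}))    # ['Protein_Phosphatase']
```

A protein with both a kinase domain and a transmembrane domain falls into both kinase classes. The sketch also places `Transmembrane_Kinase` under `Protein_Kinase` without that subclass link ever being stated, which is the kind of inferred hierarchy a reasoner provides.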
Pellet reasoner (diagram):
Input ontology in OWL-DL format: axioms about classes go into the TBox; type and property assertions (individuals) go into the ABox
Query in RDQL (SPARQL) format
Instance data (individuals)
Tableau reasoner: checks the satisfiability of an ABox with respect to a TBox, i.e. tests knowledge base consistency
[Parsia and Sirin, ISWC 2004]
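Pellet's tableau algorithm is too large for a short example, so this sketch swaps in a much simpler check to illustrate the TBox/ABox split and the idea of a consistency test. The check first propagates each individual's asserted types up the subclass axioms of the TBox. It then reports any individual that ends up in two classes the TBox declares disjoint. The axioms, the `Protein_Domain` class and the individual `bad_individual` are hypothetical, and a real reasoner detects many other kinds of clash.

```python
# Hypothetical TBox: subclass axioms (class -> direct superclasses) ...
SUBCLASS_OF: dict[str, set[str]] = {
    "Protein_Phosphatase": {"Protein"},
    "Protein_Kinase": {"Protein"},
    "Protein": {"Macromolecule"},
}
# ... and one disjointness axiom: nothing may be both a Protein and a Protein_Domain
DISJOINT: list[tuple[str, str]] = [("Protein", "Protein_Domain")]


def superclasses_of(cls: str) -> set[str]:
    """cls together with all classes above it in the subclass hierarchy."""
    result: set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return result


def check_abox(abox: dict[str, set[str]]) -> list[str]:
    """Report individuals whose inferred types violate a disjointness axiom."""
    problems = []
    for individual, asserted in sorted(abox.items()):
        inferred = set().union(*(superclasses_of(c) for c in asserted))
        for a, b in DISJOINT:
            if a in inferred and b in inferred:
                problems.append(f"{individual}: in disjoint classes {a} and {b}")
    return problems


# Hypothetical ABox: type assertions about individuals
abox = {
    "RHA1_ro01186": {"Protein_Phosphatase"},
    "bad_individual": {"Protein_Kinase", "Protein_Domain"},
}
print(check_abox(abox))  # ['bad_individual: in disjoint classes Protein and Protein_Domain']
```

The first individual passes the check. The second is asserted to be both a kinase and a protein domain, which the hypothetical disjointness axiom forbids once `Protein_Kinase` is lifted to `Protein`.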
Locus ID: RHA1_ro01186
Strain: Rhodococcus sp. RHA1
Replicon: Chromosome
Refseq: NC_008268
Start: 1260414 Stop: 1260866
Gene Name:
Alternate gene name(s):
Protein / Product Name: protein-tyrosine-phosphatase
Alternate product name(s):
Refseq GI Number: 111018199
Category: Protein
Localization: Cytoplasmic (Class 3)
Transposon Mutant Available?: No transposon mutant available yet
COG predictions: Wzb, Protein-tyrosine-phosphatase [Signal transduction mechanisms] (COG0394)
PseudoCAP:
EC Number: 3.1.3.48
Comments:
PFAM predictions: PF01451: LMWPc, Low molecular weight phosphotyrosine protein phosphatase.
go_function: protein tyrosine phosphatase activity [goid 0004725]
Locus ID: RHA1_ro05453
Strain: Rhodococcus sp. RHA1
Replicon: Chromosome
Refseq: NC_008268
Start: 5845588 Stop: 5847288
Gene Name:
Alternate gene name(s):
Protein / Product Name: probable protein-tyrosine kinase
Alternate product name(s):
Refseq GI Number: 111022419
Category: Protein
Localization: Cytoplasmic Membrane (Class 3)
Transposon Mutant Available?: No transposon mutant available yet
COG predictions: Mrp, ATPases involved in chromosome partitioning [Cell division and chromosome partitioning] (COG0489)
PseudoCAP:
EC Number: 2.7.10.1
TIGRFAM predictions:
TIGRFAM Accession: TIGR01007
TIGRFAM name and function: eps_fam - capsular exopolysaccharide family (6.7e-46)
TIGRFAM EC Number:
Role: Transport and binding proteins
Sub Role: Carbohydrates, organic alcohols, and acids
TIGRFAM to Gene Ontology Mappings:
Comments:
PFAM predictions: PF02706: Wzz, Chain length determinant protein. This family includes proteins involved in lipopolysaccharide (lps) biosynthesis. This family comprises the whole length of chain length determinant protein (or wzz protein) that confers a modal distribution of chain length on the O-antigen component of lps. This region is also found as part of bacterial tyrosine kinases.
go_component: signal recognition particle (sensu Eukaryota) [goid 0005786]
Locus ID: RHA1_ro05554
Strain: Rhodococcus sp. RHA1
Replicon: Chromosome
Refseq: NC_008268
Start: 5971327 Stop: 5972865
Gene Name:
Alternate gene name(s):
Protein / Product Name: probable alkaline phosphatase
Alternate product name(s):
Refseq GI Number: 111022520
Category: Protein
Localization: Unknown (This protein may have multiple localization sites) (Class 3)
Transposon Mutant Available?: No transposon mutant available yet
COG predictions: PhoD, Phosphodiesterase/alkaline phosphatase D [Inorganic ion transport and metabolism] (COG3540)
TIGRFAM predictions:
TIGRFAM to Gene Ontology Mappings:
Comments:
PFAM predictions: PF00245: Alk_phosphatase, Alkaline phosphatase.
go_component: organelle inner membrane [goid 0019866]
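As a first, crude step toward asking whether these RHA1 proteins fit the ontology, the records above can be turned into data and sorted into the ontology's class names by EC number. The locus IDs, product names, EC numbers and PFAM IDs are copied from the records. The EC-to-class rules are a hypothetical mapping chosen for illustration (EC 3.1.3.48 is protein-tyrosine-phosphatase, EC 2.7.10 covers protein-tyrosine kinases, and EC 2.7.11 covers protein-serine/threonine kinases). They are not part of Phosphabase, which classifies by domain architecture instead.

```python
from typing import Optional

# Fields copied from the three RHA1 records above (None = no EC number listed)
RECORDS = {
    "RHA1_ro01186": {"product": "protein-tyrosine-phosphatase", "ec": "3.1.3.48", "pfam": "PF01451"},
    "RHA1_ro05453": {"product": "probable protein-tyrosine kinase", "ec": "2.7.10.1", "pfam": "PF02706"},
    "RHA1_ro05554": {"product": "probable alkaline phosphatase", "ec": None, "pfam": "PF00245"},
}

# Hypothetical rules: an exact EC number, or an EC prefix ending in "."
EC_RULES = [
    ("3.1.3.48", "Protein_Phosphatase"),  # protein-tyrosine-phosphatase
    ("2.7.10.", "Protein_Kinase"),        # protein-tyrosine kinases
    ("2.7.11.", "Protein_Kinase"),        # protein-serine/threonine kinases
]


def classify_by_ec(ec: Optional[str]) -> Optional[str]:
    """Return the class for the most specific matching rule, or None."""
    if ec is None:
        return None
    matches = [(rule, cls) for rule, cls in EC_RULES
               if ec == rule or (rule.endswith(".") and ec.startswith(rule))]
    if not matches:
        return None
    # Prefer the longest (most specific) rule
    return max(matches, key=lambda m: len(m[0]))[1]


results = {locus: classify_by_ec(rec["ec"]) for locus, rec in RECORDS.items()}
for locus, cls in results.items():
    print(locus, cls)
# RHA1_ro01186 Protein_Phosphatase
# RHA1_ro05453 Protein_Kinase
# RHA1_ro05554 None
```

RHA1_ro05554 has no EC number in its record, so this rule leaves it unclassified even though its PFAM prediction (PF00245, Alk_phosphatase) points to an alkaline phosphatase. Gaps like this are one reason to classify from domain content, as Phosphabase does, rather than from a single annotation field.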
Ontologies can be used as a standard model for the exchange of biological information.
Building ontologies can get very complicated:
Biologists have little training in description logic
Computer scientists have little knowledge of biology
More bioinformaticians are needed
Ontologies can facilitate automated annotation of genes and gene products.
Ontologies are difficult to read and to draw inferences from.
Ontologies can get very big (Phosphabase is only a small example).
Reasoners are sometimes slow and inaccurate.
www.quicklybored.com
Rhodococcus sp. RHA1 data: Eltis Lab, Dr. Lindsay Eltis, Dept. of Microbiology & Biochemistry
Phosphabase ontology: Wolstencroft Lab, University of Manchester, UK
Bioinformatics paper: Wolstencroft et al., 2006
Phosphabase ontology processing: Benjamin Good, iCAPTURE Centre, Vancouver