An Ontology for Protein-Protein Interaction Data
Karen Jantz
CIS Honors Project
December 7, 2006
Overview
  Problem Statement
  Objectives
  Approach
  Background
  Methodology
  Evaluation
  Demonstration
  Conclusion
Problem Statement
  Several sources for protein-protein interaction data
    Different schemata
    Different purposes
    Different strengths/weaknesses
Objectives
  Unify the data
  Enable data mining
  Evaluate reliability of data across data sources
  Gain new information about the entire data set
  Enable others to easily add other data sources to the set
Approach: ontology
  ontology, n.
    1. that which exists (philosophy)
    2. that which is represented (artificial intelligence)
  A descriptive data model
  Defines the entities and relationships within a domain
  Based upon data
  Human-readable
Approach: ontology
  Data integration
    Enables simultaneous querying across multiple databases
  Data transformation
    Enables interchange between database formats
  Data mining
    Enables reasoning and learning over the entire data set
Background: Data Sources
  DIP (Jing Xia)
    Database of Interacting Proteins
    Most reliable data set
  BIND (Abhijit Erande, Aaron Schoenhofer)
    Biomolecular Interaction Network Database
    Very large data set
    Contains interactions, molecular complexes, and pathways
Background: Data Sources
  MINT
    Molecular INTeraction database
    Experimentally verified protein interactions
    Evaluates confidence level
  IntAct
    Not limited to binary interactions
    Allows user submissions
  MIPS CYGD
    Munich Information Center for Protein Sequences: Comprehensive Yeast Genome Database
    Limited to yeast
    Focuses on sequencing
Background: Tools
  Protégé
    Open-source project
    Graphical ontology editor
    Interacts with OWL reasoner
    Detailed API for modifying ontologies programmatically
Background: Tools
  Prompt
    A Protégé plugin
    Enables ontology mapping
    Enables ontology comparison
Background: Related Work
  PSI-MI
    Controlled vocabulary for PPI data
    Not a proposed database structure
    Decreases the strength of information
    Helpful in defining relationships and keys
Methodology: Overview
  Q: What interactions have been observed with protein A?
  Q: What experiments give evidence for a given interaction?
  [Diagram labels: DIP, BIND, MIPS, MINT, IntAct; Unified Ontology; Unified Data Set; Web Interface]
Methodology: Design
  Review the singular database schemata and determine strengths/weaknesses
    View data files
      Native formats
      PSI-MI formats
  Create a unified schema of the data sources
  Create the unified ontology in Protégé
  Create each singular database as a subset of the unified ontology
Protégé Screenshot
Methodology: Data Import
  DOMParser
    Load data from XML
  Protégé-OWL API
    Insert entities into singular databases
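
The loading step can be illustrated with a small DOM-parsing sketch. It uses Python's standard-library `xml.dom.minidom` rather than the DOMParser and Protégé-OWL API named on the slide, and the XML layout (`interaction`, `participant`, `method`) is a hypothetical, simplified stand-in for a real PSI-MI file. The resulting dictionaries represent the entities that would then be inserted into a source-specific ontology.

```python
from xml.dom.minidom import parseString

# Hypothetical, simplified interaction records; real PSI-MI files are richer.
SAMPLE_XML = """
<entrySet>
  <interaction id="i1">
    <participant ref="P12345"/>
    <participant ref="Q67890"/>
    <method>two hybrid</method>
  </interaction>
  <interaction id="i2">
    <participant ref="P12345"/>
    <participant ref="O11111"/>
    <method>coimmunoprecipitation</method>
  </interaction>
</entrySet>
""".strip()


def load_interactions(xml_text):
    """Parse the XML into a DOM tree and return one dict per interaction."""
    doc = parseString(xml_text)
    records = []
    for node in doc.getElementsByTagName("interaction"):
        participants = [
            p.getAttribute("ref")
            for p in node.getElementsByTagName("participant")
        ]
        method_nodes = node.getElementsByTagName("method")
        method = None
        if method_nodes and method_nodes[0].firstChild is not None:
            method = method_nodes[0].firstChild.data.strip()
        records.append({
            "id": node.getAttribute("id"),
            "participants": participants,
            "method": method,
        })
    return records


interactions = load_interactions(SAMPLE_XML)
```

Each returned record keeps the source file's own identifiers, so mapping to the unified model can remain a separate step.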
Methodology: Transformation
  Use Prompt to create a mapping for each specific data source to the unified ontology
  Use Prompt mappings to insert individuals from each singular ontology into the unified model
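
As a rough illustration of what a source-to-unified mapping does, the sketch below renames fields of one source record into unified names. It is not Prompt's mapping format or API; the dictionary `DIP_TO_UNIFIED`, the field names, and the helper `to_unified` are all hypothetical.

```python
# Hypothetical mapping: source field name -> unified ontology property name.
DIP_TO_UNIFIED = {
    "interactor_a": "proteinA",
    "interactor_b": "proteinB",
    "detection": "experimentMethod",
}


def to_unified(record, mapping, source_name):
    """Rename mapped fields, drop unmapped ones, and tag the source database."""
    unified = {
        target: record[src]
        for src, target in mapping.items()
        if src in record
    }
    unified["source"] = source_name
    return unified


dip_record = {
    "interactor_a": "P12345",
    "interactor_b": "Q67890",
    "detection": "two hybrid",
    "dip_internal": "x",  # no counterpart in the unified model, so it is dropped
}
unified_record = to_unified(dip_record, DIP_TO_UNIFIED, "DIP")
```

Keeping one mapping per data source means a new database can be added by writing a new mapping rather than changing the unified model.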
Methodology: Transformation
  Duplicate data
    Need to fill in attributes on existing records
    Write an ‘Algorithm Plugin’ for Prompt to determine when individuals are the same
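
One possible shape for the duplicate handling is sketched below. The identity rule (two records describe the same interaction when they name the same unordered pair of protein identifiers) is an assumption made for illustration, not the rule used by the project's Prompt plugin, and the field names are hypothetical.

```python
def interaction_key(record):
    """Assumed identity rule: the same unordered pair of protein identifiers."""
    return frozenset((record["proteinA"], record["proteinB"]))


def merge_duplicates(records):
    """Merge records that share a key, filling attributes missing on the existing record."""
    merged = {}
    for record in records:
        key = interaction_key(record)
        if key not in merged:
            entry = {k: v for k, v in record.items() if k != "source"}
            entry["sources"] = [record["source"]]
            merged[key] = entry
            continue
        existing = merged[key]
        for field, value in record.items():
            if field == "source":
                continue
            # Only fill gaps; never overwrite a value already present.
            if existing.get(field) is None:
                existing[field] = value
        if record["source"] not in existing["sources"]:
            existing["sources"].append(record["source"])
    return list(merged.values())


records = [
    {"proteinA": "P12345", "proteinB": "Q67890", "experimentMethod": None, "source": "DIP"},
    {"proteinA": "Q67890", "proteinB": "P12345", "experimentMethod": "two hybrid", "source": "MINT"},
    {"proteinA": "P12345", "proteinB": "O11111", "experimentMethod": "coimmunoprecipitation", "source": "IntAct"},
]
merged = merge_duplicates(records)
```

Recording every contributing database in `sources` keeps the information needed to compare reliability across data sources.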
Prompt Screenshot - Mapping
Methodology: Query Interface
  Export Protégé data into MySQL
  Web interface for collecting data
  Working with domain experts to determine useful views, queries
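
The two example questions from the methodology overview (which interactions involve a given protein, and which experiments support a given interaction) could be answered with queries along these lines. This is a sketch only: it uses an in-memory SQLite database as a stand-in for MySQL, and the table and column names are hypothetical rather than the project's exported schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL export
conn.executescript("""
CREATE TABLE interaction (
    id INTEGER PRIMARY KEY,
    protein_a TEXT NOT NULL,
    protein_b TEXT NOT NULL
);
CREATE TABLE evidence (
    interaction_id INTEGER NOT NULL REFERENCES interaction(id),
    method TEXT NOT NULL,
    source_db TEXT NOT NULL
);
INSERT INTO interaction VALUES
    (1, 'P12345', 'Q67890'),
    (2, 'P12345', 'O11111'),
    (3, 'Q67890', 'O22222');
INSERT INTO evidence VALUES
    (1, 'two hybrid', 'DIP'),
    (1, 'coimmunoprecipitation', 'MINT'),
    (2, 'two hybrid', 'IntAct');
""")


def partners_of(protein):
    """Proteins observed interacting with the given protein, on either side."""
    rows = conn.execute(
        "SELECT protein_b FROM interaction WHERE protein_a = ? "
        "UNION "
        "SELECT protein_a FROM interaction WHERE protein_b = ?",
        (protein, protein),
    )
    return sorted(row[0] for row in rows)


def evidence_for(interaction_id):
    """Experiments (method, source database) supporting one interaction."""
    rows = conn.execute(
        "SELECT method, source_db FROM evidence "
        "WHERE interaction_id = ? ORDER BY source_db",
        (interaction_id,),
    )
    return rows.fetchall()
```

For example, `partners_of("P12345")` lists the partners of one protein, and `evidence_for(1)` lists the experiments recorded for interaction 1 together with the database each came from.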
Evaluation
  Performance
    Transformation time in Protégé
    Query time for web interface
  Size
    Minimize redundancy in data model
    Minimize duplicate data
Evaluation
  Correctness
    Domain experts: Dr. Brown, Dr. Wang
    Maintain proper data relationships
  Utility
    Enrich data
Evaluation
  Data Model Enrichment
  [Bar chart: number of classes (New, Changed, Existing) per database (IntAct, MINT, MIPS); vertical axis from 0 to 30]
Demonstration
Future Work
  Complete transformations
  Import data
  Evaluate ontology
  Add other databases to model
Conclusions
  Adequate start
  Needs improvement, evolution, more data sources
  As the project matures, the ontology will be ready for use in the biological domain
  Will be able to more easily gain information about protein-protein interactions
References
AAAI.org - AITopics: “Ontology” http://www.aaai.org/AITopics/html/ontol.html
Protégé http://protege.stanford.edu/overview/protege-owl.html
Prompt http://protege.cim3.net/cgi-bin/wiki.pl?Prompt
PSI-MI http://psidev.sourceforge.net/mi/xml/doc/user
References
BIND http://www.bind.ca
DIP http://www.dip.doe-mbi.ucla.edu
IntAct http://www.ebi.ac.uk/intact/site/
MINT http://mint.bio.uniroma2.it/mint/Welcome.do
MIPS http://mips.gsf.de/genre/proj/yeast
Q & A