Using Ontologies to Annotate Phenotypic Data
Janan T. Eppig
December 2008
www.informatics.jax.org
Mouse Genome Informatics
Human FOXN1: forkhead box N1
T-CELL IMMUNODEFICIENCY, CONGENITAL ALOPECIA, AND NAIL DYSTROPHY
Frank J, et al. Nature 398, 473 - 474 (1999)
Mouse Foxn1. Homozygous “nude” mouse. One of 8 known phenotypic mutations in mouse for the forkhead box N1 gene.
Data Integration
Primary literature
Centers: mutagenesis, gene trap, etc.
Data Loads: GenBank, SNPs, clone collections, UniProt, RIKEN, etc
Electronic Submissions (individual labs)
Processing, QC, and curation
• Gather data from multiple sources
• Factor out common objects
• Assemble integrated objects
Integration is hard…not just a matter of combining data sources…
• Data from multiple sources can be of differing quality
• The same data can enter the system via various paths
• Naming conventions may or may not be to standards
• Some data sources don’t maintain unique accession numbers (or allow them to change)
• Periodic updates from data sources can cause problems
  • if objects have disappeared… (or reappeared)
  • if objects have split in two
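The update problem in the last bullet can be sketched as a comparison of two releases of one data source. This is a minimal illustration, not MGI's load software: the function name `diff_releases`, the `retired_ids` argument, and the sample IDs are all hypothetical. Note that an ID comparison alone cannot detect an object that has split in two; that case still needs curator review.

```python
def diff_releases(previous_ids, current_ids, retired_ids=()):
    """Classify object IDs when a data source publishes a new release.

    previous_ids: IDs seen in the last load
    current_ids:  IDs present in the new release
    retired_ids:  IDs that disappeared in some earlier release (assumed
                  to be tracked by the loader; hypothetical bookkeeping)
    """
    previous_ids = set(previous_ids)
    current_ids = set(current_ids)
    retired_ids = set(retired_ids)
    return {
        # present before, missing now
        "disappeared": sorted(previous_ids - current_ids),
        # never seen before
        "new": sorted((current_ids - previous_ids) - retired_ids),
        # retired earlier, back again in this release
        "reappeared": sorted((current_ids & retired_ids) - previous_ids),
        # present in both releases
        "unchanged": sorted(previous_ids & current_ids),
    }


report = diff_releases(
    previous_ids={"X1", "X2", "X3"},
    current_ids={"X0", "X2", "X3", "X4"},
    retired_ids={"X0"},
)
```

Each list in `report` can then be routed differently, for example loading the unchanged and new objects directly and holding the disappeared and reappeared ones for review.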
Data integration is hard
• “Bucketizing” establishes types of correspondence between objects in the input sets.
• Allows immediate incorporation of 1:1 corresponding data.
• Sorts conflicting data into bins that allow prioritization for curator resolution.
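The bucketizing idea can be sketched as sorting ID correspondences by cardinality. This is an assumed reading of the slide rather than the actual MGI algorithm; the bucket names ("1:1", "1:n", "n:1", "n:m"), the function `bucketize`, and the sample IDs are hypothetical.

```python
from collections import defaultdict


def bucketize(pairs):
    """Sort (id_in_source_a, id_in_source_b) pairs into cardinality buckets."""
    a_to_b = defaultdict(set)
    b_to_a = defaultdict(set)
    for a, b in pairs:
        a_to_b[a].add(b)
        b_to_a[b].add(a)

    buckets = {"1:1": [], "1:n": [], "n:1": [], "n:m": []}
    for a, b in sorted(set(pairs)):
        a_has_many = len(a_to_b[a]) > 1  # a corresponds to several b objects
        b_has_many = len(b_to_a[b]) > 1  # b corresponds to several a objects
        if not a_has_many and not b_has_many:
            buckets["1:1"].append((a, b))  # safe to load immediately
        elif a_has_many and not b_has_many:
            buckets["1:n"].append((a, b))  # conflict: queue for a curator
        elif b_has_many and not a_has_many:
            buckets["n:1"].append((a, b))  # conflict: queue for a curator
        else:
            buckets["n:m"].append((a, b))  # conflict: queue for a curator
    return buckets


buckets = bucketize([
    ("A1", "B1"),
    ("A2", "B2"), ("A2", "B3"),
    ("A3", "B4"), ("A4", "B4"),
])
```

Only the "1:1" bucket is incorporated automatically in this sketch; the other bins could be prioritized by size or by data type before curators resolve them.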
• Data Acquisition
• Object Identity
• Standardizations
• Data Associations
• Integration with other bioinformatics resources
Literature & Loads
New Gene, Strain or Sequence?
Controlled Vocabularies
Evidence & Citation
Co-curation of shared objects and concepts
Annotation Pipeline
Making semantic sense
Controlled vocabularies/nomenclatures
• Strains
• Genes
• Alleles (phenotypic or variant)
• Classes of genetic markers
• Types of mutations
• Types of assays
• Developmental stages
• Tissues
• Clone libraries
• ES cell lines
• and more…
….. organized as lists or simple hierarchies
Semantics plus relationship data
Ontologies/structured vocabularies
• Gene Ontology (GO)
  • Molecular function
  • Biological process
  • Cellular component
• Mouse Anatomy (MA)
  • Embryonic
  • Adult
• Mammalian Phenotype (MP)
• Sequence Ontology (SO)
• Trait Ontology
….. organized as directed acyclic graphs (DAGs)
DAGs
Vocabularies in MGI
DAGs
Definition
Synonyms
MP:1956
Strain: AEJ
Alleles: bd/bd
Genotype
Strain: C57BL/6
Alleles: Ppp1r3a<tm1Adpt>/Ppp1r3a<tm1Adpt>
Terms
…
Respiratory failure
Postnatal lethality
Dilated renal tubules
Growth retardation
Vocabulary
Note
…
J:65378 TAS
J:62648 IDA
J:65322 EE
Annotations
Common software for users to access vocabularies in MGI
Mammalian Phenotype Ontology
• Structured as DAG
• >6,250 terms covering physiological systems, behavior, survival, and development
• Available in web browser and in OBO and text formats from MGI ftp and OBO sites
• Each term linked to all annotations to the term or its children
• >133,000 genotype-to-MP annotations
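Linking each term to "all annotations to the term or its children" amounts to a traversal of the DAG. The sketch below is illustrative only: the parent links are a guess at how a few phenotype terms might relate, not a copy of the MP ontology, and the genotype IDs are made up.

```python
from collections import defaultdict

# Hypothetical DAG fragment: child term -> parent terms.
# A term may have more than one parent, which is what makes this a DAG
# rather than a simple hierarchy.
PARENTS = {
    "myoclonus": ["abnormal involuntary movement", "abnormal muscle physiology"],
    "tremors": ["abnormal involuntary movement"],
    "abnormal involuntary movement": ["behavior/neurological phenotype"],
    "abnormal muscle physiology": ["muscle phenotype"],
}

# Hypothetical direct annotations: term -> genotypes annotated to it.
DIRECT = {
    "myoclonus": {"genotype-1"},
    "tremors": {"genotype-2"},
    "abnormal involuntary movement": {"genotype-3"},
}


def descendants(term):
    """Return every term reachable from `term` by following child links."""
    children = defaultdict(set)
    for child, parents in PARENTS.items():
        for parent in parents:
            children[parent].add(child)
    seen, stack = set(), [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def annotations(term):
    """Genotypes annotated to `term` itself or to any of its descendants."""
    result = set(DIRECT.get(term, set()))
    for descendant in descendants(term):
        result |= DIRECT.get(descendant, set())
    return result
```

Because "myoclonus" has two parents in this fragment, a query at either "muscle phenotype" or "behavior/neurological phenotype" returns its genotype, which is the practical benefit of a DAG over a single-parent hierarchy.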
Synonyms
Term in context
Links to all mouse genotypes with this phenotype
abnormal reflex
opisthotonus
tremors
myoclonus
abnormal muscle physiology
muscle phenotype
behavior/neurological phenotype
abnormal involuntary movement
…make phenotype & disease model data robust & accessible to researchers & computational biologists
• semantically consistent search methods
• integrated access to all phenotypic variation sources (single-gene, genomic mutations, engineered mutations, QTL, strains)
• data on human disease correlation
• access to mouse models from various approaches
- Genetic
- Phenotypic
- Computational
Mammalian Phenotype (MP) Ontology
Developing the Mammalian Phenotype Ontology
• New terms from ongoing curation process
• Collaborative community efforts
  • identify new terms
  • suggest improved organization of terms
• Rat Genome Database
• Mutagenesis Centers
• Human (NCBI)
• OMIA (Online Mendelian Inheritance in Animals)
• Proprietary Databases
• Future (International Mouse Knockout Projects)
• Comparisons among Ontologies (GO Process, Mouse Anatomy, FMA, Cell Type, MPath, etc.)
• Systematic review by domain experts
MP Ontology Growth
[Chart: axis ticks from 0 to 7000; years 2004 to 2008]
Making Mammalian Phenotype Ontology Work
DAGs
• accommodate bio-specific terms
• computationally useful
• human accessible
• practical for curation
• cross-reference to other ontologies
Terms in MP
MP term | Entity | PATO Quality | MP def
microphthalmia | eye | small size | reduced average size of the eyes
hydrocephaly | cerebro-spinal fluid | increased, excessive, accumulated | excessive accumulation of cerebrospinal fluid in the brain, especially the cerebral ventricles, often leading to increased brain size and other brain trauma
 | brain | large size (dilated) |
 | trauma of brain | observed |
Complex Examples:
id: MP:0006159 ! ocular albinism
intersection_of: PATO:0001558 ! lacking processual parts
intersection_of: inheres_in MA:0000261 ! eye
intersection_of: towards GO:0006582 ! melanin metabolic process
MP definition: absence of melanin (pigment) production in the eye with identifiable melanocytes present
id: MP:0006110 ! ventricular fibrillation
intersection_of: PATO:0000688 ! asynchronous
intersection_of: inheres_in CL:0000746 ! cardiac muscle cell
intersection_of: towards GO:0060048 ! cardiac muscle contraction
intersection_of: located_in MA:0000079 ! ventricle endocardium
intersection_of: located_in MA:0000082 ! ventricle myocardium
MP definition: asynchronous contraction or quivering of individual cardiac muscle fibers in the ventricles
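The logical definitions above follow a "tag: value ! comment" layout, so the quality and its related entities can be pulled apart with a few lines of code. This is a simplified reader for this one layout, assuming one tag per line; it is not a full OBO parser, and the function name `parse_stanza` and the returned dictionary shape are hypothetical.

```python
STANZA = """
id: MP:0006159 ! ocular albinism
intersection_of: PATO:0001558 ! lacking processual parts
intersection_of: inheres_in MA:0000261 ! eye
intersection_of: towards GO:0006582 ! melanin metabolic process
"""


def parse_stanza(text):
    """Split a logical definition into its quality (genus) and related entities."""
    term = {"id": None, "label": None, "genus": None, "differentia": []}
    for line in text.strip().splitlines():
        body, _, comment = line.partition("!")   # text after '!' is a label
        tag, _, value = body.partition(":")      # split at the first colon only
        tag, value, comment = tag.strip(), value.strip(), comment.strip()
        if tag == "id":
            term["id"], term["label"] = value, comment
        elif tag == "intersection_of":
            parts = value.split()
            if len(parts) == 1:
                # a bare ID is the quality (here, the PATO term)
                term["genus"] = (parts[0], comment)
            else:
                # "relation target" pairs name the entities involved
                relation, target = parts
                term["differentia"].append((relation, target, comment))
    return term


term = parse_stanza(STANZA)
```

With the definition in this form, a query such as "all phenotypes that inhere in the eye" becomes a filter over the `differentia` tuples instead of a text search over term names.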
Status of Phenotype & Disease Data, Nov 2008
Phenotype terms in MP ontology: 6,355
Phenotypic alleles cataloged: 21,996
  number of genes represented: 8,225
  targeted alleles: 13,549
  number of genes targeted: 5,547
Alleles with MP annotation: 19,458
Genotypes with MP annotation: 27,261
Total MP annotations: 137,577
Genotypes with OMIM associations: 2,520
OMIM with associated genotypes: 882
QTLs: 4,015
Strains: >10,500
Current QTL Display
Genome coordinates: 132851306-135646474 (MGI Mouse GBrowse)
Changes planned for QTL Display
Need for a trait ontology
• What is measured
– Blood pressure
– % body fat
– Coat color
• Annotation of
– QTL
– Strain characteristics / baseline
– Measurements
Some issues
• specificity vs. breadth
• synchronizing with MP
• “how much” cross-species?
OBO-Edit, curation tool for building ontologies
Working on Trait Ontology
• MGI
• IMPC
• MPD
• RGD
• Domestic Species (Animal QTL)
Currently:
approx. 3600 terms, built initially by stripping MP
working systematically on branches
MGI Phenotype Data Staff
Anna Anagnostopoulos, Randal P. Babiuk, Susan M. Bello, Donna L. Burkart, Howard Dene, Michelle Knowlton, Ira Lu, Hiroaki Onda, Cynthia L. Smith, Monika Tomczuk, Linda L. Washburn
Jonathan S. Beal, Kim L. Forthofer, Peter Frost
The End
NHGRI grant HG000330