Using Ontologies to Annotate Phenotypic Data

Janan T. Eppig

December 2008

Mouse Genome Informatics (www.informatics.jax.org)

Human FOXN1 (forkhead box N1)

T-CELL IMMUNODEFICIENCY, CONGENITAL ALOPECIA, AND NAIL DYSTROPHY

Frank J, et al. Nature 398, 473 - 474 (1999)

Mouse Foxn1: homozygous “nude” mouse, one of 8 known phenotypic mutations of the mouse forkhead box N1 gene.

www.informatics.jax.org

Data Integration

Primary literature

Centers: mutagenesis, gene trap, etc.

Data Loads: GenBank, SNPs, clone collections, UniProt, RIKEN, etc

Electronic Submissions (individual labs)

Processing, QC, and curation

• Gather data from multiple sources
• Factor out common objects
• Assemble integrated objects
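
The three steps above can be illustrated with a minimal Python sketch. It assumes, unrealistically, that every source already tags its records with a shared identifier (the `id` key). The function name `integrate`, the record layout, and the rule that the first source to supply an attribute wins are hypothetical choices, not MGI's pipeline. Conflicting values are set aside rather than overwritten, so that a curator can resolve them.

```python
from collections import defaultdict


def integrate(sources):
    """Assemble one integrated object per shared ID from several sources.

    sources: dict mapping a source name to a list of record dicts, each with
    an "id" key (hypothetical layout, assumed to be shared across sources).
    """
    merged = defaultdict(lambda: {"attributes": {}, "sources": [], "conflicts": []})
    for source_name, records in sources.items():
        for record in records:
            obj = merged[record["id"]]          # factor out the common object
            obj["sources"].append(source_name)  # keep provenance
            for key, value in record.items():
                if key == "id":
                    continue
                if key not in obj["attributes"]:
                    obj["attributes"][key] = value  # first source to supply a value wins
                elif obj["attributes"][key] != value:
                    # disagreement: set aside for curator review instead of overwriting
                    obj["conflicts"].append((key, source_name, value))
    return dict(merged)
```

For example, if a literature record and a data load both describe object `A` but give different names, `A` ends up with one name, both sources in its provenance list, and one recorded conflict.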

Integration is hard: it is not just a matter of combining data sources…

• Data from multiple sources can be of differing quality

• The same data can enter the system via various paths

• Naming conventions may or may not follow standards

• Some data sources don’t maintain unique accession numbers (or allow them to change)

• Periodic updates from data sources can cause problems:
  – if objects have disappeared (or reappeared)
  – if objects have split in two
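
A periodic reload can be checked against the previous one by comparing accession IDs. In this hypothetical sketch (`compare_snapshots` and its arguments are illustrative names), IDs that disappeared, IDs that are new, and IDs that reappeared after an earlier deletion are all flagged. It cannot detect that an object has split in two: that requires comparing object content, not just IDs.

```python
def compare_snapshots(previous_ids, current_ids, ever_deleted=()):
    """Compare accession IDs from two successive loads of the same source.

    ever_deleted: IDs known to have been dropped in some earlier load.
    """
    previous, current = set(previous_ids), set(current_ids)
    disappeared = previous - current           # present last time, missing now
    appeared = current - previous              # missing last time, present now
    reappeared = appeared & set(ever_deleted)  # came back after an earlier deletion
    new = appeared - reappeared
    return {
        "disappeared": sorted(disappeared),
        "new": sorted(new),
        "reappeared": sorted(reappeared),
    }
```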

Data integration is hard

• “Bucketizing” establishes the types of correspondence between objects in the input sets.

• Allows immediate incorporation of 1:1 corresponding data.

• Sorts conflicting data into bins that allow prioritization for curator resolution.
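
One way to implement bucketizing is sketched below, assuming the two input sets are linked by shared identifiers such as sequence accessions. Objects are grouped into connected components, and each component is labeled by how many objects it contains from each side. The sketch uses a small union-find. The function name, the link criterion, and the bucket labels (1:1, 1:N, N:1, N:M, 1:0, 0:1) are assumptions for illustration, not MGI's actual code.

```python
from collections import defaultdict


def bucketize(left, right, links):
    """Classify correspondences between two object sets.

    left, right: iterables of object IDs; links: (left_id, right_id) pairs
    believed to correspond (e.g. sharing a sequence accession).
    Returns {bucket label: [(left_ids, right_ids), ...]}.
    """
    parent = {}  # union-find forest; nodes are tagged ("L"/"R", id) so sides never collide

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    for l in left:
        find(("L", l))
    for r in right:
        find(("R", r))
    for l, r in links:
        union(("L", l), ("R", r))

    # Collect the members of each connected component, split by side.
    groups = defaultdict(lambda: ([], []))
    for node in list(parent):
        side, obj_id = node
        groups[find(node)][0 if side == "L" else 1].append(obj_id)

    def size_label(n):
        return "0" if n == 0 else "1" if n == 1 else "N"

    buckets = defaultdict(list)
    for lefts, rights in groups.values():
        key = f"{size_label(len(lefts))}:{size_label(len(rights))}"
        if key == "N:N":
            key = "N:M"  # many-to-many
        buckets[key].append((sorted(lefts), sorted(rights)))
    return dict(buckets)
```

In this scheme, only the `1:1` bucket would be loaded directly. The other buckets are the bins that would be prioritized for curator review.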

• Data Acquisition

• Object Identity

• Standardizations

• Data Associations

• Integration with other bioinformatics resources

Literature & Loads

New Gene, Strain or Sequence?

Controlled Vocabularies

Evidence & Citation

Co-curation of shared objects and concepts

Annotation Pipeline

Making semantic sense

Controlled vocabularies/nomenclatures

• Strains
• Genes
• Alleles (phenotypic or variant)
• Classes of genetic markers
• Types of mutations
• Types of assays
• Developmental stages
• Tissues
• Clone libraries
• ES cell lines
• and more…

…organized as lists or simple hierarchies

Semantics plus relationship data

Ontologies/structured vocabularies

• Gene Ontology (GO)
  – Molecular function
  – Biological process
  – Cellular component

• Mouse Anatomy (MA)
  – Embryonic
  – Adult

• Mammalian Phenotype (MP)

• Sequence Ontology (SO)

• Trait Ontology

….. organized as directed acyclic graphs (DAGs)

DAGs
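
What distinguishes a DAG from a simple hierarchy is that a term may have more than one parent. The Python sketch below stores a small fragment as a child-to-parents map and collects all ancestors of a term. The term names are taken from the MP examples in this talk. The parent links, however, are simplified and assumed for illustration; they are not copied from an MP release.

```python
from collections import deque

# Child -> parents. Names from the MP examples; links are a hypothetical,
# simplified fragment. "abnormal involuntary movement" has two parents,
# which a simple tree could not represent.
PARENTS = {
    "tremors": ["abnormal involuntary movement"],
    "myoclonus": ["abnormal involuntary movement"],
    "abnormal involuntary movement": [
        "abnormal muscle physiology",
        "behavior/neurological phenotype",
    ],
    "abnormal muscle physiology": ["muscle phenotype"],
    "muscle phenotype": [],
    "behavior/neurological phenotype": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links (breadth-first)."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, []))
    return seen
```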

Vocabularies in MGI

DAGs

Definition, Synonyms

MP:1956

Strain: AEJ

Alleles: bd/bd

Genotype

Strain: C57BL/6

Alleles: Ppp1r3a<tm1Adpt>/Ppp1r3a<tm1Adpt>

Terms

Respiratory failure

Postnatal lethality

Dilated renal tubules

Growth retardation

Vocabulary

Note

J:65378 TAS

J:62648 IDA

J:65322 EE

Annotations
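
The annotation model in the diagram can be sketched as a small record type: a genotype linked to a vocabulary term, with a supporting reference and an evidence code. The class and function names, genotype labels, and reference IDs below are hypothetical placeholders. Only the field roles and the evidence codes TAS and IDA come from the slide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    genotype: str   # allele pair on a strain background
    term: str       # vocabulary term being asserted
    reference: str  # supporting publication, as an MGI "J:" number
    evidence: str   # evidence code such as "TAS" or "IDA"


def index_by_term(annotations):
    """Group annotations under the term they annotate."""
    index = defaultdict(list)
    for annotation in annotations:
        index[annotation.term].append(annotation)
    return dict(index)


# Hypothetical example data (placeholder genotypes and references).
EXAMPLE = [
    Annotation("genotype-1", "growth retardation", "J:1", "TAS"),
    Annotation("genotype-2", "growth retardation", "J:2", "IDA"),
    Annotation("genotype-2", "postnatal lethality", "J:2", "IDA"),
]
```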

Common software for users to access vocabularies in MGI

Mammalian Phenotype Ontology

• Structured as DAG

• >6,250 terms covering physiological systems, behavior, survival, and development

• Available in web browser and in OBO and text formats from MGI ftp and OBO sites

• Each term linked to all annotations to the term or its children

• >133,000 genotype-to-MP annotations

Synonyms

Term in context

Links to all mouse genotypes with this phenotype
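
Linking each term to all annotations to itself or its descendants amounts to propagating every annotation up the DAG. The minimal Python sketch below assumes annotations are plain (genotype, term) pairs. The parent map is a hypothetical, simplified fragment, and the function names are illustrative.

```python
from collections import defaultdict

# Hypothetical DAG fragment (child -> parents); links are illustrative only.
PARENTS = {
    "tremors": ["abnormal involuntary movement"],
    "abnormal involuntary movement": [
        "abnormal muscle physiology",
        "behavior/neurological phenotype",
    ],
    "abnormal muscle physiology": ["muscle phenotype"],
}


def with_ancestors(term, parents):
    """The term itself plus every ancestor reachable through parent links."""
    result, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(parents.get(current, []))
    return result


def genotypes_per_term(annotations, parents):
    """For each term, collect genotypes annotated to it or to any descendant."""
    rolled_up = defaultdict(set)
    for genotype, term in annotations:
        for t in with_ancestors(term, parents):
            rolled_up[t].add(genotype)
    return dict(rolled_up)
```

Because the results are sets, a genotype is listed once per term, even if the DAG offers several paths from its annotated term to that ancestor.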

[Figure: fragment of the MP DAG showing the terms behavior/neurological phenotype, muscle phenotype, abnormal muscle physiology, abnormal involuntary movement, abnormal reflex, opisthotonus, tremors, and myoclonus]

Mammalian Phenotype (MP) Ontology

…make phenotype & disease model data robust & accessible to researchers & computational biologists

• semantically consistent search methods

• integrated access to all phenotypic variation sources (single-gene, genomic mutations, engineered mutations, QTL, strains)

• data on human disease correlation

• access to mouse models from various approaches
  – Genetic
  – Phenotypic
  – Computational

Developing the Mammalian Phenotype Ontology

• New terms from ongoing curation process

• Collaborative community efforts
  – identify new terms
  – suggest improved organization of terms
  – Rat Genome Database
  – Mutagenesis Centers
  – Human (NCBI)
  – OMIA (Online Mendelian Inheritance in Animals)
  – Proprietary Databases
  – Future (International Mouse Knockout Projects)

• Comparisons among Ontologies (GO Process, Mouse Anatomy, FMA, Cell Type, MPath, etc.)

• Systematic review by domain experts

• Systematic review by domain experts

[Figure: MP Ontology Growth, number of terms per year, 2004 to 2008 (y-axis 0 to 7,000)]

Making Mammalian Phenotype Ontology Work

DAGs

• accommodate bio-specific terms

• computationally useful

• human accessible

• practical for curation

• cross-reference to other ontologies

Terms in MP (MP term, Entity, PATO Quality, MP definition)

microphthalmia
  Entity: eye; PATO quality: small size
  MP definition: reduced average size of the eyes

hydrocephaly
  Entity: cerebro-spinal fluid; PATO quality: increased, excessive, accumulated
  Entity: brain; PATO quality: large size (dilated)
  Entity: trauma of brain; PATO quality: observed
  MP definition: excessive accumulation of cerebrospinal fluid in the brain, especially the cerebral ventricles, often leading to increased brain size and other brain trauma
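
The decomposition above pairs each MP term with one or more entity-quality (EQ) combinations. The small Python sketch below holds the two slide examples as data. It then answers one query that the decomposition makes possible: which MP terms involve a given entity. Text labels stand in for ontology IDs here. A fuller version would presumably use anatomy and PATO identifiers, as in the OBO examples. `EQ` and `terms_affecting` are hypothetical names.

```python
# Entity-quality pairs for two MP terms, as listed on the slide.
EQ = {
    "microphthalmia": [("eye", "small size")],
    "hydrocephaly": [
        ("cerebro-spinal fluid", "increased, excessive, accumulated"),
        ("brain", "large size (dilated)"),
        ("trauma of brain", "observed"),
    ],
}


def terms_affecting(entity, eq=EQ):
    """MP terms that have at least one EQ pair with the given entity."""
    return sorted(
        term for term, pairs in eq.items() if any(e == entity for e, _ in pairs)
    )
```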

Complex Examples:

id: MP:0006159 ! ocular albinism
intersection_of: PATO:0001558 ! lacking processual parts
intersection_of: inheres_in MA:0000261 ! eye
intersection_of: towards GO:0006582 ! melanin metabolic process

MP definition: absence of melanin (pigment) production in the eye with identifiable melanocytes present

id: MP:0006110 ! ventricular fibrillation
intersection_of: PATO:0000688 ! asynchronous
intersection_of: inheres_in CL:0000746 ! cardiac muscle cell
intersection_of: towards GO:0060048 ! cardiac muscle contraction
intersection_of: located_in MA:0000079 ! ventricle endocardium
intersection_of: located_in MA:0000082 ! ventricle myocardium

MP definition: asynchronous contraction or quivering of individual cardiac muscle fibers in the ventricles
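
The `intersection_of` lines follow the OBO convention for genus-differentia definitions. A line holding a single ID gives the genus (here a PATO quality), and each line with a relation plus an ID adds one differentium. The Python sketch below splits a stanza along those lines. `parse_intersections` is a hypothetical helper, and it ignores OBO features these examples do not use, such as trailing qualifiers.

```python
VF_STANZA = """
id: MP:0006110 ! ventricular fibrillation
intersection_of: PATO:0000688 ! asynchronous
intersection_of: inheres_in CL:0000746 ! cardiac muscle cell
intersection_of: towards GO:0060048 ! cardiac muscle contraction
intersection_of: located_in MA:0000079 ! ventricle endocardium
intersection_of: located_in MA:0000082 ! ventricle myocardium
"""


def parse_intersections(stanza):
    """Return (term_id, genus, differentia) from OBO tag-value lines.

    Text after '!' is a comment (the human-readable label) and is kept
    alongside each ID.
    """
    term_id, genus, differentia = None, None, []
    for line in stanza.splitlines():
        line = line.strip()
        if not line:
            continue
        tag, _, rest = line.partition(":")  # split on the first colon only
        value, _, label = rest.partition("!")
        value, label = value.strip(), label.strip()
        if tag == "id":
            term_id = value
        elif tag == "intersection_of":
            parts = value.split()
            if len(parts) == 1:
                genus = (parts[0], label)  # e.g. the PATO quality
            else:
                differentia.append((parts[0], parts[1], label))  # relation, target, label
    return term_id, genus, differentia
```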

Status of Phenotype & Disease Data, Nov 2008

Phenotype terms in MP ontology: 6,355

Phenotypic alleles cataloged: 21,996
Number of genes represented: 8,225
Targeted alleles: 13,549
Number of genes targeted: 5,547

Alleles with MP annotation: 19,458
Genotypes with MP annotation: 27,261
Total MP annotations: 137,577

Genotypes with OMIM associations: 2,520
OMIM with associated genotypes: 882

QTLs: 4,015

Strains: >10,500

Current QTL Display

Current QTL display


Genome coordinates: 132851306-135646474 (MGI Mouse GBrowse)

Changes planned for QTL Display

Need for a trait ontology

• What is measured
  – Blood pressure
  – % body fat
  – Coat color

• Annotation of
  – QTL
  – Strain characteristics / baseline
  – Measurements

Some issues

• specificity vs. breadth

• synchronizing with MP

• “how much” cross-species?

OBO-Edit, curation tool for building ontologies

Working on Trait Ontology

• MGI

• IMPC

• MPD

• RGD

• Domestic Species (Animal QTL)

Currently:

approx. 3,600 terms, built initially by stripping MP

working systematically on branches
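
The slide does not say exactly how MP was “stripped”. One plausible reading, sketched below purely as an assumption, is that qualifiers such as abnormal, increased, and decreased were removed, so that paired phenotype terms collapse onto the single trait being measured. The qualifier list, the example labels, and the function names are hypothetical.

```python
import re

# Hypothetical qualifier list: leading words that turn a trait into a phenotype.
QUALIFIERS = re.compile(r"^(abnormal|increased|decreased|absent)\s+")


def candidate_trait(mp_term):
    """Strip one leading qualifier from an MP-style label to suggest a trait label."""
    return QUALIFIERS.sub("", mp_term.strip().lower())


def candidate_traits(mp_terms):
    """Collapse labels that differ only by qualifier into one candidate each."""
    return sorted({candidate_trait(t) for t in mp_terms})
```

For example, “increased blood pressure” and “decreased blood pressure” both reduce to the single candidate trait “blood pressure”. A curator would still need to review each branch, which fits the systematic, branch-by-branch work the slide describes.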

MGI Phenotype Data Staff

Anna Anagnostopoulos, Randal P. Babiuk, Susan M. Bello, Donna L. Burkart, Howard Dene, Michelle Knowlton, Ira Lu, Hiroaki Onda, Cynthia L. Smith, Monika Tomczuk, Linda L. Washburn

Jonathan S. Beal, Kim L. Forthofer, Peter Frost

The End

NHGRI grant HG000330