Expanding the Clinical Phenotype Space with Semantics and Model Systems. Melissa Haendel, March 14th, 2014. Updates in Clinical Genetics 2014


Page 1: Haendel clingenetics.3.14.14

Expanding the Clinical Phenotype Space with Semantics and Model Systems

Melissa Haendel

March 14th, 2014

Updates in Clinical Genetics 2014

Page 2: Haendel clingenetics.3.14.14

Outline

Issues in candidate prioritization

Computational techniques for comparing phenotypes

Undiagnosed Disease Program semantic phenotyping

Minimum phenotype requirements

Tools leveraging phenotypes

Page 3: Haendel clingenetics.3.14.14

The Challenge: Interpretation of Disease Candidates

What’s in the box? How are candidates identified? How do they compare?

Prioritized candidates, models, functional validation: M1, M2, M3, M4, ...

Phenotypes: P1, P2, P3

Genotype info: G1, G2, G3, G4

Pathogenicity, frequency, protein interactions, gene expression, gene networks, epigenomics, metabolomics….

Page 4: Haendel clingenetics.3.14.14

Candidate gene prioritization

Page 5: Haendel clingenetics.3.14.14

Models recapitulate various phenotypic aspects of disease

GENOTYPE: ALMS1(NM_015120.4)[c.10775delC] + [-]
PHENOTYPE: obesity, diabetes mellitus, insulin resistance

GENOTYPE: B6.Cg-Alms1foz/fox/J
PHENOTYPE: increased weight, adipose tissue volume, glucose homeostasis altered

GENOTYPE: kcnj11c14/c14; insrt143/+(AB)
PHENOTYPE: increased food intake, hyperglycemia, insulin resistance

Page 6: Haendel clingenetics.3.14.14

Searching for phenotypes using text alone is insufficient

OMIM Query                  # Records
“large bone”                785
“enlarged bone”             156
“big bone”                  16
“huge bones”                4
“massive bones”             28
“hyperplastic bones”        12
“hyperplastic bone”         40
“bone hyperplasia”          134
“increased bone growth”     612
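The table above suggests why identifiers help: if each free-text variant resolves to one term, a single query covers all of them. The following is a minimal sketch under stated assumptions. The term ID `EX:0000001` is a made-up placeholder, and treating all nine strings as synonyms of one term is an illustration, not a claim about any real ontology.

```python
from typing import Optional

# Hypothetical placeholder ID; not a real HPO or OMIM identifier.
PLACEHOLDER_TERM = "EX:0000001"

# The nine free-text queries from the slide.
FREE_TEXT_VARIANTS = [
    "large bone", "enlarged bone", "big bone", "huge bones",
    "massive bones", "hyperplastic bones", "hyperplastic bone",
    "bone hyperplasia", "increased bone growth",
]

# Assumption for illustration: every variant is a synonym of one term.
SYNONYM_TO_TERM = {text: PLACEHOLDER_TERM for text in FREE_TEXT_VARIANTS}


def normalize(query: str) -> Optional[str]:
    """Map a free-text query to a term ID, or None if it is unknown."""
    key = query.strip().strip('"“”').lower()
    return SYNONYM_TO_TERM.get(key)
```

With such a mapping, records annotated with the term ID are retrieved by one lookup regardless of which wording the searcher types.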

Page 7: Haendel clingenetics.3.14.14

Problem: Clinical and model phenotypes are described differently

Page 8: Haendel clingenetics.3.14.14

“Expanding” the phenotypic coverage of the human genome

[Bar chart. Y axis: % human coding genes (0% to 100%). Categories: OMIM, OMIM+GWAS, GWAS. Series: Ortholog only, Human+Ortholog, Human only]

Five model organisms (mouse, zebrafish, fly, yeast, rat) provide almost 80% phenotypic coverage of the human genome

Page 9: Haendel clingenetics.3.14.14

How can we take advantage of this model organism phenotype data?

Page 10: Haendel clingenetics.3.14.14

Outline

Issues in candidate prioritization

Computational techniques for comparing phenotypes

Undiagnosed Disease Program semantic phenotyping

Minimum phenotype requirements

Tools leveraging phenotypes

Page 11: Haendel clingenetics.3.14.14

Using ontologies to compare phenotypes across species

Washington, N. L., Haendel, M. A., Mungall, C. J., Ashburner, M., Westerfield, M., & Lewis, S. E. (2009). Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation. PLoS Biol, 7(11). doi:10.1371/journal.pbio.1000247

Page 12: Haendel clingenetics.3.14.14

What is an ontology?

A set of logically defined, inter-related terms used to annotate data

Use of common or logically related terms across databases enables integration

Relationships between terms allow annotations to be grouped in scientifically meaningful ways

Reasoning software enables computation of inferred knowledge

Groups of annotations can be compared using semantic similarity algorithms

Page 13: Haendel clingenetics.3.14.14

An ontology provides the logical basis of classification

Any sense organ that functions in the detection of smell is an olfactory sense organ

olfactory sense organ = sense organ and (capable_of some detection of smell)

Page 14: Haendel clingenetics.3.14.14

Classifying

nose is_a sense organ

nose capable_of some detection of smell

olfactory sense organ = sense organ and (capable_of some detection of smell)

=> These are necessary and sufficient conditions

Inferred: nose is_a olfactory sense organ
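The classification step on this slide can be sketched in a few lines. This is a toy illustration, not how an OWL reasoner is implemented: the `Entity` class and the `classify` function are hypothetical names, and only the single definition from the slide is encoded.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    types: set = field(default_factory=set)       # asserted classes
    capable_of: set = field(default_factory=set)  # asserted capabilities


def is_olfactory_sense_organ(e: Entity) -> bool:
    # Definition from the slide, read as necessary and sufficient:
    # sense organ AND capable_of some detection of smell.
    return "sense organ" in e.types and "detection of smell" in e.capable_of


def classify(e: Entity) -> set:
    """Return asserted classes plus any class inferred from the definition."""
    inferred = set(e.types)
    if is_olfactory_sense_organ(e):
        inferred.add("olfactory sense organ")
    return inferred


nose = Entity("nose", {"sense organ"}, {"detection of smell"})
# The eye is an added contrast case, not taken from the slide.
eye = Entity("eye", {"sense organ"}, {"detection of light"})
```

Nobody asserted that the nose is an olfactory sense organ; it follows from the definition, which is the point of having logically defined terms.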

Page 15: Haendel clingenetics.3.14.14

Representing phenotypes

Page 16: Haendel clingenetics.3.14.14

Human Phenotype Ontology

Used to annotate, in human:
• Patients
• Disorders
• Genotypes
• Genes
• Sequence variants

Köhler et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2014 Jan 1;42(1):D966-74.

Page 17: Haendel clingenetics.3.14.14

Mammalian Phenotype Ontology

Smith et al. (2005). The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information. Genome Biol, 6(1). doi:10.1186/gb-2004-6-1-r7

Used to annotate and query, in mice:
• Genotypes
• Alleles
• Genes

Page 18: Haendel clingenetics.3.14.14

Post-composed models of phenotype annotation

Entity / Quality

Anatomy: head / Small size

Anatomy: heart / Edematous

Anatomy: ventral mandibular arch / Thick

Gene Ontology: swim bladder inflation / Arrested
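A post-composed annotation pairs an entity term with a quality term at annotation time. The sketch below models that as a small record type. The `EQ` type and helper names are hypothetical, and the row-by-row pairing of entities with qualities is an assumption based on the order in which they appear on the slide.

```python
from typing import List, NamedTuple


class EQ(NamedTuple):
    entity_source: str  # ontology the entity term comes from
    entity: str
    quality: str


# Pairing assumed from slide order (first entity with first quality, etc.).
ANNOTATIONS = [
    EQ("Anatomy", "head", "small size"),
    EQ("Anatomy", "heart", "edematous"),
    EQ("Anatomy", "ventral mandibular arch", "thick"),
    EQ("Gene Ontology", "swim bladder inflation", "arrested"),
]


def describe(eq: EQ) -> str:
    """Render an entity-quality pair as a readable phenotype label."""
    return f"{eq.entity}: {eq.quality}"


def from_source(annotations: List[EQ], source: str) -> List[EQ]:
    """Select annotations whose entity comes from the given ontology."""
    return [a for a in annotations if a.entity_source == source]
```

Because the entity and quality stay separate, annotations can be grouped by either axis, for example all phenotypes affecting anatomy versus those affecting a biological process.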

Page 19: Haendel clingenetics.3.14.14

A human phenotype example

Abnormality of the eye

Vitreous hemorrhage

Abnormal eye morphology

Abnormality of the cardiovascular system

Abnormal eye physiology

Hemorrhage of the eye

Internal hemorrhage

Abnormality of the globe

Abnormality of blood circulation

Page 20: Haendel clingenetics.3.14.14

lung

lung

lobular organ

parenchymatous organ

solid organ

pleural sac

thoracic cavity organ

thoracic cavity

abnormal lung morphology

abnormal respiratory system morphology

Mammalian Phenotype

Mouse Anatomy

FMA

abnormal pulmonary acinus morphology

abnormal pulmonary alveolus morphology

lung alveolus

organ system

respiratory system

Lower respiratory tract

alveolar sac

pulmonary acinus

organ system

respiratory system

Human development

lung

lung bud

respiratory primordium

pharyngeal region

Problem: Data silos

develops_from

part_of

is_a (SubClassOf)

surrounded_by

Page 21: Haendel clingenetics.3.14.14

Solution: bridging semantics

Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5

anatomical structure

endoderm of foregut

lung bud

lung

respiration organ

organ

foregut

alveolus

alveolus of lung

organ part

FMA:lung

MA:lung

endoderm

GO: respiratory gaseous exchange

MA:lung alveolus

FMA: pulmonary alveolus

is_a (taxon equivalent)

develops_from

part_of

is_a (SubClassOf)

capable_of

NCBITaxon: Mammalia

EHDAA:lung bud

only_in_taxon

pulmonary acinus

alveolar sac

lung primordium

swim bladder

respiratory primordium

NCBITaxon:Actinopterygii

Köhler et al. (2014) Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research F1000Research 2014, 2:30

Page 22: Haendel clingenetics.3.14.14

Phenotype representation requires more than “phenotype ontologies”

Disease: type II diabetes mellitus (DOID:9352). Disease & phenotype data

Gene Ontology: glucose metabolism (GO:0006006). Gene/protein function data

Chemical: glucose (CHEBI:17234), pyruvate (CHEBI:15361). Metabolomics, toxicogenomics data

Cell: pancreatic beta cell (CL:0000169). Transcriptomic data

Page 23: Haendel clingenetics.3.14.14

OWLsim: Phenotype similarity across patients or organisms

https://code.google.com/p/owltools/wiki/OwlSim
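To make "phenotype similarity" concrete, here is a minimal sketch of one common family of measures: Jaccard similarity over the ancestor closure of two phenotype profiles. It is not the OWLSim API or necessarily its algorithm. The is_a edges in `PARENTS` are assumptions chosen for illustration using term names from the human phenotype example slide; they are not copied from HPO.

```python
# Assumed is_a edges for illustration only (child -> parents).
PARENTS = {
    "Vitreous hemorrhage": ["Hemorrhage of the eye"],
    "Hemorrhage of the eye": ["Abnormality of the eye", "Internal hemorrhage"],
    "Internal hemorrhage": ["Abnormality of blood circulation"],
    "Abnormality of blood circulation": ["Abnormality of the cardiovascular system"],
    "Abnormality of the eye": [],
    "Abnormality of the cardiovascular system": [],
}


def ancestors(term):
    """Return the term together with all of its is_a ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def closure(profile):
    """Union of ancestor sets for every term in a phenotype profile."""
    result = set()
    for term in profile:
        result |= ancestors(term)
    return result


def jaccard(profile_a, profile_b):
    """Shared ancestors divided by all ancestors of the two profiles."""
    a, b = closure(profile_a), closure(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Two profiles with no identical terms can still score above zero because they share more general ancestors, which is what lets a clinical description match a model organism description.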

Page 24: Haendel clingenetics.3.14.14

Outline

Issues in candidate prioritization

Computational techniques for comparing phenotypes

Undiagnosed Disease Program semantic phenotyping

Minimum phenotype requirements

Tools leveraging phenotypes

Page 25: Haendel clingenetics.3.14.14

General exome analysis

Single Exome

Remove off-target and common variants, filter on predicted deleteriousness, candidate gene strategies

Prioritize based on known genes, allele frequency, and pathogenicity

Homozygous recessive, X-linked, De novo (if trio)
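The inheritance filters named above can be expressed as simple genotype checks. This is a sketch under stated assumptions: genotypes are encoded as alternate-allele counts (0, 1, or 2), both parents are unaffected, and the function names are hypothetical. X-linked filtering and compound heterozygotes are left out for brevity.

```python
def is_de_novo(child: int, mother: int, father: int) -> bool:
    """Variant present in the child but absent from both parents (needs a trio)."""
    return child >= 1 and mother == 0 and father == 0


def is_homozygous_recessive(child: int, mother: int, father: int) -> bool:
    """Child carries two copies; each unaffected parent carries exactly one."""
    return child == 2 and mother == 1 and father == 1


def is_homozygous_single_exome(child: int) -> bool:
    """Without parents, only the child's own genotype can be checked."""
    return child == 2
```

The trio versions are stricter than the single-exome check, which is why family data reduces the candidate list so sharply.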

Page 26: Haendel clingenetics.3.14.14

Undiagnosed Disease Program exome analysis

Family exome data

Prioritize based on alignment quality, allele frequency, predicted deleteriousness, and PubMed

Filter using SNP chip data, Mendelian models of inheritance, and population frequency

Page 27: Haendel clingenetics.3.14.14

exome analysis

Recessive, De novo filters

Remove off-target, common variants, and variants not in known disease causing genes

Zemojtel et al., manuscript submitted

http://compbio.charite.de/PhenIX/

Page 28: Haendel clingenetics.3.14.14

Remove off-target and common variants

Recessive, De novo filters

https://www.sanger.ac.uk/resources/databases/exomiser/

Robinson et al. http://genome.cshlp.org/content/early/2013/10/25/gr.160325.113.abstract

Exomiser exome analysis

Page 29: Haendel clingenetics.3.14.14

Current UDP analysis with semantic phenotyping

Family Exome Data

CombinedScore

Phenotype Data

Filter using SNP chip data, Mendelian models of inheritance, and population frequency

Page 30: Haendel clingenetics.3.14.14

Benchmarking

1092 unaffected exomes

28,516 disease-associated variants

100,000 simulated exomes

Annotate variants

Remove off-target, synonymous, and common (>1% MAF) variants (plus optional inheritance model filtering)

Prioritize based on combined score
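The filter-then-rank steps above can be sketched as follows. The slide does not say how the combined score is computed, so the plain average used here is an assumption for illustration only, as are the `Variant` fields and all function names.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    maf: float              # minor allele frequency, 0..1
    on_target: bool
    synonymous: bool
    variant_score: float    # predicted pathogenicity, 0..1
    phenotype_score: float  # phenotypic relevance of the gene, 0..1


def keep(v: Variant, max_maf: float = 0.01) -> bool:
    """Drop off-target, synonymous, and common (>1% MAF) variants."""
    return v.on_target and not v.synonymous and v.maf <= max_maf


def combined_score(v: Variant) -> float:
    # ASSUMPTION: simple mean; the slide does not give the real formula.
    return (v.variant_score + v.phenotype_score) / 2


def prioritize(variants):
    """Rank genes by the best combined score among their surviving variants."""
    best = {}
    for v in variants:
        if keep(v):
            best[v.gene] = max(best.get(v.gene, 0.0), combined_score(v))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

A usage example with made-up genes: a rare, damaging variant in a phenotypically relevant gene outranks an equally rare variant in a gene with little phenotypic support, while common and synonymous variants never reach the ranking.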

Page 31: Haendel clingenetics.3.14.14

[Bar chart. Y axis: % exomes with disease gene as top hit (0 to 100). Categories: All diseases, Autosomal Dominant, Autosomal Recessive (hom), Autosomal Recessive (compound het). Series: Variant, Phenotypic relevance, PHIVE]

Phenotype and variant data synergistically improve exome interpretation

Page 32: Haendel clingenetics.3.14.14

Results

Correct gene as top scoring hit in 68.3% of exomes, out of an average of 272 post-filtering candidate genes

Improvement of between 1.8 and 5.1 fold in the percentage of candidate genes correctly ranked in first place compared to just using pathogenicity and frequency data

Shows utility of structured phenotype data for computational analysis

Page 33: Haendel clingenetics.3.14.14

UDP Experiment

UDP Diploid Aligned Cohort

VCF file, 18 families

Phenotype profiles

Mendelian filtered files (per family)

Mendelian Filters

Exomiser

PhenIX

Phenotype only

VCF files with phenotype and variant scores (per family)

Page 34: Haendel clingenetics.3.14.14

Top de novo candidates for patient 2543

Patient: UDP2543

Exomiser: STIM1, CYP2D6, MUC5B

Phenotype only: ITGA7, PLEC, STIM1, PTGS1, TTN

PhenIX: STIM1, RB1, DLEC1, CHRNB4, MUC5B, REPIN1, NBPF8, GPRIN3, TMEFF1, FLT3LG, OSM, FZD10, MUC12

Gene: STIM1
Variant: chr11:g.4045175A>T [0/1]
MAF (ESP or 1000g): 0%
Consequence: p.I115T
Predicted pathogenicity: SIFT, PolyPhen, MutTaster (0-1): 1

Page 35: Haendel clingenetics.3.14.14

UDP2543: phenotypic similarity

Columns: Patient | Stim1 het mouse | OMIM:612783 (IMMUNODEFICIENCY 10) - hom STIM1 mutations | OMIM:160565 (MYOPATHY, TUBULAR AGGREGATE) - het STIM1 mutations

Impaired platelet aggregation

abnormal platelet activation

Thrombocytopenia

Thrombocytopenia decreased platelet cell number

Thrombocytopenia

Myopathy Myopathy Myopathy

Generalized hypotonia

Muscular hypotonia Proximal muscle weakness

Petechiae increased bleeding time

Autoimmune hemolytic anemia

Delayed gross motor development

Epistaxis increased bleeding time

Gower sign

Page 36: Haendel clingenetics.3.14.14

STIM1

Page 37: Haendel clingenetics.3.14.14

Proposed workflow for undiagnosed diseases

Page 38: Haendel clingenetics.3.14.14

What constitutes an adequate phenotype annotation for an undiagnosed patient?

Page 39: Haendel clingenetics.3.14.14

Defining minimum phenotype standard:

1. Is the annotation specificity similar to or better than the corpus of available phenotype data?

2. Is the number of annotations/patient similar or better?

3. How does the ontology and annotation set differ across anatomical systems in terms of granularity? Does this change specificity requirements for phenotypic profiles?

4. How does use of NOT annotations help further specify the uniqueness of an undiagnosed patient?

5. How do onset, temporal ordering, and severity affect specificity?

Page 40: Haendel clingenetics.3.14.14

UDP phenotype annotation metrics

UDP annotations have a similar information content (IC) and a larger average number of annotations per disease/patient
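Information content is conventionally defined as the negative log of how often a term is used in an annotation corpus, so rare, specific terms score high and general terms score near zero. A minimal sketch of that definition follows; the corpus counts are invented for illustration, and the function names are hypothetical.

```python
import math


def information_content(term_counts, total):
    """IC(t) = -log2(p(t)), where p(t) is the fraction of annotated items
    carrying term t or any of its descendants."""
    return {t: -math.log2(n / total) for t, n in term_counts.items() if n > 0}


def mean_ic(profile, ic):
    """Average IC of the terms in one patient or disease profile."""
    return sum(ic[t] for t in profile) / len(profile)


# Invented counts: 8 annotated items in a toy corpus.
TOY_COUNTS = {
    "Phenotypic abnormality": 8,  # root term, annotates everything
    "Myopathy": 2,
    "Gower sign": 1,
}
TOY_IC = information_content(TOY_COUNTS, total=8)
```

Comparing the mean IC and the number of terms per profile against the corpus gives two of the specificity measures listed for the minimum phenotype standard.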

Page 41: Haendel clingenetics.3.14.14

Anatomical annotation distribution in the corpus

Nervous system, skeletal system, and immune system are highest => these categories require greater specificity and numbers of annotations

Page 42: Haendel clingenetics.3.14.14

Annotation specificity meter

What about common traits, like blue eyes or acne?

Page 43: Haendel clingenetics.3.14.14

Making the patient phenotype profiles as good as can be

Total requests from UDP: 614

Number of requests assigned to HPO terms: 423. Example: Chronic limb pain -> limb pain

Number of terms that need consideration by UDP: 145. Example: Expressive language -> delay? Increase? Abnormal?

Number of requests that belong in other parts of the patient record: 68. Example: Abnormal aCGH 12q21.1-12q.2 (662 kb duplication) paternal origin -> move to genotype information portion of the record

Contributing requests to the ontologies is a community effort, and quality profiling helps make our tools work better for everyone

Page 44: Haendel clingenetics.3.14.14

Limitations and ongoing work

Adding negation to the algorithm

Temporal ordering of phenotypes

Leveraging severity, expressivity, and penetrance data

Page 45: Haendel clingenetics.3.14.14

Additional tools leveraging structured phenotype data

Page 46: Haendel clingenetics.3.14.14

The Monarch system

http://monarchinitiative.org

Page 47: Haendel clingenetics.3.14.14

Monarch phenotype data

Species      Source     Unique genotypes/variants   Disease/phenotype associations
Mouse        MGI        53,573                      406,618
Zebrafish    ZFIN       14,703                      75,698
C. elegans   Wormbase   116,106                     411,154
Fruit fly    Flybase    98,596                      265,329
Human        OMIM       26,372                      27,798
Human        Orphanet   2,872                       5,095
Human        ClinVar    62,437                      178,424

Also in the system: Rat; IMPC; GO annotations; Coriell cell lines; OMIA; MPD; Yeast; CTD; GWAS; Panther, Homologene orthologs; BioGrid interactions; Drugbank; AutDB; Allen Brain …157 sources to date

Coming soon: Animal QTLs for pig, cattle, chicken, sheep, trout, dog, horse

Page 48: Haendel clingenetics.3.14.14

ModelCompare: How do the models recapitulate the disease?

Late-onset Parkinson’s Phenotypes Mouse Phenotypes

Page 49: Haendel clingenetics.3.14.14

Network phenotype distribution

Network nodes: Slc6a3, Dbh, Tyrosine metabolism, Slc18a2, Uchl1, Uchl3, Snca, Mfn2, Cx IV, Cox8a, Th

Late-onset Parkinson’s phenotypes (subset): Bradykinesia, Depression, Dysphagia, Lewy bodies

Page 50: Haendel clingenetics.3.14.14

Phenotypes in common

Network nodes: Slc6a3, Dbh, Tyrosine metabolism, Slc18a2, Uchl1, Uchl3, Snca, Mfn2, Cx IV, Cox8a, Th

Late-onset Parkinson’s phenotypes (subset): Bradykinesia, Depression, Dysphagia, Lewy bodies

Other phenotype labels shown: Abnormal gait, ataxia, paralysis, Bradykinesia, Abnormal locomotion, Abnormality of central motor function

Page 51: Haendel clingenetics.3.14.14

Finding collaborators for functional validation

Patient phenotype profile

Phenotyping experts

Page 52: Haendel clingenetics.3.14.14

Exome Walker: Network based exploration of phenotypically similar diseases

http://compbio.charite.de/ExomeWalker/

Walking the interactome for prioritization of candidate disease genes. Am J Hum Genet. 2008 Apr;82(4):949-58. doi: 10.1016/j.ajhg.2008.02.013.

Bare Lymphocyte Syndrome Type 1 Protein-Interaction Network

Exploits vicinity in the protein interaction network between phenotypically related diseases and uses this to rank exome candidates

Large boost in rankings of candidate genes using 250 disease gene-families

Prototype version online, manuscript in preparation
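One standard way to quantify "vicinity" in an interaction network is a random walk with restart from known disease genes. The slide does not state which algorithm the tool uses, so the sketch below is an assumption for illustration. The gene names and the tiny network are hypothetical, and every node is assumed to have at least one neighbor.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, iterations=100):
    """Score each node by the steady-state probability that a walker,
    who jumps back to a seed gene with probability `restart`, is found there."""
    nodes = list(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(iterations):
        nxt = {n: restart * start[n] for n in nodes}
        for n in nodes:
            neighbors = adjacency[n]
            if not neighbors:
                continue
            share = (1.0 - restart) * prob[n] / len(neighbors)
            for m in neighbors:
                nxt[m] += share
        prob = nxt
    return prob


# Hypothetical undirected chain: SEED - CAND_A - CAND_B - CAND_C
TOY_NETWORK = {
    "SEED": ["CAND_A"],
    "CAND_A": ["SEED", "CAND_B"],
    "CAND_B": ["CAND_A", "CAND_C"],
    "CAND_C": ["CAND_B"],
}
TOY_SCORES = random_walk_with_restart(TOY_NETWORK, seeds={"SEED"})
```

Exome candidates would then be ranked by their score, so a candidate that interacts directly with genes for phenotypically related diseases rises above one that sits several steps away.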

Page 53: Haendel clingenetics.3.14.14

PhenoViz: Integrate all human, mouse, and fish data to understand CNVs

Desktop application for differential diagnostics in CNVs

Explain manifestations of CNV diseases based on genes contained in CNV

E.g., supravalvular aortic stenosis in Williams syndrome can be explained by haploinsufficiency for elastin

Double the number of explanations using model data

Doelken, Köhler, et al. (2013) Dis Model Mech 6:358-72

Page 54: Haendel clingenetics.3.14.14

Conclusions

Cross-species phenotype data can be used to compute semantic similarity

Structured phenotype data for rare and undiagnosed disease patients can aid candidate evaluation

We are experimenting with these methods for UDP patient phenotypes to aid candidate prioritization, identify models, explore mechanisms, and find collaborators

Page 55: Haendel clingenetics.3.14.14

Acknowledgments

NIH-UDP: William Bone, Murat Sincan, David Adams, Amanda Links, David Draper, Neal Boerkoel, Cyndi Tifft, Bill Gahl

OHSU: Nicole Vasilevsky, Matt Brush

Lawrence Berkeley: Nicole Washington, Suzanna Lewis, Chris Mungall

UCSD: Amarnath Gupta, Jeff Grethe, Anita Bandrowski, Maryann Martone

U of Pitt: Chuck Borromeo, Jeremy Espino, Harry Hochheiser

Sanger: Anika Oellrich, Jules Jacobsen, Damian Smedley

Toronto: Marta Girdea, Sergiu Dumitriu, Mike Brudno

JAX: Cynthia Smith

Charité: Sebastian Köhler, Sandra Doelken, Sebastian Bauer, Peter Robinson

Funding: NIH Office of Director: 1R24OD011883; NIH-UDP: HHSN268201300036C