@monarchinit @ontowonka
“Not everyone can become a great artist, but a great artist can come from anywhere”
Anton Ego, Ratatouille, 2007, Disney/Pixar
Envisioning a world where everyone helps solve disease
Melissa Haendel
SWAT4LS 2015
Cambridge, England
Faith-based research
“I believe that my work on some obscure cell type in some obscure organism will matter to mankind one day”
Well, it can, and it does.
Four things it takes to solve an undiagnosed disease
1. Deep phenotyping the human organism
2. Crossing the language barrier
3. A lot of data from a lot of places
4. Very many people (who have faith)
1. DEEP PHENOTYPING THE HUMAN ORGANISM
Figure: Patient -> Genome/Exome -> Genomic data (ATCTTAGCACGTTAC…) -> Filter -> Diagnosis, treatment
What do all those variations do?
We only know the phenotypic consequences of mutation of <20% of the human coding genome
Figure: Patient -> Genome/Exome, Phenotype, Environment -> Filter (genomic data, gene-phenotype data) -> Diagnosis, treatment
We have a common language for sequence data (…ATCTTAGCACGTTAC…), not so much for phenotypes
CC2.0 European Southern Observatory https://www.flickr.com/photos/esoastronomy/6923443595
Can we help machines understand phenotypes?
Human phenotype: “Palmoplantar hyperkeratosis”
Machine: “I have absolutely no idea what that means???”
Image credits:
"HandsEBS" by James Heilman, MD - Own work. Licensed under CC BY-SA 3.0 via Commons – https://commons.wikimedia.org/wiki/File:HandsEBS.JPG#/media/File:HandsEBS.JPG
Marcin Wichary [CC BY 2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons
A disease is a collection of phenotypes
Patient
Disease X
Differential diagnosis with similar but non-matching phenotypes is difficult
Flat back of head Hypotonia
Abnormal skull morphology Decreased muscle mass
Do we *really* need yet another clinical vocabulary?
Winnenburg and Bodenreider, ISMB PhenoDay, 2014
UMLS, SNOMED CT, CHV, MedDRA, MeSH, NCIT, ICD10-C, ICD9-CM, ICD-10, OMIM, MedlinePlus
Existing clinical vocabularies don’t adequately cover phenotype descriptions
Disease-phenotype associations using an ontology
Once OMIM is rendered computable, are we done yet?
Free text -> HPO enables phenotype semantic similarity matching
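One way to picture semantic similarity matching is to compare two phenotype profiles after expanding each term to its ancestors, so that similar but non-matching terms still overlap. The sketch below uses Jaccard similarity over ancestor closures as an illustrative measure; the slides do not say which measure is used, and the small hierarchy is a made-up toy, not the real HPO graph.

```python
# Toy is-a hierarchy (child -> parents). Illustrative only, NOT the real HPO.
PARENTS = {
    "Flat back of head": ["Abnormal skull morphology"],
    "Abnormal skull morphology": ["Phenotypic abnormality"],
    "Hypotonia": ["Abnormal muscle physiology"],
    "Decreased muscle mass": ["Abnormal muscle morphology"],
    "Abnormal muscle physiology": ["Abnormality of the musculature"],
    "Abnormal muscle morphology": ["Abnormality of the musculature"],
    "Abnormality of the musculature": ["Phenotypic abnormality"],
    "Phenotypic abnormality": [],
}


def closure(term):
    """Return the term together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def profile_closure(terms):
    """Union of the ancestor closures of every term in a profile."""
    expanded = set()
    for term in terms:
        expanded |= closure(term)
    return expanded


def jaccard(profile_a, profile_b):
    """Jaccard similarity of two profiles after ancestor expansion."""
    a, b = profile_closure(profile_a), profile_closure(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


patient = ["Flat back of head", "Hypotonia"]
disease_x = ["Abnormal skull morphology", "Decreased muscle mass"]

exact_matches = len(set(patient) & set(disease_x))  # string matching finds nothing
score = jaccard(patient, disease_x)                 # the hierarchy finds shared ancestors
print(exact_matches, score)
```

With plain string comparison the two profiles share no terms, while the expanded profiles overlap on their common ancestors, which is the point of mapping free text to an ontology first.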
Mendelian disease integration
Merges sources together using:
equivalence and subclass axioms derived from xrefs
string matching
manual efforts to fill gaps based on phenotypes and anatomical axioms
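As a rough illustration of the xref-derived equivalence step only, the sketch below groups disease records from different sources into cliques whenever they are linked by a cross-reference. The record identifiers are hypothetical placeholders, and treating every xref as an equivalence is a simplifying assumption; subclass axioms, string matching and manual curation are not modelled.

```python
from collections import defaultdict

# Hypothetical records: source-prefixed disease ID -> xrefs it declares.
RECORDS = {
    "SRC_A:001": ["SRC_B:9"],
    "SRC_B:9": ["SRC_C:x42"],
    "SRC_C:x42": [],
    "SRC_A:002": [],
}


def find(parent, node):
    """Union-find root lookup with path halving."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def equivalence_cliques(records):
    """Group IDs connected by xrefs (assumed here to mean equivalence)."""
    parent = {}
    for record_id, xrefs in records.items():
        for node in [record_id, *xrefs]:
            parent.setdefault(node, node)
    for record_id, xrefs in records.items():
        for xref in xrefs:
            root_a, root_b = find(parent, record_id), find(parent, xref)
            if root_a != root_b:
                parent[root_b] = root_a
    groups = defaultdict(set)
    for node in parent:
        groups[find(parent, node)].add(node)
    return sorted(sorted(group) for group in groups.values())


cliques = equivalence_cliques(RECORDS)
print(cliques)
```

Each resulting clique would be a candidate for a single merged disease class, subject to the manual review the slide mentions.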
Parkinson’s disease subtypes
Different colors = different disease sources
https://github.com/monarch-initiative/monarch-disease-ontology
Why we need all the organisms
Model data can provide up to 80% phenotypic coverage of the human coding genome
We learn different things from different organisms
2. CROSSING THE LANGUAGE BARRIER
Ulcerated paws
Palmoplantar hyperkeratosis
Thick hand skin
Image credits:
"HandsEBS" by James Heilman, MD - Own work. Licensed under CC BY-SA 3.0 via Commons – https://commons.wikimedia.org/wiki/File:HandsEBS.JPG#/media/File:HandsEBS.JPG
http://www.guinealynx.info/pododermatitis.html
Challenge: Each database uses its own vocabulary/ontology
MP, HP
MGI, HPOA
ZFA, MP, DPO, WPO, HP, OMIA, VT, FYPO, APO, SNOMED, …
WB, PB, FB, OMIA, MGI, RGD, ZFIN, SGD, HPOA, IMPC, OMIM, ICD, QTLdb, EHR
Decomposition of complex concepts allows interoperability
Mungall, C. J., Gkoutos, G., Smith, C., Haendel, M., Lewis, S., & Ashburner, M. (2010). Integrating phenotype ontologies across multiple species. Genome Biology, 11(1), R2. doi:10.1186/gb-2010-11-1-r2
Human phenotype “Palmoplantar hyperkeratosis” = increased (PATO) + stratum corneum layer of skin (Uberon) + autopod (Uberon) + keratinization (GO)
Species neutral ontologies, homologous concepts
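The decomposition can be sketched as a set of components drawn from species-neutral ontologies, so that phenotypes recorded in different vocabularies become comparable through the parts they share. The human decomposition below follows the slide; the second phenotype and its components are a hypothetical example added for illustration, and labels are used instead of ontology identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    ontology: str  # species-neutral source ontology, e.g. "PATO", "Uberon", "GO"
    label: str


# Decomposition of the human phenotype as shown on the slide.
PALMOPLANTAR_HYPERKERATOSIS = frozenset({
    Component("PATO", "increased"),
    Component("Uberon", "stratum corneum layer of skin"),
    Component("Uberon", "autopod"),
    Component("GO", "keratinization"),
})

# Hypothetical model-organism phenotype, decomposed for illustration only.
HYPOTHETICAL_PAW_PHENOTYPE = frozenset({
    Component("PATO", "increased"),
    Component("Uberon", "stratum corneum layer of skin"),
    Component("Uberon", "autopod"),
})


def shared_components(phenotype_a, phenotype_b):
    """Components that two decomposed phenotypes have in common."""
    return phenotype_a & phenotype_b


shared = shared_components(PALMOPLANTAR_HYPERKERATOSIS, HYPOTHETICAL_PAW_PHENOTYPE)
shared_labels = sorted(component.label for component in shared)
print(shared_labels)
```

Because "autopod" covers both hands and paws, the two phenotypes overlap even though their species-specific labels never match as strings.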
Cross-species ontology integration
3. A LOT OF DATA FROM A LOT OF PLACES
Figure: Input (diverse G2P/D source data, source ontologies) -> Pipeline (.ttl files, OWL loader) -> Output (graph views; Monarch app: faceted browsing, phenotype matching)
Putting it Together: Data + Ontologies
https://github.com/SciGraph/SciGraph
Data Integrated in SciGraph
>25 sources
>100 species
51M triples
4M curated associations
2.2M G-P / G-D associations
Genotype-phenotype integration
Figure: share of G2P associations by number of sources (one source, two sources, 3 or more)
9% of associations come from one source
91% of our 2.2 million G2P associations required integrating 2 or more data sources (this number does not even include orthology (Panther))
Combining genotype and phenotype data for variant prioritization
Whole exome
Remove off-target and common variants
Variant score from allele freq and pathogenicity
Phenotype score from phenotypic similarity
PHIVE score to give final candidates
Mendelian filters
https://www.sanger.ac.uk/resources/software/exomiser/
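The steps above can be sketched as a small filter-and-rank function. This is not the Exomiser implementation: the frequency cutoff, the way the variant score is formed, and the equal-weight average used for the combined score are all assumptions made for illustration, the gene names are placeholders, and the Mendelian filter step is left out.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    allele_freq: float    # population allele frequency, 0..1
    pathogenicity: float  # predicted pathogenicity, 0..1
    on_target: bool


MAX_ALLELE_FREQ = 0.01  # assumed cutoff for "common" variants


def variant_score(variant):
    """Assumed score: rarer and more pathogenic variants score higher."""
    rarity = 1.0 - min(variant.allele_freq / MAX_ALLELE_FREQ, 1.0)
    return rarity * variant.pathogenicity


def prioritize(variants, phenotype_score_by_gene):
    """Remove off-target/common variants, then rank by a combined score."""
    kept = [v for v in variants if v.on_target and v.allele_freq <= MAX_ALLELE_FREQ]
    scored = []
    for v in kept:
        phenotype_score = phenotype_score_by_gene.get(v.gene, 0.0)
        combined = (variant_score(v) + phenotype_score) / 2  # assumed equal weights
        scored.append((combined, v.gene))
    return sorted(scored, reverse=True)


variants = [
    Variant("GENE_A", 0.0, 0.9, True),
    Variant("GENE_B", 0.0, 1.0, True),
    Variant("GENE_C", 0.2, 1.0, True),   # common variant: removed
    Variant("GENE_D", 0.0, 1.0, False),  # off-target: removed
]
# Phenotype scores would come from patient-to-model phenotypic similarity.
phenotype_scores = {"GENE_A": 0.8, "GENE_B": 0.1}

ranked = prioritize(variants, phenotype_scores)
print(ranked)
```

In this toy run GENE_B has the more damaging variant, but GENE_A ranks first because its phenotype score is much higher, which is the effect of adding phenotype data to variant filtering.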
York platelet syndrome and STIM1
Markello T et al. Molecular Genetics and Metabolism 2015, 114: 474
Grosse J, J Clin Invest 2007 117: 3540-50
Patient UDP_2542: Impaired platelet aggregation (HP:0003540); Thrombocytopenia (HP:0001873)
Mouse Stim1Sax/Sax: Abnormal platelet activation (MP:0006298); Thrombocytopenia (MP:0003179)
http://www.nature.com/gim/journal/vaop/ncurrent/full/gim2015137a.html
4. VERY MANY PEOPLE (WHO HAVE FAITH)
Who helped solve the STIM1 UDP_2542 case?
Credit extends beyond the publication
Johannes creates stim1 mouse
Melissa annotates patient UDP_2542 with HPO
Will performs analysis of UDP_2542 that includes stim1 mouse to generate a dataset of prioritized variants
Tom writes publication pmid:25577287 about the STIM1 diagnosis
Tom explicitly credits Will as an author but not Melissa.
Credit is connected
Credit to Will is asserted, but credit to Melissa can be inferred
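Inferring credit amounts to walking a graph of contributions. The sketch below encodes the four contributions from the STIM1 example and follows "builds on" links from the publication back to everything it depends on. The edge from the analysis to the HPO annotation is an assumption about how the pieces connect, and the asserted author set contains only what the slide states.

```python
# Each contribution lists the contributions it builds on.
BUILDS_ON = {
    "publication pmid:25577287": ["analysis of UDP_2542"],
    "analysis of UDP_2542": ["HPO annotation of UDP_2542", "stim1 mouse"],
    "HPO annotation of UDP_2542": [],
    "stim1 mouse": [],
}

# Who made each contribution, as named on the slide.
CONTRIBUTOR = {
    "publication pmid:25577287": "Tom",
    "analysis of UDP_2542": "Will",
    "HPO annotation of UDP_2542": "Melissa",
    "stim1 mouse": "Johannes",
}

# Authorship the slide states explicitly (Tom wrote it and credits Will).
ASSERTED_AUTHORS = {"Tom", "Will"}


def inferred_credit(work):
    """Collect everyone whose contribution the given work depends on."""
    seen, stack, people = set(), [work], set()
    while stack:
        item = stack.pop()
        if item in seen:
            continue
        seen.add(item)
        people.add(CONTRIBUTOR[item])
        stack.extend(BUILDS_ON[item])
    return people


everyone = inferred_credit("publication pmid:25577287")
inferred_only = everyone - ASSERTED_AUTHORS
print(sorted(inferred_only))
```

The difference between the two sets is the credit that exists only in the graph, not on the author line.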
Who is in the graph?
Melissa Haendel, Peter Robinson, Chris Mungall, Sebastian Kohler, Cindy Smith, Nicole Vasilevsky, Sandra Dolken
Johannes Grosse, Attila Braun, David Varga-Szabo, Niklas Beyersdorf, Boris Schneider, Lutz Zeitlmann, Petra Hanke, Patricia Schropp, Silke Mühlstedt, Carolin Zorn, Michael Huber, Carolin Schmittwolf, Wolfgang Jagla, Philipp Yu, Thomas Kerkau, Harald Schulze, Michael Nehls, Bernhard Nieswandt
Thomas Markello, Dong Chen, Justin Y. Kwan, Iren Horkayne-Szakaly, Alan Morrison, Olga Simakova, Irina Maric, Jay Lozier, Andrew R. Cullinane, Tatjana Kilo, Lynn Meister, Kourosh Pakzad, Sanjay Chainani, Roxanne Fischer, Camilo Toro, James G. White, David Adams, Cornelius Boerkoel, William A. Gahl, Cynthia J. Tifft, Meral Gunay-Aygun
Melissa Haendel, David Adams, David Draper, Bailey Gallinger, Joie Davis, Nicole Vasilevsky, Heather Trang, Rena Godfrey, Gretchen Golas, Catherine Groden, Michele Nehrebecky, Ariane Soldatos, Elise Valkanas, Colleen Wahl, Lynne Wolfe
Elizabeth Lee, Amanda Links, Will Bone, Murat Sincan, Damian Smedley, Jules Jacobson, Nicole Washington, Elise Flynn, Sebastian Kohler, Orion Buske, Marta Girdea, Michael Brudno, Jeremy Band
Hans Goeble, Karen Balbach, Nadine Pfeifer, Sandra Werner, Christian Linden
Legend: Clinical/care, Pathology, Ontologist, CS/informatics, Curator, Basic research
Tracking Evidence and Provenance of G2P Associations
Evidence is a collection of information that is used to support a scientific claim or association
Provenance is a history of what processes led to the claim being made, and what entities participated in these processes
Value of Evidence and Provenance Metadata
context to evaluate credibility/confidence
support filtering and analysis of data
detailed history for attribution
Evidence and Provenance for a Variant-Phenotype Association
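A minimal sketch of how an association might carry its evidence and provenance, using the STIM1 case from earlier slides. The class and field names are hypothetical and do not reflect the actual Monarch data model; the subject string is a loose description, not a real variant identifier.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceStep:
    agent: str     # who participated
    activity: str  # what process they carried out


@dataclass
class Association:
    subject: str
    phenotype: str
    evidence: list = field(default_factory=list)    # information supporting the claim
    provenance: list = field(default_factory=list)  # history of how the claim was made

    def contributors(self):
        """Everyone named in the provenance trail, in order."""
        return [step.agent for step in self.provenance]

    def is_supported(self):
        """Simple filter: keep only associations with at least one piece of evidence."""
        return bool(self.evidence)


association = Association(
    subject="STIM1 variant in patient UDP_2542",
    phenotype="Thrombocytopenia (HP:0001873)",
    evidence=["pmid:25577287"],
    provenance=[
        ProvenanceStep("Melissa", "annotated patient UDP_2542 with HPO"),
        ProvenanceStep("Will", "performed analysis producing prioritized variants"),
        ProvenanceStep("Tom", "wrote publication pmid:25577287"),
    ],
)
print(association.contributors(), association.is_supported())
```

Keeping evidence and provenance as separate lists mirrors the distinction drawn above: one supports the claim, the other records how it came to be made and by whom.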
Who is missing?
http://haluzz.deviantart.com/art/Waldo-at-the-hipster-party-273602450
What about patients? Can they help too?
HP:0000252
Pref Label: Microcephaly
Synonyms: Decreased Head Circumference; Reduced Head Circumference; Small head circumference
Suggested Synonyms: Small Head; Little Head; Small Skull; Little Skull; Small Cranium…
Small head -> Microcephaly
https://commons.wikimedia.org/wiki/File:Microcephaly.png#/media/File:Microcephaly.png
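Lay synonyms let a patient's own words resolve to the same term a clinician would use. The sketch below builds a case-insensitive lookup from the label and synonyms listed for HP:0000252; the dictionary layout and function names are hypothetical, and only the synonyms visible on the slide are included.

```python
TERM = {
    "id": "HP:0000252",
    "label": "Microcephaly",
    "synonyms": [
        "Decreased Head Circumference",
        "Reduced Head Circumference",
        "Small head circumference",
    ],
    "suggested_synonyms": [
        "Small Head",
        "Little Head",
        "Small Skull",
        "Little Skull",
        "Small Cranium",
    ],
}


def normalize(text):
    """Lowercase and collapse whitespace so lookups ignore case and spacing."""
    return " ".join(text.lower().split())


def build_index(terms):
    """Map every label and synonym to its term ID."""
    index = {}
    for term in terms:
        names = [term["label"], *term["synonyms"], *term["suggested_synonyms"]]
        for name in names:
            index[normalize(name)] = term["id"]
    return index


INDEX = build_index([TERM])


def lookup(phrase):
    """Return the term ID for a phrase, or None if it is not known."""
    return INDEX.get(normalize(phrase))


print(lookup("small head"), lookup("Microcephaly"), lookup("big head"))
```

An exact-match index like this only covers phrases someone has already suggested, which is why collecting synonyms from patients matters.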
Job opening: https://goo.gl/MlcnR5
Focusing on building ontologies and semantic web technologies to represent research, attribution, provenance, and scholarly communication
@ontowonka
Funding: NIH Office of Director: 1R24OD011883; NIH-UDP: HHSN268201300036C, HHSN268201400093P; NCI/Leidos #15X143,
BD2K U54HG007990-S2 (Haussler) & BD2K PA-15-144-U01 (Kesselman)
PIs: Chris Mungall, Peter Robinson, Damian Smedley, Tudor Groza, Harry Hochheiser
www.monarchinitiative.org/page/team