
Integrating Large, Disparate, Biomedical Ontologies to Boost Organ Development Network Connectivity


Chimezie Ogbuji (1) and Rong Xu (2)

(1) Metacognition LLC

(2) Case Western Reserve University

Outline

◦ Background
◦ Motivation
◦ Literature review / related work
◦ Opportunity / specific example
◦ Hypothesis
◦ Method
◦ Evaluation
◦ Discussion

Background

Controlled biomedical vocabulary systems (and ontologies) play a key role in the analysis of genetic disease
◦ Structured, interoperable, and machine-readable
◦ Facilitate reproducibility of scientific results and use of intelligent software that can leverage underlying meaning
◦ Scientific results and the structured biomedical knowledge they are based on may be used for multiple, even unanticipated, purposes

Motivation

Want descriptive relations that comprise terminology paths between (congenital) diseases and the anatomical entities that become malformed

Want to use these as the basis for analysis and classification of congenital disorders according to their underlying molecular mechanism

Opportunity

The Gene Ontology (GO) is arguably the most prominent example of how highly organized and structured medical knowledge can be leveraged to facilitate medical genetics
◦ Has a hierarchy of biological processes involving organ development

The Foundational Model of Anatomy (FMA) is a vast ontology with an objective to conceptualize the physical objects and spaces that constitute the human body
◦ Macroscopic, microscopic, and sub-cellular canonical anatomy

Their skeletal relations (is_a, part_of, and has_part) have the same meaning

Literature review

Cellular components function via interaction with each other in a highly complex and interconnected network

Interdependencies among a cell’s molecular components lead to functional, molecular, and causal relationships among distinct phenotypes.

Network-based approaches to disease have the potential to provide a framework for classifying disease, defining susceptibility, predicting disease outcome, and identifying tailored therapeutic strategies

Barabási et al. Network Medicine: A Network-based Approach to Human Disease, Nature Reviews Genetics 2011.

Barabási et al. 2011

For over a decade, analysis of biological networks via network and graph theory has revealed the importance of locally dense and well-connected subgraphs (hubs).

Schwikowski et al. A network of protein-protein interactions in yeast, 2000.

Related work

Investigation of structural and lexical concordance between anatomy terms in the FMA and SNOMED-CT
◦ Bodenreider & Zhang 2006

Leveraging this concordance for integrating modules from each for a specific domain
◦ Ogbuji et al. 2010

Discussion of logical consequences of using part_of between both anatomical entities (in the FMA) and biological processes (the GO)
◦ Jimenez-Ruiz et al. 2010

Marfan Syndrome (MFS)

[…] mainly characterized by aneurysm formation in the proximal ascending aorta, leading to aortic dissection or rupture at a young age when left untreated. The identification of the underlying genetic cause of MFS, namely mutations in the fibrillin-1 gene (FBN1), has further enhanced [...] insights into the complex pathophysiology of aneurysm formation

In UMLS Metathesaurus:
• Finding site: connective tissue structure (SNOMED-CT)
• Category: congenital skeletal disorder (CRISP Thesaurus and NLM MTH)

Marfan Syndrome example

In the GO, FBN1 is annotated with the GO_0001501 (skeletal system development) and GO_0007507 (heart development) concepts (amongst others)

The former coincides with the more common finding site and classification of MFS as a congenital skeletal disorder

This is in spite of the fact that associations (causal and otherwise) between MFS and cardiovascular diseases such as aortic root dilation are well-documented in the medical literature

Hypothesis

A high-quality integration of the GO's development process hierarchy with the FMA will have several benefits:
◦ New biological pathways from genetic diseases to the anatomical entities whose development is involved in their underlying molecular mechanisms
◦ Graph and network analysis can benefit from an increase in connectivity for discovering biologically meaningful motifs
◦ Similarly, classification algorithms can also take advantage of this

Figure legend:
◦ Copper: annotates human gene
◦ Gold: does not annotate human gene

Method and materials

Integration is performed on the following GO development process hierarchies
◦ Anatomical structure development
◦ Anatomical structure arrangement
◦ Anatomical structure morphogenesis

Only GO concepts that annotate human genes are considered

In processing the GO, the logical properties (transitivity, for example) of the relations are fully considered
◦ This is always the case from here on

Method and materials (continued)

The FMA ontology is loaded (as OWL/RDF) into a triple store for remote querying via SPARQL

The prefix of the human-readable label for each GO concept in the development hierarchies is stemmed and used as a basis for case-insensitive, lexical matching on primary labels and exact synonyms of FMA classes via a SPARQL query

FMA classes that match exactly are considered to denote the anatomical entities that participate in the corresponding GO biological process

Example

◦ GO concept: GO_0007507 (heart development)
◦ Prefix: heart
◦ Matching FMA concept: FMA_7088 (Heart)
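The matching step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the slides describe a SPARQL query against a triple store, whereas this sketch matches against an in-memory dictionary, skips the stemming step, and assumes that the prefix is whatever precedes a trailing "development", "morphogenesis", or "arrangement". The names label_prefix, match_fma, fma_labels, and the id FMA_hypothetical_1 are made up for illustration; only GO_0007507 and FMA_7088 come from the example above.

```python
# Suffixes that mark a GO development-process label; taken from the three
# hierarchies named in the method (an assumption about the exact wording).
PROCESS_SUFFIXES = ("development", "morphogenesis", "arrangement")


def label_prefix(go_label):
    """Return the anatomical prefix of a GO process label, or None."""
    words = go_label.strip().lower().split()
    if len(words) > 1 and words[-1] in PROCESS_SUFFIXES:
        return " ".join(words[:-1])
    return None


def match_fma(go_label, fma_labels):
    """Return FMA ids whose label or exact synonym equals the prefix.

    fma_labels maps an FMA id to a list holding its primary label and
    exact synonyms. Matching is exact and case-insensitive.
    """
    prefix = label_prefix(go_label)
    if prefix is None:
        return []
    return sorted(
        fma_id
        for fma_id, names in fma_labels.items()
        if any(name.strip().lower() == prefix for name in names)
    )


# Toy stand-in for the triple store; only FMA_7088 (Heart) is from the slides.
fma_labels = {
    "FMA_7088": ["Heart"],
    "FMA_hypothetical_1": ["Heart valve"],
}

matches = match_fma("heart development", fma_labels)
print(matches)  # ['FMA_7088']
```

Requiring an exact match on the whole prefix, rather than a substring match, keeps "heart" from also matching "Heart valve".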

Evaluation

Result: 1644 development process and anatomical entity pairs

We calculate the Jaccard coefficient of the overlap between hierarchies for 6 major organs and the anatomical development processes they participate in

Evaluation (continued)

Using the GO development process for some FMA organ O as the starting point, the set of all subordinate terms is calculated: GOsubgraph(O)

Example:
◦ GO_0007507 (heart development) has GO_0003170 (heart valve development) as a component (via has_part)
◦ GO_0003170 subsumes GO_0003176 (aortic valve development) and has GO_0003179 (heart valve morphogenesis) as a component
◦ Each of these would be considered as subordinates of GO_0007507
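One way to read GOsubgraph(O) is as the set of terms reachable from the starting process by repeatedly following has_part and subsumption links. The Python sketch below shows that reading on the four GO terms from the example. The function name subgraph and the edge dictionary are assumptions made for illustration, not the authors' code.

```python
from collections import deque

# Edges point from a term to its direct subordinates. The links are the
# has_part and subsumption relations listed in the example above.
subordinate_edges = {
    # heart development has_part heart valve development
    "GO_0007507": ["GO_0003170"],
    # heart valve development subsumes aortic valve development and
    # has_part heart valve morphogenesis
    "GO_0003170": ["GO_0003176", "GO_0003179"],
}


def subgraph(root, edges):
    """Return every term reachable from root, excluding root itself."""
    seen = {root}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in edges.get(term, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen - {root}


print(sorted(subgraph("GO_0007507", subordinate_edges)))
# ['GO_0003170', 'GO_0003176', 'GO_0003179']
```

Following links until no new term appears is what treating the relations as transitive amounts to in practice.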

Evaluation (continued)

In a similar fashion, the subordinate anatomical entities for each O amongst the 6 chosen organs are calculated:
◦ FMAsubgraph(O)

For each O, we calculate the GO terms that are both in GOsubgraph(O) and were matched with an FMA class that is in FMAsubgraph(O)

This resulting set of GO terms is considered the intersecting set and the Jaccard coefficient is calculated with respect to this, FMAsubgraph(O), and GOsubgraph(O)
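The slides do not spell out the formula, so the following Python sketch rests on an assumption: each matched GO term counts as one shared element, which gives a union size of |GOsubgraph(O)| + |FMAsubgraph(O)| minus the size of the intersecting set. The function name jaccard_overlap, the mapping go_to_fma, and the FMA ids in the toy data are hypothetical.

```python
def jaccard_overlap(go_subgraph, fma_subgraph, go_to_fma):
    """Jaccard coefficient between a GO subgraph and an FMA subgraph.

    go_to_fma maps a GO term to the set of FMA classes it was matched with.
    Assumes each matched GO term counts once towards the intersection.
    """
    intersecting = {
        term
        for term in go_subgraph
        if go_to_fma.get(term, set()) & fma_subgraph
    }
    union_size = len(go_subgraph) + len(fma_subgraph) - len(intersecting)
    return len(intersecting) / union_size if union_size else 0.0


# Toy data: the GO ids come from the heart example, the FMA ids are made up.
go_subgraph = {"GO_0003170", "GO_0003176", "GO_0003179"}
fma_subgraph = {"FMA_a", "FMA_b", "FMA_c", "FMA_d"}
go_to_fma = {"GO_0003176": {"FMA_a"}}

# One intersecting term, union of 3 + 4 - 1 = 6 elements, so 1/6.
print(jaccard_overlap(go_subgraph, fma_subgraph, go_to_fma))
```

With a large FMA subgraph and few matched GO terms, the denominator dominates and the coefficient stays small.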

Jaccard Coefficient (overlap)

Evaluation: network connectivity

We calculate the number of new paths from OMIM diseases through their genes to the anatomical entities in the FMA:
◦ P+dgo

Similarly, we calculate the number of new paths starting from the genes to additional FMA anatomical entities:
◦ P+go

Network connectivity: continued

Only genes that are annotated with anatomical development processes matched to FMA classes and OMIM diseases associated with these genes were considered
◦ Genesdev
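To make the notion of a new disease-to-anatomy path concrete, the Python sketch below counts paths of the form disease, gene, GO term, FMA class on toy data. It is a simplification under stated assumptions: it counts only the direct GO-to-FMA links, whereas the original counts may also follow relations inside the ontologies, which the slides do not detail. The function name count_new_paths and the placeholder FMA id are hypothetical; FBN1, the two GO ids, and FMA_7088 come from the earlier slides.

```python
def count_new_paths(disease_genes, gene_go, go_to_fma):
    """Count disease -> gene -> GO term -> FMA class paths per disease."""
    counts = {}
    for disease, genes in disease_genes.items():
        total = 0
        for gene in genes:
            for term in gene_go.get(gene, ()):
                # Each FMA class matched to the GO term closes one path.
                total += len(go_to_fma.get(term, ()))
        counts[disease] = total
    return counts


# Toy data: FBN1 and its two GO annotations come from the Marfan syndrome
# example; the FMA id for the skeletal system is a made-up placeholder.
disease_genes = {"Marfan syndrome": ["FBN1"]}
gene_go = {"FBN1": ["GO_0001501", "GO_0007507"]}
go_to_fma = {
    "GO_0007507": {"FMA_7088"},
    "GO_0001501": {"FMA_placeholder_skeletal_system"},
}

print(count_new_paths(disease_genes, gene_go, go_to_fma))
# {'Marfan syndrome': 2}
```

Because the count sums over every gene of a disease, a disease with many associated genes accumulates paths from each disease-gene pairing.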

Number of additional P+dgo paths on a logarithmic scale


Log-scaled histogram of additional paths from Genesdev to FMA classes, only for those genes that had additional paths

Evaluation summary

On average, the mapping introduces 9,549 additional P+dgo paths per OMIM disease

On average, each Genesdev gene had 17,037 additional paths to FMA classes

Caveat in normalizing the number of P+dgo paths by number of genes
◦ Paths from diseases to anatomical entities introduce a combinatorial factor of disease-gene pairings

Discussion

Overlap results indicate little overlap between the GO hierarchies and corresponding FMA hierarchies

Not surprising, as both cover disparate domains within medicine and one is specific to humans while the other is not

Discussion (continued)

This, along with the size of the FMA as a whole and within the portions mapped to the GO hierarchies, indicates an opportunity to build on the mapping and to integrate both ontologies in a meaningful way

Connectivity results demonstrate a significant increase in biological paths from genetic diseases (and their genes) to the anatomical entities participating in the development process

Discussion (continued)

As these paths are at least as logically and biologically sound as the ontologies they were forged from, we expect that an appreciable number of them will be useful for analysis

To our knowledge, this is the first attempt of this kind to integrate the anatomical structural development, morphogenesis, and organization hierarchies in the GO with the FMA

Limitations

Regarding deductions (formal or otherwise) that follow from an integration of the FMA and GO
◦ Need to be careful to only consider annotations for humans or to have a robust way to manage the uncertainty introduced in not doing so