  • Evaluating Large-Scale Biomedical Ontology Matching Over Parallel Platforms

    Muhammad Bilal Amin^a, Wajahat Ali Khan^a, Shujaat Hussain^a, Dinh-Mao Bui^a, Oresti Banos^a, Byeong Ho Kang^b and Sungyoung Lee^a

    ^a Ubiquitous Computing Lab, Department of Computer Engineering, Kyung Hee University, Yongin-si, South Korea; ^b School of Computing and Information Systems, University of Tasmania, Hobart, Australia

    ABSTRACT

    Biomedical systems have been using ontology matching as a primary technique for heterogeneity resolution. However, the natural intricacy and vastness of biomedical data have compelled biomedical ontologies to become large-scale and complex; consequently, biomedical ontology matching has become a computationally intensive task. Our parallel heterogeneity resolution system, SPHeRe, is built to cater to the performance needs of ontology matching by exploiting the parallelism-enabled multicore nature of today's desktop PCs and cloud infrastructure. In this paper, we present the execution and evaluation results of SPHeRe over large-scale biomedical ontologies. We evaluate our system by integrating it with the interoperability engine of a clinical decision support system (CDSS), which generates matching requests for the large-scale NCI, FMA, and SNOMED-CT biomedical ontologies. Results demonstrate that our methodology provides an impressive performance speedup of 4.8 and 9.5 times over a quad-core desktop PC and a four virtual machine (VM) cloud platform, respectively.

    KEYWORDS

    Biomedical informatics; multithreading; biomedical ontologies; ontology matching; parallel processing; parallel programming; semantic web

    1. Introduction

    In recent years, semantic web technologies have started penetrating biomedical systems. Among these technologies, ontologies are extensively used in biomedical information systems [1]. This usage is largely attributed to the annotation of medical records [2], standardization of medical data formats [3], medical knowledge representation and sharing, clinical guidelines (CG) management [4], clinical data integration, and medical decision-making [5]. These vast usages of ontologies in the biomedical field have compelled researchers to invest more in the development of newer ontologies and to provide continuity to the already created ones. Therefore, biomedical ontologies like the Gene Ontology (GO) [6], the National Cancer Institute (NCI) Thesaurus [7], the Foundational Model of Anatomy (FMA) [8], and the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) [9] have emerged and been maintained over the years. There exist several service-oriented infrastructures encouraging the development and usage of ontologies in biomedicine, including BioPortal [10] and OBO Foundry [11]. BioPortal currently hosts a repository of 384 biomedical ontologies. The Open Biomedical Ontologies (OBO) consortium worked on introducing strategies for evolving ontologies [11]; however, the design, development, and management of biomedical ontologies become challenging due to the continuous evolution of medical data. Consequently, biomedical ontologies are becoming larger in size and their growing use is making them increasingly available.

    Large biomedical ontologies are complex in nature and contain overlapping information. Utilization of this information is necessary for integration, aggregation, and interoperability. For example, the NCI ontology defines the concept of Myocardium, which is related to the concept Cardiac Muscle Tissue, describing the muscles surrounding the human heart. The concept Cardiac Muscle Tissue is defined in the FMA ontology; therefore, a biomedical system or a professional integrating knowledge regarding the human heart requires correspondence between the candidate ontologies FMA and NCI [1]. Likewise, correspondence between the GO ontology and the FMA ontology can be used by molecular biologists to understand the outcome of proteomics and genomics in a large-scale anatomic view [12]. Moreover, correspondence between ontologies has also been used for heterogeneity resolution among various health standards [13]. These correspondences between candidate ontologies are called mappings or alignments, and the process of discovering these mappings is termed ontology matching.

    The active research community has acknowledged the importance of matching large-scale biomedical ontologies. Subsequently, initiatives like the Unified Medical Language System (UMLS) by the National Library of Medicine

    © 2015 IETE

    IETE TECHNICAL REVIEW, 2015
    http://dx.doi.org/10.1080/02564602.2015.1117399


  • [14] and the Ontology Alignment Evaluation Initiative (OAEI) [15] are now mainstream biomedical ontology matching research campaigns. Complementing these initiatives, our motivation also lies in the use of ontology matching for the integration of biomedical information; however, ontology matching over large-scale biomedical ontologies is a computationally intensive task with quadratic computational complexity [16]. Ontology matching operates over the Cartesian product of two candidate ontologies, which requires resource-based element-level (string-based, annotation-based, language-based, and label-based) [17] and structural-level (child-based, graph-based, and property-based) [17] matching algorithms to be executed over the candidate ontologies to generate the required mappings. In our experiments, executing these matching algorithms over large-scale biomedical ontologies, whole FMA with whole NCI, took 3 days to generate desirable results. This delay in mapping results makes ontology matching of large-scale biomedical ontologies ineffective for biomedical systems and professionals with in-time processing demands. The ontology matching problem is formally defined as follows:

    Given two ontologies $O_S = \langle C_S, R_S, I_S, A_S \rangle$ and $O_T = \langle C_T, R_T, I_T, A_T \rangle$, different types of inter-ontology relationships, called mappings or alignments, can be defined. These mappings are derived by a set of matching algorithms with a similarity degree $d \in [0, 1]$. A matching is a quadruple $m = \langle id, x_S, x_T, d \rangle$, where $x_S$ and $x_T$ are aligned ontology terms and $d$ is the similarity degree of $m$.
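The definition above can be illustrated with a minimal Python sketch. It is not SPHeRe's implementation: the `Mapping` class, the `match` function, the 0.8 threshold, and the use of `difflib` as a stand-in string-based matcher are all assumptions made for illustration, and the concept labels are toy data partly borrowed from the Myocardium example in the introduction. The nested comparison over the Cartesian product is what makes the cost quadratic in the number of terms.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import product


@dataclass(frozen=True)
class Mapping:
    """One matching m = <id, x_S, x_T, d> (hypothetical container)."""
    id: int
    x_s: str   # term from the source ontology O_S
    x_t: str   # term from the target ontology O_T
    d: float   # similarity degree in [0, 1]


def string_similarity(a: str, b: str) -> float:
    # Stand-in element-level matcher (assumption): difflib's ratio lies in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(source_terms, target_terms, threshold=0.8):
    """Compare every source term with every target term (Cartesian product)."""
    mappings = []
    for x_s, x_t in product(source_terms, target_terms):
        d = string_similarity(x_s, x_t)
        if d >= threshold:  # threshold value is an illustrative assumption
            mappings.append(Mapping(len(mappings), x_s, x_t, d))
    return mappings


# Toy labels for illustration only; real ontologies hold many thousands of concepts.
source = ["Myocardium", "Cardiac Muscle Tissue", "Heart"]
target = ["Cardiac muscle tissue", "Heart", "Lung"]
result = match(source, target)
for m in result:
    print(m)
```

A purely string-based matcher like this one cannot relate Myocardium to Cardiac Muscle Tissue, which is one reason the annotation-based and structural-level matchers are also needed, and why every added matcher multiplies the work done per pair.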

    Over the years, ontology matching systems and techniques have taken large-scale biomedical ontologies into consideration and proposed various resolutions. However, these resolutions are centered on matching effectiveness, i.e., the accuracy of the matching algorithms. The performance aspect of matching these ontologies has concentrated on optimizing the matching algorithms and partitioning larger ontologies into smaller chunks for performance benefits [18]. Due to the trade-off between performance and accuracy, optimizing the matching algorithms can improve ontology matching performance only to a certain degree. Furthermore, performance improvement based on the exploitation of newer hardware technologies has largely been missed. Among these technologies are affordable parallelism-enabled systems, which are easily available as stand-alone (desktop) and distributed (cloud) platforms [19].

    Parallelism has long been associated with high performance computing (HPC); however, with the advent of virtualization over multicore processors, performance-oriented computing environments are ubiquitously available as cloud platforms [20]. Moreover, cloud computing, with its limitless yet affordable computational power, can be exploited for applications with higher complexity [21]. Our ontology matching system SPHeRe [22,23] avails this opportunity and provides a performance-based ontology matching resolution, which exploits multicore platforms, i.e., the desktop and particularly the cloud, for parallel ontology matching.

    To contribute to the performance aspect of large-scale biomedical ontology matching, we have enabled SPHeRe to execute parallel matching over these complex and comprehensive ontologies. We have evaluated its performance by incorporating it with the interoperability engine of a clinical decision support system (CDSS) and deploying it over a quad-core desktop PC and a four-VM cloud platform. We have been able to achieve an impressive performance speedup of 4.8 times over the desktop and 9.5 times over the cloud platform on matching requests for the large-scale biomedical ontologies FMA, NCI, and SNOMED-CT. Furthermore, we have compared this method with GOMMA's parallel matching techniques, called inter- and intra-matching [18], used for matching biomedical ontologies. Our method outperforms the inter- and intra-matching techniques by 50% in performance speedup and 16% in scalability over a multi-node platform, proving this method to be more performance efficient and effective in utilizing available computational resources.
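The speedup figures above can be read through the conventional definition of parallel speedup. This is an assumption, since this section does not state how speedup was computed; $T_1$ (sequential matching time) and $T_p$ (matching time on the parallel platform) are symbols introduced here for illustration only.

```latex
S = \frac{T_1}{T_p}
\quad\Longrightarrow\quad
\frac{T_p}{T_1} = \frac{1}{S},
\qquad
\frac{1}{4.8} \approx 0.21 \ \text{(desktop)},
\qquad
\frac{1}{9.5} \approx 0.11 \ \text{(cloud)}
```

Under this reading, a matching request would finish in roughly 21% of the sequential time on the quad-core desktop and roughly 11% on the four-VM cloud platform.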

    The rest of the paper is structured as follows. In Section 2, we provide the methodology used by SPHeRe for matching large-scale biomedical ontologies. Section 3 describes the experimentation performed over the multicore desktop PC and the cloud platform, and discusses the results. Section 4 briefly discusses the related work in the area of biomedical ontology matching from the perspective of performance. Section 5 concludes this paper.

    2. Methodology overview

    This section provides an overview of the methodology used by SPHeRe for large-scale biomedical ontology matching. The technical details of SPHeRe's implementation are already provided in [22]; furthermore, the finer details of the methodology are comprehensively covered in [24].

    The primary goal of the methodology is to exploit parallelism-enabled platforms for large-scale biomedical ontology matching by distributing t