Health Data Visualization - A review - unifr.ch · developing health data visualization tools over recent years. Against this background, the goal of this term paper is to Table 1.The

Health Data Visualization - A review∗

Seminar Collaborative Data Visualization

Michael BärtschiUniversity of Fribourg

Department of Computer ScienceDIVA Group

CH-1700 Fribourg, [email protected]

KeywordsInformation visualization, electronic health records, datamining, decision support

ABSTRACTWith the increased complexity of electronic health data thedemand for supporting health data visualization applica-tions has grown recently. We summarize the existing healthdata visualization tools, analyze their strengths and weak-nesses, and identify the most important missing componentsin current systems. We argue that existing visualization sys-tems satisfy most of the various requirements of the differentstakeholders, but are restricted to a specific scope of appli-cation. Interesting lines for further work are incorporatingdata mining algorithms and decision support algorithms, aswell as integrating different visualizations into larger scale,multi-functionality systems. Such an integration will likelybring along new issues to tackle, like usability restrictionsand data format issues.

1. INTRODUCTIONWith the trend towards increased electronic health data ac-cumulation (e.g. from mobile health applications [1] or byreleasing public databases which were previously hold un-der lock [5]), health data visualization applications have be-come popular over recent years. These applications targetclinical research and appliance at the same time as govern-ment agencies and the general public (e.g. through personalhealth tracking applications on mobile phones). The pos-sibilities of welfare gains for these various stakeholders areenormous, and there has been considerable research effort indeveloping health data visualization tools over recent years.Against this background, the goal of this term paper is to

∗The Paper is a deliverable of the Msc Research SeminarCollaborative Data Visualization in 2011, DIVA Group Uni-versity of Fribourg, supervised by Dr. D. Lalanne

summarize for each stakeholder the components of a valu-able and successful health data visualization application. Inthe following, we focus on clinical research and applianceand government agencies.The paper is structured as follows. In Section 2 we definethe requirements and needs for the two types of applicationsdescribed above. In the third section we review several ex-isting systems with respect to the requirements and needsfrom section two. The paper concludes with a propositionfor future systems.

2. APPLICATION REQUIREMENTSHealth data visualization applications (HVA) are importanttools to gain insight into Electronic Health Records (EHR).In the field of clinical research and appliance they are oftenused to understand patient histories in an efficient, timelyand user friendly way [6]. Further, HVA can be useful infinding hidden patterns which reveal important cause-and-effect phenomena. Two kind of patterns are of interest.First, the patient centered patterns which reveal importantinformation for one patient, second, the inter-patient pat-terns revealing common information among several patients[1, 5].A patients health history consists of the clinical informationobtained by a physician in order to make the correct diag-nostics for that patient. This information enclose symptoms,diagnostics, illness, recent treatments and other data aboutthe patient from the past till this day. Based on this infor-mation, a physician draws his conclusions about the actualstate of the patient. Such health histories can be of highcomplexity and visualizations have to consider all healthconditions in order to enable physicians to work with thepatients health data [6]. For this reason, HVA in this fieldshould provide abstract (e.g. clustered) views of uni- andmultivariate health variables over time to discover anomaliesand relations in an overall view [6]. In addition there mustbe some flexibility to dive deeper into a specific pattern togain detailed knowledge about it [1, 5]. An important com-ponent is also the ability to compare specific variables of onepatient with the health data of other patients who have thesame or similar demographic characteristics and patterns.This component is particularly important for research [5].An overview of the various requirements for systems used inthe field of clinical research and appliance can be found inTable 1.

With respect to government agencies, the most important

Clinical research andappliance

Government agencies

(i) Patient history overview (iv) Multivariate pattern vi-sualization for a specific re-gion

(ii) Single-patient multi-variate pattern visualiza-tion

(v) Multivariate pattern vi-sualization for multiple re-gions

(iii) Multi-patient multi-variate pattern visualiza-tion

(vi) Dynamic query filtering

Table 1: Application Requirements

applications of HVA are in the field of visualizing populationphenomena such as public health care access and infectiousdisease-spreading tendency [3, 4].To visualize population health phenomena, government agen-cies are more interested in aggregated EHR categories whichare often multivariate and at the same time geographicallydistributed [3, 4]. Target HVA should enable users to visual-ize multiple public health care indicators (e.g. performance,access, and quality) with respect to time and geospatiality,to apply dynamic queries to filter regions of interest by de-mographic attributes, and to produce visual results whichmakes it possible to discover important patterns in an easyand fast way [4]. The right hand side of Table 1 summarizesthe three mentioned criteria.

3. REVIEW OF EXISTING SYSTEMSWe turn now to reviewing several existing systems with re-spect to the requirements listed in Table 1. As the visualiza-tion applications for the two target user groups are quite dif-ferent, the analysis is divided into two separate paragraphs.

Clinical Research and Appliance. Several visualizationapproaches have been made to overcome the complexity ofpatients health histories. Interactive timelines representingtemporal events of a patients life history became very pop-ular recently. EHR are divided into several categories (suchas symptoms, diagnoses, results, etc.) and attached to thetimeline as labeled pins which display the patients healthdata at a specific time. Users have the possibility to querypatterns either by filters next to the timeline, by clickingon legends to hide or show certain categories, or by directlymanipulating the timeline (e.g. zooming in or out). A re-view of several systems implementing this approach can befound in [5]. Visualization applications based on the time-line approach are very useful to gain insight into time-relatedhealth data and can help physicians and researchers make in-ferences about which symptoms leads to which diseases andwhat medical treatments have to be taken into account toexamine it. The main benefit of this approach is, that userscan easily filter target indicators and thereby switch froma general view of a patients health variables to a detailedview revealing important patterns of the queried variablesin a fast and clear way [5] (requirement (ii)). The maindrawback however is that there exists no possibility to com-pare target variables among several patients (requirement

(iii)). Furthermore, only a limited amount of health eventscan be displayed for a given time-range. For this reason,users have to either scroll forth and backward or to remem-ber a lot of information in order to gain an overall pictureof the patients health history [5] (requirement (i)).An extension to this approach is the system Lifeline2 [5]which, instead of visualizing health variables of a single pa-tient, allows users to select a subset of EHRs from multiplepatients. Records are attached to the timeline as coloredtriangles and can be clearly assigned to patients by IDs.This allows users to query patterns among multiple patients.Furthermore, users can rearrange events to reveal previouslyunknown patters resulting in additional knowledge [5]. Sucha multivariate and multi-patient comparison can, for exam-ple, help physicians and researchers link symptoms to spe-cific diseases and find treatments which were successful inhelping other patients. This approach is clearly the best so-lution regarding requirement (iii), but it is very restrictedregarding requirements (i) and (ii).A completely different approach was implemented in the sys-tem named AnamneVis[6] in which a patients health historyis visualized as a radial sunburst (see Figure 1). Nodes rep-resenting health events such as diagnoses, symptoms, treat-ments, etc. (encoded as ICD9 codes1) are drawn as wedgesor bars around the human body in a circular fashion. Therelative placement to neighbor nodes represents the rela-tionship in the hierarchy. The angle of a node represents itsnumber of incidents whereas the severity of a specific healthevent is color encoded. Three layers of nodes are used tocluster categories. Layer one stands for the highest hierarchylevel, layer two represents more detailed categories and layerthree contains the discrete incident node (symptom or diag-nostic etc.). Red dots on the human body encodes incidentlocations. The whole sunburst contains the patients entirehistory. Having three different abstraction layers available,users can easily switch from an overall view of the patientshealth conditions to a very detailed view about a specificsymptom, diagnostic or treatment. Moreover, the color en-coding makes it easy to examine the severity of a specifichealth event. Because neighboring boxes represent relationsbetween the displayed incidents, inferences from previouslymade decisions and the resulting effects are easily accessi-ble. This features clearly address the requirements (i) and

Figure 1: Patient health history visualization as ra-dial sunburst (taken from [6])

1International Classification of Diseases, Revision 9

(ii). The main downside of AnamneVis is the lack of multi-patient comparison which violates requirement (iii) and thatthe overall chronological order of health events is not avail-able [6].Taking all this information into account we can say thata timeline based visualization containing features for mul-tivariate and single-/multi-patient comparison possibilitiesbelong to the most valuable components of a HVA in thisarea of application. But particularly for physicians it is alsoof great interest to have a complete overview of a patientshealth history at hand, like it is implemented in AnamneVis.

Government agencies. The most common approach to vi-sualize multivariate and geographically distributed healthdata is the use of interactive choropleth maps2 together witha filter to select the target health categories. Recently de-veloped choropleth maps can be split up into two kinds ofsystems. The first one allows users to select a target healthcategory and to display it as a color encoded attribute on aninteractive map for multiple regions (e.g. Dartmouth Atlas[4]). Hovering a certain region compares the selected cate-gory relative to the average of the state or the country (usingdifferent kind of charts). The second kind of systems allowsusers to select and compare multiple health categories, butonly for one specific region (e.g. Health Profiles Interactive[4]). An overview of such visualization applications can befound in [4]. As one can clearly see, both approaches violaterequirement (v). In addition, the first type of systems alsoviolates requirement (iv).The newly developed visualization application CommunityHealth Map [4] tries to combine the two approaches by in-cluding several views on a single page (see Figure 2). Thispage contains a map view (1) to visualize one specific healthcategory as color encoded attribute, a selection panel to se-lect the health categories of interest (2), a dynamic filterpanel to filter health categories by demographic variables (3)as well as a chart or table view to compare multiple healthcategories among different regions (4). Changing one of thedemographic variables or one of the health categories auto-matically updates the map and table view. With the com-bination of several different views, Community Health Mapaddresses all three requirements and therefore represents agreat visualization system for government agencies. Userscan investigate target variables for a single region throughdynamic query filtering among with the choropleth map dis-play and for multiple regions through the chart or table view.One drawback is that the integration of multiple views intoone visualization restricts the usability somewhat, as usershave to switch the visualization context if they want to shiftfrom a single-region to a multi-region representation. More-over a chart or table view is not really suitable for patternrecognition as the process of searching for interesting infor-mation has to be done manually and is not supported by thevisualization.Against the background of these critics, another interestingsystem is MediMap [3]. Although it does not introduce newvisualization techniques, it enhances the visualization by in-cluding data mining and decision support algorithms. Usingdifferent clustering methods to make similarity groups out

2A choropleth map is a map displaying color encodedstatistical variables (e.g. public health access) for predefinedregions

Figure 2: Community Health Map visualization con-sisting of the four views: map (1), selection panel(2), dynamic query filtering (3), and chart or tableview (4) (taken from [4])

of the target health categories, decision tree learning algo-rithms3 to explain the main differences between those clus-ters, and chart and table visualizations which displays thosedifferences in a clear and meaningful way, MediMap enablesusers to reveal hidden patterns which, without applying datamining and decision support algorithms, may have not beenfound [3].

4. FUTURE SYSTEMSThe review of Section 3 shows that there already exist ad-equate visualization systems for all requirements stated inTable 1. The main problem is that each system targetsonly a specific scope of application. For instance a physiciancan use AnamneVis for managing a patients health historyand one of the various timeline based visualizations from[6] to investigate a specific symptom or disease. Howeverit would be helpful to have a visualization system whichintegrates multiple visualization applications and thereforeaddresses all requirements at once. The benefit of such asystem is that users would no longer have to change appli-cations to fulfill certain tasks which can not be handled withonly one application. The integration could, for instance, beimplemented similarly to the visualization system Commu-nity Health Map, combining multiple visualizations into oneapplication. Although this is the most straight forward ap-proach, there are several problems. On one hand this wouldrestrict the usability as we already stated in Section 3. Onthe other hand there is also a data format issue. Each re-viewed visualization system demands its own data format.While we leave problem one to further work, a possible solu-tion to solve the data format issue is to use an open softwarearchitecture like Open mHealth [1]. Open mHealth uses sev-eral modular data processing units to build EHR from mo-bile health applications in a bottom-up way. More specificOpem mHealth provides modules which can be included inany mobile health application to gather basic health infor-mation like blood pressure. This data can then be aggre-gated by higher level modules to build more complex healthinformation data including location and time information[1]. Such a data gathering and processing architecture can

3For a detailed description of the applied algorithmsplease see [3]

Figure 3: Circular ideogram visualizing the copynumber profiles for chromosomes 6 and 17 of fivefollicular lymphoma tumor samples (taken from [2])

be used to collect and store huge amount of standardizedEHR. By adapting the API of the previously discussed visu-alization systems to the standardized EHR the data formatissue is solved.

One thing that is missing in all reviewed HVA systemsis the possibility to visualize multivariate and multi-patienthealth variables in a large scale manner by using data miningand decision support algorithms. The previously discussedsystems allow users to find (hidden) patterns by manuallysearching for such events or by rearranging elements of thevisualization (also randomly). However, one of the mainpurposes of good data visualization tools is to automate thisprocess of revealing hidden patterns by using data miningand decision support algorithms together with data visual-izations which can handle large amount of data and explainanomalies in the data by visualizing them in a clear andmeaningful way without any involvement of users. Such avisualization system would be especially important for re-search, in which large amount of patients are compared bymultiple variables. An interesting approach to address thisproblem is the data visualization tool Circos [2]. Circos tar-gets the identification and analysis of similarities and differ-ences in large scale data. With its efficient way of display-ing variations or anomalies in either clustered or detailedviews of related data (see Figure 3), Circos is a promisingvisualization application which combined with preprocessedhealth data is likely a benefit for researchers. As data miningand decision support algorithms are not part of Circos, theyhave to be applied on the dataset separately. The prepro-cessed data can then be loaded into the visualization tool. Apossible example of use is to visualize all patients who havea disease of interest along with several other common char-acteristics like age, gender, etc. This could be done by usingwell-known dimensionality reduction and feature extractionalgorithms (e.g. PCA or LDA) together with classificationalgorithms (e.g. KNN or ANN) to extract target patientsout of the large scale dataset and to group them based onhow good their characteristics match. Such preprocesseddata would perfectly suite the visualization pattern of Cir-cos which could then be used to visualize the similaritiesbetween those patients as described above. This would notonly help physicians and researchers identify important pat-

terns among multiple patients, but also to automatically se-lect the relevant patients.

5. CONCLUSIONWith the increased amount of collected electronic healthdata and an accompanied rising complexity in understand-ing and evaluating them, the demand for supporting healthdata visualization applications has grown recently. Our sum-mary of existing systems shows that several good visualiza-tion systems have been implemented which satisfy the mainrequirements and needs of the various different stakehold-ers. Overall, we identify two key avenues for further work:First, one limitation that these current systems share is thatthey are in general very specific to one field of application,e.g. single-patient multivariate pattern recognition or multi-patient multivariate pattern recognition. However, for users,it would likely be beneficial to integrate these different func-tionalities and features into one flexible system. This combi-nation of multiple visualizations will likely bring along newcomplications, such as usability restrictions and data formatissues. There is room for interesting work to be done.Second, a feature lacking most systems is data mining anddecision support. Given the increasing and complex amountof data available, the potential gains of such algorithms areenormous. Interesting work in this direction is done by Cir-cos [2] which targets to identify and analyze similarities anddifferences in large scale data by providing an API to includesuch preprocessed data. This opens up a new and interestingline of work.

6. REFERENCES[1] C. Chen, D. Haddad, J. Selsky, J. E. Hoffman, R. L.

Kravitz, D. E. Estrin, and I. Sim. Making sense ofmobile health data: An open architecture to improveindividual- and population-level health. Journal ofMedical Internet Research, 14(4):e112, August 2009.

[2] M. Krzywinski, J. Schein, ArnanAg Birol, J. Connors,R. Gascoyne, D. Horsman, S. J. Jones, and M. A.Marra. Circos: An information aesthetic forcomparative genomics. Genome Research,19(9):1639–1645, September 2009.

[3] N. Lavrac, M. Bohanec, A. Pur, B. Cestnik,M. Debeljak, and A. Kobler. Data mining andvisualization for decision support and modeling ofpublic health-care resources. Journal of BiomedicalInformatics, 40(4):438–447, August 2007.

[4] A. Sopan, A. S.-I. Noh, S. Karol, P. Rosenfeld, G. Lee,and B. Shneiderman. Community health map: Ageospatial and multivariate data visualization tool forpublic health datasets. Government InformationQuarterly, 29(2):223–234, April 2012.

[5] D. W. Taowei, C. Plaisant, A. J. Quinn, R. Stanchak,S. Murphy, and B. Shneiderman. Aligning temporaldata by sentinel events: Discovering patterns inelectronic health records. Proceedings of the SIGCHIConference on Human Factors in Computing Systems,CHI ’08:457–466, April 2008.

[6] Z. Zhang, F. Ahmed, A. Ramakrishnan, R. Zhao,A. Viccellio, and K. Mueller. AnamneVis: A frameworkfor the visualization of patient history and medicaldiagnostics chains. IEEE VAHC Workshop, January2011.

Documents

Health Data Visualization - A review - unifr.ch · developing health data visualization tools over recent years. Against this background, the goal of this term paper is to Table 1.The