11
An infrastructure for real-time population health assessment and monitoring D. L. Buckeridge M. Izadi A. Shaban-Nejad L. Mondor C. Jauvin L. Dube ´ Y. Jang R. Tamblyn The fragmented nature of population health information is a barrier to public health practice. Despite repeated demands by policymakers, administrators, and practitioners to develop information systems that provide a coherent view of population health status, there has been limited progress toward developing such an infrastructure. We are creating an informatics platform for describing and monitoring the health status of a defined population by integrating multiple clinical and administrative data sources. This infrastructure, which involves a population health record, is designed to enable development of detailed portraits of population health, facilitate monitoring of population health indicators, enable evaluation of interventions, and provide clinicians and patients with population context to assist diagnostic and therapeutic decision-making. In addition to supporting public health professionals, clinicians, and the public, we are designing the infrastructure to provide a platform for public health informatics research. This early report presents the requirements and architecture for the infrastructure and describes the initial implementation of the population health record, focusing on indicators of chronic diseases related to obesity. Introduction In many countries, including the United States, governments are directing considerable resources toward increasing the adoption of electronic medical records in clinical practice [1]. At the same time, the increasing use of data and messaging standards within clinical and administrative information systems is improving interoperability and allowing data to be shared between providers and institutions [2]. These data are increasingly available for use in public health practice, and they offer enormous potential for describing and monitoring population health and for making population health data available to clinicians and the public. There is, however, limited infrastructure in public health settings to make effective use of these data. One approach to managing the increasing amount of data available in public health settings is to develop a system analogous to an electronic medical record but with the data organized around defined populations as opposed to individuals. Some researchers have called such an infrastructure a population health record [3], and a conceptual framework for the development of such an infrastructure was proposed recently [4]. A fundamental element of such an infrastructure is a repository of health-related indicators for a population. An indicator in this context is an epidemiological measure of health status, healthcare utilization, or a determinant of health. For example, an indicator may estimate the burden of disease, adherence to therapy, rates of hospitalization or complication for a disease, or screening to prevent conditions in a population. Despite the appeal of the population health record concept, such systems are not available in practice. One main reason for this lack of infrastructure is the difficulties in accessing representative and timely data. This preliminary report presents the design and initial implementation of a population health record that integrates administrative and clinical data for a representative sample of a population and calculates population health indicators. This population health record is designed so that users can query, visualize, and analyze indicators related to specific diseases or factors that influence health; identify disease outbreaks and geographic clusters of disease or influencing factors; and display temporal trends of diseases. The system ÓCopyright 2012 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied by any means or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor. D. L. BUCKERIDGE ET AL. 2:1 IBM J. RES. & DEV. VOL. 56 NO. 5 PAPER 2 SEPTEMBER/OCTOBER 2012 0018-8646/12/$5.00 B 2012 IBM Digital Object Identifier: 10.1147/JRD.2012.2197132

An infrastructure for real-time population health

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: An infrastructure for real-time population health

An infrastructure forreal-time population healthassessment and monitoring

D. L. BuckeridgeM. Izadi

A. Shaban-NejadL. MondorC. JauvinL. DubeY. Jang

R. Tamblyn

The fragmented nature of population health information is a barrierto public health practice. Despite repeated demands by policymakers,administrators, and practitioners to develop information systemsthat provide a coherent view of population health status, there hasbeen limited progress toward developing such an infrastructure.We are creating an informatics platform for describing andmonitoring the health status of a defined population by integratingmultiple clinical and administrative data sources. This infrastructure,which involves a population health record, is designed to enabledevelopment of detailed portraits of population health, facilitatemonitoring of population health indicators, enable evaluation ofinterventions, and provide clinicians and patients with populationcontext to assist diagnostic and therapeutic decision-making.In addition to supporting public health professionals, clinicians, andthe public, we are designing the infrastructure to provide a platformfor public health informatics research. This early report presentsthe requirements and architecture for the infrastructure anddescribes the initial implementation of the population health record,focusing on indicators of chronic diseases related to obesity.

IntroductionIn many countries, including the United States, governmentsare directing considerable resources toward increasing theadoption of electronic medical records in clinical practice [1].At the same time, the increasing use of data and messagingstandards within clinical and administrative informationsystems is improving interoperability and allowing data to beshared between providers and institutions [2]. These data areincreasingly available for use in public health practice, andthey offer enormous potential for describing and monitoringpopulation health and for making population health dataavailable to clinicians and the public. There is, however,limited infrastructure in public health settings to makeeffective use of these data.One approach to managing the increasing amount of data

available in public health settings is to develop a systemanalogous to an electronic medical record but with the dataorganized around defined populations as opposed toindividuals. Some researchers have called such aninfrastructure a population health record [3], and a conceptual

framework for the development of such an infrastructurewas proposed recently [4]. A fundamental element ofsuch an infrastructure is a repository of health-relatedindicators for a population. An indicator in this context is anepidemiological measure of health status, healthcareutilization, or a determinant of health. For example, anindicator may estimate the burden of disease, adherence totherapy, rates of hospitalization or complication for a disease,or screening to prevent conditions in a population. Despitethe appeal of the population health record concept, suchsystems are not available in practice. One main reason forthis lack of infrastructure is the difficulties in accessingrepresentative and timely data.This preliminary report presents the design and initial

implementation of a population health record that integratesadministrative and clinical data for a representative sample ofa population and calculates population health indicators.This population health record is designed so that users canquery, visualize, and analyze indicators related to specificdiseases or factors that influence health; identify diseaseoutbreaks and geographic clusters of disease or influencingfactors; and display temporal trends of diseases. The system

�Copyright 2012 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done withoutalteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied by any means or distributed

royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.

D. L. BUCKERIDGE ET AL. 2 : 1IBM J. RES. & DEV. VOL. 56 NO. 5 PAPER 2 SEPTEMBER/OCTOBER 2012

0018-8646/12/$5.00 B 2012 IBM

Digital Object Identifier: 10.1147/JRD.2012.2197132

Page 2: An infrastructure for real-time population health

is intended to meet the needs of public health practitioners,clinicians, and the general public.

Background

Burden of chronic diseases and the need forpopulation health systemsThere is a pressing need in public health practice for accessto representative and timely data about population health.Such data are required to describe population health status,to identify targets for public health interventions, and toevaluate interventions. The data currently available throughexisting systems tend to be limited by their lack of timeliness,their lack of geographical resolution, or their lack ofrepresentativeness. The implications of such datalimitations are most apparent for chronic diseases, such asobesity, hypertension, and ischemic heart disease. Alsoknown as noncommunicable diseases, these conditionsaccount for a large proportion of global illness, disability, anddeath. Demographic and lifestyle changes, together withincreases in the prevalence of risk factors, have contributedto the rising incidence of chronic diseases [5]. As the ratesof illness and death related to chronic diseases continue togrow worldwide, researchers and public health practitionersare struggling to develop and evaluate interventions to limittheir impact. In Canada, the prevalence of obesity hasincreased from 10% in 1970 to 23% in 2004, and theproportion of major chronic diseases attributable to obesityhas increased over the same interval by 138% for men andby 60% for women [6]. This increase is shocking, butin theory it can be reversed, given that 40% of chronicdiseases are attributable to risk factors such as smoking,obesity, physical inactivity, and poor nutrition, which arepreventable [7].Public health interventions can modify behaviors to

prevent and control chronic diseases. Examples of suchinterventions include tobacco control through taxation andlimiting sales, sodium intake control through labeling andawareness campaigns, and coordination of diseasemanagement programs. However, the effectiveness of theseinterventions is not always well documented and is rarelydemonstrated across different populations. One recent surveynoted that less than 10% of published public health researchdescribes evaluations of interventions, and very few ofthese studies use current population health data to evaluatechronic disease interventions in the last three decades [8].

Existing population health indicator systemsNumerous systems currently exist for making healthindicators accessible over the Internet, but none of thesesystems provides access to timely, representative indicatorsat a high level of geographical resolution. A good exampleof such a system is SAVI (Social Assets and VulnerabilitiesIndicators), which compiles indicators for communities in

central Indiana and provides tools to analyze and visualizethe data collected [9]. The indicators describe social,physical, and economic conditions of these communities atvarious levels of geographic resolution. Although this systemprovides a considerable amount of data in a useful format,limitations include timeliness (most recent data are more than3 years old) and uneven geographical coverage. In theCanadian context, a range of federal, provincial, and regionalagencies have created online repositories of health indicators.For example, the Public Health Agency of Canada hascreated Infobase [10] to provide the public with informationabout indicators of chronic disease. This system makesavailable health-related indicators calculated mainly fromsurveys. Although the data are relatively current (most recentdata are 1 year old), they are available for mostlyself-reported outcomes and only at a low level ofgeographical resolution. At the provincial level, Alberta hascreated the Interactive Health Data Application to provideaccess to health indicators derived from physician claimsdata, hospital discharge abstracts, and other sources [11].While these data are representative, they are not timely(most recent data are more than 3 years old) and do notprovide indicators at a high geographical resolution. At aregional level, the Montreal Health Region has created theEspace Montrealais d’information sur la sante (EMIS),which is a repository of indicators of population health statusand healthcare system utilization in Montreal [12]. Thissystem uses an interactive mapping interface and providesdata at the resolution of the local health areas within theregion. As with the other systems considered, however,the data are not current (most recent data are more than3 years old).

Population health record systemWe propose an architecture for a population health recordthat will provide timely estimates of health-related indicatorsat high geographical resolution for a defined population.Our initial work to develop the system is focused on thepopulation of Quebec, particularly the Montreal region,which has a population of 3.6 million people. In this section,we describe the anticipated user groups and theirrequirements, present an overview of the architecture,describe the data sources used in the system, explain howindicators are derived from the data, present the ontologyused in the system, and discuss issues related to privacy.

User groups and their requirementsWe intend to consider three main groups of users for thissystem: public health professionals, clinicians, and thegeneral public. Although we recognize the importance ofpopulation health information for clinical [13, 14] andindividual decision-making, we focus on public healthprofessionals in the initial stage of this project. For this usergroup, we consider three main functions: describing,

2 : 2 D. L. BUCKERIDGE ET AL. IBM J. RES. & DEV. VOL. 56 NO. 5 PAPER 2 SEPTEMBER/OCTOBER 2012

Page 3: An infrastructure for real-time population health

monitoring, and evaluating. Describing refers to thedevelopment of a portrait of health status for a population.In practical terms, this might entail creating a map or table ofage-specific prevalence of a disease, such as ischemic heartdisease, and a similar description of risk factors, such asdiabetes. Monitoring refers to establishing an ongoinganalysis of an indicator, with the application of statisticalalgorithms to a time series or space-time series in order todetect changes in the indicator. Evaluating refers todetermining the effect of an intervention or other event onan indicator, most frequently by comparing changes in anindicator over time or between regions. To perform thefunctions described (i.e., describing, monitoring, andevaluating), the system must have certain characteristics.These are described in the following subsections.

Data/indicators• TimelyVWhen describing the health status of a

population, public health personnel must work with datathat are current. Knowing the state of populationhealth 3 or 4 years ago is of limited help when monitoringhealth status or when planning interventions. Timelinessis also important for evaluating the effect of pastinterventions. If analysts must wait for years to evaluate aprogram, it becomes difficult to manage programseffectively.

• RepresentativeVIndicators must be accurate andunbiased for assessing differences in health status acrossgeographical regions and subgroups of a population.If the data are not representative, public healthprofessionals may make incorrect conclusions whendescribing and monitoring populations and evaluatinginterventions.

• High geographical resolutionVPublic health servicesare delivered at a local level, and many interventions arealso targeted at a local level, affecting neighborhoodswithin cities. In order to understand the need for healthinterventions and to evaluate the effect of theseinterventions, indicators must be available at the samelevel of geographical resolution as the intervention.Monitoring to detect spatial variations in disease activityalso requires indicators at a level of geographicalresolution that is consistent with the scale of clustering.

System• Easy to accessVUsers of the system must be able to find

indicators of interest with minimal effort. Experience inmedical informatics, and other fields, has clearlydemonstrated that people will not adopt a system if theycannot accomplish necessary tasks without disruptingtheir workflow [15]. This requirement means that theremust be an intuitive approach to interacting with thesystem and finding indicators that relate to commonlyunderstood health conditions.

• Explanation of indicatorsVUsers must be able tounderstand how indicators are calculated, including theelements used by algorithms (e.g., the type of data sourceused, diagnostic or medication codes used, number ofrecords with a diagnostic or medication code, and lengthof time). This level of exposition is necessary for users totrust the indicators in the system. In addition, usersshould be able to identify the evidence of the validity of aspecific indicator.

Overview of the architectureTo address the data and system requirements, we havedesigned an architecture that consists of three majorcomponents. The first component includes databases andinterfaces to other systems. The second component is anontology that provides a semantic framework for definingindicators and supporting interactive browsing of indicatorsby users. The third component is an application suite ofquery and analysis tools to allow users to analyze populationhealth indicators.Figure 1 illustrates the overall architecture of the

population health record system. Anonymized administrativedata are linked with clinical data and are stored in apatient-level database. Algorithms are applied to theseindividual data to classify patients according to indicators ofinterest. Individual data are then aggregated to estimateindicators at the population level, and these estimates arestored in the population-level database. An ontology is usedto capture the domain knowledge for indicators, by logicallydefining the concepts, instances, and relationships. Thishighly structured approach to defining how indicators areconstructed and what indicators mean facilitatesidentification and interpretation of indictors by humans andautomated calculation of indicators in software. In addition,because indicators are defined using logic, the ontologysupports queries to ensure the integrity of indicatordefinitions as the system grows, and it also enables reasoningto discover new knowledge.

Data sourcesA main requirement of the infrastructure is that the indicatorsmust be representative of the population. To achieve thisrequirement, we use as the foundation for our data thebeneficiary file from RAMQ (the Regie de l’AssuranceMaladie du Quebec), which is the provincial health insuranceagency in Quebec. This beneficiary file includes 99% ofresidents in the province and is therefore representative ofthe entire population. Using this file, we selected a 25%random sample of people living in the Montreal CensusMetropolitan Area (CMA) for the years 1996 to the present.For sampled individuals, administrative data (i.e., physicianbilling, drugs dispensed, and hospitalization records),death certificates, and laboratory test results can be updatedfrequently, as suggested in Figure 1. Data describing the

D. L. BUCKERIDGE ET AL. 2 : 3IBM J. RES. & DEV. VOL. 56 NO. 5 PAPER 2 SEPTEMBER/OCTOBER 2012

Page 4: An infrastructure for real-time population health

population at risk are obtained from the Canadian Census,and data describing determinants of health can be obtainedfrom various sources. Our design calls for cohortmembership to be updated twice yearly, sampling newindividuals to replace those lost to death and out-migration.In the remainder of this section, we give a brief description ofeach of the data sources.

Administrative and vital statistics dataPhysician services claims and prescription medication claimsdata are obtained directly from systems used to processpayment claims maintained by RAMQ. Hospitalizationdischarge abstract data are obtained from MED-ECHO(Maintenance et exploitation des donnees pour l’etude de laclientele hospitaliere), which is a provincial registry for thesedata, and death certificate data are obtained directly fromthe ISQ (Institut de la statistique de Quebec), which is theprovincial vital statistics agency. The physician servicesand prescription medication claims data files includedemographic information such as age and gender, thediagnostic and procedure codes associated with claims for

ambulatory and inpatient care, and the specific medicationsthat were dispensed to the patients by outpatient pharmacies.The claims data use standardized codes includingInternational Statistical Classification of Diseases andRelated Health Problems, 9th and 10th Revisions (ICD-9 andICD-10, respectively), Drug Identification Number (DIN),and American Hospital Formulary Service (AHFS**). Noneof these data sources includes patient identifiers such asname, Social Insurance Number, or exact address.

Clinical dataLaboratory test results from microbiology and clinicalchemistry labs in Montreal hospitals can be updated routinelythrough interfaces to hospital information systems or via aregional repository. Clinical and administrative data must belinked at the individual level. In the administrative data,cohort members are uniquely identified by a uniqueencrypted code, but in the clinical data held by hospitals,individuals are identified by an unencrypted code. To addressthis problem, our design uses a web service operated byRAMQ that returns an encrypted identifier for an

Figure 1

Schematic of a population health record demonstrating data flow and communications. Anonymized administrative data are linked with clinical datafrom hospital laboratories (labeled BH[ in this diagram) and stored in a patient-level database (labeled BIndividual data[) in our system. Algorithms areapplied to these individual data to classify individuals according to indicators of interest. Individual data are then aggregated to estimate population-levelindicators, which are stored in the population-level database. An ontology is used to define indicators and support interactive browsing of indicators byusers. Note that the two unlabeled boxes at the bottom, with double arrows to BLab,[ are look-up tables or a web service to identify an anonymizedidentifier for nominal data. Solid-line arrows correspond to a timing of every 2 weeks (or real-time), and dashed-line arrows refer to a timing ofevery year.

2 : 4 D. L. BUCKERIDGE ET AL. IBM J. RES. & DEV. VOL. 56 NO. 5 PAPER 2 SEPTEMBER/OCTOBER 2012

Page 5: An infrastructure for real-time population health

unencrypted identifier. Hospital laboratories or regionalrepositories can use this service to replace the encryptedidentifier with the unencrypted identifier in clinical databefore sending data to the population health recordinfrastructure.

Spatial and census dataData from the Canadian Census and other sources are used todefine geographical boundaries of administrative regions andto provide demographic data describing the populationsresiding within these regions. These data enable populationhealth assessments with a flexible spatial resolution,including postal codes, census tracts, local health regionboundaries, neighborhoods, and urban/rural areas.

Population health indicatorsMany indicators are available to describe the health andhealthcare utilization of a population. In our initial work,we have focused on health indicators related to determinantsand outcomes of obesity. The indicators measuredifferent aspects of diabetes, hypertension, ischemic heartdisease, and stroke. They span a continuum from diseaseburden (e.g., incidence, prevalence, and mortality),to measures of therapy (e.g., prescription, persistence, andtreatment risk factors), to outcomes (e.g., complicationsand hospitalizations), to preventive measures(e.g., disease-related screening). In our infrastructure, theseindicators are defined in an ontology (see the nextsubsection) that enables data and knowledge integration,facilitates knowledge discovery and exploration by users,

and serves as a computable repository of knowledge fordriving data manipulation and analysis.Since a public health indicator is generally expressed

as a ratio, we have defined three components of an algorithmfor calculating an indicator in the population health record.These include the following: 1) the algorithm component thatidentifies individual cases of a disease, 2) the algorithmcomponent that calculates the numerator (case counts) of anindicator, and 3) the algorithm component that calculates thedenominator (population at risk of a condition) of anindicator. Taken together, these components create analgorithm that can be applied to administrative, clinical, orlinked data to identify patients diagnosed with chronicdiseases such as diabetes and hypertension. Table 1 shows atypical example of an algorithm to identify eligible patientsfrom claims data and to define the numerator of anindicator for prevalence of diabetes. Algorithms can varywith respect to the type of data sources they use, thetime window they consider to determine a condition,demographic constraints to extract a subpopulation, andthe required frequency of codes in a time window for eachdata source. For instance, by looking at the medicationhistory and searching for a specific number of prescriptions,or by looking at the physician visit history and searchingfor a particular number of diagnostic codes or laboratorytest results within a certain time period, an algorithmmight identify a person as a diabetic patient. Changing theelements of an algorithm can change the estimate of anindicator. Figure 2 shows the results of 20 differentalgorithms for estimating the prevalence of diabetes. These20 algorithms differ in terms of three parameters: data

Table 1 Profile of an example indicator (Ind234) for the prevalence of diabetes. The denominator, numerator, andeligible case-finding algorithms are a23, d11, and c54, respectively. The þ sign next to each algorithm shows that therelated column can be expanded to describe more information about the algorithm (e.g., data sources used, duration oftime, and repetition allowed). The profile of an indicator also describes the actual codes related to all the data sourcesused in all the algorithms for that indicator. In this example, diagnostic codes (ICD-9) and drug codes (ATC) are the codesrelated to algorithms for Ind234 and are presented with their actual values for this indicator.

D. L. BUCKERIDGE ET AL. 2 : 5IBM J. RES. & DEV. VOL. 56 NO. 5 PAPER 2 SEPTEMBER/OCTOBER 2012

Page 6: An infrastructure for real-time population health

sources considered, time window examined, and frequencyof codes required.For any condition, there are several indicators, each of

which can be estimated by many different algorithms.On average, we include in the system 10 algorithms forestimating an indicator, selected on the basis of evidencepublished in scientific and gray literature (e.g., reports thatare not peer-reviewed and are prepared by governmentagencies and research groups). The system is designed toallow users to access the profile of an indicator and inspectthe algorithms used to estimate an indicator, to accessindicator values for the entire population or any desiredsubpopulation, as well as to access the hierarchicalrelationship of the indicator in the ontology. Theserelationships describe the data sources used in algorithms todefine the indicator, the codes in the data sources, thereferences in the literature, and the accuracy of the indicatorif it has been validated.

Indicator ontologyIn general, ontologies define basic concepts in a domainand relations among their instances in a common vocabulary

for researchers and others who need to share information[16, 17]. In the context of public health indicators, codingschemes used for diagnosis of a disease, for drugs prescribedand dispensed, or for medical procedures can varydemographically, geographically, or over time. Therefore,having a representation of concepts rather thansystem-specific details supports information sharing andreusability. We classified indicators into four majorcategories, as illustrated in Figure 3. These categoriesinclude indicators representing measures of disease burden(e.g., incidence, prevalence, and mortality), measures oftherapy (e.g., rate at which medications appropriate for acondition are prescribed or dispensed, compliance, andtreatment risk factors), outcome measures (e.g., rate ofdifferent complications, risk of hospitalization), andpreventive measures (e.g., rate at which physicians performappropriate preventive actions, such as requesting screeningtests or prescribing preventive medications for patients).In addition to creating a taxonomy of public health

indicators, which facilitates finding information andrealizing indicator values, the ontology encapsulates allconcepts, conceptual models, instance data, axioms, and

Figure 2

Indicator values for diabetes prevalence in Montreal metropolitan area in 2006 calculated using 20 different algorithms defined by the data sources theydraw upon (on the y-axis, BRx[ is dispensed medications; BMD Bill[ is physician billing records; and BAdmit[ is discharge abstracts from hospitaladmissions). The algorithms are also defined by the historical duration searched in the database for each data source (duration: 1 year; duration: 3 years)and the number of times an eligible code had to appear in the related data source during the historical duration (91 code; 92 codes).

2 : 6 D. L. BUCKERIDGE ET AL. IBM J. RES. & DEV. VOL. 56 NO. 5 PAPER 2 SEPTEMBER/OCTOBER 2012

Page 7: An infrastructure for real-time population health

relationships used to estimate public health indicators. Theontology defines the semantic relationship between differentcomponents and regulates their interactions through a set ofaxioms and rules. Defining appropriate rules and axioms(e.g., cardinality restrictions, which constrain indicator values)along with different associated relationships between theindividuals of those categories supports querying andinference using a set of logical reasoners (e.g., RACER [18]and Pellet [19]). The indicator ontology has been developedusing OWL 2 Web Ontology Language [20], which providesrich semantics to represent the expressive structure of theontological framework. Employing OWL 2 and DescriptionLogics (DL) enables us to apply different reasoning andinference tasks (i.e., subsumption and concept satisfiability)under the OWL 2 Direct Semantics [21].

Data security and privacyWe have obtained permission from the McGill UniversityFaculty of Medicine Institution Review Board to implementour system prototype and from the Quebec Commissiond’acces a l’information (CAI) to obtain and link the datasources used in our system prototype. A proposal to extendour data holdings to implement the full system design iscurrently under review by the CAI.

The system architecture defines a clear separation betweenindividual records and population-level data expressed asindicators. Users do not have access to individual recordsand can access only the indicator values. There still remains arisk that an individual might be identified statisticallyfrom population-level indicators, and we are examining thepossibility of placing constraints on the resolution withwhich certain indicators can be accessed through the system.

Current implementation and initial resultsWe are developing a prototype implementation of thearchitecture described above. In the current prototype,administrative data are assembled for the cohort for 11 years(1996–2006), individual and aggregate data models areestablished, an initial ontology has been created, and we havedefined algorithms and calculated estimates of diabetesprevalence among individuals 19 years of age and olderliving in the Montreal CMA for 2006 (see Figure 2).An example indicator definition is described in terms of

the ontology in Figure 4, and the specific codes used for theindicator are shown in Table 1. The indicator algorithmsare defined in terms of codes for diabetes from physicianbilling and hospital admissions (ICD-9) and dispensedprescription medications for treating diabetes [Anatomical

Figure 3

Representation of major high-level components (boxes) and their relationships (links) in the public health indicators ontology. The relationships includethe ones describing high-level to low-level concepts, as well as functional and temporal ones.

D. L. BUCKERIDGE ET AL. 2 : 7IBM J. RES. & DEV. VOL. 56 NO. 5 PAPER 2 SEPTEMBER/OCTOBER 2012

Page 8: An infrastructure for real-time population health

Therapeutic Chemical (ATC) Classification System],including insulin, analogs, and other blood glucose-loweringdrugs. The indicator definition also excludes gestationaldiabetes, which has a separate diagnostic code but may notbe properly coded. These indicators of diabetes do notdiscriminate between type 1 and type 2 diabetes mellitusbecause the ICD-9 coding system does not distinguishbetween the two types of diabetes. Many of the indicatoralgorithms we use to identify cases of diabetes have beendescribed in the literature [22, 23].The preliminary results shown in Figure 2 indicate that

the use of different data sources for identifying individualswith diabetes can have a considerable impact on thepopulation estimate of disease prevalence. Prescription drugsappear to contribute the most to case detection for thisindicator, although the inclusion of physician billing codesand hospital discharge codes increases the estimatedprevalence. Looking back for the presence of drugs anddiagnostic codes over 3 years as opposed to 1 year alsogave higher prevalence. Similarly, requiring greater than

one occurrence of a code gave a higher prevalence thanrequiring more than two occurrences of a code. We arecurrently conducting research to validate these and otherindicators.

Discussion and conclusionThis early report describes the design and prototypeimplementation of a novel public health informaticsinfrastructure to integrate clinical, administrative, and otherdata sources to produce estimates of population health.The architecture addresses requirements of public healthusers for easy access to indicators that are timely,representative, and available at a high geographicalresolution. The infrastructure is intended to enhanceinformation retrieval, analysis, and decision-making relatedto describing and monitoring populations and evaluatinginterventions. More importantly, this platform is designed toplay a central role in enhancing preparedness andevaluating health system effectiveness in response tononcommunicable diseases.

Figure 4

Instance of the ontology that presents an algorithm for finding prevalent cases (case-finding type algorithm) of diabetes ðdiagnostic code ¼ 250Þ in2006 for adults ðDate of Birth 9 1987Þ who have had more than two (repetition constraint) physician visits (in the data source for physician visits andtable db-study27.bill) in 3 years (2004–2006). A class is a general level concept, a subclass is a concept that has more specific attributes than a classconcept, and an individual is an instance of a class or subclass with specific values assigned to its attributes.

2 : 8 D. L. BUCKERIDGE ET AL. IBM J. RES. & DEV. VOL. 56 NO. 5 PAPER 2 SEPTEMBER/OCTOBER 2012

Page 9: An infrastructure for real-time population health

The architecture also includes a semantic layer, built ontop of the indicator definitions and the clinical andadministrative data. This semantic layer makes it possible toemploy logical reasoning to reveal implicit relationships,hidden dependencies, inconsistencies, and unsatisfiabilitywithin the semantic infrastructure. This feature is alsointended to facilitate the evaluation and assessment ofinherently complex, dynamic, and interrelated healthcaresystems. The indicator ontology should also enable us tomanage scalability/complexity of our integrated system viareusable modules.The prototype implementation of our architecture using

administrative data demonstrates that for one chroniccondition, the accuracy of indicator estimates differs by datasources. Our estimates of diabetes mellitus prevalence rangedfrom 1% to nearly 5%, depending on the algorithm used.Self-reported prevalence of diabetes in Canada in 2006 wasapproximately 5% but was lower in Quebec [24]. We expectthe inclusion of clinical data, such as laboratory test results,to improve the accuracy of case detection. While theindicators do not require validation prior to incorporationwithin the infrastructure, we are working to validateindicators against external measures such as health surveysand electronic medical records.Performance characteristics of the platform and evaluation

of the effect of the infrastructure on decision-making andoutcomes are areas of focus for future research. Althoughthere is evidence in the literature regarding the acceptabilityof population health information systems in clinical practice,future research will also elicit the requirements for thesepopulation data in clinical decision-making and the potentialeffects on the overall performance of the health system.While the infrastructure is ultimately intended to improve

the measurement of population health, it will also provide aplatform for public health informatics research. Initialresearch areas that we have identified include the following:evaluating the effect of different data linkage strategies on theaccuracy and usability of population health indicators forcomplex combinations of disease status, evaluating the effectof using multiple clinical and administrative data sourceson the accuracy of population health indicators, andevaluating the effect of access to health indicators on workpatterns and productivity in public health settings. Otherresearch avenues that we are pursuing include the use ofmachine learning to identify cases and the efficient dynamicidentification of cases following data updates.We anticipate a number of challenges in advancing our

population health record infrastructure. There are limitationsto the data sources and coding systems used in terms ofaccurately identifying from our cohort cases or individualsthat are at risk. For example, not all individuals may haveaccess to the same prescription medications, and this variableeligibility must be accounted for when using prescriptiondrug data within algorithms. In addition, little evidence

currently exists in scientific and gray literature facilitating thedevelopment of indicators beyond measures of diseaseburden (i.e., prevalence, incidence, and mortality) andacross a range of chronic conditions. The developmentand validation of indicators relevant to rare or comorbidconditions may be limited by statistical power, despiteour large overall sample size. Finally, there are tradeoffsbetween providing indicator estimates with a high degreeof geographic and demographic resolution and respectingprivacy so that individuals are not statistically identifiablefrom our infrastructure. Despite these challenges, ourinfrastructure holds considerable promise for fulfilling theneeds of public health users and others while providing aplatform for advancing informatics research.

AcknowledgmentThe Canadian Foundation for Innovation funded thedevelopment of the research infrastructure.

**Trademark, service mark, or registered trademark of AmericanSociety of Hospital Pharmacists, Inc., in the United States, othercountries, or both.

References1. D. Blumenthal and M. Tavenner, BThe Fmeaningful use_ regulation

for electronic health records,[ N. Engl. J. Med., vol. 363, no. 6,pp. 501–504, Aug. 2010.

2. J. Walker, E. Pan, D. Johnston, J. Adler-Milstein, D. W. Bates, andB. Middleton, BThe value of health care information exchangeand interoperability,[ Health Affairs (Project Hope),vol. Suppl Web, pp. W5-10–W5-18, 2005.

3. Board of Directors of the American Medical InformaticsAssociation. ‘‘A proposal to improve quality, increaseefficiency, and expand access in the U.S. health care system,[JAMIA, vol. 4, no. 5, pp. 340–341, Sep./Oct. 1997.

4. D. J. Friedman and R. G. Parrish, BThe population health record:Concepts, definition, design, and implementation,[ JAMIA,vol. 17, no. 4, pp. 359–366, Jul/Aug. 2010.

5. D. E. Crews and L. M. Gerber, BChronic degenerative diseases andaging,[ in Biological Anthropology and Aging: Perspectives onHuman Variation Over the Life Span, D.E. Crews and Garruto,Eds. New York: Oxford Univ. Press, 1994, pp. 174–208.

6. W. Luo, H. Morrison, M. de Groh, C. Waters, M. DesMeules,E. Jones-McLean, A.-M. Ugnat, S. Desjardins, M. Lim, andY. Mao, BThe burden of adult obesity in Canada,[ J. Chronic Dis.Canada, vol. 27, no. 4, pp. 134–144, 2007.

7. M. Mirolla. (2004, Jan.). The cost of chronic diseases in Canada.The Chronic Disease Prevention Alliance of Canada. Available:http://www.gpiatlantic.org/pdf/health/chroniccanada.pdf

8. R. W. Sanson-Fisher, E. M. Campbell, A. T. Htun, L. J. Bailey,and C. J. Millar, BWe are what we do: Research outputs of publichealth,[ Am. J. Preventive Med., vol. 35, no. 4, pp. 380–385,Oct. 2008.

9. The Polis Center. SAVI Community Information System, IndianaUniv.–Purdue Univ. Indianapolis (IUPUI), Mar. 2004. [Online].Available: http://www.savi.org

10. The Chronic Disease Infobase Website, Public Health AgencyCanada, Oct. 2010. [Online]. Available: http://www.infobase.phac-aspc.gc.ca

11. Interactive Health Data Application, Gov. Alberta. [Online].Available: www.health.alberta.ca/health-info/IHDA.html

12. Gov. Quebec. Espace Montrealais d’information Sur la Sante(EMIS). [Online]. Available: emis.santemontreal.qc.ca/outils/atlas-sante-montreal/

D. L. BUCKERIDGE ET AL. 2 : 9IBM J. RES. & DEV. VOL. 56 NO. 5 PAPER 2 SEPTEMBER/OCTOBER 2012

Page 10: An infrastructure for real-time population health

13. P. H. Gesteland, M. A. Allison, C. J. Staes, M. H. Samore,M. A. Rubin, M. E. Carter, A. Wuthrich, A. Kinney, S. Mottice,and C. Byington, BClinician use and acceptance ofpopulation-based data about respiratory pathogens: Implicationsfor enhancing population-based clinical practice,[ in Proc. AMIAAnnu. Symp., Jan. 2008, pp. 232–236.

14. W. Yasnoff, P. O’Carroll, D. Koo, R. Linkins, and E. Kilbourne,BPublic health informatics: Improving and transforming publichealth in the information age,[ J. Public Health Manage. Pract.,vol. 6, no. 6, pp. 67–75, Nov. 2000.

15. J. Horsky, D. Kaufman, and V. L. Patel, BWhen you come to a forkin the road, take it: Strategy selection in order entry,[ in Proc.AMIA Annu. Symp., 2005, pp. 350–354.

16. T. Gruber, BA translation approach to portable ontologyspecification,[ Knowl. Acquisition, vol. 5, pp. 199–220, 1993.

17. M. Musen, BDimensions of knowledge sharing and reuse,[Comput. Biomed. Res Int. J., vol. 25, no. 5, pp. 435–467,Oct. 1992.

18. V. Haarslev and R. Moller, BRACER system description,[ in Proc.1st IJCAR, 2001, pp. 701–706.

19. Pellet: OWL 2 Reasoner for Java. [Online]. Available: http://clarkparsia.com/pellet

20. OWL 2 Web Ontology Language, 2009. [Online]. Available:http://www.w3.org/TR/owl2-overview/

21. I. Horrocks, B. Parsia, and U. Sattler, OWL 2 Web OntologyLanguage Direct Semantics, 2009. [Online]. Available: http://www.w3.org/TR/2009/REC-owl2-direct-semantics-20091027/

22. J. Blanchard, S. Ludwig, A. Wajda, H. Dean, K. Anderson, andO. Kendall, BIncidence and prevalence of diabetes in Manitoba,1986–1991,[ Diabetes Care, vol. 19, no. 8, pp. 807–811,Aug. 1996.

23. L. Lix, M. Yogendran, C. Burchill, C. Metge, N. McKeen,D. Moore, and R. Bond, Defining and Validating ChronicDiseases: An Administrative Data Approach. Winnipeg, MB:Manitoba Centre Health Policy, 2006. [Online]. Available: http://mchp-appserv.cpe.umanitoba.ca/reference/chronic.disease.pdf.

24. Public Health Agency Canada. (2008). Diabetes in Canada:Highlights from the national diabetes surveillance system,2004–2005, PHAC, Ottawa, ON, Canada, PHAC Rep. [Online].Available: http://www.ndss.gc.ca

Received September 16, 2011; accepted for publicationNovember 9, 2011.

David L. Buckeridge Surveillance Lab, Clinical and HealthInformatics Research Group, McGill University, Montreal, QCH3A 1A3 Canada ([email protected]). Dr. Buckeridge isan Associate Professor of Epidemiology and Biostatistics at McGillUniversity in Montreal where he holds a Canada Research Chair inPublic Health Informatics. He is also a Medical Consultant to theInstitut national de sante publique du Quebec and the Direction de santepublique de Montreal. His research focuses on public health informaticsand particularly on the informatics of public health surveillance.Current research projects include developing and evaluating systems forautomated surveillance in community and hospital settings. In additionto his research activities, Dr. Buckeridge provides advice onsurveillance to groups such as the World Health Organization, theInstitute of Medicine, and the U.S. and Chinese Centers for DiseaseControl and Prevention. He has an M.D. degree from Queen’sUniversity in Canada, an M.Sc. degree in epidemiology from theUniversity of Toronto, and a Ph.D. degree in biomedical informaticsfrom Stanford University. He is also a Fellow of the Royal College ofPhysicians and Surgeons of Canada, with specialty training in PublicHealth and Preventive Medicine.

Masoumeh Izadi Surveillance Lab, Clinical and HealthInformatics Research Group, McGill University, Montreal, QCH3A 1A3 Canada ([email protected]). Dr. Izadi is a Research StaffMember in the Surveillance Lab at the Health Informatics ResearchGroup. She received a Ph.D. degree in artificial intelligence fromMcGill University, an M.Sc. degree in advanced computing from

King’s College University of London, and a B.Sc. degree in appliedmathematics from the University of Tehran. Her research contributionsin the investigation, development, and application of computationaltheories of reasoning and decision-making under uncertainty and inapplications of artificial intelligence in health informatics and medicinehave been published in various journals and conferences. She is amember of editorial board of Journal of Bioterrorism and Biodefense.

Arash Shaban-Nejad McGill Clinical and Health InformaticsGroup, Department of Epidemiology, Biostatistics and OccupationalHealth, Faculty of Medicine, McGill University, Montreal, QCH3A 1A3 Canada ([email protected]). Dr. Shaban-Nejadis a Postdoctoral Fellow at the McGill Clinical and Health InformaticsCentre at McGill University, where he conducts research on theknowledge representation and modeling for different applications inhealthcare. He received M.S. and Ph.D. degrees in computer sciencefrom Concordia University in 2005 and 2010, respectively. His primaryresearch interest is knowledge representation and Semantic Web,particularly ontologies and knowledge bases, description logics,category theory, agents, reasoning, and inferencing, with specialemphasis on applications from health and biomedical domains.His Ph.D. dissertation focused on change management in distributedbiomedical ontologies through a multi-agent-based framework,formalized using category theory. He is author or coauthor of severalscientific papers and technical manuscripts, which have appeared invarious international journals, conference/workshop proceedings,and book chapters. Dr. Shaban-Nejad is a member of the Institute ofElectrical and Electronics Engineers (IEEE), the Association forComputing Machinery (ACM), the National Institutes of HealthInformatics (NIHI), and the American Medical Informatics Association(AMIA).

Luke Mondor Surveillance Lab, McGill Clinical and HealthInformatics Research Group, McGill University, Montreal, QCH3A 1A3 Canada ([email protected]). Mr. Mondor is aResearch Assistant at the Surveillance Lab, part of McGill University’sClinical and Health Informatics Research Group. He received a B.A. degreein anthropology from the University of Manitoba in 2008 and recentlycompleted an M.Sc. in epidemiology from McGill University.

Christian Jauvin Surveillance Lab, McGill Clinical and HealthInformatics Research Group, McGill University, Montreal, QCH3A 1A3 Canada ([email protected]). Mr. Jauvin received an M.Sc.degree in computer science and machine learning from Universite deMontreal in 2003. Since then, he has focused mainly on scientificcomputing and is now a full-time independent researcher andconsultant, working in different scientific environments, on manytopics including geographic information systems, high-performancecomputing, databases, machine learning, and visualization.

Laurette Dube McGill University, Montreal, QC H3A 1G5Canada ([email protected]). Dr. Dube is Full Professor andholds the James McGill Chair of consumer and lifestyle psychology andmarketing at the Desautels Faculty of Management of McGillUniversity, which she joined in 1995. She is also the Founding Chairand Scientific Director of the McGill World Platform for Health andEconomic Convergence, a unique initiative to promote convergencebetween academic disciplines, innovations, and initiatives byindividuals, communities, enterprises, and governments, at all levels,to address the most pressing societal problems through systemicchange. Dr. Dube’s lifetime research interest bears on the study of whataffects and behavioral economic processes underlie consumption andlifestyle behavior and how such knowledge can inspire more effectivehealth and marketing innovations. A leader in marketing researchin Canada, and among the world leaders in her areas of expertise,she has made outstanding theoretical contributions while workingcreatively and rigorously on issues of managerial and socialsignificance. Beyond scientific publications in books and in leadingscientific journals of her field, including Journal of Consumer Research

2 : 10 D. L. BUCKERIDGE ET AL. IBM J. RES. & DEV. VOL. 56 NO. 5 PAPER 2 SEPTEMBER/OCTOBER 2012

Page 11: An infrastructure for real-time population health

and Journal of Marketing Research, her work has been presented inleading general audience and business publications such as Maclean’s,The Globe and Mail, USA Today, and The Wall Street Journal.Dr. Dube received her Ph.D. degree in marketing from CornellUniversity in Ithaca, New York (1990). She has an M.P.S. degree inservices marketing management from the same university, an M.B.A.degree in finance from HEC Montreal, Montreal, Quebec, Canada, anda B.Sc. degree in health sciences (nutrition) from Laval University,Quebec, Canada, with 10 years of experience in hospital management.She holds a research career award jointly funded by the CanadianInstitutes of Health Research and the Social Sciences and HumanitiesResearch Council of Canada. She is a Fellow of the Royal Societyof Canada.

Yeona Jang McGill University, Desautels Faculty ofManagement, Montreal, QC, H3A 1G5 Canada([email protected]). Dr. Jang is a professor of practice inmanagement in the Desautels Faculty of Management at McGillUniversity. She received M.S. and Ph.D. degrees in computer sciencefrom MIT and an M.S. degree in business management from the MITSloan School of Management. She has more than 15 years ofexperience as a senior IT (information technology) executive anddecision-maker in various companies, in innovative use of IT inbusiness. She is experienced in various aspects of IT management,including IT-enabled business innovation, large-scale systemsintegration, enterprise architecture, large-scale data warehouse andbusiness intelligence, and knowledge management. She joined theDesautels Faculty of Management at McGill University in June 2008.Her current research interest in health IT includes advancing researchin medical informatics and business models for electronic healthrecords to improve patient safety and health outcomes. Dr. Jang is amember of the Canada/U.S. Research on Healthcare IT team incollaboration with McGill Medical Department, University of OttawaMedical Department, and Brigham and Women’s Hospital/HarvardMedical School, and an international consortia of the SclerodermaPatient-centered Intervention Network.

Robyn Tamblyn Departments of Medicine and Epidemiology andBiostatistics, McGill University, Faculty of Medicine, Montreal, QCH3A 1A3 Canada ([email protected]). Dr. Tamblyn is aProfessor in the Department of Medicine and the Department ofEpidemiology and Biostatistics at McGill University. She is a JamesMcGill Chair, a Medical Scientist at the McGill University HealthCenter Research Institute, and the Scientific Director of the Clinicaland Health Informatics Research Group at McGill University.Dr. Tamblyn’s ground-breaking research on educational outcomes haselucidated important relationships between health professional training,licensure, and practice that have subsequently guided credentialingpolicies. Her work on prescription drug use, its determinants, andcomputerized interventions to improve drug safety [Medical Office ofthe XXIst Century (MOXXI)] have been recognized internationally.She leads a team funded by the Canadian Institutes of Health Researchto investigate the use of e-health technologies to support integrated carefor chronic disease, and she co-leads a Canadian Foundation forInnovation Informatics Innovation Laboratory to create advancedtechnologies to monitor adverse events in populations and create newtools to improve the safety and effectiveness of healthcare. Her work ispublished in the Journal of the American Medical Association, theAnnals of Internal Medicine, the British Medical Journal, MedicalCare, and Health Services Research, among others. She has receivedthe Canadian Health Services Research Foundation KnowledgeTranslation award for her research in improving the use of medication,as well as the L’Association canadienne-française pour l’avancementdes sciences (ACFAS) Bombardier award for innovation in thedevelopment of a computerized drug management system. As ofJanuary 2011, she became the Scientific Director of the Institute ofHealth Services and Policy Research at the Canadian Institutes ofHealth Research.

D. L. BUCKERIDGE ET AL. 2 : 11IBM J. RES. & DEV. VOL. 56 NO. 5 PAPER 2 SEPTEMBER/OCTOBER 2012