Public Health Ontology 101

Mark A. Musen, M.D., Ph.D.
Stanford Center for Biomedical Informatics Research
Stanford University School of Medicine

Die Seuche (The Plague), A. Paul Weber, Courtesy of the NLM

Public-Health-Ontology--MarkMusen_Jun-2008a.pptx - Generating Test



Many Factors can Influence the Effectiveness of Outbreak Detection

• Progression of disease within individuals

• Population and exposure characteristics

• Surveillance system characteristics

From Buehler et al. EID 2003;9:1197-1204

Computational Challenges

• Access to data

• Interpretation of data

• Integration of data

• Identification of appropriate analytic methods

• Coordination of problem solving to address diverse data sources

• Determining what to report

Interpretation of data

• Clinical data useful for public health surveillance are often collected for other purposes (e.g., diagnostic codes for patient care, billing)

• Such data may be biased by a variety of factors
  – Desire to protect the patient
  – Desire to maximize reimbursement
  – Desire to satisfy administrative requirements with minimal effort

• Use of diagnostic codes is problematic because precise definitions generally are unknown, both to humans and computers

A Small Portion of ICD9-CM

724     Unspecified disorders of the back
724.0   Spinal stenosis, other than cervical
724.00  Spinal stenosis, unspecified region
724.01  Spinal stenosis, thoracic region
724.02  Spinal stenosis, lumbar region
724.09  Spinal stenosis, other
724.1   Pain in thoracic spine
724.2   Lumbago
724.3   Sciatica
724.4   Thoracic or lumbosacral neuritis
724.5   Backache, unspecified
724.6   Disorders of sacrum
724.7   Disorders of coccyx
724.70  Unspecified disorder of coccyx
724.71  Hypermobility of coccyx
724.79  Coccygodynia
724.8   Other symptoms referable to back
724.9   Other unspecified back disorders

The combinatorial explosion

1970s ICD9: 8 codes

ICD10 (1999): 587 codes for such accidents

• V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income

• W65.40 Drowning and submersion while in bathtub, street and highway, while engaged in sports activity

• X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities

Syndromic Surveillance

• Requires enumeration of relevant “syndromes”

• Requires mapping of codes (usually in ICD9) to corresponding syndromes

• Is complicated by the difficulty of enumerating all codes that appropriately support each syndrome

• Is complicated by lack of consensus on what the “right” syndromes are in the first place
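
The code-to-syndrome mapping step can be sketched in a few lines of Python. This is a minimal illustration, not a validated syndrome definition: the syndrome names, the code lists in `SYNDROME_CODES`, and the prefix-matching rule are all assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical syndrome definitions: each syndrome lists ICD9 code prefixes.
# The groupings are illustrative only and are not an endorsed standard.
SYNDROME_CODES = {
    "respiratory illness with fever": ["486", "487", "780.6", "786.2"],
    "gastrointestinal illness": ["558.9", "787.0"],
}


def syndromes_for(code: str) -> list[str]:
    """Return every syndrome whose prefix list matches the given ICD9 code."""
    return [
        name
        for name, prefixes in SYNDROME_CODES.items()
        if any(code.startswith(prefix) for prefix in prefixes)
    ]


def count_syndromes(visit_codes: list[str]) -> Counter:
    """Tally syndrome counts over a batch of visit diagnosis codes."""
    counts = Counter()
    for code in visit_codes:
        counts.update(syndromes_for(code))
    return counts
```

Even in this toy form the difficulty is visible: every code that should count toward a syndrome has to be listed by hand, and two systems that choose different lists will report different counts for the same visits.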

There is no consistency in how “syndromes” are defined or monitored

System                        Syndrome
ENCOMPASS                     “Respiratory illness with fever”
CDC MMWR 10/01                “Unexplained febrile illness associated with pneumonia”
RSVP, New Mexico              “Influenza-like illness”
Santa Clara County            “Flu-like symptoms”
Winter Olympics, Utah 2002    “Respiratory infection with fever” consensus definition

The solution to the terminology mess: Ontologies

• Machine-processable descriptions of what exists in some application area

• Allow computers to reason about
  – Concepts in the world
  – Attributes of concepts
  – Relationships among concepts

• Provide a foundation for
  – Intelligent computer systems
  – The Semantic Web

What Is An Ontology?

• The study of being

• A discipline co-opted by computer science to enable the explicit specification of
  – Entities
  – Properties and attributes of entities
  – Relationships among entities

• A theory that provides a common vocabulary for an application domain

Porphyry’s depiction of Aristotle’s Categories

Supreme genus:       SUBSTANCE
Differentiae:        material / immaterial
Subordinate genera:  BODY / SPIRIT
Differentiae:        animate / inanimate
Subordinate genera:  LIVING / MINERAL
Differentiae:        sensitive / insensitive
Proximate genera:    ANIMAL / PLANT
Differentiae:        rational / irrational
Species:             HUMAN / BEAST
Individuals:         Socrates, Plato, Aristotle, …

Parts of the heart (Foundational Model of Anatomy)

[Figure: graph of Is-a and Part-of relations among anatomical entities. Nodes: Heart, Cavity of Heart, Wall of Heart, Right Atrium, Cavity of Right Atrium, Wall of Right Atrium, Fossa Ovalis, Myocardium, Sinus Venarum, SA Node, Myocardium of Right Atrium, Cardiac Chamber, Hollow Viscus, Internal Feature, Organ Cavity, Organ Cavity Subdivision, Anatomical Spatial Entity, Anatomical Feature, Body Space, Organ Component, Organ Subdivision, Viscus, Organ Part, Organ, Anatomical Structure.]
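
To show how a program can reason over the Is-a and Part-of relations named above, here is a minimal sketch that stores each relation as a dictionary and follows it transitively. The specific links are assumptions made for illustration, using entity names from the slide; they are not copied from the FMA itself.

```python
# Hypothetical fragment of the hierarchy; edges are assumed for illustration.
IS_A = {
    "Right Atrium": "Cardiac Chamber",
    "Cardiac Chamber": "Organ Part",
    "Organ Part": "Anatomical Structure",
    "Heart": "Organ",
    "Organ": "Anatomical Structure",
}
PART_OF = {
    "Myocardium of Right Atrium": "Wall of Right Atrium",
    "Wall of Right Atrium": "Right Atrium",
    "Right Atrium": "Heart",
}


def ancestors(term: str, relation: dict[str, str]) -> list[str]:
    """Follow one relation upward from term, nearest ancestor first."""
    chain = []
    seen = {term}
    while term in relation:
        term = relation[term]
        if term in seen:  # guard against an accidental cycle
            break
        seen.add(term)
        chain.append(term)
    return chain


def is_part_of(part: str, whole: str) -> bool:
    """True if whole is reachable from part via Part-of links."""
    return whole in ancestors(part, PART_OF)
```

Keeping the two relations in separate structures matters: "Right Atrium is a Cardiac Chamber" and "Right Atrium is part of the Heart" support different inferences, and mixing them would let a query conclude that a part is a kind of its whole.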


The FMA demonstrates that distinctions are not universal

• Blood is not a tissue, but rather a body substance (like saliva or sweat)

• The pericardium is not part of the heart, but rather an organ in and of itself

• Each joint, each tendon, each piece of fascia is a separate organ

These views are not shared by many anatomists!

Why develop an ontology?

• To share a common understanding of the entities in a given domain
  – among people
  – among software agents
  – between people and software

• To enable reuse of data and information
  – to avoid re-inventing the wheel
  – to introduce standards to allow interoperability and automatic reasoning

• To create communities of researchers


We really want ontologies in electronic form

• Ontology contents can be processed and interpreted by computers

• Interactive tools can assist developers in ontology authoring


The NCI Thesaurus in Protégé-OWL

Goals of Biomedical Ontologies

• To provide a classification of biomedical entities

• To annotate data to enable summarization and comparison across databases

• To provide for semantic data integration

• To drive natural-language processing systems

• To simplify the engineering of complex software systems

• To provide a formal specification of biomedical knowledge

Biosurveillance Data Sources Ontology

Ontology defines how data should be accessed from the database

Ontologies: Good news and bad news

• The Good news
  – Ontologies allow computers to “understand” definitions of concepts and to relate concepts to one another
  – Automated inheritance of attributes makes it very easy to add new concepts to an ontology over time
  – Ontologies can be developed in standard knowledge-representation languages that have wide usage

• The Bad news
  – Most current biomedical ontologies have been developed using non-standard languages
  – It’s still very hard to get people to agree about the content of proposed ontologies
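
The point about automated inheritance of attributes can be illustrated with a small sketch: a new concept declares only its parent and whatever is specific to it, and everything else is looked up through the Is-a chain. The `Concept` class, the concept names, and the attributes are hypothetical and stand in for what a real knowledge-representation language would provide.

```python
class Concept:
    """A concept that inherits attributes from its parent along Is-a."""

    def __init__(self, name, parent=None, **attributes):
        self.name = name
        self.parent = parent
        self.own = attributes

    def get(self, attribute):
        """Look up an attribute locally, then up the Is-a chain."""
        node = self
        while node is not None:
            if attribute in node.own:
                return node.own[attribute]
            node = node.parent
        raise KeyError(attribute)


# Hypothetical concepts for illustration.
lab_result = Concept("Lab Result", has_units=True, source="Clinical Laboratory")
electrolyte = Concept("Electrolyte", parent=lab_result, panel="Serum Chemistry")
# Adding a new concept requires only its parent and its own attributes.
sodium = Concept("Sodium", parent=electrolyte, units="mmol/L")
```

Here `sodium.get("source")` is answered by the top-level concept even though `sodium` never mentions it, which is why adding concepts over time stays cheap.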


The Medical Entities Dictionary (after Cimino)

[Figure: the MED shown with Patient Registration, Clinical Laboratory, Radiology, and Pharmacy.]

Ontologies for data integration

[Figure: an ontology of patient data (Lab Result; Hematology, Serum Chemistry; Electrolytes, Aminotransferases; Sodium, HCO3) sits above Patient Database 1, Patient Database 2, and Patient Database 3. The local labels HCO3, Bicarbonate, and Bicarb used in the databases all correspond to the canonical data value HCO3.]
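
A minimal sketch of this idea: each database's local label for a lab test is mapped to one canonical term, so records from all three sources can be pooled. The database identifiers, the record layout, and which label each database uses are assumptions for illustration; only the labels themselves (HCO3, Bicarbonate, Bicarb) come from the slide.

```python
# Assumed mapping from (database, local label) to the canonical term.
LOCAL_TO_CANONICAL = {
    ("db1", "HCO3"): "HCO3",
    ("db2", "Bicarbonate"): "HCO3",
    ("db3", "Bicarb"): "HCO3",
}


def canonicalize(database: str, record: dict) -> dict:
    """Return a copy of the record with its test name made canonical."""
    key = (database, record["test"])
    if key not in LOCAL_TO_CANONICAL:
        raise KeyError(f"no canonical term for {key}")
    return {**record, "test": LOCAL_TO_CANONICAL[key], "source": database}


def integrate(feeds: dict[str, list[dict]]) -> list[dict]:
    """Merge records from several databases into one uniform list."""
    return [
        canonicalize(database, record)
        for database, records in feeds.items()
        for record in records
    ]
```

Raising an error on an unmapped label, rather than passing it through, is a deliberate choice in this sketch: silent pass-through would reintroduce the inconsistency the mapping is meant to remove.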


Different types of data require different types of problem solvers

• Are the data multivariate or univariate?

• Do the data involve temporal or spatial dimensions?

• Are the data categorical or probabilistic?

• Are the data acquired as a continuous stream or as a batch?

• Is it possible for temporal data to arrive out of order?

• What is the rate of data acquisition, and how much data needs to be processed?

An ontology of problem solvers for aberrancy detection

[Figure: decomposition of aberrancy detection into tasks and the methods that can perform them. Labels recoverable from the diagram: Aberrancy Detection (Temporal), Aberrancy Detection (Control Chart), Obtain Baseline Data, Obtain Current Observation, Transform Data, Estimate Model Parameters, Forecast, Compute Expectation, Compute Test Value, Compute Residual, Evaluate Test Value, Evaluate Residual, Database Query, Outlier Removal, Smoothing, Signal Processing Filter, Mean/StDev, Moving Average, Trend Estimation, GLM Model Fitting, ARIMA Model Fitting, Empirical Forecasting, GLM Forecasting, ARIMA Forecasting, Generalized Exponential Smoothing, Constant (theory-based), Residual-Based, Raw Residual, Z-Score, EWMA, Cumulative Sum, P-Value, Binary Alarm, Layered Alarm.]
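
One path through such a decomposition can be sketched as follows: obtain baseline data, compute an expectation from a moving window (mean and standard deviation), compute a z-score as the test value, and evaluate it as a binary alarm. This is a simplified illustration rather than BioSTORM's implementation; the window length, the threshold of 3.0, and the function names are assumptions.

```python
from statistics import mean, stdev


def compute_expectation(baseline):
    """Estimate the expected count and its spread from baseline data."""
    return mean(baseline), stdev(baseline)


def compute_test_value(observation, expected, spread):
    """Z-score of the current observation against the expectation."""
    if spread == 0:
        # Flat baseline: any excess over it is treated as extreme.
        return 0.0 if observation <= expected else float("inf")
    return (observation - expected) / spread


def binary_alarm(test_value, threshold=3.0):
    """Evaluate the test value: True means raise an alarm."""
    return test_value > threshold


def detect(series, window=7, threshold=3.0):
    """Run the detector over a series, one alarm flag per evaluable day."""
    alarms = []
    for t in range(window, len(series)):
        baseline = series[t - window:t]  # obtain baseline data
        expected, spread = compute_expectation(baseline)
        z = compute_test_value(series[t], expected, spread)
        alarms.append(binary_alarm(z, threshold))
    return alarms
```

Because each step is its own function, swapping in another method for one task (for example, a layered alarm in place of `binary_alarm`) leaves the rest of the pipeline untouched, which is the kind of substitution a task-method ontology is meant to describe.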

BioSTORM: A Prototype Next-Generation Surveillance System

• Developed at Stanford, initially with funding from DARPA, now from CDC

• Provides a test bed for evaluating alternative data sources and alternative problem solvers

• Demonstrates
  – Use of ontologies for data acquisition and data integration
  – Use of a high-performance computing system for scalable data analysis

BioSTORM Data Flow

[Figure: Data Sources feed the Data Regularization Middleware (Data Broker and Data Mapper), which feeds the Epidemic Detection Problem Solvers under a Control Structure. Heterogeneous Input Data becomes Semantically Uniform Data and then Customized Output Data. The Data Source Ontology and the Mapping Ontology support the middleware.]

Data Broker and Data Source Ontology

[Figure: Distributed Data Sources supply Heterogeneous Data Input to the Data Broker, which uses the Data Source Ontology to produce Semantically Uniform Data Objects.]
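
As a rough sketch of this arrangement, the per-source descriptions below play the role of the data source ontology: they tell a generic broker where each field lives in a source and what it should be called afterward. The source names, field names, and record layouts are hypothetical, and a real ontology would carry far richer metadata.

```python
# Hypothetical per-source descriptions: uniform field -> local field name.
DATA_SOURCE_DESCRIPTIONS = {
    "ed_visits": {"date": "visit_dt", "code": "icd9", "zip": "zipcode"},
    "clinic_feed": {"date": "DateSeen", "code": "Dx", "zip": "PatientZip"},
}


def broker(source_name, raw_records):
    """Yield semantically uniform records for one data source."""
    description = DATA_SOURCE_DESCRIPTIONS[source_name]
    for raw in raw_records:
        # Keep only the described fields, renamed to their uniform names.
        uniform = {field: raw[local] for field, local in description.items()}
        uniform["source"] = source_name
        yield uniform
```

Adding a new data source in this sketch means adding one entry to `DATA_SOURCE_DESCRIPTIONS`; the broker code itself does not change.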


Data Mapper and Mapping Ontology

[Figure: the Data Mapper takes Semantically Uniform Data Objects and, using the Mapping Ontology, produces Customized Data Objects for a Problem Solver. Labels also shown: Data Source Ontology, Input–Output Ontology.]
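
The data mapper's job can be sketched with a declarative mapping that states how to turn uniform records into the input a particular problem solver expects. Here the assumed solver wants a daily count series; the mapping format, the field names, and the `map_to_daily_counts` function are hypothetical.

```python
from collections import Counter

# Hypothetical mapping: which uniform field to group by and which codes count.
MAPPING = {"group_by": "date", "code_prefixes": ["486", "487"]}


def map_to_daily_counts(uniform_records, mapping):
    """Convert uniform records into (date, count) pairs sorted by date."""
    counts = Counter()
    for record in uniform_records:
        if any(record["code"].startswith(p) for p in mapping["code_prefixes"]):
            counts[record[mapping["group_by"]]] += 1
    return sorted(counts.items())
```

A solver that needs a different input shape would get its own mapping entry rather than its own copy of the data-access code.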

[Figure: the Data Mapper, using Mapping Ontologies and Input–Output Ontologies, converts Semantically Uniform Data Objects into Customized Data Objects for varying Problem Solvers.]


We need to address the challenges of automating surveillance

• Current surveillance systems
  – Require major reprogramming to add new data sources or new analytic methods
  – Lack the ability to select data sources and analytic methods dynamically based on problem-solving requirements
  – Ignore qualitative data and qualitative relationships
  – Will not scale up to the requirements of handling huge data feeds

• The existing health information infrastructure
  – Is all-too-often paper-based
  – Uses 19th century techniques for encoding knowledge about clinical conditions and situations
  – Remains fragmented, hindering data access and communication

The National Center for Biomedical Ontology

• One of three National Centers for Biomedical Computing launched by NIH in 2005

• Collaboration of Stanford, Berkeley, Mayo, Buffalo, Victoria, UCSF, Oregon, and Cambridge

• Primary goal is to make ontologies accessible and usable

• Research will develop technologies for ontology dissemination, indexing, alignment, and peer review


Our Center offers

• Technology for uploading, browsing, and using biomedical ontologies

• Methods to make the online “publication” of ontologies more like that of journal articles

• Tools to enable the biomedical community to put ontologies to work on a daily basis


http://bioportal.bioontology.org


Local Neighborhood view

Browsing/Visualizing Ontologies

BioPortal will experiment with new models for

• Dissemination of knowledge on the Web

• Integration and alignment of online content

• Knowledge visualization and cognitive support

• Peer review of online content

BioPortal is building an online community of users who

• Develop, upload, and apply ontologies

• Map ontologies to one another

• Comment on ontologies via “marginal notes” to give feedback
  – To the ontology developers
  – To one another

• Make proposals for specific changes to ontologies

• Stay informed about ontology changes and proposed changes via active feeds
