Searching and Exploring Biomedical Data Vagelis Hristidis School of Computing and Information Sciences Florida International University


Page 1: Searching and Exploring Biomedical Data

Searching and Exploring Biomedical Data

Vagelis Hristidis
School of Computing and Information Sciences
Florida International University

Page 2: Searching and Exploring Biomedical Data

Vagelis Hristidis, Searching and Exploring Biomedical Data 2

Roadmap
• Why is it challenging to search EMRs?
• XOntoRank: Leveraging ontologies to improve sensitivity in EMR search
• ObjectRank: Use authority flow to rank EMR entities
• BioNav: Using MeSH to explore the results of PubMed queries


Page 4: Searching and Exploring Biomedical Data

ELECTRONIC MEDICAL RECORDS (EMRs)
• Adoption of EMRs is hard due to political reasons
  ◦ No unique patient ID
  ◦ Confidentiality
  ◦ HIPAA (Health Insurance Portability and Accountability Act)
• Move towards XML-based formats. One of the most promising: Health Level 7’s Clinical Document Architecture (CDA).
• EMRs pose new challenges for computer scientists
  ◦ Confidentiality, authentication, secure exchange
  ◦ Storage, scalability
  ◦ Dictionaries, term disambiguation
  ◦ Search for interesting patterns (data mining)
  ◦ Data integration, schema mapping
  ◦ Searching and exploring

Page 5: Searching and Exploring Biomedical Data


SAMPLE CDA FRAGMENT

Page 6: Searching and Exploring Biomedical Data


CDA Document – Tree View
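The original figure did not survive extraction. As a rough stand-in, the sketch below prints a tree view of a small XML fragment. The fragment is hypothetical and heavily simplified (it is not a valid CDA document); only the SNOMED code 195967001 (Asthma) and the drug name Theophylline come from elsewhere in this deck. `tree_view` is an illustrative helper, not part of any CDA tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified CDA-like fragment (NOT a valid CDA document).
SAMPLE = """
<section>
  <title>Medications</title>
  <entry>
    <Observation>
      <code code="195967001" displayName="Asthma"/>
      <value displayName="Theophylline"/>
    </Observation>
  </entry>
</section>
"""


def tree_view(elem, depth=0):
    """Return one indented line per element, showing attributes and text."""
    attrs = " ".join(f'{k}="{v}"' for k, v in elem.attrib.items())
    text = (elem.text or "").strip()
    label = elem.tag
    if attrs:
        label += f" [{attrs}]"
    if text:
        label += f": {text}"
    lines = ["  " * depth + label]
    for child in elem:  # children are the containment edges of the tree
        lines.extend(tree_view(child, depth + 1))
    return lines


print("\n".join(tree_view(ET.fromstring(SAMPLE.strip()))))
```

Each level of indentation corresponds to one containment edge, which is the structure the search techniques on the following slides operate on.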

Page 7: Searching and Exploring Biomedical Data

LIMITATIONS OF

Traditional IR
• Text-based search engines do not exploit the XML tags or the hierarchical structure of XML.
• The whole XML document is treated as a single unit, which is unacceptable given the possibly large sizes of XML documents.
• Proximity in XML can also be measured in terms of containment edges.

General XML Search
• EMRs have known but complex semantics.
• EMRs include free text, numeric data, time sequences, negative statements.
• Routine references in EMRs to external information sources like dictionaries and ontologies.
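To make the containment-edge notion of proximity concrete, here is a minimal sketch that counts the parent-child edges on the path between two elements. The document and the function name `containment_distance` are assumptions made for illustration; the slides do not prescribe a specific distance function.

```python
import xml.etree.ElementTree as ET


def containment_distance(root, a, b):
    """Number of containment (parent-child) edges on the path between a and b."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}

    def ancestors(node):
        # Path from node up to the root, node first.
        path = [node]
        while path[-1] in parent:
            path.append(parent[path[-1]])
        return path

    up_a = ancestors(a)
    steps_from_a = {id(n): i for i, n in enumerate(up_a)}
    for steps_from_b, n in enumerate(ancestors(b)):
        if id(n) in steps_from_a:  # lowest common ancestor reached
            return steps_from_a[id(n)] + steps_from_b
    raise ValueError("nodes are not in the same tree")


# Hypothetical toy document, for illustration only.
DOC = ET.fromstring(
    "<doc>"
    "<section><obs><code>asthma</code><value>theophylline</value></obs></section>"
    "<section><note>asthma</note></section>"
    "</doc>"
)
code = DOC.find("./section/obs/code")
value = DOC.find("./section/obs/value")
note = DOC.find("./section/note")
print(containment_distance(DOC, code, value))  # siblings under the same obs
print(containment_distance(DOC, value, note))  # different sections
```

Two keywords that sit under the same observation are much closer in containment edges than two keywords in different sections, even if they are equally close in the flattened text.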

Page 8: Searching and Exploring Biomedical Data


Syntax vs. Semantics in Schema


Example – query “Asthma Theophylline”

More details at [Hristidis et al. NSF Symposium on Next Generation of Data Mining ’07]


Page 10: Searching and Exploring Biomedical Data


XOntoRank: Leverage Ontological Knowledge

Algorithm to enhance keyword search using ontological knowledge (e.g., SNOMED) [ICDE’08 poster, ICDE’09 full paper]

[Figure: fragment of the ontology graph, labeled "Medical Dictionary". Concepts shown (SNOMED ID, name):
• 50043002 Disorder of Respiratory System
• 79688008 Respiratory Obstruction
• 118946009 Disorder of Thorax
• 41427001 Disorder of Bronchus
• 195967001 Asthma
• 301229001 Bronchial Finding
• 405944004 Asthmatic Bronchitis
• 266364000 Asthma Attack
• 955009 Bronchial Structure
• 82094008 Lower Respiratory Tract Structure
Edge types: "Is a", "May be", "Finding site of". The individual edges are not recoverable from the extracted text.]

Page 11: Searching and Exploring Biomedical Data

Example 1
q = {“bronchitis”, “albuterol”}

result = Observation subtree (code; value: Bronchitis; value: Albuterol)

Page 12: Searching and Exploring Biomedical Data

Example 2
q = {“asthma”, “albuterol”}

result = ???

Page 13: Searching and Exploring Biomedical Data

XOntoRank
• A CDA node may be associated with a query keyword w through the ontology.
• XOntoRank first assigns scores to ontological concepts
  ◦ OntoScore OS(): semantic relevance of a concept c in the ontology to a query keyword w.
• Then, given these scores, it assigns Node Scores NS() to document nodes.
• Other aggregation functions are possible.
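The Node Score formula itself was lost in extraction, so the following is only a sketch under stated assumptions: it takes the maximum of a crude textual match and the OntoScore of the concept the node references. The `max` aggregation, the 0/1 text score, the concept key `C_BRONCHITIS`, and the value 0.5 are all hypothetical choices for illustration, not the paper's definitions.

```python
def node_score(node_text, node_concept, keyword, onto_score):
    """Hypothetical NS(v, w) for a document node v and query keyword w.

    node_text    -- text content of the node
    node_concept -- id of the ontology concept the node references, or None
    onto_score   -- dict mapping (concept_id, keyword) to an OntoScore in [0, 1]
    """
    kw = keyword.lower()
    # Assumed textual component: 1.0 on a substring match, else 0.0.
    text_part = 1.0 if kw in node_text.lower() else 0.0
    # Ontological component: OntoScore of the referenced concept, if any.
    concept_part = onto_score.get((node_concept, kw), 0.0) if node_concept else 0.0
    # Assumed aggregation: max. Other aggregation functions are possible.
    return max(text_part, concept_part)


# Assumed OntoScore: the bronchitis concept is partially relevant to "asthma".
OS = {("C_BRONCHITIS", "asthma"): 0.5}

print(node_score("Bronchitis", "C_BRONCHITIS", "asthma", OS))
print(node_score("Albuterol", None, "albuterol", OS))
```

Under these assumptions, the Observation node from Example 1 would receive a non-zero score for the keyword "asthma" in Example 2, even though the word never appears in its text.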

Page 14: Searching and Exploring Biomedical Data

Computing OntoScore of Concept Given Query Keyword
Three ways to view the ontology graph:
◦ As an unlabeled, undirected graph.
◦ As a taxonomy.
◦ As a complete set of relationships.
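For the first view (unlabeled, undirected graph), one simple possibility is to let relevance decay with graph distance from the concepts whose names contain the keyword. The sketch below assumes that scheme; the decay factor of 0.5 is arbitrary, and the edges are made up for illustration because the figure's edges could not be recovered. Concept IDs and names are those shown in the ontology figure.

```python
from collections import deque


def onto_scores(edges, concept_names, keyword, decay=0.5):
    """Assumed OntoScore: decay ** (shortest distance to a matching concept)."""
    adj = {}
    for a, b in edges:  # ignore edge labels and direction
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    kw = keyword.lower()
    seeds = [c for c, name in concept_names.items() if kw in name.lower()]
    scores = {c: 1.0 for c in seeds}

    # Multi-source breadth-first search visits concepts in order of distance.
    queue = deque(seeds)
    while queue:
        c = queue.popleft()
        for n in adj.get(c, ()):
            if n not in scores:
                scores[n] = scores[c] * decay
                queue.append(n)
    return scores


NAMES = {
    "195967001": "Asthma",
    "405944004": "Asthmatic Bronchitis",
    "41427001": "Disorder of Bronchus",
    "955009": "Bronchial Structure",
}
# Illustrative edges only; NOT taken from SNOMED.
EDGES = [
    ("195967001", "41427001"),
    ("405944004", "41427001"),
    ("41427001", "955009"),
]
print(onto_scores(EDGES, NAMES, "asthma"))
```

The taxonomy and full-relationship views would differ in which edges are followed and how much each edge type attenuates the score; this sketch treats all edges alike.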


Page 16: Searching and Exploring Biomedical Data


Authority Flow Ranking in EMRs

A subset of the electronic health record dataset.

Work under submission.

Nodes of the data graph (labeled v1 to v7 in the figure; the label-to-node assignment is not recoverable):
• EventsPlan: TimeStampCreated="2004-11-03 11:57:00.0", Events="….small residual pericardial effusion….."
• Hospitalization: TimeStampCreated="2004-10-27 22:00:00.0", History="18 year old boy with an aggressive form of chest lymphoma…", Allergies="NKDA" …...
• Cardiac: PatientID="1438", Complication="apical impulse … Echo-large increasing pericardial effusion…"
• Employee: TimeStampCreated="2004-12-23 14:03:00.0", Title="Pediatric Cardiologist" ….
• EventsPlan: Events="4 month old baby… pericardial effusion..."
• Medication: TimeStampCreated="2003-02-13 21:57:00.0" ..
• Hospitalization: History="48 year old.."

Edge labels shown in the figure: prescribed_to, recorded_by.

Query: “pericardial effusion”
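Authority flow ranking in the ObjectRank style can be sketched as a random walk that restarts at the nodes containing the query keywords (the base set) and otherwise follows edges with type-dependent transfer rates. The graph, node names, transfer rates, and damping factor below are assumptions for illustration; the slides do not give the rates used for the EMR dataset.

```python
def authority_flow(edges, base_set, damping=0.85, iterations=50):
    """Iteratively propagate authority from the base set along weighted edges.

    edges    -- dict: node -> list of (target, transfer_rate); the rates leaving
                a node should sum to at most 1
    base_set -- non-empty set of nodes that contain the query keywords
    """
    nodes = set(edges) | {t for outs in edges.values() for t, _ in outs} | set(base_set)
    # Random jumps land only on base-set nodes, uniformly.
    jump = {n: (1.0 / len(base_set) if n in base_set else 0.0) for n in nodes}
    rank = dict(jump)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * jump[n] for n in nodes}
        for src, outs in edges.items():
            for dst, rate in outs:
                nxt[dst] += damping * rate * rank[src]
        rank = nxt
    return rank


# Hypothetical graph loosely modeled on the figure; rates are made up.
GRAPH = {
    "events": [("employee", 0.5)],       # recorded_by
    "cardiac": [("employee", 0.5)],      # recorded_by
    "employee": [("events", 0.25), ("cardiac", 0.25)],
    "medication": [("events", 0.3)],
}
BASE = {"events", "cardiac"}  # nodes mentioning "pericardial effusion"
for node, score in sorted(authority_flow(GRAPH, BASE).items(), key=lambda x: -x[1]):
    print(f"{node:12s} {score:.4f}")
```

In this toy graph the employee node contains neither keyword, yet it receives a non-zero score because authority flows to it from the matching records. That is the effect authority flow ranking is meant to capture.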

Page 17: Searching and Exploring Biomedical Data


Authority Flow Ranking

Schema of the EMR dataset

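[Figure: schema graph of the EMR dataset. Node types: Hospitalization, Employee, Associated_Events, Patient, Medication. Edges between types: A-E, P-M, H-M, M-E, A-H, H-E, P-E. Relationship labels: created_by, recorded_by, prescribed_by, prescribed_to, of, for.]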

Page 18: Searching and Exploring Biomedical Data


User Study

Page 19: Searching and Exploring Biomedical Data


Explaining Subgraph

Page 20: Searching and Exploring Biomedical Data


User Study Results

[Figure: two bar charts, "Mean Sensitivity" and "Mean Specificity". Y-axes: average sensitivity and average specificity, from 0 to 1. Bars: CO085BM25, BM25, CO085, CO030.]

BM25: Traditional information retrieval ranking function
CO: Clinical ObjectRank (authority flow)
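For reference, sensitivity and specificity can be computed from a result list and a set of relevance judgments using their standard definitions. How the user study collected judgments is not stated on the slides, so the function name, inputs, and example data below are assumptions.

```python
def sensitivity_specificity(returned, relevant, all_items):
    """Standard definitions: TP / (TP + FN) and TN / (TN + FP)."""
    returned, relevant, all_items = set(returned), set(relevant), set(all_items)
    tp = len(returned & relevant)              # relevant and returned
    fn = len(relevant - returned)              # relevant but missed
    fp = len(returned - relevant)              # returned but not relevant
    tn = len(all_items - returned - relevant)  # correctly left out
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Hypothetical judgments over ten candidate records.
sens, spec = sensitivity_specificity(
    returned={1, 2, 3}, relevant={2, 3, 4}, all_items=range(1, 11)
)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```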


Page 22: Searching and Exploring Biomedical Data

Biological Databases (cont’d) – Results Navigation [ICDE09, TKDE 2010]

• With SUNY Buffalo.
• Demo at http://db.cse.buffalo.edu/bionav/
• Most publications in PubMed are annotated with Medical Subject Headings (MeSH) terms.
• Present results in the MeSH tree.
• Propose a navigation model and smart expansion techniques that may skip tree levels.

Page 23: Searching and Exploring Biomedical Data

BioNav: Exploring PubMed Results

Static Navigation Tree for query “prothymosin”

MESH (313)
  Amino Acids, Peptides, and Proteins (310)
    Proteins (307)
      Nucleoproteins (40)
  Biological Phenomena, … (217)
    Cell Physiology (161)
      Cell Growth Processes (99)
  Genetic Processes (193)
    Gene Expression (92)
      Transcription, Genetic (25)

[Also shown in the figure: Histones (15) and collapsed "N more nodes" markers (95, 2, 45, 4, 3, 15, 10, 1); their positions in the tree are not recoverable.]

- Query keyword: prothymosin
- Number of results: 313
- Navigation tree stats:
  • # of nodes: 3941
  • depth: 10
  • total citations: 30897

Big tree with many duplicates!
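The counts in a static navigation tree can be sketched as follows: each concept shows the number of distinct citations annotated with it or any of its descendants. The hierarchy fragment and citation IDs are hypothetical, and the sketch treats the hierarchy as a plain tree, which is a simplification.

```python
def subtree_counts(children, annotations, root):
    """Map each concept to the number of distinct citations in its subtree.

    children    -- dict: concept -> list of child concepts
    annotations -- dict: concept -> set of citation ids annotated with it
    """
    counts = {}

    def collect(concept):
        ids = set(annotations.get(concept, ()))
        for child in children.get(concept, ()):
            ids |= collect(child)  # a citation counts once per subtree
        counts[concept] = len(ids)
        return ids

    collect(root)
    return counts


# Hypothetical fragment and citation ids, for illustration only.
CHILDREN = {
    "MESH": ["Proteins", "Genetic Processes"],
    "Proteins": ["Nucleoproteins"],
    "Genetic Processes": ["Gene Expression"],
}
ANNOTATIONS = {
    "Proteins": {3},
    "Nucleoproteins": {1, 2},
    "Gene Expression": {2, 3},
}
counts = subtree_counts(CHILDREN, ANNOTATIONS, "MESH")
print(counts)
print("sum over nodes:", sum(counts.values()), "distinct results:", counts["MESH"])
```

Even in this tiny example the per-node counts add up to far more than the number of distinct results, which mirrors the slide's observation: 313 results, but 30897 citation entries across the tree.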

Page 24: Searching and Exploring Biomedical Data

BioNav: Exploring PubMed Results

Reveal to the user a selected set of descendant concepts that:
(a) Collectively contain all results
(b) Minimize the expected user navigation cost
Unlike static navigation, not all children of the root are necessarily revealed.
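Requirement (a) can be illustrated with a plain greedy set cover, which repeatedly picks the concept covering the most still-uncovered results. This is a swapped-in textbook heuristic, not BioNav's expansion algorithm: it ignores the navigation cost model behind requirement (b), and the concepts and result IDs are hypothetical.

```python
def greedy_reveal(concept_results, all_results):
    """Greedy set cover: choose concepts until every result is covered.

    concept_results -- dict: concept -> set of result ids under that concept
    all_results     -- iterable of all result ids that must be covered
    """
    remaining = set(all_results)
    chosen = []
    while remaining:
        # Pick the concept that covers the most uncovered results.
        best = max(concept_results, key=lambda c: len(concept_results[c] & remaining))
        gained = concept_results[best] & remaining
        if not gained:
            raise ValueError("some results are not covered by any concept")
        chosen.append(best)
        remaining -= gained
    return chosen


# Hypothetical descendant concepts and result ids.
CANDIDATES = {
    "Proteins": {1, 2, 3},
    "Gene Expression": {3, 4},
    "Histones": {1},
}
print(greedy_reveal(CANDIDATES, {1, 2, 3, 4}))
```

Here two concepts suffice to reach all four results, so the third need not be shown. A cost-aware method would additionally weigh how likely the user is to explore each revealed concept.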

Page 25: Searching and Exploring Biomedical Data

BioNav Evaluation

[Figure: bar chart of overall navigation cost (# of concepts revealed + # of EXPAND actions), y-axis from 0 to 20, comparing Static and BioNav.]

Page 26: Searching and Exploring Biomedical Data

References
• Abhijith Kashyap, Vagelis Hristidis, Michalis Petropoulos, and Sotiria Tavoulari. Effective Navigation of Query Results Based on Concept Hierarchies. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2010.
• Fernando Farfán, Vagelis Hristidis, Anand Ranganathan, and Michael Weiner. XOntoRank: Ontology-Aware Search of Electronic Medical Records. IEEE International Conference on Data Engineering (ICDE), 2009.
• Abhijith Kashyap, Vagelis Hristidis, Michalis Petropoulos, and Sotiria Tavoulari. BioNav: Effective Navigation on Query Results of Biomedical Databases. IEEE International Conference on Data Engineering (ICDE), 2009.
• Vagelis Hristidis, Fernando Farfán, Redmond P. Burke, Anthony F. Rossi, and Jeffrey A. White. Information Discovery on Electronic Medical Records. National Science Foundation Symposium on Next Generation of Data Mining and Cyber-Enabled Discovery for Innovation (NGDM), 2007.

Supported by
• NSF IIS-0811922: Information Discovery on Domain Data Graphs, 2008-2011
• NSF CAREER IIS-0952347, 2010-2015