Knowledge-based Information Management for Biomedical Applications Wesley Chu Computer Science Department University of California Los Angeles, CA [email protected] www.kmed.cs.ucla.edu

Page 1: Knowledge-based Information Management for Biomedical Applications

Knowledge-based Information Management for Biomedical

Applications

Wesley Chu

Computer Science Department

University of California

Los Angeles, CA

[email protected]

www.kmed.cs.ucla.edu

Page 2: Knowledge-based Information Management for Biomedical Applications

Outline
- Data types
- Uses of knowledge bases to enhance information management
- Sample systems
  - Structured data
  - Multi-media
  - Free-text
- Conclusion

Page 3: Knowledge-based Information Management for Biomedical Applications

Information Formats used in Biomedical Applications

Structured Data

Multi-media Images

Semi-structured

Free-text

Page 4: Knowledge-based Information Management for Biomedical Applications

Uses of Knowledge Bases to Enhance Information Management

Approximate matching

Query conditions

Image features

Similar conceptual terms

Page 5: Knowledge-based Information Management for Biomedical Applications

Uses of Knowledge Bases to Enhance Information Management

KB query processing

Similarity query answering

Associative query answering

Scenario-specific query answering

Sentinel: triggering and alerting

Page 6: Knowledge-based Information Management for Biomedical Applications

Examples of KB Information Systems

- CoBase (1990-1998), DARPA: a database that cooperates with the user, for structured data
- KMeD (1991-2000), NSF: a knowledge-based medical multi-media database
- Medical Digital Library (2001-2005), NIH: a knowledge-based digital file room for patient care, education, and research

Page 7: Knowledge-based Information Management for Biomedical Applications

CoBase www.cobase.cs.ucla.edu

Graduate students: K. Chiang, C. Larson, R. Lee, M. Merzbacher, M. Minock, Frank Meng, Wenlei Mao, Mark Yang, K. Zhang

Staff: Q. Chen, Gladys Chow, Hua Yang

Project leader: Wesley W. Chu

Page 8: Knowledge-based Information Management for Biomedical Applications

CoBase: Cooperative Databases

Conventional query answering
- Need to know the detailed database schema
- Cannot get approximate answers
- Cannot answer conceptual queries

Cooperative query answering
- Derive approximate answers
- Answer conceptual queries
- Provide additional relevant answers that the user does not (or does not know how to) ask for

Page 9: Knowledge-based Information Management for Biomedical Applications

Cooperative Queries
- Find a seaport with railway facility in Los Angeles
- Find a nearby friendly airport that can land F-15
- Find hospitals with facility similar to St. John’s near LAX

[Diagram components: Cooperative Queries, CoBase Servers, Domain Knowledge, Heterogeneous Information Sources]

CoBase provides:
- Relaxation
- Approximation
- Association
- Explanation

Page 10: Knowledge-based Information Management for Biomedical Applications

Generalization and Specialization

[Diagram: generalization moves up from a Specific Query to a Conceptual Query and on to a More Conceptual Query; specialization moves back down from a Conceptual Query to a Specific Query.]

Page 11: Knowledge-based Information Management for Biomedical Applications

Cooperative Querying for Medical Applications

Query: Find the treatment used for the tumor similar-to (loc, size) X1 on 12-year-old Korean males.

Relaxed Query: Find the treatment used for the tumor Class X on preteen Asians.

Association: The success rate, side effects, and cost of the treatment.

Page 12: Knowledge-based Information Management for Biomedical Applications

Type Abstraction Hierarchies for Medical Domain

Age
  Preteens: 9, 10, 11, 12
  Teen
  Adult

Ethnic Group
  Asian: Korean, Chinese, Japanese, Filipino
  African
  European

Tumor (location, size)
  Class X [loc1 loc3] [s1 s3]
    X1 [loc1 s1]
    X2 [loc2 s2]
    X3 [loc3 s3]
  Class Y [locY sY]
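
The hierarchies on this slide can be read as lookup structures for generalization and specialization. The following is a minimal sketch, assuming each TAH is stored as a child-to-parent map; the dictionary layout and the function names `generalize` and `specialize` are illustrative assumptions, not CoBase's actual representation.

```python
# Hypothetical encoding of the slide's TAHs as child -> parent maps.
AGE_TAH = {9: "Preteens", 10: "Preteens", 11: "Preteens", 12: "Preteens"}
ETHNIC_TAH = {
    "Korean": "Asian",
    "Chinese": "Asian",
    "Japanese": "Asian",
    "Filipino": "Asian",
}
TUMOR_TAH = {"X1": "Class X", "X2": "Class X", "X3": "Class X"}


def generalize(tah, value):
    """Move one level up: return the concept that covers `value`."""
    return tah[value]


def specialize(tah, concept):
    """Move one level down: return every value that `concept` covers."""
    return sorted(v for v, parent in tah.items() if parent == concept)


# Relax the conditions "tumor X1, 12-year-old, Korean" one level up.
relaxed = {
    "age": generalize(AGE_TAH, 12),
    "ethnic_group": generalize(ETHNIC_TAH, "Korean"),
    "tumor": generalize(TUMOR_TAH, "X1"),
}
```

Generalizing the three conditions reproduces the relaxed query of the previous slide (tumor Class X on preteen Asians), and `specialize` lists the concrete values a database lookup would then accept.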

Page 13: Knowledge-based Information Management for Biomedical Applications

KB: Type Abstraction Hierarchy

Using clustering technique to group similar
- Attribute values
- Image features
- Spatial relationships among objects

Provides multi-level knowledge (conceptual) representation

Page 14: Knowledge-based Information Management for Biomedical Applications

Data mining for TAH for Numerical Attribute Values

Clustering metric: relaxation error
- Difference between the exact value and the returned approximate value
- Relaxation error is weighted by the probability of occurrence of each value
- Can be extended to multiple attributes
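
One reading of this metric, for a single numerical attribute, is RE(C) = sum over i, j of P(x_i) P(x_j) |x_i - x_j|: the expected difference when any value in cluster C may be returned in place of any other. The sketch below assumes that formula and estimates the probabilities from value frequencies; the function names and the exhaustive binary split are illustrative, not the slide's actual algorithm.

```python
from collections import Counter
from itertools import product


def relaxation_error(values):
    """Expected |exact - returned| within one cluster, weighted by value frequency."""
    counts = Counter(values)
    total = len(values)
    prob = {v: c / total for v, c in counts.items()}
    return sum(prob[x] * prob[y] * abs(x - y) for x, y in product(prob, repeat=2))


def best_binary_split(values):
    """Try every cut of the sorted values; keep the one with the lowest weighted error.

    Returns (error, left_cluster, right_cluster), or None if all values are equal.
    """
    ordered = sorted(values)
    n = len(ordered)
    best = None
    for i in range(1, n):
        if ordered[i] == ordered[i - 1]:
            continue  # never separate identical values
        left, right = ordered[:i], ordered[i:]
        err = (len(left) / n) * relaxation_error(left) + (len(right) / n) * relaxation_error(right)
        if best is None or err < best[0]:
            best = (err, left, right)
    return best


split = best_binary_split([1, 2, 3, 20, 21, 22])
```

Applying the split recursively to each resulting cluster would yield a multi-level hierarchy over the attribute's values.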

Page 15: Knowledge-based Information Management for Biomedical Applications

Query Relaxation

[Flowchart: Query -> Database -> Answers? Yes: Display. No: Relax Attribute (using TAHs) -> Query Modification -> Query.]
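
The loop in this flowchart can be sketched as follows, assuming an in-memory list of rows stands in for the database and each TAH is a child-to-parent map. The function name `relax_until_answered`, the row fields, and the treatment labels `T1` and `T2` are hypothetical.

```python
def relax_until_answered(rows, query, tahs, relax_order):
    """Run the query; while it returns nothing, relax one attribute and retry.

    query: attribute -> set of acceptable values.
    tahs:  attribute -> {value: parent concept}.
    """
    def run(q):
        return [r for r in rows if all(r[a] in allowed for a, allowed in q.items())]

    query = {attr: set(allowed) for attr, allowed in query.items()}
    answers = run(query)
    for attr in relax_order:
        if answers:
            break  # "Yes" branch: answers exist, so display them
        tah = tahs[attr]
        parents = {tah[v] for v in query[attr] if v in tah}
        # Query modification: accept every value under the same parent concept.
        query[attr] |= {v for v, parent in tah.items() if parent in parents}
        answers = run(query)
    return answers, query


ROWS = [
    {"age": 11, "ethnic_group": "Chinese", "treatment": "T1"},
    {"age": 35, "ethnic_group": "Korean", "treatment": "T2"},
]
TAHS = {
    "age": {9: "Preteens", 10: "Preteens", 11: "Preteens", 12: "Preteens"},
    "ethnic_group": {"Korean": "Asian", "Chinese": "Asian",
                     "Japanese": "Asian", "Filipino": "Asian"},
}
answers, final_query = relax_until_answered(
    ROWS, {"age": {12}, "ethnic_group": {"Korean"}}, TAHS, ["age", "ethnic_group"]
)
```

The `relax_order` argument plays the role of a relaxation control policy: attributes earlier in the list are relaxed first, and an attribute left out of the list is never relaxed.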

Page 16: Knowledge-based Information Management for Biomedical Applications

Summary: CoBase

- Derive Approximate Answers
- Answer Conceptual Queries
- Provide Associative Query Answers

Page 17: Knowledge-based Information Management for Biomedical Applications

KMeD www.kmed.cs.ucla.edu

Graduate students: Alex Bui, Christina Chu, John Dionisio, T. Plattner, D. Johnson, C. Hsu, T. Ieong

Consultants: Denise Aberle, M.D.; C.M. Breant, Ph.D.

PI: Wesley Chu, Ph.D., Computer Science Department

Co-PIs: A. Cardenas, Ph.D., Computer Science Department; Ricky Taira, Ph.D., School of Medicine

Page 18: Knowledge-based Information Management for Biomedical Applications

KMeD Goal: Retrieval of Images by Features & Content

- Features: size, shape, texture, density, histology
- Spatial Relations: angle of coverage, shortest distance, overlapping ratio, contact ratio, relative direction
- Evolution of Object Growth: fusion, fission

Page 22: Knowledge-based Information Management for Biomedical Applications

Characteristics of Medical Queries

- Multimedia
- Temporal
- Evolutionary
- Spatial
- Imprecise

Page 25: Knowledge-based Information Management for Biomedical Applications

Knowledge-Based Image Model

Representation Level (features and content): Brain, Tumor, Lateral Ventricle

Schema Level: SR(t,b), SR(t,l)

Knowledge Level: TAH SR(t,b), TAH Tumor Size, TAH SR(t,l), TAH Lateral Ventricle

Legend: SR: Spatial Relation; b: Brain; t: Tumor; l: Lateral Ventricle

Page 26: Knowledge-based Information Management for Biomedical Applications

Knowledge-Based Query Processing

[Flowchart: Queries -> Query Analysis and Feature Selection -> Knowledge-Based Content Matching via TAHs -> Query Relaxation -> Query Answers]

Page 27: Knowledge-based Information Management for Biomedical Applications

User Model

Customizes query answering to the user's interests, preferences, needs, and goals (e.g., query conditions, relaxation control).

- User type
- Default parameter values
- Feature and content matching policies
  - Complete match
  - Partial match

Page 28: Knowledge-based Information Management for Biomedical Applications

User Model (cont.)

- Relaxation control policies
  - Relaxation order
  - Unrelaxable object
  - Preference list
- Measure for ranking
- Triggering conditions

Page 29: Knowledge-based Information Management for Biomedical Applications

Query Preprocessing

Segment and label contours for objects of interest

Determine relevant features and spatial relationships (e.g., location, containment, intersection) of the selected objects

Organize the features and spatial relationships of objects into a feature database

Classify the feature database into a Type Abstraction Hierarchy (TAH)

Page 31: Knowledge-based Information Management for Biomedical Applications

Similarity Query Answering

- Determine relevant features based on query input
- Select TAH based on these features
- Traverse through the TAH nodes to match all the images with similar features in the database
- Present the images and rank their similarity (e.g., by mean square error)
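
The steps on this slide can be sketched as below, assuming a TAH over one numerical feature whose nodes carry half-open value ranges, and images stored as feature dictionaries. The node layout, the `min_results` parameter, and the image ids are hypothetical.

```python
def mean_square_error(query, features):
    """Mean squared difference over the features named in the query."""
    return sum((query[k] - features[k]) ** 2 for k in query) / len(query)


def covering_path(root, value):
    """Return the TAH nodes, root first, whose [low, high) range covers `value`."""
    path = [root]
    while True:
        child = next(
            (c for c in path[-1]["children"] if c["low"] <= value < c["high"]), None
        )
        if child is None:
            return path
        path.append(child)


def similar_images(tah, feature, images, query, min_results):
    """Match in the most specific covering node first, widen to ancestors if needed."""
    hits = []
    for node in reversed(covering_path(tah, query[feature])):
        hits = [i for i, f in images.items() if node["low"] <= f[feature] < node["high"]]
        if len(hits) >= min_results:
            break
    return sorted(hits, key=lambda i: mean_square_error(query, images[i]))


SIZE_TAH = {"low": 0, "high": 100, "children": [
    {"low": 0, "high": 30, "children": []},
    {"low": 30, "high": 100, "children": []},
]}
IMAGES = {
    "img_a": {"size": 10.0, "density": 0.5},
    "img_b": {"size": 25.0, "density": 0.4},
    "img_c": {"size": 60.0, "density": 0.5},
}
QUERY = {"size": 12.0, "density": 0.5}
ranked = similar_images(SIZE_TAH, "size", IMAGES, QUERY, min_results=2)
```

Raising `min_results` forces the traversal to move to a more general TAH node, which returns more images at the cost of looser matches.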

Page 32: Knowledge-based Information Management for Biomedical Applications

Visual Query Language and Interface

- Point-click-drag interface
- Objects may be represented by icons
- Spatial relationships among objects are represented graphically

Page 34: Knowledge-based Information Management for Biomedical Applications

Visual Query Example

Retrieve brain tumor cases where a tumor is located in the region as indicated in the picture

Page 40: Knowledge-based Information Management for Biomedical Applications

Summary: KMeD

- Image retrieval by feature and content
- Matching images based on features
- Processing of queries based on spatial relationships among objects
- Answering of imprecise queries
- Expression of queries via visual query language
- Integrated view of temporal multimedia data in a timeline metaphor

Page 42: Knowledge-based Information Management for Biomedical Applications

Medical Digital Library www.kmed.cs.ucla.edu

Graduate students: Victor Z. Liu, Wenlei Mao, Qinghua Zou

Consultants: Hooshang Kangarloo, M.D.; Denise Aberle, M.D.

Project leader: Wesley W. Chu

Page 43: Knowledge-based Information Management for Biomedical Applications

Data Types Used in a Medical Digital Library

Structured data (patient lab data, demographic data, …): CoBase

Images (X-rays, MRI, CT scans): KMeD

Free-text (patient reports, teaching files, literature, news articles): FTRS (Free-text retrieval system)

Page 44: Knowledge-based Information Management for Biomedical Applications

A Free-Text Retrieval System (FTRS)

[Diagram: Knowledge-based Free-Text Retrieval System (FTRS)
- Document sources: patient reports, medical literature, teaching materials, news articles
- Inputs: ad hoc query; patient report for content correlation
- Output: query results]

Page 45: Knowledge-based Information Management for Biomedical Applications

A Sample Patient Report

…
Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE)
…
FINAL DIAGNOSIS:
- LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION):
- LUNG CANCER, SMALL CELL, STAGE II.

Page 46: Knowledge-based Information Management for Biomedical Applications

Scenario-Specific Retrieval

…
Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE)
…
FINAL DIAGNOSIS:
- LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION):
- LUNG CANCER, SMALL CELL, STAGE II.

Treatment-related articles: how to treat the disease

Diagnosis-related articles: how to diagnose the disease

Page 47: Knowledge-based Information Management for Biomedical Applications

Challenge I: Indexing for Free-Text

Extracting key concepts in the free-text for indexing
- Free-text: Lung cancer, small cell, stage II
- Concept terms in knowledge source: stage II small cell lung cancer

Conventional methods use NLP
- Not scalable

Page 48: Knowledge-based Information Management for Biomedical Applications

Challenge II: Mismatch between terms used in query and documents

Example

Query: … lung cancer, …

Does the query match each of these documents?
- Document 1: … lung carcinoma … (?)
- Document 2: … lung neoplasm … (?)
- Document 3: … anti-cancer drug combinations … (?)

Page 49: Knowledge-based Information Management for Biomedical Applications

Challenge III: Terms used in the query are too general

Expanding the general terms in the query to specific terms that are used in the document

Document: … the effectiveness of chest x-ray and bronchography on patients with lung cancer …

Original query: lung cancer, diagnosis options (match uncertain: ?)

Expanded query: lung cancer, chest x-ray, bronchography, … (match: √)

Page 50: Knowledge-based Information Management for Biomedical Applications

A Medical KB: Unified Medical Language System (UMLS)

- Metathesaurus: controlled vocabulary (1.6M biomedical phrases, representing 800K concepts)
- Semantic Network: classifies concepts into classes and relations (e.g. disease and syndrome, treated by, therapeutic procedure, etc.)
- SPECIALIST Lexicon

Page 51: Knowledge-based Information Management for Biomedical Applications

Using knowledge sources to resolve these challenges

Challenge I: Automatic indexing of free text

Challenge II : Mismatch between terms in the query and the documents

Challenge III: Terms in the query are too general

Page 52: Knowledge-based Information Management for Biomedical Applications

IndexFinder: Extracting domain-specific key concepts

Technique
- Permute words from text to generate concept candidates.
- Use knowledge base to select the valid candidates.

Problem
- Valid candidates may be irrelevant to the document.
- Redundant concepts

Page 53: Knowledge-based Information Management for Biomedical Applications

Filtering out Irrelevant Concepts

Syntactic filter:
- Limit permutation of words within a sentence.

Semantic filter:
- Use the semantic type (e.g. body part, disease, treatment, diagnosis) to filter out irrelevant concepts
- Use ISA relationship to filter out general concepts and yield specific concepts.
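
The extraction technique and the two filters can be illustrated with a toy example. This is a minimal sketch and not the actual IndexFinder implementation: instead of enumerating permutations, it tests whether all words of a knowledge-base phrase occur in one sentence, which accepts the same candidates. The tables `KB` and `ISA` are hypothetical stand-ins for a real knowledge source.

```python
import re

# Hypothetical miniature knowledge base: concept phrase -> semantic type.
KB = {
    "lung": "body part",
    "lung cancer": "disease",
    "small cell lung cancer": "disease",
    "stage ii small cell lung cancer": "disease",
}
# Hypothetical ISA links: specific concept -> more general concept.
ISA = {
    "small cell lung cancer": "lung cancer",
    "stage ii small cell lung cancer": "small cell lung cancer",
}
WORD_INDEX = {frozenset(phrase.split()): phrase for phrase in KB}


def extract_concepts(text, wanted_types):
    found = set()
    # Syntactic filter: words may only combine within one sentence.
    for sentence in re.split(r"[.;:]", text.lower()):
        words = set(re.findall(r"[a-z0-9]+", sentence))
        for key, phrase in WORD_INDEX.items():
            # Order-insensitive match, so any word permutation is accepted.
            # Semantic filter: keep only the wanted semantic types.
            if key <= words and KB[phrase] in wanted_types:
                found.add(phrase)
    # ISA filter: drop every concept that is an ancestor of another found concept.
    general = set()
    for concept in found:
        parent = ISA.get(concept)
        while parent is not None:
            general.add(parent)
            parent = ISA.get(parent)
    return found - general


report = "FINAL DIAGNOSIS: LUNG CANCER, SMALL CELL, STAGE II."
concepts = extract_concepts(report, {"disease"})
```

On the sample report text, the word order differs from the knowledge-source phrase, yet the specific concept is still recovered and its more general ancestors are dropped.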

Page 54: Knowledge-based Information Management for Biomedical Applications

Using knowledge sources to resolve these challenges

Challenge I: Automatic indexing of free text

Challenge II : Mismatch between terms in the query and the documents

Challenge III: Terms in the query are too general

Page 55: Knowledge-based Information Management for Biomedical Applications

Phrase-based Vector Space Model (VSM)

Query: … lung cancer, …

Matching through the knowledge source:
- Document: … lung carcinoma … (√: lung cancer = lung carcinoma)
- Document: … lung neoplasm … (√: related through parent_of)
- Document: … anti-cancer drug combinations … (?: relation missing from the knowledge source)

Page 56: Knowledge-based Information Management for Biomedical Applications

Phrase-based VSM Examples

Query: “lung cancer …”

Phrases: [(C0242379); “lung” “cancer”] …

Document: “anti-cancer drug combinations …”

Phrases: [(C0003393); “anti” “cancer” “drug” “combin”] …
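
A phrase in this representation pairs a concept identifier with its word stems, so two phrases can match through the knowledge source, through shared stems, or both. The sketch below is one possible scoring rule under stated assumptions: taking the maximum of the two signals, the `related_weight` of 0.7, and the identifier `HYPOTHETICAL_LUNG_NEOPLASM` are all illustrative choices, not values from the slides.

```python
import math


def stem_similarity(stems_a, stems_b):
    """Cosine similarity between two binary stem vectors."""
    return len(stems_a & stems_b) / math.sqrt(len(stems_a) * len(stems_b))


def phrase_similarity(p, q, related, related_weight=0.7):
    """Score two (concept_id, stems) phrases by concept match or stem overlap."""
    (concept_p, stems_p), (concept_q, stems_q) = p, q
    if concept_p == concept_q:
        concept_score = 1.0  # synonyms share one concept id
    elif (concept_p, concept_q) in related or (concept_q, concept_p) in related:
        concept_score = related_weight  # linked in the knowledge source
    else:
        concept_score = 0.0
    return max(concept_score, stem_similarity(stems_p, stems_q))


QUERY = ("C0242379", frozenset({"lung", "cancer"}))
DOCS = {
    "lung carcinoma": ("C0242379", frozenset({"lung", "carcinoma"})),
    "lung neoplasm": ("HYPOTHETICAL_LUNG_NEOPLASM", frozenset({"lung", "neoplasm"})),
    "anti-cancer drug combinations": (
        "C0003393", frozenset({"anti", "cancer", "drug", "combin"})
    ),
}
# Hypothetical parent_of link between the two concepts.
RELATED = {("HYPOTHETICAL_LUNG_NEOPLASM", "C0242379")}

scores = {name: phrase_similarity(QUERY, phrase, RELATED) for name, phrase in DOCS.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

With these toy values the synonym ranks first, the related concept second, and the phrase that only shares the stem "cancer" last, which is the ordering the slides argue for.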

Page 57: Knowledge-based Information Management for Biomedical Applications

Using knowledge sources to resolve these challenges

Challenge I: Automatic indexing of free text

Challenge II : Mismatch between terms in the query and the documents

Challenge III: Terms in the query are too general

Page 58: Knowledge-based Information Management for Biomedical Applications

Query Expansion (QE)

Queries in the following form benefit from expansion:

<key concept> + <general supporting concept(s)>
e.g. lung cancer + treatment options

Expansion produces:

<key concept> + <specific supporting concept(s)>
e.g. lung cancer + chemotherapy, radiotherapy

Page 59: Knowledge-based Information Management for Biomedical Applications

[Figure: query expansion for the key concept "lung cancer", built up over three panels.
- Statistical: co-occurring terms study, patient, survive, mediastinoscopy, bronchoscopy, chemotherapy, radiotherapy, increase, result
- Knowledge Source: Therapeutic or Preventive Procedure treats Disease or Syndrome (e.g., heart surgery, heart disease)
- Knowledge-based Scenario-specific Expansion: Knowledge Source + Statistical]
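
The combination of statistical co-occurrence with the knowledge source can be sketched as a filter: keep only the co-occurring terms whose semantic type fits the scenario. The tables below are toy assumptions; in particular, the slide shows only the treats relation, so the diagnosis row and the per-term semantic types are illustrative.

```python
# Terms that co-occur statistically with "lung cancer" (from the slide).
COOCCURRING = [
    "study", "patient", "survive", "mediastinoscopy", "bronchoscopy",
    "chemotherapy", "radiotherapy", "increase", "result",
]
# Hypothetical semantic-type assignments for the terms the knowledge source covers.
SEMANTIC_TYPE = {
    "chemotherapy": "Therapeutic or Preventive Procedure",
    "radiotherapy": "Therapeutic or Preventive Procedure",
    "mediastinoscopy": "Diagnostic Procedure",
    "bronchoscopy": "Diagnostic Procedure",
}
# Scenario -> semantic type related to "Disease or Syndrome" in that scenario.
SCENARIO_TYPE = {
    "treatment": "Therapeutic or Preventive Procedure",  # via "treats"
    "diagnosis": "Diagnostic Procedure",                 # assumed counterpart
}


def expand(key_concept, scenario, cooccurring_terms):
    """Replace the general scenario word with scenario-specific co-occurring terms."""
    wanted = SCENARIO_TYPE[scenario]
    specific = [t for t in cooccurring_terms if SEMANTIC_TYPE.get(t) == wanted]
    return [key_concept] + specific


treatment_query = expand("lung cancer", "treatment", COOCCURRING)
diagnosis_query = expand("lung cancer", "diagnosis", COOCCURRING)
```

Generic co-occurring words such as "study" or "result" have no matching semantic type and are dropped, which is what separates this from purely statistical expansion.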

Page 60: Knowledge-based Information Management for Biomedical Applications

Retrieval Effectiveness Comparison (Corpus: OHSUMED, KB: UMLS)

[Figure: precision (y-axis, 0 to 1) versus recall (x-axis, 0 to 1), built up over three panels. Curves: Stem VSM (no expansion); Phrase VSM (no expansion); Statistical expansion (Stem VSM); Knowledge-based expansion (Phrase VSM).]

Overall improvement: 33%, 100 queries vs. 5%, 50 queries

Page 61: Knowledge-based Information Management for Biomedical Applications

FTRS: Scenario-specific Query Answering

Sample templates: “<disease>, treatment”, “<disease>, diagnosis”

[Diagram: with the template “<disease>, treatment” and the input "lung cancer", the query is "lung cancer, treatment". Components: IndexFinder, Query Expansion, Phrase-based VSM Engine. Expanded query: lung cancer, radiotherapy, chemotherapy, cisplatin. Output: relevant documents.]

Page 62: Knowledge-based Information Management for Biomedical Applications

FTRS: Scenario-specific content correlation

IndexFinder extracts key concepts from free-text for content correlation

[Diagram: input: Patient Report. Components: IndexFinder, Query Expansion, Phrase-based VSM Engine, Query Templates, Scenario Selection (e.g. treatment, diagnosis, etc.). Output: relevant documents.]

Page 63: Knowledge-based Information Management for Biomedical Applications

Summary: KB Free-text retrieval

Technologies
- IndexFinder: extracts key concepts from the free-text
- Phrase-based VSM: a new document indexing paradigm (concept and its word stems) to improve retrieval effectiveness
- Knowledge-based query expansion: match query with scenario-specific documents

Provides scenario-specific free-text retrieval

Page 64: Knowledge-based Information Management for Biomedical Applications

Conclusions

Knowledge sources provide
- Approximate matching
  - Query conditions
  - Image features
- Query processing
  - Similarity query answering
  - User modeling
  - Associative answering
  - Triggering and alerting
- Document retrieval
  - Convert ad hoc free-text into controlled vocabulary
  - Phrase-based VSM
  - Content correlation
  - Scenario-specific retrieval

Increase capabilities and effectiveness of information management

Page 65: Knowledge-based Information Management for Biomedical Applications

Acknowledgement

This research is supported by DARPA, NSF Grant # 9619345, and NIC/NIH Grant # 4442511-33780.