Learning Classifiers for Computer Aided Diagnosis Using Local Correlations

Glenn Fung, Computer-Aided Diagnosis and Therapy, Siemens Medical Solutions, Inc.

Collaborators: Volkan Vural, Jennifer Dy [Northeastern University]; Murat Dundar, Balaji Krishnapuram, Bharat Rao [Siemens]

Feb 13, 2008



©2007 Siemens Medical Solutions. All rights reserved. Computer-aided Diagnosis & Therapy

Outline

Brief overview of CAD systems

Assumptions in traditional classifier design are often not valid in CAD problems

Convex algorithms for Multiple Instance Learning (MIL)

Bayesian algorithms for Batch-wise classification

Faster, approximate algorithms via mathematical programming

Summary / Conclusions

Page 3 Siemens Medical Solutions, Inc.

1D*: EKG

2D: X-ray, Mammo, Pap...

2D+Time: Echo

3D: CT, MRI, PET...

3D+Time: 4D Cardiac US/CT, Gated PET/CT, Dynamic MRI...

Imaging Data: Growing Possibilities, Growing Challenges

*signal acquired in time


Computer-Aided Intelligent Imaging Interpretation: The Goal

For the computer to “see” (or do) what medical experts see (or do):
- To automate routine, mind-numbing, and time-consuming tasks;
- To improve consistency (by reducing intra- and inter-expert variability).


For the computer to “see” what doctors may miss:
- To improve sensitivity for disease detection and diagnosis;
- To perform quantitative assessment not achievable by “eyeballing” or “guesstimate”.

Sensitivity = 3/5 = 60%
Specificity = 3/4 = 75%

Receiver operating characteristic (ROC) curve: True Positive Rate (= sensitivity) versus False Positive Rate (= 1 - specificity)
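The two rates on this slide can be reproduced with a few lines of Python. This is only a sketch: the label and prediction lists are made-up values, chosen so that 3 of 5 diseased cases are flagged and 3 of 4 healthy cases are correctly left unflagged, matching the 3/5 and 3/4 figures above.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for boolean labels and predictions.

    labels[i] is True if case i is truly diseased;
    predictions[i] is True if the system flagged case i.
    """
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: 5 diseased cases followed by 4 healthy cases.
labels = [True] * 5 + [False] * 4
predictions = [True, True, True, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(labels, predictions)
false_positive_rate = 1.0 - spec  # the x-axis of the ROC curve
```

Each operating threshold of a classifier gives one such (false positive rate, sensitivity) pair; sweeping the threshold traces out the ROC curve.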

Computer-Aided Intelligent Imaging Interpretation: The Goal


Segmentation

“Segmentation is the partition of a digital image into multiple regions (sets of pixels), according to some criterion.” – wikipedia.org

At the low level, the criterion can be uniformity, which is determined according to pixel intensity, texture (repetitive patterns), etc.

At a semantic level, the criterion can be object(s) and the background.

In medical imaging, it usually refers to the delineation of different tissues or organs.

Computer-Aided Intelligent Imaging Interpretation: Basic Tools and Approaches

Detection

Detection is the process of finding one or more objects or regions of interest.

In medical imaging, detection of abnormalities is often a primary goal. Examples include the detection of lung nodules, colon polyps, or breast lesions, all of which can be precursors to cancer; or the detection of abnormality of the brain (e.g., Alzheimer's disease) or pathological deformation of the heart (e.g., ventricular enlargement).

Computer-Aided Intelligent Imaging Interpretation: Basic Tools and Approaches

Classification

Classification is the separation of objects into different classes.

In medical imaging, classification is often performed on a tissue or organ to distinguish between its healthy and diseased state, or different stages of the disease.

A classifier is often trained using a training set, where one or more experts have assigned labels to a set of objects.

Computer-Aided Intelligent Imaging Interpretation: Basic Tools and Approaches

• More and more data available,

• It is the prediction and early detection of diseases that saves the most lives. However, “early” usually means more subtle signs and weaker signals in the images. Doctors often use a complex set of features that is hard to formulate in computational form;

• If doctors miss them, who will teach the computer?

• How do we know that we are doing better, if doctors do not agree among themselves?

• Regulatory challenges

Computer-Aided Intelligent Imaging Interpretation: Challenges

CAD Algorithms


CAD Workflow: Core Tasks

Collect individual patient’s data
Feature extraction
Inference
Decision support for physician

Feature extraction from free text
Feature extraction from images
Feature extraction from omics data
Image registration
Segmentation & quantification
Combine info from multiple sources
Model optimization
Causal prob. inference
Fusion & classification
Evidential inference
Temporal reasoning
Low-level image processing
Knowledge-based modeling
Predictive modeling
Modeling / candidate generation
Classification (for candidate pruning)


General Detection Examples

Vol 1, Time 1 -> Detect / Analyze -> Results 1

Chest CT -> Detect Nodules -> Results

Colon CT -> Detect Polyps -> Results

Chest CT -> Detect Emboli -> Results


Lung CAD


Motivation

1. Lung cancer is the most commonly diagnosed cancer worldwide, accounting for 1.2 million new cases annually. Lung cancer is an exceptionally deadly disease: 6 out of 10 people will die within one year of being diagnosed

2. The expected 5-year survival rate for all patients with a diagnosis of lung cancer is merely 15%

3. In the United States, lung cancer is the leading cause of cancer death for both men and women, causes more deaths than the next three most common cancers combined, and costs $9.6 Billion to treat annually.

4. However, lung cancer prognosis varies greatly depending on how early the disease is diagnosed; as with all cancers, early detection provides the best prognosis.


1. Every pulmonary nodule, regardless of size and location, may be malignant and needs to be looked at (20-50% of resected nodules are malignant)

2. The smaller the nodule, the better the prognosis after nodule resection with respect to the 5-year survival rate

3. There is a need for a screening method, as is already available with mammography.

The need for lung CAD


CAD in plain words:

Find nodules in a large volume data set
- solitary or attached to anatomical structures

Segment nodules correctly
- remove structures like vessel, bronchus and pleura consistently and in an anatomically correct way

Quantify nodules
- volume, calcification, morphology, localization

Classify nodules as benign or malignant

Lung CAD: Introduction


Detecting Lung Cancer is hard: Part of a Single CT study of Lung


Where is the nodule?


Where is the lung cancer?


Where is the lung cancer?


Where is the lung cancer?


Computer Aided Detection

An automatic detection scheme acts as a second reader


Fly around

Interactive visualization of the nodule, and even fly-around movies, are possible ...

CAD Viewing Modes


Colon CAD


Motivation

Colorectal cancer is the 3rd most commonly diagnosed cancer in the USA:

- 135,000 new cases forecast for 2001

- 48,000 deaths forecast in 2001

- 95% 5-year mortality rate for patients whose colorectal cancer has spread to other body parts

- 10% 5-year mortality rate if treated at early stage

Source: American Cancer Society


CT Colonography: Exciting opportunity

Invasive colonoscopy remains the Gold Standard

CT Colonography: a promising non-invasive method

- 0.8 mm slices of abdomen possible in 9 sec breath-hold with a 16-slice CT

- CT has been shown capable of visualizing polyps down to 6 mm

- CT exam is more acceptable and comfortable for patients


Colon CAD Summary

CT Volume -> Colon Segmentation (pre-processing) -> Pre-processed Volume

Pre-processed Volume -> Polyp Candidate Generation -> Candidate List
GOAL: High sensitivity (low specificity is acceptable)

Candidate List -> Feature Extraction -> Features for Candidate List

Features for Candidate List -> Pruning/Filtering -> Final List
GOAL: High sensitivity, high specificity



Example of a located polyp, shown in endo-view (bottom right). This polyp was missed by the physician prospectively.

Detection missed by physician


General paradigm for CAD systems

Image -> Candidate generation -> Candidates

Candidates -> Feature Extraction -> Numerical attributes for each candidate

Numerical attributes for each candidate -> Classification -> Final Marks on Image


Properties of the data used for designing classifiers for CAD systems

The training data is highly unbalanced

There is a form of stochastic dependence among the labeling errors of a group of candidates that are close to a radiologist's mark.

The features used to describe spatially close samples are highly correlated

The candidate generation (CG) algorithm tends to have varying levels of sensitivity to different types of structures.

Some training images tend to contain far more false positive candidates than the rest of the training dataset.


Shortcomings in standard classification algorithms

Tend to underestimate the minority class when problems are very unbalanced

Assume that the training examples or instances are drawn identically and independently from an underlying unknown distribution

Assume that the appropriate measure for evaluating the classifiers is based only on the accuracy of the system on a per-lesion basis

Correct classification of every candidate instance is the main goal, instead of the ability to detect at least one candidate that points to each malignant lesion.


CAD: Correlations among candidate ROI


Correlations among patients from the same hospital: scanner type, patient preparation, geographical location, etc.

Correlations among samples from the same patient: samples pointing to the same structure, samples from different orientations, image characteristics (e.g., contrast/artifacts/noise)

Hierarchical Correlation Among Samples


Initial Idea: Additive Random Effect Models

Classification is treated as iid, but only when conditioned on both:

Fixed effects (unique to each sample)

Random effects (shared among samples)

Simple additive model to explain the correlations:

$P(y_i \mid x_i, w, r_i, v) = \frac{1}{1 + \exp(-w^T x_i - v^T r_i)}$

$P(y_i \mid x_i, w, r_i) = \int P(y_i \mid x_i, w, r_i, v)\, p(v \mid D)\, dv$

Sharing $v^T r_i$ among many samples yields correlated predictions

…But only small improvements in real-life applications
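A minimal Python sketch of the additive model may help make the two formulas concrete. The slides do not say how the integral over v is evaluated, so the sketch assumes it is approximated by averaging over samples of v; the weights, features, group indicators, and the three "posterior samples" below are all made-up illustration values.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def p_given_v(x, r, w, v):
    """P(y=1 | x, w, r, v): logistic model with fixed effect w.x and random effect v.r."""
    return sigmoid(dot(w, x) + dot(v, r))


def p_marginal(x, r, w, v_samples):
    """Monte Carlo estimate of the integral over v.

    v_samples is assumed to hold draws from p(v | D); how they are
    obtained is outside the scope of this sketch.
    """
    return sum(p_given_v(x, r, w, v) for v in v_samples) / len(v_samples)


# Hypothetical numbers, for illustration only.
w = [1.0, -0.5]                    # fixed-effect weights
v_samples = [[0.8], [1.0], [1.2]]  # stand-in for posterior samples of v
x_a, x_b = [0.2, 0.1], [0.3, 0.4]  # features of two candidates
r_shared = [1.0]                   # both candidates belong to the same group
r_none = [0.0]                     # a candidate with no shared effect

p_a = p_marginal(x_a, r_shared, w, v_samples)
p_b = p_marginal(x_b, r_shared, w, v_samples)
p_a_fixed_only = p_marginal(x_a, r_none, w, v_samples)
```

Because both candidates share the same term v.r, every sampled v moves their predicted probabilities in the same direction, which is the source of the correlated predictions.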


Candidate Specific Random Effects Model: Polyps

[ROC plot: sensitivity versus 1 - specificity]


CAD algorithms: Other examples of correlations between samples

Multiple (correlated) views: one detection is sufficient

Systemic treatment of diseases: e.g., detecting one pulmonary embolism (PE) is sufficient

Modeling the data acquisition mechanism

Errors in labeling for training set.


The Multiple Instance Learning Problem (NIPS 2006): Motivation

4 Candidates pointing to the same polyp

Only ONE candidate needs to be correctly classified!!!

Bag of candidates


The Multiple Instance Learning Problem (NIPS 2006)

A bag is a collection of many instances (samples)

The class label is provided for bags, not instances

Positive bag has at least one positive instance in it

Examples of “bag” definition for CAD applications:

Bag=samples from multiple views, for the same region

Bag=all candidates referring to same underlying structure

Bag=all candidates from a patient
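The bag-level rule can be sketched in a few lines of Python. The max-over-instances scoring used here is one common way to express "at least one positive instance" and is an assumption of this sketch, not necessarily the scoring used by the algorithm in the talk; the bag names and scores are invented.

```python
def bag_label(instance_labels):
    """A bag is positive if at least one of its instances is positive."""
    return any(instance_labels)


def bag_score(instance_scores):
    """Score a bag by its best-scoring instance (an illustrative choice)."""
    return max(instance_scores)


def bag_predictions(bags, threshold=0.0):
    """Map each bag id to True if any instance score exceeds the threshold."""
    return {bag_id: bag_score(scores) > threshold for bag_id, scores in bags.items()}


# Hypothetical classifier outputs: four candidates pointing to one polyp,
# and two candidates pointing to a normal structure.
bags = {
    "polyp_1": [-0.4, -0.1, 0.7, -0.3],
    "normal_fold": [-0.9, -0.2],
}
detected = bag_predictions(bags)
```

In this example three of the four polyp candidates are scored negative, yet the polyp is still detected, because one correctly classified candidate per bag is enough.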


CH-MIL Algorithm: 2-D illustration


CH-MIL Algorithm for Fisher’s Discriminant

Easy implementation via Alternating Optimization

Scales well to very large datasets

Convex problem with a unique optimum


Lung CAD

Lung Nodules

Computed Tomography


CH-MIL: Pulmonary Embolisms


CH-MIL: Polyps in Colon


Classifying a Correlated Batch of Samples (ECML 2006) : Motivation

The candidates that belong to the same patient’s medical images are highly correlated

There is no correlation between candidates from different patients

The level of correlation is a function of the pairwise distance between candidates

The samples (candidates) are collected naturally in batches

All the samples that belong to the same image constitute a batch


Classifying a Correlated Batch of Samples (ECML 2006)

Let classification of individual samples $x_i$ be based on $u_i$

E.g., linear: $u_i = w^T x_i$; or kernel predictor: $u_i = \sum_{j=1}^{N} \alpha_j k(x_i, x_j)$

Instead of basing the classification on $u_i$, we will base it on an unobserved (latent) random variable $z_i$

Prior: Even before observing any features $x_i$ (thus before $u_i$), the $z_i$ are known to be correlated a priori,

$p(z) = N(z \mid 0, \Sigma)$

E.g., due to spatial adjacency: $\Sigma = \exp(-D)$,

where the matrix $D$ holds the pairwise distances between samples


Classifying a Correlated Batch of Samples

Prior: Even before observing any features $x_i$ (thus before $u_i$), the $z_i$ are known to be correlated a priori,

$p(z) = N(z \mid 0, \Sigma)$

Likelihood: Let us claim that $u_i$ is really a noisy observation of a random variable $z_i$:

$p(u_i \mid z_i) = N(u_i \mid z_i, \sigma^2)$

Posterior: remains correlated, even after observing the features $x_i$

$p(z \mid u) = N\left(z \mid (\Sigma^{-1}\sigma^2 + I)^{-1} u,\; (\Sigma^{-1} + \sigma^{-2} I)^{-1}\right)$

Intuition: $E[z_i] = \sum_{j=1}^{N} A_{ij} u_j$; $A = (\Sigma^{-1}\sigma^2 + I)^{-1}$
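The posterior mean can be computed directly with NumPy. The sketch below assumes a Gaussian prior with covariance Sigma = exp(-D) applied elementwise to the pairwise distance matrix, and a noise variance of 1; the 1-D candidate positions and the per-sample scores u are made-up values.

```python
import numpy as np


def posterior_mean(u, Sigma, sigma2):
    """Return (E[z | u], A) with A = (Sigma^{-1} * sigma2 + I)^{-1}.

    Prior: z ~ N(0, Sigma). Likelihood: u_i ~ N(z_i, sigma2).
    """
    n = len(u)
    A = np.linalg.inv(sigma2 * np.linalg.inv(Sigma) + np.eye(n))
    return A @ u, A


# Hypothetical batch: three spatially close candidates and one far away.
positions = np.array([0.0, 0.1, 0.2, 5.0])
D = np.abs(positions[:, None] - positions[None, :])  # pairwise distances
Sigma = np.exp(-D)                                   # elementwise (assumption)

u = np.array([1.0, 0.8, -0.3, -1.0])                 # per-sample scores
z_mean, A = posterior_mean(u, Sigma, sigma2=1.0)
```

Each entry of `z_mean` is a weighted sum of all scores in the batch, with larger weights for nearby samples. Here the third candidate has a weakly negative score of its own, but it sits next to two strongly positive neighbours, so its posterior mean is pulled to the positive side, while the distant fourth candidate is left essentially on its own.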


Related Work

Conditional Random Fields and Maximum Margin Markov Networks, used for Natural Language Processing
- Computationally expensive

Multiple Instance Learning (MIL), compared with batch classification:

MIL: The same label is assigned to the entire batch (bag) of related samples
Batch: Individuals in the same batch may have different labels

MIL: Samples in the same bag are assumed to be equally related
Batch: More fine-grained differences in the level of correlation


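Support Vector Machines: Maximizing the Margin between Bounding Planes

[Figure: point sets A+ and A- separated by the bounding planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$, with normal vector $w$, margin $\frac{2}{\|w\|_2}$, and the support vectors marked]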


Algebra of the Classification Problem: 2-Category Linearly Separable Case

Given $m$ points in $n$-dimensional space, represented by an $m \times n$ matrix $A$

Membership of each $A_i$ in class +1 or -1 is specified by an $m \times m$ diagonal matrix $D$ with +1 and -1 entries

Separate by two bounding planes, $x'w = \gamma \pm 1$:

$A_i w \ge \gamma + 1$, for $D_{ii} = +1$;
$A_i w \le \gamma - 1$, for $D_{ii} = -1$.

More succinctly:

$D(Aw - e\gamma) \ge e$, where $e$ is a vector of ones.
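The succinct matrix condition is easy to check numerically. The following sketch uses a made-up set of four 2-D points and a hand-picked plane; it only tests whether a given (w, gamma) satisfies the constraint and does not search for one.

```python
import numpy as np


def separated(A, d, w, gamma):
    """True if D(Aw - e*gamma) >= e holds, with D = diag(d) and e a vector of ones."""
    D = np.diag(d)
    e = np.ones(len(d))
    return bool(np.all(D @ (A @ w - e * gamma) >= e))


# Hypothetical data: two points in class +1, two in class -1.
A = np.array([[2.0, 2.0],
              [3.0, 1.0],
              [-1.0, -2.0],
              [-2.0, 0.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])  # diagonal of D
w = np.array([1.0, 1.0])

ok_centered = separated(A, d, w, gamma=0.0)  # planes x'w = +1 and x'w = -1
ok_shifted = separated(A, d, w, gamma=3.5)   # planes pushed too far to one side
```

Multiplying by D flips the sign of the rows belonging to class -1, which is why the two one-sided inequalities collapse into a single "greater than or equal to e" condition.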


Support Vector Machines: Linear Programming Formulation

Use the 1-norm instead of the 2-norm:

$\min_{y \ge 0,\, w,\, \gamma} \; \nu e'y + \|w\|_1$
s.t. $D(Aw - e\gamma) + y \ge e$

This is equivalent to the following linear program:

$\min_{y \ge 0,\, w,\, \gamma,\, v} \; \nu e'y + e'v$
s.t. $D(Aw - e\gamma) + y \ge e$, $\; v \ge w \ge -v$
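The reason the two problems agree is that the bound v >= w >= -v forces each v_k to be at least |w_k|, and minimizing e'v pushes it down to exactly |w_k|. The sketch below checks this numerically for one hand-picked (w, gamma) on made-up data; it evaluates objectives and feasibility only and does not solve the linear program.

```python
import numpy as np


def one_norm_objective(A, d, w, gamma, nu):
    """Return (nu * e'y + ||w||_1, y) using the smallest feasible slack y."""
    y = np.maximum(0.0, 1.0 - d * (A @ w - gamma))
    return nu * y.sum() + np.abs(w).sum(), y


def lp_objective(y, v, nu):
    """Objective of the equivalent linear program: nu * e'y + e'v."""
    return nu * y.sum() + v.sum()


def lp_feasible(A, d, w, gamma, y, v, tol=1e-12):
    """Check D(Aw - e*gamma) + y >= e, v >= w >= -v, and y >= 0."""
    margin_ok = np.all(d * (A @ w - gamma) + y >= 1.0 - tol)
    box_ok = np.all(v >= w - tol) and np.all(w >= -v - tol)
    return bool(margin_ok and box_ok and np.all(y >= -tol))


# Hypothetical data and a hand-picked candidate solution.
A = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, 0.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])
w, gamma, nu = np.array([0.5, -0.25]), 0.1, 1.0

obj_1norm, y = one_norm_objective(A, d, w, gamma, nu)
v_tight = np.abs(w)        # the value of v an optimal LP solution would pick
v_loose = np.abs(w) + 0.1  # still feasible, but strictly worse

obj_lp_tight = lp_objective(y, v_tight, nu)
obj_lp_loose = lp_objective(y, v_loose, nu)
```

With `v_tight` the LP objective coincides with the 1-norm objective, while any looser v stays feasible but costs more, so the LP never prefers it.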


Mathematical Programming formulation (cont.)

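Standard SVM constraint:

$D(Bw - e\gamma) + \xi \ge e$

To be learned during training

Standard SVM constraint replaced by the proposed equation:

$D^j \left[ (\Sigma_j^{-1}\sigma^2 + I)^{-1} (B^j w - e\gamma) \right] + \xi^j \ge e$

Probabilistic-inspired approach replaced by a simpler approximation:

$D^j \left[ (\Sigma^j + I)(B^j w - e\gamma) \right] + \xi^j \ge e$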


Testing in Batch Classification

Decision function for standard SVM:

$f(x) = \mathrm{sign}(w'x - \gamma)$

Samples are tested one at a time

Decision function for batch classification:

[Equation garbled in extraction: $\hat{f}(x_i^j)$ is the sign of $w'x_i^j$ plus a term involving $B^j w$, the contribution of other samples in the same batch]

Samples are tested in batches
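The difference between the two testing modes can be sketched with NumPy. The sketch assumes that a batch is scored by applying a correlation-dependent matrix to the per-sample SVM scores, here (Sigma + I) with Sigma = exp(-D) taken elementwise over pairwise distances; that specific matrix, the 1-D features, and the positions are illustrative assumptions rather than the exact decision function from the talk.

```python
import numpy as np


def svm_scores(B, w, gamma):
    """Per-sample scores w'x - gamma: each sample is scored on its own."""
    return B @ w - gamma


def batch_scores(B, Sigma, w, gamma):
    """Batch scores (Sigma + I)(Bw - e*gamma): own score plus neighbours' contribution."""
    return (Sigma + np.eye(len(B))) @ (B @ w - gamma)


# Hypothetical batch of three spatially close candidates from one image.
B = np.array([[1.0], [0.9], [-0.1]])  # one feature per candidate
w, gamma = np.array([1.0]), 0.0

positions = np.array([0.0, 0.1, 0.2])
Sigma = np.exp(-np.abs(positions[:, None] - positions[None, :]))

alone = np.sign(svm_scores(B, w, gamma))
in_batch = np.sign(batch_scores(B, Sigma, w, gamma))
```

Tested alone, the third candidate is rejected; tested with its batch, the two strongly positive neighbours outweigh its weak negative score. With Sigma set to zero (no correlation) the batch scores reduce to the standard SVM scores.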


SVM-like Approximate Algorithm

Intuition: classify using $E[z_i] = \sum_{j=1}^{N} A_{ij} u_j$; $A = (\Sigma^{-1}\sigma^2 + I)^{-1}$

What if we used $A = (\Sigma + I)$ instead?

Reduces computation by avoiding inversion.

Not principled, but a heuristic for speed.

Yields an SVM-like mathematical programming algorithm:


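Mathematical Programming Formulation: Nonlinear version

A “kernelized” version can also be easily derived using the usual duality relation:

[Formulation garbled in extraction: a minimization over $v$ with nonnegative slacks, subject to $D^j \left[ (\Sigma^j + I)(K(B^j, A')v - e\gamma) \right] + \xi^j \ge e$]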


Toy Example: Geometrical Intuition


Toy Example II


Toy Example III

Point  Batch  Label  SVM      Pre-classifier  Final classifier
1      1      +       0.2826   0.1723          0.1918
2      1      +       0.2621   0.1315          0.2122
3      1      -      -0.2398   0.0153         -0.0781
4      1      +      -0.3188  -0.0259          0.2909
5      1      -      -0.4787  -0.0857         -0.0276
6      2      +       0.2397   0.0659          0.0372
7      2      -       0.2329   0.0432         -0.0888
8      2      +       0.1490   0.0042          0.0680
9      2      -      -0.2525  -0.0752         -0.1079
10     2      -      -0.2399  -0.1135         -0.1671
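The toy example values can be checked with a short Python script. It assumes the numbers on the slide are read column by column (labels, then the SVM, pre-classifier, and final classifier outputs for points 1 to 10) and that a point is classified as positive when its output is greater than zero.

```python
# Values transcribed from the toy example, points 1..10 in order.
labels = [+1, +1, -1, +1, -1, +1, -1, +1, -1, -1]
svm = [0.2826, 0.2621, -0.2398, -0.3188, -0.4787,
       0.2397, 0.2329, 0.1490, -0.2525, -0.2399]
pre = [0.1723, 0.1315, 0.0153, -0.0259, -0.0857,
       0.0659, 0.0432, 0.0042, -0.0752, -0.1135]
final = [0.1918, 0.2122, -0.0781, 0.2909, -0.0276,
         0.0372, -0.0888, 0.0680, -0.1079, -0.1671]


def misclassified(scores, labels):
    """Return the 1-based indices of points whose score sign disagrees with the label."""
    return [i + 1 for i, (s, y) in enumerate(zip(scores, labels))
            if (s > 0) != (y > 0)]


svm_errors = misclassified(svm, labels)
pre_errors = misclassified(pre, labels)
final_errors = misclassified(final, labels)
```

Under that reading, the standard SVM gets points 4 and 7 wrong, the pre-classifier gets points 3, 4, and 7 wrong, and the final batch classifier labels all ten points correctly.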


Detecting Polyps in Colon


Detecting Pulmonary Embolisms


Detecting Nodules in the Lung


Conclusions

IID assumption is universal in ML

Often violated in real life, but ignored

Explicit modeling can substantially improve accuracy

Described 3 models in this talk, utilizing varying levels of information

Additive Random Effects Models: weak correlation information

Multiple Instance Learning: stronger correlations enforced

Batch-wise classification models: explicit information

Statistically significant improvement in accuracy

Only starting to scratch the surface, lots to improve!

