
Objective Evaluation of Intelligent Medical Systems using a Bayesian Approach to Analysis of ROC Curves

Julian Tilbury, Peter Van Eetvelt, John Curnow, Emmanuel Ifeachor

Contents

• Evaluation Problem
• Introduction to ROC Curves
• Frequentist Approach
• Bayesian Approach
• Area under the Curve (AUC)
• Parametric ROC Curves
• Conclusion

Evaluation Problem

• Collecting Medical Test Cases is Expensive
• Desirable to test Systems with few cases
• System may Pass by Luck
• Must use ‘Confidence Intervals’

• ROC curves: a convenient, existing representation for results

Introduction to ROC Curves

• Two populations
– Healthy
– Diseased

• Known by a Gold Standard
• Differentiate using a single Test Measure

– What Threshold will separate them?
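Each candidate threshold gives one point (False Alarm Rate, Hit Rate) on the ROC curve. Here is a minimal sketch of that idea. It assumes a case is called diseased when its test measure is at or above the threshold, so higher measures indicate disease. The function name `roc_points` and the sample values are illustrative only.

```python
def roc_points(healthy, diseased):
    """Empirical ROC points (False Alarm Rate, Hit Rate), one per threshold.

    Assumes a case is called 'diseased' when its test measure is at or
    above the threshold, i.e. higher measures indicate disease.
    """
    # Start above every observed value: nobody is flagged.
    points = [(0.0, 0.0)]
    # Lower the threshold through each distinct observed value.
    for t in sorted(set(healthy) | set(diseased), reverse=True):
        false_alarm_rate = sum(x >= t for x in healthy) / len(healthy)
        hit_rate = sum(x >= t for x in diseased) / len(diseased)
        points.append((false_alarm_rate, hit_rate))
    return points


# Illustrative test measures, not data from the study.
healthy = [1, 2, 3]
diseased = [2, 4, 5]
for far, hr in roc_points(healthy, diseased):
    print(f"False Alarm Rate {far:.2f}, Hit Rate {hr:.2f}")
```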

Frequentist Approach

• E.g. Green & Swets: for each point
– False Alarm Rate Confidence Interval
– Hit Rate Confidence Interval
– Combined to give a cross

Three ‘Problems’

• The False Alarm Rate Confidence Interval of Point 0 has zero width

• The Hit Rate Confidence Interval of Point 1 has zero width

• The Hit Rate Confidence Interval extends beyond the graph

• Given the data, this makes no sense!

Four Observations

1. Sample too small
2. Hit Rate (or False Alarm Rate) near 0 or 1
3. Correct within paradigm
  • Population mean = Sample mean
  • Distribution of re-sampling
4. Confidence Interval off Graph
  • Off-graph = no samples, so add to taste
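Observation 3 can be seen directly. If the population rate is taken to equal the sample rate, and that rate is 0 (or 1), every resample reproduces it. The re-sampling distribution then has no spread. This is a minimal parametric-bootstrap sketch. The function name and its parameters are hypothetical.

```python
import random


def resampled_rates(events, cases, replicates=1000, seed=0):
    """Parametric bootstrap of a rate, taking population rate = sample rate."""
    rng = random.Random(seed)
    p_hat = events / cases  # population mean = sample mean
    return [
        sum(rng.random() < p_hat for _ in range(cases)) / cases
        for _ in range(replicates)
    ]


# With 0 of 5 events every resample is 0, so the re-sampling distribution
# (and any interval read from it) has zero width.
print(set(resampled_rates(0, 5)))  # {0.0}
```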

Bayesian Approach

Consider just the False Alarm Rate, using Bayes’ Law:

• Assume a prior distribution for the population
• Update the distribution according to the evidence to give the posterior distribution
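For a single rate the update has a closed form if the prior is a Beta distribution. This sketch assumes a uniform Beta(1, 1) prior, which may not be the prior the authors use. Under that prior, observing k events in n cases gives a Beta(k + 1, n − k + 1) posterior. The Beta CDF is computed exactly for integer parameters through its binomial-sum identity, and it is inverted by bisection. All function names are hypothetical.

```python
from math import exp, lgamma, log, log1p


def beta_cdf(x, a, b):
    """CDF of Beta(a, b) for integer a, b >= 1.

    Uses I_x(a, b) = P(Binomial(a + b - 1, x) >= a), summing the
    binomial terms in log space to avoid overflow.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    n = a + b - 1
    total = 0.0
    for j in range(a, n + 1):
        total += exp(lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
                     + j * log(x) + (n - j) * log1p(-x))
    return min(total, 1.0)


def beta_quantile(q, a, b, iterations=100):
    """Invert beta_cdf by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def posterior_rate_interval(events, cases, level=0.95):
    """Equal-tailed credible interval for a rate.

    Assumes a uniform Beta(1, 1) prior, so the posterior after `events`
    out of `cases` is Beta(events + 1, cases - events + 1).
    """
    a, b = events + 1, cases - events + 1
    tail = (1 - level) / 2
    return beta_quantile(tail, a, b), beta_quantile(1 - tail, a, b)


# 0 false alarms among 5 healthy cases: the interval no longer has zero width.
print(posterior_rate_interval(0, 5))
```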

Combine the False Alarm Rate and Hit Rate to give a combined posterior distribution, computed using Dirichlet Integrals

(For Point 0)
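The authors' Dirichlet-integral computation is not reproduced here. As a simpler stand-in, assume independent uniform priors on the two rates. Because healthy and diseased cases are separate samples, the joint posterior then factorises into two Beta posteriors. The probability that the true point lies in a rectangle is the product of two one-dimensional probabilities. The function names and example counts are hypothetical.

```python
from math import exp, lgamma, log, log1p


def beta_cdf(x, a, b):
    """CDF of Beta(a, b) for integer a, b >= 1 (binomial-sum identity)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    n = a + b - 1
    total = 0.0
    for j in range(a, n + 1):
        total += exp(lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
                     + j * log(x) + (n - j) * log1p(-x))
    return min(total, 1.0)


def posterior_box_probability(false_alarms, n_healthy, hits, n_diseased,
                              far_range, hr_range):
    """Posterior probability that the true (False Alarm Rate, Hit Rate)
    lies inside the rectangle far_range x hr_range.

    Assumes independent uniform priors, so the joint posterior is the
    product of Beta(false_alarms + 1, ...) and Beta(hits + 1, ...).
    """
    a_far, b_far = false_alarms + 1, n_healthy - false_alarms + 1
    a_hr, b_hr = hits + 1, n_diseased - hits + 1
    p_far = beta_cdf(far_range[1], a_far, b_far) - beta_cdf(far_range[0], a_far, b_far)
    p_hr = beta_cdf(hr_range[1], a_hr, b_hr) - beta_cdf(hr_range[0], a_hr, b_hr)
    return p_far * p_hr


# Hypothetical point with no false alarms and no hits, 5 cases in each group.
print(posterior_box_probability(0, 5, 0, 5, (0.0, 0.2), (0.0, 0.2)))
```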

Convergence

At low sample sizes the two paradigms give radically different results

As the sample size increases the resultant distributions merge

Take multiples of 3 False Positives and 2 True Negatives …
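The convergence claim can be checked numerically for a False Alarm Rate of 3/5. This sketch reads "3 False Positives and 2 True Negatives" as 3m of 5m healthy cases flagged. It compares a Wald interval with a uniform-prior Bayesian credible interval, both standing in for the methods in the slides. It then reports the largest gap between matching bounds as the multiple m grows. All names are hypothetical.

```python
from math import exp, lgamma, log, log1p, sqrt


def beta_cdf(x, a, b):
    """CDF of Beta(a, b) for integer a, b >= 1 (binomial-sum identity)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    n = a + b - 1
    total = 0.0
    for j in range(a, n + 1):
        total += exp(lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
                     + j * log(x) + (n - j) * log1p(-x))
    return min(total, 1.0)


def beta_quantile(q, a, b, iterations=100):
    """Invert beta_cdf by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def wald_interval(events, cases, z=1.96):
    """Frequentist normal-approximation interval (about 95% for z = 1.96)."""
    p = events / cases
    half_width = z * sqrt(p * (1 - p) / cases)
    return p - half_width, p + half_width


def bayes_interval(events, cases, level=0.95):
    """Equal-tailed credible interval under a uniform Beta(1, 1) prior."""
    a, b = events + 1, cases - events + 1
    tail = (1 - level) / 2
    return beta_quantile(tail, a, b), beta_quantile(1 - tail, a, b)


def max_bound_gap(multiple):
    """Largest bound difference for 3m False Positives out of 5m healthy cases."""
    false_positives, healthy_cases = 3 * multiple, 5 * multiple
    w = wald_interval(false_positives, healthy_cases)
    b = bayes_interval(false_positives, healthy_cases)
    return max(abs(w[0] - b[0]), abs(w[1] - b[1]))


for m in (1, 10, 100):
    print(m, round(max_bound_gap(m), 4))
```

The gap shrinks steadily with m, which matches the merging of the two distributions described above.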

Area Under the Curve

• Single value used as a summary of diagnostic accuracy

• Novel Bayesian method (by Dynamic Programming)

• Existing Frequentist methods
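For comparison, here is a sketch of one familiar frequentist route, which is not the novel dynamic-programming method. It takes the empirical AUC as the Mann-Whitney statistic and uses Hanley and McNeil's approximate standard error. It assumes higher test measures indicate disease. The function names and sample values are illustrative. At perfect separation (AUC = 1) the standard error is exactly 0, which echoes the zero-width point intervals discussed earlier.

```python
from math import sqrt


def empirical_auc(healthy, diseased):
    """Mann-Whitney estimate of the AUC.

    Fraction of (diseased, healthy) pairs in which the diseased case has
    the higher test measure, counting ties as one half.
    """
    score = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                score += 1.0
            elif d == h:
                score += 0.5
    return score / (len(diseased) * len(healthy))


def hanley_mcneil_se(auc, n_healthy, n_diseased):
    """Hanley and McNeil's approximate standard error of the AUC."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (auc * (1 - auc)
                + (n_diseased - 1) * (q1 - auc ** 2)
                + (n_healthy - 1) * (q2 - auc ** 2)) / (n_diseased * n_healthy)
    return sqrt(max(variance, 0.0))


healthy, diseased = [1, 2, 3], [2, 4, 5]  # illustrative values
auc = empirical_auc(healthy, diseased)
print(auc, hanley_mcneil_se(auc, len(healthy), len(diseased)))
# Perfect separation: AUC = 1 and a standard error of exactly 0.
print(hanley_mcneil_se(empirical_auc([1, 2, 3], [4, 5, 6]), 3, 3))
```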

Parametric ROC Curves

• Both Healthy and Diseased populations are ‘Gaussian’

• Curve can be characterised by two parameters:
– Difference in Means
– Ratio of Standard Deviations

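$$\text{Healthy Mean} - \text{Disease Mean} = \operatorname{Sigmoid}\left(\frac{2\mu_h - 2\mu_d}{\delta_h + \delta_d}\right)$$

$$\text{Healthy Sd} = \frac{2\delta_h}{\delta_h + \delta_d}, \qquad \text{Disease Sd} = \frac{2\delta_d}{\delta_h + \delta_d}$$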
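With two Gaussian populations, each threshold t gives one point, and the area under the curve has a standard closed form, Φ((µd − µh) / √(σh² + σd²)). This is a textbook binormal result and is not taken from the slides. The sketch below computes both. It assumes higher measures indicate disease. It also evaluates the normalised parameters of the formulas above, using the logistic function as an assumed choice for the unspecified "Sigmoid". All function names are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist


def binormal_roc_point(threshold, mu_h, sd_h, mu_d, sd_d):
    """(False Alarm Rate, Hit Rate) at one threshold for Gaussian populations.

    Assumes higher test measures indicate disease.
    """
    false_alarm_rate = 1 - NormalDist(mu_h, sd_h).cdf(threshold)
    hit_rate = 1 - NormalDist(mu_d, sd_d).cdf(threshold)
    return false_alarm_rate, hit_rate


def binormal_auc(mu_h, sd_h, mu_d, sd_d):
    """Area under the binormal ROC curve."""
    return NormalDist().cdf((mu_d - mu_h) / sqrt(sd_h ** 2 + sd_d ** 2))


def logistic(x):
    """Assumed choice for the unspecified 'Sigmoid'."""
    return 1 / (1 + exp(-x))


def normalised_parameters(mu_h, sd_h, mu_d, sd_d):
    """Mean-difference term and the two normalised standard deviations."""
    mean_term = logistic((2 * mu_h - 2 * mu_d) / (sd_h + sd_d))
    healthy_sd = 2 * sd_h / (sd_h + sd_d)
    disease_sd = 2 * sd_d / (sd_h + sd_d)  # healthy_sd + disease_sd == 2
    return mean_term, healthy_sd, disease_sd


print(binormal_roc_point(0.5, 0.0, 1.0, 1.0, 1.0))
print(binormal_auc(0.0, 1.0, 1.0, 1.0))
print(normalised_parameters(0.0, 1.0, 1.0, 2.0))
```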

Parametric Analysis

• Existing Maximum Likelihood
– Brittle
– Frequentist Confidence Intervals

• Novel Analysis (by Dynamic Programming)
– Robust
– Maximum Likelihood
– Posterior Intervals for the Parameters and the Area Under Curve
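The slides do not say how the existing maximum-likelihood fits break. As an illustration only, here is the simplest continuous-data case, which is not the existing methods themselves. For Gaussian data the ML estimates are the sample means and the divide-by-n standard deviations. They can be plugged into the binormal AUC. A small sample with no spread drives both estimated deviations to zero, and the fit fails outright. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean, pstdev


def ml_binormal_fit(healthy, diseased):
    """Gaussian maximum-likelihood estimates for each population.

    The ML estimates are the sample mean and the divide-by-n standard
    deviation (statistics.pstdev).
    """
    return fmean(healthy), pstdev(healthy), fmean(diseased), pstdev(diseased)


def plug_in_auc(healthy, diseased):
    """Binormal AUC with the ML estimates plugged in."""
    mu_h, sd_h, mu_d, sd_d = ml_binormal_fit(healthy, diseased)
    return NormalDist().cdf((mu_d - mu_h) / sqrt(sd_h ** 2 + sd_d ** 2))


print(plug_in_auc([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))

# Samples with no spread make both SD estimates zero and the fit fails.
try:
    plug_in_auc([1.0, 1.0], [2.0, 2.0])
except ZeroDivisionError:
    print("ML fit failed: zero estimated standard deviation")
```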

(Figures: Nonparametric and Parametric)

Conclusion

• Frequentist (for low sample size)
– Best: counterintuitive
– Worst: ‘wrong’

• Bayesian
– Best: robust and accurate
– Worst: slow to calculate

• Still need the prior distribution

• Converge at high sample size
• Therefore use Bayesian for all sample sizes
