The Complementary Roles of Computer-Aided Diagnosis and Quantitative Image Analysis: Similarities and Differences in Assessment, Quality Assurance and Training

Berkman Sahiner, PhD (1); Samuel G. Armato III, PhD (2); Zhimin Huo, PhD (3); Heang-Ping Chan, PhD (4); Ronald M. Summers, MD, PhD (5); Nicholas Petrick, PhD (1)

1: US Food and Drug Administration, Center for Devices and Radiological Health
2: University of Chicago, Department of Radiology
3: Carestream Inc.
4: University of Michigan, Department of Radiology
5: National Institutes of Health, Clinical Center

Introduction

Quantitative image analysis (QIA) and computer-aided diagnosis (CAD) are closely linked

This exhibit reviews the methods developed for assessment, quality assurance, and user training for CAD, and highlights parallels and distinctions between CAD and QIA

Comparison between CAD and QIA in this computer exhibit is performed through several example applications developed by the authors or that are publicly available

What is CAD?

CAD systems incorporate pattern recognition and data analysis capabilities* and are intended to

Mark regions of an image that may reveal specific abnormalities and alert the clinician to these regions during image interpretation (computer-aided detection (CADe) systems)**

Provide to the clinician an assessment of disease, disease type, severity, stage, progression (computer-aided diagnosis (CADx) systems)**

*Guidance for industry and FDA staff: “Clinical performance assessment: Considerations for computer-assisted detection devices applied to radiology images and radiology device data - premarket approval (PMA) and premarket notification [510(k)] submissions," (FDA, 2012).

**N. Petrick et al., “Evaluation of computer-aided detection and diagnosis systems,” Med. Phys. 2013, 40:087001.

What is QIA?

QIA is the process of extraction of quantitative imaging biomarkers from medical images, typically involving computerized tools

A quantitative imaging biomarker is an objectively measured characteristic derived from an in vivo image as an indicator of normal biological processes, pathogenic processes, or response to a therapeutic intervention*

*Sullivan DC et al., “Metrology standards for quantitative Imaging biomarkers,” Radiology 2015 Aug 12:142202. (Epub ahead of print).

What is Quantitative Imaging?

Both QIA and quantitative imaging biomarkers are within the larger context of quantitative imaging, defined as*

the extraction of quantifiable features from medical images for the assessment of normal [findings] or the severity, degree of change, or status of a disease, injury, or chronic condition relative to normal [findings]

Quantitative imaging includes the development, standardization, and optimization of anatomical, functional, and molecular imaging acquisition protocols, data analyses, display methods, and reporting structures

*Quantitative Imaging Biomarkers Alliance. http://rsna.org/QIBA.aspx. Accessed Oct. 15, 2015.

How are CAD and QIA Similar?

Both aim at aiding clinicians through advanced image analysis techniques

Both commonly use computer methods to extract features

CAD may use other patient-related information, e.g., age, gender, risk factors, or non-image biomarkers

CAD features may include quantitative imaging biomarkers, but can also be relative, ordinal features

User interaction may be somewhat more prevalent in QIA (e.g., semi-automated segmentation)

Both emphasize appropriate image acquisition protocols, display methods, training, and reporting

How are CAD and QIA Different?

In CAD, the emphasis is on how the CAD output can provide decision support to clinicians

- In full assessment, it is required to demonstrate that a CAD system aids the clinician
- May integrate imaging and non-imaging information or biomarkers
- Not necessary to establish a direct link between individual CAD features and a specific disease condition
- Standalone testing (performance of the CAD system alone) is nonetheless important

In QIA, the emphasis is on the extraction of biomarkers

- Bias and variability across devices, patients and time are major considerations

Assessment, Quality Assurance, and Training

The differences and similarities between CAD and QIA have a number of implications for assessment, quality assurance (QA), and training

We briefly summarize our work in CAD in these areas and draw parallels with QIA

Assessment - CAD

CAD systems are typically assessed by evaluating the effect of the system on clinicians

Reader performance assessment: Evaluate performance of a clinician using CAD as part of the decision making process

- Retrospective multiple-reader multiple-case (MRMC) studies
- Prospective field trials

Standalone performance of a CAD system is also useful as a performance indicator, both in

- System design phase and
- Final system assessment

Assessment - CAD

A recent paper by AAPM outlined important considerations in CAD assessment, including*:

Data set selection

- Representativeness of the test set for the targeted population

Reference standard and mark labeling

- Disease status of a case (sometimes includes location information)

Standalone assessment metrics

- True-positive fraction, false-positive fraction pairs
- Receiver operating characteristic (ROC) curves
- Location-specific ROC

Reader study design and analysis

- Prospective versus retrospective studies
- MRMC study design
- Reader training for reader studies

*N. Petrick et al., “Evaluation of computer-aided detection and diagnosis systems,” Med. Phys. 2013, 40:087001.

Assessment - Quantitative Imaging

Most groups working in quantitative imaging are interested in technical performance assessment

How a test performs in reference objects or subjects under controlled conditions

Other quantitative imaging assessment

Clinical assessment

Clinical impact

Technical Assessment - Quantitative Imaging

Technical assessment can be performed using

Anthropomorphic phantoms

- Large number of replicate measurements easily obtained
- The same phantoms can be used repeatedly
- Reference standard easier to obtain
- Phantom realism may be limited
- Generalizability to human data may be limited because of limited abnormality variation and complexity

Patient data

- More realistic
- Number of replicate measurements may be limited
- Reference standard more difficult to obtain
- Number of available patients usually limited

Technical Assessment - Quantitative Imaging

Important considerations include*

Measurand/Reference: True value of the quantity intended to be measured

Bias: Difference between the expected value of the biomarker and the measurand

Linearity: Is a change in the measurand reflected as a proportional change in the biomarker?

Repeatability: Ability to repeatedly measure the same feature under identical or near-identical conditions

Reproducibility: Ability to measure the same feature under different conditions expected in

- A preclinical study
- Clinical trial
- Clinical practice

*D.L. Raunig et al., “Quantitative imaging biomarkers: a review of statistical methods for technical performance assessment,” Stat Methods Med Res. 2015, 24:27-67.
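The bias and linearity definitions above can be illustrated with a short calculation. This is a minimal sketch, not the authors' analysis: the function names `bias` and `linearity_fit` are hypothetical, all measurement values are invented for illustration, and the use of nominal sphere volumes as the measurand is an assumption.

```python
import numpy as np

def bias(measurements, measurand):
    """Estimated bias: mean of repeated measurements minus the true value."""
    return float(np.mean(measurements) - measurand)

def linearity_fit(measurands, mean_measurements):
    """Least-squares line through (true value, mean measured value) points.

    A slope near 1 suggests that a change in the measurand is reflected
    as a proportional change in the biomarker.
    """
    slope, intercept = np.polyfit(measurands, mean_measurements, 1)
    return float(slope), float(intercept)

# Assumed measurands: nominal volumes (mm^3) of spheres with these diameters.
diameters_mm = np.array([5.0, 8.0, 9.0, 10.0])
true_volumes = np.pi * diameters_mm ** 3 / 6.0

# Hypothetical repeated volume measurements (mm^3) of the 5 mm sphere.
repeats_5mm = [68.0, 70.5, 66.2, 69.3]
print("bias for 5 mm sphere:", bias(repeats_5mm, true_volumes[0]))

# Hypothetical mean measured volumes for all four spheres.
mean_measured = 1.05 * true_volumes + 2.0
print("slope, intercept:", linearity_fit(true_volumes, mean_measured))
```

Repeatability and reproducibility need replicate measurements under fixed or varied conditions respectively, so they are not covered by this sketch.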

Clinical Assessment - Quantitative Imaging

Clinical assessment of a QIA method

How does the method help clinicians in diagnosis or treatment?

Recommendations developed for clinical assessment of CAD may be useful for the clinical assessment of quantitative imaging biomarkers after proper modifications

Methodology for prospective and retrospective studies

MRMC methods

Assessment metrics

Quality Assurance - CAD

CAD systems are medical devices, and can benefit from QA procedures like all medical devices

Clinical image acquisition: Follow manufacturer’s QA procedure for imaging device

Additional QA for CAD:

- Assure functionality and performance of CAD device according to vendor’s specifications
- Assure consistency of CAD device performance over time

A recent paper by AAPM outlined important considerations in CAD quality assurance*

*Z. Huo, et al., “Quality assurance and training procedures for computer-aided detection and diagnosis systems,” Med. Phys. 2013, 40:077001.

Quality Assurance - Quantitative Imaging

Clinical image acquisition:

Follow manufacturer’s QA procedure for imaging device

Image acquisition for quantitative imaging:

QA procedures to reduce variability across devices, patients and time

QA procedures to ensure that acquired images are of quantitative quality

Calibration methods and phantom tests in addition to those recommended by manufacturers for clinical image quality

Clinician Training - CAD

Inform clinicians about

- Intended use of the CAD system
- CAD system’s limitations
- Particular strengths and weaknesses of different CAD systems, e.g., with respect to lesion type, acquisition parameters

Although many clinicians and researchers have underlined the need for clinician training for CAD, additional awareness needs to be raised for proper use and potential adverse effects of improper use

Best practices for CAD training are yet to be established

Clinician Training - Quantitative Imaging

Clinicians using the quantitative imaging tool in the clinic should be qualified in accordance with local and national requirements and standards

Training of operators in image acquisition and analysis can influence the value of the extracted imaging biomarker, e.g., training of an operator in a semi-automated segmentation task for a lesion volume biomarker

There is a small number of studies on clinician training and its effect on the extracted quantitative imaging biomarker

• The rest of the computer exhibit consists of a summary and example applications developed by the authors or that are publicly available


Summary

CAD and QIA fields have a number of shared attributes as well as some differences

Methodologies developed in one field are likely to be applicable in the other after careful consideration of the differences

We expect that the development of standardization, assessment and clinical use recommendations in these two fields will strengthen each other


Example Applications

Example 1: CADe assessment: A commercial CAD system

Example 2: Quantitative imaging technical assessment: A phantom study

Example 3: Quantitative volume change analysis for treatment response assessment of head and neck neoplasms on CT scans

Example 4: Computer-aided analysis of treatment response of bladder cancers on CT scans

Example 5: Radiomic biomarkers and decision support system

Example 6: Measurement of pleural effusion in thoracic CT volumes

Example 7: Radiomics-based assessment of normal lung tissue damage in radiation therapy

Example 1

CADe Assessment: A Commercial CADe System

A Commercial CADe System

M-Vu Algorithm Engine

CADe device intended to aid radiologists in reading screening mammograms

Marks areas for review by a radiologist, who

- First reviews each case in the conventional manner,
- then re-examines regions marked by the M-Vu system before making a BI-RADS assessment

Pre-market approval (PMA) by the FDA in 2012

Test Data Set Selection

140 positive and 140 negative cases

11 sites in the United States: academic, specialty, and community clinics

Pre-defined inclusion and exclusion criteria

Sequential, eligible positive mammograms

- A mammography exam for which a biopsy-proven breast cancer was found within 15 months following the exam date

Eligible negative mammograms: a mammography exam for which

- Breast cancer was not found within 15 months prior or 15 months after the exam date
- At least one associated subsequent negative mammogram was acquired at least 11 months after the exam date

Reference Standard

Positive cases:

- Pathology reports from biopsies or surgeries
- Mammograms and radiology reports for lesion location
- Each biopsy-proven lesion is outlined by site investigators

Negative cases:

- Radiology reports

Standalone Assessment: Sensitivity

A malignant region was considered to be detected if

- A CAD mark centroid was inside the region, or
- The region centroid was inside a CAD mark

Having the cancer within the CAD mark, as above, can lead to an optimistic estimate of sensitivity

- A large mark may cover the lesion, but may not attract the attention of the radiologist to the abnormality
- When this is used as a detection criterion, additional data are evaluated to check whether sensitivity is overestimated

A case was considered a true positive if

- At least one malignant region was detected by CAD
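The detection criterion on this slide can be sketched in code. This is a minimal sketch under stated assumptions: lesion outlines and CAD marks are represented as boolean pixel masks of equal shape, which the exhibit does not specify, and the function names (`centroid`, `region_detected`, `case_true_positive`) are hypothetical.

```python
import numpy as np

def centroid(mask):
    """Return the (row, col) centroid of a boolean mask, rounded to a pixel."""
    rows, cols = np.nonzero(mask)
    return int(round(float(rows.mean()))), int(round(float(cols.mean())))

def region_detected(lesion_mask, cad_mark_masks):
    """Detected if a mark centroid lies inside the lesion region,
    or the lesion centroid lies inside a mark."""
    lesion_center = centroid(lesion_mask)
    for mark in cad_mark_masks:
        if lesion_mask[centroid(mark)] or mark[lesion_center]:
            return True
    return False

def case_true_positive(malignant_masks, cad_mark_masks):
    """A case is a true positive if at least one malignant region is detected."""
    return any(region_detected(m, cad_mark_masks) for m in malignant_masks)

# Hypothetical 10x10 image with one lesion and three candidate marks.
lesion = np.zeros((10, 10), dtype=bool)
lesion[2:5, 2:5] = True
small_mark = np.zeros_like(lesion)
small_mark[3:6, 3:6] = True      # mark centroid (4, 4) falls inside the lesion
large_mark = np.zeros_like(lesion)
large_mark[3:10, 3:10] = True    # lesion centroid (3, 3) falls inside the mark
far_mark = np.zeros_like(lesion)
far_mark[7:10, 7:10] = True      # neither centroid condition holds

print(region_detected(lesion, [small_mark]))
print(region_detected(lesion, [large_mark]))
print(region_detected(lesion, [far_mark]))
```

The `large_mark` case shows why the second condition can be optimistic: the mark's own centroid is outside the lesion, yet the lesion still counts as detected.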

Standalone Assessment: Sensitivity

                     Cases   Sensitivity   95% Conf. Interval
Overall              140     79.3%         (72.6%, 86.0%)
Microcalcification   69      79.7%         (70.2%, 89.2%)
Mass                 86      81.4%         (73.2%, 89.6%)

Sensitivity reported for two important sub-groups: Masses and microcalcifications

Additional data on sensitivity with respect to

- Lesion size
- Lesion pathology (invasive, DCIS, other)
- Breast density
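A confidence interval for a sensitivity estimate can be computed as in the sketch below. The exhibit does not state which interval method was used; this sketch assumes the normal-approximation (Wald) interval. The detected-case counts (111, 55, 70) are not given in the exhibit either; they are back-calculated here from the reported percentages and case counts.

```python
import math

def wald_interval(successes, n, z=1.959964):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width

# (label, assumed number of detected cases, number of cases)
subgroups = [
    ("Overall", 111, 140),
    ("Microcalcification", 55, 69),
    ("Mass", 70, 86),
]
for label, detected, n in subgroups:
    lo, hi = wald_interval(detected, n)
    print(f"{label}: {100 * detected / n:.1f}% "
          f"({100 * lo:.1f}%, {100 * hi:.1f}%)")
```

Under these assumptions the printed intervals agree with the tabulated ones to one decimal place, which is consistent with, but does not prove, a Wald interval having been used. For proportions near 0 or 1, or for small samples, other intervals (for example Wilson) are often preferred.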

Standalone Performance: False Positives

                     False positives per mammogram   95% Conf. Interval
Total                0.418                           (0.346, 0.490)
Microcalcification   0.088                           (0.042, 0.133)
Mass                 0.330                           (0.275, 0.385)

Additional data on

- False positives by breast density
- Specificity (percentage of mammograms with no FP marks)

Pivotal Study

Multi-reader multi-case (MRMC) study to investigate the effect of the CADe device on radiologists’ performance

21 radiologists from a variety of academic, specialty, and community clinics located across the US

For each case:

Evaluate the case without CAD and record an assessment

View CAD marks

Record a "with CAD" assessment


Pivotal Study

With and without CAD assessments

- Recall/do not recall patient
- Screening BI-RADS (0, 1, 2, 3, 4a, 4b, 4c, 5)
- Forced BI-RADS (1, 2, 3, 4a, 4b, 4c, 5)
- Lesion findings

Lesion findings:

- Laterality
- Type (mass, architectural distortion, asymmetry, microcalcification)
- BI-RADS (1, 2, 3, 4a, 4b, 4c, 5)
- Probability of Malignancy (POM) (0-100%)

Pivotal Study - ROC Analysis

POM data analyzed using ROC methodology

Area under the ROC curve (AUC)

AUC w/o CAD = 0.885

AUC with CAD = 0.902

Difference statistically significant with p=0.013 (DBM MRMC method)
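For a single reader, the area under the ROC curve can be estimated directly from probability-of-malignancy ratings. The sketch below shows the empirical (Mann-Whitney) AUC estimate only; it is not the DBM MRMC analysis used in the pivotal study, which additionally models reader and case variability. The ratings are hypothetical.

```python
def empirical_auc(positive_scores, negative_scores):
    """Empirical AUC: P(positive > negative) + 0.5 * P(positive == negative)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical POM ratings (0-100%) from one reader.
cancer_pom = [90, 70, 40, 85]
normal_pom = [10, 40, 5, 30]
print(empirical_auc(cancer_pom, normal_pom))
```

Computing this once for the "without CAD" ratings and once for the "with CAD" ratings gives a per-reader AUC difference; testing that difference across readers and cases requires an MRMC method.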


Pivotal Study – Sens., Spec.

Radiologist per-case sensitivity and specificity based on decision to recall

              W/o CAD   With CAD   Difference (Conf. Interval)   P-value
Sensitivity   0.865     0.901      0.036 (0.014, 0.058)          0.002
Specificity   0.649     0.623      -0.026 (-0.039, -0.013)       < 0.001

Sensitivity increased and specificity decreased with CAD

Typical in reader studies for most second-read CADe systems

- Lack of CADe mark for a lesion does not dissuade a reader from recall
- Marks for missed lesions and some false-positive marks add to the number of recalled cases

Summary

Components of CAD assessment

Test data set selection

- Representative of the targeted population

Reference standard

- Clear and accurate definitions of actual positives and negatives

Standalone performance

- How does the CAD system alone perform in the task for which it is intended to help the radiologist?
- Performance metrics, confidence intervals

Reader performance

- What is the effect of the CAD system on readers?
- Performance metrics
- Performance with and without CAD
- Difference, confidence intervals

Example 2

Quantitative Imaging Technical Assessment: A Phantom Study

Designing an Analysis Plan*

Step 1: Define the quantitative imaging biomarker and its relationship to quantity to be measured (measurand)

Step 2: Define question to be addressed

- Hypothesis or bounds on performance

Step 3: Define the experimental unit

- Lesion-level, patient-level

Step 4: Define statistical measures of performance

- What are the metrics for bias, repeatability, reproducibility?

Step 5: Specify elements of statistical design

- Reference, reproducibility conditions, etc.

Step 6: Determine the data requirements

- Patient population, type of images, sample size, etc.

Step 7: Collect data and perform statistical analysis

*Raunig et al., Stat Methods Med Res, 2014

Main Considerations in Technical Assessment


Purpose of the Study

Evaluate technical performance of a nodule volume estimation tool

Study Design

Phantom

- Thorax phantom with vascular insert

Synthetic nodules

- 4 spherical nodules
- 5, 8, 9, 10 mm
- +100 HU

Study Design

Image collection protocol

- 10 repeat acquisitions of the phantom
- Acquisition: 16×0.75 mm, 100 mAs
- Reconstruction: 0.75 mm (Detailed) and 1.5 mm (Detailed)

Bias/Linearity

Visually assess data to define limits of quantitation

Evaluate means/variances

- Is data transformation necessary/appropriate to stabilize the variance?

[Figure: measured value (mm3) by slice thickness (0.75 mm, 1.5 mm) for each nodule size (5, 8, 9, 10 mm); measured value minus median (mm3) by nodule size]

Bias/Linearity

[Figure: measured value (mm3) versus reference standard (mm3), both axes 0 to 600]

Repeatability/Reproducibility

Repeatability coefficient (RC): Estimated over 10 repeat acquisitions, 4 nodules

Reproducibility coefficient (RDC): Estimated over 10 repeat acquisitions, 4 nodules, and 2 slice thicknesses

Slice thickness   0.75 mm    1.5 mm
RC                14.4 mm3   27.6 mm3
RDC               29.0 mm3 (across both slice thicknesses)
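A repeatability coefficient can be estimated from replicate measurements as in the sketch below. It uses the common definition RC = 1.96 × √2 × wSD (about 2.77 × wSD), where wSD is the within-subject standard deviation pooled over nodules. Whether the exhibit used exactly this estimator, or applied a data transformation first, is not stated; the function name and the volumes are hypothetical.

```python
import numpy as np

def repeatability_coefficient(replicates):
    """RC = 1.96 * sqrt(2) * wSD.

    `replicates` is a 2D array with one row per nodule and one column per
    repeat acquisition; wSD^2 is the mean of the per-nodule sample variances.
    """
    replicates = np.asarray(replicates, dtype=float)
    within_variance = np.var(replicates, axis=1, ddof=1).mean()
    return float(1.96 * np.sqrt(2.0) * np.sqrt(within_variance))

# Hypothetical volume estimates (mm^3): 4 nodules x 5 repeat acquisitions.
volumes = np.array([
    [64.0, 67.5, 66.1, 63.8, 65.9],
    [265.2, 270.4, 268.8, 266.0, 271.3],
    [380.1, 384.6, 379.5, 383.0, 386.2],
    [520.7, 526.3, 523.9, 519.8, 525.1],
])
print(repeatability_coefficient(volumes))
```

A reproducibility coefficient follows the same pattern, except that the variance also includes the contribution of the varied condition (here, slice thickness).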

Repeatability Analysis

[Figure: Bland-Altman plots of Diff(Est_i, Est_j) (mm3) versus Mean(Est_i, Est_j) (mm3)]

- 1.5 mm slices: RC = 27.62 mm3
- 0.75 mm slices: RC = 14.44 mm3
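The points of a Bland-Altman plot like those on this slide can be generated as follows. This is a minimal sketch: it assumes that each point is formed from a pair of repeat estimates (Est_i, Est_j) of the same nodule, as the axis labels suggest, and the function names and data are hypothetical.

```python
from itertools import combinations

import numpy as np

def bland_altman_pairs(estimates):
    """Return (mean, difference) for every pair of repeat estimates."""
    return [((a + b) / 2.0, a - b) for a, b in combinations(estimates, 2)]

def limits_of_agreement(differences):
    """Mean difference plus/minus 1.96 standard deviations of the differences."""
    d = np.asarray(differences, dtype=float)
    half_width = 1.96 * d.std(ddof=1)
    return float(d.mean() - half_width), float(d.mean() + half_width)

# Hypothetical repeat volume estimates (mm^3) of one nodule.
repeats = [100.0, 104.0, 98.0]
points = bland_altman_pairs(repeats)
print(points)
print(limits_of_agreement([diff for _, diff in points]))
```

Plotting the differences against the means across all nodules shows whether measurement variability grows with nodule size, which bears on whether a data transformation is appropriate.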

Reproducibility Analysis

[Figure: Bland-Altman plot (reproducibility) of Diff(Est_i, Est_j) (mm3) versus Mean(Est_i, Est_j) (mm3)]

RDC = 28.95 mm3

Summary

Main components of quantitative imaging biomarker technical assessment

- Bias/linearity analysis
- Repeatability analysis
- Reproducibility analysis
- Others: identification of significant factors/subgroups, …

Challenging to maintain consistency across studies

- Phantom/clinical data
- Transformation of data
- Reference standard
- Is test-retest data available?
- Reproducibility conditions

Example 3

Quantitative Volume Change Analysis for Treatment Response Assessment of Head and Neck Neoplasms on CT Scans

Ref. Hadjiiski L, Mukherji SK, Gujar SK, Sahiner B, Ibrahim M, Street E, Moyer J, Worden FP, Chan HP. Treatment response assessment of head and neck cancers on CT using computerized volume analysis. American Journal of Neuroradiology (AJNR) 2010;31(9):1744-1751.

Lesion Segmentation

Region of Interest

Preprocessing

Level Set

Automatic Segmentation

Example: Tongue Base Tumor


Volume Segmentation

Pre-to-post-treatment change in segmented primary tumor volumes (23 pairs)

[Figure: post-treatment volume (cm3) versus pre-treatment volume (cm3), both axes 0 to 50; two panels: Auto and Manual]

Results - % Volume Change Comparison

Automatic vs. manual % volume change comparison (23 tumor pairs)

- Automatic vs. manual: Pearson’s correlation r = 0.89

[Figure: % volume change, manual versus automatic, both axes -20 to 100]
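The comparison on this slide rests on two simple calculations: percent volume change per tumor and Pearson's correlation between two sets of such changes. The sketch below illustrates both. The sign convention (positive values mean shrinkage) is an assumption, and all volumes are invented.

```python
import numpy as np

def percent_volume_change(pre_volumes, post_volumes):
    """Percent change from pre- to post-treatment; positive means shrinkage."""
    pre = np.asarray(pre_volumes, dtype=float)
    post = np.asarray(post_volumes, dtype=float)
    return 100.0 * (pre - post) / pre

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical tumor volumes (cm^3) from automatic and manual segmentation.
auto_pre, auto_post = [20.0, 35.0, 12.0, 8.0], [10.0, 30.0, 3.0, 9.0]
manual_pre, manual_post = [22.0, 33.0, 11.0, 8.5], [12.0, 27.0, 2.0, 9.0]

auto_change = percent_volume_change(auto_pre, auto_post)
manual_change = percent_volume_change(manual_pre, manual_post)
print(auto_change, manual_change)
print(pearson_r(auto_change, manual_change))
```

Correlation measures association rather than agreement, so two methods can correlate strongly while differing by a systematic offset.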

Results – Summary

% pre-to-post-treatment change in volume estimates by 3 methods, relative to radiologist’s segmentation (primary tumors, Pearson’s correlation coeff.):

- Quantitative image analysis: 0.89
- WHO (longest diameter): 0.73
- RECIST (longest diameter and perpendicular): 0.58

Example: Tonsillar Tumor

[Figure: pre-treatment and post-treatment images]


Example 4

Computer-aided Analysis of Treatment Response of Bladder Cancers on CT Scans

Ref. Hadjiiski L, Weizer AZ, Alva A, Caoili EM, Cohan RH, Cha K, Chan HP. Bladder cancers on CT: preliminary study of treatment response assessment based on computerized volume analysis, WHO and RECIST Criteria. American Journal of Roentgenology 2015, 205(2) pp 348-352.

Bladder Lesion Segmentation

Auto-Initialized Cascaded Level Set (AI-CALS)

Region of Interest

Preprocessing

Cascaded Level Set

Automatic Segmentation

Bladder Tumor: Pre- & Post-Treatment

[Figure: pre-treatment and post-treatment images]

Feature Descriptors

Descriptors automatically extracted from the segmented lesions:

- 15 radiomic features (RF) based on pre- and post-treatment changes in:
  - volume (V)
  - 5 gray level descriptors (GL)
  - 9 shape descriptors (S)
- Selected features merged into a Combined Response Index (CRI)

Combined Response Index (CRI)

AUC for pT0 stage (complete response) vs. others, 35 primary site tumors

- Volume (3D): 0.68 ± 0.09
- Reference (3D): 0.66 ± 0.11
- CRI (Volume, Shape): 0.75 ± 0.10
- CRI (Volume, Shape, Gray Level): 0.76 ± 0.09

[Figure: ROC curves (true positive fraction versus false positive fraction) for V, Reference (3D), CRI (V, S), and CRI (V, S, GL)]

Example 5

Radiomic Biomarkers and Decision Support System for Multiple Myeloma

Ref. Zhou C, Chan HP, Dong Q, Couriel DR, Pawarode A, Hadjiiski LM, Wei J. Quantitative analysis of MR imaging to assess treatment response for patients with multiple myeloma by using dynamic intensity entropy transformation. 2015, Ahead of Print, 10.1148/radiol.2015142804

Multiple Myeloma (MM) – a cancer formed by malignant plasma cells in the bone marrow

MR radiomic biomarkers (+ clinical biomarkers)

- Staging
- Treatment response assessment
- Prognosis prediction
- Recurrence – early detection

Radiomic Biomarkers for MM Treatment Response

T1W sagittal views

VOI of vertebral bodies and intervertebral discs

3D Dynamic Intensity Energy Transformation (DIET)

DIET response map

Quantitative Energy Enhancement Value: p-qEEVvbr and m-DqEEVvbr

Response Index

BMT Treatment Response Assessment: Responder

[Figure: T1-weighted MR and DIET response map, pre-bone marrow transplant (pre-BMT) and 55 days post-BMT]

BMT Treatment Response Assessment: Non-responder

[Figure: T1-weighted MR and DIET response map, pre-BMT and 56 days post-BMT]

Kaplan-Meier Survival Curve predicted by p-qEEV biomarker

[Figure: survival probability versus time since BMT (months, 0 to 35) for two groups: p-qEEV >= 10% and p-qEEV < 10%]
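Survival curves of the kind shown on this slide come from the Kaplan-Meier product-limit estimator, applied separately to each biomarker group. The sketch below is a plain implementation of that estimator; the patient data are invented, and the 10% split simply mirrors the two group labels on the slide.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    `events` holds 1 for an observed death and 0 for a censored patient.
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical patients: (months since BMT, death observed, p-qEEV in percent).
patients = [(6, 1, 4.0), (12, 1, 8.5), (20, 0, 7.0),
            (30, 0, 15.0), (18, 1, 22.0), (34, 0, 12.0)]
groups = {
    "p-qEEV >= 10%": [(t, e) for t, e, q in patients if q >= 10.0],
    "p-qEEV < 10%": [(t, e) for t, e, q in patients if q < 10.0],
}
for name, group in groups.items():
    times, events = zip(*group)
    print(name, kaplan_meier(times, events))
```

Comparing the two curves statistically would need a further step, such as a log-rank test, which this sketch does not include.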

Prediction of treatment response by qEEV-based response index

Computer-Aided Decision Support for Multiple Myeloma

Example 6

Measurement of Pleural Effusion in Thoracic CT Volumes

Ref. Yao J, Bliton J, Summers RM. Automatic segmentation and measurement of pleural effusions on CT. IEEE Transactions on Biomedical Engineering 2013, 60(7) pp 1834-1840.

Significance and Workflow

Pleural effusion: Size, location and temporal change can be significant for diagnosis and patient care

Very time consuming to manually measure in 3D chest CT

Segmentation and Surface Modeling

Segmentation before deformable surface modeling

Segmentation after deformable surface modeling

Segmented PE surface before deformable surface modeling

Segmented PE surface after deformable surface modeling


Automated vs. Manual Segmentations


Comparison to Manual Segmentation


Example 7

Radiomics-Based Assessment of Normal Lung Tissue Damage in Radiation Therapy

Ref. Cunliffe AR, et al.: Lung texture in serial thoracic CT scans: Correlation with radiologist-defined severity of acute changes following radiation therapy. Physics in Medicine and Biology 59: 5387–5398, 2014.

Ref. Cunliffe AR, et al.: Lung texture in serial thoracic computed tomography scans: Correlation of radiomics-based features with radiation therapy dose and radiation pneumonitis development. International Journal of Radiation Oncology • Biology • Physics 91: 1048–1056, 2015.

Purpose

Develop quantitative methods to measure dose-dependent normal lung tissue damage in patients who receive thoracic radiation therapy

Characterize CT scan appearance based on pixel values and spatial relationship between pixels

Use these quantitative techniques to distinguish between patients with and without radiation pneumonitis

Radiation-Induced Lung Damage

[Figure: CT scans pre-RT and ~3 months post-RT]

Texture Analysis

Corresponding baseline and follow-up regions identified using deformable registration

                                    Baseline   Follow-up
Mean                                -785       -385
Median                              -820       -348
St. dev.                            148        234
IQR                                 90         296
Fractal dimension                   3.0        2.7
Entropy of ripple-filtered region   4.0        7.1
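The first-order features in this comparison (mean, median, standard deviation, interquartile range) and their change between scans can be computed as in the sketch below. It assumes that the baseline and follow-up regions have already been matched by registration and are supplied as arrays of pixel values; the function names and pixel values are hypothetical. Fractal dimension and the entropy of a ripple-filtered region need additional filtering steps and are not implemented here.

```python
import numpy as np

def first_order_features(roi_values):
    """Mean, median, sample standard deviation and IQR of ROI pixel values."""
    values = np.asarray(roi_values, dtype=float).ravel()
    q25, q75 = np.percentile(values, [25, 75])
    return {
        "mean": float(values.mean()),
        "median": float(np.median(values)),
        "std": float(values.std(ddof=1)),
        "iqr": float(q75 - q25),
    }

def feature_change(baseline_roi, followup_roi):
    """Follow-up minus baseline value for each first-order feature."""
    base = first_order_features(baseline_roi)
    follow = first_order_features(followup_roi)
    return {name: follow[name] - base[name] for name in base}

# Hypothetical pixel values for a matched pair of small regions.
baseline_roi = np.array([[-850, -820, -790], [-810, -780, -760]])
followup_roi = np.array([[-600, -350, -200], [-420, -380, -150]])
print(first_order_features(baseline_roi))
print(feature_change(baseline_roi, followup_roi))
```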

Results

Extent of texture change between scans increases with severity of RT-induced damage

Texture Analysis

Image texture change related to dose through the dose map of the treatment planning scan

[Figure: treatment planning dose map and baseline scan, related by deformable registration]

Results

Extent of texture change between scans increases with increased dose

Patient Classification

Area under the ROC curve for the ability of texture features to differentiate patients with and without radiation pneumonitis

Mean AUC Across 20 Features

Low Dose   Medium Dose   High Dose   Fitted Slope
0.64       0.68          0.71        0.71

Summary

Identified a set of reliable texture features for use in lung texture analysis applications

Fully automated method for quantitative analysis of changes in lung parenchyma between serial CT scans

Demonstrated relationship between texture change and

- Radiation dose
- Severity of radiation therapy-induced damage
- Radiation pneumonitis status
