The Complementary Roles of Computer-Aided Diagnosis and
Quantitative Image Analysis: Similarities and Differences in
Assessment, Quality Assurance and Training
Berkman Sahiner (1), PhD; Samuel G. Armato III (2), PhD; Zhimin Huo (3), PhD; Heang-Ping Chan (4), PhD; Ronald M. Summers (5), MD, PhD; Nicholas Petrick (1), PhD
1: US Food and Drug Administration, Center for Devices and Radiological Health
2: University of Chicago, Department of Radiology
3: Carestream Inc.
4: University of Michigan, Department of Radiology
5: National Institutes of Health, Clinical Center
Introduction
Quantitative image analysis (QIA) and computer-aided diagnosis (CAD) are closely linked
This exhibit reviews the methods developed for assessment, quality assurance, and user training for CAD, and highlights parallels and distinctions between CAD and QIA
Comparison between CAD and QIA in this computer exhibit is performed through several example applications developed by the authors or that are publicly available
What is CAD?
CAD systems incorporate pattern recognition and data analysis capabilities* and are intended to
Mark regions of an image that may reveal specific abnormalities and alert the clinician to these regions during image interpretation (computer-aided detection (CADe) systems)**
Provide to the clinician an assessment of disease, disease type, severity, stage, progression (computer-aided diagnosis (CADx) systems)**
*Guidance for industry and FDA staff: “Clinical performance assessment: Considerations for computer-assisted detection devices applied to radiology images and radiology device data - premarket approval (PMA) and premarket notification [510(k)] submissions,” (FDA, 2012).
**N. Petrick et al., “Evaluation of computer-aided detection and diagnosis systems,” Med. Phys. 2013, 40:087001.
What is QIA?
QIA is the process of extraction of quantitative imaging biomarkers from medical images, typically involving computerized tools
A quantitative imaging biomarker is an objectively measured characteristic derived from an in vivo image as an indicator of normal biological processes, pathogenic processes, or response to a therapeutic intervention*
*Sullivan DC et al., “Metrology standards for quantitative Imaging biomarkers,” Radiology 2015 Aug 12:142202. (Epub ahead of print).
What is Quantitative Imaging?
Both QIA and quantitative imaging biomarkers are within the larger context of quantitative imaging, defined as*
the extraction of quantifiable features from medical images for the assessment of normal [findings] or the severity, degree of change, or status of a disease, injury, or chronic condition relative to normal [findings]
Quantitative imaging includes the development, standardization, and optimization of anatomical, functional, and molecular imaging acquisition protocols, data analyses, display methods, and reporting structures
*Quantitative Imaging Biomarkers Alliance. http://rsna.org/QIBA.aspx. Accessed Oct. 15, 2015.
How are CAD and QIA Similar?
Both aim at aiding clinicians through advanced image analysis techniques
Both commonly use computer methods to extract features
CAD may use other patient-related information, e.g., age, gender, risk factors, or non-image biomarkers
CAD features may include quantitative imaging biomarkers, but can also be relative, ordinal features
User interaction may be somewhat more prevalent in QIA (e.g., semi-automated segmentation)
Both emphasize appropriate image acquisition protocols, display methods, training, and reporting
How are CAD and QIA Different?
In CAD, the emphasis is on how the CAD output can provide decision support to clinicians
- In full assessment, it is required to demonstrate that a CAD system aids the clinician
- May integrate imaging and non-imaging information or biomarkers
- Not necessary to establish a direct link between individual CAD features and a specific disease condition
- Standalone testing (performance of the CAD system alone) is nonetheless important
In QIA, the emphasis is on the extraction of biomarkers
- Bias and variability across devices, patients and time are major considerations
Assessment, Quality Assurance, and Training
The differences and similarities between CAD and QIA have a number of implications for assessment, quality assurance (QA), and training
We briefly summarize our work in CAD in these areas and draw parallels with QIA
Assessment - CAD
CAD systems are typically assessed by evaluating the effect of the system on clinicians
Reader performance assessment: Evaluate performance of a clinician using CAD as part of the decision making process
- Retrospective multiple-reader multiple-case (MRMC) studies
- Prospective field trials
Standalone performance of a CAD system is also useful as a performance indicator, both in
- System design phase and
- Final system assessment
Assessment - CAD
A recent paper by AAPM outlined important considerations in CAD assessment including*:
Data set selection
- Representativeness of the test set for the targeted population
Reference standard and mark labeling
- Disease status of a case (sometimes includes location information)
Standalone assessment metrics
- True-positive fraction, false-positive fraction pairs
- Receiver operating characteristic (ROC) curves
- Location-specific ROC
Reader study design and analysis
- Prospective versus retrospective studies
- MRMC study design
- Reader training for reader studies
*N. Petrick et al., “Evaluation of computer-aided detection and diagnosis systems,” Med. Phys. 2013, 40:087001.
Assessment - Quantitative Imaging
Most groups working in quantitative imaging are interested in technical performance assessment
How a test performs in reference objects or subjects under controlled conditions
Other quantitative imaging assessment
Clinical assessment
Clinical impact
Technical Assessment - Quantitative Imaging
Technical assessment can be performed using
Anthropomorphic phantoms
- Large number of replicate measurements easily obtained
- The same phantoms can be used repeatedly
- Reference standard easier to obtain
- Phantom realism may be limited
- Generalizability to human data may be limited because of limited abnormality variation and complexity
Patient data
- More realistic
- Number of replicate measurements may be limited
- Reference standard more difficult to obtain
- Number of available patients usually limited
Technical Assessment - Quantitative Imaging
Important considerations include*
Measurand/Reference
- True value of the quantity intended to be measured
Bias
- Difference between the expected value of the biomarker and the measurand
Linearity
- Is a change in the measurand reflected as a proportional change in the biomarker?
Repeatability
- Ability to repeatedly measure the same feature under identical or near-identical conditions
Reproducibility
- Ability to measure the same feature under different conditions expected in a preclinical study, clinical trial, or clinical practice
*D.L. Raunig et al., “Quantitative imaging biomarkers: a review of statistical methods for technical performance assessment,” Stat Methods Med Res. 2015, 24:27-67.
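The definitions of bias and linearity above can be illustrated with a small sketch. It assumes replicate measurements are available at several known reference values. The function names and the toy numbers are hypothetical and are not taken from the exhibit or from the cited paper. Bias is estimated per reference level as the mean measurement minus the reference, and linearity is summarized by an ordinary least-squares fit of mean measurement against reference.

```python
from statistics import mean

def bias_by_level(reference, replicates):
    """Estimated bias at each reference level: mean(measurements) - reference."""
    return [mean(reps) - ref for ref, reps in zip(reference, replicates)]

def linearity_fit(reference, replicates):
    """Least-squares slope and intercept of mean measurement vs. reference."""
    y = [mean(reps) for reps in replicates]
    x_bar, y_bar = mean(reference), mean(y)
    sxx = sum((x - x_bar) ** 2 for x in reference)
    sxy = sum((x - x_bar) * (yi - y_bar) for x, yi in zip(reference, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Toy, made-up values (not data from the exhibit)
reference = [100.0, 200.0, 300.0]
replicates = [[104.0, 106.0], [209.0, 211.0], [314.0, 316.0]]

biases = bias_by_level(reference, replicates)      # [5.0, 10.0, 15.0]
slope, intercept = linearity_fit(reference, replicates)
print(biases, round(slope, 3), round(intercept, 3))
```

In this toy case the slope is 1.05, so the bias grows in proportion to the measurand, which is the kind of pattern a linearity analysis is meant to expose.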
Clinical Assessment - Quantitative Imaging
Clinical assessment of a QIA method
How does the method help clinicians in diagnosis or treatment?
Recommendations developed for clinical assessment of CAD may be useful for the clinical assessment of quantitative imaging biomarkers after proper modifications
Methodology for prospective and retrospective studies
MRMC methods
Assessment metrics
Quality Assurance - CAD
CAD systems are medical devices, and can benefit from QA procedures like all medical devices
Clinical image acquisition:
- Follow manufacturer’s QA procedure for imaging device
Additional QA for CAD:
- Assure functionality and performance of CAD device according to vendor’s specifications
- Assure consistency of CAD device performance over time
A recent paper by AAPM outlined important considerations in CAD quality assurance*
*Z. Huo, et al., “Quality assurance and training procedures for computer-aided detection and diagnosis systems,” Med. Phys. 2013, 40:077001.
Quality Assurance - Quantitative Imaging
Clinical image acquisition:
Follow manufacturer’s QA procedure for imaging device
Image acquisition for quantitative imaging:
QA procedures to reduce variability across devices, patients and time
QA procedures to ensure that acquired images are of quantitative quality
Calibration methods and phantom tests in addition to those recommended by manufacturers for clinical image quality
Clinician Training - CAD
Inform clinicians about
- Intended use of the CAD system
- CAD system’s limitations
- Particular strengths and weaknesses of different CAD systems, e.g., with respect to lesion type, acquisition parameters
Although many clinicians and researchers have underlined the need for clinician training for CAD,
- Additional awareness needs to be raised for proper use and potential adverse effects of improper use
- Best practices for CAD training are yet to be established
Clinician Training - Quantitative Imaging
Clinicians using the quantitative imaging tool in the clinic should be qualified in accordance with local and national requirements and standards
Training of operators in image acquisition and analysis can influence the value of the extracted imaging biomarker, e.g., training of an operator in a semi-automated segmentation task for a lesion volume biomarker
There is a small number of studies on clinician training and its effect on the extracted quantitative imaging biomarker
• The rest of the computer exhibit consists of a summary and example applications developed by the authors or that are publicly available
Summary
CAD and QIA fields have a number of shared attributes as well as some differences
Methodologies developed in one field are likely to be applicable in the other after careful consideration of the differences
We expect that the development of standardization, assessment and clinical use recommendations in these two fields will strengthen each other
Example Applications
Example 1: CADe assessment: A commercial CAD system
Example 2: Quantitative imaging technical assessment: A phantom study
Example 3: Quantitative volume change analysis for treatment response assessment of head and neck neoplasms on CT scans
Example 4: Computer-aided analysis of treatment response of bladder cancers on CT scans
Example 5: Radiomic biomarkers and decision support system
Example 6: Measurement of pleural effusion in thoracic CT volumes
Example 7: Radiomics-based assessment of normal lung tissue damage in radiation therapy
Example 1
CADe Assessment: A Commercial CADe System
A Commercial CADe System
M-Vu Algorithm Engine
CADe device intended to aid radiologists in reading screening mammograms
Marks areas for review by a radiologist, who
- First reviews each case in the conventional manner,
- Then re-examines regions marked by the M-Vu system before making a BI-RADS assessment
Pre-market approval (PMA) by the FDA in 2012
Test Data Set Selection
140 positive and 140 negative cases
11 sites in the United States
- Academic, specialty, and community clinics
Pre-defined inclusion and exclusion criteria
Sequential, eligible positive mammograms
- A mammography exam for which a biopsy-proven breast cancer was found within 15 months following the exam date
Eligible negative mammograms
- A mammography exam for which
  - Breast cancer was not found within 15 months prior or 15 months after the exam date
  - At least one associated subsequent negative mammogram was acquired at least 11 months after the exam date
Reference Standard
Positive cases:
- Pathology reports from biopsies or surgeries
- Mammograms and radiology reports for lesion location
- Each biopsy-proven lesion is outlined by site investigators
Negative cases:
- Radiology reports
Standalone Assessment: Sensitivity
A malignant region was considered to be detected if
- A CAD mark centroid was inside the region, or
- The region centroid was inside a CAD mark
Having the cancer within the CAD mark, as above, can lead to an optimistic estimate of sensitivity
- A large mark may cover the lesion, but may not attract the attention of the radiologist to the abnormality
- When this is used as a detection criterion, additional data are evaluated to check whether sensitivity is overestimated
A case was considered a true positive if
- At least one malignant region was detected by CAD
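The detection criterion above can be sketched in a few lines. This is only an illustration. It assumes, for simplicity, that both lesion outlines and CAD marks are axis-aligned rectangles, which the exhibit does not state, and the `Box` class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Axis-aligned rectangle in pixel coordinates (an assumed simplification)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def centroid(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    def contains(self, point):
        x, y = point
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

def region_detected(region, marks):
    """Detected if a mark centroid is inside the region, or the region centroid is inside a mark."""
    return any(
        region.contains(m.centroid()) or m.contains(region.centroid())
        for m in marks
    )

def case_true_positive(malignant_regions, marks):
    """A case is a true positive if at least one malignant region is detected."""
    return any(region_detected(r, marks) for r in malignant_regions)

lesion = Box(10, 10, 20, 20)
large_mark = Box(0, 0, 100, 100)    # its centroid (50, 50) is outside the lesion
far_mark = Box(200, 200, 210, 210)

print(case_true_positive([lesion], [large_mark]))  # True
print(case_true_positive([lesion], [far_mark]))    # False
```

The first call shows why the second clause of the criterion can be optimistic: the large mark counts as a detection only because it covers the lesion centroid.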
Standalone Assessment: Sensitivity
                     Cases   Sensitivity   95% Conf. Interval
Overall              140     79.3%         (72.6%, 86.0%)
Microcalcification   69      79.7%         (70.2%, 89.2%)
Mass                 86      81.4%         (73.2%, 89.6%)
Sensitivity reported for two important sub-groups: masses and microcalcifications
Additional data on sensitivity with respect to
- Lesion size
- Lesion pathology (invasive, DCIS, other)
- Breast density
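The exhibit does not say how the confidence intervals in the sensitivity table were computed. As a hedged sketch, a normal-approximation (Wald) interval applied to an assumed count of 111 detected cases out of 140 gives values that agree with the overall row at the reported precision. The count of 111 is inferred from 79.3% of 140 and is not given in the exhibit, and the function name is hypothetical.

```python
import math

def wald_interval(detected, total, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = detected / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width

# 111 detected cases is inferred from 79.3% of 140; the exhibit does not state it.
p, lo, hi = wald_interval(111, 140)
print(f"{p:.1%} ({lo:.1%}, {hi:.1%})")  # 79.3% (72.6%, 86.0%)
```

For small samples or proportions near 0 or 1, an exact or score-based interval is often preferred over the Wald form; which one to use is a design choice for the study's statistical plan.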
Standalone Performance: False Positives
                     False positives per mammogram   95% Conf. Interval
Total                0.418                           (0.346, 0.490)
Microcalcification   0.088                           (0.042, 0.133)
Mass                 0.330                           (0.275, 0.385)
Additional data on
- False positives by breast density
- Specificity (percentage of mammograms with no FP marks)
Pivotal Study
Multi-reader multi-case (MRMC) study to investigate the effect of the CADe device on radiologists’ performance
21 radiologists from a variety of academic, specialty, and community clinics located across the US
For each case:
- Evaluate the case without CAD and record an assessment
- View CAD marks
- Record a "with CAD" assessment
Pivotal Study
With and without CAD assessments
- Recall/do not recall patient
- Screening BI-RADS (0, 1, 2, 3, 4a, 4b, 4c, 5)
- Forced BI-RADS (1, 2, 3, 4a, 4b, 4c, 5)
- Lesion findings
Lesion findings:
- Laterality
- Type (mass, architectural distortion, asymmetry, microcalcification)
- BI-RADS (1, 2, 3, 4a, 4b, 4c, 5)
- Probability of malignancy (POM) (0-100%)
Pivotal Study - ROC Analysis
POM data analyzed using ROC methodology
Area under the ROC curve (AUC)
- AUC without CAD = 0.885
- AUC with CAD = 0.902
- Difference statistically significant with p = 0.013 (DBM MRMC method)
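To make the AUC figure of merit concrete, the sketch below computes the empirical (nonparametric) AUC for a single reader from POM ratings. It is not the DBM MRMC analysis used in the pivotal study, which additionally accounts for reader and case variability. The function name and the ratings are made up for illustration.

```python
def empirical_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Made-up POM ratings (0-100%) for one hypothetical reader
cancer_cases = [90, 80, 40]
normal_cases = [70, 30, 20]
auc = empirical_auc(cancer_cases, normal_cases)
print(round(auc, 3))  # 0.889
```

Running the same computation on each reader's "without CAD" and "with CAD" ratings gives the per-reader AUC pairs that an MRMC method then compares.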
Pivotal Study – Sensitivity, Specificity
Radiologist per-case sensitivity and specificity based on decision to recall
              W/o CAD   With CAD   Difference                P-value
Sensitivity   0.865     0.901      0.036 (0.014, 0.058)      0.002
Specificity   0.649     0.623      -0.026 (-0.039, -0.013)   < 0.001
Sensitivity increased and specificity decreased with CAD
- Typical in reader studies for most second-read CADe systems
- Lack of CADe mark for a lesion does not dissuade a reader from recall
- Marks for missed lesions and some false-positive marks add to the number of recalled cases
Summary
Components of CAD assessment
Test data set selection
- Representative of the targeted population
Reference standard
- Clear and accurate definitions of actual positives and negatives
Standalone performance
- How does the CAD system alone perform in the task for which it is intended to help the radiologist?
- Performance metrics, confidence intervals
Reader performance
- What is the effect of the CAD system on readers?
- Performance metrics
- Performance with and without CAD
- Difference, confidence intervals
Example 2
Quantitative Imaging Technical Assessment: A Phantom Study
Designing an Analysis Plan*
Step 1: Define the quantitative imaging biomarker and its relationship to the quantity to be measured (measurand)
Step 2: Define question to be addressed
- Hypothesis or bounds on performance
Step 3: Define the experimental unit
- Lesion-level, patient-level
Step 4: Define statistical measures of performance
- What are the metrics for bias, repeatability, reproducibility?
Step 5: Specify elements of statistical design
- Reference, reproducibility conditions, etc.
Step 6: Determine the data requirements
- Patient population, type of images, sample size, etc.
Step 7: Collect data and perform statistical analysis
*Raunig et al., Stat Methods Med Res, 2014
Main Considerations in Technical Assessment
Purpose of the Study
Evaluate technical performance of a nodule volume estimation tool
Study Design
Phantom
- Thorax phantom with vascular insert
Synthetic nodules
- 4 spherical nodules
- 5, 8, 9, 10 mm
- +100 HU
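For orientation, the nominal volumes of the four spherical nodules follow directly from their diameters. The sketch below assumes the listed sizes are nominal diameters of ideal spheres. The exhibit does not state how the reference standard volumes were actually obtained, so these numbers are only indicative.

```python
import math

def sphere_volume_mm3(diameter_mm):
    """Volume of a sphere from its diameter: V = pi * d^3 / 6."""
    return math.pi * diameter_mm ** 3 / 6

for d in (5, 8, 9, 10):
    print(f"{d} mm -> {sphere_volume_mm3(d):.1f} mm3")
# 5 mm -> 65.4 mm3
# 8 mm -> 268.1 mm3
# 9 mm -> 381.7 mm3
# 10 mm -> 523.6 mm3
```

Because volume scales with the cube of the diameter, the 10 mm nodule has eight times the volume of the 5 mm nodule, which is why the measured values span roughly 50 to 550 mm3.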
Study Design
Image collection protocol
Phantom
- 10 repeat acquisitions
Acquisition
- 16×0.75 mm
- 100 mAs
Reconstruction
- Slice thickness: 0.75 mm and 1.5 mm
- Detailed (for both slice thicknesses)
Bias/Linearity
Visually assess data to define limits of quantitation
Evaluate means/variances
- Is data transformation necessary/appropriate to stabilize the variance?
[Figure: measured value (mm3) by slice thickness (0.75 mm, 1.5 mm) for each nodule size (5, 8, 9, 10 mm)]
[Figure: measured value minus median (mm3) by nodule size (5, 8, 9, 10 mm)]
Bias/Linearity
[Figure: measured value (mm3) versus reference standard (mm3), both axes from 0 to 600]
Repeatability/Reproducibility
Repeatability coefficient (RC)
- Estimated over 10 repeat acquisitions, 4 nodules
Reproducibility coefficient (RDC)
- Estimated over 10 repeat acquisitions, 4 nodules, and 2 slice thicknesses
       Slice thickness 0.75 mm   Slice thickness 1.5 mm
RC     14.4 mm3                  27.6 mm3
RDC    29.0 mm3 (across both slice thicknesses)
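A common way to estimate a repeatability coefficient from replicate measurements is RC = 1.96 × sqrt(2) × wSD, where wSD is the within-subject standard deviation. The sketch below follows that form and pools the within-nodule sample variances with equal weight. Whether the exhibit's RC values were computed exactly this way is an assumption, and the function name and toy numbers are hypothetical.

```python
import math
from statistics import mean, variance

def repeatability_coefficient(replicates_by_nodule):
    """RC = 1.96 * sqrt(2) * wSD, with wSD^2 the mean within-nodule sample variance."""
    within_var = mean([variance(reps) for reps in replicates_by_nodule])
    return 1.96 * math.sqrt(2.0 * within_var)

# Made-up repeat volume estimates (mm3) for two hypothetical nodules
toy = [[10.0, 12.0], [20.0, 22.0]]
rc = repeatability_coefficient(toy)
print(round(rc, 2))  # 3.92
```

Under the usual normality assumption, the absolute difference between two repeat measurements of the same nodule is expected to fall below RC about 95% of the time.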
Repeatability Analysis
[Figure: Bland-Altman plot (0.75 mm slices), Diff(Est_i, Est_j) (mm3) versus Mean(Est_i, Est_j) (mm3); RC = 14.44 mm3]
[Figure: Bland-Altman plot (1.5 mm slices), Diff(Est_i, Est_j) (mm3) versus Mean(Est_i, Est_j) (mm3); RC = 27.62 mm3]
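The points in a Bland-Altman plot of repeat measurements are the pairwise means and differences of estimates of the same nodule. The sketch below builds those points from all pairs of repeats and reports what share of differences falls within a given limit. Pairing every repeat with every other repeat is an assumption about how the plots were built, and the names and data are hypothetical.

```python
from itertools import combinations

def bland_altman_points(replicates_by_nodule):
    """(mean, difference) for every pair of repeat estimates of the same nodule."""
    points = []
    for reps in replicates_by_nodule:
        for est_i, est_j in combinations(reps, 2):
            points.append(((est_i + est_j) / 2, est_i - est_j))
    return points

def fraction_within(points, limit):
    """Share of pairwise differences whose magnitude does not exceed the limit."""
    return sum(abs(diff) <= limit for _, diff in points) / len(points)

# Made-up repeat volume estimates (mm3) for two hypothetical nodules
toy = [[10.0, 12.0, 11.0], [20.0, 22.0, 27.0]]
points = bland_altman_points(toy)
print(len(points))                   # 6 pairs
print(fraction_within(points, 3.0))  # 4 of the 6 differences are within +/- 3
```

Plotting the difference against the mean makes it easy to see whether the spread of repeat differences changes with nodule size, which bears on whether a data transformation is needed.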
Reproducibility Analysis
[Figure: Bland-Altman plot (reproducibility), Diff(Est_i, Est_j) (mm3) versus Mean(Est_i, Est_j) (mm3); RDC = 28.95 mm3]
Summary
Main components of quantitative imaging biomarker technical assessment
- Bias/linearity analysis
- Repeatability analysis
- Reproducibility analysis
- Others: identification of significant factors/subgroups, ...
Challenging to maintain consistency across studies
- Phantom/clinical data
- Transformation of data
- Reference standard
- Is test-retest data available?
- Reproducibility conditions
Example 3
Quantitative Volume Change Analysis for Treatment Response Assessment of Head and Neck Neoplasms on CT Scans
Ref. Hadjiiski L, Mukherji SK, Gujar SK, Sahiner B, Ibrahim M, Street E, Moyer J, Worden FP, Chan HP. Treatment response assessment of head and neck cancers on CT using computerized volume analysis. American Journal of Neuroradiology (AJNR) 2010;31(9):1744-1751.
Lesion Segmentation
- Region of Interest
- Preprocessing
- Level Set
- Automatic Segmentation
Example: Tongue Base Tumor
Volume Segmentation
• Pre-to-Post-treatment change in segmented primary tumor volumes – 23 pairs
[Figure: post-treatment volume (cm3) versus pre-treatment volume (cm3), axes from 0 to 50; two panels: Auto and Manual]
Results - % Volume Change Comparison
• Automatic vs. Manual % volume change comparison – 23 tumor pairs
- Pearson’s correlation, Automatic vs. Manual: r = 0.89
[Figure: % volume change (manual) versus % volume change (automatic), axes from -20 to 100]
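The comparison above rests on two simple computations: the percent volume change for each tumor and Pearson's correlation between the automatic and manual values. The sketch below shows both. The sign convention (a reduction is positive) is assumed from the plot axes, and the function names and volumes are made up.

```python
import math

def percent_volume_change(pre, post):
    """Percent reduction from pre- to post-treatment volume (sign convention assumed)."""
    return 100.0 * (pre - post) / pre

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up (pre, post) volumes in cm3 for three hypothetical tumors
auto = [(20.0, 10.0), (30.0, 24.0), (10.0, 1.0)]
manual = [(22.0, 11.0), (28.0, 21.0), (12.0, 3.0)]

auto_pct = [percent_volume_change(a, b) for a, b in auto]      # [50.0, 20.0, 90.0]
manual_pct = [percent_volume_change(a, b) for a, b in manual]  # [50.0, 25.0, 75.0]
r = pearson_r(auto_pct, manual_pct)
print(auto_pct, manual_pct, round(r, 3))
```

Note that a high correlation shows the two methods track each other but does not by itself rule out a systematic offset between them.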
Results – Summary
• % Pre-to-Post-treatment change in volume estimates by 3 methods, relative to radiologist’s segmentation (primary tumors), Pearson’s correlation coeff.:
- Quantitative image analysis: 0.89
- WHO (longest diameter and perpendicular): 0.73
- RECIST (longest diameter): 0.58
Example: Tonsillar Tumor
[Figure: pre-treatment and post-treatment images]
Example: Tumor
[Figure: pre-treatment and post-treatment images]
Example 4
Computer-aided Analysis of Treatment Response of Bladder Cancers on CT Scans
Ref. Hadjiiski L, Weizer AZ, Alva A, Caoili EM, Cohan RH, Cha K, Chan HP. Bladder cancers on CT: preliminary study of treatment response assessment based on computerized volume analysis, WHO and RECIST Criteria. American Journal of Roentgenology 2015, 205(2) pp 348-352.
Bladder Lesion Segmentation
Auto-Initialized Cascaded Level Set (AI-CALS)
- Region of Interest
- Preprocessing
- Cascaded Level Set
- Automatic Segmentation
Bladder Tumor: Pre- & Post-Treatment
[Figure: pre-treatment and post-treatment images]
Feature Descriptors
Descriptors automatically extracted from the segmented lesions:
- 15 radiomic features (RF) based on pre- and post-treatment changes in:
  - volume (V)
  - 5 gray level descriptors (GL)
  - 9 shape descriptors (S)
- Selected features merged into a Combined Response Index (CRI)
Combined Response Index (CRI)
• AUC pT0 stage (complete response) vs. others – 35 primary site tumors
- Volume (3D): 0.68 ± 0.09
- Reference (3D): 0.66 ± 0.11
- CRI (Volume, Shape): 0.75 ± 0.10
- CRI (Volume, Shape, Gray Level): 0.76 ± 0.09
[Figure: ROC curves (true positive fraction versus false positive fraction) for V, Reference (3D), CRI (V, S), and CRI (V, S, GL)]
Example 5
Radiomic Biomarkers and Decision Support System for Multiple Myeloma
Ref. Zhou C, Chan HP, Dong Q, Couriel DR, Pawarode A, Hadjiiski LM, Wei J. Quantitative analysis of MR imaging to assess treatment response for patients with multiple myeloma by using dynamic intensity entropy transformation. 2015, Ahead of Print, 10.1148/radiol.2015142804
Multiple Myeloma (MM) – a cancer formed by malignant plasma cells in the bone marrow
MR radiomic biomarkers (+ clinical biomarkers)
- Staging
- Treatment response assessment
- Prognosis prediction
- Recurrence – early detection
Radiomic Biomarkers for MM Treatment Response
- T1W sagittal views
- VOI of vertebral bodies and intervertebral discs
- 3D Dynamic Intensity Energy Transformation (DIET)
- Quantitative Energy Enhancement Value: Response Index
[Figure: DIET response map]
BMT Treatment Response Assessment
p-qEEVvbr and m-DqEEVvbr
Responder
[Figure: T1-weighted MR and DIET response map, pre-bone marrow transplant (pre-BMT) and 55 days post-BMT]
Non-responder
[Figure: T1-weighted MR and DIET response map, pre-BMT and 56 days post-BMT]
Kaplan-Meier Survival Curve predicted by p-qEEV biomarker
[Figure: survival probability (0.2 to 1) versus time since BMT (0 to 35 months), for p-qEEV >= 10% and p-qEEV < 10%]
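For readers unfamiliar with how such curves are built, the sketch below implements the standard Kaplan-Meier product-limit estimate for one group. In practice the estimate would be computed separately for the p-qEEV >= 10% and p-qEEV < 10% groups. The function name and follow-up data are made up and do not come from the study.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # both events and censored patients leave the risk set
    return curve

# Made-up follow-up data in months (not from the exhibit)
months = [5, 8, 8, 12, 20]
observed = [1, 1, 0, 0, 1]
curve = kaplan_meier(months, observed)
print(curve)
```

Censored patients do not lower the curve themselves, but they shrink the risk set, so later events produce larger steps.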
Prediction of treatment response by qEEV-based response index
Computer-Aided Decision Support for Multiple Myeloma
Example 6
Measurement of Pleural Effusion in Thoracic CT Volumes
Ref. Yao J, Bliton J, Summers RM. Automatic segmentation and measurement of pleural effusions on CT. IEEE Transactions on Biomedical Engineering 2013, 60(7) pp 1834-1840.
Significance and Workflow
Pleural effusion: Size, location and temporal change can be significant for diagnosis and patient care
Very time consuming to manually measure in 3D chest CT
Segmentation and Surface Modeling
Segmentation before deformable surface modeling
Segmentation after deformable surface modeling
Segmented PE surface before deformable surface modeling
Segmented PE surface after deformable surface modeling
Automated vs. Manual Segmentations
Comparison to Manual Segmentation
Example 7
Radiomics-Based Assessment of Normal Lung Tissue Damage in Radiation Therapy
Ref. Cunliffe AR, et al.: Lung texture in serial thoracic CT scans: Correlation with radiologist-defined severity of acute changes following radiation therapy. Physics in Medicine and Biology 59: 5387–5398, 2014.
Ref. Cunliffe AR, et al.: Lung texture in serial thoracic computed tomography scans: Correlation of radiomics-based features with radiation therapy dose and radiation pneumonitis development. International Journal of Radiation Oncology • Biology • Physics 91: 1048–1056, 2015.
Purpose
Develop quantitative methods to measure dose-dependent normal lung tissue damage in patients who receive thoracic radiation therapy
Characterize CT scan appearance based on pixel values and spatial relationship between pixels
Use these quantitative techniques to distinguish between patients with and without radiation pneumonitis
Radiation-Induced Lung Damage
[Figure: CT images pre-RT and ~3 months post-RT]
Texture Analysis
[Figure: regions in the baseline and follow-up scans, identified using deformable registration]
                                    Baseline   Follow-up
Mean                                -785       -385
Median                              -820       -348
St. dev.                            148        234
IQR                                 90         296
Fractal dimension                   3.0        2.7
Entropy of ripple-filtered region   4.0        7.1
Results
Extent of texture change between scans increases with severity of RT-induced damage
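Several of the features in the table are first-order statistics of the pixel values in a region. The sketch below computes mean, median, standard deviation, IQR, and a plain histogram entropy for a list of pixel values. It does not implement the fractal dimension or the ripple filter used in the cited work. The bin width, function name, and pixel values are assumptions chosen only for illustration.

```python
import math
from collections import Counter
from statistics import mean, median, stdev, quantiles

def first_order_features(pixels, bin_width=25):
    """Mean, median, standard deviation, IQR and histogram entropy of pixel values."""
    q1, _, q3 = quantiles(pixels, n=4)
    counts = Counter(p // bin_width for p in pixels)  # histogram with fixed-width bins
    total = len(pixels)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return {
        "mean": mean(pixels),
        "median": median(pixels),
        "st_dev": stdev(pixels),
        "iqr": q3 - q1,
        "entropy": entropy,
    }

# Made-up pixel values for a hypothetical region before and after treatment
baseline = [-800, -820, -790, -810, -780, -830, -805, -795]
follow_up = [-500, -200, -650, -300, -100, -720, -400, -350]

base_feats = first_order_features(baseline)
follow_feats = first_order_features(follow_up)
print(base_feats["mean"], follow_feats["mean"])
```

The change in each feature between the matched baseline and follow-up regions, rather than its absolute value, is what the exhibit relates to damage severity.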
Texture Analysis
Image texture change related to dose through the dose map of the treatment planning scan
[Figure: treatment planning dose map and baseline scan, related by deformable registration]
Results
Extent of texture change between scans increases with increased dose
Patient Classification
Area under the ROC curve for the ability of texture features to differentiate patients with and without radiation pneumonitis
Mean AUC Across 20 Features
Low Dose   Medium Dose   High Dose   Fitted Slope
0.64       0.68          0.71        0.71
Summary
Identified a set of reliable texture features for use in lung texture analysis applications
Fully automated method for quantitative analysis of changes in lung parenchyma between serial CT scans
Demonstrated relationship between texture change and
- Radiation dose
- Severity of radiation therapy-induced damage
- Radiation pneumonitis status