22
4 Effects of image characteristics on performance of tumor delineation methods: a test-retest assessment Patsuree Cheebsumon, Floris H.P. van Velden, Maqsood Yaqub, Virginie Frings, Adrianus J. de Langen, Otto S. Hoekstra, Adriaan A. Lammertsma, and Ronald Boellaard Published in: Journal of Nuclear Medicine 52 (2011), pp. 1550–1558

E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

4Effects of image characteristics onperformance of tumor delineationmethods: a test-retest assessment

Patsuree Cheebsumon, Floris H.P. van Velden, Maqsood Yaqub, Virginie Frings, Adrianus J.

de Langen, Otto S. Hoekstra, Adriaan A. Lammertsma, and Ronald Boellaard

Published in:

Journal of Nuclear Medicine 52 (2011), pp. 1550–1558

Page 2: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

Chapter 4. Effects of image characteristics on performance of tumor delineation methods

Abstract

Positron emission tomography (PET) can be used to monitor response duringchemotherapy and assess biologic target volumes for radiotherapy. Previoussimulation studies have shown that the performance of various automatic orsemiautomatic tumor delineation methods depends on image characteristics.The purpose of this study was to assess test-retest variability of tumor delin-eation methods, with emphasis on the effects of several image characteristics(e.g., resolution and contrast). Methods: Baseline test-retest data from 19non-small cell lung cancer patients were obtained using [18F]-FDG (n=10) and3′-deoxy-3′-18F-fluorothymidine (18F-FLT) (n=9). Images were reconstructedwith varying spatial resolution and contrast. Six different types of tumor delin-eation methods, based on various thresholds or on a gradient, were applied to alldatasets. Test-retest variability of metabolic volume and standardized uptakevalue (SUV) was determined. Results: For both tracers, size of metabolic vol-ume, and test-retest variability of both metabolic volume and SUV were affectedby image characteristics and tumor delineation method used. The median vol-ume test-retest variability ranged from 8.3 to 23% and from 7.4 to 29% for 18F-FDG and 18F-FLT, respectively. For all image characteristics studied, largerdifferences (≤10-fold higher) were seen in test-retest variability of metabolicvolume than in SUV. Conclusion: Test-retest variability of both metabolicvolume and SUV varied with tumor delineation method, radiotracer and imagecharacteristics. The results indicate that a careful optimization of imaging anddelineation method parameters is needed when metabolic volume is used, forexample, as a response assessment parameter.

Reprinted by permission of the Society of Nuclear Medicine from:

Patsuree Cheebsumon, Floris H.P. van Velden, Maqsood Yaqub, Virginie Frings, Adrianus J.de Langen, Otto S. Hoekstra, Adriaan A. Lammertsma, and Ronald Boellaard. Effects ofImage Characteristics on Performance of Tumor Delineation Methods: A Test-RetestAssessment. J Nucl Med. 2011;52(10): 1550–1558.

50

Page 3: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

4.1. Introduction

4.1 Introduction

Positron emission tomography (PET) is a functional imaging modality thatprovides information about the metabolism, physiology, or molecular biologyof tumor tissue. There is growing evidence that PET can be used to monitorresponse during chemotherapy and to assess biologic target volumes for radio-therapy [7, 26, 31, 32]. For response monitoring studies, it is important to knowwhether a difference between tumor volumes in successive scans represents a trueresponse or methodology-related variability. In addition, for radiation treat-ment planning, accurate definition of tumor volume is important for focusingthe dose to the tumor and sparing surrounding normal tissue. Various PETtracers have been developed to visualize and quantify biologic characteristics oftumors, that is, metabolism, proliferation, hypoxia and apoptosis. The mostwidely used PET tracer, 18F-FDG, is increasingly applied to define gross tumorvolume in radiotherapy. Evidence is accumulating that 18F-FDG could improvethe accuracy with which tumor boundaries are defined [7,31,32]. 18F-FDG up-take reflects glucose metabolism, and tumors can be identified on the basis oftheir increased rate of glycolysis. However, increased glucose metabolism is notspecific to tumors, and increased 18F-FDG uptake is also seen in, for example,inflammatory tissue [14].

Proliferation of tumor cells is directly related to DNA synthesis, which canbe measured using radiolabeled thymidine or thymidine derivatives. The 18F-labeled thymidine analog 3′-deoxy-3′-18F-fluorothymidine (18F-FLT) has showna high correlation with thymidine kinase-1 and tissue markers of proliferation,that is, proliferating cell nuclear antigen (Ki-67), in pulmonary nodules [16].Moreover, 18F-FLT showed high sensitivity and specificity, comparable with18F-FDG [65]. Therefore, 18F-FLT is increasingly being used as a specific tracerfor noninvasive assessment of tumor cell proliferation.

In this paper, we will use the term metabolic volume to indicate tumorvolumes that are derived directly from PET. This term may be justified, as 18F-FLT and 18F-FDG are trapped in tissue by metabolic (kinase) activity. However,for volume assessments with other tracers, that is, those that measure perfusionor bind to receptors, the term functional volume may be more appropriate.

Various techniques for determining the boundaries of the gross tumor volumebased on PET images have been reported [7, 29–32], ranging from visual inter-pretation to automatic or semiautomatic methods. In the simplest case (visual),tumor boundaries are outlined manually by a nuclear medicine physician, radi-ologist, or radiation oncologist. Manual outlining may lead to a large variationin gross tumor volume delineation, as boundary definition depends on both theexperience of the physician and the contouring protocol used [10]. Automatic orsemiautomatic delineation methods, methods that automatically delineate a tu-mor after user input, have been proposed to reduce this variability. So far, to ourknowledge, only 2 studies have reported the test-retest variability of metabolicvolumes [66, 67]. However, in the study of Frings et al. [66], metabolic volumetest-retest variability was evaluated for a few percentage threshold-based auto-

51

Page 4: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

Chapter 4. Effects of image characteristics on performance of tumor delineation methods

mated tumor delineation methods, and in both studies metabolic volume test-retest variability was assessed using constant imaging parameters only. Thereare, however, many factors that could affect the accuracy of PET-based auto-matic or semiautomatic delineation methods, that is, image resolution, recon-struction settings, image noise, and tumor characteristics [7, 31, 63]. Assessingthe effects of these different image characteristics on metabolic volume test-retest variability is of the utmost importance to understand the need to optimizeimage quality [19]. Moreover, there are several types of PET-based automatedtumor delineation methods for which test-retest performance may or may notbe sensitive to the image characteristics.

The aim of this study was to further evaluate both the test-retest variabilityand differences in metabolic volumes derived from PET studies using varioustypes of automatic or semiautomatic delineation methods, with emphasis on theeffects of image characteristics (i.e., resolution and contrast) and for 2 differenttracers.

4.2 Materials and methods

4.2.1 Patients and Radiotracers

Retrospective data from patients with stage IIIB or IV non-small cell lung can-cer for 2 radioactive PET tracers were used. All patients gave written informedconsent, and both studies were approved by the Medical Ethics Review Com-mittee of the VU University Medical Center.

Ten patients (3 women and 7 men; mean age ± SD, 51 ± 5 y; range, 45-63 y;mean weight, 76 ± 10 kg; range, 56-94 kg) were included in a dynamic baseline18F-FDG study. Blood glucose levels were obtained for each patient and werewithin the reference range (mean, 5.5 ± 0.6 mmol⋅L−1; range, 4.4-7.0 mmol⋅L−1).All patients fasted for at least 6 h before scanning. In all patients, 2 dynamic18F-FDG studies were acquired on consecutive days.

Nine patients (2 women and 7 men; mean age, 66 ± 11 y; range, 45-78 y;mean weight, 72 ± 8 kg; range, 61-87 kg) were included in a dynamic baseline18F-FLT study. All patients were scanned twice within an interval of 1 wk.

4.2.2 PET Protocol

Patients were prepared in accordance with recently published guidelines forquantitative PET studies [23,30]. All patients were scanned in the supine posi-tion and received an intravenous catheter for tracer administration. All scans,performed using an ECAT EXACT HR+ scanner (Siemens/CTI) [54], startedwith a 10-min transmission scan. Afterward, a tracer bolus was administratedintravenously (18F-FDG: 388 ± 71 MBq; 18F-FLT: 350 ± 47 MBq) while dynamicemission scanning began in 2-dimensional acquisition mode. Each dynamic scanconsisted of 40 frames with the following lengths: 1 × 30, 6 × 5, 6 × 10, 3 × 20,5 × 30, 5 × 60, 8 × 150, and 6 × 300 s.

52

Page 5: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

4.2. Materials and methods

Both the last 3 frames (45-60 min after injection) and the last 6 frames (30-60 min after injection) were summed to obtain various image contrasts, and theresulting sinograms were reconstructed using normalization and attenuation-weighted ordered-subsets expectation maximization with 2 iterations and 16subsets, followed by postsmoothing using a Hanning filter at 0.5 of the Nyquistfrequency [6]. An image matrix size of 256 × 256 × 63 was used, corresponding toa pixel size of 2.57 × 2.57 × 2.43 mm. Additional smoothing was applied to theimages using various gaussian kernels, thereby reducing both image resolutionand noise. The kernels used resulted in final spatial resolutions of 6.5, 8.3, and10.2 mm in full width at half maximum (FWHM). Using each combination ofimage contrast and noise (i.e., sum of last 3 or 6 frames), spatial resolution(i.e., 6.5, 8.3, and 10.2 mm FWHM) and tracer (i.e., 18F-FDG and 18F-FLT),test-retest variability of both metabolic volume and corresponding standardizeduptake value (SUV) was determined for all automatic or semiautomatic tumordelineation methods.

4.2.3 Data Analysis

Test-retest variability of both observed metabolic volumes and volumetricaverage SUVs was assessed for the following 6 different types of automatic orsemiautomatic tumor delineation methods:

1. Fixed threshold range of 50% and 70% of maximum voxel value withintumor (VOI50, VOI70). This method applies a threshold based on thepercentage of the maximum voxel intensity within the tumor [30]. Next,this threshold is used to delineate the tumor.

2. Adaptive threshold range of 41%-70% of maximum voxel value withintumor (VOIA41, VOIA50, VOIA70). This method is similar to the fixedthreshold method, except that it adapts the threshold relative to the localaverage background, thereby correcting for the contrast between tumorand local background [30].

3. Contrast-oriented method (VOISchaefer). This method uses a correctionby measuring the mean of 70% maximal SUV and background activity forvarious sphere sizes. Regression coefficients are calculated, which representthe relationship between optimal threshold and image contrast for varioussphere sizes [31]. This threshold equation is given by:

Thresholdoptimal = A ×meanSUV70% +B × background�� ��4.1

where A and B were fitted using phantom studies [31]. In general, differentvalues are applied for sphere diameters smaller and larger than 3 cm. Inour paper, we recalibrated this method; that is, we determined the A andB values that are specific for the PET system and image characteristicsused. Ideally, the diameter could be derived from CT images. However, asno CT images were available for the studies used, we obtained the diameter

53

Page 6: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

Chapter 4. Effects of image characteristics on performance of tumor delineation methods

from 2 different delineation methods, VOIA41 and VOIA50 (multiplied by aconstant factor), and show them as VOISchaefer−A41 and VOISchaefer−A50,respectively.

4. Background-subtracted relative-threshold level (RTL) method (VOIRTL).This method is an iterative method based on a convolution of the point-spread function that takes into account the differences between varioussphere sizes and the scanner resolution [32].

5. Gradient-based watershed segmentation method (GradWT ). This methoduses 2 steps before calculating the volume of interest. First, this methodcalculates a gradient image on which a seed is placed in the tumor and an-other in the background. Next, a watershed algorithm is used to grow theseeds in the gradient basins, thereby creating boundaries on the gradientedges. In our presentation, the watershed continues to grow the gradientbasins until all voxels are classified as either tumor or nontumor (back-ground). The voxel is assigned to tumor if 2 watersheds are competing forthe same voxel.

6. Absolute SUV (SUV2.5). Normalised (SUV) voxel intensities at a chosenabsolute threshold are used to delineate tumor. An SUV of 2.5 was used, asit might properly differentiate between benign and malignant lesions [29].

For all delineation methods, the maximum voxel value was obtained byapplying a cross-shaped pattern that could be less sensitive to noise. Thismethod searches for the region with the (local) average maximum intensity,based on the average of 7 neighbouring voxels, which was then used as maxi-mum or peak value.

The volume measured by VOIA41 using both sum of last 3 frames and 6.5mm FWHM was used as the defined reference standard. The volumes obtainedby all tumor delineation methods using various image characteristics were com-pared with this defined reference standard. To assess accuracy, the mean ratio(of all methods compared with the reference dataset) and precision, that is, SD,for each tumor delineation method were calculated across all studies for a giventracer. Percentage test-retest variability was defined as:

∣Xtest −Xretest

Xmean of test and retest∣ ×100%

�� ��4.2

where X is either VOI size or SUV. For test-retest variability, we calculated me-dian, first quartile, third quartile, minimum and maximum values, and coefficientof determination (R2) between test and retest studies. All automated methodswere supervised to identify outliers. Outliers were removed from all analysesand were defined as either a small tumor (i.e., a node) that visually showed anunrealistically large measured metabolic tumor volume or a large tumor (>100mL) that had test-retest variability larger than 100% due to a clearly visuallyunderestimated metabolic volume in either the test or the retest baseline study.

54

Page 7: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

4.3. Results

Figure 4.1. Mean ratio of tumor volume obtained with various tumor delineation methodsagainst defined reference standard (sum of last 3 frames and 6.5 mm FWHM) as functionof image resolution for 18F-FDG (A) and 18F-FLT (B). All bars cut off at 4 (indicated byabsence of SD bars) were higher than 20. Error bars represent SD

A 2-tailed paired Wilcoxon signed-rank test was used to indicate a statisti-cally significant difference between volume, SUV, and test-retest variability ofvolume and SUV obtained from images with various image characteristics andthose obtained from the defined reference standard. P values of less than 0.05were considered significantly different, and P values of between 0.1 and 0.05were considered to indicate a trend.

4.3 Results

4.3.1 Precision of Tumor Delineation Methods

Table 4.1 shows the number of outliers and detectable lesions for all tumor de-lineation methods in both test and retest studies. For 18F-FDG, identificationof several lesions was independent of contrast and resolution. Most methods didnot show a large difference (>3) in the number of outliers when image charac-teristics were varied, except for VOI50, VOIA41, both variants of VOISchaefer,and SUV2.5, which showed up to a 23% increase of the number of identifiedoutliers. Similarly, trends were observed for 18F-FLT. For this tracer, however,the number of lesions that could be detected depended moderately on imageresolution.

4.3.2 Accuracy of Tumor Delineation Methods

Figure 4.1 shows the effects of spatial resolution on the change in metabolicvolume for various tumor delineation methods and for both 18F-FDG and 18F-FLT. In general, there was variability (≤94%) in measured tumor volume when

55

Page 8: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

Chapter 4. Effects of image characteristics on performance of tumor delineation methods

Table

4.1.

Nu

mber

of

ou

tliersw

hen

determ

inin

gtu

mo

rvo

lum

efo

ra

llsca

ns

(testa

nd

retest)fo

rd

ifferen

tim

age

cha

racteristics

an

dra

dio

tracers

18F

-FD

G18F

-FLT

Sum

of

last

Sum

of

last

Sum

of

last

Sum

of

last

6fra

mes

3fra

mes

6fra

mes

3fra

mes

Tum

or

delin

eatio

nm

eth

od

6.5

*8.3

*10.2

*6.5

8.3

10.2

6.5

8.3

10.2

6.5

8.3

10.2

VO

I50

12

14

15

813

14

74

57

54

VO

I70

02

30

02

00

01

00

VO

IA

41

610

12

66

11

33

34

54

VO

IA

50

23

30

02

10

02

00

VO

IA

70

00

00

00

10

01

00

VO

ISchaefer−A

41

63

66

34

21

34

23

VO

ISchaefer−A

50

55

44

34

21

14

22

VO

IR

TL

03

20

01

00

01

00

Gra

dW

T0

02

00

20

01

01

1

SU

V2.5

12

11

11

53

63

01

21

0

Tota

ldete

cta

ble

lesio

ns

60

60

60

60

60

60

35

33

32

36

34

31

*im

age

reso

lutio

n(m

m)

56

Page 9: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

4.3. Results

Figure 4.2. Mean ratio of tumor volume obtained with various tumor delineation methodsagainst defined reference standard (sum of last 3 frames and 6.5 mm FWHM) as functionof image contrasts for 18F-FDG (A) and 18F-FLT (B). All bars cut off at 4 (indicated byabsence of SD bars) were higher than 20. Error bars represent SD

image resolution was changed. For almost all methods, except for VOIA70

and SUV2.5, the mean ratio obtained with low resolution (10.2 mm FWHM)was higher than that obtained with high resolution (6.5 mm FWHM). Com-pared with VOIA41 at 6.5 mm FWHM data, VOI50, VOISchaefer−A41, andGradWT provided similar volumes at high resolution. However, only GradWT

provided volumes independent of resolution. In contrast, VOI70, VOIA50, andVOIA70 gave lower volumes (>26%). Similar trends were observed between the2 tracers. However, for 18F-FLT, only a moderate overestimation of metabolicvolume (>15%) was observed for SUV2.5, compared with the reference value(Fig. 4.1B).

Figure 4.2 shows the effects of image contrast on the change in metabolicvolume for various tumor delineation methods and for both 18F-FDG and 18F-FLT. In general, the trends observed were similar to those when image resolutionwas changed; that is, results for lower contrast (6 frames or 30-60 min afterinjection) corresponded to those for lower resolution (10.2 mm FWHM).

4.3.3 Test-Retest Variability of VOI Size

Slope and R2 (intercept set to 0) between measured tumor volumes of test andretest studies obtained using different tumor delineation methods and tracersare shown in Table 4.2 for the defined reference standard. VOIA41, VOIA50,both variants of VOISchaefer, VOIRTL, and SUV2.5 showed good correlationbetween test and retest scans (R2 >0.90, slopes between 0.76 and 1.06) for bothtracers. For 18F-FDG, VOISchaefer−A41 showed the best correlation (R2, 1.00;slope, 1.01). Good correlation with respect to volume size (i.e., R2 >0.79, slopesbetween 0.71 and 1.11) was found for all tumor delineation methods, except for

57

Page 10: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

Chapter 4. Effects of image characteristics on performance of tumor delineation methods

Table 4.2. Slope (with intercept fixed to 0) and coefficient of determination between tumorvolume size measured for test and retest studies

Tumor delineation method 18F-FDG 18F-FLT

R2 Slope R2 Slope

VOI50 0.79 0.71 0.93 0.83

VOI70 0.89 0.99 0.52 1.52

VOI70 (reduced dataset) — — 0.81* 1.21*

VOIA41 0.91 0.90 0.91 0.82

VOIA50 0.94 0.93 0.91 1.06

VOIA70 0.88 1.11 0.72 0.91

VOISchaefer−A41 1.00 1.01 0.96 0.79

VOISchaefer−A50 0.92 0.83 0.95 0.76

VOIRTL 0.95 0.93 0.90 1.04

GradWT 0.58 1.34 0.41 1.02

GradWT (reduced dataset) 0.86� 0.94� 0.70� 1.10�

SUV2.5 0.97 1.11 0.99 0.86

*After removing 2 outliers

�After removing 5 outliers

�After removing 3 outliers

GradWT , which showed a correlation of only 0.58. However, 5 lesions were clearoutliers for this method. These outliers were found in cases of heterogeneouslesions or a low tumor-to-background ratio. After these outliers were removed,a good correlation (R2, 0.86; slope, 0.94) was observed for this method as well.A similar result was observed in the case of 18F-FLT, for which the correlationfor GradWT improved from 0.41 to 0.70 when 3 outliers were excluded. Inaddition, VOI70 showed 2 outliers that provided a much smaller volume inthe test scan than in the retest scan. After these outliers were removed, thecorrelation improved from 0.52 (slope, 1.52) to 0.81 (slope, 1.21). In all cases,these outliers were found for tumors with very heterogeneous uptake or lesionsthat were close to high-uptake structures. For 18F-FLT, VOIA50 and VOIRTL

showed the best correlation (R2 >0.90; slope, ∼1.05).

Figure 4.3 shows the test-retest variability of metabolic volume as a functionof image resolution for high image contrast or noise (45-60 min after injection).Overall, volume test-retest variability depended mainly on image resolution forall tumor delineation methods and for both tracers. Median test-retest vari-ability of tumor volume ranged from 8.3% to 23% and from 7.4% to 29% for18F-FDG and 18F-FLT, respectively. For 18F-FDG (Fig. 4.3A), fixed, adap-tive percentage threshold, both variants of VOISchaefer and VOIRTL methodsshowed deteriorating median test-retest variability (≤11% difference) for lowerresolution. Both variants of VOISchaefer showed good performance, having alow median volume test-retest value (<13%) and a low number of changes inmedian test-retest values (<3.7% difference) when resolutions were varied. In

58

Page 11: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

4.3. Results

Figure 4.3. Box-and-whisker plots of percentage test-retest (TRT) variability in tumorvolume obtained using various tumor delineation methods at high image contrast and varyingimage resolutions for 18F-FDG (A) and 18F -FLT (B). Median is horizontal line betweenlower (first) and upper (third) quartiles. Upper whisker represents upper quartile to maximumvalue, corrected for outliers (not exceeding 1.5 times interquartile range)

addition, VOIA41 and VOIA50 showed relatively low median volume test-retestvalues (14% and 17%, respectively) and a low number of changes in mediantest-retest values (<6.0% and 1.4% difference, respectively) when resolutionswere varied. Interestingly, for 18F-FLT (Fig. 4.3B), most methods showed anopposite trend in median test-retest variability when resolution was changed,with better performance at lower resolution. VOI70 and SUV2.5 were relativelyindependent of changes in resolution (<0.5% difference), having a low mediantest-retest variability (<15%). All other delineation methods gave a moderatevariation in test-retest variability (<9.5% difference) and reasonable mediantest-retest values (<29%) when resolutions were changed.

Figure 4.4 illustrates the effects of image contrast on volume test-retestvariability for a fixed resolution of 6.5 mm FWHM. Figure 4.4A shows that,for 18F-FDG, most methods were nearly independent of a change in contrast(<6.7% difference), except for GradWT (>8.3% difference). In contrast, for18F-FLT (Fig. 4.4B), reducing the contrast showed—likely because of an im-provement in noise levels by summing over more frames—an improvement inmedian test-retest variability for all methods (<12% lower difference), exceptfor GradWT (2.6% higher difference). SUV2.5 was the method that showed thelowest dependence on contrast (<1% difference).

4.3.4 Test-Retest Variability of SUV

Figure 4.5 illustrates the test-retest variability of SUV at high image contrastand for various image resolutions. Overall, changes in test-retest variability ofSUV were much lower than those seen for VOI size (median test-retest variabilityof SUV ranged from 4.5% to 11% and from 1.8% to 8.5% for 18F-FDG and18F-FLT, respectively). For all tumor delineation methods and both tracers,

59

Page 12: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

Chapter 4. Effects of image characteristics on performance of tumor delineation methods

Figure 4.4. Box-and-whisker plots of percentage test-retest (TRT) variability of tumorvolume obtained by various tumor delineation methods when using different image contrastsfor 18F-FDG (A) and 18F-FLT (B). Median is horizontal line between lower (first) andupper (third) quartiles. Upper whisker represents upper quartile to maximum value, correctedfor outliers (not exceeding 1.5 times interquartile range)

Figure 4.5. Box plots of percentage test-retest (TRT) variability of SUV obtained by varioustumor delineation methods at high image contrast when image resolutions were varied for 18F-FDG (A) and 18F-FLT (B). Median is horizontal line between lower (first) and upper (third)quartiles. Upper whisker represents upper quartile to maximum value, corrected for outliers(not exceeding 1.5 times interquartile range). Note that scale differs from Figure 4.3

the effect of image resolution on test-retest variability of SUV was small (<4%difference). Figure 4.6 illustrates the effects of image contrast on test-retestvariability of SUV for a fixed resolution of 6.5 mm FWHM. Trends were similarto those seen for changes in image resolution.

60

Page 13: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

4.4. Discussion

Figure 4.6. Box plots of percentage test-retest (TRT) variability of SUV obtained by vari-ous tumor delineation methods when different image contrasts were used for 18F-FDG (A)and 18F-FLT (B). Median is horizontal line between lower (first) and upper (third) quar-tiles. Upper whisker represents upper quartile to maximum value, corrected for outliers (notexceeding 1.5 times interquartile range). Note that scale differs from Figure 4.4

4.3.5 Statistics

Tables 4.3 and 4.4 (mean ± SD provided in Tables 4.5 – 4.8) indicate that formost tumor delineation methods a change in resolution has a more significantimpact on SUV and volume than their corresponding test-retest variability. Thesame trend was observed for a change in contrast, except for volumes obtainedon 18F-FLT images, where for most tumor delineation methods a change incontrast has a more significant impact on volume test-retest variability than onvolume itself.

4.4 Discussion

The aim of this study was to further investigate metabolic volume test-retestvariability beyond those findings published recently [66,67], not only by includ-ing various types of tumor delineation methods for 2 different tracers but alsoby studying the impact of image characteristics.

In theory, estimating metabolic tumor volume accuracy and reproducibilityis important for a curative outcome of radiation treatment planning. Tumordelineation methods may show metabolic tumor volumes that are too smallfor radiation treatment planning purposes, leading to local recurrences. Forresponse monitoring, however, consistent underestimations of metabolic tumorvolumes are less important, as only relative changes in tumor volume duringtherapy may be relevant. In general, all tumor delineation methods showedmuch larger variations in measured metabolic tumor volume (<29%) than inSUV (<11%), when image characteristics and radiotracers were varied. Thisfinding corresponds with the results of a previous report [30] showing that, in

61

Page 14: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

Chapter 4. Effects of image characteristics on performance of tumor delineation methods

Table

4.3.

Pva

lues

for

actu

al

volu

me,

SU

Va

nd

test-retestva

riability

of

volu

me

an

dS

UV

obta

ined

from

18F

-FD

Gim

ages

with

vario

us

ima

gech

ara

cteristicsco

mpa

redto

tho

seo

btain

edfro

mth

ed

efin

edreferen

cesta

nd

ard

(sum

of

last

3fra

mes

an

d6

.5m

mF

WH

M)

Volu

me

Volu

me

test-re

test

varia

bility

SU

VSU

Vte

st-rete

stvaria

bility

Tum

or

delin

eatio

nR

eso

lutio

nC

ontra

stR

eso

lutio

nC

ontra

stR

eso

lutio

nC

ontra

stR

eso

lutio

nC

ontra

st

meth

od

(FW

HM

,in

mm

)(fra

mes)

(FW

HM

,in

mm

)(fra

mes)

(FW

HM

,in

mm

)(fra

mes)

(FW

HM

,in

mm

)(fra

mes)

8.3

10.2

68.3

10.2

68.3

10.2

68.3

10.2

6

VO

I50

<0.0

01

<0.0

01

<0.0

01

0.2

22

0.4

95

0.5

62

<0.0

01

<0.0

01

<0.0

01

0.3

05

0.3

93

0.8

38

VO

I70

<0.0

01

<0.0

01

0.0

01

0.4

52

0.9

32

0.8

08

<0.0

01

<0.0

01

<0.0

01

0.2

21

0.5

94

0.4

90

VO

IA

41

<0.0

01

<0.0

01

<0.0

01

0.1

26

0.1

79

0.1

01

<0.0

01

<0.0

01

<0.0

01

0.0

30

0.9

46

0.9

66

VO

IA

50

<0.0

01

<0.0

01

0.0

23

0.6

26

0.8

15

0.5

79

<0.0

01

<0.0

01

<0.0

01

0.2

53

0.5

65

0.5

50

VO

IA

70

0.2

61

0.4

38

0.2

21

0.9

83

0.9

71

0.2

29

<0.0

01

<0.0

01

<0.0

01

0.1

35

0.5

00

0.6

55

VO

ISchaefer−A

41

0.0

88

<0.0

01

<0.0

01

0.2

20

0.0

98

0.0

42

<0.0

01

<0.0

01

<0.0

01

0.1

20

0.8

23

0.9

88

VO

ISchaefer−A

50

0.0

12

<0.0

01

<0.0

01

0.2

37

0.0

45

0.1

14

<0.0

01

<0.0

01

<0.0

01

0.0

46

0.8

33

0.5

84

VO

IR

TL

<0.0

01

<0.0

01

<0.0

01

0.5

03

0.1

76

1.0

00

<0.0

01

<0.0

01

<0.0

01

0.3

39

0.0

72

0.8

55

Gra

dW

T0.1

10

0.0

02

0.5

12

0.1

00

0.4

17

0.2

29

<0.0

01

<0.0

01

<0.0

01

0.0

26

0.9

15

0.9

19

SU

V2.5

0.2

07

0.1

52

<0.0

01

0.4

83

0.5

65

0.3

93

<0.0

01

<0.0

01

<0.0

01

0.1

81

0.4

39

0.0

55

62

Page 15: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

4.4. Discussion

Table

4.4.

Pva

lues

for

act

ua

lvo

lum

e,S

UV

an

dte

st-r

etes

tva

ria

bili

tyo

fvo

lum

ea

nd

SU

Vo

bta

ined

fro

m18F

-FL

Tim

age

sw

ith

vari

ou

sim

age

cha

ract

eris

tics

com

pare

dto

tho

seo

bta

ined

fro

mth

ed

efin

edre

fere

nce

sta

nd

ard

(su

mo

fla

st3

fra

mes

an

d6

.5m

mF

WH

M)

Volu

me

Volu

me

test

-rete

stvari

abilit

ySU

VSU

Vte

st-r

ete

stvari

abilit

y

Tum

or

delineati

on

Reso

luti

on

Contr

ast

Reso

luti

on

Contr

ast

Reso

luti

on

Contr

ast

Reso

luti

on

Contr

ast

meth

od

(FW

HM

,in

mm

)(f

ram

es)

(FW

HM

,in

mm

)(f

ram

es)

(FW

HM

,in

mm

)(f

ram

es)

(FW

HM

,in

mm

)(f

ram

es)

8.3

10.2

68.3

10.2

68.3

10.2

68.3

10.2

6

VO

I50

<0.0

01

0.0

01

0.2

72

0.0

24

0.1

05

0.0

67

<0.0

01

<0.0

01

0.0

43

0.0

83

0.7

70

0.0

43

VO

I70

<0.0

01

<0.0

01

0.0

12

0.2

77

0.5

61

0.1

17

<0.0

01

<0.0

01

<0.0

01

0.0

64

0.8

04

<0.0

01

VO

IA41

<0.0

01

<0.0

01

0.3

94

0.0

27

0.0

54

0.0

92

<0.0

01

<0.0

01

0.1

43

0.1

51

0.9

66

0.1

43

VO

IA50

<0.0

01

<0.0

01

0.0

73

0.3

91

0.2

17

0.0

09

<0.0

01

<0.0

01

0.0

03

0.4

63

0.2

41

0.0

03

VO

IA70

0.4

90

0.2

60

0.5

07

0.3

89

1.0

00

0.9

00

<0.0

01

<0.0

01

0.0

03

0.3

03

0.5

61

0.0

03

VO

ISchaefer−A

41

0.1

49

0.0

02

0.1

85

0.7

91

0.2

40

0.0

52

<0.0

01

<0.0

01

0.2

64

0.6

77

0.9

66

0.2

64

VO

ISchaefer−A

50

0.0

83

0.0

02

0.1

85

0.9

10

0.1

75

0.0

21

<0.0

01

<0.0

01

0.2

64

0.6

22

0.7

00

0.2

64

VO

IRT

L0.0

01

0.0

01

0.0

08

0.5

42

0.9

52

0.0

04

<0.0

01

<0.0

01

0.0

02

0.1

35

0.2

17

0.0

02

Gra

dW

T0.8

07

0.4

26

0.7

88

0.8

39

0.6

85

0.3

30

0.0

01

<0.0

01

0.0

96

0.7

87

0.1

68

0.0

96

SU

V2.5

<0.0

01

<0.0

01

<0.0

01

0.7

61

0.3

40

0.4

97

<0.0

01

<0.0

01

<0.0

01

0.2

41

0.4

14

<0.0

01

63

Page 16: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

Chapter 4. Effects of image characteristics on performance of tumor delineation methodsTable

4.5.

Mea

n(S

D)

for

actu

al

volu

me

an

dtest-retest

varia

bilityo

fvo

lum

eo

btain

edfro

m18F

-FD

Gim

ages

with

vario

us

ima

gech

ara

cteristics

Volu

me

Volu

me

test-re

test

varia

bility

Tum

or

delin

eatio

nR

eso

lutio

nC

ontra

stR

eso

lutio

nC

ontra

st

meth

od

(FW

HM

,in

mm

)(fra

mes)

(FW

HM

,in

mm

)(fra

mes)

6.5

8.3

10.2

66.5

8.3

10.2

6

VO

I50

26.0

(48.0

)34.0

(60.1

)67.8

(138.1

)51.0

(94.1

)23.6

(34.9

)21.5

(22.0

)25.6

(22.9

)15.8

(16.7

)

VO

I70

5.2

(9.1

)6.2

(11.0

)7.0

(12.3

)6.6

(11.7

)31.4

(40.0

)29.0

(35.4

)26.6

(26.3

)34.9

(47.8

)

VO

IA

41

15.4

(25.3

)20.3

(29.6

)55.7

(94.9

)38.0

(65.2

)16.0

(22.2

)16.1

(15.7

)23.3

(23.3

)19.3

(21.2

)

VO

IA

50

9.9

(18.2

)11.0

(19.3

)12.3

(20.7

)10.7

(19.3

)29.8

(39.3

)28.2

(32.8

)25.5

(24.6

)33.6

(40.8

)

VO

IA

70

2.1

(4.2

)2.8

(5.7

)6.0

(16.0

)2.2

(4.6

)32.7

(32.1

)38.0

(46.4

)33.1

(40.9

)27.2

(35.5

)

VO

ISchaefer−A

41

14.7

(25.7

)16.2

(26.4

)46.5

(81.2

)29.9

(50.7

)13.8

(12.3

)19.8

(23.1

)23.4

(24.4

)23.2

(24.7

)

VO

ISchaefer−A

50

19.5

(32.0

)16.2

(26.4

)35.2

(64.8

)34.1

(62.5

)17.1

(19.6

)19.8

(20.9

)24.9

(25.4

)21.0

(22.7

)

VO

IR

TL

11.5

(19.3

)13.5

(21.4

)21.2

(38.1

)13.0

(21.0

)27.1

(31.3

)33.8

(37.6

)37.1

(38.7

)29.7

(35.0

)

Gra

dW

T13.7

(23.3

)16.2

(29.8

)16.8

(30.6

)15.4

(29.7

)30.4

(29.9

)18.0

(11.1

)21.1

(16.6

)38.7

(33.7

)

SU

V2.5

308.6

(410.0

)276.9

(411.4

)290.9

(435.2

)392.4

(429.7

)31.5

(35.5

)32.2

(36.9

)23.9

(19.1

)24.1

(21.6

)

Table

4.6.

Mea

n(S

D)

for

actu

al

SU

Va

nd

test-retestva

riability

of

SU

Vo

btain

edfro

m18F

-FD

Gim

ages

with

vario

us

ima

gech

ara

cteristics

SU

VSU

Vte

st-rete

stvaria

bility

Tum

or

delin

eatio

nR

eso

lutio

nC

ontra

stR

eso

lutio

nC

ontra

st

meth

od

(FW

HM

,in

mm

)(fra

mes)

(FW

HM

,in

mm

)(fra

mes)

6.5

8.3

10.2

66.5

8.3

10.2

6

VO

I50

5.7

(2.4

)5.3

(2.3

)5.3

(2.3

)5.5

(2.0

)8.6

(4.6

)8.5

(10.4

)10.0

(7.6

)9.6

(6.9

)

VO

I70

6.5

(3.0

)6.2

(2.8

)5.7

(2.8

)6.0

(2.6

)9.1

(9.3

)8.5

(9.2

)10.2

(10.0

)8.0

(5.3

)

VO

IA

41

5.3

(2.2

)5.1

(2.1

)5.1

(2.1

)5.0

(2.0

)7.5

(4.3

)7.5

(8.5

)11.3

(10.0

)8.6

(6.0

)

VO

IA

50

6.0

(2.6

)5.7

(2.5

)5.3

(2.4

)5.6

(2.3

)8.2

(7.4

)7.1

(8.4

)9.4

(9.3

)7.8

(6.1

)

VO

IA

70

7.3

(3.0

)6.9

(2.9

)6.7

(2.8

)6.7

(2.6

)8.8

(8.4

)7.2

(6.7

)9.3

(7.7

)7.7

(4.7

)

VO

ISchaefer−A

41

5.3

(2.0

)5.0

(2.1

)4.9

(2.0

)5.0

(1.9

)6.9

(4.0

)7.5

(9.2

)10.5

(10.5

)7.3

(4.9

)

VO

ISchaefer−A

50

5.3

(2.1

)5.0

(2.0

)4.9

(2.0

)5.0

(1.9

)7.2

(4.2

)7.3

(9.2

)9.7

(10.4

)8.1

(5.6

)

VO

IR

TL

5.8

(2.4

)5.5

(2.3

)5.0

(2.1

)5.3

(2.1

)7.7

(7.7

)9.2

(14.6

)12.4

(16.1

)9.0

(8.2

)

Gra

dW

T4.8

(2.0

)4.6

(2.0

)4.5

(2.0

)4.6

(1.9

)8.2

(5.3

)6.2

(4.9

)9.0

(11.2

)9.2

(7.4

)

SU

V2.5

3.9

(0.7

)3.8

(0.7

)3.7

(0.7

)3.8

(0.6

)6.9

(6.0

)6.1

(5.5

)6.0

(4.7

)5.3

(6.3

)

64

Page 17: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

4.4. Discussion

Table

4.7.

Mea

n(S

D)

for

act

ua

lvo

lum

ea

nd

test

-ret

est

vari

abi

lity

of

volu

me

obt

ain

edfr

om

18F

-FL

Tim

age

sw

ith

vari

ou

sim

age

cha

ract

eris

tics

Volu

me

Volu

me

test

-rete

stvari

abilit

y

Tum

or

delineati

on

Reso

luti

on

Contr

ast

Reso

luti

on

Contr

ast

meth

od

(FW

HM

,in

mm

)(f

ram

es)

(FW

HM

,in

mm

)(f

ram

es)

6.5

8.3

10.2

66.5

8.3

10.2

6

VO

I50

54.8

(111.3

)134.3

(268.3

)149.0

(246.3

)107.5

(182.8

)26.3

(22.2

)21.4

(18.1

)26.7

(26.8

)22.9

(28.5

)

VO

I70

5.5

(6.7

)6.5

(7.5

)7.1

(7.9

)5.8

(6.5

)30.8

(40.1

)32.2

(36.9

)20.5

(19.8

)26.0

(46.1

)

VO

IA41

53.1

(107.3

)26.9

(36.0

)68.0

(149.7

)58.1

(106.5

)22.5

(23.3

)17.7

(17.1

)14.4

(11.8

)13.0

(18.0

)

VO

IA50

9.1

(12.0

)13.1

(17.7

)12.4

(13.5

)25.4

(64.9

)20.4

(18.2

)28.9

(32.8

)16.7

(12.2

)15.2

(15.1

)

VO

IA70

1.3

(1.9

)1.3

(1.7

)1.7

(2.5

)1.3

(1.6

)29.9

(31.7

)44.1

(41.4

)34.3

(42.2

)30.5

(29.6

)

VO

ISchaefer−A

41

53.4

(108.8

)21.9

(27.7

)26.1

(27.4

)54.3

(105.7

)19.7

(18.0

)19.6

(17.4

)12.6

(10.2

)8.6

(8.8

)

VO

ISchaefer−A

50

54.3

(112.0

)21.9

(27.7

)24.2

(27.1

)54.3

(105.6

)19.7

(18.5

)19.7

(16.6

)12.8

(9.9

)7.9

(8.4

)

VO

IRT

L11.0

(13.6

)17.6

(21.8

)17.5

(18.8

)11.9

(13.5

)38.2

(42.4

)24.1

(19.3

)23.0

(18.1

)14.4

(15.0

)

Gra

dW

T12.0

(11.2

)17.0

(23.1

)19.2

(27.3

)16.3

(18.5

)39.8

(34.5

)28.2

(34.6

)24.3

(25.5

)48.0

(47.8

)

SU

V2.5

126.9

(228.1

)79.7

(151.4

)82.2

(148.8

)88.0

(176.2

)16.1

(13.8

)14.2

(11.2

)19.7

(24.7

)23.5

(24.2

)

65

Page 18: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

Chapter 4. Effects of image characteristics on performance of tumor delineation methods

Table

4.8.

Mea

n(S

D)

for

actu

al

SU

Va

nd

test-retestva

riability

of

SU

Vo

btain

edfro

m18F

-FL

Tim

ages

with

vario

us

ima

gech

ara

cteristics

SU

VSU

Vte

st-rete

stvaria

bility

Tum

or

delin

eatio

nR

eso

lutio

nC

ontra

stR

eso

lutio

nC

ontra

st

meth

od

(FW

HM

,in

mm

)(fra

mes)

(FW

HM

,in

mm

)(fra

mes)

6.5

8.3

10.2

66.5

8.3

10.2

6

VO

I50

3.7

(1.3

)3.2

(1.1

)3.1

(1.0

)3.7

(1.3

)9.9

(9.4

)7.5

(8.1

)7.9

(5.7

)8.8

(6.8

)

VO

I70

4.2

(1.3

)3.8

(1.3

)3.4

(1.3

)4.1

(1.4

)8.1

(7.6

)7.5

(7.3

)7.9

(6.2

)8.8

(7.9

)

VO

IA

41

3.7

(1.3

)3.2

(1.1

)3.0

(1.1

)3.6

(1.2

)7.9

(7.9

)6.8

(7.0

)7.3

(6.1

)7.1

(5.8

)

VO

IA

50

3.7

(1.2

)3.5

(1.1

)3.2

(1.1

)3.9

(1.3

)7.4

(7.2

)6.8

(6.2

)6.8

(5.6

)6.5

(5.0

)

VO

IA

70

4.7

(1.4

)4.3

(1.3

)3.9

(1.4

)4.7

(1.5

)7.8

(7.1

)6.9

(6.6

)7.6

(5.7

)6.3

(4.9

)

VO

ISchaefer−A

41

3.6

(1.2

)3.2

(1.1

)3.0

(1.1

)3.5

(1.2

)7.2

(7.1

)6.2

(6.9

)7.0

(5.3

)7.0

(5.4

)

VO

ISchaefer−A

50

3.6

(1.2

)3.2

(1.1

)2.9

(1.1

)3.5

(1.2

)7.2

(7.0

)6.5

(6.9

)6.8

(5.1

)7.1

(5.3

)

VO

IR

TL

3.8

(1.3

)3.4

(1.1

)3.1

(1.0

)3.7

(1.2

)11.3

(13.3

)5.9

(7.4

)6.8

(7.3

)8.3

(9.2

)

Gra

dW

T2.9

(0.7

)3.0

(0.9

)2.8

(0.9

)3.1

(0.9

)7.5

(5.2

)5.9

(6.6

)7.7

(5.9

)9.8

(8.4

)

SU

V2.5

3.5

(0.6

)3.3

(0.4

)3.2

(0.4

)3.5

(0.6

)2.7

(3.2

)2.7

(2.5

)2.5

(2.1

)3.1

(2.0

)

66

Page 19: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

4.4. Discussion

response studies, there was only a small dependency of SUV ratios on VOI defi-nition and image parameters. Therefore, this discussion will focus on metabolicvolumes.

In our study, volumes determined by different tumor delineation methodswere affected by imaging parameters (resolution and noise or contrast) andtracers being used (Fig. 4.1– 4.6). This finding is in line with a previousstudy [19], showing that measured tumor volumes were affected by severalfactors, that is, image reconstruction settings, smoothing filters, and measuredmaximal SUV within a lesion. Moreover, the performance of several automaticor semiautomatic tumor delineation methods as a function of PET image charac-teristics agreed with results obtained from simulation and phantom studies [68].

Differences in tumor volumes generated with different methods have beenreported previously [7,31,32]. Substantially different results could be obtained incomparison with other image modalities or pathologic data. Few clinical studieshave shown the potential of threshold-based methods for different tracers [66,67,69]. Two articles [66,67] showed that different metabolic tumor volume test-retest repeatabilities were obtained when different tumor delineation methodswere used. Moreover, similarly to this study, they showed that volume test-retest variability obtained from 18F-FLT was larger than that from 18F-FDG.To date, however, no gold standard exists for accurately defining tumor volumeson various image modalities, with the possible exception of pathologic findings.

A previous study [70] reported excellent reproducibility, with an intraclasscorrelation coefficient of 0.98 and an SD of 7% for quantitative 18F-FLT mea-surements with high image contrast, a resolution of about 7 mm FWHM,VOIA41. In addition, this study showed that there was no significant corre-lation between absolute 18F-FLT uptake and lesion size in either lung or head-and-neck cancers, indicating that this threshold-based delineation method wasreliable for defining tumor boundaries of all lesion sizes. For this reason, in ourstudy, VOIA41 was used to compare measured volumes obtained by all other tu-mor delineation methods. However, our study shows that VOIA41 seemed to besensitive to a change in image characteristics for both tracers. For 18F-FLT, ahigh SD was observed when we summed over more frames, caused by 1 primarylesion with heterogeneous uptake. Furthermore, a relatively high number ofoutliers (≤20%) was found for both tracers when different image characteristicswere used (Table 4.1). Therefore, VOIA41 seems to be reliable only for highimage resolution and lesions with high contrast to background.

Two versions of VOISchaefer were investigated in this study. The perfor-mances between the 2 versions were similar (Fig. 4.2). However, for 18F-FDGat high resolution, a high SD of VOISchaefer−A50 was observed, caused by 1lesion with heterogeneous uptake that was near the spine. Therefore, the ver-sion in which diameter is obtained using VOIA41 is preferred in this dataset.

For radiation treatment planning, VOI50, VOIA41, both versions ofVOISchaefer, and GradWT provided, for both tracers, volumes that wererelatively independent of image contrast. However, VOI50, VOIA41, and bothvariants of VOISchaefer showed poorer performance at low image resolution(Fig. 4.1). As GradWT was relatively independent of image characteristics and

67

Page 20: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

Chapter 4. Effects of image characteristics on performance of tumor delineation methods

showed a low number of outliers when image characteristics were varied (<4%),GradWT seems to be a good possible candidate for radiation treatment planning.However, validation of the various tumor delineation methods against a goldstandard, for example, pathology-determined tumor sizes, is still warranted.

Test-retest variability is important for assessing differences between suc-cessive scans beyond methodology-related variability. Clearly, for monitoringresponse, test-retest variability needs to be as low as possible. Large differencesin test-retest variability of tumor volume estimates (≤ 94%) were obtained fordifferent tumor delineation methods when different tracers or image character-istics were used, especially in cases of low image resolution. For both tracers,GradWT showed small test-retest variability (<17%) when image resolution wasvaried but resulted in larger test-retest variability when contrast was varied.One limitation of GradWT is that delineation of the tumor boundaries by thegradient algorithm depends on the tumor-to-background ratio, that is, con-trast, showing better performance for higher image contrast. For both tracers,VOI70 and SUV2.5 gave low changes in test-retest variability (<5.3% difference)when image characteristics were varied. Measured volumes obtained by VOI70,however, were too small to cover the whole lesion. SUV2.5 showed large overesti-mations of volume. In addition, SUV2.5 generated a large number of outliers fordifferent contrasts, especially for 18F-FDG (Table 4.1). In general, VOIA50 andVOIRTL showed reasonable test-retest variability and a small number of outliersfor both 18F-FDG and 18F-FLT (Fig. 4.3- 4.6). In general, VOIA50 showeda slightly smaller coefficient of variation (calculated as mean divided by SD)(<21%) than did VOIRTL (>27%) when resolution was changed. Therefore, asalso reported previously [66], VOIA50 seems to be a good possible candidate forresponse monitoring purposes.

For all image characteristics investigated, there was poor agreement betweenmedian test-retest variability of tumor volume and SUV (R2 <0.3, data notshown). In addition, there were large differences in median test-retest vari-ability between the 2 parameters for all image characteristics; that is, me-dian differences between test-retest variability of tumor volume and SUV wereapproximately 2.3-fold (range, 1.1–4.5) and 3.7-fold (range, 1.0–11) for 18F-FDGand 18F-FLT, respectively. The implication is that tumor volume and its test-retest variability are more sensitive to changes in image characteristics than areSUV and its test-retest variability, as also confirmed by the statistical post hocanalysis.

For most methods, higher values of median SUV test-retest variability wereobtained for lower contrast, with the exception of both variants of VOISchaefer

and SUV2.5 for 18F-FDG (Fig. 4.6A) and VOIA70, VOIRTL, and GradWT for18F-FLT (Fig. 4.6B). The likely explanation for this poorer percentage re-producibility is the lower average SUV caused by summing over more frames.When imaging parameters are varied, larger differences in SUV test-retest vari-ability were seen for 18F-FLT than for 18F-FDG, probably because of the lowerSUV for 18F-FLT. Previously, in a comparative study, it was shown that meanmaximal SUV in all lesions was lower for 18F-FLT than for 18F-FDG [70].

There were several limitations in determining the test-retest variability of

68

Page 21: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission

4.5. Conclusion

metabolic tumor volume and SUV using the various methods. Although 2 dif-ferent types of tracers were used in this study, both tracers have the same kind ofkinetic model. Therefore, the impact of various image characteristics on tracerswith other kinetic behaviors should be further investigated. In addition, becausethis was a clinical study, the exact lesion volumes clearly were not known. Thisissue needs to be addressed in future studies by comparing VOI measurementswith independent measurements based on other (anatomic) image modalitiesor pathologic specimens. Furthermore, visual inspection of outliers may haveaffected the performance evaluations to some extent. However, these visualinspections were required, as unrealistically large tumor segmentations mightoccur during segmentation because of noise, surroundings, or uptake hetero-geneity. Finally, the sum of the last 3 frames not only shows higher contrastbut also more noise. However, data were summed over 30 or 15 min, providingimages with good statistical quality. In this way, we attempted to reduce theeffect of a difference in noise between the 2 datasets.

4.5 Conclusion

For all automatic or semiautomatic tumor delineation methods tested, derivedmetabolic tumor volumes themselves and test-retest variability of both metabolictumor volume and SUV depended on image characteristics. Differences intest-retest variability of SUV were much smaller than those of tumor volume.These findings underline the need for a careful optimization of both the tumordelineation method used and the imaging parameters to obtain accurate andreproducible delineations of tumors or metabolic volume assessments.

69

Page 22: E ects of image characteristics on performance of tumor ... 4.pdf · Chapter 4. E ects of image characteristics on performance of tumor delineation methods Abstract Positron emission