Downloaded from UvA-DARE, the Institutional Repository of the University of Amsterdam (UvA): http://dare.uva.nl/document/45003

File ID: 45003. Filename: Klein_Zeggelink.pdf

SOURCE, OR PART OF THE FOLLOWING SOURCE: Type: Dissertation. Title: Quantitative evaluation of uncertainty in the radiological assessment of breast tumor extent: effect on treatment. Author: W.F.A. Klein Zeggelink. Faculty: Faculty of Medicine. Year: 2007. Pages: 160.

FULL BIBLIOGRAPHIC DETAILS: http://dare.uva.nl/record/217309

Copyrights: It is not permitted to download or to forward/distribute the text or part of it without the consent of the copyright holder (usually the author), other than for strictly personal, individual use. UvA-DARE is a service provided by the Library of the University of Amsterdam (http://dare.uva.nl)

QUANTITATIVE EVALUATION OF UNCERTAINTY IN THE RADIOLOGICAL ASSESSMENT OF BREAST TUMOR EXTENT: EFFECT ON TREATMENT

WILLIAM F. A. KLEIN ZEGGELINK

The research described in this thesis was performed at the Netherlands Cancer Institute – Antoni van Leeuwenhoek Hospital, Radiology Department, Amsterdam, The Netherlands

The studies have been financially supported by: The Dutch Cancer Society (Grant No. NKI 99-2035)

Printing of this thesis was financially supported by: The Dutch Cancer Society

This thesis was printed by: PrintPartners Ipskamp, Enschede, The Netherlands

© 2007 William Franciscus Agatha Klein Zeggelink

QUANTITATIVE EVALUATION OF UNCERTAINTY IN THE RADIOLOGICAL ASSESSMENT OF BREAST TUMOR EXTENT: EFFECT ON TREATMENT

ACADEMIC DISSERTATION

to obtain the degree of doctor

at the University of Amsterdam

on the authority of the Rector Magnificus

prof. dr. J.W. Zwemmer

before a committee appointed by the Board for Doctorates,

to be defended in public in the Aula of the University

on Thursday 1 March 2007, at 12:00

by William Franciscus Agatha Klein Zeggelink

born in Lichtenvoorde

Doctoral committee:

Promotor: prof. dr. G.M.M. Bartelink

Co-promotor: dr. K.G.A. Gilhuijs

Other committee members: prof. dr. ir. C.A. Grimbergen

prof. dr. G.J. den Heeten

prof. dr. B.B.R. Kroon

prof. dr. ir. J.J.W. Lagendijk

prof. dr. A.M. Vossepoel

Faculty of Medicine

TABLE OF CONTENTS

Section I. General introduction
Chapter 1. Introduction
1.1. Incidence and mortality of breast cancer
1.2. Current practice of breast cancer treatment
1.2.1. Locoregional treatment options
1.2.2. Systemic treatment options
1.3. Radiological assessment of breast tumor extent
1.3.1. Treatment selection
1.3.2. Treatment implementation
1.3.2.1. Extent measurement
1.3.2.2. Extent localization
1.3.3. Treatment evaluation
1.4. Objectives of this thesis
1.5. References

Section II. Treatment selection
Chapter 2. Contrast-enhanced MRI in breast cancer patients eligible for breast-conserving therapy: complementary value for subgroups of patients
European Radiology, Vol. 16, No. 3, March 2006, pp. 692-701

Section III. Treatment implementation
Chapter 3. Assessment of analysis-of-variance-based methods to quantify the random variations of observers in medical imaging measurements: guidelines to the investigator
Medical Physics, Vol. 31, No. 7, July 2004, pp. 1996-2007
Chapter 4. Reproducibility of the assessment of tumor extent in the breast using multiple imaging modalities
Medical Physics, Vol. 30, No. 11, November 2003, pp. 2919-2926
Chapter 5. Reproducibility of mammary gland structure during repeat setups in a supine position
Medical Physics, Vol. 29, No. 9, September 2002, pp. 2062-2069

Section IV. Treatment evaluation
Chapter 6. Potential impact of measurement variation on false-categorization rates in evaluating breast-tumor response using RECIST
Submitted to the Journal of the National Cancer Institute (JNCI)

Section V. General discussion
Chapter 7. Discussion
7.1. Radiological assessment of breast tumor extent
7.1.1. Treatment selection
7.1.2. Treatment implementation
7.1.2.1. Extent measurement
7.1.2.2. Extent localization
7.1.3. Treatment evaluation
7.2. Conclusions
7.3. References

Summary

Summary in Dutch (Samenvatting)

Acknowledgements (Dankwoord)

List of abbreviations

SECTION I

General introduction

CHAPTER 1

Introduction

1.1. INCIDENCE AND MORTALITY OF BREAST CANCER

Breast cancer is by far the most frequently occurring type of carcinoma among women in Western countries; about one third of all cancer cases in these women are breast malignancies. The breast cancer incidence rate in the Netherlands is one of the highest worldwide, closely following that of the United States 1. Around one in every nine women will develop breast cancer during her lifetime, and in nearly one in every 22 women the disease will lead to premature death. In the year 2000, 11,300 Dutch women were diagnosed with breast cancer, and 3,400 died from the disease.

Over the past decades, the breast cancer incidence rate has increased steadily. This increase is caused in part by the nationwide biennial mammographic screening program for women aged 50 to 70 years, which was started in the Netherlands in the early 1990s 2. Despite the increased incidence rate, the mortality rate has remained fairly constant, which is likely attributable to both earlier diagnosis and better treatment. Tumor extent, histological grade, and lymph node status are the most important prognostic factors 3.

1.2. CURRENT PRACTICE OF BREAST CANCER TREATMENT

The treatment of breast cancer currently consists of a combination of locoregional therapy and systemic therapy. Various treatment options are available in either category, and the specific combination applied depends on the stage of the cancer (tumor extent, lymph node involvement, distant metastases), tumor type, tumor extent in relation to breast size, patient age, and patient preferences.

1.2.1. Locoregional treatment options

Significant progress has been made in treating breast cancer patients for whom mastectomy is the treatment of choice. The modified radical mastectomy according to Madden is commonly performed and aims to spare the major pectoral muscle as much as possible 4. Preservation of the skin and the nipple facilitates reconstructive surgery, which is often performed directly after resection of the mammary gland 5. If the tumor has metastasized to the axillary lymph nodes, an axillary lymph node dissection is included in the surgical procedure 6. Mastectomy is regularly followed by irradiation of the chest wall and, if indicated, the regional lymph node stations 7.

Whenever feasible, however, patients are preferably offered breast-conserving therapy (BCT). The principal goal of BCT is complete elimination of the cancer while maintaining optimal cosmesis. BCT essentially involves local excision of the lesion followed by postoperative radiotherapy. Surgeons aim to remove the tumor with a margin of healthy tissue, and a sentinel-node procedure is regularly performed to assess the need for an axillary lymph node dissection 8. Radiotherapy aims at eliminating any remaining (residual and microscopic) tumor in the breast, and consists of whole-breast irradiation, a boost dose to the tumor bed, and (if indicated) irradiation of the regional lymph node stations 7. Several randomized trials prospectively compared the results of BCT with those of mastectomy and found similar five- and ten-year survival rates 9-11.

1.2.2. Systemic treatment options

The past decades have also seen significant advances in systemic treatment strategies. Chemotherapy may be used either as primary treatment or as an adjunct to locoregional therapy. Locally advanced breast cancer is generally considered inoperable; for these patients, chemotherapy may be used as primary treatment 12. If chemotherapy is successful, the patient is usually treated by lumpectomy or mastectomy, followed by radiotherapy. Neoadjuvant (i.e., prior to locoregional treatment) chemotherapy is offered to patients who are not eligible for BCT a priori but are deemed to have a considerable chance of becoming eligible after chemotherapy 13. Neoadjuvant chemotherapy may also be offered to patients who are already eligible for BCT, in order to reduce the tumor load prior to surgical excision and thereby increase the probability of a good cosmetic outcome 14. Adjuvant (i.e., after locoregional treatment) chemotherapy is offered to selected patients with the aim of eliminating distant metastases that may already be present 15. Hormonal therapy aims at reducing the growth-stimulating influence of estrogen on existing breast tumors, and may even prevent the development of a new primary tumor in the other breast 16. Hormonal therapy is currently administered to selected patients with anti-estrogen-sensitive breast tumors.

1.3. RADIOLOGICAL ASSESSMENT OF BREAST TUMOR EXTENT

Current clinical guidelines for the treatment of breast cancer incorporate tumor extent as a major parameter for treatment selection, treatment implementation, and evaluation of treatment results. Imaging techniques have become invaluable tools in oncology and are routinely used to assess tumor extent in the breast. This thesis focuses on the interaction between uncertainty in the radiological assessment of tumor extent and current clinical guidelines for breast cancer treatment.

1.3.1. Treatment selection

Eligibility for BCT depends on several factors. Among these are the tumor type and its location in the breast, both of which are associated with the difficulty of obtaining a complete surgical resection. Another important factor is tumor extent in relation to breast size, which is associated with the difficulty of obtaining a satisfactory cosmetic outcome after treatment. Accordingly, current clinical guidelines consider a large tumor in a small breast an important contraindication for BCT, and explicitly advise against BCT in all cases of multicentric disease 17.

Mammography and ultrasonography are the standard modalities currently used to assess tumor extent preoperatively and, combined with clinical examination, to determine patient eligibility for BCT 18. In specific cases, however, conventional imaging may not correctly visualize the extent of the tumor 19,20. Typical examples are tumors with a multinodular or diffuse growth pattern, such as invasive lobular carcinoma, and lesions with an extensive intraductal component. Assessment of tumor extent using mammography may also be problematic in younger patients because of dense breast parenchyma 21. It is currently unknown to what degree inaccuracy in the radiological assessment of tumor extent affects the ability of current clinical guidelines to select patients for BCT accurately on the basis of conventional imaging.

Contrast-enhanced magnetic resonance imaging (CE MRI) has been found to be highly sensitive for invasive carcinoma 22,23. Several investigators showed that CE MRI assesses the extent of unifocal tumors more accurately than conventional imaging 24,25. CE MRI was also found to be more accurate than conventional imaging in defining multinodular disease 26,27. Most of these studies, however, were performed in heterogeneous patient groups, i.e., patients both eligible and not eligible for BCT. The accuracy of CE MRI in assessing tumor extent specifically in patients considered eligible for BCT on the basis of conventional imaging and clinical examination is therefore still unknown. If CE MRI is more accurate than conventional imaging in the preoperative assessment of tumor extent, it may improve the efficacy of current clinical guidelines in evaluating eligibility for BCT.

A major drawback of CE MRI is its relatively high cost compared with mammography and ultrasonography. Subgroups of patients eligible for BCT for whom CE MRI assesses tumor extent more accurately than conventional imaging have not yet been identified. If such subgroups can be found, unnecessary CE MRI in patients who do not benefit from it may be avoided. This requires a quantitative comparison between measurements of tumor extent at preoperative imaging and those obtained postoperatively at pathology.
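A minimal sketch of such a comparison in Python, assuming that for every patient a subgroup label, the imaging modality, and paired imaging and pathology measurements of tumor extent (in mm) are available. The function name and the record layout are hypothetical and not taken from this thesis. Per subgroup and modality, it reports the mean imaging-minus-pathology discrepancy and the standard deviation of the differences; note that this SD contains the random variation of pathology as well as that of imaging.

```python
from collections import defaultdict
from statistics import mean, stdev


def discrepancy_by_subgroup(records):
    """Summarize imaging-minus-pathology differences per (subgroup, modality).

    records: iterable of (subgroup, modality, imaging_mm, pathology_mm) tuples
    (a hypothetical data layout).
    Returns {(subgroup, modality): (n, mean_difference, sd_difference)}.
    """
    diffs = defaultdict(list)
    for subgroup, modality, imaging_mm, pathology_mm in records:
        diffs[(subgroup, modality)].append(imaging_mm - pathology_mm)

    summary = {}
    for key, d in diffs.items():
        # The sample SD needs at least two cases; report None otherwise.
        sd = stdev(d) if len(d) >= 2 else None
        summary[key] = (len(d), mean(d), sd)
    return summary
```

Comparing the per-modality summaries within one subgroup indicates where CE MRI agrees more closely with pathology; a formal comparison would additionally require a statistical test and adequate numbers of patients per subgroup.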

1.3.2. Treatment implementation

BCT seeks a balance between adequate treatment of the disease (local control) and adverse side effects (impaired cosmetic outcome). For optimal treatment of the cancer, the tumor should be completely excised with a margin of healthy tissue around it 28-30. For optimal cosmetic outcome, however, the total excision volume should be as small as possible 31. The margin of tissue around the tumor is intended to account for multiple sources of uncertainty during surgery, including uncertainty in extent measurement and uncertainty in extent localization.

1.3.2.1. Extent measurement

Especially for non-palpable tumors, the surgeon relies heavily on radiological measurements of tumor extent to estimate the total volume of tissue that has to be excised. The preoperative measurement of tumor extent is, however, subject to random variations caused by the combined effects of imaging modality, measurement technique, and human observer. These variations may lead to underestimation of tumor extent at surgery, which increases the risk of incomplete excision, or to overestimation, which compromises cosmetic outcome.

Current clinical guidelines for the implementation of BCT recommend that surgeons remove the tumor with a margin of healthy tissue of typically 1 cm 32. It is currently unknown to what degree the uncertainty in the preoperative measurement of tumor extent is accounted for within this widely adopted 1 cm surgical margin. One way to address this question is to derive the minimum surgical safety margin needed to avoid underestimating tumor extent at surgery with an a priori defined confidence level, and to compare this margin with the generally intended 1 cm. Such an approach, however, requires reliable estimates of the random variations in the preoperative radiological measurements of tumor extent.
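A minimal sketch of this margin derivation in Python, under simplifying assumptions that are ours rather than the thesis's: the error in the measured position of one tumor boundary is Gaussian with zero mean and standard deviation `sigma_mm`, and the margin must keep the one-sided probability of underestimation below `1 - confidence`. The margin is then m = z·σ, with z the standard-normal quantile at the chosen confidence level. The function names are hypothetical; `NormalDist` is part of the Python standard library (3.8+).

```python
from statistics import NormalDist


def min_margin_mm(sigma_mm: float, confidence: float = 0.95) -> float:
    """Smallest margin m with P(boundary underestimated by more than m) <= 1 - confidence.

    Assumes (hypothetically) a zero-mean Gaussian error with SD sigma_mm
    in the position of a single tumor boundary.
    """
    if sigma_mm < 0 or not 0.5 <= confidence < 1:
        raise ValueError("need sigma_mm >= 0 and 0.5 <= confidence < 1")
    z = NormalDist().inv_cdf(confidence)  # one-sided standard-normal quantile
    return z * sigma_mm


def within_standard_margin(sigma_mm: float, confidence: float = 0.95,
                           standard_margin_mm: float = 10.0) -> bool:
    """True if the derived margin fits inside the conventional 1 cm (10 mm) margin."""
    return min_margin_mm(sigma_mm, confidence) <= standard_margin_mm
```

With a 95% confidence level the margin is about 1.645 times σ, so under these assumptions it would exceed the 1 cm margin once σ is larger than roughly 6 mm.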

Several authors have compared measurements of tumor extent at preoperative imaging with measurements at pathology. Amano et al. performed linear-regression analysis to establish the correspondence between preoperative imaging and pathology, and expressed it by the correlation coefficient and the slope of the regression line 33. Such measures of correspondence, however, cannot be used to derive minimum surgical safety margins. Davis et al. also performed linear-regression analysis, and expressed the random variations by the standard error of the residuals around the regression line 34. Boetes et al. assessed the average discrepancy and expressed the random variations by the standard deviation of the differences between preoperative imaging and pathology 35. In both cases, the reported random variations are the combined random variations of preoperative imaging and of pathology. In addition, these studies were not performed specifically in a population of patients eligible for BCT. Applying their estimates of the random variations may therefore result in overestimated minimum surgical safety margins, rendering a comparison with the widely adopted 1 cm margin meaningless.
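Why the combined estimate overstates the imaging variation can be written out, assuming that imaging and pathology measure the same true extent t_i of case i with independent, zero-mean random errors (any constant bias removed). The symbols below are introduced here for illustration only:

```latex
\begin{align}
x_i^{\mathrm{img}} &= t_i + \varepsilon_i^{\mathrm{img}}, \qquad
x_i^{\mathrm{path}} = t_i + \varepsilon_i^{\mathrm{path}}, \\
\Delta_i &= x_i^{\mathrm{img}} - x_i^{\mathrm{path}}
          = \varepsilon_i^{\mathrm{img}} - \varepsilon_i^{\mathrm{path}}, \\
\sigma_\Delta^2 &= \sigma_{\mathrm{img}}^2 + \sigma_{\mathrm{path}}^2
          \;\ge\; \sigma_{\mathrm{img}}^2 .
\end{align}
```

If the two error SDs were equal, the SD of the differences would be √2 times the imaging SD, so a margin z_c·σ_Δ (with z_c the one-sided normal quantile at confidence level c) would be about 41% larger than the margin justified by imaging variation alone.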

If CE MRI measures tumor extent in the breast more reproducibly than conventional imaging, CE MRI may improve the efficacy of the 1 cm surgical margin in accounting for uncertainty in tumor extent. The merit of CE MRI compared with conventional imaging in reducing the uncertainty in tumor extent within the 1 cm safety margin can, however, only be assessed if the random variations at preoperative imaging are separated from those at pathology, and if estimates are obtained specifically in patients eligible for BCT.

Several techniques exist to quantify the random variations in measurements of breast tumor extent. Analysis-of-variance (ANOVA)-based methods, although mathematically more sophisticated, have two major advantages: they avoid the need to choose pathology as the gold standard, and they do not require repeated measurements 36. However, whereas basic ANOVA-based methods do not provide all statistics of interest, those that do are rather complicated to implement. This calls for a more efficient algorithm that provides all statistics of interest and is straightforward to implement and easy to use. Furthermore, it is currently unknown how, and to what extent, the performance of ANOVA-based methods depends on the number of techniques and the number of cases in an observer study. A comparative study of existing and newly developed ANOVA-based methods is needed to establish practical guidelines for the investigator.
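To illustrate how random variations can be separated without a gold standard and without repeated measurements, the Python sketch below uses Grubbs' classical estimator for three techniques that each measure every case once. This is a related, simpler technique shown for illustration; it is not the ANOVA-based algorithm developed in this thesis. It assumes independent, zero-mean random errors whose size does not depend on tumor extent; constant systematic offsets between techniques cancel in the variances of the differences. The function name is hypothetical.

```python
from itertools import combinations
from statistics import variance


def grubbs_three_methods(a, b, c):
    """Estimate the random-error variance of each of three techniques.

    a, b, c: one measurement per case from each technique, same case order.
    For independent errors Var(X - Y) = s_X^2 + s_Y^2; the three pairwise
    equations are solved for s_a^2, s_b^2 and s_c^2. No reference standard
    and no repeated measurements are needed. With small samples an estimate
    can come out slightly negative, which should be read as "close to zero".
    """
    if not (len(a) == len(b) == len(c)) or len(a) < 2:
        raise ValueError("need equally long sequences with at least 2 cases")
    data = {"a": a, "b": b, "c": c}

    # Sample variance of the pairwise differences, keyed by unordered pair.
    v = {
        frozenset((x, y)): variance([p - q for p, q in zip(data[x], data[y])])
        for x, y in combinations(data, 2)
    }

    estimates = {}
    for name in data:
        o1, o2 = [k for k in data if k != name]
        estimates[name] = (v[frozenset((name, o1))] + v[frozenset((name, o2))]
                           - v[frozenset((o1, o2))]) / 2
    return estimates
```

With more than three techniques, the pairwise equations outnumber the unknowns and have to be solved in a least-squares or ANOVA framework instead.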

1.3.2.2. Extent localization

Christiaens et al. performed a multi-institutional investigation of the accuracy of BCT and found a poor correlation between tumor extent and the total volume of excised breast tissue 37. The largest distance between the tumor boundary and the border of the excision specimen is often larger than the intended 1 cm, whereas the smallest distance is often smaller than 1 cm. This asymmetric position of the tumor within the excised volume thus increases the risk of incomplete excision and, at the same time, compromises cosmetic outcome.

Non-palpable tumors represent a substantial proportion of the lesions in patients eligible for BCT. Surgeons are typically guided toward the position of the tumor by a hook wire that is installed in the breast preoperatively under image guidance. Despite the use of such hook wires for the resection of non-palpable tumors, the experience of the surgeon was found to be the most important determinant of complete tumor excision 38. This may be explained by the fact that, although a hook wire indicates the position of the tumor, it provides the surgeon with no information on tumor extent during surgery.

Preoperative breast imaging may help the surgeon localize both the position and the extent of the tumor just before surgery commences. Compression of the breast during mammography, however, causes significant distortion that alters the internal structure of the breast 39. Applying the geometrical information provided by mammography to the breast in an uncompressed state may therefore result in errors 40. CE MRI has been shown to be more accurate than conventional imaging in demonstrating tumor extent 33-35. In addition, CE MRI provides three-dimensional (3D) geometrical information on tumor position and extent, and does not require compression of the breast. Preoperative CE MRI of the patient in a supine position may thus provide undistorted 3D geometrical information on both the position and the extent of the tumor prior to surgery.

The potential of preoperative CE MRI to provide a reliable representation of the geometry of the breast at surgery depends, however, on the magnitude of internal shifts of glandular tissue between imaging and surgery. The variability of such internal tissue displacements is determined by the geometrical reproducibility of mammary gland structure, which is currently unknown. Once this reproducibility is known, margins can be formulated that account for the uncertainty in internal tissue shifts between preoperative CE MRI and surgery. Such an approach directly shows how the geometrical reproducibility of the breast compares with the current 1 cm margin in BCT. The geometrical reproducibility of mammary gland structure is also of interest for future strategies that aim to mark the extent of the tumor prior to surgery on the basis of preoperative imaging.
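As an illustration of how such a margin could be formulated, assume (our assumption, not a result of this thesis) that the error in extent measurement and the internal tissue shift along a given direction are independent and Gaussian, with standard deviations σ_ext and σ_shift. Independent random errors add in quadrature, and the one-sided margin at confidence level c follows as:

```latex
\sigma_{\mathrm{tot}} = \sqrt{\sigma_{\mathrm{ext}}^{2} + \sigma_{\mathrm{shift}}^{2}},
\qquad
m_c = z_c\,\sigma_{\mathrm{tot}}, \quad z_c = \Phi^{-1}(c),
\qquad
m_c \le 10~\text{mm} \;\Rightarrow\; \text{covered by the 1 cm margin}.
```

With hypothetical values σ_ext = 3 mm and σ_shift = 4 mm, σ_tot = 5 mm and m_0.95 ≈ 1.645 × 5 ≈ 8.2 mm, which would fit within the 1 cm margin; the actual standard deviations have to come from measurement.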

1.3.3. Treatment evaluation

Breast tumor extent is regularly used as a key parameter to assess the efficacy of hormonal therapy, chemotherapy, and radiotherapy. Differences in tumor extent between the start of treatment and follow-up are used to evaluate whether therapy induced tumor regression, stopped tumor growth, or did not affect tumor progression. Assessment of response by serial measurements of tumor extent is currently the standard method to evaluate new treatment strategies, and is also used to guide the clinician and the patient in deciding whether to continue, change, or stop the current treatment 41-46.

Standardized guidelines based on uniform criteria are mandatory for an objective, meaningful, and consistent assessment of tumor response to therapy. During the last decades of the 20th century, various oncology working groups employed their own guidelines 47-49. The guidelines proposed by the World Health Organization (WHO) in 1979 were used most commonly 50. These were predominantly based on bidimensional measurements of tumor extent: the product of perpendicular tumor diameters. In 2000, the Response Evaluation Criteria In Solid Tumors (RECIST) were introduced and rapidly superseded the WHO guidelines 51. RECIST is based on unidimensional measurements of tumor extent: the largest tumor diameter only.
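The two systems can be related by a short calculation, assuming that the tumor shrinks or grows isotropically, so that both perpendicular diameters d_0 and d_0' change by the same factor k and the bidimensional product A scales by k². The thresholds used are those of the published criteria (WHO: at least 50% decrease in the product for partial response and at least 25% increase for progression; RECIST: at least 30% decrease or 20% increase in the longest diameter):

```latex
\frac{A_1}{A_0} = \frac{(k\,d_0)(k\,d_0')}{d_0\,d_0'} = k^2,
\qquad
\begin{cases}
k = 0.70 \;\Rightarrow\; k^2 = 0.49 & \text{(51\% decrease; WHO PR threshold: 50\%)}\\
k = 1.20 \;\Rightarrow\; k^2 = 1.44 & \text{(44\% increase; WHO PD threshold: 25\%)}
\end{cases}
```

Under this assumption the RECIST partial-response threshold roughly matches the WHO threshold, whereas the RECIST progression threshold corresponds to a larger bidimensional change than the WHO one.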

The RECIST guidelines incorporate quantitative criteria to classify the effect of therapy into one of four response categories: complete response (CR), partial response (PR), stable disease (SD), and progressive disease (PD). Random variations, however, inevitably affect the measurements at the start of treatment as well as those during follow-up. Consequently, a difference in tumor extent (regression or progression) may be observed while in reality the tumor extent did not change (stable disease). If the observed difference is large enough to meet the RECIST criteria, the tumor will be falsely classified as PR or PD instead of SD.
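The classification logic can be sketched in Python for the single-lesion case relevant here, using the RECIST (2000) target-lesion thresholds: CR for disappearance of the lesion, PD for an increase of at least 20% relative to the smallest diameter recorded since treatment started, PR for a decrease of at least 30% relative to baseline, and SD otherwise. The helper is hypothetical and simplified: it ignores non-target and new lesions, the sum over multiple lesions, and the confirmation requirements of the full guideline.

```python
from typing import Optional


def classify_recist(baseline_mm: float, followup_mm: float,
                    nadir_mm: Optional[float] = None) -> str:
    """Simplified RECIST (2000) category for a single target lesion.

    baseline_mm: longest diameter at the start of treatment.
    followup_mm: longest diameter at the current follow-up.
    nadir_mm:    smallest longest diameter recorded since treatment started
                 (defaults to the baseline, i.e. the first follow-up visit).
    """
    if baseline_mm <= 0:
        raise ValueError("baseline_mm must be positive")
    if nadir_mm is None:
        nadir_mm = baseline_mm
    if followup_mm == 0:
        return "CR"  # disappearance of the lesion
    if followup_mm >= 1.20 * nadir_mm:
        return "PD"  # at least 20% increase from the smallest recorded size
    if followup_mm <= 0.70 * baseline_mm:
        return "PR"  # at least 30% decrease from baseline
    return "SD"
```

Progression is checked before partial response so that regrowth after an initial response is reported as PD, in line with the nadir-based progression rule.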

Some authors have estimated measurement error due to random variations and its possible impact on response evaluations 52-55. These studies were, however, restricted to pulmonary metastases on chest roentgenograms, or to simulated nodules and neck nodes assessed by physical examination (palpation), and all focused on bidimensional measurements of tumor extent. To the best of our knowledge, no authors have addressed the false-categorization rates in response evaluations that may result solely from random variations, specifically for breast tumors. Such knowledge is, however, essential to assess the reliability of response evaluations and decision-making in current clinical practice, and is also required for studies that aim to compare future results with those obtained at present.

Furthermore, it is unknown whether the false-categorization rates due to random variations differ between conventional imaging and CE MRI, or whether they are correlated with tumor size. It is also of interest to know the minimum difference between two measurements of tumor extent that must be observed to discriminate between random variation and a statistically significant reduction in tumor extent. A study aimed at estimating the false-categorization rates due to random variations in response evaluations using the current RECIST criteria, imaging modalities, and measurement techniques is therefore necessary.
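A closed-form Python sketch of both quantities, under assumptions that are ours: the true longest diameter D does not change, the baseline (M1) and follow-up (M2) measurements carry independent Gaussian errors with the same SD σ regardless of tumor size, and the follow-up is the first one, so the baseline is also the nadir. A stable tumor is then falsely labeled PR when M2 ≤ 0.7·M1 and PD when M2 ≥ 1.2·M1. Because M2 − k·M1 is Gaussian with mean (1 − k)·D and variance σ²(1 + k²), both probabilities follow from the normal CDF. The minimum detectable reduction uses the difference M1 − M2, whose SD is √2·σ, with a one-sided test at level α (5% by default). The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def false_category_rates(diameter_mm, sigma_mm, pr_ratio=0.70, pd_ratio=1.20):
    """Probabilities (p_false_pr, p_false_pd) that an unchanged tumor is misclassified.

    Assumes M1 = D + e1 and M2 = D + e2 with independent e1, e2 ~ N(0, sigma^2),
    PR when M2 <= pr_ratio * M1 and PD when M2 >= pd_ratio * M1.
    """
    if diameter_mm <= 0 or sigma_mm <= 0:
        raise ValueError("diameter_mm and sigma_mm must be positive")

    def p_at_or_below(k):
        # P(M2 - k*M1 <= 0), where M2 - k*M1 ~ N((1 - k) D, sigma^2 (1 + k^2))
        mu = (1 - k) * diameter_mm
        sd = sigma_mm * sqrt(1 + k * k)
        return _STD_NORMAL.cdf(-mu / sd)

    p_false_pr = p_at_or_below(pr_ratio)
    p_false_pd = 1 - p_at_or_below(pd_ratio)
    return p_false_pr, p_false_pd


def min_detectable_reduction(sigma_mm, alpha=0.05):
    """Smallest observed M1 - M2 that is significant in a one-sided test at level alpha."""
    return _STD_NORMAL.inv_cdf(1 - alpha) * sqrt(2) * sigma_mm
```

Because both probabilities depend only on the ratio D/σ, the sketch also shows why false-categorization rates may vary with tumor size when σ is roughly constant.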

1.4. OBJECTIVES OF THIS THESIS

The aim of this thesis is twofold:

- To quantify the impact of uncertainties in the radiological assessment of tumor extent on the efficacy of current clinical guidelines for the treatment of breast cancer;

- To assess whether CE MRI of the breast improves the efficacy of these guidelines.

1.5. REFERENCES

1 Research Committee of the Netherlands Cancer Registry, "Incidence of cancer in the Netherlands 1999/2000," (2003).

2 Research Council of the Netherlands Cancer Registry, "Trends of Cancer in The Netherlands 1989-1998," (2002).

3 G. D'Eredita, C. Giardina, M. Martellotta, T. Natale, and F. Ferrarese, "Prognostic factors in breast cancer: the predictive value of the Nottingham Prognostic Index in patients with a long-term follow-up that were treated in a single institution," Eur. J. Cancer 37, 591-596 (2001).

4 J. L. Madden, S. Kandalaft, and R. A. Bourque, "Modified radical mastectomy," Ann. Surg. 175, 624-634 (1972).

5 K. Sandelin, A. M. Billgren, and M. Wickman, "Management, morbidity, and oncologic aspects in 100 consecutive patients with immediate breast reconstruction," Ann. Surg. Oncol. 5, 159-165 (1998).

6 M. Blichert-Toft, "Axillary surgery in breast cancer management--background, incidence and extent of nodal spread, extent of surgery and accurate axillary staging, surgical procedures," Acta Oncol. 39, 269-275 (2000).

7 J. Kurtz, "The curative role of radiotherapy in the treatment of operable breast cancer," Eur. J. Cancer 38, 1961-1974 (2002).

8 U. Veronesi, V. Galimberti, S. Zurrida, F. Pigatto, P. Veronesi, C. Robertson, G. Paganelli, V. Sciascia, and G. Viale, "Sentinel lymph node biopsy as an indicator for axillary dissection in early breast cancer," Eur. J. Cancer 37, 454-458 (2001).

9 U. Veronesi, B. Salvadori, A. Luini, M. Greco, R. Saccozzi, M. del Vecchio, L. Mariani, S. Zurrida, and F. Rilke, "Breast conservation is a safe method in patients with small cancer of the breast. Long-term results of three randomised trials on 1,973 patients," Eur. J. Cancer 31A, 1574-1579 (1995).

10 J. A. van Dongen, A. C. Voogd, I. S. Fentiman, C. Legrand, R. J. Sylvester, D. Tong, S. E. van der, P. A. Helle, K. van Zijl, and H. Bartelink, "Long-term results of a randomized trial comparing breast-conserving therapy with mastectomy: European Organization for Research and Treatment of Cancer 10801 trial," J. Natl. Cancer Inst. 92, 1143-1150 (2000).

11 U. Veronesi, N. Cascinelli, L. Mariani, M. Greco, R. Saccozzi, A. Luini, M. Aguilar, and E. Marubini, "Twenty-year follow-up of a randomized study comparing breast-conserving surgery with radical mastectomy for early breast cancer," N. Engl. J. Med. 347, 1227-1232 (2002).

12 V. Valero, V, A. U. Buzdar, and G. N. Hortobagyi, "Locally Advanced Breast Cancer," Oncologist 1, 8-17 (1996).

13 B. Fisher, J. Bryant, N. Wolmark, E. Mamounas, A. Brown, E. R. Fisher, D. L. Wickerham, M. Begovic, A. DeCillis, A. Robidoux, R. G. Margolese, A. B. Cruz, Jr., J. L. Hoehn, A. W. Lees, N. V. Dimitrov, and H. D. Bear, "Effect of preoperative chemotherapy on the outcome of women with operable breast cancer," J. Clin. Oncol. 16, 2672-2685 (1998).

14 T. A. Buchholz, K. K. Hunt, G. J. Whitman, A. A. Sahin, and G. N. Hortobagyi, "Neoadjuvant chemotherapy for breast carcinoma: multidisciplinary considerations of benefits and risks," Cancer 98, 1150-1160 (2003).

15 F. Cardoso and M. J. Piccart, "The best use of chemotherapy in the adjuvant setting," Breast 12, 522-528 (2003).

16 R. W. Blamey, "Guidelines on endocrine therapy of breast cancer EUSOMA," Eur. J. Cancer 38, 615-634 (2002).

17 Kwaliteitsinstituut voor de Gezondheidszorg (CBO), Nationaal Borstkanker Overleg Nederland (NABON), and Vereniging van Integrale Kankercentra (VIKC), "Richtlijn Behandeling van het Mammacarcinoom," Utrecht: CBO (2002).

18 Kwaliteitsinstituut voor de Gezondheidszorg (CBO), Nationaal Borstkanker Overleg Nederland (NABON), and Vereniging van Integrale Kankercentra (VIKC), "Richtlijn Screening en Diagnostiek van het Mammacarcinoom," Utrecht: CBO (2000).

19 D. J. Hilleren, I. T. Andersson, K. Lindholm, and F. S. Linnell, "Invasive lobular carcinoma: mammographic findings in a 10-year experience," Radiology 178, 149-154 (1991).

20 K. N. Krecke and J. J. Gisvold, "Invasive lobular carcinoma of the breast: mammographic findings and extent of disease at diagnosis in 184 patients," AJR Am. J. Roentgenol. 161, 957-960 (1993).

21 H. M. Zonderland, E. G. Coerkamp, J. Hermans, M. J. van de Vijver, and A. E. van Voorthuisen, "Diagnosis of breast cancer: contribution of US as an adjunct to mammography," Radiology 213, 413-422 (1999).

22 S. H. Heywang-Kobrunner, P. Viehweg, A. Heinig, and C. Kuchler, "Contrast-enhanced MRI of the breast: accuracy, value, controversies, solutions," Eur. J. Radiol. 24, 94-108 (1997).

23 P. Viehweg, I. Paprosch, M. Strassinopoulou, and S. H. Heywang-Kobrunner, "Contrast-enhanced magnetic resonance imaging of the breast: interpretation guidelines," Top. Magn. Reson. Imaging 9, 17-43 (1998).

24 H. Mumtaz, M. A. Hall-Craggs, T. Davidson, K. Walmsley, W. Thurell, M. W. Kissin, and I. Taylor, "Staging of symptomatic primary breast cancer with MR imaging," AJR Am. J. Roentgenol. 169, 417-424 (1997).

25 L. Esserman, N. Hylton, L. Yassa, J. Barclay, S. Frankel, and E. Sickles, "Utility of magnetic resonance imaging in the management of breast cancer: evidence for improved preoperative staging," J. Clin. Oncol. 17, 110-119 (1999).

26 P. J. Drew, S. Chatterjee, L. W. Turnbull, J. Read, P. J. Carleton, J. N. Fox, J. R. Monson, and M. J. Kerin, "Dynamic contrast enhanced magnetic resonance imaging of the breast is superior to triple assessment for the pre-operative detection of multifocal breast cancer," Ann. Surg. Oncol. 6, 599-603 (1999).

27 U. Fischer, L. Kopka, and E. Grabbe, "Breast carcinoma: effect of preoperative contrast-enhanced MR imaging on the therapeutic approach," Radiology 213, 881-888 (1999).

28 B. Spivack, M. M. Khanna, L. Tafra, G. Juillard, and A. E. Giuliano, "Margin status and local recurrence after breast-conserving surgery," Arch. Surg. 129, 952-956 (1994).

29 D. E. Wazer, G. Jabro, R. Ruthazer, C. Schmid, H. Safaii, and R. K. Schmidt-Ullrich, "Extent of margin positivity as a predictor for local recurrence after breast conserving irradiation," Radiat. Oncol. Investig. 7, 111-117 (1999).

30 C. C. Park, M. Mitsumori, A. Nixon, A. Recht, J. Connolly, R. Gelman, B. Silver, S. Hetelekidis, A. Abner, J. R. Harris, and S. J. Schnitt, "Outcome at 8 years after breast-conserving surgery and radiation therapy for invasive breast cancer: influence of margin status and systemic therapy on local recurrence," J. Clin. Oncol. 18, 1668-1675 (2000).

31 C. Vrieling, L. Collette, A. Fourquet, W. J. Hoogenraad, J. H. Horiot, J. J. Jager, M. Pierart, P. M. Poortmans, H. Struikmans, B. Maat, E. Van Limbergen, and H. Bartelink, "The influence of patient, tumor and treatment factors on the cosmetic results after breast-conserving therapy in the EORTC 'boost vs. no boost' trial. EORTC Radiotherapy and Breast Cancer Cooperative Groups," Radiother. Oncol. 55, 219-232 (2000).

32 E. J. Rutgers, "Quality control in the locoregional treatment of breast cancer," Eur. J. Cancer 37, 447-453 (2001).

33 G. Amano, N. Ohuchi, T. Ishibashi, T. Ishida, M. Amari, and S. Satomi, "Correlation of three-dimensional magnetic resonance imaging with precise histopathological map concerning carcinoma extension in the breast," Breast Cancer Res. Treat. 60, 43-55 (2000).

34 P. L. Davis, M. J. Staiger, K. B. Harris, M. A. Ganott, J. Klementaviciene, K. S. McCarty, Jr., and H. Tobon, "Breast cancer measurements with magnetic resonance imaging, ultrasonography, and mammography," Breast Cancer Res. Treat. 37, 1-9 (1996).

35 C. Boetes, R. D. Mus, R. Holland, J. O. Barentsz, S. P. Strijk, T. Wobbes, J. H. Hendriks, and S. H. Ruys, "Breast tumors: comparative accuracy of MR imaging relative to mammography and US for demonstrating extent," Radiology 197, 743-747 (1995).

36 K. G. Gilhuijs, A. Touw, M. van Herk, and R. E. Vijlbrief, "Optimization of automatic portal image analysis," Med. Phys. 22, 1089-1099 (1995).

37 M. R. Christiaens, L. Cataliotti, I. Fentiman, E. Rutgers, M. Blichert-Toft, J. E. DeVries, H. P. Graversen, K. Vantongelen, and R. Aerts, "Comparison of the surgical procedures for breast conserving treatment of early breast cancer in seven EORTC centres," Eur. J. Cancer 32A, 1866-1875 (1996).

38 J. M. Dixon, O. Ravisekar, M. Cunningham, E. D. Anderson, T. J. Anderson, and H. K. Brown, "Factors affecting outcome of patients with impalpable breast cancer detected by breast screening," Br. J. Surg. 83, 997-1001 (1996).

39 R. Novak, "Transformation of the female breast during compression at mammography with special reference to the importance for localization of a lesion," Acta Radiol. Suppl. 371, 1-47 (1988).

40 F. M. Hall and H. A. Frank, "Preoperative localization of nonpalpable breast lesions," AJR Am. J. Roentgenol. 132, 101-105 (1979).


41 G. von Minckwitz, S. D. Costa, W. Eiermann, J. U. Blohmer, A. H. Tulusan, C. Jackisch, and M. Kaufmann, "Maximized reduction of primary breast tumor size using preoperative chemotherapy with doxorubicin and docetaxel," J. Clin. Oncol. 17, 1999-2005 (1999).

42 J. M. Dixon, L. Renshaw, C. Bellamy, M. Stuart, G. Hoctin-Boes, and W. R. Miller, "The effects of neoadjuvant anastrozole (Arimidex) on tumor volume in postmenopausal women with breast cancer: a randomized, double-blind, single-center study," Clin. Cancer Res. 6, 2229-2235 (2000).

43 J. A. van der Hage, C. J. van de Velde, J. P. Julien, M. Tubiana-Hulin, C. Vandervelden, and L. Duchateau, "Preoperative chemotherapy in primary operable breast cancer: results from the European Organization for Research and Treatment of Cancer trial 10902," J. Clin. Oncol. 19, 4224-4237 (2001).

44 P. Therasse, L. Mauriac, M. Welnicka-Jaskiewicz, P. Bruning, T. Cufer, H. Bonnefoi, E. Tomiak, K. I. Pritchard, A. Hamilton, and M. J. Piccart, "Final results of a randomized phase III trial comparing cyclophosphamide, epirubicin, and fluorouracil with a dose-intensified epirubicin and cyclophosphamide + filgrastim as neoadjuvant treatment in locally advanced breast cancer: an EORTC-NCIC-SAKK multicenter study," J. Clin. Oncol. 21, 843-850 (2003).

45 P. Therasse, "Evaluation of response: new and standard criteria," Ann. Oncol. 13 Suppl. 4, 127-129 (2002).

46 P. Therasse, "Measuring the clinical response. What does it mean?," Eur. J. Cancer 38, 1817-1823 (2002).

47 J. L. Hayward, P. P. Carbone, J. C. Heuson, S. Kumaoka, A. Segaloff, and R. D. Rubens, "Assessment of response to therapy in advanced breast cancer: a project of the Programme on Clinical Oncology of the International Union Against Cancer, Geneva, Switzerland," Cancer 39, 1289-1294 (1977).

48 M. M. Oken, R. H. Creech, D. C. Tormey, J. Horton, T. E. Davis, E. T. McFadden, and P. P. Carbone, "Toxicity and response criteria of the Eastern Cooperative Oncology Group," Am. J. Clin. Oncol. 5, 649-655 (1982).

49 S. Green and G. R. Weiss, "Southwest Oncology Group standard response criteria, endpoint definitions and toxicity criteria," Invest. New Drugs 10, 239-253 (1992).

50 A. B. Miller, B. Hoogstraten, M. Staquet, and A. Winkler, "Reporting results of cancer treatment," Cancer 47, 207-214 (1981).

51 P. Therasse, S. G. Arbuck, E. A. Eisenhauer, J. Wanders, R. S. Kaplan, L. Rubinstein, J. Verweij, M. Van Glabbeke, A. T. van Oosterom, M. C. Christian, and S. G. Gwyther, "New guidelines to evaluate the response to treatment in solid tumors. European Organization for Research and Treatment of Cancer, National Cancer Institute of the United States, National Cancer Institute of Canada," J. Natl. Cancer Inst. 92, 205-216 (2000).


52 J. Gurland and R. O. Johnson, "How reliable are tumor measurements?," JAMA 194, 973-978 (1965).

53 C. G. Moertel and J. A. Hanley, "The effect of measuring error on the results of therapeutic trials in advanced cancer," Cancer 38, 388-394 (1976).

54 P. T. Lavin and G. Flowerdew, "Studies in variation associated with the measurement of solid tumors," Cancer 46, 1286-1290 (1980).

55 D. Warr, S. McKinney, and I. Tannock, "Influence of measurement error on assessment of response to anticancer chemotherapy: proposal for new criteria of tumor response," J. Clin. Oncol. 2, 1040-1046 (1984).

SECTION II

Treatment selection

CHAPTER 2

Contrast-enhanced MRI in breast cancer patients eligible for breast-conserving therapy: complementary value for subgroups of patients

Eline E. Deurloo 1,2, William F. A. Klein Zeggelink 1, H. Jelle Teertstra 1, Johannes L. Peterse 3, Emiel J. Th. Rutgers 4, Sara H. Muller 1, Harry Bartelink 5, and Kenneth G. A. Gilhuijs 1

1 Department of Radiology, The Netherlands Cancer Institute / Antoni van Leeuwenhoek hospital, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
2 Department of Radiology, Academic Medical Center, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
3 Department of Pathology, The Netherlands Cancer Institute / Antoni van Leeuwenhoek hospital, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
4 Department of Surgery, The Netherlands Cancer Institute / Antoni van Leeuwenhoek hospital, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
5 Department of Radiotherapy, The Netherlands Cancer Institute / Antoni van Leeuwenhoek hospital, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands

European Radiology, Vol. 16, No. 3, March 2006, pp. 692 – 701

Contrast-enhanced MRI in breast cancer patients 27

Treatment selection 27

ABSTRACT

The purpose of this study was to identify patients eligible for breast-conserving therapy (BCT) in whom contrast-enhanced magnetic resonance imaging (MRI) has complementary value over conventional imaging in the preoperative assessment of tumor extent.

All patients were eligible for BCT according to conventional imaging, and underwent preoperative MRI as part of this study. One hundred and sixty-five patients (166 tumors) were included. MRI was defined to have complementary value if conventional imaging underestimated or overestimated tumor extent (by more than 10 mm compared to histology) and MRI assessed the extent accurately. Logistic regression was employed to identify characteristics that are predictive of the complementary value of preoperative MRI.

MRI had complementary value in 39 cases (23%). In patients younger than 58 years with irregular lesion margins at mammography and a discrepancy in tumor extent of more than 10 mm between mammography and ultrasonography, the probability that MRI had complementary value was 3.2 times higher (positive predictive value 50%, negative predictive value 84%, p = 0.0002).

In patients eligible for BCT, preoperative MRI assesses tumor extent more accurately than conventional imaging in approximately one out of four patients. Subgroups of patients in whom MRI has complementary value may be identified on the basis of clinical and imaging features.

2.1. INTRODUCTION

Breast-conserving therapy (BCT) is increasingly performed as an alternative to mastectomy for the treatment of breast cancer. In BCT, an optimal balance has to be found between local control and cosmetic outcome: to optimize local control, the excision must be wide enough to encompass the tumor and a margin of healthy tissue 1-4, whereas to optimize cosmetic outcome, the excision must be as small as possible 5. The tumor is often incompletely removed in the first surgical procedure, resulting in reexcisions in up to one third of all patients eligible for BCT 6-8. In particular, tumors with a multifocal or multicentric (multinodular) growth pattern (e.g., invasive lobular carcinoma and tumors with an extensive intraductal component) are associated with an increased probability of residual tumor in the breast after surgical excision 9,10.

To optimize the results of BCT and to reduce the number of surgical procedures required to remove the tumor, accurate assessment of tumor extent prior to surgery is necessary. Underestimation of the tumor extent may result in incomplete excisions, whereas overestimation may result in unnecessarily large excision volumes. Tumor extent is currently assessed preoperatively by conventional imaging (mammography and ultrasonography). Conventional imaging, however, fails to assess tumor extent accurately in 35%–40% of patients who are considered eligible for BCT 11; diffuse or multinodular growth patterns in particular are frequently underestimated. In addition, in young patients, dense breast parenchyma may hamper the assessment of tumor size at mammography 12.

Magnetic resonance imaging (MRI) has a high sensitivity for invasive cancer 13,14 and is reported to be more accurate than conventional imaging in defining multinodular disease, in both the ipsilateral and the contralateral breast 15-28. MRI detects multinodular disease in 16%–38% of patients with breast cancer in whom palpation and conventional imaging show only a unifocal tumor. Furthermore, several studies have shown that MRI is more accurate than conventional imaging in assessing the extent of unifocal tumors 15,17,18,29-31. On the other hand, it has also been shown that MRI offers no additional value over conventional imaging in up to 80% of patients with breast cancer 32,33. If the subgroups of patients in whom MRI has complementary value could be identified beforehand by a combination of clinical and imaging characteristics, preoperative MRI could be avoided in patients in whom it is unlikely to be of complementary value.

The purpose of this study was to determine how many, and which, patients eligible for BCT have an inaccurate assessment of tumor extent at conventional imaging but an accurate assessment at preoperative MRI.

2.2. MATERIALS AND METHODS

2.2.1. Patients and imaging

This prospective clinical study was performed after approval from the institutional review board. All patients were enrolled from a clinical breast unit and gave written informed consent. Patients were included if they had proven breast cancer (by fine-needle aspiration or core biopsy), and were considered eligible for BCT by a multidisciplinary team of breast cancer specialists, based on clinical examination and conventional imaging. All patients underwent mammography and ultrasonography as part of the clinical workup of breast cancer. In addition to the regular clinical workup, preoperative MRI was performed in all patients included in this study. Between March 1999 and October 2003, 165 patients with 166 malignant tumors (one patient with a bilateral tumor confirmed at MRI) were included.

Mammography was performed using a Trex Lorad M-IV (Trex Medical Corporation, Lorad Division, Danbury, CT) or a Philips Mammo Diagnost 3000 (Philips Medical Systems AG, Hamburg, Germany) and Agfa HDR Film (Agfa Gevaert NV, Mortsel, Belgium). Both breasts were imaged in the craniocaudal and mediolateral oblique views. The largest mammographic diameter in these views was measured and corrected for the magnification factor. Ultrasonography was performed with a GE Voluson 730 (GE Medical Systems, Milwaukee, WI) or a Siemens Sonoline Elegra (Siemens Medical Systems AG, Erlangen, Germany). The direction in which the lesion diameter was largest was determined and this diameter was measured; the probe was then rotated 90° to measure the largest diameter perpendicular to this direction. All measurements were printed on hard copies.
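The text does not specify how the mammographic magnification factor was determined. A minimal sketch, assuming the usual geometric definition of magnification (source-to-image distance divided by source-to-object distance) and purely hypothetical distances, shows the arithmetic of the correction:

```python
def corrected_diameter(measured_mm: float, sid_mm: float, sod_mm: float) -> float:
    """Convert a diameter measured on the film to the size in the breast.

    Assumes geometric magnification M = SID / SOD (source-to-image distance over
    source-to-object distance). The distances are hypothetical inputs, not values
    reported in this study.
    """
    if sod_mm <= 0 or sid_mm < sod_mm:
        raise ValueError("expected 0 < SOD <= SID")
    magnification = sid_mm / sod_mm
    return measured_mm / magnification


# Hypothetical geometry: M = 660 / 600 = 1.1, so 22 mm on film is about 20 mm in the breast.
print(round(corrected_diameter(22.0, sid_mm=660.0, sod_mm=600.0), 1))  # 20.0
```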

MRI was performed with a 1.5 T Siemens Magnetom scanner (Siemens Medical Systems AG, Erlangen, Germany) using a coronal fast low-angle shot three-dimensional (FLASH 3D) technique. Both breasts were imaged with the patient in prone position, using a dedicated phased-array bilateral breast coil. One series was acquired before and four series after intravenous injection of contrast agent (ProHance, Bracco-Byk Gulden, Konstanz, Germany; 0.1 mmol/kg body weight, at a rate of 2–4 ml/s). The series were acquired at intervals of approximately 120 s, chosen to obtain theoretically optimal time points for describing contrast uptake in the lesion 34-37 and to allow isotropic voxel sizes. The following MRI parameters were used: T1-weighted sequence, repetition time (TR) 8.1 ms, echo time (TE) 4.0 ms, reconstructed in-plane matrix 256 × 256 pixels, in-plane resolution 1.35 × 1.35 mm2, slice thickness 1.35 mm (isotropic voxels), no fat suppression. Subtraction images were reformatted and displayed in three perpendicular planes (coronal, transverse, and sagittal) on a viewing station to examine initial and late enhancement 38,39. The largest tumor diameter at MRI was measured on the initial-enhancement images in the coronal, transverse, and sagittal directions by a senior radiologist experienced in breast MRI. If more than one suspicious lesion was visible, the total area, including the normal breast tissue between the lesions, was measured. This was also done for all multicentric lesions, regardless of the distance between them.
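The rule for several suspicious lesions (measure the whole region, including the normal tissue between the lesions) can be illustrated with a small sketch. It rests on assumptions not stated in the text: each lesion is represented by hypothetical axis-aligned bounds in millimetres along the three viewing axes, and the extent is taken as the largest span of the region enclosing all lesions along any of these axes.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
Bounds = Tuple[Point, Point]  # (min corner, max corner) in mm; hypothetical representation


def combined_extent(lesions: List[Bounds]) -> float:
    """Largest span, along any of the three axes, of the region enclosing all lesions.

    Using the enclosing region means the normal tissue between lesions is included,
    however far apart the lesions are.
    """
    if not lesions:
        raise ValueError("at least one lesion is required")
    lower = [min(lo[k] for lo, _ in lesions) for k in range(3)]
    upper = [max(hi[k] for _, hi in lesions) for k in range(3)]
    return max(u - l for u, l in zip(upper, lower))


# Two hypothetical foci 15 mm apart: the extent spans both foci and the gap between them.
foci = [((0, 0, 0), (10, 8, 6)), ((25, 2, 1), (32, 9, 5))]
print(combined_extent(foci))  # 32
```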

The results of conventional imaging and MRI were discussed by a multidisciplinary team of breast cancer specialists, consisting of breast surgeons, radiotherapists, pathologists, and radiologists, all specialized in breast cancer diagnosis and treatment. Based on the results of all three imaging modalities, this team decided whether the patient was advised to undergo BCT or mastectomy, using previously published guidelines for reference 40,41. Typically, large breast tumors in relatively small breasts were not considered eligible for BCT. The surgical excision was planned taking the results of the MR examination into account; changes in the surgical plan were based on pathology-proven findings only.

2.2.2. Histology

All excision specimens were handled at the pathology department according to a standard protocol based on correlated radiographic and pathologic assessment, comparable to the approach of Egan 42. Briefly, specimens were cut into 5 mm parallel slices and a radiograph of the slices was obtained. Complete cross-sections of the tumor, including surrounding breast tissue and nearest margins, were sampled, and all additional grossly or radiologically suspicious areas (as defined by a radiologist and a pathologist experienced in breast pathology) were sampled for microscopic examination. The total microscopic extent of the tumor cells, as measured by the pathologist in or across slices, was used as the gold standard.


2.2.3. Complementary value of MRI to determine tumor extent

2.2.3.1. All lesions

Tumor extent at conventional imaging and that at MRI were compared with tumor extent at histology.

A senior radiologist experienced in breast imaging reviewed all mammograms and ultrasound images. The radiologist was told that all images were of patients with breast cancer; no additional information was given. The mammograms were evaluated for breast density (almost entirely fat or scattered fibroglandular tissue, BI-RADS breast density categories 1 or 2, versus heterogeneously dense or extremely dense, BI-RADS breast density categories 3 or 4) and for the presence of suspicious microcalcifications in or around the lesion. For mass lesions, the shape (irregular / not irregular), the margin (spiculated or ill-defined / not spiculated or ill-defined), the largest mammographic diameter, and the largest diameter including spiculae and/or suspicious microcalcifications were determined. The ultrasound images were also evaluated: for mass lesions, the shape (irregular / not irregular), margin (sharp / narrow irregular / broad irregular), orientation (parallel to the skin / not parallel to the skin), posterior acoustic phenomenon (none or enhancement / attenuation / shadowing), and the largest ultrasonographic diameter were assessed. Lesions that showed no abnormality or an architectural distortion at mammography or at ultrasonography were defined as non-measurable at that modality, and no measurements of tumor extent were made. Tumor extent at conventional imaging was defined as the largest diameter of the tumor at ultrasonography or, if the mammographic diameter including suspicious microcalcifications was larger, the diameter at mammography.
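The definition of tumor extent at conventional imaging amounts to taking the larger of the two diameters. A minimal sketch follows; how a lesion that was non-measurable at only one modality was handled is not stated in this paragraph, so the fallback to the other modality (and the use of `None` for "non-measurable") is an assumption.

```python
from typing import Optional


def conventional_extent(mammo_mm: Optional[float], us_mm: Optional[float]) -> Optional[float]:
    """Tumor extent at conventional imaging.

    mammo_mm: largest mammographic diameter including suspicious microcalcifications
    us_mm:    largest ultrasonographic diameter
    None marks a modality at which the lesion was non-measurable (assumption).
    """
    measured = [d for d in (mammo_mm, us_mm) if d is not None]
    if not measured:
        return None  # non-measurable at both modalities
    # The ultrasound diameter is used unless the mammographic one is larger,
    # which is the same as taking the maximum of the available diameters.
    return max(measured)


print(conventional_extent(47.0, 22.0))  # 47.0 (diameters of the Fig. 2.3 case)
```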

A pathologist experienced in breast pathology reviewed the histological slides of the 166 tumors. The largest diameter of the invasive tumor, largest diameter including ductal-in-situ components, histological type and grade, as well as the growth pattern of the lesion (unifocal, diffuse or multinodular) were assessed. Tumor extent at histology was defined as the largest diameter of the area involved by the tumor, including additional tumor foci, ductal-in-situ components, and healthy tissue between malignant lesions measured in and across slices. If a reexcision or a mastectomy was performed after the initial breast-conserving surgery, the total area involved by tumor cells (including healthy tissue between two malignant lesions) was reconstructed from both excisions.

Based on a statistical analysis of random variations 43, a difference in tumor extent of more than 10 mm between imaging and pathology was defined as a significant discrepancy, indicating underestimation or overestimation by preoperative imaging. Cases that were histologically multinodular but appeared unifocal and significantly smaller at imaging were defined as underestimated by imaging. Lesions that were non-measurable at imaging were also considered to be underestimated at that modality.
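The 10 mm criterion can be written as a three-way classification. In this sketch (labels "under", "over" and "correct" are our own), a histologically multinodular lesion that appears unifocal and smaller at imaging falls into the "under" branch automatically, assuming its histological extent includes the tissue between the foci, as defined in the histology paragraph above.

```python
from typing import Optional

THRESHOLD_MM = 10.0  # discrepancy threshold used in this chapter


def classify_extent(imaging_mm: Optional[float], histology_mm: float,
                    threshold_mm: float = THRESHOLD_MM) -> str:
    """Classify an imaging measurement of tumor extent against histology.

    Returns "under", "over" or "correct". A difference of exactly 10 mm counts as
    correct, because only differences of more than 10 mm are discrepancies.
    None (non-measurable at this modality) counts as underestimation.
    """
    if imaging_mm is None:
        return "under"
    diff = imaging_mm - histology_mm  # negative: imaging smaller than histology
    if diff < -threshold_mm:
        return "under"
    if diff > threshold_mm:
        return "over"
    return "correct"


print(classify_extent(20, 35), classify_extent(30, 35), classify_extent(None, 12))
# under correct under
```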


MRI was defined to have complementary value if conventional imaging underestimated or overestimated the tumor extent, and the extent at MRI agreed with the tumor extent at histology. MRI did not have complementary value to assess tumor extent if: (1) conventional imaging and MRI both accurately assessed tumor extent; (2) conventional imaging and MRI both underestimated or overestimated tumor extent; (3) conventional imaging accurately assessed tumor extent, while MRI underestimated or overestimated the tumor extent.
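The definition, including the three cases in which MRI has no complementary value, reduces to a single condition: conventional imaging is wrong and MRI is right. The sketch below uses our own labels ("correct", "under", "over") and checks the condition against the counts of Table 2.1 reported in the Results.

```python
from collections import Counter


def mri_complementary(conventional: str, mri: str) -> bool:
    """True only if conventional imaging under- or overestimated tumor extent
    and MRI was correct; every other combination has no complementary value."""
    return conventional != "correct" and mri == "correct"


# Cross-tabulation of Table 2.1 (conventional imaging, MRI) -> number of cases.
table_2_1 = Counter({
    ("correct", "correct"): 111, ("correct", "under"): 2, ("correct", "over"): 4,
    ("under", "correct"): 33, ("under", "under"): 8, ("under", "over"): 1,
    ("over", "correct"): 6, ("over", "under"): 0, ("over", "over"): 1,
})
n_total = sum(table_2_1.values())
n_complementary = sum(n for (conv, mri), n in table_2_1.items()
                      if mri_complementary(conv, mri))
print(n_complementary, n_total, f"{n_complementary / n_total:.0%}")  # 39 166 23%
```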

2.2.3.2. Measurable lesion at both mammography and ultrasonography

For patients with a measurable lesion at both mammography and ultrasonography, analyses were performed to determine which clinical and imaging characteristics were associated with complementary value of preoperative MRI. Analyzed patient/tumor characteristics were: age and tumor type (invasive lobular carcinoma / no invasive lobular carcinoma). Mammographical characteristics were: breast density, shape, margin, presence of suspicious microcalcifications, and largest diameter at mammography. Ultrasonographical characteristics were: shape, margin, orientation, edge shadows, posterior shadowing, and largest diameter. In addition, the absolute value of the discrepancy between the tumor extent measured at mammography (including spiculae and suspicious microcalcifications) and that at ultrasonography was analyzed.

2.2.4. Statistical analysis

SPSS 10.0 and MATLAB 6.0 R12 were used for all analyses. A significance level of 0.05 was used. Univariate analyses were performed with t tests for continuous, normally distributed characteristics, with exact Mann-Whitney U tests for non-normally distributed continuous characteristics, and with Fisher's exact tests for binary categorical characteristics. The McNemar test for paired proportions was used to compare the accuracy of MRI with that of conventional imaging.
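In the paired comparison of accuracy, only the discordant cases (one method correct, the other not) carry information. The sketch below uses the exact (binomial) form of McNemar's test on the discordant counts of Table 2.1; whether the original analysis used this form or the chi-square approximation is not stated, so it is an illustration rather than a reproduction.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar test.

    b, c: numbers of discordant pairs (method 1 correct and method 2 wrong, and vice versa).
    Under H0 each discordant pair falls either way with probability 1/2, so
    min(b, c) is compared with a Binomial(b + c, 1/2) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Table 2.1: conventional correct but MRI wrong = 2 + 4 = 6;
#            conventional wrong but MRI correct = 33 + 6 = 39.
p = mcnemar_exact(6, 39)
print(f"p = {p:.1e}")  # far below 0.001, in line with the reported p < 0.001
```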

For multivariate analysis, logistic regression with feature selection by double cross-validation was employed (see, e.g., ref. 44). This approach aims to obtain a combination of characteristics that is unlikely to be predictive by chance. The analysis consists of validation in an inner loop and in an outer loop (double validation). In the outer loop, each case was left out in turn. In the inner loop, 100 ten-fold cross-validations were performed for each feature combination. In each cross-validation, a logistic model was built from the training set, and the area (Az) under the receiver operating characteristic (ROC) curve was used to quantify the performance of the model in the test set 45. The feature combination that yielded the best performance was retrained on all cases in the inner loop and subsequently applied to the left-out case in the outer loop to produce a posterior probability of "complementary value of MRI". This procedure was repeated until all cases had been tested in the outer loop. The feature combination selected most frequently was chosen as the predictive model, and the performance of this model was quantified by the area under the ROC curve obtained from the posterior probabilities assigned to each case. Note that an Az value of 1.0 indicates perfect discrimination between patients in whom MRI has complementary value in defining tumor extent and those in whom it does not, whereas an Az value of 0.5 corresponds to chance.
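The nested procedure can be sketched as follows. This is a simplified illustration, not the authors' MATLAB implementation: logistic regression is fitted by plain gradient descent, the candidate feature combinations are all subsets up to a chosen size, and within each inner repetition the test-fold predictions are pooled before Az is computed (the text computes Az per test set; pooling avoids folds that contain only one class). The 100 ten-fold cross-validations of the text correspond to `folds=10, reps=100`; with those settings this pure-Python sketch is slow and is meant only to show the structure.

```python
import itertools
import math
import random
from collections import Counter


def auc(scores, labels):
    """Az via the Mann-Whitney statistic: P(positive scores higher than negative), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def predict(w, x):
    """Posterior probability from a logistic model; w[0] is the intercept."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))  # avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, iters=200, lr=0.1):
    """Unregularised logistic regression by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(iters):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def columns(X, cols):
    return [[row[c] for c in cols] for row in X]


def repeated_cv_auc(X, y, cols, folds, reps, rng):
    """Inner loop: mean Az of one feature combination over `reps` k-fold cross-validations."""
    total = 0.0
    for _ in range(reps):
        order = list(range(len(y)))
        rng.shuffle(order)
        scores = [0.0] * len(y)
        for k in range(folds):
            test = set(order[k::folds])
            train = [i for i in order if i not in test]
            w = fit_logistic(columns([X[i] for i in train], cols), [y[i] for i in train])
            for i in test:
                scores[i] = predict(w, [X[i][c] for c in cols])
        total += auc(scores, y)
    return total / reps


def double_cv(X, y, max_features=3, folds=10, reps=100, seed=0):
    """Outer leave-one-out loop around the inner feature selection.

    Returns the most frequently selected feature combination and the Az of the
    posterior probabilities assigned to the left-out cases.
    """
    rng = random.Random(seed)
    combos = [c for r in range(1, max_features + 1)
              for c in itertools.combinations(range(len(X[0])), r)]
    chosen, posteriors = Counter(), []
    for out in range(len(y)):
        keep = [i for i in range(len(y)) if i != out]
        X_in, y_in = [X[i] for i in keep], [y[i] for i in keep]
        best = max(combos, key=lambda c: repeated_cv_auc(X_in, y_in, c, folds, reps, rng))
        chosen[best] += 1
        w = fit_logistic(columns(X_in, best), y_in)  # retrain on all inner cases
        posteriors.append(predict(w, [X[out][c] for c in best]))
    return chosen.most_common(1)[0][0], auc(posteriors, y)
```

Here `X` would hold one row of candidate characteristics per patient and `y` would be 1 where MRI had complementary value; the function returns the selected combination (as column indices) and its double-validated Az.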

2.3. RESULTS

2.3.1. Patients and findings

The mean age of the patients was 55 years (range 28 – 86 years). Breast density at mammography was almost entirely fat in 10 (6%), showed scattered fibroglandular tissue in 65 (39%), was heterogeneously dense in 79 (48%), and was extremely dense in 12 cases (7%). A suspicious abnormality was seen at mammography in 158 cases (95%), and at ultrasonography in 159 cases (96%). Thirty-one tumors (19%) were non-measurable in at least one of the two conventional imaging modalities: three lesions were non-measurable at both mammography and ultrasonography, 18 lesions were non-measurable only at mammography, and 10 were non-measurable at ultrasonography only.

MRI showed all but one of the 166 tumors (99%) and detected a contralateral malignancy in three patients (2%). In 22 patients (13%), MRI detected at least one additional benign lesion 46. These lesions were either pathology-proven prior to surgery by ultrasound-guided biopsy or had a low probability of malignancy based on their MRI features and were followed up 46.

One hundred and thirty-five patients were eligible for BCT based on conventional imaging and MRI. The remaining 31 patients underwent mastectomy due to the larger tumor extent found at MRI. All patients who underwent mastectomy did indeed have a larger tumor extent at histological examination.

The histological type of the 166 tumors was invasive ductal carcinoma in 138 (83%), invasive lobular carcinoma in 25 (15%), and ductal carcinoma in situ (DCIS) in three cases (2%) (two with microinvasion and one pure DCIS). Twenty-three tumors had a multinodular or diffuse growth pattern (14%) and 143 tumors were unifocal (86%). The median histological tumor size of these unifocal lesions was 16 mm (range 5 – 80 mm). The excision of six invasive cancers was focally incomplete (4%). Seventeen of the 163 invasive tumors were associated with an extensive intraductal component (10%). In eight of these cases, the excision was extensively incomplete for the in-situ component (47%), in one case, it was focally incomplete (6%), and the remaining eight cases were radically excised (47%). One hundred and thirty-two patients had clinically and ultrasonographically tumor-negative axillary lymph nodes and underwent a sentinel node procedure. Thirty-three patients had tumor-positive axillary lymph nodes proven preoperatively and underwent an axillary lymph node dissection. Ninety-two patients (55%) had a histologically tumor-negative axilla, and 74 (45%) had a tumor-positive axilla.


[Figs. 2.1 and 2.2: bar charts of the number of lesions (vertical axis, 0 to 140) per interval of the difference between tumor extent at imaging and at histology (horizontal axis, mm; intervals < -15, [-15, -5), [-5, 5), [5, 15), ≥ 15). Fig. 2.1 shows conventional imaging; Fig. 2.2 shows MRI.]

2.3.2. Complementary value of MRI to determine tumor extent

2.3.2.1. All lesions

Figs. 2.1 and 2.2 show the difference between the tumor extent measured at preoperative imaging and at histology. Table 2.1 compares the accuracy of tumor extent measurement at conventional imaging with that at MRI. Conventional imaging assessed tumor extent correctly in 117 of 166 cases (70%), underestimated the extent in 42 (25%), and overestimated it in seven cases (4%). MRI assessed tumor extent correctly in 150 of 166 cases (90%), underestimated the extent in ten (6%), and overestimated it in six cases (4%). MRI assessed tumor extent correctly significantly more often than conventional imaging (90% versus 70%; p < 0.001). Conventional imaging and MRI were both accurate in 111 of the 166 cases (67%).

Fig. 2.1. Difference between tumor extent measured at conventional imaging and measured at histology (N = 166). A difference of –15 means that tumor extent is underestimated at conventional imaging by 15 mm compared to histology measurements.

Fig. 2.2. Difference between tumor extent measured at MRI and measured at histology (N = 166). A difference of –15 means that tumor extent is underestimated at MRI by 15 mm compared to histology measurements.


Table 2.1. Tumor extent measured at conventional imaging and at MRI. Underestimation and overestimation are defined as a difference between the measurement of tumor extent at imaging and that at histology of less than –10 mm or more than 10 mm, respectively. Rows: conventional imaging; columns: MRI.

Conventional imaging | MRI correct | MRI underestimation | MRI overestimation | Total
Correct | 111 | 2 | 4 | 117
Underestimation | 33* | 8 | 1 | 42
Overestimation | 6* | 0 | 1 | 7
Total | 150 | 10 | 6 | 166

*Together, these 39 cases showed complementary value of MRI to assess tumor extent.

In 39 of the 166 proven breast cancers (23%), MRI had complementary value in defining the tumor extent. Conventional imaging underestimated the extent in 33 of these cases (85%) and overestimated it in six (15%). MRI had complementary value in 11 of the 31 lesions that were non-measurable at mammography or at ultrasonography (35%) and in 28 of the 135 lesions that were measurable at both modalities (21%) (p = 0.10).
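Comparing 11 of 31 non-measurable lesions with 28 of 135 measurable lesions is a 2 × 2 comparison of proportions, for which the Methods specify Fisher's exact test. Below is a self-contained sketch of the two-sided test, using the common convention of summing the probabilities of all tables with the same margins that are no more likely than the observed one; the result can be compared with the reported p = 0.10, although the exact convention used by SPSS is not stated here.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    With the margins fixed, the top-left cell follows a hypergeometric
    distribution; the p value sums the probabilities of all tables that are
    no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Rows: non-measurable / measurable; columns: complementary value yes / no.
p = fisher_exact_two_sided(11, 31 - 11, 28, 135 - 28)
print(f"p = {p:.2f}")
```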

In eight of the 166 cases (5%), the tumor extent was underestimated at both conventional imaging and MRI. Mean underestimation in these cases was 19.1 mm (range 15.6–26.1 mm) at conventional imaging and 16.5 mm (range 11.0–23.8 mm) at MRI. Seven of the eight underestimated cases (88%) were associated with an extensive intraductal component. In four of these cases, no microcalcifications were visible outside the lesion at mammography.

In six of the 166 cases (4%), conventional imaging assessed tumor extent accurately, but MRI underestimated (two cases, 1%) or overestimated (four cases, 2%) the extent. Mean overestimation at MRI was 11.4 mm (range 10.5–12.0 mm). No histopathologic correlate could be found for this overestimation.

2.3.2.2. Measurable lesion at both mammography and ultrasonography

Table 2.2 shows the accuracy of the measurement of tumor extent at conventional imaging and at MRI for the 135 patients with a measurable lesion at both mammography and ultrasonography. In this subset of patients, 28 of 135 patients had complementary value of preoperative MRI in the assessment of tumor extent (21%). Seven of the 135 cases were underestimated both at conventional imaging and at MRI (5%). Four of the 135 cases were overestimated at MRI but not at conventional imaging (3%).


Table 2.2. Tumor extent measured at conventional imaging and at MRI for the 135 lesions eligible for breast-conserving therapy that were measurable at both mammography and ultrasonography. Rows: conventional imaging; columns: MRI.

Conventional imaging | MRI correct | MRI underestimation | MRI overestimation | Total
Correct | 93 | 2 | 4 | 99
Underestimation | 25* | 7 | 0 | 32
Overestimation | 3* | 0 | 1 | 4
Total | 121 | 9 | 5 | 135

*Together, these 28 cases showed complementary value of MRI to assess tumor extent.

The results of the univariate analysis of characteristics for the 135 lesions are shown in Table 2.3. Two characteristics were found to be significantly associated with complementary value of MRI over conventional imaging to define tumor extent: an irregular shape at mammography and a large difference in the measurement of tumor extent between mammography and ultrasonography.

In multivariate analysis, a subset of three characteristics was found to be associated with inaccurate assessment of tumor extent at conventional imaging and accurate assessment at MRI: age, margin of the lesion at mammography, and discrepancy in tumor extent (including spiculae and suspicious microcalcifications) between mammography and ultrasonography. Patients younger than 58 years with irregular or spiculated lesion margins at mammography and a discrepancy in extent between mammography and ultrasonography of at least 10 mm had a 50% probability that MRI had complementary value over conventional imaging, versus 16% in the remaining patients (3.2 times higher probability; positive predictive value 50%, negative predictive value 84%). The area under the ROC curve was 0.78 in our study population and 0.73 after double validation, indicating a significant improvement over random selection (p = 0.0002).
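The subgroup rule and its summary statistics can be expressed in a few lines. In this sketch, the rule encodes the three characteristics as stated in the paragraph above (the 10 mm cut-off applied to the absolute difference between the mammographic and ultrasonographic extent). The relative probability is the rate of complementary value in rule-positive patients (PPV) divided by the rate in rule-negative patients (1 - NPV); with the rounded values reported, 0.50 / 0.16 is about 3.1, so the reported 3.2 presumably reflects unrounded proportions. The counts in the usage example are hypothetical, not from the study.

```python
def mri_rule(age_years: float, mammo_margin_irregular: bool,
             mammo_extent_mm: float, us_extent_mm: float) -> bool:
    """Rule-positive: younger than 58 years, irregular or spiculated margin at
    mammography, and a mammographic extent (including spiculae and suspicious
    microcalcifications) differing from the ultrasonographic extent by >= 10 mm."""
    return (age_years < 58
            and mammo_margin_irregular
            and abs(mammo_extent_mm - us_extent_mm) >= 10)


def predictive_values(tp: int, fp: int, fn: int, tn: int):
    """PPV, NPV and the ratio of the complementary-value rate in rule-positive
    versus rule-negative patients, i.e. PPV / (1 - NPV)."""
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv, ppv / (1 - npv)


# Fig. 2.3 case: 43 years, spiculated lesion, 47 mm at mammography vs 22 mm at ultrasonography.
print(mri_rule(43, True, 47, 22))  # True
# Hypothetical counts: PPV 0.5, NPV 0.8, ratio about 2.5.
print(predictive_values(tp=5, fp=5, fn=10, tn=40))
```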

Figs. 2.3 and 2.4 show two examples of the complementary value of MRI.

Table 2.3. Univariate testing of clinical, mammographical and ultrasonographical characteristics for 135 patients with a measurable lesion at both mammography and ultrasonography, to identify characteristics that occur more frequently in patients in whom preoperative MRI has complementary value over conventional imaging to assess tumor extent.

Modality | Characteristic | p value
Clinical | Age | 0.35
Clinical | Tumor type | 0.12
Mammography | Breast density | 0.83
Mammography | Shape of the lesion | 0.04
Mammography | Margin of the lesion | 0.12
Mammography | Suspicious calcifications present | 1.00
Mammography | Largest diameter | 0.10
Ultrasonography | Shape of the lesion | 0.51
Ultrasonography | Margin of the lesion | 0.74
Ultrasonography | Orientation | 0.50
Ultrasonography | Edge shadows | 0.81
Ultrasonography | Posterior shadowing | 0.47
Ultrasonography | Largest diameter | 0.86
Conventional imaging | Absolute discrepancy between tumor extent at mammography (including spiculations and suspicious microcalcifications) and that at ultrasonography | < 0.001


Fig. 2.3. Patient of 43 years with a palpable mass in the left breast. A: Mediolateral oblique mammography shows a spiculated lesion. The diameter of the lesion including spiculae was 47 mm. B: Ultrasonography of the same lesion. The largest diameter measured at ultrasonography was 22 mm. C: Sagittal subtraction MR image of the same breast. MRI shows an additional lesion located near the nipple (arrow). This lesion turned out to be malignant.


Fig. 2.4. Patient of 39 years with a palpable mass in the left breast. A: Mammography in craniocaudal direction shows two spiculated lesions (arrows). The diameter of the total area was 37 mm. B: Ultrasonography of one of the two lesions (the second lesion was also visible at ultrasonography). C and D: Sagittal subtraction MR image and maximum intensity projection of the same breast. MRI shows multiple additional lesions throughout the whole breast. At histology, extensive ductal carcinoma in situ was found with multiple areas of microinvasion.


2.4. DISCUSSION

MRI of the breast is reported to have a high sensitivity for the detection of breast cancer and to be more accurate in defining tumor extent than conventional imaging 13-28. The purpose of the current study was to identify subgroups of patients prior to BCT in whom tumor extent is inaccurately assessed at conventional imaging and accurately assessed at MRI. In the majority of patients in whom MRI was more beneficial than conventional imaging to assess tumor extent (85%), the extent was underestimated at conventional imaging; in the remaining cases, it was overestimated. Underestimation of tumor extent by conventional imaging may result in incomplete excisions, leading to reexcisions or higher local recurrence rates 4,47-49. Overestimation of tumor extent at conventional imaging may result in excisions that are larger than required, leading to a poor cosmetic outcome 50,51. Accurate assessment by MRI may guide the surgeon to excise the tumor completely with smaller excision volumes and in fewer surgical procedures. MRI has the potential to provide complementary value in approximately one out of every four patients. In most cases, however, MRI does not provide additional information on tumor extent; it would, therefore, be useful to know in which patients MRI is more likely to have complementary value. The current study showed that younger patients with spiculated or irregular tumor margins and a large discrepancy in tumor extent measured at mammography and at ultrasonography had a 3.2 × higher chance of inaccurate assessment of tumor extent at conventional imaging and correct assessment by MRI than the remaining patients. However, this finding needs to be further validated in a prospective setting.

In addition to the 23% of cases in which MRI was accurate in the assessment of tumor extent and conventional imaging was not, there were 4% of cases in which MRI was inaccurate while conventional imaging was correct. No reason could be found for this observation. The reported incidence of overestimation of tumor extent at MRI ranges between 2% 18 and 39% 52. This wide range is caused by the different criteria that were used to define overestimation in these studies. A difference between two measurements of more than 50% was used in one study 18, whereas the criterion used in another study is not clearly stated 52. Possibly, any positive difference between two measurements was used to indicate overestimation in that study, which could explain its high percentage of overestimation by MRI. In the current study, a difference in tumor extent between imaging and pathology of more than 10 mm was defined as a significant difference. This value was used because our surgeons employ a safety margin of 10 mm around the tumor in BCT, and this margin may compensate for an underestimation of the extent at imaging of up to 10 mm compared with histology. A recently published study provides guidelines to determine which differences between measurements of tumor extent at histology and at radiology should be attributed to random measurement variation and which to actual underlying differences 43. Applying these guidelines to the current study, the probability of incorrectly classifying a difference in measurements as a true discrepancy at the 10 mm threshold is only 1.6%, 1.2%, and 0.9% for mammography, ultrasonography, and MRI, respectively.
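The guidelines of reference 43 are not reproduced in this chapter. As a hedged illustration of the kind of calculation involved, assume that a difference between an imaging and a histology measurement caused purely by random measurement variation is normally distributed with mean zero and standard deviation sigma. This normal model is an assumption of the sketch, not a statement of the method in reference 43. The chance that such a difference exceeds the 10 mm threshold in absolute value is then erfc(10 / (sigma * sqrt(2))).

```python
import math


def chance_exceedance(sigma_mm: float, threshold_mm: float = 10.0) -> float:
    """P(|D| > threshold_mm) for a purely random difference D ~ N(0, sigma_mm**2).

    Uses the identity P(|Z| > z) = erfc(z / sqrt(2)) for a standard normal Z.
    The normal model and the sigma values passed in are assumptions.
    """
    if sigma_mm <= 0:
        raise ValueError("sigma_mm must be positive")
    return math.erfc(threshold_mm / (sigma_mm * math.sqrt(2.0)))
```

With sigma = 5 mm, `chance_exceedance(5.0)` is about 0.046, and the chance falls quickly as sigma decreases. The sigma values here are illustrative, not the study's estimates.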


In 73% of patients, conventional imaging and MRI were either both accurate or both inaccurate in defining tumor extent. In the majority of the lesions that were underestimated at both conventional imaging and MRI, multinodular disease or an extensive intraductal component was found at histology.

One of the drawbacks of breast MRI is its lower specificity: benign lesions are also detected. In the current study, MRI detected benign lesions that were occult at conventional imaging in 13% of the patients. Such lesions are currently managed by correlative ultrasonography with ultrasound-guided biopsy 53, by MRI-guided biopsy with dedicated biopsy coils 54-61, or by a combination of radiological reading and computerized assessment 39. We emphasize that, in the current study, treatment was changed only after histopathological proof of the additional lesion had been obtained.

Several studies performed on patients with large tumors scheduled for either BCT or mastectomy showed that MRI assessed the tumor extent more accurately than conventional imaging 15,17,29-31. Conventional imaging especially underestimated the size of larger tumors, whereas MRI was accurate in the majority of cases. The results from these studies cannot, however, be easily extrapolated to a population consisting exclusively of patients eligible for BCT, because patients in the latter group typically have a smaller tumor load. In the current population, MRI was significantly more often accurate in assessing tumor extent than conventional imaging (90% versus 70%). This resulted in complementary value of MRI to assess tumor extent in 23% of the patients. In patients in whom the lesion was not visible at either mammography or ultrasonography, MRI was more often of complementary value for the assessment of tumor extent than in patients in whom the lesion was visible at both mammography and ultrasonography. These results suggest that MRI is of value in the preoperative workup of patients eligible for BCT, especially in patients with a (non-measurable) non-mass lesion on at least one of the imaging techniques. The trend towards increased complementary value in patients with a non-mass lesion may explain the finding that lesions presenting as architectural distortion at mammography are more often incompletely excised than masses 62-64. Performing MRI in these patients may result in a more adequate assessment of tumor extent and, thus, in a wider and possibly more radical excision.

Particular drawbacks of performing MRI in all patients eligible for BCT are the relatively high cost and the fact that complementary value applies to only a fraction of patients. Several studies have aimed at defining subgroups of patients with a higher likelihood of a change of treatment 32,33,65. Again, these studies were performed in a population of patients both eligible and not eligible for BCT, and after excisional biopsy. Moreover, complementary value of MRI in these studies differs from our definition in at least one important aspect: in 6% to 62% of all patients in whom treatment was changed to a wider excision, additional biopsy, or even mastectomy, the MRI findings proved to be benign 32,33,65. Therefore, in a large proportion of patients whose treatment was changed on the basis of MRI, MRI had a detrimental rather than a beneficial influence on patient management 66. In the current study, the complementary value of MRI was defined as the accurate assessment of malignant tumor extent at MRI, compared with histology, in cases where conventional imaging did not correctly assess tumor extent. We underline the importance of obtaining histopathological proof of malignancy before converting therapy to mastectomy.
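The accuracy criterion (a difference of more than 10 mm from histology counts as inaccurate) and the definition of complementary value used in this chapter can be expressed compactly. The sketch below assumes that one extent in mm is available per modality, with the MRI extent counting only lesions proven malignant. How mammography and ultrasonography are combined into a single conventional-imaging extent is outside this sketch, so that value is taken as an input. Function and type names are illustrative.

```python
from enum import Enum


class Assessment(Enum):
    ACCURATE = "accurate"
    UNDERESTIMATED = "underestimated"
    OVERESTIMATED = "overestimated"


def assess(imaging_mm: float, histology_mm: float, threshold_mm: float = 10.0) -> Assessment:
    """Classify an imaging extent against histology; only |difference| > threshold counts."""
    diff = imaging_mm - histology_mm
    if diff < -threshold_mm:
        return Assessment.UNDERESTIMATED
    if diff > threshold_mm:
        return Assessment.OVERESTIMATED
    return Assessment.ACCURATE


def complementary_value(conventional_mm: float, mri_malignant_mm: float,
                        histology_mm: float, threshold_mm: float = 10.0) -> bool:
    """MRI has complementary value when conventional imaging is inaccurate and MRI is accurate."""
    conventional = assess(conventional_mm, histology_mm, threshold_mm)
    mri = assess(mri_malignant_mm, histology_mm, threshold_mm)
    return conventional is not Assessment.ACCURATE and mri is Assessment.ACCURATE
```

A difference of exactly 10 mm is treated as accurate, matching the "more than 10 mm" criterion in the text.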

The percentage of DCIS in the current study (2%) is relatively low compared to the prevalence of DCIS in a general screening population of asymptomatic women (~15%). This difference may be explained in part by the inclusion of not only asymptomatic patients with lesions detected at screening, but also symptomatic patients with palpable lesions. In the first group, the percentage of DCIS is likely to be higher than in the latter group 67. In addition, not all patients with DCIS were eligible for BCT on the basis of conventional imaging, and, hence, were not included in the current study.

In conclusion, MRI in patients considered eligible for BCT has complementary value over conventional imaging to assess tumor extent in approximately one fourth of the patients. Especially in young patients with irregular lesion margins and a large discrepancy in mammographic and ultrasonographic measurements, preoperative MRI is likely to visualize tumor extent significantly more accurately than conventional imaging. How this complementary value translates to improvements in breast cancer treatment must be answered by prospective randomized trials with local failure, mortality, and cosmesis as endpoints.

ACKNOWLEDGMENTS

The authors thank A. A. M. Hart for proofing the statistical techniques. This work was financially supported by the Dutch Cancer Society, Grant No. NKI 99-2035.

REFERENCES

1 J. R. Harris, L. Botnick, W. D. Bloomer, J. T. Chaffey, and S. Hellman, "Primary radiation therapy for early breast cancer: the experience at The Joint Center for Radiation Therapy," Int J Radiat Oncol Biol Phys 7, 1549-1552 (1981).

2 B. Zafrani, P. Vielh, A. Fourquet, V. Mosseri, J. C. Durand, R. J. Salmon, and J. R. Vilcoq, "Conservative treatment of early breast cancer: prognostic value of the ductal in situ component and other pathological variables on local control and survival. Long-term results," Eur J Cancer Clin Oncol 25, 1645-1650 (1989).

3 U. Veronesi, B. Salvadori, A. Luini, M. Greco, R. Saccozzi, M. del Vecchio, L. Mariani, S. Zurrida, and F. Rilke, "Breast conservation is a safe method in patients with small cancer of the breast. Long-term results of three randomised trials on 1,973 patients," Eur J Cancer 31A, 1574-1579 (1995).

4 H. Bartelink, J. C. Horiot, P. Poortmans, H. Struikmans, W. Van den Bogaert, I. Barillot, A. Fourquet, J. Borger, J. Jager, W. Hoogenraad, L. Collette, and M. Pierart; European Organization for Research and Treatment of Cancer Radiotherapy and Breast Cancer Groups, "Recurrence rates after treatment of breast cancer with standard radiotherapy with or without additional radiation," N Engl J Med 345, 1378-1387 (2001).

5 C. Vrieling, L. Collette, A. Fourquet, W. J. Hoogenraad, J. H. Horiot, J. J. Jager, M. Pierart, P. M. Poortmans, H. Struikmans, B. Maat, E. Van Limbergen, and H. Bartelink, "The influence of patient, tumor and treatment factors on the cosmetic results after breast-conserving therapy in the EORTC ‘boost vs. no boost’ trial. EORTC Radiotherapy and Breast Cancer Cooperative Groups," Radiother Oncol 55, 219-232 (2000).

6 M. C. Smitt, K. W. Nowels, M. J. Zdeblick, S. Jeffrey, R. W. Carlson, F. E. Stockdale, and D. R. Goffinet, "The importance of the lumpectomy surgical margin status in long-term results of breast conservation," Cancer 76, 259-267 (1995).

7 T. J. Kearney and M. Morrow, "Effect of reexcision on the success of breast-conserving surgery," Ann Surg Oncol 2, 303-307 (1995).

8 P. I. Tartter, J. Kaplan, I. Bleiweiss, C. Gajdos, A. Kong, S. Ahmed, and D. Zapetti, "Lumpectomy margins, reexcision, and local recurrence of breast cancer," Am J Surg 179, 81-85 (2000).

9 S. J. Schnitt, J. L. Connolly, U. Khettry, G. Mazoujian, M. Brenner, B. Silver, A. Recht, G. Beadle, and J. R. Harris, "Pathologic findings on re-excision of the primary site in breast cancer patients considered for treatment by primary radiation therapy," Cancer 59, 675-681 (1987).

10 R. Holland, J. L. Connolly, R. Gelman, M. Mravunac, J. H. Hendriks, A. L. Verbeek, S. J. Schnitt, B. Silver, J. Boyages, and J. R. Harris, "The presence of an extensive intraductal component following a limited excision correlates with prominent residual disease in the remainder of the breast," J Clin Oncol 8, 113-118 (1990).

11 D. R. Faverly, J. H. Hendriks, and R. Holland, "Breast carcinomas of limited extent: frequency, radiologic-pathologic characteristics, and surgical margin requirements," Cancer 91, 647-659 (2001).

12 H. M. Zonderland, E. G. Coerkamp, J. Hermans, M. J. van de Vijver, and A. E. van Voorthuisen, "Diagnosis of breast cancer: contribution of US as an adjunct to mammography," Radiology 213, 413-422 (1999).

13 S. H. Heywang-Kobrunner, P. Viehweg, A. Heinig, and C. Kuchler, "Contrast-enhanced MRI of the breast: accuracy, value, controversies, solutions," Eur J Radiol 24, 94-108 (1997).

14 P. Viehweg, I. Paprosch, M. Strassinopoulou, and S. H. Heywang-Kobrunner, "Contrast-enhanced magnetic resonance imaging of the breast: interpretation guidelines," Top Magn Reson Imaging 9, 17-43 (1998).

15 C. Boetes, R. D. Mus, R. Holland, J. O. Barentsz, S. P. Strijk, T. Wobbes, J. H. Hendriks, and S. H. Ruys, "Breast tumors: comparative accuracy of MR imaging relative to mammography and US for demonstrating extent," Radiology 197, 743-747 (1995).

16 M. Van Goethem, K. Schelfout, E. Kersschot, C. Colpaert, I. Verslegers, I. Biltjes, W. A. Tjalma, J. Weyler, and A. De Schepper, "Enhancing area surrounding breast carcinoma on MR mammography: comparison with pathological examination," Eur Radiol 14, 1363-1370 (2004).

17 H. Mumtaz, M. A. Hall-Craggs, T. Davidson, K. Walmsley, W. Thurell, M. W. Kissin, and I. Taylor, "Staging of symptomatic primary breast cancer with MR imaging," AJR Am J Roentgenol 169, 417-424 (1997).

18 L. Esserman, N. Hylton, L. Yassa, J. Barclay, S. Frankel, and E. Sickles, "Utility of magnetic resonance imaging in the management of breast cancer: evidence for improved preoperative staging," J Clin Oncol 17, 110-119 (1999).

19 U. Fischer, L. Kopka, and E. Grabbe, "Breast carcinoma: effect of preoperative contrast-enhanced MR imaging on the therapeutic approach," Radiology 213, 881-888 (1999).

20 P. J. Drew, S. Chatterjee, L. W. Turnbull, J. Read, P. J. Carleton, J. N. Fox, J. R. Monson, and M. J. Kerin, "Dynamic contrast enhanced magnetic resonance imaging of the breast is superior to triple assessment for the pre-operative detection of multifocal breast cancer," Ann Surg Oncol 6, 599-603 (1999).

21 S. P. Weinstein, S. G. Orel, R. Heller, C. Reynolds, and B. Czerniecki, "MR Imaging of the Breast in Patients with Invasive Lobular Carcinoma," AJR Am J Roentgenol 176, 399-406 (2001).

22 K. Munot, B. Dall, R. Achuthan, G. Parkin, S. Lane, and K. Horgan, "Role of magnetic resonance imaging in the diagnosis and single-stage surgical resection of invasive lobular carcinoma of the breast," Br J Surg 89, 1296-1301 (2002).

23 L. Liberman, E. A. Morris, D. D. Dershaw, A. F. Abramson, and L. K. Tan, "MR imaging of the ipsilateral breast in women with percutaneously proven breast cancer," AJR Am J Roentgenol 180, 901-910 (2003).

24 P. J. Kneeshaw, L. W. Turnbull, A. Smith, and P. J. Drew, "Dynamic contrast enhanced magnetic resonance imaging aids the surgical management of invasive lobular breast cancer," Eur J Surg Oncol 29, 32-37 (2003).

25 M. L. Quan, L. Sclafani, A. S. Heerdt, J. V. Fey, E. A. Morris, and P. I. Borgen, "Magnetic resonance imaging detects unsuspected disease in patients with invasive lobular cancer," Ann Surg Oncol 10, 1048-1053 (2003).

26 U. Fischer, O. Zachariae, F. Baum, D. von Heyden, M. Funke, and T. Liersch, "The influence of preoperative MRI of the breasts on recurrence rate in patients with breast cancer," Eur Radiol 14, 1725-1731 (2004).


27 K. Schelfout, M. Van Goethem, E. Kersschot, I. Verslegers, I. Biltjes, P. Leyman, C. Colpaert, L. Thienpont, J. Van den Haute, J. P. Gillardin, W. Tjalma, P. Buytaert, and A. De Schepper, "Preoperative breast MRI in patients with invasive lobular breast cancer," Eur Radiol 14, 1209-1216 (2004).

28 M. Van Goethem, K. Schelfout, L. Dijckmans, J. C. Van Der Auwera, J. Weyler, I. Verslegers, I. Biltjes, and A. De Schepper, "MR mammography in the pre-operative staging of breast cancer in patients with dense breast tissue: comparison with mammography and ultrasound," Eur Radiol 14, 809-816 (2004).

29 P. L. Davis, M. J. Staiger, K. B. Harris, M. A. Ganott, J. Klementaviciene, K. S. McCarty Jr, and H. Tobon, "Breast cancer measurements with magnetic resonance imaging, ultrasonography, and mammography," Breast Cancer Res Treat 37, 1-9 (1996).

30 T. Davidson, H. Mumtaz, M. A. Hall-Craggs, M. W. Kissin, W. Thurell, and I. Taylor, "Impact of magnetic resonance imaging in determining surgical management in breast cancer," Breast 6, 177-182 (1997).

31 G. Amano, N. Ohuchi, T. Ishibashi, T. Ishida, M. Amari, and S. Satomi, "Correlation of three-dimensional magnetic resonance imaging with precise histopathological map concerning carcinoma extension in the breast," Breast Cancer Res Treat 60, 43-55 (2000).

32 J. E. Tan, S. G. Orel, M. D. Schnall, D. J. Schultz, and L. J. Solin, "Role of magnetic resonance imaging and magnetic resonance imaging-guided surgery in the evaluation of patients with early-stage breast cancer for breast conservation treatment," Am J Clin Oncol 22, 414-418 (1999).

33 I. Bedrosian, R. Mick, S. G. Orel, M. Schnall, C. Reynolds, F. R. Spitz, L. S. Callans, G. P. Buzby, E. F. Rosato, D. L. Fraker, and B. J. Czerniecki, "Changes in the surgical management of patients with breast carcinoma based on preoperative magnetic resonance imaging," Cancer 98, 468-473 (2003).

34 F. Sardanelli, A. Iozzelli, A. Fausto, A. Carriero, and M. A. Kirchin, "Gadobenate dimeglumine-enhanced MR imaging breast vascular maps: association between invasive cancer and ipsilateral increased vascularity," Radiology 235, 791-797 (2005).

35 F. Kelcz, E. Furman-Haran, D. Grobgeld, and H. Degani, "Clinical testing of high-spatial-resolution parametric contrast-enhanced MR imaging of the breast," AJR Am J Roentgenol 179, 1485-1492 (2002).

36 L. W. Nunes, S. A. Englander, R. Charafeddine, and M. D. Schnall, "Optimal post-contrast timing of breast MR image acquisition for architectural feature analysis," J Magn Reson Imaging 16, 42-50 (2002).

37 T. W. Vomweg, A. Teifke, R. P. Kunz, C. Hintze, A. Hlawatsch, A. Kern, K. F. Kreitner, and M. Thelen, "Combination of low and high resolution sequences in two orientations for dynamic contrast-enhanced MRI of the breast: more than a compromise," Eur Radiol 14, 1732-1742 (2004).

38 K. G. Gilhuijs, E. E. Deurloo, S. H. Muller, J. L. Peterse, and L. J. Schultze Kool, "Breast MR imaging in women at increased lifetime risk of breast cancer: clinical system for computerized assessment of breast lesions initial results," Radiology 225, 907-916 (2002).

39 E. E. Deurloo, S. H. Muller, J. L. Peterse, A. P. Besnard, and K. G. Gilhuijs, "Clinically and mammographically occult breast lesions on MR images: potential effect of computerized assessment on clinical reading," Radiology 234, 693-701 (2005).

40 J. A. van Dongen, A. C. Voogd, I. S. Fentiman, C. Legrand, R. J. Sylvester, D. Tong, E. van der Schueren, P. A. Helle, K. van Zijl, and H. Bartelink, "Long-term results of a randomized trial comparing breast conserving therapy with mastectomy: European Organization for Research and Treatment of Cancer 10801 trial," J Natl Cancer Inst 92, 1143-1150 (2000).

41 H. Bartelink, J. H. Borger, J. A. van Dongen, and J. L. Peterse, "The impact of tumor size and histology on local control after breast-conserving therapy," Radiother Oncol 11, 297-303 (1988).

42 R. L. Egan, "Multicentric breast carcinomas: clinical-radiographic-pathologic whole organ studies and 10-year survival," Cancer 49, 1123-1130 (1982).

43 W. F. Klein Zeggelink, E. E. Deurloo, H. Bartelink, E. J. Rutgers, and K. G. Gilhuijs, "Reproducibility of the assessment of tumor extent in the breast using multiple image modalities," Med Phys 30, 2919-2926 (2003).

44 E. E. Ntzani and J. P. Ioannidis, "Predictive ability of DNA microarrays for cancer outcomes and correlates: an empirical assessment," Lancet 362, 1439-1444 (2003).

45 C. E. Metz, "ROC methodology in radiologic imaging," Invest Radiol 21, 720-733 (1986).

46 E. E. Deurloo, J. L. Peterse, E. J. Rutgers, A. P. Besnard, S. H. Muller, and K. G. Gilhuijs, "Additional breast lesions in patients eligible for breast-conserving therapy by MRI: impact on preoperative management and potential benefit of computerised analysis," Eur J Cancer 41, 1393-1401 (2005).

47 C. C. Park, M. Mitsumori, A. Nixon, A. Recht, J. Connolly, R. Gelman, B. Silver, S. Hetelekidis, A. Abner, J. R. Harris, and S. J. Schnitt, "Outcome at 8 years after breast-conserving surgery and radiation therapy for invasive breast cancer: influence of margin status and systemic therapy on local recurrence," J Clin Oncol 18, 1668-1675 (2000).

48 J. M. Kurtz, R. Amalric, H. Brandone, Y. Ayme, J. Jacquemier, J. C. Pietra, D. Hans, J. F. Pollet, C. Bressac, and J. M. Spitalier, "Local recurrence after breast-conserving surgery and radiotherapy. Frequency, time course, and prognosis," Cancer 63, 1912-1917 (1989).

49 A. C. Voogd, G. van Tienhoven, H. L. Peterse, M. A. Crommelin, E. J. Rutgers, C. J. van de Velde, B. N. van Geel, A. Slot, P. T. Rodrigus, J. J. Jobsen, M. F. von Meyenfeldt, and J. W. Coebergh, "Local recurrence after breast conservation therapy for early stage breast carcinoma: detection, treatment, and outcome in 266 patients. Dutch Study Group on Local Recurrence after Breast Conservation (BORST)," Cancer 85, 437-446 (1999).

50 U. Veronesi, A. Luini, V. Galimberti, and S. Zurrida, "Conservation approaches for the management of stage I/II carcinoma of the breast: Milan Cancer Institute trials," World J Surg 18, 70-75 (1994).

51 D. E. Wazer, T. DiPetrillo, R. Schmidt-Ullrich, L. Weld, T. J. Smith, D. J. Marchant, and N. J. Robert, "Factors influencing cosmetic outcome and complication risk after conservative surgery and radiotherapy for early-stage breast carcinoma," J Clin Oncol 10, 356-363 (1992).

52 T. Hata, H. Takahashi, K. Watanabe, M. Takahashi, K. Taguchi, T. Itoh, and S. Todo, "Magnetic resonance imaging for preoperative evaluation of breast cancer: a comparative study with mammography and ultrasonography," J Am Coll Surg 198, 190-197 (2004).

53 L. R. LaTrenta, J. H. Menell, E. A. Morris, A. F. Abramson, D. D. Dershaw, and L. Liberman, "Breast lesions detected with MR imaging: utility and histopathologic importance of identification with US," Radiology 227, 856-861 (2003).

54 C. K. Kuhl, A. Elevelt, C. C. Leutner, J. Gieseke, E. Pakos, and H. H. Schild, "Interventional breast MR imaging: clinical use of a stereotactic localization and biopsy device," Radiology 204, 667-675 (1997).

55 G. A. DeAngelis, R. E. Moran, L. L. Fajardo, J. P. Mugler, J. M. Christopher, and J. A. Harvey, "MRI-guided needle localization: technique," Semin Ultrasound CT MR 21, 337-350 (2000).

56 L. F. Smith, R. Henry-Tillman, A. T. Mancino, A. Johnson, M. Price Jones, K. C. Westbrook, S. Harms, and V. S. Klimberg, "Magnetic resonance imaging-guided core needle biopsy and needle localized excision of occult breast lesions," Am J Surg 182, 414-418 (2001).

57 C. K. Kuhl, N. Morakkabati, C. C. Leutner, A. Schmiedel, E. Wardelmann, and H. H. Schild, "MR imaging-guided large core (14-gauge) needle biopsy of small lesions visible at breast MR imaging alone," Radiology 220, 31-39 (2001).

58 I. Bedrosian, J. Schlencker, F. R. Spitz, S. G. Orel, D. L. Fraker, L. S. Callans, M. Schnall, C. Reynolds, and B. J. Czerniecki, "Magnetic resonance imaging guided biopsy of mammographically and clinically occult breast lesions," Ann Surg Oncol 9, 457-461 (2002).


59 E. A. Morris, L. Liberman, D. D. Dershaw, J. B. Kaplan, L. R. LaTrenta, A. F. Abramson, and D. J. Ballon, "Preoperative MR imaging-guided needle localization of breast lesions," AJR Am J Roentgenol 178, 1211-1220 (2002).

60 P. Viehweg, A. Heinig, B. Amaya, T. Alberich, M. Laniado, and S. H. Heywang-Kobrunner, "MR-guided interventional breast procedures considering vacuum biopsy in particular," Eur J Radiol 42, 32-39 (2002).

61 L. Liberman, E. A. Morris, D. D. Dershaw, C. M. Thornton, K. J. Van Zee, and L. K. Tan, "Fast MRI-guided vacuum-assisted breast biopsy: initial experience," AJR Am J Roentgenol 181, 1283-1293 (2003).

62 C. Gajdos, P. I. Tartter, I. Bleiweiss, G. Hermann, J. de Csepel, A. Estabrook, and A. W. Rademaker, "Mammographic appearance of nonpalpable breast cancer reflects pathologic characteristics," Ann Surg 235, 246-251 (2002).

63 P. J. Beron, E. M. Horwitz, A. A. Martinez, K. J. Wimbish, A. J. Levine, G. Gustafson, P. Y. Chen, J. A. Ingold, and F. A. Vicini, "Pathologic and mammographic findings predicting the adequacy of tumor excision before breast-conserving therapy," AJR Am J Roentgenol 167, 1409-1414 (1996).

64 L. Liberman, S. L. Goodstine, D. D. Dershaw, E. A. Morris, L. R. LaTrenta, A. F. Abramson, and K. J. Van Zee, "One operation after percutaneous diagnosis of nonpalpable breast cancer: frequency and associated factors," AJR Am J Roentgenol 178, 673-680 (2002).

65 G. F. Tillman, S. G. Orel, M. D. Schnall, D. J. Schultz, J. E. Tan, and L. J. Solin, "Effect of breast magnetic resonance imaging on the clinical management of women with early-stage breast carcinoma," J Clin Oncol 20, 3413-3423 (2002).

66 M. Morrow, "Magnetic resonance imaging in breast cancer: one step forward, two steps back?," JAMA 292, 2779-2780 (2004).

67 V. L. Ernster, J. Barclay, K. Kerlikowske, D. Grady, and C. Henderson, "Incidence of and treatment for ductal carcinoma in situ of the breast," JAMA 275, 913-918 (1996).

SECTION III

Treatment implementation

CHAPTER 3

Assessment of analysis-of-variance-based methods to quantify the random variations of observers in medical imaging measurements: guidelines to the investigator

William F. A. Klein Zeggelink 1, Augustinus A. M. Hart 2, and Kenneth G. A. Gilhuijs 1

1 Department of Radiology, The Netherlands Cancer Institute / Antoni van Leeuwenhoek Hospital, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands 2 Department of Radiotherapy, The Netherlands Cancer Institute / Antoni van Leeuwenhoek Hospital, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands

Medical Physics, Vol. 31, No. 7, July 2004, pp. 1996 – 2007


ABSTRACT

The random variations of observers in medical imaging measurements negatively affect the outcome of cancer treatment and should be taken into account during treatment by applying safety margins derived from estimates of these random variations. Analysis-of-variance- (ANOVA-) based methods are the preferred techniques to assess the true individual random variations of observers, but the number of observers and the number of cases must be taken into account to achieve meaningful results. Our aim in this study is twofold: first, to evaluate three representative ANOVA-based methods for typical numbers of observers and cases; second, to establish guidelines for the investigator to determine which method, how many observers, and how many cases are required to obtain an a priori chosen performance.

The ANOVA-based methods evaluated in this study are an established technique (pairwise differences method: PWD), a new approach providing additional statistics (residuals method: RES), and a generic technique that uses restricted maximum likelihood (REML) estimation. Monte Carlo simulations were performed to assess the performance of the ANOVA-based methods, which is expressed by their accuracy (closeness of the estimates to the truth), their precision (standard error of the estimates), and the reliability of their statistical test for the significance of a difference in the random variation of an observer between two groups of cases.
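The PWD method is defined later in this chapter. The sketch below only illustrates the general idea behind pairwise-difference estimators, in the spirit of Grubbs' classical method, and should not be read as the exact PWD implementation evaluated here. It assumes that observer i adds independent random error with variance sigma_i^2 to every case, so that Var(x_i - x_j) = sigma_i^2 + sigma_j^2. Case-specific true values and constant offsets between observers do not affect this variance. With k >= 3 observers, the least-squares solution of these equations is sigma_i^2 = (R_i - T/(k - 1)) / (k - 2). Here R_i sums the pairwise variances involving observer i, and T sums all pairwise variances.

```python
from itertools import combinations
from statistics import variance


def pairwise_difference_variances(measurements):
    """Sample variance (n - 1 denominator) of x_i - x_j over cases, for every observer pair.

    measurements: list of cases; each case is a list of k readings, one per observer.
    """
    k = len(measurements[0])
    pair_var = {}
    for i, j in combinations(range(k), 2):
        diffs = [case[i] - case[j] for case in measurements]
        pair_var[(i, j)] = float(variance(diffs))
    return k, pair_var


def observer_variances(k, pair_var):
    """Least-squares solution of sigma_i^2 + sigma_j^2 = V_ij over all observer pairs."""
    if k < 3:
        raise ValueError("individual variances need at least three observers")
    total = sum(pair_var.values())  # T = (k - 1) * sum of all sigma_i^2
    estimates = []
    for i in range(k):
        row = sum(v for pair, v in pair_var.items() if i in pair)  # R_i
        estimates.append((row - total / (k - 1)) / (k - 2))
    return estimates
```

With two observers only the sum sigma_1^2 + sigma_2^2 is identifiable, which is why the function requires at least three. Sampling noise can make an estimate negative when an observer's true variance is small.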

The highest accuracy is achieved using REML estimation, but for datasets of at least 50 cases or arrangements with 6 or more observers, the differences between the methods are negligible, with deviations from the truth well below ±3%. For datasets up to 100 cases, it is most beneficial to increase the number of cases to improve the precision of the estimated random variations, whereas for datasets over 100 cases, an improvement in precision is most efficiently achieved by increasing the number of observers. For datasets of at least 50 cases, the standard error ranges between 30% or less with 3 observers down to 10% or less with 8 observers, and the differences in precision between the methods are negligible. The F test (PWD) is very anticonservative and should not be used, while the t test (RES) is reliable for datasets of at least 2×50 cases evaluated by 4 or more observers. The likelihood-ratio test (REML estimation) consistently indicates the significance of a difference in the random variation of an observer between two groups of cases, regardless of the number of cases, and regardless of the number of observers.
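A toy Monte Carlo study can be sketched in the same spirit. Here accuracy is the closeness of the mean estimate to the truth, and precision is the standard error of the estimates. The simulation design, the normal error model, the range of true extents, and the use of a pairwise least-squares estimator are assumptions of this sketch. It does not reproduce the simulations or numerical results reported in this chapter. It also reports relative precision on the variance scale (sigma_i^2) rather than the standard-deviation scale.

```python
import random
import statistics


def pairwise_ls_variances(data):
    """Pairwise-difference least-squares estimate of each observer's error variance (k >= 3)."""
    k = len(data[0])
    v = {}
    for i in range(k):
        for j in range(i + 1, k):
            v[(i, j)] = statistics.variance([row[i] - row[j] for row in data])
    total = sum(v.values())
    return [
        (sum(val for pair, val in v.items() if i in pair) - total / (k - 1)) / (k - 2)
        for i in range(k)
    ]


def monte_carlo(true_sd, n_cases, n_runs=500, seed=1):
    """Per observer: (mean estimate of sigma_i^2, SD of estimates divided by true sigma_i^2)."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        data = []
        for _ in range(n_cases):
            truth = rng.uniform(5.0, 60.0)  # hypothetical true extent in mm
            data.append([truth + rng.gauss(0.0, sd) for sd in true_sd])
        runs.append(pairwise_ls_variances(data))
    summary = []
    for i, sd in enumerate(true_sd):
        estimates = [run[i] for run in runs]
        summary.append((statistics.mean(estimates), statistics.stdev(estimates) / sd ** 2))
    return summary
```

Varying `n_cases` and the number of entries in `true_sd` lets one explore how precision depends on the numbers of cases and observers for this particular estimator.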

If a statistical package to perform REML estimation is available and the investigator feels confident using it, REML estimation is the preferred method for studies that involve fewer than 50 cases evaluated by fewer than 6 observers. Otherwise, the RES method is an excellent alternative because of its straightforward implementation, its completeness with respect to the provided statistics, and its overall sufficient accuracy, precision, and reliability of the provided statistical test. If neither the RES method nor REML estimation provides sufficient performance, either more observers or more cases must be included.


3.1. INTRODUCTION

The random variations of observers in medical imaging measurements negatively affect the accuracy of cancer treatment. Typical examples are the random variations in the radiological assessment of tumor extent, the random variations in the measurements of patient setup in external-beam radiotherapy, and the random variations in target delineations for radiotherapy treatment planning, see, e.g., Rasch et al. 1 In this context, observers are either multiple clinicians performing measurements using a single technique (e.g., computed tomography), or multiple techniques (e.g., film-screen mammography, full-field digital mammography, ultrasonography, etc.) used by one or more clinicians.

Unlike systematic deviations, random variations are difficult to correct for, except by improving standardization of the measurement process and, occasionally, by multiple readouts from the same or different observers. Therefore, random variations should be taken into account during treatment by the application of safety margins that are derived from estimates of the random variations. The estimates of these random variations can be obtained by performing an observer study. In addition to the random variations in the overall patient group, the investigator is often also interested in differences in the extent of the random variations of observers between groups of cases, or between the observers.

A straightforward way to assess the random variations is to let the observers perform measurements on the same cases repeatedly, and calculate the standard deviations of their findings, see, e.g., Hurkmans et al. 2. The advantage of such an approach is that the calculation of the random variations is fairly uncomplicated. A disadvantage is, however, the large workload involved if all available cases are included in the repeat measurements. For that reason, the study is typically performed using a limited set of cases from the available subset of the patient population. Furthermore, a bias may be introduced due to observer recollection. Moreover, studies have shown that clinicians may act differently in a study setting (i.e., under laboratory conditions), compared with their performance during actual clinical practice, see, e.g., Rutter et al. 3

Alternatively, the random variations of observers can be obtained by defining one of the observers as the gold standard, and calculating the standard deviations of the differences between the findings of each observer and the chosen gold standard, see, e.g., Boetes et al. 4 An advantage of such an approach is that data obtained during the regular clinical workup can be used, obviating the need for repeat measurements. This allows an efficient use of the available patient data and prevents bias due to observer recollection. The major disadvantage, however, is that the combined variations of each observer and the defined gold standard are obtained, rather than their individual random variations. Moreover, by definition, the random variation of the chosen gold standard itself cannot be assessed.

Analysis-of-variance- (ANOVA-) based methods have the major advantage that they do not require a gold standard, see, e.g., Gilhuijs et al. 5, Remeijer et al. 6, Heydorn et al. 7, Saarnak et al. 8, Ploeger et al. 9-10, and Klein Zeggelink et al. 11 Because the random variations are calculated independently of the systematic deviations, the true individual random variation of each observer is obtained. In addition, measurements do not need to be repeated, so bias due to observer recollection cannot occur, and data that are already available from daily clinical practice can be used. The set of cases thus represents the complete available sample of the patient population, and the random variations are obtained in the actual clinical workup, i.e., not under laboratory conditions.

Assessment of analysis-of-variance-based methods 55

Treatment implementation 55

Considering all these advantages, ANOVA-based methods are the preferred techniques to assess the true individual random variations of observers. Nevertheless, the performance of ANOVA-based methods depends on the number of observers and the number of cases in an observer study. The performance concerns their accuracy (closeness of the estimates to the truth), their precision (standard error of the estimates), and the reliability of their statistical test for the significance of a difference in the random variation of an observer between groups of cases. To our knowledge, no guidelines exist that indicate how many cases, how many observers, and which method should be used to achieve meaningful results.

Our aim in this study is twofold: first, to assess the performance of three representative ANOVA-based methods for typical numbers of observers and typical numbers of cases; second, to establish guidelines that help the investigator determine which method, how many observers, and how many cases are required to obtain an a priori chosen performance.

3.2. MATERIALS AND METHODS

The three analysis-of-variance- (ANOVA-) based methods evaluated in this study are an established technique (pairwise differences method: PWD), a new approach providing additional statistics (residuals method: RES), and a generic technique that uses restricted maximum likelihood (REML) estimation. All are based on the fixed-effects model.

3.2.1. Fixed-effects model

Consider a paired-cases, paired-readers study design with K observers, each performing measurements in a dataset of N cases:

$$O_{i,j} = \mu_i + m_j + (\delta_i)_j, \quad \text{with } 1 \le j \le K \text{ and } 1 \le i \le N. \qquad (3.1)$$

Here, $O_{i,j}$ indicates the measurement of case $i$ by observer $j$; $\mu_i$ represents the true but unknown value associated with case $i$; $m_j$ denotes the systematic measurement deviation from the truth of observer $j$; and $(\delta_i)_j$ indicates the random measurement deviation of observer $j$ for case $i$. The variation of the random measurement deviation $(\delta_i)_j$ of observer $j$ over all $N$ cases thus represents the random variation of observer $j$ in the dataset, which can be expressed by either its variance or its standard deviation.


The total dataset of all measurements of all N cases by all K observers is written as:

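$$\mathbf{D} = \begin{pmatrix}
O_{1,1} & O_{1,2} & \cdots & O_{1,K} \\
O_{2,1} & O_{2,2} & \cdots & O_{2,K} \\
\vdots & \vdots & \ddots & \vdots \\
O_{N,1} & O_{N,2} & \cdots & O_{N,K}
\end{pmatrix}. \qquad (3.2)$$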
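To experiment with the methods described below, datasets that follow Eq. (3.1) can be generated with a short Python sketch. The normal distribution of the true values $\mu_i$ (parameters `mu_mean` and `mu_sd`), the normally distributed random deviations, and the function name `simulate_dataset` are illustrative assumptions, not part of the original study design.

```python
import random


def simulate_dataset(n_cases, sigmas, biases, mu_mean=0.0, mu_sd=1.0, seed=None):
    """Simulate the total dataset D of Eq. (3.2) under the model of Eq. (3.1).

    sigmas[j]: SD of the random deviation (delta_i)_j of observer j.
    biases[j]: systematic deviation m_j of observer j.
    Returns a list of N rows; row i holds O_{i,1}, ..., O_{i,K}.
    """
    if len(sigmas) != len(biases):
        raise ValueError("sigmas and biases need one entry per observer")
    rng = random.Random(seed)
    data = []
    for _ in range(n_cases):
        mu_i = rng.gauss(mu_mean, mu_sd)  # true but unknown value of case i
        # O_{i,j} = mu_i + m_j + (delta_i)_j for every observer j
        row = [mu_i + m_j + rng.gauss(0.0, s_j) for s_j, m_j in zip(sigmas, biases)]
        data.append(row)
    return data
```

For example, `simulate_dataset(50, [3.0, 3.0, 3.0], [0.0, 1.0, -0.5], mu_mean=20.0, mu_sd=10.0, seed=1)` yields 50 cases measured by 3 observers with a random variation of 3.0 each.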

3.2.2. Pairwise differences method

The pairwise differences (PWD) method was introduced by our group and applied in several studies, see, e.g., Gilhuijs et al. 5, Ploeger et al. 9-10, and Klein Zeggelink et al. 11 The method provides a straightforward way to calculate the random variations of observers, by examining the variations of the differences between the measurements of the observers. Here, we describe the PWD method in a general form.

Consider the model described by Eq. (3.1) and the total dataset $\mathbf{D}$ of all measurements given by Eq. (3.2). For each case $i$, the total set of pairwise differences between the measurements of the $K$ observers is given by:

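$$\mathbf{P}_i = \begin{pmatrix}
\cdot & O_{i,1}-O_{i,2} & O_{i,1}-O_{i,3} & \cdots & O_{i,1}-O_{i,K} \\
O_{i,2}-O_{i,1} & \cdot & O_{i,2}-O_{i,3} & \cdots & O_{i,2}-O_{i,K} \\
O_{i,3}-O_{i,1} & O_{i,3}-O_{i,2} & \cdot & \cdots & O_{i,3}-O_{i,K} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
O_{i,K}-O_{i,1} & O_{i,K}-O_{i,2} & O_{i,K}-O_{i,3} & \cdots & \cdot
\end{pmatrix}. \qquad (3.3)$$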

Note that for each pairwise difference in $\mathbf{P}_i$, the true but unknown value associated with case $i$, $\mu_i$, is eliminated. The variances of the pairwise differences in $\mathbf{P}_i$ over all $N$ cases in the dataset are given by:

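$$\mathbf{V} = \begin{pmatrix}
\cdot & \sigma^2_{O_1-O_2} & \sigma^2_{O_1-O_3} & \cdots & \sigma^2_{O_1-O_K} \\
\sigma^2_{O_2-O_1} & \cdot & \sigma^2_{O_2-O_3} & \cdots & \sigma^2_{O_2-O_K} \\
\sigma^2_{O_3-O_1} & \sigma^2_{O_3-O_2} & \cdot & \cdots & \sigma^2_{O_3-O_K} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sigma^2_{O_K-O_1} & \sigma^2_{O_K-O_2} & \sigma^2_{O_K-O_3} & \cdots & \cdot
\end{pmatrix}. \qquad (3.4)$$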

Note that the variances of the pairwise differences in $\mathbf{V}$ are thus obtained directly from all $N$ matrices $\mathbf{P}_i$ that result from the observed total dataset $\mathbf{D}$ of all measurements. The systematic deviations $m_j$ of the observers are constant over the cases, so their variances are zero by definition, and the random measurement deviations $(\delta_i)_j$ of the observers are assumed to be uncorrelated.


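Consequently, the only variance components that contribute to the variances of the pairwise differences in $\mathbf{V}$ are the variances of the random variations of the observers in $\mathbf{R}$:

$$\mathbf{R} = \begin{pmatrix}
\cdot & \sigma^2_{O_1}+\sigma^2_{O_2} & \sigma^2_{O_1}+\sigma^2_{O_3} & \cdots & \sigma^2_{O_1}+\sigma^2_{O_K} \\
\sigma^2_{O_2}+\sigma^2_{O_1} & \cdot & \sigma^2_{O_2}+\sigma^2_{O_3} & \cdots & \sigma^2_{O_2}+\sigma^2_{O_K} \\
\sigma^2_{O_3}+\sigma^2_{O_1} & \sigma^2_{O_3}+\sigma^2_{O_2} & \cdot & \cdots & \sigma^2_{O_3}+\sigma^2_{O_K} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sigma^2_{O_K}+\sigma^2_{O_1} & \sigma^2_{O_K}+\sigma^2_{O_2} & \sigma^2_{O_K}+\sigma^2_{O_3} & \cdots & \cdot
\end{pmatrix}. \qquad (3.5)$$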

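The variances of the random variations of the observers, $\sigma^2_{O_1}, \sigma^2_{O_2}, \ldots, \sigma^2_{O_K}$, are obtained from solving $\mathbf{R} = \mathbf{V}$. The least-squares solution of $\mathbf{R} = \mathbf{V}$ can be found using, e.g., singular value decomposition; for $K \ge 4$, the system is overdetermined, with $K(K-1)/2$ distinct equations for $K$ unknowns. The standard deviations of the random variations of the observers, $\sigma_{O_1}, \sigma_{O_2}, \ldots, \sigma_{O_K}$, are obtained by taking the square root of the variances.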
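The least-squares step can be illustrated with a minimal Python sketch. Instead of singular value decomposition, it uses the closed-form solution of the normal equations of $\sigma^2_{O_j} + \sigma^2_{O_k} = \sigma^2_{O_j - O_k}$: for $K \ge 3$, the sum of all $\sigma^2_{O_k}$ equals $P/(K-1)$, with $P$ the sum of all pairwise variances, and $\sigma^2_{O_j} = (r_j - P/(K-1))/(K-2)$, with $r_j$ the sum of the pairwise variances involving observer $j$. The function name `pwd_estimate`, the list-of-rows data layout, and the clamping of negative variance estimates at zero are our own assumptions.

```python
from itertools import combinations
from statistics import variance


def pwd_estimate(data):
    """Estimate the observers' SDs with the pairwise differences (PWD) method.

    data: list of N rows; row i holds the K measurements O_{i,1}, ..., O_{i,K}.
    Returns one estimated standard deviation per observer.
    """
    k = len(data[0])
    if k < 3:
        raise ValueError("the PWD system needs at least 3 observers")
    # Sample variance (denominator N - 1) of O_j - O_l over all cases, per pair j < l.
    pair_var = {
        (j, l): variance([row[j] - row[l] for row in data])
        for j, l in combinations(range(k), 2)
    }
    # r_j: sum of the pairwise variances that involve observer j.
    r = [sum(v for pair, v in pair_var.items() if j in pair) for j in range(k)]
    # Normal equations of sigma_j^2 + sigma_l^2 = var(O_j - O_l):
    # the sum of all sigma^2 equals sum(pair_var) / (K - 1); then solve per observer.
    total = sum(pair_var.values()) / (k - 1)
    var_est = [(r_j - total) / (k - 2) for r_j in r]
    # Negative estimates can occur with few cases; clamp them at zero (our assumption).
    return [max(v, 0.0) ** 0.5 for v in var_est]
```

Applied to a dataset in the layout of Eq. (3.2), `pwd_estimate(D)` returns one standard deviation per observer. For $K = 3$ the system is exactly determined and the closed form reduces to $\sigma^2_{O_1} = (\sigma^2_{O_1-O_2} + \sigma^2_{O_1-O_3} - \sigma^2_{O_2-O_3})/2$.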

Tests for the significance of a difference in the random variation of an observer between two groups of cases, $A$ and $B$, are performed using the F-test statistic:

$$F = \frac{\sigma^2_{O_{j,A}}}{\sigma^2_{O_{j,B}}}, \qquad (3.6)$$

where $\sigma^2_{O_{j,A}}$ and $\sigma^2_{O_{j,B}}$ denote the variances of the random variation of observer $j$ within group $A$ ($N_A$ cases) and within group $B$ ($N_B$ cases), respectively, and $\sigma^2_{O_{j,A}} > \sigma^2_{O_{j,B}}$. Comparing the resulting value of $F$ against the F distribution with $N_A - 1$ numerator and $N_B - 1$ denominator degrees of freedom yields the p value for the null hypothesis that the variance of the random variation of observer $j$ is not different between group $A$ and group $B$. Note that these p values should be considered approximate, because the variances of the random variations of the observers in $\mathbf{R}$ are not obtained directly, but are derived from the observed variances of the pairwise differences in $\mathbf{V}$.
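A minimal Python sketch of Eq. (3.6), returning the F statistic and its degrees of freedom; the function name is hypothetical. It relabels the groups when needed so that the larger variance is in the numerator, as Eq. (3.6) requires. The p value itself needs an F distribution routine, for example `scipy.stats.f.sf(F, dfn, dfd)` from SciPy, which is left out here to keep the sketch free of dependencies.

```python
def f_statistic(var_a, n_a, var_b, n_b):
    """F statistic of Eq. (3.6) with its degrees of freedom.

    var_a, var_b: estimated variances of observer j's random variation in groups A and B.
    n_a, n_b: numbers of cases in groups A and B.
    Returns (F, numerator degrees of freedom, denominator degrees of freedom).
    """
    if var_a < var_b:
        # Relabel the groups so that the numerator holds the larger variance.
        var_a, n_a, var_b, n_b = var_b, n_b, var_a, n_a
    return var_a / var_b, n_a - 1, n_b - 1
```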

3.2.3. Residuals method

We introduce a new approach that will be referred to as the residuals (RES) method. This method is a more efficient algorithm to calculate the random variations of observers, by examining the variations of the differences between the measurement of each individual observer and the average measurement of the observers. Analytical estimates for the standard errors of the estimated random variations of the observers are provided as well.

Consider the model given by Eq. (3.1) with the total dataset $\mathbf{D}$ of all measurements given by Eq. (3.2). For each case $i$, we calculate the difference between the measurement $O_{i,j}$ of observer $j$ and the mean value of the measurements of the $K$ observers for that case:

$$R_{i,j} = O_{i,j} - \frac{1}{K}\sum_{j'=1}^{K} O_{i,j'}, \qquad (3.7)$$

where $R_{i,j}$ is defined as the measurement residual of observer $j$ for case $i$. Note that for each residual $R_{i,j}$, the true but unknown value associated with case $i$, $\mu_i$, is eliminated.


The total set of measurement residuals of all $K$ observers for each case $i$ is given by:

$$\mathbf{R}_i = \mathbf{A}\mathbf{O}_i, \qquad (3.8)$$

where $\mathbf{A}$ represents the matrix describing the relationship between the measurements $O_{i,j}$ and the measurement residuals $R_{i,j}$, and $\mathbf{O}_i$ is the vector containing the total set of measurements of all $K$ observers for case $i$:

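$$\mathbf{R}_i = \begin{pmatrix} R_{i,1} \\ R_{i,2} \\ \vdots \\ R_{i,K} \end{pmatrix}, \quad
\mathbf{A} = \begin{pmatrix}
\frac{K-1}{K} & -\frac{1}{K} & \cdots & -\frac{1}{K} \\
-\frac{1}{K} & \frac{K-1}{K} & \cdots & -\frac{1}{K} \\
\vdots & \vdots & \ddots & \vdots \\
-\frac{1}{K} & -\frac{1}{K} & \cdots & \frac{K-1}{K}
\end{pmatrix}, \quad
\mathbf{O}_i = \begin{pmatrix} O_{i,1} \\ O_{i,2} \\ \vdots \\ O_{i,K} \end{pmatrix}. \qquad (3.9)$$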

Over all $N$ cases in the dataset, the variances of the random variations of the observers are related to the variances of the measurement residuals according to:

$$\mathbf{S} = \mathbf{T}\mathbf{V}, \qquad (3.10)$$

where $\mathbf{S}$ is the vector containing the variances of the measurement residuals, $\mathbf{T}$ represents the matrix describing the relationship between the variances of the measurement residuals and the variances of the random variations of the observers, and $\mathbf{V}$ is the vector containing the variances of the random variations of the observers:

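$$\mathbf{S} = \begin{pmatrix} \sigma^2_{R_1} \\ \sigma^2_{R_2} \\ \vdots \\ \sigma^2_{R_K} \end{pmatrix}, \quad
\mathbf{T} = \begin{pmatrix}
\left(\frac{K-1}{K}\right)^2 & \left(\frac{1}{K}\right)^2 & \cdots & \left(\frac{1}{K}\right)^2 \\
\left(\frac{1}{K}\right)^2 & \left(\frac{K-1}{K}\right)^2 & \cdots & \left(\frac{1}{K}\right)^2 \\
\vdots & \vdots & \ddots & \vdots \\
\left(\frac{1}{K}\right)^2 & \left(\frac{1}{K}\right)^2 & \cdots & \left(\frac{K-1}{K}\right)^2
\end{pmatrix}, \quad
\mathbf{V} = \begin{pmatrix} \sigma^2_{O_1} \\ \sigma^2_{O_2} \\ \vdots \\ \sigma^2_{O_K} \end{pmatrix}. \qquad (3.11)$$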

As in the PWD method, the variances of the systematic deviations $m_j$ of the observers are zero by definition, and the random measurement deviations $(\delta_i)_j$ of the observers are assumed to be uncorrelated. Consequently, the variances of the random variations of the observers, $\sigma^2_{O_1}, \sigma^2_{O_2}, \ldots, \sigma^2_{O_K}$, are obtained by:

$$\mathbf{V} = \mathbf{T}^{-1}\mathbf{S}, \qquad (3.12)$$

where $\mathbf{T}$ is invertible for $K \ge 3$.
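The entries of $\mathbf{T}$ in Eq. (3.11), and a closed-form expression for the inverse in Eq. (3.12), follow from the assumptions just stated. The short derivation below is our own addition; $\mathbf{I}$ denotes the $K \times K$ identity matrix, $\mathbf{J}$ the $K \times K$ matrix of ones, and $\bar m$ the mean systematic deviation of the observers. Substituting Eq. (3.1) into Eq. (3.7) gives

$$R_{i,j} = (m_j - \bar m) + \frac{K-1}{K}(\delta_i)_j - \frac{1}{K}\sum_{k \ne j}(\delta_i)_k, \qquad \bar m = \frac{1}{K}\sum_{k=1}^{K} m_k,$$

so that, with uncorrelated random deviations and constant $m_j$,

$$\sigma^2_{R_j} = \left(\frac{K-1}{K}\right)^2 \sigma^2_{O_j} + \left(\frac{1}{K}\right)^2 \sum_{k \ne j} \sigma^2_{O_k},$$

which are the rows of $\mathbf{T}$. Writing $\mathbf{T}$ as a combination of $\mathbf{I}$ and $\mathbf{J}$, and using $\mathbf{J}^2 = K\mathbf{J}$:

$$\mathbf{T} = \frac{K-2}{K}\,\mathbf{I} + \frac{1}{K^2}\,\mathbf{J}, \qquad \mathbf{T}^{-1} = \frac{K}{K-2}\left(\mathbf{I} - \frac{1}{K(K-1)}\,\mathbf{J}\right) \quad (K \ge 3),$$

$$\sigma^2_{O_j} = \frac{K}{K-2}\left(\sigma^2_{R_j} - \frac{1}{K(K-1)}\sum_{k=1}^{K}\sigma^2_{R_k}\right).$$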

By taking the square root of the variances, the standard deviations of the random variations of the observers, $\sigma_{O_1}, \sigma_{O_2}, \ldots, \sigma_{O_K}$, are obtained. Note that these standard deviations are identical to the standard deviations obtained by the PWD method.
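A minimal Python sketch of the RES method, Eqs. (3.7) to (3.12), using the closed-form $\mathbf{T}^{-1} = \frac{K}{K-2}\left(\mathbf{I} - \frac{1}{K(K-1)}\mathbf{J}\right)$ instead of a general matrix inversion. Sample variances with denominator $N-1$ (Python's `statistics.variance`), the clamping of negative variance estimates at zero, and the function name `res_estimate` are our own assumptions.

```python
from statistics import variance


def res_estimate(data):
    """Estimate the observers' SDs with the residuals (RES) method.

    data: list of N rows; row i holds the K measurements O_{i,1}, ..., O_{i,K}.
    Returns (sds, s), where s is the vector S of residual variances, Eq. (3.11).
    """
    k = len(data[0])
    if k < 3:
        raise ValueError("T is singular for fewer than 3 observers")
    # Eq. (3.7): residual of each measurement with respect to the case mean.
    residuals = []
    for row in data:
        mean_i = sum(row) / k
        residuals.append([o - mean_i for o in row])
    # Vector S: sample variance of the residuals of each observer over all cases.
    s = [variance([res[j] for res in residuals]) for j in range(k)]
    # Eq. (3.12) with the closed-form inverse of T.
    s_total = sum(s)
    var_est = [k / (k - 2) * (s_j - s_total / (k * (k - 1))) for s_j in s]
    # Clamp negative estimates (possible for small datasets) before the square root.
    sds = [max(v, 0.0) ** 0.5 for v in var_est]
    return sds, s
```

Because $O_{i,j} - O_{i,k} = R_{i,j} - R_{i,k}$ and the residuals of each case sum to zero, the sample-based result coincides, up to rounding, with the least-squares PWD estimates for the same dataset, in line with the note above.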


Analytical estimates for the standard errors of the estimated random variations of the observers are obtained by calculating:

$$\mathbf{W} = \mathbf{T}^{-1}\mathbf{C}\,\mathbf{T}^{-1}, \qquad (3.13)$$

where $\mathbf{W}$ is the variance-covariance matrix of the estimated variances of the random variations of the observers, and $\mathbf{C}$ is the variance-covariance matrix of the variances of the measurement residuals, assuming a normal distribution for the observations:

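$$\mathbf{W} = \begin{pmatrix}
\sigma^2_{\sigma^2_{O_1}} & \cdot & \cdots & \cdot \\
\cdot & \sigma^2_{\sigma^2_{O_2}} & \cdots & \cdot \\
\vdots & \vdots & \ddots & \vdots \\
\cdot & \cdot & \cdots & \sigma^2_{\sigma^2_{O_K}}
\end{pmatrix}, \quad
\mathbf{C} = \begin{pmatrix}
\frac{2\,(\sigma^2_{R_1})^2}{N-1} & 0 & \cdots & 0 \\
0 & \frac{2\,(\sigma^2_{R_2})^2}{N-1} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \frac{2\,(\sigma^2_{R_K})^2}{N-1}
\end{pmatrix}. \qquad (3.14)$$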

For simplicity, the off-diagonal elements of C have been ignored. The diagonal elements of W are the variances of the variances of the random variations of the observers, and from these, the analytical estimates for the standard errors of the estimated random variations of the observers are obtained by the first-order Taylor expansion:

$$\sigma_{\sigma_{O_j}} = \frac{1}{2}\,\frac{\sigma_{\sigma^2_{O_j}}}{\sigma_{O_j}}. \qquad (3.15)$$
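The analytical standard errors of Eqs. (3.13) to (3.15) can be sketched in Python as follows, again assuming the closed-form $\mathbf{T}^{-1}$ and the diagonal $\mathbf{C}$ of Eq. (3.14). Because $\mathbf{C}$ is diagonal and $\mathbf{T}^{-1}$ is symmetric, each diagonal element of $\mathbf{W}$ reduces to $W_{jj} = \sum_m (\mathbf{T}^{-1})_{jm}^2\, C_{mm}$. The function name and its inputs (the residual variances $\mathbf{S}$ and the number of cases $N$) are hypothetical.

```python
def res_standard_errors(residual_variances, n_cases):
    """Analytical standard errors of the RES estimates, Eqs. (3.13)-(3.15).

    residual_variances: the vector S (sample variances of the residuals).
    C is taken as diagonal, with 2 * s_j**2 / (N - 1), as in Eq. (3.14).
    Returns (sds, standard errors of the sds).
    """
    k = len(residual_variances)
    if k < 3 or n_cases < 2:
        raise ValueError("need at least 3 observers and 2 cases")
    # Closed-form inverse of T: K/(K-2) * (I - J/(K(K-1))).
    diag = k / (k - 2) * (1 - 1 / (k * (k - 1)))
    off = -k / (k - 2) / (k * (k - 1))
    t_inv = [[diag if r == c else off for c in range(k)] for r in range(k)]
    c_diag = [2 * s ** 2 / (n_cases - 1) for s in residual_variances]
    # Eq. (3.12): variances of the random variations of the observers.
    var_o = [sum(t_inv[j][m] * residual_variances[m] for m in range(k)) for j in range(k)]
    # Eq. (3.13) with diagonal C: W_jj = sum_m (T^-1)_{jm}^2 * C_mm.
    w_diag = [sum(t_inv[j][m] ** 2 * c_diag[m] for m in range(k)) for j in range(k)]
    sds, ses = [], []
    for v, w in zip(var_o, w_diag):
        sd = max(v, 0.0) ** 0.5
        sds.append(sd)
        # Eq. (3.15): first-order Taylor expansion, SE(sd) = SE(variance) / (2 sd).
        ses.append(w ** 0.5 / (2 * sd) if sd > 0 else float("inf"))
    return sds, ses
```

For example, `res_standard_errors([2.0, 2.0, 2.0], 101)` corresponds to 3 observers and 101 cases with equal residual variances; the square roots of the diagonal elements of $\mathbf{W}$ are the standard errors of the variances that enter Eq. (3.16).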

Tests for the significance of a difference in the random variation of an observer between two groups of cases, $A$ and $B$, are performed using the t-test statistic:

$$t = \frac{\sigma^2_{O_{j,A}} - \sigma^2_{O_{j,B}}}{\sqrt{\sigma^2_{\sigma^2_{O_{j,A}}} + \sigma^2_{\sigma^2_{O_{j,B}}}}}, \qquad (3.16)$$

where $\sigma^2_{O_{j,A}}$ and $\sigma^2_{O_{j,B}}$ denote the variances of the random variation of observer $j$ within group $A$ ($N_A$ cases) and within group $B$ ($N_B$ cases), respectively. Furthermore, $\sigma^2_{\sigma^2_{O_{j,A}}}$ and $\sigma^2_{\sigma^2_{O_{j,B}}}$ denote the variances of the variances of the random variations of observer $j$ within each group. Comparing the value of $t$ against the t distribution with $N_A + N_B - 2$ degrees of freedom yields the p value for the null hypothesis that the variance of the random variation of observer $j$ is not different between group $A$ and group $B$.
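Eq. (3.16) in a minimal Python sketch; the inputs are the per-group variance estimates and their variances (for example, the diagonal elements of $\mathbf{W}$ computed separately for group $A$ and group $B$). The function name is hypothetical. The p value follows from a t distribution routine such as `scipy.stats.t.sf`; the text does not state whether a one- or two-sided p value is intended, so that choice is left open here.

```python
def t_statistic(var_a, var_var_a, n_a, var_b, var_var_b, n_b):
    """t statistic of Eq. (3.16) and its degrees of freedom.

    var_a, var_b: estimated variances of observer j's random variation in groups A and B.
    var_var_a, var_var_b: the estimated variances of those variance estimates.
    n_a, n_b: numbers of cases in groups A and B.
    """
    t = (var_a - var_b) / (var_var_a + var_var_b) ** 0.5
    return t, n_a + n_b - 2
```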

3.2.4. Restricted maximum likelihood estimation

The third ANOVA-based method uses restricted maximum likelihood (REML) estimation to assess the random variations of the observers. The method is an iterative procedure that optimizes estimates for the fixed effects and the random variations of the observers by maximizing the restricted likelihood function. The method also provides analytical estimates for the standard errors of the estimated random variations of the observers.


Consider the model given by Eq. (3.1) and the total dataset $\mathbf{D}$ of all measurements given by Eq. (3.2). The restricted likelihood represents the probability density of the observed dataset $\mathbf{D}$, given the parameters $\boldsymbol{\alpha}$ (the variances of the random variations $\sigma^2_{O_j}$ of the observers) and $\boldsymbol{\beta}$ (the true but unknown values $\mu_i$ associated with the cases in the dataset and the systematic measurement deviations $m_j$ of the observers) to be estimated.

In the absence of a gold standard, it is impossible to estimate all $K$ systematic deviations: adding a constant to every $\mu_i$ and subtracting it from every $m_j$ leaves Eq. (3.1) unchanged. One solution is to define one systematic deviation to be zero. The remaining $K-1$ systematic deviations are then relative to the observer whose systematic deviation is defined as zero. The restricted likelihood is given by, e.g., Verbeke and Molenberghs 12:

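$$L_{\mathrm{REML}}(\boldsymbol{\alpha}) = (2\pi)^{-(n-p)/2} \left|\sum_{i=1}^{N} X_i' X_i\right|^{1/2} \times \left|\sum_{i=1}^{N} X_i' V_i^{-1} X_i\right|^{-1/2} \times \prod_{i=1}^{N} \left|V_i\right|^{-1/2} \exp\left\{-\frac{1}{2}\sum_{i=1}^{N} \left(\mathbf{Y}_i - X_i\hat{\boldsymbol{\beta}}\right)' V_i^{-1} \left(\mathbf{Y}_i - X_i\hat{\boldsymbol{\beta}}\right)\right\}, \qquad (3.17)$$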

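where:

$$\boldsymbol{\alpha} = \begin{pmatrix} \sigma^2_{O_1} \\ \sigma^2_{O_2} \\ \vdots \\ \sigma^2_{O_K} \end{pmatrix}, \quad
\mathbf{Y}_i = \begin{pmatrix} O_{i,1} \\ O_{i,2} \\ \vdots \\ O_{i,K} \end{pmatrix}, \quad
\hat{\boldsymbol{\beta}} = \begin{pmatrix} \hat\mu_1 \\ \hat\mu_2 \\ \vdots \\ \hat\mu_N \\ \hat m_1 \\ \hat m_2 \\ \vdots \\ \hat m_{K-1} \end{pmatrix}, \quad
V_i = \begin{pmatrix}
\sigma^2_{O_1} & 0 & \cdots & 0 \\
0 & \sigma^2_{O_2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2_{O_K}
\end{pmatrix}. \qquad (3.18)$$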

In addition, $n = N \times K$, $p = N + K - 1$, and $X_i$ is a matrix linking the measurements with the estimates for the fixed effects $\mu_1, \mu_2, \ldots, \mu_N$ and $m_1, m_2, \ldots, m_{K-1}$; see Eq. (3.1).

Optimized estimates for the fixed effects and for the variances of the random variations of the observers, $\sigma^2_{O_1}, \sigma^2_{O_2}, \ldots, \sigma^2_{O_K}$, are obtained by maximizing the restricted likelihood function. The standard deviations of the random variations of the observers, $\sigma_{O_1}, \sigma_{O_2}, \ldots, \sigma_{O_K}$, are obtained by taking the square root of the variances. The procedure to estimate the parameters from the restricted maximum likelihood is implemented in a limited number of specialized statistical packages, such as SAS (SAS Institute Inc., Cary, NC), S-PLUS (Insightful Corp., Seattle, WA), and, recently, SPSS (SPSS Inc., Chicago, IL).

Tests for the significance of a difference in the random variation of an observer between two groups of cases, $A$ and $B$, are performed using the likelihood-ratio (LR) test statistic:

$$\mathrm{LR} = -2\ln\left(\frac{L_{\mathrm{REML}}(\boldsymbol{\alpha}_{H_0})}{L_{\mathrm{REML}}(\boldsymbol{\alpha}_{H_1})}\right). \qquad (3.19)$$

Here, $L_{\mathrm{REML}}(\boldsymbol{\alpha}_{H_0})$ is the restricted maximum likelihood calculated for a one-group-of-cases model, permitting only one variance for the random variation of observer $j$ in the group of cases (i.e., $\sigma^2_{O_j}$); $L_{\mathrm{REML}}(\boldsymbol{\alpha}_{H_1})$ is the restricted maximum likelihood obtained from a two-groups-of-cases model, allowing different variances for the random variation of observer $j$ in groups $A$ and $B$ (i.e., $\sigma^2_{O_{j,A}}$ and $\sigma^2_{O_{j,B}}$). Comparing the resulting value of LR against the chi-squared distribution with one degree of freedom (the two-groups-of-cases model has one additional variance parameter) yields the p value for the null hypothesis that the variance of the random variation of the observer is not different between group $A$ and group $B$. Note that this is the theoretically derived (asymptotic) approximation that is commonly used in statistics.
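Given the two maximized restricted log-likelihoods, as reported by a package that implements REML estimation, Eq. (3.19) and its p value need only the Python standard library: for one degree of freedom, the chi-squared survival function equals $\operatorname{erfc}(\sqrt{\mathrm{LR}/2})$. A minimal sketch; the function name and the guard against small negative LR values from the optimizer are our own assumptions.

```python
import math


def lr_test(loglik_h0, loglik_h1):
    """Likelihood-ratio test of Eq. (3.19) from two maximized REML log-likelihoods.

    loglik_h0: ln L_REML for the one-group-of-cases model (one variance for observer j).
    loglik_h1: ln L_REML for the two-groups-of-cases model (separate variances in A and B).
    Returns (LR, p), with p from the chi-squared distribution with one degree of freedom.
    """
    lr = -2.0 * (loglik_h0 - loglik_h1)
    lr = max(0.0, lr)  # guard against tiny negative values caused by the optimizer
    # Chi-squared(1) survival function: P(X > lr) = erfc(sqrt(lr / 2)).
    p = math.erfc(math.sqrt(lr / 2.0))
    return lr, p
```

An LR value of about 3.84, the 95th percentile of the chi-squared distribution with one degree of freedom, corresponds to p = 0.05.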

3.2.5. Monte Carlo simulations

Monte Carlo simulations were performed to assess the performance of the three ANOVA-based methods. We evaluated arrangements of 3, 4, 5, 6, 7, and 8 observers with datasets consisting of 10, 20, 50, 100, and 200 cases randomly sampled from a normal distribution. For each arrangement, 1000 repetitions of the measurements of these cases were simulated using a standard deviation (SD) of 3.0 for the random variation of each observer. This value was chosen because it represents a realistic estimate of reproducibility in multiple studies involving medical imaging measurements, see, e.g., Gilhuijs et al. 5, Ploeger et al. 9-10, and Klein Zeggelink et al. 11 During each repetition, the standard deviations of the random variations of the observers were estimated using each method, together with the analytical estimates for the standard errors of the estimated random variations. Separate simulations were performed for the situation in which the observers have different random variations, evenly spaced between 2.0 and 4.0 (1 SD). All results describing the random variations are expressed as standard deviations. Consequently, in the remainder of this paper, each further use of the expression random variation(s) should be interpreted as the standard deviation(s) of the random variation(s).

We define accuracy as the closeness of the estimates to the truth. The average deviation between the estimated random variations (observed from the simulations) and the true random variations (defined for the simulations) is used as a measure of accuracy: the smaller the average deviation, the higher the accuracy. The average deviation is expressed relative to the true random variations (defined for the simulations); the measure for accuracy is thus scale invariant and therefore generally applicable.

We define precision as the closeness of the estimates to each other. The standard error of the distribution of the estimated random variations (observed from the simulations) is used as a measure of precision: the smaller the standard error, the higher the precision. The standard error is expressed relative to the average of the estimated random variations (observed from the simulations); the measure for precision is therefore scale invariant and thus generally applicable.
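The accuracy and precision measures defined above can be illustrated with a reduced Monte Carlo sketch in Python. It applies only the RES method, not all three methods; the normal distribution of the true values (mean 20, SD 10), the systematic deviations (0, 0.5, 1.0, and so on, per observer), and the function names are hypothetical choices, not taken from the study.

```python
import random
from statistics import mean, stdev, variance


def res_sds(data):
    """RES estimates of the observers' SDs, using the closed-form inverse of T."""
    k = len(data[0])
    if k < 3:
        raise ValueError("at least 3 observers are needed")
    residuals = [[o - sum(row) / k for o in row] for row in data]
    s = [variance([r[j] for r in residuals]) for j in range(k)]
    total = sum(s)
    # Negative variance estimates are clamped at zero (our assumption).
    return [max(k / (k - 2) * (s_j - total / (k * (k - 1))), 0.0) ** 0.5 for s_j in s]


def monte_carlo(n_cases, true_sds, n_reps=1000, seed=0):
    """Accuracy and precision per observer, following the definitions in the text.

    accuracy: (mean of the estimates - true SD) / true SD
    precision: SD of the estimates / mean of the estimates
    """
    rng = random.Random(seed)
    k = len(true_sds)
    biases = [0.5 * j for j in range(k)]  # hypothetical systematic deviations m_j
    estimates = [[] for _ in range(k)]
    for _ in range(n_reps):
        data = []
        for _ in range(n_cases):
            mu = rng.gauss(20.0, 10.0)  # hypothetical distribution of the true values
            data.append([mu + m + rng.gauss(0.0, s) for m, s in zip(biases, true_sds)])
        for j, est in enumerate(res_sds(data)):
            estimates[j].append(est)
    accuracy = [(mean(e) - t) / t for e, t in zip(estimates, true_sds)]
    precision = [stdev(e) / mean(e) for e in estimates]
    return accuracy, precision
```

For example, `monte_carlo(100, [3.0, 3.0, 3.0])` mimics an arrangement of 3 observers with 100 cases and an SD of 3.0 per observer; replacing `true_sds` with evenly spaced values between 2.0 and 4.0 mimics the situation with different random variations.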

The Monte Carlo simulations were repeated to assess the reliability of the statistical tests. For this purpose, the same simulation scheme was followed, but now with two groups of cases, i.e., 2×10, 2×20, 2×50, and 2×100, without a true difference in the random variations between the groups. During each repetition, the F test (PWD method), t test (RES method), and LR test (REML estimation) were performed. After 1000 repeat simulations, the frequency at which each test indicated a probability $p < 0.05$ was counted. Ideally, this frequency equals 5%. Smaller frequencies indicate an underestimation of significance, i.e., the test is conservative. Conversely, higher frequencies indicate an overestimation of significance, i.e., the test is anticonservative.
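With 1000 repetitions, the observed frequency itself carries Monte Carlo uncertainty: at a true rate of 5%, its binomial standard error is $\sqrt{0.05 \times 0.95 / 1000} \approx 0.69\%$. The Python sketch below, our own addition, classifies a test as conservative, nominal, or anticonservative using an approximate 95% band of ±1.96 standard errors around 5% (about 3.6% to 6.4%). The band and the function name are assumptions, not a criterion used in the study.

```python
import math


def classify_test(n_significant, n_reps=1000, alpha=0.05, z=1.96):
    """Classify a test from its number of p < alpha results under a true null hypothesis.

    The band alpha +/- z * sqrt(alpha * (1 - alpha) / n_reps) reflects only the
    binomial (Monte Carlo) uncertainty of the observed frequency.
    Returns (observed frequency, classification).
    """
    freq = n_significant / n_reps
    half_width = z * math.sqrt(alpha * (1 - alpha) / n_reps)
    if freq < alpha - half_width:
        return freq, "conservative"
    if freq > alpha + half_width:
        return freq, "anticonservative"
    return freq, "nominal"
```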


Microsoft Excel 2000 (Microsoft Corp., Redmond, WA), Microsoft Visual Basic 6.0 (Microsoft Corp., Redmond, WA), and MATLAB 6.0.0.88 Release 12 (The MathWorks Inc., Natick, MA) were used to implement the PWD and RES methods and to run the Monte Carlo simulations to assess their performance. SAS 8.02 Level 02M0 (SAS Institute Inc., Cary, NC) was used to run the Monte Carlo simulations to assess the performance of REML estimation.

3.3. RESULTS

In this section, the performance of the three analysis-of-variance- (ANOVA-) based methods is expressed by their accuracy, their precision, and the reliability of their statistical test for the significance of a difference in the random variation of an observer between two groups of cases.

3.3.1. Accuracy

Excellent accuracy is observed for each method when the observers have similar random variations, with deviations well below ±1%, regardless of the number of observers, the specific observer, or the number of cases. If the observers have different random variations, the random variation is overestimated for the observers with relatively small random variation and underestimated for the observers with relatively large random variation (Fig. 3.1). However, the deviations from the truth decrease rapidly with an increasing number of observers and an increasing number of cases.

In summary, the highest accuracy is achieved using REML estimation, but for datasets of at least 50 cases or arrangements with 6 or more observers, the differences between the methods are negligible, with deviations from the truth well below ±3%. Nevertheless, one should be careful with the PWD and RES methods in studies with datasets of fewer than 50 cases evaluated by fewer than 6 observers, especially if considerable differences in the random variations between the observers are expected.

3.3.2. Precision

If the observers have similar random variations, the standard error of the distribution of the estimated random variations is equal for all observers in each arrangement of observers (Fig. 3.2). The standard error is inversely proportional to the square root of the number of cases and also decreases with an increasing number of observers. For a dataset of 100 cases, the standard errors range from 12% for 3 observers down to 8% for 8 observers. There are no relevant differences between the precisions of the methods.

Fig. 3.1. Accuracy: Average deviation between the estimated random variations (observed) and the true random variations (defined), expressed relative to the true random variations (defined); for the situation in which the observers have different random variations; upper row: PWD and RES methods; lower row: REML estimation; one marker per observer, from observer 1 (smallest random variation) to observer 8; the segments between the points have been added to facilitate reading of the graphs.

[Figure panels: one panel per arrangement of 3, 4, 5, 6, 7, and 8 observers in each row; x-axis: Number of cases (0 to 200); y-axis: Deviation (%) (−25 to 25).]

Fig. 3.2. Precision: Standard error of the distribution of the estimated standard deviations of the random variations of the observers, expressed relative to the average of the estimated standard deviations of the random variations; for the situation in which the observers have similar random variations; left: PWD and RES methods; right: REML estimation; one marker per arrangement of 3, 4, 5, 6, 7, and 8 observers; the segments between the points have been added to facilitate reading of the graphs.

[Figure panels: x-axis: Number of cases (0 to 200); y-axis: Standard error (%) (0 to 50).]

When the observers have different random variations, the standard error of the distribution of the estimated random variations is no longer equal for the observers in each arrangement of observers (Fig. 3.3). The standard errors for the observers with relatively small random variation are larger than those for the observers with relatively large random variation. For a dataset of 100 cases evaluated by 3 observers, the standard error equals 21% for the observer with the smallest random variation and 9% for the observer with the largest random variation. However, the differences between the standard errors of the observers decrease rapidly with an increasing number of observers. For a dataset of the same size evaluated by 6 observers, the standard error hardly changes for the observer with the largest random variation (8% instead of 9%), but is reduced from 21% to 11% for the observer with the smallest random variation. Furthermore, for the observers with the smallest random variation, the precision achieved using the PWD and RES methods is slightly higher than that obtained with REML estimation. These differences, however, decrease quickly with an increasing number of observers and an increasing number of cases, and are irrelevant for datasets of at least 50 cases or arrangements with 6 or more observers.

The analytical estimates for the standard errors obtained by the RES method tend to overestimate the true standard errors for small numbers of observers and small numbers of cases. This applies to the situation in which the observers have similar random variations (deviations of approximately 5% for 3 observers with datasets of fewer than 50 cases), and particularly to the situation in which their random variations differ (deviations of approximately 15% for 3 observers with datasets of fewer than 50 cases). However, the deviations decrease rapidly with an increasing number of observers and an increasing number of cases, and are negligible for datasets of at least 50 cases or arrangements with 6 or more observers. For REML estimation, the agreement between the analytical estimates for the standard errors and the true standard errors is almost perfect.

Fig. 3.3. Precision: Standard error of the distribution of the estimated standard deviations of the random variations of the observers, expressed relative to the average of the estimated standard deviations of the random variations; for the situation in which the observers have different random variations; upper row: PWD and RES methods; lower row: REML estimation; one marker per observer, from observer 1 (smallest random variation) to observer 8; the segments between the points have been added to facilitate reading of the graphs.

[Figure panels: one panel per arrangement of 3, 4, 5, 6, 7, and 8 observers in each row; x-axis: Number of cases (0 to 200); y-axis: Standard error (%) (0 to 50).]

In summary, for datasets of up to 100 cases, it is most beneficial to increase the number of cases to improve the precision of the estimated random variations, whereas for datasets of over 100 cases, an improvement in precision is most efficiently achieved by increasing the number of observers. Furthermore, for datasets of at least 50 cases or arrangements with 6 or more observers, the differences in precision between the methods are irrelevant, and there is sufficient agreement between the analytical estimates for the standard errors and the true standard errors. For REML estimation, the latter holds for any number of observers and any number of cases. Differences between the precisions of the observers in an arrangement of observers whose random variations differ are negligible for arrangements with 6 or more observers, regardless of the number of cases or the method.

3.3.3. Reliability of the statistical tests

The F test is very anticonservative, and the number of cases in the groups does not affect its performance (Fig. 3.4). The overestimation of significance decreases with an increasing number of observers, but for 8 observers with similar random variations, the test still indicates a significant difference in almost 10% of the simulations. For the situation in which the random variations of the observers differ, the overestimation of significance is extremely high for the observer with the smallest random variation, and is still present for the observer with the largest random variation.

The t test is slightly conservative for small numbers of cases (Fig. 3.4). Its performance does not differ relevantly between the situation in which the observers have different random variations and the situation in which they have similar random variations. The underestimation of significance decreases with an increasing number of observers and an increasing number of cases. Despite this slight conservatism, the t test reliably indicates a significant difference in the random variation of an observer between two groups of cases for arrangements of 4 or more observers with datasets of at least 2×50 cases.

The performance of the LR test hardly changes with the number of observers, the number of cases, and whether or not the random variations of the observers in an arrangement of observers are different with respect to each other (Fig. 3.4). Furthermore, the overall conservatism is negligible. The LR test thus consistently indicates the significance of a difference in the random variation of an observer between two groups of cases.

Treatment implementation 67

Assessment of analysis-of-variance-based methods 67

Fig. 3.4. Reliability of the statistical tests for the significance of a difference in the random variation of an observer between two groups of cases; shown is the percentage of times that a test indicates a difference at the 95% confidence level (p value < 0.05) while a true difference is absent. F test: separate curves for all observers, for the observer with the smallest random variation, and for the observer with the largest random variation; t test and LR test: separate curves for 3, 4, 5, 6, 7, and 8 observers. The segments between the points have been added to facilitate reading of the graphs.

[Figure: six panels (F test, t test, and LR test, each for similar and for different random variations); y axis: p value × 100 (%), 0 to 50; x axis: number of observers (2 to 9) for the F test, number of cases (0 to 100) for the t test and the LR test.]

68 Chapter 3

68 Section III

3.4. DISCUSSION

We assessed the performance of three representative analysis-of-variance- (ANOVA-) based methods for typical numbers of observers and typical numbers of cases. In this section, we discuss the clinical impact of random variations, the need for ANOVA-based methods, and aspects of the ANOVA-based methods, and we give guidelines to the investigator.

3.4.1. Clinical impact of random variations

In the radiological assessment of tumor extent, the inherent resolution limitations of current imaging modalities impede precise measurement. Technical variations from study to study, such as variation in slice selection in cross-sectional studies, affect reproducibility as well. Other sources of variation that limit the precision of measurements are variations in the orientation of the tumor in the field of view, variations in tumor interpretation, and the random variations introduced by the human observers.

Random variations in radiological tumor assessment may cause underestimation or overestimation of tumor extent. This uncertainty may result in incomplete resection of the tumor at surgery, or in overcalling tumor regression when monitoring the response to neoadjuvant chemotherapy or hormonal therapy, see Klein Zeggelink et al. 11. Random variations in target delineation for radiotherapy treatment planning may cause underdosage of the tumor and overdosage of organs at risk, see Rasch et al. 1. New imaging modalities are continually being integrated into treatment planning strategies, and variations in their interpretation may have profound effects on patient outcome. These variations should therefore be evaluated before new modalities are integrated into treatment planning strategies.

The current clinical guidelines for monitoring the response to treatment incorporate criteria intended to reduce the probability of misclassification, see, e.g., Therasse et al. 13. When these guidelines are applied consistently, current methods generally fall within acceptable limits for treatment planning and decision making. Given the current imaging modalities and measurement methodologies, the random variations in the radiological assessment of tumor extent in the breast using ultrasonography may be on the order of 4 mm (1 SD). As a consequence, the current clinical guidelines for monitoring the response to neoadjuvant chemotherapy or hormonal therapy may produce approximately 10% false-positive calls for tumors between 20 and 30 mm, see Klein Zeggelink et al. 11. The introduction of new techniques based on automated volumetric measurement of tumor extent, see, e.g., Gilhuijs et al. 14, is likely to shift the current balance between true-positive and false-positive calls of tumor regression or tumor progression. When computer-aided volumetric measurements of tumor extent become standard techniques, human observer variation would be replaced by variation in the computer algorithms. The true precision of automated techniques in the daily clinical workup cannot, however, be assessed by applying such methods repeatedly to the same images, because they would return exactly the same result each time. ANOVA-based methods could instead be used to assess the true reproducibility of automated techniques in daily clinical practice, and the quantitative knowledge of the random variations may be used to establish new criteria for monitoring the response to treatment.
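The order of magnitude of the false-positive rate mentioned above can be illustrated with a short calculation. This is a hedged sketch, not the analysis of Klein Zeggelink et al. 11: it assumes that a regression call corresponds to the partial-response criterion of Therasse et al. 13 (a decrease of at least 30% in the largest diameter, here of a single lesion), that the tumor has not truly changed, and that the measurements before and after treatment are unbiased, independent, and normally distributed with a random variation of 4 mm (1 SD). The function name `false_positive_regression_rate` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def false_positive_regression_rate(diameter_mm: float,
                                   sd_mm: float = 4.0,
                                   threshold: float = 0.30) -> float:
    """Probability that an unchanged tumor is nevertheless called 'regressed'.

    A call is made when the repeat measurement is at least `threshold`
    (a fraction of the true diameter) smaller than the baseline measurement.
    Illustrative assumptions: both measurements are unbiased, independent,
    and normal with the same SD `sd_mm`, so their difference has SD
    sd_mm * sqrt(2).
    """
    sd_difference = sd_mm * sqrt(2.0)
    required_decrease_mm = threshold * diameter_mm
    # One-sided: only apparent decreases lead to a (false) regression call.
    return NormalDist(mu=0.0, sigma=sd_difference).cdf(-required_decrease_mm)


if __name__ == "__main__":
    for diameter in (20.0, 25.0, 30.0):
        rate = false_positive_regression_rate(diameter)
        print(f"{diameter:.0f} mm tumor: {100 * rate:.1f}% false-positive calls")
```

Under these assumptions the calculation gives about 14% for a 20 mm tumor, 9% for 25 mm, and 6% for 30 mm, i.e., on the order of 10% across this size range. Smaller tumors are more exposed because the 30% threshold then corresponds to a smaller absolute change.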

3.4.2. The need for ANOVA-based methods

In the medical literature, reproducibility is often expressed by (intraclass) correlation coefficients or by kappa statistics. Such measures, however, relate to the agreement between multiple observers, or to the agreement between repeat measurements on the same cases by a single observer, and cannot be used to derive safety margins that take the random variations of observers into account during treatment. Analysis of variance (ANOVA) is required to derive appropriate treatment margins that directly incorporate estimates of the individual random variations of observers in daily clinical practice.

The implementation of ANOVA models in statistical packages is primarily focused on providing statistical inferences about differences between the means of two or more groups of cases, or on predicting the value of the dependent variable as a function of the independent factors. As a consequence, it is not straightforward to obtain estimates of the individual contributions to the variance in the observed dataset; measures of the precision of these estimates are generally not provided, nor are statistical tests for a significant difference in an observer's random variation between two groups of cases. In contrast, the dedicated ANOVA-based methods evaluated in this study directly provide estimates of the individual random variations of observers in the regular clinical workup, measures of the precision of these estimates, and statistical tests for the significance of a difference in the random variation of an observer between two datasets.

3.4.3. Aspects of the ANOVA-based methods

Although the PWD method provides a straightforward way to calculate the random variations of observers, the matrices may become impractically large for studies with a large number of observers (e.g., 56 elements with 8 observers), and singular value decomposition is needed to obtain the random variations.

The proposed RES method is a more efficient algorithm, because the size of the vectors is equal to the number of observers (i.e., 8 elements with 8 observers), and the random variations are obtained using general linear algebra. The RES method also provides analytical estimates for the standard errors of the estimated random variations of the observers. The major advantage of this additional statistic is that it provides a direct measure for the specific precision with which the random variation of each individual observer is obtained, given the specific arrangement of observers and the specific dataset.
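To make the notion of observer-specific random variations obtained by general linear algebra more concrete, the sketch below implements a simple residual-based moment estimator. It assumes the two-way fixed-effects model y_ij = c_i + b_j + e_ij, with fixed case values c_i, fixed observer biases b_j, and independent random variations e_ij with variance sigma_j^2. It is an illustration under these assumptions, not necessarily the exact RES formulation evaluated in this chapter, and it does not compute the analytical standard errors. After subtracting case and observer means, the mean squared residual v_j of observer j (n cases, m observers) has expectation (1 - 2/m) sigma_j^2 + (1/m^2) sum_k sigma_k^2; this is a linear system in the m unknown variances, with the closed-form solution used below, and at least 3 observers are needed. The function name `observer_random_variations` is hypothetical.

```python
from statistics import fmean
from typing import List, Sequence


def observer_random_variations(data: Sequence[Sequence[float]]) -> List[float]:
    """Estimate the random-variation variance sigma_j^2 of each observer.

    `data[i][j]` is the measurement of case i by observer j (complete
    design: every observer measures every case). Assumed model:
        y_ij = c_i + b_j + e_ij,  e_ij ~ N(0, sigma_j^2), independent,
    with fixed case values c_i and fixed observer biases b_j.
    Estimates may be slightly negative by chance; they are not truncated.
    """
    n = len(data)      # number of cases
    m = len(data[0])   # number of observers
    if m < 3 or n < 2:
        raise ValueError("need at least 3 observers and 2 cases")
    case_means = [fmean(row) for row in data]
    observer_means = [fmean(row[j] for row in data) for j in range(m)]
    grand_mean = fmean(case_means)  # equals the overall mean (complete design)
    # Mean squared double-centred residual per observer: case values and
    # observer biases cancel, leaving only the random variations.
    v = []
    for j in range(m):
        ss = sum((data[i][j] - case_means[i] - observer_means[j] + grand_mean) ** 2
                 for i in range(n))
        v.append(ss / (n - 1))
    # E[v_j] = (1 - 2/m) sigma_j^2 + total / m^2, where total = sum_k sigma_k^2;
    # summing over j gives E[sum v] = (m - 1) / m * total.
    total = m / (m - 1) * sum(v)
    return [m * (v_j - total / m ** 2) / (m - 2) for v_j in v]
```

Because the case values and observer biases cancel in the double-centred residuals, the estimates do not depend on the spread of the true values in the dataset, in line with the fixed-effects property discussed in this section.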

The results of this study show that REML estimation is the most accurate in estimating the random variations, their standard errors, and the p values of the tests, irrespective of the number of observers or the number of cases, and even in the presence of heterogeneity (different random variations) between observers. The implementation of REML estimation is, however, complicated, and even if a specialized statistical package is available, care should be taken to choose the most appropriate of the various ANOVA models it offers.

The three ANOVA-based methods evaluated in this study are all based on the fixed-effects model. The major advantage of using this model is that the accuracy and precision of the estimates are independent of the variance of the true values in the dataset. As a consequence, the performance of the methods can be expressed directly as a function of the number of observers and the number of cases. It should be kept in mind, however, that the perceived random variation of an observer may result from multiple sources of variation. For instance, the random variations in the radiological assessment of tumor extent may be attributable to the human observer and to the modality used to perform the measurements, see, e.g., Klein Zeggelink et al. 11. If the investigator is interested in separating the contributions of the individual sources of variation that add to the overall perceived random variation of an observer, these independent sources have to be included in the model. Such an approach may, however, require additional observers to participate in the study, in order to obtain a model from which the contributions of the individual sources of variation can be estimated, see, e.g., Ploeger et al. 10.

The three ANOVA-based methods assume that the random variations of observers are independent of the magnitude of the values of the cases, i.e., the random variations are expected to be equally large for cases with small values and for cases with large values. If one wishes to investigate whether this assumption holds, or if the investigator is interested in the random variations of observers as a continuous function of the case values, this may be examined by applying an ANOVA-based method with a sliding window that spans a fixed number of cases. Care should be taken, however, to choose the width of the sliding window, i.e., the fixed number of cases in the subset of the measurement data that is included in the ANOVA-based method, such that sufficiently high accuracy and precision are obtained.
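A minimal sketch of the sliding-window idea follows. It assumes that cases are ordered by their mean measured value across observers, and it uses a simple residual-based moment estimator for the two-way fixed-effects model as a stand-in for whichever ANOVA-based method is actually chosen. The function names `estimate_variances` and `sliding_window_variations`, and the ordering criterion, are illustrative assumptions rather than part of the original method.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def estimate_variances(data: Sequence[Sequence[float]]) -> List[float]:
    """Moment estimates of each observer's random-variation variance under the
    assumed model y_ij = c_i + b_j + e_ij (fixed case values and biases)."""
    n, m = len(data), len(data[0])
    if m < 3 or n < 2:
        raise ValueError("need at least 3 observers and 2 cases")
    case_means = [fmean(row) for row in data]
    obs_means = [fmean(row[j] for row in data) for j in range(m)]
    grand = fmean(case_means)
    # Mean squared double-centred residual per observer.
    v = [sum((data[i][j] - case_means[i] - obs_means[j] + grand) ** 2
             for i in range(n)) / (n - 1) for j in range(m)]
    total = m / (m - 1) * sum(v)  # estimate of sum_k sigma_k^2
    return [m * (vj - total / m ** 2) / (m - 2) for vj in v]


def sliding_window_variations(data: Sequence[Sequence[float]], width: int,
                              step: int = 1) -> List[Tuple[float, List[float]]]:
    """Return (mean case value in window, per-observer variance estimates)
    for consecutive windows of `width` cases, ordered by case mean."""
    if width < 2 or width > len(data):
        raise ValueError("width must be between 2 and the number of cases")
    ordered = sorted(data, key=fmean)  # order cases by mean over observers
    results = []
    for start in range(0, len(ordered) - width + 1, step):
        window = ordered[start:start + width]
        centre = fmean(fmean(row) for row in window)
        results.append((centre, estimate_variances(window)))
    return results
```

Each window yields one set of per-observer estimates, so the window width trades resolution along the size axis against the precision of each estimate, which is the trade-off the text warns about. Because the ordering uses observed rather than true values, a small bias can arise when the observers' random variations differ.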

The F test, t test, and the LR test evaluated in this study are valid specifically to assess the significance of a difference in the random variation of an observer between two groups of cases. The significance of a difference between the random variations of two observers using the same dataset cannot be assessed by these tests, because their measurements are correlated through the identical cases. The established test of Pitman 15 and Morgan 16 can be used instead to assess the significance of a difference between the random variations of two observers performing measurements on the same cases.
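The Pitman–Morgan test rests on the identity Cov(X + Y, X − Y) = Var(X) − Var(Y): the variances of two paired measurements are equal exactly when their sums and differences are uncorrelated. For two observers measuring the same cases under an additive model (shared case values plus an observer-specific bias and random variation), Var(X) − Var(Y) reduces to the difference between their random-variation variances. The sketch below is a minimal standard-library illustration; the function name `pitman_morgan_test` is hypothetical, and the p value uses a normal approximation to the t distribution with n − 2 degrees of freedom, which is only adequate for reasonably large numbers of cases.

```python
from math import sqrt
from statistics import NormalDist, fmean
from typing import Sequence, Tuple


def pitman_morgan_test(x: Sequence[float],
                       y: Sequence[float]) -> Tuple[float, float, float]:
    """Test H0: Var(X) == Var(Y) for paired measurements on the same cases.

    Returns (r, t, p): r is the Pearson correlation between the sums and the
    differences, t = r * sqrt(n - 2) / sqrt(1 - r^2) follows a t distribution
    with n - 2 degrees of freedom under H0, and p is a two-sided p value from
    a normal approximation to that t distribution (large-sample assumption).
    """
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need two paired samples of equal length >= 4")
    s = [a + b for a, b in zip(x, y)]   # sums
    d = [a - b for a, b in zip(x, y)]   # differences
    ms, md = fmean(s), fmean(d)
    s_sd = sum((si - ms) * (di - md) for si, di in zip(s, d))
    s_ss = sum((si - ms) ** 2 for si in s)
    d_ss = sum((di - md) ** 2 for di in d)
    r = s_sd / sqrt(s_ss * d_ss)
    t = r * sqrt(n - 2) / sqrt(1.0 - r * r)
    p = 2.0 * NormalDist().cdf(-abs(t))
    return r, t, p
```

A negative r indicates that the first observer has the smaller random variation. With SciPy available, the exact two-sided p value would be `2 * scipy.stats.t.sf(abs(t), n - 2)`.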

3.4.4. Guidelines to the investigator

3.4.4.1. Choice of method

If a statistical package to perform REML estimation is available, and the investigator feels confident using it, this is the preferred method for studies that involve fewer than 50 cases evaluated by fewer than 6 observers. Otherwise, the RES method is an excellent alternative in most situations, because of its straightforward implementation, its completeness with respect to the provided statistics, and its overall sufficient accuracy, precision, and reliability of the provided statistical test. If, however, the performance of the RES method is not acceptable given the available numbers of cases and observers, it may still be worth the effort to use REML estimation instead, in order to improve performance without including more cases or more observers. This applies especially if the investigator requires almost perfect agreement between the analytical estimates for the standard errors and the true standard errors, or excellent reliability of the statistical test. If neither the RES method nor REML estimation provides sufficient performance, more observers or more cases must be included.

[Figure: flowchart with two parallel branches, one consulting precision and reliability (Figs. 3.2, 3.4) and the other accuracy, precision, and reliability (Figs. 3.1, 3.3, 3.4). Decision nodes: "Considerable differences between random variations of observers expected?", "REML estimation available?", "Performance sufficient?", "Better performance with REML estimation?", and "Number of cases < 100?" (leading to "Increase number of cases" or "Increase number of observers"). Action nodes: choose REML or RES and the number of cases and observers, perform observer measurements, apply REML estimation or the RES method, done.]

Fig. 3.5. Flowchart to guide the investigator in choosing the ANOVA-based method, the number of cases, and the number of observers, depending on the preferred accuracy, precision, agreement between the analytical estimates for the standard errors and the true standard errors, and reliability of the provided statistical test.

3.4.4.2. Number of cases and number of observers

The results of the Monte Carlo simulations may be used directly to determine the number of observers and the number of cases that are required with each evaluated ANOVA-based method, depending on the preferred accuracy (Fig. 3.1), precision (Figs. 3.2 and 3.3), and reliability of the provided statistical test (Fig. 3.4). The flowchart of Fig. 3.5 guides the investigator in choosing these parameters by referring to the relevant figures at each step of the process. The flowchart may be used to determine the required numbers of observers and cases when setting up a new observer study. It may also be applied when a database of observer measurements is already available and the investigator wishes to improve the reliability of the results of the observer study by including more cases, by involving one or more additional observers, or by employing a different ANOVA-based method. Apart from these quantitatively established guidelines, whether it is more beneficial to add cases or to include more observers also depends on the specific situation, e.g., on time and workload constraints in clinical practice.

3.5. CONCLUSION

Many different techniques exist to assess the random variations of observers. Among these, analysis-of-variance- (ANOVA-) based methods are the most appropriate and versatile, but the number of observers and the number of cases in an observer study must be taken into account to obtain meaningful results. In this study, we assessed the performance of three representative ANOVA-based methods for typical numbers of observers and typical numbers of cases. Guidelines are given to the investigator on which method to use and how many observers and cases are required to obtain the performance chosen a priori.


ACKNOWLEDGMENTS

The authors would like to thank Professor Dr. H. Bartelink and Dr. S. H. Muller for their thorough examination of the manuscript and the useful discussions on the research. This work was financially supported by the Dutch Cancer Society, Grant No. NKI 99-2035.

REFERENCES

1 C. Rasch, A. Eisbruch, P. Remeijer, L. Bos, M. Hoogeman, M. van Herk, and J. V. Lebesque, "Irradiation of paranasal sinus tumors, a delineation and dose comparison study," Int. J. Radiat. Oncol. Biol. Phys. 52, 120-127 (2002).

2 C. W. Hurkmans, J. H. Borger, B. R. Pieters, N. S. Russell, E. P. Jansen, and B. J. Mijnheer, "Variability in target volume delineation on CT scans of the breast," Int. J. Radiat. Oncol. Biol. Phys. 50, 1366-1372 (2001).

3 C. M. Rutter and S. Taplin, "Assessing mammographers' accuracy. A comparison of clinical and test performance," J. Clin. Epidemiol. 53, 443-450 (2000).

4 C. Boetes, R. D. Mus, R. Holland, J. O. Barentsz, S. P. Strijk, T. Wobbes, J. H. Hendriks, and S. H. Ruys, "Breast tumors: comparative accuracy of MR imaging relative to mammography and US for demonstrating extent," Radiology 197, 743-747 (1995).

5 K. G. Gilhuijs, A. Touw, M. van Herk, and R. E. Vijlbrief, "Optimization of automatic portal image analysis," Med. Phys. 22, 1089-1099 (1995).

6 P. Remeijer, C. Rasch, J. V. Lebesque, and M. van Herk, "A general methodology for three-dimensional analysis of variation in target volume delineation," Med. Phys. 26, 931-940 (1999).

7 A. Heydorn, B. K. Ersboll, M. Hentzer, M. R. Parsek, M. Givskov, and S. Molin, "Experimental reproducibility in flow-chamber biofilms," Microbiology 146 (Pt 10), 2409-2415 (2000).

8 A. E. Saarnak, C. W. Hurkmans, B. R. Pieters, R. A. Valdes Olmos, L. J. Schultze Kool, A. A. Hart, and S. H. Muller, "Accuracy of internal mammary lymph node localization using lymphoscintigraphy, sonography and CT," Radiother. Oncol. 65, 79-88 (2002).

9 L. S. Ploeger, A. Betgen, K. G. Gilhuijs, and M. van Herk, "Feasibility of geometrical verification of patient set-up using body contours and computed tomography data," Radiother. Oncol. 66, 225-233 (2003).

10 L. S. Ploeger, M. Frenay, A. Betgen, J. A. de Bois, K. G. Gilhuijs, and M. van Herk, "Application of video imaging for improvement of patient set-up," Radiother. Oncol. 68, 277-284 (2003).


11 W. F. Klein Zeggelink, E. E. Deurloo, H. Bartelink, E. J. Rutgers, and K. G. Gilhuijs, "Reproducibility of the assessment of tumor extent in the breast using multiple imaging modalities," Med. Phys. 30, 2919-2926 (2003).

12 G. Verbeke and G. Molenberghs, "Linear Mixed Models in Practice: A SAS-Oriented Approach," Lecture Notes in Statistics 126, Springer-Verlag, New York, NY (1997).

13 P. Therasse, S. G. Arbuck, E. A. Eisenhauer, J. Wanders, R. S. Kaplan, L. Rubinstein, J. Verweij, M. van Glabbeke, A. T. van Oosterom, M. C. Christian, and S. G. Gwyther, "New guidelines to evaluate the response to treatment in solid tumors. European Organization for Research and Treatment of Cancer, National Cancer Institute of the United States, National Cancer Institute of Canada," J. Natl. Cancer Inst. 92, 205-216 (2000).

14 K. G. Gilhuijs, M. L. Giger, and U. Bick, "A method for computerized assessment of tumor extent in contrast-enhanced MR images of the breast," In: K. Doi, H. MacMahon, M. L. Giger, K. R. Hoffmann, eds. Computer-aided diagnosis in medical imaging. Amsterdam, The Netherlands: Elsevier Science, 305-310 (1999).

15 E. J. Pitman, "A note on normal correlation," Biometrika 31, 9-12 (1939).

16 W. A. Morgan, "A test for the significance of the difference between the two variances in a sample from a normal bivariate population," Biometrika 31, 13-19 (1939).

CHAPTER 4

Reproducibility of the assessment of tumor extent in the breast using multiple image modalities

William F. A. Klein Zeggelink 1, Eline E. Deurloo 1, Harry Bartelink 2, Emiel J. Th. Rutgers 3, and Kenneth G. A. Gilhuijs 1

1 Department of Radiology, The Netherlands Cancer Institute / Antoni van Leeuwenhoek Hospital, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands

2 Department of Radiotherapy, The Netherlands Cancer Institute / Antoni van Leeuwenhoek Hospital, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands

3 Department of Surgery, The Netherlands Cancer Institute / Antoni van Leeuwenhoek Hospital, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands

Medical Physics, Vol. 30, No. 11, November 2003, pp. 2919 – 2926

Reproducibility of the assessment of tumor extent 77


ABSTRACT

The accuracy of breast-conserving therapy (BCT) is limited by uncertainties in the assessment of tumor extent. These uncertainties may result in treatment volumes that are too wide, leading to undesirable cosmetic results, or too narrow, leading to a higher probability of local recurrence. The aim of this study is to quantify the reproducibility of the assessment of tumor extent in the breast at preoperative diagnostic imaging with multiple imaging modalities and at pathology, applied to (1) determining minimum surgical safety margins to reduce the probability of underestimating the tumor extent due to uncertainty in the radiological assessment, and (2) defining the minimum difference between two measurements of tumor size that indicates a significant reduction of tumor extent in response to neoadjuvant chemotherapy or hormonal therapy.

Measurements of the largest tumor diameter in mammography, ultrasonography, contrast-enhanced magnetic resonance imaging, and at pathology were retrieved, retrospectively, for 105 patients eligible for BCT. An analysis of variance technique is employed to separate uncertainty at preoperative diagnostic imaging from uncertainty at pathology. The random variations are thus calculated independently of the systematic deviations, avoiding the necessity to choose a gold standard. Moreover, the technique does not require repeat measurements of tumor extent, thus allowing the use of data that is obtained in daily clinical practice, while avoiding bias due to recollection. The magnitude of the random variations is used to determine minimum surgical safety margins and to define the minimum significant difference between two measurements of tumor size.

The overall random variations in the assessment of tumor extent are on the order of 3 mm (1 SD), with only small differences (about 0.3 mm) between the four techniques. The dependence of the random variations on tumor size was significant (p < 0.05) for mammography (2.7 mm vs 4.2 mm, 1 SD) and ultrasonography (2.5 mm vs 3.8 mm, 1 SD) for tumors up to 17 mm compared with larger tumors. A minimum surgical safety margin on the order of 5 mm for tumors up to 17 mm and 7 mm for larger tumors effectively takes the uncertainty in the radiological assessment of tumor extent into account in 95% of the performed surgical procedures. A minimum difference in largest tumor diameter of 7 mm for tumors up to 17 mm and 9 mm for larger tumors indicates a significant (p < 0.05) reduction of tumor extent in response to neoadjuvant chemotherapy or hormonal therapy.

The reproducibility of the assessment of tumor extent at preoperative diagnostic imaging is of comparable magnitude to the reproducibility at pathology. The uncertainty in the preoperative assessment of tumor extent constitutes a large portion (5 – 7 mm) of the current safety margin in breast-conserving surgery (10 mm). In monitoring response to neoadjuvant chemotherapy or hormonal therapy using repeat imaging before and after treatment, the current clinical guidelines may produce approximately 10% false-positive responses for tumors between 20 and 30 mm.

78 Chapter 4


4.1. INTRODUCTION

Breast cancer is a major cause of early death among women in Western countries. Breast-conserving therapy (BCT) is offered to patients with early breast cancer as an alternative to radical mastectomy. Large prospective randomized trials, such as those by Veronesi et al. 1 and van Dongen et al. 2, showed that survival rates after BCT are equivalent to those after radical mastectomy.

Surgery is a major contributor to achieving local control. Studies performed by Park et al. 3, Wazer et al. 4, and Spivack et al. 5 showed that incomplete excision of the primary tumor increases the probability of local recurrence. Harris et al. 6 demonstrated that complete excision of the primary tumor maximizes local control. Conversely, Vrieling et al. 7 showed that large excision volumes lead to adverse side effects such as a poor cosmetic outcome. Consequently, accurate assessment of tumor extent is important to limit the number of incomplete excisions and thereby achieve local control with a good cosmetic outcome.

Accurate assessment of tumor extent is also relevant in other components of the treatment plan, e.g., deciding whether or not a patient is eligible for BCT; deciding whether or not to treat a patient with neoadjuvant chemotherapy to reduce the extent of the tumor in order to achieve eligibility for BCT; deciding whether or not to treat a patient with chemotherapy after surgery; and deciding whether or not a given treatment has resulted in a relevant change in tumor extent. The extent of the tumor is involved as a primary factor in all of these decision-making procedures.

Unfortunately, uncertainties in the assessment of tumor extent at preoperative diagnostic imaging and at pathology introduce uncertainties in BCT that may negatively affect local control and cosmesis. The uncertainty consists of two components: a systematic deviation and a random variation. The systematic deviation is the average discrepancy between measurements and a chosen gold standard that is assumed to represent the truth. The random variation is the arbitrary scatter of measurements around the systematic deviation.

Findings at pathology are commonly considered to be the gold standard. It may be possible to compensate for systematic deviations in the assessment of tumor extent between preoperative imaging and pathology once these systematic deviations are known. It is, however, more difficult to compensate for the random variations in the assessment. Nonetheless, these variations need to be taken into account in the margins of treatment. To optimize the current qualitatively established guidelines, reliable estimation of the reproducibility of the assessment of tumor extent is required.

Several studies have focused on the mean discrepancies between the perceived tumor extent at preoperative diagnostic imaging and tumor extent at pathology, e.g., Boetes et al. 8, Mumtaz et al. 9, Davis et al. 10, and Amano et al. 11. These studies, however, primarily addressed the systematic deviations between imaging and pathology, and little attention was directed toward the random variations in the assessment of tumor extent. Furthermore, the random variations at pathology were not separated from those at preoperative imaging. As a consequence, the reported variations are the combined variations of preoperative diagnostic imaging and pathology. To our knowledge, the random variations in the assessment of tumor extent at preoperative imaging have not previously been reported separately from those at pathology.

In the current study, an analysis of variance technique is employed to separate uncertainty at preoperative diagnostic imaging from uncertainty at pathology. The random variations are thus calculated independently of the systematic deviations, avoiding the necessity to choose a gold standard. Moreover, the technique does not require repeat measurements of tumor extent, thus allowing the use of data that is obtained in daily clinical practice, while avoiding bias due to recollection.

The aim of this study is to quantify the reproducibility of the assessment of tumor extent in the breast at preoperative diagnostic imaging with multiple imaging modalities and at pathology, applied to (1) determining minimum surgical safety margins to reduce the probability of underestimating the tumor extent due to uncertainty in the radiological assessment, and (2) defining the minimum difference between two measurements of tumor size that needs to be observed to discriminate between measurement variability and significant reduction of tumor extent in response to neoadjuvant chemotherapy or hormonal therapy.

4.2. MATERIALS AND METHODS

The reproducibility of the assessment of tumor extent in the breast is determined by quantifying the random variations in the measurements of the largest tumor diameter at preoperative diagnostic imaging and at pathology. For this purpose, an analysis of variance method is employed. The magnitude of the random variations in the measurements of the largest tumor diameter is used to determine minimum surgical safety margins and to define the minimum significant difference between two measurements of tumor size.

4.2.1. Patients and tumors

This study was performed retrospectively using the data of 105 breast cancer patients who were treated in The Netherlands Cancer Institute / Antoni van Leeuwenhoek Hospital between 14 January 1999 and 14 December 2002. The age of the included patients at the time of treatment ranged from 27 to 88 years (mean ± 1 SD = 55 ± 11 years). One hundred and five tumors are included in this study. The majority of the tumors are invasive ductal carcinoma (96), followed by invasive lobular carcinoma (6), colloid (mucinous) carcinoma (1), medullary carcinoma (1), and mixed invasive pattern (1).


4.2.2. Imaging techniques and assessment of tumor extent

For each of the 105 patients, mammography, ultrasonography, contrast-enhanced magnetic resonance (MR) imaging, and examination at pathology were performed. For each of the 105 malignancies, the largest tumor diameter was measured using each of these four techniques. The total set of measurements thus constitutes 105 tumors × 4 techniques. The preoperative measurements were performed by a team of radiologists during the regular clinical workup. Each radiologist thus contributed to a different subset of the total set of measurements, without radiologist-specific assignment of tumors or imaging modalities. A similar procedure applies to pathology, where the measurements were performed by a team of pathologists. In the following paragraphs, the assessment of the largest tumor diameter is described for each of the four techniques separately.

4.2.2.1. Mammography

Two types of mammography machines are used in this study. The first system is the Trex Lorad M-IV (Trex Medical Corporation, Lorad Division, Danbury, CT). The second system is the Philips Mammo Diagnost 3000 (Philips Medical Systems AG, Hamburg, Germany). The breasts of each patient are imaged in accordance with the standard craniocaudal and mediolateral oblique projections. In each view, the radiologist measures the tumor diameter in two orthogonal directions using a ruler on the films, resulting in four measurements. The largest of these measurements is reported as the largest tumor diameter.

4.2.2.2. Ultrasonography

Two types of ultrasonography machines are used in this study. The first system is the Kretz Voluson 730 (Kretztechnik AG, Zipf, Austria). The second system is the Siemens Sonoline Elegra (Siemens Medical Systems AG, Erlangen, Germany). Both machines are used to obtain two-dimensional (2D) cross-sectional images of the breasts. The breasts are scanned using a 7.5 MHz transducer. During ultrasonography, the radiologist first locates the tumor by searching for it in transverse and longitudinal cross sections through the breast. Once the tumor is detected, the radiologist searches for the largest tumor diameter in each of two perpendicular views. The radiologist measures the tumor diameter in two orthogonal directions in each of the two perpendicular views, resulting in four measurements. The largest of these measurements is reported as the largest tumor diameter. The tumor diameters are measured either with a software caliper in the viewer application of the ultrasound system or with a ruler on film prints of the images.


4.2.2.3. Contrast-enhanced MR imaging

The MR scanner used in this study is a 1.5 T Siemens Magnetom (Siemens Medical Systems AG, Erlangen, Germany). Fast low-angle shot three-dimensional (FLASH 3D) acquisition is used. The patients are positioned on the MR scanner table in prone orientation. Both breasts of each patient are imaged simultaneously using a dedicated double-breast array coil. One series of precontrast MR images is obtained. Four to five series of postcontrast MR images are then acquired after administration of 0.1 mmol/kg body weight of contrast agent (ProHance, Bracco-Byk Gulden AG, Konstanz, Germany) at a rate of 2 – 4 ml/s. The following MR imaging parameters are used: T1-weighted sequence, repetition time (TR) 8.1 ms, echo time (TE) 4.0 ms, and no fat suppression. The reconstructed MR in-plane matrix contains 256 × 256 pixels, with an isotropic in-plane resolution of 1.2 × 1.2 mm² or 1.4 × 1.4 mm², and a section thickness of 1.7 mm or 1.4 mm. The MR images are reconstructed in two orthogonal views: craniocaudal (CC) and anterior-posterior (AP). The tumor diameters are measured in one of two ways. In the first method, the radiologist uses a clinical viewing station and steps through the set of MR images slice by slice. Once the slice in which the tumor appears largest is found, the diameter is measured in two orthogonal directions using a software caliper in the viewer application. This is done for the CC and the AP views separately, resulting in four measurements in total. The largest of these measurements is reported as the largest tumor diameter. In the second method, the radiologist follows a similar procedure using a ruler on film prints of the images.

4.2.2.4. Pathology

The excision specimen is examined before fixation in formaldehyde. First, the specimen is cooled at –15 °C for 20 minutes. Next, the specimen is cut into slices perpendicular to its longest axis. The slices are of comparable thickness, approximately 5 mm. The pathologist measures the largest tumor diameter macroscopically using a ruler. In case of doubt, the largest tumor diameter is measured by microscopic examination.

4.2.3. Analysis of variance method

The random variations in the measurements of the largest tumor diameter are quantified using a method based on analysis of variance, originally described by Gilhuijs et al. 12. In the remainder of this paper, the four techniques are denoted X for mammography, U for ultrasound imaging, M for MR imaging, and P for pathology.


The following model describes the measurements of the largest tumor diameter by each of the four techniques in the dataset of N tumors:

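$$
\begin{aligned}
X_i &= \mu_i + m_X + \delta_i^{(X)},\\
U_i &= \mu_i + m_U + \delta_i^{(U)},\\
M_i &= \mu_i + m_M + \delta_i^{(M)},\\
P_i &= \mu_i + m_P + \delta_i^{(P)}, \qquad 1 \le i \le N.
\end{aligned}
\tag{4.1}
$$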

Here, $X_i$, $U_i$, $M_i$, and $P_i$ denote the largest diameter of tumor $i$ measured by each of the four techniques, and $\mu_i$ represents the true but unknown largest diameter of that tumor. The systematic deviations of the four techniques with regard to the truth are denoted $m_X$, $m_U$, $m_M$, and $m_P$, while the random variations of the four techniques are given by $\delta_i^{(X)}$, $\delta_i^{(U)}$, $\delta_i^{(M)}$, and $\delta_i^{(P)}$.

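The random variations are obtained as follows. Consider the pairwise differences between the largest tumor diameters $X_i$, $U_i$, $M_i$, and $P_i$:

$$
\begin{aligned}
X_i - P_i &= (m_X - m_P) + \bigl(\delta_i^{(X)} - \delta_i^{(P)}\bigr),\\
U_i - X_i &= (m_U - m_X) + \bigl(\delta_i^{(U)} - \delta_i^{(X)}\bigr),\\
U_i - P_i &= (m_U - m_P) + \bigl(\delta_i^{(U)} - \delta_i^{(P)}\bigr),\\
M_i - U_i &= (m_M - m_U) + \bigl(\delta_i^{(M)} - \delta_i^{(U)}\bigr),\\
M_i - P_i &= (m_M - m_P) + \bigl(\delta_i^{(M)} - \delta_i^{(P)}\bigr),\\
M_i - X_i &= (m_M - m_X) + \bigl(\delta_i^{(M)} - \delta_i^{(X)}\bigr).
\end{aligned}
\tag{4.2}
$$

Note that the true but unknown largest tumor diameter, $\mu_i$, is eliminated by the pairwise subtractions.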

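Assuming that the random variations of the four techniques are mutually independent, the variances of the pairwise differences between the largest tumor diameters $X_i$, $U_i$, $M_i$, and $P_i$ are given by:

$$
\begin{aligned}
\sigma_{XP}^2 &= \sigma_{\delta X}^2 + \sigma_{\delta P}^2,\\
\sigma_{UX}^2 &= \sigma_{\delta U}^2 + \sigma_{\delta X}^2,\\
\sigma_{UP}^2 &= \sigma_{\delta U}^2 + \sigma_{\delta P}^2,\\
\sigma_{MU}^2 &= \sigma_{\delta M}^2 + \sigma_{\delta U}^2,\\
\sigma_{MP}^2 &= \sigma_{\delta M}^2 + \sigma_{\delta P}^2,\\
\sigma_{MX}^2 &= \sigma_{\delta M}^2 + \sigma_{\delta X}^2.
\end{aligned}
\tag{4.3}
$$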

Here, $\sigma_{XP}^2$, $\sigma_{UP}^2$, $\sigma_{MP}^2$, $\sigma_{UX}^2$, $\sigma_{MU}^2$, and $\sigma_{MX}^2$ denote the variances of the pairwise differences between the largest tumor diameters $X_i$, $U_i$, $M_i$, and $P_i$. The variances of the random variations of the four techniques are represented by $\sigma_{\delta X}^2$, $\sigma_{\delta U}^2$, $\sigma_{\delta M}^2$, and $\sigma_{\delta P}^2$. Note that the systematic deviations of the four techniques with regard to the truth, $m_X$, $m_U$, $m_M$, and $m_P$, are eliminated because they are constants and therefore have zero variance.
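For reference, Eq. (4.3) can be restated in matrix form, with the six pairwise variances in the order of Eq. (4.3) and the unknowns ordered X, U, M, P. The matrix layout is our own restatement of the same equations:

$$
\underbrace{\begin{pmatrix}
1 & 0 & 0 & 1\\
1 & 1 & 0 & 0\\
0 & 1 & 0 & 1\\
0 & 1 & 1 & 0\\
0 & 0 & 1 & 1\\
1 & 0 & 1 & 0
\end{pmatrix}}_{A}
\begin{pmatrix}
\sigma_{\delta X}^2\\ \sigma_{\delta U}^2\\ \sigma_{\delta M}^2\\ \sigma_{\delta P}^2
\end{pmatrix}
=
\begin{pmatrix}
\sigma_{XP}^2\\ \sigma_{UX}^2\\ \sigma_{UP}^2\\ \sigma_{MU}^2\\ \sigma_{MP}^2\\ \sigma_{MX}^2
\end{pmatrix}
$$

Each unknown appears in three of the six rows, and every pair of unknowns shares exactly one row. Therefore $A^\mathsf{T}A = 2I + J$, where $J$ is the all-ones matrix. Its eigenvalues are 2 (threefold) and 6, so it is invertible. The least-squares solution of the overdetermined system is therefore unique, and an SVD-based pseudoinverse returns exactly that solution.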

The system of linear equations given by Eq. (4.3) contains more equations (six) than unknowns (four) and is therefore overdetermined. Singular value decomposition is used to calculate the least-squares solution for the variances of the random variations of the four techniques. Taking the square root of these variances yields the standard deviations of the random variations of the four techniques. In the remainder of this paper, these standard deviations are denoted $\sigma_{\delta X}$, $\sigma_{\delta U}$, $\sigma_{\delta M}$, and $\sigma_{\delta P}$.
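The procedure of Sec. 4.2.3 can be sketched in Python with NumPy as follows. This is a minimal illustration, not the original implementation. The following are assumptions of the sketch: the function name `random_variations`, the column order (X, U, M, P), the use of sample variances with `ddof=1`, and clipping negative variance estimates to zero. `np.linalg.lstsq` solves the overdetermined system of Eq. (4.3) in the least-squares sense using an SVD-based LAPACK routine, which corresponds to the singular value decomposition mentioned above. The usage part checks the function on synthetic data. The Table 4.2 values serve as hypothetical ground truth, and the systematic deviations are set to arbitrary values that should not affect the result.

```python
import numpy as np

# Column order of the measurement matrix: mammography (X), ultrasound (U),
# MR imaging (M), pathology (P). This ordering is an assumption of the sketch.
X, U, M, P = range(4)

# Pairwise differences in the order of Eq. (4.3): XP, UX, UP, MU, MP, MX.
PAIRS = [(X, P), (U, X), (U, P), (M, U), (M, P), (M, X)]


def random_variations(measurements):
    """Estimate sigma_delta per technique from an (N, 4) array of largest diameters in mm."""
    meas = np.asarray(measurements, dtype=float)
    design = np.zeros((len(PAIRS), 4))
    pair_var = np.empty(len(PAIRS))
    for row, (j, k) in enumerate(PAIRS):
        # mu_i cancels in the difference; the constant m_j - m_k adds no variance.
        pair_var[row] = np.var(meas[:, j] - meas[:, k], ddof=1)
        design[row, j] = design[row, k] = 1.0
    # Least-squares solution of the overdetermined 6 x 4 system (SVD-based).
    variances, *_ = np.linalg.lstsq(design, pair_var, rcond=None)
    # Noisy data can give negative variance estimates; clip them to zero.
    return np.sqrt(np.clip(variances, 0.0, None))


# Usage on synthetic data (hypothetical values except the Table 4.2 SDs).
rng = np.random.default_rng(0)
n_tumors = 20_000
true_diameter = rng.normal(17.0, 6.5, size=n_tumors)   # mu_i
systematic = np.array([-0.6, -1.5, 0.5, 0.0])          # m_X, m_U, m_M, m_P
sigma_true = np.array([3.4, 3.1, 2.8, 3.2])            # Table 4.2
meas = (true_diameter[:, None] + systematic
        + rng.normal(0.0, sigma_true, size=(n_tumors, 4)))
print(np.round(random_variations(meas), 2))
```

With a large synthetic dataset, the recovered standard deviations lie close to the values used to generate the data. Shifting any column by a constant leaves them unchanged, which mirrors the elimination of $m_X$ through $m_P$ in Eq. (4.3).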


4.2.4. Assessment of reliability

A Monte Carlo simulation of the method described in Sec. 4.2.3 is performed to assess its reliability. Largest tumor diameters are simulated by generating sets of 10, 20, 50, 100, 200, 500, and 1000 random samples from a normal distribution. The standard deviations of the random variations of the four techniques, as solved from Eq. (4.3), are used to simulate the measurements of these largest tumor diameters. For each set of simulated measurements, the standard deviations of the random variations of the four techniques are recalculated using the method described in Sec. 4.2.3. This is repeated 1000 times for each set size. In each repetition, we calculate the differences between these recalculated standard deviations and those obtained from the dataset of 105 tumors. The standard error (1 SD) of the distribution of these differences serves as the measure of reliability of the analysis of variance method.
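The reliability check of Sec. 4.2.4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the original simulation code. The true diameters are drawn from a normal distribution with mean 17 mm and SD 6.5 mm, values taken loosely from Table 4.1 because the text does not state the parameters. The Table 4.2 values serve as the reference standard deviations. The sketch uses 200 repetitions instead of 1000 to keep the run short, and the function names are hypothetical. In the model of Eq. (4.1), the true diameters cancel in every pairwise difference, so their distribution has no influence on the result.

```python
import numpy as np

# Pairs (j, k) in the order of Eq. (4.3); columns ordered X, U, M, P.
PAIRS = [(0, 3), (1, 0), (1, 3), (2, 1), (2, 3), (2, 0)]
DESIGN = np.zeros((len(PAIRS), 4))
for row, (j, k) in enumerate(PAIRS):
    DESIGN[row, [j, k]] = 1.0


def random_variations(meas):
    """Least-squares (SVD-based) solution of Eq. (4.3), returned as SDs in mm."""
    pair_var = np.array([np.var(meas[:, j] - meas[:, k], ddof=1) for j, k in PAIRS])
    variances, *_ = np.linalg.lstsq(DESIGN, pair_var, rcond=None)
    return np.sqrt(np.clip(variances, 0.0, None))  # clip negative estimates to zero


def monte_carlo_standard_error(n_tumors, sigma_ref, reps=1000, mean=17.0, sd=6.5, seed=0):
    """SD over `reps` simulations of (estimated - reference) random variation, per technique."""
    rng = np.random.default_rng(seed)
    sigma_ref = np.asarray(sigma_ref, dtype=float)
    errors = np.empty((reps, sigma_ref.size))
    for r in range(reps):
        mu = rng.normal(mean, sd, size=n_tumors)  # simulated true largest diameters
        meas = mu[:, None] + rng.normal(0.0, sigma_ref, size=(n_tumors, sigma_ref.size))
        errors[r] = random_variations(meas) - sigma_ref
    return errors.std(axis=0, ddof=1)


sigma_ref = [3.4, 3.1, 2.8, 3.2]  # Table 4.2: mammography, ultrasound, MR, pathology
se = {n: monte_carlo_standard_error(n, sigma_ref, reps=200)
      for n in (10, 20, 50, 100, 200, 500, 1000)}
for n, s in se.items():
    print(f"N = {n:4d}: standard error per technique (mm) = {np.round(s, 2)}")
```

Under these assumptions, the standard error falls roughly as $1/\sqrt{N}$. The absolute values depend on the assumed reference standard deviations.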

4.2.5. Analysis of the data

The overall random variations of the four techniques are quantified by applying the analysis of variance on the complete dataset of 105 tumors. The tumor-size dependence of the random variations of the four techniques is studied by splitting the dataset of 105 tumors and applying the analysis of variance to each part. The mean value of the largest tumor diameters in the dataset is used as the separation criterion to form a subgroup of small tumors and a subgroup of large tumors.

4.2.6. Determination of minimum surgical safety margins

Minimum surgical safety margins are determined to reduce the probability of underestimating the tumor extent due to uncertainty in the radiological assessment. This is of particular interest for non-palpable tumors, where the surgeon is directed to the tumor by a hook wire and the tumor extent must be obtained from imaging. Adding the minimum surgical safety margin to the preoperatively measured largest tumor diameter prevents underestimation of the tumor extent due to radiological uncertainty up to a chosen a priori confidence level.

Minimum surgical safety margins are calculated for each of the three preoperative imaging modalities, and for both the subgroup of small tumors and the subgroup of large tumors. Each margin is calculated according to:

$$
S = c \cdot \sigma_{\delta,\mathrm{mod}}, \tag{4.4}
$$

where $\sigma_{\delta,\mathrm{mod}}$ is $\sigma_{\delta X}$, $\sigma_{\delta U}$, or $\sigma_{\delta M}$, depending on which preoperative imaging modality the radiologist used to assess the tumor extent. Furthermore, $c$ is a multiplication factor equal to the one-sided quantile of the standard normal distribution at the desired a priori confidence level. To prevent underestimation of the tumor extent due to radiological uncertainty in 90%, 95%, or 99% of the performed surgical procedures, $c$ equals 1.28, 1.65, or 2.33, respectively.


Note that the minimum surgical safety margins account only for uncertainty in tumor extent. They do not account for uncertainty in tumor localization or in surgical precision.
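The following Python sketch implements Eq. (4.4) using the Table 4.3 standard deviations and the multiplication factors quoted above. The factors 1.28, 1.65, and 2.33 correspond to the one-sided 90%, 95%, and 99% quantiles of the standard normal distribution (about 1.282, 1.645, and 2.326), which `statistics.NormalDist().inv_cdf` returns directly. Rounding each margin up to the next whole millimetre reproduces the values listed in Table 4.4. That rounding rule is our inference from the tabulated numbers, not a step stated in the text, and the dictionary and function names are illustrative.

```python
import math
from statistics import NormalDist

# One-sided factors as quoted in the text, and the exact normal quantiles for comparison.
C_TEXT = {0.90: 1.28, 0.95: 1.65, 0.99: 2.33}
C_EXACT = {level: NormalDist().inv_cdf(level) for level in C_TEXT}

# Random variations (1 SD, mm) from Table 4.3: (small tumors < 17 mm, large tumors >= 17 mm).
SIGMA_MOD = {
    "Mammography": (2.7, 4.2),
    "Ultrasound imaging": (2.5, 3.8),
    "MR imaging": (2.8, 2.9),
}


def safety_margin(sigma_mod: float, c: float) -> float:
    """Eq. (4.4): S = c * sigma_delta,mod, in mm."""
    return c * sigma_mod


def margin_table(c_by_level=C_TEXT):
    """Margins rounded up to whole mm: small 90/95/99 %, then large 90/95/99 %."""
    return {
        name: tuple(
            math.ceil(safety_margin(sigma, c_by_level[level]))
            for sigma in sigmas
            for level in (0.90, 0.95, 0.99)
        )
        for name, sigmas in SIGMA_MOD.items()
    }


for name, row in margin_table().items():
    print(f"{name:20s}", *row)
```

Using the exact quantiles in `C_EXACT` instead of the rounded factors yields the same whole-millimetre margins for these inputs.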

4.2.7. Definition of the minimum significant difference between two measurements of tumor size

For the purpose of monitoring response to neoadjuvant chemotherapy or hormonal therapy, we define the minimum difference between two measurements of tumor size that needs to be observed to discriminate between measurement variability and significant reduction of tumor extent. Using the minimum significant difference as the threshold to evaluate response to treatment prevents unjustly classifying an observed difference in tumor size as an actual reduction of tumor extent up to a chosen a priori confidence level.

Minimum significant differences are calculated for each of the three preoperative imaging modalities, and for both the subgroup of small tumors and the subgroup of large tumors. Each difference is calculated according to:

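$$
D = c \cdot \sqrt{2} \cdot \sigma_{\delta,\mathrm{mod}}. \tag{4.5}
$$

The factor $\sqrt{2}$ arises because both measurements are subject to random variation. If these variations are independent, the standard deviation of their difference is $\sqrt{2}\,\sigma_{\delta,\mathrm{mod}}$. To limit the probability of unjustly classifying an observed difference in tumor size as an actual reduction of tumor extent to 10%, 5%, or 1% of the performed radiological examinations, $c$ equals 1.28, 1.65, or 2.33, respectively.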
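Eq. (4.5) can be checked numerically in the same way. The sketch below uses the Table 4.3 standard deviations and rounds each result up to whole millimetres, a rule we infer from the tabulated values rather than one stated in the text. With these inputs it reproduces Table 4.5. It also includes a quick simulation on hypothetical data showing that the difference of two independent measurement errors, each with SD $\sigma$, has an SD close to $\sqrt{2}\,\sigma$. All names in the sketch are illustrative.

```python
import math
import random
import statistics

C = {0.90: 1.28, 0.95: 1.65, 0.99: 2.33}  # one-sided factors as used in the text

# Random variations (1 SD, mm) from Table 4.3: (small < 17 mm, large >= 17 mm).
SIGMA_MOD = {
    "Mammography": (2.7, 4.2),
    "Ultrasound imaging": (2.5, 3.8),
    "MR imaging": (2.8, 2.9),
}


def min_significant_difference(sigma_mod: float, c: float) -> float:
    """Eq. (4.5): D = c * sqrt(2) * sigma_delta,mod, in mm."""
    return c * math.sqrt(2.0) * sigma_mod


def difference_table():
    """Values rounded up to whole mm: small 90/95/99 %, then large 90/95/99 %."""
    return {
        name: tuple(
            math.ceil(min_significant_difference(sigma, C[level]))
            for sigma in sigmas
            for level in (0.90, 0.95, 0.99)
        )
        for name, sigmas in SIGMA_MOD.items()
    }


for name, row in difference_table().items():
    print(f"{name:20s}", *row)

# Why sqrt(2): simulate pairs of independent measurement errors with SD sigma.
rng = random.Random(1)
sigma_check = 3.8
diffs = [rng.gauss(0.0, sigma_check) - rng.gauss(0.0, sigma_check) for _ in range(100_000)]
print(round(statistics.stdev(diffs) / sigma_check, 3))  # close to sqrt(2)
```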

4.3. RESULTS

The mean largest tumor diameter measured in the complete dataset of 105 tumors is about 17 mm, ranging from 15.6 mm (ultrasound imaging) to 17.6 mm (MR imaging) across the four techniques (Table 4.1). The standard deviation (1 SD) of the distribution of the measured largest tumor diameters is about 6.5 mm for each technique. The smallest measured largest tumor diameters range from 4.7 to 8.0 mm across the techniques, and the largest from 38.0 to 40.4 mm.

Table 4.1. Largest tumor diameters in the complete dataset of 105 tumors measured by the four techniques.

Technique Mean (mm) Standard deviation (mm) Minimum (mm) Maximum (mm)

Mammography 16.5 6.3 4.7 40.4

Ultrasound imaging 15.6 6.7 4.9 40.0

MR imaging 17.6 6.7 8.0 40.0

Pathology 17.1 6.5 6.0 38.0


Table 4.2. Overall random variations of the four techniques in the complete dataset of 105 tumors. Also shown is the significance of the difference between the overall random variation of each of the three imaging modalities and the overall random variation of the pathological examination.

Technique Random variation (1 SD) (mm) Significance

Mammography 3.4 NSa

Ultrasound imaging 3.1 NSa

MR imaging 2.8 NSa

Pathology 3.2

aNS: Not significant at a 95% confidence level (p value > 0.05).

Table 4.3. Random variations of the four techniques in the subgroup of 60 small tumors (< 17 mm) and the subgroup of 45 large tumors (≥ 17 mm). Also shown is the significance of the difference between the random variations of each of the four techniques in the two subgroups.

Small tumors (< 17 mm) Large tumors (≥ 17 mm)

Technique Random variation (1 SD) (mm) Random variation (1 SD) (mm) Significance

Mammography 2.7 4.2 0.008

Ultrasound imaging 2.5 3.8 0.022

MR imaging 2.8 2.9 NSa

Pathology 3.3 3.1 NSa

aNS: Not significant at a 95% confidence level (p value > 0.05).

The overall random variations of the four techniques in the complete dataset of 105 tumors are on the order of 3 mm (1 SD). They differ by only a few tenths of a millimetre between the techniques, ranging from 2.8 mm for MR imaging to 3.4 mm for mammography (Table 4.2). The differences in overall random variation between each of the three preoperative imaging modalities and pathology were not statistically significant.

In the subgroup of small tumors (< 17 mm), the random variations of the four techniques are on the order of 2.8 mm (1 SD), with differences of about 0.5 mm between the techniques (Table 4.3). In the subgroup of large tumors (≥ 17 mm), the random variations are on the order of 3.5 mm (1 SD), with differences of approximately 0.8 mm between the techniques. The differences in random variation between the two subgroups are statistically significant for mammography (p = 0.008) and ultrasonography (p = 0.022). Although statistically significant, these differences are small (1.5 mm for mammography and 1.3 mm for ultrasonography). No statistically significant difference is found for MR imaging or pathology.

Table 4.4. Minimum surgical safety margins to reduce the probability of underestimating the gross tumor extent due to uncertainty in the radiological assessment.

Small tumors (< 17 mm) Large tumors (≥ 17 mm)

Technique 90% a priori CLa (mm) 95% a priori CLa (mm) 99% a priori CLa (mm) 90% a priori CLa (mm) 95% a priori CLa (mm) 99% a priori CLa (mm)

Mammography 4 5 7 6 7 10

Ultrasound imaging 4 5 6 5 7 9

MR imaging 4 5 7 4 5 7

aCL: Confidence level.

Table 4.5. Minimum difference between two measurements of tumor size that needs to be observed to discriminate between measurement variability and significant reduction of gross tumor extent in response to neoadjuvant chemotherapy or hormonal therapy.

Small tumors (< 17 mm) Large tumors (≥ 17 mm)

Technique 90% a priori CLa (mm) 95% a priori CLa (mm) 99% a priori CLa (mm) 90% a priori CLa (mm) 95% a priori CLa (mm) 99% a priori CLa (mm)

Mammography 5 7 9 8 10 14

Ultrasound imaging 5 6 9 7 9 13

MR imaging 6 7 10 6 7 10

aCL: Confidence level.

[Fig. 4.1 plot: standard error (1 SD) (mm), axis 0.0 to 1.0, versus number of tumors, axis 0 to 1000; fitted curve: standard error = $3.2994\,N^{-0.5177}$, $R^2 = 0.9995$.]

For the subgroup of small tumors (< 17 mm), the minimum surgical safety margins range between 4 and 7 mm for confidence levels between 90% and 99% (Table 4.4). For the subgroup of large tumors (≥ 17 mm), the minimum surgical safety margins range between 4 and 10 mm for confidence levels between 90% and 99%. As expected, the difference in magnitude of the minimum surgical safety margins between the two subgroups of tumors is observed for mammography and ultrasonography, while no difference is apparent for MR imaging.

In the subgroup of small tumors (< 17 mm), the minimum significant difference between two measurements of tumor size ranges from 5 to 10 mm for confidence levels between 90% and 99% (Table 4.5). In the subgroup of large tumors (≥ 17 mm), the minimum significant difference ranges from 6 to 14 mm for confidence levels between 90% and 99%. Again as expected, the minimum significant difference for MR imaging is the same in both subgroups, whereas it differs between the subgroups for mammography and ultrasonography.

The standard error (1 SD) of the calculated random variations of the four techniques is inversely proportional to the square root of the number of tumors that is included in the analysis of variance (Fig. 4.1). For a dataset on the order of 100 tumors (comparable to our complete dataset of tumors) the standard error of the calculated random variations is about 0.3 mm (1 SD). The standard error of the calculated random variations for a dataset on the order of 50 tumors (comparable to the two subgroups of tumors) is approximately 0.5 mm (1 SD).

Fig. 4.1. Relationship between the number of tumors that is included in the analysis of variance and the standard error of the calculated random variations of the four techniques.


4.4. DISCUSSION

We have presented a method to quantify the random variations in the measurements of the largest tumor diameter in the breast at preoperative diagnostic imaging and at pathology. The overall random variations are on the order of 3 mm (1 SD), with only small differences between mammography, ultrasonography, contrast-enhanced MR imaging, and pathology.

Several factors contribute to the random variation in the measurements of the largest tumor diameter at preoperative imaging. These include variation in the alignment of the largest diameter with the imaging plane, the partial volume effect, variation in the interpretation of the tumor, and uncertainty in establishing the optimal angle of measurement in the imaging plane. Comparable factors contribute to the random variation in the measurements of the largest tumor diameter at pathology. In addition, the specimen slices are relatively thick compared with the typical tumor dimensions in BCT, and deformations may be introduced during the preparation and cutting of the specimen. Apart from these technique-related contributions, the human observer introduces variation as well.

Several investigators have compared the tumor extent visible at preoperative diagnostic imaging with the tumor extent at pathology, e.g., Boetes et al. 8, Mumtaz et al. 9, Davis et al. 10, and Amano et al. 11. These studies, however, reported the combined random variations in the assessment of tumor extent at preoperative imaging and at pathology. As a consequence, the contribution of the imaging modality to the observed random variation cannot be distinguished from the contribution of pathology. In this study, we calculated the random variations in the assessment of tumor extent at preoperative imaging independently from those at pathology. Although the random variations are computed differently, our results are consistent with those of the previous studies in that the random variations are smallest for MR imaging, followed by ultrasonography and mammography.

In this study, we focused on quantifying the random variations in the assessment of tumor extent. Unlike systematic deviations, random variations are difficult to correct for and must instead be taken into account in the treatment margins. Pathology is commonly regarded as the gold standard, so its systematic deviation is assumed to be zero. Nevertheless, the results of this study indicate that the random variation at pathology is of comparable magnitude to the random variation at preoperative diagnostic imaging. Thus, although the gold standard has zero systematic deviation by definition, it does have random variations that need to be taken into account.

A straightforward way to assess reproducibility is to repeat measurements on the same tumor several times and to compute the standard deviation of the results. A drawback of such a method is the large workload involved, because measurements have to be performed repeatedly for multiple tumors and techniques. Consequently, such an analysis is typically restricted to a smaller subset of cases. A major disadvantage of the repeat-measurements method is that it is likely to introduce bias due to recollection. Analysis of variance avoids the need for repeat measurements. This allows the use of data that are already available from daily clinical practice and avoids introducing bias due to recollection.

Compared with the more straightforward repeat-measurements method, however, the analysis of variance method requires a relatively large number of tumors to provide sufficient reliability. We have shown that the standard error (1 SD) of the calculated random variations of the four techniques is inversely proportional to the square root of the number of tumors included in the analysis of variance. The standard error also depends on the number of observers included in the study: the larger the number of observers, the smaller the standard error. To obtain a standard error of, e.g., 10% of the magnitude of the random variations, a dataset on the order of 100 tumors is needed when four observers are included (comparable to our study design). The resulting confidence interval is, however, somewhat wider than it would be when calculating the standard deviation of repeat measurements with the same number of tumors and observers. Furthermore, the analysis of variance method involves somewhat more sophisticated calculations. Nevertheless, in our opinion, the advantages of the analysis of variance method outweigh its disadvantages.
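The power-law fit reported with Fig. 4.1, $\mathrm{SE} = 3.2994\,N^{-0.5177}$, can be evaluated and inverted to estimate how many tumors a study would need. This sketch assumes the fit from this study's simulation applies unchanged. It does not model the number of observers, and the function names are our own. It gives roughly 0.30 mm for 100 tumors and 0.44 mm for 50 tumors. A target of 0.3 mm, i.e. 10% of a random variation of about 3 mm, corresponds to roughly 100 tumors.

```python
# Power-law fit shown with Fig. 4.1: standard error (mm) = A * N**B.
A, B = 3.2994, -0.5177


def standard_error(n_tumors: float) -> float:
    """Standard error (1 SD, mm) of the estimated random variations for N tumors."""
    return A * n_tumors ** B


def tumors_needed(target_se_mm: float) -> float:
    """Invert the power law: number of tumors that gives the requested standard error."""
    return (target_se_mm / A) ** (1.0 / B)


for n in (50, 100):
    print(f"N = {n}: standard error = {standard_error(n):.2f} mm")

target = 0.10 * 3.0  # 10% of a random variation of about 3 mm
print(f"tumors needed for {target:.1f} mm: {tumors_needed(target):.0f}")
```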

The preoperative measurements were performed by a team of radiologists, with each radiologist contributing to a different subset of the total set of measurements. Likewise, the measurements at pathology were performed by a team of pathologists. As a consequence, the observed random variations of each technique reflect both intra- and inter-observer variation. In a more conventional study design, where each radiologist measures every tumor with every technique, the intra- and inter-observer variations can be examined separately. We did not choose such an approach because it would have required an observer study under laboratory conditions. The aim of this study was instead to assess the overall uncertainty in the actual clinical workup by a typical team of experienced clinicians.

The reproducibility of the assessment of tumor extent in the breast is an important component of uncertainty in attempts to improve the accuracy and precision of BCT. It may be possible to reduce the random variations in the assessment of tumor extent by measuring the largest tumor diameter automatically in 2D (mammography, ultrasonography) or in 3D (MR imaging), or to express tumor extent in terms of volume by stacking slices. We have, however, not pursued such an approach, because the purpose of this study was to quantify the reproducibility of the current clinical measurement methods; the largest tumor diameter is a major parameter in the current clinical guidelines.

Surgeons typically aim to achieve a tumor-free margin of at least 1 cm. For palpable tumors, this is done mainly on the basis of palpation, with the outline of the tumor on the mammograms, ultrasound images, or MR images in mind. For non-palpable tumors, however, the surgeon is typically guided toward the tumor by a hook wire placed under mammographic or ultrasonographic guidance. For non-palpable tumors in particular, the surgeon relies heavily on the preoperative diagnostic images and on the radiologist's report of the largest tumor diameter.

Applying the minimum surgical safety margins given here sets, a priori, the probability of underestimating the tumor extent during surgery due to uncertainty in the radiological assessment. For example, consider a tumor of 17 mm or larger that was imaged by ultrasonography. For such a tumor, a minimum surgical safety margin of 7 mm was derived (Table 4.4). Adding this margin to the preoperatively measured largest tumor diameter prevents underestimation of the tumor extent due to radiological uncertainty up to an a priori confidence level of 95%. In other words, the tumor extent is covered in 95 out of 100 cases. Adding a minimum surgical safety margin of 9 mm covers the tumor extent in 99 out of 100 cases.

The current 1 cm margin guideline in breast-conserving surgery aims to cover the uncertainty in tumor extent as well as the uncertainty in tumor localization and in surgical precision. The results of this study demonstrate that the uncertainty in the assessment of tumor extent alone constitutes a large portion (5 – 7 mm, 95% CL, overall), if not almost all (6 – 10 mm, 99% CL, overall), of the current safety margin of 1 cm in breast-conserving surgery. On the other hand, because of finite surgical precision, the actual distance between the tumor boundary and the boundary of the excision specimen is often larger than the planned 1 cm; see, e.g., Christiaens et al. 13. Consequently, once techniques become available to achieve 1 cm margins accurately in breast-conserving surgery, the effect of measurement uncertainty is likely to become more prominent.

Tumor extent is also an important parameter in clinical guidelines for monitoring response to therapy in breast cancer treatment. Using the minimum significant differences given here as the threshold to evaluate response to treatment sets, a priori, the probability of unjustly classifying an observed difference in tumor size as an actual reduction of tumor extent. For instance, consider a tumor of 17 mm or larger that was imaged by ultrasonography before and after neoadjuvant chemotherapy. For such a tumor, a minimum significant difference of 9 mm is required as the threshold to prevent unjustly classifying an observed difference in tumor size as an actual reduction of tumor extent, up to an a priori confidence level of 95% (Table 4.5). This means the following: if the tumor extent has not actually been reduced, an observed difference at least as large as this threshold arises from measurement variability alone in fewer than 5 out of 100 examinations. Using a minimum significant difference of 13 mm as the threshold lowers this to fewer than 1 out of 100 examinations.

Current clinical guidelines use a 30% regression in largest tumor diameter as the criterion for partial response of a tumor to neoadjuvant chemotherapy; see, e.g., Therasse et al. 14. For example, a tumor of 25 mm imaged by ultrasonography must show a reduction of 7.5 mm or more in its largest diameter to be classified as a partial response. According to the results of the current study, however, a minimum difference of 9 mm between two measurements of the largest tumor diameter is needed to keep the chance of misclassification below 5%, and at least 13 mm is needed to keep it below 1%. Under the current 30% clinical guideline, approximately 10% of tumors between 20 and 30 mm may therefore be incorrectly classified as responders to therapy. This issue is a topic of future research.
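One way to see where a false-response rate of this order can come from is the normal model of Eq. (4.5). This is an illustration under stated assumptions and not necessarily the calculation behind the approximately 10% estimate. The sketch assumes that the tumor does not change at all, that both measurements carry independent normal errors with the Table 4.3 ultrasound SD for large tumors (3.8 mm), and that the 30% criterion is applied to the pretreatment diameter. It then computes the one-sided probability that measurement variability alone mimics a 30% shrinkage. The function name is hypothetical. The result is roughly 8% for a 25 mm tumor and about 8.5% when averaged over an evenly spaced grid from 20 to 30 mm.

```python
import math
from statistics import NormalDist


def false_response_probability(diameter_mm, sigma_mod, fraction=0.30):
    """One-sided probability that measurement variability alone produces an apparent
    shrinkage of at least `fraction` of the diameter, assuming no true change and
    independent normal errors with SD sigma_mod in both measurements."""
    sd_difference = math.sqrt(2.0) * sigma_mod
    threshold = fraction * diameter_mm
    return 1.0 - NormalDist().cdf(threshold / sd_difference)


SIGMA_US_LARGE = 3.8  # Table 4.3: ultrasound imaging, tumors >= 17 mm

p_25 = false_response_probability(25.0, SIGMA_US_LARGE)
diameters = [20.0 + 0.5 * k for k in range(21)]  # 20.0, 20.5, ..., 30.0 mm
p_mean = sum(false_response_probability(d, SIGMA_US_LARGE) for d in diameters) / len(diameters)
print(f"25 mm tumor: {p_25:.3f}; mean over 20-30 mm: {p_mean:.3f}")
```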

4.5. SUMMARY AND CONCLUSIONS

We quantified the reproducibility of the assessment of tumor extent in the breast at preoperative diagnostic imaging using mammography, ultrasound imaging, and contrast-enhanced MR imaging, as well as postoperatively at pathology. The overall random variations in the assessment of tumor extent are on the order of 3 mm (1 SD), with only small differences of about 0.3 mm between the four techniques. When tumors smaller than 17 mm were compared with larger tumors, the dependence of the random variations on tumor size was significant (p < 0.05) for mammography (2.7 mm vs 4.2 mm, 1 SD) and ultrasonography (2.5 mm vs 3.8 mm, 1 SD).

A minimum surgical safety margin on the order of 5 mm for tumors smaller than 17 mm and 7 mm for larger tumors effectively accounts for the uncertainty in the radiological assessment of tumor extent in 95% of the performed surgical procedures. A minimum difference in largest tumor diameter of 7 mm for tumors smaller than 17 mm and 9 mm for larger tumors indicates a significant (p < 0.05) reduction of tumor extent in response to neoadjuvant chemotherapy or hormonal therapy.

The reproducibility of the assessment of tumor extent at preoperative diagnostic imaging is of comparable magnitude to the reproducibility at pathology. The uncertainty in the preoperative assessment of tumor extent constitutes a large portion (5 – 7 mm) of the current safety margin in breast-conserving surgery (10 mm). In monitoring response to neoadjuvant chemotherapy or hormonal therapy using repeat imaging before and after treatment, the current clinical guidelines may produce approximately 10% false-positive responses for tumors between 20 and 30 mm.

ACKNOWLEDGMENTS

The authors would like to thank Dr. M. van de Vijver, Dr. S. H. Muller, and H. J. Teertstra for proofreading of the manuscript. A. A. M. Hart is thanked for useful discussions on statistical techniques. This work was financially supported by the Dutch Cancer Society, Grant No. NKI 99-2035.

REFERENCES

1 U. Veronesi, B. Salvadori, A. Luini, M. Greco, R. Saccozzi, M. del Vecchio, L. Mariani, S. Zurrida, and F. Rilke, "Breast conservation is a safe method in patients with small cancer of the breast. Long-term results of three randomised trials on 1,973 patients," Eur. J. Cancer 31A, 1574-1579 (1995).

2 J. A. van Dongen, A. C. Voogd, I. S. Fentiman, C. Legrand, R. J. Sylvester, D. Tong, E. van der Schueren, P. A. Helle, K. van Zijl, and H. Bartelink, "Long-term results of a randomized trial comparing breast-conserving therapy with mastectomy: European Organization for Research and Treatment of Cancer 10801 trial," J. Natl. Cancer Inst. 92, 1143-1150 (2000).

3 C. C. Park, M. Mitsumori, A. Nixon, A. Recht, J. Connolly, R. Gelman, B. Silver, S. Hetelekidis, A. Abner, J. R. Harris, and S. J. Schnitt, "Outcome at 8 years after breast-conserving surgery and radiation therapy for invasive breast cancer: influence of margin status and systemic therapy on local recurrence," J. Clin. Oncol. 18, 1668-1675 (2000).

4 D. E. Wazer, G. Jabro, R. Ruthazer, C. Schmid, H. Safaii, and R. K. Schmidt-Ullrich, "Extent of margin positivity as a predictor for local recurrence after breast conserving irradiation," Radiat. Oncol. Investig. 7, 111-117 (1999).

5 B. Spivack, M. M. Khanna, L. Tafra, G. Juillard, and A. E. Giuliano, "Margin status and local recurrence after breast-conserving surgery," Arch. Surg. 129, 952-956 (1994).

6 J. R. Harris, L. Botnick, W. D. Bloomer, J. T. Chaffey, and S. Hellman, "Primary radiation therapy for early breast cancer: the experience at The Joint Center for Radiation Therapy," Int. J. Radiat. Oncol. Biol. Phys. 7, 1549-1552 (1981).

7 C. Vrieling, L. Collette, A. Fourquet, W. J. Hoogenraad, J. H. Horiot, J. J. Jager, M. Pierart, P. M. Poortmans, H. Struikmans, B. Maat, E. Van Limbergen, and H. Bartelink, "The influence of patient, tumor and treatment factors on the cosmetic results after breast-conserving therapy in the EORTC 'boost vs. no boost' trial. EORTC Radiotherapy and Breast Cancer Cooperative Groups," Radiother. Oncol. 55, 219-232 (2000).

8 C. Boetes, R. D. Mus, R. Holland, J. O. Barentsz, S. P. Strijk, T. Wobbes, J. H. Hendriks, and S. H. Ruys, "Breast tumors: comparative accuracy of MR imaging relative to mammography and US for demonstrating extent," Radiology 197, 743-747 (1995).

9 H. Mumtaz, M. A. Hall-Craggs, T. Davidson, K. Walmsley, W. Thurell, M. W. Kissin, and I. Taylor, "Staging of symptomatic primary breast cancer with MR imaging," AJR Am. J. Roentgenol. 169, 417-424 (1997).

10 P. L. Davis, M. J. Staiger, K. B. Harris, M. A. Ganott, J. Klementaviciene, K. S. McCarty, and H. Tobon, "Breast cancer measurements with magnetic resonance imaging, ultrasonography, and mammography," Breast Cancer Res. Treat. 37, 1-9 (1996).


11 G. Amano, N. Ohuchi, T. Ishibashi, T. Ishida, M. Amari, and S. Satomi, "Correlation of three-dimensional magnetic resonance imaging with precise histopathological map concerning carcinoma extension in the breast," Breast Cancer Res. Treat. 60, 43-55 (2000).

12 K. G. Gilhuijs, A. Touw, M. van Herk, and R. E. Vijlbrief, "Optimization of automatic portal image analysis," Med. Phys. 22, 1089-1099 (1995).

13 M. R. Christiaens, L. Cataliotti, I. Fentiman, E. Rutgers, M. Blichert-Toft, J. E. DeVries, H. P. Graversen, K. Vantongelen, and R. Aerts, "Comparison of the surgical procedures for breast conserving treatment of early breast cancer in seven EORTC centres," Eur. J. Cancer 32A, 1866-1875 (1996).

14 P. Therasse, S. G. Arbuck, E. A. Eisenhauer, J. Wanders, R. S. Kaplan, L. Rubinstein, J. Verweij, M. Van Glabbeke, A. T. van Oosterom, M. C. Christian, and S. G. Gwyther, "New guidelines to evaluate the response to treatment in solid tumors. European Organization for Research and Treatment of Cancer, National Cancer Institute of the United States, National Cancer Institute of Canada," J. Natl. Cancer Inst. 92, 205-216 (2000).

CHAPTER 5

Reproducibility of mammary gland structure during repeat setups in a supine position

William F. A. Klein Zeggelink, Eline E. Deurloo, Sara H. Muller, Leo J. Schultze Kool, and Kenneth G. A. Gilhuijs

Department of Radiology, The Netherlands Cancer Institute / Antoni van Leeuwenhoek Hospital, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands

Medical Physics, Vol. 29, No. 9, September 2002, pp. 2062 – 2069


ABSTRACT

In breast-conserving therapy, complete excision of the tumor with an acceptable cosmetic outcome depends on accurate localization of both the position of the lesion and its extent. We hypothesize that preoperative contrast-enhanced magnetic resonance (MR) imaging of the patient in a supine position may be used for accurate tumor localization and marking of its extent immediately prior to surgery. Our aims in this study are to assess the reproducibility of mammary gland structure during repeat setups in a supine position, to evaluate the effect of a breast immobilization device, and to derive reproducibility margins that account for the internal tissue shifts occurring between repeat setups.

The reproducibility of mammary gland structure during repeat setups in a supine position is estimated by quantifying tissue shifts in the breasts of healthy volunteers between repeat MR setups. For each volunteer, fiducials are identified and registered with their counterpart locations in corresponding MR volumes; the difference in position denotes the shift of breast tissue. The dependence on breast volume and on the part of the breast, as well as the effect of a breast immobilization cast, are studied.

The tissue shifts are small, with a mean standard deviation on the order of 1.5 mm. They are slightly larger in large breasts (V > 1000 cm³) and in the posterior part (toward the pectoral muscle) of both small and large breasts. Applying a breast immobilization cast reduces the tissue shifts in large breasts. A reproducibility margin on the order of 5 mm accounts for the internal tissue shifts that occur between repeat setups.

The results demonstrate a high reproducibility of mammary gland structure during repeat setups in a supine position.

5.1. INTRODUCTION

Breast cancer is a common cause of mortality among women in western countries. Breast-conserving therapy (BCT) is regularly offered instead of radical mastectomy to patients with early breast cancer. During BCT, it is essential to achieve local control while preserving the cosmetic outcome. Incomplete excision of the primary tumor may increase the probability of recurrence 1-3, whereas excising more healthy tissue worsens the cosmetic outcome 4. Consequently, accurate determination of both the position of the tumor and its extent is of great importance. It limits the number of incomplete excisions while maintaining an acceptable cosmetic outcome.

In breast-conserving surgery, clinically occult (i.e., non-palpable) lesions are localized and marked preoperatively using techniques based on various diagnostic imaging modalities. The lesion is marked either by injection of a dye or carbon solution or by insertion of a localization wire indicating its position.


Traditional localization is based on x-ray mammography. Using images acquired in craniocaudal and mediolateral projections, a localization needle is inserted into the compressed breast and directed toward the lesion. The needle position is subsequently verified with two additional x-ray images. An extension of this technique is the use of a dedicated stereotactic localization device. Its main advantage is that the breast is viewed in two stereotactic directions simultaneously. Our experience with such a device indicates higher accuracy in localizing the lesion than with traditional localization.

Other common localization methods are based on ultrasonography 5. A localization needle is inserted and positioned under real-time image guidance, so the lesion is localized more easily than with x-ray-based methods. Some lesions, however, such as microcalcifications and small masses, can be difficult to identify with ultrasonography.

Localization guided by contrast-enhanced magnetic resonance imaging (CE MRI) may be of value when mammography and ultrasonography fail to reveal the lesion 6-8. An advantage of CE MRI is that the images provide accurate information about both the position and the extent of the lesion in three dimensions (3D). Furthermore, CE MRI has been found to provide more reliable information about the extent of the lesion than mammography and ultrasonography 9,10. Preoperative localization devices using real-time CE MRI guidance are, however, relatively new and not yet widely used. Real-time MRI guidance during actual surgery requires dedicated MRI systems that are rarely available, as well as a fully MR-compatible operating theatre.

The accuracy of breast-conserving surgery is limited by uncertainties in both the position of the tumor and its extent (macroscopic and microscopic). Our surgeons account for these uncertainties by aiming to remove the tumor with a margin of at least 1 cm. They are guided to the lesion by a hook wire installed under ultrasonography or x-ray mammography. Strikingly, the most important determinant of complete excision of clinically occult lesions appears to be the experience of the surgeon 11. A multi-institutional investigation of the accuracy of BCT showed a poor correlation between tumor extent and excision volume 12. This demonstrates the need for improved localization of both the position of the tumor and its extent.

Unfortunately, most preoperative localization techniques do not directly inform the surgeon about the extent of the lesion during actual surgery, since the localization wire indicates only the position of the tumor. A different approach is to use preoperatively acquired CE MRI data of the patient in a supine position to localize the tumor and mark its extent immediately prior to surgery. Dedicated tracking devices then establish a real-time mapping between the patient (breast), the localization instrument (needle), and the previously acquired CE MRI data. This approach requires high reproducibility of the mammary gland structure during imaging and localization.


Our aim in this study is threefold:

- to assess the reproducibility of mammary gland structure during repeat setups in a supine position;
- to assess the effect of a breast immobilization device on this reproducibility;
- to derive reproducibility margins that account for the internal tissue shifts occurring between repeat setups.

5.2. MATERIALS AND METHODS

The reproducibility of mammary gland structure during repeat setups in a supine position is estimated by quantifying tissue shifts in the breasts of healthy volunteers between repeat MR setups. We study how the reproducibility depends on breast volume, how it differs between parts of the breast, and how a breast immobilization cast affects it. The magnitude of the internal tissue shifts occurring between the repeat MR setups is used to derive reproducibility margins.

The image acquisition, experimental setup, and measurement prescription are described in Sec. 5.2.1. In Sec. 5.2.2 we describe the image features used to calculate the tissue shifts in the breast between the repeat MR setups. The method employed to analyze the tissue shifts is described in Sec. 5.2.3. Finally, in Sec. 5.2.4 we describe the calculation of the reproducibility margins.

5.2.1. Image acquisition, experimental setup, and measurement prescription

The MR scanner used in this study is a 1.5 T Siemens Magnetom (Siemens Medical Systems AG, Erlangen, Germany). A fast low-angle shot three-dimensional (FLASH 3D) acquisition is used, with repetition time (TR) 9.1 ms, echo time (TE) 5.0 ms, and no fat suppression. All MR scans are acquired using a surface-receive-only loop coil. During each MR scan the breast is imaged in 60 slices perpendicular to an axis at a 45° angle between the coronal and sagittal planes (Fig. 5.1). Such a set of slices will be referred to as an MR volume. Each slice contains 256 × 256 pixels with a size of 1.17 × 1.17 mm². The slices are adjacent and have a thickness of 1.67 mm. In the remainder of this paper, the mediolateral (ML), craniocaudal (CC), and anterior-posterior (AP) directions are all defined with respect to the orientation of the field of view (Fig. 5.1).

The breasts of eight healthy women are imaged in a volunteer study. The age of the women ranges from 26 to 64 years (mean ± SD = 44 ± 13 years). Both the left and right breasts are imaged alternately. The weight of the volunteers is between 53 and 77 kg (mean ± SD = 66 ± 8 kg). Density and structure of parenchyma vary according to age, volume, and weight. Breast volume is estimated from the acquired MR volumes by segmenting the breast from the background, and ranges between roughly 600 and 1350 cm³ (mean ± SD = 980 ± 261 cm³). The segmentation is based on gray-value thresholding in the global histogram of voxel values 13.
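A minimal sketch of the volume estimate follows. It assumes Otsu's method as the histogram-thresholding rule, because the text cites ref. 13 without naming the criterion, and it uses the voxel dimensions from Sec. 5.2.1. The function names and the synthetic test volume are illustrative only. Simply counting bright voxels would miss dark glandular regions enclosed by fat, so a practical pipeline would likely add a hole-filling step that this sketch omits.

```python
import numpy as np

# Voxel size from Sec. 5.2.1: 1.17 x 1.17 mm pixels, 1.67 mm slices.
VOXEL_MM3 = 1.17 * 1.17 * 1.67


def otsu_threshold(volume: np.ndarray, bins: int = 256) -> float:
    """Gray value that best splits the global histogram into two classes.

    Otsu's method is an assumed stand-in for the thresholding of ref. 13.
    """
    hist, edges = np.histogram(volume.ravel(), bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist)                      # voxel count in bins 0..k
    w1 = w0[-1] - w0                          # voxel count in bins k+1..end
    s0 = np.cumsum(hist * centers)            # summed intensity in bins 0..k
    mu0 = s0 / np.maximum(w0, 1)
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2      # between-class variance (unnormalized)
    k = int(np.argmax(between))
    return float(edges[k + 1])                # class 1 starts at the next bin edge


def breast_volume_cm3(volume: np.ndarray) -> float:
    """Count voxels at or above the threshold and convert mm^3 to cm^3."""
    t = otsu_threshold(volume)
    n_voxels = int(np.count_nonzero(volume >= t))
    return n_voxels * VOXEL_MM3 / 1000.0
```

For a full 256 × 256 × 60 MR volume, the same two calls apply unchanged.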



Fig. 5.1. Transverse view of the MR setup in supine orientation. The rectangle indicates the field of view. The arrows show the definitions of the mediolateral (ML), craniocaudal (CC), and anterior-posterior (AP) directions (with respect to the orientation of the field of view).

To serve as a local coordinate frame of the breast, a set of six MR-visible markers is attached to the skin of each breast of each volunteer. The skin markers used in this study are gelatin spheres 6 mm in diameter filled with cod-liver oil. Four skin markers surround the nipple at roughly equal distances (i.e., in a 3, 6, 9, and 12 o’clock configuration) and are thus located anteriorly in the field of view. The other two skin markers are positioned near the sternum and toward the lateral extremity of the breast, and are thus located posteriorly in the field of view. The set of skin markers thus spans the surface of the breast, and its center of gravity provides an estimate of the geometrical center of the breast.

Ecoplast (Orfit Industries NV, Wijnegem, Belgium) is used to construct breast immobilization casts. The casts are fitted to the volunteer’s breasts while avoiding compression. To allow easy installation, a separate cast is constructed for each breast. Care is also taken during MR setup to install the casts without changing the natural shape of the breast. Holes are cut in the casts at the six predefined skin-marker locations, which prevents the skin markers from being pressed into the breast.

The breasts of each volunteer are imaged in three measurement series, performed consecutively. Each measurement series consists of the following three steps. First, the volunteer is positioned on the scanner table in supine orientation, such that the setup laser lines of the MR scanner are aligned with the nipples. Second, four MR volumes are obtained in fixed order and identified by their order number: (1) left breast without cast, (2) right breast without cast, (3) left breast with cast, and (4) right breast with cast. Third, the volunteer gets up from the scanner table. In the remainder of this paper, MR volumes with identical order number are referred to as corresponding MR volumes.


The skin markers are attached to each breast immediately prior to obtaining the first MR volume, and remain attached during all three measurement series. During repositioning of the volunteer, care is taken to reproduce the position and orientation of the arms. The time required to obtain one MR volume is approximately 1.5 minutes. To minimize the risk of breathing-motion artifacts, the volunteers are instructed to use abdominal respiration during scanning.

5.2.2. Image features to calculate the tissue shifts in the breast

In the MR volumes, parenchymal structures are visible because fat and glandular tissue have different intensity values. Fat appears bright, whereas the relatively darker structures correspond to glandular tissue. The shifts of glandular tissue are assessed by quantifying the shifts of corresponding anatomical fiducials. Examples of anatomical fiducials in the breast are ends, junctions, and intersections of glandular tissue at transitions to fat, as well as points of uniquely distinguishable texture.

Consider a specific point at a specific location in a specific structure in an MR volume of the first measurement series of a volunteer. Also, consider one counterpoint in the corresponding MR volume of the second measurement series and one counterpoint in the corresponding MR volume of the third measurement series of the same volunteer. If these three points are positioned at the same location in the same structure, the location is defined as a fiducial.

For each volunteer, fiducials are identified in all MR volumes of the first measurement series and registered with their counterpart locations in the corresponding MR volumes of the other two measurement series. For this purpose, an interactive registration tool is developed.

First, candidate points are inspected for their spatial distinctiveness. Distinctiveness is assessed in 3D by calculating the normalized cross-correlation between two regions of the same MR volume of the first measurement series 14: a small region around a candidate point (referred to as the fiducial template) and a larger region around the same point (referred to as the search template). A comparable method in 2D has been described in the literature 15. Low correlation values around the position of the candidate point indicate high spatial distinctiveness with respect to its surroundings, which marks the point as a potential fiducial. To facilitate the identification of fiducials, the interactive registration tool allows simultaneous viewing of three orthogonal planes through the MR volume at the queried location and three orthogonal planes through the corresponding correlation space. Next, the search template is positioned in the corresponding MR volume of the second measurement series. The position of the fiducial template within the search template that yields the highest correlation value is used as an estimate of the position of the corresponding candidate point. This is repeated for the third measurement series.


A similar procedure is used to locate the position of the skin markers in all MR volumes of the first measurement series, and to recover their positions in the corresponding MR volumes of the other two measurement series.
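The distinctiveness check and the matching step described above can be sketched as an exhaustive normalized cross-correlation search in 3D. This is an illustration, not the interactive tool itself. The template sizes, the brute-force loop, and the names `ncc`, `correlation_map`, and `best_offset` are assumptions, since the text gives no template dimensions.

```python
import numpy as np


def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation of two equally shaped 3D patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def correlation_map(template: np.ndarray, search: np.ndarray) -> np.ndarray:
    """NCC of the fiducial template at every offset inside the search template."""
    tz, ty, tx = template.shape
    nz, ny, nx = (s - t + 1 for s, t in zip(search.shape, template.shape))
    out = np.empty((nz, ny, nx))
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                out[z, y, x] = ncc(template, search[z:z + tz, y:y + ty, x:x + tx])
    return out


def best_offset(template: np.ndarray, search: np.ndarray) -> tuple:
    """Offset (z, y, x) of the template position with the highest correlation."""
    cmap = correlation_map(template, search)
    return tuple(int(i) for i in np.unravel_index(np.argmax(cmap), cmap.shape))
```

In the first-series volume, the correlation map peaks at 1.0 at the candidate point itself. A map that drops off steeply around that peak corresponds to the low surrounding correlation values used as the distinctiveness criterion. Calling `best_offset` with the search region taken from the second- or third-series volume then gives the estimated counterpart position.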

5.2.3. Analysis of the tissue shifts in the breast

In each of the 32 MR volumes of the first measurement series (eight volunteers × four MR volumes), one fiducial is selected in each of 10 equally spaced slices displaying breast tissue (i.e., 10 fiducials per MR volume). This results in 320 different fiducials in total. The shifts of these 320 fiducials are quantified for each fiducial independently.
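The slice selection can be written as a short helper. The sketch assumes that the slices displaying breast tissue are flagged in a boolean array and that "equally spaced" means evenly distributed between the first and the last such slice. The helper name is hypothetical, and the fiducial within each chosen slice was still picked interactively.

```python
import numpy as np


def equally_spaced_slices(has_tissue: np.ndarray, n: int = 10) -> list:
    """Indices of n evenly spaced slices between the first and the last
    slice flagged as containing breast tissue (assumed interpretation)."""
    idx = np.flatnonzero(has_tissue)
    first, last = int(idx[0]), int(idx[-1])
    return [int(round(float(v))) for v in np.linspace(first, last, n)]
```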

Consider the MR volume described by the function $G$:

$$G(\mathbf{p}) \quad \text{with} \quad \mathbf{p} \equiv (x, y, z), \tag{5.1}$$

where $G(\mathbf{p})$ is the intensity value of the voxel at coordinate $\mathbf{p}$ in the MR volume.

Next, consider the set $F_{I,J}$ of projected fiducial coordinates $\mathbf{f}$ in the domain $D(G)$ of $G$:

$$F_{I,J} = \{\, \mathbf{f}_{i,j} \mid \mathbf{f}_{i,j} \in D(G);\; i = 1 \ldots I,\; j = 1 \ldots J \,\}, \tag{5.2}$$

where $i$ denotes the fiducial number ($I = 320$) and $j$ denotes the measurement series number ($J = 3$).

Likewise, consider the set $M_{K,I,J}$ of projected skin marker coordinates $\mathbf{m}$ in the domain $D(G)$ of $G$:

$$M_{K,I,J} = \{\, \mathbf{m}_{k,i,j} \mid \mathbf{m}_{k,i,j} \in D(G);\; k = 1 \ldots K,\; i = 1 \ldots I,\; j = 1 \ldots J \,\}, \tag{5.3}$$

where $k$ denotes the skin marker number ($K = 6$), $i$ denotes the number of the fiducial it is associated with ($I = 320$), and $j$ denotes the measurement series number ($J = 3$). Note that each MR volume contains only one set of six skin markers for its ten fiducials. The six projected coordinates $\mathbf{m}$ of these markers are nevertheless included repeatedly in $M_{K,I,J}$, once for each of the ten fiducials.

Now, the distribution of the positions of fiducial $i$ in the corresponding MR volumes of the three measurement series $j$ is characterized by the vector of standard deviations $\mathbf{S}_i$:

$$\mathbf{S}_i = \mathrm{SD}\big((\mathbf{f}_{\mathrm{rel}})_{i,j}\big) \quad \text{with} \quad j = 1 \ldots 3, \tag{5.4}$$

where $i$ denotes the fiducial number and $j$ indicates the measurement series number. The components of $\mathrm{SD}\big((\mathbf{f}_{\mathrm{rel}})_{i,j}\big)$ represent the standard deviations of the components of $(\mathbf{f}_{\mathrm{rel}})_{i,j}$ over measurements $j$. Furthermore, $(\mathbf{f}_{\mathrm{rel}})_{i,j}$ is given by:

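$$(\mathbf{f}_{\mathrm{rel}})_{i,j} = \mathbf{f}_{i,j} - \frac{1}{6}\sum_{k=1}^{6}\mathbf{m}_{k,i,j} - \frac{1}{3}\sum_{l=1}^{3}\left(\mathbf{f}_{i,l} - \frac{1}{6}\sum_{k=1}^{6}\mathbf{m}_{k,i,l}\right). \tag{5.5}$$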

The second term transforms the absolute coordinates $\mathbf{f}_{i,j}$ of the fiducial into coordinates relative to the center of gravity of the skin markers. The third term completes the transformation by expressing these coordinates relative to the mean position of the fiducial in the three corresponding MR volumes (where summation index $l$ also indicates the measurement series number).
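Eqs. (5.4) and (5.5) can be illustrated for a single fiducial with NumPy. The array layout and function names are assumptions. The use of the sample SD with n − 1 in the denominator (`ddof=1`) is also an assumption, because the text does not state which SD estimator was used.

```python
import numpy as np


def relative_positions(f: np.ndarray, m: np.ndarray) -> np.ndarray:
    """Eq. (5.5) for one fiducial i.

    f: (J, 3) fiducial coordinates f_{i,j} in the J measurement series (mm).
    m: (J, K, 3) coordinates m_{k,i,j} of the K skin markers in the same volumes.
    Returns the (J, 3) relative positions (f_rel)_{i,j}.
    """
    local = f - m.mean(axis=1)         # terms 1-2: relative to marker centroid
    return local - local.mean(axis=0)  # term 3: relative to the mean over series l


def position_sd(f: np.ndarray, m: np.ndarray, ddof: int = 1) -> np.ndarray:
    """Eq. (5.4): per-axis SD S_i of the relative positions over the series."""
    return relative_positions(f, m).std(axis=0, ddof=ddof)
```

Because Eq. (5.5) subtracts only the marker centroid, a common translation of fiducial and markers between setups yields S_i = 0 in this sketch. A rotation of the breast about the marker centroid, by contrast, is not removed and remains part of the measured shift.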


By definition, the average of the relative positions given by Eq. (5.5) equals zero. Furthermore, the relative positions are expected to be normally distributed, and Kolmogorov-Smirnov tests are performed to evaluate the validity of this assumption. In addition, the relative positions of the fiducials are expected to be independent, i.e., uncorrelated within and between breasts. This assumption is evaluated with an analysis of variance for each breast on each of the components of the relative positions, with measurement series (1…3) and fiducial (1…10) as independent factors.
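A stdlib-only sketch of the normality check on one component of the relative positions follows. The mean and SD are estimated from the same sample, so the classical Kolmogorov p-value is conservative; the Lilliefors variant corrects for this. The text does not say which variant was used. This choice, the function name, and the commonly used small-sample factor in `lam` are therefore assumptions.

```python
import math
from statistics import NormalDist, fmean, stdev


def ks_normality(sample):
    """One-sample Kolmogorov-Smirnov test against a normal distribution
    whose mean and SD are estimated from the sample itself.

    Returns (D, p) with p from the asymptotic Kolmogorov distribution.
    """
    x = sorted(sample)
    n = len(x)
    dist = NormalDist(fmean(x), stdev(x))
    d = 0.0
    for i, xi in enumerate(x):
        cdf = dist.cdf(xi)
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:  # tail probability is 1 to many digits; series converges slowly
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)
```

Applied per breast (10 fiducials × 3 series) or to all 320 fiducials, a large p-value means no detectable deviation from normality, which is how the results below are phrased.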

The quantity of clinical interest is the distribution of the distances over which fiducials are displaced (i.e., the shifts of the fiducials) between repeat setups. This distribution is characterized by the vector of standard deviations $\mathbf{D}_i$ (i.e., the dispersion of the shifts of the fiducials) of the differences between the positions of fiducial $i$ in two random samples from the distribution described by $\mathbf{S}_i$:

$$\mathbf{D}_i = \sqrt{2} \cdot \mathbf{S}_i. \tag{5.6}$$
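Eq. (5.6) follows from the variance of a difference of two independent samples. For one component, let $X_1$ and $X_2$ be the relative positions of a fiducial in two random setups, each with standard deviation $S$ (the corresponding component of $\mathbf{S}_i$):

$$\operatorname{Var}(X_1 - X_2) = \operatorname{Var}(X_1) + \operatorname{Var}(X_2) = 2S^2 \quad\Longrightarrow\quad D = \sqrt{\operatorname{Var}(X_1 - X_2)} = \sqrt{2}\,S.$$

Read in reverse, a mean dispersion of 1.46 mm (AP direction, Table 5.1) corresponds to a positional standard deviation of about 1.46/√2 ≈ 1.03 mm per setup.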

Multivariate analysis of variance and regression analysis are performed to evaluate the effects of the age and weight of the volunteers, the breast volume, the part of the breast, and the application of the cast. A square-root transformation is applied to each of the components of $\mathbf{D}_i$ to meet the requirement of normality. Subgroups are formed based on the independent factors. For each subgroup, the mean values of the components of $\mathbf{D}_i$ and their 95% confidence intervals are calculated.
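The text does not say how the means and confidence intervals were returned from the square-root scale to millimeters. One common choice is to compute the interval on the transformed values and square its endpoints, which yields intervals that are asymmetric around the mean. The CIs in Table 5.3 look roughly consistent with this. For the ML, posterior, V < 1000 cm³, no-cast entry, √0.87 ≈ 0.933, √1.14 ≈ 1.068, and √1.45 ≈ 1.204 are nearly equidistant. The sketch below assumes this back-transformation and uses a normal rather than a t critical value; the function name is hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def sqrt_scale_mean_ci(values, level=0.95):
    """Mean and CI computed on square-root-transformed values, then squared
    back to millimeters. Uses a normal critical value (an approximation)."""
    r = [math.sqrt(v) for v in values]
    n = len(r)
    mean_r = fmean(r)
    half = NormalDist().inv_cdf(0.5 + level / 2) * stdev(r) / math.sqrt(n)
    lo = max(mean_r - half, 0.0)  # keep the lower bound non-negative
    return mean_r ** 2, lo ** 2, (mean_r + half) ** 2
```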

5.2.4. Calculation of the reproducibility margins

Reproducibility margins are derived to cover internal tissue shifts occurring between repeat setups with an a priori confidence level (CL). Such a reproducibility margin is expressed as the radius of a sphere that encompasses the prediction interval for the expected position of an indicated point.

Reproducibility-margin groups are formed representing those subgroups that show roughly comparable mean values of the components of $\mathbf{D}_i$. For each reproducibility-margin group, a reproducibility margin is calculated according to:

$$M = L \cdot D, \tag{5.7}$$

where $D$ is the largest of the mean values of the components of $\mathbf{D}_i$ in the subgroups represented in the reproducibility-margin group. Furthermore, $L$ is an independent multiplication factor that depends solely on the 3D probability density function of the normal distribution. For a 1D normal distribution, an interval of [–1, 1] standard deviations (SD) contains 68% of all observations. In 3D, however, a sphere with a radius of 1 SD contains only about 20% of the observations. To include 90%, 95%, or 99% of all observations in 3D, a radius of $L$ SD is required, where $L$ equals 2.50, 2.79, and 3.37, respectively. An example of the derivation of the equations needed to calculate $L$ for the 3D probability density function of the normal distribution can be found in the literature 16.
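The factors $L$ quoted above are quantiles of the chi distribution with three degrees of freedom. Each is the radius, in SD units, of a sphere containing the requested fraction of an isotropic 3D normal distribution. The stdlib-only sketch below assumes equal SDs along the three axes, as implied by using a single $D$. The function names are hypothetical.

```python
import math


def chi3_cdf(r: float) -> float:
    """P(|X| <= r) for X ~ N(0, I_3), i.e. the chi distribution with 3 d.o.f."""
    return math.erf(r / math.sqrt(2.0)) - math.sqrt(2.0 / math.pi) * r * math.exp(-r * r / 2.0)


def radius_factor(confidence: float, lo: float = 0.0, hi: float = 10.0) -> float:
    """Radius L (in SD units) of the sphere containing `confidence` of a 3D
    isotropic normal distribution, found by bisection on chi3_cdf."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi3_cdf(mid) < confidence:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

`chi3_cdf(1.0)` evaluates to about 0.199, reproducing the 20% quoted above. `radius_factor` returns about 2.500, 2.795, and 3.368 for 90%, 95%, and 99%, matching the listed values of L; 2.795 is given in the text as 2.79.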


5.3. RESULTS

The distribution of the mean positions of the 320 fiducials spans approximately 125 mm in the ML direction, 100 mm in the CC direction, and 60 mm in the AP direction. The 320 fiducials are selected at locations that are well spread throughout the breasts.

According to a Kolmogorov-Smirnov test performed on the relative positions of the total set of 320 fiducials, the distribution does not differ significantly from normal. In addition, Kolmogorov-Smirnov tests performed on the relative positions of the ten fiducials in each breast separately indicate no deviation from normality either.

The analysis of variance for each breast on each of the components of the relative positions indicates that the fiducials move independently, i.e., that their shifts are uncorrelated within and between breasts. Consequently, the dispersions of the shifts of the individual fiducials are independent as well.

The age and weight of the volunteers and breast volume turn out to be positively correlated (p < 0.001). Consequently, age and weight are discarded as factors in favor of breast volume (V). A threshold of 1000 cm³ is introduced to label the fiducials as being located in either larger breasts (V > 1000 cm³) or smaller breasts (V < 1000 cm³). Regression analysis of the dispersion of the shifts of the fiducials against their mean positions shows statistical significance for the AP direction only (p < 0.031). Therefore, the fiducials are also labeled by part of the breast. They lie either in the posterior part (i.e., from the center of gravity of the skin markers toward the pectoral muscle) or in the anterior part (i.e., from the center of gravity of the skin markers toward the nipple).

The mean dispersion of the shifts of the fiducials is shown in Table 5.1. The mean standard deviations are on the order of 1.5 mm in each direction, differing by at most about 0.2 mm between directions.

Table 5.1. Mean dispersion of the shifts of the fiducials. Shown are the mean standard deviations and 95% confidence intervals (CI).

Direction Mean (mm) 95% CI (mm) N

ML 1.31 1.22 – 1.40 320

CC 1.54 1.44 – 1.64 320

AP 1.46 1.36 – 1.56 320

ML: mediolateral; CC: craniocaudal; AP: anterior-posterior (with respect to the orientation of the field of view).


Table 5.2. Significances of the effects of the independent factors on the dispersion of the shifts of the fiducials.

Factor ML CC AP

Cast NS NS NS

Volume NS p < 0.001 NS

Part NS p < 0.001 p = 0.004

Cast * Volume NS p < 0.001 p = 0.043

Cast * Part NS NS NS

Volume * Part NS NS p = 0.042

ML: mediolateral; CC: craniocaudal; AP: anterior-posterior (with respect to the orientation of the field of view); NS: not significant at a 95% confidence level (p > 0.05).

Table 5.2 gives an overview of the significance of the effects of the independent factors on the dispersion of the shifts of the fiducials. The effect of the cast alone is not statistically significant in any direction. The dependence on breast volume is statistically significant only in the CC direction (p < 0.001). The dependence on the part of the breast is statistically significant in both the CC direction (p < 0.001) and the AP direction (p = 0.004). The interaction of breast volume and cast is statistically significant in both the CC direction (p < 0.001) and the AP direction (p = 0.043). The interaction of breast volume and part is significant in the AP direction (p = 0.042). No statistically significant effect of any of the independent factors is found in the ML direction.

The magnitude of the effects of the independent factors on the dispersion of the shifts of the fiducials is shown in Table 5.3. In the CC direction, the mean standard deviation in the larger breasts is approximately 0.5 mm larger than in the smaller breasts. In the CC and AP directions, the mean standard deviation in the posterior part of the breasts is about 0.4 mm larger than in the anterior part. In the CC and AP directions, the cast reduces the mean standard deviation in the larger breasts by about 0.4 mm, but increases it in the smaller breasts by approximately 0.1 mm.

Based on comparison of the mean standard deviations in the subgroups presented in Table 5.3, we formed three reproducibility-margin groups (Table 5.4). The group with the largest reproducibility margin, 6 mm (95% a priori CL), comprises the larger breasts without a cast (group I). A reproducibility margin of 5 mm (95% a priori CL) is derived for targets in the posterior part of the smaller breasts (with or without a cast) and of the larger breasts with a cast (group II). The smallest reproducibility margin, 4 mm (95% a priori CL), is assigned to structures in the anterior part of the smaller breasts (with or without a cast) and of the larger breasts with a cast (group III).

Table 5.3. Magnitude of the impact of the independent factors on the dispersion of the shifts of the fiducials. Shown are the mean standard deviations and 95% confidence intervals (CI).

V < 1000 cm³, no cast installed
Direction Part Mean (mm) 95% CI (mm) N
ML Posterior 1.14 0.87 – 1.45 35
ML Anterior 1.30 1.07 – 1.54 45
CC Posterior 1.43 1.16 – 1.72 35
CC Anterior 1.15 1.00 – 1.32 45
AP Posterior 1.75 1.46 – 2.06 35
AP Anterior 1.17 0.89 – 1.49 45

V < 1000 cm³, cast installed
Direction Part Mean (mm) 95% CI (mm) N
ML Posterior 1.34 1.06 – 1.65 27
ML Anterior 1.29 1.06 – 1.55 53
CC Posterior 1.73 1.38 – 2.13 27
CC Anterior 1.27 1.05 – 1.50 53
AP Posterior 1.70 1.30 – 2.17 27
AP Anterior 1.27 1.07 – 1.50 53

V > 1000 cm³, no cast installed
Direction Part Mean (mm) 95% CI (mm) N
ML Posterior 1.55 1.30 – 1.82 42
ML Anterior 1.27 1.06 – 1.49 38
CC Posterior 2.19 1.90 – 2.49 42
CC Anterior 1.77 1.53 – 2.03 38
AP Posterior 1.71 1.42 – 2.03 42
AP Anterior 1.68 1.44 – 1.93 38

V > 1000 cm³, cast installed
Direction Part Mean (mm) 95% CI (mm) N
ML Posterior 1.35 1.10 – 1.64 44
ML Anterior 1.23 1.03 – 1.45 36
CC Posterior 1.86 1.59 – 2.16 44
CC Anterior 1.17 0.96 – 1.41 36
AP Posterior 1.39 1.13 – 1.68 44
AP Anterior 1.25 1.07 – 1.45 36

ML: mediolateral; CC: craniocaudal; AP: anterior-posterior (with respect to the orientation of the field of view).

Table 5.4. Reproducibility margins to cover internal tissue shifts occurring between repeat setups with an a priori confidence level (CL). The reproducibility margins are given for three reproducibility-margin groups, and for an a priori CL of 90%, 95%, and 99%.

Group Volume Cast Part Margin at 90% CL (mm) Margin at 95% CL (mm) Margin at 99% CL (mm)
I Large Not applied Anterior / Posterior 5 6 7
II Small Not applied / Applied Posterior 5 5 6
II Large Applied Posterior 5 5 6
III Small Not applied / Applied Anterior 3 4 4
III Large Applied Anterior 3 4 4

As an example, we demonstrate the calculation of the reproducibility margins for reproducibility-margin group I. The largest of the mean values of the components in the corresponding subgroups equals 2.19 mm (Table 5.3). Using this value for D in Eq. (5.7), the reproducibility margin for an a priori confidence level of 95% (L = 2.79) equals 2.79 × 2.19 mm ≈ 6.1 mm, i.e., 6 mm. Similarly, a reproducibility margin of 5 mm is calculated for a 90% a priori confidence level (2.50 × 2.19 mm ≈ 5.48 mm). Likewise, a reproducibility margin of 7 mm is obtained for a 99% a priori confidence level (3.37 × 2.19 mm ≈ 7.38 mm).
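The same arithmetic can be applied to all three groups to recompute Table 5.4 from Table 5.3 with Eq. (5.7). The sketch hard-codes the subgroup means from Table 5.3 and the group membership from Table 5.4. Rounding to the nearest whole millimeter is our assumption, but it reproduces all nine margins in Table 5.4. The dictionary and function names are illustrative.

```python
import math

# Mean SDs (mm) from Table 5.3, keyed by (volume, cast, part); values are (ML, CC, AP).
TABLE_5_3 = {
    ("small", "no cast", "posterior"): (1.14, 1.43, 1.75),
    ("small", "no cast", "anterior"): (1.30, 1.15, 1.17),
    ("small", "cast", "posterior"): (1.34, 1.73, 1.70),
    ("small", "cast", "anterior"): (1.29, 1.27, 1.27),
    ("large", "no cast", "posterior"): (1.55, 2.19, 1.71),
    ("large", "no cast", "anterior"): (1.27, 1.77, 1.68),
    ("large", "cast", "posterior"): (1.35, 1.86, 1.39),
    ("large", "cast", "anterior"): (1.23, 1.17, 1.25),
}

# Subgroups pooled into each reproducibility-margin group (Table 5.4).
GROUPS = {
    "I": [("large", "no cast", "posterior"), ("large", "no cast", "anterior")],
    "II": [("small", "no cast", "posterior"), ("small", "cast", "posterior"),
           ("large", "cast", "posterior")],
    "III": [("small", "no cast", "anterior"), ("small", "cast", "anterior"),
            ("large", "cast", "anterior")],
}

L_FACTORS = (2.50, 2.79, 3.37)  # 90%, 95%, 99% a priori CL


def margins_mm(group: str) -> tuple:
    """Eq. (5.7) with D = largest mean SD over the group's subgroups,
    rounded half-up to whole millimeters (rounding convention assumed)."""
    d = max(max(TABLE_5_3[key]) for key in GROUPS[group])
    return tuple(math.floor(L * d + 0.5) for L in L_FACTORS)
```

For group II the governing value is D = 1.86 mm (CC, posterior, larger breasts with cast), and for group III it is D = 1.30 mm (ML, anterior, smaller breasts without cast).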

5.4. DISCUSSION

5.4.1. Reproducibility of mammary gland structure

A method has been presented to quantify the shifts of internal breast tissue between repeat setups in a supine position. We found that the dispersion of the tissue shifts is small, with a mean standard deviation on the order of 1.5 mm. These results demonstrate a high reproducibility of mammary gland structure during repeat setups in a supine position.

We have shown that the larger the breast, the larger the dispersion of tissue shifts. This observation may be explained by a greater freedom of movement of structures in larger breasts, which has several causes. The breast is attached to the chest over a surface whose size is fairly independent of breast volume, so the mobility of the entire breast increases with breast volume. When the breast moves, the induced distortion of its shape due to the interaction of gravity and the attachment to the chest will be larger in larger breasts. A positive correlation is found between the dispersion of tissue shifts, age, weight, and breast volume. With increasing age, parenchymal structures tend to decrease in both number and size, and the vacant space is filled with fat. Fat is less stiff than parenchymal tissue, and as a consequence, structures in the breasts of the older volunteers have greater freedom of movement.

The application of the cast reduces the dispersion of tissue shifts in larger breasts, but slightly increases the dispersion in breasts with smaller volume. The application of the cast is therefore beneficial for larger breasts, reducing their mobility and the internal tissue shifts. It may, however, introduce somewhat larger distortions to the natural shape of smaller breasts. Nevertheless, the resulting increase (~0.2 mm) of the dispersion of tissue shifts in smaller breasts due to application of the cast is of no clinical importance. Generally, care must be taken to avoid introducing forces to the breast during installation of the cast.

The most clearly observed effect, although also small, is the larger dispersion of tissue shifts in the posterior part of the breasts compared to the anterior part. This difference is present in all breasts, whether or not the cast is applied, and regardless of breast volume. It is likely a result of the supine setup orientation of the volunteers, because the compression of breast tissue due to gravity increases in proportion to depth (toward the pectoral muscle).

5.4.2. Validity of the method

For each reproducibility-margin group, the largest of the mean values of the components in the corresponding subgroups is used to calculate the reproducibility margin. It would also have been valid to use the average of these mean values. We decided, however, to use the more conservative estimate to quantify the reproducibility margin at a selected a priori confidence level.

Because the women were scanned in a supine position, a surface receive-only loop coil was used rather than a dedicated breast coil. Despite its lower signal-to-noise ratio, this coil provided sufficient detail in the MR volumes to identify and recover anatomical fiducials in the breasts. Another consequence of the supine setup orientation of the volunteers is the risk of artifacts due to breathing motion. Because the women were instructed to breathe abdominally during scanning, however, breathing-motion artifacts were not found to limit the identification of a reliable set of fiducials in each MR volume.

Inhomogeneities of the magnetic field in the MR scanner have no influence in our study, because they occur at a much lower spatial frequency than the fiducials. Deformations in the imaged geometry due to susceptibility differences are not expected inside the organ 17, where the fiducials are identified and recovered. The projections of the skin markers are more sensitive to this effect, since these markers are located at the surface of the breasts 17. These geometrical distortions are, however, reproduced in subsequent scans.

Variations in the positions of the skin markers with respect to each other are small. We compared the distances between the skin markers and their center of gravity and found a variation of only 2%. This figure also includes possible shifts of the gelatin orbs under the adhesive tape with which they were attached to the skin of the breasts.
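The marker check described above can be sketched as follows. The section does not state which statistic the 2% refers to, so taking the largest relative change in marker-to-centroid distance between two setups is an assumption. The marker coordinates are hypothetical and are not study data.

```python
import numpy as np


def centroid_distances(markers: np.ndarray) -> np.ndarray:
    """Distance (mm) of each marker, given as rows of an N x 3 array, to the markers' center of gravity."""
    centroid = markers.mean(axis=0)
    return np.linalg.norm(markers - centroid, axis=1)


def relative_distance_variation(scan_a: np.ndarray, scan_b: np.ndarray) -> float:
    """Largest relative change in marker-to-centroid distance between two setups.

    A rigid translation or rotation of the whole marker set leaves these distances unchanged,
    so only changes in the markers' positions relative to each other are measured.
    """
    d_a = centroid_distances(scan_a)
    d_b = centroid_distances(scan_b)
    return float(np.max(np.abs(d_b - d_a) / d_a))


# Hypothetical marker coordinates (mm); not data from the study.
scan_a = np.array([[0.0, 0.0, 0.0], [60.0, 5.0, 10.0], [30.0, 55.0, 20.0], [10.0, 40.0, 45.0]])
theta = np.deg2rad(10.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
scan_rigid = scan_a @ rot_z.T + np.array([3.0, -2.0, 1.5])  # same configuration, moved as a whole
center = scan_a.mean(axis=0)
scan_stretched = center + 1.02 * (scan_a - center)          # every distance 2% longer
print(relative_distance_variation(scan_a, scan_rigid))
print(relative_distance_variation(scan_a, scan_stretched))
```

Because the distances are measured to the markers' own center of gravity, the check does not respond to overall repositioning of the breast between setups.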

Note that inaccuracy in indicating the center of a marker does not affect the accuracy of the method, because cross correlation of a large template around the indicated point is used to recover the same position of the marker in subsequent scans. Also note that although the accuracy of the cross correlation is limited to whole voxels, subvoxel accuracy is achieved because the results for each subgroup are averages of multiple shift values.
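The template-based recovery described above can be illustrated with a brute-force 3D search that scores each candidate position by normalized cross-correlation. This is a minimal sketch, not the authors' implementation. The template size (`half_size`), search range (`search`), and function names are illustrative assumptions. Like the method in the text, it recovers positions to whole voxels only.

```python
import numpy as np


def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation of two equally shaped arrays (1.0 = perfect match)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def recover_point(vol_ref, vol_new, point, half_size=5, search=4):
    """Return the voxel in vol_new that best matches the neighbourhood of `point` in vol_ref.

    The template is a cube of (2 * half_size + 1) voxels per side centred on `point`.
    All integer shifts of up to `search` voxels per axis are tried; the caller must keep
    template and search region inside both volumes (no boundary handling here).
    """
    z, y, x = point
    h = half_size
    template = vol_ref[z - h:z + h + 1, y - h:y + h + 1, x - h:x + h + 1]
    best_score, best_pos = -np.inf, tuple(point)
    for dz in range(-search, search + 1):
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                cz, cy, cx = z + dz, y + dy, x + dx
                patch = vol_new[cz - h:cz + h + 1, cy - h:cy + h + 1, cx - h:cx + h + 1]
                score = ncc(template, patch)
                if score > best_score:
                    best_score, best_pos = score, (cz, cy, cx)
    return best_pos, best_score


# Synthetic check: shift a random volume by a known whole-voxel offset and recover it.
rng = np.random.default_rng(1)
vol_ref = rng.normal(size=(40, 40, 40))
vol_new = np.roll(vol_ref, shift=(2, -1, 3), axis=(0, 1, 2))
pos, score = recover_point(vol_ref, vol_new, (20, 20, 20))
print(pos, round(score, 3))
```

Because the whole template contributes to the score, a slightly off-centre initial click still lands on the same anatomical position in the second volume.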

We assessed the shifts of glandular tissue by quantifying the shifts of interactively indicated anatomical fiducials. There are two reasons for choosing an interactive rather than an automated method. First, it provides insight into the nature of anatomical fiducials with regard to their uniqueness, thus aiding in their identification. Second, it provides a visual check of whether the recovered anatomical fiducials truly indicate the same structure in the breast. The quality of the recovered anatomical fiducials is estimated by the value of the normalized cross-correlation, as well as by visual inspection of the correlation space and the corresponding MR volumes. Anatomical fiducials are selected only if visual correspondence is observed, and not on the basis of a high normalized cross-correlation value alone.

An alternative way to estimate tissue shifts in the breast is by using a non-rigid registration method 18-20. Control points are sampled according to a deformable grid that is overlaid on an image, and subsequently matched to counterpoints in a second image. Alternatively, contours or other image features from one image are deformed until they are aligned with their counter features in the second image. Using either strategy, the deformations are performed according to predefined models (e.g., elastic models or B-splines). Consequently, assumptions are made concerning the underlying tissue characteristics. We emphasize that our study was aimed at determining the mobility of internal breast tissue without making such prior assumptions. Because the volunteers were scanned in a supine orientation, there is no need for extensive non-rigid registration to transform MR volumes of the breasts acquired in a prone position to their state during treatment.
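To illustrate the kind of model assumption such methods build in, the sketch below shows, in one dimension, how a B-spline free-form deformation interpolates a displacement field from a grid of control-point displacements using uniform cubic B-spline basis functions. It is not part of the method used in this study. The grid spacing, array layout, and function names are hypothetical.

```python
import math

import numpy as np


def bspline_basis(u: float) -> np.ndarray:
    """Uniform cubic B-spline basis functions B0..B3 at local coordinate u in [0, 1)."""
    return np.array([
        (1.0 - u) ** 3 / 6.0,
        (3.0 * u**3 - 6.0 * u**2 + 4.0) / 6.0,
        (-3.0 * u**3 + 3.0 * u**2 + 3.0 * u + 1.0) / 6.0,
        u**3 / 6.0,
    ])


def ffd_displacement(x: float, phi: np.ndarray, spacing: float) -> float:
    """1D free-form deformation: displacement at x interpolated from control points.

    phi[k] is the displacement assigned to the control point at position (k - 1) * spacing,
    i.e. one padding control point lies before x = 0. Each x is influenced by four
    neighbouring control points, which makes the resulting field smooth by construction.
    """
    t = x / spacing
    cell = math.floor(t)
    u = t - cell
    return float(bspline_basis(u) @ phi[cell:cell + 4])


# Hypothetical grid: control points every 10 mm covering x in [0, 50) mm.
spacing = 10.0
positions = (np.arange(8) - 1) * spacing
phi = np.zeros(8)
phi[3] = 2.0  # displace a single control point (at x = 20 mm) by 2 mm
for x in (0.0, 10.0, 15.0, 20.0, 25.0, 40.0):
    print(x, round(ffd_displacement(x, phi, spacing), 3))
```

Displacing one control point produces a smooth bump spread over neighbouring cells rather than a local jump. This built-in smoothness is the kind of prior assumption on tissue behaviour that the present study avoids.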

The women under study are all healthy volunteers. The breasts of patients are not expected to behave differently with respect to the reproducibility of breast structure. Researchers in the field of magnetic resonance elastography established that the mechanical properties of the breast differ from normal only where the lesion occurs 21; in patients eligible for breast-conserving surgery, the lesion is typically very small (~5 cm³) compared to the volume of normal breast tissue (~1000 cm³).

5.4.3. Application

We emphasize that our aim in this study was to assess the reproducibility of the structure of the breast, which is currently an important but unknown component of uncertainty in efforts to improve the accuracy of BCT. The reproducibility is expressed by the given reproducibility margins. This study is not aimed at providing a complete quantitative overview of margins required for an application to treatment. Other margin components are the subject of current research.

One application currently under investigation is to localize the tumor and mark its extent immediately prior to surgery, using dedicated tracking devices to establish a real-time mapping between the patient (breast), the localization instrument (needle), and preoperatively acquired CE MRI data of the patient in a supine position. The actual position of an indicated point can then be expected to lie within the reproducibility margin (in 3D). During the localization procedure, the arms of the patient must be positioned along the body, as during imaging, and overall motion of the breast due to breathing will be taken into account because the breast is included in the tracking procedure. To minimize uncertainties due to changes of breast topology over time, the CE MRI data should be acquired shortly before surgery. An additional margin is required to account for the tissue shifts that will occur as a result of needle insertion. Initial results from our group on tissue shifts due to needle insertion in compressed breasts suggest, however, that this margin may be small (~2 mm) 22.
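The section does not specify how the tracking system computes the mapping between patient, needle, and MRI coordinates. A common building block for such a mapping is a least-squares rigid transform between corresponding fiducial points (the Kabsch, or SVD, method). The sketch below assumes at least three non-collinear corresponding points in both coordinate systems. It illustrates that generic technique and is not the procedure of the investigation described above. All names and coordinates are hypothetical.

```python
import numpy as np


def rigid_fit(src: np.ndarray, dst: np.ndarray):
    """Least-squares rotation R and translation t such that dst ≈ src @ R.T + t.

    src and dst are N x 3 arrays of corresponding points (N >= 3, not collinear).
    Kabsch method: SVD of the cross-covariance of the centred point sets.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    h = (src - src_c).T @ (dst - dst_c)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so that R is a proper rotation (det = +1), not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = dst_c - src_c @ r.T
    return r, t


def rot_x(a):
    return np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])


def rot_z(a):
    return np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])


# Hypothetical fiducials (mm) in MRI coordinates and the same points seen by a tracker.
rng = np.random.default_rng(2)
fid_mri = rng.normal(scale=30.0, size=(6, 3))
r_true = rot_z(0.3) @ rot_x(-0.5)
t_true = np.array([5.0, -3.0, 12.0])
fid_tracker = fid_mri @ r_true.T + t_true

r_est, t_est = rigid_fit(fid_mri, fid_tracker)
target_mri = np.array([10.0, 20.0, -5.0])      # e.g. a point indicated on the MRI data
target_tracker = target_mri @ r_est.T + t_est  # the same point in tracker coordinates
print(np.round(target_tracker, 3))
```

A rigid transform of this kind accounts only for overall position and orientation. Internal tissue shifts between imaging and localization are not modeled by it, which is the uncertainty the reproducibility margin is meant to cover.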

The reproducibility of the mammary gland structure may also prove to be important for further optimization of postoperative radiation therapy, in particular with regard to day-to-day variations in geometry. Additional uncertainties need to be taken into account, though. These include differences in the random component of the tissue shifts that may occur when the arms are in a different position or when the breast conformation changes over time, as well as uncertainties due to respiratory motion.

5.5. SUMMARY AND CONCLUSION

The tissue shifts in the breasts of healthy volunteers between repeat setups in a supine position have been quantified using MRI.

The tissue shifts are small, with a mean standard deviation on the order of 1.5 mm; they are slightly larger in large breasts (V > 1000 cm³) and in the posterior part (toward the pectoral muscle) of both small and large breasts. The application of a breast immobilization cast reduces the tissue shifts in large breasts. A reproducibility margin on the order of 5 mm accounts for the internal tissue shifts that occur between repeat setups.

The results demonstrate a high reproducibility of mammary gland structure during repeat setups in a supine position.

ACKNOWLEDGMENTS

The authors would like to thank Professor Dr. H. Bartelink for proofreading the manuscript, Dr. E. J. Th. Rutgers for useful clinical input, A. A. M. Hart for useful input on statistical techniques, and J. Osinga and M. Romp for constructing the casts. In addition, we are grateful to the volunteers for their collaboration. This work was financially supported by the Dutch Cancer Society, Grant No. NKI 99-2035.

REFERENCES

1 C. C. Park, M. Mitsumori, A. Nixon, A. Recht, J. Connolly, R. Gelman, B. Silver, S. Hetelekidis, A. Abner, J. R. Harris, and S. J. Schnitt, "Outcome at 8 years after breast-conserving surgery and radiation therapy for invasive breast cancer: influence of margin status and systemic therapy on local recurrence," J. Clin. Oncol. 18, 1668-1675 (2000).

2 D. E. Wazer, G. Jabro, R. Ruthazer, C. Schmid, H. Safaii, and R. K. Schmidt-Ullrich, "Extent of margin positivity as a predictor for local recurrence after breast conserving irradiation," Radiat. Oncol. Investig. 7, 111-117 (1999).


3 B. Spivack, M. M. Khanna, L. Tafra, G. Juillard, and A. E. Giuliano, "Margin status and local recurrence after breast-conserving surgery," Arch. Surg. 129, 952-956 (1994).

4 C. Vrieling, L. Collette, A. Fourquet, W. J. Hoogenraad, J. H. Horiot, J. J. Jager, M. Pierart, P. M. Poortmans, H. Struikmans, B. Maat, E. Van Limbergen, and H. Bartelink, "The influence of patient, tumor and treatment factors on the cosmetic results after breast-conserving therapy in the EORTC 'boost vs. no boost' trial. EORTC Radiotherapy and Breast Cancer Cooperative Groups," Radiother. Oncol. 55, 219-232 (2000).

5 A. H. Davies, A. Cowan, P. Jones, R. M. Watkins, and C. Teasdale, "Ultrasound localization of screen detected impalpable breast tumours," J. R. Coll. Surg. Edinb. 39, 353-354 (1994).

6 U. Fischer, L. Kopka, and E. Grabbe, "Magnetic resonance guided localization and biopsy of suspicious breast lesions," Top. Magn. Reson. Imaging 9, 44-59 (1998).

7 U. Fischer, R. Vosshenrich, W. Doler, A. Hamadeh, J. W. Oestmann, and E. Grabbe, "MR imaging-guided breast intervention: experience with two systems," Radiology 195, 533-538 (1995).

8 U. Fischer, R. Vosshenrich, H. Bruhn, D. Keating, B. W. Raab, and J. W. Oestmann, "MR-guided localization of suspected breast lesions detected exclusively by postcontrast MRI," J. Comput. Assist. Tomogr. 19, 63-66 (1995).

9 P. L. Davis, M. J. Staiger, K. B. Harris, M. A. Ganott, J. Klementaviciene, K. S. McCarty, Jr., and H. Tobon, "Breast cancer measurements with magnetic resonance imaging, ultrasonography, and mammography," Breast Cancer Res. Treat. 37, 1-9 (1996).

10 C. Boetes, R. D. Mus, R. Holland, J. O. Barentsz, S. P. Strijk, T. Wobbes, J. H. Hendriks, and S. H. Ruys, "Breast tumors: comparative accuracy of MR imaging relative to mammography and US for demonstrating extent," Radiology 197, 743-747 (1995).

11 J. M. Dixon, O. Ravisekar, M. Cunningham, E. D. Anderson, T. J. Anderson, and H. K. Brown, "Factors affecting outcome of patients with impalpable breast cancer detected by breast screening," Br. J. Surg. 83, 997-1001 (1996).

12 M. R. Christiaens, L. Cataliotti, I. Fentiman, E. Rutgers, M. Blichert-Toft, J. E. DeVries, H. P. Graversen, K. Vantongelen, and R. Aerts, "Comparison of the surgical procedures for breast conserving treatment of early breast cancer in seven EORTC centres," Eur. J. Cancer 32A, 1866-1875 (1996).

13 K. G. Gilhuijs, M. L. Giger, and U. Bick, "Computerized analysis of breast lesions in three dimensions using dynamic magnetic-resonance imaging," Med. Phys. 25, 1647-1654 (1998).


14 R. C. Gonzalez and P. Wintz, "Digital image processing," Addison-Wesley, Reading, Mass., 2nd ed. (1987).

15 B. Likar and F. Pernus, "Automatic extraction of corresponding points for the registration of medical images," Med. Phys. 26, 1678-1686 (1999).

16 M. van Herk, P. Remeijer, C. Rasch, and J. V. Lebesque, "The probability of correct target dosage: dose-population histograms for deriving treatment margins in radiotherapy," Int. J. Radiat. Oncol. Biol. Phys. 47, 1121-1135 (2000).

17 J. C. de Munck, R. Bhagwandien, S. H. Muller, F. C. Verster, and M. B. van Herk, "The computation of MR image distortions caused by tissue susceptibility using the boundary element method," IEEE Trans. Med. Imaging 15, 620-627 (1996).

18 E. R. Denton, L. I. Sonoda, D. Rueckert, S. C. Rankin, C. Hayes, M. O. Leach, D. L. Hill, and D. J. Hawkes, "Comparison and evaluation of rigid, affine, and nonrigid registration of breast MR images," J. Comput. Assist. Tomogr. 23, 800-805 (1999).

19 T. Bruckner, R. Lucht, and G. Brix, "Comparison of rigid and elastic matching of dynamic magnetic resonance mammographic images by mutual information," Med. Phys. 27, 2456-2461 (2000).

20 R. Lucht, M. V. Knopp, and G. Brix, "Elastic matching of dynamic MR mammographic images," Magn. Reson. Med. 43, 9-16 (2000).

21 J. Lorenzen, R. Sinkus, D. Schrader, M. Lorenzen, C. Leussler, M. Dargatz, and P. Roschmann, "[Imaging of breast tumors using MR elastography]," Rofo Fortschr. Geb. Rontgenstr. Neuen Bildgeb. Verfahr. 173, 12-17 (2001).

22 E. E. Deurloo, K. G. Gilhuijs, L. J. Schultze Kool, and S. H. Muller, "Displacement of breast tissue and needle deviations during stereotactic procedures," Invest. Radiol. 36, 347-353 (2001).

SECTION IV

Treatment evaluation

CHAPTER 6

Potential impact of measurement variation on false-categorization rates in evaluating breast-tumor response using RECIST

William F. A. Klein Zeggelink 1, Eline E. Deurloo 1,2, H. Jelle Teertstra 1, Emiel J. Th. Rutgers 3, Harry Bartelink 4, and Kenneth G. A. Gilhuijs 1

1 Department of Radiology, The Netherlands Cancer Institute / Antoni van Leeuwenhoek Hospital, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
2 Department of Radiology, Academic Medical Center/University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
3 Department of Surgery, The Netherlands Cancer Institute / Antoni van Leeuwenhoek Hospital, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
4 Department of Radiotherapy, The Netherlands Cancer Institute / Antoni van Leeuwenhoek Hospital, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands

Submitted to the Journal of the National Cancer Institute (JNCI)


ABSTRACT

The purpose of this study was to estimate the false-categorization rates due to measurement variation that may occur if the response of breast tumors to therapy is monitored using RECIST and to identify the imaging techniques with the lowest rates.

Measurements of largest diameter (LD) at mammography, ultrasonography, and contrast-enhanced magnetic resonance imaging (CE MRI) were retrieved for 204 breast cancers. Variations in these measurements were estimated using analysis-of-variance and linear-regression analysis. The obtained measurement variations were subsequently used to estimate the false-categorization rates for partial response (PR) and progressive disease (PD) that may occur if these breast tumors were monitored using RECIST.

The average false-categorization rate for PR, due to unjustly observing a decrease in LD of at least 30% because of measurement variation alone, was estimated at 14% for mammography, 13% for ultrasonography, and 6% for CE MRI. The average false-categorization rate for PD, due to unjustly observing an increase in LD of at least 20% because of measurement variation alone, was estimated at 29% for mammography, 28% for ultrasonography, and 19% for CE MRI.

Despite RECIST requirements to observe substantial differences in LD before assuming PR or PD, measurement variation may result in considerable false-categorization rates in monitoring response of breast tumors. In particular, an incorrect assessment of PD may occur frequently, which may result in the premature rejection of an actually effective treatment. Applying CE MRI is expected to result in lower false-categorization rates compared to those resulting from mammography or ultrasonography.

6.1. INTRODUCTION

In the treatment of breast cancer, the effects of systemic therapy are increasingly monitored to assess the efficacy of a specific treatment regimen, to assess whether a patient's individual treatment plan needs to be changed, and to guide locoregional therapy. Changes in the largest diameter (LD) over the course of treatment, observed using current breast-imaging techniques, are typically used to evaluate the response of breast tumors. The Response Evaluation Criteria In Solid Tumors (RECIST), currently adopted worldwide, define a complete response (CR) as the disappearance of all tumor(s), a partial response (PR) as a decrease in the LD(s) of at least 30%, progressive disease (PD) as an increase in the LD(s) of at least 20%, and stable disease (SD) in all other cases 1.
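The RECIST categories as summarized above can be expressed as a small decision rule. The sketch assumes a single target lesion and compares one follow-up LD against baseline. Full RECIST sums the LDs of multiple target lesions and, for PD, takes the smallest sum recorded since the start of treatment as the reference. Those details are omitted here, and the function name is hypothetical.

```python
def recist_category(baseline_ld_mm: float, followup_ld_mm: float) -> str:
    """Response category from two LD measurements of a single target lesion.

    Simplified reading of the criteria summarized above: baseline is the reference for both
    PR and PD, and CR is taken to mean the lesion is no longer measurable (LD of 0).
    """
    if baseline_ld_mm <= 0:
        raise ValueError("baseline LD must be positive")
    if followup_ld_mm == 0:
        return "CR"
    change = (followup_ld_mm - baseline_ld_mm) / baseline_ld_mm
    if change <= -0.30:
        return "PR"  # decrease of at least 30%
    if change >= 0.20:
        return "PD"  # increase of at least 20%
    return "SD"


for follow_up in (0.0, 13.0, 18.0, 25.0):
    print(follow_up, recist_category(20.0, follow_up))
```

Any error in either LD measurement enters `change` directly, so measurement variation alone can push an unchanged tumor across the PR or PD threshold.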

The assessment of the LD is, however, subject to measurement variation caused by both the imaging technique and the human observer. As a result, a difference in the LD may be observed while in reality the tumor has not changed. If the observed difference is large enough to meet the RECIST criteria for PR or PD, this leads to the incorrect conclusion that the lesion has partly regressed or progressed. Consequently, the tumor will be falsely classified in the PR or PD category, while it should have been classified in the SD category. The magnitude of variability in the assessment of the LD, together with the threshold values for PR and PD imposed by RECIST, determines the rates of false categorizations due to measurement variation.

Some authors have estimated that measurement variation may result in false-categorization rates as high as 34% 2-5. Their results are, however, not easily extended to current response evaluations in breast tumors. First, these studies were restricted to simulated nodules and neck nodes assessed by physical examination (palpation), or to pulmonary metastases on chest roentgenograms. Neither breast tumors nor any of the (conventional and emerging) imaging techniques currently used to assess tumor extent in the breast, such as contrast-enhanced magnetic resonance imaging (CE MRI), were involved. Second, in accordance with former criteria such as those of the World Health Organization (WHO) 6, these studies were limited to bidimensional (the product of perpendicular diameters) instead of unidimensional (the LD only) measurements of tumor extent. Knowledge of the false-categorization rates in monitoring the response of breast tumors is, however, essential to assess the reliability of treatment-regimen evaluations and decision-making in current clinical practice, and to allow prospective comparisons between different treatments.

The purpose of this study was to estimate the false-categorization rates due to measurement variation that may occur if the response of breast tumors to therapy is monitored using RECIST and to identify the imaging techniques with the lowest rates.

6.2. MATERIALS AND METHODS

6.2.1. Synopsis

We estimated the variation in the measurement of the largest diameter (LD) in daily clinical practice at mammography, ultrasonography, and contrast-enhanced magnetic resonance imaging (CE MRI). For this purpose, a mathematical model based on analysis-of-variance was developed, which makes use of the single radiological measurements of the LD without the need for repeated measurements under artificial reading conditions. The obtained measurement variations were subsequently used to estimate the false-categorization rates for partial response (PR) and progressive disease (PD) that may occur if these breast tumors were monitored for response to therapy using the RECIST criteria.

6.2.2. Database

This retrospective study was performed after approval of the institutional review board, with waiver of informed consent, on data available from breast cancer patients treated at the Netherlands Cancer Institute / Antoni van Leeuwenhoek Hospital between January 1999 and January 2004. A dataset was constructed from our radiological report database by retrieving LD measurements of the primary tumor in patients for whom: (1) breast cancer was proven preoperatively by triple diagnosis including fine-needle aspiration or core biopsy; (2) conventional x-ray mammography, two-dimensional ultrasonography, and CE MRI were performed prior to treatment; (3) the primary tumor was visible with each imaging technique; and (4) measurements of the LD were performed for each imaging technique in daily clinical practice and were recorded for retrospective analysis. Based on these criteria, the measurement data of 204 primary breast tumors were retrieved from the radiological report database. The mean age of the patients was 55 years (range 30 – 86 years); 152 underwent breast-conserving therapy; 52 underwent mastectomy; and 11 of these patients had received neoadjuvant chemotherapy. Histological tumor types were: invasive ductal carcinoma (163); invasive lobular carcinoma (19); mixed invasive pattern (8); medullary carcinoma (3); tubular carcinoma (3); colloid carcinoma (2); ductal carcinoma in situ (3); and other (undefined) types of breast malignancies (3).

For each tumor in the dataset, the LD had been measured three times: once at mammography, once at ultrasonography, and once at CE MRI. All measurements were performed by a team of six radiologists working independently from each other. Each of the six radiologists thus contributed to a different subset of the total set of LD measurements with no radiologist-specific assignment of imaging technique or patient.

Conventional x-ray mammography was performed using a Trex Lorad M-IV (Trex Medical Corporation, Lorad Division, Danbury, CT) or a Philips Mammo Diagnost 3000 (Philips Medical Systems, AG Hamburg, Germany) in craniocaudal and medio-lateral oblique projections on Agfa HDR Film (Agfa Gevaert NV, Mortsel, Belgium). The LD was measured using a ruler on the film prints and all measurement results were corrected for magnification effects prior to the analyses.

Two-dimensional ultrasonography was performed with a GE Voluson 730 (GE Medical Systems, Milwaukee, WI) or a Siemens Sonoline Elegra (Siemens Medical Systems AG, Erlangen, Germany) in two perpendicular planes through the tumor. Measurements of the LD were obtained using a software caliper provided by the ultrasonography system or occasionally by means of a ruler on hard copies of the images.

CE MRI was performed with a 1.5 T Siemens Magnetom scanner (Siemens Medical Systems AG, Erlangen, Germany) using fast low-angle shot three-dimensional (FLASH 3D) acquisition. The breasts were imaged using a dedicated double-breast array coil with the patient in prone orientation. One series before and four to five series after intravenous bolus injection of contrast agent (Prohance, Bracco-Byk Gulden, Konstanz, Germany; 0.1 mmol/kg body weight at a rate of 2 – 4 ml/s) were acquired at intervals of either 120 seconds (four postcontrast scans) or 90 seconds (five postcontrast scans). The following MRI parameters were used: T1-weighted sequence; repetition time (TR) 8.1 ms; echo time (TE) 4.0 ms; reconstructed in-plane matrix 256 × 256 pixels; isotropic in-plane resolution 1.21 × 1.21 mm² or 1.35 × 1.35 mm²; slice thickness 1.69 mm or 1.35 mm; and no fat suppression. Images were reconstructed in two perpendicular views (coronal and transversal) and subtraction images were available to examine initial and late enhancement. The LD was measured on the initial-enhancement images using a software caliper provided by the MRI system or by means of a ruler on hard copies.

6.2.3. Analysis

We developed a mathematical model to estimate measurement variations and false-categorization rates. The model is based on the previously developed residuals (RES) method that uses analysis-of-variance 7. In short, the measurements of the LD of each tumor at mammography, at ultrasonography, and at CE MRI are compared to the average of these measurements. For this purpose, only one measurement of the LD per tumor is required at each imaging technique. The result of the RES method is an estimate for the measurement variation at mammography, at ultrasonography, and at CE MRI that explains the observed differences in measurement of the LD between these techniques.
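The RES method itself is described in ref. 7 and is not reproduced here. The sketch below shows one way an analysis-of-variance on residuals can recover a separate error SD per technique from a single measurement per technique and tumor. It assumes independent, zero-mean Gaussian errors whose variance differs by technique. A constant bias per technique is absorbed because each technique's residuals are centred. Treat it as an illustration of the principle rather than as the authors' exact estimator. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def residual_sd_estimates(ld: np.ndarray) -> np.ndarray:
    """Per-technique measurement SD from an (n_tumors x k) array of LD measurements.

    Assumed model: ld[i, j] = true_i + bias_j + e_ij with independent e_ij ~ N(0, sigma_j^2).
    The residual r_ij = ld[i, j] - mean_j(ld[i, :]) then has variance
        V_j = (1 - 2/k) * sigma_j^2 + S / k^2,   where S = sum_j sigma_j^2,
    and summing over j gives sum_j V_j = (k - 1) / k * S, so each sigma_j^2 can be solved for.
    """
    k = ld.shape[1]
    if k < 3:
        raise ValueError("at least three techniques are needed to separate the variances")
    residuals = ld - ld.mean(axis=1, keepdims=True)
    v = residuals.var(axis=0, ddof=1)          # centring also removes a constant bias per technique
    s = v.sum() * k / (k - 1)
    sigma2 = (v - s / k**2) / (1.0 - 2.0 / k)
    return np.sqrt(np.clip(sigma2, 0.0, None))  # sampling noise can push an estimate below zero


# Synthetic check (not study data): three techniques with known error SDs and biases.
rng = np.random.default_rng(0)
n_tumors = 50_000
true_ld = rng.uniform(10.0, 30.0, n_tumors)
true_sd = np.array([2.0, 3.0, 1.5])
ld = true_ld[:, None] + np.array([0.5, -1.0, 0.0]) + true_sd * rng.standard_normal((n_tumors, 3))
print(np.round(residual_sd_estimates(ld), 2))
```

With three techniques the solution reduces to sigma_j² = 3·V_j − 0.5·ΣV. Three is therefore the smallest number of techniques for which the individual variances are identifiable from one measurement each.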

The RES method was used in conjunction with the sliding-window technique and linear-regression analysis in order to investigate the relationship between the magnitude of measurement variation and tumor size. The average of the measurements of the LD per tumor at mammography, ultrasonography, and CE MRI was used to sort the total set of LD measurements in ascending order prior to analysis. In accordance with previously established guidelines 7, the sliding window constituted a subset of 50 LD measurements. The sliding window incrementally proceeded through the sorted total set of LD measurements, and for each subset, the measurement variations were calculated using the RES method. The average of the measurements of the LD per subset per imaging modality was used to represent the average tumor size for which the measurement variation at each technique was obtained. After proceeding through the sorted total set of LD measurements, the estimates for the individual measurement variations were fitted using linear-regression analysis. The regression models thus obtained were subsequently used to calculate linear trend lines for the measurement variation at each imaging technique over an interval of tumor sizes between 10 and 30 mm.
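A sketch of the sliding-window and regression steps follows, under several assumptions. The window is read as 50 tumors, each with one LD per technique, advancing one tumor at a time. The per-window SDs come from a simple residual-variance estimator, an assumed stand-in for the RES method of ref. 7 rather than the authors' code. `np.polyfit` supplies the straight-line fit. The data are synthetic.

```python
import numpy as np


def residual_sd_estimates(ld: np.ndarray) -> np.ndarray:
    """Per-technique error SD from an (n_tumors x k) array (assumed residual-variance model)."""
    k = ld.shape[1]
    v = (ld - ld.mean(axis=1, keepdims=True)).var(axis=0, ddof=1)
    s = v.sum() * k / (k - 1)
    return np.sqrt(np.clip((v - s / k**2) / (1.0 - 2.0 / k), 0.0, None))


def sliding_window_trends(ld: np.ndarray, window: int = 50):
    """Fit SD = slope * size + intercept per technique from overlapping windows.

    Tumors are sorted by their average LD over the techniques; the window then advances
    one tumor at a time. Returns one (slope, intercept) pair per technique.
    """
    ld_sorted = ld[np.argsort(ld.mean(axis=1))]
    sizes, sds = [], []
    for start in range(len(ld_sorted) - window + 1):
        block = ld_sorted[start:start + window]
        sizes.append(block.mean(axis=0))  # average LD per technique in this window
        sds.append(residual_sd_estimates(block))
    sizes, sds = np.array(sizes), np.array(sds)
    return [np.polyfit(sizes[:, j], sds[:, j], 1) for j in range(ld.shape[1])]


# Synthetic data (not study data): SD grows with size for two techniques, constant for the third.
rng = np.random.default_rng(3)
true_ld = rng.uniform(10.0, 30.0, 3000)
sd = np.column_stack([0.25 * true_ld, 0.20 * true_ld, np.full(true_ld.size, 2.9)])
ld = true_ld[:, None] + sd * rng.standard_normal(sd.shape)
trends = sliding_window_trends(ld)
grid = np.linspace(10.0, 30.0, 5)  # tumor sizes between 10 and 30 mm
for name, coef in zip(("technique A", "technique B", "technique C"), trends):
    print(name, np.round(np.polyval(coef, grid), 2))
```

Because tumors are sorted by their noisy average LD, windows at the extremes of the size range can be somewhat biased. The synthetic example is meant only to show the mechanics of the windowing and the fit.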

A Monte Carlo study was performed to assess the reliability of the linear trend lines for the measurement variation at each imaging technique. Three sets of largest diameters, one set per imaging technique, were drawn from the given distributions of LD measurements. Repeat LD measurements were simulated next, based on the obtained linear trend lines for the measurement variation at each imaging technique. During each repetition, the linear trend lines were re-estimated from the simulated data using the model described above. After 1000 repeat estimations, the averages and corresponding 95% confidence interval limits were calculated for all points on the linear trend lines.

Another Monte Carlo study was performed to estimate the false-categorization rates for PR and PD that may occur if these breast tumors were monitored for response to therapy using the RECIST criteria. Repeat serial pairs of LD measurements were simulated for all points on the obtained linear trend lines for the measurement variation at each imaging technique. If the second LD measurement was at least 30% smaller than the first, it was considered an incorrect observation of PR. If the second LD measurement was at least 20% larger than the first, it was considered an incorrect observation of PD. After 100,000 repeat serial evaluations, the percentages of incorrect PR and PD observations yielded the estimates of the false-categorization rates. This simulation scheme was executed separately for the average linear trend lines and the accompanying 95% confidence interval limits of each technique.
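The serial-evaluation scheme described above can be sketched as follows. The sketch assumes that the tumor does not change between the two time points and that the two measurement errors are independent, zero-mean Gaussians with the SD given by the trend line at the tumor's true LD. The source does not state the error distribution, so the Gaussian choice is an assumption, and the function names are illustrative. A closed-form normal approximation is included as a cross-check.

```python
import math

import numpy as np


def false_categorization_rates(true_ld_mm, sd_mm, n_evals=100_000, rng=None):
    """Monte Carlo false-PR and false-PD rates for a tumor that does not change.

    Each serial evaluation draws two LD measurements of the same true LD with
    independent Gaussian errors of SD sd_mm (an assumption about the error model).
    """
    rng = np.random.default_rng() if rng is None else rng
    first = true_ld_mm + sd_mm * rng.standard_normal(n_evals)
    second = true_ld_mm + sd_mm * rng.standard_normal(n_evals)
    false_pr = np.mean(second <= 0.70 * first)  # apparent decrease of at least 30%
    false_pd = np.mean(second >= 1.20 * first)  # apparent increase of at least 20%
    return float(false_pr), float(false_pd)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_approx_rates(true_ld_mm, sd_mm):
    """Closed-form rates under the same Gaussian model (negative LDs neglected)."""
    # second - 0.7 * first has mean 0.3 * D and SD sd * sqrt(1 + 0.7^2)
    pr = normal_cdf(-0.3 * true_ld_mm / (sd_mm * math.sqrt(1.0 + 0.7**2)))
    # second - 1.2 * first has mean -0.2 * D and SD sd * sqrt(1 + 1.2^2)
    pd = 1.0 - normal_cdf(0.2 * true_ld_mm / (sd_mm * math.sqrt(1.0 + 1.2**2)))
    return pr, pd


rng = np.random.default_rng(4)
mc = false_categorization_rates(20.0, 2.9, rng=rng)
approx = normal_approx_rates(20.0, 2.9)
print("Monte Carlo (PR, PD):", mc)
print("Normal approximation (PR, PD):", approx)
```

The closed form shows that both rates depend only on the ratio of true LD to SD. If the SD grows in proportion to tumor size, as reported for mammography and ultrasonography in Section 6.3.2, the rates do not change with size. This is consistent with the size-independent rates reported for those techniques in Section 6.3.3.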

Microsoft Excel 2002 (Microsoft Corp., Redmond, WA), Microsoft Visual Basic 6.3 (Microsoft Corp., Redmond, WA), and MATLAB 6.5 Release 13 (The MathWorks Inc., Natick, MA) were used to perform all analyses and Monte Carlo studies.

6.3. RESULTS

6.3.1. Preamble

In this section, the measurement variations are expressed in terms of standard deviation (SD), and the false-categorization rates are expressed as the percentage of times that an incorrect observation of partial response (PR) or progressive disease (PD) is expected to occur in monitoring the response of breast tumors using RECIST. Differences in the obtained trend lines, either within or between the imaging techniques, are considered statistically significant if the accompanying 95% confidence interval limits do not cross.

6.3.2. Estimated measurement variations

The variation in the measurement of the largest diameter (LD) was found to be proportional to the size of the tumor at mammography and at ultrasonography (Fig. 6.1). For mammography, the measurement variation ranged between 1.7 mm for the smallest tumors and 7.9 mm for the largest tumors. Similarly, the measurement variation at ultrasonography ranged between 2.0 mm (smallest tumors) and 6.9 mm (largest tumors). Conversely, the variation in the measurements of the LD at contrast-enhanced magnetic resonance imaging (CE MRI) was not found to depend on tumor size, and was estimated at 2.9 mm on average. For relatively large tumors, the variation at CE MRI was significantly smaller than the variation at mammography (LD ≥ 18 mm; 2.9 mm vs 6.0 mm; mean SDs) or the variation at ultrasonography (LD ≥ 20 mm; 2.9 mm vs 5.6 mm; mean SDs), while no differences were observed for smaller tumors.

6.3.3. Estimated false categorization rates

The average false-categorization rate for PR, due to unjustly observing a decrease in LD of at least 30% because of measurement variation alone, was estimated at 14% for mammography, 13% for ultrasonography, and 6% for CE MRI (Fig. 6.2). The estimated false-categorization rate for PR was not found to be dependent on the size of the tumor at mammography or at ultrasonography. The estimated false-categorization rate for PR using CE MRI showed, however, a decreasing trend with increasing tumor size, with an estimated false-categorization rate ranging between 18% and 1%. For relatively large tumors, the estimated false-categorization rate for PR at CE MRI was significantly smaller than that at mammography (LD ≥ 18 mm; 2% vs 16%; mean rates) or that at ultrasonography (LD ≥ 20 mm; 2% vs 14%; mean rates). No differences between the estimated false-categorization rates for PR were, however, found for smaller tumors.

[Figure 6.1: three panels, "Mammography : measurement variation", "Ultrasonography : measurement variation", "CE MRI : measurement variation"; horizontal axis: largest diameter (mm), 7.5 to 32.5; vertical axis: 1 SD (mm), 0 to 10.]

Fig. 6.1. Measurement variation expressed in terms of 1 standard deviation (1 SD) as a function of largest tumor diameter, at mammography (top), at ultrasonography (mid), and at contrast-enhanced magnetic resonance imaging (CE MRI) (below). Shown is the average trend (bold curves) with its accompanying 95% confidence interval (thin curves).

[Figure 6.2: three panels, "Mammography : false categorizations PR", "Ultrasonography : false categorizations PR", "CE MRI : false categorizations PR"; horizontal axis: largest diameter (mm), 7.5 to 32.5; vertical axis: evaluations (%), 0 to 50.]

Fig. 6.2. False categorization rates for partial response (PR) as a function of largest tumor diameter, at mammography (top), at ultrasonography (mid), and at contrast-enhanced magnetic resonance imaging (CE MRI) (below). Shown is the average trend (bold curves) with its accompanying 95% confidence interval (thin curves).

[Figure 6.3: three panels, "Mammography : false categorizations PD", "Ultrasonography : false categorizations PD", "CE MRI : false categorizations PD"; horizontal axis: largest diameter (mm), 7.5 to 32.5; vertical axis: evaluations (%), 0 to 50.]

Fig. 6.3. False categorization rates for progressive disease (PD) as a function of largest tumor diameter, at mammography (top), at ultrasonography (mid), and at contrast-enhanced magnetic resonance imaging (CE MRI) (below). Shown is the average trend (bold curves) with its accompanying 95% confidence interval (thin curves).

The average false-categorization rate for PD, i.e., the rate at which an increase in LD of at least 20% is erroneously observed because of measurement variation alone, was estimated at 29% for mammography, 28% for ultrasonography, and 19% for CE MRI (Fig. 6.3). Although the estimated false-categorization rates for PD are clearly higher than those for PR, similar patterns were observed for both response evaluation categories. The estimated false-categorization rate for PD was not found to depend on tumor size at mammography or at ultrasonography. In contrast, at CE MRI the estimated false-categorization rate for PD decreased with increasing tumor size, from 33% for the smallest tumors to 9% for the largest tumors. For relatively large tumors, the estimated false-categorization rate for PD at CE MRI was significantly lower than that at mammography (LD ≥ 18 mm; 14% vs 30%; mean rates) or at ultrasonography (LD ≥ 20 mm; 13% vs 28%; mean rates), but no differences between the estimated false-categorization rates for PD were found for smaller tumors.
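The size dependence described above can be illustrated with a minimal sketch. It assumes (hypothetically; this is not necessarily the estimation procedure used in this chapter) that the baseline and follow-up LD of a stable tumor are independent, unbiased, normally distributed measurements with the same SD. The function names are illustrative. Under this model, PR is scored when F ≤ 0.7·B and PD when F ≥ 1.2·B, so each rate is a normal tail probability of F − k·B.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def false_categorization_rates(true_ld_mm: float, sd_mm: float,
                               pr_threshold: float = 0.30,
                               pd_threshold: float = 0.20) -> tuple[float, float]:
    """Chance that a stable tumor is scored as PR or PD by measurement error alone.

    Hypothetical model: baseline B and follow-up F are independent, unbiased,
    normally distributed measurements of the same true LD, each with SD sd_mm.
    """
    # PR when F <= (1 - pr_threshold) * B, i.e. F - k * B <= 0 with k < 1.
    k = 1.0 - pr_threshold
    mean_pr = true_ld_mm * (1.0 - k)            # E[F - k B] (positive)
    sd_pr = sd_mm * math.sqrt(1.0 + k * k)      # SD[F - k B]
    p_pr = normal_cdf(-mean_pr / sd_pr)         # P(N(mean, sd) <= 0)

    # PD when F >= (1 + pd_threshold) * B, i.e. F - k * B >= 0 with k > 1.
    k = 1.0 + pd_threshold
    mean_pd = true_ld_mm * (1.0 - k)            # E[F - k B] (negative)
    sd_pd = sd_mm * math.sqrt(1.0 + k * k)
    p_pd = normal_cdf(mean_pd / sd_pd)          # P(N(mean, sd) >= 0)
    return p_pr, p_pd


if __name__ == "__main__":
    for ld in (10.0, 20.0, 30.0):
        pr, pd = false_categorization_rates(ld, sd_mm=2.9)
        print(f"LD {ld:4.1f} mm: PR {pr:5.1%}, PD {pd:5.1%}")
```

With a constant SD of 2.9 mm (the CE MRI average reported above), this model gives roughly 20% (PR) and 33% (PD) at an LD of 10 mm, falling to below 1% (PR) and about 9% (PD) at 30 mm, close to the CE MRI ranges reported in this section. If instead the SD is proportional to the LD, as found for mammography and ultrasonography, the LD cancels out of both ratios and the rates become independent of tumor size.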

6.4. DISCUSSION

We estimated how often stable disease is interpreted as partial response (PR) or progressive disease (PD) because of measurement variation when the RECIST criteria are used in combination with conventional and emerging breast-imaging techniques. Averaged over the range of tumor sizes (10 mm ≤ largest diameter (LD) ≤ 30 mm) and the three imaging techniques, the chances of classifying a stable tumor as PR or as PD were estimated at 11% and 25%, respectively. This implies that, despite the RECIST requirement to observe a decrease in LD of at least 30%, approximately 11% of stable tumors may erroneously be considered to respond to therapy (PR). Likewise, about 25% of stable tumors may erroneously be considered to progress in extent (PD), despite the RECIST requirement to observe an increase in LD of at least 20%. Moreover, because a single follow-up measurement cannot indicate both PR and PD, the two events are mutually exclusive; the total chance of falsely classifying stable disease as either PR or PD is therefore the sum of both individual probabilities, and may thus be as high as 36%.
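The averaging and the additivity argument above can be checked with a small sketch. The averaged rates use the per-technique values reported in this chapter; the Monte Carlo part assumes a hypothetical error model (independent Gaussian errors with equal SD at baseline and follow-up) and a simplified classifier that applies only the 30% and 20% thresholds relative to baseline (complete response and nadir-based progression are ignored). All function names are illustrative.

```python
import random

# Average false-categorization rates reported in this chapter: (PR, PD).
REPORTED_RATES = {
    "mammography": (0.14, 0.29),
    "ultrasonography": (0.13, 0.28),
    "CE MRI": (0.06, 0.19),
}


def averaged_rates(rates):
    """Mean PR rate, mean PD rate, and their sum over imaging techniques."""
    n = len(rates)
    mean_pr = sum(pr for pr, _ in rates.values()) / n
    mean_pd = sum(pd for _, pd in rates.values()) / n
    return mean_pr, mean_pd, mean_pr + mean_pd


def recist_category(baseline_mm, followup_mm):
    """Classify a change in LD using only the 30% (PR) and 20% (PD) thresholds."""
    if baseline_mm <= 0:
        raise ValueError("baseline diameter must be positive")
    change = (followup_mm - baseline_mm) / baseline_mm
    if change <= -0.30:
        return "PR"
    if change >= 0.20:
        return "PD"
    return "SD"


def simulate_stable_tumor(true_ld_mm, sd_mm, n=100_000, seed=1):
    """Fraction of each category when a tumor of constant size is measured twice.

    Hypothetical error model: independent Gaussian errors with SD sd_mm at
    baseline and at follow-up. Assumes sd_mm is small relative to true_ld_mm,
    so that non-positive baseline draws are practically impossible.
    """
    rng = random.Random(seed)
    counts = {"PR": 0, "PD": 0, "SD": 0}
    for _ in range(n):
        baseline = rng.gauss(true_ld_mm, sd_mm)
        followup = rng.gauss(true_ld_mm, sd_mm)
        # Each simulated follow-up falls into exactly one category.
        counts[recist_category(baseline, followup)] += 1
    return {category: count / n for category, count in counts.items()}


if __name__ == "__main__":
    pr, pd, total = averaged_rates(REPORTED_RATES)
    print(f"mean PR {pr:.0%}, mean PD {pd:.0%}, total {total:.0%}")
    print(simulate_stable_tumor(20.0, 3.0))
```

The averaging reproduces the 11%, 25%, and 36% quoted above. Because every simulated follow-up lands in exactly one category, the simulated PR and PD fractions always add up to one minus the SD fraction, which is the additivity that justifies summing the two rates.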

Although they were aimed at different tumor sites and measurement techniques, the results of preceding studies are of interest, because the current unidimensional RECIST criteria were based in part on the former bidimensional WHO criteria 1,6. Gurland and Johnson studied variation in measurements of pulmonary metastases using rulers on chest roentgenograms and estimated false-categorization rates of 5% for PR and 30% for PD 2. Moertel and Hanley investigated variation in assessments of simulated nodules by palpation using rulers or calipers and estimated false-categorization rates of 6.8% for PR and 31.5% for PD 3. Lavin and Flowerdew also investigated variation in assessments of simulated nodules by palpation using rulers or calipers, but considered only masses larger than 30 mm, which resulted in estimated false-categorization rates of 2.5% for PR but still as much as 26% for PD 4. Warr et al. studied the variation in measurements of pulmonary metastases on chest roentgenograms and in assessments of simulated nodules and neck nodes using rulers or tape measures 5. False-categorization rates for PR were estimated to range from 0.8% to 13.1%, whereas those for PD were estimated to range from 15.9% to 34.3%. The large variability in the false-categorization rates reported by these studies for various tumor types and measurement techniques underscores the need, addressed by the current study, to quantify the efficacy of the RECIST criteria specifically for breast tumors and current breast-imaging techniques.

In the current study we showed that measurement variation is likely to have a considerable impact on false-categorization rates when monitoring the response of breast tumors to therapy. Another component that may add to false categorization is a difference in the accuracy with which an imaging technique visualizes tumor extent before and after treatment. Several authors have investigated the accuracy of imaging techniques in demonstrating breast tumor extent after chemotherapy by comparing imaging results with those found at pathology. An early study by Herrada et al. demonstrated that, in the assessment of residual tumor size, palpation correlated better with pathology (r = 0.726) than mammography (r = 0.649) or ultrasonography (r = 0.600) 8. Fiorentino et al. also concluded that residual tumor size assessed by palpation using calipers correlated better with pathologic findings (r = 0.69) than residual tumor size assessed by mammography (r = 0.33) or by ultrasonography (r = 0.29) 9. The assessment of residual disease after therapy by mammography and ultrasonography is compromised by the difficulty of distinguishing therapy-induced fibrosis from residual tumor, which is likely to contribute to the false-categorization rates. Balu-Maestro et al. included CE MRI in their comparative study and found that it demonstrated residual tumor size accurately more often (63% of cases) than palpation (52% of cases), mammography (38% of cases), or ultrasonography (43% of cases) 10. Cheung et al. focused specifically on the accuracy of CE MRI in demonstrating residual tumor size and its potential for predicting response, and found that residual tumor size at CE MRI correlated closely with microscopic findings (r = 0.982) 11. These and other studies have shown that CE MRI performs very well in discriminating between therapy-induced fibrosis and vascularized residual tissue. In addition to this advantage, the current study indicates that CE MRI yields the lowest risk of erroneously assuming PR or PD in stable disease because of measurement variation, compared with mammography or ultrasonography.

To our knowledge, the current study is the first to provide estimates of the false-categorization rates due to measurement variation that may occur in response evaluations using RECIST specifically for real breast tumors and current breast-imaging techniques. We chose analysis of variance to estimate measurement variation, because it has the major advantage that measurements do not need to be repeated, thus preventing biases due to observer recollection or laboratory conditions 7. The sliding-window technique and linear-regression analysis were employed to establish the dependence of the measurement variation on tumor size. A disadvantage of this approach, however, is that it provided estimates of the measurement variation over a narrower interval of tumor sizes than that of the tumors actually included in the study. As such, the estimates of the measurement variation and of the false-categorization rates were obtained over a range of tumor sizes between 10 and 30 mm. Nonetheless, stable trends were found in the upper range (Figs. 6.1 – 6.3), suggesting that the results may be extrapolated to larger tumors. All measurements of the LD were obtained during the radiological workup of patients prior to their actual treatment. As indicated previously, after treatment, mammography and ultrasonography are further limited by the difficulty of distinguishing treatment-induced fibrosis from residual tumor. Consequently, the false-categorization rates inferred in the current study represent the minimum rates that must be anticipated when monitoring the response of breast tumors using the RECIST criteria with these imaging techniques. Because each radiologist contributed to a different subset of the measurements, with no bias towards imaging technique or patient, the measurement variation at each imaging technique represents the sum of intra- and inter-observer variability. If serial measurements are performed by the same radiologist, the variation in the assessment of the LD, and hence the false-categorization rates for PR and PD, may be lower than those inferred in the current study. Such an idealized situation does not, however, correspond to the actual situation in typical clinical practice. Consequently, the estimated false-categorization rates in our study are expected to be more realistic than those obtained from repeat-measurement studies under artificial reading conditions.
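The sliding-window step, and the reason it narrows the size interval, can be sketched as follows. This is an illustration under stated assumptions, not the analysis-of-variance procedure of reference 7: the per-tumor "deviations", the 10 mm window width, the 1 mm step, and the function names are all hypothetical. Because a window is only placed where it fits entirely inside the observed size range, its centers span [min + width/2, max − width/2], narrower than the data.

```python
import random
import statistics


def sliding_window_sd(sizes_mm, deviations_mm, width_mm=10.0, step_mm=1.0):
    """SD of the deviations of tumors whose size falls inside each window.

    Returns (window_center, sd) pairs. Centers run from min + width/2 to
    max - width/2, so the covered size interval is narrower than the data.
    """
    lo, hi = min(sizes_mm), max(sizes_mm)
    half = width_mm / 2.0
    results = []
    center = lo + half
    while center <= hi - half + 1e-9:
        in_window = [d for s, d in zip(sizes_mm, deviations_mm)
                     if abs(s - center) <= half]
        if len(in_window) >= 3:  # need a few points for a meaningful SD
            results.append((center, statistics.stdev(in_window)))
        center += step_mm
    return results


def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


if __name__ == "__main__":
    # Synthetic, hypothetical data: tumors of 5-35 mm, SD equal to 20% of LD.
    rng = random.Random(0)
    sizes = [rng.uniform(5.0, 35.0) for _ in range(400)]
    devs = [rng.gauss(0.0, 0.2 * s) for s in sizes]
    windows = sliding_window_sd(sizes, devs)
    intercept, slope = fit_line([c for c, _ in windows], [sd for _, sd in windows])
    print(f"centers {windows[0][0]:.1f}-{windows[-1][0]:.1f} mm, "
          f"SD ~ {intercept:.2f} + {slope:.3f} * LD")
```

With sizes between 5 and 35 mm and a 10 mm window, the estimates cover only about 10 to 30 mm. Overlapping windows share tumors, so the regression points are not independent; a confidence interval for the fitted line would need to account for that.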

The presence of false categorizations in response evaluations most likely has a profound impact on the conduct of clinical trials and on the reliability of decision-making in the daily practice of breast cancer treatment. False categorizations may lead to the premature termination of potentially effective drugs, or to the continuation of ineffective drugs in patients who would potentially benefit from an alternative treatment. False categorization due to measurement variation may be reduced by pursuing two different strategies. Introducing more stringent response evaluation criteria would inherently lower the false-categorization rates, but would also reduce the sensitivity for detecting true tumor changes (PR or PD) and complicate any clinical trial aimed at comparing future results with those obtained at present. Another solution is to reduce the underlying measurement variation by applying computerized methods for (semi-)automated assessment of tumor extent. Some authors have developed algorithms for automated segmentation of breast tumors in mammograms 12, from which the LD may be determined. Such a strategy may reduce measurement variation because the influence of the human observer is eliminated. Methods for automated tumor segmentation in two-dimensional (2D) ultrasound images have also been developed 13. The potential merit of any automated strategy to measure tumor extent would be best exploited with true three-dimensional (3D) imaging. Three-dimensional ultrasonography may replace the current 2D standard and offers opportunities for automated measurement of the LD or tumor volume 14. CE MRI inherently provides undistorted 3D information, has been shown to be superior to other imaging techniques in demonstrating the extent of invasive breast tumors, and algorithms for automated 3D segmentation are available 15. A recently published study demonstrates that reductions in breast-tumor volume measured semi-automatically using CE MRI may also be predictive of recurrence-free survival 16.
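The trade-off between the two strategies can be made concrete with a small sketch under a hypothetical Gaussian error model (independent errors with equal SD at baseline and follow-up; the function name and all parameter values are illustrative). The same expression gives the false-categorization rate for PR when the true shrinkage is zero and the sensitivity for PR when the tumor truly shrinks.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def prob_pr(true_ld_mm, true_shrinkage, sd_mm, pr_threshold):
    """P(follow-up <= (1 - pr_threshold) * baseline) under a Gaussian model.

    Hypothetical model: baseline B ~ N(D, sd_mm) and follow-up
    F ~ N((1 - true_shrinkage) * D, sd_mm), independent.
    true_shrinkage = 0 gives the false-categorization rate for PR;
    true_shrinkage > 0 gives the sensitivity for detecting that shrinkage.
    """
    k = 1.0 - pr_threshold
    mean = (1.0 - true_shrinkage) * true_ld_mm - k * true_ld_mm  # E[F - k B]
    sd = sd_mm * math.sqrt(1.0 + k * k)                          # SD[F - k B]
    return normal_cdf(-mean / sd)


if __name__ == "__main__":
    for sd_mm in (4.0, 2.0):
        for threshold in (0.30, 0.40):
            false_rate = prob_pr(20.0, 0.0, sd_mm, threshold)
            sensitivity = prob_pr(20.0, 0.5, sd_mm, threshold)
            print(f"SD {sd_mm} mm, threshold {threshold:.0%}: "
                  f"false PR {false_rate:.1%}, sensitivity (50% shrinkage) {sensitivity:.1%}")
```

For a 20 mm tumor, raising the threshold from 30% to 40% lowers both the false-categorization rate and the sensitivity, whereas halving the measurement SD lowers the false-categorization rate and raises the sensitivity. This is the reason reducing the measurement variation is preferable to tightening the criteria.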

In conclusion, despite RECIST requirements to observe substantial differences in LD before assuming PR or PD, measurement variation may result in considerable false-categorization rates in monitoring response of breast tumors. In particular, an incorrect assessment of PD may occur frequently, which may result in the premature rejection of an actually effective treatment. Applying CE MRI is expected to result in lower false-categorization rates compared to those resulting from mammography or ultrasonography.

ACKNOWLEDGMENTS

The authors are thankful to Dr. F. A. Pameijer, C. E. Loo, Dr. S. H. Muller, and A. A. M. Hart for useful discussions on the research described in this paper. This work was financially supported by the Dutch Cancer Society, Grant No. NKI 99-2035.

REFERENCES

1 P. Therasse, S. G. Arbuck, E. A. Eisenhauer, J. Wanders, R. S. Kaplan, L. Rubinstein, J. Verweij, M. Van Glabbeke, A. T. van Oosterom, M. C. Christian, and S. G. Gwyther, "New guidelines to evaluate the response to treatment in solid tumors. European Organization for Research and Treatment of Cancer, National Cancer Institute of the United States, National Cancer Institute of Canada," J. Natl. Cancer Inst. 92, 205-216 (2000).

2 J. Gurland and R. O. Johnson, "How reliable are tumor measurements?," JAMA 194, 125-130 (1965).

3 C. G. Moertel and J. A. Hanley, "The effect of measuring error on the results of therapeutic trials in advanced cancer," Cancer 38, 388-394 (1976).

4 P. T. Lavin and G. Flowerdew, "Studies in variation associated with the measurement of solid tumors," Cancer 46, 1286-1290 (1980).

5 D. Warr, S. McKinney, and I. Tannock, "Influence of measurement error on assessment of response to anticancer chemotherapy: proposal for new criteria of tumor response," J. Clin. Oncol. 2, 1040-1046 (1984).

6 A. B. Miller, B. Hoogstraten, M. Staquet, and A. Winkler, "Reporting results of cancer treatment," Cancer 47, 207-214 (1981).

7 W. F. Klein Zeggelink, A. A. Hart, and K. G. Gilhuijs, "Assessment of analysis-of-variance-based methods to quantify the random variations of observers in medical imaging measurements: guidelines to the investigator," Med. Phys. 31, 1996-2007 (2004).


8 J. Herrada, R. B. Iyer, E. N. Atkinson, N. Sneige, A. U. Buzdar, and G. N. Hortobagyi, "Relative value of physical examination, mammography, and breast sonography in evaluating the size of the primary tumor and regional lymph node metastases in women receiving neoadjuvant chemotherapy for locally advanced breast carcinoma," Clin. Cancer Res. 3, 1565-1569 (1997).

9 C. Fiorentino, A. Berruti, A. Bottini, M. Bodini, M. P. Brizzi, A. Brunelli, U. Marini, G. Allevi, S. Aguggini, A. Tira, P. Alquati, L. Olivetti, and L. Dogliotti, "Accuracy of mammography and echography versus clinical palpation in the assessment of response to primary chemotherapy in breast cancer patients with operable disease," Breast Cancer Res. Treat. 69, 143-151 (2001).

10 C. Balu-Maestro, C. Chapellier, A. Bleuse, I. Chanalet, C. Chauvel, and R. Largillier, "Imaging in evaluation of response to neoadjuvant breast cancer treatment benefits of MRI," Breast Cancer Res. Treat. 72, 145-152 (2002).

11 Y. C. Cheung, S. C. Chen, M. Y. Su, L. C. See, S. Hsueh, H. K. Chang, Y. C. Lin, and C. S. Tsai, "Monitoring the size and response of locally advanced breast cancers to neoadjuvant chemotherapy (weekly paclitaxel and epirubicin) with serial enhanced MRI," Breast Cancer Res. Treat. 78, 51-58 (2003).

12 S. Timp and N. Karssemeijer, "A new 2D segmentation method based on dynamic programming applied to computer aided detection in mammography," Med. Phys. 31, 958-971 (2004).

13 R. F. Chang, W. J. Wu, W. K. Moon, and D. R. Chen, "Automatic ultrasound segmentation and morphology based diagnosis of solid breast tumors," Breast Cancer Res. Treat. 89, 179-185 (2005).

14 B. Sahiner, H. P. Chan, M. A. Roubidoux, M. A. Helvie, L. M. Hadjiiski, A. Ramachandran, C. Paramagul, G. L. LeCarpentier, A. Nees, and C. Blane, "Computerized characterization of breast masses on three-dimensional ultrasound volumes," Med. Phys. 31, 744-754 (2004).

15 W. Chen, M. L. Giger, and U. Bick, "A fuzzy c-means (FCM)-based approach for computerized segmentation of breast lesions in dynamic contrast-enhanced MR images," Acad. Radiol. 13, 63-72 (2006).

16 S. C. Partridge, J. E. Gibbs, Y. Lu, L. J. Esserman, D. Tripathy, D. S. Wolverton, H. S. Rugo, E. S. Hwang, C. A. Ewing, and N. M. Hylton, "MRI measurements of breast tumor volume predict response to neoadjuvant chemotherapy and recurrence-free survival," AJR Am. J. Roentgenol. 184, 1774-1781 (2005).

SECTION V

General discussion

CHAPTER 7

Discussion


7.1. RADIOLOGICAL ASSESSMENT OF BREAST TUMOR EXTENT

The aim of this thesis was to quantify the impact of uncertainty in the radiological assessment of tumor extent on the efficacy of current clinical guidelines for the treatment of breast cancer, and to assess whether the efficacy of these guidelines improves if contrast-enhanced magnetic resonance imaging (CE MRI) of the breast is used in addition to conventional breast imaging (mammography and ultrasonography). In this final chapter, the implications of the findings described in this thesis for the clinical practice of breast cancer treatment are discussed, and directions for future developments aimed at further improvements are presented, separately for each section (treatment selection, treatment implementation, treatment evaluation). Finally, conclusions are given that summarize the most significant results of this thesis.

7.1.1. Treatment selection

Although breast-conserving therapy (BCT) has become more common since the introduction of screening programs for the early detection of breast cancer, the number of mastectomy procedures is still comparable to the number of BCT procedures 1. Many mastectomies are performed for breast cancers that were incompletely excised during an initial attempt at breast-conserving surgery and for which a further reexcision was no longer considered cosmetically feasible. Incomplete tumor resections are often due to an underestimation of tumor extent prior to treatment. High accuracy in determining patient eligibility for BCT is therefore of utmost importance to limit the number of reexcisions and mastectomies after initial breast-conserving attempts.

Current clinical guidelines incorporate mammography and ultrasonography to assess tumor extent preoperatively in order to select patients for BCT 2,3. We have shown, however, that conventional imaging accurately targets total tumor extent in only 70% of patients (chapter 2). This means that in 30% of the patients considered eligible for BCT, tumor extent is underestimated or overestimated prior to treatment, which may negatively affect treatment results (e.g., incomplete excision, reduced cosmesis). In contrast, we have shown that CE MRI accurately targets total tumor extent in as many as 90% of patients (chapter 2). Retrospectively, CE MRI was found to have complementary value with respect to conventional imaging in 23% of patients. Prospectively, CE MRI is thus expected to improve the efficacy of current clinical guidelines for determining patient eligibility for BCT, because it increases the proportion of patients in whom total tumor extent is accurately targeted from 70% to 90%.

In our study, the addition of CE MRI to conventional imaging for the assessment of tumor extent resulted in a switch from BCT to mastectomy in 19% of patients (chapter 2). The decision whether CE MRI should be performed is, however, a trade-off between its high cost (approximately 18 times more expensive than conventional imaging) and its benefit to the patient. We have shown that patients younger than 58 years with irregular tumor margins and a difference of at least 10 mm between the measurements of tumor extent at mammography and those at ultrasonography are 3.2 times more likely to benefit from CE MRI than other patients (chapter 2). Therefore, in our opinion, performing preoperative CE MRI is particularly warranted for these patients. These guidelines are currently being validated in a prospective setting at our clinic.
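As a literal reading of the selection criteria stated above, the sketch below encodes them as a single rule in which all three conditions must hold. The class and function names are hypothetical, treating the mammography-ultrasonography discrepancy as an absolute difference is an assumption, and the sketch is illustrative rather than a validated clinical tool.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical record holding only the inputs named in the text."""
    age_years: float
    irregular_margins: bool
    ld_mammography_mm: float
    ld_ultrasonography_mm: float


def mri_particularly_warranted(p: Patient) -> bool:
    """All three stated criteria must hold (literal reading of the text)."""
    # Assumption: the 10 mm criterion applies to the absolute difference.
    discrepancy_mm = abs(p.ld_mammography_mm - p.ld_ultrasonography_mm)
    return p.age_years < 58 and p.irregular_margins and discrepancy_mm >= 10.0


if __name__ == "__main__":
    print(mri_particularly_warranted(Patient(50, True, 25.0, 14.0)))
    print(mri_particularly_warranted(Patient(50, True, 20.0, 12.0)))
```

If the underlying model in chapter 2 combines these factors differently (for example, as a weighted score), the rule would need to follow that model instead.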

Improvements in the accuracy of imaging modalities in demonstrating breast tumor extent are likely to further improve the reliability of evaluating patient eligibility for BCT. Digital mammography is currently replacing film-screen mammography in an increasing number of hospitals. Digital mammography may improve the conspicuity of lesions in dense mammograms, thus facilitating the interpretation of lesion boundaries 4. Higher transducer frequencies in standard two-dimensional (2D) ultrasonography and the emergence of three-dimensional (3D) ultrasonography may improve the ability to discriminate between the tumor boundary and the surrounding tissue 5,6. Increases in the magnetic field strength of MR scanners and higher electrical field strengths of dedicated breast MR pickup coils may improve both the temporal and the spatial resolution of CE MRI, thus facilitating the visualization of detail 7. Upcoming technologies such as magnetic resonance elastography, magnetic resonance spectroscopy, magnetic resonance electrical impedance tomography, positron emission tomography, and tomosynthesis may also prove to be of complementary value to the imaging modalities available at present 8-12.

7.1.2. Treatment implementation

The objective of BCT is to maximize local control (optimal patient survival) while maintaining good cosmetic outcome (optimal quality of life). To achieve this, resection of the tumor must be complete, while minimizing the total volume of excised breast tissue 13-16. A margin of healthy tissue around the tumor is incorporated in order to take several components of uncertainty into account during surgery: uncertainty in (macroscopic and microscopic) measurement of tumor extent, uncertainty in localization of tumor extent, and the uncertainty that is introduced due to surgical inaccuracy (difference between intended and actual resection margins).

7.1.2.1. Extent measurement

Current clinical guidelines for the implementation of BCT advise surgeons to excise breast tumors with a margin of typically 1 cm in order to cover the various components of uncertainty during surgery 17. Because non-palpable tumors constitute the majority of lesions in patients eligible for BCT, the surgeon often must rely on the radiological measurements of tumor extent to estimate the optimal total excision volume.

We have shown, however, that the uncertainty in the preoperative measurements of tumor extent due to random variations constitutes a large portion (40% – 100%) of the 1 cm margin in BCT (chapter 4). For relatively small tumors (< 17 mm), the uncertainty was estimated at 4 – 7 mm for both conventional imaging and CE MRI. For relatively large tumors (≥ 17 mm), by contrast, the uncertainty was estimated at 7 – 10 mm at conventional imaging, whereas the uncertainty at CE MRI was still estimated at 4 – 7 mm. Application of CE MRI thus reduces the share of the currently widely adopted 1 cm margin in BCT that is taken up by uncertainty in the assessment of tumor extent (to 40% – 70%). Preoperative CE MRI is particularly warranted because the 1 cm margin is intended to cover the other components of uncertainty during surgery as well (see Sec. 7.1.2).

We quantified the random variations in the measurement of tumor extent in a population of patients specifically eligible for BCT, and separated the uncertainty at preoperative imaging from the uncertainty at pathology (chapter 4). We found that the overall random variations were 3.6 mm (1 standard deviation (SD)) for mammography, 3.1 mm (1 SD) for ultrasonography, and 2.8 mm (1 SD) for CE MRI. Interestingly, the random variations at pathology were of comparable magnitude to those at preoperative imaging: 3.2 mm (1 SD). Consequently, when the potential merit of new imaging techniques is evaluated with respect to their accuracy in measuring tumor extent, the random variations must be taken into account to assess the reliability of such evaluations. This applies to comparisons between imaging and pathology, where the random variations at both imaging and pathology must be considered. It also applies to comparisons between imaging techniques, regardless of whether one of them is chosen as the gold standard, where the random variations of each imaging technique must be taken into account.
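Why both sources of variation matter can be shown with a short worked example. Assuming (as a simplification) that the random errors of the two methods being compared are independent, the SD of their paired differences is the root-sum-of-squares of the individual SDs. The function names are illustrative, and the 1.96 factor for 95% limits additionally assumes normally distributed differences.

```python
import math


def sd_of_difference(*sds_mm: float) -> float:
    """SD of a difference between independent measurements (root-sum-of-squares)."""
    return math.sqrt(sum(s * s for s in sds_mm))


def limits_of_agreement(sd_diff_mm: float, z: float = 1.96) -> float:
    """Half-width of the interval expected to hold ~95% of paired differences.

    Assumes normally distributed differences with zero mean (no bias).
    """
    return z * sd_diff_mm


if __name__ == "__main__":
    # SDs reported in this section: CE MRI 2.8 mm, pathology 3.2 mm,
    # mammography 3.6 mm.
    mri_vs_pathology = sd_of_difference(2.8, 3.2)
    mammo_vs_mri = sd_of_difference(3.6, 2.8)
    print(f"CE MRI vs pathology: SD {mri_vs_pathology:.2f} mm, "
          f"95% within +/-{limits_of_agreement(mri_vs_pathology):.1f} mm")
    print(f"mammography vs CE MRI: SD {mammo_vs_mri:.2f} mm")
```

With the SDs reported above, CE MRI and pathology would be expected to differ with an SD of about 4.3 mm, roughly ±8 mm at the 95% level, even if neither method were biased. Such scatter therefore does not by itself indicate that either method is inaccurate.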

A further reduction of the uncertainty in the preoperative radiological measurements of tumor extent within the current 1 cm surgical safety margin may be accomplished by computerized (semi-)automated methods for the assessment of tumor extent. Several investigators have focused on the development of algorithms for automated segmentation of breast tumors in mammograms, thus yielding automated measurements of tumor extent 18-20. Nevertheless, breast compression and the fact that mammography provides only projections of the tumor remain, and both are important contributors to random variations. Algorithms for automated segmentation of breast tumors in 2D and 3D ultrasound images have been developed as well 21-26. However, the visualization of certain tumor types, such as ductal carcinoma in situ (DCIS), is problematic with current ultrasonography 27. Improvements may therefore be achieved most effectively using algorithms for automated segmentation of breast tumors in the undistorted 3D data provided by CE MRI 28-30.
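Once a segmentation algorithm has produced a binary tumor mask, an automated LD can be read off as the largest distance between any two tumor pixels. The sketch below is a generic illustration, not one of the cited algorithms. It assumes a 2D mask given as nested lists and a known pixel spacing; the function name is hypothetical, and the same idea extends directly to 3D voxel masks.

```python
import math
from itertools import combinations


def largest_diameter_mm(mask, spacing_mm=(1.0, 1.0)):
    """Largest distance between the centers of any two foreground pixels.

    mask: 2D list of 0/1 values (rows x columns), e.g. a segmentation result.
    spacing_mm: (row spacing, column spacing) in millimetres.
    Brute force over all pixel pairs: fine for small masks; for large masks,
    restricting the search to convex-hull points is much faster.
    """
    points = [(r * spacing_mm[0], c * spacing_mm[1])
              for r, row in enumerate(mask)
              for c, value in enumerate(row) if value]
    if len(points) < 2:
        return 0.0
    return max(math.dist(p, q) for p, q in combinations(points, 2))


if __name__ == "__main__":
    # A 3 x 5 block of tumor pixels with 0.5 mm spacing.
    block = [[1] * 5 for _ in range(3)]
    print(f"{largest_diameter_mm(block, (0.5, 0.5)):.2f} mm")
```

Measuring between pixel centers can underestimate the physical extent by up to about one pixel. Because the procedure is deterministic, however, repeated runs on the same mask yield the same LD, and this is where the reduction in observer-related variation comes from.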

7.1.2.2. Extent localization

For the resection of non-palpable tumors in BCT, the surgeon typically relies on a hook wire that is preoperatively placed in the breast under guidance of conventional imaging, in order to estimate the position of the tumor in the breast. Nevertheless, surgeons generally face great difficulty in obtaining a resection with a uniform margin of healthy tissue around the tumor 31. Consistent with this, the experience of the surgeon is the dominant parameter that determines the chance of obtaining a complete tumor resection 32. These apparent difficulties in BCT thus necessitate better localization strategies in terms of both the position of the tumor and its extent.

We investigated whether preoperatively obtained CE MRI of the patient in a supine position can be used to guide the surgeon in localizing both the position and the extent of the tumor immediately prior to the actual surgical procedure. A prerequisite for such an approach is that the geometry of the internal breast structure with respect to the skin is accurately reproduced between preoperative CE MRI and localization. We found that the geometrical structure of the mammary gland reproduces accurately if the setup of the patient is accurately reproduced (chapter 5). Application of an immobilization cast only slightly affects the reproducibility and may assist in keeping the breast immobilized after patient setup. Preoperative CE MRI of the patient in a supine position thus provides a highly reproducible representation of the geometry of the mammary gland structure prior to actual surgery. Image-guided localization strategies based on preoperative CE MRI of the patient in a supine position may therefore be of complementary value to localization methods based on conventional breast imaging.

A relatively new development is the application of a conventional hook wire in conjunction with technetium-99m-labeled human protein as a radioactive marker to localize the tumor during surgery. The technetium-99m marker is injected preoperatively under ultrasonography guidance or, for tumors that cannot be visualized at ultrasonography, under stereotactic x-ray mammography guidance. The preoperative injection procedure is, however, sensitive to tissue shifts and needle deviations, which may result in an inaccurate determination of the optimal injection site 33. In addition, it is unknown whether the technetium-99m marker always remains at the site of injection, and which volume of radioactivity found during surgery (by means of a gamma probe) actually corresponds to the true extent of the tumor. Another disadvantage of such an approach is that the extent of DCIS is inaccurately visualized in a large percentage of patients 34.

Some authors have reported on the use of ultrasonography to guide the surgeon in localizing the position of the tumor and its extent. Mullen et al. used a carbon suspension to draw lines from the tumor toward the skin of the breast under ultrasonography guidance in order to mark the position of the tumor, as an alternative to the conventional hook wire 35. The technique was, however, not applied to mark the extent of the tumor. Inoue et al. evaluated the feasibility of projecting 3D ultrasound images onto live video images of the breast 36. An optical tracking system was used to establish the geometrical correspondence between the 3D ultrasound probe and the breast. Such an augmented-reality visualization approach allows the surgeon to assess the extent of the tumor immediately prior to surgery as if it were visible through the skin. Rahusen et al. reported on a successful application of intraoperative ultrasonography to demonstrate tumor extent during actual surgery 37. A general disadvantage of all such ultrasonography-based localization strategies is, however, that the extent of invasive tumors, and especially the extent of DCIS, may be underestimated 38.

Other investigators have reported on MRI-guided localization of breast tumors 39-41. Such an approach, however, requires a dedicated breast biopsy coil that is not commonly available in hospitals. In addition, MRI-guided localization is not a real-time procedure, and a displacement of the tumor or the needle may result in an inaccurate localization 42. The method was also found to be less feasible for tumors smaller than 1 cm. Hirose et al. successfully used intraoperative CE MRI to visualize the extent of the tumor during actual surgery 43. Real-time MRI-guided surgery, however, requires a fully MRI-compatible operating theatre with dedicated MRI devices, which is only rarely available in hospitals. Such a procedure is also complex, time-consuming, and expensive. Tamaki et al. evaluated an approach in which preoperative CE MRI of the breast is obtained after the breast is covered by an MRI-visible lattice sheet for mapping the tumor 44. The surgeon marked the extent of the tumor on the skin of the breast by inking those grid cells that contained tumor in the preoperatively obtained MR images, and subsequently performed the excision. Complete removal of the non-palpable tumors was, however, achieved in only eight of 12 patients.

We are currently investigating an approach for marking the extent of the tumor just before actual surgery commences, by combining preoperatively obtained CE MRI of the patient in a supine position with real-time 3D ultrasonography. A dedicated optical tracking system will be used to establish the geometrical correspondence between the breast, the localization needle, the 3D ultrasound images, and the preoperatively obtained MR images. Geometrical correspondence between the breast during preoperative MR imaging and the breast during the actual localization procedure will be established by scanning the patient after placement of MRI-visible skin markers, which are subsequently replaced by optical markers that are visible to the tracking system. Optical markers on the shaft of the localization needle and on the 3D ultrasound probe will be used to obtain their position and orientation with respect to the breast during the localization procedure. Geometrical deformation of the mammary gland structure due to needle insertion will be monitored in real time by tracking fiducial points in the 3D ultrasound images, and the measured deformation will be used to align the preoperatively obtained MR images. The previously derived reproducibility margins (chapter 5) will be applied to account for internal tissue shifts between preoperative MR imaging and the actual localization procedure. Using such an approach, it may be feasible to leave several traces of carbon suspension from the boundaries of the tumor toward the skin of the breast, in order to aid the surgeon in obtaining a complete excision of the tumor with the planned uniform 1 cm margin.

7.1.3. Treatment evaluation

Differences in tumor extent between the start of treatment and follow-up are regularly used to monitor the response of breast tumors to therapy. In clinical research, this serves two purposes. The first is assessing the efficacy of new anti-cancer agents (phase II clinical trials) 45,46. The second is evaluating the long-term benefits of routinely administering effective drugs (phase III clinical trials) 47,48. In daily clinical practice, the effects of hormonal therapy, chemotherapy, and radiotherapy are monitored to support decision-making 49,50. Since their introduction in 2000, the RECIST guidelines have been rapidly adopted by the oncology community, and they currently constitute the standard set of response evaluation criteria used worldwide 51.

According to RECIST, a reduction in tumor extent of at least 30% must be observed to assume partial response (PR), and an increase in tumor extent of at least 20% must be observed to assume progressive disease (PD). RECIST thus requires considerable differences in tumor extent before assuming regression or progression. We have shown, however, that random variations in the measurements of tumor extent may still cause considerable false-categorization rates (ranging from 13% to 29%) (chapter 6). For relatively small tumors, no significant (p ≥ 0.05) differences were found between the estimated false-categorization rates at conventional imaging and those at CE MRI. Conversely, for relatively large tumors, the false-categorization rates were estimated to be significantly (p < 0.05) lower if CE MRI is used rather than mammography (for tumors ≥ 18 mm) or ultrasonography (for tumors ≥ 20 mm). Averaged over all tumor sizes, the false-categorization rates are expected to fall to 6% – 19% if CE MRI is used rather than conventional imaging. This would improve the efficacy of the current RECIST guidelines for monitoring the response of breast tumors to therapy.
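The sketch below shows the RECIST thresholds quoted above and why random measurement variation alone can trigger a category change. It is a simplified illustration, not the ANOVA-based method of chapter 6. It assumes a single lesion whose true extent does not change, measured twice. Each measurement carries independent Gaussian error with a hypothetical standard deviation `sigma_mm`. Complete response is ignored.

```python
import math


def recist_category(baseline_mm, follow_up_mm):
    """Response category from two measurements of one lesion (RECIST thresholds).

    Complete response (disappearance of the lesion) is not modelled.
    """
    change = (follow_up_mm - baseline_mm) / baseline_mm
    if change <= -0.30:
        return "PR"  # partial response: decrease of at least 30%
    if change >= 0.20:
        return "PD"  # progressive disease: increase of at least 20%
    return "SD"  # stable disease


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def false_category_rates(true_mm, sigma_mm):
    """P(PR call), P(PD call) for a lesion whose true extent does not change.

    Each measurement is assumed to be true_mm + N(0, sigma_mm**2), independently.
    A PR call means m2 - 0.7*m1 <= 0, where m2 - 0.7*m1 is normal with mean
    0.3*true_mm and SD sigma_mm*sqrt(1 + 0.7**2); a PD call means
    m2 - 1.2*m1 >= 0, with mean -0.2*true_mm and SD sigma_mm*sqrt(1 + 1.2**2).
    """
    p_pr = normal_cdf(-0.3 * true_mm / (sigma_mm * math.sqrt(1.0 + 0.7 ** 2)))
    p_pd = normal_cdf(-0.2 * true_mm / (sigma_mm * math.sqrt(1.0 + 1.2 ** 2)))
    return p_pr, p_pd


# Hypothetical 20 mm lesion measured with two different (assumed) precisions
for sigma in (2.0, 4.0):
    p_pr, p_pd = false_category_rates(20.0, sigma)
    print(f"sigma = {sigma} mm: P(false PR) = {p_pr:.3f}, P(false PD) = {p_pd:.3f}")
```

In this model, both probabilities depend only on the ratio of true extent to measurement SD. A noisier modality therefore produces higher false-categorization rates for the same lesion. The false-PD rate exceeds the false-PR rate because the PD threshold (+20%) lies closer to "no change" than the PR threshold (−30%).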

Phase II clinical trials aim at a rapid assessment of the efficacy of new anti-cancer drugs and are commonly based on single-stage or multi-stage designs 52-58. Predefined rules are used to decide whether to stop the trial or to proceed to a subsequent stage that accrues more patients. The number of patients included at each stage is often small (typically ~15), and the decision criteria are usually stringent. Specifically, in the classical two-stage design of Gehan, only one patient needs to achieve PR to allow a second stage accruing more patients 52. Since the false-categorization rate for PR may be on the order of 13%, the trial is likely to proceed to another stage even if the anti-cancer agent under investigation does not induce any tumor regression at all. On the other hand, most testing procedures also allow early termination of the trial if interim evaluations suggest no drug efficacy 54. Because the false-categorization rate for PD may be on the order of 29%, the chance of rejecting an effective drug that at least stabilizes tumor growth may be high. Similar considerations apply to phase III clinical trials. They also apply to daily clinical practice outside the investigational setting, where response is monitored on a more individual basis. Patients may remain on therapy if evaluation suggests PR, whereas in reality they may not benefit from the treatment. A major implication is that these patients may be delayed in being offered an alternative and possibly better therapy. Conversely, patients may be withdrawn from therapy if evaluation suggests PD, whereas in reality their treatment may at least stabilize the disease.
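The first-stage argument above can be quantified with a binomial calculation, under assumptions that go beyond the text. The sketch assumes patients are mis-categorized independently. It also assumes the per-patient probabilities of a false PR (0.13) and a false PD (0.29) equal the false-categorization rates quoted above. The stage size of 15 follows the "typically ~15" figure. These are illustrative values, not a model of any specific cited design.

```python
import math


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical first stage: 15 patients treated with a drug that has no
# true effect; each is independently mis-categorized as PR with p = 0.13.
p_continue = prob_at_least(1, 15, 0.13)
print(f"P(at least one false PR among 15) = {p_continue:.2f}")  # about 0.88

# Expected number of false PD calls among 15 patients whose disease is
# truly stable, assuming a per-patient false-PD probability of 0.29.
expected_false_pd = 15 * 0.29
print(f"expected false PD calls = {expected_false_pd:.2f}")
```

Under these assumptions, a single false PR is enough to move an inactive drug to the next stage in almost nine of ten trials. This is the mechanism the paragraph describes.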

Introducing more stringent criteria for PR and PD would certainly reduce the false-categorization rates. However, it would also reduce the sensitivity for detecting true regression or progression, and it would preclude any objective comparison of current and future results. Instead, a further reduction in false-categorization rates may be accomplished by reducing the underlying random variations in the assessment of tumor extent. This may be realized directly by implementing computerized methods for (semi-)automated extent measurement (see Sec. 7.1.2.1), or indirectly by improvements in the imaging modalities (see Sec. 7.1.1). Alternatively, some authors have shown that differences in the contrast-uptake pattern in CE MRI of the breast before and after neoadjuvant chemotherapy may be used to predict tumor response 59. In addition to such temporal features, other investigators have demonstrated that morphological features in CE MRI are also predictive of response 60. These techniques may prove to be of complementary value to response assessment based on serial measurements of tumor extent. As such, they may be incorporated into future response evaluation guidelines. Any such improvement in breast imaging will, however, shift the balance between false-positive and true-positive calls for both tumor regression and tumor progression. The risk is that differences in response evaluations are attributed to the anti-cancer drug or treatment strategy under investigation, while in reality they are caused entirely by a reduction of random variations in the measurements of tumor extent. Continual evaluation of current and newly emerging imaging techniques with respect to this issue is therefore highly desirable.
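The sketch below illustrates the shift in balance described above with a simplified Gaussian model. For a fixed, hypothetical true drug effect, the probability of a PR call depends on the measurement SD. It assumes a 20 mm tumor, independent Gaussian errors, and two hypothetical SDs (4 mm and 2 mm). It is an illustration only, not a result from this thesis.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_called_pr(baseline_mm, true_ratio, sigma_mm):
    """Probability that two noisy measurements show a decrease of at least 30%.

    The true extent changes from baseline_mm to true_ratio * baseline_mm, and
    each measurement carries independent N(0, sigma_mm**2) error, so
    m2 - 0.7*m1 is normal with mean (true_ratio - 0.7) * baseline_mm and
    SD sigma_mm * sqrt(1 + 0.7**2). A PR call means m2 - 0.7*m1 <= 0.
    """
    mean = (true_ratio - 0.7) * baseline_mm
    sd = sigma_mm * math.sqrt(1.0 + 0.7 ** 2)
    return normal_cdf(-mean / sd)


# Identical (hypothetical) drug effects measured with two assumed precisions
for true_ratio in (0.75, 0.60):  # true shrinkage of 25% and 40%
    noisy = p_called_pr(20.0, true_ratio, 4.0)
    precise = p_called_pr(20.0, true_ratio, 2.0)
    print(f"true ratio {true_ratio}: P(PR) = {noisy:.2f} (sigma 4 mm), "
          f"{precise:.2f} (sigma 2 mm)")
```

With a true shrinkage of 25% (below the 30% threshold), the more precise modality yields fewer PR calls. With a true shrinkage of 40%, it yields more. The observed response rate therefore changes with imaging precision even though the drug effect is identical.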


7.2. CONCLUSIONS

The uncertainty in the radiological assessment of tumor extent, due to measurement inaccuracies and random measurement variations, has a profound impact on the efficacy of the current clinical guidelines for the treatment of breast cancer:

- Conventional breast imaging (mammography and ultrasonography) performed to select patients for BCT accurately targets total tumor extent in only 70% of patients;

- The uncertainty in the assessment of tumor extent takes up a large portion (ranging from 40% to 100%) of the currently widely adopted 1-cm surgical margin in BCT;

- Although the current guidelines for response evaluation (RECIST) inherently aim to account for uncertainty in the assessment of tumor extent, considerable false-categorization rates (ranging from 13% to 29%) must still be anticipated.

Application of CE MRI improves the efficacy of the current clinical guidelines for the treatment of breast cancer because of its superior accuracy and reproducibility in the assessment of tumor extent compared with conventional breast imaging techniques:

- Performing CE MRI to determine patient eligibility for BCT is expected to increase the proportion of patients in whom total tumor extent is accurately targeted from 70% to 90%;

- Application of CE MRI reduces the portion of the currently widely adopted 1-cm safety margin in BCT that is taken up by uncertainty in the assessment of tumor extent (to 40% – 70%);

- Preoperative CE MRI in a supine position provides a highly reproducible representation of the geometry of the mammary gland structure prior to surgery;

- Localization based on preoperative CE MRI in a supine position may therefore be of complementary value to localization methods based on conventional breast imaging;

- False-categorization rates in response evaluations using RECIST are expected to be reduced (to 6% – 19%) if CE MRI is used rather than conventional breast imaging.


7.3. REFERENCES

1 A. C. Voogd, O. J. Repelaer van Driel, R. M. Roumen, M. A. Crommelin, M. W. van Beek, and J. W. Coebergh, "Changing attitudes towards breast-conserving treatment of early breast cancer in the south-eastern Netherlands: results of a survey among surgeons and a registry-based analysis of patterns of care," Eur. J. Surg. Oncol. 23, 134-138 (1997).

2 Kwaliteitsinstituut voor de Gezondheidszorg (CBO), Nationaal Borstkanker Overleg Nederland (NABON), and Vereniging van Integrale Kankercentra (VIKC), "Richtlijn Behandeling van het Mammacarcinoom," Utrecht: CBO (2002).

3 Kwaliteitsinstituut voor de Gezondheidszorg (CBO), Nationaal Borstkanker Overleg Nederland (NABON), and Vereniging van Integrale Kankercentra (VIKC), "Richtlijn Screening en Diagnostiek van het Mammacarcinoom," Utrecht: CBO (2000).

4 R. Bonardi, D. Ambrogetti, S. Ciatto, E. Gentile, B. Lazzari, P. Mantellini, E. Nannelli, E. Ristori, L. Sottani, and M. R. Turco, "Conventional versus digital mammography in the analysis of screen-detected lesions with low positive predictive value," Eur. J. Radiol. 55, 258-263 (2005).

5 B. E. Hashimoto, D. J. Kramer, and V. J. Picozzi, "High detection rate of breast ductal carcinoma in situ calcifications on mammographically directed high-resolution sonography," J. Ultrasound Med. 20, 501-508 (2001).

6 K. R. Cho, B. K. Seo, J. Y. Lee, E. D. Pisano, B. K. Je, J. Y. Lee, E. J. Choi, K. B. Chung, and O. Y. Whan, "A comparative study of 2D and 3D ultrasonography for evaluation of solid breast masses," Eur. J. Radiol. 54, 365-370 (2005).

7 J. T. Vaughan, G. Adriany, C. J. Snyder, J. Tian, T. Thiel, L. Bolinger, H. Liu, L. DelaBarre, and K. Ugurbil, "Efficient high-frequency body coil for high-field MRI," Magn. Reson. Med. 52, 851-859 (2004).

8 T. Xydeas, K. Siegmann, R. Sinkus, U. Krainick-Strobel, S. Miller, and C. D. Claussen, "Magnetic resonance elastography of the breast: correlation of signal intensity data with viscoelastic properties," Invest. Radiol. 40, 412-420 (2005).

9 M. A. Thomas, N. Wyckoff, K. Yue, N. Binesh, S. Banakar, H. K. Chung, J. Sayre, and N. DeBruhl, "Two-dimensional MR spectroscopic characterization of breast cancer in vivo," Technol. Cancer Res. Treat. 4, 99-106 (2005).

10 B. I. Lee, S. H. Oh, T. S. Kim, E. J. Woo, S. Y. Lee, O. Kwon, and J. K. Seo, "Basic setup for breast conductivity imaging using magnetic resonance electrical impedance tomography," Phys. Med. Biol. 51, 443-455 (2006).

11 L. Tafra, Z. Cheng, J. Uddo, M. B. Lobrano, W. Stein, W. A. Berg, E. Levine, I. N. Weinberg, D. Narayanan, E. Ross, D. Beylin, S. Yarnall, R. Keen, K. Sawyer, J. Van Geffen, R. L. Freimanis, E. Staab, L. P. Adler, J. Lovelace, P. Shen, J. Stewart, and S. Dolinsky, "Pilot clinical trial of 18F-fluorodeoxyglucose positron-emission mammography in the surgical management of breast cancer," Am. J. Surg. 190, 628-632 (2005).

12 A. Smith, "Full-field breast tomosynthesis," Radiol. Manage. 27, 25-31 (2005).

13 B. Spivack, M. M. Khanna, L. Tafra, G. Juillard, and A. E. Giuliano, "Margin status and local recurrence after breast-conserving surgery," Arch. Surg. 129, 952-956 (1994).

14 D. E. Wazer, G. Jabro, R. Ruthazer, C. Schmid, H. Safaii, and R. K. Schmidt-Ullrich, "Extent of margin positivity as a predictor for local recurrence after breast conserving irradiation," Radiat. Oncol. Investig. 7, 111-117 (1999).

15 C. C. Park, M. Mitsumori, A. Nixon, A. Recht, J. Connolly, R. Gelman, B. Silver, S. Hetelekidis, A. Abner, J. R. Harris, and S. J. Schnitt, "Outcome at 8 years after breast-conserving surgery and radiation therapy for invasive breast cancer: influence of margin status and systemic therapy on local recurrence," J. Clin. Oncol. 18, 1668-1675 (2000).

16 C. Vrieling, L. Collette, A. Fourquet, W. J. Hoogenraad, J. H. Horiot, J. J. Jager, M. Pierart, P. M. Poortmans, H. Struikmans, B. Maat, E. Van Limbergen, and H. Bartelink, "The influence of patient, tumor and treatment factors on the cosmetic results after breast-conserving therapy in the EORTC 'boost vs. no boost' trial. EORTC Radiotherapy and Breast Cancer Cooperative Groups," Radiother. Oncol. 55, 219-232 (2000).

17 E. J. Rutgers, "Quality control in the locoregional treatment of breast cancer," Eur. J. Cancer 37, 447-453 (2001).

18 B. Sahiner, H. P. Chan, N. Petrick, M. A. Helvie, and L. M. Hadjiiski, "Improvement of mammographic mass characterization using spiculation measures and morphological features," Med. Phys. 28, 1455-1465 (2001).

19 L. Kinnard, S. C. Lo, E. Makariou, T. Osicka, P. Wang, M. F. Chouikha, and M. T. Freedman, "Steepest changes of a probability-based cost function for delineation of mammographic masses: a validation study," Med. Phys. 31, 2796-2810 (2004).

20 S. Timp and N. Karssemeijer, "A new 2D segmentation method based on dynamic programming applied to computer aided detection in mammography," Med. Phys. 31, 958-971 (2004).

21 K. Horsch, M. L. Giger, L. A. Venta, and C. J. Vyborny, "Automatic segmentation of breast lesions on ultrasound," Med. Phys. 28, 1652-1659 (2001).

22 Y. L. Huang and D. R. Chen, "Watershed segmentation for breast tumor in 2-D sonography," Ultrasound Med. Biol. 30, 625-632 (2004).


23 R. F. Chang, W. J. Wu, W. K. Moon, and D. R. Chen, "Automatic ultrasound segmentation and morphology based diagnosis of solid breast tumors," Breast Cancer Res. Treat. 89, 179-185 (2005).

24 D. R. Chen, R. F. Chang, W. J. Wu, W. K. Moon, and W. L. Wu, "3-D breast ultrasound segmentation using active contour model," Ultrasound Med. Biol. 29, 1017-1026 (2003).

25 R. F. Chang, W. J. Wu, W. K. Moon, W. M. Chen, W. Lee, and D. R. Chen, "Segmentation of breast tumor in three-dimensional ultrasound images using three-dimensional discrete active contour model," Ultrasound Med. Biol. 29, 1571-1581 (2003).

26 B. Sahiner, H. P. Chan, M. A. Roubidoux, M. A. Helvie, L. M. Hadjiiski, A. Ramachandran, C. Paramagul, G. L. LeCarpentier, A. Nees, and C. Blane, "Computerized characterization of breast masses on three-dimensional ultrasound volumes," Med. Phys. 31, 744-754 (2004).

27 H. Satake, K. Shimamoto, A. Sawaki, R. Niimi, Y. Ando, T. Ishiguchi, T. Ishigaki, K. Yamakawa, T. Nagasaka, and H. Funahashi, "Role of ultrasonography in the detection of intraductal spread of breast cancer: correlation with pathologic findings, mammography and MR imaging," Eur. Radiol. 10, 1726-1732 (2000).

28 R. Lucht, S. Delorme, and G. Brix, "Neural network-based segmentation of dynamic MR mammographic images," Magn. Reson. Imaging 20, 147-154 (2002).

29 S. C. Partridge, J. E. Gibbs, Y. Lu, L. J. Esserman, D. Tripathy, D. S. Wolverton, H. S. Rugo, E. S. Hwang, C. A. Ewing, and N. M. Hylton, "MRI measurements of breast tumor volume predict response to neoadjuvant chemotherapy and recurrence-free survival," AJR Am. J. Roentgenol. 184, 1774-1781 (2005).

30 W. Chen, M. L. Giger, and U. Bick, "A fuzzy c-means (FCM)-based approach for computerized segmentation of breast lesions in dynamic contrast-enhanced MR images," Acad. Radiol. 13, 63-72 (2006).

31 M. R. Christiaens, L. Cataliotti, I. Fentiman, E. Rutgers, M. Blichert-Toft, J. E. DeVries, H. P. Graversen, K. Vantongelen, and R. Aerts, "Comparison of the surgical procedures for breast conserving treatment of early breast cancer in seven EORTC centres," Eur. J. Cancer 32A, 1866-1875 (1996).

32 J. M. Dixon, O. Ravisekar, M. Cunningham, E. D. Anderson, T. J. Anderson, and H. K. Brown, "Factors affecting outcome of patients with impalpable breast cancer detected by breast screening," Br. J. Surg. 83, 997-1001 (1996).

33 E. E. Deurloo, K. G. Gilhuijs, L. J. Schultze Kool, and S. H. Muller, "Displacement of breast tissue and needle deviations during stereotactic procedures," Invest. Radiol. 36, 347-353 (2001).


34 G. Amano, N. Ohuchi, T. Ishibashi, T. Ishida, M. Amari, and S. Satomi, "Correlation of three-dimensional magnetic resonance imaging with precise histopathological map concerning carcinoma extension in the breast," Breast Cancer Res. Treat. 60, 43-55 (2000).

35 D. J. Mullen, R. N. Eisen, R. D. Newman, P. M. Perrone, and J. C. Wilsey, "The use of carbon marking after stereotactic large-core-needle breast biopsy," Radiology 218, 255-260 (2001).

36 T. Inoue, Y. Tamaki, Y. Sato, M. Nakamoto, S. Tamura, Y. Tanji, T. Taguchi, and S. Noguchi, "Three-dimensional ultrasound imaging of breast cancer by a real-time intraoperative navigation system," Breast Cancer 12, 122-129 (2005).

37 F. D. Rahusen, A. J. Bremers, H. F. Fabry, A. H. van Amerongen, R. P. Boom, and S. Meijer, "Ultrasound-guided lumpectomy of nonpalpable breast cancer versus wire-guided resection: a randomized clinical trial," Ann. Surg. Oncol. 9, 994-998 (2002).

38 M. Van Goethem, K. Schelfout, L. Dijckmans, J. C. Van Der Auwera, J. Weyler, I. Verslegers, I. Biltjes, and A. De Schepper, "MR mammography in the pre-operative staging of breast cancer in patients with dense breast tissue: comparison with mammography and ultrasound," Eur. Radiol. 14, 809-816 (2004).

39 L. F. Smith, R. Henry-Tillman, A. T. Mancino, A. Johnson, J. M. Price, K. C. Westbrook, S. Harms, and V. S. Klimberg, "Magnetic resonance imaging-guided core needle biopsy and needle localized excision of occult breast lesions," Am. J. Surg. 182, 414-418 (2001).

40 I. Bedrosian, J. Schlencker, F. R. Spitz, S. G. Orel, D. L. Fraker, L. S. Callans, M. Schnall, C. Reynolds, and B. J. Czerniecki, "Magnetic resonance imaging-guided biopsy of mammographically and clinically occult breast lesions," Ann. Surg. Oncol. 9, 457-461 (2002).

41 E. A. Morris, L. Liberman, D. D. Dershaw, J. B. Kaplan, L. R. LaTrenta, A. F. Abramson, and D. J. Ballon, "Preoperative MR imaging-guided needle localization of breast lesions," AJR Am. J. Roentgenol. 178, 1211-1220 (2002).

42 S. H. Heywang-Kobrunner, A. Heinig, D. Pickuth, T. Alberich, and R. P. Spielmann, "Interventional MRI of the breast: lesion localisation and biopsy," Eur. Radiol. 10, 36-45 (2000).

43 M. Hirose, D. F. Kacher, D. N. Smith, C. M. Kaelin, and F. A. Jolesz, "Feasibility of MR imaging-guided breast lumpectomy for malignant tumors in a 0.5-T open-configuration MR imaging system," Acad. Radiol. 9, 933-941 (2002).

44 Y. Tamaki, S. Akashi-Tanaka, T. Ishida, T. Uematsu, Y. Sawai, M. Kusama, S. Nakamura, K. Hisamatsu, Y. Tanji, Y. Sato, and N. Matsuura, "3D imaging of intraductal spread of breast cancer and its clinical application for navigation surgery," Breast Cancer 9, 289-295 (2002).


45 G. von Minckwitz, S. D. Costa, W. Eiermann, J. U. Blohmer, A. H. Tulusan, C. Jackisch, and M. Kaufmann, "Maximized reduction of primary breast tumor size using preoperative chemotherapy with doxorubicin and docetaxel," J. Clin. Oncol. 17, 1999-2005 (1999).

46 J. M. Dixon, L. Renshaw, C. Bellamy, M. Stuart, G. Hoctin-Boes, and W. R. Miller, "The effects of neoadjuvant anastrozole (Arimidex) on tumor volume in postmenopausal women with breast cancer: a randomized, double-blind, single-center study," Clin. Cancer Res. 6, 2229-2235 (2000).

47 J. A. van der Hage, C. J. van de Velde, J. P. Julien, M. Tubiana-Hulin, C. Vandervelden, and L. Duchateau, "Preoperative chemotherapy in primary operable breast cancer: results from the European Organization for Research and Treatment of Cancer trial 10902," J. Clin. Oncol. 19, 4224-4237 (2001).

48 P. Therasse, L. Mauriac, M. Welnicka-Jaskiewicz, P. Bruning, T. Cufer, H. Bonnefoi, E. Tomiak, K. I. Pritchard, A. Hamilton, and M. J. Piccart, "Final results of a randomized phase III trial comparing cyclophosphamide, epirubicin, and fluorouracil with a dose-intensified epirubicin and cyclophosphamide + filgrastim as neoadjuvant treatment in locally advanced breast cancer: an EORTC-NCIC-SAKK multicenter study," J. Clin. Oncol. 21, 843-850 (2003).

49 P. Therasse, "Evaluation of response: new and standard criteria," Ann. Oncol. 13 Suppl 4, 127-129 (2002).

50 P. Therasse, "Measuring the clinical response. What does it mean?," Eur. J. Cancer 38, 1817-1823 (2002).

51 P. Therasse, S. G. Arbuck, E. A. Eisenhauer, J. Wanders, R. S. Kaplan, L. Rubinstein, J. Verweij, M. Van Glabbeke, A. T. van Oosterom, M. C. Christian, and S. G. Gwyther, "New guidelines to evaluate the response to treatment in solid tumors. European Organization for Research and Treatment of Cancer, National Cancer Institute of the United States, National Cancer Institute of Canada," J. Natl. Cancer Inst. 92, 205-216 (2000).

52 E. A. Gehan, "The determination of the number of patients required in a preliminary and a follow-up trial of a new chemotherapeutic agent," J. Chronic Dis. 13, 346-353 (1961).

53 P. C. O'Brien and T. R. Fleming, "A multiple testing procedure for clinical trials," Biometrics 35, 549-556 (1979).

54 T. R. Fleming, "One-sample multiple testing procedure for phase II clinical trials," Biometrics 38, 143-151 (1982).

55 R. J. Sylvester, "A Bayesian approach to the design of phase II clinical trials," Biometrics 44, 823-836 (1988).


56 R. Simon, "Optimal two-stage designs for phase II clinical trials," Control. Clin. Trials 10, 1-10 (1989).

57 L. G. Ensign, E. A. Gehan, D. S. Kamen, and P. F. Thall, "An optimal three-stage design for phase II clinical trials," Stat. Med. 13, 1727-1736 (1994).

58 A. Kramar, D. Potvin, and C. Hill, "Multistage designs for phase II clinical trials: statistical issues in cancer research," Br. J. Cancer 74, 1317-1320 (1996).

59 K. Wasser, S. K. Klein, C. Fink, H. Junkermann, H. P. Sinn, I. Zuna, M. V. Knopp, and S. Delorme, "Evaluation of neoadjuvant chemotherapeutic response of breast cancer using dynamic MRI with high temporal resolution," Eur. Radiol. 13, 80-87 (2003).

60 L. Esserman, E. Kaplan, S. Partridge, D. Tripathy, H. Rugo, J. Park, S. Hwang, H. Kuerer, D. Sudilovsky, Y. Lu, and N. Hylton, "MRI phenotype is associated with response to doxorubicin and cyclophosphamide neoadjuvant chemotherapy in stage III breast cancer," Ann. Surg. Oncol. 8, 549-559 (2001).


SUMMARY

Breast cancer is an all too common disease among women in Western countries. In the Netherlands, around one in every nine women will be diagnosed with breast cancer during her lifetime, and nearly one in every 22 women will die of the disease. Contemporary breast cancer treatment consists of a combination of locoregional therapy (surgery & radiotherapy) and systemic therapy (chemotherapy & hormonal therapy). Preferably, patients are offered breast-conserving therapy (BCT), which aims at complete elimination of the cancer while preserving optimal cosmesis. Alternatively, patients may have to undergo mastectomy: complete removal of the mammary gland. Systemic therapy is used both to reduce the cancer load prior to locoregional treatment (neo-adjuvant) and to eliminate already existing distant metastases (adjuvant).

Tumor extent is a major parameter in current clinical guidelines for treatment selection, treatment implementation, and the evaluation of treatment results. Imaging techniques are routinely used for the preoperative assessment of tumor extent in the breast. Conventional mammography and ultrasonography are the standard imaging techniques currently used to assess breast tumor extent. Contrast-enhanced magnetic resonance imaging (CE MRI) is an emerging technique to assess the extent of breast tumors, but its use is restricted to selected cases, predominantly because of its higher costs. The inherent limitations of each of these breast imaging techniques, in terms of both accuracy and reproducibility, however, introduce uncertainty in the preoperative assessment of tumor extent. This uncertainty in turn affects the efficacy of current clinical guidelines. In this thesis, we quantified the impact of uncertainties in the radiological assessment of tumor extent on the efficacy of current clinical guidelines for the treatment of breast cancer. We also evaluated whether CE MRI of the breast improves the efficacy of these guidelines.

The first part of this thesis addresses the impact of uncertainty in the radiological assessment of tumor extent on current clinical guidelines for treatment selection. Generally, current guidelines consider a large tumor in a small breast an important contra-indication for BCT, and firmly advise against BCT in case of multicentricity. Chapter 2 describes a quantitative comparison between measurements of tumor extent performed at preoperative imaging and those obtained postoperatively at pathology. Conventional imaging was found to target the total tumor extent in only 70% of patients. In the majority of the 30% of patients in whom conventional imaging was inaccurate, total tumor extent was underestimated. This may result in incomplete tumor excision, requiring either a re-excision or even a mastectomy if BCT no longer seems feasible. Conversely, CE MRI was found to target the total tumor extent in as many as 90% of patients. Application of CE MRI in addition to conventional imaging changed the treatment plan from BCT to mastectomy in 19% of the patients who participated in this study. Performing CE MRI is thus expected to improve the efficacy of current clinical guidelines for determining patient eligibility for BCT. It increases the proportion of patients in whom total tumor extent is accurately targeted from 70% to 90%.


The second part of this thesis considers the impact of uncertainty in the radiological assessment of tumor extent on current clinical guidelines for treatment implementation. Current guidelines for the implementation of BCT generally advise surgeons to remove the tumor with a margin of healthy tissue of typically 1 cm around it. This margin accounts for multiple sources of uncertainty during surgery, including uncertainty in extent measurement and uncertainty in extent localization. In chapter 4, we quantified the reproducibility of the assessment of tumor extent in the breast at preoperative imaging, using an analysis of variance (ANOVA) based method validated in chapter 3. We found that the uncertainty in the preoperative measurements of tumor extent due to random variations takes up a large portion (40% – 100%) of the 1 cm margin in BCT. For relatively small tumors (< 17 mm), the uncertainty was estimated at 4 – 7 mm for both conventional imaging and CE MRI. Conversely, for relatively large tumors (≥ 17 mm), the uncertainty was estimated at 7 – 10 mm for conventional imaging, whereas the uncertainty for CE MRI was still estimated at 4 – 7 mm. Application of CE MRI thus reduces the portion of the currently widely adopted 1 cm margin in BCT that is taken up by uncertainty in the assessment of tumor extent (to 40% – 70%). In chapter 5, we investigated whether preoperative CE MRI of the patient in a supine position can guide the surgeon in localizing both the position and the extent of the tumor immediately before the surgical procedure. A prerequisite for such an approach is that the geometry of the internal breast structure relative to the skin is accurately reproduced between preoperative CE MRI and localization. Repeat MRI scans were obtained of both breasts of healthy volunteers. The reproducibility of the mammary gland structure was estimated by quantifying the shifts of fiducial points in the MR images between repeat setups. We found that the geometrical structure of the mammary gland is accurately reproduced if the setup of the patient is accurately reproduced. Preoperative CE MRI in a supine position thus provides a highly reproducible representation of the geometry of the mammary gland structure prior to surgery. Localization based on preoperative CE MRI in a supine position may therefore be of complementary value to localization methods based on conventional breast imaging.
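The reproducibility analysis summarized above relies on an ANOVA-based method validated in chapter 3, whose details are not repeated here. The sketch below is a generic, simplified version of the underlying idea. It estimates the within-patient standard deviation from repeated measurements as the square root of the within-group mean square of a one-way ANOVA. The data are hypothetical. Expressing uncertainty as 1.96 × SD is an assumption, used only to show how such a value can be related to the 1 cm margin.

```python
import math
from typing import Sequence


def within_subject_sd(measurements: Sequence[Sequence[float]]) -> float:
    """Within-patient SD: square root of the one-way ANOVA within-group mean square.

    measurements[i] holds the repeated extent measurements (mm) of patient i;
    each patient needs at least two repeats.
    """
    ss_within = 0.0
    df_within = 0
    for reps in measurements:
        mean = sum(reps) / len(reps)
        ss_within += sum((x - mean) ** 2 for x in reps)
        df_within += len(reps) - 1
    return math.sqrt(ss_within / df_within)


# Hypothetical paired measurements (mm) of three tumors, e.g. by two readers
data = [[15.0, 18.0], [22.0, 19.0], [30.0, 34.0]]
sd = within_subject_sd(data)

# Assumption for illustration only: uncertainty = 1.96 x within-patient SD
uncertainty_mm = 1.96 * sd
print(f"within-patient SD = {sd:.2f} mm; uncertainty {uncertainty_mm:.1f} mm "
      f"= {100 * uncertainty_mm / 10:.0f}% of a 10 mm margin")
# prints: within-patient SD = 2.38 mm; uncertainty 4.7 mm = 47% of a 10 mm margin
```

Because the mean square is pooled over all patients, a few repeats per patient suffice when many patients are available. This is why repeat-measurement designs are typical for reproducibility studies.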

The third part of this thesis addresses the impact of uncertainty in the radiological assessment of tumor extent on current clinical guidelines for treatment evaluation. Generally, the response of breast tumors to therapy is monitored using the criteria of the current RECIST guidelines, based on repeat imaging before and after treatment. Chapter 6 describes a quantitative study of the false-categorization rates that may occur when evaluating the response of breast tumors, due to random variations in the measurements of tumor extent. For this purpose, a newly developed ANOVA-based method was used, whose performance was validated in chapter 3. RECIST requires considerable differences in tumor extent before assuming regression or progression. We found, however, that random variations in the measurements of tumor extent may still cause considerable false-categorization rates (13% – 29%). For relatively small tumors, no significant (p ≥ 0.05) differences were found between the estimated false-categorization rates at conventional imaging and those at CE MRI. Conversely, for relatively large tumors, the false-categorization rates were estimated to be significantly (p < 0.05) lower if CE MRI is used rather than mammography (for tumors ≥ 18 mm) or ultrasonography (for tumors ≥ 20 mm). Averaged over all tumor sizes, the false-categorization rates are expected to fall to 6% – 19% if CE MRI is used rather than conventional imaging. This would improve the efficacy of the current RECIST guidelines for monitoring the response of breast tumors to therapy.

This thesis has shown that the uncertainty in the radiological assessment of tumor extent, due to measurement inaccuracies and random measurement variations, has a profound impact on the efficacy of the current clinical guidelines for the treatment of breast cancer. Application of CE MRI improves the efficacy of these guidelines because of its superior accuracy and reproducibility in the assessment of tumor extent compared with conventional breast imaging techniques. This thesis provides detailed quantitative information on this uncertainty and on its impact on the efficacy of current clinical guidelines. That information allows treatment results to be interpreted on a more quantitative basis. Furthermore, this knowledge may be used to refine the existing decision-making criteria of the current clinical guidelines. Moreover, the quantitative information provides guidance on the complementary value of CE MRI with respect to conventional imaging. Finally, the knowledge obtained allows a quantitative assessment of the value of new therapies and breast imaging techniques.


SAMENVATTING (SUMMARY IN DUTCH)

Breast cancer is a common disease among women in Western countries. In the Netherlands, approximately one in nine women is diagnosed with breast cancer during her lifetime, and nearly one in 22 women will die of the disease. Contemporary breast cancer treatment consists of a combination of locoregional therapy (surgery & radiotherapy) and systemic therapy (chemotherapy & hormonal therapy). Patients are preferably offered breast-conserving therapy (BCT), which aims at complete removal of the cancer while preserving an optimal cosmetic result. Otherwise, the patient usually undergoes a mastectomy: complete removal of the mammary gland. Systemic therapy is used both to reduce the extent of the cancer prior to locoregional treatment (neo-adjuvant) and to eliminate any already existing distant metastases (adjuvant).

Tumor extent is an important parameter in current clinical guidelines for selecting the type of treatment, implementing the treatment, and evaluating treatment results. Imaging techniques are routinely used for the preoperative assessment of tumor extent in the breast. Conventional mammography and ultrasonography are currently the standard imaging techniques for assessing the extent of breast tumors. Contrast-enhanced magnetic resonance imaging (CE MRI) is an emerging technique for assessing tumor extent in the breast, but it is used only to a limited extent because of its higher costs. The inherent limitations of each of these imaging techniques, in terms of accuracy and reproducibility, however, introduce uncertainty in the radiological assessment of tumor extent. This uncertainty thus affects the efficacy of current clinical guidelines. In this thesis, we quantified the influence of uncertainties in the radiological assessment of tumor size on the efficacy of current clinical guidelines for the treatment of breast cancer. We also evaluated whether CE MRI of the breast improves the efficacy of these guidelines.

The first part of this thesis focuses on the influence of uncertainty in the radiological assessment of tumor extent on current clinical guidelines for selecting the type of treatment. In general, current guidelines regard a large tumor in a small breast as a relatively important contraindication for BCT, and BCT is firmly advised against in case of multicentricity. Chapter 2 describes a quantitative comparison between measurements of tumor extent obtained from preoperative imaging and measurements of tumor extent obtained postoperatively by pathological examination. With conventional imaging, the total area occupied by tumor was determined accurately in only 70% of patients. In the large majority of the 30% of patients in whom conventional imaging was inaccurate, the total area occupied by tumor was underestimated. This can result in incomplete removal of the tumor, which then requires either a re-excision or possibly even a mastectomy if BCT is no longer considered feasible. With CE MRI, by contrast, the total area occupied by tumor was determined accurately in as many as 90% of patients. Adding CE MRI to conventional imaging changed the intended treatment plan from BCT to mastectomy in 19% of the patients in this study. CE MRI is therefore expected to improve the effectiveness of current clinical guidelines, because it raises the proportion of patients in whom the total area occupied by tumor is determined accurately from 70% to 90%.

The second part of this thesis addresses the influence of uncertainty in the radiological assessment of tumor extent on current clinical guidelines for implementing treatment. Current guidelines for implementing BCT generally advise surgeons to remove the tumor with a surrounding margin of healthy tissue, typically 1 cm, to account for several sources of uncertainty during surgery, including uncertainty in measuring tumor extent and uncertainty in localizing it. In chapter 4, we quantified the reproducibility of the preoperative imaging assessment of tumor extent in the breast with a method based on analysis of variance (ANOVA) that was validated in chapter 3. We found that the uncertainty in the preoperative measurement of tumor extent takes up a large part (40% – 100%) of the 1 cm margin in BCT. For relatively small tumors (< 17 mm), the uncertainty was estimated at 4 – 7 mm for both conventional imaging and CE MRI. For relatively large tumors (≥ 17 mm), by contrast, the uncertainty was estimated at 7 – 10 mm for conventional imaging, whereas for CE MRI it remained 4 – 7 mm. The use of CE MRI thus reduces the share of the commonly applied 1 cm BCT margin that is taken up by uncertainty in tumor extent (to 40% – 70%).

In chapter 5, we investigated whether preoperative CE MRI acquired with the patient in supine position can help the surgeon localize both the position and the extent of the tumor immediately before the actual surgical procedure. A first requirement for such an approach is that the geometry of the internal breast structure relative to the skin is reproduced accurately between the preoperative CE MRI acquisition and the localization.
Repeated MRI scans of both breasts were acquired in healthy volunteers, and the reproducibility of the breast gland was estimated by quantifying the displacements of fiducial points in the MR images between the repeated setups. We found that the geometric structure of the breast gland is reproduced accurately when the patient setup is reproduced accurately. Preoperative CE MRI in supine position therefore provides a well-reproducible representation of the geometry of the breast gland before the start of surgery. Localization based on preoperative supine CE MRI may consequently add value to existing localization methods based on conventional imaging.
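The reproducibility analysis of chapter 4 can be illustrated with a textbook one-way random-effects ANOVA. This is a minimal sketch, not the validated method of chapter 3, whose details are not given in this summary. It assumes a balanced design in which each of k tumors was measured n times (for example by different readers) and separates the random measurement variance from the true tumor-to-tumor variance. The function names, the example data and the default 10 mm margin are illustrative assumptions.

```python
from math import sqrt
from statistics import mean


def one_way_anova_components(measurements):
    """Variance components of a balanced one-way random-effects ANOVA.

    measurements[i] holds the n repeated extent measurements (mm) of tumor i.
    Returns (sigma2_between, sigma2_within): the true tumor-to-tumor variance
    and the random measurement variance.
    """
    k = len(measurements)
    n = len(measurements[0]) if k else 0
    if k < 2 or n < 2 or any(len(m) != n for m in measurements):
        raise ValueError("need a balanced design with >= 2 tumors and >= 2 repeats each")

    group_means = [mean(m) for m in measurements]
    grand_mean = mean(group_means)  # equals the overall mean in a balanced design

    # Mean squares between tumors and within tumors (i.e. between repeats).
    ms_between = n * sum((g - grand_mean) ** 2 for g in group_means) / (k - 1)
    ms_within = sum(
        (y - g) ** 2 for m, g in zip(measurements, group_means) for y in m
    ) / (k * (n - 1))

    sigma2_within = ms_within
    # Method-of-moments estimate, truncated at zero when MS_between < MS_within.
    sigma2_between = max((ms_between - ms_within) / n, 0.0)
    return sigma2_between, sigma2_within


def fraction_of_margin(uncertainty_mm, margin_mm=10.0):
    """Share of a surgical margin (default 1 cm) taken up by measurement uncertainty."""
    return uncertainty_mm / margin_mm


if __name__ == "__main__":
    # Hypothetical repeated measurements (mm) of three tumors by three readers.
    data = [[12.0, 15.0, 13.0], [25.0, 22.0, 28.0], [18.0, 20.0, 17.0]]
    between, within = one_way_anova_components(data)
    print(f"measurement SD: {sqrt(within):.1f} mm")
    # Uncertainties of 4 and 10 mm correspond to 40% and 100% of a 1 cm margin.
    print(fraction_of_margin(4.0), fraction_of_margin(10.0))
```

The summary does not say whether the reported uncertainty is the measurement SD itself or a multiple of it, such as a 95% limit. That choice is left to the caller here. The 40% – 100% range in the text follows directly from dividing 4 – 10 mm by the 10 mm margin.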


The third part of this thesis focuses on the influence of uncertainty in the radiological assessment of tumor extent on current clinical guidelines for evaluating treatment outcome. In general, the response of breast tumors to therapy is monitored according to the criteria proposed in the RECIST guidelines, using repeated imaging before and after treatment. Chapter 6 describes a quantitative study estimating the misclassification rates that may occur in evaluating the response of breast tumors as a result of random variations in measurements of tumor extent. For this purpose, a newly developed ANOVA method, validated in chapter 3, was used. Although RECIST requires substantial differences in tumor extent before regression or progression may be assumed, random variations in measurements of tumor extent were found to cause considerable misclassification rates (13% – 29%). For relatively small tumors, no significant (p ≥ 0.05) differences were found between the estimated misclassification rates with conventional imaging and those with CE MRI. For larger tumors, however, the estimated misclassification rates were significantly (p < 0.05) lower when CE MRI was used instead of mammography (for tumors ≥ 18 mm) or ultrasonography (for tumors ≥ 20 mm). Averaged over all tumor sizes, the misclassification rates are therefore expected to decrease (to 6% – 19%) when CE MRI is used instead of conventional imaging. This improves the effectiveness of the current RECIST guidelines for monitoring the response of breast tumors to therapy.
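A simplified calculation shows how random measurement variation alone can produce the misclassifications described for chapter 6. This is a sketch under stated assumptions, not the ANOVA method of chapter 6, and it is not expected to reproduce the 13% – 29% figures. It assumes a single target lesion whose true largest diameter (LD) does not change between two examinations, with independent, normally distributed measurement errors of equal SD at both time points. It applies the RECIST 1.0 thresholds to that one lesion: partial response (PR) at a decrease in LD of at least 30% and progressive disease (PD) at an increase of at least 20%. The function names and example numbers are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def recist_misclassification(true_ld_mm, sigma_mm, pr_decrease=0.30, pd_increase=0.20):
    """Probabilities that an unchanged lesion is scored as PR or as PD.

    Baseline (x1) and follow-up (x2) measurements are modelled as independent
    N(true_ld_mm, sigma_mm**2). For a constant c, x2 - c * x1 is normal with
    mean true_ld_mm * (1 - c) and SD sigma_mm * sqrt(1 + c**2). The model is
    only reasonable when true_ld_mm is much larger than sigma_mm.
    """
    if true_ld_mm <= 0 or sigma_mm <= 0:
        raise ValueError("true_ld_mm and sigma_mm must be positive")

    # Partial response: x2 <= (1 - pr_decrease) * x1.
    c_pr = 1.0 - pr_decrease
    p_pr = normal_cdf(-true_ld_mm * (1.0 - c_pr) / (sigma_mm * sqrt(1.0 + c_pr ** 2)))

    # Progressive disease: x2 >= (1 + pd_increase) * x1.
    c_pd = 1.0 + pd_increase
    p_pd = normal_cdf(true_ld_mm * (1.0 - c_pd) / (sigma_mm * sqrt(1.0 + c_pd ** 2)))

    return p_pr, p_pd


if __name__ == "__main__":
    # Hypothetical example: a 20 mm lesion measured with a 3 mm SD.
    p_pr, p_pd = recist_misclassification(20.0, 3.0)
    # prints: false PR: 0.05, false PD: 0.20, total: 0.25
    print(f"false PR: {p_pr:.2f}, false PD: {p_pd:.2f}, total: {p_pr + p_pd:.2f}")
```

In this toy model, a truly stable lesion is misclassified as PD more often than as PR, because the 20% progression threshold is narrower than the 30% response threshold. A smaller measurement SD lowers both rates. This matches the direction of the effect reported for CE MRI in larger tumors.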

This thesis has shown that uncertainty in the radiological assessment of tumor extent, caused by measurement inaccuracies and random measurement variations, has a profound influence on the effectiveness of current clinical guidelines for the treatment of breast cancer. The use of CE MRI improves the effectiveness of these guidelines because its accuracy and reproducibility in determining tumor extent are superior to those of conventional imaging techniques. The detailed quantitative information presented in this thesis makes it possible to interpret treatment outcome more quantitatively. It covers the uncertainty in the radiological assessment of tumor extent and its influence on the effectiveness of current clinical guidelines for the treatment of breast cancer. In addition, this knowledge can be used to refine the existing decision criteria proposed in current clinical guidelines. Moreover, the quantitative information provides guidance on the added value of CE MRI over conventional imaging. Finally, the knowledge gained makes it possible to assess quantitatively the value of new therapies and of new imaging techniques for the breast.


ACKNOWLEDGMENTS

Many people have contributed, directly and indirectly, to this thesis. I would like to thank all of them here, and some of them in particular.

First of all, I thank my promotor Harry Bartelink for giving me the opportunity to carry out the research described in this thesis. Thank you for your trust, patience, inspiration and motivation, and for managing the MARGINS project across departments. I also thank my co-promotor Kenneth Gilhuijs for all those years of daily supervision, with ever new brainstorms, experiments and revisions of articles. You are the one who taught me the craft of being a scientist.

I would also like to thank the other members of the MARGINS working group, Leo Schultze Kool, Hans Peterse, Emiel Rutgers, Marc van de Vijver and Jelle Teertstra, for their essential contributions to carrying out this interdisciplinary project.

Next, my thanks go to the radiologists of the NKI – AVL, Peter Besnard, Elisabeth Joekes, Wim Koops, Robert Kröger, Claudette Loo, Frank Pameijer, Warner Prevoo and Maartje Smid-Geirnaerdt, for practicing their profession true to life.

I also thank Saar Muller, clinical physicist, for your technical and logistical support of the MARGINS project, and Guus Hart, statistician, for your valuable input into each of the studies in this thesis. In retrospect, both of you have had a particularly formative influence on my professional development.

I especially want to thank Eline Deurloo, Marja van Vliet and Angelique Schlief, my most immediate colleagues in the Diagnostic Imaging Lab and the MARGINS team, for all their contributions to the MARGINS project and, of course, for the human contact.

I particularly thank Peter van de Ven, Nico Jessurun and Carolien Peters-Maas for always keeping the computer network running perfectly, for keeping the hardware, the required software and all kinds of handy tools up to date, for the shared lunch every afternoon, and for the conversations about the world around us.

For all the moral support, the linguistic purism, and the convivial drinks, dinners and parties we shared, I thank Anja Betgen, Josien de Bois, Bob Brand, Niels Dekker, Joop Duppen, Michel Frenay, Marcel van Herk, Peter Remeijer, Monique Smitsmans, Jan-Jakob Sonke, Adriaan Touw, Jochem Wolthaus and Lambert Zijp. You provided the much-needed variety and sense of perspective during my PhD research.

In particular, I thank my close friends Rob Budde and Lennert Ploeger for their unconditional friendship, support and encouragement over the past years. All things considered, you have been my paranymphs for much longer than since I asked you to be.

Last but not least, I thank all my friends and family around the country for their sincere interest in the PhD candidate. In particular, I thank my parents and my brother, my unwavering home front, to whom I could always turn for the inspiration and motivation to persevere in completing this thesis.


LIST OF ABBREVIATIONS

1D        one dimensional
2D        two dimensional
3D        three dimensional
ANOVA     analysis of variance
AP        anterior-posterior
BCT       breast-conserving therapy
BI-RADS   breast imaging reporting and data system
CC        craniocaudal
CE MRI    contrast-enhanced magnetic resonance imaging
CI        confidence interval
CL        confidence level
CR        complete response
DCIS      ductal carcinoma in situ
FLASH     fast low-angle shot
LD        largest diameter
LR        likelihood ratio
ML        mediolateral
MR        magnetic resonance
MRI       magnetic resonance imaging
PD        progressive disease
PR        partial response
PWD       pairwise differences
RECIST    Response Evaluation Criteria In Solid Tumors
REML      restricted maximum likelihood estimation
RES       residuals
ROC       receiver operating characteristic
SD        (arithmetical) standard deviation
SD        (classification) stable disease
TE        echo time
TR        repetition time
WHO       World Health Organization