
The Effect of Disc Size and Severity of Disease on the Diagnostic Accuracy of the Heidelberg Retina Tomograph Glaucoma Probability Score

Linda M. Zangwill,1 Sonia Jain,2 Lyne Racette,1 Karin B. Ernstrom,2 Christopher Bowd,1

Felipe A. Medeiros,1 Pamela A. Sample,1 and Robert N. Weinreb1

PURPOSE. To compare the effect of disc size and disease severity on the Heidelberg Retina Tomograph (HRT) Glaucoma Probability Score (GPS) and the Moorfields Regression Analysis (MRA) for discriminating between glaucomatous and healthy eyes.

METHODS. Ninety-nine eyes with repeatable standard automated perimetry results showing glaucomatous damage and 62 normal eyes were included from the longitudinal Diagnostic Innovations in Glaucoma Study (DIGS). The severity of glaucomatous visual field defects ranged from early to severe (average [95% CI] pattern standard deviation [PSD] was 5.7 [5.0–6.5] dB). The GPS (HRTII ver. 3.0; Heidelberg Engineering, Heidelberg, Germany) uses two measures of peripapillary retinal nerve fiber layer shape (horizontal and vertical retinal nerve fiber layer curvature) and three measures of optic nerve head shape (cup depth, rim steepness, and cup size) as input into a relevance vector machine learning classifier that estimates the probability of having glaucoma. The MRA compares measured rim area with predicted rim area adjusted for disc size to categorize eyes as outside normal limits, borderline, or within normal limits. The effect of disc size and severity of disease on the diagnostic accuracy of both GPS and MRA was evaluated using generalized estimating equation marginal logistic regression analysis.

RESULTS. Using the manufacturer's suggested cutoffs for GPS global classification (≥64% as outside normal limits), the sensitivity and specificity (95% CI) were 71.7% (62.2%–79.7%) and 82.3% (71.0%–89.8%), respectively. The sensitivity and specificity (95% CI) of the MRA result were 67.7% (58.0%–76.1%) and 88.7% (78.5%–94.4%), respectively. Likelihood ratios for global and regional GPS and MRA results outside normal limits ranged from 4.0 to 10.0 and from 6.0 to infinity, respectively. Disc size and severity of disease were significantly associated with the sensitivity of both GPS and MRA.

CONCLUSIONS. GPS tended to have higher sensitivities, somewhat lower specificities, and lower likelihood ratios for an outside-normal-limits result than MRA. These results suggest that in this population, GPS and MRA differentiate between glaucomatous and healthy eyes with good sensitivity and specificity. In addition, the likelihood ratios suggest that GPS may be most useful for confirming a normal disc, whereas MRA may be most helpful in confirming a suspicion of glaucoma. Larger disc size and more severe field loss were associated with improved diagnostic accuracy for both GPS and MRA. (Invest Ophthalmol Vis Sci. 2007;48:2653–2660) DOI:10.1167/iovs.06-1314

Glaucoma is a progressive optic neuropathy diagnosed by identifying characteristic optic nerve, retinal nerve fiber layer (RNFL), and visual field damage. For more than 10 years, confocal scanning laser ophthalmoscopy (CSLO), scanning laser polarimetry (SLP), and optical coherence tomography (OCT) have provided objective, reproducible measurements that, when used with other clinical information, can assist the clinician in differentiating between normal and glaucomatous eyes. Imaging instruments have recently incorporated normative databases into their statistical analyses, so that the clinician is given an automatic assessment of whether the eye is outside normal limits (ONL). The diagnostic accuracy of optic disc and RNFL measurements and the automatic classifications of these instruments have been evaluated extensively.1–4 Although the overall diagnostic accuracy of the best parameters of each of these imaging instruments may be similar, the performance of diagnostic tests can vary among subgroups of patients with glaucoma, according to clinical and nonclinical characteristics.2–6

It is well established that the accuracy of tests increases with increasing severity of disease. It is therefore important to provide an estimate of diagnostic accuracy at various stages of disease. In addition, optic disc size has been shown to influence the diagnostic accuracy of imaging instruments, particularly the confocal scanning laser ophthalmoscope.6–12 Recently, Medeiros et al.8 used marginal logistic regression methods13–15 to evaluate simultaneously the effect of severity of disease and disc size on the diagnostic accuracy of imaging instruments.

Each imaging instrument has specific advantages and limitations.1 A limitation of a commercially available CSLO, the Heidelberg Retina Tomograph (HRT; Heidelberg Engineering, Heidelberg, Germany), has been its reliance on an operator to outline the disc margin before topographic optic disc parameters can be calculated. Outlining the disc margin adds processing time, and differences in how it is done can lead to interobserver variability in stereometric variables.16,17 In addition, many of these topographic optic disc parameters are calculated relative to a reference plane.

The recently released HRT software version 3.0 includes the Glaucoma Probability Score (GPS). Because the GPS calculation is based on the overall shape of the optic nerve head and posterior pole and does not rely on outlining the disc margin,18 it may be less influenced by optic disc size than conventional CSLO topographic optic disc parameters and the Moorfields Regression Analysis (MRA). The objective of this study was to compare the GPS with the MRA for discriminating between glaucomatous and healthy eyes, and to evaluate the influence of disease severity and optic disc size on the diagnostic accuracy of these two classification systems.

From the ¹Hamilton Glaucoma Center, Department of Ophthalmology, and the ²Division of Biostatistics and Bioinformatics, Department of Family and Preventive Medicine, University of California, San Diego, La Jolla, California.

Supported in part by National Eye Institute Grants EY11008 (LMZ) and EY08208 (PAS).

Submitted for publication November 1, 2006; revised January 24, 2007; accepted April 4, 2007.

Disclosure: L.M. Zangwill, Carl Zeiss Meditec (F), Heidelberg Engineering (F); S. Jain, None; L. Racette, None; K.B. Ernstrom, None; C. Bowd, None; F.A. Medeiros, Carl Zeiss Meditec (F); P.A. Sample, Carl Zeiss Meditec (F), Welch-Allyn (F), Haag Streit (F); R.N. Weinreb, Carl Zeiss Meditec (F, R), Heidelberg Engineering (F, R)

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Corresponding author: Linda M. Zangwill, Hamilton Glaucoma Center, Department of Ophthalmology 0946, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0946; [email protected].

Investigative Ophthalmology & Visual Science, June 2007, Vol. 48, No. 6
Copyright © Association for Research in Vision and Ophthalmology 2653

METHODS

Subjects

One randomly selected eye from each of 99 patients with glaucoma and 62 normal subjects participating in the longitudinal Diagnostic Innovations in Glaucoma Study (DIGS) was included in the study.

All participants underwent a complete ophthalmic examination including slit lamp biomicroscopy, intraocular pressure measurement, dilated stereoscopic fundus examination, and standard automated perimetry (SAP) using the Swedish Interactive Threshold Algorithm (SITA) and the 24-2 program (Humphrey Field Analyzer; Carl Zeiss Meditec, Inc., Dublin, CA). Visual fields were reliable (fixation losses and false-positive and false-negative responses ≤33%). To be included in DIGS, at study entry, all participants had open angles, a best corrected acuity of 20/40 or better, a spherical refraction within ±5.0 D, and cylinder correction within ±3.0 D. A family history of glaucoma was allowed.

Participants were excluded from DIGS if they had a history of intraocular surgery except for uncomplicated cataract or glaucoma surgery. Participants were also excluded if there was evidence of secondary causes of elevated IOP (e.g., iridocyclitis, trauma), other intraocular eye disease, other diseases affecting the visual field (e.g., pituitary lesions, demyelinating diseases, HIV+ or AIDS, or diabetic retinopathy), or medications known to affect visual field sensitivity.

For purposes of this analysis, patients were classified as having glaucoma if they had at least two consecutive standard automated perimetry examinations with either a pattern standard deviation (PSD) outside the 95% normal limits or a glaucoma hemifield test (GHT) result outside the 99% normal limits. At least one of the abnormal fields was obtained within 6 months of CSLO imaging. The appearance of the optic disc was not used as a criterion for designation as glaucomatous.
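The glaucoma definition above is a rule applied to a series of visual field exams. Below is a minimal Python sketch of that rule. It assumes a hypothetical `FieldExam` record with one flag per criterion, and it treats "within 6 months" as 183 days. Neither the record layout nor the day count comes from the study.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FieldExam:
    """Hypothetical record for one SAP (SITA 24-2) examination."""
    exam_date: date
    psd_outside_95: bool  # PSD outside the 95% normal limits
    ght_outside_99: bool  # GHT result outside the 99% normal limits

    @property
    def abnormal(self) -> bool:
        # Either criterion is enough to call this exam abnormal.
        return self.psd_outside_95 or self.ght_outside_99


def meets_glaucoma_definition(exams, imaging_date, window_days=183):
    """True if at least two consecutive exams are abnormal and at least one
    abnormal exam lies within `window_days` (assumed ~6 months) of imaging."""
    ordered = sorted(exams, key=lambda e: e.exam_date)
    has_consecutive_pair = any(
        first.abnormal and second.abnormal
        for first, second in zip(ordered, ordered[1:])
    )
    abnormal_near_imaging = any(
        e.abnormal and abs((e.exam_date - imaging_date).days) <= window_days
        for e in ordered
    )
    return has_consecutive_pair and abnormal_near_imaging


exams = [
    FieldExam(date(2005, 1, 10), psd_outside_95=True, ght_outside_99=False),
    FieldExam(date(2005, 7, 12), psd_outside_95=False, ght_outside_99=True),
    FieldExam(date(2006, 1, 9), psd_outside_95=False, ght_outside_99=False),
]
print(meets_glaucoma_definition(exams, imaging_date=date(2005, 9, 1)))  # True
```

The text does not say whether the near-imaging exam must belong to the consecutive abnormal pair. The sketch accepts any abnormal exam; requiring it to be one of the pair is an equally plausible reading.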

Normal control eyes had IOP ≤22 mm Hg with no history of elevated IOP and with normal visual field results, defined as a PSD within 95% confidence limits and a GHT result within normal limits (WNL). Normal control eyes also had no evidence of glaucomatous optic disc damage (no diffuse or focal rim thinning, or RNFL defects) as evaluated by clinical examination.

The severity of visual field damage was assessed on a scale of 0 (no field loss) to 20 (end-stage glaucoma), according to the Advanced Glaucoma Intervention Study (AGIS) severity score. The AGIS score is based on the extent of damage measured by the total deviation plot at different visual field locations and has been described in detail elsewhere.19

The research adhered to the tenets of the Declaration of Helsinki. Informed consent was obtained from all participants, and the University of California, San Diego, Human Subjects Committee approved all methodology.

Confocal Scanning Laser Ophthalmoscopy

The HRTII provides topographical measures of the optic disc and peripapillary retina and has been discussed in detail elsewhere.1 Three scans centered on the optic disc were automatically obtained for each test eye, and a mean topography was created. Magnification errors were corrected by using patients' corneal curvature measurements. The optic disc margin was outlined on the mean topography image by trained technicians while they viewed simultaneous stereoscopic photographs of the optic disc. All images included in the analysis were reviewed for adequate centration, focus, and illumination; all mean topography images had a standard deviation of ≤50 µm. The scans were obtained with HRT software version 1.5.9.0 or earlier but were analyzed with the recently released software version 3.0.

HRT software version 3.0 includes improved alignment algorithms, a larger normative database, and the calculation of the GPS.18 Two measures of peripapillary retinal nerve fiber layer shape (horizontal and vertical retinal nerve fiber layer curvature) and three measures of optic nerve head shape (cup depth, rim steepness, and cup size) are used as input into a relevance vector machine learning classifier, which estimates the probability of having glaucoma as a value between 0% and 100%. Two mathematical functions are used to model the topography of the optic nerve head: (1) a Gaussian cumulative distribution function models the optic disc, and (2) a quadratic (parabolic) surface models the peripapillary retina. To parameterize the cup, a parabolic surface is fitted to the peripapillary region of each topograph. As outlined in Swindale et al.,18 the parabolic surface, which serves as a reference plane for estimating the cup parameters, is then subtracted from the topography. The average location of the deepest points in the difference topograph is used to identify the cup center. In contrast to Swindale et al.,18 the GPS constructs a cumulative Gaussian distribution of topograph heights to estimate the cup radius $r$ such that $P(\text{radius} \le r) = 0.5$. The cup radius $r$ serves as the cup margin. Thus, the cup area is computed as the area of a circle of radius $r$, and the mean cup depth is computed as the average height of the measurements inside the cup in the difference topograph. The rim steepness estimates are derived from the radial topograph-height gradients. No contour line or standard reference plane is used in the GPS calculation. GPS output is then automatically classified into three categories: outside normal limits (ONL; GPS ≥ 64%), borderline (BL; GPS between 24% and 64%), and within normal limits (WNL; GPS < 24%).
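The last step above maps the classifier's probability onto three categories. A small sketch of that mapping follows. It uses the 24% and 64% cutoffs quoted in the text. How a value of exactly 24% or 64% is assigned is an assumption, since the extracted text does not settle it. The function name is hypothetical and is not part of the HRT software.

```python
def gps_category(gps_percent: float) -> str:
    """Map a GPS value (0-100%) to the three HRT 3.0 categories.

    Cutoffs are those quoted in the text; assigning exactly 24% or 64%
    to the higher category is an assumption of this sketch.
    """
    if not 0.0 <= gps_percent <= 100.0:
        raise ValueError("GPS must lie between 0 and 100")
    if gps_percent >= 64.0:
        return "ONL"  # outside normal limits
    if gps_percent >= 24.0:
        return "BL"   # borderline
    return "WNL"      # within normal limits


for score in (12.0, 40.0, 87.5):
    print(score, gps_category(score))
```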

The MRA compares measured rim area to predicted rim area adjusted for disc size to categorize eyes as ONL, BL, or WNL.9 It relies on a contour line and the standard reference plane (50 µm below the mean height of the contour in the temporal sector between 350° and 356°) for its measurements. With the HRT 3.0 software, both the GPS and MRA classify eyes as within normal limits (WNL), borderline (BL), or outside normal limits (ONL) according to the same normative database of 700 eyes of whites and 200 eyes of African Americans. For this analysis, the white normative database was used because most of the DIGS participants are of European descent. The comparison to the normative database is provided in six regions (superior temporal, inferior temporal, temporal, superior nasal, inferior nasal, and nasal) and as an overall global classification (if any of the six regions is ONL, the eye is classified as ONL). For analyses using the MRA and GPS as categorical variables (ONL versus WNL), BL values were considered WNL for estimates of sensitivity, specificity, and likelihood ratio.
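Two rules in this paragraph are mechanical. First, an eye is globally ONL if any region is ONL. Second, BL is pooled with WNL before sensitivity and specificity are computed. A sketch of both rules follows. The dictionary layout and the function names are illustrative assumptions, not HRT output formats.

```python
REGIONS = (
    "temporal superior", "temporal inferior", "temporal",
    "nasal superior", "nasal inferior", "nasal",
)


def global_is_onl(regional: dict) -> bool:
    """Global ONL if any of the six regional classifications is ONL."""
    missing = [r for r in REGIONS if r not in regional]
    if missing:
        raise KeyError(f"missing regions: {missing}")
    return any(regional[r] == "ONL" for r in REGIONS)


def is_test_positive(category: str) -> bool:
    """Dichotomize for sensitivity/specificity: ONL is positive, and BL is
    pooled with WNL as negative, as described in the text."""
    if category not in ("WNL", "BL", "ONL"):
        raise ValueError(f"unknown category: {category}")
    return category == "ONL"


eye = dict.fromkeys(REGIONS, "WNL")
eye["nasal"] = "BL"
print(global_is_onl(eye), is_test_positive(eye["nasal"]))
```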

In addition to estimating diagnostic accuracy with the MRA and GPS as categorical variables, we evaluated the sensitivity of each at fixed specificities of 80% and 90%. We converted the MRA into a continuous variable by subtracting the predicted rim area from the actual rim area. This difference is the quantity used to determine whether the MRA is ONL. The difference between the predicted and actual rim area for each region was used to estimate sensitivity. For GPS, the continuous variable used was the relevance vector machine (RVM) output between 0 and 100.

The area under the receiver operating characteristic curve (AUROC) was calculated both for the three-level categorical variables (WNL, BL, and ONL) and for the continuous variables: MRA (predicted minus actual rim area) and GPS (relevance vector machine output between 0% and 100%).
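The same nonparametric AUROC can be computed from either kind of output. For the categorical output, WNL, BL, and ONL are coded as 0, 1, and 2. The sketch below uses the Mann-Whitney form of the AUROC and a DeLong-style standard error, the CI method the statistical analysis cites. It is an illustration on toy data, not the authors' code, and the ordinal coding is an assumption.

```python
import math
from statistics import mean, variance


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the glaucomatous eye scores higher, 0.5 on a tie."""
    if x > y:
        return 1.0
    return 0.5 if x == y else 0.0


def auroc_with_delong_ci(diseased, healthy, z=1.959964):
    """Nonparametric AUROC with a DeLong-style 95% CI for a single test.

    Higher scores are taken to indicate glaucoma; works for continuous outputs
    or for ordinal codes such as WNL=0, BL=1, ONL=2.
    """
    m, n = len(diseased), len(healthy)
    if m < 2 or n < 2:
        raise ValueError("need at least two eyes in each group")
    # Structural components: each eye's average kernel value against the other group.
    v10 = [mean(_psi(x, y) for y in healthy) for x in diseased]
    v01 = [mean(_psi(x, y) for x in diseased) for y in healthy]
    auc = mean(v10)
    se = math.sqrt(variance(v10) / m + variance(v01) / n)
    return auc, (max(0.0, auc - z * se), min(1.0, auc + z * se))


# Toy three-level data: WNL=0, BL=1, ONL=2.
glaucoma = [2, 1, 2, 2, 0]
normal = [0, 1, 0, 0, 0]
auc, ci = auroc_with_delong_ci(glaucoma, normal)
print(round(auc, 3), ci)
```

With only three levels, many pairs are tied. The 0.5 credit for a tie is what makes the categorical AUROC comparable to the continuous one.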

Statistical Analysis

The sensitivity, specificity, and likelihood ratios of the MRA and GPS were compared for both global and regional results. The 95% CIs for sensitivity and specificity were calculated by the Wilson score method without continuity correction for proportions.20,21 Likelihood ratio CIs were computed with the method of Simel et al.,22 and AUROC CIs were calculated with the method of DeLong et al.23
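The two interval formulas named above are short enough to write out. The sketch below implements the Wilson score interval without continuity correction and a log-scale interval for the positive likelihood ratio; the log-scale interval is the approach commonly attributed to Simel et al. The counts in the usage lines are back-calculated from the global GPS percentages in Table 2 (71.7% of 99 ≈ 71 eyes; 82.3% of 62 ≈ 51 eyes). With those counts, the sketch reproduces the published intervals to rounding.

```python
import math

Z95 = 1.959964  # two-sided 95% standard-normal quantile


def wilson_ci(successes: int, n: int, z: float = Z95):
    """Wilson score interval for a proportion, without continuity correction."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = p + z * z / (2 * n)
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - half_width) / denom, (centre + half_width) / denom


def positive_lr_ci(tp: int, n_diseased: int, fp: int, n_healthy: int, z: float = Z95):
    """LR+ = sensitivity / (1 - specificity), with a CI built on the log scale."""
    if tp == 0 or fp == 0:
        # e.g. 100% specificity: LR+ is infinite and the log CI is undefined ("Inf NA")
        raise ValueError("log-scale CI is undefined when tp or fp is zero")
    p1 = tp / n_diseased  # sensitivity
    p2 = fp / n_healthy   # 1 - specificity
    lr = p1 / p2
    se_log = math.sqrt((1 - p1) / tp + (1 - p2) / fp)
    return lr, (lr * math.exp(-z * se_log), lr * math.exp(z * se_log))


# Global GPS, counts back-calculated from Table 2: 71/99 ONL, 51/62 not ONL.
print(wilson_ci(71, 99))            # sensitivity CI
print(wilson_ci(51, 62))            # specificity CI
print(positive_lr_ci(71, 99, 11, 62))
```

The zero-count guard matches the "Inf (NA)" entry for the temporal superior MRA, where no healthy eye was ONL.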

We compared the influence of disc size and severity of glaucoma on the diagnostic accuracy of the MRA and GPS as categorical variables (ONL versus not ONL) by using a generalized estimating equation (GEE) marginal logistic regression modeling approach. This method directly compares the effect of covariates on several tests performed on the same group of subjects and adjusts for subject-specific correlation.13–15 Sensitivity is the dependent variable in the GEE logistic regression model. The covariates disc area, AGIS score, and test type (GPS or MRA) were entered into the model as independent variables, and first-order interaction terms were included. An exchangeable correlation structure among observations was assumed. This method has recently been applied to compare the influence of disc size and severity of glaucoma on the sensitivity of the GDxVCC (Carl Zeiss Meditec, Inc.), the HRT (Heidelberg Engineering), and the Stratus OCT (Carl Zeiss Meditec, Inc.) for glaucoma detection.8

In addition, as a confirmatory analysis of the categorical diagnostic findings, we evaluated the sensitivity for detection of glaucoma at two levels of specificity: 80% and 90%. For this analysis, the cutoffs yielding 80% and 90% specificity were determined from the healthy control eyes and were then applied to the glaucomatous eyes to determine the corresponding sensitivities. This allowed sensitivity to be compared at the same level of specificity. GEE marginal logistic regression was then used to evaluate the influence of disc area and severity of glaucoma on diagnostic performance. The dependent variable, sensitivity, was dichotomized to 1 or 0 according to whether the result of the diagnostic test was above or below the cutoff.
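The fixed-specificity procedure has two steps. The cutoff is chosen on the healthy eyes, and it is then applied to the glaucomatous eyes. The 1/0 outcomes produced by the second step are the dependent variable for the GEE models. A minimal sketch follows. It assumes that higher scores are more glaucoma-like and that a score strictly above the cutoff counts as test-positive; the text does not state either tie-handling rule.

```python
import math


def cutoff_at_specificity(healthy_scores, specificity):
    """Cutoff c from the healthy eyes such that at least `specificity` of them
    score <= c (test-negative). Ties may push achieved specificity higher."""
    if not 0.0 < specificity <= 1.0:
        raise ValueError("specificity must be in (0, 1]")
    ordered = sorted(healthy_scores)
    # Number of healthy eyes that must stay test-negative; round() guards
    # against floating-point noise in specificity * n.
    k = math.ceil(round(specificity * len(ordered), 9))
    return ordered[k - 1]


def sensitivity_at_specificity(diseased_scores, healthy_scores, specificity):
    """Apply the healthy-eye cutoff to the glaucomatous eyes.

    Returns the sensitivity and the per-eye 1/0 outcomes, i.e. the
    dichotomized dependent variable for a GEE model."""
    cutoff = cutoff_at_specificity(healthy_scores, specificity)
    detected = [1 if s > cutoff else 0 for s in diseased_scores]
    return sum(detected) / len(detected), detected


healthy = list(range(10))  # toy scores for 10 healthy eyes
print(sensitivity_at_specificity([5, 8.5, 9, 12], healthy, 0.90))
```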

Statistical analyses were performed with commercial software (JMP ver. 6.0; SAS Institute, Cary, NC) and R, version 2.1.1 (http://www.r-project.org/).

RESULTS

The demographic and ocular characteristics of the 99 subjects with glaucoma and the 62 healthy control subjects are presented in Table 1. The average age (95% CI) of patients with glaucoma was significantly higher than that of the healthy control subjects: 67.9 (65.8–70.0) and 57.7 (55.2–60.3) years, respectively. Mean deviation (95% CI) of the standard automated perimetry (SAP) test closest to the CSLO imaging date was −5.2 (−6.2 to −4.2) dB in glaucomatous eyes and −0.5 (−0.7 to −0.2) dB in control eyes. The severity of glaucomatous visual field defects ranged from early to severe (average [95% CI] PSD was 5.7 [5.0–6.5] dB, and AGIS score was 3.9 [3.1–4.7]).

TABLE 1. Demographic and Ocular Characteristics of the Glaucoma and Healthy Control Study Groups

| Characteristic | Glaucoma Eyes (n = 99) | Normal Eyes (n = 62) | P |
|---|---|---|---|
| Age (y)* | 67.9 (65.8, 70.0) | 57.7 (55.2, 60.3) | <0.0001 |
| Gender (% male) | 52% | 26% | 0.001 |
| Visual field pattern standard deviation (dB)* | 5.7 (5.0, 6.5) | 1.5 (1.4, 1.6) | <0.0001 |
| Visual field mean deviation (dB)* | −5.2 (−6.2, −4.2) | −0.5 (−0.7, −0.2) | <0.0001 |
| AGIS score* | 3.9 (3.1, 4.7) | 0.02 (−0.02, 0.04) | <0.0001 |
| Optic disc size (mm²)* | 1.97 (1.86, 2.07) | 1.76 (1.67, 1.85) | 0.006 |

* Mean (95% CI); t-test used to calculate significance.

TABLE 2. AUROC, Sensitivity, Specificity, and Likelihood Ratios of the MRA and GPS as Categorical Variables

| Region | Test | AUROC (95% CI) | Sensitivity, % (95% CI) | Specificity, % (95% CI) | LR, Outside Normal Limits (95% CI) | LR, Borderline (95% CI) | LR, Within Normal Limits (95% CI) |
|---|---|---|---|---|---|---|---|
| Global | MRA | 0.74 (0.66–0.83) | 67.7 (58.0–76.1) | 88.7 (78.5–94.4) | 5.99 (2.94–12.20) | 1.06 (0.52–2.17) | 0.20 (0.12–0.34) |
| Global | GPS | 0.70 (0.61–0.80) | 71.7 (62.2–79.7) | 82.3 (71.0–89.8) | 4.04 (2.33–7.00) | 0.91 (0.48–1.71) | 0.14 (0.07–0.28) |
| Temporal superior | MRA | 0.67 (0.60–0.73) | 25.3 (17.7–34.6) | 100 (94.2–100.0) | ∞ (NA) | 1.96 (0.89–4.33) | 0.59 (0.48–0.72) |
| Temporal superior | GPS | 0.68 (0.58–0.77) | 64.7 (54.8–73.4) | 93.6 (84.6–97.5) | 10.02 (3.84–26.14) | 0.80 (0.47–1.35) | 0.18 (0.1–0.32) |
| Temporal inferior | MRA | 0.78 (0.71–0.85) | 48.5 (38.9–58.2) | 98.4 (91.4–99.7) | 30.11 (4.19–218.80) | 2.63 (1.04–1.61) | 0.33 (0.24–0.45) |
| Temporal inferior | GPS | 0.69 (0.59–0.78) | 72.7 (63.2–80.5) | 88.7 (78.5–94.4) | 6.44 (3.17–13.08) | 0.66 (0.35–1.25) | 0.18 (0.09–0.34) |
| Nasal superior | MRA | 0.72 (0.66–0.78) | 35.4 (26.6–45.2) | 98.4 (91.4–99.7) | 21.96 (3.08–155.96) | 3.54 (1.08–11.61) | 0.50 (0.4–0.63) |
| Nasal superior | GPS | 0.74 (0.65–0.83) | 67.7 (58.0–76.1) | 85.5 (74.7–92.2) | 4.66 (2.51–8.66) | 1.19 (0.61–2.3) | 0.16 (0.09–0.29) |
| Nasal inferior | MRA | 0.74 (0.68–0.81) | 43.4 (34.1–53.3) | 96.8 (89.0–99.1) | 13.45 (3.38–53.62) | 2.66 (0.93–7.54) | 0.43 (0.33–0.56) |
| Nasal inferior | GPS | 0.70 (0.60–0.79) | 68.7 (59.0–77.0) | 88.7 (78.5–94.4) | 6.08 (2.99–12.38) | 0.83 (0.45–1.5) | 0.17 (0.09–0.3) |

AUROC is calculated from the three-level categorical output (outside normal limits, borderline, and within normal limits). Sensitivity and specificity are for outside normal limits versus not outside normal limits. LR, likelihood ratio.

The AUROCs for the GPS and MRA are presented for the three-level categorical variables in Table 2 and for the continuous variables in Table 3. In general, the AUROCs estimated from the continuous variables tended to be higher for GPS than for MRA. However, these differences reached statistical significance only in the temporal superior region. The AUROC differences between the GPS and MRA were less consistent for estimates based on the categorical variables.

The analysis of the GPS and MRA as categorical variables (outside normal limits versus not outside normal limits) showed that both GPS and MRA have relatively high specificity, at least 82% for all regions. In general, the MRA had higher specificity (range, 88.7% globally to 100% in the temporal superior region) than the GPS (range, 82.3% globally to 93.6% in the temporal superior region). However, the GPS tended to have higher sensitivity (range, 64.7% in the temporal superior region to 72.7% in the temporal inferior region) than the MRA (range, 25.3% in the temporal superior region to 67.7% globally). These differences in sensitivity between MRA and GPS were statistically significant in the marginal logistic regression models for each of the four regional analyses (variable test type, P < 0.001; Table 4, Fig. 1).

The MRA tended to have larger likelihood ratios for an ONL result (range, 5.99 to infinity) than the GPS (range, 4.04–10.02). Likelihood ratios can be interpreted by their effect on the posttest probability of disease.24 Ratios between 0.5 and 2 are considered to have an insignificant effect; ratios between 0.2 and 0.5 or between 2 and 5, a small effect; ratios between 0.1 and 0.2 or between 5 and 10, a moderate effect; and ratios <0.1 or >10, a large effect. The MRA likelihood ratios for an ONL result were all approximately 6 or greater, indicating a moderate to large effect on the posttest probability of glaucoma. The GPS likelihood ratios for an ONL result ranged from 4.04 to 10.02, a small to moderate effect on the posttest probability of glaucoma. The likelihood ratios for a WNL result were smaller for GPS than for MRA. All GPS likelihood ratios had a moderate effect on posttest probabilities (range, 0.14–0.18), whereas MRA likelihood ratios had an insignificant to small effect (range, 0.20–0.59). BL values had an insignificant or small effect on posttest probabilities, with likelihood ratios ranging from 1.06 to 3.54 for MRA and from 0.66 to 1.19 for GPS.
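The interpretation above rests on the odds form of Bayes' theorem: posttest odds = pretest odds × LR. The sketch below applies that formula and the qualitative bands quoted in the text. The scale is symmetric on the log scale (0.1 ↔ 10, 0.2 ↔ 5, 0.5 ↔ 2), and the sketch assigns a value exactly on a band boundary to the stronger band; that assignment is an assumption. The 20% pretest probability in the usage lines is purely hypothetical.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: posttest odds = pretest odds x LR."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


def lr_effect(likelihood_ratio: float) -> str:
    """Qualitative effect on posttest probability, using the bands in the text;
    values exactly on a band boundary go to the stronger band (assumption)."""
    if likelihood_ratio <= 0:
        raise ValueError("likelihood ratio must be positive")
    strength = max(likelihood_ratio, 1.0 / likelihood_ratio)
    if strength < 2:
        return "insignificant"
    if strength < 5:
        return "small"
    if strength <= 10:
        return "moderate"
    return "large"


# Hypothetical 20% pretest probability combined with LRs from Table 2.
for label, lr in [("MRA ONL, global", 5.99), ("GPS WNL, global", 0.14),
                  ("MRA WNL, temporal superior", 0.59)]:
    print(label, round(posttest_probability(0.20, lr), 3), lr_effect(lr))
```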

Moreover, as shown in Table 4 and Figure 1, the GEE marginal logistic regression models indicate that for each region, the independent variables AGIS score and disc size influenced the diagnostic accuracy (ONL versus not ONL) of both GPS and MRA (P < 0.05; for disc area in the temporal inferior region, P = 0.051). For the global parameter, however, only severity of glaucoma (AGIS) was positively associated with increased sensitivity (P = 0.007); disc size did not reach statistical significance (P = 0.081).

TABLE 3. Sensitivity at 80% and 90% Specificity for the MRA and GPS

| Region | Test | AUROC (95% CI) | Sensitivity at 80% Specificity (95% CI) | Sensitivity at 90% Specificity (95% CI) |
|---|---|---|---|---|
| Global | MRA | 0.82 (0.75–0.88) | 67% (59%–74%) | 60% (52%–68%) |
| Global | GPS | 0.86 (0.80–0.92) | 70% (62%–77%) | 67% (59%–74%) |
| Temporal superior | MRA | 0.79 (0.73–0.86) | 60% (52%–68%) | 45% (37%–53%) |
| Temporal superior | GPS | 0.87 (0.81–0.93) | 77% (70%–83%) | 70% (62%–77%) |
| Temporal inferior | MRA | 0.86 (0.81–0.92) | 76% (68%–82%) | 68% (60%–75%) |
| Temporal inferior | GPS | 0.87 (0.81–0.93) | 76% (68%–82%) | 71% (63%–78%) |
| Nasal superior | MRA | 0.80 (0.73–0.87) | 72% (64%–79%) | 58% (50%–65%) |
| Nasal superior | GPS | 0.86 (0.80–0.92) | 75% (67%–81%) | 64% (56%–71%) |
| Nasal inferior | MRA | 0.81 (0.74–0.87) | 74% (66%–81%) | 68% (60%–75%) |
| Nasal inferior | GPS | 0.86 (0.80–0.92) | 76% (68%–82%) | 67% (59%–74%) |

MRA analyzed as a continuous variable: predicted minus actual rim area.

TABLE 4. GEE Logistic Regression Modeling Sensitivity of MRA and GPS

| Variable | Coef. | Global | Temporal Superior | Temporal Inferior | Nasal Superior | Nasal Inferior |
|---|---|---|---|---|---|---|
| Constant | β0 | 1.11 (0.27); <0.001 | 0.87 (0.29); 0.003 | 1.24 (0.30); <0.001 | 0.91 (0.26); <0.001 | 1.02 (0.29); <0.001 |
| AGIS | β1 | 0.17 (0.06); 0.007 | 0.17 (0.07); 0.015 | 0.21 (0.08); 0.008 | 0.16 (0.06); 0.007 | 0.18 (0.07); 0.007 |
| Disc area | β2 | 1.34 (0.77); 0.081 | 2.46 (0.86); 0.004 | 1.49 (0.76); 0.051 | 1.65 (0.78); 0.035 | 1.85 (0.84); 0.027 |
| Test type | β3 | −0.21 (0.31); 0.494 | −2.11 (0.34); <0.001 | −1.27 (0.30); <0.001 | −1.60 (0.32); <0.001 | −1.29 (0.29); <0.001 |
| AGIS × disc area | β4 | −0.07 (0.11); 0.536 | −0.03 (0.09); 0.738 | 0.05 (0.10); 0.572 | −0.08 (0.10); 0.418 | −0.02 (0.11); 0.835 |
| AGIS × test type | β5 | −0.04 (0.08); 0.577 | 0.00 (0.08); 0.978 | −0.09 (0.08); 0.264 | −0.04 (0.07); 0.601 | −0.05 (0.08); 0.558 |
| Disc area × test type | β6 | 0.35 (0.79); 0.655 | −0.97 (1.03); 0.349 | −0.53 (0.65); 0.409 | −0.47 (0.82); 0.566 | −0.49 (0.74); 0.509 |

Each cell gives estimate (SE); P. Model: $\operatorname{logit}(\text{sensitivity}) = \beta_0 + \beta_1(\text{AGIS}) + \beta_2(\text{disc area}) + \beta_3(\text{test type}) + \beta_4(\text{AGIS} \times \text{disc area}) + \beta_5(\text{AGIS} \times \text{test type}) + \beta_6(\text{disc area} \times \text{test type})$. Coef., coefficient; SE, standard error. Boldface P-values are statistically significant.


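The GEE marginal logistic model used was

$$\operatorname{logit}(\text{sensitivity}) = \beta_0 + \beta_1(\text{AGIS}) + \beta_2(\text{disc area}) + \beta_3(\text{test type}) + \beta_4(\text{AGIS} \times \text{disc area}) + \beta_5(\text{AGIS} \times \text{test type}) + \beta_6(\text{disc area} \times \text{test type}),$$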

where AGIS represents the AGIS score for severity of visual field defect, disc area represents the optic disc area, test type is an indicator variable coding for type of test, MRA versus GPS (GPS is the reference test), and sensitivity is a dichotomous dependent variable. For the continuous case, sensitivity was dichotomized based on cutoffs determined at fixed specificities. Three first-order (pairwise) interaction terms were also included.
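In this model, each glaucomatous eye contributes two correlated observations, one for each test. That correlation is why a GEE fit with an exchangeable working correlation is used. The sketch below lays out data in that long format and builds the seven-term covariate vector of the model. The input tuple layout is hypothetical. The text does not say whether AGIS and disc area were centered, so plugging the Table 4 coefficients into `fitted_sensitivity` should not be expected to reproduce Figure 1. The fit itself would be done with a GEE routine that supports an exchangeable structure; examples are `geeglm` in R's geepack package or the `GEE` class in Python's statsmodels. These are named only as examples, not as what the authors ran.

```python
import math


def design_row(agis: float, disc_area: float, is_mra: bool) -> list:
    """Covariates for one (eye, test) observation, in the order beta0..beta6;
    test type is 1 for MRA and 0 for GPS, the reference test."""
    t = 1.0 if is_mra else 0.0
    return [1.0, agis, disc_area, t, agis * disc_area, agis * t, disc_area * t]


def long_format(eyes):
    """Expand each glaucomatous eye into one row per test.

    `eyes` holds hypothetical tuples (subject_id, agis, disc_area,
    gps_detected, mra_detected). Both rows of an eye share subject_id,
    which is the cluster a GEE fit treats as correlated."""
    rows = []
    for subject_id, agis, disc_area, gps_detected, mra_detected in eyes:
        for is_mra, detected in ((False, gps_detected), (True, mra_detected)):
            rows.append((subject_id, int(detected), design_row(agis, disc_area, is_mra)))
    return rows


def fitted_sensitivity(beta, x) -> float:
    """Inverse logit of the linear predictor beta . x."""
    eta = sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-eta))


rows = long_format([("s01", 3.0, 1.8, True, False), ("s02", 6.5, 2.2, True, True)])
for subject_id, y, x in rows:
    print(subject_id, y, x)
```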

The positive parameter estimates for disc area ($\beta_2$) and AGIS score ($\beta_1$) indicate that sensitivity increases as disc size increases and as the severity of visual field damage increases. The interaction terms (disc size × test type and AGIS score × test type) were not statistically significant, suggesting that disc size and severity of glaucomatous visual field damage do not affect the MRA differently than they affect the GPS.

FIGURE 1. After adjustment for disc size and AGIS score, the sensitivity of the GPS was significantly higher than that of the MRA when the internal normative database was used to classify eyes as ONL. With the temporal inferior results as an example, the illustration shows that MRA (A) and GPS (B) sensitivity both increased with increasing disc area and more severe visual field damage (higher AGIS scores) and that GPS sensitivity tended to be higher.

In addition to analyzing the GPS and MRA as categorical variables, we modeled the sensitivity of the GPS and MRA for detecting glaucoma at two fixed specificities, 80% and 90%. This ensured that the categorical analysis of the GPS and MRA was not driven by the inherent tradeoff between sensitivity and specificity. At a fixed specificity of 90%, both severity of visual field damage and disc size were significantly associated with GPS and MRA sensitivity for detecting glaucoma (Table 5, Fig. 2). At both levels of fixed specificity, no significant differences were found between the sensitivity of the GPS and MRA, except in the temporal superior region at 90% specificity (test type, P < 0.001; Table 5). In contrast, at a specificity of 80%, the influence of disc size on sensitivity reached statistical significance in the temporal superior region (P = 0.029) but not in the global (P = 0.093), temporal inferior (P = 0.098), nasal superior (P = 0.141), or nasal inferior (P = 0.158) regions.

DISCUSSION

We found that, when eyes were compared with the internal normative database, GPS classifications of ONL tended to have higher sensitivity but lower specificity than MRA classifications of ONL. The differences in sensitivity and specificity were larger for the regional output than for the global MRA and GPS output. In addition, this study demonstrates that disc size and severity of visual field damage influence the diagnostic accuracy of both the GPS and MRA in a similar manner. Sensitivity improves with increasing disc size and severity of visual field damage. This finding suggests that even though outlining of the disc margin is not necessary for the GPS, the size of the disc still influences the diagnostic accuracy of the measurement.

The finding that GPS sensitivity tends to be higher than that of the MRA is consistent with the sensitivities for detection of early glaucoma reported by Harizman et al.25 (72.3% and 59.6%, respectively). This study also confirms previous reports that the diagnostic accuracy of HRT parameters, particularly linear discriminant functions and the MRA, improves with increasing disc size.6–12,26 This is probably because neuroretinal rim loss is more difficult to detect in a small disc than in a large disc.8 Most of these studies evaluated the association of diagnostic accuracy with disc size by using univariate stratified analysis or regression techniques that do not take the severity of disease into consideration. It is likely that disc size and other covariates have more of an effect on diagnostic accuracy in eyes with early than in eyes with severe glaucomatous visual field damage. The advantage of the multivariate technique used in the present study is that it can evaluate and control for the severity of visual field damage and disc size in the same analysis.

To compare the two classification parameters more directly, we also examined the sensitivity at two levels of specificity. At fixed specificities, the sensitivity of GPS and MRA was not significantly different, except in the temporal superior region at 90% specificity. At 90% specificity, both disc size and severity of visual field damage were associated with the sensitivity of the measurements. However, at 80% specificity, the sensitivity was significantly associated with severity of visual field damage but was not consistently associated with disc size (P ranged from 0.029 for the temporal superior region to 0.158 for the nasal inferior region). It is unclear why disc size was not consistently associated with the sensitivity at 80% specificity.

To complete the analysis at fixed specificity, we incorporated the MRA in the analysis as a continuous variable (actual minus predicted) and compared it directly to the GPS. This comparison of continuous variables uses the normative database to calculate the predicted values and can be considered comparable to the categorical values provided to the clinician on an HRT printout.

TABLE 5. GEE Logistic Regression Modeling Sensitivity of MRA and GPS at 90% Specificity

Variable (coef.)           | Global               | Temporal Superior     | Temporal Inferior    | Nasal Superior       | Nasal Inferior
Constant                   | 0.87 (0.25), 0.001*  | 1.10 (0.30), <0.001*  | 1.08 (0.28), 0.000*  | 0.64 (0.23), 0.006*  | 0.66 (0.24), 0.007*
AGIS (β1)                  | 0.18 (0.06), 0.004*  | 0.19 (0.09), 0.026*   | 0.18 (0.07), 0.005*  | 0.12 (0.05), 0.019*  | 0.13 (0.05), 0.015*
Disc area (β2)             | 1.58 (0.75), 0.035*  | 2.11 (0.77), 0.006*   | 1.77 (0.78), 0.024*  | 1.25 (0.63), 0.047*  | 1.57 (0.72), 0.030*
Test type (β3)             | −0.37 (0.30), 0.227  | −1.23 (0.29), <0.001* | −0.21 (0.29), 0.472  | −0.23 (0.30), 0.445  | −0.01 (0.30), 0.986
AGIS × disc area (β4)      | −0.05 (0.11), 0.655  | 0.11 (0.13), 0.406    | 0.02 (0.12), 0.853   | −0.10 (0.09), 0.265  | −0.11 (0.10), 0.279
AGIS × test type (β5)      | −0.09 (0.08), 0.270  | −0.03 (0.08), 0.685   | −0.03 (0.08), 0.666  | −0.07 (0.07), 0.271  | −0.11 (0.07), 0.101
Disc area × test type (β6) | 0.90 (0.87), 0.304   | 0.01 (0.67), 0.990    | −0.79 (0.75), 0.290  | 0.90 (0.77), 0.241   | 0.92 (0.80), 0.247

Entries are coefficient estimate (SE), P-value. * Statistically significant P-value.

2658 Zangwill et al. IOVS, June 2007, Vol. 48, No. 6

There are several ways to describe and summarize the ability of a diagnostic test to detect disease. The most common measures of diagnostic accuracy include sensitivity, specificity, and AUROC. The advantages and limitations of these measures have been described recently.8,27 In brief, sensitivity, specificity, and AUROC provide important information about the overall diagnostic accuracy of a test and were therefore used to compare the diagnostic accuracy of the MRA and GPS parameters and to evaluate the effects of glaucoma severity and disc size. However, sensitivity, specificity, and AUROC do not necessarily provide information in a form that is useful to the clinician or patient in clinical decision-making. For example, sensitivity and specificity are inversely related (as sensitivity increases, specificity decreases and vice versa) and depend on the specific cutoffs used to define the disease. For this reason, we included a comparison of the sensitivity of the GPS and MRA at fixed specificities. By definition, at a high fixed specificity of 90%, 10% of normal eyes will be classified as glaucomatous. Unfortunately, it is difficult to apply this information to a specific patient. Similarly, the AUROC is important for comparing the diagnostic accuracy of two different diagnostic tests but has little clinical meaning for making decisions regarding a particular patient. In contrast, the likelihood ratio provides this type of information; it expresses the magnitude by which the probability of a diagnosis in a given patient is modified by the results of the test. In other words, the likelihood ratio indicates how much a given diagnostic test result will raise or lower the pretest probability of the disease in question.

We therefore reported the likelihood ratios for the three categorical outputs of the MRA and GPS: ONL, BL, and WNL. We found that an MRA output of ONL had a moderate to large effect on the posttest probability of glaucoma, both globally and for each region. A GPS output of ONL had a moderate effect on posttest probability. A GPS output of WNL was much more strongly associated with the eye being normal than was the same output from the MRA. The results in this study population suggest that GPS provides better information for confirming a normal disc, whereas MRA is most helpful in confirming a suspicion of glaucoma. It should be noted that even small changes in posttest probability may be relevant, depending on other clinical information and the pretest probability of disease.
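The two ideas in the preceding discussion, sensitivity at a fixed specificity and the likelihood ratio that converts a pretest probability into a posttest probability, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, higher scores are assumed to be more glaucoma-like, and the example scores are invented. The posttest step uses Bayes' theorem in odds form (posttest odds = pretest odds × LR); for instance, with a hypothetical pretest probability of 20%, likelihood ratios of 4.0 and 10.0 (the range reported for regional GPS results outside normal limits) would give posttest probabilities of 50% and about 71%.

```python
import math
from typing import Sequence, Tuple


def sensitivity_at_specificity(
    normal_scores: Sequence[float],
    glaucoma_scores: Sequence[float],
    spec_pct: int = 90,
) -> Tuple[float, float]:
    """Pick the cutoff from the normal eyes so that at least spec_pct% of them
    score at or below it, then return (cutoff, sensitivity).
    ASSUMPTION: higher scores are more glaucoma-like."""
    if not 0 < spec_pct <= 100:
        raise ValueError("spec_pct must be in (0, 100]")
    ranked = sorted(normal_scores, reverse=True)
    # Number of normal eyes allowed above the cutoff; integer arithmetic
    # avoids floating-point surprises such as (1 - 0.9) * 10 < 1.
    allowed_false_positives = (100 - spec_pct) * len(ranked) // 100
    cutoff = ranked[allowed_false_positives]
    detected = sum(score > cutoff for score in glaucoma_scores)
    return cutoff, detected / len(glaucoma_scores)


def category_likelihood_ratio(k_glaucoma, n_glaucoma, k_normal, n_normal):
    """LR of one output category (e.g. ONL, BL or WNL):
    P(category | glaucoma) / P(category | normal)."""
    p_glaucoma = k_glaucoma / n_glaucoma
    p_normal = k_normal / n_normal
    if p_normal == 0:
        # No normal eye received this output, so the point estimate is infinite.
        return math.inf
    return p_glaucoma / p_normal


def posttest_probability(pretest_p, lr):
    """Bayes' theorem in odds form: posttest odds = pretest odds * LR."""
    if not 0.0 < pretest_p < 1.0:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    if math.isinf(lr):
        return 1.0
    posttest_odds = pretest_p / (1.0 - pretest_p) * lr
    return posttest_odds / (1.0 + posttest_odds)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    cutoff, sens = sensitivity_at_specificity(
        normal_scores=[0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.41, 0.47, 0.55, 0.80],
        glaucoma_scores=[0.35, 0.62, 0.71, 0.90, 0.95],
    )
    print(f"cutoff={cutoff:.2f}, sensitivity at 90% specificity={sens:.2f}")
    for lr in (4.0, 10.0):
        print(f"LR={lr:g}: pretest 20% -> posttest {posttest_probability(0.20, lr):.0%}")
```

The infinite branch mirrors the situation in which no normal eye falls into a category (for example, an MRA output of ONL in a small normal sample): the point estimate of the likelihood ratio is then unbounded, and its confidence interval, not the point estimate, is the informative quantity.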

There are several possible limitations to the present study. First, the subjects with glaucoma were older than the normal subjects, which may lead to an overestimation of the diagnostic accuracy of the methods. However, the main objective of this study was to use the same population to compare the influence of disc size and severity of disease on the MRA and GPS. Because there was no significant relationship between age and disc size (R² = 0.006, P = 0.153) and only a very weak relationship between age and AGIS score (R² = 0.017, P = 0.018), it is unlikely that the age difference influenced the comparison. Second, the sample size of this study was modest, leading to relatively wide confidence limits for estimates of AUROC, sensitivity, specificity, and likelihood ratios. Larger studies are needed to provide more precise estimates of the diagnostic accuracy. Finally, in our limited population, disc area was larger in the glaucomatous eyes than in the normal eyes, which may affect how disc size influenced the diagnostic accuracy of the MRA and GPS. However, the analysis of sensitivity at a fixed specificity, restricted to glaucomatous eyes, confirmed that sensitivity improved with increasing disc size for both MRA and GPS. Other investigators also have found larger discs in subjects with glaucoma than in normal control subjects.26 A likely explanation for this difference in disc size is sampling bias related to glaucoma’s being more difficult to detect in small optic discs.8 If this is the case, then small discs are likely to be underrepresented and large discs overrepresented among patients with diagnosed glaucoma in glaucoma specialty clinics.26

FIGURE 2. After adjustment for disc size and AGIS score, the sensitivity (at a fixed specificity of 90%) of the GPS for glaucoma detection was similar to that of the MRA. With the temporal inferior results as an example, the illustration shows that MRA (A) and GPS (B) sensitivity both increased with increasing disc area and more severe visual field damage (higher AGIS score values) and that the sensitivity was similar for both classification systems.
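The pattern described for Figure 2 follows from the positive AGIS (β1) and disc area (β2) coefficients in Table 5. The sketch below, which is not the authors' code, plugs the Global estimates from Table 5 into the marginal logistic model implied by the table's rows; GEE supplies only the coefficient estimates and robust standard errors, and a predicted sensitivity is simply the inverse logit of the linear predictor. The helper name `predicted_sensitivity` and the dictionary keys are hypothetical. The 0/1 coding of test type (which test is the reference) and any centering or scaling of AGIS score and disc area are not stated in this section, so the covariate values used below are hypothetical and the printed probabilities are illustrative only, not estimates from the study.

```python
import math

# Global-column estimates from Table 5 (keys are hypothetical row labels).
GLOBAL_COEFS = {
    "constant": 0.87,
    "agis": 0.18,                   # beta1
    "disc_area": 1.58,              # beta2
    "test_type": -0.37,             # beta3
    "agis_x_disc_area": -0.05,      # beta4
    "agis_x_test_type": -0.09,      # beta5
    "disc_area_x_test_type": 0.90,  # beta6
}


def predicted_sensitivity(coefs, agis, disc_area, test_type):
    """Marginal logistic model for sensitivity at 90% specificity:
    logit(sens) = b0 + b1*AGIS + b2*disc + b3*test
                  + b4*AGIS*disc + b5*AGIS*test + b6*disc*test.
    ASSUMPTION: test_type is a 0/1 indicator and agis/disc_area are on the
    same (possibly centered) scale used to fit the model; neither is stated
    in this section."""
    eta = (
        coefs["constant"]
        + coefs["agis"] * agis
        + coefs["disc_area"] * disc_area
        + coefs["test_type"] * test_type
        + coefs["agis_x_disc_area"] * agis * disc_area
        + coefs["agis_x_test_type"] * agis * test_type
        + coefs["disc_area_x_test_type"] * disc_area * test_type
    )
    return 1.0 / (1.0 + math.exp(-eta))  # inverse logit


if __name__ == "__main__":
    # Hypothetical covariate values, for illustration only.
    for test_type in (0, 1):
        row = [predicted_sensitivity(GLOBAL_COEFS, 0.0, d, test_type) for d in (-0.5, 0.0, 0.5)]
        print(f"test_type={test_type}:", [f"{s:.2f}" for s in row])
```

Holding the other covariates fixed, larger disc area and higher AGIS score both raise the linear predictor, which is why predicted sensitivity rises with each, as in Figure 2.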

In conclusion, GPS can differentiate between glaucomatous and healthy eyes with relatively good sensitivity and specificity. Using the manufacturer’s suggested cutoff values for classification as ONL, the GPS tended to have higher sensitivity but lower specificity and lower likelihood ratios than the MRA. Disc size influences the diagnostic accuracy of both the GPS and MRA.

References

1. Zangwill LM, Medeiros FA, Bowd C, Weinreb RN. Optic nerve imaging devices: recent advances. In: Grehn F, Stampher R, eds. Essentials in Ophthalmology: Glaucoma. Heidelberg, Germany: Springer-Verlag; 2004:63–91.

2. Greaney MJ, Hoffman DC, Garway-Heath DF, et al. Comparison of optic nerve imaging methods to distinguish normal eyes from those with glaucoma. Invest Ophthalmol Vis Sci. 2002;43:140–145.

3. Medeiros FA, Zangwill LM, Bowd C, Weinreb RN. Comparison of the GDx VCC Scanning Laser Polarimeter, HRT II Confocal Scanning Laser Ophthalmoscope, and Stratus OCT Optical Coherence Tomograph for the detection of glaucoma. Arch Ophthalmol. 2004;122:827–837.

4. Zangwill LM, Bowd C, Berry CC, et al. Discriminating between normal and glaucomatous eyes using the Heidelberg Retina Tomograph, GDx Nerve Fiber Analyzer, and Optical Coherence Tomograph. Arch Ophthalmol. 2001;119:985–993.

5. Iester M, Jonas JB, Mardin CY, Budde WM. Discriminant analysis models for early detection of glaucomatous optic disc changes. Br J Ophthalmol. 2000;84:464–468.

6. Iester M, Mikelberg FS, Drance SM. The effect of optic disc size on diagnostic precision with the Heidelberg retina tomograph. Ophthalmology. 1997;104:545–548.

7. Bowd C, Zangwill LM, Blumenthal EZ, et al. Imaging of the optic disc and retinal nerve fiber layer: the effects of age, optic disc area, refractive error, and gender. J Opt Soc Am A Opt Image Sci Vis. 2002;19:197–207.

8. Medeiros FA, Zangwill LM, Bowd C, et al. Influence of disease severity and optic disc size on the diagnostic performance of imaging instruments in glaucoma. Invest Ophthalmol Vis Sci. 2006;47:1008–1015.

9. Wollstein G, Garway-Heath DF, Hitchings RA. Identification of early glaucoma cases with the scanning laser ophthalmoscope. Ophthalmology. 1998;105:1557–1563.

10. Bathija R, Zangwill L, Berry CC, et al. Detection of early glaucomatous structural damage with confocal scanning laser tomography. J Glaucoma. 1998;7:121–127.

11. Ford BA, Artes PH, McCormick TA, et al. Comparison of data analysis tools for detection of glaucoma with the Heidelberg Retina Tomograph. Ophthalmology. 2003;110:1145–1150.

12. Mardin CY, Horn FK. Influence of optic disc size on the sensitivity of the Heidelberg Retina Tomograph. Graefes Arch Clin Exp Ophthalmol. 1998;236:641–645.

13. Leisenring W, Pepe MS, Longton G. A marginal regression modelling framework for evaluating medical diagnostic tests. Stat Med. 1997;16:1263–1281.

14. Liang KY, Zeger SL. Longitudinal data-analysis using generalized linear-models. Biometrika. 1986;73:13–22.

15. Martus P, Stroux A, Junemann AM, et al. GEE approaches to marginal regression models for medical diagnostic tests. Stat Med. 2004;23:1377–1398.

16. Iester M, Mikelberg FS, Courtright P, et al. Interobserver variability of optic disk variables measured by confocal scanning laser tomography. Am J Ophthalmol. 2001;132:57–62.

17. Miglior S, Albe E, Guareschi M, et al. Intraobserver and interobserver reproducibility in the evaluation of optic disc stereometric parameters by Heidelberg Retina Tomograph. Ophthalmology. 2002;109:1072–1077.

18. Swindale NV, Stjepanovic G, Chin A, Mikelberg FS. Automated analysis of normal and glaucomatous optic nerve head topography images. Invest Ophthalmol Vis Sci. 2000;41:1730–1742.

19. Advanced Glaucoma Intervention Study. 2. Visual field test scoring and reliability. Ophthalmology. 1994;101:1445–1455.

20. Newcombe RG. Two-sided confidence intervals for the single proportion: comparison of seven methods. Stat Med. 1998;17:857–872.

21. Wilson E. Probable inference, the law of succession, and statistical inference. J Am Stat Assoc. 1927;22:209–217.

22. Simel DL, Samsa GP, Matchar DB. Likelihood ratios with confidence: sample size estimation for diagnostic test studies. J Clin Epidemiol. 1991;44:763–770.

23. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837–845.

24. Jaeschke R, Guyatt GH, Sackett DL. Users’ guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? The Evidence-Based Medicine Working Group. JAMA. 1994;271:703–707.

25. Harizman N, Zelefsky JR, Ilitchev E, et al. Detection of glaucoma using operator-dependent versus operator-independent classification in the Heidelberg retinal tomograph-III. Br J Ophthalmol. 2006;90:1390–1392.

26. Coops A, Henson DB, Kwartz AJ, Artes PH. Automated analysis of Heidelberg Retina Tomograph optic disc images by glaucoma probability score. Invest Ophthalmol Vis Sci. 2006;47:5348–5355.

27. Langlotz CP. Fundamental measures of diagnostic examination performance: usefulness for clinical decision making and research. Radiology. 2003;228:3–9.
