
Exercise 1 Worksheet : variance of binomial & sampling


Biostat I, week 2

Exercise 1 Worksheet : variance of binomial & sampling distribution of a proportion For each box in the following table, first revise what you learned in week 1 about continuous variables (on the left), then read about the same concepts for indicator variables (on the right).

(a) Continuous variable (b) Indicator variable

If a variable X is measured on five individuals, giving the values X1 = 2, X2 = 5, X3 = 6, X4 = 8, X5 = 9, then the mean is (2+5+6+8+9)/5 = 30/5 = 6.

Out of 5 births, 2 boys and 3 girls are born. We use X to denote the sex of the baby (0 denotes “boy”, 1 denotes “girl”).

X1 = 0, X2 = 1, X3 = 0, X4 = 1, X5 = 1.

Mean = (0+1+0+1+1)/5 = 3/5 = proportion of girls (1’s)

For a sample of size n with k 1’s, mean = k/n

The variance = average of the squared residuals (i.e. squared distances from the mean):

((2−6)² + (5−6)² + (6−6)² + (8−6)² + (9−6)²)/5 = (16+1+0+4+9)/5 = 30/5 = 6

Using the same formula (as for continuous variables), the variance is:

((0−3/5)² + (1−3/5)² + …)/5

= (9/25+4/25+9/25+4/25+4/25)/5

= (30/25)×(1/5) = 6/25 = 0.24

The variance can also be calculated (sometimes more simply) as:

the average of the squares − the square of the average

= (4+25+36+64+81)/5 − 6² = 42 − 36 = 6

Since all the values are 1 or 0, the average of the squares (which are also all 1 or 0) is the same as the average of the raw values, k/n. The variance is therefore k/n − (k/n)² = (k/n)(1 − k/n).

This gives the same variance as in the box directly above:

(3/5)×(2/5) = 6/25 = 0.24
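The two ways of computing the variance can be checked with a short script. This is a minimal sketch, assuming Python with exact fractions; the function names are illustrative and not part of the worksheet. It divides by n, as the worksheet does, rather than by n − 1.

```python
from fractions import Fraction


def variance_from_residuals(values):
    """Average of the squared distances from the mean (divides by n)."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((Fraction(v) - mean) ** 2 for v in values) / n


def variance_shortcut(values):
    """Average of the squares minus the square of the average."""
    n = len(values)
    mean = Fraction(sum(values), n)
    mean_of_squares = Fraction(sum(v * v for v in values), n)
    return mean_of_squares - mean ** 2


continuous = [2, 5, 6, 8, 9]   # worksheet column (a)
indicator = [0, 1, 0, 1, 1]    # worksheet column (b): 0 = boy, 1 = girl

for name, data in [("continuous", continuous), ("indicator", indicator)]:
    print(name, variance_from_residuals(data), variance_shortcut(data))

# For 0/1 data the shortcut reduces to p(1 - p), where p = k/n.
p = Fraction(sum(indicator), len(indicator))
print("p(1 - p) =", p * (1 - p))
```

Both functions agree for each data set, and for the indicator variable the result equals p(1 − p) with p = 3/5.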


From this it can be seen that for repeated sampling (samples of size n)

var(k) = np(1-p)

var(k/n)=p(1-p)/n

where k is the number of “successes” in each sample.
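The two variance formulas can be confirmed numerically by summing over the binomial distribution directly. This is a sketch under assumed values (n = 5 and p = 0.6 are chosen only to match the births example); the helper names are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_mean_and_variance(n, p):
    """Mean and variance of the number of successes k, by direct summation."""
    mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
    var = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
    return mean, var


n, p = 5, 0.6  # assumed illustration values
mean_k, var_k = binomial_mean_and_variance(n, p)
var_proportion = var_k / n**2  # var(k/n) = var(k) / n^2

print(f"var(k)   = {var_k:.4f}  vs  np(1-p)  = {n * p * (1 - p):.4f}")
print(f"var(k/n) = {var_proportion:.4f}  vs  p(1-p)/n = {p * (1 - p) / n:.4f}")
```

Dividing the count k by the constant n divides its variance by n², which is why var(k/n) is p(1 − p)/n.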

Recall:

the sampling distribution of the mean:

If we take repeated samples of size n from a population with mean μ and variance σ², then, for large n, the distribution of the sample mean will be approximately normal with

mean equal to μ

and standard error of the mean equal to σ/√n

the sampling distribution of a proportion:

if p is the true proportion of “successes” in the population and we take repeated samples of size n, sample proportions will be approximately normally distributed with

mean p

standard error √(p(1−p)/n)
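The two standard errors above are the same formula, because an indicator variable has σ² = p(1 − p). A small sketch, with hypothetical function names and assumed example values, makes the link explicit:

```python
from math import sqrt


def se_mean(sigma, n):
    """Standard error of a sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return sqrt(p * (1 - p) / n)


p = 0.6  # assumed true proportion, for illustration only
for n in (25, 100, 400):
    sigma = sqrt(p * (1 - p))  # standard deviation of a 0/1 variable
    print(n, round(se_proportion(p, n), 4), round(se_mean(sigma, n), 4))
```

Each fourfold increase in n halves the standard error, for proportions as for means.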


Exercise 2 (Diagnostic Tests)

1. Consider a test used to detect the presence of performance enhancing drugs in athletes. The test returns a false positive 1% of the time. For an athlete who never used such drugs, and has been tested 100 times, would you be surprised if they have tested positive on at least one occasion?

2. The validity of a diagnostic test was assessed by applying it to 100 patients known to have the disease and 400 known to be free of the disease. There were 80 positive results in the first group and 8 in the second. State the sensitivity, specificity, false positive rate and false negative rate.

3. In the paper “Is intestinal biopsy always needed for diagnosis of celiac disease” (Scoglio et al, Am Jour. of Gastroenterology, 2003), the authors report on a study of 181 patients suspected of celiac disease for whom histology confirmation is available. Verify the values reported in Table 2 for sensitivity, specificity, positive predictive value and negative predictive value.

4. A study has reported that the sensitivity of the mammogram as a screening test for detecting breast cancer is 0.85, while its specificity is 0.80.

(a) What is the probability of a false negative test result?

(b) What is the probability of a false positive result?

(c) In a population in which the probability that a woman has breast cancer is 0.0025, what is the probability that she has cancer given that her mammogram is positive?

5. (a) If a test has low sensitivity, how will this affect the reported prevalence rate based on using this test?

(b) If a test has low specificity, how will this affect the reported prevalence rate based on using this test?

(c) What prevalence rates will be reported if the test in question 2 above is applied to two populations where the true prevalence rates are 21% and 7% respectively (assume 10,000 persons in each of the populations and construct a tree diagram or 2×2 table)?

(d) According to the true prevalence rates in the two populations, the rate ratio is 21/7 = 3. Would you expect this to be the same, lower, or higher when an imperfect diagnostic test is used? What is the rate ratio from your results in (c)?

6. (optional extra/difficult question) Suppose we have a diagnostic test that gives the correct result 99% of the time when the patient has the disease, and the correct result 98% of the time when the patient does not have the disease.

(a) Is it possible that (in the long run) among all patients who test positive, less than 1% of them have the disease? Why/why not?

(b) Explain the implications that this has for screening programs.
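The quantities asked for in these questions all follow from the four cells of a 2×2 table plus the prevalence. The sketch below uses made-up counts and a made-up prevalence (not the numbers from any question above) so that the exercises are left to the reader; the function names are illustrative.

```python
def diagnostic_measures(tp, fn, fp, tn):
    """Accuracy measures from a 2x2 table of test result against true status."""
    sensitivity = tp / (tp + fn)   # P(test positive | diseased)
    specificity = tn / (tn + fp)   # P(test negative | disease-free)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_negative_rate": 1 - sensitivity,
        "false_positive_rate": 1 - specificity,
    }


def apparent_prevalence(sensitivity, specificity, prevalence):
    """Proportion testing positive: true positives plus false positives."""
    return sensitivity * prevalence + (1 - specificity) * (1 - prevalence)


def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(diseased | test positive), by Bayes' theorem."""
    true_positives = sensitivity * prevalence
    return true_positives / apparent_prevalence(sensitivity, specificity, prevalence)


# Hypothetical validation study: 100 diseased and 300 disease-free subjects.
m = diagnostic_measures(tp=90, fn=10, fp=30, tn=270)
print(m)

# Hypothetical population with 1% true prevalence.
prev = 0.01
print("apparent prevalence:", apparent_prevalence(m["sensitivity"], m["specificity"], prev))
print("PPV:", positive_predictive_value(m["sensitivity"], m["specificity"], prev))
```

With a rare condition, most positives come from the large disease-free group, so the predictive value of a positive result can be low even when sensitivity and specificity are both high.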


Is Intestinal Biopsy Always Needed for Diagnosis of Celiac Disease?

Riccardo Scoglio, MD, Giuseppe Di Pasquale, MD, Giuseppe Pagano, MD, Maria Cristina Lucanto, MD, Giuseppe Magazzù, MD, and Concetta Sferlazzas, MD

Department of Pediatrics, GI Unit, University of Messina, Messina, Italy

OBJECTIVE: Intestinal biopsy is required for a diagnosis of celiac disease (CD). The aim of this study was to assess diagnostic accuracy of transglutaminase antibodies (TGA) in comparison and in association with that of antiendomysial antibodies (AEA), calculating the post-test odds of having the disease, to verify whether some patients might avoid undergoing intestinal biopsy for a diagnosis of CD.

METHODS: A total of 181 consecutive patients (131 < 18 yr) were referred to our celiac clinic by primary care physicians for suspect CD. Overall diagnostic accuracy, negative predictive value, and likelihood ratio (LR) were calculated both for each serological test and for serial testing (TGA followed by AEA, assuming the post-test probability of TGA as pretest probability of AEA). Both serological determination and histological evaluation were blindly performed. Histology of duodenal mucosa was considered the gold standard.

RESULTS: The overall accuracy of TGA and of AEA were 92.8% (89.1–96.6) and 93.4% (89.7–97.0), respectively. The negative predictive value of TGA and AEA were 97.2% (91.9–102.6) and 87.2% (77.7–96.8), respectively. Positive likelihood ratios for TGA and AEA were 3.89 (3.40–4.38) and 7.48 (6.73–8.23), respectively. Serial testing, in groups of patients with prevalence of CD estimated higher than 75%, such as those with classic symptoms of CD, would provide a post-test probability of more than 99%.

CONCLUSIONS: Our results suggest that serial testing with TGA and AEA might allow, in some cases, the avoidance of intestinal biopsy to confirm the diagnosis of CD. (Am J Gastroenterol 2003;98:1325–1331. © 2003 by Am. Coll. of Gastroenterology)

INTRODUCTION

Celiac disease (CD) is a permanent intolerance to gluten of some cereals, mediated by an autoimmune mechanism, in genetically predisposed individuals (1). According to the original diagnostic criteria of the European Society for Pediatric Gastroenterology and Nutrition, to establish the diagnosis definitively one needed to obtain an initial biopsy along with abnormal small intestinal mucosa (usually flat), a second one after a clinical response to a gluten-free diet to show histological response, and a third one to show clinical and/or histological relapse after gluten challenge (2). These criteria have been modified and new or revised European Society for Pediatric Gastroenterology and Nutrition criteria produced (3), which do not regard serial biopsy and challenge as necessary for all children diagnosed as having CD except in certain situations. These include age at presentation of under 2 yr, atypical biopsy or clinical features, no previous biopsy, and teenagers who plan to return to a normal gluten-containing diet despite advice to the contrary.

With the introduction and widespread use of serological tests to detect antigliadin and antiendomysial (AEA) antibodies, it was clear that CD may have been underdiagnosed. The “asymptomatic” form of CD, with typical serological and histological features despite the absence of symptoms, may be five to seven times more common than the symptomatic form of CD (4). More recently, tissue transglutaminase (tTG) was identified as the unknown endomysial autoantigen of CD (5), and ELISAs were established to measure IgA and IgG anti-tTG titers in serum samples (6). These antitransglutaminase antibodies (TGA), both against guinea pig and human tTG, were demonstrated to have very high diagnostic accuracy for CD, expressed as sensitivity and specificity of the serological test (7, 8). Unfortunately, most studies are biased because the reference or gold standard (intestinal biopsy) was not performed, regardless of the results of the test under evaluation, and the serological test was performed on sera of already known celiac and control patients. Moreover, the whole point of a diagnostic test is to use it to make a diagnosis, so one needs to know the probability that the test will give the correct diagnosis. Sensitivity and specificity do not give us this information; instead, predictive values are recommended. Positive and negative predictive values are known as posterior probabilities. The difference between the probability of having a disease before the test is carried out (pretest probability) and post-test probability is one way of assessing the usefulness of the test. For any test result, comparing the probability of getting that result if the patient truly had the condition of interest with the corresponding probability if he or she were healthy (the ratio of these probabilities is called the likelihood ratio, LR) indicates the value of the test for increasing certainty about a positive diagnosis (9). In the present study, we assessed diagnostic accuracy of TGA in comparison and in association with AEA, in consecutive patients referred to undergo intestinal biopsy for suspect CD, calculating the post-test odds of having the disease, that is the pretest odds multiplied by the LR. In this way we aimed at verifying whether intestinal biopsy is always necessary for diagnosis of CD. This study suggests that in some cases it is possible to avoid an intestinal biopsy.

THE AMERICAN JOURNAL OF GASTROENTEROLOGY Vol. 98, No. 6, 2003. © 2003 by Am. Coll. of Gastroenterology. ISSN 0002-9270/03/$30.00. Published by Elsevier Inc. doi:10.1016/S0002-9270(03)00229-6

PATIENTS AND METHODS

A total of 181 consecutive patients, 131 < 18 yr (86 women, 45 men; mean age 7.6 yr, range 1–17 yr) and 50 adults (37 women, 13 men; mean age 30.4 yr, range 18–69 yr), without any known GI disorder, were enrolled in the study from October 7, 1999, to July 20, 2000. They were referred to our celiac clinic by primary care physicians to undergo intestinal biopsy for positive serological tests for CD (AEA or guinea pig-TGA or both) and/or a suspect CD, based on GI symptoms (chronic diarrhea, weight loss or failure to thrive, abdominal pain, abdominal distension, dyspepsia), extra-GI symptoms or clinical conditions associated with CD (sideropenic anemia refractory to therapy with iron, short stature, recurrent stomatitis, insulin-dependent diabetes mellitus, autoimmune diseases, alopecia areata, reproductive disorders, IgA nephropathy, Down syndrome), and familial or school screening for CD. Symptoms and clinical conditions of the patients enrolled in the study are shown in Table 1. After drawing a blood sample and storing frozen serum for AEA and TGA determination, all patients underwent esophagogastroduodenoscopy, to provide at least three biopsy samples taken from the third part of the duodenum. Marsh’s modified classification was used for histological examination (10).

On the basis of this original standardized scheme, five different histological pictures can be described, from type 0 to type 4, which are grouped in three different diagnostic classes: normal (type 0), borderline including type 1–2, diagnostic including type 3 (a, b, c), and 4.

Briefly, type 0 is a normal mucosa with less than 40 intraepithelial lymphocytes (IEL)/100 epithelial cells (EC); type 1 is the infiltrative type, which is characterized by a normal villous architecture, a normal height of the crypts and an increase in IEL numbers up to more than 40 IEL/100 EC; type 2 is the hyperplastic type, characterized by a normal villous architecture, an increase in IEL numbers up to more than 40 IEL/100 EC and crypt hyperplasia.

The original Marsh’s classification type 3 lesions, the destructive ones, which represent diagnostic lesions, are here subdivided into subgroups a–c: type 3a, characterized by a mild villous flattening, an increase in crypt height, and an increase in IEL numbers up to more than 40 IEL/100 EC; type 3b, characterized by a marked villous flattening, an increase in crypt height, and an increase in IEL numbers up to more than 40 IEL/100 EC; type 3c, characterized by a flat mucosa (total villous flattening), an increase in crypt height, and an increase in IEL numbers up to more than 40 IEL/100 EC; type 4 is the very rare hypoplastic lesion. According to this classification even a mild villous atrophy may represent a diagnostic finding.

Forty-one patients with Crohn’s disease (mean age 20.4 yr, range 4–65 yr, 21 women) were also enrolled as pathological controls. Total serum IgA were preliminarily determined in all the patients to exclude IgA deficiency. Serum IgA AEA were measured by means of indirect immunofluorescence using cryostat sections of human umbilical cord from a commercial kit (IPR, Catania, Italy). Serum IgA GP-TGA were measured by an ELISA method established in our laboratory, preliminarily validated on sera of 43 patients known to have CD, and on sera of 56 patients in whom CD had been ruled out, showing a 100% sensitivity and a 98% specificity (11).

Microtiter plates, high-binding capacity (96-wells kovalink Nunc, Roskilde, Denmark) were coated overnight at 4°C with 100 µl/well of 10 µg/ml of Guinea pig tTG (Sigma T5398, Chemical, St. Louis, MO) and 2 µg/ml of an alcoholic gliadin solution, in coating buffer (50 mmol/L Tris-HCl, 5 mmol/L CaCl2, 150 mmol/L NaCl, pH 8.2). The plates were washed four times with 50 mmol/L Tris-HCl, 150 mmol/L NaCl, 0.05% triton x-100 pH 8.2, and overcoated for 2 h at room temperature with 200 µl/well of 1% hydrolysated casein, 0.05 polyvinylpyrrolidone, and 0.05% triton x-100 (MW 10.000, Sigma) in Tris buffer, pH 8.2.

Table 1. Symptoms and Clinical Conditions of the Patients Enrolled in the Study

| | All Patients | Pediatric Patients | Adult Patients |
|---|---|---|---|
| GI symptoms (%) | 107 (59.1) | 86 (65.6) | 21 (42) |
| Extra-GI symptoms or clinical conditions (%) | 47 (26) | 28 (21.4) | 19 (38) |
| - Iron-deficiency anemia | 20 | 10 | 10 |
| - Short stature | 9 | 8 | 1 |
| - Diabetes | 4 | 2 | 2 |
| - Recurrent aphthous stomatitis | 2 | 1 | 1 |
| - Alopecia | 2 | 1 | 1 |
| - Dermatitis herpetiformis | 4 | 1 | 3 |
| - Atopic dermatitis | 2 | 2 | 0 |
| - Down syndrome | 1 | 1 | 0 |
| - IgA nephropathy | 1 | 1 | 0 |
| - Connective tissue diseases | 1 | 1 | 0 |
| - Recurrent spontaneous abortion | 1 | 0 | 1 |
| Screening (%) | 27 (14.9) | 17 (13) | 10 (20) |
| Total | 181 | 131 | 50 |

Also, 100 µl of serum samples diluted 1:50 in sample buffer (50 mmol/L Tris-HCl, 150 mmol/L NaCl, 0.05% triton x-100, 0.25% casein, pH 7.4) were added to the wells and incubated overnight at 4°C. Then, the plates were washed four times and incubated for 2 h at room temperature with 100 µl of rabbit ALP antihuman IgA (Sigma) diluted 1:2000 in the sample buffer. The plates were washed four times to remove unbound antibodies. The color was developed by addition of 100 µl/well of diethanolamine buffer, pH 9.7, paranitrophenylphosphate 1 mg/ml, at room temperature for 30 min. The reaction was stopped with 100 µl/well of NaOH 1N solution, and absorbance was read on an ELISA reader (LP 300 Pasteur) at 405 nm. Two replicates were used for each serum sample. Antibody concentrations were expressed in arbitrary units (AU), that is, as percentages of the positive reference serum.

The intra-assay coefficient of variation for IgA class tTGA ELISA was 2.16% at a 1:2,500 dilution (40 AU) (n = 10), 2.53% at a 1:5,000 dilution (20 AU) (n = 10), 1.96% at a 1:10,000 dilution (10 AU) (n = 10), and 3.1% at a 1:20,000 dilution (n = 10). The interassay coefficient of variation was 2.6% (n = 8). The results of the serological tests given by the laboratory were not known by the pathologists who interpreted the reference test and vice versa.

Considering the result of histology as gold standard for diagnosis of CD, we calculated diagnostic accuracy for each test and for serial testing, assuming the post-test probability of the test (TGA) with the highest negative predictive value as pretest probability of AEA. Diagnostic accuracy and post-test probability were calculated using the all-purpose 4-fold Table Analyzer and the interactive nomogram for post-test probabilities offered by the home page of the Center for Evidence-Based Medicine (http://cebm.jr2.ox.ac.uk/). This calculation was applied in different clinical conditions, such as an infant with a classic clinical picture of CD, assuming a pretest probability of at least 75%, an asymptomatic first-degree relative of a celiac patient, assuming a pretest probability of 10% (12), and asymptomatic people from the general pediatric and adult population, assuming a pretest probability of 0.5–0.4% (4, 13).

RESULTS

Of all 181 patients enrolled in the study, 134 had a histology that was compatible (type 3a–3c), and 47 not compatible, with CD (type 0–2). All patients with a histology compatible with CD had a clinical and serological response to a gluten-free diet. The results of TGA and AEA in patients with and without histology compatible with CD, regardless of age, and for the two groups of age, are shown in Figures 1, 2, and 3.

Among the 134 patients (100 < 18 yr) with a subtotal villous atrophy, TGA and AEA were negative in only one pediatric case. Among the 47 patients (31 < 18 yr) with a normal mucosa, TGA was positive in 12 (nine children), five (four children) of whom were AEA positive, too. All the four children with positive TGA and AEA had celiac-type human leukocyte antigen (HLA) DQ2. Only one adult patient with normal mucosa and negative TGA had positive AEA.
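The counts in this paragraph are enough to rebuild the 2×2 table for TGA in all 181 patients and to recompute the all-patients column of Table 2. The sketch below is a reader's check, not the authors' method (they used an online table analyzer). The Wald interval p ± 1.96·√(p(1−p)/n) is an assumption: it reproduces the published limits, including those above 100%, but the paper does not state which interval was used.

```python
from math import sqrt

# TGA against histology, all patients, read off the Results text:
# 134 with compatible histology, of whom 1 was TGA negative;
# 47 without compatible histology, of whom 12 were TGA positive.
TP, FN = 133, 1
FP, TN = 12, 35


def wald_interval(successes, n, z=1.96):
    """Proportion with an (assumed) Wald 95% interval: p +/- z * sqrt(p(1-p)/n)."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


rows = {
    "Sensitivity": wald_interval(TP, TP + FN),
    "Specificity": wald_interval(TN, TN + FP),
    "Positive predictive value": wald_interval(TP, TP + FP),
    "Negative predictive value": wald_interval(TN, TN + FN),
    "Overall accuracy": wald_interval(TP + TN, TP + FN + FP + TN),
}

for name, (p, low, high) in rows.items():
    print(f"{name} (%): {100 * p:.1f} ({100 * low:.1f}-{100 * high:.1f})")

sensitivity = rows["Sensitivity"][0]
specificity = rows["Specificity"][0]
lr_positive = sensitivity / (1 - specificity)
lr_negative = (1 - sensitivity) / specificity
print(f"Positive LR: {lr_positive:.2f}")
print(f"Negative LR: {lr_negative:.2f}")
```

The interval is the worksheet's standard error of a proportion applied to each measure, with n equal to the denominator of that measure (for example, 134 for sensitivity and 47 for specificity).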

Diagnostic accuracy of TGA and AEA for the three groups of patients is shown in Tables 2 and 3, respectively. In particular, the positive LRs of TGA and AEA were 3.89 and 7.48, respectively, in patients considered as a whole, 3.41 and 7.36 in pediatric patients, and 5.33 and 7.76 in adult patients. Positive LR of AEA was higher than that of TGA both in pediatric and adult patients. On the contrary, TGA had a negative predictive value higher than that of AEA, especially in adult patients. In all patients with Crohn’s disease TGA and AEA were negative.

Figure 1. TGA and AEA in 181 consecutive patients who underwent intestinal biopsy.

CALCULATION OF POST-TEST PROBABILITY IN DIFFERENT CLINICAL SITUATIONS

Post-test probability was calculated on the basis of LR obtained for each test, in the corresponding age group, assuming as pretest probability of TGA a probability based on clinical experience or on literature data, and as pretest probability for AEA the post-test probability of TGA.

Pediatric Patients

In infants with a classic clinical picture of CD, the interactive nomogram for post-test probability for TGA is shown in Figure 4. It can be seen that, assuming a 75% pretest probability for these infants, with a 3.41 positive LR, the post-test probability of TGA is more than 90%. Assuming this post-test probability as pretest probability for AEA, it can be seen that with a 7.36 LR, the post-test probability is more than 98%. In these patients, intestinal biopsy may not give any additional probability for diagnosis.

Figure 2. TGA and AEA in 131 consecutive patients <18 yr who underwent intestinal biopsy.

Figure 3. TGA and AEA in 50 consecutive adult patients who underwent intestinal biopsy.

In first-degree relatives of celiac patients, under 18 yr of age, with a 10% pretest probability (12) and a 3.41 positive LR, the post-test probability of TGA is about 25%. Assuming this post-test probability as pretest probability for AEA, with a 7.36 LR, the post-test probability is about 70%. For these patients, an intestinal biopsy is needed to confirm the diagnosis of CD.

In asymptomatic school children screened for CD, with a 0.5% pretest probability (4) and a 3.41 positive LR, the post-test probability of TGA is about 2%. Assuming this post-test probability as pretest probability for AEA, with a 7.36 LR, the post-test probability is only about 12%. For these patients, an intestinal biopsy is mandatory to confirm the diagnosis of CD.
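The serial calculation described for pediatric patients can be reproduced from the rule post-test odds = pretest odds × LR. The sketch below is an arithmetic stand-in for the nomogram the authors used; the function names and scenario labels are illustrative, while the pretest probabilities and likelihood ratios are those quoted in the text.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert to odds, multiply by the LR, convert back to a probability."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


def serial_testing(pretest_probability, likelihood_ratios):
    """Use each post-test probability as the pretest probability of the next test."""
    probability = pretest_probability
    for lr in likelihood_ratios:
        probability = post_test_probability(probability, lr)
    return probability


LR_TGA, LR_AEA = 3.41, 7.36  # positive LRs for pediatric patients

scenarios = {
    "classic clinical picture": 0.75,
    "first-degree relative": 0.10,
    "school screening": 0.005,
}

for label, pretest in scenarios.items():
    after_tga = post_test_probability(pretest, LR_TGA)
    after_both = serial_testing(pretest, [LR_TGA, LR_AEA])
    print(f"{label}: {pretest:.1%} -> {after_tga:.1%} after TGA -> {after_both:.1%} after AEA")
```

Multiplying the two likelihood ratios in sequence treats the tests as conditionally independent given disease status, which is the assumption built into the serial approach described above.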

Adult Patients

Keeping in mind the positive LR for TGA and AEA found in patients over 18 yr of age (5.33 and 7.76, respectively), in adults with a symptomatology strongly suggesting CD (pretest probability 75%), intestinal biopsy does not give any additional probability for diagnosis because the post-test probability is at the top of the scale. With respect to adult relatives or asymptomatic people, conclusions similar to those in pediatric patients can be drawn: an intestinal biopsy is needed to confirm the diagnosis of CD, as the post-test probability is 82% and 20%, respectively.

DISCUSSION

After the identification of tTG as the unknown endomysial autoantigen of CD (5), more than 60 articles on TGA as a diagnostic tool for CD can be found with a MEDLINE search. Most of these studies, especially those using human TGA, report a higher diagnostic accuracy of this test in comparison with AEA and antigliadin antibodies. However, intestinal biopsy is still the gold standard for diagnosis of CD (1).

Our data suggest that for adult patients with a classic clinical picture strongly suggesting CD, intestinal biopsy might not be needed to confirm the diagnosis, in the presence of positive antibodies against both transglutaminase and endomysium. However, an endoscopy should be mandatory in case of absence of response to a gluten-free diet to obtain intestinal biopsy samples and to exclude other disorders (e.g., lymphoma).

Reviewing the results of recent studies using the determination of human tTG as substrate of ELISA in children, which has been reported as having a better diagnostic accuracy than that based on guinea pig as substrate, the histology of intestinal biopsy samples does not give the patient any additional probability of having the disease (7).

Taking into account the 107 patients with GI symptoms strongly suggesting CD, a strategy based on starting a gluten-free diet in patients with both TGA and AEA positive values would have avoided the costs of 72 endoscopic procedures. On the other hand, two of these patients would have a diagnosis of CD in the absence of a compatible histology, but they would undergo endoscopy in case of lack of clinical and/or serological response to a gluten-free diet.

However, this conclusion cannot be generalized for every setting as the diagnostic accuracy of these tests should be validated in each laboratory to which patients are referred, and a positive LR should be calculated to estimate the post-test probability. In the presence of a low positive LR, even in cases with a high pretest probability, the diagnosis of CD should be confirmed by performing intestinal biopsy. On the other hand, an asymptomatic patient found positive for TGA and AEA at a familial or general population screening for CD would not accept the idea of starting a lifelong diet with only a 75% or 15%, respectively, post-test probability of being celiac.

Table 2. Diagnostic Accuracy of TGA Determined in All, Pediatric and Adult, Patients (CI)

| | All Patients (n = 181) | Pediatric Patients (n = 131) | Adult Patients (n = 50) |
|---|---|---|---|
| Sensitivity (%) | 99.3 (97.8–100.7) | 99.0 (97.0–101.0) | 100.0 (100.0–100.0) |
| Specificity (%) | 74.5 (62.0–86.9) | 71.0 (55.0–86.9) | 81.3 (62.1–100.4) |
| Positive predictive value (%) | 91.7 (87.2–96.2) | 91.7 (86.5–96.9) | 91.9 (83.1–100.7) |
| Negative predictive value (%) | 97.2 (91.9–102.6) | 95.7 (87.3–104.0) | 100.0 (100.0–100.0) |
| Overall accuracy (%) | 92.8 (89.1–96.6) | 92.4 (87.8–96.9) | 94.0 (87.4–100.6) |
| Positive LR | 3.89 (3.40–4.38) | 3.41 (2.86–3.96) | 5.33 (4.31–6.35) |
| Negative LR | 0.01 (−1.95 to 1.97) | 0.01 (−1.95 to 1.98) | 0.00 |

Table 3. Diagnostic Accuracy of AEA Determined in All, Pediatric and Adult, Patients (CI)

| | All Patients (n = 181) | Pediatric Patients (n = 131) | Adult Patients (n = 50) |
|---|---|---|---|
| Sensitivity (%) | 95.5 (92.0–99.0) | 95.0 (90.7–99.3) | 97.1 (91.4–102.7) |
| Specificity (%) | 87.2 (77.7–96.8) | 87.1 (75.3–98.9) | 87.5 (71.3–103.7) |
| Positive predictive value (%) | 95.5 (92.0–99.0) | 96.0 (92.1–99.8) | 94.3 (86.6–102.0) |
| Negative predictive value (%) | 87.2 (77.7–96.8) | 84.4 (71.8–97.0) | 93.3 (80.7–106.0) |
| Overall accuracy (%) | 93.4 (89.7–97.0) | 93.1 (88.8–97.5) | 94.0 (87.4–100.6) |
| Positive LR | 7.48 (6.73–8.23) | 7.36 (6.45–8.8) | 7.76 (6.47–9.06) |
| Negative LR | 0.05 (−0.74 to 0.84) | 0.06 (−0.81 to 0.92) | 0.03 (−1.91 to 1.97) |

In most studies, both TGA and EMA are performed as diagnostic tests for CD. We carried forward the post-test odds from the last test as the pretest odds for the next test. Only if one wants to use this serial testing to avoid doing further testing would it be cost-effective to perform both tests. It would be better, otherwise, to perform only the test with the highest negative predictive value to select patients to undergo intestinal biopsy.

Intestinal biopsy, however, should be also performed in patients with symptoms strongly suggesting CD, particularly if they have another risk factor for CD, such as familiarity, even though serology is negative (14). Apart from the false negative of a laboratory test, an explanation in these cases may be that tTG is not the only autoantigen of CD (e.g., immune reaction against actin cytoskeleton has been described in both children and adults with CD) (15).

Intestinal biopsy is so far considered as the gold standard for diagnosis of CD. However, a picture of subtotal villous atrophy, next to a normal or subnormal duodenal morphology may be present. Recently, a pattern of patchy duodenal lesions was observed in all untreated CD patients biopsied up to five times (16). For this reason, it would be advisable to take multiple biopsy samples to detect a picture, which is compatible with diagnosis. On the other hand, the observation of a dynamic morphological lesion in CD, as shown by Marsh and Crowe (17), might allow a diagnosis of CD in the presence of minor morphological changes (10). Patients with latent CD who show normal histology and positive AEA while taking a gluten-containing diet, and who later on develop a severe villous atrophy which recovers on a gluten-free diet, have been reported in another study (18). Some observations might suggest considering serology as the gold standard of CD because a gluten-related autoimmune disorder might take place before a severe enteropathy occurs. In this regard, Maki et al. showed how reticulin autoantibody negative subjects became antibody positive, followed in several cases by CD (19), in insulin-dependent diabetes mellitus patients. Moreover, in a prospective study, Ventura et al. reported that patients with CD have a high prevalence of both insulin-dependent diabetes mellitus and thyroid-related serum autoantibodies and that these autoantibodies would seem to be gluten dependent because they disappear as a result of a gluten-free diet (20). Of our five patients positive for both TGA and AEA without duodenal histology compatible with CD, four had celiac-type HLA. For these patients, a regular surveillance for CD would be advisable, and further investigations (e.g., an increase in γ/δ intraepithelial lymphocytes) might indicate a CD in its early stage (21).

Figure 4. Interactive nomogram for post-test probability for TGA and AEA in infants with a classic clinical picture of CD, assuming the post-test probability of TGA (left) as pretest probability for AEA (right).

In conclusion, our study suggests that in some cases intestinal biopsy could be avoided. These cases are those with a classic clinical picture of disease or with possible gluten-related autoimmune disorders with a positive serology for CD in which a severe enteropathy may not develop. The classic example is dermatitis herpetiformis where the gluten-induced skin disease may develop in genetically susceptible subjects without enteropathy, but with a high density of γ/δ T cells in normal mucosa (22).

A situation analogous to that of the skin in dermatitis herpetiformis is the cerebellum or the peripheral nerves in gluten ataxia, where the neurological dysfunction may not only precede CD but may also be its only manifestation (23).

This discovery allows us to shift the emphasis from the gut as the sole protagonist in CD and to adopt Marsh’s definition of gluten sensitivity as “a state of heightened immunological responsiveness to ingested gluten in genetically susceptible individuals” (24).

Reprint requests and correspondence: Giuseppe Magazzù, M.D., Dipartimento di Scienze Pediatriche Mediche e Chirurgiche, Padiglione NI, Policlinico Universitario, Via Consolare Valeria 1, 98125 Messina, Italy.

Received Sep. 4, 2002; accepted Jan. 10, 2003.

REFERENCES

1. Fasano A, Catassi C. Current approaches to diagnosis and treatment of celiac disease: An evolving spectrum. Gastroenterology 2001;120:636–51.

2. Meeuwisse GW. Diagnostic criteria in celiac disease. Acta Paediatr Scand 1970;58:461–3.

3. Walker-Smith JA, Guandalini S, Schmitz J, et al. Revised criteria for diagnosis of celiac disease. Arch Dis Child 1990;65:909–11.

4. Catassi C, Fabiani E, Ratsch IM, et al. The coeliac iceberg in Italy. A multicentre antigliadin antibodies screening for coeliac disease in school-age subjects. Acta Paediatr Suppl 1996;412:29–35.

5. Dieterich W, Ehnis T, Bauer M, et al. Identification of tissue transglutaminase as the autoantigen of celiac disease. Nat Med 1997;3:797–801.

6. Dieterich W, Laag E, Schopper H, et al. Autoantibodies to tissue transglutaminase as predictors of celiac disease. Gastroenterology 1998;115:1317–21.

7. Sblattero D, Berti I, Trevisiol C, et al. Human recombinant tissue transglutaminase ELISA: An innovative diagnostic assay for celiac disease. Am J Gastroenterol 2000;95:1253–7.

8. Baldas V, Tommasini A, Trevisiol C, et al. Development of a novel rapid non-invasive screening test for coeliac disease. Gut 2000;47:628–31.

9. Greenhalgh T. How to read a paper: Papers that report diagnostic or screening tests. BMJ 1997;315:540–3.

10. Oberhuber G, Granditsch G, Vogelsang H. The histopathology of coeliac disease: Time for a standardized report scheme for pathologists. Eur J Gastroenterol Hepatol 1999;11:1185–94.

11. Foti M, Di Pasquale G, Sferlazzas C, et al. Improving diagnostic accuracy of tissue transglutaminase autoantibody determination in coeliac disease. JPGN 1999;28:554.

12. Maki M, Holm K, Lipsanen V, et al. Serological markers and HLA genes among healthy first degree relatives of patients with celiac disease. Lancet 1991;338:1350–3.

13. Hed J, Lieden G, Ottosson E, et al. IgA anti-gliadin antibodies and jejunal mucosal lesions in healthy blood donors. Lancet 1986;262:215.

14. Rostami K, Mulder CJ, van Overbeek FM, et al. Should relatives of coeliacs with mild clinical complaints undergo a small-bowel biopsy despite negative serology? Eur J Gastroenterol Hepatol 2000;12:51–5.

15. Clemente MG, Musu MP, Frau F, et al. Immune reaction against the cytoskeleton in coeliac disease. Gut 2000;47:520–6.

16. Maiuri L, Ciacci C, Raia V, et al. FAS engagement drives apoptosis of enterocytes of coeliac patients. Gut 2001;48:418–24.

17. Marsh MN, Crowe PT. Morphology of the mucosal lesion in gluten sensitivity. Baillieres Clin Gastroenterol 1995;9:273–93.

18. Ferguson A, Arranz E, O’Mahony S. Clinical and pathological spectrum of coeliac disease – Active, silent, latent, potential. Gut 1993;34:150–1.

19. Maki M, Huupponen T, Holm K, et al. Seroconversion of reticulin autoantibodies predicts coeliac disease in insulin dependent diabetes mellitus. Gut 1995;36:239–42.

20. Ventura A, Neri E, Ughi C, et al. Gluten-dependent diabetes-related and thyroid-related autoantibodies in patients with celiac disease. J Pediatr 2000;137:263–5.

21. Kaukinen K, Maki M, Partanen J, et al. Celiac disease without villous atrophy. Revision of criteria called for. Dig Dis Sci 2001;46:879–87.

22. Savilahti E, Reunala T, Maki M. Increase of lymphocytes bearing the gamma/delta T-cell receptor in the jejunum of patients with dermatitis herpetiformis. Gut 1992;33:206–11.

23. Hadjivassiliou M, Grunewald RA, Davies-Jones GAB. Gluten sensitivity: A many headed hydra. BMJ 1999;318:1710–1.

24. Marsh MN. The natural history of gluten sensitivity: Defining, refining and re-defining. Q J Med 1995;85:9–13.

AJG – June, 2003: Intestinal Biopsy for Diagnosis of CD


Exercise 3: Estimating risk

1. Excerpts from Table 1 and Table 2 from the paper “Coffee consumption and prostate cancer risk and progression in the health professionals follow-up study” by Wilson et al, JNCI, 2011, 103 (11) are given below.

Use the information given to complete the following table for the outcome advanced prostate cancer (i.e. compare advanced prostate cancers against all other individuals (healthy + non-advanced prostate cancer)):

Exposure: none, <1 cup, 1-3 cups, 4-5 cups, >=6 cups
cases: 211 (178.3) 422 (397.6)
Non-cases: 20839 (20863.4) 6613 (6609.0) 2473 (2445.4)

a) Compute the expected numbers in each cell if there is no association between coffee drinking and advanced prostate cancer and write these numbers in parentheses beside the observed counts (this has already been done for half of the cells, to save you time).

b) Compute the discrepancy between the observed and expected table.

c) What is your conclusion?

2. For question 1 above, you could use Stata’s “table calculator”: Statistics > Summaries, tables, and tests > Tables > Table calculator, choosing “expected frequencies” and the Chi-square test in the dialogue box (try it and compare your result).

3. For the data in question 1, calculate the relative risk in each exposure group, using the non coffee-drinkers as reference. Do you think there is evidence of a dose-response trend? (We will return to this in question 5.)


4. The association between coffee and prostate cancer was also investigated in a case-control study, “Coffee and risk of prostate cancer incidence and mortality in the Cancer of the Prostate in Sweden study” (Wilson et al, Cancer Causes & Control, 2013, 24:1575-1581). We can use their data to investigate the relationship between education and prostate cancer. Table 1 from their article is copied below.

From this information complete the following table (note that the above table, Table 1 in the published paper, presents % not counts!):

| Education | 0-9 years | 10-12 years | 13+ years |
|---|---|---|---|
| No. of cases | | | |
| No. of controls | | | |

a) Using the 0-9 years education as reference, calculate the Odds Ratio for the other two groups.

b) Is there evidence of a trend?

c) You can test for a trend by entering the counts below in the Data Editor in Stata, then using the command expand n followed by tabodds case level, or (the option “or” reports odds ratios rather than odds).

| level | case | n |
|---|---|---|
| 1 | 1 | 685 |
| 1 | 0 | 517 |
| 2 | 1 | 596 |
| 2 | 0 | 472 |
| 3 | 1 | 208 |
| 3 | 0 | 123 |
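As a cross-check outside Stata, the odds ratios in part (a) can be sketched in a few lines of Python. This is only an illustration: it assumes that case = 1 denotes cases and case = 0 denotes controls, and that level 1 is the 0-9 years reference group; the function names are made up for this sketch.

```python
# Counts as listed for the Stata Data Editor: (level, case) -> n
# Assumption: case == 1 means case, case == 0 means control.
counts = {
    (1, 1): 685, (1, 0): 517,
    (2, 1): 596, (2, 0): 472,
    (3, 1): 208, (3, 0): 123,
}


def odds(level):
    """Odds of being a case within one education level."""
    return counts[(level, 1)] / counts[(level, 0)]


def odds_ratio(level, reference=1):
    """Odds ratio of a level relative to the reference level."""
    return odds(level) / odds(reference)


for level in (2, 3):
    print(f"level {level} vs level 1: OR = {odds_ratio(level):.3f}")
```

The sketch only gives point estimates; the trend test itself is what tabodds adds.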

5. Use the same method as in question 4 to test formally for a dose-response for coffee-drinking in question 3 above. Do you think it is appropriate to carry out this test for this data?


Exercise 4: Practical Exercise Estimating a Relative Risk or Odds Ratio from a Cross-Sectional Study.

In the plot shown on the screen, we show (part of) a “population” consisting of males (represented in red) and females (represented in black), with diseased individuals represented by crosses and “healthy” individuals represented by circles. Each of you will take (be given) a random sample. Record the numbers and complete the two-by-two table below and then do the following:

1. Calculate the relative risk (RR) in females vs. males. _______________

2. Calculate the Odds Ratio (OR) (females to males). ______________

| | Disease | No Disease | Total | | |
|---|---|---|---|---|---|
| Female | | | | Risk= | Odds= |
| Male | | | | Risk= | Odds= |
| Total | | | | | |

In the lecture you will learn how to estimate confidence intervals for the RR and OR (you will be given time to calculate these later).

95% CI for relative risk _______________________

95% CI for odds ratio _________________________

Note re logs: Common logs use 10 as the "base", and with this base, the log of 10 is 1, the log of 100 is 2, of 1000 is 3 etc. (i.e. the log10 of any number is the power to which we must raise 10 in order to get the number). Thus the "anti-log" (getting back to the original number from the log) is done by just raising 10 to the power of the log, i.e. 10^2 = 100; 10^3 = 1000 etc. Logs and anti-logs can be obtained from tables, calculators, and computer programs. Every number above zero has a log. Adding two logs is equivalent to multiplying the numbers they represent (e.g. if we add the logs 2 and 3, we get 5, which represents 100000), and the difference is equivalent to dividing; for example, log(1000) - log(100) = 3 - 2 = 1 = log(1000/100). Thus if a series of logs is equally spaced this tells us that the ratio (or relative difference) between the numbers is constant. “Natural logs”, which are common in epidemiology, use a special number called "e" as their base (e is a non-recurring decimal, 2.718…). Loge is written as “ln” and the anti-log is denoted as "e^x" or "exp(x)".
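The properties in the note on logs can be checked numerically. The short Python sketch below uses only the standard math module; the example series 2, 6, 18, 54 is an arbitrary illustration, not data from the worksheet.

```python
import math

# Common (base-10) logs: adding logs multiplies, subtracting logs divides.
log10_of_100 = math.log10(100)
log10_of_1000 = math.log10(1000)
product_from_logs = 10 ** (log10_of_100 + log10_of_1000)   # 100 * 1000
quotient_from_logs = 10 ** (log10_of_1000 - log10_of_100)  # 1000 / 100

# Natural logs: exp() is the anti-log of ln().
x = 2.5
round_trip = math.exp(math.log(x))

# Equally spaced logs correspond to a constant ratio between the numbers.
series = [2, 6, 18, 54]  # each value is 3 times the previous one
log_gaps = [math.log(b) - math.log(a) for a, b in zip(series, series[1:])]

print(product_from_logs, quotient_from_logs, round_trip)
print(log_gaps)
```

This is the reason confidence intervals for ratios such as the RR and OR are usually built on the log scale and then transformed back with the anti-log.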


Exercise 5: Confidence intervals for RR and OR.

1. In Exercise 3, we used data from the paper “Coffee consumption and prostate cancer risk and progression in the health professionals follow-up study” by Wilson et al (JNCI, 2011). Use the first and last column of the table you constructed in Q1 of Exercise 3 to calculate the RR of prostate cancer among those who drink 6 or more cups of coffee per day (compared to non coffee-drinkers as the reference group):

| | No coffee | >=6 cups |
|---|---|---|
| cases | | |
| Non-cases | | |

(a) RR=

(b) Calculate the 95% confidence interval for the RR.

(c) Compare your results to the published RR and confidence interval (see Exercise 3) and if they differ suggest an explanation.

2. In Lecture 5, we examined the following data from Kadhel et al, “Chlordecone exposure, length of gestation and risk of preterm birth” Am. J. Epi., Jan 2014

a) Calculate the OR for preterm birth in the highest exposure group (compared to the lowest)

b) Calculate the 95% confidence interval for your OR in (a)

c) Compare your results to the published Table 4 and comment.


3. In question 4 of Exercise 3, you extracted data from a paper reporting on a case-control study of prostate cancer. Use the first and last columns of the table you constructed in that exercise to calculate the OR of prostate cancer among those with 13 or more years of education (compared to those with 0-9 years as the reference group).

| | 0-9 years | 13+ years |
|---|---|---|
| cases | | |
| controls | | |

(a) OR=

(b) Calculate the 95% confidence interval for the OR.

(c) Comment on your results


Exercise 6. Associations and Confounding

Suppose data on 20,000 pregnancies has been collected in order to study the association between maternal smoking (3 to 5 months after conception) and the risk of the baby being SGA (small for gestational age):

Calculate the RR and OR of SGA in smokers vs. non-smokers: RR= ____________

OR= ____________

Now suppose that in addition to smoking status and SGA, we have collected two more variables: mother’s age (categorised as “young” or “old”) and whether the mother had preeclampsia during pregnancy.

For each variable state whether you think it is a potential confounder (explain).

(i) For the young mothers the 2 × 2 table of smoking vs. SGA is given on the left (fill in the table for old mothers on the right)

Calculate the OR for SGA in smokers vs. non-smokers for the young mothers (the OR for the old mothers and both RRs are given below):

RR_young = 2.070, RR_old = 2.088

OR_young = ___________, OR_old = 2.138


When stratifying on age, the RR becomes slightly stronger. Although the difference is not big, suggest relationships between the variables that could explain this.

Compute the Mantel-Haenszel OR if you consider this appropriate.

(ii) The table for women with preeclampsia is given (complete the non-preeclampsia table).

Calculate the OR for the association between smoking and SGA in mothers with preeclampsia and in mothers without preeclampsia, and compare to the overall OR.

Is the association weaker or stronger when stratifying on preeclampsia?

Can you suggest possible relationships between the variables that would explain this?

Compute the Mantel-Haenszel OR if you consider this appropriate.


Exercise 7:

The table below appeared in a letter in a recent issue of Epidemiology (July 2014). Explain whether these data provide evidence of (i) an association between salt use and cancer, (ii) confounding by smoking, (iii) confounding by BMI.


Exercise 8: Rates and Standardisation

Q1: From the table below from Kesavan et al. Am J Epidemiol 2010

a) Calculate the crude incidence rate of pancreatic cancer for men with BMI >=25 in each of the three exposure groups (i.e. tertiles of magnesium intake)

b) Using level 1 as the reference, calculate the IRR for level 2 and level 3

c) Compare your crude IRR to the adjusted values reported, and comment

d) For any one of your crude IRR values, construct a 95% confidence interval and interpret

Q2.

In a recent issue of International Journal of Epidemiology (2014, 949–961), the authors write in an annotated reference No. 4.

“Suppose on Jan 1st, 1936 there were 5000 persons under observation, none of whom are inoculated; that 300 are inoculated on April 1st, a further 600 on July 1st, and another 100 on Oct. 1st. At the end of the year there are, therefore, 1000 inoculated persons and 4000 still uninoculated. During the year there were registered 110 attacks amongst the inoculated persons and 890 amongst the uninoculated, a result apparently very favourable to inoculation”

(i) Show what simple (and inappropriate!) calculation gives this “favourable” result

(ii) Assuming that all the attacks happened at the end of the year, calculate the valid person-time in the two groups and compare the incidence rates.
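The two calculations in Q2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: every attack occurs at the end of the year (as part (ii) says), each person counts as uninoculated until the date of inoculation, and the fractions of the year are approximated as 0.75, 0.50 and 0.25 for inoculation on April 1st, July 1st and Oct. 1st. Variable names are illustrative.

```python
total_persons = 5000
attacks_inoculated = 110
attacks_uninoculated = 890

# (number inoculated, fraction of the year spent inoculated afterwards)
inoculation_waves = [(300, 0.75), (600, 0.50), (100, 0.25)]

# (i) Naive comparison: attacks divided by head counts at the end of the year.
inoculated_at_year_end = sum(n for n, _ in inoculation_waves)
uninoculated_at_year_end = total_persons - inoculated_at_year_end
naive_risk_inoculated = attacks_inoculated / inoculated_at_year_end
naive_risk_uninoculated = attacks_uninoculated / uninoculated_at_year_end

# (ii) Person-time comparison: time before inoculation counts as uninoculated.
person_years_inoculated = sum(n * frac for n, frac in inoculation_waves)
person_years_uninoculated = total_persons * 1.0 - person_years_inoculated
rate_inoculated = attacks_inoculated / person_years_inoculated
rate_uninoculated = attacks_uninoculated / person_years_uninoculated

print(naive_risk_inoculated, naive_risk_uninoculated)
print(person_years_inoculated, person_years_uninoculated)
print(rate_inoculated, rate_uninoculated)
```

Comparing the two printed pairs shows how much of the apparent benefit comes from the choice of denominator rather than from inoculation.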


Q3: In a paper in 1985, Boyce and Wessey report that the incidence of hip fractures in women aged 35 or more in Oxford was 35.4 per 10,000 (91 hip fractures among 25,698 women). The population distribution and annual incidence of hip fracture for men in Oxford is described in the following table:

| Age (yr) | Mid-year population | Rate per 10,000 |
|---|---|---|
| 35-54 | 14,217 | 1.1 |
| 55-64 | 4,303 | 6.5 |
| 65-74 | 2,695 | 6.7 |
| 75-84 | 1,100 | 21.8 |
| 85-94 | 164 | 48.8 |
| Total | 22,479 | 4.2 |

(a) Using the following population counts for women in Oxford, calculate the expected number of events in women and the standardised incidence ratio using men as the reference.

| Age (yr) | 35-54 | 55-64 | 65-74 | 75-84 | 85-94 | Total |
|---|---|---|---|---|---|---|
| No. | 10,309 | 5,376 | 5,558 | 3,400 | 1,055 | 25,698 |

(b) Test whether there is a significant difference between the incidence in men and women

Q4: Look at the paper “Stratification for Confounding – Part 2: Direct and Indirect Standardization” by Tripepi et al (Nephron Clin Pract 2010;116:c322–c325) and see that you understand how the authors obtain the numbers in Tables 2, 3 and 4. For the indirect standardization example, calculate the 95% confidence interval for the SMR using the simple method given in lecture 12 (slide 15).

Q5: A study in J Am Soc Nephrol in 2010 reported a survival advantage for patients whose kidney transplants have failed and who had the kidney removed (“allograft nephrectomy”). In Int J. Epi, 2014, Hanley and Foster point out that their analysis was inappropriate: in the comparison between patients with and without nephrectomy, the follow-up time was not handled correctly in computing incidence.

Read pages 949-950 of Hanley and Foster’s article and see that you understand the rates in Figure 2, and more generally, the important point that Hanley and Foster are making!


Kidney Disease and Population Health

Nephron Clin Pract 2010;116:c322–c325 DOI: 10.1159/000319591

Stratification for Confounding – Part 2: Direct and Indirect Standardization

Giovanni Tripepi (a), Kitty J. Jager (b), Friedo W. Dekker (b, c), Carmine Zoccali (a)

(a) CNR-IBIM, Clinical Epidemiology and Physiopathology of Renal Diseases and Hypertension of Reggio Calabria, Reggio Calabria, Italy; (b) ERA-EDTA Registry, Department of Medical Informatics, Academic Medical Center, University of Amsterdam, Amsterdam, and (c) Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands

Published online: July 28, 2010

Dr. Giovanni Tripepi, MSc, CNR-IBIM, Istituto di Biomedicina, Epidemiologia Clinica e Fisiopatologia delle Malattie Renali e dell’Ipertensione Arteriosa, c/o Euroline di Ascrizzi Vincenzo, Via Vallone Petrara No. 55/57, IT–89124 Reggio Calabria (Italy). Tel. +39 0965 397 010, Fax +39 0965 26879, E-Mail gtripepi@ibim.cnr.it

© 2010 S. Karger AG, Basel 1660–2110/10/1164–0322$26.00/0

Accessible online at: www.karger.com/nec

Key Words

Confounding · Direct standardization · Indirect standardization · Stratification

Abstract

Standardization is a method used to compare observed and expected rates of a given disease/outcome by removing the influence of factors that may confound the comparison. There are two major standardization methods: one is used when the ‘standard’ is the structure of a population (direct method) and the other when the ‘standard’ is a set of specific event rates (indirect method). Direct standardization is commonly used for large populations, while the indirect method is applied to populations of relatively small dimensions.

Copyright © 2010 S. Karger AG, Basel

Introduction

In a previous paper of this series, we focused on the Mantel-Haenszel method [1], a statistical technique that allows the calculation of an overall, unconfounded effect estimate of a given exposure for a specific disease/outcome by pooling stratum-specific relative risks (RRs) or odds ratios. In this paper, we describe another technique based on stratification [1], termed ‘standardization’, that is applied to compare observed and expected rates of a given disease/outcome by removing the influence of extraneous factors (confounders) [2]. Standardization is well suited for comparing the rate of a given outcome (for example mortality) in large populations that differ for important prognostic factors like age, gender and social economical status. In such circumstances, due to the presence of confounding factors, a crude comparison of mortality rates between the two populations may be misleading [2].

There are two major standardization methods: one is used when the available ‘standard’ is the structure of a reference population (direct method) and the other when the ‘standard’ is a set of specific event rates (indirect method). For example, when controlling for gender in two populations by using direct standardization, the external standard could be set as the gender distribution derived by the combination of the two populations, while in the indirect standardization the external standard is a series of gender-specific rates.

Direct Method of Standardization

Here we consider a hypothetical study investigating the mortality rate in two large populations of moderate drinkers (n = 80,000) and nondrinkers (n = 80,000). Age is a potential confounder [2] because (1) it is significantly


different in the two populations (the proportion of individuals aged ≥60 years is higher in moderate drinkers than in nondrinkers; table 1), (2) it is related to the outcome of interest (mortality) and because (3) there is no scientific evidence that ageing is in the potential causal pathway mediating the link between alcohol intake and death.

As a first step, the authors stratified the two populations according to age strata and calculated the stratum-specific proportion of deaths, separately in moderate drinkers (exposed) and nondrinkers (unexposed) (table 1).

Table 1. Death occurrence in moderate drinkers and nondrinkers according to age strata

| Age strata | Moderate drinkers (exposed): number of individuals | Moderate drinkers (exposed): number of deaths | Moderate drinkers (exposed): crude mortality rate (for 100,000 individuals) | Nondrinkers (unexposed): number of individuals | Nondrinkers (unexposed): number of deaths | Nondrinkers (unexposed): crude mortality rate (for 100,000 individuals) |
|---|---|---|---|---|---|---|
| <30 years | 25,000 | 50 | 200 | 30,000 | 60 | 200 |
| 30–60 years | 30,000 | 300 | 1,000 | 35,000 | 210 | 600 |
| ≥60 years | 25,000 | 500 | 2,000 | 15,000 | 360 | 2,400 |
| Total | 80,000 | 850 | 1,062 | 80,000 | 630 | 787 |

The crude mortality rate (deaths/100,000 individuals) is calculated in each stratum by the standard formula: (number of deaths/number of individuals) × 100,000 (or × 10^5).

Table 2. Observed and expected death rate in the overall population (moderate drinkers + nondrinkers)

| Age strata | Proportion of deaths in moderate drinkers | Proportion of deaths in nondrinkers | Total number of individuals in each age category (moderate drinkers + nondrinkers) | Age-adjusted number of deaths in 160,000 moderate drinkers | Age-adjusted number of deaths in 160,000 nondrinkers |
|---|---|---|---|---|---|
| <30 years | 0.002 | 0.002 | 25,000 + 30,000 = 55,000 | 0.002 × 55,000 = 110 | 0.002 × 55,000 = 110 |
| 30–60 years | 0.01 | 0.006 | 30,000 + 35,000 = 65,000 | 0.01 × 65,000 = 650 | 0.006 × 65,000 = 390 |
| ≥60 years | 0.02 | 0.024 | 25,000 + 15,000 = 40,000 | 0.02 × 40,000 = 800 | 0.024 × 40,000 = 960 |
| Total | | | 160,000 | Total expected deaths = 1,560 | Total expected deaths = 1,460 |

Table 3. Adjusted death rate in moderate drinkers and nondrinkers weighted for the mean age distribution

| Group | Adjusted death rate (for 100,000 individuals) |
|---|---|
| Moderate drinkers, when weighted for the mean age distribution of moderate drinkers and nondrinkers | (1,560/160,000) × 100,000 = 975 |
| Nondrinkers, when weighted for the mean age distribution of moderate drinkers and nondrinkers | (1,460/160,000) × 100,000 = 912 |

The crude mortality rate was 1,062 deaths/10^5 in moderate drinkers and 787 deaths/10^5 in nondrinkers. Therefore, the crude RR (that is, the ratio of death rates in the exposed and unexposed groups) is 1,062/787 = 1.35. This implies that the crude mortality rate is 35% higher in moderate drinkers than in nondrinkers. If moderate drinking is causally implicated in the risk of death, the relationship between moderate alcohol intake and mortality should be independent of age (the potential confounder). To control for the confounding effect of age, the authors used the direct method of standardization. To this end, they calculated the proportion of deaths in each age category, separately in moderate drinkers and nondrinkers (see second and third columns in table 2), and summed up the stratum-specific number of individuals in the overall (combined) populations (see fourth column in table 2).

Then the authors calculated the expected, stratum-specific number of deaths in a hypothetical population of 160,000 moderate drinkers and of 160,000 nondrinkers weighted for the mean age distribution of moderate drinkers and nondrinkers (see fifth and sixth columns). Summing up the stratum-specific expected number of deaths gave 1,560 expected deaths in moderate drinkers and 1,460 expected deaths in nondrinkers. The adjusted death rates are the mortality rates that the two populations would have experienced if they had had the same distribution of the confounding factor (table 3).

By using the age-adjusted death rates, the authors calculated the age-adjusted RR of death (moderate drinking vs. nondrinking) by the standard formula:

adjusted RR = 975/912 = 1.07.
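The crude and directly standardized calculations can be reproduced with a short Python sketch. It uses the stratum counts from table 1 and takes the combined population of the two groups as the standard, as in the text; the function and variable names are illustrative. Unrounded, the rates come out as 1,062.5, 787.5, 975 and 912.5 per 100,000, which the article reports as 1,062, 787, 975 and 912.

```python
# Table 1: age stratum -> (number of individuals, number of deaths)
drinkers = {"<30": (25_000, 50), "30-60": (30_000, 300), ">=60": (25_000, 500)}
nondrinkers = {"<30": (30_000, 60), "30-60": (35_000, 210), ">=60": (15_000, 360)}


def crude_rate(group, per=100_000):
    """Total deaths divided by total individuals, per 100,000."""
    persons = sum(n for n, _ in group.values())
    deaths = sum(d for _, d in group.values())
    return deaths / persons * per


# Standard population: the two groups combined, stratum by stratum.
standard = {age: drinkers[age][0] + nondrinkers[age][0] for age in drinkers}


def directly_standardized_rate(group, standard, per=100_000):
    """Apply the group's stratum-specific death proportions to the standard."""
    expected_deaths = sum(
        group[age][1] / group[age][0] * standard[age] for age in standard
    )
    return expected_deaths / sum(standard.values()) * per


crude_rr = crude_rate(drinkers) / crude_rate(nondrinkers)
adjusted_rr = (directly_standardized_rate(drinkers, standard)
               / directly_standardized_rate(nondrinkers, standard))

print(f"crude RR = {crude_rr:.2f}, age-adjusted RR = {adjusted_rr:.2f}")
```

Because both groups are weighted by the same standard age distribution, any remaining difference between the adjusted rates cannot be due to the age structure.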

After data adjustment by age, the RR of death changed toward the null hypothesis (that is, no independent effect of moderate drinking on the risk of death). Furthermore, by comparing age-adjusted (RR = 1.07) and unadjusted RRs (RR = 1.35) of death, the authors found a difference of more than 20%, a figure indicating a strong confounding effect of age. In this case, age engendered ‘positive confounding’ because it determined an overestimation of death risk associated with moderate alcohol intake. When using direct standardization for comparing two large populations, the statistical significance of the adjusted effect estimate (RR = 1.07) and the corresponding 95% confidence interval (CI) is not an issue because, due to the large sample size of the two populations, even a very modest risk increase (+7%) may be significant. The calculation of the 95% CI of the adjusted RR is important particularly when comparing samples of relatively small dimension (for details, see Chiang [3]).

In the renal field, the direct standardization method was used by Jager et al. [4] investigating whether the relative excess of cardiovascular over noncardiovascular mortality was higher in patients starting dialysis as compared to that in the general population in European countries. For a valid comparison, the higher age and higher percentage of males among dialysis patients had to be taken into account. To this end, age- and sex-specific mortality rates were calculated both for dialysis patients and the general population. Then, a weighted average over strata of age and sex was calculated, with the age and sex distribution of the general population in these European countries as ‘standard’. By this approach, the authors found that the cardiovascular/noncardiovascular mortality rates ratio in patients starting dialysis did not differ from that in the general population.

An important limitation of the direct method of standardization is that it is inefficient when the number of categories of the confounding variable is relatively high and the sample size is relatively low. In this instance, it is preferable to use the indirect method of standardization.

Indirect Method of Standardization

In the indirect method of standardization, instead of using the distribution of the confounding variable (for example age) in a standard population, we use a series of standard age-specific rates. In this way, the expected number of deaths is calculated for each stratum of age in the population of interest. Next, the ratio of observed to expected number of deaths is taken and reported as a standardized mortality ratio (SMR). Here we consider a hypothetical study including a random sample of 545 individuals in which the mortality rate was compared to that in the general, reference population (the standard death rate; table 4).

Table 4. Observed and expected death rate in a hypothetical sample of 545 individuals

| Age categories | Observed death cases | Total number of individuals in each age category | Observed proportion of deaths | Standard death rate | Expected death cases |
|---|---|---|---|---|---|
| 30–39 years | 16 | 180 | 0.0889 | 0.0669 | 0.0669 × 180 = 12 |
| 40–49 years | 22 | 140 | 0.1571 | 0.0948 | 0.0948 × 140 = 13 |
| 50–59 years | 25 | 135 | 0.1852 | 0.1209 | 0.1209 × 135 = 16 |
| 60–69 years | 19 | 90 | 0.2111 | 0.1370 | 0.1370 × 90 = 12 |
| Total | 82 | | | | 53 |


The study population was stratified according to age categories (from 30 to 69 years). In each age stratum, the authors reported the number of observed deaths, the total number of individuals, the observed proportion of deaths as well as the standard, age-specific, death rates derived from available tables of the reference population. The expected number of deaths in all age categories was calculated by multiplying the standard death rates by the number of individuals in each age stratum. By summing up the stratum-specific expected deaths, the authors found 53 total expected deaths that represent the death rate that the population under investigation would have experienced if it had had the same age-specific death rates as the reference population. The SMR is calculated by dividing the total number of observed deaths by the total number of expected deaths:

SMR = 82/53 = 1.55 (95% CI = 1.23–1.92)

where the 95% CI of the SMR was calculated by the standard formula (see appendix for mathematical details).

An SMR of 1.55 indicates that the death risk in the population under investigation is 55% higher than that expected if the population had experienced the same age-specific death rates as the reference population. This SMR of 1.55 is statistically significant because the corresponding 95% CI (1.23–1.92) does not include 1. This effect estimate is standardized, i.e. adjusted, for the confounding effect of age. In table 5, all possible interpretations of SMR values are given.
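The indirect standardization steps can be sketched in Python using the values from table 4. Names are illustrative. The sketch computes the expected deaths both ways: rounded per stratum, which reproduces the article's total of 53, and unrounded, which gives a slightly larger expected total and hence a slightly smaller SMR.

```python
# Table 4: (age category, observed deaths, individuals, standard death rate)
strata = [
    ("30-39 years", 16, 180, 0.0669),
    ("40-49 years", 22, 140, 0.0948),
    ("50-59 years", 25, 135, 0.1209),
    ("60-69 years", 19, 90, 0.1370),
]

observed_total = sum(obs for _, obs, _, _ in strata)

# Expected deaths: standard rate times the number of individuals per stratum.
expected_unrounded = sum(n * rate for _, _, n, rate in strata)
# The article rounds each stratum to a whole number before summing.
expected_rounded = sum(round(n * rate) for _, _, n, rate in strata)

smr_rounded = observed_total / expected_rounded
smr_unrounded = observed_total / expected_unrounded

print(observed_total, expected_rounded, round(expected_unrounded, 2))
print(round(smr_rounded, 2), round(smr_unrounded, 2))
```

Either way the conclusion is the same: more deaths were observed than the reference rates would predict for a sample with this age structure.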

The indirect standardization method was applied in a recent study in Taiwan investigating the SMR for urological cancers in herbalists [5]. The SMR for urological cancers was significantly higher in Chinese herbalists as compared to the general population (SMR = 3.1, 95% CI = 1.4–5.9), and this was also true for kidney cancer (SMR = 3.8, 95% CI = 1.4–8.3) and renal failure (SMR = 2.4, 95% CI = 1.4–3.8). The authors concluded that Chinese herbalists had a significantly higher risk for urological cancers as compared to the general population and emphasized the importance of safety assessments of Chinese herbs on a large scale.

Conclusions

Direct and indirect methods of standardization address confounding when comparing the rate of a given outcome (like mortality) between two populations that differ for important prognostic factors (like age, gender and social-economical status). These two methods are used for both descriptive and analytical purposes. By the direct method of standardization, we calculate adjusted mortality rates and adjusted RRs. The indirect method of standardization allows the SMR to be calculated, that is, the ratio between the observed number of deaths in a given population and the number of deaths expected assuming that the population under investigation experienced the same age-specific death rates as the reference population.

Appendix

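95% CI calculation for SMR:

\[
\text{Lower limit} = \frac{\text{Obs.}\left(1 - \dfrac{1}{9\,\text{Obs.}} - \dfrac{z}{3\sqrt{\text{Obs.}}}\right)^{3}}{\text{Exp.}}
\]

\[
\text{Upper limit} = \frac{(\text{Obs.}+1)\left(1 - \dfrac{1}{9\,(\text{Obs.}+1)} + \dfrac{z}{3\sqrt{\text{Obs.}+1}}\right)^{3}}{\text{Exp.}}
\]

where Obs. = number of observed deaths, Exp. = number of expected deaths and z = constant (1.96).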
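As a check on the appendix formula, the sketch below implements it in Python and applies it to the example in the text (82 observed and 53 expected deaths). The function name is illustrative; this form of the limits is commonly attributed to Byar's approximation, which is an identification made here rather than something stated in the article.

```python
import math


def smr_confidence_interval(observed, expected, z=1.96):
    """Approximate CI for an SMR from observed and expected death counts."""
    lower = (observed
             * (1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))) ** 3
             / expected)
    obs_plus_one = observed + 1
    upper = (obs_plus_one
             * (1 - 1 / (9 * obs_plus_one) + z / (3 * math.sqrt(obs_plus_one))) ** 3
             / expected)
    return lower, upper


lower, upper = smr_confidence_interval(observed=82, expected=53)
print(f"SMR = {82 / 53:.2f}, 95% CI = {lower:.2f} to {upper:.2f}")
```

With these inputs the limits round to 1.23 and 1.92, matching the interval quoted for the SMR of 1.55.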

Table 5. Interpretation of SMR values

| SMR | Interpretation |
|---|---|
| SMR > 1 | The death risk in the population under investigation is higher than that in the reference population |
| SMR < 1 | The death risk in the population under investigation is lower than that in the reference population |
| SMR = 1 | The death risk in the population under investigation is identical to that in the reference population |

References

1 Tripepi G, Jager KJ, Dekker FW, Zoccali C: Stratification for confounding. 1. The Mantel-Haenszel formula. Nephron Clin Pract 2010, in press.

2 Jager KJ, Zoccali C, Macleod A, Dekker FW: Confounding: what it is and how to deal with it. Kidney Int 2008; 73: 256–260.

3 Chiang CL: Standard error of the age adjusted death rate. Vital Stat Spec Rep 1961; 47: 275–285.

4 de Jager DJ, Grootendorst DC, Jager KJ, van Dijk PC, Tomas LMJ, Ansell D, Collart F, Finne P, Heaf JG, De Meester J, Wetzels JFM, Rosendaal FR, Dekker FW: Cardiovascular and non-cardiovascular mortality among patients initiating dialysis. JAMA 2009; 302: 1782–1789.

5 Yang HY, Wang JD, Lo TC, Chen PC: Increased mortality risk for cancers of the kidney and other urinary organs among Chinese herbalists. J Epidemiol 2009; 19: 17–23.


Education corner

Avoiding blunders involving ‘immortal time’

James A Hanley1 and Bethany J Foster1,2*

1 Department of Epidemiology, Biostatistics, and Occupational Health and 2 Department of Pediatrics, Montreal Children’s Hospital, Faculty of Medicine, McGill University, Montreal, QC, Canada

*Corresponding author. Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, 1020 Pine Avenue West, Montreal, Quebec, H3A 1A2, Canada.

Accepted 2 April 2014

As Groucho Marx once said ‘Getting older is no problem. You just have to live long enough’.
(Queen Elizabeth II, at her 80th birthday celebration in 2006)

This award proves one thing: that if you stay in the business long enough and if you can get to be old enough, you get to be new again.
(George Burns, on receiving an Oscar, at age 80, in 1976)

(Richard Burton died, a nominee 6 times, but sans Oscar, at 59. Burns lived to 100, so how much of the 41 years’ longevity difference should we credit to Burns’ winning the Oscar?)

Some time ago, while conducting research on U.S. presidents, I noticed that those who became president at earlier ages tended to die younger. This informal observation led me to scattered sources that provided occasional empirical parallels and some possibilities for the theoretical underpinning of what I have come to call the precocity-longevity hypothesis. Simply stated, the hypothesis is that those who reach career peaks earlier tend to have shorter lives.
(Stewart JH McCann. Personality and Social Psychology Bulletin 2001;27:1429–39)

Statin use in type 2 diabetes mellitus is associated with a delay in starting insulin.
(Yee et al. Diabet Med 2004;21:962–67)

Introduction

For almost two centuries, teachers have warned against errors involving what is now called ‘immortal time.’ Despite the warnings, and many examples of how to proceed correctly, this type of blunder continues to be made in a widening range of investigations. In some instances, the consequences of the error are less serious, but in others the false evidence has been used to support theories for social inequalities; to promote greater use of pharmaceuticals, medical procedures and medical practices; and to minimize occupational hazards.

We use a recent example to introduce this error. We then discuss: (i) other names for it, how old it is and who tried to warn against it; (ii) how to recognize it, and why it continues to trap researchers; and (iii) some statistical ways of dealing with denominators measured in units of time rather than in numbers of persons.

Example and commentary

Example

Patients whose kidney transplants (allografts) have failed must return to long-term dialysis. But should the failed allograft be removed or left in? To learn whether its removal ‘affects survival’, researchers¹ used the US Renal Data System to study ‘a large, representative cohort of [10 951] patients returning to dialysis after failed kidney transplant’. Some 1106, i.e. 32% of the 3451 in the allograft nephrectomy group, and 2679, i.e. 36% of the 7500 in the non-nephrectomy group, were identified as having died by the end of follow-up.

Patients in the two groups differed in many characteristics: to take into account a ‘possible treatment selection bias’, the authors constructed a propensity score for the likelihood of receiving nephrectomy during the follow-up. They used this together with other potential confounders to perform ‘multivariable extended Cox regression’. The main finding of these analyses was that ‘receiving an allograft nephrectomy was associated with a 32% lower adjusted relative risk for all-cause death (adjusted hazard ratio 0.68; 95% confidence interval 0.63 to 0.74)’.

© The Author 2014; all rights reserved. Published by Oxford University Press on behalf of the International Epidemiological Association

International Journal of Epidemiology, 2014, 949–961

doi: 10.1093/ije/dyu105

Advance Access Publication Date: 23 April 2014

In their discussion, the researchers suggest that their findings of ‘improved survival’ after allograft nephrectomy ‘challenge the traditional practice of retaining renal allografts after transplant failure’. The title of the article (‘Transplant nephrectomy improves survival following a failed allograft’) suggested causality. They emphasized the large representative sample and the extensive and sophisticated multivariable analyses, but they did caution that ‘as an observational study of clinical practice, their analysis remains susceptible to the effects of residual confounding and treatment selection bias’ and that ‘their results should be viewed in light of these methodologic limitations inherent to registry studies’. They suggested that a randomized trial to evaluate the intervention in an unbiased way would be appropriate. Similar concerns about residual confounding and selection bias, and the need for caution, were expressed in the accompanying editorial reiterating the limitations of the ‘retrospective interrogation of a database’.

Commentary

‘Residual confounding’ may be a threat, but both authors and editorialists overlooked a key aspect of the analysis, one that substantially distorted the comparison. The overlooked information is to be found in the statements that:

3451 received nephrectomy of the transplanted kidney during follow-up; the median time between return to dialysis [the time zero in the Cox regression] and nephrectomy was 1.66 yr (interquartile range 0.73 to 3.02 yr). (Paragraph 1 of Results section)

and that:

Overall, the mean follow-up was (only) 2.93 ± 2.26 yr. (Paragraph 3 of Results section)

From these and other statements in the report it would appear that, in their analyses, follow-up of both ‘groups’ began at the time of return to dialysis. The use of this time zero for the 3451 who had the failed allograft removed is neither appropriate nor logical. These patients could not benefit from its removal until after it had been removed; but, as the median of 1.66 years indicates, a large portion of their ‘follow-up’ was spent in the initial ‘failed graft still in place’ state, along with those who never underwent nephrectomy of their failed allograft.

Since the 3451 patients who ultimately underwent a nephrectomy (the ‘nephrectomy group’) had to survive long enough to do so (collectively, approximately 6700 patient-years, based on the reported quartiles of 0.73, 1.66 and 3.02 years), there were, by definition, no deaths in these 6700 pre-nephrectomy patient-years. In modern parlance, these 6700 patient-years were ‘immortal’. There was no corresponding ‘immortality’ requirement for entry into the ‘non-nephrectomy group’. Indeed, all 10 951 patients returning to dialysis after failed kidney transplant began follow-up with their ‘failed graft in place’. Some 7500 of these remained in that initial state until their death (for some, death occurred quite soon, before removal could even be contemplated) or the end of follow-up, whereas the other 3451 spent some of their follow-up time in that initial state and then changed to the ‘failed graft no longer in place’, i.e. post-nephrectomy, state.
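The bookkeeping described above, in which every patient starts in the ‘failed graft in place’ state and only some later move to the post-nephrectomy state, can be sketched in a few lines of Python. This is a minimal illustration and not the analysis used in the article under discussion: the helper name `split_person_time` and the three example patients are hypothetical, and all times are in years since return to dialysis.

```python
from typing import Optional, Tuple


def split_person_time(follow_up: float,
                      time_of_nephrectomy: Optional[float]) -> Tuple[float, float]:
    """Split one patient's follow-up into (years with failed graft in place,
    years in the post-nephrectomy state). Hypothetical helper, for illustration."""
    if time_of_nephrectomy is None or time_of_nephrectomy >= follow_up:
        # The patient never changed state during follow-up.
        return follow_up, 0.0
    # Time before the nephrectomy belongs to the initial state, not to the
    # 'nephrectomy group': nobody counted as a nephrectomy patient could
    # have died during it.
    return time_of_nephrectomy, follow_up - time_of_nephrectomy


# Hypothetical patients: (total follow-up, time of nephrectomy or None)
patients = [(0.5, None), (4.0, 1.5), (3.0, None)]

py_in_place = sum(split_person_time(f, t)[0] for f, t in patients)
py_post = sum(split_person_time(f, t)[1] for f, t in patients)
print(py_in_place, py_post)
```

In this toy example the second patient contributes 1.5 years to the initial state and 2.5 years to the post-nephrectomy state, rather than 4.0 years to a ‘nephrectomy group’.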

How big a distortion could the misallocation of these 6700 patient-years produce? The article does not have sufficient information to re-create the analyses exactly. Figures 1 and 2 show a simpler hypothetical dataset which we constructed to match the reported summary statistics quite closely. It was created assuming no variation in mortality rates over years of follow-up or between the person-years lived in the two states. The ‘virtual’ intervention was set up ‘retroactively’ and was limited to the dataset itself, rather than to real individuals, and so could not have affected (other than randomly) the mortality rates in the person-years lived in each state.
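A dataset of this general kind can be simulated to see the distortion appear even when the ‘intervention’ does nothing. The sketch below is not the authors' construction: it assumes exponential death times with a constant rate of 0.118 per year, uniform administrative censoring over 8 years, and an independent random candidate switch time, all of which are illustrative choices, as are the function and variable names.

```python
import random


def simulate(n=10951, death_rate=0.118, switch_rate=0.15,
             max_censor=8.0, seed=2014):
    """Simulate follow-up in which the death rate is constant and does NOT
    depend on the 'intervention'. Returns rates per person-year under the
    misallocated and the proper allocation of pre-'intervention' time."""
    rng = random.Random(seed)
    deaths = {"ever": 0, "never": 0}
    py_total = {"ever": 0.0, "never": 0.0}  # follow-up from time zero, by group
    py_pre = 0.0                            # 'immortal' time before switching
    for _ in range(n):
        t_death = rng.expovariate(death_rate)
        t_censor = rng.uniform(0.0, max_censor)
        follow_up = min(t_death, t_censor)
        died = t_death <= t_censor
        # Candidate switch time, independent of survival; it only takes
        # effect if the patient is still under follow-up at that time.
        t_switch = rng.expovariate(switch_rate)
        group = "ever" if t_switch < follow_up else "never"
        deaths[group] += died
        py_total[group] += follow_up
        if group == "ever":
            py_pre += t_switch
    misallocated = {
        "intervention": deaths["ever"] / py_total["ever"],
        "non_intervention": deaths["never"] / py_total["never"],
    }
    proper = {
        "post_intervention": deaths["ever"] / (py_total["ever"] - py_pre),
        "initial_state": deaths["never"] / (py_total["never"] + py_pre),
    }
    return misallocated, proper


misallocated, proper = simulate()
print("misallocated:", {k: round(100 * v, 1) for k, v in misallocated.items()})
print("proper:      ", {k: round(100 * v, 1) for k, v in proper.items()})
```

Under these assumptions the properly allocated rates should both sit near the built-in 11.8 per 100 person-years, while the misallocated analysis shows a spuriously low rate in the ‘intervention’ group and a spuriously high one in the other.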

Figure 2A shows that even though the data were generated to produce the same mortality rate of 11.8 per 100 PY (person-years) in the person-years in the initial and post-‘intervention’ states, the inappropriate type of analysis used in the paper, applied to these hypothetical data, would have resulted in a much lower rate (6.4) in the ‘intervention’ group and a much higher one (17.1) in the ‘non-intervention’ group. The reason is that none of the 1031 deaths post-‘intervention’ could have occurred, and none of them did occur, in the 6732 (immortal) pre-‘intervention’ PY that are included in the denominator input to the rate of 6.4: logically, the 1031 post-‘intervention’ deaths only occurred in the post-‘intervention’ PY. And conversely, the 2759 deaths occurred not in 16 096 PY, but rather in the much larger denominator of 16 096 + 6732 = 22 828 PY lived in the initial state. The omission of the 6732 PY from the denominator input led to the rate, higher than it should have been, of 17.1 deaths/100 PY. Indeed it was because of these (misplaced) immortal 6732 PY they had already survived that the 3451 patients got to have the ‘intervention’; in other words, it may not have been that they lived longer because they underwent the ‘intervention’, but rather that they underwent the ‘intervention’ because they survived

International Journal of Epidemiology, 2014, Vol. 43, No. 3

Figure 2. Mortality rates and rate ratios produced by the (A) mis- and (B) proper allocation of pre-‘intervention’ patient years. As explained in Figure 1, the hypothetical data for the 10 951 patients were constructed to have an average mortality rate of 3785 deaths in (10 951 × 2.93 = 32 086) patient-years (PY), i.e. 11.8 deaths per 100 PY (as in the actual study), but with no variation over years of follow-up, or between states (no, or pre-‘intervention’ (white background) and post-‘intervention’ (pink background)). Indeed, the selection of those who changed states (from white to pink polygon, in B) was made at random, and retroactively. The time location (relative to when the allograft failed) of each death is indicated by a black dot. In B, upper panel, the number being followed at any time is smaller than 3451 because some who had received the ‘intervention’ were already dead before the last ones received it.
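The rates quoted in the text for Figure 2 can be re-derived from the counts given there. The sketch below is an arithmetic check only; the post-‘intervention’ person-years are not stated directly, so they are assumed here to be the caption's total of 32 086 PY minus the 22 828 PY lived in the initial state, and the variable names are illustrative.

```python
# Counts quoted in the text for the hypothetical dataset.
deaths_post = 1031      # deaths after the 'intervention'
deaths_initial = 2759   # deaths among those who never had the 'intervention'
py_immortal = 6732      # pre-'intervention' PY of the 3451 who later switched
py_never = 16096        # PY of those who never switched

# All person-years actually lived in the initial state.
py_initial = py_never + py_immortal  # 22 828

# ASSUMPTION: total follow-up taken as 32 086 PY (figure caption), so the
# post-'intervention' PY are whatever remains. Not stated in the text.
py_post = 32086 - py_initial


def per_100_py(deaths, person_years):
    """Mortality rate expressed as deaths per 100 person-years."""
    return 100 * deaths / person_years


# (A) Misallocation: immortal PY credited to the 'intervention' group.
mis_intervention = per_100_py(deaths_post, py_post + py_immortal)
mis_non_intervention = per_100_py(deaths_initial, py_never)

# (B) Proper allocation: immortal PY returned to the initial state.
proper_post = per_100_py(deaths_post, py_post)
proper_initial = per_100_py(deaths_initial, py_initial)

print(round(mis_intervention, 1), round(mis_non_intervention, 1))  # 6.4 17.1
print(round(proper_post, 1), round(proper_initial, 1))
print(round(mis_intervention / mis_non_intervention, 2),
      round(proper_post / proper_initial, 2))
```

With the immortal person-years moved back to the initial state, both rates come out close to the 11.8 per 100 PY built into the data, and the rate ratio moves from well below 1 to near 1.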
