Statistics 101. Why statistics ? To understand studies in clinical journals. To design and analyze clinical research studies. To be better able to explain

Statistics 101

Why statistics?

• To understand studies in clinical journals.

• To design and analyze clinical research studies.

• To be better able to explain epidemiologic research to patients.

• To answer questions on board examinations.

Types of Clinical Research Studies

• Cohort: all patients have some condition or something in common (e.g., healthy and living in Framingham, MA)

• Case-Control: cases have some condition; controls do not

– Often part of a cohort study, in which controls are ‘matched’ with cases for age, gender, and sometimes other variables such as date of admission or date of encounter

• Randomized, placebo-controlled treatment trial: all patients have the condition

– May be unblinded, single blinded or double blinded

• Randomized, active-treatment controlled trial: all patients have the condition

– Often a phase 3 trial

• Meta-analysis: multiple studies of the same condition, although the definition of the condition may vary from study to study

Types of Variables

CONTINUOUS

– AGE

– BP

– CRP

– AST, CK, glucose, etc

– HEIGHT

– WEIGHT

– BMI

– Etc.

CATEGORICAL

– GENDER

– OBESE

– CURE

– MI

– RACE

– OLD vs YOUNG

– Etc.

Basic Statistical Terms

• Range: the two extreme values (min and max)

• Mean: the average value (uses all values)

• Median: the middle value (ignores extreme values), which divides the population into two subgroups

• Quartiles: divide all values into 4 groups

– Tertiles, Quintiles, Percentiles

• Standard deviation (SD) about the mean: measures the degree of difference among all values (uses all values)

SD = $\sqrt{\sum (\text{difference from the mean})^2 / (n-1)}$

A simple example of standard deviation

Values (n=5)    Difference from mean, d    Difference squared, $d^2$
12              2                          4
10              0                          0
5               5                          25
15              5                          25
8               2                          4

Mean = 10

Median = ?

$\sum d^2 = 58$

$\sum d^2 / (n-1) = 58/4 = 14.5$

$\sqrt{14.5} = 3.8$

SD = 3.8
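The arithmetic in this table can be checked with a few lines of Python. This is a minimal sketch that uses only the five values from the slide; the variable names are illustrative.

```python
import math
import statistics

# The five values from the slide
values = [12, 10, 5, 15, 8]

mean = statistics.mean(values)

# Sum of squared differences from the mean
sum_sq = sum((v - mean) ** 2 for v in values)

# Divide by n - 1 (sample variance), then take the square root
variance = sum_sq / (len(values) - 1)
sd = math.sqrt(variance)

print("mean:", mean)
print("sum of squared differences:", sum_sq)
print("variance:", variance)
print("SD:", round(sd, 1))

# The standard library gives the same sample SD directly
print("statistics.stdev:", round(statistics.stdev(values), 1))

# The median is the middle value after sorting
print("median:", statistics.median(values))
```

Dividing by n - 1 rather than n is the sample (not population) standard deviation, which is what the slide's formula uses.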

Serum [Na+] in 135 normals

[Figure: serum Na (mM; y-axis, 134 to 146) plotted against subject number (x-axis, 0 to 160)]

Mean, 140; median 140; range, 135-145 mM; standard deviation 2

The normal (bell-shaped) distribution

• Imagine 2 curves with the same mean but different SDs (one wider and less precise; the other narrower and more precise)

– Confidence intervals will differ

• Now imagine two curves with different means and standard deviations from this curve

– Statistical tests are designed to tell us to what extent these different curves could have occurred by chance

[Figure: bell-shaped curve; x-axis, standard deviations (SD) from the mean; y-axis, n]

95% of values are within 1.96 SD of the mean
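The 1.96 SD rule can be checked numerically. The sketch below uses Python's standard-library `NormalDist`; the helper names are illustrative, and applying the rule to the serum Na values (mean 140, SD 2) assumes those values are normally distributed.

```python
from statistics import NormalDist

def coverage_within(k_sd):
    """Fraction of a normal distribution lying within k_sd SDs of the mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(k_sd) - std_normal.cdf(-k_sd)

def central_range(mean, sd, k_sd=1.96):
    """Interval mean +/- k_sd * sd."""
    return mean - k_sd * sd, mean + k_sd * sd

print("within 1.96 SD:", round(coverage_within(1.96), 3))

# Serum Na example: mean 140 mM, SD 2 mM (normality is an assumption here)
low, high = central_range(140, 2)
print("central 95% range (mM):", round(low, 2), "to", round(high, 2))
```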

Some important statistical concepts

• Confidence intervals (usually reported as 95% CI)

• Number needed to treat (or harm)

• Absolute and relative risk or benefit reductions (or increases)

• 2-by-2 tables (Chi square, Fisher exact, Mantel Haenszel, others)

• Odds or hazard ratios

• Type 1 and 2 errors (Statistics 102)

• Estimating sample size needed for a study (Statistics 102)

• Pre- and post-test probabilities and likelihood ratios (Statistics 102)

Ann Int Med 2009: 150: JC6-16

95% Confidence interval (CI): Example 1

H. pylori eradication/NSAID study with outcome of ulcer or no ulcer (categorical outcome):

5 of 51 (10%, or .10) Hp+ pts. who received antibiotics got ulcers when exposed to NSAID.

15 of 49 (31%, or .31) Hp+ pts. who did not receive antibiotics got ulcers when exposed to NSAID.

What is the probability that this difference in outcome occurred by chance and not because of the antibiotics?

Lancet 2002; 359:9-13.

95% CIs

The proportions, p1 and p2, of patients who got ulcers in the 2 groups are estimates of the true rates. From each estimate we can be 95% confident that the actual rate ranges from A to B, with the estimate (p1 or p2) at the center of the interval from A to B. A and B are the 95% confidence limits.

[Figure: number line with p1 centered between A and B; A→B is the 95% confidence interval]

95% Confidence interval (CI)

To calculate the 95% CI for p (i.e., A and B), use this formula:

$p \pm 1.96\sqrt{p(1-p)/n}$

The larger the n, which is in the denominator, the narrower (more precise) the CI

5 of 51 (p1=10%, or .10) of the antibiotic group got ulcers when exposed to NSAID for a fixed time

– 95% CI = $.10 \pm 1.96\sqrt{(.1)(.9)/51}$ = .10 ± .08 = [.02, .18], or [2%, 18%]

15 of 49 (p2=31%, or .31) of the placebo group got ulcers when exposed to NSAID for a fixed time

– 95% CI = $.31 \pm 1.96\sqrt{(.31)(.69)/49}$ = .31 ± .13 = [.18, .44], or [18%, 44%]

Note: the two 95% CIs do not overlap, which means that the difference is unlikely to be due to chance. But is the ARR significant?
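The two intervals above can be reproduced in a few lines. This is a minimal sketch of the formula on the slide (the normal-approximation interval for a proportion); the function name `proportion_ci` is illustrative, and unrounded proportions are used, so the last digit can differ slightly from hand calculations with rounded values.

```python
import math

def proportion_ci(events, n, z=1.96):
    """95% CI for a proportion: p +/- z * sqrt(p * (1 - p) / n)."""
    p = events / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Antibiotic group: 5 ulcers in 51 patients
low_abx, high_abx = proportion_ci(5, 51)
print("antibiotics:", round(low_abx, 2), "to", round(high_abx, 2))

# No-antibiotic group: 15 ulcers in 49 patients
low_ctl, high_ctl = proportion_ci(15, 49)
print("no antibiotics:", round(low_ctl, 2), "to", round(high_ctl, 2))
```

Because n sits in the denominator under the square root, quadrupling the sample size halves the width of the interval.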

Absolute risk reduction (ARR) (and its 95% CI)

• The ARR with antibiotics was 31% minus 10%, or 21%.

• The 95% CI of the ARR = $21\% \pm 1.96\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$ = 21% ± 15%, or [6%, 36%].

• The ARR with antibiotics is somewhere between 6% and 36%, with 95% confidence.

• This CI does not include zero, so the ARR is unlikely to be due to chance.
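As a check on the ARR interval, here is a minimal sketch of the same formula applied to the ulcer counts (5 of 51 with antibiotics, 15 of 49 without). The function name `arr_ci` is illustrative.

```python
import math

def arr_ci(events_control, n_control, events_treated, n_treated, z=1.96):
    """Absolute risk reduction and its 95% CI (normal approximation)."""
    p_control = events_control / n_control
    p_treated = events_treated / n_treated
    arr = p_control - p_treated
    # Variances of the two independent proportions add
    se = math.sqrt(p_control * (1 - p_control) / n_control
                   + p_treated * (1 - p_treated) / n_treated)
    return arr, arr - z * se, arr + z * se

arr, low, high = arr_ci(15, 49, 5, 51)
print("ARR:", round(arr, 2))
print("95% CI:", round(low, 2), "to", round(high, 2))

# The interval excludes zero, so the reduction is unlikely to be due to chance
print("excludes zero:", low > 0)
```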

Number needed to treat (NNT)

• If absolute risk reduction (ARR) = 31% − 10% = 21%, the number needed to treat = 1/ARR = 1/.21 ≈ 5.

• Number needed to harm is the same concept as number needed to treat, except that the intervention caused harm rather than good

– e.g., how many patients need to be treated with antibiotics to produce one drug rash

• Easy to calculate 95% CI of NNT

• http://www.graphpad.com/quickcalcs/index.cfm

Example: A new protease inhibitor is tested in chronic hepatitis C, genotype 1. The new therapy added to the standard therapy (interferon alpha/ribavirin), or the standard therapy alone, is randomly given to 200 patients for 48 weeks. Sustained viral response (SVR) rates were as follows:

                          SVR    No SVR
Standard Rx (n=101)        50        51
New + standard (n=99)      83        16

What is the N needed to treat to achieve 1 additional SVR?

Number (n) needed to treat (NNT)

$\text{NNT} = \dfrac{1}{(\text{SVR, new} / \#\,\text{new}) - (\text{SVR, control} / \#\,\text{control})}$

$\text{NNT} = \dfrac{1}{(83/99) - (50/101)} = \dfrac{1}{.343} \approx 3$

Note that the denominator, .343 (34.3%), is the absolute risk reduction (ARR). NNT = 1/ARR.

Using http://www.graphpad.com/quickcalcs/index.cfm

95% CI of ARR = 0.222 to 0.465.

95% CI of NNT = 2.2 to 4.5.
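The same numbers can be approximated without the online calculator. The sketch below assumes the normal-approximation interval for a difference in proportions and inverts its limits to get the NNT interval; the method the calculator uses is not stated on the slide, and the function name `nnt_with_ci` is illustrative.

```python
import math

def nnt_with_ci(events_new, n_new, events_control, n_control, z=1.96):
    """NNT = 1 / (difference in response rates), with an approximate 95% CI.

    The CI of the NNT is taken as the reciprocals of the CI limits of the
    difference, which is valid only when that CI does not include zero.
    """
    p_new = events_new / n_new
    p_control = events_control / n_control
    diff = p_new - p_control
    se = math.sqrt(p_new * (1 - p_new) / n_new
                   + p_control * (1 - p_control) / n_control)
    diff_low, diff_high = diff - z * se, diff + z * se
    if diff_low <= 0:
        raise ValueError("CI of the difference includes zero; NNT CI undefined")
    return 1 / diff, (diff_low, diff_high), (1 / diff_high, 1 / diff_low)

# Hepatitis C example: 83 of 99 SVR (new + standard) vs 50 of 101 (standard)
nnt, diff_ci, nnt_ci = nnt_with_ci(83, 99, 50, 101)
print("NNT:", round(nnt, 1), "-> round up to", math.ceil(nnt))
print("95% CI of difference:", round(diff_ci[0], 3), "to", round(diff_ci[1], 3))
print("95% CI of NNT:", round(nnt_ci[0], 1), "to", round(nnt_ci[1], 1))
```

Note that the larger limit of the difference gives the smaller limit of the NNT, which is why the limits are swapped when inverting.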


RRR

• Relative Risk Reduction (RRR) = ARR/risk with placebo.

• In this example, RRR = 21%/31% = 68%.

– Treat 1,000 pts. with NSAID → 310 ulcers (31%)

– Treat 1,000 pts. with NSAID + antibiotics → 100 ulcers (10%)

– Antibiotic use prevented 210 ulcers (210/310 = 68% = RRR)

– Antibiotic use reduced ulcers from 310 to 100, or to 32% of expected, a RRR of 68%.

• Note: Length of exposure to NSAID in this study was identical in the 2 groups. If two groups are not followed for an identical time, as is often the case in trials, outcomes may be higher in the group followed longer, and thus events need to be expressed per unit of time (e.g., events per 100 patient-years).
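The RRR arithmetic for the ulcer example can be written out as a short sketch. It uses the rounded risks from the slide (31% and 10%); the function name is illustrative.

```python
def relative_risk_reduction(risk_control, risk_treated):
    """RRR = ARR / risk in the control (placebo) group."""
    arr = risk_control - risk_treated
    return arr / risk_control

risk_control = 0.31   # ulcers with NSAID alone
risk_treated = 0.10   # ulcers with NSAID + antibiotics

rrr = relative_risk_reduction(risk_control, risk_treated)
print("RRR:", round(rrr * 100), "%")

# Same calculation expressed per 1,000 patients treated
per_1000_control = round(risk_control * 1000)
per_1000_treated = round(risk_treated * 1000)
print("ulcers prevented per 1,000:", per_1000_control - per_1000_treated)
```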

Example 2: VTE or no VTE (categorical outcome)

14 of 255 (p1=5.5%, or .055) patients with VTE switched to low-intensity warfarin developed another VTE

– 95% CI = [2.6%, 8.4%]

37 of 253 (p2=14.6%, or .146) switched to placebo developed another VTE

– 95% CI = [10.3%, 18.9%]

Is this 9.1% difference in VTE likely to be due to chance?

New Engl. J. Med. 2003; 348: 1425-1434

Example 3: Chi Square/Fisher Exact Tests (used for categorical outcomes)

• A new treatment for colitis is compared to the standard treatment in 245 patients.

• 120 patients are randomized to the new treatment and 125 to the standard treatment.

• 90 patients given the new treatment go into remission (75%) and 30 (25%) do not.

• 75 patients given the standard treatment go into remission (60%) and 50 (40%) do not.

• Is this a significant improvement in outcome, or to what extent could this have been due to chance? Let’s vote!

Step 1: standard 2X2 table

               REMIT    NO REMIT
New Rx           a         b        a+b
Standard Rx      c         d        c+d
                a+c       b+d       a+b+c+d = n = total patients in study

Enter the data from our study

               REMIT       NO REMIT
New Rx:        90 (a)      30 (b)      120 (a+b)
Standard Rx:   75 (c)      50 (d)      125 (c+d)
               165 (a+c)   80 (b+d)    245 (a+b+c+d) = n

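Calculate chi square ($\chi^2$) by plugging the numbers into a handheld or online calculator

$\chi^2 = \dfrac{n\,(|ad-bc| - n/2)^2}{(a+b)(c+d)(a+c)(b+d)}$

$\chi^2$ = 6.264 (p=0.0123)

Note: the n/2 term is the Yates continuity correction. The value 6.264 corresponds to the formula without that term; with the correction, these data give a $\chi^2$ of about 5.60 (p of about 0.018).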


Fisher exact test, p=0.0143
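Both tests can be sketched with the Python standard library alone. The chi-square function below follows the formula above, with the n/2 continuity term optional, and the Fisher function sums the hypergeometric probabilities of all tables no more likely than the observed one (one common two-sided definition, assumed here; the slide does not say which one was used). Function names are illustrative.

```python
import math

def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square statistic and p value (1 degree of freedom) for a 2x2 table."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction: subtract n/2 before squaring
        diff = max(diff - n / 2, 0.0)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of chi-square with 1 df: P(|Z| > sqrt(stat))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value: sum of tables as or less likely than observed."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability that cell a equals k, margins fixed
        return math.comb(row1, k) * math.comb(row2, col1 - k) / denom

    p_observed = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    return sum(p for p in (prob(k) for k in range(k_min, k_max + 1))
               if p <= p_observed * (1 + 1e-9))

# Colitis example: a=90, b=30, c=75, d=50
stat, p = chi_square_2x2(90, 30, 75, 50)
print("chi-square (uncorrected):", round(stat, 3), "p =", round(p, 4))

stat_y, p_y = chi_square_2x2(90, 30, 75, 50, yates=True)
print("chi-square (Yates):", round(stat_y, 3), "p =", round(p_y, 4))

print("Fisher exact, two-sided p =", round(fisher_exact_two_sided(90, 30, 75, 50), 4))
```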


We could also have calculated the odds ratio (OR) for a remission:

New Rx: a=90, b=30

Standard Rx: c=75, d=50

odds ratio = ad/bc = 4,500/2,250 = 2

But this odds ratio of 2 could have occurred by chance. We can calculate the 95% CI of the odds ratio to see whether the CI includes 1 or not. If not, it favors the new treatment with >95% confidence.

95% CI of the odds ratio (OR)

• $\ln(95\%\ \text{CI}) = \ln \text{OR} \pm 1.96\sqrt{1/a + 1/b + 1/c + 1/d}$

• The OR = 2.00, and so ln 2.00 = 0.6931 (natural log, base e ≈ 2.72)

• With a=90, b=30, c=75, d=50: 1/a + 1/b + 1/c + 1/d = 0.0778, its square root is 0.2789, and 1.96 × 0.2789 = 0.5466

• Thus ln 95% CI = 0.6931 ± 0.5466 = 0.1465, 1.2397.

• To find the CI, we need the antiln of 0.1465 and of 1.2397.

• Antiln 0.1465 = $e^{0.1465}$ = 1.16; antiln 1.2397 = $e^{1.2397}$ = 3.45. 95% CI = 1.16, 3.45.

• Thus, the odds ratio for a remission with the new treatment is 2.00 (95% CI = 1.16, 3.45).

• As this CI does not include 1.00, the difference is unlikely to be due to chance and is significant at the 0.05 level.
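The log-scale formula for the CI of the odds ratio can be evaluated directly from the four cell counts. This is a minimal sketch; the function name `odds_ratio_ci` is illustrative.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with a 95% CI computed on the natural-log scale."""
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log)
    upper = math.exp(log_or + z * se_log)
    return odds_ratio, lower, upper

# Colitis example: a=90, b=30, c=75, d=50
# Margin on the log scale: 1.96 * sqrt(1/90 + 1/30 + 1/75 + 1/50), about 0.547
odds_ratio, lower, upper = odds_ratio_ci(90, 30, 75, 50)
print("OR:", round(odds_ratio, 2))
print("95% CI:", round(lower, 2), "to", round(upper, 2))

# The difference is significant at the 0.05 level if the CI excludes 1
print("excludes 1:", lower > 1 or upper < 1)
```

Working on the log scale keeps the interval symmetric around ln OR, so the back-transformed interval is asymmetric around the OR itself.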
