How to design and interpret controlled clinical trials

“The dark side of the moon”
“How to” session, ESH, June 2005

Andreas Pittaras, MD

E. Freis (1912-2005)

The father of the first multicenter, double-blinded, randomized trial of cardiovascular drugs, the VA Cooperative Study on Antihypertensive Agents

• 10,000 new randomized trials every year
• >350,000 trials
• General internists would need to read 20 articles a day all year round to maintain present knowledge
• Systematic reviews and guidelines reduce this problem

“The aim of science (…clinical trial) is not to open a door to endless wisdom, but to put a limit to endless error”
- Bertolt Brecht

Do we wear the same eyeglasses?

Clinical Trials

Clinical Studies: Essential Questions

• Was the study original?
• Whom is the study about?
• Was the design of the study sensible?
• Was systematic bias avoided or minimized?
• Was the study large enough, and continued for long enough, to make the results credible?


• Was the study original?

- Is there any similar study?
- Is this study bigger, continued for longer, or otherwise more substantial than the previous one(s)?
- Is the methodology of this study any more rigorous (in particular, does it address any specific methodological criticisms of previous studies)?
- Will the numerical results of this study add significantly to a meta-analysis of previous studies?
- Is the population that was studied different in any way (has the study looked at different ages, sexes, or ethnic groups than previous studies)?
- Is the clinical issue addressed of sufficient importance, and is there sufficient doubt in the minds of the public or key decision makers, to make new evidence “politically” desirable even when it is not strictly scientifically necessary?


• Whom is the study about?

- How were the subjects recruited? Advertisement in a local newspaper, primary care, veterans, homeless people, etc.
- Who was included in the study? Coexisting illness, local language, other medication, illiterate people, etc. (The results of studies of new drugs in 23-year-old healthy male volunteers will not be applicable to the average elderly woman.)
- Who was excluded from the study? A study may be restricted to patients with moderate or severe CHF, which could lead to false conclusions about mild CHF. Hospital outpatient studies have a different disease spectrum from primary care.
- Were the subjects studied in real-life circumstances? If not, there is doubt about the applicability of the findings to your own practice.


• Was the design of the study sensible?

- What specific intervention or other maneuver was being considered, and what was it being compared with?
- It is tempting to take published statements at face value, but authors frequently misrepresent (usually subconsciously rather than deliberately) what they actually did, and they overestimate its originality and potential importance.
- Examples of problematic descriptions in the methods section of a clinical trial:

Each example gives what the authors said, what they should have said (or should have done), and what the problem is an example of.

1. What the authors said: "We measured how often GPs ask patients whether they smoke."
   What they should have said: "We looked in patients' medical records and counted how many had had their smoking status recorded."
   An example of: Assumption that medical records are 100% accurate.

2. What the authors said: "We measured how doctors treat low back pain."
   What they should have said: "We measured what doctors say they do when faced with a patient with low back pain."
   An example of: Assumption that what doctors say they do reflects what they actually do.

3. What the authors said: "We compared a nicotine-replacement patch with placebo."
   What they should have said: "Subjects in the intervention group were asked to apply a patch containing 15 mg nicotine twice daily; those in the control group received identical-looking patches."
   An example of: Failure to state dose of drug or nature of placebo.

4. What the authors said: "We asked 100 teenagers to participate in our survey of sexual attitudes."
   What they should have said: "We approached 147 white American teenagers aged 12-18 (85 males) at a summer camp; 100 of them (31 males) agreed to participate."
   An example of: Failure to give sufficient information about subjects. (Note in this example the figures indicate a recruitment bias towards females.)

5. What the authors said: "We randomized patients to either 'individual care plan' or 'usual care'."
   What they should have said: "The intervention group were offered an individual care plan consisting of ...; control patients were offered ...."
   An example of: Failure to give sufficient information about intervention. (Enough information should be given to allow the study to be repeated by other workers.)

6. What the authors said: "To assess the value of an educational leaflet, we gave the intervention group a leaflet and a telephone helpline number. Controls received neither."
   What they should have done: If the study is purely to assess the value of the leaflet, both groups should have been given the helpline number.
   An example of: Failure to treat groups equally apart from the specific intervention.

7. What the authors said: "We measured the use of vitamin C in the prevention of the common cold."
   What they should have done: A systematic literature search would have found numerous previous studies on this subject [14].
   An example of: Unoriginal study.


• What outcome was measured, and how?

- If you had an incurable disease and were testing a new drug, you would measure the efficacy of the drug in terms of whether it made you live longer (and perhaps whether life was worth living given your condition and any side effects of the medication).
- The measurement of symptomatic effects (pain), functional effects (mobility), psychological effects (anxiety), or social effects (inconvenience) of an intervention has even more problems.
- What is important in the eyes of the doctor may not be valued so highly by the patient, and vice versa.


• Was systematic bias avoided or minimized?

The aim: groups as similar as possible except for the particular difference being examined. Both groups should:
- Receive the same explanations
- Have the same contacts with health professionals
- Be assessed the same number of times
- Be assessed using the same outcome measures

Different study designs to reduce systematic bias
- Randomized controlled trials
- Non-randomized controlled clinical trials
- Cohort studies
- Case-control studies

Randomized double-blind controlled trials
- The “gold standard”
- The two treatments are investigated concurrently
- Allocation of treatments to patients is by a random process
- Neither the patient nor the clinician knows which treatment was received
- “Single blind”: only the patient is unaware

[Figure: Sources of bias to check for in a randomised controlled trial. Copyright ©1997 BMJ Publishing Group Ltd.]

Methods of allocating treatments
- Random allocation: every patient has the same chance of receiving either treatment, so the allocation is unbiased by definition
- Minimization: each patient is automatically given the treatment that leads to less imbalance between the groups; an alternative in small trials
- Systematic allocation: pseudo-random (for example, groups formed by even vs odd days); open to abuse
- Non-random concurrent controls: the active group is compared with a control group of ineligible patients and refusers; volunteer bias
- Historical controls: a single group given the new treatment is compared with a group previously treated with an alternative treatment
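The minimization rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `minimization_assign`, the two arm labels, the prognostic factors (`sex`, `age_group`), and the choice to break ties at random are all hypothetical and are not taken from the slides.

```python
import random

ARMS = ("treatment", "control")  # hypothetical arm labels


def minimization_assign(new_patient, enrolled, factors, rng):
    """Pick the arm that leaves the groups least imbalanced.

    new_patient: dict mapping factor name -> level for the incoming patient
    enrolled:    list of (patient_dict, arm) for patients already allocated
    factors:     names of the prognostic factors to balance on
    rng:         random.Random instance, used only to break ties
    """
    scores = {}
    for candidate in ARMS:
        total_imbalance = 0
        for factor in factors:
            level = new_patient[factor]
            # Count already-enrolled patients who share this factor level.
            counts = {arm: 0 for arm in ARMS}
            for patient, arm in enrolled:
                if patient[factor] == level:
                    counts[arm] += 1
            # Hypothetically place the new patient in the candidate arm.
            counts[candidate] += 1
            total_imbalance += abs(counts["treatment"] - counts["control"])
        scores[candidate] = total_imbalance
    best = min(scores.values())
    tied = [arm for arm in ARMS if scores[arm] == best]
    return rng.choice(tied)  # assumption: ties are broken at random


if __name__ == "__main__":
    rng = random.Random(2005)
    enrolled = []
    incoming = [
        {"sex": "F", "age_group": "65+"},
        {"sex": "F", "age_group": "65+"},
        {"sex": "M", "age_group": "<65"},
        {"sex": "F", "age_group": "<65"},
    ]
    for patient in incoming:
        arm = minimization_assign(patient, enrolled, ["sex", "age_group"], rng)
        enrolled.append((patient, arm))
        print(patient, "->", arm)
```

Because the rule is largely deterministic once the first patients are enrolled, real trials often add a random element to the assignment; that refinement is left out of this sketch.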

Alternative designs
- Parallel group design (two different groups are studied concurrently)
- Crossover design
- Within-group (paired) comparisons
- Sequential designs
- Factorial designs
- Adaptive designs
- Zelen’s design





Sequential design
- Parallel groups are studied, but the trial continues only until a clear benefit of one treatment is seen, or until it becomes unlikely that any difference will emerge
- Can be shorter than fixed-length trials
- The data are analyzed after each patient’s results become available
- Blinding problems; ethical difficulties
- Group sequential trial: a useful variation; the data are analyzed after each block of patients becomes available (allows early termination)



Factorial designs
- Two treatments, A and B, are simultaneously compared with each other and with a control
- Patients are divided into four groups, who receive the control treatment, A only, B only, and both A and B
- Allows the investigation of the “synergy” between A and B
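As a rough illustration of the four-group layout, the Python sketch below splits patients at random into the four factorial groups and computes one common way of quantifying “synergy”, the interaction contrast on the additive scale. The function names, the equal-sized allocation, the event rates, and the choice of the additive scale are assumptions made for this example only.

```python
import random

GROUPS = ("control", "A only", "B only", "A and B")


def allocate_factorial(patient_ids, seed=0):
    """Shuffle the patients and deal them into the four factorial groups."""
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    # Every fourth patient of the shuffled list goes to the same group.
    return {group: shuffled[i::4] for i, group in enumerate(GROUPS)}


def additive_interaction(rate_control, rate_a, rate_b, rate_ab):
    """Effect of A given with B, minus effect of A given alone.

    Zero means the two effects simply add up; a non-zero value suggests
    that A and B interact (assumption: interaction on the additive scale).
    """
    effect_a_alone = rate_a - rate_control
    effect_a_with_b = rate_ab - rate_b
    return effect_a_with_b - effect_a_alone


if __name__ == "__main__":
    groups = allocate_factorial(range(100), seed=1)
    print({name: len(ids) for name, ids in groups.items()})

    # Hypothetical event rates, not taken from any trial.
    contrast = additive_interaction(0.20, 0.15, 0.16, 0.08)
    print("interaction contrast:", round(contrast, 3))
```

With these made-up rates the contrast is negative, meaning the combination lowers the event rate by more than the two separate effects would predict.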






• Was assessment “blind”?

“Blind” assessment?
If the people who assess outcome know the patient’s group, bias can enter when they:
- judge whether someone is still clinically in heart failure
- say whether an x-ray is “improved” from last time
- recheck a high BP measurement in the active group
- compare BB vs ACEi, ARBs, or diuretics (lower HR, lower K+)
- compare CCB vs others (pedal edema)




• Was the study large enough, and continued for long enough, to make the results credible?

Sample size
- Big enough to have a high chance of detecting a worthwhile effect if it exists
- Big enough to be reasonably sure that no benefit exists if it is not found in the trial

Errors defined
- Type I error (α): the probability of detecting a statistically significant difference when the treatments are in reality equally effective (the chance of a false-positive result)
- Type II error (β): the probability of not detecting a statistically significant difference when a difference of a given magnitude in reality exists (the chance of a false-negative result)
- Power (1-β): the probability of detecting a statistically significant difference when a difference of a given magnitude really exists

The simplest approximate sample size formula for binary outcomes, assuming α = 0.05, power = 0.90, and equal sample sizes in the two groups:

$$n = \frac{10.51\,\left[(R+1) - p_2\,(R^2+1)\right]}{p_2\,(1-R)^2}$$

n: the sample size in each of the groups
p₁: event rate in the treatment group
p₂: event rate in the control group
R: risk ratio (p₁/p₂)

Worked example with the same formula (α = 0.05, power = 0.90, equal sample sizes in the two groups):

$$n = \frac{10.51\,\left[(0.60+1) - 0.10\,(0.60^2+1)\right]}{0.10\,(1-0.60)^2} \approx 962$$

n: the sample size in each of the groups
p₁: 0.06 (6% event rate in the treatment group)
p₂: 0.10 (estimated 10% event rate in the control group)
R: 0.60 = 6%/10% (p₁/p₂, to detect a 40% reduction)
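The arithmetic of the sample size formula can be checked with a short Python sketch. It assumes that the constant 10.51 is the squared sum of the standard normal quantiles for a two-sided α of 0.05 and a power of 0.90, which is the usual reading of this approximation; the function name `sample_size_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p2, risk_ratio, alpha=0.05, power=0.90):
    """Approximate patients needed in EACH group for a binary outcome.

    p2:         expected event rate in the control group
    risk_ratio: R = p1 / p2, the smallest effect worth detecting
    alpha:      two-sided type I error (assumption: two-sided test)
    power:      1 - beta
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 1.28 for power = 0.90
    constant = (z_alpha + z_beta) ** 2             # about 10.51 for the defaults
    numerator = (risk_ratio + 1) - p2 * (risk_ratio ** 2 + 1)
    denominator = p2 * (1 - risk_ratio) ** 2
    return math.ceil(constant * numerator / denominator)


if __name__ == "__main__":
    # Worked example from the slide: 10% control event rate, R = 0.60.
    print(sample_size_per_group(p2=0.10, risk_ratio=0.60))
```

Rounding up to the next whole patient gives 962 per group for the worked example, so about 1,924 patients in total before allowing for dropouts.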

Approximate relative trial sizes for different levels of α and power

| α (type I error) | Power (1-β) = 0.50 | 0.80 | 0.90 | 0.99 |
|---|---|---|---|---|
| 0.05 | 100 | 200 | 270 | 480 |
| 0.01 | 170 | 300 | 390 | 630 |
| 0.001 | 280 | 440 | 540 | 820 |
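The relative sizes in this table can be approximately reproduced from normal quantiles. The Python sketch below assumes that trial size is proportional to the squared sum of the quantiles for a two-sided α and for the power, and that α = 0.05 with power 0.50 is the reference set to 100; the function name `relative_trial_size` is hypothetical.

```python
from statistics import NormalDist


def relative_trial_size(alpha, power, ref_alpha=0.05, ref_power=0.50):
    """Trial size relative to a reference design that is set to 100."""
    z = NormalDist().inv_cdf

    def size_factor(a, p):
        # Assumption: size is proportional to (z_{1-a/2} + z_p) squared.
        return (z(1 - a / 2) + z(p)) ** 2

    return 100 * size_factor(alpha, power) / size_factor(ref_alpha, ref_power)


if __name__ == "__main__":
    powers = (0.50, 0.80, 0.90, 0.99)
    for alpha in (0.05, 0.01, 0.001):
        # Round to the nearest 10, as the table does.
        row = [int(round(relative_trial_size(alpha, p), -1)) for p in powers]
        print(alpha, row)
```

The table shows that moving from 80% to 90% power at α = 0.05 costs roughly a third more patients (200 to 270).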

Duration of follow-up
- The study must continue long enough for the effect of the intervention to be reflected in the outcomes
- A study of a new painkiller for postoperative pain may only need a follow-up period of 48 h
- A study of the effect of nutritional supplements in the preschool years on final height needs decades
- Events in newly diagnosed DM need >10 years




Interpretation “tips” for results
- p < 0.05 means the result would occur by chance less than 1 time in 20 (“significant”)
- p < 0.01 means the result would occur by chance less than 1 time in 100 (“highly significant”)
- A CI (“confidence interval”) around a result indicates the limits within which the “real” difference is likely to lie
- Every r value should be accompanied by a p value or a CI

Interpretation “tips” for results
- Relative risk of death
- Relative risk reduction
- Absolute risk reduction
- Number needed to treat
- Odds ratio
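These measures are all derived from the same two event rates, which the Python sketch below makes explicit using the standard textbook definitions. The counts (60 events among 1,000 treated patients and 100 among 1,000 controls) are hypothetical and simply mirror the 6% and 10% rates of the sample size example; the function name `risk_measures` is also an assumption.

```python
def risk_measures(events_treated, n_treated, events_control, n_control):
    """Return the usual effect measures for a two-group trial."""
    p1 = events_treated / n_treated    # event rate, treatment group
    p2 = events_control / n_control    # event rate, control group
    relative_risk = p1 / p2
    absolute_risk_reduction = p2 - p1
    return {
        "relative_risk": relative_risk,
        "relative_risk_reduction": 1 - relative_risk,
        "absolute_risk_reduction": absolute_risk_reduction,
        "number_needed_to_treat": 1 / absolute_risk_reduction,
        "odds_ratio": (p1 / (1 - p1)) / (p2 / (1 - p2)),
    }


if __name__ == "__main__":
    # Hypothetical counts: 6% vs 10% event rates.
    for name, value in risk_measures(60, 1000, 100, 1000).items():
        print(f"{name}: {value:.3f}")
```

With these numbers a 40% relative risk reduction corresponds to an absolute reduction of only 4 percentage points, that is, 25 patients treated to prevent one event. The same relative reduction at a lower baseline risk would give a much larger number needed to treat.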


• 30-40% of patients do not receive care according to present scientific evidence
• 20-25% of the care provided is not needed or is potentially harmful

Common clinician concerns about trials, subgroups, meta-analyses, and risk

“Could my patient have been randomized in this trial? If so, the results are applicable; if not, they may not be.”

“Is my patient so different from those in the trial that its results cannot help me make my treatment decision?”

[Figure: study flow from Source Population to Eligible Population to Participants, who are divided into Exposure or Intervention and Comparison or Control, each followed to Outcomes (+/-)]

[Figure: Nested triangles: different populations with a common condition]
