The SPSS-effect on medical research


Medical research relies heavily on statistical inference for generalizing findings and for assessing the uncertainty in applying these findings to new patients. SPSS and similar packages have made complex statistical calculations possible with little or no understanding of statistical inference. As a consequence, research findings are misunderstood, their presentation is confusing, and their reliability is massively overestimated.


Jonas Ranstam

Generalization

Medical research studies are typically performed for the benefit of subjects other than the participants.

Treatment effects in the observed patients

x̄, SD

Treatment effects in new patients

μ, σ

μ̂ (95% CI: μ_ll to μ_ul)

What we do know (have observed)

What we want to know (but never will)

The best estimate and its uncertainty

The uncertainty can in some cases also be presented as a probability value

Medical research and generalization

Treatment effects in the observed patients

x̄, SD

What we do know (have observed)

p < 0.05 or ns

Some weird stuff that no one understands but is necessary for getting manuscripts accepted

Medical research in practice

Treatment effects in the observed patients

x̄, SD

What we do know (have observed)

p < 0.05 or ns

Little (if anything) is mentioned about the uncertainty in the generalization of the findings.

Many (if not all) authors severely underestimate the uncertainty of their findings.

Medical research in practice

Statistical significance and insignificance are typically described as properties of the sample, not the population: “there was a significant difference”.

The presented conclusions are usually a summary of what has been observed in the sample.

SD, SEM and 95% CI are all believed to describe the variability of observed data.

This is the SPSS-effect on medical research.
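The distinction between SD, SEM and the 95% CI can be made concrete with a small calculation. The sketch below is illustrative only: the population values (mean 80, SD 17), the sample sizes, and the function name `summarize` are assumptions, and the interval uses a normal approximation rather than the t distribution.

```python
import math
import random
import statistics

def summarize(sample):
    """Return n, mean, SD, SEM and an approximate 95% CI for the mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)        # spread of the observed values
    sem = sd / math.sqrt(n)              # uncertainty of the estimated mean
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96 (normal approximation)
    return {"n": n, "mean": mean, "sd": sd, "sem": sem,
            "ci": (mean - z * sem, mean + z * sem)}

rng = random.Random(1)
# Hypothetical population: mean 80, SD 17 (assumed for illustration).
small = summarize([rng.gauss(80, 17) for _ in range(25)])
large = summarize([rng.gauss(80, 17) for _ in range(2500)])

for s in (small, large):
    print(f"n={s['n']:5d}  SD={s['sd']:5.1f}  SEM={s['sem']:5.2f}  "
          f"95% CI=({s['ci'][0]:.1f}, {s['ci'][1]:.1f})")
```

The SD stays near the population value whatever the sample size, because it describes the observed data. The SEM and the width of the CI shrink as the sample grows, because they describe the uncertainty of the estimate of the population mean.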

Statistics is about much more than statistical significance

Important phenomena are neglected

Examples:

- Regression-to-the-mean (RTM)
- Consequences of missing data

The placebo effect and regression to the mean

The Placebo effect is a real phenomenon

In conclusion, we believe that investigating the formation of behavioral and biological changes due to placebos deserves future efforts, as the placebo effect is a “real” neurobiological phenomenon that has important implications for clinical neuroscience research and medical care.

Meissner K. et al. The Placebo Effect: Advances from Different Methodological Approaches. J Neurosci 2011; 31:16117–16124

Problem

The vast majority of reports on placebos have estimated the effect of placebo as the change from baseline in the placebo group of a randomized trial after treatment.

The effect of placebo can thus not be distinguished from the natural course of the disease, regression to the mean, and the effects of other factors.

Systematic review of the placebo effect

114 trials - 8525 patients

We included studies if patients were assigned randomly to a placebo group or an untreated group (often there was also a third group that received active treatment).

Publication bias?

There was significant heterogeneity among the trials with continuous outcomes (P<0.001). The magnitude of the effect of placebo decreased with increasing sample size (P=0.05), indicating a possible bias related to the effects of small trials.

Conclusion

In conclusion, we found little evidence that placebos in general have powerful clinical effects.

Placebos had no significant pooled effect on subjective or objective binary or continuous objective outcomes.

We found significant effects of placebo on continuous subjective outcomes and for the treatment of pain but also bias related to larger effects in small trials.

The use of placebo outside the aegis of a controlled, properly designed clinical trial cannot be recommended.

Regression to the mean (RTM)

When an extreme group is selected from a population based on the measurement of a particular variable, and a second measurement is taken for the same group, the second mean will be closer to the population mean than the first measurement.

RTM

Any measurement taken consists of two components: the ‘true’ value plus a random error component. It is the random error component that contributes to RTM. If the value of the random error component is large, then the magnitude of the corresponding RTM effect is increased.
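The 'true value plus random error' model can be simulated directly. The following is a minimal sketch, not an analysis from the presentation: the function name `simulate_rtm`, the sample size, the seed, and the choice of r = 0.5 are assumptions, while mean 80, SD 17 and cut-off 60 echo the SF-36 PF values used on the slides.

```python
import random
import statistics

def simulate_rtm(n=20000, mu=80.0, sd_total=17.0, r=0.5, cutoff=60.0, seed=1):
    """Simulate two measurements per subject with no true change in between."""
    rng = random.Random(seed)
    sd_true = sd_total * r ** 0.5          # between-subject SD of the 'true' value
    sd_error = sd_total * (1 - r) ** 0.5   # SD of the random error component
    baseline, follow_up = [], []
    for _ in range(n):
        true_value = rng.gauss(mu, sd_true)
        baseline.append(true_value + rng.gauss(0, sd_error))
        follow_up.append(true_value + rng.gauss(0, sd_error))
    # Select the 'extreme' group: subjects whose baseline is below the cut-off.
    selected = [(b, f) for b, f in zip(baseline, follow_up) if b < cutoff]
    return {
        "all_change": statistics.fmean(follow_up) - statistics.fmean(baseline),
        "selected_change": statistics.fmean([f - b for b, f in selected]),
        "n_selected": len(selected),
    }

result = simulate_rtm()
print(f"mean change, all subjects:      {result['all_change']:.2f}")
print(f"mean change, baseline below 60: {result['selected_change']:.2f}")
```

Nobody in the simulation receives any treatment, yet the selected group 'improves' on average, while the mean of the whole population does not move.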

Hypothetical example: SF-36 PF

Baseline: mean = 80, SD = 17
Follow up: mean = 80, SD = 17
p ≈ 1.0

Hypothetical example: SF-36 PF

Baseline: mean = 48.7, SD = 8.6
Follow up: mean = 59.2, SD = 16.7
p < 0.001

Barnett AG, van der Pols JC, Dobson AJ. Regression to the mean: what it is and how to deal with it. Int J Epidemiol 2005;34:215–220

RTM - Easy to quantify (for Normally distributed endpoints)

Hypothetical example of RTM in SF-36 PF

Mean = 80, SD = 17, cut off = 60

r      RTM
0.0    28.4
0.1    25.5
0.2    22.7
0.3    19.9
0.4    17.0
0.5    14.2
0.6    11.3
0.7     8.5
0.8     5.7
0.9     2.8
1.0     0
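The RTM values on this slide appear to follow the closed-form expression for Normally distributed endpoints given by Barnett et al. (2005): the expected RTM effect is σ(1 − r)·φ(z)/(1 − Φ(z)), with z = (μ − c)/σ when subjects are selected below the cut-off c. The sketch below is a check under that assumption; the function name `rtm_effect` is hypothetical and r is taken to be the correlation between the two measurements.

```python
import math

def rtm_effect(mean, sd, cutoff, r):
    """Expected RTM effect when subjects with baseline < cutoff are selected.

    Assumes a Normally distributed endpoint and correlation r between
    the baseline and follow-up measurements.
    """
    z = (mean - cutoff) / sd
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)   # phi(z)
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2))          # 1 - Phi(z)
    return sd * (1 - r) * pdf / upper_tail

# Slide values: mean = 80, SD = 17, cut off = 60
for i in range(11):
    r = i / 10
    print(f"{r:.1f}  {rtm_effect(80, 17, 60, r):5.1f}")
```

With r = 0 the two measurements are unrelated and the selected group is expected to return all the way to the population mean; with r = 1 there is no random error and no RTM.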

RTM

Evaluation of a single group’s development over time should be avoided, or should at least include a comparison with the expected RTM effect.

Examples

Diagnostic tests

New treatments

Public health efforts

Health care management

Clinical audits

Hospital comparisons

If one were a policy maker alert to the possibilities of using RTM to ‘prove’ an initiative, one might target hospitals at the bottom of the league table with an initiative, extra resources, for example. RTM, combined with a floor effect, will ensure that such a policy can be ‘proven’ to work.

Morton V, Torgerson DJ. Regression to the mean: treatment effect without the intervention. J Eval Clin Pract 2005;11:59-65.

The consequences of missing values

[Figure: A randomized trial. Inclusion/exclusion criteria, randomization to TRT and CTR, baseline and follow-up measurement in each arm; patients lost to follow-up in either arm give rise to missing data.]

Study populations

Intention-to-treat (ITT)

Patients are analyzed according to randomization outcome irrespective of received treatment or any protocol violation.

Per-protocol (PP)

The subgroup of the ITT population that has been treated according to the study protocol.

Full Analysis Set (FAS)

The ITT population with exclusion of missing data.

Consequence of missing data

Precision

- reduced power
- variability

Validity

- comparability of treatment groups
- the representativeness of the results

Missing data definitions

Missing outcome values

MCAR (missing completely at random): missingness is independent of both observed and unobserved variables.

MAR (missing at random): missingness depends only on observed variables.

MNAR (missing not at random): missingness depends on unobserved variables.
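The three mechanisms can be illustrated with a small simulation. This is a sketch under invented assumptions: the data-generating model (a baseline that is always observed and an outcome that depends on it), the drop-out probabilities, and the function name `complete_case_means` are all hypothetical.

```python
import random
import statistics

def complete_case_means(n=20000, seed=1):
    """Mean outcome among subjects with observed outcome, per mechanism."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        baseline = rng.gauss(80, 17)                       # always observed
        outcome = 0.5 * baseline + 40 + rng.gauss(0, 12)   # may be missing
        rows.append((baseline, outcome))

    def observed_mean(is_missing):
        return statistics.fmean([y for x, y in rows if not is_missing(x, y)])

    return {
        "full": statistics.fmean([y for _, y in rows]),
        # MCAR: 30% missing, unrelated to any variable.
        "MCAR": observed_mean(lambda x, y: rng.random() < 0.3),
        # MAR: missingness depends only on the observed baseline.
        "MAR": observed_mean(lambda x, y: x < 70 and rng.random() < 0.6),
        # MNAR: missingness depends on the (unobserved) outcome itself.
        "MNAR": observed_mean(lambda x, y: y < 70 and rng.random() < 0.6),
    }

means = complete_case_means()
for name, value in means.items():
    print(f"{name:5s} mean outcome = {value:.1f}")
```

In this setup the complete-case mean is close to the full-data mean only under MCAR. Under MAR the bias could in principle be removed by conditioning on the observed baseline; under MNAR the observed data alone do not contain the information needed to correct it.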

Handling of missing data

1. Complete case analysis (violates the ITT principle, not FAS)

2. Single imputation methods, e.g. LOCF (biased p-values)

3. Multiple imputation, MI (requires MCAR or MAR)

4. Mixed models, GEE (requires MCAR or MAR)

Sensitivity analysis

- Compare FAS results with Complete Case analysis results.

- Define missing values as failures.

- Worst case scenario analysis: Define missing values as failures in TRT and successes in CTR.
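For a binary outcome, the failure and worst-case definitions above reduce to simple arithmetic. The sketch below uses invented counts purely for illustration; the `Arm` class, the function name `sensitivity`, and the numbers are assumptions, not data from any trial.

```python
from dataclasses import dataclass

@dataclass
class Arm:
    successes: int
    failures: int
    missing: int

    @property
    def randomized(self):
        return self.successes + self.failures + self.missing

def sensitivity(trt, ctr):
    """Difference in success proportion (TRT minus CTR) under three scenarios."""
    def complete_case(arm):
        return arm.successes / (arm.successes + arm.failures)

    def missing_as_failure(arm):
        return arm.successes / arm.randomized

    def missing_as_success(arm):
        return (arm.successes + arm.missing) / arm.randomized

    return {
        "complete_case": complete_case(trt) - complete_case(ctr),
        "missing_as_failure": missing_as_failure(trt) - missing_as_failure(ctr),
        # Worst case: missing counted as failures in TRT, successes in CTR.
        "worst_case": missing_as_failure(trt) - missing_as_success(ctr),
    }

# Hypothetical counts, 100 patients randomized per arm.
trt = Arm(successes=60, failures=30, missing=10)
ctr = Arm(successes=45, failures=45, missing=10)

for scenario, diff in sensitivity(trt, ctr).items():
    print(f"{scenario:20s} difference = {diff:+.3f}")
```

If the conclusion survives the worst-case scenario, the missing values cannot have produced it; if it does not, the result depends on untestable assumptions about the missing patients.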
