Further data analysis topics
Jonathan Cook
Centre for Statistics in Medicine, NDORMS, University of Oxford
EQUATOR – OUCAGS training course, 24th October 2015
Outline
Ideal study
Further topics
– Multiplicity
– Subgroups
– Missing data
Summary
Ideal study
An ‘ideal’ clinical study is where
– Every participant is eligible for the study
– All receive the intervention exactly as desired
– All outcomes are obtained for all participants
– Participants directly map into a definable population and clinical decision
Analysis of such a study is (reasonably) straightforward, reliable, interpretable and applicable
In reality?
Man et al., BMJ 2004
Who do we analyse?
Statistical analysis is premised upon having a representative sample (or on being able to get back to one in our analysis)
Patients may, though, be “unideal”
– Got another treatment before, during or afterwards?
– Might be quite “abnormal”?
– What about important factors (e.g. age)?
– May have incomplete data
Who should be included in the analysis?
What do we do when the outcome is missing?
Multiplicity
The more you look, the more you will find
Dangers of multiplicity
Each statistical test typically has a 5% probability of being significant when in reality there is no real difference
– A “false positive” finding
With multiple tests the probability of at least one false positive finding rises
– With many tests something is likely to be significant
– May be misinterpreted
– Danger of selective reporting (i.e. publish only the significant results)
Multiple tests
[Figure: probability of at least one significant result (y-axis, 0 to 1) plotted against number of tests (x-axis, 1 to 91)]
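The relationship between the number of tests and the chance of at least one false positive can be sketched numerically. This is a minimal illustration that assumes the tests are independent and each uses a 5% significance level; the function name is hypothetical, and correlated tests (common in trials) would give a somewhat different curve.

```python
# Probability of at least one false positive among n independent tests,
# each run at significance level alpha, when there is truly no difference.
# Independence is an assumption made for illustration only.

def prob_at_least_one_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Return 1 - (1 - alpha)^n_tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # Same range of test counts as a plot running from 1 to 91 tests.
    for n in (1, 11, 21, 31, 41, 51, 61, 71, 81, 91):
        print(f"{n:3d} tests: {prob_at_least_one_false_positive(n):.3f}")
```

Under these assumptions the probability climbs quickly: it is already above 40% with 11 tests.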
Sources of multiplicity in RCTs
DESIGN
Multiple treatment groups
Multiple outcome measures
Multiple follow-up time points
CONDUCT
Multiple looks at accumulating data
ANALYSIS
Grouping of continuous or categorical data
Adjusted or unadjusted
Subgroups
Do these all generate the same concerns?
PRE-SPECIFY
Multiple treatments, multiple time-points
3 groups = 7 comparisons:
– Global: A1 vs A2 vs B
– Pairwise: A1 vs A2; A1 vs B; A2 vs B; A1+A2 vs B; A1+B vs A2; A2+B vs A1
3 time-points: 1 month; 3 months; 6 months
7 comparisons × 3 time-points = 21 possible comparisons
The trial reported a global analysis of variance at each time-point and a post-hoc multiple comparison test between groups.
Could take account of all time-points using a more complex model (e.g. multilevel model)
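The count of 21 can be checked by enumerating the comparisons. The sketch below uses the group labels and time-points from the slide; the function name is hypothetical and the enumeration (one global test, three pairwise tests, three "pooled versus one" tests) is one reading of how the 7 comparisons arise.

```python
from itertools import combinations

groups = ["A1", "A2", "B"]
time_points = ["1 month", "3 months", "6 months"]


def list_comparisons(groups):
    """List the global, pairwise and pooled-versus-one comparisons."""
    comparisons = [" vs ".join(groups)]  # global test across all groups
    # Every pair of single groups
    comparisons += [f"{a} vs {b}" for a, b in combinations(groups, 2)]
    # Each single group against the other groups pooled together
    for g in groups:
        others = [x for x in groups if x != g]
        comparisons.append(f"{'+'.join(others)} vs {g}")
    return comparisons


per_time_point = list_comparisons(groups)
all_comparisons = [(t, c) for t in time_points for c in per_time_point]

if __name__ == "__main__":
    print(len(per_time_point), "comparisons per time-point")
    print(len(all_comparisons), "comparisons in total")
```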
Adjusting for multiple testing
Formal adjustment to control the overall significance level (α) at the desired level (e.g. 0.05) is possible
Under the Bonferroni procedure, divide α by the number of tests
– Overly conservative (as outcomes/time points are usually correlated)
– Considers all analyses of equivalent importance
More complex approaches are available but are still somewhat simplistic
A better approach is to think about a hierarchy of testing and take a p-value with a good pinch of salt
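As a rough illustration of the Bonferroni procedure, the sketch below divides the overall significance level by the number of tests and compares each p-value against that threshold. The p-values are hypothetical, invented purely for the example, and the function names are not from any particular library.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni procedure."""
    return alpha / n_tests


def bonferroni_significant(p_values, alpha: float = 0.05):
    """Flag which p-values fall at or below the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# Hypothetical p-values from five tests (illustrative only).
example_p_values = [0.001, 0.012, 0.030, 0.047, 0.200]

if __name__ == "__main__":
    print("Threshold:", bonferroni_threshold(0.05, len(example_p_values)))
    print(bonferroni_significant(example_p_values))
```

With five tests the threshold falls to 0.01, so four of these five p-values below 0.05 would no longer count as significant except the smallest. This shows why the procedure is regarded as conservative when tests are correlated.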
Dealing with multiplicity
Limit the number of analyses
– Consider analyses which allow testing of multiple groups at once
Prioritise key analyses over others
– Primary versus secondary outcomes
– Hypothesis testing versus hypothesis generating
Distinguish between planned and post-hoc (after the event) analyses
Interpret similar analyses together not in isolation
– If only one of 11 analyses on a single outcome is “significant”…
Why examine subgroups?
To confirm an observed treatment effect is consistent across all major subgroups
We suspect in advance that certain features may alter the magnitude of the effect, e.g. age, severity of disease, histological type of tumour
To identify those for which the treatment does not work
To identify groups who benefit from the treatment even when the overall result is not significant
To generate hypotheses for future studies
Subgroup analyses
What is the question?
– Main analysis (e.g. an RCT looks for a difference in treatments) gives an overall finding
– Subgroup analysis asks if there is evidence that the result (e.g. the treatment effect in an RCT) varies across subgroups
Examining each subgroup separately is misleading
– Separate tests do not address the right question
– Multiple tests result in a raised false positive rate
– Commonly done!
Should compare subgroups directly
– Interaction test
Example: HIV Vaccine Trial
Group                Placebo           Vaccine            Relative Risk Reduction (95% CI)
All volunteers       98/1679 (5.8%)    191/3330 (5.7%)    3.8% (-22.9 to 24.7)
White & Hispanic     81/1508 (5.4%)    179/3003 (6.0%)    -9.7% (-42.8 to 15.7)
Black/Asian/Other    17/171 (9.9%)     12/327 (3.7%)      66.8% (30.2 to 84.2)
  Black              9/111 (8.1%)      4/203 (2.0%)       78.3% (29.0 to 93.3)
  Asian              2/20 (10.0%)      2/53 (3.8%)        68.0% (-129.4 to 95.5)
  Other              6/40 (15.0%)      6/71 (8.5%)        46.2% (-67.8 to 82.8)
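Comparing subgroups directly, rather than testing each one separately, can be sketched with a simple interaction test on the crude counts in the table. This is an illustration only: it uses unadjusted relative risks with a normal approximation on the log scale, so it will not reproduce the reported risk reductions exactly (how those were estimated is not stated here), and the function names are hypothetical.

```python
import math


def log_relative_risk(events_trt, n_trt, events_ctl, n_ctl):
    """Crude log relative risk (treatment vs control) and its standard error."""
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    se = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    return math.log(rr), se


def interaction_test(subgroup_1, subgroup_2):
    """z-test for a difference in log relative risk between two subgroups."""
    log_rr1, se1 = log_relative_risk(*subgroup_1)
    log_rr2, se2 = log_relative_risk(*subgroup_2)
    z = (log_rr1 - log_rr2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value, normal approx.
    return z, p


# (vaccine events, vaccine n, placebo events, placebo n) from the table
white_hispanic = (179, 3003, 81, 1508)
black_asian_other = (12, 327, 17, 171)

z, p = interaction_test(white_hispanic, black_asian_other)

if __name__ == "__main__":
    print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

A single interaction test asks the right question (does the effect differ between subgroups?), but it does not remove the multiplicity problem: a small p-value from one of several post-hoc subgroupings still needs cautious interpretation.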
HIV Vaccine Trial
“This is the first time we have specific numbers to
suggest that a vaccine has prevented HIV infection in
humans”, said Phillip Berman, inventor of the vaccine
and senior vice president of Research and
Development at VaxGen (Brisbane, CA), the company
that is developing the vaccine. “We're not sure yet
why certain groups have a better immune response,
but these preliminary results indicate that a surface
protein vaccine that stimulates neutralising
antibodies correlates with prevention of infection.”
Lancet headline
JAMA headline
Missing data & why it occurs
Patients lost to follow up are very unlikely to be a random subset of all those randomised as
– they may fail to return because they feel much better or worse
– they failed to comply and feel guilty
– etc.
Missing data may introduce bias (and undermine the benefit of randomisation, if we have randomised)
Also leads to a loss of statistical precision
Missing data & its impact
Impact depends on the amount missing
– Can be large in some contexts, e.g. smoking cessation
Credibility will be weakened if many participants are lost to follow up
– Hence the need to know how complete follow up was
Credibility will particularly suffer if loss to follow up is greater in one group
Missing data in trials
Wood et al. Clin Trials 2004
Dealing with missing data
No fully satisfactory solution
– Assumptions are needed beyond those needed to analyse full data set
– All approaches make important assumptions
– Those assumptions are largely uncheckable
– Can investigate sensitivity to those assumptions
Main options
– Ignore & conduct ‘complete case analysis’
– Impute
Imputing
Simple imputation
– All missing values set to the same outcome (e.g. best or worst)
• Leads to optimistic or pessimistic results for binary outcomes
• Difficult for continuous data (can use mean or median)
– Leads to overly-precise results
Common simple imputation approaches
– ‘Best case - worst case’
• Generally not helpful
– Last value carried forward
• Popular but problematic
More complex regression methods
– Assume a relationship between missing and observed data
– Valid analysis if underlying assumptions are correct
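To see why "best case - worst case" imputation is often unhelpful, the sketch below computes a risk difference three ways for a binary outcome: complete cases only, and with all missing outcomes set to favour or disfavour arm A. All counts are hypothetical and the function names are invented for the example.

```python
def risk_difference(successes_a, n_a, successes_b, n_b):
    """Difference in success proportions, arm A minus arm B."""
    return successes_a / n_a - successes_b / n_b


def best_worst_case(successes_a, observed_a, randomised_a,
                    successes_b, observed_b, randomised_b):
    """Complete-case, best-case and worst-case risk differences for arm A."""
    missing_a = randomised_a - observed_a
    missing_b = randomised_b - observed_b
    complete = risk_difference(successes_a, observed_a, successes_b, observed_b)
    # Best case for A: missing in A all succeed, missing in B all fail.
    best = risk_difference(successes_a + missing_a, randomised_a,
                           successes_b, randomised_b)
    # Worst case for A: missing in A all fail, missing in B all succeed.
    worst = risk_difference(successes_a, randomised_a,
                            successes_b + missing_b, randomised_b)
    return complete, best, worst


# Hypothetical trial: 100 randomised per arm, 80 followed up in each,
# 40 successes observed in arm A and 32 in arm B.
complete, best, worst = best_worst_case(40, 80, 100, 32, 80, 100)

if __name__ == "__main__":
    print(f"complete case: {complete:+.2f}")
    print(f"best case:     {best:+.2f}")
    print(f"worst case:    {worst:+.2f}")
```

With 20% missing in each arm, the two extremes span from a clear benefit to a clear harm, so the range says little about the likely true effect.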
LOCF (1)
We have a trial with longitudinal follow-up
– Observations at 2 or more different times
– With no dropouts analysis is straightforward
Under last observation carried forward (LOCF)
– Where patients have partial (e.g. dropped out) data we fill in all their missing observations with their last observation
– We analyse this completed data set as if it was the real data set
Simple and popular, but …
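The mechanics of LOCF described above can be sketched in a few lines. The example is a minimal illustration with a hypothetical patient record, where `None` marks a missing visit; it shows what the method does, not a recommendation to use it.

```python
from typing import List, Optional


def locf(observations: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing value (None) with the last observed value.

    Values missing before the first observation stay missing, since there
    is nothing to carry forward.
    """
    filled = []
    last_seen = None
    for value in observations:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical patient measured at four visits who dropped out after visit 2.
patient = [12.0, 10.5, None, None]

if __name__ == "__main__":
    print(locf(patient))
```

The filled-in values are then treated as if they had been observed, which is where the method's problems begin.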
LOCF (2)
We make the strong assumption that unseen observations equal the last observation seen
– How plausible?
We also ignore uncertainty associated with that assumption
– Imputed data should show more uncertainty than real data, not less!
Method has bad properties
– Gives biased treatment estimates
• Direction and size of bias depends on (unknown) true effect
– Tests are biased (over-optimistic) and confidence intervals have the wrong coverage
LOCF (3)
Pittler et al., Br J Dermatol 2003
The best solutions to missing data
Don’t have any!
Design the trial to maximise completeness of data collection
– e.g. systems for chasing people
Anticipate possibility of missing data when preparing protocol and analysis plan
– Pre-specify statistical methods
Assess sensitivity of result to assumptions
General strategy – analysis & reporting
Analysis
Decisions about which analyses to do and who to include should be made (as far as possible) before viewing data
Document reasons for missing data and quantify it
Advisable to do analysis on “everyone relevant” even if there are good reasons for looking at a specific subpopulation
Less analysis is more (consider the threat of multiple comparisons)
Reporting
Always clarify who was included in each analysis
Depict key inclusion decisions in a flow diagram
Report post-hoc analyses as post-hoc
Interpret similar tests together
Summary
What gets into the analysis affects the validity & credibility of the findings
Studies should be designed to minimise missing data
Statistical analyses need careful planning
– Be choosy about analyses (less is more)
Report what you did clearly, fully and accurately as intended
– Not in relation to chance findings
References
Man WD-C, et al. Community pulmonary rehabilitation after hospitalisation for acute exacerbations of chronic obstructive pulmonary disease: randomised controlled study. BMJ 2004. doi:10.1136/bmj.38258.662720.3A.
Molnar F, et al. Does analysis using "last observation carried forward" introduce bias in dementia research? CMAJ 2008;179(8):751-3.
Pittler MH, et al. Randomized, double-blind, placebo-controlled trial of autologous blood therapy for atopic dermatitis. Br J Dermatol. 2003 Feb;148(2):307-13.
Bender R, Lange S. Adjusting for multiple testing--when and how? J Clin Epidemiol. 2001 Apr;54(4):343-9.
Dmitrienko A, et al. General Guidance on Exploratory and Confirmatory Subgroup Analysis in Late-Stage Clinical Trials. J Biopharm Stat. 2015 [Epub ahead of print].