Generalizing experimental study results to target populations
Elizabeth Stuart
Johns Hopkins Bloomberg School of Public Health
Departments of Mental Health, Biostatistics, and Health Policy and Management
[email protected]/∼estuart
Funding thanks to NSF DRL-1335843, IES R305D150003
February 26, 2016
Elizabeth Stuart (JHSPH) Generalizability February 26, 2016 1 / 25
Outline
1 Introduction, context, and framework
2 The setting and overview of approaches
3 Reweighting approaches
4 Conclusions
1 Introduction, context, and framework
Making research results relevant: A range of policy or practice questions
A given district or school may go on to the What Works Clearinghouse to see whether a new reading intervention is “evidence-based” and helpful for them
The state of Maryland may be deciding whether to recommend the new program for all schools or districts in the state
Or for all “struggling” schools?
Medicare may be deciding whether or not to approve payment for a new treatment for back pain
Should a broad public health media campaign be started around not switching car seats to forward facing until a child is 12 months old?
From individual to population effects
All of these reflect a “population” average treatment effect
e.g., across individuals in a population, does this intervention work “on average”?
This population could be fairly narrow, or quite broad
There may actually be underlying treatment effect heterogeneity
e.g., stronger effects for some individuals
Lots of interest in tailoring treatments for individuals; not my focus today
But for the policy questions that motivate today’s talk, we want an overall average effect
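To make the target quantity concrete, here is one standard way to write it with potential outcomes, in the spirit of the population versus sample average treatment effect distinction in Imai, King, and Stuart (2008). The notation is an assumption of this sketch: $N$ is the population size, $S_i = 1$ marks trial participants, $n = \sum_i S_i$, and $Y_i(1)$, $Y_i(0)$ are individual $i$'s outcomes with and without the intervention.

```latex
\mathrm{PATE} = \frac{1}{N}\sum_{i=1}^{N}\bigl[Y_i(1) - Y_i(0)\bigr],
\qquad
\mathrm{SATE} = \frac{1}{n}\sum_{i:\,S_i = 1}\bigl[Y_i(1) - Y_i(0)\bigr]
```

A well-run randomized trial targets the SATE. The two estimands coincide exactly when every individual has the same effect, and agree in expectation when the trial participants are a simple random sample of the population.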
At this point, relatively little attention has been paid to how well results from a given study might carry over to a relevant target population
This talk will discuss recent work trying to get people to start thinking about these issues, while taking advantage of recent advances in study quality and data
How much do we need to worry about external validity?
Lots of evidence that the people or groups that participate in trials differ from general populations
This will cause bias if the factors that differ also moderate treatment effects
Districts that participate in rigorous educational evaluations are much larger than typical districts in the US (Stuart et al., under review)
People who participate in trials of drug abuse treatment have higher education levels than those in drug abuse treatment nationwide (Susukida et al., in press)
Increasing worries about lack of minority representation in clinical trials
And these differences can lead to external validity bias (Bell et al., in press)
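A small worked example of why trial/population differences matter only when they involve effect moderators. All numbers below are hypothetical: two strata with different effects, and a trial that over-represents the stratum with the larger effect. The sample and population average effects are the stratum effects averaged with trial shares and with population shares, respectively.

```python
# Hypothetical stratum-specific average treatment effects (not from any study).
stratum_effects = {"stratum 1": -0.5, "stratum 2": 2.0}

# Hypothetical share of each stratum in the trial and in the target population.
trial_share = {"stratum 1": 0.2, "stratum 2": 0.8}
population_share = {"stratum 1": 0.6, "stratum 2": 0.4}


def average_effect(effects, shares):
    """Stratum effects averaged with the given stratum shares as weights."""
    return sum(effects[s] * shares[s] for s in effects)


sate = average_effect(stratum_effects, trial_share)
pate = average_effect(stratum_effects, population_share)
print(f"SATE = {sate:.2f}, PATE = {pate:.2f}, external validity bias = {sate - pate:.2f}")
```

Here the trial average is 1.5 and the population average is 0.5, so the trial result overstates the population effect by 1.0. Making the two stratum effects equal, or making the trial shares equal to the population shares, drives the gap to zero.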
2 The setting and overview of approaches
The setting
Assume we have one randomized trial, already conducted
And also covariate data on some target population of interest (we do not have treatment values or outcomes in the population)
The question: How can we use these data to estimate the effects of the intervention in the target population?
Note: The focus here is on assessing and enhancing external validity with respect to the characteristics of trial and population subjects
Lots of other threats to external validity as well: scale-up problems, implementation, different settings, . . . (see Cook, 2014)
Analysis approaches for estimating population effects
Meta-analysis: Useful when multiple studies are available, but does not necessarily give population estimates
Cross-design synthesis: Explicitly combines experimental and non-experimental effect estimates (Pressler & Kaizar, 2013)
Model-based approaches: Model the outcome in the trial, then use the model to predict outcomes in the population (e.g., BART; Kern et al., 2016)
Post-stratification: Estimate separate effects within strata, then combine them using population proportions
Reweighting: Like a smoothed version of post-stratification (Cole & Stuart, 2010; O’Muircheartaigh & Hedges, 2014)
(Of course, design options exist too, e.g., aiming to enroll representative (or “balanced”) samples (Royall!))
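A minimal sketch of post-stratification, assuming discrete strata observed in both the trial and the population, and using a simple treated-minus-control difference in means within each stratum. The records, stratum labels, population shares, and function names are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trial records: (stratum, treated flag, outcome).
trial = [
    ("A", 1, 5.0), ("A", 1, 7.0), ("A", 0, 4.0), ("A", 0, 4.0),
    ("B", 1, 3.0), ("B", 1, 3.0), ("B", 0, 2.0), ("B", 0, 4.0),
]

# Hypothetical population proportions for each stratum (sum to 1).
population_share = {"A": 0.25, "B": 0.75}


def stratum_effects(records):
    """Treated-minus-control difference in mean outcome within each stratum."""
    arms = defaultdict(lambda: {0: [], 1: []})
    for stratum, treated, outcome in records:
        arms[stratum][treated].append(outcome)
    return {s: mean(a[1]) - mean(a[0]) for s, a in arms.items()}


def post_stratified_effect(records, shares):
    """Combine the stratum-specific effects using population proportions."""
    effects = stratum_effects(records)
    return sum(shares[s] * effects[s] for s in shares)


print(stratum_effects(trial))
print(post_stratified_effect(trial, population_share))
```

With these toy numbers the stratum effects are 2.0 and 0.0, so the population-weighted effect is 0.25 × 2.0 + 0.75 × 0.0 = 0.5, whereas the unweighted trial difference in means is 1.0, because stratum A makes up half of the trial but only a quarter of the population.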
3 Reweighting approaches
Case study: The ACTG Trial
Examined highly active antiretroviral therapy (HAART) for HIV compared to standard combination therapy
577 US HIV+ adults randomized to treatment, 579 to control
33/577 and 63/579 endpoints (AIDS/death) during 52-week follow-up
Intent-to-treat analysis: Hazard ratio of 0.51 (95% CI: 0.33, 0.77)
Cole & Stuart (2010)
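As a quick arithmetic check on the trial summary, the crude proportions of participants reaching the AIDS/death endpoint can be computed directly from the counts on this slide. This crude risk ratio ignores event timing and censoring, so it is not the hazard ratio reported above, only a rough companion to it.

```python
# Counts from the ACTG trial slide: endpoints (AIDS/death) and participants per arm.
treated_events, treated_n = 33, 577
control_events, control_n = 63, 579

risk_treated = treated_events / treated_n  # crude proportion with an endpoint, HAART arm
risk_control = control_events / control_n  # crude proportion with an endpoint, control arm
crude_risk_ratio = risk_treated / risk_control

print(f"risk (HAART)     = {risk_treated:.3f}")
print(f"risk (control)   = {risk_control:.3f}")
print(f"crude risk ratio = {crude_risk_ratio:.2f}")
```

The crude risks are about 0.057 and 0.109, giving a crude risk ratio of about 0.53, close to the intent-to-treat hazard ratio of 0.51.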
The target population
We don’t necessarily care only about the people in the trial
What would the effects of the treatment be if it were implemented nationwide?
US estimates of the number of people infected with HIV in 2006 (CDC, 2008)
HIV incidence was estimated using a statistical approach with adjustment for testing frequency and extrapolated to the US
We have the joint distribution of sex, race, and age group for the newly infected individuals
Inverse probability of selection weighting
Weight the trial subjects up to the population
Each subject in the trial receives weight $w_i = \frac{1}{P(S_i = 1 \mid X_i)}$
(the inverse of their probability of being in the trial)
Use those weights when calculating means or running regressions
Related to inverse probability of treatment weighting, and to Horvitz-Thompson estimation in surveys
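A minimal sketch of inverse probability of selection weighting for the case where, as in the HIV example, the covariates define discrete cells with known population counts. It assumes the trial participants are members of the target population, so $P(S_i = 1 \mid X_i = x)$ can be estimated as the share of cell $x$ that is in the trial, giving $w_i = N_{\text{pop}}(x) / n_{\text{trial}}(x)$. The records, cell labels, counts, and function names are hypothetical, and the effect is a difference of normalized (Hájek-style) weighted means rather than a weighted regression. With many or continuous covariates, the selection probability would instead be modeled, for example with logistic regression.

```python
from collections import Counter

# Hypothetical trial records: (covariate cell, treated flag, outcome).
trial = [
    ("male", 1, 1.0), ("male", 1, 0.0), ("male", 1, 1.0),
    ("male", 0, 0.0), ("male", 0, 0.0), ("male", 0, 1.0),
    ("female", 1, 1.0), ("female", 0, 0.0),
]

# Hypothetical number of people in each covariate cell of the target population.
population_counts = {"male": 600, "female": 400}


def selection_weights(records, pop_counts):
    """Cell weights w = 1 / P(S=1 | X=x), with P estimated as n_trial(x) / N_pop(x).

    Assumes the trial participants are themselves members of the target
    population, so every trial cell also appears in pop_counts.
    """
    n_trial = Counter(cell for cell, _, _ in records)
    return {cell: pop_counts[cell] / n for cell, n in n_trial.items()}


def weighted_mean(pairs):
    """Normalized weighted mean of (value, weight) pairs."""
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)


def ipsw_effect(records, pop_counts):
    """Difference in selection-weighted mean outcomes, treated minus control."""
    w = selection_weights(records, pop_counts)
    treated = [(y, w[cell]) for cell, t, y in records if t == 1]
    control = [(y, w[cell]) for cell, t, y in records if t == 0]
    return weighted_mean(treated) - weighted_mean(control)


print(selection_weights(trial, population_counts))
print(round(ipsw_effect(trial, population_counts), 3))
```

With these toy data the weights are 100 for each male and 200 for each female participant, and the weighted estimate is 0.6 versus an unweighted difference of 0.5. Because each cell here has equal numbers of treated and control participants, the weighted estimate matches post-stratifying the cell-specific effects with population proportions (0.6 × 1/3 + 0.4 × 1 = 0.6), which illustrates the link between the two approaches.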
Standard assumptions
Experiment was randomized
“Sample ignorability for treatment effects”: selection into the trial is independent of impacts given the observed covariates
For the same values of the observed covariates, impacts are the same across the trial and the population
No unmeasured variables related to both selection into the trial and treatment effects (sensitivity analysis for this: Nguyen et al., under review)
“Overlap”: all individuals in the population had a non-zero probability of participating in the trial
Analogous to strong ignorability/unconfoundedness of treatment assignment in non-experimental studies
(If the outcome under control is observed in the population, a slightly different assumption can be used)
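For discrete covariates, the overlap assumption can be checked by looking for population cells with no trial participants: there the estimated selection probability is zero and the weight $1/P(S_i = 1 \mid X_i)$ is undefined. A minimal sketch, with hypothetical cell labels, counts, and function names:

```python
from collections import Counter

# Hypothetical covariate cells of the trial participants.
trial_cells = ["B", "B", "B", "C", "C"]

# Hypothetical population counts for each covariate cell.
population_counts = {"A": 500, "B": 300, "C": 200}


def cells_without_overlap(cells, pop_counts):
    """Population cells (with a positive count) that contain no trial participants.

    For these cells the estimated P(S=1 | X=x) is zero, so the
    inverse-probability-of-selection weight is undefined.
    """
    n_trial = Counter(cells)  # missing cells count as zero
    return sorted(c for c, n_pop in pop_counts.items() if n_pop > 0 and n_trial[c] == 0)


def uncovered_population_share(cells, pop_counts):
    """Fraction of the population that falls in cells with no trial participants."""
    missing = cells_without_overlap(cells, pop_counts)
    return sum(pop_counts[c] for c in missing) / sum(pop_counts.values())


print(cells_without_overlap(trial_cells, population_counts))
print(uncovered_population_share(trial_cells, population_counts))
```

In this toy example cell A, which holds half of the population, has no trial participants, so reweighting these trial data cannot recover the effect for that half of the population without extrapolation.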
Effect heterogeneity and predictors of participation
People in trial more likely to be:
Older (not 13-29)
Male
White or Hispanic
Those characteristics also moderate effects in the trial
Detrimental effects for young people
Largest effects for those 30-39
Larger effects for males, as compared to females
Larger effects for Black individuals, as compared to White or Hispanic individuals
Estimated population effects
Analysis                 Hazard ratio   95% CI
Crude trial results      0.51           0.33, 0.77
Age weighted             0.68           0.39, 1.17
Sex weighted             0.53           0.34, 0.82
Race weighted            0.46           0.29, 0.72
Age-sex-race weighted    0.57           0.33, 1.00
CIs are wider for the weighted results
Effects are generally somewhat attenuated (hazard ratios closer to 1), except when weighting only by race
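One way to quantify how much wider the weighted intervals are is to back out an approximate standard error of the log hazard ratio from each interval. This assumes each 95% CI is a symmetric Wald interval on the log scale, which the slide does not state, so the results are approximate.

```python
import math

# Hazard ratios and 95% CIs from the "Estimated population effects" slide.
results = {
    "Crude trial results": (0.51, 0.33, 0.77),
    "Age weighted": (0.68, 0.39, 1.17),
    "Sex weighted": (0.53, 0.34, 0.82),
    "Race weighted": (0.46, 0.29, 0.72),
    "Age-sex-race weighted": (0.57, 0.33, 1.00),
}

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval


def log_hr_se(lower, upper, z=Z_95):
    """Approximate SE of log(HR), assuming a symmetric Wald CI on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


crude_se = log_hr_se(*results["Crude trial results"][1:])
for label, (hr, lower, upper) in results.items():
    se = log_hr_se(lower, upper)
    print(f"{label:<22} HR={hr:.2f}  SE(log HR)~{se:.3f}  vs crude x{se / crude_se:.2f}")
```

Under that assumption, the approximate standard error is about 0.22 for the crude analysis and about 0.28 for the age-sex-race weighted analysis, roughly 30% larger, and every weighted analysis has a larger standard error than the crude one.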
Placebo checks
Can also use the weighting as a diagnostic
The weighted control group mean should match the population outcome mean if the control conditions are the same (“placebo check”)
In the HAART case, if we had mortality information in the population, we could check whether the weighted mortality rate in the control group matched the population mortality rate (assuming no treatment in the population)
If the placebo check fails, that may indicate unobserved differences between the groups
Hartman et al., 2013; Stuart et al., 2011
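The placebo check can be sketched as a comparison between the selection-weighted mean outcome among trial controls and the observed population mean, assuming nobody in the population received the treatment. The records, weights, population mean, and function names below are hypothetical, and a formal version would also account for sampling uncertainty in both means, which this sketch omits.

```python
# Hypothetical control-arm records: (covariate cell, outcome, e.g. 1 = died).
controls = [("male", 0.0), ("male", 0.0), ("male", 1.0), ("female", 0.0)]

# Hypothetical selection weights 1 / P(S=1 | X=x) for each cell.
weights = {"male": 100.0, "female": 200.0}

# Hypothetical outcome mean observed in the (untreated) target population.
population_mean = 0.25


def weighted_control_mean(records, cell_weights):
    """Selection-weighted mean outcome among trial controls."""
    total_w = sum(cell_weights[cell] for cell, _ in records)
    return sum(cell_weights[cell] * y for cell, y in records) / total_w


def placebo_gap(records, cell_weights, pop_mean):
    """Weighted control mean minus population mean; near zero if the check passes."""
    return weighted_control_mean(records, cell_weights) - pop_mean


gap = placebo_gap(controls, weights, population_mean)
print(f"weighted control mean minus population mean = {gap:.3f}")
```

Here the weighted control mean is 0.2, so the gap is -0.05; whether a gap of that size signals unobserved differences would depend on its standard error.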
4 Conclusions
Everyone wants to assume that study results generalize
But very few statistical methods exist
At this point, lots of “hand waving,” qualitative statements
Need more statistical methods to quantify and improve external validity
For both study design and study analysis
What do we need to assess and enhance external validity?
Information on the factors that influence treatment effect heterogeneity
Information on the factors that influence participation in rigorous evaluations
Data on all of these factors in the trial and the population
Not very helpful if these factors are not observed in the population
Methods that allow for the differences between trial and population on these factors
These are coming along
Data a primary limiting factor
Right now we have very little information on factors that influence effects or participation in trials
Sometimes hard to find population data
Trial data also often not publicly available
Even harder to find population data that have the same measures as the trial of interest
Stuart & Rhodes (under review): Hard to find appropriate population data, and even then, out of over 400 measures in each, only about 7 were comparable
Conclusions
Can’t necessarily assume that average effects seen in a trial would carry over directly to a target population
Methods allow us to adjust for differences in observed characteristics between the trial sample and population to estimate population treatment effects
But only as good as the data available!
And remember . . .
“With better data, fewer assumptions are needed.” - Rubin (2005, p. 324)
“You can’t fix by analysis what you bungled by design.” - Light, Singer and Willett (1990, p. v)
“Real world relationships are invariably more complicated than those we can represent in mathematically tractable models.” - Royall and Pfeffermann (1981, p. 16)
References, with thanks to all my co-authors
Bell, S.H., Olsen, R.B., Orr, L.L., and Stuart, E.A. (in press). Estimates of external validity bias when impactevaluations select sites non-randomly. Forthcoming in Education Evaluation and Policy Analysis.
Cole, S.R. and Stuart, E.A. (2010). Generalizing evidence from randomized clinical trials to target populations: theACTG-320 trial. American Journal of Epidemiology 172: 107-115.
Imai, K., King, G., and Stuart, E.A. (2008). Misunderstandings between experimentalists and observationalists aboutcausal inference. Journal of the Royal Statistical Society, Series A 171: 481-502.
Kern, H.L., Stuart, E.A., Hill, J., and Green, D.P. (2016). Assessing methods for generalizing experimental impactestimates to target populations. Journal of Research on Educational Effectiveness.
Olsen, R., Bell, S., Orr, L., and Stuart, E.A. (2013). External Validity in Policy Evaluations that Choose SitesPurposively. Journal of Policy Analysis and Management 32(1): 107-121
Stuart, E.A., Cole, S.R., Bradshaw, C.P., and Leaf, P.J. (2011). The use of propensity scores to assess thegeneralizability of results from randomized trials. The Journal of the Royal Statistical Society, Series A 174(2): 369-386.
Stuart, E.A., Bradshaw, C.P., and Leaf, P.J. (2015). Assessing the generalizability of randomized trial results to targetpopulations. Prevention Science 16(3): 475-485.
Susukida, R., Crum, R., Stuart, E.A., and Mojtabai, R. (in press). Assessing Sample Representativeness in RandomizedControl Trials: Application to the National Institute of Drug Abuse Clinical Trials Network. Forthcoming in Addiction.