Confounding adjustment: Ideas in Action -a case study


Xiaochun Li, Ph.D. Associate Professor Division of Biostatistics Indiana University School of Medicine


Outline

• Description of the data set
• Quantity to be estimated
• Summary of baseline characteristics
• Approaches to data analyses
• Results
• Discussion


Simulation Setup

Lindner Center data, described and analyzed in Kereiakes et al. (2000)

• 6-month follow-up data on 996 patients who underwent an initial Percutaneous Coronary Intervention (PCI) and were treated with “usual care” alone or usual care plus a relatively expensive blood thinner (IIB/IIIA cascade blocker)
• Has 10 variables
  Y: 2 outcomes, mort6mo (efficacy) and cardcost (cost)
  X: 1 treatment variable and 7 baseline covariates: stent, height, female, diabetic, acutemi, ejecfrac and ves1proc


Baseline characteristics

stent: coronary stent deployment
female: patient sex
diabetic: diabetes mellitus
acutemi: acute myocardial infarction
ves1proc: number of vessels involved in initial PCI
height: height in centimeters
ejecfrac: left ventricular ejection fraction (%)


The “LSIM10K” dataset

The simulation data set was based on the Lindner Center data

• 17 copies of the clustered Lindner data, with fudge factors added to ejfract and hgt, and some clipping; same correlation among covariates, same clustering patterns
• Contains the values of 10 simulated variables for 10,325 hypothetical patients
• To simplify analyses, the data contain no missing values
• Details and dataset are available from Bob’s website


What do we want to estimate?

The population average treatment effect (ATE), i.e.,

E(Y1) - E(Y0)

where Y1 and Y0 are counterfactual outcomes.

In plain words, a pair of “what if” scenarios: the expected response if treatment had been assigned to the entire study population, minus the expected response if control had been assigned to the entire study population.
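The distinction between the ATE and a naive comparison of observed group means can be illustrated with a minimal simulation sketch. Everything in it is hypothetical (the binary risk factor, the treatment probabilities, the effect size of -2); it is not the LSIM10K data. Both counterfactuals are generated for every patient, which is only possible in a simulation.

```python
import random

random.seed(0)

def simulate_patient():
    """Return (x, t, y0, y1, y) for one hypothetical patient."""
    x = random.random() < 0.5                      # binary baseline risk factor
    t = random.random() < (0.8 if x else 0.2)      # higher-risk patients are treated more often
    y0 = 10.0 + (5.0 if x else 0.0) + random.gauss(0, 1)  # outcome under control
    y1 = y0 - 2.0                                  # outcome under treatment (true effect: -2)
    y = y1 if t else y0                            # only one counterfactual is observed
    return x, t, y0, y1, y

def mean(values):
    values = list(values)
    return sum(values) / len(values)

patients = [simulate_patient() for _ in range(20000)]

# ATE: average of Y1 over everyone minus average of Y0 over everyone.
ate = mean(p[3] for p in patients) - mean(p[2] for p in patients)

# Naive contrast: observed outcomes of the treated minus those of the controls.
naive = mean(p[4] for p in patients if p[1]) - mean(p[4] for p in patients if not p[1])

print(f"ATE = {ate:.2f}, naive difference = {naive:.2f}")
```

In this setup the naive difference even has the wrong sign, because the treated group contains more high-risk patients.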


Baseline covariate balance assessment

Variable      C (Usual care alone)   T (Usual care + Abciximab)   P value
stent         63%                    69%                          <0.001
female        33%                    34%                          0.36
diabetic      23%                    19%                          <0.001
acutemi       7%                     15%                          <0.001
ves1proc      1.4 (±0.6)             1.3 (±0.6)                   <0.001
height (cm)   172.5 (±10)            171.5 (±10)                  <0.001
ejfract       53 (±8)                50 (±10)                     <0.001
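With more than 10,000 patients, even small imbalances produce small p-values. A complementary summary that does not depend on sample size is the standardized difference. The slides do not report it, so the sketch below is an assumption about how one might supplement the table, using the rounded summary statistics shown above as inputs.

```python
import math

def std_diff_means(mean_t, sd_t, mean_c, sd_c):
    """Standardized difference for a continuous covariate (T minus C)."""
    return (mean_t - mean_c) / math.sqrt((sd_t ** 2 + sd_c ** 2) / 2)

def std_diff_props(p_t, p_c):
    """Standardized difference for a binary covariate (T minus C)."""
    pooled_var = (p_t * (1 - p_t) + p_c * (1 - p_c)) / 2
    return (p_t - p_c) / math.sqrt(pooled_var)

# Inputs are the rounded values from the balance table.
d_ejfract = std_diff_means(50, 10, 53, 8)
d_acutemi = std_diff_props(0.15, 0.07)
d_female = std_diff_props(0.34, 0.33)

for name, d in [("ejfract", d_ejfract), ("acutemi", d_acutemi), ("female", d_female)]:
    print(f"{name}: {d:+.3f}")
```

On this scale ejfract and acutemi show much larger imbalance than female, in line with the p-values in the table.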


Visualizing overall imbalance

[Figure: covariate values in groups C and T; deep blue = high values]


Analytical Methods for confounding adjustment

The following methods were applied to lsim10k

• Outcome regression adjustment (OR)
• Propensity score (PS) stratification
• Inverse-probability-treatment-weighted (IPTW)
• Doubly robust estimation
• Matching by
  Mahalanobis distance
  PS only

ANALYSIS OF MORT6MO

OR model for mort6mo:
• treatment indicator (trtm)
• main effect terms for all seven covariates
• quadratic terms for both height and ejfract
• Residual deviance: 2410.4 on 10323 degrees of freedom

PS model:
• saturated model for the five categorical covariates (main effects and interaction terms up to fifth-order)
• main effects and quadratic terms for height and ejfract

Covariates Balance Evaluations based on PS Quintiles

[Figures: covariate balance by PS quintile for stent, female, diabetic, acutemi and ves1proc]

Height

[Figure: height by PS quintile; strata 2 (0.95 cm) and 3 (-1.50 cm)]

• Existence of residual confounding after adjusting for PS quintiles
• The within-stratum between-group height difference

             mean     s.d.   p
Stratum 2:   0.949    0.44   0.032
Stratum 3:   -1.497   0.43   0.0005
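The p-values above can be roughly reproduced from the other two columns. The sketch below assumes that the "s.d." column is the standard error of the within-stratum difference and that a two-sided normal approximation is adequate; the slides do not say which test was actually used, so small discrepancies are expected.

```python
import math

def two_sided_p(estimate, se):
    """Two-sided p-value for estimate/se under a standard normal reference."""
    z = estimate / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))

p_stratum2 = two_sided_p(0.949, 0.44)
p_stratum3 = two_sided_p(-1.497, 0.43)

print(f"Stratum 2: p = {p_stratum2:.4f}")
print(f"Stratum 3: p = {p_stratum3:.4f}")
```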

19

Ejfractstrata 1 (0.81), 2 (-1.32) and 3 (-0.72)

20

• Existence of residual confounding after adjusting for PS quintiles

• The within-strata between-group height difference mean s.d. p-value

Stratum 1: 0.812 0.41 0.0475

Stratum 2: -1.322 0.33 7.38e-5

Stratum 3: -0.721 0.32 0.025

Ejfract


PS Stratification

• Residual confounding within strata
• In the PS stratification method, height and ejfract are further adjusted; the model includes
  a stratum-specific treatment effect
  height and ejfract main effects and their quadratic terms
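The basic stratification estimator can be sketched as follows. This is a simplified illustration on hypothetical simulated data with a known propensity score: it takes a plain difference in means within each PS quintile and averages the five differences by stratum size. It omits the additional within-stratum regression adjustment for height and ejfract described above.

```python
import random

random.seed(1)
TRUE_EFFECT = -1.0

# Hypothetical data: one confounder x drives both treatment and outcome.
rows = []
for _ in range(10000):
    x = random.random()
    ps = 0.1 + 0.8 * x                    # true propensity score (known in this sketch)
    t = 1 if random.random() < ps else 0
    y = 3.0 * x + TRUE_EFFECT * t + random.gauss(0, 1)
    rows.append((ps, t, y))

def diff_in_means(data):
    treated = [y for _, t, y in data if t == 1]
    control = [y for _, t, y in data if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def stratified_effect(data, n_strata=5):
    """Average of within-stratum differences, weighted by stratum size."""
    ordered = sorted(data)                # tuples sort by ps first
    n = len(ordered)
    total = 0.0
    for k in range(n_strata):
        stratum = ordered[k * n // n_strata:(k + 1) * n // n_strata]
        total += len(stratum) / n * diff_in_means(stratum)
    return total

naive = diff_in_means(rows)
stratified = stratified_effect(rows)
print(f"naive = {naive:.3f}, PS quintiles = {stratified:.3f}, truth = {TRUE_EFFECT}")
```

Quintiles remove most of the bias but not all of it, because x still differs slightly between groups within each stratum. That leftover bias is the residual confounding that motivates the extra adjustment.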


Results – mort6mo

Method               u1      u0      Δ        SE
Outcome Regression   0.010   0.043   -0.032   0.0038
PS strat.            0.012   0.044   -0.033   0.0039
IPTW1                0.011   0.045   -0.034   0.0038
IPTW2                0.011   0.045   -0.034   0.0037
DR                   0.011   0.043   -0.032   0.0037
Match: Mahalanobis   NA      NA      -0.037   0.0044
Match: PS            NA      NA      -0.036   0.0039

True Δ = -0.036

Results of all methods are consistent, providing evidence of treatment effectiveness at preventing death at 6 months.

ANALYSIS OF CARDCOST

cardcost model:
• treatment indicator (trtm)
• main effect terms for all seven covariates
• quadratic terms for both height and ejfract

PS model: same as before

cardcost model of CA with PS stratification:
• stratum-specific treatment effect
• height and ejfract main effects and their quadratic terms

Model checking – OR: Adjusted R-squared: 0.0386

Model checking – OR (log transformed): Adjusted R-squared: 0.0693

Results – cardcost

Method                u1      u0      Δ      SE
OR: original scale    15308   15300   8      210
OR: log transformed   13536   13702   -166   111
PS strat.             13580   13639   -59    119
IPTW1                 15545   15226   -319   409
IPTW2                 15408   15303   -105   229
DR                    15393   15292   -101   226
Match: Mahalanobis    NA      NA      150    178
Match: PS             NA      NA      -3     215


Discussion

• All methods give consistent results on the 2 outcomes
• All PS-based results have similar variance except IPTW1
• IPTW estimators depend on an approximately correct PS model
• OR depends on an approximately correct outcome model
• DR is a fortuitous combination of OR and IPTW: it depends on only one of the two models being right
• Nonparametric estimation of either model may be an alternative to parametric models
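The slides do not define IPTW1 and IPTW2. A common convention, assumed here and not confirmed by the source, is that IPTW1 divides the weighted sums by the sample size while IPTW2 divides by the sum of the weights. Under that convention the larger SE of IPTW1 in the cardcost table would be expected. The sketch below uses hypothetical simulated data with a known PS.

```python
import random

random.seed(4)
TRUE_EFFECT = -1.0

# Hypothetical data: one confounder x drives both treatment and outcome.
data = []
for _ in range(20000):
    x = random.random()
    ps = 0.1 + 0.8 * x                    # true propensity score (known in this sketch)
    t = 1 if random.random() < ps else 0
    y = 3.0 * x + TRUE_EFFECT * t + random.gauss(0, 1)
    data.append((ps, t, y))

def iptw_effect(data, normalize):
    """Weighted difference in means with weights 1/ps (treated) and 1/(1-ps) (control)."""
    n = len(data)
    sum_w1 = sum(t / ps for ps, t, _ in data)
    sum_w0 = sum((1 - t) / (1 - ps) for ps, t, _ in data)
    sum_w1y = sum(t * y / ps for ps, t, y in data)
    sum_w0y = sum((1 - t) * y / (1 - ps) for ps, t, y in data)
    if normalize:
        # Assumed "IPTW2": divide by the sum of the weights.
        return sum_w1y / sum_w1 - sum_w0y / sum_w0
    # Assumed "IPTW1": divide by the sample size.
    return sum_w1y / n - sum_w0y / n

iptw1 = iptw_effect(data, normalize=False)
iptw2 = iptw_effect(data, normalize=True)
print(f"IPTW1 = {iptw1:.3f}, IPTW2 = {iptw2:.3f}, truth = {TRUE_EFFECT}")
```

Both versions rely on the PS being approximately right; here it is exactly right by construction.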


Double Robustness

Method   PS      outcome   Δ      SE
IPTW2    wrong   NA        464    214
DR       wrong   wrong     463    217
DR       wrong   right     166    214
DR       right   wrong     -131   233

• wrong PS model: adjust for one covariate ‘acutemi’ only
• wrong OR model for cardcost: adjust for the treatment indicator ‘trtm’ and the ‘acutemi’ covariate

By “right”, we mean approximately right.
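The double-robustness property can be reproduced in a small sketch. It uses one common form of the doubly robust (augmented IPTW) estimator, which may differ in detail from the one used for the table, and hypothetical simulated data with a single binary confounder. A "wrong" model here simply ignores the confounder.

```python
import random

random.seed(2)
TRUE_EFFECT = -2.0

# Hypothetical data: binary confounder x raises both treatment odds and outcome.
data = []
for _ in range(20000):
    x = 1 if random.random() < 0.5 else 0
    t = 1 if random.random() < (0.8 if x else 0.2) else 0
    y = 10.0 + 5.0 * x + TRUE_EFFECT * t + random.gauss(0, 1)
    data.append((x, t, y))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def fit_ps(data, use_x):
    """P(T=1 | x) from cell proportions; the wrong model ignores x."""
    if not use_x:
        p = mean(t for _, t, _ in data)
        return lambda x: p
    p_by_x = {v: mean(t for x, t, _ in data if x == v) for v in (0, 1)}
    return lambda x: p_by_x[x]

def fit_outcome(data, use_x):
    """E(Y | T=a, x) from cell means; the wrong model ignores x."""
    if not use_x:
        m = {a: mean(y for _, t, y in data if t == a) for a in (0, 1)}
        return lambda a, x: m[a]
    m = {(a, v): mean(y for x, t, y in data if t == a and x == v)
         for a in (0, 1) for v in (0, 1)}
    return lambda a, x: m[(a, x)]

def dr_effect(data, ps, outcome):
    """Augmented IPTW estimate of E(Y1) - E(Y0)."""
    mu1 = mean(t * y / ps(x) - (t - ps(x)) / ps(x) * outcome(1, x)
               for x, t, y in data)
    mu0 = mean((1 - t) * y / (1 - ps(x)) + (t - ps(x)) / (1 - ps(x)) * outcome(0, x)
               for x, t, y in data)
    return mu1 - mu0

wrong_wrong = dr_effect(data, fit_ps(data, False), fit_outcome(data, False))
wrong_right = dr_effect(data, fit_ps(data, False), fit_outcome(data, True))
right_wrong = dr_effect(data, fit_ps(data, True), fit_outcome(data, False))

print(f"PS wrong, outcome wrong: {wrong_wrong:.2f}")
print(f"PS wrong, outcome right: {wrong_right:.2f}")
print(f"PS right, outcome wrong: {right_wrong:.2f}")
```

With one model right the estimate lands near the true effect of -2. With both wrong it collapses to the naive difference in means.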


Propensity score estimation

• The majority of applications in the literature use a parametric logistic regression model that assumes covariates are linear and additive on the log-odds scale; it may include selected interactions and polynomial terms
• Accurate PS estimation is impeded by
  High-dimensional covariates: which ones should we de-confound?
  Unknown functional form: how do they relate to treatment selection?
• PS model misspecification can substantially bias the estimated treatment effect
• A nonparametric approach, e.g., trees, is flexible enough to accommodate nonlinear/non-additive relationships of covariates to treatment assignment


Nonparametric regression techniques

• Generalized Boosted Models (GBM) to estimate the propensity score function
  Friedman, 2001; Madigan and Ridgeway, 2004; McCaffrey, Ridgeway, and Morral, 2004
  R package: twang
• Regression tree model to predict cardcost
  Ripley, 1996; Therneau and Atkinson, 1997
  R package: rpart


Generalized Boosted Models (GBM)

• A multivariate nonparametric regression technique
• Sum of a large set of simple regression trees modelling the log-odds: gbm finds the MLE of g(x) = log(p(x)/(1-p(x))), where p(x) = P(T=1|x)
• Predicts treatment assignment from a large number of pretreatment covariates, choosing among them adaptively
• Nonlinear
• No need to select variables
• Can model complex interactions
• Invariant to monotone transformations of x, e.g., same PS estimates whether we use age, log(age) or age^2
• Outperforms alternative methods in prediction error
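The invariance to monotone transformations comes from the trees themselves: a split depends only on the ordering of a covariate, not on its scale. The sketch below demonstrates this with a single-split regression stump, the building block of a boosted model. It is not the gbm or twang implementation, and the age data are hypothetical.

```python
import math
import random

random.seed(3)

# Hypothetical data: treatment is more likely for patients older than 60.
ages = [random.uniform(30, 90) for _ in range(300)]
treated = [1 if random.random() < (0.7 if a > 60 else 0.3) else 0 for a in ages]

def sse(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def stump_predictions(x, t):
    """Fit the best single-split stump of t on x; return fitted values per observation."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_sse, best_k = float("inf"), None
    for k in range(1, len(order)):        # split between ranks k-1 and k
        left = [t[i] for i in order[:k]]
        right = [t[i] for i in order[k:]]
        total = sse(left) + sse(right)
        if total < best_sse:
            best_sse, best_k = total, k
    left_mean = sum(t[i] for i in order[:best_k]) / best_k
    right_mean = sum(t[i] for i in order[best_k:]) / (len(order) - best_k)
    fitted = [0.0] * len(x)
    for rank, i in enumerate(order):
        fitted[i] = left_mean if rank < best_k else right_mean
    return fitted

fit_age = stump_predictions(ages, treated)
fit_log = stump_predictions([math.log(a) for a in ages], treated)
fit_sq = stump_predictions([a ** 2 for a in ages], treated)

print(fit_age == fit_log == fit_sq)
```

Because log(age) and age^2 preserve the ordering of positive ages, all three fits choose the same partition of patients and return identical fitted values.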


Results – cardcost, nonparametric approach

Method                       u1      u0      Δ      SE
DR: parametric models        15393   15292   -101   226
DR: GBM + parametric model   15303   15213   -90    210
DR: GBM + tree               15233   15356   123    172


Future research

• People try quintiles or deciles for propensity score stratification; a data-driven approach (based on the bias-variance tradeoff) is needed to choose the number of strata
• Model selection: PS model and outcome model
  Nonparametric estimation of models may be intuitive, but the properties of the resulting causal estimates are not clear

Nonparametric caveat: we still need to define a set of “confounders” based on knowledge of the causal relationships among treatment, outcome and covariates, rather than conditioning indiscriminately on all covariates that have associations with treatment and outcome
