Propensity score modeling for matching and weighting estimators: calibration/entropy balancing approaches

Ingeborg Waernbaum

Department of Statistics, Uppsala University

Uppsala Stadstradgarden, end of May 2020

Background and summary

Estimating a causal effect (the ATE) in an observational study. The PS is used in different ways in estimators.

Stratification, matching, regression adjustment with PS.

Inverse probability of treatment weighting (IPW)

The true PS is unknown, often estimated using a parametric model. Misspecified models: bias in causal effect estimators.

PS research: Model errors and approaches to reduce them.


Focus in causal inference research on the role of model assumptions (PS and outcome regression models).

This has resulted in proposals of alternatives that aim to be less model dependent.

Also, focus on using machine learning (ML) for nonparametric estimation of the PS.

This talk is a mix of material: matching and IPW are established PS methods, entropy balancing estimators are more recent. Includes ongoing research with co-authors Emma Persson and David Kallberg and an application from the Swedish Childhood Diabetes Register.

Software is available for most methods, but not all of it is user-friendly.

Swedish Childhood Diabetes Register (SCDR)

Disease incidence register for Type 1 Diabetes Mellitus (T1DM) diagnosed at ages 0-15. Parents and siblings are included.

Linkage to several administrative registers: the Longitudinal Integration Database for Health Insurance and Labour Market Studies (LISA), the Prescription Drug Register, the Cancer Register, the Medical Birth Register and the Inpatient Hospital Register.

Matched case-control design. Studies have included risk factors for T1DM, short- and long-term complications of T1DM, and economic and educational outcomes for T1DM cases, parents and siblings.

Notation, as earlier

Binary treatment, Z (1=treatment, 0=controls).

X, pre-treatment variables, covariates.

Potential outcomes, Y(1), Y(0)

Observed outcome Y = Z · Y(1) + (1 − Z) · Y(0) (consistency assumption)

Estimand of interest β = E[Y(1)− Y(0)]

Propensity score: e(X) = P(Z = 1|X)

Assumptions:
Generic: SUTVA, consistency.
Identification: no unmeasured confounding and overlap.

Propensity score theory

Biometrika (1983), 70, 1, pp. 41-55
Printed in Great Britain

The central role of the propensity score in observational studies for causal effects

BY PAUL R. ROSENBAUM
Departments of Statistics and Human Oncology, University of Wisconsin, Madison, Wisconsin, U.S.A.

AND DONALD B. RUBIN
University of Chicago, Chicago, Illinois, U.S.A.

SUMMARY

The propensity score is the conditional probability of assignment to a particular treatment given a vector of observed covariates. Both large and small sample theory show that adjustment for the scalar propensity score is sufficient to remove bias due to all observed covariates. Applications include: (i) matched sampling on the univariate propensity score, which is a generalization of discriminant matching, (ii) multivariate adjustment by subclassification on the propensity score where the same subclasses are used to estimate treatment effects for all outcome variables and in all subpopulations, and (iii) visual representation of multivariate covariance adjustment by a two-dimensional plot.

Some key words: Covariance adjustment; Direct adjustment; Discriminant matching; Matched sampling; Nonrandomized study; Standardization; Stratification; Subclassification.

1. DEFINITIONS

1.1. The structure of studies for causal effects

Inferences about the effects of treatments involve speculations about the effect one treatment would have had on a unit which, in fact, received some other treatment. We consider the case of two treatments, numbered 1 and 0. In principle, the ith of the N units under study has both a response $r_{1i}$ that would have resulted if it had received treatment 1, and a response $r_{0i}$ that would have resulted if it had received treatment 0. In this formulation, causal effects are comparisons of $r_{1i}$ and $r_{0i}$, for example $r_{1i} - r_{0i}$ or $r_{1i}/r_{0i}$. Since each unit receives only one treatment, either $r_{1i}$ or $r_{0i}$ is observed, but not both, so comparisons of $r_{1i}$ and $r_{0i}$ imply some degree of speculation. In a sense, estimating the causal effects of treatments is a missing data problem, since either $r_{1i}$ or $r_{0i}$ is missing.

This formulation is that used in the literature of experimental design, for example, in the books by Fisher (1951) and Kempthorne (1952), and follows the development by Rubin (1974, 1977, 1978, 1980a); Hamilton (1979) adopts a similar approach. The structure would not be adequate when, for example, the response of unit i to treatment t depends on the treatment given to unit j, as could happen if they compete for resources. The assumption that there is a unique value $r_{ti}$ corresponding to unit i and treatment t has been called the stable unit-treatment value assumption (Rubin, 1980a), and will be ...




[Figure: diagram of the maps g and e from X to g(X) and e(X), with b(X) = {g(X), e(X)}]

Misspecifications and bias


A matching-type estimator:

$$E[Y(1) - Y(0)] = E_{e(X)}\big[\,E[Y(1) - Y(0) \mid e(X) = e(x)]\,\big]$$

Gives robustness if a misspecified propensity score e(X) ≈ b(X) (and also for a larger class of functions, e.g., prognostic balance).


Algorithm for PS modeling

The balancing property is emphasized in PS modeling.

Stepwise algorithm:

1. Start with the covariates required by subject matter knowledge (and keep them)

2. Fit a linear model for a large pool of covariates

3. Add higher-order terms and interactions

4. Evaluate balance (e.g. two-way matching balance)

Iterate between steps 3 and 4.

The algorithm selects a model that produces the balance measured by the chosen balance criteria.


Two-way matching balance

Balance: X ⊥⊥ Z | e(X)

Often implemented by evaluating some aspects of the univariate distributions of X = (X1, ..., Xj, ..., Xp) in the matched sample,

$$F(X_j \mid Z = 1, e(X) = e(x)) = F(X_j \mid Z = 0, e(X) = e(x)),$$

for example standardized mean differences.
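As a hedged illustration of this balance check, the sketch below computes the absolute standardized difference in means for a single covariate. It assumes the common convention of dividing by sqrt((s1^2 + s0^2)/2); the function name and the toy data are hypothetical and not taken from the talk.

```python
import math
from statistics import mean, variance

def abs_std_mean_diff(x_treated, x_control):
    """Absolute standardized difference in means for one covariate."""
    # Pooled standard deviation: square root of the average group variance.
    pooled_sd = math.sqrt((variance(x_treated) + variance(x_control)) / 2)
    return abs(mean(x_treated) - mean(x_control)) / pooled_sd

# Hypothetical covariate values in a matched sample.
treated = [34.0, 41.0, 29.0, 50.0, 38.0]
control = [36.0, 40.0, 31.0, 47.0, 39.0]
smd = abs_std_mean_diff(treated, control)
print(round(smd, 3))
```

In practice the same quantity would be computed for every covariate Xj, before and after matching.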


Matching R Software

Matching: Multivariate and Propensity Score Matching with Balance Optimization (Sekhon, 2020).

MatchIt: Nonparametric Preprocessing for Parametric Causal Inference (Imai, 2018). A variety of matching and subclassification methods without outcome analysis.

designmatch: Matched samples that are balanced and representative by design (Zubizarreta, 2018).


[Figure: absolute standardized differences in means (axis from 0.0 to 1.0), before and after matching, for the covariates refugees, flatsmiss, flats, rubblemiss, rubble, indrate, emprate, popgrowth3339, popgrowth1939 and log2pop]


IPW estimators

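Under no unmeasured confounding and overlap,

$$\beta = E\left[\frac{ZY}{e(X)}\right] - E\left[\frac{(1-Z)Y}{1-e(X)}\right] = E[Y(1) - Y(0)].$$

Motivating an estimator of the type:

$$\hat\beta_{IPW} = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{Z_i Y_i}{e(X_i)} - \frac{(1-Z_i)Y_i}{1-e(X_i)}\right).$$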
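A minimal sketch of the simple IPW estimator, written with the standard library only. The simulated design (one covariate, logistic propensity score, true ATE of 2) is an assumption made for this illustration, and the true propensity score is plugged in, whereas in practice it would be estimated.

```python
import math
import random

def ipw_ate(z, y, e):
    """Simple IPW estimate: mean of Z*Y/e(X) - (1-Z)*Y/(1-e(X))."""
    n = len(y)
    return sum(zi * yi / ei - (1 - zi) * yi / (1 - ei)
               for zi, yi, ei in zip(z, y, e)) / n

# Hypothetical data generating process with confounding through x (true ATE = 2).
random.seed(1)
n = 20000
x = [random.gauss(0, 1) for _ in range(n)]
e_true = [1 / (1 + math.exp(-0.5 * xi)) for xi in x]
z = [1 if random.random() < ei else 0 for ei in e_true]
y = [1 + 2 * zi + xi + random.gauss(0, 1) for zi, xi in zip(z, x)]

# Unadjusted difference in group means, for comparison.
naive = (sum(yi for zi, yi in zip(z, y) if zi) / sum(z)
         - sum(yi for zi, yi in zip(z, y) if not zi) / (n - sum(z)))
beta_ipw = ipw_ate(z, y, e_true)
print(f"naive difference: {naive:.2f}, IPW: {beta_ipw:.2f}")
```

The unadjusted difference is confounded by x, while the weighted estimate should be close to the true effect in this design.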


PS modeling for weighting estimators

Stepwise algorithm:

1. Start with the covariates required by subject matter knowledge (and keep them)

2. Fit a linear model for a large pool of covariates

3. Add higher-order terms and interactions

4. Evaluate balance (e.g. two-way weighting balance)

Iterate between steps 3 and 4.

The algorithm selects a model that produces the balance measured by the chosen balance criteria.


Two-way weighting balance

Evaluating aspects of

$$F\left[\frac{Z X_j}{e(X)}\right] = F\left[\frac{(1-Z) X_j}{1-e(X)}\right],$$

for example standardized mean differences of the weighted variables.
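The weighted balance check can be sketched as follows. This is an illustration under assumptions: the weights are normalized within each group, the difference is standardized by the full-sample standard deviation of the covariate (one of several conventions), and the simulated data are hypothetical.

```python
import math
import random
from statistics import stdev

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def weighted_std_diff(x, z, e):
    """Standardized difference in weighted covariate means (treated minus controls)."""
    w1 = [zi / ei for zi, ei in zip(z, e)]              # weights Z / e(X)
    w0 = [(1 - zi) / (1 - ei) for zi, ei in zip(z, e)]  # weights (1 - Z) / (1 - e(X))
    return (weighted_mean(x, w1) - weighted_mean(x, w0)) / stdev(x)

random.seed(2)
n = 20000
x = [random.gauss(0, 1) for _ in range(n)]
e = [1 / (1 + math.exp(-0.5 * xi)) for xi in x]
z = [1 if random.random() < ei else 0 for ei in e]

raw_diff = weighted_std_diff(x, z, [0.5] * n)  # constant PS: plain group comparison
ipw_diff = weighted_std_diff(x, z, e)          # true PS: approximately balanced
print(f"unweighted: {raw_diff:.2f}, weighted: {ipw_diff:.2f}")
```

A misspecified propensity score would typically leave a weighted difference that does not shrink towards zero, which is what step 4 of the algorithm looks for.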


Misspecification and bias for IPW-estimators

Asymptotic error:

$$\hat\beta_{IPW} - \beta \xrightarrow{P} E\left[\frac{e(X)}{e^*(X)}\,\beta_1(X)\right] - E\left[\frac{1-e(X)}{1-e^*(X)}\,\beta_0(X)\right] - (\beta_1 - \beta_0),$$

where e∗(X) is the limiting PS under model misspecification.

Can be large for minor misspecifications of the PS.
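To see why a minor misspecification can matter, the limit can be rearranged. The sketch below assumes the definitions β_z(X) = E[Y(z) | X] and β_z = E[Y(z)] for z = 0, 1 (not stated on the slide), so that E[β_z(X)] = β_z.

```latex
\begin{align*}
E\left[\frac{e(X)}{e^*(X)}\,\beta_1(X)\right] - \beta_1
  &= E\left[\frac{e(X)-e^*(X)}{e^*(X)}\,\beta_1(X)\right],\\
E\left[\frac{1-e(X)}{1-e^*(X)}\,\beta_0(X)\right] - \beta_0
  &= -E\left[\frac{e(X)-e^*(X)}{1-e^*(X)}\,\beta_0(X)\right],
\end{align*}
so the asymptotic error equals
\[
E\left[\{e(X)-e^*(X)\}
  \left\{\frac{\beta_1(X)}{e^*(X)} + \frac{\beta_0(X)}{1-e^*(X)}\right\}\right].
\]
```

The error vanishes when e∗(X) = e(X), and a small difference e(X) − e∗(X) is inflated wherever e∗(X) is close to 0 or 1.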


R Software

PSW: Propensity Score Weighting Methods for Dichotomous Treatments, Huzhang Mao (2018). Normalized IPW.

iWeigReg: Improved methods for causal inference and missing data problems, Zhiqiang Tan (2015). Simple IPW.

CBPS: Covariate Balancing Propensity Score, Christian Fong (2019). Combines parametric IPW with two-way balancing of covariates; has a nonparametric option.


Weighting based on three-way balance

The propensity score has a property of three-way balance: for each function of the covariates u(X) with E[u(X)] < ∞,

$$E\left[\frac{Z\,u(X)}{e(X)}\right] = E\left[\frac{(1-Z)\,u(X)}{1-e(X)}\right] = E[u(X)]. \qquad (1)$$
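The identity follows from iterated expectations, using only e(X) = P(Z = 1 | X) and overlap (0 < e(X) < 1). A short sketch of the argument:

```latex
\begin{align*}
E\left[\frac{Z\,u(X)}{e(X)}\right]
  &= E\left[\frac{u(X)}{e(X)}\,E[Z \mid X]\right]
   = E\left[\frac{u(X)}{e(X)}\,e(X)\right]
   = E[u(X)],\\
E\left[\frac{(1-Z)\,u(X)}{1-e(X)}\right]
  &= E\left[\frac{u(X)}{1-e(X)}\,\{1-E[Z \mid X]\}\right]
   = E\left[\frac{u(X)}{1-e(X)}\,\{1-e(X)\}\right]
   = E[u(X)].
\end{align*}
```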

Entropy balancing/Calibration estimators

Calibration/entropy balance

Instead of iteratively evaluating balance, a class of calibration/entropy balancing estimators directly enforces covariate balance (for the ATT, Hainmueller 2012), with the aim of reducing model dependence. (Stata package ebalance, R package ebal.)

Recent proposals and extensions (ATE): Chan et al. (2016), Kallberg and Waernbaum (2019), and Tan (2020) for high-dimensional data. R packages ATE and RCAL.

The estimators target three-way balance (treated, control and the combined group).


Entropy Balancing for estimation of β

EB estimators directly derive weights w1, ..., wN such that a sample version of three-way balance holds for a user-specified selection of functions denoted by U_K(X) = (u1(X), ..., uk(X), ..., uK(X)):

$$\frac{1}{N}\sum_{i=1}^{N} w_i Z_i u_k(X_i) = \frac{1}{N}\sum_{i=1}^{N} w_i (1-Z_i) u_k(X_i) = \frac{1}{N}\sum_{i=1}^{N} u_k(X_i), \qquad k = 1, \ldots, K.$$


The EB weights w = (w1, ..., wN) are derived by minimizing the divergence, or relative entropy, D(w||d) relative to base weights d = (d1, ..., dN):

$$\min_{w}\; D(w\|d)$$

subject to the three-way balance

$$\frac{1}{N}\sum_{i=1}^{N} w_i Z_i u_k(X_i) = \frac{1}{N}\sum_{i=1}^{N} w_i (1-Z_i) u_k(X_i) = \frac{1}{N}\sum_{i=1}^{N} u_k(X_i), \qquad k = 1, \ldots, K.$$


Users need to specify:

Divergence D(w||d)

Balance functions u1(X), ..., uK(X)

Base weights d = (d1, ..., dN) (usually uniform, di = 1)

Here two members of the Renyi α-divergences Dα(w||d) are studied:

$$D_1(w\|d) = \sum_{i=1}^{N} w_i \log(w_i/d_i) \quad \text{(Kullback-Leibler)}$$

$$D_2(w\|d) = \log\left(\sum_{i=1}^{N} w_i^2/d_i\right) \quad \text{(Quadratic)}$$
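As a rough sketch of how the Kullback-Leibler case can be computed, the example below handles a single balance function u(X) = X with uniform base weights. Under these assumptions the minimizing weights within each group are proportional to exp(λ·u_i), and λ is found by a plain Newton iteration on the balance discrepancy. The function name, the simulated data and the solver are illustrative choices, not the implementation in the ebal or ATE packages.

```python
import math
import random

def kl_balancing_weights(u, target, tol=1e-10, max_iter=100):
    """Weights p_i proportional to exp(lam * u_i) with weighted mean of u equal to target.

    p sums to 1 within the group, so w_i = N * p_i on the scale of the constraints.
    Plain Newton steps without line search: adequate when the target lies well
    inside the range of u.
    """
    c = [ui - target for ui in u]      # centre at the target
    lam = 0.0
    for _ in range(max_iter):
        m = max(lam * ci for ci in c)  # subtract the max to avoid overflow
        ex = [math.exp(lam * ci - m) for ci in c]
        total = sum(ex)
        p = [v / total for v in ex]
        gap = sum(pi * ci for pi, ci in zip(p, c))   # balance discrepancy
        if abs(gap) < tol:
            break
        var = sum(pi * ci * ci for pi, ci in zip(p, c)) - gap ** 2
        lam -= gap / var               # Newton step
    return p

# Hypothetical data: confounding through x, true ATE = 2.
random.seed(3)
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
z = [1 if random.random() < 1 / (1 + math.exp(-0.5 * xi)) else 0 for xi in x]
y = [1 + 2 * zi + xi + random.gauss(0, 1) for zi, xi in zip(z, x)]

x_bar = sum(x) / n                     # target: mean in the combined group
treated = [(xi, yi) for xi, yi, zi in zip(x, y, z) if zi == 1]
controls = [(xi, yi) for xi, yi, zi in zip(x, y, z) if zi == 0]
p1 = kl_balancing_weights([xi for xi, _ in treated], x_bar)
p0 = kl_balancing_weights([xi for xi, _ in controls], x_bar)

beta_eb = (sum(p * yi for p, (_, yi) in zip(p1, treated))
           - sum(p * yi for p, (_, yi) in zip(p0, controls)))
print(f"EB estimate: {beta_eb:.2f}")
```

Both groups are reweighted to the combined-sample mean of x, which is the three-way balance for this single balance function; with K balance functions λ becomes a vector and the Newton step uses the weighted covariance matrix.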


EB-results

EB estimators, although without explicit parametric assumptions, correspond to implicit model assumptions for the propensity score: the balance functions U_K(X) are regressors in an outcome model, and the divergence corresponds to the link function in the PS model.

The EB weights satisfy three-way covariate balance, 'treated - control - combined group'. This can be contrasted with the two-way balance, treated and control, in CBPS methods.

EB has certain robustness properties, similar to, but somewhat more restrictive than, the usual double robustness properties.


Comparative study of alternatives to parametric IPW-estimation

Background: Large sample properties of entropy balancing estimators of average causal effects, Kallberg and Waernbaum (2019). Describes properties of two specific entropy balancing estimators (Kullback-Leibler divergence, quadratic Renyi divergence). Variance estimators.

Recent proposals of calibration estimators with (sometimes) accompanying R packages.

Nonparametric IPW (RF algorithms for estimating the PS model).

EBAL-QR, EBAL-KL, IPW-logit, IPW-probit, CBPS, CBPSex, CBPS-NP, CAL-Tan, CAL2Way, IPW-RF1, IPW-RF2, RKHS, DR-logit, DR-probit.


14 estimators of the average causal effect, and 6 variance estimators.

A simulation study: replicating (and extending) three simulation designs from previous studies in the literature. The extensions include new response variables (non-constant treatment effect).

Frolich (2004), Hainmueller (2012): 9 DGPs

Lunceford and Davidian (2004): 4 DGPs

Persson et al. (2017): 2 DGPs

An application estimating the average treatment effect of school achievements on acute complications of T1DM from the Swedish Childhood Diabetes Register (SCDR).

Simulations

For all the estimators (where it can be applied), we use two sets of balancing constraints U_K(X): either the covariates (first order) or an extended set that also includes second-order terms and pairwise interactions, which aims to add flexibility to the estimators.

We analyze bias, standard errors and RMSE for each of the estimators.


Simulations: summary of results

In line with previous studies, IPW estimators under model misspecification are biased in all designs, with some exceptions.

The original CBPS estimator, which combines balance restrictions and MLE equations, sometimes shows very poor results for both bias and variance. In addition, the IPW estimator using random forest also has a large bias for many of the designs.

The EBAL and CAL-Tan estimators have small bias and variance when using the larger set of covariate functions U_K(X). The DR estimators are in line with the calibration estimators. EBAL using two-way balance shows poor results in some designs.

Application: SCDR

Application: Effect of school achievements on acute complications of T1DM

Data from the Swedish Childhood Diabetes Register. All children diagnosed with T1DM before the age of 15 who received a compulsory school grade in Sweden between 1998 and 2006.

N = 3773. Due to missing information on school grades (74 individuals) and on the mother and/or the father (185 individuals), 259 individuals (6.4%) were excluded from the study.

We use the grade attained when finishing compulsory school as a measure of learning ability. T = 1, "low grade", is defined as having received a grade lower than 165 (the 20th percentile).

The outcome is the number of times an individual has been hospitalized with a main diagnosis related to acute complications of T1DM. We adjust for 15 demographic and socioeconomic confounders.


Table: Results for the empirical diabetes data. Unadjusted treatment effect ≈ 0.95

Estimator   β      σ(w)
EBAL-QR     0.87   0.94
EBAL-KL     0.82   0.99
IPW-LOG     0.74   1.87
IPW-PROB    0.69   2.49
CBPS        0.85   0.95
CBPSex      0.83   0.88
CBPSnp      0.74   0.95
CAL-TAN     0.82   1.03
TWO-WAY     0.83   0.76
IPW-RF      0.71   1.22
RKHS        0.75   1.67
DR-LOG      0.84   *
DR-PROB     0.83   *


Table: Results using the standard deviation estimators for the diabetes data

Estimator   β      sd-est   95% CI
EBAL-QR1    0.87   0.24     (0.39, 1.34)
EBAL-KL1    0.82   0.24     (0.35, 1.29)
IPW-LOG1    0.74   0.16     (0.43, 1.05)
aIPW-LOG1   0.84   0.15     (0.55, 1.13)
CHAN-QR1    0.87   0.19     (0.50, 1.24)
CHAN-KL1    0.82   0.15     (0.53, 1.11)
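The reported intervals are consistent with the usual normal-approximation interval, estimate ± 1.96 · sd. This is an inference from the tabulated numbers rather than something stated in the talk. A small sketch, with a hypothetical helper name:

```python
def wald_ci(estimate, sd, z_crit=1.96):
    """Normal-approximation confidence interval: estimate +/- z_crit * sd."""
    half_width = z_crit * sd
    return estimate - half_width, estimate + half_width

low, high = wald_ci(0.74, 0.16)    # IPW-LOG1 row of the table
print(f"({low:.2f}, {high:.2f})")  # prints (0.43, 1.05)
```

Because the table shows rounded estimates and standard deviations, recomputed limits for other rows can differ from the tabulated ones in the last digit.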


Conclusion

In the comparative study the EBAL, CAL-Tan and DR estimators most often give the smallest bias, especially for the larger set U_K(X). The standard errors are often smallest for EBAL and CAL-Tan. RKHS performs well in most studies, in spite of producing quite different weights from the other weighting estimators.

The original CBPS (Imai and Ratkovic, 2014) and IPW based on the random forest algorithm stand out with sometimes very poor results.

The variance estimators proposed by Kallberg and Waernbaum (2019) both under- and over-estimate the true variance.

The DGPs could also be used to directly calculate numerical approximations of the asymptotic bias from Kallberg and Waernbaum (2019).


References

J. Hainmueller. Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis, 20(1):25-46, 2012.

J. K. Lunceford and M. Davidian. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in Medicine, 23(19):2937-2960, 2004.

K. Imai and M. Ratkovic. Covariate balancing propensity score. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1):243-263, 2014.

E. Persson et al. Data-driven algorithms for dimension reduction in causal inference. Computational Statistics and Data Analysis, 105:280-292, 2017.


Z. Tan. Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data. Biometrika, 107(1):137-158, 2020.

R. K. Wong and K. C. G. Chan. Kernel-based covariate functional balancing for observational studies. Biometrika, 105(1):199-213, 2018.

K. C. G. Chan, S. C. P. Yam and Z. Zhang. Globally efficient non-parametric inference of average treatment effects by empirical balancing calibration weighting. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(3):673-700, 2016.

D. Kallberg and I. Waernbaum. Large sample properties of entropy balancing estimators of average causal effects. Manuscript, 2019.
