
“Causal Inference and the Role of Machine Learning”

David Benkeser

Assistant Professor, Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University

Disclosure

Dr. Benkeser discloses that this image was not used in his talk.

Basic causal inference

Observed data:
- W = putative confounders
- A = binary treatment
- Y = binary outcome

Counterfactual outcomes:
- Y(1) = outcome we would see under treatment A = 1
- Y(0) = outcome we would see under treatment A = 0

Population-level causal effects:
- Average treatment effect: E[Y(1) – Y(0)]

[Causal diagram with nodes W, A, and Y]
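This setup lends itself to a tiny simulation. Below is a hedged sketch in Python (the data-generating mechanism, coefficients, and sample size are invented for illustration, not from the talk): under confounding by W, the naive difference in means is biased for E[Y(1) – Y(0)], while standardizing over W recovers it.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    W = rng.binomial(1, 0.5, n)                   # putative confounder
    A = rng.binomial(1, 0.2 + 0.6 * W)            # treatment more likely when W = 1
    Y = rng.binomial(1, 0.1 + 0.2 * A + 0.4 * W)  # outcome depends on treatment and W

    # True ATE built into this mechanism: E[Y(1) - Y(0)] = 0.2
    naive = Y[A == 1].mean() - Y[A == 0].mean()   # confounded comparison, roughly 0.44
    standardized = sum(
        (Y[(A == 1) & (W == w)].mean() - Y[(A == 0) & (W == w)].mean()) * np.mean(W == w)
        for w in (0, 1)
    )
    print(f"naive: {naive:.3f}, standardized over W: {standardized:.3f}")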

Pyramid of Good Science (patent pending):
- Data: Are the right data captured on the right people?
- Estimand: Does the estimand answer the questions of interest?
- Pre-specification: Is the question answered in a reliable way?
- Methods: Are the methods robust?

Randomized vs. observational studies

Why are randomized trials the gold standard of evidence?
• Data are collected specifically to answer the question
• Simple estimands* for drawing causal inference
• Regulatory oversight of analysis
• Simple methods for drawing causal inference

How can observational/secondary data analysis have similar rigor?
• Scrutinize data source and assumptions
• Scrutinize estimands
• Pre-specify analyses
• Use robust methodology

*assuming no missing data, perfect compliance, etc…

Roadmap for causal inference

1. Specify a causal model representing scientific knowledge.

2. Specify observed data and link to causal model.

3. Specify causal question and causal parameter.

4. Assess identifiability of quantity of interest.

5. Specify a statistical parameter and statistical model.

6. Estimate statistical parameter with robust, pre-specified approach.

7. Interpret results.

Roadmap for causal inference, step 1: Specify a causal model representing scientific knowledge.

Causal models

[Causal diagram relating txhistory, -omics, age, gender, imaging, clinic, tx, and outcomes]

Causal models encode what we know about our experiment.


Not just what is measured, but what is not measured.

Not just arrows that are there, but arrows that are not there.

[Unmeasured variables added to the diagram: preference, labs]

Causal models should make us uncomfortably aware of how little we know and/or how little we measured.
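One way to make this concrete is to write the assumed model down as an explicit graph object, so that every arrow, every omitted arrow, and every unmeasured node is visible and reviewable. A hypothetical sketch using networkx; the edge set below is invented purely for illustration and is exactly the kind of assumption the slide warns about.

    import networkx as nx

    dag = nx.DiGraph()
    measured = ["txhistory", "-omics", "age", "gender", "imaging", "clinic"]
    unmeasured = ["preference", "labs"]

    # Assumed (illustrative) arrows: every listed variable may affect both treatment
    # and outcome, and treatment affects outcome. Each edge is a claim to defend;
    # each missing edge is an even stronger claim.
    for v in measured + unmeasured:
        dag.add_edge(v, "tx")
        dag.add_edge(v, "outcomes")
    dag.add_edge("tx", "outcomes")

    assert nx.is_directed_acyclic_graph(dag)
    print(sorted(dag.predecessors("tx")))   # everything we claim influences treatment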

Roadmap for causal inference, step 3: Specify causal question and causal parameter.

Causal interventions

How would we modify the experiment in an ideal world?

Examples:
- Everyone (no one) gets this season's flu vaccine
  - E.g., average treatment effect
- Everyone not contraindicated gets the flu vaccine
  - E.g., treatment rule
- Everyone gets slightly higher odds of receiving the vaccine
  - E.g., incremental propensity score intervention
- Everyone gets a slightly higher dose of drug than they otherwise would receive
  - E.g., stochastic interventions
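For concreteness, the "slightly higher odds" example is commonly formalized by multiplying each person's odds of treatment by a factor delta, so the shifted propensity becomes delta*pi(W) / (delta*pi(W) + 1 - pi(W)). A minimal sketch of that shift (the propensity values and delta below are hypothetical):

    import numpy as np

    def shifted_propensity(pi: np.ndarray, delta: float) -> np.ndarray:
        """Propensity under the intervention that multiplies the odds of treatment by delta."""
        return delta * pi / (delta * pi + 1.0 - pi)

    pi = np.array([0.10, 0.50, 0.90])           # hypothetical observed propensities
    print(shifted_propensity(pi, delta=1.5))    # everyone's odds of treatment raised by 50%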

Roadmap for causal inference, step 4: Assess identifiability of quantity of interest.

Identifiability

What do we need to assume to estimate causal parameters using observed data?

Average treatment effect: E[Y(1) – Y(0)]

Counterfactual data assumptions: Conditional exchangeability
- (Y(1), Y(0)) ⊥ A | W

Observed data assumptions: Positivity
- 0 < P(A = 1 | W) < 1
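Of the two assumptions, only positivity can be probed with the observed data, typically by inspecting estimated propensity scores for values near 0 or 1. A hedged sketch of that kind of diagnostic, with hypothetical data and a simple logistic propensity model standing in for whatever model would actually be pre-specified:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    W = rng.normal(size=(5_000, 3))                     # hypothetical confounders
    A = rng.binomial(1, 1 / (1 + np.exp(-W[:, 0])))     # hypothetical treatment assignment

    pscore = LogisticRegression().fit(W, A).predict_proba(W)[:, 1]
    print("estimated P(A = 1 | W) range:", pscore.min().round(3), "to", pscore.max().round(3))
    print("share with near-violations:", np.mean((pscore < 0.05) | (pscore > 0.95)).round(3))
    # Conditional exchangeability, in contrast, cannot be checked from the data;
    # it has to be argued from the causal model (step 1 of the roadmap).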

Roadmap for causal inference, step 5: Specify a statistical parameter and statistical model.

Statistical parameter

IF we can use the observed data to draw causal inference, how? What do we estimate?

E[Y(1) – Y(0)] = Σ_w (E[Y | A = 1, W = w] – E[Y | A = 0, W = w]) P(W = w)

(subgroup-specific effect × subgroup size)
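A minimal sketch of the plug-in version of this formula for a discrete W, using empirical means for E[Y | A = a, W = w] and empirical frequencies for P(W = w). The toy data are invented for illustration.

    import pandas as pd

    df = pd.DataFrame({
        "W": [0, 0, 0, 0, 1, 1, 1, 1],
        "A": [0, 0, 1, 1, 0, 0, 1, 1],
        "Y": [0, 1, 1, 1, 0, 0, 1, 0],
    })

    ate = 0.0
    for w, grp in df.groupby("W"):
        diff = grp.loc[grp.A == 1, "Y"].mean() - grp.loc[grp.A == 0, "Y"].mean()
        ate += diff * (len(grp) / len(df))    # subgroup-specific effect x subgroup size
    print(ate)                                # 0.5 for this toy dataset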

Roadmap for causal inference, step 6: Estimate statistical parameter with robust, pre-specified approach.

Estimation

Identification expresses our counterfactual quantities of interest as quantities defined using the observed data.
- This required causal assumptions, which cannot be relaxed for free.

This is certainly progress: we can answer our scientific questions of interest using the observed data!

Estimation

Practitioners make statistical assumptions of varying degrees to tackle the resulting estimation/inference problem.

- Most of these are verifiable and thus unnecessary (except for convenience).

E[Y(1) – Y(0)] = Σ_w (E[Y | A = 1, W = w] – E[Y | A = 0, W = w]) P(W = w)

Linear/logistic regression?

Estimation

We can instead use modern statistical learning to reduce the risk of misleading conclusions due to inappropriate statistical assumptions.

E[Y(1) – Y(0)] = Σ_w (E[Y | A = 1, W = w] – E[Y | A = 0, W = w]) P(W = w)

Machine learning?

Machine learning

Other identifications lead to alternative estimation strategies.
- E.g., inverse probability of treatment weighting, propensity score matching

Many estimators are built on an estimate of:
- "Outcome regression" = E[Y | A, W]
  - G-computation, standardization
- "Propensity score" = P(A | W)
  - IPTW, matching
- Or both (doubly robust)
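A compressed, hypothetical sketch of how these pieces fit together: estimate the two nuisance functions (plain logistic regressions stand in here for whatever learner is actually chosen), then form G-computation, IPTW, and doubly robust (AIPW) estimators of the average treatment effect. Data and models are illustrative only.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    n = 10_000
    W = rng.normal(size=(n, 1))
    A = rng.binomial(1, 1 / (1 + np.exp(-W[:, 0])))
    Y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * A + W[:, 0] - 0.5))))

    or_fit = LogisticRegression().fit(np.column_stack([A, W]), Y)     # outcome regression E[Y | A, W]
    ps = LogisticRegression().fit(W, A).predict_proba(W)[:, 1]        # propensity score P(A = 1 | W)
    Q1 = or_fit.predict_proba(np.column_stack([np.ones(n), W]))[:, 1]
    Q0 = or_fit.predict_proba(np.column_stack([np.zeros(n), W]))[:, 1]

    gcomp = np.mean(Q1 - Q0)                                          # G-computation / standardization
    iptw = np.mean(A * Y / ps - (1 - A) * Y / (1 - ps))               # IPTW
    aipw = np.mean(Q1 - Q0 + A * (Y - Q1) / ps - (1 - A) * (Y - Q0) / (1 - ps))  # doubly robust
    print(round(gcomp, 3), round(iptw, 3), round(aipw, 3))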

Machine learning

How can we decide a priori how to estimate these quantities?

Many choices available for fitting regressions:
- parametric regression (logistic, linear, splines)
- additive models, partially linear additive models
- machine learning, deep learning 😍😍

The best approach depends on the truth! What to do?

Regression stacking (super learning) is particularly appealing.
- Pre-specified, objective competition between regression estimators
- All types of estimators can be included
- Oracle inequalities endow some optimality to the procedure
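A hedged sketch of the simplest (discrete) version of this idea: a pre-specified, cross-validated competition among candidate learners for the outcome regression. The candidate library and data below are illustrative; dedicated implementations (e.g., the SuperLearner or sl3 R packages) take an optimally weighted combination of the candidates' cross-validated predictions rather than a single winner.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(3)
    X = rng.normal(size=(2_000, 4))                      # stands in for (A, W)
    Y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0] * X[:, 1])))

    # Pre-specified library of candidate estimators (illustrative choices).
    library = {
        "logistic": LogisticRegression(),
        "forest": RandomForestClassifier(n_estimators=200, min_samples_leaf=20, random_state=0),
    }
    cv_loss = {
        name: -cross_val_score(est, X, Y, cv=10, scoring="neg_log_loss").mean()
        for name, est in library.items()
    }
    winner = min(cv_loss, key=cv_loss.get)               # objective, pre-specified competition
    print(cv_loss, "-> selected:", winner)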

Machine learning

Regardless of your learning approach, there is a challenge associated with using machine learning to draw statistical inference:

How should we select tuning parameters?

Machine learning

Cross-validation selects tuning parameters that are best for estimating, e.g., E[Y | A, W].
- But we need a good estimate of E(E[Y | A = a, W])

This generally results in estimators with too much bias.
- Undersmooth? Difficult in practice.

Doubly robust methods generally allow for tuning parameter selection using cross-validation.
- Can be used in fully pre-specified analyses!
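A sketch of the sample-splitting (cross-fitting) device that commonly accompanies this point: nuisance regressions are tuned by cross-validation and fit on one fold, then evaluated on the held-out fold when forming the doubly robust estimator. Everything below (data, learners, two folds) is hypothetical and illustrative.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(4)
    n = 4_000
    W = rng.normal(size=(n, 2))
    A = rng.binomial(1, 1 / (1 + np.exp(-W[:, 0])))
    Y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * A + W[:, 0]))))

    AW = np.column_stack([A, W])
    AW1 = np.column_stack([np.ones(n), W])      # counterfactual design: A set to 1
    AW0 = np.column_stack([np.zeros(n), W])     # counterfactual design: A set to 0

    eif = np.zeros(n)    # per-subject estimated influence values, filled fold by fold
    for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(W):
        ps = np.clip(
            GradientBoostingClassifier().fit(W[train], A[train]).predict_proba(W[test])[:, 1],
            0.01, 0.99,
        )
        or_fit = GradientBoostingClassifier().fit(AW[train], Y[train])
        Q1 = or_fit.predict_proba(AW1[test])[:, 1]
        Q0 = or_fit.predict_proba(AW0[test])[:, 1]
        a, y = A[test], Y[test]
        eif[test] = Q1 - Q0 + a * (y - Q1) / ps - (1 - a) * (y - Q0) / (1 - ps)

    ate, se = eif.mean(), eif.std(ddof=1) / np.sqrt(n)
    print(f"cross-fitted AIPW estimate: {ate:.3f} (95% CI half-width {1.96 * se:.3f})")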

Current research topics

Doubly robust inference
- van der Laan (2014) Int. J. Biost.; Benkeser et al. (2017) Biometrika

“Double machine learning”
- Zheng et al. (2011) Targeted Learning; Chernozhukov (2017) Econom. J.

Making the bootstrap “work” for ML estimators
- Wager and Athey (2015); Cai et al. (2019+) arXiv:1905.10299

Counterfactual fairness in ML
- Kusner et al. (2017) NIPS Proc.

Pyramid of Good Science (patent pending), revisited:
- Data: Are the right data captured on the right people?
- Estimand: Does the estimand answer the questions of interest?
- Pre-specification: Is the question answered in a reliable way?
- Methods: Are the methods robust? Machine learning: recommended.