54
Learning representations for counterfactual inference Fredrik D. Johansson* 2 , Uri Shalit* 1 , David Sontag 1 *Equal Contribution 2 1 NIPS 2016 Deep Learning Symposium December 2016

Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Learningrepresentationsforcounterfactualinference

FredrikD.Johansson*2,UriShalit*1,DavidSontag1*EqualContribution

2

1NIPS2016DeepLearningSymposiumDecember2016

Page 2: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Talktodayabouttwopapers

•FredrikD.Johansson,UriShalit,DavidSontag“LearningRepresentationsforCounterfactualInference”ICML2016

•UriShalit,FredrikD.Johansson,DavidSontag“Estimatingindividualtreatmenteffect:generalizationboundsandalgorithms”arXiv:1606.03976

Code:https://github.com/clinicalml/cfrnet

Page 3: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Causalinferencefromobservationaldata•Patient“Anna”comesinwithhypertension

• Asian,54,historyofdiabetes,bloodpressure150/95,…•Whichofthetreatments𝑡 willcauseAnnatohavelowerbloodpressure?• Calciumchannelblocker(𝑡 = 1)• ACEinhibitor(𝑡 = 0)

•Datasetofobservationaldatafrommanypatients:medications,bloodtests,pastdiagnoses,demographics…

Page 4: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Causalinferencefromobservationaldata

Howtobestuseobservationaldata for

individual-levelcausalinference?

Page 5: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Causalinferencefromobservationaldata:Jobtraining•1,000unemployedpersons• Jobtrainingprogramwithcapacityof100

• Training(𝑡 = 1)• Notraining(𝑡 = 0)

•Whoshouldgetjobtraining?• Forwhichpersonswilljobtraininghavethemostimpact?

•Observationaldataaboutthousandsofpeople:jobhistory,jobtraining,education,skills,demographics…

Page 6: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

•Datasetoffeatures,actionsandoutcomes

•Wedonotcontroltheactions•Wedonotknowthemodelgeneratingtheactions

Observationaldata

Page 7: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Causalinferencefromobservationaldataandreinforcementlearning

•Robotonthesideline,learningbyobservingotherrobotsplayingrobotfootball

•Sideline-robotdoesnotknowtheplaying-robots’internalmodel

•Formofoff-policylearning,learningfromdemonstration

Page 8: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

OutlineBackgroundModelExperimentsTheory

Page 9: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

OutlineBackgroundModelExperimentsTheory

Page 10: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

•Patient“Anna”comesinwithhypertension• Asian,54,historyofdiabetes,bloodpressure150/95,…

•Whichofthetreatments𝑡willlowerAnna’sbloodpressure?• Calciumchannelblocker(𝑡 = 1)• ACEinhibitor(𝑡 = 0)

•Datasetofobservationaldatamedications,bloodtests,pastdiagnoses,demographics…

Causalinferencefromobservationaldata:Medication

Buildaregressionmodelfrompatientfeaturesandtreatmentdecisionstobloodpressure

Page 11: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

predictedBP

predictedBP

• Buildregressionmodelfrompatientfeaturesandtreatmentdecisiontobloodpressure(BP)usingourobservationaldata

• Input:

• Compare

Regressionmodeling

Anna’sfeatures𝑥 𝑡 = 1

Anna’sfeatures𝑥

Output:

=?

𝑡 = 0

Page 12: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

𝑙𝑜𝑠𝑠(ℎ 𝑥, 𝑡 , 𝑦)… …

𝑡treatment

𝑥features

Regressionmodeling

Page 13: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

𝒍𝒐𝒔𝒔(𝒉 𝒙, 𝒕 , 𝒚)… …

𝑡treatment

𝑥features

Regressionmodeling

Page 14: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

𝑙𝑜𝑠𝑠(ℎ 𝑥, 𝑡 , 𝑦)… …

𝒕treatment

𝑥features

Regressionmodeling

Page 15: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Notsupervisedlearning!

•Thisisnotaclassicsupervisedlearningproblem•Supervisedlearningisoptimizedtopredictoutcome,nottodifferentiatetheinfluenceof𝑡 = 1 vs.𝑡 = 0

•Whatifourhigh-dimensionalmodelthrewawaythefeatureoftreatment𝑡?

•Maybethere’sconfounding:youngerpatientstendtogetmedication𝑡 = 1olderpatientstendtogetmedication𝑡 = 0

Page 16: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Potentialoutcomes(Rubin&Neyman)Foreverysample𝑥 ∈ 𝒳,andtreatment𝑡 ∈ {0,1},thereisapotentialoutcome𝑌=|𝑥

Bloodpressurehadtheyreceivedtreatment1

Bloodpressurehadtheyreceivedtreatment0 𝑌?|𝑥

𝑌@|𝑥

Individualtreatmenteffect𝑰𝑻𝑬 𝒙 := 𝔼 𝒀𝟏 − 𝒀𝟎|𝒙Weobserveonlyonepotentialoutcome,andnotatrandom!

Page 17: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Example– patientbloodpressure(BP)

Factual(observed)set(age,gender,treatment)

BPaftermedication

(40, F, 1) 𝑌@ = 140(40, M, 1) 𝑌@ = 145(65, F, 0) 𝑌? = 170(65, M, 0) 𝑌? = 175(70, F, 0) 𝑌? = 165

Features:𝑥 = (𝑎𝑔𝑒, 𝑔𝑒𝑛𝑑𝑒𝑟),𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡: 𝑡 ∈ {0,1}

Page 18: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Example– patientbloodpressure(BP)

Factual(observed)set(age,gender,treatment)

BPaftermedication

(40, F, 1) 𝑌@ = 140(40, M, 1) 𝑌@ = 145(65, F, 0) 𝑌? = 170(65, M, 0) 𝑌? = 175(70, F, 0) 𝑌? = 165

Counterfactual set(age,gender,treatment)

BPaftermedication

(40, F, 0) 𝑌? =?(40, M, 0) 𝑌? =?(65, F, 1) 𝑌@ =?(65, M, 1) 𝑌@ =?(70, F, 1) 𝑌@ =?

Features:𝑥 = (𝑎𝑔𝑒, 𝑔𝑒𝑛𝑑𝑒𝑟),𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡: 𝑡 ∈ {0,1}

Page 19: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Example– patientbloodpressure(BP)

Factual(observed)set(age,gender,treatment)

BPaftermedication

(40, F, 1) 𝑌@ = 140(40, M, 1) 𝑌@ = 145(65, F, 0) 𝑌? = 170(65, M, 0) 𝑌? = 175(70, F, 0) 𝑌? = 165

Counterfactual set(age,gender,treatment)

BPaftermedication

(40, F, 0) 𝑌? =?(40, M, 0) 𝑌? =?(65, F, 1) 𝑌@ =?(65, M, 1) 𝑌@ =?(70, F, 1) 𝑌@ =?

Features:𝑥 = (𝑎𝑔𝑒, 𝑔𝑒𝑛𝑑𝑒𝑟),𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡: 𝑡 ∈ {0,1}Predictionset

Page 20: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Factual(observed)set(age,gender,treatment)

BPaftermedication

(40, F, 1) 𝑌@ = 140(40, M, 1) 𝑌@ = 145(65, F, 0) 𝑌? = 170(65, M, 0) 𝑌? = 175(70, F, 0) 𝑌? = 165

Counterfactual set(age,gender,treatment)

BPaftermedication

(40, F, 0) 𝑌? =?(40, M, 0) 𝑌? =?(65, F, 1) 𝑌? =?(65, M, 1) 𝑌@ =?(70, F, 1) 𝑌@ =?

Predictionset

• Closelyrelatedtounsuperviseddomainadaptation

• Nosamplesfromthetestset• Can’tperformcross-validation!

Page 21: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

OutlineBackgroundModelExperimentsTheory

Page 22: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

OutlineBackgroundModelExperimentsTheory

Page 23: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

OurWork•Newneural-netbasedrepresentationlearningalgorithmwithexplicitregularizationforcounterfactualestimation

•State-of-the-artonpreviousbenchmarkandonreal-worldcausalinferencetask

•Firsterrorboundforestimatingindividualtreatmenteffect(ITE)

Page 24: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Features𝑥

Control,𝑡 = 0Treated,𝑡 = 1

Whenisthisproblemeasier?RandomizedControlledTrials

Randomizedtreatmentàcounterfactualandfactualhaveidenticaldistributions

Page 25: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Features𝑥

Control,𝑡 = 0Treated,𝑡 = 1

Whenisthisproblemharder?Observationalstudy

Treatmentassignmentnon-randomàcounterfactualandfactualhavedifferentdistributions

Page 26: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Learningmorebalancedrepresentations

Features𝑥

Control,𝑡 = 0Treated,𝑡 = 1

Page 27: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Learningmorebalancedrepresentations

Features𝑥

RepresentationΦ(𝑥)

Control,𝑡 = 0Treated,𝑡 = 1

Page 28: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Learningmorebalancedrepresentations

𝑝=VWX=WY(𝑥)

Features𝑥

RepresentationΦ(𝑥)

Control,𝑡 = 0Treated,𝑡 = 1

Page 29: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Learningmorebalancedrepresentations

𝑝Z[\=V[](𝑥)

Features𝑥

RepresentationΦ(𝑥)

Control,𝑡 = 0Treated,𝑡 = 1

Page 30: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Learningmorebalancedrepresentations

𝑝^=VWX=WY(𝑥)

Features𝑥

RepresentationΦ(𝑥)

Control,𝑡 = 0Treated,𝑡 = 1

Page 31: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Learningmorebalancedrepresentations

𝑝^Z[\=V[](𝑥)

Features𝑥

RepresentationΦ(𝑥)

Control,𝑡 = 0Treated,𝑡 = 1

Page 32: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

𝑙𝑜𝑠𝑠(ℎ 𝑥, 𝑡 , 𝑌=)… …

𝑡treatment

𝑥features

NaïveNeuralNetworkforestimatingindividualtreatmenteffect(ITE)

Page 33: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

𝑙𝑜𝑠𝑠(ℎ Φ, 𝑡 , 𝑌=)… …Φ

𝑡

RepresentationΦ

Predictionℎ

𝑡treatment

𝑥features

VanillaNeuralNetworkforCounterfactualRegression(CFR)

Page 34: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

BalancingNeuralNetworkforCounterfactualRegression(CFR)

𝑙𝑜𝑠𝑠(ℎ Φ, 𝑡 , 𝑌=)… …Φ

𝑡

𝑑𝑖𝑠𝑡(𝑝^=VWX=WY, 𝑝^Z[\=V[])

RepresentationΦ

𝑡treatment

𝑥features

Predictionℎ

Page 35: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

𝑑𝑖𝑠𝑡(𝑝^=VWX=WY, 𝑝^Z[\=V[])

RepresentationΦ

Predictionℎ

𝑑𝑖𝑠𝑡 𝑝^=VWX=WY, 𝑝^Z[\=V[] :

MMDdistance(Gretton etal.2012)Wassersteindistance(Villani2008,Cuturi 2013)

Page 36: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

𝑑𝑖𝑠𝑡(𝑝^=VWX=WY, 𝑝^Z[\=V[])

RepresentationΦ

Predictionℎ

InspiredbyDomainAdversarialNetworks(Ganin etal.,2016):

(source domain, target domain) à(treated population, control population)

𝑑𝑖𝑠𝑡 𝑝^=VWX=WY, 𝑝^Z[\=V[] :

MMDdistance(Gretton etal.2012)Wassersteindistance(Villani2008,Cuturi 2013)

Page 37: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

OutlineBackgroundModelExperimentsTheory

Page 38: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

OutlineBackgroundModelExperimentsTheory

Page 39: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

EvaluatingcounterfactualinferenceTrain-testparadigmbreaksNoobservationsfromthecounterfactual“test”setCan’tdocross-validationforhyper-parameterselection

1)Simulateddata:IHDP(Hill,2011)2)Realdata:NationalSupportedWorkstudy(LaLonde,1986,Todd&Smith2005)Theeffectofjobtrainingonemployment andincomeObservationalstudywitharandomizedcontrolledtrialsubset

Page 40: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

EvaluatingcounterfactualinferenceTrain-testparadigmbreaksNoobservationsfromthecounterfactual“test”setCan’tdocross-validationforhyper-parameterselection

1)Simulateddata:IHDP(Hill,2011)2)Realdata:NationalSupportedWorkstudy(LaLonde,1986,Todd&Smith2005)Theeffectofjobtrainingonemployment andincomeObservationalstudywitharandomizedcontrolledtrialsubset3212samples,8features incl.educationandpreviousincome

Page 41: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Evaluatingmodelswithrandomizedcontrolledtrialsdata• Wecan’tdirectlyevaluateindividualtreatmenteffect(ITE)errorbecauseweneverseethecounterfactual

• EveryITEestimatorimpliesapolicy𝐼𝑇𝐸c 𝑥 = 𝑓(𝑥)

Policy𝜋f,g:𝒳 → {0,1}Treatallpersons𝑥with𝑓 𝑥 > 𝜆, forthreshold𝜆

• Everypolicy𝜋hasapolicy-value:

𝔼 𝑌@ 𝜋 𝑥 = 1 𝑝 𝜋 = 1 + 𝔼 𝑌? 𝜋 𝑥 = 0 𝑝 𝜋 = 0

Page 42: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

RandomizedControlledTrial Policy𝜋

Agreement

Evaluatingmodelperformanceusingrandomizeddata(off-policyevaluation)

Control,𝑡 = 0Treated,𝑡 = 1

Policyvalue:𝔼 𝑌@ 𝜋 𝑥 = 1 𝑝 𝜋 = 1 + 𝔼 𝑌? 𝜋 𝑥 = 0 𝑝 𝜋 = 0

Page 43: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

• NationalSupportedWork:randomizedtrialembeddedinanobservationalstudy• Policyriskestimatedonrandomizedsubsample• CFR-2-2:ourmodel,with2 layersbeforeΦand2layersafterΦ

Method Policyrisk(std)

ℓ@-reg. logisticregression 0.23±0.00BART(Chipman,George &McCulloch,2010) 0.24±0.01Causalforests(Wager& Athey,2015) 0.17±0.006CFR-2-2 Vanilla 0.16±0.02CFR-2-2Wasserstein 0.15±0.02CFR-2-2MMD 0.13±0.02

Experimentalresults– NationalSupportedWorkStudy

Loweris

better

Page 44: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Experimentalresults– NationalSupportedWorkStudy

Loweris

better

CausalforestCFR2-2VanillaCFR2-2MMDRandomPolicy

Page 45: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

OutlineBackgroundModelExperimentsTheory

Page 46: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

OutlineBackgroundModelExperimentsTheory

Page 47: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Theoryofcausaleffectinference

•Standardresultsinstatistics:asymptoticrateofconvergencetotrueaverageeffect•Assumptions:weknowtruemodel(consistency)

•Ourresult:generalizationerrorboundforindividual-levelinference•Assumptions:truemodellieswithinlargemodelfamily,e.g.boundedLipschitzfunctions

Page 48: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Theorem (informal) • Let 𝑌m=

^,n(𝑥) = ℎ(Φ 𝑥 , 𝑡) for 𝑡 = 0,1• 𝐼𝑇𝐸c ^,n(𝑥):= 𝑌m@

^,n(𝑥) − 𝑌m?^,n(𝑥)

• If “strong ignorability” holds, and if 𝑑𝑖𝑠𝑡 is “nice” with respect to the true potential outcomes 𝑌? and 𝑌@and the representation Φ, then for all normalized Φand ℎ:

𝔼o 𝑒𝑟𝑟𝑜𝑟 𝐼𝑇𝐸c ^,p(𝑥) ≤2 s 𝔼o,= 𝑒𝑟𝑟𝑜𝑟 𝑌=t

^,p(𝑥) + 𝑑𝑖𝑠𝑡(𝑝^=VWX=WY, 𝑝^Z[\=V[])

Page 49: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

• Let 𝑌m=^,n(𝑥) = ℎ(Φ 𝑥 , 𝑡) for 𝑡 = 0,1

• 𝐼𝑇𝐸c ^,n(𝑥):= 𝑌m@^,n(𝑥) − 𝑌m?

^,n(𝑥)

• If “strong ignorability” holds, and if 𝑑𝑖𝑠𝑡 is “nice” with respect to the true potential outcomes 𝑌? and 𝑌@and the representation Φ, then for all normalized Φand ℎ:

Theorem (informal)

ExpectederrorinestimatingITE

𝔼o 𝑒𝑟𝑟𝑜𝑟 𝐼𝑇𝐸c ^,p(𝑥) ≤2 s 𝔼o,= 𝑒𝑟𝑟𝑜𝑟 𝑌=t

^,p(𝑥) + 𝑑𝑖𝑠𝑡(𝑝^=VWX=WY, 𝑝^Z[\=V[])

Page 50: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Theorem (informal)

“supervisedlearninggeneralizationerror”

𝔼o 𝑒𝑟𝑟𝑜𝑟 𝐼𝑇𝐸c ^,p(𝑥) ≤2 s 𝔼o,= 𝑒𝑟𝑟𝑜𝑟 𝑌=t

^,p(𝑥) + 𝑑𝑖𝑠𝑡(𝑝^=VWX=WY, 𝑝^Z[\=V[])

• Let 𝑌m=^,n(𝑥) = ℎ(Φ 𝑥 , 𝑡) for 𝑡 = 0,1

• 𝐼𝑇𝐸c ^,n(𝑥):= 𝑌m@^,n(𝑥) − 𝑌m?

^,n(𝑥)

• If “strong ignorability” holds, and if 𝑑𝑖𝑠𝑡 is “nice” with respect to the true potential outcomes 𝑌? and 𝑌@and the representation Φ, then for all normalized Φand ℎ:

Page 51: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Theorem (informal)

Distancebetween𝛷-induceddistributions

𝔼o 𝑒𝑟𝑟𝑜𝑟 𝐼𝑇𝐸c ^,p(𝑥) ≤2 s 𝔼o,= 𝑒𝑟𝑟𝑜𝑟 𝑌=t

^,p(𝑥) + 𝑑𝑖𝑠𝑡(𝑝^=VWX=WY, 𝑝^Z[\=V[])

• Let 𝑌m=^,n(𝑥) = ℎ(Φ 𝑥 , 𝑡) for 𝑡 = 0,1

• 𝐼𝑇𝐸c ^,n(𝑥):= 𝑌m@^,n(𝑥) − 𝑌m?

^,n(𝑥)

• If “strong ignorability” holds, and if 𝑑𝑖𝑠𝑡 is “nice” with respect to the true potential outcomes 𝑌? and 𝑌@and the representation Φ, then for all normalized Φand ℎ:

Page 52: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

• Let 𝑌m=^,n(𝑥) = ℎ(Φ 𝑥 , 𝑡) for 𝑡 = 0,1

• 𝐼𝑇𝐸c ^,n(𝑥):= 𝑌m@^,n(𝑥) − 𝑌m?

^,n(𝑥)

Theorem (informal)

Weminimizeupperboundwithrespectto𝛷andℎ

𝔼o 𝑒𝑟𝑟𝑜𝑟 𝐼𝑇𝐸c ^,p(𝑥) ≤2 s 𝔼o,= 𝑒𝑟𝑟𝑜𝑟 𝑌=t

^,p(𝑥) + 𝑑𝑖𝑠𝑡(𝑝^=VWX=WY, 𝑝^Z[\=V[])

Page 53: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

Summary•EstimatingIndividualTreatmentEffectisdifferentfromsupervisedlearning• Bearsstrongconnectionstodomainadaptation

•WegivenewrepresentationlearningalgorithmsforestimatingIndividualTreatmentEffect• UsetheMMDandWassersteindistributionaldistances

• Experimentsshowourmethodiscompetitiveorbetterthanstate-of-the-art

•Wegiveanewerrorbound forestimatingIndividualTreatmentEffect

Page 54: Learning representations for counterfactual inference...Learning representations for counterfactual inference Fredrik D. Johansson*2, Uri Shalit*1, David Sontag1 *Equal Contribution

• FredrikD.Johansson,UriShalit,DavidSontag“LearningRepresentationsforCounterfactualInference”ICML2016

• UriShalit,FredrikD.Johansson,DavidSontag“Estimatingindividualtreatmenteffect:generalizationboundsandalgorithms”arXiv:1606.03976

Acknowledgments:JustinChiu(FAIR),MarcoCuturi (ENSAE/CREST),JenniferHill(NYU),Aahlad Manas (NYU),Sanjong Misra (U.Chicago),EstebanTabak (NYU)andStefanWager(Columbia)

Thankyou!