Learning Causal Structure from
Observational and Experimental Data
Richard Scheines
Carnegie Mellon University
Causation, Statistics, and Experiments
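[Figure: timeline (1500 to 1990) of figures in the history of causation, statistics, and experiments: Francis Bacon, Galileo Galilei, Sewall Wright, Trygve Haavelmo, Charles Spearman, Udny Yule, Sir Ronald A. Fisher, Jerzy Neyman. Two modern frameworks are marked: Graphical Causal Models and Potential Outcomes.]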
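Causal Graphs
Causal graph G = {V, E}: each edge X → Y represents a direct causal claim, namely that X is a direct cause of Y relative to V.
Years of Education → Income
Years of Education → Skills and Knowledge → Income
Relative to V = {Years of Education, Income}, Years of Education is a direct cause of Income. Once Skills and Knowledge is added to V, the effect runs through it, and Years of Education is no longer a direct cause of Income relative to the larger set.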
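To make "relative to V" concrete, here is a small Python sketch that assumes a hypothetical finer-grained graph (Years of Education → Skills and Knowledge → Income) is the true one. It applies one common rule, the directed-edge rule of a latent projection: X counts as a direct cause of Y relative to V when some directed path from X to Y runs only through variables left out of V. It also assumes effects along such paths do not cancel. The function and variable names are illustrative.

```python
def has_path_outside(parents, src, dst, chosen):
    """True if a directed path src -> ... -> dst exists whose intermediate
    variables all lie outside the chosen variable set."""
    # Build child lists from the parent map.
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(child)
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        for nxt in children.get(node, ()):
            if nxt == dst:
                return True
            # Only walk through variables that are NOT in the chosen set V.
            if nxt not in chosen and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def direct_cause(parents, x, y, v):
    """Sketch: is x a direct cause of y relative to the variable set v?"""
    v = set(v)
    return x in v and y in v and has_path_outside(parents, x, y, v)


# Hypothetical "true" graph, stored as child -> set of parents.
education_graph = {
    "Skills and Knowledge": {"Years of Education"},
    "Income": {"Skills and Knowledge"},
}

small_v = {"Years of Education", "Income"}
large_v = {"Years of Education", "Skills and Knowledge", "Income"}
print(direct_cause(education_graph, "Years of Education", "Income", small_v))
print(direct_cause(education_graph, "Years of Education", "Income", large_v))
```

With the smaller set the mediator is left out, so education counts as a direct cause. With the larger set the only path passes through a chosen variable, so it does not.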
Bridge Principles: Causal Graph over V ⇒ Constraints on P(V)
Causal Markov Axiom + Acyclicity ⇒ d-separation criterion, which maps a Causal Graph to an Independence Oracle
[Diagram: causal graph over Z, X, Y1, and Y2]
Z _||_ Y1 | X          Z _||_ Y2 | X
Z _||_ Y1 | X,Y2       Z _||_ Y2 | X,Y1
Y1 _||_ Y2 | X         Y1 _||_ Y2 | X,Z
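The d-separation step can be checked mechanically. The sketch below uses the moralized ancestral graph test, which is equivalent to d-separation, and assumes the arrows run Z → X → Y1 and X → Y2 (the slide's arrowheads did not survive extraction; reversing Z → X gives the same independences). Graphs are stored as a map from each variable to its set of parents; the function names are illustrative.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def d_separated(parents, x, y, given=()):
    """Test x _||_ y | given with the moralized ancestral graph criterion."""
    given = set(given)
    keep = ancestors(parents, {x, y} | given)
    # Moralize: link each node to its parents and "marry" co-parents.
    adj = {n: set() for n in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(ps, 2):
            adj[p].add(q)
            adj[q].add(p)
    # x and y are d-separated iff removing `given` disconnects them.
    seen, stack = {x}, [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        for nbr in adj[node] - given:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return True


# Assumed arrow directions: Z -> X -> Y1 and X -> Y2.
graph = {"Z": set(), "X": {"Z"}, "Y1": {"X"}, "Y2": {"X"}}

oracle = [
    ("Z", "Y1", {"X"}), ("Z", "Y2", {"X"}),
    ("Z", "Y1", {"X", "Y2"}), ("Z", "Y2", {"X", "Y1"}),
    ("Y1", "Y2", {"X"}), ("Y1", "Y2", {"X", "Z"}),
]
for a, b, cond in oracle:
    print(a, "_||_", b, "|", sorted(cond), d_separated(graph, a, b, cond))
```

Every line of the oracle prints True, while the unconditional pairs (for example Z and Y1 given nothing) are d-connected.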
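Faithfulness
Faithfulness: the only constraints (e.g., independences) in a probability distribution P generated by a causal structure G are those that hold for all parameterizations of G.
[Diagram: Tax Rate → Revenues (a), Tax Rate → Economy (b), Economy → Revenues (c)]
$\text{Revenues} = a\,\text{Rate} + c\,\text{Economy} + e_{\text{Rev}}$
$\text{Economy} = b\,\text{Rate} + e_{\text{Econ}}$
Faithfulness: $a \neq -bc$
If $a = -bc$, the direct effect of Tax Rate on Revenues exactly cancels its indirect effect through Economy, so Tax Rate and Revenues are uncorrelated even though Tax Rate causes Revenues.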
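A quick simulation shows the cancellation. With errors independent of Rate, Cov(Rate, Revenues) = (a + bc)·Var(Rate), so choosing a = −bc makes the correlation vanish. The coefficient values, standard normal variables, and sample size below are illustrative assumptions.

```python
import random


def simulate(a, b, c, n=100_000, seed=0):
    """Draw (Rate, Economy, Revenues) from the linear SEM
    Economy = b*Rate + e_Econ, Revenues = a*Rate + c*Economy + e_Rev.
    Assumption: Rate and both error terms are independent standard normals."""
    rng = random.Random(seed)
    rate, econ, rev = [], [], []
    for _ in range(n):
        r = rng.gauss(0.0, 1.0)
        e = b * r + rng.gauss(0.0, 1.0)
        v = a * r + c * e + rng.gauss(0.0, 1.0)
        rate.append(r)
        econ.append(e)
        rev.append(v)
    return rate, econ, rev


def corr(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


b, c = 0.8, 0.5
# Unfaithful parameterization: a = -b*c cancels the two paths Rate -> Revenues.
rate, econ, rev = simulate(a=-b * c, b=b, c=c)
unfaithful_rate_rev = corr(rate, rev)    # near 0 despite the edge Rate -> Revenues
unfaithful_rate_econ = corr(rate, econ)  # clearly nonzero
# Faithful parameterization: a != -b*c.
rate2, _, rev2 = simulate(a=0.4, b=b, c=c)
faithful_rate_rev = corr(rate2, rev2)
print(unfaithful_rate_rev, unfaithful_rate_econ, faithful_rate_rev)
```

Only the exact tuning a = −bc produces the independence; nearby parameter values do not, which is why faithfulness treats such cancellations as exceptional.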
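Faithfulness
[Diagram: Gene A, Gene B, and Protein 24, with causal edges signed +, +, and −]
By evolutionary design: Gene A _||_ Protein 24
[Diagram: Air Temp, Core Body Temp, and a Homeostatic Regulator]
By evolutionary design: Air Temp _||_ Core Body Temp
In both systems, opposing pathways cancel, so an independence holds even though the variables are causally connected.
Sampling rate vs. equilibration rate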
Causal Structure ⇒ Association
TV → Obesity:         not (TV _||_ Obesity)
Obesity → TV:         not (TV _||_ Obesity)
TV ← C → Obesity:     not (TV _||_ Obesity)
Modeling Ideal Interventions: Interventions on the Effect
[Diagram: pre-experimental system, Room Temperature → Sweaters On; post-intervention system, with the intervention acting on Sweaters On]
Modeling Ideal Interventions: Interventions on the Cause
[Diagram: pre-experimental system, Room Temperature → Sweaters On; post-intervention system, with the intervention acting on Room Temperature]
Interventions & Causal Graphs
Model an ideal intervention by adding an “intervention” variable I outside the original system as a direct cause of its target.
Pre-intervention graph: Education → Income → Taxes
Intervene on Income:
“Soft” intervention: I → Income, with Education → Income → Taxes kept
“Hard” intervention: I → Income → Taxes, with the edge Education → Income removed
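Graph surgery for the two kinds of intervention can be written as an operation on edge sets. This sketch takes the pre-intervention graph to be Education → Income → Taxes and names the added intervention variable I, as on the slide; the function `intervene` is hypothetical, not from a library.

```python
def intervene(edges, target, kind="hard", name="I"):
    """Return the post-intervention edge set.

    edges: set of (cause, effect) pairs.
    The intervention variable `name` becomes a direct cause of `target`.
    A 'hard' intervention also deletes every other edge into `target`;
    a 'soft' intervention keeps them.
    """
    if kind not in ("hard", "soft"):
        raise ValueError("kind must be 'hard' or 'soft'")
    if kind == "hard":
        new_edges = {(a, b) for (a, b) in edges if b != target}
    else:
        new_edges = set(edges)
    new_edges.add((name, target))
    return new_edges


pre = {("Education", "Income"), ("Income", "Taxes")}
soft = intervene(pre, "Income", kind="soft")
hard = intervene(pre, "Income", kind="hard")
print(sorted(soft))
print(sorted(hard))
```

Edges out of the target (Income → Taxes) survive either way; only arrows into the target are affected.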
Association underdetermines Causal Structure
TV → Obesity:         not (TV _||_ Obesity)
Obesity → TV:         not (TV _||_ Obesity)
TV ← C → Obesity:     not (TV _||_ Obesity)   (spurious association)
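Randomization ⇒ Association = Causation
TV → Obesity, TV randomized:                                  not (TV _||_ Obesity)
Obesity → TV, TV randomized (the arrow into TV is cut):       TV _||_ Obesity
TV ← C → Obesity, TV randomized (the arrow C → TV is cut):    TV _||_ Obesity
Once TV is randomized, TV and Obesity are associated only if TV causes Obesity.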
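A small simulation illustrates both points: all three structures produce an association between TV and Obesity, and randomizing TV leaves an association only when TV causes Obesity. The linear-Gaussian mechanisms and the 0.7 coefficients are illustrative assumptions.

```python
import random

STRUCTURES = ["TV -> Obesity", "Obesity -> TV", "TV <- C -> Obesity"]


def sample(structure, randomize_tv, rng):
    """Draw one (TV, Obesity) pair under an assumed linear-Gaussian model."""
    def tv_value(natural):
        # A randomizer overrides whatever would normally cause TV.
        return rng.gauss(0.0, 1.0) if randomize_tv else natural

    if structure == "TV -> Obesity":
        tv = tv_value(rng.gauss(0.0, 1.0))
        obesity = 0.7 * tv + rng.gauss(0.0, 1.0)
    elif structure == "Obesity -> TV":
        obesity = rng.gauss(0.0, 1.0)
        tv = tv_value(0.7 * obesity + rng.gauss(0.0, 1.0))
    elif structure == "TV <- C -> Obesity":
        c = rng.gauss(0.0, 1.0)
        tv = tv_value(0.7 * c + rng.gauss(0.0, 1.0))
        obesity = 0.7 * c + rng.gauss(0.0, 1.0)
    else:
        raise ValueError(structure)
    return tv, obesity


def corr(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5


def association(structure, randomize_tv, n=50_000, seed=7):
    rng = random.Random(seed)
    return corr([sample(structure, randomize_tv, rng) for _ in range(n)])


observational = {s: association(s, False) for s in STRUCTURES}
randomized = {s: association(s, True) for s in STRUCTURES}
print(observational)
print(randomized)
```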
Randomization ⇒ Association = Causation
Treatment _||_ Response
[Diagram: Randomizer, Treatment Assignment, Treatment, Response, and an unmeasured U]
not (Treatment _||_ Response | Dropout = no)
[Diagram: the same variables plus Dropout]
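The dropout problem can be reproduced in simulation. The mechanism below is a hypothetical one chosen for illustration: treatment is randomized and has no effect on Response, an unmeasured U raises Response, and a subject stays in the study (Dropout = no) when U + Treatment + noise exceeds 1. Comparing treated and untreated subjects among completers only then shows a spurious difference.

```python
import random


def trial(n=100_000, seed=3):
    """Simulate a randomized trial in which Treatment has NO effect on Response.
    Hypothetical mechanism: U -> Response, and U and Treatment both affect Dropout."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        treatment = rng.randint(0, 1)          # randomizer assigns treatment
        u = rng.gauss(0.0, 1.0)                # unmeasured common cause
        response = u + rng.gauss(0.0, 1.0)     # no arrow from Treatment
        dropout = not (u + treatment + rng.gauss(0.0, 1.0) > 1.0)
        rows.append((treatment, response, dropout))
    return rows


def mean_difference(rows):
    """Mean Response among treated minus mean Response among untreated."""
    treated = [r for t, r, _ in rows if t == 1]
    control = [r for t, r, _ in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


rows = trial()
everyone = mean_difference(rows)                                # near 0
completers = mean_difference([row for row in rows if not row[2]])  # clearly negative
print(everyone, completers)
```

Among completers, treated subjects needed less U to stay in the study, so their average Response is lower even though Treatment does nothing.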
Randomization ⇒ Association = Causation
Treatment _||_ Response
[Diagram: Randomizer, Treatment Assignment, Treatment, Belief, and Response]
Experimental Control & Statistical Control
[Diagram: X1, X3, and C, with a Randomizer for C]
Statistically control for C: X3 _||_ X1 | C
Experimentally control for C: X3 _||_ X1 | C(set)
[Diagram: X1, X3, and M, with a Randomizer for M]
Statistically control for M: X3 _||_ X1 | M
Experimentally control for M: X3 _||_ X1 | M(set)
Experimental Control ≠ Statistical Control
[Diagram: X1, M, X3, and an unmeasured U, with a Randomizer for M]
Statistically control for M: not (X3 _||_ X1 | M)
Experimentally control for M: X3 _||_ X1 | M(set)
[Diagram: X1, M, X3, and unmeasured U1 and U2, with a Randomizer for M]
Statistically control for M: not (X3 _||_ X1 | M)
Experimentally control for M: X3 _||_ X1 | M(set)
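The gap between the two kinds of control can be sketched in simulation, assuming one hypothetical structure consistent with the slide's labels: X1 → M → X3, plus an unmeasured U with U → M and U → X3. Statistical control is modeled as the partial correlation of X1 and X3 given M; experimental control sets M with a randomizer, which cuts both arrows into M. In the observational data M is a collider on X1 → M ← U → X3, so conditioning on M opens that path; setting M removes it.

```python
import random


def sample(set_m, rng):
    """One draw of (X1, M, X3) under the assumed linear structure.
    If set_m is True, a randomizer sets M, cutting X1 -> M and U -> M."""
    x1 = rng.gauss(0.0, 1.0)
    u = rng.gauss(0.0, 1.0)                  # unmeasured
    if set_m:
        m = rng.gauss(0.0, 1.0)
    else:
        m = x1 + u + rng.gauss(0.0, 1.0)
    x3 = m + u + rng.gauss(0.0, 1.0)
    return x1, m, x3


def residuals(ys, xs):
    """Residuals from the least-squares regression of ys on xs (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - my - slope * (x - mx) for x, y in zip(xs, ys)]


def partial_corr_x1_x3_given_m(data):
    """Correlation of X1 and X3 after regressing each on M."""
    x1, m, x3 = zip(*data)
    r1, r3 = residuals(x1, m), residuals(x3, m)
    # Residuals have mean zero, so this ratio is their correlation.
    num = sum(a * b for a, b in zip(r1, r3))
    den = (sum(a * a for a in r1) * sum(b * b for b in r3)) ** 0.5
    return num / den


rng = random.Random(11)
statistical = partial_corr_x1_x3_given_m([sample(False, rng) for _ in range(100_000)])
experimental = partial_corr_x1_x3_given_m([sample(True, rng) for _ in range(100_000)])
print(statistical, experimental)
```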
Causal Model(V)
• Graph over X, Y, Z
• Structural Eqs.(V) or CPT(V)
Experimental Setup(V)
• V = {O, M} (O: observed only; M: manipulated)
• P(M)
Manipulated Causal Model_M(V)
• Graph over X, Y, Z, with intervention variable I
• Structural Eqs._M(V) or CPT_M(V)
P_M(V) → (sampling) → Data
P_M(V) = f(Causal Model(V), Experimental Setup(V))
Causal Discovery
Experimental Setup(V)
• V = {O, M}
• P(M)
Data → Statistical Inference → P_M(V)
P_M(V) + Experimental Setup(V) → Discovery Algorithm → Equivalence Class of Causal Structures
General Assumptions: Markov, Faithfulness, Linearity, Gaussianity, Acyclicity, etc.
Causal Discovery from Passive Observation
• PC, GES: Patterns (Markov equivalence class; no latent confounding)
• FCI: PAGs (Markov equivalence, including confounders and selection bias)
• CCD: linear cyclic models (no confounding)
• BPC: linear latent variable models
• LiNGAM: unique DAG (no confounding; linear non-Gaussian; faithfulness not needed)
• LVLiNGAM: set of DAGs (confounders allowed)
• CyclicLiNGAM: set of DGs (cyclic models, no confounding)
• Non-linear additive noise models: unique DAG
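The patterns returned by PC and GES represent Markov equivalence classes of DAGs without latent variables. Two such DAGs are Markov equivalent exactly when they share the same adjacencies and the same unshielded colliders (Verma and Pearl's characterization), which the sketch below checks. Edges are (cause, effect) pairs and the function names are illustrative.

```python
from itertools import combinations


def skeleton(edges):
    """Adjacencies, ignoring direction."""
    return {frozenset(e) for e in edges}


def unshielded_colliders(edges):
    """Triples a -> c <- b in which a and b are not adjacent."""
    adjacent = skeleton(edges)
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    found = set()
    for c, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in adjacent:
                found.add((frozenset((a, b)), c))
    return found


def markov_equivalent(g1, g2):
    """Same skeleton and same unshielded colliders."""
    return (skeleton(g1) == skeleton(g2)
            and unshielded_colliders(g1) == unshielded_colliders(g2))


chain = {("X", "Y"), ("Y", "Z")}     # X -> Y -> Z
fork = {("Y", "X"), ("Y", "Z")}      # X <- Y -> Z
collider = {("X", "Y"), ("Z", "Y")}  # X -> Y <- Z
print(markov_equivalent(chain, fork), markov_equivalent(chain, collider))
```

The chain and the fork land in one pattern, while the collider is distinguishable from passive observation alone.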
Causal Discovery from Manipulations/Interventions
What sorts of manipulations/interventions have been studied?
• Do(X=x): replace P(X | parents(X)) with P(X=x) = 1.0
• Randomize(X): replace P(X | parents(X)) with P_M(X), e.g., uniform
• Soft interventions: replace P(X | parents(X)) with P_M(X | parents(X), I), P_M(I)
• Simultaneous interventions
• Sequential interventions
• Sequential, conditional interventions
• Time-sensitive interventions
• Shock and run: set X at time t, and then let the system run
• Clamp: set X at time t, and hold it fixed until time t + D
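The replacement rules for Do and Randomize can be sketched on a discrete model by computing P_M(V) as a product of conditional probability tables in which each intervened variable's table is swapped out. The binary model X → Y and its probabilities are hypothetical, and soft interventions are not covered here.

```python
from itertools import product

# Hypothetical binary model X -> Y, given as conditional probability tables (CPTs).
ORDER = ["X", "Y"]                       # a causal (topological) order
PARENTS = {"X": (), "Y": ("X",)}
CPT = {
    "X": {(): {0: 0.7, 1: 0.3}},
    "Y": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}},
}


def do(value):
    """Do(X=x): a point mass, P(X=x) = 1.0."""
    return {value: 1.0}


def randomize():
    """Randomize(X): here a uniform P_M(X) over {0, 1}."""
    return {0: 0.5, 1: 0.5}


def joint(interventions=None):
    """P_M(V): product of CPTs, with P(V | parents(V)) replaced by the
    given parentless distribution for every intervened variable."""
    interventions = interventions or {}
    dist = {}
    for values in product([0, 1], repeat=len(ORDER)):
        assign = dict(zip(ORDER, values))
        p = 1.0
        for var in ORDER:
            if var in interventions:
                p *= interventions[var].get(assign[var], 0.0)
            else:
                key = tuple(assign[pa] for pa in PARENTS[var])
                p *= CPT[var][key][assign[var]]
        dist[values] = p
    return dist


def marginal(dist, var):
    """Marginal distribution of one variable."""
    i = ORDER.index(var)
    out = {0: 0.0, 1: 0.0}
    for values, p in dist.items():
        out[values[i]] += p
    return out


observational = marginal(joint(), "Y")         # P(Y=1) = 0.7*0.1 + 0.3*0.8
after_do = marginal(joint({"X": do(1)}), "Y")  # P_M(Y=1) = 0.8
randomized_y = joint({"Y": randomize()})       # X and Y become independent
print(observational, after_do)
```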
Causal Discovery from Manipulations/Interventions
Simultaneous Interventions Destroy Information
Experimental setup: Randomize(X, Y) independently
P_M(V): X _||_ Y
[Diagram: a model over X and Y, and its equivalence class of structures over X and Y]
Causal Discovery from Manipulations/Interventions
Simultaneous Interventions Destroy Information, but:
Sequence of single interventions over N variables: N − 1 experiments are needed to guarantee causal identification
Sequence of simultaneous interventions: log₂(N) + 1 experiments
Causal Discovery from Manipulations/Interventions
Equivalence class oddities
True model: X → Y
Experimental setup: Randomize(Y)
P_M(V): X _||_ Y
[Diagram: manipulated model, with intervention variable I → Y]
Causal Discovery from Manipulations/Interventions
Equivalence class oddities
Experimental setup: Randomize(Y)
P_M(V): X _||_ Y
Equivalence class: [diagram of several structures over X and Y]
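The equivalence class oddities for Randomize(Y) and Randomize(X, Y) can be reproduced by enumeration. The sketch assumes the candidate structures are the acyclic ones over X and Y (no edge, X → Y, or Y → X), each with or without an unmeasured common cause L, written X <-> Y. It treats Randomize as cutting every arrow into a randomized variable and uses the fact that, with no conditioning set, two variables in a DAG are d-separated exactly when they have no common ancestor. Because the slides' diagrams did not survive extraction, the resulting classes may not match them exactly.

```python
from itertools import product


def ancestors(edges, node):
    """node itself plus every variable with a directed path into it."""
    result, stack = {node}, [node]
    while stack:
        cur = stack.pop()
        for a, b in edges:
            if b == cur and a not in result:
                result.add(a)
                stack.append(a)
    return result


def candidate_structures():
    """No edge, X -> Y, or Y -> X, each with or without an unmeasured
    common cause L of X and Y (written X <-> Y)."""
    for direct, confounded in product([None, ("X", "Y"), ("Y", "X")], [False, True]):
        edges = set()
        if direct:
            edges.add(direct)
        if confounded:
            edges |= {("L", "X"), ("L", "Y")}
        yield edges


def randomize(edges, targets):
    """Ideal randomization: delete every arrow into a randomized variable."""
    return {(a, b) for a, b in edges if b not in targets}


def predicts_independence(edges):
    """X _||_ Y (empty conditioning set) iff X and Y share no common ancestor."""
    return not (ancestors(edges, "X") & ancestors(edges, "Y"))


def equivalence_class(targets):
    """Candidate structures that predict X _||_ Y under Randomize(targets)."""
    return [g for g in candidate_structures()
            if predicts_independence(randomize(g, targets))]


def describe(edges):
    labels = [(("X", "Y"), "X -> Y"), (("Y", "X"), "Y -> X"), (("L", "X"), "X <-> Y")]
    return ", ".join(label for edge, label in labels if edge in edges) or "no edge"


print([describe(g) for g in equivalence_class({"Y"})])
print(len(equivalence_class({"X", "Y"})))
```

Under Randomize(Y) the class keeps every structure without Y → X, including the true X → Y alongside "no edge". Under Randomize(X, Y) all six candidates predict X _||_ Y, so that experiment rules nothing out.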
Causal Discovery from Manipulations/Interventions
Equivalence class oddities
Experimental setup: Randomize(X, Y) independently
P_M(V): not (X _||_ Z)
Equivalence class:
• X is an ancestor of Z
• X has a directed path to Z that does not pass through Y
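The inference on this slide can be checked by brute force over all DAGs on X, Y, and Z, assuming no latent variables and that the manipulated distribution is faithful to the manipulated graph. Randomizing X and Y cuts every arrow into them; X and Z are then dependent exactly when they share a common ancestor in the cut graph. The helper names are illustrative.

```python
from itertools import product

NODES = ("X", "Y", "Z")
PAIRS = [("X", "Y"), ("X", "Z"), ("Y", "Z")]


def reaches(edges, src, dst, avoid=frozenset()):
    """True if there is a directed path src -> ... -> dst whose
    intermediate variables all lie outside `avoid`."""
    stack, seen = [src], {src}
    while stack:
        cur = stack.pop()
        for a, b in edges:
            if a != cur:
                continue
            if b == dst:
                return True
            if b not in seen and b not in avoid:
                seen.add(b)
                stack.append(b)
    return False


def all_dags():
    """Every DAG over X, Y, Z: each pair is unlinked or oriented one of two ways."""
    for choice in product((None, "fwd", "back"), repeat=len(PAIRS)):
        edges = set()
        for (a, b), c in zip(PAIRS, choice):
            if c == "fwd":
                edges.add((a, b))
            elif c == "back":
                edges.add((b, a))
        if not any(reaches(edges, n, n) for n in NODES):  # skip cyclic graphs
            yield edges


def ancestors(edges, node):
    """node itself plus every variable with a directed path into it."""
    return {node} | {m for m in NODES if reaches(edges, m, node)}


dags = list(all_dags())
consistent = []
for g in dags:
    # Randomize(X, Y): cut every arrow into X and into Y.
    cut = {(a, b) for a, b in g if b not in ("X", "Y")}
    # Faithfulness assumed: X, Z dependent iff they share a common ancestor in the cut graph.
    if ancestors(cut, "X") & ancestors(cut, "Z"):
        consistent.append(g)

print(len(dags), len(consistent))
print(all(reaches(g, "X", "Z", avoid={"Y"}) for g in consistent))
```

Every DAG that survives has a directed path from X to Z avoiding Y, matching both bullet points, while the class still leaves the X–Y and Y–Z relations open.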
Issues
• Efficiently representing a wider array of information relevant to
causal structure discovery, and then efficiently combining it to
maximally constrain the possible explanations of data
• Rate of reaching equilibrium vs. rate of sampling
• Transportability
• Constructing appropriate variables from raw measurements
• High dimensionality