Exploratory Causal Analysis in Bivariate Time Series Data

Abstract

Many scientific disciplines rely on observational data of systems for which it is difficult (or impossible) to implement controlled experiments, so data analysis techniques are required for identifying causal information and relationships directly from observational data. This need has led to the development of many different time series causality approaches and tools, including transfer entropy, convergent cross-mapping (CCM), and Granger causality statistics.

A practicing analyst can explore the literature to find many proposals for identifying drivers and causal connections in time series data sets, but little research exists on how these tools compare to each other in practice. This work introduces and defines exploratory causal analysis (ECA) to address this issue. The motivation is to provide a framework for exploring potential causal structures in time series data sets.

J. M. McCracken

Defense talk for PhD in Physics, Department of Physics and Astronomy

10:00 AM November 20, 2015; Exploratory Hall, 3301

Advisor: Dr. Robert Weigel; Committee: Dr. Paul So, Dr. Tim Sauer

J. M. McCracken (GMU) ECA w/ time series causality November 20, 2015 1 / 50

Department of Physics and Astronomy, George Mason University, Fairfax, VA

Outline

1. Motivation

2. Causality studies

3. Data causality

4. Exploratory causal analysis

5. Making an ECA summary
   • Transfer entropy difference
   • Granger causality statistic
   • Pairwise asymmetric inference
   • Weighted mean observed leaning
   • Lagged cross-correlation difference

6. Computational tools for the ECA summary

7. Empirical examples
   • Cooling/Heating System Data
   • Snowfall Data

8. Time series causality as data analysis

Motivation

Data

Consider two sets of time series measurements, X and Y.

Question

Is there evidence that X “drives” Y?

We were looking for a data analysis approach; i.e., we were looking for analysis tools that

• worked with time series data,

• had straightforward, preferably well-established, interpretations,

• were reliable,

• and did not require studying the (vast) philosophical causality literature.

Essentially, we were looking for a “plug-and-play” analysis tool.

This work stems from our search for such a tool.


Causality studies

The study of causality is as old as science itself.

• Modern historians credit Aristotle with both the first theory of causality (“four causes”) and an early version of the scientific method.

• The modern study of causality is broadly interdisciplinary; far too broad to review in a short talk.

Illari and Russo’s textbook¹ provides an overview of causality studies.

¹ Illari, P., & Russo, F. (2014). Causality: Philosophical theory meets scientific practice. Oxford University Press.

Towards a taxonomy of causal studies

Paul Holland identified four types of causal questions²:

• the ultimate meaningfulness of the notion of causality

• the details of causal mechanisms

• the causes of a given effect

• the effects of a given cause

Foundational causality: “Is a cause required to precede an effect?” or “How are causes and effects related in space-time?”

Data causality: “Does smoking cause lung cancer?” or “Are traffic accidents caused by rain storms?”

² Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945-960.


Data causality

Data causality is data analysis to draw causal inferences.

Approaches to data causality studies include

• design of experiments (e.g., Fisher randomization)

• potential outcomes (Rubin’s counterfactuals)

• directed acyclic graphs (DAGs) with structural equation models (SEMs); popularized by Pearl as “structural causal models (SCMs)”

• time series causality

There is no consensus on the best approach to data causality.

Many authors consider their favored approach to be the only correct approach.


Time series causality

Time series causality is data causality with time series data.

Approaches to time series causality can be roughly divided into five categories:

• Granger (model-based approaches)

• Information-theoretic

• State space reconstruction (SSR)

• Correlation

• Penchant


Exploratory causal analysis: Language

Exploring causal structures in data sets is distinct from confirming causal structures in data sets.

Causal language used in ECA should not be conflated with other typical uses; i.e., “cause”, “effect”, “drive”, etc. are used as technical terms with definitions unrelated to their common, everyday definitions.

→ and ← will be used as shorthand for causal statements; e.g., “A drives B” will be written as A → B.


Exploratory causal analysis: Assumptions

A cause always precedes an effect.

This assumption is required for the operational definitions of causality.

A driver may be present in the data being analyzed.

This assumption may lead to issues of confounding.


Exploratory causal analysis: ECA summary vector approach

We will not favor a specific operational definition of causality ⇒ we do not favor any particular tool.

Consider a time series pair (X, Y).

ECA summary vector

Define a vector $\vec{g}$ where each element $g_i$ is defined as either 0 if $X \to Y$, 1 if $X \leftarrow Y$, or 2 if no causal inference can be made. The value of each $g_i$ comes from a specific time series causality tool.

ECA summary

The ECA summary is either $X \to Y$, $Y \to X$, or undefined, with $g_i = 0 \;\forall g_i \in \vec{g} \Rightarrow X \to Y$ and $g_i = 1 \;\forall g_i \in \vec{g} \Rightarrow Y \to X$.
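
The rule above can be sketched in a few lines of Python. This is only an illustration: the function name `eca_summary`, the constant names, and the returned strings are hypothetical choices, not names from the talk, and an empty vector is assumed to give an undefined summary.

```python
from typing import Sequence

# Encoding of each g_i as defined on the slide.
X_DRIVES_Y, Y_DRIVES_X, NO_INFERENCE = 0, 1, 2


def eca_summary(g: Sequence[int]) -> str:
    """Return the ECA summary for a summary vector g.

    The summary is defined only when every tool agrees on a direction.
    """
    if not g:
        return "undefined"  # assumption: an empty vector carries no inference
    if all(gi == X_DRIVES_Y for gi in g):
        return "X -> Y"
    if all(gi == Y_DRIVES_X for gi in g):
        return "Y -> X"
    return "undefined"


print(eca_summary([0, 0, 0, 0, 0]))  # all five tools agree
print(eca_summary([0, 0, 2, 0, 0]))  # one tool makes no inference
```

A single disagreeing or abstaining tool makes the summary undefined, which reflects the choice not to favor any particular tool.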


Making an ECA summary

Our focus is on time series, so each causal inference $g_i \in \vec{g}$ will be drawn from one tool in each of the five time series causality categories.

• $g_1$: transfer entropy difference (information-theoretic)

• $g_2$: Granger log-likelihood statistics (Granger)

• $g_3$: pairwise asymmetric inference (PAI) (SSR)

• $g_4$: average weighted mean observed leaning (penchant)

• $g_5$: lagged cross-correlation difference (correlation)

Transfer entropy ($g_1$): Shannon entropy

The uncertainty that a random variable X takes some specific value $X_n$ is given by the Shannon (or information) entropy,

$$H_X = -\sum_{n=1}^{N_X} P(X = X_n) \log_2 P(X = X_n)$$

• $P(X = X_n)$ is the probability that X takes the specific value $X_n$.

• The sum is over all possible values of $X_n$; $n = 1, 2, \ldots, N_X$.

• The base of the logarithm sets the entropy units, which is “bits” here.

Transfer entropy ($g_1$): Shannon entropy example

Binary example (to help with intuition)

Consider a coin C that takes the value H with probability $p_H$ and T with probability $p_T$. The Shannon entropy is

$$H_C = -\left(p_H \log_2 p_H + p_T \log_2 p_T\right)$$

Completely uncertain of outcome: fair coin ⇒ $p_H = p_T = 0.5 \Rightarrow H_C = 1$

Completely certain of outcome: always heads (or tails) ⇒ $p_{H(T)} = 1,\ p_{T(H)} = 0 \Rightarrow H_C = 0$

(Entropy calculations almost always assume $0 \log_2 0 := 0$.)
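
The coin example can be checked numerically with a minimal sketch. The function name `shannon_entropy` is a hypothetical choice, and the input is assumed to be a list of probabilities that sums to one. Zero-probability terms are skipped, which implements the convention $0 \log_2 0 := 0$.

```python
from math import log2


def shannon_entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    # Skipping p == 0 implements the convention 0 * log2(0) := 0.
    return sum(-p * log2(p) for p in probabilities if p > 0)


fair_coin = shannon_entropy([0.5, 0.5])     # completely uncertain of outcome
always_heads = shannon_entropy([1.0, 0.0])  # completely certain of outcome
print(fair_coin, always_heads)
```

Any bias between these two extremes gives an entropy strictly between 0 and 1 bit.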


Transfer entropy ($g_1$): Mutual information

A pair of random variables (X, Y) has some mutual information given by

$$I_{X;Y} = H_X + H_Y - H_{X,Y} = \sum_{n=1}^{N_X} \sum_{m=1}^{N_Y} P(X = X_n, Y = Y_m) \log_2 \frac{P(X = X_n, Y = Y_m)}{P(X = X_n) P(Y = Y_m)}$$

• $P(X = X_n, Y = Y_m)$ is the probability that X takes the specific value $X_n$ and Y takes the specific value $Y_m$.

• If X and Y are independent, then $P(X = X_n, Y = Y_m) = P(X = X_n) P(Y = Y_m) \Rightarrow I_{X;Y} = 0$.

• The mutual information is symmetric; i.e., $I_{X;Y} = I_{Y;X}$.

• Schreiber proposed an extension of the mutual information to measure “information flow” by making it conditional and including assumptions about the temporal behavior of X and Y.
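
The independence and symmetry properties of the mutual information can be illustrated with a short sketch. The function name `mutual_information` and the example tables are hypothetical; the joint distribution is assumed to be given as a nested list `joint[n][m]` = $P(X = X_n, Y = Y_m)$ whose entries sum to one.

```python
from math import log2


def mutual_information(joint):
    """Mutual information in bits from a joint probability table joint[n][m]."""
    p_x = [sum(row) for row in joint]        # marginal P(X = X_n)
    p_y = [sum(col) for col in zip(*joint)]  # marginal P(Y = Y_m)
    return sum(
        p * log2(p / (p_x[n] * p_y[m]))
        for n, row in enumerate(joint)
        for m, p in enumerate(row)
        if p > 0  # convention 0 * log2(0) := 0
    )


independent = [[0.25, 0.25], [0.25, 0.25]]  # joint equals product of marginals
dependent = [[0.4, 0.1], [0.2, 0.3]]
transposed = [list(col) for col in zip(*dependent)]  # swaps the roles of X and Y

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
mi_transposed = mutual_information(transposed)
print(mi_independent, mi_dependent, mi_transposed)
```

Because swapping X and Y leaves the value unchanged, the mutual information alone cannot indicate a direction, which is the motivation for a directed quantity such as the transfer entropy.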


Transfer entropy ($g_1$): Information flow

Suppose X and Y are both Markov processes. The directed flow of information from Y to X is given by the transfer entropy,

$$T_{Y \to X} = \sum_{n=1}^{N_X} \sum_{m=1}^{N_Y} p_{n+1,n,m} \log_2 \frac{p_{n+1|n,m}}{p_{n+1|n}}$$

with

• $p_{n+1,n,m} = P(X(t+1) = X_{n+1}, X(t) = X_n, Y(\tau) = Y_m)$

• $p_{n+1|n,m} = P(X(t+1) = X_{n+1} \mid X(t) = X_n, Y(\tau) = Y_m)$

• $p_{n+1|n} = P(X(t+1) = X_{n+1} \mid X(t) = X_n)$

Transfer entropy ($g_1$): Information flow

There is no directed information flow from Y to X if X is conditionally independent of Y; i.e.,

$$p_{n+1|n,m} = p_{n+1|n} \Rightarrow T_{Y \to X} = 0$$

Operational causality (information-theoretic)

X causes Y if the directed information flow from X to Y is higher than the directed information flow from Y to X; i.e.,

$$T_{X \to Y} - T_{Y \to X} > 0 \Rightarrow X \to Y$$

$$T_{X \to Y} - T_{Y \to X} < 0 \Rightarrow Y \to X$$

$$T_{X \to Y} - T_{Y \to X} = 0 \Rightarrow \text{no causal inference}$$
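
As a rough illustration of the transfer entropy difference, the sketch below estimates the probabilities by counting occurrences in two discrete series. Several things here are assumptions rather than the method used in the talk: the function name `transfer_entropy`, the simple plug-in (histogram) estimator, the choice $\tau = t$ (lag 1), and the synthetic data in which Y copies X with a one-step delay.

```python
import random
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate of T_{source -> target} in bits, with lag 1."""
    # Each sample is (target(t+1), target(t), source(t)).
    samples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(samples)
    c_full = Counter(samples)
    c_cond = Counter((x0, s0) for _, x0, s0 in samples)
    c_pair = Counter((x1, x0) for x1, x0, _ in samples)
    c_past = Counter(x0 for _, x0, _ in samples)
    te = 0.0
    for (x1, x0, s0), count in c_full.items():
        p_joint = count / n                           # p_{n+1,n,m}
        p_given_both = count / c_cond[(x0, s0)]       # p_{n+1|n,m}
        p_given_self = c_pair[(x1, x0)] / c_past[x0]  # p_{n+1|n}
        te += p_joint * log2(p_given_both / p_given_self)
    return te


rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(2000)]
y = [0] + x[:-1]  # hypothetical test data: y copies x with a one-step delay

te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
difference = te_xy - te_yx
# Map the sign of the difference to the encoding used for g_1.
g1 = 0 if difference > 0 else 1 if difference < 0 else 2
print(te_xy, te_yx, g1)
```

With finite data the plug-in estimate of a zero transfer entropy is typically small but not exactly zero, so in practice the third case (exact equality) is rarely met.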


Granger causality ($g_2$): Granger’s axioms

Consider a discrete universe with two time series $X = \{X_t \mid t = 1, \ldots, n\}$ and $Y = \{Y_t \mid t = 1, \ldots, n\}$, where $t = n$ is considered the present time. All knowledge available in the universe at all times $t \le n$ is denoted as $\Omega_n$.

Axiom 1

The past and present may cause the future, but the future cannot cause the past.

Axiom 2

$\Omega_n$ contains no redundant information, so that if some variable Z is functionally related to one or more other variables, in a deterministic fashion, then Z should be excluded from $\Omega_n$.

Granger’s definition of causality

Given some set A, Y causes X if

$$P(X_{n+1} \in A \mid \Omega_n) \neq P(X_{n+1} \in A \mid \Omega_n - Y)$$

Granger’s original goal was to make this notion of causality “operational”.

Granger causality (g2)
VAR models

Consider a time series pair (X,Y). Suppose there is a vector autoregressive (VAR) model that describes the pair,

$$\begin{pmatrix} X_t \\ Y_t \end{pmatrix} = \sum_{i=1}^{n} \begin{pmatrix} A^i_{11} & A^i_{12} \\ A^i_{21} & A^i_{22} \end{pmatrix} \begin{pmatrix} X_{t-i} \\ Y_{t-i} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{pmatrix}$$

The current time step t of X and Y is modeled as a sum of n past steps of X and Y, plus uncorrelated noise terms.

Granger causality (g2)
Comparison of VAR models

Consider two different VAR models for the pair (X,Y),

$$\begin{pmatrix} X_t \\ Y_t \end{pmatrix} = \sum_{i=1}^{n} \begin{pmatrix} A_{xx,i} & A_{xy,i} \\ A_{yx,i} & A_{yy,i} \end{pmatrix} \begin{pmatrix} X_{t-i} \\ Y_{t-i} \end{pmatrix} + \begin{pmatrix} \varepsilon_{x,t} \\ \varepsilon_{y,t} \end{pmatrix}$$

$$\begin{pmatrix} X_t \\ Y_t \end{pmatrix} = \sum_{i=1}^{n} \begin{pmatrix} A'_{xx,i} & 0 \\ 0 & A'_{yy,i} \end{pmatrix} \begin{pmatrix} X_{t-i} \\ Y_{t-i} \end{pmatrix} + \begin{pmatrix} \varepsilon'_{x,t} \\ \varepsilon'_{y,t} \end{pmatrix}$$

The G-causality log-likelihood statistic is defined as

$$F_{Y \to X} = \ln \frac{|\Sigma'_{xx}|}{|\Sigma_{xx}|}$$

- $\Sigma'_{xx}$ is the covariance of the X model residuals given no dependence on Y
- $\Sigma_{xx}$ is the covariance of the X model residuals given a possible dependence on Y

Granger causality (g2)
G-causality log-likelihood statistic

If both VAR models fit (or “forecast”) the data equally well, then there is no G-causality; i.e.,

$$|\Sigma'_{xx}| = |\Sigma_{xx}| \Rightarrow F_{Y \to X} = 0$$

Operational causality (Granger)

X causes Y if the X-dependent forecast of Y decreases the Y model residual covariance (as compared to the X-independent forecast) more than the Y-dependent forecast of X decreases the X model residual covariance (as compared to the Y-independent forecast); i.e.,

$$F_{X \to Y} - F_{Y \to X} > 0 \Rightarrow X \to Y$$

$$F_{X \to Y} - F_{Y \to X} < 0 \Rightarrow Y \to X$$

$$F_{X \to Y} - F_{Y \to X} = 0 \Rightarrow \text{no causal inference}$$
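A minimal sketch of the G-causality difference for scalar series, assuming ordinary least squares fits of the two VAR models (this is not the MVGC toolbox implementation). The helper names `residual_variance` and `granger_f`, the synthetic data, and the choice of two lags are all illustrative assumptions.

```python
import numpy as np

def residual_variance(target, predictors, n_lags):
    """Mean squared residual of an OLS fit of target_t on lags 1..n_lags of each predictor."""
    length = len(target)
    columns = [p[n_lags - i:length - i] for p in predictors for i in range(1, n_lags + 1)]
    design = np.column_stack(columns)
    response = target[n_lags:]
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return np.mean((response - design @ coef) ** 2)

def granger_f(target, source, n_lags):
    """F_{source -> target} = ln(|Sigma'| / |Sigma|) for two scalar series."""
    # The VAR models above have no intercept, so the series are demeaned first (assumption).
    target = np.asarray(target, dtype=float) - np.mean(target)
    source = np.asarray(source, dtype=float) - np.mean(source)
    restricted = residual_variance(target, [target], n_lags)    # no dependence on the source
    full = residual_variance(target, [target, source], n_lags)  # possible dependence on the source
    return np.log(restricted / full)

# Hypothetical test data in which Y drives X.
rng = np.random.default_rng(0)
n = 2000
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + rng.normal()
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + rng.normal()

f_y_to_x = granger_f(x, y, n_lags=2)
f_x_to_y = granger_f(y, x, n_lags=2)
print(f_x_to_y - f_y_to_x)  # negative for this data, i.e. Y -> X under the rule above
```

Because the restricted regressors are a subset of the full ones, the statistic is non-negative up to rounding, so only the difference of the two directions carries the causal inference.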

Pairwise asymmetric inference (g3)
State space reconstruction

Consider an embedding of the time series $X = \{x_t \mid t = 0, 1, \ldots, L-1, L\}$ constructed from delayed time steps as

$$\tilde{X} = \{\tilde{x}_t \mid t = 1 + (E-1)\tau, \ldots, L\}$$

with

$$\tilde{x}_t = \left(x_t, x_{t-\tau}, x_{t-2\tau}, \ldots, x_{t-(E-1)\tau}\right)$$

- $\tau$ is the delay time step
- $E$ is the embedding dimension
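The delay embedding can be sketched in a few lines. The function name `delay_embed` is hypothetical, and the code uses 0-based array indices, so the first embedded point sits at index (E − 1)τ rather than at the slide's t = 1 + (E − 1)τ.

```python
import numpy as np

def delay_embed(x, E, tau):
    """Rows are (x_t, x_{t-tau}, ..., x_{t-(E-1)tau}) for every t with a full history."""
    x = np.asarray(x)
    start = (E - 1) * tau  # first 0-based index that has E delayed values available
    return np.column_stack([x[start - k * tau:len(x) - k * tau] for k in range(E)])

x = np.arange(10)
emb = delay_embed(x, E=3, tau=2)
print(emb[0])     # [4 2 0]
print(emb.shape)  # (6, 3)
```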

Pairwise asymmetric inference (g3)
Cross-mapping

Consider a time series pair (X,Y). The shadow manifold of X (labeled $\tilde{X}$) is constructed from the points

$$\tilde{x}_t = \left(x_t, x_{t-\tau}, x_{t-2\tau}, \ldots, x_{t-(E-1)\tau}, y_t\right)$$

1. Find the n nearest neighbors to $\tilde{x}_t$ (in $\tilde{X}$), where “nearest” means smallest Euclidean distance, d; i.e., $d_1 < d_2 < \ldots < d_n$

2. Create weights, w, from the nearest neighbors as

$$w_i = \frac{e^{-d_i/d_1}}{\sum_{j=1}^{n} e^{-d_j/d_1}}$$

3. Construct the cross-mapped estimate of Y using the weights as

$$Y|\tilde{X} = \left\{ Y_t|\tilde{X} = \sum_{i=1}^{n} w_i Y_{t_i} \;\middle|\; t = 1 + (E-1)\tau, \ldots, L \right\}$$

Each cross-mapped point in the estimate of Y, i.e.,

$$Y_t|\tilde{X} = \sum_{i=1}^{n} \frac{e^{-d_i/d_1}}{\sum_{j=1}^{n} e^{-d_j/d_1}} Y_{t_i}$$

depends on comparisons of the pasts of X and the presents of X and Y.
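The three cross-mapping steps can be sketched as follows. This is a brute-force illustration rather than the C++ code used for the talk. The function names, the exclusion of each point from its own neighbor list, the guard against a zero nearest distance, the neighbor count, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def cross_map(x, y, E, tau, n_neighbors):
    """Estimate Y from nearest neighbors in the shadow manifold of X (which includes y_t)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    start = (E - 1) * tau
    lags = [x[start - k * tau:len(x) - k * tau] for k in range(E)]
    manifold = np.column_stack(lags + [y[start:]])  # points (x_t, ..., x_{t-(E-1)tau}, y_t)
    y_target = y[start:]
    estimate = np.empty(len(manifold))
    for t, point in enumerate(manifold):
        dist = np.linalg.norm(manifold - point, axis=1)
        dist[t] = np.inf                             # assumption: a point is not its own neighbor
        nearest = np.argsort(dist)[:n_neighbors]     # step 1: d_1 <= d_2 <= ... <= d_n
        d = dist[nearest]
        w = np.exp(-d / max(d[0], 1e-12))            # step 2: guard against d_1 = 0
        w /= w.sum()
        estimate[t] = np.sum(w * y_target[nearest])  # step 3: weighted sum of Y at neighbor times
    return y_target, estimate

def cross_mapped_correlation(x, y, E, tau, n_neighbors):
    """Squared Pearson correlation between Y and its cross-mapped estimate."""
    y_target, estimate = cross_map(x, y, E, tau, n_neighbors)
    return np.corrcoef(y_target, estimate)[0, 1] ** 2

# Hypothetical data: a noisy oscillation Y that feeds into X.
rng = np.random.default_rng(1)
t = np.arange(300)
y = np.sin(0.2 * t) + 0.1 * rng.normal(size=t.size)
x = np.zeros(t.size)
for k in range(1, t.size):
    x[k] = 0.6 * x[k - 1] + 0.5 * y[k - 1] + 0.1 * rng.normal()

c_yx = cross_mapped_correlation(x, y, E=3, tau=1, n_neighbors=5)
c_xy = cross_mapped_correlation(y, x, E=3, tau=1, n_neighbors=5)
print(c_yx, c_xy, c_yx - c_xy)
```

The pairwise distance search is quadratic in the series length, which is acceptable for short series such as the 168-point example data but would need a spatial index for long ones.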

Pairwise asymmetric inference (g3)
Cross-mapped correlation

A good cross-mapped estimate is defined as one that is strongly correlated with the original time series. The cross-mapped correlation is

$$C_{YX} = \left[\rho\left(Y, Y|\tilde{X}\right)\right]^2$$

where $\rho(\cdot)$ is Pearson’s correlation coefficient.

Cross-mapping interpretation

If similar histories of X (i.e., nearest neighbors in the shadow manifold) capably estimate Y (i.e., lead to $C_{YX} \approx 1$, or at least $C_{YX} \neq 0$), then the presence (or action) of Y in the system has been recorded in X.

Pairwise asymmetric inference (g3)
Cross-mapping interpretation of causality

A time series pair (X,Y) will have two cross-mapped correlations, $C_{YX}$ and $C_{XY}$.

Operational causality (SSR)

X causes Y if similar histories of Y estimate X better than similar histories of X estimate Y, where the “similar histories” of one time series are used to estimate another time series through shadow manifold nearest neighbor weighting (cross-mapping); i.e.,

$$C_{YX} - C_{XY} < 0 \Rightarrow X \to Y$$

$$C_{YX} - C_{XY} > 0 \Rightarrow Y \to X$$

$$C_{YX} - C_{XY} = 0 \Rightarrow \text{no causal inference}$$

Weighted mean observed leaning (g4)
Causal penchant

The causal penchant $\rho_{EC} \in [-1, 1]$ is

$$\rho_{EC} = P(E \mid C) - P(E \mid \bar{C})$$

- $P(E \mid C)$ is the probability of some effect E given some cause C
- $P(E \mid \bar{C})$ is the probability of some effect E given no cause C

So, the penchant is the probability of an effect E given a cause C minus the probability of that effect without the cause.

In the psychology/medical literature, the causal penchant is known as the Eells measure of causal strength or probability contrast.

If C drives E, then it is expected that $\rho_{EC} > 0$.

The second term, $P(E \mid \bar{C})$, can be eliminated from the penchant formula using Bayes’ theorem, which gives

$$\rho_{EC} = P(E \mid C)\left(1 + \frac{P(C)}{1 - P(C)}\right) - \frac{P(E)}{1 - P(C)}$$

If E and C are independent, then $P(E \mid C) = P(E)$, which implies

$$\rho_{EC} = P(E) + \frac{P(E)P(C) - P(E)}{1 - P(C)} = P(E) - P(E) = 0$$

Example (to help with intuition)

Consider C and E to be two fair coins, $c_1$ and $c_2$, being “heads”; i.e., $P(c_1 = \text{“heads”}) = 0.5$ and $P(c_2 = \text{“heads”}) = 0.5$. If the coins are independent, then

$$P(c_2 = \text{“heads”} \mid c_1 = \text{“heads”}) = P(c_2 = \text{“heads”}) = 0.5 \Rightarrow \rho_{EC} = 0$$

If they are completely dependent, then

$$P(c_2 = \text{“heads”} \mid c_1 = \text{“heads”}) = 1 \text{ or } 0 \Rightarrow \rho_{EC} = 1 \text{ or } -1$$

This formula has the additional benefit of only needing to estimate one conditional probability from the data.
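The elimination of the no-cause conditional is stated without its intermediate steps. One way to spell it out, assuming $P(C) \neq 1$ and $P(E) \neq 0$ and writing $\bar{C}$ for the absence of the cause, is:

```latex
\begin{align*}
P(E \mid \bar{C})
  &= \frac{P(\bar{C} \mid E)\,P(E)}{P(\bar{C})}
   = \frac{\bigl(1 - P(C \mid E)\bigr)\,P(E)}{1 - P(C)}
   = \frac{P(E) - P(E \mid C)\,P(C)}{1 - P(C)}, \\
\rho_{EC}
  &= P(E \mid C) - P(E \mid \bar{C})
   = P(E \mid C)\left(1 + \frac{P(C)}{1 - P(C)}\right) - \frac{P(E)}{1 - P(C)}.
\end{align*}
```

The last equality in the first line uses $P(C \mid E)\,P(E) = P(E \mid C)\,P(C)$, which is Bayes’ theorem applied a second time.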

Weighted mean observed leaning (g4)
Causal leaning

A difference of penchants can be used to compare different cause-effect assignments (i.e., different assumptions of what should be considered a cause and what should be considered an effect). The leaning is

$$\lambda_{EC} = \rho_{EC} - \rho_{CE}$$

Leaning interpretation

If $\lambda_{EC} > 0$, then C drives E more than E drives C.

Weighted mean observed leaning (g4)
Usefulness of the leaning

The usefulness of the leaning depends on two things:

1. Operational definitions of C and E (called the cause-effect assignment)

2. Estimations of P(C), P(E), P(C|E), and P(E|C) from the data

The primary cause-effect assignment will be the l-standard assignment.

l-standard assignment

Consider a time series pair (X,Y). The l-standard assignment initially assumes the cause is the time step of X lagged by l and the effect is the current time step of Y; i.e., $\{C, E\} = \{x_{t-l}, y_t\}$.

Probabilities will be estimated using data frequency counts.

Weighted mean observed leaning (g4)
Leaning from the data

The cause-effect assignment must be specific if the probabilities are to be estimated with frequency counts, and it needs to include tolerance domains to account for noise in the measurements.

Consider the time series pair (X,Y). The penchant calculation depends on the conditional $P(y_t = a \mid x_{t-l} = b)$, where $a \in Y$ and $b \in X$. This conditional will be estimated as

$$P\left(y_t \in [a - \delta^L_y, a + \delta^R_y] \mid x_{t-l} \in [b - \delta^L_x, b + \delta^R_x]\right) = \frac{n_{a \cap b}}{n_b}$$

- $n_{a \cap b}$ is the number of times $y_t \in [a - \delta^L_y, a + \delta^R_y]$ and $x_{t-l} \in [b - \delta^L_x, b + \delta^R_x]$ in (X,Y)
- $n_b$ is the number of times $x_{t-l} \in [b - \delta^L_x, b + \delta^R_x]$ in X

The tolerance domains are usually considered symmetric; i.e., $\delta^L_x = \delta^R_x$ and $\delta^L_y = \delta^R_y$.

The causal inference implied by the leaning calculations is dependent on both the cause-effect assignment and the tolerance domains.
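A minimal sketch of the frequency-count estimate for one choice of (a, b), assuming symmetric tolerance domains and the l-standard assignment. The function name `penchants_and_leaning` is hypothetical, and estimating P(C) and P(E) over the aligned pairs (x_{t−l}, y_t) is an assumption of this sketch, not something the slides specify.

```python
import numpy as np

def penchants_and_leaning(x, y, lag, a, b, delta_x, delta_y):
    """Return (rho_EC, rho_CE, leaning) for C: x_{t-lag} near b and E: y_t near a (lag >= 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cause = np.abs(x[:-lag] - b) <= delta_x   # x_{t-lag} in [b - delta_x, b + delta_x]
    effect = np.abs(y[lag:] - a) <= delta_y   # y_t in [a - delta_y, a + delta_y]
    p_c, p_e = cause.mean(), effect.mean()
    if p_c in (0.0, 1.0) or p_e in (0.0, 1.0):
        return None                           # a conditional or a penchant is undefined
    n_ab = np.sum(cause & effect)
    p_e_given_c = n_ab / cause.sum()          # n_{a and b} / n_b
    p_c_given_e = n_ab / effect.sum()
    rho_ec = p_e_given_c * (1 + p_c / (1 - p_c)) - p_e / (1 - p_c)
    rho_ce = p_c_given_e * (1 + p_e / (1 - p_e)) - p_c / (1 - p_e)
    return rho_ec, rho_ce, rho_ec - rho_ce

# Toy data: y_t is 1 whenever x_{t-1} is 1, and occasionally otherwise.
x = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
y = [0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
rho_ec, rho_ce, lean = penchants_and_leaning(x, y, lag=1, a=1, b=1, delta_x=0.25, delta_y=0.25)
print(rho_ec, rho_ce, lean)  # rho_EC = 5/7 and rho_CE = 3/5 for this toy data
```

For this toy data the leaning is positive, which reads as the assumed cause (the lagged X) driving the assumed effect (Y) more than the reverse.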

Weighted mean observed leaning (g4)
Weighted mean

Any time series pair (X,Y) will have many leanings; e.g., an l-standard assignment of $\{C, E\} = \{x_{t-l} = b \pm \delta_x, y_t = a \pm \delta_y\}$ will have a different leaning calculation for each $x_{t-l} \in [b - \delta_x, b + \delta_x]$ and $y_t \in [a - \delta_y, a + \delta_y]$.

Consider a time series pair (X,Y) and some cause-effect assignment $\{C, E\}$ for which reasonable tolerance domains have been defined.

Any penchant calculation for which the (estimated) conditional $P(E \mid C) \neq 0$ (or $P(C \mid E) \neq 0$) is called an observed penchant.

The weighted mean observed penchant, $\langle\rho_{EC}\rangle_w$, is the weighted algebraic mean of the observed penchants.

The weighted mean observed leaning, $\langle\lambda_{EC}\rangle_w$, is the difference of the weighted mean observed penchants; i.e., $\langle\lambda_{EC}\rangle_w = \langle\rho_{EC}\rangle_w - \langle\rho_{CE}\rangle_w$

Weighted mean observed leaning (g4)
Causal inference

Operational causality (penchant)

X causes Y if the weighted mean observed leaning is positive given a cause-effect assignment (and reasonable tolerance domains) in which the assumed cause X precedes the assumed effect Y; i.e.,

$$\langle\lambda_{EC}\rangle_w > 0 \Rightarrow X \to Y$$

$$\langle\lambda_{EC}\rangle_w < 0 \Rightarrow Y \to X$$

$$\langle\lambda_{EC}\rangle_w = 0 \Rightarrow \text{no causal inference}$$

given $C \in X$, $E \in Y$, and C precedes E.

Lagged cross-correlation difference (g5)
Cross-correlation

The cross-correlation between two time series X and Y is

$$\rho^{xy} = \frac{E\left[(x_t - \mu_X)(y_t - \mu_Y)\right]}{\sqrt{\sigma^2_X \sigma^2_Y}}$$

- Every point in X is compared to the mean of X
- Every point in Y is compared to the mean of Y
- The square root of the product of the individual variances of X and Y is used as a normalization

Example (to help with intuition)

$$X = Y \Rightarrow \rho^{xy} = \frac{E\left[(x_t - \mu_X)(y_t - \mu_Y)\right]}{\sqrt{\sigma^2_X \sigma^2_Y}} = \frac{E\left[(x_t - \mu_X)^2\right]}{\sigma^2_X} = \frac{\sigma^2_X}{\sigma^2_X} = 1$$

Lagged cross-correlation difference (g5)
Lagged cross-correlation

Consider a time series pair (X,Y). The past of Y may be compared to the present of X by introducing a lag l into the cross-correlation calculation,

$$\rho^{xy}_l = \frac{E\left[(x_t - \mu_X)(y_{t-l} - \mu_Y)\right]}{\sqrt{\sigma^2_X \sigma^2_Y}}$$

Operational causality (correlation)

X causes Y (at lag l) if the past of X (i.e., X lagged by l time steps) is more strongly correlated with the present of Y than the past of Y (i.e., Y lagged by l time steps) is with the present of X; i.e.,

$$|\rho^{xy}_l| - |\rho^{yx}_l| < 0 \Rightarrow X \to Y$$

$$|\rho^{xy}_l| - |\rho^{yx}_l| > 0 \Rightarrow Y \to X$$

$$|\rho^{xy}_l| - |\rho^{yx}_l| = 0 \Rightarrow \text{no causal inference}$$
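The lagged cross-correlation difference is short enough to sketch directly. The function names and the synthetic data below (Y is X delayed by two steps plus noise) are assumptions for illustration. The expectation is approximated by a sample mean over the overlapping time steps, normalized by the full-series means and variances.

```python
import numpy as np

def lagged_corr(u, v, lag):
    """Sample estimate of the correlation between u_t and v_{t-lag}."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cov = np.mean((u[lag:] - u.mean()) * (v[:len(v) - lag] - v.mean()))
    return cov / np.sqrt(u.var() * v.var())

def lagged_corr_difference(x, y, lag):
    """|rho^{xy}_l| - |rho^{yx}_l|; a negative value reads as X -> Y under the rule above."""
    return abs(lagged_corr(x, y, lag)) - abs(lagged_corr(y, x, lag))

# Hypothetical data: Y copies X with a delay of two time steps.
rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = 0.1 * rng.normal(size=500)
y[2:] += x[:-2]

delta = lagged_corr_difference(x, y, lag=2)
print(delta)  # negative for this data, i.e. X -> Y at lag 2
```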

Computational tools for the ECA summary

Open source packages are available for some of the mentioned time series causality tools; others required code to be developed from scratch.

g1: Java Information Dynamics Toolkit (JIDT)
g2: Multivariate Granger Causality (MVGC) MATLAB toolbox
g3: (C++)
g4: (MATLAB)
g5: (MATLAB)

All the code is available at https://github.com/jmmccracken

Cooling/Heating System Data
Time series data

Consider a time series pair (X,Y) where X are indoor temperature measurements (in degrees Celsius) in a house with “experimental” environmental controls and Y is the temperature outside of that house, measured at the same time intervals (168 measurements in each series).³

[Figure: two panels, X ($x_t$ versus t) and Y ($y_t$ versus t), for t = 0 to 160.]

The intuitive causal inference is Y → X.

³ This data was originally presented at a time series conference. The abstract is available here, http://www.osti.gov/scitech/biblio/5231321 . The data is also available as part of the UCI Machine Learning Repository.

Cooling/Heating System Data
ECA summary preliminaries

An ECA summary requires several parameters be set from the data, including

- embedding dimension and time delay for g3 (PAI)
- cause-effect assignment and tolerance domains for g4 (leaning)
- lags for g5 (cross-correlation)

The embedding dimension will be set (somewhat arbitrarily) to E = 10 and the time delay will be τ = 1.

The tolerance domains will be the f-width tolerance domains; i.e., $\pm\delta_x = f\,(\max(X) - \min(X))$ and $\pm\delta_y = f\,(\max(Y) - \min(Y))$. For this example, f = 1/4.

The cause-effect assignment will be the l-standard assignment, but there is still the problem of determining relevant lags l.

Cooling/Heating System Data
Autocorrelations

There are autocorrelations in both time series (only 50 lags are shown),

[Figure: two panels, X and Y, showing $|r(x_{t-l}, x_t)|^2$ and $|r(y_{t-l}, y_t)|^2$ against the lag $l$ for $l = 0$ to $50$; the vertical axis runs from 0 to 1.]

The autocorrelations appear cyclic and initially drop to zero around $l = 7$ for both time series.

This observation will be used to justify using lags of $l = 1, 2, \ldots, 7$ for both $g_4$ (leaning) and $g_5$ (cross-correlation).
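The lag selection described above can be sketched as follows, assuming the squared Pearson correlation between $x_{t-l}$ and $x_t$ as the autocorrelation and a small cutoff as a stand-in for "drops to zero". The cutoff of 0.01, the function names, and the toy sine series (period 28, chosen so that the first zero falls at a quarter period) are all assumptions for illustration; in the talk the lag is read off the plot.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def squared_autocorrelation(series, lag):
    """|r(x_{t-lag}, x_t)|^2 over all available pairs."""
    return pearson(series[:-lag], series[lag:]) ** 2


def first_lag_below(series, max_lag=50, threshold=0.01):
    """First lag whose squared autocorrelation is below the (assumed) cutoff."""
    for lag in range(1, max_lag + 1):
        if squared_autocorrelation(series, lag) < threshold:
            return lag
    return None


# Toy cyclic series: a sine with period 28 samples.
series = [math.sin(2 * math.pi * t / 28) for t in range(280)]
first = first_lag_below(series)
tested_lags = list(range(1, first + 1))
print(first, tested_lags)
```

A visual check of the autocorrelation plot remains the safer choice for real data, since a fixed cutoff can be crossed by noise.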

Cooling/Heating System Data
Lagged cross-correlations and leanings

The lagged cross-correlations and leanings (using the $l$-standard assignment) can be plotted for each tested lag,

[Figure: $\langle \lambda_l \rangle$ and $\Delta_l$ against the lag $l$ for $l = 1, \ldots, 7$; the vertical axis runs from $-0.4$ to 1.]

There are 7 different causal inferences in this plot, all of which agree except $l = 7$. A single causal inference (for each tool) will be found with the algebraic mean across all the tested lags.
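A minimal sketch of averaging the lagged cross-correlation difference over the tested lags is given below. It assumes that $\rho^{xy}_l$ is the Pearson correlation between $x_t$ and $y_{t-l}$ and that $\Delta_l = |\rho^{xy}_l| - |\rho^{yx}_l|$; this definition is an assumption chosen to match the sign convention used in this talk (a positive mean implies $Y \to X$). The function names and the toy data are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def lagged_corr(a, b, lag):
    """Assumed definition: correlation between a_t and b_{t-lag}."""
    return pearson(a[lag:], b[:-lag])


def mean_lagged_difference(x, y, lags):
    """Mean over lags of |rho^{xy}_l| - |rho^{yx}_l|."""
    deltas = [abs(lagged_corr(x, y, l)) - abs(lagged_corr(y, x, l)) for l in lags]
    return sum(deltas) / len(deltas)


# Toy pair in which Y drives X: x_t = y_{t-1}, with y white noise.
rng = random.Random(0)
y = [rng.gauss(0.0, 1.0) for _ in range(500)]
x = [0.0] + y[:-1]

mean_delta = mean_lagged_difference(x, y, lags=range(1, 4))
if mean_delta > 0:
    inference = "Y -> X"
elif mean_delta < 0:
    inference = "X -> Y"
else:
    inference = "no inference"
print(mean_delta, inference)
```

Averaging across lags yields one inference per tool, at the cost of hiding lags (such as $l = 7$ above) that disagree with the rest.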

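Cooling/Heating System Data
Making an ECA summary

Each of the five time series tools leads to a causal inference in the ECA summary vector,

$T_{X \to Y} - T_{Y \to X} = -0.14 \Rightarrow Y \to X \Rightarrow g_1 = 1$

$F_{X \to Y} - F_{Y \to X} = -0.35 \Rightarrow Y \to X \Rightarrow g_2 = 1$

$C_{YX} - C_{XY} = 3.1 \times 10^{-4} \Rightarrow Y \to X \Rightarrow g_3 = 1$

$\langle \langle \lambda_{EC} \rangle_w \rangle = -0.20 \Rightarrow Y \to X \Rightarrow g_4 = 1$

$\langle |\rho^{xy}_l| - |\rho^{yx}_l| \rangle = 0.40 \Rightarrow Y \to X \Rightarrow g_5 = 1$

$\therefore$ the ECA summary is $Y \to X$, which agrees with intuition.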
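The mapping from the five differences to the entries $g_i$ can be sketched as below. The sign conventions are read off the examples in this talk: for the transfer entropy difference, the Granger statistic difference, and the weighted mean leaning, a positive value implies $X \to Y$; for the PAI difference and the cross-correlation difference, a positive value implies $Y \to X$. The function names are hypothetical, and returning `None` for a difference of exactly zero is an assumption of this sketch.

```python
def direction_code(value, positive_means_x_drives_y):
    """Return 0 for X -> Y, 1 for Y -> X, None if value is exactly zero."""
    if value == 0:
        return None  # assumption: no inference is drawn from a zero difference
    x_drives_y = (value > 0) == positive_means_x_drives_y
    return 0 if x_drives_y else 1


def eca_summary_vector(d_te, d_granger, d_pai, leaning, d_xcorr):
    """Build (g1, ..., g5) from the five differences shown on the slides."""
    return [
        direction_code(d_te, True),       # T_{X->Y} - T_{Y->X}
        direction_code(d_granger, True),  # F_{X->Y} - F_{Y->X}
        direction_code(d_pai, False),     # C_{YX} - C_{XY}
        direction_code(leaning, True),    # weighted mean observed leaning
        direction_code(d_xcorr, False),   # mean lagged cross-correlation difference
    ]


# Values from the cooling/heating system slide.
g_cooling = eca_summary_vector(-0.14, -0.35, 3.1e-4, -0.20, 0.40)
print(g_cooling)
```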

Snowfall Data
Time series data

Consider a time series pair $(X, Y)$ where $X$ is the mean daily temperature (in degrees Celsius) at Whistler, BC, Canada, and $Y$ is the total snowfall (in centimeters), with 7,753 measurements in each series.

[Figure: two panels, X and Y, showing $x_t$ (vertical axis from $-30$ to 30) and $y_t$ (vertical axis from 0 to 120) against $t$ for $t = 0$ to 8000.]

The intuitive causal inference is $X \to Y$.

Note: This data is available as part of the UCI Machine Learning Repository. The data was recorded from July 1, 1972 to December 31, 2009.

Snowfall Data
ECA summary preliminaries

The ECA summary can be made with parameters similar to those of the previous example,

- The embedding dimension will be $E = 100$ with a time delay of $\tau = 1$
- The cause-effect assignment will be the $l$-standard assignment
- The tolerance domains will be the $1/4$-width domains
- The tested lags will be $l = 1, 2, \ldots, 20$

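Snowfall Data
Making an ECA summary

Each of the five time series tools leads to a causal inference in the ECA summary vector,

$T_{X \to Y} - T_{Y \to X} = 2.1 \times 10^{-2} \Rightarrow X \to Y \Rightarrow g_1 = 0$

$F_{X \to Y} - F_{Y \to X} = -2.6 \times 10^{-3} \Rightarrow Y \to X \Rightarrow g_2 = 1$

$C_{YX} - C_{XY} = -3.4 \times 10^{-2} \Rightarrow X \to Y \Rightarrow g_3 = 0$

$\langle \langle \lambda_{EC} \rangle_w \rangle = 3.7 \times 10^{-2} \Rightarrow X \to Y \Rightarrow g_4 = 0$

$\langle |\rho^{xy}_l| - |\rho^{yx}_l| \rangle = 2.3 \times 10^{-2} \Rightarrow Y \to X \Rightarrow g_5 = 1$

$\therefore$ the ECA summary is undefined.

The majority of the causal inferences agree with intuition.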
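The step from the summary vector to the ECA summary can be sketched as follows, assuming (as the two empirical examples suggest) that the summary is defined only when all five entries agree. The function name `eca_summary` and the string labels are hypothetical.

```python
def eca_summary(g):
    """Return the ECA summary for a vector of 0/1 entries (0: X -> Y, 1: Y -> X)."""
    if all(entry == 0 for entry in g):
        return "X -> Y"
    if all(entry == 1 for entry in g):
        return "Y -> X"
    return "undefined"  # the tools disagree


# Summary vector from the snowfall slide.
g_snowfall = [0, 1, 0, 0, 1]
summary = eca_summary(g_snowfall)

# Count the entries that match the intuitive inference X -> Y (coded as 0).
n_agree = g_snowfall.count(0)
print(summary, f"{n_agree} of {len(g_snowfall)} tools agree with intuition")
```

Keeping the tally next to the summary shows how far an undefined result is from unanimity, which is the information the last line of the slide reports.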

Time series causality as data analysis
Objections to causal studies

Data analysis often ignores causality.

Two primary objections to time series causality

1. Correlation is not causation

Many different tools have been developed that go beyond correlation, and ignoring such tools means ignoring potentially useful inferences that can be drawn from the data.

2. Confounding cannot be controlled

True, but this is an issue of defining “causality”. Exploring potential causal relationships within data sets can be done with operational definitions of causality. These different causalities may provide deeper insight into the system dynamics.

ECA summaries as practical tools
Exploratory causal inference as a practical part of data analysis

Consider a recent result presented in Unraveling the cause-effect relation between time series [Phys. Rev. E 90, 052150]:

Liang; Section V, Ibid.

“. . . El Niño and IOD [Indian Ocean Dipole] are mutually causal, and the causality is asymmetric, with the one from the latter to the former larger than its counterpart . . .” (In the language of ECA: Given the time series pair $(E, I)$, the dominant potential driver is $I$; i.e., $I \to E$)

This conclusion is drawn by deriving the “Liang information flow” from the transfer entropy and then applying this new formula to $E$ and $I$, but the same conclusion can be drawn from the ECA summary vectors of these time series pairs, using the code presented previously with naive algorithm parameters (specifically, the parameters used in the snowfall example).

BACK-UP

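Impulse with linear response

Consider $\{X, Y\} = \{x_t, y_t\}$ where $t = 0, 1, \ldots, L$,

$$x_t = \begin{cases} 2 & t = 1 \\ A\eta_t & \forall\, t \in \{t \mid t \neq 1 \text{ and } t \bmod 5 \neq 0\} \\ 2 & \forall\, t \in \{t \mid t \bmod 5 = 0\} \end{cases}$$

and $y_t = x_{t-1} + B\eta_t$ with $y_0 = 0$, $A, B \in \mathbb{R}_{\geq 0}$ and $\eta_t \sim \mathcal{N}(0, 1)$. Specifically, consider $L = 500$, $A = 0.1$, and $B = 0.4$.

$T_{X \to Y} - T_{Y \to X} = 5.3 \times 10^{-1} \Rightarrow X \to Y \Rightarrow g_1 = 0$

$F_{X \to Y} - F_{Y \to X} = 4.5 \times 10^{-1} \Rightarrow X \to Y \Rightarrow g_2 = 0$

$C_{YX} - C_{XY} = -8.3 \times 10^{-3} \Rightarrow X \to Y \Rightarrow g_3 = 0$

$\langle \langle \lambda_{EC} \rangle_w \rangle = 6.6 \times 10^{-3} \Rightarrow X \to Y \Rightarrow g_4 = 0$

$\langle |\rho^{xy}_l| - |\rho^{yx}_l| \rangle = -2.8 \times 10^{-3} \Rightarrow X \to Y \Rightarrow g_5 = 0$

The ECA summary is $X \to Y$, which agrees with intuition.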
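For readers who want to reproduce a series of this form, the sketch below generates one realization of the impulse pair. It assumes that the noise terms in $x_t$ and $y_t$ are independent standard normal draws (the slide uses the same symbol $\eta_t$ for both), and the function name and seed are hypothetical. It does not reproduce the numbers on the slide, which depend on the author's own realization and tools.

```python
import random


def impulse_linear_response(L=500, A=0.1, B=0.4, seed=0):
    """Generate x_t (impulses of height 2) and y_t = x_{t-1} + B * noise."""
    rng = random.Random(seed)
    x = []
    for t in range(L + 1):
        if t == 1 or t % 5 == 0:
            x.append(2.0)  # impulse at t = 1 and at every multiple of 5
        else:
            x.append(A * rng.gauss(0.0, 1.0))
    y = [0.0]  # y_0 = 0
    for t in range(1, L + 1):
        # Assumption: the noise in y is drawn independently of the noise in x.
        y.append(x[t - 1] + B * rng.gauss(0.0, 1.0))
    return x, y


x, y = impulse_linear_response()
print(len(x), len(y), x[:6])
```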

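Cyclic driving with linear response

Consider $\{X, Y\} = \{x_t, y_t\}$ where $t = 0, 1, \ldots, L$,

$$x_t = a \sin(bt + c) + A\eta_t$$

and

$$y_t = x_{t-1} + B\eta_t$$

with $y_0 = 0$, $A \in [0, 1]$, $B \in [0, 1]$, $\eta_t \sim \mathcal{N}(0, 1)$, and with the amplitude $a$, the frequency $b$, and the phase $c$ all in the appropriate units. Specifically, consider $L = 500$, $A = 0.1$, $B = 0.4$, $a = b = 1$, and $c = 0$.

$T_{X \to Y} - T_{Y \to X} = 1.9 \times 10^{-1} \Rightarrow X \to Y \Rightarrow g_1 = 0$

$F_{X \to Y} - F_{Y \to X} = 2.1 \times 10^{-1} \Rightarrow X \to Y \Rightarrow g_2 = 0$

$C_{YX} - C_{XY} = -9.8 \times 10^{-3} \Rightarrow X \to Y \Rightarrow g_3 = 0$

$\langle \langle \lambda_{EC} \rangle_w \rangle = 3.9 \times 10^{-3} \Rightarrow X \to Y \Rightarrow g_4 = 0$

$\langle |\rho^{xy}_l| - |\rho^{yx}_l| \rangle = -2.9 \times 10^{-2} \Rightarrow X \to Y \Rightarrow g_5 = 0$

The ECA summary is $X \to Y$, which agrees with intuition.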
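A possible generator for this pair is sketched below, again assuming independent standard normal draws for the two noise terms. The function name and seed are hypothetical, and the output is one realization rather than the data behind the slide.

```python
import math
import random


def cyclic_linear_response(L=500, A=0.1, B=0.4, a=1.0, b=1.0, c=0.0, seed=0):
    """Generate x_t = a sin(b t + c) + A * noise and y_t = x_{t-1} + B * noise."""
    rng = random.Random(seed)
    x = [a * math.sin(b * t + c) + A * rng.gauss(0.0, 1.0) for t in range(L + 1)]
    y = [0.0]  # y_0 = 0
    for t in range(1, L + 1):
        # Assumption: the noise in y is independent of the noise in x.
        y.append(x[t - 1] + B * rng.gauss(0.0, 1.0))
    return x, y


x, y = cyclic_linear_response()
print(len(x), len(y))
```

Setting `A=0.0` and `B=0.0` gives the noise-free case, in which $y$ is an exact one-step delayed copy of $x$; this is a convenient check that the lag direction is coded correctly.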

Cyclic driving with non-linear response

Consider $\{X, Y\} = \{x_t, y_t\}$ where $t = 0, 1, \ldots, L$,

$$x_t = a \sin(bt + c) + A\eta_t$$

and

$$y_t = B x_{t-1} (1 - C x_{t-1}) + D\eta_t,$$

with $y_0 = 0$, with $A, B, C, D \in [0, 1]$, $\eta_t \sim \mathcal{N}(0, 1)$, and with the amplitude $a$, the frequency $b$, and the phase $c$ all in the appropriate units given $t = 0, f\pi, 2f\pi, 3f\pi, \ldots, 6\pi$ with $f = 1/30$, which implies $L = 181$. Specifically, consider $A = 0.1$, $B = 0.3$, $C = 0.4$, $D = 0.5$, $a = b = 1$, and $c = 0$.

$T_{X \to Y} - T_{Y \to X} = 2.7 \times 10^{-1} \Rightarrow X \to Y \Rightarrow g_1 = 0$

$F_{X \to Y} - F_{Y \to X} = 2.6 \times 10^{-1} \Rightarrow X \to Y \Rightarrow g_2 = 0$

$C_{YX} - C_{XY} = -1.8 \times 10^{-3} \Rightarrow X \to Y \Rightarrow g_3 = 0$

$\langle \langle \lambda_{EC} \rangle_w \rangle = 8.4 \times 10^{-3} \Rightarrow X \to Y \Rightarrow g_4 = 0$

$\langle |\rho^{xy}_l| - |\rho^{yx}_l| \rangle = -6.8 \times 10^{-2} \Rightarrow X \to Y \Rightarrow g_5 = 0$

The ECA summary is $X \to Y$, which agrees with intuition.
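The sampling grid is the main difference from the linear case, so the sketch below makes it explicit: the times $0, f\pi, 2f\pi, \ldots, 6\pi$ with $f = 1/30$ give 181 samples, and $x_{t-1}$ is read as the previous sample on that grid. That reading, the independence of the two noise terms, the function name, and the seed are assumptions of this sketch.

```python
import math
import random


def cyclic_nonlinear_response(A=0.1, B=0.3, C=0.4, D=0.5,
                              a=1.0, b=1.0, c=0.0, f=1 / 30, seed=0):
    """Generate the sinusoidal driver x and the quadratic response y."""
    rng = random.Random(seed)
    n_samples = round(6 / f) + 1  # 181 samples for f = 1/30
    times = [k * math.pi / 30 for k in range(n_samples)] if f == 1 / 30 \
        else [k * f * math.pi for k in range(n_samples)]
    x = [a * math.sin(b * t + c) + A * rng.gauss(0.0, 1.0) for t in times]
    y = [0.0]  # y_0 = 0
    for k in range(1, n_samples):
        prev = x[k - 1]  # previous sample on the grid
        y.append(B * prev * (1 - C * prev) + D * rng.gauss(0.0, 1.0))
    return times, x, y


times, x, y = cyclic_nonlinear_response()
print(len(times), times[-1])
```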

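Coupled logistic map

Consider $\{X, Y\} = \{x_t, y_t\}$ where $t = 0, 1, \ldots, L$,

$$x_t = x_{t-1} (r_x - r_x x_{t-1} - \beta_{xy} y_{t-1})$$

and

$$y_t = y_{t-1} (r_y - r_y y_{t-1} - \beta_{yx} x_{t-1})$$

where the parameters $r_x, r_y, \beta_{xy}, \beta_{yx} \in \mathbb{R}_{\geq 0}$. Specifically, consider $L = 500$, $\beta_{xy} = 0.5$, $\beta_{yx} = 1.5$, $r_x = 3.8$, and $r_y = 3.2$ with initial conditions $x_0 = y_0 = 0.4$.

$T_{X \to Y} - T_{Y \to X} = 4.9 \times 10^{-1} \Rightarrow X \to Y \Rightarrow g_1 = 0$

$F_{X \to Y} - F_{Y \to X} = 5.4 \times 10^{-1} \Rightarrow X \to Y \Rightarrow g_2 = 0$

$C_{YX} - C_{XY} = -3.9 \times 10^{-3} \Rightarrow X \to Y \Rightarrow g_3 = 0$

$\langle \langle \lambda_{EC} \rangle_w \rangle = 2.7 \times 10^{-1} \Rightarrow X \to Y \Rightarrow g_4 = 0$

$\langle |\rho^{xy}_l| - |\rho^{yx}_l| \rangle = -2.6 \times 10^{-1} \Rightarrow X \to Y \Rightarrow g_5 = 0$

The ECA summary is $X \to Y$, which agrees with intuition.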
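Since this map is deterministic, iterating it is straightforward; a minimal sketch with the parameters stated above follows. The function name is hypothetical, and no claim is made here about the long-run behaviour of the trajectory.

```python
def coupled_logistic_map(L=500, r_x=3.8, r_y=3.2, beta_xy=0.5, beta_yx=1.5,
                         x0=0.4, y0=0.4):
    """Iterate the coupled logistic map for t = 1, ..., L."""
    x, y = [x0], [y0]
    for _ in range(L):
        x_prev, y_prev = x[-1], y[-1]
        # Both updates use the values from the previous step.
        x.append(x_prev * (r_x - r_x * x_prev - beta_xy * y_prev))
        y.append(y_prev * (r_y - r_y * y_prev - beta_yx * x_prev))
    return x, y


x, y = coupled_logistic_map()
print(x[1], y[1])
```

With $\beta_{yx} > \beta_{xy}$, the $x$ term enters the $y$ update more strongly than the $y$ term enters the $x$ update, which is the asymmetry behind the intuitive inference $X \to Y$.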
