
Page 1

Local Independence Tests for Point Processes
Learning causality in event models

Nikolaj Thams, University of Copenhagen
November 21st, 2019

Time to Event Data and Machine Learning Workshop
Joint work with Niels Richard Hansen

Page 2

Hawkes Processes

Causality

Local independence test

Experimental results

Conclusion

Page 3

Learning causality in event models?

[Figure: observed event streams for processes a, b and c and a latent process h, plotted on the time axis from 0 to T.]


Page 5

Hawkes Processes

Page 6

Point processes

A point process with marks $V = \{1, \dots, d\}$ is a collection of random measures

$$N^k = \sum_i \delta_{T^k_i},$$

where $T^k_i$ is the $i$'th event of type $k$. This defines counting processes

$$t \mapsto N^k_t := N^k(0, t].$$

[Figure: a sample path of $N_t$ as a step function, jumping at the events $T_1, T_2, T_3$.]

If the compensator $A^k_t$ of $N^k_t$ equals $\int_0^t \lambda^k_s \, ds$ for some $\lambda^k$, then $\lambda^k$ is the intensity of $N^k$. Observe that $E N^k_t = \int_0^t E \lambda^k_s \, ds$.

Famous examples: the Poisson process ($\lambda_t$ constant) and the Hawkes process (next slide).
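As a sanity check on the identity $E N^k_t = \int_0^t E \lambda^k_s \, ds$, here is a minimal sketch (not from the slides; function and variable names are mine) that simulates a homogeneous Poisson process, where $\lambda_t \equiv \lambda$ gives $E N_T = \lambda T$:

```python
import numpy as np

def simulate_poisson(lam, T, rng):
    """Event times of a homogeneous Poisson process with rate lam on (0, T]."""
    times = []
    t = rng.exponential(1.0 / lam)  # i.i.d. exponential waiting times
    while t <= T:
        times.append(t)
        t += rng.exponential(1.0 / lam)
    return np.array(times)

rng = np.random.default_rng(0)
lam, T = 2.0, 10.0
counts = [len(simulate_poisson(lam, T, rng)) for _ in range(2000)]
# The empirical mean of N_T should be close to lam * T = 20.
```

The same check, replacing the constant rate by a time-varying intensity, is the basis of goodness-of-fit diagnostics for the models below.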

Page 7

Hawkes processes

The process with intensity

$$\lambda^k_t = \beta^k_0 + \sum_{v \in V} \int_{-\infty}^{t-} g_{vk}(t - s) \, N^v(ds) = \beta^k_0 + \sum_{v \in V} \sum_{s < t} g_{vk}(t - s),$$

where the last sum runs over the event times $s$ of $N^v$, is called the (linear) Hawkes process, with kernels $g_{vk}$ given by integrable functions, e.g. $g_{vk}(x) = \beta^{vk}_1 e^{-\beta^{vk}_2 x}$.

This motivates using graphs for summarizing dependencies:

[Figure: simulated intensities of $N^1$ and $N^2$ over time, and the summary graph on the nodes 1 and 2.]
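A univariate linear Hawkes process with the exponential kernel above can be simulated by Ogata's thinning algorithm. This is an illustrative sketch, not from the slides (the parameter names and the helper are mine); it exploits that a decaying kernel makes the current intensity an upper bound until the next event:

```python
import numpy as np

def simulate_hawkes(beta0, b1, b2, T, rng):
    """Ogata thinning for lambda_t = beta0 + sum_{t_i < t} b1 * exp(-b2 (t - t_i))."""
    events, t = [], 0.0
    while t < T:
        # With a decaying kernel, the intensity just after t upper-bounds
        # the intensity until the next event occurs.
        lam_bar = beta0 + sum(b1 * np.exp(-b2 * (t - ti)) for ti in events)
        t += rng.exponential(1.0 / lam_bar)
        if t >= T:
            break
        lam_t = beta0 + sum(b1 * np.exp(-b2 * (t - ti)) for ti in events)
        if rng.uniform() < lam_t / lam_bar:  # accept proposal with prob lam_t / lam_bar
            events.append(t)
    return np.array(events)

rng = np.random.default_rng(0)
# Branching ratio b1 / b2 = 0.4 < 1, so the process is stable.
events = simulate_hawkes(beta0=0.5, b1=0.8, b2=2.0, T=50.0, rng=rng)
```

In the stable case the stationary mean intensity is $\beta_0 / (1 - \beta_1/\beta_2)$, which gives a further sanity check on the simulated event counts.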


Page 9

Causality

Page 10

Causal inference

Static system
Structural Causal Models (SCMs) consist of functional assignments, summarized by parents in a graph:

$$X_i = f_i(X_{\mathrm{pa}_i}, \epsilon_i), \quad i \in V$$

[Graphs: $X_1 \to X_2$ and $X_1 \to X_3$, and the same graph under the intervention $X_3 := c$.]

Essential assumption: the model also describes the system under interventions $X_i := c$.

A graph, in conjunction with a separation criterion $\perp$, satisfies:

• the global Markov property if $A \perp B \mid C \implies A \mathrel{\perp\!\!\!\perp}_P B \mid C$;
• faithfulness if $A \mathrel{\perp\!\!\!\perp}_P B \mid C \implies A \perp B \mid C$.

The global Markov property and faithfulness are the motivation for developing conditional independence tests in causality. See (Peters et al. 2017) for details.
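The distinction between observation and intervention can be illustrated with a short simulation of an SCM of this form. The sketch below is mine (the Gaussian noise and the coefficients are arbitrary): observationally $X_2$ and $X_3$ are dependent through their common parent $X_1$, while under the intervention $X_1 := c$ that dependence vanishes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def sample(intervene_x1=None):
    """Sample the toy SCM X2 := 2 X1 + eps2, X3 := -X1 + eps3."""
    x1 = rng.normal(size=n) if intervene_x1 is None else np.full(n, intervene_x1)
    x2 = 2.0 * x1 + rng.normal(size=n)
    x3 = -1.0 * x1 + rng.normal(size=n)
    return x1, x2, x3

_, x2_obs, x3_obs = sample()                  # observational regime
_, x2_int, x3_int = sample(intervene_x1=1.0)  # under the intervention X1 := 1
```

The empirical correlation of $X_2$ and $X_3$ is strongly negative in the observational regime and approximately zero under the intervention, matching what the two graphs predict.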


Page 13

Causal inference: dynamical systems

Causal ideas have been generalized to the dynamical setting, e.g. (Didelez 2008; Mogensen, Malinsky, et al. 2018; Mogensen and Hansen 2018).

[Figure: the processes $X^1, X^2, X^3$ unrolled over time points $t_1, t_2, t_3, \dots$ with lagged arrows between coordinates, summarized by a graph on $X^1, X^2, X^3$.]

Page 14

Local independence

Let $N$ be a marked point process. For subsets $A, B, C \subseteq V$, we say that $B$ is locally independent of $A$ given $C$ if for every $b \in B$ a version of

$$\lambda^{b, A \cup C}_t = E[\lambda^b_t \mid \mathcal{F}^{A \cup C}_t]$$

is $\mathcal{F}^C_t$-measurable, and we write $A \not\to B \mid C$. Heuristically: the intensity of $b$, when observing $A \cup C$, depends only on events of $C$.

Under faithfulness assumptions, there exist algorithms for learning the causal graph (Meek 2014; Mogensen and Hansen 2018), by removing the edge $a \to b$ if $a \not\to b \mid C$ for some $C$. In practice, this requires an empirical test for local independence!

Page 15

Local independence test

Page 16

Local independence test

We want to test:

$$H_0 \colon j \not\to k \mid C$$

Equivalently, to test whether $\lambda^{k,C}_t$ is a version of $\lambda^{k, C \cup j}_t$. We propose to fit:

$$\lambda^{k, C \cup j}_t = \beta^k_0 + \int_0^t g_{jk}(t - s) \, N^j(ds) + \lambda^{k,C}_t$$

Then the test of

$$H_0 \colon g_{jk} = 0$$

will have the right level if we estimate the true $\lambda^{k,C}$.

Problem: if there are latent variables, the marginalized model may not be a Hawkes process. So how do we estimate $\lambda^C$ generally, to retain level?

[Graph: observed nodes $k$, $c$, $j$ and a latent node $h$.]


Page 18

Volterra approximations

To develop a non-parametric fit for $\lambda^C$, we prove the following theorem, resembling Volterra series for continuous systems.

Theorem. Suppose that $N$ is a stationary point process. There exists a sequence of functions $h^\alpha_N$ such that, letting

$$\lambda^N_t = h^0_N + \sum_{n=1}^N \sum_{|\alpha| = n} \int_{-\infty}^t \cdots \int_{-\infty}^t h^\alpha_N(t - s_1, \dots, t - s_n) \, N^{\alpha_1}(ds_1) \cdots N^{\alpha_n}(ds_n),$$

we have $\lambda^N_t \xrightarrow{P} \lambda^C_t$ for $N \to \infty$.

Page 19

Approximating intensity

$\lambda^C$ approximations:
A1: Approximate by 2nd-order iterated integrals.
A2: Approximate kernels using tensor splines: $h^\alpha(x_1, \dots, x_n) \approx \sum_{j_1=1}^d \cdots \sum_{j_n=1}^d \beta^\alpha_{j_1, \dots, j_n} b_{j_1}(x_1) \cdots b_{j_n}(x_n)$

In vector notation:

$$\lambda^C_t(\beta) = \beta_0 + \sum_{v \in C} \int_{-\infty}^{t-} (\beta^v)^T \Phi_1(t - s) \, N^v(ds) + \sum_{\substack{v_1, v_2 \in C \\ v_2 \geq v_1}} \int_{-\infty}^{t-} \int_{-\infty}^{t-} (\beta^{v_1 v_2})^T \Phi_2(t - s_1, t - s_2) \, N^{(v_1, v_2)}(ds_1, ds_2) =: \beta_C^T x^C_t$$

Similarly for $g_{jk}$, such that

$$\lambda^{k, C \cup j}_t = \beta^k_0 + \int_0^t g_{jk}(t - s) \, N^j(ds) + \lambda^{k,C}_t = \beta_j^T x^j_t + \beta_C^T x^C_t =: \beta^T x_t$$

Page 20

Maximum Likelihood Estimation

The likelihood is concave for linear intensities!

$$\log L_T(\beta) = \int_0^T \log\left(\beta^T x_t\right) N^k(dt) - \beta^T \int_0^T x_t \, dt$$

We penalize with a roughness penalty:

$$\max_\beta \; \log L_T(\beta) - \kappa_0 \beta^T \Omega \beta \quad \text{s.t. } X\beta \geq 0$$

The distribution of the maximum likelihood estimate is approximately normal:

$$\hat\beta \overset{\text{approx}}{\sim} \mathcal{N}\left( (I + 2\kappa_0 J_T^{-1} \Omega)\beta_0, \; J_T^{-1} K_T J_T^{-1} \right)$$

with $K_T = \int_0^T \frac{x_t x_t^T}{\beta^T x_t} \, dt$ and $J_T = K_T - 2\kappa_0 \Omega$.
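On a discretized time grid, the penalized concave likelihood can be maximized by plain gradient ascent. The following toy sketch is mine, not the paper's estimator, and makes simplifying assumptions: a fixed two-column design $x_t$, a ridge-type penalty matrix $\Omega = I$, and binned event counts standing in for $N^k(dt)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretized toy model: intensity lambda_t = beta^T x_t on a time grid.
dt = 0.01
grid = np.arange(0.0, 500.0, dt)
X = np.column_stack([np.ones_like(grid), 0.5 * (1.0 + np.sin(grid))])
beta_true = np.array([0.3, 0.8])
dN = rng.poisson((X @ beta_true) * dt)       # event counts per time bin

Omega, kappa0 = np.eye(2), 0.01              # roughness penalty (ridge stand-in)

def pen_loglik(beta):
    lam = X @ beta
    return dN @ np.log(lam) - (lam * dt).sum() - kappa0 * beta @ Omega @ beta

def grad(beta):
    lam = X @ beta
    return X.T @ (dN / lam) - X.T.sum(axis=1) * dt - 2.0 * kappa0 * Omega @ beta

beta = np.array([1.0, 1.0])                  # start inside the feasible region
for _ in range(2000):
    beta += 1e-4 * grad(beta)                # small steps keep X @ beta > 0
```

Because the objective is concave, small-step gradient ascent increases the penalized log-likelihood monotonically; the slides' constrained formulation with $X\beta \geq 0$ would additionally project onto the feasible set.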

Page 21

Local Independence Test

Given the distribution of $\hat\beta = (\hat\beta_j, \hat\beta_C)$, we can test the hypothesis $H_0 \colon j \not\to k \mid C$. How do we test $\Phi^T \beta \equiv 0$?

• First idea: $\hat\beta$ is approximately normal, so test $\beta_j = 0$ directly.
• Better idea (see Wood 2012): evaluate the basis $\Phi$ in a grid $G = x_1, \dots, x_M$. The fitted function values over the grid are then $\Phi(G)^T \hat\beta$.

If $\hat\beta_j \sim \mathcal{N}(\mu_j, \Sigma_j)$, then the Wald test statistic for the null hypothesis $\Phi(G)^T \mu_j = 0$ is:

$$T_\alpha = \hat\beta_j^T \Phi(G) \left[ \Phi(G)^T \Sigma_j \Phi(G) \right]^{-1} \Phi(G)^T \hat\beta_j \tag{1}$$

This is approximately $\chi^2(M)$-distributed, and we can test for significance of components!
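Equation (1) is a few lines of linear algebra. Here is a sketch with made-up numbers; the coefficients, covariance and basis matrix are placeholders of mine, not fitted values:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

p, M = 5, 3                                  # coefficients and grid points
beta_hat = rng.multivariate_normal(np.zeros(p), 0.05**2 * np.eye(p))
Sigma = 0.05**2 * np.eye(p)                  # covariance of beta_hat
Phi = rng.normal(size=(p, M))                # basis evaluated on the grid G

def wald_statistic(beta_hat, Sigma, Phi):
    """T = (Phi^T b)^T [Phi^T Sigma Phi]^{-1} (Phi^T b); approx. chi2(M) under H0."""
    f = Phi.T @ beta_hat                     # fitted function values on the grid
    return float(f @ np.linalg.solve(Phi.T @ Sigma @ Phi, f))

T_stat = wald_statistic(beta_hat, Sigma, Phi)
p_value = chi2.sf(T_stat, df=M)              # upper tail of chi2(M)
```

Note the sketch takes $M \leq p$ so that $\Phi(G)^T \Sigma_j \Phi(G)$ is invertible; Wood (2012) discusses the rank-deficient case.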

Page 22

Summary of test

We summarize our proposed test. To test $j \not\to k \mid C$:

• Approximate $\lambda^C$ by a Volterra expansion of degree 2 with spline kernels.
• Fit $\lambda^{k,C}(\beta)$ within the model class by penalized MLE.
• Test $\Phi^T \beta \equiv 0$ using grid evaluation and the Wald approximation.
• If the test accepts, conclude local independence.

Page 23

Experimental results

Page 24

Experiment 1: Testing various structures

In each of the following structures, we test $a \not\to b \mid \{b\} \cup C$:

[Diagrams of the structures L1, L2, L3 (chains, some with a latent node $h$) and P1, P2, P3.]

We obtain acceptance rates:

[Bar chart: $H_0$ acceptance rate (0–100%) for each structure L1–L3 and P1–P3 and kernels 1 and 2, split by test outcome (accepted / rejected).]


Page 26

Causal discovery

We evaluate the performance in the CA algorithm, which estimates the causal graph.

[Illustration: from event data on $(a, b, c, d)$ over $[0, T]$, starting from a fully connected graph and testing e.g. $a \not\to b \mid \{b, c, d\}$, edges are removed step by step until the estimated graph remains.]

Page 27

Experiment 2: Causal discovery

We simulate random graphs, simulate a dataset from each graph, recover the graph from the dataset, and measure the Structural Hamming Distance (SHD) to the true graph.

SHD between $\mathcal{G}_1$ and $\mathcal{G}_2$: the minimum number of actions among flipping, adding or removing an edge needed to turn $\mathcal{G}_1$ into $\mathcal{G}_2$.

[Boxplot: SHD to the true graph versus the dimension of the graph (3–6), for the baseline and the LI test.]
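The SHD itself is easy to compute from adjacency matrices. This small helper is mine (not from the paper's code); it counts additions and removals as one action each and a flipped edge as a single action:

```python
import numpy as np

def shd(A, B):
    """Structural Hamming Distance between directed graphs A and B
    (boolean adjacency matrices, A[i, j] == True meaning edge i -> j)."""
    A, B = np.asarray(A, dtype=bool), np.asarray(B, dtype=bool)
    diff = A != B
    # An edge i -> j in A that appears as j -> i in B is one flip, but it
    # shows up twice in `diff`; subtract one action per such pair.
    flips = A & ~A.T & ~B & B.T
    return int(diff.sum() - flips.sum())

chain = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [0, 0, 0]])          # a -> b -> c
flipped = np.array([[0, 1, 0],
                    [0, 0, 0],
                    [0, 1, 0]])        # a -> b, c -> b (edge b -> c flipped)
```

By construction the distance is symmetric, zero exactly for identical graphs, and one for a single flipped edge.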

Page 28

Conclusion

Page 29

Conclusion

• Causal inference is possible in point process models, using conditional independence tests!

• Facing latent components in a Hawkes model, the marginal process may not be Hawkes.

• The Volterra expansions can overcome this model misspecification by fitting a general functional form of intensities.

• We propose a testing framework based on splines, with promising experimental results.

Page 30

References i

Daley, Daryl J and David Vere-Jones (2007). An Introduction to the Theory of Point Processes: Volume II: General Theory and Structure. Springer Science & Business Media.

Didelez, Vanessa (2008). "Graphical models for marked point processes based on local independence". In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70.1, pp. 245–264.

Meek, Christopher (2014). "Toward learning graphical and causal process models". In: Proceedings of the UAI 2014 Conference on Causal Inference: Learning and Prediction, Volume 1274. CEUR-WS.org, pp. 43–48.

Page 31

References ii

Mogensen, Søren Wengel and Niels Richard Hansen (2018). "Markov equivalence of marginalized local independence graphs". In: arXiv preprint arXiv:1802.10163. To appear in Ann. Statist.

Mogensen, Søren Wengel, Daniel Malinsky, and Niels Richard Hansen (2018). "Causal learning for partially observed stochastic dynamical systems". In: 34th Conference on Uncertainty in Artificial Intelligence (UAI 2018). Association for Uncertainty in Artificial Intelligence (AUAI), pp. 350–360.

Peters, Jonas, Dominik Janzing, and Bernhard Schölkopf (2017). Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press.

Wood, Simon N (2012). "On p-values for smooth components of an extended generalized additive model". In: Biometrika 100.1, pp. 221–228.

Page 32

Questions?

Page 33

Volterra: Sketch of proof I

First we show the representation at time 0, i.e. for $\lambda_0$:

1. For any $\lambda_0$, use that $1_{|\lambda_0| < N} \lambda_0 \xrightarrow{P} \lambda_0$ and $1_{|\lambda_0| < N} \lambda_0 \in L^1(\mathcal{F})$.
2. Define $\mathcal{F}_\tau = \sigma(T_1 \wedge \tau, T_2 \wedge \tau, \dots)$, and show that $\cup_{\tau \leq 0} L^1(\mathcal{F}_\tau)$ is dense in $L^1(\mathcal{F})$, where $\mathcal{F} = \sigma(N_t, t < 0)$, via martingale convergence.
3. Through a combinatorial argument, show that for $\lambda_0 \in L^1(\mathcal{F}_\tau)$, $1_{N([\tau,0])=1} \lambda_0$ has an additive decomposition
$$\sum_{n=1}^N \beta_n \int_{[\tau,0]} f(t_1) 1_{D_n} \, dN(t_n) \xrightarrow[N]{\text{a.s.}} \lambda_0 1_{N([\tau,0])=1}$$
4. Extend to $1_{N([\tau,0])=M} \lambda_0$ and sum all terms.

Page 34

Volterra: Sketch of proof II

5. Using time-homogeneity¹, we extend the result to every time $t$.
6. The extension to multivariate point processes is simple, using:
$$\int_{((-\infty,0] \times V)^n} h_n(t_1, v_1, \dots, t_n, v_n) \, N(dt^n \times dv^n) = \sum_{|\alpha|=n} \int_{(-\infty,0]^n} h^\alpha(t_1, \dots, t_n) \, N^\alpha(dt^n)$$

¹ $\lambda(\pi, (N_s)_{s<\pi}) = \lambda(0, (N^\pi_s)_{s<0})$

Page 35

Local independence graphs

Local independence graph: for a point process with coordinates $V = 1, \dots, d$, define the local independence graph $\mathcal{G} = (V, E)$ by

$$E = \{(a, b) \mid a \text{ is not locally independent of } b \text{ given } V \setminus \{a\}\}$$

Example:

[Graph on the nodes $a$, $b$, $c$.]

Page 36

Graphs and µ-separation

[Graphs: $a \to c_1 \to c_2 \to b$, and a walk from $a$ to $b$ with a collider at $c_2$.]

µ-connection and separation: for $\mathcal{G} = (V, E)$, let $a, b \in V$ and $C \subseteq V$. A µ-connecting walk $p$ from $a$ to $b$ given $C$ is a walk from $a$ to $b$ such that:

1. $p$ is non-trivial and its final edge points to $b$.
2. $a \notin C$.
3. $\mathrm{coll}(p) \subseteq \mathrm{An}(C)$.
4. $\mathrm{noncoll}(p) \cap C = \emptyset$.

If no walk from $a$ to $b$ is µ-connecting given $C$, then $a$ and $b$ are µ-separated and we write $a \perp_\mu b \mid C$.

Page 37

Global Markov property

The following concepts relate local independence to a graph $\mathcal{G}$:

Global Markov property: $A \perp_\mu B \mid C$ implies $A \not\to B \mid C$.
Faithfulness: $A \not\to B \mid C$ implies $A \perp_\mu B \mid C$.

The global Markov property makes the local independence graph "relevant" for understanding the underlying point process.

Recovering the graph using independence tests: assuming faithfulness and the global Markov property, (Meek 2014) proposes an algorithm which is guaranteed to return the true local independence graph, essentially by testing $a \not\to b \mid C$ for all $a, b$ and sets $C$ of increasing size.

Page 38

Backup: The CA algorithm

Algorithm 1: Causal Analysis (CA) algorithm
  Initialize $\mathcal{G} = (V, E_{CA})$ as a fully connected graph
  for $v \in V$:
    $n = 0$
    while $n < |\mathrm{pa}(v)|$:
      for $v' \in \mathrm{pa}(v)$:
        for $C \subseteq \mathrm{pa}(v) \setminus \{v'\}$ with $|C| = n$:
          if $v' \not\to v \mid C$: remove $(v', v)$ from $E_{CA}$
      $n = n + 1$
  return $\mathcal{G} = (V, E_{CA})$

In short: for pairs $(v', v)$, remove the edge $v' \to v$ if there exists a set $C$ such that $v' \not\to v \mid C$.
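Algorithm 1 translates almost line by line into code. In the sketch below, which is mine, the empirical local independence test is replaced by a toy oracle that simply reads off the true parent sets (so P1 and P2 hold trivially); in practice the oracle is the empirical test from the main slides:

```python
from itertools import combinations

def ca_algorithm(nodes, is_loc_indep):
    """CA algorithm: start fully connected (including self-edges) and remove
    the edge u -> v whenever u is locally independent of v given some C."""
    parents = {v: set(nodes) for v in nodes}
    for v in nodes:
        n = 0
        while n < len(parents[v]):
            for u in sorted(parents[v]):   # sorted() copies, so removal is safe
                if any(is_loc_indep(u, v, set(C))
                       for C in combinations(sorted(parents[v] - {u}), n)):
                    parents[v].discard(u)
            n += 1
    return {(u, v) for v in nodes for u in parents[v]}

# Toy oracle for the ball-throwing graph a -> b -> c -> a (with self-loops).
true_parents = {"a": {"a", "c"}, "b": {"a", "b"}, "c": {"b", "c"}}
oracle = lambda u, v, C: u not in true_parents[v]
edges = ca_algorithm(["a", "b", "c"], oracle)
```

With this oracle the algorithm recovers exactly the edges of the true local independence graph; with the empirical test, type I and II errors of the test propagate into the recovered graph.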

Page 39

Backup: P1 and P2

Definition:
• $\perp$ satisfies P1 if a separation $v' \perp v \mid C$ with $v' \notin C$ implies $(v', v) \notin E$.
• $\perp$ satisfies P2 if the lack of an edge $(v', v)$ implies the existence of a set $C \subseteq \mathrm{pa}(v)$ such that $v' \perp v \mid C$.

The CA algorithm assumes both P1 and P2. d-separation satisfies P1, and δ- and µ-separation satisfy P1 and P2.

We show that for $\perp$ satisfying P1 and P2, two graphs have the same separations exactly if they are equal.

Page 40

Backup: Example of local independence

Example: 3 children $(a, b, c)$ throw a ball $a \xrightarrow{\exp(1)} b \xrightarrow{\exp(1)} c \xrightarrow{\exp(1)} a \cdots$, where $N^v$ counts the number of times child $v$ has thrown the ball. This has intensities:

$$\lambda^a_t = 1_{N^a_t = N^c_t}, \quad \lambda^b_t = 1_{N^b_t < N^a_t}, \quad \lambda^c_t = 1_{N^c_t < N^b_t}$$

We find $b \not\to a \mid \{a, c\}$, while $a$ is not locally independent of $b$ given $\{b, c\}$, because:

$$\lambda^{a, \{a,b,c\}}_t = E\left[\lambda^a_t \mid \mathcal{F}^{a,b,c}_t\right] = 1_{N^a_t = N^c_t} \in \mathcal{F}^{a \cup c}_t$$
$$\lambda^{b, \{a,b,c\}}_t = E\left[\lambda^b_t \mid \mathcal{F}^{a,b,c}_t\right] = 1_{N^b_t < N^a_t} \notin \mathcal{F}^{b \cup c}_t$$

Also, $a$ is not locally independent of itself given $\{b, c\}$, because

$$\lambda^{a, \{a,b,c\}}_t = 1_{N^a_t = N^c_t} \notin \mathcal{F}^{b \cup c}_t$$
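The example is easy to simulate, and the simulation confirms the counting-process constraints implied by the intensities: since the ball travels $a \to b \to c \to a$ and $a$ starts, at every time $N^a_t \geq N^b_t \geq N^c_t \geq N^a_t - 1$. A small sketch of mine:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000.0
next_child = {"a": "b", "b": "c", "c": "a"}

counts = {"a": 0, "b": 0, "c": 0}
holder, t = "a", rng.exponential(1.0)       # child a starts with the ball
while t <= T:
    counts[holder] += 1                     # holder throws at time t ...
    holder = next_child[holder]             # ... to the next child,
    t += rng.exponential(1.0)               # who holds it an exp(1) time
```

Each child ends up throwing roughly a third of the time, and the count ordering above holds at the final time as at any other.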

Page 41

Backup: Runtime

[Line plot: runtime in seconds (0–40) versus point count (250–1250), for first- and second-order tests in the structures S1, S3 and S4i.]

Figure 1: Runtime of 300 invocations of the local empirical independence test. $a \not\to b \mid \{b\} \cup C$ was tested 100 times in the different structures S1, S3 and S4i.

Page 42

Backup: Tuning κ0

[Boxplots: test p-values versus the roughness-penalty scale $\kappa_0$ (from $10^{-4}$ to $10^4$), one panel per structure S1, S2, S3, S4i, S4ii, S4iii and S4iv.]

Figure 2: Boxplots of p-values from the 7 structures. From each structure, 100 Hawkes processes were simulated and the local empirical independence test was run, each with the roughness penalty at various levels of $\kappa_0$. Each simulation produced a p-value, which is plotted. The red dotted line shows the 5% level. The headers show the ground truth of whether $a \not\to b \mid \{b\} \cup C$. The dark-green line shows the fraction of the simulated p-values falling below the 5% level.

Page 43

Backup: Latent experiment

[Boxplots: SHD (0–8) versus the number of observed nodes (2–5), in panels with 0, 1 and 2 latent variables, for first- and second-order tests.]

Figure 3: Structural Hamming Distances of graphs estimated using the ECA algorithm with a first- and a second-order local empirical independence test (the second being the standard one, used above). The panels 0, 1 and 2 indicate the number $|V \setminus O|$ of latent variables. The lines represent the average SHD within each group.