Presentation for the Burstiness satellite at CCS16 Amsterdam

Page 1

Estimating inter-event time distributions from finite observation periods

Mikko Kivelä Aalto University

M. Kivelä, M. A. Porter, Phys. Rev. E 92, 052813 (2015)

“It is a curious fact that researchers in statistical physics know very little about statistics.” - Anonymous referee

Page 2

Temporal communication networks

● Communication between people: emails, letters, calls, SMS, messages on websites, etc.
● Temporal networks: links/nodes are active at discrete times (activation events)

[Figure: links of a temporal network active at discrete times along a time axis]

Holme & Saramäki, Phys. Rep. 519, 97 (2012)

Page 3

Inter-event times (IETs)

● Times between activations of nodes or edges in temporal networks

[Figure: activation events of a node or a link along a time axis]
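As a minimal sketch of this definition, assuming the data are given as (node, node, time) tuples and that links are undirected, the IETs of each link are the differences between its consecutive, sorted activation times. The function name `link_iets` and the event list below are hypothetical illustrations.

```python
from collections import defaultdict

import numpy as np


def link_iets(events):
    """Return a dict mapping each link to the IETs between its activations.

    `events` is an iterable of (u, v, t) tuples; links are treated as
    undirected here (an assumption of this sketch).
    """
    times = defaultdict(list)
    for u, v, t in events:
        times[frozenset((u, v))].append(t)
    # Sort each link's activation times and take consecutive differences
    return {link: np.diff(np.sort(ts)) for link, ts in times.items()}


# Hypothetical event list: (node, node, time in days)
events = [("a", "b", 0.0), ("b", "a", 0.3), ("a", "c", 1.0),
          ("a", "b", 2.8), ("a", "c", 1.5)]
iets = link_iets(events)
print(iets[frozenset(("a", "b"))], iets[frozenset(("a", "c"))])
```

Node IETs follow the same way by grouping on a single node instead of a node pair.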

Page 4

Inter-event times - why are they important?

1) Validating models of human activity
● Models differ in whether they predict very long IETs:
− Power-law IET distribution: Barabási, Nature 435, 207 (2005)
− Exponential IET distribution

2) Processes acting on top of temporal networks are affected by burstiness/long IETs
● Spreading processes: Karsai et al., Phys. Rev. E 83, 025102 (2011)
● Mixing time of random walks: Delvenne et al., arXiv:1309.4155 [physics.soc-ph] (2013)
● Opinion formation: Takaguchi & Masuda, Phys. Rev. E 84, 036115 (2011)

The tail of the IET distribution is important!

Page 5

Example: burstiness & spreading

M. Kivelä, R. K. Pan, K. Kaski, J. Kertész, J. Saramäki, M. Karsai: Multiscale analysis of spreading in a large communication network, J. Stat. Mech. (2012) P03005
M. Karsai, M. Kivelä, R. K. Pan, K. Kaski, J. Kertész, A.-L. Barabási, J. Saramäki: Small but slow world: How network topology and burstiness slow down spreading, Phys. Rev. E 83, 025102 (2011)

Bursts can dramatically slow down spreading on networks!

Page 6

Example: burstiness & spreading

Long-tailed IET distribution = bursty event sequence = long residual waiting times = slow spreading

[Figure: waiting time from a random time point to the next event]

M. Kivelä, R. K. Pan, K. Kaski, J. Kertész, J. Saramäki, M. Karsai: Multiscale analysis of spreading in a large communication network, J. Stat. Mech. (2012) P03005
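The link between long tails and long residual waiting times can be checked numerically. For a stationary renewal process, the mean waiting time from a random time point to the next event is E[τ²]/(2E[τ]), so at a fixed mean IET a heavier tail means longer waits. The sketch below compares this formula with a direct simulation; the log-normal IETs are only a hypothetical stand-in for a long-tailed distribution, and the function names are illustrative.

```python
import numpy as np


def mean_residual_waiting_time(iets):
    """Renewal-theory mean wait from a random time point: E[tau^2] / (2 E[tau])."""
    iets = np.asarray(iets, dtype=float)
    return np.mean(iets**2) / (2.0 * np.mean(iets))


def simulated_waiting_time(iets, n_points, rng):
    """Average wait from uniformly random time points to the next event."""
    events = np.cumsum(iets)                         # event times
    t = rng.uniform(0.0, events[-1], size=n_points)  # random points in [0, last event)
    nxt = events[np.searchsorted(events, t, side="right")]  # first event after t
    return np.mean(nxt - t)


rng = np.random.default_rng(0)
n = 500_000
sigma = 1.0  # hypothetical log-normal width; both sequences have mean IET 1
sequences = {
    "exponential": rng.exponential(1.0, n),
    "log-normal": rng.lognormal(-sigma**2 / 2, sigma, n),
}
results = {}
for name, x in sequences.items():
    results[name] = (mean_residual_waiting_time(x),
                     simulated_waiting_time(x, 200_000, rng))
    print(f"{name:12s} mean IET {x.mean():.3f}  "
          f"formula {results[name][0]:.3f}  simulated {results[name][1]:.3f}")
```

Both sequences have a mean IET of about 1, but the log-normal one gives a mean residual waiting time of roughly e/2 ≈ 1.36 instead of about 1.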

Page 7

The problem

Page 8

A typical example of a plot of an IET distribution found in the literature

Rybski et al., Sci. Rep. 2, 560 (2012)

Page 9

A typical example of a plot of an IET distribution found in the literature

Rybski et al., Sci. Rep. 2, 560 (2012)

Is this cut-off in the power law really a scale in the interaction patterns, or is it just a “finite size effect”?

Page 10

A typical example of a plot of an IET distribution found in the literature

Rybski et al., Sci. Rep. 2, 560 (2012)

Length of the observation period: T ≈ 500 days

Page 11

Questions

● Is there a finite size effect, or is the dip “real”?
● What is the reason for it?
● What is the functional form of the effect?
  ● Linear? Vazquez et al., Phys. Rev. Lett. 98, 158702 (2007)
  ● Exponential? Wu et al., PNAS 107, 18803 (2010)
● How does it affect estimates of statistics such as the burstiness parameter or the residual waiting time?
● How can the finite size effect be corrected?
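The slides do not define the burstiness parameter. Assuming it is the commonly used coefficient of Goh & Barabási, B = (σ − μ)/(σ + μ), where μ and σ are the mean and standard deviation of the IETs, a minimal sketch is below. Because B depends only on these two moments, any distortion of them caused by a finite observation window carries over directly to B.

```python
import numpy as np


def burstiness(iets):
    """Burstiness parameter B = (sigma - mu) / (sigma + mu) of a set of IETs.

    B = -1 for perfectly regular sequences, B is about 0 for a Poisson process,
    and B approaches 1 for very bursty sequences.
    """
    iets = np.asarray(iets, dtype=float)
    mu, sigma = iets.mean(), iets.std()
    return (sigma - mu) / (sigma + mu)


rng = np.random.default_rng(1)
regular = np.full(1000, 2.0)             # hypothetical periodic sequence
poisson = rng.exponential(1.0, 100_000)  # hypothetical memoryless sequence
print(burstiness(regular), burstiness(poisson))
```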

Page 12

Renewal process model

[Figure: real IET distribution p(τ) and observed IET distribution p'(τ)]

● Stationary renewal process and finite observation window:

Page 13

Renewal process model

[Figure: real IET distribution p(τ) and observed IET distribution p'(τ)]

● Stationary renewal process and finite observation window:

● There is a linear length bias:

Soon & Woodroofe, J. Stat. Plan. Inference 53, 171 (1996)
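The linear length bias can be written out explicitly. This is a sketch of the standard argument, assuming a stationary renewal process with IET density p(τ) observed on the window [0, T]. An IET of length τ is fully observed only if it starts within [0, T − τ], and in a stationary process IET starting points are spread uniformly in time, so the observed density is

```latex
p'(\tau) = \frac{(T-\tau)\,p(\tau)}{\int_0^T (T-u)\,p(u)\,\mathrm{d}u}
\quad \text{for } 0 \le \tau \le T, \qquad p'(\tau) = 0 \text{ for } \tau > T.
```

This is the ~(T − τ)p(τ) shape of the observed distribution: the weight drops linearly to zero at τ = T, and IETs longer than the window are never observed.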

Page 14

Renewal process model

[Figure legend: p(τ), real IETs; p'(τ), observed IETs; ~(T−τ)p(τ)]

[Panels: exponential, p(τ) ~ e^(−τ), T = 5; power law, p(τ) ~ τ^(−2.1), T = 40]
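A small simulation reproduces the exponential case (p(τ) = e^(−τ), T = 5). It is a sketch under stated assumptions: each window is an independent renewal process started at a hypothetical burn-in time before 0 so that it is close to stationary, and only IETs whose two endpoints both fall inside [0, T] are kept. For p(τ) = e^(−τ), the mean of (T − τ)p(τ) on [0, T] has the closed form (T − 2 + e^(−T)(T + 2)) / (T − 1 + e^(−T)), about 0.76 for T = 5, compared with a true mean IET of 1. The function names are illustrative.

```python
import numpy as np


def observed_iets(sample_iet, T, n_windows, burn_in, rng):
    """IETs that lie completely inside the observation window [0, T].

    Each window is a renewal process started at time -burn_in, so that it is
    approximately stationary by time 0 (an assumption of this sketch).
    IETs that cross either window boundary are discarded.
    """
    observed = []
    for _ in range(n_windows):
        t = -burn_in
        last = None  # last event time seen inside [0, T]
        while t <= T:
            t += sample_iet(rng)
            if 0.0 <= t <= T:
                if last is not None:
                    observed.append(t - last)
                last = t
    return np.array(observed)


def biased_mean_exponential(T):
    """Mean of p'(tau) ~ (T - tau) exp(-tau) on [0, T] (closed form)."""
    return (T - 2 + np.exp(-T) * (T + 2)) / (T - 1 + np.exp(-T))


rng = np.random.default_rng(42)
T = 5.0
obs = observed_iets(lambda r: r.exponential(1.0), T, 20_000, 10.0, rng)
print(f"true mean IET 1.0, observed {obs.mean():.3f}, "
      f"predicted from (T - tau) p(tau): {biased_mean_exponential(T):.3f}")
```

The same function can be used with a heavy-tailed `sample_iet`, but then a much longer burn-in is needed before the process is close to stationary.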

Page 15

Strategies used in the literature to cope with finite time windows

● Periodic boundary conditions: Karsai et al., Phys. Rev. E 83, 025102 (2011)
● Rescaling the IET distribution: Holme, Europhys. Lett. 64, 427 (2003)
● Resampling the data with smaller T and inferring a scaling law: Vazquez et al., Phys. Rev. Lett. 98, 158702 (2007)
● Selecting only high-frequency event sequences
  ● E.g., >10^3–10^5 event sequences available, but only a single sequence or 10% of the sequences used: Barabási, Nature 435, 207 (2005); Wu et al., PNAS 107, 18803 (2010)
● Nothing! (sometimes “finite size effects” are mentioned)

Page 16

The solution

Page 17

Censored IETs

[Figure: real IET distribution p(τ) and observed IET distribution p'(τ)]

● Two types of IETs are sampled:
  ● Observed
  ● Censored

● Bias arises when one leaves out the censored ones

● How to deal with censored IETs: Survival analysis
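A minimal sketch of how one event sequence observed on [0, T] splits into the two types. The convention here is an assumption: the gaps between consecutive events are observed IETs, and the gap from the last event to the end of the window is right-censored (the true IET is at least that long). The gap before the first event is not used, since in a stationary process it is a recurrence time rather than an IET drawn from p(τ). The name `split_iets` is hypothetical.

```python
import numpy as np


def split_iets(event_times, T):
    """Split one event sequence observed on [0, T] into observed and censored IETs.

    Assumed convention: consecutive gaps are observed IETs, and the gap from the
    last event to T is a right-censored IET. The gap before the first event is
    not used.
    """
    t = np.sort(np.asarray(event_times, dtype=float))
    if t.size == 0:
        return np.array([]), np.array([])
    observed = np.diff(t)
    censored = np.array([T - t[-1]])
    return observed, censored


obs, cens = split_iets([0.5, 1.5, 4.0], T=5.0)
print("observed:", obs, "censored:", cens)
```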

Page 18

Using the Kaplan-Meier estimator to estimate the IET distribution

● Take the observed and censored IETs. Censored IET: we only know that τ3 is longer than τfc

d_i = number of observed IETs of length t_i

n_i = number of IETs known to be at least of length t_i (including censored ones)

Kaplan & Meier, JASA 53, 457 (1958); Denby & Vardi, Technometrics 27, 361 (1985)

● Estimate the (complementary) cumulative IET distribution: P(τ > t) ≈ ∏_{t_i ≤ t} (1 − d_i/n_i)

● The result is a non-parametric maximum likelihood estimator of the IET distribution that is not distorted by the length bias
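A minimal NumPy sketch of the Kaplan-Meier product ∏(1 − d_i/n_i), where d_i counts observed IETs of length t_i and n_i counts IETs (observed or censored) known to be at least t_i. The names `kaplan_meier` and `survival_at` are hypothetical and are not the API of the bolozna/iet package. The check uses Poisson sequences (true P(τ > t) = e^(−t)) in windows of length T = 5, with the gap from the last event to the window end treated as censored; that censoring convention is an assumption of this sketch.

```python
import numpy as np


def kaplan_meier(observed, censored):
    """Kaplan-Meier estimate of the survival function P(IET > t).

    Returns the distinct observed IET lengths t_i and the estimated
    survival probability just after each t_i.
    """
    observed = np.asarray(observed, dtype=float)
    everything = np.sort(np.concatenate([observed, np.asarray(censored, dtype=float)]))
    t_i, d_i = np.unique(observed, return_counts=True)  # d_i: observed IETs equal to t_i
    n_i = everything.size - np.searchsorted(everything, t_i, side="left")  # at least t_i
    return t_i, np.cumprod(1.0 - d_i / n_i)


def survival_at(t_i, surv, t):
    """Evaluate the right-continuous Kaplan-Meier step function at time t."""
    k = np.searchsorted(t_i, t, side="right") - 1
    return 1.0 if k < 0 else surv[k]


# Poisson sequences with rate 1 observed in windows of length T
rng = np.random.default_rng(7)
T, obs, cens = 5.0, [], []
for _ in range(20_000):
    events = np.sort(rng.uniform(0.0, T, size=rng.poisson(T)))
    if events.size:
        obs.append(np.diff(events))
        cens.append([T - events[-1]])  # gap to the window end is censored
obs, cens = np.concatenate(obs), np.concatenate(cens)
t_i, surv = kaplan_meier(obs, cens)
print("true P(IET > 1):", np.exp(-1.0))
print("naive (observed IETs only):", np.mean(obs > 1.0))
print("Kaplan-Meier:", survival_at(t_i, surv, 1.0))
```

The naive estimate lands near 0.28, while the Kaplan-Meier estimate recovers a value close to e^(−1) ≈ 0.37.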

Page 19

Renewal process model

[Panels: exponential, p(τ) ~ e^(−τ), T = 5; power law, p(τ) ~ τ^(−2.1), T = 40]

[Figure legend: p(τ), real IETs; p'(τ), observed IETs; ~(T−τ)p(τ)]

Page 20

Renewal process model

[Figure legend: p(τ), real IETs; p'(τ), observed IETs; ~(T−τ)p(τ); pkm(τ), Kaplan-Meier estimate]

[Panels: exponential, p(τ) ~ e^(−τ), T = 5; power law, p(τ) ~ τ^(−2.1), T = 40]

Page 21

How important is the correction?

● Email communication: Eckmann et al., PNAS 101, 14333 (2004)
● Messages in POK: Rybski et al., Sci. Rep. 2, 560 (2012)
● Short messages (SMS): Wu et al., PNAS 107, 18803 (2010)

Times in days; "Observed" uses only the observed IETs, "KM" is the Kaplan-Meier estimate.

Data set   Average IET        sqrt of 2nd moment   Avg residual waiting time
           Observed   KM      Observed   KM        Observed   KM
Email      0.908      1.51    3.20       6.88      5.62       15.6
POK        5.13       28.4    23.1       106       51.9       198
SMS        0.633      1.40    2.11       4.89      3.53       8.53

[Figure legend: Kaplan-Meier estimate; observed IETs]
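The last two columns of the table are consistent, to within rounding, with the renewal-process formula for the mean residual waiting time, E[τ²]/(2E[τ]), using the square of the "sqrt of 2nd moment" column as E[τ²]. The slides do not state how the column was computed, so the check below only shows this agreement.

```python
# Values copied from the table above (times in days):
# (average IET, sqrt of 2nd moment, reported average residual waiting time)
table = {
    "Email, observed": (0.908, 3.20, 5.62),
    "Email, KM":       (1.51, 6.88, 15.6),
    "POK, observed":   (5.13, 23.1, 51.9),
    "POK, KM":         (28.4, 106.0, 198.0),
    "SMS, observed":   (0.633, 2.11, 3.53),
    "SMS, KM":         (1.40, 4.89, 8.53),
}

predicted = {}
for name, (mean, rms, reported) in table.items():
    # E[tau^2] / (2 E[tau]), with E[tau^2] recovered as rms**2
    predicted[name] = rms**2 / (2.0 * mean)
    print(f"{name:16s} predicted {predicted[name]:7.2f}   table {reported}")
```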

Page 22

When to worry about length bias?

● The bias depends on how far you are from the end of the time window, i.e., on how close τ is to T
● E.g., if the largest IET of interest is 100 times smaller than the time window T, the error is at most 1%
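Since the observed distribution goes as (T − τ)p(τ), an IET of length τ is under-represented, relative to very short IETs, by the factor 1 − τ/T; at τ = T/100 that is a 1% distortion. A minimal sketch of this arithmetic (the function name is hypothetical):

```python
def length_bias_factor(tau, T):
    """Weight (T - tau) / T that a window of length T gives to an IET of length tau,
    relative to a very short IET (from p'(tau) ~ (T - tau) p(tau))."""
    if not 0.0 <= tau <= T:
        raise ValueError("tau must lie in [0, T]")
    return 1.0 - tau / T


T = 100.0  # window length (arbitrary units)
for tau in (1.0, 10.0, 50.0):
    err = 1.0 - length_bias_factor(tau, T)
    print(f"tau = {tau:5.1f} (T/{T / tau:g}): relative distortion {err:.0%}")
```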

Page 23

A rule of thumb

[Figure: IET distribution from Duarte et al., ICWSM 2007, with regions marked "Affected" and "Not affected"; T = 2.5 × 10^6 s]

Page 24

Separating time sequences with different activity levels

● There is heterogeneity in the frequencies of different sequences (e.g., some people send huge numbers of emails, others only a few)
● IET distributions can have the same shape but different frequencies: data collapse

[Figure panels: observed IET distributions; Kaplan-Meier estimates]
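A minimal sketch of a data collapse, assuming each sequence is rescaled by its own mean IET so that distributions with the same shape but different frequencies fall on a common axis τ/⟨τ⟩. The two users below are hypothetical. With a finite observation window, the mean used for rescaling would itself need to come from a bias-corrected estimate such as Kaplan-Meier rather than from raw observed IETs as here.

```python
import numpy as np


def rescale_by_mean(iets):
    """Rescale IETs by their mean so that sequences with different activity
    levels can be compared on a common axis tau / <tau>."""
    iets = np.asarray(iets, dtype=float)
    return iets / iets.mean()


rng = np.random.default_rng(3)
# Two hypothetical users whose IETs have the same (exponential) shape,
# but whose activity levels differ by a factor of 10
users = {"low activity": rng.exponential(10.0, 20_000),
         "high activity": rng.exponential(1.0, 20_000)}
medians = {}
for name, x in users.items():
    medians[name] = (np.median(x), np.median(rescale_by_mean(x)))
    print(f"{name:14s} raw median {medians[name][0]:.2f}  "
          f"rescaled median {medians[name][1]:.3f}")
```

After rescaling, both medians sit near ln 2 ≈ 0.69, as expected for an exponential distribution with unit mean.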

Page 25

Summary

● IETs are subject to a linear length bias if the sampling window is finite
● The bias is considerable for several popular and freely available communication data sets
● The Kaplan-Meier estimator is an easy way to estimate the real IET distribution (other estimators exist)
● You can even use low-frequency event sequences
● Try it: http://github.com/bolozna/iet

Page 26

Page 27

Model with heterogeneous activity levels

Activities t0 drawn from a power-law distribution p(t0)

Poisson processes with rates t0
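A sketch of this kind of model, under assumptions not given on the slide: the exponent `alpha`, the minimum rate `rate_min`, and the window length `T` are hypothetical choices, and each sequence is a homogeneous Poisson process on [0, T] whose rate is its activity. NumPy's `Generator.pareto(a)` samples a Lomax distribution, so `rate_min * (1 + X)` is a classical Pareto with density ~ r^(−(a+1)); choosing a = alpha − 1 gives p(r) ~ r^(−alpha).

```python
import numpy as np


def heterogeneous_poisson_sequences(n_seq, T, alpha, rate_min, rng):
    """Hypothetical sketch: one Poisson process per sequence on [0, T], with
    rate drawn from a power law p(r) ~ r**(-alpha) for r >= rate_min."""
    # Lomax sample X -> classical Pareto rate_min * (1 + X) with exponent a + 1
    rates = rate_min * (1.0 + rng.pareto(alpha - 1.0, size=n_seq))
    sequences = []
    for r in rates:
        n_events = rng.poisson(r * T)  # Poisson number of events in the window
        sequences.append(np.sort(rng.uniform(0.0, T, size=n_events)))
    return rates, sequences


rng = np.random.default_rng(11)
rates, seqs = heterogeneous_poisson_sequences(
    n_seq=1000, T=100.0, alpha=2.5, rate_min=0.01, rng=rng)
counts = np.array([s.size for s in seqs])
print("events per sequence: min", counts.min(),
      "median", np.median(counts), "max", counts.max())
```

Most sequences in such a sample are low-frequency, which is the regime where dropping censored IETs matters most.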

Page 28

Estimators for single sequences

Poisson process with n expected events