
Estimation of Space–Time Branching Process Models in Seismology Using an EM-Type Algorithm

Alejandro VEEN and Frederic P. SCHOENBERG

Maximum likelihood estimation of branching point process models via numerical optimization procedures can be unstable and computationally intensive. We explore an alternative estimation method based on the expectation-maximization algorithm. The method involves viewing the estimation of such branching processes as analogous to incomplete data problems. Using an application from seismology, we show how the epidemic-type aftershock sequence (ETAS) model can, in fact, be estimated this way, and we propose a computationally efficient procedure to maximize the expected complete data log-likelihood function. Using a space–time ETAS model, we demonstrate that this method is extremely robust and accurate and use it to estimate declustered background seismicity rates of geologically distinct regions in Southern California. All regions show similar declustered background intensity estimates except for the one covering the southern section of the San Andreas fault system to the east of San Diego, in which a substantially higher intensity is observed.

KEY WORDS: Branching process models; Earthquakes; Epidemic-type aftershock sequence model; Expectation-maximization algorithm; Maximum likelihood; Space–time point process models.

1. INTRODUCTION

Point process models have long been used to describe earthquake occurrences (Vere-Jones 1970, 1975). See Ogata (1999) for a nice review. Some of the early applications fitted Neyman–Scott-type models, in which main shocks are viewed as cluster centers, each of which may trigger a random number of aftershocks with magnitudes not larger than the main shock (Vere-Jones 1970; Hawkes and Adamopoulos 1973). More recent work has favored the use of branching process models, in which all earthquakes can trigger aftershocks, and among these the epidemic-type aftershock sequence (ETAS) model is considered to be one of the standard models in seismology (Ogata 1988, 1998).

ETAS and other branching process models are commonly estimated using maximum likelihood (ML). However, closed-form solutions are usually not available, and numerical maximization algorithms must be employed. In such situations, computational difficulties can arise, especially if the models are complex, multidimensional, and nonlinear, as this often leads to multimodal or extremely flat log-likelihood functions.

The view of branching process models as incomplete data problems suggests the use of the expectation-maximization (EM) algorithm in order to attain maximum likelihood estimates (MLEs) (Dempster, Laird, and Rubin 1977). In this context, the information about which event "triggers" each other event is unobservable but can be described probabilistically. The EM algorithm involves maximizing the expected complete data log-likelihood, which, in the context of branching point process models, is based on the probabilistic incorporation of the branching structure and is usually easier to maximize. Using an analogous expression, conventional ML maximizes the incomplete data log-likelihood.

In this article, we show how an EM-type algorithm can be combined with a partial information approach in certain steps.

Alejandro Veen is Research Staff Member, IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 (E-mail: aveen@us.ibm.com). Frederic P. Schoenberg is Professor, Department of Statistics, University of California, Los Angeles, CA 90095-1554 (E-mail: frederic@stat.ucla.edu). This material is based on work supported by the National Science Foundation under grant 0306526. We thank Yan Kagan, Yingnian Wu, Ilya Zaliapin, and anonymous referees for helpful comments and the Southern California Earthquake Center for its generosity in sharing its data. All computations were performed in R (see R Development Core Team 2007).

This relates to partial likelihood maximization, which was introduced by Cox (1975) and which was briefly discussed by Ogata and Akaike (1982) in the context of branching processes in seismology. By coupling two well-established estimation methods (EM and a partial information approach), we are able to present a highly robust and accurate estimation procedure that can be used to estimate even very complex branching process models. To demonstrate the properties of our proposed method, we use a space–time ETAS model to simulate earthquake catalogs and then compare the results of the EM-type estimation algorithm to the conventional ML procedure using a numerical optimization approach.

Following a description in Section 2 of self-exciting point process models for earthquake occurrences, we describe some of the problems with conventional ML estimation of such models in Section 3. Section 4 describes a proposed alternative estimation method based on the EM algorithm and shows its robustness and accuracy using simulations. The EM-type algorithm is then used in Section 5 to estimate background seismicity rates for Southern California, and the article concludes with a discussion in Section 6.

2. SELF-EXCITING POINT PROCESSES AND THE ETAS MODEL

Consider a simple, temporal point process $N$ on $[0, \infty)$ adapted to a filtration $H_t$. Assuming it exists, the conditional intensity $\lambda(t \mid H_t)$ is defined as the unique, nondecreasing, $H$-predictable process such that $N([0, t)) - \int \lambda(t \mid H_t)\,dt$ is an $H$-martingale. In this representation, $H$ must contain the history of the process up to time $t$, denoted as $H_t = \{t_i : t_i < t\}$, with $t_i$ as the time event $i$ occurs, but $H$ may contain additional information as well. Because the finite-dimensional distributions of such a point process are uniquely determined by its conditional intensity (Daley and Vere-Jones 2003), one way to model a point process is via its conditional intensity.

In self-exciting point processes, the conditional intensity is given by $\lambda(t \mid H_t) = \mu + \sum_{i:\, t_i < t} g(t - t_i)$, where $\mu > 0$, $g(v) \ge 0$ for nonnegative $v$ and equals 0 otherwise, and $\int_0^\infty g(v)\,dv < 1$ in order to ensure stationarity (Hawkes 1971a,b). Early applications of self-exciting point processes to earthquake occurrence models can be found in Hawkes and Adamopoulos (1973), Lomnitz (1974, chap. 7), and Kagan and Knopoff (1987).

© 2008 American Statistical Association. Journal of the American Statistical Association, June 2008, Vol. 103, No. 482, Applications and Case Studies. DOI 10.1198/016214508000000148

A particularly important example of a self-exciting point process is the ETAS model, which was first introduced by Ogata (1988) and is widely used to describe earthquake occurrences. Early forms of the ETAS model (Ogata 1988) only took magnitudes and earthquake occurrence times into account:

$$\lambda(t \mid H_t) = \mu + \sum_{i:\, t_i < t} g(t - t_i, m_i),$$

where the history of the process $H_t = \{(t_i, m_i) : t_i < t\}$ also includes earthquake magnitudes $m_i$, $\mu$ is the (in this case constant) background intensity of earthquake occurrences, and $g(\cdot)$ is the so-called "triggering function," because it describes how earthquakes trigger aftershocks. One possible triggering function suggested in Ogata (1988) is

$$g(\tau_i, m_i) = \frac{K_0}{(\tau_i + c)^{(1+\omega)}}\, e^{a(m_i - M_0)},$$

where $\tau_i = t - t_i$ is the time elapsed since earthquake $i$; $K_0 > 0$ is a normalizing constant governing the expected number of direct aftershocks triggered by earthquake $i$; the parameters $c, a, \omega > 0$; and $M_0$ is the "cutoff magnitude," that is, the lowest earthquake magnitude in the dataset under consideration. The term $K_0/(\tau_i + c)^{(1+\omega)}$, describing the temporal distribution of aftershocks, is known as the modified Omori–Utsu law. While the literature in seismology usually lets $\omega > -1$, the interpretation of the modified Omori–Utsu law as a probability density function requires strictly positive values for $\omega$.
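The requirement $\omega > 0$ can be made explicit by writing out the normalizing integral of the temporal term. The following is a short worked step using only the symbols defined above; the name $f(\tau)$ for the normalized density is introduced here for illustration.

```latex
\int_0^\infty (\tau + c)^{-(1+\omega)}\,d\tau
  = \left[-\frac{(\tau + c)^{-\omega}}{\omega}\right]_0^\infty
  = \frac{c^{-\omega}}{\omega}
  \quad (\omega > 0,\ c > 0),
\qquad\text{so that}\qquad
f(\tau) = \frac{\omega\, c^{\omega}}{(\tau + c)^{1+\omega}}, \quad \tau \ge 0 .
```

For $-1 < \omega \le 0$ the integral diverges, so the temporal term cannot be normalized to a density in that range.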

The ETAS model has since been extended to describe the space–time–magnitude distribution of earthquake occurrences (Ogata 1993b, 1998). A version suggested in Ogata (1998) uses circular aftershock regions, where the squared distance between an aftershock and its triggering event follows a Pareto distribution:

$$\lambda(t, x, y \mid H_t) = \mu(x, y) + \sum_{i:\, t_i < t} g(t - t_i, x - x_i, y - y_i, m_i), \tag{1}$$

with triggering function

$$g(t - t_i, x - x_i, y - y_i, m_i) = \frac{K_0\, e^{a(m_i - M_0)}}{(t - t_i + c)^{(1+\omega)} \left((x - x_i)^2 + (y - y_i)^2 + d\right)^{(1+\rho)}}, \tag{2}$$

where $(x_i, y_i)$ represents the epicenter of earthquake $i$, $d > 0$ and $\rho > 0$ are parameters describing the spatial distribution of triggered seismicity, and the history of the process up to time $t$ is now defined as $H_t = \{(t_i, x_i, y_i, m_i) : t_i < t\}$. One characteristic of this model is that the aftershock zone does not scale with the magnitude of the triggering event. It has been suggested that aftershock zones increase with main shock magnitudes, and some recent research seems to support this view (Kagan 2002b). Examples of how this can be incorporated into the ETAS model can be found in Kagan and Knopoff (1987) and Ogata (1998).
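As an illustration of how (1) and (2) fit together, the following minimal Python sketch evaluates the conditional intensity at a point. It assumes a constant background rate `mu`; the function names `trigger` and `intensity` and the tuple layout of `history` are hypothetical choices made here, not part of the original work.

```python
import math

def trigger(dt, dx, dy, m, K0, a, c, omega, d, rho, M0):
    """Triggering function g of equation (2) for time lag dt and offsets (dx, dy)."""
    if dt <= 0:
        return 0.0  # guard only: the sum in (1) runs over earlier events (t_i < t)
    productivity = K0 * math.exp(a * (m - M0))
    temporal = (dt + c) ** (1.0 + omega)
    spatial = (dx * dx + dy * dy + d) ** (1.0 + rho)
    return productivity / (temporal * spatial)

def intensity(t, x, y, history, mu, **params):
    """Conditional intensity (1), assuming a constant background rate mu.
    history is an iterable of (t_i, x_i, y_i, m_i) tuples."""
    return mu + sum(
        trigger(t - ti, x - xi, y - yi, mi, **params)
        for ti, xi, yi, mi in history
        if ti < t
    )
```

With an empty history the intensity reduces to the background rate, and each earlier event adds one evaluation of (2).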

In summary, the ETAS model can be described as a branching process with immigration (spontaneous background earthquakes). The aftershock activity is modeled through a triggering function consisting of two terms, one of which models the expected number of aftershocks for earthquake $i$, while the other models the temporal or space–time distribution of the triggered aftershocks.

3. MAXIMUM LIKELIHOOD ESTIMATION OF THE ETAS MODEL

Conventional ML estimation attempts to maximize

$$\ell(\theta) = \sum_i \log\left(\lambda(t_i, x_i, y_i \mid H_{t_i})\right) - \int_0^T \int_{y_0}^{y_1} \int_{x_0}^{x_1} \lambda(t, x, y \mid H_t)\,dx\,dy\,dt, \tag{3}$$

the incomplete data log-likelihood function of model (1), where $\theta = (\mu, K_0, a, c, \omega, d, \rho)$ is the parameter vector and $[x_0, x_1] \times [y_0, y_1] \times [0, T]$ is the space–time window in which the dataset $(x_i, y_i, t_i, m_i)$ is observed (Ogata 1998; Daley and Vere-Jones 2003, chap. 7). The term "incomplete data" signifies that $\ell(\theta)$ does not incorporate any information about the branching structure, and it is used to distinguish it from the (in practice unattainable) complete data log-likelihood function introduced in Section 4. Typically, (3) is maximized by using a numerical optimization routine, because no closed-form solution is generally available. Unfortunately, in cases where the log-likelihood function is extremely flat in the vicinity of its maximum, such optimization routines can have convergence problems and can be substantially influenced by arbitrary choices of starting values. To distinguish the conventional MLE computed by numerical maximization from the one based on the EM-type algorithm presented later, we will denote the former by $\hat\theta_{\text{num}}$ and the latter by $\hat\theta_{\text{EM}}$. Similarly, the vector components will be denoted by $\hat\omega_{\text{num}}$, $\hat\omega_{\text{EM}}$, and so on.

An illustration may be helpful to demonstrate some of the difficulties encountered when directly maximizing (3) using numerical methods. Figure 1 shows a simulated earthquake catalog of 638 events using model (1). The space–time window used for this simulation is similar to the Southern California dataset described in Section 5. Background earthquakes are simulated on an area of 8° of longitude by 5° of latitude over a period of 7,500 days (approximately 20 years). Parameter values, as shown in Table 1, are chosen to approximate those used in descriptions of earthquake catalogs based on Ogata (1998) as well as discussions with UCLA seismologists Yan Y. Kagan and Ilya Zaliapin. For simplicity, a truncated exponential distribution is used to model earthquake magnitudes in accordance with the Gutenberg–Richter law (Gutenberg and Richter 1944):

$$f_{\text{GR}}(m) = \frac{\beta e^{-\beta(m - M_0)}}{1 - e^{-\beta(M^{\max}_{\text{GR}} - M_0)}}, \tag{4}$$

where $f_{\text{GR}}(m)$ is the probability density function, $\beta = \log(10)$, and $M_0 = 2 \le m \le M^{\max}_{\text{GR}} = 8$, where the lower threshold is the approximate current threshold (since 2001) above which catalogs of the Southern California Seismic Network (SCSN) are believed to be complete (Kagan 2002a, 2003), and the upper threshold is the approximate magnitude of the strongest California earthquakes in historic times, the 1857 Fort Tejon earthquake and the "great" San Francisco earthquake of 1906.
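Magnitudes following (4) can be drawn by inverting the cumulative distribution function. The sketch below is one way to do this in Python and is not taken from the original computations (which were performed in R); the function name `sample_magnitude` and the fixed seed are assumptions made for illustration.

```python
import math
import random

def sample_magnitude(u, beta=math.log(10), m0=2.0, m_max=8.0):
    """Map a uniform draw u in [0, 1] to a magnitude by inverting the CDF of (4)."""
    norm = 1.0 - math.exp(-beta * (m_max - m0))  # denominator of (4)
    return m0 - math.log(1.0 - u * norm) / beta

rng = random.Random(0)  # fixed seed so the sketch is reproducible
magnitudes = [sample_magnitude(rng.random()) for _ in range(1000)]
```

Because $\beta = \log(10)$, each additional magnitude unit is roughly ten times less likely, so most simulated events lie close to the lower threshold $M_0$.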


Figure 1. Simulated earthquake process using a space–time–magnitude ETAS model. This figure shows a simulated earthquake catalog using model (1) with a parameterization as described in Table 1. The simulated catalog consists of 638 earthquakes (241 background events and 397 aftershocks). The spatial distribution is presented in (a), although 32 (triggered) earthquakes are not shown because they are outside the specified space–time window. The temporal distribution is shown in (b). The spike in activity starting on day 5,527 is caused by a magnitude 5.33 earthquake and its aftershocks and corresponds to the cluster on the lower right side of (a).

The use of numerical methods to maximize (3) can be problematic in cases where the log-likelihood is extremely flat, unless some supervision is imposed and intelligent starting values are used. Figure 2, for instance, shows the incomplete data log-likelihood for variations of each component of the parameter vector by up to 50% around the MLE. The function is quite flat around $\hat\theta_{\text{num}}$, especially with regard to the parameters $\mu$, $K_0$, $c$, and $d$; as a result, these parameters are difficult to estimate, and they generally are associated with rather large standard errors as well as numerical challenges during the estimation procedure. The parameters $a$, $\omega$, and $\rho$, on the other hand, show much more peaked log-likelihood functions and can, hence, be estimated more stably.

The issue of log-likelihood flatness can be aggravated in a multidimensional context. In Figure 3, two parameters are varied, while the others remain constant at their MLEs. Again, (3) can stay extremely flat along certain trajectories, even for large deviations from the MLEs. The parameter $c$, for instance, can be increased to more than four times its MLE and $\omega$ increased to double its MLE, yet the log-likelihood function is reduced only very slightly. The problem of log-likelihood flatness becomes increasingly severe as more and more parameters are varied at once. In more realistic settings, where ETAS models are estimated for actual earthquake catalogs, none of the parameters would be known in advance.

Table 1. Specification of the space–time–magnitude ETAS model (1) used for simulation

    Parameter      Value
    $\mu(x, y)$    0.0008
    $K_0$          0.0000305
    $a$            2.3026
    $c$            0.01
    $\omega$       0.5
    $d$            0.015
    $\rho$         0.8

    Space–time window for background events: [0, 8] × [0, 5] × [0, 7500 days]
    Parameters of the magnitude distribution: $M_0 = 2$, $M^{\max}_{\text{GR}} = 8$

NOTE: This specification uses a homogeneous background intensity (measured in events per day per squared degree) that does not depend on the location. The time is measured in days; spatial distances are measured in degrees. A truncated exponential distribution (4) is used to simulate magnitudes.

Note that flatness in the log-likelihood function does not necessarily imply that accurate estimation of the parameters in the ETAS model is unimportant. Some of the ETAS parameters have physical interpretations that can be used by seismologists to characterize earthquake catalogs (see Ogata 1998 and the references therein). Further, the accurate estimation of ETAS parameters is important for comparisons of certain parameters for different catalogs as well as for studies of bias in ETAS parameter estimation.

In cases where the log-likelihood function is extremely flat, the choice of starting values can influence the results. In Figure 4, conventional ML estimation is performed using eight different starting values and a standard Newton–Raphson optimization routine. The true values of Table 1 are used as starting values for all parameters except $K_0$ and $a$, which are varied as indicated in the figure. In two cases, the estimation results are quite close to the true $\theta$. In four cases, the algorithm finds reasonable estimates for most parameters but fails to find an acceptable estimate for $K_0$. In fact, the algorithm does not change the starting value for $K_0$ at all, even though it is roughly 33% off its true value. In two of the cases, the algorithm fails to converge.

As shown in Figure 4, even if the starting values are close to the MLE, any departure from an optimal choice of tolerance levels and stopping criteria for the numerical maximization procedure can lead to poor convergence results. In this example, the Newton-type algorithm used to find the MLEs actually yields better results if the starting values for $K_0$ are not too close to the true values, as the computation of gradients may actually be more accurate in locations farther away from the true parameter value.

Figure 2. Flatness of the incomplete data log-likelihood function for varying one parameter at a time. This figure demonstrates the relative flatness of the incomplete data log-likelihood function (3) with respect to the components of the parameter estimate $\hat\theta_{\text{num}}$ (two graphs, (a) and (b), are shown to improve the legibility). The log-likelihood stays very flat when $\mu$, $K_0$, $c$, and $d$ are varied around their MLEs. This indicates potentially high standard errors and/or numerical challenges for the estimation of these parameters. The parameters $a$, $\omega$, and $\rho$ have much more peaked log-likelihood functions and can, hence, be estimated more easily.

Another problem often encountered in practice is that the log-likelihood (3) can be multimodal (see, e.g., Ogata and Akaike 1982). Whenever the numerical optimization routine converges to a solution, it is quite difficult to determine whether it has converged to a local maximum or to the global maximum. Even if the log-likelihood is unimodal, in cases where the log-likelihood is flat, there can be numerical multimodality due to rounding errors, the way these errors affect intermediate and final results, and the way values are stored in memory. In cases where the log-likelihood surface is extremely flat, such as those shown in Figures 3 and 4, such numerical problems can explain why $\hat\theta_{\text{num}}$ can be very far from the true parameter value. This makes it difficult to perform simulation studies of bias and asymptotic properties, for which an automatic procedure would be desirable.

While the main focus of this article is to present how branching process models can be estimated using an EM-type algorithm, a few remarks will be added here on how to improve conventional ML estimation via a numerical maximization of the incomplete data log-likelihood. One approach is to maximize (3) with respect to $\log(c)$ and $\log(d)$ in place of $c$ and $d$. This procedure is computationally more stable, because $c$ and $d$ can be quite ill constrained. Also, the simplex algorithm, as well as genetic algorithms, have proved to be useful and efficient alternatives to the Newton–Raphson procedure used in our study. Finally, Ogata (1993a) presented a "fast" algorithm applicable to Markovian conditional intensities that replaces the first term of (3) [which is effectively a double sum, because $\lambda(t_i, x_i, y_i \mid H_{t_i})$ is itself a sum; see (1)] with a single sum. This procedure replaces the computation of the inner sum with a numerical integration procedure.

Figure 3. Flatness of the incomplete data log-likelihood in multidimensional settings. The problem of flatness of the incomplete data log-likelihood function (3) (shown in gray levels) can be aggravated in a multidimensional context. In this analysis, pairs of components of $\theta$ [$c$, $\omega$ in (a) and $a$, $\rho$ in (b)] are varied around the MLE $\hat\theta_{\text{num}}$ (small white circle), while all the other components remain fixed. Along certain trajectories, even large deviations reduce the log-likelihood only marginally.

Figure 4. Difficulties in estimating ETAS parameters using conventional ML. Except for $K_0$ and $a$, the starting values (black dots) for the components of $\theta$ are set to the true values (Table 1). The white circles show the estimation results of conventional ML ($\hat\theta_{\text{num}}$), and the + symbol depicts the location of the true $K_0$ and $a$. The numerical maximization routine converges to an estimate close to $\theta$ in only two cases. It fails to converge in two cases and seems incapable of improving the $K_0$ estimate for the four starting configurations in which $K_0$ is modified by 33%. This could be due to the relative flatness of the incomplete data log-likelihood function (shown in gray levels) with respect to variations of $K_0$.

4. ETAS ESTIMATION USING AN EM-TYPE ALGORITHM

In their seminal article, Dempster et al. (1977) established the EM algorithm as the estimation method of choice for incomplete data problems. It has been extended in various ways and adapted to a wide range of applications. A good overview of this algorithm and its extensions is provided by McLachlan and Krishnan (1996).

The estimation of the ETAS model can be viewed as an incomplete data problem in which the unobservable quantity $u_i$ identifies whether an earthquake is a background event ($u_i = 0$) or whether it was triggered by a preceding event, denoted as $u_i = j$ for the case that earthquake $i$ was triggered by earthquake $j$. The branching structure of the ETAS model has previously been used as a computationally efficient simulation procedure for earthquake catalogs (Felzer, Becker, Abercrombie, Ekström, and Rice 2002; Zhuang, Ogata, and Vere-Jones 2004). It has also facilitated a probabilistic declustering method using a "stochastic reconstruction" of earthquake catalogs (Zhuang et al. 2002, 2004). For an early version of probabilistic declustering, see Kagan and Knopoff (1976). In the works of Zhuang et al., probabilities of earthquakes being background events or otherwise triggered by preceding events are used to improve spatial background intensity estimates. Here, the branching structure of the model will help estimate all parameters of the ETAS model, not just the background intensity.

In the following, consider an ETAS model with an inhomogeneous background rate $\mu(x, y)$. While the EM methodology allows for quite general forms of $\mu(x, y)$, here we model an inhomogeneous background rate by subdividing the (in this case rectangular) spatial observation window $[x_0, x_1] \times [y_0, y_1]$ into $\kappa$ cells, each with constant intensity $\mu_k$, $k \in \{1, \ldots, \kappa\}$. The background intensities for the $\kappa$ cells can then be collected in the vector $\mu = (\mu_1, \ldots, \mu_\kappa)$. To simplify some of the upcoming expressions, it is helpful to define the expected number of background earthquakes in cell $k$, denoted as $\nu_k$:

$$\nu_k = \mu_k \times (\text{area of cell } k) \times (\text{length of time window}), \tag{5}$$

where the length of the time window is $T$ [see (3)]. Depending on the situation, either $\mu_k$ or $\nu_k$ will be used in the formulas, because one is simply a fixed multiple of the other. The actual number of background events in cell $k$ will be denoted as $n_k$ and is modeled as a Poisson random variable with expectation $\nu_k$.
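As a quick numerical check of (5), consider the simulation setup of Section 3 treated as a single cell. This worked step assumes that the Table 1 background rate reads as 0.0008 events per day per squared degree, on a window of 8 by 5 degrees observed for 7,500 days.

```latex
\nu = \mu \times (\text{area}) \times T
    = 0.0008 \times (8 \times 5) \times 7500
    = 240 .
```

This is in line with the 241 background events reported for the simulated catalog in Figure 1.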

If the complete branching structure of an observed ETAS process were known (including whether an event is a background event or a triggered event), that is, if the unobserved quantities $u_i$ were known for all $i$, the complete data log-likelihood $\ell_c(\theta)$ would be written as

$$
\begin{aligned}
\ell_c(\theta) ={}& \sum_{k=1}^{\kappa} \left[ -\log(n_k!) - \nu_k + n_k \log(\nu_k) \right] \\
&+ \sum_i \left[ -\log(l_i!) - G_i(\theta) + l_i \log(G_i(\theta)) \right] \\
&+ \sum_{i:\, u_i \neq 0} \Big[ \log(\omega) + \omega \log(c) + \log(\rho) + \rho \log(d) \\
&\qquad - (1+\rho) \log\left((x_i - x_{u_i})^2 + (y_i - y_{u_i})^2 + d\right) \\
&\qquad - (1+\omega) \log(t_i - t_{u_i} + c) - \log(\pi) \Big],
\end{aligned} \tag{6}
$$

where the first sum relates to the actual number of background events in each of the $\kappa$ cells. The second sum relates to the number of direct aftershocks $l_i$ (in this case, triggered by earthquake $i$), which also follows a Poisson distribution and whose expectation will be denoted as $G_i(\theta)$. Using the triggering function $g(\cdot)$ defined in (2), $G_i(\theta)$ can be derived as

$$
G_i(\theta) = \int_0^\infty \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(t - t_i, x - x_i, y - y_i, m_i)\,dx\,dy\,dt
= \frac{K_0\, \pi\, d^{-\rho} c^{-\omega}}{\rho\,\omega}\, e^{a(m_i - M_0)}. \tag{7}
$$

The third sum of (6) is due to the space–time distribution of aftershocks (relative to their triggering events), and its density can be obtained by dividing the triggering function $g(t - t_i, x - x_i, y - y_i, m_i)$ in (2) by the expected number of aftershocks $G_i(\theta)$ in (7).
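The closed form in (7) is straightforward to evaluate. The following minimal Python sketch does so; the function name `expected_aftershocks` is a hypothetical choice, and the parameter values are the Table 1 simulation values as read here, used only for illustration.

```python
import math

def expected_aftershocks(m, K0, a, c, omega, d, rho, M0):
    """Closed form (7): expected number of direct aftershocks of a magnitude-m event."""
    temporal = c ** (-omega) / omega        # integral of (tau + c)^-(1+omega) over tau >= 0
    spatial = math.pi * d ** (-rho) / rho   # integral of (r^2 + d)^-(1+rho) over the plane
    return K0 * temporal * spatial * math.exp(a * (m - M0))

# Illustration with the simulation parameters of Table 1 (as read here).
table1 = dict(K0=3.05e-5, a=2.3026, c=0.01, omega=0.5, d=0.015, rho=0.8, M0=2.0)
for mag in (2.0, 3.0, 4.0):
    print(mag, expected_aftershocks(mag, **table1))
```

Since $a$ is close to $\log(10)$ in this parameterization, each additional magnitude unit multiplies the expected number of direct aftershocks by roughly ten.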

Complete Data Maximum Likelihood Estimation. Generally, the complete data log-likelihood $\ell_c(\theta)$ cannot be maximized in practice, because the branching structure is unobservable. However, the use of the EM algorithm will allow a probabilistic incorporation of the branching structure. To aid in the exposition of the implementation of the EM algorithm, assume for the moment that the quantities $u_i$ are indeed known for all $i$; using them allows us to maximize $\ell_c(\theta)$, that is, find the (in practice unattainable) complete data MLE, denoted as $\hat\theta$. Partial derivatives of $\ell_c(\theta)$ with respect to each component of $\theta$ will be used, which are generally more compact than those of the incomplete data log-likelihood $\ell(\theta)$ in (3).

Given the quantities $u_i$ for all $i$, the parameters for the background intensity can be estimated using $0 \overset{!}{=} \partial\ell_c(\theta)/\partial\nu_k = -1 + n_k/\nu_k$, with $k = 1, 2, \ldots, \kappa$, where the notation $\overset{!}{=}$ indicates that the partial derivatives are set to 0 in order to maximize the respective function, in this case $\ell_c(\theta)$. The complete data MLE is $\hat\nu_k = n_k$, and $\hat\mu_k$ can then be computed using (5).

The partial derivatives with respect to the parameters $c$ and $\omega$ describing the temporal aftershock distribution are

$$0 \overset{!}{=} \frac{\partial \ell_c(\theta)}{\partial c} = \sum_{i:\, u_i \neq 0} \left( \frac{\omega}{c} - \frac{1 + \omega}{t_i - t_{u_i} + c} \right) + \frac{\omega}{c} \sum_i \left(G_i(\theta) - l_i\right), \tag{8}$$

$$0 \overset{!}{=} \frac{\partial \ell_c(\theta)}{\partial \omega} = \sum_{i:\, u_i \neq 0} \left( \frac{1}{\omega} + \log(c) - \log(t_i - t_{u_i} + c) \right) + \left( \frac{1}{\omega} + \log(c) \right) \sum_i \left(G_i(\theta) - l_i\right). \tag{9}$$

Note that (8) and (9) would only depend on $c$ and $\omega$ if the terms after the first sums in each of the equations could be ignored. This is indeed possible, because $0 \overset{!}{=} \partial\ell_c(\theta)/\partial K_0 = -(1/K_0) \sum_i (G_i(\theta) - l_i)$ will subsequently guarantee that $\sum_i (G_i(\theta) - l_i)$ equals 0. Setting only the first sums of (8) and (9) to 0 can be interpreted as using only the time passed between the triggering event and the aftershock for the estimation of the temporal parameters, while ignoring the contribution of the number of aftershocks triggered by each earthquake to the log-likelihood. In this sense, a partial information approach is used to estimate $c$ and $\omega$ in this step. This implies the following:

$$\frac{\omega}{(1 + \omega)\,c} = \frac{1}{L} \sum_{i:\, u_i \neq 0} \frac{1}{t_i - t_{u_i} + c}, \tag{10}$$

$$\frac{1}{\omega} + \log(c) = \frac{1}{L} \sum_{i:\, u_i \neq 0} \log(t_i - t_{u_i} + c), \tag{11}$$

where $L$ is the number of triggered earthquakes. This equation system can be solved by choosing a strictly positive starting value for $c$ and then iterating between (a) computing the right sides of (10) and (11) and (b) updating the current values for $c$ and $\omega$ by solving the equation system using the quantities computed in step (a) for the right sides.
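The two-step iteration for (10) and (11) can be sketched as follows. The text does not specify how the system in step (b) is solved; this sketch assumes one possible route, namely eliminating $c$ and finding $\omega$ by bisection, which works because the resulting one-dimensional function is strictly decreasing in $\omega$. The function names (`right_sides`, `solve_step`, `update_temporal`) and the fixed number of sweeps are hypothetical choices, and no convergence guarantee is implied.

```python
import math

def right_sides(lags, c):
    """Step (a): right-hand sides of (10) and (11) for triggering lags t_i - t_{u_i}."""
    n = len(lags)
    rhs10 = sum(1.0 / (tau + c) for tau in lags) / n
    rhs11 = sum(math.log(tau + c) for tau in lags) / n
    return rhs10, rhs11

def solve_step(rhs10, rhs11, iterations=200):
    """Step (b): solve omega/((1+omega)c) = rhs10 and 1/omega + log(c) = rhs11.
    Eliminating c gives h(omega) = 0 with h strictly decreasing, so bisection applies."""
    def h(omega):
        return 1.0 / omega + math.log(omega / (1.0 + omega)) - math.log(rhs10) - rhs11
    lo, hi = 1e-12, 1.0
    while h(hi) > 0.0:            # expand the bracket until the sign changes
        hi *= 2.0
        if hi > 1e12:
            raise ValueError("no finite root: lags show too little spread")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    omega = 0.5 * (lo + hi)
    c = omega / ((1.0 + omega) * rhs10)   # back-substitute into (10)
    return c, omega

def update_temporal(lags, c_start, sweeps=50):
    """Alternate steps (a) and (b) a fixed number of times."""
    c, omega = c_start, None
    for _ in range(sweeps):
        c, omega = solve_step(*right_sides(lags, c))
    return c, omega
```

A single call to `solve_step` returns a pair $(c, \omega)$ that satisfies both equations for the right sides it was given; the outer loop then recomputes the right sides with the updated $c$.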

The spatial parameters $d$ and $\rho$ can be estimated analogously, because $\partial\ell_c(\theta)/\partial d$ and $\partial\ell_c(\theta)/\partial\rho$ are structurally identical to (8) and (9), respectively, with the squared distance between aftershock and triggering event replacing the time elapsed between the two events.

Finally, once the parameters of the space–time distribution of aftershocks are known, the remaining parameters of $\theta$, $K_0$ and $a$, which govern the number of triggered aftershocks, can be estimated as parameters of a Poisson regression in which the number of triggered earthquakes depends on the magnitude of the triggering event through $G_i(\theta)$ in (7). A similar conditional likelihood approach was investigated by Ogata and Akaike (1982) for a temporal self-exciting point process model for earthquake occurrences. Ogata and Akaike presented a two-step procedure in which they held the equivalent of parameter $a$ constant in order to estimate the other parameters of their model [step (a)]. They then updated the equivalent of $a$ using Akaike's (1974, 1977) information criterion (AIC) [step (b)] and iterated between (a) and (b). Our approach, as outlined in the following section, applies the conditional likelihood methodology in both steps of this procedure and does not require the use of any information criteria to direct the algorithm.

In summary, an algorithm for complete data ML would estimate (a) $\mu$; (b) $c$, $\omega$, $d$, and $\rho$ using partial information ML; and (c) $K_0$ and $a$ by Poisson regression, conditioning on the estimates obtained in (b).
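Step (c) can be illustrated with a small Python sketch. It assumes the score equations $\sum_i (G_i - l_i) = 0$ and $\sum_i (G_i - l_i)(m_i - M_0) = 0$ implied by (6) and (7), with the space–time parameters held fixed. Dividing the second by the first removes $K_0$ and leaves a monotone equation in $a$, which is solved here by bisection. The function name `fit_productivity`, the bracket for $a$, and the `scale` argument are hypothetical; the text itself only states that these parameters are found as in a Poisson regression.

```python
import math

def fit_productivity(l, mags, M0, scale, a_lo=-20.0, a_hi=20.0, iterations=200):
    """Solve sum_i (G_i - l_i) = 0 and sum_i (G_i - l_i)(m_i - M0) = 0,
    where G_i = K0 * scale * exp(a * (m_i - M0)) and
    scale = pi * d**(-rho) * c**(-omega) / (rho * omega) is held fixed.
    Assumes at least one aftershock (sum(l) > 0) and magnitudes that are not all equal."""
    dm = [m - M0 for m in mags]
    total = sum(l)
    target = sum(li * x for li, x in zip(l, dm)) / total  # l-weighted mean of m_i - M0

    def weighted_mean(a):
        # Mean of dm under weights exp(a * dm); shifted by the max exponent for stability.
        top = max(a * x for x in dm)
        w = [math.exp(a * x - top) for x in dm]
        return sum(wi * x for wi, x in zip(w, dm)) / sum(w)

    lo, hi = a_lo, a_hi
    for _ in range(iterations):          # weighted_mean is increasing in a
        mid = 0.5 * (lo + hi)
        if weighted_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    K0 = total / (scale * sum(math.exp(a * x) for x in dm))
    return K0, a
```

In the complete data setting `l` holds the observed aftershock counts $l_i$; the same routine accepts noninteger values, which is what the EM version later supplies.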

Maximum Likelihood Estimation Using the EM Algorithm. Because the quantities $u_i$ are unknown (and, in fact, unobservable), complete data ML estimation as outlined previously cannot be implemented in practice. However, the expected complete data log-likelihood can be computed (E step) and then maximized (M step). The E step of the EM algorithm requires estimating the triggering probabilities $\text{prob}^{(n+1)}(u_i = j)$ for all $i, j$ based on a current estimate $\hat\theta^{(n)}_{\text{EM}}$:

$$
\text{prob}^{(n+1)}(u_i = j) = \frac{g\left(t_i - t_j, x_i - x_j, y_i - y_j, m_j \mid \hat\theta^{(n)}_{\text{EM}}\right)}{\mu^{(n)}_{k:\, i \in \text{cell } k} + \sum_{r=1}^{i-1} g\left(t_i - t_r, x_i - x_r, y_i - y_r, m_r \mid \hat\theta^{(n)}_{\text{EM}}\right)}. \tag{12}
$$

These probabilities allow finding expressions for the expected number of background events in cell $k$, $n^{(n+1)}_k$, and the expected number of direct aftershocks triggered by each earthquake $i$, $l^{(n+1)}_i$:

$$n^{(n+1)}_k = \sum_{i \in \text{cell } k,\; i \ge 2} \left( 1 - \sum_{j=1}^{i-1} \text{prob}^{(n+1)}(u_i = j) \right),$$

$$l^{(n+1)}_i = \sum_{s \ge i+1} \text{prob}^{(n+1)}(u_s = i).$$
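The E step can be sketched in a few lines of Python. This sketch assumes a single background cell with constant rate `mu`, events sorted by strictly increasing time, and a caller-supplied triggering function `g` already evaluated at the current parameter estimate; the names `e_step`, `P`, and `p_background` are hypothetical. Per-event background probabilities are returned rather than the cell totals $n_k$, so the caller can sum them over whichever cells and index range are required.

```python
def e_step(events, mu, g):
    """Triggering probabilities in the spirit of (12) for a single background cell.
    events: list of (t, x, y, m) sorted by time; mu: background intensity;
    g(dt, dx, dy, m): triggering function at the current parameter estimate."""
    n = len(events)
    P = [[0.0] * n for _ in range(n)]   # P[i][j] holds prob(u_i = j) for j < i
    p_background = [0.0] * n            # p_background[i] holds prob(u_i = 0)
    for i, (ti, xi, yi, _) in enumerate(events):
        contrib = [g(ti - tj, xi - xj, yi - yj, mj) for tj, xj, yj, mj in events[:i]]
        denom = mu + sum(contrib)
        for j, gij in enumerate(contrib):
            P[i][j] = gij / denom
        p_background[i] = mu / denom    # equals 1 for the first event (no predecessors)
    # Expected number of direct aftershocks of event j: column sums of P.
    l = [sum(P[i][j] for i in range(j + 1, n)) for j in range(n)]
    return P, p_background, l
```

For every event, the background probability and the triggering probabilities sum to one, so the expected numbers of background and triggered events together account for the whole catalog.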


The expected complete data log-likelihood can then be written as

$$
\begin{aligned}
E_{\hat\theta^{(n)}_{\text{EM}}}[\ell_c(\theta)] ={}& \sum_k \left[ -\log\left(\Gamma\left(n^{(n)}_k + 1\right)\right) - \nu_k + n^{(n)}_k \log(\nu_k) \right] \\
&+ \sum_i \left[ -\log\left(\Gamma\left(l^{(n)}_i + 1\right)\right) - G_i(\theta) + l^{(n)}_i \log(G_i(\theta)) \right] \\
&+ \sum_{i \ge 2} \sum_{j=1}^{i-1} \text{prob}^{(n)}(u_i = j) \times \Big[ \log(\omega) + \omega \log(c) + \log(\rho) + \rho \log(d) \\
&\qquad - (1+\rho) \log\left((x_i - x_j)^2 + (y_i - y_j)^2 + d\right) \\
&\qquad - (1+\omega) \log(t_i - t_j + c) - \log(\pi) \Big],
\end{aligned} \tag{13}
$$

where the gamma function, with $\Gamma(x + 1) = x!$, replaces the factorials used in (6), allowing for noninteger arguments.

The M step of the EM algorithm maximizes (13). In principle, this can be done as outlined in the previous section for the (hypothetical) complete data ML procedure. For this purpose, the quantities $n_k$ and $l_i$ have to be replaced with their counterparts in expectation, that is, $n^{(n)}_k$ and $l^{(n)}_i$, and the expressions for the partial derivatives have to reflect the estimated triggering probabilities (12), because it is unknown which earthquake triggered which aftershock. The equations (10) and (11) used to estimate the temporal parameters $c$ and $\omega$, for instance, take on the following form:

$$\frac{\omega^{(n)}}{\left(1 + \omega^{(n)}\right) c^{(n)}} = \frac{1}{L^{(n)}} \sum_{s \ge 2} \sum_{r=1}^{s-1} \text{prob}^{(n)}(u_s = r) \times \frac{1}{t_s - t_r + c^{(n)}}, \tag{14}$$

$$\frac{1}{\omega^{(n)}} + \log\left(c^{(n)}\right) = \frac{1}{L^{(n)}} \sum_{s \ge 2} \sum_{r=1}^{s-1} \text{prob}^{(n)}(u_s = r) \times \log\left(t_s - t_r + c^{(n)}\right), \tag{15}$$

where $L^{(n)} = \sum_i l^{(n)}_i$ is the expected number of triggered earthquakes. The corresponding equations for $d$ and $\rho$ are analogous. The following summarizes our proposed EM-type algorithm to estimate the ETAS model.

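Algorithm 1.

Step 0. $n = 1$; set each component of $\hat\theta^{(n)}_{\text{EM}}$ to some strictly positive value.

Step 1 (E step). Based on $\hat\theta^{(n)}_{\text{EM}}$, estimate the triggering probabilities $\text{prob}^{(n+1)}(u_i = j)$ for all $i, j$ as shown in (12).

Step 2 (M step). Maximize (13), that is, find $\hat\theta^{(n+1)}_{\text{EM}} = \arg\max_\theta E_{\hat\theta^{(n)}_{\text{EM}}}[\ell_c(\theta)]$:

(a) Find $\hat\mu^{(n+1)}_{\text{EM}}$ using $\hat\nu^{(n+1)}_{k,\text{EM}} = n^{(n+1)}_k$ and (5).

(b) Find $\hat c^{(n+1)}_{\text{EM}}$ and $\hat\omega^{(n+1)}_{\text{EM}}$ as outlined in the previous section, but using (14) and (15). Compute $\hat d^{(n+1)}_{\text{EM}}$ and $\hat\rho^{(n+1)}_{\text{EM}}$ analogously.

(c) Fix all components of $\theta$, except for $K_0$ and $a$, at their current estimates, that is, let $\theta^{(n+1)} = \left(\hat\mu^{(n+1)}_{\text{EM}}, K_0, a, \hat c^{(n+1)}_{\text{EM}}, \hat\omega^{(n+1)}_{\text{EM}}, \hat d^{(n+1)}_{\text{EM}}, \hat\rho^{(n+1)}_{\text{EM}}\right)$. Find $\hat K^{(n+1)}_{0,\text{EM}}$ and $\hat a^{(n+1)}_{\text{EM}}$ by solving for $K_0$ and $a$ numerically:

$$0 \overset{!}{=} \frac{\partial E_{\hat\theta^{(n)}_{\text{EM}}}\left[\ell_c\left(\theta^{(n+1)}\right)\right]}{\partial K_0} = -\frac{1}{K_0} \sum_i \left( G_i\left(\theta^{(n+1)}\right) - l^{(n+1)}_i \right),$$

$$0 \overset{!}{=} \frac{\partial E_{\hat\theta^{(n)}_{\text{EM}}}\left[\ell_c\left(\theta^{(n+1)}\right)\right]}{\partial a} = -\sum_i \left( \left( G_i\left(\theta^{(n+1)}\right) - l^{(n+1)}_i \right) (m_i - M_0) \right).$$

Step 3. If $\Delta\hat\theta^{(n+1)}_{\text{EM}} = \hat\theta^{(n+1)}_{\text{EM}} - \hat\theta^{(n)}_{\text{EM}}$ is smaller than some convergence criterion, stop. Otherwise, increase $n$ by 1 and repeat Steps 1–3.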
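The control flow of Steps 0 to 3 can be sketched as a small generic driver. This is only an outline under stated assumptions: the parameter vector is stored as a dict, the E and M steps are passed in as callables (`e_step`, `m_step`), and convergence is measured by the largest absolute change in any component. The text only requires the change to be "smaller than some convergence criterion", so the max-norm, the tolerance, and the iteration cap are choices made here.

```python
def run_em(theta0, e_step, m_step, tol=1e-8, max_iter=1000):
    """Generic control flow of Steps 0-3.
    theta0: dict of strictly positive starting values (Step 0);
    e_step(theta) -> expectations; m_step(expectations, theta) -> new theta dict."""
    theta = dict(theta0)
    for n in range(1, max_iter + 1):
        expectations = e_step(theta)              # Step 1 (E step)
        theta_new = m_step(expectations, theta)   # Step 2 (M step)
        change = max(abs(theta_new[k] - theta[k]) for k in theta)
        theta = theta_new
        if change < tol:                          # Step 3: stop once the update is small
            return theta, n
    return theta, max_iter
```

In an ETAS setting, `e_step` would compute the triggering probabilities of (12) and `m_step` would carry out sub-steps (a) to (c); both are left abstract here.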

Dempster et al. (1977) showed that, under general conditions, estimates obtained using the EM algorithm are consistent, just as conventional (i.e., incomplete data) MLEs are. In a way, the EM estimates are "expected MLEs," because they maximize the expected complete data log-likelihood. Like other optimization routines, EM may converge to a local maximum or saddle point, but the incorporation of the probabilistic branching structure often leads to unique maxima, sometimes with closed-form solutions.

The robustness and accuracy of the algorithm introduced in this work become evident when directly comparing it to conventional ML estimation. Using the same starting values as those in Figure 4, the EM-type algorithm converges to an estimate of θ very close to the true value in all eight situations. In a more systematic approach, 100 synthetic earthquake catalogs are simulated using model (1) and the parameters in Table 1 and then estimated using the proposed EM-type algorithm, as well as using conventional ML with our EM-type estimates as starting values. Table 2 shows that the EM-type algorithm generally produces estimates that are less biased (the only exception being the parameter μ). Apart from the bias, the sampling distributions are quite similar for all components of θ. Figure 5 serves as an example and presents the sampling distributions of ω_EM and ω_num.

A possible explanation for why the EM-type algorithm yields superior estimates is that most theoretical results relating to ML estimation only hold asymptotically (see, for instance, Ferguson 1996 for a general treatment of this matter and Ogata 1978, who derived analogous results for point process models). While both θ_num and θ_EM are consistent, in practice, with a sample consisting of a limited number of observations, EM estimates and conventional MLEs may in fact be different, as $\ell(\theta_{num}) \ge \ell(\theta_{EM})$ and $E_{\theta_{EM}}[\ell_c(\theta_{EM})] \ge E_{\theta_{EM}}[\ell_c(\theta_{num})]$ [see (3) and (13)]. In our simulations, the number of simulated earthquakes may not be large enough for reaching an asymptotic regime in the numerical maximization of the incomplete data log-likelihood function. However, the number of simulated earthquakes might be large enough for the EM procedure to produce accurate results, because the expected complete data log-likelihood incorporates additional structural information. In fact, for the numerical maximization routine, the theoretical standard errors based on the Hessian matrix are substantially larger than the ones derived by simulation, which also suggests that the asymptotic regime assumed in the theoretical framework has not been attained.

Table 2. Bias of the parameter estimates using the EM-type algorithm and conventional ML

                           μ (×10^−4)  K0 (×10^−5)  a        c          ω        d          ρ
True values                8.000       3.050        2.303    0.01000    0.500    0.01500    0.800
Conventional MLEs          8.011       2.993        2.275    0.01086    0.519    0.01625    0.841
  Standard errors          (0.527)     (0.708)      (0.113)  (0.00283)  (0.058)  (0.00409)  (0.103)
  Bias in % of true value  +0.14       −1.86        −1.22    +8.56      +3.80    +8.35      +5.13
EM-type estimates          7.925       2.993        2.296    0.01019    0.501    0.01564    0.824
  Standard errors          (0.516)     (0.708)      (0.109)  (0.00265)  (0.056)  (0.00423)  (0.112)
  Bias in % of true value  −0.94       −1.85        −0.27    +1.91      +0.20    +4.30      +3.00

NOTE: The bias of the proposed EM-type algorithm and conventional ML is compared by applying both procedures to 100 simulated processes. For most parameters, the EM-type algorithm yields results that are closer to the true parameter values described in Table 1.
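The "bias in % of true value" rows of Table 2 can be read as the relative deviation of the mean estimate from the true parameter value. The small sketch below spells out that arithmetic for ω, using the rounded entries of the table; the function name `bias_percent` is made up for this example. Because the table entries are rounded, recomputing other columns this way may differ from the printed percentages in the last digit.

```python
def bias_percent(estimate, true_value):
    """Relative bias of an estimate, in percent of the true value."""
    return 100.0 * (estimate - true_value) / true_value

# Rounded values for omega as listed in Table 2
omega_true = 0.500
bias_ml = bias_percent(0.519, omega_true)  # conventional MLE
bias_em = bias_percent(0.501, omega_true)  # EM-type estimate
```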

To demonstrate the robustness of the proposed EM-type algorithm with respect to starting values, 10 earthquake catalogs are simulated with the same parameterization as before and then estimated using 100 different starting values for θ. The starting values are sampled from a uniform distribution whose range is one-fifth of the parameter value to five times the parameter value. The results indicate that the parameter estimates are affected only minimally by even substantial offsets in starting values. The largest observed difference between the smallest and the largest parameter estimate for a component of θ is less than 5% of the true parameter value, the average being less than 1%. It may be relevant to note that the small variability associated with different starting values can be controlled by the convergence criterion used. In this implementation, our proposed algorithm was halted as soon as each component of θ had converged to four significant digits.
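The two ingredients of this robustness experiment, the sampling of starting values and the stopping rule, can be sketched as follows. This is only an illustration: the helper names, the dictionary layout, and the string-based comparison of significant digits are assumptions, and the parameter values are the true values listed in Table 2.

```python
import random

def sample_start(theta_true, rng):
    """Draw each starting value uniformly from [value / 5, 5 * value]."""
    return {k: rng.uniform(v / 5.0, 5.0 * v) for k, v in theta_true.items()}

def same_to_sig_digits(old, new, digits=4):
    """True if every component agrees after rounding to `digits` significant digits."""
    fmt = "{:." + str(digits - 1) + "e}"
    return all(fmt.format(old[k]) == fmt.format(new[k]) for k in old)

# True parameter values of the simulation study (Table 2)
theta_true = {"mu": 8.0e-4, "K0": 3.05e-5, "a": 2.303, "c": 0.01,
              "omega": 0.5, "d": 0.015, "rho": 0.8}
rng = random.Random(0)
starts = [sample_start(theta_true, rng) for _ in range(100)]
```

In an EM loop, `same_to_sig_digits(theta_old, theta_new)` would play the role of the convergence check in Step 3 of Algorithm 1.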

Figure 5. Sampling distributions of parameter estimates. The sampling distributions for the different components of θ are quite similar for EM-type (a) and conventional ML (b) estimation, though the former is less biased. Shown here are the sampling distributions of ω_EM and ω_num. Using the parameter values of Table 1, 100 earthquake catalogs were simulated and the model then estimated using the two procedures. The vertical lines on each histogram indicate the true parameter value. The location of the mean, based on the 100 estimates, is shown as the large triangle on the bottom of each histogram. One and two standard error intervals are represented by the mid-sized and small triangles, respectively.

5. APPLICATION TO EARTHQUAKE OCCURRENCES IN SOUTHERN CALIFORNIA

The methodology introduced in the previous section will now be used to estimate the ETAS model (1) using seismological data compiled by the Southern California Earthquake Center (SCEC). The data include occurrence times, magnitudes, and locations based on measurements taken by a network of almost 400 seismographic stations throughout Southern California. The catalog is maintained by the Southern California Seismic Network (SCSN), a cooperative project of the California Institute of Technology and the United States Geological Survey, and is publicly available at http://www.data.scec.org. The data consist of 6,796 earthquakes occurring in a rectangular area around Los Angeles between longitudes −122° and −114° and latitudes 32° and 37° (733 km × 556 km) between January 1, 1984, and June 17, 2004, and are considered complete above M0 = 3 (Kagan 2002a, 2003).

One problem of interest is to estimate spatial background intensities, which represent the occurrence rate of spontaneous, untriggered earthquakes. One way to approach this problem is to use the stochastic declustering method introduced by Zhuang et al. (2002, 2004, 2005), who used a spatial kernel density smoother as an estimate of μ(x, y) in (1). As an alternative, one may instead incorporate known geological features of Southern California and estimate background intensities in the ETAS model for geologically distinct regions within the study area. We use a regionalization proposed by Zaliapin, Keilis-Borok, and Axen (2002), who identified seven distinct regions based on fault orientation, historical slip, and tectonic setting. The resulting regions are in agreement with the main geological and fault activity maps for California (Jennings 1977, 1994).

The space–time ETAS model (1), with μ(x, y) varying across seismic regions, is estimated using the EM-type algorithm introduced in the previous section. A conventional maximization of the log-likelihood could be challenging, because the parameter estimates can be heavily influenced by a poor choice of starting values, as shown in Section 3. Moreover, judging from the simulations described in Section 4, the conventional ML approach may have an increased bias compared to the EM-type algorithm, because it seems to require a substantially larger sample size in order to attain the asymptotic regime guaranteeing consistent estimation.

Table 3. Estimation results of space–time ETAS model (1) for Southern California, 1984–2004, M0 = 3

k   μk (events per day   Number of earthquakes   Area of region   Description of region
    per degree²)         in region               (degree²)
1    3.500 × 10^−3        501                     6.37            SW: coastal and offshore region
2   16.945 × 10^−3       1357                     3.26            SE: southern section of San Andreas fault system
3    3.921 × 10^−3        476                     6.14            NW: creeping section of San Andreas fault system
4    5.216 × 10^−3        817                     3.85            Western transverse ranges
5    6.970 × 10^−3        148                     1.68            Garlock fault system
6    5.294 × 10^−3       2822                     3.43            Mojave block, eastern transverse ranges
7    6.130 × 10^−3        637                     2.75            Western Great Basin
0    0.019 × 10^−3         38                    12.52            Rest

K0               a       c         ω       d                ρ
4.823 × 10^−5    1.034   0.01922   0.222   4.906 × 10^−5    0.497

NOTE: The estimated background intensities are between 3.5 × 10^−3 and 6.97 × 10^−3 in six of the seven regions, including the two in which major earthquakes occurred. Region 2, however, has a substantially higher declustered background intensity of 16.945 × 10^−3 (see Fig. 6).

Figure 6. Declustered background intensity estimates for Southern California. Background intensities are estimated for geologically distinct regions in Southern California using the space–time ETAS model (1). The declustered background intensities are quite similar for six of the seven regions, including the ones in which the largest earthquakes and aftershock clusters were observed. The only region with a substantially larger background rate is the southern section of the San Andreas fault system to the east of San Diego (region 2).

The estimation results are presented in Table 3 and Figure 6. The declustered background intensities for six of the seven regions are quite similar and range between 3.500 × 10^−3 and 6.970 × 10^−3 events per day per squared degree. This is a remarkable result, because strong earthquakes with large aftershock clusters were observed in the western transverse ranges (region 4) and the Mojave block (region 6), with 817 and 2,822 earthquakes, respectively, while the Garlock fault system (region 5) had very few (148) observed seismic events. Nevertheless, our declustered background seismicity estimates suggest that the occurrence rate of spontaneous, untriggered earthquakes is similar in six of the seven regions. Only region 2, the area covering the southern section of the San Andreas fault system, has a substantially higher background intensity of 16.945 × 10^−3 events per day per squared degree.
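To make the units "events per day per squared degree" concrete, the sketch below converts the background intensities of Table 3 into an approximate expected number of background events per region, and compares it with the observed counts. This is a back-of-the-envelope reading, not a result reported in the paper: it assumes a constant background rate within each region, so that the expected count is μk × area × observation time, and it takes the observation time as the number of days between January 1, 1984, and June 17, 2004.

```python
from datetime import date

# region k: (mu_k in events/day/degree^2, area in degree^2, observed earthquakes),
# copied from Table 3 for the seven geologically distinct regions
regions = {
    1: (3.500e-3, 6.37, 501),
    2: (16.945e-3, 3.26, 1357),
    3: (3.921e-3, 6.14, 476),
    4: (5.216e-3, 3.85, 817),
    5: (6.970e-3, 1.68, 148),
    6: (5.294e-3, 3.43, 2822),
    7: (6.130e-3, 2.75, 637),
}

# Length of the observation window in days
days = (date(2004, 6, 17) - date(1984, 1, 1)).days

def expected_background(mu, area, n_days):
    """Expected number of background events under a constant rate mu."""
    return mu * area * n_days

# Approximate share of each region's catalog attributed to background events
shares = {k: expected_background(mu, area, days) / count
          for k, (mu, area, count) in regions.items()}
```

Under these assumptions, regions dominated by large aftershock sequences show a small background share even though their background intensity is comparable to that of the other regions.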

Some parameter estimates are not in line with geophysical theory. The model parameter a = 1.034, for instance, is commonly thought to be closer to log(10) ≈ 2.3 (Felzer, Abercrombie, and Ekström 2004; Helmstetter, Kagan, and Jackson 2005, 2006). However, several empirical studies estimate this parameter to a value closer to unity (see, for instance, Ogata 1998; Helmstetter et al. 2006). Helmstetter et al. (2006) attributed this downward bias to data errors in earthquake catalogs (incompleteness and measurement errors concerning earthquake locations and magnitudes), as well as model misspecifications. Also, many seismologists consider the parameter estimate c = 0.01922 to be too large (or even an artifact) due to catalog incompleteness of small earthquakes after strong events (Kagan 2004; Helmstetter et al. 2005).

The fact that the southern section of the San Andreas fault system is highly active is well established in the literature. For example, Fialko (2006) found a high level of seismic strain accumulation in this region. Given that the last two major California earthquakes, in 1857 and 1906, ruptured the middle and northern sections of the San Andreas fault, it is believed that faults in the southern section of the fault system currently pose the highest seismic risk in California (see Fialko 2006 and the references therein). This is supported by paleoseismological evidence estimating the average recurrence time of large earthquakes in that area to be between 200 and 300 years, together with the fact that no such large event (magnitude 7 or larger) has been observed in the last 250 years. Fialko concluded that the southern San Andreas fault system may be in the late phase of its interseismic recurrence.

The ETAS model usually assumes that the earthquake magnitude distribution is separable from the space–time features of the model. Therefore, in this framework, an increased background seismicity rate in a particular area does not directly imply an increased risk for a large-magnitude event. However, an increased incidence of strong earthquakes can be anticipated simply because more earthquakes are expected in general. In this sense, an elevated seismic hazard may exist in the southern section of the San Andreas fault system compared to all other regions in Southern California.

6. DISCUSSION AND CONCLUDING REMARKS

We presented an EM-type algorithm that maximizes the expected complete data log-likelihood function. The advantages of this algorithm compared to conventional ML estimation were substantial for the case of estimating the space–time ETAS model, in terms of convergence, bias, and robustness to choice of starting values.

This methodology should also work well for other specifications of ETAS models. In fact, it is applicable to all kinds of branching process models where the information of which event "triggers" which other event is not observable but can be described probabilistically, because this allows the incorporation of the branching structure in expectation.
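To illustrate what "described probabilistically" means in practice, the sketch below computes triggering probabilities for a simplified, temporal-only ETAS model with a constant background rate. It is an assumption-laden illustration rather than the space–time E step of the paper: the function name `e_step`, the temporal triggering function, and the example catalog are all chosen for this example. Each event is assigned a probability of being a background event and a probability of having been triggered by each earlier event, in proportion to the corresponding terms of the conditional intensity.

```python
import math

def e_step(times, mags, mu, K0, a, c, omega, M0):
    """Triggering probabilities for a temporal-only ETAS model.

    times must be strictly increasing. Returns (prob, background), where
    prob[i][j] is the probability that event i was triggered by the earlier
    event j, and background[i] is the probability that event i is a
    background event.
    """
    n = len(times)
    prob = [[0.0] * n for _ in range(n)]
    background = [0.0] * n
    for i in range(n):
        rates = [K0 * math.exp(a * (mags[j] - M0))
                 / (times[i] - times[j] + c) ** (1.0 + omega)
                 for j in range(i)]
        total = mu + sum(rates)  # conditional intensity at event i
        background[i] = mu / total
        for j, r in enumerate(rates):
            prob[i][j] = r / total
    return prob, background

# Hypothetical four-event catalog
times = [0.0, 1.0, 1.5, 4.0]
mags = [3.0, 4.5, 3.2, 3.1]
prob, background = e_step(times, mags, mu=0.1, K0=0.05, a=1.0,
                          c=0.01, omega=0.5, M0=3.0)
# Expected number of direct aftershocks of each event (column sums)
l = [sum(prob[i][j] for i in range(len(times))) for j in range(len(times))]
```

The column sums `l` correspond to the expected numbers of direct aftershocks used in the M step, and the sum of `background` is the expected number of background events.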

As the simulations in Section 3 illustrate, if the number of observations is limited, our proposed estimation procedure may in fact yield more accurate results, because the asymptotic regime under which the desirable properties of conventional ML are well known may not be reached. In addition, even in cases where a direct numerical maximization of the incomplete data log-likelihood may be preferred, the EM-type algorithm proposed here may be a useful way of obtaining starting values for the optimization routine. Further, the reliability with which the EM-type procedure converges to a reasonable estimate may be especially attractive for simulation studies in which one is interested in repeatedly simulating and estimating a model, for example, in order to estimate the bias or variance of certain parameter estimates. Such repetitions are very difficult using conventional ML procedures, due to the required oversight involved and occasional lack of convergence.

Our seismological application involved the use of geologically distinct regions within Southern California, incorporating the known morphological and tectonic conditions present in this region. However, it should be noted that the EM-type algorithm could just as well be combined with a spatial kernel smoothing method, as used by Zhuang et al. (2002, 2004, 2005), in order to estimate continuous background intensities.

Our results suggest that the declustered background intensity in the area covering the southern section of the San Andreas fault system is substantially higher than in other regions of Southern California. In this sense, an elevated seismic risk exists in this area.

While Southern California appears to be rather naturally divided into distinct seismic regions, a subjective element in outlining the regions remains. Changes to the borders of the regions could in fact change the background intensity estimates, as could misspecification of the ETAS model. For instance, the space–time model employed in this work uses circular aftershock regions, whereas aftershocks usually occur on faults, with fault orientation varying over different regions. Moreover, as pointed out at the end of Section 2, there is evidence that aftershock regions scale with the magnitude of the main shock. The investigation and application to California seismicity of ETAS models with noncircular and magnitude-scaling aftershock distributions are important directions for further research.

[Received July 2006. Revised April 2007.]

REFERENCES

Akaike, H. (1974), "A New Look at Statistical Model Identification," IEEE Transactions on Automatic Control, 19, 716–722.

Akaike, H. (1977), "On Entropy Maximization Principle," in Application of Statistics, ed. P. R. Krishnaiah, Amsterdam: North-Holland, pp. 27–41.

Cox, D. R. (1975), "Partial Likelihood," Biometrika, 62, 269–276.

Daley, D., and Vere-Jones, D. (2003), An Introduction to the Theory of Point Processes (2nd ed.), New York: Springer.

Dempster, A., Laird, N., and Rubin, D. (1977), "Maximum Likelihood From Incomplete Data via the EM Algorithm," Journal of the Royal Statistical Society, Ser. B, 39, 1–38.

Felzer, K. R., Abercrombie, R. E., and Ekström, G. (2004), "A Common Origin for Aftershocks, Foreshocks, and Multiplets," Bulletin of the Seismological Society of America, 94, 88–98.

Felzer, K. R., Becker, T. W., Abercrombie, R. E., Ekström, G., and Rice, J. R. (2002), "Triggering of the 1999 MW 7.1 Hector Mine Earthquake by Aftershocks of the 1992 MW 7.3 Landers Earthquake," Journal of Geophysical Research, 107, 2190.

Ferguson, T. S. (1996), A Course in Large Sample Theory, London: Chapman & Hall.

Fialko, Y. (2006), "Interseismic Strain Accumulation and the Earthquake Potential on the Southern San Andreas Fault System," Nature, 441, 968–971.

Gutenberg, B., and Richter, C. F. (1944), "Frequency of Earthquakes in California," Bulletin of the Seismological Society of America, 34, 185–188.

Hawkes, A. G. (1971a), "Point Spectra of Some Mutually Exciting Point Processes," Journal of the Royal Statistical Society, Ser. B, 33, 438–443.

Hawkes, A. G. (1971b), "Spectra of Some Self-Exciting and Mutually Exciting Point Processes," Biometrika, 58, 83–90.

Hawkes, A. G., and Adamopoulos, L. (1973), "Cluster Models for Earthquakes - Regional Comparisons," Bulletin of the International Statistical Institute, 45, 454–461.

Helmstetter, A., Kagan, Y. Y., and Jackson, D. D. (2005), "Importance of Small Earthquakes for Stress Transfers and Earthquake Triggering," Journal of Geophysical Research, 110, B05S08.

Helmstetter, A., Kagan, Y. Y., and Jackson, D. D. (2006), "Comparison of Short-Term and Long-Term Earthquake Forecast Models for Southern California," Bulletin of the Seismological Society of America, 96, 90–106.

Jennings, C. W. (1977), Geological Map of California, Scale 1:750,000, Sacramento, CA: California Division of Mines and Geology.

Jennings, C. W. (1994), Fault Activity Map of California and Adjacent Areas, Scale 1:750,000, Sacramento, CA: California Division of Mines and Geology.

Kagan, Y. Y. (2002a), "Aftershock Zone Scaling," Bulletin of the Seismological Society of America, 92, 641–655.

Kagan, Y. Y. (2002b), "Modern California Earthquake Catalogs and Their Comparison," Seismological Research Letters, 73, 921–929.

Kagan, Y. Y. (2003), "Accuracy of Modern Global Earthquake Catalogs," Physics of the Earth and Planetary Interiors, 135, 173–209.

Kagan, Y. Y. (2004), "Short-Term Properties of Earthquake Catalogs and Models of Earthquake Source," Bulletin of the Seismological Society of America, 94, 1207–1228.

Kagan, Y. Y., and Knopoff, L. (1976), "Statistical Search for Nonrandom Features of the Seismicity of Strong Earthquakes," Physics of the Earth and Planetary Interiors, 12, 291–318.

Kagan, Y. Y., and Knopoff, L. (1987), "Statistical Short-Term Earthquake Prediction," Science, 236, 1563–1567.

Lomnitz, C. (1974), Global Tectonics and Earthquake Risks, London: Elsevier.

McLachlan, G. J., and Krishnan, T. (1996), The EM Algorithm and Extensions, New York: Wiley.

Ogata, Y. (1978), "The Asymptotic Behaviour of Maximum Likelihood Estimators for Stationary Point Processes," Annals of the Institute of Statistical Mathematics, 30, 243–261.

Ogata, Y. (1988), "Statistical Models for Earthquake Occurrences and Residual Analysis for Point Processes," Journal of the American Statistical Association, 83, 9–27.

Ogata, Y. (1993a), "Fast Likelihood Computation of Epidemic Type Aftershock Sequence Model," Geophysical Research Letters, 20, 2143–2146.

Ogata, Y. (1993b), "Space–Time Modeling of Earthquake Occurrences," Bulletin of the International Statistical Institute, 55, 249–250.

Ogata, Y. (1998), "Space–Time Point-Process Models for Earthquake Occurrences," Annals of the Institute of Statistical Mathematics, 50, 379–402.

Ogata, Y. (1999), "Seismicity Analysis Through Point-Process Modeling: A Review," Pure and Applied Geophysics, 155, 471–507.

Ogata, Y., and Akaike, H. (1982), "On Linear Intensity Models for Mixed Doubly Stochastic Poisson and Self-Exciting Point Processes," Journal of the Royal Statistical Society, Ser. B, 44, 102–107.

R Development Core Team (2007), R: A Language and Environment for Statistical Computing, Vienna: R Foundation for Statistical Computing.

Vere-Jones, D. (1970), "Stochastic Models for Earthquake Occurrence," Journal of the Royal Statistical Society, Ser. B, 32, 1–62.

Vere-Jones, D. (1975), "Stochastic Models for Earthquake Sequences," Geophysical Journal of the Royal Astronomical Society, 42, 811–826.

Zaliapin, I., Keilis-Borok, V., and Axen, G. (2002), "Premonitory Spreading of Seismicity Over the Fault's Network in Southern California: Precursor Accord," Journal of Geophysical Research, 107, 2221.

Zhuang, J., Ogata, Y., and Vere-Jones, D. (2002), "Stochastic Declustering of Space–Time Earthquake Occurrences," Journal of the American Statistical Association, 97, 369–380.

Zhuang, J., Ogata, Y., and Vere-Jones, D. (2004), "Analyzing Earthquake Clustering Features by Using Stochastic Reconstruction," Journal of Geophysical Research, 109, B05301.

Zhuang, J., Ogata, Y., and Vere-Jones, D. (2005), "Diagnostic Analysis of Space–Time Branching Processes for Earthquakes," in Case Studies in Spatial Point Process Modelling, eds. A. Baddeley, P. Gregori, J. Mateu, R. Stoica, and D. Stoyan, Springer, pp. 275–292.


earthquake occurrence models can be found in Hawkes and Adamopoulos (1973), Lomnitz (1974, chap. 7), and Kagan and Knopoff (1987).

A particularly important example of a self-exciting point process is the ETAS model, which was first introduced by Ogata (1988) and is widely used to describe earthquake occurrences. Early forms of the ETAS model (Ogata 1988) only took magnitudes and earthquake occurrence times into account:

$$\lambda(t \mid H_t) = \mu + \sum_{i:\, t_i < t} g(t - t_i, m_i),$$

where the history of the process $H_t = \{(t_i, m_i): t_i < t\}$ also includes earthquake magnitudes $m_i$, μ is the (in this case constant) background intensity of earthquake occurrences, and g(·) is the so-called "triggering function," because it describes how earthquakes trigger aftershocks. One possible triggering function suggested in Ogata (1988) is

$$g(\tau_i, m_i) = \frac{K_0}{(\tau_i + c)^{(1+\omega)}}\, e^{a(m_i - M_0)},$$

where $\tau_i = t - t_i$ is the time elapsed since earthquake i, K0 > 0 is a normalizing constant governing the expected number of direct aftershocks triggered by earthquake i, the parameters c, a, ω > 0, and M0 is the "cutoff magnitude," that is, the lowest earthquake magnitude in the dataset under consideration. The term $K_0/(\tau_i + c)^{(1+\omega)}$, describing the temporal distribution of aftershocks, is known as the modified Omori–Utsu law. While the literature in seismology usually lets ω > −1, the interpretation of the modified Omori–Utsu law as a probability density function requires strictly positive values for ω.
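The requirement ω > 0 can be checked with a short worked integral. The normalizing constant below follows only from the form of the Omori–Utsu term given in the text; it is written out here as an illustration.

```latex
\int_0^\infty \frac{d\tau}{(\tau + c)^{1+\omega}}
  = \left[ -\frac{(\tau + c)^{-\omega}}{\omega} \right]_0^\infty
  = \frac{c^{-\omega}}{\omega}
  \qquad (\omega > 0,\ c > 0),
\quad\text{so}\quad
\int_0^\infty \frac{\omega\, c^{\omega}}{(\tau + c)^{1+\omega}}\, d\tau = 1 .
```

For −1 < ω ≤ 0 the integral diverges, so the term cannot be normalized to a probability density in that range.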

The ETAS model has since been extended to describe the space–time–magnitude distribution of earthquake occurrences (Ogata 1993b, 1998). A version suggested in Ogata (1998) uses circular aftershock regions, where the squared distance between an aftershock and its triggering event follows a Pareto distribution:

$$\lambda(t, x, y \mid H_t) = \mu(x, y) + \sum_{i:\, t_i < t} g(t - t_i, x - x_i, y - y_i, m_i), \tag{1}$$

with triggering function

$$g(t - t_i, x - x_i, y - y_i, m_i) = \frac{K_0\, e^{a(m_i - M_0)}}{(t - t_i + c)^{(1+\omega)} \bigl((x - x_i)^2 + (y - y_i)^2 + d\bigr)^{(1+\rho)}}, \tag{2}$$

where $(x_i, y_i)$ represents the epicenter of earthquake i, d > 0 and ρ > 0 are parameters describing the spatial distribution of triggered seismicity, and the history of the process up to time t is now defined as $H_t = \{(t_i, x_i, y_i, m_i): t_i < t\}$. One characteristic of this model is that the aftershock zone does not scale with the magnitude of the triggering event. It has been suggested that aftershock zones increase with main shock magnitudes, and some recent research seems to support this view (Kagan 2002b). Examples of how this can be incorporated into the ETAS model can be found in Kagan and Knopoff (1987) and Ogata (1998).

Summing things up, the ETAS model can be described as a branching process with immigration (spontaneous background earthquakes). The aftershock activity is modeled through a triggering function consisting of two terms, one of which models the expected number of aftershocks for earthquake i, while the other models the temporal or space–time distribution of the triggered aftershocks.
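The conditional intensity (1) with triggering function (2) translates directly into code. The sketch below is a plain transcription under two simplifying assumptions that are not part of the model as stated: the background rate μ is taken as a single constant rather than a function of location, and the catalog is represented as a list of (t, x, y, m) tuples. The function and argument names are chosen for this example.

```python
import math

def trigger(dt, dx, dy, m, K0, a, c, omega, d, rho, M0):
    """Triggering function g of equation (2); dt must be positive."""
    temporal = (dt + c) ** (1.0 + omega)
    spatial = (dx * dx + dy * dy + d) ** (1.0 + rho)
    return K0 * math.exp(a * (m - M0)) / (temporal * spatial)

def intensity(t, x, y, catalog, mu, **params):
    """Conditional intensity (1) with a constant background rate mu.

    catalog: list of (t_i, x_i, y_i, m_i) tuples; only events with
    t_i < t contribute to the sum.
    """
    return mu + sum(trigger(t - ti, x - xi, y - yi, mi, **params)
                    for ti, xi, yi, mi in catalog if ti < t)

# Hypothetical single-event catalog and parameter values
params = dict(K0=1.0, a=0.0, c=1.0, omega=1.0, d=1.0, rho=1.0, M0=3.0)
catalog = [(0.0, 0.0, 0.0, 3.0)]
rate_before = intensity(0.0, 0.0, 0.0, catalog, mu=0.5, **params)
rate_after = intensity(1.0, 0.0, 0.0, catalog, mu=0.5, **params)
```

Before the first event only the background term contributes; afterwards the triggered term decays with elapsed time and with squared distance from the epicenter.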

3. MAXIMUM LIKELIHOOD ESTIMATION OF THE ETAS MODEL

Conventional ML estimation attempts to maximize

$$\ell(\theta) = \sum_i \log\bigl(\lambda(t_i, x_i, y_i \mid H_{t_i})\bigr) - \int_0^T \int_{y_0}^{y_1} \int_{x_0}^{x_1} \lambda(t, x, y \mid H_t)\, dx\, dy\, dt, \tag{3}$$

the incomplete data log-likelihood function of model (1), where θ = (μ, K0, a, c, ω, d, ρ) is the parameter vector and $[x_0, x_1] \times [y_0, y_1] \times [0, T]$ is the space–time window in which the dataset $(x_i, y_i, t_i, m_i)$ is observed (Ogata 1998; Daley and Vere-Jones 2003, chap. 7). The term "incomplete data" signifies that ℓ(θ) does not incorporate any information about the branching structure, and it is used to distinguish it from the (in practice unattainable) complete data log-likelihood function introduced in Section 4. Typically, (3) is maximized by using a numerical optimization routine, because no closed-form solution is generally available. Unfortunately, in cases where the log-likelihood function is extremely flat in the vicinity of its maximum, such optimization routines can have convergence problems and can be substantially influenced by arbitrary choices of starting values. To distinguish the conventional MLE computed by numerical maximization from the one based on the EM-type algorithm presented later, we will denote the former by θ_num and the latter by θ_EM. Similarly, the vector components will be denoted by ω_num, ω_EM, and so on.
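As a concrete illustration of an incomplete data log-likelihood, the sketch below evaluates the temporal analog of (3) for the early, time-only form of the ETAS model. This is a deliberate simplification and not the space–time likelihood itself: the spatial integrals are dropped, the background rate is constant, and the time integral of the triggering term is evaluated in closed form. The function name and the example values are hypothetical.

```python
import math

def temporal_loglik(times, mags, T, mu, K0, a, c, omega, M0):
    """Log-likelihood of the temporal ETAS model on the window [0, T].

    times must be strictly increasing and lie in [0, T].
    """
    log_sum = 0.0
    integral = mu * T  # background contribution to the integrated intensity
    for i, ti in enumerate(times):
        # Conditional intensity at event i, summing over all earlier events
        lam = mu + sum(K0 * math.exp(a * (mags[j] - M0))
                       / (ti - times[j] + c) ** (1.0 + omega)
                       for j in range(i))
        log_sum += math.log(lam)
        # Closed-form integral of event i's triggering term over (ti, T]
        integral += (K0 * math.exp(a * (mags[i] - M0))
                     * (c ** -omega - (T - ti + c) ** -omega) / omega)
    return log_sum - integral

# Hypothetical one-event example
value = temporal_loglik([1.0], [2.0], T=2.0, mu=0.5, K0=1.0, a=1.0,
                        c=1.0, omega=1.0, M0=2.0)
```

The double loop makes the cost quadratic in the number of events, which is one reason why direct numerical maximization of such likelihoods can be computationally intensive.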

An illustration may be helpful to demonstrate some of the difficulties encountered when directly maximizing (3) using numerical methods. Figure 1 shows a simulated earthquake catalog of 638 events using model (1). The space–time window used for this simulation is similar to the Southern California dataset described in Section 5. Background earthquakes are simulated on an area of 8° of longitude by 5° of latitude over a period of 7,500 days (approximately 20 years). Parameter values, as shown in Table 1, are chosen to approximate those used in descriptions of earthquake catalogs based on Ogata (1998), as well as discussions with UCLA seismologists Yan Y. Kagan and Ilya Zaliapin. For simplicity, a truncated exponential distribution is used to model earthquake magnitudes in accordance with the Gutenberg–Richter law (Gutenberg and Richter 1944):

$$f_{GR}(m) = \frac{\beta e^{-\beta(m - M_0)}}{1 - e^{-\beta(M^{\max}_{GR} - M_0)}}, \tag{4}$$

where $f_{GR}(m)$ is the probability density function, β = log(10), and $M_0 = 2 \le m \le M^{\max}_{GR} = 8$, where the lower threshold is the approximate current threshold (since 2001) above which catalogs of the Southern California Seismic Network (SCSN) are believed to be complete (Kagan 2002a, 2003) and the upper threshold is the approximate magnitude of the strongest California earthquakes in historic times, the 1857 Fort Tejon earthquake and the "great" San Francisco earthquake of 1906.
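Magnitudes following the truncated exponential density (4) can be drawn by inverting its distribution function. The sketch below is one simple way to do this and is not taken from the paper; the function name, the default arguments (set to the values β = log(10), M0 = 2, and maximum magnitude 8 given in the text), and the seed are choices made for this example.

```python
import math
import random

def sample_magnitude(u, beta=math.log(10), m0=2.0, m_max=8.0):
    """Inverse-CDF draw from the truncated exponential density (4).

    u is a uniform random number in [0, 1]; u = 0 maps to m0 and
    u = 1 maps to m_max.
    """
    scale = 1.0 - math.exp(-beta * (m_max - m0))
    return m0 - math.log(1.0 - u * scale) / beta

rng = random.Random(1)
mags = [sample_magnitude(rng.random()) for _ in range(1000)]
```

With β = log(10), each additional magnitude unit is about ten times less frequent, so most simulated magnitudes lie close to the lower threshold.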


(a) (b)







d 015 M0 2ρ 8 Mmax

GR 8







(a) (b)






(a) (b)















c(θ) =κsum

k=1


+sum

i


+sum

i ui =0



xi minus xui

)2 + (yi minus yui

)2 + d)


+ c) minus log(π)

(6)


Gi(θ) =int infin

0

int infin

minusinfin

int infin










minus1 + nk





0= partc(θ)

partc

=sum

i ui =0

(ω

cminus 1 + ω

ti minus tui+ c

)

+ ω

c

sum

i


0= partc(θ)

partω

=sum

i ui =0

(1


(ti minus tui

+ c))

+(

1

ω+ log(c)

)sum

i




minus1K0sum



ω

(1 + ω)c= 1

L

sum

i ui =0

1

ti minus tui+ c

(10)

1

ω+ log(c) = 1

L

sum

i ui =0

log(ti minus tui

+ c) (11)








(n)EM

prob(n+1)(ui = j)


EM

))(

μ(n)k iisincell k

+iminus1sum

r=1


EM

))

(12)



n(n+1)k =

sum

iisincellkige2

(

1 minusiminus1sum

j=1

prob(n+1)(ui = j)

)

l(n+1)i =

sum

sgei+1

prob(n+1)(us = i)



Eθ

(n)

EM[c(θ)]

=sum

k

minus log(

(n

(n)k + 1


+sum

i

minus log(

(l(n)i + 1


+sum

ige2

iminus1sum

j=1

prob(n)(ui = j)




)




(n)k and l



ω(n)

(1 + ω(n))c(n)= 1

L(n)

sum

sge2

sminus1sum

r=1

prob(n)(us = r)

times 1


1

ω(n)+ log

(c(n)

) = 1

L(n)

sum

sge2

sminus1sum

r=1

prob(n)(us = r)


) (15)

where L(n) = sumi l



Algorithm 1





(n+1)EM =

arg maxθ Eθ

(n)

EM[c(θ)]


(n+1)kEM

= n(n+1)k and (5)






(n+1)EM

K0 a c(n+1)EM ω

(n+1)EM d

(n+1)EM ρ

(n+1)EM ) Find K

(n+1)0EM

and


0= partE

θ(n)

EM

[c

(θ (n+1)

)]partK0

= minus1K0

sum

i

(Gi

(θ (n+1)

) minus l(n+1)i

)

0= partE

θ(n)

EM

[c

(θ (n+1)

)]parta

= minussum

i

((Gi

(θ (n+1)

) minus l(n+1)i

)(mi minus M0)

)


(n+1)EM minus θ

(n)





θEM[c(θEM)] ge E

θEM[c(θnum)] [see





True values 8000 3050 2303 01000 500 01500 800











(a) (b)














K0 a c ω d ρ



(see Fig 6)

















REFERENCES










































(a) (b)







d 015 M0 2ρ 8 Mmax

GR 8







(a) (b)






(a) (b)















c(θ) =κsum

k=1


+sum

i


+sum

i ui =0



xi minus xui

)2 + (yi minus yui

)2 + d)


+ c) minus log(π)

(6)


Gi(θ) =int infin

0

int infin

minusinfin

int infin










minus1 + nk





0= partc(θ)

partc

=sum

i ui =0

(ω

cminus 1 + ω

ti minus tui+ c

)

+ ω

c

sum

i


0= partc(θ)

partω

=sum

i ui =0

(1


(ti minus tui

+ c))

+(

1

ω+ log(c)

)sum

i




minus1K0sum



ω

(1 + ω)c= 1

L

sum

i ui =0

1

ti minus tui+ c

(10)

1

ω+ log(c) = 1

L

sum

i ui =0

log(ti minus tui

+ c) (11)








(n)EM

prob(n+1)(ui = j)


EM

))(

μ(n)k iisincell k

+iminus1sum

r=1


EM

))

(12)



n(n+1)k =

sum

iisincellkige2

(

1 minusiminus1sum

j=1

prob(n+1)(ui = j)

)

l(n+1)i =

sum

sgei+1

prob(n+1)(us = i)



Eθ

(n)

EM[c(θ)]

=sum

k

minus log(

(n

(n)k + 1


+sum

i

minus log(

(l(n)i + 1


+sum

ige2

iminus1sum

j=1

prob(n)(ui = j)




)




(n)k and l



ω(n)

(1 + ω(n))c(n)= 1

L(n)

sum

sge2

sminus1sum

r=1

prob(n)(us = r)

times 1


1

ω(n)+ log

(c(n)

) = 1

L(n)

sum

sge2

sminus1sum

r=1

prob(n)(us = r)


) (15)

where L(n) = sumi l



Algorithm 1





(n+1)EM =

arg maxθ Eθ

(n)

EM[c(θ)]


(n+1)kEM

= n(n+1)k and (5)






(n+1)EM

K0 a c(n+1)EM ω

(n+1)EM d

(n+1)EM ρ

(n+1)EM ) Find K

(n+1)0EM

and


0= partE

θ(n)

EM

[c

(θ (n+1)

)]partK0

= minus1K0

sum

i

(Gi

(θ (n+1)

) minus l(n+1)i

)

0= partE

θ(n)

EM

[c

(θ (n+1)

)]parta

= minussum

i

((Gi

(θ (n+1)

) minus l(n+1)i

)(mi minus M0)

)


(n+1)EM minus θ

(n)





θEM[c(θEM)] ge E

θEM[c(θnum)] [see





True values 8000 3050 2303 01000 500 01500 800











(a) (b)














K0 a c ω d ρ



(see Fig 6)

















REFERENCES










































(a) (b)






(a) (b)















c(θ) =κsum

k=1


+sum

i


+sum

i ui =0



xi minus xui

)2 + (yi minus yui

)2 + d)


+ c) minus log(π)

(6)


Gi(θ) =int infin

0

int infin

minusinfin

int infin










minus1 + nk





0= partc(θ)

partc

=sum

i ui =0

(ω

cminus 1 + ω

ti minus tui+ c

)

+ ω

c

sum

i


0= partc(θ)

partω

=sum

i ui =0

(1


(ti minus tui

+ c))

+(

1

ω+ log(c)

)sum

i




minus1K0sum



ω

(1 + ω)c= 1

L

sum

i ui =0

1

ti minus tui+ c

(10)

1

ω+ log(c) = 1

L

sum

i ui =0

log(ti minus tui

+ c) (11)








(n)EM

prob(n+1)(ui = j)


EM

))(

μ(n)k iisincell k

+iminus1sum

r=1


EM

))

(12)



n(n+1)k =

sum

iisincellkige2

(

1 minusiminus1sum

j=1

prob(n+1)(ui = j)

)

l(n+1)i =

sum

sgei+1

prob(n+1)(us = i)



Eθ

(n)

EM[c(θ)]

=sum

k

minus log(

(n

(n)k + 1


+sum

i

minus log(

(l(n)i + 1


+sum

ige2

iminus1sum

j=1

prob(n)(ui = j)




)




(n)k and l



ω(n)

(1 + ω(n))c(n)= 1

L(n)

sum

sge2

sminus1sum

r=1

prob(n)(us = r)

times 1


1

ω(n)+ log

(c(n)

) = 1

L(n)

sum

sge2

sminus1sum

r=1

prob(n)(us = r)


) (15)

where L(n) = sumi l



Algorithm 1





(n+1)EM =

arg maxθ Eθ

(n)

EM[c(θ)]


(n+1)kEM

= n(n+1)k and (5)






(n+1)EM

K0 a c(n+1)EM ω

(n+1)EM d

(n+1)EM ρ

(n+1)EM ) Find K

(n+1)0EM

and


0= partE

θ(n)

EM

[c

(θ (n+1)

)]partK0

= minus1K0

sum

i

(Gi

(θ (n+1)

) minus l(n+1)i

)

0= partE

θ(n)

EM

[c

(θ (n+1)

)]parta

= minussum

i

((Gi

(θ (n+1)

) minus l(n+1)i

)(mi minus M0)

)


(n+1)EM minus θ

(n)





θEM[c(θEM)] ge E

θEM[c(θnum)] [see





True values 8000 3050 2303 01000 500 01500 800











(a) (b)














K0 a c ω d ρ



(see Fig 6)

















REFERENCES




















































