
Seismic Tomography FS2011, S. Husen

Seismic Tomography, Fall semester 2011

[Figure: P-wave velocity cross-sections through the Alps. Color scale: P-wave velocity, 3 to 9 km/s, with contours at 6, 6.5, 7, 7.5, and 8 km/s. Axes: depth (km) and elevation (m). Profile ends: N, S, SW, NE. Labels: Ivrea body, European Moho, Adriatic Moho, Central Alps, Apennines, Western Alps, Eastern Alps.]

Stephan Husen

02.11.2011
- Theory: Setting up the inverse problem
- LSQR solution
- Damping


Recap: Travel times in seismic tomography

Each distinct phase can be associated with a certain travel path.

"Travel time" of certain seismic phases vs. epicentral distance: each distinct phase has a certain travel time.

Travel time tomography


Recap: Travel time tomography

In travel time tomography we compare observed travel times against computed travel times.

We change the model to minimize the difference between observed and computed travel times.

Travel time tomography is an inverse problem!

BIJWAARD ET AL.: GLOBAL TRAVEL TIME TOMOGRAPHY 30,057

recent data from the U.S. Geological Survey's National Earthquake Information Center and from several temporary deployments of seismic stations [Engdahl et al., 1998]. The data processing included phase reidentification, travel time recalculation through the ak135 reference model [Kennett et al., 1995], and hypocenter redetermination (including P and S, PKiKP, PKPdf, and the teleseismic depth phases pP, pwP, and sP in the location procedure). The data set comprises over 82,000 well-constrained earthquakes and a total of 12 million first- and later-arriving seismic phases observed in the period 1964-1995. Engdahl et al. [1998] conclude from regionally systematic location shifts and a reduction of scatter in Wadati-Benioff zone seismicity that hypocenter locations have been significantly improved. Furthermore, density plots of travel time residuals against epicentral distance no longer display the well-known dependence of ISC delay times on epicentral distance that indicates deviations of the reference model velocities from the layer-averaged real Earth. From this database, we select 7.6 million teleseismic (i.e., from epicentral distances larger than 25°) P, pP, and pwP data with travel time residuals between -3.5 and +3.5 s and regional (< 25°) P phases with absolute residuals smaller than 7.5 s. For pP and pwP phases, only events with hypocenters deeper than 35 km are included. All data are corrected for the Earth's ellipticity and station elevations, and the pP and pwP data are also corrected for bounce point topography and water depth, respectively.

The data set of Engdahl et al. [1998] has also been used for tomographic purposes by van der Hilst et al. [1997], who grouped the P and pP data into summary rays. In general, the use of ray bundles reduces the number of data and thus leads to a smaller inverse problem. Furthermore, the combined data are distributed more equally through the model, and the signal-to-noise ratio is increased. We therefore bundle the rays as well, but we combine the data into composite rays [Spakman and Nolet, 1988] constructed from 1 to 220 individual rays (the maximum number of single rays in the data set that fall within a ray bundle). A summary ray is a single ray that represents some average of a ray bundle; a composite ray is a ray bundle forged from all rays from an event cluster volume of, in our case, 30 x 30 x 30 km to a single station. A summary ray intersects only the cells traversed by the single ray it is represented by, whereas a composite ray intersects all cells traversed by rays in the ray bundle. We calculate composite ray residuals as the average of the original delays, allowing rays with sharp initial onsets to weigh twice as heavily as those with emergent onsets. Furthermore, P (and Pn), pP, and pwP phases are weighted with their specific standard deviations as determined from the raw data (1.3, 1.4, and 1.4 s, respectively). The small event cluster volume of our composite rays leads to many more, narrow ray bundles (4.7 million) than the number of summary rays that have been used by van der Hilst et al. [1997] (namely, 500,000). This prevents the averaging of data over large ray bundle volumes, which is a prerequisite for achieving small-scale resolution.
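The composite ray residual described in the excerpt (an average of the individual delays in which sharp onsets count twice as much as emergent ones) can be sketched as follows. This is only an illustration: the function name, the onset labels `"impulsive"` and `"emergent"`, and the delay values are assumptions and are not taken from Bijwaard et al.

```python
import numpy as np

def composite_residual(delays, onsets):
    """Weighted average delay of one ray bundle (event cluster -> station).

    delays : individual travel time residuals in seconds
    onsets : "impulsive" (sharp onset) or "emergent" for each ray
    """
    delays = np.asarray(delays, dtype=float)
    # Sharp (impulsive) onsets weigh twice as heavily as emergent ones.
    weights = np.array([2.0 if o == "impulsive" else 1.0 for o in onsets])
    return np.sum(weights * delays) / np.sum(weights)

# Hypothetical bundle of three rays
dt_bundle = composite_residual([0.4, 1.0, -0.2],
                               ["impulsive", "emergent", "emergent"])
print(dt_bundle)
```

The additional phase-dependent weighting by standard deviation and the ray bundle weight of equation (1) are not included in this sketch.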

The clustering of the 7.6 million single rays into 4.7 million composite rays divided over 34,000 event clusters reduces the data variance by 16.7%. In order to adapt to the different ray bundle sizes, we weighted the data used prior to inversion:

[Equation (1): definition of the ray bundle weight $w_{rb}$; not recoverable from the extracted text.]

where $w_{rb}$ represents the ray bundle weight, $w_i$ is the weight of ray $i$ in the ray bundle, $dt_i$ is the delay of ray $i$, and $dt$ is the average delay of the ray bundle. The total weights of the different composite rays were, however, restricted to vary over one order of magnitude only. Residual density plots for the data used are shown in Figures 1a and 1c. Although the data distribution is flat, we still observe small fluctuations in the averages as a function of epicentral distance (denoted by the dashed white lines), indicating small deviations of the reference model from the best possible average for the data used.

[Figure 1 panels: residual density versus epicentral distance (0 to 100 degrees); density scales labeled 7000 composite rays and 200 composite rays.]

Figure 1. Density plots of (a) P delays before inversion, (b) P delays after inversion, (c) pP delays before inversion, and (d) pP delays after inversion. Dashed white lines denote average residual per epicentral distance.

Density plots of (a) P delays before inversion and (b) P delays after inversion. Note the decrease in P delays after inversion (from Bijwaard et al., JGR 1998).


Theory: Seismic arrival time

Travel time tomography uses mostly first-arriving body waves (P- and/or S-waves).

[Figure: seismogram at station PLONS, ∆ ≈ 100 km, with the first-arriving P-wave marked at arrival time $T_i$.]





How can we relate arrival times to tomographic models?




Arrival time $T_i$ of seismic wave

$T_i = f(h_n, m_k)$ (non-linear function)

$h_n$: hypocentral parameters ($n = 1, \dots, 4$)
$m_k$: seismic velocities ($k = 1, \dots, k_{tot}$)
$t_0$: origin time

$$T_i = t_0 + \int_{\text{raypath}} \frac{1}{v(r(s))}\, ds$$

The raypath includes the unknown hypocenter location!
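As a minimal sketch, the raypath integral can be approximated by a sum of segment length divided by velocity over the cells a ray crosses. The function name, the segment lengths, the velocities, and the origin time below are hypothetical values chosen for illustration; real codes obtain the segments from ray tracing.

```python
import numpy as np

def arrival_time(t0, segment_lengths_km, velocities_km_s):
    """T = t0 + sum(ds / v): discrete form of the raypath integral."""
    ds = np.asarray(segment_lengths_km, dtype=float)
    v = np.asarray(velocities_km_s, dtype=float)
    return t0 + np.sum(ds / v)

# Hypothetical ray: 10 km in 5 km/s rock, then 12 km in 6 km/s rock,
# for an earthquake with origin time t0 = 3 s.
T = arrival_time(3.0, [10.0, 12.0], [5.0, 6.0])
print(T)  # 3 s origin time + 2 s + 2 s of travel time
```

Because the segment lengths depend on where the ray starts, a change in hypocenter changes the sum as well, which is why the arrival time is a non-linear function of both hypocenter and velocities.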



Theory: Travel time residual

Observed: $T_i^{obs} = f(h_n, m_k) = t_0 + t_i^{obs}$

Calculated: $t_i^{calc} = f(h_n^{est}, m_k^{est})$

We define the travel time residual as:

$$t_i = t_0 + t_i^{obs} - t_i^{calc}$$


The goal in seismic arrival time tomography is to minimize the travel time residuals by changing the model parameters!


Theory: Linearization and normal equations

Taylor series expansion gives a linear relation between travel time residual and model parameters:

$$t_i = \sum_{n=1}^{4} \frac{\partial f(h_n, m_k)}{\partial h_n}\,\Delta h_n + \sum_{k=1}^{k_{tot}} \frac{\partial f(h_n, m_k)}{\partial m_k}\,\Delta m_k + e$$

$\partial f/\partial h_n$, $\partial f/\partial m_k$: partial derivatives (computed by solving the forward problem)

$\Delta h_n$, $\Delta m_k$: model adjustments (searched for)

In matrix notation (many observations):

t = Hh + Mm + e = Ad + e

d: model adjustments (searched for)

normal equations
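The linear relation can be checked numerically in the simplest possible setting. The sketch below assumes a homogeneous medium with a single velocity (so $k_{tot} = 1$) and straight rays; the station position, starting hypocenter, and perturbation values are hypothetical. It builds one row of A = [H | M] from analytic partial derivatives and compares the predicted residual A d with the exact change in arrival time.

```python
import numpy as np

def forward(h, v, station):
    """Arrival time for hypocenter h = (x, y, z, t0) in a homogeneous medium."""
    dist = np.linalg.norm(h[:3] - station)
    return h[3] + dist / v

def partials(h, v, station):
    """One row of A = [H | M]: dT/d(x, y, z, t0) and dT/dv."""
    diff = h[:3] - station
    dist = np.linalg.norm(diff)
    H_row = np.append(diff / (v * dist), 1.0)   # hypocenter part
    M_row = np.array([-dist / v**2])            # velocity part
    return np.concatenate([H_row, M_row])

station = np.array([30.0, 10.0, 0.0])           # km, hypothetical
h_est = np.array([0.0, 0.0, 8.0, 0.0])          # x, y, z (km), t0 (s)
v_est = 6.0                                     # km/s

# Small model adjustments d = (dh, dm) and the residual they predict.
d = np.array([0.2, -0.1, 0.3, 0.05, 0.02])
t_linear = partials(h_est, v_est, station) @ d
t_exact = (forward(h_est + d[:4], v_est + d[4], station)
           - forward(h_est, v_est, station))
print(t_linear, t_exact)
```

The two numbers agree only to first order; the mismatch is the part of the error term e that comes from neglecting higher-order terms, which is why the inversion has to be iterated.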


Theory: Hypocenter-Velocity Coupling

Coupling between hypocenter locations and seismic velocities demands simultaneous inversion:

t = Hh + Mm + e

Special cases:

Earthquake location (ignoring velocity part): t = Hh + e

Teleseismic tomography (ignoring hypocenter part): t = Mm + e



a fault across which velocity changes discontinuously from 5.0 to 6.0 km s⁻¹. If a hypothetical earthquake occurs on the fault and a homogeneous velocity model of 5.5 km s⁻¹ is used to locate the event, its calculated location would be 0.75 km to the right of the fault. If we then invert for a laterally heterogeneous velocity model using least squares, placing a discontinuity at the fault (using a priori information), but ignore the hypocenter–velocity structure coupling (i.e., keep the event location fixed), we obtain a model with a velocity of 5.44 km s⁻¹ to the left of the fault and 5.56 km s⁻¹ to the right. In contrast, if we perform a simultaneous inversion for structure and location, the earthquake is relocated to within 0.06 km of the fault, and we obtain a model with a velocity of 5.04 km s⁻¹ to the left of the fault and 6.06 km s⁻¹ to the right. Thus the velocity-only inversion underestimates the velocity contrast by nearly an order of magnitude, while the simultaneous inversion recovers the true structure to within 1% and determines the event location within 100 m. The conclusion is that for LET, solving the full system of equations is critical in order to obtain an unbiased solution.

Although many tomography algorithms utilize the fast sparse-matrix solvers such as LSQR, it is worth noting the historical use of subspace methods (e.g., Kennett et al., 1988) for efficient inversion procedures, and in particular for dealing with matrix size issues when many earthquakes are included in an inversion. For LET, three groups independently and nearly simultaneously published comparable methods for separating hypocenter parameters from velocity model parameters, allowing the efficient solution of smaller matrix problems in place of one giant problem (Pavlis and Booker, 1980; Spencer and Gubbins, 1980; Rodi et al., 1981). As a motivation for this approach, consider an LET problem with 10 000 earthquakes observed on average at 50 stations, and with 20 000 model parameters. The full system matrix would be of size (50 × 10 000) by (4 × 10 000 + 20 000), or 500 000 by 60 000. If we take advantage of the annulling procedure of Pavlis and Booker (1980), for example, we can decompose the coupled hypocenter–structure equations for each earthquake i,

$$H_i\,dh_i + S_i\,ds_i = r_i \qquad (47)$$

where as before $H_i$ and $S_i$ are the matrices of derivatives of arrival times with respect to hypocenter and model parameters, respectively (now for a single earthquake), and $dh_i$ and $ds_i$ are the corresponding parameter perturbations. Using the orthogonal matrix $U_{0i}$ that satisfies $U_{0i} H_i = 0$ allows us to assemble the decoupled equations

$$U_{0i} H_i\,dh_i + U_{0i} S_i\,ds_i = U_{0i}\,r_i \quad \text{or} \quad S'_i\,ds_i = r'_i \qquad (48)$$

The matrix in the original partial system in [47] would be of size 50 by 20 004, whereas the matrix in the decoupled system in [48] would be of size 46 by 20 000. If we treat all the events this way, the result is a system of equations of size 460 000 by 20 000, a substantial size reduction achieved at relatively low computational cost. Even greater reduction of the problem size can be achieved by incrementally constructing the normal equations, $S'^T S'\,ds = S'^T r'$ (Spencer and Gubbins, 1980), resulting in a system of equations that is only 20 000 by 20 000. Unfortunately, the cost of this last step is a squaring of the condition number of the system matrix, and a loss of matrix sparseness, but the price may be worth paying in some cases.
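The decoupling step in [47] and [48] can be sketched for a single earthquake. The example below is an assumption-laden illustration, not the procedure as implemented by Pavlis and Booker: it uses random numbers in place of real derivatives, a small model of 200 parameters instead of 20 000, and a complete QR factorization as one convenient way to obtain a matrix whose columns are orthogonal to the columns of H.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_model = 50, 200       # 50 arrivals for one event; small toy model

H = rng.normal(size=(n_obs, 4))          # hypocenter derivatives (toy values)
S = rng.normal(size=(n_obs, n_model))    # model derivatives (toy values)
r = rng.normal(size=n_obs)               # residuals (toy values)

# Complete QR: the first 4 columns of Q span the column space of H,
# the remaining n_obs - 4 columns are orthogonal to it.
Q, _ = np.linalg.qr(H, mode="complete")
U0 = Q[:, 4:]                            # shape (50, 46)

S_prime = U0.T @ S                       # decoupled system S' ds = r'
r_prime = U0.T @ r

print(np.abs(U0.T @ H).max())            # numerically zero
print(S_prime.shape)                     # four rows fewer than S
```

Each event loses four rows (one per hypocentral parameter) and all four hypocenter columns, which is where the reduction from 500 000 by 60 000 to 460 000 by 20 000 in the example above comes from.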

1.10.4.4 Static (Station) Corrections Revisited

Static (station) corrections, a constant time delay or advance applied for a given station, are commonly

[Figure 5 layout: Stations 1 to 4 at X = –10 km, –5 km, +6 km, and +12 km; true earthquake location on the fault at X = 0 km; Vp = 5 km s⁻¹ on the –X side and Vp = 6 km s⁻¹ on the +X side.]

Figure 5 Hypothetical 1-D Earth with a velocity discontinuity across a fault at X = 0.0 km, four seismic stations, and an earthquake occurring on the fault. From Thurber CH (1992) Hypocenter–velocity structure coupling in local earthquake tomography. Special Issue: Lateral Heterogeneity and Earthquake Location. Physics of the Earth and Planetary Interiors 75: 55–62.

342 Seismic Tomography and Inverse Methods

Synthetic experiment (Thurber, 1992):

[Figure: 1-D model with a fault at X = 0 km, stations at –10 km, –5 km, +6 km, and +12 km, and the true earthquake location on the fault.]

Relocation with constant velocity (Vp = 5.5 km/s):
-> 0.75 km to the right of the fault

Estimating seismic velocities using the relocated hypocenter location yields:
-> Vp = 5.44 km/s to the left of the fault and Vp = 5.56 km/s to the right of the fault

Simultaneous inversion yields:
-> Vp = 5.04 km/s to the left of the fault and Vp = 6.06 km/s to the right of the fault; hypocenter location within 0.06 km of the fault

Neglecting hypocenter-velocity coupling will bias estimated seismic velocities!
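The first two steps of the synthetic experiment can be reproduced with a few lines of NumPy. The sketch assumes noise-free travel times, straight 1-D rays, an unweighted and undamped least-squares fit, and an origin time that is solved for during relocation and then held fixed; these choices are assumptions of this illustration and may differ from the setup in Thurber (1992).

```python
import numpy as np

stations = np.array([-10.0, -5.0, 6.0, 12.0])        # km, fault at x = 0
v_left, v_right = 5.0, 6.0                            # true velocities, km/s
t_obs = np.where(stations < 0, -stations / v_left, stations / v_right)

# Step 1: relocate with a homogeneous 5.5 km/s model (Gauss-Newton on x, t0).
v_hom, x, t0 = 5.5, 0.0, 0.0
for _ in range(10):
    res = t_obs - (t0 + np.abs(stations - x) / v_hom)
    G = np.column_stack([-np.sign(stations - x) / v_hom, np.ones(4)])
    dx, dt0 = np.linalg.lstsq(G, res, rcond=None)[0]
    x, t0 = x + dx, t0 + dt0

# Step 2: keep (x, t0) fixed and solve for the slowness on each side.
# Each row holds the path length a ray spends left and right of the fault.
lo, hi = np.minimum(stations, x), np.maximum(stations, x)
len_left = np.clip(np.minimum(hi, 0.0) - lo, 0.0, None)
len_right = np.clip(hi - np.maximum(lo, 0.0), 0.0, None)
L = np.column_stack([len_left, len_right])
slowness = np.linalg.lstsq(L, t_obs - t0, rcond=None)[0]

print(round(x, 2), np.round(1.0 / slowness, 2))
```

Under these assumptions the event is placed 0.75 km to the right of the fault, and the velocity-only inversion gives about 5.44 and 5.56 km/s, in line with the values quoted in the book excerpt. The mislocation is absorbed into the velocities, so the true contrast of 1 km/s shrinks to roughly 0.1 km/s.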


Theory: Least-square solution

Solution of coupled hypocenter-velocity problem

t = Ad + e

Least-square (L2) solution (e.g. Menke, 1989): minimize $e^T e$ in the L2 sense:

$$\|e\|_2^2 = \sum_{i=1}^{n} |e_i|^2 = e^T e$$

$d = (A^T A)^{-1} A^T t$ (general solution)

Does $(A^T A)^{-1}$ always exist?

No: the inverse does not exist if more than one solution fits the data equally well.




[Figure: ray-coverage sketches for three cases: over-determined (velocity v), under-determined (velocity v), and mixed-determined (velocities v1, v2).]

The coupled hypocenter-velocity problem is mixed-determined.



Solution of the mixed-determined coupled hypocenter-velocity problem:

$d = (A^T A)^{-1} A^T t$

Ideally, we would like to sort the unknown model parameters into two groups: those that are overdetermined and those that are underdetermined.

-> Singular-Value Decomposition
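A toy example shows how the singular-value decomposition separates the two groups. The 3-by-3 matrix below is a made-up illustration: the first parameter is observed twice on its own, while the second and third only ever appear as a sum.

```python
import numpy as np

# Toy system: parameter 0 is sampled twice on its own (over-determined),
# parameters 1 and 2 only appear as a sum (under-determined).
A = np.array([[1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])

U, sing, Vt = np.linalg.svd(A)
tol = 1e-10 * sing.max()
rank = int(np.sum(sing > tol))

print(sing)                 # one singular value is (numerically) zero
print(rank)                 # rank below the number of unknowns: A^T A is singular
print(Vt[rank:])            # null-space direction: trade-off between 1 and 2
```

The null-space vector has no component along parameter 0, so that parameter is fully resolved, while any amount can be moved between parameters 1 and 2 without changing the predicted data.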


Theory: Damped least-square solution

Singular-Value Decomposition is computationally intensive and only feasible for small matrices.

Approximation (Menke, 1989):

minimize $\psi = e^T e + \Theta^2 d^T d$

$d = (A^T A + \Theta^2 I)^{-1} A^T t$ (damped least-square solution)

Damping parameter $\Theta^2$

Case 1: $\Theta^2 = 0$ (underdamped solution)

We minimize the error vector e but not the solution vector d
-> no solution for underdetermined parameters

Case 2: $\Theta^2$ is large (overdamped solution)

We minimize the solution vector d for both over- and underdetermined parameters
-> solution does not reflect smallest e
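The two limiting cases can be explored with a small sketch of the damped least-squares formula. The matrix and data are made-up toy values (one well-constrained parameter and two parameters that are only constrained as a sum), and `damped_lsq` is a hypothetical helper name.

```python
import numpy as np

def damped_lsq(A, t, theta2):
    """d = (A^T A + theta^2 I)^-1 A^T t."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + theta2 * np.eye(n), A.T @ t)

A = np.array([[1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])
t = np.array([1.0, 1.2, 2.0])

for theta2 in (1e-3, 1.0, 1e3):
    d = damped_lsq(A, t, theta2)
    e = t - A @ d
    print(theta2, np.round(d, 3), round(float(e @ e), 3))
```

With small damping the well-constrained parameter sits close to its least-squares value and the under-determined pair is split evenly, which is the shortest solution that fits the data. With large damping all parameters are pushed toward zero and the misfit grows.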


Theory: Effect of damping on the solution

Test with synthetic data (Switzerland)

[Figure: input model and inversion results for damping = 5000, damping = 300, and damping = 10.]

Damping affects amplitudes and recovery of shape!

How to determine the correct damping?


Theory: Damping trade-off curve (L-curve)

Multiple inversions (single iterations) with varying damping (Eberhart-Phillips, 1986):

[Figure: trade-off curve of data fit versus model complexity.]

Best damping: smoothest model that fits the data best.

This approach assumes that the solution converged after one iteration!
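A trade-off curve can be sketched by repeating a damped inversion for a range of damping values and recording data misfit against model norm. The example below uses a random linear toy problem, so each "inversion" is a single linear solve; the matrix, noise level, and damping values are assumptions for illustration and are unrelated to the data sets cited on the slide.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 10))
A[:, 7:] *= 1e-3                       # poorly sampled parameters (toy)
d_true = rng.normal(size=10)
t = A @ d_true + 0.05 * rng.normal(size=30)

dampings = [1e-4, 1e-2, 1.0, 1e2, 1e4]
misfit, model_norm = [], []
for theta2 in dampings:
    d = np.linalg.solve(A.T @ A + theta2 * np.eye(10), A.T @ t)
    e = t - A @ d
    misfit.append(float(e @ e))         # data fit axis
    model_norm.append(float(d @ d))     # model complexity axis

for row in zip(dampings, misfit, model_norm):
    print("%8.0e  misfit=%.4f  model norm=%.4f" % row)
```

Plotting misfit against model norm gives the L-shaped curve; a damping value near the corner is usually chosen, since it reduces the misfit substantially without a strong increase in model complexity.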


Theory: Damping trade-off curves and convergence

(Diehl et al., 2009)

Multiple inversions (several iterations) with varying damping:

convergence of solution















Damping and convergence of solution are closely linked!