
CWP-679

A projected Hessian for full waveform inversion

Yong Ma & Dave Hale
Center for Wave Phenomena, Colorado School of Mines, Golden, CO 80401, USA

Figure 1. Update directions for one iteration of the conjugate gradient method (a), the image-guided conjugate gradient method (b), and a quasi-Newton method with application of the inverse projected Hessian (c).

ABSTRACT

A Hessian matrix in full waveform inversion (FWI) is difficult to compute directly because of high computational cost and an especially large memory requirement. Therefore, Newton-like methods are rarely feasible in realistic large-size FWI problems. We modify the BFGS method to use a projected Hessian matrix that reduces both the computational cost and the memory required, thereby making a quasi-Newton solution to FWI feasible.

Key words: projected Hessian matrix, BFGS method

1 INTRODUCTION

Full waveform inversion (FWI) (Tarantola, 1984; Pratt, 1999) is usually formulated as an optimization problem (Symes, 2008), in which we consider the minimization of a nonlinear objective function E: R^N → R,

min_{m ∈ R^N} E(m) ,   (1)

where the function variable m denotes N parameters, such as seismic velocities, for an earth model. In FWI, the objective function often takes a least-squares form: E(m) = (1/2) ||d − F(m)||^2, where ||·|| denotes an L2 norm, d denotes the recorded data, and F is a forward data-synthesizing operator, a nonlinear function of the model m.
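For concreteness, this least-squares objective can be sketched in a few lines of numpy; F below is a placeholder for the wave-equation modeling operator, which the sketch does not implement:

    import numpy as np

    def objective(F, m, d):
        # E(m) = 0.5 ||d - F(m)||^2, with F a stand-in forward operator,
        # m the vector of N model parameters, and d the recorded data
        r = d - F(m)
        return 0.5 * float(np.dot(r, r))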

Various iterative methods (Nocedal and Wright, 2000; Tarantola, 2005) for solving this least-squares optimization problem include the conjugate gradient method, Newton's method, the Gauss-Newton method, and various quasi-Newton methods. Compared to the conjugate gradient method, Newton's method and other Newton-like methods generally converge in fewer iterations. However, Newton-like methods are rarely used to solve realistic seismic full waveform problems, because for such large problems they are costly. Newton-like methods, especially Newton's method, are suitable only for solving small- or medium-size optimization problems (Pratt et al., 1998; Sheen et al., 2006; Virieux and Operto, 2009), in which the number N of model parameters ranges from hundreds to thousands.

Newton’s method must compute the 1st and 2ndorder derivatives of the objective function E (m) withrespect to model parameters in m. We refer to these twoderivatives as the gradient g (m) ≡ ∂E

∂mand the Hes-

sian H (m) ≡ ∂2E∂m2 , respectively. The Hessian matrix,

in particular, consists of large numbers O(N2) of 2nd

Page 2: A projected Hessian for full waveform inversion

14 Y. Ma & D. Hale

derivatives. To compute each 2nd derivative in the moststraightforward way, we must solve one forward problemF (m). Therefore, the Hessian matrix computed in thisway requires the solution of O(N2) forward problems.Pratt et al. (1998) show a more efficient method thatreduces the number of required forward solutions fromO(N2) to O(N), which is still too costly in realisticlarge-size FWI problems. Moreover, the Hessian matrixconsumes large amounts of memory. In a model with Nparameters, a Hessian matrix stored in single precisionrequires 4N2 bytes of memory.

Because of the high cost of Newton's method for large problems, various methodologies have been proposed to approximate the Hessian matrix. The Gauss-Newton method, for example, ignores 2nd-order terms that account for nonlinearity in the function F(m). This approximation is valid only when the forward operator F is a linear or quasi-linear function of m. If the initial model in FWI is far from the true model and nonlinearity is significant, the Gauss-Newton method may not converge.

Quasi-Newton methods do not compute the Hessian matrix explicitly, but instead iteratively update Hessian approximations. The BFGS method (Broyden, 1970; Fletcher, 1970; Goldfarb, 1970; Shanno, 1970) is an efficient way to update the Hessian matrix. However, although the BFGS method reduces the computation time required to approximate a Hessian matrix, it does not reduce the computation time required to use the Hessian matrix to update model parameters, nor does it decrease the amount of memory required to store Hessian approximations. In practical applications of quasi-Newton methods for FWI, we must reduce both computation time and memory consumption significantly.

In this paper, we first review various Newton-like methods. We then pose FWI in a sparse model space and introduce a projected Hessian matrix in that space. We obtain the projected Hessian by using a projected BFGS method, a version of BFGS in the sparse model space. A projected Hessian matrix built in this way significantly reduces both computation time and memory consumption. Tests of our projected Hessian on the Marmousi II model suggest that quasi-Newton methods like ours may be promising in practical applications of FWI.

2 NEWTON-LIKE METHODS

We consider only iterative methods for solution of the nonlinear FWI optimization problem. In the ith iteration of such methods, we update the model m in the direction of a vector p_i:

m_{i+1} = m_i + α_i p_i ,   (2)

for some step length α_i. Newton's method, the Gauss-Newton method, and quasi-Newton methods differ in the way in which they compute and use the update vector p_i.

2.1 Newton’s method

In Newton’s method, the update vector pi is determinedby first ignoring the higher-order (> 2) terms in a Taylorseries approximation of the objective function E (m):

E (mi + δmi) ≈ E (mi) + δmTi gi

+1

2δmT

i Hiδmi , (3)

where E (mi) denotes the objective function evaluatedat mi, gi = g (mi), and Hi = H (mi).

For models m near the current model estimate m_i, equation 3 approximates the objective function with a quadratic function. We then minimize this approximated E(m) by solving a set of linear equations:

H_i δm_i = −g_i ,   (4)

with solution

δm_i = −H_i^{-1} g_i .   (5)

Therefore, the update direction in Newton's method is simply

p_i = −H_i^{-1} g_i ,   (6)

and the step length α_i = 1.

However, for models with large numbers of parameters, the high cost of computing the Hessian matrix H_i prevents the application of Newton's method. To analyze the cost, let us explicitly write each element of H_i as

H_{jk} = ∂²E / (∂m_j ∂m_k) ,   (7)

where j and k are parameter indices in the model m, ranging from 1 to the total number N of model parameters.
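To make the O(N²) count concrete, the following sketch approximates every Hessian element by finite differences (illustrative only; FWI codes compute derivatives with adjoint-state methods, not the brute-force differencing shown here):

    import numpy as np

    def fd_hessian(E, m, h=1e-4):
        # Approximate H_jk = d2E / (dm_j dm_k) for all j, k (equation 7).
        # The double loop needs O(N^2) objective evaluations, and each
        # evaluation of E requires a forward solve F(m): hopeless for
        # N = 500000, which is the point of this sketch.
        N = m.size
        H = np.zeros((N, N))
        E0 = E(m)
        for j in range(N):
            for k in range(j, N):          # exploit symmetry H_jk = H_kj
                ej = np.zeros(N); ej[j] = h
                ek = np.zeros(N); ek[k] = h
                Hjk = (E(m + ej + ek) - E(m + ej) - E(m + ek) + E0) / h**2
                H[j, k] = H[k, j] = Hjk
        return H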

If, say, N = 500000 in a 2D FWI problem, the Hessian H_i is a 500000 × 500000 matrix. To compute this Hessian matrix in the most straightforward way, we must solve 2.5 × 10^11 forward problems F(m) in each iteration of FWI. Even using the more efficient way in Pratt et al. (1998), we can only reduce this number to 500000 per iteration. For FWI, which normally requires multiple iterations, this cost is prohibitive. Furthermore, the memory required to store this Hessian matrix in single precision is 1 terabyte, or about 1/2 terabyte considering symmetry in the Hessian.

2.2 Gauss-Newton method

Expanding equation 7 by the chain rule gives H_{jk} = J_j^T J_k plus a 2nd-order term involving second derivatives of F. Discarding that second term, we get an approximated Hessian H_a, with elements

H_{jk} ≈ J_j^T J_k ,   (8)


where J_j and J_k are the jth and kth columns of the Jacobian matrix [∂F(m)/∂m_1, ∂F(m)/∂m_2, ..., ∂F(m)/∂m_N], respectively. Using this approximation we obtain the Gauss-Newton method with update direction

p_i = −H_a^{-1} g_i .   (9)

Equation 8 shows that the cost of computing H_a is equivalent to that of computing the gradient g_i. Pratt et al. (1998) show an efficient approach for calculating g_i, in which only 2 forward data syntheses F(m) are necessary. However, the approximation H_a in effect assumes that the function F(m_i + δm_i) is linear with respect to the difference δm_i between the true model m and the current model m_i. In practice this assumption is seldom realistic.
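A minimal sketch of equations 8 and 9, assuming the Jacobian J has been formed explicitly (itself rarely affordable in FWI):

    import numpy as np

    def gauss_newton_direction(J, r):
        # J: Jacobian of F at the current model, shape (ndata, N);
        #    column k is dF/dm_k.  r: data residual d - F(m).
        Ha = J.T @ J                 # approximated Hessian, equation 8
        g = -J.T @ r                 # gradient of E(m) = 0.5 ||d - F(m)||^2
        # Ha is often ill-conditioned; lstsq is a safer solve in a sketch
        p, *_ = np.linalg.lstsq(Ha, -g, rcond=None)
        return p                     # p = -Ha^{-1} g, equation 9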

2.3 Quasi-Newton method

A quasi-Newton method iteratively approximates the Hessian matrix. The BFGS method (Broyden, 1970; Fletcher, 1970; Goldfarb, 1970; Shanno, 1970) is currently considered the most popular (Nocedal and Wright, 2000) quasi-Newton update formula. The basic idea of BFGS is to update the Hessian using changes in both the model m_i and the gradient g_i, from one iteration to the next:

H_{i+1} = H_i + (y_i y_i^T) / (y_i^T δm_i) − (H_i δm_i (H_i δm_i)^T) / (δm_i^T H_i δm_i) ,   (10)

where y_i = g_{i+1} − g_i and δm_i = m_{i+1} − m_i.

The BFGS method normally begins with an identity matrix H_0 = I in the first iteration, so that the first iteration of BFGS is similar to a gradient-descent method. Pseudo-code for implementing the BFGS method is as follows (Nocedal and Wright, 2000):

given m_0, g_0 = ∂E/∂m_0, and H_0 = I
for i = 0, 1, 2, ..., until convergence do
    p_i = −H_i^{-1} g_i
    search for α_i ; δm_i = α_i p_i
    m_{i+1} = m_i + δm_i
    g_{i+1} = ∇E(m_{i+1})
    y_i = g_{i+1} − g_i
    H_{i+1} = H_i + (y_i y_i^T) / (y_i^T δm_i) − (H_i δm_i (H_i δm_i)^T) / (δm_i^T H_i δm_i)
end for
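The pseudo-code translates almost line for line into numpy. The sketch below replaces the line search with a fixed step length, which is safe only on the convex toy problem shown; a real implementation must choose the step length as in section 3.3:

    import numpy as np

    def bfgs(grad, m, n_iter=20, alpha=0.5):
        # Classic BFGS with the update of equation 10; alpha is a fixed
        # stand-in for the line search, so y^T dm > 0 holds here only
        # because the toy objective below is convex quadratic.
        H = np.eye(m.size)                  # H_0 = I
        g = grad(m)
        for _ in range(n_iter):
            p = -np.linalg.solve(H, g)      # p_i = -H_i^{-1} g_i
            dm = alpha * p
            m = m + dm
            g_new = grad(m)
            y = g_new - g
            Hdm = H @ dm                    # rank-two update, equation 10
            H = H + np.outer(y, y) / (y @ dm) - np.outer(Hdm, Hdm) / (dm @ Hdm)
            g = g_new
        return m, H

    # toy quadratic E(m) = 0.5 m^T A m, whose true Hessian is A
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    m_est, H_est = bfgs(lambda m: A @ m, np.array([1.0, 1.0]))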

The BFGS method is an efficient way to compute Hessian approximations. Like the Gauss-Newton method, BFGS requires computation of only gradients g_i, with the cost of only 2 forward modelings F(m). However, unlike the Gauss-Newton method, BFGS does not simply ignore 2nd-derivative terms in the Hessian. Instead, the BFGS method uses differences in models and gradients between iterations to construct a complete Hessian approximation.

Unfortunately, the BFGS method does not reduce the O(N³) cost of solving the set of linear equations (equation 4) for the update vector p_i, nor does it reduce the O(N²) memory required to store the Hessian approximation.

3 PROJECTED HESSIAN MATRIX

The only way to reduce these costs is to reduce the number N of model parameters. For this purpose, we introduce a projected Hessian matrix in a sparse model space.

3.1 Sparse optimization

In an approach similar to that used in subspace methods (Kennett et al., 1988; Oldenburg et al., 1993) and alternative parameterizations (Pratt et al., 1998, Appendix A), we construct a finely and uniformly sampled (dense) model m from a sparse model s that contains a much smaller number n << N of model parameters:

m = Rs ,   (11)

where R is an interpolation operator that interpolates model parameters from the sparse model s to the dense model m. FWI is then reformulated as a new sparse optimization problem (Pratt et al., 1998, Appendix A; Ma et al., 2010), in which we minimize a new nonlinear objective function E: R^n → R,

min_{s ∈ R^n} E(Rs) .   (12)

In the optimization problem posed in equation 12, model parameters in m are not determined independently as in equation 1. Instead, the n sparse parameters in s constrain the other N − n parameters in m. This constraint is imposed by the interpolation operator R. Therefore, the optimization problem in equation 12 is equivalent to a linearly constrained optimization problem in the dense model space m, where the number of constraints is N − n. In the context of constrained optimization, the projected Hessian matrix is suggested by Gill et al. (1981) and several other authors.
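For intuition, R can be written as an explicit N × n matrix. The sketch below uses simple 1-D linear interpolation as a stand-in for the image-guided interpolation the paper actually employs (section 4.1); the adjoint R^T is then just the matrix transpose:

    import numpy as np

    def linear_interp_operator(sparse_x, dense_x):
        # Each row of R holds the linear-interpolation weights that map
        # values at the sparse locations sparse_x to one dense location.
        n, N = len(sparse_x), len(dense_x)
        R = np.zeros((N, n))
        for i, x in enumerate(dense_x):
            k = np.searchsorted(sparse_x, x).clip(1, n - 1)
            w = (x - sparse_x[k - 1]) / (sparse_x[k] - sparse_x[k - 1])
            R[i, k - 1], R[i, k] = 1 - w, w
        return R

    R = linear_interp_operator(np.linspace(0, 1, 5), np.linspace(0, 1, 50))
    s = np.random.rand(5)        # sparse model
    m = R @ s                    # dense model, equation 11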

We rewrite the objective function in equation 3 as

E(Rs_i + Rδs_i) ≈ E(Rs_i) + δs_i^T R^T g_i + (1/2) δs_i^T R^T H_i R δs_i ,   (13)

where we refer to R^T H_i R as a projected Hessian matrix (Gill et al., 1981), and R^T g_i as a projected gradient. Here R^T denotes the adjoint of the interpolation operator R.

For models s near the current model estimate s_i, equation 13 approximates the objective function with a quadratic function. We then minimize this approximated E(Rs) by solving a set of n linear equations:

R^T H_i R δs_i = −R^T g_i ,   (14)


with a solution for the n unknowns

δs_i = −(R^T H_i R)^{-1} R^T g_i .   (15)

Therefore, in Newton’s method the update directionvector becomes

pi = −(RTHiR

)−1

RTgi , (16)

for a step length αi = 1.The projected Hessian RTHiR is a n × n sym-

metric matrix. Figure 2 describes this projected Hes-sian matrix in a schematic fashion, where the tall andthin rectangle denotes the interpolation operator R, theshort and fat rectangle denotes the adjoint operatorRT , and the large and small squares denote the orig-inal and projected Hessian matrices, respectively. Thisschematic diagram illustrates how memory consumptionis reduced from O

(N2

)to O

(n2

). Computation is like-

wise reduced from O(N3

)to O

(n3

).
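The size bookkeeping is easy to verify with random matrices (for illustration only; the point of the method is that the dense H built below is never formed in practice):

    import numpy as np

    N, n = 2000, 50                        # dense and sparse model sizes
    R = np.random.rand(N, n)               # stand-in interpolation operator
    H = np.random.rand(N, N)
    H = 0.5 * (H + H.T)                    # a symmetric dense-space Hessian

    H_proj = R.T @ H @ R                   # projected Hessian, n x n
    print(H.shape, H.nbytes)               # (2000, 2000): 32 MB in float64
    print(H_proj.shape, H_proj.nbytes)     # (50, 50): 20 kB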

3.2 Projected BFGS method

A projected Hessian matrix can be approximated iteratively with an approach similar to the classic BFGS method. Apply the interpolation operator R and its adjoint operator R^T to both sides of equation 10 to obtain

R^T H_{i+1} R = R^T H_i R + (R^T y_i y_i^T R) / (y_i^T δm_i) − (R^T H_i δm_i (R^T H_i δm_i)^T) / (δm_i^T H_i δm_i) .   (17)

Using the simple relationship δm_i = Rδs_i, we can rewrite equation 17 as

R^T H_{i+1} R = R^T H_i R + (R^T y_i y_i^T R) / (y_i^T R δs_i) − (R^T H_i R δs_i (R^T H_i R δs_i)^T) / (δs_i^T R^T H_i R δs_i) .   (18)

Now simplify equation 18 by defining H̃_i ≡ R^T H_i R, g̃_i ≡ R^T g_i, and ỹ_i ≡ R^T y_i = g̃_{i+1} − g̃_i to obtain an update formula for the projected Hessian matrix:

H̃_{i+1} = H̃_i + (ỹ_i ỹ_i^T) / (ỹ_i^T δs_i) − (H̃_i δs_i (H̃_i δs_i)^T) / (δs_i^T H̃_i δs_i) .   (19)

We can further simplify equation 19 with δs_i = α_i p_i and H̃_i δs_i = −α_i g̃_i (Gill et al., 1981):

H̃_{i+1} = H̃_i + (ỹ_i ỹ_i^T) / (α_i ỹ_i^T p_i) + (g̃_i g̃_i^T) / (p_i^T g̃_i) .   (20)

Equation 20 has the same structure as equation 10, so we refer to this formula for updating the projected Hessian as the projected BFGS method, a version of the classic BFGS method in the sparse model space s. Pseudo-code for implementing this projected BFGS method is as follows:

given s_0, g̃_0 = R^T g_0, and H̃_0 = I
for i = 0, 1, 2, ..., until convergence do
    p_i = −H̃_i^{-1} g̃_i
    search for α_i ; δs_i = α_i p_i
    m_{i+1} = m_i + R δs_i
    g_{i+1} = ∇E(m_{i+1}) ; g̃_{i+1} = R^T g_{i+1}
    ỹ_i = g̃_{i+1} − g̃_i
    H̃_{i+1} = H̃_i + (ỹ_i ỹ_i^T) / (α_i ỹ_i^T p_i) + (g̃_i g̃_i^T) / (p_i^T g̃_i)   (equation 20)
end for
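In numpy, the projected iteration differs from classic BFGS only in where R and R^T appear; all Hessian work is n × n. A minimal sketch, again with a fixed step length standing in for the Wolfe line search the method requires:

    import numpy as np

    def projected_bfgs(grad, R, s0, n_iter=10, alpha=0.1):
        # grad: gradient of E with respect to the dense model m
        # R   : N x n interpolation matrix, m = R s
        n = s0.size
        Ht = np.eye(n)                      # projected Hessian ~H_0 = I
        m = R @ s0
        gt = R.T @ grad(m)                  # projected gradient ~g_0
        for _ in range(n_iter):
            p = -np.linalg.solve(Ht, gt)    # n x n solve: O(n^3), not O(N^3)
            ds = alpha * p                  # stand-in for a Wolfe line search
            m = m + R @ ds
            gt_new = R.T @ grad(m)
            yt = gt_new - gt
            # rank-two update in the sparse space, equation 20
            Ht = (Ht + np.outer(yt, yt) / (alpha * (yt @ p))
                     + np.outer(gt, gt) / (p @ gt))
            gt = gt_new
        return m, Ht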

3.3 Wolfe conditions

The projected Hessian matrix H̃ is symmetric positive-definite (SPD) if ỹ_i^T δs_i > 0 in equation 19. This SPD property is important for an optimization problem, because it guarantees that a quasi-Newton method that uses this projected Hessian converges to a (possibly local) minimum. The SPD property can be achieved if the step length α_i satisfies the sufficient descent condition

E(s_i + α_i p_i) ≡ E(α_i) ≤ E(0) + c_1 E′(0) α_i ,   (21)

and the curvature condition

E′(α_i) ≥ c_2 E′(0) ,   (22)

where E′(0) = g̃_i^T p_i and E′(α_i) = g̃_{i+1}^T p_i are directional derivatives. In equations 21 and 22, c_1 and c_2 are constants in (0, 1). Suggested values for c_1 and c_2 are 0.0001 and 0.9, respectively (Nocedal and Wright, 2000).

Together, equations 21 and 22 are referred to as the Wolfe conditions (Nocedal and Wright, 2000). If the curvature condition in equation 22 is replaced by

|E′(α_i)| ≤ c_2 |E′(0)| ,   (23)

then the Wolfe conditions become the strong Wolfe conditions used in this study.
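A minimal backtracking sketch that accepts a step only when equations 21 and 23 both hold. E1d and dE1d are hypothetical helpers that evaluate E and its directional derivative along p_i; note that pure backtracking cannot always satisfy the curvature condition, so practical implementations use the bracketing strategy in Nocedal and Wright (2000):

    def strong_wolfe_step(E1d, dE1d, alpha=1.0, c1=1e-4, c2=0.9, shrink=0.5):
        # E1d(a)  = E(s_i + a p_i), the objective along the direction p_i
        # dE1d(a) = E'(a), the directional derivative g^T p_i at step a
        E0, dE0 = E1d(0.0), dE1d(0.0)
        while alpha > 1e-10:
            sufficient = E1d(alpha) <= E0 + c1 * alpha * dE0   # equation 21
            curvature = abs(dE1d(alpha)) <= c2 * abs(dE0)      # equation 23
            if sufficient and curvature:
                return alpha
            alpha *= shrink
        return None                     # no acceptable step length found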


Figure 3. The Marmousi II velocity model (a) and an initial velocity model (b).

3.4 Cost of projected BFGS

In each iteration of the projected BFGS method, we perform at least one forward calculation F(m) and compute at least one gradient g. Therefore, in the ideal case, the cost of each iteration of the projected BFGS method is that of three forward calculations F(m).

Furthermore, if the number of parameters n in the sparse model space s is much smaller than the number N in the dense model m, the amount of memory required to store the projected Hessian is significantly reduced, as shown in Figure 2.

4 EXAMPLES

We test our projected BFGS method on the Marmousi II model. Figure 3a shows the true model m, and Figure 3b shows the initial model m_0, a highly smoothed version of the true model. We use 11 shots uniformly distributed on the surface, and a 15 Hz Ricker wavelet as the source for simulating wavefields. The source and receiver intervals are 0.76 km and 0.024 km, respectively. In this example, the dense model space has N = 391 × 1151 parameters. Therefore, computation or storage of the true Hessian matrix for this example is infeasible.

Figure 4. Structurally constrained sample selections. A total of 165 samples are chosen for our projected BFGS method.

4.1 Projected Hessian and its inverse

Our projected BFGS method poses the optimization problem in a sparse model space s. As described by Ma et al. (2011), we construct the sparse model s using a structurally constrained sample selection scheme. This selection method considers structures apparent in migrated images, and places samples along structural features.

Figure 4 shows a total of 165 scattered sample locations. This set of scattered sample locations is representative, because

(1) they lie between (not on) reflectors;
(2) they are distributed along structural features;
(3) there are more samples in structurally complex areas and fewer samples in simple areas.

The scattered locations together with corresponding values at these locations comprise a sparse model space s that will be used in the projected BFGS method. We employ image-guided interpolation (IGI) (Hale, 2009) as the operator R and the adjoint of image-guided interpolation (Ma et al., 2010) as the operator R^T. IGI interpolates values in the sparse model s to a uniformly sampled dense model m, and the interpolant makes good geological sense because it accounts for structures in the model.

Figure 5a shows the initial projected Hessian H̃_0, an identity matrix. Figure 5b shows the updated projected Hessian H̃_1 after one iteration of the projected BFGS method. The update H̃_1 − H̃_0 to the Hessian in this first iteration is shown separately in Figure 5c. As we can see, the projected BFGS method adds significant off-diagonal components to the initial Hessian H̃_0. Because our line search satisfies the strong Wolfe conditions, the inverse of the projected Hessian matrix exists. Figure 5d shows the inverse Hessian H̃_1^{-1}.


Figure 5. The initial Hessian H̃_0 (a), the Hessian matrix H̃_1 after the 1st iteration (b), the Hessian update H̃_1 − H̃_0 in the 1st iteration (c), and the inverse Hessian H̃_1^{-1} (d).

4.2 Eigenvalues and eigenvectors

Note that the last two terms on the right-hand side of equation 19 are rank-one matrices, so each iteration of the projected BFGS method is a rank-two update to the previous Hessian approximation.
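This rank-two structure is easy to check numerically; the toy sketch below builds the two update terms of equation 19 for random vectors and confirms that the update matrix has exactly two nonzero eigenvalues:

    import numpy as np

    n = 6
    rng = np.random.default_rng(0)
    yt, ds = rng.standard_normal(n), rng.standard_normal(n)
    Ht = np.eye(n)                        # current projected Hessian

    Hds = Ht @ ds                         # the two rank-one terms, equation 19
    update = (np.outer(yt, yt) / (yt @ ds)
              - np.outer(Hds, Hds) / (ds @ Hds))

    eig = np.linalg.eigvalsh(update)
    print(np.sum(np.abs(eig) > 1e-12))    # prints 2: a rank-two update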

Figures 6a, 6b, 6c, and 6d show the eigenvalues of the projected Hessian H̃_i in the 1st, 4th, 7th, and 10th iterations, respectively. As suggested by the rank-two update, the projected BFGS method changes only two eigenvalues in each iteration: the smallest and the largest. Figures 7a and 7b show two eigenvectors of the projected Hessian H̃_1, corresponding to the largest and the smallest eigenvalues, respectively.

The eigenvalues and eigenvectors of the projected Hessian have geometric meaning. Consider the objective function E(m) as a multidimensional parabolic bowl (Thacker, 1989). The eigenvector associated with the largest eigenvalue points in the direction of greatest curvature on the bowl. Updating the model m along the eigenvector direction shown in Figure 7a, we get the largest rate of change in the objective function E(m). This eigenvector indicates the component of the model m that can be best determined in FWI. Likewise, the eigenvector associated with the smallest eigenvalue indicates the direction of least curvature. Updating the model m along the eigenvector direction shown in Figure 7b yields the smallest rate of change of E(m). This eigenvector indicates the component of the model m that can be least well determined in FWI.

Figure 6. Eigenvalues of the projected Hessian in the 1st (a), 4th (b), 7th (c), and 10th (d) iterations.

Figure 7. Eigenvectors of H̃_1 corresponding to the largest (a) and the smallest (b) eigenvalues. Geometrically, an eigenvector is a direction on the multidimensional surface of an objective function E(m), and the associated eigenvalue determines the curvature on the surface. If the model is updated in the eigenvector direction in (a), the rate of change of E(m) is largest. If the model is updated in the eigenvector direction in (b), the rate of change of E(m) is smallest.

4.3 Projected quasi-Newton FWI

The projected BFGS method updates the model in each iteration, and therefore the projected BFGS method can be directly used in quasi-Newton solutions to FWI. The key difference between quasi-Newton FWI and other solutions, such as the conjugate gradient method and Newton's method, is the update direction. Table 1 compares the update directions used in the conjugate gradient method, the image-guided conjugate gradient method (Ma et al., 2011), and the quasi-Newton method.




Table 1. Update directions p_i for the conjugate gradient (CG) method, the image-guided CG method, and our quasi-Newton method.

Method               Update direction

Conjugate gradient   p_0 = −g_0 ;
                     β_i ≡ β(g_i, g_{i−1}) = g_i^T (g_i − g_{i−1}) / (g_{i−1}^T g_{i−1}) ;
                     p_i = −g_i + β_i p_{i−1}

Image-guided CG      p_0 = −R R^T g_0 ;
                     β_i = β(R R^T g_i, R R^T g_{i−1}) ;
                     p_i = −R R^T g_i + β_i p_{i−1}

Quasi-Newton         H̃_0 = I ;
                     p_i = −R H̃_i^{-1} R^T g_i
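As a sketch of the last row of Table 1, with R and the projected Hessian as in the earlier sketches: the inverse projected Hessian acts as a small n × n filter on the projected gradient before interpolation back to the dense space:

    import numpy as np

    def quasi_newton_direction(R, Ht, g):
        # p_i = -R Ht^{-1} R^T g_i, the last row of Table 1
        return -R @ np.linalg.solve(Ht, R.T @ g)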


As suggested in Ma et al. (2011), the image-guided conjugate gradient method uses a gather-scatter process in applying R R^T, and thereby generates low wavenumbers not present in the simple conjugate gradient due to the lack of low frequencies in recorded data. The conjugate gradient in Figure 1a and the image-guided conjugate gradient in Figure 1b illustrate this difference.

Figures 8a and 8c show inversion results estimated by the conjugate gradient (CG) method and the image-guided CG method, respectively, after two iterations. The corresponding estimated models in the 10th iteration are shown in Figures 8b and 8d, respectively. We can see that both the conventional CG and the image-guided CG methods mainly update the shallow part of the model. This is because both the conjugate gradient and the image-guided conjugate gradient have unbalanced amplitudes: high in the shallow part and low in the deeper part.

In the case of quasi-Newton FWI, the update direction shown in Figure 1c not only contains significant low wavenumbers, as does the image-guided conjugate gradient in Figure 1b, but the amplitudes are more balanced as well. In Figure 1c, the amplitudes are higher in the deeper part. This improvement is due to the use of the inverse projected Hessian matrix H̃_i^{-1} (see the last row in Table 1), which works as a filter applied to the gradient.

Figures 8e and 8f show the models estimated in the 2nd and 10th iterations of the quasi-Newton method, respectively. As we can see, quasi-Newton FWI updates the deeper part of the model in early iterations.

Figure 8. Inverted velocity models for the conjugate gradient method (a,b), the image-guided conjugate gradient method (c,d), and the quasi-Newton method (e,f). Left (a,c,e): the 2nd iteration; right (b,d,f): the 10th iteration.

Figure 9 shows the convergence of the quasi-Newton method in this Marmousi II example, measured with the data misfit function and the model misfit function (an L2 norm of the difference between the true model and the estimated model). Overall, quasi-Newton FWI shows faster convergence.

Figure 9. Convergence of the conjugate gradient method, the image-guided conjugate gradient method, and the projected quasi-Newton method: the data misfit function (a) and the model misfit function (b).

5 CONCLUSIONS

A projected BFGS method iteratively approximates the Hessian matrix in FWI, thereby reducing both computation time and required memory. The projected BFGS method enables FWI to be performed using a quasi-Newton method. As demonstrated by the Marmousi II example, quasi-Newton FWI converges in fewer iterations.

Compared with the conjugate gradient method or the image-guided conjugate gradient method, the primary disadvantage of our projected BFGS method is the computational cost of a single iteration, which is relatively high because the line search must satisfy the Wolfe conditions. Therefore, a further investigation of the coefficients in the Wolfe conditions is necessary.

6 ACKNOWLEDGMENT

This work is sponsored by a research agreement between ConocoPhillips and Colorado School of Mines (SST-20090254-SRA). Yong Ma thanks Diane Witters for help in learning how to polish this manuscript.

REFERENCES

Broyden, C. G., 1970, The convergence of a class of double-rank minimization algorithms: IMA Journal of Applied Mathematics, 6, 222–231.

Fletcher, R., 1970, A new approach to variable metric algorithms: The Computer Journal, 13, 317–322.

Gill, P. E., W. Murray, and M. H. Wright, 1981, Practical optimization: Academic Press, London.

Goldfarb, D., 1970, A family of variable-metric methods derived by variational means: Mathematics of Computation, 24, 23–26.

Hale, D., 2009, Image-guided blended neighbor interpolation of scattered data: SEG Technical Program Expanded Abstracts, 28, 1127–1131.

Kennett, B., M. Sambridge, and P. Williamson, 1988, Subspace methods for large inverse problems with multiple parameter classes: Geophysical Journal International, 94, 237–247.

Ma, Y., D. Hale, B. Gong, and Z. Meng, 2011, Image-guided full waveform inversion: CWP Report.

Ma, Y., D. Hale, Z. Meng, and B. Gong, 2010, Full waveform inversion with image-guided gradient: SEG Technical Program Expanded Abstracts, 29, 1003–1007.



Nocedal, J., and S. J. Wright, 2000, Numerical Optimization: Springer.

Oldenburg, D., P. McGillvray, and R. Ellis, 1993, Generalized subspace methods for large-scale inverse problems: Geophysical Journal International, 114, 12–20.

Pratt, R., C. Shin, and G. Hicks, 1998, Gauss-Newton and full Newton methods in frequency-space seismic waveform inversion: Geophysical Journal International, 133, 341–362.

Pratt, R. G., 1999, Seismic waveform inversion in the frequency domain, part 1: Theory and verification in a physical scale model: Geophysics, 64, 888–901.

Shanno, D. F., 1970, Conditioning of quasi-Newton methods for function minimization: Mathematics of Computation, 24, 647–656.

Sheen, D. H., K. Tuncay, C. E. Baag, and P. J. Ortoleva, 2006, Time domain Gauss-Newton seismic waveform inversion in elastic media: Geophysical Journal International, 167, 1373–1384.

Symes, W. W., 2008, Migration velocity analysis and waveform inversion: Geophysical Prospecting, 56, 765–790.

Tarantola, A., 1984, Inversion of seismic-reflection data in the acoustic approximation: Geophysics, 49, 1259–1266.

Tarantola, A., 2005, Inverse problem theory and methods for model parameter estimation: Society for Industrial and Applied Mathematics.

Thacker, W. C., 1989, The role of the Hessian matrix in fitting models to measurements: Journal of Geophysical Research, 94, 6177–6196.

Virieux, J., and S. Operto, 2009, An overview of full-waveform inversion in exploration geophysics: Geophysics, 74, WCC1–WCC26.