

2333-9403 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCI.2018.2862644, IEEE Transactions on Computational Imaging


Travel time tomography with adaptive dictionaries

Michael J. Bianco, Student Member, IEEE, Peter Gerstoft, Member, IEEE

Abstract—We develop a 2D travel time tomography method which regularizes the inversion by modeling groups of slowness pixels from discrete slowness maps, called patches, as sparse linear combinations of atoms from a dictionary. We propose to use dictionary learning during the inversion to adapt dictionaries to specific slowness maps. This patch regularization, called the local model, is integrated into the overall slowness map, called the global model. The local model considers small-scale variations using a sparsity constraint and the global model considers larger-scale features constrained using ℓ_2 regularization. This strategy, in a locally-sparse travel time tomography (LST) approach, enables simultaneous modeling of smooth and discontinuous slowness features. This is in contrast to conventional tomography methods, which constrain models to be exclusively smooth or discontinuous. We develop a maximum a posteriori formulation for LST and exploit the sparsity of slowness patches using dictionary learning. The LST approach compares favorably with smoothness and total variation regularization methods on densely, but irregularly sampled synthetic slowness maps.

Index Terms—Dictionary learning, machine learning, inverse problems, geophysics, seismology, sparse modeling

I. INTRODUCTION

Travel time tomography methods estimate Earth slowness structure, which contains smooth and discontinuous features at multiple spatial scales, from travel times of seismic waves between recording stations. The estimation of slowness (inverse of speed) models from travel times is often formulated as a discrete linear inverse problem, where the perturbations in travel time relative to a reference are used to infer the unknown structure [1], [2]. Such problems are ill-posed, with irregular ray coverage of environments, and require regularization to obtain physically plausible solutions.

We propose a 2D travel time tomography method which regularizes the inversion by assuming small groups of slowness pixels from a discrete slowness map, called patches, are well approximated by sparse linear combinations of atoms from a dictionary. In this sparse model [3], [4], the atoms represent elemental slowness patches and can be generic dictionaries, e.g. wavelets, or adapted to specific data by dictionary learning [5], [6]. This patch regularization, called the local model, is integrated into the overall slowness map, called the global model. Whereas the local model considers small-scale variations using a sparsity constraint, the global model considers larger-scale features which are constrained using ℓ_2 regularization.

This local-global modeling strategy with dictionary learning has been successful in image denoising [3], [7], [8] and inpainting [9], where natural image content is recovered from noisy or incomplete data. We use this strategy to recover true slowness fields from travel time tomography by simultaneously modeling smooth and discontinuous slowness features. This gives an improvement over conventional methods with global damping and smoothness regularization [2], [10] and pixel-level regularization, e.g. total variation (TV) regularization [11], [12], which regularize tomography by encouraging smooth or discontinuous slownesses. Relative to existing tomography methods based on wavelets [13], [14] and sparse dictionaries [15]–[18], our formulation of the tomography problem permits the adaptation of the sparse dictionaries to travel time data and ray sampling by dictionary learning, for which many methods exist [3], [5], [6], [8], [19]. Sparse reconstruction performance is often improved using adaptive dictionaries, which represent specific data well, relative to generic dictionaries, which achieve acceptable performance for many tasks [4].

M.J. Bianco and P. Gerstoft are with the Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA 92093-0238, USA, http://noiselab.ucsd.edu (e-mail: [email protected]; [email protected]). Supported by the Office of Naval Research, Grant No. N00014-18-1-2118.

Sparse modeling assumes signals can be reconstructed to acceptable accuracy using a small (sparse) number of vectors, called atoms, from a potentially large set or dictionary of atoms. The parsimony of sparse representations [4] often provides better regularization than, for example, traditional ℓ_2 model damping [2]. Early sparse approaches were developed in seismic deconvolution [20], [21]. This philosophy has since become ubiquitous in signal processing for image and video denoising [3], [7], [8] and inpainting [9], and medical imaging [22], [23], to name a few examples. Recent works in acoustics and seismics have utilized sparse modeling, e.g. beamforming [24], [25], matched field processing [26], [27], and estimation of ocean acoustic properties [28]–[33]. Dictionary learning has been used to denoise seismic [34], [35] and ocean acoustic [28] recordings, to regularize full waveform inversion [36], [37], and to regularize ocean sound speed profile inversion [29], [30].

Inspired by image denoising [7], we develop a sparse and adaptive 2D travel time tomography method, which we refer to as locally-sparse travel time tomography (LST). Whereas in [7] the image pixel values are directly observed, in LST the pixel values are inferred from measurements [23]. This necessitates an extra term to fit slowness pixels to travel time observations. We develop a maximum a posteriori (MAP) formulation for LST and use the iterative thresholding and signed K-means (ITKM) [6] dictionary learning algorithm to design adaptive dictionaries. This improves slowness models over generic dictionaries. We demonstrate the performance of LST for 2D surface wave tomography with synthetic slowness maps and travel time data. The LST results compare favorably with two competing methods: a smoothing and damping approach [38] referred to as conventional tomography, and TV regularization [11].


II. LST MODEL FORMULATION

In developing the LST method, we consider the case of 2D travel time tomography, where the slowness of the medium varies only in two dimensions. In seismic tomography, surface wave tomography is one case where this assumption is valid [39]. The sensing configuration for such a scenario is shown in Fig. 1. We discretize a 2D slowness map as a W_1 × W_2 pixel image, where the pixels have constant slowness. An array of sensors in the 2D map measures waves propagating across the array. From these observations, wave travel times between the sensors, t′ ∈ R^M, are obtained. We assume t′ given and disregard refraction of the waves, yielding a 'straight-ray' formulation of the problem. The tomography problem is to estimate the slowness pixels (see Fig. 1) from t′.

In the following, we separately develop two slowness models, deemed the global and local models, which will be related in Section III, and discuss dictionaries for sparse modeling. The global model considers the larger-scale, or global, features and relates travel times to slowness. The local model considers smaller-scale, localized features with sparse modeling.

A. Global model and travel times

In the global model, slowness pixels (see Fig. 1) are represented by the vector s′ = s_g + s_0 ∈ R^N, where s_0 is the reference slowness and s_g is the perturbation from the reference, here referred to as the global slowness, with N = W_1 W_2. Similarly, the travel times of the M rays are given as t′ = t + t_0, where t is the travel time perturbation and t_0 is the reference travel time. The tomography matrix A ∈ R^{M×N} gives the discrete path lengths of the M straight rays through the N pixels (see Fig. 1). Thus t and s_g are related by the linear measurement model

\[ \mathbf{t} = \mathbf{A}\mathbf{s}_g + \boldsymbol{\varepsilon}, \tag{1} \]

where ε ∈ R^M is Gaussian noise N(0, σ_ε² I), with mean 0 and covariance σ_ε² I. We estimate the perturbations, with s_0 and t_0 = A s_0 known. We call (1) the global model, as it captures the large-scale features that span the slowness map and generates t.
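To make the global model (1) concrete, the sketch below assembles a straight-ray tomography matrix A on a W_1 × W_2 pixel grid and simulates travel time perturbations. It is a minimal illustration under our own assumptions (a step-wise ray discretization, a random placeholder station layout, and a placeholder noise level), not the authors' code.

```python
import numpy as np

def ray_row(src, rec, W1, W2, n_steps=4000):
    """Approximate the path length of one straight ray in each pixel of a
    W1 x W2 grid (1 km pixels) by accumulating short segments."""
    row = np.zeros(W1 * W2)
    pts = src + np.linspace(0.0, 1.0, n_steps)[:, None] * (rec - src)
    seg = np.linalg.norm(rec - src) / (n_steps - 1)   # segment length (km)
    r = np.clip(pts[:, 0].astype(int), 0, W1 - 1)     # pixel row index per step
    c = np.clip(pts[:, 1].astype(int), 0, W2 - 1)     # pixel column index per step
    np.add.at(row, r * W2 + c, seg)                   # accumulate length per pixel
    return row

rng = np.random.default_rng(0)
W1 = W2 = 100
stations = rng.uniform(0, W1, size=(8, 2))            # placeholder station layout
pairs = [(i, j) for i in range(len(stations)) for j in range(i + 1, len(stations))]
A = np.array([ray_row(stations[i], stations[j], W1, W2) for i, j in pairs])

s_g = rng.normal(0.0, 0.05, W1 * W2)                  # slowness perturbation (s/km)
sigma_eps = 0.1                                       # placeholder noise STD (s)
t = A @ s_g + rng.normal(0.0, sigma_eps, A.shape[0])  # eq. (1): t = A s_g + eps
```

Each row of A sums to the corresponding ray length, so A s_g integrates slowness along the ray, consistent with the straight-ray assumption above.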

B. Local model and sparsity

In the local model, slowness pixels (see Fig. 1) are represented by the vector s′ = s_s + s_0 ∈ R^N, where the sparse slowness s_s is the perturbation from the reference s_0. The slowness s_s is an auxiliary latent variable that is introduced to capture local slowness behavior, and is instrumental in the estimation procedure proposed in Sec. III. In Sec. III-A, s_s is precisely related to s_g in a Bayesian hierarchy.

Formulating the local model, we assume that patches, or √n × √n groups of pixels from s_s (see Fig. 1), are well approximated by a sparse linear combination of atoms from a dictionary D ∈ R^{n×Q} of Q atoms. The patches are selected from s_s by the binary matrix R_i ∈ {0,1}^{n×N}. Hence the slownesses in patch i are R_i s_s. The sparse model is

\[ \mathbf{R}_i\mathbf{s}_s \approx \mathbf{D}\mathbf{x}_i \ \text{ and } \ |\mathbf{x}_i \neq 0| = T \ \forall i, \tag{2} \]

where |·| is cardinality, x_i ∈ R^Q is the vector of sparse coefficients, and T ≪ n is the number of non-zero coefficients. D x_i is referred to as the patch slowness. We call (2) the local model, as it models smaller-scale, localized features contained by patches.

Fig. 1: 2D slowness image corresponding to a slowness map divided into pixels (dashed boxes). The square image patch i contains n pixels. W_1 and W_2 are the vertical and horizontal dimensions of the image in pixels. Sensors are red x's and the ray paths between the sensors are blue lines.

Each slowness patch R_i s_s is indexed by the row w_1 and column w_2 of its top-left pixel in the 2D image as (w_{1,i}, w_{2,i}). We consider all overlapping patches, with w_{1,i} and w_{2,i} differing from their neighbors by ±1 (stride of one). Further, the patches wrap around the edges of the image [23], [40]. Thus, for an N = W_1 × W_2 pixel image, there are N patches, and the number of patches per pixel is n. Wrapping the patches helps capture local features at the edges of the slowness map, if there is sufficient ray sampling (see Sec. V). Without patch wrapping, pixels are modeled by as few as one patch, increasing to n patches at √n pixels from the map edge.
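As an illustration of this patch geometry, the following sketch (our assumed implementation, not the paper's code) extracts all N overlapping √n × √n patches with a stride of one and periodic wrap-around at the image edges; np.roll provides the wrap.

```python
import numpy as np

def extract_patches(s, patch_w):
    """Return all overlapping patch_w x patch_w patches of the image s
    (W1 x W2), stride one, wrapping around the edges. Row i of the output
    plays the role of R_i s_s in eq. (2)."""
    W1, W2 = s.shape
    patches = np.empty((W1 * W2, patch_w * patch_w))
    for w1 in range(W1):
        for w2 in range(W2):
            # roll so the patch's top-left pixel is at (0, 0), then crop
            block = np.roll(np.roll(s, -w1, axis=0), -w2, axis=1)[:patch_w, :patch_w]
            patches[w1 * W2 + w2] = block.ravel()
    return patches

s = np.arange(100.0).reshape(10, 10)   # toy 10 x 10 slowness image
P = extract_patches(s, 4)              # N = 100 patches, each n = 16 pixels
print(P.shape)                         # (100, 16): one patch per pixel
```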

The atoms in D are considered "elemental patches", where only a small number of atoms are necessary to adequately approximate R_i s_s. Atoms can be generic functions, e.g. wavelets or the discrete cosine transform (DCT), or learned from the data (see Sec. III-C). An example of DCT atoms is shown in Fig. 2. Adaptive dictionaries, which are designed from specific instances of data using dictionary learning algorithms, often achieve greater reconstruction accuracy than generic dictionaries. Examples of dictionaries learned from synthetic travel time data (from the slowness maps in Fig. 3) are shown in Fig. 4. Relative to generic dictionaries, learned dictionaries can represent the smooth and discontinuous seismic features encountered in real inversion scenarios.

III. LST MAP OBJECTIVE AND EVALUATION

We now derive a Bayesian MAP objective for the LST variables and an algorithm for its evaluation, with the ultimate goal of estimating the sparse slowness s_s (from (2)). Assuming the travel times t, tomography matrix A, and dictionary D known, the solution to the objective gives MAP estimates of the global slowness s_g (from (1)), s_s, and the coefficients X = [x_1, ..., x_I] ∈ R^{Q×I} describing all I patches (from (2)). Since we use a non-Bayesian dictionary learning algorithm (ITKM [6]), dictionary learning is added after the MAP derivation in Sec. III-C.


Fig. 2: Discrete cosine transform (DCT) dictionary atoms. The Q = 49 atoms (ordered in a 7 × 7 grid) each contain n = 8 × 8 = 64 pixels. Atom values are stretched to the full grayscale range for display.

A. Derivation of MAP objective

Starting with Bayes' rule, we combine the global (1) and local (2) models, and formulate the posterior density of the LST variables as

\[ p(\mathbf{s}_g, \mathbf{s}_s, \mathbf{X} \mid \mathbf{t}) \propto p(\mathbf{t} \mid \mathbf{s}_g, \mathbf{s}_s, \mathbf{X})\, p(\mathbf{s}_g, \mathbf{s}_s, \mathbf{X}). \tag{3} \]

From (2), s_s is conditioned only on X. We further assume the patch coefficients X independent. Hence, using the chain rule we obtain from (3)

\[
\begin{aligned}
p(\mathbf{s}_g, \mathbf{s}_s, \mathbf{X} \mid \mathbf{t}) &\propto p(\mathbf{t} \mid \mathbf{s}_g, \mathbf{s}_s, \mathbf{X})\, p(\mathbf{s}_g \mid \mathbf{s}_s, \mathbf{X})\, p(\mathbf{s}_s, \mathbf{X}) \\
&\propto p(\mathbf{t} \mid \mathbf{s}_g, \mathbf{s}_s, \mathbf{X})\, p(\mathbf{s}_g \mid \mathbf{s}_s, \mathbf{X})\, p(\mathbf{s}_s \mid \mathbf{X})\, p(\mathbf{X}).
\end{aligned}
\tag{4}
\]

From (1), t is conditioned only on s_g, and we assume s_g is conditioned only on s_s. Hence, we obtain from (4)

\[ p(\mathbf{s}_g, \mathbf{s}_s, \mathbf{X} \mid \mathbf{t}) \propto p(\mathbf{t} \mid \mathbf{s}_g)\, p(\mathbf{s}_g \mid \mathbf{s}_s)\, p(\mathbf{s}_s \mid \mathbf{X})\, p(\mathbf{X}). \tag{5} \]

We approximate p(t|s_g), p(s_g|s_s), and p(s_s|X) as Gaussian, and all patch slownesses from (2) independent, giving

\[
\begin{aligned}
p(\mathbf{t} \mid \mathbf{s}_g) &= \mathcal{N}(\mathbf{A}\mathbf{s}_g, \boldsymbol{\Sigma}_\varepsilon), \\
p(\mathbf{s}_g \mid \mathbf{s}_s) &= \mathcal{N}(\mathbf{s}_s, \boldsymbol{\Sigma}_g), \\
p(\mathbf{s}_s \mid \mathbf{X}) &= \prod_i p(\mathbf{R}_i\mathbf{s}_s \mid \mathbf{x}_i) = \prod_i \mathcal{N}(\mathbf{D}\mathbf{x}_i, \boldsymbol{\Sigma}_{p,i}),
\end{aligned}
\tag{6}
\]

where Σ_ε ∈ R^{M×M} is the covariance of the travel time error, Σ_g ∈ R^{N×N} is the covariance of s_g, and Σ_{p,i} ∈ R^{n×n} is the covariance of the patch slownesses. Taking the logarithm of the conditional probabilities from (6), we obtain

\[
\begin{aligned}
\ln p(\mathbf{t} \mid \mathbf{s}_g) &\propto -\tfrac{1}{2}(\mathbf{t} - \mathbf{A}\mathbf{s}_g)^{\mathsf{T}}\boldsymbol{\Sigma}_\varepsilon^{-1}(\mathbf{t} - \mathbf{A}\mathbf{s}_g), \\
\ln p(\mathbf{s}_g \mid \mathbf{s}_s) &\propto -\tfrac{1}{2}(\mathbf{s}_g - \mathbf{s}_s)^{\mathsf{T}}\boldsymbol{\Sigma}_g^{-1}(\mathbf{s}_g - \mathbf{s}_s), \\
\ln p(\mathbf{s}_s \mid \mathbf{X}) &\propto -\tfrac{1}{2}\sum_i (\mathbf{D}\mathbf{x}_i - \mathbf{R}_i\mathbf{s}_s)^{\mathsf{T}}\boldsymbol{\Sigma}_{p,i}^{-1}(\mathbf{D}\mathbf{x}_i - \mathbf{R}_i\mathbf{s}_s).
\end{aligned}
\tag{7}
\]

Since neighboring patches may share slowness features, the x_i may be correlated. For simplicity, the coefficients x_i describing the patches are assumed independent,

\[ \ln p(\mathbf{X}) = \sum_i \ln p(\mathbf{x}_i). \tag{8} \]

From (5), (7), and (8) we obtain

\[
\begin{aligned}
\ln p(\mathbf{s}_g, \mathbf{s}_s, \mathbf{X} \mid \mathbf{t}) &\propto \ln\{p(\mathbf{t} \mid \mathbf{s}_g)\, p(\mathbf{s}_g \mid \mathbf{s}_s)\, p(\mathbf{s}_s \mid \mathbf{X})\, p(\mathbf{X})\} \\
&\propto -(\mathbf{t} - \mathbf{A}\mathbf{s}_g)^{\mathsf{T}}\boldsymbol{\Sigma}_\varepsilon^{-1}(\mathbf{t} - \mathbf{A}\mathbf{s}_g) - (\mathbf{s}_g - \mathbf{s}_s)^{\mathsf{T}}\boldsymbol{\Sigma}_g^{-1}(\mathbf{s}_g - \mathbf{s}_s) \\
&\quad - \sum_i \big\{(\mathbf{D}\mathbf{x}_i - \mathbf{R}_i\mathbf{s}_s)^{\mathsf{T}}\boldsymbol{\Sigma}_{p,i}^{-1}(\mathbf{D}\mathbf{x}_i - \mathbf{R}_i\mathbf{s}_s) - 2\ln p(\mathbf{x}_i)\big\}.
\end{aligned}
\tag{9}
\]

Assuming the coefficients x_i sparse, we approximate ln p(x_i) with the ℓ_0 pseudo-norm ‖x_i‖_0, which counts the number of non-zero coefficients [3]. We further assume the number of non-zero coefficients T is the same for every patch. This gives the MAP objective as

\[
\begin{aligned}
\max\big\{\ln p(\mathbf{s}_g, \mathbf{s}_s, \mathbf{X} \mid \mathbf{t})\big\} &= \min\big\{-\ln p(\mathbf{s}_g, \mathbf{s}_s, \mathbf{X} \mid \mathbf{t})\big\} \\
&\propto \min\Big\{(\mathbf{t} - \mathbf{A}\mathbf{s}_g)^{\mathsf{T}}\boldsymbol{\Sigma}_\varepsilon^{-1}(\mathbf{t} - \mathbf{A}\mathbf{s}_g) + (\mathbf{s}_g - \mathbf{s}_s)^{\mathsf{T}}\boldsymbol{\Sigma}_g^{-1}(\mathbf{s}_g - \mathbf{s}_s) \\
&\qquad + \sum_i (\mathbf{D}\mathbf{x}_i - \mathbf{R}_i\mathbf{s}_s)^{\mathsf{T}}\boldsymbol{\Sigma}_{p,i}^{-1}(\mathbf{D}\mathbf{x}_i - \mathbf{R}_i\mathbf{s}_s)\Big\} \\
&\qquad \text{subject to } \|\mathbf{x}_i\|_0 = T \ \forall i.
\end{aligned}
\tag{10}
\]

Further simplifying, we assume the errors are Gaussian iid. Thus, Σ_ε = σ_ε² I, Σ_g = σ_g² I, and Σ_{p,i} = σ_{p,i}² I, where I is the identity matrix. The LST MAP objective is thus

\[
\begin{aligned}
\{\hat{\mathbf{s}}_g, \hat{\mathbf{s}}_s, \hat{\mathbf{X}}\} = \arg\min_{\mathbf{s}_g, \mathbf{s}_s, \mathbf{X}} \Big\{ &\frac{1}{\sigma_\varepsilon^2}\|\mathbf{t} - \mathbf{A}\mathbf{s}_g\|_2^2 + \frac{1}{\sigma_g^2}\|\mathbf{s}_g - \mathbf{s}_s\|_2^2 + \sum_i \frac{1}{\sigma_{p,i}^2}\|\mathbf{D}\mathbf{x}_i - \mathbf{R}_i\mathbf{s}_s\|_2^2 \Big\} \\
&\text{subject to } \|\mathbf{x}_i\|_0 = T \ \forall i,
\end{aligned}
\tag{11}
\]

where {ŝ_g, ŝ_s, X̂} are the estimates of the LST variables.

B. Solving for the MAP estimate

We find the MAP estimates {ŝ_g, ŝ_s, X̂} solving (11) via an alternating minimization algorithm [7], [23]. This strategy divides the solution of (11) into three subproblems: 1) the global problem, corresponding to the global model (1), which estimates s_g; 2) the local problem, corresponding to the local model (2), which estimates X; and 3) an averaging procedure which estimates s_s. The global problem is least squares with ℓ_2 regularization. The global result is substituted into the sparse model (2) to obtain local structure by solving for the patches and averaging their estimates. Global ℓ_2 regularization has been used with TV regularization in seismic tomography [11], which was an extension of [41]. This modified TV approach is adapted to travel time tomography as a competing method in Section IV-B. LST combines a global ℓ_2-norm constraint with local regularization of patches, following [3]. A major distinction between LST and image denoising [7] is that the 2D image is inferred from travel time measurements, and not directly observed, similar to [23].


Fig. 3: Synthetic slowness maps and ray sampling. Slowness s′ for (a) checkerboard map and (b) smooth-discontinuous map (sinusoidal variations with a discontinuity). Both maps are W_1 = W_2 = 100 pixels (1 km/pixel). (c) 64 stations (red x's), giving 2016 straight-ray (surface wave) paths through the synthetic images. (d) Density of ray sampling, in log_10 rays per pixel.

The global problem is written from (11) as

\[
\begin{aligned}
\hat{\mathbf{s}}_g &= \arg\min_{\mathbf{s}_g} \frac{1}{\sigma_\varepsilon^2}\|\mathbf{t} - \mathbf{A}\mathbf{s}_g\|_2^2 + \frac{1}{\sigma_g^2}\|\mathbf{s}_g - \mathbf{s}_s\|_2^2 \\
&= \arg\min_{\mathbf{s}_g} \|\mathbf{t} - \mathbf{A}\mathbf{s}_g\|_2^2 + \lambda_1\|\mathbf{s}_g - \mathbf{s}_s\|_2^2,
\end{aligned}
\tag{12}
\]

where λ_1 = (σ_ε/σ_g)² is a regularization parameter.

The local problem is written from (11), with each patch solved from the global estimate s_s = ŝ_g from (12) (decoupling the local and global problems), giving

\[ \hat{\mathbf{x}}_i = \arg\min_{\mathbf{x}_i} \|\mathbf{D}\mathbf{x}_i - \mathbf{R}_i\hat{\mathbf{s}}_g\|_2^2 \ \text{ subject to } \|\mathbf{x}_i\|_0 = T. \tag{13} \]

With the coefficient estimates X̂ = [x̂_1, ..., x̂_I] from (13) and global slowness ŝ_g from (12), we solve for s_s from (11), assuming constant patch variance σ_{p,i}² = σ_p²:

\[
\begin{aligned}
\hat{\mathbf{s}}_s &= \arg\min_{\mathbf{s}_s} \frac{1}{\sigma_g^2}\|\hat{\mathbf{s}}_g - \mathbf{s}_s\|_2^2 + \frac{1}{\sigma_p^2}\sum_i \|\mathbf{D}\hat{\mathbf{x}}_i - \mathbf{R}_i\mathbf{s}_s\|_2^2 \\
&= \arg\min_{\mathbf{s}_s} \lambda_2\|\hat{\mathbf{s}}_g - \mathbf{s}_s\|_2^2 + \sum_i \|\mathbf{D}\hat{\mathbf{x}}_i - \mathbf{R}_i\mathbf{s}_s\|_2^2,
\end{aligned}
\tag{14}
\]

where λ_2 = (σ_p/σ_g)² is a regularization parameter. The solution to (14) is analytic, with ŝ_s the stationary point. Differentiating (14) gives

\[
\begin{aligned}
\frac{d}{d\mathbf{s}_s}\Big\{\lambda_2\|\hat{\mathbf{s}}_g - \mathbf{s}_s\|_2^2 + \sum_i\|\mathbf{D}\hat{\mathbf{x}}_i - \mathbf{R}_i\mathbf{s}_s\|_2^2\Big\}
&= \lambda_2(\mathbf{s}_s - \hat{\mathbf{s}}_g) + \sum_i \mathbf{R}_i^{\mathsf{T}}(\mathbf{R}_i\mathbf{s}_s - \mathbf{D}\hat{\mathbf{x}}_i) \\
&= (\lambda_2\mathbf{I} + n\mathbf{I})\mathbf{s}_s - \lambda_2\hat{\mathbf{s}}_g - n\mathbf{s}_p,
\end{aligned}
\tag{15}
\]

where nI = Σ_i R_i^T R_i and s_p = (1/n) Σ_i R_i^T D x̂_i. Thus,

\[ \hat{\mathbf{s}}_s = \frac{\lambda_2\hat{\mathbf{s}}_g + n\mathbf{s}_p}{\lambda_2 + n}, \tag{16} \]

which gives ŝ_s as the weighted average of the patch slownesses {D x̂_i ∀ i} and ŝ_g. When λ_2 ≪ n, ŝ_s ≈ s_p. When λ_2 = n, ŝ_g and s_p have equal weight. In image denoising λ_2 = 0 is typical [4].
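A minimal sketch of the update (16), under the patch conventions above: the approximated patches D x̂_i are scattered back to their pixels and averaged (each pixel is covered by n patches because of the wrap-around), then blended with ŝ_g. The helper names are ours, for illustration only.

```python
import numpy as np

def average_patches(patch_recs, W1, W2, patch_w):
    """Scatter the patch reconstructions D @ x_i back into the image and
    divide by the n patches covering each pixel, giving s_p of eq. (15)."""
    acc = np.zeros((W1, W2))
    for i, rec in enumerate(patch_recs):        # i indexes the top-left pixel
        w1, w2 = divmod(i, W2)
        rows = (w1 + np.arange(patch_w)) % W1   # wrap-around pixel indices
        cols = (w2 + np.arange(patch_w)) % W2
        acc[np.ix_(rows, cols)] += rec.reshape(patch_w, patch_w)
    return acc / (patch_w * patch_w)            # every pixel is hit n times

def update_s_s(s_g, s_p, lam2, n):
    """Closed-form weighted average of eq. (16)."""
    return (lam2 * s_g + n * s_p) / (lam2 + n)

# usage with the toy shapes from the patch-extraction sketch
patch_recs = np.zeros((100, 16))                # stand-ins for D @ x_i
s_s = update_s_s(np.zeros((10, 10)),
                 average_patches(patch_recs, 10, 10, 4), lam2=0.0, n=16)
```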

C. LST algorithm with dictionary learning

The expressions (12), (13), and (16) are solved iteratively, giving the LST algorithm in Table I, as a MAP estimate of a slowness image with local sparsity constraints and a known dictionary D. Further implementation details are given in Sec. III-D. Dictionary learning is added to the algorithm in the solution to the local problem (13), by optimizing D:

\[ \hat{\mathbf{D}} = \arg\min_{\mathbf{D}} \Big\{ \min_{\mathbf{x}_i} \|\mathbf{D}\mathbf{x}_i - \mathbf{R}_i\hat{\mathbf{s}}_g\|_2^2 \ \text{ subject to } \|\mathbf{x}_i\|_0 = T \ \forall i \Big\}. \tag{17} \]

The dictionary learning problem (17) is here solved using the ITKM algorithm, Table II (for details, see App. A).

The ITKM solves this bilinear optimization problem (17) by alternately solving for the sparse coefficients x_i with D fixed, using thresholding [3], and solving for D with the x_i fixed using a 'signed' K-means objective. In the ITKM iterations, the columns of D, d_q, are constrained to have unit norm, to prevent scaling ambiguity. For fixed sparsity T, the ITKM is computationally more efficient and has better guarantees of dictionary recovery than the K-SVD [5], [6].

To illustrate the content of learned dictionaries, the atoms learned during LST are shown for the checkerboard (Fig. 4(a)) and smooth-discontinuous (Fig. 4(b)) maps. The atoms from the checkerboard (Fig. 4(a)) contain sharp edges, which correspond to shifted sharp edges from the checkerboard pattern. The atoms from the smooth-discontinuous map (Fig. 4(b)) contain both smooth and discontinuous features. The smooth atoms correspond to the sinusoidal variations, whereas the atoms with sharp edges correspond to shifted features of the fault region. These features give the shift-invariance property to the sparse representation, which will be discussed in Sec. III-E. Since learned dictionaries are adapted to specific data, they better model specific data with a minimal number of atoms than prescribed dictionaries, such as Haar wavelets or the DCT. Methods using dictionary learning have obtained performance superior to prescribed dictionaries in e.g. image denoising and inpainting [4].


TABLE I: LST algorithm

Given: t ∈ R^M, A ∈ R^{M×N}, s_s^0 = 0 ∈ R^N, D^0 = Haar wavelet, DCT, or noise N(0,1) ∈ R^{n×Q}, λ_1, λ_2, T, and j = 1.
Repeat until convergence:
  1. Global estimate: solve (18) using LSQR [42],
       ŝ_δ^j = arg min_{s_δ} ‖t − A(s_δ + ŝ_s^{j−1})‖_2^2 + λ_1‖s_δ‖_2^2,
       ŝ_g^j = ŝ_δ^j + ŝ_s^{j−1}.
  2. Local estimate:
     a: Setting s_s^j = ŝ_g^j, center the patches {R_i ŝ_g^j ∀ i} and
        i. (dictionary learning) solve (17) for D^j using ITKM [6] (Table II),
             D^j = arg min_D {min_{x_i} ‖D x_i − R_i ŝ_g^j‖_2^2 subject to ‖x_i‖_0 = T ∀ i};
           (generic dictionary) set D^j = D^0.
        ii. Solve (13) using OMP,
             x̂_i^j = arg min_{x_i} ‖D^j x_i − R_i ŝ_g^j‖_2^2 subject to ‖x_i‖_0 = T ∀ i.
     b: Obtain ŝ_s^j by (16): ŝ_s^j = (λ_2 ŝ_g^j + n s_p^j)/(λ_2 + n).
  j = j + 1

D. Implementation and complexity

In the following, we give the implementation details and complexity of the LST algorithm (see Table I). Since A is sparse, the global estimate (12) is solved using the sparse least squares program LSQR [42]. For LSQR, (12) is rewritten as

\[ \hat{\mathbf{s}}_\delta = \arg\min_{\mathbf{s}_\delta} \|\mathbf{t} - \mathbf{A}(\mathbf{s}_\delta + \mathbf{s}_s)\|_2^2 + \lambda_1\|\mathbf{s}_\delta\|_2^2, \tag{18} \]

with the substitution s_δ = s_g − s_s, giving ŝ_g = ŝ_δ + s_s. The sparse coefficients x_i in the local estimates (13), (17) are obtained using the orthogonal matching pursuit (OMP) algorithm [43]. For LST with dictionary learning, first the dictionary is obtained from ITKM, then (13) is solved with the same sparsity level T as ITKM. Before solving (13), (17), the slowness patches {R_i ŝ_g ∀ i} are centered [4], i.e. the mean of the pixels in each patch is subtracted. The mean of patch i is x̄_i = (1/n) 1^T R_i ŝ_g. Hence, R_i ŝ_g ≈ D x_i + 1 x̄_i.

The complexity of each LST iteration is determined primarily by the LSQR computation in the global problem, O(2MN), and in the local problem by ITKM, O(knQN), and OMP, O(TnQN), where k is the number of ITKM iterations (see Table II). For large slowness maps, we expect the LST complexity to be dominated by LSQR. In our simulations we obtain reasonable run times (see Sec. V-B3). In the special case T = 1, the solution to (13) is not combinatorial, and the dictionary learning problem is equivalent to gain-shape vector quantization [5], [44].

E. Advantages

Improved inversion performance over conventional and TV regularized tomography is obtained under the hypothesis that seismic image patches can be represented as sparse linear combinations of a small set of elemental patches, or patterns. Such patterns are the atoms in the dictionary D. This hypothesis follows numerous works in image processing and neuroscience, e.g. [3], [8], [19], [45], which have shown that patches of diverse image content are well approximated with sparse linear combinations of atoms. This property has been exploited for signal denoising and inpainting [3], [8] and classification [9], [46].

Fig. 4: Dictionary atoms learned during LST with ITKM, with n = 100 and Q = 150, for (a) the checkerboard map (Fig. 3(a)) with T = 1 and (b) the smooth-discontinuous map (Fig. 3(b)) with T = 2. Atoms are adjusted to the full grayscale range for display.

TABLE II: ITKM algorithm

Given: j, ŝ_g^j, D^0 = D^{j−1} ∈ R^{n×Q}, T, and h = 1.
Repeat until convergence:
  1. Find the dictionary indices per (27),
       K(D^{h−1}, y_i) = arg max_{|K|=T} ‖(D_K^{h−1})^T y_i‖_1, with y_i = R_i ŝ_g^j − 1 x̄_i^j.
  2. Update the dictionary per the solution of (29), using the coefficient indices K,
       d_l^h = λ_l Σ_{i: l ∈ K(D^{h−1}, y_i)} sign((d_l^{h−1})^T y_i) y_i, with λ_l such that ‖d_l^h‖_2 = 1.
  h = h + 1
Output: D^j = D^h

Further, sparse dictionaries trained on overlapping patches possess the shift-invariance property, whereby features such as edges are recovered regardless of where they are located in an image [3]. LST enables finer resolution as permitted by the atoms in the dictionary and can exploit shift invariance. Slowness features in Fig. 3(a,b) are shifted such that only small cells of constant slowness may be used with conventional tomography (which necessitates damping, Sec. IV-A) to illustrate this effect.

IV. COMPETING METHODS

A. Conventional tomography

We illustrate conventional tomography with a Bayesian approach [38], which regularizes the inversion with a global smoothing (non-diagonal) covariance. Considering the measurements (1), the MAP estimate of the slowness is

\[ \hat{\mathbf{s}}_g = \big(\mathbf{A}^{\mathsf{T}}\mathbf{A} + \eta\boldsymbol{\Sigma}_L^{-1}\big)^{-1}\mathbf{A}^{\mathsf{T}}\mathbf{t}, \tag{19} \]

where η = (σ_ε/σ_c)² is a regularization parameter, σ_c is the conventional slowness variance, and

\[ \boldsymbol{\Sigma}_L(i,j) = \exp\!\big(-D_{i,j}/L\big). \tag{20} \]

Here D_{i,j} is the distance between cells i and j, and L is the smoothness length scale [10], [38].
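A direct transcription of (19)-(20) as a sketch, practical only for small grids since it forms the dense N × N covariance and its inverse; the pixel coordinates and parameter values below are placeholders.

```python
import numpy as np

def conventional_tomo(A, t, coords, L, eta):
    """MAP slowness estimate (19) with the exponential smoothing
    covariance (20); coords holds the (N, 2) pixel centers in km."""
    D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    Sigma_L = np.exp(-D / L)                        # eq. (20)
    lhs = A.T @ A + eta * np.linalg.inv(Sigma_L)    # eq. (19), left factor
    return np.linalg.solve(lhs, A.T @ t)

# usage on a small grid: 20 x 20 pixel centers, L = 10 km, eta = 0.1 km^2
xs, ys = np.meshgrid(np.arange(20.0), np.arange(20.0), indexing="ij")
coords = np.column_stack([xs.ravel(), ys.ravel()])
```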

B. Total variation regularization

We implement the modified TV regularization method [11], [41]. TV regularization penalizes the gradient between pixels, enforcing piecewise constant models [47]; hence we might expect TV regularization to preserve discontinuous or constant features well. The TV method is adapted to the travel time tomography problem, giving the objective

\[ \{\hat{\mathbf{s}}_g, \hat{\mathbf{s}}_{TV}\} = \arg\min_{\mathbf{s}_g, \mathbf{s}_{TV}} \Big\{ \frac{1}{\sigma_\varepsilon^2}\|\mathbf{t} - \mathbf{A}\mathbf{s}_g\|_2^2 + \frac{1}{\sigma_g^2}\|\mathbf{s}_g - \mathbf{s}_{TV}\|_2^2 + \mu\|\mathbf{s}_{TV}\|_{TV} \Big\}, \tag{21} \]

where ‖s_TV‖_TV is the TV regularizer, which penalizes the gradient, and s_TV ∈ R^N is the TV estimate of the slowness. Similar to LST, TV (21) is solved by decoupling the problem into two subproblems: 1) damped least squares and 2) TV. The least squares problem is

\[
\begin{aligned}
\hat{\mathbf{s}}_g &= \arg\min_{\mathbf{s}_g} \frac{1}{\sigma_\varepsilon^2}\|\mathbf{t} - \mathbf{A}\mathbf{s}_g\|_2^2 + \frac{1}{\sigma_g^2}\|\mathbf{s}_g - \mathbf{s}_{TV}\|_2^2 \\
&= \arg\min_{\mathbf{s}_g} \|\mathbf{t} - \mathbf{A}\mathbf{s}_g\|_2^2 + \lambda_1\|\mathbf{s}_g - \mathbf{s}_{TV}\|_2^2,
\end{aligned}
\tag{22}
\]

where λ_1 is related to the global LST problem (see (12)) and s_TV is initialized to the reference slowness. The TV problem is

\[
\begin{aligned}
\hat{\mathbf{s}}_{TV} &= \arg\min_{\mathbf{s}_{TV}} \frac{1}{\sigma_g^2}\|\hat{\mathbf{s}}_g - \mathbf{s}_{TV}\|_2^2 + \mu\|\mathbf{s}_{TV}\|_{TV} \\
&= \arg\min_{\mathbf{s}_{TV}} \|\hat{\mathbf{s}}_g - \mathbf{s}_{TV}\|_2^2 + \lambda_{TV}\|\mathbf{s}_{TV}\|_{TV},
\end{aligned}
\tag{23}
\]

with λ_TV = σ_g² μ. (21) is solved by alternately solving (22) and (23), like LST, as an alternating minimization algorithm. In this paper, (22) is solved using LSQR [42] and (23) is solved using the TV algorithm of Chambolle [47], [48]. We set the gradient step size α = 0.25, which is optimal for the convergence and stability of the algorithm [47]. The stopping tolerance is set to 10^-2.

V. SIMULATIONS

We demonstrate the performance of LST (Sec. III, Table I), using both dictionary learning and prescribed dictionaries, relative to conventional (Sec. IV-A) and TV (Sec. IV-B) tomography on synthetic slowness maps (e.g. Fig. 3(a,b)). The recovered slownesses from the methods are plotted in Figs. 5–11, with their performance summarized in Table III. The convergence of LST and its sensitivity to the sparsity level T are shown in Fig. 12. For LST without dictionary learning, the dictionary D is either the overcomplete Haar wavelet dictionary or the DCT (see Fig. 2). LST with prescribed dictionaries performs similarly to previous works which use wavelet transforms with a sparsity constraint on the coefficients [15]–[18], though LST with prescribed dictionaries is not competing with these methods, as only two resolutions are considered (global and local). Relative to conventional and TV regularization, good performance can be obtained without dictionary learning, and slowness is better recovered when dictionary learning is used.

Fig. 5: Conventional, TV, and LST tomography results without travel time error for the checkerboard map (Fig. 3(a)): 2D and 1D (from the black line in 2D) slowness estimates plotted against the true slowness s′. (a,b) Conventional, (c,d) TV regularization s′_TV, LST with the (e,f) Haar wavelet and (g,h) DCT dictionaries (Q = 169, T = 5, n = 64), and (i,j) LST with dictionary learning. RMSE (ms/km, (24)) is printed on 2D slownesses.

Experiments are conducted using simulated seismic data from two synthetic 2D seismic slowness maps (Fig. 3(a,b)) with dimensions W_1 = W_2 = 100 pixels (km), as well as variations of these maps. The boxcar checkerboard in Fig. 3(a) demonstrates the recovery of discontinuous seismic slownesses. While the checkerboard slowness is quite unrealistic, it is commonly used as a benchmark for seismic tomography methods. The smooth-discontinuous map, Fig. 3(b), is more realistic and illustrates fault-like discontinuities in an otherwise smoothly varying (sinusoidal) slowness map, as used in [15]. These examples illustrate the modeling flexibility of the LST algorithm. We also generate a variety of synthetic slowness maps by altering the size of the checkerboard squares and the width and horizontal location of the discontinuity, in the checkerboard and smooth-discontinuous slowness maps, respectively. For more details, see Sec. V-A2. The travel times from the synthetic slowness maps are generated by the global model (1). The slowness estimates from LST are s′_s = ŝ_s + s_0 ∈ R^N (from (16)), for conventional tomography s′_g = ŝ_g + s_0 ∈ R^N (from (19)), and for TV s′_TV = ŝ_TV + s_0 ∈ R^N (from (23)).

The slowness map pixels are 1 km square and sampled by M = 2016 straight rays between 64 sensors, shown in Fig. 3(c). The travel times t are calculated by numerically integrating along these ray paths. The 2D valid region for slowness map estimates using conventional and TV tomography is bounded by the outermost pixels along the ray paths (see Fig. 3(c)). The valid region for LST is obtained by a dilation operation [49] on the conventional/TV valid region, effectively padding it by 5 pixels. The conventional/TV tomography valid region is used for error calculations for all methods.

To avoid overfitting during dictionary learning (Table II), we exclude patches from training if more than 10% of the pixels are not sampled by ray paths. This heuristic works well and we have not investigated dictionary learning from incomplete information. The RMSE (ms/km) of the estimates is given by

\[ \mathrm{RMSE} = \sqrt{\frac{1}{NP}\sum_{n}^{N}\sum_{p}^{P}\big(s'_{np} - s'_{\mathrm{est},np}\big)^2}, \tag{24} \]

where s′_{est,np} is s′_s, s′_g, or s′_TV for the n-th pixel location and the p-th trial. For σ_ε = 0, P = 1. The RMSE is printed on the 1D/2D slowness estimates in Figs. 5–12.

A. Without travel time error

We first simulate travel times without errors (σ_ε = 0) to obtain best-case results for the tomography methods. See Figs. 5 and 6 for results from LST, conventional, and TV tomography.

Fig. 6: Conventional, TV, and LST tomography results without travel time error for the smooth-discontinuous map (Fig. 3(b)). Same order as Fig. 5, with a vertical 1D profile of the estimated and true slowness. RMSE (ms/km, (24)) is printed on 2D slownesses.

1) Inversion parameters for σ_ε = 0: The LST tuning parameters are λ_1, λ_2, n, Q, and T. The sensitivity of the LST solutions with dictionary learning to λ_1 and λ_2 is shown in Fig. 7(c,f) for nominal values of n = 100, Q = 150, T = 1 for the checkerboard and T = 2 for the smooth-discontinuous map (see Fig. 12(b)). With the prior σ_ε = 0 s, we expect the best value of λ_1 = 0 km² (from (12)). From (12), (14), σ_g² is proportional to the variance of the true slowness. For the checkerboard (smooth-discontinuous) map, σ_g = 0.10 (0.05) s/km. We assume the slowness patches are well approximated by the sparse model (13), and expect σ_p² ≪ σ_g². Hence we expect the best value of λ_2 (from (14)) to be small. It is shown for both the checkerboard (Fig. 7(c)) and smooth-discontinuous (Fig. 7(f)) maps that the best RMSE for LST with dictionary learning is obtained when λ_1 = 0 km² and λ_2 = 0, though LST exhibits low sensitivity to these values and recovers the true slowness well for a large range of values. We show the effect of varying the sparsity level T on LST RMSE performance, relative to the true slowness per (24), for the nominal LST parameters in Fig. 12(b). The values used for T were chosen based on the minimum error of these curves, though LST performance exceeds conventional and TV performance for a wide range of T.

Fig. 7: Conventional, TV, and LST tomography results for different values of the regularization parameters for (a–c) checkerboard (Fig. 3(a)) and (d–f) smooth-discontinuous (Fig. 3(b)) maps, without travel time error. (a,d) Conventional s′_g, effect of L and η. (b,e) TV regularization s′_TV, effect of λ_1 and λ_TV. (c,f) LST s′_s, effect of λ_1 and λ_2. RMSE (ms/km, (24)) is printed on 2D slownesses.

For the LST with the Haar wavelet and DCT dictionaries (both Q = 169, n = 64, since the Haar wavelet dimensions are powers of 2), the best performance by minimum RMSE was achieved with λ_1 = 0 km², λ_2 = 0, and T = 5 for the checkerboard and T = 2 for the smooth-discontinuous maps.

For conventional tomography, there are several methods for estimating the best values of the regularization parameters L and η, but the methods are not always reliable [2], [50]. To find the best parameters, we systematically varied the values of L and η (see Fig. 7(a,d)). The minimum RMSE for conventional tomography was obtained by L = 10 km and η = 0.1 km² for both the checkerboard and smooth-discontinuous maps. Similarly, for TV tomography, the best values of the tuning parameters, λ_1 and λ_TV, were obtained by systematically varying their values (see Fig. 7(b,e)). The minimum RMSE for TV tomography was obtained by λ_1 = 1 km² and λ_TV = 0.01 s for both the checkerboard and smooth-discontinuous maps.

2) Results for σ_ε = 0: While the discontinuous shapes in the Haar dictionary are similar to the discontinuous content of the checkerboard image, the local features in the higher-order Haar wavelets overfit the rays where the ray sampling density is poor (see Fig. 3(d)). The performance of the Haar wavelets is better for the smooth-discontinuous slowness map (Fig. 6(g–i)) than for the checkerboard (Fig. 5(e,f)). As shown in Fig. 6(g–i), the Haar wavelets add false high frequency structure to the slowness reconstruction, but the trends in the smooth-discontinuous features are well preserved. The inversion performance of the DCT (Fig. 5(g,h) and Fig. 6(j–l)) is better than the Haar wavelets for both cases, but matches the discontinuous slowness features less closely, as the DCT atoms are smooth. The smoothness of the DCT atoms better preserves the smooth slowness structure.

The LST with dictionary learning (Fig. 5(i,j) and Fig. 6(m–o)) achieves the best overall fit to the true slowness, recovering the true slownesses nearly exactly. As in the other cases, the performance degrades near the edges of the ray sampling, where the ray density is low, but high resolution is maintained across a large part of the sampling region. The RMSE of the LST with the Haar wavelet dictionary for the checkerboard (Table III) is greater than for conventional tomography, although a better qualitative fit to the true slowness is observed with the LST. Both in the case of the checkerboard and smooth-discontinuous maps, the TV obtained the highest RMSE, though in regions of the map where the ray sampling was dense, the discontinuous and constant features were well recovered (Fig. 5(c,d) and Fig. 6(d–f)), as expected for TV regularization.

Fig. 8: Conventional, TV, and LST tomography results for the checkerboard map (Fig. 3(a)) with 100 realizations of Gaussian travel time error (STD 2% mean travel time): 1D slice of the inversion for one noise realization against the true slowness, 1D slice of the mean over all noise realizations with the STD of the estimate against the true slowness s′, and 2D RMSE of the estimates over the noise realizations. (a–c) Conventional tomography s′_g, (d–f) TV regularization s′_TV, (g–i) LST s′_s with dictionary learning with λ_1 = 2 km², and (j–l) λ_1 = 7 km². RMSE (ms/km, (24)) is printed on 1D errors.

The true variances σ_ε, σ_g, and σ_p in the LST regularization parameters λ_1 and λ_2 provide the best LST performance (see Fig. 7(c,f)), whereas the true variances for conventional tomography, σ_ε and σ_c, do not correspond to the best solutions (see Fig. 7(a,d)). The noise-free prior, η = 0 km², gives erratic inversions, and the conventional tomography is better regularized by η = 0.1 km². The noise-free prior also gives erratic results from TV (see Fig. 7(b,e)).

A variety of checkerboard and smooth-discontinuous slowness maps with different geometries were generated to more fully test the tomography methods. The results of these tests are summarized in Table III (and cases with travel time error are shown in Fig. 11). Different checkerboard maps were generated by varying the horizontal and vertical size of the checkerboard boxes from 5 to 20 pixels (256 permutations), and different smooth-discontinuous maps were generated by varying the location of the left edge (from 32 to 62 pixels) and width (from 4 to 10 pixels) of the discontinuity (217 permutations). From each of these sets, 100 slowness maps were randomly chosen for simulation. Inversions without travel time errors were performed using conventional, TV, and LST (with dictionary learning) tomography, using the nominal parameters from the aforementioned test cases, corresponding to the slowness maps in Fig. 3(a,b). LST obtains lower RMSE than TV or conventional tomography for all simulations, for both the varied checkerboard and smooth-discontinuous maps.

B. With travel time error

We also test the performance of the tomography methods with travel time errors. We simulate Gaussian travel time errors with σ_ε = 0.02t, i.e. the uncertainty is 2% of the mean travel time, which is similar to the model implemented in [14]. For each true slowness map and method, we run the inversions for 100 realizations of noise N(0, σ_ε) (the random initialization of D also changes 100 times for LST) and summarize the statistics of the results. The noise simulation results for conventional, TV, and LST tomography are in Figs. 8 and 9. The RMSE for the approaches, calculated by (24) with P = 100, are in Table III.

1) Inversion parameters for σ_ε = 0.02t: The sensitivity of the LST solutions with dictionary learning to λ_1 and λ_2 is shown in Fig. 10(c,f) for nominal values of n = 100, Q = 150, and T = 2 (per Fig. 12(b)) for both the checkerboard and smooth-discontinuous maps. With the prior σ_ε = 0.02t, we expect for the checkerboard map (σ_ε = 0.27 s, σ_g = 0.10 s/km) the best value of λ_1 ≈ 7.5 km², and for the smooth-discontinuous map (σ_ε = 0.28 s, σ_g = 0.05 s/km) the best value of λ_1 ≈ 28.3 km² (from (12)). We use λ_1 = 2 km² for the checkerboard (Fig. 8(g–i)) and λ_1 = 10 km² for the smooth-discontinuous map (Fig. 9(i–l)), and achieve lower RMSE than λ_1 = 1 km² (see Fig. 10(c,f)). Although we expect the true values of σ_g to decrease over the LST iterations, prior values of σ_g proportional to the variance of the true slowness work well. It is further shown in Fig. 10 that, as in the noise-free case (Sec. V-A1), the LST recovers the true slowness well for a large range of values.

For the LST with the Haar wavelet and DCT dictionaries, the best performance by minimum RMSE was achieved with λ_1 = 5 km², λ_2 = 0, and T = 5 for the checkerboard and T = 2 for the smooth-discontinuous maps. For conventional tomography, the best values by minimum RMSE were L = 6 km and η = 10 km² for the checkerboard and L = 12 km and η = 10 km² for the smooth-discontinuous slowness maps (Fig. 10(a,d)). For TV tomography, the best values by minimum RMSE were λ_1 = 5 km² and λ_TV = 0.02 s for both the checkerboard and smooth-discontinuous slowness maps (Fig. 10(b,e)).

Considering again the influence of the choice of λ_1 and λ_2 on the LST with dictionary learning in Fig. 10(c,f): since n = 100, the case λ_2 = 100 gives equal weight to the global slowness s_g and the patch slowness s_p in (16). The LST obtains results similar to the best conventional estimates (see Fig. 10) for the checkerboard with λ_1 = 1 and λ_2 = 100 and for the smooth-discontinuous map with λ_1 = 10 and λ_2 = 500, though these parameter choices are suboptimal. Although in this case the sparsity regularization of the patches has an effect similar to the conventional damping and smoothing regularization from (19), there is no direct relationship between the regularization in (19) and the sparsity and dictionary learning in (13).

Fig. 9: Conventional, TV, and LST tomography results for the smooth-discontinuous map (Fig. 3(b)) with 100 realizations of Gaussian travel time error (STD 2% mean travel time). Same format as Fig. 8, except a vertical 1D slice of the inversion is included, and only one LST case, (i–l) s′_s with dictionary learning. RMSE (ms/km, (24)) is printed on 1D errors.

2) Results for σ_ε = 0.02t: The LST with dictionary learning (Fig. 8(g–i) and Fig. 9(i–l)) achieves the best overall fit to the true slowness, relative to conventional tomography (Fig. 8(a–c) and Fig. 9(a–d)) and TV tomography (Fig. 8(d–f) and Fig. 9(e–h)), as evidenced by qualitative fit and RMSE (Table III). In the case of the checkerboard, a higher value λ_1 = 7 km² is tested and shown in Fig. 8(j–l). Because of the increased damping, the standard deviation (STD) of the estimate (Fig. 8(k)) is less than for the case λ_1 = 2 km² (Fig. 8(h)), and the fit to the true profile is improved.

We also simulate inversion for a variety of checkerboard and smooth-discontinuous slowness maps with different geometries (per Sec. V-A2), with travel time error. The results of these tests are summarized in Table III and Fig. 11. Inversions with 10 realizations of Gaussian travel time error (σ_ε = 0.02t) were performed using conventional, TV, and LST (with dictionary learning) tomography, using the nominal parameters for the slowness maps in Fig. 3(a,b). Fig. 11(a,b) and Fig. 11(c,d) show the results for the varied checkerboard and smooth-discontinuous maps. LST obtains lower RMSE than TV or conventional tomography for all simulations, for both the varied checkerboard and smooth-discontinuous maps, as shown in Fig. 11(a,c); a better subjective fit to the true slowness is also observed (Fig. 11(b,d)).

3) Convergence and run time: The algorithms were coded in Matlab. The LST algorithm (Table I) used 100 iterations for all cases and the ITKM (Table II) used 50 iterations. The TV algorithm (Sec. IV-B) also used 100 iterations for all cases. The inversions in Fig. 5 and Fig. 6 with (without) dictionary learning took 4 min (3 min) on a Macbook Pro 2.5 GHz Intel Core i7. For the same examples, conventional tomography (Sec. IV-A) took 20 s and TV tomography took 3 s.

In Fig. 12(a), it is shown that the LST RMSE travel time error (from ŝ_s) decreased over the iterations and converged within at most 50 iterations. For the cases with travel time error, the RMSE approaches or falls only slightly below σ_ε. Hence, the travel time data was not overfit.

VI. CONCLUSIONS

We have derived a method for travel time tomography which incorporates a local sparse prior on patches of the slowness image, which we refer to as the LST algorithm. The LST can use predefined or learned dictionaries, though learned dictionaries give improved performance. Relative to the conventional and TV tomography methods presented, the LST is less sensitive to the regularization parameters. LST with a sparse prior and dictionary learning can solve for both smooth and discontinuous slowness features in an image.

We considered the case of 2D surface wave tomographyand using LST, for the dense sampling configuration and

Page 11: Travel time tomography with adaptive dictionariesacsweb.ucsd.edu/~mbianco/papers/Bianco2018b.pdf · Content may change prior to final publication. Citation information: DOI 10.1109/TCI.2018.2862644,

2333-9403 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCI.2018.2862644, IEEETransactions on Computational Imaging

11

Fig. 10: Conventional, TV, and LST tomography results for different values of regularization parameters for (a–c) checkerboard (Fig. 3(a)) and (d–f) smooth-discontinuous (Fig. 3(b)) maps, for 100 realizations of Gaussian travel time error (STD 2% mean travel time). (a,d) Conventional mean s′g, effect of L and η. (b,e) TV regularization mean s′TV, effect of λ1 and λTV. (c,f) LST mean s′s, effect of λ1 and λ2. RMSE (ms/km, (24)) is printed on 2D slownesses.


APPENDIX

A. ITKM algorithm details

The ITKM dictionary learning algorithm [6] (see Table II) is derived from a 'signed' K-means objective. In signed K-means, T-sparse coefficients $C = [c_1, \ldots, c_I] \in \mathbb{R}^{Q \times I}$ with nonzero entries $c_{ti} \in \{-1, 1\}$ are assigned to training examples $Y = [y_1, \ldots, y_I] \in \mathbb{R}^{n \times I}$. The training examples are obtained from patch $i$ as $y_i = R_i s_s$, and centered. The minimization problem is

$$\{\hat{C}, \hat{D}\} = \underset{D}{\arg\min}\ \sum_i \min_{T,\, c_{ti} = \pm 1} \|y_i - D c_i\|_2^2 = \underset{D}{\arg\min}\ \sum_i \min_{T,\, c_{ti} = \pm 1} \Big\| y_i - \sum_t d_t c_{ti} \Big\|_2^2, \qquad (25)$$

where $c_{ti}$ is a non-zero coefficient and $d_t$ is the corresponding dictionary atom. Expanding (25) and requiring $\|d_t\|_2^2 = 1$,

$$\min_{D} \sum_i \min_{T,\, c_{ti} = \pm 1} \Big\{ \|y_i\|_2^2 - 2 \sum_t c_{ti}\, d_t^T y_i + B \Big\} = \|Y\|_F^2 + B - 2 \max_{D} \sum_i \max_{|K|=T} \sum_{t \in K} \mathrm{abs}(d_t^T y_i) = \|Y\|_F^2 + B - 2 \max_{D} \sum_i \max_{|K|=T} \|D_K^T y_i\|_1, \qquad (26)$$

where $\|\cdot\|_F$ is the Frobenius norm, $B$ is a constant, and $K$ is the set of $T$ dictionary indices having the largest absolute inner products $\mathrm{abs}(d_t^T y_i)$, which is found by thresholding [3]:

$$K(D, y_i) = \underset{|K|=T}{\arg\max}\ \|D_K^T y_i\|_1. \qquad (27)$$
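A minimal Python sketch of this thresholding step follows (assuming unit-norm atoms; the function and variable names are ours, not the paper's):

```python
import numpy as np

def threshold_support(D, y, T):
    """Indices K of the T atoms (columns of D) with the largest
    absolute inner products |d_t^T y|, per (27)."""
    return np.argsort(-np.abs(D.T @ y))[:T]
```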

From (26), the dictionary learning objective is

$$\max_{D} \sum_i \max_{|K|=T} \|D_K^T y_i\|_1, \qquad (28)$$

which finds the $D$ that maximizes the $\ell_1$ norm of the $T$ largest-magnitude atom responses in $K$. The ITKM solves (28) as a two-step algorithm. First, $K$ is obtained from (27). Then, the dictionary atoms are updated per

$$\max_{D} \sum_{i:\, l \in K(D, y_i)} \mathrm{abs}(d_l^T y_i). \qquad (29)$$


TABLE III: Conventional, TV, and LST tomography RMSE performance. Bold entries are least error.

RMSE (ms/km)        Checkerboard                              Smooth-discon.
                    Nominal (Fig. 3(a))    Varied             Nominal (Fig. 3(b))    Varied
Error σε            0*        0.02t**      0        0.02t‡    0†        0.02t††      0        0.02t‡
Conventional        56.92     62.32        59.96    65.77     17.97     22.65        13.98    19.00
TV                  55.24     65.42        58.28    67.05     20.72     26.64        17.16    22.55
LST Haar            60.97     68.96        59.71    66.51     15.66     23.07        14.00    18.68
LST DCT             56.75     62.16        56.47    62.73     14.95     19.62        13.41    17.24
LST Adaptive        24.41     37.26        31.16    37.14      7.51     17.94         7.40    14.62
Estimated slownesses plotted in *Fig. 5, **Fig. 8, †Fig. 6, ††Fig. 9, and ‡Fig. 11.

Equation (29) is solved using Lagrange multipliers, with the constraint $\|d_l\|_2 = 1$. The Lagrangian function is

$$\Phi(d_l, \lambda) = \sum_{i:\, l \in K(D, y_i)} \mathrm{abs}(d_l^T y_i) - \lambda (d_l^T d_l - 1), \qquad (30)$$

with $\lambda$ the Lagrange multiplier. Differentiating (30) with respect to $d_l$ gives

$$\frac{\partial \Phi}{\partial d_l} = \sum_{i:\, l \in K(D, y_i)} \mathrm{sign}(d_l^T y_i)\, y_i - 2 \lambda d_l. \qquad (31)$$

The stationary point of (30), per (31), gives the update for $d_l$:

$$d_l^{\mathrm{new}} = \lambda_l \sum_{i:\, l \in K(D^{\mathrm{old}},\, y_i)} \mathrm{sign}\big( (d_l^{\mathrm{old}})^T y_i \big)\, y_i, \qquad (32)$$

where $\lambda_l = 1/(2\lambda)$. The complexity of each ITKM iteration is dominated by matrix multiplication, $O(nQI)$, which is much less than K-SVD [5], which for each iteration calculates the SVD of an $n \times I$ matrix $Q$ times.
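For concreteness, a minimal Python sketch of one ITKM iteration per (27)–(32) follows. This is a sketch under the stated assumptions of unit-norm atoms and centered training patches, with names and structure of our choosing rather than the paper's Table II; the cost is dominated by the D^T Y product, O(nQI):

```python
import numpy as np

def itkm_iteration(D, Y, T):
    """One ITKM iteration, per (27)-(32).

    D : (n, Q) dictionary with unit-norm columns
    Y : (n, I) centered training patches y_i
    T : sparsity level
    """
    corr = D.T @ Y                                 # d_t^T y_i for all atoms/patches, O(nQI)
    # Thresholding step (27): per patch, the T atoms with largest |d_t^T y_i|.
    K = np.argsort(-np.abs(corr), axis=0)[:T, :]   # (T, I) support sets
    D_new = np.zeros_like(D)
    # Update step (32): signed sum of the patches whose support contains atom l.
    for i in range(Y.shape[1]):
        for l in K[:, i]:
            D_new[:, l] += np.sign(corr[l, i]) * Y[:, i]
    # Normalization enforces ||d_l||_2 = 1, absorbing the multiplier lambda_l in (32).
    norms = np.linalg.norm(D_new, axis=0)
    used = norms > 0
    D_new[:, used] /= norms[used]
    D_new[:, ~used] = D[:, ~used]                  # keep atoms with empty support
    return D_new
```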

REFERENCES

[1] N. Rawlinson, S. Pozgay, and S. Fishwick, “Seismic tomography: a window into deep earth,” Phys. Earth and Planetary Interiors, vol. 178, no. 3, pp. 101–135, 2010.

[2] R. C. Aster, B. Borchers, and C. H. Thurber, Parameter estimation and inverse problems, 2nd ed. Elsevier, San Diego, 2013.

[3] M. Elad, Sparse and Redundant Representations. Springer, New York, 2010.

[4] J. Mairal, F. Bach, and J. Ponce, “Sparse modeling for image and vision processing,” Found. Trends Comput. Graph. Vis., vol. 8, no. 2–3, pp. 85–283, 2014.

[5] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process., vol. 54, pp. 4311–4322, 2006.

[6] K. Schnass, “Local identification of overcomplete dictionaries,” J. Mach. Learning Research, vol. 16, pp. 1211–1242, 2015.

[7] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans. Image Process., vol. 15, no. 12, pp. 3736–3745, 2006.

[8] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online dictionary learning for sparse coding,” ACM Proc. 26th Inter. Conf. Mach. Learn., pp. 689–696, 2009.

[9] J. Mairal, F. Bach, and J. Ponce, “Task-driven dictionary learning,” IEEE Trans. Pattern Anal. Mach. Intell., no. 4, pp. 791–804, 2012.

[10] A. Tarantola, Inverse problem theory. Elsevier Sci. Pub. Co., Inc., 1987.

[11] Y. Lin and L. Huang, “Quantifying subsurface geophysical properties changes using double-difference seismic-waveform inversion with a modified total-variation regularization scheme,” Geophys. J. Int., vol. 203, no. 3, pp. 2125–2149, 2015.

[12] X. Zhang and J. Zhang, “Model regularization for seismic traveltime tomography with an edge-preserving smoothing operator,” J. Appl. Geophys., vol. 138, pp. 143–153, 2017.

[13] L. Y. Chiao and B. Y. Kuo, “Multiscale seismic tomography,” Geophys. J. Int., vol. 145, no. 2, pp. 517–527, 2001.

[14] R. Hawkins and M. Sambridge, “Geophysical imaging using trans-dimensional trees,” Geophys. J. Int., vol. 203, no. 2, pp. 972–1000, 2015.

Fig. 11: Conventional, TV, and LST tomography for 100 checkerboard and smooth-discontinuous slowness map configurations, each with 10 realizations of travel time error (σε = 0.02t). (a,c) RMSE error for the 100 cases, and (b,d) mean slowness for 3 example cases from (a,c). RMSE (ms/km, (24)) is printed on 2D slownesses.

[15] I. Loris, G. Nolet, I. Daubechies, and F. A. Dahlen, “Tomographic inversion using ℓ1-norm regularization of wavelet coefficients,” Geophys. J. Int., vol. 170, no. 1, pp. 359–370, 2007.

[16] I. Loris, H. Douma, G. Nolet, I. Daubechies, and C. Regone, “Nonlinear regularization techniques for seismic tomography,” J. Comput. Phys., vol. 229, no. 3, pp. 890–905, 2010.

[17] J. Charlety, S. Voronin, G. Nolet, I. Loris, F. J. Simons, K. Sigloch, and I. C. Daubechies, “Global seismic tomography with sparsity constraints: comparison with smoothing and damping regularization,” J. Geophys. Research: Solid Earth, vol. 118, no. 9, pp. 4887–4899, 2013.

[18] H. Fang and H. Zhang, “Wavelet-based double-difference seismic tomography with sparsity regularization,” Geophys. J. Int., vol. 199, no. 2, pp. 944–955, 2014.

[19] B. A. Olshausen and D. J. Field, “Sparse coding with an overcomplete basis set: a strategy employed by V1?” Vis. Res., no. 37, pp. 3311–3325, 1997.

[20] H. L. Taylor, S. C. Banks, and J. F. McCoy, “Deconvolution with the ℓ1-norm,” Geophys., vol. 44, no. 1, pp. 39–52, 1979.

[21] N. R. Chapman and I. Barrodale, “Deconvolution of marine seismic data using the l1 norm,” Geophys. J. Int., vol. 72, no. 1, pp. 93–100, 1983.


Fig. 12: (a) LST algorithm travel time RMSE convergence vs. iteration (Table I) and (b) slowness RMSE vs. sparsity level T with dictionary learning. Results shown with and without travel time error, corresponding to the checkerboard (Fig. 5(i), 8(g)) and smooth-discontinuous (Fig. 6(m), 9(i)) slowness maps.

[22] M. Lustig, D. Donoho, and J. M. Pauly, “Sparse MRI: the application of compressed sensing for rapid MR imaging,” Magn. Reson. Med., vol. 58, no. 6, pp. 1182–1195, 2007.

[23] S. Ravishankar and Y. Bresler, “MR image reconstruction from highly undersampled k-space data by dictionary learning,” IEEE Trans. Med. Imag., vol. 30, no. 5, pp. 1028–1041, 2011.

[24] A. Xenaki, P. Gerstoft, and K. Mosegaard, “Compressive beamforming,” J. Acoust. Soc. Am., vol. 136, no. 1, pp. 260–271, 2014.

[25] P. Gerstoft, C. F. Mecklenbrauker, A. Xenaki, and S. Nannuru, “Multisnapshot sparse Bayesian learning for DOA,” IEEE Signal Process. Lett., vol. 23, no. 10, pp. 1469–1473, 2016.

[26] K. L. Gemba, W. S. Hodgkiss, and P. Gerstoft, “Adaptive and compressive matched field processing,” J. Acoust. Soc. Am., vol. 141, no. 1, pp. 92–103, 2017.

[27] K. L. Gemba, S. Nannuru, P. Gerstoft, and W. S. Hodgkiss, “Multi-frequency sparse Bayesian learning for robust matched field processing,” J. Acoust. Soc. Am., vol. 141, no. 5, pp. 3411–3420, 2017.

[28] M. Taroudakis and C. Smaragdakis, “De-noising procedures for inverting underwater acoustic signals in applications of acoustical oceanography,” EuroNoise 2015 Maastricht, pp. 1393–1398, 2015.

[29] T. Wang and W. Xu, “Sparsity-based approach for ocean acoustic tomography using learned dictionaries,” OCEANS 2016 Shanghai IEEE, pp. 1–6, 2016.

[30] M. Bianco and P. Gerstoft, “Compressive acoustic sound speed profile estimation,” J. Acoust. Soc. Am., vol. 139, no. 3, pp. EL90–EL94, 2016.

[31] ——, “Dictionary learning of sound speed profiles,” J. Acoust. Soc. Am., vol. 141, no. 3, pp. 1749–1758, 2017.

[32] ——, “Regularization of geophysical inversion using dictionary learning,” IEEE Int. Conf. Acoust., Speech, and Signal Process. (ICASSP), 2017.

[33] Y. Choo and W. Seong, “Compressive sound speed profile inversion using beamforming results,” Remote Sens., vol. 10, no. 5, p. 704, 2018.

[34] S. Beckouche and J. Ma, “Simultaneous dictionary learning and denoising for seismic data,” Geophys., vol. 79, no. 3, pp. 27–31, 2014.

[35] Y. Chen, “Fast dictionary learning for noise attenuation of multidimensional seismic data,” Geophys. J. Int., vol. 209, no. 1, pp. 21–31, 2017.

[36] L. Zhu, E. Liu, and J. H. McClellan, “Sparse-promoting full waveform inversion based on online orthonormal dictionary learning,” Geophys., vol. 82, no. 2, pp. R87–R107, 2017.

[37] D. Li and J. M. Harris, “Full waveform inversion with nonlocal similarity and gradient domain adaptive sparsity-promoting regularization,” ArXiv e-prints, Mar. 2018, https://arxiv.org/abs/1803.11391.

[38] C. D. Rodgers, Inverse methods for atmospheric sounding: theory and practice. World Sci. Pub. Co., 2000.

[39] M. P. Barmin, M. H. Ritzwoller, and A. L. Levshin, “A fast and reliable method for surface wave tomography,” Pure and Appl. Geo., vol. 158, no. 8, pp. 1351–1375, 2001.

[40] M. Aharon and M. Elad, “Sparse and redundant modeling of image content using an image-signature-dictionary,” SIAM J. Image Sci., vol. 1, no. 3, pp. 228–247, 2008.

[41] Y. Huang, M. K. Ng, and Y.-W. Wen, “A fast total variation minimization method for image restoration,” Multiscale Model. Simul., no. 2, pp. 774–795, 2008.

[42] C. C. Paige and M. A. Saunders, “LSQR: an algorithm for sparse linear equations and sparse least squares,” ACM Trans. Math. Software, vol. 8, no. 1, pp. 43–71, 1982.

[43] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, “Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition,” 27th Annual Asilomar Conference on Signals, Systems and Computers, IEEE Proc., pp. 40–44, 1993.

[44] A. Gersho and R. M. Gray, Vector quantization and signal compression. Kluwer Academic, Norwell, MA, 1991.

[45] A. Hyvarinen, J. Hurri, and P. O. Hoyer, Natural Image Statistics: A Probabilistic Approach to Early Computational Vision. Springer Sci. Bus. Media, 2009.

[46] I. Fedorov, B. D. Rao, and T. Q. Nguyen, “Multimodal sparse Bayesian dictionary learning applied to multimodal data classification,” IEEE Conf. Acoust., Speech and Sig. Proc. (ICASSP), pp. 2237–2241, 2017.

[47] A. Chambolle, “An algorithm for total variation minimization and applications,” J. Math. Imaging and Vision, vol. 20, no. 1–2, pp. 89–97, 2004.

[48] M. Zhu, S. J. Wright, and T. F. Chan, “Duality-based algorithms for total-variation-regularized image restoration,” Comput. Opt. and App., vol. 47, no. 3, pp. 377–400, 2010.

[49] R. C. Gonzalez and R. E. Woods, Digital image processing, 3rd ed. Pearson Ed., 2008.

[50] T. Bodin, M. Sambridge, and K. Gallagher, “A self-parametrizing partition model approach to tomographic inverse problems,” Inverse Problems, vol. 25, no. 5, 2009.

[51] C. M. Verlinden, J. Sarkar, W. S. Hodgkiss, W. A. Kuperman, and K. G. Sabra, “Passive acoustic source localization using sources of opportunity,” J. Acoust. Soc. Am. Expr. Let., vol. 138, no. 1, pp. EL54–EL59, 2015.

[52] P. Annibale, J. Filos, P. A. Naylor, and R. Rabenstein, “TDOA-based speed of sound estimation for air temperature and room geometry inference,” IEEE Trans. Audio, Speech, and Lang. Process., vol. 21, no. 2, pp. 234–246, 2013.

[53] R. Rabenstein and P. Annibale, “Acoustic source localization under variable speed of sound conditions,” Wireless Commun. Mob. Comp., 2017.

Michael J. Bianco received his B.Sc. in Aeronautical and Astronautical Engineering from the Purdue University College of Engineering in 2007 and his M.Sc. from the University of California San Diego Scripps Institution of Oceanography in 2015. From 2008–2012 he was an engineer with Rocketdyne, Los Angeles, USA. He is currently pursuing his Ph.D. at the University of California San Diego in the Marine Physical Laboratory. His research interests include signal processing and machine learning, with application to acoustic and seismic inverse problems.

Peter Gerstoft received the Ph.D. from the Technical University of Denmark, Lyngby, Denmark, in 1986. From 1987–1992 he was with Ødegaard and Danneskiold-Samsøe, Copenhagen, Denmark. From 1992–1997 he was at the NATO Undersea Research Centre, La Spezia, Italy. Since 1997, he has been with the Marine Physical Laboratory, University of California, San Diego. His research interests include modeling and inversion of acoustic, elastic, and electromagnetic signals.

Dr. Gerstoft is a Fellow of the Acoustical Society of America and an elected member of the International Union of Radio Science, Commission F.