Graph regularized seismic dictionary learning
Lina Liu¹, Jianwei Ma², Gerlind Plonka³

¹Department of Mathematics and Institute of Geophysics, Harbin Institute of Technology, Harbin, China, [email protected]
²Department of Mathematics and Institute of Geophysics, Harbin Institute of Technology, Harbin, China, [email protected]
³Institute for Numerical and Applied Mathematics, University of Göttingen, Göttingen, Germany, [email protected]
ABSTRACT
A graph-based regularization for geophysical inversion is proposed that offers a more efficient way to solve denoising inverse problems by dictionary learning, i.e., by methods designed to find a sparse signal representation that adaptively captures the prominent characteristics of the given data. Most traditional dictionary learning methods convert 2D seismic data patches or 3D data volumes into 1D vectors for training, but this conversion destroys the inherent geometric structure of the data. To overcome this problem, two graph-regularized dictionary learning algorithms based on clustering and SVD methods, named FDL-Graph and SDL-Graph, are proposed. First, the clustering-based dictionary learning method divides the training patches into regions/clusters of similar geometric structure, employing a simplified principal component analysis method or the L2 distance function to perform such a clustering effectively. Next, the clusters are used to learn a basis or a frame that best describes the patches. Besides employing an adapted dictionary for sparse data representation, we also take the local geometric structure of the seismic data into account using a Laplacian graph. Since this novel denoising model adds a graph regularization term to the sparse representation model to emphasize correlations among data patches, the proposed method leads to smoother results and achieves better denoising performance than existing methods without graph regularization, both in terms of peak signal-to-noise ratio values and visual assessment of weak-event preservation. Comparisons of experimental results on synthetic and recorded seismic data with traditional FX deconvolution and curvelet thresholding methods are also provided.
INTRODUCTION
Geophysical inversion plays a key role in seismic exploration processing for tasks such as interpolation, denoising, migration/imaging, and full waveform inversion. However, these inverse problems are typically ill-posed, which makes it difficult to achieve satisfactory results. Finding and building a regularization model based on prior knowledge is an important step towards solving such problems. Examples include Tikhonov regularization, total variation regularization, and L1-norm regularization for smooth constraints, piece-wise smooth constraints, and sparsity-promoting constraints, respectively. The graph regularization that we consider in this work is a nonlocal constraint that encodes prior knowledge in a Laplacian graph. While this type of regularization has previously been applied in image processing, to our knowledge this is its first use for geophysical processing and inversion.
Denoising is still one of the most fundamental, widely studied, and significant problems in seismic data processing. It is worth noting that denoising is itself a geophysical inversion problem. The purpose of denoising or restoration is to estimate the original seismic data from noisy seismic data. A classical approach is the least squares method, where the estimator is chosen to minimize the data error. This method is usually already applied during the acquisition process in order to achieve data sets with a lower noise level. To improve denoising results, regularization methods are required that provide stable and unique solutions. In addition, regularization methods serve to impose desired features on subsurface images. Quadratic regularization methods such as Tikhonov regularization tend to produce models in which discontinuities are blurred; they have been used to solve the acoustic inverse problem of crosshole seismology (Reiter and Rodi, 1996). On the other hand, non-quadratic regularization methods such as total variation can provide high-resolution images of the subsurface, where edges and discontinuities are properly preserved. For example, Anagaw and Sacchi (2012) used total variation regularization, where sparsity is imposed on the gradient of the model parameters, allowing the capture of edges that are not horizontally or vertically aligned. Another regularization method that has attracted renewed interest and considerable attention in the signal processing literature is L1-norm regularization. This approach is popular for seismic data processing tasks such as seismic data denoising and interpolation (Herrmann and Hennenfent, 2008).
However, most regularization methods do not exploit the full geometric structure of the data. To overcome this issue, additional regularizers are incorporated in order to satisfy specific requirements in various scenarios. For example, the graph regularization method describes the relationship between data points via manifold learning. The common assumption is that if two data points are close in the intrinsic data structure, then their representations in any other domain are close as well. Various manifold learning methods have been proposed to explore such structures (Belkin and Niyogi, 2003; Liu et al., 2014; Zheng et al., 2011). In recent years, graph regularization has been used for describing the relationship between image patches (Elmoataz et al., 2008; Bougleux et al., 2009; Kheradmand and Milanfar, 2014), e.g. by building a k-nearest neighbor graph to encode the geometric information in the data. Such graph-based methods have been used for image denoising (Tang et al., 2013; Yankelevsky and Elad, 2016), image representation (Zheng et al., 2011), and image super-resolution (Lu et al., 2012).
Dictionaries that allow a sparse transform offer a convenient way to implement seismic data denoising. Whether denoising is effective largely depends on the selection of the dictionary or the choice of sparse transform. Wavelet transforms have often been used for solving seismic data denoising problems (see e.g. Chanerley and Alexander, 2002). Zhang and Ulrych (2003) presented a physical wavelet frame for seismic data denoising that used the characteristics of seismic data. Curvelets are now one of the most popular tools for seismic data representation and denoising (Hennenfent and Herrmann, 2006; Neelamani et al., 2008; Ma and Plonka, 2010). In addition to such sparse representation methods, Bonar and Sacchi (2012) applied a nonlocal means algorithm to seismic data denoising by utilizing similar samples or pixels in spatial proximity.
A popular and highly effective new approach for solving the seismic denoising problem is the sparse representation of the seismic data over a trained dictionary. Compared to sparse transform denoising methods (Pan et al., 1999; Starck et al., 2002), a dictionary trained by a dictionary learning method can provide a sparser representation of seismic data. Different dictionary learning methods have already been applied to seismic data denoising. Bechouche and Ma (2014) used the method of optimal directions (MOD) to train the dictionary (see also Engan et al., 1999), while Yu et al. (2015) proposed a dictionary learning seismic data denoising method based on a data-driven tight frame (DDTF). However, almost all patch-based dictionary learning denoising methods convert seismic data patches or volumes into one-dimensional (1D) vectors for training, and thereby lose the inherent 2D or 3D geometric structure of the seismic data.
To overcome this limitation, this work focuses on processing the graph structure of seismic data patches via sparse representation modeling, which considers the geometric structure of the sparse coefficients in order to capture the intrinsic geometric structure of the seismic data. The method is a two-stage approach consisting of a dictionary learning stage and a sparse coding stage. In the first stage, the 2D training data patches are clustered into subsets with similar structures. In order to perform such a clustering efficiently, we employ a generalized PCA method and the Euclidean distance function. We then learn the dictionary from the clusters. In the second stage, we add the intrinsic geometric structure of the data through a graph regularization and use the learned dictionary to provide a sparse representation of the noisy data patches. Finally, all of the denoised patches are combined to obtain the denoised seismic data.
The rest of the paper is organized as follows. In the next section we provide a brief description of the sparse representation problem, and then propose two graph-based dictionary learning methods for solving the denoising problem by combining graph regularization with clustering- and SVD-based dictionary learning, called FDL-Graph and SDL-Graph. The corresponding optimization schemes and algorithms are also presented. Experimental results and comparative discussions using simulated and recorded seismic data follow in the subsequent section. A conclusion and final thoughts complete the paper.
METHOD
Sparse representation by learned dictionaries and graph regularization
One essential tool in image denoising is the sparse representation of the image in an over-complete dictionary, i.e., we want to approximate the data using a linear combination of a relatively small number of atoms from the dictionary. Here the choice of the dictionary plays an important role: its atoms need to capture the important structures that are typical for seismic images.
Let us first fix some notation. Let $\{Y_1,\dots,Y_m\}$ be a given training set of patches of seismic data. One may think of $Y_j$ as $(n\times n)$ patches with relatively small $n$, e.g. $n=8$. Further, let $Y := [y_1 \dots y_m]$ be an $N\times m$ matrix with $N := n^2$, where the columns $y_j = \mathrm{col}\,Y_j \in \mathbb{R}^N$ are the vectorized patches. With $D := [d_1 \dots d_k] \in \mathbb{R}^{N\times k}$ we denote the dictionary, where each column $d_i \in \mathbb{R}^N$ (or the reshaped patch $D_i \in \mathbb{R}^{n\times n}$ with $d_i = \mathrm{col}\,D_i$, respectively) is one atom of the dictionary.
Once we have determined a suitable dictionary $D$, a sparsity-promoting optimization problem can be formulated as
$$\min_{X\in\mathbb{R}^{k\times m}} \Big( \frac{1}{2}\|Y - DX\|_F^2 + \lambda \sum_{i=1}^{m} \|x_i\|_0 \Big), \qquad (1)$$
where $X = [x_1,\dots,x_m] \in \mathbb{R}^{k\times m}$ denotes the matrix of sparse coefficient vectors, such that the set of training patches $Y = [y_1 \dots y_m]$ is sparsely represented by $DX$. Here, $\lambda$ is a regularization parameter, $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, and $\|x_i\|_0$ counts the number of non-zero entries of $x_i$. Note that this optimization problem is NP-hard.
Therefore, $\|\cdot\|_0$ is usually replaced by the convex norm $\|\cdot\|_1$, and the relaxed optimization problem reads
$$\min_{X\in\mathbb{R}^{k\times m}} \Big( \frac{1}{2}\|Y-DX\|_F^2 + \lambda\sum_{i=1}^{m} \|x_i\|_1 \Big) = \min_{X\in\mathbb{R}^{k\times m}} \Big( \frac{1}{2}\|Y-DX\|_F^2 + \lambda\|X\|_1 \Big), \qquad (2)$$
where $\|X\|_1 := \sum_{i=1}^{m} \|x_i\|_1 = \sum_{i=1}^{m}\sum_{j=1}^{k} |x_{i,j}|$ is the sum of the moduli of all entries of $X$.
X. The model (2) has been widely used for data denoising in the last decade, and many
7
algorithms have been developed to solve these optimization problems efficiently, see e.g.
Beck and Teboulle (2009); Chambolle and Pock (2011); Needell and Vershynin (2010).
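As a simple illustration, problem (2) with a fixed dictionary can be tackled by the classical iterative soft-thresholding (proximal gradient) scheme of Beck and Teboulle (2009). The following sketch (Python with NumPy; the function names and the fixed iteration count are our own choices, not prescribed by the paper) shows the idea:

```python
import numpy as np

def soft_threshold(V, t):
    """Entrywise soft shrinkage: sign(V) * max(|V| - t, 0)."""
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

def ista(Y, D, lam, n_iter=200):
    """Minimize 0.5*||Y - D X||_F^2 + lam*||X||_1 for fixed D by
    proximal gradient steps of size 1/L, where L is the Lipschitz
    constant of the gradient (largest eigenvalue of D^T D)."""
    L = np.linalg.norm(D, 2) ** 2            # squared spectral norm of D
    X = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ X - Y)             # gradient of the data-fit term
        X = soft_threshold(X - grad / L, lam / L)
    return X
```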
We extend this model for sparse representation of seismic images by two important
ingredients, dictionary learning and graph regularization.
Dictionary learning. We want to employ a dictionary $D$ that is adaptively learned from the data. In machine learning and in other applications, see e.g. Aharon et al. (2006); Elad and Aharon (2006); Dong et al. (2013), the following model has already been studied,
$$\min_{X\in\mathbb{R}^{k\times m},\, D\in\mathbb{R}^{N\times k}} \Big( \frac{1}{2}\|Y-DX\|_F^2 + \lambda\|X\|_* \Big) \qquad (3)$$
with $\|\cdot\|_*$ being either $\|\cdot\|_0$ or $\|\cdot\|_1$, and where K-SVD is employed for dictionary learning. The approach is based on alternating optimization: for fixed $D$, an improved sparse matrix $X$ is computed (e.g. by the OMP method), and for fixed $X$, the dictionary $D$ is updated using K-SVD. However, K-SVD is very expensive. In particular, many iteration steps are needed, since the atoms of the dictionary are updated one by one, and the two-dimensional structure of the set of training patches is not explicitly used: all patches are considered only in vectorized form.
Therefore, other methods for dictionary learning have been proposed, aiming at a cheaper technique to replace K-SVD. For example, in Cai et al. (2014) and Liu et al. (2017) an additional a priori dictionary structure is imposed to reduce the number of parameters, such as a block-wise Toeplitz structure or (directional) tensor-product frames. Motivated by ideas in Zeng et al. (2015), we will propose two different adaptive dictionary learning methods in the next subsection. These methods exploit the two-dimensional geometric structure of the training data in a more direct way and are considerably cheaper than K-SVD.
Graph regularization. The second ingredient is an extension of the functional in (3) by a further term that measures the similarity between the image patches. Here we follow ideas in Zheng et al. (2011) and Yankelevsky and Elad (2016) and employ a graph that represents the internal topology of the set of training patches.

For the given set of training patches $Y_1,\dots,Y_m$, we construct a weighted undirected complete graph $G(V,E,W)$, where the finite set $V$ of $m$ vertices represents the given patches, $V = \{Y_1,\dots,Y_m\}$. Further, $E = V\times V$ is a set of weighted edges, and these weights are collected in the weight matrix $W \in \mathbb{R}^{m\times m}$. We measure the similarity of the training patches $Y_i$ and $Y_j$ simply by the norm of their difference, $\|Y_i - Y_j\|_F$. For a pre-defined $k$, we fix the $k$ nearest neighbors of $Y_i$ (different from $Y_i$ itself) by inspecting $\|Y_i - Y_j\|_F^2$ for $j \in \{1,\dots,m\}\setminus\{i\}$ and define the symmetric weight matrix $W = (W_{i,j})_{i,j=1}^{m}$ by $W_{i,j} = 1$ if $Y_j$ is among the $k$ nearest neighbors of $Y_i$ or if $Y_i$ is among the $k$ nearest neighbors of $Y_j$, and $W_{i,j} = 0$ otherwise. In particular, $W_{i,i} = 0$ for $i = 1,\dots,m$. Thus, each row (or column) of $W$ contains at least $k$ ones. The process of graph building is illustrated in Figure 1. The degree of each vertex $Y_i$, i.e., the number of all edges with weight 1 incident to $Y_i$, is given by $\Delta_i = \sum_{j=1}^{m} W_{i,j}$. Introducing the diagonal matrix $\Delta = \mathrm{diag}(\Delta_1,\dots,\Delta_m)$, the graph Laplacian of $G$ is given by $L = \Delta - W$. By construction, $L$ is symmetric and positive semidefinite, its off-diagonal entries are non-positive, and the entries in each column (or row) sum to zero.
A direct computation shows that
$$\mathrm{Tr}(YLY^T) = \sum_{i,j=1}^{m} W_{i,j}\,\|Y_i - Y_j\|_F^2 = \sum_{i,j=1}^{m} W_{i,j}\,\|y_i - y_j\|_2^2 = \sum_{Y_i\sim Y_j}\|Y_i - Y_j\|_F^2,$$
measuring the similarity of neighboring patches in the graph, where we use the notation $Y_i \sim Y_j$ if $W_{i,j} = 1$. For each $j$, the vector $Dx_j$ is assumed to be a good approximation of $y_j$. Since the (fixed) transform matrix $D$ induces a linear mapping, we can suppose that the vectors $x_i$, $i = 1,\dots,m$, possess a similar topological structure as the $y_i$, $i = 1,\dots,m$; in particular, if $y_i$ and $y_j$ are $k$-neighbors with a small distance $\|y_i - y_j\|_2$, then $\|x_i - x_j\|_2$ is small as well. Therefore, we incorporate the term
$$\mathrm{Tr}(XLX^T) = \sum_{i,j=1}^{m} W_{i,j}\,\|x_i - x_j\|_2^2 = \sum_{Y_i\sim Y_j}\|x_i - x_j\|_2^2$$
and obtain the new minimization problem
$$\min_{X\in\mathbb{R}^{k\times m},\, D\in\mathbb{R}^{N\times k}} \frac{1}{2}\|Y - DX\|_F^2 + \frac{\alpha}{2}\mathrm{Tr}(XLX^T) + \lambda\|X\|_1, \qquad (4)$$
where the Laplacian matrix $L$ only depends on the training data $Y$ that generate the graph. Note that this model is simpler than the model considered in Yankelevsky and Elad (2016), since we have not incorporated the graph Laplacian into the dictionary learning process.
We remark that the graph $G$ can also be defined by other weight matrices, e.g.
$$W_{i,j} = \begin{cases} \frac{1}{2\pi h^2}\exp\!\Big(-\frac{\|Y_i - Y_j\|_F^2}{2h^2}\Big) & \text{for } i \neq j,\\[1ex] 0 & \text{for } i = j, \end{cases}$$
using the Gaussian kernel and some parameter $h$, or alternatively by employing a thresholded kernel to achieve a sparse matrix $L$. Note, however, that the graph construction, and hence the graph Laplacian, is already determined by the vectorized training patches, since the Frobenius norm does not exploit any two-dimensional structure of the training patches $Y_1,\dots,Y_m$. Therefore, these two-dimensional features have to be incorporated into the dictionary by the dictionary learning method.
Denoising procedure. For the special denoising application, we employ the given noisy seismic images themselves to acquire the set of training patches $\{Y_1,\dots,Y_m\}$, i.e., we take patches of the noisy images. We propose a new denoising scheme based on the minimization problem (4). It consists of the steps presented in Algorithm 1.
Algorithm 1: Denoising algorithm based on dictionary learning and graph regularization

Input:
Noisy training data $Y = [y_1 \dots y_m]$
Number of iterations
Parameters $k$, $\alpha$ and $\lambda$

Algorithm:
1: Set $Y_D := Y$.
Loop steps 2-5 until the given number of iterations is reached:
2: Compute the graph Laplacian $L$ for the given training set $Y_D$.
3: Determine the dictionary $D$ by a dictionary learning algorithm based on $Y_D$.
4: Solve the minimization problem $\min_{X\in\mathbb{R}^{k\times m}} \frac{1}{2}\|Y - DX\|_F^2 + \frac{\alpha}{2}\mathrm{Tr}(XLX^T) + \lambda\|X\|_1$.
5: Reconstruct the data $Y_D := DX$.

Output:
Denoised data $Y_D$
In the next subsections, we describe the two essential steps 3 and 4 of the algorithm in
more detail.
For step 2, to compute the Laplacian $L := (L_{i,j})_{i,j=1}^{m}$, we can simply proceed as follows. First we compute the symmetric matrix of all distances $\|Y_i - Y_j\|_F^2$. Then, in each row $i = 1,\dots,m$, we order the nonzero values by size and collect the indices $j$ corresponding to the $k$ smallest distances in the index set $I_i$. For $i, j \in \{1,\dots,m\}$ with $i \neq j$ we put
$$L_{i,j} := \begin{cases} -1 & \text{for } j \in I_i \text{ or } i \in I_j,\\ 0 & \text{otherwise.} \end{cases}$$
Finally, we set $L_{i,i} = \sum_{j=1,\, j\neq i}^{m} |L_{i,j}|$.
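A direct implementation of this construction might look as follows (a Python/NumPy sketch under the definitions above; the function name and the brute-force distance computation are our own choices):

```python
import numpy as np

def knn_graph_laplacian(patches, k=6):
    """Build the k-nearest-neighbor graph Laplacian L = Delta - W
    from a list of 2D patches, using squared Frobenius distances."""
    m = len(patches)
    flat = np.stack([p.ravel() for p in patches])              # vectorized patches y_i
    d2 = ((flat[:, None, :] - flat[None, :, :]) ** 2).sum(-1)  # ||Y_i - Y_j||_F^2
    W = np.zeros((m, m))
    for i in range(m):
        idx = np.argsort(d2[i])
        idx = idx[idx != i][:k]        # k nearest neighbors of Y_i (excluding itself)
        W[i, idx] = 1.0
    W = np.maximum(W, W.T)             # symmetrize: edge if either patch is a neighbor
    return np.diag(W.sum(axis=1)) - W  # L = Delta - W
```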
Methods for dictionary learning
We will propose here two dictionary learning methods to perform step 3 of Algorithm 1.
Both are based on a special partition tree structure. We construct the dictionaries in two
steps. First we compute a special tree structure to partition the set of our training patches.
Then, in a second step, we compute the dictionary based on the obtained subset partitions
in the tree. The first step, the tree construction, will be different for our two proposed
methods, while the method to determine the dictionary from the tree structure will be the
same.
First dictionary construction (FDL). Motivated by the ideas in Zeng et al. (2015), we employ for the first dictionary learning method a top-bottom two-dimensional subspace partition (TTSP) as follows.

Step 1: Construction of the partition tree. For the given training set $Y_1,\dots,Y_m \in \mathbb{R}^{n\times n}$ of image patches we compute the mean
$$C := \frac{1}{m}\sum_{i=1}^{m} Y_i \in \mathbb{R}^{n\times n}$$
and the two $(n\times n)$ covariance matrices
$$C_L := \frac{1}{m}\sum_{i=1}^{m}(Y_i - C)(Y_i - C)^T, \qquad C_R := \frac{1}{m}\sum_{i=1}^{m}(Y_i - C)^T(Y_i - C).$$
Observe that $C_L$ and $C_R$ possess the same eigenvalues. Now the normalized eigenvectors $u$ and $v$ corresponding to the maximal eigenvalues of $C_L$ and $C_R$, respectively, are computed, i.e.,
$$u := \arg\max_{\|x\|_2=1} x^T C_L x, \qquad v := \arg\max_{\|x\|_2=1} x^T C_R x,$$
representing the main structures of the training patches that are not captured by the mean patch $C$. We compute the numbers
$$s_i := u^T Y_i v, \qquad i = 1,\dots,m.$$
These numbers measure how strongly each single patch is correlated with the found structure; they will be used to partition the set of all patches $\{Y_1,\dots,Y_m\}$ into two partial sets. For this purpose, we order the numbers by size,
$$s_{\ell_1} \le s_{\ell_2} \le \dots \le s_{\ell_m},$$
and compute
$$\kappa := \arg\min_{1\le\kappa\le m-1}\left[\sum_{r=1}^{\kappa}\Big(s_{\ell_r} - \frac{1}{\kappa}\sum_{\nu=1}^{\kappa}s_{\ell_\nu}\Big)^2 + \sum_{r=\kappa+1}^{m}\Big(s_{\ell_r} - \frac{1}{m-\kappa}\sum_{\nu=\kappa+1}^{m}s_{\ell_\nu}\Big)^2\right].$$
Using $\kappa$, the partition $\{Y_{\ell_1},\dots,Y_{\ell_\kappa}\} \cup \{Y_{\ell_{\kappa+1}},\dots,Y_{\ell_m}\}$ is derived. This is the k-means clustering method for the special case $k = 2$, which can simply be solved exactly for the set of numbers $\{s_{\ell_1},\dots,s_{\ell_m}\}$.
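Since the scores $s_i$ are scalars, this 2-means step can be solved exactly by checking all $m-1$ split positions of the sorted sequence. A minimal sketch (Python/NumPy; the function name is our own):

```python
import numpy as np

def two_means_split(s):
    """Exact 2-means clustering of the scalar scores s: try every split
    position of the sorted sequence and keep the one minimizing the sum
    of within-cluster squared deviations; returns the two index sets."""
    order = np.argsort(s)
    t = np.asarray(s)[order]
    best_kappa, best_cost = 1, np.inf
    for kappa in range(1, len(t)):
        left, right = t[:kappa], t[kappa:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_kappa, best_cost = kappa, cost
    return order[:best_kappa], order[best_kappa:]
```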
Having found this first partition, we can proceed to partition the two obtained subsets further using the same scheme. This procedure yields a binary tree, where we stop partitioning a subset if it contains fewer than a predefined number of elements. While this partitioning procedure is equivalent to the first part of the TTSP algorithm proposed in Zeng et al. (2015), we compose the data-dependent dictionary differently from Zeng et al. (2015), as follows.
Step 2: Determine the dictionary from the partition tree. Each node $k$ in the tree is associated with a subset of training patches $\{Y_j\}_{j\in\Lambda_k}$, where $\Lambda_k \subset \{1,\dots,m\}$ denotes the subset of indices of these patches. We assume that the root of the tree has the node number $k = 1$ (i.e., $\Lambda_1 = \{1,\dots,m\}$), and we number the nodes by going through each level from left to right. For each node $k$, we compute the mean (center)
$$C_k := \frac{1}{|\Lambda_k|}\sum_{i\in\Lambda_k} Y_i$$
and the normalized singular vectors corresponding to the maximal singular value of $C_k C_k^T$ and $C_k^T C_k$, i.e.,
$$u_k := \arg\max_{\|x\|_2=1} x^T C_k C_k^T x, \qquad v_k := \arg\max_{\|x\|_2=1} x^T C_k^T C_k x.$$
If $\lambda_k$ denotes the maximal singular value of $C_k$, then $\lambda_k u_k v_k^T$ is the best rank-1 approximation of $C_k$, since $u_k$ and $v_k$ are the first vectors in the singular value decomposition of $C_k$.
The dictionary is now determined as follows. We fix the first dictionary element
$$D_1 := u_1 v_1^T,$$
capturing the main structure of the mean $C = C_1$. Further, for each pair of children nodes $2k$ and $2k+1$ of the same parent, with means $C_{2k}$ and $C_{2k+1}$, we set
$$\tilde D_k := \lambda_{2k} u_{2k} v_{2k}^T - \lambda_{2k+1} u_{2k+1} v_{2k+1}^T, \qquad D_k := \frac{\tilde D_k}{\|\tilde D_k\|_F},$$
thereby capturing the difference of the main structures of $C_{2k}$ and $C_{2k+1}$. Note that this procedure is essentially different from the construction in Zeng et al. (2015).
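In code, the atom associated with a pair of sibling clusters could be computed as in the following sketch (Python/NumPy, under the rank-1 choice described above; all names are our own):

```python
import numpy as np

def rank1_atom(C):
    """Leading singular triple (lambda, u, v) of a center matrix C,
    so that lambda * u v^T is its best rank-1 approximation."""
    U, S, Vt = np.linalg.svd(C)
    return S[0], U[:, 0], Vt[0, :]

def difference_atom(patches_left, patches_right):
    """Dictionary atom for a pair of sibling clusters: the normalized
    difference of the rank-1 approximations of the two cluster centers."""
    s_l, u_l, v_l = rank1_atom(np.mean(patches_left, axis=0))
    s_r, u_r, v_r = rank1_atom(np.mean(patches_right, axis=0))
    Dk = s_l * np.outer(u_l, v_l) - s_r * np.outer(u_r, v_r)
    return Dk / np.linalg.norm(Dk, 'fro')
```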
Remarks. 1. Instead of approximating the centers $C_k$ by a rank-1 matrix, we can also use a more exact approximation by matrices of higher rank using the singular value decomposition. For sparse representation purposes, and if the patches do not contain noise, one may even use the centers $C_k$ directly instead of their lower-rank approximations.

2. The procedure avoids the problem of obtaining dictionary elements that are very similar. Strong similarity of dictionary elements occurs especially if only the means of the subsets in the leaves of the tree, or low-rank approximations of these means, are employed, see Zeng et al. (2015).

3. Our dictionary construction can be understood as a generalized wavelet approach, where the dictionary elements for $k \ge 2$ are wavelet-type elements while $D_1$ is the low-pass element.
Second dictionary construction (SDL). We now propose a second, less expensive method for dictionary learning.

Step 1: Construction of the partition tree. This time, we construct the partition tree in a different way, using the similarity of training patches as we already did for the graph regularization. Here, we use the weights
$$w_{i,j} := \|Y_i - Y_j\|_F^2 \qquad (5)$$
to measure the similarity between $Y_i$ and $Y_j$.
First we order the patches such that $Y_1$ has the smallest Frobenius norm. Now we compute the weights $w_{1,j}$ for $j = 1,\dots,m$ according to (5) and order them by size,
$$w_{1,\ell_1} \le w_{1,\ell_2} \le \dots \le w_{1,\ell_m},$$
where $\ell_1 = 1$, since $w_{1,1} = 0$ is the smallest weight. Now, similarly as before in the first construction, we divide the set of training patches into the two subsets $\{Y_{\ell_1},\dots,Y_{\ell_\kappa}\}$ and $\{Y_{\ell_{\kappa+1}},\dots,Y_{\ell_m}\}$ using
$$\kappa := \arg\min_{1\le\kappa\le m-1}\left[\sum_{r=1}^{\kappa}\Big(w_{1,\ell_r} - \frac{1}{\kappa}\sum_{\nu=1}^{\kappa}w_{1,\ell_\nu}\Big)^2 + \sum_{r=\kappa+1}^{m}\Big(w_{1,\ell_r} - \frac{1}{m-\kappa}\sum_{\nu=\kappa+1}^{m}w_{1,\ell_\nu}\Big)^2\right].$$
We proceed to partition the obtained subsets using the same procedure and obtain a binary tree, where each node is associated with a subset of training patches; a sketch of a single split step is given below.
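A single SDL split step could be sketched as follows (Python/NumPy, reusing the function two_means_split from the FDL sketch above; names are our own):

```python
import numpy as np

def sdl_split(patches):
    """One SDL partition step: take the patch with smallest Frobenius
    norm as reference and split all patches by their squared Frobenius
    distances to it, using the exact 2-means split from above."""
    norms = [np.linalg.norm(P, 'fro') for P in patches]
    ref = int(np.argmin(norms))                       # reference patch "Y_1"
    w = np.array([np.linalg.norm(P - patches[ref], 'fro') ** 2 for P in patches])
    return two_means_split(w)                         # index sets of the two subsets
```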
Step 2: Determine the dictionary from the partition tree. We proceed analogously to the first construction: the rank-1 approximation of the mean $C = C_1$ gives the first dictionary element, and the normalized differences of the rank-1 approximations of the centers of each pair of children in the partition tree give the further dictionary elements.

A toy example illustrating the two dictionary learning methods is given in Appendix B (see also Figure 2).
Methods to solve the minimization problem

In the fourth step of the proposed denoising Algorithm 1, we have to solve the minimization problem
$$\min_{X\in\mathbb{R}^{k\times m}} \Big( \frac{1}{2}\|Y - DX\|_F^2 + \frac{\alpha}{2}\mathrm{Tr}(XLX^T) + \lambda\|X\|_1 \Big) \qquad (6)$$
for the given (noisy) training data $Y$ and the dictionary $D = [d_1 \dots d_k]$, where $d_j = \mathrm{col}\,D_j \in \mathbb{R}^N$ are the dictionary elements constructed in the last subsection.
We suggest solving the problem using the split Bregman iteration, see e.g. Goldstein and Osher (2009); Plonka and Ma (2011), which in the considered case is equivalent to the Alternating Direction Method of Multipliers (ADMM), see Yankelevsky and Elad (2016). For other approaches we refer e.g. to Chambolle and Pock (2011) or to Lee et al. (2007). The details of the derivation for solving Eq. (6) are presented in Appendix A. We outline the following algorithm to perform step 4 of Algorithm 1.
Algorithm 2: Solving the minimization problem

Input:
Noisy training data $Y = [y_1 \dots y_m] \in \mathbb{R}^{N\times m}$
Laplacian matrix $L \in \mathbb{R}^{m\times m}$
Learned dictionary $D \in \mathbb{R}^{N\times k}$
$X^0 = Z^0 = B^0 = 0_{k\times m}$
Parameters $\lambda, \mu, \alpha > 0$
Number of iterations

Algorithm:
Iterate until the given number of iterations is reached:
1: Compute $X^{\ell+1}$ as the solution of $(D^T D + \mu I)X + \alpha XL = D^T Y + \mu(Z^\ell - B^\ell)$.
2: Compute $Z^{\ell+1}$ componentwise by soft shrinkage, $z_{i,j}^{\ell+1} = T_{\lambda/\mu}(x_{i,j}^{\ell+1} + B_{i,j}^\ell)$ for $i = 1,\dots,k$, $j = 1,\dots,m$.
3: Update $B^{\ell+1} = B^\ell - Z^{\ell+1} + X^{\ell+1}$.

Output: $X$
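A compact sketch of Algorithm 2 might look as follows (Python with NumPy/SciPy; using SciPy's Sylvester solver for step 1 is our own choice, as are all names and default parameters):

```python
import numpy as np
from scipy.linalg import solve_sylvester

def graph_sparse_coding(Y, D, L, lam=0.4, mu=0.05, alpha=1.6, n_iter=30):
    """Split Bregman / ADMM sketch of Algorithm 2 for problem (6).
    Step 1 is the Sylvester equation A X + X B = Q with
    A = D^T D + mu*I and B = alpha*L."""
    k, m = D.shape[1], Y.shape[1]
    Z = np.zeros((k, m))
    B = np.zeros((k, m))
    A = D.T @ D + mu * np.eye(k)
    for _ in range(n_iter):
        Q = D.T @ Y + mu * (Z - B)
        X = solve_sylvester(A, alpha * L, Q)                    # step 1
        V = X + B
        Z = np.sign(V) * np.maximum(np.abs(V) - lam / mu, 0.0)  # step 2: soft shrinkage
        B = B - Z + X                                           # step 3: Bregman update
    return X
```

The Sylvester equation in step 1 has a unique solution here because $D^T D + \mu I$ is positive definite and $\alpha L$ is positive semidefinite, as shown in Appendix A.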
EXPERIMENTS

In this section, we verify the seismic data denoising performance of the proposed FDL-Graph and SDL-Graph methods by comparing their performance to results without graph regularization, and highlight the advantages and disadvantages of using graph regularization. Comparisons to a traditional FX (frequency domain in the time direction and spatial domain in the trace direction) deconvolution (FX-Decon) algorithm (Canales, 1984) and state-of-the-art curvelet denoising by thresholding (Hennenfent and Herrmann, 2006) are also provided.
Determining an objective data quality metric plays an important role in denoising applications. The peak signal-to-noise ratio (PSNR) is given by
$$\mathrm{PSNR} = 20 \log_{10}\Big(\frac{255}{\mathrm{std2}(Y_C - Y_D)}\Big),$$
where $Y_D$ denotes the denoised data and $Y_C$ the clear data. The function std2 computes the standard deviation of $Y_C - Y_D$. Computation time (in seconds) and the recovery error
$$\mathrm{error} = \frac{\|Y_C - Y_D\|_F^2}{\|Y_C\|_F^2}$$
are used as the performance measures in our quantitative evaluation. The parameter $k$ for building the graph Laplacian, where we need to fix the $k$ nearest neighbors, is always taken to be $k = 6$.
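For reference, these two quality measures translate directly into code (a Python/NumPy sketch; function names are our own):

```python
import numpy as np

def psnr(clear, denoised):
    """PSNR as defined above, with the standard deviation of the
    residual in the denominator (8-bit amplitude convention, peak 255)."""
    return 20.0 * np.log10(255.0 / np.std(clear - denoised))

def recovery_error(clear, denoised):
    """Relative recovery error ||Y_C - Y_D||_F^2 / ||Y_C||_F^2."""
    return (np.linalg.norm(clear - denoised, 'fro') ** 2
            / np.linalg.norm(clear, 'fro') ** 2)
```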
To qualitatively evaluate our algorithms' effectiveness on seismic data, we compare the following six algorithms for seismic data denoising at noise level 25: FX-Decon, the curvelet algorithm, FDL-OMP, SDL-OMP, FDL-Graph, and SDL-Graph. Here FDL-OMP and SDL-OMP denote the denoising methods that we obtain by solving the optimization problem (1) using the dictionary $D$ learned by FDL or SDL, respectively, but without the graph regularization term. Similarly, FDL-Graph and SDL-Graph denote the methods obtained from Algorithm 1, using the first (FDL) or the second (SDL) dictionary learning approach. The FX-Decon algorithm has been applied to windows of size 64 × 64 samples overlapping by 8 samples in both the time and spatial dimensions. The curvelet denoising algorithm decomposes the noisy data into different scales and different directions, and small curvelet coefficients are removed by thresholding. For the seismic data in Figure 3(b), the FDL-Graph and SDL-Graph algorithms use a training set of 961 patches of size 16 × 16. The regularization parameters in Eq. (8) have been chosen empirically as α = 1.6, λ = 0.4, µ = 0.05 and as α = 1.6, λ = 0.5, µ = 0.04 for the two algorithms, respectively. For the dictionary training part, the minimal number of patches contained in each node determines the depth of the tree; we set this minimal number to 6 for the FDL-Graph algorithm and to 16 for the SDL-Graph algorithm. The dictionaries learned by the FDL-Graph and SDL-Graph algorithms are shown in Figure 4, where the training patches are taken from the noisy data in Figure 3(b). The main difference between FDL-OMP/SDL-OMP and FDL-Graph/SDL-Graph is that graph regularization exploits the geometric similarity of data patches. We observe from the close-up window in Figure 5 that the results of the graph regularization methods better maintain the continuous events and weak features of the original data. From the single-trace comparison in Figure 6 we can see that all methods recover strong amplitudes close to the clear data trace. For weak amplitudes, however, the denoising results using the graph regularization methods are much better than those of the methods without graph regularization. Statistical results for various noise levels are shown in Figure 7. In general, they show that the graph regularization term is helpful for capturing the features of seismic data.
Finally, we have applied FX-Decon, curvelet denoising, and the proposed FDL-Graph and SDL-Graph algorithms to field data denoising (Figures 8 and 9). Note that this data set is actual field data to which no noise was artificially added; the noise seen in the data is the original noise captured during data acquisition. The denoising results are presented on the left side of Figures 8 and 9. On the right side, the difference between the original noisy data and the denoised data is displayed. The denoising results of the FDL-Graph and SDL-Graph algorithms are clearer and smoother than the results produced by the FX-Decon and curvelet denoising methods. From the magnified windows, we can see that the FDL-Graph and SDL-Graph methods better recover the weak features of the original data than the other methods.
CONCLUSION

This paper proposed graph-regularized dictionary learning methods for seismic data processing, where the intrinsic structure of the seismic data is exploited by graph Laplacian regularization. Experimental results on simulated and real seismic data show that graph regularization provides improved denoising results compared to methods that only employ a learned dictionary. However, the sparse coding stage is more time consuming than for similar methods without graph regularization. Also, the recovery at stronger noise levels is not yet satisfactory. Future work will focus on improving the computational speed of the algorithm. Even in its current state, however, the presented model connecting dictionary learning and graph regularization is useful for denoising and for other geophysical inversion problems such as migration, imaging, and full waveform inversion.
ACKNOWLEDGEMENTS

This work is supported by the National Natural Science Foundation of China (grant numbers NSFC 91330108, 41374121, 61327013) and the Fundamental Research Funds for the Central Universities (grant number HIT.PIRS.A201501).
APPENDIX A

Let us recall that we want to use the split Bregman algorithm to solve the problem
$$\min_{X\in\mathbb{R}^{k\times m}}\Big(\frac{1}{2}\|Y - DX\|_F^2 + \frac{\alpha}{2}\mathrm{Tr}(XLX^T) + \lambda\|X\|_1\Big). \qquad (7)$$
First, we introduce a further variable $Z\in\mathbb{R}^{k\times m}$ and rewrite the problem as follows,
$$\min_{X,Z\in\mathbb{R}^{k\times m}} \frac{1}{2}\|Y-DX\|_F^2 + \frac{\alpha}{2}\mathrm{Tr}(XLX^T) + \frac{\mu}{2}\|X-Z\|_F^2 + \lambda\|Z\|_1 \qquad (8)$$
with some fixed parameter $\mu > 0$. Now let
$$E(X,Z) := \lambda\|Z\|_1 + \frac{1}{2}\|Y-DX\|_F^2 + \frac{\alpha}{2}\mathrm{Tr}(XLX^T),$$
such that (8) is of the form
$$\min_{X,Z\in\mathbb{R}^{k\times m}} E(X,Z) + \frac{\mu}{2}\|X-Z\|_F^2.$$
We introduce the so-called Bregman distance
$$D_E((X,Z),(X^\ell,Z^\ell)) := E(X,Z) - E(X^\ell,Z^\ell) - \langle P^\ell, X - X^\ell\rangle - \langle Q^\ell, Z - Z^\ell\rangle,$$
where $(P^\ell,Q^\ell)$ is a subgradient of $E$ at $(X^\ell,Z^\ell)$, i.e., $(P^\ell,Q^\ell)\in\partial E(X^\ell,Z^\ell)$. Now, to solve (8), we replace $E(X,Z)$ by the Bregman distance and consider the iteration
$$(X^{\ell+1},Z^{\ell+1}) = \arg\min_{X,Z\in\mathbb{R}^{k\times m}}\Big\{D_E((X,Z),(X^\ell,Z^\ell)) + \frac{\mu}{2}\|X - Z\|_F^2\Big\} = \arg\min_{X,Z\in\mathbb{R}^{k\times m}}\Big\{E(X,Z) - \langle P^\ell,X\rangle - \langle Q^\ell,Z\rangle + \frac{\mu}{2}\|X - Z\|_F^2\Big\}. \qquad (9)$$
This iteration is sensible, since the two additional terms $\langle P^\ell,X\rangle$ and $\langle Q^\ell,Z\rangle$ vanish for the desired solution $(X,Z)$ of (8). From (9) we derive the necessary conditions
$$0 \in \partial_X E(X^{\ell+1},Z^{\ell+1}) - P^\ell + \mu(X^{\ell+1} - Z^{\ell+1}),$$
$$0 \in \partial_Z E(X^{\ell+1},Z^{\ell+1}) - Q^\ell - \mu(X^{\ell+1} - Z^{\ell+1}).$$
Since $(P^{\ell+1},Q^{\ell+1})\in\partial E(X^{\ell+1},Z^{\ell+1})$, we obtain from these conditions the recursions
$$P^{\ell+1} = P^\ell - \mu(X^{\ell+1} - Z^{\ell+1}), \qquad Q^{\ell+1} = Q^\ell + \mu(X^{\ell+1} - Z^{\ell+1}), \qquad (10)$$
and in particular $P^{\ell+1} + Q^{\ell+1} = P^\ell + Q^\ell$ for all $\ell$. Introducing $B^\ell := \frac{1}{\mu}Q^\ell$, the second equation in (10) implies the recursion
$$B^{\ell+1} = B^\ell - Z^{\ell+1} + X^{\ell+1}.$$
Moreover, by
$$\frac{\mu}{2}\|X - Z + B^\ell\|_F^2 = \frac{\mu}{2}\|X - Z\|_F^2 + \mu\langle X - Z, B^\ell\rangle + \frac{\mu}{2}\|B^\ell\|_F^2,$$
we can rewrite (9) as
$$(X^{\ell+1},Z^{\ell+1}) = \arg\min_{X,Z\in\mathbb{R}^{k\times m}}\Big\{E(X,Z) + \frac{\mu}{2}\|X - Z + B^\ell\|_F^2 - \mu\langle X - Z, B^\ell\rangle - \langle P^\ell,X\rangle - \langle Q^\ell,Z\rangle\Big\} = \arg\min_{X,Z\in\mathbb{R}^{k\times m}}\Big\{E(X,Z) + \frac{\mu}{2}\|X - Z + B^\ell\|_F^2 - \langle X, P^\ell + Q^\ell\rangle\Big\},$$
where we have used that $-\mu\langle X - Z, B^\ell\rangle = -\langle Q^\ell, X\rangle + \langle Q^\ell, Z\rangle$. Taking the initial matrices $B^0 = P^0 = Q^0 = 0$, it follows that $P^\ell + Q^\ell = 0$ for all $\ell$, and we arrive at the iteration
$$(X^{\ell+1},Z^{\ell+1}) = \arg\min_{X,Z\in\mathbb{R}^{k\times m}}\Big\{E(X,Z) + \frac{\mu}{2}\|X - Z + B^\ell\|_F^2\Big\},$$
$$B^{\ell+1} = B^\ell - Z^{\ell+1} + X^{\ell+1}.$$
Applying alternating minimization, this leads to the following iteration scheme,
$$X^{\ell+1} = \arg\min_{X\in\mathbb{R}^{k\times m}}\Big\{\frac{1}{2}\|DX - Y\|_F^2 + \frac{\alpha}{2}\mathrm{Tr}(XLX^T) + \frac{\mu}{2}\|X - Z^\ell + B^\ell\|_F^2\Big\},$$
$$Z^{\ell+1} = \arg\min_{Z\in\mathbb{R}^{k\times m}}\Big\{\lambda\|Z\|_1 + \frac{\mu}{2}\|X^{\ell+1} - Z + B^\ell\|_F^2\Big\}, \qquad (11)$$
$$B^{\ell+1} = B^\ell - Z^{\ell+1} + X^{\ell+1}.$$
The functional in the first equation of (11) is differentiable, and we obtain the necessary condition
$$D^T(DX - Y) + \alpha XL + \mu(X - Z^\ell + B^\ell) = 0,$$
i.e.,
$$(D^T D + \mu I)X + \alpha XL = D^T Y + \mu(Z^\ell - B^\ell).$$
This equation is uniquely solvable if the eigenvalues $\alpha_1,\dots,\alpha_k$ of $(D^T D + \mu I)$ and the eigenvalues $\beta_1,\dots,\beta_m$ of $\alpha L$ satisfy
$$\alpha_i + \beta_j \neq 0 \quad \forall\, i = 1,\dots,k,\ j = 1,\dots,m,$$
see Bartels and Stewart (1972); Bhatia and Rosenthal (1997). This is true in our case, since $(D^T D + \mu I)$ is positive definite for $\mu > 0$ and $L$ is positive semidefinite.
The second subproblem in (11) can simply be solved by component-wise shrinkage. For each single component $z_{i,j}^{\ell+1}$ of $Z^{\ell+1} = (z_{i,j}^{\ell+1})_{i=1,j=1}^{k,m}$ we have
$$z_{i,j}^{\ell+1} = \arg\min_{z\in\mathbb{R}}\Big\{\lambda|z| + \frac{\mu}{2}\big|z - x_{i,j}^{\ell+1} - B_{i,j}^\ell\big|^2\Big\},$$
i.e.,
$$0 \in \lambda\frac{z^{\ell+1}}{|z^{\ell+1}|} + \mu\big(z^{\ell+1} - x_{i,j}^{\ell+1} - B_{i,j}^\ell\big),$$
where $\frac{z}{|z|}$ denotes the set $[-1,1]$ if $z = 0$. Hence, we find the solution by soft shrinkage,
$$z_{i,j}^{\ell+1} = T_{\lambda/\mu}\big(x_{i,j}^{\ell+1} + B_{i,j}^\ell\big) := \begin{cases} x_{i,j}^{\ell+1} + B_{i,j}^\ell - \frac{\lambda}{\mu} & \text{for } x_{i,j}^{\ell+1} + B_{i,j}^\ell \ge \frac{\lambda}{\mu},\\[0.5ex] x_{i,j}^{\ell+1} + B_{i,j}^\ell + \frac{\lambda}{\mu} & \text{for } x_{i,j}^{\ell+1} + B_{i,j}^\ell \le -\frac{\lambda}{\mu},\\[0.5ex] 0 & \text{otherwise.} \end{cases}$$
APPENDIX B
In our paper, we have proposed two dictionary learning algorithms, called FDL-Graph and SDL-Graph. In order to illustrate the dictionary learning step more clearly, we give a small toy example, where the training set contains the following 10 training patches of size $3\times 3$:
$$Y_1 = \begin{pmatrix} 1&1&0\\0&0&1\\0&0&0 \end{pmatrix},\quad Y_2 = \begin{pmatrix} 1&0&0\\0&1&0\\0&0&0 \end{pmatrix},\quad Y_3 = \begin{pmatrix} 0&0&0\\1&1&0\\0&0&1 \end{pmatrix},\quad Y_4 = \begin{pmatrix} 1&0&0\\0&1&1\\0&0&1 \end{pmatrix},\quad Y_5 = \begin{pmatrix} 1&1&1\\0&1&1\\0&0&0 \end{pmatrix},$$
$$Y_6 = \begin{pmatrix} 0&0&1\\1&1&1\\0&1&1 \end{pmatrix},\quad Y_7 = \begin{pmatrix} 1&1&0\\0&1&1\\0&0&1 \end{pmatrix},\quad Y_8 = \begin{pmatrix} 0&1&1\\0&0&1\\0&0&0 \end{pmatrix},\quad Y_9 = \begin{pmatrix} 1&1&1\\0&0&1\\0&0&1 \end{pmatrix},\quad Y_{10} = \begin{pmatrix} 0&0&0\\0&1&1\\0&0&1 \end{pmatrix}.$$
We construct the partition trees using the two algorithms from the previous section and show how the dictionary construction can be directly incorporated into the partition tree construction. The example also shows that the obtained trees, and thus the found dictionary elements, differ slightly between the two approaches.
FDL-Graph algorithm
First level: The training set is $\{Y_1, Y_2, \dots, Y_{10}\}$. We compute the mean (center) $C_1 = \frac{1}{10}\sum_{i=1}^{10} Y_i$ and the normalized singular vectors corresponding to the maximal singular value of $C_1 C_1^T$ and $C_1^T C_1$, i.e.,
$$u_1 = \arg\max_{\|x\|_2=1} x^T C_1 C_1^T x, \qquad v_1 = \arg\max_{\|x\|_2=1} x^T C_1^T C_1 x.$$
Then we obtain the first dictionary element $D_1 = u_1 v_1^T$.
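This first-level computation can be reproduced with a few lines of code (a NumPy sketch; here the leading singular vectors are taken directly from the SVD of $C_1$):

```python
import numpy as np

# The ten 3x3 training patches of the toy example.
Y = np.array([
    [[1,1,0],[0,0,1],[0,0,0]], [[1,0,0],[0,1,0],[0,0,0]],
    [[0,0,0],[1,1,0],[0,0,1]], [[1,0,0],[0,1,1],[0,0,1]],
    [[1,1,1],[0,1,1],[0,0,0]], [[0,0,1],[1,1,1],[0,1,1]],
    [[1,1,0],[0,1,1],[0,0,1]], [[0,1,1],[0,0,1],[0,0,0]],
    [[1,1,1],[0,0,1],[0,0,1]], [[0,0,0],[0,1,1],[0,0,1]],
], dtype=float)

C1 = Y.mean(axis=0)                 # center of the root node
U, S, Vt = np.linalg.svd(C1)
D1 = np.outer(U[:, 0], Vt[0, :])    # first dictionary element D1 = u1 v1^T
```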
Second level: The training set is divided into two partial sets according to the description of the FDL tree partitioning; we call them $L_{21}$ and $L_{22}$, where $L_{21} = \{Y_5, Y_8, Y_9\}$ and $L_{22} = \{Y_1, Y_2, Y_3, Y_4, Y_6, Y_7, Y_{10}\}$. The center matrices of the two nodes are $C_2 = \frac{1}{3}(Y_5 + Y_8 + Y_9)$ and $C_3 = \frac{1}{7}(Y_1 + Y_2 + Y_3 + Y_4 + Y_6 + Y_7 + Y_{10})$. We compute the maximal singular values $\lambda_2, \lambda_3$ of the center matrices $C_2, C_3$ and the first left and right singular vectors $u_2, v_2$ and $u_3, v_3$ of $C_2, C_3$, respectively. Now we obtain the second dictionary element $D_2 = \lambda_2 u_2 v_2^T - \lambda_3 u_3 v_3^T$.
Third level: The partition procedure now yields the subsets $L_{31} = \{Y_8\}$ and $L_{32} = \{Y_5, Y_9\}$ of $L_{21}$, and the subsets $L_{33} = \{Y_3, Y_6\}$ and $L_{34} = \{Y_1, Y_2, Y_4, Y_7, Y_{10}\}$ of $L_{22}$. We compute the center matrices of each partial set, $C_4 = Y_8$, $C_5 = \frac{1}{2}(Y_5 + Y_9)$, $C_6 = \frac{1}{2}(Y_3 + Y_6)$, and $C_7 = \frac{1}{5}(Y_1 + Y_2 + Y_4 + Y_7 + Y_{10})$. We obtain the dictionary element
$$D_3 = \lambda_4 u_4 v_4^T - \lambda_5 u_5 v_5^T,$$
where $\lambda_4$ and $\lambda_5$ are the maximal singular values of the center matrices $C_4$ and $C_5$, and $u_4, v_4$ and $u_5, v_5$ are the first singular vectors of $C_4$ and $C_5$. Similarly,
$$D_4 = \lambda_6 u_6 v_6^T - \lambda_7 u_7 v_7^T,$$
where $\lambda_6$ and $\lambda_7$ are the maximal singular values of the center matrices $C_6$ and $C_7$, with corresponding singular vectors $u_6, v_6, u_7, v_7$.
Fourth level: The training set $L_{32}$ is split into $L_{41} = \{Y_5\}$ and $L_{42} = \{Y_9\}$, $L_{33}$ is split into $L_{43} = \{Y_3\}$ and $L_{44} = \{Y_6\}$, and $L_{34}$ is split into $L_{45} = \{Y_1, Y_7\}$ and $L_{46} = \{Y_2, Y_4, Y_{10}\}$. Then $D_5$, $D_6$ and $D_7$ can be learned from $\{L_{41}, L_{42}\}$, $\{L_{43}, L_{44}\}$, and $\{L_{45}, L_{46}\}$.

Fifth level: The training set $L_{45}$ is split into $L_{51} = \{Y_1\}$ and $L_{52} = \{Y_7\}$, and $L_{46}$ is split into $L_{53} = \{Y_2\}$ and $L_{54} = \{Y_4, Y_{10}\}$; we learn the dictionary elements $D_8$ and $D_9$.

Sixth level: The training set $L_{54}$ is split into $L_{61} = \{Y_4\}$ and $L_{62} = \{Y_{10}\}$, and we learn the dictionary element $D_{10}$.
The SDL-Graph algorithm gives us the following partition tree.

First level: The training set is $\{Y_1, Y_2, \dots, Y_{10}\}$.
Second level: The training subsets are $\{Y_1\}$ and $\{Y_2, Y_3, \dots, Y_{10}\}$.
Third level: The training subsets are $\{Y_5\}$ and $\{Y_2, Y_3, Y_4, Y_6, Y_7, Y_8, Y_9, Y_{10}\}$.
Fourth level: The training subsets are $\{Y_7, Y_4\}$ and $\{Y_2, Y_3, Y_6, Y_8, Y_9, Y_{10}\}$.
Fifth level: The training subsets are $\{Y_7\}$, $\{Y_4\}$, $\{Y_9\}$ and $\{Y_2, Y_3, Y_6, Y_8, Y_{10}\}$.
Sixth level: The training subsets are $\{Y_8\}$ and $\{Y_2, Y_3, Y_6, Y_{10}\}$.
Seventh level: The training subsets are $\{Y_3\}$ and $\{Y_2, Y_6\}$.
Eighth level: The training subsets are $\{Y_2\}$ and $\{Y_6\}$.

For this tree, we obtain 9 dictionary elements: one at the first, one at the second, one at the third, one at the fourth, two at the fifth, one at the sixth, one at the seventh, and one at the eighth level.
REFERENCES

Aharon, M., M. Elad, and A. Bruckstein, 2006, K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation: IEEE Transactions on Signal Processing, 54, no. 11, 4311-4322, doi:10.1109/TSP.2006.881199.
Anagaw, A. Y., and M. D. Sacchi, 2012, Edge-preserving seismic imaging using the total variation method: Journal of Geophysics and Engineering, 9, no. 2, 138-146.
Bartels, R. H., and G. W. Stewart, 1972, Solution of the matrix equation AX + XB = C: Communications of the ACM, 15, no. 9, 820-826.
Bechouche, S., and J. Ma, 2014, Simultaneous dictionary learning and denoising for seismic data: Geophysics, 79, A27-A31, doi:10.1190/geo2013-0382.1.
Belkin, M., and P. Niyogi, 2003, Laplacian eigenmaps for dimensionality reduction and data representation: Neural Computation, 15, no. 6, 1373-1396, doi:10.1162/089976603321780317.
Beck, A., and M. Teboulle, 2009, A fast iterative shrinkage-thresholding algorithm for linear inverse problems: SIAM Journal on Imaging Sciences, 2, no. 1, 183-202, doi:10.1137/080716542.
Bhatia, R., and P. Rosenthal, 1997, How and why to solve the operator equation AX + XB = Y: Bulletin of the London Mathematical Society, 29, no. 1, 1-21.
Bougleux, S., A. Elmoataz, and M. Melkemi, 2009, Local and nonlocal discrete regularization on weighted graphs for image and mesh processing: International Journal of Computer Vision, 84, no. 2, 220-236, doi:10.1007/s11263-008-0159-z.
Bonar, D., and M. Sacchi, 2012, Denoising seismic data using the nonlocal means algorithm: Geophysics, 77, no. 1, A5-A8, doi:10.1190/geo2011-0235.1.
Cai, J. F., S. Huang, H. Ji, Z. Shen, and G. Ye, 2014, Data-driven tight frame construction and image denoising: Applied and Computational Harmonic Analysis, 37, no. 1, 89-105.
Canales, L. L., 1984, Random noise reduction: Expanded Abstracts of the Annual Meeting, Society of Exploration Geophysicists, 525-527, doi:10.1190/1.1894168.
Chanerley, A. A., and N. A. Alexander, 2002, An approach to seismic correction which includes wavelet de-noising: Proceedings of the Sixth Conference on Computational Structures Technology, 107-108.
Chambolle, A., and T. Pock, 2011, A first-order primal-dual algorithm for convex problems with applications to imaging: Journal of Mathematical Imaging and Vision, 40, no. 1, 120-145, doi:10.1007/s10851-010-0251-1.
Ding, C., and X. He, 2004, K-means clustering via principal component analysis: Proceedings of the Twenty-First International Conference on Machine Learning, 29, doi:10.1145/1015330.1015408.
Dong, W., L. Zhang, G. Shi, and X. Li, 2013, Nonlocally centralized sparse representation for image restoration: IEEE Transactions on Image Processing, 22, no. 4, 1620-1630, doi:10.1109/TIP.2012.2235847.
Elad, M., and M. Aharon, 2006, Image denoising via sparse and redundant representations over learned dictionaries: IEEE Transactions on Image Processing, 15, no. 12, 3736-3754, doi:10.1109/TIP.2006.881969.
Engan, K., S. O. Aase, and J. H. Husoy, 1999, Method of optimal directions for frame design: IEEE International Conference on Acoustics, Speech, and Signal Processing, 5, doi:10.1109/ICASSP.1999.760624.
Elmoataz, A., O. Lezoray, and S. Bougleux, 2008, Nonlocal discrete regularization on weighted graphs: A framework for image and manifold processing: IEEE Transactions on Image Processing, 17, no. 7, 1047-1060, doi:10.1109/TIP.2008.924284.
Goldstein, T., and S. Osher, 2009, The split Bregman method for L1-regularized problems: SIAM Journal on Imaging Sciences, 2, no. 2, 323-343, doi:10.1137/080725891.
Hein, M., and M. Maier, 2006, Manifold denoising: Advances in Neural Information Processing Systems, 19, 561-568.
Hennenfent, G., and F. J. Herrmann, 2006, Seismic denoising with nonuniformly sampled curvelets: Computing in Science and Engineering, 8, 16-25, doi:10.1109/MCSE.2006.49.
Herrmann, F. J., and G. Hennenfent, 2008, Non-parametric seismic data recovery with curvelet frames: Geophysical Journal International, 173, no. 1, 233-248.
Kheradmand, A., and P. Milanfar, 2014, A general framework for regularized, similarity-based image restoration: IEEE Transactions on Image Processing, 23, no. 12, 5136-5151, doi:10.1109/TIP.2014.2362059.
Lee, H., A. Battle, R. Raina, and A. Y. Ng, 2007, Efficient sparse coding algorithms: Advances in Neural Information Processing Systems, 19, 801-808.
Liu, X., D. Zhai, D. Zhao, G. Zhai, and W. Gao, 2014, Progressive image denoising through hybrid graph Laplacian regularization: A unified framework: IEEE Transactions on Image Processing, 23, no. 4, 1491-1503, doi:10.1109/TIP.2014.2303638.
Liu, L., G. Plonka, and J. Ma, 2017, Seismic data interpolation and denoising by learning a tensor tight frame: Inverse Problems, to appear.
Lu, X., H. Yuan, P. Yan, Y. Yuan, and X. Li, 2012, Geometry constrained sparse coding for single image super-resolution: IEEE Conference on Computer Vision and Pattern Recognition, 1648-1655, doi:10.1109/CVPR.2012.6247858.
Naghizadeh, M., 2010, A unified method for interpolation and de-noising of seismic records in the f-k domain: Expanded Abstracts of the Annual Meeting, Society of Exploration Geophysicists, 3579-3583, doi:10.1190/1.3513593.
Ma, J., and G. Plonka, 2010, The curvelet transform: IEEE Signal Processing Magazine, 27, no. 2, 118-133, doi:10.1109/MSP.2009.935453.
Neelamani, R., A. Baumstein, D. Gillard, M. Hadidi, and W. Soroka, 2008, Coherent and random noise attenuation using the curvelet transform: The Leading Edge, 27, no. 2, 240-248, doi:10.1190/1.2840373.
Needell, D., and R. Vershynin, 2010, Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit: IEEE Journal of Selected Topics in Signal Processing, 4, no. 2, 310-316, doi:10.1109/JSTSP.2010.2042412.
Pan, Q., L. Zhang, G. Dai, and H. Zhang, 1999, Two denoising methods by wavelet transform: IEEE Transactions on Signal Processing, 47, no. 12, 3401-3406, doi:10.1109/78.806084.
Plonka, G., and J. Ma, 2011, Curvelet-wavelet regularized split Bregman iteration for compressed sensing: International Journal of Wavelets, Multiresolution and Information Processing, 9, no. 1, 79-110, doi:10.1142/S0219691311003955.
Reiter, D. T., and W. Rodi, 1996, Nonlinear waveform tomography applied to crosshole seismic data: Geophysics, 61, no. 3, 902-913, doi:10.1190/1.1444015.
Starck, J. L., E. J. Candes, and D. L. Donoho, 2002, The curvelet transform for image denoising: IEEE Transactions on Image Processing, 11, no. 6, 670-684, doi:10.1109/TIP.2002.1014998.
Tang, Y., Y. Shen, A. Jiang, N. Xu, and C. Zhu, 2013, Image denoising via graph regularized K-SVD: IEEE International Symposium on Circuits and Systems (ISCAS), 2820-2823, doi:10.1109/ISCAS.2013.6572465.
Yankelevsky, Y., and M. Elad, 2016, Dual graph regularized dictionary learning: IEEE Transactions on Signal and Information Processing over Networks, 2, no. 4, 611-624, doi:10.1109/TSIPN.2016.2605763.
Yu, S., J. Ma, X. Zhang, and M. Sacchi, 2015, Denoising and interpolation of high-dimensional seismic data by learning a tight frame: Geophysics, 80, V119-V132, doi:10.1190/geo2014-0396.1.
Yu, S., J. Ma, and S. Osher, 2016, Monte Carlo data-driven tight frame for seismic data recovery: Geophysics, 81, no. 4, V327-V340.
Zhang, R., and T. Ulrych, 2003, Physical wavelet frame denoising: Geophysics, 68, 225-231, doi:10.1190/1.1543209.
Zeng, X., W. Bian, W. Liu, J. Shen, and D. Tao, 2015, Dictionary pair learning on Grassmann manifolds for image denoising: IEEE Transactions on Image Processing, 24, no. 11, 4556-4569, doi:10.1109/TIP.2015.2468172.
Zheng, M., J. Bu, C. Chen, C. Wang, L. Zhang, G. Qiu, and D. Cai, 2011, Graph regularized sparse coding for image representation: IEEE Transactions on Image Processing, 20, no. 5, 1327-1336, doi:10.1109/TIP.2010.2090535.
Figure Captions:

Figure 1: The weighted and undirected graph, where 0/1 denotes a mask for the k-nearest neighbors.
Figure 2: The process of learning a dictionary by the FDL-Graph and SDL-Graph algorithms, where λi is the maximal singular value of the center matrix Ci at each node, and ui and vi are the first vectors in the singular value decomposition of Ci.
Figure 3: Seismic data: (a) real data; (b) real data with random noise; and (c) field data.
Figure 4: Comparison of the dictionaries learned by the two methods: (a) FDL-Graph; (b) SDL-Graph.
Figure 5: Denoising results for real data with noise level 25: (a) FX-Decon; (b) Curvelet; (c) FDL-OMP; (d) SDL-OMP; (e) FDL-Graph; (f) SDL-Graph.
Figure 6: Single-trace comparison of the reconstructions with the clear data: (a) FX-Decon; (b) Curvelet; (c) FDL-OMP; (d) SDL-OMP; (e) FDL-Graph; (f) SDL-Graph.
Figure 7: Denoising performance of the FX-Decon, Curvelet, FDL-OMP, SDL-OMP, FDL-Graph and SDL-Graph algorithms at different noise levels.
Figure 8: Denoising of field data by FX-Decon (a) and the curvelet approach (c). Panels (b) and (d) show the differences between the (noisy) data and the denoising result.
Figure 9: Denoising of field data by FDL-Graph (a) and SDL-Graph (c). Panels (b) and (d) show the differences between the (noisy) data and the denoising result.
Figure 1: The weighted and undirected graph, where 0/1 denotes a mask for the k-nearest neighbors.

Figure 2: The process of learning a dictionary by the FDL-Graph and SDL-Graph algorithms, where λi is the maximal singular value of the center matrix Ci at each node, and ui and vi are the first vectors in the singular value decomposition of Ci.
Figure 3: Seismic data: (a) real data; (b) real data with random noise; and (c) field data.
Figure 4: Comparison of the dictionaries learned by the two methods: (a) FDL-Graph; (b) SDL-Graph.
Figure 5: Denoising results for real data with noise level 25: (a) FX-Decon; (b) Curvelet; (c) FDL-OMP; (d) SDL-OMP; (e) FDL-Graph; (f) SDL-Graph.
Figure 6: Single-trace comparison of the reconstructions with the clear data: (a) FX-Decon; (b) Curvelet; (c) FDL-OMP; (d) SDL-OMP; (e) FDL-Graph; (f) SDL-Graph.
Figure 7: Denoising performance of the FX-Decon, Curvelet, FDL-OMP, SDL-OMP, FDL-Graph and SDL-Graph algorithms at different noise levels.
Figure 8: Denoising of field data by FX-Decon (a) and the curvelet approach (c). Panels (b) and (d) show the differences between the (noisy) data and the denoising result.
Figure 9: Denoising of field data by FDL-Graph (a) and SDL-Graph (c). Panels (b) and (d) show the differences between the (noisy) data and the denoising result.