Graph regularized seismic dictionary learning
Lina Liu¹, Jianwei Ma², Gerlind Plonka³

¹Department of Mathematics and Institute of Geophysics, Harbin Institute of Technology, Harbin, China, [email protected]
²Department of Mathematics and Institute of Geophysics, Harbin Institute of Technology, Harbin, China, [email protected]
³Institute for Numerical and Applied Mathematics, University of Göttingen, Göttingen, Germany, [email protected]
ABSTRACT
A graph-based regularization for geophysical inversion is proposed that offers a more efficient way to solve denoising inverse problems by dictionary learning, i.e., by methods designed to find a sparse signal representation that adaptively captures the prominent characteristics of the given data. Most traditional dictionary learning methods convert 2D seismic data patches or 3D data volumes into 1D vectors for training, but this conversion destroys the inherent geometric structure of the data. To overcome this problem, two graph-regularized dictionary learning algorithms based on clustering and SVD methods, named FDL-Graph and SDL-Graph, are proposed. First, the clustering-based dictionary learning method divides the training patches into regions/clusters of similar geometric structure, employing a simplified principal component analysis method or the L2 distance function to perform such a clustering effectively. Next, the clusters are used to learn a basis or a frame that best describes the patches. Besides employing an adapted dictionary for sparse data representation, we also take the local geometric structure of the seismic data into account using a Laplacian graph. Since this novel denoising model adds a graph regularization term to the sparse representation model to emphasize correlations among data patches, the proposed method leads to smoother results and achieves better denoising performance than existing methods without graph regularization, both in terms of peak signal-to-noise ratio values and visual assessment of weak-event preservation. Comparisons of experimental results on synthetic and recorded seismic data with traditional FX deconvolution and curvelet thresholding methods are also provided.
INTRODUCTION
Geophysical inversion plays a key role in seismic exploration processing for tasks such as interpolation, denoising, migration/imaging, and full waveform inversion. However, these inverse problems are typically ill-posed, which makes it difficult to achieve satisfactory results. Finding and building a regularization model based on prior knowledge is an important step towards solving such problems. Examples include Tikhonov regularization, total variation regularization, and L1-norm regularization for smooth constraints, piece-wise smooth constraints, and sparsity-promoting constraints, respectively. The graph regularization that we consider in this work is a nonlocal constraint that encodes prior knowledge in a Laplacian graph. While this type of regularization has previously been applied in image processing, to our knowledge this is its first use for geophysical processing and inversion.
Denoising is still one of the most fundamental, widely studied, and significant problems in seismic data processing. It is worth noting that denoising is itself a geophysical inversion problem. The purpose of denoising or restoration is to estimate the original seismic data from noisy seismic data. A classical approach is the least squares method, where the estimator is chosen to minimize the data error. This method is usually already applied during the acquisition process in order to achieve data sets with a lower noise level. To improve denoising results, regularization methods are required that provide stable and unique solutions. In addition, regularization methods serve to impose desired features on subsurface images. Quadratic regularization methods such as Tikhonov regularization tend to produce models in which discontinuities are blurred; they have been used to solve the acoustic inverse problem of crosshole seismology (Reiter and Rodi, 1996). On the other hand, non-quadratic regularization methods such as total variation can provide high-resolution images of the subsurface, where edges and discontinuities are properly preserved. For example, Anagaw and Sacchi (2012) used total variation regularization, where sparsity is imposed on the gradient of the model parameters, allowing the capture of edges that are not horizontally or vertically aligned. Another regularization method that has attracted renewed interest and considerable attention in the signal processing literature is L1-norm regularization. This approach is popular for seismic data processing tasks such as seismic data denoising and interpolation (Herrmann and Hennenfent, 2008).
However, most regularization methods do not exploit the full geometric structure of the data. To overcome this issue, additional regularizers are incorporated in order to satisfy specific requirements in various scenarios. For example, the graph regularization method describes the relationship between data points via manifold learning. The common assumption is that if two data points are close in the intrinsic data structure, then their representations in any other domain are close as well. Various manifold learning methods have been proposed to explore such structures (Belkin and Niyogi, 2003; Liu et al., 2014; Zheng et al., 2011). In recent years, graph regularization has been used for describing the relationship between image patches (Elmoataz et al., 2008; Bougleux et al., 2009; Kheradmand and Milanfar, 2014), e.g. by building a k-nearest neighbor graph to encode the geometric information in the data. Such graph-based methods have been used for image denoising (Tang et al., 2013; Yankelevsky and Elad, 2016), image representation (Zheng et al., 2011), and image super-resolution (Lu et al., 2012).
Dictionaries that allow a sparse transform offer a convenient way to implement seismic data denoising. Whether denoising is effective largely depends on the selection of the dictionary or the choice of sparse transform. Wavelet transforms have often been used for solving seismic data denoising problems (see e.g. Chanerley and Alexander, 2002). Zhang and Ulrych (2003) presented a physical wavelet frame for seismic data denoising that used the characteristics of seismic data. Curvelets are now one of the most popular tools for seismic data representation and denoising (Hennenfent and Herrmann, 2006; Neelamani et al., 2008; Ma and Plonka, 2010). In addition to such sparse representation methods, Bonar and Sacchi (2012) applied a nonlocal means algorithm to seismic data denoising by utilizing similar samples or pixels in spatial proximity.
A popular and highly effective new approach for solving the seismic denoising problem is the sparse representation of the seismic data over a trained dictionary. Compared to sparse transform denoising methods (Pan et al., 1999; Starck et al., 2002), a dictionary trained by a dictionary learning method can provide a sparser representation of seismic data. Different dictionary learning methods have already been applied to seismic data denoising. Bechouche and Ma (2014) used the method of optimal directions (MOD) to train the dictionary (see also Engan et al., 1999), while Yu et al. (2015) proposed a dictionary learning seismic data denoising method based on a data-driven tight frame (DDTF). However, almost all patch-based dictionary learning denoising methods convert seismic data patches or volumes into one-dimensional (1D) vectors for training, and thereby lose the inherent 2D or 3D geometric structure of the seismic data.
To overcome this limitation, this work focuses on processing the graph structure of seismic data patches via sparse representation modeling, which considers the geometric structure of the sparse coefficients in order to capture the intrinsic geometric structure of the seismic data. The method is a two-stage approach consisting of a dictionary learning stage and a sparse coding stage. In the first stage, the 2D training data patches are clustered into subsets with similar structures. In order to perform such a clustering efficiently, we employ a generalized PCA method and the Euclidean distance function. We then learn the dictionary from the clusters. In the second stage, we add the intrinsic geometric structure of the data through a graph regularization and use the learned dictionary to provide a sparse representation of the noisy data patches. Finally, all of the denoised patches are combined to obtain the denoised seismic data.
The rest of the paper is organized as follows. In the next section we provide a brief description of the sparse representation problem, and then propose two graph-based dictionary learning methods for solving the denoising problem by combining graph regularization with clustering- and SVD-based dictionary learning, called FDL-Graph and SDL-Graph. The corresponding optimization schemes and algorithms are also presented. Experimental results and comparative discussions using simulated and recorded seismic data follow in the subsequent section. A conclusion and final thoughts complete the paper.
METHOD
Sparse representation by learned dictionaries and graph regularization
One essential tool in image denoising is the sparse representation of the image in an over-complete dictionary, i.e., we want to approximate the data using a linear combination of a relatively small number of atoms from the dictionary. Here the choice of the dictionary plays an important role: its atoms need to capture the important structures that are typical for seismic images.
Let us first fix some notation. Let $\{Y_1,\dots,Y_m\}$ be a given training set of patches of seismic data. One may think of $Y_j$ as $(n\times n)$ patches with relatively small $n$, e.g. $n=8$. Further, let $Y := [y_1 \dots y_m]$ be an $N\times m$ matrix with $N := n^2$, where the columns $y_j = \mathrm{col}\,Y_j \in \mathbb{R}^N$ are the vectorized patches. With $D := [d_1 \dots d_k] \in \mathbb{R}^{N\times k}$ we denote the dictionary, where each column $d_i \in \mathbb{R}^N$ (or the reshaped patch $D_i \in \mathbb{R}^{n\times n}$ with $d_i = \mathrm{col}\,D_i$, respectively) is one atom of the dictionary.
Once we have determined a suitable dictionary $D$, a sparsity-promoting optimization problem can be formulated as
$$\min_{X\in\mathbb{R}^{k\times m}} \Big( \frac{1}{2}\|Y - DX\|_F^2 + \lambda \sum_{i=1}^{m} \|x_i\|_0 \Big), \qquad (1)$$
where $X = [x_1,\dots,x_m] \in \mathbb{R}^{k\times m}$ denotes the matrix of sparse coefficient vectors, such that the set of training patches $Y = [y_1 \dots y_m]$ is sparsely represented by $DX$. Here, $\lambda$ is a regularization parameter, $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, and $\|x_i\|_0$ counts the number of non-zero entries of $x_i$. Note that this optimization problem is NP-hard.
Therefore, $\|\cdot\|_0$ is usually replaced by the convex norm $\|\cdot\|_1$, and the relaxed optimization problem reads
$$\min_{X\in\mathbb{R}^{k\times m}} \Big( \frac{1}{2}\|Y-DX\|_F^2 + \lambda\sum_{i=1}^{m} \|x_i\|_1 \Big) = \min_{X\in\mathbb{R}^{k\times m}} \Big( \frac{1}{2}\|Y-DX\|_F^2 + \lambda\|X\|_1 \Big), \qquad (2)$$
where $\|X\|_1 := \sum_{i=1}^{m} \|x_i\|_1 = \sum_{i=1}^{m}\sum_{j=1}^{k} |x_{i,j}|$ is the sum of the moduli of all entries of $X$.
X. The model (2) has been widely used for data denoising in the last decade, and many
7
algorithms have been developed to solve these optimization problems efficiently, see e.g.
Beck and Teboulle (2009); Chambolle and Pock (2011); Needell and Vershynin (2010).
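As a simple illustration, problem (2) with a fixed dictionary can be tackled by the classical iterative soft-thresholding (proximal gradient) scheme of Beck and Teboulle (2009). The following sketch (Python with NumPy; the function names and the fixed iteration count are our own choices, not prescribed by the paper) shows the idea:

```python
import numpy as np

def soft_threshold(V, t):
    """Entrywise soft shrinkage: sign(V) * max(|V| - t, 0)."""
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

def ista(Y, D, lam, n_iter=200):
    """Minimize 0.5*||Y - D X||_F^2 + lam*||X||_1 for fixed D by
    proximal gradient steps of size 1/L, where L is the Lipschitz
    constant of the gradient (largest eigenvalue of D^T D)."""
    L = np.linalg.norm(D, 2) ** 2            # squared spectral norm of D
    X = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ X - Y)             # gradient of the data-fit term
        X = soft_threshold(X - grad / L, lam / L)
    return X
```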
We extend this model for sparse representation of seismic images by two important
ingredients, dictionary learning and graph regularization.
Dictionary learning. We want to employ a dictionary $D$ that is adaptively learned from the data. In machine learning and in other applications, see e.g. Aharon et al. (2006); Elad and Aharon (2006); Dong et al. (2013), the following model has already been studied,
$$\min_{X\in\mathbb{R}^{k\times m},\, D\in\mathbb{R}^{N\times k}} \Big( \frac{1}{2}\|Y-DX\|_F^2 + \lambda\|X\|_* \Big) \qquad (3)$$
with $\|\cdot\|_*$ being either $\|\cdot\|_0$ or $\|\cdot\|_1$, and where K-SVD is employed for dictionary learning. The approach is based on alternating optimization: for fixed $D$, an improved sparse matrix $X$ is computed (e.g. by the OMP method), and for fixed $X$, the dictionary $D$ is updated using K-SVD. However, K-SVD is very expensive. In particular, many iteration steps are needed, since the atoms of the dictionary are updated one by one, and the two-dimensional structure of the set of training patches is not explicitly used: all patches are considered only in vectorized form.
Therefore, other methods for dictionary learning have been proposed, aiming at a cheaper technique to replace K-SVD. For example, in Cai et al. (2014) and Liu et al. (2017) an additional a priori dictionary structure is imposed to reduce the number of parameters, such as a block-wise Toeplitz structure or (directional) tensor-product frames. Motivated by ideas in Zeng et al. (2015), we will propose two different adaptive dictionary learning methods in the next subsection. These methods exploit the two-dimensional geometric structure of the training data in a more direct way and are considerably cheaper than K-SVD.
Graph regularization. The second ingredient is an extension of the functional in (3) by a further term that measures the similarity between the image patches. Here we follow ideas in Zheng et al. (2011) and Yankelevsky and Elad (2016) and employ a graph that represents the internal topology of the set of training patches.

For the given set of training patches $Y_1,\dots,Y_m$, we construct a weighted undirected complete graph $G(V,E,W)$, where the finite set $V$ of $m$ vertices represents the given patches, $V = \{Y_1,\dots,Y_m\}$. Further, $E = V\times V$ is a set of weighted edges, and these weights are collected in the weight matrix $W \in \mathbb{R}^{m\times m}$. We measure the similarity of the training patches $Y_i$ and $Y_j$ simply by the norm of their difference, $\|Y_i - Y_j\|_F$. For a pre-defined $k$, we fix the $k$ nearest neighbors of $Y_i$ (different from $Y_i$ itself) by inspecting $\|Y_i - Y_j\|_F^2$ for $j \in \{1,\dots,m\}\setminus\{i\}$ and define the symmetric weight matrix $W = (W_{i,j})_{i,j=1}^{m}$ by $W_{i,j} = 1$ if $Y_j$ is among the $k$ nearest neighbors of $Y_i$ or if $Y_i$ is among the $k$ nearest neighbors of $Y_j$, and $W_{i,j} = 0$ otherwise. In particular, $W_{i,i} = 0$ for $i = 1,\dots,m$. Thus, each row (or column) of $W$ contains at least $k$ ones. The process of graph building is illustrated in Figure 1. The degree of each vertex $Y_i$, i.e., the number of all edges with weight 1 incident to $Y_i$, is given by $\Delta_i = \sum_{j=1}^{m} W_{i,j}$. Introducing the diagonal matrix $\Delta = \mathrm{diag}(\Delta_1,\dots,\Delta_m)$, the graph Laplacian of $G$ is given by $L = \Delta - W$. By construction, $L$ is symmetric and positive semidefinite, its off-diagonal entries are non-positive, and the entries in each column (or row) sum to zero.
A direct computation shows that
$$\mathrm{Tr}(YLY^T) = \sum_{i,j=1}^{m} W_{i,j}\,\|Y_i - Y_j\|_F^2 = \sum_{i,j=1}^{m} W_{i,j}\,\|y_i - y_j\|_2^2 = \sum_{Y_i\sim Y_j}\|Y_i - Y_j\|_F^2,$$
measuring the similarity of neighboring patches in the graph, where we use the notation $Y_i \sim Y_j$ if $W_{i,j} = 1$. For each $j$, the vector $Dx_j$ is assumed to be a good approximation of $y_j$. Since the (fixed) transform matrix $D$ induces a linear mapping, we can suppose that the vectors $x_i$, $i = 1,\dots,m$, possess a similar topological structure as the $y_i$, $i = 1,\dots,m$; in particular, if $y_i$ and $y_j$ are $k$-neighbors with a small distance $\|y_i - y_j\|_2$, then $\|x_i - x_j\|_2$ is small as well. Therefore, we incorporate the term
$$\mathrm{Tr}(XLX^T) = \sum_{i,j=1}^{m} W_{i,j}\,\|x_i - x_j\|_2^2 = \sum_{Y_i\sim Y_j}\|x_i - x_j\|_2^2$$
and obtain the new minimization problem
$$\min_{X\in\mathbb{R}^{k\times m},\, D\in\mathbb{R}^{N\times k}} \frac{1}{2}\|Y - DX\|_F^2 + \frac{\alpha}{2}\mathrm{Tr}(XLX^T) + \lambda\|X\|_1, \qquad (4)$$
where the Laplacian matrix $L$ only depends on the training data $Y$ that generate the graph. Note that this model is simpler than the model considered in Yankelevsky and Elad (2016), since we have not incorporated the graph Laplacian into the dictionary learning process.
We remark that the graph $G$ can also be defined by other weight matrices, e.g.
$$W_{i,j} = \begin{cases} \frac{1}{2\pi h^2}\exp\!\Big(-\frac{\|Y_i - Y_j\|_F^2}{2h^2}\Big) & \text{for } i \neq j,\\[1ex] 0 & \text{for } i = j, \end{cases}$$
using the Gaussian kernel and some parameter $h$, or alternatively by employing a thresholded kernel to achieve a sparse matrix $L$. Note, however, that the graph construction, and hence the graph Laplacian, is already determined by the vectorized training patches, since the Frobenius norm does not exploit any two-dimensional structure of the training patches $Y_1,\dots,Y_m$. Therefore, these two-dimensional features have to be incorporated into the dictionary by the dictionary learning method.
Denoising procedure. For the special denoising application, we employ the given noisy seismic images themselves to acquire the set of training patches $\{Y_1,\dots,Y_m\}$, i.e., we take patches of the noisy images. We propose a new denoising scheme based on the minimization problem (4). It consists of the steps presented in Algorithm 1.
Algorithm 1: Denoising algorithm based on dictionary learning and graph regularization

Input:
Noisy training data $Y = [y_1 \dots y_m]$
Number of iterations
Parameters $k$, $\alpha$ and $\lambda$

Algorithm:
1: Set $Y_D := Y$.
Loop steps 2-5 until the given number of iterations is reached:
2: Compute the graph Laplacian $L$ for the given training set $Y_D$.
3: Determine the dictionary $D$ by a dictionary learning algorithm based on $Y_D$.
4: Solve the minimization problem $\min_{X\in\mathbb{R}^{k\times m}} \frac{1}{2}\|Y - DX\|_F^2 + \frac{\alpha}{2}\mathrm{Tr}(XLX^T) + \lambda\|X\|_1$.
5: Reconstruct the data $Y_D := DX$.

Output:
Denoised data $Y_D$
In the next subsections, we describe the two essential steps 3 and 4 of the algorithm in
more detail.
For step 2, to compute the Laplacian $L := (L_{i,j})_{i,j=1}^{m}$, we can simply proceed as follows. First we compute the symmetric matrix of all distances $\|Y_i - Y_j\|_F^2$. Then, in each row $i = 1,\dots,m$, we order the nonzero values by size and collect the indices $j$ corresponding to the $k$ smallest distances in the index set $I_i$. For $i, j \in \{1,\dots,m\}$ with $i \neq j$ we put
$$L_{i,j} := \begin{cases} -1 & \text{for } j \in I_i \text{ or } i \in I_j,\\ 0 & \text{otherwise.} \end{cases}$$
Finally, we set $L_{i,i} = \sum_{j=1,\, j\neq i}^{m} |L_{i,j}|$.
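A direct implementation of this construction might look as follows (a Python/NumPy sketch under the definitions above; the function name and the brute-force distance computation are our own choices):

```python
import numpy as np

def knn_graph_laplacian(patches, k=6):
    """Build the k-nearest-neighbor graph Laplacian L = Delta - W
    from a list of 2D patches, using squared Frobenius distances."""
    m = len(patches)
    flat = np.stack([p.ravel() for p in patches])              # vectorized patches y_i
    d2 = ((flat[:, None, :] - flat[None, :, :]) ** 2).sum(-1)  # ||Y_i - Y_j||_F^2
    W = np.zeros((m, m))
    for i in range(m):
        idx = np.argsort(d2[i])
        idx = idx[idx != i][:k]        # k nearest neighbors of Y_i (excluding itself)
        W[i, idx] = 1.0
    W = np.maximum(W, W.T)             # symmetrize: edge if either patch is a neighbor
    return np.diag(W.sum(axis=1)) - W  # L = Delta - W
```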
Methods for dictionary learning
We will propose here two dictionary learning methods to perform step 3 of Algorithm 1.
Both are based on a special partition tree structure. We construct the dictionaries in two
steps. First we compute a special tree structure to partition the set of our training patches.
Then, in a second step, we compute the dictionary based on the obtained subset partitions
in the tree. The first step, the tree construction, will be different for our two proposed
methods, while the method to determine the dictionary from the tree structure will be the
same.
First dictionary construction (FDL). Motivated by the ideas in Zeng et al. (2015), we employ for the first dictionary learning method a top-bottom two-dimensional subspace partition (TTSP) as follows.

Step 1: Construction of the partition tree. For the given training set $Y_1,\dots,Y_m \in \mathbb{R}^{n\times n}$ of image patches we compute the mean
$$C := \frac{1}{m}\sum_{i=1}^{m} Y_i \in \mathbb{R}^{n\times n}$$
and the two $(n\times n)$ covariance matrices
$$C_L := \frac{1}{m}\sum_{i=1}^{m}(Y_i - C)(Y_i - C)^T, \qquad C_R := \frac{1}{m}\sum_{i=1}^{m}(Y_i - C)^T(Y_i - C).$$
Observe that $C_L$ and $C_R$ possess the same eigenvalues. Now the normalized eigenvectors $u$ and $v$ corresponding to the maximal eigenvalues of $C_L$ and $C_R$, respectively, are computed, i.e.,
$$u := \arg\max_{\|x\|_2=1} x^T C_L x, \qquad v := \arg\max_{\|x\|_2=1} x^T C_R x,$$
representing the main structures of the training patches that are not captured by the mean patch $C$. We compute the numbers
$$s_i := u^T Y_i v, \qquad i = 1,\dots,m.$$
These numbers measure how strongly each single patch is correlated with the found structure; they will be used to partition the set of all patches $\{Y_1,\dots,Y_m\}$ into two partial sets. For this purpose, we order the numbers by size,
$$s_{\ell_1} \le s_{\ell_2} \le \dots \le s_{\ell_m},$$
and compute
$$\kappa := \arg\min_{1\le\kappa\le m-1}\left[\sum_{r=1}^{\kappa}\Big(s_{\ell_r} - \frac{1}{\kappa}\sum_{\nu=1}^{\kappa}s_{\ell_\nu}\Big)^2 + \sum_{r=\kappa+1}^{m}\Big(s_{\ell_r} - \frac{1}{m-\kappa}\sum_{\nu=\kappa+1}^{m}s_{\ell_\nu}\Big)^2\right].$$
Using $\kappa$, the partition $\{Y_{\ell_1},\dots,Y_{\ell_\kappa}\} \cup \{Y_{\ell_{\kappa+1}},\dots,Y_{\ell_m}\}$ is derived. This is the k-means clustering method for the special case $k = 2$, which can simply be solved exactly for the set of numbers $\{s_{\ell_1},\dots,s_{\ell_m}\}$.
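Since the scores $s_i$ are scalars, this 2-means step can be solved exactly by checking all $m-1$ split positions of the sorted sequence. A minimal sketch (Python/NumPy; the function name is our own):

```python
import numpy as np

def two_means_split(s):
    """Exact 2-means clustering of the scalar scores s: try every split
    position of the sorted sequence and keep the one minimizing the sum
    of within-cluster squared deviations; returns the two index sets."""
    order = np.argsort(s)
    t = np.asarray(s)[order]
    best_kappa, best_cost = 1, np.inf
    for kappa in range(1, len(t)):
        left, right = t[:kappa], t[kappa:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_kappa, best_cost = kappa, cost
    return order[:best_kappa], order[best_kappa:]
```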
Having found this first partition, we can proceed to partition the two obtained subsets further using the same scheme. This procedure yields a binary tree, where we stop partitioning a subset if it contains fewer than a predefined number of elements. While this partitioning procedure is equivalent to the first part of the TTSP algorithm proposed in Zeng et al. (2015), we compose the data-dependent dictionary differently from Zeng et al. (2015), as follows.
Step 2: Determine the dictionary from the partition tree. Each node $k$ in the tree is associated with a subset of training patches $\{Y_j\}_{j\in\Lambda_k}$, where $\Lambda_k \subset \{1,\dots,m\}$ denotes the subset of indices of these patches. We assume that the root of the tree has the node number $k = 1$ (i.e., $\Lambda_1 = \{1,\dots,m\}$), and we number the nodes by going through each level from left to right. For each node $k$, we compute the mean (center)
$$C_k := \frac{1}{|\Lambda_k|}\sum_{i\in\Lambda_k} Y_i$$
and the normalized singular vectors corresponding to the maximal singular value of $C_k C_k^T$ and $C_k^T C_k$, i.e.,
$$u_k := \arg\max_{\|x\|_2=1} x^T C_k C_k^T x, \qquad v_k := \arg\max_{\|x\|_2=1} x^T C_k^T C_k x.$$
If $\lambda_k$ denotes the maximal singular value of $C_k$, then $\lambda_k u_k v_k^T$ is the best rank-1 approximation of $C_k$, since $u_k$ and $v_k$ are the first vectors in the singular value decomposition of $C_k$.
The dictionary is now determined as follows. We fix the first dictionary element
$$D_1 := u_1 v_1^T,$$
capturing the main structure of the mean $C = C_1$. Further, for each pair of children nodes $2k$ and $2k+1$ of the same parent, with means $C_{2k}$ and $C_{2k+1}$, we set
$$\tilde D_k := \lambda_{2k} u_{2k} v_{2k}^T - \lambda_{2k+1} u_{2k+1} v_{2k+1}^T, \qquad D_k := \frac{\tilde D_k}{\|\tilde D_k\|_F},$$
thereby capturing the difference of the main structures of $C_{2k}$ and $C_{2k+1}$. Note that this procedure is essentially different from the construction in Zeng et al. (2015).
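In code, the atom associated with a pair of sibling clusters could be computed as in the following sketch (Python/NumPy, under the rank-1 choice described above; all names are our own):

```python
import numpy as np

def rank1_atom(C):
    """Leading singular triple (lambda, u, v) of a center matrix C,
    so that lambda * u v^T is its best rank-1 approximation."""
    U, S, Vt = np.linalg.svd(C)
    return S[0], U[:, 0], Vt[0, :]

def difference_atom(patches_left, patches_right):
    """Dictionary atom for a pair of sibling clusters: the normalized
    difference of the rank-1 approximations of the two cluster centers."""
    s_l, u_l, v_l = rank1_atom(np.mean(patches_left, axis=0))
    s_r, u_r, v_r = rank1_atom(np.mean(patches_right, axis=0))
    Dk = s_l * np.outer(u_l, v_l) - s_r * np.outer(u_r, v_r)
    return Dk / np.linalg.norm(Dk, 'fro')
```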
Remarks. 1. Instead of approximating the centers $C_k$ by a rank-1 matrix, we can also use a more exact approximation by matrices of higher rank using the singular value decomposition. For sparse representation purposes, and if the patches do not contain noise, one may even use the centers $C_k$ directly instead of their lower-rank approximations.

2. The procedure avoids the problem of obtaining dictionary elements that are very similar. Strong similarity of dictionary elements occurs especially if only the means of the subsets in the leaves of the tree, or low-rank approximations of these means, are employed, see Zeng et al. (2015).

3. Our dictionary construction can be understood as a generalized wavelet approach, where the dictionary elements for $k \ge 2$ are wavelet-type elements while $D_1$ is the low-pass element.
Second dictionary construction (SDL). We now propose a second, less expensive method for dictionary learning.

Step 1: Construction of the partition tree. This time, we construct the partition tree in a different way, using the similarity of training patches as we already did for the graph regularization. Here, we use the weights
$$w_{i,j} := \|Y_i - Y_j\|_F^2 \qquad (5)$$
to measure the similarity between $Y_i$ and $Y_j$.
First we order the patches such that $Y_1$ has the smallest Frobenius norm. Now we compute the weights $w_{1,j}$ for $j = 1,\dots,m$ according to (5) and order them by size,
$$w_{1,\ell_1} \le w_{1,\ell_2} \le \dots \le w_{1,\ell_m},$$
where $\ell_1 = 1$, since $w_{1,1} = 0$ is the smallest weight. Now, similarly as before in the first construction, we divide the set of training patches into the two subsets $\{Y_{\ell_1},\dots,Y_{\ell_\kappa}\}$ and $\{Y_{\ell_{\kappa+1}},\dots,Y_{\ell_m}\}$ using
$$\kappa := \arg\min_{1\le\kappa\le m-1}\left[\sum_{r=1}^{\kappa}\Big(w_{1,\ell_r} - \frac{1}{\kappa}\sum_{\nu=1}^{\kappa}w_{1,\ell_\nu}\Big)^2 + \sum_{r=\kappa+1}^{m}\Big(w_{1,\ell_r} - \frac{1}{m-\kappa}\sum_{\nu=\kappa+1}^{m}w_{1,\ell_\nu}\Big)^2\right].$$
We proceed to partition the obtained subsets using the same procedure and obtain a binary tree, where each node is associated with a subset of training patches; a sketch of a single split step is given below.
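A single SDL split step could be sketched as follows (Python/NumPy, reusing the function two_means_split from the FDL sketch above; names are our own):

```python
import numpy as np

def sdl_split(patches):
    """One SDL partition step: take the patch with smallest Frobenius
    norm as reference and split all patches by their squared Frobenius
    distances to it, using the exact 2-means split from above."""
    norms = [np.linalg.norm(P, 'fro') for P in patches]
    ref = int(np.argmin(norms))                       # reference patch "Y_1"
    w = np.array([np.linalg.norm(P - patches[ref], 'fro') ** 2 for P in patches])
    return two_means_split(w)                         # index sets of the two subsets
```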
Step 2: Determine the dictionary from the partition tree. We proceed analogously to the first construction: the rank-1 approximation of the mean $C = C_1$ gives the first dictionary element, and the normalized differences of the rank-1 approximations of the centers of each pair of children in the partition tree give the further dictionary elements.

A toy example illustrating the two dictionary learning methods is given in Appendix B (see also Figure 2).
Methods to solve the minimization problem

In the fourth step of the proposed denoising Algorithm 1, we have to solve the minimization problem
$$\min_{X\in\mathbb{R}^{k\times m}} \Big( \frac{1}{2}\|Y - DX\|_F^2 + \frac{\alpha}{2}\mathrm{Tr}(XLX^T) + \lambda\|X\|_1 \Big) \qquad (6)$$
for the given (noisy) training data $Y$ and the dictionary $D = [d_1 \dots d_k]$, where $d_j = \mathrm{col}\,D_j \in \mathbb{R}^N$ are the dictionary elements constructed in the last subsection.
We suggest solving the problem using the split Bregman iteration, see e.g. Goldstein and Osher (2009); Plonka and Ma (2011), which in the considered case is equivalent to the Alternating Direction Method of Multipliers (ADMM), see Yankelevsky and Elad (2016). For other approaches we refer e.g. to Chambolle and Pock (2011) or to Lee et al. (2007). The details of the derivation for solving Eq. (6) are presented in Appendix A. We outline the following algorithm to perform step 4 of Algorithm 1.
Algorithm 2: Solving the minimization problem

Input:
Noisy training data $Y = [y_1 \dots y_m] \in \mathbb{R}^{N\times m}$
Laplacian matrix $L \in \mathbb{R}^{m\times m}$
Learned dictionary $D \in \mathbb{R}^{N\times k}$
$X^0 = Z^0 = B^0 = 0_{k\times m}$
Parameters $\lambda, \mu, \alpha > 0$
Number of iterations

Algorithm:
Iterate until the given number of iterations is reached:
1: Compute $X^{\ell+1}$ as the solution of $(D^T D + \mu I)X + \alpha XL = D^T Y + \mu(Z^\ell - B^\ell)$.
2: Compute $Z^{\ell+1}$ componentwise by soft shrinkage, $z_{i,j}^{\ell+1} = T_{\lambda/\mu}(x_{i,j}^{\ell+1} + B_{i,j}^\ell)$ for $i = 1,\dots,k$, $j = 1,\dots,m$.
3: Update $B^{\ell+1} = B^\ell - Z^{\ell+1} + X^{\ell+1}$.

Output: $X$
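A compact sketch of Algorithm 2 might look as follows (Python with NumPy/SciPy; using SciPy's Sylvester solver for step 1 is our own choice, as are all names and default parameters):

```python
import numpy as np
from scipy.linalg import solve_sylvester

def graph_sparse_coding(Y, D, L, lam=0.4, mu=0.05, alpha=1.6, n_iter=30):
    """Split Bregman / ADMM sketch of Algorithm 2 for problem (6).
    Step 1 is the Sylvester equation A X + X B = Q with
    A = D^T D + mu*I and B = alpha*L."""
    k, m = D.shape[1], Y.shape[1]
    Z = np.zeros((k, m))
    B = np.zeros((k, m))
    A = D.T @ D + mu * np.eye(k)
    for _ in range(n_iter):
        Q = D.T @ Y + mu * (Z - B)
        X = solve_sylvester(A, alpha * L, Q)                    # step 1
        V = X + B
        Z = np.sign(V) * np.maximum(np.abs(V) - lam / mu, 0.0)  # step 2: soft shrinkage
        B = B - Z + X                                           # step 3: Bregman update
    return X
```

The Sylvester equation in step 1 has a unique solution here because $D^T D + \mu I$ is positive definite and $\alpha L$ is positive semidefinite, as shown in Appendix A.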
EXPERIMENTS

In this section, we verify the seismic data denoising performance of the proposed FDL-Graph and SDL-Graph methods by comparing their performance to results without graph regularization, and highlight the advantages and disadvantages of using graph regularization. Comparisons to a traditional FX (frequency domain in the time direction and spatial domain in the trace direction) deconvolution (FX-Decon) algorithm (Canales, 1984) and state-of-the-art curvelet denoising by thresholding (Hennenfent and Herrmann, 2006) are also provided.
Determining an objective data quality metric plays an important role in denoising applications. The peak signal-to-noise ratio (PSNR) is given by
$$\mathrm{PSNR} = 20 \log_{10}\Big(\frac{255}{\mathrm{std2}(Y_C - Y_D)}\Big),$$
where $Y_D$ denotes the denoised data and $Y_C$ the clear data. The function std2 computes the standard deviation of $Y_C - Y_D$. Computation time (in seconds) and the recovery error
$$\mathrm{error} = \frac{\|Y_C - Y_D\|_F^2}{\|Y_C\|_F^2}$$
are used as the performance measures in our quantitative evaluation. The parameter $k$ for building the graph Laplacian, where we need to fix the $k$ nearest neighbors, is always taken to be $k = 6$.
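For reference, these two quality measures translate directly into code (a Python/NumPy sketch; function names are our own):

```python
import numpy as np

def psnr(clear, denoised):
    """PSNR as defined above, with the standard deviation of the
    residual in the denominator (8-bit amplitude convention, peak 255)."""
    return 20.0 * np.log10(255.0 / np.std(clear - denoised))

def recovery_error(clear, denoised):
    """Relative recovery error ||Y_C - Y_D||_F^2 / ||Y_C||_F^2."""
    return (np.linalg.norm(clear - denoised, 'fro') ** 2
            / np.linalg.norm(clear, 'fro') ** 2)
```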
To qualitatively evaluate our algorithms' effectiveness on seismic data, we compare the following six algorithms for seismic data denoising at noise level 25: FX-Decon, the curvelet algorithm, FDL-OMP, SDL-OMP, FDL-Graph, and SDL-Graph. Here FDL-OMP and SDL-OMP denote the denoising methods that we obtain by solving the optimization problem (1) using the dictionary $D$ learned by FDL or SDL, respectively, but without the graph regularization term. Similarly, FDL-Graph and SDL-Graph denote the methods obtained from Algorithm 1, using the first (FDL) or the second (SDL) dictionary learning approach. The FX-Decon algorithm has been applied to windows of size 64 × 64 samples overlapping by 8 samples in both the time and spatial dimensions. The curvelet denoising algorithm decomposes the noisy data into different scales and different directions, and small curvelet coefficients are removed by thresholding. For the seismic data in Figure 3(b), the FDL-Graph and SDL-Graph algorithms use a training set of 961 patches of size 16 × 16. The regularization parameters in Eq. (8) have been chosen empirically as α = 1.6, λ = 0.4, µ = 0.05 and as α = 1.6, λ = 0.5, µ = 0.04 for the two algorithms, respectively. For the dictionary training part, the minimal number of patches contained in each node determines the depth of the tree; we set this minimal number to 6 for the FDL-Graph algorithm and to 16 for the SDL-Graph algorithm. The dictionaries learned by the FDL-Graph and SDL-Graph algorithms are shown in Figure 4, where the training patches are taken from the noisy data in Figure 3(b). The main difference between FDL-OMP/SDL-OMP and FDL-Graph/SDL-Graph is that graph regularization exploits the geometric similarity of data patches. We observe from the close-up window in Figure 5 that the results of the graph regularization methods better maintain the continuous events and weak features of the original data. From the single-trace comparison in Figure 6 we can see that all methods recover strong amplitudes close to the clear data trace. For weak amplitudes, however, the denoising results using the graph regularization methods are much better than those of the methods without graph regularization. Statistical results for various noise levels are shown in Figure 7. In general, they show that the graph regularization term is helpful for capturing the features of seismic data.
Finally, we have applied FX-Decon, curvelet denoising, and the proposed FDL-Graph and SDL-Graph algorithms to field data denoising (Figures 8 and 9). Note that this data set is actual field data to which no noise was artificially added; the noise seen in the data is the original noise captured during data acquisition. The denoising results are presented on the left side of Figures 8 and 9. On the right side, the difference between the original noisy data and the denoised data is displayed. The denoising results of the FDL-Graph and SDL-Graph algorithms are clearer and smoother than the results produced by the FX-Decon and curvelet denoising methods. From the magnified windows, we can see that the FDL-Graph and SDL-Graph methods better recover the weak features of the original data than the other methods.
CONCLUSION

This paper proposed graph-regularized dictionary learning methods for seismic data processing, where the intrinsic structure of the seismic data is exploited by graph Laplacian regularization. Experimental results on simulated and real seismic data show that graph regularization provides improved denoising results compared to methods that only employ a learned dictionary. However, the sparse coding stage is more time consuming than for similar methods without graph regularization. Also, the recovery at stronger noise levels is not yet satisfactory. Future work will focus on improving the computational speed of the algorithm. Even in its current state, however, the presented model connecting dictionary learning and graph regularization is useful for denoising and for other geophysical inversion problems such as migration, imaging, and full waveform inversion.
ACKNOWLEDGEMENTS

This work is supported by the National Natural Science Foundation of China (grant numbers NSFC 91330108, 41374121, 61327013) and the Fundamental Research Funds for the Central Universities (grant number HIT.PIRS.A201501).
APPENDIX A

Let us recall that we want to use the split Bregman algorithm to solve the problem
$$\min_{X\in\mathbb{R}^{k\times m}}\Big(\frac{1}{2}\|Y - DX\|_F^2 + \frac{\alpha}{2}\mathrm{Tr}(XLX^T) + \lambda\|X\|_1\Big). \qquad (7)$$
First, we introduce a further variable $Z\in\mathbb{R}^{k\times m}$ and rewrite the problem as follows,
$$\min_{X,Z\in\mathbb{R}^{k\times m}} \frac{1}{2}\|Y-DX\|_F^2 + \frac{\alpha}{2}\mathrm{Tr}(XLX^T) + \frac{\mu}{2}\|X-Z\|_F^2 + \lambda\|Z\|_1 \qquad (8)$$
with some fixed parameter $\mu > 0$. Now let
$$E(X,Z) := \lambda\|Z\|_1 + \frac{1}{2}\|Y-DX\|_F^2 + \frac{\alpha}{2}\mathrm{Tr}(XLX^T),$$
such that (8) is of the form
$$\min_{X,Z\in\mathbb{R}^{k\times m}} E(X,Z) + \frac{\mu}{2}\|X-Z\|_F^2.$$
We introduce the so-called Bregman distance
$$D_E((X,Z),(X^\ell,Z^\ell)) := E(X,Z) - E(X^\ell,Z^\ell) - \langle P^\ell, X - X^\ell\rangle - \langle Q^\ell, Z - Z^\ell\rangle,$$
where $(P^\ell,Q^\ell)$ is a subgradient of $E$ at $(X^\ell,Z^\ell)$, i.e., $(P^\ell,Q^\ell)\in\partial E(X^\ell,Z^\ell)$. Now, to solve (8), we replace $E(X,Z)$ by the Bregman distance and consider the iteration
$$(X^{\ell+1},Z^{\ell+1}) = \arg\min_{X,Z\in\mathbb{R}^{k\times m}}\Big\{D_E((X,Z),(X^\ell,Z^\ell)) + \frac{\mu}{2}\|X - Z\|_F^2\Big\} = \arg\min_{X,Z\in\mathbb{R}^{k\times m}}\Big\{E(X,Z) - \langle P^\ell,X\rangle - \langle Q^\ell,Z\rangle + \frac{\mu}{2}\|X - Z\|_F^2\Big\}. \qquad (9)$$
This iteration is sensible, since the two additional terms $\langle P^\ell,X\rangle$ and $\langle Q^\ell,Z\rangle$ vanish for the desired solution $(X,Z)$ of (8). From (9) we derive the necessary conditions
$$0 \in \partial_X E(X^{\ell+1},Z^{\ell+1}) - P^\ell + \mu(X^{\ell+1} - Z^{\ell+1}),$$
$$0 \in \partial_Z E(X^{\ell+1},Z^{\ell+1}) - Q^\ell - \mu(X^{\ell+1} - Z^{\ell+1}).$$
Since $(P^{\ell+1},Q^{\ell+1})\in\partial E(X^{\ell+1},Z^{\ell+1})$, we obtain from these conditions the recursions
$$P^{\ell+1} = P^\ell - \mu(X^{\ell+1} - Z^{\ell+1}), \qquad Q^{\ell+1} = Q^\ell + \mu(X^{\ell+1} - Z^{\ell+1}), \qquad (10)$$
and in particular $P^{\ell+1} + Q^{\ell+1} = P^\ell + Q^\ell$ for all $\ell$. Introducing $B^\ell := \frac{1}{\mu}Q^\ell$, the second equation in (10) implies the recursion
$$B^{\ell+1} = B^\ell - Z^{\ell+1} + X^{\ell+1}.$$
Moreover, by
$$\frac{\mu}{2}\|X - Z + B^\ell\|_F^2 = \frac{\mu}{2}\|X - Z\|_F^2 + \mu\langle X - Z, B^\ell\rangle + \frac{\mu}{2}\|B^\ell\|_F^2,$$
we can rewrite (9) as
$$(X^{\ell+1},Z^{\ell+1}) = \arg\min_{X,Z\in\mathbb{R}^{k\times m}}\Big\{E(X,Z) + \frac{\mu}{2}\|X - Z + B^\ell\|_F^2 - \mu\langle X - Z, B^\ell\rangle - \langle P^\ell,X\rangle - \langle Q^\ell,Z\rangle\Big\} = \arg\min_{X,Z\in\mathbb{R}^{k\times m}}\Big\{E(X,Z) + \frac{\mu}{2}\|X - Z + B^\ell\|_F^2 - \langle X, P^\ell + Q^\ell\rangle\Big\},$$
where we have used that $-\mu\langle X - Z, B^\ell\rangle = -\langle Q^\ell, X\rangle + \langle Q^\ell, Z\rangle$. Taking the initial matrices $B^0 = P^0 = Q^0 = 0$, it follows that $P^\ell + Q^\ell = 0$ for all $\ell$, and we arrive at the iteration
$$(X^{\ell+1},Z^{\ell+1}) = \arg\min_{X,Z\in\mathbb{R}^{k\times m}}\Big\{E(X,Z) + \frac{\mu}{2}\|X - Z + B^\ell\|_F^2\Big\},$$
$$B^{\ell+1} = B^\ell - Z^{\ell+1} + X^{\ell+1}.$$
Applying alternating minimization, this leads to the following iteration scheme,
$$X^{\ell+1} = \arg\min_{X\in\mathbb{R}^{k\times m}}\Big\{\frac{1}{2}\|DX - Y\|_F^2 + \frac{\alpha}{2}\mathrm{Tr}(XLX^T) + \frac{\mu}{2}\|X - Z^\ell + B^\ell\|_F^2\Big\},$$
$$Z^{\ell+1} = \arg\min_{Z\in\mathbb{R}^{k\times m}}\Big\{\lambda\|Z\|_1 + \frac{\mu}{2}\|X^{\ell+1} - Z + B^\ell\|_F^2\Big\}, \qquad (11)$$
$$B^{\ell+1} = B^\ell - Z^{\ell+1} + X^{\ell+1}.$$
The functional in the first equation of (11) is differentiable, and we obtain the necessary condition
$$D^T(DX - Y) + \alpha XL + \mu(X - Z^\ell + B^\ell) = 0,$$
i.e.,
$$(D^T D + \mu I)X + \alpha XL = D^T Y + \mu(Z^\ell - B^\ell).$$
This equation is uniquely solvable if the eigenvalues $\alpha_1,\dots,\alpha_k$ of $(D^T D + \mu I)$ and the eigenvalues $\beta_1,\dots,\beta_m$ of $\alpha L$ satisfy
$$\alpha_i + \beta_j \neq 0 \quad \forall\, i = 1,\dots,k,\ j = 1,\dots,m,$$
see Bartels and Stewart (1972); Bhatia and Rosenthal (1997). This is true in our case, since $(D^T D + \mu I)$ is positive definite for $\mu > 0$ and $L$ is positive semidefinite.
The second subproblem in (11) can simply be solved by component-wise shrinkage. For each single component $z_{i,j}^{\ell+1}$ of $Z^{\ell+1} = (z_{i,j}^{\ell+1})_{i=1,j=1}^{k,m}$ we have
$$z_{i,j}^{\ell+1} = \arg\min_{z\in\mathbb{R}}\Big\{\lambda|z| + \frac{\mu}{2}\big|z - x_{i,j}^{\ell+1} - B_{i,j}^\ell\big|^2\Big\},$$
i.e.,
$$0 \in \lambda\frac{z^{\ell+1}}{|z^{\ell+1}|} + \mu\big(z^{\ell+1} - x_{i,j}^{\ell+1} - B_{i,j}^\ell\big),$$
where $\frac{z}{|z|}$ denotes the set $[-1,1]$ if $z = 0$. Hence, we find the solution by soft shrinkage,
$$z_{i,j}^{\ell+1} = T_{\lambda/\mu}\big(x_{i,j}^{\ell+1} + B_{i,j}^\ell\big) := \begin{cases} x_{i,j}^{\ell+1} + B_{i,j}^\ell - \frac{\lambda}{\mu} & \text{for } x_{i,j}^{\ell+1} + B_{i,j}^\ell \ge \frac{\lambda}{\mu},\\[0.5ex] x_{i,j}^{\ell+1} + B_{i,j}^\ell + \frac{\lambda}{\mu} & \text{for } x_{i,j}^{\ell+1} + B_{i,j}^\ell \le -\frac{\lambda}{\mu},\\[0.5ex] 0 & \text{otherwise.} \end{cases}$$
APPENDIX B
In our paper, we have proposed two dictionary learning algorithms, called FDL-Graph and SDL-Graph. In order to illustrate the dictionary learning step more clearly, we give a small toy example, where the training set contains the following 10 training patches of size $3\times 3$:
$$Y_1 = \begin{pmatrix} 1&1&0\\0&0&1\\0&0&0 \end{pmatrix},\quad Y_2 = \begin{pmatrix} 1&0&0\\0&1&0\\0&0&0 \end{pmatrix},\quad Y_3 = \begin{pmatrix} 0&0&0\\1&1&0\\0&0&1 \end{pmatrix},\quad Y_4 = \begin{pmatrix} 1&0&0\\0&1&1\\0&0&1 \end{pmatrix},\quad Y_5 = \begin{pmatrix} 1&1&1\\0&1&1\\0&0&0 \end{pmatrix},$$
$$Y_6 = \begin{pmatrix} 0&0&1\\1&1&1\\0&1&1 \end{pmatrix},\quad Y_7 = \begin{pmatrix} 1&1&0\\0&1&1\\0&0&1 \end{pmatrix},\quad Y_8 = \begin{pmatrix} 0&1&1\\0&0&1\\0&0&0 \end{pmatrix},\quad Y_9 = \begin{pmatrix} 1&1&1\\0&0&1\\0&0&1 \end{pmatrix},\quad Y_{10} = \begin{pmatrix} 0&0&0\\0&1&1\\0&0&1 \end{pmatrix}.$$
We construct the partition trees using the two algorithms from the previous section and show how the dictionary construction can be directly incorporated into the partition tree construction. The example also shows that the obtained trees, and thus the found dictionary elements, differ slightly between the two approaches.
FDL-Graph algorithm
First level: The training set is $\{Y_1, Y_2, \dots, Y_{10}\}$. We compute the mean (center) $C_1 = \frac{1}{10}\sum_{i=1}^{10} Y_i$ and the normalized singular vectors corresponding to the maximal singular value of $C_1 C_1^T$ and $C_1^T C_1$, i.e.,
$$u_1 = \arg\max_{\|x\|_2=1} x^T C_1 C_1^T x, \qquad v_1 = \arg\max_{\|x\|_2=1} x^T C_1^T C_1 x.$$
Then we obtain the first dictionary element $D_1 = u_1 v_1^T$.
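This first-level computation can be reproduced with a few lines of code (a NumPy sketch; here the leading singular vectors are taken directly from the SVD of $C_1$):

```python
import numpy as np

# The ten 3x3 training patches of the toy example.
Y = np.array([
    [[1,1,0],[0,0,1],[0,0,0]], [[1,0,0],[0,1,0],[0,0,0]],
    [[0,0,0],[1,1,0],[0,0,1]], [[1,0,0],[0,1,1],[0,0,1]],
    [[1,1,1],[0,1,1],[0,0,0]], [[0,0,1],[1,1,1],[0,1,1]],
    [[1,1,0],[0,1,1],[0,0,1]], [[0,1,1],[0,0,1],[0,0,0]],
    [[1,1,1],[0,0,1],[0,0,1]], [[0,0,0],[0,1,1],[0,0,1]],
], dtype=float)

C1 = Y.mean(axis=0)                 # center of the root node
U, S, Vt = np.linalg.svd(C1)
D1 = np.outer(U[:, 0], Vt[0, :])    # first dictionary element D1 = u1 v1^T
```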
Second level: The training set is divided into two partial sets according to the description of the FDL tree partitioning; we call them $L_{21}$ and $L_{22}$, where $L_{21} = \{Y_5, Y_8, Y_9\}$ and $L_{22} = \{Y_1, Y_2, Y_3, Y_4, Y_6, Y_7, Y_{10}\}$. The center matrices of the two nodes are $C_2 = \frac{1}{3}(Y_5 + Y_8 + Y_9)$ and $C_3 = \frac{1}{7}(Y_1 + Y_2 + Y_3 + Y_4 + Y_6 + Y_7 + Y_{10})$. We compute the maximal singular values $\lambda_2, \lambda_3$ of the center matrices $C_2, C_3$ and the first left and right singular vectors $u_2, v_2$ and $u_3, v_3$ of $C_2, C_3$, respectively. Now we obtain the second dictionary element $D_2 = \lambda_2 u_2 v_2^T - \lambda_3 u_3 v_3^T$.
Third level: The partition procedure now yields the subsets $L_{31} = \{Y_8\}$ and $L_{32} = \{Y_5, Y_9\}$ of $L_{21}$, and the subsets $L_{33} = \{Y_3, Y_6\}$ and $L_{34} = \{Y_1, Y_2, Y_4, Y_7, Y_{10}\}$ of $L_{22}$. We compute the center matrices of each partial set, $C_4 = Y_8$, $C_5 = \frac{1}{2}(Y_5 + Y_9)$, $C_6 = \frac{1}{2}(Y_3 + Y_6)$, and $C_7 = \frac{1}{5}(Y_1 + Y_2 + Y_4 + Y_7 + Y_{10})$. We obtain the dictionary element
$$D_3 = \lambda_4 u_4 v_4^T - \lambda_5 u_5 v_5^T,$$
where $\lambda_4$ and $\lambda_5$ are the maximal singular values of the center matrices $C_4$ and $C_5$, and $u_4, v_4$ and $u_5, v_5$ are the first singular vectors of $C_4$ and $C_5$. Similarly,
$$D_4 = \lambda_6 u_6 v_6^T - \lambda_7 u_7 v_7^T,$$
where $\lambda_6$ and $\lambda_7$ are the maximal singular values of the center matrices $C_6$ and $C_7$, with corresponding singular vectors $u_6, v_6, u_7, v_7$.
Fourth level: The training set $L_{32}$ is split into $L_{41} = \{Y_5\}$ and $L_{42} = \{Y_9\}$, $L_{33}$ is split into $L_{43} = \{Y_3\}$ and $L_{44} = \{Y_6\}$, and $L_{34}$ is split into $L_{45} = \{Y_1, Y_7\}$ and $L_{46} = \{Y_2, Y_4, Y_{10}\}$. Then $D_5$, $D_6$ and $D_7$ can be learned from $\{L_{41}, L_{42}\}$, $\{L_{43}, L_{44}\}$, and $\{L_{45}, L_{46}\}$.

Fifth level: The training set $L_{45}$ is split into $L_{51} = \{Y_1\}$ and $L_{52} = \{Y_7\}$, and $L_{46}$ is split into $L_{53} = \{Y_2\}$ and $L_{54} = \{Y_4, Y_{10}\}$; we learn the dictionary elements $D_8$ and $D_9$.

Sixth level: The training set $L_{54}$ is split into $L_{61} = \{Y_4\}$ and $L_{62} = \{Y_{10}\}$, and we learn the dictionary element $D_{10}$.
The SDL-Graph algorithm gives us the following partition tree.

First level: The training set is $\{Y_1, Y_2, \dots, Y_{10}\}$.
Second level: The training subsets are $\{Y_1\}$ and $\{Y_2, Y_3, \dots, Y_{10}\}$.
Third level: The training subsets are $\{Y_5\}$ and $\{Y_2, Y_3, Y_4, Y_6, Y_7, Y_8, Y_9, Y_{10}\}$.
Fourth level: The training subsets are $\{Y_7, Y_4\}$ and $\{Y_2, Y_3, Y_6, Y_8, Y_9, Y_{10}\}$.
Fifth level: The training subsets are $\{Y_7\}$, $\{Y_4\}$, $\{Y_9\}$ and $\{Y_2, Y_3, Y_6, Y_8, Y_{10}\}$.
Sixth level: The training subsets are $\{Y_8\}$ and $\{Y_2, Y_3, Y_6, Y_{10}\}$.
Seventh level: The training subsets are $\{Y_3\}$ and $\{Y_2, Y_6\}$.
Eighth level: The training subsets are $\{Y_2\}$ and $\{Y_6\}$.

For this tree, we obtain 9 dictionary elements: one at the first, one at the second, one at the third, one at the fourth, two at the fifth, one at the sixth, one at the seventh, and one at the eighth level.
REFERENCES

Aharon, M., M. Elad, and A. Bruckstein, 2006, K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation: IEEE Transactions on Signal Processing, 54, no. 11, 4311-4322, doi:10.1109/TSP.2006.881199.
Anagaw, A. Y., and M. D. Sacchi, 2012, Edge-preserving seismic imaging using the total variation method: Journal of Geophysics and Engineering, 9, no. 2, 138-146.
Bartels, R. H., and G. W. Stewart, 1972, Solution of the matrix equation AX + XB = C: Communications of the ACM, 15, no. 9, 820-826.
Bechouche, S., and J. Ma, 2014, Simultaneous dictionary learning and denoising for seismic data: Geophysics, 79, A27-A31, doi:10.1190/geo2013-0382.1.
Belkin, M., and P. Niyogi, 2003, Laplacian eigenmaps for dimensionality reduction and data representation: Neural Computation, 15, no. 6, 1373-1396, doi:10.1162/089976603321780317.
Beck, A., and M. Teboulle, 2009, A fast iterative shrinkage-thresholding algorithm for linear inverse problems: SIAM Journal on Imaging Sciences, 2, no. 1, 183-202, doi:10.1137/080716542.
Bhatia, R., and P. Rosenthal, 1997, How and why to solve the operator equation AX + XB = Y: Bulletin of the London Mathematical Society, 29, no. 1, 1-21.
Bougleux, S., A. Elmoataz, and M. Melkemi, 2009, Local and nonlocal discrete regularization on weighted graphs for image and mesh processing: International Journal of Computer Vision, 84, no. 2, 220-236, doi:10.1007/s11263-008-0159-z.
Bonar, D., and M. Sacchi, 2012, Denoising seismic data using the nonlocal means algorithm: Geophysics, 77, no. 1, A5-A8, doi:10.1190/geo2011-0235.1.
Cai, J. F., S. Huang, H. Ji, Z. Shen, and G. Ye, 2014, Data-driven tight frame construction and image denoising: Applied and Computational Harmonic Analysis, 37, no. 1, 89-105.
Canales, L. L., 1984, Random noise reduction: Expanded Abstracts of the Annual Meeting, Society of Exploration Geophysicists, 525-527, doi:10.1190/1.1894168.
Chanerley, A. A., and N. A. Alexander, 2002, An approach to seismic correction which includes wavelet de-noising: Proceedings of the Sixth Conference on Computational Structures Technology, 107-108.
Chambolle, A., and T. Pock, 2011, A first-order primal-dual algorithm for convex problems with applications to imaging: Journal of Mathematical Imaging and Vision, 40, no. 1, 120-145, doi:10.1007/s10851-010-0251-1.
Ding, C., and X. He, 2004, K-means clustering via principal component analysis: Proceedings of the Twenty-First International Conference on Machine Learning, 29, doi:10.1145/1015330.1015408.
Dong, W., L. Zhang, G. Shi, and X. Li, 2013, Nonlocally centralized sparse representation for image restoration: IEEE Transactions on Image Processing, 22, no. 4, 1620-1630, doi:10.1109/TIP.2012.2235847.
Elad, M., and M. Aharon, 2006, Image denoising via sparse and redundant representations over learned dictionaries: IEEE Transactions on Image Processing, 15, no. 12, 3736-3754, doi:10.1109/TIP.2006.881969.
Engan, K., S. O. Aase, and J. H. Husoy, 1999, Method of optimal directions for frame design: IEEE International Conference on Acoustics, Speech, and Signal Processing, 5, doi:10.1109/ICASSP.1999.760624.
Elmoataz, A., O. Lezoray, and S. Bougleux, 2008, Nonlocal discrete regularization on weighted graphs: A framework for image and manifold processing: IEEE Transactions on Image Processing, 17, no. 7, 1047-1060, doi:10.1109/TIP.2008.924284.
Goldstein, T., and S. Osher, 2009, The split Bregman method for L1-regularized problems: SIAM Journal on Imaging Sciences, 2, no. 2, 323-343, doi:10.1137/080725891.
Hein, M., and M. Maier, 2006, Manifold denoising: Advances in Neural Information Processing Systems, 19, 561-568.
Hennenfent, G., and F. J. Herrmann, 2006, Seismic denoising with nonuniformly sampled curvelets: Computing in Science and Engineering, 8, 16-25, doi:10.1109/MCSE.2006.49.
Herrmann, F. J., and G. Hennenfent, 2008, Non-parametric seismic data recovery with curvelet frames: Geophysical Journal International, 173, no. 1, 233-248.
Kheradmand, A., and P. Milanfar, 2014, A general framework for regularized, similarity-based image restoration: IEEE Transactions on Image Processing, 23, no. 12, 5136-5151, doi:10.1109/TIP.2014.2362059.
Lee, H., A. Battle, R. Raina, and A. Y. Ng, 2007, Efficient sparse coding algorithms: Advances in Neural Information Processing Systems, 19, 801-808.
Liu, X., D. Zhai, D. Zhao, G. Zhai, and W. Gao, 2014, Progressive image denoising through hybrid graph Laplacian regularization: A unified framework: IEEE Transactions on Image Processing, 23, no. 4, 1491-1503, doi:10.1109/TIP.2014.2303638.
Liu, L., G. Plonka, and J. Ma, 2017, Seismic data interpolation and denoising by learning a tensor tight frame: Inverse Problems, to appear.
Lu, X., H. Yuan, P. Yan, Y. Yuan, and X. Li, 2012, Geometry constrained sparse coding for single image super-resolution: IEEE Conference on Computer Vision and Pattern Recognition, 1648-1655, doi:10.1109/CVPR.2012.6247858.
Naghizadeh, M., 2010, A unified method for interpolation and de-noising of seismic records in the f-k domain: Expanded Abstracts of the Annual Meeting, Society of Exploration Geophysicists, 3579-3583, doi:10.1190/1.3513593.
Ma, J., and G. Plonka, 2010, The curvelet transform: IEEE Signal Processing Magazine, 27, no. 2, 118-133, doi:10.1109/MSP.2009.935453.
Neelamani, R., A. Baumstein, D. Gillard, M. Hadidi, and W. Soroka, 2008, Coherent and random noise attenuation using the curvelet transform: The Leading Edge, 27, no. 2, 240-248, doi:10.1190/1.2840373.
Needell, D., and R. Vershynin, 2010, Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit: IEEE Journal of Selected Topics in Signal Processing, 4, no. 2, 310-316, doi:10.1109/JSTSP.2010.2042412.
Pan, Q., L. Zhang, G. Dai, and H. Zhang, 1999, Two denoising methods by wavelet transform: IEEE Transactions on Signal Processing, 47, no. 12, 3401-3406, doi:10.1109/78.806084.
Plonka, G., and J. Ma, 2011, Curvelet-wavelet regularized split Bregman iteration for compressed sensing: International Journal of Wavelets, Multiresolution and Information Processing, 9, no. 1, 79-110, doi:10.1142/S0219691311003955.
Reiter, D. T., and W. Rodi, 1996, Nonlinear waveform tomography applied to crosshole seismic data: Geophysics, 61, no. 3, 902-913, doi:10.1190/1.1444015.
Starck, J. L., E. J. Candes, and D. L. Donoho, 2002, The curvelet transform for image denoising: IEEE Transactions on Image Processing, 11, no. 6, 670-684, doi:10.1109/TIP.2002.1014998.
Tang, Y., Y. Shen, A. Jiang, N. Xu, and C. Zhu, 2013, Image denoising via graph regularized K-SVD: IEEE International Symposium on Circuits and Systems (ISCAS), 2820-2823, doi:10.1109/ISCAS.2013.6572465.
Yankelevsky, Y., and M. Elad, 2016, Dual graph regularized dictionary learning: IEEE Transactions on Signal and Information Processing over Networks, 2, no. 4, 611-624, doi:10.1109/TSIPN.2016.2605763.
Yu, S., J. Ma, X. Zhang, and M. Sacchi, 2015, Denoising and interpolation of high-dimensional seismic data by learning a tight frame: Geophysics, 80, V119-V132, doi:10.1190/geo2014-0396.1.
Yu, S., J. Ma, and S. Osher, 2016, Monte Carlo data-driven tight frame for seismic data recovery: Geophysics, 81, no. 4, V327-V340.
Zhang, R., and T. Ulrych, 2003, Physical wavelet frame denoising: Geophysics, 68, 225-231, doi:10.1190/1.1543209.
Zeng, X., W. Bian, W. Liu, J. Shen, and D. Tao, 2015, Dictionary pair learning on Grassmann manifolds for image denoising: IEEE Transactions on Image Processing, 24, no. 11, 4556-4569, doi:10.1109/TIP.2015.2468172.
Zheng, M., J. Bu, C. Chen, C. Wang, L. Zhang, G. Qiu, and D. Cai, 2011, Graph regularized sparse coding for image representation: IEEE Transactions on Image Processing, 20, no. 5, 1327-1336, doi:10.1109/TIP.2010.2090535.
Figure Captions:

Figure 1: The weighted and undirected graph, where 0/1 denotes a mask for the k-nearest neighbors.
Figure 2: The process of learning a dictionary by the FDL-Graph and SDL-Graph algorithms, where λi is the maximal singular value of the center matrix Ci at each node, and ui and vi are the first vectors in the singular value decomposition of Ci.
Figure 3: Seismic data: (a) real data; (b) real data with random noise; and (c) field data.
Figure 4: Comparison of the dictionaries learned by the two methods: (a) FDL-Graph; (b) SDL-Graph.
Figure 5: Denoising results for real data with noise level 25: (a) FX-Decon; (b) Curvelet; (c) FDL-OMP; (d) SDL-OMP; (e) FDL-Graph; (f) SDL-Graph.
Figure 6: Single-trace comparison of the reconstructions with the clear data: (a) FX-Decon; (b) Curvelet; (c) FDL-OMP; (d) SDL-OMP; (e) FDL-Graph; (f) SDL-Graph.
Figure 7: Denoising performance of the FX-Decon, Curvelet, FDL-OMP, SDL-OMP, FDL-Graph and SDL-Graph algorithms at different noise levels.
Figure 8: Denoising of field data by FX-Decon (a) and the curvelet approach (c). Panels (b) and (d) show the differences between the (noisy) data and the denoising result.
Figure 9: Denoising of field data by FDL-Graph (a) and SDL-Graph (c). Panels (b) and (d) show the differences between the (noisy) data and the denoising result.
Figure 1: The weighted and undirected graph, where 0/1 denotes a mask for the k-nearest neighbors.

Figure 2: The process of learning a dictionary by the FDL-Graph and SDL-Graph algorithms, where λi is the maximal singular value of the center matrix Ci at each node, and ui and vi are the first vectors in the singular value decomposition of Ci.
Figure 3: Seismic data: (a) real data; (b) real data with random noise; and (c) field data.
Figure 4: Comparison of the dictionaries learned by the two methods: (a) FDL-Graph; (b) SDL-Graph.
Figure 5: Denoising results for real data with noise level 25: (a) FX-Decon; (b) Curvelet; (c) FDL-OMP; (d) SDL-OMP; (e) FDL-Graph; (f) SDL-Graph.
Figure 6: Single-trace comparison of the reconstructions with the clear data: (a) FX-Decon; (b) Curvelet; (c) FDL-OMP; (d) SDL-OMP; (e) FDL-Graph; (f) SDL-Graph.
Figure 7: Denoising performance of the FX-Decon, Curvelet, FDL-OMP, SDL-OMP, FDL-Graph and SDL-Graph algorithms at different noise levels.
Figure 8: Denoising of field data by FX-Decon (a) and the curvelet approach (c). Panels (b) and (d) show the differences between the (noisy) data and the denoising result.
Figure 9: Denoising of field data by FDL-Graph (a) and SDL-Graph (c). Panels (b) and (d) show the differences between the (noisy) data and the denoising result.