Accepted Manuscript
Incremental Low-Rank and Sparse Decomposition for Compressing Videos
Captured by Fixed Cameras
Chongyu Chen, Jianfei Cai, Weisi Lin, Guangming Shi
PII: S1047-3203(14)00198-9
DOI: http://dx.doi.org/10.1016/j.jvcir.2014.12.001
Reference: YJVCI 1454
To appear in: J. Vis. Commun. Image R.
Received Date: 4 January 2014
Accepted Date: 26 November 2014
Please cite this article as: C. Chen, J. Cai, W. Lin, G. Shi, Incremental Low-Rank and Sparse Decomposition for
Compressing Videos Captured by Fixed Cameras, J. Vis. Commun. Image R. (2014), doi: http://dx.doi.org/10.1016/
j.jvcir.2014.12.001
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers
we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and
review of the resulting proof before it is published in its final form. Please note that during the production process
errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Incremental Low-Rank and Sparse Decomposition for Compressing Videos Captured by Fixed Cameras
Chongyu Chena, Jianfei Caib,∗, Weisi Linb, Guangming Shia
aSchool of Electronic Engineering, Xidian University, Xi'an, Shaanxi, 710071 China
bSchool of Computer Engineering, Nanyang Technological University, 639798 Singapore
Abstract
Videos captured by stationary cameras usually have a static or gradually
changing background. Existing schemes are not able to globally exploit this
strong background temporal redundancy. In this paper, motivated by the re-
cent advance on low-rank and sparse decomposition (LRSD), we propose to
apply it for the compression of videos captured by fixed cameras. In particular,
the LRSD is employed to decompose the input video into the low-rank com-
ponent, representing the background, and the sparse component, representing
the moving objects, which are encoded by different methods. Moreover, we
further propose an incremental LRSD (ILRSD) algorithm to reduce the large
memory requirement and high computational complexity of the existing LRSD
algorithm, which facilitates the process of large-scale video sequences without
much performance loss. Experimental results show that the proposed coding
scheme can significantly improve the existing standard codecs, H.264/AVC and
HEVC, and outperform the state-of-the-art background modeling based coding
schemes.
Keywords: Video coding, stationary camera, incremental low-rank and sparse
decomposition, CUR decomposition, background subtraction
∗Corresponding author. Email addresses: [email protected] (Chongyu Chen), [email protected]
(Jianfei Cai), [email protected] (Weisi Lin), [email protected] (Guangming Shi)
Preprint submitted to Journal of Visual Communication and Image Representation, December 1, 2014
1. Introduction
In practical surveillance and teleconference systems, large amounts of video
are captured by stationary cameras, which require efficient compression and fast
transmission. For these videos, the static or gradually changing background in
the scene is a common characteristic, which leads to much temporal redundancy.
Highly efficient compression of these videos is possible if such redundancy can
be effectively removed.
Standard video codecs, including H.264/AVC [1] and HEVC [2], are typi-
cally block-based. They achieve high efficiency in the compression of general
videos by exploiting both temporal and spatial redundancies. When compress-
ing videos captured by fixed cameras, further improvements on coding efficiency
can be achieved by using some specially designed configuration [3]. However,
the standard block-based codecs cannot exploit the strong background tempo-
ral redundancy in a global manner because they partition each video frame into
blocks.
Background subtraction (BGS) based coding techniques [3, 4] have been
proposed for compressing videos captured by fixed cameras. Zhang et al. [3]
propose a representative background difference based coding scheme, where the
background is first generated by background modeling; individual frames are
then subtracted by the background to obtain the difference. The difference se-
quence is finally encoded by H.264/AVC. Their idea of encoding the background
difference has been experimentally shown to be efficient for compressing surveil-
lance videos and thus has been adopted in IEEE 1857 [5] which is specially
targeted for surveillance videos. Further developments on the BGS method
have also been reported on improving the coding strategy of the residual video
at the macro-block level [6] and improving the background modeling part [7].
However, BGS based methods cannot handle cases with global illumination
changes in the scene well, since they lack an efficient way to adaptively adjust
the background.
Background prediction based methods [8, 9, 10] have also been proposed,
which indicate another way of utilizing the background. In particular, Paul et
al. [10] propose to use the most common frame in a scene (McFIS) as the long-
term reference frame, which has shown its ability in compressing short video
sequences. When the video sequence is long, a McFIS reference frame trained
from the first few frames may not be a good one. Thus, the McFIS based scheme
is usually combined with a scene change detection technique, which increases
the encoding complexity and cannot have a unified solution.
Recently, a few low-rank and sparse decomposition (LRSD) tools [11, 12,
13, 14] have been developed, which can decompose a surveillance video into
a low-rank component and a sparse component, approximately representing
the background and the foreground moving objects, respectively (see Fig. 1).
We notice that LRSD can be seen as a more general BGS, which can better
represent the background frames and unify the background modeling and the
background subtraction processes. In the case of illumination change, the low-
rank coefficients can easily capture the change in a graceful way. Thus, it can
produce background difference with less energy compared with the typical BGS.
Even in the case of static background, the low-rank coefficients can still help
mitigate frame variations to produce less residual.
Therefore, in this paper, we propose to apply LRSD for the compression of
videos captured by fixed cameras. In particular, we represent the frames of the
background component by very few independent frames based on the linear de-
pendency, which dramatically removes the temporal redundancy. The remaining
part, consisting of the sparse and residual components, can be efficiently com-
pressed by the existing block-based coding scheme. Moreover, by noticing that
the existing LRSD algorithm cannot handle high-resolution or long-time videos
due to its high memory requirement, we further propose an incremental LRSD
(ILRSD) algorithm that can effectively handle large-scale video sequences with-
out much performance loss. Experimental results on standard test sequences
show that, the proposed LRSD based or ILRSD based coding scheme can sig-
nificantly improve the existing video codecs.
The main contributions of this paper are twofold. First, we apply the LRSD
for compressing videos captured by fixed cameras, where we develop a coding
scheme for individual LRSD components. To the best of our knowledge, the
idea of applying LRSD for video compression has not been reported by others.
Second, we significantly improve the existing LRSD algorithm by reducing its
memory requirement and computational complexity, which gives LRSD the ability
to process large-scale videos and the possibility of being applied in practical
applications.
We would like to point out that a preliminary version of this paper has
been reported in [15]. Compared with the previous conference version, this
paper employs a new LRSD algorithm, presents an insightful analysis of the
success of the proposed coding structure, provides more technical details and
experimental results, and, most importantly, proposes a brand-new incremental
LRSD algorithm for practical video compression.
The rest of this paper is organized as follows. Section 2 briefly introduces the
related theory of LRSD. Section 3 describes the proposed LRSD based video
coding scheme. In Section 4, we propose an incremental LRSD algorithm to
overcome the memory bottleneck of the existing LRSD method. In Section 5,
we conduct numerous experiments on different video sequences to compare the
proposed coding schemes with the state-of-the-art alternatives. Finally, Sec-
tion 6 concludes this paper.
2. Low-Rank and Sparse Decomposition
In matrix theory, the linear dependency among the columns of a matrix
is referred to as the low-rank property. As a result, if we stack many linearly
dependent frames as the columns of a matrix L, then L is exactly low-rank
and its rank is identical to the number of its independent columns. Matrices
converted from videos captured by fixed cameras are expected to be low-rank
because of the static backgrounds. In this case, perturbations of such videos
can be seen as other matrices that are added to L.
The emerging theory of robust principal component analysis (RPCA) [11,
12, 13] provides a suitable formulation for the separation of perturbations and
background. That is,
A = L + S, (1)
where L is the low-rank matrix described above and S is a sparse matrix. Given
a matrix A, L and S can be found by RPCA algorithms such as the augmented
Lagrange multiplier (ALM) method [12] and the principal component pursuit
(PCP) [13], assuming that the low-rank component L is not sparse and the
sparse component S is not low-rank. For a matrix constructed by stacking
frames of a video captured by a fixed camera as columns, the assumption of
RPCA usually holds because its low-rank component is often the static back-
ground and thus is not sparse, while its sparse component often includes moving
objects that are linearly independent and thus is not low-rank. An example of
the separation of a surveillance video via ALM [12] is shown in Fig. 1, which
shows the ability of RPCA algorithms in handling sparse perturbations caused
by moving objects.
(a) Original (b) Low-rank (c) Sparse
Figure 1: Different components separated by ALM [12]. (a) The first frame of the original
video sequence “Hall”. (b) The background restored from the first column of L. (c) The
foreground converted from the first column of S.
Existing RPCA algorithms often concentrate on finding more meaningful
decompositions. However, their complexity is often uncontrollable due to their
automatic and iterative solving procedure, which makes them unsuitable for
video coding. Recently, the GoDec algorithm [14] was proposed for separating
low-rank and sparse components of matrices, which also works well for matrices
constructed from videos captured by fixed cameras. The formulation of GoDec
can be seen as a noisy version of RPCA, that is
A = L + S + N, (2)
where matrix N is the noise component. Besides the controllable complexity,
GoDec also provides controllable rank of L and sparsity of S. These character-
istics make GoDec a good choice for video coding.
According to the theory of GoDec [14], the problem (2) can be solved by
minimizing the decomposition error:
min_{L,S} ‖A − L − S‖_F^2,  s. t.  rank(L) ≤ r,  ‖S‖_0 ≤ k,   (3)
where r is the target rank of L and k is the target sparsity of S. Here the
sparsity refers to the number of non-zero entries in S. In the GoDec method,
the final components L and S are found by solving the following subproblems
iteratively:

L_t = argmin_{rank(L) ≤ r} ‖A − L − S_{t−1}‖_F^2,
S_t = argmin_{‖S‖_0 ≤ k} ‖A − L_t − S‖_F^2,   (4)
where the subscript t represents the t-th iteration. Given the rank of L and
the sparsity of S, L_t and S_t are computed efficiently by performing low-rank
approximation (LRA) and entry-wise hard thresholding alternately:

L_t = LRA(A − S_{t−1}, r),
S_t = THR(A − L_t, k),   (5)
where “LRA(A−St−1, r)” represents the computation of the rank-r approxima-
tion of A − St−1 and “THR(A − Lt, k)” represents the entry-wise hard thresh-
olding of A − Lt with parameter k [16], i.e., keeping k entries of A − Lt that
have the largest absolute values. In general, the optimal LRA of a matrix can be
computed by the truncated singular value decomposition (SVD) [17]. However,
it is shown in [14] that near optimal LRA is sufficient for the convergence of
GoDec. Thus, the bilateral random projections (BRP) based LRA is employed
in GoDec to accelerate the computation.
Algorithm 1 summarizes the procedures of GoDec. In this algorithm, the
input parameters ε and tmax are the target relative error of decomposition and
the maximum number of iterations, respectively. The parameter tmax is used to
avoid an infinite loop because the relative error of decomposition might not further
decrease after several iterations. It should be pointed out that the convergence
of GoDec comes from the combination of LRA and hard thresholding in each
iteration [14]. That is, the global optimalities of S_t and L_t yield decreasing
decomposition errors and the convergence to a local minimum.
Algorithm 1 GoDec
Input: A, r, k, ε, t_max
Output: L, S
Initialize: S_0 ← 0
for t = 1 → t_max do
  L_t = LRA(A − S_{t−1}, r);
  S_t = THR(A − L_t, k);
  if ‖A − L_t − S_t‖_F^2 / ‖A‖_F^2 ≤ ε then
    Break;
  end if
end for
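To make the alternation in Algorithm 1 concrete, the following is a minimal NumPy sketch, not the authors' implementation: it uses a truncated SVD for the LRA step, whereas GoDec [14] accelerates this step with BRP-based near-optimal LRA.

```python
import numpy as np

def lra(M, r):
    # Rank-r approximation via truncated SVD; GoDec replaces this with
    # faster BRP-based near-optimal LRA.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def hard_threshold(M, k):
    # THR(M, k): keep the k entries of M with the largest absolute values.
    S = np.zeros_like(M)
    flat = np.argsort(np.abs(M), axis=None)[-k:]
    idx = np.unravel_index(flat, M.shape)
    S[idx] = M[idx]
    return S

def godec(A, r, k, eps=1e-6, t_max=20):
    # Alternate L_t = LRA(A - S_{t-1}, r) and S_t = THR(A - L_t, k)
    # until the relative decomposition error drops below eps.
    S = np.zeros_like(A)
    for _ in range(t_max):
        L = lra(A - S, r)
        S = hard_threshold(A - L, k)
        err = np.linalg.norm(A - L - S, 'fro') ** 2 / np.linalg.norm(A, 'fro') ** 2
        if err <= eps:
            break
    return L, S
```

For a matrix that is exactly "rank-r plus k spikes", the iteration typically locks onto the spike positions within a few rounds.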
When applying GoDec on a matrix with an unknown “low-rank plus sparse”
(L+S) structure, it is necessary to determine the target rank and the sparsity in
advance. For the target rank r, according to [14], we can set r to be large at
the beginning and then reduce it by checking the rank of L_t during the iterations.
When the target sparsity is unknown, Zhou and Tao [18] suggest replacing the
hard thresholding by a soft thresholding, resulting in a more adaptive algorithm
called semi-soft (SS) GoDec. The soft thresholding of an m× n matrix X with
a threshold τ is to change the entries of X as
X(i, j) = (X(i, j)/|X(i, j)|) (|X(i, j)| − τ),  if |X(i, j)| > τ,
X(i, j) = 0,  if |X(i, j)| ≤ τ,   (6)
where i (1 ≤ i ≤ m) and j (1 ≤ j ≤ n) are the row index and the column index
respectively. Similar to GoDec, the convergence of SS GoDec is guaranteed by
the combination of LRA and soft thresholding in each iteration. For multimedia
applications, the soft thresholding is more reasonable than the hard thresholding
because the sparsity k is usually unknown. As a result, in this research, we
choose SS GoDec, rather than the original GoDec, as the base for the low-rank
and sparse decomposition (LRSD).
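As a sketch, SS GoDec differs from Algorithm 1 only in the S-update, which becomes the soft thresholding of Eq. (6). The NumPy version below is illustrative (truncated SVD again stands in for BRP-based LRA) rather than the reference implementation.

```python
import numpy as np

def soft_threshold(M, tau):
    # Eq. (6): shrink each entry's magnitude by tau, zeroing small entries.
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def ss_godec(A, r, tau, eps=1e-6, t_max=20):
    # Semi-soft GoDec: rank-r LRA alternated with soft thresholding,
    # so the target sparsity k need not be known in advance.
    S = np.zeros_like(A)
    for _ in range(t_max):
        U, s, Vt = np.linalg.svd(A - S, full_matrices=False)
        L = (U[:, :r] * s[:r]) @ Vt[:r, :]
        S = soft_threshold(A - L, tau)
        err = np.linalg.norm(A - L - S, 'fro') ** 2 / np.linalg.norm(A, 'fro') ** 2
        if err <= eps:
            break
    return L, S
```

By construction, after each S-update every entry of the residual N = A − L − S has magnitude at most τ.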
3. LRSD Based Video Coding
In this section, we propose a scheme to improve the coding efficiency of
block-based codecs based on the low-rank and sparse decomposition (LRSD). It
should be noted that our scheme can be combined with any block-based codec
such as H.264/AVC or HEVC.
Given a video sequence of resolution H × W , the proposed scheme consists
of the following steps:
1. Stack a set of frames of the video as columns of a matrix A ∈ Rm×n,
where m = HW and n is the number of frames;
2. Separate the components of A using SS GoDec, so that A = L + S + N ,
where L is a rank-r matrix, S is a sparse matrix, and N is a dense residual
matrix that has many small entries;
3. Compute a low-rank decomposition of L, so that L = CX, where the
m×r matrix C contains some columns of L, representing the independent
components of the background, and X is an r × n matrix, storing the
coefficients to recover each background frame based on the independent
components.
4. Construct S̄ by normalizing the entries of S + N so as to ensure that the
entries of the dense matrix S̄ range from 0 to 255;
5. Convert S̄ and the normalized C to two video sequences, denoted as VS
and VC respectively, and compress them separately using a base codec,
e.g. H.264/AVC or HEVC.
As shown in Fig. 2 (a), the compressed video sequence consists of four
parts, the bit streams of VC and VS , the r × n matrix X (“Coefficient 1”),
and the denormalization coefficients (“Coefficient 2”) for restoring S + N and
C. Fig. 2 (b) shows the corresponding decoding process. Based on the ob-
servation that SS GoDec often converges in less than 20 iterations, we set the
maximum number of iterations to 20 in the proposed scheme. For the rest of
the section, we describe the steps of the encoding scheme in detail, and explain
our choices of parameters by showing some experimental results. For simplicity,
we use H.264/AVC as an example of our base codec.
[Figure 2 diagram. (a) Encoder: LRSD splits the video frames into the low-rank component, whose CUR decomposition yields Coefficients 1 and the independent frames (normalized and encoded by the base codec into Bit-stream 1), and the sparse and residual components (normalized, yielding Coefficients 2, and encoded by the base codec into Bit-stream 2). (b) Decoder: both bit-streams are decoded and denormalized; the independent frames are multiplied by Coefficients 1 and added to the sparse and residual components to form the decoded video frames.]
Figure 2: The diagram of the proposed LRSD based video coding scheme.
3.1. Encoding the low-rank component via coding-oriented decomposition
In this paper, we propose to compress L by its low-rank property. In par-
ticular, we factorize the m × n matrix L into two small matrices by computing
the CUR decomposition [19] of L. That is,
L = CUR, (7)
where the m×r matrix C consists of r adaptively selected columns of L, the r×n
matrix R consists of r adaptively selected rows of L, and the r×r matrix U is the
pseudo-inverse of the intersection of C and R. In this way, L is divided into two
small matrices, C and X = UR. Matrix C is used to restore the r independent
frames of the background and construct a short video VC which only has r
frames. Note that we normalize C before converting it to VC . Next, we compress
VC via H.264/AVC and directly store X without compression, considering that the
amount of data for X is small. At the decoder side, C can be recovered by
stacking the denormalized frames of VC as columns. Then, the restoration of L
can be done by multiplying C and X.
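A minimal sketch of this step is given below. Note that it selects columns and rows by largest norm, a simple stand-in for the adaptive sampling of [19], so it illustrates Eq. (7) rather than the exact decomposition used in the paper.

```python
import numpy as np

def cur(L, r):
    # L ≈ C @ U @ R with C, R made of actual columns/rows of L and U the
    # pseudo-inverse of their intersection. For an exactly rank-r L whose
    # selected columns/rows are independent, the reconstruction is exact.
    cols = np.argsort(np.linalg.norm(L, axis=0))[-r:]  # heuristic selection
    rows = np.argsort(np.linalg.norm(L, axis=1))[-r:]
    C = L[:, cols]
    R = L[rows, :]
    U = np.linalg.pinv(L[np.ix_(rows, cols)])
    return C, U, R
```

At the encoder, only C (as the short video VC) and X = U @ R need to be stored or transmitted.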
Compared to other decompositions such as SVD and QR factorization, the
employed CUR decomposition is considered to be coding-oriented. This is
because CUR uses the original columns of L as the basis to represent other
columns, while SVD and QR factorization use orthogonal basis. General codecs
can easily exploit the redundancy between the original columns, but cannot
exploit the redundancy in the orthogonal basis.
Note that the low-rank component L can be directly converted to a video VL
that basically represents all the background frames. Although the frames of VL
are highly correlated, directly compressing VL via H.264/AVC is still less efficient
than the proposed CUR-based coding scheme. To verify this statement, we compare
the scheme of directly encoding VL with the proposed
CUR-based coding scheme. Here, the first 200 frames of the “Hall” video [20]
are used as the input sequence. We use identical quantization parameters for
both methods and the distortion of the decoded video is measured by averaging
the peak signal-to-noise-ratio (PSNR) of the luminance components. As shown
in Fig. 3, the proposed scheme is more efficient than directly compressing VL
via H.264/AVC, regardless of whether the rank of L is 1, 3, or 5. This is mainly because
the block-based coding scheme is inefficient in exploiting the global redundancy
of the background frames. It can also be seen that the proposed scheme tends
to be less efficient as the rank increases. This is because the size of C increases
while the background of the scene is actually unchanged. Thus, the target rank
r is suggested to be set at the minimum necessary level, i.e., just matching the
number of dominant changes in the background.
[Figure 3 plot: average Y-PSNR (dB) versus bit rate (kbps), comparing coding VL via H.264/AVC with coding VC and storing X, each for rank 1, 3, and 5.]
Figure 3: A comparison of coding the low-rank background via H.264/AVC and the proposed
scheme.
3.2. Encoding the sparse and residual components
To guarantee sufficiently high quality of the decoded video, both the sparse
component S and residual component N have to be compressed. In the proposed
scheme, we first convert S +N to a video denoted as VS , and then VS is directly
encoded. Note that the entries of S + N can be positive or negative, so the
normalization of these entries is necessary before converting S + N to VS . As
a result, the maximum and minimum entries of S + N must be stored, which
constitute “coefficient 2” shown in Fig. 2. Existing block-based codecs such as
H.264/AVC are expected to be efficient in compressing VS , because there are
many near flat blocks in each frame of VS , which become flat after moderate
quantization. Note that any optimization of the base codec at the macro-block level
such as [6] can be employed for further compressing VS , which is not the focus
of this paper.
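The normalization and its inverse can be sketched as follows (a hypothetical helper mirroring the description above; the stored minimum and maximum constitute "Coefficient 2" of Fig. 2):

```python
import numpy as np

def normalize_to_8bit(M):
    # Map the (possibly negative) entries of S + N into [0, 255]
    # for 8-bit coding; (lo, hi) must be stored for denormalization.
    lo, hi = float(M.min()), float(M.max())
    V = np.rint((M - lo) / (hi - lo) * 255.0).astype(np.uint8)
    return V, (lo, hi)

def denormalize(V, coeffs):
    # Inverse mapping applied at the decoder side.
    lo, hi = coeffs
    return V.astype(np.float64) / 255.0 * (hi - lo) + lo
```

The round trip loses at most half a quantization step, (hi − lo)/510, per entry; this quantization is one source of the information loss of the 8-bit conversion.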
Fig. 4 shows the comparisons of compressing VS when the threshold of
SS GoDec, denoted as τ , changes. These comparisons indicate that the coding
efficiency of S + N is not sensitive to τ . Thus, we empirically set τ to 10 when
using SS GoDec in the proposed scheme. In addition, the comparison between
compressing the original video and compressing VS shows that the performance
gain of the proposed scheme becomes saturated at high bit rates or very high
PSNR ranges. This is mainly because we normalize S +N into a value range of
[0, 255] so as to facilitate the subsequent 8-bit H.264/AVC coding, which causes
information loss.
[Figure 4 plot: average Y-PSNR (dB) versus bit rate (kbps) for coding VS with SS GoDec thresholds 6, 12, and 18, and for coding the original video.]
Figure 4: A comparison of coding VS and the original video via H.264/AVC when the threshold
of SS GoDec changes.
3.3. Comparing with background subtraction
In this subsection, we use an example to illustrate the advantages of LRSD
in terms of background modeling and background subtraction. As we know,
videos captured by a stationary camera often contain large background changes
due to the automatic exposure control of the camera or the change of lighting
condition, which will deteriorate the coding performance of the background
subtraction based schemes [3, 6]. In contrast, the proposed LRSD can be viewed
as a generalized BGS approach that is adaptive to large background changes,
especially those caused by illumination changes.
When dealing with illumination changes, pixel-based background modeling
methods, such as Gaussian mixture model (GMM) [21] and segment-and-weight
based running average [7], tend to produce pixel-wise changes in the back-
ground. For illustration, we show in Fig. 5 some representative background
frames extracted by LRSD and GMM on the test sequence “Lobby” [20]. The
LRSD background frames shown in Fig. 5 (a) and (b) are obtained by applying
SS GoDec with r = 2, τ = 10, and tmax = 20. Two representative background
frames obtained by GMM are shown in Fig. 5 (c) and (d), which have unpleas-
ant artifacts. We observe that GMM, as well as other pixel-based background
modeling methods, often produces a sequence of background frames whose tem-
poral redundancy is difficult to remove. On the contrary, the redundancy
between the LRSD background frames can be efficiently removed by the existing
codec. In addition, the linear combination of LRSD background frames can
represent the background of individual frames well.
(a) LRSD BG 1 (b) LRSD BG 2 (c) GMM BG 1 (d) GMM BG 2
Figure 5: The background frames extracted from the “Lobby” video: (a) and (b) are the
background frames extracted by SS GoDec; (c) and (d) are the background frames (mean
values) extracted by GMM [21] at the 195-th and 200-th frames, respectively.
Another advantage of LRSD in representing the background lies in its ability
to indicate significant global changes, which could be used for scene change
detection. In the “Lobby” video, the background illumination keeps changing
from the 125-th frame to the 155-th frame. Such changes are exactly reflected
in the low-rank coefficients X shown in Fig. 6. That is, the coefficient for back-
ground 1 changes from around 1 to around 0 and the coefficient for background 2
changes inversely when the scene becomes dark. Therefore, a significant change
of the low-rank coefficient can be used as an indicator of scene change, which
requires much less computations compared to the reconstructed frame based
method [10]. Since scene change detection is beyond the scope of this
paper, we do not discuss it in detail.
We further compare the difference frames generated by LRSD and the con-
ventional way of background subtraction that uses the same set of backgrounds
as LRSD and chooses the best one for each frame. Fig. 7 shows the energy of
[Figure 6 plot: low-rank coefficients (0 to 1) versus frame index, showing the coefficients for background 1 and background 2.]
Figure 6: The low-rank coefficients for the “Lobby” video.
Figure 7: The residual energies for the “Lobby” video.
individual difference frames, which is measured by the sum of squares of
individual entries. It can be seen that LRSD produces smaller residual energy at the
places with large illumination changes. This is mainly because the coefficients
X can combine the background frames in C well to produce a better background for
each individual frame.
4. Proposed Incremental LRSD
Although the proposed LRSD based video coding scheme is able to improve
the existing block-based codecs in compressing videos captured by fixed cameras
(see Section 5), there still exist several problems when applying it to practical
video surveillance systems. First, SS GoDec requires an input matrix to be
fully stored in the memory, which is unsuitable for high-resolution or long-time
video processing. Second, the complexity of the LRSD algorithm is relatively
high for real-time processing in existing surveillance cameras, which usually
have limited computation and storage resources. In addition, accumulating video
frames for the LRSD process at the camera side also causes additional delay in
delivering the video content.
The problems described above mainly stem from the inability of SS GoDec
to process the matrix incrementally. Therefore, in this section, we propose
two extensions of the SS GoDec to solve these problems. The first extension
called incremental sparse decomposition is able to compute the low-rank coeffi-
cients and recover the sparse component of new matrix columns using a given
low-rank structure, and the second extension called incremental low-rank recov-
ery is able to recover the low-rank structure incrementally without storing the
entire matrix in the memory. To the best of our knowledge, such an incremen-
tal LRSD algorithm with controllable rank has not been reported in the literature
before.
4.1. Incremental Sparse Decomposition with Given Low-Rank Structure
For surveillance videos, the scene background or the low-rank component
usually does not change frequently. Thus, the previously or offline obtained
low-rank structure can be reused for the L+S decomposition of the new video
frames. Also, the existing background modeling based coding schemes [3, 4, 10]
usually use the first few frames of a group of pictures (GOP) to extract the
background. This motivates us to propose an extension of the SS GoDec that is
able to perform LRSD on new matrix columns with a given low-rank structure,
which is called incremental sparse decomposition in this paper.
In particular, the decomposition now becomes
A′ = CX ′ + S′ + N ′, (8)
where the m×r matrix C is the given low-rank structure, A′ is an m×n′ matrix
representing the newly stacked video frames to be decomposed, X ′ is an r × n′
matrix to store the low-rank coefficients, and S′ and N ′ are the corresponding
sparse and residual components respectively. Similar to the original GoDec, we
can recover X′ and S′ by solving the following subproblems in each iteration:

X′_t = argmin_{X′} ‖A′ − CX′ − S′_{t−1}‖_F^2,
S′_t = argmin_{‖S′‖_0 ≤ k} ‖A′ − CX′_t − S′‖_F^2,   (9)
where the subscript t represents the t-th iteration. Since the first part of (9) can
be seen as a least squares problem with given A′, C, and S′_{t−1}, the coefficients
X′_t can be directly computed by

X′_t = C^+ (A′ − S′_{t−1}),   (10)
where C^+ is the pseudo-inverse of C. Meanwhile, as pointed out by Zhou and
Tao [18], we can solve S′_t by performing soft thresholding on A′ − CX′_t, which
is the same as in SS GoDec.
As described in Section 2, the convergence of SS GoDec is guaranteed by the
combination of LRA and soft thresholding in each iteration. Note that the LRA
step actually projects A − S_{t−1} onto a low-rank subspace that approaches
span{C}, where span{C} represents the linear subspace spanned by the columns
of C. Here, the operation in (10) can be seen as the projection of A′ − S′_{t−1}
onto span{C}. Thus, the combination of the subspace projection (10) and soft
thresholding also guarantees the convergence.
Note that, when the target rank r = 1, the proposed incremental sparse
decomposition is significantly simplified. Its complexity is comparable to that
of the typical background subtraction process since the subspace projection (10)
is reduced to computing an inner product and it usually converges within 5
iterations.
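The iteration of Eqs. (9) and (10) can be sketched as follows (an illustrative NumPy version; C^+ is computed once per low-rank structure):

```python
import numpy as np

def incremental_sparse_decomposition(A_new, C, tau, t_max=20):
    # Decompose new columns A' against a fixed low-rank structure C:
    # X' = C^+ (A' - S')  (Eq. (10), projection onto span{C}),
    # followed by soft thresholding of A' - C X'  (Eq. (6)).
    C_pinv = np.linalg.pinv(C)
    S = np.zeros_like(A_new)
    for _ in range(t_max):
        X = C_pinv @ (A_new - S)
        R = A_new - C @ X
        S_new = np.sign(R) * np.maximum(np.abs(R) - tau, 0.0)
        converged = np.allclose(S_new, S)
        S = S_new
        if converged:
            break
    return X, S
```

For r = 1, the projection reduces to an inner product scaled by 1/‖C‖², matching the simplification noted above.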
4.2. Incremental Recovery of the Low-Rank Structure
The proposed incremental sparse decomposition requires a given low-rank
structure, which could be pre-computed offline or obtained based on the first few
frames. Typically, using a large number of frames or the entire GOP can recover
a better low-rank structure. However, in practice, it is difficult to compute a
global low-rank structure since the input matrix is too large to be stored in the
memory, especially for long-time or high-resolution videos. Here we propose
another extension of SS GoDec to incrementally recover the global low-rank
structure. Our basic idea is to partition the input matrix into several sub-
matrices to compute individual local low-rank components and then convert
these local low-rank components into a global one.
In particular, inspired by the divide-factor-combine (DFC) matrix comple-
tion framework [22], we propose to compute the global low-rank structure C
using the following five steps:
1. Partition the input m × n matrix A into p sub-matrices of identical size
m × n/p, i.e.,
A = [A1, A2, . . . , Ap]. (11)
2. Perform SS GoDec on each sub-matrix, so that
Ai = Li + Si + Ni, 1 ≤ i ≤ p, (12)
where Li is the low-rank component, Si is the sparse component, and Ni
is the residual component.
3. Extract the low-rank structure of each sub-matrix by the CUR decompo-
sition, i.e.,
Li = CiUiRi, 1 ≤ i ≤ p; (13)
4. Perform SS GoDec on the combined low-rank structures, so that
[C1, C2, . . . , Cp] = Lc + Sc + Nc, (14)
where Lc is the low-rank component, Sc is the sparse component, and Nc
is the residual component.
5. Extract the low-rank structure of Lc by CUR decomposition, i.e.,
Lc = CUR. (15)
Then, the m×r matrix C is considered as the low-rank structure of A. It should
be noted that the low-rank structure recovered by the incremental low-rank
recovery is less accurate than that of the original SS GoDec. This is because
SS GoDec achieves global optimalities of S_t and L_t in each iteration, while
the incremental low-rank recovery achieves local optimalities of L_i and S_i in
each partition. In other words, low-rank structure recovered incrementally is an
approximation to that recovered globally. Numerical evaluations of the accuracy
loss are reported in Section 4.3, which indicate that such loss is acceptable in
practical applications.
From the above steps, we can see that the proposed incremental method
requires a memory of at most max{O(mn/p), O(mrp)}, which is much smaller
than the memory requirement of O(mn) in SS GoDec for some typical values of
p and r. Thus, the proposed incremental method is more suitable for large-scale
LRSD. Note that using some hierarchical implementation could further reduce
the memory requirement.
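A quick back-of-the-envelope calculation makes the memory comparison concrete (the frame size and parameter values below are illustrative assumptions, not the paper's exact settings):

```python
# Entries that must be held in memory at once (illustrative values:
# a QCIF frame has 176*144 = 25344 pixels; p and r follow typical
# settings from the experiments).
m = 176 * 144          # pixels per frame (rows of A)
n = 350                # number of frames (columns of A)
p, r = 10, 2           # partitions and target rank

ss_godec = m * n                          # O(mn): whole matrix at once
ilrsd = max(m * n // p, m * r * p)        # max{O(mn/p), O(mrp)}
print(ss_godec, ilrsd)                    # incremental cost is far smaller
```

For these values the incremental requirement is dominated by the mn/p term and is an order of magnitude below that of SS GoDec.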
4.3. Numerical Evaluation of the Proposed ILRSD
Since the proposed incremental LRSD (ILRSD) algorithm is an approximation
to the original SS GoDec, in this subsection we numerically evaluate the
effectiveness of the proposed incremental scheme by applying it to the low-rank
and sparse decomposition of large matrices. For such matrices, the proposed
incremental LRSD scheme requires a two-pass process: the incremental low-rank
recovery pass (see Section 4.2) and the incremental sparse decomposition pass
(see Section 4.1).
In particular, we generate m × m matrices of the form A = L + S + N.
For each matrix, the low-rank component is the product of two small random
matrices, i.e., L = BD, where both the m × r matrix B and the r × m matrix D
are standard Gaussian matrices. The sparse component S is an m × m matrix
with only 0.2m² non-zero entries that are drawn from a standard Gaussian
distribution. The residual component N is an m × m Gaussian matrix with
zero mean and a standard deviation of 10⁻³. The threshold parameter
τ is set to 0.01 and the maximum number of iterations is set to 10 for both
ILRSD and SS GoDec. In ILRSD, to obtain the global low-rank structure C,
each input matrix is partitioned into 10 sub-matrices.
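The synthetic matrices described above can be generated as follows (a sketch; the random seed and function names are our own choices):

```python
import numpy as np

def make_test_matrix(m, r, rng):
    """Generate A = L + S + N as described: L = B D with standard
    Gaussian factors, S with 0.2*m^2 Gaussian non-zero entries, and
    N Gaussian with zero mean and standard deviation 1e-3."""
    L = rng.standard_normal((m, r)) @ rng.standard_normal((r, m))
    S = np.zeros((m, m))
    idx = rng.choice(m * m, size=int(0.2 * m * m), replace=False)
    S.flat[idx] = rng.standard_normal(idx.size)
    N = rng.normal(0.0, 1e-3, size=(m, m))
    return L + S + N, L, S, N

def relative_error(est, ref):
    # The error metric reported in Table 1, e.g. ||L_hat - L||_F / ||L||_F.
    return np.linalg.norm(est - ref) / np.linalg.norm(ref)
```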
Table 1 shows the simulation results, where L̂ and Ŝ denote the estimated
low-rank and sparse components, respectively. Each value in the table is the
average over 100 simulations. To examine the ability of ILRSD to reduce the
memory cost, these simulations are run on a common PC with 2 GB of memory
and a dual-core 2.67 GHz CPU. It can be seen that SS GoDec
Table 1: Errors and time costs of SS GoDec and the proposed ILRSD algorithm. The results
separated by “/” are obtained by SS GoDec and the proposed scheme respectively.
m      r     ‖L̂ − L‖F / ‖L‖F           ‖Ŝ − S‖F / ‖S‖F           Time (seconds)
1200   10    2.22×10⁻⁴ / 8.08×10⁻⁴     1.03×10⁻² / 1.07×10⁻²     4.87 / 4.68
2400   20    1.56×10⁻⁴ / 4.73×10⁻⁴     1.02×10⁻² / 1.06×10⁻²     20.63 / 30.06
3600   30    1.28×10⁻⁴ / 3.36×10⁻⁴     1.02×10⁻² / 1.05×10⁻²     49.11 / 68.52
4800   40    N.A. / 2.62×10⁻⁴          N.A. / 1.04×10⁻²          N.A. / 129.64
fails due to insufficient memory when the matrix dimension m reaches 4800,
while our proposed scheme can still perform the decomposition.
When the matrix dimension is relatively low, the proposed scheme achieves
a decomposition performance comparable to that of SS GoDec in terms of the
relative errors for L and S. The relative errors decrease as the size of the matrix
increases. From the table, we can also see that the time cost of the proposed
scheme is often higher than that of SS GoDec. This is mainly because the
incremental low-rank recovery pass for obtaining the global low-rank structure C
is currently implemented in a serial manner; it could be greatly accelerated
by processing the individual sub-matrices in parallel.
4.4. ILRSD Based Video Coding
The proposed incremental LRSD (ILRSD) algorithm overcomes the two limitations
of SS GoDec, which makes our LRSD based coding scheme more suitable
for practical use. Moreover, because the two passes of ILRSD are independent
of each other, it also introduces more flexibility when applying ILRSD to
different video coding scenarios. For example, for offline encoding or transcoding
of stored surveillance videos, ILRSD can be directly used to replace the LRSD
step in the proposed coding scheme described in Section 3. That is, both the
incremental low-rank recovery pass and the incremental sparse decomposition
pass are applied to the entire video to compute the low-rank structure, low-rank
coefficients, and sparse components, which can be seen as a large-scale exten-
sion of the proposed LRSD based coding scheme. For real-time remote video
surveillance systems with fixed cameras, we can extract the global background
in advance using the incremental low-rank recovery pass of the proposed ILRSD
algorithm. Then, during the real-time operation, only the sparse and residual
components of the current frames need to be encoded and transmitted. In this
way, more bits can be saved and the bandwidth requirement can be reduced.
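In the real-time scenario above, once the global background structure C is fixed, each incoming frame can be split on the fly. The hard-thresholding rule below is our simplified stand-in for the incremental sparse decomposition pass of Section 4.1 (function name and thresholding details are assumptions):

```python
import numpy as np

def encode_frame(x, C, tau):
    """Split a vectorized frame x against a fixed m x r background
    structure C into low-rank coefficients, a sparse foreground S
    (entries above tau), and a small residual N. Only S and N (plus
    the tiny coefficient vector) would need to be transmitted."""
    coeff, *_ = np.linalg.lstsq(C, x, rcond=None)  # low-rank coefficients
    diff = x - C @ coeff                           # background-subtracted
    S = np.where(np.abs(diff) > tau, diff, 0.0)    # sparse component
    N = diff - S                                   # residual component
    return coeff, S, N
```

The per-frame cost is a small least-squares solve plus a threshold, which is consistent with the lightweight timing reported in Section 5.2.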
5. Experimental Results
In this section, we conduct experiments to evaluate the performance of the
proposed schemes for compressing videos captured by fixed cameras. First, we
compare the proposed schemes with H.264/AVC [23] and HEVC [2] to show that
our schemes can improve the coding efficiency of the existing standard codecs.
Then, we compare the proposed ILRSD based scheme with the state-of-the-art
background modeling based schemes. All the experiments are conducted on
a common PC with an Intel(R) Core(TM) i5-2400 CPU and a memory of 4
GB. The incremental low-rank recovery and sparse decomposition algorithms
are implemented in MATLAB.
5.1. Comparison with standard codecs
We use the H.264/AVC reference software JM18.4 1 with High Profile and
the HEVC reference software HM12.0 2. JM is configured according to the ITU-
T recommendations [24], and HM is configured according to its Main Profile in
low-delay mode. It is reported in [3] that H.264/AVC can achieve higher coding
efficiency in surveillance video coding by using a special configuration. Thus, we
also include this fine-tuned H.264/AVC for comparison. Note that no fine-tuned
configuration is used in the proposed coding schemes.
At this stage, four representative low-resolution surveillance videos are used,
named “Hall”, “Escalator”, “Campus”, and “Lobby” 3, shown in Fig. 8.
1. http://iphome.hhi.de/suehring/tml/download/
2. http://hevc.hhi.fraunhofer.de/
3. http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html
For simplicity, we only use the first 400 frames of “Lobby” and the first 350
frames of the other videos. Apart from the general surveillance video “Hall”,
which contains a stationary background and common moving objects, each test
video has its own characteristics. In the “Escalator” video, there are several
escalators that cause periodic perturbations. The “Campus” video has several
trees that cause irregular perturbations, and the “Lobby” video has a sharp
change of brightness caused by turning off the lights.
[Figure 8: sample frames. Panels: (a) Hall, (b) Escalator, (c) Campus, (d) Lobby,
(e) Silent (CIF), (f) Bridge close (CIF), (g) PETS 2007 (SD).]

Figure 8: Sample frames of the seven test video sequences. Top row: four low-resolution
videos. Bottom row: three relatively high-resolution videos.
In this comparison, for SS GoDec and the proposed ILRSD, we empirically
set the threshold τ to 10 and the maximum number of iterations to 20. The
target rank is set to 2 for the “Lobby” video due to the illumination change,
and 1 for the other videos. Note that if the GOP information is available, we
can use a target rank of 1 in each GOP by assuming that the background is
approximately unchanged within the GOP. For ILRSD, each sub-matrix is constructed
using 25 video frames. For VC and VS defined in Section 3, considering that
VC , representing the essential background information for all the background
frames, is more important than VS in terms of overall reconstruction quality,
we set the quantization parameter (QP) for encoding VC to half of that for
encoding VS in the proposed schemes.
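The sub-matrices used by ILRSD are built by vectorizing frames into columns; a sketch with the 25-frames-per-sub-matrix setting used here (the function name is ours):

```python
import numpy as np

def frames_to_submatrices(frames, frames_per_block=25):
    """Vectorize each frame into a column of A (m = pixels per frame,
    n = number of frames) and cut A into the sub-matrices A_i that
    ILRSD processes; 25 frames per sub-matrix, as in the experiments."""
    A = np.stack([f.reshape(-1) for f in frames], axis=1)
    return [A[:, i:i + frames_per_block]
            for i in range(0, A.shape[1], frames_per_block)]
```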
Fig. 9 shows the PSNR performance of the different coding schemes at various
rates, obtained by varying the QP of VS from 10 to 48. It is shown in Fig. 9 that
[Figure 9: four rate-distortion plots, bitrate (kbps) vs. average Y-PSNR (dB), comparing
the proposed LRSD based H.264/AVC, the proposed ILRSD based H.264/AVC, H.264/AVC, and
the fine-tuned H.264/AVC. Panels: (a) Hall, (b) Escalator, (c) Campus, (d) Lobby.]

Figure 9: PSNR performance of compressing the four low-resolution videos by the proposed
LRSD based and ILRSD based schemes using H.264/AVC as the base codec, compared with
H.264/AVC with a special configuration reported in [3].
for the four test videos, the proposed coding schemes can significantly improve
the base codec, H.264/AVC. Using the proposed schemes, far fewer bits are
required to achieve a sufficient quality of the reconstructed frames, e.g., around
40 dB. Since we use adaptive quantization in the LRSD based scheme and uniform
quantization in the ILRSD based scheme, the proposed LRSD based scheme
generally performs better than the proposed ILRSD based scheme.
In general, fewer moving objects in the video lead to higher coding performance
for our schemes. For example, in the “Lobby” video, the proposed LRSD based
scheme achieves a significant PSNR gain of up to 5 dB compared to H.264/AVC.
When there are more moving objects, our schemes also work well, but the PSNR
gain becomes smaller. For example, the proposed LRSD based scheme achieves
PSNR gains up to 3 dB and 2.5 dB for the “Hall” and “Escalator” videos,
respectively. When there are many irregular background perturbations, the
sparse assumption of the foreground might not hold. But even in such a case,
our schemes can still obtain small PSNR gains compared to H.264/AVC. An
example is shown in Fig. 9 (c). That is, for the “Campus” video, the LRSD
based scheme achieves a PSNR gain of up to 1 dB. It can also be seen from
Fig. 9 that for the four surveillance videos, the performance of the proposed
ILRSD based scheme is in general similar to that of the LRSD based scheme,
which verifies the effectiveness of the proposed incremental scheme. Compared
with the fine-tuned H.264/AVC, our proposed schemes without the fine tuning
of H.264/AVC can still achieve better performance. Fig. 10 shows the PSNR
improvement of using ILRSD with the latest standard video codec HEVC, where
now HEVC is the base codec for our proposed scheme. Note that Fig. 10 (a)
shows the general case and Fig. 10 (b) shows the worst case.
[Figure 10: rate-distortion plots, bitrate (kbps) vs. average Y-PSNR (dB), comparing the
proposed ILRSD based HEVC with HEVC. Panels: (a) Hall, (b) Campus.]

Figure 10: PSNR performance of the proposed ILRSD based scheme using HEVC as the base
codec.
We would like to point out that, compared with the base codec, the performance
of our proposed schemes becomes saturated or even worse at high bit
rates. This performance cap is mainly due to two reasons. First, the decomposed
low-rank, sparse, and residual components are in floating-point format.
Rounding them into integers causes some information loss. Second, we normalize
the sparse and residual components into a value range of [0, 255] so as to
facilitate the subsequent 8-bit standard codec. This normalization process also
causes information loss. This is why the coding performance of our proposed
ILRSD scheme tends to flatten at relatively high bit rates, i.e., in the high
PSNR range. We argue that this limitation can be greatly alleviated by using
a 9-bit codec such as the one in [3]. In addition, for practical applications, a
PSNR value of 38–40 dB, which corresponds to quite good visual quality, is
usually sufficient.
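The normalization step mentioned above can be sketched as follows (our own minimal min-max mapping; the paper does not specify the exact transform):

```python
import numpy as np

def to_8bit(X):
    """Map a floating-point component into [0, 255] so an 8-bit codec
    can handle it; the rounding here is exactly where information is
    lost, contributing to the high-bit-rate saturation."""
    lo, hi = float(X.min()), float(X.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    return np.round((X - lo) / scale).astype(np.uint8), lo, scale

def from_8bit(Q, lo, scale):
    # Inverse mapping; the reconstruction error is bounded by scale / 2.
    return Q.astype(np.float64) * scale + lo
```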
5.2. Comparison with state-of-the-art
Here, we compare our proposed ILRSD scheme with two methods: the
representative BGS based coding scheme [3] and the state-of-the-art background
prediction based coding scheme [10]. We use H.264/AVC with High Profile as the
base codec. In particular, we compare the following five methods: “ILRSD
based H.264/AVC”, “BGS based H.264/AVC”, “BG as long-term reference”,
“McFIS based H.264/AVC”, and H.264/AVC. “ILRSD based H.264/AVC” is
our proposed ILRSD based coding method with rank equal to 1. “BGS based
H.264/AVC” is the method of [3] but using the same background and the same
coding strategy as “ILRSD based H.264/AVC”, for fair comparison. In other
words, the only difference is that “BGS based H.264/AVC” uses background sub-
traction to obtain the residual sequence while our “ILRSD based H.264/AVC”
uses the ILRSD framework. “McFIS based H.264/AVC” is the method of [10]
(using the first 25 frames to generate the McFIS frame), and “BG as long-term
reference” is almost the same as [10] except using the background generated
by ILRSD as the long-term reference frame. This is to show the quality of the
generated background by ILRSD.
To show the performance on long and relatively high-resolution videos, here
we use the following four video sequences for experiments: the 300-frame CIF
“Silent” video, the 1000-frame QCIF “Hall” video, the 1000-frame CIF “Bridge close”
video, and the 1000-frame SD “PETS 2007” video. Note that the “PETS 2007” video
is chosen from the first view of “Dataset S6” in the PETS 2007 dataset 4, which
has a relatively high resolution and a high density of moving objects. The “Silent”
and “Bridge close” videos are standard test sequences 5. Sample frames of these
test sequences can be found in Fig. 8.
[Figure 11: four rate-distortion plots, bitrate (kbps) vs. average Y-PSNR (dB), comparing
ILRSD based H.264/AVC, BGS based H.264/AVC, BG as long-term reference, McFIS based
H.264/AVC, and H.264/AVC. Panels: (a) Hall (QCIF), (b) Bridge close (CIF), (c) PETS 2007
(SD), (d) Silent (CIF).]

Figure 11: The PSNR comparison of our proposed ILRSD scheme with the state-of-the-art
methods.
4. http://www.cvg.rdg.ac.uk/PETS2007/data.html
5. http://media.xiph.org/video/derf/
Fig. 11 shows the PSNR comparisons with the state-of-the-art methods. It
can be seen that the proposed ILRSD based scheme achieves the best
performance at relatively low bit rates. For a target average PSNR of 38 dB,
which is quite reasonable for practical applications, the proposed ILRSD based
scheme, the BGS based scheme, and the McFIS based scheme achieve average
bit-rate reductions of 27.8%, 15.1%, and 8.4%, respectively, on the four test
videos, compared to the base codec, H.264/AVC.
From Fig. 11, we can see that the proposed ILRSD based coding scheme
always outperforms the BGS based coding scheme. Compared with the McFIS
based scheme, the proposed ILRSD based scheme performs better in most
cases except for the “Silent” video at relatively high bit rates. This is because,
in the “Silent” video, the foreground person keeps moving within a limited range,
which makes it hard to extract a clean background. In such a case, a background
prediction based approach seems more efficient.
With a MATLAB implementation, the proposed incremental sparse decomposition
requires about 5 ms for one QCIF frame and about 30 ms for a CIF
frame. Since a MATLAB implementation is generally slower than a C/C++
implementation, this result suggests that the proposed incremental sparse
decomposition part is quite lightweight and could be implemented in practical
surveillance systems. For the 300-frame “Silent” video, the proposed incremental
low-rank recovery (using 30 frames to construct each sub-matrix) requires about
11.5 seconds to extract the low-rank structure. Thus, such complex processing
needs to be done offline or only at the beginning of encoding a video sequence.
5.3. A brief analysis of the quantization parameters
As reported in [25], a careful adjustment of the quantization parameter
(QP) for encoding VC and VS leads to a better encoding performance of the
input video. However, the optimal combination of QPs has to be found by pre-
encoding each component separately, which will cause unnecessary time delay
in practical applications. To avoid such pre-processing, we present a brief
comparison between different combinations of QPs and give an empirical suggestion
for QP selection in practice. The representative video “PETS 2007” is used in
this comparison.
[Figure 12: two rate-distortion plots, bitrate (kbps) vs. Y-PSNR (dB), for different QP
combinations against H.264. Panels: (a) the proposed method with QPL set to QPS, QPS/1.5,
QPS/2, QPS/3, and 0; (b) the BGS based method with QPBG set to QPFG, QPFG/1.5, QPFG/2,
QPFG/3, and 0.]

Figure 12: Comparisons between using different combinations of QPs on the “PETS 2007”
video.
Fig. 12 compares the coding efficiency of different combinations of QPs.
For the proposed method, we use QPL to denote the QP for encoding VC and
QPS to denote the QP for encoding VS. For the BGS based method, we use
QPBG to denote the QP for encoding the background and QPFG to denote
the QP for encoding the residuals. It can be seen that, for both the proposed
method and the BGS based method, different combinations of QPs yield similar
coding efficiency, except for the cases QPL = QPS and QPBG = QPFG. Thus,
for the proposed method, QPL = QPS/2 is a reasonable choice for good coding
performance. By comparing Fig. 12 (a) and (b), we can see that no matter which
QP combination is used, our proposed method always outperforms the BGS
based method.
6. Conclusion
The emerging theory of LRSD provides efficient algorithms for the separation
of the background and the moving objects in videos captured by fixed cameras.
In this paper, we have proposed a scheme that can efficiently encode the com-
ponents generated by LRSD and achieve good overall compression efficiency.
Moreover, we have also proposed an incremental LRSD algorithm that greatly
reduces the memory requirement and the computational complexity of the ex-
isting LRSD algorithm so that practical large-scale surveillance videos can be
handled. Numerous experiments on different test sequences demonstrated that
the proposed coding schemes can significantly improve the existing standard
codecs, H.264/AVC and HEVC, at relatively low bit rates and outperform the
state-of-the-art background modeling based coding schemes.
For future work, one extension is to use a 9-bit codec such as the one in [3],
instead of the 8-bit H.264/AVC in our current implementation, to encode the
residual videos so as to avoid the information loss at the beginning. In addition,
exploiting the coding modes at the macroblock level [26] may also boost the
performance of the proposed method.
Acknowledgement
This work is partially supported by MoE AcRF Tier 2 Grant, Singapore,
Grant No.: T208B1218, the Major State Basic Research Development Program
of China (973 Program, No. 2013CB329402), the National Natural Science
Foundation of China (Nos. 61227004, 61372131, 11204014), the 111 Project
(No. B07048), the Fundamental Research Funds for the Central Universities
(No. K5051399020), and the Research Fund for the Doctoral Program of Higher
Education of China (No. 20130203120009).
References
[1] T. Wiegand, G. J. Sullivan, G. Bjontegaard, A. Luthra, Overview of the
H.264/AVC video coding standard, IEEE Trans. Circuits Syst. Video Tech-
nol. 13 (7) (2003) 560–576. doi:10.1109/TCSVT.2003.815165.
[2] G. Sullivan, J. Ohm, W.-J. Han, T. Wiegand, Overview of the high effi-
ciency video coding (HEVC) standard, IEEE Trans. Circuits Syst. Video
Technol. 22 (12) (2012) 1649–1668. doi:10.1109/TCSVT.2012.2221191.
[3] X. Zhang, L. Liang, Q. Huang, Y. Liu, T. Huang, W. Gao, An efficient
coding scheme for surveillance videos captured by stationary cameras, in:
Proc. Visual Commun. Image Process. (VCIP), Vol. 7744, SPIE, 2010.
doi:10.1117/12.863522.
[4] X. Zhang, L. Liang, Q. Huang, T. Huang, W. Gao, A background model
based method for transcoding surveillance videos captured by station-
ary camera, in: Picture Coding Symposium (PCS), 2010, pp. 78–81.
doi:10.1109/PCS.2010.5702583.
[5] W. Gao, Y. Tian, T. Huang, S. Ma, X. Zhang, IEEE 1857 standard empow-
ering smart video surveillance systems, IEEE Intell. Syst. PP (99) (2013)
1–1. doi:10.1109/MIS.2013.101.
[6] X. Zhang, Y. Tian, L. Liang, T. Huang, W. Gao, Macro-block-level se-
lective background difference coding for surveillance video, in: Int’l Conf.
Multimedia and Expo (ICME), IEEE, 2012, pp. 1067–1072.
[7] X. Zhang, Y. Tian, T. Huang, W. Gao, Low-complexity and high-
efficiency background modeling for surveillance video coding, in: Proc.
Visual Commun. and Image Process. (VCIP), IEEE, 2012, pp. 1–6.
doi:10.1109/VCIP.2012.6410796.
[8] M. Paul, W. Lin, C. Lau, B.-S. Lee, Video coding using the most common
frame in scene, in: Int’l Conf. Acoust., Speech, Signal Process. (ICASSP),
IEEE, 2010, pp. 734–737. doi:10.1109/ICASSP.2010.5495033.
[9] K. Misra, J. Zhao, A. Segall, McFIS in hierarchical bipredictive
pictures-based video coding for referencing the stable area in a scene,
in: Int’l Conf. Image Process. (ICIP), IEEE, 2011, pp. 3521–3524.
doi:10.1109/ICIP.2011.6116473.
[10] M. Paul, W. Lin, C.-T. Lau, B.-S. Lee, Explore and model better I-frames
for video coding, IEEE Trans. Circuits Syst. Video Technol. 21 (9) (2011)
1242–1254. doi:10.1109/TCSVT.2011.2138750.
[11] J. Cai, E. J. Candes, Z. Shen, A singular value thresholding algorithm for
matrix completion, SIAM J. Optim. 20 (4) (2010) 1956–1982.
[12] Z. Lin, M. Chen, Y. Ma, The augmented lagrange multiplier method
for exact recovery of corrupted low-rank matrices, arXiv preprint
arXiv:1009.5055.
[13] E. J. Candes, X. Li, Y. Ma, J. Wright, Robust principal component anal-
ysis?, J. ACM 58 (3) (2011) 11:1–11:37.
[14] T. Zhou, D. Tao, GoDec: Randomized low-rank & sparse matrix decom-
position in noisy case, in: Int’l Conf. Mach. Learning (ICML), 2011.
[15] C. Chen, J. Cai, W. Lin, G. Shi, Surveillance video coding via low-rank
and sparse decomposition, in: ACM Multimedia, 2012, pp. 713–716.
[16] K. Bredies, D. A. Lorenz, Iterated hard shrinkage for minimization prob-
lems with sparsity constraints, SIAM J. Sci. Comput. 30 (2006) 657–683.
[17] C. Eckart, G. Young, The approximation of one matrix by another of lower
rank, Psychometrika (1936) 211–218.
[18] T. Zhou, D. Tao, Greedy bilateral sketch, completion & smoothing, in: Int’l
Conf. Artificial Intell. Statistics, 2013.
[19] P. Drineas, M. W. Mahoney, S. Muthukrishnan, Relative-error CUR matrix
decompositions, SIAM J. Matrix Anal. Appl. 30 (2008) 844–881.
[20] L. Li, W. Huang, I. Y.-H. Gu, Q. Tian, Statistical modeling of complex
backgrounds for foreground object detection, IEEE Trans. Image Process.
13 (11) (2004) 1459–1472. doi:10.1109/TIP.2004.836169.
[21] T. Bouwmans, F. E. Baf, B. Vachon, Background modeling using mixture of
gaussians for foreground detection - a survey, Recent Patents on Computer
Science 1 (3) (2008) 219–237.
[22] L. W. Mackey, A. S. Talwalkar, M. I. Jordan, Divide-and-conquer ma-
trix factorization, in: Advances in Neural Information Processing Systems
(NIPS) 24, 2011, pp. 1134–1142.
[23] G. J. Sullivan, P. Topiwala, A. Luthra, The H.264/AVC advanced video
coding standard: Overview and introduction to the fidelity range exten-
sions, in: SPIE conference on Applications of Digital Image Processing
XXVII, 2004.
[24] T. Tan, G. Sullivan, T. Wedi, Recommended simulation common conditions
for coding efficiency experiments, ITU-T Q.6/SG16, VCEG-AA10 (2005).
[25] J. Hou, L.-P. Chau, M. Zhang, M. Zhang, N. Magnenat-Thalmann,
Y. He, A highly efficient compression framework for time-varying
3D facial expressions, IEEE Trans. Circuits Syst. Video Technol.
doi:10.1109/TCSVT.2014.2313890.
[26] S. Wang, J. Fu, Y. Lu, S. Li, W. Gao, Content-aware layered compound
video compression, in: IEEE International Symposium on Circuits and
Systems (ISCAS), 2012, pp. 145–148.