Accepted Manuscript Incremental Low-Rank and Sparse Decomposition for Compressing Videos Captured by Fixed Cameras Chongyu Chen, Jianfei Cai, Weisi Lin, Guangming Shi PII: S1047-3203(14)00198-9 DOI: http://dx.doi.org/10.1016/j.jvcir.2014.12.001 Reference: YJVCI 1454 To appear in: J. Vis. Commun. Image R. Received Date: 4 January 2014 Accepted Date: 26 November 2014 Please cite this article as: C. Chen, J. Cai, W. Lin, G. Shi, Incremental Low-Rank and Sparse Decomposition for Compressing Videos Captured by Fixed Cameras, J. Vis. Commun. Image R. (2014), doi: http://dx.doi.org/10.1016/ j.jvcir.2014.12.001 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Incremental Low-Rank and Sparse Decomposition for Compressing Videos Captured by Fixed Cameras

Chongyu Chen^a, Jianfei Cai^b,∗, Weisi Lin^b, Guangming Shi^a

^a School of Electronic Engineering, Xidian University, Xi’an, Shaanxi, 710071 China

^b School of Computer Engineering, Nanyang Technological University, 639798 Singapore

Abstract

Videos captured by stationary cameras usually have a static or gradually changing background. Existing schemes cannot globally exploit this strong temporal redundancy in the background. In this paper, motivated by recent advances in low-rank and sparse decomposition (LRSD), we propose to apply it to the compression of videos captured by fixed cameras. In particular, LRSD is employed to decompose the input video into a low-rank component, representing the background, and a sparse component, representing the moving objects; the two components are encoded by different methods. Moreover, we propose an incremental LRSD (ILRSD) algorithm to reduce the large memory requirement and high computational complexity of the existing LRSD algorithm, which facilitates the processing of large-scale video sequences without much performance loss. Experimental results show that the proposed coding scheme can significantly improve on the existing standard codecs, H.264/AVC and HEVC, and outperform the state-of-the-art background modeling based coding schemes.

Keywords: Video coding, stationary camera, incremental low-rank and sparse

decomposition, CUR decomposition, background subtraction

∗Corresponding author.

Email addresses: [email protected] (Chongyu Chen), [email protected] (Jianfei Cai), [email protected] (Weisi Lin), [email protected] (Guangming Shi)

Preprint submitted to Journal of Visual Communication and Image Representation, December 1, 2014


1. Introduction

In practical surveillance and teleconferencing systems, a large number of videos are captured by stationary cameras, and these videos require efficient compression and fast transmission. A common characteristic of such videos is a static or gradually changing background, which leads to high temporal redundancy. Highly efficient compression is possible if this redundancy can be effectively removed.

Standard video codecs, including H.264/AVC [1] and HEVC [2], are typically block-based. They achieve high efficiency in compressing general videos by exploiting both temporal and spatial redundancies. When compressing videos captured by fixed cameras, further improvements in coding efficiency can be achieved by using specially designed configurations [3]. However, the standard block-based codecs cannot exploit the strong background temporal redundancy in a global manner because they partition each video frame into blocks.

Background subtraction (BGS) based coding techniques [3, 4] have been proposed for compressing videos captured by fixed cameras. Zhang et al. [3] propose a representative background difference based coding scheme: the background is first generated by background modeling, the background is then subtracted from each frame to obtain the difference, and the difference sequence is finally encoded by H.264/AVC. Encoding the background difference has been experimentally shown to be efficient for compressing surveillance videos and has thus been adopted in IEEE 1857 [5], which specifically targets surveillance videos. Further developments of the BGS method have improved the coding strategy of the residual video at the macro-block level [6] and the background modeling part [7]. However, BGS based methods cannot handle global illumination changes in the scene well, since they lack an efficient way to adaptively adjust the background.

Background prediction based methods [8, 9, 10] have also been proposed, which offer another way of utilizing the background. In particular, Paul et al. [10] propose to use the most common frame in a scene (McFIS) as the long-term reference frame, which has proven effective in compressing short video sequences. When the video sequence is long, a McFIS reference frame trained from the first few frames may not be a good one. Thus, the McFIS based scheme is usually combined with a scene change detection technique, which increases the encoding complexity and does not lead to a unified solution.

Recently, a few low-rank and sparse decomposition (LRSD) tools [11, 12, 13, 14] have been developed, which can decompose a surveillance video into a low-rank component and a sparse component, approximately representing the background and the foreground moving objects, respectively (see Fig. 1). We notice that LRSD can be seen as a more general form of BGS, which can better represent the background frames and unify the background modeling and background subtraction processes. In the case of illumination change, the low-rank coefficients can capture the change gracefully. Thus, LRSD can produce a background difference with less energy than typical BGS. Even in the case of a static background, the low-rank coefficients can still help mitigate frame variations to produce a smaller residual.

Therefore, in this paper, we propose to apply LRSD to the compression of videos captured by fixed cameras. In particular, we exploit linear dependency to represent the frames of the background component by very few independent frames, which dramatically removes the temporal redundancy. The remaining part, consisting of the sparse and residual components, can be efficiently compressed by an existing block-based coding scheme. Moreover, noticing that the existing LRSD algorithm cannot handle high-resolution or long videos due to its high memory requirement, we further propose an incremental LRSD (ILRSD) algorithm that can effectively handle large-scale video sequences without much performance loss. Experimental results on standard test sequences show that the proposed LRSD based and ILRSD based coding schemes can significantly improve on the existing video codecs.

The main contributions of this paper are twofold. First, we apply LRSD to compressing videos captured by fixed cameras and develop a coding scheme for the individual LRSD components. To the best of our knowledge, the idea of applying LRSD to video compression has not been reported by others. Second, we significantly improve the existing LRSD algorithm by reducing its memory requirement and computational complexity, which enables LRSD to process large-scale videos and makes it feasible for practical applications.

We would like to point out that a preliminary version of this paper has

been reported in [15]. Compared with the previous conference version, this

paper employs a new LRSD algorithm, presents an insightful analysis of the

success of the proposed coding structure, provides more technical details and

experimental results, and, most importantly, proposes a brand-new incremental

LRSD algorithm for practical video compression.

The rest of this paper is organized as follows. Section 2 briefly introduces the

related theory of LRSD. Section 3 describes the proposed LRSD based video

coding scheme. In Section 4, we propose an incremental LRSD algorithm to

overcome the memory bottleneck of the existing LRSD method. In Section 5,

we conduct numerous experiments on different video sequences to compare the proposed coding schemes with the state-of-the-art alternatives. Finally, Section 6 concludes this paper.

2. Low-Rank and Sparse Decomposition

In matrix theory, linear dependency among the columns of a matrix is referred to as the low-rank property. As a result, if we stack many linearly dependent frames as the columns of a matrix L, then L is exactly low-rank and its rank equals the number of its linearly independent columns. Matrices converted from videos captured by fixed cameras are expected to be low-rank because of the static backgrounds. In this case, perturbations of such videos can be seen as other matrices that are added to L.
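The low-rank property of stacked frames can be illustrated with a small NumPy sketch. The frame size, the number of frames, and the two random "base images" below are illustrative assumptions and are not taken from the paper; the point is only that frames built as linear combinations of two images yield a matrix of rank 2 once they are stacked as columns.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, n = 12, 16, 30                      # toy frame size and frame count (assumed)
base = rng.random((2, H, W))              # two independent "background" images
weights = rng.random((n, 2))              # per-frame mixing weights

# Each frame is a linear combination of the two base images.
frames = np.tensordot(weights, base, axes=1)   # shape (n, H, W)

# Stack every frame as one column of L (m = H*W rows, n columns).
L = frames.reshape(n, H * W).T

# The rank equals the number of linearly independent columns.
rank_L = np.linalg.matrix_rank(L)
print(L.shape, rank_L)
```

Although L has 30 columns, only two of them are needed to span the rest, which is the redundancy that the coding scheme later exploits.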

The emerging theory of robust principal component analysis (RPCA) [11, 12, 13] provides a suitable formulation for the separation of perturbations and background. That is,

A = L + S, (1)

where L is the low-rank matrix described above and S is a sparse matrix. Given a matrix A, L and S can be found by RPCA algorithms such as the augmented Lagrange multiplier (ALM) method [12] and principal component pursuit (PCP) [13], assuming that the low-rank component L is not sparse and the sparse component S is not low-rank. For a matrix constructed by stacking the frames of a video captured by a fixed camera as columns, the assumption of RPCA usually holds: the low-rank component is often the static background and thus is not sparse, while the sparse component often includes moving objects that are linearly independent and thus is not low-rank. An example of the separation of a surveillance video via ALM [12] is shown in Fig. 1, which demonstrates the ability of RPCA algorithms to handle sparse perturbations caused by moving objects.

(a) Original (b) Low-rank (c) Sparse

Figure 1: Different components separated by ALM [12]. (a) The first frame of the original

video sequence “Hall”. (b) The background restored from the first column of L. (c) The

foreground converted from the first column of S.

Existing RPCA algorithms often concentrate on finding more meaningful decompositions. However, their complexity is often uncontrollable due to their automatic and iterative solving procedure, which makes them unsuitable for video coding. Recently, the GoDec algorithm [14] was proposed for separating the low-rank and sparse components of matrices, and it also works well for matrices constructed from videos captured by fixed cameras. The formulation of GoDec can be seen as a noisy version of RPCA, that is,

A = L + S + N, (2)

where the matrix N is the noise component. Besides controllable complexity, GoDec also provides a controllable rank of L and sparsity of S. These characteristics make GoDec a good choice for video coding.

According to the theory of GoDec [14], problem (2) can be solved by minimizing the decomposition error:

$$\min_{L,S} \|A - L - S\|_F^2, \quad \text{s.t.} \ \operatorname{rank}(L) \le r, \ \|S\|_0 \le k, \tag{3}$$

where r is the target rank of L and k is the target sparsity of S. Here the sparsity refers to the number of non-zero entries in S. In the GoDec method, the final components L and S are found by solving the following subproblems iteratively:

$$\begin{cases} L_t = \arg\min_{\operatorname{rank}(L) \le r} \|A - L - S_{t-1}\|_F^2 \\ S_t = \arg\min_{\|S\|_0 \le k} \|A - L_t - S\|_F^2 \end{cases} \tag{4}$$

where the subscript t denotes the t-th iteration. Given the rank of L and the sparsity of S, L_t and S_t are computed efficiently by alternately performing low-rank approximation (LRA) and entry-wise hard thresholding:

$$\begin{cases} L_t = \mathrm{LRA}(A - S_{t-1}, r) \\ S_t = \mathrm{THR}(A - L_t, k) \end{cases} \tag{5}$$

where “LRA(A − S_{t−1}, r)” denotes the computation of the rank-r approximation of A − S_{t−1} and “THR(A − L_t, k)” denotes the entry-wise hard thresholding of A − L_t with parameter k [16], i.e., keeping the k entries of A − L_t that have the largest absolute values. In general, the optimal LRA of a matrix can be computed by the truncated singular value decomposition (SVD) [17]. However, it is shown in [14] that a near-optimal LRA is sufficient for the convergence of GoDec. Thus, the bilateral random projections (BRP) based LRA is employed in GoDec to accelerate the computation.


Algorithm 1 summarizes the procedure of GoDec. In this algorithm, the input parameters ε and t_max are the target relative decomposition error and the maximum number of iterations, respectively. The parameter t_max is used to avoid an infinite loop, because the relative decomposition error might not decrease further after several iterations. It should be pointed out that the convergence of GoDec comes from the combination of LRA and hard thresholding in each iteration [14]. That is, because L_t and S_t are the global optima of their respective subproblems, the decomposition error does not increase from one iteration to the next, and the iterations converge to a local minimum.

Algorithm 1 GoDec

Input: A, r, k, ε, t_max

Output: L, S

Initialize: S_0 ← 0

for t = 1 → t_max do

    L_t = LRA(A − S_{t−1}, r);

    S_t = THR(A − L_t, k);

    if ‖A − L_t − S_t‖_F^2 / ‖A‖_F^2 ≤ ε then

        Break;

    end if

end for
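The alternation in Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the LRA step here uses a truncated SVD, whereas GoDec as described in [14] uses bilateral random projections for speed, and the function names `lra`, `thr`, and `godec` as well as the default values of `eps` and `t_max` are assumptions made for this example.

```python
import numpy as np

def lra(M, r):
    """Rank-r approximation via truncated SVD (stand-in for the BRP-based LRA)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def thr(M, k):
    """Entry-wise hard thresholding: keep the k largest-magnitude entries of M."""
    out = np.zeros_like(M)
    if k > 0:
        idx = np.argpartition(np.abs(M), -k, axis=None)[-k:]
        out.flat[idx] = M.flat[idx]
    return out

def godec(A, r, k, eps=1e-8, t_max=100):
    """Alternate LRA and hard thresholding until the relative error is at most eps."""
    S = np.zeros_like(A)
    for _ in range(t_max):
        L = lra(A - S, r)
        S = thr(A - L, k)
        err = np.linalg.norm(A - L - S, "fro") ** 2 / np.linalg.norm(A, "fro") ** 2
        if err <= eps:
            break
    return L, S
```

Because each step solves its subproblem exactly, the decomposition error cannot increase between iterations, which matches the convergence argument given above.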

When applying GoDec to a matrix with an unknown “low-rank plus sparse” (L+S) structure, it is necessary to determine the target rank and the sparsity in advance. For the target rank r, according to [14], we can set r to be large at the beginning and then reduce it by checking the rank of L_t during the iterations. When the target sparsity is unknown, Zhou and Tao [18] suggest replacing the hard thresholding with soft thresholding, resulting in a more adaptive algorithm called semi-soft (SS) GoDec. The soft thresholding of an m × n matrix X with a threshold τ changes the entries of X as

$$X(i,j) = \begin{cases} \dfrac{X(i,j)}{|X(i,j)|} \left(|X(i,j)| - \tau\right), & |X(i,j)| > \tau \\ 0, & |X(i,j)| \le \tau \end{cases} \tag{6}$$

where i (1 ≤ i ≤ m) and j (1 ≤ j ≤ n) are the row index and the column index, respectively. Similar to GoDec, the convergence of SS GoDec is guaranteed by the combination of LRA and soft thresholding in each iteration. For multimedia applications, soft thresholding is more reasonable than hard thresholding because the sparsity k is usually unknown. As a result, in this research, we choose SS GoDec, rather than the original GoDec, as the basis for the low-rank and sparse decomposition (LRSD).
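The soft thresholding operator of (6) reduces to a one-line NumPy expression. The function name `soft_threshold` and the sample matrix are illustrative assumptions; the value τ = 10 matches the threshold used later in the paper.

```python
import numpy as np

def soft_threshold(X, tau):
    """Shrink every entry toward zero by tau; entries with |x| <= tau become 0."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

X = np.array([[-15.0, 4.0],
              [10.0, 25.0]])
Y = soft_threshold(X, tau=10.0)
print(Y)   # entries: -5, 0 / 0, 15
```

Unlike hard thresholding, no entry count k is needed: the number of surviving entries follows from the data and τ.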

3. LRSD Based Video Coding

In this section, we propose a scheme to improve the coding efficiency of

block-based codecs based on the low-rank and sparse decomposition (LRSD). It

should be noted that our scheme can be combined with any block-based codec

such as H.264/AVC or HEVC.

Given a video sequence of resolution H × W, the proposed scheme consists of the following steps:

1. Stack a set of frames of the video as columns of a matrix A ∈ R^{m×n}, where m = HW and n is the number of frames;

2. Separate the components of A using SS GoDec, so that A = L + S + N, where L is a rank-r matrix, S is a sparse matrix, and N is a dense residual matrix that has many small entries;

3. Compute a low-rank decomposition of L, so that L = CX, where the m × r matrix C contains some columns of L, representing the independent components of the background, and X is an r × n matrix, storing the coefficients to recover each background frame from the independent components;

4. Construct S̃ by normalizing the entries of S + N so that the entries of the dense matrix S̃ range from 0 to 255;

5. Convert S̃ and the normalized C to two video sequences, denoted as V_S and V_C respectively, and compress them separately using a base codec, e.g., H.264/AVC or HEVC.


As shown in Fig. 2 (a), the compressed video sequence consists of four parts: the bit streams of V_C and V_S, the r × n matrix X (“Coefficients 1”), and the denormalization coefficients (“Coefficients 2”) for restoring S + N and C. Fig. 2 (b) shows the corresponding decoding process. Based on the observation that SS GoDec often converges in fewer than 20 iterations, we set the maximum number of iterations to 20 in the proposed scheme. In the rest of this section, we describe the steps of the encoding scheme in detail and explain our choices of parameters with some experimental results. For simplicity, we use H.264/AVC as the example base codec.

[Figure 2, panel (a) encoder. Blocks: Video frames; LRSD; Low-rank component; CUR decomposition; Independent frames; Sparse and residual components; Normalization; Encoding by the base codec. Outputs: Coefficients 1, Bit-stream 1, Bit-stream 2, Coefficients 2.]

[Figure 2, panel (b) decoder. Inputs: Coefficients 1, Bit-stream 1, Bit-stream 2, Coefficients 2. Blocks: Decoding by the base codec; Denormalization; Independent frames; Multiplying; Low-rank component; Sparse and residual components; Adding; Decoded video frames.]

Figure 2: The diagram of the proposed LRSD based video coding scheme.

3.1. Encoding the low-rank component via coding-oriented decomposition

In this paper, we propose to compress L by exploiting its low-rank property. In particular, we factorize the m × n matrix L into two small matrices by computing the CUR decomposition [19] of L. That is,

L = CUR, (7)

where the m × r matrix C consists of r adaptively selected columns of L, the r × n matrix R consists of r adaptively selected rows of L, and the r × r matrix U is the pseudo-inverse of the intersection of C and R. In this way, L is factorized into two small matrices, C and X = UR. Matrix C is used to restore the r independent frames of the background and construct a short video V_C that has only r frames. Note that we normalize C before converting it to V_C. Next, we compress V_C via H.264/AVC and store X directly without compression, since the amount of data in X is small. At the decoder side, C can be recovered by stacking the denormalized frames of V_C as columns. Then, L can be restored by multiplying C and X.
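A minimal NumPy sketch of the factorization L = CX with X = UR is given below. It is illustrative only: the paper selects columns and rows adaptively following [19], whereas here the index lists are simply supplied by the caller, and the function name `cur_factors` is an assumption. For an exactly rank-r matrix whose r × r intersection is non-singular, the product CX reproduces L.

```python
import numpy as np

def cur_factors(L, col_idx, row_idx):
    """Return C (selected columns of L) and X = U R for the CUR form L = C U R."""
    C = L[:, col_idx]                    # m x r: actual background frames
    R = L[row_idx, :]                    # r x n: actual rows of L
    W = L[np.ix_(row_idx, col_idx)]      # r x r intersection of C and R
    U = np.linalg.pinv(W)                # pseudo-inverse of the intersection
    X = U @ R                            # r x n coefficient matrix ("Coefficients 1")
    return C, X

rng = np.random.default_rng(3)
L = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 20))   # exactly rank 3
C, X = cur_factors(L, col_idx=[0, 5, 9], row_idx=[2, 7, 11])
print(np.allclose(C @ X, L, atol=1e-6))
```

Only C (as a short video) and the small matrix X need to be transmitted, and the decoder restores L with a single matrix product.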

Compared with other decompositions such as SVD and QR factorization, the employed CUR decomposition can be considered coding-oriented. This is because CUR uses the original columns of L as the basis to represent the other columns, while SVD and QR factorization use an orthogonal basis. General codecs can easily exploit the redundancy between the original columns, but cannot exploit the redundancy in an orthogonal basis.

Note that the low-rank component L can be directly converted to a video V_L that essentially represents all the background frames. Although the frames of V_L are highly correlated, directly compressing V_L via H.264/AVC is still less efficient than the proposed CUR-based coding scheme. To verify this statement, we compare directly encoding V_L with the proposed CUR-based coding scheme. Here, the first 200 frames of the “Hall” video [20] are used as the input sequence. We use identical quantization parameters for both methods, and the distortion of the decoded video is measured by averaging the peak signal-to-noise ratio (PSNR) of the luminance components. As shown in Fig. 3, the proposed scheme is more efficient than directly compressing V_L via H.264/AVC, regardless of whether the rank of L is 1, 3, or 5. This is mainly because the block-based coding scheme is inefficient in exploiting the global redundancy of the background frames. It can also be seen that the proposed scheme tends to be less efficient as the rank increases. This is because the size of C increases while the background of the scene is actually unchanged. Thus, we suggest setting the target rank r to the minimum necessary level, i.e., just matching the number of dominant changes in the background.

[Figure 3 plot. x-axis: Bit rate (kbps), 0 to 20; y-axis: Average Y-PSNR (dB), 30 to 50. Curves: Code V_L via H.264/AVC (rank = 1, 3, 5); Code V_C and store X (rank = 1, 3, 5).]

Figure 3: A comparison of coding the low-rank background via H.264/AVC and the proposed scheme.

3.2. Encoding the sparse and residual components

To guarantee sufficiently high quality of the decoded video, both the sparse component S and the residual component N have to be compressed. In the proposed scheme, we first convert S + N to a video denoted as V_S, and then encode V_S directly. Note that the entries of S + N can be positive or negative, so these entries must be normalized before S + N is converted to V_S. As a result, the maximum and minimum entries of S + N must be stored, which constitute “Coefficients 2” shown in Fig. 2. Existing block-based codecs such as H.264/AVC are expected to be efficient in compressing V_S, because each frame of V_S contains many nearly flat blocks, which become flat after moderate quantization. Note that any macro-block-level optimization of the base codec, such as [6], can be employed to further compress V_S; this is not the focus of this paper.
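The normalization of S + N and its inverse at the decoder can be sketched as follows. The text states only that the maximum and minimum entries are stored, so the linear min-max mapping to 8-bit values used here is an assumption, as are the function names `normalize_to_uint8` and `denormalize`.

```python
import numpy as np

def normalize_to_uint8(M):
    """Map the entries of M linearly onto [0, 255]; return the image and (min, max)."""
    lo, hi = float(M.min()), float(M.max())
    scale = 255.0 / (hi - lo) if hi > lo else 1.0
    Q = np.round((M - lo) * scale).astype(np.uint8)
    return Q, lo, hi

def denormalize(Q, lo, hi):
    """Invert the mapping using the stored minimum and maximum ("Coefficients 2")."""
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    return Q.astype(np.float64) * scale + lo

rng = np.random.default_rng(4)
M = rng.uniform(-40.0, 60.0, size=(30, 12))      # toy stand-in for S + N
Q, lo, hi = normalize_to_uint8(M)
M_hat = denormalize(Q, lo, hi)
print(np.max(np.abs(M_hat - M)))                 # at most half a quantization step
```

Under this mapping the rounding error is bounded by (max − min) / 510 per entry, which is one way to see why a wide value range of S + N costs precision before the codec is even applied.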

Fig. 4 compares the compression of V_S for different values of the SS GoDec threshold, denoted as τ. These comparisons indicate that the coding efficiency of S + N is not sensitive to τ. Thus, we empirically set τ to 10 when using SS GoDec in the proposed scheme. In addition, the comparison between compressing the original video and compressing V_S shows that the performance gain of the proposed scheme saturates at high bit rates or very high PSNR ranges. This is mainly because we normalize S + N into the value range [0, 255] to facilitate the subsequent 8-bit H.264/AVC coding, which causes information loss.

[Figure 4 plot. x-axis: Bit rate (kbps), 0 to 300; y-axis: Average Y-PSNR (dB), 20 to 50. Curves: Threshold = 6; Threshold = 12; Threshold = 18; Original video.]

Figure 4: A comparison of coding V_S and the original video via H.264/AVC when the threshold of SS GoDec changes.

3.3. Comparing with background subtraction

In this subsection, we use an example to illustrate the advantages of LRSD in terms of background modeling and background subtraction. Videos captured by a stationary camera often contain large background changes due to the automatic exposure control of the camera or changes in lighting conditions, which deteriorate the coding performance of background subtraction based schemes [3, 6]. In contrast, the proposed LRSD can be viewed as a generalized BGS approach that adapts to large background changes, especially those caused by illumination changes.

When dealing with illumination changes, pixel-based background modeling methods, such as the Gaussian mixture model (GMM) [21] and the segment-and-weight based running average [7], tend to produce pixel-wise changes in the background. For illustration, Fig. 5 shows some representative background frames extracted by LRSD and GMM on the test sequence “Lobby” [20]. The LRSD background frames shown in Fig. 5 (a) and (b) are obtained by applying SS GoDec with r = 2, τ = 10, and t_max = 20. Two representative background frames obtained by GMM are shown in Fig. 5 (c) and (d), which exhibit unpleasant artifacts. We observe that GMM, as well as other pixel-based background modeling methods, often produces a sequence of background frames whose temporal redundancy is difficult to remove. In contrast, the redundancy between the LRSD background frames can be efficiently removed by the existing codec. In addition, a linear combination of the LRSD background frames can represent the background of each individual frame well.

(a) LRSD BG 1 (b) LRSD BG 2 (c) GMM BG 1 (d) GMM BG 2

Figure 5: The background frames extracted from the “Lobby” video: (a) and (b) are the

background frames extracted by SS GoDec; (c) and (d) are the background frames (mean

values) extracted by GMM [21] at the 195-th and 200-th frames, respectively.

Another advantage of LRSD in representing the background lies in its ability to indicate significant global changes, which could be used for scene change detection. In the “Lobby” video, the background illumination keeps changing from the 125-th frame to the 155-th frame. Such changes are directly reflected in the low-rank coefficients X shown in Fig. 6. That is, the coefficient for background 1 changes from around 1 to around 0, and the coefficient for background 2 changes inversely, when the scene becomes dark. Therefore, a significant change in the low-rank coefficients can be used as an indicator of scene change, which requires much less computation than the reconstructed frame based method [10]. Since scene change detection is beyond the scope of this paper, we do not discuss it in detail.

We further compare the difference frames generated by LRSD with those from conventional background subtraction that uses the same set of backgrounds as LRSD and chooses the best one for each frame. Fig. 7 shows the energy of the individual difference frames, measured as the sum of the squares of the entries. It can be seen that LRSD produces smaller residual energy where the illumination changes are large. This is mainly because the coefficients X can combine the background frames C well to produce a better background for each individual frame.

[Figure 6 plot. x-axis: Frame index, 50 to 400; y-axis: Low-rank coefficients, 0 to 1. Curves: Coefficients for background 1; Coefficients for background 2.]

Figure 6: The low-rank coefficients for the “Lobby” video.

Figure 7: The residual energies for the “Lobby” video.
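The energy argument can be checked on synthetic data. The sketch below is a simplified illustration under stated assumptions: the two random "background frames", the linear illumination drift, and the noise level are all made up, and the sparse component is ignored so that the LRSD-style background is just the least-squares combination of the columns of C. Since picking a single background is a special case of a linear combination, the least-squares residual energy can never exceed the best single-background energy.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 8
C = rng.random((m, 2)) * 255.0                  # two toy background frames as columns
mix = np.linspace(1.0, 0.0, n)                  # illumination drifting from BG 1 to BG 2
A = C @ np.vstack([mix, 1.0 - mix]) + rng.normal(0.0, 1.0, (m, n))

# Conventional BGS: subtract the single best background from each frame.
diff_each = A[:, None, :] - C[:, :, None]       # shape (m, 2, n)
energy_bgs = np.sum(diff_each ** 2, axis=0).min(axis=0)

# LRSD-style: subtract the best linear combination of the backgrounds.
X, *_ = np.linalg.lstsq(C, A, rcond=None)       # coefficients, shape (2, n)
energy_lrsd = np.sum((A - C @ X) ** 2, axis=0)

print(np.round(energy_bgs / energy_lrsd, 1))    # ratio is largest mid-transition
```

The gap is widest for the frames in the middle of the transition, where neither background alone matches the scene.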

4. Proposed Incremental LRSD

Although the proposed LRSD based video coding scheme is able to improve the existing block-based codecs in compressing videos captured by fixed cameras (see Section 5), several problems remain when applying it to practical video surveillance systems. First, SS GoDec requires the input matrix to be fully stored in memory, which is unsuitable for processing high-resolution or long videos. Second, the complexity of the LRSD algorithm is relatively high for real-time processing in existing surveillance cameras, which usually have limited computation and storage resources. In addition, accumulating video frames for the LRSD process at the camera side causes additional delay in delivering the video content.


The problems described above mainly stem from the inability of SS GoDec to process the matrix incrementally. Therefore, in this section, we propose two extensions of SS GoDec to solve these problems. The first extension, called incremental sparse decomposition, computes the low-rank coefficients and recovers the sparse component of new matrix columns using a given low-rank structure. The second extension, called incremental low-rank recovery, recovers the low-rank structure incrementally without storing the entire matrix in memory. To the best of our knowledge, such an incremental LRSD algorithm with controllable rank has not been reported in the literature before.

4.1. Incremental Sparse Decomposition with Given Low-Rank Structure

For surveillance videos, the scene background or the low-rank component

usually does not change frequently. Thus, the previously or offline obtained

low-rank structure can be reused for the L+S decomposition of the new video

frames. Also, the existing background modeling based coding schemes [3, 4, 10]

usually use the first few frames of a group of pictures (GOP) to extract the

background. This motivates us to propose an extension of the SS GoDec that is

able to perform LRSD on new matrix columns with a given low-rank structure,

which is called incremental sparse decomposition in this paper.

In particular, the decomposition now becomes

A′ = CX ′ + S′ + N ′, (8)

where the m×r matrix C is the given low-rank structure, A′ is an m×n′ matrix

representing the newly stacked video frames to be decomposed, X ′ is an r × n′

matrix to store the low-rank coefficients, and S′ and N ′ are the corresponding

sparse and residual components respectively. Similar to the original GoDec, we

can recover X ′ and S′ by solving the following subproblems in each iteration:⎧⎪⎨⎪⎩

X ′t = arg min ‖A′ − CX ′ − S′

t−1‖2F

S′t = arg min

‖S′‖0≤k‖A′ − CX ′

t − S′‖2F

, (9)

15

Page 17: Incremental low-rank and sparse decomposition for compressing videos captured by fixed cameras

where the subscript t denotes the t-th iteration. Since the first part of (9) is a least-squares problem with given A′, C, and S′_{t−1}, the coefficients X′_t can be computed directly by

$$ X'_t = C^{+} (A' - S'_{t-1}), \tag{10} $$

where C⁺ is the pseudo-inverse of C. Meanwhile, as pointed out by Zhou and Tao [18], we can solve S′_t by performing a soft thresholding on A′ − CX′_t, which is the same as in SS GoDec.
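The iteration described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' MATLAB implementation: the function names, the stopping tolerance `tol`, and the zero initialization of S′ are assumptions made here, and the sparse update uses plain element-wise soft thresholding with threshold `tau`.

```python
import numpy as np

def soft_threshold(M, tau):
    """Element-wise soft thresholding: shrink magnitudes by tau, zeroing small entries."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def incremental_sparse_decomposition(A_new, C, tau, max_iter=20, tol=1e-6):
    """Decompose new columns as A_new = C @ X + S + N for a fixed basis C (m x r)."""
    C_pinv = np.linalg.pinv(C)                 # C^+ of Eq. (10), computed once
    S = np.zeros_like(A_new, dtype=float)      # assumed initialization: S'_0 = 0
    X = C_pinv @ A_new
    for _ in range(max_iter):
        X = C_pinv @ (A_new - S)                       # coefficient update, Eq. (10)
        S_next = soft_threshold(A_new - C @ X, tau)    # sparse update
        change = np.linalg.norm(S_next - S)
        S = S_next
        if change <= tol * max(np.linalg.norm(A_new), 1.0):
            break                                      # assumed stopping rule
    N = A_new - C @ X - S                      # residual component
    return X, S, N
```

Because the pseudo-inverse is computed once and reused, each iteration costs two matrix products and one element-wise shrinkage.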

As described in Section 2, the convergence of SS GoDec is guaranteed by the

combination of LRA and soft thresholding in each iteration. Note that the LRA

step actually projects A − S_{t−1} onto a low-rank subspace that approaches

span{C}, where span{C} represents the linear subspace spanned by the columns

of C. Here, the operation in (10) can be seen as the projection of A′ − S′t−1

to span{C}. Thus, the combination of the subspace projection (10) and soft

thresholding also guarantees the convergence.
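To make the projection explicit: multiplying (10) by C gives the low-rank estimate of the new frames, and the matrix CC⁺ is the orthogonal projector onto span{C} (a standard property of the Moore-Penrose pseudo-inverse). The name P_C is introduced here only for illustration:

$$
C X'_t = C C^{+} (A' - S'_{t-1}) = P_C \, (A' - S'_{t-1}), \qquad P_C = C C^{+}, \quad P_C^2 = P_C = P_C^{T}.
$$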

Note that, when the target rank r = 1, the proposed incremental sparse

decomposition is significantly simplified. Its complexity is comparable to that

of the typical background subtraction process since the subspace projection (10)

is reduced to computing an inner product and it usually converges within 5

iterations.
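For r = 1, writing the single column of C as c (assumed non-zero), the pseudo-inverse has a closed form and (10) reduces to one inner product per frame. The symbols c, a'_j, s'_{j,t-1}, and x_{j,t} are notation introduced here for illustration:

$$
C^{+} = \frac{c^{T}}{c^{T} c}, \qquad
x_{j,t} = \frac{c^{T} \, (a'_j - s'_{j,t-1})}{\|c\|_2^2}, \quad j = 1, \dots, n',
$$

where a'_j and s'_{j,t-1} are the j-th columns of A′ and S′_{t−1}, and x_{j,t} is the j-th entry of the 1 × n′ coefficient vector X′_t.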

4.2. Incremental Recovery of the Low-Rank Structure

The proposed incremental sparse decomposition requires a given low-rank

structure, which could be offline pre-computed or obtained based on the first few

frames. Typically, using a large number of frames or the entire GOP can recover

a better low-rank structure. However, in practice, it is difficult to compute a

global low-rank structure since the input matrix is too large to be stored in the

memory, especially for long-time or high-resolution videos. Here we propose

another extension of SS GoDec to incrementally recover the global low-rank

structure. Our basic idea is to partition the input matrix into several sub-

matrices to compute individual local low-rank components and then convert


these local low-rank components into a global one.

In particular, inspired by the divide-factor-combine (DFC) matrix comple-

tion framework [22], we propose to compute the global low-rank structure C

using the following five steps:

1. Partition the input m × n matrix A into p sub-matrices of identical size m × n_p, where n_p = n/p, i.e.,

A = [A1, A2, . . . , Ap]. (11)

2. Perform SS GoDec on each sub-matrix, so that

Ai = Li + Si + Ni, 1 ≤ i ≤ p, (12)

where Li is the low-rank component, Si is the sparse component, and Ni

is the residual component.

3. Extract the low-rank structure of each sub-matrix by the CUR decompo-

sition, i.e.,

Li = CiUiRi, 1 ≤ i ≤ p; (13)

4. Perform SS GoDec on the combined low-rank structures, so that

[C1, C2, . . . , Cp] = Lc + Sc + Nc, (14)

where Lc is the low-rank component, Sc is the sparse component, and Nc

is the residual component.

5. Extract the low-rank structure of Lc by CUR decomposition, i.e.,

Lc = CUR. (15)

Then, the m×r matrix C is considered as the low-rank structure of A. It should

be noted that the low-rank structure recovered by the incremental low-rank

recovery is less accurate than that recovered by the original SS GoDec. This is because SS GoDec optimizes S_t and L_t globally in each iteration, while the incremental low-rank recovery optimizes L_i and S_i only locally within each partition. In other words, the low-rank structure recovered incrementally is an

approximation to that recovered globally. Numerical evaluations of the accuracy


loss are reported in Section 4.3 and indicate that such loss is acceptable in

practical applications.
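The five steps above can be sketched as follows. Two substitutions are made here and should be read as assumptions, not as the authors' method: `godec_like` is a simplified stand-in for SS GoDec (truncated SVD for the low-rank step, element-wise soft thresholding for the sparse step), and `low_rank_basis` replaces the CUR decomposition with an orthonormal basis taken from the SVD, so the returned C spans the estimated column space instead of consisting of actual matrix columns.

```python
import numpy as np

def soft_threshold(M, tau):
    """Element-wise soft thresholding."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def godec_like(A, rank, tau, max_iter=10):
    """Simplified stand-in for SS GoDec: alternate a rank-`rank` SVD fit and shrinkage."""
    S = np.zeros_like(A, dtype=float)
    L = np.zeros_like(A, dtype=float)
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(A - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # best rank-`rank` approximation
        S = soft_threshold(A - L, tau)
    return L, S

def low_rank_basis(L, rank):
    """Stand-in for the CUR step: orthonormal basis of the leading column space of L."""
    U, _, _ = np.linalg.svd(L, full_matrices=False)
    return U[:, :rank]

def incremental_low_rank_recovery(A, p, rank, tau, max_iter=10):
    """Return an m x rank basis C estimated block by block."""
    blocks = np.array_split(A, p, axis=1)                        # step 1, Eq. (11)
    local = []
    for A_i in blocks:                                           # one block in memory at a time
        L_i, _ = godec_like(A_i, rank, tau, max_iter)            # step 2, Eq. (12)
        local.append(low_rank_basis(L_i, rank))                  # step 3, Eq. (13)
    L_c, _ = godec_like(np.hstack(local), rank, tau, max_iter)   # step 4, Eq. (14)
    return low_rank_basis(L_c, rank)                             # step 5, Eq. (15)
```

In this sketch only one sub-matrix plus the p local bases are held at a time, which mirrors the memory argument made for the incremental method.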

From the above steps, we can see that the proposed incremental method

requires a memory of at most max{O(mn/p), O(mrp)}, which is much smaller

than the memory requirement of O(mn) in SS GoDec for some typical values of

p and r. Thus, the proposed incremental method is more suitable for large-scale

LRSD. Note that using some hierarchical implementation could further reduce

the memory requirement.
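As a rough worked example of the memory bound, the snippet below counts stored matrix entries for the largest setting used in Section 4.3 (m = n = 4800, r = 40, p = 10). The function names are hypothetical, and the counts ignore constants and the working copies that a real implementation keeps.

```python
import math

def peak_entries_incremental(m, n, p, r):
    """Peak entry count max(m*n/p, m*r*p) for the incremental method, constants ignored."""
    return max(m * math.ceil(n / p), m * r * p)

def peak_entries_full(m, n):
    """Entry count m*n for holding the whole input matrix at once."""
    return m * n

incremental = peak_entries_incremental(4800, 4800, p=10, r=40)   # 2,304,000 entries
full = peak_entries_full(4800, 4800)                             # 23,040,000 entries
```

With 8-byte floating-point entries, this is about 18.4 MB per stored matrix for the incremental method against about 184.3 MB for the full matrix, a tenfold reduction for p = 10.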

4.3. Numerical Evaluation of the Proposed ILRSD

Since the proposed incremental LRSD algorithm (ILRSD) is an approxima-

tion to the original SS GoDec, in this subsection, we numerically evaluate how

effective the proposed incremental scheme is by applying it for the low-rank and

sparse decomposition of large matrices, where the proposed incremental LRSD

scheme requires a two-pass process: the incremental low-rank recovery pass (see

Section 4.2) and the incremental sparse decomposition pass (see Section 4.1).

In particular, we generate numerous m × m matrices that are in the form of

A = L+S +N . For each matrix, the low-rank component is the product of two

small random matrices, i.e., L = BD, where both the m × r matrix B and the

r×m matrix D are standard Gaussian matrices. The sparse component S is an

m × m matrix with only 0.2m² non-zero entries that are drawn from a standard Gaussian distribution. The residual component N is an m × m Gaussian matrix with a zero mean and a standard deviation of 10⁻³. The threshold parameter

τ is set to 0.01 and the maximum number of iterations is set to 10 for both

ILRSD and SS GoDec. In ILRSD, to obtain the global low rank structure C,

each input matrix is partitioned into 10 sub-matrices.
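The synthetic test matrices described above can be generated along the following lines. This is a sketch under stated assumptions: the function name, the seeding, and the choice of drawing the non-zero positions uniformly at random without replacement are not specified in the text.

```python
import numpy as np

def make_test_matrix(m, r, density=0.2, noise_std=1e-3, seed=0):
    """Build A = L + S + N with L = B @ D of rank r, sparse S, and small Gaussian N."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((m, r))            # m x r standard Gaussian factor
    D = rng.standard_normal((r, m))            # r x m standard Gaussian factor
    L = B @ D
    S = np.zeros((m, m))
    k = int(round(density * m * m))            # 0.2 * m^2 non-zero entries
    positions = rng.choice(m * m, size=k, replace=False)
    np.put(S, positions, rng.standard_normal(k))
    N = noise_std * rng.standard_normal((m, m))
    return L + S + N, L, S, N
```

For example, `make_test_matrix(1200, 10)` corresponds to the smallest setting of Table 1.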

Table 1 shows the simulation results, where the matrices L̂ and Ŝ are the estimated low-rank and sparse components, respectively. Each value in the table

is the average result over 100 simulations. To see the ability of ILRSD in

reducing the memory cost, we use a common PC with a memory of 2 GB and a

dual-core CPU of 2.67 GHz in these simulations. It can be seen that SS GoDec


Table 1: Errors and time costs of SS GoDec and the proposed ILRSD algorithm. The results separated by “/” are obtained by SS GoDec and the proposed scheme, respectively.

m      r    ‖L̂−L‖F/‖L‖F                  ‖Ŝ−S‖F/‖S‖F                  Time (seconds)
1200   10   2.22 × 10⁻⁴ / 8.08 × 10⁻⁴    1.03 × 10⁻² / 1.07 × 10⁻²    4.87 / 4.68
2400   20   1.56 × 10⁻⁴ / 4.73 × 10⁻⁴    1.02 × 10⁻² / 1.06 × 10⁻²    20.63 / 30.06
3600   30   1.28 × 10⁻⁴ / 3.36 × 10⁻⁴    1.02 × 10⁻² / 1.05 × 10⁻²    49.11 / 68.52
4800   40   N. A. / 2.62 × 10⁻⁴          N. A. / 1.04 × 10⁻²          N. A. / 129.64

fails due to insufficient memory when the input matrix dimension m goes

up to 4800, while our proposed scheme can still perform the decomposition.

When the matrix dimension is relatively low, the proposed scheme achieves

a decomposition performance comparable to that of SS GoDec in terms of the

relative errors for L and S. The relative errors decrease as the size of the matrix

increases. From the table, we can also see that the time cost of the proposed

scheme is often higher than that of SS GoDec. This is mainly because the

incremental low-rank recovery pass for obtaining the global low-rank structure C

is currently implemented in a serial manner, which could be greatly accelerated

by processing individual sub-matrices in a parallel manner.
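The two error columns of Table 1 are relative errors in the Frobenius norm. A minimal helper for computing them is shown below; the function name is an assumption made for illustration.

```python
import numpy as np

def relative_error(estimate, reference):
    """Relative Frobenius-norm error ||estimate - reference||_F / ||reference||_F."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return np.linalg.norm(estimate - reference, "fro") / np.linalg.norm(reference, "fro")
```

Applied to an estimated low-rank component and the ground-truth L = BD, this gives the quantity reported in the first error column; the sparse column is obtained in the same way from the estimated and true S.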

4.4. ILRSD Based Video Coding

The proposed incremental LRSD (ILRSD) algorithm solves the two limitations of SS GoDec, which makes our LRSD based coding scheme more suitable for practical use. Moreover, because the two passes of ILRSD are independent of each other, it also introduces more flexibility when applying ILRSD to

different video coding scenarios. For example, for offline encoding or transcod-

ing stored surveillance videos, ILRSD can be directly used to replace the LRSD

step in the proposed coding scheme described in Section 3. That is, both the

incremental low-rank recovery pass and the incremental sparse decomposition

pass are applied to the entire video to compute the low-rank structure, low-rank

coefficients, and sparse components, which can be seen as a large-scale exten-

sion of the proposed LRSD based coding scheme. For real-time remote video


surveillance systems with fixed cameras, we can extract the global background

in advance using the incremental low-rank recovery pass of the proposed ILRSD

algorithm. Then, during the real-time operation, only the sparse and residual

components of the current frames need to be encoded and transmitted. In this

way, more bits can be saved and the bandwidth requirement can be reduced.

5. Experimental Results

In this section, we conduct experiments to evaluate the performance of the

proposed schemes for compressing videos captured by fixed cameras. First, we

compare the proposed schemes with H.264/AVC [23] and HEVC [2] to show that

our schemes can improve the coding efficiency of the existing standard codecs.

Then, we compare the proposed ILRSD based scheme with the state-of-the-art

background modeling based schemes. All the experiments are conducted on

a common PC with an Intel(R) Core(TM) i5-2400 CPU and a memory of 4

GB. The incremental low-rank recovery and sparse decomposition algorithms

are implemented in MATLAB.

5.1. Comparison with standard codecs

We use the H.264/AVC reference software JM18.4¹ with High Profile and the HEVC reference software HM12.0². JM is configured according to the ITU-T recommendations [24], and HM is configured according to its Main Profile in

low-delay mode. It is reported in [3] that H.264/AVC can achieve higher coding

efficiency in surveillance video coding by using a special configuration. Thus, we

also include this fine-tuned H.264/AVC for comparison. Note that no fine-tuned

configuration is used in the proposed coding schemes.

At this stage, four representative low-resolution surveillance videos are used, named “Hall”, “Escalator”, “Campus”, and “Lobby”³, shown in Fig. 8.

¹ http://iphome.hhi.de/suehring/tml/download/
² http://hevc.hhi.fraunhofer.de/
³ http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html


For simplicity, we only use the first 400 frames of “Lobby” and the first 350 frames of the other videos. “Hall” is a general surveillance video that contains a stationary background and common moving objects, while the other test videos have different characteristics. In the “Escalator” video, there are several escalators

that cause periodic perturbations. The “Campus” video has several trees that

cause irregular perturbations, and the “Lobby” video has a sharp change of

brightness caused by turning off the lights.

[Figure 8 panels: (a) Hall, (b) Escalator, (c) Campus, (d) Lobby, (e) Silent (CIF), (f) Bridge close (CIF), (g) PETS 2007 (SD)]

Figure 8: Sample frames of the seven test video sequences. Top row: four low-resolution videos. Bottom row: three relatively high-resolution videos.

In this comparison, for SS GoDec and the proposed ILRSD, we empirically

set the threshold τ to 10 and the maximum number of iterations to 20. The

target rank is set to 2 for the “Lobby” video due to the illumination change,

and 1 for other videos. Note that if the GOP information is available, we can

use a target rank of 1 in each GOP by assuming that the background is ap-

proximately unchanged in the GOP. For ILRSD, each sub-matrix is constructed

using 25 video frames. For V_C and V_S defined in Section 3, considering that V_C, representing the essential background information for all the background frames, is more important than V_S in terms of overall reconstruction quality, we set the quantization parameter (QP) for encoding V_C to be half as much as


that for encoding V_S in the proposed schemes.
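As an illustration of how the input to ILRSD can be formed, the sketch below stacks frames as matrix columns and cuts the result into sub-matrices of 25 frames each. The function names and the row-major flattening of each frame are assumptions made here; any fixed pixel ordering would serve the same purpose.

```python
import numpy as np

def frames_to_matrix(frames):
    """Stack T frames of shape (H, W) as the columns of an (H*W) x T matrix."""
    frames = np.asarray(frames, dtype=float)
    num_frames = frames.shape[0]
    return frames.reshape(num_frames, -1).T

def split_into_submatrices(A, frames_per_block=25):
    """Cut A column-wise into consecutive blocks of `frames_per_block` frames."""
    n = A.shape[1]
    return [A[:, i:i + frames_per_block] for i in range(0, n, frames_per_block)]
```

With 350 frames and 25 frames per block this yields 14 sub-matrices; with the 400 frames of “Lobby” it yields 16.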

Fig. 9 shows the PSNR performance of different coding schemes under differ-

ent rates through varying the QP of VS from 10 to 48. It is shown in Fig. 9 that

[Figure 9, panels (a) Hall, (b) Escalator, (c) Campus, (d) Lobby: average Y-PSNR (dB) versus bitrate (kbps) for the proposed LRSD-based H.264/AVC, the proposed ILRSD-based H.264/AVC, H.264/AVC, and the fine-tuned H.264/AVC.]

Figure 9: PSNR performance of compressing the four low-resolution videos by the proposed LRSD based and ILRSD based schemes using H.264/AVC as the base codec, compared with H.264/AVC with a special configuration reported in [3].

for the four test videos, the proposed coding schemes can significantly improve

the base codec, H.264/AVC. Using the proposed schemes, far fewer bits are required to achieve a sufficient quality of the reconstructed frames, e.g., around 40

dB. Since we use adaptive quantization in the LRSD based scheme and uniform

quantization in the ILRSD based scheme, the proposed LRSD based scheme

performs generally better than the proposed ILRSD based scheme.
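For reference, the quality measure on the vertical axis of the rate-distortion plots can be computed as below for 8-bit luma frames. This is a sketch of the standard PSNR definition; averaging per-frame PSNR values, as done in `average_y_psnr`, is an assumption about how the sequence-level figure is obtained.

```python
import numpy as np

def psnr_8bit(reference, reconstructed):
    """PSNR in dB for 8-bit data: 10 * log10(255^2 / MSE)."""
    ref = np.asarray(reference, dtype=float)
    rec = np.asarray(reconstructed, dtype=float)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")                    # identical frames
    return 10.0 * np.log10(255.0 ** 2 / mse)

def average_y_psnr(ref_frames, rec_frames):
    """Mean of the per-frame luma PSNR values over a sequence."""
    return float(np.mean([psnr_8bit(a, b) for a, b in zip(ref_frames, rec_frames)]))
```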


In general, fewer moving objects in the video lead to higher coding performance

for our schemes. For example, in the “Lobby” video, the proposed LRSD based

scheme achieves a significant PSNR gain of up to 5 dB compared to H.264/AVC.

When there are more moving objects, our schemes also work well, but the PSNR

gain becomes smaller. For example, the proposed LRSD based scheme achieves

PSNR gains up to 3 dB and 2.5 dB for the “Hall” and “Escalator” videos,

respectively. When there are many irregular background perturbations, the

sparse assumption of the foreground might not hold. But even in such a case,

our schemes can still obtain small PSNR gains compared to H.264/AVC. An

example is shown in Fig. 9 (c). That is, for the “Campus” video, the LRSD

based scheme achieves a PSNR gain of up to 1 dB. It can also be seen from

Fig. 9 that for the four surveillance videos, the performance of the proposed

ILRSD based scheme is in general similar to that of the LRSD based scheme,

which verifies the effectiveness of the proposed incremental scheme. Compared

with the fine-tuned H.264/AVC, our proposed schemes without the fine tuning

of H.264/AVC can still achieve better performance. Fig. 10 shows the PSNR

improvement of using ILRSD with the latest standard video codec HEVC, where

now HEVC is the base codec for our proposed scheme. Note that Fig. 10 (a)

shows the general case and Fig. 10 (b) shows the worst case.

[Figure 10, panels (a) Hall, (b) Campus: average Y-PSNR (dB) versus bitrate (kbps) for the proposed ILRSD-based HEVC and HEVC.]

Figure 10: PSNR performance of the proposed ILRSD based scheme using HEVC as the base codec.


We would like to point out that, compared with the base codec, the performance of our proposed schemes saturates or even becomes worse at high bit rates. This performance cap is mainly due to two reasons. First, the decomposed low-rank, sparse, and residual components are in floating-point format.

Rounding them into integers causes some information loss. Second, we normal-

ize the sparse and residual components into a value range of [0, 255] so as to

facilitate the subsequent 8-bit standard codec. Such normalization process also

causes information loss. This is why the coding performance of our proposed

ILRSD scheme tends to be flat at relatively high bit rates or at high PSNR

range. We argue that such a limitation can be greatly alleviated if we use a

9-bit codec such as the one in [3]. In addition, for practical applications, a PSNR value of 38 to 40 dB is usually sufficient, as it corresponds to good visual quality.
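The information loss caused by normalization and rounding can be illustrated with a small sketch. The affine min-max mapping used here is an assumption, since the text only states that the components are normalized into [0, 255]; the function names are likewise hypothetical.

```python
import numpy as np

def quantize_to_8bit(X):
    """Affinely map X to [0, 255] and round to uint8; return codes, offset, and step size."""
    lo, hi = float(X.min()), float(X.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    codes = np.round((X - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Invert the affine mapping; the rounding error cannot be undone."""
    return codes.astype(float) * scale + lo
```

Under this mapping the reconstruction error per entry is bounded by half a quantization step, `scale / 2`, so a component with a wide value range loses more precision. This is consistent with the saturation observed at high bit rates, where the codec's own distortion becomes smaller than this fixed error.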

5.2. Comparison with state-of-the-art

Here, we compare our proposed ILRSD scheme with two methods: the representative BGS based coding scheme [3] and the state-of-the-art background prediction based coding scheme [10]. We use H.264/AVC with High Profile as the

base codec. In particular, we compare the following five methods: “ILRSD

based H.264/AVC”, “BGS based H.264/AVC”, “BG as long-term reference”,

“McFIS based H.264/AVC”, and H.264/AVC. “ILRSD based H.264/AVC” is

our proposed ILRSD based coding method with rank equal to 1. “BGS based

H.264/AVC” is the method of [3] but using the same background and the same

coding strategy as “ILRSD based H.264/AVC”, for fair comparison. In other

words, the only difference is that “BGS based H.264/AVC” uses background sub-

traction to obtain the residual sequence while our “ILRSD based H.264/AVC”

uses the ILRSD framework. “McFIS based H.264/AVC” is the method of [10]

(using the first 25 frames to generate the McFIS frame), and “BG as long-term

reference” is almost the same as [10] except using the background generated

by ILRSD as the long-term reference frame. This is to show the quality of the

generated background by ILRSD.


To show the performance on long and relatively high-resolution videos, here

we use the following four video sequences for experiments: the 300-frame CIF “Silent” video, the 1000-frame QCIF “Hall” video, the 1000-frame CIF “Bridge close” video, and the 1000-frame SD “PETS 2007” video. Note that the “PETS 2007” video is chosen from the first view of “Dataset S6” in the PETS 2007 dataset⁴, which has a relatively high resolution and a high density of moving objects. The “Silent” and “Bridge close” videos are standard test sequences⁵. Sample frames of these

test sequences can be found in Fig. 8.

[Figure 11, panels (a) Hall (QCIF), (b) Bridge close (CIF), (c) PETS 2007 (SD), (d) Silent (CIF): average Y-PSNR (dB) versus bitrate (kbps) for ILRSD based H.264/AVC, BGS based H.264/AVC, BG as long-term reference, McFIS based H.264/AVC, and H.264/AVC.]

Figure 11: The PSNR comparison of our proposed ILRSD scheme with the state-of-the-art methods.

⁴ http://www.cvg.rdg.ac.uk/PETS2007/data.html
⁵ http://media.xiph.org/video/derf/


Fig. 11 shows the PSNR comparisons with the state-of-the-art methods. It

can be seen that the proposed ILRSD based scheme achieves the best perfor-

mance at relatively low bit rates. For a target average PSNR value of 38 dB,

which is quite reasonable for practical applications, the proposed ILRSD based

scheme, the BGS based scheme, and the McFIS based scheme achieve average

bit-rate reductions of 27.8%, 15.1%, and 8.4% on the four test videos, compared

to the base codec, H.264/AVC, respectively.
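The text does not state how the rate at the 38 dB target is read off each curve. One common way, sketched below as an assumption, is to interpolate the logarithm of the bitrate linearly in PSNR between measured rate-distortion points and then compare the interpolated rates. The function names are hypothetical, and no numbers from the paper are reproduced.

```python
import numpy as np

def rate_at_psnr(rates_kbps, psnrs_db, target_db):
    """Estimate the bitrate at a target PSNR by interpolating log-rate linearly in PSNR."""
    order = np.argsort(psnrs_db)
    psnr_sorted = np.asarray(psnrs_db, dtype=float)[order]
    log_rate = np.log(np.asarray(rates_kbps, dtype=float)[order])
    return float(np.exp(np.interp(target_db, psnr_sorted, log_rate)))

def bitrate_reduction(base_rate, test_rate):
    """Percentage of bitrate saved by the tested scheme relative to the base codec."""
    return 100.0 * (base_rate - test_rate) / base_rate
```

Note that `np.interp` clamps to the end points, so the target PSNR should lie within the measured range of both curves for the estimate to be meaningful.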

From Fig. 11, we can see that the proposed ILRSD based coding scheme

always outperforms the BGS based coding scheme. Compared with the McFIS

based scheme, the proposed ILRSD based scheme performs better in most of the

cases except for the “Silent” video at relatively high bit rates. This is because for

the “Silent” video, the foreground person keeps moving within a limited range,

for which it is hard to extract a clean background. In such a case, the background prediction based approach seems more efficient.

With a MATLAB implementation, the proposed incremental sparse decomposition requires about 5 ms for one QCIF frame and about 30 ms for one CIF frame. Since a MATLAB implementation is generally slower than a C/C++ implementation, this result suggests that the proposed incremental sparse decomposition is quite lightweight and could be implemented in practical surveillance systems. For the 300-frame “Silent” video, the proposed incremental low-rank recovery (using 30 frames to construct each sub-matrix) requires about 11.5 seconds to extract the low-rank structure. Thus, this more expensive processing needs to be done offline or only at the beginning of encoding a video sequence.

5.3. A brief analysis of the quantization parameters

As reported in [25], a careful adjustment of the quantization parameter

(QP) for encoding V_C and V_S leads to a better encoding performance of the

input video. However, the optimal combination of QPs has to be found by pre-

encoding each component separately, which will cause unnecessary time delay

in practical applications. To avoid such pre-processing, we present a brief comparison between different combinations of QPs and give an empirical suggestion


for QP selection in practice. The representative video “PETS 2007” is used in

this comparison.

[Figure 12, panels (a) The proposed method, (b) The BGS based method: Y-PSNR (dB) versus bitrate (kbps). Curves in (a): QP_L = QP_S, QP_L = QP_S/1.5, QP_L = QP_S/2, QP_L = QP_S/3, QP_L = 0, and H.264. Curves in (b): QP_BG = QP_FG, QP_BG = QP_FG/1.5, QP_BG = QP_FG/2, QP_BG = QP_FG/3, QP_BG = 0, and H.264.]

Figure 12: Comparisons between using different combinations of QPs on the “PETS 2007” video.

Fig. 12 compares the coding efficiency of different combinations of QPs. For the proposed method, we use QP_L to denote the QP for encoding V_C and QP_S to denote the QP for encoding V_S. For the BGS based method, we use QP_BG to denote the QP for encoding the background and QP_FG to denote the QP for encoding the residuals. It can be seen that, for both the proposed method and the BGS based method, the different combinations of QPs yield similar coding efficiencies, except for the case that QP_L = QP_S or QP_BG = QP_FG. So for the proposed method, QP_L = QP_S/2 is a reasonable choice for good coding performance. By comparing Fig. 12 (a) and (b), we can see that no matter which QP combination is used, our proposed method always outperforms the BGS method.
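The empirical rule QP_L = QP_S/2 can be written as a small helper. The function name, the rounding of non-integer results, and the clipping to the 0 to 51 QP range of 8-bit H.264/AVC are assumptions added here for illustration.

```python
def choose_qps(qp_s, ratio=2.0, qp_min=0, qp_max=51):
    """Return (qp_l, qp_s) with qp_l = qp_s / ratio, rounded and clipped to the valid range."""
    qp_l = int(round(qp_s / ratio))
    qp_l = max(qp_min, min(qp_max, qp_l))
    return qp_l, qp_s
```

For instance, the two ends of the QP range used for V_S in Section 5.1 (10 and 48) map to QP values of 5 and 24 for V_C.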

6. Conclusion

The emerging theory of LRSD provides efficient algorithms for the separation

of the background and the moving objects in videos captured by fixed cameras.

In this paper, we have proposed a scheme that can efficiently encode the com-

ponents generated by LRSD and achieve good overall compression efficiency.


Moreover, we have also proposed an incremental LRSD algorithm that greatly

reduces the memory requirement and the computational complexity of the ex-

isting LRSD algorithm so that practical large-scale surveillance videos can be

handled. Numerous experiments on different test sequences demonstrated that

the proposed coding schemes can significantly improve the existing standard

codecs, H.264/AVC and HEVC, at relatively low bit rates and outperform the

state-of-the-art background modeling based coding schemes.

For future work, one extension is to use a 9-bit codec such as the one in [3],

instead of the 8-bit H.264/AVC in our current implementation, to encode the

residual videos so as to avoid the information loss at the beginning. In addition,

exploiting the coding modes in the macroblock level [26] may also boost the

performance of the proposed method.

Acknowledgement

This work is partially supported by MoE AcRF Tier 2 Grant, Singapore,

Grant No.: T208B1218, the Major State Basic Research Development Program

of China (973 Program, No. 2013CB329402), the National Natural Science

Foundation of China (Nos. 61227004, 61372131, 11204014), the 111 Project

(No. B07048), the Fundamental Research Funds for the Central Universities

(No. K5051399020), and the Research Fund for the Doctoral Program of Higher

Education of China (No. 20130203120009).

References

[1] T. Wiegand, G. J. Sullivan, G. Bjontegaard, A. Luthra, Overview of the

H.264/AVC video coding standard, IEEE Trans. Circuits Syst. Video Tech-

nol. 13 (7) (2003) 560–576. doi:10.1109/TCSVT.2003.815165.

[2] G. Sullivan, J. Ohm, W.-J. Han, T. Wiegand, Overview of the high effi-

ciency video coding (HEVC) standard, IEEE Trans. Circuits Syst. Video

Technol. 22 (12) (2012) 1649–1668. doi:10.1109/TCSVT.2012.2221191.


[3] X. Zhang, L. Liang, Q. Huang, Y. Liu, T. Huang, W. Gao, An efficient

coding scheme for surveillance videos captured by stationary cameras, in:

Proc. Visual Commun. Image Process. (VCIP), Vol. 7744, SPIE, 2010.

doi:10.1117/12.863522.

[4] X. Zhang, L. Liang, Q. Huang, T. Huang, W. Gao, A background model

based method for transcoding surveillance videos captured by station-

ary camera, in: Picture Coding Symposium (PCS), 2010, pp. 78–81.

doi:10.1109/PCS.2010.5702583.

[5] W. Gao, Y. Tian, T. Huang, S. Ma, X. Zhang, IEEE 1857 standard empow-

ering smart video surveillance systems, IEEE Intell. Syst. PP (99) (2013)

1–1. doi:10.1109/MIS.2013.101.

[6] X. Zhang, Y. Tian, L. Liang, T. Huang, W. Gao, Macro-block-level se-

lective background difference coding for surveillance video, in: Int’l Conf.

Multimedia and Expo (ICME), IEEE, 2012, pp. 1067–1072.

[7] X. Zhang, Y. Tian, T. Huang, W. Gao, Low-complexity and high-

efficiency background modeling for surveillance video coding, in: Proc.

Visual Commun. and Image Process. (VCIP), IEEE, 2012, pp. 1–6.

doi:10.1109/VCIP.2012.6410796.

[8] M. Paul, W. Lin, C. Lau, B.-S. Lee, Video coding using the most common

frame in scene, in: Int’l Conf. Acoust., Speech, Signal Process. (ICASSP),

IEEE, 2010, pp. 734–737. doi:10.1109/ICASSP.2010.5495033.

[9] K. Misra, J. Zhao, A. Segall, McFIS in hierarchical bipredictive

pictures-based video coding for referencing the stable area in a scene,

in: Int’l Conf. Image Process. (ICIP), IEEE, 2011, pp. 3521–3524.

doi:10.1109/ICIP.2011.6116473.

[10] M. Paul, W. Lin, C.-T. Lau, B.-S. Lee, Explore and model better I-frames

for video coding, IEEE Trans. Circuits Syst. Video Technol. 21 (9) (2011)

1242–1254. doi:10.1109/TCSVT.2011.2138750.


[11] J. Cai, E. J. Candes, Z. Shen, A singular value thresholding algorithm for

matrix completion, SIAM J. Optim. 20 (4) (2010) 1956–1982.

[12] Z. Lin, M. Chen, Y. Ma, The augmented Lagrange multiplier method

for exact recovery of corrupted low-rank matrices, arXiv preprint

arXiv:1009.5055.

[13] E. J. Candes, X. Li, Y. Ma, J. Wright, Robust principal component anal-

ysis?, J. ACM 58 (3) (2011) 11:1–11:37.

[14] T. Zhou, D. Tao, GoDec: Randomized low-rank & sparse matrix decom-

position in noisy case, in: Int’l Conf. Mach. Learning (ICML), 2011.

[15] C. Chen, J. Cai, W. Lin, G. Shi, Surveillance video coding via low-rank

and sparse decomposition, in: ACM Multimedia, 2012, pp. 713–716.

[16] K. Bredies, D. A. Lorenz, Iterated hard shrinkage for minimization problems with sparsity constraints, SIAM J. Sci. Comput. 30 (2006) 657–683.

[17] C. Eckart, G. Young, The approximation of one matrix by another of lower

rank, Psychometrika (1936) 211–218.

[18] T. Zhou, D. Tao, Greedy bilateral sketch, completion & smoothing, in: Int’l

Conf. Artificial Intell. Statistics, 2013.

[19] P. Drineas, M. W. Mahoney, S. Muthukrishnan, Relative-error CUR matrix

decompositions, SIAM J. Matrix Anal. Appl. 30 (2008) 844–881.

[20] L. Li, W. Huang, I. Y.-H. Gu, Q. Tian, Statistical modeling of complex

backgrounds for foreground object detection, IEEE Trans. Image Process.

13 (11) (2004) 1459–1472. doi:10.1109/TIP.2004.836169.

[21] T. Bouwmans, F. E. Baf, B. Vachon, Background modeling using mixture of

Gaussians for foreground detection - a survey, Recent Patents on Computer

Science 1 (3) (2008) 219–237.


[22] L. W. Mackey, A. S. Talwalkar, M. I. Jordan, Divide-and-conquer ma-

trix factorization, in: Advances in Neural Information Processing Systems

(NIPS) 24, 2011, pp. 1134–1142.

[23] G. J. Sullivan, P. Topiwala, A. Luthra, The H.264/AVC advanced video

coding standard: Overview and introduction to the fidelity range exten-

sions, in: SPIE conference on Applications of Digital Image Processing

XXVII, 2004.

[24] T. Tan, G. Sullivan, T. Wedi, Recommended simulation common conditions

for coding efficiency experiments, ITU-T Q.6/SG16, VCEG-AA10 (2005).

[25] J. Hou, L.-P. Chau, M. Zhang, M. Zhang, N. Magnenat-Thalmann,

Y. He, A highly efficient compression framework for time-varying

3D facial expressions, IEEE Trans. Circuits Syst. Video Technol. doi:10.1109/TCSVT.2014.2313890.

[26] S. Wang, J. Fu, Y. Lu, S. Li, W. Gao, Content-aware layered compound

video compression, in: IEEE International Symposium on Circuits and

Systems (ISCAS), 2012, pp. 145–148.
