SPARSITY-BASED REPRESENTATION WITH LOW-RANK …

SPARSITY-BASED REPRESENTATION WITH

LOW-RANK INTERFERENCE: ALGORITHMS AND

APPLICATIONS

by

Minh Dao

A dissertation submitted to The Johns Hopkins University in conformity with the

requirements for the degree of Doctor of Philosophy.

Baltimore, Maryland

May, 2015

c© Minh Dao 2015

All rights reserved

Abstract

In this thesis, we develop a novel general framework that is capable of extracting a

low-rank interference while simultaneously promoting sparsity-based representation

of multiple correlated signals. The proposed framework provides a new and efficient

approach for the representation of multiple measurements where the underlying

signals exhibit a structured sparsity representation over some proper dictionaries

but are corrupted by the interference from external sources. Under the assumption

that the interference component forms a low-rank structure, the proposed algorithms

minimize the nuclear norm of the interference to exclude it from the representation

of multivariate sparse representation. In other words, this thesis investigates the

problem of structural sparse signal representation even in the presence of large but

correlated noise/interference.

A fast and efficient algorithm based on alternating direction methods of multipli-

ers (ADMM) approach is studied to solve the convex optimization problems arisen

from these models. Furthermore, we modify the classical ADMM approach by utiliz-

ii

ABSTRACT

ing an approximation to relax the dictionary transform representation, thus simplify

the computing efforts to achieve optimization solutions. By this modification, we

further show that the algorithm is guaranteed to converge to the global optimum

solutions. Extensive experiments are conducted on four practical applications: (i)

synthetic aperture radar image recovery, (ii) hyperspectral chemical plume detec-

tion and classification, (iii) robust speech recognition in noisy environments, and

(iv) video-based facial expression recognition; all of which show that the proposed

models provide significant improved performance compared with the state-of-the-art

results.

The thesis further extends the general simultaneous structured sparsity and low-

rank framework to multi-sensor for classification problems. Particularly, we study

a variety of novel sparsity-regularized regression methods, commonly categorized as

collaborative multi-sensor sparse representation for classification, which effectively

incorporates simultaneous structured-sparsity constraints, demonstrated via a row-

sparse and/or block-sparse coefficient matrix, both within each sensor and across

multiple heterogeneous sensors. The efficacy of the proposed multi-sensor algorithms

is verified in an automatic border patrol control application to discriminate between

human and animal footsteps.

Primary Reader: Dr. Trac D. Tran

Secondary Reader: Dr. Sang (Peter) Chin

iii

Acknowledgments

First and foremost, I would like to express my deepest gratitude to my advisor,

Prof. Trac D. Tran, for his continuous supports of my Ph.D. study and research,

for his patience, motivation, and immense knowledge. He provided the best aca-

demic environment, while allowed me great freedom in choosing research topics.

His guidance and constant encouragement have made me become not only a better

researcher, but also a better person. This thesis would not have been done without

all his insightful suggestions.

I also owe a significant part of this experiment to my co-advisor, Prof. Sang (Pe-

ter) Chin, for his kind and generous support throughout my entire Ph.D. program.

He has not only granted me deep and competent discussions in research, but also

inspired me a lot with his motivation, enthusiasm, and views of life.

I am thankful to Dr. Nasser Nasrabadi from the U.S. Army Research Laboratory

(ARL) for his support over the past year. Though I just worked in ARL for a short

time, he has continuously guided me and provided me valuable suggestions and

insights on my research, even after I finished my internship there.

iv

ACKNOWLEDGMENTS

I also thank my dissertation committee members, Prof. Ralph Etienne-Cummings

and Prof. Mark Foster for their constructive feedback and suggestions to improve

the quality of this dissertation.

I consider myself very fortunate to have been mentored by many wonderful col-

laborators: Prof. Vishal Monga, Pennsylvania State University; Dr. Lam Nguyen,

the Army Research Laboratory; Dr. Chiman Kwan, the Signal Processing, Inc.;

Prof. Markus Reischl, Karlsruhe Institute of Technology; and Prof. Huong Viet

Nguyen, Hanoi University of Technology. I thank them for the challenging problems

they brought, the insights they shared and their words of encouragement.

The Digital Signal Processing (DSP) Lab has been my academic home during my

Ph.D. time. I would like to thank all of the following lab mates and collaborators at

the DSP lab: Dr. Dzung T. Nguyen, Dr. Nam H. Nguyen, Dr. Yi Chen, Yuanming

Suo, Dung Tran, Souphy Sun, Sonia Joy, Akshay Rangamani, Tao Xiong, Luoluo

Liu, Xiang Xiang, and Qing Qu. I have also thoroughly enjoyed and benefited from

the collaboration with Dr. Umamahesh Srinivas and Hojjat Mousavi from PSU and

Dr. Nam Nguyen from Towson University.

I gratefully acknowledge the funding sources that made my Ph.D. work possi-

ble. I was financially supported by the Vietnam Education Foundation fellowship,

the JHU Applied Physics Laboratory fellowship, the ECE department at JHU, the

National Science Foundation, the Army Research Laboratory, the Army Research

Office, and the Office of Naval Research.

v

ACKNOWLEDGMENTS

I am heartily thankful to my family for their tremendous supports and belief

in me over years. My father passed away very long time ago but has continuously

inspired me by his great examples. My mother, who also departed her life before my

Ph.D. completion, had always been my largest support until her last days. Lastly,

this thesis is dedicated to my wife and my daughter. I owe my deepest gratitude to

them for their unconditional love and sacrifice. They are the main inspiration for

me to complete this thesis as well as continue my long journey ahead.

vi

Contents

Abstract ii

Acknowledgments iv

List of Tables xi

List of Figures xii

1 Introduction 1

1.1 Motivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Main Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.3 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2 Background Review 12

2.1 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.2 Sparse Signal Representation . . . . . . . . . . . . . . . . . . . . . . . 13

2.3Sparse Representation for Classification . . . . . . . . . . . . . . . . . . . 15

vii

CONTENTS

2.4 Multi-measurement Sparse Representation . . . . . . . . . . . . . . . 18

2.4.1 Joint Sparse Representation . . . . . . . . . . . . . . . . . . . 19

2.4.2 Group Sparse Representation . . . . . . . . . . . . . . . . . . 20

2.5 Low-rank Matrix Approximations . . . . . . . . . . . . . . . . . . . . 22

2.5.1 Matrix Completion . . . . . . . . . . . . . . . . . . . . . . . . 23

2.5.2 Robust Principal Component Analysis . . . . . . . . . . . . . 24

3 Structured Sparse Representation with Low-rank Interference 25

3.1 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.2 Motivational Applications . . . . . . . . . . . . . . . . . . . . . . . . 29

3.3 Simultaneous Low-rank and Sparse Representation Models . . . . . . 32

3.3.1 Sparse Representation with Low-rank Interference . . . . . . . 33

3.3.2 Joint Sparse Representation with Low-rank Interference . . . . 35

3.3.3 Group Sparse Representation with Low-rank Interference . . . 36

3.4 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

3.4.1 ADMM-based Algorithm . . . . . . . . . . . . . . . . . . . . . 38

3.4.2 Convergence Analysis . . . . . . . . . . . . . . . . . . . . . . . 44

4 Applications on Structured Sparse Representation with Low-rank

Interference 50

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4.2 Synthetic Aperture Radar Image Recovery . . . . . . . . . . . . . . . 52

viii

CONTENTS

4.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4.2.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . 55

4.2.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . 58

4.3 Hyperspectral Gas Plumn Detection and Classification . . . . . . . . 64

4.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65



4.4 Robust Noise Speech Recognition . . . . . . . . . . . . . . . . . . . . 72

4.4.1 Introduction of Sparsity-based Speech Recognition . . . . . . . 73



4.5 Video-based Facial Expression Recognition . . . . . . . . . . . . . . . 81

4.5.1 Motivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82



5 Multi-sensor Classification via Sparsity-based Representation with

Low-rank Interference 89

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

5.2 Multi-sensor Classification via Sparsity Models . . . . . . . . . . . . . 93

ix

CONTENTS

5.2.1 Multi-sensor Joint-Sparse Representation with Low-rank In-

terference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

5.2.2 Multi-sensor Group-Joint-Sparse Representation with Low-

rank Interference . . . . . . . . . . . . . . . . . . . . . . . . . 98

5.3 Multi-sensor Kernel Model . . . . . . . . . . . . . . . . . . . . . . . . 100

5.3.1 Background on Kernel Sparse Representation . . . . . . . . . 100

5.3.2 Multi-sensor Kernel Group-Joint Sparse Representation with

Low-rank Interference. . . . . . . . . . . . . . . . . . . . . . . 102

5.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

5.4.1 Experimental Setups . . . . . . . . . . . . . . . . . . . . . . . 104

5.4.2 Comparison Methods . . . . . . . . . . . . . . . . . . . . . . . 106

5.4.3 Classification Results and Analysis . . . . . . . . . . . . . . . 109

5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

6 Conclusions 119

Bibliography 121

Vita 136

x

List of Tables

4.1 RFI suppression comparison with side-looking mono-static simulationdata. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

4.2 RFI suppression comparison with forward-looking ARL UWB MIMOreal data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

4.3 Overall recognition rates from four hyperspectral video test sequences’AA12 ’, ’R134a6’, ’SF6 27 ’, and ’TEP 9 ’. . . . . . . . . . . . . . . . 70

4.4 Confusion matrix for SRC based emotion recognition with neutralfaces explicitly provided . . . . . . . . . . . . . . . . . . . . . . . . . 86

4.5 Confusion matrix for SR+L based emotion recognition without know-ing neutral faces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

4.6 Confusion matrix for GSR+L based emotion recognition without know-ing neutral faces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

5.1 Total amount of data collected in two days. . . . . . . . . . . . . . . 1055.2 List of sensor combinations. . . . . . . . . . . . . . . . . . . . . . . . 1105.3 Summarized classification results of single sensor sets, multiple sensor

sets, and combining all sets. . . . . . . . . . . . . . . . . . . . . . . . 1145.4 Classification results of set 15 (all-inclusive sensors). . . . . . . . . . 115

xi

List of Figures

1.1 A general multi-sensor problem with unknown low-rank interference . 6

3.1 Sparse representation with low-rank interference model . . . . . . . . 343.2 Joint sparse representation with low-rank interference model . . . . . 363.3 Group sparse representation with low-rank interference model . . . . 37

4.1 ARL UWB MIMO forward-looking SAR system . . . . . . . . . . . . 534.2 Singular values of RFI component . . . . . . . . . . . . . . . . . . . . 574.3 Comparison of RFI suppression performances with side-looking sim-

ulated data when RFI power is 5 times that of SAR signals . . . . . . 604.4 Comparison of RFI suppression performances with ARL UWB forward-

looking real-world data when RFI power is twice that of SAR signals 624.5 Zoom-in portions of SAR images shown in Fig. 4.4 within the region

of interest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 634.6 Low-rank and joint sparse representation construction in a hyperspec-

tral frame . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 674.7 Chemical detection performance from a frame of ”SF6 27” sequence . 704.8 Chemical detection performance from a frame of ”TEP 9” sequence . 714.9 Comparison of digit speech recognition results - test set A. . . . . . . 774.10 Comparison of digit speech recognition results - test set B. . . . . . . 784.11 Decomposition results of GSR+L+ on the MFC coefficient for a

speech test of number 7 corrupted via car engine noise at SNR=-5dB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

4.12 Decomposition results of GSR+L on the MFC coefficient for a speechtest of number 5 corrupted via vent wind noise at SNR=-10dB . . . . 80

4.13 Separations of neutral faces and expression components . . . . . . . . 84

5.1 Multi-sensor sample construction. . . . . . . . . . . . . . . . . . . . . 945.2 Four acoustic sensors, three seismic sensor, one PIR sensor and one

ultrasound sensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

xii

LIST OF FIGURES

5.3 Signal segments of length 30000 samples captured by all the availablesensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

5.4 Comparison of classification results - DEC09 as test data. . . . . . . . 1125.5 Comparison of classification results - DEC10 as test data. . . . . . . . 113

xiii

Chapter 1

Introduction

1.1 Motivations

Nowadays, many modern applications in signal and image processing, machine

learning, computer vision or pattern recognition involve simultaneous representa-

tions of multiple correlated signals [1–5]. These applications normally face the sce-

nario where data sampling is performed simultaneously from multiple co-located

sources (such as channels or sensors), yet within a small spatio-temporal neighbor-

hood, recording the same physical event. This multi-measurement data collection

scenario allows exploitation of complementary features within the related signal

sources to improve the resulting signal representation and guide successful decision-

making. Similar signals being recorded by multiple cameras/sensors in a room,

same objects performing on consecutive frames of a video sequences, or analogous

1

CHAPTER 1. INTRODUCTION

scenes being observed across multiple electromagnetic spectrum in hyperspectral

remote sensing are just some examples that can be benefited from simultaneously

processing data in batch.

One powerful tool to efficiently incorporate simultaneous signal representations

is sparse decomposition and sparse representation [6]. A sparse representation is

mainly based on the observation that signals of interest are inherently sparse in

certain bases or dictionaries where they can be approximately represented by only

a few significant components carrying the most relevant information [7,8]. A sparse

representation not only provides better signal compression for bandwidth/storage

efficiency but also leads to faster processing algorithms as well as more effective

signal separation for detection, classification and recognition purposes because it

focuses on the most intrinsic property of the data. Furthermore, sparse signal rep-

resentation allows us to capture the hidden simplified structure often present in the

data jungle, and thus minimizes the harmful effects of noise in practical settings.

Recently, with the emergence of the compressed sensing (CS) framework [7, 8],

sparse representation and related optimization problems involving sparsity as a

prior called sparse recovery have increasingly attracted the interest of researchers in

various diverse disciplines, from signal processing to pattern recognition, machine

learning or computer vision. Remarkably, in many of these fields, sparsity-based

techniques arguably achieve state-of-the-art results. While we enjoy the benefits

that sparsity-based techniques bring in many application domains, one of the rais-

2


ing questions is how to even make these models more robust, especially in multi-

measurement representation settings. Particularly, there are two aspects that need

to be deliberately considered:

Incorporating structured priors : Signals recorded in a small area within a

short time frame often exhibit a high level of joint structure and rich mutual cor-

relation. When performing different observation signals simultaneously via sparsity

models, various structures of the nonzero coefficients (also called sparse supports)

among multiple input samples can be incorporated to further enhance the repre-

sentations of signals, thus improve the reconstruction quality or boost the overall

classification rates.

Dealing with interference : Real-world data is often incomplete, missing,

corrupted, and even contradictory due to various sources of interference, from noisy

operating environments to adversary tampering and sensor failure. These problems

severely hamper the effectiveness of classical tools such as the method of least squares

or principal component analysis (PCA) and they inspire the recent development of

robust PCA (RPCA) [9], which has the much improved capability to handle large,

yet sparse, outliers/corruptions.

While the sparsity society have seen great advances in developing algorithms

towards these two fundamental tasks, the applicability of current approaches are

limited due to several critical drawbacks: (i) the lack of capability to effectively

deal with large noise/interference; (ii) most existing methods require to know the

3


information of the interference in prior, while in real-world problems such knowl-

edge is often unknown; and (iii) rich spatio-temporal correlation structure often

existing in natural signals has not been fully incorporated. To further address these

current short-comings, this dissertation presents new theoretical ideas and

mathematical frameworks on structured sparse data representation of

multiple measurements in the presence of low-rank interference . Fur-

thermore, we develop computationally efficient algorithms, and extensively validate

experiments on various already-collected data sets where signals contain missing or

inaccurate parts but exhibit a high level of correlation.

1.2 Main Contributions

In this section, we briefly describe the main contributions of this dissertation.

Simultaneous Structured Sparsity and Low-rank Models

The main goal of this thesis is to develop a general novel framework that is capa-

ble of extracting the low-rank interference while simultaneously promoting sparsity-

based representations of multiple correlated signals [10]. This provides an efficient

approach for the representation of multiple measurements where the underlying sig-

nals exhibit a structured sparsity representation over some proper dictionaries but

the set of testing samples are corrupted by the interference from external sources.

4


Under the assumption that the interference component forms a low-rank structure,

the proposed algorithms minimize the nuclear norm of the interference to exclude

it from the representation of multivariate sparse representation. Put it differently,

this thesis investigates the problem of effective structural sparse data representation

even in the presence of large but correlated noise/interference. The main associated

model is:

YYY = DDDAAA+LLL+NNN (1.1)

where YYY is the set of correlated data observations; DDD is the given sparsifying dic-

tionary; AAA contains the sparse coefficients with certain sparsity structure; LLL is the

low-rank interference (LRI); and finally, NNN is the low-energy common dense noise

due to the imperfection of the test sample. In other words, we seek to search for the

low-rank interference LLL and simultaneously track various structures in the sparse

code AAA.

Fig. 1.1 illustrates a general practical system that can be potentially benefited

from the proposed framework. This is a typical multi-sensor setup when multiple

co-located sources/sensors simultaneously record the same physical events. It is

very often that sensors are interfered by unknown external signals or noises during

the recording process; thus the recorded measurements capture not only the signals

of interest but also the undesired interferences that may even dominate the main

signals, making the whole observation severely corrupted. In a multi-model setting,

sensor co-location normally ensures that interference/noise patterns are very simi-

5


Figure 1.1: A general multi-sensor problem with unknown low-rank interference.

lar, hence justifying the low-rank assumption. We are able to capture column-sparse

(e.g., sensor failure), row-sparse (e.g., adversary jamming) gross corruptions, and/or

dense background noises with large magnitudes (radio-frequency interference from

broadcasting stations and wireless systems, wind noise in environmental monitoring,

and any pattern noise that remains stationary during the data collection process).

An optimization framework that can capture the sparsest structured representation

(or even the label) of the signal while simultaneously discarding the noisy interfer-

ence is highly desirable.

For the sparse code AAA, we not only seek for the sparsest solution but also enforce

the support sets of the coefficient vectors (represented as columns of AAA) to perform

6


various structured priors. Specifically, we consider three circumstances promoting

structures on AAA: element-wise sparse, row-sparse, and group-sparse regularizations,

from which three corresponding models are proposed: sparse representation with

LRI (SR+L), joint sparse representation with LRI (JSR+L), and group sparse rep-

resentation with LRI (GSR+L). Detailed description as well as applicability analysis

of each model will be presented in chapter 3.

Adaptive ADMM-based Algorithm

We propose a fast and efficient algorithm to solve for the three proposed mod-

els of simultaneously optimizing for both sparse structure and low-rank constraints.

The requirements to concurrently seek for multiple variables as well as carry out

multiplex regularization constrains complicate the optimization process. Therefore,

our algorithm, while still based on the classical alternating direction methods of mul-

tipliers (ADMM) [11], use a Taylor expansion approximation to relieve the burden

of dictionary transform. Consequently, we are able to reduce complex optimizations

to a simple closed-form solution for each iterative variable in one iteration step, yet

conceivably lower the overall computational complexity. It should be noted that our

approach is different from most of other existing methods which are heavily depen-

dent on the variable splitting technique [12] when solving for complicated `1-related

norm [13] and nuclear norm minimizations [14,15]. Moreover, a theorem is outlined

and its detail proof is provided; which guarantee the proposed algorithm to converge

7


to the global optimal solution.

Applications on Various Problems and Diversified Data Sets

We extensively apply our proposed models on a number of practical problems

that potentially involve simultaneous minimizations of structured sparsity on sig-

nal representation and low-rank on interference component. Particularly, we explore

four novel methods solving four critical real-world problems focusing on both classifi-

cation and reconstruction tasks: (i) a robust framework for separation and extraction

of unpredicted radio-frequency interference (RFI) from raw synthetic aperture radar

(SAR) signals [16]; (ii) a chemical gas plume detection and classification algorithm

in hyperspectral sequence with unknown dominant background content [10]; (iii)

a noise robust automatic speech recognition algorithm that is adaptive with noise

sources and performs well under immense noises [10]; and (iv) a novel video-based

facial expression recognition method without requiring knowledge of the neutral face

content [17]. We compare our algorithms with the state-of-the-art conventional as

well as other sparsity-based methods. The empirical results show that our proposed

models outperform most of other competing methods or at least demonstrate com-

parable performance. Furthermore, in most of experimental settings, the results

from our framework are achieved when using less information of the input sam-

ples (only dictionary of the main signal is required) while those from competing

techniques require both dictionaries of the signal of interest and interference con-

8


tent. This further reinforces that the linear decomposition of a supervised sparse

signal representation and a low-rank interference component is a critical problem

and reserves extensive studies.

Multi-sensor for Classification Models

One more contribution in this dissertation is the extension of the general frame-

work into multi-sensor models for classification. By multi-sensor, we refer to the

problem associated with systems composing of both sensors of the same signal type

(homogeneous sensors) and sensors of different signal types (heterogeneous sensors).

Typically, it is demanded that highly-correlated mutual information from multiple,

yet co-located, sources/sensors are appropriately fused to improve the overall detec-

tion/classification accuracy.

We study a variety of novel sparsity-regularized regression methods, commonly

categorized as collaborative multi-sensor sparse representation for classification,

which effectively incorporate simultaneous structured-sparsity constraints, demon-

strated via a row-sparse and/or block-sparse coefficient matrix, both within each

sensor and across multiple sensors [18, 19]. Furthermore, we robustify our mod-

els to deal with the presence of low-rank signal-interference/noise. The low-rank

assumption on interference is appropriate for multi-sensor datasets since the sen-

sors are spatially co-located and data samples are temporally recorded, thus any

interference from external sources will have similar effect on all the multiple sensor

9


measurements.

We further extend our frameworks to kernelized models which rely on sparsely

representing a test sample in terms of all the training samples in a feature space

induced by a kernel function. The kernel representation has been proved to yield a

significant improvement in discriminative tasks in many data sets since the kernel-

based methods implicitly exploit the higher-order non-linear structure of the testing

data which may not be linearly separable in the original space [20,21]. We analyti-

cally investigate and empirically testify that the low-rank assumption on interfered

signals is still valid even after a nonlinear kernel transform. The advantages and

disadvantages of all explored models are discussed in detail for a multi-sensor border

patrol classification problem where the goal is to detect whether the event involves

human or human leading animals footsteps.

1.3 Outline

The remainder of this thesis is organized as follows. Chapter 2 introduces the

necessary background information to understand this work. We briefly review the

sparsity theory for recovering sparse representations of vectors in a given dictionary

as well as structured sparsity constraints of matrices when multiple measurements

are jointly represented. In this chapter, we also review some of low-rank approxima-

tion models, such as matrix completion or RPCA, which provide a robust alternative

10


framework to approximate low-dimensional structures from high-dimensional obser-

vations. Chapter 3 focuses on developing various proposed sparsity models based on

different assumptions of the structures of coefficient vectors and low-rank noise/in-

terference. A fast and efficient algorithm based on ADMM approach to solve the

convex optimization problems arisen from these models and the guarantee analysis

on its convergence to optimal solution are also outlined in this chapter. In chap-

ter 4, extensive experiments are conducted on four practical applications: synthetic

aperture radar image recovery, hyperspectral chemical plume detection and clas-

sification, robust speech recognition in noisy environments, and video-based facial

expression recognition to verify the methods’ effectiveness. In Chapter 5, we present

our generalized works on multi-sensor models. An extension model based on non-

linear kernel sparse is also provided and experimental results on the application of

automatic multi-sensor border patrol control are conducted. Finally, we summarize

the conclusions of this dissertation in Chapter 6.

11

Chapter 2

Background Review

In this chapter, we review some main theoretical results of the sparse represen-

tation theory and low-rank models that are related to this thesis. We first intro-

duce the notations and terminologies used throughout the thesis. We then present

fundamental concepts on sparsity of signals for single-measurement and multiple-

measurement cases. Low-rank matrix approximation techniques are introduced after

that and some specific low-rank models such as matrix completion and robust PCA

are presented.

2.1 Notations

The following notational conventions will be used throughout this thesis. We

denote vectors by boldface lowercase letters, such as xxx, and denote matrices by

12

CHAPTER 2. BACKGROUND REVIEW

boldface uppercase letters, such as XXX. For a matrix XXX, XXX i,j represents the element

at row ith and column jth of XXX while a bold lower-case letter with subscript, such

as xxxj, represents its jth column. The `q-norm of a vector xxx ∈ RP is defined as

‖xxx‖q = (∑P

i=1 |xi|q)1/q where xi is the ith element of xxx. For the specific case when q

= 0, the `0-norm of xxx, denoted ‖xxx‖0 is defined as the number of nonzero elements

in xxx. Given a matrix XXX ∈ RP×K , ‖XXX‖F , ‖XXX‖1,q, and ‖XXX‖∗ are used to defined

its Frobenious norm, mixed `1,q-norm and nuclear-norm, respectively. Operators

rank, dim, trace, (.)T denotes a rank, dimension, trace and matrix transposition,

respectively.

2.2 Sparse Signal Representation

Sparse signal recovery has been rigorously studied over the past few years as

a revolutionary signal sampling paradigm and drawn increasing attention in many

areas such as signal and image processing, computer vision, machine learning, and

control theory (see e.g., [7,8,22,23] and the references therein). According to sparse

signal recovery theory, an unknown signal aaa ∈ RP in the linear representation of a

dictionary matrixDDD ∈ RN×P (P > N) can be faithfully recovered from the measure-

ments yyy ∈ RN if aaa is sparse, i.e., it contains significantly fewer measurements than

the ambient dimension of the signal. More formally, consider an under-determined

system of linear equations yyy = DDDaaa where the dictionary DDD has more columns than

13


rows, hence promoting infinitely solutions for yyy. The reconstruction of the spars-

est solution aaa given signal yyy can be casted as the following sparsity-driven inverse

problem

minaaa

‖aaa‖0

s.t. yyy = DDDaaa,

(2.1)

where the `0-norm of aaa, denoted by ‖aaa‖0, is defined as the number of nonzero entries

of aaa (also called the sparsity level of the vector aaa).

While finding the sparsest solution of a given signal using the above minimization

is in general NP-hard [24], the pioneering work of Donoho [25] and Candes et. al.

[26] showed that, under some mild conditions, however, the `0-norm can be efficiently

solved by recasting it as a convex `1-based linear programming problem

minaaa

‖aaa‖1

s.t. yyy = DDDaaa,

(2.2)

where the `1-norm is defined as ‖aaa‖1 =∑P

i=1 |ai| with ai’s being the entries of

aaa. There has been a number of researches investigating conditions under which

the `0-norm and `1-norm minimizations are equivalent., in which mutual coherence

[27, 28] and restricted isometry property (RIP) [26, 29] conditions are among the

most well-known ones.

Mutual coherence condition. The mutual coherence of the dictionary DDD is

defined as

µ(DDD) , maxi6=j

|dddTi dddj|‖dddi‖2 ‖dddj‖2

. (2.3)

14


In [27] and [28] it is shown that if aaa is a solution of the under-determined system of

linear equations yyy = DDDaaa and the following sufficient condition holds

(2 ‖aaa‖0 − 1)µ(DDD) < 1, (2.4)

then the optimization programs (2.1) and (2.2) are equivalent and the convex relax-

ation (2.2) yields a unique sparsest solution as `0-norm.

Restricted isometry property. An alternative sufficient condition to guar-

antee the equivalence between (2.1) and (2.2) is the so-called restricted isometry

property (RIP). A matrix DDD has the RIP with a constant δk if δk is the smallest

constant that satisfies

(1− δk) ‖www‖22 ≤ ‖DDDaaa‖

22 ≤ (1 + δk) ‖www‖2

2 , (2.5)

for every www ∈ RP be a k-sparse vector (i.e., it has at most k nonzero elements).

In [29], it is shown that if δ2k ≤√

2 − 1, then `0-norm and `1-norm solutions are

equivalent. This bound has been further improved; for example [30] shows that if

δ2k < 0.4652, then we got the same unique solution via solving (2.2) instead of (2.1).

2.3 Sparse Representation for Classifica-

tion

Although the concept of sparsity was first employed to solve inverse recovery

problems, where it acts as a strong prior to the abbreviated ill-posed nature of the

15


problems, it did not take long for researchers to figure out that sparse representation

is also useful in discriminative applications as well [23,31,32] . Here the crucial obser-

vation is that the test sample can be represented effectively as a linear combination

of a few training samples in the same class, but not in the others. Therefore, the

sparse coefficient vector, which can be recovered via either `0-based greedy pursuit

methods such as orthogonal matching pursuit (OMP) [33] and subspace pursuit (SP)

[34], or `1-based convex programming problems such as iterative hard thresholding

algorithm (IHT) [35], can naturally be considered as the discriminative factor.

Recently, a well-known sparse representation-based classification (SRC) frame-

work was proposed in [32]; which is based on the assumption that all of the samples

belonging to the same class lie approximately in the same low-dimensional subspace.

This technique was first proposed for robust face recognition which yields remarkably

improvement over conventional algorithms under various distortion scenarios, in-

cluding illumination, disguise, occlusion, and random pixel corruption. Suppose we

are given a dictionary representing C distinct classes DDD = [DDD1,DDD2, ...,DDDC ] ∈ RN×P ,

where N is the feature dimension of each sample and the c-th class sub-dictionary

DDDc has Pc training samples dddc,pp=1,...,Pc , resulting the total samples of P =∑C

c=1 Pc

in the dictionary DDD. To label a test sample yyy ∈ RN , it is often assumed yyy can be

represented by a subset of the training samples in DDD. Mathematically, yyy is written

as

16


yyy = [DDD1,DDD2, ...,DDDC ]

aaa1

aaa2

...

aaaC

+ nnn = DDDaaa+ nnn, (2.6)

where aaa ∈ RP is the unknown coefficient vector and nnn is the low-energy noise

due to the imperfection of the test sample. Our assumption implies that only a

few coefficients of aaa are non-zeros and most of the others are insignificant. More

particularly, only entries of aaa that are associated with the class of the test sample

yyy are non-zeros, and thus, aaa is a sparse vector encoding the membership of yyy. The

classifier seeks the sparsest representation aaa by solving the `1-norm minimizations

(2.2). Once the coefficient vector aaa is obtained, the next step is to assign the test

sample yyy to a class label. This can be determined by simply taking the minimal

residual between aaa and its approximation from each class sub-dictionary

Class(yyy) = argminc=1,...,C

‖yyy −DDDcaaac‖2 , (2.7)

where aaac is the induced vector by keeping only the coefficients corresponding to the

c-th class in aaa. This step can be interpreted as assigning the class label of yyy to the

class that can best represent yyy.

Robustness to Outliers

SRC has been widely proved to outperform conventional classification algorithms

in many practical problems. Furthermore, it has the robustness to the existing of

17


severe occlusion or corruption by casting it as a sparse error vector eee with only a

few non-zero entries which may have arbitrarily large magnitudes. The corrupted

measurement yyy can be written as the summation of the clean signal and the error eee:

yyy = DDDaaa+ eee =

[DDD III

]︸︷︷︸

DDD

aaa

eee

︸︷︷︸

aaa

= DDDaaa (2.8)

To recover aaa (as well as the noise eee), the authors in [32] propose to simultaneously

minimize the `1-norm of both aaa and eee. This sparse recovery strategy is often referred

to as the extended `1-minimization or dense error correction [36]:

minaaa,eee

‖aaa‖1 + ‖eee‖1 s.t. yyy = DDDaaa+ eee. (2.9)

2.4 Multi-measurement Sparse Represen-

tation

In practice, many applications involve simultaneous representation of multiple

correlated signals, in which the particularly interested case is where data sensing

is performed simultaneously from multiple co-located sources/sensors, yet within

the same spatio-temporal neighborhood, recording the same physical events. In

the case of multiple measurements, rather than recovering each single sparse vec-

tor aaai (i = 1 , 2 , ..., K) independently, the inter-correlation between observations

18


in the sparse representation procedure can be further reinforced by concatenat-

ing the set of measurements YYY = [yyy1, yyy2, ..., yyyK ] ∈∈∈ RN×K and sparse vectors

AAA = [aaa1, aaa2, ..., aaaK ] ∈∈∈ RP×K and representing in the combined matrix manner

YYY = DDDAAA. This matrix representation not only can simultaneously recover the set of

sparse coefficient vectors aaai1≤i≤K but also brings another layer of robustness by

exploiting the prior-known structure of sparse supports among all testing samples.

2.4.1 Joint Sparse Representation

Joint sparse representation (JSR) which assumes the fact that multiple mea-

surements belonging to the same class can be simultaneously represented by a few

common training samples in the dictionaries has been successfully applied in many

applications, such as hyperspectral target detection [37, 38], acoustic signal classi-

fication [39], or visual data classification [40]. In the joint sparsity model, sparse

coefficient vectors aaaiKi=1 share the same support set, and thus, matrix AAA is a row-

sparse matrix with only a small number of non-zero rows. The sparse coefficient

vectors can be recovered jointly by solving the following `1,q-regularized minimiza-

tion:minAAA

‖AAA‖1,q

s.t. YYY = DDDAAA,

(2.10)

where the norm ‖AAA‖1,q, (q > 1), is defined as the sum of the `q-norm of rows of AAA .

In other words, this norm can be phrased as performing an `q-norm on each row to

19


enforce the ’joint’ and then `1-norm on the resulting vector to enforce the ’sparsity’.

It is clear that this `1,q regularization norm encourages shared sparsity patterns

across related observations, and thus, the solution of the optimization (2.10) has

common support at column level.

2.4.2 Group Sparse Representation

Adding group or class information is another common way to promote structure

within sparse supports by enforcing them to share common groups instead of rows.

This problem is critically beneficial for classification tasks where multiple measure-

ments not necessarily represent the same signals but rather come from the same set

of classes. This leads to group sparse representation where the dictionary atoms are

grouped and the sparse coefficients are enforced to have only a few active groups

at a time. Another factor that needs consideration is that although the multiple

signals have common active groups because they are of the same class, they are not

necessarily share the full sets, since they are not signals of the same event. This

means that the desired multi-task classification model not only wish the number of

active groups to be small, but also inside each group only a few members are active

at a time, resulting a two-level of sparsity model: group-sparse and sparse within

group.

For this problem, the group information of classification tasks needs to be given in

20


priority in the form of sub-dictionaries. To be more specific, the dictionary DDD is the

concatenation of sub-dictionariesDDD = [DDD1, DDD2, ..., DDDC ] where C is the total number

of groups or classes, and DDDc(c = 1 , 2 , ..., C) is a sub-dictionary corresponding to

the group c with column-size Pc. To promote group structure and sparsity in the

support sets simultaneously, collaborative hierarchical sparse representation (CHiSR

or C-HiLasso) [41] model is proposed as follow

minAAA

‖AAA‖1 + λG

C∑c=1

‖AAAc‖F

s.t. YYY = DDDAAA,

(2.11)

where AAAc is the sub-matrix extracted from AAA using the rows indexed by group c,

‖.‖F denotes the Frobenious norm of a matrix and λG is a parameter balancing

between two terms. While the first term ‖AAA‖1 encourages element-wise sparsity in

general, the second term is a group regularizer that tends to minimize the number

of active groups. It is noted that taking the Frobenious norm of a matrix is equal to

vectorizing that matrix and taking the `2-norm on the resulting vector. Therefore,

the group regularizer has similar effect as the `1,2-norm constrain appeared in the

JSR model but has a group-sparse instead of row-sparse property. In succession,

the minimization of both `1-norm and group regularizer in a combined cost function

promotes group-sparse and sparsity within group at the same time.

21


2.5 Low-rank Matrix Approximations

Low-rank matrix approximation is an efficient way of representing the signal

sparsity in the principal component domain. This is an intimate connection to

sparse signal representation theory and provides a robust alternative framework to

recover low-dimensional structures from high-dimensional observations, especially

for scenarios where the data is highly incomplete or severely damaged. In the most

general form, this problem consists of recovering a low-rank matrix X ∈X ∈X ∈ RN1×N2

from a set of M linear measurements: yyy = A(XXX) where A : RN1×N2 → RM is a

linear map. In order to reconstruct XXX, one would like to find the simplest model

that fits the low-rank observations

minXXX

Rank(XXX)

s.t. yyy = A(XXX).

(2.12)

Similar to the sparse representation case, the intractable and NP-hard optimiza-

tion problem of rank-minimization can be relaxed to a convex problem using a

nuclear norm minimization [42]

minXXX

‖XXX‖∗

s.t. yyy = A(XXX),

(2.13)

where the nuclear matrix norm ‖XXX‖∗ is defined as the sum of all singular values of

the matrix XXX.

Matrix completion [42,43] and robust principal component analysis (robust PCA

22


or RPCA) [7, 9] are two highly applicable low-rank matrix recovery techniques in

which matrix completion retrieves missing information while RPCA recovers an

underlying low-rank structure from its sparse but grossly corrupted entries. These

problems have been beneficial in solving a wide range of applications including

background modeling, target tracking [9], image alignment [44] or video denoising

[45] problems, etc.

2.5.1 Matrix Completion

A highly applicable subset of the general low-rank matrix recovery problem in (2.13)

is the matrix completion problem where the goal is to recover an unknown matrix

from a subset of its entries. Typically, given an incomplete observation matrix

YYY = XXX|Ω, where Ω is the index set of available entries of XXX, we want to recover

back the original matrix XXX with the prior knowledge that XXX is low-rank. Here, the

linear map A is an operator that sets unobserved entries to zero. Again, to achieve

XXX, a nuclear norm minimization is proposed as follows

minXXX

‖XXX‖∗

s.t. YYY = XXX|Ω.(2.14)

23


2.5.2 Robust Principal Component Analysis

Another recent landmark result introduced by Candes et. al. [7, 9] investigates

the problem of robust principle component analysis (RPCA). The question is how to

accurately recover a low-rank matrix from its grossly corrupted entries. Mathemati-

cally, letXXX be a low-rank data matrix. It frequently happens that we are not able to

observeXXX directly; instead we observe its corrupted version YYY = XXX+EEE. The matrix

EEE captures outliers, assumed to be sparse but can have arbitrarily large magnitudes.

To separate XXX and EEE, a principle component pursuit strategy is proposed

minX,EX,EX,E

‖XXX‖∗ + λE ‖EEE‖1

s.t. YYY = XXX +EEE,

(2.15)

where λE is a positive weighting parameter of the sparse noise. This problem in

some sense can be viewed as a generalization of matrix completion. In fact, if we

set entries of EEE such that EEEij = −X−X−X ij with (i, j) ∈ ΩC , then RPCA turns into the

matrix completion problem (2.14). However, RPCA is generally more difficult to

solve since it assumes no prior information of the support location of the outlier

entries.

24

Chapter 3

Structured Sparse Representation

with Low-rank Interference

3.1 Problem Formulation

The sparsity-based signal representation has been verified to perform well in

many range of practical applications. However, these results normally look elegant

only when the noise level is low, i.e., the noise power is bounded by some certain

small threshold. Multi-measurement sparse representation models usually have a

better performance in the presence of noise since they allow to incorporate the un-

derlying structure among correlated signals. However, in order for these algorithms

to work, the levels of noise still need to be in much lower scales compared with those

of the main signals.

25

CHAPTER 3. MODELS AND ALGORITHMS

Gross noise and errors are now ubiquitous in many modern applications such

as image/video/signal processing, machine learning or sensor network. In many

situations, the observed measurements capture not only the signals of interest but

also the undesired interferences that can be environmental noise, obstructed signal

from external sources, or intrinsic background information that is always present in

the signal. These interferences can be very large and effect everywhere, i.e., every

column in the measurement matrix is superimposed with some considerable interfer-

ence. In some extreme cases, the interference may even dominate the main signals,

making the whole observation be severely corrupted. Any conventional sparse rep-

resentation method therefore cannot be applied. Instead, an alternative model that

has the capability of efficiently subtracting the interference from the sparsity regu-

larization should be employed. Under the assumption that the interference in every

measurement shares similar structural property hence the whole interference matrix

behaves as a low-rank structure, we propose a robust model that effectively sep-

arates the low-rank interference from the sparse representation. The main signal

representation model is

YYY = DDDAAA+LLL+NNN, (3.1)

where YYY ∈ RN×K is the set of correlated data observations; DDD ∈ RN×P is the

sparsifying dictionary; AAA ∈ RP×K contains the sparse codes with certain sparsity

structure; LLL ∈ RN×K is the low-rank and/or sparse interference; and finally, NNN is

the low-energy common dense noise. Normally NNN has little effects on the signal

26


representation. For simplicity, in this thesis, the presence of NNN will be omitted from

all model descriptions, though it is still counted via the fidelity constraints penalized

by a Frobenious norm in the optimization process. The problem can be rephrased

as: given a training dictionary DDD ∈ RN×P and the observations YYY ∈ RN×K where

YYY = LLL + DDDAAA, we want to recover both AAA and LLL simultaneously. The matrix LLL

captures interference with the prior knowledge that LLL is low-rank, while AAA is a

sparsity coefficient matrix that promote the correlating structure among multiple

related measurements.

To separate LLL and DDDAAA, we propose a general model that simultaneously fits the

low-rank approximation and structure sparse regularizer at the same time:

minA,LA,LA,L

FS(AAA) + λL ‖LLL‖∗

s.t. YYY = DDDAAA+LLL,

(3.2)

where the nuclear matrix norm ‖LLL‖∗, defined as the sum of all singular values of the

matrix LLL: ‖LLL‖∗.=∑

i σi(LLL), is a convex-relaxed surrogate of the rank [9], FS(AAA) is

a convex structured sparsity-promoting penalty of AAA, and λL is a trade-off positive

weighting parameter to balance the two terms.

Our proposed model can also be viewed as the problem of decomposing a matrix

YYY into two factors: the sparse representation DDDAAA and the low-rank component LLL.

The first factor is assumed to have some prior knowledge given in advance and is

effectively described via the signal dictionaryDDD. Furthermore, this signal representa-

27


tion may reveal sparsity structures among multiple sparse coefficient vectors present

as columns of AAA. The second factor, while also staying in some low-dimensional sub-

spaces, does not have any prior signal information except low-rank property. Put

it differently, model (3.2) solves for the decomposition of a supervised sparse repre-

sentation DDDAAA and an un-supervised low-dimensional subspace LLL.

It is worth mentioning that if the information of the second component can be

learned, i.e., LLL can be further factored into LLL = DDDLAAAL with some given dictio-

nary DDDL, the observation matrix can be better expressed as YYY = DDDAAA +DDDLAAAL =[DDD DDDL

] AAA

AAAL

. The problem is then back to the standard sparse representa-

tion model with concatenated dictionary

[DDD DDDL

]and concatenated coefficient

matrix

AAA

AAAL

. Our model, however, works in a much more general case when

the second component is only learned to carry correlation among all measurements

and but no more signal information is provided (i.e., the interference dictionary is

unknown). Consequently, a low-rank minimization constraint is expected to obtain

this component.

Robustness to Outlier:

The joined model in (3.2) has the capability to extract a low-rank approximation

in LLL while promote structural sparsity in AAA at the same time. Moreover, it is

28


inherently robust in the case of presenting outliers in the data samples appearing

as whole columns (full measurement corruption such as sensor failures) or whole

rows corruption (corruptions at certain frequency bands or data samples). In other

words, if a small fraction of the measurements yyyiKi=1in YYY are grossly corrupted

while the others are clean, YYY can be further decomposed into YYY = DDDAAA + LLL + EEE,

where EEE contains a small number of non-zero columns. Since EEE can also be viewed

as a low-rank matrix, the summation LLL = LLL +EEE of the true low-rank interference

LLL and the outlier component EEE is also low-rank. Therefore, we can model the

new problem to seek for a low-rank component LLL and a structured-sparse matrix AAA

simultaneously. If the rank of LLL as well as the number of non-zero columns in EEE

are small enough, hence the rank of LLL is also low, the model will still well capture

both the wide-spread low-rank interference and outlier samples in LLL along with the

signal representation in DDDAAA.

3.2 Motivational Applications

Before going into model detail, we briefly present several potential applications

and show how well they can fit into our underlying models of interest (3.2). There

are many practical applications in which the signal observations can be decomposed

as a linear summation of a sparse signal representation and a low-rank interference

component. Specifically for the low-rank term, there are two main important sce-

29


narios when the interfered signal can be characterized as having low-rank property:

low-rank background and low-rank external signal interference.

Low-rank background: In many situations, the observation signals are not

standing alone. The recorded data may contain not only the signal of interest but

also the underlying background information. In some cases, the background content

is not easily extracted and can be intrinsically present in the observation signals all

the time. The background component of signals recorded by various sensors within

a small local area in a short span of time, however, should be stationary, hence

raising a low-rank background interference.

Low-rank interference signal: Another common scheme that happens in

many applications is when there are external sources interfering with the data

recording process. Since multiple observations are collected via recording the same

physical events in a small neighborhood area and within a small time frame, simi-

lar interference sources are picked up across all measurements promoting large but

low-rank corruption. These interference sources may include sound and vibration

from a car passing by, noise from a machine working nearby, or interference from

any radio-frequency source.

We now outlines some practical problems that have direct benefits from our

underlying models. The first problem is the radio-frequency interference (RFI)

suppression in ultra-wideband (UWB) radar systems. In a radar communication,

the received signals can be interfered by radio frequency radiations emitted from

30


random external sources. These RFI sources pose critical challenges for UWB radar

systems since (i) RFI often occupies a wide range of the radars operating frequency

spectrum; (ii) RFI might have significant power; and (iii) RFI signals are difficult

to predict and model due to the non-stationary nature as well as the complexity

of various communication devices. RFI signals, however, normally preserve a high

degree of correlation, hence can be considered as a low-rank interference component.

Consequently, a joint sparse and low-rank model has the capability for the separation

and then suppression of RFI signals from UWB radar data via modeling RFI as low-

rank components and main radar signals as a sparsity over a dictionary transform

in a joint optimization framework.

Another promising application in pattern recognition is when the underlying

structure of a tested sample is the combination of both target and background

contents. One particular data example is hyperspectral imagery where a testing pixel

should stay in a low-dimensional subspace of target and background dictionaries and

can be represented as a linear combination of a few training samples from the union

of these sets. If the background information is not well trained, i.e., the background

dictionary is not provided, a model that can automatically subtract the background,

which is potentially a low-rank structure, is necessary and a decomposition model

of sparse representation and low-rank component is well desired in this application.

One more interesting application is the problem of singer identification. The

audio signal from a song is the combination of both music and singing-voice com-

31


ponents. Due to the repetitious nature of musical accompaniment, the music com-

ponent is low-rank while the vocal component from a singer exhibits high temporal

correlation along the performance. Thus, a simultaneous low-rank and joint sparse

representation shows a potential way in recognizing the singing voice from its music

accompaniment. Some other potential applications include video-based facial ex-

pression recognition, noise robust speech recognition, or neural signal modeling in

the presence of very large noise or accidental interference. Those are just some crit-

ical real-world problems that involve decomposing a data matrix into a structural

sparse coding and a low-rank interference matrix, some of which will be extensively

applied and introduced in detail in the next chapter while the others will be inves-

tigated in future works.

3.3 Simultaneous Low-rank and Sparse

Representation Models

In the previous section, we introduced the general model of simultaneously de-

composing a matrix into low-rank and sparse representation components, in which

the general regularizer FS(AAA) in (3.2) captures sparsity property among the support

sets of the coefficient matrix AAA. In this section, we consider three circumstances of

FS enforcing on AAA: element-wise sparse , row-sparse, and hierarchical group-sparse

32


regularizations.

3.3.1 Sparse Representation with Low-rank In-

terference

In the first case, FS(AAA) penalizes as an `1 matrix-norm which purely promotes the

sparsity inAAA. This normally happens when every measurement vector is a separated

sparse representation but all of the measurements are effected by similar noises or

external source signals. Consequently, we enforce an overall sparsity in AAA, but do

not explore any structure among non-zero coefficients. Moreover, we normally use a

simple `1 matrix-norm in the case the structure among sparse supports of coefficient

vectors (columns of AAA) does not follow a simple arrangement (like row or group

properties), hence the sparsity structure is not easily modeled and optimized. The

sparse representation with low-rank interference model (SR+L - demonstrated in

Fig. 3.1) is proposed as follow:

minA,LA,LA,L

‖AAA‖1 + λL ‖LLL‖∗

s.t. YYY = DDDAAA+LLL.

(3.3)

It is noted that at first sight, (3.3) looks similar to robust PCA [9] that decom-

poses a matrix into a low-rank and a sparse matrix. However, the natures of the

two problems are different, both objectively and applicatively. Essentially, robust

33


Figure 3.1: Sparse representation with low-rank interference model.

PCA focuses on separating a sparse matrix from a low-rank component, hence the

sparse component normally embodies unpredicted structures and is regularized by

an `1-norm minimization. We, however, address in the decomposition of a low-rank

component from a sparsity over a dictionary transform. In other words, (3.3) is a

more general model than robust PCA and if we set DDD be an identity matrix then

(3.3) turns into the robust PCA problem. On the contrary, while we enjoy the

additional benefit of the sparsifying signal dictionary DDD, the optimization becomes

more complex and more restricted conditions are required to accomplish an accurate

decomposition. It can be predicted that the key to successful noise-source separa-

tion here depends heavily on the incoherence level between atoms in the training

dictionary and the subspace basis of the low-rank component.

On a different note, it is worthwhile to mention that while sparsity-based repre-

sentation with sparse noise has been an active research area in recent years [32,36],

very few works have explored the sparsity dictionary-based approach with large

and dense noise/interference appearing as low-rank. In [46], a recursive projected

34


compressed sensing method is proposed for the recovery of a sparse representation

from large but correlated dense noise by assuming that both signal and interference

components are changing very slowly over time and relying heavily on projecting

the estimated interference onto the nullspace of the current signal subspace. In [47],

Mardani et. al. propose to learn both low-rank and compressed sparse matrices

simultaneously and apply it to detect anomalies in traffic flows. The model does

not explore the underlying structure among the sparse coefficient vectors and is

developed mainly for a single re-constructive task. Our general structured sparse

representation with LRI framework, on the other hand, is not only able to deal with

low-rank widespread interference but also strengthens the sparse representation of

signals with different collective structures (such as row or group-sparse) among mul-

tiple measurements to solve for both reconstruction and classification problems.

3.3.2 Joint Sparse Representation with Low-rank

Interference

The second case of FS(AAA) is an `1,q matrix-norm (defined as the summation of `2-

norms of rows of AAA) that promotes a row-sparse (so-termed joint sparse) property in

the coefficient matrix AAA. Joint sparse representation has shown its efficiency in the

case measurement samples are recorded within the same spatial-temporal neighbor-

hood, tracing similar objects or events. This commonplace scenario, while revealing

35


Figure 3.2: Joint sparse representation with low-rank interference model.

common sparse supports representing the set of measurements, also ensures that

interference noise patterns are very similar, hence justifying the low-rank property

on the interference matrix. Therefore, we propose a joint sparse representation with

LRI (JSR+L) framework that can efficiently takes into consideration both low-rank

and row-sparse approximations in the same cost function:

minA,LA,LA,L

‖AAA‖1,q + λL ‖LLL‖∗


(3.4)

The `1,q-norm ofAAA, as outlined in section 2.4.1, enforces the sparse coefficient vectors

to share common support sets. This model is visually depicted in Fig. 3.2.

3.3.3 Group Sparse Representation with Low-rank

Interference

The last case that we consider in this section is when FS(AAA) performs as a hier-

archical group-sparse function. Our model robustifies the collaborative hierarchical

36


Figure 3.3: Group sparse representation with low-rank interference model.

Lasso (CHi-Lasso) model [41] with two levels of group sparsity and within-group

sparsity. CHi-Lasso has shown its advantages in solving many application domains

like face recognition, source identification and source separation [41, 48]. However,

how should we deal with the case when all of the measurements are effected by some

external noises , i.e., in the presence of large but low-rank noise. Take the speaker

identification problem as an example: all time frames of the voice signals may con-

tain common noises in the recording process (e.g., airplane or auto cabin). These

noises may be large enough to severely defect the identification process. However,

they normally have pattern, resulting a low-rank noise component in the representa-

tion. A group sparse representation with low-rank interference (GSR+L) framework

(illustrated in Fig. 3.3) is therefore beneficial in this case:

minA,LA,LA,L

‖AAA‖1+ λG

C∑c=1

‖AAAc‖F +λL ‖LLL‖∗


(3.5)

37


where the first two penalizations surrogate the sparsity coefficient matrix AAA to be-

have as a hierarchical group-sparse structure while the nuclear norm is adopted in

the third term to characterize the low-rank interference LLL. The structural property

of this decomposition is visualized in Fig. 3.3.

3.4 Algorithm

3.4.1 ADMM-based Algorithm

In this section, we propose a fast and efficient algorithm to solve for the general

problem (3.2) and specifically outline the detail iterative solution for each model

proposed in section 3.3. Model (3.2) is a convex optimization. However the pres-

ence of multiple variables and regularization constrains complicates the optimization

process. A common way to tackle this problem is based on the variable splitting

technique [12] which decouples each variable into two variables and use the classical

alternating direction methods of multipliers (ADMM) to iteratively solve for mul-

tiple simplified sub-problems. This method has been shown particularly efficient

in solving the `1-norm minimization [13]. Taking the GSR+L model for instance,

38


auxiliary variables are introduced to recast the problem (3.5) into:

minAAA,LLL,GGG,HHH

‖GGG‖1,q + λG

C∑c=1

‖HHHc‖F + λL ‖LLL‖∗

s.t. YYY = DDDAAA+LLL, AAA = GGG, AAA = HHH,

(3.6)

where GGG and HHH are new auxiliary variables that simplify the multiplex optimization

of the matrix AAA. The constrained optimization (3.6) is then reformulated to an

un-constrained counterpart by introducing the augmented Lagrangian function and

the algorithm is utilized by iteratively updating one variable at a time.

The variable splitting technique allows to break a hard-to-solve problem into

multiple sub-problems with simpler closed-form solutions, hence making the com-

plex minimization such as (3.2) solvable. However, together with introducing new

variables, the computation in each iteration step is also increased and more iterations

are likely required to achieve optimization convergence; ending more computational

time of the whole algorithm. In this problem, we introduce an alternative way to ef-

ficiently optimize (3.2) without relying on variable splitting approach. Our method

is still based on ADMM but introduce an approximation step to relieve the burden

of dictionary transform, yet guarantee convergence to the global optimal solution.

The augmented Lagrangian function of (3.2) is defined as

L(AAA,LLL,ZZZ) = FS(AAA) + λL ‖LLL‖∗ + 〈YYY −DDDAAA−LLL,ZZZ〉+µ

2‖YYY −DDDAAA−LLL‖2

F (3.7)

where ZZZ is the multiplier for the smoothness constraint, and µ is a positive penalty

39


Inputs: Matrices YYY and DDD, and weighting parameter λL.Initializations: AAA0 = 000, LLL0 = 000, j = 0.

While not converged do1. Update LLLk+1:

LLLk+1 = argminLLL

L(AAAk,LLL,ZZZk)

= argminLLL

λL ‖LLL‖∗+µ

2

∥∥∥∥LLL− (YYY −DDDAAAk+ZZZk

µ)

∥∥∥∥2

F

,(3.9)

2. Update AAAk+1:

AAAk+1 = argminAAA

L(AAA,LLLk+1,ZZZk)

= argminAAA

FS(AAA) +µ

2θ‖AAA− (AAAk − θTTT k)‖2

F ,(3.10)

where TTT k = DDDT (DDDAAAk − (YYY −LLLk+1 − 1µZZZk))

3. Update the multiplier:

ZZZk+1 = ZZZk+µ(YYY −DDDAAAk+1−LLLk+1) (3.11)

4. k = k + 1.end while

Outputs: (AAA, LLL) = (AAAk,LLLk).

Algorithm 1: Adaptive ADMM-based algorithm.

parameter. The algorithm then minimizes L(AAA,LLL,ZZZ) with respect to one variable

at a time by keeping others fixed and then updating the variables sequentially. The

algorithm is formally presented in Algorithm 1.

At the kth iteration, the algorithm has the following iteration scheme

1. Solve for LLLk+1 : LLLk+1 = argminLLL L(AAAk,LLL,ZZZk) (a)

2. Solve for AAAk+1 : AAAk+1 = argminAAA L(AAA,LLLk+1,ZZZk) (b)

3. Update the multiplier: ZZZk+1 = ZZZk+µ(YYY −DDDAAAk+1−LLLk+1) (c)

(3.8)

The algorithm involves two main subproblems to solve for the intermediate min-

40


imizations with respect to variables LLL and AAA at each iteration k, respectively. The

first optimization subproblem which updates LLL can be re-casted as

LLLk+1 = argminLLL

λL ‖LLL‖∗+µ

2

∥∥∥∥LLL− (YYY −DDDAAAk+ZZZk

µ)

∥∥∥∥2

F

. (3.12)

The proximal minimization in (3.12) can be solved via the singular value thresh-

olding (SVT) operator [49] in which we first define a singular value decomposition

(UUU,∆∆∆,VVV ) = svd(YYY −DDDAAAk+ ZZZkµ

). The intermediate explicit solution of LLLk+1 is then

determined by applying the soft-thresholding operator to the singular values:

LLLk+1 = UUUS λLµ

(∆∆∆)VVV ,

where the soft-thresholding operator of ∆∆∆ over λLµ

is defined as S λLµ

(δ) = max(|δ| −

λLµ, 0) sgn(δ) for every element δ in the diagonal of ∆∆∆.

The second subproblem to update AAA can be re-written as:

AAAk+1 = argminAAA

FS(AAA) +µ

2

∥∥∥∥DDDAAA− (YYY −LLLk+1+ZZZk

µ)

∥∥∥∥2

F

. (3.13)

When FS(AAA) is one of the three structural sparsity promoting functions as discussed

in section 3.3, the subproblem in (3.13) becomes a convex utility function. Unfor-

tunately, its closed-form solution is not easily determined. The difficulties not only

come from the structural-sparsity regularization (such as row-sparse or group-sparse)

on the variable AAA but also the operation over dictionary transformation on DDDAAA as

well as the engagement of multiple modalities. In fact, this subproblem alone is in a

more general form than other `1-based optimization problems. For example, if FS is

41


a mixed `1,q-norm, (3.13) is back to the joint-sparse representation framework; while

if FS regularizes a hierarchical group-sparse constrain as defined in (3.5), we are to

solve for the collaborative hierarchical sparse modeling problem outlined in section

2.4.2, which normally requires a multiple-iteration algorithm to achieve a converged

solution.

In order to tackle these difficulties, we do not solve for an exact solution of (3.13).

Instead, the third term in the objective function is approximated by its Taylor

expansion at AAAk (which is achieved from iteration k) up to the second derivative

order

∥∥∥DDDAAA−(YYY −LLLk+1 +ZZZkµ

)∥∥∥2

F≈∥∥∥DDDAAAk −(YYY −LLLk+1 +ZZZk

µ)∥∥∥2

F

+2 〈AAA−AAAk,TTT k〉+ 1θ‖AAA−AAAk‖2

F , (3.14)

where θ is a positive proximal parameter and TTT k = DDDT (DDDAAAk − (YYY −LLLk+1 + 1µZZZk))

is the gradient at AAAk of the expansion.

The first component in the right-hand side of (3.14) is constant with AAA. Con-

sequently, by replacing (3.14) into the subproblem (3.13), and manipulating the

last two terms of (3.14) into one component, the optimization to update AAA can be

simplified to

AAAk+1 = argminAAA

FS(AAA) +µ


F . (3.15)

The explicit solution of (3.15) can then be solved via the proximal operators as-

sociated with the composite norms preserved in FS. For FS being the `1-norm

42


component-wise sparsity constrain, (3.15) becomes

AAAk+1 = argminAAA

‖AAA‖1 +µ


F . (3.16)

which is the well known soft-thresholding shrinkage operator and the explicit so-

lution is derived by AAAk+1 = S µ2θ

(AAAk − θTTT k) with S being the afore-defined soft-

thresholding operator.

For the second case when FS is an `1,q-norm promoting row-sparsity, (3.15) can

be re-casted as

AAAk+1 = argminAAA

‖AAA‖1,q +µ


F . (3.17)

The intermediate solution of AAAk+1 can then be solved via the following lemma which

is generalized from [50].

Lemma 1 : Given a matrix RRR, the optimal solution to: minXXX

δ ‖XXX‖1,q+ 12‖XXX −RRR‖2

F

is the matrix XXX where the i-th row of XXX is given by:

XXX i,: =

‖RRRi,:‖q−δ‖RRRi,:‖q

RRRi,: if ‖RRRi,:‖q > δ

0 otherwises.

Finally, when FS is a hierarchical group-sparsity operator, we can rewrite (3.15)

as the following optimization

AAAk+1 = argminAAA

‖AAA‖1 + λG

C∑c=1

‖AAAc‖F +µ


F . (3.18)

It is noted that both ‖·‖2F and ‖·‖1 have element-wise separable structure, meaning

their operation on a matrix is equal to the summation of operations over all sub-

43


matrices that it is formed from. Applying this separable property into the first and

third terms of (3.18), we can further simplify it to solve for the sub-coefficient matrix

of each class separately

(AAAk+1)c=argminAAAc

‖AAAc‖1+λG‖AAAc‖F+µ

2θ‖AAAc−((AAAk)c−θ(TTT k)c)‖2

F

(∀c = 1, 2, ..., C). (3.19)

The explicit solution of (3.19) can then be solved via the following lemma:

Lemma 2 : Given a matrix RRR, the optimal solution to:

minXXX

α1 ‖XXX‖1+α2 ‖XXX‖F+1

2‖XXX−RRR‖2

F

is the matrix XXX:

XXX =

‖SSS‖F−α2

‖SSS‖FSSS if ‖SSS‖F > α2

000 otherwise.

where SSS is a matrix whose ijth element is defined by: SSSij = max(RRRij− 1α1, 0) sgn(RRRij).

3.4.2 Convergence Analysis

The algorithm 1 explicitly utilizes one approximation step to overcome the bur-

den of dictionary transform in the utility function, yet eliminate the use of decoupled

auxiliary variables. Furthermore, it is guaranteed to provide the global optimum of

the convex program (3.2) as stated in the following theorem.

Theorem 1 : If the proximal parameter θ satisfies the condition: σmax(DDDTDDD) <

1θ, where σmax(·) is the largest eigenvalue of a matrix, then AAAj,LLLj generated by

44


algorithm 1 for any value of the penalty coefficient µ converges to the optimal solution

AAA, LLL of (3.2) as j →∞.

Proof

Suppose AAA∗,LLL∗,ZZZ∗ is an optimal solution of (3.2) then from optimization

theory we have the following conditions

DDDTZZZ∗ ∈ ∂FS(AAA∗) (a)

ZZZ∗ ∈ ∂λL ‖LLL∗‖∗ (b)

YYY = DDDAAA∗ +LLL∗ (c)

(3.20)

Now, we consider the optimization condition for the minimization with respect to

AAA. From (3.10) it follows that

µ

θ((AAAk − θTTT k)−AAAk+1) ∈ ∂FS(AAAk+1)

⇔ µ

θ(AAAk− θ[DDDT (DDDAAAk− YYY +LLLk+1−

1

µZZZk)]−AAAk+1) ∈ ∂FS(AAAk+1) (3.21)

From (3.11) we have LLLk+1−YYY = 1µ(ZZZk−ZZZk+1)−DDDAAAk+1. Substitute this into (3.21)

µ

θ(AAAk−θ[DDDT (DDDAAAk+

1

µ(ZZZk−ZZZk+1)−DDDAAAk+1−

1

µZZZk)]−AAAk+1) ∈ ∂FS(AAAk+1)

⇔ µ

θ(AAAk −AAAk+1)− µDDDTDDD(AAAk −AAAk+1) +DDDTZZZk+1 ∈ ∂FS(AAAk+1). (3.22)

Furthermore, considering DDDTZZZ∗ ∈ ∂FS(AAA∗). This condition together with (3.22)

and the convex property of the projection FS lead to:

〈(AAAk+1 −AAA∗), (µ

θ(AAAk −AAAk+1)− µDDDTDDD(AAAk −AAAk+1) +DDDTZZZk+1 −DDDTZZZ∗)〉 ≥ 0

⇔ 〈DDD(AAAk+1−AAA∗),(ZZZk+1−ZZZ∗)−µDDD(AAAk−AAAk+1)+µ

θ〈(AAAk+1−AAA∗),(AAAk−AAAk+1)〉 ≥ 0.

45


(3.24)

Next, let consider the condition for the sub-optimization problem of the variable LLL.

The intermediate minimization (3.9) implies

µ(YYY −DDDAAAk +1

µZZZk −LLLk+1) ∈ ∂ ‖λLLLLk+1‖∗

⇔ µ(− 1

µ(ZZZk −ZZZk+1) +DDDAAAk+1 −DDDAAAk +

1

µZZZk) ∈ ∂ ‖λLLLLk+1‖∗

⇔ ZZZk+1 − µDDD(AAAk −AAAk+1) ∈ ∂ ‖λLLLLk+1‖∗ (3.25)

where, again, the second derivative comes from substituting (3.11) that denotes

LLLk+1 − YYY = 1µ(ZZZk − ZZZk+1) − DDDAAAk+1. It is noted that nuclear norm is a convex

function. This implies that from (3.25) and ZZZ∗ ∈ ∂ ‖λLLLL∗‖∗ we can lead to

〈(LLLk+1 −LLL∗), (ZZZk+1 −ZZZ∗)− µD(D(D(AAAk −AAAk+1) ≥ 0 (3.26)

Taking the addition on both sides of (3.23) and (3.26), we have:

〈(DDDAAAk+1 +LLLk+1)− (DDDAAA∗ +LLL∗), (ZZZk+1 −ZZZ∗)− µD(D(D(AAAk −AAAk+1)〉

+µ

θ〈(AAAk+1 −AAA∗), (AAAk −AAAk+1)〉 ≥ 0 (3.27)

Further considering (DDDAAAk+1 + LLLk+1) = YYY + 1µ(ZZZk − ZZZk+1) and (DDDAAA∗ + LLL∗) = YYY ,

(3.27) can be rephrased as

1µ〈ZZZk−ZZZk+1,ZZZk+1−ZZZ∗〉+ µ

θ〈(AAAk+1−AAA∗), (AAAk−AAAk+1)〉 ≥ 〈ZZZk−ZZZk+1,D(D(D(AAAk −AAAk+1)〉 (3.28)

46


By using the equalities ZZZk+1 − ZZZ∗ = (ZZZk+1 − ZZZk) + (ZZZk − ZZZ∗) and AAAk+1 − AAA∗ =

(AAAk+1 −AAAk) + (AAAk −AAA∗), (3.28) can yield

1

µ〈ZZZk −ZZZk+1,ZZZk −ZZZ∗〉+

µ

θ〈(AAAk −AAAk+1,AAAk −AAA∗)〉

≥ 1

µ‖ZZZk −ZZZk+1‖2

F +µ

θ‖AAAk −AAAk+1‖2

F + 〈ZZZk −ZZZk+1,DDD(AAAk −AAAk+1)〉 (3.29)

Now considering the two equalities:

‖ZZZk −ZZZ∗‖2F − ‖ZZZk+1 −ZZZ∗‖2

F = −‖ZZZk −ZZZk+1‖2F + 2 〈ZZZk −ZZZk+1,ZZZk −ZZZ∗〉 (3.30)

and

‖AAAk −AAA∗‖2F − ‖AAAk+1 −AAA∗‖2

F = −‖AAAk −AAAk+1‖2F + 2 〈AAAk −AAAk+1,AAAk −AAA∗〉 (3.31)

Taking the summation 1µ∗ (3.30) + µ

θ∗ (3.31):

1

µ(‖ZZZk −ZZZ∗‖2

F − ‖ZZZk+1 −ZZZ∗‖2F ) +

µ

θ(‖AAAk −AAA∗‖2

F − ‖AAAk+1 −AAA∗‖2F )

= 2[1

µ〈ZZZk −ZZZk+1,ZZZk −ZZZ∗〉+

µ

θ〈AAAk −AAAk+1,AAAk −AAA∗〉]

−(1


F +µ


F )

≥ 2[1


F +µ


F + 〈ZZZk −ZZZk+1,DDD(AAAk −AAAk+1)〉]

−[1


F +µ


F ]

=1


F +µ


F + 2 〈ZZZk −ZZZk+1,DDD(AAAk −AAAk+1)〉 (3.32)

where the inequality from the third derivation is achieved by substituting the in-

equality (3.29) into the first bracket.

47


Next, let consider

2 〈ZZZk −ZZZk+1,DDD(AAAk −AAAk+1)〉 ≥ −α ‖ZZZk −ZZZk+1‖2F −

1

α‖D(D(D(AAAk −AAAk+1)‖2

F

which is derived from the Cauchy inequality and holds for every real number α > 0.

Furthermore, applying the inequality ‖DDD(AAAk −AAAk+1)‖2F ≤ σmax(DDD

TDDD) ‖AAAk −AAAk+1‖2F ,

where σmax(DDDTDDD) is the largest eigenvalue of DDDTDDD, (3.32) can then be represented

as

1

µ(‖ZZZk−ZZZ∗‖2

F− ‖ZZZk+1−ZZZ∗‖2F ) +

µ

θ(‖AAAk−AAA∗‖2

F− ‖AAAk+1−AAA∗‖2F )

≥ 1

µ‖ZZZk−ZZZk+1‖2

F +µ

θ‖AAAk−AAAk+1‖2

F− α ‖ZZZk−ZZZk+1‖2F−

1

α‖DDD(AAAk−AAAk+1)‖2

F

≥ 1

µ‖ZZZk−ZZZk+1‖2

F +µ

θ‖AAAk−AAAk+1‖2

F− α ‖ZZZk−ZZZk+1‖2F−

1

ασmax(DDD

TDDD) ‖AAAk−AAAk+1‖2F

=(1

µ− α) ‖ZZZk−ZZZk+1‖2

F + (µ

θ− σmax

α) ‖AAAk−AAAk+1‖2

F (3.33)

This can be equivalently represented as

1

µ2(‖ZZZk −ZZZ∗‖2

F − ‖ZZZk+1 −ZZZ∗‖2F ) +

1

θ(‖AAAk −AAA∗‖2


≥ (1− αµ)1

µ2‖ZZZk −ZZZk+1‖2

F + (1− θσmaxαµ

)1


F . (3.34)

The inequality (3.34) is valid for every real number α > 0. Let α =√θσmaxµ

> 0 then

we have (1 − αµ) = (1 − θσmaxαµ

) = 1 −√θσmax. By defining β , 1 −

√θσmax > 0,

(3.34) becomes

1µ2

(‖ZZZk −ZZZ∗‖2F − ‖ZZZk+1 −ZZZ∗‖2

F ) + 1θ(‖AAAk −AAA∗‖2


≥ β[ 1µ2‖ZZZk −ZZZk+1‖2

F + 1θ‖AAAk −AAAk+1‖2

F ]. (3.35)

48


Let define WWW k ,

1µZZZk

√θAAAk

and WWW ∗ ,

1µZZZ∗

√θAAA∗

then (3.35) can be further

simplified to

(‖WWW k −WWW ∗‖2F − ‖WWW k+1 −WWW ∗‖2

F ) ≥ β[‖WWW k −WWW k+1‖2F ]. (3.36)

This implies that limk→∞ ‖WWW k −WWW k+1‖2F = 0 and ‖WWW k −ZZZ∗‖2

F is a monotonically

non-increasing and hence the sequence WWW k converges. Therefore, both the se-

quences AAAk and ZZZk converges to stationary points. From (3.11), it consequently

leads to the convergence of LLLk.

Suppose that AAAk,LLLk,ZZZk converges toAAA, LLL, ZZZ

. We will prove that

AAA, LLL, ZZZ

is also a global optimal of (3.2). Taking the limitation of (3.22) and (3.25) over k

and using the convergence of the sequenceAAAk which implies limk→∞(AAAk−AAAk+1) = 000,

we can lead to DDDTZZZ ∈ ∂FS(AAA) and ZZZ ∈ ∂λL∥∥∥LLLLLLLLL∥∥∥

∗. Furthermore, let k →∞ over

the equation (3.11), AAA and LLL are related by the equality YYY = DDDAAA+ LLL. These imply

that the tripleAAA, LLL, ZZZ

also satisfies the optimal solution conditions (3.20) of the

optimization (3.2), i.e., AAAk,LLLk,ZZZk converges to the optimal solution of (3.2).

49

Chapter 4

Applications on Structured Sparse

Representation with Low-rank

Interference

4.1 Introduction

There are many real world problems involving the linear decomposition of a su-

pervised sparse signal representation and a low-rank interference component. In

this chapter, we will extensively explore four practical problems and show how well

they can be profitable from our proposed models of interest. The improvement from

all of the experimental results adopted from our models compared with existing

50

CHAPTER 4. APPLICATIONS

conventional algorithms as well as modernized sparse representation methods will

verify the robustness of our models. Furthermore, by introducing the low-rank in-

terference concept, our methods provide a new and powerful approach in attracting

the problems involving large noise or signal interference that existing sparsity-based

techniques could not competently solve when the interference’s information is not

sufficiently given. Put it differently, while other current methods require a given

training dictionary of the engaging signal interference, our methods provide compa-

rable or even better results without demanding the input of this dictionary.

The four specific applications that we will discover in this section include:

• An adaptive framework for robust separation and extraction of multiple sources

of radio-frequency interference (RFI) from raw synthetic aperture radar (SAR)

signals in challenging bandwidth management environments.

• A chemical gas plume detection and classification algorithm in hyperspectral

video sequence when the background content almost dominates the target

chemical information.

• A robust speech recognition under the presence of various noise sources and

different noise levels, even with immense noises.

• An innovative method for video-based facial expression recognition given no

prior information of neutral face content.

51


4.2 Synthetic Aperture Radar Image Re-

covery

RFI sources poses critical challenges for UWB systems since (i) RFI often occu-

pies a wide range of the radars operating frequency spectrum; (ii) RFI might have

significant power; and (iii) RFI signals are difficult to predict and model due to the

non-stationary nature as well as the complexity of various communication devices.

Existing techniques for RFI suppression either employ filtering (notching) which in-

troduces other harmful side-effects such as side-lobe distortion and target-amplitude

reduction or RFI modeling/estimation/tracking which requires complicated narrow-

band modulation models or even direct RFI sniffing [51, 52]. We explore in this

section a joint sparse and low-rank model for the separation and then suppression of

RFI signals from UWB radar data via modeling RFI as low-rank components in a

joint optimization framework. The proposed framework is completely adaptive with

highly time-varying environments, does not require any prior knowledge of the RFI

sources (other than the low-rank assumption), and is capable of processing already-

contaminated radar directly. Both simulated data and real-world data measured by

the U.S. Army Research Laboratory (ARL) UWB synthetic aperture radar (SAR)

confirm that our RFI suppression technique successfully recovers UWB radar signals

corrupted by high-powered RFI signals.

52


Figure 4.1: ARL UWB MIMO forward-looking SAR system with 16 identicalreceiving antennas positioned in a 2-meter physical array and two transverse elec-tromagnetic (TEM) horn transmitting antennas located at both ends.

4.2.1 Introduction

A synthetic aperture radar transmits and receives electromagnetic waves by a

sensor or an array of sensors attached to a platform moving along the radar path

to generate the synthetic aperture [53]. The system shoots and collects signals at

a constant pulse repetition interval (PRI) along the cross-range or forward-range

directions to produce equally spaced aperture records. The collected data is then

processed by a backprojection algorithm to form high resolution SAR imagery.

53


In this application, we are interested in low-frequency ultra-wideband (UWB)

radar and communications systems which have played important roles in many prac-

tical applications. The U.S. Army Research Laboratory (ARL) has been developing

low-frequency UWB radar systems to detect difficult targets in various applications

such as foliage penetration (FOPEN) [53], ground penetration for improvised explo-

sive device (IED) detection [54], and sensing-through-the-wall (STTW) [55]. Such

systems (where one specific example is depicted in Fig. 4.1) must operate in the

low-frequency spectrum that spans from under 100 MHz to several GHz in order to

achieve the penetration capability while maintaining high imaging resolution. The

most critical challenge for any UWB system is that it must be able to operate in

the presence of others: collected radar information is corrupted in both time and

frequency domain by various RFI. This is a notoriously challenging problem due

to the dynamic and unpredictable nature of the noise sources, not to mention the

strength of the interference signals. Previous work in this RFI-suppression area

includes parametric noise modeling [56], spectral decomposition [57], and adaptive

filtering [58,59], all with limited successes. Most can only provide acceptable results

with one particular source of RFI. Several past efforts have taken advantage of the

low-rank RFI property to extract them via eigen-decompositions [60,61]. However,

these techniques heavily depend on the quality of the orthogonal subspaces and

cannot distinguish signal-versus-noise if they happen to have the same power within

the same subspace.

54


Recent sparsity-based approaches attempted in solving the RFI problem via

modeling both the raw SAR signal of interest as well as the interference to be

sparse with respect to well-designed dictionaries [51, 52]. The signal dictionary DDDR

is obtained from discrete time-shifted versions of the transmitted radar signal sss(t)

whereas the RFI dictionary DDDrfi is constructed from real observed RFI collected

from the environment with the radar transmitters turned off (we call this process

RFI sniffing). Hence, the observed data record yyyRi at aperture i is modeled as

yyyRi = DDDRaaaRi +DDDrfieeeRi + nnnRi

=

[DDDR DDDrfi

] aaaRi

eeeRi

+ nnnRi

(4.1)

where aaaRi and eeeRi are sparse coefficient vectors of the SAR and RFI signals at

aperture i, respectively, and nnnRi represents the typical unstructured dense noise with

small variance. Popular sparse recovery algorithms such as OMP or `1-minimization

variants can be employed to solve for both aaaR and eeeR simultaneously. This technique

processes each data record independently and requires prior knowledge of RFI via

sniffing.

4.2.2 Problem Formulation

In this section, we propose to solve the problem of extracting RFI signal from raw

SAR data using our SR+L model. It will be demonstrated that the RFI problem

55


can be solved if SAR signals are sparse with respect to a certain dictionary while

the RFI sources need to satisfy only one single assumption - low-rank. This RFI

property results from our focus on batch-processing of data collected from sensors

within a small spatial-temporal window. Finally, this approach processes raw SAR

data directly without involving the costly image formation step. Hence, it can be

incorporated into most existing systems as a pre-processing module prior to other

popular signal processing and image formation steps.

The proposed sparse-representation-plus-low-rank signal model for a SAR system

can be described as follows

YYY R = DDDRAAAR +LLLR +NNNR (4.2)

where columns of YYY R contain observed SAR signals within a small spatial-temporal

window, DDDR is the sparsifying dictionary for SAR signals as previously mentioned,

AAAR contains the sparse coefficients or sparse codes (thus DDDRAAAR describes the signals

of interest), LLLR represents the RFI corruption embedded in the observed data, and

NNNR is the typical dense Gaussian noise with low bounded power.

The proposed decomposition model in (4.2) and the corresponding SR+L recov-

ery algorithm rely on two key assumptions: (i) SAR signals of interest are sparse

or structured-sparse hence, XXXR is a sparse matrix; (ii) RFI components contain a

high degree of correlation hence LLLR is low-rank. The sparse nature of radar signals

have been well established in the compressed sensing community [52, 62–64]. The

56


Figure 4.2: Singular values of RFI component matrix LLLR in ranking ordermagnitude-wise where received signals in neighboring apertures are grouped to-gether. This RFI data is obtained with our transmitters turned off [13]-[14].

low-rank assumption on LLLR is confirmed in Fig. 4.2 which illustrates that the mag-

nitude of singular values of the pure-RFI matrix LLLR decays very quickly, indicating

that LLLR is low-rank. The majority of the RFI power concentrates within the top

10% of its components. Here, sensor co-location ensures that interference noise pat-

terns are very similar, and if observed signals are collected and processed within a

small spatial-temporal neighborhood, we believe that the RFI low-rank assumption

is always valid. Moreover, with the low-rank noise model in LLLR, we are able to

capture sensor failure (LLLR is column-sparse); adversary jamming (LLLR is row-sparse);

and/or dense background noises with large magnitude such as RFI from broadcast-

ing stations and cellular phone communications. Although there might be multiple

interference sources ranging from AM to FM radio, from digital TV broadcast to

cellular phone communications, the RFI is relatively sparse in frequency.

It can be further noticed that the sparsity coefficient matrix AAAR indeed reserves

57


some structural property since the system of SAR sensors are normally attached

as an array in a vehicle, running forward or crossing a scene to collect the radar

signals. However, the sparsity patterns are neither block-structured nor row-sparsity.

Therefore, the recovery algorithm that we will explore in this thesis is just the SR+L

algorithm which promote sparsity-wise on the coefficient matrix AAAR. Given the

observed data record YYY R and SAR sparsifying transform matrix DDDR, SR+L model

to simultaneously recover the SAR sparse coding matrix AAAR and radio-frequency

low-rank interference LLLR can be exhibited as the following

minAAAR,LLLR

‖AAAR‖1 + λL ‖LLLR‖∗

s.t. YYY R = DDDRAAAR +LLLR.

(4.3)

4.2.3 Experimental Results

In this section, we validate the proposed signal model and the sparse+low-rank

recovery algorithm above with several RFI-suppression experiments on two different

data sets. In both cases, the RFI involved is real - RFI was collected on the ARL

ground as shown in Fig. 4.1. All parameters are empirically tuned to our best effort

to achieve the highest possible SNR for each method.

UWB Mono-Static Side-Looking Simulation Data

The first experiment is conducted on an UWB simulated data set: mono-static

side-looking SAR data are collected from 300 aperture positions in a straight line,

58


RFI toSAR

PowerRatio

SAR SignalsCorrupted byRFI withoutProcessingSNR (dB)

RFI Notchingfollowed bySpectrumRecoverySNR (dB)

RFI Suppressionvia Sparse

Recovery withSniffing SNR

(dB)


Representationwith Low-rankRFI SNR (dB)

0.25 12.04 16.49 25.01 19.030.5 6.02 11.31 18.46 16.081 0.00 6.50 12.77 12.972 -6.02 3.06 9.45 9.765 -13.98 -1.33 4.49 5.4710 -20.00 -2.56 0.38 2.43

Table 4.1: RFI suppression comparison with side-looking mono-static simulationdata.

imaging a scene with around 40 point targets of random amplitudes at random

locations. The signal to noise ratio between the original SAR signals xxx and the

recovered signals , xxx as tabulated in Table 4.1 is defined as the root-mean-square

ratio expressed in dB scale: SNR(xxx, xxx) = 20 log10RMS(xxx)

RMS(xxx−xxx). Compared with latest

state-of-the-art notching (which also includes advanced spectrum recovery) [14] and

sparse recovery with sniffing techniques [13]-[14], the proposed method remains ef-

fective until the RFI level becomes weak, rendering the nuclear norm of LLLR in the

optimization ineffective.

UWB MIMO Forward-Looking Real Data

Next, the proposed spectral recovery technique is tested and evaluated using

real-world data from the ARL UWB SAR in forward-looking mode. The SNRs

in various RFI-to-SAR power ratios are shown in Table 4.2. Again, the proposed

59


(a) (b) (c)

(d) (e)

Figure 4.3: Comparison of RFI suppression performances with side-looking simu-lated data when RFI power is 5 times that of SAR signals: (a) original SAR imagesof about 40 point targets of different sizes, magnitudes, and locations; (b) RFI-corrupted image without any processing, SNR=-13.98dB; (c) recovered RFI imagefrom RFI notching (based on sniffing information) followed by spectrum recovery,SNR=-1.33dB; (d) RFI-suppressed image via sparse recovery with RFI sniffing,SNR=4.49dB; (e) RFI-suppressed image via proposed SR+L with low-rank RFImodeling technique, SNR=5.47dB.

60


RFI toSAR

PowerRatio

SAR SignalsCorrupted byRFI withoutProcessingSNR (dB)

RFI Notchingfollowed bySpectrumRecoverySNR (dB)


Recovery withSniffing SNR

(dB)


Representationwith Low-rankRFI SNR (dB)

0.25 12.04 11.16 19.04 16.560.5 6.02 9.22 16.68 13.671 0.00 6.93 13.00 10.652 -6.02 2.49 7.61 7.915 -13.98 -0.60 1.65 4.6310 -20.00 -1.54 0.48 2.29

Table 4.2: RFI suppression comparison with forward-looking ARL UWB MIMOreal data.

algorithm proves to be very effective given that RFI sniffing is not needed and the

scene of interest is quite complex. Figs. 4.3 - 4.5 compare the visual quality of

various recovered SAR images in both simulated and real-data cases - our proposed

method seems to offer an additional level of denoising advantage (objectively as well

as subjectively). In this experiment, we group only 10 neighboring apertures into a

batch and each batch is processed independently. So, YYY in this case has 160 data

columns (this radar configuration has 16 receiving channels per aperture position).

To sum up, we present in this section an effective RFI extraction algorithm based

on jointly minimizing the sparsity of the SAR signals and the rank of the RFI. Our

technique does not require any specific prior knowledge of the interference sources.

Experiments on simulated as well as real UWB SAR data sets show remarkable

robustness and confirm the methods validity.

61


(a) (b)

(c) (d)

Figure 4.4: Comparison of RFI suppression performances with ARL UWBforward-looking real-world data when RFI power is twice that of SAR signals: (a)original SAR image of a road with buried targets of interest (in the area enclosed inthe red rectangle); (b) recovered RFI image from RFI notching (based on sniffinginformation) followed by spectrum recovery, SNR=2.49dB; (c) RFI-suppressed im-age via sparse recovery with RFI sniffing, SNR=7.61dB; (d) RFI-suppressed imagevia proposed low-rank RFI modeling technique, SNR=7.91dB.

62


(a) (b) (c)

(d) (e)

Figure 4.5: Zoom-in portions of SAR images shown in Fig. 4.4 within the region ofinterest (red-rectangle region in Fig. 4.4(a)): (a) original SAR image; (b) corruptedSAR image without any processing; (c) recovered image from RFI notching; (d)RFI-suppressed image via sparse recovery with sniffing; (e) RFI-suppressed imagevia proposed low-rank RFI modeling technique without sniffing. One can observethat the proposed low-rank modeling technique yields visually-pleasing recoveredSAR image where the two targets of interest stand out clearly from the clutteredbackground.

63


4.3 Hyperspectral Gas Plumn Detection

and Classification

The second experiment that we use to verify our proposed methods is a chemi-

cal gas plume detection and classification problem for hyperspectral imagery (HSI)

data. Hyperspectral remote sensors collect information from hundreds of continuous

and narrow spectral bands [65]. Each hyperspectral pixel is a vector of various bands

which can discriminate the presenting materials based on its spectral characteristic.

One critical problem in the detection and classification problems of chemical plumes

via hyperspectral imaging data is the presence of underlying background which is

often observed to dominate the chemical gas content in each hyperspectral pixel.

Sparsity-based representation approach has proved to be an efficient way in tackling

this difficulty by enforcing each observed sample pixel as a sparse linear representa-

tion of both background and training target dictionaries. This approach, however,

must require prior-knowledge of a background dictionary which may not always be

given in practical settings.

In this section, we propose to effectively use our structured sparse representa-

tion with low-rank interference methods to solve the hyperspectral chemical plume

classification problem without relying on a background dictionary. The proposed

algorithm relies on two key observations: (i) each hyperspectral pixel can be ap-

64


proximately represented by a sparse linear combination of the training samples; and

(ii) neighborhood pixels from the same hyperspectral image as well as consecutive

hyperspectral frames usually have similar background content, hence promoting a

low-rank background component. SR+L and JSR+L models will be utilized in the

experiments. The dataset analyzed in this paper consists of three hyperspectral

sequences recording the release of chemical plumes captured at different scenarios.

Spectral signatures of 400 different chemical samples are also given to create the

chemical dictionary DDDH . The out-performance in comparison with conventional

classification methods like sparse logistic regression (SLR) [66] and support vec-

tor machine (SVM) [67] as well as existing sparsity-based methods with provided

background information will demonstrate the effectiveness of our proposed methods.

4.3.1 Introduction

The detection of chemical plumes from hyperspectral data is playing an impor-

tant role in remote sensing technology. Locating the area of gasses in the atmosphere,

identifying what gasses are present and how much concentration they are, as well

as monitoring their travelling path are the main concerns in a variety of scenarios

from examining chemical plants for EPA compliance to guiding evacuation routes

in times of industrial accident or terrorist action.

Hyperspectral remoting sensors collect information of hundreds continuous nar-

65


row spectral bands across electromagnetic spectrum. Therefore each HSI pixel is

a vector of various bands which has the capability to discriminate and classify the

presenting materials based on its spectral characteristic. A number of algorithms

have been proposed for the gas plume detection in hyperspectral data. The gener-

alized least square (GLS) via matched filters is one of the common approaches to

distinguish the plume from the background [68]. In [69], Foy et. al. used indepen-

dent component analysis to subtract the background clutter. The use of support

vector machine (SVM) [70] has also shown good performances to solve the gas plume

detection problem. Some of other approaches include hierarchical clustering [71] or

nonlinear Bayesian algorithms [72].

Sparsity-based representation has opened a new trend in effectively solving clas-

sification and target detection problems in hyperspectral imaging [37, 73]. Hyper-

spectral target detection via sparsity models relies on the fact that the spectral

signature of each pixel approximately lies in a low-dimensional subspace spanned

by the training samples of the same class with that pixel in the dictionary. The

information of each target pixel in a hyperspectral image is the combination of both

background and chemical signatures. Therefore, an observed sample yyyH being a

union of the background and target samples stays in a sparse domain of the combin-

ing background dictionary DDDB and hyperspectral chemical dictionary DDDH and can

be compactly represented as

66


Figure 4.6: Low-rank and joint sparse representation construction in a hyperspec-tral frame.

yyyH = DDDHaaaH +DDDBaaaB

=

[DDDH DDDB

] aaaH

aaaB

(4.4)

where aaaH and aaaB are the sparse codings of the chemical and background dictionar-

ies, respectively. A sparse recovery algorithm is then employed to solve for both

aaaH and aaaB simultaneously and a classifier will be imposed on aaaH to decide which

chemical that pixel contain. This technique, however, requires prior knowledge of

the background which is normally taken from a chemical-free hyperspectral frame.

67



In this section, we propose to solve the problem of chemical classification from

hyperspectral sequences using our SR+L and JSR+L models. The background com-

ponent can be considered as the interference in the representation of a target pixel

over the chemical dictionary. With the observation that hyperspectral images are

smooth as neighboring pixels usually consist of similar materials and thus their spec-

tral characteristics are highly correlated, we collect neighboring spatial pixels and

also the pixels in the same areas in the previous and future frames into columns of a

matrix YYY H as Fig. 4.6. The chemical representations AAAH of these pixels should have

common sparse supports with respect to the chemical dictionary DDDH , whereas the

background content LLLH should have very similar structure hence stay as a low-rank

matrix. Given the observation matrix YYY H and the hyperspectral chemical dictio-

nary DDDH , the coefficient matrix AAAH and background component LLLH are obtained

by solving the simultaneous joint sparse representation with low-rank background

interference problem using JSR+L model:

minAAAH ,LLLH

‖AAAH‖1,q + λL ‖LLLH‖∗

s.t. YYY H = DDDHAAAH +LLLH .

(4.5)

When the row-sparse coefficient matrixAAAH is obtained in all blocks, they can be com-

bined to determine the chemical indices presenting in the whole frame. Moreover,

for comparison purpose, we also employ SR+L which incorporates an element-wise

68


sparsity instead of a row-wise sparsity on AAAH . It will be shown in the experiments

that by exploiting the correlation of involving observations (demonstrated by a row-

sparse structure), we will gain better performance in classification results.


The performance of the proposed methods are compared with sparse logistic re-

gression (SLR) [66], support vector machine (SVM) [74] and two sparsity-based mod-

els: pixel-wise sparse representation and `12-norm joint sparse representation with

prior knowledge of a background dictionary. In the setups of the two sparsity-based

frameworks, an additional background dictionary DDDB is constructed by randomly

selecting a number of pixels in the first frame of each testing hyperspectral sequence

which is informed to be chemical free . This dictionary is concatenated with the

chemical dictionary to generate the combined training dictionary. The pixel-wise

sparsity model then represents each observed sample in a sparse linear combination

domain of the chemical-background dictionary, while joint sparse model enforces

neighboring pixels to share the same sparsity patterns of the representation. The

overall recognition rates defined as ratios of the total number of correctly classified

frames to the total number of frames that chemical is actually present, expressed as

a percentage, are reported in Table 4.3.

The improvements offered by our proposed techniques, especially the JSR+L

69


Sequences ’AA12’ ’R134a6’ ’SF6 27’ ’TEP 9’SLR 39.4 76.1 24.0 65.8SVM 48.5 67.2 24.0 68.5

Pixel-wise sparsity 45.4 73.1 24.0 80.8Joint sparse recovery 90.9 92.5 44.0 89.0

Proposed SR+L 90.9 89.6 36.0 82.2Proposed JSR+L 100.0 97.1 60.0 87.7

Table 4.3: Overall recognition rates from four hyperspectral video test sequences’AA12 ’, ’R134a6’, ’SF6 27 ’, and ’TEP 9 ’.

(a) (b)

Figure 4.7: Chemical detection performance from a frame of ”SF6 27” sequence viaapplying: (a) AMSD with the fore-known chemical type and a given chemical-freebackground frame (groundtruth) and (b) AMSD on the resulting classified chemicaland background component from our proposed JSR+L method

model, even not demanding the knowledge of the background content, validate the

robustness of our proposed sparsity-based representation with low-rank interfer-

ence algorithms. Furthermore, while providing very competitive results for chemical

classification in the tested hyperspectral sequences, they also reveal the background

component existing in the data samples, with which the area that chemical is present

can be detected. To verify this, we apply an adaptive matched subspace detector

70


(a) (b)

Figure 4.8: Chemical detection from a frame of ”TEP 9” sequence via applying:(a) AMSD with the fore-known chemical type and a given chemical-free backgroundframe (groundtruth) and (b) AMSD on the resulting classified chemical and back-ground component from our proposed JSR+L method

(AMSD) method [75] employing the generalized likelihood ratio between the projec-

tions to subspaces of two hypotheses denoting the gas plume absent and present to

detect the area the chemical gas appears. Figs. 4.7 and 4.8 shows the comparable

chemical detection results of AMSD applying on the ground truth chemical type

and a given chemical-free background frame and AMSD applying on the resulting

classified chemical and background component from our proposed JSR+L method

on ’SF6 27’ and ’TEP 9’ testing sequences. This further ascertain the robustness of

our proposed models in target detection and classification for hyperspectral imaging

data.

71


4.4 Robust Noise Speech Recognition

The third experiment that we conduct is the speech recognition under various

noisy conditions. In speech recognition, one normally has to face the case when the

recorded signals contain not only the speech signals of interest but also the various

interference audio signals which can be environmental noises (such as music noises,

or noises from street), background noises (such as car engine, factory machine, or

wind noises) or interference vocal noises from surrounding people. These noises are

normally unpredictable and sometimes may even dominate the main speech signals

to be recognized. A noise robust speech recognition model that can be totally

adaptive with noisy sources as well as efficiently works even with heavy noises is

therefore essential.

In this section, we propose a novel joint low-rank and sparsity framework for

speech recognition that is robust with various noisy conditions. The proposed meth-

ods only require the knowledge of the dictionary of speech exemplars which are la-

beled speech segments extracted from the training data [76]. A batch-processing of

multiple noisy speech segments in the mel-frequency cepstral (MFC) [77] domain

within a small time window are performed to sparsely represent the speech compo-

nent as linear combinations of atoms in speech training dictionary while the noisy

parts in all segments are separated and suppressed via modeling them as a low-

rank component in a joint optimization framework. The noise low-rank assumption

72


is verified with the observation that noisy components in a short period of time

normally stay stationary or have a high degree of correlation.

4.4.1 Introduction of Sparsity-based Speech Recog-

nition

Conventional speech recognition methods based upon hidden Markov models

(HMM) or Gaussian mixture models (GMM) [78] have been broadly proven to be

powerful in speech recognition when the levels of corrupted noises are insubstan-

tial. However, the performance of these methods normally degrades considerably

when the noisy environments are more complex and/or speech is corrupted by noisy

sources not formerly seen. Recently, an exemplar-based sparse representation frame-

work for speech recognition was developed in [76] which models each input noisy

speech as a sparse linear combination of speech and noise dictionary atoms. The

method is shown to perform better than HMM-based conventional recognizers at low

signal-to-noise ratios (SNRs). However, the model requires to have prior knowledge

of the training noises which is not always the case in reality.

Examplar-based Sparse Representation of Noisy Speech

This section introduces an overview of the work in [76] which is one of the

state-of-the-art sparsity-based speech recognition algorithms under noisy conditions.

73


In speech recognition, signals are normally not operated in their original forms.

In stead, they are transformed and processed in a featured spectrogram domain

which represents the spectro-temporal distribution of acoustic signals’ energy. Mel-

frequency cepstrum (MFC) [77] is one of the most common spectrogram transforma-

tions that are used in sparsity-based speech recognition methods since it tentatively

reserves the additivity property of speech and noise. The examplar-based approach

used in [76] is also operated in the MFC domain. Once the signal is transformed

into Mel-scale magnitude spectrogram domain, a sliding time window shift is then

processed by dividing an MFC utterance into a number of overlapping, fixed-length

windows to form a speech utterance matrix YYY S whose columns correspond to the

sequence of these windowed utterance segments. Noisy speech segments can now be

compactly represented in batch as a sparse linear combination of both speech and

noise exemplar dictionaries as followed:

YYY S = DDDSAAAS +DDDNAAAN

=

[DDDS DDDN

] AAAS

AAAN

(4.6)

whereDDDS andDDDN are given dictionaries of speech and noise sources; andAAAS andAAAN

are their corresponding sparse coefficient matrices. Consequently, DDDSAAAS andDDDNAAAN

correspond to the speech and noise spectrogram components constituting observed

noisy signals, respectively. A sparsity-based recovery algorithm is then adopted to

74


define the sparsity coefficients of both speech (AAAS) and noise (AAAN) simultaneously.

Finally, the sparse coding results for overlapping windows can be recombined and

averaged to determine the speech that utterance YYY S belongs to.


The exemplar-based sparse representation approach proposed in [76] has been

proved to be robust with varied noisy scenarios. However, this method is heavily

based upon the availability of a training noise dictionary, hence cannot be applied

for speech recognition under the presence of unpredicted and unobserved noises. Put

it differently, if a new test sample consists of noises whose sources are not included

in the noise dictionary DDDN , the model in [76] is unlikely to be able to recognize

that utterance. In this section we propose to use our proposed join low-rank and

sparse representation models to solve for the robust noise automatic speech recogni-

tion problem which is fully adaptive with noise sources. Our models are also based

on the exemplar-based approach to transform the raw speech data into the MFC

domain, followed by a time overlapping window shifting technique. We propose

that the speech component of each testing sample can be sparsely represented with

respect to a clean training speech dictionary while its noise part can be automati-

cally suppressed if it satisfies low-rank property. The low-rank assumption for the

noise part can be validated hinging on the observation that unwanted sound noises

75


recorded within a small time frame often stay stationary and normally occupy cer-

tain frequency ranges in the spectrogram. Some examples of these noise sources

include vent wind noise, the sound from a machine operating nearby, a car running

by, hospital noises or noises coming from an operating jet engine, just name a few.

Under the aforementioned analysis, our proposed sparse-representation-plus-low-

rank signal model for a robust noise speech recognition system can be illustrated as

follows

YYY S = DDDSAAAS +LLLS, (4.7)

where DDDS and AAAS are the corresponding dictionary and sparse codes of the speech

component, and LLLS denotes the low-rank noise/interference of the speech obser-

vation. Furthermore, in a speech recognition problem, the dictionary DDDS is the

concatenation of multiple sub-dictionaries of speeches. Therefore, we execute the

GSR+L algorithm over the speech data model (4.7). It is noted that SR+L algo-

rithm can also be applied on (4.7) without taking advantage of the group information

of the modeling dictionary.


We validate the proposed models for a digit recognition problem on the AURORA-

2 database [79] which contains connected digit utterances of various people speaking

sequences of digits corrupted by various noises at different SNRs. The speech dic-

tionary is constituted by 11 sub-dictionaries taken from a clean training data set,

76


clean 20 15 10 5 0 −520

30

40

50

60

70

80

90

100Test set A

SNR (dB)

Cla

ssifi

catio

n ra

te (%

)

MDTGMMSR with training noiseProposed SR+LProposed GSR+L

Figure 4.9: Comparison of digit speech recognition results - test set A.

in which the first 10 sub-dictionaries containing training samples of single digits ’0’

to ‘9’ and the last one is from training set of people speaking ‘oh’ (an alternative

illustration of ‘zero’ ). The AURORA-2 corpus data contains two test sets A and

B in which test set A comprises noisy subsets of four different noise types: subway,

car, babble, and exhibition hall at six SNR values: 20, 15, 10, 5, 0, and -5 dB and

test set B contains four different noise types: restaurant, street, airport, and train

station selected at the same SNR levels. Furthermore, each training material of

AURORA-2 consists of a clean and a multi-condition training set by mixing the

clean utterances with noises at various SNRs: 20, 15, 10, and 5 dB.

Our two proposed methods, SR+L and GSR+L are processed through both test

sets to subtract the low-rank noise and determine the coefficient matrix AAAS. A

class label for each utterance is then assigned using minimal error residual classi-

77


clean 20 15 10 5 0 −520

30

40

50

60

70

80

90

100Test set B

SNR (dB)

Cla

ssifi

catio

n ra

te (%

)

MDTGMMSR with training noiseProposed SR+LProposed GSR+L

Figure 4.10: Comparison of digit speech recognition results - test set B.

fiers. The results are then compared with other popular speech recognizers such

as missing data technique (MDT) [80], and GMM-based model 78, as well as the

exemplar-based sparse representation framework in [76] to verify the effectiveness

of the proposed methods. It is noted that in our proposed models, only clean utter-

ances are extracted to construct the training speech dictionaries while the compet-

ing sparsity-based representation method requires the information of both training

speech and noise dictionaries. The classification rates defined as ratios of the total

number of correctly recognized utterances to the total number of testing utterances

in percentages are plotted in Fig. 4.9 and Fig. 4.10, corresponding to two testing

data sets A and B, respectively. The experimental results with particular datasets

show that all methods constantly achieve high classification performance when the

noise levels are weak. However at substantially low SNRs, our proposed models

78


(a) (b)

(c) (d)

Figure 4.11: Decomposition results of GSR+L on the MFC coefficient domain fora speech test sample of digit number 7 corrupted via car engine noise at SNR=-5dB: (a) Original clean speech; (b) Noisy speech observation; (c) Recovered speechcomponent (sparse representationDDDSAAAS); and (d) Recovered noise component (low-rank noise LLLS).

79


(a) (b)

(c) (d)

Figure 4.12: Decomposition results of GSR+L on the MFC coefficient domain fora speech test sample of digit number 5 corrupted via vent wind noise at SNR=-10dB: (a) Original clean speech; (b) Noisy speech observation; (c) Recovered speechcomponent (sparse representationDDDSAAAS); and (d) Recovered noise component (low-rank noise LLLS).

80


outperform the other conventional speech recognizers and the sparsity model de-

veloped in [76]. These results prove that the proposed algorithms is completely

adaptive with different noise cases and very effective given that noise levels are very

high and noise conditions are quite complex.

Furthermore, to show the detail insides of our models, we visually illustrate in

Figs. 4.11 and 4.12 the decomposition results of sparse representation and low-rank

components for our proposed GSR+L on the MFC coefficient domain of two testing

samples. The close similarity between the original clean speech samples and the

recovered speech component even at very high noise levels (SNR=-5dB in the first

case and SNR=-10dB in the second one) further verify the sparse and low-rank

assumptions as well as show the robustness of our proposed algorithms.

4.5 Video-based Facial Expression Recog-

nition

In this section, we will explore another compelling idea of utilizing the proposed

structural sparse representation and low-rank decomposition models in a problem of

video-based facial expression recognition. Automatic recognition of human expres-

sions is an important problem in machine learning, especially in the applications of

robotics and natural human-machine interfaces. Given a video of a human face, the

81


question is how to recognize his/her emotional expression with the assumption that

no information of the person has been provided in advance? In other words, we want

to develop an algorithm that is capable of automatically identify if an individual is

happy or sad, surprise or fear, disgust or shy, etc, given that he or she has not been

observed before.

4.5.1 Motivations

Although humans can recognize facial expressions virtually without much effort

or delay, a reliable expression recognition by machine is still a challenging problem.

Most of the existing methods either model an expressed face as a graph of tracking

landmark points or learn key features that carry more amount of information related

to discriminative emotions like eyebrows and mouth [81].

Recently, an interesting approach namely bilinear model was proposed by Tanen-

baum et. al. [82]; which is based on the key idea that an expressed face can be

decomposed into an invariant component (neutral face) and an expression part (emo-

tion). Motivated by that idea, in [83], a dictionary-based sparsity model is developed

that representing an observed face as a sparse linear representation over a combined

dictionary of neutral faces and emotions of a set of training samples. This pro-

vides a framework that is capable of both detecting a person’s identity as well as

recognizing the emotion that he or she is expressing. The model, however, is com-

82


pletely dependent on the availability of a dictionary of neutral faces. Furthermore,

that dictionary need to include the information of the person that the algorithm is

processing. Therefore, the algorithm is limited to the scenario that information of

tested people have been precedently stored in the algorithm’s database before being

processed.


Our proposed model of video facial expression recognition is based on the as-

sumption that the expressed face in each frame is the superimposition of a neutral

face and an expression part or emotion. By concatenating consecutive frames into

columns of a matrix YYY E = [yyyE1|yyyE2|...|yyyEK ] where the column yyyEi (i = 1, 2, ..., K) is

taken by vectorizing a cropped face in the frame ith of the testing sequence, matrix

YYY E can be decomposed into

YYY E = LLLE +XXXE. (4.8)

The separability of the neutral face and emotion elements can be visually perceived

in Fig. 4.13. The neutral component LLLE carries the distinguishable properties for

identifying the person and stays stationary over time, hence is a low-rank matrix.

Ideally, LLLE is a rank-one matrix with each column being the exact neutral content.

Matrix XXXE is the emotion term that contains the information for the recognition

of facial expressions. This term can be further represented as a sparse linear com-

bination of a well-prepared training emotions DDDE. The data model for the facial

83


Figure 4.13: Separations of neutral faces and expression components.

expression recognition problem can be recasted as:

YYY E = LLLE +DDDEAAAE, (4.9)

whereAAAE is the sparse coefficients that is customized for the expression classification

purpose.

It should be noted that in our experiments, the training emotions constituting

the dictionary DDDE are simply taken by deducting neutral faces from the original

expressed faces in a set of training sequences. Moreover, DDDE comprises C sub-

dictionaries DDDE = [DDDE1 ,DDDE2 , ...,DDDEC ] resembling C prototypic expressions such as

happiness, sadness, or anger, etc. All in all, the problem can be well formulated into

84


our GSR+L model and explicitly established as the optimization:

minAAAE ,LLLE

‖AAAE‖1 + λG

C∑c=1

‖AAAEc‖F + λL ‖LLLE‖∗

s.t. YYY E = DDDEAAAE +LLLE.

(4.10)

Similar to the preceding application’s comparison mechanism, we also verify the

framework using SR+L algorithm which does not account for the group-structured

property among data jungles. SR+L model for the video-based facial expression

recognition problem can be demonstrated using a simplified optimization as follows

minAAAE ,LLLE

‖AAAE‖1 + λL ‖LLLE‖∗

s.t. YYY E = DDDEAAAE +LLLE.

(4.11)


We conduct experiments on the CK+ dataset [84], which consists of 321 se-

quences of emotions with seven prototype labels: angry, contempt, disgust, fear,

happiness, sadness, and surprise, to verify the effectiveness of our proposed meth-

ods. Furthermore, we compare our methods with the sparse representation for clas-

sification (SRC) model which was empirically argued in [83] as the state-of-the-art

technique in recognizing human being’s facial emotions.

Each facial video in the data set contains the first frame as a neutral face of

a person, followed by a sequence of faces of gradually increasing expressions. The

dataset is separated into two non-overlapping sets in which one is served as the

85


An Co Di Fe Ha Sa Su

An 0.71 0.01 0.07 0.02 0.01 0.03 0.16Co 0.07 0.60 0.02 0 0.16 0.03 0.12Di 0.04 0 0.93 0.02 0.01 0 0Fe 0.16 0 0.09 0.25 0.25 0 0.26Ha 0.01 0 0 0.01 0.96 0 0.02Sa 0.22 0 0.13 0.01 0.04 0.24 0.35Su 0 0.01 0 0 0.01 0 0.98

Table 4.4: Confusion matrix for SRC based emotion recognition with neutral facesexplicitly provided, in a similar setting with [85]. We randomly choose half fortraining and the other half for testing per class. The optimizer is OMP and thesparsity level is set to 35%. Results are averaged over 20 runs and rounded to thenearest. The total recognition rate is 0.80.

training and the other is used as the testing set. The training set of sequences are

then utilized to generate the emotion dictionaryDDDE which are ordinarily constituted

by taking the difference between the first neutral frame and the last expressed frame

in every sequence. For SRC, an additional dictionary of neutral frames taken from

sequences in both training and testing sets is also required. This means, in order to

make SRC work, the person of testing must have been previously observed. On the

other hand, our proposed models are verified in more challenging scenarios which

are completely independent from the knowledge of neutral face information.

Tables 4.4, 4.5, and 4.6 demonstrate the confusion matrices of the recognition

results, expressed as rates of exactly classified samples over the total numbers of

testing sequences for particular prototypic expressions, for SRC method and our

proposed SR+L and GSR+L models, respectively. In all tables, ’An’, ’Co’, ’Di’,

’Fe’, ’Ha’, ’Sa’ and ’Su’ are corresponding abbreviations of the seven labeled expres-

86



An 0.51 0 0.10 0.02 0 0.31 0.06Co 0.03 0.63 0.03 0 0.04 0.26 0.01Di 0.04 0 0.74 0.02 0.01 0.15 0.04Fe 0.08 0 0.01 0.51 0.03 0.19 0.18Ha 0 0.01 0 0.03 0.85 0.08 0.03Sa 0.09 0 0.04 0.04 0 0.70 0.13Su 0 0.01 0 0.02 0.01 0.02 0.94

Table 4.5: Confusion matrix for SLR based emotion recognition without explicitlyknowing neutral faces. We randomly choose 15 sequences for training and 10 fortesting per class. We let the optimizer run for 100 iterations. Results are averagedover 20 runs and rounded to the nearest. The total recognition rate is 0.70 with astandard deviation of 0.14.


An 0.77 0.01 0.09 0.02 0 0.07 0.04Co 0.08 0.84 0 0 0.03 0.04 0Di 0.05 0 0.93 0.01 0.01 0.01 0Fe 0.09 0.01 0.03 0.53 0.12 0.07 0.15Ha 0.01 0.02 0.01 0.02 0.93 0 0.03Sa 0.19 0.02 0.02 0.05 0 0.65 0.07Su 0 0.02 0 0.02 0 0.02 0.95

Table 4.6: Confusion matrix for C-HiSLR based emotion recognition on CK+dataset without knowing neutral faces. We randomly choose 15 sequences for train-ing and 10 sequences for testing per class. We let the optimizer run for 600 iterations.Results are averaged over 20 runs and rounded to the nearest. The total recognitionrate is 0.80 with a standard deviation of 0.05.

87


sions: angry, contempt, disgust, fear, happiness, sadness, and surprise. Overall, SRC

provides an average accurately recognized rate of 80% while SR+L is lowered by just

10%. This outlines very potential results of SR+L, even when the method does not

break out the dictionary of neutral faces and only one single low-rank assumption

is utilized for the neutral contents. Moreover, when the group/class information is

further explored in GSR+L, the overall exact recognition rate is boosted to 80%,

equalized to that of SRC. This performance, together with the fact that it can be

applied to never-been-observed human, expose another application domain that our

proposed models can be elegantly utilized.

88

Chapter 5

Multi-sensor Classification via

Sparsity-based Representation

with Low-rank Interference

In chapters 3 and 4, we studied the general framework based on simultaneously

minimizing the structured sparsity of the main signals and the rank of the interfer-

ences among multiple observations of the same signal type. The robustness of the

proposed models was verified in various application domains. In this chapter, we

further extend the framework for multi-sensor problems when multiple sources/sen-

sors within a small neighborhood simultaneously record the same physical event.

Specifically, we propose a collaborative sparse representation framework for multi-

sensor classification which exploits correlation as well as complementary information

89

CHAPTER 5. MULTI-SENSOR CLASSIFICATION

among homogeneous and heterogeneous sensors while simultaneously extracting the

low-rank interference term. Furthermore, we extend our framework to kernelized

models which rely on sparsely representing a test sample in terms of all the training

samples in a feature space induced by a kernel function. Extensive experiments are

conducted on a real data set for a multi-sensor classification problem focusing on

discriminating between human and animal footsteps. Results are compared with

conventional classifiers and existing sparsity-based representation methods to verify

the effectiveness of our proposed models.

5.1 Introduction

Multi-sensor classification has been an active research topic within the context

of various practical applications, such as medical image analysis, remote sensing,

and military target/threat detection [86–88]. These applications normally face the

scenerio where data sampling is performed simultaneously from multiple co-located

sources/sensors, yet within a small spatio-temporal neighborhood, recording the

same physical event. This commonplace scenario allows exploitation of the comple-

mentary features within the related signal sources to improve the resulting classifi-

cation performance. A variety of approaches have been proposed in the literature

to answer this question [89, 90]. These methods mostly fall into two categories:

decision in - decision out (DI-DO) and feature in - feature out (FI-FO) [88]. In

90


[89], the authors investigated the DI-DO method for vehicle classification problem

using data collected from acoustic and seismic sensors. They proposed to perform

local classification (decision) for each sensor signal by conventional methods such as

SVM. These local decisions are then incorporated via a maximum a posterior (MAP)

estimator to arrive at the final classification decision. In [90], a FI-FO method is

studied for vehicle classification using both visual and acoustic sensors. A method

is proposed to extract temporal gait patterns from both sensor signals as inputs

to an SVM classifier. Furthermore, the authors compared the DI-DO and the FI-

FO approaches on their dataset and showed a higher discrimination performance

of the FI-FO approach over the DI-DO counterpart. Conventional classifiers such

as sparse logistic regression (SLR) [66] and SVM [67] have also been employed to

jointly classify signals from multiple sensors/sources, either at decision level [89] or

feature level [88, 90].

In [1], Nguyen et. al. developed a multi-sensor fusion model based on the sparse

representation-based classification (SRC) theory, an algorithmic advance based on

the assumption that all of the samples belonging to the same class lie approximately

in the same low-dimensional subspace and was first proposed for face recognition

[32]. In this work, the related information from multiple sensors is fused via a

joint sparsity constraint not only within observations of each sensor but also among

all sensors. The model is also extended to account for sparse noise and empiri-

cally shown to outperform powerful classifiers widely used in machine learning in

91


a multi-sensor border patrol classification problem to discriminate between human

and human-animal footsteps.

Motivated by the work in [1], we propose a collaborative multi-sensor sparse rep-

resentation method for classification, which also incorporates simultaneous structured-

sparsity constraints, demonstrated via a row-sparse coefficient matrix, both within

and across multiple sensors. However, instead of considering the noise as a sparse

term as in [1], we study the case of presenting noise/interference as a low-rank signal.

This scenario is normally observed when the recorded data is the superimposition

of target signals with interferences which can be signals from external sources, the

underlying background that is inherently anchored in the data, or any pattern noise

that remains stationary during signal collection. These interferences normally have

correlated structure and appear as a low-rank signal-interference/noise. Regularly,

the model with the low-rank interference may be more appropriate for a multi-sensor

dataset since the sensors are spatially co-located and data samples are temporally

recorded, thus any interference from external sources will have similar effect on all

the multiple sensor measurements. The significant improvement in classification

performance conducted on the same multi-sensor datasets as in [1] further verifies

this low-rank assumption. Moreover, we extend the framework further to cover the

integration of a group sparse regularization into our model and the utilization of the

sparsity-based representation in the kernel induced feature space, both yielding one

more layer of classification robustness.

92


5.2 Multi-sensor Classification via Spar-

sity Models

Consider a multi-sensor system containing M sensors (so-called M tasks or

modalities) used to solve a C-class classification problem. For each sensor m =

1, ...,M , we denote DDDm = [DDDm1 ,DDD

m2 , ...,DDD

mC ] as an N × P dictionary, consisting of

C sub-dictionaries DDDmc ’s with respect to C classes. Here, each class sub-dictionary

DDDmc = [dddmc,1, ddd

mc,2, ..., ddd

mc,Pc

] ∈ RN×Pc , c = 1, ..., C, represents a set of training data

from the mth sensor labeled with the cth class; N is the feature dimension of each

sample; and Pc is the number of training samples for class c, resulting in a total of

P =∑C

c=1 Pc samples in the dictionary DDDm. Given a test sample set YYY collected

from M sensors YYY = [YYY 1,YYY 2, ...,YYY M ], where each sample subset YYY m from the sensor

m consists of T observations YYY m = [yyym1 , yyym2 , ..., yyy

mT ] ∈ RN×T , we would like to decide

which class the sample YYY belongs to. In our application, each observation is one

local segment of the test signal, where segments are obtained by simultaneously

partitioning the test signal of each sensor into T (overlapping) segments, as shown

in the Fig. 5.1.

In [1], a multi-sensor joint-sparse representation (MS-JSR) method is developed

by imposing shared row-sparsity constraints on all the coefficient matrices AAAm’s in

the linear representation of the sample subset YYY m over the training dictionary DDDm:

93


Sensor 1

Sensor m

Sensor M

Figure 5.1: Multi-sensor sample construction.

YYY m = DDDmAAAm, i.e. enforcing sparse coefficient vectors obtained from all sensors

to share similar sparsity patterns, hence promoting the common class association.

The model is then extended to deal with the presence of sparse noise, such as

impulsive noise or wind noise, which provides an improvement in the classification

performance.

5.2.1 Multi-sensor Joint-Sparse Representation with

Low-rank Interference

In this section, we study a new multi-sensor model that is capable of coping

with dense and large but correlated noises, so-termed low-rank interferences. This

94


scenario often happens when there are external sources interfering with the recording

process of all sensors. Since all the sensors are mounted onto a common sensing

platform, recording the same physical events simultaneously, similar interference

sources are picked up across all the sensors promoting large but low-rank corruption.

These interference sources may include sound and vibration from a car passing

by, a helicopter hovering nearby, or interference from any radio-frequency source.

Furthermore, in many situations, the recorded data may contain not only the signal

of interest but also the intrinsic underlying background noise that sometimes is not

well trained. In a multiple sensor system, the background portion of the signals

recorded by various sensors in a short span of time, especially sensors of the same

type and located within a small local area, should be stationary, hence raising a low-

rank background interference. Our proposed multi-sensor joint sparse representation

with low-rank interference (MS-JSR+L) model is expected to tackle this problem by

extracting the low-rank component while collaboratively taking advantage of having

correlated information from different sensors.

Mathematically, each set of measurements YYY m collected from sensor m are com-

posed of a linear representation of the sub-dictionaryDDDm’s atoms and an interference

component LLLm: YYY m = DDDmAAAm + LLLm. By concatenating the interference matrices

LLL = [LLL1,LLL2, ...,LLLM ] and the coefficient matrices AAA = [AAA1,AAA2, ...,AAAM ], LLL becomes

a low-rank component while AAA should be sparse at full row level. The coefficient

matrixAAA and low-rank interference component LLL can be recovered jointly by solving

95


the optimization problem

minAAA,LLL

‖AAA‖1,q + λL ‖LLL‖∗

s.t. YYY m = DDDmAAAm +LLLm (m = 1, ...,M),

(5.1)

where the norm ‖AAA‖1,q with q > 1 encourages shared sparsity patterns within each

sensor and across multiple sensors; the nuclear matrix norm ‖LLL‖∗ enforces the in-

terference effecting all sensors to stay in a low-dimensional space; and λL > 0 is a

weighting parameter balancing the two regularization terms.

Once the solution AAA, LLL of (5.1) is computed, the class label of YYY is decided by

the minimal residual rule

Class(YYY) = argminc=1,...,C

M∑m=1

∥∥∥YYY m −DDDmc AAA

m

c − LLLm

c

∥∥∥2

F, (5.2)

where DDDmc , AAA

m

c , and LLLm

c are the induced matrices associated with the cth class and

mth sensor, respectively. This step can be interpreted as collaboratively assigning

the class label of YYY to the class that can best represent all sample subset YYY m’s.

The model in (5.1) has the capability to extract the low-rank approximation in LLL

while promoting sparsity at row level in the concatenated matrixAAA at the same time.

Moreover, it is inherently robust in the case of existing outliers in the data samples.

In other words, many cases of sparse noise such as nonzero-row or nonzero-column

sparse matrices can also be viewed as a low-rank structure and recovered by the

nuclear norm minimization of ‖LLL‖∗, resulting a more flexible model in MS-JSR+L.

Take the data collected by the multi-sensor system presented in the experiment

section 5.4 as an example, it is frequently observed that the sparse noise compo-

96


nent appears as nonzero rows (corruptions at certain frequency bands) or nonzero

columns (sensor failures) which essentially can be extracted by a low-rank nuclear-

norm minimization. To be more specific, if a small fraction of the measurements

yyymt in YYY (m = 1, ...,M and t = 1, ..., T ) is grossly corrupted or all measurements

yyymt are affected at the same frequency sampling locations, each YYY m can be fur-

ther decomposed into YYY m = DDDmAAAm + LLLm + EEEm, where EEE = [EEE1,EEE2, ...,EEEM ] is

the concatenation of sparse noise components effecting all sensors that contains a

small number of non-zero columns or rows. While EEE is a sparse matrix, it can also

be viewed as a low-rank component, hence the summation LLL = LLL +EEE of the true

low-rank interference LLL and the outlier component EEE is also low-rank. Therefore,

we can model this new problem as seeking a low-rank component LLL and row-sparse

matrix AAA simultaneously.

In some applications (which are outside the scope of this paper) where sparse

and low-rank interferences are more discriminative, such as when locations of outlier

elements in EEE are more randomly distributed, the sparse noise cannot be considered

as a low-rank component and vise versa. Accordingly, we can extend the framework

further to cover a multi-sensor model that is robust even in the presence of both

sparse noise and low-rank interference as the following

minAAA,EEE,LLL

‖AAA‖1,q + λE ‖EEE‖1 + λL ‖LLL‖∗

s.t. YYY m = DDDmAAAm +EEEm +LLLm (m = 1, ...,M).

(5.3)

97


5.2.2 Multi-sensor Group-Joint-Sparse Represen-

tation with Low-rank Interference

MS-JSR+L has the capability to extract correlated noise/interference while si-

multaneously exploiting inter-correlation of multiple sensors in the coefficient ma-

trix by enforcing row-level sparsity, hence boosting the overall classification result.

Moreover, this model can even be strengthened by further incorporating the group

sparsity constraint into the coefficient matrix AAA. This leads to a group sparse rep-

resentation where the dictionary atoms are grouped and the sparse coefficients are

enforced to have only a few active groups at a time. Therefore, our desired classifica-

tion model not only enforce the number of active groups to be small, but also inside

each group only a few rows are forced to be active at a time, resulting in a two-

sparsity-level model: group-sparse and row-sparse in a combined cost function. The

new model searches for the group-and-row sparse structure representation among

all sensors and low-rank interference simultaneously and is termed MS-GJSR+L

minAAA,LLL

‖AAA‖1,q + λG

C∑c=1

‖AAAc‖F + λL ‖LLL‖∗

s.t. YYY m = DDDmAAAm +LLLm (m = 1, ...,M),

(5.4)

where AAAc = [AAA1c ,AAA

2c , ...,AAA

Mc ] is the concatenation of all sub-coefficient matrices AAAmc ’s

induced by the labeled indices corresponding to class c; the group regularizer defined

98


by the second term in (5.4) tends to minimize the number of active classes (groups)

in the same matrixAAA; and λG ≥ 0 is the weighting parameter of the group constraint.

Consequently, the model promotes group-sparsity and row-sparsity within a group

at the same time, in parallel with extracting the low-rank interference appearing in

all measurements all together. Once the solutions of coefficient matrix and low-rank

term are recovered, the class label of YYY is decided by the same function (5.1).

The optimization framework (5.4) is a more general form than all the other meth-

ods described earlier. Specifically, if we let λG = 0 then (5.4) becomes MS-JSR+L.

Furthermore, if we eliminate the presence of LLL (i.e., set LLL to be a zero matrix in

all optimization iteration), then it reduces to a framework where a joint-sparse con-

straint is advocated through out all sensors without taking care of the interference

noise. Note that different from the regularization of the group constraint, we can-

not set λL = 0 in this case, since otherwise the optimization (5.4) will erroneously

produce the irregular solution AAA, LLL = 000,YYY . Finally, if the number of sensor

reduces to M = 1, we simply have a joint-sparse representation with measurements

from a single sensor alone.

99


5.3 Multi-sensor Kernel Model

5.3.1 Background on Kernel Sparse Representa-

tion

Sparse representation has been widely known as an efficient method for classi-

fication when the test sample can be sparsely represented as a linear combination

of the training samples in the original input domain. In this section, we extend the

linear sparse representation to the nonlinear kernel domain and show empirically

that kernel methods can be an effective solution for a multi-sensor classification

problem. In fact, classifiers such as SVM or SLR have been extensively validated

in many applications that they perform better in the kernel domain. The reason is

that if the classes in the data set are not linearly separable, then the kernel methods

can be used to project the data onto a feature space, in which the classes become

linearly separable [20, 21].

Denote κ : RN × RN 7→ R as the kernel function, defined as the inner product

κ(xxxi,xxxj) = 〈φ(xxxi), φ(xxxj)〉 , where φ : xxx 7→ φ(xxx) is an implicit mapping that maps the

vector xxx onto a higher dimensional space, possibly infinite. Note that in general the

mapping function φ is not explicitly defined, but rather characterized by the dot

product of two functions. Commonly used kernels include the radial basis function

(RBF) Gaussian kernel κ(xxxi,xxxj) = exp(−‖xxxi − xxxj‖22 /η

2) with η used to control the

100


width of the RBF, and the order-d polynomial kernel κ(xxxi,xxxj) = (xxxi ·xxxj+1)d [74,91].

To describe the multi-sensor model in the kernel feature domain, we use similar

notations to define test samples and their dictionaries for multiple sensors as in

section 5.2, while the assumption is that the representations in the nonlinear kernel-

induced space of test samples within a sensor and among different sensors share the

same support sets.

Let YYY m = [yyym1 , yyym2 , ..., yyy

mT ] ∈ RN×T be the set of T test samples for sensor m

and ΦΦΦ(YYY m) = [φφφ(yyym1 ),φφφ(yyym2 ), ...,φφφ(yyymT )] be their mapping in the feature kernel space.

The kernel sparse representation of YYY m in terms of the training samples dddmp Pp=1

can be formulated as

ΦΦΦ(YYY m) = ΦΦΦ(DDDm)AAAm (m = 1, ...,M), (5.5)

where ΦΦΦ(DDDm) = [φφφ(dddm1 ),φφφ(dddm2 ), ...,φφφ(dddmP )] are training samples in the feature space

andAAAm is a sparse coefficient matrix associated with the signals from the mth sensor.

We recall that the coefficient matrix can be seen as the discriminative feature for

classification.

101


5.3.2 Multi-sensor Kernel Group-Joint Sparse Rep-

resentation with Low-rank Interference.

In section 5.2, we learned about different models that incorporate the structure

of noise into the optimization cost function to improve the classification accuracy.

Especially, we analytically and empirically showed that enforcing noise/interference

as a low-rank structure is of critical benefit in a multi-sensor problem. Moreover,

low-rank is a reasonable assumption to describe the structure of noise/interference

even through a nonlinear kernel mapping. This is due to the fact that although a

kernel transformation may distort the interferences from their original forms in the

time domain, the effects of the interferences on the multi-sensor test samples are

still analogous. This is even more valid when the processing data is taken from the

same sensor types. The dictionary-based description of test samples in (5.5) can

then be adapted as

ΦΦΦ(YYY m) = ΦΦΦ(DDDm)AAAm +LLLmφ (m = 1, 2, ...M) (5.6)

with LLLφ = [LLL1φ,LLL

2φ, ...,LLL

Mφ ] being the low-rank corruption in the kernel space for the

sample set YYY .

Since ΦΦΦ is not clearly define, (5.6) cannot be evaluated directly but is implicitly

represented in the feature space using the kernel trick. That means, we do not need

to explicitly express the data in the feature space; rather, we only evaluate the ker-

102


nel functions at the training points [92]. In fact, by left multiplying (ΦΦΦ(DDDm))T

on both sides of (5.6), it can be equally reformulated as: (ΦΦΦ(DDDm))TΦΦΦ(YYY m) =

(ΦΦΦ(DDDm))TΦΦΦ(DDDm)AAAm+(ΦΦΦ(DDDm))TLLLmφ . DefineKKKDDDmYYYm = (ΦΦΦ(DDDm))TΦΦΦ(YYY m),KKKDDDmDDDm =

(ΦΦΦ(DDDm))TΦΦΦ(DDDm), and KKKLLLm = (ΦΦΦ(DDDm))TLLLmφ . Then, (i, j) entry of KKKDDDmDDDm is defined

as the dot product between two dictionary atoms: κ(dddmi , dddmj ) =

⟨φφφ(dddmi ),φφφ(dddmj )

⟩, and

(i, t) entries of KKKDDDmYYYm and KKKLLLm are defined as the dot product between the dic-

tionary atom φφφ(dddmi ) and the observation sample φφφ(yyymt ): κ(dddmi , yyymt ) = 〈φφφ(dddmi ),φφφ(yyymt )〉

and the interference sample φφφ(lllmt ): κ(dddmi , lllmt ) = 〈φφφ(dddmi ),φφφ(lllmt )〉, respectively. Fur-

thermore, by stacking KKKLLL = [KKKLLL1 ,KKKLLL2 , ...,KKKLLLM ] and using simple algebras it can

be proved that rank[KKKLLL1 ,KKKLLL2 , ...,KKKLLLM ] ≤ rank[LLL1φ,LLL

2φ, ...,LLL

Mφ ], or the low-rank

assumption on LLLφ still hold for KKKLLL.

Under all the assumptions above, we propose a multi-sensor kernel group-joint

sparse representation with low-rank interference (MS-KerGJSR+L) which is the

kernelized extension of MS-GJSR+L

minAAA,KKKLLL

‖AAA‖1,q + λG

C∑c=1

‖AAAc‖F + λL ‖KKKLLL‖∗

s.t. KKKDDDmYYYm = KKKDDDmDDDmAAAm +KKKLLLm , (m = 1, ...,M).

(5.7)

The classification assignment for YYY in the kernelized model (5.7) is slightly modified

from (5.2) with attentive manipulations of involving kernel functions.

103


5.4 Experimental Results

5.4.1 Experimental Setups

1. Data collection. Footstep data collection was conducted by using two

sets of nine sensors consisting of four acoustic, three seismic, one passive infrared

(PIR), and one ultrasonic sensors over two days (see Fig. 5.2 for all four different

types of sensors). The test subjects are human only and human leading animals

(human-animal), in which the human only footsteps include one person walking,

one person jogging, two people walking, two people running, and a group of people

walking or running; whereas the human-animal footsteps include one person leading

a horse or dog, two people leading a horse and a mule, three people leading a horse,

a mule and a donkey, and a group of multiple people with several dogs. To make

the data more practical, in each test, the test subjects are asked to carry varying

loads, such as backpack or a metal pipe. In addition, test participants might include

males, females or both. Ideally, we would like to discriminate between human and

wild animal footsteps. However, footstep data with only wild animals is difficult to

collect and this is the best data collection setup that researchers at the U.S. Army

Research Laboratory have performed.

During each run, the test subjects follow a path, where two sets of nine sensors

were positioned, and return to the starting point. The two sensor sets are placed

104


Data Human Human-animalDEC09 16 15DEC10 18 19

Table 5.1: Total amount of data collected in two days.

Figure 5.2: Four acoustic sensors (left), seismic sensor (middle left), PIR sensor(middle right) and ultrasound sensor (right).

100 meters apart on a path. A total of 68 round-trip runs were conducted in two

days, including 34 runs for human footsteps and another 34 runs for human-animal

footsteps. To increase the number of test and training samples, we consider each trip

going forward or backward as a separate run. In other words, the total number of

runs is doubled. The total collected data, named DEC09 and DEC10, corresponding

to two different days in December 09 and 10, are shown in Table 5.1.

2. Segmentation. To accurately perform classification, it is necessary to ex-

tract the actual events from the run series. Although the raw signal of each run

might last several minutes, the event is much shorter as it occurs in a short period

of time when the test subject is close to the sensors. In addition, the event can be

at arbitrary locations. To extract useful features, we need to detect time locations

where the physical event occurs. To do this, we identify the location with strongest

signal response using the spectral maximum detection method [93]. From this lo-

105


cation, ten segments with 75% overlap on both sides of the signals are taken; each

segment has 30, 000 samples corresponding to 3 seconds of physical signal. This pro-

cess is performed for all the sensor data. Overall, for each run, we have nine signals

captured by nine sensors; each signal is divided into ten overlapping segments, thus

M = 9 and T = 10 in our formulations.

Fig. 5.3 visually demonstrates the sensing signals captured by all nine sensors

for one segment where the ground-truth event is one person walking. As one can

observe, different sensors characterize different signal behaviors. The seismic signal

shows the cadences of the test person more clearly, while it is more difficult to

visualize this event from other sensors. In this figure, note that the forth acoustic

signal is corrupted due to the sensor failure during the collection process.

3. Feature extraction. After segmentation, we extract the cepstral features

[94] in each segment and keep the first 500 coefficients for classification. Cepstral

features have been proved to be very effective in speech recognition and acoustic

signal classification. The feature dimension, which is represented by the number of

extracted cepstral features, is N = 500.

5.4.2 Comparison Methods

To verify the effectiveness of our proposed approaches, we compare the results

with several conventional classification methods such as (joint) sparse logistic re-

gression (SLR) [66], kernel (joint) SLR [95], linear support vector machine (SVM),

106


0 0.5 1 1.5 2 2.5 3x 104

−0.05

0

0.05

secondma

gnitu

de

First accoustic sensor

0 0.5 1 1.5 2 2.5 3x 104

−0.2

0

0.2

second

magn

itude

Second accoustic sensor

0 0.5 1 1.5 2 2.5 3x 104

−0.05

0

0.05

second

magn

itude

Third accoustic sensor

0 0.5 1 1.5 2 2.5 3x 104

−10

−5

0

5x 10−4

second

magn

itude

Forth accoustic sensor

0 0.5 1 1.5 2 2.5 3x 104

−5

0

5x 10−3

second

magn

itude

First seismic sensor

0 0.5 1 1.5 2 2.5 3x 104

−10

−5

0

5x 10−3

second

magn

itude

Second seismic sensor

0 0.5 1 1.5 2 2.5 3x 104

−0.02

0

0.02

second

magn

itude

Third seismic sensor

0 0.5 1 1.5 2 2.5 3x 104

1.6

1.65

1.7

1.75

second

magn

itude

Pasive infrared sensor

0 0.5 1 1.5 2 2.5 3x 104

−1.935

−1.934

−1.933

−1.932

second

magn

itude

Ultrasonic sensor

Figure 5.3: Signal segments of length 30000 samples captured by all the availablesensors consisting of four acoustic, three seismic, one PIR and one ultrasonic sensors.

107


and kernel SVM [74]. For the conventional classifiers such as SVM and SLR, we

incorporate information from multiple sensors by concatenating all M sensors’ train-

ing dictionaries to form an elongated dictionary DDD ∈ RNM×P . Atoms of this new

dictionary DDD are considered as the new training samples and used to train the SVM

and SLR classifiers. These classifiers are then utilized to test on the concatenated

test segments and a voting scheme is finally employed to assign a class label for each

test signal. For the kernel versions, namely KerSVM and KerSLR, we use an RBF

kernel with bandwidth selected via cross validation.

Another more effective method to exploit the information across the sensors is

to use the heterogeneous model proposed in [95]. This model, called heterogeneous

feature machine (HFM), has been shown to be efficient in solving problems in which

various modalities (sensors) are simultaneously employed. The main idea of this

model is to associate to all the training data in each training dictionary DDDm ∈

RN×P an appropriate coefficient vector aaam ∈ RP , m = 1, ...,M , using a sparsity

or joint sparsity regularization together with the logistic loss taken over the sum

of all sensors. Once these coefficient vectors are obtained, each segment of the test

sample is then assigned to a class and the final decision is made by selecting the

label that occurs most frequently. This method can also be generalized to kernel

domain, referred as KerHFM in our experiments.

All the aforementioned methods (SVM, SLR, and HFM and the kernelized ver-

sions) can be seen as a combination of both the FI-FO and DI-DO categories:

108


feature-level fusion across multiple sensors and then decision-level fusion on the

observation segments of each test signal. Although these methods are efficient in

combining information from different sensors, they are clearly suboptimal in combin-

ing information within each sensor. One example to demonstrate this sub-optimality

is that if the event does not fully exist in all the observation segments, then fusing

all the observation segments at the decision level will probably result in a misclas-

sification.

5.4.3 Classification Results and Analysis

In this section, we perform extensive experiments on the multi-sensor data set

and compare with aforementioned methods to verify the effectiveness of our pro-

posed models. The variety of different sensor types allows us to test various sensor

combination setups, including single sensor, sensors of the same type or sensors of

different signal types, in order to provide deeper understanding of advantage and

disadvantages of each proposed method. For presentation purpose, we number the

nine sensors as S1, S2, ... S9 in which sensors S1−4, S5−7, S8 and S9 correspond to

the four acoustic, three seismic, one PIR, and one ultrasonic sensors, respectively.

For all methods, 15 combination sets of sensors are processed and compared,

in which the first nine sets are conducted separately using only one single sensor,

corresponding to S1, S2, ... S9. The next six sets combine multiple sensors into

various scenarios as listed in Table 5.2. It is noticed during experimentation that

109


Set 10 11 12 13 14 15

Sensors S1−2 S5−7 S1−4 S1−7 S1−2,5−9 S1−9

Table 5.2: List of sensor combinations.

part of the testing data collected from two acoustic sensors S3 and S4 in DEC09

is completely corrupted due to the malfunction of these two sensors. So in set

10 we only use the two clean acoustic sensors S1 and S2. Set 11 and 12 are the

combinations of signals of the same types with set 11 using all three seismic sensors

and set 12 using all four acoustic sensors, respectively. Set 13 utilizes all acoustic

and seismic signals. In set 14, we evaluate the effectiveness of using all four different

types of sensors including two clean acoustic sensors S1−2, three seismic sensor S5−7

as well as the PIR and ultrasonic sensors. And finally we use all the nine sensors

referred to as set 15.

In the first experiment, we use the DEC10 data for training and the DEC09

data for testing, which leads to 72 training and 60 testing samples. For each sensor

m, the corresponding training dictionary DDDm is constructed from all the cepstral

feature segments extracted from the 72 training signals. In our experiments, ten

overlapping segments are taken from each individual sensor signal. Therefore, each

training dictionary DDDm is of size 500 × 720 and the associated observation YYY m

is of size 500 × 10, where 500 is the feature dimension. Our proposed methods,

which are based on different assumptions of the structures of the sparse coefficient

vectors, the low-rank assumptions of noise/interference and the linearity properties

110


of signal representations, are processed for all the 15 sensor sets to determine the

joint coefficient matrix AAA and the class label is determined by the corresponding

minimal error residual classifiers. Note that we set q = 2 and use cross validation

to define all the regularization parameters that give the best accurate results. Next,

the different competing methods including conventional classifiers (SLR, SVM and

HFM), their kernelized models, as well as other sparsity-based methods (MS-JSR,

MS-JSR+E and MS-KerJSR corresponding to multi-sensor models using joint sparse

representation, joint sparse with sparse noise and joint sparse in kernel domain) are

performed on the same sensor sets. The classification rates, defined as the ratios

of the total number of correctly classified samples to the total number of testing

samples, expressed as percentages, are plotted in Fig. 5.4.

To validate the efficiency of our proposed methods, we rerun the experiments

using the DEC09 data for training and the DEC10 data for testing. Again, similar

phenomenons are observed as can be seen in Fig. 5.5. We tabulate the classification

performance of all the proposed models as well as the competing methods in Table

5.3-(a) and 5.3-(b), taking DEC09 and DEC10 as testing samples respectively. The

second and third columns in each table describe the classification accuracy by using

a single sensor and multiple sensors (which average the classification rates of sets 1-9

and 10-15, respectively), and the last column shows the overall results by averaging

over all 15 sensor sets. Furthermore, we illustrate in Table 5.4 the detail classification

performance of the set 15 which accommodates all nine sensors and is the most

111


1 2 3 4 5 6 7 8 9 10 11 12 13 14 1545

50

55

60

65

70

75

80

85

90

95

Sensor Set

Cla

ssifi

catio

n ra

te (%

)

MS−JSRMS−JSR+EMS−JSR+LMS−GJSR+LMS−KerJSRMS−KerGJSR+LSLRKerSLRHFMKerHFMSVMKerSVM

Figure 5.4: Comparison of classification results - DEC09 as test data.

interesting set among all sensor combinations. The last three columns correspond

to the classification accuracy of human (H), human-animal footsteps (HA), and the

overall accuracy (OA), respectively.

Figs. 5.4 and 5.5 visualize our proposed models with solid lines which clearly

show that they out-perform the other competing methods presented by dashed lines.

Especially, we observe the distinct leading performance of the four frameworks:

MS-JSR+L, MS-GJSR+L, MS-KerJSR, and MS-KerGJSR+L. Moreover, Table 5.3

points out that MS-GJSR+L exhibits the best performance when multiple sensors

are utilized and MS-KerGJSR+L achieves the highest average classification rate

when an individual sensor is used (bold numbers). The kernelized and low-rank

112


1 2 3 4 5 6 7 8 9 10 11 12 13 14 1545

50

55

60

65

70

75

80

85

90

95

Sensor Set

Cla

ssifi

catio

n ra

te (%

)

MS−JSRMS−JSR+EMS−JSR+LMS−GJSR+LMS−KerJSRMS−KerGJSR+LSLRKerSLRHFMKerHFMSVMKerSVM

Figure 5.5: Comparison of classification results - DEC10 as test data.

interference joint method also achieves the best classification rate when averaging

results of all 15 examining sensor sets (with MS-GJSR+L as the closest runner up).

Next, we take a closer look to have a better understanding and summarizes the main

practical benefits of our proposed models.

Combining different sensors. It is obvious from the plots in Figs. 5.4 and

5.5 that there is a significant performance boost in the classification results going

from data sets 1-9 (using single sensor) to data sets 10-15 (using multiple sensors);

demonstrating the improvement of incorporating multi sensors in our sparsity-based

representation methods over only processing signals within one sensor alone. Quan-

titatively, the average improvements of multiple sensors over individual sensor range

113


MethodsSinglesensor

Multiplesensors

Combineall sets

MS-JSR 66.30 77.22 70.67MS-JSR+E 66.85 80.28 72.22MS-JSR+L 76.48 90.28 82.00

MS-GJSR+L 78.70 91.67 83.89MS-KerJSR 78.15 89.72 82.78

MS-KerGJSR+L 79.44 90.56 83.89SLR 64.44 71.94 67.44

Ker-SLR 66.85 75.56 70.33HFM 65.37 71.39 67.78

Ker-HFM 67.41 76.67 71.11SVM 60.56 70.56 64.56

Ker-SVM 67.41 74.72 70.33

(a) DEC09 as test data

MethodsSinglesensor

Multiplesensors

Combineall sets



MS-KerGJSR+L 80.25 90.74 84.44SLR 62.65 74.54 67.41

Ker-SLR 69.91 76.39 72.50HFM 65.90 73.61 68.98

Ker-HFM 69.14 78.24 72.78SVM 62.65 70.37 65.74

Ker-SVM 68.52 75.69 71.39

(b) DEC10 as test data

Table 5.3: Summarized classification results of single sensor sets, multiple sensorsets, and combining all sets.

from 10% to 15% (as seen in Table 5.3 ) among all methods. Additionally, these

plots also demonstrate that the more sensor we have, even sensors of the same or

114


Methods H HA OA



MS-KerGJSR+L 90.00 100.00 95.00SLR 76.67 66.67 71.67

Ker-SLR 100.00 50.00 75.00HFM 70.00 76.67 73.33

Ker-HFM 100.00 53.33 76.67SVM 100.00 46.67 73.33

Ker-SVM 100.00 53.33 76.67

(a) DEC09 as test dataMethods H HA OA



MS-KerGJSR+L 86.11 100.00 93.06SLR 61.11 91.67 76.39

Ker-SLR 66.67 86.11 76.39HFM 61.11 86.11 73.61

Ker-HFM 58.33 100.00 79.17SVM 55.56 86.11 70.83

Ker-SVM 69.44 80.56 75.00

(b) DEC10 as test data

Table 5.4: Classification results of set 15 (all-inclusive sensors).

different signal types, the better the classification performance is. While different

signal types produce different classification results, e.g., the combinations of acous-

tic sensor signals (set 10 and 12) give better discrimination between human and

human-animal events than the seismic signal (set 11), higher classification rates are

115


always achieved when more collective sensors are fused. In fact, the classification

rates of set 15 using the information from all-inclusive sensors (tabulated in Table

5.4) are always among the best or closed to the best performance of all sets for all

the proposed models.

Low-rank interference. We have discussed in the previous sections about the

need to develop a model that takes into account the noise or unknown interfered

signal as a low-rank component in a multi-sensor problem as well as how to formulate

and optimize it effectively. The empirical classification results on the border patrol

control dataset further validate our low-rank assumption. In both Figs. 5.4 and 5.5,

the model MS-JSR+E with sparse noise constraint somewhat improves over MS-

JSR, while MS-JSR+L truly brings out another layer of robustness to the dataset.

This clearly reassures our discussion that noise/interference more likely appears as

a low-rank component in a system with co-located sensors; and this turns out to be

a critical approach in a multiple sensory problem.

Structured sparsity. Throughout this chapter, we consider row-sparsity as

the broad prior assumption to enforce the correlation/complementary information

among homogeneous/heterogeneous sensors simultaneously and accomplish signifi-

cantly enhanced results. Moreover, MS-GJSR+L model which takes group structure

of coefficient matrices as accompanied prior is homogeneously slightly better than

MS-JSR+L. This cements the conclusion that group information is beneficial in

solving classification counterparts. Also, the fact that MS-GJSR+L performs the

116


best when multiple sensors are used underlines the broad effectiveness of incorporat-

ing the low-rank component for the interfered noise/signal or even sensor corruption

and carefully choosing structural sparsity priors in a class-specific manner.

Kernelized sparse representation. Another observation is the benefit of

classifying signal in the kernel induced domain, as seen by the significant improve-

ment in the performance of MS-KerJSR over MS-JSR. Notably, the kernel model

executes very well especially in the cases when only one single sensor is utilized.

Furthermore, the model MS-KerGJSR+L that integrates both kernel and low-rank

information yields the most consistently good classification rates. Although it does

not always perform the best, its results are consistently among the top classification

rates, both in single-sensor or multi-sensor cases. When combining all the results,

MS-KerGJSR+L also offers the best classification rate as averaging over all examin-

ing sensor sets, yet highlighting the robustness of consolidating kernel scheme with

low-rank interference and structural sparsity. This justifies the broad practical ben-

efit of MS-KerGJSR+L for the classification of multi-sensor data and proves that it

has the potential to become the model of choice when the linear/nonlinear behavior

of signal as well as the effect of noise/interference are not well learned.

117


5.5 Summary

We have proposed in this chapter various novel sparsity models to solve for a

multi-sensor classification problem by exploiting information from different signal

sources as well as exploring different assumptions of the structures of coefficient

vectors, low-rank interference and the signal non-linearity property. Experimental

results with a particular practical real dataset collected by the U.S. Army Research

Laboratory reveal several critical observations: (1) the use of complementary in-

formation from multiple sensors significantly improves the classification results over

just using a single sensor; (2) appropriate structured regularizations (joint and group

sparsity) bring more advantage in selecting the correct classification labels, hence

increasing the classification rate; (3) low-rank interference/noise is a critical issue

in multi-sensor fusion problem and (4) the classification in feature space induced by

a kernel function yields a compelling performance improvement.

118

Chapter 6

Conclusions

In this thesis, we considered a general framework of simultaneously minimizing

the structured sparsity of the main signals and the rank of the interferences on

a multiple-measurement joint processing scheme. We specifically proposed various

sparsity models based on different assumptions of the structures of sparsity coeffi-

cients and low-rank noise/interference. We proposed algorithms based on ADMM

technique to address these problems efficiently. Furthermore, we modify the classical

ADMM approach by utilizing an approximation to relax the dictionary transform

representation, thus simplify the computing efforts to achieve the optimization. By

this modification, we also showed that the algorithm is guaranteed to converge to

global optimum solutions.

We perceptively apply the proposed models in various real-world problems. As

we showed through these experiments, our proposed algorithms had significant im-

119

CHAPTER 6. CONCLUSIONS

proved performance compared with the state-of-the-art results. Especially, we ex-

hibited that our proposed models performed well even under less knowledge of input

observations (other competing methods require the prior knowledge of interference

signal sources, i.e., a dictionary of the interference, while our models have no demand

for this information except that a low-rankness assumption is satisfied).

We further extend our works to multi-sensor setups by promoting correlations as

well as complementary information between heterogeneous sensors simultaneously

while considering sparsity structures within and among different sensors’ observa-

tions. We also introduce kernelized models for the multi-sensor works and verified

the efficacy of all proposed algorithms in an automatic border patrol control appli-

cation to discriminate between human and animal footsteps. Our proposed methods

not only provide new tools, but also deepen the understanding of adaptive sparsity

modeling, signal behavior and efficient multi-sensor data collection and collabora-

tion. Nevertheless, although our techniques are only verified on a limited data set

specifically geared for border patrol control, they are not restricted to this specific

application. Rather, they can be applied to a broader set of classification or dis-

crimination problems, where the data is usually collected from multiple co-located

sensors.

120

Bibliography

[1] N. H. Nguyen, N. M. Nasrabadi, and T. D. Tran, “Robust multi-sensor classi-

fication via joint sparse representation,” in IEEE International Conference on

Information Fusion (FUSION), 2011, pp. 1–8.

[2] M. Yuan and Y. Lin, “Model selection and estimation in regression with

grouped variables,” Journal of the Royal Statistical Society - Series B, vol. 68,

no. 1, pp. 49–67, 2006.

[3] P. Zhao, G. Rocha, and B. Yu, “The composite absolute penalties family for

grouped and hierarchical variable selection,” Annals of Statistics, vol. 37, pp.

3469–3497, 2009.

[4] G. Obozinski, M. J. Wainwright, and M. I. Jordan, “Support union recovery in

high-dimensional multivariate regression,” Annals of Statistics, vol. 39, no. 1,

pp. 1–47, 2011.

[5] H. Zhang, N. M. Nasrabadi, Y. Zhang, and T. S. Huang, “Joint dynamic sparse

121

BIBLIOGRAPHY

representation for multi-view face recognition,” Pattern Recognition, vol. 45,

no. 4, pp. 1290–1298, 2012.

[6] S. Mallat, “A Wavelet Tour of Signal Processing, Third Edition: The Sparse

Way”. Academic Press, Dec. 2008.

[7] E. J. Candes, J.Romberg, and T. Tao, “Robust uncertainty principles: ex-

act signal reconstruction from highly incomplete frequency information,” IEEE

Trans. on Information Theory, vol. 52, pp. 5406–5425, 2006.

[8] D. L. Donoho, “Compressed sensing,” IEEE Trans. on Information Theory,

vol. 52, pp. 1289–1306, 2006.

[9] E. J. Candes, X. Li, Y. Ma, and J. Wright, “Robust principal component

analysis?” Journal of ACM, vol. 58, no. 3, pp. 1–37, 2011.

[10] M. Dao, Y. Suo, S. P. Chin, and T. D. Tran, “Structured sparse representation

with low-rank interference,” Signals, Systems and Computers (ASILOMAR),

Asilomar Conference on, 2014.

[11] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimiza-

tion and statistical learning via the alternating direction method of multipliers,”

Foundations and Trends R© in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.

[12] M. V. Afonso, J. M. Bioucas-Dias, and M. A. Figueiredo, “Fast image recovery

122

BIBLIOGRAPHY

using variable splitting and constrained optimization,” Image Processing, IEEE

Transactions on, vol. 19, no. 9, pp. 2345–2356, 2010.

[13] J. Yang and Y. Zhang, “Alternating direction algorithms for L1 problems in

compressive sensing,” SIAM Journal on Scientific Computing, vol. 33, no. 1,

pp. 250–278, 2011.

[14] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery of sub-

space structures by low-rank representation,” Pattern Analysis and Machine

Intelligence, IEEE Transactions on, vol. 35, no. 1, pp. 171–184, 2013.

[15] Z. Lin, M. Chen, and Y. Ma, “The augmented lagrange multiplier

method for exact recovery of corrupted low-rank matrices,” arXiv preprint

arXiv:1009.5055, 2010.

[16] L. H. Nguyen, M. D. Dao, and T. D. Tran, “Joint sparse and low-rank model for

radio-frequency interference suppression in ultra-wideband radar applications,”

IEEE International Conference on Image Processing (ICIP), 2014.

[17] X. Xiang, M. Dao, G. D. Hager, and T. D. Tran, “Hierarchical sparse and

collaborative low-rank representation for emotion recognition,” IEEE Interna-

tional Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.

[18] M. Dao, N. M. Nasrabadi, and T. D. Tran, “Multi-sensor classification via

123

BIBLIOGRAPHY

sparsity-based representation with low-rank interference,” IEEE International

Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.

[19] M. Dao, N. H. Nguyen, N. M. Nasrabadi, and T. D. Tran, “Collabora-

tive multi-sensor classification via sparsity-based representation,” submitted to

IEEE Transactions on Signal Processing, 2015.

[20] G. Camps-Valls and L. Bruzzone, “Kernel-based methods for hyperspectral

image classification,” IEEE Trans. on Geoscience and Remote Sensing, vol. 43,

no. 6, pp. 1351–1362, 2005.

[21] S. Gao, I. W.-H. Tsang, and L.-T. Chia, “Kernel sparse representation for

image classification and face recognition,” in Computer Vision–ECCV 2010.

Springer, 2010, pp. 1–14.

[22] H. Rauhut, “Random sampling of sparse trigonometric polynomials,” Applied

and Computational Harmonic Analysis, vol. 22, no. 1, pp. 16–42, 2007.

[23] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. S. Huang, and S. Yan, “Sparse

representation for computer vision and pattern recognition,” Proceedings of the

IEEE, vol. 98, no. 6, pp. 1031–1044, 2010.

[24] E. Amaldi and V. Kann, “On the approximability of minimizing nonzero vari-

ables or unsatisfied relations in linear systems,” Theoretical Computer Science,

vol. 209, no. 1, pp. 237–260, 1998.

124

BIBLIOGRAPHY

[25] D. L. Donoho, “For most large underdetermined systems of linear equations

the minimal 1-norm solution is also the sparsest solution,” Communications on

pure and applied mathematics, vol. 59, no. 6, pp. 797–829, 2006.

[26] E. J. Candes and T. Tao, “Decoding by linear programming,” Information

Theory, IEEE Transactions on, vol. 51, no. 12, pp. 4203–4215, 2005.

[27] D. L. Donoho, M. Elad, and V. N. Temlyakov, “Stable recovery of sparse over-

complete representations in the presence of noise,” Information Theory, IEEE

Transactions on, vol. 52, no. 1, pp. 6–18, 2006.

[28] J. A. Tropp, “Greed is good: Algorithmic results for sparse approximation,”

Information Theory, IEEE Transactions on, vol. 50, no. 10, pp. 2231–2242,

2004.

[29] E. J. Candes, “The restricted isometry property and its implications for com-

pressed sensing,” Comptes Rendus Mathematique, vol. 346, no. 9, pp. 589–592,

2008.

[30] S. Foucart, “A note on guaranteed sparse recovery via 1-minimization,” Applied

and Computational Harmonic Analysis, vol. 29, no. 1, pp. 97–103, 2010.

[31] G. Obozinski, B. Taskar, and M. I. Jordan, “Joint covariate selection and joint

subspace selection for multiple classification problems,” Statistics and Comput-

ing, vol. 20, no. 2, pp. 231–252, 2010.

125

BIBLIOGRAPHY

[32] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, “Robust face

recognition via sparse representation,” IEEE Trans. on Pattern Analysis and

Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.

[33] J. A. Tropp, “Greed is good: Algorithmic results for sparse approximation,”

IEEE Transactions on Information Theory, vol. 50, no. 10, pp. 2231–2242,

2004.

[34] W. Dai and O. Milenkovic, “Subspace pursuit for compressive sensing signal

reconstruction,” IEEE Trans. on Information Theory, vol. 55, no. 5, pp. 2230–

2249, 2009.

[35] T. Blumensath and M. E. Davies, “Iterative hard thresholding for compressed

sensing,” Applied and Computational Harmonic Analysis, vol. 27, no. 3, pp.

265–274, 2009.

[36] J. Wright and Y. Ma, “Dense error correction via l1 minimization,” IEEE

Trans. on Information Theory, vol. 56, no. 7, pp. 3540–3560, 2010.

[37] Y. Chen, N. M. Nasrabadi, and T. D. Tran, “Hyperspectral image classification

using dictionary-based sparse representation,” Geoscience and Remote Sensing,

IEEE Transactions on, vol. 49, no. 10, pp. 3973–3985, 2011.

[38] M. Dao, D. Nguyen, T. Tran, and S. Chin, “Chemical plume detection in

126

BIBLIOGRAPHY

hyperspectral imagery via joint sparse representation,” in IEEE Military Com-

munications Conference (MILCOM), 2012, pp. 1–5.

[39] H. Zhang, N. M. Nasrabadi, T. S. Huang, and Y. Zhang, “Transient acoustic

signal classification using joint sparse representation,” International Conference

on Acoustics, Speech and Signal Processing (ICASSP), pp. 2220–2223, 2011.

[40] G. Obozinski, B. Taskar, and M. I. Jordan, “Joint covariate selection and joint

subspace selection for multiple classification problems,” Journal of Statistics

and Computing, vol. 20, no. 2, pp. 231–252, 2010.

[41] P. Sprechmann, I. Ramırez, G. Sapiro, and Y. C. Eldar, “C-hilasso: A col-

laborative hierarchical sparse modeling framework,” IEEE Trans. on Signal

Processing, vol. 59, no. 9, pp. 4183–4198, 2011.

[42] E. J. Candes and B. Recht, “Exact matrix completion via convex optimization,”

Foundations of Computational Mathematics, vol. 9, pp. 717–772, 2009.

[43] J. F. Cai, E.J.Cande, and Z. Shen, “A singular value thresholding algorithm

for matrix completion,” submitted, 2008.

[44] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma, “Rasl: Robust alignment

by sparse and low-rank decomposition for linearly correlated images,” Pattern

Analysis and Machine Intelligence, IEEE Transactions on, vol. 34, no. 11, pp.

2233–2246, 2012.

127

BIBLIOGRAPHY

[45] H. Ji, C. Liu, Z. Shen, and Y. Xu, “Robust video denoising using low rank

matrix completion,” in Computer Vision and Pattern Recognition (CVPR),

2010 IEEE Conference on. IEEE, 2010, pp. 1791–1798.

[46] C. Qiu and N. Vaswani, ““Recursive sparse recovery in large but correlated

noise”,” Annual Allerton Conference on Communication, Control, and Com-

puting, 2011.

[47] M. Mardani, G. Mateos, and G. B. Giannakis, “Recovery of low-rank plus

compressed sparse matrices with application to unveiling traffic anomalies,”

IEEE Trans. on Information Theory, vol. 59, no. 8, pp. 5186–5205, 2013.

[48] R. Jenatton, J. Mairal, F. R. Bach, and G. R. Obozinski, “Proximal meth-

ods for sparse hierarchical dictionary learning,” in International Conference on

Machine Learning (ICML), 2010, pp. 487–494.

[49] J. Cai, E. J. Candes, and Z. Shen, “A singular value thresholding algorithm

for matrix completion,” SIAM Journal on Optimization, vol. 20, no. 4, pp.

1956–1982, 2010.

[50] J. Yang, W. Yin, Y. Zhang, and Y. Wang, “A fast algorithm for edge-preserving

variational multichannel image restoration,” SIAM Journal on Imaging Sci-

ences, vol. 2, no. 2, pp. 569–592, 2009.

[51] L. H. Nguyen and T. D. Tran, “Robust and adaptive extraction of rfi signals

128

BIBLIOGRAPHY

from ultra-wideband radar data,” in Geoscience and Remote Sensing Sympo-

sium (IGARSS), 2012 IEEE International. IEEE, 2012, pp. 7137–7140.

[52] L. H. Nguyen, T. Tran, and T. Do, “Sparse models and sparse recovery for ultra-

wideband sar applications,” Aerospace and Electronic Systems, IEEE Transac-

tions on, vol. 50, no. 2, pp. 940–958, 2014.

[53] L. H. Nguyen, R. Kapoor, and J. Sichina, “Detection algorithms for ultraw-

ideband foliage-penetration radar,” in AeroSense’97. International Society for

Optics and Photonics, 1997, pp. 165–176.

[54] L. H. Nguyen, K. A. Kappra, D. C. Wong, R. Kapoor, and J. Sichina, “Mine

field detection algorithm utilizing data from an ultrawideband wide-area surveil-

lance radar,” in Aerospace/Defense Sensing and Controls. International Soci-

ety for Optics and Photonics, 1998, pp. 627–643.

[55] L. Nguyen, M. Ressler, and J. Sichina, “Sensing through the wall imaging using

the army research lab ultra-wideband synchronous impulse reconstruction (uwb

sire) radar,” in SPIE Defense and Security Symposium. International Society

for Optics and Photonics, 2008, pp. 69 470B–69 470B.

[56] T. Miller, L. Potter, and J. McCorkle, “Rfi suppression for ultra wideband

radar,” Aerospace and Electronic Systems, IEEE Transactions on, vol. 33, no. 4,

pp. 1142–1156, 1997.

129

BIBLIOGRAPHY

[57] X.-Y. Wang, W.-D. Yu, X.-Y. Qi, and Y. Liu, “Rfi suppression in sar based on

approximated spectral decomposition algorithm,” Electronics letters, vol. 48,

no. 10, pp. 594–596, 2012.

[58] R. Lord and M. Inggs, “Efficient rfi suppression in sar using lms adaptive filter

integrated with range/doppler algorithm,” Electronics Letters, vol. 35, no. 8,

pp. 629–630, 1999.

[59] X. Luo, L. Ulander, J. Askne, G. Smith, and P.-O. Frolind, “Rfi suppression in

ultra-wideband sar systems using lms filters in frequency domain,” Electronics

Letters, vol. 37, no. 4, pp. 241–243, 2001.

[60] F. Zhou, R. Wu, M. Xing, and Z. Bao, “Eigensubspace-based filtering with

application in narrow-band interference suppression for sar,” Geoscience and

Remote Sensing Letters, IEEE, vol. 4, no. 1, pp. 75–79, 2007.

[61] Y. Chunrui, Z. Yongsheng, D. Zhen, and L. Diannong, “Eigen-decomposition

method for rfi suppression applied to sar data,” in Multimedia Technology

(ICMT), 2010 International Conference on. IEEE, 2010, pp. 1–4.

[62] J. H. Ender, “On compressive sensing applied to radar,” Signal Processing,

vol. 90, no. 5, pp. 1402–1414, 2010.

[63] V. M. Patel, G. R. Easley, D. M. Healy Jr, and R. Chellappa, “Compressed

130

BIBLIOGRAPHY

synthetic aperture radar,” Selected Topics in Signal Processing, IEEE Journal

of, vol. 4, no. 2, pp. 244–254, 2010.

[64] W. Zhang, M. G. Amin, F. Ahmad, A. Hoorfar, and G. E. Smith, “Ultra-

wideband impulse radar through-the-wall imaging with compressive sensing,”

International Journal of Antennas and Propagation, vol. 2012, 2012.

[65] J. C. Harsanyi and C.-I. Chang, “Hyperspectral image classification and dimen-

sionality reduction: an orthogonal subspace projection approach,” Geoscience

and Remote Sensing, IEEE Transactions on, vol. 32, no. 4, pp. 779–785, 1994.

[66] L. Meier, S. V. D. Geer, and P. Bhlmann, “The group lasso for logistic re-

gression,” Journal of the Royal Statistical Society: Series B, vol. 70, no. 1, pp.

53–71, 2008.

[67] B. Waske and J. A. Benediktsson, “Fusion of support vector machines for classi-

fication of multisensor data,” IEEE Trans. on Geoscience and Remote Sensing,

vol. 45, no. 12, pp. 3858–3866, 2007.

[68] M. Bernhardt, J. Heather, and M. Smith, “New models for hyperspectral

anomaly detection and un-mixing,” in Defense and Security. International

Society for Optics and Photonics, 2005, pp. 720–730.

[69] B. R. Foy and J. Theiler, “Scene analysis and detection in thermal infrared re-

131

BIBLIOGRAPHY

mote sensing using independent component analysis,” in Defense and Security.

International Society for Optics and Photonics, 2004, pp. 131–139.

[70] M. Fauvel, J. A. Benediktsson, J. Chanussot, and J. R. Sveinsson, “Spectral

and spatial classification of hyperspectral data using svms and morphologi-

cal profiles,” Geoscience and Remote Sensing, IEEE Transactions on, vol. 46,

no. 11, pp. 3804–3814, 2008.

[71] E. Hirsch and E. Agassi, “Detection of gaseous plumes in ir hyperspectral im-

ages using hierarchical clustering,” Applied optics, vol. 46, no. 25, pp. 6368–

6374, 2007.

[72] P. Heasler, C. Posse, J. Hylden, and K. Anderson, “Nonlinear bayesian al-

gorithms for gas plume detection and estimation from hyper-spectral thermal

image data,” Sensors, vol. 7, no. 6, pp. 905–920, 2007.

[73] Y. Chen, N. M. Nasrabadi, and T. D. Tran, “Sparse representation for target

detection in hyperspectral imagery,” IEEE Journal of Selected Topics in Signal

Processing, vol. 5, no. 3, pp. 629–640, 2011.

[74] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis.

Cambridge University Press, 2004.

[75] D. Manolakis, C. Siracusa, and G. Shaw, “Adaptive matched subspace detec-

tors for hyperspectral imaging applications,” in Acoustics, Speech, and Signal

132

BIBLIOGRAPHY

Processing, 2001. Proceedings.(ICASSP’01). 2001 IEEE International Confer-

ence on, vol. 5. IEEE, 2001, pp. 3153–3156.

[76] J. F. Gemmeke, T. Virtanen, and A. Hurmalainen, “Exemplar-based sparse

representations for noise robust automatic speech recognition,” IEEE Trans.

on Audio, Speech, and Language Processing, vol. 19, pp. 2067–2080, 2011.

[77] B. Logan, “Mel frequency cepstral coefficients for music modeling,” in ISMIR,

2000.

[78] H. Bourlard, H. Hermansky, and N. Morgan, “Towards increasing speech recog-

nition error rates,” Speech communication, vol. 18, pp. 205–231, 1996.

[79] H. Hirsch and D. Pearce, “The aurora experimental framework for the per-

formance evaluation of speech recognition systems under noisy conditions,” in

Automatic Speech Recognition: Challenges for the new Millennium, 2000.

[80] B. Raj and R. M. Stern, “Missing-feature approaches in speech recognition,”

IEEE Signal Processing Magazine, vol. 22, pp. 101–116, 2005.

[81] C. Shan, S. Gong, and P. W. McOwan, “Facial expression recognition based on

local binary patterns: A comprehensive study,” Image and Vision Computing,

vol. 27, no. 6, pp. 803–816, 2009.

[82] J. B. Tenenbaum and W. T. Freeman, “Separating style and content with

bilinear models,” Neural computation, vol. 12, no. 6, pp. 1247–1283, 2000.

133

BIBLIOGRAPHY

[83] S. Taheri, V. M. Patel, and R. Chellappa, “Component-based recognition

of facesand facial expressions,” Affective Computing, IEEE Transactions on,

vol. 4, no. 4, pp. 360–371, 2013.

[84] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews,

“The extended cohn-kanade dataset (ck+): A complete dataset for action unit

and emotion-specified expression,” Computer Vision and Pattern Recognition

Workshops (CVPRW), 2010 IEEE Computer Society Conference on, pp. 94–

101, 2010.

[85] S. Zafeiriou and M. Petrou, “Sparse representations for facial expressions recog-

nition via l 1 optimization,” Computer Vision and Pattern Recognition Work-

shops (CVPRW), 2010 IEEE Computer Society Conference on, pp. 32–39, 2010.

[86] D. L. Hall and J. Llinas, “An introduction to multisensor data fusion,” Pro-

ceedings of the IEEE, vol. 85, no. 1, pp. 6–23, 1997.

[87] M. E. Liggins, J. Llinas, and D. L. Hall, Handbook of Multisensor Data Fusion:

Theory and Practice, 2nd ed. CRC Press, 2008.

[88] P. K. Varshney, “Multisensor data fusion,” Electronics and Communication

Engineering Journal, vol. 9, no. 6, pp. 245–253, 1997.

[89] M. Duarte and Y.-H. Hu, “Vehicle classification in distributed sensor networks,”

Journal of Parallel and Distributed Computing, vol. 64, no. 7, pp. 826–838, 2004.

134

BIBLIOGRAPHY

[90] A. Klausner, A. Tengg, and B. Rinner, “Vehicle classification on multi-sensor

smart cameras using feature- and decision-fusion,” IEEE conference on Dis-

tributed Smart Cameras, pp. 67–74, 2007.

[91] B. Scholkopf and A. J. Smola, Learning with kernels: support vector machines,

regularization, optimization, and beyond. MIT press, 2001.

[92] J. Yin, Z. Liu, Z. Jin, and W. Yang, “Kernel sparse representation based clas-

sification,” Neurocomputing, vol. 77, no. 1, pp. 120–128, 2012.

[93] S. Engelberg and E. Tadmor, “Recovery of edges from spectral data with noise

- a new perspective,” SIAM Journal on Numerical Analysis, vol. 46, no. 5, pp.

2620 – 2635, 2008.

[94] D. G. Childers, D. P. Skinner, and R. C. Kemerait, “The cepstrum: A guide

to processing,” Proceedings of the IEEE, vol. 65, no. 10, pp. 1428 – 1443, 1977.

[95] L. Cao, J. Luo, F. Liang, and T. S. Huang, “Heterogeneous feature machines

for visual recognition,” IEEE Conference on Computer Vision and Pattern

Recognition (CVPR), pp. 1095–1102, 2009.

135

Vita

Minh Dao was born in Phutho, Vietnam in April 1984.

He received a Bachelor in Electrical Engineering from

Hanoi University of Technology, Vietnam in 2007 and a

double Master degree in Information and Communica-

tion Technologies from Polytechnic University of Turin

and Karlsruhe Institute of Technology with an Erasmus

Mundus fellowship in 2009. In September 2009, he en-

rolled in the Ph.D. program in Electrical and Computer

Engineering at The Johns Hopkins University, where he has been awarded Vietnam

Education Foundation (VEF) and Applied Physics Laboratory (APL) fellowships.

His current research interests include compressed sensing, sparse representation, low-

rank matrix recovery and applications in image/video/signal processing, computer

vision, machine learning and social network analysis.

136

Documents

SPARSITY-BASED REPRESENTATION WITH LOW-RANK …