

Time Varying Autoregressive Moving Average Models for Covariance Estimation

Ami Wiesel, Ofir Bibi, and Amir Globerson

Abstract—We consider large scale covariance estimation using a small number of samples in applications where there is a natural ordering between the random variables. The two classical approaches to this problem rely on banded covariance and banded inverse covariance structures, corresponding to time varying moving average (MA) and autoregressive (AR) models, respectively. Motivated by this analogy to spectral estimation and the well known modeling power of autoregressive moving average (ARMA) processes, we propose a novel time varying ARMA covariance structure. Similarly to known results in the context of AR and MA, we address the completion of an ARMA covariance matrix from its main band, and its estimation based on random samples. Finally, we examine the advantages of our proposed methods using numerical experiments.

Index Terms—Autoregressive moving average, covariance estimation, matrix completion, instrumental variables.

Manuscript received March 20, 2012; revised July 31, 2012; accepted February 18, 2013. Date of publication April 04, 2013; date of current version May 08, 2013. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Shuguang (Robert) Cui. Preliminary results of this work have been presented in IEEE SAM-2012. This work was supported by the Israeli Smart Grid consortium, and also in part by the ISF Centers of Excellence Grant 1789/11 and ISF Grant 786/11. The authors are with The Selim and Rachel Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 91905, Israel (e-mail: [email protected]; [email protected]; [email protected]). Digital Object Identifier 10.1109/TSP.2013.2256900

I. INTRODUCTION

LARGE scale covariance estimation using a small number of samples is a fundamental problem in modern multivariate statistical analysis. In many applications, e.g., linear array processing, climatology, spectroscopy, and longitudinal data analysis, there is a natural ordering between the random variables and it is reasonable to assume that the statistical relation between two variables decays with their distance. Based on this assumption, low order parametric models may be imposed on the unknown covariance and allow for consistent estimation using a small number of samples even when the dimension is much greater.

A natural approach to covariance estimation is to formulate the decay in statistical relation using the notion of independence or correlation. Assuming that two variables are uncorrelated whenever their indices are more than $q$ apart leads to an intuitive $q$-banded covariance structure [1]–[3]. This structure is a special case of the more general class of sparse covariance models [4]. An alternative approach uses the notion of conditional independence. In the Gaussian case, assuming that two variables whose indices are more than $p$ apart are conditionally independent given the rest of the elements leads to a $p$-banded inverse covariance [1], [5]–[8]. Indeed, this structure is a special case of sparse inverse covariance models, also known as Gaussian graphical models [9]. A remarkable result of [1] is that consistent estimation in these models is possible whenever the logarithm of the dimension is small relative to the number of samples.

The starting point for this work is the analogy between covariance estimation and spectrum estimation in stationary random processes. The two classical models for spectrum estimation are moving average (MA) and autoregressive (AR) processes. Interestingly, the finite MA($q$) process can be easily shown to be equivalent to a $q$-banded Toeplitz covariance model, whereas the finite AR($p$) process corresponds to a $p$-banded Toeplitz inverse covariance model. In the non-stationary cases, the Toeplitz restriction is removed and the structures reduce to the banded and inverse banded structures discussed above. In stationary processes, it is well known that MA and AR are usually crude approximations of reality and that better modeling power may be obtained through autoregressive moving average (ARMA) processes [10]–[12]. Continuing this analogy, we propose a new non-stationary ARMA covariance model.

In this paper, we introduce a novel time varying ARMA($p$,$q$) covariance model. It reduces to the sparse (banded) covariance structure when the AR order is zero, and to the sparse (banded) inverse covariance structure when the MA order is zero.

The general ARMA model leads to a dense covariance with a dense inverse, but is parameterized by a small number of unknowns. ARMA covariances are a special case of recursive linear models with correlated errors. A different special case, known as Bow-free acyclic path (BAP) models, has recently been studied in [13], [14].

The first contribution of this paper concerns ARMA matrix completion. It is well known that MA and AR covariance matrices can be uniquely completed given their main bands. Thus, it is natural to expect that ARMA models can be reconstructed given their leading diagonals. Following this intuition, we derive an ARMA completion procedure denoted by AC. The algorithm uses an instrumental variables (IV) approach which is motivated by classical time series analysis techniques [10]–[12]. It does not require any ad hoc starting point, is not iterative, and is not prone to convergence to suboptimal local solutions. We provide a simple condition under which AC is exact, as well as a counter example when the condition does not hold. Numerical results with random parameters suggest that, in practice, the condition almost always holds.

The second contribution of this paper addresses ARMA covariance estimation with an emphasis on the high dimensional regime in which the number of samples is smaller than the dimension. Following existing methods in the MA and AR settings, we propose to use AC again but replace the unknown band with its sample version. This empirical band is much smaller than the full sample covariance matrix and is therefore estimated more accurately.


To formalize these statements, we analyze AC under both deterministically and stochastically perturbed inputs. Similarly to [1], we show that, under technical conditions, it is consistent in the operator norm as long as the logarithm of the dimension is small relative to the number of samples. In addition, we demonstrate the performance advantages of the estimator using numerical experiments.

The outline of the paper is as follows. In Section II, we introduce the time-varying ARMA covariance model. In Section III, we consider matrix completion and in Section IV we apply these results in the context of estimation. In Section V, we demonstrate the advantages of our proposed methods using numerical experiments. Finally, concluding remarks are given in Section VI.

The following notation is used. Boldface upper case letters denote matrices, boldface lower case letters denote column vectors, and standard lower case letters denote scalars. The superscripts $(\cdot)^T$, $(\cdot)^H$, $(\cdot)^{-1}$ and $(\cdot)^\dagger$ denote the transpose, the conjugate transpose, the inverse and the pseudoinverse, respectively. For sets $\mathcal{A}$ and $\mathcal{B}$, the cardinality is denoted by $|\mathcal{A}|$ and the set difference operator is denoted by $\mathcal{A} \setminus \mathcal{B}$. The sub-matrix of $\mathbf{X}$ indexed by the sets $\mathcal{A}$ and $\mathcal{B}$ is denoted by $\mathbf{X}_{\mathcal{A},\mathcal{B}}$, and this operation precedes any other matrix operation, e.g., $\mathbf{X}_{\mathcal{A},\mathcal{B}}^{-1} = (\mathbf{X}_{\mathcal{A},\mathcal{B}})^{-1}$. The norms $\|\cdot\|_1$, $\|\cdot\|$, $\|\cdot\|_\infty$ and $\|\cdot\|_{\max}$ denote the maximum absolute column sum norm, the operator norm, the maximum absolute row sum norm and the element-wise infinity norm, respectively. $\mathbf{X} \succ \mathbf{0}$ means that $\mathbf{X}$ is positive definite. By range inclusion we mean that the columns of one matrix lie within the range space of another. The banding operator $\mathcal{B}_m(\mathbf{X})$ outputs a matrix with the principal diagonals of $\mathbf{X}$ up to bandwidth $m$ and zero padding elsewhere. We also use the following index set definitions: for each row index $i$, and given the bandwidths $p$ and $q$, we define a regressor index set (the indices immediately preceding $i$) and an instrumental variable index set (indices that precede $i$ by more than $q$ positions but lie within the known band). These sets satisfy:

(1)

and can be visualized as bands parallel to the main diagonal.

Throughout the paper, we use a few technical lemmas provided in Appendix A.

II. MODELS

In this section, we review the two classical MA and AR covariance structures, and then generalize them to the more flexible class of ARMA models.

A natural model for the covariance $\mathbf{\Sigma}$ of a random vector $\mathbf{x}$ which satisfies a known ordering is a banded matrix, i.e., a matrix such that [1]:

$[\mathbf{\Sigma}]_{ij} = 0 \quad \text{for } |i-j| > q. \qquad (2)$

The rationale behind this structure is that, in the Gaussian case, a zero in the covariance matrix denotes statistical independence between variables:

$[\mathbf{\Sigma}]_{ij} = 0 \quad \Leftrightarrow \quad x_i \text{ and } x_j \text{ are independent}, \qquad (3)$

and it is reasonable to assume that distant elements are statistically independent. It is well known that banded positive definite matrices have a banded Cholesky factorization

$\mathbf{\Sigma} = \mathbf{B}\mathbf{B}^T, \qquad (4)$

where $\mathbf{B}$ is a $q$-banded lower triangular matrix. This provides an elegant generative model for the random vector $\mathbf{x}$ using an uncorrelated latent random vector $\mathbf{u}$. Namely,

$\mathbf{x} = \mathbf{B}\mathbf{u}, \qquad (5)$

where $\mathbf{u}$ is a driving vector which satisfies

$E[\mathbf{u}] = \mathbf{0}, \qquad E[\mathbf{u}\mathbf{u}^T] = \mathbf{I}. \qquad (6)$

Note that each variable in $\mathbf{x}$ averages over adjacent elements in $\mathbf{u}$, and hence the name “moving average” (MA) process. More precisely, this is a time-varying or non-stationary MA model. The classical stationary MA process assumes that $\mathbf{B}$ is also a Toeplitz matrix. In what follows, we use MA($q$) to denote covariance matrices which satisfy (4).
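As a concrete illustration of this generative model, the following minimal NumPy sketch (all function and variable names here are illustrative, not taken from the paper) draws a random $q$-banded lower triangular factor and verifies that the resulting covariance is $q$-banded:

```python
import numpy as np

def random_ma_covariance(n=8, q=2, seed=0):
    """Draw a random q-banded lower triangular factor B and return Sigma = B B^T."""
    rng = np.random.default_rng(seed)
    B = np.tril(rng.standard_normal((n, n)))      # lower triangular
    B[np.tril_indices(n, -q - 1)] = 0.0           # keep only q sub-diagonals
    B[np.diag_indices(n)] = 1.0 + rng.random(n)   # strictly positive diagonal
    return B, B @ B.T

B, Sigma = random_ma_covariance()
i, j = np.indices(Sigma.shape)
# Entries with |i - j| > q vanish, i.e., Sigma is q-banded (here q = 2).
assert np.allclose(Sigma[np.abs(i - j) > 2], 0.0)
```

Sampling $\mathbf{x} = \mathbf{B}\mathbf{u}$ with standard normal $\mathbf{u}$ then produces vectors with exactly this covariance.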

An alternative model for the covariance matrix of an ordered random vector $\mathbf{x}$ is that its inverse is $p$-banded and satisfies [1], [15]

$[\mathbf{\Sigma}^{-1}]_{ij} = 0 \quad \text{for } |i-j| > p. \qquad (7)$

The rationale is that, in the Gaussian case, a zero in the inverse covariance matrix corresponds to conditional independence between variables:

$[\mathbf{\Sigma}^{-1}]_{ij} = 0 \quad \Leftrightarrow \quad x_i \text{ and } x_j \text{ are independent given } \mathbf{x}_{\setminus ij}, \qquad (8)$

where $\mathbf{x}_{\setminus ij}$ denotes a vector with all the elements in $\mathbf{x}$ except $x_i$ and $x_j$. Intuitively, this means that the dependency between distant variables disappears once we know the elements between them. Using the Cholesky factorization, any matrix whose inverse is $p$-banded can be written as

$\mathbf{\Sigma} = (\mathbf{I}-\mathbf{A})^{-1}\mathbf{D}(\mathbf{I}-\mathbf{A})^{-T}, \qquad (9)$

where the Cholesky factor of $\mathbf{\Sigma}^{-1}$ is determined by $\mathbf{A}$ and $\mathbf{D}$, $\mathbf{A}$ is a $p$-banded lower triangular matrix with zero valued diagonal elements and $\mathbf{D}$ is a positive definite diagonal matrix. This leads to the dynamical “autoregressive” model

$\mathbf{x} = \mathbf{A}\mathbf{x} + \mathbf{v}, \qquad (10)$

where $\mathbf{v}$ is a vector of latent variables satisfying

$E[\mathbf{v}] = \mathbf{0}, \qquad E[\mathbf{v}\mathbf{v}^T] = \mathbf{D}, \qquad (11)$

$E[v_i x_j] = 0 \quad \text{for all } j < i. \qquad (12)$


The name “autoregressive” expresses the causal dependency of each variable on the preceding variables. The classical stationary AR model is a special case in which $\mathbf{A}$ has a Toeplitz structure. As before, we use AR($p$) to denote covariance matrices which satisfy (9).

Motivated by the MA and AR interpretation of the banded covariance and inverse covariance models, we now propose a new covariance structure based on the classical ARMA process. The scalar, infinite and stationary ARMA($p$,$q$) process $x_t$ is defined as

$x_t = \sum_{i=1}^{p} a_i\, x_{t-i} + \sum_{j=0}^{q} b_j\, u_{t-j}, \qquad (13)$

where $u_t$ is a zero mean, white driving process. The natural vector, finite and non-stationary version of (13) is the length $n$ random vector $\mathbf{x}$ defined as

$\mathbf{x} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{u}, \qquad (14)$

where $\mathbf{u}$ is a vector of latent variables which satisfies (6), $\mathbf{A}$ is a $p$-banded, lower triangular matrix with zero diagonal elements and $\mathbf{B}$ is a $q$-banded lower triangular matrix. Indeed, if we increase the dimension without bound and constrain $\mathbf{A}$ and $\mathbf{B}$ to Toeplitz matrices, then model (13) can be interpreted as the limit of (14). The two extreme cases of the ARMA model are clearly the pure MA process in (5), where $\mathbf{A} = \mathbf{0}$, and the pure AR process in (10), where $q = 0$ and $\mathbf{B}$ is a diagonal matrix.

The ARMA model defines a low order parametric model for the covariance of $\mathbf{x}$. Rearranging (14), we obtain

$\mathbf{x} = (\mathbf{I}-\mathbf{A})^{-1}\mathbf{B}\mathbf{u} \qquad (15)$

with a covariance matrix

$\mathbf{\Sigma} = (\mathbf{I}-\mathbf{A})^{-1}\mathbf{B}\mathbf{B}^T(\mathbf{I}-\mathbf{A})^{-T}. \qquad (16)$

Hereinafter, we denote the class of matrices satisfying (16) as ARMA($p$,$q$) covariances. To simplify the notation, we will often use

(17)

with

(18)

Intuition on this structure can be obtained by examining its extreme cases. It is easy to see that this model has the pure MA model in (4) and the pure AR model in (9) as special cases. Unlike these special cases, both $\mathbf{\Sigma}$ and $\mathbf{\Sigma}^{-1}$ are non-banded and dense matrices in the general ARMA model. It is important to note that the decomposition of $\mathbf{\Sigma}$ in (16) is not unique and there may be different legitimate pairs of matrices $(\mathbf{A},\mathbf{B})$ which yield the same covariance. Thus, throughout this paper we only address the issue of recovering or estimating $\mathbf{\Sigma}$ rather than $(\mathbf{A},\mathbf{B})$.

For completeness, we note that the proposed ARMA model is a special case of recursive linear models with correlated errors. Bow-free acyclic path (BAP) diagrams are another special case which has recently been studied in [13], [14]. The ARMA model is not necessarily a BAP, and is aimed at different applications where there is a natural ordering between the variables. Furthermore, the current results on BAPs assume that the number of samples is greater than the dimension and are therefore not suitable for large scale covariance estimation.
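To make the ARMA structure in (14)–(16) concrete, the following NumPy sketch (function and variable names are chosen here for illustration only) builds a random ARMA($p$,$q$) covariance and confirms that, in contrast to the pure MA and AR special cases, neither the covariance nor its inverse is banded:

```python
import numpy as np

def banded_lower(n, bw, zero_diag, rng):
    """Random lower triangular matrix with at most `bw` nonzero sub-diagonals."""
    M = np.tril(rng.standard_normal((n, n)) * 0.3, -1 if zero_diag else 0)
    M[np.tril_indices(n, -bw - 1)] = 0.0
    if not zero_diag:
        M[np.diag_indices(n)] = 1.0     # unit diagonal keeps the factor well conditioned
    return M

def arma_covariance(n=10, p=2, q=2, seed=1):
    """Sigma = (I - A)^{-1} B B^T (I - A)^{-T} with A p-banded and B q-banded."""
    rng = np.random.default_rng(seed)
    A = banded_lower(n, p, zero_diag=True, rng=rng)    # AR part, zero diagonal
    B = banded_lower(n, q, zero_diag=False, rng=rng)   # MA part
    Phi = np.linalg.inv(np.eye(n) - A)
    return A, B, Phi @ B @ B.T @ Phi.T

A, B, Sigma = arma_covariance()
# Unlike the pure MA (A = 0) and pure AR (diagonal B) cases, neither Sigma nor its
# inverse is banded, yet the model has only O(n (p + q)) free parameters.
print(np.linalg.cond(Sigma))
```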

III. COVARIANCE COMPLETION

Matrix completion problems consider the recovery of a full matrix given the values of only a subset of its entries, assuming some structure or objective function [16], [17]. We now review known results on these problems in MA and AR models and extend them to the ARMA setting.

Completion of an MA($q$) covariance matrix from its main $q$-band is trivial via zero padding [1]:

$\hat{\mathbf{\Sigma}} = \mathcal{B}_q\left(\mathbf{\Sigma}\right). \qquad (19)$

Completion of an AR($p$) covariance matrix from its main $p$-band is also well known. This procedure is known as positive definite completion and is equivalent to maximum likelihood estimation [9], [16]. In brief, we recover the regression coefficients row by row using simple linear regressions based on the known band. The $i$th row of (10) can be expressed as

$x_i = \sum_{j=\max(1,\, i-p)}^{i-1} A_{ij}\, x_j + v_i. \qquad (20)$

We multiply this equation on the right by the regressors $x_j$ and take the expectation. The residual $v_i$ is uncorrelated with them and therefore disappears, so that

(21)

Solving for the coefficients $A_{ij}$ yields the non-stationary Yule-Walker formulas

(22)

from which the covariance can be easily completed.
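The AR completion described above is straightforward to implement. The sketch below is a minimal NumPy implementation under the assumption that the true covariance is known exactly on its main $p$-band (all names are illustrative): the regression coefficients are recovered row by row and the full covariance is then rebuilt.

```python
import numpy as np

def ar_complete(band_sigma, p):
    """Complete a covariance whose inverse is p-banded from its main p-band.

    band_sigma holds the true covariance on |i - j| <= p (other entries are ignored).
    """
    n = band_sigma.shape[0]
    A = np.zeros((n, n))
    d = np.zeros(n)
    d[0] = band_sigma[0, 0]
    for i in range(1, n):
        J = np.arange(max(0, i - p), i)                   # regressor indices
        a = np.linalg.solve(band_sigma[np.ix_(J, J)], band_sigma[J, i])
        A[i, J] = a                                       # non-stationary Yule-Walker step
        d[i] = band_sigma[i, i] - a @ band_sigma[J, i]    # innovation variance
    Phi = np.linalg.inv(np.eye(n) - A)
    return Phi @ np.diag(d) @ Phi.T                       # completed covariance
```

All quantities accessed above lie within the main $p$-band, so the routine indeed completes the covariance from its band alone.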

We now extend these results to general ARMA structures. Similarly to the AR($p$) completion method, we use linear regressions to recover $\mathbf{A}$ from the main band of $\mathbf{\Sigma}$ and then reconstruct $\mathbf{\Sigma}$. The main difference is that we cannot use simple regressions and require a more careful treatment based on IVs. As before, the $i$th row of (14) can be expressed as

$x_i = \sum_{j=i-p}^{i-1} A_{ij}\, x_j + \sum_{j=i-q}^{i} B_{ij}\, u_j, \qquad (23)$

but now the regressors and the residual term are correlated. IV methods are directly aimed at regressions in which the explanatory variables (covariates) are correlated with the residual terms. Thus, instead of multiplying by the regressors, we multiply (23) by the IVs which, due to (1), are uncorrelated with the residual:

(24)

Taking the expectation yields

(25)


for every row index $i$ with a full set of instruments. Due to (1), the number of equations is greater than or equal to the number of unknowns. Thus, we can try to solve these linear systems as

(26)

and pad the rest of the matrix with zeros. Assuming that the recovered matrix is identical to the true $\mathbf{A}$, we can easily recover $\mathbf{B}\mathbf{B}^T$ as

(27)

where

(28)

and complete to

(29)

This ARMA completion procedure, denoted by AC, is summarized below.

Procedure AC

Input: the main band of $\mathbf{\Sigma}$ and the bandwidths $(p, q)$

Output: the completed covariance $\hat{\mathbf{\Sigma}}$

For each row index $i$ do: solve (26) for the $i$th row of $\mathbf{A}$; then form (27)–(29).
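The exact steps of AC are specified by (26)–(29). The sketch below is one plausible numerical reading of the procedure (Python/NumPy; the helper names, the treatment of the first rows, and the choice of instrument indices are illustrative assumptions rather than a verbatim transcription): the AR rows are recovered from the band by instrumental variable regressions, the implied MA covariance is obtained by banding, and the covariance is then re-expanded.

```python
import numpy as np

def band(X, m):
    """Keep the entries of X with |i - j| <= m and zero the rest."""
    i, j = np.indices(X.shape)
    return np.where(np.abs(i - j) <= m, X, 0.0)

def arma_complete(band_sigma, p, q):
    """Sketch of AC: complete an ARMA(p, q) covariance from its main (p + q)-band."""
    n = band_sigma.shape[0]
    m = p + q
    S = band(band_sigma, m)
    A = np.zeros((n, n))
    for i in range(m, n):                   # rows with a full set of instruments
        P = np.arange(i - p, i)             # regressor indices (AR lags)
        J = np.arange(i - m, i - q)         # instruments, uncorrelated with the MA noise
        # E[x_J x_i] = E[x_J x_P^T] A[i, P]^T, so solve the IV system for the AR row.
        A[i, P] = np.linalg.lstsq(S[np.ix_(J, P)], S[J, i], rcond=None)[0]
    I = np.eye(n)
    N = band((I - A) @ S @ (I - A).T, q)    # estimate of B B^T, which is q-banded
    Phi = np.linalg.inv(I - A)
    return Phi @ N @ Phi.T
```

The first $p+q$ rows are handled only crudely here; the exact treatment and the conditions under which the completion is exact are given in (26)–(29) and Theorem 1.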

Clearly, AC is correct when the recovered matrix coincides with the true $\mathbf{A}$. Due to the lack of identification in ARMA models, this may not be the situation in practice and it is very likely that the two differ. The next theorem characterizes the conditions for exact completion in this practical case.

Theorem 1: Let $\mathbf{\Sigma}$ follow an ARMA($p$,$q$) model. Define a bandwidth $m$ and assume that

(30)

Then, AC reconstructs the unknown covariance exactly, so that $\hat{\mathbf{\Sigma}} = \mathbf{\Sigma}$. In particular, if the band covers the entire matrix then (30) always holds, and the reconstruction is exact (yet trivial).

Proof: The core of the proof lies in the identity

(31)

The last equality is similar to (24)–(25) and is based on the fact that, like the IVs, the variables involved are also uncorrelated with the residual. The identity suggests a simple recursive completion method. Assume that we have already reconstructed the leading diagonals; then, using (31), we can recover the next diagonal. In what follows, we show that the last two lines of AC are simply a non-recursive implementation of this idea.

Using (28), identity (31) can be compactly expressed as

(32)

For any matrix $\mathbf{X}$, we have the following recursive properties

(33)

for the relevant indices, where we have used the triangular structure of $\mathbf{A}$. Thus, the last two lines of AC can be expressed as

(34)

(35)

Based on this observation, the proof that $\hat{\mathbf{\Sigma}} = \mathbf{\Sigma}$ proceeds by induction on the correctness of the reconstructed diagonals, starting from the main band and proceeding up to the last diagonal. The basis of the induction

(36)

clearly holds and we now show that the hypothesis

(37)

leads to

(38)

For this purpose, we define the following partitioning

(39)

(40)

We begin by analyzing the first two blocks. The relevant matrix is banded, and thus the inner banding in (34) can be omitted:

(41)

(42)

(43)

(44)

Due to the triangular structure, we have

(45)

(46)


Finally, we complete the induction by deriving the two remaining blocks:

(47)

and

(48)

Theorem 1 extends classical results on MA($q$) and AR($p$) completion to the case of ARMA($p$,$q$), and shows that, under assumption (30), the covariance matrix can be completed given its leading diagonals. The assumption is similar to classical invertibility assumptions in stationary ARMA process identification via IVs [10]–[12]. Numerical results in Section V with randomly generated parameters suggest that it usually holds in practice. However, it is possible to construct specific counter examples as follows.

Theorem 2: There exist ARMA covariances which cannot be completed from their main band.

fine the matrix

(49)

whose parameters are

(50)

Next, define the matrix

(51)

whose parameters are

(52)

Both are legitimate ARMA matrices. Their main bands are identical, but each has a different value in the entry which needs to be completed. Thus, it is impossible to uniquely decide between them given the main band.

IV. COVARIANCE ESTIMATION

In this section, we consider the estimation of the covariance matrix of an observed sample of random vectors. Specifically, let $\mathbf{x}$ be a length $n$, zero mean normal vector with covariance $\mathbf{\Sigma}$. Our goal is to estimate $\mathbf{\Sigma}$ based on $T$ independent and identically distributed realizations of $\mathbf{x}$, denoted by $\mathbf{x}_t$ for $t = 1, \ldots, T$. The simplest estimator is the well known sample covariance

$\mathbf{S} = \frac{1}{T}\sum_{t=1}^{T} \mathbf{x}_t \mathbf{x}_t^T. \qquad (53)$

In the Gaussian case, it coincides with the unstructured Maximum Likelihood (ML) estimate as long as the number of samples exceeds the dimension, and is therefore consistent and efficient when $T \gg n$. Unfortunately, the number of samples in many practical applications is not sufficient, and better performance may be obtained through low order parametric models.

The standard methods for estimation in pure MA and AR structures use only the main band of the sample covariance [1]. In what follows, we extend this approach to ARMA models. Our estimator is obtained by taking the matrix $\mathbf{S}$ and applying the AC procedure (see Section III) to it, with bandwidth $m = p + q$. In other words, it takes the main band of the matrix $\mathbf{S}$ and uses AC to obtain a new matrix $\hat{\mathbf{\Sigma}}$. The matrix $\hat{\mathbf{\Sigma}}$ will serve as our estimator of the true covariance $\mathbf{\Sigma}$. We now analyze the number of samples needed for this scheme to achieve a desired accuracy.
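In code, the estimator simply amounts to banding the sample covariance and handing the band to the completion routine; a minimal usage sketch, reusing the illustrative band and arma_complete helpers from the completion sketch in Section III, is:

```python
import numpy as np

def arma_estimate(X, p, q):
    """X is a (T, n) matrix of zero mean i.i.d. samples; returns the ARMA covariance estimate."""
    S = (X.T @ X) / X.shape[0]                # sample covariance, as in (53)
    return arma_complete(band(S, p + q), p, q)
```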

As in similar analyses, we require certain conditions on the models that generate the data. This is a common approach in analyzing covariance estimation and its dependence on dimension (e.g., see [1]). Specifically, we shall assume that these models belong to a subclass of ARMA models of arbitrary dimension $n$. Define the following set of matrices:

(54)

It is instructive to explain the meaning of the above conditions.


The first condition is similar to standard assumptions in high dimensional analysis of covariance estimation methods. It basically states that the elements and the parameters of the unknown covariance and its inverse are bounded by constants which do not depend on the dimension. Here we also require similar conditions on the matrices which are obtained as intermediate steps of AC applied to $\mathbf{\Sigma}$ (we assume that AC is applied with bandwidth $m = p + q$). The second condition addresses the continuity of the pseudo-inversions in (26), and guarantees stability under perturbation.1 The third condition is simply (30), which is necessary for AC to work.

The following theorem states that when $\mathbf{S}$ is close to $\mathbf{\Sigma}$, then our estimator will also be close to $\mathbf{\Sigma}$. We shall later use it in Theorem 4 to obtain convergence rates for our estimator.

Theorem 3: Let $\mathbf{\Sigma}$ be a matrix in the class defined in (54). Assume that the matrix $\mathbf{S}$ satisfies

(55)

where

(56)

Then the matrix $\hat{\mathbf{\Sigma}}$, which results from applying AC to $\mathbf{S}$, satisfies

(57)

where

(58)

The constants are functions of the model class parameters and do not depend on the dimension $n$. They are defined in (65), (71), and (74) below.

Proof: Recall that AC calculates intermediate matrices which are then used to obtain the completed matrix. We will be interested in two such sets of matrices: those obtained when applying AC to the true covariance $\mathbf{\Sigma}$, and those obtained when applying AC to the sample covariance $\mathbf{S}$. Since, under assumption (30), applying AC to $\mathbf{\Sigma}$ results in $\mathbf{\Sigma}$ itself, we have that:

(59)

Recall also that these matrices are given by:

(60)

In what follows, we will analyze how the perturbed intermediate matrices differ from the true ones, and use this to analyze how $\hat{\mathbf{\Sigma}}$ differs from $\mathbf{\Sigma}$.

We begin by analyzing the recovered AR part. From (56) we have that

(61)

1We believe this condition may be dropped via more careful analysis.

Thus,

and due to Lemma 2

(62)

Using the sub-multiplicative and triangle inequalities we get

(63)

The two matrices are banded and lower triangular. Thus,

(64)

where

(65)

Next, we analyze the difference between the remaining factors. We define

so that

We first note two facts regarding these banded matrices. First, due to Lemma 1 the first satisfies

(66)

Second, the two banded matrices satisfy

(67)


where we have used basic inequalities between norms of symmetric matrices, and (55). This inequality is actually the core of the pure MA analysis. It states that the operator norm of the banded matrix error is bounded by the element-wise error and is independent of the dimension, which may be much larger. We next combine inequalities (66) and (67) with Lemma 3 to obtain

(68)

where

(69)

Using Lemma 1 again yields

(70)

where we define

(71)

Finally, we turn to the overall error in $\hat{\mathbf{\Sigma}}$. If

(72)

then due to Lemma 2

(73)

where

(74)

Recalling the decomposition of $\hat{\mathbf{\Sigma}}$, application of Lemma 3 yields

(75)

as long as (61) and (72) are satisfied. Rearranging these conditions results in (57) as required.

Theorem 3 can now be used to derive the non-asymptotic behavior of our estimator. The theorem below states that when estimating an ARMA covariance of dimension $n$ with $T$ samples, the estimation error is of order $\sqrt{\log n / T}$, implying only a weak (logarithmic) dependence on the dimension. This is similar to results obtained for MA and AR estimators in [1].

Theorem 4: Let $\mathbf{\Sigma}$ be a matrix in the class defined in (54). Assume that $\mathbf{S}$ in (53) is a sample covariance constructed using $T$ independent and identically distributed realizations of a multivariate Gaussian distribution with zero mean and covariance $\mathbf{\Sigma}$. Then, there exist constants which depend only on the class parameters, such that for

(76)

the inequality

(77)

holds with probability greater than $1 - \delta$, where the constants are defined in Theorem 3, and $\delta > 0$ is arbitrary.

Proof: Lemma A.3 and the union bound in page 221 of [1]show that

(78)

for

(79)

Combining this with the deterministic inequalities in Theorem 3 yields the required result.

Finally, we now briefly discuss the issue of order selection. In order to enjoy the advantages of low order parametric models it is vital to choose the orders $p$ and $q$ in an efficient and accurate manner. In some applications, these bandwidths are a priori known from previous experiments or the physical characteristics of the system. In other scenarios, these parameters must be inferred from the available data. There are different approaches to this task. Hypothesis testing and expected likelihood based criteria have been recently proposed in the context of time varying AR covariance models [5], [15], [18]. Alternatively, cross validation approaches can be used as in [1], [2]. We follow these last works and propose a simple $K$-fold cross validation procedure. We divide the sample indices into $K$ non-overlapping groups, and consider a set of candidate pairs $(p, q)$. We define the candidate estimate as the output of AC applied to

(80)

with parameters $(p, q)$. Our criterion for choosing $(p, q)$ is then the mean squared Frobenius error:

(81)

After finding this optimal pair, we re-estimate the covariance with these bandwidths using the full set of samples.
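A minimal implementation of this selection rule, again reusing the illustrative arma_estimate sketch from above, could look as follows; here the error is measured against the held-out sample covariance, which is the natural reading of (80)–(81), and all names remain assumptions:

```python
import numpy as np

def select_orders(X, candidates, K=5, seed=0):
    """Pick (p, q) from `candidates` by K-fold cross validation of the Frobenius error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(X.shape[0]), K)
    best, best_err = None, np.inf
    for p, q in candidates:
        err = 0.0
        for k in range(K):
            train = np.concatenate([folds[j] for j in range(K) if j != k])
            Sigma_hat = arma_estimate(X[train], p, q)
            X_val = X[folds[k]]
            S_val = (X_val.T @ X_val) / len(folds[k])    # held-out sample covariance
            err += np.linalg.norm(Sigma_hat - S_val, 'fro') ** 2 / K
        if err < best_err:
            best, best_err = (p, q), err
    return best
```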


Fig. 1. Error in reconstruction of ARMA covariances from their main band, as a function of the bandwidth.

V. EXPERIMENTS

In what follows we experiment with the AC procedure on two tasks. The first is the completion of ARMA covariance matrices from their main band (see Section III). The second is the estimation of ARMA models from finite samples (see Section IV).

A. Evaluating the Completion Algorithm

In Section III we showed that the AC procedure can be used to complete an ARMA covariance from its main band. The completion is guaranteed to work under the condition in (30), which appears to hold quite generally. To show this, we draw well conditioned ARMA models from a random ensemble. At each experiment, we randomly generate two 10-banded positive definite matrices as the Hadamard product between Wishart matrices and banded Toeplitz masks supported within the band. We use one of these matrices to compute the AR component via (9), and the other to compute the MA component via (4). Plugging these $\mathbf{A}$ and $\mathbf{B}$ into (16) yields a well conditioned ARMA covariance. We then use AC to complete $\mathbf{\Sigma}$ from its main band, for different values of the bandwidth. Fig. 1 shows the Frobenius norm error of this procedure averaged over 100 independent experiments. It can be seen that we achieve perfect reconstruction once the bandwidth is sufficiently large.
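The banded positive definite factors in this ensemble can be generated along the following lines. This is a sketch under stated assumptions: a decaying triangular taper is used inside the band so that the Schur product theorem guarantees positive definiteness, whereas the exact mask values and degrees of freedom used in the original experiment are not reproduced here.

```python
import numpy as np

def banded_pd_matrix(n, bw, dof, seed=0):
    """Banded positive definite matrix: Hadamard product of a Wishart matrix
    and a banded Toeplitz taper (positive semidefinite, zero outside the band)."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((dof, n))
    W = G.T @ G / dof                                            # Wishart, PD when dof >= n
    i, j = np.indices((n, n))
    mask = np.maximum(0.0, 1.0 - np.abs(i - j) / (bw + 1.0))     # bw-banded Toeplitz taper
    return W * mask                                              # Schur product of PD and PSD
```

One such matrix supplies the AR component via (9) and another supplies the MA component via (4); plugging both into (16) yields the ARMA covariance used in the completion experiment.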

B. Evaluating the Estimation Algorithm

In what follows we evaluate our ARMA estimation method (Section IV) on various simulated covariance matrices. We compare four different estimators: the sample covariance, an autoregressive model, a moving average model, and our autoregressive moving average model. Note that, to ensure fairness, all three parametric models have the same number of degrees of freedom. The ARMA model is estimated using the matrix completion approach, i.e., AC applied to the banded sample covariance with bandwidth $m = p + q$. The AR models are estimated using standard Yule-Walker estimation and the MA models are estimated using the procedure in [2].2

Fig. 2. Best estimators as a function of the ARMA model: ARMA is white; AR, MA and sample covariance are grays.

In our first experiment, we examine the performance as a function of the ARMA parameters. Specifically, we define $\mathbf{A}$ and $\mathbf{B}$ as Toeplitz matrices whose pole and zero values are evaluated on a grid. For each grid point, we simulate all four estimators and compute the squared Frobenius norm error in the covariance. In Fig. 2 we plot the labels of the best estimators for each pole and zero value and for different numbers of samples. As expected, the advantage of the ARMA estimator is emphasized when the AR and MA components are of the same order with different signs. Also, the region in which ARMA dominates becomes larger with the number of samples.

In our second experiment, we consider the stationary narrow band example in [19]:

(82)

We construct the time varying ARMA covariance model as follows. First we define a Toeplitz covariance matrix with the corresponding ARMA parameters. Next, following [5] in the context of time varying AR covariance estimation, we add non-stationarity using phase path variations along propagation channels. We define the Doppler matrix as:

(83)

with a given Doppler parameter, and the overall time varying covariance matrix as

(84)

2The MA estimation method in [2] typically outperforms simple banding of the sample covariance.


Fig. 3. Error in the Gaussian case as a function of the number of samples.

Fig. 4. Error in the uniform case as a function of the number of samples.

Note that this construction involves the use of complex variables. For this purpose, we replace the transpose and pseudoinverse operators in AC with their complex counterparts, i.e., the conjugate transpose and the complex valued pseudoinverse, respectively.

We report the results of four experiments performed on this model. In Fig. 3 we provide the normalized Frobenius norm errors of the competing estimators as a function of the number of samples. The samples are drawn from a multivariate normal distribution. In order to test the robustness to non-Gaussian samples, we also repeat the experiment using a uniform distribution for the driving noise as described in [19]. The results are provided in Fig. 4. Next, in Fig. 5 we examine the effect of model selection and run the simulation again without providing the estimators their band parameter, but choosing it using the proposed cross validation technique. In all three cases, it is easy to see the advantage of the ARMA estimator over its competitors.

Fig. 5. Error in the Gaussian case with model selection as a function of the number of samples.

Fig. 6. Error in the Gaussian case as a function of the bandwidth $m$.

Finally, we consider the performance as a function of the bandwidth parameter $m$ of the AC algorithm. The previous results are all based on the choice $m = p + q$. To illustrate the effect of $m$, we evaluate performance for different values of $m$. This is done for the model above with normal noise and a fixed sample size. Results are shown in Fig. 6. The curve suggests that there is an optimal bandwidth which can provide even better performance. Clearly, this comes with the additional cost of optimizing this additional hyper parameter.

In our third experiment, we consider the ARMA model in Exercise C2.22 in the spectral analysis book [20]. As before, we construct a Toeplitz covariance with the parameters specified there and zero otherwise. We then add non-stationarity as formulated in (84). We report the error as a function of the number of samples with and without cross validation in Figs. 7 and 8, respectively.


Fig. 7. Error in the Gaussian case as a function of the number of samples.

Fig. 8. Error in the Gaussian case with cross validation as a function of the number of samples (the MA error is larger than 0.2).

Both figures demonstrate the advantages of the ARMA approach both when the band parameters are a priori known, and when they must be estimated from the available data.

VI. DISCUSSION

In this paper, we introduced a low order parametric model for covariances based on a time varying ARMA structure. We considered the completion of such covariances from their main band, and their estimation from random samples. Our contributions generalize existing results on the special cases of pure MA and pure AR structures. In particular, we provide conditions under which our proposed ARMA estimates are consistent in the operator norm as long as the logarithm of the dimension is small relative to the number of samples.

The model assumes that the locations of the non-zero valued parameters are known in advance based on the ordering of the variables. Recently, there has been a growing interest in learning sparse models where the sparsity pattern has to be estimated from the data as well. This has been done in the context of sparse covariance matrices as well as sparse inverse covariance matrices [21]–[25]. Future work should address these approaches in the context of ARMA models. By interpreting the MA part as latent variables, such extensions will be related to the recent works on learning covariance matrices with hidden variables [26].

APPENDIX A
TECHNICAL LEMMAS

Lemma 1: Let $\mathbf{X}$ be an arbitrary matrix; then the operator norm of $\mathcal{B}_m(\mathbf{X})$ is bounded by a constant multiple of $\|\mathbf{X}\|$, where the constant depends only on the bandwidth $m$.

Proof: Let $\mathbf{M} = \mathcal{B}_m(\mathbf{1})$, where $\mathbf{1}$ is the all ones matrix; then $\mathcal{B}_m(\mathbf{X}) = \mathbf{M} \circ \mathbf{X}$, where $\circ$ is the Hadamard elementwise product. The spectral norm is submultiplicative with respect to the Hadamard product, so that $\|\mathbf{M} \circ \mathbf{X}\| \le \|\mathbf{M}\|\,\|\mathbf{X}\|$ [27]. Finally, the norm of the banded all ones matrix $\mathbf{M}$ is bounded by a constant depending only on $m$.

Lemma 2 (Full Rank Pseudoinverse Perturbation): If a tall matrix is full rank and a perturbation of it is sufficiently small relative to its smallest singular value, then the perturbed matrix is full rank with

(85)

Proof: This follows from Corollary 22 and Lemma 23 in [28] with the appropriate substitutions.

Lemma 3: Let the matrices $\mathbf{X}_i$ and $\mathbf{Y}_i$ satisfy $\|\mathbf{X}_i\| \le c_i$ and $\|\mathbf{X}_i - \mathbf{Y}_i\| \le \epsilon_i$ for each $i$. Then,

(86)

Proof: Apply the submultiplicative and triangle inequalities to

(87)

REFERENCES

[1] P. J. Bickel and E. Levina, “Regularized estimation of large covariance matrices,” Ann. Statist., vol. 36, no. 1, pp. 199–227, 2008.
[2] A. J. Rothman, E. Levina, and J. Zhu, “A new approach to Cholesky-based covariance regularization in high dimensions,” Biometrika, vol. 97, no. 3, p. 539, 2010.
[3] T. T. Cai, C. H. Zhang, and H. H. Zhou, “Optimal rates of convergence for covariance matrix estimation,” Ann. Statist., vol. 38, no. 4, pp. 2118–2144, 2010.
[4] S. Chaudhuri, M. Drton, and T. S. Richardson, “Estimation of a covariance matrix with zeros,” Biometrika, 2007.
[5] Y. I. Abramovich, N. K. Spencer, and M. D. E. Turley, “Order estimation and discrimination between stationary and time-varying (TVAR) autoregressive models,” IEEE Trans. Signal Process., vol. 55, no. 6, pp. 2861–2876, Jun. 2007.
[6] Y. I. Abramovich, N. K. Spencer, and M. D. E. Turley, “Time-varying autoregressive (TVAR) models for multiple radar observations,” IEEE Trans. Signal Process., vol. 55, no. 4, pp. 1298–1311, Apr. 2007.
[7] A. Kavcic and J. M. F. Moura, “Matrices with banded inverses: Inversion algorithms and factorization of Gauss-Markov processes,” IEEE Trans. Inf. Theory, vol. 46, no. 4, pp. 1495–1509, Apr. 2000.
[8] A. Asif and J. M. F. Moura, “Block matrices with l-block-banded inverse: Inversion algorithms,” IEEE Trans. Signal Process., vol. 53, no. 2, pp. 630–642, Feb. 2005.
[9] S. L. Lauritzen, Graphical Models, ser. Oxford Statistical Science Series. New York: Oxford Univ. Press, 1996, vol. 17.
[10] P. Stoica and M. Jansson, “MIMO system identification: State-space and subspace approximations versus transfer function and instrumental variables,” IEEE Trans. Signal Process., vol. 48, no. 11, pp. 3087–3099, Nov. 2000.


[11] P. Stoica, T. Soderstrom, and B. Friedlander, “Optimal instrumental variable estimates of the AR parameters of an ARMA process,” IEEE Trans. Autom. Control, vol. 30, no. 11, pp. 1066–1074, Nov. 1985.
[12] M. Viberg, P. Stoica, and B. Ottersten, “Array processing in correlated noise fields based on instrumental variables and subspace fitting,” IEEE Trans. Signal Process., vol. 43, no. 5, pp. 1187–1199, May 1995.
[13] C. Brito and J. Pearl, “A new identification condition for recursive models with correlated errors,” Structural Equation Modeling, vol. 9, no. 4, pp. 459–474, 2002.
[14] M. Drton, M. Eichler, and T. S. Richardson, “Computing maximum likelihood estimates in recursive linear models with correlated errors,” J. Mach. Learn. Res., vol. 10, pp. 2329–2348, 2009.
[15] Y. I. Abramovich, N. K. Spencer, and M. D. E. Turley, “Time-varying autoregressive (TVAR) models for multiple radar observations,” IEEE Trans. Signal Process., vol. 55, no. 4, pp. 1298–1311, Apr. 2007.
[16] R. Grone, C. R. Johnson, E. Marques de Sa, and H. Wolkowicz, “Positive definite completions of partial Hermitian matrices,” Linear Algebra Appl., vol. 58, pp. 109–124, 1985.
[17] E. Ben David and B. Rajaratnam, “Positive definite completion problems for directed acyclic graphs,” arXiv preprint arXiv:1201.0310, 2011.
[18] Y. Abramovich and B. A. Johnson, “Expected likelihood approach for covariance matrix estimation: Complex angular central Gaussian case,” in Proc. IEEE SAM, Jun. 2012.
[19] A. K. Rao, Y. F. Huang, and S. Dasgupta, “ARMA parameter estimation using a novel recursive estimation algorithm with selective updating,” IEEE Trans. Acoust., Speech, Signal Process., vol. 38, no. 3, pp. 447–457, 1990.
[20] P. Stoica and R. L. Moses, Spectral Analysis of Signals. Upper Saddle River, NJ, USA: Pearson/Prentice Hall, 2005.
[21] P. J. Bickel and E. Levina, “Covariance regularization by thresholding,” Ann. Statist., vol. 36, no. 6, pp. 2577–2604, 2008.
[22] N. Meinshausen and P. Buhlmann, “High-dimensional graphs and variable selection with the LASSO,” Ann. Statist., pp. 1436–1462, 2006.
[23] A. J. Rothman, P. J. Bickel, E. Levina, and J. Zhu, “Sparse permutation invariant covariance estimation,” Electron. J. Statist., vol. 2, pp. 494–515, 2008.
[24] J. Friedman, T. Hastie, and R. Tibshirani, “Sparse inverse covariance estimation with the LASSO,” Biostatistics, vol. 9, no. 3, pp. 432–441, Jul. 2008.
[25] G. Cao, L. R. Bachega, and C. A. Bouman, “The sparse matrix transform for covariance estimation and analysis of high dimensional signals,” IEEE Trans. Image Process., vol. 20, no. 3, pp. 625–640, Mar. 2011.
[26] V. Chandrasekaran, P. A. Parrilo, and A. S. Willsky, “Latent variable graphical model selection via convex optimization,” arXiv preprint arXiv:1008.1290, 2010.
[27] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1991.
[28] D. Hsu, S. M. Kakade, and T. Zhang, “A spectral algorithm for learning hidden Markov models,” J. Comput. Syst. Sci., 2012.

Ami Wiesel received the B.Sc. and M.Sc. degrees in electrical engineering from Tel-Aviv University, Tel-Aviv, Israel, in 2000 and 2002, respectively, and the Ph.D. degree in electrical engineering from the Technion—Israel Institute of Technology, Haifa, Israel, in 2007.
He was a postdoctoral fellow with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, in 2007–2009. Since Jan. 2010, he has been a Faculty Member at the Rachel and Selim Benin School of Computer Science and Engineering at the Hebrew University of Jerusalem, Jerusalem, Israel.
Dr. Wiesel was a recipient of the Young Author Best Paper Award for a 2006 paper in the IEEE TRANSACTIONS ON SIGNAL PROCESSING and a Student Paper Award for a 2005 Workshop on Signal Processing Advances in Wireless Communications (SPAWC) paper. He was awarded the Weinstein Study Prize in 2002, the Intel Award in 2005, the Viterbi Fellowship in 2005 and 2007, and the Marie Curie Fellowship in 2008.

Ofir Bibi received the B.Sc. degree in physics and computer science from The Hebrew University, Jerusalem, Israel, in 2009. He is currently working toward the Ph.D. degree in The Alice and Jack Ormut International Ph.D. Program for outstanding students in Brain Research: Computation and Information Processing at The Hebrew University.
Mr. Bibi was awarded the Intel Award in 2005.

Amir Globerson (M’07) received the B.Sc. degree in computer science and physics from the Hebrew University, Jerusalem, Israel, in 1995. He received the Ph.D. degree in neural computation in 2007, also from the Hebrew University.
He was a postdoctoral fellow at the University of Toronto and at MIT from 2007 to 2009. Since September 2009 he has been a Faculty Member at the Rachel and Selim Benin School of Computer Science and Engineering at the Hebrew University of Jerusalem, Jerusalem, Israel.
Dr. Globerson was a recipient of two best paper awards at the Uncertainty in Artificial Intelligence (UAI) conference (2007 and 2008), and an outstanding student paper award at the Advances in Neural Information Processing Systems (NIPS) conference (2004). He was awarded the Rothschild Yad Hanadiv postdoctoral fellowship in 2008 and the HP Labs innovation research award in 2010.