
SIAM J. MATRIX ANAL. APPL. Vol. 16, No. 3, pp. 863-885, July 1995

© 1995 Society for Industrial and Applied Mathematics

VECTOR ORTHOGONAL POLYNOMIALS AND LEAST SQUARES APPROXIMATION*

ADHEMAR BULTHEEL† AND MARC VAN BAREL†

Abstract. We describe an algorithm for complex discrete least squares approximation, which turns out to be very efficient when function values are prescribed in points on the real axis or on the unit circle. In the case of polynomial approximation, this reduces to algorithms proposed by Rutishauser, Gragg, Harrod, Reichel, Ammar, and others. The underlying reason for efficiency is the existence of a recurrence relation for orthogonal polynomials, which are used to represent the solution. We show how these ideas can be generalized to least squares approximation problems of a more general nature.

Key words. orthogonal vector polynomials, discrete least squares

AMS subject classifications. 41A20, 65F25, 65D05, 65D15, 30E10

1. Introduction. Let $\{z_k\}_{k=0}^m$ be a set of complex nodes and $\{w_k\}_{k=0}^m$ a set of positive weights (let us assume that $w_k > 0$).

We shall first solve the problem of finding the least squares polynomial approximant in the space with positive semidefinite inner product

$$(1) \qquad \langle f, g \rangle = \sum_{k=0}^{m} w_k^2\, \overline{f(z_k)}\, g(z_k).$$

Note that this is a positive definite inner product for the space of vectors $(f(z_0),\dots,f(z_m))$ representing the function values at the given nodes. The polynomial of degree $n \le m$ which minimizes $\|f - p\|$, with $\|v\| = \langle v, v\rangle^{1/2}$ (note that this is a seminorm), can be found as follows. Find a basis $\{\varphi_0,\dots,\varphi_n\}$ for $\mathbb{P}_n$, the space of polynomials of degree at most $n$, which is orthonormal with respect to $\langle\cdot,\cdot\rangle$. The solution $p$ is the generalized Fourier expansion of $f$ with respect to this basis, truncated after the term of degree $n$. An algorithm that solves the problem will compute implicitly or explicitly the orthonormal basis and the Fourier coefficients. As we see in the following sections, we can reduce the complexity of such an algorithm by an order of magnitude when a "short recurrence" exists for the orthogonal polynomials. We consider the case where all the $z_i$ are on the real line, in which case a three-term recurrence relation exists, and the case where all the $z_i$ are on the complex unit circle, in which case a Szegő type recurrence relation exists.

The above-mentioned discrete least squares problem is closely related to many other problems in numerical analysis. For example, consider the quadrature formula

$$\int_a^b f(x)\,w(x)\,dx \approx \sum_{k=0}^{m} w_k^2\, f(z_k),$$

*Received by the editors July 20, 1993; accepted for publication (in revised form) by M. Gutknecht May 25, 1994. This research was supported by the Human Capital and Mobility project ROLLS of the European Community, contract ERBCHRXGT930416.

†Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, B-3001 Leuven, Belgium (adhemar.bultheel@cs.kuleuven.ac.be).



where $w(x)$ is a positive weight for the real interval $[a,b]$. We get a Gaussian quadrature formula, exact for all polynomials of degree $2m+1$, by a special choice of the nodes and weights. The nodes $z_k$ are the zeros of the $(m+1)$st orthogonal polynomial with respect to $\langle f, g\rangle = \int_a^b \overline{f(x)}\,g(x)\,w(x)\,dx$. These are also the eigenvalues of the truncated Jacobi matrix which is associated with this orthogonal system. The weights $w_i^2$ are proportional to $q_{i0}^2$, where $q_{i0}$ is the first component of the corresponding eigenvector.

Another link can be made with inverse spectral problems. These come in several forms. One variant is precisely the inverse of the previous quadrature problem: find the Jacobi matrix, when its eigenvalues and the first entries of the normalized eigenvectors are given.
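To make the direct problem concrete, the following minimal sketch (our own illustration, not taken from the paper; it assumes the Legendre weight $w(x) \equiv 1$ on $[-1,1]$, whose recurrence coefficients are known in closed form) computes a Gauss rule from the truncated Jacobi matrix: the nodes are its eigenvalues and the weights are proportional to the squared first components of the normalized eigenvectors.

```python
import numpy as np

def gauss_legendre_from_jacobi(m):
    """(m+1)-point Gauss rule for w(x) = 1 on [-1, 1] via the Jacobi matrix."""
    k = np.arange(1, m + 1)
    b = k / np.sqrt(4.0 * k**2 - 1.0)   # off-diagonal recurrence coefficients
    J = np.diag(b, 1) + np.diag(b, -1)  # truncated Jacobi matrix (zero diagonal)
    nodes, Q = np.linalg.eigh(J)
    # weight w_k^2 = mu_0 * q_{k0}^2 with mu_0 = integral of w(x) dx = 2
    weights = 2.0 * Q[0, :]**2
    return nodes, weights

nodes, weights = gauss_legendre_from_jacobi(4)   # 5 nodes: exact up to degree 9
print(weights @ nodes**8, 2.0 / 9.0)             # both approximately 0.2222
```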

We shall call the computation of the quadrature formula or the eigenvalue decomposition of the Jacobi matrix direct problems, while the inverse spectral problem and the least squares problem will be called inverse problems.

For a survey of inverse spectral problems, we refer to Boley and Golub [5]. One of the methods mentioned there is the Rutishauser-Gragg-Harrod algorithm. This algorithm can be traced back to Rutishauser [14] and was adapted by Gragg and Harrod [11] with a technique of Kahan-Pal-Walker for chasing a nonzero element in the matrix.

For a discrete least squares interpretation of these algorithms we refer to Reichel [12]. When the $z_i$ are not on the real line, but on the unit circle, similar ideas lead to algorithms discussed by Ammar and He [4] and Ammar, Gragg, and Reichel [2] for the inverse eigenvalue problem, and to Reichel, Ammar, and Gragg [13] for a least squares interpretation.

We first survey the general theory in the context of discrete least squares approximation where the $z_k$ are arbitrary complex numbers in §2, §3, and §4. In §5, we explain how the complexity can be reduced by an order of magnitude when short recurrences exist.

The next step (§6) is to generalize these results to the problem of minimizing

$$(2) \qquad \min \sum_{k=0}^{m} |w_{0k}\,p_0(z_k) + \cdots + w_{\alpha k}\,p_\alpha(z_k)|^2,$$

where the $\{w_{0k},\dots,w_{\alpha k}\}_{k=0}^m$ are given complex numbers and the polynomials $p_i$ of degree at most $d_i$, $i = 0,\dots,\alpha$, must be found, with the constraint that at least one of them is monic of strict degree.

When $\alpha = 1$, this generalization is related to rational approximation, in contrast with the previously described problem, which is related to polynomial approximation. We refer to the generalized problem as the matrix case, while the simpler polynomial case is referred to as the scalar case.

For the matrix case, we may distinguish between two levels of complication. When all the degrees $d_i$ are equal, it turns out (§7) that the solution method can be described in terms of square matrix orthogonal polynomials of size $\alpha+1$, and the previous theory of scalar orthogonal polynomials is readily generalized.

When not all the degrees $d_i$ are equal, we are in the most general case that we consider here (§8). The solution can now be described in terms of vector orthogonal polynomials, which allows us to combine the scalar orthogonal polynomials of the first case and the matrix orthogonal polynomials of the second case, both of which show up during the solution of the problem.

The breakdown of the algorithm will only occur in the case of exact interpolation. This is discussed in §9.


To avoid an unduly complicated notation, we mainly restrict ourselves in this paper to the case $\alpha = 1$, but the generalization to general $\alpha \ge 1$ should be obvious.

2. Polynomial least squares approximation. Discrete least squares approximation by polynomials is a classical problem in numerical analysis where orthogonal polynomials play a central role.

Given an inner product $\langle\cdot,\cdot\rangle$ defined on $\mathbb{P}_m \times \mathbb{P}_m$, the polynomial $p \in \mathbb{P}_n$ of degree at most $n \le m$ which minimizes the error $\|f - p\|$ is given by

$$p = \sum_{k=0}^{n} \varphi_k\, a_k, \qquad a_k = \langle \varphi_k, f\rangle,$$

when the $\{\varphi_k\}$ form an orthonormal set of polynomials: $\langle\varphi_k,\varphi_l\rangle = \delta_{kl}$.

The inner product we consider here is of the discrete form (1), where the $z_i$ are distinct complex numbers.

Note that when $m = n$, the least squares solution is the interpolating polynomial, so that interpolation can be seen as a special case.

To illustrate where the orthogonal polynomials show up in this context, we start with an arbitrary polynomial basis $\{\phi_k\}$, $\phi_k \in \mathbb{P}_k \setminus \mathbb{P}_{k-1}$. Setting

$$p = \sum_{k=0}^{n} \phi_k\, a_k,$$

the least squares problem can be formulated as finding the weighted least squares solution of the system of linear equations

$$\sum_{k=0}^{n} \phi_k(z_i)\, a_k = f(z_i), \qquad i = 0,\dots,m,$$

which is the same as the least squares solution of

$$W \Phi_n A_n = W F,$$

where $W = \operatorname{diag}(w_0,\dots,w_m)$ and

$$\Phi_n = \begin{bmatrix} \phi_0(z_0) & \cdots & \phi_n(z_0)\\ \vdots & & \vdots\\ \phi_0(z_m) & \cdots & \phi_n(z_m)\end{bmatrix}, \qquad A_n = \begin{bmatrix} a_0\\ \vdots\\ a_n\end{bmatrix}, \qquad F = \begin{bmatrix} f(z_0)\\ \vdots\\ f(z_m)\end{bmatrix}.$$

Note that when $\phi_k(z) = z^k$, the power basis, then $\Phi_n$ is a rectangular Vandermonde matrix.

The normal equations for this system are

$$(\Phi_n^H W^2 \Phi_n)\, A_n = \Phi_n^H W^2 F.$$


When the $\phi_k$ are chosen to be the orthonormal polynomials $\varphi_k$, then $\Phi_n^H W^2 \Phi_n = I_{n+1}$ and the previous system gives the solution $A_n = \Phi_n^H W^2 F$ immediately.
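As a small numerical illustration (our own sketch, not the paper's algorithm; the setup and names are assumptions), one can obtain the values of orthonormal polynomials at the nodes from a QR factorization of the weighted Vandermonde matrix and then read off the Fourier coefficients $A_n = \Phi_n^H W^2 F$ directly, without forming the normal equations:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 5
z = rng.standard_normal(m + 1) + 1j * rng.standard_normal(m + 1)  # distinct nodes
w = rng.random(m + 1) + 0.1                                       # positive weights
f = np.exp(z)                                                     # data f(z_i)

W = np.diag(w)
V = np.vander(z, n + 1, increasing=True)   # power basis, rectangular Vandermonde

# QR: the columns of Q0 = W Phi_n are orthonormal, so Phi_n = W^{-1} Q0
# holds the orthonormal-polynomial values at the nodes.
Q0, R = np.linalg.qr(W @ V)
Phi = np.diag(1.0 / w) @ Q0

A = Phi.conj().T @ W**2 @ f    # Fourier coefficients
p = Phi @ A                    # values of the least squares approximant

# agrees with the weighted least squares solution in the power basis
c = np.linalg.lstsq(W @ V, w * f, rcond=None)[0]
print(np.allclose(p, V @ c))   # True
```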

When the least squares problem is solved by QR factorization, i.e., when $Q$ is an $(m+1)\times(m+1)$ unitary matrix such that $Q^H W \Phi_n = [R^T\ 0^T]^T$ with $R$ upper triangular, we must solve the triangular system given by the first $n+1$ rows of

$$\begin{bmatrix} R\\ 0\end{bmatrix} A_n = Q^H W F,$$

where $X$, the trailing part of the right-hand side, is related to the residual vector $r$ by

$$\begin{bmatrix} 0\\ X\end{bmatrix} = Q^H r, \qquad r = W\Phi_n A_n - WF.$$

Note that the least squares error is $\|X\| = \|r\|$. Again, when the $\phi_k$ are replaced by the orthonormal polynomials $\varphi_k$, we get the trivial system ($R = I_{n+1}$)

$$\begin{bmatrix} I_{n+1}\\ 0\end{bmatrix} A_n = Q^H W F.$$

Note that such a unitary matrix $Q$ is always related to the orthonormal polynomials by

$$Q = W\Phi,$$

where

$$\Phi = \Phi_m = \begin{bmatrix} \varphi_0(z_0) & \cdots & \varphi_m(z_0)\\ \vdots & & \vdots\\ \varphi_0(z_m) & \cdots & \varphi_m(z_m)\end{bmatrix},$$

since

$$Q^H Q = \Phi^H W^2 \Phi = I_{m+1}.$$

3. The Hessenberg matrix. From the previous discussion, it follows that the central problem is to construct the orthonormal basis $\{\varphi_k\}$. In general, the polynomial $z\varphi_{k-1}(z)$ can be expressed as a linear combination of the polynomials $\varphi_0,\dots,\varphi_k$, leading to a relation of the form

$$z\varphi_{k-1}(z) = \varphi_k(z)\,\eta_{kk} + \varphi_{k-1}(z)\,\eta_{k-1,k} + \cdots + \varphi_0(z)\,\eta_{0k}, \qquad k = 1,\dots,m+1.$$

We can express the previous relations as

$$(3) \qquad z\,[\varphi_0(z),\dots,\varphi_m(z)] = [\varphi_0(z),\dots,\varphi_m(z)]\,H + \varphi_{m+1}(z)\,\eta_{m+1,m+1}\,e_{m+1}^T,$$

where $H$ is an upper Hessenberg matrix

$$H = \begin{bmatrix} \eta_{01} & \eta_{02} & \cdots & \eta_{0m} & \eta_{0,m+1}\\ \eta_{11} & \eta_{12} & \cdots & \eta_{1m} & \eta_{1,m+1}\\ & \eta_{22} & & \vdots & \vdots\\ & & \ddots & \eta_{mm} & \eta_{m,m+1}\end{bmatrix}$$

and $e_{m+1}^T = [0\ 0\ \cdots\ 0\ 1]$.


Note that a discrete inner product of the proposed form will cause a breakdown in the generation of the polynomials at stage $m+1$. Indeed, we should identify a function with the $(m+1)$-vector of its function values in $z_k$, $k = 0,\dots,m$. Thus when we say the "polynomial $p$," we actually mean the vector $(p(z_0),\dots,p(z_m))$. Thus our "function space" is a space of $(m+1)$-vectors, which is inherently $(m+1)$-dimensional, and thus the $(m+1)$st orthogonal polynomial will be orthogonal to the whole space, hence it must be zero. Thus, if $\varphi_k$ are these orthogonal polynomials, then $[\varphi_{m+1}(z_0),\dots,\varphi_{m+1}(z_m)]^T$ will be the zero vector. This is equivalent to saying that $\varphi_{m+1}$ is proportional to $(z - z_0)\cdots(z - z_m)$.

Even when we use terms such as "functions" and "polynomials," the problem considered is in fact a vectorial problem, which can best be formulated in terms of matrices, as we do below.

Setting $\Phi = \Phi_m$ as before, we rewrite the relation (3) as

$$Z\Phi = \Phi H$$

with $Z = \operatorname{diag}(z_0,\dots,z_m)$. Multiplying with the diagonal matrix $W$ and using $WZ = ZW$, we are led to

$$H = (W\Phi)^H Z (W\Phi) = Q^H Z Q,$$

which means that the diagonal matrix $Z$ and the Hessenberg matrix $H$ are unitarily similar.
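The similarity can be made explicit by an Arnoldi-type process on the diagonal matrix $Z$ with the normalized weight vector as starting vector: orthonormalizing $q, Zq, Z^2q, \dots$ produces the columns of $Q$ and the entries $\eta_{jk}$ simultaneously. The sketch below (our own, with assumed names) is the generic $O(m^3)$ construction; the point of the later sections is to do better when the $z_i$ are real or on the unit circle.

```python
import numpy as np

def hessenberg_from_nodes(z, w):
    """Arnoldi on Z = diag(z), start vector w/||w||: Q^H diag(z) Q = H."""
    mp1 = len(z)
    Q = np.zeros((mp1, mp1), dtype=complex)
    H = np.zeros((mp1, mp1), dtype=complex)
    Q[:, 0] = w / np.linalg.norm(w)
    for k in range(mp1 - 1):
        v = z * Q[:, k]                    # apply Z: columnwise scaling
        for j in range(k + 1):             # Gram-Schmidt against q_0, ..., q_k
            H[j, k] = np.vdot(Q[:, j], v)
            v -= H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(v)    # > 0 for distinct nodes, w_i > 0
        Q[:, k + 1] = v / H[k + 1, k]
    H[:, -1] = Q.conj().T @ (z * Q[:, -1]) # last column of Q^H Z Q
    return Q, H

z = np.exp(2j * np.pi * np.random.rand(6))     # nodes on the unit circle
w = np.random.rand(6) + 0.1                    # positive weights
Q, H = hessenberg_from_nodes(z, w)
print(np.allclose(Q.conj().T @ np.diag(z) @ Q, H))   # True: unitary similarity
```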

The constant polynomial $\varphi_0$ is normalized when it is equal to $\eta_{00}^{-1}$, with $\eta_{00}$ given by

$$Q^H w_1 = [\eta_{00}, 0, \dots, 0]^T,$$

where $w_1 = [w_0,\dots,w_m]^T$. Indeed, using $Q = W\Phi$ and supposing $\|\varphi_0\| = 1$, we see that all the entries in $Q^H w_1$ are zero by orthogonality, except for the first one, which is $1/\varphi_0 = \eta_{00}$.

This condition is not sufficient to characterize $Q$ completely. We can fix it uniquely by making the $\varphi_k$ have positive leading coefficients. This will be obtained when all the $\eta_{kk}$, $k = 0,1,\dots,m$, are positive. Since we assume that all the weights $w_i^2$ are positive, the $\eta_{kk}$ are nonzero and therefore this normalization can always be realized.

We thus obtained a one-to-one relation between the data $\{z_i, w_i\}_{i=0}^m$, the unitary matrix $Q$, and the elements $\eta_{ij}$, $i = 0,\dots,m$, $j = 0,\dots,m+1$, of an extended (with $\eta_{00}$) Hessenberg matrix. This also fixes the orthonormal polynomials.

Since $Z$ and $H$ are unitarily similar, they have the same spectrum, and the construction of $H$ from $Z$ by unitary similarity transformations is in fact an inverse spectral problem: given the spectrum $Z$ and the first components of the eigenvectors, find the set of orthonormal eigenvectors (the columns of $Q^H$), such that $Q^H Z Q$ is the eigenvalue decomposition of some upper Hessenberg matrix with the normalization described above.

In the direct problem, one computes the eigenvalues $\{z_i\}_{i=0}^m$ and the eigenvectors $Q$ from the Hessenberg matrix, e.g., with the QR algorithm. For the inverse problem, the Hessenberg matrix is reconstructed from the spectral data by an algorithm that could be called an inverse QR algorithm. This is the Rutishauser-Gragg-Harrod algorithm for the case of the real line [11], [12] and the unitary inverse QR algorithm described in [2] for the case of the unit circle. For the least squares problem, we add the function values $f(z_k)$, and when these are properly transformed by the similarity transformations of the inverse QR algorithm, this will result in the generalized Fourier coefficients of the approximant and some information about the corresponding residual.


Indeed, the solution of the approximation problem is given by

$$p(z) = \sum_{k=0}^{n} \varphi_k(z)\, a_k, \qquad \tilde A = [a_0,\dots,a_m]^T = Q^H W F.$$

Note that the normal equations are never explicitly formed. The whole scheme can be collected in one table giving the relations

$$Q^H\,[\,w_0 \mid w_1 \mid Z\,Q\,] = \begin{bmatrix} a_0 & \eta_{00} & \eta_{01} & \cdots & \eta_{0m} & \eta_{0,m+1}\\ a_1 & & \eta_{11} & \cdots & \eta_{1m} & \eta_{1,m+1}\\ \vdots & & & \ddots & \vdots & \vdots\\ a_m & & & & \eta_{mm} & \eta_{m,m+1}\end{bmatrix}$$

with $w_0 = WF$ and $w_1 = [w_0,\dots,w_m]^T$ as before. The approximation error is $\|[a_{n+1},\dots,a_m]\|$. For further reference we refer to the matrix of the right-hand side as the extended Hessenberg matrix.

4. Updating. Suppose that $A_n$ was computed by the last scheme for some data set $\{z_i, f_i, w_i\}_{i=0}^m$. We then end up with a scheme of the following form ($n = 3$, $m = 5$):

[Scheme: the transformed array $Q^H[\,w_0 \mid w_1 \mid ZQ\,]$ in upper trapezoidal form; the first column holds the coefficients $a_k$, the remaining columns the extended Hessenberg matrix. Only the cross pattern of the nonzero entries survived the original display.]

A new data triple $(z_{m+1}, f_{m+1}, w_{m+1})$ can be added, for example, in the top line. The three crosses in the top line of the left scheme below represent $w_{m+1}f_{m+1}$, $w_{m+1}$, and $z_{m+1}$, respectively. The other crosses correspond to the ones we had in the previous scheme.

[Left scheme: the previous array bordered by the new data row on top. Right scheme: the same array restored to upper trapezoidal form, with one extra row and one extra column.]

This left scheme must be transformed by unitary similarity transformations into the right scheme, which has the same form as the original one but with one extra row and one extra column. This result is obtained by eliminating the (2,2) element by an elementary rotation/reflection in the plane of the first two rows. The corresponding transformation on the columns will influence columns 3 and 4 and will introduce a nonzero element at position (3,3), which should not be there. This is eliminated by a rotation/reflection in the plane of rows 2 and 3, etc. We call this procedure chasing the elements down the diagonal. In the first column of the result, we find above the horizontal line the updated coefficients $A_n$. When we do not change $n$, it is sufficient to perform only the operations that influence these coefficients. Thus we could have stopped after we obtained the form

[Scheme: the partially restored array, upper trapezoidal in its first $n+1$ rows; the trailing rows need not be reduced further.]

This can be done with $O(n^2)$ operations per new data point. In the special case of data on the real line or on the unit circle, this reduces to $O(n)$ operations, as we see in the next section.
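The elementary transformations above are complex Givens rotations (or reflections) applied as unitary similarities. The sketch below (our own illustration, not the paper's update routine) shows the building block: a $2\times2$ rotation that zeros the lower entry of a pair, used here in the generic task of chasing unwanted entries down a matrix to restore upper Hessenberg form while preserving the spectrum.

```python
import numpy as np

def givens(a, b):
    """Unitary G with G @ [a, b]^T = [r, 0]^T, r = sqrt(|a|^2 + |b|^2)."""
    r = np.hypot(abs(a), abs(b))
    if r == 0.0:
        return np.eye(2, dtype=complex)
    return np.array([[np.conj(a), np.conj(b)],
                     [-b, a]], dtype=complex) / r

def restore_hessenberg(A):
    """Eliminate entries below the subdiagonal by Givens similarities."""
    H = np.array(A, dtype=complex)
    n = H.shape[0]
    for j in range(n - 2):                   # column by column
        for i in range(n - 1, j + 1, -1):    # zero H[i, j] from the bottom up
            G = givens(H[i - 1, j], H[i, j])
            H[i - 1:i + 1, :] = G @ H[i - 1:i + 1, :]           # rows
            H[:, i - 1:i + 1] = H[:, i - 1:i + 1] @ G.conj().T  # columns
    return H

A = np.random.rand(6, 6) + 1j * np.random.rand(6, 6)
H = restore_hessenberg(A)
print(np.allclose(np.tril(H, -2), 0))        # True: upper Hessenberg
print(np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                  np.sort_complex(np.linalg.eigvals(H))))  # True: similar
```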

5. Recurrence relations. The algorithm described above simplifies considerably when the orthogonal polynomials satisfy a particular recurrence relation. A classical situation occurs when the $z_i \in \mathbb{R}$, $i = 0,1,\dots,m$. Since also the weights $w_i$ are real, the $Q$ and $H$ matrix will be real, which means that we can drop the complex conjugation from our notation. However, in view of the generalization to follow, where we have complex numbers instead of the $w_i$, we keep the bar for the moment, although it has no effect, being applied to real numbers. Thus we observe that for $z_i \in \mathbb{R}$, the Hessenberg matrix $H$ satisfies

$$H^H = (Q^H Z Q)^H = Q^H Z Q = H.$$

This means that $H$ is Hermitian and therefore tridiagonal. The matrix $H$ reduces to the classical Jacobi matrix

$$H = \begin{bmatrix} a_0 & b_1 & & \\ b_1 & a_1 & b_2 & \\ & b_2 & a_2 & \ddots\\ & & \ddots & \ddots\end{bmatrix}$$

containing the coefficients of the three-term recurrence relation

$$\varphi_{-1} = 0, \qquad z\varphi_k(z) = b_k\varphi_{k-1}(z) + a_k\varphi_k(z) + b_{k+1}\varphi_{k+1}(z), \qquad k = 0,1,\dots,m-1.$$

A similar situation occurs when the $z_i$ are purely imaginary, in which case the matrix $H$ is skew Hermitian. We do not discuss this case separately.

The algorithm we described before now needs to perform rotations (or reflections) on vectors of length 3 or 4, which reduces the complexity of the algorithm by an order of magnitude. This is the basis of the Rutishauser-Gragg-Harrod algorithm [14], [11]. See also [5].

In this context, it was observed only lately [9], [10], [2], [13], [3] that the situation where the $z_i \in \mathbb{T}$ (the unit circle) also leads to a simplification. It follows from

$$H^H H = Q^H Z^H Z Q = Q^H Q = I_{m+1}$$

that $H$ is then a unitary Hessenberg matrix. The related orthogonal polynomials are orthogonal with respect to a discrete measure supported on the unit circle. The three-term recurrence relation is replaced by a recurrence of Szegő type

$$\sigma_{k+1}\,\varphi_{k+1}(z) = z\,\varphi_k(z) + \gamma_{k+1}\,\tilde\varphi_k(z), \qquad \sigma_{k+1}\,\tilde\varphi_{k+1}(z) = \bar\gamma_{k+1}\,z\,\varphi_k(z) + \tilde\varphi_k(z),$$

with

$$\tilde\varphi_k(z) = z^k\,\overline{\varphi_k(1/\bar z)} \in \mathbb{P}_k \quad\text{and}\quad \sigma_k = \sqrt{1 - |\gamma_k|^2} > 0,$$

where the "1k are the so-called reflection coefficients or Schur parameters. Just like inthe case of a tridiagonal matrix, the Hessenberg matrix is built up from the recurrencecoefficients ’1k, ak. However, the connection is much more complicated. For example,for m 3, H has the form

$$H = \begin{bmatrix} -\gamma_1 & -\sigma_1\gamma_2 & -\sigma_1\sigma_2\gamma_3 & -\sigma_1\sigma_2\sigma_3\gamma_4\\ \sigma_1 & -\bar\gamma_1\gamma_2 & -\bar\gamma_1\sigma_2\gamma_3 & -\bar\gamma_1\sigma_2\sigma_3\gamma_4\\ & \sigma_2 & -\bar\gamma_2\gamma_3 & -\bar\gamma_2\sigma_3\gamma_4\\ & & \sigma_3 & -\bar\gamma_3\gamma_4\end{bmatrix}.$$

The Schur parameters can be recovered from the Hessenberg matrix by

$$\sigma_j = \eta_{jj}, \quad j = 1,\dots,m, \qquad \eta_{00} = 1/\varphi_0 = \sigma_0,$$

$$\gamma_j = -\frac{\eta_{0j}}{\sigma_1\cdots\sigma_{j-1}}, \qquad j = 1,\dots,m+1.$$

The complexity reduction in the algorithm is obtained from the important observation that any unitary Hessenberg matrix $H$ can be written as a product of elementary unitary factors

$$H = G_1 G_2 \cdots G_m G'_{m+1}$$

with

$$G_k = I_{k-1} \oplus \begin{bmatrix} -\gamma_k & \sigma_k\\ \sigma_k & \bar\gamma_k\end{bmatrix} \oplus I_{m-k}, \qquad k = 1,\dots,m,$$

and

$$G'_{m+1} = \operatorname{diag}(1,\dots,1,-\gamma_{m+1}).$$

This result can be found, e.g., in [9], [2]. Now an elementary similarity transformation on rows/columns $k$ and $k+1$ of $H$, represented in this factored form, will only affect the factor $G_k$ and part of the factors $G_{k-1}$ and $G_{k+1}$. Again, these operations require computations on short vectors of length 3, making the algorithm very efficient. For the details consult [9], [2], [13]. For example, the interpolation problem ($n = m$) is solved in $O(m^2)$ operations instead of $O(m^3)$.
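The parametrization is easy to check by multiplying out the factors. The sketch below (our own naming) builds $H$ from given Schur parameters, with $|\gamma_k| < 1$ for $k \le m$ and $|\gamma_{m+1}| = 1$, and verifies that the product is a unitary Hessenberg matrix:

```python
import numpy as np

def unitary_hessenberg(gamma):
    """H = G_1 ... G_m G'_{m+1} from Schur parameters gamma_1..gamma_{m+1}."""
    m = len(gamma) - 1
    H = np.eye(m + 1, dtype=complex)
    for k in range(1, m + 1):               # G_k acts on rows/columns k-1, k
        g = gamma[k - 1]
        s = np.sqrt(1.0 - abs(g)**2)        # sigma_k > 0
        G = np.eye(m + 1, dtype=complex)
        G[k - 1:k + 1, k - 1:k + 1] = [[-g, s], [s, np.conj(g)]]
        H = H @ G
    H[:, m] *= -gamma[m]                    # G'_{m+1} = diag(1, ..., 1, -gamma_{m+1})
    return H

gamma = [0.3 - 0.4j, 0.1j, -0.5, np.exp(0.7j)]   # |gamma_4| = 1
H = unitary_hessenberg(gamma)
print(np.allclose(H.conj().T @ H, np.eye(4)))    # True: unitary
print(np.allclose(np.tril(H, -2), 0))            # True: upper Hessenberg
print(np.isclose(H[0, 0], -gamma[0]))            # True: (0,0) entry is -gamma_1
```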

6. Vector approximants. The previous situation of polynomial approximation can be generalized as follows.

Given $\{z_i;\ f_{0i},\dots,f_{\alpha i};\ w_{0i},\dots,w_{\alpha i}\}_{i=0}^m$, find polynomials $p_k \in \mathbb{P}_{d_k}$, $k = 0,\dots,\alpha$, such that

$$\sum_{i=0}^m |w_{0i}f_{0i}\,p_0(z_i) + \cdots + w_{\alpha i}f_{\alpha i}\,p_\alpha(z_i)|^2$$

is minimized.

is minimized. Now it does not really matter whether the wi are positive or not, sincethe products wjifi will now play the role of the weights and the fji are arbitrary


Thus, to simplify the notation, we could as well write $w_{ji}$ instead of $w_{ji}f_{ji}$, since these numbers will always appear as products. Thus the problem is to minimize

$$\sum_{i=0}^m |w_{0i}\,p_0(z_i) + \cdots + w_{\alpha i}\,p_\alpha(z_i)|^2.$$

Setting $\mathbf d = (d_0,\dots,d_\alpha)$, $\mathbb{P}_{\mathbf d} = \mathbb{P}_{d_0} \times \cdots \times \mathbb{P}_{d_\alpha}$,

$$\mathbf w_i = [w_{0i},\dots,w_{\alpha i}], \qquad p(z) = [p_0(z),\dots,p_\alpha(z)]^T \in \mathbb{P}_{\mathbf d},$$

we can write this as

$$\min \sum_{i=0}^m |\mathbf w_i\, p(z_i)|^2, \qquad p \in \mathbb{P}_{\mathbf d}.$$

Of course, this problem has the trivial solution $p \equiv 0$, unless we require at least one of the $p_i(z)$ to be of strict degree $d_i$, e.g., by making it monic. This, or any other normalization condition, could be imposed for that matter.

In this paper we require that $p_\alpha$ is monic of degree $d_\alpha$, and rephrase this as $p \in \mathbb{P}_{\mathbf d}^M$, the superscript $M$ referring to this monic normalization.

To explain the general idea, we restrict ourselves to $\alpha = 1$, the case of a general $\alpha$ being a straightforward generalization, which would only increase the notational burden. Thus we consider the problem

$$\min \sum_{i=0}^m |w_{0i}\,p_0(z_i) + w_{1i}\,p_1(z_i)|^2, \qquad p_0 \in \mathbb{P}_{d_0},\ p_1 \in \mathbb{P}_{d_1}^M.$$

Note that when $w_{0i} = w_i > 0$, $w_{1i} = -w_i f_i$, and $p_1 \equiv 1 \in \mathbb{P}_0^M$ (i.e., $d_1 = 0$), then we get the polynomial approximation problem discussed before.

When we set $w_{0i} = w_i f_{0i}$ and $w_{1i} = -w_i f_{1i}$ with $w_i > 0$, the problem becomes

$$\min \sum_{i=0}^m w_i^2\, |f_{0i}\,p_0(z_i) - f_{1i}\,p_1(z_i)|^2,$$

which is a linearized version of the rational least squares problem of determining the rational approximant $p_0/p_1$ for the data $f_{1i}/f_{0i}$, or equivalently the rational approximant $p_1/p_0$ for the data $f_{0i}/f_{1i}$. Note that in the linearized form, it is as easy to prescribe pole information ($f_{0i} = 0$) as it is to fix a finite function value ($f_{0i} \neq 0$).
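For illustration, the linearized problem can also be posed directly as one dense least squares problem; the sketch below (our own, using the power basis and a generic solver rather than the fast algorithm of the following sections) makes $p_1$ monic by moving its leading term to the right-hand side.

```python
import numpy as np

def linearized_rational_ls(z, f0, f1, wts, d0, d1):
    """Minimize sum_i wts_i^2 |f0_i p0(z_i) - f1_i p1(z_i)|^2, p1 monic of
    degree d1; returns coefficients of p0 and p1 (lowest degree first)."""
    V0 = np.vander(z, d0 + 1, increasing=True)   # basis for p0
    V1 = np.vander(z, d1, increasing=True)       # non-leading part of p1
    A = np.hstack([f0[:, None] * V0, -f1[:, None] * V1])
    rhs = f1 * z**d1                             # monic leading term of p1
    c = np.linalg.lstsq(wts[:, None] * A, wts * rhs, rcond=None)[0]
    return c[:d0 + 1], np.append(c[d0 + 1:], 1.0)

# data f = f1/f0 with f0 = 1; the type (1, 2) approximant recovers it exactly
z = np.linspace(-1.0, 1.0, 30)
f0, f1 = np.ones_like(z), 1.0 / (z**2 + 0.5)
p0, p1 = linearized_rational_ls(z, f0, f1, np.ones_like(z), 1, 2)
r = np.polyval(p0[::-1], z) / np.polyval(p1[::-1], z)
print(np.max(np.abs(r - f1)))    # ~1e-15: p1 = z^2 + 0.5, p0 = 1
```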

The solution of the general case is partly parallel to the polynomial case $d_1 = 0$ discussed before, and partly parallel to another simple case, namely, $d_0 = d_1 = n$, which we discuss first in §7. In the subsequent §8, we consider the general case where $d_0 \neq d_1$.

7. Equal degrees. We consider the case $\alpha = 1$, $d_0 = d_1 = n$. This means that $\mathbb{P}_{\mathbf d}$ is here equal to $\mathbb{P}_n^{2\times1}$.

7.1. The optimization problem. We must find

$$\min \sum_{i=0}^m |\mathbf w_i\, p(z_i)|^2, \qquad p_0 \in \mathbb{P}_n,\ p_1 \in \mathbb{P}_n^M,$$

where $\mathbf w_i = [w_{0i}\ w_{1i}]$ and $p(z) = [p_0(z)\ p_1(z)]^T \in \mathbb{P}_n^{2\times1}$. This problem was considered in [15], [17]. We propose a solution of the form

$$p(z) = \sum_{k=0}^n \varphi_k(z)\, a_k,$$

where

$$\varphi_k(z) \in \mathbb{P}_k^{2\times2}, \quad a_k \in \mathbb{C}^{2\times1}, \qquad k = 0,1,\dots,n.$$

Proposing $p(z)$ to be of this form assumes that the leading coefficients of the block polynomials $\varphi_k$ are nonsingular. Otherwise this would not represent all possible couples of polynomials $(p_0, p_1) \in \mathbb{P}_n^{2\times1}$. We call this the regular case and assume for the moment that we are in this comfortable situation. In the singular case, a breakdown may occur during the algorithm, and we shall deal with that separately.

Note that the singular case did not show up in the previous scalar polynomial case, except at the very end when $n = m+1$, since the weights were assumed to be positive. We see below that in this block polynomial situation, the weights are not positive and could even be singular.

When we denote by $W = \operatorname{diag}(\mathbf w_0,\dots,\mathbf w_m)$ the block diagonal matrix of the weights, by $\Phi_n$ the block matrix with blocks $\varphi_j(z_i)$, and by $A_n = [a_0^T,\dots,a_n^T]^T$ the coefficient vector, the optimization problem is to find the least squares solution of the homogeneous linear system

$$W \Phi_n A_n = 0$$

with the constraint that $p_1$ should be monic of degree $n$.

For simplicity reasons, suppose that $m + 1 = 2(m'+1)$ is even. If it were not, we would have to make a modification in our formulations for the index $m'$. The algorithm, however, does not depend on $m$ being odd or even, as we shall see later.

By making the block polynomials $\varphi_k$ orthogonal so that

$$(4) \qquad \sum_{i=0}^m \varphi_k(z_i)^H\, \mathbf w_i^H \mathbf w_i\, \varphi_l(z_i) = \delta_{kl}\, I_2, \qquad k, l = 0,1,\dots,m',$$

we can construct a unitary matrix $Q \in \mathbb{C}^{(m+1)\times(m+1)}$ by setting

$$Q = W\Phi,$$

where $\Phi = \Phi_{m'}$ is a $(2m+2) \times (m+1)$ matrix, so that $Q$ is a square matrix of size $m+1$.

We also assume that the number of data points $m+1$ is at least equal to the number of unknowns $2n+1$ (recall that one coefficient is fixed by the monic normalization).


The unitarity of the matrix $Q$ means that

$$Q^H Q = \Phi^H W^H W \Phi = I_{m+1}$$

and the optimization problem reduces to

$$\min \sum_{i=0}^m p(z_i)^H\, \mathbf w_i^H \mathbf w_i\, p(z_i) = \min\, (W\Phi A_{m'})^H (W\Phi A_{m'}) = \min\, A_{m'}^H A_{m'} = \min \sum_{k=0}^{m'} \bigl(|a_{1k}|^2 + |a_{2k}|^2\bigr), \qquad a_k = [a_{1k}\ a_{2k}]^T,$$

with the constraint that $p(z) \in \mathbb{P}_n^{2\times1}$; thus $a_{n+1} = \cdots = a_{m'} = 0$, and $p_1 \in \mathbb{P}_n^M$. Since the leading term of $p$ is only influenced by $\varphi_n a_n$, we are free to choose $a_0,\dots,a_{n-1}$, so that we can set them equal to zero, to minimize the error. Thus it remains to find

$$\min\bigl(|a_{1n}|^2 + |a_{2n}|^2\bigr)$$

such that

$$p(z) = \varphi_n(z)\, a_n, \qquad p_1 \in \mathbb{P}_n^M.$$

To monitor the degree of $p_1$, we require that the polynomials $\varphi_k$ have an upper triangular leading coefficient

$$\varphi_k(z) = \begin{bmatrix} \alpha_k & \gamma_k\\ 0 & \beta_k\end{bmatrix} z^k + \cdots$$

with $\alpha_k, \beta_k > 0$. Note that this is always possible in the regular case. The condition $p_1 \in \mathbb{P}_n^M$ then sets $a_{2n} = 1/\beta_n$, and $a_{1n}$ is arbitrary, hence to be set equal to zero if we want to minimize the error.

As a conclusion, we have solved the approximation problem by computing the $n$th block polynomial $\varphi_n$, orthonormal in the sense of (4), and with leading coefficient upper triangular. The solution is

$$p(z) = \varphi_n(z) \begin{bmatrix} 0\\ a_{2n}\end{bmatrix}, \qquad a_{2n} = 1/\beta_n.$$

7.2. The algorithm. As in the scalar polynomial case, expressing $z\varphi_k(z)$ in terms of $\varphi_0,\dots,\varphi_{k+1}$ for $z \in \{z_0,\dots,z_m\}$ leads to the matrix relation

$$\mathcal Z \Phi = \Phi H,$$

where as before $\Phi = \Phi_{m'}$, $Z = \operatorname{diag}(z_0,\dots,z_m)$, $\mathcal Z = Z \otimes I_2 = \operatorname{diag}(z_0 I_2,\dots,z_m I_2)$, and $H$ is a block upper Hessenberg matrix with $2\times2$ blocks.


If the leading coefficient of $\varphi_k$ is upper triangular, then the subdiagonal blocks of $H$ are upper triangular. The computational scheme is compressed in the formula

$$Q^H\,[\,\mathbf w \mid Z\,Q\,] = \begin{bmatrix} \eta_{00} & \eta_{01} & \cdots & \eta_{0m'} & \eta_{0,m'+1}\\ & \eta_{11} & \cdots & \eta_{1m'} & \eta_{1,m'+1}\\ & & \ddots & \vdots & \vdots\\ & & & \eta_{m'm'} & \eta_{m',m'+1}\end{bmatrix},$$

where $\mathbf w = [\mathbf w_0^T,\dots,\mathbf w_m^T]^T$ and where all $\eta_{ij}$ are $2\times2$ blocks and the $\eta_{ii}$ are upper triangular with positive diagonal elements. Thus

$$\varphi_0 = \eta_{00}^{-1}; \qquad z\varphi_{k-1}(z) = \varphi_0(z)\,\eta_{0k} + \cdots + \varphi_k(z)\,\eta_{kk}, \qquad k = 1,\dots,m'.$$

The updating after adding the data $(z_{m+1}, \mathbf w_{m+1})$, where $\mathbf w_{m+1} = (w_{0,m+1}, w_{1,m+1})$, makes the transformation with unitary similarity transformations from the left to the right scheme below. The three crosses in the top row of the left scheme represent the new data.

[Left scheme: the previous block upper trapezoidal array bordered by the new data row on top. Right scheme: the same array restored to block upper trapezoidal form, with one extra row and column.]

The successive elementary transformations eliminate the crosses on the subdiagonal, chasing them down the matrix. This example also illustrates what happens at the end when $m$ is an even number: the last block polynomial then has a single column, $\varphi_{m'} \in \mathbb{P}_{m'}^{2\times1}$, instead of $\varphi_{m'} \in \mathbb{P}_{m'}^{2\times2}$. Again, when finishing this updating after $\varphi_n$ has been computed, it will require only $O(n^2)$ operations per data point introduced. In the special case of data on the real line or the unit circle, this reduces to $O(n)$ operations. For the details, we refer to [15], [17].

By the same arguments as in the scalar case, it is still true that $H$ is Hermitian, hence block tridiagonal, when all the $z_i$ are real. Taking into account that the subdiagonal blocks are upper triangular, we obtain in this case that $H$ is pentadiagonal, and the extended Hessenberg matrix has the form

$$\begin{bmatrix} B_0 & A_0 & B_1^H & & \\ & B_1 & A_1 & B_2^H & \\ & & B_2 & A_2 & \ddots\\ & & & \ddots & \ddots\end{bmatrix}$$

with the $2\times2$ blocks $B_k$ upper triangular and the $A_k$ Hermitian. This leads to the following block three-term recurrence ($\varphi_{-1} = 0$):

$$\varphi_0 = B_0^{-1}; \qquad z\varphi_k(z) = \varphi_{k-1}(z)\,B_k^H + \varphi_k(z)\,A_k + \varphi_{k+1}(z)\,B_{k+1}, \qquad 0 \le k < m'.$$

This case was considered in [15].
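Given the blocks $A_k$ and $B_k$ of the pentadiagonal matrix, the block polynomials can be evaluated by running this recurrence; a minimal sketch (our own, assuming the regular case so that every $B_{k+1}$ is invertible):

```python
import numpy as np

def eval_block_polys(z, A, B):
    """Values phi_0(z), ..., phi_n(z) (2x2 each) from the block three-term
    recurrence  z phi_k = phi_{k-1} B_k^H + phi_k A_k + phi_{k+1} B_{k+1},
    with phi_{-1} = 0 and phi_0 = B_0^{-1} (coefficients act from the right)."""
    n = len(A)                         # A = [A_0..A_{n-1}], B = [B_0..B_n]
    phi_prev = np.zeros((2, 2), dtype=complex)
    phi = np.linalg.inv(B[0])
    values = [phi]
    for k in range(n):
        rhs = z * phi - phi_prev @ B[k].conj().T - phi @ A[k]
        phi_prev, phi = phi, rhs @ np.linalg.inv(B[k + 1])
        values.append(phi)
    return values
```

In the scalar case ($1\times1$ blocks) this reduces to the classical three-term recurrence of §5.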


Similarly, the case where all $z_i$ lie on the unit circle $\mathbb{T}$ leads to a $2\times2$ block generalization of the corresponding polynomial case. For example, the extended unitary block Hessenberg matrix takes the form ($m' = 3$)

$$\begin{bmatrix} \sigma_0 & -\gamma_1 & -\tilde\sigma_1\gamma_2 & -\tilde\sigma_1\tilde\sigma_2\gamma_3 & -\tilde\sigma_1\tilde\sigma_2\tilde\sigma_3\gamma_4\\ & \sigma_1 & -\Gamma_1\gamma_2 & -\Gamma_1\tilde\sigma_2\gamma_3 & -\Gamma_1\tilde\sigma_2\tilde\sigma_3\gamma_4\\ & & \sigma_2 & -\Gamma_2\gamma_3 & -\Gamma_2\tilde\sigma_3\gamma_4\\ & & & \sigma_3 & -\Gamma_3\gamma_4\end{bmatrix}, \qquad \gamma_i,\ \sigma_i,\ \tilde\sigma_i,\ \Gamma_i \in \mathbb{C}^{2\times2}.$$

The matrices

$$U_k = \begin{bmatrix} -\gamma_k & \tilde\sigma_k\\ \sigma_k & \Gamma_k\end{bmatrix}$$

are unitary: $U_k^H U_k = I_4$. Note that by allowing some asymmetry in the $U_k$ we do not need a $-\gamma_4$ in the last column as we had in the scalar case. We have, for $k = 1,\dots,m'$,

the block Szegő recurrence relations

$$\varphi_k(z)\,\sigma_k = z\,\varphi_{k-1}(z) + \tilde\varphi_{k-1}(z)\,\gamma_k, \qquad \tilde\varphi_k(z) = \tilde\varphi_{k-1}(z)\,\tilde\sigma_k + \varphi_k(z)\,\Gamma_k,$$

which start with $\varphi_0 = \tilde\varphi_0 = \sigma_0^{-1}$.

The block Hessenberg matrix can again be factored as

$$H = G_1 G_2 \cdots G_{m'}$$

with

$$G_k = I_{2(k-1)} \oplus U_k \oplus I_{m-2k-1}, \qquad k = 1,\dots,m'.$$

The proof of this can be found in [17]. This makes it possible to perform the elementary unitary similarity transformations of the updating procedure only on vectors of maximal length 5, very much like in the case of real points $z_i$. Thus also here, the complexity of the algorithm reduces to $O(m^2)$ for interpolation. More details can be found in [17]. For the case of the real line, the algorithm was also discussed in [1], solving an open problem in [5, p. 615]. The previous procedure now solves the problem also for the case of the unit circle.

7.3. Summary. The case $\alpha = 1$, $d_0 = d_1 = n$, and also the case $\alpha \ge 1$, $d_0 = \cdots = d_\alpha = n$ for that matter, generalizes the polynomial approximation problem by constructing orthonormal polynomials $\varphi_k$ which are $(\alpha+1)\times(\alpha+1)$ polynomial matrices, and these are generated by a block three-term recurrence relation when all $z_i \in \mathbb{R}$ and by a block Szegő recurrence relation when all $z_i \in \mathbb{T}$.

The computational algorithm is basically the same, since it reduces the extended matrix

$$[\,\mathbf w \mid Z\,] \in \mathbb{C}^{(m+1)\times(m+\alpha+2)}$$

by a sequence of elementary unitary similarity transformations to an upper trapezoidal matrix

$$Q^H\,[\,\mathbf w \mid Z\,Q\,] = [\,H_0 \mid H\,]$$

with $H$ block upper Hessenberg with $(\alpha+1)\times(\alpha+1)$ blocks and

$$H_0 = Q^H \mathbf w = [\eta_{00}^T\ 0\ \cdots\ 0]^T,$$

where $\eta_{00} \in \mathbb{C}^{(\alpha+1)\times(\alpha+1)}$ is upper triangular with positive diagonal elements, as well as all the subdiagonal blocks of $H$. For $n = m'$, where $(\alpha+1)(m'+1) - l = m+1$ with $0 \le l \le \alpha$ (which implies that the last block polynomial $\varphi_{m'}$ may have fewer than $\alpha+1$ columns), we solve an interpolation problem. It requires $O(m^2)$ operations when $z_i \in \mathbb{R}$ or $\mathbb{T}$, instead of $O(m^3)$ when the $z_i$ are arbitrary in $\mathbb{C}$.

8. Arbitrary degrees. In this section we consider the case $\alpha = 1$ with $d_0 \neq d_1$. For more details we refer to [16].

8.1. The problem. We suppose without loss of generality that $d_0 = \delta$ and $d_1 = n + \delta$, $n, \delta \ge 0$. We must find once more

$$\min \sum_{i=0}^m |\mathbf w_i\, p(z_i)|^2, \qquad p_0 \in \mathbb{P}_\delta,\ p_1 \in \mathbb{P}_{n+\delta}^M,$$

with $\mathbf w_i = [w_{0i}\ w_{1i}]$ and $p(z) = [p_0(z)\ p_1(z)]^T \in \mathbb{P}_{\mathbf d}$, $\mathbf d = (d_0, d_1)$.

The polynomial approximation problem is recovered by setting $\delta = 0$. The case

$d_0 = d_1 = \delta$ is recovered by setting $n = 0$.

The simplest approach to the general problem is by starting with the algorithm. In the subsequent subsections, we propose a computational scheme involving unitary similarity transformations, next we give an interpretation in terms of orthogonal polynomials, and finally we solve the approximation problem.

8.2. The algorithm. Comparing the cases $\delta = 0$ and $n = 0$, we see that the algorithm applies a sequence of elementary unitary similarity transformations on an extended matrix

$$[\,\mathbf w \mid Z\,], \qquad \mathbf w = [\mathbf w_0^T,\dots,\mathbf w_m^T]^T, \quad Z = \operatorname{diag}(z_0,\dots,z_m),$$

to bring it in the form of an extended (block) upper Hessenberg matrix

$$Q^H\,[\,\mathbf w \mid Z\,Q\,] = [\,H_0 \mid H\,].$$

When $n = 0$, the transformations were aimed at chasing down the elements of $[\mathbf w \mid Z]$ below the main diagonal, making $[H_0 \mid H]$ upper triangular. Therefore $H$ turned out to be block upper Hessenberg.

When $\delta = 0$, the transformations had much the same objective, but now there was no attempt to eliminate elements from the first column of $\mathbf w$; only elements from the second column were pushed to the SE part of the matrix. The matrix $H$ then turned out to be upper Hessenberg in a scalar sense.

The general case can be treated by an algorithm that combines both of these objectives. We start as in the polynomial case ($\delta = 0$), chasing only elements from the second column of $\mathbf w$. However, once we reach row $n+1$, we start eliminating elements in the first column, too.


Applying this procedure shows that the extended Hessenberg matrix $[H_0 \mid H]$ has the form

[Scheme: the nonzero pattern of $[H_0 \mid H]$; the leading rows and columns form a scalar upper Hessenberg part, the trailing ones a $2\times2$-block upper Hessenberg part.]

This means that the NW part of $H$, of size $(n+1)\times(n+1)$, will be scalar upper Hessenberg as in the case $\delta = 0$, while the SE part, of size $(m-n+2)\times(m-n+2)$, has the block upper Hessenberg form of the case $n = 0$.

The updating procedure works as follows. Starting with the scheme below (the new data are found in the first row),

[Scheme: the previous array bordered by the new data row on top; the entry $\otimes$ marks the element to be eliminated first.]

the element $\otimes$ is chased down the diagonal by elementary unitary similarity transformations operating on two successive rows/columns until we reach the following scheme (where $\odot = 0$, and $\otimes$ and $\ast$ are the last elements introduced, which are in general nonzero):

[Scheme: the array after the chase through the scalar part; $\otimes$ sits in row $n+1$ of the first column.]

Now the element $\otimes$ in row $n+1$ is eliminated by a rotation/reflection in the plane of this row and the previous one. The corresponding transformation on the columns will introduce a nonzero element at position $\odot$. Then $\odot$ and $\otimes$ are chased down the diagonal in the usual way until we reach the final situation:

[Scheme: the array restored to the form of the first scheme, with one extra row and one extra column.]

8.3. Orthogonal vector polynomials. The unitary matrix $Q$ involved in the previous transformation was, for the case $\delta = 0$, of the form $Q = W\Phi_m$, where $W$ was a scalar diagonal matrix of the weights and $\Phi_m$ was the matrix with $ij$-element given by $\varphi_j(z_i)$, with $\varphi_j$ the $j$th orthonormal polynomial.

When $n = 0$, then $Q = W\Phi_{m'}$, where $W$ is block diagonal with blocks the $1\times2$ "weights" $\mathbf w_i$, and $\Phi_{m'}$ is the block matrix with $2\times2$ blocks, where the $ij$-block is given by $\varphi_j(z_i)$, with $\varphi_j$ the $j$th block orthonormal polynomial.

For the general case, we have a mixture of both. For the NW part of the $H$ matrix, we have the scalar situation, and for the SE part we have the block situation.

To unify both situations, we turn to vector polynomials $\pi_k$ of size $2\times1$. For the block part, we see a block polynomial $\varphi_j$ as a collection of two columns, each column being a vector polynomial,

$$\varphi_j(z) = [\,\pi_{2j-1}(z) \mid \pi_{2j}(z)\,].$$

For the scalar part, we embed the scalar polynomial $\varphi_j$ in a vector polynomial $\pi_j$ by setting

$$\pi_j(z) = [\,0\ \ \varphi_j(z)\,]^T.$$

In both cases, the orthogonality of the $\varphi_j$ translates into the orthogonality relation

$$\sum_{i=0}^m \pi_k(z_i)^H\, \mathbf w_i^H \mathbf w_i\, \pi_l(z_i) = \delta_{kl}$$

for the vector polynomials $\pi_k$. Let us apply this to the situation of the previous algorithm. For simplicity, we suppose that all $z_i \in \mathbb{R}$. For $z_i \in \mathbb{T}$, the situation is similar.

For column number $j = 0,1,\dots,n-1$, we are in the situation of scalar orthogonal polynomials: $Q_{ij} = w_{1i}\,\varphi_j(z_i) = \mathbf w_i\,\pi_j(z_i)$. Setting,

for the leading (scalar) part of the extended Hessenberg matrix,

$$\begin{bmatrix} \times & b_0 & a_0 & b_1 & & \\ \times & & b_1 & a_1 & b_2 & \\ \vdots & & & b_2 & a_2 & \ddots\\ & & & & \ddots & \ddots\end{bmatrix},$$


we have, for $j = 0,\dots,n-2$, the three-term recurrence relation

$$z\varphi_j(z) = \varphi_{j-1}(z)\,b_j + \varphi_j(z)\,a_j + \varphi_{j+1}(z)\,b_{j+1}.$$

By embedding, this becomes

$$z\pi_j(z) = \pi_{j-1}(z)\,b_j + \pi_j(z)\,a_j + \pi_{j+1}(z)\,b_{j+1}, \qquad \pi_{-1} = [0\ \ 0]^T.$$

Thus, setting

$$\Pi_j = [\pi_j(z_0)^T,\dots,\pi_j(z_m)^T]^T,$$

we have for the columns $Q_j$ of $Q$ the equality

$$Q_j = W\Pi_j, \qquad j = 0,1,\dots,n-1.$$

For the trailing part of $Q$, i.e., for the columns $(n+2j-1, n+2j)$, $j = 0,1,\dots$, we are in the block polynomial case. The block polynomials $\varphi_j(z)$ group two vector polynomials,

$$\varphi_j(z) = [\,\pi_{n+2j-1}(z) \mid \pi_{n+2j}(z)\,],$$

which correspond to two columns of $Q$, namely,

$$\hat Q_j = [\,Q_{n+2j-1} \mid Q_{n+2j}\,].$$

Observe that we have the following relation between $\hat Q_j$ and the block orthogonal polynomials: the $i$th row of $\hat Q_j$ is

$$[\,Q_{i,n+2j-1}\ \ Q_{i,n+2j}\,] = \mathbf w_i\,\varphi_j(z_i),$$

where this time $\mathbf w_i = [w_{0i}\ w_{1i}]$. As above, denote the vector of function values for $\pi_j$ by $\Pi_j$. The block column of function values for $\varphi_j$ is denoted by $\hat\Phi_j$. Then clearly

$$\hat Q_j = W\hat\Phi_j.$$

Denoting the blocks in the extended Hessenberg matrix by

$$[\,H_0 \mid H\,] = \begin{bmatrix} \times & b_0 & \cdots & & & \\ \vdots & & \ddots & & & \\ & & & A_0 & B_1^H & \\ & & & B_1 & A_1 & \ddots\\ & & & & \ddots & \ddots\end{bmatrix},$$

with $2\times2$ blocks $A_j$ and $B_j$ in the trailing part, we have the block recurrence

$$z\varphi_j(z) = \varphi_{j-1}(z)\,B_j^H + \varphi_j(z)\,A_j + \varphi_{j+1}(z)\,B_{j+1}, \qquad j = 0,1,\dots.$$

The missing link between the scalar and the block part is the initial condition for this block recurrence. This is related to columns $n-2$, $n-1$, and $n$ of $Q$. Because columns $n-2$ and $n-1$ are generated by the scalar recurrence, we know that these columns are $Q_j = W\Pi_j$, $j = n-2, n-1$, where the $\Pi_j$ are related to the embedded scalar polynomials. A problem appears in column $Q_n$, where the three-term recurrence of the leading (scalar) part migrates to the block three-term recurrence of the trailing (block) part, i.e., from a three-term to a five-term scalar recurrence. We look at this column in greater detail.

"do Woo

0

0 WOm

Q,HWE, E

we have

Qodo +’" + Q,d, WE;

thus

1(WE [Q01 IQn-1]A-I) [do, d,-]TAn_ ..,

1(WE W[nol Ir,_] *A_I)

d

(E-[nol in- ]A*=W _)

W(E P_), P,_ [Hoi... IH,- A*d.

x] -x"

Setting Qn WHn, Hn [Trn(zo)T,... ,Trn(zm)T]T, we find that

111],(z)= p,_(z)

where

Pn- (Z) oo(z)do +"" + on-i (z)dn-

is the polynomial least squares approximant of degree n- 1 for the data (zi, wi),i=O,...,m.

8.4. Solution of the general problem. Now we are ready to solve the general problem. We start with the degree structure of the polynomials $\pi_j(z)$. Suppose the $j$th column of $Q$ is $Q_j$, which we write as

$$Q_j = W\Pi_j, \qquad \Pi_j = [\pi_j(z_0)^T,\dots,\pi_j(z_m)^T]^T,$$

with $W = \operatorname{diag}(\mathbf w_0,\dots,\mathbf w_m)$ and $\pi_j(z) = [\psi_j(z)\ \ \phi_j(z)]^T$. Then it follows from the previous analysis that the $\phi_j$ are the scalar orthogonal polynomials $\varphi_j$, and hence the degree of $\phi_j(z)$ is $j$, for $j = 0,1,\dots,n-1$. Moreover, the $\psi_j$ are zero for the same indices (their degree is $-\infty$). For $j = n$, we just found that $\psi_n$ is $1/d_n$, thus of degree 0, and $\phi_n$ is of degree at most $n-1$, since the latter is proportional to the polynomial least squares approximant of that degree.

DISCRETE LEAST SQUARES 881

With the block recurrence relation, we now easily find the degree structure of the block polynomials:

$$\deg \varphi_j = \begin{bmatrix} j-1 & j\\ n+j-1 & n+j-1\end{bmatrix}, \qquad j = 1, 2, \dots,$$

while $\varphi_0$ has degree structure

$$\begin{bmatrix} -\infty & 0\\ n-1 & n-1\end{bmatrix}.$$

It can be checked that in the regular case, that is, when all the subdiagonal elements $b_0,\dots,b_{n-1}$ as well as $d_n$ are nonzero, and when also all the subdiagonal blocks $B_1, B_2, \dots$ are regular (upper triangular), the degrees of $\phi_k = \varphi_k$ are precisely $k$ for $k = 0,1,\dots,n-1$, and in the block polynomials $\varphi_j$, the entries $\psi_{n+2j}$ and $\phi_{n+2j-1}$ have the precise degrees that are indicated, i.e., $j$ and $n+j-1$, respectively. Thus, if we propose a solution to our approximation problem of the form (suppose $m \ge n + 2\delta$)

$$p(z) = \sum_{j=0}^{n+2\delta+1} \pi_j(z)\, a_j,$$

then $p(z) = [p_0(z)\ p_1(z)]^T$ will automatically satisfy the degree restrictions $d_0 \le \delta$ and $d_1 \le n+\delta$. We must find

$$\min\, (\Pi_{n'}A_{n'})^H\, W^H W\, (\Pi_{n'}A_{n'}), \qquad n' = n + 2\delta + 1,$$

where

$$A_{n'} = [a_0,\dots,a_{n'}]^T \quad\text{and}\quad \Pi_{n'} = [\,\Pi_0 \mid \cdots \mid \Pi_{n'}\,].$$

Since $W\Pi_{n'}$ forms the first $n'+1$ columns of the unitary matrix $Q$, this reduces to

$$\min\, (A_{n'})^H (A_{n'}) = \min \sum_{j=0}^{n'} |a_j|^2.$$

If we require as before that $p_1(z)$ is monic of degree $n+\delta$, then $a_{n'} = 1/\kappa$, where $\kappa$ is the leading coefficient of $\phi_{n'}$. The remaining $a_j$ are arbitrary. Hence, to minimize the error, we should make them all zero. Thus our solution is given by

$$p(z) = \pi_{n'}(z)/\kappa.$$

9. The singular case. Let us start by considering the singular case for $d_0 = d_1 = n$. We then generate a singular subdiagonal block $\eta_{kk}$ of the Hessenberg matrix. The algorithm performing the unitary similarity transformations will not be harmed by this situation. However, the sequence of block orthogonal polynomials will break down. From the relation

$$z\varphi_{k-1}(z) = \varphi_0(z)\,\eta_{0k} + \cdots + \varphi_k(z)\,\eta_{kk}$$

it follows that if $\eta_{kk}$ is singular, then this cannot be solved for $\varphi_k(z)$. In the regular case, all the $\eta_{jj}$ are regular, and then the leading coefficient of $\varphi_k$ is $\eta_{00}^{-1}\eta_{11}^{-1}\cdots\eta_{kk}^{-1}$. Thus, if all the $\eta_{jj}$ are regular upper triangular, then also the leading coefficient of $\varphi_k$ will be regular upper triangular. As we said in the Introduction, the singular situation will always occur, even in the scalar polynomial case with positive weights, but only at the very end where $k = m+1$. That is exactly the stage where we reach the situation where the least squares solution becomes the solution of an interpolation problem. We show below that this is precisely what also happens when some premature breakdown occurs.

Suppose that the scalar entries of the extended block Hessenberg matrix are $[H_0 \mid H] = [h_{ij}]$. (We use $h_{ij}$ to distinguish them from the block entries $\eta_{ij}$.) Suppose that the element $h_{kk}$ is the first element on its diagonal that becomes zero and thus produces some singular subdiagonal block in $H$. Then it is no problem to construct the successive scalar columns of the matrix $\Pi$ until the recurrence relation hits the zero entry $h_{kk}$. If we denote, for $j = 0,1,\dots,k-1$, the $j$th column of $\Pi$ as $\Pi_j$, then we know from what we have seen that $\Pi_j$ represents the vector of function values at the nodes $z_0,\dots,z_m$ of some vector polynomial $\pi_j(z) \in \mathbb{P}^{2\times1}$. The problem in the singular case is that $\pi_k(z)$ cannot be solved from the recurrence because $h_{kk} = 0$. However, from

$$Q^H\,[\,\mathbf w \mid Z\,Q\,] = [\,H_0 \mid H\,]$$

it follows that

$$w_0 = Q_0 h_{00}, \qquad w_1 = Q_0 h_{01} + Q_1 h_{11},$$

and, for $k \ge 2$,

$$ZQ_{k-2} = Q_0 h_{0k} + \cdots + Q_k h_{kk},$$

where $Q_j$, $j = 0,1,\dots$, denotes the $j$th column of $Q$, and $w_0$, $w_1$ the columns of $\mathbf w$. We shall discuss the case $h_{kk} = 0$ for $k = 0$, $k = 1$, and $k \ge 2$ separately.

If $h_{00} = 0$, then $w_0 = 0$. This is a very unlikely situation because then there is only a trivial solution $(p_0, p_1) = (1, 0)$, which fits exactly.

Next consider $h_{11} = 0$; then define $\tilde\pi_1$ as

$$\tilde\pi_1(z) = [\,0\ \ 1\,]^T - \pi_0(z)\,h_{01},$$

for which

$$W\tilde\Pi_1 = w_1 - W\Pi_0 h_{01} = w_1 - Q_0 h_{01} = Q_1 h_{11} = 0.$$

This means that we get an exact approximation, since $\mathbf w_i\,\tilde\pi_1(z_i) = 0$, $i = 0,\dots,m$.

For the general case $h_{kk} = 0$, $k \ge 2$, we have that

$$ZQ_{k-2} = Q_0 h_{0k} + \cdots + Q_{k-1} h_{k-1,k}.$$

Since $Q_j = W\Pi_j$ for $j = 0,\dots,k-1$, we also have

$$(6) \qquad 0 = ZW\Pi_{k-2} - W\Pi_0 h_{0k} - \cdots - W\Pi_{k-1} h_{k-1,k} = W\bigl(\mathcal Z\,\Pi_{k-2} - \Pi_0 h_{0k} - \cdots - \Pi_{k-1} h_{k-1,k}\bigr),$$

where $\mathcal Z = Z \otimes I_2$. Define the polynomial

$$\tilde\pi_k(z) = z\,\pi_{k-2}(z) - \pi_0(z)\,h_{0k} - \cdots - \pi_{k-1}(z)\,h_{k-1,k};$$

then $W\tilde\Pi_k = W[\tilde\pi_k(z_0)^T,\dots,\tilde\pi_k(z_m)^T]^T$ will be zero, since it is equal to the expression (6), which is zero. This means that

$$\mathbf w_i\,\tilde\pi_k(z_i) = 0, \qquad i = 0,\dots,m.$$

The latter relations just tell us that this $\tilde\pi_k$ is an exact solution of the approximation problem, i.e., it interpolates.

In the general situation where $d_0 \neq d_1$, we must distinguish between the scalar and the block part. For the scalar part we can now also have a breakdown in the sequence of orthogonal polynomials, since the weights are not positive anymore, but arbitrary complex numbers.

Using the notation

$$\begin{bmatrix} h_{00} & h_{01} & \cdots & h_{0n} & h_{0,n+1}\\ & h_{11} & \cdots & h_{1n} & h_{1,n+1}\\ & & \ddots & \vdots & \vdots\\ & & & h_{nn} & h_{n,n+1}\end{bmatrix}$$

for the NW part of the extended Hessenberg matrix, the situation is as sketched above: whenever some $h_{kk}$ is zero, we will have an interpolating polynomial solution. It then holds that

$$\tilde\pi_k(z) = z\,\pi_{k-1}(z) - \pi_0(z)\,h_{0k} - \cdots - \pi_{k-1}(z)\,h_{k-1,k},$$

and because $W\tilde\Pi_k = W[\tilde\pi_k(z_0)^T,\dots,\tilde\pi_k(z_m)^T]^T$ is zero, we get

$$\mathbf w_i\,\tilde\pi_k(z_i) = 0, \qquad i = 0,\dots,m,$$

identifying $\tilde\pi_k(z)$ as a (polynomial) interpolant.

identifying r(z) as a (polynomial) interpolant.For the SE part, i.e., for the block polynomial part, a zero on the subsubdiagonal

(i.e., when we get a singular subdiagonal block in the Hessenberg matrix), will implyinterpolation as we explained above for the block case.

The remaining problem is the case where the bottom element in the first column of the transformed extended Hessenberg matrix becomes zero. That is the element that has been previously denoted by $d_n$. Indeed, if this is zero, then our derivation, which gave (5),

$$\pi_n(z) = \frac{1}{d_n}\begin{bmatrix} 1\\ -p_{n-1}(z)\end{bmatrix},$$

does not hold anymore. But again, here we will have interpolation, i.e., a least squares error equal to zero. It follows from the derivation in the previous section that when $d_n = 0$,

$$W(E - P_{n-1}) = d_n Q_n = 0.$$

Thus

$$\mathbf w_i\left(\begin{bmatrix}1\\0\end{bmatrix} - \sum_{k=0}^{n-1}\pi_k(z_i)\,d_k\right) = \mathbf w_i\begin{bmatrix}1\\ -p_{n-1}(z_i)\end{bmatrix} = 0, \qquad i = 0,\dots,m,$$


where $p_{n-1}(z) = \sum_{k=0}^{n-1} \varphi_k(z)\,d_k$. This is the same as

$$w_{0i} - w_{1i}\,p_{n-1}(z_i) = 0, \qquad i = 0,\dots,m,$$

which means that $(1, p_{n-1}(z))/d'$, with $d' \neq 0$ chosen to normalize $p_{n-1}(z)$ as a monic polynomial, fits the data exactly.

10. Conclusion. We have shown that the inverse QR algorithm for solving discrete polynomial approximation problems for knots on the real line or on the unit circle can be generalized to more general approximation problems of the form (2).

In the previous sections, we only considered the problem of updating, i.e., how to adapt the approximant when one knot is added to the set of data. There also exists a possibility to consider downdating, i.e., when one knot is removed from the set of interpolation points. For the polynomial approximation problem, this was discussed in [6] for real data and in [3] for data on the unit circle. The procedure can be based on a direct QR algorithm which will "diagonalize" the Hessenberg matrix in one row and column (e.g., the last one). This means that the only nonzero element in the last row and the last column of the transformed Hessenberg matrix is $z_m$ on the diagonal. The unitary similarity transformations on the rest of the extended Hessenberg matrix bring out the corresponding weight in its first columns, and the leading $m \times m$ part gives the solution for the downdated problem. Of course, just as the updating procedure can be generalized, the downdating procedure can also be adapted to our general situation. A combination of downdating and updating provides a tool for least squares approximation with a sliding window, i.e., where a window slides over the data, letting new data enter and simultaneously forgetting about the oldest data.

The inverse QR algorithm that we described in the previous sections is, in principle, applicable in the situation of arbitrary complex data. However, its complexity can be reduced by an order of magnitude if the knots are real or located on the unit circle. The secret of this complexity reduction is the exploitation of a recurrence relation for the corresponding orthogonal polynomials and the parametrization of the Hessenberg matrix involved in terms of the recurrence coefficients.

The polynomial discrete least squares approximation problem discussed in the papers where the algorithm was first conceived gave rise to the construction of a sequence of polynomials orthogonal with respect to a discrete inner product. In the more general problem, these generalize to orthogonal block polynomials when all the degrees of the approximating polynomials are equal, or, in the more general case of arbitrary degrees, both scalar and block orthogonal polynomials appear that can be uniformly treated as vector orthogonal polynomials.

The algorithm has been reported to have excellent numerical stability properties [11], [12] and is preferred over the so-called Stieltjes procedure [12]. See also [7], [8]. Moreover, it is well suited for implementation in a pipeline fashion on a parallel architecture [17], [15].

REFERENCES

[1] G. AMMAR AND W. GRAGG, $O(n^2)$ reduction algorithms for the construction of a band matrix from spectral data, SIAM J. Matrix Anal. Appl., 12 (1991), pp. 426-431.

[2] G. AMMAR, W. GRAGG, AND L. REICHEL, Constructing a unitary Hessenberg matrix from spectral data, in Numerical Linear Algebra, Digital Signal Processing and Parallel Algorithms, G. Golub and P. Van Dooren, eds., Vol. 70, NATO-ASI Series F: Computer and Systems Sciences, Springer-Verlag, Berlin, 1991, pp. 385-395.


[3] G. AMMAR, W. GRAGG, AND L. REICHEL, Downdating of Szegő polynomials and data-fitting applications, Linear Algebra Appl., 172 (1992), pp. 315-336.

[4] G. AMMAR AND C. HE, On an inverse eigenvalue problem for unitary Hessenberg matrices, Linear Algebra Appl., 1994, to appear.

[5] D. BOLEY AND G. GOLUB, A survey of matrix inverse eigenvalue problems, in Inverse Problems, Vol. 3, Physics Trust Publications, Bristol, England, 1987, pp. 595-622.

[6] S. ELHAY, G. GOLUB, AND J. KAUTSKY, Updating and downdating of orthogonal polynomials with data fitting applications, SIAM J. Matrix Anal. Appl., 12 (1991), pp. 327-353.

[7] W. GAUTSCHI, On generating orthogonal polynomials, SIAM J. Sci. Statist. Comput., 3 (1982), pp. 289-317.

[8] W. GAUTSCHI, Computational problems and applications of orthogonal polynomials, in Orthogonal Polynomials and Their Applications, C. Brezinski, L. Gori, and A. Ronveaux, eds., Vol. 9, IMACS Annals on Computing and Applied Mathematics, 1991, pp. 61-71.

[9] W. GRAGG, The QR algorithm for unitary Hessenberg matrices, J. Comput. Appl. Math., 16 (1986), pp. 1-8.

[10] W. GRAGG AND L. REICHEL, A divide and conquer method for unitary and orthogonal eigenproblems, Numer. Math., 57 (1990), pp. 695-718.

[11] W. B. GRAGG AND W. J. HARROD, The numerically stable reconstruction of Jacobi matrices from spectral data, Numer. Math., 44 (1984), pp. 317-335.

[12] L. REICHEL, Fast QR decomposition of Vandermonde-like matrices and polynomial least squares approximation, SIAM J. Matrix Anal. Appl., 12 (1991), pp. 552-564.

[13] L. REICHEL, G. AMMAR, AND W. GRAGG, Discrete least squares approximation by trigonometric polynomials, Math. Comp., 57 (1991), pp. 273-289.

[14] H. RUTISHAUSER, On Jacobi rotation patterns, in Proceedings of Symposia in Applied Mathematics, Vol. 15, Experimental Arithmetic, High Speed Computing and Mathematics, Amer. Math. Soc., Providence, 1963, pp. 219-239.

[15] M. VAN BAREL AND A. BULTHEEL, A parallel algorithm for discrete least squares rational approximation, Numer. Math., 63 (1992), pp. 99-121.

[16] M. VAN BAREL AND A. BULTHEEL, Discrete least squares approximation with polynomial vectors, Tech. Report TW190, Department of Computer Science, Katholieke Universiteit Leuven, May 1993.

[17] M. VAN BAREL AND A. BULTHEEL, Discrete linearized least squares rational approximation on the unit circle, J. Comput. Appl. Math., 50 (1994), pp. 545-563.