
Linear Algebra and its Applications 327 (2001) 131–149 www.elsevier.com/locate/laa

Approximating commuting operators

John Holbrook a, Matjaž Omladič b,∗

a Department of Mathematics and Statistics, University of Guelph, Guelph, Canada

b Department of Mathematics, Faculty of Mathematics and Physics, University of Ljubljana, Jadranska 19, Sl-1000 Ljubljana, Slovenia

Received 24 April 2000; accepted 16 October 2000

Submitted by R. Guralnick

Abstract

The problem of approximating m-tuples of commuting n × n complex matrices by commuting m-tuples of generic matrices is studied. We narrow the gap for commuting triples by showing that they can be perturbed if n < 6 and that they are not always perturbable if n > 29. © 2001 Elsevier Science Inc. All rights reserved.

AMS classification: 15A27; 15A30

Keywords: m-Tuples of commuting matrices; Approximation by generic matrices

1. Introduction

We focus here on the problem of approximating m-tuples of commuting n × n matrices (over C) by commuting m-tuples of generic matrices, i.e. matrices with distinct eigenvalues. Originally we were motivated by our interest in matrix inequalities such as the multivariate von Neumann inequality (see, e.g., [8,9,11]) and multiplicative inequalities for the numerical radius (see, e.g., [3,7,15]). In the study of such inequalities, it is natural to try to replace the commuting matrices by commuting generics in order to take advantage of additional structure, such as simultaneous diagonal forms or polynomial expressions in a single generator. Later (with the timely advice of Laffey) we learned something of the long history of commuting matrix

∗ Corresponding author. E-mail address: [email protected] (J. Holbrook), [email protected] (M. Omladič).

0024-3795/01/$ - see front matter © 2001 Elsevier Science Inc. All rights reserved. PII: S0024-3795(00)00286-X


m-tuples in the context of algebraic geometry (see, e.g., [4]). Particularly important for us are the techniques of Guralnick [5], by means of which he shows that certain commuting triples (m = 3) of large dimension (e.g., n = 32) cannot be generically perturbed. One of the classic papers on pairs of commuting matrices is [16], where Neubauer and Saltman characterize the commuting pairs of n × n matrices which generate a commutative algebra of dimension exactly n.

The problem of generically perturbing m-tuples of (commuting) n × n matrices is, as we shall explain in Section 2, rather well understood except for the case of triples (m = 3). The present work attempts to clarify this remaining case. We show that commuting triples can always be perturbed (i.e. arbitrarily well approximated by commuting generic triples) if n < 6 and in certain other cases. On the other hand, we show that "imperturbable triples" can occur for n = 30; to see this we modify some of the ideas in [5] (where the cases n ≥ 32 are treated). Our techniques also give partial information about other values of n. It was brought to our attention that recently Guralnick and Sethuraman [6] gave the answer to this problem for the case n = 4 for arbitrary characteristic. They use a recent result of Neubauer and Sethuraman given in a related paper [17].

Note that we work with the complex numbers C as our field throughout the paper. This is natural because of our interest in perturbation of matrices with respect to the usual matrix norms. However, the question has a natural extension to (algebraically closed) fields of arbitrary characteristic if we replace the usual complex topology by the Zariski topology. Although many of our proofs depend on the methods of functional analysis, so that they do not extend to arbitrary characteristic, the results may turn out to be valid more generally. On the other hand, some of the proofs are obviously valid in arbitrary algebraically closed fields (e.g. the proof of Proposition 4.1).

The overall plan of this paper is as follows. In Section 2 we review what was previously known for various values of n and m, introducing some notation and preliminary ideas. In Section 3 we establish the existence of imperturbable triples of dimension n = 30, and relate this to the reducibility of the variety C(3, 30) (cf. [4]). In Section 4 we develop an array of perturbation techniques that are effective for smaller values of n. In Section 5 we apply these results to describe perturbation algorithms for triples of dimension n < 6, and for larger values of n under certain additional conditions. In Section 6 we comment briefly on the application of our results to multivariate von Neumann inequalities.

2. The variety C(m, n)

Given natural numbers m, n, let C(m, n) denote the set of all m-tuples (A1, . . . , Am) of n × n complex matrices Ak (i.e. Ak ∈ Mn(C)), where the matrices within each m-tuple commute:

AkAj = AjAk (k, j = 1, . . . , m).


This set may be viewed as a variety in C^{mn²} defined by (m(m − 1)/2)n² quadratic equations relating the entries of the matrices. It is not always easy to determine whether this variety is irreducible or to compute its dimension; Gerstenhaber [4] poses these problems explicitly. Here our main concern has a more "analytic" or "metric" character. Let G(m, n) denote the subset of C(m, n) consisting of those m-tuples such that each Ak is "generic", i.e. has n distinct eigenvalues. Note that an m-tuple (A1, . . . , Am) ∈ G(m, n) has certain convenient properties; the Ak are simultaneously diagonalizable, for example, and there exist polynomials pk such that Ak = pk(A1) (k = 2, . . . , m). For given m, n we ask whether Ḡ(m, n) = C(m, n); here the overline indicates closure with respect to the Euclidean metric on C^{mn²} (or with respect to any convenient norm on Mn(C)). For most values of m, n, the answers have long been clear. In the rest of this section, we recall those answers.

If n = 1, the answer is trivially 'yes' for any m, since commutativity and genericity are automatic. It is easy to see that Ḡ(m, 2) = C(m, 2) by writing the 2 × 2 matrices in a given m-tuple in simultaneous upper-triangular form; one checks directly that, while retaining commutativity, the entries can be perturbed so that those on the diagonal (eigenvalues) become distinct. The 3 × 3 case is a little harder, but it has certainly been worked out by many; it occurs several times in the literature (e.g., [4,5,12]). In such perturbation problems the usual first step (often taken without comment in the rest of this paper) is to reduce to the nilpotent case by observing that, if any Ak has more than one eigenvalue, we can reduce to lower dimensions by working within the spectral subspaces of Ak. Thus, in the case of C(m, 3), we need only consider (commuting) strictly upper-triangular 3 × 3 matrices; it is easy to introduce (small) nonzero entries on the diagonal while maintaining commutativity. Then we can reduce (without comment!) to lower dimensions.
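As a small numerical illustration of such perturbations (ours, not part of the original paper), one can perturb a commuting nilpotent pair by giving one matrix distinct diagonal entries and transporting its partner along a polynomial expression in it:

```python
import numpy as np

# Commuting nilpotent pair: A is the 3x3 Jordan block, B = A^2 = p(A).
A = np.diag([1.0, 1.0], k=1)          # strictly upper-triangular shift
B = A @ A

eps = 1e-3
# Perturb A by placing distinct eigenvalues on its diagonal...
A_eps = A + np.diag([eps, 2 * eps, 3 * eps])
# ...and move B along the same polynomial, B_eps = p(A_eps) = A_eps^2,
# so the pair stays commuting by construction.
B_eps = A_eps @ A_eps

assert np.allclose(A_eps @ B_eps, B_eps @ A_eps)              # still commuting
assert len(set(np.round(np.linalg.eigvals(A_eps), 9))) == 3   # A_eps generic
assert len(set(np.round(np.linalg.eigvals(B_eps), 12))) == 3  # B_eps generic
assert np.linalg.norm(A - A_eps) < 10 * eps
assert np.linalg.norm(B - B_eps) < 10 * eps
```

The same device — perturb a single generator and move the others along polynomial expressions in it — reappears in the proof of Corollary 3.2 below.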

For n = 4, m = 4, the answer is 'no', i.e. Ḡ(4, 4) ≠ C(4, 4); the classic reason may be explained quickly. Let Eij denote the "matrix unit" with 1 in the ij position and 0's elsewhere. In M4(C) the four matrices Eij (i ≤ 2 < j) commute (each product is 04, in fact) and along with I4 yield five linearly independent elements of M4(C). If the Eij (i ≤ 2 < j) had (sufficiently nearby) commuting generic perturbations, these along with I4 would simultaneously diagonalize to yield five linearly independent 4 × 4 diagonal matrices, which is not possible. Since any imperturbable commuting m-tuple can be extended (by 0n, for example) to an imperturbable (m + 1)-tuple, we have Ḡ(m, 4) ≠ C(m, 4) for each m ≥ 4.
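This dimension obstruction is easy to check directly; the following verification is our illustration, not code from the paper:

```python
import numpy as np
from itertools import product

def E(i, j, n=4):
    """Matrix unit with a 1 in position (i, j); 0-indexed here."""
    M = np.zeros((n, n))
    M[i, j] = 1.0
    return M

# The four units E_ij with i <= 2 < j (1-indexed): rows 1, 2 and columns 3, 4.
units = [E(i, j) for i, j in product([0, 1], [2, 3])]

# They commute pairwise (every product is the 4x4 zero matrix).
for X, Y in product(units, repeat=2):
    assert np.allclose(X @ Y, np.zeros((4, 4)))

# Together with I4 they are linearly independent: the 5 x 16 matrix of their
# vectorizations has rank 5 ...
stack = np.stack([M.ravel() for M in units + [np.eye(4)]])
assert np.linalg.matrix_rank(stack) == 5
# ... whereas the diagonal 4x4 matrices form only a 4-dimensional space, so no
# nearby commuting generic (hence simultaneously diagonalizable) perturbations
# can exist.
```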

It is also true that an imperturbable m-tuple in C(m, n) can be "extended" to one in C(m, n + 1), in the sense of the following proposition.

Proposition 2.1. If (A1, . . . , Am) ∈ C(m, n)\Ḡ(m, n), then

(A1 ⊕ λ, . . . , Am ⊕ λ) ∈ C(m, n + 1)\Ḡ(m, n + 1)

provided λ (∈ C) is not an eigenvalue of any Ak.


Proof. Suppose, instead, that we have (G1(ν), . . . , Gm(ν)) ∈ G(m, n + 1) such that

(G1(ν), . . . , Gm(ν)) → (A1 ⊕ λ, . . . , Am ⊕ λ)

as ν → ∞. Let γ be a (counter-clockwise) loop around λ, so small that it encloses no eigenvalues of the Ak. By spectral continuity, if ν is sufficiently large, γ will enclose just a single eigenvalue of each Gk(ν). Consider the Riesz projections

Pν = (1/2πi) ∫γ (z − G1(ν))⁻¹ dz (2.1)

for such ν. As ν → ∞, G1(ν) → A1 ⊕ λ, so that Pν must approach the projection P onto the (n + 1)st coordinate axis. If we replace G1(ν) in (2.1) by Gk(ν) for some k with 1 < k ≤ m, we might get a different projection. However, since this projection also approaches P as ν → ∞ and since it commutes with Pν, it has to be equal to Pν (for ν large enough). Then Qν := I − Pν commutes with each Gk(ν) and the eigenvalues of QνGk(ν)Qν are those of Gk(ν), omitting only the eigenvalue enclosed by γ. Thus

(QνG1(ν)Qν, . . . , QνGm(ν)Qν) ∈ G(m, n). (2.2)

It follows that, for each k,

QνGk(ν)Qν → (I − P)(Ak ⊕ λ)(I − P) = Ak.

Thus the (generic, commuting) m-tuples of (2.2) approach (A1, . . . , Am), contradicting our hypothesis. □
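The Riesz projection (2.1) is easy to compute numerically. The following sketch is our own illustration (with an assumed concrete choice of A1, λ and contour, none of which appear in the paper): it discretizes the contour integral for G = A1 ⊕ λ and recovers the projection onto the last coordinate axis.

```python
import numpy as np

# A1 is a nilpotent 2x2 block (spectrum {0}); lambda = 1 is appended, so
# G = A1 (+) lambda has eigenvalues {0, 0, 1}.
A1 = np.array([[0.0, 1.0], [0.0, 0.0]])
lam = 1.0
G = np.block([[A1, np.zeros((2, 1))],
              [np.zeros((1, 2)), np.array([[lam]])]])

# Discretize P = (1/2 pi i) \int_gamma (z - G)^{-1} dz over a circle of
# radius 0.5 around lambda; the trapezoid rule on a closed contour converges
# extremely fast for analytic integrands.
N, r = 200, 0.5
theta = 2 * np.pi * np.arange(N) / N
z = lam + r * np.exp(1j * theta)
dz = 1j * r * np.exp(1j * theta) * (2 * np.pi / N)
P = sum(np.linalg.inv(zj * np.eye(3) - G) * dzj
        for zj, dzj in zip(z, dz)) / (2j * np.pi)

# Numerically, P is the projection onto the third coordinate axis.
assert np.allclose(P, np.diag([0.0, 0.0, 1.0]), atol=1e-8)
```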

From Proposition 2.1 and our description of the classic examples in C(4, 4), it follows that Ḡ(m, n) ≠ C(m, n) for m, n ≥ 4.

Certainly Ḡ(1, n) = C(1, n); this just says that the generic matrices are dense in Mn(C). A less familiar fact is that Ḡ(2, n) = C(2, n) (for any n). This result, in various forms, occurs here and there in the literature; see, e.g., [1,4,5,7,10,14]. In [5], Guralnick gives an especially efficient approach to Ḡ(2, n) = C(2, n).

The remarks above answer our question in all cases except for triples (m = 3) of size 4 × 4 or larger. In [5], Guralnick showed by an ingenious dimensional argument that Ḡ(3, 32) ≠ C(3, 32). He gave a related argument to cover each n ≥ 32. As an alternative, we may use Proposition 2.1 to extend the answer 'no' from m = 3, n = 32 to m = 3, n ≥ 32. From this survey of the literature it appears that the question 'Is Ḡ(m, n) = C(m, n)?' is left in doubt only for m = 3 and 4 ≤ n ≤ 31. In what follows we shall narrow this gap somewhat.

3. Imperturbable triples; reducible C(3, n)

By modifying some of the ideas in [5], we shall show that Ḡ(3, n) ≠ C(3, n) for n ≥ 30.


Proposition 3.1. For each n ≥ 30 there exist commuting matrices A1, A2, A3 in Mn(C) that cannot be approximated (arbitrarily well) by commuting generic G1, G2, G3.

Proof. The generic matrices form an open subset Δn of Mn(C) ≅ C^{n²} and we may view G(3, n) as the image of the map ω : Δn × C^n × C^n → G(3, n), where

(G1, α, β) → (G1, p(G1), q(G1)),

with

p(z) = ∑_{j=1}^{n} αj z^{j−1},  q(z) = ∑_{j=1}^{n} βj z^{j−1}.

This map is bijective so that G̿(3, n) is an irreducible variety of dimension n² + 2n (here the double overline indicates Zariski closure).

On the other hand, for large enough n we shall construct subsets of C(3, n) having dimension strictly greater than n² + 2n, so that

G(3, n) ⊂ G̿(3, n) ⫋ C(3, n). (3.1)

To this end, consider the nilpotent N1 = 0a ⊕ J3 ⊕ · · · ⊕ J3 (b copies of J3), where a and b are natural numbers, 0a denotes the a × a matrix of 0's, and J3 denotes the 3 × 3 nilpotent Jordan block. Thus N1 ∈ Mn(C), where n = a + 3b. Considering the usual "Frobenius form" of matrices in the commutant Z(N1) of N1, we see that Z(N1) has dimension d1 = a² + 2ab + 3b². A key idea in [5] suggests in our context that we consider the variety S of commuting pairs (N2, N3), where each of the Nk is chosen from the subspace Z0(N1) of Z(N1) determined as follows: in the Frobenius form the a × a block is 0a and the diagonal of each 3 × 3 block vanishes. Now Z0(N1) has dimension d0 = 2ab + 2b² and it is easy to check that for any N2, N3 ∈ Z0(N1) the product N2N3 has at most b² nonzero entries (in the upper-right corners of the 3 × 3 blocks). Thus the dimension d2 of the variety S satisfies

d2 ≥ 2d0 − b² = 4ab + 3b².

Consider the morphism φ : C³ × GLn × S → C(3, n) defined by

(z1, z2, z3, X, N2, N3) → (X(N1 + z1)X⁻¹, X(N2 + z2)X⁻¹, X(N3 + z3)X⁻¹).

Two similarities X and X′ in GLn yield the same point in the image T of φ only if X′X⁻¹ ∈ Z(N1). Hence the preimage of any point in T has dimension at most d1. We obtain the following estimate for the dimension d of T:

d ≥ 3 + n² + d2 − d1 ≥ 3 + n² + 4ab + 3b² − (a² + 2ab + 3b²) = n² + 3 + 2ab − a².


Thus d > n² + 2n provided 3 + 2ab − a² > 2a + 6b, and this happens exactly when b ≥ 8 and a lies strictly between (b − 1) ± √(b² − 8b + 4). With b = 8 we can take a = 6, 7, 8, which correspond to n = 30, 31, 32. The case a = b = 8 amounts to a reformulation of the basic example in [5]. It is easy to check that appropriate choices of a and b yield all n ≥ 30; alternatively, we could invoke Proposition 2.1. □
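The arithmetic behind the choice of a and b can be confirmed by brute force; this check is ours, not part of the paper:

```python
# For N1 = 0_a (+) b copies of J3 (so n = a + 3b), the construction gives a
# variety of dimension at least n^2 + 3 + 2ab - a^2, which exceeds the
# dimension n^2 + 2n of the closure of G(3, n) iff 3 + 2ab - a^2 > 2a + 6b.
wins = sorted(
    (a + 3 * b, a, b)
    for a in range(1, 60)
    for b in range(1, 60)
    if 3 + 2 * a * b - a * a > 2 * a + 6 * b
)

# The smallest winning case is a = 6, b = 8, i.e. n = 30.
n_min, a_min, b_min = wins[0]
assert (n_min, a_min, b_min) == (30, 6, 8)
# b = 8 also yields n = 31 (a = 7) and n = 32 (a = 8, the example of [5]).
assert {n for n, a, b in wins if b == 8} == {30, 31, 32}
```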

Corollary 3.2. For n ≥ 30 the variety C(3, n) is reducible.

Proof. We have seen in (3.1) that G̿(3, n) is a strict subvariety V1 of C(3, n) when n ≥ 30. Consider the subset V2 of C(3, n) consisting of those triples (A1, A2, A3) such that A1 is "derogatory", i.e. {I, A1, A1², . . . , A1^{n−1}} is linearly dependent (one of the many equivalent formulations of the "derogatory" condition). Clearly V2 is a strict subvariety of C(3, n). Also C(3, n) = V1 ∪ V2, because C(3, n)\V2 ⊂ G̿(3, n); to see this, recall that the commutant Z(A1) of a nonderogatory A1 contains just polynomials in A1, so that

(A1, A2, A3) ∈ C(3, n)\V2 ⇒ A2 = p(A1), A3 = q(A1)

for some polynomials p, q. Clearly we can perturb A1 to a generic A1′ and p, q to p′, q′ so that A2′ = p′(A1′) and A3′ = q′(A1′) are also generic. □

Remark. The arguments above make it clear that C(3, n) is irreducible if and only if G̿(3, n) = C(3, n) (the double overline again denoting Zariski closure). In fact, this is also equivalent to Ḡ(3, n) = C(3, n), i.e. the answer 'yes' to our basic question. To see this, let V2 be as in the proof of Corollary 3.2. Clearly C(3, n)\V2 ⊃ G(3, n), so that if G̿(3, n) = C(3, n), then C(3, n) is also the Zariski closure of C(3, n)\V2. It may be shown (see, e.g., [2, Proposition 7, Section 9.7], and the surrounding discussion) that in such a case the larger variety, here C(3, n), is also the metric closure of C(3, n)\V2. But (as in the proof of Corollary 3.2) C(3, n)\V2 ⊂ Ḡ(3, n). Hence

G̿(3, n) = C(3, n) ⇒ Ḡ(3, n) = C(3, n).

The reverse implication is trivial.

4. Some perturbation techniques

In this section, we study some special cases of linear spaces of commuting operators on a given finite dimensional vector space. Let the radical of the algebra that the operators from the space generate have index of nilpotency no greater than 2. We show that any commuting triple from a space with this property is perturbable. Actually, our result is somewhat more general. Observe that the examples of Section 3 do not satisfy this assumption.


In the following proposition we consider the problem of determining the maximaldimension of a linear space of surjective operators from a space into a space ofone dimension less. We need these results in the sequel; however, they may be ofindependent interest.

Proposition 4.1. Let L be a vector space of dimension no smaller than 3 of op-erators from a k-dimensional into a (k − 1)-dimensional vector space, k > 2. Then,some nonzero element of L fails to be surjective.

Remark. Observe that Proposition 4.1 is true also for k = 2 in a trivial way since in this case there is no L satisfying the assumptions. Note also that the proposition does not hold for L of dimension 2. For example, the matrices

A = [ 0 1 0 · · · 0 0
      0 0 1 · · · 0 0
      · · · · · · · ·
      0 0 0 · · · 0 1 ],

B = [ 1 0 · · · 0 0
      0 1 · · · 0 0
      · · · · · · ·
      0 0 · · · 1 0 ]   (4.1)

and all of their nontrivial linear combinations are clearly surjective. We will prove in the following lemma that these pairs are characteristic for spaces of surjective operators.
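To see the surjectivity claim concretely: the rows of αA + βB are βe_i + αe_{i+1} for i = 1, . . . , k − 1, and these are linearly independent whenever (α, β) ≠ (0, 0). A numerical spot-check (ours, not from the paper):

```python
import numpy as np

def pair_41(k):
    """The (k-1) x k pair of Eq. (4.1): A the right shift, B = (I | 0)."""
    A = np.eye(k - 1, k, k=1)   # ones on the superdiagonal
    B = np.eye(k - 1, k)        # identity block, last column zero
    return A, B

k = 5
A, B = pair_41(k)
rng = np.random.default_rng(0)
combos = [(1, 0), (0, 1), (1, 1), (-2, 3)]
combos += [tuple(rng.standard_normal(2)) for _ in range(20)]
for alpha, beta in combos:
    # Surjective <=> full row rank k - 1.
    assert np.linalg.matrix_rank(alpha * A + beta * B) == k - 1
```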

Lemma 4.2. Let L be a vector space of dimension no smaller than 2 of operators from a k-dimensional into a (k − 1)-dimensional vector space, k > 2. Then, every nontrivial member of L is surjective if and only if any two linearly independent operators A, B ∈ L are of the form (4.1) in some basis.

Proof. It is clear that nontrivial linear combinations of operators A and B are surjective. So, let us prove the reverse side of the assertion. Choose a nontrivial vector e1 ∈ Ker A and proceed inductively. Assuming that linearly independent vectors e1, e2, . . . , ej have already been chosen for some j, 1 ≤ j < k, such that Ae1 = 0, Ae2 = Be1, . . . , Aej = Bej−1, we have to treat two cases. Suppose first that Bej belongs to the linear span of Be1, Be2, . . . , Bej−1. Then, completing the basis of the domain arbitrarily with some vectors ej+1, . . . , ek and letting the basis of the codomain equal Ae2, Ae3, . . . , Aek, the matrices of A and B become

A = [ 0 1 0 · · · 0 0 0 · · · 0
      0 0 1 · · · 0 0 0 · · · 0
      · · · · · · · · · · · · ·
      0 0 0 · · · 0 1 0 · · · 0
      0 0 0 · · · 0 0 1 · · · 0
      · · · · · · · · · · · · ·
      0 0 0 · · · 0 0 0 · · · 1 ],

B = [ 1 0 · · · 0 ∗ ∗ · · · ∗
      0 1 · · · 0 ∗ ∗ · · · ∗
      · · · · · · · · · · · ·
      0 0 · · · 1 ∗ ∗ · · · ∗
      0 0 · · · 0 0 ∗ · · · ∗
      · · · · · · · · · · · ·
      0 0 · · · 0 0 ∗ · · · ∗ ].

We can now clearly choose an appropriate λ ≠ 0 such that B − λA is not surjective, contradicting the assumptions of the lemma. Thus, the remaining case must be true, namely, that Be1, Be2, . . . , Bej are linearly independent. By surjectivity of A there exists some ej+1 such that Aej+1 = Bej and since the kernel of A is one-dimensional and spanned by e1, it must hold that e1, e2, . . . , ej+1 are linearly independent. This completes the induction step.

At the end of the induction, the matrices corresponding to A and B in the bases e1, e2, . . . , ek and Be1, Be2, . . . , Bek−1 have the form

A = [ 0 1 0 · · · 0 0
      0 0 1 · · · 0 0
      · · · · · · · ·
      0 0 0 · · · 0 1 ],

B = [ 1 0 · · · 0 αk−1
      0 1 · · · 0 αk−2
      · · · · · · ·
      0 0 · · · 1 α1 ].

Define a k × k matrix

S = [ 1 −α1 −α2 · · · −αk−1
      0 1 −α1 · · · −αk−2
      0 0 1 · · · −αk−3
      · · · · · · ·
      0 0 0 · · · 1 ],

and let T be the lower right (k − 1) × (k − 1) corner of S. Then, T⁻¹BS = T⁻¹(T, 0) = (I, 0) and T⁻¹AS = T⁻¹(0, T) = (0, I). □
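This closing change of basis can be verified mechanically. The following check (ours, for k = 5 with randomly chosen α's) confirms T⁻¹BS = (I, 0) and T⁻¹AS = (0, I); note that the upper-left and lower-right (k − 1) × (k − 1) corners of the Toeplitz matrix S coincide, which is why T appears in both products:

```python
import numpy as np

k = 5
rng = np.random.default_rng(1)
alpha = rng.standard_normal(k - 1)        # alpha_1, ..., alpha_{k-1}

A = np.eye(k - 1, k, k=1)                 # the shift of (4.1)
B = np.eye(k - 1, k)
B[:, -1] = alpha[::-1]                    # last column (alpha_{k-1}, ..., alpha_1)

# S is upper-triangular Toeplitz: ones on the diagonal and -alpha_d on the
# d-th superdiagonal.
S = np.eye(k)
for d in range(1, k):
    S += np.diag([-alpha[d - 1]] * (k - d), k=d)
T = S[1:, 1:]                             # lower right (k-1) x (k-1) corner

# In the new bases the pair becomes exactly (4.1):
assert np.allclose(np.linalg.inv(T) @ B @ S, np.eye(k - 1, k))        # (I, 0)
assert np.allclose(np.linalg.inv(T) @ A @ S, np.eye(k - 1, k, k=1))   # (0, I)
```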

Proof of Proposition 4.1. Contrary to the conclusions of the proposition, assume that every nonzero member of L is surjective. Choose an arbitrary basis {A, B, C} of L and write P(α, β) = αA + βB + C and Q(λ, µ) = λA + µB. Observe that P and Q are linearly independent as soon as λ and µ are not both zero. For any fixed α and β define the k(k − 1)-row determinant

D(λ, µ) = | P 0 0 · · · 0 0 |
          | Q P 0 · · · 0 0 |
          | 0 Q P · · · 0 0 |
          | · · · · · · · · |
          | 0 0 0 · · · Q P |
          | 0 0 0 · · · 0 Q |

made of k rows and k − 1 columns of (k − 1) × k blocks. In the determinant D(πλ, πµ) multiply the k − 1 rows of the ith block row by π^{1−i} and the k columns of the jth block column by π^{j−1} for 1 ≤ i ≤ k and 1 ≤ j ≤ k − 1. The determinant gets multiplied by

π^{−(k−1)(1+···+(k−1))} π^{k(1+···+(k−2))} = π^{(−k(k−1)² + k(k−1)(k−2))/2} = π^{−k(k−1)/2},

but it has also turned into D(λ, µ). This proves that this polynomial is homogeneous in λ and µ of order N = k(k − 1)/2. It is therefore of the form

D(λ, µ) = ∑_{i=0}^{N} di λ^i µ^{N−i}.

Now, in order to make D(λ, µ) = 0 with P and Q linearly independent, it suffices to take λ = 1, µ = 0 if dN = 0 and λ = 0, µ = 1 if d0 = 0; but if both d0 and dN are nonzero, we can even find λ and µ both nonzero satisfying the equation D(λ, µ) = 0. By Lemma 4.2 we may choose a basis in which P and Q are of the form (4.1). Note that this change of basis should leave the determinant D(λ, µ) equal to 0. However, we intend to show that this determinant is nonzero, and this contradiction will yield the proposition. The proof of this will be given by induction on k. Observe that for k = 2, D(λ, µ) becomes a 2 × 2 determinant

| 0 1 |
| 1 0 |,

which is clearly nonzero. Next, observe that in this determinant for a general k each first column in a block column has exactly one nonzero entry equal to 1, appearing in the upper left corners of the lower 'main' block diagonal. Therefore, crossing out all these columns and the corresponding rows can only change the sign of the determinant. Having crossed them out we end up with the determinant of a matrix with identity in the upper left block corner and zeros in the rest of the first block row, so that we can cross out all the rows in the first block row and accordingly all the columns in the first block column without changing the value of the determinant. After that we obtain the same kind of matrix as at the beginning with index k diminished by 1. □
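The nonvanishing of this determinant for the pair (4.1) — the heart of the contradiction — can also be confirmed numerically by assembling the block matrix directly (our check, not from the paper):

```python
import numpy as np

def block_determinant(k):
    """det of the k(k-1) x k(k-1) matrix with P on the block diagonal and Q on
    the block subdiagonal, where (P, Q) is the pair of Eq. (4.1)."""
    P = np.eye(k - 1, k, k=1)      # the shift, as A in (4.1)
    Q = np.eye(k - 1, k)           # (I | 0), as B in (4.1)
    M = np.zeros((k * (k - 1), k * (k - 1)))
    for j in range(k - 1):         # block column j: P at (j, j), Q at (j+1, j)
        M[j * (k - 1):(j + 1) * (k - 1), j * k:(j + 1) * k] = P
        M[(j + 1) * (k - 1):(j + 2) * (k - 1), j * k:(j + 1) * k] = Q
    return np.linalg.det(M)

# Nonzero for small k, in line with the crossing-out induction in the proof
# (which in fact preserves the absolute value of the determinant).
for k in range(2, 8):
    assert abs(block_determinant(k)) > 0.5
```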

We now begin the study of linear spaces of operators on a given finite dimensionalvector space. The length of the longest Jordan chain of such an operator will be calledits order. Note that the maximal order in such a linear space is no greater than theorder of nilpotency of the radical of the algebra generated by these operators.

Proposition 4.3. Let L be a 3-dimensional vector space of commuting operators of order not exceeding 2. Then, there exists a basis {A, B, C} of L such that for any ε > 0 there are commuting matrices Aε, Bε and Cε such that ‖A − Aε‖ < ε, ‖B − Bε‖ < ε, ‖C − Cε‖ < ε, such that the order of any operator from the span Lε of these matrices never exceeds 2, and such that Lε commutes with a nontrivial projection.


Remark. Observe that any basis of L can be obtained from {A, B, C} by an invertible 3 × 3 matrix. Thus, what Proposition 4.3 claims to be possible for a certain basis and every positive ε is actually possible for any basis and any positive ε. Thus, we may change the basis arbitrarily in the course of the proof as long as we keep in mind that every change of basis may also change the actual ε. Clearly, only a finite number of changes is allowed.

Proof. If any of the operators in L had more than one point in the spectrum, the spectral projection P of this operator with respect to one of the points would do the job without even perturbing L. Thus, we may and will assume that all have singleton spectra. Also assuming that the only point in their spectra is zero, we get that A² = 0 for all A ∈ L. Denote X0 = ⋂_{A∈L} Ker A and U = ⋁_{A∈L} Im A, and note that U ⊂ X0. If this inclusion were strict, we could find a vector x in the bigger set but not in the smaller, and a functional f annihilating U but not x. Assuming f(x) = 1, define a projection P = x ⊗ f by Py = f(y)x, which does the job again without changing the space L.

Consequently, we may and will assume that X0 = U, and we will further suppose that twice the dimension of X0 does not exceed the dimension of the underlying space, since otherwise we could go to the adjoints, where the roles of X0 and U would be played by their annihilators. Now, decompose the space with respect to X0 and any of its algebraic complements and note that any A ∈ L has the form

A = [ 0 ∗
      0 0 ]

with respect to this decomposition. It follows that we can slightly perturb the upper right corner of A, if necessary, to get rank A = dim X0. Observe that this possible perturbation has affected neither commutativity nor order assumptions, and it has changed neither U nor X0.

Next, choose B to be linearly independent of A, write Ker A = X0 ⊕ X1, and observe that BX1 ⊂ X0 = Im A. Hence, if BX1 ≠ {0}, then there exists a nontrivial subspace X2 such that (X0 ⊕ X1) ∩ X2 = {0} and AX2 = BX1. Proceed by induction to find spaces X0, X1, . . . , Xk, for some k ≥ 1, such that (X0 ⊕ · · · ⊕ Xk−1) ∩ Xk = {0}, so that the direct sum X0 ⊕ X1 ⊕ · · · ⊕ Xk exists, and such that B(X1 ⊕ · · · ⊕ Xk−1) = A(X2 ⊕ · · · ⊕ Xk). Now, if BXk is not contained in B(X1 ⊕ · · · ⊕ Xk−1), we can find a necessarily nontrivial space Xk+1 having trivial intersection with X1 ⊕ X2 ⊕ · · · ⊕ Xk and satisfying B(X1 ⊕ · · · ⊕ Xk) = A(X2 ⊕ · · · ⊕ Xk+1). As we are in a finite dimensional space, this induction must stop in a finite number of steps, so that there must exist a smallest index, say k, such that BXk ⊂ B(X1 ⊕ · · · ⊕ Xk−1).

Assume at first that X0 ⊕ X1 ⊕ · · · ⊕ Xk does not exhaust the whole space under consideration and denote by Y an algebraic complement of it. This implies that X0 = AX2 ⊕ AX3 ⊕ · · · ⊕ AXk ⊕ AY. Fix this decomposition of X0 and the algebraic complement of it, defined by the decomposition X1 ⊕ X2 ⊕ · · · ⊕ Xk ⊕ Y.


Recall that any member C of L has all the blocks with respect to X0 and its complement zero, except possibly for the upper right corner, to be denoted by C12. Write down the block matrices of A12 and B12 with respect to the above decompositions:

A12 = [ 0 1 0 · · · 0 0
        0 0 1 · · · 0 0
        · · · · · · · ·
        0 0 0 · · · 1 0
        0 0 0 · · · 0 1 ],

B12 = [ b11 b12 · · · b1,k−1   b1,k   b1,k+1
        0   b22 · · · b2,k−1   b2,k   b2,k+1
        · · · · · · · · · · ·
        0   0   · · · bk−1,k−1 bk−1,k bk−1,k+1
        0   0   · · · 0        0      bk,k+1 ].

Observe that the induction procedure through which the spaces Xi have been obtained ensures that the block operators b11, b22, . . . , bk−1,k−1 are all surjective and therefore all have right inverses, to be denoted, respectively, by f11, f22, . . . , fk−1,k−1. Denote by S the (k + 1) × (k + 1) block matrix having identities on the main diagonal, entries (−fk−1,k−1 bk−1,k, −fk−1,k−1 bk−1,k+1) above the main diagonal in the (k − 1)st block row and zeros elsewhere. Further, let T be the lower right k × k corner of S and multiply A and B both by Diag(T, S) from the right and by its inverse from the left to get A unchanged and B having the entries in the (k − 1, k)th and (k − 1, k + 1)th block positions annihilated. Suppose inductively that in some decomposition we have achieved the above form of A and B with bij = 0 for p < i < j ≤ k + 1, i ≠ k, for some p, 1 ≤ p ≤ k − 1. Then, define S to have identities on the main diagonal, entries (−fpp bp,p+1, . . . , −fpp bp,k+1) above the main diagonal in the pth block row and zeros elsewhere. Let again T denote the lower right k × k corner of S and multiply A and B both by Diag(T, S) from the right and by its inverse from the left to get A unchanged and B gaining zeros in the block positions (p, p + 1), . . . , (p, k + 1). We may therefore assume that the "above-diagonals" of the first k − 1 block rows of B12 have been zero from the very beginning.

Now, let P be the projection onto AY ⊕ Y along AX2 ⊕ · · · ⊕ AXk ⊕ X1 ⊕ · · · ⊕ Xk. Then, P clearly commutes with A and B, while for any C ∈ L the image of C is a subset of X0, which is left invariant under P and lies in the kernel of C; this gives CPC = 0. Thus, the proposition follows by Lemma 4.4 below.

Next, suppose that Y is trivial and assume as above that

A12 = [ 0 1 0 · · · 0 0
        0 0 1 · · · 0 0
        · · · · · · · ·
        0 0 0 · · · 0 1 ],

B12 = [ b11 0   · · · 0        0
        0   b22 · · · 0        0
        · · · · · · ·
        0   0   · · · bk−1,k−1 0 ].

From the fact that the bii are surjective we conclude that dim X1 ≥ dim X2 ≥ · · · ≥ dim Xk. Suppose that there is an index j, 1 < j ≤ k, such that the dimension of Xj−1 is strictly greater than that of Xj, and let j be the greatest index with this property. Let pj−1 be a necessarily nontrivial projection having the same kernel as bj−1,j−1, let pi = I for j ≤ i ≤ k and let pi = fii pi+1 bii for i = j − 2, j − 3, . . . , 1. Then, pi+1 bii = bii pi for all 1 ≤ i < k and the nontrivial projection P = Diag(p2, . . . , pk, p1, p2, . . . , pk) commutes with A and B, and leaves X0 invariant, thus forcing CPC = 0 for any C ∈ L, so that we are done by Lemma 4.4 again.

Finally, we have to treat the case when the dimensions of the Xi, i ≠ 0, are all equal to d, say. In this case A and B have the form (4.1) with d × d block entries. If d is greater than 1, then there exists a nontrivial projection p on the d-dimensional space and, consequently, P = Diag(p, p, . . . , p) is a nontrivial projection commuting with A and B and satisfying CPC = 0, thus giving again the proposition by Lemma 4.4. If, on the other hand, d = 1, then the (1, 2) blocks of matrices from L form a linear space of dimension 3 of operators from a k-dimensional into a (k − 1)-dimensional vector space. By Proposition 4.1 there is an A ∈ L such that A12 is not surjective. Choose a B linearly independent of A and repeat the above procedure. Since A12 fails to be surjective, the operators A12 and B12 cannot have the form (4.1), so this time one of the earlier cases must apply. □

It remains to prove the lemma.

Lemma 4.4. Let L be a 3-dimensional vector space of commuting nilpotent operators of order 2. If for some basis {A, B, C} of L there is a nontrivial projection P commuting with A and B and satisfying CPC = 0, then the conclusions of the proposition are valid.

Proof. For any ε > 0 leave A and B unchanged and perturb C into C + δP for some δ > 0. It is clear that the "new" basis operators commute and that they are not too far from the "old" ones provided that we choose δ small enough. For the operator T = C + δP + αA + βB observe that T²(T − δ)² = 0, as a straightforward computation proves. Since clearly at least one operator T of this form has the point δ in the spectrum, its spectral projection with respect to 0 will do the trick. It is also clear that the order of the operators in the new space has not increased. □
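The computation behind T²(T − δ)² = 0 can be checked on a concrete instance (our example, not from the paper): in M4(C) take A = E13, B = E14, C = E23 — commuting matrix units of square zero — and P = diag(1, 0, 1, 1), which commutes with A and B and satisfies CPC = 0.

```python
import numpy as np

def E(i, j, n=4):
    """1-indexed matrix unit, as in Section 2."""
    M = np.zeros((n, n))
    M[i - 1, j - 1] = 1.0
    return M

A, B, C = E(1, 3), E(1, 4), E(2, 3)     # commuting, all of square zero
P = np.diag([1.0, 0.0, 1.0, 1.0])       # nontrivial projection

# Hypotheses of Lemma 4.4: P commutes with A and B, and CPC = 0.
assert np.allclose(P @ A, A @ P) and np.allclose(P @ B, B @ P)
assert np.allclose(C @ P @ C, np.zeros((4, 4)))

delta, alpha, beta = 0.1, 2.0, -3.0
T = C + delta * P + alpha * A + beta * B
I = np.eye(4)
# Every operator of this form satisfies T^2 (T - delta)^2 = 0, so its minimal
# polynomial divides t^2 (t - delta)^2 and its spectrum lies in {0, delta}.
assert np.allclose(T @ T @ (T - delta * I) @ (T - delta * I), np.zeros((4, 4)))
```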


5. Some positive perturbation results

Theorem 5.1. Let L be a vector space of commuting operators of order not exceeding 2 and dimension 3. Then, there exists a basis {A, B, C} of L such that for any ε > 0 there are commuting matrices Aε, Bε and Cε such that ‖A − Aε‖ < ε, ‖B − Bε‖ < ε, ‖C − Cε‖ < ε and such that Lε is simultaneously diagonalizable.

Proof. The proof is by induction on the dimension of the underlying vector space. Suppose the situation is as in the theorem on a certain vector space, and assume that the theorem is valid on any vector space of strictly smaller dimension. Using Proposition 4.3, perturb the basis of L a bit so that a nontrivial projection P commuting with the whole of L can be found. The restriction of (the perturbed) L to either Im P or Ker P is a space of commuting operators, having order not exceeding 2, dimension not exceeding 3, and acting on a vector space of dimension strictly smaller than the starting one. If the dimension of a restriction is 3, the induction assumption gives a commuting, simultaneously diagonalizable perturbed basis of it; if it is 2, we get one by the results on C(2, n) discussed in Section 2; and if it is even less, any perturbation by a small operator with distinct eigenvalues will do the trick. In any case, the internal direct sum of the two perturbations, properly glued back together, gives the desired perturbation of L. □

Corollary 5.2. Any commuting triple such that the order of nilpotency of the radical of the algebra they generate is no greater than 2 is perturbable.

Theorem 5.3. Let L be a linear space of n × n nilpotent matrices such that for each of them the difference between its order of nilpotency and its rank equals 1. Then any triple of matrices from L is perturbable.

Proof. The minimal possible difference between the order n of a matrix from L and its order of nilpotency will be denoted by k and called ‘the difference of orders’. We will prove the theorem by induction on the difference of orders, for all sizes of matrices simultaneously. If the difference of orders equals 0, the space L contains a nilpotent matrix with the maximal possible order of nilpotency. Thus, any member of L is a polynomial in this matrix and the theorem follows.
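The base case can be illustrated concretely. The matrices and the perturbation below are our own sketch, not from the paper: if A is an n × n Jordan block J, any commuting B is a polynomial p(J); replacing J by the companion matrix of x^n − ε (a rank-one perturbation of J with n distinct eigenvalues, the n-th roots of ε) and B by the same polynomial in it produces a nearby commuting pair whose first coordinate is generic.

```python
import numpy as np

# Sketch of the base case (our own illustration): A = J is the n x n
# nilpotent Jordan block, B = p(J) for a polynomial p.  Setting
# J_eps = J + eps * E_{n,1} gives the companion matrix of x^n - eps,
# whose eigenvalues (the n-th roots of eps) are distinct, and the pair
# (J_eps, p(J_eps)) commutes and is O(eps)-close to (J, p(J)).
n, eps = 5, 1e-6
J = np.diag(np.ones(n - 1), k=1)   # Jordan block with eigenvalue 0
J_eps = J.copy()
J_eps[n - 1, 0] = eps              # companion matrix of x^n - eps

p = lambda X: 2.0 * X @ X - 0.5 * X @ X @ X   # sample polynomial, indices >= 2
B, B_eps = p(J), p(J_eps)

assert np.allclose(J_eps @ B_eps, B_eps @ J_eps)   # the pair still commutes
assert np.linalg.norm(J - J_eps) <= 1.1 * eps      # small perturbation of A
assert np.linalg.norm(B - B_eps) < 1e-3            # ...and of B
eigs = np.linalg.eigvals(J_eps)
assert len(set(np.round(eigs, 8))) == n            # J_eps is generic
```

Whether p(J_eps) itself is generic depends on p, but genericity of one coordinate is exactly what the polynomial argument in the proof exploits.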

Now, assume inductively that the theorem holds for a certain difference of orders k − 1, and let L be any linear space of matrices satisfying the suppositions of the theorem with the difference of orders equal to k. Let A be the matrix at which this difference of orders is attained, so that A may be written in the block form

A = \begin{pmatrix} J & 0 \\ 0 & 0 \end{pmatrix},

where J is an (n − k) × (n − k) Jordan block. For any B ∈ L and any λ ∈ C, the rank of λA + B is no greater than n − k − 1 by our hypothesis. Since B commutes


with A, it can be written as

B = \begin{pmatrix} p(J) & e_1 b^* \\ c e^*_{n-k} & D \end{pmatrix},

where b, c ∈ C^k, p is a polynomial (assumed with no loss of generality to have zero coefficients with indices 0 and 1), and D is a k × k matrix. After crossing out the first column and the (n − k)th row of the matrices A and B (which are both zero), the matrices turn into

A = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} T & e_1 b^* \\ c e^*_{n-k-1} & D \end{pmatrix},

where T is a strictly upper triangular matrix. It is clear that for every λ ∈ C the rank of λA + B computed from these reduced matrices equals the rank of the original λA + B and is therefore no greater than n − k − 1. Since for λ ≠ 0 the operator λI + T is invertible and upper triangular, with λ^{-1} on the diagonal of its inverse, we have that

λA + B = \begin{pmatrix} λI + T & e_1 b^* \\ c e^*_{n-k-1} & D \end{pmatrix}
= \begin{pmatrix} I & 0 \\ λ^{-1} c e^*_{n-k-1} & I \end{pmatrix}
\begin{pmatrix} λI + T & e_1 b^* \\ 0 & D − λ^{-1} c e^*_{n-k-1} e_1 b^* \end{pmatrix}.

Now, the lower right corner of the rightmost matrix above equals D − λ^{-1} c e^*_{n-k-1} e_1 b^* = 0 for all λ ∈ C, λ ≠ 0, by the rank condition, thus forcing D = 0 and c e^*_{n-k-1} e_1 b^* = 0.

Next, choose an arbitrary triple A, B, and C of L and assume with no loss of generality that they are linearly independent. Write

A = \begin{pmatrix} J & 0 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} p(J) & e_1 b^* \\ c e^*_{n-k} & 0 \end{pmatrix}, \quad C = \begin{pmatrix} q(J) & e_1 f^* \\ g e^*_{n-k} & 0 \end{pmatrix},

where we assume with no loss of generality that the polynomials p and q have zero coefficients with indices 0 and 1. Commutativity of B and C further implies that

b^*g = f^*c. (5.1)

We will now show that, after replacing B and C with appropriate linear combinations of them if necessary, there are columns z, w ∈ C^k such that b^*z = 0, w^*c = 0, and w^*z ≠ 0. If k > 2, this is clear since we may choose z = w equal to any nonzero column orthogonal to both b and c. In case k = 2 we only need to choose the matrix B in such a way that either b = 0, or c = 0, or b^*c ≠ 0. Namely, we can then choose z = w equal to any nonzero vector orthogonal to c in the first case, and z = w equal to any nonzero vector orthogonal to b in the second case, while in the third case we choose z ≠ 0 orthogonal to b and w ≠ 0 orthogonal to c. In order to show that the problem can be reduced to one of these three cases, assume the contrary. Applying the above condition to the matrices B, C, and B + C, it follows that


b, c, f, g, b + f, c + g are all nonzero and that b^*c = f^*g = (b + f)^*(c + g) = 0. This implies, using (5.1), that f^*c = 0. Since b and f are both orthogonal to the nonzero vector c, they are linearly dependent, so that a nontrivial linear combination of B and C has zero upper right corner, a contradiction. It remains to treat the case k = 1. If for some C we have f ≠ 0, it follows by (5.1) that b = 0 implies c = 0. Therefore, a nontrivial linear combination of B and C has both the upper right and the lower left corner equal to zero. We can now replace B by this linear combination, choose C appropriately, and take z = w = 1.

So, with no loss of generality, we may assume the existence of vectors z, w ∈ C^k such that b^*z = w^*c = 0 and w^*z ≠ 0. Assume further, with no loss of generality, that z^*z = w^*z = 1. Define

P = \begin{pmatrix} 0 & 0 \\ 0 & zw^* \end{pmatrix},

and consider the triple A, B, and C + δP for any δ > 0. It is clear that these matrices commute. We will show that C + δP has exactly one nonzero eigenvalue (equal to δ), with algebraic multiplicity 1. Furthermore, we will show that the restrictions of these operators and of all of their linear combinations to the spectral subspace of C + δP corresponding to its eigenvalue 0 satisfy the assumptions of the theorem with the difference of orders equal to k − 1 and are therefore perturbable by the inductive hypothesis. It will then follow that A, B, and C + δP, and therefore A, B, and C, are perturbable.

It remains to show that A, B, and C + δP have the desired properties. Define

S = \begin{pmatrix} I & −δ^{-1}(f^*z) e_1 z^* \\ −δ^{-1}(w^*g) z e^*_{n-k} & I − zz^* − zw^* \end{pmatrix}

and observe that

S^{-1} = \begin{pmatrix} I − δ^{-2}(f^*z)(w^*g) e_1 e^*_{n-k} & −δ^{-1}(f^*z) e_1 w^* \\ −δ^{-1}(w^*g) z e^*_{n-k} & I − zz^* − zw^* \end{pmatrix}.

A straightforward computation reveals that

SAS^{-1} = \begin{pmatrix} J & 0 \\ 0 & 0 \end{pmatrix} = A,

SBS^{-1} = \begin{pmatrix} p(J) − δ^{-1}(f^*z)(z^*c) e_1 e^*_{n-k} & e_1 b^* \\ [c − (z^*c)z] e^*_{n-k} & 0 \end{pmatrix},

C_δ = S(C + δP)S^{-1} = \begin{pmatrix} q(J) − δ^{-1}(f^*z)(z^*g) e_1 e^*_{n-k} & e_1[f^* − (f^*z)w^*] \\ [g − (z^*g)z] e^*_{n-k} & δzz^* \end{pmatrix}.

It is clear that (0, z)^T is a right eigenvector of C_δ for the eigenvalue δ and that (0, z^*) is a corresponding left eigenvector. The orthogonal complement X of this eigenvector is invariant under A, SBS^{-1}, and C_δ. The restriction of any of their linear combinations to


X is equal to the restriction of the corresponding linear combination of A, SBS^{-1}, and SCS^{-1} to X and is therefore nilpotent of the same degree. Since the rank of a restriction can only get smaller, it follows that the linear span of the restrictions of A, SBS^{-1}, and C_δ to X satisfies the assumptions of the theorem with the difference of orders equal to k − 1. □

Theorem 5.4. For every n ≤ 5, any triple of commuting n × n matrices is perturbable.

Proof. Choose a triple of linearly independent commuting n × n matrices with n ≤ 5 and denote their span by L. Assume with no loss of generality that all the members of L are nilpotent. If all of them have order of nilpotency no greater than 2, then we are done by Theorem 5.1. If at least one of them has order of nilpotency equal to n, we are done by the fact that any other member of L is a polynomial in it. If none of them has order n, but some of them have order n − 1, we are done by Theorem 5.3. If the maximal order of nilpotency in L is n − 2, we must have n = 5; otherwise we would be done by the above. Now, if the rank of all members of L is no greater than 2, we are done again by Theorem 5.3. It remains to treat the case when the maximal order of nilpotency of members of L is 3 and the maximal rank of members of L is 3. Denote by A the operator at which both are attained and write

A = \begin{pmatrix} J' & 0 \\ 0 & J \end{pmatrix},

where J' is a 3 × 3 and J is a 2 × 2 Jordan block. Choose a B ∈ L linearly independent of A and write (after subtracting a multiple of A, if necessary)

B = \begin{pmatrix} 0 & 0 & α & β & γ \\ 0 & 0 & 0 & 0 & β \\ 0 & 0 & 0 & 0 & 0 \\ 0 & π & ρ & 0 & σ \\ 0 & 0 & π & 0 & 0 \end{pmatrix}.

By assumption, λA + B has rank no greater than 3 for all λ ∈ C. Cross out the first column and the third row of the matrices A and B, which are trivial, to get

A = \begin{pmatrix} I & 0 \\ 0 & J \end{pmatrix}, \qquad B = \begin{pmatrix} αJ & βI + γJ \\ πI + ρJ & σJ \end{pmatrix},

so that for λ ≠ 0

λA + B = \begin{pmatrix} λI + αJ & βI + γJ \\ πI + ρJ & (σ + λ)J \end{pmatrix}
= \begin{pmatrix} I & 0 \\ πλ^{-1}I + (ρλ^{-1} − παλ^{-2})J & I \end{pmatrix}
\begin{pmatrix} λI + αJ & βI + γJ \\ 0 & −πβλ^{-1}I + (σ + λ − γπλ^{-1} − ρβλ^{-1} + παβλ^{-2})J \end{pmatrix},


where we have taken into account that (λI + αJ)^{-1} = λ^{-1}I − αλ^{-2}J. The rank condition implies that πβ = 0. We will treat the following three cases separately: (a) there exists a B ∈ L as above such that π ≠ 0; (b) there exists a B ∈ L as above such that β ≠ 0; and (c) π = β = 0 for all B as above. In addition to B, choose a third matrix linearly independent of A and B:

C = \begin{pmatrix} 0 & 0 & α' & β' & γ' \\ 0 & 0 & 0 & 0 & β' \\ 0 & 0 & 0 & 0 & 0 \\ 0 & π' & ρ' & 0 & σ' \\ 0 & 0 & π' & 0 & 0 \end{pmatrix}.

First, assume case (a). Then β = 0 by the rank condition. Commutativity of B and C implies that πβ' = π'β, forcing β' = 0. Replacing C by an appropriate nontrivial linear combination of B and C, if necessary, we may also assume π' = 0. Commutativity of B and C now yields πγ' = 0 and πσ' = 0, forcing γ' = σ' = 0. Observe that now C has all entries zero except possibly the (13) and the (43) entries. Notice also that the same is true for A² and for (λA + B)². Since these two operators are linearly independent (for an appropriate choice of λ ∈ C), it follows that C can be viewed as a fixed polynomial p in A and B. Thus, any sequence of generic commuting pairs A_n, B_n converging, respectively, to A and B also yields a sequence of matrices C_n = p(A_n, B_n) with the desired properties converging to C. Case (b) goes similarly. In case (c), observe that P = e_2 e_2^* commutes with both B and C, so that the theorem follows by the usual tricks after perturbing A into A + δP with δ > 0 small enough. □

6. The multivariate von Neumann inequality

Here we comment briefly on applications of our perturbation results to the theory of the multivariate von Neumann inequality. Details of several points mentioned here may be found in [9]. The multivariate von Neumann inequality may be stated as follows:

‖p(C_1, C_2, . . . , C_m)‖ ≤ max{|p(z_1, z_2, . . . , z_m)| : |z_k| ≤ 1}, (6.1)

whenever p is a polynomial in m complex variables and the C_k are commuting contractions on a Hilbert space of dimension n. We are interested here in the case where n is finite, and we may regard the C_k as commuting n × n matrices such that ‖C_k‖ ≤ 1. Inequality (6.1) is known to hold for m = 1 (von Neumann) and for m = 2 (Ando), but, unless additional conditions are imposed on the contractions, (6.1) can fail for three or more variables (Varopoulos). Until recently, the “minimal” counterexamples were due to Kaijser and Varopoulos and involved three commuting 5 × 5 contractions and a certain quadratic polynomial. In [11] Lewis and Wermer, noting that the known counterexamples were nilpotent, asked whether generic counterexamples


could be constructed. The natural approach was to try to perturb the Kaijser–Varopoulos examples to commuting generic matrices (which could then be harmlessly renormalized as contractions). This was successfully done by Holbrook and Omladič (see [11, p. 276]) and by Lotto and Steger [13], using various ad hoc techniques.

Such ad hoc arguments can be replaced by the general perturbation results of Section 5. The mere existence of the 5 × 5 counterexamples of Kaijser–Varopoulos implies that there are generic counterexamples, because of Theorem 5.4. A more recent development (see [9]) is the discovery of 4 × 4 commuting triples (initially nilpotent) that violate (6.1). This makes it even easier to produce generic counterexamples, because the perturbation techniques for 4 × 4 triples are more elementary. In [9] the special features of the nilpotent 4 × 4 examples are exploited in applying our perturbation techniques to obtain quite simple and explicit generic counterexamples.
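The classical m = 1 case of (6.1), von Neumann's inequality, is easy to probe numerically. The sketch below is our own sanity check, not an argument from the paper: by the maximum principle the right-hand side of (6.1) is attained on the unit circle, which we approximate by dense sampling.

```python
import numpy as np

# Numerical sanity check (our own) of the m = 1 case of (6.1):
# ||p(C)|| <= max_{|z| <= 1} |p(z)| for any contraction C.
rng = np.random.default_rng(0)

def as_contraction(M):
    """Rescale a matrix so that its operator (spectral) norm is at most 1."""
    return M / max(1.0, np.linalg.norm(M, 2))

def p_mat(X):          # the sample polynomial p(z) = z^2 + 0.5 z, on matrices
    return X @ X + 0.5 * X

def p_sc(z):           # the same polynomial on (arrays of) scalars
    return z**2 + 0.5 * z

C = as_contraction(rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5)))
lhs = np.linalg.norm(p_mat(C), 2)                       # ||p(C)||

# Sample |p| on the unit circle; z = 1 (where |p| peaks for this p) is a sample point.
circle = np.exp(2j * np.pi * np.arange(2000) / 2000)
rhs = np.abs(p_sc(circle)).max()

print(lhs <= rhs + 1e-9)   # True: von Neumann's inequality holds
```

For m ≥ 3 the analogous check fails for the Kaijser–Varopoulos triples, which is exactly the point of the counterexamples discussed above.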

Acknowledgement

We are grateful for the support of the Ministry of Science and Technology of Slovenia and of NSERC of Canada. Over the last few years, this support has helped to provide several opportunities for collaboration.

References

[1] J. Barría, P. Halmos, Vector spaces for two commuting matrices, Linear and Multilinear Algebra 27 (1990) 147–157.

[2] D. Cox, J. Little, D. O'Shea, Ideals, Varieties, and Algorithms, Springer, Berlin, 1992.

[3] K. Davidson, J. Holbrook, Numerical radii of zero-one matrices, Michigan Math. J. 35 (1988) 261–267.

[4] M. Gerstenhaber, On dominance and varieties of commuting matrices, Annals Math. 73 (1961) 324–348.

[5] R. Guralnick, A note on commuting pairs of matrices, Linear and Multilinear Algebra 31 (1992) 71–75.

[6] R. Guralnick, B. Sethuraman, Commuting pairs and triples of matrices and related varieties, Linear Algebra Appl. 310 (2000) 139–148.

[7] J. Holbrook, Polynomials in a matrix and its commutant, Linear Algebra Appl. 48 (1982) 293–301.

[8] J. Holbrook, Inequalities of von Neumann type for small matrices, in: K. Jarosz (Ed.), Function Spaces, Marcel Dekker, New York, 1992, pp. 189–193.

[9] J. Holbrook, Schur norms and the multivariate von Neumann inequality, preprint, 1996; a version is to appear in Operator Theory: Advances and Applications.

[10] T. Laffey, S. Lazarus, Two-generated commutative matrix subalgebras, Linear Algebra Appl. 147 (1991) 249–273.

[11] K. Lewis, J. Wermer, On the theorems of Pick and von Neumann, in: K. Jarosz (Ed.), Function Spaces, Marcel Dekker, New York, 1992, pp. 273–280.

[12] B. Lotto, Von Neumann's inequality for commuting diagonalizable contractions I, Proc. Amer. Math. Soc. 120 (1994) 889–895.

[13] B. Lotto, T. Steger, Von Neumann's inequality for commuting diagonalizable contractions II, Proc. Amer. Math. Soc. 120 (1994) 897–901.

[14] T. Motzkin, O. Taussky, Pairs of matrices with property L II, Trans. Amer. Math. Soc. 80 (1955) 387–401.

[15] V. Müller, The numerical radius of a commuting product, Michigan Math. J. 35 (1988) 255–260.

[16] M. Neubauer, D. Saltman, Two-generated commutative subalgebras of M_n(F), J. Algebra 164 (1994) 545–562.

[17] M. Neubauer, B. Sethuraman, Commuting pairs in the centralizers of 2-regular matrices, J. Algebra 214 (1999) 174–181.