Between-Group Metrics

DOI: 10.1007/s00357-011-9090-z

John C. Gower

The Open University, United Kingdom

Casper J. Albers

University of Groningen, The Netherlands

Abstract: In canonical analysis with more variables than samples, it is shown that, as well as the usual canonical means in the range space of the within-groups dispersion matrix, canonical means may be defined in its null space. In the range space we have the usual Mahalanobis metric; in the null space explicit expressions are given and interpreted for a new metric.

Keywords: Between-group distances; Canonical analysis; Mahalanobis distance.

1. Introduction

In Canonical Variate Analysis, measurements on each of $p$ variables for $n$ samples are distributed among $k$ groups of sizes $n_1 + n_2 + \dots + n_k = n$. These measurements are available in an $n \times p$ matrix $X$, assumed column-centered, and therefore of rank at most $\min(n-1, p)$, with group membership given in an $n \times k$ indicator matrix $G$. Here, $g_{ij} = 1$ when the $i$th sample belongs to the $j$th group but otherwise $G$ is zero. Thus $G\mathbf{1} = \mathbf{1}$ and $\mathbf{1}'G = \mathbf{1}'N$, where $N = \mathrm{diag}(n_1, n_2, \dots, n_k) = G'G$; we also write ${}_nH_n = {}_nG_k N^{-1}\,{}_kG'_n$ (idempotent), where the subscripts give the dimensions of each matrix. With this notation, the usual between- and within-group orthogonal decomposition:

Authors' Addresses: J.C. Gower, Department of Mathematics and Statistics, The Open University, Walton Hall, Milton Keynes MK7 6AA, United Kingdom; C.J. Albers, Department of Psychometrics and Statistics, University of Groningen, Grote Kruisstraat 2/1, 9712 TS Groningen, The Netherlands.

Published online 4 October 2011 in Journal of Classification 28:315-326 (2011).

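$$ {}_nX_p = {}_nG_k N^{-1}\,{}_kG'_n\,{}_nX_p + \left[I - {}_nG_k N^{-1}\,{}_kG'_n\right]{}_nX_p = HX + (I-H)X $$

has an associated analysis of variance

$$ \underbrace{{}_pX'_n X_p}_{T} = \underbrace{{}_pX'_n H_n X_p}_{B} + \underbrace{{}_pX'_n (I-H)_n X_p}_{W}, $$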

expressing that the Total sum-of-squares ($T$) is the sum of the Between-Group sum-of-squares ($B$) and the Within-Group sum-of-squares ($W$). Note that the $n$ rows of $HX$ repeat the $k$ different means $n_1, n_2, \dots, n_k$ times; to get each mean only once, we require $N^{-1}G'X$, which we write as $\bar{X}$.
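As an informal numerical check of the decomposition above, the following Python/NumPy sketch builds an indicator matrix for three groups and confirms that $H$ is idempotent and that $T = B + W$. The group sizes, the simulated data and all variable names are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [3, 4, 5]                        # assumed group sizes n_1, n_2, n_3
k, n, p = len(sizes), sum(sizes), 6
labels = np.repeat(np.arange(k), sizes)  # group label of each sample

G = np.eye(k)[labels]                    # n x k indicator matrix
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)                      # column-centre X

N = G.T @ G                              # diag(n_1, ..., n_k)
H = G @ np.linalg.inv(N) @ G.T           # projector onto the group means
T = X.T @ X                              # total sum-of-squares
B = X.T @ H @ X                          # between-group part
W = X.T @ (np.eye(n) - H) @ X            # within-group part
Xbar = np.linalg.inv(N) @ G.T @ X        # k x p matrix of group means

print(np.allclose(H @ H, H), np.allclose(T, B + W))
```

The rows of `H @ X` repeat the rows of `Xbar` according to group membership, which is the sense in which $HX$ holds the means "in repeated form".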

In classical canonical variate analysis, the spectral decomposition $W = U\Sigma^2U'$ underpins the transformation to canonical variables $XL$, where $L = U\Sigma^{-1}$. These define canonical means $HXL$ with inner products $(HXL)(HXL)' = HXW^{-1}X'H$ that use the metric $LL' = W^{-1}$ to generate Mahalanobis distances between the canonical means; note that $L'WL = I$. The rank of the canonical means is $k-1$ (or less) but they may be approximated in a smaller space by using a conventional principal components analysis of $HXL$. The two steps, (i) define a metric, followed by (ii) a principal components analysis, are usually subsumed into a single two-sided eigenvalue calculation, but the two-step process is better for understanding the following.

The above requires that $W$ has full rank $p$. The case when $p > n$ is increasingly important, where much of the interest is in overcoming computational difficulties, perhaps reducing the number of variables by identifying and rejecting those deemed irrelevant or by focussing on some form of functional multivariate analysis (see e.g. Krzanowski, Jonathan, McCarthy, and Thomas 1995; Mertens 1998). Here, we explore a novel structural property of canonical analysis that occurs when $p > n$. When $p > n$, then $\operatorname{rank} W = n - k$ and $\operatorname{rank} T = n - 1$, and $W$ does not have an ordinary inverse, so the Mahalanobis metric is undefined. This need not be a major problem, because we may define the spectral decomposition

$$ W = (U_1, U_0)\begin{pmatrix} \Sigma^2 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} U_1' \\ U_0' \end{pmatrix}, $$

where $\Sigma^2$ is a diagonal matrix of the non-zero eigenvalues of $W$, $U_1$ are the eigenvectors in the range space of $W$ and $U_0$ those in its null space. Then, we may define canonical means $HXL$ where now $L = U_1\Sigma^{-1}$ in the range space. No longer is $L'WL = I$ but rather $L'WL = I_{n-k}$. Then $(L'WL)(L'WL) = I_{n-k} = L'WL$. We may write this:

$$ \begin{pmatrix} L' \\ U_0' \end{pmatrix} W LL' W\,(L, U_0) = \begin{pmatrix} L' \\ U_0' \end{pmatrix} W\,(L, U_0), $$

which, because $(L, U_0)$ is non-singular, gives $W(LL')W = W$, showing that the metric $LL'$ is now a generalized inverse, rather than an inverse, of $W$. With this minor change, we may proceed as before with a principal components analysis. The range space solution is a natural extension of classical canonical variate analysis and is reasonably well known (see e.g. Krzanowski et al. 1995; Rao and Yanai 1979). Interestingly, canonical means may also be defined in the null space, and it is this solution that is the focus of interest in the following. This solution is less well known, though it was noted by Krzanowski et al. (1995) and termed by them “zero variance discrimination”. The null space solution follows from noting that the null vectors satisfy:

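$$ X'(I-H)XU_0 = 0, $$

and so, because $I - H$ is symmetric and idempotent (so that $U_0'X'(I-H)XU_0 = 0$ forces $(I-H)XU_0 = 0$),

$$ XU_0 = HXU_0. \tag{1} $$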

Note that the $k$ different means are repeated $n_1, n_2, \dots, n_k$ times in the $n$ rows of both $XU_0$ and, equivalently, $HXU_0$. Being null vectors of $W$, the canonical variables $XU_0$ have zero variability within groups, but the corresponding canonical means $HXU_0$ have non-zero sums-of-squares. Evidently, the computation of $HXU_0$ is straightforward, as is any subsequent principal components analysis; see e.g. Albers and Gower (2010), where a practical application of the null space solution may be found. For a fuller understanding it is interesting to ask what functional form, analogous to Mahalanobis distance in the range space, is taken by the distance $d_{ij}$ between the $i$th and $j$th canonical means in the null space of $W$. This is our main objective below, but first we have to address a minor but troublesome technical matter.
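The two structural facts used so far, that $LL'$ is a generalized inverse of $W$ and that null vectors of $W$ satisfy (1), can be checked numerically. The sketch below is only an illustration under assumed simulated data with $p > n$; the tolerance used to separate zero from non-zero eigenvalues is also an assumption. It works with the original $p$ columns of $X$, so `U0` here still contains the common null space.

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = [3, 4, 5]
k, n, p = len(sizes), sum(sizes), 30     # p > n, so W is singular
labels = np.repeat(np.arange(k), sizes)
G = np.eye(k)[labels]
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
H = G @ np.linalg.inv(G.T @ G) @ G.T
W = X.T @ (np.eye(n) - H) @ X

evals, evecs = np.linalg.eigh(W)
pos = evals > 1e-10 * evals.max()        # numerically non-zero eigenvalues
U1, U0 = evecs[:, pos], evecs[:, ~pos]   # range-space and null-space vectors
L = U1 / np.sqrt(evals[pos])             # L = U_1 Sigma^{-1}

rank_W = int(pos.sum())                                # expected: n - k
ginv_ok = np.allclose(W @ L @ L.T @ W, W)              # W (LL') W = W
null_ok = np.allclose(X @ U0, H @ X @ U0, atol=1e-8)   # equation (1)
```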

The total dispersion $T = X'X$ has rank $n-1$, so implying an extensive null space of rank $p-n+1$; this null space is also common to the null spaces of $B$ and $W$. This common null space is uninteresting; we are concerned only with the additional null spaces of $W$ and $B$ that are in the range space of $T$, especially the intersection of the range space of $T$ and the null space of $W$, which normally has dimension/rank $k-1$. To simplify the following development we assume that the common null space has been eliminated by taking the spectral decomposition $T = V\Lambda V'$ and redefining $X$ as $XV$. Throughout the following, we assume that $X$ has been so redefined, so that now $X'X = \Lambda$.
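A minimal sketch of this initialization follows. It assumes that $V$ is taken as the $p \times (n-1)$ matrix of eigenvectors of $T$ belonging to its non-zero eigenvalues (our reading of the text); the data and the numerical tolerance are likewise assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
sizes = [3, 4, 5]
k, n, p = len(sizes), sum(sizes), 30
labels = np.repeat(np.arange(k), sizes)
G = np.eye(k)[labels]
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)

lam, V = np.linalg.eigh(X.T @ X)         # T = V Lambda V'
keep = lam > 1e-10 * lam.max()           # drop the common null space
lam, V = lam[keep], V[:, keep]
Xr = X @ V                               # redefined X, with n - 1 columns

H = G @ np.linalg.inv(G.T @ G) @ G.T
Wr = Xr.T @ (np.eye(n) - H) @ Xr         # W after the redefinition
w = np.linalg.eigvalsh(Wr)
nullity = int((w < 1e-10 * w.max()).sum())   # expected: k - 1
```

After the redefinition the inner products between samples are unchanged, and the only null space left in $W$ is the $(k-1)$-dimensional intersection space.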

This initialization, which gives $X$ with $n-1$ columns, eliminates the common null space from the dispersion matrices $T$, $B$ and $W$. However, it does not remove null items from $X$ itself. Indeed the vector $\mathbf{1}$, which eliminates the general mean, is one such null vector and is what gives rise to the rather extensive algebraic manipulations required in the following. Any null linear combinations among the rows of $X$ would generate additional null vectors in the common null space. The position is complicated, because such linear combinations may be of two, not mutually exclusive, kinds: (i) linear combinations within groups and (ii) linear combinations among the group means. Loss of rank within groups merely reduces the number of columns of the redefined $X$, but to handle all variants that include (ii) is not trivial and would greatly extend this short paper. Therefore, apart from some passing concluding remarks, throughout the following we assume that $\operatorname{rank} \bar{X} = k-1$ and that $\operatorname{rank} X = n-1$.

2. Derivation of $d_{ij}^2$

From here on we shall be working in the null space of $W$, so we drop the suffix from $U_0$. Starting from $XU = HXU$ for the null vectors of $W$, as in (1), we have that

$$ {}_nX_{n-1}U_{k-1} = G(G'G)^{-1}G'XU = {}_nG_kA_{k-1}, \tag{2} $$

where ${}_kA_{k-1} = \bar{X}U$ are the $k$ group-mean coordinates given in repeated form in $XU$.

Thus, the rows of $A = (G'G)^{-1}G'XU$ give coordinates that generate the distance $d_{ij}$ between each pair of group-mean coordinates. To derive an explicit expression for $d_{ij}^2$, we do not require $A$ itself, which has the usual rotational indeterminacy, but only $AA'$. Then, $d_{ij}^2 = (AA')_{ii} + (AA')_{jj} - 2(AA')_{ij}$.
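The computation of $A$ and of the distances from $AA'$ alone can be sketched as follows. The helper `intersection_coords` is a hypothetical name introduced only for this illustration, and the simulated data, seeds and tolerances are assumptions.

```python
import numpy as np

def intersection_coords(sizes, p, seed=0):
    """Simulate data and return (A, G, X, U) in the notation of this section."""
    rng = np.random.default_rng(seed)
    k, n = len(sizes), sum(sizes)
    labels = np.repeat(np.arange(k), sizes)
    G = np.eye(k)[labels]
    X = rng.normal(size=(n, p)) + rng.normal(size=(k, p))[labels]
    X -= X.mean(axis=0)
    lam, V = np.linalg.eigh(X.T @ X)
    X = X @ V[:, lam > 1e-10 * lam.max()]        # now X'X = Lambda
    H = G @ np.linalg.inv(G.T @ G) @ G.T
    w, E = np.linalg.eigh(X.T @ (np.eye(n) - H) @ X)
    U = E[:, w < 1e-10 * w.max()]                # null vectors of W
    A = np.linalg.inv(G.T @ G) @ G.T @ X @ U     # A = Xbar U
    return A, G, X, U

A, G, X, U = intersection_coords([3, 4, 5, 6], p=40)
AAt = A @ A.T
g = np.diag(AAt)
D2 = g[:, None] + g[None, :] - 2 * AAt           # d_ij^2 from AA' only

# Rotating the null-space basis changes A but leaves AA' (and D2) unchanged.
Rot, _ = np.linalg.qr(np.random.default_rng(9).normal(size=(3, 3)))
A_rot = A @ Rot
```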

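From (2) we have

$$ U'U = \left((X'X)^{-1}X'GA\right)'\left((X'X)^{-1}X'GA\right) = A'PA \quad \text{(say)}, $$

where $P = G'X\Lambda^{-2}X'G$ with $X'X = \Lambda$. So from the orthogonality of $U$, our problem is to solve $A'PA = I$ for $AA'$. The solution would be trivial if $AA'$ and $P$ were not singular, but $\mathbf{1}'NA = \mathbf{1}'XU = 0$ and $\mathbf{1}'P = \mathbf{1}'G'X\Lambda^{-2}X'G = \mathbf{1}'X\Lambda^{-2}X'G = 0$, showing that both $AA'$ and $P$ are singular (with rank $k-1$). To proceed, consider

$$ \begin{pmatrix} A'N \\ \frac{1}{n}\mathbf{1}'N \end{pmatrix} (Q + \lambda\mathbf{1}\mathbf{1}') \left(NA,\ \tfrac{1}{n}N\mathbf{1}\right), \tag{3} $$

where $Q = N^{-1}PN^{-1} = \bar{X}\Lambda^{-2}\bar{X}'$, so that $\mathbf{1}'NQ = \mathbf{1}'PN^{-1} = 0$. The introduction of $\lambda$ may seem arbitrary but we shall show that it has no substantive effect. On expansion, (3) becomes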

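$$ \begin{pmatrix} I & 0 \\ 0 & \lambda \end{pmatrix}, $$

giving:

$$ Q + \lambda\mathbf{1}\mathbf{1}' = \begin{pmatrix} A'N \\ \frac{1}{n}\mathbf{1}'N \end{pmatrix}^{-1} \begin{pmatrix} I & 0 \\ 0 & \lambda \end{pmatrix} \left(NA,\ \tfrac{1}{n}N\mathbf{1}\right)^{-1} $$

and

$$ (Q + \lambda\mathbf{1}\mathbf{1}')^{-1} = \left(NA,\ \tfrac{1}{n}N\mathbf{1}\right) \begin{pmatrix} I & 0 \\ 0 & \frac{1}{\lambda} \end{pmatrix} \begin{pmatrix} A'N \\ \frac{1}{n}\mathbf{1}'N \end{pmatrix}. $$

Thus,

$$ N^{-1}(Q + \lambda\mathbf{1}\mathbf{1}')^{-1}N^{-1} = AA' + \mathbf{1}\mathbf{1}'/\lambda n^2. \tag{4} $$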

From (4) we may calculate $d_{ij}^2$. The constant term $\mathbf{1}\mathbf{1}'/\lambda n^2$ has no effect on derived distances, and we shall show that the distances derived from $(Q + \lambda\mathbf{1}\mathbf{1}')^{-1}$ are also invariant to non-zero choices of $\lambda$. Thus (4) contains everything needed for finding $AA'$, but the evaluation of $(Q + \lambda\mathbf{1}\mathbf{1}')^{-1}$ needs some care, because $\mathbf{1}'NQ = \mathbf{1}'PN^{-1} = 0$ shows that $Q$ is singular and the equivalent of (A.4) is unavailable.
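Identity (4) lends itself to a direct numerical check. The sketch below, under assumed simulated data and for three arbitrarily chosen non-zero values of $\lambda$ (called `ell` in the code), compares the two sides of (4); it is an illustration only, not the derivation itself.

```python
import numpy as np

rng = np.random.default_rng(3)
sizes = np.array([3, 4, 5, 6])
k, n, p = len(sizes), int(sizes.sum()), 40
labels = np.repeat(np.arange(k), sizes)
G = np.eye(k)[labels]
X = rng.normal(size=(n, p)) + rng.normal(size=(k, p))[labels]
X -= X.mean(axis=0)
lam, V = np.linalg.eigh(X.T @ X)
keep = lam > 1e-10 * lam.max()
lam, X = lam[keep], X @ V[:, keep]               # X'X = Lambda = diag(lam)

Ninv = np.diag(1.0 / sizes)
H = G @ Ninv @ G.T
w, E = np.linalg.eigh(X.T @ (np.eye(n) - H) @ X)
U = E[:, w < 1e-10 * w.max()]                    # null vectors of W
Xbar = Ninv @ G.T @ X
A = Xbar @ U
Q = (Xbar / lam ** 2) @ Xbar.T                   # Q = Xbar Lambda^{-2} Xbar'
J = np.ones((k, k))                              # the matrix 11'

def max_abs_gap(ell):
    """Largest absolute difference between the two sides of (4)."""
    lhs = Ninv @ np.linalg.inv(Q + ell * J) @ Ninv
    rhs = A @ A.T + J / (ell * n ** 2)
    return np.abs(lhs - rhs).max()

gaps = [max_abs_gap(ell) for ell in (0.1, 1.0, 7.5)]
```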

The technical details for inverting $Q + \lambda\mathbf{1}\mathbf{1}'$ are rather extensive and have been exiled to Appendix B, where (B.7) gives our main result

$$ d_{12}^2 = \frac{(n_1+n_2)^2}{n_1^2 n_2^2}\left[(\bar{x}_1-\bar{x}_2)'\Lambda^{-2}(\bar{x}_1-\bar{x}_2) - (q_1-q_2)'Q_{12}^{-1}(q_1-q_2)\right]^{-1}, \tag{5} $$

where $Q_{12}$ is the matrix formed from $Q$ by striking out its first two rows and columns, and $q_1$ and $q_2$ are vectors of the final $k-2$ elements in the first two columns of $Q$. We give the result for the first two groups, but clearly it generalizes trivially to give the distance $d_{ij}$ between groups $i$ and $j$.
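As a sanity check on (5), the following sketch compares the closed form, applied to every pair of groups, with distances computed directly from the null-space coordinates $A = \bar{X}U$. The function name `d2_formula`, the simulated data and the tolerances are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
sizes = np.array([3, 4, 5, 6])
k, n, p = len(sizes), int(sizes.sum()), 40
labels = np.repeat(np.arange(k), sizes)
G = np.eye(k)[labels]
X = rng.normal(size=(n, p)) + rng.normal(size=(k, p))[labels]
X -= X.mean(axis=0)
lam, V = np.linalg.eigh(X.T @ X)
keep = lam > 1e-10 * lam.max()
lam, X = lam[keep], X @ V[:, keep]               # X'X = Lambda = diag(lam)

Ninv = np.diag(1.0 / sizes)
H = G @ Ninv @ G.T
w, E = np.linalg.eigh(X.T @ (np.eye(n) - H) @ X)
U = E[:, w < 1e-10 * w.max()]                    # null vectors of W
Xbar = Ninv @ G.T @ X
A = Xbar @ U
Q = (Xbar / lam ** 2) @ Xbar.T                   # Q = Xbar Lambda^{-2} Xbar'

def d2_formula(i, j):
    """Squared distance between groups i and j from expression (5)."""
    rest = [m for m in range(k) if m not in (i, j)]
    Q12 = Q[np.ix_(rest, rest)]                  # strike out rows/columns i, j
    dq = Q[rest, i] - Q[rest, j]                 # q_i - q_j
    dx = Xbar[i] - Xbar[j]
    bracket = dx @ (dx / lam ** 2) - dq @ np.linalg.solve(Q12, dq)
    ni, nj = sizes[i], sizes[j]
    return (ni + nj) ** 2 / (ni ** 2 * nj ** 2 * bracket)

pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
formula = np.array([d2_formula(i, j) for i, j in pairs])
direct = np.array([np.sum((A[i] - A[j]) ** 2) for i, j in pairs])
```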

3. Interpretation

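When $k = 2$ the vectors $q_1$ and $q_2$ of (5) do not exist and $C$, as defined in Appendix B, becomes

$$ C = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}, $$

and its inverse is immediate, replacing (5) by

$$ d_{12}^{-2} = \frac{n_1^2 n_2^2}{(n_1+n_2)^2}\,(\bar{x}_1-\bar{x}_2)'\Lambda^{-2}(\bar{x}_1-\bar{x}_2). \tag{6} $$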

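We next examine the part of expression (5) that is enclosed in square brackets. We have

$$ \bar{X} = \begin{pmatrix} \bar{x}_1' \\ \bar{x}_2' \\ \bar{X}_{12} \end{pmatrix} \quad\text{and}\quad \left.\begin{aligned} (q_1-q_2)' &= (\bar{x}_1-\bar{x}_2)'\Lambda^{-2}\bar{X}_{12}' \\ Q_{12} &= \bar{X}_{12}\Lambda^{-2}\bar{X}_{12}' \end{aligned}\right\}. $$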

Figure 1. $R$ is the space spanned by the means $\bar{x}_3, \dots, \bar{x}_k$. Apart from the factor $n_1n_2/(n_1+n_2)$, the inverse of the distance between groups 1 and 2 in the intersection space is given by the illustrated projection onto the space orthogonal to $R$.

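Hence,

$$ d_{12}^{-2} = \frac{n_1^2 n_2^2}{(n_1+n_2)^2}\left[(\bar{x}_1-\bar{x}_2)'\Lambda^{-2}(\bar{x}_1-\bar{x}_2) - (\bar{x}_1-\bar{x}_2)'\Lambda^{-1}R\Lambda^{-1}(\bar{x}_1-\bar{x}_2)\right], $$

where $R = \Lambda^{-1}\bar{X}_{12}'(\bar{X}_{12}\Lambda^{-2}\bar{X}_{12}')^{-1}\bar{X}_{12}\Lambda^{-1}$ represents orthogonal projection in the metric $\Lambda$ onto the space spanned by the $k-2$ rows of $\bar{X}_{12}\Lambda^{-1}$. Thus

$$ d_{12}^{-2} = \frac{n_1^2 n_2^2}{(n_1+n_2)^2}\left[(\bar{x}_1-\bar{x}_2)'\Lambda^{-1}(I-R)\Lambda^{-1}(\bar{x}_1-\bar{x}_2)\right]. \tag{7} $$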

Thus, expression (5), with its variants (6) and (7), gives our main results.

In the more usual range space solution, inter-group distance is a Mahalanobis distance. In the intersection space solution discussed here, $d_{12}^{-1}$, especially in the special case (6), is seen to have some of the features of a Mahalanobis distance but with $W$ replaced by $\Lambda^2 = T^2$, the square of the total variation. Of course $d_{12}^{-1}$ is not a Mahalanobis distance; formally it is not even a distance. Apart from the inverse relationship, we have $T$ occurring in squared form. This is not surprising, as a dimensional analysis shows: $R$ is dimensionless, $d_{12}$ has dimension $L$ (for length) as do $\bar{x}_1$ and $\bar{x}_2$, while $\Lambda$ has the dimensions of $T = X'X$. Together, these show that the right and left sides of (7) match correctly. The interpretation of (7), visualized in Figure 1, is that $d_{12}^{-1}$ is a measure of how far the space of $\bar{x}_1, \bar{x}_2$ is from the space spanned by $\bar{x}_3, \dots, \bar{x}_k$.
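The equivalence between the bracketed term of (5) and the projection form (7) can be illustrated numerically. The sketch below, on assumed simulated data, builds $R$ explicitly, confirms that it behaves as an orthogonal projector of rank $k-2$, and compares the two bracketed quantities; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
sizes = np.array([3, 4, 5, 6])
k, n, p = len(sizes), int(sizes.sum()), 40
labels = np.repeat(np.arange(k), sizes)
X = rng.normal(size=(n, p)) + rng.normal(size=(k, p))[labels]
X -= X.mean(axis=0)
lam, V = np.linalg.eigh(X.T @ X)
keep = lam > 1e-10 * lam.max()
lam, X = lam[keep], X @ V[:, keep]               # X'X = Lambda = diag(lam)
Xbar = np.vstack([X[labels == g].mean(axis=0) for g in range(k)])

Li = np.diag(1.0 / lam)                          # Lambda^{-1}
dx = Xbar[0] - Xbar[1]                           # xbar_1 - xbar_2
X12 = Xbar[2:]                                   # means of groups 3, ..., k
Q12 = X12 @ Li @ Li @ X12.T
dq = X12 @ Li @ Li @ dx                          # q_1 - q_2

R = Li @ X12.T @ np.linalg.solve(Q12, X12 @ Li)  # projector appearing in (7)
bracket5 = dx @ Li @ Li @ dx - dq @ np.linalg.solve(Q12, dq)
bracket7 = dx @ Li @ (np.eye(lam.size) - R) @ Li @ dx
```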

If $\bar{x}_1$ and $\bar{x}_2$ lie in $R$ then (7) gives a null projection onto $R^{\perp}$, but this case is excluded by our assumption that $\operatorname{rank} \bar{X} = k-1$. An indication of the corresponding results when $\operatorname{rank} \bar{X} < k-1$ is given by the collinearity case $k = 3$ and $\operatorname{rank} \bar{X} = 1$. Then, without giving the detailed derivation, it can be shown that $d_{12}^{-2}$ is proportional to $(\bar{x}_1-\bar{x}_2)'\Lambda^{-2}(\bar{x}_1-\bar{x}_2)$. That the projection term vanishes is consistent and plausible, but we do not know whether it generalizes to other reduced rank cases.

4. Discussion

A referee remarked that the intersection space (i.e. the intersection of the range space of $T$ with the null space of $W$) is not so interesting for canonical analysis. Perhaps utility rather than interest was behind the remark; in a statistical context, something which is not useful might be regarded as uninteresting. Whatever was intended, it is true that the linked concepts of interest and utility deserve consideration.

We believe that the existence of the intersection space makes it interesting and its properties worthy of investigation. Such a study is an essential prerequisite for assessing utility. Our explicit formulae for $d_{ij}^2$ are interesting, with interesting interpretations. We apologize for the fairly heavy algebra used; the elegance of the results suggests that there ought to be a neater derivation. We have seen that $d_{ij}^{-2}$ (note the inverse) has some resemblance to Mahalanobis distance, using the total variation $T$ rather than within-group variation $W$. Unlike the Mahalanobis distance this quantity is not dimensionless, indicating the desirability of prior normalizations. As there is no within-group variation, there is complete separation between the groups. Is this separation meaningful? It might be thought that the distance depended on the vagaries of the number $p$ of variables and that just by increasing $p$, different intersection space distances would be found. But the size $k-1$ of the intersection space remains unchanged and perhaps other properties of the intersection space remain invariant. Krzanowski et al. (1995) compared discrimination as effected by a variety of methods, mainly based on the range space of $W$ but also using the intersection space. They found that the intersection space (Zero-variance Discrimination in their terminology) could behave well. Our own preliminary numerical results also suggest that distance in the intersection space is worth considering (Albers and Gower 2010). We suggest that performance depends on whether $W$ contains variables with very small relative normalized variation. Such variables will be good discriminators and in the limit will become part of the intersection space. A variable with truly zero within-group variation will be an ideal discriminator, provided the corresponding between-group variation is not itself null.

To sum up, the intersection space is interesting and there is evidence that it could be useful. Research is now needed on invariance properties of the intersection space.

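Appendix A. Basic Formulae

This appendix contains some basic results for convenience of reference. Most are well known and, apart from (A.1) and (A.2), no derivations are supplied.

Using $\mathbf{1}'NQ = 0$ shows that

$$ n_1q_{11} + n_2q_{12} + \mathbf{1}'Mq_1 = 0, $$

$$ n_1q_{12} + n_2q_{22} + \mathbf{1}'Mq_2 = 0, $$

$$ n_1q_1 + n_2q_2 + Q_{12}M\mathbf{1} = 0, $$

where $N = \mathrm{diag}(n_1, n_2, M)$. It follows that

$$ n_1^2q_{11} + n_2^2q_{22} + 2n_1n_2q_{12} = -\mathbf{1}'M(n_1q_1 + n_2q_2) = \mathbf{1}'MQ_{12}M\mathbf{1}. \tag{A.1} $$

Similarly,

$$ \begin{aligned} n_1c_1 + n_2c_2 &= n_1q_1 + n_2q_2 + \lambda(n_1+n_2)\mathbf{1} \\ &= -Q_{12}M\mathbf{1} + \lambda(n_1+n_2)\mathbf{1} \\ &= -C_{12}M\mathbf{1} + \lambda n\mathbf{1}. \end{aligned} \tag{A.2} $$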

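Furthermore, we need

$$ \det\begin{pmatrix} \alpha & a' \\ a & A \end{pmatrix} = \left(\alpha - a'A^{-1}a\right)\det A, \tag{A.3} $$

$$ \det C_{12} = \det(Q_{12} + \lambda\mathbf{1}\mathbf{1}') = \det Q_{12}\left(1 + \lambda\mathbf{1}'Q_{12}^{-1}\mathbf{1}\right), \tag{A.4} $$

and

$$ C_{12}^{-1} = (Q_{12} + \lambda\mathbf{1}\mathbf{1}')^{-1} = Q_{12}^{-1} - \frac{\lambda Q_{12}^{-1}\mathbf{1}\mathbf{1}'Q_{12}^{-1}}{1 + \lambda\mathbf{1}'Q_{12}^{-1}\mathbf{1}}. \tag{A.5} $$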
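Formulae (A.3) to (A.5) are standard results on bordered determinants and rank-one updates, and may be spot-checked numerically. In the sketch below a random symmetric positive definite matrix stands in for $Q_{12}$; that stand-in, and the values chosen for $\alpha$, $a$ and $\lambda$, are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
m = 4
F = rng.normal(size=(m, m))
Q12 = F @ F.T + m * np.eye(m)            # positive definite stand-in for Q_12
one = np.ones(m)
lam_, alpha = 0.7, 2.5                   # arbitrary lambda and alpha
a = rng.normal(size=m)
Qi = np.linalg.inv(Q12)
t = one @ Qi @ one                       # 1' Q_12^{-1} 1

# (A.3): determinant of a bordered matrix.
bordered = np.block([[np.array([[alpha]]), a[None, :]],
                     [a[:, None], Q12]])
lhs_A3 = np.linalg.det(bordered)
rhs_A3 = (alpha - a @ Qi @ a) * np.linalg.det(Q12)

# (A.4): determinant after a rank-one update.
C12 = Q12 + lam_ * np.outer(one, one)
lhs_A4 = np.linalg.det(C12)
rhs_A4 = np.linalg.det(Q12) * (1 + lam_ * t)

# (A.5): inverse after a rank-one update.
lhs_A5 = np.linalg.inv(C12)
rhs_A5 = Qi - lam_ * np.outer(Qi @ one, one @ Qi) / (1 + lam_ * t)
```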

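Appendix B. The Inverse of $Q + \lambda\mathbf{1}\mathbf{1}'$ and the Derivation of Explicit Formulae for $d_{ij}^2$

For simplicity, we derive $d_{12}^2$, the other values of $d_{ij}^2$ following by symmetry. Notation is established in the following partitioning of the matrix $C = Q + \lambda\mathbf{1}\mathbf{1}'$, written explicitly as

$$ C = \begin{pmatrix} c_{11} & c_{12} & c_1' \\ c_{12} & c_{22} & c_2' \\ c_1 & c_2 & C_{12} \end{pmatrix}, $$

where

$$ \begin{cases} C = Q + \lambda\mathbf{1}\mathbf{1}', \\ C_{12} = Q_{12} + \lambda\mathbf{1}\mathbf{1}', \\ c_1 = q_1 + \lambda\mathbf{1}, \\ c_2 = q_2 + \lambda\mathbf{1}, \\ (c_{11}, c_{12}, c_{22}) = (q_{11}, q_{12}, q_{22}) + \lambda. \end{cases} $$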

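From (4) we obtain

$$ \Delta d_{12}^2 = \frac{1}{n_1^2}\det\begin{pmatrix} c_{22} & c_2' \\ c_2 & C_{12} \end{pmatrix} + \frac{1}{n_2^2}\det\begin{pmatrix} c_{11} & c_1' \\ c_1 & C_{12} \end{pmatrix} + \frac{2}{n_1n_2}\det\begin{pmatrix} c_{12} & c_2' \\ c_1 & C_{12} \end{pmatrix}, \tag{B.1} $$

where $\Delta = \det C$, and the determinants are the cofactors of $c_{11}$, $c_{22}$ and $c_{12}$. Using (A.3), (B.1) becomes

$$ \Delta d_{12}^2 = \det C_{12}\left[\frac{1}{n_1^2}\left(c_{22} - c_2'C_{12}^{-1}c_2\right) + \frac{1}{n_2^2}\left(c_{11} - c_1'C_{12}^{-1}c_1\right) + \frac{2}{n_1n_2}\left(c_{12} - c_1'C_{12}^{-1}c_2\right)\right], $$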

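which simplifies to

$$ n_1^2n_2^2\Delta d_{12}^2 = \det C_{12}\left[n_1^2c_{11} + n_2^2c_{22} + 2n_1n_2c_{12} - (n_1c_1 + n_2c_2)'C_{12}^{-1}(n_1c_1 + n_2c_2)\right]. \tag{B.2} $$

Using (A.1) and (A.2) we have

$$ \begin{aligned} (n_1c_1 + n_2c_2)'C_{12}^{-1}(n_1c_1 + n_2c_2) &= (C_{12}M\mathbf{1} - \lambda n\mathbf{1})'C_{12}^{-1}(C_{12}M\mathbf{1} - \lambda n\mathbf{1}) \\ &= \mathbf{1}'MC_{12}M\mathbf{1} - 2\lambda n\mathbf{1}'M\mathbf{1} + \lambda^2n^2\mathbf{1}'C_{12}^{-1}\mathbf{1}. \end{aligned} \tag{B.3} $$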

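Bringing everything together using (B.3), (A.4) and (A.5), and expanding in terms of $q_{ij}$, (B.2) becomes

$$ \begin{aligned} n_1^2n_2^2\Delta d_{12}^2 = {} & \left(1 + \lambda\mathbf{1}'Q_{12}^{-1}\mathbf{1}\right)\det Q_{12}\Big[n_1^2q_{11} + n_2^2q_{22} + 2n_1n_2q_{12} \\ & + \lambda(n_1+n_2)^2 - \mathbf{1}'MQ_{12}M\mathbf{1} - \lambda(\mathbf{1}'M\mathbf{1})^2 + 2\lambda n\mathbf{1}'M\mathbf{1} \\ & - \lambda^2n^2\mathbf{1}'\Big(Q_{12}^{-1} - \frac{\lambda Q_{12}^{-1}\mathbf{1}\mathbf{1}'Q_{12}^{-1}}{1 + \lambda\mathbf{1}'Q_{12}^{-1}\mathbf{1}}\Big)\mathbf{1}\Big], \end{aligned} $$

which, on using (A.1), simplifies to

$$ \begin{aligned} n_1^2n_2^2\Delta d_{12}^2 &= \det Q_{12}\Big[\lambda(n_1+n_2)^2 + \lambda(n-n_1-n_2)(n+n_1+n_2) - \frac{\lambda^2n^2\mathbf{1}'Q_{12}^{-1}\mathbf{1}}{1 + \lambda\mathbf{1}'Q_{12}^{-1}\mathbf{1}}\Big]\left(1 + \lambda\mathbf{1}'Q_{12}^{-1}\mathbf{1}\right) \\ &= \det Q_{12}\left[\lambda n^2\left(1 + \lambda\mathbf{1}'Q_{12}^{-1}\mathbf{1}\right) - \lambda^2n^2\mathbf{1}'Q_{12}^{-1}\mathbf{1}\right] \\ &= \lambda n^2\det Q_{12}. \end{aligned} \tag{B.4} $$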

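This simple result shows that $d_{ij}^2$ is proportional to $\det Q_{ij}$.

To show that $d_{ij}^2$ in (B.4) is independent of $\lambda$ requires an analysis of $\Delta = \det(Q + \lambda\mathbf{1}\mathbf{1}')$. Let $R = Q + \mathbf{1}\mathbf{1}'$. Then $I = R^{-1}Q + R^{-1}\mathbf{1}\mathbf{1}'$ and so $\mathbf{1}'IN\mathbf{1} = \mathbf{1}'\left(R^{-1}Q + R^{-1}\mathbf{1}\mathbf{1}'\right)N\mathbf{1}$ and, because $QN\mathbf{1} = 0$, we have that $\mathbf{1}'N\mathbf{1} = \left(\mathbf{1}'R^{-1}\mathbf{1}\right)(\mathbf{1}'N\mathbf{1})$, showing that

$$ \mathbf{1}'R^{-1}\mathbf{1} = 1. $$

Then

$$ \begin{aligned} \Delta &= \det(R + (\lambda-1)\mathbf{1}\mathbf{1}') \\ &= \det R\left(1 + (\lambda-1)\mathbf{1}'R^{-1}\mathbf{1}\right) \\ &= \lambda\det R \\ &= \lambda\det(Q + \mathbf{1}\mathbf{1}'). \end{aligned} $$

That $\det R \neq 0$ is guaranteed by our assumption that $\operatorname{rank} \bar{X} = k-1$, made at the end of Section 1.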

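Thus, finally, (B.4) becomes

$$ d_{12}^2 = \frac{1}{\det(Q + \mathbf{1}\mathbf{1}')}\,\frac{n^2}{n_1^2n_2^2}\,\det Q_{12}, \tag{B.5} $$

showing that, apart from the common factor $\det(Q + \mathbf{1}\mathbf{1}')$, $d_{ij}^2$ depends only on the group sizes and $\det Q_{12}$. Recall that $Q = \bar{X}\Lambda^{-2}\bar{X}'$ and that $Q_{12}$ is obtained from $Q$ by striking out its first two rows and columns.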
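The steps leading to (B.5) can be illustrated numerically: that $\mathbf{1}'R^{-1}\mathbf{1} = 1$, that $\Delta = \lambda\det(Q + \mathbf{1}\mathbf{1}')$, and that (B.5) reproduces the directly computed distance between the first two groups. As before, the simulated data, the value of $\lambda$ (called `ell`) and the tolerances are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(7)
sizes = np.array([3, 4, 5, 6])
k, n, p = len(sizes), int(sizes.sum()), 40
labels = np.repeat(np.arange(k), sizes)
G = np.eye(k)[labels]
X = rng.normal(size=(n, p)) + rng.normal(size=(k, p))[labels]
X -= X.mean(axis=0)
lam, V = np.linalg.eigh(X.T @ X)
keep = lam > 1e-10 * lam.max()
lam, X = lam[keep], X @ V[:, keep]               # X'X = Lambda = diag(lam)

Ninv = np.diag(1.0 / sizes)
H = G @ Ninv @ G.T
w, E = np.linalg.eigh(X.T @ (np.eye(n) - H) @ X)
U = E[:, w < 1e-10 * w.max()]                    # null vectors of W
Xbar = Ninv @ G.T @ X
A = Xbar @ U
Q = (Xbar / lam ** 2) @ Xbar.T                   # Q = Xbar Lambda^{-2} Xbar'

one = np.ones(k)
R = Q + np.outer(one, one)                       # R = Q + 11'
r = one @ np.linalg.solve(R, one)                # 1' R^{-1} 1
ell = 2.5
Delta = np.linalg.det(Q + ell * np.outer(one, one))

n1, n2 = sizes[0], sizes[1]
d2_B5 = n ** 2 * np.linalg.det(Q[2:, 2:]) / (n1 ** 2 * n2 ** 2 * np.linalg.det(R))
d2_direct = np.sum((A[0] - A[1]) ** 2)
```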

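Alternative expressions

The expression for $\Delta$ has many forms. Of special interest is that derived from writing

$$ \Delta = \det\begin{pmatrix} c_{11} & c_{12} & c_1' \\ c_{12} & c_{22} & c_2' \\ c_1 & c_2 & C_{12} \end{pmatrix}. $$

Multiplying the first row by $n_1$ and then adding $n_i$ ($i = 2, \dots, k$) times the other rows replaces the first row by $n\lambda\mathbf{1}'$, because $\mathbf{1}'NC = \mathbf{1}'N(Q + \lambda\mathbf{1}\mathbf{1}') = n\lambda\mathbf{1}'$. This shows that $\lambda$ may be subtracted from rows $2, \dots, k$ to give

$$ n_1\Delta = \lambda n\det\begin{pmatrix} 1 & 1 & \mathbf{1}' \\ q_{12} & q_{22} & q_2' \\ q_1 & q_2 & Q_{12} \end{pmatrix}, $$

whence similar operations on the columns give

$$ n_1^2\Delta = \lambda n^2\det\begin{pmatrix} 1 & 1 & \mathbf{1}' \\ 0 & q_{22} & q_2' \\ 0 & q_2 & Q_{12} \end{pmatrix} = \lambda n^2\left(q_{22} - q_2'Q_{12}^{-1}q_2\right)\det Q_{12}. $$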

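Similar expressions may be derived by annihilating the second row/column, and the first row and second column, to give

$$ \left.\begin{aligned} n_2^2\Delta &= \lambda n^2\left(q_{11} - q_1'Q_{12}^{-1}q_1\right)\det Q_{12} \\ n_1^2\Delta &= \lambda n^2\left(q_{22} - q_2'Q_{12}^{-1}q_2\right)\det Q_{12} \\ -n_1n_2\Delta &= \lambda n^2\left(q_{12} - q_1'Q_{12}^{-1}q_2\right)\det Q_{12} \end{aligned}\right\}. \tag{B.6} $$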

Combining gives the symmetric form

$$ (n_1+n_2)^2\Delta = \lambda n^2\left[(q_{11} + q_{22} - 2q_{12}) - (q_1-q_2)'Q_{12}^{-1}(q_1-q_2)\right]\det Q_{12}, $$

which, on substitution into (B.5), gives

$$ d_{12}^2 = \frac{(n_1+n_2)^2}{n_1^2n_2^2}\left[(\bar{x}_1-\bar{x}_2)'\Lambda^{-2}(\bar{x}_1-\bar{x}_2) - (q_1-q_2)'Q_{12}^{-1}(q_1-q_2)\right]^{-1}. \tag{B.7} $$

Other substitutions for $\Delta$ given by (B.6) give alternative, less symmetric, expressions for $d_{ij}^2$.

References

ALBERS, C.J., and GOWER, J.C. (2010), “Canonical Analysis: Ranks, Ratios and Fits”, available at http://www.gmw.rug.nl/~casper

KRZANOWSKI, W.J., JONATHAN, P., MCCARTHY, W.V., and THOMAS, M.R. (1995), “Discriminant Analysis with Singular Covariance Matrices: Methods and Applications to Spectroscopic Data”, Journal of the Royal Statistical Society, Series C (Applied Statistics), 44, 101–115.

MERTENS, B.J.A. (1998), “Exact Principal Components Influence Measures Applied to the Analysis of Spectroscopic Data on Rice”, Journal of the Royal Statistical Society, Series C (Applied Statistics), 47, 527–542.

RAO, C.R., and YANAI, H. (1979), “General Definition and Decomposition of Projectors and Some Applications to Statistical Problems”, Journal of Statistical Planning and Inference, 3, 1–17.
