ORIGINAL ARTICLE
doi:10.1111/j.1558-5646.2008.00587.x
THE ONTOGENETIC TRAJECTORY OF THE PHENOTYPIC COVARIANCE MATRIX, WITH EXAMPLES FROM CRANIOFACIAL SHAPE IN RATS AND HUMANS
Philipp Mitteroecker1,2,3 and Fred Bookstein4,5
1Konrad Lorenz Institute for Evolution and Cognition Research, Altenberg, Austria
2Department of Theoretical Biology, University of Vienna, Vienna, Austria
3E-mail: philipp.mitteroecker@univie.ac.at
4Department of Anthropology, University of Vienna, Vienna, Austria
5Department of Statistics, University of Washington, Seattle, Washington
Received September 4, 2008
Accepted November 5, 2008
Many classic quantitative genetic theories assume the covariance structure among adult phenotypic traits to be relatively static
during evolution. But the cross-sectional covariance matrix arises from the joint variation of a large range of developmental
processes and hence is not constant over the period during which a population of developing organisms is actually exposed
to selection. To examine how development shapes the phenotypic covariance structure, we ordinate the age-specific covariance
matrices of shape coordinates for craniofacial growth in rats and humans. The metric that we use for this purpose is given by the
square root of the summed squared log relative eigenvalues. This is the natural metric on the space of positive-definite symmetric
matrices, which we introduce and justify in a biometric context. In both species, the covariance matrices appear to change
continually throughout the full period of postnatal development. The resulting ontogenetic trajectories alter their direction at
major changes of the developmental programs whereas they are fairly straight in between. Consequently, phenotypic covariance
matrices—and thus also response to selection—should be expected to vary both over ontogenetic and phylogenetic time scales as
different phenotypes are necessarily produced by different developmental pathways.
KEY WORDS: Craniofacial growth, geometric morphometrics, morphological integration, Riemannian metric, space of covariance
matrices.
Phenotypic and genetic covariance structures among morphome-
tric traits play central roles in many quantitative models that pre-
dict response to natural or artificial selection or that reconstruct
evolutionary history. In these models, the direction and rate of
evolution are determined by the genetic covariance structure in
relation to the selective regime (Lande 1979; Cheverud 1984).
Long-term predictions depend on the assumption that the covari-
ance structure remains relatively constant during evolution. Many
publications compare covariance matrices of adult populations
across a wide taxonomic range, providing conflicting evidence
for that assumption (for reviews see, e.g., Roff 1997; Steppan
1997; Lynch and Walsh 1998; Steppan et al. 2002; McGuigan
2006).
© 2009 The Author(s). Journal compilation © 2009 The Society for the Study of Evolution. Evolution 63-3: 727–737
But the cross-sectional phenotypic covariance structure of
a population arises out of the joint variation of a large range
of developmental processes and factors with varying pleiotropy
throughout ontogeny. Whenever development varies among individuals, the population covariance matrix is subject to ontogenetic
change, and response to selection for subadult traits would differ
across age stages. When new processes emerge in the course of
development (by way of, for example, expression of new genes,
new epigenetic interactions, or environmental influences), a novel
pattern of variances and covariances is superimposed on top of the
previous population covariance structure (see, e.g., Mitteroecker
and Bookstein 2007, 2008; Hallgrimsson and Lieberman 2008).
Similarly, when developmental processes cease at some stage,
they stop contributing to the covariance structure of a population.
The timing of cessation and initiation of factors may vary itself
among individuals and contribute to the observed phenotypic vari-
ation. A careful statistical tracking of age- or stage-specific covari-
ance matrices over ontogeny hence might permit inferences about
the timing and phenotypic effects of developmental processes.
Furthermore, insights into the ontogeny of covariance structures
may lead to a refined understanding of the evolutionary behavior
of (adult) population covariance matrices.
How does one study a series of covariance matrices arising
from empirical data in this way? One standard statistical approach
to series of any complex quantitative representation—a vector of
data, an image, a matrix, or other algebraic structure—is to put a
reasonable distance function in place between all pairs of the ob-
jects and then carry out an ordination analysis, a low-dimensional
representation of the patterns of distances among the objects. The
distance (or “metric”) to be used should be one that is invariant
(unchanging) under operations that the scientist in the application
at hand would wish not to have an effect on the computations.
In the application here, that invariance is to be over changes of
basis for the covariance structure: the distances should remain
unchanged when the covariance matrices are, for instance, com-
puted from principal components instead of the original variables,
or when all landmark configurations are jointly rotated, translated,
or rescaled.
The metric to which we turn for the distance between two
covariance matrices is the square root of the summed squared
logarithms of the relative eigenvalues between them. Relative or
generalized eigenvalues express the variances and covariances
of one sample relative to that of another sample; they are equal
to the eigenvalues of the product of one matrix premultiplied
by the inverse of the other matrix. Although this metric has not
hitherto been applied in theoretical or mathematical biology, to
our knowledge, it has been known for over a century in differential
geometry and has been applied before in mathematical–statistical
studies of information geometry (Helgason 1978; Amari 1985).
We will show how this metric is in fact the “natural” distance
(in a mathematical sense) representing the contrast between two
covariance matrices. We will argue this twice, at two different
levels of mathematical abstraction. In a more geometric language,
inside the nonlinear space of all possible covariance matrices
the proposed metric specifies the geodesic—the shortest path—
between two matrices. This metric ensures that when a population
of developing phenotypes is constantly transformed by the same
set of developmental factors, the resulting curve of covariance
matrices falls in a straight line; but changes in the developmental
program lead to bends in their trajectory (see below for proofs).
Before setting out a rather technical explanation, which the
less mathematical reader may simply skip over, we demonstrate
the usefulness of this approach in examples from the postnatal
craniofacial ontogeny of two taxa, rats and humans. We find that
the age-specific population covariance matrices change through-
out the full period of postnatal development in both species. The
time course and the pattern of these changes reflect onset and off-
set of developmental processes in a surprisingly precise way.
Example 1: Ontogenetic Change of Covariance Structure in the Rat Cranium
Our first example is a familiar dataset, the octagon of neurocranial landmarks digitized by Melvin Moss from a longitudinal
roentgenographic study of 21 genetically homogeneous male lab-
oratory rats by Henning Vilmann (Fig. 1A). The data were pub-
lished in full as Appendix A.4.5 of Bookstein (1991) and have
been exploited in the 1998 textbook by Dryden and Mardia and
in many other publications. The data were collected at eight ages:
7, 14, 21, 30, 40, 60, 90, and 150 days. In the computations here,
we restrict the sample to the 18 animals with complete data at
all eight ages. Definitions of the eight landmarks and thin-plate
spline visualizations of developmental shape change can be found
in Bookstein (1991:67–69).
With eight landmarks there are 12 degrees of freedom for
the Procrustes shape coordinates. That is too many for reliable
estimation of eigenspaces of covariances on only 18 animals so
we reduce to subspaces of the first few principal components
(see Appendix). Analyses of the first four, five, six, or seven
principal components of the 12 variables all yield roughly the
same ordination of the covariance matrices. For definiteness, we
report the five-principal-component version below, accounting for
96% of the total shape variation.
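The reduction step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the array `shapes` is random stand-in data (not the Vilmann measurements), the principal components are taken from the data pooled over all ages (an assumption; the details are deferred to the Appendix), and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_animals, n_ages, n_vars, n_pcs = 18, 8, 16, 5

# Stand-in for Procrustes shape coordinates (animals x ages x variables).
# Random numbers only; these are NOT the published rat data.
shapes = rng.normal(size=(n_animals, n_ages, n_vars))

# Principal components of the pooled sample (assumption: pooled over ages).
pooled = shapes.reshape(-1, n_vars)
pooled_mean = pooled.mean(axis=0)
_, sing_vals, vt = np.linalg.svd(pooled - pooled_mean, full_matrices=False)
basis = vt[:n_pcs].T                                  # n_vars x n_pcs loadings

# Share of total variance carried by the retained components.
explained = (sing_vals[:n_pcs] ** 2).sum() / (sing_vals ** 2).sum()

# Scores in the reduced space, then one covariance matrix per age.
scores = (shapes - pooled_mean) @ basis               # animals x ages x n_pcs
age_covs = [np.cov(scores[:, a, :], rowvar=False) for a in range(n_ages)]
```

Working in a five-dimensional subspace keeps each age-specific covariance matrix positive definite with only 18 animals, which the distance computation requires.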
After computing all pairwise distances between the covari-
ance matrices, we performed a principal coordinate analysis
(PCoord) to arrive at a low-dimensional ordination that provides
the best (least squares) approximation of these distances (see the
Appendix for computational details). Eigenvalues of the princi-
pal coordinate structure (6.2, 5.2, 1.9, . . .) indicate that a two-
dimensional representation is sufficient, accounting for 68% of
the total variation of the reconstructed metric coordinates. The
pairwise Euclidean distances within the space of the first two
principal coordinates have a correlation of 0.94 with the actual
distances between the matrices.
Figure 1. (A) Midsagittal section of a rat cranium with the eight neurocranial landmarks used in the analysis. For more details see
Bookstein (1991). (B) A lateral cranial radiograph of an adult human male with the 18 landmarks, including six semilandmarks along the
frontal bone (see Bulygina 2003; Bulygina et al. 2006).
Figure 2 shows the principal coordinate ordination, where
every point represents one covariance matrix and is labeled by
the individuals’ age. The gray line connecting the points hence
represents the “ontogenetic trajectory” of the covariance matrix.
As expected, there is continual change of the covariance matrix
throughout postnatal development. Apparently, between 21 and
40 days of age the trajectory changes its direction substantially.
Figure 2. The first two principal coordinates (PCoord) of the co-
variance matrices for the Vilmann rat data. Every point represents
one covariance matrix and is labeled by the age of the animals.
The gray line connecting these points hence represents the onto-
genetic trajectory of the population covariance matrix.
During this time span most of the factors driving postnatal cranial
growth undergo major modifications. At about the 21st day of
postnatal life rats are weaned and need to feed on a harder diet
thereafter. By day 20 the first two molars are completely erupted
in both the maxilla and the mandible, and the third molar reaches
its final position around day 40–50 (Shaw et al. 1950). Cranial
growth is not driven by growth and eruption of the dentition
after that age, and also brain growth decreases markedly after
the second week of postnatal life (Watson et al. 2006; Kousba
et al. 2007). Puberty in male rats typically starts at approximately
40 days of age, initiating a rise in testosterone level (McCormick
and Mathews 2007). Adult body size is reached at approximately
day 120 (Watson et al. 2006); this may account for the bend of
the trajectory between the two final age stages.
In fact, during the fourth-to-sixth week period, the trajectory
makes a turn closer to 180° than to 90°, more nearly opposite
than orthogonal. To assess the actual pattern of change, for dif-
ferent subsegments of this path one can enquire as to whether the
largest contribution to that sum of squared logarithms comes from
a dimension that is increasing in variance or a dimension that is
decreasing in variance. We find that between the ages of 7 and
21 days, most of the difference between the covariance matrices
is accounted for by loss of variance of the uniform component
(the most large-scale shape features). From age 40–150 days, this
variance is partially restored: hence the appearance in Figure 2 of
the trajectory’s making nearly a 180° turn. From age 21–40 days
the most substantial change in the covariance structure appears to
be a general regulation of the size of the upper posterior neurocra-
nium with respect to the cranial base. And from age 40–150 days,
there is a direction of variance reduction as well, involving the
regulation of the spacing of the three sutural points across the top
of the skull. Additional details will be treated in a separate paper.
Example 2: Ontogenetic Change of Covariance Structure in the Human Cranium
In a second example, we analyze a longitudinal sample consisting
of 440 human cephalograms from the Denver Growth Study, carried out between 1931 and 1966. Lateral radiographs of 13 males
and 13 females were taken approximately every year, beginning at age one month and continuing up to the age of 21 or later. On every radiograph,
18 landmarks (including six semilandmarks) on the midsagittal
plane of the face and the cranial base were digitized by Ekaterina
Bulygina (see Fig. 1B, Bulygina 2003, and Bulygina et al. 2006 for
more details). After sliding the semilandmarks (Bookstein 1997;
Gunz et al. 2005), the 440 configurations of 18 landmarks and
semilandmarks were superimposed by a Generalized Procrustes
Analysis (Rohlf and Slice 1990; Bookstein 1996), resulting in
36 shape variables (with 32 degrees of freedom). Not all indi-
viduals were radiographed at exactly the same age intervals. To
compute a cross-sectional covariance matrix for each age class,
the shape of each individual was interpolated at a one-year interval
(from 2 to 17 years of age) based on the existing radiographs by
linearly regressing the shape coordinates on age within a moving
time frame of six years. Based on the first 15 principal com-
ponents of the full data (accounting for 96% of the total shape
variation), a covariance matrix was computed for every age cohort
after removing age-specific sexual dimorphism.
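The interpolation step can be sketched as a local linear regression. This is a minimal sketch under assumptions: the six-year window is taken to be centered on the target age (the text does not say how it is positioned), the function name `interpolate_shape` is hypothetical, and the data are made up so that the result can be checked by hand.

```python
import numpy as np


def interpolate_shape(ages, shapes, target_age, window=6.0):
    """Predict shape coordinates at target_age by regressing on age locally.

    ages: (n,) observation ages of one individual.
    shapes: (n, k) shape coordinates at those ages.
    Assumption: the window is centered on target_age.
    """
    inside = np.abs(ages - target_age) <= window / 2.0
    design = np.column_stack([np.ones(inside.sum()), ages[inside]])
    coef, *_ = np.linalg.lstsq(design, shapes[inside], rcond=None)
    return np.array([1.0, target_age]) @ coef


# Made-up individual whose coordinates change exactly linearly with age.
ages = np.array([0.1, 1.2, 2.0, 3.1, 4.0, 5.2, 6.1, 7.0])
intercept = np.array([0.10, -0.20, 0.05, 0.30])
slope = np.array([0.01, 0.02, -0.01, 0.00])
shapes = intercept + np.outer(ages, slope)

shape_at_3 = interpolate_shape(ages, shapes, target_age=3.0)
```

Repeating this for every individual at every whole-year age yields the same set of ages for all individuals, so that one cross-sectional covariance matrix can be computed per age class.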
The first three principal coordinates explain 79% of the to-
tal variation of the reconstructed coordinates, and the Euclidean
distances within the space of the first three principal coordinates
correlate 0.98 with the actual distances among the covariance
matrices. In the projection of the first three principal coordinates
(Fig. 3), once again we see a covariance matrix that is changing
Figure 3. A projection of the first three principal coordinates (PCoord) of the covariance matrices for the Denver Growth Study. Every
point represents one covariance matrix, labeled by the age at which the individual shape trajectories have been estimated.
throughout the entire period of postnatal development. The tra-
jectory exhibits three major bends at which the direction rotates
by about 90°; between these change-points it is basically straight.
The changes in direction are at ages 6, 9 to 11, and 15 years.
These three bends in the ontogenetic trajectory of the covari-
ance matrix reflect major changes in the postnatal developmental
program of the human cranium. At the first bend, about 6 years of
age, the brain has reached its final size and neurocranial growth
basically ceases. Also, the first permanent teeth (the first molars)
usually erupt at that age. The next bend corresponds to the onset
of puberty, which for American white populations starts on av-
erage at age 10–10.5 years in females and at age 11.5–12 years
in males (Tanner et al. 1985; Ulijaszek et al. 1998). Because the
sample is of mixed sex, male and female onset of puberty are
separately reflected by two alterations of the trajectory. The last
bend in the trajectory corresponds to the cessation of the pubertal
growth phase and the achievement of adult height in females at
an average age of 14.5 years. The average time of cessation of
growth in males at 17.5 years is not included in the investigated
time range. The large distance between the covariance matrices
at age 3 and age 4 coincides with the complete eruption of the
deciduous dentition during the third year of life. However, when
analyzed more closely, the trajectory between the second and the
sixth year of life is not as straight as the other segments;
rather, it is somewhat smoothly curved, indicating slight and continuous
change of development, such as the transition from infant
to childhood growth pattern (Ulijaszek et al. 1998).
In both examples, similar ordinations of such high-
dimensional trajectories could potentially also result from a ran-
dom walk—a purely random process (similar to the so-called
“horseshoe effect”; Kendall 1971). To rule out such an artifact
we repeated the ordinations with different subsets of the full age
range and with different numbers of principal components (re-
sults not shown). The trajectories bend at the same ages in these
subsets, so a statistical artifact seems unlikely.
Why This Particular Formula for the Distance between Two Covariance Matrices?
We are introducing a new metric, a specific formula for the distance between two covariance matrices, and we need to argue that
this must be the metric to be used for questions like ours. A while
ago a similar question was asked in geometric morphometrics.
Its answer emerged by consensus, as practitioners of this statis-
tical method came to adopt the Procrustes metric as the distance
measure for pairs of shapes (of landmark configurations). The
Procrustes metric is natural in the sense in which mathematicians
use this word. It responds to the invariances that we need to im-
pose on a shape distance—no dependence on relative position,
relative orientation, and relative scale of the two shapes under
study—but it otherwise matches the familiar distance function,
sum of squared differences of Cartesian coordinates, that we are
used to. For a justification of the Procrustes metric in these terms,
see, for instance, Bookstein (2009).
The distance between two p × p covariance matrices S1 and
S2 that we propose and that we used in the two examples above is
the square root of the summed squared logarithms of the relative
eigenvalues between the two matrices:
$$\|S_1, S_2\|_{\mathrm{cov}} = \sqrt{\sum_{i=1}^{p} (\log \lambda_i)^2}, \qquad (1)$$
where λi are the generalized or relative eigenvalues of S1 with
respect to S2, or equivalently, the eigenvalues of $S_2^{-1} S_1$. (Note that
‖S1, S2‖cov = ‖S2, S1‖cov, but see the following section for more
details.)
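Formula (1) can be evaluated in a few lines. The sketch below is one possible implementation, not the authors' code: it obtains the relative eigenvalues by transforming the variables so that S2 becomes the identity (via a Cholesky factor), which gives the same spectrum as $S_2^{-1} S_1$. The function name `cov_distance` and the two example matrices are made up for illustration.

```python
import numpy as np


def cov_distance(s1, s2):
    """Formula (1): root of the summed squared log relative eigenvalues."""
    chol = np.linalg.cholesky(s2)                 # s2 = chol @ chol.T
    half = np.linalg.solve(chol, s1)              # chol^{-1} s1
    whitened = np.linalg.solve(chol, half.T)      # chol^{-1} s1 chol^{-T}
    whitened = (whitened + whitened.T) / 2.0      # remove round-off asymmetry
    rel_eigvals = np.linalg.eigvalsh(whitened)    # same spectrum as inv(s2) @ s1
    return float(np.sqrt(np.sum(np.log(rel_eigvals) ** 2)))


# Two small, made-up covariance matrices (illustrative values only).
s1 = np.array([[2.0, 0.5], [0.5, 1.0]])
s2 = np.array([[1.0, 0.2], [0.2, 1.5]])

d12 = cov_distance(s1, s2)
d21 = cov_distance(s2, s1)   # equal to d12: the distance is symmetric
```

Both matrices must be positive definite; with rank-deficient sample covariance matrices the Cholesky step fails, which is one reason to work in a subspace of leading principal components.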
We are similarly in need of a mathematical justification for
this fundamental formula in the context of quantitative biology.
The mathematical literature in this regard is even older than that
of the Procrustes metric, but has not been introduced previously
to the community of biometricians and quantitative biologists, as
far as we know. The following pair of expositions are two initial
attempts to bridge the gap between the theorems that establish it
as mathematically unique and the intuitive grasp of the biological
reader who wishes to apply it in a biological context.
To begin, it is useful to consider a slide rule: a once-familiar
mechanical analogue computer invented in the 17th century to
multiply numbers by adding their logarithms. The scaling in terms
of the logarithm is unique in the sense that the log (to any base)
is the only function f such that f (xy) = f (x) + f (y) for all positive
x, y.
Think of multiplication by a number as a mathematical op-
erator on the positive real line, so that the number 3, for in-
stance, is both a dot on the line and the action of tripling, and
the number 2 is both a dot and the action of doubling. We have
log 2 + log 3 = log 6. In terms of distances along the slide rule,
the distance between 1 and 2 is the same as the distance between
3 and 6. In this way, we assign a measure (log 2) to the operation
of doubling, a distance between x and 2x that is the same no mat-
ter what the positive number x may be, and likewise a measure
(log 3) to the operation of tripling, such that the distance that
corresponds to the combination of the two operations is the sum
of the distances of the two factors each from the identity map
(the number 1). We emphasize that the logarithm is the unique
distance function with this property.
The logarithms in formula (1) are there for precisely the
same reason as the logarithms built into the scales of a slide
rule. For a covariance matrix is both an object and an operator in
the same sense that a quantity like 2 or 3 is both a number and
an operator. Covariance matrices multiply vectors to give other
vectors, for example. If we establish a distance measure between
two matrices S1 and S2, and we reexpress both of them using a
new set of basis vectors T, we would like to leave the distance
unchanged: we want
$$\|S_1, S_2\|_{\mathrm{cov}} = \|T S_1 T^t,\; T S_2 T^t\|_{\mathrm{cov}} \qquad (2)$$
for all invertible transformations (changes of basis) T, where the superscript t denotes
the matrix transpose. In biometrical terms, this means that we
want a metric distance measure that is invariant against changes
of the factor structure of the measurements in which we express
it. The formula (1) is the unique formula with this property.
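The invariance in equation (2) is easy to check numerically. The following sketch assumes the Cholesky-based implementation of formula (1) shown in the code (the helper name `cov_distance` is hypothetical) and uses randomly generated matrices rather than real data. The change of basis T is built as an orthogonal matrix times a triangular matrix with nonzero diagonal, so it is guaranteed to be invertible.

```python
import numpy as np


def cov_distance(s1, s2):
    """Formula (1): root of the summed squared log relative eigenvalues."""
    chol = np.linalg.cholesky(s2)
    half = np.linalg.solve(chol, s1)
    whitened = np.linalg.solve(chol, half.T)
    whitened = (whitened + whitened.T) / 2.0
    return float(np.sqrt(np.sum(np.log(np.linalg.eigvalsh(whitened)) ** 2)))


rng = np.random.default_rng(1)
p = 4

# Two random positive-definite matrices (illustrative, not real data).
a = rng.normal(size=(p, p))
b = rng.normal(size=(p, p))
s1 = a @ a.T + np.eye(p)
s2 = b @ b.T + np.eye(p)

# An invertible, non-orthogonal change of basis T.
q, _ = np.linalg.qr(rng.normal(size=(p, p)))
upper = np.diag([0.5, 1.0, 2.0, 3.0]) + 0.3 * np.triu(np.ones((p, p)), k=1)
t = q @ upper

d_before = cov_distance(s1, s2)
d_after = cov_distance(t @ s1 @ t.T, t @ s2 @ t.T)   # equation (2)
```

A Euclidean distance between the matrix entries would change under the same transformation, which is the practical reason for preferring the metric of formula (1) here.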
To begin understanding why we care about being independent
of changes of factor structure, let us consider a simple set of
covariance matrices: the p × p matrices
$$M_1(x) = \begin{pmatrix} x & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}.$$
Each of these is the operation that multiplies the first element of
a p-vector by x and leaves everything else alone (for instance,
M1(1) is just the identity matrix).
Suppose we want an expression for the distance between
two of these matrices that meets the “slide rule requirement,” the
requirement that the distance between M1(x) and M1(1) be the
same as the distance between M1(xy) and M1(y). Then it follows
that the distance of M1(x) from M1(1) has to be |log x|, which
is the same as the distance of M1(xy) from M1(y). The absolute
value | · | comes from the fact that “distance” on the slide rule is
the same in both directions. (The distance between 1 and 2 is the
same as the distance between 1 and 0.5; both are equal to log 2.)
If you insert the two matrices M1(y), M1(xy) in formula (1),
the relative eigenvalues you get are just λ1 = xy/y = x, λ2 = λ3 = … = 1. Then (because log 1 = 0) formula (1) reduces
to just
$$\|M_1(xy), M_1(y)\|_{\mathrm{cov}} = |\log x|,$$
which is the same as the slide rule formula we had before.
You can see that this argument would be the same whatever
single diagonal entry of M we were tinkering with. In fact, it is
the same for any single-factor model
$$M_2(\lambda, v) = I + \lambda v v^t,$$
where v is any p-dimensional vector of unit length whose variance is being inflated by a factor 1 + λ while leaving the variance of every other
direction of the measurement space unchanged. The distance between M2(λ1, v) and M2(λ2, v) has to be |log(1 + λ1) − log(1 + λ2)| for exactly the same reason that ‖M1(x), M1(y)‖cov had to be |log x − log y|.
There is another way to think about this construction. We
have a set of matrices M2(λ, v) for which distances add exactly—
we have the triangle equality
$$\|I, M_2(\lambda_1, v)\|_{\mathrm{cov}} + \|M_2(\lambda_1, v), M_2(\lambda_2, v)\|_{\mathrm{cov}} = \|I, M_2(\lambda_2, v)\|_{\mathrm{cov}} \qquad (3)$$
for all λ2 ≥ λ1 > 0. It is tempting (and ultimately entirely appro-
priate) to interpret this as the statement that the matrices M2(λ,
v) lie on a straight line in the space we are building.
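The triangle equality (3) can be verified directly for a concrete single-factor model. This is a small sketch with made-up values (λ1 = 0.5, λ2 = 3, and a unit vector v chosen for convenience); the helper names `cov_distance` and `single_factor` are hypothetical.

```python
import numpy as np


def cov_distance(s1, s2):
    """Formula (1): root of the summed squared log relative eigenvalues."""
    chol = np.linalg.cholesky(s2)
    half = np.linalg.solve(chol, s1)
    whitened = np.linalg.solve(chol, half.T)
    whitened = (whitened + whitened.T) / 2.0
    return float(np.sqrt(np.sum(np.log(np.linalg.eigvalsh(whitened)) ** 2)))


def single_factor(lam, v):
    """Single-factor model I + lam * v v^t for a unit vector v."""
    return np.eye(v.size) + lam * np.outer(v, v)


v = np.array([1.0, 2.0, 2.0]) / 3.0      # unit length: (1 + 4 + 4) / 9 = 1
lam1, lam2 = 0.5, 3.0                    # must satisfy lam2 >= lam1 > 0
identity = np.eye(3)
m_a = single_factor(lam1, v)
m_b = single_factor(lam2, v)

# Left and right sides of equation (3).
left = cov_distance(identity, m_a) + cov_distance(m_a, m_b)
right = cov_distance(identity, m_b)      # equals log(1 + lam2) = log 4
```

The two sides agree because the three matrices differ only in the variance along v, so the distances reduce to differences of log(1 + λ) on a single "slide rule" axis.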
We have arrived at a metric that makes sense of single-factor
models as lying on straight lines—sliderule-like lines—in some
sort of metric space suited for the covariance matrices they em-
body. We need only one more step to make the formula (1) per-
fectly general: the extension from a single-factor model to the
superposition of two. To show this, at least informally, let us
examine a covariance matrix with not one but two single factors,
$$I + \lambda_1 v_1 v_1^t + \lambda_2 v_2 v_2^t,$$
where v1 and v2 are orthonormal (vectors of unit length with inner product zero). For distances of this matrix from the identity I of
the same dimension, the eigendecomposition gives us the relative
eigenvalues 1 + λ1, 1 + λ2, 1, . . . , 1 and thus the squared distance
log²(1 + λ1) + log²(1 + λ2) from formula (1). But this is just
the Pythagorean sum of the two squared distances log²(1 + λ1),
log²(1 + λ2) of the components deriving from the factor models
on v1 and v2 separately. As a consequence, any two single factor
models with orthogonal factors would lie on two orthogonal lines
in the underlying space.
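As a worked instance of this Pythagorean relation, take the illustrative values λ1 = 1 and λ2 = 3 (chosen here only for convenience) and natural logarithms. The relative eigenvalues with respect to the identity are then 2, 4, 1, . . . , 1, so that

```latex
\begin{aligned}
\|I,\; I + v_1 v_1^t + 3\, v_2 v_2^t\|_{\mathrm{cov}}^2
  &= \log^2 2 + \log^2 4
   = \log^2 2 + (2 \log 2)^2
   = 5 \log^2 2 \approx 2.402, \\
\|I,\; I + v_1 v_1^t + 3\, v_2 v_2^t\|_{\mathrm{cov}}
  &= \sqrt{5}\, \log 2 \approx 1.550 .
\end{aligned}
```

The two legs of the right triangle have lengths log 2 ≈ 0.693 and log 4 ≈ 1.386, the distances of the two single-factor models from the identity.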
Because any covariance matrix can be built up as the su-
perposition of single factors—this is just another version of
the singular-value decomposition—we have actually shown that
the formula (1) applies to any difference between a matrix
and the identity, that is, any covariance matrix at all. By iterating
the one-dimensional change of basis a few paragraphs above as
many times as necessary—once for each dimension of the natural
basis—we find that we have established the formula (1) as the
unique formula for a metric distance between covariance matri-
ces that is invariant against changes of basis for the covariances,
which is to say, invariant against changes of the true factor model
underlying our datasets.
The Space of Covariance Matrices
Another justification for the proposed metric arises from differential geometry. A p × p covariance matrix A is a symmetric,
positive (semi)definite matrix and may be construed as an ele-
ment of a corresponding “space of covariance matrices.” Positive
definiteness means that $x^t A x > 0$ for any vector x ≠ 0, that is,
all variables and linear combinations of variables have a positive
variance, or equivalently, all eigenvalues of A are positive.
Let
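$$M(p, \mathbb{R}) := \{A = (a_{ij}) \mid 1 \le i, j \le p,\; a_{ij} \in \mathbb{R}\}$$
be the vector space of p × p real matrices and
$$\mathrm{Sym}(p, \mathbb{R}) := \{A \in M(p, \mathbb{R}) \mid A = A^t\}$$
the space of all symmetric matrices. The space of covariance
matrices is the subset
$$\mathrm{Sym}^+(p, \mathbb{R}) := \{A \in M(p, \mathbb{R}) \mid A = A^t,\; x^t A x > 0 \;\; \forall x \ne 0\}$$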
of real, symmetric, positive definite matrices. It is a manifold of
dimension p(p + 1)/2 and has the form of a convex cone in the
vector space Sym(p, R) of all symmetric matrices (Forstner and
Moonen 1999; Smith 2005). A convex cone is a subset of a vector
space that is closed under linear combinations with positive coef-
ficients. That is, the sum of any two covariance matrices is again a
covariance matrix (whereas this is not necessarily the case for the
difference between two covariance matrices). Also, a covariance
matrix multiplied by any positive scalar yields another covariance
matrix, whereas this is obviously not the case for multiplication
with negative scalars (a variance cannot be negative).
A manifold can be approximated locally by a “flat” Euclidean
space, a so-called tangent space, but the way a distance is
defined in the tangent space varies smoothly from point to
point. The tangent space $T_A \mathrm{Sym}^+(p, \mathbb{R})$ to Sym+(p, R) at a point
A ∈ Sym+(p, R) is just given as
$$T_A \mathrm{Sym}^+(p, \mathbb{R}) := \mathrm{Sym}(p, \mathbb{R}),$$
the vector space induced by the p(p + 1)/2 nonredundant ele-
ments of the symmetric covariance matrices (i.e., relaxed for the
constraint of positive definiteness).
On a “curved” manifold a notion of a “straight line” still does
exist: the geodesics. These are curves that connect points on the
manifold along the shortest paths. The length of the shortest path
between any two matrices S1, S2 ∈ Sym+(p, R) in the space of co-
variance matrices is given by our formula (1), the square root of the
summed squared logarithms of the relative eigenvalues between
the two matrices (for formal proofs of this assertion see, e.g.,
Forstner and Moonen 1999; Moakher 2005; Smith 2005). The rel-
ative eigenvalues can be interpreted as the eigenvalues of S1 when
the variables are transformed so that S2 = I, and are thus identical
to the eigenvalues of $S_2^{-1} S_1$, the matrix S1 expressed relative to
S2. If S1 = S2 then $S_2^{-1} S_1 = I$ and ‖S1, S2‖cov = 0, whereas a
large deviation of $S_2^{-1} S_1$ from I corresponds to a large difference
between S1 and S2. This deviation, or the “size” of the matrix
$S_2^{-1} S_1$, is measured as the square root of the sum of its squared log eigenvalues. This has the convenient property that $\log^2(\lambda_i) = \log^2(1/\lambda_i)$,
guaranteeing that ‖S1, S2‖cov = ‖S2, S1‖cov and further that
‖S1, S2‖cov > 0 unless S1 = S2. Forstner and Moonen (1999)
also proved that the Triangle Inequality holds true for this dis-
tance function, that is, ‖S1, S2‖cov + ‖S2, S3‖cov ≥ ‖S1, S3‖cov.
The distance function thus is a valid metric, related to the Fisher information metric for multivariate normal distributions with common mean vectors (e.g., Lenglet et al. 2006). As we already mentioned, the metric is invariant to general linear transformations T of the variables (eq. 2), because the eigenvalues of $S_2^{-1} S_1$ are the
same as those of $(T S_2 T^t)^{-1} T S_1 T^t = (T^t)^{-1} S_2^{-1} S_1 T^t$, a matrix similar to $S_2^{-1} S_1$.
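To make the notion of a geodesic concrete, the sketch below walks along the shortest path between two covariance matrices and checks that the distance of formula (1) grows in proportion to the path parameter. The closed form used for the path, S1^{1/2} (S1^{-1/2} S2 S1^{-1/2})^t S1^{1/2}, is the standard expression for this metric in the mathematical literature and is an assumption here, since it is not stated in the text above; the function names and the example matrices are hypothetical.

```python
import numpy as np


def cov_distance(s1, s2):
    """Formula (1): root of the summed squared log relative eigenvalues."""
    chol = np.linalg.cholesky(s2)
    half = np.linalg.solve(chol, s1)
    whitened = np.linalg.solve(chol, half.T)
    whitened = (whitened + whitened.T) / 2.0
    return float(np.sqrt(np.sum(np.log(np.linalg.eigvalsh(whitened)) ** 2)))


def sym_power(s, exponent):
    """Real power of a symmetric positive-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(s)
    return (vecs * vals ** exponent) @ vecs.T


def geodesic_point(s1, s2, t):
    """Point at parameter t (0 = s1, 1 = s2) on the assumed geodesic."""
    root = sym_power(s1, 0.5)
    inv_root = sym_power(s1, -0.5)
    middle = sym_power(inv_root @ s2 @ inv_root, t)
    point = root @ middle @ root
    return (point + point.T) / 2.0           # remove round-off asymmetry


# Made-up covariance matrices (illustrative values only).
s1 = np.array([[2.0, 0.5], [0.5, 1.0]])
s2 = np.array([[1.0, 0.2], [0.2, 1.5]])

total = cov_distance(s1, s2)
midpoint = geodesic_point(s1, s2, 0.5)
first_half = cov_distance(s1, midpoint)      # half of the total distance
second_half = cov_distance(midpoint, s2)     # the other half
```

Note that the midpoint is generally not the arithmetic mean (S1 + S2)/2, which is the midpoint of the straight line in the flat tangent space rather than on the curved manifold.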
A low-dimensional ordination of the high-dimensional, non-
linear space of covariance matrices can be used to assess the
empirical distribution of a sample of covariance matrices. A prin-
cipal component analysis (PCA) of a sample of matrices (when
strung out as row vectors with p(p + 1)/2 elements) yields a
low-dimensional projection of the Euclidean vector space of sym-
metric matrices. This space is the tangent space to Sym+(p, R)
(although with a different metric) and so resembles the space
of covariance matrices only locally. More generally, the space
of covariance matrices may be ordinated by a principal coordi-
nate analysis (PCoord), also called classical metric scaling (e.g.,
Mardia et al. 1979). PCoord is based on an eigendecomposition
of a distance matrix in which the choice of the distance function
enables a wide range of different ordinations (see the Appendix).
Only when PCoord is based on Euclidean distances are the re-
sults identical to PCA. When PCoord is based on the Riemannian
metric of a manifold, the low-dimensional space of principal co-
ordinates optimally preserves the geometry of those distances (in
a least-squares sense). In contrast to PCA, these principal coor-
dinates usually cannot be interpreted as linear combinations of
the original variables, because they do not arise from a simple
orthogonal projection. Instead the underlying geometrical trans-
formation is often referred to as an “unfolding” or “flattening” of
the manifold.
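A principal coordinate analysis of a distance matrix can be sketched as follows. This is the textbook classical-scaling recipe (double centering of the squared distances followed by an eigendecomposition), offered as an illustration rather than as the computation in the Appendix; the function name `principal_coordinates` is hypothetical. The demonstration uses Euclidean distances among four made-up points so that the result can be checked; in the application, `dist` would instead hold the pairwise distances of formula (1).

```python
import numpy as np


def principal_coordinates(dist, n_axes=2):
    """Classical metric scaling of a symmetric distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering   # double centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1]                   # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scale = np.sqrt(np.clip(eigvals[:n_axes], 0.0, None))
    return eigvecs[:, :n_axes] * scale, eigvals


# Four made-up points at the corners of a 3 x 4 rectangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords, eigvals = principal_coordinates(dist, n_axes=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

For Euclidean input the distances are reproduced exactly, which corresponds to the equivalence with PCA noted above; for a Riemannian metric the leading axes give only the best least-squares approximation, and comparing `recovered` with `dist` indicates how much is lost.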
Sampling experiments that we have done indicate that the
distribution of covariance matrices sampled from a Wishart dis-
tribution is approximately isotropic in this unfolded principal co-
ordinate space. This is an important property, guaranteeing that
random sampling does not induce an apparent “signal” in the or-
dination of the space of covariance matrices, but is isotropically
distributed analogously to independent and identically distributed
Gaussian noise in Euclidean space. Deviations from a spherical
distribution in the ordination thus are likely to indicate a biological
signal.
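A sampling experiment of this kind might look like the sketch below. The design is an assumption throughout, since the experiments are not described in detail here: sample covariance matrices of standard-normal data (which follow a scaled Wishart distribution) are generated with arbitrary settings, their pairwise distances are computed with formula (1), and the eigenvalues of the principal coordinate analysis are returned for inspection. No particular outcome is asserted.

```python
import numpy as np


def cov_distance(s1, s2):
    """Formula (1): root of the summed squared log relative eigenvalues."""
    chol = np.linalg.cholesky(s2)
    half = np.linalg.solve(chol, s1)
    whitened = np.linalg.solve(chol, half.T)
    whitened = (whitened + whitened.T) / 2.0
    return float(np.sqrt(np.sum(np.log(np.linalg.eigvalsh(whitened)) ** 2)))


rng = np.random.default_rng(42)
p, n_obs, n_mats = 3, 50, 30          # arbitrary illustrative settings

# Sample covariance matrices from a population with identity covariance.
samples = [np.cov(rng.normal(size=(n_obs, p)), rowvar=False)
           for _ in range(n_mats)]

# Pairwise distances among the sampled matrices.
dist = np.zeros((n_mats, n_mats))
for i in range(n_mats):
    for j in range(i + 1, n_mats):
        dist[i, j] = dist[j, i] = cov_distance(samples[i], samples[j])

# Principal coordinate eigenvalues (classical scaling).
centering = np.eye(n_mats) - np.ones((n_mats, n_mats)) / n_mats
gram = -0.5 * centering @ (dist ** 2) @ centering
eigvals = np.sort(np.linalg.eigvalsh(gram))[::-1]

# The manifold has p(p + 1)/2 dimensions; roughly similar leading
# eigenvalues would be consistent with an isotropic scatter.
leading = eigvals[: p * (p + 1) // 2]
```

Repeating the run with different seeds, sample sizes, and population covariance matrices would be needed before drawing any conclusion from the spread of `leading`.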
APPLICATIONS TO STUDIES OF ONTOGENY AND PHYLOGENY
Both of our examples have been in a context of actual ontogenetic
series. Additional methodological considerations arise in applica-
tions to covariance matrices of different populations or species at
the same age or life stage. One issue, for instance, is to formulate
rules about which within-group factors ought to be removed be-
fore covariances are computed, as we removed sexual dimorphism
for the preceding human growth example. One might remove such
factors if they are assumed not to represent selectable variation.
Strongly dominant factors in the data, such as size allometry, can
likewise be projected out if their contributions to phenotypic vari-
ance are assumed to be ecophenotypic or if they would swamp
more subtle genetic or epigenetic effects of interest.
On the other hand, when a short list of factors is known in
advance to be of greatest interest, one might turn to only these
factors, and replace the metric based on the relative eigenvalues in
(1) simply by the square root of the summed squared log ratios of
variances along the predefined factors themselves (which is equiv-
alent to the Euclidean distance in the space of log-transformed
variances of the factors). This avoids what is otherwise an incon-
venience in the case of highly anisotropic covariance structures.
Minor differences in development between two populations may
lead to a “rotation” of factors with large variance (such as allom-
etry) and thus to a large impact on the relative eigenvalues, even
though the variance along these factors may remain constant. This
becomes particularly important when morphometric analyses are
in form space, with its typically highly anisotropic covariance
structures, instead of in shape space, where these matrices are
more nearly spherical (Mitteroecker et al. 2004, 2005). But this
metric for the variation along a priori defined factors does not
take into account mutual covariation and requires that the factors
correspond biologically among the groups, i.e., that they represent
the “same” factors of variation even though differing slightly in
direction.
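As a minimal sketch of this alternative metric, the following Python/NumPy code computes the square root of the summed squared log ratios of the variances along a set of predefined factors. The function name, the layout of the factors as columns of a matrix, and the use of the coordinate axes as factors in the example are assumptions made for illustration only; they are not part of the original analysis.

```python
import numpy as np

def factor_variance_distance(S1, S2, factors1, factors2=None):
    """Square root of the summed squared log ratios of variances along given factors.

    factors1, factors2: (p, m) arrays whose columns are the a priori factor
    directions in group 1 and group 2 (hypothetical layout). If factors2 is
    omitted, the same directions are used for both groups.
    """
    F1 = np.asarray(factors1, dtype=float)
    F2 = F1 if factors2 is None else np.asarray(factors2, dtype=float)
    F1 = F1 / np.linalg.norm(F1, axis=0)   # rescale columns to unit length
    F2 = F2 / np.linalg.norm(F2, axis=0)
    var1 = np.einsum("im,ij,jm->m", F1, S1, F1)   # f' S1 f for every factor f
    var2 = np.einsum("im,ij,jm->m", F2, S2, F2)   # f' S2 f for every factor f
    return np.sqrt(np.sum(np.log(var1 / var2) ** 2))

S1 = np.array([[2.3, -0.3], [-0.3, 3.6]])
S2 = np.array([[3.7, 1.9], [1.9, 2.8]])
axes = np.eye(2)   # illustrative factors: the two coordinate axes
d12 = factor_variance_distance(S1, S2, axes)
d21 = factor_variance_distance(S2, S1, axes)
print(d12, d21)
```

Because only the variances along the chosen directions enter the computation, the covariances between factors are ignored, which is the limitation noted in the text.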
Discussion
Whenever development varies among individuals, the phenotypic
covariance structure will undergo ontogenetic change. For cranial
growth in both rats and humans, we found that the cross-sectional
population covariance matrix changes continually throughout the
full period of postnatal development. These findings are in agree-
ment with several earlier studies reporting age-specific differ-
ences among covariance matrices. However, owing to method-
ological limitations, most of these studies could not describe the
actual temporal pattern of change. For example, Zelditch (1988),
Zelditch et al. (1992), and Zelditch and Carmichael (1989) iden-
tified a repatterning of the covariance structure in early postnatal
ontogeny of the rat cranium. Atchley and Rutledge (1980) re-
ported a change of the covariance structure in rat development
associated with a rapid growth phase. Rutledge et al. (1972),
Atchley and Rutledge (1980), and Nonaka and Nakata (1984)
found that heritable phenotypic variance and covariance increase
during postnatal development of mice and rats whereas variance
due to maternal effects decreases (particularly after weaning).
Zelditch et al. (2006) described a continual change of the co-
variance structure during cranial ontogeny in both rats and mice.
Similarly, Cane (1993) reported developmental changes of vari-
ances and covariance in the common tern.
The population covariance matrix thus may be construed
as tracing out an ontogenetic trajectory in a space of covariance
matrices, just as the mean phenotype does in shape space or form
space (e.g., Cobb and O’Higgins 2004; Mitteroecker et al. 2004;
Bastir et al. 2007). Both of our examples here indicated that the
direction of this trajectory is mainly determined by the underlying
developmental factors. Although the average trajectory in shape
space is only slightly and smoothly curved (see Bookstein, 1991;
Bulygina et al. 2006), the trajectory of the covariance matrix
seems to reflect the onset and offset of developmental processes in
a surprisingly plausible way. An approximately straight trajectory
of the covariance matrix results from a constant transformation of
the developing individuals by the same developmental factors or
processes (see also eq. 3). Whenever some of these factors cease
to affect development or new factors emerge, the trajectory of the
covariance matrix will alter its direction. In other words, bends
in the trajectory appear to coincide with changes in the common
developmental program.
Some published studies failed to identify “significant” dif-
ferences among covariance matrices of different age stages (e.g.,
Willmore et al. 2006), but this is typically due to methods with
low statistical power. Ultimately, this is not a matter of statistical
significance because the shorter the time intervals (i.e., the more
carefully an ontogeny is sampled), the less “significant” the incremental
results will be. However, the chance that k age-specific
covariance matrices appear in chronological order along a curvilinear
trajectory purely by chance is only 2/k!, which is already less than 0.02 at
k = 5. Alternatively, a set of covariance matrices may be tested
as to whether they all are the same or at least proportional (e.g.,
by the likelihood ratio tests in Mardia et al. 1979), but such tests
seem to have few congenial biological applications.
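The 2/k! figure can be checked with a few lines of Python. The sketch below assumes, as one reading of the argument, that all k! orderings of the matrices along the trajectory are equally likely under the null and that exactly two of them (chronological and reverse chronological) count as ordered; the function name is hypothetical.

```python
from math import factorial

def chance_of_ordered_trajectory(k):
    """Probability 2/k! that k matrices line up in (forward or reverse) age order by chance."""
    return 2 / factorial(k)

for k in range(3, 8):
    print(k, chance_of_ordered_trajectory(k))
```

For k = 5 the value is 2/120, consistent with the bound of 0.02 quoted in the text.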
THE SPACE OF COVARIANCE MATRICES
The notion of a “distance” between two covariance matrices or of a
“direction” of change requires a metric vector space of covariance
matrices. The p(p + 1)/2 nonredundant elements of covariance
matrices induce a space of the same dimension, but this space
is not a Euclidean vector space (e.g., the difference between two
covariance matrices is not necessarily a covariance matrix again).
The Euclidean distance between covariance matrices thus is not a
“meaningful” measure of difference—it is not comparable across
different locations (for different matrices) and in different direc-
tions of the space. Instead, the space of covariance matrices is
a nonlinear manifold—a “curved” space—for which the notion
of a straight-line distance is replaced by that of a geodesic, the
shortest path between two points on the manifold. In the space of
covariance matrices, the length of the geodesic (the Riemannian
metric of this space) is given by the square root of the summed
squared log relative eigenvalues (formula 1). Only for small dif-
ferences among the matrices are the Euclidean distances similar
to the geodesic distances.
A scientifically meaningful measure of distance between co-
variance matrices must be invariant under operations that should
not have an effect on the analysis (Narens and Luce 2008). The
distance function in (1) is invariant to general linear transforma-
tions of the coordinate system, such as joint translation, rotation,
and scaling of the landmark configurations. (In geometric mor-
phometrics these three “nuisance” parameters are standardized to
arbitrary values and the choice of these values must not affect any
results.) Furthermore, it is invariant to rotations of the coordinate
system in shape space or form space so that the distances remain
the same whether the covariance matrices are computed from
the original variables, from principal components, or from other
rotations of the data. Because the metric is based on relative differ-
ences among the variances, it is further invariant to scaling of the
variables separately, which is of importance when the variables
do not share common units such as in many nonlandmark-based
morphometric approaches.
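These invariance properties can be illustrated numerically. The following sketch, in Python with NumPy, applies an arbitrary invertible linear map and a separate rescaling of the variables to two covariance matrices and compares the distance based on relative eigenvalues with the Euclidean distance. The matrices A and scale are arbitrary illustrative choices, and cov_distance is a hypothetical helper name.

```python
import numpy as np

def cov_distance(S1, S2):
    """Square root of the summed squared log relative eigenvalues of S1 and S2."""
    rel = np.linalg.eigvals(np.linalg.solve(S2, S1)).real
    return np.sqrt(np.sum(np.log(rel) ** 2))

S1 = np.array([[2.3, -0.3], [-0.3, 3.6]])
S2 = np.array([[3.7, 1.9], [1.9, 2.8]])

A = np.array([[1.5, 0.4], [-0.7, 2.0]])   # arbitrary invertible linear map x -> A x
scale = np.diag([10.0, 0.1])              # separate rescaling of each variable

d_original = cov_distance(S1, S2)
d_linear = cov_distance(A @ S1 @ A.T, A @ S2 @ A.T)
d_scaled = cov_distance(scale @ S1 @ scale.T, scale @ S2 @ scale.T)

e_original = np.linalg.norm(S1 - S2)                  # Euclidean (Frobenius) distance
e_linear = np.linalg.norm(A @ (S1 - S2) @ A.T)        # changes under the same map

print(d_original, d_linear, d_scaled)
print(e_original, e_linear)
```

The three relative-eigenvalue distances agree up to rounding, whereas the Euclidean distance depends on the coordinate system in which the matrices are expressed.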
The space of covariance matrices (and its local approxima-
tion by an ordination analysis) thus provides a meaningful metric
geometry that supports the usual rhetoric in theoretical and evolu-
tionary biology, involving claims about directions of ontogenetic
or phylogenetic trajectories and about the proximity or interme-
diacy of states along these trajectories. Interestingly, even when
the measured morphometric variables do not share commensu-
rate units and hence do not support a metric distance measure, the
corresponding space of covariance matrices still exhibits a metric
geometry (Mitteroecker and Huttegger, in press). The observed
distribution of covariance matrices in this space might give rise to
inferences about the actual pattern of differences that would oth-
erwise not be evident from a tabulation of pairwise significance
tests or measures of (dis)similarity.
EVOLUTIONARY DYNAMICS OF COVARIANCE
STRUCTURES
In quantitative genetic theory, the response to selection depends
on the genetic population covariance matrix, which is an expres-
sion of heritable genetic and epigenetic factors having varying
pleiotropic effects during development as well as of the past se-
lective regime. Evolutionary modifications of such pleiotropic
factors and processes due to selection on a trait may indirectly
lead to evolutionary effects on other traits. Selection on the adult
phenotype can lead to evolutionary modifications of developmen-
tal processes at any age. But subadult characteristics might also
be subject to selection themselves, for example as adaptations to
juvenile environments or as a result of “internal” selection (selec-
tion against the disintegration of developmental processes; e.g.,
Wimsatt 2007). Due to developmental changes of the covariance
structure, the total phenotypic effect of selection for age-specific
traits may vary across all age stages.
Novel factors and processes appearing during development
may even modify the principal pattern of modularity and in-
tegration among traits. For instance, pleiotropic developmen-
tal processes initiated at a certain ontogenetic stage may intro-
duce novel dependencies among traits. More local developmental
processes, in contrast, can add modular variance to the current
covariance matrix (e.g., Wagner and Altenberg 1996; Hansen
2006; Mitteroecker and Bookstein 2007, 2008; Hallgrimsson and
Lieberman 2008; Klingenberg 2008). Such changes in the pattern
of augmented variance and covariance likely correspond to bends
in the ontogenetic trajectory of the covariance matrix.
The trajectory of the phenotypic covariance matrix is de-
termined by the underlying developmental program, giving rise
to the variational properties of a developing population. Despite
stringent functional and developmental constraints on the covari-
ance among organismal parts (e.g., Cheverud 1996; Young and
Badyaev 2006) as well as partially targeted or canalized growth
(Tanner 1963; Debat and David 2001), phenotypically distinct
species may be expected to have different patterns of variances
and covariances, as different phenotypes are necessarily produced
by different developmental pathways. The assumption of identical
or at least proportional covariance matrices during evolution that
is central to several models and methods in quantitative genetics
(e.g., the estimation of selection gradients) may thus be unwar-
ranted in many real-world applications. Instead, phenotypic and
genetic covariance matrices should be expected to vary both over
evolutionary time scales and over the time scale of the organismal
life cycle.
ACKNOWLEDGMENTS
We thank A. Cardini, P. Gunz, S. Huttegger, and D. Slice for helpful comments on the manuscript as well as W. Callebaut, M. Perlman, F. J. Rohlf, and S. T. Smith for helpful discussions. We are grateful to E. Bulygina for loaning us the digitized Denver growth study data. FLB was supported in part by EU FP6 Marie Curie Research Training Network MRTN-CT-019564.
LITERATURE CITED
Amari, S. 1985. Differential-geometrical methods in statistics, Lecture notes in statistics. Springer, Berlin.
Atchley, W., and J. Rutledge. 1980. Genetic components of size and shape. I. Dynamics of components of phenotypic variability and covariability during ontogeny in the laboratory rat. Evolution 34:1161–1173.
Bastir, M., P. O’Higgins, and A. Rosas. 2007. Facial ontogeny in Neanderthals and modern humans. Proc. R. Soc. B 274:1125–1132.
Bookstein, F. 1991. Morphometric tools for landmark data: geometry and biology. Cambridge Univ. Press, New York.
———. 1996. Biometrics, biomathematics and the morphometric synthesis. Bull. Math. Biol. 58:313–365.
———. 1997. Landmark methods for forms without landmarks: morphometrics of group differences in outline shape. Med. Image Anal. 1:225–243.
Bookstein, F. L. 2009. Geometric morphometrics for virtual anthropology. In G. W. Weber and F. L. Bookstein, eds. Virtual anthropology: a guide to a new interdisciplinary field. Springer, Vienna.
Bulygina, E. 2003. Morphology of the frontal bone in application to a comparative study of the late Pleistocene fossil hominids from the territory of the former Soviet Union. Master Thesis, Univ. College London.
Bulygina, E., P. Mitteroecker, and L. C. Aiello. 2006. Ontogeny of facial dimorphism and patterns of individual development within one human population. Am. J. Phys. Anthropol. 131:432–443.
Cane, W. P. 1993. The ontogeny of postcranial integration in the common tern, Sterna hirundo. Evolution 47:1138–1151.
Cheverud, J. M. 1984. Quantitative genetic and developmental constraints on evolution by selection. J. Theor. Biol. 110:155–171.
Cheverud, J. 1996. Developmental integration and the evolution of pleiotropy. Am. Zool. 36:44–50.
Cobb, S., and P. O’Higgins. 2004. Hominins do not share a common postnatal facial ontogenetic shape trajectory. J. Exp. Zool. B 302:302–321.
Debat, V., and P. David. 2001. Mapping phenotypes: Canalization, plasticity and developmental stability. Trends Ecol. Evol. 16:555–561.
Dryden, I. L., and K. V. Mardia. 1998. Statistical shape analysis. John Wiley and Sons, New York.
Forstner, W., and B. Moonen. 1999. A Metric for Covariance Matrices, Pp. 113–128 in Krumm, F. and Schwarze, V. S., eds. Quo vadis geodesia . . . ?, Festschrift for Erik W. Grafarend on the occasion of his 60th birthday, Stuttgart Univ., Stuttgart.
Gunz, P., P. Mitteroecker, and F. Bookstein. 2005. Semilandmarks in three dimensions. Pp. 73–98 in Modern morphometrics in physical anthropology. Kluwer Press, New York.
Hallgrimsson, B., and D. E. Lieberman. 2008. Mouse models and the evolutionary developmental biology of the skull. Integr. Comp. Biol. 48:373–384.
Hansen, T. F. 2006. The evolution of genetic architecture. Annu. Rev. Ecol. Evol. Syst. 37:123–157.
Helgason, S. 1978. Differential geometry, lie groups, and symmetric spaces. Academic Press, New York.
Kendall, D. G. 1971. Seriation from abundance matrices. Pp. 215–252 in Mathematics in the archaeological and historical sciences. Edinburgh Univ. Press, Edinburgh.
Klingenberg, C. P. 2008. Morphological integration and developmental modularity. Annu. Rev. Ecol. Evol. Syst. 39:115–132.
Kousba, A. A., T. S. Poet, and C. Timchalk. 2007. Age-related brain cholinesterase inhibition kinetics following in vitro incubation with chlorpyrifos-oxon and diazinon-oxon. Toxicol. Sci. 95:147–155.
Lande, R. 1979. Quantitative genetic analysis of multivariate evolution, applied to brain:body size allometry. Evolution 33:402–416.
Lenglet, C., M. Rousson, R. Deriche, and O. Faugeras. 2006. Statistics on the manifold of multivariate normal distributions: theory and application to diffusion tensor MRI processing. J. Math. Imaging Vis. 25:423–444.
Lynch, J. M., and B. Walsh. 1998. Genetics and analysis of quantitative traits. Sinauer, Sunderland, MA.
Mardia, K. V., J. T. Kent, and J. M. Bibby. 1979. Multivariate analysis. Academic Press, London.
McCormick, C. M., and I. Z. Mathews. 2007. HPA function in adolescence: role of sex hormones in its regulation and the enduring consequences of exposure to stressors. Pharmacol. Biochem. Behav. 86:220–233.
McGuigan, K. 2006. Studying phenotypic evolution using multivariate quantitative genetics. Mol. Ecol. 15:883–896.
Mitteroecker, P., and F. L. Bookstein. 2007. The conceptual and statistical relationship between modularity and morphological integration. Syst. Biol. 56:818–836.
———. 2008. The evolutionary role of modularity and integration in the hominoid cranium. Evolution 62:943–958.
Mitteroecker, P., P. Gunz, M. Bernhard, K. Schaefer, and F. Bookstein. 2004. Comparison of cranial ontogenetic trajectories among great apes and humans. J. Hum. Evol. 46:679–697.
Mitteroecker, P., P. Gunz, and F. L. Bookstein. 2005. Heterochrony and geometric morphometrics: a comparison of cranial growth in Pan paniscus versus Pan troglodytes. Evol. Dev. 7:244–258.
Mitteroecker, P., and S. Huttegger. In press. The concept of morphospaces in evolutionary and developmental biology. Biological Theory.
Moakher, M. 2005. A differential geometric approach to the geometric mean of symmetric positive-definite matrices. SIAM J. Matrix Anal. Appl. 26:735–747.
Narens, L., and R. D. Luce. 2008. Meaningfulness and invariance. Pp. 417–421 in S. N. Durlauf, and L. E. Blume, eds. The New Palgrave Dictionary of Economics. Palgrave Macmillan, New York.
Nonaka, K., and M. Nakata. 1984. Genetic variation and craniofacial growth in inbred rats. J. Craniofac. Genet. Dev. Biol. 4:271–302.
Roff, D. A. 1997. Evolutionary quantitative genetics. Chapman & Hall, New York.
Rohlf, F. J., and D. E. Slice. 1990. Extensions of the Procrustes method for the optimal superimposition of landmarks. Syst. Zool. 39:40–59.
Rutledge, J. J., O. W. Robinson, E. J. Eisen, and J. E. Legates. 1972. Dynamics of genetic and maternal effects in mice. J. Anim. Sci. 35:911–918.
Shaw, J. H., N. M. Shaffer, and L. W. J. Soldan. 1950. The postnatal development of the molar teeth in the cotton rat. J. Dent. Res. 29:197–207.
Smith, S. T. 2005. Covariance, subspace, and intrinsic Cramer-Rao bounds. IEEE Trans. Signal Proc. 53:1610–1630.
Steppan, S. J. 1997. Phylogenetic analysis of phenotypic covariance structure. I. Contrasting results from matrix correlation and common principal component analysis. Evolution 51:571–586.
Steppan, S. J., P. C. Phillips, and D. Houle. 2002. Comparative quantitative genetics: evolution of the G matrix. Trends Ecol. Evol. 17:320–327.
Tanner, J. M. 1963. Regulation of growth in size in mammals. Nature 199:845–850.
Tanner, J. M., and P. S. W. Davies. 1985. Clinical longitudinal standards for height and height velocity for North American children. J. Pediatr. 107:317–329.
Ulijaszek, S. J., F. E. Johnston, and M. A. Preece (Eds.). 1998. The Cambridge encyclopedia of human growth and development. Cambridge Univ. Press, Cambridge.
Wagner, G. P., and L. Altenberg. 1996. Complex adaptations and the evolution of evolvability. Evolution 50:967–976.
Watson, R. E., J. M. DeSesso, M. E. Hurtt, and G. D. Cappon. 2006. Postnatal growth and morphological development of the brain: a species comparison. Birth Defects Res. 77:471–484.
Willmore, K. E., L. Leamy, and B. Hallgrimsson. 2006. The effects of developmental and functional interactions on mouse cranial variability through late ontogeny. Evol. Dev. 8:550–567.
Wimsatt, W. C. 2007. Echoes of Haeckel? Reentrenching Development in Evolution. Pp. 309–355 in M. D. Laubichler, and J. Maienschein, eds. From embryology to evo-devo: a history of developmental evolution. MIT Press, Cambridge, MA.
Young, R. L., and A. V. Badyaev. 2006. Evolutionary persistence of phenotypic integration: influence of developmental and functional relationships on complex trait evolution. Evolution 60:1291–1299.
Zelditch, M. L. 1988. Ontogenetic variation in patterns of phenotypic integration in the laboratory rat. Evolution 42:28–41.
Zelditch, M. L., and A. C. Carmichael. 1989. Ontogenetic variation in patterns of developmental and functional integration in skulls of Sigmodon fulviventer. Evolution 43:814–824.
Zelditch, M. L., F. L. Bookstein, and B. L. Lundrigan. 1992. Ontogeny of integrated skull growth in the cotton rat Sigmodon fulviventer. Evolution 46:1164–1180.
Zelditch, M. L., J. G. Mezey, H. D. Sheets, B. L. Lundrigan, and T. Garland Jr. 2006. Developmental regulation of skull morphology II: Ontogenetic dynamics of covariance. Evol. Dev. 8:46–60.
Associate Editor: C. Goodnight
Appendix
Computation of Principal Coordinates
The distance function between two covariance matrices can be
computed easily in mathematical or statistical packages. For in-
stance, the command in MATHEMATICA corresponding to equation
(1) is
Norm[Log[Eigenvalues[{S1,S2}]]]
and in R or S-PLUS is
sqrt(sum(log(eigen(solve(S2,S1))$values)^2)).
As a numerical example, take the two matrices
$$S_1 = \begin{pmatrix} 2.3 & -0.3 \\ -0.3 & 3.6 \end{pmatrix} \quad \text{and} \quad S_2 = \begin{pmatrix} 3.7 & 1.9 \\ 1.9 & 2.8 \end{pmatrix}.$$
Then the relative eigenvalues of $S_1$ and $S_2$, or equivalently, the
eigenvalues of $S_2^{-1} S_1$, are 2.636 and 0.460293, and the distance
$\|S_1, S_2\|_{\mathrm{cov}} = \|S_2, S_1\|_{\mathrm{cov}} = 1.24156$ (using natural logs, but the
base of the logarithm is not relevant as long as it is the same for
all pairs of distances).
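The same numerical example can be reproduced in Python with NumPy, mirroring the R expression given above. This is only an illustrative sketch; the variable names are chosen here and do not appear in the original code.

```python
import numpy as np

S1 = np.array([[2.3, -0.3], [-0.3, 3.6]])
S2 = np.array([[3.7, 1.9], [1.9, 2.8]])

# Relative eigenvalues of S1 with respect to S2: eigenvalues of inverse(S2) times S1.
rel_eigenvalues = np.sort(np.linalg.eigvals(np.linalg.solve(S2, S1)).real)[::-1]

# Distance between S1 and S2, and with the roles of the two matrices exchanged.
distance = np.sqrt(np.sum(np.log(rel_eigenvalues) ** 2))
rel_reversed = np.linalg.eigvals(np.linalg.solve(S1, S2)).real
distance_reversed = np.sqrt(np.sum(np.log(rel_reversed) ** 2))

print(rel_eigenvalues)               # approximately 2.636 and 0.460293
print(distance, distance_reversed)   # both approximately 1.24156
```

Exchanging the two matrices inverts the relative eigenvalues, which only flips the sign of their logarithms, so the distance is symmetric.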
To compute the relative eigenvalues, the covariance matrices
need to be of full rank and, to warrant stability of the result, there
should be many times more cases than variables. Furthermore, the
last principal components often represent measurement error or
other small random fluctuations that yet may have a considerable
impact on the distance measure, as it is based on ratios of variances
(relative eigenvalues). The covariance matrices should thus be
projected into the space of the first few principal components
(pooled within groups or for the whole dataset, not for each group
separately) that capture most of the variance in the data.
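One way to carry out this projection is sketched below in Python with NumPy, under the assumption that each group is available as an array with individuals in rows and variables in columns. The function name, the data layout, and the simulated example data are hypothetical; the pooled within-group covariance matrix is used to define the common principal components.

```python
import numpy as np

def project_covariances(groups, n_components):
    """Express each group's covariance matrix in the first pooled within-group PCs.

    groups: list of (n_i, p) data arrays, one per age stage or population
    (hypothetical layout). Returns one (n_components, n_components)
    covariance matrix per group, all in the same PC basis.
    """
    centered = [X - X.mean(axis=0) for X in groups]     # remove group means
    pooled = np.vstack(centered)
    dof = pooled.shape[0] - len(groups)
    pooled_cov = pooled.T @ pooled / dof                # pooled within-group covariance
    eigval, eigvec = np.linalg.eigh(pooled_cov)         # eigenvalues in ascending order
    basis = eigvec[:, ::-1][:, :n_components]           # leading PCs as columns
    return [basis.T @ np.cov(X, rowvar=False) @ basis for X in groups]

# Simulated data for illustration only: three groups, 200 cases, 6 variables.
rng = np.random.default_rng(1)
groups = [rng.normal(size=(200, 6)) * s for s in (1.0, 1.5, 2.0)]
projected = project_covariances(groups, n_components=3)
print([P.shape for P in projected])
```

Using a single basis for all groups keeps the reduced matrices comparable, and with many more cases than retained components they remain of full rank.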
To compute the principal coordinate ordination for k covariance
matrices $S_i$, $i = 1, \ldots, k$, let $V$ be the $k \times k$ matrix of squared
Riemannian distances with elements $v_{l,m} = \|S_l, S_m\|_{\mathrm{cov}}^2$. Calculate
the centered inner product matrix
$$D = -\tfrac{1}{2} H V H,$$
where $H = I - \frac{1}{k} \mathbf{1}\mathbf{1}^t$ is the symmetric $k \times k$ centering matrix with
diagonal elements $1 - 1/k$ and off-diagonal entries $-1/k$. The
principal coordinates of the covariance matrices are the scaled
eigenvectors $(\sqrt{\lambda_1} e_1, \sqrt{\lambda_2} e_2, \ldots, \sqrt{\lambda_{k-1}} e_{k-1})$ of the matrix $D$,
where $\lambda_i$ is the ith eigenvalue and $e_i$ the ith eigenvector (e.g.,
Mardia et al. 1979). The variance of the ith principal coordinate
equals $\lambda_i/(k - 1)$ so that $\lambda_1/\sum \lambda_i$ is the fraction of the total
variation of the reconstructed coordinates accounted for by the
first PCoord, and similarly for the subsequent dimensions.
The MATHEMATICA code for principal coordinate analysis is
given below, where the matrix of principal coordinates is con-
tained in pcoord
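V = Table[Norm[Log[Eigenvalues[{S[[i]],S[[j]]}]]]^2,{i,1,k},{j,1,k}];
H = IdentityMatrix[k] - Table[1/k,{k},{k}];
(* the centered inner product matrix D of the text; named Dc because D is a protected built-in symbol *)
Dc = -(1/2) * H.V.H;
{eval,evec} = Eigensystem[Dc];
(* row i of pcoord is the ith scaled eigenvector, i.e., the ith principal coordinate of all k matrices *)
pcoord = evec * Sqrt[eval];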
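For readers working outside MATHEMATICA, the steps described above can be sketched in Python with NumPy as follows. The function names and the tolerance used to discard numerically zero eigenvalues are assumptions of this sketch, and the example trajectory of diagonal matrices is artificial; it is chosen because its pairwise distances are exactly the differences in t, so the ordination should recover a single straight axis.

```python
import numpy as np

def cov_distance(S1, S2):
    """Square root of the summed squared log relative eigenvalues."""
    rel = np.linalg.eigvals(np.linalg.solve(S2, S1)).real
    return np.sqrt(np.sum(np.log(rel) ** 2))

def principal_coordinates(cov_matrices, tol=1e-10):
    """Return (coords, eigenvalues); row i of coords belongs to matrix i."""
    k = len(cov_matrices)
    V = np.array([[cov_distance(a, b) ** 2 for b in cov_matrices]
                  for a in cov_matrices])              # squared distances
    H = np.eye(k) - np.ones((k, k)) / k                # centering matrix
    D = -0.5 * H @ V @ H                               # centered inner products
    D = (D + D.T) / 2                                  # remove rounding asymmetry
    eigval, eigvec = np.linalg.eigh(D)
    eigval, eigvec = eigval[::-1], eigvec[:, ::-1]     # descending order
    keep = eigval > tol * np.abs(eigval).max()         # drop zero or negative axes
    coords = eigvec[:, keep] * np.sqrt(eigval[keep])   # scaled eigenvectors
    return coords, eigval

# Artificial straight trajectory: only the first variance changes with t.
trajectory = [np.diag([np.exp(t), 1.0]) for t in range(4)]
coords, eigval = principal_coordinates(trajectory)
explained = eigval[eigval > 0] / eigval[eigval > 0].sum()
print(coords)
print(explained)
```

Here the matrices are stored in the rows of coords, which is the transpose of the arrangement produced by the MATHEMATICA code.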
To represent the uncertainty around the estimated covari-
ance matrices, one may compute a bootstrap distribution for ev-
ery matrix and plot these distributions in the ordination or just
show equal frequency ellipsoids around the original covariance
matrices in the PCoord plot. As mentioned in the main text, an
ordination of a high-dimensional random walk (i.e., random and
independent changes over time) would produce a somewhat re-
peatable pattern of bends in the trajectory. To test against that
null model, at least informally, it is advisable to subject several
subsets of the data to the ordination analysis and observe whether
the bends still occur at the same points of the trajectory. Only
then is an interpretation justified in terms of common underlying
processes.
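A bootstrap distribution of a covariance matrix, as suggested at the start of the preceding paragraph, might be generated along the following lines in Python with NumPy. The function name, the number of replicates, and the simulated data are hypothetical; resampling whole individuals with replacement is assumed to be the intended scheme.

```python
import numpy as np

def bootstrap_covariances(X, n_boot=200, seed=0):
    """Covariance matrices computed from rows of X resampled with replacement."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    return [np.cov(X[rng.integers(0, n, size=n)], rowvar=False)
            for _ in range(n_boot)]

# Simulated sample for illustration only: 50 individuals, 3 variables.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
boots = bootstrap_covariances(X, n_boot=100)
print(len(boots), boots[0].shape)
```

The resampled matrices could then be ordinated together with the original ones to show the scatter around each point of the trajectory.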