
Measuring Shifts in Function and Evolutionary Opportunity Using Variability Profiles: A Case Study of the Globins

Gavin J. P. Naylor,1 Mark Gerstein2

1 Department of Zoology and Genetics, Iowa State University, Ames, IA 50011, USA
2 Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT 06520-8114, USA

Received: 16 September 1999 / Accepted: 19 May 2000

Abstract. Variability profiles measured over a set of aligned sequences can be used to estimate evolutionary freedom to vary. Differences in variability profiles between clades can be used to identify shifts in function at the molecular level. We demonstrate such a shift between the alpha and beta subunits of hemoglobin. We also show that the variability profiles for myoglobin are different between whales and primates and speculate that the differences between the two clades may reflect a shift associated with the novel oxygen storage demands in the lineage leading to whales. We discuss the relationship between sequence variability and “evolutionary opportunity” and explore the utility of Maynard Smith’s multidimensional evolutionary opportunity space metaphor for exploring functional constraints, genetic redundancy, and the context dependency of the genotype-phenotype map. This work has implications for quantitatively defining and comparing protein function. Supplementary data is available from bioinfo.mbb.yale.edu/align.

Key words: Hemoglobin — Molecular evolution — Functional shift

Proteins evolve through amino acid substitution. Some substitutions are neutral, or nearly so, and have little effect on protein function. Others are deleterious and are removed by natural selection. The extent to which substitutions are tolerated varies from site to site and region to region within a protein and reflects the degree of constraint. A region or site that is tightly constrained is less free to vary than one in which constraints are relaxed. Such differences in freedom to vary can be represented using Maynard Smith’s (1970) concept of a protein sequence space in which each site in an alignment is represented on its own axis, and the number of axes required to represent all conceivable variants for a protein is equal to the number of sites in its sequence. Each sequence occupies a unique point in this space; variants differing at one site are adjacent (Hamming) neighbors. The collection of all viable sequence variants for a particular protein forms a localized interconnected neighborhood of points within the space. This representation has proved conceptually intuitive and analytically powerful (Vingron and Sibbald 1993; Vingron and Waterman 1994; Huynen et al. 1996; Fontana and Schuster 1998; Bornberg-Bauer and Chan 1999; Wuchty et al. 1999). In this paper we explore the relationships between sequence variation, protein function, shift in function, and the process of evolutionary change within the context of the protein sequence space representation. A number of interesting insights emerge.
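The notion of adjacency in sequence space can be made concrete with a minimal sketch. It assumes the standard 20-letter amino acid alphabet and an arbitrary four-residue toy sequence; the function names `hamming` and `one_step_neighbors` are illustrative and do not come from any published tool. Two aligned sequences are neighbors when their Hamming distance is 1, so a sequence of n sites has 19n one-step neighbors, only some of which would be viable.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # standard 20-letter alphabet (assumption)


def hamming(a: str, b: str) -> int:
    """Number of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def one_step_neighbors(seq: str, alphabet: str = AMINO_ACIDS):
    """Yield every sequence that differs from seq at exactly one site."""
    for i, residue in enumerate(seq):
        for aa in alphabet:
            if aa != residue:
                yield seq[:i] + aa + seq[i + 1:]


seq = "MVLS"  # hypothetical toy sequence, not a real globin fragment
neighbors = list(one_step_neighbors(seq))
print(len(neighbors))                               # 4 sites x 19 alternatives
print(all(hamming(seq, n) == 1 for n in neighbors))
```

Which of these neighbors lie inside the viable neighborhood is an empirical question that the sketch does not address; it only enumerates the geometry.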

In protein sequence space, constraints are reflected in the multidimensional shape of the cluster of points that make up the neighborhood of variants viable for a specific protein. The boundary defining the edge of this neighborhood is characteristic of the protein’s function and can be thought of as its functional “signature.” Any sequence combination falling outside the boundary will fail to function. Over the course of evolution, mutation pressure drives sites that are free to vary to explore the opportunity space available to them. Different lineages explore different parts of this space. Given enough time, a radiation of evolutionary lineages will collectively explore all of the opportunity space available for a particular protein function.

Correspondence to: G.J.P. Naylor

J Mol Evol (2000) 51:223–233
DOI: 10.1007/s002390010084

© Springer-Verlag New York Inc. 2000

Sequence variants sampled from different points within a particular protein’s opportunity space can be obtained by sequencing DNA coding for that protein from a variety of organisms. If we align the sequence variants and estimate how many evolutionary changes have occurred for each residue in the alignment, we can determine the alignment’s overall variability profile. This profile reflects the protein’s constraints in much the same way as does the shape of its neighborhood in protein sequence space (Benner et al. 1994). Indeed, the two are related. The variability profile reflects a mutationally directed walk through the neighborhood and constitutes a historical sampling of the opportunity space. If variability profiles are characteristic of a protein’s function, then any shift in function should be reflected by a corresponding shift in the variability signature. It follows that variability signatures could be used to measure the functional divergence among protein families (Gu 1999). This has obvious implications for bioinformatics and comparative genomics where there is a pressing need to define and compare protein functions on a large scale (Riley 1997, 1998; Hegyi and Gerstein 1999; Jansen and Gerstein 2000; Mewes et al. 1998; Ashburner et al. 2000; Drawid and Gerstein 2000). One can envision applications where specially tailored metrics describing variability profiles are automatically compared among gene families for a wide variety of taxa.
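One way to estimate how many changes have occurred at each residue is to count the minimum number of substitutions implied by a fixed tree. The sketch below uses Fitch's small-parsimony counting on a rooted binary tree written as nested tuples. It is a simplified stand-in, not the PAUP*/MacClade procedure used in this study: it assumes a gap-free alignment, and the taxa `a`–`d`, their sequences, and the function names are hypothetical.

```python
def fitch_changes(tree, states):
    """Return (candidate_state_set, minimum_changes) for one alignment column.

    tree   -- a taxon name (leaf) or a 2-tuple (left_subtree, right_subtree)
    states -- dict mapping taxon name to the residue observed at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_changes(left, states)
    right_set, right_cost = fitch_changes(right, states)
    shared = left_set & right_set
    if shared:
        # Children can agree on a state: no extra change at this node.
        return shared, left_cost + right_cost
    # Children disagree: one additional change is required.
    return left_set | right_set, left_cost + right_cost + 1


def variability_profile(tree, alignment):
    """Per-site minimum number of changes for a gap-free alignment."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    profile = []
    for i in range(lengths.pop()):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        profile.append(fitch_changes(tree, column)[1])
    return profile


# Hypothetical four-taxon example.
tree = (("a", "b"), ("c", "d"))
alignment = {"a": "MKLV", "b": "MKIV", "c": "MRLV", "d": "MRLA"}
print(variability_profile(tree, alignment))  # [0, 1, 1, 1]
```

The resulting list of per-site counts is what is meant here by a variability profile; real analyses would also need to handle gaps, ambiguous residues, and uncertainty in the tree.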

In this paper we explore the use of variability profiles as reflections of functional constraints using sequences for the globin superfamily of proteins.

Case Study—The Globins

Descriptions of the Molecules

Functional hemoglobin is a tetramer made up of 2 alpha and 2 beta globin subunits (Fig. 1a). The alpha and beta subunits share a common ancestry due to ancient gene duplication. The two subunits have a high degree of sequence similarity and almost identical tertiary structures. Both are box-like and consist of a two-layered sandwich of eight alpha helices connected by turns (Fig. 1b). Some of the functionally important residues are indicated in Fig. 1c. When the four subunits are assembled into functional hemoglobin, there is little contact between the two alpha chains or between the two beta chains; however, there are several contacts between the pairs of unlike chains.

Myoglobin, like hemoglobin, is a member of the globin superfamily of proteins. It has a similar tertiary structure and exhibits a high degree of sequence similarity to the two hemoglobin subunits. Like hemoglobin, myoglobin is involved in binding oxygen. However, it differs in that it is used to store oxygen in muscle, rather than to transport it through the blood, and is not allosterically regulated like hemoglobin.

Materials and Methods

We obtained alpha and beta amino acid sequences for 20 different carnivore species, 16 ungulates, and 20 primates (Fig. 2) from public sequence databases. All 112 (56 × 2) sequences were simultaneously aligned using a combined sequence and structure alignment approach (Gerstein et al. 1994; Gerstein and Altman 1995; Gerstein and Levitt 1996, 1998); that is, “key” structures representing fairly divergent sequences were first aligned based on their three-dimensional structure coordinates. Then sequences were aligned to the structure to which they were most homologous. Six data subsets were isolated from the alignment: one Hb alpha and one Hb beta subset for each of the three mammalian groups. We used cladistic parsimony as implemented in the computer programs PAUP*4.0 (Swofford 1999) and MacClade 3.0 (Maddison and Maddison 1992) to infer the number of substitutions that had occurred at each site, for each of the three sets of mammalian taxa (Fig. 2). This information was used to generate a profile of inferred variability for each of the six data sets (three sets of taxa for two genes). The six resultant variability profiles were then compared using the coefficient of functional divergence theta (Gu 1999). Theta (\(\theta_\lambda\)) can be interpreted as the loss of rate correlation over sites between two homologous genes, or as the probability that the two homologous genes have shifted function.

\[ \theta_\lambda = 1 - r_\lambda \]

where \(r_\lambda\) is the coefficient of rate correlation measured over pairs of corresponding sites for the two variability profiles being contrasted. It is calculated thus:

\[ r_\lambda = \frac{\mathrm{Cov}(\lambda_1, \lambda_2)}{\sqrt{\mathrm{Var}(\lambda_1)\,\mathrm{Var}(\lambda_2)}} \]

where \(\mathrm{Var}(\lambda_1)\), \(\mathrm{Var}(\lambda_2)\), and \(\mathrm{Cov}(\lambda_1, \lambda_2)\) are the respective variances and covariance of evolutionary rates (number of changes inferred using parsimony) measured for each of the two variability profiles. For further details the reader is referred to Gu (1999).
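The formula as stated in this section can be sketched directly: theta is one minus the correlation between two per-site change-count profiles of equal length. The sketch below implements only that stated formula and makes no claim to reproduce the full statistical treatment or significance test in Gu (1999); the function names and the toy profiles are hypothetical.

```python
from statistics import mean


def rate_correlation(profile1, profile2):
    """Correlation of per-site change counts between two aligned profiles."""
    if len(profile1) != len(profile2) or len(profile1) < 2:
        raise ValueError("profiles must cover the same sites (at least two)")
    m1, m2 = mean(profile1), mean(profile2)
    # The 1/n factors in Cov and Var cancel, so plain sums are sufficient.
    cov = sum((a - m1) * (b - m2) for a, b in zip(profile1, profile2))
    var1 = sum((a - m1) ** 2 for a in profile1)
    var2 = sum((b - m2) ** 2 for b in profile2)
    if var1 == 0 or var2 == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (var1 * var2) ** 0.5


def theta(profile1, profile2):
    """Coefficient of functional divergence as written above: 1 - r."""
    return 1 - rate_correlation(profile1, profile2)


# Hypothetical profiles over four aligned sites.
same = theta([0, 1, 2, 3], [0, 1, 2, 3])      # identical constraint pattern
reversed_ = theta([0, 1, 2, 3], [3, 2, 1, 0])  # opposite constraint pattern
print(same, reversed_)
```

Identical profiles give a theta of 0, and profiles that are unrelated across sites give values near 1; under this plain formula, negatively correlated profiles yield values above 1.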

Results

Hemoglobin

Our results indicate that within the alpha subunit the variability profile is statistically similar for all three groups of mammals. The same is true for the beta subunit. However, the variability profiles associated with the alpha subunit are markedly distinct from those of the beta subunit. The coefficient of functional divergence \(\theta_\lambda\) between hemoglobin alpha and beta subunits was 0.36, significantly larger than 0 (p < 0.01). A structural model of hemoglobin colored to reflect the degree of change at each site is shown in Fig. 3.

As might be expected, and as can be seen in the variability plots, the match among different clades of organisms for the same protein subunit is not perfect. This is probably a consequence of the stochasticity of the substitution process and the restricted sampling of taxa (and therefore of the evolutionary opportunity space) used for each data set. Given these drawbacks, it is all the more remarkable that such clear-cut differences in variability profiles exist between the alpha and beta subunits (Fig. 2). It would appear from these results that variability signatures may indeed provide a powerful and sensitive way to represent the subtle but important differences in function that exist between closely related proteins, apparently even when they have highly similar tertiary structures, as is the case for the globins presented here.

Myoglobin

We aligned myoglobin sequences for 15 primates and 15 cetaceans and subjected them to the same procedure described above. In contrast to the situation seen for hemoglobins, the variability profiles for myoglobins were significantly different between the two orders of mammals (\(\theta_\lambda = 0.34\), p < 0.025; Fig. 4). Fewer sites appear free to vary in the cetaceans than in the primates (29 variable sites for cetaceans, 38 for primates), but the cetacean sites that are variable show a higher incidence of change than is the case for their primate counterparts (mean number of changes for cetaceans: 1.7, SD 0.79; for primates 1.3, SD 0.67). These differences are particularly pronounced in the region between sites 127–164 in the G and H helices (Fig. 4) where 16 out of the 38 sites show variation in the primates, but only 6 of the 38 show variation in the cetaceans. The fact that variable sites in cetaceans are fewer in number yet more prone to change indicates that myoglobin may be differently constrained in primates than it is in cetaceans. It might be argued that the whale and primate myoglobin variability plots differ not as a result of differential constraints but as a consequence of different taxon sampling schemes. Perhaps the 15 cetaceans radiated more recently than did the 15 primates? Though this could account for the fact that fewer sites show variation in cetaceans than in primates, it cannot account for the increased amount of per-site change for those sites that do show variation in the cetaceans. If differences in taxon sampling were the underlying cause for the differences, we would expect the clade with the fewest variable sites to also show the lowest amounts of per-site change (assuming a stochastic model of evolutionary change).

Fig. 1. Schematic giving an overview of the globins. (A) Schematic showing the orientation of the tetrameric form of hemoglobin. The two alpha subunits contact each of the beta subunits but do not come into contact with each other. (B) The structural arrangement of the eight alpha helices common to all globin monomers. The position of the heme is shown. Also indicated are the approximate positions of the three main exons in myoglobin: exon 1 extends from the N-terminus to B2, exon 2 from B3 to G6, exon 3 from G7 onward. (C) Some of the key residues in the globin fold are highlighted: F8 is the proximal His, which binds heme. E7 is the distal His, which contacts heme. Phe at CD1 and Leu at F4 are two other conserved heme contacts. Some intersubunit salt bridges switch between the R and T states. They are Tyr at alpha subunit C7 to Asp at beta subunit G1 in the T state. This becomes Asp at alpha subunit G1 to Asn at beta subunit G4 in the R state. In general the alpha subunit-1 to beta subunit-2 contacts are made by the FG and BC turns. The alpha subunit-1 to beta subunit-1 contacts are made by the G and H helices with the GH turn.
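The summary statistics quoted for the myoglobin comparison (number of variable sites, and the mean and SD of changes among those sites) follow directly from a variability profile. A minimal sketch is given below; it assumes the SD is the sample standard deviation taken over variable sites only, which the text does not specify, and the profile shown is a hypothetical toy example rather than the cetacean or primate data.

```python
from statistics import mean, stdev


def summarize_profile(profile):
    """Summarize a list of per-site change counts.

    Returns the number of variable sites (count > 0) and the mean and
    sample standard deviation of changes among those variable sites.
    """
    variable = [count for count in profile if count > 0]
    return {
        "n_variable": len(variable),
        "mean_changes": mean(variable) if variable else 0.0,
        # Sample SD is an assumption; it needs at least two variable sites.
        "sd_changes": stdev(variable) if len(variable) > 1 else 0.0,
    }


toy_profile = [0, 2, 0, 1, 3, 0]  # hypothetical change counts at six sites
summary = summarize_profile(toy_profile)
print(summary)
```

Comparing two clades then amounts to comparing these summaries side by side: a clade can have fewer variable sites and still show more change per variable site, which is the pattern reported for the cetaceans.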

It is enticing to suggest that the difference in variability profiles reflects a shift in function associated with the novel oxygen storage demands of sustained deep diving in cetaceans. Unfortunately, we cannot determine from the data whether it is the cetaceans or the primates (or both) that have shifted function. Myoglobin sequences from other orders of mammals will help clarify this. At present there are insufficient myoglobin sequences in publicly available databases to determine this unequivocally. We emphasize that inferences about functional divergence based on such variability profiles cannot substitute for careful comparative assays of the biochemical properties of the gene products such as those reviewed in Romero-Herrera et al. (1978) and Perutz (1983). However, the comparison of variability profiles between paralogous sequences can provide a powerful and informative initial step toward understanding functional divergence. Information gleaned from such comparisons can be used to guide the choice of functionally important “candidate sites” for subsequent experimental verification using site-directed mutagenesis, circumventing the need for random mutagenesis.

Discussion

In the following sections we discuss how variability profiles can be related to “evolutionary opportunity” and discuss the broader implications within the context of the protein sequence space representation. We speculate how drift and selection might interact with the underlying genetic architecture to shape molecular evolutionary change.

In the majority of cases proteins serving different functions will reside in different parts of protein sequence space. Neutral sequence variants associated with a particular protein function will describe an opportunity space that is available for local exploration through stochastic (passive diffusion) processes. In the hemoglobin examples presented, we see similar patterns of variability within each of the subunits for three groups of mammals, but different patterns of variability between the two subunits. This suggests that there is an opportunity space associated with the alpha subunit that is distinct from that of the beta subunit. We conjecture that groups of distinct but related neutral neighborhoods that correspond to alleles of different fitness for a particular protein function are aggregated into clusters. Corridors of viability bring the different neutral neighborhoods within a cluster of alleles into close proximity such that single mutational steps can occasionally provide entry points to alleles of different fitness (new phenotypes) (Huynen et al. 1996). If a particular phenotype confers a selective advantage its frequency in the population will likely increase.

Fig. 2. Variability profiles for hemoglobin alpha and hemoglobin beta for three mammalian groups (carnivores, ungulates, and primates). Profiles were determined using the tree topologies shown; the number of changes implied for each site being determined by parsimony. Plots from all six data sets are shown in register to facilitate direct comparison. The domains marked A to F correspond to the alpha helices A–F in Fig. 1. Note that differences in variability profiles between alpha and beta sequences do not hinge on the phylogenies being correct as variability profiles for both alpha and beta sequences are based on identical tree topologies. Comparable numbers of taxa were used for each group in an effort to provide equivalent sampling of the neutral space for each of the different clades. Variability profiles are clearly different between the alpha and beta subunits but are similar within subunits for each of the three groups.

Fig. 3. Ball and stick models of hemoglobin colored to reflect the degree of change at each site. Variability profiles from the six data sets in Fig. 2 were plotted separately. Invariant residues are shown in gray, those with one or two changes in green, those with three or four changes in orange, and those with five or more changes in red. This figure gives a three-dimensional structural context to the variability plots shown in Fig. 2. Note that the evolutionary freedom to vary, while similar among the different mammalian groups for each hemoglobin subunit, does not show perfect correspondence at the level of individual residues. Instead the patterns of variability suggest a correspondence at the level of particular subregions of the molecule (along particular surfaces of certain helices, for example).

Fig. 4. Variability profiles for myoglobin contrasting primates and cetaceans. Profiles were determined using the tree topologies shown. The relative position of each residue is depicted on the x axis. The domains marked A to F correspond to the alpha helices A–F in Fig. 1. Myoglobins show distinctly different patterns of variability between primates and cetaceans, suggesting that the constraint profiles differ between the two groups. The coefficient of functional divergence between the primates and the whales was 0.34 (p < 0.025).

We envisage that the shape of a neutral space can change with context. As context shifts, part of the space can become “out of bounds” (no longer neutral) while new, previously “forbidden” space can become neutral and available for exploration. In such a scenario, evolution would involve not only movement along predefined corridors but also a change in the opportunity space itself through context-sensitive contraction and expansion of the corridors themselves. The result would be a dynamically changing context-sensitive opportunity space for evolutionary experimentation and a perpetually changing or “restless” genotype-phenotype map (Wagner and Altenburg 1996). There are many factors that can affect context. At one level, intrinsic changes in the freedom to vary of sites within a particular protein can be brought about by an influential substitution elsewhere in the protein as described by the covarion model of Fitch (1971). At another level, interactions among proteins that either enhance function or share the burden of a function can “open up” the neutral space. For example, built-in redundancies in metabolic pathways that foster architectural resilience could, in principle, render more of the protein space effectively neutral and thus available for exploration.

Fig. 5. Ball and stick models of myoglobin colored to reflect the degree of change at each site. Variability profiles from the data sets in Fig. 4 were plotted separately. Invariant residues are shown in gray, those with one change in green, those with two changes in orange, and those with three or more changes in red. (Note this is a different scale than used in Fig. 3.) This figure gives a three-dimensional structural context to the variability plots shown in Fig. 4.


Complexity and Robustness

As systems become more complex, the number of ways to solve a task increases, which, in turn, leads to more evolutionary opportunity. The idea that evolutionary opportunity increases with complexity can seem superficially counterintuitive because we tend to think of complex systems as sensitive to perturbation. This sentiment is reflected in the most recent edition of Futuyma’s Evolutionary Biology text: “The greater the number and degree of functional integration of interacting parts, the more stringent constraints on evolution are likely to be, and the rarer will be evolutionary ‘breakthroughs’ to new organismal designs” (Futuyma 1998, 684). This is certainly the case for mechanical devices whose subcomponents are designed to work in an additive fashion without built-in redundancy. Such devices have no “neutral space.” However, while small perturbations will quickly bring mechanical devices to a grinding halt, biological systems show considerable resilience. The resilience fostered by neutral space can be thought of as a means to hedge one’s bets in a changing environment while simultaneously providing a platform from which to explore the available space for improved fitness configurations. It is important to recognize that no intentional design is implied by such an architecture. The enhanced opportunity to explore new fitness configurations is merely a consequence of the resilience itself. Such robustness has been documented in biochemical pathways (Barkai and Leibler 1997; Glas et al. 1998) and has been suggested to occur in brain development (Fritzsch 1995) and also in ecosystems (Naeem and Li 1997).

Other Insights

The protein sequence space representation can provide insight into the acquisition of new protein functions and by extension the evolutionary origin of novel phenotypes. As stated previously, proteins that carry out different functions are likely centered in different parts of the sequence space. However, their context-sensitive and dynamically changing neutral neighborhoods may occasionally come into close proximity. If neighborhoods representing different functions come into adjacency, the potential exists for mutational “jumps” that traverse functions (Huynen et al. 1996). There is mounting empirical evidence that this may be more common than previously suspected (reviewed in Golding and Dean 1998). In general, gene duplication has been put forward as the most plausible mechanism to account for the acquisition of new protein function (Walsh 1995). It need not be the only mechanism. A protein could shift function without undergoing gene duplication if it were part of a resilient and robust network whose elements continued to function effectively in its absence (Barkai and Leibler 1997). This sort of redundancy would “open up” neutral space (in the same way as would gene duplication), increasing the likelihood that neighborhoods associated with different functions came into adjacency.

The sequence space construct presented here provides a representation in which neutrality and robustness foster both architectural resilience for the stability of an existing function and increased evolutionary opportunity for innovation. There are parallels between the presented model and the holey adaptive landscape put forward by Gavrilets (1997). In Gavrilets’s landscape there is an emphasis, after Dobzhansky (1937), on “ridges of well fit genotypes that extend through the genotype space.” These ridges connect clusters of viable genotypes in the genotype space. Evolution proceeds as a percolation through the nexus of connected components. One key difference between the two models is that in our representation, most of the evolutionary change occurs along neutral corridors where differences in sequence do not result in corresponding differences in structure (i.e., phenotype) (Chothia and Gerstein 1997). In Gavrilets’s landscape, the ridges of viable genotypes have phenotypes that are intermediate in form. Gavrilets’s landscape may be a better description of sequence evolution that is driven by selection rather than neutrality (Gillespie 1991).

Just as the concept of a neutral neighborhood can reconcile evolutionary stability (resilience) with evolutionary change (innovation), it is also consistent with the generally reliable phylogenetic performance of molecular sequence data and its occasional failures. Molecular data will tend to be reliable when a neutral neighborhood is both large and stationary in protein sequence space. Under these conditions equally viable (neutral) variants of a protein arise at different points along an evolutionary trajectory and are passed on from ancestor to descendant lineages. The process is essentially one of passive diffusion, transparent to the distorting effects of natural selection that can cause character distributions to be phylogenetically misleading. The evolutionary branching pattern leaves an unbiased trace in the distribution of different neutral variants among the terminal taxa. By contrast, when the neutral neighborhood is nonstationary over a tree, sequence data can be phylogenetically misleading because sites run the risk of being free to vary in one lineage, but not in another. Such conditions promote among-lineage rate heterogeneity and highly skewed character distributions among terminal taxa, both of which are known to be problematic for phylogenetic inference (Pesole et al. 1995; Naylor and Brown 1998; Sullivan and Swofford 1997).

Conclusions

We have presented data for hemoglobin indicating that the respective functions of both its alpha and beta subunits have remained static in three different groups of mammals for the past 40 million years. We have contrasted this with data that suggests myoglobin may have shifted its function in cetaceans relative to primates over a similar time frame. We have expanded an idea originally introduced by Maynard Smith in 1970 and have speculated how it might shed light on topics as diverse as biological resilience, evolutionary opportunity, the origin of evolutionary novelty, and the reasons for the general success and occasional failures of molecular phylogenetic inference. However, the empirical evidence we present is based on a mere 140 globin sequences. There is a need for a broader and denser sampling of related protein sequences across the diversity of life to better test some of the presented ideas. We look forward to more complete answers to some of these questions as technologies for rapid DNA sequencing improve.

Acknowledgments. We are grateful to Dan Ashlock, Homayoun Bagheri, Mike Charleston, Olivier Fedrigo, Xun Gu, Fred Janzen, Junhyong Kim, Gunter Wagner, Yufeng Wang, and Steve Willson for discussion and to Elizabeth Knurek for editorial improvement of the text. GN acknowledges support from NSF Grant #DEB9707145, MG the Donaghue and Keck Foundations.

Appendix

Globin sequences compared in this study can be found in SwissProt.

HBA_CERSI Ceratotherium simum
HBA_EQUHE Equus hemionus
HBA_HORSE Equus caballus
HBA_RHIUN Rhinoceros unicornis
HBA_ALCAA Alces alces
HBA_BISBO Bison bonasus
HBA_BOSGA Bos gaurus
HBA_BOVIN Bos taurus
HBA_CAMDR Camelus dromedarius
HBA_CAPHI Capra hircus
HBA_HIPAM Hippopotamus amphibius
HBA_LAMGL Lama glama
HBA_ODOVI Odocoileus virginianus
HBA_PIG Sus scrofa
HBA_RANTA Rangifer tarandus
HBA_TRAST Tragelaphus strepsiceros
HBB_CERSI Ceratotherium simum
HBB_EQUHE Equus hemionus
HBB_HORSE Equus caballus
HBB_RHIUN Rhinoceros unicornis
HBB_ALCAA Alces alces
HBB_BISBO Bison bonasus
HBB_BOSGA Bos gaurus
HBB_BOVIN Bos taurus
HBB_CAMDR Camelus dromedarius
HBB_HIPAM Hippopotamus amphibius
HBB_LAMGL Lama glama
HBB_ODOVI Odocoileus virginianus
HBB_PIG Sus scrofa
HBB_RANTA Rangifer tarandus
HBB_SHEEP Ovis aries
HBB_TRAST Tragelaphus strepsiceros
HBA_AILFU Ailurus fulgens
HBA_AILME Ailuropoda melanoleuca
HBA_CANFA Canis familiaris
HBA_CROCR Crocuta crocuta
HBA_FELCA Felis silvestris
HBA_LEPWE Leptonychotes weddelli
HBA_LUTLU Lutra lutra
HBA_LYNLY Lynx lynx
HBA_MELCA Mellivora capensis
HBA_MELME Meles meles
HBA_MUSLU Mustela lutreola
HBA_ODORO Odobenus rosmarus

HBA_PAGLA Paguma larvata
HBA_PANLE Panthera leo
HBA_PHOVI Phoca vitulina
HBA_PROCR Proteles cristatus
HBA_PROLO Procyon lotor
HBA_PTEBR Pteronura brasiliensis
HBA_URSMA Thalarctos maritimus
HBA_VULVV Vulpes vulpes
HBB_AILFU Ailurus fulgens
HBB_AILME Ailuropoda melanoleuca
HBB_CANFA Canis familiaris
HBB_CROCR Crocuta crocuta
HBB_FELCA Felis silvestris
HBB_LEPWE Leptonychotes weddelli
HBB_LUTLU Lutra lutra
HBB_LYNLY Lynx lynx
HBB_MELCA Mellivora capensis
HBB_MELME Meles meles
HBB_MUSLU Mustela lutreola
HBB_ODORO Odobenus rosmarus
HBB_PAGLA Paguma larvata
HBB_PANLE Panthera leo
HBB_PHOVI Phoca vitulina
HBB_PROCR Proteles cristatus
HBB_PROLO Procyon lotor
HBB_PTEBR Pteronura brasiliensis
HBB_URSMA Thalarctos maritimus
HBB_VULVV Vulpes vulpes
HBA_ATEGE Ateles geoffroyi
HBA_CALAR Callithrix argentata
HBA_CEBAP Cebus apella
HBA_CERAE Cercopithecus aethiops
HBA_CERTO Cercocebus torquatus
HBA_COLBA Colobus badius
HBA_EULFU Eulemur fulvus
HBA_GORGO Gorilla gorilla
HBA_HUMAN Homo sapiens
HBA_LORTA Loris tardigradus
HBA_LEMVA Lemur varecia
HBA_MACMU Macaca mulatta
HBA_MANSP Mandrillus sphinx
HBA_NYCCO Nycticebus coucang
HBA_PAPCY Papio hamadryas
HBA_TARSY Tarsius syrichta
HBA_PREEN Presbytis entellus
HBA_SAGFU Saguinus fuscicollis
HBA_THEGE Theropithecus gelada
HBA_GALCR Galago crassicaudatus
HBB_ATEGE Ateles geoffroyi
HBB_CALAR Callithrix argentata
HBB_CEBAP Cebus apella
HBB_CERAE Cercopithecus aethiops
HBB_CERTO Cercocebus torquatus
HBB_COLBA Colobus badius
HBB_EULFU Eulemur fulvus
HBB_GORGO Gorilla gorilla
HBB_HUMAN Homo sapiens
HBB_LORTA Loris tardigradus
HBB_LEMVA Lemur varecia
HBB_MACMU Macaca mulatta
HBB_MANSP Mandrillus sphinx
HBB_NYCCO Nycticebus coucang
HBB_PAPCY Papio hamadryas
HBB_PREEN Presbytis entellus
HBB_SAGFU Saguinus fuscicollis
HBB_THEGE Theropithecus gelada
HBB_TARSY Tarsius syrichta
HBB_GALCR Galago crassicaudatus


MYG_BALAC Balaenoptera acutorostrata
MYG_BALPH Balaenoptera physalus
MYG_ESCGI Eschrichtius robustus
MYG_GLOME Globicephala melas
MYG_INIGE Inia geoffrensis
MYG_KOGSI Kogia simus
MYG_MEGNO Megaptera novaeangliae
MYG_MESCA Mesoplodon carlhubbsi
MYG_ORCOR Orcinus orca
MYG_PHYCA Physeter catodon
MYG_TURTR Tursiops truncatus
MYG_ZIPCA Ziphius cavirostris
MYG_PHOPH Phocoena phocoena
MYG_CALJA Callithrix jacchus
MYG_CEBAP Cebus apella
MYG_GALCR Galago crassicaudatus
MYG_GORBE Gorilla gorilla
MYG_HUMAN Homo sapiens
MYG_HYLAG Hylobates agilis
MYG_LAGLA Lagothrix lagotricha
MYG_LEPMU Lepilemur mustelinus
MYG_MACFA Macaca fascicularis
MYG_NYCCO Nycticebus coucang
MYG_PANTR Pan troglodytes
MYG_PAPAN Papio hamadryas
MYG_PERPO Perodicticus potto
MYG_PONPY Pongo pygmaeus
MYG_SAISC Saimiri sciureus

References

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25(1):25–29

Barkai N, Leibler S (1997) Robustness in simple biochemical networks. Nature 387:913–917

Benner SA, Badcoe I, Cohen MA, Gerloff DL (1994) Bona fide prediction of aspects of protein conformation. Assigning interior and surface residues from patterns of variation and conservation in homologous protein sequences. J Mol Biol 235:926–958

Bornberg-Bauer E, Chan H-S (1999) Modelling evolutionary landscapes: mutational stability, topology, and superfunnels in sequence space. Proc Natl Acad Sci USA 96:10689–10694

Chothia C, Gerstein M (1997) Protein evolution. How far can sequences diverge? Nature 385:579–581

Dobzhansky TH (1937) Genetics and the origin of species. Columbia University Press, New York, NY

Drawid A, Gerstein M (2000) A Bayesian system integrating expression data with sequence patterns for localizing proteins: comprehensive application to the yeast genome. J Mol Biol 360:1077–1093

Fitch WM (1971) Rate of change of concomitantly variable codons. J Mol Evol 1:84–96

Fontana W, Schuster P (1998) Continuity in evolution: on the nature of transitions. Science 280:1451–1455

Fritzsch B (1995) Evolution of the ancestral vertebrate brain. In Arbib MA (ed) The handbook of brain theory and neural networks. MIT Press, Cambridge, MA

Futuyma D (1998) Evolutionary biology, 3rd ed. Sinauer Associates, Sunderland, MA

Gavrilets S (1997) Evolution and speciation on holey adaptive landscapes. Trends Ecol Evol 12(8):307–312

Gerstein M, Altman R (1995) Average core structures and variability measures for protein families: application to the immunoglobulins. J Mol Biol 251:161–175

Gerstein M, Levitt M (1996) Using iterative dynamic programming to obtain accurate pair-wise and multiple alignments of protein structures. In Proc. Fourth Int. Conf. on Intell. Sys. Mol. Biol., AAAI Press, Menlo Park, CA, pp 59–67

Gerstein M, Levitt M (1998) Comprehensive assessment of automatic structural alignment against a manual standard, the Scop classification of proteins. Prot Sci 7:445–456

Gerstein M, Sonnhammer E, Chothia C (1994) Volume changes on protein evolution. J Mol Biol 236:1067–1078

Gillespie J (1991) The causes of molecular evolution. Oxford University Press, Oxford

Glas R, Bogyo M, McMaster J, Gaczynska M, Ploegh H (1998) A proteolytic system that compensates for loss of function. Nature 392:618–620

Golding GB, Dean AM (1998) The structural basis of molecular adaptation. Mol Biol Evol 15:355–369

Gu X (1999) Statistical methods for testing functional divergence after gene duplication. Mol Biol Evol 16(12):1664–1674

Hegyi H, Gerstein M (1999) The relationship between protein structure and function: a comprehensive survey with application to the yeast genome. J Mol Biol 288:147–164

Huynen MA, Stadler PF, Fontana W (1996) Smoothness within ruggedness: the role of neutrality in adaptation. Proc Natl Acad Sci USA 93:397–401

Jansen R, Gerstein M (2000) Analysis of the yeast transcriptome with broad structural and functional categories: characterizing highly expressed proteins. Nucleic Acids Res 28:1481–1488

Maddison WP, Maddison DR (1992) MacClade: analysis of phylogeny and character evolution, version 3.0. Sinauer Associates, Sunderland, MA

Maynard-Smith J (1970) Natural selection and the concept of a protein space. Nature 225:563–564

Mewes HW, Hani J, Pfeiffer F, Frishman D (1998) MIPS: a database for protein sequences and complete genomes. Nucleic Acids Res 26:33–37

Naeem S, Li S (1997) Biodiversity enhances ecosystem reliability. Nature 390:507

Naylor GJP, Brown WM (1998) Amphioxus mitochondrial DNA, chordate phylogeny, and the limits of inference based on comparisons of sequences. Syst Biol 47(1):61–76

Perutz MF (1983) Species adaptation in a protein molecule. Mol Biol Evol 1:1–28

Pesole G, Dellisanti G, Preparata G, Saccone C (1995) The importance of base composition in the correct assessment of genetic distance. J Mol Evol 41:1124–1127

Riley M (1997) Genes and proteins of Escherichia coli K-12 (GenProtEC). Nucleic Acids Res 25:51–52

Riley M (1998) Systems for categorizing functions of gene products. Curr Opin Struct Biol 8:388–392

Romero-Herrera AE, Lehmann H, Joysey KA, Friday AE (1978) On the evolution of myoglobin. Phil Trans R Soc Lond B Biol Sci 283(995):61–163

Sullivan J, Swofford DL (1997) Are guinea pigs rodents? The importance of adequate models in molecular phylogenetics. J Mammal Evol 4(2):77–86

Swofford DL (1999) PAUP* 4.0b2: phylogenetic analysis using parsimony (*and other methods). Sinauer Associates, Sunderland, MA

Vingron M, Sibbald PR (1993) Weighting in sequence space: a comparison of methods in terms of generalized sequences. Proc Natl Acad Sci USA 90:8777–8781

Vingron M, Waterman MS (1994) Sequence alignment and penalty choice. Review of concepts, case studies and implications. J Mol Biol 235:1–12

Wagner GP, Altenberg L (1996) Complex adaptations and the evolution of evolvability. Evolution 50(3):967–976

Walsh JB (1995) How often do duplicated genes evolve new functions? Genetics 139:421–428

Wuchty S, Fontana W, Hofacker I, Schuster P (1999) Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers 49(2):145–165
