

Cellular crowding imposes global constraints on the chemistry and evolution of proteomes

Emmanuel D. Levy a,b,c,1, Subhajyoti De a,d,e, and Sarah A. Teichmann a,1

a Medical Research Council Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom; b Département de Biochimie, Université de Montréal, Montréal, QC, Canada H3T 1J4; c Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel; d Department of Medicine, University of Colorado School of Medicine, Aurora, CO 80045; and e Molecular Oncology Program, University of Colorado Cancer Center, Aurora, CO 80045

Edited* by Ken A. Dill, Stony Brook University, Stony Brook, NY, and approved October 30, 2012 (received for review June 1, 2012)

In living cells, functional protein–protein interactions compete with a much larger number of nonfunctional, or promiscuous, interactions. Several cellular properties contribute to avoiding unwanted protein interactions, including regulation of gene expression, cellular compartmentalization, and high specificity and affinity of functional interactions. Here we investigate whether other mechanisms exist that shape the sequence and structure of proteins to favor their correct assembly into functional protein complexes. To examine this question, we project evolutionary and cellular abundance information onto 397, 196, and 631 proteins of known 3D structure from Escherichia coli, Saccharomyces cerevisiae, and Homo sapiens, respectively. On the basis of amino acid frequencies in interface patches versus the solvent-accessible protein surface, we define a propensity or “stickiness” scale for each of the 20 amino acids. We find that the propensity to interact in a nonspecific manner is inversely correlated with abundance. In other words, high-abundance proteins have less sticky surfaces. We also find that stickiness constrains protein evolution, whereby residues in sticky surface patches are more conserved than those found in nonsticky patches. Finally, we find that the constraint imposed by stickiness on protein divergence is proportional to protein abundance, which provides mechanistic insights into the correlation between protein conservation and protein abundance. Overall, the avoidance of nonfunctional interactions significantly influences the physico-chemical and evolutionary properties of proteins. Remarkably, the effects observed are consistently larger in E. coli and S. cerevisiae than in H. sapiens, suggesting that promiscuous protein–protein interactions may be freer to accumulate in the human lineage.

promiscuity | protein structure | interaction potential

The interior of cells is a highly crowded environment where proteins continuously encounter each other (1). Thus, for cells to function properly, it is important that casual encounters do not outweigh functional ones. Statistically, the competition from nonfunctional interactions should be severe (2–4), given that the huge number of possible interactions far outweighs the comparatively small number of functional interactions: the Escherichia coli proteome contains about 4,200 proteins, yielding over 8,000,000 potential distinct pairwise interactions. Eukaryotic proteomes are even larger and require additional mechanisms to minimize the impact of nonfunctional interactions (3, 5, 6). For example, Zhang et al. showed that, in Saccharomyces cerevisiae, the average concentration of coexpressed and colocalized proteins is close to the upper tolerable limit (3), implying that compartmentalization of proteins in time and space was crucial to allow the expansion of eukaryotic protein repertoires.

In addition to cellular mechanisms such as compartmentalization and regulation of protein abundance, shown to be important for intrinsically unstructured proteins, for example (7), specific physico-chemical properties contribute to minimizing nonfunctional protein–protein interactions (PPIs). This has been observed within the protein core (8) and within interface patches (9), which, due to their hydrophobic character, have a potential to mediate nonfunctional interactions. Pechmann et al. showed that interface regions are often aggregation-prone but protected by strategically placed disulfide bonds and salt bridges (9). Such aggregation-prone regions have also been shown to be less frequent among highly expressed proteins, which, according to the law of mass action, are potentially more deleterious to the cell than lowly expressed proteins (10). Importantly, in these studies aggregation is measured along the protein sequence and therefore reflects the potential for aggregation of the unfolded state.

Most previous studies have highlighted “negative-design” principles at known binding regions (9) or examined nonfunctional interactions through aggregation (10–13). In contrast, here we concentrate on the surface regions of proteins in their folded state. Specifically, we ask if the folded state of proteins is evolutionarily constrained by nonfunctional interactions. This means, in particular, that we consider surface residues but not amino acids buried in the protein core, as these cannot be involved in protein–protein interactions. In a molecular evolution-oriented study, Yang et al. recently observed that such surface-specific evolutionary constraints exist in yeast (14). Here we present a complementary analysis that places the emphasis on the physico-chemical properties of proteins associated with constraints from nonfunctional interactions and describe these properties in two additional species to better cover the tree of life. We thus assembled three datasets of proteins of known structure in their biological state (“biological unit”), resulting in 397, 196, and 631 proteins for E. coli, S. cerevisiae, and Homo sapiens, respectively.
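The figure of over 8,000,000 potential pairwise interactions quoted for E. coli can be checked with a short calculation. The worked arithmetic below assumes that "distinct pairwise interactions" means unordered pairs of different proteins (self-pairs excluded) and takes the proteome size as exactly 4,200; both are simplifying assumptions, not statements from the paper.

```latex
\binom{4200}{2} \;=\; \frac{4200 \times 4199}{2} \;=\; \frac{17\,635\,800}{2} \;=\; 8\,817\,900
```

Including homomeric pairs would add 4,200 more combinations, which does not change the order of magnitude.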

Author contributions: E.D.L., S.D., and S.A.T. designed research; E.D.L. and S.D. performed research; E.D.L., S.D., and S.A.T. analyzed data; and E.D.L. and S.A.T. wrote the paper.

The authors declare no conflict of interest.

*This Direct Submission article had a prearranged editor.

Freely available online through the PNAS open access option.

Data deposition: The data processed in this paper are available at: www.tinyurl.com/structuralregions.

1 To whom correspondence may be addressed. E-mail: [email protected] or [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1209312109/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1209312109 PNAS Early Edition | 1 of 6

BIOPHYSICS AND COMPUTATIONAL BIOLOGY

Results

Defining an Interaction Propensity Scale. To investigate the impact of promiscuous interactions, we first define an interaction propensity scale to use as a proxy for an amino acid “stickiness” scale. We derive this scale purely from structural data by taking the log ratio of amino acid frequencies observed at the protein surface versus in protein–protein interfaces, as previously defined (15–17) and as illustrated in Fig. 1B. As we consider protein structures in terms of biological units, surface amino acids as defined here are not involved in interfacial protein–protein contacts in the crystal structure. This scale thus reflects a trade-off between the probability of finding a given amino acid in a solvated environment versus the residue being involved in an interaction with another protein. For example, lysine is frequent at the surface (∼15% of amino acids) but rare in interface core regions (<5% of amino acids), which makes it an interaction-resistant or “nonsticky” amino acid (17). We used only E. coli proteins to derive this scale, but our conclusions are not dependent on the organism used because the scales based on S. cerevisiae and H. sapiens proteins are almost identical to that of E. coli (R_coli-yeast = 0.94, R_coli-human = 0.97; Fig. S2).
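As a rough illustration of how such a scale could be computed, the sketch below takes residue counts for interface and surface regions and returns the log ratio of the two frequencies for each amino acid. The counts are hypothetical and chosen only so that lysine mirrors the example in the text (about 15% of surface residues, 5% of interface residues). The function name `interface_propensity` and the use of the natural logarithm are assumptions, since the text does not state the base of the logarithm.

```python
import math


def interface_propensity(interface_counts, surface_counts):
    """Return log(freq_interface / freq_surface) for each amino acid.

    Both arguments map one-letter amino acid codes to residue counts.
    Assumes every amino acid has a nonzero count in both regions.
    """
    n_interface = sum(interface_counts.values())
    n_surface = sum(surface_counts.values())
    scale = {}
    for aa in surface_counts:
        f_interface = interface_counts[aa] / n_interface
        f_surface = surface_counts[aa] / n_surface
        # Natural log is an assumption; any base gives the same ordering.
        scale[aa] = math.log(f_interface / f_surface)
    return scale


# Hypothetical counts for a three-letter toy alphabet (not data from the paper).
interface_counts = {"K": 50, "L": 150, "A": 800}
surface_counts = {"K": 150, "L": 50, "A": 800}

scale = interface_propensity(interface_counts, surface_counts)
for aa, value in sorted(scale.items(), key=lambda item: item[1]):
    print(f"{aa}: {value:+.2f}")
```

In this toy example lysine receives a negative score (nonsticky), leucine a positive one (sticky), and alanine a score of zero because its frequency is the same in both regions.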

Chemical Constraints on Surfaces of Highly Abundant Proteins. Nonfunctional interactions are, on average, detrimental to fitness because they sequester interaction partners (18). According to the law of mass action, the number of nonfunctional interactions that a protein participates in should be proportional to its abundance (19). Therefore, an abundant protein with a sticky surface is expected to be more deleterious than a low-abundance protein with the same surface stickiness. If cellular crowding and its associated promiscuous interactions were a constraint in cellular systems, we would expect an anticorrelation between protein surface stickiness and protein abundance. We quantified the stickiness of a protein surface as the average of interface-propensity scores, thus reflecting the tendency of its solvent-accessible residues to interact with other protein surfaces (Fig. 1). For all three organisms, we used all of the available experimental data on protein abundance provided by the PaxDb database (http://pax-db.org) (20). These values are linearly proportional to protein copy numbers in cells.

Plotting surface stickiness against protein abundance reveals a significant anticorrelation in all three organisms studied (p_coli = 9 × 10^−10, p_yeast = 7 × 10^−7, p_human = 2 × 10^−5; Fig. 1C; these and subsequent P values associated with correlations were calculated using the F-statistic obtained by analysis of variance of the linear association between abundance and stickiness). However, the

[Figure 1 graphic. (A) Projection of protein abundance and evolutionary information onto structures, with regions labeled surface, protein interface, and interior. (B) Amino acid interface propensity, log(freqAA_interface / freqAA_surface), used as a proxy for an amino acid “stickiness” scale; amino acids are ordered K, E, D, Q, N, P, G, R, A, T, S, H, V, L, W, Y, M, I, C, F on an axis from −1.0 to 1.0. (C) Protein abundance (A.U.) versus surface stickiness: E. coli (397 structures), R = −0.48, p = 9.4e−10; S. cerevisiae (196 structures), R = −0.36, p = 7.2e−07; H. sapiens (631 structures), R = −0.25, p = 2.0e−05. (D) Protein abundance (A.U.) versus interior stickiness: E. coli, R = −0.08, p = 0.4; S. cerevisiae, R = −0.26, p = 0.00018; H. sapiens, R = −0.07, p = 0.33.]

Fig. 1. The solvent-accessible surfaces of high-abundance proteins are enriched in nonsticky amino acids compared with low-abundance proteins. (A) Illustration of the approach taken in this study. (B) We first define a stickiness scale for each amino acid using its interface propensity. The propensity is defined by the log ratio of amino acid frequencies at interfaces versus surfaces. The definition of the structural regions used is explained in more detail in Fig. S1. (C and D) We calculate a stickiness score by averaging interface propensity scores of residues in the region considered (surface or interior). We then plot this score against the abundance of the protein and indicate the Spearman rank correlation coefficients of the relationships, as well as the P value associated with the linear association obtained by analysis of variance. The contour lines mark the 2/6, 3/6, 4/6, and 5/6 percentile of the density function range.
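The scoring and correlation steps described in the caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the stickiness values, the protein names, the surface sequences, and the abundances are all invented, and the helper names (`stickiness_score`, `average_ranks`, `spearman`) are hypothetical. Spearman's coefficient is computed here as the Pearson correlation of average ranks, using only the standard library.

```python
def stickiness_score(residues, scale):
    """Average interface propensity over the residues of one region."""
    return sum(scale[aa] for aa in residues) / len(residues)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical stickiness values and toy proteins (not data from the paper).
scale = {"K": -1.0, "E": -0.8, "A": 0.0, "L": 0.6, "F": 1.0}
surface_residues = {"P1": "KKEEA", "P2": "KEAAL", "P3": "AALLF", "P4": "ALLFF"}
abundance = {"P1": 2000, "P2": 500, "P3": 50, "P4": 10}

names = sorted(surface_residues)
scores = [stickiness_score(surface_residues[n], scale) for n in names]
rho = spearman(scores, [abundance[n] for n in names])
print(f"Spearman R between surface stickiness and abundance: {rho:.2f}")
```

The toy data are constructed so that the most abundant protein has the least sticky surface, which gives a perfectly negative rank correlation; real data would of course show a much noisier relationship.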


magnitude of the anticorrelation, as measured by the Spearman rank correlation coefficient, varies greatly. The strongest anticorrelation is found in E. coli (R = −0.48), followed by yeast (R = −0.36), followed by human (R = −0.25).

This result shows that the surface of highly abundant proteins has adapted to become less sticky and more soluble than for lowly abundant proteins, especially in E. coli and, to a smaller extent, in yeast and humans. This weaker signal might reflect the fact that eukaryotic cells are more compartmentalized than bacterial cells, which may introduce a bias in the measure of protein concentration approximated here with abundance. An analysis of protein stickiness as a function of localization indeed reveals significant differences across different cellular compartments. Interestingly, nuclear proteins are more sticky than the rest of the proteome taken as an average (p_cerevisiae = 0.023; p_sapiens = 0.016) whereas mitochondrial proteins are less sticky (p_cerevisiae = 0.0021, p_sapiens = 0.0045). Remarkably, in H. sapiens the gene ontology (GO) term most enriched in nonsticky proteins is “soluble fraction” (p_sapiens = 3.6 × 10^−5; Fig. S3).

The amino acid potential provided in this analysis yields results that are significantly different from those obtained on the basis of the commonly used hydrophobicity scale of Kyte and Doolittle (21). When considering this hydrophobicity scale, the association described in Fig. 1 disappears in S. cerevisiae and H. sapiens and greatly weakens in E. coli (Fig. 2 and Fig. S4). We further tested 71 additional scales associated with “hydrophobicity” from the AAindex database (22) (Table S1). Interestingly, the scale of Wimley and White (23) yields the best correlation (R = −0.44) in E. coli, and is based on the transfer of amino acids from a hydrophobic environment (lipid bilayer interface) to water. This is different from the Kyte and Doolittle scale, which is based on measures of transfers of amino acids between two polar environments (e.g., ethanol and water). The similarity between the stickiness scale and the Wimley and White scale may reflect the fact that an interaction resembles more a transfer from water to a hydrophobic environment than a transfer between two relatively polar environments. Fig. S5 provides a comparison of these three scales, and Table S2 presents the values for our stickiness scale.

Current views of protein evolution emphasize stability, which must be maintained to avoid misfolding and thereby prevent loss of function or aggregation (24, 25). To assess the extent to which the anticorrelation observed here is linked to the unfolded state of the protein, we reproduce the same plots but now consider amino acids at the protein interior instead of the surface (Fig. 1D). For two organisms, the correlation disappears almost entirely when we consider amino acids at the interior. The surface–interior difference is most marked in E. coli, where the correlation vanishes almost completely (R = −0.08) and becomes insignificant (P = 0.4). In humans, the weaker anticorrelation observed in Fig. 1C is also lost with interior amino acids (R = −0.07, P = 0.33), whereas in yeast a weak correlation persists (R = −0.26, P = 4 × 10^−2).

Considering protein length provides a further piece of evidence showing that misassembly rather than misfolding is responsible for the anticorrelation between surface stickiness and abundance. It is known that the small hydrophobic core of short proteins (26) requires compensating mechanisms (27) that increase their stability. In line with this, we find an increase in interior stickiness among small proteins relative to larger proteins for all three species (Fig. S6; p_coli = 2 × 10^−5; p_yeast = 0.015; p_human = 2 × 10^−8). The increased stickiness associated with the core of small proteins suggests that a strong amino acid interaction potential can lead to an increase in stability. Comparatively, however, the lack of association between surface stickiness and protein length (Fig. S6; p_coli = 0.05; p_yeast = 0.89; p_human = 0.15) implies that stability is unlikely to drive the evolution of protein surfaces toward nonsticky amino acids.

Taken together, these results suggest that, in addition to selection against misfolding and aggregation of polypeptide chains,

[Figure 2 graphic. Protein abundance (A.U.) versus surface hydrophobicity. E. coli: R = −0.22, p = 0.0056; S. cerevisiae: R = −0.02, p = 0.51.]

Fig. 2. Protein hydrophobicity is less strongly tuned as a function of abundance than stickiness. We calculate a “hydrophobicity score” for the surface and interior regions of a protein by averaging Kyte and Doolittle hydrophobicity scores of residues in the region (21). We then plot this score against the abundance of the protein and indicate the Spearman rank correlation coefficients of the relationships, as well as the P value associated with the linear association obtained by analysis of variance. The hydrophobicity analysis for all species and surface as well as interior regions is shown in Fig. S4.

[Figure 3 graphic. (A) A central residue and its surrounding 400-Å² surface patch, shown in a sticky and in a nonsticky context; the composition of central residues is independent of their context. (B) Ratio rate(protein) / rate(surface residue) versus average stickiness of the surrounding surface patch. E. coli: 35% difference, p = 9e−120; S. cerevisiae: 65% difference, p = 4e−101; H. sapiens: 12% difference, p = 6e−52.]

Fig. 3. The relative evolutionary rate of an amino acid is influenced by the stickiness of its environment. (A) Illustration of the procedure used to calculate the stickiness score of a residue’s environment. We use this score as a proxy for the probability of the central residue to trigger a promiscuous interaction upon mutation. Note that, although the central residue is classified according to its context, its chemical composition remains independent of the context and follows an average surface composition, even for the most sticky category of patches (Fig. S7). (B) An evolutionary conservation ratio is calculated for each surface amino acid. The ratio is equal to the median evolutionary rate of the entire protein divided by the evolutionary rate of the residue. We bin all residues into five classes of equal size and increasing stickiness and show the boxplot distribution of evolutionary rates for each class. In all three organisms, the stickier the environment of a residue, the more the residue is conserved relative to the rest of the protein. Note that in this analysis we consider the conservation of the central residue and not that of the patch surrounding it. P values are calculated using the Wilcoxon test.
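The normalization and binning described in the caption can be sketched in a few lines. This is an illustrative sketch only: the function names `conservation_ratios` and `equal_size_bins` are hypothetical, the input values are invented, and the code assumes strictly positive evolutionary rates so that the division is defined. How the paper handles residues with a rate of zero is not stated here.

```python
from statistics import median


def conservation_ratios(residue_rates):
    """Median rate of the whole protein divided by each residue's rate.

    A ratio above 1 means the residue evolves more slowly (is more
    conserved) than the protein as a whole. Rates must be positive.
    """
    protein_rate = median(residue_rates)
    return [protein_rate / rate for rate in residue_rates]


def equal_size_bins(items, key, n_bins=5):
    """Sort items by key and split them into n_bins classes of equal size."""
    ordered = sorted(items, key=key)
    size = len(ordered) / n_bins
    return [ordered[round(i * size):round((i + 1) * size)] for i in range(n_bins)]


# Hypothetical surface residues: (surrounding stickiness, conservation ratio).
residues = [
    (-0.5, 0.8), (-0.4, 0.9), (-0.3, 1.0), (-0.2, 0.9), (-0.1, 1.1),
    (0.0, 1.0), (0.1, 1.2), (0.2, 1.3), (0.3, 1.4), (0.4, 1.6),
]

bins = equal_size_bins(residues, key=lambda r: r[0])
medians = [median(ratio for _, ratio in b) for b in bins]
for i, m in enumerate(medians, start=1):
    print(f"stickiness class {i}: median conservation ratio {m:.2f}")
```

With these invented values the median ratio rises from the least sticky to the most sticky class, which is the qualitative pattern the figure reports.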


avoidance of nonfunctional interactions by folded proteins is an important constraint that is proportional to abundance. Moreover, adaptation to this constraint is achieved through a bias in surface amino acid composition toward nonsticky amino acids.

Surface Stickiness Is an Evolutionary Constraint. To assess whether nonfunctional interactions place a constraint on protein evolution, we study conservation at the amino acid level. We ask whether, within a protein, amino acids surrounded by a sticky environment are more conserved than amino acids surrounded by a nonsticky environment. We computed rates of evolution for each amino acid for all three species and projected these data onto protein structures of each organism (Materials and Methods). In parallel, we calculated a surrounding stickiness score for every surface amino acid of each protein (Fig. 3A). This score is calculated from the amino acid composition of the 400-Å² surface patch surrounding the residue of interest by averaging the stickiness values of its amino acids (note that the stickiness of the central residue is independent of that of the patch). Residues are then binned into five “surrounding stickiness” classes of equal size for each organism, and evolutionary conservation is compared across the five classes (Fig. 3B). We reason that residues in more sticky environments are expected to have a higher probability of triggering nonfunctional interactions upon mutation and on average should be more constrained than those in less sticky environments.

Importantly in Fig. 3, the evolutionary rate of each residue is normalized as we divide the rate of the protein by that of each residue. Therefore, the larger the ratio, the more conserved the residue relative to the protein. This shows the clear effect of a residue’s environment stickiness on its degree of conservation relative to the protein: residues in nonsticky environments (left-most bin) are 35%, 65%, and 12% freer to evolve than residues in stickier environments (right-most bin) for E. coli, S. cerevisiae, and H. sapiens, respectively. Because these values are obtained after a normalization per protein, they reflect the impact of stickiness on conservation relative to the conservation of the protein. This normalization is necessary to single out the effect of stickiness because lowly expressed proteins are poorly conserved (28) but also carry most of the sticky patches, as shown in Fig. 1C. Interestingly, the weaker adaptation of human proteins against nonfunctional interactions observed in Fig. 1C is reproduced here, as differences in evolutionary conservation across the five probability classes are weakest in the human data set.

It can be argued that the conservation of residues found in sticky surface patches is due to those patches being unknown biological interfaces. However, several pieces of evidence suggest otherwise. First, if this were the case, we would not expect to see such a difference in signal between species (i.e., decreasing signal strength from E. coli to H. sapiens) because functional interfaces should, on average, be conserved in all species. Second, we would expect the central residue within sticky patches to resemble interface amino acids. To assess this, we compared the frequency distribution of amino acids in sticky patches (F_sticky) with that of amino acids at the interface (F_interface) and surface (F_surface). Because amino acids

[Figure 4 graphic. (A) Ratio r versus protein abundance class (%), with classes 0-20, 20-40, 40-60, 60-80, 80-100, and Top 5, for E. coli, S. cerevisiae, and H. sapiens. The ratio is defined as r = (substitution frequency between D and E) / (substitution frequency between K and R), where the D/E frequency is [∑(D-E) + ∑(E-D)] / [∑(D-D) + ∑(E-E)] and the K/R frequency is [∑(K-R) + ∑(R-K)] / [∑(K-K) + ∑(R-R)]. (B) Scheme of four mutations (Mut1 to Mut4) located in the interior, on a nonsticky surface, or on a sticky surface of a low-abundance and a high-abundance protein, with the constraint from promiscuous PPIs and from misfolding toxicity marked as low or high for each case.]

Fig. 4. The strength of selection against changes in protein stickiness is proportional to protein abundance. (A) Ratio of frequencies of two substitution types: one between charged residues of equal stickiness (D and E) and one between charged residues with a change in stickiness (K and R). The ratio is plotted for five bins of increasing protein abundance, each containing the same number of these charged residues. The sixth bin contains the top 5% abundant proteins. The ratio, r, defined in the figure, increases by 160%, 78%, and 13% in E. coli, S. cerevisiae, and H. sapiens, respectively, for the most abundant proteins relative to the least abundant ones. Thus, substitutions between K and R become less frequent than substitutions between D and E among highly abundant proteins. The red intervals show the SD of the ratios r obtained from 1,000 datasets where abundance data are randomized. (B) Scheme illustrating the constraints from misfolding and promiscuous interactions. Selection against misfolding provides an explanation for the relationship between protein abundance and evolutionary conservation for residues buried in the interior because the deleterious effects of misfolded aggregates increase with abundance. Avoidance of promiscuous interactions provides a further mechanism that explains negative selection proportional to abundance for residues on the solvent-accessible surface of proteins.
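One way to compute a ratio like r from substitution counts is sketched below. The normalization used here (substitutions between the two residues divided by positions where either residue is conserved) is an assumption based on the terms that appear in the figure's definition of r, and the counts are invented for illustration. The function name `substitution_ratio` and the count dictionaries are hypothetical.

```python
def substitution_ratio(counts):
    """Ratio r of D/E substitution frequency to K/R substitution frequency.

    `counts` maps (ancestral, derived) residue pairs to observed counts,
    with identical pairs such as ("D", "D") counting conserved positions.
    """
    de_frequency = (counts[("D", "E")] + counts[("E", "D")]) / (
        counts[("D", "D")] + counts[("E", "E")]
    )
    kr_frequency = (counts[("K", "R")] + counts[("R", "K")]) / (
        counts[("K", "K")] + counts[("R", "R")]
    )
    return de_frequency / kr_frequency


# Hypothetical counts for a low-abundance and a high-abundance class.
low_abundance = {
    ("D", "E"): 30, ("E", "D"): 30, ("D", "D"): 100, ("E", "E"): 100,
    ("K", "R"): 30, ("R", "K"): 30, ("K", "K"): 100, ("R", "R"): 100,
}
high_abundance = dict(low_abundance)
high_abundance[("K", "R")] = 10  # fewer stickiness-changing substitutions
high_abundance[("R", "K")] = 10

r_low = substitution_ratio(low_abundance)
r_high = substitution_ratio(high_abundance)
print(f"r (low abundance)  = {r_low:.2f}")
print(f"r (high abundance) = {r_high:.2f}")
```

In this toy example r rises in the high-abundance class because K/R substitutions, which change stickiness, are rarer there while D/E substitutions are unchanged.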


such as cysteine are rare in all regions, we normalized these distributions by the average frequencies observed in all regions (Ftotal). As expected, the linear regression between (Fsticky/Ftotal) and (Finterface/Ftotal) was not significant (pcoli = 0.27, pcerevisiae = 0.66, psapiens = 0.48). Residues found in sticky patches are in fact nearly identical in their composition to surface residues, as reflected by the highly significant linear regression between (Fsticky/Ftotal) and (Fsurface/Ftotal): pcoli = 3.1e-14, pcerevisiae = 4.1e-14, psapiens = 9.2e-12, as obtained by analysis of variance. These results are detailed in Fig. S7 and show that for biological units in the Protein Data Bank (PDB) the surfaces are largely solvent-exposed as opposed to being involved in cryptic stable interfaces. Considering isolated subunits, however, we observe the opposite because the sticky patches include genuine interfaces. For this data set, the distribution of residues at the center of sticky patches is closer to interface amino acids (Fig. S7, pcoli = 0.019, pcerevisiae = 0.16, psapiens = 0.028) than to surface ones (for the latter, the regression slopes are actually negative: slopecoli = −0.85, slopecerevisiae = −1.02, slopesapiens = −0.19). Finally, the increasing conservation of residues in increasingly sticky environments holds true even at known interfaces, both at the rim and at the core (Fig. S8), showing that, even within protein–protein contact regions, stickiness is controlled. The latter observation supports the notion of negative design (29) in sensitive interface regions (8, 9, 30). Although unknown biological interfaces must exist, these observations make us confident that they are unlikely to contribute significantly to the signal observed.
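The normalization and regression described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code (the Methods state that statistics were done with R): the function names `normalize` and `least_squares` and all frequency values are hypothetical, and the analysis-of-variance P values reported in the text are not reproduced here, only the slope of the fitted line.

```python
def normalize(freq_region, freq_total):
    """Divide each regional frequency by the all-region frequency (F_region / F_total)."""
    return {aa: freq_region[aa] / freq_total[aa] for aa in freq_total}


def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical frequencies (in percent) for four amino acids; not data from the paper.
f_total = {"A": 8.0, "C": 1.0, "K": 6.0, "L": 9.0}
f_sticky = {"A": 8.8, "C": 0.5, "K": 7.2, "L": 6.3}
f_surface = {"A": 8.0, "C": 0.4, "K": 7.8, "L": 5.4}

amino_acids = sorted(f_total)
sticky_norm = normalize(f_sticky, f_total)
surface_norm = normalize(f_surface, f_total)

# Regress normalized sticky-patch frequencies on normalized surface frequencies.
slope, intercept = least_squares(
    [surface_norm[aa] for aa in amino_acids],
    [sticky_norm[aa] for aa in amino_acids],
)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Dividing by Ftotal keeps rare residues such as cysteine from dominating the comparison simply because they are rare everywhere.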

Nonfunctional Interactions Might Contribute to the Differential Conservation Between Highly and Lowly Expressed Proteins. In the first part of this study, we observed an anticorrelation between protein abundance and protein surface stickiness. Subsequently we saw that stickiness is correlated with conservation within a protein. This prompts us to ask whether protein stickiness might be involved in the well-established correlation between protein abundance and evolutionary conservation. If so, we would expect low-copy proteins to be more tolerant than abundant proteins to amino acid substitutions that significantly change their surface stickiness.

To test this hypothesis, we took advantage of the properties of two pairs of charged amino acids: aspartic (D) and glutamic (E) acids have similar stickiness scores, whereas arginine (R) and lysine (K) do not (Fig. 1B) (15, 17, 31). Arginine is more frequently found at protein–protein interfaces than lysine, making it a stickier amino acid according to our definition. This characteristic enables us to make the following prediction: among high-copy proteins, where significant changes in stickiness have a greater impact, substitutions between K and R should be less frequent than substitutions between D and E. Also, because K, R, E, and D are mostly present at protein surfaces (15, 17), we do not need to restrict ourselves to proteins of known structure and can measure substitution rates from whole proteomes.

We thus measured the substitution frequencies between K and R (fK↔R) as well as between D and E (fD↔E) among orthologs of three species pairs: E. coli–Salmonella typhimurium, S. cerevisiae–Saccharomyces paradoxus, and H. sapiens–Mus musculus, as detailed in Table S3. Fig. 4 shows ratios of these frequencies (fD↔E/fK↔R) as a function of protein abundance. Substitutions between K and R are rare among abundant proteins relative to substitutions between D and E. In contrast, among low-copy proteins, both substitution types occur at more comparable frequencies. Interestingly, the magnitude of the effect observed, again, decreases in strength from E. coli (160% change between lowest and highest abundance classes) to yeast (78% change) and to humans (13% change).

Taken together, these observations provide mechanistic insights into the well-established correlation between protein abundance and evolutionary conservation. Although this correlation has been known for over a decade (28), the biological mechanisms associated with it are still not entirely clear. Selection against misfolding can explain part of the correlation (24, 25), where the assumption is that toxicity of misfolded proteins is proportional to their abundance. Our results support the notion that avoidance of promiscuous interactions, or negative pleiotropy (32), represents an additional mechanistic explanation (Fig. 4B).
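The ratio fD↔E/fK↔R can be illustrated with a short sketch. It assumes that each substitution frequency is the number of aligned positions where the two residues are exchanged divided by the number of positions where either residue is kept, which is how the ratio r is laid out in Fig. 4. The function names and the toy alignment are hypothetical, and the binning of residues by protein abundance is left out.

```python
from collections import Counter


def count_pairs(aligned_a, aligned_b):
    """Count aligned residue pairs (a, b) column by column in a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return Counter(zip(aligned_a, aligned_b))


def exchange_frequency(pairs, x, y):
    """Exchanges between x and y divided by positions where x or y is conserved."""
    changed = pairs[(x, y)] + pairs[(y, x)]
    kept = pairs[(x, x)] + pairs[(y, y)]
    if kept == 0:
        raise ValueError(f"no conserved {x} or {y} positions")
    return changed / kept


def ratio_r(pairs):
    """r = f(D<->E) / f(K<->R); larger r means K/R exchanges are relatively rarer."""
    f_de = exchange_frequency(pairs, "D", "E")
    f_kr = exchange_frequency(pairs, "K", "R")
    if f_kr == 0:
        raise ValueError("no K/R exchanges observed; r is undefined")
    return f_de / f_kr


# Toy pairwise alignment of two orthologs (hypothetical sequences).
species_a = "DDDDEEKKKKRR"
species_b = "DEDDEDKKKKRK"
pairs = count_pairs(species_a, species_b)
print(ratio_r(pairs))
```

In practice the pair counts would be pooled over all orthologs falling in one abundance bin before taking the ratio, so that each bin yields a single value of r.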

Discussion

It has been shown previously that mutations tend to arise faster at the protein surface than in the interior (33, 34). In fact, Tóth-Petróczy and Tawfik recently showed that mutations at the interior accumulate more rapidly once the surface has drifted sufficiently (35). Therefore, lowering the tolerance for mutations at the surface constrains the divergence of the entire protein (35). Promiscuous interactions, which constrain mutations at the surface, could thereby limit the evolutionary rate of the entire protein. This is consistent with the results of a recent study by Yang et al. showing, in a theoretical molecular evolutionary model using S. cerevisiae, that protein misinteraction represents an evolutionary constraint (14).

Considering two additional species and taking a complementary approach placing more emphasis on the physico-chemical properties of proteins, we also find that protein misinteractions represent an evolutionary constraint. We provide a physico-chemical rationalization of nonfunctional interactions through the stickiness scale. This scale is significantly different from the Kyte and Doolittle hydrophobic scale, which is commonly used as in, e.g., Yang et al. (14). Our stickiness scale is more similar to the Wimley and White scale, although differences, e.g., between lysine and arginine, suggest that it is important to consider the “interaction” potential of amino acids in interpreting nonfunctional interactions. Interestingly, lysine underrepresentation at nonbiological crystal contacts also supports the notion that lysine and arginine have different potentials to be involved in nonfunctional interactions (17, 36). We thus hope that the stickiness scale proposed here will help to refine models that couple protein chemistry to cellular crowding (5). Furthermore, taken together, the work by Yang et al. and our work suggest that proteins are constrained to avoid nonfunctional interactions, adding to the commonly accepted stability and solubility constraints on the amino acid composition of proteins.

Finally, the impact of promiscuous interactions appears most prominent among the unicellular organisms E. coli and S. cerevisiae. It is thus tempting to speculate that nonfunctional interactions may have accumulated in the human lineage (37) in a similar fashion to the accumulation of noncoding DNA (38). In a further analogy to noncoding DNA, nonfunctional interactions may represent the raw material for exploring and ultimately selecting functional interactions (39, 40) through mechanisms such as colocalization (41). These speculations should nevertheless be considered with care, as the weaker signal observed for H. sapiens may also result from the ill-defined nature of protein abundance in multicellular organisms. Future studies will thus be needed to explore these ideas further and better understand the properties of proteomes across the tree of life.

Methods

Sequence Data. Sequences of proteins and their respective orthologs were aligned with MUSCLE (42). Orthology information was taken from ref. 43 for E. coli and from ENSEMBL v.48 (44) for H. sapiens. Multiple sequence alignments of S. cerevisiae proteins with their orthologs were taken from Wapinski et al. (45). The details of the species used are in Table S4. Protein multiple alignments were concatenated to obtain three proteome-wide multiple alignments (one for each species). These were used to calculate amino acid evolutionary rates using Rate4Site (46).

Structural Data. Species-specific structures were retrieved by sequence homology. We searched for structures where the sequence from the SEQRES field was similar to proteins from E. coli, S. cerevisiae, or H. sapiens proteomes. We imposed a minimal sequence identity of 90% and a minimum overlap of 70%. We used protein structures from the PDB (47), and the dataset includes all structures present in the second release of 3DComplex (48). All structures for which the biological state was manually annotated in the PiQSi database (49) as “error,” “probable error,” or “undefined” were discarded, as well as all DNA-binding and membrane proteins. Finally, we kept only structures with a resolution below 3 Å. A summary of the number of structures per organism and complex type is given in Table S5. Structural regions were defined as in Levy (15). The environment stickiness for a given residue was calculated based on its surrounding residues, i.e., residues with the Cα within a 400-Å² patch centered on the Cα of the residue of interest.

Abundance Data. Protein abundance data were taken from PaxDb (20) (http://pax-db.org). Because of the uncertainty associated with very low abundance proteins, we discarded all proteins with an abundance unit below 1. Statistical analyses and plots were done with R. Data used in this study are available at www.tinyurl.com/structuralregions.
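The environment stickiness described under Structural Data can be sketched as follows, under several loudly stated assumptions that the text does not confirm: the 400-Å² patch is treated as a disc, which gives a radius of about 11.3 Å; patch membership is approximated by the straight-line Cα to Cα distance; the central residue is excluded; and the score is the plain mean over neighbors. The scale values and coordinates are placeholders, not the published stickiness scale.

```python
import math

PATCH_AREA = 400.0  # in square angstroms, as stated in the Methods
# Assumption: the patch is a disc, so radius = sqrt(area / pi), about 11.28 angstroms.
PATCH_RADIUS = math.sqrt(PATCH_AREA / math.pi)


def environment_stickiness(index, residues, scale, radius=PATCH_RADIUS):
    """Mean scale value of residues whose C-alpha lies within `radius` of residue `index`.

    `residues` is a list of (amino_acid, (x, y, z)) tuples holding C-alpha coordinates.
    The central residue itself is excluded (an assumption). Returns None if the
    patch contains no other residue.
    """
    _, center = residues[index]
    scores = [
        scale[aa]
        for j, (aa, xyz) in enumerate(residues)
        if j != index and math.dist(center, xyz) <= radius
    ]
    if not scores:
        return None
    return sum(scores) / len(scores)


# Placeholder scale values and coordinates, for illustration only.
toy_scale = {"K": -1.0, "R": 0.0, "F": 1.0}
toy_residues = [
    ("K", (0.0, 0.0, 0.0)),
    ("F", (5.0, 0.0, 0.0)),
    ("R", (0.0, 10.0, 0.0)),
    ("F", (30.0, 0.0, 0.0)),
]
print(environment_stickiness(0, toy_residues, toy_scale))
```

A real implementation would presumably restrict the neighbors to solvent-exposed residues and might measure distance along the surface rather than through space; both choices are left open here.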

ACKNOWLEDGMENTS. We thank Dan Tawfik, Joël Janin, Eugene Shakhnovich, Sergei Maslov, David Liberles, Joseph Marsh, Eviatar Natan, Gideon Schreiber and Peter Tompa for their comments on the manuscript. We also thank the two anonymous referees for their constructive comments that significantly helped improve the paper. E.D.L. acknowledges the Human Frontier Science Project for financial support through a long-term fellowship; Stephen Michnick and Université de Montréal for hosting part of this research; and the Weizmann Institute of Science for hosting part of this research. S.D. acknowledges support from the University of Colorado School of Medicine and the National Cancer Institute Physical Sciences Oncology Center initiative (U54-CA143798). E.D.L. and S.A.T. were supported by the Medical Research Council (file Reference U105161047).

1. McGuffee SR, Elcock AH (2010) Diffusion, crowding & protein stability in a dynamic molecular model of the bacterial cytoplasm. PLOS Comput Biol 6(3):e1000694.
2. Janin J (1996) Quantifying biological specificity: The statistical mechanics of molecular recognition. Proteins 25(4):438–445.
3. Zhang J, Maslov S, Shakhnovich EI (2008) Constraints imposed by non-functional protein-protein interactions on gene expression and proteome size. Mol Syst Biol 4:210.
4. Tompa P, Rose GD (2011) The Levinthal paradox of the interactome. Protein Sci 20(12):2074–2079.
5. Heo M, Maslov S, Shakhnovich E (2011) Topology of protein interaction network shapes protein abundances and strengths of their functional and nonspecific interactions. Proc Natl Acad Sci USA 108(10):4258–4263.
6. Johnson ME, Hummer G (2011) Nonspecific binding limits the number of proteins in a cell and shapes their interaction networks. Proc Natl Acad Sci USA 108(2):603–608.
7. Gsponer J, Futschik ME, Teichmann SA, Babu MM (2008) Tight regulation of unstructured proteins: From transcript synthesis to protein degradation. Science 322(5906):1365–1368.
8. Fleishman SJ, Baker D (2012) Role of the biomolecular energy gap in protein design, structure, and evolution. Cell 149(2):262–273.
9. Pechmann S, Levy ED, Tartaglia GG, Vendruscolo M (2009) Physicochemical principles that regulate the competition between functional and dysfunctional association of proteins. Proc Natl Acad Sci USA 106(25):10159–10164.
10. Tartaglia GG, Pechmann S, Dobson CM, Vendruscolo M (2009) A relationship between mRNA expression levels and protein solubility in E. coli. J Mol Biol 388(2):381–389.
11. Hamada D, et al. (2009) Competition between folding, native-state dimerisation and amyloid aggregation in beta-lactoglobulin. J Mol Biol 386(3):878–890.
12. Tartaglia GG, Pechmann S, Dobson CM, Vendruscolo M (2007) Life on the edge: A link between gene expression levels and aggregation rates of human proteins. Trends Biochem Sci 32(5):204–206.
13. Münch C, Bertolotti A (2010) Exposure of hydrophobic surfaces initiates aggregation of diverse ALS-causing superoxide dismutase-1 mutants. J Mol Biol 399(3):512–525.
14. Yang JR, Liao BY, Zhuang SM, Zhang J (2012) Protein misinteraction avoidance causes highly expressed proteins to evolve slowly. Proc Natl Acad Sci USA 109(14):E831–E840.
15. Levy ED (2010) A simple definition of structural regions in proteins and its use in analyzing interface evolution. J Mol Biol 403(4):660–670.
16. Lo Conte L, Chothia C, Janin J (1999) The atomic structure of protein-protein recognition sites. J Mol Biol 285(5):2177–2198.
17. Janin J, Bahadur RP, Chakrabarti P (2008) Protein-protein interaction and quaternary structure. Q Rev Biophys 41(2):133–180.
18. Vavouri T, Semple JI, Garcia-Verdugo R, Lehner B (2009) Intrinsic protein disorder and interaction promiscuity are widely associated with dosage sensitivity. Cell 138(1):198–208.
19. Levy ED, Michnick SW, Landry CR (2012) Protein abundance is key to distinguish promiscuous from functional phosphorylation based on evolutionary information. Philos Trans R Soc Lond B Biol Sci 367(1602):2594–2606.
20. Wang M, et al. (2012) PaxDb, a database of protein abundance averages across all three domains of life. Mol Cell Proteomics 11(8):492–500.
21. Kyte J, Doolittle RF (1982) A simple method for displaying the hydropathic character of a protein. J Mol Biol 157(1):105–132.
22. Kawashima S, et al. (2008) AAindex: Amino acid index database, progress report 2008. Nucleic Acids Res 36(Database issue):D202–D205.
23. Wimley WC, White SH (1996) Experimentally determined hydrophobicity scale for proteins at membrane interfaces. Nat Struct Biol 3(10):842–848.
24. Yang JR, Zhuang SM, Zhang J (2010) Impact of translational error-induced and error-free misfolding on the rate of protein evolution. Mol Syst Biol 6:421.
25. Drummond DA, Wilke CO (2008) Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell 134(2):341–352.
26. Chothia C (1975) Structural invariants in protein folding. Nature 254(5498):304–308.
27. Pereira de Araújo AF, Gomes AL, Bursztyn AA, Shakhnovich EI (2008) Native atomic burials, supplemented by physically motivated hydrogen bond constraints, contain sufficient information to determine the tertiary structure of small globular proteins. Proteins 70(3):971–983.
28. Pál C, Papp B, Hurst LD (2001) Highly expressed genes in yeast evolve slowly. Genetics 158(2):927–931.
29. Doye JP, Louis AA, Vendruscolo M (2004) Inhibition of protein crystallization by evolutionary negative design. Phys Biol 1(1–2):9–13.
30. Levin KB, et al. (2009) Following evolutionary paths to protein-protein interactions with high affinity and selectivity. Nat Struct Mol Biol 16(10):1049–1055.
31. MacCallum JL, Tieleman DP (2011) Hydrophobicity scales: A thermodynamic looking glass into lipid-protein interactions. Trends Biochem Sci 36(12):653–662.
32. Liberles DA, Tisdell MD, Grahnen JA (2011) Binding constraints on the evolution of enzymes and signalling proteins: The important role of negative pleiotropy. Proc Biol Sci 278(1714):1930–1935.
33. Sasidharan R, Chothia C (2007) The selection of acceptable protein mutations. Proc Natl Acad Sci USA 104(24):10080–10085.
34. Franzosa EA, Xia Y (2009) Structural determinants of protein evolution are context-sensitive at the residue level. Mol Biol Evol 26(10):2387–2395.
35. Tóth-Petróczy A, Tawfik DS (2011) Slow protein evolutionary rates are dictated by surface-core association. Proc Natl Acad Sci USA 108(27):11151–11156.
36. Cieślik M, Derewenda ZS (2009) The role of entropy and polarity in intermolecular contacts in protein crystals. Acta Crystallogr D Biol Crystallogr 65(Pt 5):500–509.
37. Fernández A, Lynch M (2011) Non-adaptive origins of interactome complexity. Nature 474(7352):502–505.
38. Lynch M (2007) The Origins of Genome Architecture (Sinauer Associates, Inc., Sunderland, MA), pp 494.
39. Tawfik DS (2010) Messy biology and the origins of evolutionary innovations. Nat Chem Biol 6(10):692–696.
40. Nobeli I, Favia AD, Thornton JM (2009) Protein promiscuity and its implications for biotechnology. Nat Biotechnol 27(2):157–167.
41. Kuriyan J, Eisenberg D (2007) The origin of protein interactions and allostery in colocalization. Nature 450(7172):983–990.
42. Edgar RC (2004) MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32(5):1792–1797.
43. Moreno-Hagelsieb G, Janga SC (2008) Operons and the effect of genome redundancy in deciphering functional relationships using phylogenetic profiles. Proteins 70(2):344–352.
44. Flicek P, et al. (2008) Ensembl 2008. Nucleic Acids Res 36(Database issue):D707–D714.
45. Wapinski I, Pfeffer A, Friedman N, Regev A (2007) Natural history and evolutionary principles of gene duplication in fungi. Nature 449(7158):54–61.
46. Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N (2002) Rate4Site: An algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics 18(Suppl 1):S71–S77.
47. Berman HM, et al. (2002) The Protein Data Bank. Acta Crystallogr D Biol Crystallogr 58(Pt 6 No 1):899–907.
48. Levy ED, Pereira-Leal JB, Chothia C, Teichmann SA (2006) 3D complex: A structural classification of protein complexes. PLOS Comput Biol 2(11):e155.
49. Levy ED (2007) PiQSi: Protein quaternary structure investigation. Structure 15(11):1364–1367.


Supporting Information

Levy et al. 10.1073/pnas.1209312109

[Fig. S1 graphic: schematic view and cross-section of a subunit and its interacting partner, with five regions defined from relative accessible surface area (rASA):]
INTERIOR: rASAc < 25% & ΔrASA = 0
SURFACE: rASAc > 25% & ΔrASA = 0
SUPPORT: ΔrASA > 0 & rASAm < 25%
RIM: ΔrASA > 0 & rASAc > 25%
CORE: ΔrASA > 0 & rASAm > 25% & rASAc < 25%
ΔrASA = rASAm − rASAc; rASAm = relative ASA in monomer; rASAc = relative ASA in complex.

Fig. S1. The regions of protein structure used in this study. We use the definitions of interface, surface, interface support, rim, and core given in Levy (1). Amino acid interface propensities are computed as the log ratios of their frequencies at the interface core (orange) and surface (blue).

1. Levy ED (2010) A simple definition of structural regions in proteins and its use in analyzing interface evolution. J Mol Biol 403(4):660–670.
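The propensity calculation named in the Fig. S1 legend can be sketched as a log ratio of two frequency tables. This is an illustration only: the natural logarithm is an assumption (the legend does not state the base), the function names are hypothetical, and the residue strings are toy input.

```python
import math
from collections import Counter


def frequencies(residues):
    """Fraction of each amino acid in a string or list of one-letter codes."""
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def interface_propensity(core_residues, surface_residues):
    """Log ratio of interface-core frequency to surface frequency per amino acid.

    Amino acids absent from either region are skipped, because the log ratio
    is undefined for a zero frequency. Natural log is assumed.
    """
    f_core = frequencies(core_residues)
    f_surface = frequencies(surface_residues)
    return {
        aa: math.log(f_core[aa] / f_surface[aa])
        for aa in f_core
        if aa in f_surface
    }


# Toy residue lists: F is enriched at the interface core, K at the surface.
propensity = interface_propensity("FFFK", "FKKK")
for aa in sorted(propensity):
    print(aa, round(propensity[aa], 3))
```

A positive value marks a residue that is more frequent at interface cores than on the solvent-exposed surface, which is the sense in which the paper calls a residue sticky.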

[Fig. S2 graphic: bar plots of interface core propensity relative to surface for the 20 amino acids (ordered K E D Q N P G R A T S H V L W Y M I C F) in S. cerevisiae and H. sapiens, and scatter plots of S. cerevisiae and H. sapiens propensities against E. coli propensities, one point per amino acid. Axes range from −1.5 to 1.5.]

Fig. S2. Amino acid interface propensities are similar across distant species. (Upper panels) Residue interface-to-surface propensities for Saccharomyces cerevisiae and Homo sapiens. (Lower panels) S. cerevisiae and H. sapiens residue interface propensities are very similar to those in Escherichia coli.

Levy et al. www.pnas.org/cgi/content/short/1209312109 1 of 9


[Fig. S3 graphic: bar plots of the median stickiness score of protein surfaces per GO cellular component category (e.g., soluble fraction, nucleoplasm, mitochondrial matrix, cytosol, endoplasmic reticulum), shown separately for H. sapiens and S. cerevisiae with the associated Z-scores and P values. The first two bars are labeled "Lowest 10%" and "Top 10%". Red lines mark the standard deviation obtained on simulated random data.]

Fig. S3. Changes in protein surface stickiness as a function of subcellular localization. Stickiness scores of surface residues are binned according to the gene ontology (GO) annotation of the protein to which they correspond. Note that, for proteins with multiple GO annotations, residues are counted several times. For each bin or GO category, the median stickiness scores are shown. The red lines are the SDs of the medians when GO annotations are shuffled (100,000 iterations). The first two bars are the scores of the top and lowest 10% quantiles. This illustrates the similarity in stickiness across different cellular compartments. There are, however, significant differences which, remarkably, are conserved across Saccharomyces cerevisiae and Homo sapiens; e.g., proteins in mitochondria tend to be less sticky than average (pcerevisiae = 0.023; psapiens = 0.016), whereas nuclear proteins tend to be more sticky (pcerevisiae = 0.0021, psapiens = 0.0045). The least sticky GO cellular component is the “soluble fraction” (psapiens = 3.6e-5).
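The shuffling control mentioned in this legend can be sketched as a label permutation. This is a simplified illustration with hypothetical function names and toy data: each (score, label) pair stands for one residue under one annotation, so a residue from a protein with several GO terms would simply appear several times. Whether the authors shuffled labels per residue or per protein is not stated, so the per-entry shuffle below is an assumption.

```python
import random
import statistics


def median_by_category(scores, labels, category):
    """Median score of all entries carrying the given category label."""
    return statistics.median(s for s, lab in zip(scores, labels) if lab == category)


def shuffled_median_sd(scores, labels, category, iterations=1000, seed=0):
    """SD of the category median when labels are randomly reassigned to scores."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    shuffled = list(labels)
    medians = []
    for _ in range(iterations):
        rng.shuffle(shuffled)
        medians.append(median_by_category(scores, shuffled, category))
    return statistics.stdev(medians)


# Toy data: hypothetical stickiness scores with two GO-like labels.
toy_scores = [-0.5, -0.4, -0.3, 0.1, 0.2, 0.3]
toy_labels = ["mitochondrion"] * 3 + ["nucleus"] * 3

observed = median_by_category(toy_scores, toy_labels, "mitochondrion")
spread = shuffled_median_sd(toy_scores, toy_labels, "mitochondrion", iterations=200)
print(observed, spread)
```

Comparing the observed median with the spread of the shuffled medians gives the kind of Z-score shown in the figure.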


Fig. S4. Protein hydrophobicity is less strongly tuned as a function of abundance than stickiness. We calculate a “hydrophobicity score” for the surface and interior regions of a protein by averaging hydrophobicity scores (1) of residues in the region. We then plot this score against the abundance of the protein and indicate the Spearman rank correlation coefficients of the relationships, as well as the P value associated with the linear association obtained by analysis of variance.

1. Kyte J, Doolittle RF (1982) A simple method for displaying the hydropathic character of a protein. J Mol Biol 157(1):105–132.
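The two steps in this legend, averaging a per-residue scale over a region and then rank-correlating the result with abundance, can be sketched in plain Python. The function names, the toy proteins, and the abundance values are hypothetical; the three scale entries are the Kyte and Doolittle values for K, F, and A. Ties receive average ranks, which is the usual convention for Spearman's coefficient, and no P value is computed.

```python
import math


def region_score(residues, scale):
    """Average scale value over the residues of one region (e.g., the surface)."""
    return sum(scale[aa] for aa in residues) / len(residues)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Three Kyte and Doolittle values; the surfaces and abundances below are toy data.
kd_scale = {"K": -3.9, "F": 2.8, "A": 1.8}
surfaces = ["KKF", "KFA", "FFA", "KKK"]
abundance = [120.0, 35.0, 2.0, 900.0]
scores = [region_score(s, kd_scale) for s in surfaces]
print(spearman(scores, abundance))
```

Rank correlation is a reasonable choice here because protein abundance spans several orders of magnitude, so a few very abundant proteins would otherwise dominate a linear measure.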

[Fig. S5 graphic: two scatter plots of the stickiness scale (y axis) against Kyte & Doolittle hydrophobicity and against Wimley & White hydrophobicity (x axes), one labeled point per amino acid.]

Fig. S5. Comparison of the stickiness scale with the Kyte and Doolittle scale and the Wimley and White scales. The stickiness scale is distinct from the other two scales, particularly with respect to K, R, and W.


[Fig. S6 graphic: scatter plots of stickiness against protein length for E. coli, S. cerevisiae, and H. sapiens. Surface stickiness: R = −0.12, −0.08, and −0.1 (p = 0.05, 0.89, and 0.15). Interior stickiness: R = −0.36, −0.27, and −0.41 (p = 2e-5, 0.015, and 2e-8).]

Fig. S6. The stickiness scale captures information on protein stability and suggests that surface amino acids contribute little to stability compared with interior amino acids. Small proteins have a smaller volume-to-surface ratio (1) and thus a smaller hydrophobic core than large proteins. To meet the stability requirement, small proteins are therefore expected to exhibit particular properties (2). We observed in Fig. 1 that abundant proteins have a decreased surface stickiness. To test whether this observation may be linked to stability, we compare the stickiness of surface (Lower panels) and interior (Upper panels) amino acids as a function of protein length, which should be linked to stability. Stickiness of interior amino acids appears as such a requirement because we observe that small proteins have a stickier interior. However, surface amino acids have comparatively little to no influence on stability, as the correlation between stickiness and protein length tends to disappear for surface amino acids.

1. Chothia C (1975) Structural invariants in protein folding. Nature 254(5498):304–308.
2. Pereira de Araújo AF, Gomes AL, Bursztyn AA, Shakhnovich EI (2008) Native atomic burials, supplemented by physically motivated hydrogen bond constraints, contain sufficient information to determine the tertiary structure of small globular proteins. Proteins 70(3):971–983.

[Fig. S7 graphic: bar plots of amino acid frequency (A C D E F G H I K L M N P Q R S T V W Y) for E. coli, S. cerevisiae, and H. sapiens. Legend: a.a. surface frequency of Biological Units; central a.a. frequency in the 10% most sticky patches at the surface of Biological Units; a.a. interface frequency; central a.a. frequency in the 10% most sticky patches at known biological interfaces; central a.a. frequency in the 10% most sticky patches at the surface of single chains.]

Fig. S7. Residue frequencies are independent of the stickiness of the 400-Å² surrounding surface patch. This implies that sticky patches on surfaces are unlikely to be cryptic interfaces. The 10% most “interface-like” patches found at the protein surface were identified, and their central residue was recorded. The frequencies of these central residues (dark blue) closely follow those of surface residues (light blue) rather than frequencies of interface residues (yellow). Therefore, highly sticky patches are unlikely to represent unknown biological protein interfaces. To further control that this strategy is likely to detect interfaces, we carried out the same analysis but considered the entire protein surface (including known interfaces). When these real interfaces are included, the stickiness of central residues radically shifts and becomes close to that of known interfaces (gray).


[Fig. S8 graphic: boxplots for E. coli, S. cerevisiae, and H. sapiens in two panels (A and B). The y axes are labeled with rate(protein) and rate(interface core residue) in one panel and with rate(protein) and rate(interface rim residue) in the other, ranging from 0 to 4. The x axis is the propensity of a residue to trigger a promiscuous interaction upon mutation, in five classes. The panels carry per-class P values and first-to-last-bin annotations (59% difference, 65% difference, and 9% difference with p = 0.0021; p = 9.4e−16, 8.9e−10, and 1.2e−15) whose assignment to individual boxes is not recoverable from the extracted text.]

Fig. S8. The divergence rate of residues at interface cores depends on the stickiness of their environment. (A and B) Interface core residues are grouped into five classes on the basis of their surrounding 400-Å² environment, i.e., from residues in nonsticky environments (gray) to those in sticky environments (orange). The boxplot distribution of evolutionary rates of residues is shown for each group. Importantly, the rate of each residue is normalized by the median rate of the entire protein. All classes have a median (black thick line) greater than 1 because interface cores are in general more conserved than the rest of the protein. The significant difference in rates between the first and last bins indicates that the environment of a residue correlates with the divergence rate. A possible explanation for this observation is that residues in highly sticky environments are more likely to trigger nonfunctional interactions if they mutate and are therefore subject to more selective pressure. Consistent with the trend observed throughout this work, the difference is less marked among Homo sapiens proteins, suggesting that there is more tolerance for promiscuous interactions in human versus yeast and Escherichia coli. P values were computed using the Wilcoxon test.


Table S1. Correlation between protein abundance and protein surface scores

Scale ID | Description | E. coli | S. cerevisiae | H. sapiens
WIMW960101 | Free energies of transfer of acwl-x-ll peptides from bilayer interface to water (1) | −0.44 | −0.28 | −0.17
PONP800103 | Average gain ratio in surrounding hydrophobicity (2) | −0.44 | −0.27 | −0.22
PONP800108 | Average number of surrounding residues (2) | −0.42 | −0.24 | −0.21
ROSG850102 | Mean fractional area loss (3) | −0.41 | −0.26 | −0.21
PONP800102 | Average gain in surrounding hydrophobicity (2) | −0.39 | −0.28 | −0.21
GOLD730101 | Hydrophobicity factor (4) | −0.35 | −0.27 | −0.13
KANM800102 | Average relative probability of β-sheet (5) | −0.35 | −0.29 | −0.16
PONP800101 | Surrounding hydrophobicity in folded form (2) | −0.32 | −0.27 | −0.19
BLAS910101 | Scaled side-chain hydrophobicity values (6) | −0.32 | −0.11 | −0.10
ROSG850101 | Mean area buried on transfer (3) | −0.27 | −0.47 | −0.28
VENT840101 | Bitterness (7) | −0.26 | −0.39 | −0.15
LAWE840101 | Transfer free energy between N-cyclohexyl-2-pyrrolidone and water (8) | −0.26 | −0.19 | −0.11
KANM800104 | Average relative probability of inner β-sheet (5) | −0.25 | −0.22 | −0.16
PONP800106 | Surrounding hydrophobicity in turn (2) | −0.25 | −0.33 | −0.28
MANP780101 | Average surrounding hydrophobicity (9) | −0.23 | −0.20 | −0.15
CHAM820101 | Polarizability parameter (10) | −0.21 | −0.43 | −0.23
WILM950101 | Hydrophobicity coefficient in reverse phase high-performance liquid chromatography (rp-hplc) (11) | −0.20 | −0.39 | −0.19
NAKH900105 | Amino acid composition of mitochondrial proteins from animal (12) | −0.19 | −0.16 | −0.06
NAKH900112 | Transmembrane regions of mitochondrial proteins (12) | −0.18 | −0.15 | −0.07
ARGP820101 | Hydrophobicity index (13) | −0.18 | −0.19 | −0.09
JOND750101 | Hydrophobicity (14) | −0.18 | −0.19 | −0.09
ZIMJ680101 | Hydrophobicity (15) | −0.17 | −0.17 | −0.09
PONP800107 | Accessibility reduction ratio (2) | −0.17 | −0.21 | −0.14
GOLD730102 | Residue volume (4) | −0.17 | −0.39 | −0.19
BIGC670101 | Residue volume (16) | −0.16 | −0.41 | −0.19
COWR900101 | Hydrophobicity index, pH 3.0 (17) | −0.16 | −0.31 | −0.18
NAKH900103 | Amino acid composition of mt proteins (12) | −0.15 | −0.18 | −0.04
NAKH900111 | Transmembrane regions of non-mt proteins (12) | −0.15 | 0.06 | −0.05
CHAM820102 | Free energy of solution in water (kcal/mol) (10) | −0.14 | −0.17 | −0.12
WILM950102 | Hydrophobicity coefficient in rp-hplc (11) | −0.13 | −0.23 | −0.05
WILM950103 | Hydrophobicity coefficient in rp-hplc (11) | −0.13 | −0.08 | −0.16
WOLR790101 | Hydrophobicity index (18) | −0.10 | 0.01 | −0.15
YUTK870102 | Unfolding Gibbs energy in water, pH 9.0 (19) | −0.09 | −0.09 | −0.01
FAUJ830101 | Hydrophobic parameter π (20) | −0.09 | −0.39 | −0.08
PONP800105 | Surrounding hydrophobicity in β-sheet (2) | −0.08 | 0.01 | −0.05
PONP930101 | Hydrophobicity scales (21) | −0.07 | −0.22 | 0.00
NAKH900107 | Amino acid composition of mt proteins from fungi and plant (12) | −0.05 | −0.20 | 0.02
CIDH920101 | Normalized hydrophobicity scales for α-proteins (22) | −0.02 | −0.19 | −0.05
NAKH900109 | Amino acid composition of membrane proteins (12) | 0.00 | 0.24 | 0.06
NAKH900108 | Normalized composition from fungi and plant (12) | 0.01 | −0.31 | −0.06
WILM950104 | Hydrophobicity coefficient in rp-hplc (11) | 0.02 | −0.27 | −0.06
JURD980101 | Modified Kyte–Doolittle hydrophobicity scale (23) | 0.03 | −0.24 | −0.10
KIDA850101 | Hydrophobicity-related index (24) | 0.03 | −0.17 | −0.06
EISD860103 | Direction of hydrophobic moment (25) | 0.05 | −0.39 | −0.13
EISD860102 | Atom-based hydrophobic moment (25) | 0.05 | −0.10 | −0.02
YUTK870101 | Unfolding Gibbs energy in water, pH 7.0 (19) | 0.05 | −0.11 | 0.05
FASG890101 | Hydrophobicity index (26) | 0.05 | −0.28 | −0.07
EISD840101 | Consensus normalized hydrophobicity scale (27) | 0.06 | −0.13 | −0.04
CIDH920103 | Normalized hydrophobicity scales for α+β-proteins (22) | 0.08 | −0.31 | 0.04
CIDH920105 | Normalized average hydrophobicity scales (22) | 0.08 | −0.37 | −0.05
CIDH920104 | Normalized hydrophobicity scales for α/β-proteins (22) | 0.08 | −0.34 | −0.04
PONP800104 | Surrounding hydrophobicity in α-helix (2) | 0.12 | 0.17 | 0.07
KANM800103 | Average relative probability of inner helix (5) | 0.12 | 0.05 | −0.02
CIDH920102 | Normalized hydrophobicity scales for β-proteins (22) | 0.13 | −0.31 | −0.04
SWER830101 | Optimal matching hydrophobicity (28) | 0.13 | −0.32 | −0.06
NAKH900104 | Normalized composition of mt proteins (12) | 0.17 | −0.23 | −0.01
NAKH900106 | Normalized composition from animal (12) | 0.18 | −0.14 | 0.00
YUTK870104 | Activation Gibbs energy of unfolding, pH 9.0 (19) | 0.18 | 0.23 | 0.17
NAKH900101 | Amino acid composition of total proteins (12) | 0.19 | 0.32 | 0.20
YUTK870103 | Activation Gibbs energy of unfolding, pH 7.0 (19) | 0.20 | 0.24 | 0.17
NAKH900102 | SD of amino acid composition of total proteins (12) | 0.20 | 0.42 | 0.17
RACS770101 | Average reduced distance for C-α (29) | 0.21 | 0.34 | 0.17
NAKH900113 | Ratio of average and computed composition (12) | 0.22 | 0.34 | 0.15
KANM800101 | Average relative probability of helix (5) | 0.24 | 0.13 | 0.04
LEVM760101 | Hydrophobic parameter (30) | 0.26 | −0.02 | 0.04
CASG920101 | Hydrophobicity scale from native protein structures (31) | 0.29 | −0.02 | 0.11
NAKH900110 | Normalized composition of membrane proteins (12) | 0.30 | 0.08 | 0.06
PRAM900101 | Hydrophobicity (32) | 0.31 | 0.00 | 0.06
RACS770103 | Side chain orientational preference (29) | 0.31 | 0.23 | 0.16
ENGD860101 | Hydrophobicity index (33) | 0.31 | 0.00 | 0.06
RACS770102 | Average reduced distance for side chain (29) | 0.34 | 0.28 | 0.18

Scores were computed using scales from the AAindex database instead of the “stickiness” scale.
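
As an illustration of how one entry of Table S1 could be obtained, the sketch below correlates per-protein abundance with per-protein surface scores computed from an arbitrary amino acid scale. It assumes a rank (Spearman) correlation, which this section does not state explicitly, and all function names and toy values are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: abundance and surface score for five proteins.
abundance = [120.0, 15.0, 980.0, 40.0, 300.0]
surface_score = [-0.10, 0.25, -0.40, 0.05, -0.20]
print(round(spearman(abundance, surface_score), 2))
```

Repeating this calculation for each scale and each species would yield one row of coefficients per scale, in the layout of Table S1.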

1. Wimley WC, White SH (1996) Experimentally determined hydrophobicity scale for proteins at membrane interfaces. Nature Structural Biology 3(10):842–848.
2. Ponnuswamy PK, Prabhakaran M, Manavalan P (1980) Hydrophobic packing and spatial arrangement of amino acid residues in globular proteins. Biochim Biophys Acta 623(2):301–316.
3. Rose GD, Geselowitz AR, Lesser GJ, Lee RH, Zehfus MH (1985) Hydrophobicity of amino acid residues in globular proteins. Science 229(4716):834–838.
4. Goldsack DE, Chalifoux RC (1973) Contribution of the free energy of mixing of hydrophobic side chains to the stability of the tertiary structure of proteins. Journal of Theoretical Biology 39(3):645–651.
5. Kanehisa MI, Tsong TY (1980) Local hydrophobicity stabilizes secondary structures in proteins. Biopolymers 19(9):1617–1628.
6. Black SD, Mould DR (1991) Development of hydrophobicity parameters to analyze proteins which bear post- or cotranslational modifications. Anal Biochem 193(1):72–82.
7. Venanzi TJ (1984) Hydrophobicity parameters and the bitter taste of L-amino acids. Journal of Theoretical Biology 111(3):447–450.
8. Lawson EQ, et al. (1984) A simple experimental model for hydrophobic interactions in proteins. J Biol Chem 259(5):2910–2912.
9. Manavalan P, Ponnuswamy PK (1978) Hydrophobic character of amino acid residues in globular proteins. Nature 275(5681):673–674.
10. Charton M, Charton BI (1982) The structural dependence of amino acid hydrophobicity parameters. Journal of Theoretical Biology 99(4):629–644.
11. Wilce MCJ, Aguilar M-I, Hearn MTW (1995) Physicochemical basis of amino acid hydrophobicity scales: Evaluation of four new scales of amino acid hydrophobicity coefficients derived from RP-HPLC of peptides. Analytical Chemistry 67(7):1210–1219.
12. Nakashima H, Nishikawa K, Ooi T (1990) Distinct character in hydrophobicity of amino acid compositions of mitochondrial proteins. Proteins: Structure, Function, and Bioinformatics 8(2):173–178.
13. Argos P, Rao JK, Hargrave PA (1982) Structural prediction of membrane-bound proteins. Eur J Biochem 128(2-3):565–575.
14. Jones DD (1975) Amino acid properties and side-chain orientation in proteins: a cross correlation approach. Journal of Theoretical Biology 50(1):167–183.
15. Zimmerman JM, Eliezer N, Simha R (1968) The characterization of amino acid sequences in proteins by statistical methods. Journal of Theoretical Biology 21(2):170–201.
16. Bigelow CC (1967) On the average hydrophobicity of proteins and the relation between it and protein structure. Journal of Theoretical Biology 16(2):187–211.
17. Cowan R, Whittaker RG (1990) Hydrophobicity indices for amino acid residues as determined by high-performance liquid chromatography. Peptide Research 3(2):75–80.
18. Wolfenden RV, Cullis PM, Southgate CC (1979) Water, protein folding, and the genetic code. Science 206(4418):575–577.
19. Yutani K, Ogasahara K, Tsujita T, Sugino Y (1987) Dependence of conformational stability on hydrophobicity of the amino acid residue in a series of variant proteins substituted at a unique position of tryptophan synthase alpha subunit. Proc Natl Acad Sci USA 84(13):4441–4444.
20. Fauchere JL, Pliska V (1983) Hydrophobic parameters pi of amino acid side chains from the partitioning of N-acetyl-amino acid amides. Eur J Med Chem 18:369–375.
21. Ponnuswamy PK (1993) Hydrophobic characteristics of folded proteins. Prog Biophys Mol Biol 59(1):57–103.
22. Cid H, Bunster M, Canales M, Gazitua F (1992) Hydrophobicity and structural classes in proteins. Protein Eng 5(5):373–375.
23. Juretić D, Lučić B, Zucić D, Trinajstić N (1998) Protein transmembrane structure: recognition and prediction by using hydrophobicity scales through preference functions. Theoretical and Computational Chemistry, ed Cyril P (Elsevier), Vol 5, pp 405–445.
24. Kidera A, Konishi Y, Oka M, Ooi T, Scheraga H (1985) Statistical analysis of the physical properties of the 20 naturally occurring amino acids. J Protein Chem 4(1):23–55.
25. Eisenberg D, McLachlan AD (1986) Solvation energy in protein folding and binding. Nature 319(6050):199–203.
26. Fasman GD (1989) Prediction of protein structure and the principles of protein conformation (Plenum, New York).
27. Eisenberg D (1984) Three-dimensional structure of membrane and surface proteins. Annu Rev Biochem 53:595–623.
28. Sweet RM, Eisenberg D (1983) Correlation of sequence hydrophobicities measures similarity in three-dimensional protein structure. J Mol Biol 171(4):479–488.
29. Rackovsky S, Scheraga HA (1977) Hydrophobicity, hydrophilicity, and the radial and orientational distributions of residues in native proteins. Proc Natl Acad Sci USA 74(12):5248–5251.
30. Levitt M (1976) A simplified representation of protein conformations for rapid simulation of protein folding. J Mol Biol 104(1):59–107.
31. Casari G, Sippl MJ (1992) Structure-derived hydrophobic potential. Hydrophobic potential derived from X-ray structures of globular proteins is able to identify native folds. J Mol Biol 224(3):725–732.
32. Prabhakaran M (1990) The distribution of physical, chemical and conformational properties in signal and nascent peptides. Biochem J 269(3):691–696.
33. Engelman DM, Steitz TA, Goldman A (1986) Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annual Review of Biophysics and Biophysical Chemistry 15:321–353.
34. Kawashima S, et al. (2008) AAindex: amino acid index database, progress report 2008. Nucleic Acids Res 36(Database issue):D202–D205.


Table S2. Interface propensities used as the stickiness score

Amino acid | Interface propensity
A | 0.0062
C | 1.0372
D | −0.7485
E | −0.7893
F | 1.2727
G | −0.1771
H | 0.1204
I | 1.1109
K | −1.1806
L | 0.9138
M | 1.0124
N | −0.2693
P | −0.1799
Q | −0.4114
R | −0.0876
S | 0.1376
T | 0.1031
V | 0.7599
W | 0.7925
Y | 0.8806
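
To show how the propensities in Table S2 might be applied, the following sketch scores a set of surface residues by averaging their propensities. The values are copied from Table S2; the use of a simple mean and the function name `stickiness_score` are assumptions for illustration, not a statement of the exact scoring procedure.

```python
# Interface propensities from Table S2, keyed by one-letter amino acid code.
STICKINESS = {
    "A": 0.0062, "C": 1.0372, "D": -0.7485, "E": -0.7893, "F": 1.2727,
    "G": -0.1771, "H": 0.1204, "I": 1.1109, "K": -1.1806, "L": 0.9138,
    "M": 1.0124, "N": -0.2693, "P": -0.1799, "Q": -0.4114, "R": -0.0876,
    "S": 0.1376, "T": 0.1031, "V": 0.7599, "W": 0.7925, "Y": 0.8806,
}


def stickiness_score(residues):
    """Mean interface propensity of a collection of one-letter residue codes.

    Averaging is an assumed aggregation; `residues` would typically hold the
    surface residues of a protein or of one surface patch.
    """
    if not residues:
        raise ValueError("at least one residue is required")
    return sum(STICKINESS[aa] for aa in residues) / len(residues)


# Hypothetical patches: a lysine/glutamate-rich one and an aromatic-rich one.
print(stickiness_score("KKEEDK"))
print(stickiness_score("FWILYV"))
```

With this scale, lysine is the least sticky residue and phenylalanine the stickiest, so charged patches score negative and hydrophobic, aromatic patches score positive.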

Table S3. Information on species pairs used for the arginine–lysine and aspartic acid–glutamic acid substitution frequency analysis

Species 1 | Species 2 | No. orthologous pairs | Average % sequence divergence | No. conserved D, E, R, or K | No. of substitutions between D-E or R-K
Escherichia coli | Salmonella typhimurium | 3,102 (868) | 13.8 (10.4) | 180,634 (65,231) | 8,818 (2,743)
Saccharomyces cerevisiae | Saccharomyces paradoxus | 3,798 (3,529) | 8.0 (7.4) | 463,813 (440,319) | 14,822 (13,603)
Homo sapiens | Mus musculus | 7,314 (4,435) | 11.3 (10.8) | 810,931 (525,740) | 34,766 (21,876)

The numbers in parentheses refer to the subset of proteins with known abundance data.
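
One way to summarize the counts in Table S3 is as the fraction of D, E, R, or K positions that carry a D-E or R-K substitution. The sketch below computes this ratio from the full-set counts in the table. The ratio itself is an assumed summary statistic chosen for illustration and may differ from the quantity used in the analysis.

```python
# (no. conserved D/E/R/K, no. D-E or R-K substitutions), full sets, Table S3.
PAIRS = {
    ("Escherichia coli", "Salmonella typhimurium"): (180_634, 8_818),
    ("Saccharomyces cerevisiae", "Saccharomyces paradoxus"): (463_813, 14_822),
    ("Homo sapiens", "Mus musculus"): (810_931, 34_766),
}


def substitution_fraction(n_conserved, n_substituted):
    """Fraction of considered positions that are substituted (assumed metric)."""
    total = n_conserved + n_substituted
    if total == 0:
        raise ValueError("no positions to evaluate")
    return n_substituted / total


for (species_1, species_2), (conserved, substituted) in PAIRS.items():
    fraction = substitution_fraction(conserved, substituted)
    print(f"{species_1} vs {species_2}: {fraction:.4f}")
```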

Table S4. Species used in alignments of orthologous proteins

Escherichia coli set (no. orthogroups with at least 10/13 species and a known structure = 397): E. coli (K12 MG1655); Vibrio parahaemolyticus; Burkholderia cenocepacia (J2315); Proteus mirabilis; Pseudomonas fluorescens (SBW25); Shewanella baltica (OS223); Serratia proteamaculans (568); Aeromonas salmonicida (A449); Salmonella enterica Enteritidis (P125109); Pseudomonas aeruginosa (LESB58); Salmonella typhimurium (LT2); Yersinia enterocolitica (8081); Aeromonas hydrophila (ATCC7966)

Saccharomyces cerevisiae set (no. orthogroups with at least 13/15 species and a known structure = 196): S. cerevisiae; Saccharomyces paradoxus; Saccharomyces mikatae; Saccharomyces bayanus; Candida glabrata; Saccharomyces castellii; Kluyveromyces lactis; Ashbya gossypii; Kluyveromyces waltii; Debaryomyces hansenii; Candida albicans; Yarrowia lipolytica; Candida tropicalis; Candida guilliermondii; Candida lusitaniae

Homo sapiens set (no. orthogroups with at least 8/9 species and a known structure = 701): H. sapiens; Mus musculus; Gallus gallus; Pan troglodytes; Rattus norvegicus; Danio rerio; Xenopus tropicalis; Bos taurus; Malus domestica

Proteins within an orthologous group were aligned with MUSCLE (1), and evolutionary rates of each amino acid were calculated using Rate4Site (2).

1. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32(5):1792–1797.
2. Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N (2002) Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics (Oxford, England) 18(Suppl 1):S71–S77.
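
The species-coverage criterion stated in Table S4 (for example, at least 10 of 13 species for the E. coli set, together with a known structure) can be expressed as a simple filter. The sketch below is illustrative only: the data layout, the function name `filter_orthogroups`, and the toy identifiers are hypothetical.

```python
def filter_orthogroups(orthogroups, species_set, min_species, with_structure):
    """Keep orthogroups that cover enough species and have a known structure.

    orthogroups:    dict mapping orthogroup id -> dict of species -> sequence id
    species_set:    species considered for the set (e.g., the 13 E. coli set species)
    min_species:    minimum number of those species that must be represented
    with_structure: ids of orthogroups with a known 3D structure
    """
    allowed = set(species_set)
    kept = {}
    for group_id, members in orthogroups.items():
        n_present = len(allowed & set(members))
        if n_present >= min_species and group_id in with_structure:
            kept[group_id] = members
    return kept


# Hypothetical toy input with 13 placeholder species names.
species = [f"species_{i}" for i in range(13)]
groups = {
    "og1": {s: f"og1_{s}" for s in species[:10]},  # 10/13 species, structure known
    "og2": {s: f"og2_{s}" for s in species[:9]},   # 9/13 species: too few
    "og3": {s: f"og3_{s}" for s in species},       # 13/13 species, no structure
}
selected = filter_orthogroups(groups, species, min_species=10,
                              with_structure={"og1", "og2"})
print(sorted(selected))
```

Each retained orthogroup would then be aligned and passed to the rate calculation described in the table note.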


Table S5. General statistics on the structural datasets used

Species name | No. structural chains
Escherichia coli | 397 (172)
Saccharomyces cerevisiae | 196 (193)
Homo sapiens | 631 (495)

Numbers in parentheses correspond to the number of structures with corresponding abundance data available.
