8/3/2019 Maps of protein structure space reveal a fundamental relationship between protein structure and function
Maps of protein structure space reveal a fundamental relationship between protein structure and function
Margarita Osadchy and Rachel Kolodny1
Department of Computer Science, University of Haifa, Mount Carmel, Haifa 31905, Israel
Edited by Sung-Hou Kim, University of California, Berkeley, CA, and approved June 7, 2011 (received for review February 20, 2011)
To study the protein structure-function relationship, we propose a method to efficiently create three-dimensional maps of structure space using a very large dataset of >30,000 Structural Classification of Proteins (SCOP) domains. In our maps, each domain is represented by a point, and the distance between any two points approximates the structural distance between their corresponding domains. We use these maps to study the spatial distributions of properties of proteins, and in particular those of local vicinities in structure space such as structural density and functional diversity. These maps provide a unique broad view of protein space and thus reveal previously undescribed fundamental properties thereof. At the same time, the maps are consistent with previous knowledge (e.g., domains cluster by their SCOP class) and organize in a unified, coherent representation previous observations concerning specific protein folds. To investigate the function-structure relationship, we measure the functional diversity (using the Gene Ontology controlled vocabulary) in local structural vicinities. Our most striking finding is that functional diversity varies considerably across structure space: The space has a highly diverse region, and diversity abates when moving away from it. Interestingly, the domains in this region are mostly alpha/beta structures, which are known to be the most ancient proteins. We believe that our unique perspective of structure space will open previously undescribed ways of studying proteins, their evolution, and the relationship between their structure and function.
global map of protein universe | protein function prediction | protein structure universe
Investigating protein structure space and its relationship to function space is a fundamental scientific challenge. Characterizing this relationship may also carry practical implications for protein function prediction, whereby one wishes to infer the biological role of a protein from its structure [as is the case with many of the structures solved in the high-throughput pipeline of the Structural Genomics projects (1, 2)]. One way to approach this challenge is to represent protein structure space by three-dimensional maps. Maps of structure space were first introduced by Holm and Sander (3) and were later used by Kim and colleagues (4-6). To calculate their maps, they first calculate the structural similarity between all pairs of protein structures. Then, they use multidimensional scaling (MDS) to find a collection of points in three dimensions, each of which corresponds to a protein, and where the distance between any two points depends on the structural similarity of the proteins they represent. Such a representation provides a comprehensive visual view of structure space, which is not constrained by a hierarchical system such as the Structural Classification of Proteins (SCOP) (7).
We propose an efficient way to calculate maps of protein structure space, using the recently introduced FragBag model (8). Using FragBag, we represent each structure as a point in a high-dimensional space and project these points to three dimensions. It was recently shown that the similarity between the FragBag vectors, or the points in the high-dimensional space, can identify near structural neighbors as accurately as the state-of-the-art structural aligners STRUCTAL and CE, for several definitions of near structural neighbors (8). Because FragBag models structures as fixed-size vectors, we can replace MDS with a more efficient procedure, Principal Component Analysis (PCA) (9). Thus, we can map a very large set of >30,000 protein structures. Rather than studying single structures, we study properties such as structural density and functional diversity, which are defined at each point of structure space through a whole collection of structures in the vicinity of that point. By coloring the maps according to the values of these properties, we are able to visualize their distribution across structure space. This way we discover that structure space has a region of high functional diversity and that this region consists mainly of alpha/beta structures, which are known to be the most ancient proteins (10). We believe that studying such maps holds great promise for revealing important properties of protein structure space, its relation to function, and perhaps even to sequence.
Results
Constructing Functional Diversity Maps of Protein Structure Space. To study protein structure space we analyze a set of 31,155 SCOP v1.71 (7) domains. We initially represent each such domain by a 400-long FragBag vector, which may be thought of as a point in 400-dimensional space. In the FragBag model, a protein structure is represented by a count vector of backbone fragments taken from a library of 400 commonly occurring 12-residue fragments. For each contiguous (and overlapping) 12-residue segment along the protein backbone, we identify the library fragment that fits it best in terms of RMSD after optimal superposition. The ith entry in the FragBag vector is the number of times the library's ith fragment was found to be the best fit. The FragBag distance between two domains is the distance between their FragBag vectors. We have recently shown that this distance is a good approximation of the structural distance, as quantified by structural alignment (8). Using principal component analysis (PCA) (11), we then project the points to three-dimensional space. The eigenvalues of the resulting data covariance matrix (Fig. S1) drop sharply and the fourth largest eigenvalue (0.0106) is 8% of the first largest eigenvalue (0.1326); this indicates that three dimensions can adequately represent the essential features of protein structure space. Fig. 1 B-D shows a three-dimensional map of protein structure space, in which each domain is colored by its SCOP class (7); we show three views of the map from three angles, to get a better sense of it. As expected, the domains cluster by their SCOP class.
The density of protein structure space is uneven, i.e., certain regions have more domains per unit volume than others. This can be seen in Fig. 1 F-H, which shows again the three views of the map, now colored according to the density score of each domain: the number of domains that are within a 0.005 distance
Author contributions: M.O. and R.K. designed research; R.K. performed research; R.K. analyzed data; and M.O. and R.K. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
1To whom correspondence may be addressed. E-mail: [email protected] or trachel@cs.
haifa.ac.il.
This article contains supporting information online at www.pnas.org/lookup/suppl/
doi:10.1073/pnas.1102727108/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1102727108 PNAS | July 26, 2011 | vol. 108 | no. 30 | 12301-12306
from it. Certain proteins are more studied than others, and as a result, more variants thereof are included in our dataset. To rule out this bias as the source of the observed uneven density, we prepared similar density maps, based on 40% and 95% sequence nonredundant subsets of the original data (containing 2,517 and 4,238 domains, respectively). The results, shown in Fig. S2, are qualitatively identical to the original density map, and the correlations between the original density scores and those based on the restricted sets are very high (r = 0.945 and r = 0.960 for the 40% and the 95% sets, respectively). In the remainder of this study, we use the full dataset.
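The density score described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `density_scores`, the chunk size, and the choice to exclude the domain itself from its own count are assumptions; only the 0.005 radius comes from the text.

```python
import numpy as np

def density_scores(points, radius=0.005, chunk=128):
    """Count, for every point, the other points lying within `radius` of it.

    `points` is an (N, 3) array of map coordinates. Excluding the point
    itself from its own count is an assumption; the text does not say.
    """
    points = np.asarray(points, dtype=float)
    scores = np.empty(len(points), dtype=int)
    # Work in chunks so the (chunk, N) distance block stays small in memory.
    for start in range(0, len(points), chunk):
        block = points[start:start + chunk]
        sq_dist = ((block[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        scores[start:start + chunk] = (sq_dist <= radius ** 2).sum(axis=1) - 1
    return scores
```

The same function can be applied to a sequence nonredundant subset of the points to repeat the bias check on a restricted set.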
Upon inspecting Fig. 1, one can see that there is a relation between the SCOP class and the density score of a domain. Fig. 2A, which is a histogram of the density scores of the domains, color-coded by their SCOP class, shows this more clearly: The alpha+beta (yellow) and the all-beta (red) domains tend to reside in low-density regions, whereas the all-alpha (blue) domains constitute the vast majority in the very high-density regions.
Next, we investigate how functional diversity varies across structure space; for this, we quantify the functional diversity in the vicinity of each domain in our dataset. We consider three definitions for the vicinity of a domain d: (i) Vfn is a fixed number (100) of the nearest structural neighbors of d, (ii) Vsamp is a sample of fixed size (100) from the domains that lie within a fixed structural distance (0.005) from d, and (iii) Vfd is the collection of all domains that are within some fixed structural distance (0.005) from d. Although Vfd is perhaps the most natural definition, it makes the vicinities of domains in denser regions contain far more members, which may bias the results.
Our measure for the functional diversity in a vicinity of a protein, however vicinity is defined, is the number of distinct functions that the domains within this vicinity possess. To determine function, we use the functional annotations of the proteins from the Gene Ontology molecular function (GO-MF) controlled vocabulary (12), and the mapping of terms to SCOP domains calculated by Lopez and Pazos (13). When a single domain is annotated as having more than one function, we include all its functions toward the count.
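The three vicinity definitions and the diversity count can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, annotations are assumed to be one Python set of GO-MF term identifiers per domain, and leaving the domain d out of its own vicinity is an assumption.

```python
import numpy as np

def _distances(points, i):
    """Euclidean distances from point i to every point of an (N, 3) array."""
    points = np.asarray(points, dtype=float)
    return np.linalg.norm(points - points[i], axis=1)

def vicinity_fn(points, i, k=100):
    """Vfn: indices of the k nearest neighbors of domain i."""
    order = np.argsort(_distances(points, i), kind="stable")
    return order[order != i][:k]

def vicinity_fd(points, i, radius=0.005):
    """Vfd: indices of all domains within `radius` of domain i."""
    idx = np.flatnonzero(_distances(points, i) <= radius)
    return idx[idx != i]

def vicinity_samp(points, i, radius=0.005, size=100, rng=None):
    """Vsamp: a sample of `size` domains from Vfd, or all of Vfd if smaller."""
    idx = vicinity_fd(points, i, radius)
    if len(idx) <= size:
        return idx
    rng = np.random.default_rng() if rng is None else rng
    return rng.choice(idx, size=size, replace=False)

def functional_diversity(vicinity, annotations):
    """Number of distinct terms over all domains in the vicinity.

    A domain with several terms contributes all of them to the count.
    """
    terms = set()
    for j in vicinity:
        terms |= annotations[j]
    return len(terms)
```

For example, `functional_diversity(vicinity_samp(points, i), annotations)` would give the Vsamp-based score of domain `i` under these assumptions.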
Structure Space Has a Core of High Functional Diversity. Fig. 3 B-D shows a functional diversity map of protein structure space. The domains in the map are color-coded according to the functional diversity of their vicinities (red for the most diverse ones; blue for the least diverse), and vicinity is defined to be Vsamp (when there were fewer than 100 domains within this distance, all were included). This map shows a striking pattern: Protein space has a highly diverse core, and diversity drops gradually toward its periphery (we denote the high diversity region "core" because of its location in our maps). Figs. S3 and S4 show the maps constructed using the two alternative definitions of a vicinity, Vfn and Vfd; the results are very similar.
As a control for the validity of our finding, we re-created the diversity map (using Vsamp again) after randomly permuting the functional annotations across all domains (i.e., the set of functional annotations originally associated with each domain was associated with a different, randomly chosen domain). If our finding were merely an artifact of the projection to three dimensions, or of some feature of protein structure space (say, the uneven density), the resulting diversity map would show again a highly diverse core. Fig. 3 F-H shows that this is not the case: Under the
Fig. 1. Maps of protein structure space. Each point represents a SCOP domain, and the distance between any two points approximates the structural distance between their corresponding domains. B-D show the map of the SCOP classes: As expected, the points are clustered. F-H show the structural density map, where the color of each point indicates the number of domains that lie in its vicinity of fixed distance (denoted Vfd). We see that the highest density is within the regions of the all-alpha domains, followed by a region in the alpha/beta domain and in the all-beta domain. Fig. S2 shows a similar density map when considering sequence nonredundant samples of the protein world.
Fig. 2. Structural density and functional diversity by SCOP class. We calculate the separate histograms of structural density (A) and functional diversity (B) of each of the SCOP classes and stack them one on top of the other. We see that the densest regions are populated by all-alpha domains, and the most functionally diverse regions by the alpha/beta domains. See Table S2 (listing the exact proportions of each of the SCOP classes, among the top 10%-20% most dense/functionally diverse domains) and Fig. S12 for supporting evidence.
random permutation, the diversity score of almost all domains is very high (colored in orange and red), and the map has no prominent diverse core; the relatively few domains with low diversity scores (colored in blue) are mostly isolated domains, having fewer than 100 neighbors within a 0.005 distance, and thus necessarily less diverse vicinities (there are 5,356 such domains). The existence of the diverse core is indeed a statistically significant finding (p < 0.005; see Methods for details). When using Vfn, the results are very similar (Fig. S3); as expected, when using Vfd, diversity is highly correlated (r = 0.953) with density, because domains in denser regions now have more members in their vicinities, and thus more functional annotations (Fig. S4).
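The permutation control can be illustrated on toy data. The sketch below only shows the shuffling step and the recomputation of diversity; it does not reproduce the significance test, whose details are given in Methods. All names are hypothetical, annotations are assumed to be one set of terms per domain, and a plain random permutation is used, so a domain may occasionally keep its own annotations.

```python
import numpy as np

def permute_annotations(annotations, rng):
    """Reassign each domain's annotation set to a randomly chosen domain."""
    order = rng.permutation(len(annotations))
    return [annotations[j] for j in order]

def diversity_scores(vicinities, annotations):
    """Distinct-term count for each vicinity (a list of index lists)."""
    return np.array([
        len(set().union(*(annotations[j] for j in vicinity)))
        for vicinity in vicinities
    ])

# Toy world: two structural clusters, each with a single function.
rng = np.random.default_rng(0)
annotations = [{"f1"}] * 50 + [{"f2"}] * 50
vicinities = [[j for j in range(50) if j != i] for i in range(50)]
vicinities += [[j for j in range(50, 100) if j != i] for i in range(50, 100)]

true_scores = diversity_scores(vicinities, annotations)
shuffled = permute_annotations(annotations, rng)
random_scores = diversity_scores(vicinities, shuffled)
```

In this toy world the true scores are uniformly low because neighbors share a function, whereas after shuffling the scores can only stay the same or rise, mirroring the qualitative contrast between the true and the random maps.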
We can reliably predict the functional diversity of structures in a randomly chosen test set, using the mapping calculated for a training set. Our test set consists of 250 randomly chosen structures from the sequence nonredundant set (using a 40% sequence identity threshold); it has 52, 40, 92, 52, and 14 domains of the SCOP classes all-alpha, all-beta, alpha/beta, alpha+beta, and others, respectively. The training set has the 29,014 domains that share no sequence similarity with the test set proteins (BLAST E-value threshold of $10^{-3}$ and sequence identity of 40%). Using PCA of the training set FragBag data, we calculate the projection $P_{train}$ to $\mathbb{R}^3$. For each test set protein p, we calculate $P_{train}(p)$ and identify the structures in p's training-set vicinity. The predicted functional diversity score is the number of unique GO-MF terms within this vicinity. Fig. S5 plots the predicted functional diversity scores vs. the ones calculated using the complete dataset for the three definitions of vicinity, Vsamp, Vfn, and Vfd, and shows that these scores are highly correlated (r > 0.96).
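The train-then-project procedure can be sketched as below. This is a hedged illustration, not the authors' implementation: the function names are hypothetical, the data matrices are stored with one domain per row (N × L, the transpose of the layout used in Methods), and the prediction uses a fixed-distance vicinity with the 0.005 radius.

```python
import numpy as np

def fit_pca(train, n_components=3):
    """Learn the projection from training vectors (one row per domain)."""
    mean = train.mean(axis=0)
    centered = train - mean
    cov = centered.T @ centered / len(train)
    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_components]   # indices of the largest
    return mean, eigvecs[:, top]

def project(x, mean, components):
    """Apply the training-set projection to one vector or to many rows."""
    return (x - mean) @ components

def predicted_diversity(test_point, train_points, train_annotations,
                        radius=0.005):
    """Distinct terms among training domains within `radius` of a test point."""
    dist = np.linalg.norm(train_points - test_point, axis=1)
    terms = set()
    for j in np.flatnonzero(dist <= radius):
        terms |= train_annotations[j]
    return len(terms)
```

The key point of the design is that the mean and the eigenvectors come from the training set only, so a test protein never influences the map it is placed in.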
A potential explanation for the high functional diversity in the core is that the core contains a high proportion of multiple-function domains, compared to the periphery (recall that multiple-function domains contribute all their functions toward the diversity). This is not the case: Fig. S6 shows a functional multiplicity map of structure space, i.e., a map in which each point is colored according to the number of GO-MF annotations of the domain it represents. The high functional diversity core seen in Fig. 3 and Fig. S3 does not overlap with a region of high functional multiplicity. Further, we see the highly diverse core even after reconstructing the functional diversity maps using only domains annotated by only one function (61% of the data); see Fig. S7.
Another potential, yet invalid, explanation for the high functional diversity in the core is related to the uneven degree of detail in the GO-MF vocabulary. The GO is implemented as a hierarchical directed graph, in which the terms are placed at the nodes and the edges direct from the general to the specific. The level of detail in the GO-MF graph is uneven: Some areas are better studied and correspond to subgraphs of the GO-MF graph that have more levels and, ultimately, more functional annotations. In addition, proteins of the same function sometimes have annotations at different levels (14). One could argue that perhaps the proteins that lie in the core happen to have functions that are described in finer detail, and the apparent high diversity of this core is merely an artifact of the uneven level of detail in the GO-MF graph. To demonstrate that this, again, is not the case, we create functional diversity maps based on Watson et al.'s GO-slim controlled vocabulary (14). GO slim is a trimmed variant of GO-MF in which function is defined more broadly, by only 190 terms (out of >7,800); in particular, GO slim targets a level of detail in which neighboring proteins in structure space have similar functions. Fig. S8 shows a map that was constructed similarly to the one in Fig. 3 and Fig. S3, except that the function annotations are replaced by their more general terms in the GO-slim graph. Once again we see the same phenomenon: a diverse core and more homogeneous periphery. Indeed, these alternative scores are highly correlated with the original diversity score (r > 0.895); see Table S1.
We also consider three alternatives to the functional diversity score used above. Two of these alternatives are based on a weighted count of distinct GO-MF terms within a vicinity, rather than on a simple count. In the first, commonly occurring terms have a lower weight, and in the second, more specific terms (i.e., ones that are farther from the root in the GO-MF graph) have a lower weight. In the third alternative, the score is based on the coherence measure proposed in refs. 15 and 16, which quantifies the contribution of a functional annotation term to a vicinity based on statistical tests. When using vicinity definitions Vsamp and Vfn, these alternative scores are correlated with the original diversity score (r > 0.79); see Table S1. Indeed, the functional diversity maps under each of the three alternative scores, shown in Figs. S9-S11, look similar to the one in Fig. 3.
Characterizing the Core's Structures. A comparison of the functional diversity maps (Fig. 3 B-D) and the SCOP-class maps
Fig. 3. Functional-diversity map of protein structure space. The color of a point indicates the degree of functional diversity measured by the number of distinct GO-MF terms annotating the domains in its vicinity. Here, we use the Vsamp definition for a vicinity of a protein: a sample of fixed size from all domains that fall within a fixed distance from it. A-D show the functional diversity for the true data; E-H show the functional diversity of a random world, in which the proteins have the same structures, yet their functions are assigned at random. We see that when using the true functional annotations, there is a core of high functional diversity, and that functional diversity drops toward the periphery. Alternatively, when the functions are assigned at random, there is no such core, and function diversity is uniformly high. The figures in SI Appendix, and Table S1, show that the results are qualitatively similar when using alternative datasets, scoring functions, and the more uniform (coarser) annotation graph GO slim.
(Fig. 1 B-D) reveals that the core of high functional diversity consists mainly of alpha/beta domains (colored in green). Fig. 2B and Fig. S13 A and B show this finding in another way, via histograms detailing the contribution of each SCOP class to the functional diversity scores. Table S2 lists the exact proportions of the various SCOP classes among the domains with the top 10% and 20% functional diversity scores; in all cases (including when considering the diversity scores only within the sequence nonredundant sets), the majority of the high functional diversity domains are alpha/beta proteins.
Fig. 4 highlights several SCOP folds that lie in the diverse core of structure space. The full dataset is shown in Fig. 4A with a black outline; Fig. 4 B-F show specific SCOP folds within this outline, alongside the histograms of their functional diversity scores. The most obvious candidate for the SCOP fold whose structures lie in the core is the TIM barrels (c.1), which are well known to accommodate many functions (2). Indeed, these lie in the core, and their functional diversity scores are clearly higher compared with the full dataset (Fig. 4B). We see, however, that the TIM barrels are only a part of the picture, as the core also contains many other domains. Fig. 4C shows SCOP fold adenine nucleotide alpha hydrolase-like (c.26) that was also noted as accommodating many functions (1) and also lies within the core.
To identify more SCOP folds in the core, we search for folds with (more than 25) domains that lie in functionally diverse vicinities. We quantify the diversity of a SCOP fold by the average and the median of the diversity scores of its domains, using the diversity scores based on the three definitions of vicinity. Table S3 lists the 20 most diverse folds under these measures: Each of the resulting six measures identifies different SCOP folds as the most diverse. To identify SCOP folds that are truly diverse, we consider folds that are among the 20 most diverse folds under all six measures. Nine folds satisfy this condition: 7-stranded beta/alpha barrel (c.6), ClpP/crotonase (c.14), methylglyoxal synthase-like (c.24), arginase/deacetylase (c.42), phosphorylase/hydrolase-like (c.56), alpha/beta-hydrolases (c.69), AraD-like aldolase/epimerase (c.74), amidase signature enzymes (c.117), protein kinase-like (PK-like) (d.144). As expected, the domains in these folds are indeed located in the core; Fig. 4 D-F shows three examples.
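The selection of folds that rank highly under all six measures can be sketched as an intersection of top-k lists. The function name, the input layout (a fold label per domain and a dictionary of per-domain scores for each vicinity definition), and the reading of "more than 25 domains" as a minimum fold size are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, median

def consistently_diverse_folds(folds, scores_by_vicinity,
                               top_k=20, min_domains=25):
    """Folds ranked in the top_k under every (vicinity, statistic) pair.

    folds: fold label of each domain.
    scores_by_vicinity: maps a vicinity name to per-domain diversity scores.
    Only folds with more than `min_domains` domains are considered.
    """
    members = defaultdict(list)
    for idx, fold in enumerate(folds):
        members[fold].append(idx)
    eligible = {f: idx for f, idx in members.items() if len(idx) > min_domains}

    result = None
    for scores in scores_by_vicinity.values():
        for stat in (mean, median):
            ranked = sorted(
                eligible,
                key=lambda f: stat([scores[i] for i in eligible[f]]),
                reverse=True,
            )
            top = set(ranked[:top_k])
            result = top if result is None else result & top
    return result if result is not None else set()
```

With three vicinity definitions and two statistics, the loop evaluates six rankings, and a fold survives only if it appears in every one of them.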
Better Predicting of Function from Structure in Regions of Low Functional Diversity. We use the set of 90 proteins* studied by Watson et al. (14) to assess if one can indeed better predict function for proteins in regions of structure space having low functional diversity. Watson et al. predicted function using global structural similarity [as detected by secondary-structure matching (SSM) (17)] and evaluated the correctness of their predictions. Fig. S13 maps the protein structures used in their experiment: on the right, these structures within our dataset, and on the left, the same structures with markers indicating if the prediction was correct. We see that Watson et al. better succeed in predicting the function of proteins that lie in regions of low functional diversity.
Fig. 4. SCOP folds that lie in the functionally diverse core. We highlight the location in structure space of specific SCOP folds and show histograms of the diversity of the domains of these folds; for comparison, A shows the full dataset (a copy of Fig. 3 A and B) outlined in black. B and C show two SCOP folds that are known to be functionally diverse, the TIM barrel fold (c.1) and the adenine nucleotide alpha hydrolase-like fold (c.26). Indeed, the domains of these two folds are located in the highly diverse core of structure space. There are, however, many other domains in the core. D-F show three more examples of SCOP folds that lie in the highly diverse core: phosphorylase/hydrolase-like (c.56), alpha/beta-hydrolases (c.69), and protein kinase-like (PK-like, d.144), respectively. Table S3 lists the mean and median functional diversity scores for several SCOP folds that lie in the core.
*Denoted the known-function dataset; 1nrh, 1tea were removed because they are
obsolete.
We quantify this difference by separating the proteins into two sets, according to their functional diversity, and comparing the success rate in these sets. The first set consists of 35 proteins having high diversity (45) vicinities, and the second consists of 55 proteins having low diversity (
This is a generalization of a call for caution recently made with respect to function prediction for TIM barrels (24). Indeed, our analysis of Watson et al.'s data (14) shows that they were more successful in predicting function from structure for proteins lying in less diverse regions of structure space. Thus, it seems that one could use the functional diversity maps to better choose the parameters of structure-based function prediction, according to the location of the target protein in structure space, and perhaps even to assign confidence levels for the prediction.
Materials and Methods
Representing Protein Domains in 400 Dimensional Space. For each domain in
the dataset, we calculate FragBag (8) description vectors of length $L = 400$
based on a library of 400 12-mer fragments (http://cs.haifa.ac.il/~ibudowsk/libraries/centers400_12.txt);
each entry in the vector is the number of times the corresponding library
fragment was the best approximation of any of the 12-mer fragments in the
backbone of the represented protein. A list of one or more GO-MF annotations
is associated with each domain. Our dataset includes $N = 31{,}155$ SCOP v1.71 (7)
domains for which Lopez and Pazos (13) provide a GO annotation. We have
previously shown that the cosine distance between two FragBag vectors best
approximates the structural alignment score (SAS) (25) between their
corresponding structures (8). Notice that the squared Euclidean (norm 2)
distance between two FragBag vectors that were normalized to length 1 is
exactly twice their cosine distance. To see this, consider two FragBag vectors
$p_1$ and $p_2$, and let $\hat{p}_1 = p_1 / \|p_1\|$ and $\hat{p}_2 = p_2 / \|p_2\|$
be the normalized vectors. The cosine distance between $p_1$ and $p_2$ is
$1 - \cos(p_1, p_2) = 1 - \hat{p}_1^T \hat{p}_2$; the squared Euclidean distance
between the normalized vectors is
\[
(\hat{p}_1 - \hat{p}_2)^T (\hat{p}_1 - \hat{p}_2)
= \hat{p}_1^T \hat{p}_1 + \hat{p}_2^T \hat{p}_2 - 2 \hat{p}_1^T \hat{p}_2
= 2 - 2 \hat{p}_1^T \hat{p}_2
= 2 (1 - \hat{p}_1^T \hat{p}_2).
\]
Thus, we normalize all FragBag vectors and consider the Euclidean (norm 2)
distance, which is a monotone function of the cosine distance; because all
distances are relative, the uniform factor 2 is of no consequence.
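The identity between the cosine distance and the squared Euclidean distance of unit-normalized vectors can be checked numerically. The following is a minimal sketch, not the authors' code: the two vectors are random stand-ins for FragBag count vectors, and the helper names `cosine_distance` and `normalize` are introduced here for illustration only.

```python
import numpy as np

def cosine_distance(p1, p2):
    """Return 1 - cos(p1, p2) for two nonzero vectors."""
    return 1.0 - np.dot(p1, p2) / (np.linalg.norm(p1) * np.linalg.norm(p2))

def normalize(p):
    """Scale a nonzero vector to Euclidean length 1."""
    return p / np.linalg.norm(p)

rng = np.random.default_rng(0)
# Hypothetical stand-ins for two FragBag vectors of length L = 400
# (nonnegative fragment counts); real vectors would come from FragBag.
p1 = rng.integers(0, 20, size=400).astype(float)
p2 = rng.integers(0, 20, size=400).astype(float)

diff = normalize(p1) - normalize(p2)
squared_euclidean = float(np.dot(diff, diff))

# The squared Euclidean distance of the normalized vectors equals
# twice the cosine distance of the original vectors.
print(np.isclose(squared_euclidean, 2.0 * cosine_distance(p1, p2)))
```

Because the square root is monotone, ranking neighbors by Euclidean distance between normalized vectors gives the same order as ranking them by cosine distance.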
Projecting to Three Dimensions. We store the normalized descriptions of
length $L = 400$ of the $N$ structures in our dataset in an $L \times N$ matrix and
project it to three dimensions using PCA. Namely, after centering the
$L \times N$ coordinates about the origin (by subtracting their mean), we calculate
the $L \times L$ covariance matrix (normalized by $N$) and find the eigenvectors
corresponding to its three largest eigenvalues. By multiplying these
eigenvectors (a $3 \times L$ matrix) by the $L \times N$ data matrix, we find the
$3 \times N$ matrix that is the projection of our data to three dimensions. There,
the Euclidean (norm 2) distance between two 3D vectors is an approximation of
their Euclidean (norm 2) distance in $L$ dimensions. We emphasize that this
requires only the easy computation of finding the top three eigenvalues and
eigenvectors of the relatively small $L \times L$ matrix. This is in contrast to
the slightly different calculation done in previous studies: Given $N$
structures, they calculate a symmetric matrix $D$ of size $N \times N$ of all
pairwise structural distances and use MDS to find the coordinates of the
points representing these $N$ structures in three (or two) dimensions (3, 5).
The technical bottleneck in the MDS calculation is finding the top three (or
two) eigenvalues and eigenvectors of an $N \times N$ matrix derived from $D$ (26);
it is a challenging computation for datasets of several tens of thousands of
proteins. Indeed, the datasets in previous studies were smaller (e.g., less
than 1,900 structures in ref. 6).
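The projection step can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function name `project_pca` is hypothetical, the data are random unit-length columns standing in for normalized FragBag vectors, and the projection is applied to the centered data matrix.

```python
import numpy as np

def project_pca(X, k=3):
    """Project the columns of an L x N matrix onto the top-k principal axes.

    Only the L x L covariance matrix is decomposed, so the cost of the
    eigen-decomposition does not grow with the number of structures N.
    """
    Xc = X - X.mean(axis=1, keepdims=True)      # center the N points
    cov = (Xc @ Xc.T) / X.shape[1]              # L x L, normalized by N
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :k]               # L x k, largest eigenvalues first
    return top.T @ Xc                           # k x N projected coordinates

rng = np.random.default_rng(0)
L, N = 400, 500                                 # N is kept small for the demo
X = rng.random((L, N))
X /= np.linalg.norm(X, axis=0, keepdims=True)   # unit-length columns

Y = project_pca(X)
print(Y.shape)
```

Since the three eigenvectors are orthonormal, distances in the projected space never exceed the corresponding distances in the original $L$-dimensional space.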
Calculating Alternative Functional Diversity Scores. Each of the domains in the
dataset has a list of its GO-MF terms; in each case, the terms are the most
specific ones (rather than the term and all its parents). For each term, we
calculate its weighted functional diversity in two ways: (i) (1 − 10 × the
fraction of its occurrence), where the fraction of its occurrence is the
fraction of domains that are annotated by it; the scaling factor was
determined to be 10, to better space the range of values in the dataset.
(ii) The inverse of the depth of the term in the GO-MF annotation graph; the
depth is the number of times we can replace the term by a more general one
until we reach the root. There are seven cases (out of 9,500) in which a term
has two different depths, and these differ by at most three (this is a
consequence of GO being a graph rather than a tree). In these cases, we use
the average depth. To calculate the coherence measure, we check for each
term and vicinity if the term is enriched in the vicinity, i.e., if it appears
at a rate that is statistically significant. The coherence is the percent of
the terms in a region that are enriched. Thus, the coherence measure is a
value between 0 and 100%, and high coherence implies low diversity and vice
versa; see ref. 16 for more details. Finally, the GO-slim annotation of a
functional term is the most specific parent(s) of the term that is present in
the GO-slim annotation graph.
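The two term weights described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy ontology (`parents`, rooted at "A"), and the toy annotations (`domain_terms`) are all hypothetical. The weighted score of a vicinity is taken here to be the sum of the weights of its distinct terms, and the average depth is taken over the distinct path lengths to the root.

```python
from collections import Counter

def occurrence_weights(domain_terms, scale=10.0):
    """Weight (i): 1 - scale * (fraction of domains annotated by the term)."""
    n_domains = len(domain_terms)
    counts = Counter(t for terms in domain_terms.values() for t in set(terms))
    return {t: 1.0 - scale * c / n_domains for t, c in counts.items()}

def depths_to_root(term, parents, root, memo=None):
    """Set of distinct path lengths from a term to the root of the graph."""
    if memo is None:
        memo = {}
    if term == root:
        return {0}
    if term not in memo:
        memo[term] = {d + 1 for p in parents[term]
                      for d in depths_to_root(p, parents, root, memo)}
    return memo[term]

def depth_weight(term, parents, root):
    """Weight (ii): inverse of the (average) depth; undefined for the root."""
    depths = depths_to_root(term, parents, root)
    return 1.0 / (sum(depths) / len(depths))

def weighted_diversity(vicinity_terms, weights):
    """Sum of the weights of the distinct terms found in a vicinity."""
    return sum(weights[t] for t in set(vicinity_terms))

# Toy graph: "E" reaches the root "A" by paths of length 3 and 1.
parents = {"B": ["A"], "C": ["A"], "D": ["B"], "E": ["D", "A"]}
domain_terms = {"d1": ["D"], "d2": ["D", "E"], "d3": ["C"], "d4": ["E"]}

occ = occurrence_weights(domain_terms)
dep = {t: depth_weight(t, parents, "A") for t in ("C", "D", "E")}
print(occ, dep, weighted_diversity(["C", "D", "D"], dep))
```

In this toy set the occurrence fractions are far larger than in a dataset of tens of thousands of domains, so the occurrence weights come out negative; the example only illustrates the arithmetic.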
Measuring the Spatial Spread of the Core in True and Random Associations of
Functional Annotations to Structures. We measure the spatial spread of the
most diverse domains by their average distance from their center of mass.
We consider two definitions of the most diverse proteins: (i) all domains
whose diversity scores are greater than 0.8 × max_diversity, where
max_diversity is the highest diversity score found in our dataset; (ii) the 20% most
functionally diverse proteins. We measure the spatial spread of the most
diverse domains in our dataset, and in 300 random assignments of the
functional annotations to locations in structure space. The average distance in the
true dataset for these two definitions is 0.0860 and 0.1131, respectively. In
the random permutations, the average distances are 0.3501 ± 0.0151 and
0.3499 ± 0.0220, respectively, resulting in a p value < 0.0033.
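The spread measurement and its comparison with random assignments can be sketched as below. This is an illustrative sketch on synthetic data, not the authors' code: the function names, the synthetic coordinates and scores, and the particular empirical p-value estimate (count of permutations at least as compact as the observed set, plus one, over the number of permutations plus one) are assumptions made here.

```python
import numpy as np

def spread(points):
    """Average distance of the points (n x 3) from their center of mass."""
    center = points.mean(axis=0)
    return float(np.linalg.norm(points - center, axis=1).mean())

def spread_permutation_test(coords, scores, top_fraction=0.2, n_perm=300, seed=0):
    """Compare the spread of the top-scoring points with random assignments."""
    rng = np.random.default_rng(seed)
    n_top = max(1, int(round(top_fraction * len(scores))))

    def top_spread(s):
        idx = np.argsort(s)[-n_top:]            # indices of the highest scores
        return spread(coords[idx])

    observed = top_spread(scores)
    # Shuffle the scores over the fixed locations, as in a random assignment
    # of functional annotations to positions in structure space.
    null = np.array([top_spread(rng.permutation(scores)) for _ in range(n_perm)])
    p_value = (np.sum(null <= observed) + 1) / (n_perm + 1)
    return observed, null, p_value

rng = np.random.default_rng(1)
coords = rng.normal(size=(1000, 3))             # synthetic 3D map
scores = -np.linalg.norm(coords, axis=1)        # synthetic: diverse near the center

observed, null, p_value = spread_permutation_test(coords, scores)
print(observed, null.mean(), null.std(), p_value)
```

With 300 permutations and no permutation as compact as the observed set, this estimate is bounded below by 1/301, which is of the same order as the p value quoted in the text.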
ACKNOWLEDGMENTS. We thank Yuval Nov, Golan Yona, Chen Keisar, and our anonymous reviewers for their helpful comments. R.K. was supported by the Marie Curie IRG Grant 224774.
1. Redfern OC, Dessailly B, Orengo CA (2008) Exploring the structure and function paradigm. Curr Opin Struct Biol 18:394–402.
2. Friedberg I (2006) Automated protein function prediction – the genomic challenge. Brief Bioinform 7:225–242.
3. Holm L, Sander C (1996) Mapping the protein universe. Science 273:595–603.
4. Hou J, Jun SR, Zhang C, Kim SH (2005) Global mapping of the protein structure space and application in structure-based inference of protein function. Proc Natl Acad Sci USA 102:3651–3656.
5. Hou J, Sims GE, Zhang C, Kim SH (2003) A global representation of the protein fold space. Proc Natl Acad Sci USA 100:2386–2390.
6. Choi IG, Kim SH (2006) Evolution of protein structural classes and protein sequence families. Proc Natl Acad Sci USA 103:14056–14061.
7. Murzin AG, Brenner SE, Hubbard T, Chothia C (1995) SCOP: A structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247:536–540.
8. Budowski-Tal I, Nov Y, Kolodny R (2010) FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately. Proc Natl Acad Sci USA 107:3481–3486.
9. Tenenbaum JB, Silva Vd, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290:2319–2323.
10. Winstanley HF, Abeln S, Deane CM (2005) How old is your fold? Bioinformatics 21:i449–i458.
11. Jolliffe IT (2002) Principal Component Analysis (Springer, New York).
12. Harris MA, et al. (2004) The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32:D258–261.
13. Lopez D, Pazos F (2009) Gene Ontology functional annotations at the structural domain level. Proteins 76:598–607.
14. Watson JD, et al. (2007) Towards fully automated structure-based function prediction in structural genomics: A case study. J Mol Biol 367:1511–1522.
15. Segal E, et al. (2003) Module networks: Identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet 34:166–176.
16. Slonim N, Atwal GS, Tkacik G, Bialek W (2005) Information-based clustering. Proc Natl Acad Sci USA 102:18297–18302.
17. Krissinel E, Henrick K (2004) Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr D Biol Crystallogr 60:2256–2268.
18. Kolodny R, Petrey D, Honig B (2006) Protein structure comparison: implications for the nature of fold space, and structure and function prediction. Curr Opin Struct Biol 16:393–398.
19. Petrey D, Honig B (2009) Is protein classification necessary? Toward alternative approaches to function annotation. Curr Opin Struct Biol 19:363–368.
20. Sims GE, Choi IG, Kim SH (2005) Protein conformational space in higher order phi-Psi maps. Proc Natl Acad Sci USA 102:618–621.
21. Kolodny R, Koehl P, Levitt M (2005) Comprehensive evaluation of protein structure alignment methods: Scoring by geometric measures. J Mol Biol 346:1173–1188.
22. Melvin I, Weston J, Noble WS, Leslie C (2011) Detecting remote evolutionary relationships among proteins by large-scale semantic embedding. PLoS Comput Biol 7:e1001047.
23. Levitt M, Gerstein M (1998) A unified statistical framework for sequence comparison and structure comparison. Proc Natl Acad Sci USA 95:5913–5920.
24. Loewenstein Y, et al. (2009) Protein function annotation by homology-based inference. Genome Biol 10:207.
25. Subbiah S, Laurents DV, Levitt M (1993) Structural similarity of DNA-binding domains of bacteriophage repressors and the globin core. Curr Biol 3:141–148.
26. de Silva V, Tenenbaum JB (2004) Sparse multidimensional scaling using landmark points. (Stanford Univ, Stanford, CA).
12306 www.pnas.org/cgi/doi/10.1073/pnas.1102727108 Osadchy and Kolodny
Supplementary Material
Figure 1S: Scree plot of the 400 dimensional data. The figure shows the 20 largest eigenvalues of the
(normalized) correlation matrix sorted in decreasing order; the inset shows the largest 200
eigenvalues of this matrix. The sharp drop up to the third eigenvalue suggests that three dimensions
can adequately represent the essential features of protein structure space.
Figure 2S: Density maps of protein structure space for sequence non-redundant subsets. The points
on the map are colored according to the number of domains that lie within 0.005 distance from them
in the dataset considered. In the left column, the map is of a subset of size 4238, in which the
sequence identity between any two proteins is at most 95%; in the right column, the map is of a
subset of size 2517, in which the sequence identity is at most 40%. The correlation coefficients
Figure 3S: Functional-diversity maps of protein structure space with vicinity defined as Vfn: The
points on the map are colored according to their functional diversity measured by the number of
distinct GO-MF terms annotating the domains in the Vfn vicinity of 100 nearest neighbors. In panels (a-
Figure 4S: Functional-diversity maps of protein structure space with vicinity defined as Vfd: The
points on the map are colored according to their functional diversity measured by the number of
distinct GO-MF terms annotating the domains in the Vfd vicinity of distance 0.005. In panels (a-d) we
Figure 5S: The predicted functional diversity score vs. the functional diversity score calculated using the full dataset, for a
test set of 250 randomly chosen structures: We consider the three definitions of local vicinities: Vsamp, Vfn, and Vfd. We
calculate the projection to three dimensions based on a set that does not include the 250 test set proteins and their sequence
homologues. The predicted functional diversity of a test set protein is the number of unique GO-MF terms in the vicinity of
the location calculated for the structure using that projection to a lower dimension. In all three cases, the agreement of the
predicted functional diversity scores and the functional diversity score calculated using the full dataset is very good
Figure 6S: Functional multiplicity map of protein structure space. Each protein structure is
color-coded by the number of its GO-MF functional annotations. The number of annotations is at most
7, and in the vast majority of the cases (99.4%) it is less than three (colored by shades of blue);
the top panel shows a bar diagram of the number of annotations. Below, there are three views of
structure space. The high functional diversity core seen in Figure 2 in the main document is not
due to the small set of proteins that are annotated by unusually many functions (see Figure 7S
below for additional evidence).
Figure 7S: Functional diversity map of protein structure space using only proteins annotated by
one function: We restrict our attention to domains annotated by exactly one GO term (61.04% of the
full data set). The correlation coefficients between the scores calculated using only this subset, and
when calculating using the full dataset are listed in Table 1S below. We see the high diversity core
in this dataset as well, meaning it cannot be explained away as a consequence of multiply-
Figure 8S: Functional diversity maps of protein structure space using the GO-Slim annotation:
Here we use a more restricted ontology for the annotation of function, and replace each
annotation by its most specific parent in the GO-slim graph (1). The maps on the left column use
the definition of vicinity Vsamp, and on the right Vfn. The correlation coefficients between these
measures and the straight-forward count of distinct GO-terms are listed in Table 1S below.
Figure 9S: Functional diversity maps using alternative scoring functions I. The functional-diversity
score used to construct these maps is the weighted sum of distinct terms within a vicinity of a
domain; the weight of a term depends on how common it is in the dataset, with more common terms
contributing less. The maps on the left column use the definition of vicinity Vsamp, and on the right Vfn.
The correlation coefficients between these measures and the straight-forward count of distinct GO-
Figure 10S: Functional diversity maps using alternative scoring functions II. The functional-
diversity score used to construct these maps is the weighted sum of distinct terms within a
vicinity of a domain; the weight of a term is its specificity in the GO annotation graph, with more
specific terms contributing less (the inverse of its distance from the root). The maps on the left
column use the definition of vicinity Vsamp, and on the right Vfn. The correlation coefficients
between these measures and the straight-forward count of distinct GO-terms are listed in Table
1S below. Here too, we see our main finding: a highly diverse core surrounded by less diverse
regions.
Figure 11S: Functional diversity maps using alternative scoring function III. The functional-diversity
score used to construct these maps is the 'coherence measure' of the functional GO-MF terms
in the neighborhood of a protein, as suggested in (4, 5). Thus, this score ranges between 0 and
100%, and higher values imply proteins centered at regions that are less functionally diverse
(since all their terms are unique to that region; see Methods for details). The maps on the left
column use the definition of vicinity Vsamp, and on the right Vfn. The correlation coefficients
between these measures and the straight-forward count of distinct GO-terms are listed in Table
Figure 12S: Functional density scores when sampling the full dataset based on sequence. The vicinity
of a protein is Vfd, and the diversity score is the number of distinct GO-MF terms in the annotations of
the proteins in the vicinity. The correlation coefficients of this diversity score and the straightforward
one using the full dataset are listed in Table 1S below. Even when using this very sparsely sampled
subset, we see the same core of high functional diversity.
Figure 13S: Functional diversity by SCOP class with vicinity Vfn and Vfd: We calculate the
separate histograms of functional diversity for each of the SCOP classes, and stack them one on
top of the other. Table 2S lists the exact proportions of each of the SCOP classes, among the
top 10%/20% most dense/functionally diverse domains. This supports Figure 3 in that the most
functionally diverse regions are populated by the alpha/beta domains.
Figure 14S: A map of Watson et al.'s (1) "known-function" dataset. The right panel shows the
functional diversity of our complete dataset, and the 90 proteins in Watson et al.'s [14] set are
shown as black dots. The left panel shows the same structures, with markers indicating whether
their function was correctly predicted using global structural similarity (triangles) or not
(circles). We see that function prediction for proteins in the periphery of structure space tends
to be more accurate than for proteins in the core. This statement is quantified in the
Results section.
Table 1S: Pearson correlation coefficients between the functional diversity score¹ on the full dataset and alternative scores/datasets

Vicinity     Using only domains      Functional diversity    Weighted score by       Weighted score by
definition   annotated with one      calculated using        1-10*(fraction of       1/distance from root
             function                GO-Slim                 term occurrence)
             (Figure 6S)             (Figure 7S)             (Figure 8S)             (Figure 9S)
Vsamp        0.8779                  0.8952                  0.9394                  0.9264
Vfn          0.8860                  0.9299                  0.9977                  0.9885
Vfd          0.9747                  0.9567                  0.9997                  0.9983

Vicinity     Coherence as a          Density function        Sampling using 95%      Sampling using 40%
definition   functional (non)                                sequence non            sequence non
             diversity score                                 redundant               redundant
             (Figure 10S)            (Figure 1)              (Figure 11S)            (Figure 11S)
Vsamp        -0.7934                 0.4306                  0.6141                  0.6468
Vfn          -0.8117                 0.0926                  0.2969                  0.3423
Vfd          -0.3030                 0.7320                  0.8641                  0.8881
Table 2S: SCOP class composition of the most functionally diverse domains

Data set    Vicinity     Within   % all    % all    % alpha /   % alpha +   % others
            definition   top      alpha    beta     beta        beta
Full        Vsamp        10%      1.99     0.10     78.06       15.19       4.65
                         20%      2.93     0.20     76.45       15.55       4.87
Full        Vfn          10%      2.89     1.04     72.35       19.62       4.10
                         20%      3.56     1.35     70.52       19.84       4.73
Full        Vfd          10%      1.52     0.13     81.84       12.15       4.36
                         20%      4.18     0.15     74.93       15.73       5.02
NR (95%)    Vfd          10%      4.16     0        79.95       10.76       5.13
                         20%      19.01    0.12     63.40       11.92       5.55
NR (40%)    Vfd          10%      8.54     0        75.61       9.76        6.10
                         20%      20.98    0.20     60.49       11.81       6.52
¹ The functional diversity score of a protein domain is the number of distinct GO-MF terms annotating the set of domains that are in the vicinity of that domain.
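The score defined in this footnote can be sketched for two kinds of vicinity: a fixed number of nearest neighbors (in the spirit of Vfn) and a fixed radius (in the spirit of Vfd). This is a brute-force illustration on toy data, not the authors' implementation: the function name, the placeholder term labels, and the choice to count a domain as part of its own vicinity are assumptions.

```python
import numpy as np

def diversity_scores(coords, annotations, k=None, radius=None):
    """Count the distinct terms annotating the domains in each vicinity.

    coords: n x 3 array of map locations; annotations: list of n term sets.
    Give either k (number of nearest neighbors) or radius (distance cutoff).
    """
    # All pairwise Euclidean distances (fine for small n; brute force).
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    scores = []
    for i in range(len(coords)):
        if k is not None:
            neighbors = np.argsort(dists[i])[:k]       # includes domain i itself
        else:
            neighbors = np.flatnonzero(dists[i] <= radius)
        terms = set()
        for j in neighbors:
            terms.update(annotations[j])
        scores.append(len(terms))
    return scores

# Toy map: three nearby domains and one distant domain.
coords = np.array([[0.0, 0.0, 0.0],
                   [0.001, 0.0, 0.0],
                   [0.003, 0.0, 0.0],
                   [1.0, 1.0, 1.0]])
# Placeholder labels standing in for GO-MF terms.
annotations = [{"term_a"}, {"term_b"}, {"term_a", "term_c"}, {"term_d"}]

by_radius = diversity_scores(coords, annotations, radius=0.005)
by_neighbors = diversity_scores(coords, annotations, k=2)
print(by_radius, by_neighbors)
```

For a dataset of tens of thousands of domains, the full pairwise distance matrix would be replaced by a spatial index, but the counting step stays the same.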
Table 3S: SCOP folds that lie in the functionally diverse core

Vfd                                Vfn                                Vsamp
Top 20 means     Top 20 medians    Top 20 means     Top 20 medians    Top 20 means     Top 20 medians
Fold    Mean     Fold    Median    Fold    Mean     Fold    Median    Fold    Mean     Fold    Median
        score            score             score            score             score            score
c.117   152.5    c.117   155.5     c.24    45.2     c.88    46        c.117   73.7     c.117   73
c.42    148.5    c.74    151.5     c.88    44.8     c.24    46        c.42    71.2     c.74    72.5
c.14    133.8    c.42    151       c.6     43.9     c.74    44        c.74    69.9     c.42    72
c.74    132.8    c.14    150       c.69    43.6     c.69    44        c.67    68.2     c.67    69
c.6     128.9    c.24    135       d.165   42.6     c.6     44        c.24    68.0     c.24    69
c.24    128.1    c.6     133       c.56    42.3     a.137   43.5      c.6     67.4     d.95    68
c.67    121.8    c.93    131       c.66    41.7     c.56    43        c.14    67.1     c.36    68
c.93    118.1    d.95    122       c.74    41.6     d.165   42        c.36    66.6     c.14    68
c.36    110.6    c.67    121       c.117   41.5     c.93    42        d.174   66.1     c.6     68
d.174   107.8    d.96    115.5     c.41    41.5     c.66    42        e.26    65.8     c.69    67
c.1     107.4    c.36    114       c.42    41.3     c.41    42        c.93    65.7     e.26    66
d.95    107.3    c.1     113       c.14    41.2     c.23    42        c.69    65.3     d.174   66
c.60    105.8    d.174   112       c.23    41.1     c.14    42        c.79    64.7     c.93    66
c.79    103.8    c.69    111       c.26    40.9     d.144   41        c.60    64.7     c.56    66
c.69    103.6    c.60    111       c.45    40.7     c.117   41        c.1     64.6     c.1     66
c.56    100.1    c.56    106       a.137   40.7     c.61    41        c.7     64.4     d.144   65
c.7     99.2     c.80    102       c.53    40.4     c.53    41        c.39    64.3     d.96    65
d.96    97.5     c.79    101       c.60    40.4     c.45    41        c.56    63.7     c.79    65
d.144   97.0     c.7     101       d.144   40.3     c.42    41        d.144   63.1     c.60    65
c.39    95.8     d.144   100       c.61    40.2     c.26    41        c.72    62.4     c.7     65