Evolution of the Rossmann fold through insertions
Spencer Bliven
Bourne Journal Club. Sept. 10, 2013
PNAS, 110(36), E3381–7. doi:10.1073/pnas.1305519110
Consequences of domain insertion on sequence-structure divergence in a superfold
Chetanya Pandya (a,1), Shoshana Brown (b,1), Ursula Pieper (b), Andrej Sali (b,c,d), Debra Dunaway-Mariano (e), Patricia C. Babbitt (b,c,d), Yu Xia (a,f,g), and Karen N. Allen (f,2)

Journal Club 2013-09-10: Pandya et al


DESCRIPTION

Journal club presentation on: Pandya, C., Brown, S., Pieper, U., Šali, A., Dunaway-Mariano, D., Babbitt, P. C., et al. (2013). Consequences of domain insertion on sequence-structure divergence in a superfold. Proceedings of the National Academy of Sciences of the United States of America, 110(36), E3381–7. doi:10.1073/pnas.1305519110

Citation preview

Page 1: Journal Club 2013-09-10: Pandya et al

Evolution of the Rossmann fold through insertions

Spencer Bliven Bourne Journal Club. Sept. 10, 2013

PNAS, 110(36), E3381–7. doi:10.1073/pnas.1305519110

Consequences of domain insertion on sequence-structure divergence in a superfold

Chetanya Pandya (a,1), Shoshana Brown (b,1), Ursula Pieper (b), Andrej Sali (b,c,d), Debra Dunaway-Mariano (e), Patricia C. Babbitt (b,c,d), Yu Xia (a,f,g), and Karen N. Allen (f,2)

(a) Bioinformatics Graduate Program and (f) Department of Chemistry, Boston University, Boston, MA 02215; (b) Department of Bioengineering and Therapeutic Sciences, (c) Department of Pharmaceutical Chemistry, School of Pharmacy, and (d) California Institute for Quantitative Biosciences, University of California, San Francisco, CA 94158-2330; (e) Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, NM 87131; and (g) Department of Bioengineering, Faculty of Engineering, McGill University, Montreal, QC, Canada H3A 0C3

Edited* by Gregory A. Petsko, Brandeis University, Waltham, MA, and approved July 27, 2013 (received for review March 21, 2013)

Although the universe of protein structures is vast, these innumerable structures can be categorized into a finite number of folds. New functions commonly evolve by elaboration of existing scaffolds, for example, via domain insertions. Thus, understanding structural diversity of a protein fold evolving via domain insertions is a fundamental challenge. The haloalkanoic dehalogenase superfamily serves as an excellent model system wherein a variable cap domain accessorizes the ubiquitous Rossmann-fold core domain. Here, we determine the impact of the cap-domain insertion on the sequence and structure divergence of the core domain. Through quantitative analysis on a unique dataset of 154 core-domain-only and cap-domain-only structures, basic principles of their evolution have been uncovered. The relationship between sequence and structure divergence of the core domain is shown to be monotonic and independent of the corresponding type of domain insert, reflecting the robustness of the Rossmann fold to mutation. However, core domains with the same cap type share greater similarity at the sequence and structure levels, suggesting interplay between the cap and core domains. Notably, results reveal that the variance in structure maps to α-helices flanking the central β-sheet and not to the domain–domain interface. Collectively, these results hint at intramolecular coevolution where the fold diverges differentially in the context of an accessory domain, a feature that might also apply to other multidomain superfamilies.

directed evolution | phosphoryl transferase | protein evolution | structural bioinformatics | HAD superfamily

The universe of protein structures is vast and diverse, yet these innumerable structures can be categorized into a finite number of folds (1). Ideally, the protein fold has a robust yet evolvable architecture to deliver chemistry, bind interaction partners, or provide scaffolding. A popular strategy for the acquisition of new function(s) is the topological alteration of the fold to provide a new evolutionary platform. More frequently, existing and stable scaffolds are elaborated to attain diversity that is due to accumulation of stochastic, independent, and near-neutral mutations in the protein sequence. In a large number of cases, the expansion of functional space has been achieved by the tandem fusion of two or three domains to form evolutionary modules known as supradomains (2). An analysis of catalytic domains fused to the nucleotide-binding Rossmann domain has revealed that the sequential order of their connections is conserved because each pairing arose from a single recombination event (3). Another common structural embellishment is that of domain insertion(s) into existing folds (4), a strategy that is ubiquitous in all structural classes, i.e., all α, all β, α + β, and α/β (5). For example, members of the A, B, and Y DNA polymerase superfamilies, Rab geranylgeranyl transferase superfamily, and alcohol dehydrogenase superfamily have inserted different domains into the native fold to fine tune their cellular functions (6–8). The analysis of such noncontiguous domain organization has been facilitated by the availability of structures bearing insertions of domains that also occur as independent folds. It has been estimated that 9% of domain combinations observed in protein-structure databases are insertions (5). However, the way in which the sequence–structure relationship changes within a protein fold in the context of such domain insertions has yet to be fully understood. In this study, we assess how the insertion of an accessory domain affects the sequence–structure relationship of the Rossmann fold, a superfold used by at least 10 different protein superfamilies (9).

Function-driven changes come with their own costs: most molecular modifications of proteins tend to be thermodynamically destabilizing (10). Although long hypothesized (11), it has been shown only recently that the stability of a fold promotes evolvability by allowing a high degree of structural plasticity (12). As a consequence, protein folds follow a power-law distribution where a few intrinsically stable folds, referred to as superfolds, have numerous members, and a multitude of folds have few members (9). Due to this interplay between stability and evolvability, it has been suggested that superfolds are compatible with a much larger set of sequences than other folds (13). This proposal raises the question of how protein sequence and structural diversity are related to one another. Pioneering work by Chothia and Lesk (14) illustrated that structural similarity is correlated with sequence similarity. Although the 3D structure retains the common fold during neutral drift, it undergoes subtle changes as sequence diverges, mainly due to packing modifications and backbone conformational changes. In a focused study, Halaby et al. (15) have shown that sequence diverges to a greater extent than structure in the Ig fold. More recently, Panchenko and co-

Significance

Here, we determine the impact of large-domain insertions on the sequence and structure divergence of a ubiquitous protein scaffold (superfold). By performing quantitative analysis on a distinctive dataset of >150 protein structures, unique protein-design principles have been uncovered. Our work suggests that superfolds are tolerant to relatively large domain insertions when followed by accommodating mutations in the scaffold. This structural robustness may facilitate the development of directed evolution technologies that incorporate domains into existing scaffolds.

Author contributions: C.P., S.B., D.D.-M., P.C.B., Y.X., and K.N.A. designed research; C.P., S.B., and U.P. performed research; C.P., S.B., A.S., D.D.-M., P.C.B., Y.X., and K.N.A. analyzed data; and C.P., Y.X., and K.N.A. wrote the paper.

The authors declare no conflict of interest.

*This Direct Submission article had a prearranged editor.

Freely available online through the PNAS open access option.

(1) C.P. and S.B. contributed equally to this work.

(2) To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1305519110/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1305519110 PNAS | Published online August 19, 2013 | E3381–E3387

BIOPHYSICS AND COMPUTATIONAL BIOLOGY | PNAS PLUS

Page 2: Journal Club 2013-09-10: Pandya et al

Main Findings
- Analyzed insertions in the haloalkanoic dehalogenase superfamily (HADSF)
- Look at the effect of inserted domains on the core structure
- Use structural similarity networks to analyze complex relationships within the HADSF family
- Suggest properties of HADSF foldspace and how it was shaped by evolution

Page 3: Journal Club 2013-09-10: Pandya et al

Rossmann superfold

- Robust to sequence changes (<10% ID)
- HADSF superfamily contains >79,000 sequences

Phil McFadden, http://blogs.oregonstate.edu/psquared/2012/04/16/topology-in-2d-and-3d-the-rossmann-fold/

Page 4: Journal Club 2013-09-10: Pandya et al

HADSF structural classification

Phil McFadden, http://blogs.oregonstate.edu/psquared/2012/04/16/topology-in-2d-and-3d-the-rossmann-fold/

[Fig. S1: example structures 1LTQ, 2HSZ, 2C4N, 1L6R; panel labels: Cap, Core, C1, C2, C1+C2]

Page 5: Journal Club 2013-09-10: Pandya et al

Structural coverage (Figure S2)
- Nodes: 40% sequence clusters
- Edges: BLAST E-val < 1e-20
- >79,000 sequences
- 154 experimental structures (0.25%)
- ~22% coverage with modeling
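The network construction on this slide (nodes are sequence clusters, an edge is drawn when the BLAST E-value falls below 1e-20) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the cluster names and E-values are invented, and `build_network` and `connected_components` are hypothetical helper names.

```python
E_VALUE_CUTOFF = 1e-20  # edge threshold quoted on the slide


def build_network(pairwise_evalues, cutoff=E_VALUE_CUTOFF):
    """Build an undirected adjacency map, keeping edges with E-value < cutoff."""
    adjacency = {}
    for (a, b), evalue in pairwise_evalues.items():
        # Register both nodes so isolated clusters still appear in the network.
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if evalue < cutoff:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Group nodes that are reachable from one another (depth-first search)."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Invented E-values between hypothetical sequence clusters.
example_evalues = {
    ("cluster_A", "cluster_B"): 1e-35,
    ("cluster_B", "cluster_C"): 1e-22,
    ("cluster_C", "cluster_D"): 1e-5,  # too weak: no edge is drawn
}
network = build_network(example_evalues)
components = connected_components(network)
```

With these made-up values, clusters A, B, and C end up in one connected component and cluster D stays isolated, which is how a stricter or looser cutoff changes the apparent grouping.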

Page 6: Journal Club 2013-09-10: Pandya et al

Structure/Sequence relations are similar for all types

[Figure S6, Figure 1B, Figure 2; axis label: fTMscore]

Page 7: Journal Club 2013-09-10: Pandya et al

Structural Clustering

By Cap By Core

Fig 3

Page 8: Journal Club 2013-09-10: Pandya et al

Cap-Core Correlation

Fig 4

[Axis label: Cap domain structural similarity (fTMscore)]

Type       Spearman  P-val
Combined   0.47      <1e-10
Same       0.75      <1e-10
Different  -0.01     0.35

Figure S11
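For readers unfamiliar with the statistic in the table, a Spearman correlation is the Pearson correlation of the ranks. The sketch below assumes the correlation is taken between pairwise core-domain and cap-domain similarity scores; the numbers are invented, and the function names are illustrative rather than taken from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented pairwise similarity scores, one entry per structure pair.
core_similarity = [0.55, 0.60, 0.72, 0.80, 0.91]
cap_similarity = [0.30, 0.42, 0.40, 0.75, 0.88]
rho = spearman(core_similarity, cap_similarity)
```

Because only ranks matter, the statistic captures any monotonic relationship, not just a linear one, which suits similarity scores whose scale is not guaranteed to be linear.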

Page 9: Journal Club 2013-09-10: Pandya et al

PPCA Analysis
- Use SALIGN for multiple alignment (TMalign/Staccato not shown)
- Small intrinsic dimensionality compared to ~6000 DOF

Fig 5
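As background for the "intrinsic dimensionality" bullet, and assuming PPCA here means probabilistic PCA in the standard Tipping and Bishop formulation (the slide does not spell this out), the model and its maximum-likelihood solution can be written as below. The symbols d (number of observed coordinates, on the order of the ~6000 degrees of freedom mentioned), q (number of retained components), and the eigenvalues λ_j of the sample covariance are notation introduced for this sketch.

```latex
% Generative model: d observed coordinates explained by q << d latent factors
x = W z + \mu + \epsilon, \qquad
z \sim \mathcal{N}(0, I_q), \qquad
\epsilon \sim \mathcal{N}(0, \sigma^2 I_d)

% Maximum-likelihood estimates from the sample covariance eigendecomposition,
% with eigenvalues \lambda_1 \ge \dots \ge \lambda_d, leading eigenvectors U_q,
% \Lambda_q = \mathrm{diag}(\lambda_1, \dots, \lambda_q), and R an arbitrary rotation
\sigma^2_{\mathrm{ML}} = \frac{1}{d - q} \sum_{j = q + 1}^{d} \lambda_j, \qquad
W_{\mathrm{ML}} = U_q \left( \Lambda_q - \sigma^2_{\mathrm{ML}} I_q \right)^{1/2} R

% Fraction of total variance captured by the first q components
\frac{\sum_{j = 1}^{q} \lambda_j}{\sum_{j = 1}^{d} \lambda_j}
```

Under this reading, a small intrinsic dimensionality means that a small q already captures most of the variance.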

Page 10: Journal Club 2013-09-10: Pandya et al

Positional Variance
- Most variance in the core is not near the cap interface
- Catalysis takes place at cap/core interface
- "Breathing motion" correlates with cap domain

Fig 6

2HSZ (network center)

(cap)
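One simple way to think about per-position variance is the mean squared deviation of each aligned residue from its average position. The sketch below is a simplified stand-in, not the paper's PPCA-based analysis: it assumes the structures are already superposed and share a gap-free alignment, and the coordinates are invented.

```python
def positional_variance(structures):
    """Mean squared deviation from the mean position, per aligned residue.

    `structures` is a list of structures; each structure is a list of
    (x, y, z) coordinates, one per aligned position, all the same length.
    """
    n = len(structures)
    length = len(structures[0])
    variances = []
    for pos in range(length):
        coords = [structure[pos] for structure in structures]
        # Mean coordinate of this position across all structures.
        mean = [sum(c[axis] for c in coords) / n for axis in range(3)]
        # Average squared distance of each structure from that mean.
        msd = sum(
            sum((c[axis] - mean[axis]) ** 2 for axis in range(3))
            for c in coords
        ) / n
        variances.append(msd)
    return variances


# Two invented, already-superposed structures with two aligned positions.
toy_structures = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
variance_profile = positional_variance(toy_structures)
```

In the toy data the first position is identical in both structures and the second differs, so only the second contributes variance. Mapping such a profile onto a reference structure is what lets one ask whether the variable positions sit at the cap interface or elsewhere.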

Page 11: Journal Club 2013-09-10: Pandya et al

Conclusions
- Sequence-structure relationship is robust to large insertions
- Cap insertions determine structural changes in the core
- Variation isn't at the core-cap interface
- Functional annotation difficult
  - Extensive divergence
  - Catalytic site at domain-domain interface

Page 12: Journal Club 2013-09-10: Pandya et al

Hypotheses about fold space
- Adding a cap limits accessible foldspace
  - Accounts for breadth of C0 variability
  - Could also be explained by cap loss, but only one case known
- Why do cap and core covary?
  1. Coevolution after the insertion (more probable)
     - LUCA thought to have 5 HAD members, one from each cap type
     - Small cap inserted into core pre-LUCA, then underwent intradomain elaboration (Burroughs 2006)
  2. Independent divergence of cap and core
     - Several cap types evolved similar catalytic activity, so similar selective pressures, yet different structural divergence is observed

Page 13: Journal Club 2013-09-10: Pandya et al

Further Questions
- Do conclusions generalize beyond HADSF?
- Can we rationally design multidomain proteins based on Rossmann scaffolds?
- How can we do functional prediction on such divergent families?
- Can a similar framework be used to interpret the impact of protodomain insertions/deletions/rearrangements?

Page 14: Journal Club 2013-09-10: Pandya et al
Page 15: Journal Club 2013-09-10: Pandya et al

Sequence Similarity

Figure S5

Page 16: Journal Club 2013-09-10: Pandya et al

PPCA with TMalign/Staccato

Fig S7

Page 17: Journal Club 2013-09-10: Pandya et al

First three PPCA components

Page 18: Journal Club 2013-09-10: Pandya et al

License
- Unless otherwise indicated, figures are from:
  Pandya, C., Brown, S., Pieper, U., Šali, A., Dunaway-Mariano, D., Babbitt, P. C., et al. (2013). Consequences of domain insertion on sequence-structure divergence in a superfold. Proceedings of the National Academy of Sciences of the United States of America, 110(36), E3381–7. doi:10.1073/pnas.1305519110
  and are licensed for noncommercial use.
  http://www.pnas.org/site/misc/authorlicense.pdf
- This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.