Recursive domains in proteins
Teresa Przytycka, NCBI, NIH
Joint work with G. Rose & Raj Srinivasan, JHU
Domain: “Polypeptide chain (or a part of it) that can independently fold into a stable tertiary structure” (Branden & Tooze, Introduction to Protein Structure)
Two-domain protein.
Alpha helix
Beta strand
The 3D structure of a protein domain can be described as a compact arrangement of secondary structures.
These arrangements are far from random:
There are not many of them:
Proportion of "new folds" (light blue) and "old folds" (orange) for a given year. (fold = fold domain)
The PDB contains about 17,000 structures and fewer than 1,000 different folds.
Possible sources of restricted number of folds:
• Evolutionary history.
– Given enough time, would domains look “more random”?
• Existence of general restrictions/rules which render some (compact) arrangements of secondary structures non-feasible.
– Can real protein domains be seen as sentences in a language, which can be generated by an underlying grammar?
Can protein domains be described using a set of folding rules?
We restrict our attention to all-beta domains:
• they admit a variety of topologies
• they are difficult to predict from sequence
Understanding β-folds
• Patterns in β-sheets
– Richardson 1977
• Folding rules for β-sheets
– Zhang and Kim 2000
• Hydrogen bonding pattern
• Polypeptide chain seems to avoid “complications”
• Properties of β-sandwiches
– Woolfson D. N., Evans P. A., Hutchinson E. G., and Thornton J. M. 1993
Parallel anti-parallel mixed
“forbidden” crossed conformation
Expectations for good folding rules
• We need to look at fold properties that occur in non-homologous proteins.
• Preferably: they provide a model for the folding process.
Super-secondary structures as precursors of folding rules
• Super-secondary structure – frequently occurring arrangements of a small number of secondary structures
• The occurrence of super-secondary structures in unrelated families supports the possibility of their independent formation.
Example 1: Hairpin
-unit
Example 2: Greek key and suggested folding pathway for it
Folding pathway for Greek key proposed by Ptitsyn.
Pattern from a Greek vase
Two levels of folding rules:
• Primitive folding rules – based on super secondary structures
• Closure operation – allows for hierarchical application of the primitive rules
Super-secondary structures: primitive folding rules
Hairpin rule
Bridge
hairpin
Greek key
Indirect wind
Direct wind
Closure: composite rules
• Super-secondary structures are composed of secondary structures that are neighbors in the chain sequence.
• However, the presence of a super-secondary structure, like a hairpin, in a protein structure implies that residues that are non-consecutive become neighbors in space.
Closure - “short cut” in the sequence due to a folding rule
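The “short cut” idea can be illustrated with a small Python sketch. This is a toy model, not the authors' implementation: it assumes the chain is a list of units (each unit a list of strand numbers), that applying a rule merges two adjacent units, and that any strand in one unit counts as a neighbor of any strand in the next unit. The names `merge_units` and `neighbor_pairs` are hypothetical.

```python
from itertools import product


def merge_units(units, i):
    """Merge unit i with unit i+1, modelling a folding rule applied to them.

    Assumption: a rule may only join units that are adjacent in the
    current (possibly already shortened) chain.
    """
    if not 0 <= i < len(units) - 1:
        raise IndexError("unit i must have a successor in the chain")
    return units[:i] + [units[i] + units[i + 1]] + units[i + 2:]


def neighbor_pairs(units):
    """Strand pairs treated as neighbors: strands of consecutive units."""
    pairs = set()
    for left, right in zip(units, units[1:]):
        pairs.update(product(left, right))
    return pairs


# Four strands in chain order; each starts as its own unit.
chain = [[1], [2], [3], [4]]
before = neighbor_pairs(chain)

# A hairpin-like rule joins strands 2 and 3 into one unit.
after_units = merge_units(chain, 1)
after = neighbor_pairs(after_units)

# Pairs that became neighbors only because of the merge (the "short cut").
shortcuts = after - before
print(sorted(shortcuts))
```

In this toy model, strands 1 and 3 (and 2 and 4) are not consecutive in the chain but become neighbors once the hairpin is treated as a single unit, which is what lets a primitive rule be applied again at the next level.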
Example 1: applying folding rules to jelly roll
Recursive domains
A recursive domain is a part of a protein fold that can be generated using folding rules supported by the closure operation.
A protein that can be fully generated using folding rules has one recursive domain.
Examples
• Example 1
• Example 2
• Example 3
• Example 4
Graph theoretical tools and recursive domains
Fold graph:
Vertices – strands
Edges – two types:
Neighbor edges: directed edges between strands that are neighbors in the chain or via the closure operation.
Domain edges: edges between strands used in the same folding rule.
Recursive domains = connected components of the fold graph without neighbor edges.
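A minimal Python sketch of this definition, assuming the domain edges are already known (how they are derived from the folding rules is not shown here). Neighbor edges are simply left out, and a small union-find groups strands joined by domain edges. The function name `recursive_domains` and the example edges are hypothetical, not data from the talk.

```python
def recursive_domains(strands, domain_edges):
    """Group strands into connected components using domain edges only."""
    parent = {s: s for s in strands}

    def find(s):
        # Follow parent links to the root, compressing the path on the way.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in domain_edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for s in strands:
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())


# Hypothetical six-strand fold: strands 1-3 share folding rules,
# strands 5-6 share a rule, strand 4 takes part in none.
strands = [1, 2, 3, 4, 5, 6]
domain_edges = [(1, 2), (2, 3), (5, 6)]
domains = recursive_domains(strands, domain_edges)
print(domains)
```

Under these assumed edges the fold splits into three recursive domains; a fold whose strands all end up in one component would have a single recursive domain.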
Partition into recursive components for small (<=10 strands) proteins
Comparison with the partition for the computer-generated set of all possible 8-strand sandwiches
[Chart: recursive domains for proteins with at most 10 strands. x-axis: number of recursive domains (1 to 8); y-axis: number of folds (0 to 45).]
[Chart: distribution of recursive domains in all sandwich-like "folds". x-axis: number of recursive domains (1 to 8); y-axis: number of generated "folds" (0 to 3000).]
Can the rules generate all known folds?
Protein data Control
One recursive fold
Offenders
Hedgehog intein domain
Given a fold, is there a unique sequence of folding steps leading to it?
Usually no.
Usually there are alternative sequences of folding steps leading to the construction of the same domain.
Do such alternative folding sequences correspond to alternative folding pathways?
Are the rules complete?
Probably not.
e.g., for a propeller, each blade is in one recursive domain, but we do not have a rule that puts the blades together.
Nice… dog… walk
It is so nice outside. It would be nice to take
the dog for a walk!
Conclusions: We are getting some idea how things work...
• Protein folds can be described by simple folding rules.
• The folding rules capture at least some aspects of fold simplicity and regularity.
• The sequence of folding steps leading to a given fold is usually not unique.
• The folding rules generate protein-like structures.
Conclusions
Future directions
• Can folding rules guide fold prediction?
• Would the hierarchical description of a fold provided by folding rules be useful for fold classification/comparison?
• Adding statistical evaluation of a recursive domain.
Acknowledgments
George Rose
Raj Srinivasan
Rohit Pappu
Venk Murthy
NIH, K01 grant