
Short fast history of protein design

Site-directed mutagenesis -- protein engineering (J. Wells, 1980's)

Coiled coils, helix bundles (W. DeGrado, 1980's-90's)

Extreme protein stabilization (S. Mayo, 1990's)

Binding pocket design (H. Hellinga, 2000)

New fold design (B. Kuhlman, 2002-4)

Protein-protein interface design (J. Gray, 2004)

Experimental (non-computational) approaches:

• in vitro evolution

• phage display

**Other names in protein design: Hill, Vriend, Regan, D. Baker, Richardson, Dunbrack, Choma, several more.


The goal of sequence design

Given a desired structure, find an amino acid sequence that folds to that structure.

MIKYGTKIYRINSDNSGKJHGCKAHNEEEGHA

[Figure: arrows labeled "design" and "folding"]

To do this, we must assign an energy to each possible sequence.


Theoretical complexity of sequence design

To design THE OPTIMAL sequence, we need the best amino acid, and its best rotamer, at every position. We can treat each position as having one of 193 possible rotamers: 191 rotamers in the Richardson library, plus Gly and Ala (which have no rotamers).

How many possible sets of rotamers are there for a protein of length 100?

$193^{100} \approx 3.6 \times 10^{228}$

DEE reduces the complexity of sequence design to about $(193L)^2 \approx 3.7 \times 10^{8}$ for $L = 100$.
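The two counts above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names are illustrative and do not come from the slides.

```python
import math


def exhaustive_count(rotamers_per_position: int, length: int) -> int:
    """Number of rotamer sets if every combination is enumerated."""
    return rotamers_per_position ** length


def dee_estimate(rotamers_per_position: int, length: int) -> int:
    """The slide's rough DEE cost estimate, (rotamers * L)^2."""
    return (rotamers_per_position * length) ** 2


full = exhaustive_count(193, 100)
print(f"exhaustive search: about 10^{math.log10(full):.1f} rotamer sets")
print(f"DEE estimate:      {dee_estimate(193, 100):.1e} comparisons")
```

Python integers are arbitrary precision, so `193 ** 100` is computed exactly and only the printed summary is rounded.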


Sequence space maps to structure space

...as many-to-one.

This means that there is a lot of potential for "slop" in a sequence design. Moderately big sequence changes are possible, and the sequence can still fold to the same general structure.

[Figure: several sequence families mapping to a single fold]

Good news for protein designers


Dead end elimination theorem

$E(i_r) + \sum_{j \ne i} \min_s E(i_r, j_s) > E(i_t) + \sum_{j \ne i} \max_s E(i_t, j_s)$

This can be translated into plain English as follows:

If the "worst case scenario" for t is better than the "best case scenario" for r, then you always choose t over r.

reminder


DEE algorithm

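Worksheet: self energies E(r) (first row) and pairwise energies E(r1,r2) for residues 1, 2, 3 with rotamers a, b, c. Rows are r1, columns are r2.

            1a   1b   1c    2a   2b   2c    3a   3b   3c
  E(r)       0    0    5     0    0    0     0    0   10
  1a         .    .    .    -1    1    1    -2    2    5
  1b         .    .    .     3    5    1     0    5   -1
  1c         .    .    .     5    5   -1     0    0    0
  2a        -1    3    5     .    .    .     0    0    1
  2b         1    5    5     .    .    .    12    5    0
  2c         1    1   -1     .    .    .     4    3    0
  3a        -2    0    0     0   12    4     .    .    .
  3b         2    5    0     0    5    3     .    .    .
  3c         5   -1    0     1    0    0     .    .    .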

Find two columns (rotamers) within the same residue, where one is always better than the other. Eliminate the rotamer that can always be beaten. Repeat until only one rotamer per residue remains.
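The elimination loop can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: it applies the min/max criterion from the dead end elimination theorem, and the energies in `e_self` and `e_pair` are made-up toy values (three positions, two rotamers each), chosen so that removing one rotamer makes the next elimination possible.

```python
def dee_eliminate(e_self, e_pair):
    """Apply the DEE criterion repeatedly; return the surviving rotamers per position."""
    n = len(e_self)
    alive = [set(range(len(energies))) for energies in e_self]

    def pair(i, r, j, s):
        # e_pair stores each pair of positions once, keyed (low, high).
        return e_pair[(i, j)][r][s] if i < j else e_pair[(j, i)][s][r]

    def bound(i, r, pick):
        # pick=min gives the best case for rotamer r, pick=max the worst case.
        return e_self[i][r] + sum(
            pick(pair(i, r, j, s) for s in alive[j]) for j in range(n) if j != i
        )

    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                # r is a dead end if its best case is worse than some t's worst case.
                if any(bound(i, r, min) > bound(i, t, max) for t in alive[i] if t != r):
                    alive[i].discard(r)
                    changed = True
    return alive


# Toy energies (hypothetical, for illustration only).
e_self = [[0, 5], [0, 0], [0, 0]]
e_pair = {
    (0, 1): [[-1, 2], [0, 0]],
    (0, 2): [[0, 1], [0, 0]],
    (1, 2): [[0, 1], [1, 3]],
}

print(dee_eliminate(e_self, e_pair))
```

With these toy values, rotamer 1 at position 1 only becomes eliminable after rotamer 1 at position 0 has been removed, which is why the criterion is re-applied until nothing changes.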


DEE with alternative sequences

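Worksheet: self energies E(r) (first row) and pairwise energies E(r1,r2). Residues 1 and 2 have rotamers a, b, c. Position 3 has two Asp rotamers (3Da, 3Db) and two Leu rotamers (3La, 3Lb). Rows are r1, columns are r2.

            1a   1b   1c    2a   2b   2c   3Da  3Db  3La  3Lb
  E(r)       0    0    5     0    0    0     0    0   12    2
  1a         .    .    .    -1    1    1    -2    2    5    2
  1b         .    .    .     3    5    1     0    5   -1    2
  1c         .    .    .     5    5   -1     0    0    0    3
  2a        -1    3    5     .    .    .     0    0    1    1
  2b         1    5    5     .    .    .    12    5    0   -3
  2c         1    1   -1     .    .    .     4    3    0    1
  3Da       -2    0    0     0   12    4     .    .    .    .
  3Db        2    5    0     0    5    3     .    .    .    .
  3La        5   -1    0     1    0    0     .    .    .    .
  3Lb        2    2    3     1   -3    1     .    .    .    .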

“Rotamers” within the DEE framework can have different atoms, i.e. they can be different amino acids. Using DEE, we choose the best set of rotamers, which also gives us the sequence of the lowest-energy structure. In the example, position 3 can be D (Asp) or L (Leu).
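Reading the sequence off a chosen rotamer set can be sketched as below. The `Rotamer` type and `sequence_of` function are illustrative names, and the amino acids at positions 1 and 2 (Ser, Val) are hypothetical placeholders, since the slide only specifies Asp or Leu at position 3.

```python
from typing import NamedTuple


class Rotamer(NamedTuple):
    amino_acid: str  # one-letter code
    label: str       # rotamer label within that amino acid


candidates = [
    # Position 1: amino acid is a placeholder (not given on the slide).
    [Rotamer("S", "a"), Rotamer("S", "b"), Rotamer("S", "c")],
    # Position 2: amino acid is a placeholder (not given on the slide).
    [Rotamer("V", "a"), Rotamer("V", "b"), Rotamer("V", "c")],
    # Position 3: Asp or Leu, two rotamers each, as in the example.
    [Rotamer("D", "a"), Rotamer("D", "b"), Rotamer("L", "a"), Rotamer("L", "b")],
]


def sequence_of(choice, candidates):
    """Translate one rotamer index per position into an amino acid sequence."""
    return "".join(position[r].amino_acid for position, r in zip(candidates, choice))


print(sequence_of((0, 0, 1), candidates))  # position 3 uses an Asp rotamer
print(sequence_of((0, 0, 3), candidates))  # position 3 uses a Leu rotamer
```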


Sequence design using DEE

•Selected residues (or all) are chosen for mutating.

•Selected (or all) amino acids are allowed at those positions.

•For the selected amino acids, all rotamers are considered.

Now "rotamer" comes to mean the amino acid identity and its conformation.

Since there are as many as 193 rotamers in the rotamer library for all amino acids, each selected position can have as many as 193 "rotamers."

If "fine grained" rotamers are used, this number may be much larger.
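As a rough sketch, the candidate list for one design position is the allowed amino acids expanded into their library rotamers. The `library` below is a stand-in with rotamer labels only; it is not the Richardson library, and `expand_candidates` is an illustrative name.

```python
# Hypothetical mini-library: amino acid -> list of rotamer labels.
# Gly and Ala get a single entry because they have no sidechain rotamers.
library = {
    "G": ["-"],
    "A": ["-"],
    "D": ["a", "b"],
    "L": ["a", "b"],
}


def expand_candidates(allowed_amino_acids, library):
    """Flatten (amino acid, rotamer) pairs into one list of DEE 'rotamers'."""
    return [(aa, rot) for aa in allowed_amino_acids for rot in library[aa]]


position3 = expand_candidates("DL", library)
print(len(position3), position3)
```

With a full library and all amino acids allowed, the same expansion would give the 193 candidates per position mentioned above.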


DEE with alternative sequences and ligands

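Worksheet: self energies E(r) (first row) and pairwise energies E(r1,r2). Position L is the ligand, with conformers La, Lb, Lc. Residue 2 has rotamers a, b, c. Position 3 has two Asp rotamers (3Da, 3Db) and two Leu rotamers (3La, 3Lb). Rows are r1, columns are r2.

            La   Lb   Lc    2a   2b   2c   3Da  3Db  3La  3Lb
  E(r)       0    0    5     0    0    0     0    0   12    2
  La         .    .    .    -1    1    1    -2    2    5    2
  Lb         .    .    .     3    5    1     0    5   -1    2
  Lc         .    .    .     5    5   -1     0    0    0    3
  2a        -1    3    5     .    .    .     0    0    1    1
  2b         1    5    5     .    .    .    12    5    0   -3
  2c         1    1   -1     .    .    .     4    3    0    1
  3Da       -2    0    0     0   12    4     .    .    .    .
  3Db        2    5    0     0    5    3     .    .    .    .
  3La        5   -1    0     1    0    0     .    .    .    .
  3Lb        2    2    3     1   -3    1     .    .    .    .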

Ligands can have multiple conformations and locations within the active site. In DEE, each position of the ligand is another “rotamer”, i.e. another row and column in the DEE matrix.

Ligand conformers. r1


Sidechain modeling

Given a backbone conformation and the sequence, can we predict the sidechain conformations?

Energy calculations are sensitive to small changes. So the wrong sidechain conformation will give the wrong energy.


Goal of sidechain modeling

Desmet et al., Nature 356, 539-542 (1992)

Given the sequence and only the backbone atom coordinates, accurately model the positions of the sidechains.

fine lines = true structure; thick lines = sidechain predictions using the method of Desmet et al.


Sidechain space is discrete, almost

A random sampling of Phenylalanine sidechains, when superimposed, falls into three classes: rotamers.

This simplifies the problem of sidechain modeling. All we have to do is select the right rotamers and we're close to the right answer.


What determines rotamers

[Figure: three Newman projections, each showing CG and two H on CB against N, C=O and H on CA, one projection per rotamer.]

"m" = -60° gauche; "t" = 180° anti/trans; "p" = +60° gauche

3-bond or 1-4 interactions define the preferred angles, but these may differ greatly in energy depending on the atom groups involved.
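The m/t/p naming can be illustrated with a small classifier that assigns a chi angle to the nearest staggered position. This is only a sketch: the function name and the bin boundaries at 0° and ±120° are assumptions made here, not definitions taken from a rotamer library.

```python
def chi_class(chi_degrees: float) -> str:
    """Return 'm', 'p' or 't' for the staggered position nearest to a chi angle."""
    # Wrap the angle into the range [-180, 180).
    chi = (chi_degrees + 180.0) % 360.0 - 180.0
    if -120.0 <= chi < 0.0:
        return "m"  # near -60 degrees (gauche)
    if 0.0 <= chi < 120.0:
        return "p"  # near +60 degrees (gauche)
    return "t"      # near 180 degrees (anti/trans)


for angle in (-65.0, 62.0, 178.0, -175.0, 300.0):
    print(angle, chi_class(angle))
```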


Rotamer Libraries

Rotamer libraries have been compiled by clustering the sidechains of each amino acid over the whole database. Each cluster is a representative conformation (or rotamer), and is represented in the library by the best sidechain angles (chi angles), the "centroid" angles, for that cluster.

Two commonly used rotamer libraries:

*Jane & David Richardson: http://kinemage.biochem.duke.edu/databases/rotamer.php

Roland Dunbrack: http://dunbrack.fccc.edu/bbdep/index.php

*rotamers of W on the previous page are from the Richardson library.


Dead end elimination theorem

•There is a global minimum energy conformation (GMEC), where each residue has a unique rotamer.

In other words: GMEC is the set of rotamers that has the lowest energy.

•Energy is a pairwise thing. Total energy can be broken down into pairwise interactions. Each atom is either fixed (backbone) or movable (sidechain).

fixed-fixed: E is a constant, = Etemplate

fixed-movable: E depends on rotamer, but is independent of other rotamers

movable-movable: E depends on rotamer, and depends on surrounding rotamers


Theoretical complexity of sidechain modeling

The Global Minimum Energy Conformation (GMEC) is one unique set of rotamers.

How many possible sets of rotamers are there?

$n_1 \times n_2 \times n_3 \times n_4 \times n_5 \times \dots \times n_L$

where $n_1$ is the number of rotamers for residue 1, and so on.

Estimated complexity for a protein of 100 residues, with an average of 5 rotamers per position: $5^{100} \approx 8 \times 10^{69}$

DEE reduces the complexity of the problem from $5^L$ to approximately $(5L)^2$.
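When residues have different numbers of rotamers, the count is a product over positions. A minimal sketch, with illustrative names, that reproduces the estimate for 100 residues with 5 rotamers each:

```python
import math


def count_rotamer_sets(rotamers_per_residue):
    """Number of distinct rotamer assignments: n1 * n2 * ... * nL."""
    return math.prod(rotamers_per_residue)


counts = [5] * 100                    # 100 residues, 5 rotamers each
total = count_rotamer_sets(counts)    # equals 5**100
dee_scale = sum(counts) ** 2          # (5L)^2 when every residue has 5 rotamers

print(f"all rotamer sets: about 10^{math.log10(total):.1f}")
print(f"DEE scale:        {dee_scale}")
```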


Dead end elimination theorem

•Each residue is numbered (i or j) and each residue has a set of rotamers (r, s or t). So, the notation ir means "choose rotamer r for position i".

•The total energy is the sum of the three components:

$E_{global} = E_{template} + \sum_i E(i_r) + \sum_i \sum_{j > i} E(i_r, j_s)$

(fixed-fixed) + (fixed-movable) + (movable-movable)

where r and s are any choice of rotamers.

NOTE: $E_{global} \ge E_{GMEC}$ for any choice of rotamers.
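The three-component sum can be written as a short function. This is a sketch under stated assumptions: the data layout (`e_self` as a list of per-position lists, `e_pair` as a dictionary keyed by position pairs) and all numeric values are hypothetical, chosen only to make the example runnable.

```python
from itertools import combinations


def total_energy(choice, e_template, e_self, e_pair):
    """E_global = E_template + sum_i E(i_r) + sum over pairs i<j of E(i_r, j_s)."""
    energy = e_template                      # fixed-fixed term (constant)
    for i, r in enumerate(choice):
        energy += e_self[i][r]               # fixed-movable terms
    for i, j in combinations(range(len(choice)), 2):
        energy += e_pair[(i, j)][choice[i]][choice[j]]  # movable-movable terms
    return energy


# Toy values (hypothetical): two positions, two rotamers each.
e_template = 10.0
e_self = [[0.0, 1.0], [0.5, 0.0]]
e_pair = {(0, 1): [[2.0, -1.0], [0.0, 3.0]]}

print(total_energy((0, 1), e_template, e_self, e_pair))
print(total_energy((1, 0), e_template, e_self, e_pair))
```

Each pair of positions is counted once, so `e_pair` needs only one entry per pair.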


Dead end elimination theorem

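•If $i_g$ is in the GMEC and $i_r$ is not, then we can separate the terms that contain $i_g$ or $i_r$ and re-write the inequality.

$E_{GMEC} = E_{template} + E(i_g) + \sum_{j \ne i} E(i_g, j_g) + \sum_{j \ne i} E(j_g) + \sum_{j \ne i} \sum_{k \ne i,\, k > j} E(j_g, k_g)$

...is less than...

$E_{notGMEC} = E_{template} + E(i_r) + \sum_{j \ne i} E(i_r, j_g) + \sum_{j \ne i} E(j_g) + \sum_{j \ne i} \sum_{k \ne i,\, k > j} E(j_g, k_g)$

Canceling the terms common to both expressions, we get:

$E(i_r) + \sum_{j \ne i} E(i_r, j_g) > E(i_g) + \sum_{j \ne i} E(i_g, j_g)$

So, if we find two rotamers $i_r$ and $i_t$, and:

$E(i_r) + \sum_{j \ne i} \min_s E(i_r, j_s) > E(i_t) + \sum_{j \ne i} \max_s E(i_t, j_s)$

Then $i_r$ cannot possibly be in the GMEC, because replacing $i_r$ with $i_t$ lowers the energy whatever rotamers the other residues adopt.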


Dead end elimination theorem

$E(i_r) + \sum_{j \ne i} \min_s E(i_r, j_s) > E(i_t) + \sum_{j \ne i} \max_s E(i_t, j_s)$

This can be translated into plain English as follows:

If the "worst case scenario" for rotamer t is better than the "best case scenario" for rotamer r, then you can eliminate r.


Exercise: Dead End Elimination

Using the DEE worksheet:

(1) Find a rotamer that satisfies the DEE theorem.

(2) Eliminate it.

(3) Repeat until each residue has only one rotamer.

What is the final GMEC energy?


DEE exercise

[Figure: three sidechains, labeled 1, 2 and 3, each with rotamers a, b and c.]

Three sidechains, each with three rotamers. Therefore, there are 3x3x3=27 ways to arrange the sidechains.

• Each rotamer has an energy E(r), which is the non-bonded energy between sidechain and template.

• Each pair of rotamers has an interaction energy E(r1,r2), which is the non-bonded energy between sidechains.
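With only 27 arrangements, the DEE result can be cross-checked by brute force. The sketch below enumerates every arrangement and keeps the lowest-energy one. The energies are transcribed from the exercise worksheet under two assumptions: the three 3x3 blocks are read as the residue pairs (1,2), (1,3) and (2,3), and the self energy of rotamer 3c is taken as 12 (the worksheet shows both 12 and 10 for it). The template energy is a constant and is left out.

```python
from itertools import product

ROTAMERS = "abc"

# Self energies E(r), indexed by residue then rotamer (a, b, c).
E_SELF = {1: [0, 0, 5], 2: [0, 0, 0], 3: [0, 0, 12]}

# Pairwise energies E(r1, r2): rows are rotamers of the first residue,
# columns are rotamers of the second residue (assumed block layout).
E_PAIR = {
    (1, 2): [[-1, 1, 1], [3, 5, 1], [5, 5, -1]],
    (1, 3): [[-2, 2, 5], [0, 5, -1], [0, 0, 0]],
    (2, 3): [[0, 0, 1], [12, 5, 0], [4, 3, 0]],
}


def energy(choice):
    """Sum of self and pairwise energies for one rotamer per residue."""
    r1, r2, r3 = choice
    return (
        E_SELF[1][r1] + E_SELF[2][r2] + E_SELF[3][r3]
        + E_PAIR[(1, 2)][r1][r2]
        + E_PAIR[(1, 3)][r1][r3]
        + E_PAIR[(2, 3)][r2][r3]
    )


all_choices = list(product(range(3), repeat=3))
best = min(all_choices, key=energy)
print(len(all_choices), "arrangements")
print("lowest energy:", energy(best), "at", [f"{i + 1}{ROTAMERS[r]}" for i, r in enumerate(best)])
```

Exhaustive enumeration only works because the problem is tiny; it is useful here as a check on a hand-worked DEE answer.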


DEE exercise

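Worksheet: self energies E(r) (first row) and pairwise energies E(r1,r2) for residues 1, 2, 3 with rotamers a, b, c. Rows are r1, columns are r2.

            1a   1b   1c    2a   2b   2c    3a   3b   3c
  E(r)       0    0    5     0    0    0     0    0   12
  1a         .    .    .    -1    1    1    -2    2    5
  1b         .    .    .     3    5    1     0    5   -1
  1c         .    .    .     5    5   -1     0    0    0
  2a        -1    3    5     .    .    .     0    0    1
  2b         1    5    5     .    .    .    12    5    0
  2c         1    1   -1     .    .    .     4    3    0
  3a        -2    0    0     0   12    4     .    .    .
  3b         2    5    0     0    5    3     .    .    .
  3c         5   -1    0     1    0    0     .    .    .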


DEE exercise: instructions

(1) The best (worst) energies are found using the worksheet: add E(r1) to the sum, over the other residues, of the lowest (highest) E(r1,r2) among rotamers that have not been previously eliminated.

(2) There are 9 possible DEE comparisons to make: 1a versus 1b, 1a versus 1c, 1b versus 1c, 2a versus 2b, etc. For each comparison, find the minimum and maximum energy choices of the other rotamers. If the maximum energy of r1 is less than the minimum energy of r2, eliminate r2.

(3) Scratch out the eliminated rotamer and repeat until one rotamer per position remains.

If the “best case scenario” for r1 is worse than the “worst case scenario” for r2 you can eliminate r1.
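Steps (1) and (2) can be sketched as two small functions: one computes the best-case and worst-case energy of a rotamer, the other lists every comparison that allows an elimination. The energies are transcribed from the worksheet under the assumption that its three 3x3 blocks are the residue pairs (1,2), (1,3) and (2,3), and that the self energy of 3c is 12 (the worksheet shows both 12 and 10). Function names are illustrative.

```python
ROT = "abc"
E_SELF = {1: [0, 0, 5], 2: [0, 0, 0], 3: [0, 0, 12]}
E_PAIR = {
    (1, 2): [[-1, 1, 1], [3, 5, 1], [5, 5, -1]],
    (1, 3): [[-2, 2, 5], [0, 5, -1], [0, 0, 0]],
    (2, 3): [[0, 0, 1], [12, 5, 0], [4, 3, 0]],
}


def pair(i, r, j, s):
    """E(i_r, j_s), looked up in whichever orientation the block is stored."""
    return E_PAIR[(i, j)][r][s] if i < j else E_PAIR[(j, i)][s][r]


def bounds(i, r, alive):
    """Return (best case, worst case) energy of rotamer r at residue i."""
    others = [j for j in E_SELF if j != i]
    best = E_SELF[i][r] + sum(min(pair(i, r, j, s) for s in alive[j]) for j in others)
    worst = E_SELF[i][r] + sum(max(pair(i, r, j, s) for s in alive[j]) for j in others)
    return best, worst


def eliminable(alive):
    """List (loser, winner) pairs where the loser's best case exceeds the winner's worst case."""
    found = []
    for i in E_SELF:
        for r in alive[i]:
            for t in alive[i]:
                if r != t and bounds(i, r, alive)[0] > bounds(i, t, alive)[1]:
                    found.append((f"{i}{ROT[r]}", f"{i}{ROT[t]}"))
    return found


alive = {i: [0, 1, 2] for i in E_SELF}
first_pass = eliminable(alive)
print(first_pass)
```

After scratching out an eliminated rotamer, remove its index from `alive` and call `eliminable` again; the bounds tighten as the lists shrink.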
