Protein Structure Alignment

Human Myoglobin pdb:2mm1

Human Hemoglobin alpha-chain pdb:1jebA

Sequence id: 27%

Structural id: 90%

Another example:

G-Proteins: 1c1y:A, 1kk1:A6-200

Sequence id: 18%

Structural id: 72%

Transformations

Translation: x' = x + t

Translation and Rotation, i.e. Rigid Motion (Euclidean Trans.): x' = Rx + t

Translation, Rotation + Scaling: x' = s(Rx) + t

Inexact Alignment.

Simple case – two closely related proteins with the same number of amino acids.

Assume transformation T is given

Question: how to measure an alignment error?

Distance Functions

Two point sets: A={ai} i=1…n, B={bj} j=1…m

• Pairwise Correspondence:

(ak1,bt1) (ak2,bt2)… (akN,btN)

(1) Exact Matching: ||aki – bti|| = 0 for all i

(2) Bottleneck: max over i of ||aki – bti||

(3) RMSD (Root Mean Square Distance): sqrt( Σi ||aki – bti||^2 / N )
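The three distance functions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: points are 3-D tuples, the correspondence is given as a list of index pairs (k, t), and the function names (`pair_distances`, `bottleneck`, `rmsd`, `is_exact_match`) are made up for this example.

```python
import math

def pair_distances(A, B, pairs):
    """Distances ||a_k - b_t|| for every corresponding index pair (k, t)."""
    return [math.dist(A[k], B[t]) for k, t in pairs]

def is_exact_match(A, B, pairs, tol=0.0):
    """(1) Exact matching: every paired distance is zero (or below tol)."""
    return all(d <= tol for d in pair_distances(A, B, pairs))

def bottleneck(A, B, pairs):
    """(2) Bottleneck: the largest paired distance."""
    return max(pair_distances(A, B, pairs))

def rmsd(A, B, pairs):
    """(3) RMSD: square root of the mean squared paired distance."""
    d = pair_distances(A, B, pairs)
    return math.sqrt(sum(x * x for x in d) / len(d))

# Hypothetical toy data: B has one extra point that is not in the correspondence.
A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
B = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
pairs = [(0, 0), (1, 1), (2, 2)]

print(bottleneck(A, B, pairs))  # largest single deviation
print(rmsd(A, B, pairs))        # averaged deviation
```

The bottleneck value is driven by the single worst pair, while RMSD averages over all N pairs, so one outlier affects the two measures very differently.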

Correspondence is Unknown

Given two configurations of points in three-dimensional space, find those rotations and translations T of one of the point sets which produce “large” superimpositions of corresponding 3-D points.

Largest Common Point Set (LCP) problem

Given e>0 and two point sets A and B find a transformation T and equally sized subsets A’ (a subset of A) and B’ (a subset of B) of maximal cardinality such that dist(A’,T(B’)) ≤ e.

Bottleneck metric: optimal solution in O(n^32.5) (C. Ambuhl et al. 2000)

RMSD metric: open problem

A 3-D reference frame can be uniquely defined by the ordered vertices of a non-degenerate triangle.

(Figure: triangle with vertices p1, p2, p3.)
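One way to build such a frame is sketched below. The particular convention is an assumption of this example, not something fixed by the slides: origin at p1, x-axis along p1→p2, z-axis along the triangle normal, and y-axis completing a right-handed system. The helper names (`triangle_frame`, `to_local`) are hypothetical.

```python
import math

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit(u):
    n = math.sqrt(dot(u, u))
    return tuple(a / n for a in u)

def triangle_frame(p1, p2, p3, tol=1e-9):
    """Right-handed orthonormal frame from an ordered, non-degenerate triangle."""
    e1, e2 = sub(p2, p1), sub(p3, p1)
    normal = cross(e1, e2)
    if math.sqrt(dot(normal, normal)) <= tol:
        raise ValueError("degenerate triangle: vertices are (nearly) collinear")
    x = unit(e1)        # x-axis along the first edge
    z = unit(normal)    # z-axis perpendicular to the triangle plane
    y = cross(z, x)     # y-axis completes the right-handed system
    return p1, (x, y, z)

def to_local(point, frame):
    """Coordinates of a point in the given frame."""
    origin, axes = frame
    d = sub(point, origin)
    return tuple(dot(d, axis) for axis in axes)

frame = triangle_frame((1.0, 1.0, 1.0), (3.0, 1.0, 1.0), (1.0, 4.0, 1.0))
print(to_local((2.0, 2.0, 5.0), frame))
```

Because the vertices are ordered, swapping p2 and p3 yields a different frame; coordinates expressed in such a frame do not change when the whole point set is rotated and translated.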

Structure Alignment (Straightforward Algorithm)

• For each pair of triplets, one from each molecule, which define ‘almost’ congruent triangles, compute the rigid transformation that superimposes them.

• Count the number of aligned point pairs.

How? -> maximal bipartite matching (bottleneck metric)

• Complexity: O(n^3 m^3) * O(nm √(m+n)).
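The procedure above can be sketched as follows. This is a minimal illustration, not the implementation behind the slides: it compares the two point sets in triangle-defined local frames (equivalent to applying the superimposing rigid transformation), treats two points as aligned when they lie within a hypothetical threshold `eps`, and counts pairs with a simple augmenting-path bipartite matching rather than the faster matching algorithm implied by the O(nm √(m+n)) term.

```python
import itertools
import math

def sub(u, v): return tuple(a - b for a, b in zip(u, v))
def dot(u, v): return sum(a * b for a, b in zip(u, v))
def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])
def unit(u):
    n = math.sqrt(dot(u, u))
    return tuple(a / n for a in u)

def local_coords(points, p1, p2, p3):
    """Coordinates of all points in the frame defined by triangle (p1, p2, p3)."""
    x = unit(sub(p2, p1))
    z = unit(cross(sub(p2, p1), sub(p3, p1)))
    y = cross(z, x)
    return [tuple(dot(sub(p, p1), a) for a in (x, y, z)) for p in points]

def triangles(points, tol=1e-9):
    """All ordered, non-degenerate triplets with side lengths and local coordinates."""
    out = []
    for i, j, k in itertools.permutations(range(len(points)), 3):
        p1, p2, p3 = points[i], points[j], points[k]
        n = cross(sub(p2, p1), sub(p3, p1))
        if math.sqrt(dot(n, n)) > tol:
            sides = (math.dist(p1, p2), math.dist(p2, p3), math.dist(p3, p1))
            out.append((sides, local_coords(points, p1, p2, p3)))
    return out

def max_matching(adj, m):
    """Maximum bipartite matching size via augmenting paths (Kuhn's algorithm)."""
    match_b = [-1] * m
    def augment(i, seen):
        for j in adj[i]:
            if j not in seen:
                seen.add(j)
                if match_b[j] < 0 or augment(match_b[j], seen):
                    match_b[j] = i
                    return True
        return False
    return sum(1 for i in range(len(adj)) if augment(i, set()))

def best_alignment_size(A, B, eps):
    """Largest number of point pairs aligned within eps over all triangle pairs."""
    best = 0
    tri_b = triangles(B)
    for sides_a, loc_a in triangles(A):
        for sides_b, loc_b in tri_b:
            # Only 'almost' congruent triangles are worth superimposing.
            if any(abs(a - b) > eps for a, b in zip(sides_a, sides_b)):
                continue
            adj = [[j for j, q in enumerate(loc_b) if math.dist(p, q) <= eps]
                   for p in loc_a]
            best = max(best, max_matching(adj, len(B)))
    return best

# Hypothetical test data: B is A rotated 90 degrees about z, shifted, plus an outlier.
A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
B = [(-y + 10.0, x - 2.0, z + 3.0) for x, y, z in A] + [(50.0, 50.0, 50.0)]
print(best_alignment_size(A, B, eps=1e-6))
```

The double loop over ordered triplets gives the O(n^3 m^3) factor, and the matching step is executed once per surviving triangle pair, which is why this approach is only practical for small inputs.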

Can we say something about the quality of the final solution?

YES!

If there is an LCP of size L with error e, then the alignment method detects an LCP of size ≥ L with error 8e (M.T. Goodrich et al. 1994).

Superposition - best least squares (RMSD – Root Mean Square Deviation)

Given two sets of 3-D points: P={pi}, Q={qi}, i=1,…,n;

rmsd(P,Q) = sqrt( Σi |pi - qi|^2 / n )

Find a 3-D rigid transformation T* such that:

rmsd( T*(P), Q ) = minT sqrt( Σi |T(pi) - qi|^2 / n )

A closed form solution exists for this task. It can be computed in O(n) time.
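One well-known closed-form solution is the SVD-based method usually called the Kabsch algorithm; the slides do not say which closed form they have in mind, so the sketch below is an assumption. It uses NumPy, and the function name `superpose` is hypothetical. The cost is linear in n because only centroids and a 3×3 covariance matrix depend on the number of points.

```python
import numpy as np

def superpose(P, Q):
    """Rigid transform (R, t) minimizing rmsd(R p_i + t, q_i), and that rmsd."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)   # centroids
    H = (P - cp).T @ (Q - cq)                 # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    fitted = P @ R.T + t
    rmsd = np.sqrt(((fitted - Q) ** 2).sum(axis=1).mean())
    return R, t, rmsd

# Hypothetical check: Q is P rotated 90 degrees about z and translated.
P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([10.0, -2.0, 3.0])
Q = P @ R_true.T + t_true

R, t, err = superpose(P, Q)
print(err)  # close to zero for an exact rigid copy
```

Note that this requires the correspondence between pi and qi to be known in advance; it only solves the superposition step, not the matching problem.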

Sequence Order Independent Alignment

(Figure: two point sets P and Q; example of a 4-helix bundle shared by 2cbl:A, 1f4n:A, 1rhg:A and 1b3q, with chain A and chain B labels. Residue-number labels in the figure: 51, 103, 113, 169; 3, 58, 54, 7; 73, 126, 34, 12; 306, 355, 354, 305; 171, 147.)

The C2 domain calcium-binding motif

E. A. Nalefski and J. J. Falke. The C2 domain calcium-binding motif: Structural and functional diversity. Protein Sci 1996 5: 2375-2390

TRAF-Immunoglobulin Ensemble

α-helices; β-strands

Ensemble: 8 proteins from 2 folds.

Core: sandwich of 6 strands

Runtime: 21 seconds

E - strand