Robotics Algorithms for the Study of Protein Structure and Motion Based on Itay Lotan’s PhD Jean-Claude Latombe Computer Science Department Stanford University

Page 1

Robotics Algorithms for the Study of

Protein Structure and Motion

Based on Itay Lotan’s PhD

Jean-Claude Latombe

Computer Science Department

Stanford University

Page 2

Unfolded (denatured) state

Folded (native) state

Many pathways

Page 3

Loops connect helices and strands

Folded State

Page 4

amino acid (residue)

peptide bonds

Protein Sequence Structure

Page 5

Kinematic Linkage Model

Conformational space

Page 6

Molecule Robot

Page 7

Why Study Proteins?

They perform many vital functions, e.g.:
• catalysis of reactions
• storage of energy
• transmission of signals
• building blocks of muscles

They are linked to key biological problems that raise major computational challenges, mostly due to their large sizes (100s to several 1000s of atoms), their many degrees of kinematic freedom, and their huge number (millions)

Page 8

Two problems

Structure determination from electron density maps
• Inverse kinematics techniques
[Itay Lotan, Henry van den Bedem, Ashley Deacon (Joint Center for Structural Genomics)]

Energy maintenance during Monte Carlo simulation
• Distance computation techniques
[Itay Lotan, Fabian Schwarzer, and Danny Halperin (Tel Aviv University)]

Page 9

Structure Determination: X-Ray Crystallography

Page 10

Software

Software systems: RESOLVE, TEXTAL, ARP/wARP, MAID

• 1.0Å < d < 2.3Å: ~90% completeness
• 2.3Å ≤ d < 3.0Å: ~67% completeness (varies widely)¹

Manually completing a model:

• Labor intensive, time consuming
• Existing tools are highly interactive

JCSG: 43% of data sets 2.3Å

Model completion is a high-throughput bottleneck

[Figure: resolution scale from 1.0Å to 3.0Å]

¹ Badger (2003) Acta Cryst. D59

Page 11

The Completion Problem

Input:
• Electron-density map
• Partial structure
• Two anchor residues
• Amino-acid sequence of missing fragment (typically 4 – 15 residues long)

Output:
• Ranked conformations Q of fragment that
  - Respect the closure constraint
  - Maximize target function T(Q) measuring fit with electron-density map
  - Have no atomic clashes

[Figure: main part of protein / partial structure (folded); protein fragment (fuzzy map); Anchor 1 (3 atoms); Anchor 2 (3 atoms)]

(Inverse Kinematics)

Page 12

Two-Stage IK Method

1. Candidate generation: closed fragments

2. Candidate refinement: optimize fit with EDM

Page 13

Stage 1: Candidate Generation

1. Generate a random conformation of fragment (only one end attached to anchor)

2. Close fragment (i.e., bring other end to second anchor) using Cyclic Coordinate Descent (CCD) (Wang & Chen ’91, Canutescu & Dunbrack ’03)

Page 14

fixed end

moving end

Closure Distance

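Closure distance: $S = \|N - N'\|^2 + \|C_\alpha - C_\alpha'\|^2 + \|C - C'\|^2$ (unprimed atoms: moving end; primed atoms: fixed end)

Compute $q_i$ s.t. $\partial S / \partial q_i = 0$

+ bias toward avoiding steric clashes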

A.A. Canutescu and R.L. Dunbrack Jr.Cyclic coordinate descent: A robotics algorithm for protein loop closure. Prot. Sci. 12:963–972, 2003.
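The CCD idea on this slide can be illustrated with a minimal sketch. It is a deliberate simplification, not the method of Canutescu and Dunbrack: it uses a planar chain of revolute joints and a single end point instead of the three backbone atoms N, Cα, C, and it omits the steric-clash bias. The names `forward` and `ccd_close` are hypothetical. Each inner step picks, for one joint at a time, the rotation at which the derivative of the squared closure distance S with respect to that joint vanishes.

```python
import numpy as np

def forward(lengths, angles):
    """Joint positions of a planar chain; angles are relative joint angles."""
    pts = [np.zeros(2)]
    heading = 0.0
    for length, angle in zip(lengths, angles):
        heading += angle
        pts.append(pts[-1] + length * np.array([np.cos(heading), np.sin(heading)]))
    return np.array(pts)

def ccd_close(lengths, angles, target, tol=1e-6, max_sweeps=500):
    """Cyclic coordinate descent: adjust one joint at a time to shrink S."""
    angles = np.array(angles, dtype=float)
    target = np.asarray(target, dtype=float)
    S = float(np.sum((forward(lengths, angles)[-1] - target) ** 2))
    for _ in range(max_sweeps):
        for i in range(len(angles)):
            pts = forward(lengths, angles)
            to_end = pts[-1] - pts[i]   # joint i -> moving end
            to_tgt = target - pts[i]    # joint i -> fixed target
            # Rotation about joint i that aligns the moving end with the target;
            # this is the minimizer of S with respect to angle i alone.
            cross = to_end[0] * to_tgt[1] - to_end[1] * to_tgt[0]
            dot = float(to_end @ to_tgt)
            angles[i] += np.arctan2(cross, dot)
        S = float(np.sum((forward(lengths, angles)[-1] - target) ** 2))
        if S < tol:
            break
    return angles, S

# Toy usage: a 4-link chain closing onto a reachable target point.
closed_angles, closure = ccd_close([1.0] * 4, [0.3, 0.2, -0.4, 0.5], [1.5, 1.0])
print(closure)
```

In the protein setting the joints would be the φ and ψ dihedral angles and S would sum over the three anchor atoms, but the one-joint-at-a-time structure stays the same.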

Page 15

Exact Inverse Kinematics

Repeat for each conformation of a closed fragment:

1. Pick 3 amino-acids at random (3 pairs of φ-ψ angles)

2. Apply exact IK solver to generate all IK solutions [Coutsias et al, 2004]

Page 16

TM0813

GLU-83

GLY-96

Page 17

Stage 2: Candidate Refinement

Target function T(Q) measuring quality of the fit with the EDM

Minimize T while retaining closure

Closed conformations lie on a self-motion manifold of lower dimension

[Figure: 1-D self-motion manifold; null space]

Page 18

Closure and Null Space

• $dX = J\,dQ$, where J is the $6 \times n$ Jacobian matrix (n > 6)
• Null space $\{dQ \mid J\,dQ = 0\}$ has dim = n – 6
• N: orthonormal basis of null space
• $dQ = N N^T \nabla T(Q)$

Page 19

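Computation of N

SVD of J: $dX = U_{6 \times 6}\, \Sigma_{6 \times 6}\, V^T_{6 \times n}\, dQ$, with $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_6)$

Gram-Schmidt orthogonalization completes the rows of $V^T$ and yields the (n-6) basis N of the null space (rows of $N^T$)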
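As a rough illustration of this step, the sketch below computes an orthonormal null-space basis with NumPy. It assumes a random 6 x 10 matrix as a stand-in for the closure Jacobian, and the helper name `null_space_basis` is hypothetical. Requesting the full SVD makes `np.linalg.svd` return a square, orthogonal `Vt`, so the rows beyond the rank of J already form the basis and no separate Gram-Schmidt pass is needed here.

```python
import numpy as np

def null_space_basis(J, rtol=1e-10):
    """Return an orthonormal basis N (as columns) of {dQ | J dQ = 0}."""
    # With full_matrices=True, Vt is n x n and orthogonal; the rows past
    # rank(J) are orthogonal to the row space of J, i.e. they span the null space.
    _, s, Vt = np.linalg.svd(J, full_matrices=True)
    rank = int(np.sum(s > rtol * s[0])) if s.size else 0
    return Vt[rank:].T

rng = np.random.default_rng(0)
J = rng.standard_normal((6, 10))   # stand-in for a 6 x n Jacobian with n = 10
N = null_space_basis(J)
print(N.shape)                      # n rows, n - 6 columns when J has full rank
```

Any step of the form `N @ (N.T @ g)` is then the orthogonal projection of `g` onto the null space, which is why it leaves the closure constraint unchanged to first order.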

Page 20

Refinement Procedure

Repeat until minimum of T is reached:
1. Compute J and N at current Q
2. Compute $\nabla T$ at current Q (analytical expression of $\nabla T$ + linear-time recursive computation [Abe et al., Comput. Chem., 1984])
3. Move by small increment along $dQ = N N^T \nabla T$

(+ Monte Carlo / simulated annealing protocol to deal with local minima)
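The loop above can be sketched on a toy problem. Everything here is an assumption made for illustration: a planar 5-link chain whose end point plays the role of the closure constraint (so J is 2 x n rather than 6 x n), a quadratic stand-in for the target function T, and the hypothetical names `end_point`, `jacobian`, `null_basis`, and `refine`. Because T is minimized, the step is taken along the negative projected gradient. The Monte Carlo / simulated annealing part is omitted.

```python
import numpy as np

LENGTHS = np.ones(5)  # toy planar chain with 5 unit links (assumption)

def end_point(q):
    h = np.cumsum(q)  # absolute heading of each link
    return np.array([np.sum(LENGTHS * np.cos(h)), np.sum(LENGTHS * np.sin(h))])

def jacobian(q):
    h = np.cumsum(q)
    dx, dy = -LENGTHS * np.sin(h), LENGTHS * np.cos(h)
    # Joint i moves every link j >= i, so column i is a suffix sum.
    return np.vstack([np.cumsum(dx[::-1])[::-1], np.cumsum(dy[::-1])[::-1]])

def null_basis(J):
    _, s, Vt = np.linalg.svd(J, full_matrices=True)
    rank = int(np.sum(s > 1e-10 * s[0]))
    return Vt[rank:].T

def refine(q, grad_T, step=1e-3, iters=200):
    """Small steps along the null-space projection of the gradient of T."""
    q = np.array(q, dtype=float)
    for _ in range(iters):
        N = null_basis(jacobian(q))           # recompute J and N at current Q
        q -= step * (N @ (N.T @ grad_T(q)))   # descend while keeping closure to first order
    return q

q_ref = np.array([0.1, 0.4, -0.3, 0.6, -0.2])    # hypothetical preferred angles
T = lambda q: float(np.sum((q - q_ref) ** 2))     # toy target function
grad_T = lambda q: 2.0 * (q - q_ref)

q0 = np.array([0.5, -0.2, 0.7, -0.4, 0.3])
q1 = refine(q0, grad_T)
drift = float(np.linalg.norm(end_point(q1) - end_point(q0)))
print(T(q0), T(q1), drift)
```

The end point is preserved only to first order per step, so a small drift accumulates; a practical implementation would presumably re-close the fragment from time to time.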

Page 21

TM0813

GLU-83

GLY-96

Page 22

Tests #1: Artificial Gaps

TM1621 (234 residues) and TM0423 (376 residues), SCOP classification a/b

Complete structures (gold standard) resolved with EDM at 1.6Å resolution

Compute EDM at 2, 2.5, and 2.8Å resolution

Remove fragments and rebuild

Page 23

TM1621 103 Fragments from TM1621 at 2.5Å

Produced by H. van den Bedem

Long Fragments:

12: 96% < 1.0Å aaRMSD
15: 88% < 1.0Å aaRMSD

Short Fragments:

100% < 1.0Å aaRMSD
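For readers unfamiliar with the metric, here is a minimal sketch of an RMSD computation. It assumes that aaRMSD denotes the root-mean-square deviation over all atoms of the fragment, computed without superposition because the anchors fix the frame; the slides do not spell this out, and the function name `rmsd` is hypothetical.

```python
import numpy as np

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched sets of 3-D atom positions."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both conformations must list the same atoms")
    # Mean of squared per-atom distances, then square root.
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

# Toy usage: every atom displaced by exactly 1 Å along z.
built = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
reference = [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0]]
print(rmsd(built, reference))
```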

Page 24

Example: TM0423
PDB: 1KQ3, 376 res.
2.0Å resolution
12 residue gap
Best: 0.3Å aaRMSD

Page 25

Tests #2: True Gaps

• Structure computed by RESOLVE
• Gaps completed independently (gold standard)
• Example: TM1742 (271 residues), 2.4Å resolution; 5 gaps left by RESOLVE

Length   Top scorer
4        0.22Å
5        0.78Å
5        0.36Å
7        0.72Å
10       0.43Å

Produced by H. van den Bedem

Page 26

TM1621

Green: manually completed conformation

Cyan: conformation computed by stage 1

Magenta: conformation computed by stage 2

The aaRMSD improved by 2.4Å to 0.31Å

Page 27

Current/Future Work

A

B

Software actively being used at the JCSG

What about multi-modal loops?

Page 28

TM0755: data at 1.8Å
8-residue fragment crystallized in 2 conformations
Overlapping density: difficult to interpret manually

Algorithm successfully identified and built both conformations

A323Hist

A316Ser

Page 29

Current/Future Work

A

B

Software actively being used at the JCSG

What about multi-modal loops?

Fuzziness in EDM can then be exploited

Use EDM to infer probability measure over the conformation space of the loop

Page 30

Amylosucrase

J. Cortés, T. Siméon, M. Renaud-Siméon, and V. Tran. J. Comp. Chemistry, 25:956-967, 2004

Page 31

Energy maintenance during Monte Carlo simulation

joint work with Itay Lotan, Fabian Schwarzer, and Dan Halperin¹

¹ Computer Science Department, Tel Aviv University

Page 32

Monte Carlo Simulation (MCS)

Random walk through conformation space. At each attempted step:

• Perturb current conformation at random
• Accept step with probability: $P(\mathrm{accept}) = \min\{1,\ e^{-\Delta E / k_b T}\}$

The conformations generated by an arbitrarily long MCS are Boltzmann distributed, i.e.,

#conformations in V ~ $\int_V e^{-E / k_b T}\, dV$
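The acceptance rule on this slide is the standard Metropolis criterion, and it can be sketched in a few lines. The function names, the 1-D toy "conformation", and the uniform perturbation are assumptions made for illustration; the energy function is supplied by the caller.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng=random):
    """Accept with probability min(1, exp(-delta_e / kT))."""
    if delta_e <= 0.0:
        return True  # downhill or neutral moves are always accepted
    return rng.random() < math.exp(-delta_e / kT)

def mc_walk(energy, x0, steps, kT=1.0, max_move=0.5, seed=0):
    """Random walk over a 1-D toy coordinate using the Metropolis rule."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    for _ in range(steps):
        x_new = x + rng.uniform(-max_move, max_move)  # perturb at random
        e_new = energy(x_new)
        if metropolis_accept(e_new - e, kT, rng):
            x, e = x_new, e_new
        samples.append(x)  # a rejected step repeats the current state
    return samples

# Toy usage: harmonic energy well centered at 0.
samples = mc_walk(lambda x: 0.5 * x * x, x0=2.0, steps=1000)
print(len(samples))
```

Note that every attempted step calls `energy`, which is why the following slides focus on making energy maintenance cheap.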

Page 33

Monte Carlo Simulation (MCS)

Used to:
• sample meaningful distributions of conformations
• generate energetically plausible motion pathways

A simulation run may consist of millions of steps, so energy must be evaluated a large number of times

Problem: How to maintain energy efficiently?

Page 34

Energy Function

E = bonded terms + non-bonded terms + solvation terms

Bonded terms
- O(n)

Non-bonded terms
- E.g., Van der Waals and electrostatic
- Depend on distances between pairs of atoms
- O(n²): expensive to compute

Solvation terms
- May require computing molecular surface

Page 35

Non-Bonded Terms

• Energy terms go to 0 when distance increases: cutoff distance (6 - 12Å)
• vdW forces prevent atoms from bunching up: only O(n) interacting pairs [Halperin & Overmars 98]

Problem: How to find interacting pairs without enumerating all atom pairs?

Page 36

Grid Method

dcutoff

Subdivide 3-space into cubic cells

Compute cell that contains each atom center

Represent grid as hashtable

Page 37

Grid Method

dcutoff

• Θ(n) time to build grid
• O(1) time to find interacting pairs for each atom
• Θ(n) to find all interacting pairs of atoms [Halperin & Overmars, 98]
• Asymptotically optimal in worst-case
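The grid method described on these two slides can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the cell size is taken equal to the cutoff, the hash table is a Python dictionary keyed by integer cell indices, and the function names are hypothetical. A brute-force version is included only to check the result.

```python
from collections import defaultdict
from itertools import product
import numpy as np

def grid_pairs(coords, cutoff):
    """Find all atom pairs (i < j) within `cutoff` using a hashed cubic grid."""
    coords = np.asarray(coords, dtype=float)
    cells = defaultdict(list)
    for idx, key in enumerate(np.floor(coords / cutoff).astype(int)):
        cells[tuple(int(v) for v in key)].append(idx)   # cell containing atom center
    pairs = set()
    offsets = list(product((-1, 0, 1), repeat=3))       # the cell itself and its 26 neighbors
    for key, members in cells.items():
        for off in offsets:
            other = (key[0] + off[0], key[1] + off[1], key[2] + off[2])
            for j in cells.get(other, ()):              # .get avoids creating empty cells
                for i in members:
                    if i < j and np.linalg.norm(coords[i] - coords[j]) <= cutoff:
                        pairs.add((i, j))
    return pairs

def brute_force_pairs(coords, cutoff):
    """Reference O(n^2) enumeration of all atom pairs."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if np.linalg.norm(coords[i] - coords[j]) <= cutoff}

# Toy usage: 200 random "atoms" in a 30 Å box with a 6 Å cutoff.
atoms = np.random.default_rng(1).uniform(0.0, 30.0, size=(200, 3))
print(len(grid_pairs(atoms, 6.0)))
```

Since atoms cannot bunch up, each cell holds a bounded number of atoms, which is what makes the per-atom neighbor lookup constant time.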

Page 38

Can we do better on average?

Few DOFs are changed at each MC step

[Figure: number k of DOF changes (axis ticks 0, 5, 10, 20, 30), from a simulation of 100,000 attempted steps]

Page 39

Can we do better on average?

• Few DOFs are changed at each MC step
• Proteins are long kinematic chains

• Long sub-chains stay rigid at each step
• Many interacting pairs of atoms are unchanged
• Many partial energy sums remain constant

Problem: How to find new interacting pairs and retrieve unchanged partial sums?

Page 40

Two New Data Structures

1. ChainTree: fast detection of interacting atom pairs

2. EnergyTree: retrieval of unchanged partial energy sums

Page 41

ChainTree (Twofold Hierarchy: BVs + Transforms)

links

Page 42

TNO

TJK

TAB

joints

ChainTree (Twofold Hierarchy: BVs + Transforms)

Page 43

Updating the ChainTree

Update path to root:
– Recompute transforms that “shortcut” the DOF change
– Recompute BVs that contain the DOF change
– O(k log2(2n/k)) work for k changes
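The transform half of this update can be illustrated with a balanced binary tree of composed transforms. This is only a sketch under simplifying assumptions, not the ChainTree itself: links are planar, each is described by a 3 x 3 homogeneous matrix, bounding volumes are left out, and the names `joint_transform` and `TransformTree` are hypothetical. Each internal node stores the product of its children, so it "shortcuts" a whole sub-chain, and changing one joint only recomputes the nodes on the path to the root.

```python
import numpy as np

def joint_transform(angle, length=1.0):
    """Planar link: rotate by `angle`, then advance `length` along the new heading."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, length * c], [s, c, length * s], [0.0, 0.0, 1.0]])

class TransformTree:
    """Binary tree whose internal nodes cache the composed transform of a sub-chain."""

    def __init__(self, angles):
        self.size = 1
        while self.size < len(angles):
            self.size *= 2
        # Unused leaves stay as identity so padding does not affect products.
        self.node = [np.eye(3) for _ in range(2 * self.size)]
        for i, a in enumerate(angles):
            self.node[self.size + i] = joint_transform(a)
        for v in range(self.size - 1, 0, -1):
            self.node[v] = self.node[2 * v] @ self.node[2 * v + 1]  # left part first

    def update(self, i, angle):
        """Change joint i; recompute only its ancestors. Returns nodes touched."""
        v = self.size + i
        self.node[v] = joint_transform(angle)
        touched = 1
        v //= 2
        while v >= 1:
            self.node[v] = self.node[2 * v] @ self.node[2 * v + 1]
            touched += 1
            v //= 2
        return touched

    def end_transform(self):
        return self.node[1]  # transform from the chain base to its last frame

# Toy usage: 16 joints, one DOF change.
tree = TransformTree(np.linspace(-0.5, 0.5, 16))
print(tree.update(5, 0.7))
```

With k changed joints the paths to the root share their upper nodes, which is consistent with the bound on the slide growing more slowly than k times the tree height.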

Page 44

Finding Interacting Pairs

Page 45

Finding Interacting Pairs

Page 46

Finding Interacting Pairs

Do not search inside rigid sub-chains (unmarked nodes)

Page 47

Finding Interacting Pairs

Do not search inside rigid sub-chains (unmarked nodes)

Do not test two nodes with no marked node between them

New interacting pairs

Page 48

EnergyTree

E(N,N)

E(J,L)

E(K,L)

E(L,L)

E(M,M)

Page 49

EnergyTree

E(N,N)

E(J,L)

E(K,L)

E(L,L)

E(M,M)

Page 50

Complexity

n: total number of DOFs
k: number of DOF changes at each MCS step
k << n

Complexity of:
• updating ChainTree: O(k log2(2n/k))
• finding interacting pairs: $O(n^{4/3})$

but performs much better in practice!

Page 51

Experimental Setup

Energy function:
• Van der Waals
• Electrostatic
• Attraction between native contacts
• Cutoff at 12Å

300,000 steps MCS with Grid and ChainTree

• Steps are the same with both methods
• Early rejection for large vdW terms

Page 52

Results: 1-DOF change

[Figure: speedup vs. # amino acids for proteins of 68, 144, 374, and 755 residues; speedup labels shown: 3.5, 12.5, 5.8, 7.8]

Page 53

Results: 5-DOF change

[Figure: speedup vs. # amino acids for proteins of 68, 144, 374, and 755 residues; speedup labels shown: 2.2, 3.4, 4.5, 5.9]

Page 54

Two-Pass ChainTree (ChainTree+)

1st pass: small cutoff distance to detect steric clashes
2nd pass: normal cutoff distance

>5

Tests around native state

Page 55

Interaction with Solvent

Implicit solvent model: solvent as continuous medium, interface is solvent-accessible surface

E. Eyal, D. Halperin. Dynamic Maintenance of Molecular Surfaces under Conformational Changes. http://www.give.nl/movie/publications/telaviv/EH04.pdf

Page 56

Summary

Inverse kinematics techniques: improve structure determination from fuzzy electron density maps

Collision detection techniques: speed up energy maintenance during Monte Carlo simulation

Page 57

About Computational Biology

Computational Biology is more than mimicking nature (e.g., performing Molecular Dynamics simulation)

One of its goals is to achieve algorithmic efficiency by exploiting properties of molecules, e.g.:
• Atoms cannot bunch up together
• Forces have relatively short ranges
• Proteins are long kinematic chains