A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps


Frank DiMaio, Jude Shavlik, Computer Sciences Department

George Phillips, Biochemistry Department

University of Wisconsin – Madison, USA

Presented at the Fourteenth Conference on Intelligent Systems for Molecular Biology (ISMB 2006), Fortaleza, Brazil, August 7, 2006

X-ray Crystallography

[Figure: X-ray beam, protein crystal, collection plate, FFT, electron density map (“3D picture”)]

Given: Sequence + Density Map

[Figure: sequence and electron density map, with atoms labeled N, O, H]

Find: Each Atom’s Coordinates

[Figure: molecular model with atoms labeled N, O, H]

Our Subtask: Backbone Trace

[Figure: backbone trace through successive Cα positions]

The Unit Cell

3D density function ρ(x,y,z) provided over unit cell

Unit cell may contain multiple copies of the protein


Density Map Resolution

ARP/wARP (Perrakis et al. 1997)

TEXTAL (Ioerger et al. 1999)

Resolve (Terwilliger 2002)

Our focus

Resolution scale: 2Å, 3Å, 4Å

Overview of ACMI (our method)

Local Match

Algorithm searches for sequence-specific 5-mers centered at each amino acid

Many false positives

Global Consistency

Use probabilistic model to filter false positives

Find most probable backbone trace

5-mer Lookup and Cluster

…VKH V LVSPEKIEELIKGY…

PDB

Cluster 1 (wt=0.67), Cluster 2 (wt=0.33)

NOTE: can be done in precompute step
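The lookup-and-cluster step can be illustrated with a minimal sketch. It assumes the matching 5-mer instances pulled from the PDB are already superposed, that a simple greedy "leader" clustering is acceptable, and that each cluster's weight is the fraction of instances it contains. The clustering rule, the `cutoff` value, and the function names are illustrative assumptions, not the method used in ACMI.

```python
import numpy as np

def rmsd(a, b):
    """Root-mean-square deviation between two (n_atoms, 3) coordinate arrays."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

def cluster_fragments(fragments, cutoff):
    """Greedy leader clustering (an assumed stand-in for the real procedure).

    Each fragment joins the first cluster whose representative lies within
    `cutoff` RMSD; otherwise it starts a new cluster and becomes its
    representative. Returns the representatives and their weights.
    """
    representatives, counts = [], []
    for frag in fragments:
        for c, rep in enumerate(representatives):
            if rmsd(frag, rep) < cutoff:
                counts[c] += 1
                break
        else:
            representatives.append(frag)
            counts.append(1)
    weights = np.array(counts, dtype=float) / len(fragments)
    return representatives, weights
```

With three fragments, two of which are nearly identical, this yields two clusters with weights 2/3 and 1/3.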

5-mer Search

6D search (rotation + translation) for representative structures in density map

Compute “similarity”

$t(x) = \sum_y \epsilon_{frag}(y)\,[\rho_{frag}(y) - \rho_{map}(y - x)]^2$

Computed by Fourier convolution (Cowtan 2001)

Use tuneset to convert similarity score to probability
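A rough sketch of how a masked squared-density-difference score, $t(x) = \sum_y \epsilon(y)\,[\rho_{frag}(y) - \rho_{map}(y - x)]^2$, can be evaluated for every translation at once with FFTs. Expanding the square gives one constant term and two cross-correlations, each of which is a product in Fourier space. The sketch assumes a periodic grid, handles translation only (a rotational search would repeat it for each rotated fragment), and uses illustrative function and argument names.

```python
import numpy as np

def squared_density_difference(rho_map, rho_frag, eps_frag):
    """Score t(x) for every grid translation x, using FFTs.

    rho_map  : density map sampled on a periodic grid
    rho_frag : fragment density on the same grid
    eps_frag : fragment mask (nonzero where the fragment has density)
    """
    map_ft = np.fft.fftn(rho_map)
    map_sq_ft = np.fft.fftn(rho_map ** 2)
    # Constant term: sum_y eps(y) * rho_frag(y)^2
    const = np.sum(eps_frag * rho_frag ** 2)
    # Cross term: sum_y eps(y) * rho_frag(y) * rho_map(y - x)
    cross = np.fft.ifftn(np.fft.fftn(eps_frag * rho_frag) * np.conj(map_ft)).real
    # Quadratic term: sum_y eps(y) * rho_map(y - x)^2
    quad = np.fft.ifftn(np.fft.fftn(eps_frag) * np.conj(map_sq_ft)).real
    return const - 2.0 * cross + quad
```

Each call costs a handful of FFTs, that is O(X log X) for a grid of X points, instead of the O(X²) needed to evaluate the sum separately at every translation.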

Convert Scores to Probabilities

5-mer representative

search density map

scores $t_i(u_i)$

match to tuneset

score distributions (POS, NEG)

Bayes' rule

probability distribution over unit cell

P(5-mer at $u_i$ | Map)
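The score-to-probability conversion can be sketched with Bayes' rule. The sketch assumes the tune-set scores of true matches (POS) and non-matches (NEG) are each summarized by a Gaussian, and that a prior probability of a match is supplied. Both the Gaussian form and the names below are illustrative assumptions rather than the actual ACMI procedure.

```python
import numpy as np

def gaussian_pdf(t, mean, std):
    """Density of a normal distribution evaluated at t."""
    return np.exp(-0.5 * ((t - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def score_to_probability(t, pos_scores, neg_scores, prior_pos):
    """P(match | score t) from tune-set score distributions via Bayes' rule."""
    pos_scores = np.asarray(pos_scores, dtype=float)
    neg_scores = np.asarray(neg_scores, dtype=float)
    # Class-conditional likelihoods p(t | POS) and p(t | NEG), Gaussian by assumption
    like_pos = gaussian_pdf(t, pos_scores.mean(), pos_scores.std())
    like_neg = gaussian_pdf(t, neg_scores.mean(), neg_scores.std())
    numerator = like_pos * prior_pos
    return numerator / (numerator + like_neg * (1.0 - prior_pos))
```

Because `t` may be an array, the same call converts a whole grid of scores into a probability for each location in the unit cell.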

In This Talk…

Where we are now

For each amino acid in the protein, we have a probability distribution over the unit cell

Where we are headed

Find the backbone layout maximizing

$\prod_{\text{AAs } i} P(u_i \mid \text{Map}) \times \prod_{\text{AA-pairs } i,j} P(\text{conformation}\{u_i, u_j\})$

Pairwise Markov Field Models

A type of undirected graphical model

Represent joint probabilities as product of vertex and edge potentials

Similar to (but more general than) Bayesian networks

$p(U \mid y) \propto \prod_{\text{edges } s \sim t} \psi_{st}(u_s, u_t) \times \prod_{\text{vertices } s} \psi_s(u_s \mid y)$

[Figure: graph with vertices $u_1$, $u_2$, $u_3$ and evidence $y$]
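To make the "product of vertex and edge potentials" concrete, here is a small sketch for a model with a handful of discrete labels per vertex. It enumerates every labeling by brute force, which is feasible only for toy problems. The data layout (a list of vertex-potential arrays and a dict of edge-potential matrices) is an assumption made for illustration.

```python
import itertools
import numpy as np

def unnormalized_joint(labels, vertex_pot, edge_pot):
    """Product of vertex and edge potentials for one complete labeling."""
    p = 1.0
    for s, u_s in enumerate(labels):
        p *= vertex_pot[s][u_s]
    for (s, t), psi in edge_pot.items():
        p *= psi[labels[s], labels[t]]
    return p

def joint_distribution(vertex_pot, edge_pot):
    """Normalized joint p(U | y) as a table, by brute-force enumeration."""
    sizes = [len(v) for v in vertex_pot]
    table = np.zeros(sizes)
    for labels in itertools.product(*(range(k) for k in sizes)):
        table[labels] = unnormalized_joint(labels, vertex_pot, edge_pot)
    return table / table.sum()
```

The table has one entry per joint labeling, so its size grows exponentially with the number of vertices.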

Protein Backbone Model

ALA GLY LYS LEU

Each vertex is an amino acid

Each label is location + orientation: $u_i = (x_i, q_i)$

Evidence y is the electron density map

Each vertex (or observational) potential $\psi_i(u_i \mid y)$ comes from the 5-mer matching


Two types of edge (or structural) potentials

Adjacency constraints ensure adjacent amino acids are ~3.8Å apart and in the proper orientation

Occupancy constraints ensure nonadjacent amino acids do not occupy the same 3D space

ALA GLY LYS LEU

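Backbone Model Potential

$p(U \mid \text{Map}) \propto \prod_{\text{amino acid } i} \psi_i(u_i \mid \text{Map}) \times \prod_{\text{adjacent AAs } i,j} \psi_{adj}(u_i, u_j) \times \prod_{\text{nonadjacent AAs } i,j} \psi_{occ}(u_i, u_j)$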

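Constraints between adjacent amino acids:

$\psi_{adj}(u_i, u_j) = p_x(\lVert x_i - x_j \rVert) \times p_\theta(u_i, u_j)$

Constraints between nonadjacent amino acids:

$\psi_{occ}(u_i, u_j) = \begin{cases} 0 & \text{if } \lVert x_i - x_j \rVert < K \\ 1 & \text{otherwise} \end{cases}$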
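The two structural potentials can be sketched as plain functions of the Cα positions. Only the ~3.8Å spacing comes from the slides. The Gaussian form of the distance term, its width `SIGMA`, and the exclusion radius `K` are hypothetical values chosen for illustration, and the orientation term of the adjacency potential is left out.

```python
import numpy as np

CA_SPACING = 3.8  # approximate distance between adjacent C-alpha atoms, in Angstroms (from the slides)
SIGMA = 0.2       # hypothetical width of the distance distribution
K = 3.0           # hypothetical exclusion radius, in Angstroms

def psi_adj_distance(x_i, x_j, sigma=SIGMA):
    """Distance part of the adjacency potential; the orientation part is omitted."""
    d = np.linalg.norm(np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float))
    return float(np.exp(-0.5 * ((d - CA_SPACING) / sigma) ** 2))

def psi_occ(x_i, x_j, k=K):
    """Occupancy potential: 0 if two residues are closer than k, 1 otherwise."""
    d = np.linalg.norm(np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float))
    return 0.0 if d < k else 1.0
```

The adjacency term peaks when the two positions are exactly `CA_SPACING` apart, while the occupancy term acts as a hard veto on overlapping placements.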


Observational (“amino-acid-finder”) probabilities

$\psi_i(u_i \mid \text{Map}) = \Pr(\text{5-mer } s_{i-2} \ldots s_{i+2} \text{ at } u_i)$


Probabilistic Inference

Want to find backbone layout that maximizes

$\prod_{\text{amino acid } i} \psi_i(u_i \mid \text{Map}) \times \prod_{\text{adjacent AAs } i,j} \psi_{adj}(u_i, u_j) \times \prod_{\text{nonadjacent AAs } i,j} \psi_{occ}(u_i, u_j)$

Exact methods are intractable

Use belief propagation (BP) to approximate marginal distributions

$p(u_i \mid \text{Map}) = \sum_{u_k,\, k \neq i} p(U \mid \text{Map})$

Belief Propagation (BP)

Iterative, message-passing method (Pearl 1988)

A message, $m^n_{i \to j}$, from amino acid i to amino acid j indicates where i expects to find j

An approximation to the marginal (or belief), $b^n_i$, is given as the product of incoming messages

Belief Propagation Example

[Figure: ALA and GLY exchange messages: beliefs $b^0_{ALA}(x_{ALA} \mid y)$ and $b^0_{GLY}(x_{GLY} \mid y)$, message $m^1_{ALA \to GLY}(x_{GLY})$, belief $b^1_{GLY}(x_{GLY} \mid y)$, message $m^2_{GLY \to ALA}(x_{ALA})$, belief $b^2_{ALA}(x_{ALA} \mid y)$]
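A minimal sketch of sum-product belief propagation over discrete labels, to illustrate the message and belief updates. It assumes synchronous updates, uniform initial messages, and beliefs formed from the vertex potential times all incoming messages. The data layout and function name are illustrative, and nothing here reflects the Fourier-space representation used for real density maps.

```python
import numpy as np

def belief_propagation(vertex_pot, edge_pot, n_iters):
    """Approximate marginals by sum-product message passing.

    vertex_pot : list of 1D arrays, one potential per vertex
    edge_pot   : dict {(i, j): 2D array indexed [u_i, u_j]}, one per undirected edge
    """
    # One message per direction of every edge, initialized to uniform
    msgs = {}
    for (i, j) in edge_pot:
        msgs[(i, j)] = np.ones(len(vertex_pot[j]))
        msgs[(j, i)] = np.ones(len(vertex_pot[i]))

    for _ in range(n_iters):
        new_msgs = {}
        for (i, j) in msgs:
            psi = edge_pot[(i, j)] if (i, j) in edge_pot else edge_pot[(j, i)].T
            # Vertex potential of i times all messages into i except the one from j
            incoming = np.array(vertex_pot[i], dtype=float)
            for (k, l), m in msgs.items():
                if l == i and k != j:
                    incoming = incoming * m
            # Sum over u_i to get a function of u_j, then normalize
            message = psi.T @ incoming
            new_msgs[(i, j)] = message / message.sum()
        msgs = new_msgs

    beliefs = []
    for i, pot in enumerate(vertex_pot):
        b = np.array(pot, dtype=float)
        for (k, l), m in msgs.items():
            if l == i:
                b = b * m
        beliefs.append(b / b.sum())
    return beliefs
```

On a chain or any other tree the beliefs become exact once messages have crossed the graph; on graphs with loops, such as a backbone model with occupancy edges, they are approximations.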

Technical Challenges

Representation of potentials

Store Fourier coefficients in Cartesian space

At each location x, store a single orientation r

Speeding up O(N²X²) naïve implementation

X = the unit cell size (# Fourier coefficients)

N = the number of residues in the protein

Speeding Up O(N²X²) Implementation

O(X²) computation for each occupancy message

Each message must integrate over the unit cell

O(X log X) as multiplication in Fourier space

O(N²) messages computed & stored

Approximate N−3 occupancy messages with a single message

O(N) messages using a message product accumulator

Improved implementation O(NX log X)
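A sketch of why an occupancy message reduces to a multiplication in Fourier space. If the potential is 0 inside a ball of radius K and 1 elsewhere, the message at each location equals the total belief mass minus the mass within K of that location, and the second term is a convolution with a ball-shaped kernel. The sketch assumes a periodic grid with the radius measured in voxels, ignores orientation, and uses illustrative names.

```python
import numpy as np

def ball_kernel(shape, radius):
    """Indicator of grid points closer than `radius` voxels to the origin (periodic)."""
    # Per-axis distance to the origin with wrap-around
    axes = [np.minimum(np.arange(n), n - np.arange(n)) for n in shape]
    grids = np.meshgrid(*axes, indexing="ij")
    dist_sq = sum(g.astype(float) ** 2 for g in grids)
    return (dist_sq < radius ** 2).astype(float)

def occupancy_message_fft(belief, radius):
    """Occupancy message for every location, via one FFT-based convolution.

    message(x_j) = sum over x_i of psi_occ(x_i, x_j) * belief(x_i)
                 = belief.sum() - (belief convolved with the ball kernel)(x_j)
    """
    kernel = ball_kernel(belief.shape, radius)
    blocked = np.fft.ifftn(np.fft.fftn(belief) * np.fft.fftn(kernel)).real
    return belief.sum() - blocked
```

A direct evaluation would sum over all X source locations for each of the X target locations; the FFT route replaces that with transforms costing O(X log X).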

1XMT at 3Å Resolution

1.12Å RMSd, 100% coverage

[Figure: backbone trace colored by prob(AA at location), on a scale from LOW to HIGH; scale values shown: 0.17, 0.82]

1VMO at 4Å Resolution


[Figure: backbone trace colored on a scale from LOW to HIGH; scale values shown: 0.02, 0.25]


1YDH at 3.5Å Resolution


[Figure: backbone trace colored on a scale from LOW to HIGH; scale values shown: 0.02, 0.27]


Experiments

Tested ACMI against other map interpretation algorithms: TEXTAL and Resolve

Used ten model-phased maps

Smoothly diminished reflection intensities, yielding 2.5, 3.0, 3.5, 4.0 Å resolution maps

RMS Deviation

[Figure: Cα RMS deviation (y-axis, 0 to 12) for ACMI, Textal, and Resolve; x-axis ticks from 2.0 to 4.5]

Model Completeness

[Figure: two panels, % chain traced and % residues identified (y-axes, 0% to 100%), for ACMI, Textal, and Resolve; x-axis ticks from 2.0 to 4.5]

Per-protein RMS Deviation

[Figure: two scatter plots of ACMI RMS error against TEXTAL RMS error and against Resolve RMS error (axes 0 to 16), with points marked by map resolution: 2.5Å, 3.0Å, 3.5Å, 4.0Å]

Conclusions

ACMI effectively combines weakly-matching templates to construct a full model

Produces an accurate trace even with poor-quality density map data

Reduces computational complexity from O(N²X²) to O(NX log X)

Inference possible for even large unit cells

Future Work

Improve “amino-acid-finding” algorithm

Incorporate sidechain placement / refinement

Manage missing data

Disordered regions

Only exterior visible (e.g., in CryoEM)

Acknowledgements

Ameet Soni

Craig Bingman

NLM grants 1R01 LM008796 and 1T15 LM007359
