A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps
Frank DiMaio, Jude Shavlik, Computer Sciences Department
George Phillips, Biochemistry Department
University of Wisconsin – Madison, USA
Presented at the Fourteenth Conference on Intelligent Systems for Molecular Biology (ISMB 2006), Fortaleza, Brazil, August 7, 2006
X-ray Crystallography
[Figure: X-ray beam, protein crystal, collection plate, FFT, electron density map (“3D picture”)]
Given: Sequence + Density Map
[Figure: sequence + electron density map]
Find: Each Atom’s Coordinates
[Figure: molecular model with labeled atoms (N, O, H)]
Our Subtask: Backbone Trace
[Figure: backbone trace through consecutive Cα atoms]
The Unit Cell
3D density function ρ(x,y,z) provided over the unit cell
Unit cell may contain multiple copies of the protein
Density Map Resolution
ARP/wARP (Perrakis et al. 1997)
TEXTAL (Ioerger et al. 1999)
Resolve (Terwilliger 2002)
Our focus
[Figure: resolution scale marked at 2Å, 3Å, and 4Å]
Overview of ACMI (our method)
Local Match
Algorithm searches for sequence-specific 5-mers centered at each amino acid
Many false positives
Global Consistency
Use probabilistic model to filter false positives
Find most probable backbone trace
5-mer Lookup and Cluster
…VKH V LVSPEKIEELIKGY…
[Figure: 5-mer looked up in the PDB; matches grouped into Cluster 1 (wt=0.67) and Cluster 2 (wt=0.33)]
NOTE: can be done in precompute step
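The slide does not say how the cluster weights are obtained. A minimal sketch, assuming each weight is simply the fraction of PDB instances of the 5-mer that fall in that cluster (the function name `cluster_weights` and the label list are hypothetical):

```python
import numpy as np

def cluster_weights(labels):
    """Weight of each cluster = fraction of fragments assigned to it (assumption)."""
    labels = np.asarray(labels)
    ids, counts = np.unique(labels, return_counts=True)
    return {int(i): float(c) / labels.size for i, c in zip(ids, counts)}

# Hypothetical assignment of six PDB fragments to two clusters.
weights = cluster_weights([0, 0, 0, 0, 1, 1])
print({k: round(v, 2) for k, v in weights.items()})
```

With four of six fragments in the first cluster, this gives weights that round to the 0.67 and 0.33 shown in the example.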
5-mer Search
6D search (rotation + translation) for representative structures in density map
Compute “similarity”
$$ t(x) = \sum_{y} \epsilon_{frag}(y) \, \left( \rho_{frag}(y) - \rho_{map}(y - x) \right)^2 $$
Computed by Fourier convolution (Cowtan 2001)
Use tuneset to convert similarity score to probability
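A minimal sketch of how a translation search of this kind can be done with FFTs, assuming the similarity is a masked squared difference between fragment density and map density on a periodic grid. Expanding the square gives three terms, and the two that depend on the offset are cross-correlations. The function names, the grid size, and the planted offset are illustrative assumptions; the rotational part of the 6D search would repeat this for each fragment orientation.

```python
import numpy as np

def match_score_fft(rho_map, rho_frag, eps_frag):
    """t(x) = sum_y eps(y) * (rho_frag(y) - rho_map(y - x))**2 for all x, via FFT."""
    fft, ifft = np.fft.fftn, np.fft.ifftn
    term_frag = np.sum(eps_frag * rho_frag ** 2)  # independent of x
    term_cross = ifft(fft(eps_frag * rho_frag) * np.conj(fft(rho_map))).real
    term_map = ifft(fft(eps_frag) * np.conj(fft(rho_map ** 2))).real
    return term_frag - 2.0 * term_cross + term_map

def match_score_direct(rho_map, rho_frag, eps_frag):
    """Same score by explicit looping over offsets; only for checking small grids."""
    t = np.empty(rho_map.shape)
    axes = tuple(range(rho_map.ndim))
    for x in np.ndindex(rho_map.shape):
        shifted = np.roll(rho_map, shift=x, axis=axes)  # shifted[y] = rho_map[y - x]
        t[x] = np.sum(eps_frag * (rho_frag - shifted) ** 2)
    return t

rng = np.random.default_rng(0)
shape = (6, 5, 4)
rho_frag = rng.random(shape)
eps_frag = (rng.random(shape) > 0.5).astype(float)  # hypothetical fragment mask
# Plant the fragment in the map so that the best offset is (2, 1, 3).
rho_map = np.roll(rho_frag, shift=(-2, -1, -3), axis=(0, 1, 2))

t_fft = match_score_fft(rho_map, rho_frag, eps_frag)
t_direct = match_score_direct(rho_map, rho_frag, eps_frag)
best_offset = np.unravel_index(np.argmin(t_fft), shape)
```

Lower scores mean better agreement, so the planted offset is where the score is smallest.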
Convert Scores to Probabilities
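Search the density map with each 5-mer representative to get scores ti(ui)
Match the representative to a tuneset to get score distributions for positive (POS) and negative (NEG) examples
Apply Bayes’ rule to get a probability distribution over the unit cell: P(5-mer at ui | Map)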
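A minimal sketch of the score-to-probability step, assuming the positive and negative tuneset scores are each modeled by a Gaussian and that a prior probability of a true match is supplied. The Gaussian model, the prior, and all numbers are assumptions for illustration, not details from the talk.

```python
import numpy as np

def gaussian_pdf(t, mean, std):
    return np.exp(-0.5 * ((t - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def score_to_probability(scores, pos_scores, neg_scores, prior_pos):
    """Bayes' rule: P(match | score) from class-conditional score densities."""
    scores = np.asarray(scores, dtype=float)
    like_pos = gaussian_pdf(scores, np.mean(pos_scores), np.std(pos_scores))
    like_neg = gaussian_pdf(scores, np.mean(neg_scores), np.std(neg_scores))
    joint_pos = like_pos * prior_pos
    return joint_pos / (joint_pos + like_neg * (1.0 - prior_pos))

# Hypothetical tuneset scores; lower score = better match.
probs = score_to_probability(
    [2.0, 5.0, 8.0],
    pos_scores=[1.0, 2.0, 3.0],
    neg_scores=[7.0, 8.0, 9.0],
    prior_pos=0.5,
)
```

A score typical of true matches maps to a probability near 1, a score typical of false matches maps near 0, and the midpoint maps to the prior.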
In This Talk…
Where we are now
For each amino acid in the protein, we have a probability distribution over the unit cell
Where we are headed
Find the backbone layout maximizing
$$ \prod_{\text{AAs } i} P(u_i \mid \mathrm{Map}) \times \prod_{\text{AA-pairs } i,j} P(\text{conformation } \{u_i, u_j\}) $$
Pairwise Markov Field Models A type of undirected graphical model
Represent joint probabilities as product of vertex and edge potentials
Similar to (but more general than) Bayesian networks
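$$ p(U \mid y) \propto \prod_{\text{edges } s,t} \psi_{st}(u_s, u_t) \times \prod_{\text{vertices } s} \psi_s(u_s \mid y) $$
[Figure: graphical model with vertices u1, u2, u3 and evidence y]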
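A minimal sketch of a pairwise Markov field on three vertices with two labels each, where the joint probability is the normalized product of vertex and edge potentials. The potential values and the helper name `joint_distribution` are made up for illustration; enumeration like this only works for tiny models.

```python
import itertools
import numpy as np

def joint_distribution(vertex_pot, edge_pot):
    """Normalized product of vertex and edge potentials over all labelings."""
    names = sorted(vertex_pot)
    k = len(vertex_pot[names[0]])
    joint = {}
    for labels in itertools.product(range(k), repeat=len(names)):
        assign = dict(zip(names, labels))
        p = 1.0
        for s in names:
            p *= vertex_pot[s][assign[s]]
        for (s, t), psi in edge_pot.items():
            p *= psi[assign[s], assign[t]]
        joint[labels] = p
    z = sum(joint.values())  # partition function
    return {labels: p / z for labels, p in joint.items()}

agree = np.array([[0.8, 0.2], [0.2, 0.8]])  # edge potential favoring equal labels
vertex_pot = {
    "u1": np.array([0.9, 0.1]),  # evidence favors label 0 at u1
    "u2": np.array([0.5, 0.5]),
    "u3": np.array([0.5, 0.5]),
}
edge_pot = {("u1", "u2"): agree, ("u2", "u3"): agree}

joint = joint_distribution(vertex_pot, edge_pot)
most_probable = max(joint, key=joint.get)
```

Evidence at u1 propagates through the edge potentials, so the most probable labeling assigns label 0 to all three vertices.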
Protein Backbone Model
[Figure: chain of vertices ALA, GLY, LYS, LEU]
Each vertex is an amino acid
Each label is location + orientation: $u_i = (x_i, q_i)$
Evidence y is the electron density map
Each vertex (or observational) potential $\psi_i(u_i \mid y)$ comes from the 5-mer matching
Protein Backbone Model
Two types of edge (or structural) potentials
Adjacency constraints ensure adjacent amino acids are ~3.8Å apart and in the proper orientation
Occupancy constraints ensure nonadjacent amino acids do not occupy the same 3D space
[Figure: chain of vertices ALA, GLY, LYS, LEU]
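Backbone Model Potential
$$ p(u \mid \mathrm{Map}) \propto \prod_{\text{amino acids } i} \psi_i(u_i \mid \mathrm{Map}) \times \prod_{\text{adjacent AAs } i,j} \psi_{adj}(u_i, u_j) \times \prod_{\text{nonadjacent AAs } i,j} \psi_{occ}(u_i, u_j) $$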
Constraints between adjacent amino acids:
$$ \psi_{adj}(u_i, u_j) = p_x(\lVert x_i - x_j \rVert) \times p_\theta(u_i, u_j) $$
Constraints between nonadjacent amino acids:
$$ \psi_{occ}(u_i, u_j) = \begin{cases} 0 & \text{if } \lVert x_i - x_j \rVert < K \\ 1 & \text{otherwise} \end{cases} $$
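A minimal sketch of the two structural potentials as functions of Cα positions. The ~3.8Å spacing comes from the slides; the Gaussian form of the distance term, its width `SIGMA`, and the clash radius `K_CLASH` are hypothetical choices, and the orientation part of the adjacency potential is left out.

```python
import numpy as np

CA_DISTANCE = 3.8  # Angstroms between adjacent C-alphas (from the slides)
SIGMA = 0.3        # hypothetical width of the distance distribution
K_CLASH = 3.0      # hypothetical minimum distance between nonadjacent residues

def psi_adj_distance(x_i, x_j, sigma=SIGMA):
    """Distance part of the adjacency potential (unnormalized Gaussian, an assumption)."""
    d = np.linalg.norm(np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float))
    return float(np.exp(-0.5 * ((d - CA_DISTANCE) / sigma) ** 2))

def psi_occ(x_i, x_j, k=K_CLASH):
    """Occupancy potential: 0 if the two residues are closer than k, else 1."""
    d = np.linalg.norm(np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float))
    return 0.0 if d < k else 1.0

origin = [0.0, 0.0, 0.0]
adj_good = psi_adj_distance(origin, [3.8, 0.0, 0.0])
adj_bad = psi_adj_distance(origin, [6.0, 0.0, 0.0])
occ_clash = psi_occ(origin, [1.0, 0.0, 0.0])
occ_clear = psi_occ(origin, [5.0, 0.0, 0.0])
```

Because the occupancy potential is exactly 0 for clashing pairs, any layout containing a clash gets zero probability under the product.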
Observational (“amino-acid-finder”) probabilities
$$ \psi_i(u_i \mid \mathrm{Map}) = \Pr(\text{5mer } s_{i-2} \ldots s_{i+2} \text{ at } u_i) $$
Probabilistic Inference
Want to find backbone layout that maximizes
$$ \prod_{\text{amino acids } i} \psi_i(u_i \mid \mathrm{Map}) \times \prod_{\text{adjacent AAs } i,j} \psi_{adj}(u_i, u_j) \times \prod_{\text{nonadjacent AAs } i,j} \psi_{occ}(u_i, u_j) $$
Exact methods are intractable
Use belief propagation (BP) to approximate marginal distributions
$$ p(u_i \mid \mathrm{Map}) = \sum_{u_k,\, k \neq i} p(U \mid \mathrm{Map}) $$
Belief Propagation (BP)
Iterative, message-passing method (Pearl 1988)
A message, $m^n_{i \to j}$, from amino acid i to amino acid j indicates where i expects to find j
An approximation to the marginal (or belief), $b^n_i$, is given as the product of incoming messages
Belief Propagation Example
[Figure: ALA and GLY exchange messages $m^1_{ALA \to GLY}(x_{GLY})$ and $m^2_{GLY \to ALA}(x_{ALA})$; beliefs go from $b^0_{ALA}(x_{ALA} \mid y)$ and $b^0_{GLY}(x_{GLY} \mid y)$ to $b^1_{GLY}(x_{GLY} \mid y)$ and $b^2_{ALA}(x_{ALA} \mid y)$]
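A minimal sketch of sum-product message passing on a four-residue chain with a small discrete label set, checked against brute-force marginals. It uses a single forward and backward sweep rather than repeated iterations, and it includes only chain (adjacency-style) edges, so the result is exact here; with occupancy edges between all nonadjacent pairs the graph has loops and BP is only approximate. All potentials are random placeholders, and the function names are hypothetical.

```python
import itertools
import numpy as np

def chain_bp(node_pot, edge_pot):
    """Beliefs on a chain; edge_pot[i][a, b] couples label a at node i with b at node i+1."""
    n = len(node_pot)
    fwd = [np.ones_like(node_pot[0]) for _ in range(n)]  # message arriving from the left
    bwd = [np.ones_like(node_pot[0]) for _ in range(n)]  # message arriving from the right
    for i in range(1, n):
        m = (node_pot[i - 1] * fwd[i - 1]) @ edge_pot[i - 1]
        fwd[i] = m / m.sum()
    for i in range(n - 2, -1, -1):
        m = edge_pot[i] @ (node_pot[i + 1] * bwd[i + 1])
        bwd[i] = m / m.sum()
    # Belief = local potential times the product of incoming messages.
    beliefs = [node_pot[i] * fwd[i] * bwd[i] for i in range(n)]
    return [b / b.sum() for b in beliefs]

def brute_force_marginals(node_pot, edge_pot):
    n, k = len(node_pot), len(node_pot[0])
    marg = np.zeros((n, k))
    for labels in itertools.product(range(k), repeat=n):
        p = 1.0
        for i, a in enumerate(labels):
            p *= node_pot[i][a]
        for i in range(n - 1):
            p *= edge_pot[i][labels[i], labels[i + 1]]
        for i, a in enumerate(labels):
            marg[i, a] += p
    return marg / marg.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
node_pot = [rng.random(3) + 0.1 for _ in range(4)]       # e.g. ALA, GLY, LYS, LEU
edge_pot = [rng.random((3, 3)) + 0.1 for _ in range(3)]
beliefs = np.array(chain_bp(node_pot, edge_pot))
exact = brute_force_marginals(node_pot, edge_pot)
```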
Technical Challenges
Representation of potentials
Store Fourier coefficients in Cartesian space
At each location x, store a single orientation r
Speeding up the naïve O(N²X²) implementation
X = the unit cell size (# Fourier coefficients)
N = the number of residues in the protein
Speeding Up O(N²X²) Implementation
O(X²) computation for each occupancy message
Each message must integrate over the unit cell
O(X log X) as multiplication in Fourier space
O(N²) messages computed & stored
Approximate the N−3 occupancy messages with a single message
O(N) messages using a message product accumulator
Improved implementation: O(NX log X)
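A minimal sketch of both speedups on a 1D periodic grid, under stated assumptions. The occupancy message is taken to be the sum of the sender's belief over all cells that are not within distance k, which equals the total belief minus a circular convolution with a ball indicator and can therefore be computed with FFTs. The accumulator is read here as keeping one running product of all messages and dividing out a single message when needed; that reading, the function names, and the grid sizes are assumptions, not details from the talk.

```python
import numpy as np

def occupancy_message_direct(belief, k):
    """O(X^2): for each target cell, sum belief over cells at periodic distance >= k."""
    n = belief.size
    msg = np.zeros(n)
    for j in range(n):
        for i in range(n):
            d = min(abs(i - j), n - abs(i - j))
            if d >= k:
                msg[j] += belief[i]
    return msg

def occupancy_message_fft(belief, k):
    """O(X log X): total belief minus the belief convolved with a ball of radius k."""
    n = belief.size
    offsets = np.arange(n)
    ball = (np.minimum(offsets, n - offsets) < k).astype(float)
    blocked = np.fft.ifft(np.fft.fft(ball) * np.fft.fft(belief)).real
    return belief.sum() - blocked

def products_excluding_each(messages):
    """Product of all rows except row i, for every i, from one accumulated log-sum."""
    log_msgs = np.log(messages)          # requires strictly positive messages
    accumulator = log_msgs.sum(axis=0)   # one pass over all N messages
    return np.exp(accumulator[None, :] - log_msgs)

rng = np.random.default_rng(1)
belief = rng.random(32)
belief /= belief.sum()
msg_direct = occupancy_message_direct(belief, k=3)
msg_fft = occupancy_message_fft(belief, k=3)

messages = rng.random((5, 32)) + 0.1
fast = products_excluding_each(messages)
slow = np.array([np.prod(np.delete(messages, i, axis=0), axis=0) for i in range(5)])
```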
1XMT at 3Å Resolution
1.12Å RMSd, 100% coverage
[Figure: color scale for prob(AA at location), from LOW (0.17) to HIGH (0.82)]
1VMO at 4Å Resolution
3.63Å RMSd, 72% coverage
[Figure: color scale for prob(AA at location), from LOW (0.02) to HIGH (0.25)]
1YDH at 3.5Å Resolution
1.47Å RMSd, 90% coverage
[Figure: color scale for prob(AA at location), from LOW (0.02) to HIGH (0.27)]
Experiments
Tested ACMI against other map interpretation algorithms: TEXTAL and Resolve
Used ten model-phased maps
Smoothly diminished reflection intensities, yielding 2.5, 3.0, 3.5, and 4.0 Å resolution maps
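A minimal sketch of the two quantities reported per trace in this deck, Cα RMS deviation and coverage. It assumes predicted and reference Cα coordinates are already paired residue by residue and expressed in the same map frame, so no superposition is performed. The coordinates and helper names are made up for illustration.

```python
import numpy as np

def ca_rmsd(predicted, reference):
    """Root-mean-square distance between paired C-alpha coordinates (Angstroms)."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

def coverage(n_traced, n_total):
    """Fraction of the chain for which a C-alpha position was produced."""
    return n_traced / n_total

reference = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]])
predicted = reference + np.array([3.0, 4.0, 0.0])  # every atom displaced by 5 Angstroms
rmsd = ca_rmsd(predicted, reference)
frac = coverage(n_traced=9, n_total=10)
```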
RMS Deviation
[Figure: Cα RMS deviation (0 to 12) vs. density map resolution (2.0 to 4.5) for ACMI, Textal, and Resolve]
Model Completeness
[Figure: % chain traced and % residues identified (0% to 100%) vs. density map resolution (2.0 to 4.5) for ACMI, Textal, and Resolve]
Per-protein RMS Deviation
[Figure: two scatter plots, TEXTAL RMS error vs. ACMI RMS error and Resolve RMS error vs. ACMI RMS error (axes 0 to 16), with points for the 2.5Å, 3.0Å, 3.5Å, and 4.0Å maps]
Conclusions
ACMI effectively combines weakly-matching templates to construct a full model
Produces an accurate trace even with poor-quality density map data
Reduces computational complexity from O(N²X²) to O(NX log X)
Inference possible for even large unit cells
Future Work
Improve “amino-acid-finding” algorithm
Incorporate sidechain placement / refinement
Manage missing data
Disordered regions
Only exterior visible (e.g., in CryoEM)
Acknowledgements
Ameet Soni
Craig Bingman
NLM grants 1R01 LM008796 and 1T15 LM007359