Protein Structural Prediction


Page 1: Protein Structural Prediction

Protein Structural Prediction

Page 2: Protein Structural Prediction

Performance of Structure Prediction Methods

Page 3: Protein Structural Prediction

TRILOGY: Sequence–Structure Patterns

• Identify short sequence–structure patterns of 3 amino acids

• Find statistically significant ones (hypergeometric distribution), correcting for multiple trials

• These patterns may have structural or functional importance

1. Pseq: R1 x(a–b) R2 x(c–d) R3

2. Pstr: 3 Cα–Cα distances and 3 Cα–Cβ vectors

• Start with short patterns of 3 amino acids drawn from the residue groups {V, I, L, M}, {F, Y, W}, {D, E}, {K, R, H}, {N, Q}, {S, T}, {A, G, S}

• Extend to longer patterns

Bradley et al. PNAS 99:8500-8505, 2002
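
As an illustration of the sequence half of a pattern, here is a minimal sketch of matching R1 x(a–b) R2 x(c–d) R3 against a sequence using the residue groups listed above. The function name `find_matches`, the group keys, the example sequence, and the reading of a gap as "number of residues strictly between two matched positions" are assumptions made for this sketch; it is not TRILOGY's implementation and it ignores the structural half of the pattern.

```python
# Residue groups from the slide; a pattern position names one group.
GROUPS = {
    "VILM": set("VILM"), "FYW": set("FYW"), "DE": set("DE"),
    "KRH": set("KRH"), "NQ": set("NQ"), "ST": set("ST"), "AGS": set("AGS"),
}


def find_matches(seq, r1, gap1, r2, gap2, r3):
    """Return index triples (i, j, k) matching R1 x(a-b) R2 x(c-d) R3.

    gap1 = (a, b) and gap2 = (c, d) bound the number of residues strictly
    between consecutive matched positions (an assumed convention).
    """
    g1, g2, g3 = GROUPS[r1], GROUPS[r2], GROUPS[r3]
    hits = []
    for i, residue in enumerate(seq):
        if residue not in g1:
            continue
        # j may sit between a and b residues after i
        for j in range(i + 1 + gap1[0], min(len(seq), i + 2 + gap1[1])):
            if seq[j] not in g2:
                continue
            # k may sit between c and d residues after j
            for k in range(j + 1 + gap2[0], min(len(seq), j + 2 + gap2[1])):
                if seq[k] in g3:
                    hits.append((i, j, k))
    return hits


# Hypothetical sequence, chosen only to exercise the function.
matches = find_matches("MKVDAEL", "VILM", (1, 2), "DE", (1, 2), "VILM")
```

Note that the groups {S, T} and {A, G, S} share serine, so a residue can belong to more than one group.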

Page 4: Protein Structural Prediction

TRILOGY

Page 5: Protein Structural Prediction

TRILOGY: Extension

Glue together two 3-aa patterns that overlap in 2 amino acids

P-score = $\sum_{i=M_{pat}}^{\min(M_{seq},\,M_{str})} \binom{M_{seq}}{i} \binom{T - M_{seq}}{M_{str} - i} \binom{T}{M_{str}}^{-1}$
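
The P-score is the upper tail of a hypergeometric distribution, and it can be evaluated directly with exact integer binomials. The sketch below assumes the usual reading of the symbols, which the slide does not spell out: T is the total number of candidate instances, Mseq and Mstr are the numbers matching the sequence and structure patterns, and Mpat is the number matching both. The function name and the example numbers are made up.

```python
from math import comb


def p_score(T, m_seq, m_str, m_pat):
    """Hypergeometric upper tail: probability of at least m_pat joint matches."""
    total = comb(T, m_str)
    tail = sum(
        comb(m_seq, i) * comb(T - m_seq, m_str - i)
        for i in range(m_pat, min(m_seq, m_str) + 1)
    )
    return tail / total


# Made-up counts, small enough to check by hand: (36 + 4) / 120.
example = p_score(T=10, m_seq=4, m_str=3, m_pat=2)
```

A small P-score means the observed overlap between sequence and structure matches is unlikely by chance; the correction for multiple trials mentioned earlier is not included here.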

Page 6: Protein Structural Prediction

TRILOGY: Longer Patterns

Type-II turn between unpaired strands

NAD/RAD binding motif found in several folds

β-α-β unit found in three proteins with the TIM-barrel fold

Helix-hairpin-helix DNA-binding motif

Four Cysteines forming 4 S-S disulfide bonds

A fold with repeated aligned β-sheets

Three strands of an anti-parallel β-sheet

A β-hairpin connected with a crossover to a third β-strand

Page 7: Protein Structural Prediction

Small Libraries of Structural Fragments for Representing Protein Structures

Page 8: Protein Structural Prediction

Fragment Libraries For Structure Modeling

[Figure: pipeline diagram with the labels known structures, fragment library, protein sequence, and predicted structure]

Page 9: Protein Structural Prediction

Small Libraries of Protein Fragments

Kolodny, Koehl, Guibas, Levitt, JMB 2002

Goal: Small “alphabet” of protein structural fragments that can be used to represent any structure

1. Generate fragments from known proteins

2. Cluster fragments to identify common structural motifs

3. Test library accuracy on proteins not in the initial set

Page 10: Protein Structural Prediction

Small Libraries of Protein Fragments

Dataset: 200 unique protein domains with the most reliable and distinct structures from SCOP (36,397 residues)

• Divide each protein domain into consecutive fragments beginning at a random initial position

Library: Four sets of backbone fragments, with fragments 4, 5, 6, and 7 residues long

• Cluster the resulting small structures into k clusters using cRMS, applying k-means clustering with simulated annealing:
  - Cluster with k-means
  - Iteratively break and join clusters with simulated annealing to optimize the total variance Σ(x – μ)²
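
The fragment-generation step can be sketched as follows. This is a minimal illustration, assuming that "random initial position" means an offset drawn uniformly from the first f residues and that a domain is given as a list of backbone coordinates; the function name and the toy backbone are hypothetical. The clustering step (k-means with simulated annealing under cRMS) is not shown.

```python
import random


def split_into_fragments(coords, f, rng=None):
    """Cut a chain into consecutive, non-overlapping fragments of length f.

    The starting offset is drawn from [0, f), which is an assumption about
    what "random initial position" means on the slide.
    """
    rng = rng or random.Random()
    start = rng.randrange(f)
    return [coords[i:i + f] for i in range(start, len(coords) - f + 1, f)]


# Toy backbone: 20 points on a line, standing in for real coordinates.
backbone = [(float(i), 0.0, 0.0) for i in range(20)]
fragments = split_into_fragments(backbone, f=5, rng=random.Random(0))
```

Running this once for each f in 4, 5, 6, and 7 would give the four fragment sets that are then clustered.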

Page 11: Protein Structural Prediction

Evaluating the Quality of a Library

• Test set of 145 highly reliable protein structures (Park & Levitt)

• Protein structures broken into set of overlapping fragments of length f

• Find for each protein fragment the most similar fragment in the library (cRMS)

Local Fit: Average cRMS value over all fragments in all proteins in the test set

Global Fit: Find the “best” composition of the structure out of overlapping fragments

• Complexity is O(|Library|^N)

• Greedy approach extends the C best structures so far from position 1 to N
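
The local-fit measure can be sketched as an average of nearest-library-fragment distances. This is a hedged illustration only: the distance used below is a plain RMS over matched points without optimal superposition, a stand-in for cRMS (which superimposes the two fragments first), and all names and toy coordinates are invented for the example.

```python
from math import sqrt


def rms_no_superposition(a, b):
    """Placeholder distance: RMS over matched points, no superposition.

    Real cRMS first superimposes the two fragments optimally; that step is
    deliberately omitted to keep the sketch short.
    """
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return sqrt(sq / len(a))


def local_fit(proteins, library, f, dist=rms_no_superposition):
    """Average distance from each overlapping length-f fragment to its
    closest library fragment, over all proteins in the test set."""
    best = []
    for coords in proteins:
        for i in range(len(coords) - f + 1):
            frag = coords[i:i + f]
            best.append(min(dist(frag, lib) for lib in library))
    return sum(best) / len(best)


# Toy data: a two-fragment library and one three-residue "protein".
library = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]]
protein = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
fit = local_fit([protein], library, f=2)
```

Passing a true cRMS function as `dist` would turn this into the measure described above; the global fit needs the additional greedy search over fragment chains.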

Page 12: Protein Structural Prediction

Results


Page 13: Protein Structural Prediction

Protein Side-Chain Packing

• Problem: given the backbone coordinates of a protein, predict the coordinates of the side-chain atoms

• Method: decompose a protein structure into very small blocks

Slide credits: Jimbo Xu

Page 14: Protein Structural Prediction

Protein Structure Prediction

• Stage 1: Backbone Prediction
  - Ab initio folding
  - Homology modeling
  - Protein threading

• Stage 2: Loop Modeling

• Stage 3: Side-Chain Packing

• Stage 4: Structure Refinement

The picture is adapted from http://www.cs.ucdavis.edu/~koehl/ProModel/fillgap.html

Slide credits: Jimbo Xu

Page 15: Protein Structural Prediction

Side-Chain Packing

• Each residue has many possible side-chain positions

• Each possible position is called a rotamer

• Need to avoid atomic clashes

[Figure: residues with candidate rotamers annotated with values from 0.1 to 0.7; one conflicting pair is labeled “clash”]

Slide credits: Jimbo Xu

Page 16: Protein Structural Prediction

Energy Function

Assume rotamer A(i) is assigned to residue i. The side-chain packing quality is measured by

$$E = \sum_{i} S(i, A(i)) + \sum_{i \neq j} P(i, j, A(i), A(j))$$

• S(i, A(i)): occurring preference. The higher the occurring probability, the smaller the value.

• P(i, j, A(i), A(j)): clash penalty

Minimize the energy function to obtain the best side-chain packing.

[Figure: clash penalty plotted against $d_{a,b} / (r_a + r_b)$; tick labels 0.82, 1, and 10]

$d_{a,b}$: distance between two atoms; $r_a, r_b$: atom radii

Slide credits: Jimbo Xu
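
To make the energy function concrete, the following sketch evaluates it for a given rotamer assignment and minimizes it by exhaustive search on a toy instance. Everything here is illustrative: the score tables are made-up numbers, each interacting pair is stored once (an assumed convention for the pairwise sum), and brute force is used only because the instance is tiny.

```python
from itertools import product


def energy(assign, single, pair):
    """Singleton scores S(i, A(i)) plus pairwise scores P(i, j, A(i), A(j)).

    `pair` stores each interacting pair (i, j) once, as a table indexed by
    the two rotamer choices.
    """
    e = sum(single[i][r] for i, r in assign.items())
    e += sum(table[assign[i]][assign[j]] for (i, j), table in pair.items())
    return e


def brute_force_pack(single, pair):
    """Try every rotamer combination; exponential, for toy inputs only."""
    residues = sorted(single)
    best = None
    for choice in product(*(range(len(single[i])) for i in residues)):
        assign = dict(zip(residues, choice))
        e = energy(assign, single, pair)
        if best is None or e < best[0]:
            best = (e, assign)
    return best


# Made-up scores: three residues, two rotamers each, two clashing pairs.
single = {0: [0.2, 0.5], 1: [0.1, 0.3], 2: [0.3, 0.6]}
pair = {(0, 1): [[10.0, 0.0], [0.0, 0.0]],
        (1, 2): [[0.0, 10.0], [0.0, 0.0]]}
best_energy, best_assign = brute_force_pack(single, pair)
```

In this toy case the individually preferred rotamers of residues 0 and 1 clash, so the minimum-energy assignment moves residue 1 to its second rotamer.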

Page 17: Protein Structural Prediction

Related Work

• The problem is NP-hard [Akutsu, 1997; Pierce et al., 2002], and it is NP-complete to achieve an approximation ratio of O(N) [Chazelle et al., 2004]

• Dead-End Elimination: eliminate rotamers one-by-one

• SCWRL: biconnected decomposition of a protein structure [Dunbrack et al., 2003]. One of the most popular side-chain packing programs.

• Linear integer programming [Althaus et al, 2000; Eriksson et al, 2001; Kingsford et al, 2004]

• Semidefinite programming [Chazelle et al, 2004]

Slide credits: Jimbo Xu

Page 18: Protein Structural Prediction

Algorithm Overview

• Model the potential atomic clash relationship using a residue interaction graph

• Decompose a residue interaction graph into many small subgraphs

• Perform side-chain packing on each subgraph almost independently

Slide credits: Jimbo Xu

Page 19: Protein Structural Prediction

Residue Interaction Graph

• Vertices: Each residue is a vertex

• Edges: Two residues interact if there is a potential clash between their rotamer atoms

[Figure: example residue interaction graph with lettered vertices]

Slide credits: Jimbo Xu

Page 20: Protein Structural Prediction

Key Observations

• A residue interaction graph is a geometric neighborhood graph
  - Each rotamer is bound to its backbone position by a constant distance
  - No interaction edge between two residues if their distance > D
  - D: constant depending on rotamer diameter

• A residue interaction graph is sparse!

Slide credits: Jimbo Xu
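
The distance cutoff can be illustrated with a small sketch that builds a graph from backbone positions alone. This is a simplification made for the example: it connects every pair of residues within D, whereas the actual edge criterion is a potential clash between rotamer atoms, so the result here is only a superset of the true edges. The positions, the value of D, and the function name are hypothetical.

```python
from itertools import combinations
from math import dist


def interaction_graph(positions, D):
    """Adjacency sets joining residues whose backbone positions lie within D.

    A distance filter only: residues farther apart than D cannot interact,
    but residues within D are not guaranteed to clash.
    """
    graph = {name: set() for name in positions}
    for u, v in combinations(positions, 2):
        if dist(positions[u], positions[v]) <= D:
            graph[u].add(v)
            graph[v].add(u)
    return graph


# Made-up backbone positions (arbitrary units).
positions = {"a": (0.0, 0.0, 0.0), "b": (4.0, 0.0, 0.0),
             "c": (8.0, 0.0, 0.0), "d": (30.0, 0.0, 0.0)}
graph = interaction_graph(positions, D=5.0)
```

Because each residue has only a bounded number of neighbors within D, the number of edges grows roughly linearly with the number of residues, which is the sparsity noted above.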

Page 21: Protein Structural Prediction

Tree Decomposition [Robertson & Seymour, 1986]

• Definition. A tree decomposition (T, X) of a graph G = (V, E):
  - T = (I, F) is a tree with node set I and edge set F
  - X is a set of subsets of V, the components; the union of the elements of X equals V
  - There is a 1-to-1 mapping between I and X
  - For any edge (v, w) in E, there is at least one X(i) in X such that v and w are in X(i)
  - In tree T, if node j is on the path from i to k, then X(i) ∩ X(k) ⊆ X(j)

• Tree width is defined to be the maximal component size minus 1

Slide credits: Jimbo Xu
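
The conditions in the definition can be checked mechanically. The sketch below is an illustrative checker, assuming the caller supplies components as a dictionary from tree node to vertex set and that `tree_edges` already forms a tree (this is not verified). The path condition is tested in its equivalent form: for every vertex, the tree nodes whose components contain it must form a connected subtree. All names and the toy graph are invented.

```python
def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """Check coverage of V, coverage of E, and the path condition."""
    # The union of the components must equal V.
    if set().union(*bags.values()) != set(vertices):
        return False
    # Every graph edge must lie inside at least one component.
    for v, w in edges:
        if not any(v in b and w in b for b in bags.values()):
            return False
    # Path condition: components containing a vertex are connected in T.
    adj = {i: set() for i in bags}
    for i, j in tree_edges:
        adj[i].add(j)
        adj[j].add(i)
    for v in vertices:
        holders = {i for i, b in bags.items() if v in b}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in adj[i] & holders:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen != holders:
            return False
    return True


# Toy graph: the path a-b-c-d.
V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c"), ("c", "d")]
good = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d"}}
bad = {0: {"a", "b"}, 1: {"c", "d"}, 2: {"b", "c"}}
T = [(0, 1), (1, 2)]
valid = is_tree_decomposition(V, E, good, T)
invalid = is_tree_decomposition(V, E, bad, T)
tree_width = max(len(b) for b in good.values()) - 1
```

In the `bad` arrangement, vertex b appears in the components at both ends of the tree but not in the middle one, which violates the path condition.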

Page 22: Protein Structural Prediction

Tree Decomposition [Robertson & Seymour, 1986]

Greedy: minimum degree heuristic

[Figure: the example graph on vertices a to m before and after one elimination step, which yields the component abd]

1. Choose the vertex with minimal degree

2. The chosen vertex and its neighbors form a component

3. Add an edge between every two neighbors of the chosen vertex that are not already adjacent

4. Remove the chosen vertex

5. Repeat the above steps until the graph is empty

Slide credits: Jimbo Xu
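
The five steps of the minimum degree heuristic can be sketched as follows. This is a minimal version under stated assumptions: the graph is a dictionary of adjacency sets, ties between vertices of equal degree are broken by name (an arbitrary choice), and the function returns only the list of components, without linking them into a tree. The example graph is a hypothetical 4-cycle, not the one drawn on the slide.

```python
from itertools import combinations


def min_degree_components(graph):
    """Greedy elimination: returns the components in elimination order."""
    g = {v: set(nb) for v, nb in graph.items()}  # work on a copy
    components = []
    while g:
        # Step 1: vertex of minimal degree (ties broken by name).
        v = min(g, key=lambda u: (len(g[u]), u))
        nbrs = g[v]
        # Step 2: the vertex and its neighbors form a component.
        components.append({v} | nbrs)
        # Step 3: make the neighbors pairwise adjacent.
        for a, b in combinations(nbrs, 2):
            g[a].add(b)
            g[b].add(a)
        # Step 4: remove the chosen vertex.
        for u in nbrs:
            g[u].discard(v)
        del g[v]
    return components


# Hypothetical 4-cycle a-b-c-d-a.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
components = min_degree_components(cycle)
width = max(len(c) for c in components) - 1
```

Components produced late in the elimination are often subsets of earlier ones and can be dropped. The heuristic gives an upper bound on the tree width, not necessarily the optimum.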

Page 23: Protein Structural Prediction

Tree Decomposition (Cont’d)

Tree width: size of maximal component – 1

[Figure: the example graph on vertices a to m and its tree decomposition, with components abd, acd, clk, cdem, defm, fgh, eij]

Slide credits: Jimbo Xu

Page 24: Protein Structural Prediction

Side-Chain Packing Algorithm

Bottom-to-Top: Calculate the minimal energy function

$$F(X_i, A(X_{ir})) = \min_{A(X_i - X_{ir})} \left\{ F(X_j, A(X_{ji})) + F(X_l, A(X_{li})) + \mathrm{Score}(X_i, A(X_i)) \right\}$$

• $F(X_i, \cdot)$: score of the subtree rooted at Xi

• $F(X_j, \cdot)$, $F(X_l, \cdot)$: scores of the subtrees rooted at Xj and Xl

• $\mathrm{Score}(X_i, A(X_i))$: score of component Xi

Top-to-Bottom: Extract the optimal assignment

Time complexity: Exponential in tree width, linear in graph size

[Figure: a tree decomposition rooted at Xr, with components Xr, Xp, Xi, Xj, Xl, Xq and tree edges labeled Xir, Xji, Xli]

Slide credits: Jimbo Xu
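
A compact sketch of this dynamic program is given below. It is an illustration under several assumptions rather than the SCATD implementation: components and the rooted tree are passed in by hand, the separator between a component and its parent is taken to be their intersection, each vertex and edge is scored in the topmost component that contains it (so nothing is counted twice), and the bottom-up and top-down passes are merged by returning the best assignment alongside the best score. All names and score tables are made up.

```python
from itertools import product


def total_energy(assign, single, pair):
    """Energy of a complete assignment, used here only for checking."""
    e = sum(single[v][r] for v, r in assign.items())
    return e + sum(t[assign[u]][assign[v]] for (u, v), t in pair.items())


def pack(bags, children, root, single, pair):
    """Minimize energy over a rooted tree decomposition (memoized DP)."""
    memo = {}

    def best(i, sep):
        # sep: rotamers already fixed by the parent for shared vertices
        if (i, sep) in memo:
            return memo[(i, sep)]
        fixed = dict(sep)
        free = sorted(v for v in bags[i] if v not in fixed)
        result = (float("inf"), {})
        for choice in product(*(range(len(single[v])) for v in free)):
            picked = dict(zip(free, choice))
            a = {**fixed, **picked}
            # Score of component i: vertices and edges first seen here.
            e = sum(single[v][picked[v]] for v in free)
            e += sum(t[a[u]][a[v]] for (u, v), t in pair.items()
                     if u in a and v in a and (u in picked or v in picked))
            # Add the best score of each child subtree.
            for j in children.get(i, ()):
                child_sep = tuple(sorted((v, a[v]) for v in bags[j] if v in a))
                child_e, child_pick = best(j, child_sep)
                e += child_e
                picked.update(child_pick)
            if e < result[0]:
                result = (e, picked)
        memo[(i, sep)] = result
        return result

    return best(root, ())


# Toy instance: five residues, two rotamers each, made-up scores.
single = {0: [0.2, 0.5], 1: [0.1, 0.3], 2: [0.3, 0.6],
          3: [0.4, 0.1], 4: [0.2, 0.7]}
clash = [[10.0, 0.0], [0.0, 0.0]]
pair = {(0, 1): clash, (1, 2): clash, (0, 2): clash,
        (2, 3): [[0.0, 10.0], [0.0, 0.0]], (3, 4): [[0.0, 0.0], [10.0, 0.0]]}
bags = {0: {0, 1, 2}, 1: {2, 3}, 2: {3, 4}}
children = {0: [1], 1: [2]}
e_min, a_min = pack(bags, children, root=0, single=single, pair=pair)
```

The work per component is exponential in the component size, and each component is visited once per separator assignment, which is where the "exponential in tree width, linear in graph size" behavior comes from.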

Page 25: Protein Structural Prediction

Empirical Component Size Distribution

Tested on the 180 proteins used by SCWRL 3.0.

Components with size ≤ 2 ignored.

Slide credits: Jimbo Xu

Page 26: Protein Structural Prediction

Result

CPU time (seconds)

protein   size   SCWRL   SCATD   speedup
1gai       472     266       3        88
1a8i       812     184       9        20
1b0p      2462     300      21        14
1bu7       910      56       8         7
1xwl       580      27       5         5

• Five times faster on average, tested on the 180 proteins used by SCWRL

• Same prediction accuracy as SCWRL 3.0

Theoretical time complexity: $O\left(N\, n_{rot}^{N^{2/3} \log N}\right) \ll n_{rot}^{N}$, where $n_{rot}$ is the average number of rotamers for each residue.

Slide credits: Jimbo Xu

Page 27: Protein Structural Prediction

Accuracy

[Figure: bar chart comparing the prediction accuracy of SCATD and SCWRL for ASN, ASP, CYS, HIS, ILE, SER, TYR, and VAL; accuracy axis from 0.5 to 1]

A prediction is judged correct if its deviation from the experimental value is within 40 degrees.

Slide credits: Jimbo Xu
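
The 40-degree criterion can be sketched as below. The slide does not say which angle is compared; this example assumes side-chain dihedral angles in degrees, so deviations are measured on the circle (for instance, -170 and 175 are 15 degrees apart). The function names and the sample angles are made up.

```python
def angular_deviation(pred, exp):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(pred - exp) % 360.0
    return min(d, 360.0 - d)


def accuracy(predicted, experimental, tol=40.0):
    """Fraction of predictions within tol degrees of the experimental value."""
    correct = sum(
        angular_deviation(p, e) <= tol for p, e in zip(predicted, experimental)
    )
    return correct / len(predicted)


# Made-up angles: two predictions fall within 40 degrees, two do not.
acc = accuracy([-170.0, 60.0, 65.0, 180.0], [175.0, 70.0, -60.0, 100.0])
```

Computing this fraction separately for each residue type would give per-residue accuracies of the kind compared between SCATD and SCWRL.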