Protein Structure Modeling
Alejandro Giorgetti, alejandro.giorgetti@uniroma1.it
3rd Permanent School in Bioinformatics, Madrid 2005

The number of different protein folds is limited:

[Figure: Public Database Holdings, 1986 to 2004 (log scale, 100 to 1'000'000 entries): TrEMBL, SwissProt and PDB compared with Known Folds and New Folds. Last update: Oct 2001.]

Fold recognition

Principle: Find a compatible fold

>Target Sequence XYMSTLYEKLGGTTAVDLAVAAVAGAPAHKRDVLNQ

Build a model of the target protein based on each template structure.

Rank the models according to SCORE or ENERGY.
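The build-and-rank loop above can be sketched as follows. This is a minimal, ungapped toy under stated assumptions: residues are reduced to two hypothetical classes (H, hydrophobic; P, polar), each template is represented only by a hypothetical list of residue contacts, and the pair energies are invented numbers. Real threading programs also optimize the sequence-to-structure alignment, with gaps, before scoring.

```python
# Hypothetical pair energies (arbitrary units; lower = more favourable)
# for two residue classes: H (hydrophobic) and P (polar).
PAIR_ENERGY = {
    ("H", "H"): -1.0,
    ("H", "P"): 0.0,
    ("P", "H"): 0.0,
    ("P", "P"): 0.2,
}

def threading_energy(sequence, contacts):
    """Place residue k of the sequence on template position k (no gaps)
    and sum the pair energy over every template contact (i, j)."""
    return sum(PAIR_ENERGY[(sequence[i], sequence[j])] for i, j in contacts)

def rank_templates(sequence, templates):
    """Return (template name, energy) pairs, lowest energy first."""
    scored = [(name, threading_energy(sequence, contacts))
              for name, contacts in templates.items()]
    return sorted(scored, key=lambda item: item[1])

# Hypothetical templates, each reduced to a list of residue-residue contacts.
templates = {
    "template_A": [(0, 3), (1, 4), (0, 5)],
    "template_B": [(0, 2), (2, 5), (1, 3)],
}
ranking = rank_templates("HPPHHP", templates)
print(ranking)  # [('template_A', -1.0), ('template_B', 0.2)]
```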

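Profile methods: Fold = f(environment), where the environment of each position is described by
• local secondary structure
• solvent accessibility
• degree of burial of polar / apolar residues

Threading: knowledge-based pair potentials derived from the statistics of residue pairs in folded versus unfolded states, e.g. for an Ala-Ala pair:

$$E(\text{Ala-Ala}) = -kT \ln \frac{P_{\text{folded}}(\text{Ala-Ala})}{P_{\text{unfolded}}(\text{Ala-Ala})}$$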
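The threading pair potential, E = -kT ln(P_folded / P_unfolded), can be estimated from pair counts. The sketch below assumes both probabilities are plain frequencies over hypothetical count tables and works per mole (RT instead of kT). Published potentials typically also bin pairs by distance and sequence separation and add pseudocounts for rare pairs.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K); RT is kT per mole

def pair_potentials(folded_counts, unfolded_counts, temperature=300.0):
    """Inverse-Boltzmann pair energies E = -RT ln(P_folded / P_unfolded).

    Both arguments map a residue pair, e.g. ("ALA", "ALA"), to an observed
    count; every pair must have a non-zero count in both tables."""
    total_folded = sum(folded_counts.values())
    total_unfolded = sum(unfolded_counts.values())
    energies = {}
    for pair, count in folded_counts.items():
        p_folded = count / total_folded
        p_unfolded = unfolded_counts[pair] / total_unfolded
        energies[pair] = -R_KCAL * temperature * math.log(p_folded / p_unfolded)
    return energies

# Hypothetical counts: Ala-Leu contacts are enriched in folded structures,
# Ala-Ala contacts are depleted.
folded = {("ALA", "ALA"): 30, ("ALA", "LEU"): 70}
unfolded = {("ALA", "ALA"): 50, ("ALA", "LEU"): 50}
energies = pair_potentials(folded, unfolded)
print(energies)
```

Pairs seen more often in folded structures than in the reference state get negative (favourable) energies; depleted pairs get positive ones.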

Fragment based methods: New fold prediction

Some remarks concerning fold recognition:

Capable of detecting quite remote sequence-structure matches.

Sensitivity depends on the size of the protein and its secondary structure content.

The two most versatile enzymatic functions (hydrolases and O-glycosyl glucosidases) are associated with seven folds each.

Better for detecting: all-α > αβ > all-β

Homology modeling

Comparative protein modeling

Idea: proteins that evolved from a common ancestor have maintained similar core 3D structures.

One or more known structures are used as templates to model the unknown structure of a protein of known sequence.

Target and template(s) should be related by evolution.

First applied in the late 1970s by Tom Blundell.

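Evolution of protein structure families

[Figure: evolution of protein structure families [ Chothia & Lesk (1986) ], with a sequence identity scale from 10 % to 90 % and the uses of a model along it: Drug design? Biochemistry? Molecular Biology? X-ray crystallography: molecular replacement (MR).]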
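Chothia & Lesk (1986) related sequence divergence to the structural divergence of the common core; their fit is commonly quoted as Δ = 0.40·exp(1.87·H) Å, with H the fraction of mutated residues. A sketch, assuming that form, for estimating the expected core RMSD between two homologues from their sequence identity:

```python
import math

def expected_core_rmsd(sequence_identity):
    """Approximate core backbone RMSD (Angstrom) between two homologues,
    assuming the fit rmsd = 0.40 * exp(1.87 * H), H = fraction mutated."""
    if not 0.0 <= sequence_identity <= 1.0:
        raise ValueError("sequence_identity must be a fraction between 0 and 1")
    mutated_fraction = 1.0 - sequence_identity
    return 0.40 * math.exp(1.87 * mutated_fraction)

for identity in (0.9, 0.7, 0.5, 0.3):
    print(f"{identity:.0%} identity -> about {expected_core_rmsd(identity):.2f} A")
```

The exponential shape is the point: the expected error grows slowly at high identity and quickly below roughly 30 to 40 %, which is why the usefulness of a model depends so strongly on template identity.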

Comparative Modeling

Known Structures (templates) + Target sequence → Template(s) selection → Sequence Alignment → Structure Modeling → Structure Evaluation → Final Structural Models

Target sequence:
>hTEII
MSSPQAPEDGQGCGDRGDPPGDLRSVLVTTVLNLEPLDEDLFRGRHYWVPAKRLFGGQIVGQALVAAAKSVSEDVHVHSLHCYFVRAGDPKLP

Comparative Modeling: Known Structures (templates)

Database of templates, from the Protein Data Bank (PDB), http://www.pdb.org:
• Separate into single chains
• Remove bad structures (models)
• Create BLAST database
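Building the template database (single chains, then a BLAST database) can be sketched in plain Python. Assumptions: the input is a PDB-format file read as text lines; only ATOM records are used; unknown residues become X; removing bad structures (models) is not handled. The resulting FASTA could then be indexed with NCBI BLAST+, e.g. `makeblastdb -in templates.fasta -dbtype prot`, where templates.fasta and the code "1abc" below are hypothetical.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def chain_sequences(pdb_lines):
    """Return {chain_id: one-letter sequence} built from ATOM records.

    Uses the fixed-column PDB layout (0-based slices): residue name [17:20],
    chain ID [21], residue number plus insertion code [22:27]."""
    chains = {}
    seen = set()
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skips HETATM (waters, ligands) and header records
        chain_id = line[21]
        residue_key = (chain_id, line[22:27])
        if residue_key in seen:
            continue  # one entry per residue, not per atom
        seen.add(residue_key)
        chains.setdefault(chain_id, []).append(THREE_TO_ONE.get(line[17:20], "X"))
    return {cid: "".join(seq) for cid, seq in chains.items()}

def to_fasta(pdb_code, sequences):
    """One FASTA record per chain, named <pdb_code>_<chain>."""
    return "\n".join(f">{pdb_code}_{cid}\n{seq}" for cid, seq in sequences.items())

demo_lines = [
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "ATOM      2  CA  MET A   1      11.639   6.071  -5.147  1.00  0.00           C",
    "ATOM      3  N   LYS A   2      12.000   5.000  -4.000  1.00  0.00           N",
    "ATOM      4  N   GLY B   1       1.000   2.000   3.000  1.00  0.00           N",
    "HETATM    5  O   HOH A 101       0.000   0.000   0.000  1.00  0.00           O",
]
print(to_fasta("1abc", chain_sequences(demo_lines)))
```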

Comparative Modeling: Template(s) selection

Criteria:
• Sequence similarity / fold recognition
• Structure quality (resolution, experimental method)
• Experimental conditions (ligands and cofactors)
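Combining the selection criteria above into a single ranking is a judgment call. The sketch assumes a hypothetical TemplateHit record and one simple policy: highest sequence identity first, ties broken by better (lower) resolution, optionally keeping only templates that contain a required cofactor. Chain IDs and values are invented.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class TemplateHit:
    """Hypothetical summary of one template candidate."""
    chain_id: str                 # PDB code + chain
    identity: float               # fraction of identical residues in the alignment
    resolution: Optional[float]   # Angstrom; None when not applicable (e.g. NMR)
    has_cofactor: bool            # cofactor/ligand present in the deposited structure

def rank_candidates(hits, require_cofactor=False):
    """Sort candidates: higher identity first, then lower resolution value;
    structures without a resolution go last within an identity tie."""
    candidates = [h for h in hits if h.has_cofactor or not require_cofactor]
    return sorted(
        candidates,
        key=lambda h: (-h.identity,
                       h.resolution if h.resolution is not None else math.inf),
    )

hits = [
    TemplateHit("1aaa_A", 0.45, 2.5, False),
    TemplateHit("1bbb_A", 0.45, 1.8, True),
    TemplateHit("1ccc_B", 0.30, 1.2, True),
    TemplateHit("2nmr_A", 0.45, None, False),
]
best_first = [h.chain_id for h in rank_candidates(hits)]
print(best_first)  # ['1bbb_A', '1aaa_A', '2nmr_A', '1ccc_B']
```

Sorting on raw identity means a 0.46 hit always beats a 0.45 hit with much better resolution; comparing identities within a tolerance before looking at structure quality is a common alternative.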

Comparative Modeling: Sequence Alignment

• Key step in homology modeling
• Global alignment is required
• A small error in the alignment can lead to a big error in the model
• Multiple alignments are better than pairwise alignments
• Do we know something else? Experiments?
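Because the model inherits every alignment error, it helps to see what a global alignment computes. A minimal Needleman-Wunsch sketch follows, assuming a toy scoring scheme (match +1, mismatch -1, linear gap -2); real modeling pipelines use substitution matrices such as BLOSUM62, affine gap penalties and, preferably, multiple alignments.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(i, j),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

print(global_align("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```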

Comparative Modeling: Structure Modeling

• Template-based fragment assembly (SwissModel)
• Satisfaction of spatial restraints (MODELLER)

Comparative Modeling: Structure Evaluation

• Errors in template selection or alignment result in bad models
• Iterative cycles of alignment, modeling and evaluation

I. Template based fragment assembly (SwissModel)

[ http://www.expasy.org/spdbv/ ]

Day activities. Morning

• SwissPdb Viewer download:
  a) Read and accept the licence.
  b) Download SwissPdb Viewer v3.7sp5 (Linux).
• Installation:
  a) gunzip spdbv37sp5-Linux.tar.gz
  b) tar -xvf spdbv37sp5-Linux.tar
  c) cd SPDBV_Distribution and run: ./install.sh
  d) Local installation: ..../guestxx/
  e) Run: /guestxx/SPDBV/bin/spdbv
• SwissModel submission:
  a) Search for a template: http://www.expasy.org/swissmod/SWISS-MODEL.html, Interactive tools: Search the template... (paste the sequence).
  b) Model request submission: save the project (SwissPdb Viewer).
  c) Swiss-Model web page: Modelling Requests, Optimise mode.
  d) Fill in the form and upload your project, asking for Short mode output.

I. Template based fragment assembly

[ http://www.expasy.org/spdbv/ ]

a) Build conserved core framework (Structurally conserved regions, SCRs)

Corresponds to the most stable regions: highest sequence conservation and fewest gaps. In general, these are secondary structure elements.
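One simple way to flag candidate SCRs in a target-template alignment is to look for gap-free stretches with reasonable identity. This sketch uses assumed thresholds (min_length, min_identity) and works on alignment columns only; it is not SwissModel's actual core-building procedure, which also uses the template structures themselves.

```python
def conserved_regions(target_aln, template_aln, min_length=5, min_identity=0.3):
    """Return (start, end) alignment-column ranges (end exclusive) of gap-free
    stretches at least min_length long whose identity is >= min_identity."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    regions = []
    start = None
    # A trailing gap column acts as a sentinel that closes the last stretch.
    for col, (t, s) in enumerate(zip(target_aln + "-", template_aln + "-")):
        gapped = t == "-" or s == "-"
        if not gapped and start is None:
            start = col  # a gap-free stretch begins
        elif gapped and start is not None:
            length = col - start
            identical = sum(target_aln[k] == template_aln[k] for k in range(start, col))
            if length >= min_length and identical / length >= min_identity:
                regions.append((start, col))
            start = None
    return regions

print(conserved_regions("ACDEFG--HIKLM", "ACDEYGWWHIKLV"))  # [(0, 6), (8, 13)]
```

The columns between the flagged regions (here the gapped insertion) are the candidates for loop modeling in the next step.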

I. Template based fragment assembly

[ http://www.expasy.org/spdbv/ ]

b) Loop modeling (Structurally variable regions, SVRs) and backbone completion

• Least stable or most flexible regions
• Highest level of gapping
• Lowest sequence conservation
• Loops and turns

Approaches: loop database search, or "ab initio" rebuilding of loops (Monte Carlo, molecular dynamics, genetic algorithms, etc.)
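A loop-database search can be reduced to its simplest geometric filter: keep fragments of the right length whose end-to-end Cα distance matches the gap between the two stem residues. The library, its coordinates and the scoring are hypothetical; real searches also superpose the stem residues and reject fragments that clash with the rest of the model.

```python
import math

def rank_loop_fragments(n_stem, c_stem, n_residues, library, top=3):
    """Return names of library fragments with n_residues Calpha atoms whose
    end-to-end distance best matches the distance between the two stems.

    Assumes the first and last Calpha of each fragment play the role of the
    N- and C-terminal stem residues."""
    gap_span = math.dist(n_stem, c_stem)
    scored = []
    for name, ca_coords in library.items():
        if len(ca_coords) != n_residues:
            continue  # wrong loop length
        span = math.dist(ca_coords[0], ca_coords[-1])
        scored.append((abs(span - gap_span), name))
    scored.sort()
    return [name for _, name in scored[:top]]

library = {  # hypothetical Calpha traces (Angstrom), not real loop geometry
    "frag1": [(0, 0, 0), (1.5, 2.5, 0), (3.0, 3.5, 0), (4.5, 2.5, 0), (6.2, 0, 0)],
    "frag2": [(0, 0, 0), (1.0, 3.0, 0), (2.0, 4.0, 0), (3.0, 3.0, 0), (4.0, 0, 0)],
    "frag3": [(0, 0, 0), (2.0, 3.0, 0), (4.0, 3.0, 0), (6.0, 0, 0)],
}
print(rank_loop_fragments((10, 0, 0), (16, 0, 0), 5, library))  # ['frag1', 'frag2']
```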

I. Template based fragment assembly

[ http://www.expasy.org/spdbv/ ]

c) Side chain placement

Find the most probable side-chain conformation, using:
• homologous structures
• backbone-dependent rotamer libraries
• energetic and packing criteria
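Rotamer-based side-chain placement can be sketched as choosing, for one residue, the library rotamer that best balances its prior probability against steric clashes. Everything here is an assumption for illustration: the rotamer labels, probabilities and coordinates, the 3.0 Å clash cutoff and the clash weight. Practical methods typically optimize many residues jointly.

```python
import math

def choose_rotamer(rotamers, environment, clash_distance=3.0, clash_weight=10.0):
    """Return the name of the rotamer with the lowest score, where
    score = -ln(library probability) + clash_weight * (number of clashes)
    and a clash is any side-chain/environment atom pair closer than clash_distance."""
    def score(rotamer):
        _name, probability, atoms = rotamer
        clashes = sum(1 for a in atoms for e in environment
                      if math.dist(a, e) < clash_distance)
        return -math.log(probability) + clash_weight * clashes
    return min(rotamers, key=score)[0]

# Hypothetical rotamers: (label, probability, side-chain atom coordinates)
rotamers = [("g+", 0.6, [(1.0, 0.0, 0.0)]), ("t", 0.3, [(-1.0, 0.0, 0.0)])]
print(choose_rotamer(rotamers, environment=[]))                 # g+ (most probable)
print(choose_rotamer(rotamers, environment=[(2.0, 0.0, 0.0)]))  # t (g+ clashes)
```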

I. Template based fragment assembly

d) Energy minimization

• Modeling will produce unfavorable contacts and bonds.
• Minimization idealizes local bond and angle geometry.
• Extensive energy minimization will move coordinates away from their template-derived positions: keep it to a minimum.
• SwissModel uses the GROMOS96 force field.

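Potential energy function:

$$V = \sum_{\text{bonds}} k_b\,(b - b_0)^2 + \sum_{\text{angles}} k_\theta\,(\theta - \theta_0)^2 + \sum_{\text{dihedrals}} V_n\left[1 + \cos(n\phi)\right] + \sum_{i<j}\left[\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}\right]$$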
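The idea of a short minimization can be illustrated with a deliberately small steepest-descent routine on a toy potential that keeps only two kinds of terms from the force field (harmonic bonds and a 12-6 non-bonded term); angles, dihedrals and electrostatics are omitted. The parameters are arbitrary, not GROMOS96 values, and the gradient is taken by finite differences for brevity.

```python
import math

def energy(coords, bonds, k_b=100.0, b0=1.5, lj_a=50.0, lj_b=5.0):
    """Toy potential: harmonic bonds k_b*(b - b0)^2 plus a 12-6 term
    lj_a/r^12 - lj_b/r^6 for every non-bonded pair (arbitrary units)."""
    bonded = {tuple(sorted(pair)) for pair in bonds}
    e = 0.0
    for i, j in bonded:
        e += k_b * (math.dist(coords[i], coords[j]) - b0) ** 2
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if (i, j) not in bonded:
                r = math.dist(coords[i], coords[j])
                e += lj_a / r**12 - lj_b / r**6
    return e

def numerical_gradient(f, coords, h=1e-6):
    """Central finite-difference gradient of f with respect to every coordinate."""
    grad = []
    for i in range(len(coords)):
        row = []
        for d in range(3):
            plus = [list(p) for p in coords]
            minus = [list(p) for p in coords]
            plus[i][d] += h
            minus[i][d] -= h
            row.append((f(plus) - f(minus)) / (2 * h))
        grad.append(row)
    return grad

def steepest_descent(f, coords, steps=200, step_size=1e-3):
    """Move every atom a fixed small step against the gradient, `steps` times."""
    coords = [list(p) for p in coords]
    for _ in range(steps):
        grad = numerical_gradient(f, coords)
        coords = [[x - step_size * g for x, g in zip(p, gp)]
                  for p, gp in zip(coords, grad)]
    return coords

bonds = [(0, 1), (1, 2)]
start = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.5, 0.0]]  # one bond stretched, one compressed
minimized = steepest_descent(lambda c: energy(c, bonds), start)
print(energy(start, bonds), energy(minimized, bonds))
```

With a fixed step, steepest descent removes the large bond strain quickly and then barely moves the atoms, which matches the advice to keep minimization short.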

II. Modeling by Satisfaction of Spatial Restraints

Find the most probable structure given its alignment: spatial restraints derived from the alignment are expressed as probability density functions, and the model is built by minimizing the violations of these restraints.

Comparative protein modeling by satisfaction of spatial restraints. A. Šali and T.L. Blundell. J. Mol. Biol. 234, 779-815 (1993).
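The principle can be illustrated with Gaussian distance restraints: each restraint contributes -ln of a normal probability density, so minimizing the sum of (d - mean)²/(2 sd²) maximizes the joint probability. This sketch assumes a hypothetical four-atom template, derives one restraint per atom pair from it, and optimizes by plain gradient descent; MODELLER itself uses many more restraint types and a more elaborate optimization.

```python
import math

def objective(coords, restraints):
    """Sum over restraints of -ln(Gaussian pdf), dropping the constant term:
    (d - mean)^2 / (2 * sd^2)."""
    return sum((math.dist(coords[i], coords[j]) - mean) ** 2 / (2 * sd ** 2)
               for i, j, mean, sd in restraints)

def satisfy_restraints(coords, restraints, steps=5000, step_size=0.02):
    """Plain gradient descent on the restraint objective."""
    coords = [list(p) for p in coords]
    for _ in range(steps):
        grad = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, mean, sd in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue
            # d(objective)/d(x_i) = (d - mean) / sd^2 * (x_i - x_j) / d
            coef = (d - mean) / (sd ** 2 * d)
            for k in range(3):
                diff = coords[i][k] - coords[j][k]
                grad[i][k] += coef * diff
                grad[j][k] -= coef * diff
        coords = [[x - step_size * g for x, g in zip(p, gp)]
                  for p, gp in zip(coords, grad)]
    return coords

# Hypothetical template: restraint means are its inter-atomic distances.
template = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.7, 0.0), (1.0, 0.6, 1.6)]
restraints = [(i, j, math.dist(template[i], template[j]), 0.5)
              for i in range(4) for j in range(i + 1, 4)]
start = [(0.2, -0.1, 0.1), (2.1, 0.3, -0.2), (0.8, 1.9, 0.2), (1.2, 0.4, 1.3)]
model = satisfy_restraints(start, restraints)
print(objective(start, restraints), objective(model, restraints))
```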

Model Evaluation topics:

• correct fold
• model coverage (%)
• Cα deviation (rmsd)
• alignment accuracy (%)
• side chain placement

Structure Analysis and Verification Server: http://nihserver.mbi.ucla.edu/SAVS/
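Three of the evaluation topics above reduce to simple arithmetic once model and reference are paired residue by residue. The sketch assumes the Cα coordinates are already superimposed (for example with the Kabsch algorithm) and uses one common definition of alignment accuracy: the percentage of reference residue pairs reproduced by the predicted alignment. The example numbers are hypothetical.

```python
import math

def ca_rmsd(model_ca, reference_ca):
    """RMSD (same units as the input) over paired Calpha coordinates.
    Assumes the two sets are already superimposed."""
    if len(model_ca) != len(reference_ca) or not model_ca:
        raise ValueError("need two non-empty, equally long coordinate lists")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(model_ca, reference_ca))
    return math.sqrt(squared / len(model_ca))

def coverage_percent(n_modeled, n_target):
    """Share of target residues present in the model."""
    return 100.0 * n_modeled / n_target

def alignment_accuracy_percent(predicted_pairs, reference_pairs):
    """Share of reference (target_index, template_index) pairs that the
    predicted alignment reproduces."""
    reference = set(reference_pairs)
    return 100.0 * len(reference & set(predicted_pairs)) / len(reference)

print(ca_rmsd([[0, 0, 0], [1, 0, 0]], [[1, 0, 0], [2, 0, 0]]))  # 1.0
```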

Model Accuracy Evaluation

EVA: Evaluation of Automatic protein structure prediction [ Burkhard Rost, Andrej Sali, http://maple.bioc.columbia.edu/eva ]

CASP: Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction, http://PredictionCenter.llnl.gov/casp6

3D-Crunch: Very Large Scale Protein Modelling Project, http://www.expasy.org/swissmod/SM_LikelyPrecision.html

Protein Structure Resources

PDB http://www.pdb.org PDB – Protein Data Bank of experimentally solved structures (RCSB)

CATH http://www.biochem.ucl.ac.uk/bsm/cath Hierarchical classification of protein domain structures

SCOP http://scop.mrc-lmb.cam.ac.uk/scop Alexey Murzin’s Structural Classification of proteins

DALI http://www2.ebi.ac.uk/dali Liisa Holm and Chris Sander’s protein structure comparison server

SS-Prediction and Fold Recognition

PHD http://cubic.bioc.columbia.edu/predictprotein Burkhard Rost’s Secondary Structure and Solvent Accessibility Prediction Server

PSIPRED http://bioinf.cs.ucl.ac.uk/psipred/ L.J. McGuffin, K. Bryson & David T. Jones' secondary structure prediction server

3DPSSM http://www.sbg.bio.ic.ac.uk/~3dpss Fold Recognition Server using 1D and 3D Sequence Profiles coupled.

THREADER: http://bioinf.cs.ucl.ac.uk/threader/threader.html David T. Jones threading program

Protein Structure Classification

CATH - Protein Structure Classification [ http://www.biochem.ucl.ac.uk/bsm/cath_new/ ]

• UCL, Janet Thornton & Christine Orengo
• Class (C), Architecture (A), Topology (T), Homologous superfamily (H)

SCOP - Structural Classification of Proteins [ http://scop.mrc-lmb.cam.ac.uk/scop/ ]

• MRC Cambridge (UK), Alexey Murzin, Brenner S. E., Hubbard T., Chothia C.
• created by manual inspection
• comprehensive description of the structural and evolutionary relationships

• Class (C): derived from secondary structure content; assigned automatically.

• Architecture (A): describes the gross orientation of secondary structures, independent of connectivity.

• Topology (T): clusters structures according to their topological connections and numbers of secondary structures.

• Homologous superfamily (H)
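Because each CATH level is numbered, a domain's classification is usually written as a dotted C.A.T.H code. A small sketch for parsing such codes and checking how many levels two domains share; the function names are hypothetical and the example codes are illustrative only.

```python
from typing import NamedTuple

class CathCode(NamedTuple):
    class_: int                   # C
    architecture: int             # A
    topology: int                 # T
    homologous_superfamily: int   # H

def parse_cath(code):
    """Parse a dotted C.A.T.H code such as "3.40.50.300"."""
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a C.A.T.H code, got {code!r}")
    return CathCode(*(int(p) for p in parts))

def same_level(code_a, code_b, depth):
    """True if two domains share the first `depth` levels (1 = Class ... 4 = H)."""
    return parse_cath(code_a)[:depth] == parse_cath(code_b)[:depth]

print(same_level("3.40.50.300", "3.40.50.720", 3))  # True: same topology
print(same_level("3.40.50.300", "3.40.50.720", 4))  # False: different superfamily
```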

Protein Homology Modeling Resources

SWISS MODEL: http://www.expasy.org/swissmod/SWISS-MODEL.html

Deep View - SPDBV: homepage http://www.expasy.ch/spdbv; tutorials http://www.expasy.org/spdbv/text/tutorial.htm

WhatIf http://www.cmbi.kun.nl:1100/ Gert Vriend’s protein structure modeling and analysis program

Modeller: http://guitar.rockefeller.edu/modeller Andrej Sali's homology protein structure modelling by satisfaction of spatial restraints

ROBETTA: http://robetta.bakerlab.org/ Full-chain Protein Structure Prediction Server

Programs and www servers very useful in Comparative modeling: http://salilab.org/tools/

Day activities

SwissPdb Viewer: downloading and installation.
Target accession number (Swiss-Prot): O14734

1) Template search. Method: Blast, FASTA, PsiBlast, QuickBlast (from Swiss-Prot).

2) Sequence alignment and secondary structure prediction.
   • Pairwise or multiple sequence alignment (free to choose the method).
   • Secondary structure prediction (PredictProtein meta server or PHD advanced submission form).
   • Compare the results.

3) SwissPdb Viewer:
   • Load the raw sequence to model (SwissModel menu).
   • Find an appropriate template (SwissModel menu).
   • Load the template (1C8U.pdb) (File menu).
   • "Update threading display now" (SwissModel menu). This function threads your sequence onto the template.
   • Manually optimize the alignment (Wind menu: Sequences Alignments): align the catalytic triad and the secondary structure elements. Edit the alignment with Back-Space or Space to delete or add gaps. It is useful to color the template by secondary structure.
     Target: Asp231, Ser253 and Gln303.
     Template: Asp204, Thr228 and Gln278.
   • Make sure the current layer is PTE1, click on the little black arrow in the Align window, and "smooth" with smoothing factor = 1.
   • Use the "Auto Color by Threading Energy" item of the SwissModel menu.
   • Threading evaluation: use the "select aa making clashes" item of the "Select" menu.
   • Submit a modeling request to Swiss-Model ("Submit modeling request" in the SwissModel menu). Otherwise, save the project and submit it from the SwissModel web page in Optimize mode.

4) Model validation. Structural controls: loop modeling? For such regions, use "Scan Loop Data Base" (Build menu). Need refinement? If yes, run a few steps of energy minimization: select all residues; in the SwissPdb Viewer Prefs menu set Energy minimization to 200 steps of steepest descent (include electrostatic interactions?); then run Tools menu: Energy minimization.
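Step 3 asks that the catalytic triad of the target (Asp231, Ser253, Gln303) sit opposite Asp204, Thr228 and Gln278 of the template. A minimal sketch for checking this on an alignment exported as two gapped strings; the function names, the toy alignment and the start numbers are hypothetical.

```python
def residue_map(target_aln, template_aln, target_start=1, template_start=1):
    """Map target residue numbers to template residue numbers through a
    pairwise alignment; columns with a gap on either side are not mapped."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    mapping = {}
    t_num, s_num = target_start, template_start
    for t, s in zip(target_aln, template_aln):
        if t != "-" and s != "-":
            mapping[t_num] = s_num
        if t != "-":
            t_num += 1
        if s != "-":
            s_num += 1
    return mapping

def pairs_aligned(mapping, target_residues, template_residues):
    """True if every target residue sits opposite its expected template residue."""
    return all(mapping.get(t) == s for t, s in zip(target_residues, template_residues))

# Toy alignment with hypothetical sequences and numbering:
mapping = residue_map("AD-SQ", "ADTTQ", target_start=10, template_start=20)
print(mapping)                                     # {10: 20, 11: 21, 12: 23, 13: 24}
print(pairs_aligned(mapping, (11, 13), (21, 24)))  # True
```

With a real alignment the check would be `pairs_aligned(residue_map(target_aln, template_aln, t0, s0), (231, 253, 303), (204, 228, 278))`, where target_aln, template_aln and the first residue numbers t0 and s0 are placeholders for your own data.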