
Protein Analysis - Part 2

Bioinformatic tools for identification and characterization of proteins

Part 1
- Similarity searches
- Motif (pattern and profile) searches
- Protein domain databases
- Primary structure analysis

Part 2
- Secondary structure prediction
- Tertiary structure and modelling
- Proteomics

Secondary structure - alpha-Helix

Properties of the α-helix

The structure repeats itself every 5.4 Å along the helix axis, i.e. the α-helix has a pitch of 5.4 Å. α-Helices have 3.6 amino acid residues per turn, so a helix 36 amino acids long forms 10 turns.
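The two numbers above also fix the rise per residue (5.4 Å / 3.6 = 1.5 Å) and the rotation per residue (360° / 3.6 = 100°). A minimal sketch of that arithmetic follows; the function names are illustrative and not part of the slides.

```python
PITCH_ANGSTROM = 5.4      # helix pitch, taken from the text
RESIDUES_PER_TURN = 3.6   # residues per turn, taken from the text


def helix_turns(n_residues: int) -> float:
    """Number of turns formed by an ideal alpha-helix of n_residues."""
    return n_residues / RESIDUES_PER_TURN


def helix_length(n_residues: int) -> float:
    """Length along the helix axis in angstroms (rise per residue x residues)."""
    rise_per_residue = PITCH_ANGSTROM / RESIDUES_PER_TURN  # 1.5 angstroms
    return n_residues * rise_per_residue


def rotation_per_residue() -> float:
    """Rotation about the helix axis per residue, in degrees."""
    return 360.0 / RESIDUES_PER_TURN
```

For the 36-residue helix of the text, `helix_turns(36)` gives 10 turns and `helix_length(36)` gives about 54 Å.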

Secondary structure - β-sheet

The β-sheet structure

In a β-sheet two or more segments of polypeptide chain (strands) run alongside each other and are linked in a regular manner by hydrogen bonds between the main-chain C=O and N-H groups. Therefore all hydrogen bonds in a β-sheet are between different segments of polypeptide. This contrasts with the α-helix, where all hydrogen bonds involve the same element of secondary structure.

Secondary structure - reverse turns

A reverse turn is a region of the polypeptide having a hydrogen bond from one main-chain carbonyl oxygen to the main-chain N-H group three residues along the chain (i.e. O(i) to N(i+3)). Helical regions are excluded from this definition, and turns between β-strands form a special class of turn known as the β-hairpin.

How can secondary structures be predicted?

- Statistics or stereochemical principles of amino acids
- Multiple alignments: the conserved structures are (mostly) buried in the core, variable regions lie outside
- Solvent accessibility patterns correspond to specific secondary structures
- Replace expert knowledge by neural networks

How are the secondary structures detected in a PDB file?

The three main-chain torsion angles of a polypeptide are phi (φ), psi (ψ) and omega (ω).

ω is fixed because the peptide bond is planar.
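A torsion angle is the dihedral angle defined by four consecutive atoms: φ by C(i-1), N(i), CA(i), C(i); ψ by N(i), CA(i), C(i), N(i+1); ω by CA(i), C(i), N(i+1), CA(i+1). The following is a minimal sketch of the dihedral calculation from Cartesian coordinates, assuming coordinates are plain (x, y, z) tuples as they would be read from ATOM records; the helper names are illustrative.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (cis = 0, trans = +/-180)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of b0 and b2 perpendicular to the central bond b1.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Applied to CA(i), C(i), N(i+1), CA(i+1), a planar trans peptide bond gives ω close to 180°.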

[Figure: regions labelled alpha and beta]


How are the secondary structures detected in PDB

Hydrogen bonds (3₁₀ helix: i, i+3; α-helix: i, i+4; etc.)

• the proton-acceptor distance is less than 2.4 Å and the angle between the proton-donor bond and the line connecting the donor and acceptor atoms is less than 35 degrees (e.g., see Berndt et al., 1993)
• the proton-acceptor distance is less than 2.5 Å and the angle is between +/- 90 and 180 degrees (Baker & Hubbard, 1984)
• the energy defined by an electrostatic potential function is less than a cutoff value (E < -0.5 kcal/mol; see Kabsch & Sander, 1983), allowing a distance of up to 5.2 Å and a misalignment of up to 63 degrees at the ideal length (2.9 Å)
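The energy criterion of Kabsch & Sander can be sketched as follows. Their electrostatic model places partial charges on the C=O and N-H groups and evaluates E = q1 q2 f (1/r_ON + 1/r_CH - 1/r_OH - 1/r_CN), with q1 = 0.42 e, q2 = 0.20 e, f = 332 and distances in Å, giving E in kcal/mol. The function names and the tuple-based coordinate format are assumptions made for this illustration.

```python
import math

Q1 = 0.42       # partial charge on C=O, in units of e
Q2 = 0.20       # partial charge on N-H, in units of e
F = 332.0       # dimensional factor; distances in angstroms, energy in kcal/mol
CUTOFF = -0.5   # kcal/mol, hydrogen bond if E is below this value


def hbond_energy(n, h, c, o):
    """Electrostatic H-bond energy between an N-H group and a C=O group."""
    return Q1 * Q2 * F * (1.0 / math.dist(o, n) + 1.0 / math.dist(c, h)
                          - 1.0 / math.dist(o, h) - 1.0 / math.dist(c, n))


def is_hbond(n, h, c, o):
    """True if the energy criterion E < -0.5 kcal/mol is met."""
    return hbond_energy(n, h, c, o) < CUTOFF
```

For a perfectly linear N-H···O=C arrangement with an N···O distance of 2.9 Å the energy is roughly -3 kcal/mol, well below the cutoff. `math.dist` requires Python 3.8 or later.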

Secondary structure prediction

Best programs in the CASP2 and CASP3 contest:

• DSC (King and Sternberg, ICRF, London; implemented in HUSAR)
• PHD (B. Rost, Columbia.edu)
• PSIPRED (D. Jones, Warwick, UK)

also available:

• Predator (Argos, EMBL)
• Foldclass (HUSAR)
• GORIV (IBCP.FR)
• HNN (IBCP.FR)
• NNSSP (Solovyev, Sanger Center)

Algorithm PHD

• multiple alignments of homologous proteins, found by database searching
• two neural networks trained by backpropagation, each with a single hidden layer
• multi-level system (also using solvent plots, transmembrane domain predictions, …)

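Neural networks: learning with a neural network

[Figure: a neural network is trained on a training set and checked on a test set; for an input sequence such as GLCRVLLKP it outputs one of three states: helix, sheet or coil.]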

Neural networks 2: learning with a neural network

Sequence information (alignment columns):

...
AAA
AA.
LLL
AAG
CCS
...

Profile (20+2 inputs per position x window):

  A    C    L    G    S   in  del
100    0    0    0    0    0    0
100    0    0    0    0    0   33
  0    0  100    0    0    0    0
...

Output states: H, E, C

First neural network: sequence to structure

Second neural network: structure to structure
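The profile rows shown for the alignment columns AAA, AA. and LLL can be reproduced with a short sketch. It assumes that residue percentages are taken over the non-gap residues of a column and that "del" is the percentage of gaps in the column; this matches the three rows given, but the exact PHD convention is not stated in the slides. The "in" (insertion) input is left out, and the five-letter alphabet is the reduced one of the figure rather than the full 20 amino acids.

```python
ALPHABET = "ACLGS"  # reduced alphabet from the figure; a real profile has 20 columns


def profile_row(column: str, gap: str = ".") -> dict:
    """Percent composition of one alignment column plus the percentage of gaps."""
    residues = [c for c in column if c != gap]
    row = {}
    for aa in ALPHABET:
        # Residue percentages are computed over non-gap positions only (assumption).
        row[aa] = round(100 * residues.count(aa) / len(residues)) if residues else 0
    row["del"] = round(100 * column.count(gap) / len(column))
    return row


if __name__ == "__main__":
    for col in ["AAA", "AA.", "LLL", "AAG", "CCS"]:
        print(col, profile_row(col))
```

Each row, extended to 20 amino acids plus the two extra inputs, would be one position of the input window fed to the first network.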

Secondary structure prediction PHD

Expected average accuracy > 72% for the three states helix, strand and loop (Rost & Sander, PNAS, 1993, 90, 7558-7562; Rost & Sander, JMB, 1993, 232, 584-599; Rost & Sander, Proteins, 1994, 19, 55-72; evaluation of accuracy)


Algorithm DSC

Sequences in homologous alignments are described by:
• residue conformational propensities
• sequence edge effects
• moments of hydrophobicity
• position of insertions and deletions in aligned homologous sequences
• moments of conservation
• auto-correlation
• residue ratios
• secondary-structure feedback effects
• filtering

Learning method: standard linear discrimination

Dataset: 496 proteins

Secondary structure prediction DSC

DSC (Ross D. King & Michael J.E. Sternberg, Protein Science 5:2298-2310, 1996)

Input should be an alignment (a single sequence is also possible).

If the input is a set of multiply aligned homologous sequences, DSC has an overall per-residue three-state accuracy of ~70%.
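The per-residue three-state accuracy quoted for these programs is commonly called Q3: the percentage of residues whose predicted state (helix, strand or coil) equals the observed state. A minimal sketch, assuming both are given as equal-length strings over H, E and C; the example strings in the usage note are made up for illustration.

```python
def q3(predicted: str, observed: str) -> float:
    """Per-residue three-state accuracy in percent."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(predicted)
```

For the hypothetical strings `q3("HHHECC", "HHHCCC")`, five of six residues agree, giving about 83.3%.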

Algorithm PSIPRED

PSI-BLAST is used to obtain a profile, i.e. information about conserved regions and insertions/deletions

Two neural networks with one single hidden layer each, as in PHD

Secondary structure prediction PSIPRED

Jones, D. T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292:195-202.

Accuracy ca. 74%

Secondary structure prediction - Evaluation

Ross D. King et al. (2000) Protein Engineering, 13, 15-19.

Is it better to combine predictions? Yes!

Combined NNSSP, PHD, DSC and Predator.

Secondary structure prediction - Evaluation

Results on CASP3 test proteins

Method            Accuracy (%)
NNSSP             70.5
PHD               70.3
DSC               70.2
PREDATOR          68.3
Combined methods:
Simple vote       73.3
Linear Discr.     72.7
Neural Network    73.3
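The "simple vote" combination can be sketched as a per-residue majority vote over the individual predictions. How King et al. resolved ties is not given in the slides; marking tied positions with "?" is an assumption made here, and the input strings in the usage note are made up.

```python
from collections import Counter


def simple_vote(predictions, tie="?"):
    """Per-position majority vote over equal-length prediction strings."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one prediction, all of the same length")
    consensus = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            consensus.append(tie)           # no unique majority (assumption: mark it)
        else:
            consensus.append(ranked[0][0])  # most frequent state wins
    return "".join(consensus)
```

For example, `simple_vote(["HHEC", "HHCC", "HECC"])` returns `"HHCC"`.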


NPS - Network Protein Sequence Analysis

http://pbil.ibcp.fr/

Combet C., Blanchet C., Geourjon C. and Deléage G., TIBS 2000 March Vol. 25, No 3 [291]:147-150

Besides other tools: consensus protein secondary structure prediction. Programs used:

• SOPM (Geourjon and Deléage, 1994)
• SOPMA (Geourjon and Deléage, 1995)
• HNN (Guermeur, 1997)
• MLRC (Guermeur et al., 1999)
• DPM (Deléage and Roux, 1987)
• DSC (King and Sternberg, 1996)
• GOR I (Garnier et al., 1978)
• GOR III (Gibrat et al., 1987)
• GOR IV (Garnier et al., 1996)
• PHD (Rost and Sander, 1993)
• PREDATOR (Frishman and Argos, 1996)
• SIMPA96 (Levin, 1997)

JPRED - Consensus secondary structure prediction (http://jura.ebi.ac.uk:8888)

Cuff J. A., Clamp M. E., Siddiqui A. S., Finlay M., Barton G. J., Jpred: A Consensus Secondary Structure Prediction Server, Bioinformatics, 14:892-893 (1998)

Input: single sequence in RAW or PIR format, or multiple sequence alignment in MSF or BLC format

Output: 3-state secondary structure prediction, as coloured HTML, PS, Java or ASCII output

Prediction methods:
• PHD (Rost and Sander, 1993)
• DSC (King and Sternberg, 1996)
• PREDATOR (Frishman and Argos, 1996)
• NNSSP (Salamov, A. A. & Solovyev, V. V., 1995)
• MULPRED (Barton (1994), unpublished)
• ZPRED (Zvelebil et al., 1987)
• JNET (Cuff J. and Barton G. J., 1999)
• JNETsolacc
• COILS (Lupas, A., 1996)
• MULTICOIL (Wolf E., Kim P. S., Berger B., 1997)
• PHDhtm (Rost and Sander, 1993)

Secondary structure prediction - Example

PDB:1bnk and NNSSP

1                                               50
MVTPALQMKKPKQFCRRMGQKKQRPARAGQPHSSSDAAQAPAEQPHSSSD
CCCCCCCCCCCCCHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

51                                             100
AAQAPCPRERCLGPPTTPGPYRSIYFSSPKGHLTRLGLEFFDQPAVPLAR
CCCCCCCCCCCCCCCCCCCCCEEEEECCCCCCCCCCCCCCCCCCCHHHHH

101
AFLGQVLVRRLPNGTELRGRIVETEAYLGPEDEAAHSRGGRQTPRNRGMF
HHHCCEEEEECCCCEEEEEEEEEEEECCCCCCHHHCCCCCCCCCCCCEEE

151
MKPGTLYVYIIYGMYFCMNISSQGDGACVLLRALEPLEGLETMRHVRSTL
ECCCCEEEEEEECEEEEEEEECCCCCHEEEEECCCCCCHHHHHHHHHHCC

201
RKGTASRVLKDRELCSGPSKLCQALAINKSFDQRDLAQDEAVWLERGPLE
CCCEEEEEECCCCCCCCCHHHHHHHCCCCCCCCCCCCCCCCEEEECCCCC

251
PSEPAVVAAARVGVGHAGEWARKPLRFYVRGSPWVSVVDRVAEQDTQA
CCCCCEEEEEEEECCCCCCCCCCCEEEEEECCCCEEEEECHHHCCCCC

HELIX 88-91, 95-102, 138-140, 146-150, 190-197, 211-213,
HELIX 218-224, 229-231, 265-271, 290-293
SHEET 242-245, 106-110, 116-122, 184-188, 156-161,
SHEET 165-173, 178-183, 123-127,
SHEET 274-279, 256-260
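To compare these HELIX and SHEET ranges with a prediction residue by residue, they can be expanded into a three-state string. The sketch below assumes that the residue numbers in the ranges coincide with the 1-based positions of the 298-residue sequence shown above, which the slides do not state explicitly; everything outside a listed range is treated as coil.

```python
HELIX = [(88, 91), (95, 102), (138, 140), (146, 150), (190, 197), (211, 213),
         (218, 224), (229, 231), (265, 271), (290, 293)]
SHEET = [(242, 245), (106, 110), (116, 122), (184, 188), (156, 161),
         (165, 173), (178, 183), (123, 127), (274, 279), (256, 260)]


def ranges_to_states(length, helix, sheet):
    """Expand inclusive 1-based residue ranges into an H/E/C string."""
    states = ["C"] * length
    for label, ranges in (("H", helix), ("E", sheet)):
        for start, end in ranges:
            for position in range(start, end + 1):
                states[position - 1] = label
    return "".join(states)


observed = ranges_to_states(298, HELIX, SHEET)
```

The resulting `observed` string has the same length as the NNSSP prediction, so the two can be compared position by position.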

Secondary structure


PDB:1bnk and NPS (1)


                  10        20        30        40        50        60        70
                   |         |         |         |         |         |         |
bnkxxx0   MVTPALQMKKPKQFCRRMGQKKQRPARAGQPHSSSDAAQAPAEQPHSSSDAAQAPCPRERCLGPPTTPGP
DPM       ccechhhhhtcchhhhhhccccchchhhctctttcchhhhhhhtctttcchhhhccchhhcctccccctc
DSC       ccccccccccchhhhhhhhhcccccccccccccccccccccccccccccccccccccccccccccccccc
GOR4      ccccccccccccchhhhhcccccccccccccccchhhhhccccccccccccccccccccccccccccccc
HNNC      cccccccccchhhhhhhhcccccccccccccccccccccccccccccccccccccccccccccccccccc
PHD       ccchhhccccccccccccccccccccccccccccccccccccccccccchhhcccccccccccccccccc
Predator  ccccccccccccchhhhhcccccccccccccccccccccccccccccccccccccccccccccccccccc
SIMPA96   cccchhhhhhhhhhhhhhcccccccccccccccccccccccccccccccccccccccccccccccccccc
SOPM      eecchhhcccthhhhhhhtccccccccccccccccccccccccccccccccccccccccccccccccccc
Sec.Cons. cccc??ccccc?hhhhhhcccccccccccccccccccccccccccccccccccccccccccccccccccc

                  80        90       100       110       120       130       140
                   |         |         |         |         |         |         |
bnkxxx0   YRSIYFSSPKGHLTRLGLEFFDQPAVPLARAFLGQVLVRRLPNGTELRGRIVETEAYLGPEDEAAHSRGG
DPM       ccceeecttccccehhchhhhhhchhchhhhhheeeeeehctcccchhcheehhhhhctccchhhttttc
DSC       cceeeecccccccccccccccccchhhhhhhhhhhhhhhhhhcccccceeeeeeeecccccchhhhhccc
GOR4      ceeeeecccccccccccccccccccchhhhhhhhhhhhhcccccccccchhhhhhhcccchhhhhhhccc
HNNC      cceeeecccccchhheehhhcchhhhhhhhhhccchehhecccccceceeeeeeecccccccchhhhccc
PHD       eeeeeeccchhhccccccccccchhhhhhhhhhhhhhhhhccccceccceeeeeeecccccchhhhcccc
Predator  ceeeeecccccccccccccccccccchhhhhhhhhhheeeccccccccceeeecccccccccccccccccc
SIMPA96   ceeeeecccccceeeccccccccchhhhhhhhhhhhhhhhccccccccceeeecccccccchhhhhcccc
SOPM      ceeeeeccttcceeeeeeeeccccccchhhhhhhhheeecccttccccteeeehhheccccchhhhcttc
Sec.Cons. ceeeeecccccccccccccccccchhhhhhhhhhhhhhhhccccccccceeee???cccccchhhhcccc

Secondary structure


PDB:1bnk and NPS(2)


                 150       160       170       180       190       200       210
                   |         |         |         |         |         |         |
bnkxxx0   RQTPRNRGMFMKPGTLYVYIIYGMYFCMNISSQGDGACVLLRALEPLEGLETMRHVRSTLRKGTASRVLK
DPM       ccctttcchhhtccceeeeeeeeeeeeeeettttccheehhhhhhchhchhhhhheeeeehccchchehh
DSC       cccccceeeecccccceeeeeccceeeeecccccccchhhhhhhhhhhhhhhhhhhhhhccccccccccc
GOR4      cccccccceeccccceeeeeeeeeeeeeeeeecccchhhhhhhccchhhhhhhhhhhhhccccchhhhhh
HNNC      ccccccccceccccceeeeeehhheeeecccccccchhhhhhhhccccchhhhhhhhhhccccccheeec
PHD       cccccccceeeccceeeeeeeeeeeeeeeeeecccchhheecccccccchhhhhhhhhhhcccccccccc
Predator  cccccccccccccceeeeeeeeeeeeeeeecccccchhhhhhhhhhccccchhhhhhhhhccccchhhhh
SIMPA96   cccccccccccccceeeeeeecceeeeeccccccccceehhhhhccchhhhhhhhhhhhhhccccchhhh
SOPM      cccccccceeecttceeeeeeeeeeeeeeecccccchheehhhhhhhhhhhhhhhhhhhhhttchhhhhh
Sec.Cons. cccccccceeccccceeeeeeeeeeeeeeecccccchhhhhhhh?c?h?hhhhhhhhhhhccccc?hhhh

                 220       230       240       250       260       270       280
                   |         |         |         |         |         |         |
bnkxxx0   DRELCSGPSKLCQALAINKSFDQRDLAQDEAVWLERGPLEPSEPAVVAAARVGVGHAGEWARKPLRFYVR
DPM       hhhhtctccchhhhhhhtcththhhhhhhhhhhhhhccctcchchehhhhhecectcchhhhhchheeee
DSC       cccccccccchhhhhhhhhccchhhhhhhhhhhhhcccccccccceeeeecccccccccchhccceeeec
GOR4      cccccccccchhhhhhhccchhhhhhhhhhhhhhhcccccccchhhhhhhhhccccccccccccceeeee
HNNC      cccccccchhhhhhhhhccccccchhhhhhhhhhhcccccccchhhhhhhhhccccchhhhccceeeeee
PHD       cccccccchhhhhhhhhccccccchhhhhcceeeeccccccccchhhhhhhcccccccccccccceeeec
Predator  hhcccccccchhhhhhhhcccchhhhhhhhhhhhhcccccccchhhhhhhhhhccccccccccceeeeee
SIMPA96   cccccccchhhhhhhhhcccccccccchhhhhhhccccccccchhhhhhhhhcccccccccccceeeeec
SOPM      htccccccchhhhhhhhhhccchhhhhhhhheeecccccccccchhhhhhhetcccccccccccceeeet
Sec.Cons. ccccccccc?hhhhhhhccccchhhhhhhhhhhhhcccccccc?hhhhhhhhccccccccccccceeeee

Fold classes

Notation for fold super classes: 1:all-alpha 2:alpha*beta 3:alpha+beta 4:all-beta

Notation for fold-classes (names as in Pascarella & Argos, 1992): 1:gap 2:cytc 3:hmr 4:wrp 5:ca_bind 6:globin 7:lzm 8:crn 9:cyp 10:ac_prot 11:pap 12:256b 13:hoe 14:sns 15:ferredox 16:cpp 17:pgk 18:xia 19:kinase 20:binding 21:tln 22:barrel 23:inhibit 24:pti 25:plasto 26:cts 27:rdx 28:plipase 29:virus 30:virus_prot 31:cpa 32:dfr 33:igb 34:il 35:fxc 36:sbt 37:gcr 38:tox 39:wga 40:eglin 41:ltn 42:s_prot 43:membrane 44:nbd


Fold class prediction - FoldClass

FoldClass (HUSAR) predicts protein fold classes and protein domains from sequence data. The predictions are generated by artificial neural networks (Reczko, M. and Bohr, H. Nucl. Ac. Res. 22: 3616-3619 (1994)).

This program predicts:
• a specific overall fold-class,
• a super fold-class with respect to secondary structure content and spatial distribution,
• optionally, a profile of possible fold-classes along the sequence.

Fold class prediction - Threader2.3

Jones DT. THREADER: Protein sequence threading by double dynamic programming, Comp. methods in Mol Biol. New York: Elsevier 1998.
http://globin.bio.warwick.ac.uk/~jones/threader.html

Algorithm:
• A library of unique protein domain folds is derived from the PDB
• The test sequence is optimally fitted to all folds (allowing insertions/deletions)
• The energy of each possible fit is calculated by summing interaction and solvation parameters
• The lowest-energy fold is taken

Output: Fold class, domain folds

Special secondary structure programs in HUSAR

Transmembrane regions - TMHMM

Helix-turn-helix elements - HTHscan

Coiled-coil regions - Coilscan

Amphipathic helices - Amphi, Net, Wheel

Globular and non-globular regions - SEG

Protein Analysis



Protein simulation and modelling

Protein modelling:

• Sequence -> model structure: homology modelling, threading
• Modelling ligands into an active site
• Docking

Protein simulation:

• Molecular dynamics - follow the thermal motions of the structure with time
• Prediction of and information about reaction mechanisms
• Prediction of binding energies, pKa values, spectra, etc.

Requirements for protein modelling and simulation

• Structural data for proteins from the PDB (Brookhaven Protein Data Bank)

• Potential energy for any protein conformation - Potential energy function (PEF)


Protein modelling method

I have a protein sequence. Can I predict its structure?

Homology modelling

Quick and easy!
Use the SWISS-MODEL server:
HTTP://www.expasy.ch/swissmod/SWISS-MODEL.html

SWISS-MODEL is an Automated Protein Modelling Server running at the GlaxoWellcome Experimental Research in Geneva, Switzerland.

Disclaimer

The result of any modelling procedure is NON-EXPERIMENTAL and MUST be considered with care. This is especially true since there is no human intervention during model building.

New 3D modelling server Geno3d:
HTTP://geno3d-pbil.ibcp.fr/

Swiss Model steps

1. Identification of modelling templates: BLAST or FASTA search against the sequences of the PDB.
2. Alignment of the target sequence with the template sequence(s).
3. Framework construction: averaging the position of each atom in the target sequence, based on the location of the corresponding atoms in the template(s).
4. Building the non-conserved loops, the backbone and the side chains.
5. Model refinement: energy minimisation.
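The framework construction step can be illustrated with a small sketch that averages the coordinates of corresponding atoms over several templates. It assumes the templates are already superposed in a common frame and list their atoms in the same order, and it uses an unweighted mean; the actual SWISS-MODEL procedure may weight templates differently. The function name is illustrative.

```python
def average_framework(templates):
    """Mean position of each atom over several superposed template structures.

    templates: list of structures, each a list of (x, y, z) tuples with
    corresponding atoms at the same index.
    """
    if not templates:
        raise ValueError("at least one template is required")
    n_atoms = len(templates[0])
    if any(len(t) != n_atoms for t in templates):
        raise ValueError("all templates must contain the same number of atoms")
    n = len(templates)
    return [
        tuple(sum(t[i][axis] for t in templates) / n for axis in range(3))
        for i in range(n_atoms)
    ]
```

With a single template the framework is simply a copy of the template coordinates.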

Energy Minimisation - Start

Calculate the potential energy for a given molecule (atom coordinates):

set of nuclear positions of all atoms = R

Energy Minimisation - Method

We move the molecule so as to reduce its potential energy. There are several routines to do this:
- Steepest descent
- Conjugate gradient
- and more

Unfortunately no technique can guarantee to find the global energy minimum of a complex problem (although simulated annealing is a partial solution).
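Steepest descent can be sketched in a few lines: compute the gradient of the potential energy with respect to the coordinates and move a small step in the opposite direction until the gradient vanishes. The energy function here is a toy harmonic bond between two atoms on a line (hypothetical parameters, not a real force field), and the gradient is taken numerically for simplicity.

```python
def numerical_gradient(energy, coords, h=1e-6):
    """Central-difference gradient of energy(coords)."""
    grad = []
    for i in range(len(coords)):
        plus = list(coords)
        minus = list(coords)
        plus[i] += h
        minus[i] -= h
        grad.append((energy(plus) - energy(minus)) / (2 * h))
    return grad


def steepest_descent(energy, coords, step=0.1, max_iter=10000, tol=1e-6):
    """Follow the negative gradient until it is (almost) zero."""
    coords = list(coords)
    for _ in range(max_iter):
        grad = numerical_gradient(energy, coords)
        if max(abs(g) for g in grad) < tol:
            break
        coords = [x - step * g for x, g in zip(coords, grad)]
    return coords


def toy_energy(coords):
    """Toy potential: harmonic bond with ideal length 1.5 between two atoms."""
    x1, x2 = coords
    return (abs(x2 - x1) - 1.5) ** 2


minimized = steepest_descent(toy_energy, [0.0, 3.0])
```

Starting from a bond length of 3.0, the two atoms move towards each other until the bond length is close to 1.5. Like any local method, this only reaches the nearest minimum of the energy surface.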

Modelling Programs

WHATIF, INSIGHTII, ...

GROMOS, DISCOVER, ...


Model

SWISS-3DIMAGE is an image database which strives to provide high quality pictures of biological macromolecules with known three-dimensional structure. The database contains mostly images of experimentally elucidated structures, but also provides views of well accepted theoretical protein models. The images are provided in several useful formats; both mono and stereo pictures are generally available.

Viewers: Rasmol, Kinemage, ...

Applicability of model structures

1. Models which are based on incorrect alignments between target and template sequences. Such alignment errors generally reside in the inaccurate positioning of insertions and deletions. It is however often possible to correct such errors by producing several models based on alignment variants and by selecting the most "sensible" solution. Nevertheless, such models are often useful as the errors are not located in the area of interest, e.g. a conserved active site.
2. Models based on correct alignments are much better, but their accuracy can still be medium to low. They are very useful tools for the design of mutagenesis experiments, but of very limited assistance during detailed ligand binding studies.
3. The last category of models comprises all those which were built based on templates which share a high degree of sequence identity (> 70%) with the target. They have proven useful during drug design projects.

Molecule Simulation - Molecular Dynamics

- The starting point for most simulations is the experimental crystal or NMR structure.
- This is energy minimized and solvated in a box of water.
- The system is heated (high-energy state).
- Equilibration and simulation for 1 nanosecond.

The detailed atomic motions are usually unimportant. What really matters are the "ensemble average" properties, i.e. what happens on average (MD is in fact chaotic, with sensitive dependence on initial conditions, like the weather!).
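The core of a molecular dynamics run is the repeated integration of Newton's equations of motion over small time steps. The sketch below uses the velocity Verlet integrator on a one-dimensional harmonic oscillator; the integrator choice, the toy force and all parameter values are assumptions for illustration, since the slides do not name a particular scheme.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in one dimension; returns (trajectory, final velocity)."""
    trajectory = [x]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append(x)
    return trajectory, v


def harmonic_force(x, k=1.0):
    """Toy force for a spring with force constant k (hypothetical units)."""
    return -k * x


trajectory, v_final = velocity_verlet(x=1.0, v=0.0, force=harmonic_force,
                                      mass=1.0, dt=0.01, n_steps=1000)
total_energy = 0.5 * v_final ** 2 + 0.5 * trajectory[-1] ** 2
```

The total energy stays close to its initial value of 0.5 over the run, which is the usual sanity check for an integrator; averages over the stored trajectory play the role of the ensemble averages mentioned above.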

Molecular Dynamics - Disadvantage

A disadvantage of conventional molecular dynamics procedures is that they can only tackle motions with a relatively short time scale - a few nanoseconds is the approximate upper limit with current computers.

Protein Analysis



Proteomics

Proteomics is the analysis of the proteome of a given cell or tissue.

Proteome

The proteome consists of all proteins which are expressed from a certain genome or in a certain tissue under certain conditions. The proteome changes with the ageing or development of a cell or tissue; unlike the genome, it is not static.

What is Proteomics?


Proteomics

1. Differences in protein expression depending on time
2. Differences in protein expression depending on tissue
3. Differences in protein expression depending on organism

2D-Gels > Swiss 2D database (www.expasy.ch)

Metabolic databases: Kegg (www.genome.ad.jp/kegg)

Protein analysis
