
Proteins Structural Bioinformatics. 2 3 Specific databases of protein sequences and structures Swissprot PIR TREMBL (translated from DNA) PDB


Proteins

Structural Bioinformatics


Specific databases of protein sequences and structures

• Swissprot

• PIR

• TREMBL (translated from DNA)

• PDB (Three Dimensional Structures)


Myoglobin – the first high resolution protein structure

“Perhaps the most remarkable features of the molecule are its complexity and its lack of symmetry. The arrangement seems to be almost totally lacking in the kind of regularities which one instinctively anticipates.”

Solved in 1958 by John Kendrew and colleagues at Cambridge University. Kendrew and Max Perutz shared the 1962 Nobel Prize in Chemistry.


Why Protein Structure?

• Proteins are fundamental components of all living cells, performing a variety of biological tasks.

• Each protein has a particular 3D structure that determines its function.

• Protein structure is more conserved than protein sequence, and more closely related to function.


There Are Four Levels of Protein Structure

• Primary: amino acid linear sequence.

• Secondary: α-helices, β-sheets and loops.

• Tertiary: the 3D shape of the fully folded polypeptide chain.

• Quaternary: arrangement of several polypeptide chains.


Symbols for the 20 amino acids

Code  Abbr.  Name             Code  Abbr.  Name
A     ala    alanine          M     met    methionine
C     cys    cysteine         N     asn    asparagine
D     asp    aspartic acid    P     pro    proline
E     glu    glutamic acid    Q     gln    glutamine
F     phe    phenylalanine    R     arg    arginine
G     gly    glycine          S     ser    serine
H     his    histidine        T     thr    threonine
I     ile    isoleucine       V     val    valine
K     lys    lysine           W     trp    tryptophan
L     leu    leucine          Y     tyr    tyrosine


Secondary Structure

Secondary structure is usually divided into three categories:

• Alpha helix

• Beta strand (sheet)

• Anything else – turn/loop


Alpha Helix: Pauling (1951)

[Figure: α-helix with 3.6 residues per turn and a pitch of 5.4 Å]

• A consecutive stretch of 5-40 amino acids (average 10).

• A right-handed spiral conformation.

• 3.6 amino acids per turn.

• Stabilized by H-bonds in the backbone between C=O of residue n, and NH of residue n+4.

• Side-chains point out.
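The helix geometry listed above reduces to simple arithmetic, which the following sketch illustrates. It is only an illustration: the function names `backbone_hbonds` and `helix_turns` are made up for this example, and residues are assumed to be numbered from 1.

```python
RESIDUES_PER_TURN = 3.6  # from the slide: 3.6 amino acids per turn


def backbone_hbonds(first, last):
    """List (n, n+4) pairs: C=O of residue n bonded to N-H of residue n+4.

    `first` and `last` are the 1-based, inclusive ends of the helix
    (hypothetical parameters chosen for this sketch).
    """
    return [(n, n + 4) for n in range(first, last - 3)]


def helix_turns(length):
    """Number of turns made by a helix of `length` residues."""
    return length / RESIDUES_PER_TURN


# A helix of average length (10 residues) spanning positions 1..10.
print(backbone_hbonds(1, 10))
print(round(helix_turns(10), 2))
```

Under these assumptions a 10-residue helix has six backbone hydrogen bonds, and the first four N-H groups and last four C=O groups have no partner inside the helix.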


Beta Strand: Pauling and Corey (1951)

• Segments of one or more polypeptide chains run alongside each other and are linked together by hydrogen bonds.

• Each section is called a β-strand, and consists of 5-10 amino acids.


Beta Sheet

The strands become adjacent to each other, forming a beta-sheet.

[Figure: (a) antiparallel and (b) parallel β-sheets; distance labels 3.47 Å, 4.6 Å, 3.25 Å, 4.6 Å]


Loops

• Connect the secondary structure elements.

• Vary in length and shape.

• Located at the surface of the folded protein, and therefore may have an important role in biological recognition processes.

• Proteins that are evolutionarily related have the same helices and sheets but may vary in loop structures.


How is the 3D Structure Determined?

1. Experimental methods (best approach):

• X-ray crystallography.

• NMR.

• Others.

2. In-silico methods (partial solutions, based on similarity):

• Threading - needs a 3D structure, combinatorial complexity.

• Ab-initio structure prediction - not always successful.


X-ray crystallography

1. Obtain an ordered protein crystal.

2. Check X-ray diffraction.

The crystal is bombarded with X-ray beams. The collision of the beams with the electrons creates a diffraction pattern.


X-ray crystallography

3. Analyze the diffraction pattern and produce an electron density map.

4. Thread the known protein sequence into the density map.


X-ray crystallography

• The molecules must be very pure in order to produce perfect and stable crystals.

• The method is time-consuming and difficult.


NMR - Nuclear Magnetic Resonance (since 1945)

• A sample is immersed in a magnetic field and bombarded with radio waves.

• The molecule’s nuclei resonate (spin). This motion is measured and is specific to each molecule type.


Principles of NMR


NMR - Nuclear Magnetic Resonance

• The NMR technique is very time-consuming and expensive. The sample has to be in a concentrated solution, and the method is limited to small, soluble molecules.


PDB: Protein Data Bank

• Holds 3D models of biological macromolecules (protein, RNA, DNA).

• All data are available to the public.

• Obtained by X-Ray crystallography (84%) or NMR spectroscopy (16%).

• Submitted by biologists and biochemists from around the world.


PDB – Protein Data Bank

http://www.rcsb.org/pdb/


How Many Structures?

PDB Content Growth

http://www.rcsb.org/pdb/holdings.html


Structure Prediction: Motivation

• Hundreds of thousands of gene sequences translated to proteins (GenBank, Swissprot, PIR)

• Only about 28000 solved structures (PDB)

• Experimental methods are time-consuming and not always possible

• Goal: Predict protein structure based on sequence information


Structure Prediction: Motivation

• Understand protein function
    – Locate binding sites

• Broaden homology
    – Detect similar function where sequence differs

• Explain disease
    – See effect of amino acid changes
    – Design suitable compensatory drugs


Prediction Approaches

• Primary (sequence) to secondary structure
    – Sequence characteristics

• Secondary to tertiary structure
    – Fold recognition
    – Threading against known structures

• Primary to tertiary structure
    – Ab initio modelling


Secondary structures have an amphiphilic nature: one face polar and the other non-polar

[Figure: an α-helix and a β-sheet, each shown with a polar face and a non-polar face]

Can we predict the secondary structure from sequence ?


Secondary Structure Prediction Methods

• Chou-Fasman / GOR Method
    – Based on amino acid frequencies

• Artificial Neural Network (ANN) methods
    – PHDsec and PSIpred

• HMM (Hidden Markov Model)

• Best accuracy now ~80%


Chou and Fasman (1974)

Name            P(a)  P(b)  P(turn)
Alanine          142    83       66
Arginine          98    93       95
Aspartic Acid    101    54      146
Asparagine        67    89      156
Cysteine          70   119      119
Glutamic Acid    151    37       74
Glutamine        111   110       98
Glycine           57    75      156
Histidine        100    87       95
Isoleucine       108   160       47
Leucine          121   130       59
Lysine           114    74      101
Methionine       145   105       60
Phenylalanine    113   138       60
Proline           57    55      152
Serine            77    75      143
Threonine         83   119       96
Tryptophan       108   137       96
Tyrosine          69   147      114
Valine           106   170       50

The table gives the propensity of an amino acid to be part of a certain secondary structure (e.g. proline has a low propensity for being in an alpha helix or a beta sheet; it is a "breaker").

Success rate of 50%
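To make the propensity table concrete, here is a minimal sketch that averages the three propensities over a short segment and reports the largest. This is a simplification for illustration only, not the full Chou-Fasman procedure (which adds nucleation and extension rules not described on the slide). The names `PROPENSITY`, `mean_propensities` and `likely_state` are invented for this example.

```python
# Chou-Fasman propensities from the table: (P(a), P(b), P(turn)),
# keyed by one-letter amino acid code.
PROPENSITY = {
    "A": (142, 83, 66),  "R": (98, 93, 95),   "D": (101, 54, 146),
    "N": (67, 89, 156),  "C": (70, 119, 119), "E": (151, 37, 74),
    "Q": (111, 110, 98), "G": (57, 75, 156),  "H": (100, 87, 95),
    "I": (108, 160, 47), "L": (121, 130, 59), "K": (114, 74, 101),
    "M": (145, 105, 60), "F": (113, 138, 60), "P": (57, 55, 152),
    "S": (77, 75, 143),  "T": (83, 119, 96),  "W": (108, 137, 96),
    "Y": (69, 147, 114), "V": (106, 170, 50),
}


def mean_propensities(segment):
    """Average P(a), P(b), P(turn) over a segment of one-letter codes."""
    rows = [PROPENSITY[aa] for aa in segment]
    return tuple(sum(column) / len(rows) for column in zip(*rows))


def likely_state(segment):
    """Name of the structure with the highest average propensity."""
    labels = ("helix", "strand", "turn")
    means = mean_propensities(segment)
    return labels[means.index(max(means))]


print(likely_state("AEML"))  # helix formers dominate this segment
print(likely_state("VIYV"))  # strand formers dominate this segment
print(likely_state("NGPS"))  # turn formers dominate this segment
```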


Secondary Structure Method Improvements

‘Sliding window’ approach

• Most alpha helices are ~12 residues long

• Most beta strands are ~6 residues long

• Look at all windows of size 6/12

• Calculate a score for each window. If the score is above a threshold, predict that this is an alpha helix/beta sheet

TGTAGPOLKCHIQWMLPLKK
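The sliding-window idea can be sketched as follows. Several things here are assumptions, since the slide does not fix them: the window score is taken to be the mean helix propensity P(a) from the Chou-Fasman table, the threshold of 100 is chosen only for illustration, and symbols missing from the table (such as the "O" in the example sequence) get an assumed neutral score of 100. The function names are made up for this example.

```python
# Helix propensities P(a) from the Chou-Fasman table, by one-letter code.
P_HELIX = {
    "A": 142, "R": 98, "D": 101, "N": 67, "C": 70, "E": 151, "Q": 111,
    "G": 57, "H": 100, "I": 108, "L": 121, "K": 114, "M": 145, "F": 113,
    "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106,
}
NEUTRAL = 100  # assumed score for symbols that are not in the table


def window_scores(sequence, propensity, size):
    """Mean propensity of every window of `size` consecutive residues."""
    values = [propensity.get(aa, NEUTRAL) for aa in sequence]
    return [
        sum(values[start:start + size]) / size
        for start in range(len(values) - size + 1)
    ]


def predict(sequence, propensity, size, threshold):
    """Mark residues covered by at least one window scoring above threshold."""
    marked = [False] * len(sequence)
    for start, score in enumerate(window_scores(sequence, propensity, size)):
        if score > threshold:
            for i in range(start, start + size):
                marked[i] = True
    return "".join("H" if hit else "-" for hit in marked)


# Window of 6 over the example sequence from the slide.
print(predict("TGTAGPOLKCHIQWMLPLKK", P_HELIX, size=6, threshold=100))
```

The same two functions could be reused for strands by passing a table of P(b) values, subject to the same caveats.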


Improvements in the 1980’s

• Adding information from conservation in MSA

• Smarter algorithms (e.g. HMM, neural networks).

Success rate rose to ~80%


PHDsec and PSIpred

• PHDsec
    – Rost & Sander, 1993
    – Based on sequence family alignments

• PSIpred
    – Jones, 1999
    – Based on Position Specific Scoring Matrix generated by PSI-BLAST

• Both consider long-range interactions


HMM

• HMM enables us to calculate the probability of assigning a sequence of hidden states to the observation

Observation:  TGTAGPOLKCHIQWML
Hidden state: HHHHHHHLLLLBBBBB

p = ?


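[Figure: table of HMM parameters, built according to a large database of known secondary structures. It holds three kinds of entries:]

• Initial probabilities, e.g. the probability of beginning with an α-helix.

• Transition probabilities, e.g. the probability of an α-helix residue being followed by another α-helix residue. The probability of observing a residue which belongs to an α-helix followed by a residue belonging to a turn = 0.15.

• Emission probabilities, e.g. the probability of observing alanine as part of a β-sheet.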


HMM

• The above table enables us to calculate the probability of assigning secondary structure to a protein

• Example: the sequence TGQ with hidden states HHH

p = 0.45 x 0.041 x 0.8 x 0.028 x 0.8 x 0.0635 ≈ 0.000021
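The calculation in the example can be written as a short function. The sketch assumes the six factors are, in order, the probability of starting in a helix (0.45), the emission of T from a helix (0.041), the helix-to-helix transition (0.8), the emission of G (0.028), the transition again (0.8) and the emission of Q (0.0635). That reading is inferred from the order of the factors, and the dictionary and function names are invented for this example. Multiplied out, these six factors give about 2.1 x 10^-5.

```python
# Parameters quoted in the example; only the entries needed for the
# state path H-H-H over the residues T-G-Q are filled in.
START = {"H": 0.45}                # P(first state is helix)
TRANSITION = {("H", "H"): 0.8}     # P(next state | current state)
EMISSION = {                       # P(residue | state)
    ("H", "T"): 0.041,
    ("H", "G"): 0.028,
    ("H", "Q"): 0.0635,
}


def path_probability(residues, states):
    """Joint probability of the residues and one given hidden-state path."""
    if len(residues) != len(states):
        raise ValueError("residues and states must have the same length")
    p = START[states[0]] * EMISSION[(states[0], residues[0])]
    for i in range(1, len(residues)):
        p *= TRANSITION[(states[i - 1], states[i])]
        p *= EMISSION[(states[i], residues[i])]
    return p


print(path_probability("TGQ", "HHH"))
```

Comparing this value across alternative state paths for the same residues is what lets an HMM pick the most probable secondary structure assignment.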


SS prediction using ANN

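[Figure: inputs for one position. There is one input unit for each symbol in ACDEFGHIKLMNPQRSTVWY. (the 20 amino acids plus "."), set according to the amino acid at that position]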
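One common way to present an amino acid to a neural network is a one-hot vector over the 21 symbols shown in the figure. The sketch below illustrates that encoding. Two points are assumptions rather than something the slide states: that "." stands for positions beyond the ends of the sequence, and that the network sees a window of neighbouring positions around the residue being predicted. The function names are hypothetical.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY."  # 20 amino acids plus "." as on the slide


def one_hot(symbol):
    """21-element vector with a single 1 at the symbol's index."""
    vector = [0] * len(ALPHABET)
    vector[ALPHABET.index(symbol)] = 1  # raises ValueError for unknown symbols
    return vector


def encode_window(sequence, center, half_width):
    """Concatenate one-hot vectors for center-half_width .. center+half_width.

    Positions that fall outside the sequence are encoded as "."
    (an assumption made for this sketch).
    """
    inputs = []
    for pos in range(center - half_width, center + half_width + 1):
        symbol = sequence[pos] if 0 <= pos < len(sequence) else "."
        inputs.extend(one_hot(symbol))
    return inputs


# Three positions around the first residue of "ACD": ".", "A", "C".
window = encode_window("ACD", center=0, half_width=1)
print(len(window))  # 3 positions x 21 symbols
```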


PHDsec Neural Net

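[Figure: neural network. The inputs for one position (one unit per symbol in ACDEFGHIKLMNPQRSTVWY., set according to the amino acid at that position) feed a hidden layer, which feeds the outputs]

Outputs:

• H = helix

• E = strand

• C = coil

• Confidence: 0 = low, 9 = high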


Secondary structure prediction

• AGADIR - An algorithm to predict the helical content of peptides

• APSSP - Advanced Protein Secondary Structure Prediction Server

• GOR - Garnier et al, 1996

• HNN - Hierarchical Neural Network method (Guermeur, 1997)

• Jpred - A consensus method for protein secondary structure prediction at University of Dundee

• JUFO - Protein secondary structure prediction from sequence (neural network)

• nnPredict - University of California at San Francisco (UCSF)

• PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec from Columbia University

• Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction

• PSA - BioMolecular Engineering Research Center (BMERC) / Boston

• PSIpred - Various protein structure prediction methods at Brunel University

• SOPMA - Geourjon and Deléage, 1995

• SSpro - Secondary structure prediction using bidirectional recurrent neural networks at University of California

• DLP - Domain linker prediction at RIKEN