Computational Molecular Biology: Protein Structure and Homology Modeling

Prof. Alejandro Giorgetti, Dr. Francesco Musiani

Friday, March 1, 13

Source: molsim.sci.univr.it/2014_bioinfo2/Structural_Modeling.pdf

Sequence, function and structure relationships

• Life is the ability to metabolize nutrients, respond to external stimuli, grow, reproduce and evolve.

• From a chemical point of view, proteins are linear heteropolymers formed by amino acids (aa).

• Proteins assume a 3D shape, which is usually responsible for their function.

• The tight link between structure, function and evolutionary pressure distinguishes proteins from ordinary polymers.


Protein structure

• The sequence of amino acids is called the primary structure.

• Secondary structure refers to local folding.

• Tertiary structure is the arrangement of secondary structure elements in 3D.

• Quaternary structure describes the arrangement of a protein's subunits.

• The peptide bond is planar, and the dihedral angle it defines (ω) is almost always 180° (trans).


• What is a dihedral angle?

It is the angle between two planes. In practice, given four consecutively bonded atoms, the dihedral angle around the central bond is measured by looking along that bond, so that the two central atoms are superimposed, and taking the resulting angle between the first and the last atom.
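The definition above can be illustrated with a few lines of Python. This is only a minimal sketch, not part of the original slides: the function name `dihedral` and the example coordinates are assumptions chosen for illustration. It projects the first and last bond onto the plane perpendicular to the central bond and returns the signed angle between the two projections, in degrees.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) around the central bond p1-p2."""
    def sub(a, b):
        return [a[k] - b[k] for k in range(3)]

    def dot(a, b):
        return sum(a[k] * b[k] for k in range(3))

    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]

    b0 = sub(p0, p1)            # first bond, pointing away from the central bond
    b1 = sub(p2, p1)            # central bond
    b2 = sub(p3, p2)            # last bond
    norm = math.sqrt(dot(b1, b1))
    b1 = [c / norm for c in b1]

    # Remove the components along the central bond: this is the
    # "look along the bond" step of the definition.
    v = [b0[k] - dot(b0, b1) * b1[k] for k in range(3)]
    w = [b2[k] - dot(b2, b1) * b1[k] for k in range(3)]

    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: a planar "trans" and a planar "cis" arrangement.
trans = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
cis = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
```

With these coordinates the trans arrangement gives an angle of magnitude 180° and the cis arrangement gives 0°, matching the statement that the peptide bond dihedral is almost always 180°.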


• The simplest arrangement of aa is the alpha-helix, a right-handed spiral conformation.

• The structure repeats itself every 5.4 Å along the helix axis.

• There are 3.6 aa per turn.

• H-bond: O(n)–NH(n+4), i.e. between the backbone carbonyl oxygen of residue n and the amide NH of residue n+4.
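The two numbers quoted for the alpha-helix (5.4 Å per turn, 3.6 aa per turn) fix the rise and the rotation per residue. The sketch below works this out and builds an idealized Cα trace. It is an illustration only: the helix radius of 2.3 Å for the Cα atoms and all names are assumptions that do not come from the slides.

```python
import math

PITCH = 5.4             # Å per turn (from the slide)
RESIDUES_PER_TURN = 3.6  # aa per turn (from the slide)

rise_per_residue = PITCH / RESIDUES_PER_TURN    # Å along the helix axis
rotation_per_residue = 360.0 / RESIDUES_PER_TURN  # degrees around the axis

def ideal_helix_ca_trace(n_residues, radius=2.3):
    """Idealized right-handed helix; radius is an assumed Cα radius in Å."""
    coords = []
    for i in range(n_residues):
        theta = math.radians(i * rotation_per_residue)
        coords.append((radius * math.cos(theta),
                       radius * math.sin(theta),
                       i * rise_per_residue))
    return coords

trace = ideal_helix_ca_trace(10)
ca_ca_distance = math.dist(trace[0], trace[1])
```

The rise is 1.5 Å and the rotation 100° per residue. With the assumed radius, consecutive Cα atoms come out about 3.8 Å apart, which is a useful sanity check on the geometry.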


• The beta sheet.

• The R groups of neighboring residues in a strand point in opposite directions.

• Beta sheets can be parallel or anti-parallel.


Ramachandran plot: the pairs of backbone dihedral angles (φ, ψ) that do not cause the atoms of a dipeptide to collide.


[Figure: Ramachandran plot with labelled regions: left-handed α-helix, right-handed α-helix, anti-parallel β-sheet, parallel β-sheet, collagen triple helix]
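As a rough illustration of how a (φ, ψ) pair maps onto the labelled regions of the plot, the sketch below assigns a pair to the nearest idealized conformation. The reference angles are approximate, idealized textbook values and are an assumption here, not data from the slides; real allowed regions are broad areas, so a nearest-point rule is only a crude stand-in.

```python
import math

# Approximate idealized (phi, psi) values in degrees; assumed for illustration.
IDEAL_ANGLES = {
    "right-handed alpha-helix": (-57.0, -47.0),
    "left-handed alpha-helix": (57.0, 47.0),
    "parallel beta-sheet": (-119.0, 113.0),
    "anti-parallel beta-sheet": (-139.0, 135.0),
    "collagen triple helix": (-51.0, 153.0),
}

def angle_difference(a, b):
    """Smallest signed difference between two angles, in (-180, 180]."""
    return (a - b + 180.0) % 360.0 - 180.0

def nearest_region(phi, psi):
    """Name of the idealized conformation closest to (phi, psi)."""
    def distance(name):
        ref_phi, ref_psi = IDEAL_ANGLES[name]
        return math.hypot(angle_difference(phi, ref_phi),
                          angle_difference(psi, ref_psi))
    return min(IDEAL_ANGLES, key=distance)

example = nearest_region(-60.0, -45.0)
```

Angles are compared on a circle, so that for example −179° and +179° count as 2° apart rather than 358°.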


Loops: regions without repetitive structure that connect secondary structure elements.


Supersecondary elements (motifs): arrangements of two or three consecutive secondary structure elements that are present in many different protein structures, even with completely different sequences.


Domains: portions of the polypeptide chain that fold into compact, semi-independent units. They are classified hierarchically (the CATH levels):

• Class (C): derived from secondary structure content; assigned automatically.

• Architecture (A): describes the gross orientation of secondary structures, independent of connectivity.

• Topology (T): clusters structures according to their topological connections and numbers of secondary structures.

• Homologous superfamily (H)


Gly: unusual Ramachandran plot, often found in turns.

Ala: transient interactions.

Cys: very reactive; can coordinate metals.

Thr, Ser: phosphorylation targets; protein kinases attach a phosphate group to the side chain.

Thr: beta-branched, more often found in beta-sheets.


The problem of protein folding

What is a protein fold?

• A compact, globular folding arrangement of the polypeptide chain.

• The chain folds to optimize the packing of the hydrophobic residues in the interior core of the protein.

Thermodynamics: ΔG = ΔH − TΔS (i.e. the stability of a given conformation)

Enthalpy: electrostatics, dispersion, van der Waals, H-bonds.

Entropy: water molecules form "ordered cages" around hydrophobic amino acids. The protein folding process breaks this order, releasing the water and increasing the entropy of the solvent.

The free energy of folding of a protein is of the order of a few kcal/mol.
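To make the "few kcal/mol" statement concrete, the sketch below evaluates ΔG = ΔH − TΔS for one set of numbers and converts it into a folded fraction with a two-state model. The ΔH and ΔS values are hypothetical, chosen only to show how two large terms nearly cancel; the two-state assumption and the function names are also assumptions of this example.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol·K)

def delta_g(delta_h, delta_s, temperature=298.15):
    """Free energy of folding, ΔG = ΔH − TΔS, in kcal/mol."""
    return delta_h - temperature * delta_s

def fraction_folded(dg, temperature=298.15):
    """Two-state model: K = [folded]/[unfolded] = exp(−ΔG/RT)."""
    return 1.0 / (1.0 + math.exp(dg / (R * temperature)))

# Hypothetical values: ΔH = −50 kcal/mol, ΔS = −0.15 kcal/(mol·K).
dg_example = delta_g(-50.0, -0.15, temperature=298.0)
folded_example = fraction_folded(dg_example, temperature=298.0)
```

Here an enthalpy of −50 kcal/mol and an entropic term of about +44.7 kcal/mol leave a net ΔG of about −5.3 kcal/mol, yet that small difference is enough for more than 99.9% of the molecules to be folded in this model.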


• Anfinsen's dogma: (at least for small globular proteins) the native structure is determined only by the protein's amino acid sequence.

• Levinthal paradox: because of the very large number of degrees of freedom in an unfolded polypeptide chain, the molecule has an astronomical number of possible conformations, far too many to be sampled one by one on the time scale of folding.

• Funnel theory: the energy landscape of a protein is funnel-shaped, which guides the chain towards the native state instead of a random search over all conformations.
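The size of the Levinthal problem can be estimated with a short calculation. The numbers below (3 conformations per residue, 100 residues, 10^13 conformations sampled per second) are the kind of round figures commonly used for this back-of-the-envelope argument; they are assumptions of this sketch, not values given in the slides.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def levinthal_estimate(n_residues, states_per_residue=3, rate_per_second=1e13):
    """Return (number of conformations, log10 of the years needed to visit all)."""
    conformations = states_per_residue ** n_residues
    log10_years = (n_residues * math.log10(states_per_residue)
                   - math.log10(rate_per_second)
                   - math.log10(SECONDS_PER_YEAR))
    return conformations, log10_years

n_conformations, log10_years = levinthal_estimate(100)
```

Under these assumptions an exhaustive search would take on the order of 10^27 years, whereas real proteins typically fold in milliseconds to seconds, which is why a random search cannot be the folding mechanism.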

[Figure: contributions to the free energy of folding on a −/0/+ scale: −TΔS (conformational entropy), −TΔS (hydrophobic effects), ΔH (internal interactions), and the result, ΔG of folding]


Evolution of protein structure

• What if a base-substitution event occurs in a protein-coding DNA region?

A. The fine balance between the gain and loss of free energy of folding is compromised: there is no single energy minimum → the protein does NOT FOLD.

B. The energy landscape of the protein changes, but there is still a global energy minimum → the protein FOLDS, with the same or a similar function (i.e. local perturbations that do not affect the general shape or topology).


The “comparative modeling” principle


Evolutionary-based methods for protein structure prediction

• Proteins that evolved from a common ancestor maintain similar core 3D structures.

We can use proteins of known structure (templates) to model proteins of unknown 3D structure (targets), starting from the sequence.

This can be done if the templates and the target are evolutionarily related.


Why Protein Structure Prediction?

We have an experimentally determined atomic structure for only ~1% of the known protein sequences.


[Figure: growth in the number of unique folds per year in the PDB, based on the SCOP database, from 1986 to 2007]


Why?

• We can use homology modeling to predict the structure of proteins of unknown structure…

but also…

• To reconstruct missing parts of an incomplete protein structure (common in low-resolution structures or for large mobile loops).

• To model a mutant of a known protein structure.

• To calculate the mean structure of an NMR ensemble.


Homology modeling flowchart

[Flowchart]

Inputs: query sequence; sequence databases; template PDB structure(s)

Main path: search for suitable template(s) → align sequence with template(s) → calculate model(s) → assess results → model(s)

Side boxes: refinement (loops); possible errors


Modeller: http://salilab.org/modeller/


How does it work?


1. Align sequence with structures


• First, the template structures must be determined
  • Simplistically, try to align the target sequence against the sequence of every known structure.
  • In practice, this is too slow, so heuristics are used (e.g. BLAST).
  • Profile or HMM searches are generally more sensitive in difficult cases (Modeller's profile.build method, PSI-BLAST or HHpred).
  • Threading or other web servers could also be used.

• Remember to look at:
  • Sequence identity/similarity between the putative template(s) and the target
  • Experimental method, resolution and completeness of the template(s)
  • Other compounds bound to the template(s)
  • Oligomerization state
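The first item of the checklist above, sequence identity, can be computed directly from a pairwise alignment. The sketch below is one possible definition, assumed here for illustration: identical residues divided by the number of columns in which neither sequence has a gap. Other denominators (e.g. the length of the shorter sequence) are also in use and give different numbers, so the choice should be stated whenever a value is reported. The sequences are made up.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over the columns where both sequences have a residue."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned_columns = 0
    identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue  # gapped columns are skipped under this definition
        aligned_columns += 1
        if a == b:
            identical += 1
    if aligned_columns == 0:
        return 0.0
    return 100.0 * identical / aligned_columns

# Hypothetical aligned pair: 6 gap-free columns, 5 of them identical.
identity = percent_identity("ACDE-FG", "ACDQWFG")
```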


• Alignment to templates
  • Sequence-sequence: relies purely on a matrix of observed residue-residue mutation probabilities ('align')
  • Sequence-structure: gap insertion is penalized within secondary structure (helices etc.) ('align2d')
  • Other features, profile-profile, and/or user-defined ('salign'), or use an external program

• Remember:
  • An error in the alignment is always a fatal error for the whole modeling procedure!
  • "One amino acid sequence plays coy; a pair of homologous sequences whisper; many aligned sequences shout out loud" (A.M. Lesk, Introduction to Bioinformatics, 2002)
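To show what a sequence-sequence alignment involves, here is a minimal global alignment by dynamic programming (the Needleman-Wunsch algorithm). It is a generic textbook sketch, not Modeller's 'align' command: a flat match/mismatch score stands in for a real substitution matrix, the gap penalty is linear, and all parameter values are assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,   # align the two residues
                              score[i - 1][j] + gap,     # gap in b
                              score[i][j - 1] + gap)     # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

result = needleman_wunsch("ACGT", "AGT")
```

For the toy input, the best alignment places a single gap opposite the C. In a real modeling run the flat scores would be replaced by a substitution matrix and, for sequence-structure alignment, by gap penalties that depend on the template's secondary structure.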


• Evaluation of sequence alignment quality

[Figure from E. Krieger, S.B. Nabuurs, G. Vriend: "Homology modeling". In Structural Bioinformatics, P.E. Bourne and H. Weissig, Eds. (2003)]


2. Extract spatial restraints

• Spatial restraints incorporate homology information, statistical preferences, and physical knowledge
  • Template Cα–Cα internal distances
  • Backbone dihedrals (φ/ψ)
  • Sidechain dihedrals, given the residue type of both target and template
  • Force field stereochemistry (bond, angle, dihedral)
  • Statistical potentials
  • Other experimental constraints
  • Etc.


3. Satisfy spatial restraints

• Satisfaction of spatial restraints
  • Represent the system at the appropriate level(s) of resolution (e.g. atoms, residues, domains, proteins)
  • Convert each data source into spatial restraints (e.g. a harmonic distance restraint, which acts like a "spring")
  • Sum all restraints into a scoring function
  • Generate models that are consistent with all restraints by optimizing the scoring function (e.g. conjugate gradients, molecular dynamics, Monte Carlo)
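The steps above can be sketched on a toy system: a few points, harmonic ("spring") distance restraints summed into one scoring function, and an optimizer that moves the points until the restraints are satisfied. This is an illustration only and not Modeller's implementation: plain gradient descent is used here in place of conjugate gradients or molecular dynamics, and the three-point system, the target distances and the force constant are all made up.

```python
import math

def score(coords, restraints):
    """Sum of harmonic terms 0.5 * k * (d - d0)^2 over all restraints."""
    total = 0.0
    for i, j, d0, k in restraints:
        d = math.dist(coords[i], coords[j])
        total += 0.5 * k * (d - d0) ** 2
    return total

def gradient(coords, restraints):
    """Derivative of the score with respect to every coordinate."""
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j, d0, k in restraints:
        d = math.dist(coords[i], coords[j])
        if d == 0.0:
            continue  # direction undefined for coincident points
        factor = k * (d - d0) / d
        for axis in range(3):
            g = factor * (coords[i][axis] - coords[j][axis])
            grad[i][axis] += g
            grad[j][axis] -= g
    return grad

def minimize(coords, restraints, step=0.05, n_steps=5000):
    """Plain gradient descent (a simple stand-in for conjugate gradients)."""
    coords = [list(c) for c in coords]
    for _ in range(n_steps):
        grad = gradient(coords, restraints)
        for c, g in zip(coords, grad):
            for axis in range(3):
                c[axis] -= step * g[axis]
    return coords

# Hypothetical restraints (i, j, target distance in Å, force constant).
restraints = [(0, 1, 3.8, 1.0), (1, 2, 3.8, 1.0), (0, 2, 5.5, 1.0)]
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
model = minimize(start, restraints)
```

In a real comparative model there are many thousands of restraints that cannot all be satisfied at once, so the optimizer looks for the least-violated compromise rather than a zero score.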


• All information is combined into a single objective function
  • The force field (CHARMM 22) is simply added in
  • The function is optimized by conjugate gradients and simulated-annealing molecular dynamics, starting from the target sequence threaded onto the template structure(s)
  • Building multiple models is generally recommended
  • The 'best' model, or cluster of models, is chosen by simply taking the lowest objective function score, or by using a model assessment method such as Modeller's own DOPE or GA341, or external programs such as PROSA or DFIRE


4. Assess results


• How do we know if the model is a good one?
  • Check the log file for restraint violations and the Modeller score (molpdf); this is not fully reliable, since the scoring function is not perfect!
  • Use another assessment score on the final model
    • Statistical potentials: GA341, DOPE, QMEAN
    • Other programs (e.g. Prosa, Verify3D)
  • Use structure assessment programs (e.g. ProCheck)
  • Fit the model to some other experimental data not used in the modeling procedure


Typical assessments


[Figure panels: DOPE profile; Ramachandran plot (ProCheck); PROSA profile]


Structural alignment

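[Figure: structural alignment of thioredoxins from humans (red) and the fly Drosophila melanogaster (yellow)]

Root-mean-square deviation (RMSD):

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left\lVert \mathbf{x}_{i,k} - \mathbf{x}_{j,k} \right\rVert^{2}}$$

where $\mathbf{x}_{i,k}$ and $\mathbf{x}_{j,k}$ are the coordinate vectors of atom $k$ in structures $i$ and $j$, respectively, and $N$ is the number of atoms compared in the two structures.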
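The RMSD between two structures can be computed in a few lines once the atoms have been paired. The sketch below is a minimal version under two stated assumptions: the two coordinate lists contain the same atoms in the same order, and the structures have already been superimposed (no optimal rotation or translation is performed here). The coordinates are made up.

```python
import math

def rmsd(coords_i, coords_j):
    """Root-mean-square deviation between two equally long coordinate lists."""
    if len(coords_i) != len(coords_j) or not coords_i:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_i, coords_j))
    return math.sqrt(squared / len(coords_i))

# Hypothetical coordinates: the second structure is the first shifted by 1 Å in x.
structure_i = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
structure_j = [(x + 1.0, y, z) for x, y, z in structure_i]
shift_rmsd = rmsd(structure_i, structure_j)
```

Because every atom is displaced by exactly 1 Å, the RMSD is 1 Å. Without prior superposition the value also includes any rigid-body offset, which is why structural alignment comes first.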


Typical errors in comparative models


Model Accuracy as a Function of Target-Template Sequence Identity


Model accuracy


Applications of protein structure models

• Drug design
• Virtual screening
• Docking
• Binding site detection
• Mutagenesis design
• Functional relationship
• Topology recognition
• Family assignment
• Overall fold


Model refining

• Loop optimization
  • Often, there are parts of the sequence which have no detectable templates
  • "Mini folding problem": these loops must be sampled to get improved conformations
  • Database searches are only complete for 4-6 residue loops
  • Modeller uses a conformational search with a custom energy function optimized for loop modeling (statistical potential derived from the PDB)
    • Fiser/Melo protocol ('loopmodel')
    • Newer DOPE + GB/SA protocol ('dope_loopmodel')


[Figure: accuracy of loop models as a function of the amount of optimization]

[Figure: fraction of loops modeled with medium accuracy (<2 Å)]


Advanced topics

v Modeller can also• Perform more sensitive searches for templates (sequence-profile,

profile-profile, similar to PSI-BLAST)• Incorporate ligands, RNA/DNA and water molecules into built

models• Build structures of multi-chain proteins (homo or hetero)• Add extra restraints to the modeling process (such as known

distances, e.g. from FRET)• Use multiple templates to build a model

v Remember:

Friday, March 1, 13

Page 74: Computational Molecular Biology Protein Structure and ...molsim.sci.univr.it/2014_bioinfo2/Structural_Modeling.pdf · Computational Molecular Biology Protein Structure and Homology

Advanced topics

v Modeller can also

• Perform more sensitive searches for templates (sequence-profile, profile-profile, similar to PSI-BLAST)

• Incorporate ligands, RNA/DNA and water molecules into built models

• Build structures of multi-chain proteins (homo or hetero)

• Add extra restraints to the modeling process (such as known distances, e.g. from FRET)

• Use multiple templates to build a model

v Remember:

• You don’t have to use Modeller for template search, alignment, assessment or refinement. If you know your template (e.g. from BLAST) just format the alignment for Modeller and skip straight to the model building step!
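Formatting an alignment for Modeller can be sketched as below. This is a minimal sketch, assuming the PIR layout that Modeller reads: a `>P1;code` line, a header of ten colon-separated fields, then the aligned sequence ending in `*`. The codes `tmpl` and `qseq`, the residue numbers, the chain and the sequences are hypothetical placeholders. Check the field details against the Modeller manual before use.

```python
def wrap(seq, width=75):
    """Split a sequence string into lines of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def pir_entry(code, aligned_seq, kind="sequence", first="", chain_first="",
              last="", chain_last="", name="", source="",
              resolution="", r_factor=""):
    """Return one PIR block: '>P1;code', a 10-field header, sequence + '*'."""
    fields = [kind, code, first, chain_first, last, chain_last,
              name, source, resolution, r_factor]
    header = ":".join(str(f) for f in fields)
    return "\n".join([f">P1;{code}", header] + wrap(aligned_seq + "*"))


def pir_alignment(template, target):
    """Join a template entry and a target entry into one alignment file body."""
    return pir_entry(**template) + "\n\n" + pir_entry(**target) + "\n"


# Hypothetical 12-column alignment; both rows must have the same length.
template = dict(code="tmpl", aligned_seq="VGA--HAGEYGA", kind="structureX",
                first=1, chain_first="A", last=10, chain_last="A")
target = dict(code="qseq", aligned_seq="VEAADDVAGH--")

print(pir_alignment(template, target))
```

The template entry is marked `structureX` (an X-ray structure whose coordinates Modeller will read) and the target entry `sequence`; gaps are written as `-` in both rows.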


Hidden Markov Models

A dishonest croupier could use a die that has a higher probability of landing on a “6” (e.g., 50%). To avoid being caught, the croupier can switch between a fair die and a loaded die with a certain frequency. For example, he can change the die from fair to loaded after 20 rolls on average and from loaded to fair after 10 rolls on average.

v Likelihood evaluation: Given a series of emissions X1, X2, X3, ..., what is the probability that our model emitted the observed sequence?

v Alignment: Given the sequence of observed emissions, what is the most likely sequence of hidden states that generated it?

v Training: How can we optimize the statistical parameters in order to maximize the probabilities in points 1 and 2?
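The first two questions can be sketched for the croupier example. This is a minimal sketch, assuming switching probabilities of 1/20 (fair to loaded) and 1/10 (loaded to fair) per roll, a loaded die that shows the other five faces with probability 0.1 each, and equal start probabilities. The start probabilities are an assumption, not given on the slide. Training (e.g. Baum-Welch) is not shown.

```python
import math

STATES = ("fair", "loaded")
START = {"fair": 0.5, "loaded": 0.5}  # assumption: not given on the slide
TRANS = {"fair":   {"fair": 0.95, "loaded": 0.05},  # leaves fair after ~20 rolls
         "loaded": {"fair": 0.10, "loaded": 0.90}}  # leaves loaded after ~10 rolls
EMIT = {"fair":   {r: 1 / 6 for r in range(1, 7)},
        "loaded": {**{r: 0.1 for r in range(1, 6)}, 6: 0.5}}


def forward(rolls):
    """Likelihood evaluation: P(rolls | model), summed over all hidden paths."""
    f = {s: START[s] * EMIT[s][rolls[0]] for s in STATES}
    for x in rolls[1:]:
        f = {t: EMIT[t][x] * sum(f[s] * TRANS[s][t] for s in STATES)
             for t in STATES}
    return sum(f.values())


def viterbi(rolls):
    """Alignment: the single most probable hidden-state path (log space)."""
    v = {s: math.log(START[s] * EMIT[s][rolls[0]]) for s in STATES}
    back = []  # back[i][t] = best predecessor of state t at roll i + 1
    for x in rolls[1:]:
        step, new_v = {}, {}
        for t in STATES:
            best = max(STATES, key=lambda s: v[s] + math.log(TRANS[s][t]))
            step[t] = best
            new_v[t] = v[best] + math.log(TRANS[best][t]) + math.log(EMIT[t][x])
        back.append(step)
        v = new_v
    state = max(STATES, key=v.get)
    path = [state]
    for step in reversed(back):  # trace the best path backwards
        state = step[state]
        path.append(state)
    return path[::-1]


rolls = [6, 6, 6, 6, 6, 6, 6, 6, 6, 6]
print(forward(rolls))
print(viterbi(rolls))
```

Both functions are dynamic programming over the same table of (roll, state) cells: `forward` sums over predecessors, `viterbi` takes the maximum and remembers it.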


Hidden Markov Models: Protein Structural Bioinformatics

• In structure prediction, models can best be thought of as “sequence generators” (e.g., Hidden Markov Models) or “sequence classifiers” (e.g., Neural Networks)

v Likelihood evaluation: Performed using dynamic programming algorithms (similar to the ones used in sequence alignments)

v Alignment

• Thus, given a model and a sequence, we want to determine the probability that a specific (query) sequence was generated by the model along each of its possible paths.

v Training: The model is ‘trained’ on aligned protein families.


Hidden Markov Models

v Described by

ü A set of possible states: match, insert, delete.

ü A set of possible observations: frequencies of aa in each position.

ü A transition probability matrix

ü An emission probability matrix (frequencies of aa occurring in a particular state).

ü Initial state probabilities.


Sequence profiles are a condensed representation of alignments

HBA_human  ... W G K V G A - - H A G E ...
HBB_human  ... W G K V - - - - N V D E ...
MYG_phyca  ... W G K V E A - - D V A G ...
LGB2_luplu ... W K D F N A - - N I P K ...
GLB1_glydi ... W E E I A G A D N G A G ...

Figure label: master sequence

Each column of the profile pj(a) contains the amino acid frequencies in the multiple sequence alignment. Alignment columns 7 and 8, occupied only in GLB1_glydi, have no profile column.

Alignment column    1     2     3     4     5     6     9     10    11    12
A                   0     0     0     0     0.25  0.75  0     0.2   0.4   0
D                   0     0     0.2   0     0     0     0.2   0     0.2   0
E                   0     0.2   0.2   0     0.25  0     0     0     0     0.4
F                   0     0     0     0.2   0     0     0     0     0     0
G                   0     0.6   0     0     0.25  0.25  0     0.2   0.2   0.4
H                   0     0     0     0     0     0     0.2   0     0     0
I                   0     0     0     0.2   0     0     0     0.2   0     0
K                   0     0.2   0.6   0     0     0     0     0     0     0.2
L                   0     0     0     0     0     0     0     0     0     0
N                   0     0     0     0     0.25  0     0.6   0     0     0
P                   0     0     0     0     0     0     0     0     0.2   0
V                   0     0     0     0.6   0     0     0     0.4   0     0
W                   1.0   0     0     0     0     0     0     0     0     0
C                   0     0     0     0     0     0     0     0     0     0
M                   0     0     0     0     0     0     0     0     0     0
T                   0     0     0     0     0     0     0     0     0     0
Q                   0     0     0     0     0     0     0     0     0     0
R                   0     0     0     0     0     0     0     0     0     0
S                   0     0     0     0     0     0     0     0     0     0
Y                   0     0     0     0     0     0     0     0     0     0
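The profile frequencies can be recomputed from the five aligned globin segments. This is a minimal sketch: gaps are ignored when counting, and a column is skipped when fewer than half of the sequences have a residue there. The `min_occupancy` threshold is an assumption chosen here to drop the two insertion columns, not something stated on the slide, and no pseudocounts are added.

```python
from collections import Counter

ALIGNMENT = {
    "HBA_human":  "WGKVGA--HAGE",
    "HBB_human":  "WGKV----NVDE",
    "MYG_phyca":  "WGKVEA--DVAG",
    "LGB2_luplu": "WKDFNA--NIPK",
    "GLB1_glydi": "WEEIAGADNGAG",
}


def build_profile(alignment, min_occupancy=0.5):
    """Return a list of {amino_acid: frequency} dicts, one per kept column."""
    seqs = list(alignment.values())
    profile = []
    for column in zip(*seqs):
        residues = [c for c in column if c != "-"]
        if len(residues) / len(seqs) < min_occupancy:
            continue  # mostly gaps: treated as an insertion, not a profile column
        counts = Counter(residues)
        profile.append({aa: n / len(residues) for aa, n in counts.items()})
    return profile


profile = build_profile(ALIGNMENT)
for j, col in enumerate(profile, start=1):
    print(j, {aa: round(p, 2) for aa, p in sorted(col.items())})
```

Amino acids that never occur in a column are simply absent from its dictionary, which corresponds to a frequency of 0.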


HMMs include position-specific gap penalties

HBA_human  ... V G A . . H A G E Y ...
HBB_human  ... V - - . . N V D E V ...
MYG_phyca  ... V E A . . D V A G H ...
LGB2_luplu ... F N A . . N I P K H ...
GLB1_glydi ... I A G a d N G A G V ...

Column states: M/D M/D M/D I I M/D M/D M/D M/D M/D (M/D = match or delete, I = insert)

Deletions are shown as “-”; insertions are shown as lowercase letters in the columns marked I.

Match/delete column     1     2     3     4     5     6     7     8
A                       0     0.25  0.75  0     0.2   0.4   0     0
D                       0     0     0     0.2   0     0.2   0     0
E                       0     0.25  0     0     0     0     0.4   0
F                       0.2   0     0     0     0     0     0     0
G                       0     0.25  0.25  0     0.2   0.2   0.4   0
H                       0     0     0     0.2   0     0     0     0.4
I                       0.2   0     0     0     0.2   0     0     0
K                       0     0     0     0     0     0     0.2   0
L                       0     0     0     0     0     0     0     0
N                       0     0.25  0     0.6   0     0     0     0
P                       0     0     0     0     0     0.2   0     0
C                       0     0     0     0     0     0     0     0
M                       0     0     0     0     0     0     0     0
W                       0     0     0     0     0     0     0     0
Y                       0     0     0     0     0     0     0     0.2
...

Probabilities for:
Delete open    M→D      0.2   0     0     0     0     0     0     0
Delete extend  D→D      0     1.0   0     0     0     0     0     0
Insert extend  I→I      0     0     0.5   0     0     0     0     0
Insert open    M→I      0     0     0.25  0     0     0     0     0
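The position-specific gap probabilities can be counted directly from the annotated alignment. This is a minimal sketch, assuming each aligned sequence is read as a state path (residue in an M/D column = M, “-” in an M/D column = D, lowercase letter in an I column = I) and that each transition probability is the observed fraction of sequences leaving a given state, with no pseudocounts. The function names are illustrative.

```python
from collections import defaultdict

ALIGNMENT = {
    "HBA_human":  "VGA..HAGEY",
    "HBB_human":  "V--..NVDEV",
    "MYG_phyca":  "VEA..DVAGH",
    "LGB2_luplu": "FNA..NIPKH",
    "GLB1_glydi": "IAGadNGAGV",
}
COLUMN_TYPES = "MMMIIMMMMM"  # M = match/delete column, I = insert column


def state_path(seq, column_types):
    """Return [(state, match_position), ...] for one aligned sequence."""
    path, pos = [], 0  # pos = number of match/delete columns seen so far
    for ch, kind in zip(seq, column_types):
        if kind == "M":
            pos += 1
            path.append(("D" if ch == "-" else "M", pos))
        elif ch not in ".-":
            path.append(("I", pos))  # insertion after match position `pos`
    return path


def transition_probs(alignment, column_types):
    """Return {(position, 'X->Y'): probability} estimated by counting."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in alignment.values():
        path = state_path(seq, column_types)
        for (state, pos), (nxt, _) in zip(path, path[1:]):
            counts[(state, pos)][nxt] += 1
    probs = {}
    for (state, pos), successors in counts.items():
        total = sum(successors.values())
        for nxt, n in successors.items():
            probs[(pos, f"{state}->{nxt}")] = n / total
    return probs


probs = transition_probs(ALIGNMENT, COLUMN_TYPES)
for key in [(1, "M->D"), (2, "D->D"), (3, "M->I"), (3, "I->I")]:
    print(key, probs[key])
```

With these counts the deletion in HBB_human gives a delete-open probability of 0.2 after column 1 and a delete-extend probability of 1.0 after column 2, and the two inserted residues in GLB1_glydi give an insert-open probability of 0.25 and an insert-extend probability of 0.5 after column 3.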


A profile HMM can be represented as states connected by transitions

HBA_human  ... V G A . . H A G E Y ...
HBB_human  ... V - - . . N V D E V ...
MYG_phyca  ... V E A . . D V A G H ...
LGB2_luplu ... F N A . . N I P K H ...
GLB1_glydi ... I A G a d N G - G V ...

Column states: M/D M/D M/D I I M/D M/D M/D M/D M/D

[Figure: HMM p drawn as a chain of match (M) states, each with an associated delete (D) and insert (I) state, connected by transitions.]

What is the probability that a sequence is emitted by an HMM rather than by a random model?

The probability of emitting the sequence x1, ..., xL along a given path through an HMM is: P(x1, ..., xL | emission on path).

This probability is a product of the amino acid emission probabilities for each state on the path and the transition probabilities between states.
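The product described above, and its comparison with a random model, can be sketched as follows. This is a minimal sketch: the emission values are taken from two profile columns of the globin example, the M1→M2 transition value of 0.8 and the uniform background of 1/20 per amino acid are assumptions, and begin/end transitions are left out.

```python
import math


def path_probability(path, emit, trans):
    """path: list of (state, symbol); symbol is None for silent delete states."""
    p = 1.0
    for i, (state, symbol) in enumerate(path):
        if symbol is not None:
            p *= emit[state][symbol]              # emission at this state
        if i + 1 < len(path):
            p *= trans[(state, path[i + 1][0])]   # transition to next state
    return p


def log_odds(path, emit, trans, background):
    """log2 of P(sequence along path | HMM) / P(sequence | random model)."""
    p_null = math.prod(background[sym] for _, sym in path if sym is not None)
    return math.log2(path_probability(path, emit, trans) / p_null)


emit = {"M1": {"V": 0.6, "F": 0.2, "I": 0.2},
        "M2": {"G": 0.25, "E": 0.25, "N": 0.25, "A": 0.25}}
trans = {("M1", "M2"): 0.8, ("M1", "D2"): 0.2}  # 0.8 is an assumed value
background = {aa: 1 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}

path = [("M1", "V"), ("M2", "G")]
print(path_probability(path, emit, trans))
print(log_odds(path, emit, trans, background))
```

A positive log-odds score means the path through the HMM explains the sequence better than the random model does.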


Profile HMM can be represented as states connected by transitions

[Figure: HMM p drawn as a chain of match (M), insert (I) and delete (D) states, shown next to its matrix of emission probabilities pi(a) (rows A, C, W, Y visible) and transition probabilities pi(X→Y) (rows M→D, D→D, I→I, M→I).]


Find the path through two HMMs that maximizes the co-emission probability

[Figure: HMM q and HMM p drawn as two chains of match (M), insert (I) and delete (D) states; a path of paired states (state q, state p) co-emits the sequence x1 x2 x3 x4 x5 x6.]

Include a null model: maximize the “log-sum-of-odds score”

Söding, J. (2005) Bioinformatics 21, 951-960.
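The score for pairing one column of HMM q with one column of HMM p can be sketched as below. This is a minimal sketch, assuming the log-sum-of-odds column score has the form log Σ_a q(a) p(a) / f(a), with f(a) the null-model (background) frequency. The uniform background, the base-2 logarithm and the function name are assumptions made here for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = {aa: 1 / 20 for aa in AMINO_ACIDS}  # assumption: uniform null model


def column_score(q_col, p_col, background=BACKGROUND):
    """log2 of sum_a q(a) * p(a) / f(a) for two profile columns."""
    total = sum(q_col.get(aa, 0.0) * p_col.get(aa, 0.0) / background[aa]
                for aa in AMINO_ACIDS)
    return math.log2(total) if total > 0 else float("-inf")


q_col = {"V": 0.6, "F": 0.2, "I": 0.2}
p_col = {"V": 0.4, "A": 0.2, "I": 0.2, "G": 0.2}
print(column_score(q_col, p_col))
```

Two columns that share no amino acid score minus infinity here; profiles with pseudocounts avoid that case because every frequency is then greater than zero.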


Exercise

v Target:

• human thioesterase 8: interacts with HIV-1 Nef protein.

v Procedure:

• Search for templates using HHpred

• Prepare Modeller input files

• Build the models

• Evaluate the model structure

v Materials and Methods:

• UniProt

• Modeller (http://salilab.org/modeller/)

• Modeller manual

• ProCheck web server (http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/Generate.html)

• Prosa web server (https://prosa.services.came.sbg.ac.at/prosa.php)


Fold recognition

Profile method

Principle: Find a compatible fold

For each aa we can calculate the frequency in:

- Secondary structure elements

- Surface of the protein

- Hydrophobic environment

- ...

Each aa is substituted by a letter (property)

From the structure we can analyze positions in terms of:

- Presence in secondary structure element

- Percentage of solvent exposure

- Hydrophobic or polar environment?

Thus each structure is converted into a property sequence, not an aa sequence.

The PDB becomes a ‘property sequence’ DB. Thus we just have to align ‘property sequences’.

>Target Sequence XY
MSTLYEKLGGTTAVDLAVAAVA
GAPAHKRDVLNQ

Build a model of the target protein based on each template structure

Rank models according to SCORE or ENERGY
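Turning a structure into a ‘property sequence’ can be sketched as below. This is a minimal sketch with a hypothetical six-letter alphabet: secondary structure (helix H, strand E, anything else treated as coil C) combined with buried or exposed, using an assumed relative-accessibility cutoff of 0.25. The letters, the cutoff and the example annotations are all illustrative, not values from the slides.

```python
# Hypothetical environment alphabet: (secondary structure, burial) -> letter
ENVIRONMENT_LETTERS = {
    "Hb": "A", "He": "B",   # helix, buried / exposed
    "Eb": "C", "Ee": "D",   # strand, buried / exposed
    "Cb": "E", "Ce": "F",   # coil, buried / exposed
}


def property_letter(secondary_structure, relative_accessibility,
                    buried_below=0.25):
    """Map one structural position to a one-letter environment class."""
    ss = secondary_structure if secondary_structure in ("H", "E") else "C"
    burial = "b" if relative_accessibility < buried_below else "e"
    return ENVIRONMENT_LETTERS[ss + burial]


def property_sequence(positions):
    """positions: list of (secondary_structure, relative_accessibility)."""
    return "".join(property_letter(ss, acc) for ss, acc in positions)


# Hypothetical per-residue annotations for six positions of a template.
positions = [("H", 0.05), ("H", 0.60), ("H", 0.10),
             ("T", 0.80), ("E", 0.02), ("E", 0.40)]
print(property_sequence(positions))
```

Once every template is stored as such a string, comparing a target against the library reduces to aligning sequences over this small alphabet, using scores for how often each amino acid is found in each environment.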


Fold recognition

[Figure: residue letters (M, A, T, E, A, F, T, S, G, Q) from the threading illustration.]

v Threading methods

Ø Statistical Potentials

Ø Programs:

• Threader, mGenTHREADER.

• Several approximations: the frozen approximation is used to accelerate calculations

• In the past, used for remote homology assessment

• Now used in automatic projects for the structural prediction of the entire human genome.


New folds

v ‘Ab initio modeling’ or de novo prediction

Ø Folding by statistical approaches: ‘very’ coarse-grained

Ø Force Fields

Ø Fragment Assemblies.

Ø Structure with common structural motifs or supersecondary structures

Ø The relationship between local sequence and local structure is highly degenerate

Ø Programs: Fragfold and Rosetta

Ø These approaches were a real breakthrough in the field

Ø New folds, difficult crystal structures, difficult modeling, protein design: see articles by David Baker.


Fragment Assembly

MSSPQAPEDGQGCGDRGDPPGDLRSVLVTTV

ROSETTA: 9 aa fragments; choose the 25 closest sequences

FRAGFOLD: supersecondary structure elements and tri-, tetra- and pentapeptides; each fragment is energetically evaluated (statistical potential)

Optimization and assembly (statistical potential)

ROSETTA: simulated annealing of dihedral angles

FRAGFOLD: random combination of fragments; simulated annealing
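The assembly step can be sketched as fragment insertion with simulated annealing. This is a toy sketch, not the Rosetta or Fragfold implementation: the fragment library is random rather than picked from known structures, the energy is a stand-in (squared deviation from helix-like dihedral angles) for a statistical potential, and the cooling schedule and all parameter values are assumptions.

```python
import math
import random


def toy_energy(angles, target=(-60.0, -45.0)):
    """Stand-in for a statistical potential: deviation from helical angles."""
    return sum((phi - target[0]) ** 2 + (psi - target[1]) ** 2
               for phi, psi in angles)


def random_library(n_residues, frag_len=9, n_candidates=25, seed=1):
    """Hypothetical library: 25 candidate (phi, psi) fragments per window."""
    rng = random.Random(seed)
    return [[[(rng.gauss(-60, 40), rng.gauss(-45, 40)) for _ in range(frag_len)]
             for _ in range(n_candidates)]
            for _ in range(n_residues - frag_len + 1)]


def assemble(n_residues, library, frag_len=9, steps=2000,
             t_start=1000.0, t_end=1.0, seed=0):
    """Return (best_angles, best_energy, initial_energy)."""
    rng = random.Random(seed)
    angles = [(rng.uniform(-180, 180), rng.uniform(-180, 180))
              for _ in range(n_residues)]
    energy = initial = toy_energy(angles)
    best, best_energy = list(angles), energy
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        start = rng.randrange(n_residues - frag_len + 1)
        trial = list(angles)
        trial[start:start + frag_len] = rng.choice(library[start])
        delta = toy_energy(trial) - energy
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            angles, energy = trial, energy + delta
            if energy < best_energy:
                best, best_energy = list(angles), energy
    return best, best_energy, initial


sequence = "MSSPQAPEDGQGCGDRGDPPGDLRSVLVTTV"
best, best_energy, initial = assemble(len(sequence),
                                      random_library(len(sequence)))
print(round(initial), round(best_energy))
```

Each move replaces the dihedral angles of one 9-residue window with those of a candidate fragment, so the search explores combinations of locally plausible pieces rather than arbitrary angle changes.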
