Theoretical and computational methods for the study of biological macromolecules
Rachid C. Maroun, PhD
Unité de Bioinformatique Structurale
Institut Pasteur
Paris, FRANCE
PROGRAM
INTRODUCTION
PROTEIN STRUCTURE
PROTEIN CONFORMATION
FORCES THAT STABILIZE PROTEIN CONFORMATION
METHODOLOGICAL APPROACHES
APPLICATIONS
INTRODUCTION
Biological molecules are complex systems.
The true conformation is a network of loops, twists and folds, all stacked together in a well-defined dynamical 3D structure. It is this morphology, in general, that gives the protein its "life" and that determines its activity, its role.
The determination of the 3D structure of biological molecules and the way in which this structure is linked to the function and to the sequence is of fundamental importance.
Carbohydrates
-L-galactopyranose, a monosaccharide
Di- and polysaccharides
RNA
BRANCHPOINT HELIX FROM YEAST AND BINDING SITE FOR PHAGE GA/MS2 COAT PROTEINS
Important for biological function:
The complexes formed with
ligands (substrates, cofactors, inhibitors, drugs, receptor agonists and antagonists)
other macromolecules
Growth of databanks
3D protein structures
Nucleotide sequences
Growth of PDB
Thus, theoretical methods aim at:
Generating reliable 3D models in a reasonable amount of time
Avoiding an exhaustive experimental determination of the structures of all sequences, e.g. Structural Genomics
Relationships between the sequence and 3D structure spaces
[Figure: sequences sharing > 35% or < 35% identity mapped onto 3D structures, illustrated with 1CEM00 (cellulase). CATH classification levels shown: topology (fold family), homologous superfamily.]
Relationships between the sequence, 3D structure and function spaces
[Figure: sequences, functions and structures linked at > 35% or < 35% identity, illustrated with 1CEM00 (cellulase); labels "Alpha/alpha barrel" and "Gtransférase". CATH classification levels shown: architecture, topology (fold family), homologous superfamily.]
Some conclusions
• The space of structures is finite and much smaller than that of sequences and functions.
• In evolutionary terms, the 3D structure is more conserved than the sequence.
PROTEIN STRUCTURE
Structure levels of a polypeptide chain
Primary, i.e. the sequence
MELIRVLANLLILQLSYAQKSSELVFGGDECNINEHR
SLVVLFNSNGFLCGGTLINQDWVVTAAHCDSNNF…
Composition of proteins
-The amino acid residues-
Some representative types of the 20 naturally occurring amino acids:
Arginine (R): + charge
Asparagine (N): hydrophilic
Aspartate (D): - charge, hydrophilic
Histidine (H)
Lysine (K): + charge
Tyrosine (Y): aromatic
Valine (V): aliphatic, hydrophobic
Methionine (M): hydrophobic
Properties of amino acid residues: http://www.imb-jena.de/IMAGE_AA.html
Secondary structure
The main chain N-Cα and Cα-C bonds are free to rotate. These rotations are represented by the torsion angles φ and ψ respectively.
The peptide bond is planar:
ω = 180° trans
ω = 0° cis
Right-handed α-helix
Ideal values
φ = -57.8°
ψ = -47.0°
Properties
Pitch p = 5.4 Å
N = 3.6 amino acid residues / turn
Rise r = 1.5 Å / residue
Backbone radius = 2.3 Å
H-bond between the C=O of residue i and the N-H of residue i+4
Peptide planes are roughly parallel with the helix axis. Each peptide unit has a dipole moment.
Thus, the dipoles within the helix are aligned giving rise to a macrodipole moment.
Side chains point outward from helix axis.
Formation of the helix is cooperative.
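The helix parameters listed on this slide are linked by a simple relation: pitch = residues per turn x rise per residue. A minimal Python sketch of that arithmetic follows; the function names are hypothetical and chosen only for illustration.

```python
# Ideal alpha-helix parameters quoted on the slide.
RESIDUES_PER_TURN = 3.6   # N, residues per turn
RISE_PER_RESIDUE = 1.5    # r, in angstroms per residue

def helix_pitch(residues_per_turn=RESIDUES_PER_TURN, rise=RISE_PER_RESIDUE):
    """Pitch p (angstroms) = residues per turn x rise per residue."""
    return residues_per_turn * rise

def helix_length(n_residues, rise=RISE_PER_RESIDUE):
    """Approximate length (angstroms) along the axis of an ideal helix."""
    return n_residues * rise

print(f"pitch = {helix_pitch():.1f} A")
print(f"length of a 10-residue helix = {helix_length(10):.1f} A")
```

The computed pitch agrees with the 5.4 Å value given above, which is a quick consistency check on the three numbers.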
Loops: peptide fragments that connect regular secondary structure elements (α-helices or β-strands)
Furnish the directional changes necessary to obtain a globular form
Often found at the surface of globular proteins
Form hydrogen bonds with water
Are in general very flexible
Other loops have specific non-repetitive, stable and ordered structures
Have a length of 2-16 residues
Classification of the structure of protein loops
Type: AR beta-beta link
Type: EH beta-alpha
Type: HA beta-beta hairpin
Type: HE alpha-beta
Type: HH alpha-alpha
[Figure: a loop between two anchoring secondary structures, with vectors v1, v2 and v3.]
http://sbi.imim.es/cgi-bin/archdb/loops.pl?
β-turns (reverse turns)
A special case of loops with < 6 residues.
Ideal values:
β-turn     I       II
φ(i+1)    -60°    -60°
ψ(i+1)    -30°    120°
φ(i+2)    -90°     80°
ψ(i+2)      0°      0°
Secondary structures of globular proteins
Occurrence (%): simple loops 21; reverse turns 15; complex loops 10; helices 26; β-sheets 19
Average length (residues): helices 9.3; β-sheets 5.3; loops 5.9
Tertiary structure
Array of secondary structures => tertiary structure (the fold)
Relative positioning of the secondary structures
Interactions that stabilize the new level of structure
Covalent bonds
• S-S bridges
Non-covalent bonds
• Hydrogen bonds
• Salt (ionic) bridges
• Hydrophobic effect
Folding is cooperative
Effects of the tertiary structure
Induction of a given secondary structure
New spatial distribution of the residues
Solvent-exposed
Buried
New functionality
Example: sperm whale myoglobin
The protein is complexed to protoporphyrin IX containing Fe
Hierarchical classification of protein tertiary structures
CATH: www.biochem.ucl.ac.uk/bsm/cath/
SCOP: scop.mrc-lmb.cam.ac.uk/scop/
DALI: www.ebi.ac.uk/dali/
Quaternary structure
Assemblage of tertiary structures to produce a higher level of structure
Some properties of quaternary structure
Protein polymers
Closed aggregates or oligomers
• Homo
• Hetero
Symmetry
Chemistry
Stability
Covalent bonds
Non-covalent bonds
Cooperativity
Structural and functional regulation
• Allostery, e.g. the oxy to deoxy transition of the 4-mer of hemoglobin
Chemical or biological activity
No consequences => monomer as active as the oligomer
New activity, absent without oligomerization, e.g. the active site residues may come from several subunits
Examples
Snake venom vipoxin complex. Heterodimer: chain A, phospholipase A2 inhibitor, and chain B, phospholipase A2.
Human EphB2 receptor SAM domain. Homohexameric.
Complexity in quaternary structure
The monomer of chaperonin GroEL (HSP60 class)
The tetradecamer
Other biopolymers
A nucleic acid polyphosphate chain has an even larger number of potential conformations, given that it contains 6 backbone torsion angles per monomer
PROTEIN CONFORMATION
The conformational hyperspace
-The potential energy landscape-
The conformation of a biopolymer is a function of a large number of degrees of freedom.
The surface described by the potential energy function in this n-dimensional space is very complex.
If the bond lengths and valence angles of a polypeptide chain are fixed, the chain contains 2 degrees of freedom per residue, the torsion angles φ and ψ; these uniquely determine the conformation of the molecule.
Rotational Isomeric State (RIS) Theory
For a given bond, the torsion angles may adopt a discrete and finite number of states that correspond to the minima of the potential energy function.
$C = m^n$
C: number of conformations
m: number of rotational states
n: number of bonds
For m = 3 and n = 100, $C = 3^{100} \sim 10^{48}$
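The RIS count can be checked directly, since Python integers have arbitrary precision. This is only a sketch of the arithmetic; the function name is hypothetical.

```python
import math

def ris_conformations(m, n):
    """Number of conformations C = m**n for n bonds with m rotational states each."""
    return m ** n

c = ris_conformations(3, 100)
print(f"C has {len(str(c))} digits, log10(C) = {math.log10(c):.2f}")
```

With m = 3 and n = 100 the result is a 48-digit number, in line with the order of magnitude quoted on the slide.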
The native state is assumed to be the state of global minimum energy.
ΔG of the native state <----> denatured state transition may be very small
Even with the use of RIS theory, the polypeptide chain has a very large number of potential conformations.
=> Formation of a conformational hyperspace composed of a multitude of minima and maxima of the energy function, with many non-native low energy conformations separated by (high) energy barriers.
A given energy-optimized geometry depends on the starting geometry.
For complex functions of several variables, there is no analytical solution for elucidation of the global minimum.
Even with numerical methods, the possibility exists of becoming trapped in local minima.
Furthermore, it is impossible to search and examine exhaustively all the accessible conformations.
Thus, the need to face and circumvent this problem => algorithms for the prediction of protein structure.
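The dependence of an optimized geometry on its starting point can be illustrated on a one-dimensional toy surface. The sketch below assumes a hypothetical asymmetric double-well potential (arbitrary units, not a real force field) and a plain steepest-descent loop with a fixed step; both are illustrative choices, not part of the slides.

```python
def energy(x):
    """Toy asymmetric double-well potential (hypothetical, arbitrary units)."""
    return x**4 - 2.0 * x**2 + 0.3 * x

def gradient(x):
    """Analytical first derivative dU/dx of the toy potential."""
    return 4.0 * x**3 - 4.0 * x + 0.3

def minimize(x, step=0.01, n_steps=5000):
    """Plain steepest descent: repeatedly move against the gradient."""
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x

for start in (-1.5, 1.5):
    x_min = minimize(start)
    print(f"start {start:+.1f} -> minimum at x = {x_min:+.3f}, U = {energy(x_min):+.3f}")
```

The two starting points end in different wells, and only one of them is the global minimum: the minimizer started on the right stays trapped in the higher-energy local minimum.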
Prediction of protein side-chain conformations or rotamers
Important component of any modeling method (homology modeling, ab initio structure prediction)
Applications include study of mutations.
The side chain torsion angles are named χ1, χ2, χ3, etc. and the atoms β, γ, δ, etc.
Problem: side chains can adopt several conformations.
Example: the aspartate residue, 2 χ angles, 9 rotamers.
Example: the arginine residue, 5 angles.
Side chain conformational search
Combinatorial problem: a complex search problem among interacting side chains in order to find a global minimum.
The minimum set of variables to consider is:
the number of rotamers for each side chain
the number of neighboring side chains interacting with each rotamer
The rotamers for the 20 amino acid residues are stored in databases (e.g. the SCWRL library, http://dunbrack.fccc.edu/SCWRL3.php).
The number of rotamers in a library depends on the χ angle cutoff.
For a 40° χ angle cutoff, a library contains 214 side-chain rotamers.
The energy function or score function may be based on the rotamer library and other terms, such as a repulsive steric energy.
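The combinatorial nature of the side-chain search can be made concrete by counting rotamer combinations. In the sketch below only the aspartate count (9) comes from the slides; the other counts and the function name are hypothetical placeholders, not values from SCWRL or any real rotamer library.

```python
import math

# Rotamer counts per residue type. Only ASP = 9 is taken from the text;
# the remaining values are hypothetical placeholders for illustration.
ROTAMERS = {"ASP": 9, "SER": 3, "LEU": 9, "ARG": 81}

def rotamer_combinations(residues, rotamers=ROTAMERS):
    """Size of the search space: product of the rotamer counts of all residues."""
    return math.prod(rotamers[name] for name in residues)

site = ["ASP", "SER", "LEU", "ARG"]
print(rotamer_combinations(site))       # four interacting residues
print(rotamer_combinations(site * 5))   # twenty interacting residues
```

The count grows multiplicatively with each added residue, which is why exhaustive enumeration quickly becomes impractical.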
FORCES THAT STABILIZE PROTEIN CONFORMATION
A number of interaction energies stabilize the structure of proteins and need to be taken into account in order to quantify the different types of energy that govern the behavior and the stability of a molecule:
Hydrogen bonds and the aqueous solvent
Hydrophobic effect
van der Waals interactions (steric)
Electrostatic interactions (ionic or salt bridges)
Covalent "cross-link" bonds, such as disulfide bonds
The hydrogen bond
An H atom is attracted by rather strong forces to 2 atoms instead of only one, so it may be considered to be acting as a bond between them.
The partially positively charged H atom lies between partially negatively charged O and N atoms.
In water, H is covalently attached to the O (about 492 kJ mol-1) but has an additional attraction (about 23.3 kJ mol-1, almost 10x the average thermal fluctuation at 25°C) to a neighboring O of another water molecule.
That is far greater than the included van der Waals interaction (about 5.5 kJ mol-1).
The bond is part electrostatic (90%) and part covalent (10%).
The bond may be approximated by the following states:
covalent HO-H····OH2 (major)
ionic HOδ−-Hδ+····Oδ−H2
covalent HO−····H-O+H2 (minor)
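The comparison with the thermal energy can be checked numerically. The sketch below assumes that "average thermal fluctuation" means RT per mole, which is an interpretation and not stated on the slide; the function name is hypothetical.

```python
GAS_CONSTANT = 8.314462618e-3  # R in kJ/(mol*K)

def thermal_energy(temp_celsius):
    """RT in kJ/mol at the given temperature (assumed measure of thermal fluctuation)."""
    return GAS_CONSTANT * (temp_celsius + 273.15)

H_BOND = 23.3  # kJ/mol, hydrogen-bond attraction in water quoted on the slide

ratio = H_BOND / thermal_energy(25.0)
print(f"RT at 25 C = {thermal_energy(25.0):.2f} kJ/mol, ratio = {ratio:.1f}")
```

Under that assumption the ratio comes out between 9 and 10, consistent with the "almost 10x" statement.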
Holds the two strands of the DNA double helix together
Holds polypeptides together in secondary structures
Helps enzymes bind to their substrate
Helps antibodies bind to their antigen
Helps transcription factors bind to each other
Helps transcription factors bind to DNA
Numerous functional forms exist for the hydrogen bond energy, one of which is that of Stockmayer
• $E_{hb} = 4\varepsilon \left[ (s/r')^{12} - (s/r')^{6} \right] - (\mu_o \mu_h / r'^{3})\, g(\theta_o, \theta_h, \phi)$
• r': distance between the hydrogen and the H-acceptor; ε, s: coefficients independent of r'; µ: dipole moments centered on the acceptor and the hydrogen; g: angular dependent function
The water solvent
A mediator in molecular interactions.
0.25-0.45 g associated for each gram of protein.
Polar molecule.
Essentially a proton donor, e.g. side chains like Asp and Glu are strongly hydrated.
As proton acceptor, the O (electronegative) links to H atoms of neighboring molecules.
40% of water H-bonds take place with the C=O group of the backbone and 44% with the side chains.
Liquid water forms clusters or cage structures. The H-bonds that stabilize this network are in constant fluctuation, breaking and reforming on a timescale of the order of the picosecond.
In solution, a locally ordered layer of water, which behaves differently from bulk water, is permanently associated with a protein.
Internal water molecules contribute strongly to the stability of protein structure.
When 2 macromolecules come together to form a complex, some surface water must be displaced and undergo significant rearrangement. The interface water influences the formation of supramolecular structures.
Water contributes to the stability of the tertiary and quaternary structure of proteins.
Water also plays an important role in enzymatic catalysis.
Water is associated with metallic ligands of proteins.
Solvation effects and the properties of water are very important and help to better understand biological processes.
Small ions (counter-ions such as Na+, K+, Ca++) and other charged molecules like phosphates are also present in aqueous solution.
There are specific theories (e.g. Debye-Hückel) that deal with simple electrolyte solutions.
Models of water
Model types a, b and c are all planar whereas type d is almost tetrahedral.
The hydrophobic effect
Major driving force for the folding of globular proteins.
Results in the burial of the hydrophobic residues in the core of the protein.
Exemplified by the fact that oil and water do not mix.
The thermodynamic factors which give rise to the hydrophobic effect are complex and still incompletely understood.
The free energy of transfer of a non-polar compound from some reference state, such as an organic solution, into water is
$\Delta G_{tr} = \Delta H_{tr} - T \Delta S_{tr}$
with an enthalpy term, $\Delta H_{tr}$, and an entropy term, $-T \Delta S_{tr}$.
At room temperature, $\Delta H_{tr}$ from organic solution into aqueous solution is negligible.
$\Delta S_{tr} < 0$ since water tends to form ordered cages around the non-polar molecule.
Thus, $\Delta G_{tr} > 0$
Model compound studies predict that the hydrophobic effect of exposing one buried methylene group to bulk water is 0.8 kcal/mol.
Site directed mutagenesis studies yielded a larger number with greater statistical variation: the average hydrophobic effect estimated by SDM for a buried methylene group is about 1.3 kcal/mol.
Steric interactions
Van der Waals
$E_{vw} = k \sum \left[ a_{ij}/r_{ij}^{m} - c_{ij}/r_{ij}^{6} \right]$
• $r_{ij}$: distance between particles i and j.
• The mildly attractive term has an $r^{-6}$ dependence. It arises from what is called the induced dipole-dipole interaction of the particles.
• The repulsive term results from mutual deformation of the structures. When m = 12, the term is called Lennard-Jones.
• $a_{ij}$ and $c_{ij}$ are coefficients that depend on the type of atom.
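The pairwise van der Waals term can be sketched for a single atom pair. The coefficient values used below are hypothetical (a = c = 4 reproduces the reduced Lennard-Jones form with unit well depth and unit diameter), and the function names are illustrative only.

```python
def vdw_pair_energy(r, a, c, m=12):
    """Pair energy a/r**m - c/r**6; m = 12 gives the Lennard-Jones form."""
    return a / r**m - c / r**6

def vdw_minimum_distance(a, c, m=12):
    """Distance where dE/dr = 0, i.e. r**(m - 6) = m*a / (6*c)."""
    return (m * a / (6.0 * c)) ** (1.0 / (m - 6))

# Hypothetical coefficients: a = c = 4 is the reduced Lennard-Jones potential.
a, c = 4.0, 4.0
r_min = vdw_minimum_distance(a, c)
print(f"r_min = {r_min:.4f}, E(r_min) = {vdw_pair_energy(r_min, a, c):.4f}")
```

Below r_min the repulsive term dominates and the energy rises steeply; beyond it the attractive r^-6 tail decays slowly toward zero.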
Electrostatic interactions
The monopole approximation of coulombic point charges is used.
• $E_{elec} = \frac{1}{4\pi\varepsilon} \sum q_i q_j / r_{ij}$
• $\varepsilon$: permittivity of the medium or relative microscopic dielectric coefficient
• $q_i$: partial charge on atom i.
A distribution of partial charges must be determined.
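The point-charge sum can be sketched as a loop over unique atom pairs. The conversion constant below (about 332.06 kcal Å mol-1 e-2 for 1/(4πε0)) is a commonly used approximate value and an assumption here, as are the function name and the example charges; individual force fields use slightly different constants.

```python
import math
from itertools import combinations

# Approximate value of 1/(4*pi*eps0) in kcal*angstrom/(mol*e^2); an assumption,
# not a value taken from the slides.
COULOMB_KCAL = 332.06

def coulomb_energy(charges, coords, dielectric=1.0):
    """Sum of q_i*q_j / r_ij over unique pairs, in kcal/mol.

    charges: partial charges in units of e; coords: (x, y, z) in angstroms.
    """
    total = 0.0
    for i, j in combinations(range(len(charges)), 2):
        r_ij = math.dist(coords[i], coords[j])
        total += charges[i] * charges[j] / r_ij
    return COULOMB_KCAL * total / dielectric

# Hypothetical ion pair 3 angstroms apart, in vacuum and in a water-like dielectric.
pair = ([+1.0, -1.0], [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)])
print(coulomb_energy(*pair))
print(coulomb_energy(*pair, dielectric=80.0))
```

Raising the dielectric from 1 to 80 scales the interaction down by the same factor, which is one reason the choice of ε matters so much in practice.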
METHODOLOGICAL APPROACHES
Quantum Mechanics
Classical Mechanics
Molecular Mechanics
Statistical Mechanics (several methods)
Normal Modes of Vibration
Conformational search
Ab initio
Threading or fold recognition
Quantum mechanics
Uses (time-dependent) Schrödinger's equation:
[equation not preserved]
The Hamiltonian operator is given by
[equation not preserved]
Psi is the wavefunction, which contains all information about the dynamical properties of the system, and U is the energy for the state.
The first term represents the kinetic energy and the second the potential energy. The z's are the electric charges.
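The two equations on this slide were not preserved. A standard form that is consistent with the description (Ψ the wavefunction, U the energy of the state, a kinetic term followed by a potential term, z the charges) would be the one below. This is an assumption: it is the stationary (time-independent) form, and the symbols $m_i$, $r_{ij}$ and $\varepsilon_0$ are introduced here for illustration rather than taken from the slide.

```latex
\hat{H}\,\Psi = U\,\Psi,
\qquad
\hat{H} = -\sum_i \frac{\hbar^2}{2 m_i}\,\nabla_i^2
          \;+\; \sum_{i<j} \frac{z_i z_j e^2}{4\pi\varepsilon_0\, r_{ij}}
```

Here the first sum runs over all particles (nuclei and electrons) and the second over all pairs of charged particles, with the z's expressed in units of the elementary charge e.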
Calculates the total potential energy of the molecule as
U (total) = U (nuclear) + U (electronic).
The Born-Oppenheimer approximation assumes that the motions of the atomic nuclei are independent of the motion of the electrons:
U (total) = K + U (electronic).
Two types of calculation have been developed to get E (electronic):
Ab initio
Semi-empirical
Ab initio
Quasi-complete calculation of the molecular wave function: no parametrization.
Molecular orbitals
Rather trustworthy
High degree of precision for the energy calculation.
Applicable to molecules containing fewer than 40-50 atoms.
Calculations are rather long and need a lot of disk space
Allows step-by-step energy optimisation
Semi-empirical
Certain 2- and 4-electron integrals (differential overlap of orbitals) are parametrized in order to fit experimental and theoretical data. The goal is to find the solutions (WFs) and the eigenvalues (energies of the system) of Schrödinger's equation.
The "force field" method
Uses an empirical function of the internal or potential energy
Assumes that
• U (total) = U (nuclear)
So, no drastic changes in the electronic structure, such as formation or breaking of covalent bonds, are allowed.
Molecules are treated with the laws of classical mechanics
The total energy of interaction is decomposed into a number of independent terms that are adjusted to be in agreement with experimental data, such as NMR, IR, circular dichroism and X-ray diffraction
$U_{syst} = \sum_i U_i\,(\text{intramolecular}) + \sum_i \sum_j U_{i,j}\,(\text{intermolecular})$
Molecular mechanics
The internal degrees of freedom that define the 3D conformation of a molecule
[Figure: bond length (atoms A-B), bond angle (A-B-C), proper torsion (A-B-C-D) and improper torsion (A-B-C-D), each with its equilibrium value.]
Decomposition of the interaction energy
Bonding energies (for interactions between atoms separated by 1, 2 or 3 bonds)
Deformation of bond lengths. Quadratic function.
• $E_{str}(l) = \frac{1}{2} \sum_i k_i^{l} (l_i - l_{io})^2$; $l_i$: bond length A - B; $l_{io}$: equilibrium bond length
Deformation of valence angles. Quadratic function.
• $E_{bend}(\theta) = \frac{1}{2} \sum_i k_i^{\theta} (\theta_i - \theta_{io})^2$; $\theta_i$: valence angle A - B - C; $\theta_{io}$: equilibrium valence angle
Deformation of proper dihedral angles or intrinsic rotation around bonds. Trigonometric function.
• $E_{tors}(\phi) = \sum_i (E_i^{\phi}/2) \left[ 1 + S_i \cos(|n_i| \phi_i) \right]$; $E_i^{\phi}$: torsional barrier; $S_i$: ±1; $n_i$: 1, 2, 3, ... (periodicity); $\phi_i$: proper torsion angle A - B - C - D
Deformation of improper dihedral angles. Quadratic function.
• $E_{impr}(\xi) = \frac{1}{2} \sum_i k^{\xi} (\xi_i - \xi_{io})^2$; $\xi_i$: improper angle A - B - C - D; $\xi_{io}$: equilibrium improper angle
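The bonded terms translate almost directly into code. The sketch below evaluates one stretch, one bend and one proper torsion term; the force constants and geometries are hypothetical numbers chosen for illustration, not parameters from any published force field.

```python
import math

def stretch_energy(l, l0, k):
    """Quadratic bond-stretch term: (1/2) * k * (l - l0)**2."""
    return 0.5 * k * (l - l0) ** 2

def bend_energy(theta, theta0, k):
    """Quadratic angle-bend term: (1/2) * k * (theta - theta0)**2, angles in radians."""
    return 0.5 * k * (theta - theta0) ** 2

def torsion_energy(phi, barrier, n, s=1):
    """Periodic torsion term: (barrier / 2) * (1 + s * cos(|n| * phi)), phi in radians."""
    return 0.5 * barrier * (1.0 + s * math.cos(abs(n) * phi))

# Hypothetical parameters (kcal/mol, angstroms, radians).
e_str = stretch_energy(l=1.60, l0=1.53, k=300.0)
e_bend = bend_energy(theta=math.radians(115.0), theta0=math.radians(109.5), k=80.0)
e_tors = torsion_energy(phi=math.radians(60.0), barrier=2.8, n=3)
print(f"stretch {e_str:.3f}  bend {e_bend:.3f}  torsion {e_tors:.3f}")
```

With n = 3 and s = +1 the torsion term vanishes at 60°, 180° and 300° (the staggered positions) and reaches the full barrier at the eclipsed ones.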
Non-bonding energies (for atoms separated by more than 3 bonds)
Van der Waals
• $E_{vw} = k \sum \left[ a_{ij}/r_{ij}^{m} - c_{ij}/r_{ij}^{6} \right]$
Electrostatic interactions
• $E_{elec} = \frac{1}{4\pi\varepsilon} \sum q_i q_j / r_{ij}$
Hydrogen bonds
• $E_{hb} = 4\varepsilon \left[ (s/r')^{12} - (s/r')^{6} \right] - (\mu_o \mu_h / r'^{3})\, g(\theta_o, \theta_h, \phi)$
• others
Constraint terms
• Used to fix the values of the distances, angles or torsions. Usually quadratic.
Some force fields use mixed terms
• $E_{str/bend}(l, \theta) = \sum_i k_i^{l\theta} (l_i - l_{io})(\theta_i - \theta_{io})$
Hydrophobic interactions and others (solvation, ...).
In general, the total potential energy is the sum of In general, the total potential energy is the sum of individual contributions:individual contributions: U tot = U str + U bend + U tor + U vw + U elec + U hb + U U tot = U str + U bend + U tor + U vw + U elec + U hb + U
miscmisc
This method needs values for the parameters, such This method needs values for the parameters, such as the force constants k:as the force constants k: Extract parameters from physical mesures (microwave Extract parameters from physical mesures (microwave
radiation, infrared, neutron diffraction, crystal lattice radiation, infrared, neutron diffraction, crystal lattice packing, etc.) on model compoundspacking, etc.) on model compounds
Refine with calculations on model compounds similar to Refine with calculations on model compounds similar to the structural object under study the structural object under study
Considers only the internal energy of the molecule and its geometry.
Quick and efficient.
Minimization algorithms
No derivatives of the energy function
• SIMPLEX: coarse atom-by-atom minimization
• Used for very deformed structures
Calculation of the first derivatives with condition for the extremum ∂U/∂r = 0.
Iterative methods of the type
• line search - steepest descent
• conjugate gradients
Analytical or numerical calculation of the second derivatives with condition ∂²U/∂r² > 0
• Newton-Raphson
• If the function is quadratic, the condition for the extremum leads to the inverse of the Hessian.
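To make the difference between first- and second-derivative methods concrete, here is a small sketch on a two-variable quadratic energy U(r) = 1/2 rᵀHr - bᵀr. The function names and the test matrix are illustrative assumptions; the exact line-search step used in the steepest-descent routine is only valid because U is quadratic.

```python
def grad(H, b, r):
    """Gradient of U(r) = 1/2 r^T H r - b^T r for a symmetric 2x2 H."""
    return [H[0][0] * r[0] + H[0][1] * r[1] - b[0],
            H[1][0] * r[0] + H[1][1] * r[1] - b[1]]


def steepest_descent(H, b, r, tol=1e-10, max_iter=10_000):
    """Follow -gradient with an exact line search; returns (r, iterations)."""
    for n in range(max_iter):
        g = grad(H, b, r)
        gg = g[0] * g[0] + g[1] * g[1]
        if gg < tol ** 2:
            return r, n
        Hg = [H[0][0] * g[0] + H[0][1] * g[1],
              H[1][0] * g[0] + H[1][1] * g[1]]
        alpha = gg / (g[0] * Hg[0] + g[1] * Hg[1])  # optimal step length
        r = [r[0] - alpha * g[0], r[1] - alpha * g[1]]
    return r, max_iter


def newton_step(H, b, r):
    """One Newton-Raphson step, r - H^{-1} grad U(r), with an explicit 2x2 inverse."""
    g = grad(H, b, r)
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    dx = (H[1][1] * g[0] - H[0][1] * g[1]) / det
    dy = (-H[1][0] * g[0] + H[0][0] * g[1]) / det
    return [r[0] - dx, r[1] - dy]
```

For a quadratic function a single Newton-Raphson step lands on the minimum, while steepest descent needs several iterations to reach the same point.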
Electrostatics and the molecular electrostatic potential (MEP)
The potential contour at +3 kcal/mol (red) gives an indication of whether there is a significant MEP field near the molecular surface.
Solve the Poisson-Boltzmann equation to get the MEP field u(x)
d: relative dielectric,
f: molecular partial charge density
fi: counter-ion partial charge density
E: charge of an electron
[Figure panels: Protein A, Protein B]
Statistical mechanics
a) Molecular Dynamics
Use Newtonian mechanics to obtain the simultaneous positions and velocities of all the atoms as a function of time.
Solve the system of simultaneous equations:
• $d^2x_i/dt^2 = a_i = F_i/m_i$; $F_i = -\partial V/\partial x_i$
• V is the potential function; $m_i$, $x_i$ and $a_i$ are the mass, position and acceleration of atom i, respectively.
Integrating the equation twice, obtain x = f(t) => the trajectory of each atom is simulated.
The time step of the simulation must be shorter than the period of the fastest structural motions, such as the deformation of bond lengths ($10^{-12}$ s or 1 ps).
The force field may be that of Molecular Mechanics.
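As an illustration of how the equations of motion are integrated step by step, the sketch below uses the velocity Verlet scheme on a single coordinate. Velocity Verlet is one common integrator, chosen here as an assumption since the slides do not name one; the function name and the reduced units are also illustrative.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m d2x/dt2 = F(x) for one coordinate.

    force is a callable F(x) = -dV/dx. Returns the list of positions
    (the trajectory) and the final velocity.
    """
    trajectory = [x]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append(x)
    return trajectory, v
```

For example, `velocity_verlet(1.0, 0.0, lambda x: -x, 1.0, 0.01, 1000)` follows a harmonic oscillator with unit mass and unit force constant for 10 time units; the total energy stays close to its initial value as long as the time step is small compared with the oscillation period.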
The temperature of the system is linked to the time-average of the kinetic energy:
$\langle E_{kinetic} \rangle_{NVT} = \frac{3}{2} N k_B T$, with $E_{kinetic} = \sum_i \frac{1}{2} m_i v_i^2$
• N: number of particles
• V: volume of the system
• T: absolute temperature
• $k_B$: Boltzmann's constant
Modify the temperature of the system by scaling the velocities.
At high temperatures, the system is capable of occupying high-energy regions in the conformational space, thus overcoming high energetic barriers.
By cooling the system, the probability of the low-energy states increases. At T = 0 K the system should occupy the lowest-energy state.
$E_{total} = E_{kinetic} + E_{potential}$
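The relation between kinetic energy and temperature, and the velocity scaling it allows, can be sketched as follows. The function names are assumptions, and reduced units with k_B = 1 are used by default; a real simulation needs a consistent unit system.

```python
def kinetic_energy(masses, velocities):
    """Sum of (1/2) m v^2 over all particles; velocities are (vx, vy, vz)."""
    return sum(0.5 * m * (vx * vx + vy * vy + vz * vz)
               for m, (vx, vy, vz) in zip(masses, velocities))


def temperature(masses, velocities, k_b=1.0):
    """Instantaneous temperature from E_kinetic = (3/2) N k_B T."""
    n = len(masses)
    return 2.0 * kinetic_energy(masses, velocities) / (3.0 * n * k_b)


def rescale_velocities(masses, velocities, t_target, k_b=1.0):
    """Scale every velocity by sqrt(T_target / T_current)."""
    factor = (t_target / temperature(masses, velocities, k_b)) ** 0.5
    return [(factor * vx, factor * vy, factor * vz)
            for vx, vy, vz in velocities]
```

Because the kinetic energy is quadratic in the velocities, multiplying all velocities by the square root of the temperature ratio brings the instantaneous temperature exactly to the target value.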
MD is executed in the following thermodynamic ensembles
• Microcanonical: constant NVE (traditional)
• Canonical: constant NVT
• Isothermal-isobaric: constant NPT
• Grand canonical: constant µVT
(N: number of particles, V: volume, E: energy, T: temperature, P: pressure, µ: chemical potential)
The limitations of MD are
• the high computer resources needed
• the need to have available a large number of starting conformations
• the limited time scale (order of nanoseconds)
[Movie] Steered molecular dynamics, performed by applying a series of external forces to retinal, allows one to extract retinal from bacteriorhodopsin once the Schiff base bond to Lys216 is cleaved
[Movie] Steered molecular dynamics of the action of human synovial protein phospholipase A2 (PLA2) at the lipid-water interface
[Movie] Conformational changes induced in the kinesin structure (blue) by the additional gamma phosphate (green) of ATP
b) Monte-Carlo
Method that generates conformations by assigning random values to the torsion angles.
The energy is calculated for each generated conformation.
Conformations are kept or rejected using the following criteria:
• if $E_{i+1} < E_i$, conformation i+1 is retained with probability 1;
• if $E_{i+1} > E_i$, one calculates an acceptance probability $p_i$, e.g. by the Metropolis algorithm, which generates a Markov chain of states.
• In the Metropolis scheme, $p_i = e^{-\Delta E_i / k_B T}$ with $\Delta E_i = E_{i+1} - E_i$. A random number is drawn uniformly between 0 and 1 and the conformation is retained if that number is smaller than $p_i$; otherwise, the previous conformation is submitted again to a new random variation.
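The acceptance rule can be illustrated on a single torsion angle. This is a minimal sketch: the three-fold cosine torsion potential, the barrier height and all function names are assumptions made for the example, and energies are in the same (arbitrary) units as kT.

```python
import math
import random


def metropolis_accept(e_old, e_new, kT, rng):
    """Accept downhill moves always, uphill moves with probability exp(-dE/kT)."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / kT)


def torsion_energy(phi, barrier=2.0, n=3):
    """Assumed model potential: (barrier/2) * (1 + cos(n * phi))."""
    return 0.5 * barrier * (1.0 + math.cos(n * phi))


def monte_carlo(n_steps, kT, seed=0):
    """Sample one torsion angle by assigning it random values."""
    rng = random.Random(seed)
    phi = rng.uniform(-math.pi, math.pi)
    energy = torsion_energy(phi)
    samples = []
    for _ in range(n_steps):
        phi_new = rng.uniform(-math.pi, math.pi)
        e_new = torsion_energy(phi_new)
        if metropolis_accept(energy, e_new, kT, rng):
            phi, energy = phi_new, e_new
        samples.append(phi)  # a rejected move repeats the current state
    return samples
```

At low kT the samples concentrate near the minima of the potential, whereas at high kT the angle is distributed almost uniformly.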
c) The partition function Z
Z describes the distribution of conformational states of a polymeric chain of N residues and is based on RIS theory
$Z = K \int \exp[-E(q_1, \ldots, q_N)/RT] \, dq_1 \ldots dq_N$, where $q_i$ stands for the torsion angles of residue i
The exponential term represents a statistical weight
Z establishes the connection between the (macromolecular) thermodynamic properties, such as the Gibbs free energy, the enthalpy, the entropy and the heat capacity, and the microscopic phenomena.
In the case of the Ising model (nearest-neighbor interactions only) and by dividing the conformational space into 2 regions, helix (α) and non-α or random coil (c), for a homopolymeric chain of N residues, the generator matrix method allows Z to be expressed as a matrix product
$$Z = K \, [1, 1] \begin{pmatrix} z_1^c & 0 \\ 0 & z_1^\alpha \end{pmatrix} \begin{pmatrix} z_2^c & \sigma z_2^\alpha \\ z_2^c & z_2^\alpha \end{pmatrix} \cdots \begin{pmatrix} z_N^c & \sigma z_N^\alpha \\ z_N^c & z_N^\alpha \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
σ <<< 1, the nucleation parameter
• takes into consideration helix-random coil junctions
• is a penalty in the case of the presence of many junctions
• favors long segments of identical conformation
s, the cooperativity parameter, represents the cooperativity between those residues of identical conformation
Each state carries with it a given statistical weight z associated with the energy of the conformation
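The generator matrix method can be checked numerically on a short chain. The sketch below assumes a homopolymer with coil weight 1 and helix weight s, applies σ to every coil-to-helix junction after the first residue, and sets K = 1; these conventions and the function names are assumptions made for the illustration.

```python
from itertools import product


def z_matrix(n, sigma, s):
    """Partition function from the product of 2x2 generator matrices."""
    row = [1.0, s]  # weights of residue 1 in the states [coil, helix]
    for _ in range(n - 1):
        coil, helix = row
        # multiply the row vector by [[1, sigma*s], [1, s]]
        row = [coil + helix, sigma * s * coil + s * helix]
    return row[0] + row[1]


def z_enumerate(n, sigma, s):
    """Same quantity by explicit enumeration of all 2^n helix/coil states."""
    total = 0.0
    for states in product("ch", repeat=n):
        weight = 1.0
        for i, state in enumerate(states):
            if state == "h":
                weight *= s
                if i > 0 and states[i - 1] == "c":
                    weight *= sigma  # penalty for a coil-to-helix junction
        total += weight
    return total
```

The matrix product grows linearly with N while the enumeration grows as 2^N, which is why the matrix formulation is the practical one. With σ = 1 the residues become independent and Z reduces to (1 + s)^N.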
d) Langevin dynamics
Stochastic differential equation where two force terms have been added to Newton's second law:
• Friction force
• Stochastic force
$F_i/m_i - \gamma_i v_i + R_i(t)/m_i = a_i$
• where $\gamma_i = \zeta_i/m_i$ is the collision frequency and $\zeta_i$ the friction coefficient. $R_i(t)$ is a stochastic function.
The collisions of the solvent and the solute may help overcome energetic barriers.
LD may perform a better conformational search than MD ($\gamma$ = 0).
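A one-coordinate sketch of the Langevin equation is given below. The simple Euler-type update, the Gaussian model for the stochastic force (with variance fixed by the friction and kT) and the function name are assumptions for illustration; production codes use more careful integrators.

```python
import math
import random


def langevin(x, v, force, mass, gamma, kT, dt, n_steps, seed=0):
    """Integrate a = F/m - gamma*v + R(t)/m for one coordinate.

    The random velocity kick per step has standard deviation
    sqrt(2 * gamma * kT * dt / mass); it vanishes for kT = 0 or gamma = 0.
    """
    rng = random.Random(seed)
    kick = math.sqrt(2.0 * gamma * kT * dt / mass)
    trajectory = [x]
    for _ in range(n_steps):
        a = force(x) / mass - gamma * v          # deterministic part
        v = v + a * dt + kick * rng.gauss(0.0, 1.0)
        x = x + v * dt
        trajectory.append(x)
    return trajectory, v
```

Setting gamma = 0 removes both the friction and the random kicks, which recovers plain molecular dynamics; setting kT = 0 with gamma > 0 gives a purely damped motion.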
e) Normal modes of vibration
Simple harmonic oscillations around a local minimum of energy.
Each movement may be expressed as a superposition of normal modes.
For a non-harmonic energy function, the potential close to the minimum may be approximated by a harmonic potential.
Any small-amplitude movement may be described as a sum of normal modes.
As a globular protein is heated from low temperatures, the atomic fluctuations begin to deviate from the harmonic behavior around 200 K.
Qualitative and semi-quantitative estimates can be made for many properties such as the magnitude of atomic fluctuations, displacement covariance matrix, vibrational entropy, etc.
Requires the calculation of the Hessian, followed by its diagonalization. The normal-mode frequencies are directly related to the eigenvalues and the normal modes to the eigenvectors.
The low-frequency normal modes (under 30 cm$^{-1}$), which correspond to large-scale conformational changes, are often of central interest.
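The link between Hessian, eigenvalues and frequencies can be shown on the smallest possible case, two masses joined by a spring in one dimension. The sketch assumes a symmetric 2x2 Hessian and uses the closed-form eigenvalues of the mass-weighted matrix; the function name and the reduced units are illustrative.

```python
import math


def normal_modes_2x2(hessian, masses):
    """Return [(angular frequency, mass-weighted mode vector), ...] sorted by frequency."""
    (a, b), (_, d) = hessian
    m1, m2 = masses
    # mass-weighted Hessian elements H_ij / sqrt(m_i m_j)
    a, b, d = a / m1, b / math.sqrt(m1 * m2), d / m2
    if abs(b) < 1e-14:
        pairs = sorted([(a, (1.0, 0.0)), (d, (0.0, 1.0))])
    else:
        mean = 0.5 * (a + d)
        root = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
        pairs = []
        for lam in (mean - root, mean + root):
            norm = math.hypot(b, lam - a)
            pairs.append((lam, (b / norm, (lam - a) / norm)))
    # eigenvalue = frequency squared; clip tiny negative round-off
    return [(math.sqrt(max(lam, 0.0)), vec) for lam, vec in pairs]
```

For a spring of force constant k between masses m1 and m2 the Hessian is [[k, -k], [-k, k]]: the first mode has zero frequency (overall translation) and the second has ω² = k (1/m1 + 1/m2), the familiar diatomic vibration.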
Slowest Motions in the Intact Ribosome
[Animations: Mode 1, Mode 3]
Ab initio methods
Secondary structure prediction
Theoretical algorithms
• Systematic and exhaustive conformational search with energy minimization.
• Main problem: the astronomical number of non-native low-energy conformations.
Empirical algorithms
• Residue propensities
• Neural networks
• Others
Chou and Fasman's empirical algorithm for the prediction of secondary structures in proteins
Considers the composition of small polypeptide segments. If a segment is rich in residues found in helices or sheets, then that segment is considered to adopt the corresponding secondary structure.
$f_\alpha = n_\alpha / n$
• $f_\alpha$: frequency of a given residue in α-helices in a series of protein structures
• $n_\alpha$: number of occurrences of the given residue in α-helices
• n: total number of occurrences of that residue
$P_\alpha = f_\alpha / \langle f_\alpha \rangle$
• $P_\alpha$: propensity of a particular residue to be in an α-helix
• $\langle f_\alpha \rangle$: mean value of $f_\alpha$ over the 20 residues
$P_\beta = f_\beta / \langle f_\beta \rangle$
• $P_\beta$: propensity of a particular residue to be in a β-sheet
• $\langle f_\beta \rangle$: mean value of $f_\beta$ over the 20 residues
A list of propensities based on the analysis of X-ray structures is obtained:
• H: very favorable
• h: favorable
• I: little favorable
• i: indifferent
• b: unfavorable
• B: very unfavorable
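The propensity definition above translates directly into code. The sketch below follows the two formulas as stated (frequency per residue type, then division by the mean over residue types); the function name is an assumption, and the counts used in the usage note are made-up numbers, not values from any structure database.

```python
def propensities(counts_in_state, counts_total):
    """Propensity P = f / <f> for each residue type.

    counts_in_state: occurrences of each residue in the given secondary
    structure; counts_total: total occurrences of each residue.
    """
    f = {aa: counts_in_state[aa] / counts_total[aa] for aa in counts_total}
    mean_f = sum(f.values()) / len(f)
    return {aa: f[aa] / mean_f for aa in f}
```

With hypothetical counts such as `propensities({"A": 60, "G": 20, "P": 10}, {"A": 100, "G": 100, "P": 100})`, residues with P above 1 are enriched in the structure and residues with P below 1 are depleted. A real table would of course use all 20 residue types.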
Rules
1. A group of 4 residues $H_\alpha$, $h_\alpha$, $I_\alpha$ ($I_\alpha$ counts for half) over 6 contiguous residues shall initiate an α-helix.
2. The helical segment will propagate in both directions until the mean value of $P_\alpha$ of a tetrapeptide segment falls below 1.0.
3. A Pro can be found only at the N-terminus of an α-helix.
4. A group of 3 residues $H_\beta$, $h_\beta$ over 5 contiguous residues will initiate a β-strand. The strand will propagate in both directions until the mean value of $P_\beta$ of a tetrapeptide segment falls below 1.00.
5. For regions containing both α and β structures, the overlapping zone will be helical if $\langle P_\alpha \rangle > \langle P_\beta \rangle$; otherwise, a strand will form.
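Rule 1 can be sketched as a sliding-window count. The input here is a string of helix class labels (H, h, I, i, b, B), one per residue, which is assumed to have been assigned beforehand from a propensity table; the function name and the example string are hypothetical.

```python
# Weight of each class when counting helix formers: I counts for half.
HELIX_WEIGHT = {"H": 1.0, "h": 1.0, "I": 0.5}


def helix_nucleation_sites(classes, window=6, threshold=4.0):
    """Return start indices of windows holding at least 4 helix formers."""
    sites = []
    for start in range(len(classes) - window + 1):
        segment = classes[start:start + window]
        score = sum(HELIX_WEIGHT.get(label, 0.0) for label in segment)
        if score >= threshold:
            sites.append(start)
    return sites
```

For the hypothetical label string `"bHhHIIbb"`, the windows starting at positions 0 and 1 each reach a score of 4 and would initiate a helix, while the window starting at position 2 scores only 3. The strand rule (rule 4) would follow the same pattern with a window of 5 and a threshold of 3.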
Rost and Sander's algorithm
- Uses a system of neural networks on a non-redundant database of more than 130 chains.
- The network is trained on proteins that have evolved naturally.
- It uses the information furnished by evolution thanks to multiple sequence alignments.
- Attributes a weight that depends on sequence conservation for a given position.
- Includes global amino acid composition.
- The conformational states are Helix (H), Sheet (E) and Other (L).
- Mean confidence: 50-80%.
Remarks
In fact, secondary structures are strongly influenced by tertiary interactions, which are not taken into account in many algorithms.
Using weighted vector components or methods that take into account correlations between amino acid frequencies improves the accuracy.
Dataset size, class definition and cross-validation are of critical importance to the true accuracy of class prediction from amino acid composition.
Tertiary structure prediction
Obvious approaches for predicting protein structure from sequence
Ab initio: Simulate the folding process with the laws of physics
• Use a simplified polypeptide representation and restrain atom or residue positions to a hypothetical 2D or 3D lattice
• Use statistical potential to portray the true interaction energies
Search the entire conformational space available to the polypeptide for the correct fold.
Other approaches
• 1D-3D alignments
• Threading
• Fold recognition
1D-3D alignments
Encode 3D structural information into strings of symbols or profiles against which 1D strings derived from the query sequence are aligned.
Define structural environments on the basis of secondary structure, solvent accessibility and burial by polar atoms. Profiles of scores for each of the 20 amino acids can then be calculated for each of the structural environment classes based on their observed frequencies in a database of structures.
Transform query sequences into strings of characters representing several levels of conserved hydrophobicity or of solvent accessibility. Dynamic programming alignments using a substitution matrix derived from database counts may be able to detect remote homologies; include secondary structure prediction information for the query sequence.
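The dynamic programming step can be sketched as a global alignment between a query string and a string of structural environments. Everything specific here is an assumption made for illustration: the two-letter residue classes (H hydrophobic, P polar), the two environments (B buried, E exposed), the +1/-1 scores and the linear gap penalty. A real 3D profile would hold one score per amino acid and environment class, derived from database frequencies.

```python
def global_alignment_score(query, environments, score, gap=-1.0):
    """Best global alignment score of two strings (Needleman-Wunsch style)."""
    n, m = len(query), len(environments)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap          # query residues aligned to gaps
    for j in range(1, m + 1):
        dp[0][j] = j * gap          # structure positions aligned to gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(query[i - 1], environments[j - 1]),
                dp[i - 1][j] + gap,
                dp[i][j - 1] + gap,
            )
    return dp[n][m]


def toy_environment_score(residue_class, environment):
    """Hypothetical scores: hydrophobic prefers buried, polar prefers exposed."""
    favourable = {("H", "B"), ("P", "E")}
    return 1.0 if (residue_class, environment) in favourable else -1.0
```

Calling `global_alignment_score("HPP", "BBEE", toy_environment_score)` aligns a three-residue query against a four-position structure; the best alignment places one gap and scores the three residues in their preferred environments.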
Limitations
Much of the structural context information is lost, compromising the specificity of sequence-structure matches, i.e. one amphipathic helix or strand is much like any other, and their misalignment is inevitable unless more detailed structural relationships are considered.
Threading
One solution to this problem:
• Use a low-level alignment to score the pairwise residue interactions (structural environment) at each equivalence by minimising an empirical pairwise distance.
• The low-level alignment attempts to place query residues at positions in the library structure such that the distances between them are similar to those frequently observed in a database for amino acids of the same types.
• The final alignment and score are traced from a high-level matrix derived from the low-level alignment scores.
Threading thus attempts to evaluate the sequence in 3D as it is threaded through a library structure.
3D-1D and threading methods have similar success rates.
Fold recognition
Protein structure and function are more conserved than protein sequence.
FR: Identification of correspondences between novel sequences and known structures would greatly assist in the characterisation of these sequences.
Idea: a query sequence may be compared to known structures and their sequences (the fold library).
The fold library is a set of known structures to which we wish to find similarities from a set of sequences of unknown structure, called queries.
Fold recognition and sequence database searching methods share the common aim of identifying distant ancestral relationships between sequences.
Structural comparisons give the least ambiguous measures of relatedness; sequence database annotations are often erroneous or absent.
Given that the number of distinct structures was not growing as fast as the PDB as a whole, only a finite and relatively small number of fold topologies must be encoded by the millions of protein sequences in nature.
Estimates: 1000-5000 different fold topologies.
Time scale for reaching it: tens of years.
Estimates from the deposition of structures set the probability of a newly sequenced protein (with no detectable structural homologue) being similar to a known fold at 70%.
Homologous
Showing a fundamental similarity of structure inherited from a common ancestor.
Applied to structures ranging from organs to molecules.
Homology modeling
Protocol
• Template searching
• Template selection
• Multiple sequence alignment
• Model building
• Model refinement
• Structure validation
Template searching
Sequence search (BLAST) of the target against
• sequence data banks (SwissProt, PIR, NCBI, etc.)
• structure data banks (PDB)
If there are homologous sequences and structures with > 30% sequence identity
• Sequence Homology Modeling
If identity < 30%
• 1D-3D alignment, i.e., threading and fold recognition
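The 30% decision rule can be sketched as follows. The way identity is counted here (identical positions over aligned, non-gap columns) is one of several conventions and is an assumption, as are the function names; the case of exactly 30%, which the rule above does not cover, is sent to threading in this sketch.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def choose_method(identity, threshold=30.0):
    """Pick the modeling route from the target-template sequence identity."""
    if identity > threshold:
        return "homology modeling"
    return "threading / fold recognition"
```

For instance, `choose_method(percent_identity("AC-GT", "ACTGA"))` compares four ungapped columns, finds three identities (75%) and selects homology modeling.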
Template selection (> 30% sequence identity)
Selection of a set of representative structural templates (FSSP, a fold classification based on structure-structure alignment of proteins).
Multiple sequence alignment between target sequence and template sequences
Methods to automatically align sequences in a reasonable manner:
• CLUSTAL, relies on a heuristic approach
• HMMER, tackles the problem in a formally sounder way by estimating a hidden Markov profile from the set of given sequences.
• T-Coffee, an alignment of multiple alignments
Enhancement of alignment procedures
• Since the accuracy of secondary structure prediction algorithms is ~70%, incorporate secondary structure information for the target and the template sequences in the alignment procedure (with a proper weighting factor).
Template searching, selection and multiple sequence alignment
16 20 30 34 36 38 41 59 60
a abcd
TSV-PA: VFGGDECNINEHRSLVVLFNS--NGFLCGGTLINQDWVVTAAHCDS----
TRYPS : IVGGYTCGANTVPYQVSLN--S-GYHFCGGSLINSQWVVSAAHCYK----
KALLIK: IIGGRECEKNSHPWQVAIYHY--SSFQCGGVLVNPKWVLTAAHCKN----
THROM : IVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDRWVLTAAHCLLYPPW
BOTHRO: VIGGDECDINEHPFLAFMYYS-PQY-FCGMTLINQEWVLTAAHCD-----
PA-BJ : VVGGRPCKINVHRSLVLLYNS--SSLLCSGTLINQEWVLTAAHCD-----
61 62 73 77 78 81 93 95 96 97 98
efghi a a a
TSV-PA: ------NNFQLLFGVHSKKILN-EDEQTRDPKEKFFCPNRKK---DD-EV
TRYPS : -----SGIQVRL-GEDNINVVE-GNEQFISASKSIVHPSYN----SN-TL
KALLIK: -----DNYEVWL-GRHNLFENE-NTAQFFGVTADFPHPGFNLSADGK-DY
THROM : DKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYN----WRENL
BOTHRO: ----K-TYMRIYLGIHTRSVAND-DEVIRYPKEKFICPNKKK---NV-IT
PA-BJ : ----S-KNFKMKLGVHSIKIRN-KNERTRHPKEKFICPNRKK---DD-VL
Model building
Proceed to the molecular modeling of the target sequences based on template structures
Construction of backbone
Energy minimization and/or molecular dynamics (may use force fields such as CHARMM, AMBER or GROMOS)
Retain a number of models based on several plausible alignments or other criteria.
Obtain properties such as the Molecular Electrostatic Potential, the Solvent Accessible Surface area, e.g., contact between protein and solvent (Lee & Richards, 1971), the exposed residues, etc.
Model refinement
Structure Validation
Calculation of several stereochemical quality indices
• Ramachandran map
• Packing quality
Assessment of the models regarding structural properties typical for native conformations, i.e. fold correctness.
• Use of the obtained energy profiles to reveal potential folding errors in the models (Prosa II).
Growth of new folds in the Protein Data Bank (PDB)
New (blue) and old (orange) folds per year.
Using the homology method
• Once representative structures of each fold have been identified (~1000-5000), it will be possible to obtain models for all the sequences.
• The determination of the fold type may furnish or improve the functional annotation, leading to a synergy between 3D structure and function.
• (Approximately 1/3 of genomic sequences have homologs with known protein structures.)
Applications
Probing of ‘interesting’ molecular regions with the goal of determining whether they are involved in molecular recognition, activity and others.
Computer-assisted drug design with a specific protein as target.
Changes in molecular properties, such as production of chimeric enzymes with a different catalytic activity (biochemical engineering).
Effects of mutations on stability and activity.
Computation of affinities and other properties.
Mechanism of action.
Structure-function relationships.
Generation of molecular models of the 3D structure of proteins and their complexes.
Design of new biomolecules possessing desired properties.
Design of effectors of those molecules.
Studies of molecular interaction and recognition, based on the principles of steric and electrostatic complementarity.
Calculation of reliable values for certain properties, such as the inhibition constant $K_i$.
Bioavailability, pharmacokinetics and dynamics, absorption by different tissues, permeability, accessibility, immunogenicity or toxicity. New ADMET technology addresses these issues.
Limitations