Theoretical and computational methods for the study of biological macromolecules
Rachid C. Maroun, PhD
Unité de Bioinformatique Structurale
Institut Pasteur
Paris, FRANCE


PROGRAM
• INTRODUCTION
• PROTEIN STRUCTURE
• PROTEIN CONFORMATION
• FORCES THAT STABILIZE PROTEIN CONFORMATION
• METHODOLOGICAL APPROACHES
• APPLICATIONS

INTRODUCTION

• Biological molecules are complex systems.
• The true conformation is a network of loops, twists and folds, all stacked together in a well-defined dynamical 3D structure. It is this morphology, in general, that gives the protein its "life" and that determines its activity, its role.
• The determination of the 3D structure of biological molecules and the way in which this structure is linked to the function and to the sequence is of fundamental importance.

BIOLOGICAL MACROMOLECULES

Diversity:
• DNA
• RNA
• Proteins
• Carbohydrates: monosaccharides (e.g. -L-galactopyranose), di- and polysaccharides
[Figures: branchpoint helix from yeast; binding site for phage GA/MS2 coat proteins]

Important for biological function: the complexes formed with
• ligands (substrates, cofactors, inhibitors, drugs, receptor agonists and antagonists)
• other macromolecules

Growth of databanks
• 3D protein structures
• Nucleotide sequences
[Figure: growth of the PDB]

Thus, theoretical methods aim at:
• Generating reliable 3D models in a reasonable amount of time
• Avoiding an exhaustive experimental determination of the structures of all sequences, e.g. Structural Genomics

Relationships between the sequence and 3D structure spaces
[Figure: sequences (> 35% identity and < 35% identity) mapped onto 3D structures (> 35% identity); example 1CEM00 (cellulase); labelled levels: topology (fold family)*, homologous superfamily*. *CATH classification]

Relationships between the sequence, 3D structure and function spaces
[Figure: sequences (> 35% identity and < 35% identity) mapped onto structures (> 35% identity) and functions; example 1CEM00 (cellulase), alpha/alpha barrel, Gtransferase; labelled levels: architecture*, topology (fold family)*, homologous superfamily*. *CATH classification]

Some conclusions

• The space of structures is finite and much smaller than that of sequences and functions.

• In evolutionary terms, the 3D structure is more conserved than the sequence.

PROTEIN STRUCTURE

Structure levels of a polypeptide chain

Primary, i.e. the sequence:
MELIRVLANLLILQLSYAQKSSELVFGGDECNINEHRSLVVLFNSNGFLCGGTLINQDWVVTAAHCDSNNF…

Composition of proteins - The amino acid residues

Some representative types of the 20 naturally occurring amino acids (in the structures, R denotes the side chain):
• Arginine (R): + charge
• Lysine (K): + charge
• Aspartate (D): - charge, hydrophilic
• Asparagine (N): hydrophilic
• Histidine (H)
• Tyrosine (Y): aromatic
• Valine (V): aliphatic, hydrophobic
• Methionine (M): hydrophobic

Properties of amino acid residues: http://www.imb-jena.de/IMAGE_AA.html

Secondary structure

The main chain N-Cα and Cα-C bonds are free to rotate. These rotations are represented by the torsion angles φ and ψ, respectively.

The peptide bond is planar:
• ω = 180° trans
• ω = 0° cis

Right-handed α-helix

Ideal values: φ = -57.8°, ψ = -47.0°

Properties

Pitch p = 5.4 Å

N = 3.6 amino acid residues / turn

Rise r = 1.5 Å / residue

Backbone radius = 2.3 Å

H-bond between the C=O of residue i and the N-H of residue i+4

Peptide planes are roughly parallel with the helix axis. Each peptide unit has a dipole moment.

Thus, the dipoles within the helix are aligned giving rise to a macrodipole moment.

Side chains point outward from helix axis.

Formation of the helix is cooperative.
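The helix parameters listed above are linked by simple geometry: the pitch is the number of residues per turn times the rise per residue. The sketch below checks this with the quoted values; the function names are illustrative and not taken from the slides.

```python
# Ideal alpha-helix parameters quoted in the slide
RESIDUES_PER_TURN = 3.6   # N
RISE_PER_RESIDUE = 1.5    # r, in angstroms


def helix_pitch(residues_per_turn, rise_per_residue):
    """Pitch p = N * r: advance along the helix axis per full turn."""
    return residues_per_turn * rise_per_residue


def rotation_per_residue(residues_per_turn):
    """Rotation about the helix axis per residue, in degrees."""
    return 360.0 / residues_per_turn


def helix_length(n_residues, rise_per_residue):
    """Length along the axis spanned by n_residues, in angstroms."""
    return n_residues * rise_per_residue


pitch = helix_pitch(RESIDUES_PER_TURN, RISE_PER_RESIDUE)
turn_angle = rotation_per_residue(RESIDUES_PER_TURN)
print(f"pitch = {pitch:.1f} A, rotation per residue = {turn_angle:.0f} deg")
```

The same `helix_pitch` relation applies to any regular helix, so it can be reused for other helical structures given their N and r.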

Other helical structures

The β-strand

Ideal values: φ = -139.0°, ψ = 135.0°

The β-sheet

Properties
• Pitch p = 6.8 Å
• N = 2.0 amino acid residues / turn
• Rise r = 3.4 Å / residue

Loops

[Figure: a loop between two anchor secondary structures, with vectors v1, v2 and v3]

• Peptide fragments that connect regular secondary structure elements (α-helices or β-strands)
• Furnish the directional changes necessary to obtain a globular form
• Often found at the surface of globular proteins
• Form hydrogen bonds with water
• Are in general very flexible
• Other loops have specific non-repetitive, stable and ordered structures
• Have a length of 2-16 residues

Classification of the structure of protein loops

• Type AR: beta-beta link
• Type EH: beta-alpha
• Type HA: beta-beta hairpin
• Type HE: alpha-beta
• Type HH: alpha-alpha

[Figure: a loop between two anchor secondary structures, with vectors v1, v2 and v3]

http://sbi.imim.es/cgi-bin/archdb/loops.pl?

β-turns (reverse turns)

A special case of loops with < 6 residues.

Ideal values:

Angle     β-turn I   β-turn II
φ(i+1)    -60°       -60°
ψ(i+1)    -30°       120°
φ(i+2)    -90°       80°
ψ(i+2)    0°         0°
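As a rough illustration of how the ideal turn values above can be used, the sketch below assigns a turn type to the backbone dihedrals of residues i+1 and i+2 by comparing them with the tabulated values. The 30° tolerance and the function names are assumptions made for this example, not part of the slides.

```python
# Ideal (phi, psi) values in degrees for residues i+1 and i+2, from the table above
IDEAL_TURNS = {
    "I": ((-60.0, -30.0), (-90.0, 0.0)),
    "II": ((-60.0, 120.0), (80.0, 0.0)),
}


def angle_diff(a, b):
    """Smallest signed difference between two angles, in degrees."""
    return (a - b + 180.0) % 360.0 - 180.0


def classify_turn(phi1, psi1, phi2, psi2, tolerance=30.0):
    """Return 'I' or 'II' if all four angles lie within `tolerance`
    degrees of the ideal values, otherwise None.
    The tolerance is a hypothetical cutoff chosen for illustration."""
    observed = (phi1, psi1, phi2, psi2)
    for name, ((p1, s1), (p2, s2)) in IDEAL_TURNS.items():
        ideal = (p1, s1, p2, s2)
        if all(abs(angle_diff(o, i)) <= tolerance for o, i in zip(observed, ideal)):
            return name
    return None


print(classify_turn(-65.0, 125.0, 85.0, 5.0))
```

Angles are compared on the circle (via `angle_diff`) so that values near ±180° are handled consistently.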

Secondary structures of globular proteins

Occurrence (%):
• simple loops 21
• reverse turns 15
• complex loops 10
• helices 26
• β-sheets 19

Average length (residues):
• helices 9.3
• β-sheets 5.3
• loops 5.9

Tertiary structure

• Array of secondary structures => tertiary structure (the fold)
• Relative positioning of the secondary structures
• Interactions that stabilize the new level of structure
  • Covalent bonds: S-S bridges
  • Non-covalent bonds: hydrogen bonds, salt (ionic) bridges, hydrophobic effect
• Folding is cooperative

Effects of the tertiary structure

• Induction of a given secondary structure
• New spatial distribution of the residues: solvent-exposed or buried
• New functionality

Example: sperm whale myoglobin. The protein is complexed to protoporphyrin IX containing Fe.

The Ramachandran plot

Hierarchical classification of protein tertiary structures

• CATH: www.biochem.ucl.ac.uk/bsm/cath/
• SCOP: scop.mrc-lmb.cam.ac.uk/scop/
• DALI: www.ebi.ac.uk/dali/

Quaternary structure

Assemblage of tertiary structures to produce a higher level of structure

Some properties of quaternary structure

• Quaternary structure
  • Protein polymers
  • Closed aggregates or oligomers (homo, hetero)
• Symmetry
• Chemistry
  • Stability: covalent bonds, non-covalent bonds
• Cooperativity
  • Structural and functional regulation
  • Allostery, e.g. the oxy to deoxy transition of the 4-mer of hemoglobin
• Chemical or biological activity
  • No consequences => monomer as active as the oligomer
  • New activity, absent in the absence of oligomerization, e.g. the active site residues may come from several subunits

Examples

• Heterodimer: snake venom vipoxin complex (chain A, phospholipase A2 inhibitor; chain B, phospholipase A2)
• Homohexamer: human EphB2 receptor SAM domain

Complexity in quaternary structure

• The monomer of chaperonin GroEL (HSP60 class)
• The tetradecamer

Other biopolymers

A nucleic acid polyphosphate chain has an even larger number of potential conformations, given that it contains 6 backbone torsion angles per monomer.

PROTEIN CONFORMATION

The conformational hyperspace - The potential energy landscape

• The conformation of a biopolymer is a function of a large number of degrees of freedom.
• The surface described by the potential energy function in this n-dimensional space is very complex.
• If the bond lengths and valence angles of a polypeptide chain are fixed, the chain contains 2 degrees of freedom per residue, the torsion angles φ and ψ; these determine the conformation of the molecule in a unique fashion.

Rotational Isomeric State (RIS) Theory

For a given bond, the torsion angles may adopt a discrete and finite number of states that correspond to the minima of the potential energy function.

$C = m^{n}$

• C: number of conformations
• m: number of rotational states
• n: number of bonds

For m = 3 and n = 100, $C = 3^{100} \approx 10^{48}$
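The RIS estimate above is easy to reproduce. The sketch below evaluates C = m^n for m = 3 and n = 100 and, purely for illustration, converts it into an exhaustive-search time; the sampling rate of 10^13 conformations per second is a hypothetical figure, not a value from the slides.

```python
import math


def conformation_count(m, n):
    """Number of conformations C = m**n for n bonds with m rotational states each."""
    return m ** n


def log10_conformation_count(m, n):
    """log10(C), convenient when C is astronomically large."""
    return n * math.log10(m)


count = conformation_count(3, 100)
exponent = log10_conformation_count(3, 100)
print(f"C = 3^100 has {len(str(count))} digits (log10 C = {exponent:.2f})")

# Hypothetical sampling rate, assumed only to give a sense of scale
SAMPLES_PER_SECOND = 1e13
SECONDS_PER_YEAR = 3.156e7
years = count / SAMPLES_PER_SECOND / SECONDS_PER_YEAR
print(f"exhaustive enumeration would take about {years:.1e} years at that rate")
```

Whatever rate is assumed, the time remains far beyond any practical limit, which is why exhaustive enumeration is not an option.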

• The native state is assumed to be the state of global minimum energy.
• ΔG of native state <----> denatured state may be very small.
• Even with the use of RIS theory, the polypeptide chain has a very large number of potential conformations.
• => Formation of a conformational hyperspace composed of a multitude of minima and maxima of the energy function, with many non-native low-energy conformations separated by (high) energy barriers.
• A given energy-optimized geometry depends on the starting geometry.
• For complex functions of several variables, there is no analytical solution for the elucidation of the global minimum.
• Even with numerical methods, the possibility exists of becoming trapped in local minima.
• Furthermore, it is impossible to search and examine exhaustively all the accessible conformations.
• Thus, the need to face and circumvent this problem => algorithms for the prediction of protein structure.

Prediction of protein side-chain conformations or rotamers

• Important component of any modeling method (homology modeling, ab initio structure prediction). Applications include the study of mutations.
• The side chain torsion angles are named χ1, χ2, χ3, etc. and the atoms β, γ, δ, etc.
• Problem: side chains can adopt several conformations.
• Example: the aspartate residue, 2 χ angles, 9 rotamers.
• Example: the arginine residue, 5 angles.

Side chain conformational search

• Combinatorial problem: a complex search problem among interacting side chains in order to find a global minimum.
• The minimum number of variables to consider are:
  • the number of rotamers for each side chain
  • the number of neighboring side chains interacting with each rotamer
• The rotamers for the 20 amino acid residues are stored in databases (e.g. the SCWRL library, http://dunbrack.fccc.edu/SCWRL3.php).
• The number of rotamers in a library depends on the χ angle cutoff.
• For a 40° χ angle cutoff, a library contains 214 side-chain rotamers.
• The energy function or score function may be based on the rotamer library and other terms, such as a repulsive steric energy.
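The combinatorial nature of the side-chain search can be made concrete with a toy example. The sketch below enumerates every rotamer combination for three residues and keeps the lowest-energy one; the self and pair energy tables are invented numbers for illustration only, and exhaustive enumeration is a stand-in for the smarter search strategies real programs use.

```python
import itertools
import math


def search_space_size(rotamer_counts):
    """Number of rotamer combinations: the product of per-residue rotamer counts."""
    return math.prod(rotamer_counts)


def exhaustive_search(self_energy, pair_energy):
    """Brute-force search for the lowest-energy rotamer assignment.

    self_energy[i][a]         : energy of rotamer a at residue i
    pair_energy[(i, j)][a][b] : interaction of rotamer a at i with rotamer b at j
    """
    best_combo, best_energy = None, math.inf
    choices = [range(len(energies)) for energies in self_energy]
    for combo in itertools.product(*choices):
        energy = sum(self_energy[i][a] for i, a in enumerate(combo))
        for (i, j), table in pair_energy.items():
            energy += table[combo[i]][combo[j]]
        if energy < best_energy:
            best_combo, best_energy = combo, energy
    return best_combo, best_energy


# Hypothetical energies (arbitrary units) for 3 residues with 2, 2 and 3 rotamers
self_energy = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.1, 0.9]]
pair_energy = {
    (0, 1): [[5.0, 0.0], [0.0, 0.0]],            # steric clash between rotamers 0 and 0
    (1, 2): [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0]],  # clash between rotamers 1 and 1
}

best_combo, best_energy = exhaustive_search(self_energy, pair_energy)
print(best_combo, round(best_energy, 3))
print(search_space_size([9] * 10))  # e.g. ten residues with 9 rotamers each
```

Note that the best assignment is not simply the best rotamer of each residue taken in isolation: the pair terms shift the optimum, which is what makes the problem combinatorial.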

Side chain rotamers

FORCES THAT STABILIZE PROTEIN CONFORMATION

A number of interaction energies stabilize the structure of proteins and need to be taken into account in order to quantify the different types of energy that govern the behavior and the stability of a molecule:
• Hydrogen bonds and the aqueous solvent
• Hydrophobic effect
• van der Waals interactions (steric)
• Electrostatic interactions (ionic or salt bridges)
• Covalent "cross-link" bonds, such as disulfide bonds

The hydrogen bond

• An H atom is attracted by rather strong forces to 2 atoms instead of only one, so it may be considered to be acting as a bond between them.
• The partially positively charged H atom lies between partially negatively charged O and N atoms.
• In water, H is covalently attached to the O (about 492 kJ mol⁻¹) but has an additional attraction (about 23.3 kJ mol⁻¹, almost 10x the average thermal fluctuation at 25°C) to a neighboring O of another water molecule.
• That is far greater than the included van der Waals interaction (about 5.5 kJ mol⁻¹).
• The bond is part electrostatic (90%) and part covalent (10%).
• The bond may be approximated by the following states:
  • covalent: HO-H····OH2 (major)
  • ionic: HOδ⁻-Hδ⁺····Oδ⁻H2
  • covalent: HO⁻····H-O⁺H2 (minor)
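The "almost 10x" comparison above can be checked numerically. The sketch below assumes that the "average thermal fluctuation" is the molar thermal energy RT (an interpretation, not something the slide states) and expresses the quoted energies in units of RT at 25°C.

```python
GAS_CONSTANT = 8.314462618e-3  # R in kJ mol^-1 K^-1


def thermal_energy(temperature_kelvin):
    """Molar thermal energy RT in kJ/mol (assumed meaning of 'thermal fluctuation')."""
    return GAS_CONSTANT * temperature_kelvin


def in_thermal_units(energy_kj_per_mol, temperature_kelvin=298.15):
    """Express an energy as a multiple of RT at the given temperature."""
    return energy_kj_per_mol / thermal_energy(temperature_kelvin)


RT = thermal_energy(298.15)
hbond_ratio = in_thermal_units(23.3)      # water-water hydrogen bond
vdw_ratio = in_thermal_units(5.5)         # included van der Waals interaction
covalent_ratio = in_thermal_units(492.0)  # covalent O-H bond

print(f"RT at 25 C = {RT:.2f} kJ/mol")
print(f"H-bond = {hbond_ratio:.1f} RT, vdW = {vdw_ratio:.1f} RT, covalent = {covalent_ratio:.0f} RT")
```

Under this assumption the hydrogen bond comes out between 9 and 10 RT, consistent with the slide's statement.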

Holds the two strands of the DNA double helix together

Holds polypeptides together in secondary structures

Helps enzymes bind to their substrate

Helps antibodies bind to their antigen

Helps transcription factors bind to each other

Helps transcription factors bind to DNA

Numerous functions describe the hydrogen bond energy, one of which is that of Stockmayer:

• $E_{hb} = 4\left[(s/r')^{12} - (s/r')^{6}\right] - (\mu_o \mu_h / r^{3})\, g(\theta_o, \theta_h, \phi)$

• r': distance between the hydrogen and the h-acceptor; s: coefficients independent of r'; µ: dipole moments centered on the acceptor and the hydrogen; g: angular dependent function

The water solvent

• A mediator in molecular interactions.
• 0.25-0.45 g associated with each gram of protein.
• Polar molecule.
• Essentially a proton donor, e.g. side chains like Asp and Glu are strongly hydrated.
• As proton acceptor, the O (electronegative) links to H atoms of neighboring molecules.
• 40% of water h-bonds take place with the C=O group of the backbone and 44% with the side chains.
• Liquid water forms clusters or cage structures. The h-bonds that stabilize this network are in constant fluctuation, breaking and reforming on a time scale of the order of the picosecond.
• In solution, a locally-ordered layer of water, which behaves differently from bulk water, is permanently associated with a protein.
• Internal water molecules contribute strongly to the stability of protein structure.
• When 2 macromolecules come together to form a complex, some surface water must be displaced and undergo significant rearrangement. The interface water influences the formation of supramolecular structures.
• Water contributes to the stability of the tertiary and quaternary structure of proteins.
• Water also plays an important role in enzymatic catalysis.
• Water is associated with metallic ligands of proteins.
• Solvation effects and the properties of water are very important and help to better understand biological processes.
• Small ions (counter-ions such as Na+, K+, Ca++) and other charged molecules like phosphates are also present in aqueous solution.
• There are specific theories (e.g. Debye-Hückel) that deal with simple electrolyte solutions.

Models of water

[Figure: water models of types a, b, c and d] Model types a, b and c are all planar whereas type d is almost tetrahedral.

The hydrophobic effect

• Major driving force for the folding of globular proteins.
• Results in the burial of the hydrophobic residues in the core of the protein.
• Exemplified by the fact that oil and water do not mix.
• The thermodynamic factors which give rise to the hydrophobic effect are complex and still incompletely understood.
• The free energy of transfer of a non-polar compound from some reference state, such as an organic solution, into water is

  $\Delta G_{tr} = \Delta H_{tr} - T \Delta S_{tr}$

  with an enthalpy term $\Delta H_{tr}$ and an entropy term $-T \Delta S_{tr}$.
• At room temperature, $\Delta H_{tr}$ from organic solution into aqueous solution is negligible.
• $\Delta S_{tr} < 0$ since water tends to form ordered cages around the non-polar molecule.
• Thus, $\Delta G_{tr} > 0$
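The sign argument above can be illustrated with a short calculation. The sketch below evaluates the transfer free energy for a negligible enthalpy and a negative entropy of transfer; the numerical value of the entropy is hypothetical and chosen only to show the sign of the result.

```python
def transfer_free_energy(delta_h, delta_s, temperature_kelvin=298.15):
    """Free energy of transfer: dG = dH - T*dS (kJ/mol, with dS in kJ mol^-1 K^-1)."""
    return delta_h - temperature_kelvin * delta_s


# Hypothetical values: negligible enthalpy, unfavorable (negative) entropy
DELTA_H_TR = 0.0     # kJ/mol
DELTA_S_TR = -0.020  # kJ mol^-1 K^-1, illustrative only

delta_g_tr = transfer_free_energy(DELTA_H_TR, DELTA_S_TR)
print(f"dG_tr = {delta_g_tr:+.2f} kJ/mol")
```

With the enthalpy term negligible, the sign of the free energy is set entirely by the entropy term, so any negative entropy of transfer gives an unfavorable (positive) free energy.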

Model compound studies predict that the hydrophobic effect of exposing one buried methylene group to bulk water is 0.8 kcal/mol.

Site directed mutagenesis studies yielded a larger number with greater statistical variation: the average hydrophobic effect estimated by SDM for a buried methylene group is about 1.3 kcal/mol.

Steric interactions

Van der Waals

• $E_{vw} = k \sum \left[ a_{ij}/r_{ij}^{m} - c_{ij}/r_{ij}^{6} \right]$
• $r_{ij}$: distance between particles i and j.
• The mildly attractive term has an $r^{-6}$ dependence. It arises from what is called the induced dipole-dipole moment interaction of the particles.
• The repulsive term results from mutual deformation of the structures.
• When m = 12, the term is called Lennard-Jones.
• $a_{ij}$ and $c_{ij}$ are coefficients that depend on the type of atom.

[Figure: Lennard-Jones potential]
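A minimal sketch of the pair term above, with the prefactor k set to 1. For m = 12, setting the derivative to zero gives the minimum at r = (2a/c)^(1/6) with depth -c²/(4a). The coefficients used in the example are hypothetical, derived from an assumed well depth and contact distance rather than from any force field.

```python
def vdw_pair_energy(r, a, c, m=12):
    """Pair energy a/r**m - c/r**6 (prefactor k taken as 1)."""
    return a / r ** m - c / r ** 6


def lennard_jones_minimum(a, c):
    """Position and depth of the minimum for m = 12.

    dE/dr = -12a/r**13 + 6c/r**7 = 0  =>  r_min**6 = 2a/c,  E_min = -c**2 / (4a)
    """
    r_min = (2.0 * a / c) ** (1.0 / 6.0)
    e_min = -c * c / (4.0 * a)
    return r_min, e_min


# Hypothetical coefficients built from a well depth and a contact distance
WELL_DEPTH = 0.2        # arbitrary energy units
CONTACT_DISTANCE = 3.4  # angstroms, where the energy crosses zero
a = 4.0 * WELL_DEPTH * CONTACT_DISTANCE ** 12
c = 4.0 * WELL_DEPTH * CONTACT_DISTANCE ** 6

r_min, e_min = lennard_jones_minimum(a, c)
print(f"r_min = {r_min:.3f} A, E_min = {e_min:.3f}")
```

Below the contact distance the repulsive term dominates and the energy rises steeply, which is what gives atoms their effective size.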

Electrostatic interactions

The monopole approximation of coulombic point charges is used.

• $E_{elec} = \frac{1}{4\pi\varepsilon} \sum_i \sum_j \frac{q_i q_j}{r_{ij}}$
• ε: permittivity of the medium or relative microscopic dielectric coefficient
• $q_i$: partial charge on atom i
• A distribution of partial charges has to be determined.

[Figure: Coulombic interactions]
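The point-charge sum above can be sketched in a few lines. The version below sums over unique pairs i < j and uses the conversion factor 332.0637 kcal·Å/(mol·e²) that is conventional in molecular mechanics when charges are in units of e and distances in Å; the charges and coordinates in the example are made up for illustration.

```python
import math

# 1/(4*pi*eps0) expressed in kcal * angstrom / (mol * e^2)
COULOMB_CONSTANT = 332.0637


def coulomb_energy(charges, coords, dielectric=1.0):
    """Sum of q_i * q_j / (dielectric * r_ij) over unique atom pairs, in kcal/mol."""
    total = 0.0
    n_atoms = len(charges)
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            r_ij = math.dist(coords[i], coords[j])
            total += charges[i] * charges[j] / r_ij
    return COULOMB_CONSTANT * total / dielectric


# Hypothetical ion pair: +1 and -1 charges 3 angstroms apart
charges = [1.0, -1.0]
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

e_vacuum = coulomb_energy(charges, coords)
e_water = coulomb_energy(charges, coords, dielectric=80.0)
print(f"vacuum: {e_vacuum:.1f} kcal/mol, dielectric 80: {e_water:.2f} kcal/mol")
```

The comparison between the two dielectric values shows how strongly the choice of ε affects the strength of a salt bridge in such a model.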

BREAK !!!

METHODOLOGICAL APPROACHES

• Quantum Mechanics
• Classical Mechanics
• Molecular Mechanics
• Statistical Mechanics (several methods)
• Normal Modes of Vibration
• Conformational search
• Ab initio
• Threading or fold recognition

Quantum mechanics

• Uses the (time-dependent) Schrödinger equation: [Equation]
• The Hamiltonian operator is given by: [Equation]
• Psi is the wavefunction, which contains all information about the dynamical properties of the system, and U is the energy for the state.
• The first term of the Hamiltonian represents the kinetic energy and the second the potential energy. The z's are the electric charges.
• Calculates the total potential energy of the molecule as U (total) = U (nuclear) + U (electronic).
• The Born-Oppenheimer approximation assumes that the motions of the atomic nuclei are independent of the motion of the electrons: U (total) = K + U (electronic).
• Two types of calculation have been developed to obtain U (electronic): ab initio and semi-empirical.

Ab initio
• Quasi-complete calculation of the molecular wave function: no parametrization.
• Molecular orbitals
• Rather trustworthy
• High degree of precision for the energy calculation.
• Applicable to molecules containing fewer than 40-50 atoms.
• Calculations are rather long and need a lot of disk space.
• Allows step-by-step energy optimisation.

Semi-empirical
• Certain 2- and 4-electron integrals (differential overlap of orbitals) are parametrized in order to fit experimental and theoretical data. The goal is to find the solutions (wavefunctions) and the eigenvalues (energies of the system) of Schrödinger's equation.

The "force field" method

• Uses an empirical function of the internal or potential energy.
• Assumes that U (total) = U (nuclear). So, no drastic changes in the electronic structure, such as formation or breaking of covalent bonds, are allowed.
• Molecules are treated with the laws of classical mechanics.
• The total energy of interaction is decomposed into a number of independent terms that are adjusted to be in agreement with experimental data, such as NMR, IR, circular dichroism and X-ray diffraction.
• $U(syst) = \sum_i U_i(\text{intramolecular}) + \sum_i \sum_j U_{i,j}(\text{intermolecular})$

Molecular mechanics

The internal degrees of freedom that define the 3D conformation of a molecule

[Figure: bond length (atoms A-B, equilibrium value lo), bond angle (atoms A-B-C), proper torsion (atoms A-B-C-D) and improper torsion (atoms A, B, C, D)]

Decomposition of the interaction energy

Bonding energies (for interactions between atoms separated by 1, 2 or 3 bonds)

• Deformation of bond lengths. Quadratic function.
  $E_{str}(l) = \frac{1}{2} \sum_i k_i^{l} (l_i - l_i^{o})^{2}$; $l_i$: bond length A - B; $l_i^{o}$: equilibrium bond length
• Deformation of valence angles. Quadratic function.
  $E_{bend}(\theta) = \frac{1}{2} \sum_i k_i^{\theta} (\theta_i - \theta_i^{o})^{2}$; $\theta_i$: valence angle A - B - C; $\theta_i^{o}$: equilibrium valence angle
• Deformation of proper dihedral angles or intrinsic rotation around bonds. Trigonometric function.
  $E_{tors}(\phi) = \sum_i \frac{E_i^{\phi}}{2} \left[ 1 + S_i \cos(|n_i| \phi_i) \right]$; $E_i^{\phi}$: torsional barrier; $S_i$: ±1; $n_i$: 1, 2, 3, ... (periodicity); $\phi_i$: proper torsion angle A - B - C - D
• Deformation of improper dihedral angles. Quadratic function.
  $E_{impr}(\xi) = \frac{1}{2} \sum_i k_i^{\xi} (\xi_i - \xi_i^{o})^{2}$; $\xi_i$: improper angle A - B - C - D; $\xi_i^{o}$: equilibrium improper angle
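The bonded terms above translate directly into code. The sketch below evaluates each term for a single bond, angle or torsion; the force constants and geometries in the example are hypothetical and not taken from any particular force field, and angles are assumed to be in radians.

```python
import math


def bond_energy(length, eq_length, force_constant):
    """Harmonic bond stretching: (1/2) k (l - l0)^2."""
    return 0.5 * force_constant * (length - eq_length) ** 2


def angle_energy(angle, eq_angle, force_constant):
    """Harmonic valence-angle bending: (1/2) k (theta - theta0)^2, angles in radians."""
    return 0.5 * force_constant * (angle - eq_angle) ** 2


def torsion_energy(phi, barrier, sign, periodicity):
    """Proper torsion: (E/2) [1 + S cos(|n| phi)], with S = +1 or -1, phi in radians."""
    return 0.5 * barrier * (1.0 + sign * math.cos(abs(periodicity) * phi))


def improper_energy(angle, eq_angle, force_constant):
    """Harmonic improper torsion: (1/2) k (xi - xi0)^2, angles in radians."""
    return 0.5 * force_constant * (angle - eq_angle) ** 2


# Hypothetical parameters, for illustration only
e_bond = bond_energy(1.6, 1.5, 300.0)                     # bond stretched by 0.1 A
e_eclipsed = torsion_energy(0.0, 3.0, +1, 3)              # threefold torsion at its maximum
e_staggered = torsion_energy(math.radians(60), 3.0, +1, 3)  # and at its minimum
print(f"bond: {e_bond:.2f}, torsion max: {e_eclipsed:.2f}, torsion min: {e_staggered:.2f}")
```

With S = +1 and n = 3 the torsion term has three equivalent minima per full rotation, separated by barriers of height E, which is the behavior expected around a single bond between tetrahedral centers.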

Non-bonding energies (for atoms separated by more than 3 bonds)

• Van der Waals
  $E_{vw} = k \sum \left[ a_{ij}/r_{ij}^{m} - c_{ij}/r_{ij}^{6} \right]$
• Electrostatic interactions
  $E_{elec} = \frac{1}{4\pi\varepsilon} \sum_i \sum_j \frac{q_i q_j}{r_{ij}}$
• Hydrogen bonds
  $E_{hb} = 4\left[(s/r')^{12} - (s/r')^{6}\right] - (\mu_o \mu_h / r^{3})\, g(\theta_o, \theta_h, \phi)$
• Others

Constraint terms
• Used to fix the values of the distances, angles or torsions. Usually quadratic.

Some force fields use mixed terms
• $E_{str/bend}(l, \theta) = \sum_i k_i^{l\theta} (l_i - l_i^{o})(\theta_i - \theta_i^{o})$

Hydrophobic interactions and others (solvation, ...).

In general, the total potential energy is the sum of In general, the total potential energy is the sum of individual contributions:individual contributions: U tot = U str + U bend + U tor + U vw + U elec + U hb + U U tot = U str + U bend + U tor + U vw + U elec + U hb + U

miscmisc
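As an illustration of how the individual terms add up, here is a minimal Python sketch. The function names, the tuple layouts, the choice m = 12 for the repulsive exponent and the reduced units (electrostatic prefactor set to 1) are assumptions made for the example; they do not come from any particular force field.

```python
import math

def stretch_energy(bonds):
    """(1/2) * sum of k * (l - l0)^2 over (k, l, l0) tuples."""
    return sum(0.5 * k * (l - l0) ** 2 for k, l, l0 in bonds)

def bend_energy(angles):
    """(1/2) * sum of k * (theta - theta0)^2 over (k, theta, theta0) tuples (radians)."""
    return sum(0.5 * k * (t - t0) ** 2 for k, t, t0 in angles)

def torsion_energy(torsions):
    """Sum of (E/2) * [1 + S * cos(|n| * phi)] over (E, S, n, phi) tuples (radians)."""
    return sum(0.5 * e * (1.0 + s * math.cos(abs(n) * phi))
               for e, s, n, phi in torsions)

def nonbonded_energy(pairs, m=12):
    """Sum of a/r^m - c/r^6 + q_i*q_j/r over (a, c, q_i, q_j, r) tuples.

    Reduced units are assumed: the electrostatic prefactor is set to 1.
    """
    return sum(a / r ** m - c / r ** 6 + qi * qj / r
               for a, c, qi, qj, r in pairs)

def total_energy(bonds, angles, torsions, pairs):
    """U_tot as the plain sum of the individual contributions."""
    return (stretch_energy(bonds) + bend_energy(angles)
            + torsion_energy(torsions) + nonbonded_energy(pairs))
```

For instance, `total_energy([(2.0, 1.1, 1.0)], [], [(4.0, 1, 3, 0.0)], [])` adds one slightly stretched bond to one torsion sitting on top of its barrier.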

This method needs values for the parameters, such as the force constants k:
• Extract parameters from physical measurements (microwave radiation, infrared, neutron diffraction, crystal lattice packing, etc.) on model compounds.
• Refine with calculations on model compounds similar to the structural object under study.

Considers only the internal energy of the molecule and its geometry.

Quick and efficient.

Minimization algorithms

No derivatives of the energy function
• SIMPLEX: coarse atom-by-atom minimization
• Used for very deformed structures

Calculation of the first derivatives, with the condition for the extremum $\partial U/\partial r = 0$. Iterative methods of the type
• line search, steepest descent
• conjugate gradients

Analytical or numerical calculation of the second derivatives, with the condition $\partial^2 U/\partial r^2 > 0$
• Newton-Raphson
• If the function is quadratic, the condition for the extremum leads to the inverse of the Hessian H: the minimum is reached in a single step, $r_{min} = r - H^{-1}\,\partial U/\partial r$.
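The difference between first- and second-derivative methods can be seen on a one-dimensional example. The sketch below is illustrative only: the function names, the fixed step size and the tolerances are assumptions, and real minimizers work on all atomic coordinates at once.

```python
def steepest_descent(grad, x, step=0.1, tol=1e-8, max_iter=10000):
    """Follow the negative gradient with a fixed step until |dU/dx| < tol."""
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x

def newton_raphson(grad, hess, x, tol=1e-8, max_iter=50):
    """Newton-Raphson: each step divides the gradient by the second derivative."""
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            break
        x -= g / hess(x)
    return x

# Quadratic test function U(x) = (1/2) * k * (x - x0)^2 with k = 3, x0 = 1.5
def grad_u(x):
    return 3.0 * (x - 1.5)

def hess_u(x):
    return 3.0
```

On this quadratic function, `newton_raphson(grad_u, hess_u, 0.0)` lands on the minimum after one update, whereas `steepest_descent(grad_u, 0.0)` approaches it gradually over many iterations.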

Electrostatics and the molecular electrostatic potential (MEP)

The potential contour at +3 kcal/mol (red) gives an indication of whether there is a significant MEP field near the molecular surface.

Solve the Poisson-Boltzmann equation to get the MEP field u(x), where
• d: relative dielectric
• f: molecular partial charge density
• fi: counter-ion partial charge density
• E: charge of an electron

[Figure panels: Protein A, Protein B]

Statistical mechanics

a) Molecular Dynamics

Use Newtonian mechanics to obtain the simultaneous positions and velocities of all the atoms as a function of time.

Solve the system of simultaneous equations:
• $d^2x_i/dt^2 = a_i = F_i/m_i$; $F_i = -\partial V/\partial x_i$
• V is the potential function; $m_i$, $x_i$ and $a_i$ are the mass, position and acceleration of atom i, respectively.

Integrating the equation twice gives x = f(t): the trajectory of each atom is simulated.

The integration time step must be shorter than the characteristic times of the fastest structural motions, such as the deformation of bond lengths ($10^{-12}$ s, or 1 ps).

The force field may be that of Molecular Mechanics.
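The numerical integration of Newton's equations can be sketched as follows. The velocity Verlet scheme is used here as one common choice (the slides do not name an integrator), and the one-dimensional harmonic "bond" with unit mass and unit force constant is a toy system chosen for the example.

```python
def velocity_verlet(force, x, v, mass, dt, n_steps):
    """Integrate d2x/dt2 = F(x)/m; returns the list of positions and the final velocity."""
    a = force(x) / mass
    trajectory = [x]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass              # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update with the averaged acceleration
        a = a_new
        trajectory.append(x)
    return trajectory, v

def harmonic_force(x):
    """F = -dV/dx for V = (1/2) * x^2 (toy potential, force constant 1)."""
    return -x
```

With `velocity_verlet(harmonic_force, 1.0, 0.0, 1.0, 0.01, 628)` the particle completes roughly one oscillation period (2π time units), and the total energy stays close to its initial value because the time step is small compared with the period.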

The temperature of the system is linked to the time-average of the kinetic energy:

$\langle E_{kinetic} \rangle_{NVT} = \frac{3}{2} N k_B T$, with $\langle E_{kinetic} \rangle_{NVT} = \left\langle \frac{1}{2} \sum_i m_i v_i^2 \right\rangle$

• N: number of particles
• V: volume of the system
• T: absolute temperature
• $k_B$: Boltzmann's constant

Modify the temperature of the system by scaling the velocities.

At high temperatures, the system is capable of occupying high-energy regions of the conformational space, thus overcoming high energetic barriers.

By cooling the system, the probability of the low-energy states increases. At T = 0 K the system should occupy the lowest-energy state.

$E_{total} = E_{kinetic} + E_{potential}$
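The link between kinetic energy and temperature, and the velocity-scaling step, can be written in a few lines. This is a sketch under simple assumptions: SI units, an instantaneous (not time-averaged) temperature, and plain uniform rescaling of all velocities; the function names are illustrative.

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant in J/K

def kinetic_energy(masses, velocities):
    """(1/2) * sum of m_i * v_i^2 for velocities given as (vx, vy, vz) tuples."""
    return sum(0.5 * m * (vx * vx + vy * vy + vz * vz)
               for m, (vx, vy, vz) in zip(masses, velocities))

def temperature(masses, velocities):
    """Instantaneous temperature from E_kinetic = (3/2) * N * k_B * T."""
    n = len(masses)
    return 2.0 * kinetic_energy(masses, velocities) / (3.0 * n * K_B)

def rescale_velocities(masses, velocities, t_target):
    """Scale every velocity by sqrt(T_target / T) so the temperature becomes t_target."""
    factor = math.sqrt(t_target / temperature(masses, velocities))
    return [(factor * vx, factor * vy, factor * vz) for vx, vy, vz in velocities]
```

Because the kinetic energy is quadratic in the velocities, multiplying them by sqrt(T_target / T) changes the temperature by exactly the factor T_target / T.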

MD is executed in the following thermodynamic ensembles
• Microcanonical: constant NVE (traditional)
• Canonical: constant NVT
• Isothermal-isobaric: constant NPT
• Grand canonical: constant μVT
(N: number of particles, V: volume, E: energy, T: temperature, P: pressure, μ: chemical potential)

The limitations of MD are
• the high computer resources needed
• the need to have available a large number of starting conformations
• the limited time scale (order of nanoseconds)

[Animation: Steered molecular dynamics, performed by applying a series of external forces to retinal, allows one to extract retinal from bacteriorhodopsin once the Schiff base bond to Lys216 is cleaved.]

[Animation: Steered molecular dynamics of the action of human synovial phospholipase A2 (PLA2) at the lipid-water interface.]

[Animation: Conformational changes induced in the kinesin structure (blue) by the additional gamma phosphate (green) of ATP.]

b) Monte Carlo

Method that generates conformations by assigning random values to the torsion angles.

The energy is calculated for each generated conformation.

Conformations are kept or rejected using the following criteria:
• if $E_{i+1} < E_i$, conformation i+1 is retained with probability 1;
• if $E_{i+1} > E_i$, one calculates an acceptance probability, for example with the Metropolis algorithm, which generates a Markov chain of states: $e^{-\Delta E_i/kT}$, with $\Delta E_i = E_{i+1} - E_i$;
• a random number $p_i$ is drawn uniformly between 0 and 1. If $p_i < e^{-\Delta E_i/kT}$, the conformation is retained; otherwise, it is rejected and conformation i is submitted again to a new random variation.
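The acceptance test can be sketched for a single torsion angle. Everything specific here is an assumption made for illustration: the threefold cosine barrier, the random perturbation of the current angle (rather than a completely new random value), the move size and the seed.

```python
import math
import random

def metropolis_accept(delta_e, kt, u):
    """Downhill moves always pass; uphill moves pass if u < exp(-delta_e / kt)."""
    if delta_e <= 0.0:
        return True
    return u < math.exp(-delta_e / kt)

def barrier_energy(phi, barrier=2.0, n=3):
    """Toy torsional term (E/2) * [1 + cos(n * phi)]."""
    return 0.5 * barrier * (1.0 + math.cos(n * phi))

def sample_torsion(n_steps, kt, max_move=0.5, seed=0):
    """Metropolis Monte Carlo walk on one torsion angle; returns the visited angles."""
    rng = random.Random(seed)
    phi = 0.0                      # start on top of a barrier
    energy = barrier_energy(phi)
    visited = []
    for _ in range(n_steps):
        trial = phi + rng.uniform(-max_move, max_move)
        trial = (trial + math.pi) % (2.0 * math.pi) - math.pi  # wrap the angle
        e_trial = barrier_energy(trial)
        if metropolis_accept(e_trial - energy, kt, rng.random()):
            phi, energy = trial, e_trial   # keep the new conformation
        visited.append(phi)                # a rejected move repeats the old one
    return visited
```

Lowering `kt` makes uphill moves rarer, so the walk spends more of its time near the minima of the torsional potential.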

c) The partition function Z

Z describes the distribution of conformational states of a polymeric chain of N residues and is based on RIS (rotational isomeric state) theory:

$Z = K \int \exp\left[-E(\Omega_1, \ldots, \Omega_N)/RT\right] \, d\Omega_1 \ldots d\Omega_N$, where $\Omega_i$ stands for the torsion angles of residue i.

The exponential term represents a statistical weight.

Z establishes the connection between the (macromolecular) thermodynamic properties, such as the Gibbs free energy, the enthalpy, the entropy and the heat capacity, and the microscopic phenomena.

In the case of the Ising model (nearest-neighbor interactions only), and dividing the conformational space into 2 regions, helix (α) and non-α or random coil (c), for a homopolymeric chain of N residues, the generator matrix method allows Z to be expressed as a matrix product:

$$Z = K \, [1, \; 1] \begin{pmatrix} z_1^{c} & 0 \\ 0 & z_1^{\alpha} \end{pmatrix} \begin{pmatrix} z_2^{c} & \sigma z_2^{\alpha} \\ z_2^{c} & z_2^{\alpha} \end{pmatrix} \cdots \begin{pmatrix} z_N^{c} & \sigma z_N^{\alpha} \\ z_N^{c} & z_N^{\alpha} \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

$\sigma \ll 1$, the nucleation parameter
• takes into consideration helix-random coil junctions
• is a penalty in the case of the presence of many junctions
• favors long segments of identical conformation

s, the cooperativity parameter, represents the cooperativity between those residues of identical conformation.

Each state carries with it a given statistical weight z associated with the energy of the conformation.
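A short Python sketch of the matrix-product evaluation for a homopolymer follows. It assumes K = 1, a coil weight of 1, a helix weight of s, and a factor σ each time a helix residue follows a coil residue (with no σ on the first residue); these conventions are one possible reading of the generator matrices, chosen for the example, and the brute-force enumeration is included only as a cross-check.

```python
from itertools import product

def partition_function(n, s, sigma):
    """Z for n residues by repeated multiplication with the 2x2 weight matrix."""
    coil, helix = 1.0, s                     # weights of the first residue
    for _ in range(n - 1):
        new_coil = coil + helix              # next residue in the coil state
        new_helix = sigma * s * coil + s * helix  # next residue in the helix state
        coil, helix = new_coil, new_helix
    return coil + helix

def partition_function_brute_force(n, s, sigma):
    """Z by explicit enumeration of all 2^n helix/coil sequences (small n only)."""
    total = 0.0
    for states in product("ch", repeat=n):
        weight = 1.0
        for i, state in enumerate(states):
            if state == "h":
                weight *= s
                if i > 0 and states[i - 1] == "c":
                    weight *= sigma          # coil-to-helix junction penalty
        total += weight
    return total
```

The matrix product needs on the order of N operations, while the enumeration needs 2^N terms, which is why the generator matrix form is the practical one for long chains.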

d) Langevin dynamics

Stochastic differential equation in which two force terms have been added to Newton's second law:
• Friction force
• Stochastic force

$F_i/m_i - \gamma_i v_i + R_i(t)/m_i = a_i$

• where $\gamma_i = \zeta_i/m_i$ is the collision frequency and $\zeta_i$ the friction coefficient. $R_i(t)$ is a stochastic function.

The collisions between the solvent and the solute may help overcome energetic barriers.

LD may perform a better conformational search than MD ($\gamma$ = 0).
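The Langevin equation can be integrated with a very simple scheme. The sketch below uses a first-order update and Gaussian random kicks whose size follows the fluctuation-dissipation relation; the integrator, the function name and the one-dimensional setting are assumptions made for illustration, not the scheme of any specific MD package.

```python
import math
import random

def langevin_trajectory(x, v, force, mass, gamma, kt, dt, n_steps, seed=0):
    """Propagate a = F/m - gamma*v + R(t)/m for n_steps; returns the final (x, v)."""
    rng = random.Random(seed)
    # Standard deviation of the random velocity kick per step (zero when kt == 0).
    kick = math.sqrt(2.0 * gamma * kt * dt / mass)
    for _ in range(n_steps):
        a = force(x) / mass - gamma * v              # deterministic part: F/m - gamma*v
        v = v + a * dt + kick * rng.gauss(0.0, 1.0)  # add the stochastic contribution
        x = x + v * dt                               # move with the updated velocity
    return x, v

def spring_force(x):
    """Toy harmonic force F = -x."""
    return -x
```

Setting `gamma=0.0` and `kt=0.0` removes both added terms and recovers plain Newtonian dynamics, which is the MD limit mentioned above.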

e) Normal modes of vibration

Simple harmonic oscillations around a local minimum of energy.

Each movement may be expressed as a superposition of normal modes.

For a non-harmonic energy function, the potential close to the minimum may be approximated by a harmonic potential.

Any small-amplitude movement may be described as a sum of normal modes.

As a globular protein is heated from low temperatures, the atomic fluctuations begin to deviate from harmonic behavior around 200 K.

Qualitative and semi-quantitative estimates can be made for many properties, such as the magnitude of atomic fluctuations, the displacement covariance matrix, the vibrational entropy, etc.

Requires the calculation of the Hessian, followed by its diagonalization. The normal-mode frequencies are directly related to the eigenvalues and the normal modes to the eigenvectors.

The low-frequency normal modes (under 30 cm$^{-1}$), which correspond to large-scale conformational changes, are often of central interest.
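The Hessian-diagonalization step can be illustrated on the smallest non-trivial case: two equal masses, each tethered by a spring k and coupled to each other by a spring kc. The system, the function names and the closed-form 2x2 eigen-solution are choices made for this sketch; a protein requires a numerical eigensolver on a 3N x 3N matrix.

```python
import math

def coupled_oscillator_hessian(k, kc, mass):
    """Mass-weighted Hessian entries (h11, h12, h22) for two tethered, coupled masses."""
    return (k + kc) / mass, -kc / mass, (k + kc) / mass

def normal_modes_2x2(h11, h12, h22):
    """Diagonalize [[h11, h12], [h12, h22]]; returns [(frequency, unit vector), ...], lowest first."""
    mean = 0.5 * (h11 + h22)
    radius = math.hypot(0.5 * (h11 - h22), h12)
    eigenvalues = [mean - radius, mean + radius]
    if h12 == 0.0:
        # Already diagonal: the modes are the coordinate axes.
        vectors = [(1.0, 0.0), (0.0, 1.0)] if h11 <= h22 else [(0.0, 1.0), (1.0, 0.0)]
    else:
        vectors = [(h12, lam - h11) for lam in eigenvalues]
    modes = []
    for lam, (vx, vy) in zip(eigenvalues, vectors):
        norm = math.hypot(vx, vy)
        frequency = math.sqrt(max(lam, 0.0))   # angular frequency = sqrt(eigenvalue)
        modes.append((frequency, (vx / norm, vy / norm)))
    return modes
```

For k = 1, kc = 0.5 and unit masses, the low-frequency mode moves both masses in the same direction (a collective motion), while the high-frequency mode moves them in opposite directions.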

Slowest Motions in the Intact Ribosome

[Animations: Mode 1, Mode 3]

Ab initio methods

Secondary structure prediction

Theoretical algorithms
• Systematic and exhaustive conformational search with energy minimization.
• Main problem: the astronomical number of non-native low-energy conformations.

Empirical algorithms
• Residue propensities
• Neural networks
• Others

Chou and Fasman's empirical algorithm for the prediction of secondary structures in proteins

Considers the composition of small polypeptide segments. If a segment is rich in residues found in helices or sheets, that segment is considered to adopt the corresponding secondary structure.

$f_\alpha = n_\alpha / n$
• $f_\alpha$: frequency of a given residue in α-helices (αh) in a series of protein structures
• $n_\alpha$: number of occurrences of the given residue in αh
• n: total number of occurrences of the residue

$P_\alpha = f_\alpha / \langle f_\alpha \rangle$
• $P_\alpha$: propensity of a particular residue to be in αh
• $\langle f_\alpha \rangle$: mean value of $f_\alpha$ over the 20 residues

$P_\beta = f_\beta / \langle f_\beta \rangle$
• $P_\beta$: propensity of a particular residue to be in a β sheet
• $\langle f_\beta \rangle$: mean value of $f_\beta$ over the 20 residues

A list of propensities based on the analysis of X-ray structures is obtained:
• $H_\alpha$: very favorable
• $h_\alpha$: favorable
• $I_\alpha$: weakly favorable
• $i_\alpha$: indifferent
• $b_\alpha$: unfavorable
• $B_\alpha$: very unfavorable

Rules

1. A group of 4 residues $H_\alpha$, $h_\alpha$, $I_\alpha$ ($I_\alpha$ counts for half) over 6 contiguous residues initiates an α-helix.

2. The helical segment propagates in both directions until the mean value of $P_\alpha$ of a tetrapeptide segment falls below 1.0.

3. A Pro can be found only at the N-terminus of an α-helix.

4. A group of 3 residues $H_\beta$, $h_\beta$ over 5 contiguous residues initiates a β strand. The strand propagates in both directions until the mean value of $P_\beta$ of a tetrapeptide segment falls below 1.00.

5. For regions containing both α and β structures, the overlapping zone will be helical if $\langle P_\alpha \rangle > \langle P_\beta \rangle$; otherwise, a strand will form.
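The propensity formula and the helix nucleation rule (rule 1) can be sketched as follows. The residue counts used in the usage note are invented toy numbers, not Chou and Fasman's statistics, and the sets of former residues are supplied by the caller; the function names are illustrative.

```python
def propensities(n_in_state, n_total):
    """P = f / <f>, with f = n_in_state / n_total for each residue type."""
    f = {aa: n_in_state[aa] / n_total[aa] for aa in n_total}
    mean_f = sum(f.values()) / len(f)
    return {aa: f[aa] / mean_f for aa in f}

def helix_nuclei(sequence, formers, weak_formers, window=6, threshold=4.0):
    """Start indices of windows that satisfy the nucleation rule.

    Residues in `formers` (classes H and h) count 1, residues in
    `weak_formers` (class I) count 0.5, all others count 0.
    """
    starts = []
    for i in range(len(sequence) - window + 1):
        score = 0.0
        for aa in sequence[i:i + window]:
            if aa in formers:
                score += 1.0
            elif aa in weak_formers:
                score += 0.5
        if score >= threshold:
            starts.append(i)
    return starts
```

With toy counts such as `propensities({"A": 60, "G": 20}, {"A": 100, "G": 100})`, the residue seen more often in helices gets a propensity above 1 and the other one a propensity below 1; `helix_nuclei("GGAAAAAAGG", {"A"}, set())` then lists the windows of 6 residues that contain at least 4 formers.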

Rost and Sander's algorithm

• Uses a system of neural networks on a non-redundant database of more than 130 chains.
• The network is trained on proteins that have evolved naturally.
• It uses the information furnished by evolution thanks to multiple sequence alignments.
• Attributes a weight that depends on sequence conservation for a given position.
• Includes global amino acid composition.
• The conformational states are Helix (H), Sheet (E) and Other (L).
• Mean confidence: 50-80%.

Remarks

In fact, secondary structures are strongly influenced by tertiary interactions, which are not taken into account in many algorithms.

Using weighted vector components, or methods that take into account correlations between amino acid frequencies, improves the accuracy.

Dataset size, class definition and cross-validation are of critical importance to the true accuracy of class prediction from amino acid composition.

Tertiary structure prediction

Obvious approaches for predicting protein structure from sequence

Ab initio: simulate the folding process with the laws of physics
• Use a simplified polypeptide representation and restrain atom or residue positions to a hypothetical 2D or 3D lattice.
• Use statistical potentials to portray the true interaction energies.
• Search the entire conformational space available to the polypeptide for the correct fold.

Other approaches
• 1D-3D alignments
• Threading
• Fold recognition

1D-3D alignments

• Encode 3D structural information into strings of symbols or profiles, against which 1D strings derived from the query sequence are aligned.
• Define structural environments on the basis of secondary structure, solvent accessibility and burial by polar atoms. Profiles of scores for each of the 20 amino acids can then be calculated for each of the structural environment classes, based on their observed frequencies in a database of structures.
• Transform query sequences into strings of characters representing several levels of conserved hydrophobicity or of solvent accessibility. Dynamic programming alignments using a substitution matrix derived from database counts may be able to detect remote homologies; include secondary structure prediction information for the query sequence.

Limitations
• Much of the structural context information is lost, compromising the specificity of sequence-structure matches, i.e. one amphipathic helix or strand is much like any other, and their misalignment is inevitable unless more detailed structural relationships are considered.
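The dynamic programming step used in 1D-3D alignments can be sketched with a global (Needleman-Wunsch type) alignment score between two strings of symbols. The scoring function, the linear gap penalty and the function names are assumptions for the example; a real 1D-3D method would score each query residue against a structural environment class using a profile derived from a database of structures.

```python
def global_alignment_score(query, template, score, gap=-1):
    """Best global alignment score of two symbol strings by dynamic programming."""
    rows, cols = len(query) + 1, len(template) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = i * gap            # query prefix aligned to gaps only
    for j in range(1, cols):
        table[0][j] = j * gap            # template prefix aligned to gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = max(
                table[i - 1][j - 1] + score(query[i - 1], template[j - 1]),  # pair the two symbols
                table[i - 1][j] + gap,   # gap in the template
                table[i][j - 1] + gap,   # gap in the query
            )
    return table[-1][-1]

def identity_score(a, b):
    """Toy substitution score: +1 for identical symbols, -1 otherwise."""
    return 1 if a == b else -1
```

Replacing `identity_score` with a lookup into an environment-specific profile turns the same recursion into a sequence-to-structure alignment.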

Threading

One solution to this problem:
• Use a low-level alignment to score the pairwise residue interactions (structural environment) at each equivalence by minimising an empirical pairwise distance.
• The low-level alignment attempts to place query residues at positions in the library structure such that the distances between them are similar to those frequently observed in a database for amino acids of those types.
• The final alignment and score are traced from a high-level matrix derived from the low-level alignment scores.

Threading thus attempts to evaluate the sequence in 3D as it is threaded through a library structure.

3D-1D and threading methods have similar success rates.

Fold recognition

Protein structure and function are more conserved than protein sequence.

• FR: identification of correspondences between novel sequences and known structures would greatly assist in the characterisation of these sequences.
• Idea: a query sequence may be compared to known structures and their sequences (the fold library).
• The fold library is a set of known structures in which we wish to find similarities to a set of sequences of unknown structure, called queries.
• Fold recognition and sequence database searching methods share the common aim of identifying distant ancestral relationships between sequences.
• Structural comparisons give the least ambiguous measures of relatedness; sequence database annotations are often erroneous or absent.
• Given that the number of distinct structures is not growing as fast as the PDB as a whole, only a finite and relatively small number of fold topologies must be encoded by the millions of protein sequences in nature.
• Estimates: 1000-5000 different fold topologies.
• Time scale for reaching this number: tens of years.
• Estimates from the deposition of structures set the probability of a newly sequenced protein (with no detectable structural homologue) being similar to a known fold at 70%.

Homologous: showing a fundamental similarity of structure inherited from a common ancestor.

Applied to structures ranging from organs to molecules.

Homology modeling

Protocol
• Template searching
• Template selection
• Multiple sequence alignment
• Model building
• Model refinement
• Structure validation

Template searching

Sequence search (BLAST) of the target against
• sequence data banks (SwissProt, PIR, NCBI, etc.)
• structure data banks (PDB)

If there are homologous sequences and structures with > 30% sequence identity: sequence homology modeling.

If identity < 30%: 1D-3D alignment, i.e., threading and fold recognition.
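The 30% decision rule can be expressed as a small helper. The definition of identity used here (identical residues divided by the number of aligned columns without gaps) is one common convention among several and is an assumption of this sketch, as are the function names.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over the columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)

def choose_method(identity, threshold=30.0):
    """Apply the identity threshold to pick the modeling route."""
    if identity > threshold:
        return "homology modeling"
    return "threading / fold recognition"
```

A call such as `choose_method(percent_identity(target_row, template_row))` on two rows of a multiple alignment returns the route suggested by the rule above; `target_row` and `template_row` are placeholder names for the aligned strings.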

Template selection (> 30% sequence identity)

Selection of a set of representative structural templates (FSSP, a fold classification based on structure-structure alignment of proteins).

Multiple sequence alignment between target sequence and template sequences

Methods to automatically align sequences in a reasonable manner:
• CLUSTAL relies on a heuristic approach.
• HMMER tackles the problem in a formally sounder way, by estimating a hidden Markov profile from the set of given sequences.
• T-Coffee, an alignment of multiple alignments.

Enhancement of alignment procedures
• Since the accuracy of secondary structure prediction algorithms is ~70%, incorporate secondary structure information for the target and the template sequences in the alignment procedure (with a proper weighting factor).

Template searching, selection and multiple sequence alignment

16 20 30 34 36 38 41 59 60
a abcd

TSV-PA: VFGGDECNINEHRSLVVLFNS--NGFLCGGTLINQDWVVTAAHCDS----
TRYPS : IVGGYTCGANTVPYQVSLN--S-GYHFCGGSLINSQWVVSAAHCYK----
KALLIK: IIGGRECEKNSHPWQVAIYHY--SSFQCGGVLVNPKWVLTAAHCKN----
THROM : IVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDRWVLTAAHCLLYPPW
BOTHRO: VIGGDECDINEHPFLAFMYYS-PQY-FCGMTLINQEWVLTAAHCD-----
PA-BJ : VVGGRPCKINVHRSLVLLYNS- SSLLCSGTLINQEWVLTAAHCD-----

61 62 73 77 78 81 93 95 9697 98
efghi a a a

TSV-PA: ------NNFQLLFGVHSKKILN-EDEQTRDPKEKFFCPNRKK---DD-EV
TRYPS : -----SGIQVRL-GEDNINVVE-GNEQFISASKSIVHPSYN----SN-TL
KALLIK: -----DNYEVWL-GRHNLFENE-NTAQFFGVTADFPHPGFNLSADGK-DY
THROM : DKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYN----WRENL
BOTHRO: ----K-TYMRIYLGIHTRSVAND-DEVIRYPKEKFICPNKKK---NV-IT
PA-BJ : ----S-KNFKMKLGVHSIKIRN-KNERTRHPKEKFICPNRKK---DD-VL

Model building

Proceed to the molecular modeling of the target sequences based on template structures:
• Construction of the backbone
• Construction of loops
• Optimization of side chain packing
• Energy minimization and/or molecular dynamics (may use force fields such as CHARMM, AMBER or GROMOS)

Retain a number of models based on several plausible alignments or other criteria.

Obtain properties such as the Molecular Electrostatic Potential, the Solvent Accessible Surface area, i.e., the contact between protein and solvent (Lee & Richards, 1971), the exposed residues, etc.

Model refinement

Structure validation

Calculation of several stereochemical quality indices
• Ramachandran map
• Packing quality

Assessment of the models regarding structural properties typical of native conformations, i.e. fold correctness.
• Use of the obtained energy profiles to reveal potential folding errors in the models (Prosa II).

Growth of new folds in the Protein Data Bank (PDB)

[Figure: New (blue) and old (orange) folds per year.]

Using the homology method
• Once representative structures of each fold have been identified (~1000-5000), it will be possible to obtain models for all the sequences.
• The determination of the fold type may furnish or improve the functional annotation, leading to a synergy between 3D structure and function.
• (Approximately 1/3 of genomic sequences have homologs with known protein structures.)


Applications

• Probing of 'interesting' molecular regions with the goal of determining whether they are involved in molecular recognition, activity and others.
• Computer-assisted drug design with a specific protein as target.
• Changes in molecular properties, such as the production of chimeric enzymes with a different catalytic activity (biochemical engineering).
• Effects of mutations on stability and activity.
• Computation of affinities and other properties.
• Mechanism of action.
• Structure-function relationships.
• Obtaining molecular models of the 3D structure of proteins and their complexes.
• Design of new biomolecules possessing desired properties.
• Design of effectors of those molecules.
• Studies of molecular interaction and recognition, based on the principles of steric and electrostatic complementarity.
• Calculation of reliable values for certain properties, such as the inhibition constant $K_i$.
• Bioavailability, pharmacokinetics and dynamics, absorption by different tissues, permeability, accessibility, immunogenicity or toxicity. New ADMET technology addresses these issues.

Limitations


THIS IS THE END