A new paradigm for virtual screening A Research Council’s Basic Technology Research Programme

  • A new paradigm for virtual screening
    • A Research Councils' Basic Technology Research Programme

  • Background
    • Cross research council endeavour, administered by EPSRC
    • Funding for research to create a new technology
    • Change the way we do science
    • Underpin the future industrial base

  • Atom based modelling: QSAR & QSPR
    • Almost all modelling techniques are based on atomistic descriptions of molecules
    • Although these techniques have been successful over several decades, they have disadvantages:
      • poor scaling characteristics
      • lack of a solid physical justification, e.g. scoring functions
      • interpretation is difficult due to the abstract nature of many descriptors
      • tendency to produce high-dimensional models

  • What is the true dimensionality of chemical space?
    This has been investigated as follows:
    1. Choose 26 descriptors that appear again and again in our QSPR models
    2. Calculate them for the entire Maybridge database
    3. Calculate the principal components (factors)
    4. What is the dimensionality of physical property space, and what are the descriptors?
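
The four steps above can be sketched roughly as follows. This is a minimal illustration that assumes the principal components are taken from the correlation matrix of the descriptors; the data are synthetic stand-ins (not Maybridge), and `pca_eigenvalues` and the Kaiser cut-off are illustrative choices, not the procedure used in the talk.

```python
import numpy as np

def pca_eigenvalues(X):
    """Eigenvalues (descending) and eigenvectors of the correlation matrix of X."""
    # Standardise each descriptor column so that units do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = (Z.T @ Z) / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(0)
# Synthetic stand-in: 500 "compounds", 26 descriptors driven by 3 latent factors.
latent = rng.normal(size=(500, 3))
loadings = rng.normal(size=(3, 26))
X = latent @ loadings + 0.1 * rng.normal(size=(500, 26))

eigvals, _ = pca_eigenvalues(X)
explained = eigvals / eigvals.sum()     # the values one would plot in a scree plot
n_kaiser = int((eigvals > 1.0).sum())   # Kaiser criterion: one common cut-off
print(n_kaiser, np.round(explained[:5], 3))
```

The sorted eigenvalues are what a scree plot displays; the number of components retained depends on the cut-off rule chosen.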

  • Scree plot of the PC eigenvalues

  • Physical property space

    PC | Main descriptors | Interpretation
    1 | Polarizability, molecular weight, volume, surface area, globularity | Size, shape
    2 | Maximum MEP, mean positive and negative MEPs, total variance | Complementary electrostatic surface descriptor
    3 | Minimum MEP, mean negative MEP, balance parameter | Complementary electrostatic surface descriptor
    4 | Total MEP-derived charges on nitrogens, # H-bond donors | Complementary hydrogen-bonding descriptor

  • Physical property space (continued)

    PC | Main descriptors | Interpretation
    5 | Total MEP-derived charges on H and O, minimum MEP, # aromatic rings | Complementary hydrogen-bonding descriptor
    6 | Dipole moment, dipolar density | Dipolar polarity
    7-9 | Total MEP charges on different types of atom | Chemical diversity

  • Improved molecular modelling?
    • Can we define a more parsimonious and explicit description of molecules than has so far been achieved using atomistic models?
    • ... leading to better prediction AND a clearer understanding of the properties of molecules and how they arise

  • A non-atom based approach
    • We are developing an alternative approach in which molecules are described by their surfaces
    • (Figure: benzodiazepine analogues)

  • A non-atom based approach
    • The approach is based on calculation of a set of local properties at or near the molecular surface:
      • the local molecular electrostatic potential (MEP)
      • the local ionisation energy (LIE, IEL)
      • the local electron affinity (LEA, EAL)
      • the local polarisability (LP, αL)

  • The local surface properties

  • Molecular Electrostatic Potential

    n = number of atoms in the molecule

    Zi = nuclear charge of atom i, located at Ri

    ρ(r) = electron density function
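
The equation these symbols belong to did not survive extraction from the slide. For orientation, the standard definition of the molecular electrostatic potential (in atomic units) that uses exactly these quantities is given below; it is the textbook form, not a transcription of the slide.

```latex
\[
V(\mathbf{r}) \;=\; \sum_{i=1}^{n} \frac{Z_i}{\lvert \mathbf{R}_i - \mathbf{r} \rvert}
\;-\; \int \frac{\rho(\mathbf{r}')}{\lvert \mathbf{r}' - \mathbf{r} \rvert}\, d\mathbf{r}'
\]
```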

  • Local Polarizability

    Density due to a singly occupied atomic orbital j

    Coulson population of atomic orbital j

    Mean polarizability calculated for atomic orbital j
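
The slide's equation is missing from the extracted text. A form consistent with the three quantities listed here, as it is usually given in the literature on local polarizability, is sketched below. The symbols are assumptions introduced for this note: \(\rho_j^{1}\) for the density due to a singly occupied atomic orbital \(j\), \(q_j\) for its Coulson population, \(\bar{\alpha}_j\) for its mean polarizability, and \(N_{AO}\) for the number of atomic orbitals.

```latex
\[
\alpha_L(\mathbf{r}) \;=\;
\frac{\displaystyle\sum_{j=1}^{N_{AO}} \rho_j^{1}(\mathbf{r})\, q_j\, \bar{\alpha}_j}
     {\displaystyle\sum_{j=1}^{N_{AO}} \rho_j^{1}(\mathbf{r})\, q_j}
\]
```

Read this way, the local polarizability is a density-weighted average of the orbital polarizabilities at the point \(\mathbf{r}\).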

  • Local electron affinity - EAL

  • Local Ionization Energy
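
The equations for these two slides were also lost in extraction. As commonly defined in the literature, both properties are density-weighted averages of orbital energies: over the occupied orbitals for the local ionization energy, and over the virtual orbitals for the local electron affinity. The notation is an assumption made for this note: \(\rho_i(\mathbf{r})\) is the density of molecular orbital \(i\) at \(\mathbf{r}\), \(\varepsilon_i\) its energy, and \(n_{orbs}\) the total number of orbitals.

```latex
\[
IE_L(\mathbf{r}) \;=\;
\frac{\displaystyle -\sum_{i=1}^{\mathrm{HOMO}} \rho_i(\mathbf{r})\, \varepsilon_i}
     {\displaystyle \sum_{i=1}^{\mathrm{HOMO}} \rho_i(\mathbf{r})},
\qquad
EA_L(\mathbf{r}) \;=\;
\frac{\displaystyle -\sum_{i=\mathrm{LUMO}}^{n_{orbs}} \rho_i(\mathbf{r})\, \varepsilon_i}
     {\displaystyle \sum_{i=\mathrm{LUMO}}^{n_{orbs}} \rho_i(\mathbf{r})}
\]
```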

  • Calculation of the surface properties
    • Molecules defined as isodensity surfaces
      • using semi-empirical AM1 electron density
      • can also be defined using a shrink-wrap or a marching cube algorithm
    • Fitted to a spherical harmonic expansion
      • the shape of the shrink-wrapped surface, or
      • the four local properties: MEP, LIE, LEA & LP

  • Describing surface shape: spherical harmonic expansion
    • The accuracy of the surface description is a function of the order N of the expansion
    • The greater N, the larger the computational penalty
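
For reference, a truncated spherical harmonic expansion of order N generally takes the form below. This is the generic expression, assuming the surface is described by its radial distance \(r\) from a chosen centre as a function of the angles \((\theta, \phi)\); the coefficient symbol \(c_l^m\) is notation chosen for this note. A local property on the surface can be expanded in the same way.

```latex
\[
r(\theta, \phi) \;\approx\; \sum_{l=0}^{N} \sum_{m=-l}^{l} c_l^{m}\, Y_l^{m}(\theta, \phi)
\]
```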

  • Advantages of this approach
    • This gives a completely analytical description of the molecule's shape & the 4 local properties
      • intermolecular binding properties & chemical reactivity
    • Spherical harmonics can be truncated at low orders for fast QSAR scans (HTS), fast superposition of molecules & rapid calculation of similarity indices
      • for ligands (MW < 750), N = 6-8
      • for peptides & proteins (MW > 5,000), N = 25-30
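
To give a feel for what these truncation orders mean, the short sketch below counts the terms in an expansion of order N, assuming one coefficient per (l, m) pair with 0 ≤ l ≤ N and −l ≤ m ≤ l. The function name `n_coefficients` is illustrative.

```python
def n_coefficients(order):
    """Number of (l, m) terms with 0 <= l <= order and -l <= m <= l."""
    return sum(2 * l + 1 for l in range(order + 1))

for order in (6, 8, 25, 30):
    print(order, n_coefficients(order))
# 6 49
# 8 81
# 25 676
# 30 961
```

The count is (N + 1)², so it depends only on the chosen order and not on the size of the molecule.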

  • Putative resolutions for in silico screening
    • For ligands, N = 6
    • For receptors, N = 25

  • MEP & LIE

    MEP

    IEL

  • Application to QSAR & QSPR
    • Several classes of QSAR/QSPR descriptors can be derived from the local properties, including:
      • the spherical harmonic coefficients for constant order N
        • the number of coefficients is independent of the number of atoms in a molecule
      • the critical points for each surface property
        • maxima, minima & saddle points
      • the distribution of field intensities at the molecular surface
        • four fields with local intensities varying between molecules
        • sample using grid points?
      • the surface integrals for each field

  • Public domain datasets
    • Small
      • Consensus Set of 74 Drug Molecules (diverse)
      • QSAR set (31 CoMFA steroids)
    • Medium
      • WDI subset (2,400 compounds)
      • Harvard ChemBank dataset (2,000 compounds)
    • Large
      • WDI (50,000)
      • Maybridge (50,000)

  • Small molecule showing tessellated surface

  • An example grid of surface points
    • A grid is placed on this molecular surface in order to reduce the number of surface points from 4038 to 55
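
The slide does not say how the grid is constructed. One simple way to thin a dense set of surface points with a grid is sketched below: bin the points into cubic cells and keep one representative per occupied cell. The function `grid_downsample`, the cell size and the spherical test surface are all assumptions for illustration, and the sketch does not attempt to reproduce the 4038 → 55 figure.

```python
import numpy as np

def grid_downsample(points, cell):
    """For each occupied cubic cell of edge `cell`, keep the point nearest the cell's mean."""
    points = np.asarray(points, dtype=float)
    keys = np.floor(points / cell).astype(int)
    cells = {}
    for idx, key in enumerate(map(tuple, keys)):
        cells.setdefault(key, []).append(idx)
    kept = []
    for members in cells.values():
        centroid = points[members].mean(axis=0)
        dists = np.linalg.norm(points[members] - centroid, axis=1)
        kept.append(members[int(np.argmin(dists))])
    return points[sorted(kept)]

rng = np.random.default_rng(1)
# Stand-in for a tessellated surface: 4038 random points on a sphere of radius 5.
raw = rng.normal(size=(4038, 3))
surface = 5.0 * raw / np.linalg.norm(raw, axis=1, keepdims=True)
reduced = grid_downsample(surface, cell=2.5)
print(len(surface), "->", len(reduced))
```

A coarser cell gives fewer retained points, so the cell size plays the role of the grid resolution.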

  • Gradient flows & molecular surface property graphs
    • Characterize the behaviour of a property f : S → ℝ on a molecular surface S, in terms of a directed graph G on S derived from the gradient vector field dx/dt = grad f(x)
    • The molecular surface property graph G is defined by
      • Vertices(G) = fixed points of grad f = critical points of f
      • Edges(G) = stable and unstable manifolds of the saddle points

  • Example molecule: Allopurinol

  • Allopurinol RGB surfaces
    • LIE encoded on the red channel
    • LEA encoded on the green channel
    • LP or MEP encoded on the blue channel
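
The channel assignment above could be implemented along the following lines. This is a sketch only: it assumes each property is rescaled linearly to 0..255 over its own range on the surface, which the slide does not specify, and the property values in the example are made up.

```python
import numpy as np

def to_channel(values):
    """Linearly rescale a property to 0..255 (a constant input maps to 0)."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros(values.shape, dtype=np.uint8)
    return np.round(255 * (values - values.min()) / span).astype(np.uint8)

def rgb_encode(lie, lea, blue):
    """LIE -> red, LEA -> green, LP or MEP -> blue; one row per surface point."""
    return np.stack([to_channel(lie), to_channel(lea), to_channel(blue)], axis=1)

# Made-up values at three surface points, purely for illustration.
colours = rgb_encode([300.0, 350.0, 400.0], [-120.0, -100.0, -80.0], [0.2, 0.3, 0.4])
print(colours)
```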

  • Critical points of allopurinol
    • 8 maxima
    • 7 minima
    • 13 saddles
    • No. of maxima − no. of saddles + no. of minima = Euler characteristic χ(S) = 2
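
The counts on this slide can be checked directly against the relation maxima − saddles + minima = χ(S), where χ = 2 for a closed surface that is topologically a sphere. The helper name below is illustrative.

```python
def euler_characteristic(n_max, n_saddle, n_min):
    """Alternating count of critical points on a closed surface."""
    return n_max - n_saddle + n_min

chi = euler_characteristic(n_max=8, n_saddle=13, n_min=7)
print(chi)  # 2
```

A mismatch in this sum is a quick sign that a critical point has been missed or counted twice.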

  • Distribution based descriptors
    • 34 descriptors were measured, including
      • maximum field intensity
      • minimum field intensity
      • mean field intensity
      • range of field intensities
      • variance of field intensities
    • The principal components of the descriptors were calculated to provide a set of orthogonal descriptors derived from the local properties at the molecular surface

  • Distribution of Allopurinol Local Properties

  • Other distribution based descriptors
    • Moments
      • 1st: mean
      • 2nd: variance
      • 3rd: skewness
      • 4th: kurtosis
      • > 4th: higher moments as required
    • Overlapping Gaussians
    • Kernel density procedure
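
A few of the distribution descriptors named on these slides can be computed from the property values sampled at the surface points as sketched below. The sketch assumes population (not sample) moments and non-excess kurtosis; the slides do not state which conventions were used, and `distribution_descriptors` is an illustrative name.

```python
import numpy as np

def distribution_descriptors(values):
    """Simple distribution descriptors of one local property over the surface points."""
    v = np.asarray(values, dtype=float)
    mean = v.mean()
    centred = v - mean
    var = np.mean(centred ** 2)   # population variance
    std = np.sqrt(var)
    return {
        "max": v.max(),
        "min": v.min(),
        "mean": mean,
        "range": v.max() - v.min(),
        "variance": var,
        "skewness": np.mean(centred ** 3) / std ** 3,
        "kurtosis": np.mean(centred ** 4) / std ** 4,  # non-excess kurtosis
    }

d = distribution_descriptors([1.0, 2.0, 3.0, 4.0, 5.0])
print(d)
```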

  • Correlation Matrix for properties of allopurinol

  • Correlations of local properties: Maybridge database

        | MEP   | LIE  | LEA  | LP
    MEP | 1     |      |      |
    LIE | 0.15  | 1    |      |
    LEA | -0.12 | 0.18 | 1    |
    LP  | 0.29  | 0.19 | 0.51 | 1
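
A correlation matrix of this kind can be obtained with a single NumPy call once the four properties are arranged as columns. The sketch below uses random stand-in data and makes no assumption about how the values behind the table were sampled (per surface point or per molecule), which the slide does not state.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in; columns play the roles of MEP, LIE, LEA and LP.
props = rng.normal(size=(1000, 4))
corr = np.corrcoef(props, rowvar=False)  # 4 x 4 Pearson correlation matrix
print(np.round(np.tril(corr), 2))        # lower triangle, as in the table
```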

  • QSPR & QSAR models
    • Models derived from local properties
    • Drug likeness
      • SOMs trained on WDI (drugs) & Maybridge (general)
      • Parameters from PCs of local property descriptors
      • Medium-sized datasets superimposed on SOMs
    • Surface integral model for solvation energy
      • RMS error ~ 0.75 kcal

  • Physical-property mapping
    • Maybridge used as the chemistry dataset
    • Use the top six principal components to train a 100 × 100 Kohonen net (unsupervised training)
    • 2,105 compounds selected from the World Drug Index as real drugs used as the drug dataset
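
The mapping procedure can be illustrated with a very small Kohonen net (self-organising map). The sketch below is a generic textbook SOM with a Gaussian neighbourhood and linearly decaying learning rate, trained on random six-dimensional stand-in data on a 10 × 10 grid rather than 100 × 100; all function names and training parameters are assumptions, not those used for the maps in the talk.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Grid index (row, col) of the node whose weight vector is closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)

def train_som(data, rows, cols, n_iter=2000, lr0=0.5, seed=0):
    """Unsupervised training of a rows x cols Kohonen net on `data`."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    weights = rng.normal(size=(rows, cols, data.shape[1]))
    # Grid coordinates of every node, used for neighbourhood distances.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                 # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5     # shrinking neighbourhood radius
        x = data[rng.integers(len(data))]
        bmu = best_matching_unit(weights, x)
        d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights

rng = np.random.default_rng(42)
chemistry = rng.normal(size=(300, 6))   # stand-in for six PC scores per compound
som = train_som(chemistry, rows=10, cols=10)
drug_cell = best_matching_unit(som, chemistry[0])  # where one compound lands on the map
print(drug_cell)
```

Once the net is trained on the chemistry dataset, a second dataset is superimposed simply by looking up the best matching unit of each of its compounds.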

  • Physical Property Map: chemistry (train Kohonen net)

  • Physical Property Map: Drugs

  • Physical Property Map: steroid hormones

  • Surface-integral free energies
    • Critical for scoring functions, which otherwise use the force-field intermolecular energies
    • Provide an attractive alternative to descriptor-plus-interpolation QSPR models
    • Solvation, lattice energies?, vapour pressures, partition coefficients?, solubilities? ...

  • Surface-integral models

    P = target property

    Ai = area of triangle i

    ntri = number of triangles
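
The model equation itself is missing from the extracted slide. Given the symbols listed, a natural reading is a sum over the surface triangles of a local function of the properties multiplied by the triangle area, P = Σ f(properties of triangle i) · Ai for i = 1..ntri. The sketch below implements that reading; the function `f_linear`, its coefficients and the input numbers are hypothetical and only show the mechanics.

```python
import numpy as np

def surface_integral(areas, local_props, f):
    """P = sum over triangles of f(local properties of triangle i) * A_i."""
    areas = np.asarray(areas, dtype=float)
    contributions = np.array([f(p) for p in local_props])
    return float(np.sum(contributions * areas))

# Hypothetical local function: linear in the MEP only, with made-up coefficients.
def f_linear(props):
    return 0.01 * props["mep"] + 0.002

areas = [0.5, 0.25, 0.25]                                   # triangle areas A_i
props = [{"mep": 10.0}, {"mep": -20.0}, {"mep": 0.0}]       # one dict per triangle
P = surface_integral(areas, props, f_linear)
print(P)
```

In a fitted model the form and parameters of the local function would be determined against experimental values of the target property.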

  • Free energies & enthalpies of hydration, free energies of solvation for n-octanol & chloroform

  • Pattern matching on molecular surfaces
    • Can we recognise similar surfaces?
    • Can we recognise similar surface fragments?
    • Can we identify the most similar surface to our target?
    • How do we compare field descriptors on the molecular surface?

  • Surface comparison
    • Two different approaches:
      • Using spherical harmonic molecular surfaces [J. Comp. Chem. 20(4) 383-395; Ritchie and Kemp 1999; University of Aberdeen]
      • Partial molecular alignment via local structure analysis [J. Chem. Inf. Comput. Sci. 40(2) 503-512; Robinson, Lyne and Richards 2000; University of Oxford]

  • Voting pairs provide possible local alignments
    • Try all possible voting pairs to produce a large number of alignments
    • The choice of voting pairs can have a critical effect on the quality of the surface alignment

  • Example alignments (panels 1-4)

  • Pattern matching of surface properties: RMSD = 0.75 (panels A and B)

  • ParaSurf v1.0
    • Surfaces
      • Isodensity surfaces
      • Shrink wrap
      • Marching cube
      • Surfaces fit to spherical harmonics
    • Properties
      • MEP, LIE, LEA and LP
      • Encoded at points on the surface
      • Encoded as spherical harmonic expansions

  • GRID computing
    • ParaSurf compiled on
      • SGI IRIX
      • Windows
      • Linux (SUSE)
      • IBM AIX
    • Future platforms
      • SUN Solaris
    • GRID enabling at Portsmouth, Southampton and Oxford

  • Provisional timings
    • SGI R10k, 256 MB
      • VAMP ~ 30 s/compound
      • ParaSurf ~ 10 s/compound
    • Intel 1.8 Xeon / AMD Athlon XP-2000+
      • ParaSurf ~ 2 s/compound
    • SGI FUEL Workstation R14K
      • ParaSurf ~ 2 s/compound
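
To put these per-compound timings in context, the sketch below estimates the wall-clock time for a 50,000-compound database (the size quoted for the large datasets earlier in the talk). The `n_cpus` parameter assumes ideal scaling across processors, which is a simplification introduced here and not a measured result.

```python
def screening_hours(n_compounds, seconds_per_compound, n_cpus=1):
    """Estimated wall-clock hours, assuming perfect parallel scaling over n_cpus."""
    return n_compounds * seconds_per_compound / n_cpus / 3600.0

print(round(screening_hours(50_000, 2), 1))              # ~27.8 h on one CPU at 2 s/compound
print(round(screening_hours(50_000, 10), 1))             # ~138.9 h on one CPU at 10 s/compound
print(round(screening_hours(50_000, 2, n_cpus=16), 1))   # ~1.7 h if spread over 16 CPUs
```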

  • Summary
    • Compound screening
    • Spherical harmonic representation
    • Aberdeen

  • Conclusions
    • Properties can be calculated at the surface of molecules
    • These properties can be RGB encoded
    • The properties are local
    • Descriptor sets derived from these properties can be used for robust QSPR & QSAR models
    • The algorithms will soon be available commercially for use in virtual high throughput screening

  • ParaSurf in silico Screening Technology
    • Basic Technology funding for October 2003 to September 2004
      • Proof of concept studies
      • Consortia building, networking
    • Academic partners
      • University of Portsmouth
      • University of Erlangen
      • University of Southampton
      • University of Aberdeen
      • University of Oxford