Quantifying Degree of Aromaticity From Structural Features

David J. Ponting, Ruud van Deursen, Martin A. Ott

david.ponting@lhasalimited.org

EuroQSAR talk, 2018

Why model aromaticity?

• Not all aromatic systems are equal!
  • Furans can undergo Diels-Alder reactions [1,2]
• Direct effects on in vivo behaviour, e.g. toxicity [3,4]

1: LaPorte et al. (2013), J. Org. Chem., 78, 167-174
2: Jursic (1998), J. Mol. Struct. (TheoChem), 454, 105-116
3: Opinion on Dihydroxyindole, SCCP (2006)
4: Chichirau et al. (2005), Free Rad. Biol. Med., 38, 344-355

Quantifying Aromaticity

HOMED model [1]

• Delocalisation can be thought of as ‘smearing-out’ bonds [2]
• C-C: $R_d$ double (ethene) ≈ 1.3288 Å, $R_s$ single (ethane) ≈ 1.5300 Å, $R_a$ aromatic (benzene) ≈ 1.3943 Å
• Compare bond lengths ($R_i$) with ideal reference compounds [1]
• Calculate per-bond and sum for all in ring system
• Normalise against effect of ideal aromatic system ($R_s - R_a$ or $R_a - R_d$)
• $\mathrm{HOMED} = 1 - \frac{1}{n_d + n_s} \sum_i \alpha_i (R_a - R_i)^2$
• where $\alpha_i = \frac{n_d + n_s}{n_s (R_a - R_s)^2 + n_d (R_a - R_d)^2}$
• $\alpha$ is therefore constant for a given atom pair in a given ring type
• Reference compound selection is critical!
• Formally applies to all delocalised systems
  • Filter by Hückel’s rule to remove non-aromatics

1: Raczynska et al. (2010), Symmetry, 2, 1485-1509
2: Jursic (1998), J. Mol. Struct. (TheoChem), 454, 105-116
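The HOMED expression on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes a ring made only of C-C bonds (so a single α applies to every bond), takes the three reference lengths quoted above, and treats `n_single` and `n_double` as the bond counts of the localised (Kekulé) reference structure. The function names are hypothetical.

```python
# C-C reference bond lengths in angstroms, as quoted on the slide.
R_SINGLE = 1.5300    # ethane
R_DOUBLE = 1.3288    # ethene
R_AROMATIC = 1.3943  # benzene


def alpha(n_single, n_double, r_s=R_SINGLE, r_d=R_DOUBLE, r_a=R_AROMATIC):
    """Normalisation constant so a fully localised ring scores HOMED = 0."""
    denominator = n_single * (r_a - r_s) ** 2 + n_double * (r_a - r_d) ** 2
    return (n_single + n_double) / denominator


def homed(bond_lengths, n_single, n_double, r_a=R_AROMATIC):
    """HOMED index for a ring whose bonds all share one atom-pair type (assumption)."""
    a = alpha(n_single, n_double, r_a=r_a)
    n = n_single + n_double
    return 1.0 - sum(a * (r_a - r) ** 2 for r in bond_lengths) / n


# Ideal benzene: every bond at the aromatic reference length.
benzene = homed([R_AROMATIC] * 6, n_single=3, n_double=3)

# Hypothetical fully localised six-membered ring: alternating single and double bonds.
localised = homed([R_SINGLE, R_DOUBLE] * 3, n_single=3, n_double=3)

print(benzene, localised)
```

By construction the two limiting cases bracket the scale: the ideal aromatic ring gives 1 and the fully localised ring gives 0 (up to floating-point error). Rings containing heteroatoms would need a separate set of reference lengths and α per atom pair.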

Methods

• Reference compounds selected for 29 pairs of atoms
  • Extension of published HOMED method [1]
• References modelled at B3LYP/6-31G*
  • Not highest accuracy but same theory as dataset
• Large dataset of compounds derived from PubChemQC [2]
  • ~4 million structures, as many aromatic rings

• Calculate aromaticity for all of these ring systems

1: Raczynska et al (2010), Symmetry, 2, 1485-1509

2: Nakata and Shimazaki (2017), J. Chem. Inf. Model., 57, 1300-1308

Furan and analogues

• Furan is significantly less aromatic than thiophene and pyrrole
• Several distinct sub-classes
  • Electron-donating groups at the 2-position reduce aromaticity
  • Electron-withdrawing groups at the 2-position increase aromaticity
  • The 3-position has less effect
  • 2,3-fused systems are typically less aromatic

Furan and analogues

• Electron-withdrawing groups encourage delocalisation

• Electron-donating groups reduce delocalisation

• Fusions typically reduce aromaticity

2-Pyridone and analogues

• 2-Pyridone more aromatic than both uracil and isocyanuric acid
  • For the latter two, any substitution pattern makes them less aromatic
• Electron-withdrawing groups on nitrogens reduce aromaticity; donating groups increase it
• 4- and 6-position have effects analogous to 2-position in furans

2-Pyridone and analogues

• Electron-donating groups reduce delocalisation
• N-substitution can reduce aromaticity (EWG) or increase it (EDG)

Predicting Aromaticity

• Measuring aromaticity is useful, but requires an accurate geometry
  • X-ray crystal structure
  • Expensive QM calculation

• Can we machine-learn the HOMED index, and predict from a SMILES string?

Methods

• Reference compounds selected for 29 pairs of atoms
  • Extension of published HOMED method [1]
• References modelled at B3LYP/6-31G*
  • Not highest accuracy but same theory as dataset
• Large dataset of compounds derived from PubChemQC [2]
  • ~4 million structures, as many aromatic rings
• Fragment molecules to isolate individual rings
  • Keep conjugated substituents
• Fingerprint the structures and group by ring systems
  • Machine learn values for each structure group

1: Raczynska et al (2010), Symmetry, 2, 1485-1509

2: Nakata and Shimazaki (2017), J. Chem. Inf. Model., 57, 1300-1308

Fragmentation

• Assume sp3 carbons interrupt conjugation
  • Remove acyclic bonds between them from molecule

• Then remove all acyclic C-C bonds with sp3 at one end

• Also remove halogens to allow grouping of substructures

• Calculate HOMED indices on disconnected fragments
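The fragmentation rules above can be sketched on a toy molecular graph. This is an illustrative approximation rather than the authors' implementation: the atom and bond representation is hypothetical, hybridisation and ring membership are assumed to be supplied by the caller, and the two bond-removal steps are merged into a single pass (any acyclic C-C bond with an sp3 carbon at either end is cut).

```python
HALOGENS = {"F", "Cl", "Br", "I"}


def fragment(atoms, bonds):
    """Split a molecule into conjugated fragments.

    atoms: {index: (element, is_sp3)}
    bonds: [(i, j, in_ring)]
    Returns a list of sets of atom indices, ordered by smallest index.
    """
    # Drop halogens so halogenated analogues fall into the same group.
    kept = {i for i, (element, _) in atoms.items() if element not in HALOGENS}
    adjacency = {i: set() for i in kept}
    for i, j, in_ring in bonds:
        if i not in kept or j not in kept:
            continue
        (el_i, sp3_i), (el_j, sp3_j) = atoms[i], atoms[j]
        is_cc = el_i == "C" and el_j == "C"
        # Cut acyclic C-C bonds that have an sp3 carbon at either end.
        if is_cc and not in_ring and (sp3_i or sp3_j):
            continue
        adjacency[i].add(j)
        adjacency[j].add(i)

    # Collect connected components with a depth-first search.
    fragments, seen = [], set()
    for start in sorted(kept):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        fragments.append(component)
    return fragments


# Toy example: a six-membered aromatic carbon ring (atoms 0-5) carrying
# a methyl group (atom 6, sp3) and a chlorine (atom 7).
atoms = {i: ("C", False) for i in range(6)}
atoms[6] = ("C", True)
atoms[7] = ("Cl", False)
bonds = [(i, (i + 1) % 6, True) for i in range(6)] + [(0, 6, False), (3, 7, False)]

fragments = fragment(atoms, bonds)
print(fragments)
```

In the toy example the ring survives as one fragment, the methyl carbon is split off, and the chlorine disappears. A real pipeline would take hybridisation and ring flags from a cheminformatics toolkit and would probably discard fragments that contain no ring.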

[Figure: example fragmentation. Fragments annotated with HOMED values 0.998 and 0.990; one fragment labelled "Not Hückel compliant".]

Fingerprinting

• Structures with the same conjugated system grouped
  • Combinations of HOMED index taken
• Only those with >3 instances kept
• 15065 substituent patterns from 954 different rings
• Variety of fingerprints and hashed sizes tried
  • Lengths from 256 to 16384
• Morgan Circular (ECFP2/4, both bits and counts)
• [Extended] Sybyl Atom-Pair
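The difference between a bit fingerprint and a count fingerprint of a given hashed length can be sketched as follows. This is an assumption-laden illustration: the feature strings are hypothetical stand-ins for atom environments, `zlib.crc32` is used only as a convenient stable hash, and "hologram" is taken to mean the count version. It does not reproduce ECFP or Sybyl atom-pair generation.

```python
import zlib


def hashed_counts(features, size=1024):
    """Fold feature identifiers into a fixed-length vector of counts."""
    vector = [0] * size
    for feature in features:
        slot = zlib.crc32(feature.encode("utf-8")) % size
        vector[slot] += 1
    return vector


def hashed_bits(features, size=1024):
    """Same folding, but only record whether each slot was hit."""
    return [1 if count else 0 for count in hashed_counts(features, size)]


# Hypothetical environment labels for one ring system; "c:c" occurs twice.
features = ["c:c", "c:c", "c:o", "c-N"]

counts = hashed_counts(features, size=256)
bits = hashed_bits(features, size=256)
print(sum(counts), sum(bits))
```

The count vector keeps the multiplicity of repeated features, while the bit vector discards it. Smaller sizes increase the chance that unrelated features collide in one slot, which is one reason to scan lengths as described above.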

[Figure: Performance effect of fingerprint size. Two panels plotting value against hologram hash size (256 to 16384): gradient and R^2 (axis range 0.85 to 1), and intercept and RMSE (axis range 0.075 to 0.1).]

Learning

• Machine learning was performed in Python
  • ‘First try’ random forest achieved R²=0.98, RMSE=0.08 as out-of-bag estimates
  • This is due to a cluster of highly aromatic species being well predicted; there was much more scatter elsewhere
• Some aromatic systems can have outliers
  • Often due to ring strain
  • e.g. one system with 3 instances at 0.954 and 1 at 0.044: median 0.954, mean 0.727
• Learned against median, not mean
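The arithmetic behind the outlier example can be checked with the standard library. The four values are the ones quoted on the slide; everything else is a minimal sketch.

```python
from statistics import mean, median

# HOMED values for one ring system: three typical instances and one strained outlier.
homed_values = [0.954, 0.954, 0.954, 0.044]

group_median = median(homed_values)  # unaffected by the single outlier
group_mean = mean(homed_values)      # pulled down to roughly 0.73

print(group_median, group_mean)
```

The single low value drags the mean far below what three of the four instances show, while the median stays at 0.954, which is the motivation for learning against the median.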

Data transformation

• The HOMED indices are clustered around 0.9-1, with relatively few less-aromatic systems
  • Causes problems for some learning algorithms
• Calculated index was transformed using logistic functions
• Models built on this data, then predictions transformed back
  • i.e. $y_{pred} = \frac{1}{1 + e^{-y_{model}}}$
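A minimal sketch of this transformation pair, assuming the forward step is the logit (the inverse of the back-transform shown on the slide). The clipping margin `EPS` is an assumption added here so that indices at or beyond 0 and 1 stay finite; the slides do not say how such values were handled. Function names are hypothetical.

```python
import math

EPS = 1e-6  # assumed clipping margin, not taken from the slides


def to_model_space(homed, eps=EPS):
    """Logit transform: stretches the crowded region near 1."""
    clipped = min(max(homed, eps), 1.0 - eps)
    return math.log(clipped / (1.0 - clipped))


def to_homed_space(y_model):
    """Back-transform from the slide: y_pred = 1 / (1 + exp(-y_model))."""
    return 1.0 / (1.0 + math.exp(-y_model))


for value in (0.5, 0.9, 0.99):
    transformed = to_model_space(value)
    print(value, transformed, to_homed_space(transformed))
```

In the transformed space the gap between 0.9 and 0.99 is more than two units wide, compared with 0.09 on the original scale, so a model has more room to separate highly aromatic systems from each other.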

Choosing a fingerprint and metric

• Python allowed rapid model development, optimising:
  • Model choice
  • Model hyperparameters
  • Choice of fingerprint
• 10-fold cross-validation within model selection
• ECFP2 Hologram gave best results
  • Better than ECFP2 bitset, ECFP4 or atom-pair fingerprints
• Very high R² for all decent models
  • Makes it a poor metric
• RMSE affected by few outliers
• Median Absolute Deviation (MAD)

[Figure: scatter plot of predicted against experimental values.]
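The three metrics discussed on this slide can be written out with the standard library. This is a sketch under one stated assumption: MAD is taken to mean the median of the absolute prediction errors, which the slides do not define explicitly. The data are invented to show the effect of one outlier and are not results from the talk.

```python
import math
from statistics import mean, median


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    centre = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - centre) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error; squaring lets a few large errors dominate."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def median_abs_error(observed, predicted):
    """Median of absolute errors (assumed meaning of MAD here)."""
    return median(abs(o - p) for o, p in zip(observed, predicted))


# Invented data: four perfect predictions and one badly missed outlier.
observed = [0.95, 0.96, 0.97, 0.98, 0.10]
predicted = [0.95, 0.96, 0.97, 0.98, 0.60]

print(r_squared(observed, predicted))
print(rmse(observed, predicted))
print(median_abs_error(observed, predicted))
```

With one large miss among otherwise exact predictions, the RMSE is above 0.2 while the median absolute error is zero, which illustrates why a median-based metric is less sensitive to a handful of outliers.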

Choosing a model

• NN and RF give good results

• kNN gives middling performance

• Kernel methods poor

Fingerprint        Model   R²       RMSE      MAD
ECFP2 Hologram     RF      0.9934   0.07485   0.004181
ECFP2 Hologram     NN      0.9941   0.07396   0.004485
ECFP2 Hologram     SVR     0.8315   0.1538    0.009593
ECFP2 Hologram     KNN     0.8499   0.09749   0.006940
ECFP2 Bitset       RF      0.9928   0.07580   0.004346
ECFP4 Hologram     RF      0.9932   0.07668   0.004206
ESybyl Atom Pair   RF      0.9914   0.08200   0.004295

Neural Network or Random Forest?

• Very similar performance
• Random Forest (L) slightly tighter well-predicted band
• Neural Network (R) has fewer extreme outliers

[Figure: scatter plots of predicted against experimental values for Random Forest (left) and Neural Network (right).]

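Kernel methods perform poorly

[Figure: two scatter plots of predicted against experimental values for kernel methods.]

• Nearest Neighbours better but not spectacular

[Figure: scatter plot of predicted against experimental values for nearest neighbours.]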

Conclusions

• The HOMED index allows quantification of aromaticity given an accurate geometry
• The HOMED index provides a numeric response variable for machine learning
• This machine learning method allows us to bypass the need for an expensive geometry and predict degree of aromaticity from a SMILES string
• The predicted degree of aromaticity can be used to better predict chemical behaviour

Lhasa Limited

Granary Wharf House, 2 Canal Wharf

Leeds, LS11 5PS

Registered Charity (290866)

Company Registration Number 01765239

+44(0)113 394 6020

[email protected]

www.lhasalimited.org

Questions?

PCCDB

PubChemQC

Work in progress disclaimer

This document is intended to outline our general product direction and is for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon. The development, release, and timing of any features or functionality described for Lhasa Limited’s products remain at the sole discretion of Lhasa Limited.