Upload
christian-herhaus
View
136
Download
0
Tags:
Embed Size (px)
Citation preview
Improving Cluster Representations by "Fuzzifying" Maximum Common Substructures
IntroductionArranging similar structures in clusters is one of the typical tasks of modern
Chemoinformatics with high impact in HTS follow-up, generation of structure activity
relationships (SAR) and selection of starting points for compound optimization.
Methods for cluster generation are as diverse as the structures which they are applied to
[1], may they be e.g. similarity- or substructure-based. Typically, medicinal chemists tend to orientate themselves in structure subsets like clusters with the help of substructures,
so-called "scaffolds", which intuitively characterize the structural relationships between
the molecules of the subset.
In the case of substructure-based clustering, well established methods are existing for
generation of Maximum Common Substructures (MCS) which are present in all members
of the structure population or a defined proportion thereof [2]. But in the case of similarity-
based clusters, such MCS may either not be existing for the required dataset proportion
or the common substructure may be so small that it is no longer representative and
therefore lacking information.
Aim of this approach is to provide descriptive scaffold structures also for clusters whose
intrinsic structural diversity is too high to be characterized by conventional Maximum
Common Substructure approaches.
Christian HerhausMerck KGaA, Computational Chemistry, Frankfurter Str. 250, 64293 Darmstadt, [email protected]
ResultsThe approach presented here allows the generation of MCS also for similarity-based
clusters with a given inherent structural diversity. It does so by utilizing query features like
atoms lists and query bonds to add "fuzziness" in atom and bond information to the MCS.
As a result, the generated "fuzzy" MCS, although still being fully database-searchable, is
more meaningful for the characterisation of clusters as it can cover larger parts of the full structures than a conventional MCS could do. The approach was implemented in Pipeline
Pilot™ for proof of concept but is general enough to be transferred to other technical
platforms as well.
Conventional
MCS
"Fuzzy"
MCSMethodsContent of the Pipeline Pilot™ component "Generate Fuzzy MCS" (slightly simplified)
1. Reduced graphs 2. Generalized MCS
any any
any
anyany
any
anyany
any anyany any
any
anyany
any any
any
anyany
any
any any
any any
any any
any any
any
6. Query bonds coded as stereo bonds
7. Final "fuzzy" MCS
any
s/ds/d
s/d
any
s/ds/d
s/d
[1] Downs GM, Barnard JM: Clustering Methods and Their Uses in Computational Chemistry. In Reviews in Computational Chemistry (18). Chichester: Wiley and Sons; 2003, 1-40.[2] Ehrlich HC, Rarey M: Maximum common subgraph isomorphism algorithms and their applications in molecular science: a review. WIREs Comput Mol Sci 2011, 1:68-79.
Where to get / How to contributeThe current version of the component is published on the Accelrys® Pipeline
Pilot™ user forum and can be downloaded from
https://community.accelrys.com/message/16528.
Contributors are appreciated for improving the approach and for transfering the concept to other technology platforms like RDKit, CDK or KNIME.
Current limitations• Aromaticity may cause unexpected effects (e.g. single/double vs. aromatic bonds)
• Symmetric substructures may blur atom/bond information (e.g. carboxyl or nitro groups)
• Currently, just one presence of an atom/bond feature leads to inclusion into a query feature. For larger clusters, a percentual threshold may be required
• Different ring sizes and chain lengths can so far not be described by query patterns.
5. Collected atom/bond information transfered into queries
Atom No Elements
1 O,O,S
7 N,N,O
11 C,C,C
Bond No Types
3 1,1,1
9 1,2,1
14 2,1,4
Atom No Element
1 [O,S]
7 [N,O]
11 C
Bond No Type
3 single
9 single/double
14 aromatic
3. Generalized MCS mapped onto structures
1
9
6
1117
20
7
16
13
4. Structures reduced & mappings normalized
1
1
1
7
7
7
11
11
11