
SPREAD—Exploiting Chemical Features That Cause Differential Activity Behavior

Josef Scheiber1,∗,†, Jeremy L. Jenkins1, Andreas Bender1,‡, Mariusz Milik1, Dmitri Mikhailov1, Sai Chetan K. Sukuru1, Ben Cornett1, Steven Whitebread2, Laszlo Urban2, John W. Davies1 and Meir Glick1

1 Novartis Institutes for Biomedical Research, Lead Discovery Informatics, 250 Massachusetts Avenue, Cambridge, MA

2 Novartis Institutes for Biomedical Research, Preclinical Safety Profiling, 250 Massachusetts Avenue, Cambridge, MA

Received 05 September 2008; revised 24 February 2009; accepted 11 March 2009
DOI: 10.1002/sam.10036

Published online 8 July 2009 in Wiley InterScience (www.interscience.wiley.com).

Abstract: We present a novel generic method to better understand the divergent activities of molecules that often occur in orthogonal assays. The newly developed simple prediction of activity differences (SPREAD) method directly aims to model and understand the differences compounds exhibit when tested in two or more assays. By transforming the activity values from the assays into meta-categories (specifically defined for datasets under scrutiny), statistical models can be trained directly on the qualitative differences between assays. This contributes heavily toward a tangible understanding of molecular assay selectivity. Although ensembles of models could be used alternatively to predict compounds that score highly in one assay and low in another, the advantage of the SPREAD approach is that the chemical features influencing assay differences are parsed out immediately as a consequence of training the model on the coincident assay differences. By training the model that describes the difference between two or more assays, molecular substructures that are responsible for assay selectivity can be parsed out. The method was validated by using four challenging datasets. © 2009 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 2: 115–122, 2009

Keywords: selectivity; prediction; activity differences; Bayes; meta-category; statistical modeling

1. INTRODUCTION

Compounds encounter a multitude of assays in varying assay formats on their way through the drug discovery pipeline. Understanding the reasons for differential compound activity in two or more bioassays for the same or different targets is a challenge commonly faced. For example, a mixture of biochemical and cellular assays is often employed orthogonally to evaluate compound potency, or multiple target orthologs, species, or organisms are used during drug testing. Unfortunately, multiple assays may also yield unexpectedly divergent results for a compound, which leads to the challenging task of explaining the differential behavior of compounds, such as the case where compounds display potent in vitro activity but weak in

∗Correspondence to: Josef Scheiber ([email protected])
†Novartis Pharma AG, NITAS/Text Mining Services, Forum 1, 4002 Basel, Switzerland
‡Leiden/Amsterdam Center for Drug Research, Division of Medicinal Chemistry, Leiden University, Einsteinweg 55, 2333 CC Leiden, The Netherlands

vivo activity. In another example of differential behavior, compounds may be found in the course of drug discovery to modulate multiple targets, especially targets with high genetic or structural similarity to the primary target. Adverse side effects resulting from unexpected off-target "polypharmacology" of drugs are a major cause of death and other serious outcomes [1–3]. Consequently, a fundamental paradigm in drug discovery is to aim for compounds that are very selective toward their primary target to avoid potential issues with unsafe off-targets [4–5].

From a chemistry perspective, one might consider differences in orthogonal assay readouts for a compound to be a direct result of particular chemical features that are sensitive to differences in assay conditions. It is well understood that changing small chemical features can strongly influence protein selectivity, cell permeability, or metabolic characteristics. From a computational standpoint, modeling selectivity is most often addressed by building separate predictive models for every single assay or target to guide optimization of a compound. In some cases, ensembles of models have been used [6–13]. However, optimizing for



a desired activity in one assay may simultaneously optimize for undesired activity in another assay. Because it is difficult to simultaneously optimize multiple properties, this approach usually creates an inefficient process of many iterative steps. Consequently, building and interpreting models for each series of values with the aim to understand what causes assay differences is both time-consuming and prone to error.

In this paper, we present a novel method to elucidate the chemical features that are correlated empirically with the differences between two bioassay readouts. More explicitly, this approach does not aim to improve one particular activity, but rather to achieve the understanding and design of an activity profile desired for a compound (e.g. active against one target and inactive against another). Because the approach learns the coincident differences between assays rather than creating and optimizing separate assay models, the desired activities in multiple assays can be optimized concurrently. To the best of our knowledge, this is the first approach that initially processes activity values from multiple assays (independent of the readout) to obtain a "meta" value for the modeling procedure. The objective of the simple, but very efficient Simple PREdiction of Activity Differences (SPREAD) method is to identify the chemical properties or substructures that most likely cause the differences a compound exhibits between two tests (an extension to more parallel values is also possible and described for the fourth dataset). However, an intuitive question to ask is why not build an individual model for each of the parallel assays and assess the activity difference by combining the results of the individual models? SPREAD is superior to that approach in that it delivers equally valid results in far less time: only one model needs to be validated instead of several, and it is not necessary to analyze feature distributions across different models, which is very difficult with large datasets. With SPREAD, the features can be extracted in a single step and included directly in the analysis.

This result can then be used to optimize molecules toward a desired property, e.g. permeability or selectivity between two targets. We do not present a novel molecular description (any kind of molecular descriptor can be used), but rather a way of dealing with multiple activity values. By categorizing or binning the differences in assay values (e.g. IC50 values) and using these categories—this is the crucial novelty of our SPREAD approach—as target values (i.e. dependent variables) in the model computation, we aim to directly explain the assay differences from a chemical perspective. Thus, it becomes possible to prospectively predict and rank compounds that will exhibit desired differences when tested. Also, one can identify chemical properties or substructures that cause the difference and generate ideas on how to improve the molecules toward a desired profile. The

method can even be employed to simultaneously increase the selectivity of compounds toward a particular target as well as optimize for cellular activity, all in one step.

The workflow is as follows (see Fig. 1 scheme): A set of compounds that has been tested in different assays is chosen. Both compounds showing assay differences as well as compounds having no differences are included in the analysis. In the first step of SPREAD, each compound is assigned to a meta-category, which is a derived value based on the real experimental values or categories (high, medium, low

Fig. 1 The standard workflow for the SPREAD approach. First, the differences in experimental values for compounds across assays are determined (specific adjustments to meta-categorization can be applied). Then, the differences are binned into "meta-categories" on which predictive models are built, which explain assay differences.


Fig. 2 (a) The meta-category scheme upon which models are trained to predict divergence in assay results. Assay 1 refers to biochemical assays, whereas Assay 2 denotes cellular assays (as applied for the hERG and HDAC datasets). Example features that were identified as important for the differences between the in vitro and cellular kinase assay datasets in aggregate are shown (low/medium/high describes the classification of the compound activity before merging this into meta-categories). (b) The categorization concept used for the opioid receptor dataset. A third assay adds a third dimension to the possible classes. Features can be determined for those classes by training a SPREAD model.

activity) in the assays under investigation, which can be determined using different methods; the most straightforward ones are shown with the datasets used in this paper. The meta-categories cover all combinatorial possibilities for individual assay activity categories. For example, in a two-assay scenario, there are five distinct bins that reflect the differences that can occur for compounds tested in both assays (Fig. 2, see color-coding scheme, analogs for three categories). Each meta-category describes how big the difference between the two assay values is: no difference (0, yellow), Assay 1 higher (1, blue), Assay 1 much higher (2, light blue), Assay 2 higher (−1, green), Assay 2 much higher (−2, forest green). Once compounds are binned into them, the distinct meta-category numbers are then used as the dependent variable on which statistical models are trained. This step then elucidates the chemical features that are linked to the difference in activities between two or more assays. This concept can easily be extended toward more assays by using more categories, which is shown for the fourth dataset.

In order to demonstrate the value of models generated using SPREAD, we show its results for three challenging in-house datasets and one large dataset from the literature, taken from the GVK MediChem database. These datasets represent real-life data that a cheminformatician has to work with in pharmaceutical companies. Although the data are sometimes skewed toward certain categories, we show how SPREAD can be used to model such data. The workflow of meta-category determination for each dataset is described in the Methods section. The three in-house datasets all entail different assay formats for the same target, where only one of the assay formats is strictly biochemically based. (i) The first dataset contained compounds tested in both a biochemical and a cell-based assay measuring activity against the

same kinase target, for 30 kinase targets in total. (ii) The second dataset consisted of compounds tested in two different hERG channel assays—a biochemical and a patch-clamp assay. (iii) The third dataset contained compounds tested for histone deacetylase (HDAC) inhibition in both in vitro and cellular assays. (iv) Finally, the fourth dataset contained compounds tested against all of the kappa, delta, and mu opioid receptors (OPRK1, OPRD1, and OPRM1). In all test cases, we find that SPREAD performs remarkably well and produces very reliable and—if proper descriptors are used—easily interpretable results that can guide the design of novel molecules exhibiting desired properties.

2. METHODS

For initial method validation, four different datasets were used; an overview of the datasets is shown in Table 1. The first set of compounds was taken from our in-house kinase inhibitor database. The compounds in this database either had validated pIC50 values or were inactive. All compounds that were tested against the same target at least twice (which means that for many compounds there are repeated measurements), both in cell-based and biochemical assays, were extracted. In total, this left us with a dataset of 1335 compounds active against 30 kinases. For every compound, the standard deviation y of the pIC50 values in relation to the mean value of the pIC50s for the particular compound was computed using the number of different pIC50 values per compound. This step had to be applied to stratify the dataset toward the difference in the number of tests a compound went through. A high standard deviation means that there is a big difference between the biochemical and cell-based assay, whereas a low deviation indicates an almost perfect overlap of the values.


Table 1. Dataset composition. This table compares the different datasets and the way meta-categories were generated for them.

Name              Responses compared            Number of compounds  Number of cases    Meta-categorization
Kinase            Biochemical vs. cell-based,   1335                 2 over 30 kinases  Standard deviation binning
                  repeated measurements
hERG              Biochemical vs. patch-clamp   820                  2                  Category difference
HDAC              Biochemical vs. cell-based    472                  2                  Category difference
Opioid receptors  OPRK1 vs. OPRD1 vs. OPRM1     2054                 3                  Category difference

This left us with one value, y, for every compound. In the next step, these values were categorized. The following categories were computed: if y was greater than 3 (huge difference between activities), meta-category −2 was assigned; for values between 2 and 3, meta-category −1; for the range 1–2 in y, category 0; for 0.5–1, category 1; and finally meta-category 2 for all y values below 0.5. This categorization is mimicked by the one applied for the other datasets, although no repeated measurements were available there and therefore only a step similar to the second one was necessary. The molecules were encoded using ECFP4 descriptors as implemented in Pipeline Pilot 6.1 from SciTegic [14]. These radial fingerprint descriptors are well established and are therefore our choice for describing our novel method. Furthermore, for all compounds the AlogP and polar surface area were determined in Pipeline Pilot. A Laplacian-modified naïve Bayesian classifier (the standard component in Pipeline Pilot) was used to compute two multi-category models [2], based on the ECFP descriptors on the one hand and on the bulk parameters (AlogP, PSA) on the other hand. All multi-category models were internally validated according to a leave-one-out cross-validation procedure carried out by the Pipeline Pilot component during run time (SciTegic, Inc.).
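The binning just described can be sketched as follows. This is a minimal illustration under our own assumptions: the text does not fully specify how y is normalized against the mean and the number of measurements, so the plain sample standard deviation is used here, and the function name is ours.

```python
from statistics import stdev

def kinase_meta_category(pic50s):
    """Bin one compound's repeated pIC50 measurements (biochemical and
    cell-based, against the same kinase) into a SPREAD meta-category.
    Assumes y is the plain sample standard deviation (an illustrative
    simplification of the normalization described in the text)."""
    # A large y means the biochemical and cell-based results diverge
    # strongly; a small y means they almost perfectly overlap.
    y = stdev(pic50s)
    if y > 3:
        return -2   # huge difference between activities
    elif y > 2:
        return -1
    elif y > 1:
        return 0
    elif y > 0.5:
        return 1
    else:
        return 2    # values nearly identical across assays
```

The resulting integers −2 … 2 then serve directly as the dependent variable for the multi-category classifier.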

Next, two external (from kinase projects) validation sets [127 (number of compounds/meta-category: 35/−2, 17/−1, 29/0, 32/1, 14/2) and 66 (compounds/meta-category: 37/−2, 6/−1, 14/0, 9/1, 0/2) compounds inhibiting two different kinases] were encoded using the same procedure, and then their difference values were predicted from the model generated in the first step.

The second dataset consists of compounds tested against the hERG channel both in a biochemical and an automated patch-clamp assay. This dataset comprised 820 compounds in total. Here we used a slightly different meta-category layout, because we have exactly two activity values for every compound, which enables us to categorize the difference in either direction as shown in Table 2. First, the values in both assays were categorized. IC50s (the category borders are commonly used to define activity vs. inactivity) below 1 µM were assigned category 1, between 1 and 30 µM (for the biochemical assay)/10 µM (for the automated

Table 2. The statistical quality of the generated models.

Model   Descriptor  N    Meta-category  ROC   TPR   FPR
Kinase  ECFP 4      640   2             0.83  0.72  0.21
                    305   1             0.75  0.80  0.42
                    351   0             0.86  0.84  0.26
                     22  −1             0.77  0.82  0.32
                     17  −2             0.99  1.00  0.01
        logP/PSA    640   2             0.58  0.67  0.51
                    305   1             0.61  0.35  0.16
                    351   0             0.54  0.65  0.52
                     22  −1             0.62  0.68  0.26
                     17  −2             0.90  1.00  0.10
hERG    ECFP 4       17   2             0.85  0.76  0.17
                    177   1             0.70  0.55  0.20
                    481   0             0.64  0.50  0.28
                    130  −1             0.72  0.81  0.44
                     15  −2             0.89  1.00  0.30
HDAC    ECFP 4        0   2             —     —     —
                    148   1             0.77  1.00  0.38
                     95   0             0.89  0.83  0.17
                    112  −1             0.79  0.75  0.31
                    117  −2             0.89  0.88  0.20
        logP/PSA      0   2             —     —     —
                    148   1             0.77  1.00  0.24
                     95   0             0.66  0.56  0.29
                    112  −1             0.55  0.23  0.09
                    117  −2             0.75  0.93  0.48

ROC = Receiver Operating Characteristic, TPR = true positive rate, FPR = false positive rate.

patch-clamp) category 2, and above that, category 3. Then, for all compounds, the differences between the categories were computed and the final y value was obtained using the following equation (the term Assay here stands for the category):

y = (Assay1) − (Assay2)

In total this generated five different categories that reflectthe differences between the assays. This process is depictedin Fig. 2.
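The categorization and differencing steps above can be sketched as follows (a minimal sketch; the function names are ours, and the thresholds follow the text: category 1 below 1 µM, category 2 up to 30 µM for the biochemical assay or 10 µM for the patch-clamp assay, category 3 above that):

```python
def herg_category(ic50_um, assay):
    """Categorize an hERG IC50 (in µM) per the borders given in the text.
    The upper border of category 2 depends on the assay format."""
    upper = 30.0 if assay == "biochemical" else 10.0
    if ic50_um < 1.0:
        return 1
    elif ic50_um <= upper:
        return 2
    else:
        return 3

def herg_meta_category(ic50_biochemical, ic50_patch_clamp):
    """y = (Assay 1 category) - (Assay 2 category), yielding one of the
    five meta-categories -2, -1, 0, 1, 2."""
    return (herg_category(ic50_biochemical, "biochemical")
            - herg_category(ic50_patch_clamp, "patch-clamp"))
```

A compound potent in the biochemical assay but inactive in patch-clamp thus lands in meta-category −2, while identical readouts give 0.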

Again, the molecules were encoded with ECFP 4 descriptors and a multi-category Bayes model was established. The same procedure was used to model the third dataset, which


consisted of 472 HDAC inhibitors that were tested both in a biochemical and a cellular assay.

For the fourth dataset, we extracted all compounds from the GVK Bio MediChem database that reported IC50 values for three opioid receptors (OPRK1, OPRD1, and OPRM1). This reflects the biggest subset of compounds in GVK that have been tested against three different receptors. In total, we ended up with 2054 molecules that had been tested against all these receptors. For the purpose of this study, agonists and antagonists were not distinguished. Next, we categorized the activities within every receptor into three categories (1: inactive, i.e. IC50 > 10 µM; 2: moderately active, i.e. IC50 < 10 µM and > 1 µM; and 3: active, i.e. IC50 < 1 µM). Therefore, each molecule is a member of three classes. These classes were then used to build the meta-categories by combining all available target categories. Each compound is a member of exactly one meta-category. For example, if a compound is very active against OPRM1 and inactive against OPRK1 and OPRD1, it gets the meta-category K1 D1 M3; if it is very active against D and M but inactive against K, then the meta-category is K1 D3 M3. As the SPREAD approach aims to understand differences between assays, the meta-category for one compound is adjusted to K0 D0 M0 when it is equally active in all assays. All compounds were assigned these categories, and multi-category models were trained based on ECFP 4 descriptors and using the bulk parameters.
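The three-receptor labeling scheme can be sketched like this (function names are ours; thresholds and the K0 D0 M0 adjustment follow the text):

```python
def opioid_category(ic50_um):
    """1: inactive (IC50 > 10 µM), 2: moderately active (1-10 µM),
    3: very active (IC50 < 1 µM)."""
    if ic50_um < 1.0:
        return 3
    elif ic50_um < 10.0:
        return 2
    else:
        return 1

def opioid_meta_category(ic50_k, ic50_d, ic50_m):
    """Combine the per-receptor categories into one SPREAD meta-category
    label for the kappa/delta/mu opioid receptor triple."""
    k, d, m = (opioid_category(x) for x in (ic50_k, ic50_d, ic50_m))
    # Compounds equally active against all three receptors carry no
    # selectivity information and are collapsed into a single class.
    if k == d == m:
        return "K0 D0 M0"
    return f"K{k} D{d} M{m}"
```

Each compound thus falls into exactly one meta-category, on which the multi-category model is trained.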

To further describe how this can be used in a prospective manner, we extracted for every category of the trained models the 50 most relevant good and bad features, i.e. the chemical features that have the highest statistical weights for activity or inactivity. "Good" features are those that contribute to being a member of a class, while "bad" features contribute to not being a member of a class. These features were extracted using the Pipeline Pilot component "View Top N Good and Bad Features for MCM Categories", using the training set as the dataset onto which selected ECFP fingerprints were mapped. This Pipeline Pilot component is a way of displaying fingerprints that have the highest or lowest correlation to the activity that was trained within the model-building process. As an outcome, one gets a list of substructures with assigned scores. Thereby we obtained the fragment SMILES for the different ECFP fingerprints. These fragment SMILES were merged by structure (in Pipeline Pilot) to find those that were common to different categories. Afterwards, the features were clustered in Spotfire using hierarchical clustering with an unweighted average clustering method and default Spotfire parameters.
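The text does not spell out how Pipeline Pilot computes these feature weights internally; a common form of the Laplacian-corrected naïve Bayes score for a fingerprint feature, sketched here under that assumption (the function name is ours), gives positive weights to "good" features and negative weights to "bad" ones:

```python
import math

def laplacian_feature_score(n_feat_in_class, n_feat_total, n_class, n_total):
    """Laplacian-corrected naive Bayes weight of one fingerprint feature
    for one (meta-)category: log of the feature's corrected in-class
    rate relative to the baseline rate of the category. Positive scores
    mark features enriched in the category ('good'), negative scores
    mark depleted ones ('bad')."""
    p_class = n_class / n_total          # baseline rate of the category
    # The +1 terms shrink the estimate for rarely seen features toward
    # the baseline, so noise features score near zero.
    return math.log((n_feat_in_class + 1) / (n_feat_total * p_class + 1))
```

Ranking all features by this score and keeping the top and bottom 50 per category reproduces the kind of good/bad feature lists described above.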

To assess the model quality, we determined the Receiver Operating Characteristic (ROC) [15] score, which is a commonly used measure of model performance for binary classification [16]. It is defined by sensitivity along the Y axis

and by 1 − specificity along the X axis. An ROC score of 1 indicates perfect classification (no false positives or false negatives), while an ROC score of 0.5 indicates a random model. Also, the true-positive rate (TPR) and false-positive rate (FPR) are given; the first is better the closer it is to 1, the latter is better the closer it is to 0, and both are measured on a scale from 0 to 1. These rates were computed as follows: TPR = #(True Positives)/[#(True Positives) + #(False Negatives)].

FPR = #(False Positives)/[#(False Positives) + #(True Negatives)]. All the determined values are shown in Table 2. For all these steps, the respective Pipeline Pilot components with their standard settings as implemented by SciTegic/Accelrys have been used.
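The two rate definitions above translate directly into code (function name ours):

```python
def classification_rates(tp, fp, tn, fn):
    """TPR and FPR exactly as defined in the text, from the counts of
    true/false positives and true/false negatives."""
    tpr = tp / (tp + fn)   # fraction of actual positives recovered
    fpr = fp / (fp + tn)   # fraction of actual negatives misclassified
    return tpr, fpr
```

For example, 8 true positives, 2 false negatives, 2 false positives, and 18 true negatives give TPR = 0.8 and FPR = 0.1.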

3. RESULTS AND DISCUSSION

3.1. Kinases: Biochemical versus Cellular

In the present case, we are seeking to avoid compounds that are good inhibitors in vitro but fail to achieve potent cellular activity. By using SPREAD with the interpretable ECFP4 fingerprints (SciTegic), we were able to identify chemical features in our training set that are associated simultaneously with poor cellular activity and strong in vitro potency (Fig. 2, bottom left, meta-category 2, where Assay 1 is biochemical and Assay 2 is cellular). Importantly, the features are not specific to any one kinase target because they were derived from a combined dataset of 30 kinase targets. Features that are not influential toward differences in the two assay formats can be found in meta-category 0, where there are inherently no differences found between assay categories for the compounds used to train this meta-category. At the opposite end of the spectrum, substructures that are associated peculiarly with high cellular activity but low biochemical potency were singled out (Fig. 2, top right, meta-category −2). Statistics describing model quality (i.e. model self-consistency) at each meta-category based on a leave-one-out cross-validation procedure are shown in Table 2. All the ROC values, TPRs, and FPRs indicate that the statistical quality of the models is good for each meta-category. The ROC value for meta-category 0 is slightly lower in many cases because it comprises compounds with three different activity classes. However, these cases are not the most interesting ones for SPREAD.

There are a number of reasons why a compound could possess potent in vitro kinase inhibition but lose cellular activity. Among these are cell permeability issues, cell metabolism, the influence of a particular kinase ATP Km value on EC50, ATP-competitive versus allosteric binding, and the on- and off-rate kinetics of the compound, where off-rates more significantly influence the duration of in vivo


Table 3. External validation of models that predict differences in compound activity in biochemical and cellular kinase assays.

Dataset   Descriptor  Correct prediction (%)  Almost correct (%)  Wrong (%)  Reliable (%)
Kinase 1  ECFP 4      52                      34                  14         86
Kinase 2  ECFP 4      55                      25                  20         80

pharmacokinetics. With regard to cell permeability, logP and PSA are often used together to predict the cell permeability of a compound. We find that combined logP/PSA models do not explain well the differences in cellular versus biochemical kinase assays (Table 2). Therefore, the more agnostic models trained only on chemical features appear to capture better the differences between in vitro and cellular assays, which could be caused by any number of reasons. Further, the advantage of using the naïve Bayesian statistic rather than other learning methods is that chemical features are treated independently (naively), such that multiple mechanisms contributing to assay differences may be captured simultaneously. In short, the biological mechanism by which the assay differences arise need not be understood in order for SPREAD to predict when the differences will occur.

Next, the meta-category models for the kinase dataset were validated by prospective use on two external kinase test sets with no relation to the training set (in terms of compound structures or kinase targets), except that all compounds were tested in both a biochemical and a cell-based assay. We then attempted to predict when experimental assay differences would occur for these molecules. If the predicted difference exactly matches the meta-category, we regard the prediction as correct (Table 3). If a meta-category is predicted that is ±1 from the real category the compound belongs to, the prediction is regarded as almost correct. Both results are further considered as a "reliable prediction" for practical purposes. The final statistics for both external sets (Kinase 1 and Kinase 2) are shown in Table 3; more details on dataset composition are given in the Methods section. We find that more than half of the molecules are predicted to be in their true meta-category and that 80% or more are predicted in a reliable area, which gives the model very good confidence prospectively.
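The scoring scheme used for this external validation can be sketched as follows (a minimal sketch; the function name is ours):

```python
def validation_summary(predicted, actual):
    """Score external-set predictions the way described for Table 3:
    an exact meta-category match is 'correct', a miss by exactly one
    category is 'almost correct', and both together count as
    'reliable'; everything else is 'wrong'."""
    n = len(predicted)
    correct = sum(p == a for p, a in zip(predicted, actual))
    almost = sum(abs(p - a) == 1 for p, a in zip(predicted, actual))
    return {
        "correct_pct": 100.0 * correct / n,
        "almost_pct": 100.0 * almost / n,
        "wrong_pct": 100.0 * (n - correct - almost) / n,
        "reliable_pct": 100.0 * (correct + almost) / n,
    }
```

Applied to a list of predicted and observed meta-categories per compound, this yields exactly the four percentage columns reported in Table 3.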

3.2. hERG Channel Blockers: In Vitro versus Patch-Clamp

In this case study, it is desirable to obtain compounds that will be inactive in both hERG assays (meta-category 0) or less active in the patch-clamp assay than in the in vitro assay (meta-categories 1 and 2). Again, a very stable model was established, which enables the researcher to identify molecular features causing assay differences (details not shown, figures of merit in Table 2).

3.3. HDAC: Biochemical versus Cellular

Similar points are important for the HDAC dataset. Desirable compounds cross the membrane and retain effective HDAC inhibition. Similar to the kinase dataset, the models trained on ECFPs are statistically valid and are better cross-validated than models trained only on logP/PSA.

Table 4. The statistical quality of the generated models for the opioid receptor dataset; all other possible categories are not populated.

Descriptor  N     Meta-category  ROC   TPR   FPR
ECFP 4      1120  diagonal       0.83  0.76  0.26
             209  K1 D1 M2       0.81  0.78  0.28
              49  K1 D1 M3       0.87  0.90  0.27
              85  K1 D2 M1       0.85  0.81  0.20
              86  K1 D2 M2       0.87  0.77  0.14
              24  K1 D2 M3       0.82  0.79  0.19
              48  K1 D3 M1       0.95  0.90  0.11
              23  K1 D3 M3       0.86  0.87  0.15
              38  K2 D1 M1       0.87  0.68  0.03
              51  K2 D1 M2       0.83  0.76  0.17
              85  K2 D1 M3       0.91  0.85  0.16
              46  K2 D2 M1       0.94  0.93  0.13
              53  K2 D2 M3       0.83  0.83  0.21
              48  K2 D3 M1       0.95  0.96  0.08
              24  K2 D3 M3       0.85  0.75  0.18
              36  K3 D1 M3       0.91  0.81  0.03
              29  K3 D2 M3       0.88  0.83  0.19
logP/PSA    1120  diagonal       0.61  0.41  0.19
             209  K1 D1 M2       0.66  0.79  0.46
              49  K1 D1 M3       0.70  0.86  0.42
              85  K1 D2 M1       0.64  0.74  0.48
              86  K1 D2 M2       0.63  0.44  0.17
              24  K1 D2 M3       0.70  0.92  0.52
              48  K1 D3 M1       0.90  0.79  0.07
              23  K1 D3 M3       0.65  0.65  0.34
              38  K2 D1 M1       0.66  0.82  0.47
              51  K2 D1 M2       0.60  0.37  0.11
              85  K2 D1 M3       0.72  0.73  0.31
              46  K2 D2 M1       0.60  0.98  0.78
              53  K2 D2 M3       0.61  0.72  0.41
              48  K2 D3 M1       0.67  0.90  0.57
              24  K2 D3 M3       0.65  0.67  0.24
              36  K3 D1 M3       0.69  0.75  0.31
              29  K3 D2 M3       0.66  0.93  0.54

(ROC = Receiver Operating Characteristic; TPR = true-positive rate; FPR = false-positive rate; K: opioid kappa; D: delta; M: mu; 1: inactive, IC50 > 10 µM; 2: moderately active, IC50 > 1 µM and < 10 µM; 3: very active, IC50 < 1 µM)

Statistical Analysis and Data Mining DOI:10.1002/sam



Fig. 3 Feature clustering based on profiles of meta-category scores. Each line describes one feature, each column one meta-category (1: inactive, IC50 > 10 µM; 2: moderately active, 1 µM < IC50 < 10 µM; 3: very active, IC50 < 1 µM; in each case the order is the same as shown in Table 3, i.e. first kappa, then delta, and then mu). "Good" features are shown in green, "bad" in red. It can be seen that the models infrequently share good features, whereas bad features, reflecting inactivity, often correlate. The top example shows a feature that makes compounds very active against delta but inactive against the others; the second (bottom yellow line) shows a feature strongly correlated with compounds active against mu.

3.4. Opioid Receptor Selectivity

With respect to opioid receptor selectivity, various pharmacological endpoints may be desirable. To design a selective OPRK1 ligand, one would aim for features found especially in a category that shows no activity for the other receptors. To hit several receptors at once, it is desirable to consider chemical features that make a compound show the same behavior against all receptors. Therefore, in this case, we chose not to model the differences between each pair of assays, but rather the global activity fingerprint of compounds tested in the three assays, in order to elucidate all the features linked to the corresponding activity patterns. Again, all the trained models are statistically valid, both for the ECFP descriptors and the bulk parameters (Table 4).
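A sketch of how a per-compound activity fingerprint across the three receptors could be turned into the meta-category labels of Table 4 follows; the binning thresholds come from the table footnote, while the handling of values exactly at a boundary and the function names are our assumptions.

```python
def ic50_class(ic50_um: float) -> int:
    """Bin an IC50 (micromolar) as in Table 4: 1 = inactive (IC50 > 10 uM),
    2 = moderately active (1 uM < IC50 < 10 uM), 3 = very active (IC50 < 1 uM).
    Values exactly on a boundary are binned toward lower activity (assumption).
    """
    if ic50_um > 10:
        return 1
    if ic50_um > 1:
        return 2
    return 3


def opioid_meta_category(kappa_um: float, delta_um: float, mu_um: float) -> str:
    """Compose the activity-pattern label for one compound, e.g. 'K1 D3 M1'
    for a delta-selective compound that is inactive at kappa and mu."""
    return "K%d D%d M%d" % (
        ic50_class(kappa_um), ic50_class(delta_um), ic50_class(mu_um))
```

Each distinct label then defines one meta-category on which a model is trained, so selectivity patterns such as "K1 D3 M1" become directly learnable classes.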

In Fig. 3 we demonstrate that chemical features can be hierarchically clustered based on the meta-category scores used as a profile. The heat map rows correspond to features identified as relevant in the different meta-categories (either positively or negatively correlated with activity). It becomes obvious that using the SPREAD approach together with an interpretable chemical descriptor allows one to find chemical features linked to desired selectivities. Further, in some cases, feature clustering according to SPREAD profiles results in co-clustering of potential bioisosteric replacements. This information can then be applied to the design of molecules with a higher probability of exhibiting a desired profile, e.g. permeability, or selectivity between a target and an undesired off-target.
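The co-clustering of features by their SPREAD score profiles can be illustrated with a small correlation-based grouping. The paper does not specify its clustering algorithm, so this greedy single-link variant is only an approximation of the idea; feature names and the cutoff are illustrative.

```python
def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)


def cluster_profiles(profiles, cutoff=0.9):
    """Greedy single-link grouping of feature score profiles.

    `profiles` maps a feature name to its vector of meta-category scores;
    a feature joins the first cluster containing a member whose profile
    correlates with it at >= `cutoff`, otherwise it starts a new cluster.
    """
    clusters = []
    for name, prof in profiles.items():
        for cl in clusters:
            if any(pearson(prof, profiles[m]) >= cutoff for m in cl):
                cl.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

Features that end up in the same cluster behave alike across all meta-categories, which is exactly the signature one would expect of bioisosteric replacements.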

4. CONCLUSION

In summary, we have developed and validated a novel but generic method that directly aims to model and understand the differences compounds exhibit when tested in two or more assays. By transforming the activity values from the assays into meta-categories, statistical models can be trained directly on the qualitative differences between assays. This contributes heavily toward a tangible understanding of molecular assay selectivity. Although ensembles of models could alternatively be used to predict compounds that score highly in one assay and low in another, the advantage of the SPREAD approach is that the chemical features influencing assay differences are parsed out immediately as a consequence of training the model on the coincident assay differences. When used prospectively, the approach enables rationally directed differential molecular design.




REFERENCES

[1] T. J. Moore, M. R. Cohen, and C. D. Furberg, Arch Intern Med 167 (2007), 1752.

[2] A. Bender, J. Scheiber, M. Glick, J. W. Davies, K. Azzaoui, J. Hamon, L. Urban, S. Whitebread, and J. L. Jenkins, ChemMedChem 2 (2007), 861.

[3] S. Whitebread, J. Hamon, D. Bojanic, and L. Urban, Drug Discov Today 10 (2005), 1421.

[4] C. Lipinski and A. Hopkins, Nature 432 (2004), 855.

[5] R. Thaimattam, R. Banerjee, R. Miglani, and J. Iqbal, Current Pharmaceutical Design 13 (2007), 2751.

[6] T. Klabunde and A. Evers, Chembiochem 6 (2005), 876.

[7] M. J. Keiser, B. L. Roth, B. N. Armbruster, P. Ernsberger, J. J. Irwin, and B. K. Shoichet, Nat Biotech 25 (2007), 197.

[8] A. Bender, D. W. Young, J. L. Jenkins, M. Serrano, D. Mikhailov, P. A. Clemons, and J. W. Davies, Comb Chem High Throughput Screen 10 (2007), 719.

[9] J. H. Nettles, J. L. Jenkins, A. Bender, Z. Deng, J. W. Davies, and M. Glick, J Med Chem 49 (2006), 6802.

[10] Nidhi, M. Glick, J. W. Davies, and J. L. Jenkins, J Chem Inf Model 46 (2006), 1124.

[11] G. V. Paolini, R. H. B. Shapland, W. P. van Hoorn, J. S. Mason, and A. L. Hopkins, Nat Biotechnol 24 (2006), 805.

[12] J. Scheiber, J. L. Jenkins, S. C. Sukuru, A. Bender, D. Mikhailov, M. Milik, K. Azzaoui, S. Whitebread, J. Hamon, L. Urban, M. Glick, and J. W. Davies, Mapping adverse drug reactions in chemical space, J Med Chem 52(9) (2009), 3103–3107.

[13] J. Scheiber, B. Chen, M. Milik, S. C. Sukuru, A. Bender, D. Mikhailov, S. Whitebread, J. Hamon, K. Azzaoui, L. Urban, M. Glick, J. W. Davies, and J. L. Jenkins, Gaining insight into off-target mediated effects of drug candidates with a comprehensive systems chemical biology analysis, J Chem Inf Model 49(2) (2009), 308–317.

[14] Scitegic Pipeline Pilot, Version 7.0, Accelrys, San Diego, USA.

[15] T. Fawcett, Technical Report, Palo Alto, USA: HP Laboratories 2004, 38.

[16] N. Triballeau, F. Acher, I. Brabet, J. P. Pin, and H. O. Bertrand, J Med Chem 48 (2005), 2534.
