ORIGINAL PAPER
Development of a decision tree to classify the most accurate tissue-specific tissue to plasma partition coefficient algorithm for a given compound
Yejin Esther Yun • Cecilia A. Cotton •
Andrea N. Edginton
Received: 10 July 2013 / Accepted: 7 November 2013 / Published online: 21 November 2013
© Springer Science+Business Media New York 2013
Abstract Physiologically based pharmacokinetic (PBPK)
modeling is a tool used in drug discovery and human health
risk assessment. PBPK models are mathematical repre-
sentations of the anatomy, physiology and biochemistry of
an organism and are used to predict a drug’s pharmacoki-
netics in various situations. Tissue to plasma partition
coefficients (Kp), key PBPK model parameters, define the
steady-state concentration differential between tissue and
plasma and are used to predict the volume of distribution.
The experimental determination of these parameters once
limited the development of PBPK models; however, in
silico prediction methods were introduced to overcome this
issue. The developed algorithms vary in input parameters
and prediction accuracy, and none are considered standard,
warranting further research. In this study, a novel decision-
tree-based Kp prediction method was developed using six
previously published algorithms. The aim of the developed
classifier was to identify the most accurate tissue-specific
Kp prediction algorithm for a new drug. A dataset con-
sisting of 122 drugs was used to train the classifier and
identify the most accurate Kp prediction algorithm for a
certain physicochemical space. Three versions of tissue-
specific classifiers were developed and were dependent on
the necessary inputs. The use of the classifier resulted in a
better prediction accuracy than that of any single Kp pre-
diction algorithm for all tissues, the current mode of use in
PBPK model building. Because built-in estimation equa-
tions for those input parameters are not necessarily avail-
able, this Kp prediction tool will provide Kp prediction
when only limited input parameters are available. The
presented innovative method will improve tissue distribu-
tion prediction accuracy, thus enhancing the confidence in
PBPK modeling outputs.
Keywords Physiologically based pharmacokinetic model · Tissue to plasma partition coefficient · Decision tree · Random forest
Abbreviations
AAFE Absolute average fold error
AFE Average fold error
B:P Blood-to-plasma ratio
E Extraction ratio
Exp Experimentally derived Kp values
FE Fold error
Fi Fraction of ionized drug
Fup Unbound fraction in plasma
HSA Human serum albumin
Kp Tissue-to-plasma partition coefficient
Kpu Tissue-to-plasma water partition coefficient
KpuBC Unbound compound concentration in blood cells
LogD Logarithmic value of N-octanol–water partition coefficient adjusted for ionization at pH 7.4
LogKvo:w Logarithmic value of vegetable oil–water partitioning adjusted for ionization at pH 7.4
LogP Logarithmic value of N-octanol–water partition coefficient
Electronic supplementary material The online version of this article (doi:10.1007/s10928-013-9342-0) contains supplementary material, which is available to authorized users.
Y. E. Yun · A. N. Edginton (✉)
School of Pharmacy, University of Waterloo, 200 University
Ave W, Waterloo, ON, Canada
e-mail: [email protected]
C. A. Cotton
Department of Statistics and Actuarial Science, University of
Waterloo, Waterloo, ON, Canada
J Pharmacokinet Pharmacodyn (2014) 41:1–14
DOI 10.1007/s10928-013-9342-0
M The number of variables
MA Membrane affinity
MFE Mean fold error
mtry Optimal value of the number of variables
ntree Number of trees
PBPK Physiologically based pharmacokinetic
PhS Phosphatidylserine
Pred Predicted Kp values
r2 Coefficient of determination
RBCu Red blood cell partitioning data for unbound drugs
RMSE Root mean square error
SPR Surface plasmon resonance
TCB Tissue composition based
Vss Volume of distribution at steady state
Introduction
Physiologically based pharmacokinetic (PBPK) models
integrate organism- and compound-specific information
within a mathematical framework to describe a com-
pound’s pharmacokinetics. The model structure represents
the mammalian system of parallel and serial connections
between organs and blood pools. Model parameters reflect
the anatomical and physiological aspects of the mammalian
system and include organ volumes and blood flows.
Parameters relating to the compound include protein
binding affinity, tissue-to-plasma partition coefficients
(Kp) and intrinsic clearance. Combining appropriate model
structure with accurate parameter values allows for a pre-
diction of the pharmacokinetics of the compound in the
absence of any real in vivo pharmacokinetic data. The
extent of compound distribution into an individual organ is
expressed by a steady-state Kp, i.e., the ratio of the con-
centration of a compound in tissue and plasma [1]. Kps are
used to quantify the extent of a compound’s distribution
from the systemic circulation into the tissues at steady state
and are key parameters within a PBPK model. The extent
of tissue distribution is dependent on tissue partitioning and
the binding affinity of a compound to blood cells, proteins
and tissue components [2]. Due to various tissue compo-
sitions, Kps are tissue-specific.
Historically, Kp values have been derived experimen-
tally in vivo. This is a costly and time-consuming endeavor
and was once a limitation in the development of PBPK
models. As a result, Kp prediction algorithms have been
developed to overcome the need for experimental Kp
determination. These algorithms predict Kps based on the
underlying physiological and behavioral aspects of a
compound in the body [1, 3–11]. Kp prediction algorithms
are divided into two areas: (i) tissue-composition-based
(TCB) algorithms, which are created solely using the
physicochemical properties of a compound along with
tissue-specific parameters, and (ii) correlation-based algo-
rithms, which are empirically derived using both com-
pound-specific information and information derived in vivo
(e.g., muscle Kp). Algorithm outputs are Kp values based
on total concentration (Kp) [1, 4, 5, 11] or unbound con-
centration (Kpu) [8–10] in the case of drug compounds or
tissue:blood partition coefficients [12] based on total
concentration for environmental chemicals.
Tissue-composition-based algorithms
TCB algorithms are mechanistic in nature and aim to
describe the degree of drug accumulation based on tissue
composition (e.g., acidic phospholipids concentration),
physicochemistry and plasma protein binding [1, 7, 9–11,
13]. The main assumption of these models is that the dis-
tribution of a compound is primarily governed by passive
diffusion into tissue compartments and reversible binding
to common proteins. Poulin et al. introduced the first TCB
models by calculating Kps as a function of a lipophilicity
measure, tissue-specific concentration of lipids and fraction
unbound in plasma [1, 7, 12–14]. Berezhkovskiy later
revised this method by correcting for the ratio of unbound
fraction in tissue to that in plasma [3]. Later, the Rodgers
and Rowland model [9] accounted for the electrostatic
interaction of basic moieties of moderate to strong bases to
acidic phospholipids and passive distribution into intra-
and/or extracellular tissue water. Rodgers et al. [10] con-
tinued to develop a new mechanistic equation for predict-
ing the Kps of neutrals, acids and weak bases by
considering compound interactions with proteins. In Sch-
mitt’s model [11], compound binding to phospholipids was
explained mechanistically by accounting for the interaction
between charged phospholipids and charged molecules and
considering the phosphatidylcholine:buffer partition coef-
ficient and the phospholipid:water partition coefficient.
These developed algorithms do not require in vivo infor-
mation because they rely on a mechanistic understanding
of the complex interactions occurring between drug and
tissue.
Correlation-based algorithms
The relationship between experimentally determined
in vivo parameters (e.g., a muscle Kp) and Kps has been
utilized to develop predictive regression equations. The
work of Bjorkman [4] demonstrated that muscle Kp can be
used to predict the Kps of other lean tissues, and Bjorkman
developed regression equations to that end. This work was
later refined by Jansson et al. [5], who additionally incor-
porated lipophilicity into the equations. Another algorithm
by Poulin and Theil [8] used the relationship between red
blood cell partitioning data for unbound compounds
(RBCu) and tissue Kps as well as the relationship between
muscle Kps and tissue Kps to develop predictive regression
equations. RBCu was determined in vitro and used as an
indicator of the degree of binding capacity due to the
electrostatic interactions of basic compounds with acidic
phosphatidylserine (PhS). This model was later refined by
taking into account both the pharmacological activity of a
compound and compound-specific properties such as pKa
and lipophilicity [8, 15]. The most recent correlation-based
approach, Yun and Edginton [16], used volume of distri-
bution (Vss) as a primary predictor of tissue Kps in addi-
tion to physicochemical descriptors.
Despite increasing attention and interest in the accurate
prediction of compound distribution [17], a standard Kp
prediction method has not been agreed upon within the
research community. To date, there is no single prediction
algorithm that is applicable to all compounds in all tissues
(see Table s.1 in supplementary material), and the pre-
dictability of any single Kp prediction algorithm may vary
depending on the physicochemical properties of the com-
pound and/or the organ being assessed. Furthermore, the
experimental determination of the required compound-
specific chemical descriptors and in vitro and in vivo input
parameters can limit the use of some Kp prediction algo-
rithms. In other words, the availability of these input
parameters often determines the usability of an algorithm.
This study aims to determine the best performing algorithm
in a specific physicochemical space for a single tissue using
a statistical classification technique.
Random forest, a decision-tree-based statistical classi-
fication method [18], was utilized to identify the best
performing algorithms for given input parameters (e.g.,
LogP, fup). This technique is empirical in nature, whereby
training data are used to develop a system of decision trees
that are collated within a forest to support decision making.
In the random forest analysis, N bootstrap samples are
drawn from training data. According to the principle of
recursive partitioning [18, 19], each tree is created using a
random set of samples from the training data and a random
set of input parameters chosen from a library of inputs.
Each decision tree created from a bootstrap sample will
result in a classification. The classification with the most
votes is selected by the forest. A classifier approach will
allow the user to harness the best of all algorithms to
predict tissue-specific Kps for a new compound.
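
The bootstrap-and-vote idea described above can be illustrated with a toy Python sketch. This is not the randomForest implementation used in the study; the function names, the compound identifiers and the per-tree votes are all hypothetical, and the individual trees are not grown here.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows with replacement (one bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(votes):
    """Return the class label that received the most votes across trees."""
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(0)
training_rows = ["drug_A", "drug_B", "drug_C", "drug_D"]  # hypothetical compounds
sample = bootstrap_sample(training_rows, rng)

# Hypothetical votes cast by five trees for one compound in one tissue.
tree_votes = ["Schmitt", "Rodgers", "Schmitt", "Berezhkovskiy", "Schmitt"]
forest_choice = majority_vote(tree_votes)
```

In a real forest, each tree would be fitted to its own bootstrap sample and a random subset of inputs before casting its vote.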
This study aims to develop a predictive decision-tree-
based classifier that will choose the most accurate pub-
lished Kp algorithm for a new compound within a specific
tissue using readily available input parameters and will
assess the predictive accuracy of the classifier relative to
that of previously published algorithms.
Methods
Input parameters for Kp algorithms
Published Kp prediction algorithms use various inputs.
Some of the key input parameters are described below.
Lipophilicity is one of the most important physicochemical
properties affecting compound disposition. Lipophilicity is
measured in various media types, with LogP the most
commonly reported. A measure of lipophilicity is usually
available in the early stages of compound discovery either
through in vitro experimentation or in silico prediction.
The fraction of compound unbound in plasma (fup) is a
common input parameter within Kp prediction algorithms
because of its pronounced influence on the extent of dis-
tribution. Common binding proteins include albumin, gly-
coproteins, lipoproteins and globulins, and binding to these
proteins is assessed through in vitro or ex vivo experi-
mentation and is considered commonly available for a
novel compound.
The degree of ionization of a compound at a particular
pH has important consequences with respect to distribu-
tion. Compounds that are weak acids or weak bases exist
in solution at equilibrium between the un-ionized and
ionized forms. Only un-ionized nonpolar chemicals are
hypothesized to cross the cellular membranes as ionized
compounds are less permeable than un-ionized com-
pounds. At equilibrium, the concentrations of the un-
ionized compounds are equal in both plasma and tissue.
However, the total concentration in one matrix (e.g., a
tissue) may vary depending on the degree of ionization of
a compound at a tissue-specific physiological pH. For the
statistical analyses performed in this study, the ionized
fraction of the compound (fi) [16, 20, 21] represents the
degree of ionization at a tissue-specific physiological pH
(Table 1, Eq. 1). The fi equations are derived from the
Henderson-Hasselbalch equation, and the fi value ranges from
0 to 1, where a highly ionized compound at a specific pH
approaches 1.
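
A minimal Python sketch of the fi calculation is given here, following the expressions listed under Eq. 1 in Table 1. The function name, the class labels and the argument convention for zwitterions (pka1 as the basic pKa, pka2 as the acidic pKa) are assumptions made for illustration; the tissue pH of 7.0 in the usage line is a placeholder, not a value taken from this study.

```python
def fraction_ionized(compound_class, ph_tissue, pka1, pka2=None):
    """Return fi (between 0 and 1) using the Table 1, Eq. 1 expressions.

    Diprotic bases: pka1 > pka2. Diprotic acids: pka1 < pka2.
    Zwitterions: pka1 is the basic pKa and pka2 the acidic pKa
    (a convention chosen for this sketch only).
    """
    if compound_class == "monoprotic_base":
        ratio = 10 ** (pka1 - ph_tissue)
    elif compound_class == "diprotic_base":
        ratio = 10 ** (pka1 - ph_tissue) + 10 ** (pka1 + pka2 - 2 * ph_tissue)
    elif compound_class == "monoprotic_acid":
        ratio = 10 ** (ph_tissue - pka1)
    elif compound_class == "diprotic_acid":
        ratio = 10 ** (ph_tissue - pka1) + 10 ** (2 * ph_tissue - pka1 - pka2)
    elif compound_class == "zwitterion":
        ratio = 10 ** (pka1 - ph_tissue) + 10 ** (ph_tissue - pka2)
    else:
        raise ValueError(f"unsupported compound class: {compound_class}")
    # fi = 1 - fui, where fui = 1 / (1 + ratio)
    return 1 - 1 / (1 + ratio)


# A base whose pKa equals the (placeholder) tissue pH is half ionized.
half_ionized = fraction_ionized("monoprotic_base", ph_tissue=7.0, pka1=7.0)
```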
Other input parameters used for previously published
Kp prediction algorithms include RBCu, muscle Kp and
Vss. Compound binding to RBCs can be used as an indi-
cator of in vivo distribution because RBCs are rich in
acidic phospholipids, which are responsible for the high
binding affinity of basic compounds. Only a few algorithms
[8, 9] require the input parameter of RBCu. Poulin and
Theil [8] demonstrated using both RBCu and muscle Kp
that tissue-specific Kp prediction with muscle Kp as an
input variable was more accurate than Kp prediction with
RBCu as an input variable. Muscle Kp is also an important
factor in Kp prediction because muscle is a highly perfused
organ and accounts for approximately 40 % of total body
mass [22, 23]. In addition, Vss can be used as an input
because it is the parameter that represents the overall extent
of drug distribution in the body [5, 16, 24].

Table 1 Summary of equations used to estimate an unknown input parameter

Equation 1
Parameter: Degree of ionization at a tissue pH
Description: Calculation of fraction ionized (fi) and fraction unionized (fui)
Reference: [16, 20, 21]
$$f_{ui} = \frac{1}{1 + 10^{(pK_a - pH_{tissue})}}, \qquad f_i = 1 - f_{ui}$$
Monoprotic bases: $f_i = 1 - [1 + 10^{pK_a - pH_{tissue}}]^{-1}$
Diprotic bases: $f_i = 1 - [1 + 10^{pK_{a1} - pH_{tissue}} + 10^{pK_{a1} + pK_{a2} - pH_{tissue} \times 2}]^{-1}$
Monoprotic acids: $f_i = 1 - [1 + 10^{pH_{tissue} - pK_a}]^{-1}$
Diprotic acids: $f_i = 1 - [1 + 10^{pH_{tissue} - pK_{a1}} + 10^{pH_{tissue} \times 2 - pK_{a1} - pK_{a2}}]^{-1}$
Zwitterions: $f_i = 1 - [1 + 10^{pK_{a,base} - pH_{tissue}} + 10^{pH_{tissue} - pK_{a,acid}}]^{-1}$
where pKa1 > pKa2 for bases, whereas pKa1 < pKa2 for acids

Equation 2
Parameter: LogD
Description: Partition coefficient of octanol and water at specific pH
Reference: [7, 15]
Monoprotic base: $LogP - Log(1 + 10^{pK_{a1} - 7.4})$
Diprotic base: $LogP - Log(1 + 10^{pK_{a1} - 7.4} + 10^{pK_{a1} + pK_{a2} - 2 \times 7.4})$
Monoprotic acid: $LogP - Log(1 + 10^{7.4 - pK_{a1}})$
Diprotic acid: $LogP - Log(1 + 10^{7.4 - pK_{a1}} + 10^{2 \times 7.4 - pK_{a1} - pK_{a2}})$
Zwitterions: $LogP - Log(1 + 10^{pK_{a,base} - 7.4} + 10^{7.4 - pK_{a,acid}})$
where pKa1 > pKa2 for bases, whereas pKa1 < pKa2 for acids

Equation 3
Parameter: Kpu_BC (affinity for blood cell)
Description: Red blood cell to plasma partition coefficient as it relates to unbound compound
Reference: [26]
$$Kpu_{BC} = \frac{B{:}P - (1 - Hematocrit)}{Hematocrit \times f_{up}}$$
where B:P is blood to plasma ratio and fup is unbound fraction in plasma

Equation 4
Parameter: KpuBC
Reference: [27]
$$Kpu_{BC} = \frac{X \times f_{IW,RBC}}{Y} + \frac{P \, f_{NL,RBC} + (0.3P + 0.7) \, f_{NP,RBC}}{Y}$$
where fIW = 0.0914, fNL = 0.0017, fNP = 0.0029, P = antilog values of LogP
For monoprotic base: $X = 1 + 10^{pK_a - 7.22}$, $Y = 1 + 10^{pK_a - 7.4}$
For monoprotic acids: $X = 1 + 10^{7.22 - pK_a}$, $Y = 1 + 10^{7.4 - pK_a}$

Equation 5
Parameter: Blood to plasma ratio (B:P)
Reference: [9]
$$Log(B{:}P) = -0.004282 + 0.067028 \, LogP + 0.214590 \, Log(f_{up}) \quad (n = 28;\ R^2 = 0.40)$$
This equation was obtained using Rodgers et al. [9] dataset. In the dataset, there were 28 experimentally determined B:P values available. The regression equation was developed and was statistically significant (P < 0.05).

Equation 6
Parameter: LogMA
Description: Logarithmic value of membrane affinity
Reference: [11]
$$LogMA = 1.294 + 0.304 \times LogP$$
This equation was obtained using Schmitt’s dataset. In the dataset, there were 60 LogMA values available. The regression equation was developed and was statistically significant (P < 0.05).

Equation 7
Parameter: LogHSA
Description: Logarithmic value of human serum albumin (HSA)
Reference: [11]
$$LogHSA = 0.294 + 0.135 \times LogP$$
This equation was obtained using Schmitt’s dataset. In the dataset, there were 60 LogHSA values available. The regression equation was developed and was statistically significant (P < 0.05).

These physicochemical and physiological inputs repre-
sent key input parameters for Kp prediction algorithms.
Some of these input parameters are readily available, such
as a measure of lipophilicity or pKa, whereas others are not
routinely measured, such as RBCu or muscle Kp. Due to
the difficulty in obtaining some of the input parameters,
several algorithms have limited utility in tissue-specific Kp
prediction for a novel compound.
Data collection
A database of experimentally derived partition coefficients
with corresponding compound physicochemical properties
was created from the literature using several MEDLINE
searches. In vivo parameters such as the fup and Vss were
also included in the database. Data were included in the
study based on the following criteria: (i) reported experi-
mentally derived Kp values plausibly represent true steady-
state distribution/pseudo equilibrium and (ii) fup, pKa and
one of the lipophilicity measures (i.e., LogP, LogD, Log-
Kvo:w) are available. When experimental physicochemical
parameters (e.g., all lipophilicity measures, pKa) were not
available in the literature, the values were obtained from
predictions made in ChEMBL [25]. Experimentally deter-
mined values were preferably used over predicted values.
The stereoselectivity of a compound was considered, if
applicable, so that R and S enantiomers were considered
separately. As shown in Table s.1, decision trees for the
pancreas, testes, thymus and RBC were not generated
because the number of data points was insufficient for a
classification analysis.
Estimation of required inputs
Table 2 presents the required input parameters for each
algorithm. In the event that a required input parameter was
not available, it was calculated based on the equations
presented in Table 1. For example, if only LogP was
available but LogD was the necessary input parameter,
LogD was calculated using equations based on those
derived by Poulin et al. [7, 15] (Table 1, Eq. 2). For
some input parameters [e.g., LogMA, LogHSA and
blood:plasma ratio (B:P)], a regression equation was
derived using the datasets reported in the publications by
Rodgers et al. [9] and Schmitt [11].
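
As an illustration of this estimation step, the following Python sketch computes LogD from LogP and pKa using the Eq. 2 expressions in Table 1. The function name, the class labels and the zwitterion argument convention (pka1 basic, pka2 acidic) are hypothetical; "Log" is assumed to mean the base-10 logarithm, and the `ph` argument is a convenience added here, since Table 1 fixes the pH at 7.4.

```python
import math


def log_d(log_p, compound_class, pka1, pka2=None, ph=7.4):
    """Estimate LogD at the given pH from LogP (Table 1, Eq. 2).

    Diprotic bases: pka1 > pka2. Diprotic acids: pka1 < pka2.
    Zwitterions: pka1 is the basic pKa, pka2 the acidic pKa (sketch convention).
    """
    if compound_class == "monoprotic_base":
        ionization = 10 ** (pka1 - ph)
    elif compound_class == "diprotic_base":
        ionization = 10 ** (pka1 - ph) + 10 ** (pka1 + pka2 - 2 * ph)
    elif compound_class == "monoprotic_acid":
        ionization = 10 ** (ph - pka1)
    elif compound_class == "diprotic_acid":
        ionization = 10 ** (ph - pka1) + 10 ** (2 * ph - pka1 - pka2)
    elif compound_class == "zwitterion":
        ionization = 10 ** (pka1 - ph) + 10 ** (ph - pka2)
    else:
        raise ValueError(f"unsupported compound class: {compound_class}")
    return log_p - math.log10(1 + ionization)


# Hypothetical base with LogP = 2.0 and pKa = 7.4: LogD is LogP minus log10(2).
example_log_d = log_d(2.0, "monoprotic_base", pka1=7.4)
```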

Table 2 Summary of Kp prediction algorithms and their main inputs

Algorithm                  Approach                    Main inputs
Bjorkman [4]               Correlation-based           Muscle Kp
Berezhkovskiy [3, 7, 13]   Tissue-composition-based    LogP, LogKvo:w, fup
Rodgers et al. [9, 10]     Tissue-composition-based    LogP, pKa, fup, B:P
Schmitt [11]               Tissue-composition-based    LogP, LogD, LogKvo:w, LogMA, LogHSA, pKa, fup
Jansson et al. [5]         Correlation-based           Vss, Muscle Kp, LogP, LogD, LogKvo:w
Poulin and Theil [8]       Correlation-based           Muscle Kp or RBCu
Yun and Edginton [16]      Correlation-based           Vss, LogP, pKa, fup

Table 1 continued

Equation 8
Parameter: LogKvo:w
Description: Logarithmic value of partition coefficient between vegetable oil and water
Reference: [33]
$$LogK_{vo:w} = 1.115 \times LogP - 1.34$$

Equation 9
Parameter: Fut_lean tissue
Description: Fraction of unbound compound in lean tissue
Reference: [1]
$$Fut_{lean} = \frac{1}{1 + \left(\frac{1 - f_{up}}{f_{up}}\right) \times 0.5}$$

Equation 10
Parameter: Fut_adipose tissue
Description: Fraction of unbound compound in adipose tissue
Reference: [1]
$$Fut_{adipose} = \frac{1}{1 + \left(\frac{1 - f_{up}}{f_{up}}\right) \times 0.15}$$

Equation 11
Parameter: Muscle Kp
Reference: [5]
$$V_{ss} = V_{plasma} + \sum_{i=1}^{n} V_{tissue,i} \times 10^{\,a \times \log(K_{p,muscle}) + b \times \log(lipophilicity) + c}$$
where the coefficients a, b and c are listed in Jansson et al. [5]

Affinity for blood cells (KpuBC) (i.e., unbound compound
concentration in blood cells) is one of the required inputs for the
Rodgers et al. [9, 10] algorithms. KpuBC is a function of fup,
B:P and hematocrit. KpuBC is estimated using the standard
equation (Table 1, Eq. 3) in the Rodgers models [9, 26]. In the
absence of an observed B:P value, KpuBC is estimated using
Eq. 4 (Table 1) as proposed by Paixao et al. [27]. This equation
was derived from Rodgers et al. [10]. The assumptions for the
equations are that (i) in erythrocytes, there is no extracellular
space and (ii) albumin and lipoproteins are not contained within
the space.
Whereas the first approach to B:P estimation was the use of
a mechanistic equation as described above, another approach
was also followed for B:P estimation, namely, the develop-
ment of a regression equation (Table 1, Eq. 5). Experimen-
tally determined B:P, LogP and fup (n = 28) were obtained
from Rodgers et al. [9], and a predictive regression equation
was developed based on the dataset. For the linear regression
analysis, the statistical software R version 2.12 [28] was used.
The estimation equation that yielded the most accurate Kpu
prediction when compared to the experimentally derived Kpu
values was selected for the calculation of Kps when Rodgers
et al. [9, 10] was used in this study.
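
The regression route to KpuBC can be sketched in Python as follows: B:P is first estimated from LogP and fup (Table 1, Eq. 5) and then converted to KpuBC (Table 1, Eq. 3). The function names are hypothetical, "Log" is assumed to be base 10, and the default hematocrit of 0.45 is an assumed typical value that is not stated in this section.

```python
import math


def bp_ratio_from_regression(log_p, fup):
    """Estimate the blood-to-plasma ratio from LogP and fup (Table 1, Eq. 5)."""
    log_bp = -0.004282 + 0.067028 * log_p + 0.214590 * math.log10(fup)
    return 10 ** log_bp


def kpu_bc(bp_ratio, fup, hematocrit=0.45):
    """Affinity for blood cells from B:P, fup and hematocrit (Table 1, Eq. 3).

    hematocrit=0.45 is an assumption made for this sketch.
    """
    return (bp_ratio - (1 - hematocrit)) / (hematocrit * fup)


# Hypothetical compound with LogP = 2.5 and fup = 0.1.
estimated_bp = bp_ratio_from_regression(log_p=2.5, fup=0.1)
estimated_kpu_bc = kpu_bc(estimated_bp, fup=0.1)
```

Because the regression has R2 = 0.40, the estimated B:P can fall below 1 − hematocrit for some inputs, in which case Eq. 3 returns a negative KpuBC; a caller would need to decide how to handle that case.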
For the calculation according to Schmitt’s algorithm
[11], the logarithmic value of the phosphatidylcho-
line:water partition coefficient at pH 7.4 (LogMA) and the
logarithmic value of human serum albumin (LogHSA)
must be estimated in the absence of the experimentally
determined values. The regression equations for LogMA
(Table 1, Eq. 6) and LogHSA (Table 1, Eq. 7) were gen-
erated using a dataset provided by Schmitt [11].
Separation of classifier groups
For researchers performing Kp prediction for a novel
compound, the availability of input parameters will not be
consistent. For example, when in vivo work has not been
performed on a compound, researchers are likely to have
only physicochemical input parameters and lack any
in vivo input parameters such as muscle Kp. Therefore, a
decision tree incorporating algorithms that require in vivo
inputs will not be useful to the researcher. Thus, several
versions of the Classification trees were created and were
based on the likely groupings of input parameters
researchers may have. Any additional algorithm-specific
input parameters that were required were estimated using
the equations in Table 1.
The development and evaluation of Classification tree
#1 was dependent on compounds for which muscle Kp, one
of the lipophilicity measures (e.g., LogP), pKa and fup
were available (Table 3). The development and evaluation
of Classification tree #2 was dependent on compounds for
which Vss, one of the lipophilicity measures, pKa and fup
were available. The development and evaluation of Clas-
sification tree #3 was dependent on compounds for which
one of the lipophilicity measures, pKa and fup were
available. The previously published algorithms that were
classified in each of the Classification trees are listed in
Table 3 along with the number of compounds used in the
development and evaluation of each tree.
Kp calculations according to previously published
algorithms
Kps were calculated according to each published algorithm
using only those input parameters required for Classifica-
tion trees #1 through #3 and using estimation equations for
any remaining required inputs. For the Berezhkovskiy model
[3], LogKvo:w, fraction unbound in lean tissue (Fut_lean)
and fraction unbound in adipose tissue (Fut_adipose) were
calculated using Eqs. 8, 9 and 10, respectively (Table 1).
For Rodgers et al.’s method, the Kps of bases with
pKa ≥ 7 were calculated according to Rodgers et al. [9]. LogKvo:w
and B:P were estimated by Eqs. 8 and 5, respectively
(Table 1). The Kps of acids, zwitterions, neutrals and weak
bases were calculated according to Rodgers and Rowland [10]. In
Jansson’s algorithm [5], Kp prediction equations of bases
and neutrals and Kp prediction equations of acid and zwit-
terions were separately used. Both experimental muscle Kp
(Classification tree #1) and muscle Kp, as derived from
experimental Vss (Table 1, Eq. 11, Classification tree #2),
were used to predict Kps for Jansson’s algorithms [5]. LogD
and LogKvo:w were calculated as a function of LogP using
Eqs. 2 and 8 (Table 1). In Schmitt’s model [11], compounds
were divided into acids, neutrals, bases and zwitterions and
Kps were calculated accordingly. LogMA and LogHSA
were estimated using the regression equations Eqs. 6 and 7
(Table 1). In the Yun and Edginton algorithm [16], Kps were
estimated by using equations for moderate to strong bases
and equations for acids, neutrals and zwitterions. The degree
of ionization at a specific tissue pH was calculated using
Eq. 1 (Table 1). Because Poulin and Theil’s Kp prediction
approach [8] aimed to predict Kps for bases, only Kps of
bases were estimated. In Bjorkman’s model [4], Kp prediction
equations for acids and bases were separately developed
and Kps were calculated accordingly.

Table 3 Physicochemical and/or in vivo inputs for a classifier algorithm and included algorithms for each group

Group                          Inputs for classification            Algorithms
Group 1 (N = 107 compounds)    Muscle Kp, LogP, fi, fup, Class^a    Berezhkovskiy [3]; Bjorkman [4]; Rodgers et al. [9, 10]; Schmitt [11]; Jansson et al. [5]; Poulin and Theil [8]
Group 2 (N = 97 compounds)     Vss, LogP, fi, fup, Class^a          Berezhkovskiy [3]; Rodgers et al. [9, 10]; Schmitt [11]; Jansson et al. [5]; Yun and Edginton [16]
Group 3 (N = 122 compounds)    LogP, fi, fup, Class^a               Berezhkovskiy [3]; Rodgers et al. [9, 10]; Schmitt [11]

^a Class: acid–base properties of a compound (A: acid, B: base (pKa ≥ 7.4), WB: base (pKa < 7.4), Z: zwitterions)

To ensure that the use of estimated input parameters (as
defined in Table 1) led to Kp predictions that were similar
to those predicted using existing algorithms, a comparison
of outcomes was made. In this study section, Kps were
calculated for Rodgers et al. [9, 10], Schmitt [11] and
Jansson et al. [5] (those algorithms that presented their
predicted Kps based on experimental inputs only) using the
inputs that were either experimental (those required by the
Classification trees) or, if appropriate, estimated based on
equations in Table 1. The predicted Kp values from the
publications (ai) were compared with those calculated in
this study section (bi). Mean fold error
(MFE, Eq. 12), average fold error (AFE, Eq. 13), absolute
average fold error (AAFE, Eq. 14) and root mean square
error (RMSE, Eq. 15) were used to measure the deviation
of the published algorithm predicted Kps and the Kps
calculated using experimental and estimated inputs.
$$MFE = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{a_i}{b_i}\right) \qquad (12)$$

$$AFE = 10^{\frac{1}{n}\sum_{i=1}^{n}\log_{10}\left(\frac{a_i}{b_i}\right)} \qquad (13)$$

$$AAFE = 10^{\frac{1}{n}\sum_{i=1}^{n}\left|\log_{10}\left(\frac{a_i}{b_i}\right)\right|} \qquad (14)$$

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n}\left[\log_{10}(a_i) - \log_{10}(b_i)\right]^{2}}{n}} \qquad (15)$$
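
A minimal Python sketch of Eqs. 12–15 follows, where `a` and `b` are equal-length sequences of positive Kp values. The function names and the two-value example are illustrative only and are not taken from the study data.

```python
import math


def mfe(a, b):
    """Mean fold error (Eq. 12): arithmetic mean of the ratios a_i / b_i."""
    return sum(x / y for x, y in zip(a, b)) / len(a)


def afe(a, b):
    """Average fold error (Eq. 13): geometric mean of the ratios, a measure of bias."""
    return 10 ** (sum(math.log10(x / y) for x, y in zip(a, b)) / len(a))


def aafe(a, b):
    """Absolute average fold error (Eq. 14): fold deviation irrespective of direction."""
    return 10 ** (sum(abs(math.log10(x / y)) for x, y in zip(a, b)) / len(a))


def rmse(a, b):
    """Root mean square error on the log10 scale (Eq. 15)."""
    squared = [(math.log10(x) - math.log10(y)) ** 2 for x, y in zip(a, b)]
    return math.sqrt(sum(squared) / len(a))


# One twofold over-prediction and one twofold under-prediction (made-up values).
a_values = [2.0, 0.5]
b_values = [1.0, 1.0]
metrics = (mfe(a_values, b_values), afe(a_values, b_values),
           aafe(a_values, b_values), rmse(a_values, b_values))
```

In this example the two errors cancel in AFE (no net bias) but not in AAFE or RMSE, which is why the three measures are reported together.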
Dataset development
Using the relevant experimental and estimated input
parameters required for Group 1, 2, and 3 (Table 3),
comparisons between experimentally derived Kps and
predicted Kps from each applicable algorithm were made.
The Kp prediction algorithm that resulted in a value that
was closest to the experimental Kp was selected as the
best-predicting algorithm for the compound within the
specific tissue. The best-predicting algorithm for the
compound was then coded numerically so that the com-
pound could be categorized by the best-predicting model.
This coded information was used as the dependent variable
in the statistical analysis.
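
The labelling step can be sketched in Python as shown here. The text does not state how closeness to the experimental Kp was measured, so this sketch assumes the absolute log10 fold error; the function name, the numeric codes and the Kp values are hypothetical.

```python
import math


def best_algorithm(experimental_kp, predicted_kps):
    """Return the name of the algorithm whose predicted Kp is closest to experiment.

    Closeness is measured as |log10(predicted / experimental)|, which is an
    assumption of this sketch rather than a criterion stated in the text.
    """
    return min(
        predicted_kps,
        key=lambda name: abs(math.log10(predicted_kps[name] / experimental_kp)),
    )


# Hypothetical numeric coding of the algorithms for use as the dependent variable.
ALGORITHM_CODES = {"Berezhkovskiy": 1, "Rodgers": 2, "Schmitt": 3}

# Made-up predictions for one compound in one tissue.
predictions = {"Berezhkovskiy": 0.5, "Rodgers": 4.5, "Schmitt": 1.8}
winner = best_algorithm(experimental_kp=2.0, predicted_kps=predictions)
label = ALGORITHM_CODES[winner]
```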
Random forest
A random forest was utilized to build a classifier that
identified the most accurate Kp prediction algorithm. The
classification analysis was performed using the randomForest
package (4.6–6) for the statistical software R (version 2.12) [28].
Initially, default parameter values were used for the number of
trees in a forest (ntree = 500) and the number of variables
(mtry = $\sqrt{M}$, where M is the total
number of variables). By using the rfcv function embedded
in the randomForest package [18, 28], the optimal mtry that
resulted in the smallest cross-validated error was chosen.
Consequently, the random forest was tuned using the
optimal value of the number of variables (mtry) [29]. A
final random forest model was generated by setting the
optimized variable of mtry when trees were grown.
Evaluation of the random forest using cross validation
The developed random forests for Classification trees #1,
#2 and #3 corresponding to each group in Table 3 were
evaluated. The predictive performance of each Classifica-
tion tree was evaluated with the total dataset by using
20-fold cross-validation [29]. This method assumes that a
random forest developed from 95 % (19/20) of a total
dataset is reasonably the same as a final random forest that
is developed using 100 % of the total dataset.
The 20-fold validation and analysis were performed as
follows:
(i) The total dataset was partitioned into 20 subsets.
(ii) A random forest was created using a training set
composed of 19 subsets. The developed random
forest then predicted the classification for samples in
the 20th subset as a test set. The predicted classifi-
cation (e.g., best algorithm for compound X in tissue
Y = Jansson et al. [5]) for the test set was recorded.
This step was repeated 20 times so that each subset
was used only once as a test set. As a result, each
compound was used once as a test compound.
(iii) For the test dataset including all compounds, each
compound was associated with a random forest
generated best prediction algorithm.
(iv) The rate of correct classification, per tissue, was
calculated (Eq. 16) and compared with random
permutation rates (Eq. 17).
$$\text{Rate of correct classification} = \frac{1}{n}\sum_{i=1}^{n} I(Exp_i = Pred_i) \qquad (16)$$
where Expi is the experimentally derived Kp value, Predi is
the predicted Kp value and n is the number of Kp values for
each tissue.
$$\text{Random permutation rate} = \frac{1}{\text{number of classes to be classified}} \qquad (17)$$
(v) Kp was calculated using the algorithm identified as
the most accurate during the cross-validation.
Using this method, the predictive performance of pre-
viously published algorithms was compared to the random-
forest-generated Kps for each tissue using the same dataset
as shown in Tables s.2 and s.3.
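
The cross-validation loop in steps (i) to (iv) can be outlined in Python as follows. The study used the randomForest package in R; here a placeholder learner that always predicts the most frequent training label stands in for the forest, so the sketch shows only the fold bookkeeping and the two rates (Eqs. 16 and 17). All names, labels and feature values are hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=20, seed=0):
    """Shuffle the sample indices and split them into k disjoint subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_labels(features, labels, fit, k=20, seed=0):
    """Predict every sample once with a model trained on the other k - 1 subsets."""
    predicted = [None] * len(labels)
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        for i in test_idx:
            predicted[i] = model(features[i])
    return predicted


def rate_of_correct_classification(true_labels, predicted_labels):
    """Eq. 16: fraction of samples whose predicted label matches the true label."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def random_permutation_rate(n_classes):
    """Eq. 17: expected accuracy when choosing among n_classes at random."""
    return 1 / n_classes


def fit_majority_class(train_features, train_labels):
    """Placeholder learner (NOT a random forest): always predicts the modal label."""
    modal = Counter(train_labels).most_common(1)[0][0]
    return lambda feature_row: modal


labels = ["Schmitt"] * 15 + ["Rodgers"] * 5  # hypothetical best-algorithm labels
features = [[0.0]] * len(labels)             # hypothetical inputs, unused by the placeholder
predicted = cross_validated_labels(features, labels, fit_majority_class)
rate = rate_of_correct_classification(labels, predicted)
baseline = random_permutation_rate(len(set(labels)))
```

Replacing `fit_majority_class` with a function that trains an actual classifier leaves the rest of the loop unchanged.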
Model evaluation: comparative prediction accuracy
The prediction accuracy of each Classification tree was
compared to each of the previously published algorithms
within its group (Table 3) to assess if any one previously
published algorithm performed better than the Classifica-
tion tree. Thus, using inputs required by the Classification
tree with all others estimated based on Table 1, the pre-
diction accuracy of the Classification tree (as defined from
the cross-validation step) was compared to the prediction
accuracy of each algorithm in the group. Prediction accu-
racy was based on a comparison of the predicted (ai) and
experimentally derived (bi) Kps for each algorithm. To
assess the overall precision of each algorithm, RMSE was
calculated (Eq. 15) as well as the overall percentage within
k-fold deviation (k = 1.25, 1.5, 2, 3) (Eq. 18).
$$\%\ \text{within}\ k\text{-fold error} = \frac{1}{n}\left[\sum_{i=1}^{n} I\left(\frac{1}{k} \le \frac{a_i}{b_i} \le k\right)\right] \times 100\ \% \qquad (18)$$
where I(·) is an indicator function, k = 1.25, 1.5, 2, 3.
Tissue-specific RMSE was also calculated to compare
the precision of the models with respect to the tissue. As a
measure of bias, AFE (Eq. 13) was calculated. The AAFE
(Eq. 14) was also calculated to quantify the overall mag-
nitude of the deviation between the predicted and the
experimentally derived Kp values.
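The distinction between bias (AFE) and magnitude of deviation (AAFE) can be sketched as follows. Equations 13 to 15 are not reproduced in this section, so the code assumes the commonly used definitions: AFE and AAFE as 10 raised to the mean and the mean absolute value, respectively, of log10(predicted/observed), and RMSE computed on the same log10 scale. These definitions, the function names and the two data points are assumptions of the example.

```python
import math
from typing import List, Sequence


def log10_ratios(predicted: Sequence[float], observed: Sequence[float]) -> List[float]:
    """log10(a_i / b_i) for each predicted/observed pair."""
    return [math.log10(a / b) for a, b in zip(predicted, observed)]


def afe(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average fold error (assumed definition): a measure of bias."""
    r = log10_ratios(predicted, observed)
    return 10 ** (sum(r) / len(r))


def aafe(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Absolute average fold error (assumed definition): magnitude of deviation."""
    r = log10_ratios(predicted, observed)
    return 10 ** (sum(abs(x) for x in r) / len(r))


def rmse_log10(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error on the log10 scale (assumed scale)."""
    r = log10_ratios(predicted, observed)
    return math.sqrt(sum(x * x for x in r) / len(r))


# Hypothetical pair: one twofold over- and one twofold under-prediction.
pred = [2.0, 0.5]
obs = [1.0, 1.0]
print(afe(pred, obs), aafe(pred, obs), rmse_log10(pred, obs))
```

In this toy case the over- and under-prediction cancel in the AFE, which stays at about 1, while the AAFE of about 2 reveals the twofold deviation. An AFE near 1 therefore indicates low bias, not high accuracy.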
Results
Dataset
The dataset was composed of a total of 122 compounds
with 852 Kps in 11 tissues (Tables s.2, s.3). The dataset
consisted of 29 acids, 70 bases (63 moderate to strong
bases with pKa ≥ 7.4 and 7 weak bases with pKa < 7.4),
12 neutral compounds and 11 zwitterionic compounds.
Kp calculations according to the previously published
algorithms
For Kp calculation according to Rodgers et al. [9], the
prediction accuracies based on the use of the previously
published estimation equation for KpuBC (Table 1, Eq. 4)
and the developed regression equation for B:P (Table 1,
Eq. 5) as used in Eq. 3 were compared. The use of the
developed regression equation resulted in a more accurate
prediction of Kps with lower tissue-specific RMSE values
(Table s.4). As a result, the developed regression equation
(Eq. 5) was used in all subsequent calculations.
With the use of estimated input parameters (e.g. B:P,
LogKvo:w), the Kps calculated using the algorithm of Rodgers
et al. [9, 10] resulted in an under-prediction with a 6 %
decrease (on average) in AFE value compared to the Kps
calculated by the author with the experimentally determined
parameters (Table s.5). Using estimated input parameters
(Table 1, Eqs. 2, 6, 7, 8), the Kps calculated using each
algorithm were in agreement with the Kps obtained by both Jansson
et al. [5] and Schmitt [11] (Tables s.6, s.7, respectively).

Table 4 Summary of random forest parameter and classification performance
| Tissue | Tree #1 n | Tree #1 mtry | Tree #1 rate of correct classification | Tree #1 % within twofold error | Tree #2 n | Tree #2 mtry | Tree #2 rate of correct classification | Tree #2 % within twofold error | Tree #3 n | Tree #3 mtry | Tree #3 rate of correct classification | Tree #3 % within twofold error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Adipose | 66 | 5 | 0.359 | 51.6 | 65 | 2 | 0.384 | 54.6 | 69 | 4 | 0.638 | 60.0 |
| Bone | 41 | 5 | 0.561 | 73.2 | 41 | 5 | 0.561 | 75.6 | 42 | 2 | 0.643 | 50.0 |
| Brain | 78 | 5 | 0.385 | 56.4 | 76 | 5 | 0.395 | 51.3 | 90 | 4 | 0.644 | 47.8 |
| Gut | 68 | 5 | 0.368 | 72.1 | 65 | 5 | 0.446 | 80.0 | 68 | 4 | 0.618 | 60.3 |
| Heart | 91 | 5 | 0.452 | 83.3 | 83 | 5 | 0.446 | 80.7 | 96 | 4 | 0.563 | 60.4 |
| Kidney | 89 | 5 | 0.341 | 73.9 | 86 | 5 | 0.386 | 69.8 | 94 | 4 | 0.684 | 55.3 |
| Liver | 84 | 5 | 0.243 | 64.2 | 84 | 5 | 0.429 | 63.1 | 88 | 4 | 0.693 | 51.1 |
| Lung | 93 | 5 | 0.312 | 67.8 | 85 | 5 | 0.365 | 64.7 | 95 | 2 | 0.589 | 56.8 |
| Muscle | 108 | 5 | 0.630 | 78.7 | 93 | 5 | 0.355 | 79.6 | 108 | 4 | 0.667 | 80.5 |
| Skin | 64 | 5 | 0.328 | 77.4 | 61 | 5 | 0.393 | 77.1 | 64 | 2 | 0.719 | 71.9 |
| Spleen | 36 | 5 | 0.583 | 61.1 | 33 | 2 | 0.424 | 63.6 | 36 | 4 | 0.528 | 58.3 |
Tree #1, #2 and #3 denote Classification trees #1, #2 and #3
Construction of predictive random forest models:
classification trees #1, #2 and #3
For each tissue, three Classification trees (Table 4) were
developed using the random forest method. The number of
samples and the chosen mtry are listed in Table 4. The clas-
sification performance of each Classification tree is indicated
by the rate of correct classification. Classification trees resul-
ted in a higher rate of correct classification than random per-
mutation rates of 1/6, 1/5 and 1/3 based on the probability of a
correct classification when there are n categories (1/n). The
prediction accuracy for each Classification tree was indicated
by the percentage of predicted values within a twofold devi-
ation of the experimentally derived Kps for each tissue
(Eq. 18). Based on Table 4, the relationship between the rate
of correct classification and the Kp prediction accuracy is
tissue specific. The rates of correct classification for
Classification trees #1 and #2 were lower than that of
Classification tree #3 because Classification tree #3 had only
two or three algorithms to classify, whereas Classification tree
#1 had five to six and Classification tree #2 had four to five (Table 3).
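The comparison with random permutation rates can be checked directly from the per-tissue rates of correct classification reported in Table 4. The sketch below compares the lowest rate of each Classification tree with the chance-level rate for both the largest and the smallest number of candidate algorithms stated above. The variable names are hypothetical, and the use of the smallest class count as the stricter benchmark is a choice made for this example.

```python
# Rates of correct classification per tissue, transcribed from Table 4
# (order: adipose, bone, brain, gut, heart, kidney, liver, lung, muscle,
# skin, spleen).
table4_rates = {
    "tree1": [0.359, 0.561, 0.385, 0.368, 0.452, 0.341, 0.243, 0.312, 0.630, 0.328, 0.583],
    "tree2": [0.384, 0.561, 0.395, 0.446, 0.446, 0.386, 0.429, 0.365, 0.355, 0.393, 0.424],
    "tree3": [0.638, 0.643, 0.644, 0.618, 0.563, 0.684, 0.693, 0.589, 0.667, 0.719, 0.528],
}

# (fewest, most) candidate algorithms per tree, as stated in the text.
class_counts = {"tree1": (5, 6), "tree2": (4, 5), "tree3": (2, 3)}

results = {}
for tree, rates in table4_rates.items():
    fewest, most = class_counts[tree]
    lowest = min(rates)
    results[tree] = {
        "lowest_rate": lowest,
        "chance_most_classes": 1.0 / most,      # e.g. 1/6 for tree #1
        "chance_fewest_classes": 1.0 / fewest,  # stricter benchmark
        "above_strict_chance": lowest > 1.0 / fewest,
    }

for tree, summary in results.items():
    print(tree, summary)
```

Even the lowest per-tissue rate of each tree (0.243, 0.355 and 0.528) exceeds the stricter chance level (1/5, 1/4 and 1/2, respectively), although the margin for Classification tree #3 is small.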
Comparative assessment of Kp prediction accuracy
of Classification trees and published equations
Comparison of prediction accuracy of Classification tree #1
and published equations
To assess whether Classification tree #1 offered improved
predictive performance over any one relevant published
algorithm alone, the tissue AFE, AAFE and RMSE were
calculated using the same dataset (Tables s.2, s.3). A plot of
percentage within k-fold deviation from experimentally
derived values showed that predictions based on Classification
tree #1 performed well, with 25.6, 49.7 and 68.8 % falling
within 1.25-, 1.5- and 2-fold deviation from the experimentally
derived Kp values, respectively (Fig. 1). The global RMSEs of
the algorithms in Group 1 indicate that the Kp prediction errors
are similar for Jansson et al. [5], Rodgers et al. [9, 10] and
Classification tree #1 (Table 5). However, Rodgers et al. [9,
10] and Classification tree #1 tended to under-predict Kp, with
AFE values of 0.89 and 0.94, respectively. Rodgers et al. [9,
10] under-predicted Kps in bone, kidney and liver. Jansson
et al. [5] showed the smallest RMSE value of 0.43 but appeared
to over-predict Kp with an AFE of 1.27 (Fig. 1; Table 5). The
over-prediction of Kps by Jansson et al. [5] was observed in
kidney, liver and adipose tissue. The overall bias of deviation
between the experimentally derived Kps and those estimated
using Classification tree #1 was the smallest in Group 1, with
an AFE value of 0.94 (Table 5). This result is further supported
by the tissue-specific box and whisker plots (Figure s.1 in
supplementary material), where the boxes for Classification
tree #1 are small and centered around zero and do not show
evidence of serious under- or over-prediction. Tissue-specific
RMSEs showed that the Kp prediction of Jansson et al. [5]
resulted in the smallest error for 7 out of 10 tissues in Group 1
(Table s.8). It was observed that Berezhkovskiy [3], Schmitt
[11] and Bjorkman’s models [4] tended to over-predict Kps
with an AFE value larger than 1 (Table 5). On the other hand,
Rodgers et al. [9, 10] and Poulin and Theil’s [8] models tended
to under-predict the Kps with an AFE value < 1.
Fig. 1 Percentage of predicted Kps from each algorithm in Group 1 within k-fold error (x-axis, k = 1.00 to 3.00) of the experimentally derived Kps (y-axis, percentage within k-fold error, 0 to 100). Series: Berezhkovskiy, Rodgers et al., Schmitt, Jansson et al., Bjorkman, Poulin and Theil, Classification tree #1

Comparison of prediction accuracy of Classification tree
#2 and published equations
To compare the predictive performance of the published
algorithms and Classification tree #2, tissue AFE, AAFE,
and RMSE were calculated. Both Classification tree #2 and
Yun and Edginton [16] yielded more accurate Kp predictions
with higher percentages within k-fold deviation from the
experimentally derived Kps (k = 1.25–3) compared to other
algorithms (Fig. 2). The prediction performances of both
Classification tree #2 and Yun and Edginton’s algorithm [16]
were very similar with almost the same AFE, AAFE, global
RMSE and tissue-specific RMSE values (Table 5, Table
s.9). The plot of percentage within k-fold deviation from
experimentally derived values showed that Kp prediction
based on Classification tree #2 performed well, with 31.9 and
50.4 % falling within 1.25- and 1.5-fold deviation from
the experimentally derived Kps, respectively (Fig. 2). In 7
out of 11 tissues, the Yun and Edginton algorithm [16]
resulted in the smallest error associated with Kp estimates
(Table s.9). Jansson et al. [5] showed an over-prediction of
Kps, which was mainly due to the over-prediction in adipose
and liver Kps (Figure s.2). Schmitt’s algorithm [11] tended to
over-predict Kps (Table 5) especially in adipose, brain, heart
and skin (Figure s.2). Although Berezhkovskiy’s [3]
algorithm resulted in an AFE value close to 1 (1.02), its
AAFE value was 2.92 (Table 5). This finding implies that Kp
predictions were less accurate and that there were similar
degrees of under- and over-prediction of Kps. The box and whisker
plots (Figure s.2) show that there was over-prediction in
brain and adipose tissue Kps and an under-prediction in gut
and lung Kps.

Table 5 Summary of overall prediction performance for Group 1, 2 and 3
Group 1
| Metric | Berezhkovskiy [3] | Rodgers et al. [9, 10] | Schmitt [11] | Jansson et al. [5] | Bjorkman [4] | Poulin and Theil [8] | Classification tree #1 |
|---|---|---|---|---|---|---|---|
| AFE | 1.14 | 0.89 | 1.37 | 1.27 | 1.52 | 0.16 | 0.94 |
| AAFE | 3.21 | 2.34 | 3.36 | 1.98 | 2.81 | 8.34 | 2.00 |
| RMSE | 0.67 | 0.51 | 0.66 | 0.43 | 0.62 | 1.25 | 0.49 |
Group 2
| Metric | Berezhkovskiy [3] | Rodgers et al. [9, 10] | Schmitt [11] | Jansson et al. [5] | Yun and Edginton [16] | Classification tree #2 |
|---|---|---|---|---|---|---|
| AFE | 1.02 | 0.93 | 1.28 | 1.21 | 1.01 | 1.03 |
| AAFE | 2.92 | 2.20 | 3.20 | 2.06 | 1.78 | 1.82 |
| RMSE | 0.60 | 0.45 | 0.64 | 0.45 | 0.36 | 0.37 |
Group 3
| Metric | Berezhkovskiy [3] | Rodgers et al. [9, 10] | Schmitt [11] | Classification tree #3 |
|---|---|---|---|---|
| AFE | 1.16 | 0.91 | 1.37 | 0.95 |
| AAFE | 3.18 | 2.33 | 3.27 | 2.14 |
| RMSE | 0.66 | 0.52 | 0.65 | 0.45 |
AFE average fold error, AAFE absolute average fold error, RMSE root mean square error

Fig. 2 Percentage of predicted Kps from each algorithm in Group 2 within k-fold error (x-axis, k = 1.00 to 3.00) of the experimentally derived Kps (y-axis, percentage within k-fold error, 0 to 100). Series: Berezhkovskiy, Rodgers et al., Schmitt, Jansson et al., Yun and Edginton, Classification tree #2
Comparison of prediction accuracy of Classification tree
#3 and published equations
To compare the predictive performance of the published
algorithms and Classification tree #3, tissue AFE, AAFE
and RMSE were calculated. Classification tree #3 resulted
in accurate predictions in Group 3, with the highest per-
centages within k-fold deviation from experimentally
derived Kps (Fig. 3), the smallest global RMSE and AAFE
and an AFE closest to 1 (Table 5). In 9 out of 11 tissues,
Classification tree #3 resulted in the smallest tissue-specific
RMSEs (Table s.10). The Berezhkovskiy [3] and Schmitt
[11] algorithms were less accurate with an AAFE > 3, and
both had a tendency to over-predict the Kps (Table 5).
Rodgers et al. [9, 10] under-predicted the Kps especially in
bone, kidneys, liver and lungs (Figure s.3).
The global RMSE, AFE, and AAFE values for Clas-
sification trees #1, #2 and #3 were comparable. However,
in the case of Classification tree #3, the percentage within
k-fold deviation from the experimentally derived Kps was
lower than those of Classification trees #1 and #2.
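The group-wise comparisons above can be reproduced from the values in Table 5. The following sketch identifies, for each group, the entry with the AFE closest to 1 and the entries with the lowest AAFE and RMSE. Measuring closeness of the AFE to 1 on the log10 scale is an assumption of this example, and the variable and function names are hypothetical.

```python
import math

# (AFE, AAFE, RMSE) per algorithm, transcribed from Table 5.
table5 = {
    "Group 1": {
        "Berezhkovskiy": (1.14, 3.21, 0.67),
        "Rodgers et al.": (0.89, 2.34, 0.51),
        "Schmitt": (1.37, 3.36, 0.66),
        "Jansson et al.": (1.27, 1.98, 0.43),
        "Bjorkman": (1.52, 2.81, 0.62),
        "Poulin and Theil": (0.16, 8.34, 1.25),
        "Classification tree #1": (0.94, 2.00, 0.49),
    },
    "Group 2": {
        "Berezhkovskiy": (1.02, 2.92, 0.60),
        "Rodgers et al.": (0.93, 2.20, 0.45),
        "Schmitt": (1.28, 3.20, 0.64),
        "Jansson et al.": (1.21, 2.06, 0.45),
        "Yun and Edginton": (1.01, 1.78, 0.36),
        "Classification tree #2": (1.03, 1.82, 0.37),
    },
    "Group 3": {
        "Berezhkovskiy": (1.16, 3.18, 0.66),
        "Rodgers et al.": (0.91, 2.33, 0.52),
        "Schmitt": (1.37, 3.27, 0.65),
        "Classification tree #3": (0.95, 2.14, 0.45),
    },
}


def best_by_metric(group):
    """Return (least biased, lowest AAFE, lowest RMSE) entries of one group."""
    least_biased = min(group, key=lambda name: abs(math.log10(group[name][0])))
    lowest_aafe = min(group, key=lambda name: group[name][1])
    lowest_rmse = min(group, key=lambda name: group[name][2])
    return least_biased, lowest_aafe, lowest_rmse


summary = {name: best_by_metric(group) for name, group in table5.items()}
for name, best in summary.items():
    print(name, best)
```

Under these assumptions, Classification tree #1 is the least biased entry in Group 1 while Jansson et al. has the lowest AAFE and RMSE; in Group 2 the Yun and Edginton algorithm leads on all three metrics with Classification tree #2 close behind; and in Group 3 Classification tree #3 leads on all three metrics.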
Discussion
One study objective was to develop a tool for Kp prediction
when only a limited number of input parameters are
available. For each tissue, Classification trees #1, #2 and
#3, which depended on user-supplied input parameters (i.e.,
LogP, pKa, fup, Vss and muscle Kp) as well as estimated
input parameters that were required but not deemed readily
available, were constructed. An assessment of the validity
of using estimation equations as a replacement for handling
generally unavailable input parameters was made. Rodgers
et al. [9, 10] demonstrated that the use of experimentally
determined inputs such as B:P and LogKvo:w resulted in
more accurate Kp predictions with lower tissue-specific
RMSEs when compared to Kps calculated using estimated
inputs (Tables s.1–3 in supplementary material). However,
the use of estimated inputs for the Rodgers et al. [9, 10]
algorithm resulted in Kpus that were comparable, although
not superior to, Kpus calculated using experimentally
determined inputs. The accuracy metrics such as tissue-
specific RMSEs and AFEs were comparable (Table s.5).
Similarly, in Jansson et al.’s algorithm [5], the use of an
experimentally determined muscle Kp resulted in more
accurate predictions in the heart, kidney, liver and lung
when compared to the prediction accuracy of Jansson et al.
[5], which used a muscle Kp that was estimated from Vss
(Table s.6). As a result, Jansson et al.’s algorithm [5] was
selected as the best-predicting algorithm in Classification
tree #1, which used muscle Kp as an input, more often than
in Classification tree #2, which used Vss as an input.
Overall, Kp predictions with the estimated input parameters
were deemed to agree sufficiently with those based on
experimentally determined inputs to justify their use
as inputs to the Classification trees.
Fig. 3 Percentage of predicted Kps from each algorithm in Group 3 within k-fold error (x-axis, k = 1.00 to 3.00) of the experimentally derived Kps (y-axis, percentage within k-fold error, 0 to 100). Series: Berezhkovskiy, Rodgers et al., Schmitt, Classification tree #3

Kp prediction via a Classification tree depends on two
important factors. The first factor is the accuracy of each
Kp prediction algorithm in each group (e.g., Rodgers et al.
[9, 10], Jansson et al. [5]), and the second factor is the
classification performance of a classifier (i.e., a random
forest). Although poor prediction of Kps and/or poor
classification by a classifier can lead to an undesirable
outcome, there is no clear relationship between the accu-
racy of a Kp prediction method and classification perfor-
mance. A higher rate of correct classification will not
always result in an overall lower RMSE. Even when the
best-performing algorithm is correctly selected for certain
compounds, it is the degree of error from the incorrectly
classified compounds that contributes most to the RMSE.
This result was observed because the Kp predicted by an
algorithm that was selected by the random forest can
deviate substantially from the corresponding experimentally
derived Kp (Table 4). Thus, the interplay of these two
factors should be taken into consideration in the interpre-
tation of Kp prediction via Classification trees #1, #2
and #3.
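The decoupling of classification performance and RMSE can be illustrated with a small, entirely hypothetical example. Two invented algorithms, A and B, predict Kp for four compounds; selector X picks the best algorithm for three of the four compounds but makes one costly mistake, whereas selector Y picks the best algorithm only once but its mistakes are small. The RMSE is computed on the log10 scale, which is an assumption of this sketch, and none of the numbers come from the study.

```python
import math

# Hypothetical observed Kps and predictions from two invented algorithms.
observed = [1.0, 1.0, 1.0, 1.0]
predictions = {
    "A": [1.0, 1.0, 1.1, 1.0],
    "B": [1.2, 1.2, 1.0, 100.0],
}


def log_error(pred: float, obs: float) -> float:
    """Absolute log10 deviation between a predicted and an observed Kp."""
    return abs(math.log10(pred / obs))


# Best-performing algorithm per compound (smallest absolute log10 error).
best = [
    min(predictions, key=lambda name: log_error(predictions[name][i], observed[i]))
    for i in range(len(observed))
]


def evaluate(choices):
    """Return (rate of correct classification, log10 RMSE) for a selection."""
    rate = sum(c == b for c, b in zip(choices, best)) / len(best)
    errors = [log_error(predictions[c][i], observed[i]) for i, c in enumerate(choices)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return rate, rmse


rate_x, rmse_x = evaluate(["A", "A", "B", "B"])  # one large misclassification error
rate_y, rmse_y = evaluate(["B", "B", "A", "A"])  # several small errors
print(rate_x, rmse_x)
print(rate_y, rmse_y)
```

Selector X has the higher rate of correct classification (0.75 versus 0.25) but also the higher RMSE, because its single misclassification selects a prediction that is 100-fold off.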
When experimentally determined muscle Kp and phys-
icochemical parameters (e.g., LogP, pKa, and fup) are
available, six Kp prediction algorithms can be used,
namely, the algorithms used in Classification tree #1. It was
observed that the use of Classification tree #1 improved the
Kp prediction accuracy and reduced bias relative to the six
prediction algorithms (Table 5; Fig. 1). The exception was
Jansson et al. [5], the algorithm that requires both
physico-chemical and in vivo inputs, which had similar
accuracy metrics but a higher bias.
Both the Yun and Edginton algorithm [16] and Classi-
fication tree #2 had a high Kp prediction accuracy with a
high percentage within k-fold deviation from the experi-
mentally derived Kps. Notably, both the Jansson et al. [5]
and Yun and Edginton [16] models that used Vss showed
high accuracy and precision in Kp prediction. This result
further implies that the availability of the in vivo parameter
Vss and the use of these correlation models improve Kp
prediction accuracy over TCB algorithms.
TCB models [3, 9–11] require a minimal number of
input parameters such as ex vivo fup and physicochemical
parameters. Classification tree #3 identified the best-pre-
dicting model based on basic inputs (pKa, fup, LogP) and
improved the Kp prediction accuracy over any one TCB
prediction algorithm alone. It is expected that Classifica-
tion tree #3 will be the most applicable in early drug dis-
covery when compared to Classification trees #1 and #2
because the use of the Classification trees #1 and #2 is
limited by the availability of an in vivo parameter (i.e.,
muscle Kp or Vss).
Overall, Classification trees exhibited better
prediction performance in most tissues with little bias
toward over- or under-prediction. According to the plots of
the percentage of predicted Kps within 1.25- and 1.5-fold
deviations from the experimentally derived Kps, Classifi-
cation trees #1, #2 and #3 showed higher percentages than
the other algorithms in each group (Figs. 1, 2, 3). Based on
these results, it can be concluded that Classification trees
offer advantages over the use of any single algorithm to
predict tissue-specific Kps for a compound. Further, on
comparison of the performance of algorithms from the
trees, algorithms that combine both physico-chemical
inputs and in vivo inputs perform better than TCB models
and, as a result, Classification trees #1 and #2 performed
better (greater percentage within k-fold error) than Clas-
sification tree #3, which only incorporated physico-chem-
ical inputs. Correlation-based models depend on the dataset
that is used in their derivation. A correlation model may
perform better if the chemical properties of a new com-
pound are similar to those used to develop the corre-
sponding regression equations. However, this is only true if
the chemical properties are the only determinants for the
tissue distribution of the compound. In the case in which
the chemical properties of the new drug are not similar to
the chemical properties that were used for the development
of the regression equations, a TCB model may perform
better than a correlation model because a TCB model is not
empirical but mechanistic. Therefore, the performance of
Kp prediction algorithms should be evaluated using an
external dataset not used for the development of the cor-
relation model because the prediction performance of a
regression-based algorithm could be artificial depending on
the dataset. Recently, researchers compared the predictive
performance of Kp algorithms using Vss as an outcome.
Using an independent dataset [17], it was observed that a
correlation model (i.e., Jansson et al. [5]) exhibited better
Kp prediction performance than a TCB model (i.e., Rod-
gers et al. [9, 10]). However, the TCB models do have an
advantage in that they are applicable for any species if the
tissue-specific physiological parameters are available [9].
The accuracy of the TCB method depends on how well
the factors describing the underlying process in tissue
distribution (e.g., compound binding affinity to cell con-
stituents) are formulated. Unreasonable formulation in the
structure or uncertainty in physiological and/or chemical
parameter values can lead to poor prediction of Kp. The
underlying mechanism of a Kp prediction algorithm may
not hold for a compound under certain physicochemical
conditions. For example, a different approach was needed
to overcome the poor Kp prediction accuracy for highly
lipophilic compounds. It is known that the high lipophil-
icity of a compound is associated with a large tissue dis-
tribution (i.e., large Kp, large Vss). Rodgers et al. [30]
demonstrated that Vss increases exponentially when LogP
increases above a LogP of 6. In terms of the currently
available algorithms (e.g., Jansson et al. [5], Rodgers et al.
[9, 10], Yun and Edginton [16]), all equations are designed
such that an increase in lipophilicity leads to an increase in
Kp values. Above a certain LogP value, however, this
relationship between distributional parameters and LogP
may not hold true because Kp and/or Vss may reach a
plateau [31, 32]. Therefore, in Poulin and Haddad’s simplified
model [32] for highly lipophilic compounds
(LogP > 6), regardless of a compound’s acid–base–neutral
properties, compound partitioning into neutral lipids is
prevalent [32] and the plateau concept holds true. In the
present study, LogP values ranged from -3 to 6. Thus, the
algorithms included in the Classification trees are not
appropriate for use with compounds for which LogP > 6,
and user caution is recommended in the Kp prediction
of highly lipophilic compounds.
Because drug compounds tend to have LogP values that are
< 6, this is not expected to affect the accuracy of Kp prediction
for small drug molecules. For environmental contaminants,
however, LogP values often exceed 6 and the
use of certain algorithms will over-predict Kps.
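In practice, this caution can be encoded as a simple applicability check performed before any Kp prediction. The sketch below flags compounds whose LogP lies outside the range covered by the present dataset (-3 to 6). The constant names, function names and message wording are hypothetical and are not part of the published tool.

```python
LOGP_MIN = -3.0  # lower bound of LogP values in the study dataset
LOGP_MAX = 6.0   # upper bound of LogP values in the study dataset


def logp_within_study_range(logp: float) -> bool:
    """True if LogP lies within the range covered by the study dataset."""
    return LOGP_MIN <= logp <= LOGP_MAX


def applicability_note(logp: float) -> str:
    """Return a short, human-readable note on applicability for a LogP value."""
    if logp_within_study_range(logp):
        return "within the LogP range of the study dataset"
    if logp > LOGP_MAX:
        return "highly lipophilic (LogP > 6): interpret predicted Kps with caution"
    return "below the LogP range of the study dataset"


for value in (2.5, 6.0, 7.2, -4.0):
    print(value, applicability_note(value))
```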
In the presence of metabolic elimination and/or transport
carriers, there would be a discrepancy between true and
estimated Kp values under the assumption of no elimina-
tion or carrier-mediated tissue partitioning. The empirical
model for estimating Kps is highly dependent on the
dataset used. If a dataset is composed of numerous com-
pounds for which tissue distribution is affected by elimi-
nation or active transport, observations in the dataset can
be influential in determining the coefficient of an equation,
which can lead to the poor Kp prediction of a new obser-
vation. The relationship between in vivo parameters, the
chemical properties of a compound and tissue Kps is not
currently robust enough to describe tissue partitioning in
the presence of these processes. Thus, user discretion is
recommended in using Kp prediction algorithms for com-
pounds that are significantly affected by metabolic elimi-
nation or carrier-mediated transport. With that being said,
when Kps are used within a PBPK model framework, this
passive diffusion Kp is the desired parameter value. The
effect of extensive metabolism in an eliminating organ or
the effect of transporters in tissue distribution is taken into
account, not through Kp but through the incorporation of
enzymes or transporters.
One of the limitations of classification-tree-based Kp
prediction is that it is mathematically complex. To over-
come this problem, Classification trees #1, #2 and #3 are
available as a web-based program for public consumption
(http://spark.rstudio.com/kprftree/myapp/). This program
features a Classification tree calculator that defines the
best-predicting algorithm as well as a Kp calculator that
calculates Kp values using the best-predicting algorithm.
In conclusion, this study proposed novel Classification
trees for predicting the best-performing Kp prediction
algorithm as a function of tissue and compound. Classifi-
cation-tree-based Kp prediction overcomes the limitations
of any one algorithm by harnessing the best components of
each and by providing Kp prediction for 11 tissues, some of
which are not included in all algorithms (e.g. the Jansson
model does not provide an equation for spleen). The
Classification trees, especially those relying solely on
physico-chemical inputs, had better prediction performance
than any one algorithm within the group. Further,
Classification trees #1 and #2 performed better than
Classification tree #3, suggesting that researchers with any relevant
in vivo inputs should use them unless the compound has
features that are expected to be vastly different from those
compounds used in the development of these Classification
trees (see Table s.2). In these cases, Classification tree #3
and/or a TCB model should also be consulted. It is hoped
that an increased prediction performance of Kps will lead
to more appropriate parameterization of PBPK models and
will enhance the predictability of a compound’s
pharmacokinetics.
References
1. Poulin P, Theil FP (2000) A priori prediction of tissue:plasma
partition coefficients of drugs to facilitate the use of physiologi-
cally-based pharmacokinetic models in drug discovery. J Pharm
Sci 89:16–35
2. Peters SA (2012) Pharmacokinetic principles, in physiologically-
based pharmacokinetic (PBPK) modeling and simulations: prin-
ciples, methods, and applications in the pharmaceutical industry.
Wiley, Hoboken
3. Berezhkovskiy LM (2004) Volume of distribution at steady state
for a linear pharmacokinetic system with peripheral elimination.
J Pharm Sci 93(6):1628–1640
4. Bjorkman S (2002) Prediction of the volume of distribution of a
drug: which tissue–plasma partition coefficients are needed?
J Pharm Pharmacol 54(9):1237–1245
5. Jansson R, Bredberg U, Ashton M (2008) Prediction of drug
tissue to plasma concentration ratios using a measured volume of
distribution in combination with lipophilicity. J Pharm Sci
97(6):2324–2339
6. Peyret T, Poulin P, Krishnan K (2010) A unified algorithm for pre-
dicting partition coefficients for PBPK modeling of drugs and
environmental chemicals. Toxicol Appl Pharmacol 249(3):197–207
7. Poulin P, Schoenlein K, Theil FP (2001) Prediction of adipose
tissue: plasma partition coefficients for structurally unrelated
drugs. J Pharm Sci 90(4):436–447
8. Poulin P, Theil FP (2009) Development of a novel method for
predicting human volume of distribution at steady-state of basic
drugs and comparative assessment with existing methods.
J Pharm Sci 98(12):4941–4961
9. Rodgers T, Leahy D, Rowland M (2005) Physiologically based
pharmacokinetic modeling 1: predicting the tissue distribution of
moderate-to-strong bases. J Pharm Sci 94(6):1259–1276
10. Rodgers T, Rowland M (2006) Physiologically based pharmaco-
kinetic modelling 2: predicting the tissue distribution of acids, very
weak bases, neutrals and zwitterions. J Pharm Sci 95(6):1238–1257
11. Schmitt W (2008) General approach for the calculation of tissue
to plasma partition coefficients. Toxicol In Vitro 22(2):457–467
12. Poulin P, Krishnan K (1995) A biologically-based algorithm for
predicting human tissue: blood partition coefficients of organic
chemicals. Hum Exp Toxicol 14(3):273–280
13. Poulin P, Theil FP (2002) Prediction of pharmacokinetics prior to
in vivo studies. 1. Mechanism-based prediction of volume of
distribution. J Pharm Sci 91(1):129–156
14. Poulin P, Krishnan K (1996) A tissue composition-based algo-
rithm for predicting tissue:air partition coefficients of organic
chemicals. Toxicol Appl Pharmacol 136(1):126–130
15. Poulin P, Ekins S, Theil FP (2011) A hybrid approach to
advancing quantitative prediction of tissue distribution of basic
drugs in human. Toxicol Appl Pharmacol 250(2):194–212
16. Yun YE, Edginton AN (2013) Correlation-based prediction of
tissue-to-plasma partition coefficients using readily available
input parameters. Xenobiotica 43(10):839–852
17. Jones RD, Jones HM, Rowland M, Gibson CR, Yates JW, Chien
JY et al (2011) PhRMA CPCDC initiative on predictive models
of human pharmacokinetics, part 2: comparative assessment of
prediction methods of human volume of distribution. J Pharm Sci
100(10):4074–4089
18. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
19. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classifi-
cation and regression trees. Wadsworth International Group,
Belmont
20. Martin A, Bustamante P, Chun AHC (1993) Physical pharmacy:
physical chemical principles in the pharmaceutical sciences. Lea
& Febiger, Philadelphia, pp 297–298
21. Zhang H (2005) A new approach for the tissue-blood partition
coefficients of neutral and ionized compounds. J Chem Inf Model
45(1):121–127
22. Hinderling PH (1997) Red blood cells: a neglected compartment
in pharmacokinetics and pharmacodynamics. Pharmacol Rev
49(3):279–295
23. Kurz H, Fichtl B (1983) Binding of drugs to tissues. Drug Metab
Rev 14(3):467–510
24. Arundel P (1997) A multi-compartmental model generally
applicable to physiologically-based pharmacokinetics. 3rd IFAC
Symposium: Modelling and control in biomedical systems; 1997
23–26 March; University of Warwick, Coventry UK: AstraZen-
eca, London, UK
25. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey
A et al (2012) ChEMBL: a large-scale bioactivity database for
drug discovery. Nucleic Acids Res 40:D1100–D1107
26. Rowland M, Tozer T (2011) Clinical pharmacokinetics and
pharmacodynamics: concepts and applications, 4th edn. Wolters
Kluwer Health/Lippincott Williams & Wilkins, Philadelphia
27. Paixao P, Gouveia LF, Morais JA (2009) Prediction of drug
distribution within blood. Eur J Pharm Sci 36(2):544–554
28. R Development Core Team (2008) R: a language and environment
for statistical computing. R foundation for statistical computing,
Vienna, Austria, ISBN 3-900051-07-0. http://www.R-project.org
29. Svetnik V, Liaw A, Tong C, Wang T (2004) Application of Brei-
man’s random forest to modeling structure-activity relationships of
pharmaceutical molecules. In: Roli F, Kittler J, Windeatt T (eds)
Multiple classifier systems, fifth international workshop, MCS 2004,
Proceedings, Cagliari, Italy, 9-11 June 2004. Lecture Notes in
Computer Science, vol 3077. Springer, Berlin pp 334–343
30. Rodgers T, Rowland M (2007) Mechanistic approaches to vol-
ume of distribution predictions: understanding the processes.
Pharm Res 24(5):918–933
31. Haddad S, Poulin P, Krishnan K (2000) Relative lipid content as
the sole mechanistic determinant of the adipose tissue: blood
partition coefficients of highly lipophilic organic chemicals.
Chemosphere 40(8):839–843
32. Poulin P, Haddad S (2012) Advancing prediction of tissue dis-
tribution and volume of distribution of highly lipophilic com-
pounds from a simplified tissue-composition-based model as a
mechanistic animal alternative method. J Pharm Sci 101(6):
2250–2261
33. Leo A, Hansch C, Elkins D (1971) Partition coefficients and their
uses. Chem Rev 71(6):525–616