Data Integration in Early Drug Development Phase
Adetayo Kasim
The QSTAR Modeling Framework
Outline
• Background and Data Structure
• Gene X Bioassay X Fingerprint Analysis
• Co-clustering framework
• Biclustering Framework
• QSTAR Consortium
DRUG DISCOVERY STAGES
[Figure: Drug Discovery Phases]
DRUG DISCOVERY STAGES
• The traditional drug discovery approach relies mostly on chemical properties at the early stage
• Potential side effects are often identified only in later toxicity studies
• The success rate in Phase III trials is less than 25%
• Early identification of potential side effects may prevent expensive Phase II and Phase III trials
• Early toxicity detection can be facilitated by incorporating relevant biological data in the early stages of drug development, particularly in lead optimisation
QSTAR: FRAMEWORK
[Figure: QSTAR framework linking chemical structure (Z), gene expression (X) and bioactivity (Y)]
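QSTAR: Data Structures
$$
Z = \begin{pmatrix}
z_{11} & z_{12} & \cdots & z_{1C} \\
z_{21} & z_{22} & \cdots & z_{2C} \\
\vdots & \vdots & \ddots & \vdots \\
z_{M1} & z_{M2} & \cdots & z_{MC}
\end{pmatrix}, \quad
Y = \begin{pmatrix}
y_{11} & y_{12} & \cdots & y_{1C} \\
y_{21} & y_{22} & \cdots & y_{2C} \\
\vdots & \vdots & \ddots & \vdots \\
y_{N1} & y_{N2} & \cdots & y_{NC}
\end{pmatrix}, \quad
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1C} \\
x_{21} & x_{22} & \cdots & x_{2C} \\
\vdots & \vdots & \ddots & \vdots \\
x_{G1} & x_{G2} & \cdots & x_{GC}
\end{pmatrix}
$$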
QSTAR: Gene X Bioassay X Fingerprint Analysis
• To find a group of genes that are jointly associated with chemical structure
and bioactivity data.
• To find a group of genes that are associated with a chemical structure, but
that are conditionally independent of bioactivity data.
[Figure: bioactivity plotted against gene expression; legend: Present]
QSTAR: Joint Modelling
• Significant association between gene expression and fingerprint
• Significant association between bioactivity and fingerprint
• Significant association between gene expression and bioactivity data
QSTAR: Joint Modelling
[Figure: bioactivity plotted against gene expression; legend: Absent, Present]
• Significant association between gene expression and fingerprint
• Significant association between bioactivity and fingerprint
• No association between gene expression and bioactivity data
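QSTAR: Joint Modelling
$$
\begin{pmatrix} X_j \\ Y_i \end{pmatrix}
\sim N\left(
\begin{pmatrix} \mu_j + \alpha_{jk} Z_k \\ \mu_i + \beta_{ik} Z_k \end{pmatrix},
\; \Sigma_{ji}
\right)
$$
• 𝛽𝑖𝑘 denotes the effect of the kth fingerprint on the ith bioassay
• 𝛼𝑗𝑘 denotes the effect of the kth fingerprint on the jth gene
Where,
$$
\Sigma_{ji} = \begin{pmatrix}
\sigma^2_{X_j} & \sigma_{X_j Y_i} \\
\sigma_{Y_i X_j} & \sigma^2_{Y_i}
\end{pmatrix}
$$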
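The joint model on this slide can be illustrated for a single gene, bioassay and fingerprint feature. The sketch below is not the QSTAR implementation: it assumes simulated data, estimates the two fingerprint effects by ordinary least squares (which coincides with maximum likelihood for the mean parameters when both responses share the same covariate), and uses the correlation of the residuals as the fingerprint-adjusted association between gene expression and bioactivity. All function and variable names are hypothetical.

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def fit_line(z, v):
    """Least-squares intercept and slope of v on a single predictor z."""
    zb, vb = mean(z), mean(v)
    szz = sum((a - zb) ** 2 for a in z)
    slope = sum((a - zb) * (b - vb) for a, b in zip(z, v)) / szz
    return vb - slope * zb, slope


def residuals(z, v):
    """Residuals of v after removing the fitted effect of z."""
    intercept, slope = fit_line(z, v)
    return [vi - (intercept + slope * zi) for zi, vi in zip(z, v)]


def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    ub, vb = mean(u), mean(v)
    suv = sum((a - ub) * (b - vb) for a, b in zip(u, v))
    suu = sum((a - ub) ** 2 for a in u)
    svv = sum((b - vb) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def joint_summary(z, x, y):
    """Fingerprint effects on X and Y plus unadjusted and adjusted X-Y association."""
    _, alpha = fit_line(z, x)  # effect of the fingerprint on the gene
    _, beta = fit_line(z, y)   # effect of the fingerprint on the bioassay
    return {
        "alpha": alpha,
        "beta": beta,
        "unadjusted_corr": corr(x, y),
        "adjusted_corr": corr(residuals(z, x), residuals(z, y)),
    }


# Simulated compounds (assumption): X and Y both depend on Z but are
# conditionally independent given Z.
rng = random.Random(0)
z = [i % 2 for i in range(200)]
x = [2.0 * zi + rng.gauss(0, 1) for zi in z]
y = [1.5 * zi + rng.gauss(0, 1) for zi in z]
summary = joint_summary(z, x, y)
```

In this simulated setting the unadjusted correlation is clearly positive while the adjusted correlation is close to zero, which mirrors the case of a gene associated with the fingerprint but conditionally independent of bioactivity.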
QSTAR: Joint Modelling
• Examples of genes with significant association with
fingerprint and bioactivity data
[Figure: bioactivity (Y) plotted against gene expression (X) for gene FNIP1]
QSTAR: Joint Modelling
• Examples of genes with significant association with fingerprint,
but conditionally independent of bioactivity
[Figure: bioactivity (Y) plotted against gene expression (X) for gene TXNRD1]
QSTAR: Path Analysis
• To find genes with direct and indirect effect of chemical structures on
bioactivity data
• To find genes with only direct effect of chemical structure
QSTAR: Path Analysis
$$X_j = \vartheta_{xyj} Y_i + \vartheta_{zxj} Z_k + \varepsilon_{1j}$$
$$Y_i = \vartheta_{zyj} Z_k + \varepsilon_{2j}$$
• 𝜗𝑧𝑦𝑗 is the total effect of the kth fingerprint on the ith bioassay
• 𝜗𝑧𝑥𝑗 is the direct effect of the fingerprint on the jth gene
• 𝜔𝑧𝑥𝑗 = 𝜗𝑥𝑦𝑗*𝜗𝑧𝑦𝑗 is the indirect effect of the fingerprint on gene expression via the bioactivity data
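The direct and indirect effects defined on this slide can be computed from two least-squares fits. The sketch below is an illustration under stated assumptions rather than the QSTAR code: it uses simulated data for one gene, one bioassay and one fingerprint feature, and the function names are hypothetical. It also checks a standard least-squares identity: the direct effect plus the indirect effect equals the slope obtained when gene expression is regressed on the fingerprint alone.

```python
import random


def cross(u, v):
    """Centered cross-product sum of two equal-length sequences."""
    ub, vb = sum(u) / len(u), sum(v) / len(v)
    return sum((a - ub) * (b - vb) for a, b in zip(u, v))


def path_effects(z, x, y):
    """Least-squares path coefficients for Y ~ Z and X ~ Y + Z."""
    szz, syy, syz = cross(z, z), cross(y, y), cross(y, z)
    sxz, sxy = cross(x, z), cross(x, y)
    theta_zy = syz / szz                      # effect of Z on Y
    det = syy * szz - syz ** 2                # determinant of the normal equations
    theta_xy = (sxy * szz - sxz * syz) / det  # coefficient of Y in X ~ Y + Z
    theta_zx = (sxz * syy - sxy * syz) / det  # coefficient of Z in X ~ Y + Z
    indirect = theta_xy * theta_zy
    return {
        "theta_zy": theta_zy,
        "theta_xy": theta_xy,
        "direct": theta_zx,
        "indirect": indirect,
        "total": theta_zx + indirect,
        "marginal": sxz / szz,                # slope of X on Z alone
    }


# Simulated compounds (assumption): true direct effect 0.8, true indirect
# effect 0.5 * 1.0 = 0.5.
rng = random.Random(1)
z = [i % 2 for i in range(300)]
y = [1.0 * zi + rng.gauss(0, 1) for zi in z]
x = [0.5 * yi + 0.8 * zi + rng.gauss(0, 1) for yi, zi in zip(y, z)]
effects = path_effects(z, x, y)
```

A gene with a non-negligible `indirect` value would fall into the first group on the previous slide (direct and indirect effects), while a gene with `indirect` near zero would fall into the second (direct effect only).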
QSTAR: Conditional Model
$$g\{E(X_j \mid Z_k)\} = \theta_{1i} + \theta_{2i} Z_k$$
• 𝜃2𝑖 is the unadjusted effect of the fingerprint on gene expression data
$$g\{E(X_j \mid Z_k, Y_i)\} = \varphi_{0i} + \varphi_{1i} Y_i + \varphi_{2i} Z_k$$
• 𝜑2𝑖 is the adjusted effect of the fingerprint on gene expression data
• This conditional modelling framework fits within the generalised linear model framework
Relevance to Radiation Study
[Figure: the same framework applied to a radiation study: low radiation exposure status (Z), gene expression (X) and dicentric/gamma-H2AX (Y)]
• This modelling framework could potentially be used for surrogate marker identification and validation in low radiation studies.
QSTAR: Co-clustering Framework
• To find groups of compounds with similar chemical structures and bioactivity data, together with their corresponding discriminating gene signatures
QSTAR: Weighted Clustering
• There are several methods for simultaneous analysis of gene expression data that overcome the limitations of gene-by-gene analysis.
• Some of these methods are variations of the penalised-likelihood approach
  • Elastic net
  • LASSO
• Clustering methods have also been considered for this purpose
  • Hierarchical clustering
  • K-means
• In the co-clustering approach, we are interested in multi-view clustering of two data matrices.
QSTAR: Weighted Clustering
[Figure: three dendrograms (vertical axis: Height, 0.0 to 1.5) of compounds JnJ-1 to JnJ-33, ERLOTINIB and GEFITINIB. Panel labels: 𝜔𝑧 = 0.45, 𝜔𝑧 = 0.55, Chemical Structure, Bioactivity. Chemical clusters are numbered 1 to 7.]
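One way to read the weighted clustering idea is as hierarchical clustering on a weighted combination of two compound-by-compound distance matrices, one per data view. The sketch below is an assumption about the general approach, not the consortium's implementation: it uses Tanimoto distance for binary fingerprints, scaled Euclidean distance for bioactivity, and a naive average-linkage algorithm. The weight 0.45 is borrowed from the 𝜔𝑧 value shown on the slide purely for illustration, and the toy data and function names are hypothetical.

```python
import math


def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for two binary fingerprint vectors."""
    both = sum(1 for p, q in zip(a, b) if p and q)
    either = sum(1 for p, q in zip(a, b) if p or q)
    return 0.0 if either == 0 else 1.0 - both / either


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def distance_matrix(rows, dist):
    """Pairwise distances between compounds, scaled to the range [0, 1]."""
    n = len(rows)
    d = [[dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    top = max(max(r) for r in d) or 1.0
    return [[v / top for v in r] for r in d]


def weighted_distance(dz, dy, w_z=0.45):
    """Convex combination of the two views: w_z * dz + (1 - w_z) * dy."""
    n = len(dz)
    return [[w_z * dz[i][j] + (1 - w_z) * dy[i][j] for j in range(n)]
            for i in range(n)]


def average_linkage(d, k):
    """Naive agglomerative clustering with average linkage, stopped at k clusters."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(d[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy data (assumption): four compounds, four fingerprint bits, two bioassays.
fingerprints = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
bioactivity = [[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.8]]

dz = distance_matrix(fingerprints, tanimoto_distance)
dy = distance_matrix(bioactivity, euclidean)
clusters = average_linkage(weighted_distance(dz, dy, w_z=0.45), k=2)
```

Setting `w_z` to 1 or 0 reduces this to clustering on chemical structure or bioactivity alone, so intermediate weights trade off the two views.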
QSTAR: Weighted Clustering
• Group of genes linked to the selected cluster
[Figure: centered gene expression level (vertical axis, -2 to 1) across compounds (horizontal axis) for genes FOSL1, FGFBP1, SLCO4A1, RRM2, SH2B3, PHLDA1, CDC6 and UBE2C]
QSTAR: Biclustering Framework
• To find similar local patterns in two matrices, illustrated with a gene expression matrix and a high content screening matrix
QSTAR: Biclustering
• Finding local patterns in gene expression data based on simultaneous
clustering of genes and compounds
[Figure: gene expression matrix (genes by compounds) with a bicluster highlighted]
QSTAR: Multiview Biclustering
[Figure: combined data matrix of genes and HCS features with a bicluster highlighted]
• Methods:
  • FABIA
  • Multiple Factor Analysis / Sparse Multiple Factor Analysis
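The idea of a bicluster as a subset of rows and a subset of columns sharing a local pattern can be sketched with a sparse rank-one factorisation. The code below is not FABIA and not the method used in the talk. It is a stand-in technique, a power iteration with soft thresholding, that illustrates the same modelling idea: a bicluster is the outer product of a sparse row vector and a sparse column vector. The toy matrix, the threshold value and the function names are assumptions.

```python
import math


def soft(v, t):
    """Soft-threshold each entry of v by t (shrinks small entries to zero)."""
    return [math.copysign(max(abs(a) - t, 0.0), a) for a in v]


def normalise(v):
    """Scale v to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else v


def sparse_rank_one(matrix, threshold=0.3, iterations=50):
    """Return the row and column indices of one bicluster."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    col_w = normalise([1.0] * n_cols)
    row_w = [0.0] * n_rows
    for _ in range(iterations):
        # Update row weights from the current column weights, then sparsify.
        row_w = [sum(matrix[i][j] * col_w[j] for j in range(n_cols))
                 for i in range(n_rows)]
        row_w = normalise(soft(normalise(row_w), threshold))
        # Update column weights from the current row weights, then sparsify.
        col_w = [sum(matrix[i][j] * row_w[i] for i in range(n_rows))
                 for j in range(n_cols)]
        col_w = normalise(soft(normalise(col_w), threshold))
    rows = [i for i, a in enumerate(row_w) if a != 0.0]
    cols = [j for j, a in enumerate(col_w) if a != 0.0]
    return rows, cols


# Toy matrix (assumption): 6 genes by 5 compounds with a planted bicluster
# in genes 0-2 and compounds 0-1.
matrix = [
    [3.0, 3.1, 0.1, -0.1, 0.0],
    [2.9, 3.0, 0.0, 0.1, -0.1],
    [3.1, 2.9, -0.1, 0.0, 0.1],
    [0.1, -0.1, 0.0, 0.1, -0.1],
    [0.0, 0.1, -0.1, 0.0, 0.1],
    [-0.1, 0.0, 0.1, -0.1, 0.0],
]
rows, cols = sparse_rank_one(matrix)
```

For a multiview analysis, one simple option under the same assumptions is to stack the gene expression and HCS matrices along the feature axis, so that a recovered bicluster can contain both genes and HCS features for a shared set of compounds.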
QSTAR: Gene expression and High Content screening
QSTAR: Pathway Analysis
• Exploring potential biological pathways based on existing literature and databases
• Exploring target predictions based on existing literature and databases
QSTAR: Pathway Analysis
• There are publicly available databases for gene annotations and functions
  • KEGG (http://www.genome.jp/kegg/)
  • GO (http://geneontology.org/page/go-database)
• The identified bioassays and chemical structures can also be explored further using:
  • ChEMBL (https://www.ebi.ac.uk/chembl/)
  • DrugBank (http://www.drugbank.ca/)
QSTAR: Pathway Analysis
• Example of MPL pathway analysis
QSTAR CONSORTIUM (http://www.qstar-consortium.org/)
Thank you