Data Integration in Early Drug Development Phase The QSTAR … · 2019. 5. 6. · 2 1 2 22 12 1 21 11 QSTAR: Data Structures. QSTAR: Gene X Bioassay X Fingerprint Analysis. 7 •

Data Integration in Early Drug Development Phase

Adetayo Kasim

The QSTAR Modeling Framework

2

Outline

• Background and Data Structure

• Gene X Bioassay X Fingerprint Analysis

• Co-clustering framework

• Biclustering Framework

• QSTAR Consortium

DRUG DISCOVERY STAGES

Drug Discovery Phases

3

DRUG DISCOVERY STAGES

• The tradition drug discovery approach relies mostly on chemical properties at early stage

• Potential side effects are often identified in later toxicity studies

• Less than 25% of success rate in Phase III trials

• Early identification of potential sides effects may prevent expensive Phase II & Phase III trials.

• Early toxicity detection can be facilitated by incorporating relevant biological data in the early stages of drug developments, particularly in lead optimisation.

4

QSTAR: FRAMEWORK

5

Z

X

Y Bioactivity

GeneExpression

ChemicalStructure

6

MC

C

C

MM

CM

z

z

z

z

z

z

z

z

z

Z

2

1

2

22

12

1

21

11

NC

C

C

NN

CN

y

y

y

y

y

y

y

y

y

Y

2

1

2

22

12

1

21

11

gC

C

C

gg

CG

x

x

x

x

x

x

x

x

x

X

2

1

2

22

12

1

21

11

QSTAR: Data Structures

QSTAR: Gene X Bioassay X Fingerprint Analysis

7

• To find a group of genes that are jointly associated with chemical structure

and bioactivity data.

• To find a group of genes that are associated with a chemical structure, but

that are conditionally independent of bioactivity data.

Gene expression

Bio

acti

vity

Present

QSTAR: Joint Modelling

• Significant association between gene expression and fingerprint

• Significant association between bioactivity and fingerprint

• Significant association between gene expression and bioactivity data 8

Gene expression

Bio

acti

vity

Absent

Present

• Significant association between gene expression and fingerprint

• Significant association between bioactivity and fingerprint

• NO association between gene expression and bioactivity data


9

ji

ikik

jkjkjk

i

j

Z

ZN

Y

X

ik,

~

• 𝛽𝑖𝑘 denotes the effect of the kth fingerprint on the ith bioassay

• 𝛼𝑗𝑘 denotes the effect of the kth fingerprint on the jth gene.


2

2

iij

jj

YYX

YiXX

ji

Where,

10

11


• Examples of genes with significant association with

fingerprint and bioactivity data

gene expression (X)

bioa

ctiv

ity (Y

)

gene FNIP1

12


• Examples of genes with significant association with fingerprint,

but conditionally independent of bioactivity

gene expression (X)

bioa

ctiv

ity (Y

)

gene TXNRD1

13

QSTAR: Path Analysis

• To find genes with direct and indirect effect of chemical structures on

bioactivity data

• To find genes with only direct effect of chemical structure

14

QSTAR: Path Analysis

jkzyji ZY 2

• 𝜗𝑧𝑦𝑗 is the total effect of the kth fingerprint on the ith bioassay

• 𝜗𝑧𝑥𝑗is the Direct effect the fingerprint on the jth gene

• 𝜔𝑧𝑥𝑗 = 𝜗𝑥𝑦𝑗*𝜗𝑧𝑦𝑗 is the indirect effect of the fingerprint on gene

expression given the bioactivity data

jkzxjixyjj ZYX 1

15

QSTAR: Conditional Model

• 𝜃2𝑖 is the unadjusted effect of fingerprint on gene expression data.

kiikj ZZXEg 21)}|({

kiiiikj ZYYZXEg 210)},|({

• 𝜑2𝑖 is the adjusted effect of fingerprint on gene expression data

• This conditional modelling framework fits within the principle of

generalised linear model

Relevance to Radiation Study

16

Z

X

Y Dicentric/ gamma-H2AX

Gene Expression

Low Radiation Exposure Status

• These modelling framework can potentially be used as a surrogate

marker identification/validation in low radiation studies.

QSTAR: Co-clustering Framework

17

• To find a group of compounds with a similar chemical structures and

bioactivity data and their corresponding discriminating gene

signatures

QSTAR: Weighted Clustering

• There are several methods for simultaneous analysis of gene

expression data to overcome the limitation of gene-by-gene analysis.

• Some of the methods are variation of penalised-likelihood approach

• Elastic net

• LASSO

• Clustering methods have also been considered for this purpose

• Hierarchical clustering method

• K-means

• In co-clustering approach, we are interested in multi-view clustering

of two data matrices.

18


19

20

𝜔𝑧 = 0.45 𝜔𝑧 = 0.55

Chemical Structure Bioactivity

QSTAR: Weighted Clustering 0

.00

.51

.01

.5

He

igh

t

JnJ-

1Jn

J-10

JnJ-

14Jn

J-16

JnJ-

18Jn

J-17

JnJ-

26Jn

J-2

JnJ-

4Jn

J-29

JnJ-

7Jn

J-9

JnJ-

8Jn

J-3

JnJ-

5Jn

J-23

JnJ-

6Jn

J-13

JnJ-

15Jn

J-19

JnJ-

11Jn

J-12

JnJ-

22Jn

J-25

JnJ-

31Jn

J-32

JnJ-

33Jn

J-20

JnJ-

28Jn

J-27

JnJ-

30Jn

J-21

JnJ-

24E

RLO

TIN

IBG

EFI

TIN

IB

0.0

0.5

1.0

1.5

Chemical Cluster No.1 2 3 4 5 6 7

0.0

0.5

1.0

1.5

He

igh

t

JnJ-

1Jn

J-27

JnJ-

30 JnJ-

29Jn

J-3

JnJ-

26Jn

J-5

JnJ-

24Jn

J-28

JnJ-

20Jn

J-23

JnJ-

21ER

LOTI

NIB

GEF

ITIN

IBJn

J-2

JnJ-

4Jn

J-7

JnJ-

8Jn

J-9

JnJ-

6Jn

J-13

JnJ-

14Jn

J-19

JnJ-

15Jn

J-18

JnJ-

16Jn

J-17

JnJ-

10Jn

J-22

JnJ-

25Jn

J-11

JnJ-

12Jn

J-31

JnJ-

33Jn

J-320.

00.

51.

01.

5

0.0

0.5

1.0

1.5

He

igh

t

JnJ-

1Jn

J-16

JnJ-

26Jn

J-18

JnJ-

10Jn

J-14

JnJ-

17Jn

J-2

JnJ-

29Jn

J-24

JnJ-

25Jn

J-21

JnJ-

31 JnJ-

6Jn

J-3

JnJ-

9Jn

J-23

JnJ-

5Jn

J-19

JnJ-

4Jn

J-22

JnJ-

7Jn

J-11

JnJ-

12Jn

J-13

JnJ-

15Jn

J-8

JnJ-

32Jn

J-33

JnJ-

20ER

LOTI

NIB

JnJ-

28G

EFIT

INIB

JnJ-

27Jn

J-30

0.0

0.5

1.0

1.5

21


• Group of genes linked to the selected cluster

-2

-1

0

1

JnJ-

1Jn

J-10

JnJ-

14Jn

J-16

JnJ-

18Jn

J-17

JnJ-

26Jn

J-2

JnJ-

4Jn

J-29

JnJ-

7Jn

J-9

JnJ-

8Jn

J-3

JnJ-

5Jn

J-23

JnJ-

6Jn

J-13

JnJ-

15Jn

J-19

JnJ-

11Jn

J-12

JnJ-

22Jn

J-25

JnJ-

31Jn

J-32

JnJ-

33Jn

J-20

JnJ-

28Jn

J-27

JnJ-

30Jn

J-21

JnJ-

24ER

LOTI

NIB

GEF

ITIN

IB

compounds

gene

exp

ress

ion

leve

l (ce

nter

ed)

Genes

FOSL1

FGFBP1

SLCO4A1

RRM2

SH2B3

PHLDA1

CDC6

UBE2C

QSTAR: Biclustering Framework

22

• To find similar local patterns two matrices. Illustration with gene

expression matrix and high content screening matrix

QSTAR: Biclustering

• Finding local patterns in gene expression data based on simultaneous

clustering of genes and compounds

Compounds

Ge

ne

s

Bicluster

23

HCS

Genes

HCS

Genes

Bicluster

Data Matrix

• Methods:

• FABIA

• Multiple Factor Analysis /Sparse Multiple Factor Analysis

QSTAR: Multiview Biclustering

24

QSTAR: Gene expression and High Content screening

25

QSTAR: Pathway Analysis

26

• Exploring potential biological pathways based on existing literatures

and database

• Exploring target predictions based on existing literature and

database


• The are publicly available databases for gene annotations and

functions

• KEEG (http://www.genome.jp/kegg/)

• GO (http://geneontology.org/page/go-database)

• The identified bioassay and chemical structures can also be

explored further using:

• chEMBL (https://www.ebi.ac.uk/chembl/)

• drugBank (http://www.drugbank.ca/)

27

http://www.genome.jp/kegg/








http://geneontology.org/page/go-database








https://www.ebi.ac.uk/chembl/










http://www.drugbank.ca/









• Example of MPL pathway analysis 28

29

QSTAR CONSORTIUM (http://www.qstar-consortium.org/)

http://www.qstar-consortium.org/




Thank you

30

Documents

Data Integration in Early Drug Development Phase The QSTAR … · 2019. 5. 6. · 2 1 2 22 12 1 21 11 QSTAR: Data Structures. QSTAR: Gene X Bioassay X Fingerprint Analysis. 7 •