Analysis of dose-response microarray data using Bayesian Variable Selection (BVS) methods:
Modeling and multiplicity adjustments
Adetayo Kasim
Durham University UK
Outline
• Introduction to dose-response modeling in microarray experiments.
• Bayesian estimation in the presence of equality constraints
• Inference for monotone genes
• Multiplicity adjustment
• Discussion
• Current work: Bayesian isotonic regression
Dose-response Microarray Experiments
Dose-response Microarray studies
Biological information from gene expression data create
new opportunities for developing effective therapies:
To understand mechanism of action of a new treatment.
To explore the desired properties (efficacy/toxicity…).
Explore functions of genes/pathways in a dose-dependency
manner .
Case Study: Human epidermal squamous carcinoma celllines
– 4 dose levels.
– 12 arrays.
– 16,998 genes measured on each
array.
EGF (ng/ml)
Dose 0 1 10 100
# of arrays
3 3 3 3
Examples: Dose-response relationships with gene expression
Bayesian Estimation in the Presence of equality constraint between parameters
Objective
Primary interest:
• Discovery of genes with monotone relationship with respect to dose.
Order restricted inference.
• Simple order (monotone) alternatives.
Model Formulation
2,~ iij NY
Estimation under strict inequality constraints
• Order constraints of priors (Gelfand et al., 1992).
Kg ,...,: 10
2,~ iij NY
),(,~ 11
2
iii IN
otherwise
NP iii
0
,,| 11
2
Specification of the prior :
unconstrained prior.
Likelihood:
2, N
Model Formulation
1
1
0
)|()|(K
K
S
SPyP
The posterior distribution, given the order constraints, is the
same as the unconstrained distribution defined on the
constraints set.
The constraints set
Model Formulation
K
K
H
H
,...,:
,...,:
101
100
? What happen if there is equality constraints between parameters.
Model Formulation
•The null model
32107
32106
32105
32104
32103
32102
32101
32100
:
:
:
:
:
:
:
:
g
g
g
g
g
g
g
g
32101 : H
• We decompose the simple order alternative to all sub alternatives.
• All possible monotone models
Model Formulation
• Monotone models
32107
32106
32105
32104
32103
32102
32101
32100
:
:
:
:
:
:
:
:
g
g
g
g
g
g
g
g
μμμμg
μμμg
μμμg
μμμg
μμg
μμg
μμg
μg
3210
'
7
2310
'
6
3201
'
5
3120
1
4
1230
'
3
2301
'
2
3012
'
1
0123
'
0
:
:
:
:
:
:
:
:
Model Formulation
32105 : g
32107 : g
1.0 1.5 2.0 2.5 3.0 3.5 4.0
8.2
8.4
8.6
8.8
9.0
dose
ge
ne
exp
ressio
n
32107 : g
32105 : g
•We fitted two monotone models:
Equality constraints are replaced with a single parameter.
Model Formulation
Bayesian Variable Selection Method
ijKjKjjjij xxxxY 1322110
Alternative approach,
Where X is a design matrix with ordered columns, reflecting the direction of the monotone constraints
0l
2,0~ Nij
2,~ iij NY
01
0
i
i
),0(,~ 2 IN
32103
2102
101
0
d
d
d
c
dose mean
2
0 ,~ N
•Alternative approach •For a dose-response experiment with 4 dose levels (control + 3 doses):
Ki ,...,0 10
Bayesian Variable Selection Method
•Simple order alternative.
32107
32106
32105
32104
32103
32102
32101
32100
:
:
:
:
:
:
:
:
g
g
g
g
g
g
g
g
0 ;0 ;0:
0 ;0 ;0:
0 ;0 ;0:
0 ;0 ;0:
0 ;0 ;0:
0 ;0 ;0:
0 ;0 ;0:
0 ;0 ;0:
3217
3216
3215
3214
3213
3212
3211
3210
g
g
g
g
g
g
g
g
Bayesian Variable Selection Method
i
i
iz
0
1
•The mean structure:
included in the model
not Included in the model
i
i
1
0
•Bayesian Variable Selection: a procedure of deciding which of the model parameters is equal to zero. •Define an indicator variable:
Bayesian Variable Selection Method
K
i
iii z1
0
K
K
r Sg ,...,,: 10
1
),0(,~ 2 INi
)(~ ii Bz
)1,0(~ Ui
2
0 ,~ N
•The mean structure for a candidate model:
Order restrictions Variable selection
Bayesian Variable Selection Method
1.0 1.5 2.0 2.5 3.0 3.5 4.0
8.2
8.4
8.6
8.8
9.0
dose
ge
ne
exp
ressio
n
g_7
g_5
BVS
g7
g5
BVS
Bayesian Variable Selection Method
Inference for Monotone Genes
101
100
:
:
H
H
• suppose we want to identify genes with differential
expression between the control dose and the first dose.
Comparisons Between Two Doses/Groups
);( );( );( 22
00
2
- NNNm
• Inference is based on the posterior probabilities for each to belong to the non-differential components
• Down regulated component
• Non differential component
• Up regulated component
0
00
0
Comparisons Between Two Doses/Groups
• Inference is based on the posterior probabilities for each gene to belong to the non-differential component
• The posterior probabilities could be treated like p-values
• The differentially expressed would be expected to have very
low probability if belonging the null component
),|( 10 HHpp mm
Comparisons Between Two Doses/Groups
• Inference is based on the posterior probabilities for each gene to belong to the non-differential component
• The posterior probabilities could be treated like p-values
• The differentially expressed would be expected to have very
low probability if belonging the null component
),|( 10 HHpp mm
Inference for Monotone Genes
• We assumed a gene specific model for the present
approach
Inference for Monotone Genes
K
K
H
H
,...,:
,...,:
101
100
R
r
rgij gNy0
2 );(
• Inference is based on the posterior probabilities for each to belong to the non-differential components
• Where R=7 and
32107
32101
32100
g
g
g
Inference for Monotone Genes
• We need the posterior probabilities for each gene to belong to the null model
),,|( 2
0 rij gygpp
Inference for Monotone Genes
• Which is equivalent to the posterior probabilities of flat
profile in the Bayesian variable selection approach.
),,|( 2
0 rij gygpp
Inference for Monotone Genes
),,|0,,( 2
0321 ijyδδδp
K
i
iii z1
0
K
K
r Sg ,...,,: 10
1
),0(,~ 2 INi
)(~ ii Bz
)1,0(~ Ui
2
0 ,~ N
•The mean structure for a candidate model:
Order restrictions Variable selection
Inference for Monotone Genes
32107
32106
32105
32104
32103
32102
32101
32100
:
:
:
:
:
:
:
:
g
g
g
g
g
g
g
g
)1,1,1(
)0,1,1(
)1,1,0(
)1,0,1(
)0,1,0(
)0,0,1(
)1,0,0(
)0,0,0(
z
z
z
z
z
z
z
z
•4 dose levels. •The triplet defines uniquely all candidate models: ),,( 321 zzzz
The set of off possible monotone models for an experiment with 4 dose levels
Inference for Monotone Genes
),|(),|)0,0,0(( 0 RdatagpRdatazp
3210
1
0
K
i
iii z
),,|)0,0,0(( 2
0321 ijyzzzzp
•The posterior probability that the triplet equal to zero: )0,0,0(z
Inference for Monotone Genes
514.0),|( 0 Rdatagp
g0 g3 g2 g6 g1 g4 g5 g7
0.0
0.1
0.2
0.3
0.4
0.5
•The highest posterior probability is obtained for the null model
(0.514).
Inference for Monotone Genes
4186.0),|( 5 Rdatagp
001.0),|( 0 Rdatagp
g0 g3 g2 g6 g1 g4 g5 g7
0.0
0.1
0.2
0.3
0.4
4059.0),|( 1 Rdatagp
•The highest posterior probability is obtained for model g5.
•Data do not support the null model.
Inference for Monotone Genes
Multiplicity Adjustment
)(
)()(
N
cFDcFDR
•Choose τ such that
Multiplicity Adjustment
),|( 0 Rdatagpmgene m is included in the discovery list
),|( 0 Rdatagpmthe posterior probability of the null model = the probability that we make a mistake when we include the gene in the discovery list.
)(
)()(
N
cFDcFDR
the false discovery rate for a discovery list in which the g’th gene and all other genes with smallest posterior probabilities of the null model are included (Newton 2004,2007).
Multiplicity Adjustment
),|(0
),|(1
0
0
Rdatagp
RdatagpI
m
m
m
)(N
gene m is included in the discovery list
gene m is not included in the discovery list
The number of genes in the discovery list.
M
m
mIN1
)(
•Primary interest: discovery of subset of genes with monotone relationship with respect to dose.
Multiplicity Adjustment
)(),|()(1
0 cFDIRdatagpFDEM
m
mm
)(
)()(
N
cFDcFDR
•The conditional (on the data) expected number of false discoveries (in the discovery list):
•The conditional false discovery rate:
•Choose τ such that .)( cFDR
Multiplicity Adjustment
%5
3295
,,|
)102.0(
0
Rdatazgp
cFDR
g
The expected error rate for the list with all genes for which the posterior probability of the null model < 0.102 are included.
τ
Multiplicity Adjustment
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
Cut-off
FD
R
TRUE FDR
Estmated FDR
Multiplicity Adjustment
• From Simulation Study
• BVS methods: estimation and inference.
• Multiplicity adjustment is based on the posterior probability of the null model.
• Connection between BVS and MCT.
• Connection between BVS and Bayesian model averaging.
• BVS for order restricted but non-monotone alternatives (umbrella alternatives/partial order alternatives).
• Posterior probabilities for the number of levels and the level probabilities for isotonic regressions.
Discussion
Current Work: Bayesian Isotonic Transformation
• Motivated by Dunson and Neelon (2003)
• Generate posterior samples from unconstrained full conditional distributions
for the model parameters
• Obtain constrained samples through isotonic transformation of the
unconstrained samples.
Current Work: Bayesian Isotonic Regression
• Examples
Current Work: Bayesian Isotonic Regression
• Bayes factor for model selection
• The Bayes factor account for the fact that the model not equally like under the
null model
Current Work: Bayesian Isotonic Regression
• Probability under the null model
• The probability under the null model is equivalent to to level probability in
ORIC (Anraku, 1999)
Current Work: Bayesian Isotonic Regression
• The Bayesian isotonic transformation approach provide good estimates of dose-
specific means under simple ordered constraints
• However, adjusting for multiplicity is less straight forward for this approach.
Research Team
• Ziv Shkedy.
• Luc Bijnens.
• Willem Talloen.
• Hinrich Gohlmann.
• Dhammika Amaratunga
Hasselt University, Belgium Johnson & Johnson Pharmaceutical
Durham University, UK
• Adetayo Kasim.
Imperial College, UK
• Bernet Kato.
pfizer, Belgium
• Dan Lin