Upload
vukhanh
View
233
Download
0
Embed Size (px)
Citation preview
Background
Biomarker: A specific physical trait used to measure or indicate the effects or progress of a disease or condition
Newly developed laboratory methods expand the number of biomarkers on a daily basis
Cost
Measurement Error
Causal Link to Disease
2
q(c)
1 – p(c)0 1
1
ROC: AUC
Area Under the Curve
Overall discriminatory ability
Area under the ROC curve
x
dxdyygxf
YXPAUC
)()(
)(
Motivation
Preliminary analysis of salivary concentrations of cortisol from the LIFE study
P=0.04
Shipment 1 n Mean SDMichigan 85 0.40 0.19Texas 142 0.57 0.79
4
Cortisol by Site & Plate5
02
46
8
Ship1 Ship2 Ship1 Ship2 Ship3
1 2 3 4 5 1 2 3 1 2 3 4 5 1 2 1 2 3 4 5 6 7 8
Michigan Texas
ug/d
L
Graphs by Site
Outline
Background
Limit of Detection Determination
Pooling Biomarkers
Hybrid Design
New Pooling Hybrid and Case Only Designs
Reporting of Biomarker Data
ID Z
1 3.1
2 1.5
3 8.4
4 0.8
5 5.4
6 3.2
7 2.0
8 5.8
9 13.4
10 2.5
11 1.9
12 6.1
•Reporting threshold is equal to 2.2
ID Z
1 3.1
2 ND
3 8.4
4 ND
5 5.4
6 3.2
7 ND
8 5.8
9 13.4
10 2.5
11 ND
12 6.1
Report values < threshold as ‘not detected’
ID Z
1 3.1
2 1.1
3 8.4
4 1.1
5 5.4
6 3.2
7 1.1
8 5.8
9 13.4
10 2.5
11 1.1
12 6.1
Report values < threshold as one half the value of the threshold
Conventional Determination of the Limit of Detection (LOD)
6.15)3( blanksblanksLOD
BLANK SERIES
• 10.0
• 5.0
• 8.1
• 7.1
• 4
• 11.3
• 12.0
• 8.0
• 7.7
• 7.0
Mean = 8.02
Std Dev = 2.53
0 5 10 15 20
Example of LOD left-censored data
0 5 10 15 20 25 30 35
Blanks
“True” biomarkerObserved biomarker (samples)
Outline
Background
Limit of Detection Determination
Pooling Biomarkers
Hybrid Design
Case Only and Reliability using Pooling Designs
What is pooling?
Physically combining several individual specimens to create a single mixed sample
Pooled samples are the average of the individual specimens
1
2 p
12
Random Sample of Biospecimens
RANDOM SAMPLE
Randomly select 20 samples
FULL DATA
N = 40 Individual Biospecimens
13
Pooling Biospecimens
POOLED DATA
40 samples in groups of 2
FULL DATA
N = 40 Individual Biospecimens
14
q(c)
1 – p(c)0 1
1
ROC: AUC
Area Under the Curve
Overall discriminatory ability
Area under the ROC curve
x
dxdyygxf
YXPAUC
)()(
)(
0
20
40
60
80
100
120
0 2 4 6 8 10
# of Pooled Samples
# o
f A
ssa
ys o
f P
oo
led
Sa
mp
les
20
50
100
200
Number of assays required to reach
equivalency on ROC Curves
Limit of Detection and Pooling
/
i i
i
i
x if x LODZ
N A if x LOD
Unpooled Specimens
Pooled Specimens
( ) ( )
( )
( )
, if LOD
N/A, if LOD
p p
p i i
i p
i
x xZ
x
Un-pooled
0
0.2
0.4
0.6
0.8
1
-4 -3 -2 -1 0 1 2 3 4
Effect of Pooling on Markers Affected by an LOD
Pooled
0
0.2
0.4
0.6
0.8
1
-4 -3 -2 -1 0 1 2 3 4
18
Un-pooled
0
0.2
0.4
0.6
0.8
1
-4 -3 -2 -1 0 1 2 3 4
Pooled
0
0.2
0.4
0.6
0.8
1
-4 -3 -2 -1 0 1 2 3 4
Estimation Based on Pooling
Using Likelihood Function
Differentiating with respect to
To obtain estimates of
Schisterman et al. Bioinformatics 2006
Efficiency of the Mean and Variance
Variance of Estimated Mean
Variance of Estimated Variance
FULL DATA POOLED RANDOMFULL DATA POOLED RANDOM
20
LOD below Mean
LOD below Mean
LOD above Mean
LOD above Mean
Pooling and Random Sampling
Pooling advantages
Reduces the number of assays we need to test
Efficiently estimates the mean
Cost-effective
Random sampling advantages
Reduces the number of assays we need to test
Efficiently estimates the variance
Cost-effective & easy to implement
21
Hybrid Design: Pooled—Unpooled
Creates a sample of both pooled and unpooled samples
Takes advantage of the strengths of both the pooling and random sampling designs
Reduces number of tests to perform
Cuts overall costs
Gains efficiency (by using pooling technique)
Accounts for different types of measurement error without replications
– Pooling error
– Random measurement error
– LOD
22
Outline
Background
Limit of Detection Determination
Pooling Biomarkers
Hybrid Design
Case Only and Reliability using Pooling Designs
Unpooled: X1,…,X5
Pooled: Z1,…,Z15
Hybrid Sample S: X1,…,X5,Z1,…,Z15
Setup of Hybrid Design
Unpooled: X1,…,X[αn]
Pooled: Z1,…,Z[(1-α)n]
In GeneralHybrid Sample S: X1,…,X[αn],Z1,…,Z[(1-α)n]
α is the proportion of unpooled samples
24
Hybrid Design: Pooled—Unpooled
Create a sample of both pooled and un-pooled samples that takes advantage of the strengths of the pooled and random sample designs
Reduce number of tests to perform
Cut overall costs
Gain efficiency (by using pooling technique)
Accounts for different types of measurement error without replications
Pooling error
Random measurement error
Limit of detection
Mathematical Setup
),(~ 2
xxi NX NiX i 1
),0(~ 2
p
p
i Ne
p
i
ip
pij
ii eXp
Z 1)1(
1
2
2
,~ px
xip
NZ
Individual Samples
Pooled Samples
p is the pooling group size
Average of p individual samples
pooled together
Pooling error
Maximum Likelihood Estimators
22
(1 )
2 2
( )( )
( , , ) ln (1 ) ln2 2
i xi xi ni n
x x p x z
x z
zx
n n
Random Sampling Pooling
22
22
ˆ)1(ˆ
ˆ)1(ˆˆ
xz
xzx
nn
znxn
n
xni
xi
x
2
2
)ˆ(
ˆ
2
2(1 )2
ˆ( )ˆ
ˆ(1 )
i x
i n xp
z
n p
In order to estimate the
variance, α cannot be zero.
Zi* = Zi +ei
M
Add in Measurement Error
),0(~ 2
M
M
i Ne
Individual Samples + ME
Pooled Samples + ME
p is the pooling group size
Measurement error
Xi* = Xi +ei
M* 2 2~ ( , )i x x MX N
22
2
* ,~ Mp
x
xip
NZ
Hybrid Design Example: IL-6
Measured IL-6 on 40 MI cases and 40 controls
Biological specimens were randomly pooled in groups of 2, for the cases and controls separately, and remeasured
We want to evaluate the discriminating ability of this biomarker in terms of AUC
29
Hybrid Design Example: IL-6
n αx αy AÛC Var(AÛC)
Empirical 40 1.00 1.00 0.640 0.0036
Hybrid design: Optimal α 20 0.40 0.35 0.621 0.0049
Random sample: α=1 20 1.00 1.00 0.641 0.0071
Hybrid design reduced the variability of Var(AÛC) by 32% as compared to taking only a random sample
30
Summary—Hybrid Design
Hybrid design is a more efficient way to estimate the mean and variance of a population
Cost-effective
Yields estimate of measurement error without requiring repeated measurements
Here we focus on normally distributed data, but can be applied to other distributions as well
31
Outline
Background
Limit of Detection Determination
Pooling Biomarkers
Hybrid Design
Case Only and Reliability using Pooling Designs
34
Case-Only Studies
Used to assess G x E interactions
Assumes independence between gene and environment
Cannot evaluate the independent effects of exposure or genotype
35
2 x 2 Table - Notation
Total Cohort
G+ G-
E+ a1+b1 a0+b0
E- c1+d1 c0+d0
Cases
G+ G-
E+ a1 a0
E- c1 c0
Controls
G+ G-
E+ b1 b0
E- d1 d0
36
Efficiency
*
11
*
11
*
00
*
00
11111111)var(
dcbadcbaORGxE
1100
1111)var(
cacaORGxE
Case-
Control
Case-Only
Case-only designs provide more efficient estimates of G x E
interactions than either case-control or cohort studies
Testing for GxE Interactions37
Public health importance Important in identifying highly susceptible populations
such that modifiable exposures that cause disease can be prevented
Problem Traditional case-control interaction estimator requires
a large n
Cost and biospecimen availability often limit the ability of investigators to test for GxE interactions
A Pooled GxE Estimator38
Pooling biospecimens and measuring allele frequencies
Rather than measuring the genotype for every individual, we propose measuring the allele frequencies of a pool of individuals within disease and environmental strata
Frequency
PCR cycle
ThresholdtC
[Germer et al., Genome Res. 2000]
12
1~
tCDEf
39
2 x 2 Table - Notation
Total Cohort
G+ G-
E+ a1+b1 a0+b0
E- c1+d1 c0+d0
Cases
G+ G-
E+ a1 a0
E- c1 c0
Controls
G+ G-
E+ b1 b0
E- d1 d0
Improvements in Power40
To address potential measurement error, we propose repeatedly measuring each pool
12 pooled measurements does better than 500 individual genotype measurements
Implication:
If an investigator had access to a large number of stored biospecimens, pooling would be very useful in preliminary assessments of GxEinteraction hypotheses
-1.0 -0.5 0.0 0.5 1.0
0.0
0.2
0.4
0.6
0.8
1.0
Power Function
GXE interaction
Po
we
r
Pooled Sample Size 1000, error variance = .05
Pooled Sample Size 1000, error variance = .1
Not Pooled Sample Size 500
Adaptations for Different Estimators
Case-Only Estimator Uses assumptions of GxE independence for a rare disease
to achieve high power by not requiring control subjects
Hybrid Estimators Two Step
Step 1: test case-only estimator assumptions Step 2: decide whether to use the case-control or case-only
estimator
Empirical Bayes Weighted combination of the case-control and case-only
estimators
Study Design
Case-only pooling design to study GxE
Biomarker evaluation reliability studies
New user designs
Biomarker Evaluation
Estimating sources of variation in new biomarkers is an important step in development
Between- and within-subject variation, intraclasscorrelation coefficient (ICC)
Biomarkers with similar biological function with the highest ICC can be considered the most reproducible
May then be selected as markers for future studies
ICC
ICC is useful when the biomarker is an:
Outcome measure
ICC dictates trade-off in efficiency between taking an increased number of individuals or repeated measurements
Exposure measure
ICC aids investigators in correcting for measurement error
Pooling Designs
Designs where we pool samples over time on individuals (T) are more efficient than designs where we pool over individuals at the same time point (I)