Upload
may-lewis
View
220
Download
2
Tags:
Embed Size (px)
Citation preview
Bayes NCSU QTL II: Yandell © 2005 1
Bayesian Interval Mapping
1. what is Bayes? Bayes theorem? 2-6
2. Bayesian QTL mapping 7-17
3. Markov chain sampling 18-25
4. sampling across architectures 26-32
Bayes NCSU QTL II: Yandell © 2005 2
• Reverend Thomas Bayes (1702-1761)– part-time mathematician
– buried in Bunhill Cemetary, Moongate, London
– famous paper in 1763 Phil Trans Roy Soc London– was Bayes the first with this idea? (Laplace?)
• basic idea (from Bayes’ original example)– two billiard balls tossed at random (uniform) on table
– where is first ball if the second is to its left (right)?
1. who was Bayes? what is Bayes theorem?
prior pr() = 1likelihood pr(Y | )= 1–Y(1–)Y
posterior pr( |Y) = ?
Y=0 Y=1
firstsecond
Bayes NCSU QTL II: Yandell © 2005 3
what is Bayes theorem?• where is first ball if the second is to its left (right)?• prior: probability of parameter before observing data
– pr( ) = pr( parameter )– equal chance of being anywhere on the table
• posterior: probability of parameter after observing data– pr( | Y ) = pr( parameter | data )– more likely to left if first ball is toward the right end of table
• likelihood: probability of data given parameters– pr( Y | ) = pr( data | parameter )– basis for classical statistical inference
• Bayes theorem– posterior = likelihood * prior / pr( data )– normalizing constant pr( Y ) often drops out of calculation
)(pr
)(pr)|(pr
)(pr
),(pr)|(pr
Y
Y
Y
YY
Y=0 Y=1
Bayes NCSU QTL II: Yandell © 2005 4
where is the second ball given the first?
1 if)1(2
0 if2
)(pr
)(pr)|(pr)|pr( :posterior
2
1)(pr :marginal
1 if1
0 if)|(pr :likelihood
1)pr( :prior
Y
Y
Y
YY
Y
Y
YY
prior pr() = 1likelihood pr(Y | )= 1–Y(1- )Y
posterior pr( |Y) = ?
Y=0 Y=1
first ballsecond ball
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
1.5
2.0
Y 0
Y 1
pr
|Y
Bayes NCSU QTL II: Yandell © 2005 5
Bayes posterior for normal data
large prior variancesmall prior variance
6 8 10 12 14 16
y = phenotype values
n la
rge
n s
ma
llp
rio
r
6 8 10 12 14 16
y = phenotype values
n la
rge
n s
ma
llp
rio
rp
rio
r
actu
al m
ean
prio
r m
ean
actu
al m
ean
prio
r m
ean
Bayes NCSU QTL II: Yandell © 2005 6
model Yi = + Ei
environment E ~ N( 0, 2 ), 2 known likelihood Y ~ N( , 2 )prior ~ N( 0, 2 ), known
posterior: mean tends to sample meansingle individual ~ N( 0 + B1(Y1 – 0), B12)
sample of n individuals
fudge factor(shrinks to 1)
Bayes posterior for normal data
11
/sumwith
/,)1(~
},...,1{
20
n
nB
nYY
nBBYBN
n
ini
nnn
Bayes NCSU QTL II: Yandell © 2005 7
2. Bayesian QTL mapping• observed measurements
– Y = phenotypic trait– X = markers & linkage map
• i = individual index 1,…,n
• missing data– missing marker data– Q = QT genotypes
• alleles QQ, Qq, or qq at locus
• unknown quantities of interest = QT locus (or loci) = phenotype model parameters– m = number of QTL– M = genetic model = genetic architecture
• pr(Q|X,) recombination model– grounded by linkage map, experimental cross– recombination yields multinomial for Q given X
• pr(Y|Q,) phenotype model– distribution shape (could be assumed normal) – unknown parameters (could be non-parametric)
observed X Y
missing Q
unknown
afterSen Churchill (2001)
M
Bayes NCSU QTL II: Yandell © 2005 8
QTL
Marker Trait
QTL Mapping (Gary Churchill)
Key Idea: Crossing two inbred lines creates linkage disequilibrium which in turn creates associations and linked segregating QTL
Bayes NCSU QTL II: Yandell © 2005 9
2r
1X 2X 3X 4X 5X 6X
5r4r3r1r
pr(Q|X,) recombination modelpr(Q|X,) = pr(geno | map, locus)
pr(geno | flanking markers, locus)
distance along chromosome
Q?
recombination rates
markers
Bayes NCSU QTL II: Yandell © 2005 10
pr(Y|Q,) phenotype model• pr( trait | genotype, effects )• trait = genetic + environment• assume genetic uncorrelated with environment
2
2
2
)var(
)var(
),(~
),|(pr
e
G
eGY
GNY
qQY
Gq
q
q
8 9 10 11 12 13 14 15 16 17 18 19 20
qq QQ
Y
GQQ
Bayes NCSU QTL II: Yandell © 2005 11
Bayesian priors for QTL• missing genotypes Q
– pr( Q | X, )– recombination model is formally a prior
• effects = ( G, 2 )– pr( ) = pr( Gq | 2 ) pr(2 ) – use conjugate priors for normal phenotype
• pr( Gq | 2 ) = normal• pr(2 ) = inverse chi-square
• each locus may be uniform over genome– pr( | X ) = 1 / length of genome
• combined prior– pr( Q, , | X ) = pr( Q | X, ) pr( ) pr( | X )
Bayes NCSU QTL II: Yandell © 2005 12
Bayesian model posterior• augment data (Y,X) with unknowns Q• study unknowns (,,Q) given data (Y,X)
– properties of posterior pr(,,Q | Y, X )
• sample from posterior in some clever way– multiple imputation or MCMC
),|,,(pr sum),|,(pr
)|(pr
)|(pr)(pr),|(pr),|(pr),|,,(pr
XYQXY
XY
XXQQYXYQ
Q
Bayes NCSU QTL II: Yandell © 2005 13
how does phenotype Y improve posterior for genotype Q?
90
100
110
120
D4Mit41D4Mit214
Genotype
bp
AAAA
ABAA
AAAB
ABAB
what are probabilitiesfor genotype Qbetween markers?
recombinants AA:AB
all 1:1 if ignore Yand if we use Y?
Bayes NCSU QTL II: Yandell © 2005 14
posterior on QTL genotypes• full conditional for Q depends data for individual i
– proportional to prior pr(Q | Xi, )• weight toward Q that agrees with flanking markers
– proportional to likelihood pr(Yi|Q,)• weight toward Q=q so that group mean Gq Yi
• phenotype and prior recombination may conflict– posterior recombination balances these two weights
),,|(pr
),|(pr),|(pr),,,|(pr
ii
iiii XY
QYXQXYQ
Bayes NCSU QTL II: Yandell © 2005 15
posterior genotypic means Gq
6 8 10 12 14 16
y = phenotype values
n la
rge
n la
rge
n s
ma
llp
rio
r
qq Qq QQ
Bayes NCSU QTL II: Yandell © 2005 16
posterior centered on sample genotypic meanbut shrunken slightly toward overall mean
prior:
posterior:
fudge factor:
posterior genotypic means Gq
11
/sum},{count
/,)1(~
,~
}{
2
2
q
qiqQ
qiq
qqqqqq
q
n
nB
nYYqQn
nBYBYBNG
YNG
i
Bayes NCSU QTL II: Yandell © 2005 17
What if variance 2 is unknown?• sample variance is proportional to chi-square
– ns2 / 2 ~ 2 ( n )– likelihood of sample variance s2 given n, 2
• conjugate prior is inverse chi-square 2 / 2 ~ 2 ( )– prior of population variance 2 given , 2
• posterior is weighted average of likelihood and prior– (2+ns2) / 2 ~ 2 ( +n )– posterior of population variance 2 given n, s2, , 2
• empirical choice of hyper-parameters 2= s2/3, =6– E(2 | ,2) = s2/2, Var(2 | ,2) = s4/4
Bayes NCSU QTL II: Yandell © 2005 18
3. Markov chain sampling of architectures
• construct Markov chain around posterior– want posterior as stable distribution of Markov chain– in practice, the chain tends toward stable distribution
• initial values may have low posterior probability• burn-in period to get chain mixing well
• hard to sample (Q, , , M) from joint posterior– update (Q,,) from full conditionals for model M– update genetic model M
NMQMQMQ
XYMQMQ
),,,(),,,(),,,(
),|,,,(pr~),,,(
21
Bayes NCSU QTL II: Yandell © 2005 19
Gibbs sampler idea• toy problem
– want to study two correlated effects– could sample directly from their bivariate distribution
• instead use Gibbs sampler:– sample each effect from its full conditional given the other– pick order of sampling at random– repeat many times
2
1122
22211
2
1
2
1
1),(~
1),(~
1
1,~
GNG
GNG
NG
G
Bayes NCSU QTL II: Yandell © 2005 20
Gibbs sampler samples: = 0.6
0 10 20 30 40 50
-2-1
01
2
Markov chain index
Gib
bs:
mea
n 1
0 10 20 30 40 50
-2-1
01
23
Markov chain index
Gib
bs:
mea
n 2
-2 -1 0 1 2
-2-1
01
23
Gibbs: mean 1
Gib
bs:
mea
n 2
-2 -1 0 1 2
-2-1
01
23
Gibbs: mean 1
Gib
bs:
mea
n 2
0 50 100 150 200
-2-1
01
23
Markov chain index
Gib
bs:
mea
n 1
0 50 100 150 200
-2-1
01
2
Markov chain index
Gib
bs:
mea
n 2
-2 -1 0 1 2 3
-2-1
01
2
Gibbs: mean 1
Gib
bs:
mea
n 2
-2 -1 0 1 2 3
-2-1
01
2
Gibbs: mean 1
Gib
bs:
mea
n 2
N = 50 samples N = 200 samples
Bayes NCSU QTL II: Yandell © 2005 21
MCMC sampling of (,Q,)• Gibbs sampler
– genotypes Q– effects = ( G, 2 )– not loci
• Metropolis-Hastings sampler– extension of Gibbs sampler– does not require normalization
• pr( Q | X ) = sum pr( Q | X, ) pr( )
)|(pr
)|(pr),|(pr~
)|(pr
)(pr),|(pr~
),,,|(pr~
XQ
XXQ
QY
QY
XYQQ ii
Bayes NCSU QTL II: Yandell © 2005 22
Metropolis-Hastings idea• want to study distribution f()
– take Monte Carlo samples• unless too complicated
– take samples using ratios of f• Metropolis-Hastings samples:
– current sample value – propose new value *
• from some distribution g(,*)• Gibbs sampler: g(,*) = f(*)
– accept new value with prob A• Gibbs sampler: A = 1
),()(
),()(,1min
*
**
gf
gfA
0 2 4 6 8 10
0.0
0.2
0.4
-4 -2 0 2 4
0.0
0.2
0.4
f()
g(–*)
Bayes NCSU QTL II: Yandell © 2005 23
Metropolis-Hastings samples
0 2 4 6 8
050
150
mcm
c se
quen
ce
0 2 4 6 8
02
46
pr(
|Y)
0 2 4 6 8
050
150
mcm
c se
quen
ce
0 2 4 6 8
0.0
0.4
0.8
1.2
pr(
|Y)
0 2 4 6 8
040
080
0
mcm
c se
quen
ce
0 2 4 6 8
0.0
1.0
2.0
pr(
|Y)
0 2 4 6 8
040
080
0
mcm
c se
quen
ce
0 2 4 6 8
0.0
0.2
0.4
0.6
pr(
|Y)
N = 200 samples N = 1000 samplesnarrow g wide g narrow g wide g
Bayes NCSU QTL II: Yandell © 2005 24
full conditional for locus• cannot easily sample from locus full conditional
pr( |Y,X,,Q) = pr( | X,Q)= pr( Q | X, ) pr( ) /
constant• constant is very difficult to compute explicitly
– must average over all possible loci over genome– must do this for every possible genotype Q
• Gibbs sampler will not work in general– but can use method based on ratios of probabilities– Metropolis-Hastings is extension of Gibbs sampler
Bayes NCSU QTL II: Yandell © 2005 25
Metropolis-Hastings Step
• pick new locus based upon current locus– propose new locus from some distribution g( )
• pick value near current one? (usually)
• pick uniformly across genome? (sometimes)
– accept new locus with probability A• otherwise stick with current value
),(),|(pr)(pr
),(),|(pr)(pr,1min),(
newoldoldold
oldnewnewnewnewold gXQ
gXQA
Bayes NCSU QTL II: Yandell © 2005 26
4. sampling across architectures • search across genetic architectures M of various
sizes– allow change in m = number of QTL– allow change in types of epistatic interactions
• methods for search– reversible jump MCMC– Gibbs sampler with loci indicators
• complexity of epistasis– Fisher-Cockerham effects model– general multi-QTL interaction & limits of inference
Bayes NCSU QTL II: Yandell © 2005 27
search across genetic architectures
• think model selection in multiple regression– but regressors (QTL genotypes) are unknown
• linked loci are collinear regressors– correlated effects
• consider only main genetic effects here– epistasis just gets more complicated
or 211 qqqqq
GG
Bayes NCSU QTL II: Yandell © 2005 28
model selection in regression
• consider known genotypes Q at 2 known loci – models with 1 or 2 QTL
• jump between 1-QTL and 2-QTL models• adjust parameters when model changes
q1 estimate changes between models 1 and 2– due to collinearity of QTL genotypes
21
1
:2
:1
qqq
Gm
Gm
Bayes NCSU QTL II: Yandell © 2005 29
0.0 0.2 0.4 0.6 0.8
0.0
0.2
0.4
0.6
0.8
b1
b2
c21 = 0.7
Move Between Models
m=1
m=2
0.0 0.2 0.4 0.6 0.8
0.0
0.2
0.4
0.6
0.8
b1
b2
Reversible Jump Sequence
geometry of reversible jump
1 1
2
2
Bayes NCSU QTL II: Yandell © 2005 30
0.05 0.10 0.15
0.0
0.05
0.10
0.15
b1
b2
a short sequence
-0.3 -0.1 0.1
0.0
0.1
0.2
0.3
0.4
first 1000 with m<3
b1
b2
-0.2 0.0 0.2
geometry allowing Q and to change
1 1
2
2
Bayes NCSU QTL II: Yandell © 2005 31
reversible jump MCMC idea
• Metropolis-Hastings updates: draw one of three choices– update m-QTL model with probability 1-b(m+1)-d(m)
• update current model using full conditionals• sample m QTL loci, effects, and genotypes
– add a locus with probability b(m+1)• propose a new locus and innovate new genotypes & genotypic effect• decide whether to accept the “birth” of new locus
– drop a locus with probability d(m)• propose dropping one of existing loci• decide whether to accept the “death” of locus
• Satagopan Yandell (1996, 1998); Sillanpaa Arjas (1998); Stevens Fisch (1998)
– these build on RJ-MCMC idea of Green (1995); Richardson Green (1997)
0 L1 m+1 m2 …
Bayes NCSU QTL II: Yandell © 2005 32
Gibbs sampler with loci indicators • consider only QTL at pseudomarkers
– every 1-2 cM– modest approximation with little bias
• use loci indicators in each pseudomarker = 1 if QTL present = 0 if no QTL present
• Gibbs sampler on loci indicators – relatively easy to incorporate epistasis– Yi, Yandell, Churchill, Allison, Eisen, Pomp (2005 Genetics)
• (see earlier work of Nengjun Yi and Ina Hoeschele)
2211 qqq
G