CMB Power Spectrum Estimation with Hamiltonian Sampling
J.F. Taylor, M.A.J. Ashdown, M.P. Hobson
March 17, 2008
Outline
- Aim of power spectrum estimation
- Standard methods
- Sampling
- Gibbs
- Hamiltonian Monte Carlo
- Tests on simulated data
- Extension to polarisation
- What are the new challenges?
- Preliminary results
The aim of power spectrum estimation
Figures WMAP science team
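- Pixelised CMB map $t(x_p)$
$$t(x_p) = \sum_{\ell=0}^{\ell_{\max}} \sum_{m=-\ell}^{\ell} a_{\ell m} Y_{\ell m}(x_p)$$
- For an isotropic Gaussian CMB
$$\langle a_{\ell m} a^*_{\ell' m'} \rangle = C_\ell \, \delta_{\ell\ell'} \delta_{mm'}$$
- Observed data $d = s + n$, where $s = Rt = RYa$
  - pixelised map: $R = WB$
  - time-ordered data: $R = AB$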
Aim of power spectrum estimation
- non-stationary, correlated noise
- beams
- cut-sky / partial coverage
- foregrounds
- systematics
- large data-sets
  - WMAP: $N_{\rm pix} \approx 3 \times 10^6$, $\ell_{\max} \approx 1000$
  - Planck: $N_{\rm pix} \approx 5 \times 10^7$, $\ell_{\max} \approx 2500$ to $3000$
Figures WMAP science team
pseudo-$C_\ell$ estimators
- Frequentist approach (Peebles 1973; Hivon et al. 2002)
- compute the spherical harmonic coefficients $a_{\ell m}$ of the data map
$$\hat{C}_\ell = \frac{1}{2\ell+1} \sum_m |a_{\ell m}|^2$$
- Clearly different from the true $C_\ell$: apply corrections for the
  1. cut
  2. noise
  3. beams
  4. filtering ...
(Hivon et al. 2002)
- Fast
- but sub-optimal, particularly at low $\ell$
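The estimator on this slide can be sketched in a few lines. This is only a minimal illustration, assuming a full-sky, noise-free map and a real-valued harmonic basis with 2l+1 independent coefficients per multipole, so none of the cut, noise or beam corrections are needed. The function names `simulate_alm` and `pseudo_cl` and the toy spectrum are hypothetical and not taken from the talk.

```python
import random

def simulate_alm(cl, rng):
    """Draw 2l+1 real Gaussian coefficients of variance C_l for each multipole l."""
    return [[rng.gauss(0.0, c ** 0.5) for _ in range(2 * ell + 1)]
            for ell, c in enumerate(cl)]

def pseudo_cl(alm):
    """C_hat_l = sum_m |a_lm|^2 / (2l + 1); each row holds the 2l+1 coefficients of one l."""
    return [sum(a * a for a in row) / len(row) for row in alm]

rng = random.Random(0)
cl_true = [1.0 / (ell + 1) ** 2 for ell in range(200)]   # assumed toy spectrum
cl_hat = pseudo_cl(simulate_alm(cl_true, rng))

# The scatter of cl_hat / cl_true shrinks as 2l+1 grows (cosmic variance).
ratios = [h / t for h, t in zip(cl_hat, cl_true)]
```

At low multipoles only a handful of coefficients enter each average, which is one way to see why the estimate is noisiest there.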
Maximum-likelihood
- likelihood
$$L = \Pr(d|C_\ell) = \int \Pr(d|s) \Pr(s|C_\ell) \, ds$$
- Gaussian noise and signal
$$L \propto \int e^{-\frac{1}{2}(d-s)^T N^{-1}(d-s)} \, e^{-\frac{1}{2} s^T S^{-1} s} \, ds$$
where $N = \langle nn^T \rangle$ and $S = \langle ss^T \rangle$ are non-sparse
- Complete the square and integrate
$$\ln L = \mathrm{constant} - \frac{1}{2}\left\{ \ln|S+N| + d^T (S+N)^{-1} d \right\}$$
- Obtain the ML estimate using an iterative algorithm, e.g. Newton–Raphson
- Basic method requires storage $O(N_{\rm pix}^2)$, operations $O(N_{\rm pix}^3)$
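As a rough illustration of where the $O(N_{\rm pix}^3)$ cost comes from, the log-likelihood above can be evaluated with a dense Cholesky factorisation of $S + N$. The sketch below is a toy under stated assumptions: small dense matrices stored as nested lists, a hand-written Cholesky routine to stay self-contained, and hypothetical function names. A real analysis would use an optimised linear algebra library.

```python
import math

def cholesky(c):
    """Lower-triangular factor of a symmetric positive-definite matrix (O(n^3))."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def log_likelihood(d, S, N):
    """ln L up to a constant: -0.5 * ( ln|S+N| + d^T (S+N)^-1 d )."""
    n = len(d)
    cov = [[S[i][j] + N[i][j] for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    # Forward substitution low y = d, so that d^T cov^-1 d = y^T y.
    y = []
    for i in range(n):
        y.append((d[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (log_det + sum(v * v for v in y))

# Two-pixel example with a correlated signal and white noise.
example = log_likelihood([1.0, 1.0],
                         [[2.0, 1.0], [1.0, 2.0]],
                         [[1.0, 0.0], [0.0, 1.0]])
```

The factorisation dominates the cost and the covariance matrix dominates the storage, which is what limits this direct approach to small maps.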
more ML
There are a number of shortcuts
- Solve matrix equations with conjugate gradient: $O(N_{\rm iter} N_{\rm pix}^2)$
- Compute traces with Monte Carlo: $O(N_{\rm MC} N_{\rm iter} N_{\rm pix}^2)$
- Ring torus method, for some (i.e. Planck) scanning strategies $S$, $N$ have the same block-diagonal form: $O(N_{\rm pix}^2)$
Feasible only up to $N_{\rm pix} \sim 10^4$ or $10^5$
Sampling as an alternative approach
We would actually like to know about the posterior distribution
$$\Pr(C_\ell|d)$$
It is possible to sample from the joint density
$$\Pr(C_\ell, a|d)$$
and we could marginalize over $a$.
- Need to sample in an extremely high-dimensional space
- Most conventional Monte Carlo methods move through parameter space by a random walk
Gibbs sampling
- For sampling multi-dimensional distributions
- Joint distribution $\Pr(\{x_i\})$ hard to sample
- Conditional distributions $\Pr(x_i|\{x_j\}_{j \neq i})$ tractable
- Sample from each conditional distribution in turn to build up the joint density
- No parameters to tune
- Explores parameter space by random walk
Gibbs sampling
Collect samples by alternately drawing from the conditional distributions for $C_\ell$ and $a$
$$a^{i+1} \leftarrow \Pr(a|C_\ell^i, d) \propto \Pr(d|a) \Pr(a|C_\ell^i)$$
$$C_\ell^{i+1} \leftarrow \Pr(C_\ell|a^i, d) \propto \Pr(a^i|C_\ell) \Pr(C_\ell)$$
- $C_\ell$ step is simple ... but slow for low signal to noise ... so bin
- $a$ step is hard
- Limited to Gaussian noise and signal
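The alternating scheme can be illustrated on a deliberately simple case. The sketch below assumes a single multipole with 2l+1 real modes, $R$ equal to the identity, white noise of known variance and a flat prior on $C_\ell$; under those assumptions the $a$ step is a per-mode Gaussian draw and the $C_\ell$ step is a scaled inverse chi-squared draw with 2l-1 degrees of freedom. Each $C_\ell$ draw here conditions on the most recent $a$. The function name and all numerical values are hypothetical.

```python
import random

def gibbs_single_multipole(d, noise_var, n_iter, rng, c_start=1.0):
    """Toy Gibbs chain for one multipole; returns the sequence of C_l samples."""
    n = len(d)                        # n = 2l + 1 modes
    c = c_start
    chain = []
    for _ in range(n_iter):
        # a step: Gaussian with Wiener-filter mean and variance, mode by mode.
        var = c * noise_var / (c + noise_var)
        a = [rng.gauss(dm * c / (c + noise_var), var ** 0.5) for dm in d]
        # C step (flat prior assumed): C = sum(a^2) / chi2 with n - 2 dof.
        chi2 = rng.gammavariate(0.5 * (n - 2), 2.0)
        c = sum(am * am for am in a) / chi2
        chain.append(c)
    return chain

rng = random.Random(1)
ell, c_true, noise_var = 200, 4.0, 1.0
d = [rng.gauss(0.0, (c_true + noise_var) ** 0.5) for _ in range(2 * ell + 1)]
chain = gibbs_single_multipole(d, noise_var, 1000, rng)
c_mean = sum(chain[200:]) / len(chain[200:])   # discard the first 200 draws
```

Lowering `c_true` relative to `noise_var` makes successive $C_\ell$ draws more strongly correlated, which is the low signal-to-noise slowdown noted above.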
Gibbs sampling
Signal sample
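- distribution is multivariate Gaussian in $a$ space, with mean $\hat{a}$ and covariance $V$
$$\hat{a} = \left(S^{-1} + R^t N^{-1} R\right)^{-1} R^t N^{-1} d$$
$$V = \left(S^{-1} + R^t N^{-1} R\right)^{-1}$$
- performed using transformed white noise sampling
- solved using conjugate gradient method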
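One common way to realise transformed white noise sampling is to add suitably scaled unit-variance noise to the right-hand side of the linear system, so that a single solve returns a draw with mean $\hat{a}$ and covariance $V$. The sketch below assumes $R$ is the identity and $S$, $N$ are diagonal, so the system reduces to a scalar division per mode and no conjugate gradient solver is needed. The function name and the numbers are hypothetical.

```python
import random

def draw_signal(d, s_var, n_var, rng):
    """One draw per mode from N(a_hat, V) using a white-noise right-hand side."""
    sample = []
    for dm, s, n in zip(d, s_var, n_var):
        # rhs = N^-1 d + S^-1/2 w0 + N^-1/2 w1, with w0, w1 unit Gaussians.
        rhs = dm / n + rng.gauss(0.0, 1.0) / s ** 0.5 + rng.gauss(0.0, 1.0) / n ** 0.5
        # Solve (S^-1 + N^-1) x = rhs; trivial in this diagonal toy case.
        sample.append(rhs / (1.0 / s + 1.0 / n))
    return sample

rng = random.Random(4)
# Single mode with d = 3, S = 2, N = 1: a_hat = 2 and V = 2/3.
draws = [draw_signal([3.0], [2.0], [1.0], rng)[0] for _ in range(20000)]
draw_mean = sum(draws) / len(draws)
draw_var = sum((x - draw_mean) ** 2 for x in draws) / len(draws)
```

The noise terms have total variance $S^{-1} + N^{-1}$, so after multiplying by $V$ the draw has variance $V$, as required.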
Computational cost
- write matrix equations in the form of spherical harmonic transforms (SHTs): storage $O(N_{\rm pix})$, operations $O(N_{\rm pix}^{3/2})$
- need to construct preconditioner
- hence approach requires $\sim 100$ to $200$ SHTs per $a$ sample
Hamiltonian Monte Carlo
- Proposed by Duane et al. (1987)
- Draws parallels between sampling and classical dynamics
- Introduces persistent motion into the Markov chain → moves through high-dimensional spaces efficiently.
- For each parameter $x_i$ we introduce a momentum $p_i$ and define the Hamiltonian
$$H = \sum_i \frac{p_i^2}{2 m_i} + \psi(x)$$
where $\psi(x) = -\log\{\Pr(x)\}$ and $m_i$ is a fictional mass associated with each variable.
Hamiltonian Monte Carlo
- Draw momenta $p_i$ from a Gaussian with variance $m_i$
- Move $x$, $p$ along a trajectory according to Hamilton's equations using a simple iterative scheme.
- After a random time, test the candidate point with the Metropolis rule.
- If using the exact Hamiltonian and the trajectory is accurate, the acceptance rate will be 100%.
- Explores correlations and degeneracies with relative ease.
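The steps on this slide can be sketched for a small target. The example below is an illustration only: it assumes a two-dimensional Gaussian with correlation 0.9 as the target, unit masses, and arbitrary ranges for the randomised step size and trajectory length. The names `psi`, `grad_psi` and `hmc_step` are hypothetical.

```python
import math
import random

RHO = 0.9  # assumed correlation of the toy Gaussian target
PREC = [[1 / (1 - RHO ** 2), -RHO / (1 - RHO ** 2)],
        [-RHO / (1 - RHO ** 2), 1 / (1 - RHO ** 2)]]

def psi(x):
    """Potential psi = -log Pr(x) up to a constant."""
    return 0.5 * sum(x[i] * PREC[i][j] * x[j] for i in range(2) for j in range(2))

def grad_psi(x):
    return [sum(PREC[i][j] * x[j] for j in range(2)) for i in range(2)]

def hmc_step(x, rng, mass=1.0):
    """One HMC update: draw momenta, integrate, then apply the Metropolis rule."""
    p = [rng.gauss(0.0, math.sqrt(mass)) for _ in x]
    h_old = psi(x) + sum(pi * pi for pi in p) / (2 * mass)
    tau = rng.uniform(0.1, 0.2)          # randomised step size
    n_steps = rng.randint(10, 30)        # randomised trajectory length
    xn, pn, g = list(x), list(p), grad_psi(x)
    for _ in range(n_steps):
        pn = [pi - 0.5 * tau * gi for pi, gi in zip(pn, g)]
        xn = [xi + tau * pi / mass for xi, pi in zip(xn, pn)]
        g = grad_psi(xn)
        pn = [pi - 0.5 * tau * gi for pi, gi in zip(pn, g)]
    h_new = psi(xn) + sum(pi * pi for pi in pn) / (2 * mass)
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return xn, True
    return x, False

rng = random.Random(2)
x, samples, accepted = [0.0, 0.0], [], 0
for _ in range(5000):
    x, ok = hmc_step(x, rng)
    samples.append(x)
    accepted += ok
acceptance = accepted / len(samples)
```

Because the integrator nearly conserves $H$, most proposals are accepted even though each one moves a long way along the degenerate direction of the target.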
Hamiltonian Sampling
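- Draw samples for $C_\ell$ and $a$ simultaneously.
- 'Potential'
$$\psi = \frac{1}{2}(d - Ra)^t N^{-1} (d - Ra) + \sum_\ell \left(\ell + \frac{1}{2}\right)\left(\ln C_\ell + \frac{\sigma_\ell}{C_\ell}\right)$$
- Gradient
$$\nabla_a \psi = -R^t N^{-1} (d - Ra) + \frac{a_{\ell m}}{C_\ell}$$
$$\nabla_{C_\ell} \psi = \left(\ell + \frac{1}{2}\right)\frac{1}{C_\ell}\left(1 - \frac{\sigma_\ell}{C_\ell}\right)$$
where $\sigma_\ell = \frac{1}{2\ell+1}\sum_m |a_{\ell m}|^2$ is the spectrum of the signal
- recall $R = BY$
- one SHT for potential
- two SHTs for gradient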
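A quick way to check signs and factors in the gradient is to differentiate the potential directly in a toy setting and compare against finite differences. The sketch below assumes $R$ is the identity, white noise of variance `nvar`, and a real-valued harmonic basis with 2l+1 coefficients per multipole. Under those assumptions, differentiating $\psi$ gives $-(d - a)/\mathrm{nvar} + a_{\ell m}/C_\ell$ for the $a$ gradient, because the factor $(\ell + 1/2)$ cancels against the $1/(2\ell+1)$ in $\sigma_\ell$. All names and numbers are hypothetical.

```python
import math

def sigma(row):
    """sigma_l = sum_m a_lm^2 / (2l + 1) for one multipole."""
    return sum(a * a for a in row) / len(row)

def potential(a, cl, d, nvar):
    chi2 = sum((dv - av) ** 2
               for drow, arow in zip(d, a) for dv, av in zip(drow, arow)) / nvar
    prior = sum((ell + 0.5) * (math.log(c) + sigma(row) / c)
                for ell, (row, c) in enumerate(zip(a, cl)))
    return 0.5 * chi2 + prior

def gradients(a, cl, d, nvar):
    """Analytic gradients of the toy potential with respect to a and C_l."""
    grad_a = [[-(dv - av) / nvar + av / c for dv, av in zip(drow, arow)]
              for drow, arow, c in zip(d, a, cl)]
    grad_cl = [(ell + 0.5) / c * (1.0 - sigma(row) / c)
               for ell, (row, c) in enumerate(zip(a, cl))]
    return grad_a, grad_cl

# Multipoles l = 0, 1, 2 with 1, 3 and 5 coefficients respectively.
a = [[0.3], [0.5, -1.2, 0.7], [1.1, -0.4, 0.2, 0.9, -0.6]]
d = [[0.1], [0.9, -1.0, 0.2], [1.5, -0.1, 0.0, 1.2, -1.1]]
cl = [1.5, 0.8, 0.4]
nvar = 0.25
grad_a, grad_cl = gradients(a, cl, d, nvar)
```

Perturbing one coefficient or one $C_\ell$ by a small amount and differencing `potential` should reproduce the corresponding entry of `grad_a` or `grad_cl`.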
WMAP
- Simulated W-band map
- $N_{\rm side} = 512$, i.e. $3 \times 10^6$ pixels
- Noise and beams as for a combined W-band map
- Kp2 mask ($\sim 15\%$ of the sky)
- Currently takes a day on a workstation to process up to $\ell_{\max} = 512$
- Initially we had problems with long correlation lengths, but we now better understand how to keep these low.
Polarisation
- Small signal
- Low multipoles of particular interest
- Dominant foregrounds ... large mask
- Ambiguity between E and B
Possible to separate E/B in maps, but we consider estimating spectra directly from the data.
PolSpice
Estimates power spectrum using correlation functions (Chon et al. 2004; Szapudi, Prunet & Colombi 2001).
BB spectrum with full sky (top) and with WMAP cut (bottom)
Sampler
BB spectrum with full sky (blue) and with WMAP cut (red). Mean spectrum rather than maximum likelihood.
Conclusions
- Sampling provides a fast and optimal framework for performing power spectrum estimation
- Hamiltonian Monte Carlo is a good candidate for performing the sampling ... fast and flexible
- Optimal estimates are needed for polarisation
next ...
- incorporate component separation?
Scaling with problem size
Suppressing the random walk → increased sampling efficiency for high-dimensional problems
Methods compared: Metropolis–Hastings, Gibbs, Hamiltonian MC
Blackwell-Rao
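We can use the $a$ samples to form a fast likelihood code (Wandelt et al. 2004; Chu et al. 2004).
- Allows us to compute $\Pr(C_\ell|d)$ for arbitrary values of $C_\ell$ given our samples
$$\Pr(C_\ell|d) \approx \frac{1}{N_{\rm samples}} \sum_{i}^{N_{\rm samples}} \Pr(C_\ell|\sigma_\ell^i)$$
For a Gaussian field
$$\Pr(C_\ell|\sigma_\ell) \propto \prod_{\ell=0}^{\infty} \frac{1}{\sigma_\ell}\left(\frac{\sigma_\ell}{C_\ell}\right)^{\frac{2\ell+1}{2}} \exp\left(-\frac{2\ell+1}{2}\frac{\sigma_\ell}{C_\ell}\right)$$
- Require large numbers of samples to analyse high resolution data exactly
- But certainly useful at low $\ell$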
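The estimator above is straightforward to evaluate in log space. The sketch below assumes the samples are stored as lists of $\sigma_\ell$ values indexed from $\ell = 0$, drops the normalisation constant, and averages with a log-sum-exp for numerical stability. The function names are hypothetical.

```python
import math

def log_p_cl_given_sigma(cl, sigma):
    """log Pr(C_l | sigma_l) up to a constant, summed over multipoles from l = 0."""
    total = 0.0
    for ell, (c, s) in enumerate(zip(cl, sigma)):
        k = 0.5 * (2 * ell + 1)
        total += -math.log(s) + k * (math.log(s / c) - s / c)
    return total

def blackwell_rao_log_posterior(cl, sigma_samples):
    """log of the sample average of Pr(C_l | sigma_l^i), via log-sum-exp."""
    logs = [log_p_cl_given_sigma(cl, s) for s in sigma_samples]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs) / len(logs))

# Two hypothetical sigma_l samples covering l = 0, 1, 2.
sigma_samples = [[2.0, 1.0, 0.5], [2.2, 0.9, 0.55]]
log_post = blackwell_rao_log_posterior([2.1, 1.0, 0.5], sigma_samples)
```

Once the $\sigma_\ell$ samples are stored, the posterior can be evaluated at any trial spectrum without returning to the map.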
A convergence diagnostic
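Hanson (2001) proposed the following diagnostic that makes use of gradient information. Compare two estimates of the variance that depend differently on where our samples lie in the distribution.
$$\mathrm{var}_1(x) = \int_{-\infty}^{\infty} (x - \bar{x})^2 \Pr(x) \, dx$$
$$\mathrm{var}_2(x) = \frac{1}{3}\int_{-\infty}^{\infty} (x - \bar{x})^3 \Pr(x) \frac{\partial\psi(x)}{\partial x} \, dx + \frac{1}{3}\Big[(x - \bar{x})^3 \Pr(x)\Big]_{-\infty}^{\infty}$$
For most 'interesting' distributions the second term is zero. So we compute the ratio from a set of samples $\{x_k\}$
$$R = \frac{\sum_k (x_k - \bar{x})^3 \left.\frac{\partial\psi(x)}{\partial x}\right|_{x_k}}{3 \sum_k (x_k - \bar{x})^2}$$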
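The ratio is simple to compute from a chain. The sketch below assumes a unit Gaussian target, for which $\partial\psi/\partial x = x$, and compares a well-sampled set with a set artificially restricted to $|x| < 1$ to mimic a chain that has not explored the tails. The function name and the truncation are hypothetical choices for illustration.

```python
import random

def hanson_ratio(samples, grad_psi):
    """R = sum_k (x_k - mean)^3 * dpsi/dx(x_k) / (3 * sum_k (x_k - mean)^2)."""
    mean = sum(samples) / len(samples)
    num = sum((x - mean) ** 3 * grad_psi(x) for x in samples)
    den = 3.0 * sum((x - mean) ** 2 for x in samples)
    return num / den

rng = random.Random(3)
full = [rng.gauss(0.0, 1.0) for _ in range(50000)]
truncated = [x for x in full if abs(x) < 1.0]     # tails deliberately missing

r_full = hanson_ratio(full, lambda x: x)          # close to 1 when converged
r_truncated = hanson_ratio(truncated, lambda x: x)  # well below 1
```

Because the cubic weighting emphasises the tails, a chain that has not reached them gives a ratio noticeably different from one.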
Hamiltonian Monte Carlo
Proposed by Duane et al. (1987)
- Draws parallels between sampling and classical dynamics
- Introduces persistent motion through the parameter space
For each parameter $x_i$ we introduce a momentum $p_i$ and define the Hamiltonian
$$H = \sum_i \frac{p_i^2}{2 m_i} + \psi(x)$$
where $\psi(x) = -\log\{\Pr(x)\}$ and $m_i$ is a fictional mass associated with each variable. (In general we can have a mass matrix $M$.)
Hamiltonian Monte Carlo
1. draw new momenta $p_i$ from a Gaussian with variance $m_i$
2. propagate $x$, $p$ along a trajectory in the $(x, p)$ space from Hamilton's equations
$$\frac{\partial p}{\partial t} = -\frac{\partial H}{\partial x}, \qquad \frac{\partial x}{\partial t} = \frac{\partial H}{\partial p}$$
3. after some (randomised) length of time, halt and accept or reject the new point according to the Metropolis rule
4. discard the $p$ variables; the $x$ values are samples from $\Pr(x)$
- We can use any Hamiltonian we like to define our trajectory as long as we use the correct Hamiltonian to make the accept/reject decision
- If we use the true Hamiltonian and simulate the dynamics exactly then every proposed point will be accepted, by conservation of energy.
The leapfrog method
- Simple method for following dynamics
- Robust to numerical errors
- Reversible
- iterate for $T = n\tau$
- randomise $\tau$ and $n$ to avoid resonance conditions
$$p(t + \tau/2) = p(t) - \frac{\tau}{2} \nabla_x \psi(x)\big|_{x = x(t)}$$
$$x(t + \tau) = x(t) + \frac{\tau}{m} \, p(t + \tau/2)$$
$$p(t + \tau) = p(t + \tau/2) - \frac{\tau}{2} \nabla_x \psi(x)\big|_{x = x(t + \tau)}$$
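The three update equations translate directly into code. The sketch below assumes a one-dimensional harmonic potential $\psi(x) = x^2/2$ with unit mass, integrates forward, then flips the momentum and integrates again to illustrate reversibility. The function names and the step size are hypothetical choices.

```python
def leapfrog(x, p, grad_psi, tau, n_steps, mass=1.0):
    """Integrate Hamilton's equations for n_steps leapfrog steps of size tau."""
    for _ in range(n_steps):
        p = p - 0.5 * tau * grad_psi(x)   # half step in momentum
        x = x + tau * p / mass            # full step in position
        p = p - 0.5 * tau * grad_psi(x)   # second half step in momentum
    return x, p

def energy(x, p, mass=1.0):
    """H = p^2 / (2m) + psi(x) for the assumed potential psi = x^2 / 2."""
    return 0.5 * p * p / mass + 0.5 * x * x

x0, p0 = 1.0, 0.5
x1, p1 = leapfrog(x0, p0, lambda x: x, 0.1, 50)

# Reversibility: flipping the momentum and integrating again returns to the start.
x2, p2 = leapfrog(x1, -p1, lambda x: x, 0.1, 50)

# The energy is not conserved exactly, but the error stays small and bounded.
energy_error = abs(energy(x1, p1) - energy(x0, p0))
```

The small residual energy error is what the Metropolis accept/reject step corrects for.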