27
ÉCOLE POLYTECHNIQUE DÉRALE DE LAUSANNE http://stat.epfl.ch January 2019 – slide 1 Almost-perfect Signal Detection Anthony Davison

ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

http://stat.epfl.ch January 2019 – slide 1

Almost-perfect SignalDetection

Anthony Davison

Page 2: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Banff 2006

http://stat.epfl.ch January 2019 – slide 2

Page 3: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Simple example

http://stat.epfl.ch January 2019 – slide 3

� Define set of cuts to select possible signal.

� Expect background b (± error) from uninteresting sources.

� Observe n events

1. For n rather larger than b, quantify significance of deviation (strongsignificance required, i.e. “5σ”).

2. For n ≈ b, establish upper limit on possible excess from interestingnew source.

� Background and calibration efficiencies (intensity of signal) should beconsidered as nuisance parameters.

� Typically there are many search channels (different configurations inwhich the particle could be produced and decay).

Page 4: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Statistical model

http://stat.epfl.ch January 2019 – slide 4

At Banff the following model was proposed:

Ni ∼ Poisson(λ1i ψ + λ2i),

Yi ∼ Poisson(λ2i ti),

Zi ∼ Poisson(λ1i ui),

� i = 1, . . . , c, where c is the number of search channels,

� Y and Z represent auxiliary independent experiments meant to measurebackground and intensity respectively,

� ti and ui are known positive constants,

� different channels are assumed independent,

� Then the parameters are

– ψ, the interest parameter (signal of interest),

– λ = (λ11, λ21, . . . , λ1c, λ2c), the nuisance parameter.

Page 5: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Likelihood

http://stat.epfl.ch January 2019 – slide 5

� The proposed Poisson model has log likelihood

ℓ(ψ, λ) = log f(n, y, z;ψ, λ)

≡c∑

i=1

{ni log(λ1i ψ + λ2i) + zi log λ1i

+ yi log λ2i − (ψ + ui)λ1i − (1 + ti)λ2i}

� A (3c, 2c+ 1) curved exponential family.

� A full exponential family with a single channel (c = 1).

� For physicists, ψ ≥ 0.

� But for the use of the model ψ > −λ2i/λ1i, for each i, so can haveψ < 0, at least mathematically.

� Aim to use higher order likelihood inference for ψ.

� Details for this example in Davison and Sartori (2008, Statistical Science).

Page 6: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Higher Order Likelihood Inference

Detecting a Signal

Higher OrderLikelihoodInference

First order

Higher order

Nuisance parameters

Properties

Bayes

Application toSignal Detection

Neutrino ordering

http://stat.epfl.ch January 2019 – slide 6

Page 7: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

First order likelihood theory

http://stat.epfl.ch January 2019 – slide 7

� Parametric model f(y; θ) with (notional?) sample size n, log likelihoodℓ(θ), observed information j(θ) = −∂2ℓ(θ)/∂θ2, MLE θ.

� If θ scalar, first order inference based on limiting N(0, 1) laws of

likelihood root, r(θ) = sign(θ − θ)[2{ℓ(θ)− ℓ(θ)

}]1/2;

score statistic, s(θ) = j(θ)−1/2∂ℓ(θ)/∂θ;

Wald statistic, t(θ) = j(θ)1/2(θ − θ) :

as n→ ∞ have, for instance,

P {r(θ) ≤ robs} = Φ(robs) +O(n−1/2) ,

which yields tests and confidence sets for θ based on an observed robs.

Page 8: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Higher order likelihood theory

http://stat.epfl.ch January 2019 – slide 8

� For continuous responses, third order inference based on limiting N(0, 1)distribution of modified likelihood root, also called modified directeddeviance,

r∗(θ) = r(θ) +1

r(θ)log

{q(θ)

r(θ)

},

where q(θ) depends on model, can be s(θ), t(θ), or similar.

� Inference uses significance function

Φ{r∗(θ)}·

∼ U(0, 1), for true θ.

� Level 1− 2α confidence interval contains those θ for which

α ≤ Φ{r∗(θ)} ≤ 1− α.

Page 9: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

With nuisance parameters?

http://stat.epfl.ch January 2019 – slide 9

� When θ = (ψ, λ), with ψ scalar

r(ψ) = sign(ψ − ψ)[2{ℓ(θ)− ℓ(θψ)

}]1/2

where θψ is the MLE of θ for fixed ψ.

� The function q(ψ) is

q(ψ) =| ϕ(θ)− ϕ(θψ) ϕλ(θψ) |

| ϕθ(θ) |

{| j(θ) |

| jλλ(θψ) |

}1/2

where ϕ (which has the dimension of θ) is the canonical parameter of alocal exponential family approximation to the model and where, forexample, ϕθ denotes the matrix ∂ϕ/∂θT.

Page 10: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Properties of higher order approximations

http://stat.epfl.ch January 2019 – slide 10

� Parameterization-invariant.

� Computation almost as easy as first order asymptotics.

� Error O(n−3/2) in continuous response models.

� Gives continuous approximation to discrete response models.

� Relative (not absolute) error, so highly accurate in tails.

� Bayesian version with prior π uses

qB(ψ) = ℓ′p(ψ)jp(ψ)−1/2

∣∣∣jλλ(θψ)∣∣∣

∣∣∣jλλ(θ)∣∣∣

1/2

π(θ)

π(θψ),

where ℓp(ψ) = ℓ(θψ) is the profile log likelihood for ψ, and jp = −ℓ′′p isthe corresponding information. Again, easy computation and highaccuracy.

Page 11: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Bayesian approach

http://stat.epfl.ch January 2019 – slide 11

� Aim for prior that is noninformative for ψ in presence of nuisance parameter ξ.

� Tibshirani (1989, Biometrika) shows that this prior is proportional to

|iψψ(ψ, ξ)|1/2g(ξ) dψdξ,

when ξ is orthogonal to ψ and

– iψψ(ψ, ξ) denotes the (ψ, ψ) element of the Fisher information matrix,

– g(ξ) is an arbitrary positive function satisfying mild regularity conditions.

� This gives a Jeffreys prior for ψ that is also a matching prior: it gives (1− α)one-sided Bayesian posterior confidence intervals that contain ψ with frequentistprobability (1− α) +O(n−1).

� Calculations are (miraculously) explicit for the Poisson model.

Page 12: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Publicite

http://stat.epfl.ch January 2019 – slide 12

Brazzale, Davison, Reid (2007) Applied Asymptotics, Cambridge University Press.

Page 13: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Application to Signal Detection

Detecting a Signal

Higher OrderLikelihood Inference

⊲Application toSignal Detection

Single channel

Simulation

Discussion

Neutrino ordering

http://stat.epfl.ch January 2019 – slide 13

Page 14: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Single channel (c = 1): Example

http://stat.epfl.ch January 2019 – slide 14

� The data are

n y z t u

1 8 14 27 80

� The MLE of ψ is 4.02.

� The p-value for testing ψ = 0 versus ψ > 0 is 0.16276 for r and 0.12714for r∗, both indicating almost no evidence of signal.

� The 0.99 lower and upper limits for ψ obtained from r are

−2.64 ( 7→ 0), 33.84.

� The analogous limits obtained from r∗ are

−2.60 ( 7→ 0), 36.52.

Page 15: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Single channel: An example

http://stat.epfl.ch January 2019 – slide 15

0 10 20 30 40

0.0

0.2

0.4

0.6

0.8

1.0

psi

Sig

nific

ance

func

tion

Page 16: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

One channel: Coverage of confidence limits

http://stat.epfl.ch January 2019 – slide 16

0 5 10 15 20 25

0.86

0.88

0.90

0.92

0.94

0.96

0.98

1.00

ψ

cove

rage

Single channel, 90%

0 5 10 15 20 25

0.98

00.

985

0.99

00.

995

1.00

cove

rage

Single channel, 99%

Target coverage (red), coverage of r∗ (black), coverage of r∗B (dashes), as a functionof interest parameter ψ.

Page 17: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Multichannel simulation, c = 10

http://stat.epfl.ch January 2019 – slide 17

TABLE 3

Empirical coverage probabilities in a multiple-channel simulation

with 10,000 replications, ψ = 2, β = (0.20,0.30,0.40, . . . ,1.10),

γ = (0.20,0.25,0.30, . . . ,0.65), t = (15,17,19, . . . ,33) and

u = (50,55,60, . . . ,95)

Probability r r∗

r∗

B

0.0100 0.0099 0.0101 0.0109

0.0250 0.0244 0.0255 0.0273

0.0500 0.0493 0.0519 0.0542

0.1000 0.0967 0.1012 0.1035

0.5000 0.4869 0.5043 0.5027

0.9000 0.8900 0.9013 0.8942

0.9500 0.9421 0.9499 0.9427

0.9750 0.9687 0.9759 0.9689

0.9900 0.9875 0.9913 0.9864

Figures in bold differ from the nominal level by more than simula-

tion error.

Page 18: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Multichannel simulation, c = 10

http://stat.epfl.ch January 2019 – slide 18

0 5 10 15 20 25

0.80

0.85

0.90

0.95

1.00

ψ

cove

rage

Multiple channels, 90%

0 5 10 15 20 25

0.97

50.

980

0.98

50.

990

0.99

51.

000

ψco

vera

ge

Multiple channels, 99%

Target coverage (red), coverage of r∗ (black), coverage of r∗B (dashes), as a functionof interest parameter ψ.

Page 19: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Discussion

http://stat.epfl.ch January 2019 – slide 19

� Modified likelihood root can yield highly accurate inference in this toyproblem.

� It’s pretty good even with many nuisance parameters.

� Some (boundary) cases give problems with q(ψ): then we use r(ψ).

� Noninformative Bayesian solution provides (slightly) worse confidenceintervals, but is quite feasible, could instead use an informative prior.

Page 20: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Discussion

http://stat.epfl.ch January 2019 – slide 19

� Modified likelihood root can yield highly accurate inference in this toyproblem.

� It’s pretty good even with many nuisance parameters.

� Some (boundary) cases give problems with q(ψ): then we use r(ψ).

� Noninformative Bayesian solution provides (slightly) worse confidenceintervals, but is quite feasible, could instead use an informative prior.

Now for neutrinos . . .

Page 21: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Neutrino ordering

Detecting a Signal

Higher OrderLikelihood Inference

Application toSignal Detection

⊲ Neutrino ordering

Simulation

http://stat.epfl.ch January 2019 – slide 20

Page 22: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Fun with neutrinos

http://stat.epfl.ch January 2019 – slide 21

� Discussion based on Heavens and Sellentin (2018, arXiv): Objective Bayesian analysis of

neutrino masses and hierarchy, suggested by Louis.

� Neutrino masses: 0 ≤ µL ≤ µM ≤ µH meV

� Two hierarchies: normal and inverted

� Measurements y1 = 75meV2, y2 = 2524meV2, and probability models

– normal hierarchy:

y1 ∼ N (µ2

M − µ2

L, 1.82),

y2 ∼ N{µ2

H − (µ2

M + µ2

L)/2, 402},

– inverted hierarchy:

y1 ∼ N (µ2

H − µ2

M , 1.82),

y2 ∼ N{(µ2

H + µ2

M )/2− µ2

L, 402}

� 95% credible region: P(µL + µM + µH ≤ 120meV) = 0.95, interpreted as penalizedlikelihood corresponding to y3 = 0 observed from half-normal distribution

|N{µL + µM + µH , (120/1.96)2}|.

� Possible typo: replace y2 = 2524 by y2 = 2514 in inverted case?

Page 23: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

First idea

http://stat.epfl.ch January 2019 – slide 22

� Normal hierarchy indicated if

µ2H − µ2M > µ2M − µ2L ⇔ ψ = µ2H + µ2L − 2µ2M > 0,

so try and use higher order methods to test this.

� As a warm-up, try testing non-nested hypotheses:

– Cox (1961, Tests of Separate Families of Hypotheses, 4th BerkeleySymposium),

– Cox (1962, Further results on tests of separate families of hypotheses,JRSSB),

– massive literature in econometrics,

� Boils down to use of likelihood ratios for the models.

Page 24: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Separate families test with neutrino data

http://stat.epfl.ch January 2019 – slide 23

� Natural test statistic is difference of maximised log likelihoods,

T = 2(ℓNormal − ℓInverted

),

with observed value tobs = 1.748.

� T·

∼ normal with unknown mean and variance.

� We estimate the significance probabilities for testing the normal and invertedmodels,

pNormal = PNormal(T ≤ tobs), pInverted = PInverted(T ≥ tobs)

by simulating data from the best-fitting normal and inverted models, and get

pNormal ≈ 0.44, pInverted ≈ 0.55,

so the data do not distinguish the models.

� Equivalent to (two) bootstrap hypothesis tests (Davison and Hinkley, 1997,Bootstrap Methods and their Application, Chapter 4).

Page 25: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Profile log likelihood for µL

http://stat.epfl.ch January 2019 – slide 24

−200 −150 −100 −50 0 50 100

−45

−40

−35

−30

−25

muL

Twic

e lo

g lik

elih

ood

2ℓp(µL) for normal hierarchy (black) and inverted hierarchy (red)

Page 26: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Oops . . .

http://stat.epfl.ch January 2019 – slide 25

� On reflection the result is obvious: there are three parameters to match the means of y1and y2, and this can be done (almost) perfectly:

– normal model: µL = 0.238, µM = 8.665, µH = 50.611, E(y1) = 75.018,E(y2) = 2523.90,

– inverted model: µL = 0.009, µM = 49.856, µH = 50.603, E(y1) = 75.000,E(y2) = 2523.14,

so the only difference between the fits is due to the penalty, which is larger for theinverted model, for which µL + µM + µH ≈ 100, whereas for the normal model,µL + µM + µH ≈ 59.

� The same argument applies to Bayesian analyses: the log odds depends only on theextent that the credible region for the sum of masses favours the normal model, and theeffects of a prior.

� Check: replace 120meV by stronger penalty P(µL + µM + µH ≤ 12meV) = 0.95, whichdisfavours the inverted hierarchy even more, and now get

pNormal ≈ 0.72, pInverted ≈ 0.024,

so we would reject the inverted model at the 5% level, as expected.

� Conclusion: more (but different!) data are needed to distinguish the hierarchies.

Page 27: ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Almost ... · ÉC OLE POL Y TEC H NIQU E FÉ DÉR ALE D E LAUSANNE Banff 2006 fl.ch January 2019 – slide 2

ÉC OL E POL Y T EC H N I Q U EFÉ DÉRA LE D E L A U SAN N E

Thanks!

http://stat.epfl.ch January 2019 – slide 26