G. Cowan Report from the Statistics Forum / CERN, 23 June 2011 1 Report from the Statistics Forum ATLAS Week Physics Plenary CERN, 23 June, 2011 Glen Cowan,


Report from the Statistics Forum

ATLAS Week Physics Plenary

CERN, 23 June 2011

Glen Cowan, Eilam Gross, Kyle Cranmer*

* on behalf of the ATLAS Statistics Forum


Outline

Recent issues concerning upper limits

    Update to recommendation for PCL
    New software for low background analyses

Interactions with the CMS Statistics Group

    Provisional agreement for summer conferences
    Longer-term issues


Setting Limits

There are several methods one may use for setting limits:

    One-sided (frequentist), e.g., PCL, CLs
    Unified intervals (Feldman-Cousins)
    Bayesian

In ATLAS, we have recommended using Power-Constrained Limits (PCL) and also to report CLs limits to allow for comparison with CMS.

This recommendation was adopted by Physics Coordination after the Statistics Workshop held on 15 April 2011, and will be revisited at the upcoming PC meeting on 27 June 2011.


PCL Quick Review (see arXiv:1105.3166)

Consider a parameter μ proportional to the rate of signal (μ ≥ 0).

“Naive” upper limits can exclude parameter values to which one has little or no sensitivity (for s << b, exclusion prob ~ 5%).

CLs solves this by effectively penalizing the test of each parameter value by an amount that varies continuously with the sensitivity; result is a limit with coverage probability > 95%.

PCL addresses the problem by regarding μ to be excluded if:

(a) It is excluded by a statistical test at 95% CL.

(b) One has sufficient sensitivity to μ.

Here sensitivity is measured by the power M0(μ) of a test of μ with respect to the background-only alternative, i.e., require

M0(μ) = P(μ above limit | μ = 0) ≥ Mmin
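The two conditions can be sketched for the simplest case. This is only an illustration, assuming the Gaussian example x ~ Gauss(μ, σ) with the one-sided unconstrained limit μup = x + σ Φ⁻¹(0.95); the function names are hypothetical and are not taken from the ATLAS tools, and the default Mmin = 0.5 is simply one possible choice.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian, used for Phi and its inverse


def unconstrained_limit(x, sigma=1.0, cl=0.95):
    """One-sided upper limit for x ~ Gauss(mu, sigma) (assumed toy model)."""
    return x + sigma * STD_NORMAL.inv_cdf(cl)


def power_vs_background_only(mu, sigma=1.0, cl=0.95):
    """M0(mu) = P(mu above limit | mu = 0) = Phi(mu/sigma - z_cl)."""
    return STD_NORMAL.cdf(mu / sigma - STD_NORMAL.inv_cdf(cl))


def pcl_excludes(mu, x, sigma=1.0, cl=0.95, m_min=0.5):
    """True only if (a) the test rejects mu and (b) the power is at least m_min."""
    rejected = mu > unconstrained_limit(x, sigma, cl)
    sensitive = power_vs_background_only(mu, sigma, cl) >= m_min
    return rejected and sensitive
```

With x = −2σ the unconstrained limit is negative, so every μ ≥ 0 is rejected by the test alone; pcl_excludes still returns False for μ = 1σ because the power there is below Mmin.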


PCL in practice

[Figure: PCL with Mmin = 0.16. Plot labels: observed limit; median limit (unconstrained); ±1σ band of the limit distribution assuming μ = 0; annotation “Here power below threshold; do not exclude.”]

Important to report both the constrained and unconstrained limits.


Choice of minimum power

Choice of Mmin is convention. Formally it should be large relative to 1 – CL (5%). Earlier we have proposed

Mmin = Φ(−1) ≈ 0.16

because in the case of x ~ Gauss(μ,σ) this means that one applies the power constraint if the observed limit fluctuates down by one standard deviation or more.

For the Gaussian example, this gives μmin = 0.64σ, i.e., the lowest limit is similar to the intrinsic resolution of the measurement (σ).

We have recently revisited this point and now propose moving the minimum power to Mmin = 0.5, i.e., PCL never goes below the median limit under assumption of background only.
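The quoted values of μmin can be checked with a short derivation. It assumes the Gaussian example x ~ Gauss(μ, σ) with the one-sided unconstrained limit μup = x + σ Φ⁻¹(1 − α), where Φ is the standard Gaussian cumulative distribution; this limit is the standard one for that example rather than something stated on the slide.

```latex
\begin{align*}
M_0(\mu) &= P(\mu_{\mathrm{up}} < \mu \mid \mu = 0)
          = P\!\left(x < \mu - \sigma\,\Phi^{-1}(1-\alpha) \,\middle|\, \mu = 0\right)
          = \Phi\!\left(\frac{\mu}{\sigma} - \Phi^{-1}(1-\alpha)\right), \\
M_0(\mu) \ge M_{\min}
  &\;\Longleftrightarrow\;
  \mu \ge \mu_{\min} = \sigma\left[\Phi^{-1}(1-\alpha) + \Phi^{-1}(M_{\min})\right], \\
1-\alpha = 0.95,\; M_{\min} = \Phi(-1) \approx 0.16:
  &\quad \mu_{\min} = (1.645 - 1)\,\sigma \approx 0.64\,\sigma, \\
1-\alpha = 0.95,\; M_{\min} = 0.5:
  &\quad \mu_{\min} = 1.645\,\sigma .
\end{align*}
```

With Mmin = 0.5 the second term vanishes, so μmin coincides with the median of μup under the background-only hypothesis.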


Aggressive conservatism

It could be that owing to practical constraints, certain systematic uncertainties are over-estimated in an analysis; this could be justified by wanting to be conservative.

The consequence of this will be that the ±1σ bands of the unconstrained limit are broader than they otherwise would be.

If the unconstrained limit fluctuates low, it could be that the PCL limit, constrained at the −1σ band, is lower than it would be had the systematics been estimated correctly.

Being conservative could be more aggressive.

If the power constraint Mmin is at 0.5, then by inflating the systematics the median of the unconstrained limit is expected to move less, and in any case upwards, i.e., it will lead to a less strong limit (as one would expect from “conservatism”).


Upper limits for Gaussian problem

[Figure: upper limits for the Gaussian problem. Axes: measurement; (unknown) true value.]
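The curves in such a plot can be reproduced numerically. The sketch below assumes x ~ Gauss(μ, σ) with μ ≥ 0 and uses the standard closed forms for this example: μup = x + σ Φ⁻¹(1 − α) for the unconstrained limit, max(μup, μmin) for PCL, and x + σ Φ⁻¹(1 − α Φ(x/σ)) for CLs. The function name and the chosen x values are illustrative only.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard Gaussian


def limits(x, sigma=1.0, cl=0.95, m_min=0.5):
    """Return (unconstrained, PCL, CLs) upper limits for a measurement x."""
    alpha = 1.0 - cl
    z = PHI.inv_cdf(cl)
    unconstrained = x + sigma * z
    # Smallest mu whose power against the background-only alternative is m_min.
    mu_min = sigma * (z + PHI.inv_cdf(m_min))
    pcl = max(unconstrained, mu_min)
    # CLs limit: solve Phi((x - mu)/sigma) / Phi(x/sigma) = alpha for mu.
    cls = x + sigma * PHI.inv_cdf(1.0 - alpha * PHI.cdf(x / sigma))
    return unconstrained, pcl, cls


for x in (-2.0, -1.0, 0.0, 1.0, 2.0, 3.0):
    unc, pcl, cls = limits(x)
    print(f"x = {x:+.1f}  unconstrained = {unc:+.3f}  PCL = {pcl:.3f}  CLs = {cls:.3f}")
```

For strongly negative x the unconstrained limit drops below zero, PCL stays flat at μmin, and CLs decreases smoothly while remaining positive; for large x all three converge.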


Coverage probability for Gaussian problem

[Figure: coverage probability P(μ ≤ μup | μ) versus the (unknown) true value.]


PCL summary of recent developments

Proposal to move minimum power from 16% to 50%.

Power constraint applied at the median limit.

Improvement of approximations used for low-count analyses.

New code available (see Statistics Forum twiki):

https://twiki.cern.ch/twiki/bin/view/AtlasProtected/StatisticsTools

Substantial improvement in speed.

Substantial progress on documentation, including background on method and implementation details (see twiki).


New frequentist limit document

https://twiki.cern.ch/twiki/pub/AtlasProtected/StatisticsTools/Frequentist_Limit_Recommendation.pdf


New usage details on twiki

Example from twiki of how to determine whether asymptotic formulae are valid.

The new scripts implement the appropriate procedures for different regimes, e.g., asymptotic, b < 10, b > 10.

https://twiki.cern.ch/twiki/bin/view/AtlasProtected/FrequentistLimitRecommendationImplementation


Interactions with the CMS Statistics Group

Interaction between the ATLAS and CMS statistics groups began already several years ago in the context of the Higgs combination; this effort continues in the separate LHC Higgs Combination Group.

In addition, the meetings between the ATLAS and CMS Statistics Groups have increased this year with the goal of agreeing on statistical tools and practice to facilitate comparison and eventual combination of results.

ATLAS: G. Cowan, E. Gross, K. Cranmer, O. Vitells, W. Murray

CMS: R. Cousins, L. Lyons, L. Demortier, T. Dorigo


Discussion on Limits with CMS

Within CMS it has been recommended to use at least one of the three methods mentioned in the PDG Statistics Review:

    Bayesian
    CLs
    Feldman-Cousins

In ATLAS, we have recommended using Power-Constrained Limits (PCL) and also to report CLs limits to allow for comparison with CMS.

In recent meetings with CMS we have listed the mathematical properties of the various limits and on these we essentially agree.

There is some disagreement on the importance that one should attach to different properties.


Properties of Frequentist Limits (1)

One-sided (PCL, CLs) versus unified (Feldman-Cousins)

Exclude parameter values because the predicted rate is higher than the data, or because prediction ≠ data on other grounds (e.g., likelihood ratio wrt two-sided alternative).

Coverage

Substantial over-coverage for CLs and upper edge of F-C. Exact for full interval of F-C. Exact for PCL in region of sensitivity; 100% otherwise.

Flip-flopping

Violation of coverage if the decision to report a limit or a two-sided interval is based on the data. Not a problem for F-C. For one-sided limits it can be avoided if for every search one always reports the upper limit and the discovery significance (p-value of the background-only hypothesis, p0).


Properties of Frequentist Limits (2)

Avoiding exclusion in cases with little/no sensitivity.

    PCL: Discontinuous separation of sensitive and insensitive regions.
    CLs: Ratio of p-values → penalty against low sensitivity.
    F-C: Counts probability of upward fluctuation for upper limit.

Power (related to median limit under background-only hypothesis).

    PCL: Most powerful for region with sensitivity; zero otherwise.
    CLs: Less powerful than PCL.
    F-C: Upper edge as limit is less powerful than PCL and CLs, but the full interval also has power relative to higher values of μ.

Correspondence with Bayesian result for some prior

    CLs yes; F-C yes (approx.); PCL no.


Properties of Frequentist Limits (3)

Negatively biased relevant subsets

Related to conditional coverage probability given that outcome is observed in some identifiable subset of data space.

PCL, CLs, F-C do not have NBRS. If also condition on m, all methods have (adapted) NBRS.

Familiarity in HEP community

CLs widely used. F-C used for many problems but not often as a replacement for upper limits. PCL is new but core concepts are textbook statistics and documentation is now greatly improved: arXiv:1105.3166 and info on method and implementation on https://twiki.cern.ch/twiki/bin/view/AtlasProtected/StatisticsTools


Areas where ATLAS and CMS agree

Both collaborations support RooStats as the software tool for combinations. See, e.g., K. Cranmer talk at PHYSTAT 2011:

https://indico.cern.ch/conferenceOtherViews.py?view=standard&confId=107747

Within both collaborations there are many who support the Bayesian approach, especially for limits (see, e.g., talks by A. Harel and D. Casadei at PHYSTAT 2011).

Recent effort in ATLAS to establish recommendations for Bayesian limits (Georgios Choudalakis, Diego Casadei).

Within both ATLAS and CMS there exist different views on unfolding, with a strong tendency away from use of bin-by-bin factors. (See, e.g., talks by G. Choudalakis and M. Weber from PHYSTAT 2011.)




Discussions on Discovery with CMS

The two collaborations broadly agree on how to report the significance of a discovery.

The test statistic recommended in ATLAS coincides with the Feldman-Cousins approach for testing the background-only model.

There is also support in both collaborations for an approximate correction for the Look-Elsewhere Effect using the approach of Gross and Vitells (EPJC 70 (2010) 525, arXiv:1005.1891; arXiv:1105.4355).
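For orientation, the approximate correction can be sketched as follows. This assumes the one-dimensional case with one degree of freedom and a one-sided local test, where the bound of arXiv:1005.1891 reads p_global ≲ p_local + ⟨N(c0)⟩ exp(−(c − c0)/2) with c = Z_local². The reference level c0 and the mean number of upcrossings ⟨N(c0)⟩ are inputs that would have to be estimated separately (e.g., from a small number of toy experiments); the values used here are purely hypothetical.

```python
from math import exp
from statistics import NormalDist

PHI = NormalDist()


def global_p_value(z_local, n_upcrossings_ref, c_ref):
    """Approximate trials-corrected p-value (1 d.o.f., one-sided local test)."""
    c = z_local ** 2                      # test-statistic level of the local excess
    p_local = 1.0 - PHI.cdf(z_local)      # local p-value
    # Expected upcrossings at level c, extrapolated from the reference level.
    expected_upcrossings = n_upcrossings_ref * exp(-(c - c_ref) / 2.0)
    return min(1.0, p_local + expected_upcrossings)


def significance(p):
    """Convert a p-value to an equivalent one-sided Gaussian significance."""
    return PHI.inv_cdf(1.0 - p)


# Hypothetical inputs: 3 sigma local excess, 4 upcrossings on average at c0 = 1.
p_glob = global_p_value(3.0, n_upcrossings_ref=4.0, c_ref=1.0)
print(f"global p-value = {p_glob:.4f}, global significance = {significance(p_glob):.2f}")
```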

And there is no controversy if analyses correct for LEE exactly (e.g., floating-mass Higgs search), as long as the uncorrected (e.g., fixed-mass) discovery significance is also reported.

Both collaborations have made some progress in studying Bayesian Model Selection using Bayes Factors (ongoing).


Summary and conclusions (1)

PCL solves the problem of “spurious exclusion” by separating the parameter space into regions in which one has/hasn’t sufficient sensitivity as given by the probability to reject μ if the background-only model is true.

Recommendations for ATLAS:

    Report unconstrained limit.
    Report power-constrained limit (with power M0(μ) ≥ 0.5).
    Report p-value of background-only hypothesis.
    Also report CLs.

In problems with low background, recent improvement to software implementation related to treatment of nuisance params.

ATLAS also has ongoing effort to establish recommendations for Bayesian limits (Georgios Choudalakis, Diego Casadei).



Summary and conclusions (2)

Discussions with the CMS Statistics Group are ongoing.

Goal is to agree on statistical tools and practice to facilitate comparison and eventual combination of results.

Broad agreement in a number of areas but still non-trivial issues concerning limits:

    one-sided vs. unified
    PCL vs. CLs

We essentially agree on the mathematical properties of the approaches; debate is on relative importance of various properties.

Provisional agreement to use CLs as basis for comparison; in longer term Bayesian limit may play this role.


Extra slides


Some reasons to consider increasing Mmin

Mmin is supposed to be “substantially” greater than α (5%).

So Mmin = 16% is fine for 1 – α = 95%, but if we ever want 1 – α = 90%, then 16% is not “large” compared to 10%; μmin = 0.28σ starts to look small relative to the intrinsic resolution of the measurement. Not an issue if we stick to 95% CL.

PCL with Mmin = 16% is often substantially lower than CLs. This is because of the conservatism of CLs (see coverage).

But goal is not to get a lower limit per se, rather

● to use a test with higher power in those regions where one feels there is enough sensitivity to justify exclusion and

● to allow for easy communication of coverage (95% for μ ≥ μmin; 100% otherwise).


A few further considerations

Obtaining PCL requires the distribution of unconstrained limits, from which one finds the Mmin (16%, 50%) percentile.

In some analyses this can entail calculational issues that are expected to be less problematic for Mmin = 50% than for 16%.

Analysts produce the median limit anyway, even in absence of the error bands, so with Mmin = 50% the burden on the analyst is reduced somewhat (but one would still want the error bands).

We therefore recently proposed moving Mmin to 50%.


Treatment of nuisance parameters

In most problems, the data distribution is not uniquely specified by μ but contains nuisance parameters θ.

This makes it more difficult to construct an (unconstrained) interval with correct coverage probability for all values of θ, so sometimes approximate methods are used (“profile construction”).

More importantly for PCL, the power M0(μ) can depend on θ. So which value of θ to use to define the power?

Since the power represents the probability to reject μ if the true value is μ = 0, to find the distribution of μup we take the values of θ that best agree with the data for μ = 0.

This may seem counterintuitive, since the measure of sensitivity now depends on the data. We are simply using the data to choose the most appropriate value of θ where we quote the power.
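As a small illustration of “the values of θ that best agree with the data for μ = 0”, consider a counting experiment that is assumed here purely for the example: n ~ Poisson(μs + b) in the signal region and m ~ Poisson(τb) in a control region, with the background b as the only nuisance parameter. Interpreting “best agree” as the conditional maximum-likelihood estimate for μ = 0 is also an assumption of this sketch; for this model it has the closed form (n + m)/(1 + τ).

```python
from math import log


def log_likelihood_bkg_only(b, n, m, tau):
    """log L(mu = 0, b) for n ~ Poisson(b), m ~ Poisson(tau * b); constants dropped."""
    return n * log(b) - b + m * log(tau * b) - tau * b


def conditional_mle_background(n, m, tau):
    """Value of b that maximizes the likelihood with mu fixed to 0."""
    return (n + m) / (1.0 + tau)


# Hypothetical counts: 3 events in the signal region, 20 in a control region
# that has twice the background acceptance (tau = 2).
b_hat_hat = conditional_mle_background(n=3, m=20, tau=2.0)
print(f"background used to quote the power: b = {b_hat_hat:.3f}")
```

The distribution of μup under μ = 0 would then be generated with b fixed to this value.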


ATLAS/CMS discussions on one-sided limits

Some prefer to report one-sided frequentist upper limits (CLs, PCL); others prefer unified (Feldman-Cousins) limits, where the lower edge may or may not exclude zero.

The prevailing view in the ATLAS Statistics Forum has been that in searches for new phenomena, one wants to know whether a cross section is excluded on the basis that its predicted rate is too high relative to the observation, not excluded on some other grounds (e.g., a mixture of too high or too low).

Among statisticians there is support for both approaches.


Discussions concerning flip-flopping

One-sided limits (CLs, PCL) can suffer from “flip-flopping”, i.e., violation of coverage probability if one decides, based on the data, whether to report an upper limit or a measurement with error bars (two-sided interval).

This can be avoided by “always” reporting:

(1) An upper limit based on a one-sided test.

(2) The discovery significance (equivalent to p-value of background-only hypothesis).

In practice, “always” can mean “for every analysis carried out as a search”, i.e., until the existence of the process is well established (e.g., 5σ).

I.e., we only require what is done in practice to map approximately onto the idealized infinite ensemble.
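The size of the flip-flopping effect can be estimated in a toy setting. The sketch assumes x ~ Gauss(μ, 1) and a hypothetical reporting policy: quote the 95% CL upper limit x + 1.645 when x is below a threshold, and the central 95% interval x ± 1.96 otherwise. The threshold of 3 is chosen only for illustration.

```python
from statistics import NormalDist

PHI = NormalDist()


def flip_flop_coverage(mu, threshold=3.0, cl=0.95):
    """Coverage of the data-dependent policy for x ~ Gauss(mu, 1)."""
    z_one = PHI.inv_cdf(cl)              # one-sided quantile (1.645 for 95%)
    z_two = PHI.inv_cdf(0.5 + cl / 2.0)  # two-sided quantile (1.960 for 95%)
    # Branch 1: x < threshold, the upper limit covers mu when x >= mu - z_one.
    p_limit = max(0.0, PHI.cdf(threshold - mu) - PHI.cdf(-z_one))
    # Branch 2: x >= threshold, the interval covers mu when |x - mu| <= z_two.
    lower = max(threshold, mu - z_two)
    p_interval = max(0.0, PHI.cdf(z_two) - PHI.cdf(lower - mu))
    return p_limit + p_interval


for mu in (0.0, 2.0, 3.5, 5.0, 8.0):
    print(f"mu = {mu:.1f}  coverage = {flip_flop_coverage(mu):.4f}")
```

Near the switching threshold the coverage of this policy falls to about 92.5%, below the nominal 95%, whereas the upper limit on its own covers at exactly 95% for every μ when it is always reported.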


Discussions on CLs and F-C

CLs has been criticized as a method for preventing spurious exclusion as it leads to significant overcoverage that is in practice not communicated to the reader.

This was the motivation behind PCL.

We have also not supported using the upper edge of a Feldman-Cousins interval as a substitute for a one-sided upper limit, since when used in this way F-C has lower power.

Furthermore F-C unified intervals protect against small (or null) intervals by counting the probability of upward data fluctuations, which are not relevant if the goal is to establish an upper limit.


Discussions concerning PCL

PCL has been criticized as it does not obviously map onto a Bayesian result for some choice of prior (CLs = Bayesian for special cases, e.g., x ~ Gauss(μ, σ), constant prior for μ ≥ 0).

We are not convinced of the need for this. The frequentist properties of PCL are well defined, and as with all frequentist limits one should not interpret them as representing Bayesian credible intervals.

Further criticism of PCL is related to an unconstrained limit that could exclude all values of μ. A remnant of this problem could survive after application of the power constraint (cf. “negatively biased relevant subsets”).

PCL does not have negatively biased relevant subsets (nor does our unconstrained limit, as it never excludes μ = 0).

On both points, debate still ongoing.
