
Report from the Statistics Forum, ATLAS Week Physics Plenary, CERN, 23 June 2011 (G. Cowan)


  • Report from the Statistics Forum

    ATLAS Week Physics Plenary, CERN, 23 June 2011
    Glen Cowan, Eilam Gross, Kyle Cranmer, on behalf of the ATLAS Statistics Forum

  • Outline

      • Recent issues concerning upper limits
      • Update to recommendation for PCL
      • New software for low background analyses
      • Interactions with the CMS Statistics Group
      • Provisional agreement for summer conferences
      • Longer-term issues

  • Setting Limits

    There are several methods one may use for setting limits:

      • One-sided (frequentist), e.g., PCL, CLs
      • Unified intervals (Feldman-Cousins)
      • Bayesian

    In ATLAS, we have recommended using Power-Constrained Limits (PCL) and also reporting CLs limits to allow for comparison with CMS.

    This recommendation was adopted by Physics Coordination after the Statistics Workshop held on 15 April 2011, and will be revisited at the upcoming PC meeting on 27 June 2011.

  • PCL Quick Review (see arXiv:1105.3166)

    Consider a parameter $\mu$ proportional to the rate of signal ($\mu \ge 0$).

    Naive upper limits can exclude parameter values to which one has little or no sensitivity (for s ... 95%).

    PCL addresses the problem by regarding $\mu$ as excluded if:

      (a) It is excluded by a statistical test at 95% CL.
      (b) One has sufficient sensitivity to $\mu$.

    Here sensitivity is measured by the power $M_0(\mu)$ of a test of $\mu$ with respect to the background-only alternative, i.e., require $M_0(\mu) \ge M_{\min}$.


  • PCL in practice

    [Figure: observed limit compared with the median (unconstrained) limit and the $\pm 1\sigma$ band of the limit distribution assuming $\mu = 0$; PCL with $M_{\min} = 0.16$. Annotation: where the power is below threshold, do not exclude.]

    Important to report both the constrained and unconstrained limits.

  • Choice of minimum power

    Choice of $M_{\min}$ is convention. Formally it should be large relative to $1 - \mathrm{CL}$ (5%). Earlier we proposed $M_{\min} = \Phi(-1) \approx 0.16$, because in the case of $x \sim \mathrm{Gauss}(\mu, \sigma)$ this means that one applies the power constraint if the observed limit fluctuates down by one standard deviation or more.

    For the Gaussian example, this gives $\mu_{\min} = 0.64\sigma$, i.e., the lowest limit is similar to the intrinsic resolution of the measurement ($\sigma$).

    We have recently revisited this point and now propose moving the minimum power to $M_{\min} = 0.5$, i.e., PCL never goes below the median limit under the assumption of background only.
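The quoted value of $\mu_{\min}$ can be checked with a short worked calculation for the Gaussian example. The sketch assumes the one-sided test at $\alpha = 0.05$ with upper limit $\mu_{\rm up} = x + \sigma\,\Phi^{-1}(1-\alpha)$, as in arXiv:1105.3166.

```latex
\begin{align}
M_0(\mu) &= P(\mu_{\rm up} < \mu \mid \mu = 0)
          = \Phi\!\left(\frac{\mu}{\sigma} - \Phi^{-1}(1-\alpha)\right), \\
M_0(\mu) \ge M_{\min}
  &\;\Longleftrightarrow\;
  \mu \ge \mu_{\min} = \sigma\left(\Phi^{-1}(1-\alpha) + \Phi^{-1}(M_{\min})\right), \\
M_{\min} = \Phi(-1):\quad
  \mu_{\min} &= \sigma\,(1.645 - 1) \approx 0.64\,\sigma, \\
M_{\min} = 0.5:\quad
  \mu_{\min} &= \sigma\,(1.645 + 0) \approx 1.64\,\sigma .
\end{align}
```

With $M_{\min} = 0.5$ the threshold coincides with the median of $\mu_{\rm up}$ under the background-only hypothesis, since the median of $x$ is then zero.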


  • Aggressive conservatism

    It could be that, owing to practical constraints, certain systematic uncertainties are over-estimated in an analysis; this could be justified by wanting to be conservative.

    The consequence of this will be that the $\pm 1\sigma$ bands of the unconstrained limit are broader than they otherwise would be.

    If the unconstrained limit fluctuates low, it could be that the PCL limit, constrained at the $-1\sigma$ band, is lower than it would be had the systematics been estimated correctly. Being conservative could be more aggressive.

    If the power constraint $M_{\min}$ is at 0.5, then by inflating the systematics the median of the unconstrained limit is expected to move less, and in any case upwards, i.e., it will lead to a less strong limit (as one would expect from conservatism).

  • Upper limits for Gaussian problem

    [Figure: upper limit as a function of the measurement, compared with the (unknown) true value.]

  • Coverage probability for Gaussian problem

    [Figure: coverage probability $P(\mu_{\rm up} \ge \mu \mid \mu)$ as a function of the (unknown) true value $\mu$.]
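The coverage probability of the power-constrained limit in the Gaussian problem can be estimated with toy experiments. The sketch below assumes $x \sim \mathrm{Gauss}(\mu, \sigma)$ with known $\sigma$ and $M_{\min} = 0.5$; the function names, the number of toys and the seed are illustrative choices.

```python
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal


def pcl_limit(x, m_min=0.5, sigma=1.0, alpha=0.05):
    """Power-constrained upper limit for a single Gaussian measurement x."""
    z = PHI.inv_cdf(1.0 - alpha)
    mu_up = x + sigma * z                         # unconstrained limit
    mu_min = sigma * (z + PHI.inv_cdf(m_min))     # power threshold
    return max(mu_up, mu_min)


def coverage(mu_true, n_toys=100_000, seed=1, sigma=1.0):
    """Fraction of toy experiments whose limit lies at or above mu_true."""
    rng = random.Random(seed)
    covered = sum(
        pcl_limit(rng.gauss(mu_true, sigma), sigma=sigma) >= mu_true
        for _ in range(n_toys)
    )
    return covered / n_toys


if __name__ == "__main__":
    for mu in (0.0, 1.0, 2.0, 4.0):
        print(mu, coverage(mu))
```

Below the threshold $\mu_{\min} \approx 1.64\sigma$ the limit can never fall under the true value, so the coverage is 100%; above it the coverage is that of the unconstrained one-sided test.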


  • PCL: summary of recent developments

      • Proposal to move minimum power from 16% to 50%; power constraint applied at the median limit.
      • Improvement of approximations used for low-count analyses.
      • New code available (see Statistics Forum twiki): https://twiki.cern.ch/twiki/bin/view/AtlasProtected/StatisticsTools
      • Substantial improvement in speed.
      • Substantial progress on documentation, including background on method and implementation details (see twiki).

  • New frequentist limit document

    https://twiki.cern.ch/twiki/pub/AtlasProtected/StatisticsTools/Frequentist_Limit_Recommendation.pdf

  • New usage details on twiki

    Example from twiki of how to determine whether asymptotic formulae are valid.

    The new scripts implement the appropriate procedures for different regimes, e.g., asymptotic, b < 10, b > 10.
    https://twiki.cern.ch/twiki/bin/view/AtlasProtected/FrequentistLimitRecommendationImplementation

  • Interactions with the CMS Statistics Group

    Interaction between the ATLAS and CMS statistics groups began several years ago in the context of the Higgs combination; this effort continues in the separate LHC Higgs Combination Group.

    In addition, the meetings between the ATLAS and CMS Statistics Groups have increased this year with the goal of agreeing on statistical tools and practice to facilitate comparison and eventual combination of results.

      • ATLAS: G. Cowan, E. Gross, K. Cranmer, O. Vitells, W. Murray
      • CMS: R. Cousins, L. Lyons, L. Demortier, T. Dorigo

  • Discussion on Limits with CMS

    Within CMS it has been recommended to use at least one of the three methods mentioned in the PDG Statistics Review:

      • Bayesian
      • CLs
      • Feldman-Cousins

    In ATLAS, we have recommended using Power-Constrained Limits (PCL) and also reporting CLs limits to allow for comparison with CMS.

    In recent meetings with CMS we have listed the mathematical properties of the various limits, and on these we essentially agree. There is some disagreement on the importance that one should attach to different properties.

  • Properties of Frequentist Limits (1)

      • One-sided (PCL, CLs) versus unified (Feldman-Cousins): exclude parameter values because the predicted rate is higher than the data, or because the prediction disagrees with the data on other grounds (e.g., likelihood ratio with respect to a two-sided alternative).
      • Coverage: substantial over-coverage for CLs and for the upper edge of F-C. Exact for the full interval of F-C. Exact for PCL in the region of sensitivity; 100% otherwise.
      • Flip-flopping: violation of coverage if the decision to report a limit or a two-sided interval is based on the data. Not a problem for F-C. For one-sided limits it can be avoided if for every search one always reports the upper limit and the discovery significance (the p-value of the background-only hypothesis, $p_0$).
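The over-coverage of CLs can be illustrated in the Gaussian example. The sketch below assumes $x \sim \mathrm{Gauss}(\mu, \sigma)$ with known $\sigma$, for which $\mathrm{CL}_s(\mu) = \Phi((x-\mu)/\sigma)\,/\,\Phi(x/\sigma)$; the function names are hypothetical and the toy settings are arbitrary.

```python
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal


def unconstrained_limit(x, sigma=1.0, alpha=0.05):
    """Plain one-sided upper limit: solves p_mu = alpha."""
    return x + sigma * PHI.inv_cdf(1.0 - alpha)


def cls_limit(x, sigma=1.0, alpha=0.05):
    """CLs upper limit: solves p_mu / (1 - p_b) = alpha for mu.

    p_mu = Phi((x - mu)/sigma) and 1 - p_b = Phi(x/sigma).
    Not intended for extremely negative x, where Phi(x/sigma) underflows.
    """
    return x - sigma * PHI.inv_cdf(alpha * PHI.cdf(x / sigma))


def cls_coverage(mu_true, n_toys=50_000, seed=1, sigma=1.0):
    """Fraction of toys in which the CLs limit lies at or above mu_true."""
    rng = random.Random(seed)
    covered = sum(
        cls_limit(rng.gauss(mu_true, sigma), sigma) >= mu_true
        for _ in range(n_toys)
    )
    return covered / n_toys


if __name__ == "__main__":
    for x in (-2.0, 0.0, 2.0, 5.0):
        print(x, unconstrained_limit(x), cls_limit(x))
    print(cls_coverage(1.0))
```

Because $\alpha\,\Phi(x/\sigma) \le \alpha$, the CLs limit is never below the unconstrained one, and the two agree when $x \gg \sigma$; the difference at small $x$ is what produces coverage above the nominal 95%.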


  • Properties of Frequentist Limits (2)

      • Avoiding exclusion in cases with little/no sensitivity:
          PCL: discontinuous separation of (in)sensitive regions.
          CLs: ratio of p-values gives a penalty against low sensitivity.
          F-C: counts probability of upwards fluctuation for upper limit.
      • Power (related to median limit under background-only hypothesis):
          PCL: most powerful for region with sensitivity; zero otherwise.
          CLs: less powerful than PCL.
          F-C: upper edge as limit less powerful than PCL, CLs, but full interval also has power relative to higher values of $\mu$.
      • Correspondence with Bayesian result for some prior: CLs yes; F-C yes (approx.); PCL no.

  • Properties of Frequentist Limits (3)

      • Negatively biased relevant subsets (NBRS): related to the conditional coverage probability given that the outcome is observed in some identifiable subset of data space. PCL, CLs, F-C do not have NBRS. If one also conditions on m, all methods have (adapted) NBRS.
      • Familiarity in HEP community: CLs widely used. F-C used for many problems but not often as a replacement for upper limits. PCL is new, but core concepts are textbook statistics and documentation is now greatly improved: arXiv:1105.3166 and info on method and implementation on https://twiki.cern.ch/twiki/bin/view/AtlasProtected/StatisticsTools

  • Areas where ATLAS and CMS agree

    Both collaborations support RooStats as the software tool for combinations. See, e.g., K. Cranmer talk at PHYSTAT 2011: https://indico.cern.ch/conferenceOtherViews.py?view=standard&confId=107747

    Within both collaborations there are many who support the Bayesian approach, especially for limits (see, e.g., talks by A. Harel and D. Casadei at PHYSTAT 2011). There is a recent effort in ATLAS to establish recommendations for Bayesian limits (Georgios Choudalakis, Diego Casadei).

    Within both ATLAS and CMS there exist different views on unfolding, with a strong tendency away from use of bin-by-bin factors (see, e.g., talks by G. Choudalakis and M. Weber at PHYSTAT 2011).

  • Discussions on Discovery with CMS

    The two collaborations broadly agree on how to report the significance of a discovery. The test statistic recommended in ATLAS coincides with the Feldman-Cousins approach for testing the background-only model.

    There is also support in both collaborations for an approximate correction for the Look-Elsewhere Effect (LEE) using the approach of Gross and Vitells (EPJC 70 (2010) 525, arXiv:1005.1891; arXiv:1105.4355).

    There is no controversy if analyses correct for the LEE exactly (e.g., floating-mass Higgs search), as long as the uncorrected (e.g., fixed-mass) discovery significance is also reported.

    Both collaborations have made some progress in studying Bayesian Model Selection using Bayes Factors (ongoing).
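As a rough illustration of the approximate Look-Elsewhere correction, the sketch below uses the one-dimensional upcrossing bound of arXiv:1005.1891 for a single signal-strength parameter with $\mu \ge 0$: $p_{\rm global} \lesssim p_{\rm local} + \langle N(c_0)\rangle\, e^{-(c - c_0)/2}$, where $c = Z_{\rm local}^2$. The function names are hypothetical, and the mean number of upcrossings at the reference level $c_0$ is an input that would have to be estimated from background-only toys; the numbers in the usage lines are made up.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal


def global_p_value(z_local, n_up_ref, c_ref):
    """Approximate upper bound on the global p-value (one search dimension).

    z_local  : local significance, with q0 = z_local**2
    n_up_ref : mean number of upcrossings of q0(m) at the level c_ref
    c_ref    : low reference level at which upcrossings were counted
    """
    c = z_local ** 2
    p_local = 1.0 - PHI.cdf(z_local)              # = P(chi2_1 > c) / 2
    n_up = n_up_ref * math.exp(-(c - c_ref) / 2.0)
    return min(1.0, p_local + n_up)


def significance(p_value):
    """Convert a one-sided p-value (0 < p < 1) to a Gaussian significance."""
    return -PHI.inv_cdf(p_value)


if __name__ == "__main__":
    p_glob = global_p_value(z_local=3.0, n_up_ref=4.0, c_ref=1.0)
    print(p_glob, significance(p_glob))
```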


  • Summary and conclusions (1)

    PCL solves the problem of spurious exclusion by separating the parameter space into regions in which one has/hasn't sufficient sensitivity, as given by the probability to reject $\mu$ if the background-only model is true.

    Recommendations for ATLAS:

      • Report unconstrained limit.
      • Report power-constrained limit (with power $M_0(\mu) \ge 0.5$).
      • Report p-value of background-only hypothesis.
      • Also report CLs.

    In problems with low background, there is a recent improvement to the software implementation related to the treatment of nuisance parameters.

    ATLAS also has an ongoing effort to establish recommendations for Bayesian limits (Georgios Choudalakis, Diego Casadei).

  • Summary and conclusions (2)

    Discussions with the CMS Statistics Group are ongoing. The goal is to agree on statistical tools and practice to facilitate comparison and eventual combination of results.

    There is broad agreement in a number of areas, but still non-trivial issues concerning limits:

      • one-sided vs. unified
      • PCL vs. CLs

    We essentially agree on the mathematical properties of the approaches; debate is on the relative importance of various properties.

    Provisional agreement to use CLs as basis for comparison; in the longer term a Bayesian limit may play this role.

  • Extra slides

  • Some reasons to consider increasing $M_{\min}$

    $M_{\min}$ is supposed to be substantially greater than $\alpha$ (5%). So $M_{\min} = 16\%$ is fine for $1 - \alpha = 95\%$, but if we ever want $1 - \alpha = 90\%$, then 16% is not large compared to 10%; $\mu_{\min} = 0.28\sigma$ starts to look small relative to the intrinsic resolution of the measurement. Not an issue if we stick to 95% CL.

    PCL with $M_{\min} = 16\%$ is often substantially lower than CLs. This is because of the conservatism of CLs (see coverage).

    But the goal is not to get a lower limit per se, rather to use a test with higher power in those regions where one feels there is enough sensitivity to justify exclusion, and to allow for easy communication of coverage (95% for $\mu \ge \mu_{\min}$; 100% otherwise).

  • A few further considerations

    Obtaining PCL requires the distribution of unconstrained limits, from which one finds the $M_{\min}$ (16%, 50%) percentile. In some analyses this can entail calculational issues that are expected to be less problematic for $M_{\min} = 50\%$ than for 16%.

    Analysts produce the median limit anyway, even in the absence of the error bands, so with $M_{\min} = 50\%$ the burden on the analyst is reduced somewhat (but one would still want the error bands).

    We therefore recently proposed moving $M_{\min}$ to 50%.
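The percentile step can be sketched with background-only toys. The example assumes the Gaussian model $x \sim \mathrm{Gauss}(\mu, \sigma)$, so the toy result can be compared with the analytic threshold; in a real analysis the toy limits would come from the full limit-setting procedure. Function names, toy count and seed are illustrative.

```python
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal


def toy_limits(n_toys=100_000, sigma=1.0, alpha=0.05, seed=1):
    """Sorted unconstrained upper limits from background-only (mu = 0) toys."""
    rng = random.Random(seed)
    z = PHI.inv_cdf(1.0 - alpha)
    return sorted(rng.gauss(0.0, sigma) + sigma * z for _ in range(n_toys))


def power_threshold(sorted_limits, m_min):
    """Empirical m_min-quantile of the background-only limit distribution."""
    index = min(int(m_min * len(sorted_limits)), len(sorted_limits) - 1)
    return sorted_limits[index]


def apply_power_constraint(mu_up_observed, sorted_limits, m_min):
    """Report the larger of the observed limit and the power threshold."""
    return max(mu_up_observed, power_threshold(sorted_limits, m_min))


if __name__ == "__main__":
    limits = toy_limits()
    for m_min in (PHI.cdf(-1.0), 0.5):
        print(m_min, power_threshold(limits, m_min))
```

The quantile works because the power $M_0(\mu)$ is the probability, under background only, that the unconstrained limit falls below $\mu$, i.e., the cumulative distribution of the toy limits evaluated at $\mu$.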


  • Treatment of nuisance parameters

    In most problems, the data distribution is not uniquely specified by $\mu$ but contains nuisance parameters $\theta$.

    This makes it more difficult to construct an (unconstrained) interval with correct coverage probability for all values of $\theta$, so sometimes approximate methods are used ("profile construction").

    More importantly for PCL, the power $M_0(\mu)$ can depend on $\theta$. So which value of $\theta$ should be used to define the power?

    Since the power represents the probability to reject $\mu$ if the true value is $\mu = 0$, to find the distribution of $\mu_{\rm up}$ we take the values of $\theta$ that best agree with the data for $\mu = 0$: $\hat{\hat{\theta}}(0)$.

    This may seem counterintuitive, since the measure of sensitivity now depends on the data. We are simply using the data to choose the most appropriate value of $\theta$ where we quote the power.
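As a concrete case of "the values of $\theta$ that best agree with the data for $\mu = 0$", consider a simple on/off counting model. The model is an assumption made here for illustration and is not taken from the slides: $n \sim \mathrm{Poisson}(\mu s + b)$ in the signal region and $m \sim \mathrm{Poisson}(\tau b)$ in a control region, with nuisance parameter $\theta = b$. Function names are hypothetical.

```python
import math


def conditional_mle_b(n, m, tau):
    """Value of b that maximises L(mu = 0, b) for the on/off model."""
    return (n + m) / (1.0 + tau)


def log_likelihood_bkg_only(b, n, m, tau):
    """ln L(mu = 0, b), dropping terms that do not depend on b."""
    return n * math.log(b) - b + m * math.log(tau * b) - tau * b


if __name__ == "__main__":
    n_obs, m_obs, tau = 5, 20, 2.0   # made-up counts and scale factor
    b_hat_hat = conditional_mle_b(n_obs, m_obs, tau)
    # Background-only toys used to build the distribution of mu_up
    # would then be generated with b fixed to this value.
    print(b_hat_hat)
```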


  • ATLAS/CMS discussions on one-sided limits

    Some prefer to report one-sided frequentist upper limits (CLs, PCL); others prefer unified (Feldman-Cousins) limits, where the lower edge may or may not exclude zero.

    The prevailing view in the ATLAS Statistics Forum has been that in searches for new phenomena, one wants to know whether a cross section is excluded on the basis that its predicted rate is too high relative to the observation, not excluded on some other grounds (e.g., a mixture of too high or too low).

    Among statisticians there is support for both approaches.

  • Discussions concerning flip-flopping

    One-sided limits (CLs, PCL) can suffer from flip-flopping, i.e., violation of coverage probability if one decides, based on the data, whether to report an upper limit or a measurement with error bars (two-sided interval).

    This can be avoided by always reporting:

      (1) An upper limit based on a one-sided test.
      (2) The discovery significance (equivalent to the p-value of the background-only hypothesis).

    In practice, "always" can mean "for every analysis carried out as a search", i.e., until the existence of the process is well established (e.g., $5\sigma$). I.e., we only require what is done in practice to map approximately onto the idealized infinite ensemble.
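The coverage violation from flip-flopping can be computed directly in the Gaussian example with $\sigma = 1$. The policy below is a hypothetical one chosen for illustration: report a one-sided 95% upper limit if $x$ is below a threshold (3 here, an arbitrary choice), otherwise report a central 95% interval.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal


def flip_flop_coverage(mu, threshold=3.0, cl=0.95):
    """Coverage for x ~ Gauss(mu, 1) when the reported interval depends on x."""
    z_one = PHI.inv_cdf(cl)                       # one-sided quantile
    z_two = PHI.inv_cdf(1.0 - (1.0 - cl) / 2.0)   # two-sided quantile

    # Upper limit reported when x < threshold; it covers mu if x >= mu - z_one.
    p_limit = max(0.0, PHI.cdf(threshold - mu) - PHI.cdf(-z_one))

    # Central interval reported when x >= threshold; covers if |x - mu| <= z_two.
    lower = max(threshold, mu - z_two)
    p_interval = max(0.0, PHI.cdf(z_two) - PHI.cdf(lower - mu))

    return p_limit + p_interval


if __name__ == "__main__":
    for mu in (0.0, 2.0, 4.0, 6.0, 10.0):
        print(mu, flip_flop_coverage(mu))
```

For true values near the switching threshold the coverage drops below the nominal 95%, whereas always reporting the one-sided upper limit keeps it at 95% for every $\mu$.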


  • Discussions on CLs and F-C

    CLs has been criticized as a method for preventing spurious exclusion, as it leads to significant overcoverage that is in practice not communicated to the reader. This was the motivation behind PCL.

    We have also not supported using the upper edge of a Feldman-Cousins interval as a substitute for a one-sided upper limit, since when used in this way F-C has lower power.

    Furthermore, F-C unified intervals protect against small (or null) intervals by counting the probability of upward data fluctuations, which are not relevant if the goal is to establish an upper limit.

  • Discussions concerning PCL

    PCL has been criticized as it does not obviously map onto a Bayesian result for some choice of prior (CLs = Bayesian for special cases, e.g., $x \sim \mathrm{Gauss}(\mu, \sigma)$, constant prior for $\mu \ge 0$).

    We are not convinced of the need for this. The frequentist properties of PCL are well defined, and as with all frequentist limits one should not interpret them as representing Bayesian credible intervals.

    Further criticism of PCL is related to an unconstrained limit that could exclude all values of $\mu$. A remnant of this problem could survive after application of the power constraint (cf. negatively biased relevant subsets).

    PCL does not have negatively biased relevant subsets (nor does our unconstrained limit, as it never excludes $\mu = 0$).

    On both points, debate is still ongoing.