G. CowanRHUL Physics Input from Statistics Forum for Exotics page 1
Input from Statistics Forum for Exotics
ATLAS Exotics Meeting
CERN/phone, 22 January, 2009
Glen Cowan
Physics Department
Royal Holloway, University of London
[email protected]/~cowan
Input from: Eilam Gross, Samir Ferrag
Intro
Contributions to the Statistics Forum from the Exotics group over the last year have raised questions in several areas:
methods for setting limits, establishing discovery,
methods for incorporating systematic uncertainties,
approval of software, methods, ...
The purpose of this talk is to address some of these issues as part of an ongoing discussion (not yet definitive answers).
Some pointers to info -- StatForum Webpage: twiki.cern.ch/twiki/bin/viewauth/AtlasProtected/StatisticsTools
including notes in Statistics FAQ and also 1st half of the Higgs Combination chapter of CSC Book (p 1480).
Statistics Forum Website: FAQ
Some general items: PDG Chapters, Pedestrian's guide, Glossary, ...
Statistics Forum FAQ Notes
This is a living document
Statistics Forum FAQ Notes
The "FAQ" consists of a collection of notes on specific questions, use cases, examples, ...
Bayesian methods for ATLAS Higgs search (GC)
Comparison of significance from profile and integrated likelihoods (GC, EG)
Discovery significance with statistical uncertainty in the background estimate (EG, OV, GC)
Error analysis for efficiency (GC)
How to measure efficiency (DC)
MC statistical errors in ML fits (GC)
Covariance matrix for histogram made using seed events (GC)
If you have a note which you think should be included here, or if you are interested in writing such a note, commenting on a note, or requesting a note on a specific subject, please let us know.
Some statistics issues in searches
(1) Define appropriate test variable(s).
Cut-based
Multivariate method (Fisher, NN, BDT, SVM, ...)
(2) Determine its (their) distribution(s) under hypothesis of: background only, background + (parametrized) signal, ...
Data-driven or MC, parametric or histogram, ...
Quantify systematic uncertainties.
(3) Measure the distribution in data; quantify level of agreement between data and predictions (results in limits, discovery significance).
Exclusion limits (Neyman, CLs, Bayesian)
Discovery significance (frequentist, Bayesian)
Multivariate methods – brief comment
Most searches in the CSC book use physically motivated cut-based selection:
analysis easy to understand and easy to spot anomalous behaviour.
But a nonlinear decision boundary between signal and background leads in general to higher sensitivity.
Many new tools on the market (see e.g. TMVA manual): Boosted Decision Trees, K-Nearest Neighbour/Kernel-based Density Estimation, Support Vector Machines, ...
Multivariate analysis suffers some loss of transparency, but a significant result from an MVA, supported by e.g. one from cuts, could win the race.
Search formalism
Define a test variable whose distribution is sensitive to whether the hypothesis is background-only or signal + background.
E.g. count n events in signal region:
$n \sim \mathrm{Poisson}(\mu s + b)$
where n = events found, s = expected signal, b = expected background, and $\mu = \sigma_s / \sigma_{s,\mathrm{nominal}}$ is the strength parameter.
Search formalism with multiple bins (channels)
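Bin i of a given channel has $n_i$ events; expectation value is
$E[n_i] = \mu s_i + b_i$
Expected signal and background are:
$s_i = s_{\mathrm{tot}} \int_{\mathrm{bin}\,i} f_s(x; \theta_s)\,dx$, $b_i = b_{\mathrm{tot}} \int_{\mathrm{bin}\,i} f_b(x; \theta_b)\,dx$
$\mu$ is the global strength parameter, common to all channels. $\mu = 0$ means background only, $\mu = 1$ is the nominal signal hypothesis.
$b_{\mathrm{tot}}$, $\theta_s$, $\theta_b$ are nuisance parameters.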
Subsidiary measurements for background
One may have a subsidiary measurement to constrain the background based on a control region where one expects no signal.
In bin i of control histogram find $m_i$ events; expectation value is
$E[m_i] = u_i(\theta)$
where the $u_i$ can be found from MC and $\theta$ includes parameters related to the background (mainly rate, sometimes also shape).
In some measurements there may be no explicit subsidiary measurement but the sidebands around a signal peak effectively play the same role in constraining the background.
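Likelihood function
For an individual search channel, $n_i \sim \mathrm{Poisson}(\mu s_i + b_i)$, $m_i \sim \mathrm{Poisson}(u_i)$. The likelihood is:
$L(\mu, \theta) = \prod_i \frac{(\mu s_i + b_i)^{n_i}}{n_i!} e^{-(\mu s_i + b_i)} \prod_j \frac{u_j^{m_j}}{m_j!} e^{-u_j}$
$\mu$ is the parameter of interest. Here $\theta$ represents all nuisance parameters.
For multiple independent channels there is a likelihood $L_i(\mu, \theta_i)$ for each. The full likelihood function is
$L(\mu, \theta) = \prod_i L_i(\mu, \theta_i)$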
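A minimal Python sketch of this likelihood may help make the structure concrete. It evaluates ln L for one channel from signal-region counts n_i and control-region counts m_i, and sums the logs over independent channels. The function names and the choice to pass s_i, b_i, u_i as plain lists (already evaluated at a given value of the nuisance parameters) are assumptions made for illustration; this is not the Statistics Forum or RooStats implementation.

```python
import math

def log_poisson(k, mean):
    """ln of the Poisson probability P(k; mean), for mean > 0."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)

def log_likelihood(mu, n, s, b, m, u):
    """ln L(mu, theta) for one channel.

    n, m : observed counts in the signal-region and control-region bins
    s, b : expected signal and background per signal-region bin
    u    : expected counts per control-region bin
    s, b and u are assumed to be already evaluated at the nuisance
    parameters theta (hypothetical interface, for illustration only).
    """
    lnL = sum(log_poisson(ni, mu * si + bi) for ni, si, bi in zip(n, s, b))
    lnL += sum(log_poisson(mj, uj) for mj, uj in zip(m, u))
    return lnL

def log_likelihood_combined(mu, channels):
    """Independent channels: likelihoods multiply, so their logs add."""
    return sum(log_likelihood(mu, *channel) for channel in channels)
```

Working with ln L rather than L avoids numerical underflow when many bins are multiplied together.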
Systematics "built in" as long as some point in $\theta$-space = "truth".
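The following slides use a test variable q whose definition does not survive in this transcript. Assuming it is the profile likelihood ratio statistic described in the Higgs Combination chapter (an assumption, not something stated here), it would read:

```latex
\lambda(\mu) = \frac{L\bigl(\mu, \hat{\hat{\theta}}(\mu)\bigr)}{L(\hat{\mu}, \hat{\theta})},
\qquad
q_\mu = -2 \ln \lambda(\mu)
```

In this notation $\hat{\hat{\theta}}(\mu)$ maximizes L for the specified $\mu$, while $\hat{\mu}$ and $\hat{\theta}$ are the unconditional maximum-likelihood estimators. Larger q means worse agreement between the data and the hypothesized $\mu$.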
p-values
Quantify level of agreement between data and hypothesis H with:
p-value = Prob(data with ≤ compatibility with H when compared to the data we got | H )
= probability, under assumption of H, to obtain data as bizarre as the data we got (or more so)
≠ probability that H is true (!!!)
Significance from p-value
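Define significance Z as the number of standard deviations that a Gaussian variable would fluctuate in one direction to give the same p-value:
$p = \int_Z^\infty \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx = 1 - \Phi(Z)$, $Z = \Phi^{-1}(1 - p)$
ROOT functions: TMath::Prob, TMath::NormQuantile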
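As a small illustration of the conversion between p-value and significance, here is a sketch using only the Python standard library in place of the ROOT functions named on the slide (that substitution, and the function names, are choices made here for self-containment).

```python
import math
from statistics import NormalDist

def p_from_z(z):
    """One-sided tail probability p = 1 - Phi(Z), computed via erfc
    so that very small p-values keep their precision."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def z_from_p(p):
    """Significance Z = Phi^{-1}(1 - p), valid for 0 < p < 1."""
    return -NormalDist().inv_cdf(p)

p_one_sided = p_from_z(5.0)        # one-sided 5 sigma
p_two_sided = 2.0 * p_from_z(5.0)  # two-sided convention
z_back = z_from_p(p_one_sided)     # recovers Z = 5
```

Using `erfc` rather than `1 - cdf(z)` matters at the 5-sigma level, where the tail probability is far smaller than the rounding error of a number close to 1.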
When to publish
HEP folklore is to claim discovery when p = 2.9 × 10^-7, corresponding to a significance Z = 5.
This is very subjective and really should depend on the prior probability of the phenomenon in question, e.g.,
phenomenon / reasonable p-value for discovery
D0–D0bar mixing
Higgs (?)
Life on Mars
Astrology
Note some groups have defined 5σ to refer to a two-sided fluctuation, i.e., p = 5.7 × 10^-7.
Distribution of q
So to find the p-value we need $f(q|\mu)$.
Method 1: generate toy MC experiments with hypothesis $\mu$, obtain the distribution of q.
OK for e.g. ~10^3 or 10^4 experiments, 95% CL limits.
But for discovery usually want 5σ, p-value = 2.9 × 10^-7, so need to generate ~10^8 toy experiments (for every point in param. space).
Method 2: Wilks' theorem says that for large enough sample,
$f(q|\mu) \sim$ chi-square (1 dof)
This is the approach used in the Higgs Combination exercise; not yet validated to 5σ level.
If/when we are fortunate enough to see a signal, then focus MC resources on that point in parameter space.
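The two methods can be compared in a toy setting. The sketch below assumes a single-bin counting experiment with known background b and tests $\mu = 0$ with the one-sided statistic q0 = 2(n ln(n/b) + b - n) for n > b (zero otherwise); for that statistic the large-sample distribution is a half chi-square, giving p = 1 - Φ(√q0). The model, the function names, and the sampler are illustrative assumptions, not the procedure used in the Higgs Combination exercise.

```python
import math
import random

def poisson_sample(mean, rng):
    """Draw n ~ Poisson(mean) with Knuth's method (fine for modest means)."""
    limit = math.exp(-mean)
    n, prod = 0, rng.random()
    while prod > limit:
        n += 1
        prod *= rng.random()
    return n

def q0(n, b):
    """Discovery statistic for a counting experiment with known background b."""
    if n <= b:
        return 0.0  # a deficit is not treated as evidence for signal
    return 2.0 * (n * math.log(n / b) + b - n)

def toy_p_value(q_obs, b, n_toys=20000, seed=1):
    """Method 1: fraction of background-only toys with q0 >= q_obs."""
    rng = random.Random(seed)
    n_above = sum(q0(poisson_sample(b, rng), b) >= q_obs for _ in range(n_toys))
    return n_above / n_toys

def asymptotic_p_value(q_obs):
    """Method 2: large-sample formula p = 1 - Phi(sqrt(q_obs)), for q_obs > 0."""
    return 0.5 * math.erfc(math.sqrt(q_obs / 2.0))
```

With a few times 10^4 toys the estimate is only meaningful for p-values down to roughly 10^-3, which is why the toy approach becomes expensive at the 5σ level.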
Significance from q
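If we take $f(q|\mu) \sim \chi^2$ for 1 dof, then the significance is (see Higgs combo note):
$Z = \sqrt{q}$
For $n \sim \mathrm{Poisson}(\mu s + b)$ with b known, testing $\mu = 0$ gives
$Z = \sqrt{2\left(n \ln\frac{n}{b} + b - n\right)}$ (for n > b)
To quantify sensitivity give e.g. expected Z under s+b hypothesis.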
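A short sketch of these formulas for the counting experiment: with b known and $\mu = 0$ tested, Z = √q with q = 2(n ln(n/b) + b - n) for n > b. The expected significance is approximated here by evaluating the same expression at n = s + b, which is an assumption of this sketch (one common way to quantify sensitivity) rather than something spelled out on the slide.

```python
import math

def z_observed(n, b):
    """Z = sqrt(q) for n observed events and known background b (mu = 0 test)."""
    if n <= b:
        return 0.0  # no excess, so no significance for discovery
    return math.sqrt(2.0 * (n * math.log(n / b) + b - n))

def z_expected(s, b):
    """Approximate expected Z under the s+b hypothesis, setting n = s + b."""
    return z_observed(s + b, b)
```

For s much smaller than b this reduces to the familiar s/√b; for example, z_expected(10, 100) comes out slightly below 1, while s/√b is exactly 1.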
Likelihood ratio Ls+b/Lb
An alternative (in simple cases equivalent) test variable is the likelihood ratio $Q = L_{s+b}/L_b$.
Fast Fourier Transform method to find distribution; derives n-event distribution from that of single event with FFT.
Hu and Nielsen, physics/9906010
Solves "5-sigma problem".
Used at LEP -- systematics treated by averaging the likelihoods by sampling new values of nuisance parameters for each simulated experiment (integrated rather than profile likelihood).
Determining distributions: systematics
E.g. $M_{ll}$ distribution from Z'→dilepton search (CSC Book p 1709), uses 4-parameter function for signal.
Sidebands provide estimate of background.
So nothing in real analysis from MC, but...
Still should consider some systematic due to fact that assumed parametric functions are not perfect.
General approach: include more parameters making the model more flexible, so that for some point in the enlarged parameter space, model = Nature (or difference negligible).
A general strategy (see attached note)
Suppose one needs to know the shape of a distribution. Initial model (e.g. MC) is available, but known to be imperfect.
Q: How can one incorporate the systematic error arising from use of the incorrect model?
A: Improve the model.
That is, introduce more adjustable parameters into the model so that for some point in the enlarged parameter space it is very close to the truth.
Then profile the likelihood with respect to the additional (nuisance) parameters. The correlations with the nuisance parameters will inflate the errors in the parameters of interest.
Difficulty is deciding how to introduce the additional parameters.
A simple example
The naive model (a) could have been e.g. from MC (here statistical errors suppressed; point is to illustrate how to incorporate systematics).
[Figure: 0th order model, true model (Nature), and data]
Comparison with the 0th order model
The 0th order model gives q = 258.8, p×
Enlarging the model
Here try to enlarge the model by multiplying the 0th order distribution by a function s:
$f(x) \propto f_0(x)\, s(x)$
where s(x) is a linear superposition of Bernstein basis polynomials of order m:
$s(x) = \sum_{k=0}^{m} \beta_k\, b_{k,m}(x)$
Bernstein basis polynomials
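For reference, the Bernstein basis polynomials of order m are b_{k,m}(x) = C(m, k) x^k (1 - x)^(m - k) for k = 0, ..., m on 0 ≤ x ≤ 1. The sketch below evaluates them and a linear superposition of them; the coefficient name `beta`, the function names, and the assumption that x has already been rescaled to the unit interval are illustrative choices, not taken from the slides.

```python
from math import comb

def bernstein_basis(k, m, x):
    """b_{k,m}(x) = C(m, k) x^k (1 - x)^(m - k), for 0 <= x <= 1."""
    return comb(m, k) * x**k * (1.0 - x)**(m - k)

def s_of_x(x, beta):
    """Linear superposition sum_k beta[k] * b_{k,m}(x), with m = len(beta) - 1.

    x is assumed to be rescaled to the interval [0, 1].
    """
    m = len(beta) - 1
    return sum(bk * bernstein_basis(k, m, x) for k, bk in enumerate(beta))
```

Because the basis polynomials of a given order sum to one, setting all coefficients equal to 1 gives s(x) = 1 and recovers the unmodified 0th order model, which makes it a natural starting point for a fit.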
Enlarging the parameter space
Using increasingly high order for the basis polynomials gives an increasingly flexible function.
At each stage compare the p-value of the likelihood ratio test to some threshold, e.g., 0.1 or 0.2, to decide whether to include the additional parameter.
Now iterate this procedure, and stop when the data do not require addition of further parameters based on the likelihood ratio test.
Once the enlarged model has been found, simply include it in any further statistical procedures, and the statistical errors from the additional parameters will account for the systematic uncertainty in the original model.
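The stopping rule can be sketched as follows, assuming that each increase in order adds exactly one parameter and that the likelihood ratio statistic between successive fits follows a chi-square distribution with 1 dof (Wilks' theorem). The input, a list of maximized log-likelihoods indexed by polynomial order, is a hypothetical interface; the fits themselves are not shown.

```python
import math

def chi2_1dof_p_value(q):
    """P(chi-square with 1 dof >= q), for q >= 0."""
    return math.erfc(math.sqrt(max(q, 0.0) / 2.0))

def choose_order(max_log_likelihoods, threshold=0.1):
    """Return the polynomial order at which to stop.

    max_log_likelihoods[m] is the maximized ln L of the fit of order m
    (hypothetical input; each step is assumed to add one parameter).
    """
    order = 0
    for m in range(1, len(max_log_likelihoods)):
        q = 2.0 * (max_log_likelihoods[m] - max_log_likelihoods[m - 1])
        if chi2_1dof_p_value(q) >= threshold:
            break      # the data do not require the extra parameter
        order = m      # keep the parameter and try the next order
    return order
```

A larger threshold (e.g. 0.2) makes it easier to add parameters, trading some statistical precision for a smaller risk of leaving a model deficiency uncorrected.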
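Fits using increasing numbers of parameters
[Figure: fits with increasing numbers of parameters; the panel marked "Stop here" is where the iteration ends]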
Setting limits
Method outlined in the CSC Higgs Combo = "CLs+b method", i.e., for the hypothesized $\mu$ (e.g. 1) compute the p-value:
$p_\mu = \int_{q_{\mathrm{obs}}}^{\infty} f(q|\mu)\,dq$
$\mu$ is excluded at CL = 0.95 if $p_\mu \le 0.05$, and if $\mu = 1$ is excluded, the corresponding point in parameter space for the signal model is excluded.
E.g. present expected limit on $\mu$ vs mass parameter.
Setting limits: CLs
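Alternative method (from Alex Read at LEP); exclude $\mu$ if
$\mathrm{CL}_s \equiv \frac{\mathrm{CL}_{s+b}}{\mathrm{CL}_b} < 0.05$
where $\mathrm{CL}_{s+b}$ is the p-value of the hypothesized $\mu$ and $\mathrm{CL}_b$ is the corresponding probability computed under the background-only hypothesis ($\mu = 0$).
This cures the problematic case where one excludes a parameter point where one has no sensitivity (e.g. large mass scale) because of a downwards fluctuation of the background.
But there are perhaps other ways to get around this problem, e.g., only exclude if both observed and expected p-values are below the threshold.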
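The difference between the two limit-setting criteria shows up clearly in a counting experiment. The sketch below makes a simplifying assumption: it uses the observed count n itself as the test variable (fewer events means less compatible with signal), rather than the likelihood-ratio statistic q. The function names are hypothetical.

```python
import math

def poisson_cdf(n_obs, mean):
    """P(n <= n_obs) for n ~ Poisson(mean)."""
    term = math.exp(-mean)  # P(n = 0)
    total = term
    for k in range(1, n_obs + 1):
        term *= mean / k    # P(n = k) from P(n = k - 1)
        total += term
    return total

def p_splusb(n_obs, s, b, mu=1.0):
    """CLs+b: p-value of the mu*s + b hypothesis, P(n <= n_obs | mu*s + b)."""
    return poisson_cdf(n_obs, mu * s + b)

def cls(n_obs, s, b, mu=1.0):
    """CLs = CLs+b / CLb, with CLb = P(n <= n_obs | b)."""
    return p_splusb(n_obs, s, b, mu) / poisson_cdf(n_obs, b)
```

For example, with s = 0.5, b = 4 and n_obs = 0, the s+b p-value is exp(-4.5), below 0.05, so the CLs+b criterion would exclude a signal to which the search has almost no sensitivity. The ratio CLs is exp(-0.5), about 0.61, so the same point is not excluded.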
Comment on validation procedures for methods
Ongoing discussions on methodology.
Ideal is to use several methods (profile likelihood, Bayesian, CLs, ...) for each result.
Formal procedures still evolving, but if you are going to use a novel statistical technique, please come give a talk about it at the Statistics Forum.
Comment on software tools
Summer 08: agree to develop RooStats as common framework. Keep eye on ability to carry out independent validation.
Key players:
Kyle Cranmer (ATLAS)
Gregory Schott (CMS)
Wouter Verkerke (RooFit)
Lorenzo Moneta (Root)
Work currently very active (and help needed).
Summary
Current areas of activity include:
Development of profile likelihood, CLs, Bayesian methods for searches (including systematics);
Combination tools (e.g. Higgs combination);
RooStats software effort;
Multivariate methods, ...
Statistics forum wants to increase active dialogue with the physics groups.
If you are using a novel procedure or want to discuss a statistical method, please contact us.
Extra slides
Physics Group / StatForum interaction
Eilam Gross, 8.12.08
Questions from Luis Flores, 24 September, 2008