
Concluding Talk: Physics

Gary Feldman

Harvard University

PHYSTAT 05

University of Oxford

15 September, 2005


Topics

I will restrict my comments to two topics, both of which I am interested in and both of which received some attention at this meeting:

Event classification
Nuisance parameters


Event Classification

The problem: Given a measurement of an event X = (x1, x2, …, xn), find the function F(X) which returns 1 if the event is signal (s) and 0 if the event is background (b), so as to optimize a figure of merit, say $s/\sqrt{b}$ for discovery or $s/\sqrt{s+b}$ for an established signal.


Theoretical Solution

In principle the solution is straightforward: Use a Monte Carlo simulation to calculate the likelihood ratio Ls(X)/Lb(X) and derive F(X) from it. By the Neyman-Pearson Theorem, this is the optimum solution.

Unfortunately, this does not work due to the “curse of dimensionality.” In a high-dimensional space, even the largest data set is sparse, with the distance between neighboring events comparable to the radius of the space.
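In one dimension the brute-force recipe is easy to carry out; here is a toy sketch (densities and cut of my own choosing, not from the talk) of estimating the likelihood ratio from Monte Carlo histograms. It is the histogram, or any other density estimate, that becomes hopelessly sparse as the dimension grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D Monte Carlo samples: Gaussian "signal", exponential "background"
# (densities chosen purely for illustration).
sig_mc = rng.normal(1.0, 0.5, 100_000)
bkg_mc = rng.exponential(1.0, 100_000)

# Histogram estimates of the densities L_s(x) and L_b(x).
edges = np.linspace(-2.0, 6.0, 81)
ls, _ = np.histogram(sig_mc, bins=edges, density=True)
lb, _ = np.histogram(bkg_mc, bins=edges, density=True)

def likelihood_ratio(x):
    """Monte Carlo estimate of L_s(x) / L_b(x)."""
    i = np.clip(np.digitize(x, edges) - 1, 0, len(ls) - 1)
    return ls[i] / np.maximum(lb[i], 1e-12)

def F(x, cut=1.0):
    """Neyman-Pearson style classifier: 1 = signal-like, 0 = background-like."""
    return (likelihood_ratio(x) > cut).astype(int)

print(F(np.array([0.2, 1.1, 3.0])))
```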


Practical Solutions

Thus, we are forced to substitute cleverness for brute force.

In recent years, physicists have come to learn that computers may be cleverer than they are.

They have turned to machine learning: One gives the computer samples of signal and background events and lets the computer figure out what F(X) is.


Artificial Neural Networks

Originally most of this effort was in artificial neural networks (ANN). Although used successfully in many experiments, ANNs tend to be finicky and often require real cleverness from their creators.

At this conference, an advance in ANNs was reported by Harrison Prosper. The technique is to average over a collection of networks, each of which is obtained by sampling from the probability density over network weights built from the training sample.
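This is not Prosper's actual procedure, which samples networks from the weight probability density, but as a rough sketch of what averaging over a collection of networks looks like, one can train several small networks on bootstrap resamples and average their outputs (toy data and network settings are mine):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)

# Toy training sample: two variables, label 1 = signal, 0 = background.
X = np.vstack([rng.normal(+1.0, 1.0, size=(2000, 2)),
               rng.normal(-1.0, 1.0, size=(2000, 2))])
y = np.concatenate([np.ones(2000), np.zeros(2000)])

# Train several small networks, each on a bootstrap resample of the events.
nets = []
for seed in range(10):
    idx = rng.integers(0, len(X), len(X))
    net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=seed)
    nets.append(net.fit(X[idx], y[idx]))

def F(X_new):
    """Average the network outputs and threshold at 0.5."""
    p = np.mean([net.predict_proba(X_new)[:, 1] for net in nets], axis=0)
    return (p > 0.5).astype(int)

print(F(np.array([[1.5, 1.0], [-1.5, -0.5]])))
```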


Trees and Rules

In the past couple of years, interest has started to shift to other techniques, such as decision trees, at least partially sparked by Jerry Friedman’s talk at PHYSTAT 03.

A single decision tree has limited power, but its power can be increased by techniques that effectively sum many trees. [Cartoon from Roe’s talk.]


Rules and Bagging Trees

Jerry Friedman gave a talk on rules, a technique that effectively combines a series of trees.

Harrison Prosper gave a talk (for Ilya Narsky) on bagging (Bootstrap AGGregatING) trees. In this technique, one builds a collection of trees by selecting, for each tree, a bootstrap sample of the training data and, optionally, a subset of the variables.

Results on significance of B e at BaBar:

Single decision tree: 2.16
Boosted decision trees (not optimized): 2.62
Bagging decision trees: 2.99
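A minimal scikit-learn sketch of bagging as described above, with each tree built from a bootstrap sample of the events and a random half of the variables (toy data and settings are mine, not Narsky's analysis):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy "events": 20 measured variables, label 1 = signal, 0 = background.
X, y = make_classification(n_samples=5000, n_features=20, n_informative=8,
                           random_state=0)

# Each tree sees a bootstrap sample of the events and half of the variables.
bag = BaggingClassifier(DecisionTreeClassifier(),
                        n_estimators=200,
                        bootstrap=True, max_samples=1.0,
                        max_features=0.5,
                        random_state=0)
bag.fit(X, y)
print(bag.predict_proba(X[:5])[:, 1])   # averaged signal probability per event
```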


Boosted Decision Trees

Byron Roe gave a talk on the use of boosted trees in MiniBooNE. Misclassified events in one tree are given a higher weight and a new tree is generated. Repeat to generate 1000 trees. The final classifier is a weighted sum of all of the trees.

[Plot: comparison to ANN as a function of % of signal retained, for 52 and 21 input variables. The boosted trees are also more robust.]
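For comparison with the bagging sketch above, here is a similar sketch of boosting (AdaBoost-style reweighting of misclassified events, many shallow trees, weighted sum of tree outputs); this is only illustrative and not the MiniBooNE implementation:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, n_informative=8,
                           random_state=1)

# Misclassified events are reweighted upward for the next tree; the final
# classifier is a weighted sum over all of the trees.
boost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),
                           n_estimators=1000, learning_rate=0.5, random_state=1)
boost.fit(X, y)
print(boost.predict_proba(X[:5])[:, 1])
```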


Other Talks

There were a couple of other talks on this subject by Puneet Sarda and Alex Gray, which I could not attend.


Nuisance Parameters

Nuisance parameters are parameters with unknown true values for which coverage is required in a frequentist analysis.

They may be statistical, such as the number of background events in a sideband used for estimating the background under a peak.
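For that sideband example, the likelihood is usually written (my notation, not from the slide) as a product of two Poisson terms, with the background b appearing in both:

```latex
L(s, b) \;=\; \frac{(s+b)^{n}\, e^{-(s+b)}}{n!}
        \;\times\; \frac{(\tau b)^{m}\, e^{-\tau b}}{m!}
```

where n is the count in the signal region, m is the count in the sideband, and τ is the ratio of background acceptance in the sideband to that in the signal region; s is the parameter of interest and b is the nuisance parameter.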

They may be systematic, such as the shape of the background under the peak, or the error caused by the uncertainty of the hadronic fragmentation model in the Monte Carlo.

Most experiments have a large number of systematic uncertainties.


New Concerns for the LHC

Although the statistical treatment of these uncertainties is probably the analysis question that I have been asked the most, Kyle Cranmer has pointed out that these issues will be even more important at the LHC.

If the statistical error is O(1) and the systematic error is O(0.1), then the systematic error will contribute as its square, O(0.01), and it does not much matter how you treat it.

However, at the LHC, we may have processes with 100 background events and 10% systematic errors.

Even more critical, we want 5σ for a discovery level.
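A back-of-the-envelope version of Cranmer's point, using the simple quadrature approximation Z ≈ s/√(b + σb²); the numbers are illustrative, and this crude formula is not the careful treatment discussed later:

```python
import math

b = 100.0                # expected background events
sigma_b = 0.10 * b       # 10% systematic uncertainty on the background

# Signal needed for a 5-sigma observation, Z = s / sqrt(b + sigma_b**2):
print(5 * math.sqrt(b))               # ~50 events with statistics only
print(5 * math.sqrt(b + sigma_b**2))  # ~71 events once the 10% systematic enters
```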


Why 5σ?

LHC searches: 500 searches, each of which has 100 resolution elements (mass bins, angle bins, etc.) ⇒ 5 × 10⁴ chances to find something.

One experiment: false positive rate at 5σ: (5 × 10⁴)(3 × 10⁻⁷) = 0.015. OK.

Two experiments: allowable false positive rate ~10: 2 (5 × 10⁴)(1 × 10⁻⁴) = 10 ⇒ 3.7σ required. Required verification by the other experiment: (1 × 10⁻³)(10) = 0.01 ⇒ 3.1σ required.

Caveats: Is the significance real? Are there common systematic errors?
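The arithmetic on this slide can be reproduced with one-sided Gaussian tail probabilities; a quick check using the slide's round numbers:

```python
from scipy.stats import norm

trials = 500 * 100                  # ~5e4 chances to find something

# One experiment at 5 sigma:
print(trials * norm.sf(5.0))        # ~0.014 expected false positives

# Two experiments, tolerating ~10 false candidates between them:
print(norm.isf(10 / (2 * trials)))  # ~3.7 sigma required per search

# Each candidate must then be confirmed by the other experiment at ~3.1 sigma:
print(10 * norm.sf(3.1))            # ~0.01 expected false confirmations
```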


A Cornucopia of Techniques

At this meeting we have seen a wide series of techniques discussed for constructing confidence intervals in the presence of nuisance parameters.

Everyone has expressed a concern that their methods cover, at least approximately. This appears to be important for LHC physics in light of Cranmer’s concerns.


Bayesian with Coverage

Joel Heinrich presented a decision by CDF to do Bayesian analyses with priors that cover. The advantage is Bayesian conditioning with frequentist coverage; it is possibly the maximum amount of work for the experimenter.

[Plot: example of coverage for a single Poisson with normalization and background nuisance parameters, using flat priors.]
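A coverage scan is easy to set up in a much simpler setting than Heinrich's, namely a single Poisson with a flat prior and no nuisance parameters; this sketch is only meant to show what "checking that a Bayesian interval covers" means in practice:

```python
from scipy.stats import gamma, poisson

def credible_interval(n, cl=0.90):
    # With a flat prior, the posterior for a Poisson mean given n observed
    # events is Gamma(n + 1, 1); take a central (equal-tail) credible interval.
    lo = gamma.ppf((1 - cl) / 2, n + 1)
    hi = gamma.ppf(1 - (1 - cl) / 2, n + 1)
    return lo, hi

def coverage(mu_true, cl=0.90, n_max=200):
    # Frequentist coverage: sum the Poisson probabilities of every n whose
    # credible interval contains the true mean.
    cov = 0.0
    for n in range(n_max):
        lo, hi = credible_interval(n, cl)
        if lo <= mu_true <= hi:
            cov += poisson.pmf(n, mu_true)
    return cov

for mu in (0.5, 2.0, 5.0, 20.0):
    print(mu, coverage(mu))
```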


Bayesian with Coverage

Example of coverage with flat priors and with 1/ε and 1/b priors for a 4-channel Poisson with normalization and background nuisance parameters.

[Plots: flat priors vs. 1/ε and 1/b priors.]


Frequentist/Bayesian Hybrid

Fredrik Tegenfeldt presented a likelihood-ratio-ordered (LR) Neyman construction after integrating out the nuisance parameters with flat priors. In a single-channel test, there was no undercoverage.

What happens for a multi-channel case? My guess is that the confidence belt will be distorted by the use of flat priors, but that the method will still cover due to the construction.

Cranmer considers a similar technique, as was used for LEP Higgs searches.

Both are called “Cousins-Highland,” although probably neither actually is.


Profile Likelihood

44 years ago, Kendall and Stuart told us how to eliminate nuisance parameters and do an LR construction:
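The prescription referred to here (the slide's formula image is not reproduced) is presumably the standard profile likelihood ratio: maximize the likelihood over the nuisance parameters θ at each fixed value of the parameter of interest s, and compare with the unconditional maximum,

```latex
\lambda(s) \;=\;
\frac{L\bigl(s,\,\hat{\hat{\theta}}(s)\bigr)}{L\bigl(\hat{s},\,\hat{\theta}\bigr)}
```

where \hat{\hat{\theta}}(s) is the conditional and (\hat{s}, \hat{\theta}) the global maximum-likelihood estimate; λ(s) is then used as the ordering variable in the Neyman construction.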


One (Minor) Problem

The Kendall-Stuart prescription leads to the problem that, for Poisson problems, as the nuisance parameter becomes better and better known, the confidence intervals do not converge to those of the limit in which it is perfectly known. The reason is that the introduction of a nuisance parameter breaks the discreteness of the Poisson distribution.

[Plot from Punzi’s talk.]


One More Try

Since this was referred to in a parallel session as “the Feldman problem” and since two plenary speakers made fun of my Fermilab Workshop plots, I will try to explain them again.

[Plots (n vs. b) for r = 1, r << 1, and b known exactly.]


The Cousins-Highland Problem

This correction also solves what Bob and I refer to as the Cousins-Highland problem (as opposed to method).

Cousins and Highland turned to a Bayesian approach to calculate the effect of a normalization error because the frequentist approach gave an answer with the wrong sign.

We now understand that this was simply due to breaking the discreteness of the Poisson distribution.

In one test case, using this correction reproduced the Cousins-Highland result x/ 2.


Use of Profile Likelihood

Wolfgang Rolke presented a talk on eliminating the nuisance parameters via profile likelihood, but with the Neyman construction replaced by the −ln L hill-climbing approximation. This is also what MINUIT does. The coverage is good, with some minor undercoverage. Cranmer also considers this method.
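A toy version of this Δ(−ln L) = 1/2 hill-climbing recipe, for a counting experiment with a sideband-constrained background (the model and numbers are mine, purely illustrative):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy counting experiment: n events in the signal region with mean s + b,
# m events in a sideband with mean tau * b.
n, m, tau = 15, 20, 2.0

def nll(s, b):
    # Negative log-likelihood up to constants.
    mu1, mu2 = s + b, tau * b
    return (mu1 - n * np.log(mu1)) + (mu2 - m * np.log(mu2))

def profile_nll(s):
    # "Hill climbing": minimize over the nuisance parameter b at fixed s.
    return minimize_scalar(lambda b: nll(s, b),
                           bounds=(1e-6, 100.0), method="bounded").fun

s_grid = np.linspace(0.01, 30.0, 600)
dnll = np.array([profile_nll(s) for s in s_grid])
dnll -= dnll.min()
inside = s_grid[dnll <= 0.5]          # approximate 68% CL interval on s
print("68%% CL interval on s: [%.2f, %.2f]" % (inside[0], inside[-1]))
```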


Full Neyman Constructions

Both Giovanni Punzi and Kyle Cranmer attempted full Neyman constructions for both signal and nuisance parameters.

I don’t recommend you try this at home for the following reasons:

The ordering principle is not unique. Both Punzi and Cranmer ran into some problems.

The technique is not feasible for more than a few nuisance parameters.

It is unnecessary since removing the nuisance parameters through profile likelihood works quite well.


Cranmer’s (Revised) Conclusions

In Cranmer’s talk, he had an unexpected result for the coverage of Rolke’s method (“profile”). He did in fact have an error, and it is corrected here:

[Corrected coverage plot.]


Final Comments on Nuisance Parameters

My preference is to eliminate at least the major nuisance parameters through profile likelihood and then do an LR Neyman construction. It is straightforward and has excellent coverage properties.

However, whatever method you choose, you should check the coverage of the method.

Cranmer makes the point that if you can check the coverage, you can also do a Neyman construction. I don’t completely agree, but it is worth considering.