Pushpa Bhat, Fermilab, ACAT 2007, Apr 23-27, Amsterdam

Multivariate Methods in HEP

Pushpa Bhat Fermilab

Outline

• Introduction/History
• Physics Analysis Examples
• Popular Methods
  • Likelihood Discriminants
  • Neural Networks
  • Bayesian Learning
  • Decision Trees
• Future
• Issues and Concerns
• Summary

Some History

• In 1990 most of the HEP community was skeptical about the use of multivariate methods, particularly neural networks (NN)
• NN seen as a black box
  • Can't understand the weights
  • Nonlinear mapping; higher-order correlations
  • Though it is a mathematical function, can't explain it in terms of physics
  • Can't calculate systematic errors reliably
• Univariate or "cut-based" analysis was the norm
• Some were pursuing application of neural network methods to HEP around 1990
  • Peterson, Lonnblad, Denby, Becks, Seixas, Lindsey, etc.
• First AIHENP (Artificial Intelligence in High Energy & Nuclear Physics) workshop was in 1990
  • Organizers included D. Perret-Gallix, K.H. Becks, R. Brun, J. Vermaseren. AIHENP metamorphosed into ACAT ten years later, in 2000
• Multivariate methods such as Fisher discriminants were in limited use
• In 1990, I began to pursue the use of multivariate methods, especially NN, in top quark searches at Dzero

Mid-1990’s

• LEP experiments had been using NN and likelihood discriminants for particle-ID applications and eventually for signal searches (Steinberger; tau-ID)
• H1 at HERA successfully implemented and used NN for triggering (Kiesling)
• Hardware NN was attempted at Fermilab at CDF
• Fermilab Advanced Analysis Methods Group brought CDF and DØ together for discussion of these methods and applications in physics analyses

The Top Quark: Post-Evidence, Pre-Discovery!

Fisher analysis of the tt e channel: one candidate event, with S/B (mt = 180 GeV) = 18 w.r.t. Z and = 10 w.r.t. WW

NN analysis of the tt e+jets channel

[Plot legend: tt (160), W+jets, data]

P. Bhat, DPF94

Cut Optimization for Top Discovery, Feb. '95

[Plot: signal and background distributions with contours of possible NN cuts (Feb. '95), the Jan. '95 (Aspen) cut and the Mar. '95 discovery cut]

[Plot: S/B vs. signal efficiency, comparing the Jan. '95 (Aspen meeting) conventional cut, the Feb-Mar '95 discovery conventional cut, and the S/B reach of the 2-variable NN analysis for similar efficiency]

Neural network equi-probability contour cuts from a 2-variable analysis compared with conventional cuts used in Jan. '95 and in the Observation paper

P. Bhat, H. Prosper, E. Amidi, D0 Top Marathon, Feb. '95

Measurement of the Top Quark Mass

First significant physics result using multivariate methods

DØ Lepton+jets, Run I (1996) result with NN and likelihood:
• Fit performed in 2-D: (DLB/NN, mfit)
• mt = 173.3 ± 5.6 (stat.) ± 6.2 (syst.) GeV/c^2

Recent (CDF+D0) mt measurement:
• mt = 171.4 ± 2.1 GeV/c^2

[Plots: discriminant variables; the discriminants]

Higgs, the Holy Grail of HEP: Discovery Reach at the Tevatron

• The challenges are daunting! But using NN provides the same reach with a factor of 2 less luminosity than a conventional analysis
• Improved bb mass resolution and b-tag efficiency are crucial

Run II Higgs study, hep-ph/0010338 (Oct. 2000)
P.C. Bhat, R. Gilmartin, H. Prosper, Phys. Rev. D 62 (2000) 074022

Then, it got easier

• One of the important steps in getting the NN accepted at the Tevatron experiments was to make the Bayesian connection.

• Another important message to drive home was “the maximal use of information in the event” for the job at hand

• Developed a random grid search technique that can be used as baseline for comparison

• Neural network methods now have become popular due to the ease of use, power and many successful applications

Maybe too easy??

Optimal Event Selection

$$ r(x, y) = \frac{p(x, y \mid s)\, p(s)}{p(x, y \mid b)\, p(b)} $$

r(x,y) = constant defines an optimal decision boundary

[Plots: signal (S) and background (B) in feature space, comparing conventional cuts with the boundary r(x,y) = constant]
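A minimal sketch of such a selection, assuming toy two-dimensional Gaussian densities for signal and background. The means, widths and prior probabilities are invented for illustration and are not taken from the talk. It evaluates the ratio r(x,y) = p(x,y|s) p(s) / (p(x,y|b) p(b)) and keeps events on the signal side of a contour r = constant.

```python
import math

def gauss2d(x, y, mx, my, sigma=1.0):
    """Uncorrelated 2-D Gaussian density with equal widths (toy assumption)."""
    norm = 1.0 / (2.0 * math.pi * sigma**2)
    return norm * math.exp(-((x - mx)**2 + (y - my)**2) / (2.0 * sigma**2))

# Hypothetical toy model: signal centred at (1, 1), background at (-1, -1).
P_S, P_B = 0.5, 0.5          # assumed prior probabilities p(s), p(b)

def r(x, y):
    """Bayes discriminant r(x, y) = p(x,y|s) p(s) / (p(x,y|b) p(b))."""
    return (gauss2d(x, y, 1.0, 1.0) * P_S) / (gauss2d(x, y, -1.0, -1.0) * P_B)

def select(x, y, r_cut=1.0):
    """Keep the event if it lies on the signal side of the contour r = r_cut."""
    return r(x, y) > r_cut

print(r(0.0, 0.0))        # the origin lies on the contour r = 1 in this toy
print(select(2.0, -1.5))  # kept by the contour cut
```

In this toy the contour r = 1 is the line x + y = 0, so the event at (2.0, -1.5) is kept, whereas rectangular cuts such as x > 0 and y > 0 would have rejected it.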

The NN-Bayesian Connection

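The output of a feed-forward neural network can approximate the posterior probability P(s|x1,x2):

$$ y(x, \hat{w}) \approx p(s \mid x) = \frac{r}{1 + r}, \qquad r = \frac{p(x \mid s)\, p(s)}{p(x \mid b)\, p(b)} $$

where y(x1, x2, ŵ) is the network output for the trained weights ŵ.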
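The connection can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the networks used in the analyses: the "network" is a single sigmoid neuron, the data are one-dimensional Gaussians with invented means, and the cross-entropy loss and learning rate are choices made here. After training on targets 1 (signal) and 0 (background), the output is compared with the exact posterior p(s|x) = r/(1+r).

```python
import math
import random

random.seed(1)

# Toy data (assumption): 1-D Gaussians, signal mean +1, background mean -1.
N = 2000
data = [(random.gauss(+1.0, 1.0), 1.0) for _ in range(N)] + \
       [(random.gauss(-1.0, 1.0), 0.0) for _ in range(N)]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# Simplest feed-forward "network": one sigmoid neuron y = sigmoid(w*x + b),
# trained by batch gradient descent on the cross-entropy with targets 1/0.
w, b, rate = 0.0, 0.0, 1.0
for epoch in range(300):
    gw = gb = 0.0
    for x, t in data:
        y = sigmoid(w * x + b)
        gw += (y - t) * x     # gradient of the cross-entropy w.r.t. w
        gb += (y - t)         # gradient of the cross-entropy w.r.t. b
    w -= rate * gw / len(data)
    b -= rate * gb / len(data)

def exact_posterior(x):
    """p(s|x) = r/(1+r) with r = p(x|s)p(s) / (p(x|b)p(b)), equal priors."""
    r = math.exp(-0.5 * (x - 1.0)**2) / math.exp(-0.5 * (x + 1.0)**2)
    return r / (1.0 + r)

for x in (-1.0, 0.0, 1.0):
    print(x, round(sigmoid(w * x + b), 3), round(exact_posterior(x), 3))
```

For equal-width Gaussians the exact posterior is itself a sigmoid in x, which is why a single neuron suffices here; more complicated densities would need hidden nodes.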

Limitations of “Conventional NN”

• The training yields one set of weights or network parameters
  • Need to look for the "best" network, but avoid overfitting
• Heuristic decisions on network architecture
  • Inputs, number of hidden nodes, etc.
• No direct way to compute uncertainties

Ensembles of Networks

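$$ y(x) = \sum_i a_i\, y_i(x) $$

A decision made by averaging over many networks (a committee of networks) can have a lower error than the individual networks; with equal weights, its squared error never exceeds the average error of its members.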
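A small numerical illustration of the committee average y(x) = Σ a_i y_i(x), under loudly stated assumptions: the "networks" below are not trained models but the true function plus independent noise, standing in for members whose errors are uncorrelated. Real networks trained on the same data have correlated errors, so the gain is usually smaller than in this toy.

```python
import math
import random

random.seed(7)

def target(x):
    """Toy 'true' function the networks try to learn (assumption)."""
    return math.sin(x)

xs = [0.05 * i for i in range(200)]

# Stand-in for trained networks: truth plus independent per-member noise.
n_members = 10
members = [[target(x) + random.gauss(0.0, 0.3) for x in xs]
           for _ in range(n_members)]

def mse(pred):
    """Mean squared error of a list of predictions against the target."""
    return sum((p - target(x))**2 for p, x in zip(pred, xs)) / len(xs)

# Committee output y(x) = sum_i a_i y_i(x) with equal weights a_i = 1/N.
committee = [sum(col) / n_members for col in zip(*members)]

member_mse = [mse(m) for m in members]
committee_mse = mse(committee)
print("mean member MSE:", sum(member_mse) / n_members)
print("committee MSE:  ", committee_mse)
```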

Bayesian Learning

• The result of Bayesian training is a posterior density of the network weights, P(w|training data)
• Generate a sequence of weights (network parameters) in the network parameter space, i.e., a sequence of networks. The optimal network is approximated by averaging over the last K points:

$$ y(x) \approx \frac{1}{K} \sum_{k=1}^{K} y(x, w_k) $$
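The averaging over the last K sampled networks, y(x) ≈ (1/K) Σ_k y(x, w_k), can be sketched as follows. Everything specific here is an assumption made for illustration: the two-parameter "network", the toy training set, the Gaussian prior width, and the random-walk Metropolis sampler, which is a simple stand-in and not necessarily the sampler used in the talk.

```python
import math
import random

random.seed(3)

# Toy training set (assumption): 1-D inputs, t = 1 for signal, 0 for background.
train = [(random.gauss(+1.0, 1.0), 1) for _ in range(100)] + \
        [(random.gauss(-1.0, 1.0), 0) for _ in range(100)]

def net(x, w):
    """Tiny stand-in network y(x, w) = sigmoid(w0 + w1*x)."""
    return 1.0 / (1.0 + math.exp(-(w[0] + w[1] * x)))

def log_posterior(w, prior_sigma=10.0):
    """log P(w | training data) up to a constant: log-likelihood + Gaussian prior."""
    logl = 0.0
    for x, t in train:
        y = min(max(net(x, w), 1e-12), 1.0 - 1e-12)   # guard the logarithm
        logl += math.log(y) if t == 1 else math.log(1.0 - y)
    return logl - sum(wi**2 for wi in w) / (2.0 * prior_sigma**2)

# Random-walk Metropolis sampling of the weights.
w = [0.0, 0.0]
lp = log_posterior(w)
chain = []
for step in range(2000):
    proposal = [wi + random.gauss(0.0, 0.3) for wi in w]
    lp_new = log_posterior(proposal)
    # Accept with probability min(1, posterior ratio).
    if random.random() < math.exp(min(0.0, lp_new - lp)):
        w, lp = proposal, lp_new
    chain.append(list(w))

K = 500
last = chain[-K:]             # discard the early part of the chain

def bayes_net(x):
    """y(x) ~ (1/K) * sum_k y(x, w_k), averaged over the last K sampled networks."""
    return sum(net(x, wk) for wk in last) / K

print(bayes_net(-1.0), bayes_net(0.0), bayes_net(1.0))
```

The spread of net(x, w_k) over the stored weights also gives a handle on the uncertainty of the output, which a single trained network does not provide.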

Bayesian Learning – 2

• Advantages
  • Less prone to over-fitting
  • Less need to optimize the size of the network. Can use a large network! Indeed, the number of weights can be greater than the number of training events!
  • In principle, provides the best estimate of p(t|x)
• Disadvantages
  • Computationally demanding!
    • The dimensionality of the parameter space is, typically, large
    • There could be multiple maxima in the likelihood function p(t|x,w), or, equivalently, multiple minima in the error function E(x,w)

Example: Single Top Search

• Training Data
  • 2000 events (1000 tqb + 1000 Wbb)
  • Standard set of 11 variables
• Network
  • (11, 30, 1) network (391 parameters!)
• Markov Chain Monte Carlo (MCMC)
  • 500 iterations, but use last 100 iterations
  • 20 MCMC steps per iteration
  • NN-parameters stored after each iteration
  • 10,000 steps
  • ~1000 steps / hour (on 1 GHz Pentium III laptop)
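The numbers on this slide can be cross-checked with a few lines of arithmetic. The sketch assumes a fully connected (11, 30, 1) network with one bias per hidden and output node, which reproduces the quoted 391 parameters; the function name is made up for this illustration.

```python
def n_parameters(n_in, n_hidden, n_out):
    """Weights plus biases of a fully connected (n_in, n_hidden, n_out) network."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

n_par = n_parameters(11, 30, 1)
total_steps = 500 * 20            # iterations x MCMC steps per iteration
hours = total_steps / 1000        # at ~1000 steps per hour
print(n_par, total_steps, hours)  # 391 10000 10.0
```

So the chain amounts to roughly ten hours of running on the quoted laptop, and the 391 parameters are constrained by only 2000 training events.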

[Plots: signal (tqb) and background (Wbb) distributions]

Decision Trees

• Recover events that fail criteria in cut-based analyses
• Start at first "node" with a fraction of the "training sample"
• Select best variable and cut with best separation to produce two "branches" of events, (F)ailed and (P)assed cut
• Repeat recursively on successive nodes
• Stop when improvement stops or when too few events are left
• Terminal node is called a "leaf" with purity = Ns/(Ns+Nb)
• Run remaining events and data through the tree to derive results
• Boosting DT:
  • Boosting is a recently developed technique that improves any weak classifier (decision tree, neural network, etc.)
  • Boosting averages the results of many trees, dilutes the discrete nature of the output, improves the performance

[Figure: DØ single top analysis]
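The first step of the tree-building recipe, choosing the variable and cut with the best separation and computing the purity Ns/(Ns+Nb) of the resulting branches, might look like the sketch below. The toy sample, the Gini measure of separation and the minimum node size are assumptions chosen for illustration; recursion and boosting are left out.

```python
import random

random.seed(11)

# Toy training sample (assumption): two variables per event, label 1 = signal.
signal = [((random.gauss(2.0, 1.0), random.gauss(0.0, 1.0)), 1) for _ in range(200)]
background = [((random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)), 0) for _ in range(200)]
events = signal + background

def purity(node):
    """Node purity Ns / (Ns + Nb)."""
    return sum(label for _, label in node) / len(node)

def gini(node):
    """Gini impurity p(1-p) weighted by node size; one common separation measure."""
    p = purity(node)
    return len(node) * p * (1.0 - p)

def best_split(node, min_events=20):
    """Scan every variable and candidate cut; return the split with lowest impurity."""
    best = None
    for var in range(len(node[0][0])):
        for cut in sorted({x[var] for x, _ in node}):
            passed = [e for e in node if e[0][var] > cut]
            failed = [e for e in node if e[0][var] <= cut]
            if len(passed) < min_events or len(failed) < min_events:
                continue          # too few events left in a branch
            score = gini(passed) + gini(failed)
            if best is None or score < best[0]:
                best = (score, var, cut, passed, failed)
    return best

score, var, cut, passed, failed = best_split(events)
print("split on variable", var, "at", round(cut, 2))
print("purity (P)assed:", round(purity(passed), 3),
      "(F)ailed:", round(purity(failed), 3))
```

Applying best_split again to the passed and failed branches, until the impurity stops improving or a branch runs out of events, would grow the full tree.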

Matrix Element Method
Example: Top mass measurement

• Maximal use of information in each event by calculating event-by-event signal and background probabilities based on the respective matrix element
  • x: reconstructed kinematic variables of final state objects
  • JES: jet energy scale from the MW constraint
• Signal and background probabilities from differential cross sections
• Write combined likelihood for all events
• Maximize likelihood w.r.t. mtop, JES
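The structure of the combined likelihood can be illustrated with a deliberately simplified toy. Nothing below evaluates a matrix element: the per-event signal and background probabilities are replaced by Gaussians in a single observable, the signal fraction is assumed known, the pseudo-data are generated at an invented mass value, and only the mass parameter is scanned (the JES dimension is omitted).

```python
import math
import random

random.seed(5)

def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * math.sqrt(2.0 * math.pi))

# Toy stand-ins (assumptions): the real method uses differential cross sections;
# here P_sig is a Gaussian centred on the mass parameter m and P_bkg is a broad
# Gaussian that does not depend on m.
def p_sig(x, m):
    return gauss(x, m, 15.0)

def p_bkg(x):
    return gauss(x, 150.0, 40.0)

F_SIG = 0.6   # assumed known signal fraction

# Pseudo-data generated at a made-up "true" mass of 175.
events = [random.gauss(175.0, 15.0) if random.random() < F_SIG
          else random.gauss(150.0, 40.0) for _ in range(300)]

def neg_log_likelihood(m):
    """-ln L(m) = -sum_i ln[ f P_sig(x_i; m) + (1 - f) P_bkg(x_i) ]."""
    return -sum(math.log(F_SIG * p_sig(x, m) + (1.0 - F_SIG) * p_bkg(x))
                for x in events)

# Maximise the likelihood with a simple grid scan over the mass parameter.
grid = [150.0 + 0.5 * i for i in range(101)]      # 150 ... 200
m_hat = min(grid, key=neg_log_likelihood)
print("fitted mass parameter:", m_hat)
```

A two-parameter version would scan or minimise over (mtop, JES) jointly, with each event's probability depending on both.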

Summary

• Multivariate methods are now used extensively in HEP data analysis

• Neural networks, because of their ease of use and power, are favorites for particle-ID and signal/background discrimination

• Bayesian neural networks take us one step closer to optimization

• Likelihood discriminants and Decision trees are becoming popular because they are easier to “defend” (no “black-box” stigma)

• Many issues remain to be addressed as we get ready to deploy the multivariate methods for discoveries in HEP

Nothing tends so much to the advancement of knowledge as the application of a new instrument - Humphry Davy

No amount of experimentation can ever prove me right; a single experiment can prove me wrong. - Albert Einstein

World's Highest Energy Laboratory (for now)

[Photo labels: DØ, Booster]

Our Fancy New Toys

The Large Hadron Collider

• Circumference = 27 km
• Beam energy = 7.7 TeV
• Luminosity = 1.65 × 10^34 cm^-2 s^-1
• Startup date: 2007

[Figure labels: LHC ring, SPS ring, TI 2, TI 8, LHC magnet, LHC tunnel]

LHC Environment

14 TeV proton-proton colliding beams

Parameter                            Value
Bunch-crossing frequency             40 MHz
Average # of collisions / crossing   (not given)
"Interaction rate"                   ~10^9
Average # of charged tracks          (not given)
Radiation field                      severe

CMS Parameter                        Value
Level-1 trigger rate                 100 kHz
Mean time between triggers           10 μs
Trigger latency                      3.2 μs
Solenoid field                       4 T
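The rates in these tables are related by simple arithmetic, sketched below. The interaction rate of ~10^9 is assumed here to be per second, and the collisions-per-crossing figure is a naive ratio that treats every bunch crossing as filled, so it should be read as an order-of-magnitude estimate rather than a quoted parameter.

```python
bunch_crossing_hz = 40e6      # 40 MHz, from the table
interaction_rate_hz = 1e9     # "~10^9", assumed here to be per second
l1_trigger_rate_hz = 100e3    # 100 kHz, from the table

collisions_per_crossing = interaction_rate_hz / bunch_crossing_hz
mean_time_between_triggers_us = 1e6 / l1_trigger_rate_hz
l1_rejection = bunch_crossing_hz / l1_trigger_rate_hz

print(collisions_per_crossing)          # 25.0
print(mean_time_between_triggers_us)    # 10.0
print(l1_rejection)                     # 400.0
```

The Level-1 trigger therefore has to reduce the crossing rate by a factor of about 400, and the mean spacing between accepted triggers is 10 μs.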

CMS Silicon Tracker

[Figure labels: CMS Si Tracker; Pixels; Inner Barrel & Disks (TIB & TID); Outer Barrel (TOB)]

Lots of Silicon
• 214 m^2 of silicon sensors
• 11.4 million silicon strips
• 66 million pixels!

Si Tracker Challenges

• Large and complex system
  • 77.4 million total channels (out of a total of 78.2 M for the experiment)
  • Detector monitoring, data organization, data quality monitoring, analysis, visualization, interpretation all daunting!
• Need to monitor every channel and make sure most of the detector is working at all times (live fraction of the detector and efficiencies bound to decrease with time)
• Need to verify data integrity and data quality for physics
  • Diagnose and fix problems ASAP
  • Keep calibration and alignment parameters current

Detector/Data Monitoring

• Monitor
  • Environmental variables
    • Temperatures, coolant flow rates, interlocks, radiation doses
  • Hardware status
    • Voltages, currents
  • Channel data
    • Readout states, errors, missing data/channels, bad ID for channel/module
    • Many kinds of problems, to be categorized, tracked and displayed
    • Should be able to find rare problems/errors (with low occurrence rate) that may corrupt data
    • Rare problems may indicate a developing failure mode or hidden bad behavior
• Correlate problem/noisy channels with history, temperature, currents, etc.

Data Quality Monitoring

• Monitor
  • Raw data
    • Pedestals, noise, ADC counts, occupancies, efficiencies
  • Processed high-level objects
    • Clusters, tracks, etc.
• Evaluate thousands of histograms
  • Can't visually examine all
  • Automatically evaluate histograms by comparing to reference histograms
  • Adaptive, efficient, find evolving patterns over time
  • Quantiles? q-q plots/comparison instead of KS test?
• A variety of 2D "heat" maps
  • Occupancies, # of bad channels/module, # of errors/module, etc.
• Typical occupancy ~2% in strip tracker
  • 200,000 channels written out 100 times/sec
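One possible shape for the automatic comparison against a reference is sketched below, computing both a two-sample Kolmogorov-Smirnov distance and a handful of quantile shifts (the numbers behind a q-q comparison). The noise distributions, sample sizes and choice of quantiles are invented for illustration; a production system would also need thresholds tuned to the statistics of each histogram.

```python
import random

random.seed(2)

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov distance: max gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def quantiles(sample, probs=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Simple empirical quantiles (nearest rank), enough for a q-q comparison."""
    s = sorted(sample)
    return [s[min(len(s) - 1, int(p * len(s)))] for p in probs]

# Toy noise distributions (assumption): reference run vs. two monitored runs.
reference = [random.gauss(4.0, 1.0) for _ in range(2000)]
good_run = [random.gauss(4.0, 1.0) for _ in range(2000)]
noisy_run = [random.gauss(4.0, 1.5) for _ in range(2000)]

results = {}
for name, run in (("good", good_run), ("noisy", noisy_run)):
    shifts = [q - r for q, r in zip(quantiles(run), quantiles(reference))]
    results[name] = (ks_statistic(reference, run), shifts)
    print(name, "KS =", round(results[name][0], 3),
          "quantile shifts =", [round(s, 2) for s in shifts])
```

The quantile shifts show where the distribution has changed (here, in both tails), which a single KS number does not.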

[Figure: example of a "heat" map, module assembly precision]

Need smart approaches

• What are the best techniques for data-mining?
  • To organize data for analysis and data visualization
    • Complex geometry/addressing makes visualization difficult
  • For finding problematic channels quickly and efficiently
    • Clustering, exploratory data-mining
  • For finding anomalies, corrupt data, patterns of behavior
    • Feature-finding algorithms, superpose many events, time evolution, spatial and temporal correlations
• Noise correlations
  • Via correlation coefficients of defined groups
  • Correlate to history (time variations), environmental variables
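A minimal sketch of the correlation-coefficient approach, with invented data: three hypothetical channel groups, two of which share a common-mode noise term, and a temperature reading that drifts with that same term. The group definitions and noise model are assumptions made purely to show the calculation.

```python
import math
import random

random.seed(4)

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu)**2 for a in u)
    var_v = sum((b - mv)**2 for b in v)
    return cov / math.sqrt(var_u * var_v)

# Toy readout (assumption): per-event mean noise of three channel groups.
# Groups A and B share a common-mode term; group C is independent.
n_events = 1000
common = [random.gauss(0.0, 1.0) for _ in range(n_events)]
group_a = [c + random.gauss(0.0, 1.0) for c in common]
group_b = [c + random.gauss(0.0, 1.0) for c in common]
group_c = [random.gauss(0.0, 1.4) for _ in range(n_events)]
temperature = [20.0 + 0.5 * c + random.gauss(0.0, 0.5) for c in common]

groups = {"A": group_a, "B": group_b, "C": group_c}
for g1, g2 in (("A", "B"), ("A", "C"), ("B", "C")):
    print(g1, g2, round(pearson(groups[g1], groups[g2]), 2))
print("A vs temperature:", round(pearson(group_a, temperature), 2))
```

In this toy the A-B and A-temperature coefficients come out clearly positive while those involving C are compatible with zero, which is the kind of pattern that would point to a shared noise source.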

Data Visualization

• Based on hierarchical/geometrical structure of the tracker
• Display every channel, attach objects/info to each
  • Sub-structures
  • Layers/rings
  • Modules
  • Readout chips

Multivariate Analysis Issues

• Dimensionality reduction
  • Choosing variables optimally without losing information
• Choosing the right method for the problem
• Controlling model complexity
• Testing convergence
• Validation
  • Given a limited sample, what is the best way?
• Computational efficiency
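On the validation question, one commonly used option for a limited sample is k-fold cross-validation, in which every event is used for testing exactly once. The sketch below only builds the index splits; the function name, the choice k = 5 and the sample size are assumptions for illustration, and the talk does not endorse a particular scheme.

```python
import random

def k_fold_indices(n_events, k, seed=0):
    """Shuffle event indices and yield (train, test) index lists for each fold."""
    idx = list(range(n_events))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

splits = list(k_fold_indices(n_events=2000, k=5))
for train, test in splits:
    print(len(train), len(test))   # 1600 400 for every fold
```

Training on each train list and evaluating on the matching test list gives k performance estimates whose spread indicates how sensitive the method is to the particular training sample.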

Multivariate Analysis Issues

• Correctness of modeling
  • How do we make sure the multivariate modeling is correct?
    • The data used for training or building PDEs represent reality. Is it sufficient to check the modeling in the mapped variable? Pair-wise correlations? Higher-order correlations?
  • How do we show that the background is modeled well? How do we quantify the correctness of modeling?
    • In conventional analysis, we normally look for variables that are well modeled in order to apply cuts
  • How well is the background modeled in the signal region?
• Worries about hidden bias
• Worries about underestimating errors

Sociological Issues

• We have been conservative in the use of MV methods for discovery.

• We have been more aggressive in the use of MV methods for setting limits.

• But discovery is more important and needs all the power you can muster!

• This is expected to change at LHC.

Summary

• The next generation of experiments will need to adopt advanced data mining and data analysis techniques

• Conventional/routine tasks such as alignment, detector performance and data quality monitoring and data visualization will be challenging and require new approaches

• Many issues regarding use of multivariate methods of data analysis for discoveries and measurements need to be addressed to make optimal use of data

MV: Where can we use them?

• Almost everywhere, since HEP events are multivariate
• Improve several aspects of analysis
  • Event selection
    • Triggering, real-time filters, data streaming
  • Event reconstruction
    • Tracking/vertexing, particle ID
  • Signal/background discrimination
    • Higgs discovery, SUSY discovery, single top, …
  • Functional approximation
    • Jet energy corrections, tag rates, fake rates
  • Parameter estimation
    • Top quark mass, Higgs mass, SUSY model parameters
  • Data exploration
    • Knowledge discovery via data-mining
    • Data-driven extraction of information, latent structure analysis
