Extracting bb Higgs decay signals using multivariate techniques



Clarke Smith

Outline
  • Higgs search at ATLAS
  • Multivariate methods
  • Event generation with PYTHIA
  • Event processing with ROOT
  • Higgs mass reconstruction with TMVA
  • Results

Higgs search at ATLAS
  • Higgs boson h: evidence of a theoretical mechanism for giving fermions and bosons mass
  • Mass width of several MeV
  • gg → h → bb̄

- After formulating the Standard Model of particle physics, theorists had difficulty explaining the origin of mass for fermions and bosons, namely why some force-carrying bosons have no mass while others have a great deal of it.
- It was suggested that there exists a Higgs mechanism (named after Peter Higgs, an English physicist) that gives these particles mass.
- If we can observe the Higgs boson, the offspring of the Higgs mechanism, we can experimentally confirm the existence of the Higgs mechanism.
- We want to observe the Higgs by detecting the particles that result when it decays.
- Unfortunately, Higgs decay signals are extremely weak compared to those of background processes.
- The property of the Higgs that we want to exploit is its narrow intrinsic mass width of only a few MeV.
- This would make the signal stick out above the background if we can effectively measure the Higgs mass.
- This is a plot of the Higgs branching fractions against mass.
- For a Higgs under 140 GeV, the dominant decay mode is bb̄.
- If the Higgs is in this mass range, then we will need to be able to isolate gg → h → bb̄ from other modes of bb̄ production.

Signal: gg → h → bb̄
Background: gg → bb̄
- These are Feynman diagrams for the hard processes that produce bb̄.
- On top, a Higgs is produced by gluon-gluon fusion via a virtual top-quark loop and subsequently decays to bb̄.
- pp-collisions that involve this hard subprocess are called signal events.
- On bottom, b and b̄ are produced from gluon-gluon fusion via the exchange of a virtual b-quark.
- pp-collisions that involve this hard subprocess are called background events.
- bb̄ can be produced through other processes as well, but gg → bb̄ is dominant.

  • In pp-collisions (events), detect resulting hadrons and measure their pT, η, and φ

  • So-called jet combinatorics problem: how to partition hadrons into jets to reconstruct event information
  • Many mass-reconstruction algorithms for this
  • All produce pT, η, and φ for b, b̄, and h
  • Use different ΔR values to isolate jets
  • m_bb reconstruction plots theoretically show background with a tiny, wide m_h (signal) bump
  • Goal: observe the bump by narrowing it

- The ATLAS coordinate system has the z-axis along the beam line, the angle θ with respect to the beam line, and the angle φ around the beam line.
- Instead of talking about θ, we will talk about the pseudorapidity η, defined as η = -ln[tan(θ/2)].
- In pp-collisions, quarks and gluons are produced but quickly hadronize due to color confinement, yielding tight sprays of hadrons called jets.
- We detect these hadrons and measure their transverse momentum pT, η, and φ.
- This, however, doesn't tell us much about the event; we need jet information for that.
- Hence we have the so-called jet combinatorics problem, which is: how do we group these detected hadrons into their respective jets?
- Doing this allows us to isolate the jets resulting from Higgs decay and thus reconstruct Higgs information.
- There are a multitude of mass-reconstruction algorithms designed for this purpose; they all output pT, η, and φ for h, b, and b̄ using various selection processes that depend on ΔR values, which are simply distances in the η-φ plane, ΔR = sqrt(Δη² + Δφ²).
- With these algorithms, we can reconstruct m_bb for all events (signal and background).
- Due to the small mass width of the Higgs, there should be a small spike somewhere in the m_bb plot.
- However, the algorithms are not yet good enough to see such a spike.
- If we can develop a better method for reconstructing the Higgs mass in signal events, then it might be possible to see this spike.
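
As a minimal sketch of the kinematic quantities named in these notes (not of any of the mass-reconstruction algorithms themselves), the following Python computes the pseudorapidity from θ, the ΔR separation of two jets, and the invariant mass of a two-jet system from pT, η, and φ. The function names are hypothetical, and the jets are assumed massless, which is only an approximation for b-jets.

```python
import math

def pseudorapidity(theta):
    """eta = -ln(tan(theta / 2)), with theta the polar angle from the beam line."""
    return -math.log(math.tan(theta / 2.0))

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi

def delta_r(eta1, phi1, eta2, phi2):
    """Separation in the eta-phi plane: sqrt(d_eta^2 + d_phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

def dijet_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two massless jets (assumption):
    m^2 = 2 * pT1 * pT2 * (cosh(d_eta) - cos(d_phi))."""
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative round-off
```

For example, two back-to-back jets with pT = 60 GeV at η = 0 give `dijet_mass(60, 0, 0, 60, 0, math.pi)`, which evaluates to 120 in the same units as pT.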

Multivariate methods
  • Methods used to reconstruct m_h: neural networks (NN) and boosted regression trees (BRT)
  • Train method by feeding it inputs and targets (true m_h) for each event
  • Method searches for patterns in the inputs and correlations to true m_h
  • Use outputs from 25 mass-reconstruction algorithms as inputs for NN and BRT
- For this task, we want to combine the various existing mass-reconstruction algorithms.
- It appears that a good way to do this is using multivariate methods, which simply construct a mapping from multiple input variables into a prediction of the target.
- Since we are trying to reconstruct the Higgs mass, the true Higgs mass is our target.
- Two of the most popular multivariate methods for this type of regression task are neural nets and boosted regression trees.
- A neural net is a complicated, nonlinear composite function that models the target.
- A boosted regression tree predicts the target from a series of yes-or-no criteria on the event; boosting combines many such trees into one prediction.
- Both the NN and the BRT are created through training, the process of feeding the method inputs and targets so that it can construct its mapping.
- For our methods, we use the outputs from 25 of the best mass-reconstruction algorithms as our inputs.
- That includes reconstructed geometric and energetic information about the b and b̄ jets and the Higgs.

- Here is a relatively simple neural net.
- The white dots on the left are inputs, which are linearly combined and nonlinearly mapped into each of the white dots on the second layer.
- This happens over and over until the end.

- Here is an example boosted regression tree.
- At each split, called a node, there is a yes-or-no question about the input.
- For any given event, you can use the inputs to go from node to node until arriving at an end state, called a leaf.

Event generation with PYTHIA
  • Generate 7 × 10^5 gg → h → bb̄ (signal) events
  • Specify m_h = 90, 100, 110, 120, 130, 140, 150 GeV
Figure label: generated event m_h
- To train these boosted regression trees and neural nets, we first need a large quantity of signal events with specified Higgs masses.
- We use a Monte Carlo particle physics event generator called PYTHIA.
- With PYTHIA, we get to input the physics models that we want the generator to use (in this case, SM physics), the Higgs mass, and the hard subprocess involved (in this case, gg → h → bb̄).
- This is a plot of the true Higgs masses in events generated by PYTHIA.

Event processing with ROOT
  • 25 mass-reconstruction algorithms applied to each event; output is input for NN/BRT
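
To make concrete how the algorithm outputs could become NN/BRT inputs, here is a minimal Python sketch that flattens per-algorithm outputs into one named input vector per event. The suffix convention (`pt1_0`, `eta1_0`, `pt2_0`, `eta2_0`) follows the input labels visible in the neural net figure; the remaining variable names, the `sum_pt` name, and the assumption that every algorithm supplies all eight algorithm-dependent quantities are illustrative guesses, not taken from the analysis code.

```python
N_ALGORITHMS = 25

# Eight algorithm-dependent quantities per event. pt1/eta1/pt2/eta2 follow the
# labels in the neural net figure; the other four names are hypothetical.
PER_ALGORITHM = ["pt1", "eta1", "pt2", "eta2", "dR", "h_pt", "h_eta", "h_m"]

def input_names(n_algorithms=N_ALGORITHMS):
    """Names of all inputs: one shared pT sum plus 8 quantities per algorithm."""
    names = ["sum_pt"]  # hypothetical name for the algorithm-independent pT sum
    for i in range(n_algorithms):
        names.extend(f"{var}_{i}" for var in PER_ALGORITHM)
    return names

def flatten_event(sum_pt, per_algorithm_outputs):
    """Build the flat input vector for one event.

    per_algorithm_outputs: list of dicts, one per algorithm, keyed by PER_ALGORITHM.
    The order of the returned values matches input_names(len(per_algorithm_outputs)).
    """
    vector = [sum_pt]
    for outputs in per_algorithm_outputs:
        vector.extend(outputs[var] for var in PER_ALGORITHM)
    return vector
```

Under these assumptions `len(input_names())` is 8 × 25 + 1 = 201, consistent with the "over 100 input variables" quoted in the notes for this slide.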

Figure labels: variables; single-algorithm reconstructed m_h
- From the Monte Carlo events, we use ROOT to extract pT, η, and φ for each emitted hadron.
- We then use the 25 mass-reconstruction algorithms to obtain nine types of variables for each event: the sum of the pT activity in the event; pT and η for the b and b̄ jets; the ΔR separation of the b from the b̄ jet; and the pT, η, and m for the Higgs.
- All variables except the pT sum depend on the algorithm used, so we have a set of over 100 input variables.
- These variables are fed to the NN or BRT as types of inputs that the methods can expect for each event.
- This is an example of the reconstructed Higgs mass plot for one algorithm.

Higgs mass reconstruction with TMVA
  • To run TMVA: feed data, select method(s), specify variables, and choose parameters
  • TMVA uses half of the sample for training and half for testing
  • Select variables based on effectiveness and redundancy
      effective if ranked highly by TMVA method
      redundant if strongly correlated to another variable
  • Optimize parameters with RMS comparison
  • NN parameters: HiddenLayers, NeuronType, NeuronInputType, etc.
  • BRT parameters: NTrees, BoostType, SeparationType, etc.
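
The redundancy criterion in the bullets above can be sketched in Python as a greedy filter on pairwise Pearson correlation. This is a numerical stand-in for judging correlations by eye, not the procedure used in the analysis: the 0.95 threshold and the function names are assumptions, and the order of `ranked_names` is taken as given (for instance, from a method's variable ranking).

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant column carries no correlation information
    return sxy / math.sqrt(sxx * syy)

def drop_redundant(columns, ranked_names, threshold=0.95):
    """Walk the variables from best- to worst-ranked and keep each one only if
    it is not strongly correlated (|r| > threshold) with a variable already kept.

    columns: dict mapping variable name -> list of per-event values.
    threshold: hypothetical cut; the source does not quote a number.
    """
    kept = []
    for name in ranked_names:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

Because the loop visits variables in ranked order, a redundant pair always loses its lower-ranked member, so the effectiveness and redundancy criteria are applied together.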

- To access the neural net and boosted regression tree, we use the TMVA package within ROOT.
- Ultimately, TMVA allows users to choose four things: the input data, the multivariate methods, the variables, and the parameters of the methods.
- Instead of using all of the variables, we use only an optimal subset, eliminating redundant and ineffective variables.
- The chosen TMVA method ranks the variables by how important they were in training; we use this ranking to eliminate ineffective variables.
- We also remove redundant variables: those that are strongly correlated with others and hence offer nothing on their own.
- This is done by looking at 2D scatter plots.
- Finally, we optimize various NN and BRT parameters by tweaking them and comparing the method's total RMS value, which reflects the ability of the method to model the target (it is the standard deviation of the prediction's deviation from the target).
- Some NN parameters we tweak are the number of hidden layers, the number of neurons in each hidden layer, the nonlinear function, and the type of linear combination.
- The BRT parameters we adjust are those used in the tree creation.

Results
  • Overall, BRT with GradientBoost yielded best predictions

method                   RMS           mean deviation   truncated RMS   truncated mean deviation
NN                       1.25 × 10^4   47               9.73 × 10^3     1.21 × 10^3
BRT with AdaBoost.R2     1.64 × 10^4   956              1.58 × 10^4     1.49 × 10^3
BRT with GradientBoost   1.24 × 10^4   -1.49 × 10^3     8.99 × 10^3     -281

Units are MeV.
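
For reference, the four figures of merit in the table can be sketched in Python as follows. The RMS follows the definition given in the notes (the standard deviation of prediction minus target). The source does not define "truncated"; the sketch assumes it means recomputing both quantities using only events whose deviation lies within two RMS of the mean deviation, which should be checked against the TMVA documentation. All function and key names are hypothetical.

```python
import math

def mean_and_rms(deviations):
    """Mean of the deviations and their standard deviation about that mean."""
    n = len(deviations)
    mean = sum(deviations) / n
    var = sum((d - mean) ** 2 for d in deviations) / n
    return mean, math.sqrt(var)

def regression_metrics(predictions, targets, n_sigma=2.0):
    """RMS and mean deviation of (prediction - target), plus truncated versions."""
    devs = [p - t for p, t in zip(predictions, targets)]
    mean, rms = mean_and_rms(devs)
    # Assumed truncation rule: keep events within n_sigma * rms of the mean deviation.
    kept = [d for d in devs if abs(d - mean) <= n_sigma * rms]
    t_mean, t_rms = mean_and_rms(kept)
    return {
        "rms": rms,
        "mean_deviation": mean,
        "truncated_rms": t_rms,
        "truncated_mean_deviation": t_mean,
    }
```

The truncated values are less sensitive to a few badly reconstructed events, which is one way to read the gap between the RMS and truncated RMS columns.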

Figure: reconstructed m_h using BRT with GradientBoost for PYTHIA-generated 120 GeV Higgs events
Figure: previous m_h reconstruction attempt using NN for ALPGEN-generated 120 GeV Higgs events

Future work
  • Optimize parameters algorithmically
  • Generate events with more Higgs masses
  • Process events with more variables
  • Combine multivariate methods
  • Test on actual ATLAS data

Figure: Feynman diagrams for gg → bb̄ (virtual b-quark exchange) and gg → h → bb̄ (top-quark loop).
Figure: neural net diagram with inputs pt1_0, eta1_0, pt2_0, eta2_0, and a bias node; layers 0 to 4.
Figure: histogram of true_higgs_mass (entries 84415, mean 1.2e+05, RMS 12.29; x-axis 119.85 to 120.15, ×10^3).
Figure: histogram of b_m_18 (entries 84415, mean 113.7, RMS 23.38; x-axis 40 to 200).