

QUARK/GLUON JET TAGGING FOR ALICE: MACHINE LEARNING FOR PARTICLE PHYSICS

ANDREW JOHN LOWE
Wigner Research Centre for Physics, Hungarian Academy of Sciences

INTRODUCTION

Search strategies for new subatomic particles often depend on being able to discriminate efficiently between signal and background processes. Particle physics experiments are expensive, the competition between rival experiments is intense, and the stakes are high. This has led to increased interest in advanced statistical methods to extend the discovery reach of the experiments. We present a new method that could be used for differentiating between decays of quarks and gluons at experiments like those at the Large Hadron Collider (LHC) at CERN. The power to discriminate between these two types of particle would have a huge impact on many new physics searches at CERN and beyond.

THE ALICE EXPERIMENT

ALICE (A Large Ion Collider Experiment) is one of seven detector experiments at the LHC. ALICE focuses on the physics of strongly interacting matter in heavy-ion (lead nuclei) collisions. The resulting temperature and energy density are expected to be high enough to produce quark-gluon plasma, a state of matter in which quarks and gluons are freed. Similar conditions are believed to have existed a fraction of a second after the Big Bang. Recreating this primordial form of matter and understanding how it evolves is expected to shed light on questions about how matter is organized, the mechanism that confines quarks and gluons, and the nature of strong interactions and how they generate the bulk of the mass of ordinary matter.

Figure 1: Computer generated cut-away view of ALICE.

WHAT IS A JET?

The production of quarks and gluons (collectively known as partons) via strong interactions is the dominant high-momentum-transfer process at the LHC. Quarks and gluons are not observed individually. Instead, we can only measure their decay products. What we observe is a cone-shaped spray of particles called a jet. The measured particles are grouped together by a jet algorithm, and the resultant jets are viewed as a proxy for the initial quarks and gluons that we can’t measure.

Figure 2: When two high-energy protons collide, the partons that compose them (here only quarks are depicted in green, red, and blue) can hit each other. Some of these partons (pink balls) can fly away and "hadronize", forming directional jets of energetic particles (white balls). From [1].

THE PROBLEM IN A NUTSHELL

Inside ALICE, beams of energetic protons and/or heavy ions collide. Quarks and gluons emerge and decay into collimated sprays of particles, and algorithms cluster these decay products into jets. For each jet, we’d like to know what initiated it: was it a quark or a gluon? This is an archetypal classification problem that might be amenable to machine learning.

FEATURE ENGINEERING

There are several differences between quarks and gluons that prove useful in motivating observables that might distinguish jets initiated by quarks from those initiated by gluons. Specifically, we wish to leverage differences in jet substructure to construct discriminant variables. Many candidate discriminant variables (features) were found during an extensive literature search, but we also consider various unintuitive combinations of variables, following the example in [2]. Combining particle attributes with each other (to form sums, differences, or products, for example) leads to a rapid proliferation of features. Consequently, we explore hundreds of experimentally motivated, physically motivated, and unmotivated single-variable discriminants.
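To illustrate how quickly such combinations proliferate, the following is a minimal sketch rather than the feature code used in this work. The attribute names (`pt`, `eta`, `n_tracks`), their values, and the helper `pairwise_features` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def pairwise_features(jet):
    """Build the sum, difference, and product of every pair of base attributes."""
    derived = {}
    for a, b in combinations(sorted(jet), 2):
        derived[f"{a}+{b}"] = jet[a] + jet[b]
        derived[f"{a}-{b}"] = jet[a] - jet[b]
        derived[f"{a}*{b}"] = jet[a] * jet[b]
    return derived

# Hypothetical jet with three base attributes (names and values are assumptions).
jet = {"pt": 25.0, "eta": 0.3, "n_tracks": 4.0}
features = pairwise_features(jet)
print(len(features))  # 3 pairs x 3 operations = 9
```

With n base attributes this construction alone yields 3·n(n−1)/2 derived features, so a few dozen attributes already produce hundreds of candidates.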

GETTING & CLEANING DATA

The ALICE analysis software framework AliRoot was used to process Monte-Carlo simulated data that contains a large number of jets. We inserted our own C++ code with handcrafted features into AliRoot. Unphysical (missing) values are denoted by NaNs. We require that jets have at least two tracks, are fully contained within the tracker's geometrical acceptance, and are isolated. We then analyse the floating-point types contained in the data. This is a new sub-process in particle physics data analysis that we have invented. The data is contained in a two-dimensional array-like structure, in which each column contains measurements of one feature, and each row contains one jet. We plot this below:

[Figure 3 panels: jets with 2 tracks, 3 tracks, 4 tracks, and ≥ 5 tracks. Axes: Feature, Jet. Legend (number types): Zero; Normal < ε; Normal > ε; Large unnormal or ∞; NaN.]

Figure 3: "Missingness" and floating-point type map of the data. Feature names and jet ID numbers have been omitted for clarity.

We observe that several features appear to be duplicates, and that several are overwhelmingly NaN or zero. These have no predictive value and are removed. We reset large unnormalised and infinite values to the largest representable normalised number on our hardware. Several features have values below the machine ε; these are due to rounding error in floating-point arithmetic and are essentially equal to zero. The variation in their values is not real, and could be misleading to a classifier. We note that processing very small floating-point values may significantly slow computation; in extreme cases, instructions may be as much as 100 times slower [3, 4]. We flush these values to zero, which should speed up classifier training.
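The classification and cleaning steps described above can be sketched as follows. This is an illustration under stated assumptions, not the code used in this work: it assumes double-precision values, takes the machine ε from `sys.float_info.epsilon`, handles only infinities (not other "large unnormal" values), and the function names `classify` and `clean` are hypothetical.

```python
import math
import sys

EPS = sys.float_info.epsilon   # machine epsilon for double precision (assumed)
MAX = sys.float_info.max       # largest finite double

def classify(x):
    """Return the number type of one value, loosely following the Figure 3 legend."""
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "inf"
    if x == 0.0:
        return "zero"
    return "below_eps" if abs(x) < EPS else "above_eps"

def clean(x):
    """Clamp infinities to the largest finite double; flush |x| < eps to zero."""
    if math.isnan(x):
        return x               # missing values stay NaN in this sketch
    if math.isinf(x):
        return math.copysign(MAX, x)
    if abs(x) < EPS:
        return 0.0
    return x

# Hypothetical row of feature values for one jet.
row = [0.0, 3e-17, 0.42, float("inf"), float("nan")]
print([classify(v) for v in row])  # ['zero', 'below_eps', 'above_eps', 'inf', 'nan']
print([clean(v) for v in row])
```

Applying `classify` to every cell of the jet-by-feature table gives the kind of type map that Figure 3 displays.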

JET TRUTH LABELLING

The jets are assigned a "ground-truth" label using information in the data simulator event record. However, the labelling procedure is ambiguous, and there is significant class noise (i.e., there are mislabelled jets) in the assignments. We have devised a new labelling scheme to address the problem of mislabelled jets. We adapt the method in [5] by extending it to form an ensemble of multiple different labelling schemes. We then reject all jets for which the ensemble does not reach a consensus, i.e., we employ "the wisdom of the crowds". Limiting class noise is critical for two reasons: firstly, many machine learning classifiers are confused by mislabelled observations, and this will damage performance; secondly, the performance of a classifier is measured with respect to its ability to correctly predict the assigned labels, so performance estimates are less meaningful if the labels are uncertain.
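The consensus filter can be sketched as below. This is a minimal illustration that assumes "consensus" means unanimous agreement among the schemes, which the text does not specify. The jet identifiers, the label strings, and the helper `consensus_label` are hypothetical.

```python
def consensus_label(labels):
    """Return the common label if every scheme agrees, otherwise None."""
    first = labels[0]
    return first if all(label == first for label in labels) else None

# Hypothetical outputs of five labelling schemes for three jets.
jets = {
    "jet_a": ["q", "q", "q", "q", "q"],
    "jet_b": ["g", "g", "photon", "g", "g"],
    "jet_c": ["g", "g", "g", "g", "g"],
}

kept = {}
for jet_id, labels in jets.items():
    label = consensus_label(labels)
    if label is not None:      # jets without consensus are rejected
        kept[jet_id] = label
print(kept)  # {'jet_a': 'q', 'jet_c': 'g'}
```

A majority-vote rule would retain more jets at the cost of admitting more label noise. Unanimity is the stricter of the two choices.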

JET TRUTH LABELLING (cont.)

We use an ensemble labeller with five members. To tabulate their outputs would require ten tables corresponding to the $\binom{5}{2} = 10$ possible adjacency matrices which, in turn, correspond to the margins of the 5-dimensional adjacency matrix that fully describes the relationships between the schemes. We use a chord diagram to examine these relationships, which are in agreement with our expectations:

[Figure 4 chord diagram. Labelling schemes shown: Cone max-pT, Cone QCD-aware, GA max-pT, GA QCD-aware, Reclustered. Label key:]

∅: no label

q: light quark

g: gluon

c: charm

b: bottom

γ: photon

Figure 4: Chord diagram showing the relationships between five chosen labelling schemes. There is overwhelming agreement in the label assignments from each scheme, and (as expected) variations of the "max-pT" scheme are prone to label a jet as photon-initiated (γ) when a QCD-aware scheme would label the jet as a gluon or assign no label (as noted in [5]).
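For comparison with the chord diagram, the pairwise tabulation that it replaces can be sketched as follows. The scheme names `s1` to `s5`, the label strings, and the helper `pairwise_tables` are hypothetical, and the four-jet data set is invented purely to keep the example small.

```python
from collections import Counter
from itertools import combinations
from math import comb

def pairwise_tables(assignments):
    """Count label co-occurrences for every pair of labelling schemes."""
    tables = {}
    for a, b in combinations(sorted(assignments), 2):
        tables[(a, b)] = Counter(zip(assignments[a], assignments[b]))
    return tables

# Hypothetical labels from five schemes for the same four jets.
assignments = {
    "s1": ["q", "g", "g", "q"],
    "s2": ["q", "g", "photon", "q"],
    "s3": ["q", "g", "g", "q"],
    "s4": ["q", "g", "photon", "q"],
    "s5": ["q", "g", "g", "q"],
}
tables = pairwise_tables(assignments)
print(len(tables), comb(5, 2))                # 10 10
print(tables[("s1", "s2")][("g", "photon")])  # 1
```

Each `Counter` is one of the ten two-scheme tables. Its off-diagonal entries, such as `("g", "photon")`, count the disagreements between that pair of schemes.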

FEATURE RANKING RESULTS

Our data initially contains more than 300 features. Removing duplicate and highly correlated features halves the number of features: first we filter on the absolute value of the Pearson correlation to identify linearly correlated features, then we filter on the absolute value of the Spearman correlation to identify monotonically related features. To search the remaining feature space for the variables that provide the best predictive power, we invented a fast filter-based method that ranks variables by information gain (Kullback-Leibler divergence) or Gini impurity and compares their rank with that of random probes injected into the data. We do this repeatedly for a large number of bootstrap resamplings to yield a median (or, optionally, a mean) and a nonparametric confidence interval estimate of the chosen metric for each feature. We then remove features whose value of the metric is less than that for random probes, or within one standard deviation of it:

Figure 5: Box-and-whisker plot showing median information gain for features and six random probes (denoted "BOGUS"). To the left is worse, to the right is better.
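The random-probe ranking can be sketched as below. This is a simplified illustration and not the implementation used in this work: it uses a single probe instead of six, estimates information gain with equal-width binning, and keeps a feature if its median gain merely exceeds that of the probe (the one-standard-deviation margin is omitted). The toy data, the bin count, and all function names are assumptions.

```python
import math
import random
from collections import Counter
from statistics import median

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, bins=4):
    """Entropy reduction obtained by splitting `values` into equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    groups = {}
    for v, y in zip(values, labels):
        idx = min(int((v - lo) / width), bins - 1)
        groups.setdefault(idx, []).append(y)
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

def bootstrap_gain(features, labels, rounds=50, seed=1):
    """Median information gain of each feature over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(labels)
    gains = {name: [] for name in features}
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]   # sample jets with replacement
        ys = [labels[i] for i in idx]
        for name, col in features.items():
            gains[name].append(information_gain([col[i] for i in idx], ys))
    return {name: median(g) for name, g in gains.items()}

# Toy data: one informative feature and one random probe named "BOGUS".
rng = random.Random(0)
labels = [rng.choice("qg") for _ in range(400)]
features = {
    "informative": [rng.gauss(1.0 if y == "q" else -1.0, 1.0) for y in labels],
    "BOGUS": [rng.random() for _ in labels],
}
scores = bootstrap_gain(features, labels)
selected = [f for f in scores if f != "BOGUS" and scores[f] > scores["BOGUS"]]
print(selected)
```

Because the probe is pure noise, any feature that cannot outrank it carries no measurable information about the label at this sample size.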

In addition to confirming the power of features already proposed in past work, this method has found intriguing new variables that promise better discrimination between quark- and gluon-initiated jets and are therefore ideal candidates for further study.
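The two-stage correlation pre-filter described at the start of this section can be sketched as follows. The threshold of 0.95, the greedy keep-the-first-seen rule, the toy features, and the function names are all assumptions made for illustration, since the text specifies none of them.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(x):
    """Rank of each value in x (ties are not averaged in this sketch)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def drop_correlated(features, corr, threshold):
    """Greedily keep a feature only if |corr| with every kept feature is below threshold."""
    kept = {}
    for name, col in features.items():
        if all(abs(corr(col, other)) < threshold for other in kept.values()):
            kept[name] = col
    return kept

# Hypothetical features: b is linear in a, c is monotonic (cubic) in a, d is unrelated.
a = [float(i) for i in range(-5, 6)]
features = {
    "a": a,
    "b": [2.0 * v + 1.0 for v in a],
    "c": [v ** 3 for v in a],
    "d": [1.0, -2.0, 0.5, 3.0, -1.0, 0.0, 2.5, -3.0, 1.5, -0.5, 2.0],
}
after_pearson = drop_correlated(features, pearson, threshold=0.95)
after_spearman = drop_correlated(after_pearson, spearman, threshold=0.95)
print(list(after_pearson), list(after_spearman))
```

In this toy case the Pearson stage removes the linear copy `b` but keeps the cubic feature `c`, which only the rank-based Spearman stage identifies as redundant.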

REFERENCES

[1] C. Manuel. The Stopping Power of Hot Nuclear Matter. Physics, 7(97), 2014. doi: 10.1103/Physics.7.97. URL http://link.aps.org/doi/10.1103/Physics.7.97.

[2] J. Gallicchio, J. Huth, M. Kagan, M. D. Schwartz, K. Black, and B. Tweedie. Multivariate discrimination and the Higgs + W/Z search. JHEP, 04:069, 2011. doi: 10.1007/JHEP04(2011)069.

[3] E. M. Schwarz, M. Schmookler, and S. D. Trong. FPU implementations with denormalized numbers. IEEE Transactions on Computers, 54(7):825–836, July 2005. ISSN 0018-9340. doi: 10.1109/TC.2005.118.

[4] I. Dooley and L. Kale. Quantifying the interference caused by subnormal floating-point values. In Proceedings of the Workshop on Operating System Interference in High Performance Applications, 2006.

[5] A. Buckley and C. Pollard. QCD-aware partonic jet clustering for truth-jet flavour labelling. Eur. Phys. J., C76(2):71, 2016. doi: 10.1140/epjc/s10052-016-3925-z.

This work was supported by: Hungarian National Research Fund (OTKA) NK106119 and the Wigner GPU Laboratory of the Wigner RCP, Hungarian Academy of Sciences