New Developments in the ROOT Mathematical Libraries L. Moneta, C. Gumpert, B. Rabacal (CERN, PH-SFT)


CHEP 2010, Taipei, Taiwan 2010 Lorenzo Moneta, CERN/PH-SFT

Outline

• Recent developments in ROOT libMathCore
    • numerical algorithms interfaces
• Fitting improvements
    • TFitResult
• New classes in Hist library
    • TEfficiency class
    • TKDE class for density estimation
• Goodness of Fit
    • new GoFTest class
• Conclusions


Recent MathCore Developments

• libMathCore provides the basic Math functionality
    • Mathematical and statistical functions in TMath or the ROOT::Math namespace
    • Random number generators
    • Implementation of basic algorithms (integration, differentiation, root finders, etc.)
    • Interfaces for function evaluations and for numerical algorithms
• Additional implementations provided in other libraries (e.g. libMathMore)
    • transparent mechanism to use them via the plug-in manager
    • see Integrator or Minimizer interfaces
• Fitting classes (in namespace ROOT::Fit)
    • Fitter, FitResult, etc., using function and Minimizer interfaces


Numerical Integration

• Single entry point for multiple implementations: ROOT::Math::Integrator

    using namespace ROOT::Math;
    // multi-dimensional integrand function
    // (Functor wraps a callable taking only the coordinate array)
    double func(const double* x);
    ....
    // Functor class to wrap the user function in the function interface
    Functor f(func, dimension);
    // adaptive cubature method
    IntegratorMultiDim ig(IntegrationMultiDim::kADAPTIVE);
    double v1 = ig.Integral(f, xmin, xmax);

    // MC method (VEGAS) loaded from the MathMore library
    IntegratorMultiDim igMC(IntegrationMultiDim::kVEGAS);
    double v2 = igMC.Integral(f, xmin, xmax);

• Different implementations can be selected
• Example of usage: RooStats::BayesianCalculator
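To make the idea of an MC integration method concrete, here is a minimal sketch in standard C++ (no ROOT). It is plain, non-adaptive Monte Carlo, not VEGAS, which additionally adapts its sampling to the integrand. The names `mcIntegral` and `productXY` are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Plain (non-adaptive) Monte Carlo integration over the box [xmin, xmax].
// Hypothetical helper for illustration; not part of ROOT.
double mcIntegral(const std::function<double(const double*)>& f,
                  const std::vector<double>& xmin,
                  const std::vector<double>& xmax,
                  std::size_t nSamples, unsigned seed = 12345)
{
    assert(xmin.size() == xmax.size() && nSamples > 0);
    const std::size_t dim = xmin.size();
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> u(0.0, 1.0);

    // Volume of the integration box.
    double volume = 1.0;
    for (std::size_t d = 0; d < dim; ++d) volume *= xmax[d] - xmin[d];

    // Average the integrand over uniformly distributed points.
    std::vector<double> x(dim);
    double sum = 0.0;
    for (std::size_t i = 0; i < nSamples; ++i) {
        for (std::size_t d = 0; d < dim; ++d)
            x[d] = xmin[d] + u(rng) * (xmax[d] - xmin[d]);
        sum += f(x.data());
    }
    return volume * sum / static_cast<double>(nSamples);
}

// Example integrand: f(x, y) = x * y, whose integral over [0,1]^2 is 1/4.
double productXY(const double* x) { return x[0] * x[1]; }
```

The integrand has the same `double(const double*)` shape as the function wrapped by the Functor on the slide, so the same user function could be passed to either.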


Function Minimization

• Common interface class (ROOT::Math::Minimizer) for all ROOT minimizer implementations
• Existing plug-ins:
    • Minuit (based on class TMinuit, direct translation from Fortran code)
    • Minuit2 (new C++ implementation with OO design)
    • Fumili (only for least-square or log-likelihood minimizations)
    • GSL minimizers: conjugate gradient algorithms (Fletcher-Reeves, BFGS) and Levenberg-Marquardt (for minimizing least-square functions)
    • Linear, for least-square functions (direct solution, non-iterative method)
    • Genetic minimizer (based on the algorithm implemented in TMVA)
• Easy to extend and plug in new minimizers
    • NagC, Opt++, ...?
• Possible to combine minimizers
    • e.g. Minuit + Genetic minimizer
• Control via the MinimizerOptions class:

    MinimizerOptions::SetDefaultMinimizer("Minuit2");

• A RooFit interface also exists (RooMinimizer, from A. Lazzaro)


DistSampler class

• New interface class in version 5.28 for random generation of data according to a generic distribution
    • currently implemented using UNU.RAN
    • an implementation based on Foam is coming
• Can also directly generate data sets (binned or unbinned)
• Plan to use it in RooFit for implementing RooAbsPdf::generate

    using namespace ROOT::Math;
    ....
    DistSampler * sampler = Factory::CreateDistSampler("Unuran");
    // set the sampling distribution
    sampler->SetFunction(user_function);
    // init with algorithm name
    sampler->Init("TDR");
    for (int i = 0; i < n; ++i) {
        // sample 1D data
        double x = sampler->Sample1D();
        // sample for multi-dimensional data
        const double * xx = sampler->Sample();
        ....
    }
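As an illustration of sampling from a generic user-supplied distribution, the sketch below uses simple accept-reject sampling in standard C++. This is an assumption made for the example: it is neither the DistSampler interface nor the UNU.RAN "TDR" algorithm named above, and the class `RejectionSampler` is hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <random>
#include <utility>

// Accept-reject sampler for a 1D density known up to normalization.
// Hypothetical class for illustration; not DistSampler and not UNU.RAN.
class RejectionSampler {
public:
    // pdf must satisfy 0 <= pdf(x) <= pdfMax for all x in [xmin, xmax].
    RejectionSampler(std::function<double(double)> pdf,
                     double xmin, double xmax, double pdfMax,
                     unsigned seed = 4357)
        : fPdf(std::move(pdf)), fX(xmin, xmax), fY(0.0, pdfMax), fRng(seed)
    {
        assert(xmin < xmax && pdfMax > 0.0);
    }

    // Draw x uniformly, accept it with probability pdf(x) / pdfMax.
    double Sample1D()
    {
        for (;;) {
            const double x = fX(fRng);
            if (fY(fRng) <= fPdf(x)) return x;
        }
    }

private:
    std::function<double(double)> fPdf;
    std::uniform_real_distribution<double> fX;
    std::uniform_real_distribution<double> fY;
    std::mt19937 fRng;
};
```

For example, `RejectionSampler s([](double x) { return x; }, 0.0, 1.0, 1.0);` samples the density proportional to x on [0, 1], whose mean is 2/3. Accept-reject becomes inefficient for sharply peaked densities, which is one reason dedicated algorithms are used in practice.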


Fitting improvements

• New fitting classes already presented in past conferences
• New since 5.26: TFitResult class
    • returned from TH1::Fit or TGraph::Fit via a TFitResultPtr
    • need to use option "S", otherwise just the status (int) is returned
• TFitResult contains all fit result information
    • parameters, errors, covariance matrix, Minos errors, minimizer status, etc.

    // return a smart pointer to TFitResult using option "S"
    TFitResultPtr r = h1->Fit("gaus", "S");
    double chi2 = r->Chi2();          // chi2 of fit
    double fmin = r->MinFcnValue();   // minimum of fcn function

    const double * par = r->GetParams();   // get fit parameters
    const double * err = r->GetErrors();   // get fit errors
    TMatrixDSym covMat = r->GetCovarianceMatrix();
    TMatrixDSym corMat = r->GetCorrelationMatrix();

    r->Print("V");   // full printout of result
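The covariance and correlation matrices are related by the standard definition rho_ij = C_ij / sqrt(C_ii C_jj), with the parameter errors given by sqrt(C_ii). The sketch below shows that relation in standard C++; it assumes this textbook definition and is not taken from the TFitResult implementation. `Matrix` and `correlationFromCovariance` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Correlation matrix from a covariance matrix:
// rho_ij = C_ij / sqrt(C_ii * C_jj).
// Hypothetical helper for illustration; not part of ROOT.
Matrix correlationFromCovariance(const Matrix& cov)
{
    const std::size_t n = cov.size();
    // The matrix must be square with strictly positive variances.
    for (std::size_t i = 0; i < n; ++i)
        assert(cov[i].size() == n && cov[i][i] > 0.0);

    Matrix cor(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            cor[i][j] = cov[i][j] / std::sqrt(cov[i][i] * cov[j][j]);
    return cor;
}
```

With variances 4 and 9 and covariance 1, the errors are 2 and 3 and the correlation coefficient is 1/6.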


Histogram division

• New class in 5.28 for efficiencies and binomial errors
    • common problem in HEP analysis (trigger, selection cuts, etc.)
• Histogram bin counts are described by Poisson statistics: $n_i : \mathrm{Poisson}(n_i \mid \mu_i)$
• Division of two histograms is described by binomial statistics if they are correlated: $\frac{k_i}{n_i} : \mathrm{Binomial}(k \mid n, \varepsilon)$
• If k and n are uncorrelated, the ratio of Poisson counts can still be written as $\frac{n_i^{1}}{n_i^{2}} = \frac{n_i^{1} + n_i^{2}}{n_i^{2}} - 1 \to \frac{1}{\varepsilon} - 1$
• A histogram class cannot fully describe binomial statistics
    • need both $k_i$ and $n_i$ for further analysis (combination, fitting, etc.)
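The step from the Poisson ratio to the binomial parameter can be spelled out as follows. The reading below is an assumption about the slide's intent: conditional on the total $n_i^{1} + n_i^{2}$, the count $n_i^{2}$ of two independent Poisson variables is binomial, with $\varepsilon$ given by the ratio of the Poisson means $\mu_i^{1}$, $\mu_i^{2}$ (symbols introduced here for the illustration).

```latex
\frac{n_i^{1}}{n_i^{2}}
  = \frac{n_i^{1} + n_i^{2}}{n_i^{2}} - 1
  \;\longrightarrow\; \frac{1}{\varepsilon} - 1,
\qquad
\varepsilon = \frac{\mu_i^{2}}{\mu_i^{1} + \mu_i^{2}} .
% Numerical check: n^1 = 30, n^2 = 70 gives 30/70 = 100/70 - 1 = 1/0.7 - 1.
```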


Motivation for TEfficiency

• What did we have in ROOT?
    • TH1::Divide uses the normal approximation for binomial errors: $\hat{\varepsilon} = k/n$ with error $\sqrt{\hat{\varepsilon}(1-\hat{\varepsilon})/n}$, which fails for ε → 0 or 1
    • TGraphAsymmErrors::BayesDivide: binomial intervals with Bayesian statistics assuming a uniform prior
• TEfficiency class now provides several statistical methods for computing binomial confidence intervals
    • frequentist interval (Clopper-Pearson), as described in the PDG
    • approximate methods (Agresti-Coull, Wilson)
    • Bayesian interval based on a Beta prior distribution
        • includes uniform (Beta(1,1)) and Jeffreys (Beta(1/2,1/2)) priors
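To show why the normal approximation breaks down at the boundaries, the sketch below compares it with the Wilson and Agresti-Coull intervals using their textbook closed forms in standard C++. These are not the TEfficiency implementations; the function names are hypothetical, and the confidence level enters through the normal quantile z (z = 1 corresponds to roughly 68.3%). Clopper-Pearson is left out because it needs Beta-distribution quantiles, which the standard library does not provide.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Interval { double lower; double upper; };

// Normal (Wald) approximation: eps +/- z * sqrt(eps * (1 - eps) / n).
Interval normalInterval(int k, int n, double z)
{
    assert(n > 0 && k >= 0 && k <= n);
    const double eps = static_cast<double>(k) / n;
    const double half = z * std::sqrt(eps * (1.0 - eps) / n);
    return { std::max(0.0, eps - half), std::min(1.0, eps + half) };
}

// Wilson score interval (textbook closed form).
Interval wilsonInterval(int k, int n, double z)
{
    assert(n > 0 && k >= 0 && k <= n);
    const double z2 = z * z;
    const double centre = (k + 0.5 * z2) / (n + z2);
    const double half = z / (n + z2)
        * std::sqrt(static_cast<double>(k) * (n - k) / n + 0.25 * z2);
    return { std::max(0.0, centre - half), std::min(1.0, centre + half) };
}

// Agresti-Coull interval: Wald formula applied to adjusted counts.
Interval agrestiCoullInterval(int k, int n, double z)
{
    assert(n > 0 && k >= 0 && k <= n);
    const double z2 = z * z;
    const double nTilde = n + z2;
    const double pTilde = (k + 0.5 * z2) / nTilde;
    const double half = z * std::sqrt(pTilde * (1.0 - pTilde) / nTilde);
    return { std::max(0.0, pTilde - half), std::min(1.0, pTilde + half) };
}
```

For k = 0 passed out of n = 10, the normal approximation collapses to a zero-width interval at 0, while the Wilson and Agresti-Coull intervals still have a non-zero upper limit.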


Binomial intervals

Coverage probabilities for the binomial interval


TEfficiency

• TEfficiency makes it possible to estimate and draw intervals at different confidence levels and with different statistics options

    // h1: histogram of passed events, h2: histogram of all events
    TEfficiency ef(*h1, *h2);

    // choose the statistics option, e.g. Clopper-Pearson ...
    ef.SetStatisticOption(TEfficiency::kFCP);
    // ... or Agresti-Coull
    ef.SetStatisticOption(TEfficiency::kFAC);

    ef.SetConfidenceLevel(0.683);

    ef.Draw("A4");

• Support also for 2D and 3D objects
• Possible to fill a TEfficiency directly

    ef.Fill(true, x);    // for the events passing a selection
    ef.Fill(false, x);   // for the events failing the selection


Fitting TEfficiency

• TEfficiency::Fit: binned maximum likelihood fit using a binomial probability for each bin, using the class TBinomialEfficiencyFitter

$$\max L(k_i \mid N_i, p_i) = \prod_i \frac{N_i!}{(N_i - k_i)!\,k_i!}\, f_i^{k_i} (1 - f_i)^{N_i - k_i} \quad \text{with } f_i = f(x_i, \vec{p})$$

• A least-square (χ²) fit is not statistically correct for ε ≃ 0 or ε ≃ 1
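The quantity minimized in such a fit can be sketched as the negative logarithm of the binomial likelihood, summed over bins. The function below is a standard C++ illustration under the assumption that the model efficiency has already been evaluated in each bin; it is not the ROOT fitter code, and `binomialNLL` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Negative log-likelihood of the binned binomial model, without the constant
// binomial-coefficient term (it does not depend on the fit parameters).
// k[i]: passed entries, n[i]: total entries, f[i]: model efficiency in bin i.
// Hypothetical helper for illustration; not part of ROOT.
double binomialNLL(const std::vector<int>& k,
                   const std::vector<int>& n,
                   const std::vector<double>& f)
{
    assert(k.size() == n.size() && k.size() == f.size());
    double nll = 0.0;
    for (std::size_t i = 0; i < k.size(); ++i) {
        assert(k[i] >= 0 && k[i] <= n[i]);
        assert(f[i] > 0.0 && f[i] < 1.0);  // keep both logarithms finite
        nll -= k[i] * std::log(f[i]) + (n[i] - k[i]) * std::log(1.0 - f[i]);
    }
    return nll;
}
```

For a single bin with 3 passed out of 10, the function is smallest at f = 0.3, the observed fraction, as expected for a maximum likelihood estimate.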


TEfficiency Combinations

• Possible to combine and merge different TEfficiency objects
    • supports combinations from a list of objects with different weights
    • e.g. combination of efficiencies originating from different processes
• Uses Bayesian statistics for the combination
    • supports generalization for weighted events
    • e.g. in the combination of different MC samples

$$P_{comb}(\varepsilon \mid w_i, k_i, N_i) \propto \prod_i L(k_i \mid N_i, \varepsilon)^{w_i}\, \Pi(\varepsilon)$$

• $L(k_i \mid N_i, \varepsilon)$: the likelihood function
• $\Pi(\varepsilon) = B(\varepsilon, \alpha, \beta)$: prior (beta distribution)
• $w_i$: weights renormalized to $w'_i = w_i \frac{\sum_i w_i}{\sum_i w_i^2}$

The combined posterior is then:

$$P_{comb}(\varepsilon \mid w_i, k_i, N_i) = B\Big(\varepsilon,\ \sum_i w_i k_i + \alpha,\ \sum_i w_i (N_i - k_i) + \beta\Big)$$
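The combination reduces to bookkeeping on the two Beta parameters, which the following standard C++ sketch makes explicit. It follows the formulas on this slide (weights renormalized by the sum of weights over the sum of squared weights) and is not the TEfficiency code; `EffSample`, `BetaParams`, `combinedPosterior` and `betaMean` are hypothetical names, and the default prior is assumed uniform, Beta(1,1).

```cpp
#include <cassert>
#include <vector>

struct EffSample { double weight; double passed; double total; };
struct BetaParams { double a; double b; };

// Parameters of the combined Beta posterior for weighted samples,
// starting from a Beta(alpha, beta) prior.
// Hypothetical helper for illustration; not part of ROOT.
BetaParams combinedPosterior(const std::vector<EffSample>& samples,
                             double alpha = 1.0, double beta = 1.0)
{
    assert(!samples.empty());
    double sumW = 0.0, sumW2 = 0.0;
    for (const EffSample& s : samples) {
        assert(s.weight > 0.0 && s.passed >= 0.0 && s.passed <= s.total);
        sumW += s.weight;
        sumW2 += s.weight * s.weight;
    }
    // Renormalization factor: w'_i = w_i * sum(w) / sum(w^2).
    const double norm = sumW / sumW2;

    BetaParams p{alpha, beta};
    for (const EffSample& s : samples) {
        const double w = s.weight * norm;
        p.a += w * s.passed;
        p.b += w * (s.total - s.passed);
    }
    return p;
}

// Mean of a Beta(a, b) distribution.
double betaMean(const BetaParams& p) { return p.a / (p.a + p.b); }
```

Two samples with 8/10 and 6/10 passed and equal weights give Beta(15, 7) with a uniform prior. Multiplying all weights by a common factor leaves the result unchanged, because the renormalization cancels it.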


Non Parametric Density Estimator

• Estimate of the underlying probability density function from the data
    • non-parametric: does not assume any model for the data, in contrast to parametric estimators, which require a data model
    • e.g. fitting is instead a parametric estimation
• A histogram is a non-parametric density estimator
    • the simplest and computationally most efficient
    • drawbacks: discontinuities and dependence on bin width and origin
• Kernel density estimation is an alternative method

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)$$

    • the bandwidth h is a smoothing parameter influencing both bias and variance of the estimator
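A kernel density estimate with a fixed bandwidth is a sum of kernels centred on the data points, divided by n h. The sketch below evaluates it with a Gaussian kernel in standard C++. It is a direct transcription of that definition, not the TKDE implementation, and the names `gaussianKernel` and `kdeEstimate` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Standard normal density, used as the kernel K(u).
double gaussianKernel(double u)
{
    const double kInvSqrt2Pi = 0.3989422804014327;  // 1 / sqrt(2 * pi)
    return kInvSqrt2Pi * std::exp(-0.5 * u * u);
}

// Fixed-bandwidth kernel density estimate at point x:
// f_h(x) = 1 / (n h) * sum_i K((x - x_i) / h).
// Hypothetical helper for illustration; not part of ROOT.
double kdeEstimate(const std::vector<double>& data, double h, double x)
{
    assert(!data.empty() && h > 0.0);
    double sum = 0.0;
    for (double xi : data) sum += gaussianKernel((x - xi) / h);
    return sum / (data.size() * h);
}
```

A smaller h follows the data more closely (lower bias, higher variance), while a larger h smooths more. Each evaluation costs O(n), which is why interpolated or binned variants are attractive for large samples.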


New TKDE Class

• Kernel density estimator classes exist in both RooFit and TMVA
    • no real kernel density estimator in core ROOT
    • TH1K class is based on nearest-neighbor (uniform kernel)
• New class TKDE will be available in 5.28 (in libHist)
    • support for various kernels: Gaussian (default), but also Epanechnikov, bi-weight and cosine-arch kernels
    • support for adaptive bandwidth (better for multi-modal distributions and for describing peaks and tails)
    • can provide either the full result or an interpolated one for fast evaluation
    • can support data binning for efficient bandwidth computation in the adaptive case
• Working on a multi-dimensional class using a kd-tree as data storage (TKDTree)
    • plan to use also Foam for optimal multi-dimensional binning


Examples of TKDE

Example: Gaussian, bi-Gaussian and log-normal

[Figure panels: Gaussian; Log-normal; Log-normal (log-scale); Bi-Gaussian]


Errors from TKDE

• Can also draw the error (confidence interval at the desired level)
    • and also bias and RMS (root mean square)


New GoF Test Class

• New class for goodness-of-fit tests: ROOT::Math::GoFTest in libMathCore
• 1-sample test
    • tests if data are compatible with a reference distribution
    • user-provided distributions or standard ones (normal, log-normal, etc.)
• 2-sample test
    • tests if two data sets are compatible
• Works on unbinned data sets
    • we already have the Pearson χ² test for binned data sets (histograms)
• Kolmogorov-Smirnov test
    • already existed in ROOT for the 2-sample case and for binned data
    • adds the 1-sample test
• Anderson-Darling test
    • much more sensitive to variations in the tails


Example of using GoFTest

1-sample test

    using namespace ROOT::Math;
    // create gof test class on data x[n]={....}
    GoFTest gof(n, x, GoFTest::kLogNormal);
    // set a user distribution object
    // which must implement operator()(x)
    gof.SetUserDistribution(user_dist);

    double pValueAD = gof.AndersonDarlingTest();
    double pValueKS = gof.KolmogorovSmirnovTest();

[Figure: data quantiles vs. theoretical quantiles]

2-sample test

    // create GoF test for data x1[n1] and x2[n2]
    GoFTest gof2(n1, x1, n2, x2);

    double pValueAD2 = gof2.AndersonDarling2SamplesTest();
    double pValueKS2 = gof2.KolmogorovSmirnov2SamplesTest();

[Figure: data 1 quantiles vs. data 2 quantiles]
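For orientation, the statistic behind the two-sample Kolmogorov-Smirnov test is the largest distance between the two empirical cumulative distributions. The standard C++ sketch below computes only that distance on unbinned data. It does not compute a p-value and is not the GoFTest implementation; `ksDistance` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Two-sample Kolmogorov-Smirnov distance: D = sup_x |F1(x) - F2(x)|,
// where F1 and F2 are the empirical CDFs of the two unbinned samples.
// Hypothetical helper for illustration; not part of ROOT.
double ksDistance(std::vector<double> a, std::vector<double> b)
{
    assert(!a.empty() && !b.empty());
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());

    std::size_t i = 0, j = 0;
    double d = 0.0;
    // Walk through the merged samples; after stepping past all values <= x,
    // i / a.size() and j / b.size() are the two empirical CDFs at x.
    while (i < a.size() && j < b.size()) {
        const double x = std::min(a[i], b[j]);
        while (i < a.size() && a[i] <= x) ++i;
        while (j < b.size() && b[j] <= x) ++j;
        const double diff = std::fabs(static_cast<double>(i) / a.size()
                                    - static_cast<double>(j) / b.size());
        d = std::max(d, diff);
    }
    return d;
}
```

Fully separated samples give D = 1 and identical samples give D = 0. The distance is dominated by the central region of the distributions, which is consistent with the slide's remark that the Anderson-Darling test is more sensitive to the tails.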


Conclusions

• Large collection of math and statistical tools available in ROOT
• Recent work on improving the overall quality
    • more tests; performance studied and improved whenever possible
    • fixed several issues found by a static code checker (Coverity)
• Improving modularity
    • common interfaces for functions and algorithms
• Improving usability (e.g. new classes like TFitResult)
• New classes useful for LHC data analysis will be available in 5.28
    • TEfficiency to compute and display binomial intervals
    • TKDE for kernel density estimation
    • ROOT::Math::GoFTest for goodness-of-fit tests
• Developing advanced tools for physics analysis
    • complex fitting (RooFit)
    • multivariate analysis (TMVA) (see poster 081)
    • new statistical framework (RooStats)
        • see separate presentation next Thursday in Event Processing session


Documentation

• Online reference documentation (most up to date)
    • class description with THtml (and also Doxygen)
    • see http://root.cern.ch/root/htmldoc/MATH_Index.html
    • see the TEfficiency doc as an example of a very well documented class
• Math library documentation on Drupal
    • see http://root.cern.ch/drupal/content/mathematical-libraries
    • documents most of the recent developments (numerical algorithms, fitting, etc.)
• ROOT User Guides
    • see http://root.cern.ch/root/doc/RootDoc.html
    • not yet updated with the latest developments
    • TMVA, RooFit and RooStats (in preparation) user guides
• ROOT Talk Forum (for support, requests and discussions)
    ✦ a thread is available for Math and Statistical topics only
    ✦ bugs should be reported to Savannah