1 Probability and Statistics What is probability? What is statistics?


Page 1: 1 Probability and Statistics  What is probability?  What is statistics?

Probability and Statistics

- What is probability?
- What is statistics?

Page 2: 1 Probability and Statistics  What is probability?  What is statistics?

Probability and Statistics

Probability
- Formally defined using a set of axioms
- Seeks to determine the likelihood that a given event, observation, or measurement will happen or has happened
  - What is the probability of throwing a 7 using two dice?

Statistics
- Used to analyze the frequency of past events
- Uses a given sample of data to assess a probabilistic model’s validity or determine values of its parameters
  - After observing several throws of two dice, can I determine whether or not they are loaded?
- Also depends on what we mean by probability
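The two-dice question can be answered by counting outcomes. The sketch below assumes two fair six-sided dice; the function name `prob_of_sum` is only illustrative.

```python
from fractions import Fraction
from itertools import product

def prob_of_sum(target, sides=6):
    """Exact probability that two fair dice sum to `target`."""
    outcomes = list(product(range(1, sides + 1), repeat=2))
    favorable = sum(1 for a, b in outcomes if a + b == target)
    return Fraction(favorable, len(outcomes))

print(prob_of_sum(7))  # 1/6
```

Six of the 36 equally likely outcomes give a 7, so the probability is 1/6.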

Page 3: 1 Probability and Statistics  What is probability?  What is statistics?

Probability and Statistics

We perform an experiment to collect a number of top quarks
- How do we extract the best value for its mass?
- What is the uncertainty of our best value?
- Is our experiment internally consistent?
- Is this value consistent with a given theory, which itself may contain uncertainties?
- Is this value consistent with other measurements of the top quark mass?

Page 4: 1 Probability and Statistics  What is probability?  What is statistics?

Probability and Statistics

CDF “discovery” announced 4/11/2011

Page 5: 1 Probability and Statistics  What is probability?  What is statistics?


Probability and Statistics

Page 6: 1 Probability and Statistics  What is probability?  What is statistics?

Probability and Statistics

Pentaquark search: how can this occur?
- 2003: 6.8σ effect
- 2005: no effect

Page 7: 1 Probability and Statistics  What is probability?  What is statistics?

Probability

- Let the sample space S be the space of all possible outcomes of an experiment
- Let x be a possible outcome
  - Then P(x found in [x, x+dx]) = f(x) dx
  - f(x) is called the probability density function (pdf)
  - It may be called f(x; θ) since the pdf could depend on one or more parameters θ
  - Often we will want to determine θ from a set of measurements
- Of course x must be somewhere, so

$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$

Page 8: 1 Probability and Statistics  What is probability?  What is statistics?

Probability

Definitions of mean and variance are given in terms of expectation values

$$E[x] = \int_{-\infty}^{\infty} x f(x)\,dx = \mu$$

$$V[x] = E[(x - E[x])^2] = E[x^2] - (E[x])^2 = \sigma^2$$

Page 9: 1 Probability and Statistics  What is probability?  What is statistics?

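Probability

Definitions of covariance and correlation coefficient

$$\mathrm{cov}[x,y] = V_{xy} = E[(x-\mu_x)(y-\mu_y)] = E[xy] - \mu_x\mu_y$$

$$E[xy] = \iint xy\, f(x,y)\,dx\,dy$$

$$\rho_{xy} = \frac{\mathrm{cov}[x,y]}{\sigma_x\sigma_y}$$

If x, y are independent then $f(x,y) = f_x(x)\,f_y(y)$, then $E[xy] = \mu_x\mu_y$ and so $\mathrm{cov}[x,y] = 0$.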

Page 10: 1 Probability and Statistics  What is probability?  What is statistics?

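Probability

Error propagation

Consider $y(\vec{x})$ with variables $\vec{x} = (x_1, \ldots, x_n)$, $E[x_i] = \mu_i$ and $V_{ij} = \mathrm{cov}[x_i, x_j]$; then, expanding in a Taylor series (TS) about $\vec{\mu}$,

$$y(\vec{x}) \approx y(\vec{\mu}) + \sum_{i=1}^{n} \left[\frac{\partial y}{\partial x_i}\right]_{\vec{x}=\vec{\mu}} (x_i - \mu_i)$$

we find

$$E[y(\vec{x})] \approx y(\vec{\mu})$$

$$E[y^2(\vec{x})] \approx y^2(\vec{\mu}) + \sum_{i,j=1}^{n} \left[\frac{\partial y}{\partial x_i}\frac{\partial y}{\partial x_j}\right]_{\vec{x}=\vec{\mu}} V_{ij}$$

$$\sigma_y^2 \approx \sum_{i,j=1}^{n} \left[\frac{\partial y}{\partial x_i}\frac{\partial y}{\partial x_j}\right]_{\vec{x}=\vec{\mu}} V_{ij}$$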

Page 11: 1 Probability and Statistics  What is probability?  What is statistics?

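Probability

This gives the familiar error propagation formulas for sums (or differences) and products (or ratios)

Using $\sigma_y^2 = V[y] = E[y^2] - (E[y])^2$ we find, for $y = x_1 + x_2$,

$$\sigma_y^2 = \sigma_1^2 + \sigma_2^2 + 2\,\mathrm{cov}[x_1,x_2]$$

and for $y = x_1 x_2$

$$\frac{\sigma_y^2}{y^2} = \frac{\sigma_1^2}{x_1^2} + \frac{\sigma_2^2}{x_2^2} + \frac{2\,\mathrm{cov}[x_1,x_2]}{x_1 x_2}$$

For a difference or a ratio the covariance term enters with a minus sign.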
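As a rough numerical illustration of the sum and product formulas, here is a minimal sketch. The function names and the input values are hypothetical, and the product formula is only the first-order approximation.

```python
import math

def sigma_sum(s1, s2, cov=0.0):
    """Standard deviation of y = x1 + x2."""
    return math.sqrt(s1**2 + s2**2 + 2.0 * cov)

def sigma_product(x1, s1, x2, s2, cov=0.0):
    """Standard deviation of y = x1 * x2 (first-order approximation)."""
    y = x1 * x2
    rel_var = (s1 / x1)**2 + (s2 / x2)**2 + 2.0 * cov / (x1 * x2)
    return abs(y) * math.sqrt(rel_var)

# Uncorrelated inputs (cov = 0): uncertainties add in quadrature.
print(sigma_sum(3.0, 4.0))                  # 5.0
print(sigma_product(10.0, 1.0, 20.0, 2.0))  # 200 * sqrt(0.02)
```

With zero covariance the absolute errors add in quadrature for a sum, and the relative errors add in quadrature for a product.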

Page 12: 1 Probability and Statistics  What is probability?  What is statistics?

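Uniform Distribution

Let

$$f(x;\alpha,\beta) = \begin{cases} \dfrac{1}{\beta-\alpha} & \text{for } \alpha \le x \le \beta \\ 0 & \text{otherwise} \end{cases}$$

$$E[x] = \int_{\alpha}^{\beta} x f(x)\,dx = \int_{\alpha}^{\beta} \frac{x}{\beta-\alpha}\,dx = \frac{1}{2}(\alpha+\beta)$$

$$V[x] = E[x^2] - (E[x])^2 = \frac{1}{12}(\beta-\alpha)^2$$

What is the position resolution of a silicon or multiwire proportional chamber with detection elements of spacing Δx? From the variance above, $\sigma = \Delta x/\sqrt{12}$.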
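A small sketch of the uniform-distribution result applied to a detector with evenly spaced elements. The 50 µm pitch is a hypothetical value chosen for illustration, not a number from these slides.

```python
import math

def uniform_mean_var(alpha, beta):
    """Mean and variance of the uniform pdf on [alpha, beta]."""
    mean = 0.5 * (alpha + beta)
    var = (beta - alpha)**2 / 12.0
    return mean, var

# Hypothetical strip pitch in micrometres (an assumption for this example).
pitch_um = 50.0
_, var = uniform_mean_var(-pitch_um / 2, pitch_um / 2)
resolution_um = math.sqrt(var)  # equals pitch / sqrt(12)
print(round(resolution_um, 2))
```

The resolution is the pitch divided by √12, roughly 0.29 times the element spacing.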

Page 13: 1 Probability and Statistics  What is probability?  What is statistics?

Binomial Distribution

- Consider N independent experiments (Bernoulli trials)
- Let the outcome of each be pass or fail
- Let the probability of pass = p

Probability of n successes (in one particular order)

$$p^n (1-p)^{N-n}$$

But there are

$$\frac{N!}{n!\,(N-n)!}$$

permutations for N distinguishable objects, grouping them n at a time

$$f(n;N,p) = \frac{N!}{n!\,(N-n)!}\, p^n (1-p)^{N-n}$$

Page 14: 1 Probability and Statistics  What is probability?  What is statistics?

Permutations

Quick review
- Number of permutations for N elements is N!
- This considers each element distinguishable
- But the n elements of the first type are indistinguishable, so n! of the permutations lead to the same situation
- Ditto for the remaining N-n elements of the second type
- Thus accounting for these irrelevant permutations leads to the number of unique permutations

$$\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$$

Page 15: 1 Probability and Statistics  What is probability?  What is statistics?

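Binomial Distribution

For the mean and variance we obtain (using small tricks)

$$E[n] = \sum_{n=0}^{N} n\, f(n;N,p) = Np$$

$$V[n] = E[n^2] - (E[n])^2 = Np(1-p)$$

And note with the binomial theorem that

$$\sum_{n=0}^{N} f(n;N,p) = \sum_{n=0}^{N} \binom{N}{n} p^n (1-p)^{N-n} = (p + 1 - p)^N = 1$$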

Page 16: 1 Probability and Statistics  What is probability?  What is statistics?

Binomial Distribution

[Figure: binomial pdf]

Page 17: 1 Probability and Statistics  What is probability?  What is statistics?

Binomial Distribution

Examples
- Coin flip (p = 1/2)
- Dice throw (p = 1/6)
- Branching ratio of nuclear and particle decays (p = Br)
- Detector or trigger efficiencies (pass or not pass)
- Blood group B or not blood group B

Page 18: 1 Probability and Statistics  What is probability?  What is statistics?

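Binomial Distribution

It’s baseball season! What is the probability of a 0.300 hitter getting 4 hits in one game?

$$N = 4, \quad p = 0.3$$

$$f(4;4,0.3) = \frac{4!}{4!\,0!}\,(0.3)^4 (0.7)^0 = 0.0081$$

$$E[n] = 4 \times 0.3 = 1.2$$

$$V[n] = 4 \times 0.3 \times 0.7 = 0.84$$

Expect $1.2 \pm \sqrt{0.84} \approx 1.2 \pm 0.92$ hits for a 0.300 hitter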
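The baseball numbers can be checked directly from the binomial pdf. This is a minimal sketch using only the standard library; `binomial_pmf` is an illustrative name.

```python
import math

def binomial_pmf(n, N, p):
    """Binomial probability of n successes in N trials."""
    return math.comb(N, n) * p**n * (1 - p)**(N - n)

N, p = 4, 0.3
print(binomial_pmf(4, N, p))  # about 0.0081

# Mean and variance by direct summation over n = 0..N.
mean = sum(n * binomial_pmf(n, N, p) for n in range(N + 1))
var = sum((n - mean)**2 * binomial_pmf(n, N, p) for n in range(N + 1))
print(mean, var)  # close to Np = 1.2 and Np(1-p) = 0.84
```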

Page 19: 1 Probability and Statistics  What is probability?  What is statistics?

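Poisson Distribution

Consider when

$$N \to \infty, \quad p \to 0, \quad E[n] = Np = \nu$$

The Poisson pdf is

$$f(n;\nu) = \frac{\nu^n}{n!}\, e^{-\nu}$$

and one finds

$$E[n] = \nu$$

$$V[n] = \sigma^2 = \nu$$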

Page 20: 1 Probability and Statistics  What is probability?  What is statistics?

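Poisson Distribution

$$f(n;N,p) = \frac{N!}{n!\,(N-n)!}\, p^n (1-p)^{N-n}$$

$$\frac{N!}{(N-n)!} \approx N^n \quad \text{for large } N$$

$$(1-p)^{N-n} = 1 - p(N-n) + \frac{p^2 (N-n)(N-n-1)}{2!} - \ldots \approx 1 - Np + \frac{(Np)^2}{2!} - \ldots = e^{-Np}$$

$$f(n;N,p) \to \frac{N^n}{n!}\, p^n e^{-Np} = \frac{\nu^n e^{-\nu}}{n!} \quad \text{for large } N \text{ and small } p$$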
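The limit can also be seen numerically. The sketch below holds ν = Np fixed at 2 (an arbitrary choice for illustration), increases N, and reports the largest difference between the binomial and Poisson probabilities for n = 0..10.

```python
import math

def binomial_pmf(n, N, p):
    return math.comb(N, n) * p**n * (1 - p)**(N - n)

def poisson_pmf(n, nu):
    return nu**n * math.exp(-nu) / math.factorial(n)

def max_diff(N, nu, n_max=10):
    """Largest |binomial - Poisson| over n = 0..n_max with p = nu / N."""
    p = nu / N
    return max(abs(binomial_pmf(n, N, p) - poisson_pmf(n, nu))
               for n in range(n_max + 1))

for N in (10, 100, 10000):
    print(N, max_diff(N, 2.0))
```

The difference shrinks as N grows and p falls, as the derivation suggests.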

Page 21: 1 Probability and Statistics  What is probability?  What is statistics?

Poisson Distribution

[Figure: Poisson pdf]

Page 22: 1 Probability and Statistics  What is probability?  What is statistics?

Poisson Distribution

Examples
- Particles detected from radioactive decays
  - Sum of two Poisson processes is a Poisson process
- Particles detected from scattering of a beam on target with cross section σ
- Cosmic rays observed in a time interval t
- Number of entries in a histogram bin when data is accumulated over a fixed time interval
- Number of Prussian soldiers kicked to death by horses
- Infant mortality
- QC/failure rate predictions

Page 23: 1 Probability and Statistics  What is probability?  What is statistics?

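Poisson Distribution

Let $N \sim 10^{20}$ atoms

Let $p = 1/\tau \sim 1/(10^{12}\,\mathrm{y}) \sim 2 \times 10^{-20}/\mathrm{s}$

Then $\nu = Np = 2/\mathrm{s}$ and

$$P(0;2) = \frac{2^0 e^{-2}}{0!} = 0.135$$

$$P(1;2) = \frac{2^1 e^{-2}}{1!} = 0.271$$

$$P(2;2) = \frac{2^2 e^{-2}}{2!} = 0.271$$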
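The decay-counting example can be reproduced in a few lines. The sketch takes the slide's values of N and p as given; the variable names are illustrative.

```python
import math

N_atoms = 1e20   # number of atoms (value from the slide)
p_per_s = 2e-20  # decay probability per atom per second (value from the slide)
nu = N_atoms * p_per_s  # expected number of decays per second

def poisson_pmf(n, nu):
    """Poisson probability of observing n events when nu are expected."""
    return nu**n * math.exp(-nu) / math.factorial(n)

for n in range(3):
    print(n, round(poisson_pmf(n, nu), 3))
```

The probabilities of observing 0, 1, or 2 decays in one second come out as 0.135, 0.271, and 0.271.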

Page 24: 1 Probability and Statistics  What is probability?  What is statistics?

Gaussian Distribution

Gaussian distribution
- Important because of the central limit theorem
  - For N independent variables x1, x2, …, xN that are distributed according to any pdf with finite variance, the sum y = ∑xi will have a pdf that approaches a Gaussian for large N

$$E[y] = \sum_i \mu_i, \qquad V[y] = \sum_i \sigma_i^2$$

- Examples are almost any measurement error (energy resolution, position resolution, …)

Page 25: 1 Probability and Statistics  What is probability?  What is statistics?

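Gaussian Distribution

The familiar Gaussian pdf is

$$f(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

$$E[x] = \mu$$

$$V[x] = \sigma^2$$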

Page 26: 1 Probability and Statistics  What is probability?  What is statistics?

Gaussian Distribution

Some useful properties of the Gaussian distribution are
- P(x in range μ ± σ) = 0.683
- P(x in range μ ± 2σ) = 0.9545
- P(x in range μ ± 3σ) = 0.9973
- P(x outside range μ ± 3σ) = 0.0027
- P(x outside range μ ± 5σ) = $5.7 \times 10^{-7}$
- P(x in range μ ± 0.6745σ) = 0.5
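These coverage probabilities follow from the error function. A minimal sketch using the standard library (`prob_within` and `prob_outside` are illustrative names):

```python
import math

def prob_within(k):
    """P(|x - mu| < k * sigma) for a Gaussian."""
    return math.erf(k / math.sqrt(2.0))

def prob_outside(k):
    """P(|x - mu| > k * sigma); erfc avoids cancellation for large k."""
    return math.erfc(k / math.sqrt(2.0))

for k in (1, 2, 3, 0.6745):
    print(k, round(prob_within(k), 4))
print(5, prob_outside(5))
```

Using `erfc` for the tails matters at 5σ, where subtracting a number very close to 1 would lose precision.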

Page 27: 1 Probability and Statistics  What is probability?  What is statistics?

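χ² Distribution

Chi-square distribution

$$f(z;n) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, z^{n/2-1} e^{-z/2}, \quad z \ge 0$$

where n = 1, 2, ... is the number of degrees of freedom

$$E[z] = n$$

$$V[z] = 2n$$

The usefulness of this pdf is that for n independent Gaussian $x_i$ with means $\mu_i$ and variances $\sigma_i^2$,

$$z = \sum_{i=1}^{n} \frac{(x_i-\mu_i)^2}{\sigma_i^2}$$

follows the χ² distribution with n d.o.f.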
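The statement about sums of squared, normalized Gaussian deviations can be checked with a small simulation. This is only a sketch: the means, widths, sample size, and seed below are arbitrary choices for illustration.

```python
import random

def chi2_samples(mus, sigmas, n_samples=20000, seed=1):
    """Draw z = sum_i ((x_i - mu_i) / sigma_i)^2 for Gaussian x_i."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        z = 0.0
        for mu, sigma in zip(mus, sigmas):
            x = rng.gauss(mu, sigma)
            z += ((x - mu) / sigma) ** 2
        samples.append(z)
    return samples

# Five Gaussian variables with different (hypothetical) means and widths.
mus = [0.0, 1.0, -2.0, 5.0, 10.0]
sigmas = [1.0, 0.5, 2.0, 3.0, 0.1]
zs = chi2_samples(mus, sigmas)
mean = sum(zs) / len(zs)
var = sum((z - mean) ** 2 for z in zs) / (len(zs) - 1)
print(mean, var)  # should be close to n = 5 and 2n = 10
```

The sample mean and variance land near n and 2n regardless of the individual means and widths, since each term is normalized.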

Page 28: 1 Probability and Statistics  What is probability?  What is statistics?

χ² Distribution

Page 29: 1 Probability and Statistics  What is probability?  What is statistics?


Probability

Page 30: 1 Probability and Statistics  What is probability?  What is statistics?

Probability

Probability can be defined in terms of Kolmogorov axioms
- The probability is a real-valued function defined on subsets A, B, … in sample space S

$$\text{For every subset } A \text{ in } S,\ P(A) \ge 0$$

$$\text{If } A \cap B = \emptyset,\ P(A \cup B) = P(A) + P(B)$$

$$P(S) = 1$$

- This means the probability is a measure in which the measure of the entire sample space is 1

Page 31: 1 Probability and Statistics  What is probability?  What is statistics?

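Probability

We further define the conditional probability P(A|B), read “the probability of A given B”

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Bayes’ theorem

Using $P(A \cap B) = P(B \cap A)$

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$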

Page 32: 1 Probability and Statistics  What is probability?  What is statistics?

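Probability

For disjoint $A_i$ that together cover the sample space

$$P(B) = \sum_i P(B \mid A_i)\,P(A_i)$$

then

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{\sum_i P(B \mid A_i)\,P(A_i)}$$

Usually one treats the $A_i$ as outcomes of a repeatable experiment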

Page 33: 1 Probability and Statistics  What is probability?  What is statistics?

Probability

- Usually one treats the $A_i$ as outcomes of a repeatable experiment
  - Then P(A) is usually assigned a value equal to the limiting frequency of occurrence of A

$$P(A) = \lim_{n \to \infty} \frac{\text{number of occurrences of } A}{n}$$

  - Called frequentist statistics
- But $A_i$ could also be interpreted as hypotheses, each of which is true or false
  - Then P(A) represents the degree of belief that hypothesis A is true
  - Called Bayesian statistics

Page 34: 1 Probability and Statistics  What is probability?  What is statistics?

Bayes’ Theorem

Suppose in the general population
- P(disease) = 0.001
- P(no disease) = 0.999

Suppose there is a test to check for the disease
- P(+ | disease) = 0.98
- P(- | disease) = 0.02

But also
- P(+ | no disease) = 0.03
- P(- | no disease) = 0.97

You are tested for the disease and it comes back +. Should you be worried?

Page 35: 1 Probability and Statistics  What is probability?  What is statistics?

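Bayes’ Theorem

Apply Bayes’ theorem

$$P(\text{disease} \mid +) = \frac{P(+ \mid \text{disease})\,P(\text{disease})}{P(+ \mid \text{disease})\,P(\text{disease}) + P(+ \mid \text{no disease})\,P(\text{no disease})}$$

$$P(\text{disease} \mid +) = \frac{0.98 \times 0.001}{0.98 \times 0.001 + 0.03 \times 0.999} = 0.032$$

- 3.2% of people testing positive have the disease
- Your degree of belief about having the disease is 3.2%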
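The same calculation as a short function, for a two-hypothesis case (true or false). The function and argument names are illustrative.

```python
def posterior(prior, p_pos_given_true, p_pos_given_false):
    """P(hypothesis | positive test) from Bayes' theorem."""
    num = p_pos_given_true * prior
    den = num + p_pos_given_false * (1.0 - prior)
    return num / den

# Disease example: prior 0.001, P(+|disease) = 0.98, P(+|no disease) = 0.03
print(round(posterior(0.001, 0.98, 0.03), 3))  # 0.032
```

The small prior dominates: even with a 98% detection rate, most positive results come from the much larger healthy population.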

Page 36: 1 Probability and Statistics  What is probability?  What is statistics?

Bayes’ Theorem

Is athlete A guilty of drug doping?

Assume a population of athletes in this sport
- P(drug) = 0.005
- P(no drug) = 0.995

Suppose there is a test to check for the drug
- P(+ | drug) = 0.99
- P(- | drug) = 0.01

But also
- P(+ | no drug) = 0.004
- P(- | no drug) = 0.996

The athlete tests positive. Is he/she involved in drug doping?

Page 37: 1 Probability and Statistics  What is probability?  What is statistics?

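Bayes’ Theorem

Apply Bayes’ theorem

$$P(\text{drug} \mid +) = \frac{P(+ \mid \text{drug})\,P(\text{drug})}{P(+ \mid \text{drug})\,P(\text{drug}) + P(+ \mid \text{no drug})\,P(\text{no drug})} = \frac{0.99 \times 0.005}{0.99 \times 0.005 + 0.004 \times 0.995} = 0.55$$

and

$$P(\text{no drug} \mid +) = \frac{P(+ \mid \text{no drug})\,P(\text{no drug})}{P(+ \mid \text{no drug})\,P(\text{no drug}) + P(+ \mid \text{drug})\,P(\text{drug})} = \frac{0.004 \times 0.995}{0.004 \times 0.995 + 0.99 \times 0.005} = 0.45$$

???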

Page 38: 1 Probability and Statistics  What is probability?  What is statistics?

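Binomial Distribution

Calculating efficiencies
- Usually use ε instead of p

$$\varepsilon = \frac{E[n]}{N}, \qquad \varepsilon' = \frac{n}{N}$$

$$V[n] = \sigma_n^2 = N\varepsilon(1-\varepsilon)$$

We don’t know the true ε but we can use our estimate ε′ = n/N

$$\sigma_{\varepsilon'} = \frac{\sigma_n}{N} = \frac{1}{N}\sqrt{N\varepsilon'(1-\varepsilon')} = \frac{1}{N}\sqrt{n\left(1-\frac{n}{N}\right)}$$

This is the error on the efficiency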

Page 39: 1 Probability and Statistics  What is probability?  What is statistics?

Binomial Distribution

But there is a problem
- If n = 0, σ(ε′) = 0
- If n = N, σ(ε′) = 0

Actually we went wrong in assuming the best estimate for ε is n/N
- We should really have used the most probable value of ε given n and N

A proper treatment uses Bayes’ theorem but lucky for us (in HEP) the solution is implemented in ROOT

    h_num->Sumw2();
    h_den->Sumw2();
    h_eff->Divide(h_num, h_den, 1.0, 1.0, "B");

Note that ROOT’s TH1::Divide documentation describes option "B" as computing binomial errors; the TEfficiency class offers Bayesian intervals.
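To see how a Bayesian treatment avoids the zero-error problem, here is a minimal sketch that assumes a flat prior on the efficiency, so the posterior is a Beta(n+1, N-n+1) distribution. The flat prior and the function names are assumptions for illustration; this is not necessarily what ROOT computes.

```python
import math

def binomial_eff(n, N):
    """Efficiency estimate n/N with the simple binomial error."""
    eff = n / N
    return eff, math.sqrt(eff * (1.0 - eff) / N)

def bayes_eff_flat_prior(n, N):
    """Posterior mode, mean and standard deviation for a flat prior.

    With a uniform prior on the efficiency the posterior is
    Beta(n + 1, N - n + 1) (an assumption of this sketch).
    """
    mode = n / N
    mean = (n + 1) / (N + 2)
    var = (n + 1) * (N - n + 1) / ((N + 2) ** 2 * (N + 3))
    return mode, mean, math.sqrt(var)

for n in (0, 5, 10):
    print(n, binomial_eff(n, 10), bayes_eff_flat_prior(n, 10))
```

At n = 0 and n = N the binomial error vanishes, while the posterior width stays finite.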