Statistical foundations of machine learning
INFO-F-422
Gianluca Bontempi
Département d’Informatique
Boulevard de Triomphe - CP 212
http://www.ulb.ac.be/di


Testing hypothesis

• Hypothesis testing is the second major area of statistical inference.
• A statistical hypothesis is an assertion or conjecture about the distribution of one or more random variables.
• A test of a statistical hypothesis is a rule or procedure for deciding whether to reject the assertion on the basis of the observed data.
• The basic idea is to formulate some statistical hypothesis and look to see whether the data provide any evidence to reject it.


A hypothesis testing problem

• Consider the model of the traffic in the boulevard.
• Suppose that the measured inter-arrival times are $D_N = \{10, 11, 1, 21, 2, \dots\}$ seconds.
• Can we say that the mean inter-arrival time $\theta$ is different from 10?
• Consider the grades of two different school sections.
• Section A had {15, 10, 12, 19, 5, 7}.
• Section B had {14, 11, 11, 12, 6, 7}.
• Can we say that Section A had better grades than Section B?
• Consider two protein-coding genes and their expression levels in a cell. Are the two genes differentially expressed?

A statistical test is a procedure that aims to answer such questions.


Types of hypothesis

We start by declaring the working (basic, null) hypothesis $H$ to be tested, in the form $\theta = \theta_0$ or $\theta \in \omega \subset \Theta$, where $\theta_0$ or $\omega$ are given. The hypothesis can be

Simple. It fully specifies the distribution of $z$.

Composite. It only partially specifies the distribution of $z$.

Example: if $D_N$ constitutes a random sample of size $N$ from $\mathcal{N}(\mu, \sigma^2)$, the hypothesis $H: \mu = \mu_0, \sigma = \sigma_0$ (with $\mu_0$ and $\sigma_0$ known values) is simple, while the hypothesis $H: \mu = \mu_0$ is composite since it leaves open the value of $\sigma$ in $(0, \infty)$.


Types of statistical test

Suppose we have collected $N$ samples $D_N = \{z_1, \dots, z_N\}$ from a distribution $F_z$ and we have declared a null hypothesis $H$ about $F$. The three most common types of statistical test are:

Pure significance test: the data $D_N$ are used to assess the inferential evidence against $H$.

Significance test: the inferential evidence against $H$ is used to judge whether $H$ is inappropriate. In other words, it is a rule for rejecting $H$.

Hypothesis test: the data $D_N$ are used to assess the hypothesis $H$ against a specific alternative hypothesis $\bar{H}$. In other words, it is a rule for rejecting $H$ in favour of $\bar{H}$.


Pure significance test

• Suppose that the null hypothesis $H$ is simple.
• Let $t(D_N)$ be a statistic such that the larger its value, the more it casts doubt on $H$.
• The quantity $t(D_N)$ is called the test statistic or discrepancy measure.
• Let $t_N = t(D_N)$ be the value of $t$ calculated on the basis of the sample data $D_N$.
• Let us consider the p-value
  $$p = \mathrm{Prob}\{t(D_N) > t_N \mid H\}$$
• If $p$ is small, the sample data $D_N$ are highly inconsistent with $H$, and $p$ (the significance probability or significance level) is the measure of such inconsistency.


Some considerations

• $p$ is the proportion of situations under the hypothesis $H$ in which we would observe a degree of inconsistency at least as large as that represented by $t_N$.
• $t_N$ is the observed value of the statistic for a given $D_N$. Different $D_N$ yield different values of $p \in (0, 1)$.
• It is essential that the distribution of $t(D_N)$ under $H$ is known.
• We cannot say that $p$ is the probability that $H$ is true. Rather, $p$ is the probability, given that $H$ is true, of observing a dataset at least as inconsistent with $H$ as $D_N$.
• Open issues:
  1. What if $H$ is composite?
  2. How to choose $t(D_N)$?
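
As an illustration, the p-value can be computed directly once the null distribution of the statistic is known. The sketch below assumes, for illustration only, the statistic $t(D_N) = |\hat{\mu} - \mu_0|$ for Gaussian observations with known $\sigma$. The function name `p_value_abs_mean` and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def p_value_abs_mean(data, mu0, sigma):
    """p = Prob{t(D_N) > t_N | H} for t(D_N) = |sample mean - mu0|, sigma known."""
    n = len(data)
    t_n = abs(mean(data) - mu0)        # observed discrepancy t_N
    z = t_n / (sigma / sqrt(n))        # standardized value under H
    # Under H the standardized mean is N(0, 1); both tails are at least as extreme.
    return 2 * (1 - NormalDist().cdf(z))


# Hypothetical observations, for illustration only
data = [9.8, 10.4, 10.1, 9.9, 10.3]
p = p_value_abs_mean(data, mu0=10.0, sigma=0.5)
print(round(p, 3))
```

A small value of `p` would signal inconsistency with $H$. Here the sample mean is close to `mu0`, so `p` is large.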


Tests of significance

• Suppose that the value $p$ is known. If $p$ is small, either a rare event has occurred or perhaps $H$ is not true.
• Idea: if $p$ is less than some stated value $\alpha$, we reject $H$.
• We choose a critical level $\alpha$, we observe $D_N$ and we reject $H$ at level $\alpha$ if
  $$\mathrm{Prob}\{t(D_N) > t_N \mid H\} \le \alpha$$
• This is equivalent to choosing some critical value $t_\alpha$ and rejecting $H$ if $t_N > t_\alpha$.
• We obtain two regions in the space of sample data:
  critical region $S_0$: if $D_N \in S_0$ we reject $H$.
  non-critical region $S_1$: the sample data $D_N$ give us no reason to reject $H$ on the basis of the level-$\alpha$ test.


Some considerations

• The principle is that we will accept $H$ unless we witness some event that has a sufficiently small probability of arising when $H$ is true.
• If $H$ were true, we could still obtain data in $S_0$ and consequently wrongly reject $H$ with probability
  $$\mathrm{Prob}\{D_N \in S_0 \mid H\} = \mathrm{Prob}\{t(D_N) > t_\alpha \mid H\} = \alpha$$
• The significance level $\alpha$ provides an upper bound on the probability of incorrectly rejecting $H$.
• The p-value is the probability, under $H$, that the test statistic is more extreme than its observed value. The p-value changes with the observed data (i.e. it is a random variable), while $\alpha$ is a level fixed by the user.


Standard normal distribution

[Figure: normal distribution function (µ=0, σ=1) and normal density function (µ=0, σ=1), plotted for values between −5 and 5.]

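Remember that $z_{0.05} \approx 1.645$. This means that, if $z \sim \mathcal{N}(0, 1)$, then $\mathrm{Prob}\{z \ge z_{0.05}\} = 0.05$ and also that
$$\mathrm{Prob}\{|z| \ge z_{0.05}\} = 2 \cdot 0.05 = 0.1$$
For a generic $z \sim \mathcal{N}(\mu, \sigma^2)$
$$\mathrm{Prob}\{|z - \mu| \ge \sigma z_{0.05}\} = 2 \cdot 0.05 = 0.1$$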
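
These quantile relations can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`. The values `mu = 3` and `sigma = 2` for the generic Gaussian are arbitrary illustrative choices.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Upper 5% point: the value z_0.05 such that Prob{z >= z_0.05} = 0.05
z_05 = std_normal.inv_cdf(1 - 0.05)

# Two-sided tail probability Prob{|z| >= z_0.05}
two_sided = 2 * (1 - std_normal.cdf(z_05))

# Same statement for a generic N(mu, sigma^2): Prob{|z - mu| >= sigma * z_0.05}
mu, sigma = 3.0, 2.0               # arbitrary illustrative values
generic = NormalDist(mu, sigma)
two_sided_generic = generic.cdf(mu - sigma * z_05) + 1 - generic.cdf(mu + sigma * z_05)

print(round(z_05, 3), round(two_sided, 3), round(two_sided_generic, 3))
```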


TP: example

• Let $D_N$ consist of $N$ independent observations of $x \sim \mathcal{N}(\mu, \sigma^2)$, with known variance $\sigma^2$.
• We want to test the hypothesis $H: \mu = \mu_0$ with $\mu_0$ known.
• Consider as test statistic $t(D_N)$ the quantity $|\hat{\mu} - \mu_0|$, where $\hat{\mu}$ is the sample average estimator. If $H$ is true we know that $\hat{\mu} \sim \mathcal{N}(\mu_0, \sigma^2/N)$.
• Let us calculate the value $t(D_N) = |\hat{\mu} - \mu_0|$ and assume that the rejection region is $S_0 = \{D_N : |\hat{\mu} - \mu_0| > t_\alpha\}$.
• Let us set the significance level $\alpha = 10\% = 0.1$. This means that $t_\alpha$ should satisfy
  $$\mathrm{Prob}\{t(D_N) > t_\alpha \mid H\} = \mathrm{Prob}\{|\hat{\mu} - \mu_0| > t_\alpha \mid H\} = \mathrm{Prob}\{(\hat{\mu} - \mu_0 > t_\alpha) \text{ or } (\hat{\mu} - \mu_0 < -t_\alpha) \mid H\} = 0.1$$


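TP: example (II)

• For a normal variable $x \sim \mathcal{N}(\mu, \sigma^2)$
  $$\mathrm{Prob}\{x - \mu > 1.645\sigma\} = 1 - F_x(\mu + 1.645\sigma) = 0.05$$
  and consequently
  $$\mathrm{Prob}\{x - \mu > 1.645\sigma \text{ or } x - \mu < -1.645\sigma\} = 0.05 + 0.05 = 0.1$$
• It follows that, since $\hat{\mu} \sim \mathcal{N}(\mu_0, \sigma^2/N)$ under $H$ (i.e. $\frac{\hat{\mu} - \mu_0}{\sigma/\sqrt{N}} \sim \mathcal{N}(0, 1)$), once we set
  $$t_\alpha = 1.645\,\sigma/\sqrt{N}$$
  we have
  $$\mathrm{Prob}\{|\hat{\mu} - \mu_0| > t_\alpha \mid H\} = 0.1$$
  and the critical region is
  $$S_0 = \{D_N : |\hat{\mu} - \mu_0| > 1.645\,\sigma/\sqrt{N}\}$$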


TP: example (III)

• Suppose that $\sigma = 0.1$ and that we want to test whether $\mu = \mu_0 = 10$ with a significance level of 10%.
• After $N = 6$ observations we have $D_N = \{10, 11, 12, 13, 14, 15\}$.
• On the basis of the dataset we compute
  $$\hat{\mu} = \frac{10 + 11 + 12 + 13 + 14 + 15}{6} = 12.5$$
  and
  $$t(D_N) = |\hat{\mu} - \mu_0| = 2.5$$
• Since $t_\alpha = 1.645 \cdot 0.1/\sqrt{6} = 0.0672$ and $t(D_N) > t_\alpha$, the observations $D_N$ are in the critical region.
• The hypothesis is rejected.
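
The numbers of this example can be reproduced with a few lines of code. This is only a sketch using Python's standard library. The variable names are illustrative, and the normal quantile is computed instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist, mean

data = [10, 11, 12, 13, 14, 15]        # D_N from the example
mu0, sigma, alpha = 10, 0.1, 0.1       # null mean, known sigma, significance level

mu_hat = mean(data)                    # sample average
t_obs = abs(mu_hat - mu0)              # test statistic |mu_hat - mu0|

# Two-sided test: alpha/2 in each tail, so use the upper alpha/2 normal quantile
z_half = NormalDist().inv_cdf(1 - alpha / 2)
t_alpha = z_half * sigma / sqrt(len(data))

reject = t_obs > t_alpha               # True means D_N falls in the critical region S_0
print(mu_hat, t_obs, round(t_alpha, 4), reject)
```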


Hypothesis testing: types of error

So far we have considered a single hypothesis. Let us now consider two alternative hypotheses: $H$ and $\bar{H}$.

Type I error. It is the error we make when we reject $H$ although it is true. The significance level represents the probability of making a type I error.

Type II error. It is the error we make when we accept $H$ although it is false. In order to define this error, we are forced to declare an alternative hypothesis $\bar{H}$ as a formal definition of what is meant by $H$ being “false”. The probability of a type II error is the probability that the test leads to acceptance of $H$ when in fact $\bar{H}$ prevails. When the alternative hypothesis is composite, there is no unique type II error.


An analogy

• Consider the analogy with a murder trial, where the suspect is Mr. Bean.
• The null hypothesis $H$ is “Mr. Bean is innocent”.
• The dataset is the amount of evidence collected by the police against Mr. Bean.
• The type I error is the error that we make if Mr. Bean is innocent and we sentence him to death.
• The type II error is the error that we make if Mr. Bean is guilty and we acquit him.


Hypothesis testing

• Suppose we have some data $\{z_1, \dots, z_N\} \sim F$ from a distribution $F$.
• $H$ and $\bar{H}$ represent two hypotheses about $F$.
• On the basis of the data, one is accepted and one is rejected.
• Note that the two hypotheses have different philosophical status (asymmetry).
• $H$ is a conservative hypothesis, not to be rejected unless the evidence is clear. This means that a type I error is more serious than a type II error (benefit of the doubt).
• It is often assumed that $F$ belongs to a parametric family $F(z, \theta)$. The test on $F$ then becomes a test on $\theta$.
• A particular example of a hypothesis test is the goodness-of-fit test, where we test $H: F = F_0$ against $\bar{H}: F \ne F_0$.


The five steps of hypothesis testing

1. Declare the null hypothesis (e.g. $H$: honest student) and the alternative hypothesis (e.g. $\bar{H}$: cheating student).
2. Choose the numeric value of the type I error (e.g. the risk I want to run).
3. Choose a procedure to obtain the test statistic (e.g. the number of similar lines).
4. Determine the critical value of the test statistic (e.g. 4 identical lines) that leads to a rejection of $H$. This is done in order to ensure the type I error defined in step 2.
5. Obtain the data and determine whether the observed value of the test statistic leads to acceptance or rejection of $H$.


Quality of the test

Suppose that

• $N$ students took part in the exam,
• $N_N$ did not copy,
• $N_P$ copied,
• $\hat{N}_N$ were considered not guilty and passed the exam,
• $\hat{N}_P$ were considered guilty and were refused,
• $F_P$ honest students were refused,
• $F_N$ cheating students passed.


Confusion matrix

Then we have

                                 Not refused    Refused        Total
$H$: not guilty student (-)      $T_N$          $F_P$          $N_N$
$\bar{H}$: guilty student (+)    $F_N$          $T_P$          $N_P$
Total                            $\hat{N}_N$    $\hat{N}_P$    $N$

• $F_P$ is the number of false positives and the ratio $F_P/N_N$ represents the type I error.
• $F_N$ is the number of false negatives and the ratio $F_N/N_P$ represents the type II error.


Specificity and sensitivity

Specificity: the ratio (to be maximized)
$$SP = \frac{T_N}{F_P + T_N} = \frac{T_N}{N_N} = \frac{N_N - F_P}{N_N} = 1 - \frac{F_P}{N_N}, \qquad 0 \le SP \le 1$$
It increases by reducing the number of false positives.

Sensitivity: the ratio (to be maximized)
$$SE = \frac{T_P}{T_P + F_N} = \frac{T_P}{N_P} = \frac{N_P - F_N}{N_P} = 1 - \frac{F_N}{N_P}, \qquad 0 \le SE \le 1$$
It increases by reducing the number of false negatives and corresponds to the power of the test (i.e. it estimates the quantity 1 − type II error).


Specificity and sensitivity (II)

There exists a trade-off between these two quantities.

• In the case of a test that always returns $H$ (e.g. a very kind professor) we have $\hat{N}_P = 0$, $\hat{N}_N = N$, $F_P = 0$, $T_N = N_N$ and $SP = 1$ but $SE = 0$.
• In the case of a test that always returns $\bar{H}$ (e.g. a very suspicious professor) we have $\hat{N}_P = N$, $\hat{N}_N = 0$, $F_N = 0$, $T_P = N_P$ and $SE = 1$ but $SP = 0$.


False Positive and False Negative Rate

False Positive Rate:
$$FPR = 1 - SP = 1 - \frac{T_N}{F_P + T_N} = \frac{F_P}{F_P + T_N} = \frac{F_P}{N_N}, \qquad 0 \le FPR \le 1$$
It decreases by reducing the number of false positives and estimates the type I error.

False Negative Rate:
$$FNR = 1 - SE = 1 - \frac{T_P}{T_P + F_N} = \frac{F_N}{T_P + F_N} = \frac{F_N}{N_P}, \qquad 0 \le FNR \le 1$$
It decreases by reducing the number of false negatives.
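
A minimal sketch of how specificity, sensitivity and the two error rates follow from the four cells of a confusion matrix. The function name `rates` and the counts are hypothetical, chosen only to illustrate the formulas.

```python
def rates(tn, fp, fn, tp):
    """Rates computed over the true classes (rows of the confusion matrix)."""
    n_neg = tn + fp            # N_N: students who did not copy
    n_pos = fn + tp            # N_P: students who copied
    return {
        "SP": tn / n_neg,      # specificity
        "SE": tp / n_pos,      # sensitivity (power)
        "FPR": fp / n_neg,     # estimate of the type I error
        "FNR": fn / n_pos,     # estimate of the type II error
    }


# Hypothetical exam with 100 students
r = rates(tn=85, fp=5, fn=4, tp=6)
print({name: round(value, 3) for name, value in r.items()})
```

By construction `SP + FPR` and `SE + FNR` both equal 1, which is a quick sanity check on any implementation.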


Predictive value

Positive Predictive Value: the ratio (to be maximized)
$$PPV = \frac{T_P}{T_P + F_P} = \frac{T_P}{\hat{N}_P}, \qquad 0 \le PPV \le 1$$

Negative Predictive Value: the ratio (to be maximized)
$$PNV = \frac{T_N}{T_N + F_N} = \frac{T_N}{\hat{N}_N}, \qquad 0 \le PNV \le 1$$

False Discovery Rate: the ratio (to be minimized)
$$FDR = \frac{F_P}{T_P + F_P} = \frac{F_P}{\hat{N}_P} = 1 - PPV, \qquad 0 \le FDR \le 1$$
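
Unlike specificity and sensitivity, the predictive values are normalized by the number of predicted positives and negatives (the column totals of the confusion matrix). A short sketch with hypothetical counts and a hypothetical function name `predictive_values`:

```python
def predictive_values(tn, fp, fn, tp):
    """Ratios computed over the predicted classes (columns of the confusion matrix)."""
    pred_pos = tp + fp         # number of refused students
    pred_neg = tn + fn         # number of students who passed
    return {
        "PPV": tp / pred_pos,  # positive predictive value
        "PNV": tn / pred_neg,  # negative predictive value
        "FDR": fp / pred_pos,  # false discovery rate
    }


# Hypothetical exam with 100 students
pv = predictive_values(tn=85, fp=5, fn=4, tp=6)
print({name: round(value, 3) for name, value in pv.items()})
```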


Receiver Operating Characteristic curve

The Receiver Operating Characteristic (also known as the ROC curve) is a plot of the true positive rate (i.e. sensitivity or power) against the false positive rate (type I error) for the different possible decision thresholds of a test.

Consider an example where $t^+ \sim \mathcal{N}(1, 1)$ and $t^- \sim \mathcal{N}(-1, 1)$. Suppose that the examples are classed as positive if $t > THR$ and negative if $t < THR$, where $THR$ is a threshold.

• If $THR = -\infty$, all the examples are classed as positive: $T_N = F_N = 0$, which implies $SE = \frac{T_P}{N_P} = 1$ and $FPR = \frac{F_P}{F_P + T_N} = 1$.
• If $THR = \infty$, all the examples are classed as negative: $T_P = F_P = 0$, which implies $SE = 0$ and $FPR = 0$.
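
For this Gaussian example the ROC points can be computed in closed form rather than from simulated data. The sketch below is not the course's R script. It evaluates the two tail probabilities on an arbitrary grid of thresholds, and the function name `roc_point` is hypothetical.

```python
from statistics import NormalDist

pos = NormalDist(mu=1.0, sigma=1.0)     # t+ ~ N(1, 1)
neg = NormalDist(mu=-1.0, sigma=1.0)    # t- ~ N(-1, 1)


def roc_point(thr):
    """Return (FPR, SE) for the rule 'classify as positive if t > thr'."""
    se = 1.0 - pos.cdf(thr)     # Prob{t+ > thr}: true positive rate
    fpr = 1.0 - neg.cdf(thr)    # Prob{t- > thr}: false positive rate
    return fpr, se


thresholds = [-4 + 0.5 * i for i in range(17)]    # -4.0, -3.5, ..., 4.0
curve = [roc_point(thr) for thr in thresholds]
for thr, (fpr, se) in zip(thresholds, curve):
    print(f"THR={thr:+.1f}  FPR={fpr:.3f}  SE={se:.3f}")
```

Plotting `SE` against `FPR` for these points traces the curve from (1, 1) at low thresholds to (0, 0) at high thresholds.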


ROC curve

[Figure: ROC curve, with SE on the vertical axis and FPR on the horizontal axis, both ranging from 0 to 1. R script roc.R]


Choice of test

The choice of test, and consequently the choice of the partition $\{S_0, S_1\}$, is based on two steps:

1. Define a significance level $\alpha$, that is, a bound on the probability of type I error
   $$\mathrm{Prob}\{\text{reject } H \mid H\} = \mathrm{Prob}\{D_N \in S_0 \mid H\} \le \alpha$$
   that is, the probability of incorrectly rejecting $H$.
2. Among the set of tests $\{S_0, S_1\}$ of level $\alpha$, choose the test that minimizes the probability of type II error
   $$\mathrm{Prob}\{\text{accept } H \mid \bar{H}\} = \mathrm{Prob}\{D_N \in S_1 \mid \bar{H}\}$$
   that is, the probability of incorrectly accepting $H$. This is equivalent to maximizing the power of the test
   $$\mathrm{Prob}\{\text{reject } H \mid \bar{H}\} = \mathrm{Prob}\{D_N \in S_0 \mid \bar{H}\} = 1 - \mathrm{Prob}\{D_N \in S_1 \mid \bar{H}\}$$
   which is the probability of correctly rejecting $H$. The higher the power, the better!


TP example

• Consider a r.v. $z \sim \mathcal{N}(\mu, \sigma^2)$, where $\sigma$ is known, and a set of $N$ i.i.d. observations.
• We want to test the null hypothesis $\mu = \mu_0 = 0$ with $\alpha = 0.1$.
• Consider the following 3 critical regions $S_0$:
  1. $|\hat{\mu} - \mu_0| > 1.645\,\sigma/\sqrt{N}$
  2. $\hat{\mu} - \mu_0 > 1.282\,\sigma/\sqrt{N}$
  3. $|\hat{\mu} - \mu_0| < 0.126\,\sigma/\sqrt{N}$
• For all these tests $\mathrm{Prob}\{D_N \in S_0 \mid H\} \le \alpha$, hence the significance level is the same.
• However, if $\bar{H}: \mu = 10$, the type II error of the three tests is significantly different.
• Which one is the best?
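
One way to compare the three regions is to compute, for each, the rejection probability under $H$ (the level) and under the alternative $\mu = 10$ (the power). The sketch below assumes $\sigma = 1$ and $N = 4$, values which are not given in the example and are chosen only for illustration. The function name `reject_prob` is likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 1.0, 4            # hypothetical values, not given in the example
se = sigma / sqrt(n)         # standard deviation of the sample mean
mu0, mu1 = 0.0, 10.0         # mean under H and under the alternative


def reject_prob(mu, region):
    """Prob{D_N in S_0} when the true mean is mu, for critical region 1, 2 or 3."""
    d = NormalDist(mu=mu, sigma=se)              # distribution of the sample mean
    if region == 1:                              # |mu_hat - mu0| > 1.645 se
        c = 1.645 * se
        return d.cdf(mu0 - c) + 1 - d.cdf(mu0 + c)
    if region == 2:                              # mu_hat - mu0 > 1.282 se
        return 1 - d.cdf(mu0 + 1.282 * se)
    if region == 3:                              # |mu_hat - mu0| < 0.126 se
        c = 0.126 * se
        return d.cdf(mu0 + c) - d.cdf(mu0 - c)
    raise ValueError(f"unknown region {region}")


for region in (1, 2, 3):
    level = reject_prob(mu0, region)             # type I error probability
    power = reject_prob(mu1, region)             # 1 - type II error probability
    print(f"region {region}: level={level:.3f}  type II error={1 - power:.3f}")
```

Under these assumptions the three regions share (up to rounding of the constants) the same level, but the third one almost never rejects $H$ when $\mu = 10$.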


TP example (II)

[Figure: distributions of the test statistic $\hat{\mu}$ under $H$ and under $\bar{H}$, with the regions $S_1$ and $S_0$ marked on the axis.]

On the left: distribution of the test statistic $\hat{\mu}$ if $H: \mu_0 = 0$ is true. On the right: distribution of the test statistic $\hat{\mu}$ if $\bar{H}: \mu_1 = 10$ is true. The interval marked by $S_1$ denotes the set of observed $\hat{\mu}$ values for which $H$ is accepted (non-critical region). The interval marked by $S_0$ denotes the set of observed $\hat{\mu}$ values for which $H$ is rejected (critical region). The area of the black pattern region on the right equals $\mathrm{Prob}\{D_N \in S_0 \mid H\}$, i.e. the probability of rejecting $H$ when $H$ is true (type I error). The area of the grey shaded region on the left equals the probability of accepting $H$ when $H$ is false (type II error).


TP example (III)

[Figure: distributions of the test statistic $\hat{\mu}$ under $H$ and under $\bar{H}$, with two intervals $S_1$ and one interval $S_0$ marked on the axis.]

On the left: distribution of the test statistic $\hat{\mu}$ if $H: \mu_0 = 0$ is true. On the right: distribution of the test statistic $\hat{\mu}$ if $\bar{H}: \mu_1 = 10$ is true. The two intervals marked by $S_1$ denote the set of observed $\hat{\mu}$ values for which $H$ is accepted (non-critical region). The interval marked by $S_0$ denotes the set of observed $\hat{\mu}$ values for which $H$ is rejected (critical region). The area of the pattern region equals $\mathrm{Prob}\{D_N \in S_0 \mid H\}$, i.e. the probability of rejecting $H$ when $H$ is true (type I error).

Which area corresponds to the probability of the type II error?


Types of parametric tests

Consider random variables with a parametric distribution $F(\cdot, \theta)$.

One-sample vs. two-sample: in the one-sample test we consider a single r.v. and we formulate hypotheses about its distribution. In the two-sample test we consider two r.v.s $z_1$ and $z_2$ and we formulate hypotheses about their differences or similarities.

Simple vs. composite: the test is simple if $H$ completely describes the distributions of the r.v.s involved; otherwise it is composite.

Single-sided (or one-tailed) vs. two-sided (or two-tailed): in the single-sided test the region of rejection concerns only one tail of the null distribution. This means that $\bar{H}$ indicates the predicted direction of the difference (e.g. $\bar{H}: \theta > \theta_0$). In the two-sided test, the region of rejection concerns both tails of the null distribution. This means that $\bar{H}$ does not indicate the predicted direction of the difference (e.g. $\bar{H}: \theta \ne \theta_0$).


Example of parametric test

• Consider a parametric test on the distribution of a Gaussian r.v., and suppose that the null hypothesis is $H: \theta = \theta_0$, where $\theta_0$ is given and represents the mean.
• The test is one-sample and composite.
• In order to know whether it is one-sided or two-sided we have to define the alternative configuration: if $\bar{H}: \theta < \theta_0$ the test is one-sided down, if $\bar{H}: \theta > \theta_0$ the test is one-sided up, if $\bar{H}: \theta \ne \theta_0$ the test is two-sided.


z-test (one-sample and one-sided)

Consider a random sample $D_N \leftarrow x \sim \mathcal{N}(\mu, \sigma^2)$ with $\mu$ unknown and $\sigma^2$ known.

STEP 1: consider the null hypothesis and the alternative (composite and one-sided)
$$H: \mu = \mu_0; \qquad \bar{H}: \mu > \mu_0$$
STEP 2: fix the value $\alpha$ of the type I error.

STEP 3: choose a test statistic. If $H$ is true then the distribution of $\hat{\mu}$ is $\mathcal{N}(\mu_0, \sigma^2/N)$. This means that the variable $z$ satisfies
$$z = \frac{(\hat{\mu} - \mu_0)\sqrt{N}}{\sigma} \sim \mathcal{N}(0, 1)$$
It is convenient to rephrase the test in terms of the test statistic $z$.


z-test (one-sample and one-sided) (II)

STEP 4: determine the critical value for $z$. We reject the hypothesis $H$ if $z_N > z_\alpha$, where $z_\alpha$ is such that $\mathrm{Prob}\{\mathcal{N}(0, 1) > z_\alpha\} = \alpha$.

Example: for $\alpha = 0.05$ we would take $z_\alpha = 1.645$, since 5% of the standard normal distribution lies to the right of 1.645.

R command: z_alpha = qnorm(alpha, lower.tail=FALSE)

STEP 5: once the dataset $D_N$ is measured, the value of the test statistic is
$$z_N = \frac{(\hat{\mu} - \mu_0)\sqrt{N}}{\sigma}$$


TP: example z-test

• Consider a r.v. $z \sim \mathcal{N}(\mu, 1)$.
• We want to test $H: \mu = 5$ against $\bar{H}: \mu > 5$ with significance level 0.05.
• Suppose that the data are $D_N = \{5.1, 5.5, 4.9, 5.3\}$.
• Then $\hat{\mu} = 5.2$ and $z_N = (5.2 - 5) \cdot 2/1 = 0.4$.
• Since this is less than $z_\alpha = 1.645$, we do not reject the null hypothesis.
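
The five steps of the one-sided z-test can be sketched in code for this example. The snippet uses Python's standard library, with `inv_cdf(1 - alpha)` playing the role of the upper-tail normal quantile. The p-value line is an addition for illustration and is not part of the original example.

```python
from math import sqrt
from statistics import NormalDist, mean

data = [5.1, 5.5, 4.9, 5.3]            # D_N from the example
mu0, sigma, alpha = 5.0, 1.0, 0.05     # null mean, known sigma, type I error

# Test statistic z_N = (mu_hat - mu0) * sqrt(N) / sigma
z_n = (mean(data) - mu0) * sqrt(len(data)) / sigma

# Critical value z_alpha with Prob{N(0, 1) > z_alpha} = alpha
z_alpha = NormalDist().inv_cdf(1 - alpha)

# One-sided p-value Prob{N(0, 1) > z_N} (illustrative addition)
p_value = 1 - NormalDist().cdf(z_n)

reject = z_n > z_alpha
print(round(z_n, 3), round(z_alpha, 3), round(p_value, 3), reject)
```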


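Two-sided parametric tests

Assumption: all the variables are normal!

Name            one/two sample   known                        $H$                          $\bar{H}$
z-test          one              $\sigma^2$                   $\mu = \mu_0$                $\mu \ne \mu_0$
z-test          two              $\sigma_1^2 = \sigma_2^2$    $\mu_1 = \mu_2$              $\mu_1 \ne \mu_2$
t-test          one              -                            $\mu = \mu_0$                $\mu \ne \mu_0$
t-test          two              -                            $\mu_1 = \mu_2$              $\mu_1 \ne \mu_2$
$\chi^2$-test   one              $\mu$                        $\sigma^2 = \sigma_0^2$      $\sigma^2 \ne \sigma_0^2$
$\chi^2$-test   one              -                            $\sigma^2 = \sigma_0^2$      $\sigma^2 \ne \sigma_0^2$
F-test          two              -                            $\sigma_1^2 = \sigma_2^2$    $\sigma_1^2 \ne \sigma_2^2$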


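Student’s t-distribution

If $x \sim \mathcal{N}(0, 1)$ and $y \sim \chi^2_N$ are independent, then the Student’s t-distribution with $N$ degrees of freedom is the distribution of the r.v.
$$z = \frac{x}{\sqrt{y/N}}$$
We denote this by $z \sim t_N$.

If $z_1, \dots, z_N$ are i.i.d. $\mathcal{N}(\mu, \sigma^2)$ then
$$\frac{\sqrt{N}(\hat{\mu} - \mu)}{\sqrt{SS/(N - 1)}} = \frac{\sqrt{N}(\hat{\mu} - \mu)}{\hat{\sigma}} \sim t_{N-1}$$
where $SS = \sum_{i=1}^N (z_i - \hat{\mu})^2$.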


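t-test: one-sample and two-sided

Consider a random sample from $\mathcal{N}(\mu, \sigma^2)$ with $\sigma^2$ unknown. Let
$$H: \mu = \mu_0; \qquad \bar{H}: \mu \ne \mu_0$$
Let
$$t(D_N) = T = \frac{\sqrt{N}(\hat{\mu} - \mu_0)}{\sqrt{\frac{1}{N-1}\sum_{i=1}^N (z_i - \hat{\mu})^2}} = \frac{\hat{\mu} - \mu_0}{\sqrt{\hat{\sigma}^2/N}}$$
be a statistic computed using the data set $D_N$.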


t-test: one-sample and two-sided (II)

• It can be shown that if the hypothesis $H$ holds, $T \sim T_{N-1}$ is a r.v. with a Student distribution with $N - 1$ degrees of freedom.
• The size-$\alpha$ t-test consists in rejecting $H$ if
  $$|T| > k = t_{\alpha/2, N-1}$$
  where $t_{\alpha/2, N-1}$ is the upper $\alpha/2$ point of a $T$-distribution with $N - 1$ degrees of freedom, i.e.
  $$\mathrm{Prob}\{t_{N-1} > t_{\alpha/2, N-1}\} = \alpha/2$$
  where $t_{N-1} \sim T_{N-1}$.
• In other terms, $H$ is rejected when $|T|$ is large.
• R command: t_crit = qt(alpha/2, N-1, lower.tail=FALSE)


TP example

Does jogging lead to a reduction in pulse rate? Eight non-jogging volunteers engaged in a one-month jogging programme. Their pulses were taken before and after the programme.

pulse rate before    74   86   98  102   78   84   79   70
pulse rate after     70   85   90  110   71   80   69   74
decrease              4    1    8   -8    7    4   10   -4

Suppose that the decreases are samples from $\mathcal{N}(\mu, \sigma^2)$ for some unknown $\sigma^2$. We want to test $H: \mu = \mu_0 = 0$ against $\bar{H}: \mu \ne 0$ with a significance level $\alpha = 0.05$. We have $N = 8$, $\hat{\mu} = 2.75$, $T = 1.263$, $t_{\alpha/2, N-1} = 2.365$.

Since $|T| \le t_{\alpha/2, N-1}$, the data are not sufficient to reject the hypothesis $H$. In other terms, we do not have enough evidence to show that there is a reduction in pulse rate.
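
The statistic of this example can be recomputed from the raw pulse rates. This is a sketch with Python's standard library. Since the standard library has no Student quantile function, the critical value 2.365 is simply taken from the example rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

before = [74, 86, 98, 102, 78, 84, 79, 70]
after = [70, 85, 90, 110, 71, 80, 69, 74]
decrease = [b - a for b, a in zip(before, after)]

n = len(decrease)
mu_hat = mean(decrease)
sigma_hat = stdev(decrease)                   # sample standard deviation, N - 1 denominator
t_stat = sqrt(n) * (mu_hat - 0) / sigma_hat   # T with mu_0 = 0

t_crit = 2.365          # t_{alpha/2, N-1} for alpha = 0.05 and N = 8, as quoted in the example
reject = abs(t_stat) > t_crit
print(decrease, mu_hat, round(t_stat, 3), reject)
```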


The chi-squared distribution

For a positive integer $N$, a r.v. $z$ has a $\chi^2_N$ distribution if
$$z = x_1^2 + \dots + x_N^2$$
where $x_1, x_2, \dots, x_N$ are i.i.d. random variables $\mathcal{N}(0, 1)$.

• The probability distribution is a gamma distribution with parameters $(\frac{1}{2}N, \frac{1}{2})$.
• $E[z] = N$ and $\mathrm{Var}[z] = 2N$.
• The distribution is called “a chi-squared distribution with $N$ degrees of freedom”.
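
The definition suggests a simple Monte Carlo check of the two moments: draw sums of $N$ squared standard normal variables and compare their empirical mean and variance with $N$ and $2N$. The degrees of freedom, number of draws and seed below are arbitrary illustrative choices, and the function name `chi2_draw` is hypothetical.

```python
import random
from statistics import mean, variance

random.seed(0)        # fixed seed so that the run is reproducible
n_dof = 5             # degrees of freedom N (arbitrary choice)
n_draws = 20000       # number of simulated chi-squared values


def chi2_draw(k):
    """One draw of x_1^2 + ... + x_k^2 with x_i i.i.d. N(0, 1)."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))


draws = [chi2_draw(n_dof) for _ in range(n_draws)]
emp_mean = mean(draws)        # should be close to N
emp_var = variance(draws)     # should be close to 2N
print(round(emp_mean, 2), round(emp_var, 2))
```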


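$\chi^2$-test: one-sample and two-sided

• Consider a random sample from $\mathcal{N}(\mu, \sigma^2)$ with $\mu$ known.
• Let
  $$H: \sigma^2 = \sigma_0^2; \qquad \bar{H}: \sigma^2 \ne \sigma_0^2$$
• Let $SS = \sum_i (z_i - \mu)^2$.
• It can be shown that if $H$ is true then $SS/\sigma_0^2 \sim \chi^2_N$.
• The size-$\alpha$ $\chi^2$-test rejects $H$ if $SS/\sigma_0^2 < a_1$ or $SS/\sigma_0^2 > a_2$, where
  $$\mathrm{Prob}\left\{\frac{SS}{\sigma_0^2} < a_1\right\} + \mathrm{Prob}\left\{\frac{SS}{\sigma_0^2} > a_2\right\} = \alpha$$
• If $\mu$ is unknown, you must
  1. replace $\mu$ with $\hat{\mu}$ in the quantity $SS$,
  2. use a $\chi^2_{N-1}$ distribution.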


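t-test: two-samples, two-sided

Consider two r.v.s $x \sim \mathcal{N}(\mu_1, \sigma^2)$ and $y \sim \mathcal{N}(\mu_2, \sigma^2)$ with the same variance. Let $D_N^x$ and $D_M^y$ be two independent sets of samples. We want to test $H: \mu_1 = \mu_2$ against $\bar{H}: \mu_1 \ne \mu_2$. Let
$$\hat{\mu}_x = \frac{\sum_{i=1}^N x_i}{N}, \quad SS_x = \sum_{i=1}^N (x_i - \hat{\mu}_x)^2, \quad \hat{\mu}_y = \frac{\sum_{i=1}^M y_i}{M}, \quad SS_y = \sum_{i=1}^M (y_i - \hat{\mu}_y)^2$$
Once we define the statistic
$$T = \frac{\hat{\mu}_x - \hat{\mu}_y}{\sqrt{\left(\frac{1}{M} + \frac{1}{N}\right)\left(\frac{SS_x + SS_y}{M + N - 2}\right)}} \sim T_{M+N-2}$$
it can be shown that a test of size $\alpha$ rejects $H$ if
$$|T| > t_{\alpha/2, M+N-2}$$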
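
As a sketch, the statistic can be evaluated on the grades of the two school sections mentioned at the beginning of the course, assuming (as the test requires) that both samples are Gaussian with the same variance. The critical value $t_{0.025, 10} = 2.228$ is taken from standard Student tables, since the Python standard library has no quantile function for this distribution.

```python
from math import sqrt
from statistics import mean

x = [15, 10, 12, 19, 5, 7]     # Section A grades (sample of size N)
y = [14, 11, 11, 12, 6, 7]     # Section B grades (sample of size M)
n, m = len(x), len(y)

mu_x, mu_y = mean(x), mean(y)
ss_x = sum((xi - mu_x) ** 2 for xi in x)     # SS_x
ss_y = sum((yi - mu_y) ** 2 for yi in y)     # SS_y

pooled_var = (ss_x + ss_y) / (m + n - 2)     # pooled variance estimate
t_stat = (mu_x - mu_y) / sqrt((1 / m + 1 / n) * pooled_var)

t_crit = 2.228      # t_{alpha/2, M+N-2} for alpha = 0.05 and 10 degrees of freedom (from tables)
reject = abs(t_stat) > t_crit
print(round(t_stat, 3), reject)
```

With these data $|T|$ is well below the critical value, so the two-sided test of size 0.05 does not reject $H: \mu_1 = \mu_2$.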


F-distribution

Let $x \sim \chi^2_M$ and $y \sim \chi^2_N$ be two independent r.v.s. A r.v. $z$ has an F-distribution $F_{M,N}$ with $M$ and $N$ degrees of freedom if
$$z = \frac{x/M}{y/N}$$

• If $z \sim F_{M,N}$ then $1/z \sim F_{N,M}$.
• If $z \sim T_N$ then $z^2 \sim F_{1,N}$.


F-distribution

[Figure: $F_{M,N}$ density and cumulative distribution for M=20, N=10, plotted for values between 0 and 5. R script s_f.R.]


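F-test: two-samples, two-sided

Consider a random sample $x_1, \dots, x_M$ from $\mathcal{N}(\mu_1, \sigma_1^2)$ and a random sample $y_1, \dots, y_N$ from $\mathcal{N}(\mu_2, \sigma_2^2)$ with $\mu_1$ and $\mu_2$ unknown. Suppose we want to test
$$H: \sigma_1^2 = \sigma_2^2; \qquad \bar{H}: \sigma_1^2 \ne \sigma_2^2$$
Let us consider the statistic
$$f = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2} = \frac{SS_1/(M - 1)}{SS_2/(N - 1)} \sim \frac{\sigma_1^2 \chi^2_{M-1}/(M - 1)}{\sigma_2^2 \chi^2_{N-1}/(N - 1)} = \frac{\sigma_1^2}{\sigma_2^2} F_{M-1,N-1}$$
It can be shown that if $H$ is true, the ratio $f$ has an F-distribution $F_{M-1,N-1}$. We reject $H$ if the ratio $f$ is large, i.e. $f > F_{\alpha, M-1, N-1}$, where
$$\mathrm{Prob}\{z > F_{\alpha, M-1, N-1}\} = \alpha$$
if $z \sim F_{M-1,N-1}$.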
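
A short sketch of the variance-ratio statistic, again using the grades of the two school sections as illustrative data and assuming both samples are Gaussian. The critical value $F_{0.05, 5, 5} = 5.05$ is taken from standard F tables, since the Python standard library has no F quantile function.

```python
from statistics import variance

x = [15, 10, 12, 19, 5, 7]     # Section A grades (sample of size M)
y = [14, 11, 11, 12, 6, 7]     # Section B grades (sample of size N)

# variance() divides by (len - 1), i.e. it returns SS_1/(M-1) and SS_2/(N-1)
f = variance(x) / variance(y)

f_crit = 5.05       # F_{alpha, M-1, N-1} for alpha = 0.05 and (5, 5) degrees of freedom (from tables)
reject = f > f_crit
print(round(f, 3), reject)
```

Here the ratio stays below the critical value, so the rule above does not reject the hypothesis of equal variances.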
