Statistical inference: Hypothesis testing

1. Statistics: Learning from Samples about Populations Inference 1: Confidence Intervals What does the 95% CI really mean? Inference 2: Hypothesis Tests


Statistical Inference: Brief Overview

Statistics: Learning from Samples about Populations

Inference 1: Confidence Intervals
What does the 95% CI really mean?

Inference 2: Hypothesis Tests
What does a p-value really mean?
When to use which test?


Hypothesis testing = testing of statistical hypotheses


Statistical hypothesis

Statements about population parameter values.

Null hypothesis (H0) says a parameter is unchanged from a default, pre-specified value.

Alternative hypothesis (H1) says the parameter has a value incompatible with H0.


Population and sample (diagram)

Population: parameters, postulated (unknown): mean, standard deviation, size
Sample: statistics, seen (known): mean x̄, standard deviation s, size n


Example: Hypertension and Cholesterol

Make appropriate statistical hypotheses.

Assumption: mean cholesterol in hypertensive men is equal to mean cholesterol in the general male population (20-74 years old).

We estimated: in the 20-74 year old male population, the mean serum cholesterol is 211 mg/100 ml with a standard deviation of 46 mg/100 ml.


Example: Hypertension and Cholesterol

Null hypothesis => no difference between the groups
H0: μ_hypertensive = μ_general population
H0: μ_hypertensive = 211 mg/100 ml

• μ: population mean of serum cholesterol
• Mean cholesterol for hypertensive men = mean for the general male population

Alternative hypothesis
HA: μ_hypertensive ≠ μ_general population
HA: μ_hypertensive ≠ 211 mg/100 ml


Null and alternative hypotheses

Two-sided tests

One-sided tests
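The difference between the two kinds of test can be sketched numerically. The snippet below is only an illustration, assuming a test statistic that follows a standard normal distribution under H0; the observed value `z_obs` and the helper names are hypothetical and do not come from the slides.

```python
import math

def normal_sf(z: float) -> float:
    """Upper-tail probability P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def p_values(z: float) -> dict:
    """One- and two-sided p-values for an observed z statistic."""
    return {
        "two_sided": 2.0 * normal_sf(abs(z)),  # H1: parameter differs from the H0 value
        "greater": normal_sf(z),               # H1: parameter is larger than the H0 value
        "less": normal_sf(-z),                 # H1: parameter is smaller than the H0 value
    }

z_obs = 1.8  # hypothetical observed test statistic
result = p_values(z_obs)
for name, p in result.items():
    print(f"{name:>9}: p = {p:.4f}")
```

For a statistic in the direction of the alternative, the one-sided p-value is half the two-sided one, so the choice between the two should be made before looking at the data.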


Steps in Hypothesis Tests

1. Assume H0 is true, i.e. believe results are a matter of chance.

2. Quantify how far the data are from being consistent with H0 by evaluating a quantity called a test statistic.

3. Assess the probability of results at least this extreme; call this the p-value of the test.

4. Reject H0 (believe H1) if this p-value is small, or keep H0 (do not believe H1) otherwise.
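The four steps above can be sketched with the cholesterol example. This is a minimal illustration, assuming a one-sample z-test with known population standard deviation, which the slides do not name as their method. The values 211, 46 and 220 appear in the slides; the sample size `n = 25` and the threshold `alpha = 0.05` are assumptions made here, since the slides give no sample size.

```python
import math

# Step 1: assume H0 is true: the hypertensive mean equals the population mean.
mu0 = 211.0    # population mean under H0 (from the slides)
sigma = 46.0   # population standard deviation (from the slides)

# Sample summary: the mean is the slides' value; n is a HYPOTHETICAL sample size.
sample_mean = 220.0
n = 25

# Step 2: test statistic = distance of the sample mean from mu0, in standard errors.
standard_error = sigma / math.sqrt(n)
z = (sample_mean - mu0) / standard_error

# Step 3: two-sided p-value = P(|Z| >= |z|) when H0 is true.
p_value = math.erfc(abs(z) / math.sqrt(2.0))

# Step 4: compare with the chosen threshold (0.05 is an assumed convention).
alpha = 0.05
decision = "reject H0" if p_value < alpha else "do not reject H0"

print(f"z = {z:.3f}, p = {p_value:.3f} -> {decision}")
```

With a different sample size the standard error, and therefore the conclusion, can change, which is why n has to be known before the test can be carried out.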


Interpretation of P-value (significance level 0.05 = 5%)

P < 0.05

Significant difference between the treatments.
Null hypothesis is rejected, alternative is accepted.

P ≥ 0.05

No significant difference between the treatments (the observed difference could have happened by chance).
Null hypothesis is not rejected.


P-value

The P value gives the probability of the observed or a more extreme difference having happened by chance, i.e. if the null hypothesis were true.

P = 0.500 means that, if the null hypothesis were true, a difference at least this large would happen by chance with probability 0.5 (50%, or 1 in 2).

P = 0.05 means that, if the null hypothesis were true, a difference at least this large would happen by chance with probability 0.05 (5%, or 1 in 20).


P-value

The lower the P value, the less likely it is that a difference this large happened by chance, and so the higher the significance of the finding.

P = 0.01 is often considered to be "highly significant". It means that, if the null hypothesis were true, a difference at least this large would happen by chance only 1 in 100 times. This is unlikely, but still possible.
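That a small P value is "unlikely, but still possible" under chance alone can be illustrated by simulation. The sketch below is an assumption-laden illustration rather than anything from the slides: it repeatedly draws samples for which H0 is true by construction, applies a two-sided z-test with known standard deviation, and counts how often P falls below 0.05. The sample size, number of experiments and seed are arbitrary choices.

```python
import math
import random

def two_sided_p(sample, mu0, sigma):
    """Two-sided z-test p-value for the sample mean, with known sigma."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2.0))

def false_positive_rate(n_experiments=5000, n=30, alpha=0.05, seed=1):
    """Fraction of experiments with p < alpha when H0 is actually true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Data are drawn with the mean stated by H0, so any "effect" is pure chance.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_p(sample, mu0=0.0, sigma=1.0) < alpha:
            hits += 1
    return hits / n_experiments

rate = false_positive_rate()
print(f"share of experiments with p < 0.05 under H0: {rate:.3f}")
```

The share should come out close to 5%: when there is no real effect, roughly 1 experiment in 20 still produces P < 0.05.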


Example 1

Out of 50 new babies, on average 25 will be girls, sometimes more, sometimes less.

Say there is a new fertility treatment and we want to know whether it affects the chance of having a boy or a girl.

Null hypothesis: the treatment does not alter the chance of having a girl.


Example 1

Null hypothesis: the treatment does not alter the chance of having a girl.

Out of the first 50 babies resulting from the treatment, 15 are girls.

We need to know how probable a result at least this extreme would be by chance alone, i.e. did this happen by chance or has the treatment had an effect on the sex of the babies?

P = 0.007


Example 1

The P value in this example is 0.007.

This means a result at least this extreme would happen by chance only 0.007 of the time (about 1 in 140) if the treatment did not actually affect the sex of the baby.

This is highly unlikely, so we can reject our null hypothesis and conclude that the treatment probably does alter the chance of having a girl.
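The slides do not say which test produced P = 0.007. One calculation that reproduces it is an exact two-sided binomial test, sketched here under the assumption that each baby is a girl with probability 0.5 under H0; the function name is hypothetical.

```python
from math import comb

def two_sided_binomial_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k successes in n trials when H0 says p = 0.5.

    Because the distribution is symmetric under H0, the two-sided p-value is
    twice the probability of the tail at least as extreme as k (capped at 1).
    """
    tail_count = min(k, n - k)
    tail_prob = sum(comb(n, i) for i in range(tail_count + 1)) / 2**n
    return min(1.0, 2.0 * tail_prob)

girls, babies = 15, 50  # counts from the example
p_value = two_sided_binomial_p(girls, babies)
print(f"P = {p_value:.4f}")
```

The result is about 0.0066, which rounds to the 0.007 quoted on the slide.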


Example 2

Patients with minor illnesses were randomized to see either Dr Smith or Dr Jones. Dr Smith ended up seeing 176 patients in the study whereas Dr Jones saw 200 patients.


How to choose the appropriate statistical test?

1. Type of data (type of variable)?
2. Number of groups?
3. Related or independent groups?
4. Normal or asymmetric distribution?


Numerical


Example: Hypertension and Cholesterol

Make appropriate statistical hypotheses.

Mean cholesterol in hypertensive men is 220 mg/100 ml with a standard deviation of 39 mg/100 ml.

In the 20-74 year old male population, the mean serum cholesterol is estimated to be 211 mg/100 ml.
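One way to turn these summary numbers into a test statistic is the one-sample t statistic. This is a sketch assuming a one-sample t-test is what is intended, which the slide does not state. The sample size n is not given, so it is left symbolic here.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
  = \frac{220 - 211}{39 / \sqrt{n}}
  = \frac{9\sqrt{n}}{39}
  \approx 0.231\,\sqrt{n}
```

Under H0, and assuming approximately normally distributed data, t follows a t distribution with n - 1 degrees of freedom, so the same 9-unit difference becomes stronger evidence against H0 as n grows.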


Hypothesis vs Statistical Hypothesis

Research hypothesis: Alcohol intake increases driver's reaction time.

Statistical hypothesis: Mean reaction time in examinees drinking alcohol is greater than in nondrinking controls.
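Written in symbols, the statistical hypothesis above could look as follows. The notation is an assumption made for illustration: the two symbols stand for the population mean reaction times of drinking examinees and nondrinking controls.

```latex
H_0:\ \mu_{\text{alcohol}} = \mu_{\text{control}}
\qquad
H_1:\ \mu_{\text{alcohol}} > \mu_{\text{control}}
```

Because the research hypothesis names a direction (reaction time increases), the alternative is one-sided.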