
PPAL-6200 Intro to Inference

Chapters 14 and 15 (16 is review), March 8-9, 2011

(revised March 8 22:30)

Why do we do research?

• Once we pull a sample and run our experiment, test, or whatever, we learn something about the sample.

• However, that is not really our goal.
• Our goal is to infer from the sample some conclusion about the wider population.

What if there are differences among samples?

• If we pull two samples from the same population, there is a good chance that our results will differ.

• Therefore, we can never be 100% certain that the statistics or results we draw from a sample will match the population parameters, or reflect how something works in the wider world.

Therefore,…

• When we do a study based on a sample we generally have to calculate two things:
• Confidence intervals
• Significance
• Both of these are based on the sampling distribution of statistics: what our outcomes would be if we applied the same methods for choosing a sample and calculating a statistic repeatedly, many times (see the simulation sketch below).
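To make the idea of a sampling distribution concrete, here is a minimal Python sketch that repeats the sample-and-compute-a-mean process many times; the population mean, standard deviation, and sample size used here are made up purely for illustration:

```python
import random
import statistics

# Hypothetical population values, chosen only to illustrate the idea
POP_MEAN, POP_SD, SAMPLE_SIZE, REPEATS = 50.0, 10.0, 100, 5000

random.seed(1)

# Draw many samples and record the mean of each one
sample_means = [
    statistics.mean(random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE))
    for _ in range(REPEATS)
]

# The collection of sample means is the (simulated) sampling distribution;
# its spread comes out close to sigma / sqrt(n)
print("mean of the sample means:     ", round(statistics.mean(sample_means), 2))
print("std. dev. of the sample means:", round(statistics.stdev(sample_means), 2))
print("sigma / sqrt(n):              ", round(POP_SD / SAMPLE_SIZE ** 0.5, 2))
```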

Imagine a very simple situation

• SRS from the population
• Variable is precisely Normal
• We don't know the pop. mean but we do know the pop. std. dev.
• Note: as the book states, this is a bit implausible

If you only knew the std. dev. of the pop., life would be easy

• Take two examples from the book. They simply assume that the mean of the sample is equal to the mean of the pop, give or take.

• What do we use to calculate that "give or take"?

• The sampling distribution of the mean for the sample and our knowledge of the normal curve

So let’s look at an example

• Mean BMI (body mass index) of a sample of 654 women = 26.8
• We know from some other source that the std. dev. of BMI for all women is 7.5
• We know the std. dev. for the sampling distribution (the standard deviation of the sample mean across all possible samples) is

σ/√n = 7.5/√654 ≈ 0.3

Estimating using the 68-95-99.7 Rule

• The 68-95-99.7 rule tells us that 95% of all sample means will fall within two standard deviations of the mean of the sampling distribution. Therefore, if the std. dev. of the sampling distribution is 0.3, then two standard deviations will be 0.6.

• Therefore the 95% confidence interval will be between (checked in the short sketch below):

x̄ - 0.6 = 26.8 - 0.6 = 26.2  and  x̄ + 0.6 = 26.8 + 0.6 = 27.4
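A quick check of this arithmetic in Python (a sketch only; the numbers are the ones given on the slide):

```python
import math

x_bar = 26.8   # sample mean BMI
sigma = 7.5    # std. dev. of BMI for all women (known from another source)
n = 654        # sample size

se = sigma / math.sqrt(n)   # std. dev. of the sampling distribution: about 0.29, rounded to 0.3 on the slide
margin = 2 * se             # 68-95-99.7 rule: 2 std. devs. covers about 95%

print(f"std. dev. of the sampling distribution: {se:.2f}")
print(f"95% confidence interval: {x_bar - margin:.1f} to {x_bar + margin:.1f}")  # 26.2 to 27.4
```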

Another Example

• NAEP test of basic math skills
• 840 young men in the sample
• Mean score for young men = 272
• We want to estimate the mean score in the population of young men. From another source we know the population is Normal and has a std. deviation of 60.
• So we can figure out the std. dev. for the sampling distribution, and from there the 95% confidence interval:

σ/√n = 60/√840 ≈ 2.07

• So one unit of std. dev. for the sampling distribution is 2.07
• Then two units is plus or minus 4.14

Therefore we are 95% confident that the mean score in the population is between (checked in the sketch below):

x̄ - 4.14 = 272 - 4.14 = 267.86  and  x̄ + 4.14 = 272 + 4.14 = 276.14
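The same rule-of-thumb calculation, wrapped in a small helper function (a hypothetical name) and applied to the NAEP numbers:

```python
import math

def approx_95_ci(x_bar, sigma, n):
    """Rule-of-thumb 95% CI: sample mean +/- 2 std. devs. of the sampling distribution."""
    se = sigma / math.sqrt(n)
    return x_bar - 2 * se, x_bar + 2 * se

low, high = approx_95_ci(x_bar=272, sigma=60, n=840)
print(f"Approximate 95% CI for the NAEP mean: {low:.2f} to {high:.2f}")  # 267.86 to 276.14
```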

Now let’s take what we learned and think wider…

• In the first example (slides 7 and 8) the margin of error was ± 0.6 with 95% confidence

• If we choose 95%, we saw the margin of error was just plus or minus 2 × the std. dev. of the sampling distribution (which we worked out as 0.3). Note that if we chose a different confidence level we would have a different margin of error: 99.7% would be 3 × the std. dev. of the sampling distribution.

Example #1 margin of error, now with 99.7% confidence rather than 95% (compared in the sketch below)

x̄ - 0.9 = 26.8 - 0.9 = 25.9  and  x̄ + 0.9 = 26.8 + 0.9 = 27.7
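A short sketch comparing the two margins of error for the BMI example, using the 68-95-99.7 rule (2 vs. 3 standard deviations of the sampling distribution):

```python
import math

x_bar, sigma, n = 26.8, 7.5, 654
se = sigma / math.sqrt(n)   # about 0.29, rounded to 0.3 on the slides

for label, k in (("95% (2 std. devs.)", 2), ("99.7% (3 std. devs.)", 3)):
    margin = k * se
    print(f"{label}: margin {margin:.1f}, interval {x_bar - margin:.1f} to {x_bar + margin:.1f}")
```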

Think of it like playing darts…
• As you can see there is a sort of trade-off here. All other things being equal: higher confidence means a greater margin of error.
• Think of it like playing darts.
  – The smaller the target area (margin of error), the less likely you are to hit it (confidence)
  – The larger the target area (margin of error), the more likely you are to hit it (confidence)

Rogues
• In the two examples we did, we estimated the range within which the population mean resides 95% of the time, given our knowledge of the sample means and the sampling distribution.
• That means that if we use our method 100 times, there will be about five occasions when the population mean falls either above or below our estimated margins of error.
• Those five occasions are sometimes called "rogues". We can discuss how to detect them later.

The Standardized normal curve

• In practice we can simplify what we did before by using the properties of the Normal curve and the known critical values of z (see the table and code sketch below)

Confidence level C:   90%     95%     99%
Critical value z*:    1.645   1.960   2.576
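These z* values come straight from the standard Normal distribution. A sketch of how to look them up in code, assuming scipy is available:

```python
from scipy.stats import norm

# For confidence level C, z* leaves probability (1 - C) / 2 in each tail
for c in (0.90, 0.95, 0.99):
    z_star = norm.ppf(1 - (1 - c) / 2)
    print(f"C = {c:.0%}: z* = {z_star:.3f}")   # 1.645, 1.960, 2.576
```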

So let’s see this with equation and numbers

• Let's go back and check the math tests using this method (slides 9-11). The interval runs from

x̄ - z*·σ/√n  to  x̄ + z*·σ/√n

• As you recall:
  – Mean is 272
  – Sample "n" is 840
  – Std. dev. of the pop. is 60
  – Critical z* for 95% is 1.960
• So the margin of error is

z*·σ/√n = 1.96 × 60/√840 ≈ 4.05

• Therefore the interval is

x̄ - 4.05 = 272 - 4.05 = 267.95  and  x̄ + 4.05 = 272 + 4.05 = 276.05

• The difference between the two slides is due to rounding and nothing more (see the code sketch below).
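The exact-z* version of the NAEP interval as a Python sketch (again assuming scipy for z*; tiny differences from the slide's numbers are just rounding):

```python
import math
from scipy.stats import norm

x_bar, sigma, n, c = 272, 60, 840, 0.95

z_star = norm.ppf(1 - (1 - c) / 2)        # 1.960 for 95% confidence
margin = z_star * sigma / math.sqrt(n)    # about 4.06; the slide rounds to 4.05

print(f"margin of error: {margin:.2f}")
print(f"{c:.0%} CI: {x_bar - margin:.2f} to {x_bar + margin:.2f}")  # roughly 267.94 to 276.06
```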

Significance and Testing the null hypothesis

• Why do we always test the null hypothesis that there is “no” relationship among the variables? Because we can never really say anything is true.

• The p value tells us how likely it is that we would see results like ours if the null hypothesis were true.

• Thinking about the 95% confidence interval, we can just reverse this and say we want to see results with a p value < .05, meaning that results like ours would occur less than 5% of the time if the null hypothesis were true.

In terms of statistical tests

• We are asking, what is the probability that the results we have observed could be caused by random chance? A p value < .05 says there is less than a 5% chance of this.

• Looking at the example in the book (does cola lose sweetness in storage?)

• Our null hypothesis is that there is no loss of sweetness and that average loss of sweetness is in fact = 0

Here comes the normal curve again

• We know there are 10 tasters
• We know their mean loss-of-sweetness score was 1.02
• We also know that for any cola the std. deviation of sweetness loss is 1.0
• If the mean loss really were zero, then what we are talking about is a sampling distribution centered at 0 with std. dev. (the test itself is sketched in code below)

σ/√n = 1/√10 ≈ 0.316
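Moving from the picture to a number: a sketch of the z test for the cola example, assuming a one-sided alternative (that average sweetness loss is greater than zero) and that scipy is available:

```python
import math
from scipy.stats import norm

x_bar, mu0, sigma, n = 1.02, 0.0, 1.0, 10

se = sigma / math.sqrt(n)    # about 0.316
z = (x_bar - mu0) / se       # about 3.23
p_one_sided = norm.sf(z)     # P(Z >= z) if the null hypothesis were true

print(f"z = {z:.2f}, one-sided P = {p_one_sided:.4f}")  # z = 3.23, P about 0.0006
```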

Now let’s think about moving beyond eyeballs on graphs to numbers

• Like z, all of the statistics we use have known critical values, so we can go to a table and look up how strong the statistic has to be, with a given sample size, to be significant at the p < 0.05 level.

• In fact, your software will calculate the precise (or at least to three decimal places) probability.

What we are looking for
• Data that would rarely occur if the null hypothesis H0 were true provide evidence that H0 is not true. P values give us a measure of "would rarely occur".
• State: what is the practical question we are testing?
• Plan: identify the parameter, state the null and alternative hypotheses, and choose the type of test that fits the data and problem.
• Solve:
  – Check the conditions for the test you plan to use
  – Calculate the statistic
  – Find the P value
• Conclude: return to the practical question to describe your results in this setting.

Tests for a population mean: the z test statistic

• The z test statistic measures how far the observed sample mean x̄ falls from the hypothesized population mean μ0, in units of the standard deviation of the sampling distribution: z = (x̄ - μ0)/(σ/√n). This lets us estimate whether the difference is significant.

• This is worth doing for the sake of knowing how. However, it has a big flaw as a real-world test: it assumes you know the mean in the pop. and the standard deviation.

Let’s look at the blood pressure example

• An executive wants to know if his/her people have abnormal blood pressure.

So if we do it
• The sample of 72 executives has mean blood pressure 126.07; the population mean is 128 with std. dev. 15. First compute the z statistic:

z = (x̄ - μ0)/(σ/√n) = (126.07 - 128)/(15/√72) ≈ -1.09

• Now go to the back of the book, look at Table A, and find the area to the left of z = -1.09: find the row -1.0, go across to column .09, and you find 0.1379. Double it and you have the proportion of times, about 27%, that a sample of 72 men would have a mean blood pressure this far from the population mean, OR P = 0.27 (computed in the sketch below).
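The same blood-pressure calculation done in code instead of Table A (a sketch, assuming scipy; a two-sided P because a mean that is unusually high or unusually low would both count as "abnormal"):

```python
import math
from scipy.stats import norm

x_bar, mu0, sigma, n = 126.07, 128, 15, 72

z = (x_bar - mu0) / (sigma / math.sqrt(n))   # about -1.09
p_two_sided = 2 * norm.cdf(-abs(z))          # double the tail area, just like doubling 0.1379

print(f"z = {z:.2f}, two-sided P = {p_two_sided:.2f}")   # z = -1.09, P about 0.27
```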

Thinking about Inference
• No matter which statistical test you employ, the reasoning of confidence intervals and significance is the same.

• Statistics is applied mathematics. You must know the mathematical theorems (such as the fact that the z statistic has a Normal distribution when H0 is true).

• As well, you must also use judgment so as to determine when to apply these theorems and when not to (what I call the “deer hunter phenomenon”).

When you teach stats you sometimes come away feeling as though you are teaching people to use a high-power rifle without teaching them what deer look like.

Some rules and ideas to keep in mind so that you ensure you are shooting at deer, rather than things you are not supposed to hunt

                     H0 true               Ha true
Reject H0            Type I error          Correct conclusion
Fail to reject H0    Correct conclusion    Type II error

• Conditions for inference
  – Understand the conditions that apply for a given statistic so that its confidence intervals and associated significance tests can be trusted.
  – E.g. the z statistic procedure only worked because we knew the population parameters (the mean and standard deviation), a situation that will rarely occur. Therefore this test is of little use in the real world. However, there is another way to do this that will be discussed in chapter 17. The other requirements, that the data be drawn from the population with an SRS and that the population is Normal, are harder to get around. This points to two big questions you must always ask:
    • Where did the data come from?
    • What is the shape of the population distribution?

• Where did the data come from?
  – Most statistical tests assume the data come from some sort of random sample. Do they really?

• What is the shape of the population distribution?
  – In an ideal world you will have a Normal-looking sample drawn from a Normally distributed population.
  – In truth this will often not apply, and it often does not matter that much. However, there are some statistical tests where it does matter and you will have to pay attention to those warnings.

• How do confidence intervals behave?
  – What causes them to get smaller, and what causes them to get larger?
• What does the margin of error include?
  – The margin of error calculated for a confidence interval only includes the error caused by sampling (the sampling distribution).
  – Any other source of error, e.g. response bias, non-response, etc., is not covered and will certainly influence the results.

• How do significance tests behave?
  – How small a P is convincing?
  – 0.05 is good for social research, not so hot for designing nuclear reactors.
• Whether the test should be one-sided or two-sided always depends on the alternative hypothesis.

• Significance does not mean important.
  – Many associations are relatively minor but significant. For example, a study might show that a certain type of behavior increases the risk of getting a certain type of cancer by 0.05% and that the relationship is significant. This might be cause for concern until you read that only 1% of people get that sort of cancer.
  – Then you have to ask, what am I being asked to give up in order to get that benefit? Eating a specific type of fatty food? Okay, I'll give that up. Living in a large urban area? Probably not, because living in large urban areas has other benefits that probably counter-balance that risk.

• If you do it often enough you will get a significant result.
  – Beware of multiple analyses. If you have several studies using the same method and only one produces significant results, be careful.

• Use the correct sample size for the confidence interval and significance level (power of a test) you want.
