Name: ___________________________ AP Stats Chapter 18-22 Notes

Chapter 18: Sampling Distribution Models

The Central Limit Theorem for Sample Proportions

Rather than showing real repeated samples, ________________________ what would happen if we were to actually draw many samples.

Now imagine what would happen if we looked at the sample proportions for these samples.

The histogram we’d get if we could see all the proportions from all possible samples is called the ______________________________________ of the proportions.

What would the histogram of all the sample proportions look like?

We would expect the histogram of the sample proportions to _____________________ at the true proportion, p, in the population.

As far as the shape of the histogram goes, we can ________________________ a bunch of random samples that we didn’t really draw.

It turns out that the histogram is ___________________, symmetric, and centered at p.

More specifically, it’s an amazing and fortunate fact that a Normal model is just the right one for the histogram of sample proportions.

Modeling how sample proportions vary from sample to sample is one of the most powerful ideas we’ll see in this course.

A ________________________________________ for how a sample proportion varies from sample to sample allows us to quantify that variation and how likely it is that we’d observe a sample proportion in any particular interval.

To use a Normal model, we need to specify its mean and standard deviation. We’ll put µ, the mean of the Normal, at p.

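When working with proportions, knowing the mean automatically gives us the standard deviation as well. The standard deviation we will use is:

$SD(\hat{p}) = \sqrt{\dfrac{pq}{n}}$, where q = 1 − p and n is the sample size.

So, the distribution of the sample proportions is modeled with a probability model that is:

$N\left(p, \sqrt{\dfrac{pq}{n}}\right)$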

A picture of what we just discussed is as follows:

Because we have a Normal model, for example, we know that 95% of Normally distributed values are within two standard deviations of the mean.

So we should not be surprised if 95% of various polls gave results that were near the mean but varied above and below that by no more than two standard deviations.

This is what we mean by _________________________________. It’s not really an error at all, but just variability you’d expect to see from one sample to another. A better term would be _____________________________.
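The idea of imagining many samples can be tried out with a short simulation. The sketch below is only an illustration: the values p = 0.3 and n = 200, the number of samples, and the function name are all assumptions chosen for the example, not values from these notes. It draws many random samples, records each sample proportion, and compares the center and spread of those proportions with the Normal model's mean p and standard deviation sqrt(pq/n).

```python
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n; return each sample's proportion of successes."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


# Hypothetical population proportion and sample size (assumptions for illustration).
p, n = 0.3, 200
props = simulate_sample_proportions(p, n, num_samples=5000)

center = statistics.mean(props)          # should land near p
spread = statistics.pstdev(props)        # should land near sqrt(pq/n)
model_sd = (p * (1 - p) / n) ** 0.5      # standard deviation of the Normal model

# Share of simulated proportions within two model standard deviations of p.
within_two_sd = sum(abs(x - p) <= 2 * model_sd for x in props) / len(props)

print(f"center of simulated proportions: {center:.4f}")
print(f"spread of simulated proportions: {spread:.4f} (model: {model_sd:.4f})")
print(f"share within 2 SDs of p: {within_two_sd:.3f}")
```

A histogram of `props` would show the unimodal, symmetric shape described above.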

How Good Is the Normal Model?

The Normal model becomes a better model for the distribution of sample proportions as the sample size gets _______________________.

Just how big of a sample do we need? This will soon be revealed…

Assumptions and Conditions

Most models are useful only when specific assumptions are true.

There are two assumptions in the case of the model for the distribution of sample proportions:

1. ___________________________________: The sampled values must be independent of each other.

2. ___________________________________: The sample size, n, must be large enough.

Assumptions are hard, often impossible, to check. That’s why we assume them.

Still, we need to check whether the assumptions are reasonable by checking _____________________________ that provide information about the assumptions.

The corresponding conditions to check before using the Normal to model the distribution of sample proportions are the ____________________________, the ________________________ and the ____________________________.

1. ___________________________: The sample should be a simple random sample of the population.

2. ___________________________: The sample size, n, must be no larger than 10% of the population.

3. ___________________________: The sample size has to be big enough so that both np (number of successes) and nq (number of failures) are at least 10.

…So, we need a large enough sample that is not too large.
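The two numerical conditions above can be checked mechanically. The following is a minimal sketch, assuming we have a working guess for p and know the population size; the function name and dictionary keys are hypothetical. Whether the sample was actually drawn at random cannot be checked by code; that still requires thinking about how the data were gathered.

```python
def check_proportion_conditions(n, p, population_size):
    """Check the two numerical conditions for using a Normal model for a sample proportion.

    n: sample size; p: assumed proportion of successes; population_size: size of the population.
    """
    q = 1 - p
    return {
        # The sample must be no larger than 10% of the population.
        "sample_small_vs_population": n <= 0.10 * population_size,
        # Both np (successes) and nq (failures) must be at least 10.
        "enough_successes_and_failures": n * p >= 10 and n * q >= 10,
    }


# Hypothetical example: 100 people sampled from a population of 10,000, with p guessed at 0.5.
print(check_proportion_conditions(n=100, p=0.5, population_size=10_000))
```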

A Sampling Distribution Model for a Proportion

A proportion is no longer just a computation from a set of data. It is now a random variable quantity that has a probability distribution.

This distribution is called the __________________________________ for proportions.

Even though we depend on sampling distribution models, we never actually get to see them.

We never actually take repeated samples from the same population and make a histogram. We only imagine or simulate them.

Still, ______________________________________ are important because they act as a bridge from the real world of data to the imaginary world of the statistic and enable us to say something about the population when all we have is data from the real world.

Provided that the sampled values are independent and the sample size is large enough, the sampling distribution of $\hat{p}$ is modeled by a Normal model with

Mean: ____________________ Standard deviation: ____________________________

Just Checking…

1. You want to poll a random sample of 100 students on campus to see if they are in favor of the proposed location for the new student center. Of course, you’ll get one number, your sample proportion, $\hat{p}$. But if you imagined all the possible samples of 100 students you could draw and imagined the histogram of all the sample proportions from these samples, what shape would it have?

2. Where would the center of the histogram be?

3. If you think that about half of the students are in favor of the plan, what would the standard deviation of the sample proportions be?

What About Quantitative Data?

Proportions summarize categorical variables. The Normal sampling distribution model looks like it will be very useful. Can we do something similar with quantitative data?

We can indeed. Even more remarkable, not only can we use all of the same concepts, but almost the same model.

Simulating the Sampling Distribution of a Mean

Like any statistic computed from a random sample, a sample mean also has a sampling distribution.

We can use __________________________ to get a sense as to what the sampling distribution of the sample mean might look like…

Means – The “Average” of One Die

Let’s start with a simulation of 10,000 tosses of a die. A histogram of the results is:

Looking at the average of two dice after a simulation of 10,000 tosses:

The average of three dice after a simulation of 10,000 tosses looks like:

The average of 5 dice after a simulation of 10,000 tosses looks like:

The average of 20 dice after a simulation of 10,000 tosses looks like:

Means – What the Simulations Show

As the sample size (number of dice) gets ________________________, each sample average is more likely to be __________________ to the population mean.

So, we see the shape continuing to tighten around 3.5.

And, it probably does not shock you that the sampling distribution of a mean becomes ________________________.
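The dice simulations described above can be reproduced with a few lines of code. This is a rough sketch, not the simulation used to make the original histograms; the function name and the fixed seed are assumptions. For each number of dice it averages the faces on 10,000 simulated tosses and reports the center and spread of those averages.

```python
import random
import statistics


def simulate_dice_averages(num_dice, num_trials=10_000, seed=0):
    """Toss num_dice fair dice num_trials times; return the average face value of each toss."""
    rng = random.Random(seed)
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(num_trials)
    ]


results = {}
for num_dice in (1, 2, 3, 5, 20):
    averages = simulate_dice_averages(num_dice)
    # Store (center, spread) of the simulated averages for this number of dice.
    results[num_dice] = (statistics.mean(averages), statistics.pstdev(averages))
    center, spread = results[num_dice]
    print(f"{num_dice:>2} dice: center = {center:.3f}, spread = {spread:.3f}")
```

Plotting a histogram of `averages` for each number of dice shows how the shape changes as more dice are averaged.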

The Fundamental Theorem of Statistics

The sampling distribution of any mean becomes more nearly _________________________ as the sample size grows.

All we need is for the observations to be independent and collected with randomization.

We don’t even care about the shape of the population distribution!

The Fundamental Theorem of Statistics is called the _______________________________________ (CLT).

The CLT is surprising and a bit weird:

Not only does the histogram of the sample means get closer and closer to the Normal model as the sample size grows, but _____________________ _________________________________________________________

The CLT works better (and faster) the closer the population model is to a Normal itself. It also works better for larger samples.

____________________________________ (CLT)

The mean of a random sample is a random variable whose sampling distribution can be approximated by a Normal model. The larger the sample, the better the approximation will be.

Assumptions and Conditions

The CLT requires essentially the same assumptions we saw for modeling proportions:

_______________________________: The sampled values must be independent of each other.

_______________________________: The sample size must be sufficiently large.

We can’t check these directly, but we can think about whether the _______________________________ is plausible.

We can also check some related conditions:

________________________________: The data values must be sampled randomly.

________________________________: When the sample is drawn without replacement, the sample size, n, should be no more than 10% of the population.

________________________________: The CLT doesn’t tell us how large a sample we need. For now, you need to think about your sample size in the context of what you know about the population.

But Which Normal?

The CLT says that the sampling distribution of any mean or proportion is approximately ____________________.

But which Normal model?

For proportions, the sampling distribution is centered at the population proportion.

For means, it’s centered at the population mean.

But what about the standard deviations?

The Normal model for the sampling distribution of the mean has a standard deviation equal to $\dfrac{\sigma}{\sqrt{n}}$, where σ is the population standard deviation.

The Normal model for the sampling distribution of the proportion has a standard deviation equal to $\sqrt{\dfrac{pq}{n}}$.

About Variation

The standard deviation of the sampling distribution declines only with the square root of the sample size (the denominator contains the square root of n).

Therefore, the variability ______________________ as the sample size _____________________.

While we’d always like a larger sample, the square root limits how much we can make a sample tell about the population. (This is an example of the Law of Diminishing Returns.)
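A quick calculation makes the diminishing returns concrete. The sketch below assumes a hypothetical population standard deviation of 10 and shows that each fourfold increase in the sample size only cuts the standard deviation of the sample mean in half.

```python
import math


def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


sigma = 10.0  # hypothetical population standard deviation (an assumption for illustration)
for n in (25, 100, 400, 1600):
    print(f"n = {n:>4}: SD of the sample mean = {sd_of_sample_mean(sigma, n):.3f}")
```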

The Real World and the Model World

Be careful! Now we have two distributions to deal with.

The first is the ____________________________________ of the sample, which we might display with a histogram.

The second is the math world _______________________________ of the statistic, which we model with a Normal model based on the Central Limit Theorem.

Just Checking…

4. Human gestation times have a mean of about 266 days, with a standard deviation of about 16 days. If we record the gestation times of a sample of 100 women, do we know that a histogram of the times will be well modeled by a Normal model?

5. Suppose we look at the average gestation times for a sample of 100 women. If we imagined all the possible random samples of 100 women we could take and looked at the histogram of all these sample means, what shape would it have?

6. Where would the center of that histogram be?

7. What would be the standard deviation of that histogram?

Sampling Distribution Models

Always remember that the statistic itself is a ____________________ quantity. We can’t know what our statistic will be because it comes from a random sample.

Fortunately, for the mean and proportion, the _____________ tells us that we can model their sampling distribution directly with a Normal model.

There are two basic truths about sampling distributions:

Sampling distributions arise because samples ____________. Each random sample will have different cases and, so, a different value of the statistic.

Although we can always simulate a sampling distribution, the Central Limit Theorem saves us the trouble for means and proportions.

The Process Going Into the Sampling Distribution Model

What Can Go Wrong?

Don’t confuse the sampling distribution with the distribution of the sample.

When you take a sample, you look at the distribution of the values, usually with a histogram, and you may calculate summary statistics.

The sampling distribution is an imaginary collection of the values that a statistic might have taken for all random samples: the one you got and the ones you didn’t get.

Beware of observations that are not independent.

The CLT depends crucially on the assumption of independence.

You can’t check this with your data; you have to think about how the data were gathered.

Watch out for small samples from skewed populations.

The more skewed the distribution, the larger the sample size we need for the CLT to work.

Chapter 19: Confidence Intervals for Proportions

Standard Error

Both of the sampling distributions we’ve looked at are Normal.

For proportions, the standard deviation is $\sqrt{\dfrac{pq}{n}}$. For means, it is $\dfrac{\sigma}{\sqrt{n}}$.

When we don’t know p or σ, we’re stuck, right?

Nope. We will use _______________________________ to estimate these population parameters.

Whenever we estimate the standard deviation of a sampling distribution, we call it a _____________________________.

For a sample proportion, the standard error is $SE(\hat{p}) = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$.

For the sample mean, the standard error is $\dfrac{s}{\sqrt{n}}$, where s is the sample standard deviation.

A Confidence Interval

Recall that the sampling distribution model of $\hat{p}$ is centered at p, with standard deviation $\sqrt{\dfrac{pq}{n}}$.

Since we don’t know p, we can’t find the true standard deviation of the sampling distribution model, so we need to find the standard error: $SE(\hat{p}) = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$

By the 68-95-99.7% Rule, we know

about _______% of all samples will have $\hat{p}$’s within 1 SE of p

about _______% of all samples will have $\hat{p}$’s within 2 SEs of p

about _______% of all samples will have $\hat{p}$’s within 3 SEs of p

We can look at this from $\hat{p}$’s point of view…

Consider the ________% level:

There’s a 95% chance that p is no more than 2 SEs away from $\hat{p}$.

So, if we reach out 2 SEs, we are 95% sure that p will be in that interval. In other words, if we reach out 2 SEs in either direction of $\hat{p}$, we can be 95% confident that this interval contains the true proportion.

This is called a 95% ____________________________.

A Pew Research study regarding cell phones asked questions about cell phone experience. One growing concern is unsolicited advertising in the form of text messages. Pew asked cell phone owners, “Have you ever received unsolicited text messages on your cell phone from advertisers?” and 17% reported that they had. Pew estimates a 95% confidence interval to be 0.17 ± 0.04, or between 13% and 21%.

Are the following statements about people who have cell phones correct? Explain.

1. In Pew’s sample, somewhere between 13% and 21% of respondents reported that they had received unsolicited advertising text messages.

2. We can be 95% confident that 17% of U.S. cell phone owners have received unsolicited advertising text messages.

3. We are 95% confident that between 13% and 21% of all U.S. cell phone owners have received unsolicited advertising text messages.

4. We know that between 13% and 21% of all U.S. cell phone owners have received unsolicited advertising text messages.

5. 95% of all U.S. cell phone owners have received unsolicited advertising text messages.

What Does “95% Confidence” Really Mean?

Each confidence interval uses a sample statistic to estimate a population parameter.

But, since samples vary, the statistics we use, and thus the confidence intervals we construct, vary as well.

The figure to the right shows that some of our confidence intervals (from 20 random samples) capture the true proportion (the green horizontal line), while others do not:

Our confidence is in the ________________________ of constructing the interval, not in any one interval itself.

Thus, we expect 95% of all 95% confidence intervals to contain the true parameter that they are estimating.
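This long-run interpretation can be checked by simulation. The sketch below is an illustration under assumed values (a true proportion of 0.3, samples of 200, and 2,000 repetitions, none of which come from these notes). It builds an interval of $\hat{p}$ plus or minus 2 standard errors from each simulated sample and counts how often the interval captures the true proportion.

```python
import random


def coverage_rate(p, n, num_intervals, z_star=2.0, seed=0):
    """Fraction of simulated intervals p_hat +/- z_star * SE(p_hat) that contain the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_intervals):
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        se = (p_hat * (1 - p_hat) / n) ** 0.5   # standard error from the sample itself
        if p_hat - z_star * se <= p <= p_hat + z_star * se:
            hits += 1
    return hits / num_intervals


# Hypothetical true proportion and sample size (assumptions for illustration).
rate = coverage_rate(p=0.3, n=200, num_intervals=2000)
print(f"share of intervals capturing the true proportion: {rate:.3f}")
```

Any single interval either contains p or does not; the percentage describes the method across many samples.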

Margin of Error: Certainty vs. Precision

We can claim, with 95% confidence, that the interval _____________________ contains the true population proportion.

The extent of the interval on either side of $\hat{p}$ is called the _________________________ (ME).

In general, confidence intervals have the form ________________________

The more confident we want to be, the larger our ME needs to be, making the interval wider.

To be more confident, we wind up being less ____________________. We need more values in our confidence interval to be more certain.

Because of this, every confidence interval is a balance between certainty and precision.

The tension between certainty and precision is always there.

Fortunately, in most cases we can be both sufficiently certain and sufficiently precise to make useful statements.

The choice of confidence level is somewhat arbitrary, but keep in mind this tension between certainty and precision when selecting your confidence level.

The most commonly chosen confidence levels are _______%, _______%, and _________% (but any percentage can be used).

Critical Values

The ‘2’ in $\hat{p} \pm 2\,SE(\hat{p})$ (our 95% confidence interval) came from the 68-95-99.7% Rule.

Using a table or technology, we find that a more exact value for our 95% confidence interval is ___________ instead of 2.

We call 1.96 the _____________________________ and denote it z*.

For any confidence level, we can find the corresponding critical value (the number of SEs that corresponds to our confidence interval level).

Example: For a 90% confidence interval, the critical value is _______________:
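With technology, a critical value is just an inverse Normal lookup. Here is a minimal sketch using `NormalDist` from Python's standard `statistics` module (available in Python 3.8 and later); the function name `critical_value` is a hypothetical helper, not something defined in these notes.

```python
from statistics import NormalDist


def critical_value(confidence_level):
    """Return z* such that the central confidence_level of the standard Normal lies within +/- z*."""
    tail = (1 - confidence_level) / 2      # area left over in each tail
    return NormalDist().inv_cdf(1 - tail)  # z-score with that much area above it


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {critical_value(level):.3f}")
```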

Just Checking…

Think some more about the 95% confidence interval Fox News created for the proportion of registered voters who believe that global warming exists.

6. If Fox wanted to be 98% confident, would their confidence interval need to be wider or narrower?

7. Fox’s margin of error was about +/- 3%. If they reduced it to +/- 2%, would their level of confidence be higher or lower?

8. If Fox News had polled more people, would the interval’s margin of error have been smaller or larger?

Assumptions and Conditions

All statistical models depend upon ____________________________.

Different models make different assumptions.

If those assumptions are not true, the model might be inappropriate and our conclusions based on it may be wrong.

You can never be sure that an assumption is true, but you can often decide whether an assumption is plausible by checking a related ______________________.

Here are the assumptions and the corresponding conditions you must check before creating a confidence interval for a proportion:

_______________________________: We first need to Think about whether the ______________________________ is plausible. It’s not one you can check by looking at the data. Instead, we check two conditions to decide whether independence is reasonable.

_______________________________: Were the data sampled at random or generated from a properly randomized experiment? Proper randomization can help ensure independence.

_______________________________: Is the sample size no more than 10% of the population?

___________________________________: The sample needs to be large enough for us to be able to use the CLT.

______________________________: We must expect at least 10 “successes” and at least 10 “failures.”

One-Proportion z-Interval

When the conditions are met, we are ready to find the confidence interval for the population proportion, p.

The confidence interval is $\hat{p} \pm z^* \times SE(\hat{p})$, where $SE(\hat{p}) = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$.

The critical value, z*, depends on the particular confidence level, C, that you specify.
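As a sketch of the mechanics, the interval can be computed directly from the number of successes and the sample size. The function name and the example counts (120 successes in a sample of 400) are hypothetical, and the code assumes the conditions above have already been checked.

```python
from statistics import NormalDist


def one_proportion_z_interval(successes, n, confidence_level=0.95):
    """Return (lower, upper) for p_hat +/- z* * SE(p_hat)."""
    p_hat = successes / n
    se = (p_hat * (1 - p_hat) / n) ** 0.5                        # standard error of p_hat
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)  # critical value
    margin_of_error = z_star * se
    return p_hat - margin_of_error, p_hat + margin_of_error


# Hypothetical data: 120 successes in a random sample of 400.
lower, upper = one_proportion_z_interval(successes=120, n=400)
print(f"95% confidence interval: ({lower:.3f}, {upper:.3f})")
```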

Choosing Your Sample Size

The question of how large a sample to take is an important step in planning any study.

Choose a Margin of Error (ME) and a Confidence Interval Level.

The formula $ME = z^*\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$ requires $\hat{p}$, which we don’t have yet because we have not taken the sample. A good estimate for _______, which will yield the largest value for ________ (and therefore for n), is 0.50.

Solve the formula for n:

What Can Go Wrong?

__________________________________________________________:

Don’t suggest that the parameter varies.

Don’t claim that other samples will agree with yours.

Don’t be certain about the parameter.

Don’t forget: It’s about the parameter (not the statistic).

Don’t claim to know too much.

Do take responsibility (for the uncertainty).

Do treat the whole interval equally.

___________________________________________________________:

We can’t be exact, but how precise do we need to be?

One way to make the margin of error smaller is to reduce your level of confidence. (That may not be a useful solution.)

You need to think about your margin of error when you design your study.

To get a narrower interval without giving up confidence, you need to have less variability.

You can do this with a larger sample…

____________________________________________________________:

In general, the sample size needed to produce a confidence interval with a given margin of error at a given confidence level is:

$n = \dfrac{(z^*)^2\,\hat{p}\hat{q}}{ME^2}$

where z* is the critical value for your confidence level.

To be safe, round up the sample size you obtain.
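The sample size calculation can be sketched as a small function. The name `required_sample_size` and its default arguments are assumptions made for illustration; it uses the cautious guess of 0.50 for the proportion and always rounds up.

```python
import math
from statistics import NormalDist


def required_sample_size(margin_of_error, confidence_level=0.95, p_guess=0.5):
    """Smallest n for which z* * sqrt(p_guess * q_guess / n) is at most margin_of_error."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n = z_star ** 2 * p_guess * (1 - p_guess) / margin_of_error ** 2
    return math.ceil(n)  # round up to be safe


# Example: a margin of error of 3 percentage points at 95% confidence.
print(required_sample_size(0.03))
```

Because ME is squared in the denominator, halving the margin of error requires roughly four times as many respondents.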

_____________________________________________________:

Watch out for biased samples; keep in mind what you learned in Chapter 12.

Think about independence.

Chapter 20: Testing Hypotheses About Proportions

Hypotheses

___________________________ are working models that we adopt temporarily.

Our starting hypothesis is called the ___________________________.

The null hypothesis, which we denote by H0, specifies a population model parameter of interest and proposes a value for that parameter.

We usually write down the null hypothesis in the form H0: parameter = hypothesized value.

The alternative hypothesis, which we denote by HA, contains the values of the parameter that we consider plausible if we reject the null hypothesis.

The _________________________________ specifies a population model parameter of interest and proposes a value for that parameter.

We might have, for example, H0: p = 0.20, as in the chapter example.

We want to compare our data to what we would expect given that H0 is true.

We can do this by finding out how many standard deviations away from the proposed value we are.

We then ask how likely it is to get results like we did if the null hypothesis were true.

A Trial as a Hypothesis Test

Think about the logic of jury trials:

To prove someone is guilty, we start by _________________ they are innocent.

We retain that ______________________ until the facts make it unlikely beyond a reasonable doubt.

Then, and only then, we __________________ the hypothesis of innocence and declare the person guilty.

The same logic used in jury trials is used in statistical tests of hypotheses:

We begin by assuming that a hypothesis is true.

Next we consider whether the data are consistent with the hypothesis.

If they are, all we can do is retain the hypothesis we started with. If they are not, then like a jury, we ask whether they are unlikely beyond a reasonable doubt.

P-Values

The statistical twist is that we can ________________ our level of doubt.

We can use the model proposed by our hypothesis to calculate the probability that the event we’ve witnessed could happen.

That’s just the probability we’re looking for; it quantifies exactly how surprised we are to see our results.

This probability is called a _____________________.

When the data are consistent with the model from the null hypothesis, the P-value is high and we are unable to reject the null hypothesis.

In that case, we have to “retain” the null hypothesis we started with.

We can’t claim to have proved it; instead we “______________________ ________________________________” when the data are consistent with the null hypothesis model and in line with what we would expect from natural sampling variability.

If the P-value is low enough, we’ll “___________________________________ _______________________________,” since what we observed would be very unlikely were the null model true.

What to Do with an “Innocent” Defendant

If the evidence is not strong enough to reject the presumption of innocence, the jury returns with a verdict of “____________________.”

The jury does not say that the defendant is innocent.

All it says is that there is not enough evidence to convict, to reject innocence.

The defendant may, in fact, be innocent, but the jury has no way to be sure.

Said statistically, we will ___________________________ the null hypothesis.

We never declare the null hypothesis to be true, because we simply do not know whether it’s true or not.

Sometimes in this case we say that the null hypothesis has been retained.

In a trial, the burden of proof is on the prosecution.

In a hypothesis test, the burden of proof is on the _______________________.

The null hypothesis is the ordinary state of affairs, so it’s the alternative to the null hypothesis that we consider unusual (and for which we must marshal evidence).

Just Checking…

1. A research team wants to know if aspirin helps to thin blood. The null hypothesis says that it doesn’t. They test 12 patients, observe the proportion with thinner blood, and get a P-value of 0.32. They proclaim that aspirin does not work. What would you say?

2. An allergy drug has been tested and found to give relief to 75% of the patients in a large clinical trial. Now the scientists want to see if the new, improved version works even better. What would the null hypothesis be?

3. The new drug is tested and the P-value is 0.0001. What would you conclude about the new drug?

The Reasoning of Hypothesis Testing

There are four basic parts to a hypothesis test:

1. Hypotheses

2. Model

3. Mechanics

4. Conclusion

Let’s look at these parts in detail…

1. ________________________________

____________________________: To perform a hypothesis test, we must first translate our question of interest into a statement about model parameters.

In general, we have H0: parameter = hypothesized value.

____________________________: The alternative hypothesis, HA, contains the values of the parameter we consider plausible when we reject the null.

2. _______________________________

To plan a statistical hypothesis test, specify the _______________ you will use to test the null hypothesis and the parameter of interest.

All models require assumptions, so state the assumptions and check any corresponding conditions.

Your model step should end with a statement such as:

Because the conditions are satisfied, I can model the sampling distribution of the proportion with a Normal model.

Watch out, though. It might be the case that your model step ends with “Because the conditions are not satisfied, I can’t proceed with the test.” If that’s the case, stop and reconsider.

Each test we discuss in the book has a name that you should include in your report.

The test about proportions is called a _______________________________________.

One-Proportion z-Test

The conditions for the one-proportion z-test are the same as for the one-proportion z-interval.

We test the hypothesis H0: p = p0 using the statistic $z = \dfrac{\hat{p} - p_0}{SD(\hat{p})}$, where $SD(\hat{p}) = \sqrt{\dfrac{p_0 q_0}{n}}$.

When the conditions are met and the null hypothesis is true, this statistic follows the standard Normal model, so we can use that model to obtain a P-value.
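The test statistic and its P-value can be sketched in a few lines. The function name and the data (90 successes in a sample of 400, tested against p0 = 0.20 with a two-sided alternative) are hypothetical and chosen only to illustrate the arithmetic. Note that the standard deviation uses the hypothesized p0, not the sample proportion, because the calculation assumes the null hypothesis is true.

```python
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Return (z, two-sided P-value) for H0: p = p0, using the Normal model."""
    p_hat = successes / n
    sd = (p0 * (1 - p0) / n) ** 0.5   # SD of p_hat computed under the null hypothesis
    z = (p_hat - p0) / sd
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 90 successes out of 400, testing H0: p = 0.20.
z, p_value = one_proportion_z_test(successes=90, n=400, p0=0.20)
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```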

3. ________________________________

Under “mechanics” we place the actual calculation of our test statistic from the data.

Different tests will have different formulas and different test statistics.

Usually, the mechanics are handled by a statistics program or calculator, but it’s good to know the formulas.

The ultimate goal of the calculation is to obtain a __________________.

The P-value is the probability that the observed statistic value (or an even more extreme value) could occur if the null model were correct.

If the P-value is small enough, we’ll reject the null hypothesis.

Note: The P-value is a conditional probability; it’s the probability that the observed results could have happened _______________ ________________________________

4. __________________________________

The conclusion in a hypothesis test is always a ___________________ about the null hypothesis.

The conclusion must state either that we __________________ or that we ___________________________ the null hypothesis.

And, as always, the conclusion should be stated in _________________.

Your conclusion about the null hypothesis should never be the end of a testing procedure.

Often there are actions to take or policies to change.

Alternative Alternatives

There are three possible alternative hypotheses:

HA: parameter < hypothesized value

HA: parameter ≠ hypothesized value

HA: parameter > hypothesized value

________________________________ is known as a ________________________________ because we are equally interested in deviations on either side of the null hypothesis value.

For two-sided alternatives, the P-value is the probability of deviating in either direction from the null hypothesis value.

The other two alternative hypotheses are called _________________________.

A one-sided alternative focuses on deviations from the null hypothesis value in only one direction.

Thus, the P-value for one-sided alternatives is the probability of deviating only in the direction of the alternative away from the null hypothesis value.
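How the choice of alternative changes the P-value can be seen by computing all three from the same z-statistic. This is a minimal sketch; the function name and the labels "less", "greater", and "two-sided" are assumptions made for the example.

```python
from statistics import NormalDist


def p_value_from_z(z, alternative):
    """P-value for a z-statistic under the standard Normal model.

    alternative: "less" (HA: parameter < value), "greater" (HA: parameter > value),
    or "two-sided" (HA: parameter != value).
    """
    std_normal = NormalDist()
    if alternative == "less":
        return std_normal.cdf(z)                 # area below z
    if alternative == "greater":
        return 1 - std_normal.cdf(z)             # area above z
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))  # area in both tails beyond |z|
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


z = 1.25  # hypothetical test statistic
for alternative in ("less", "greater", "two-sided"):
    print(f"{alternative:>9}: P-value = {p_value_from_z(z, alternative):.4f}")
```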

P-Values and Decisions: What to Tell About a Hypothesis Test

How small should the P-value be in order for you to reject the null hypothesis?

It turns out that our decision criterion is context-dependent.

When we’re screening for a disease and want to be sure we treat all those who are sick, we may be willing to reject the null hypothesis of no disease with a fairly large P-value (0.10).

A longstanding hypothesis, believed by many to be true, needs stronger evidence (and a correspondingly small P-value) to reject it.

Another factor in choosing a P-value is the importance of the issue being tested.

Your conclusion about any null hypothesis should be accompanied by the _______________ of the test.

If possible, it should also include a confidence interval for the parameter of interest.

__________________ just declare the null hypothesis rejected or not rejected.

Report the P-value to show the strength of the evidence against the hypothesis.

This will let each reader decide whether or not to reject the null hypothesis.

Just Checking…

4. A bank is testing a new method for getting delinquent customers to pay their past-due credit card bills. The standard way was to send a letter (costing about $0.40) asking the customer to pay. That worked 30% of the time. They want to test a new method that involves sending a DVD to customers encouraging them to contact the bank and to set up a payment plan. Developing and sending the video costs about $10 a customer. What is the parameter of interest? What are the null and alternative hypotheses?

5. The bank sets up an experiment to test the effectiveness of the DVD. They mail it out to several randomly selected delinquent customers and keep track of how many actually do contact the bank to arrange payments. The bank’s statistician calculates a P-value of 0.003. What does this P-value suggest about the DVD?

6. The statistician tells the bank’s management that the results are clear and that they should switch to the DVD method. Do you agree? What else might you want to know?

What Can Go Wrong? Hypothesis tests are so widely used—and so widely misused—that the issues

involved are addressed in their own chapter (Chapter 21). There are a few issues that we can talk about already, though: Don’t base your __________________________ on what you see in the data.

Think about the situation you are investigating and develop your null hypothesis appropriately.

Don’t base your __________________________ on the data, either. Again, you need to Think about the situation.

Don’t make your null hypothesis what you want to show to be true. You can reject the null hypothesis, but you can never “accept” or

“prove” the null. Don’t forget to check the conditions.

We need randomization, independence, and a sample that is large enough to justify the use of the Normal model.

Don’t ____________________the null hypothesis. If you fail to reject the null hypothesis, don’t think a bigger sample would be

more likely to lead to rejection. Each sample is different, and a larger sample won’t necessarily

duplicate your current observations.

Chapter 21: More About Tests and Intervals Zero In on the Null

Null hypotheses have special _________________________. To perform a hypothesis test, the null must be a ______________________ about

the value of a parameter for a model. We then use this value to compute the probability that the observed sample

statistic—or something even farther from the null value—might occur. How do we choose the null hypothesis? The appropriate null arises directly

from the context of the problem—it is not dictated by the data, but instead by the situation.

One good way to identify both the null and alternative hypotheses is to think about the Why of the situation.

To write a null hypothesis, you can’t just choose any parameter value you like.

The null must relate to the question at hand—it is context dependent. There is a temptation to state your __________________ as the null hypothesis.

However, you ___________________ prove a null hypothesis true. So, it makes more sense to use what you want to show as the

_____________________. This way, when you reject the null, you are left with what you want to

show.

How to Think About P-Values A P-value is a _______________________ probability—the probability of the

observed statistic given that the null hypothesis is true. The P-value is ________ the probability that the null hypothesis is true.

It’s not even the conditional probability that the null hypothesis is true given the data.

Be careful to interpret the P-value correctly.

What to Do with a High P-Value When we see a small P-value, we could continue to believe the null

hypothesis and conclude that we just witnessed a rare event. But instead, we ____________ the data and use it as evidence to reject the null hypothesis.

However, big P-values just mean what we observed isn’t __________________. That is, the results are in line with our assumption that the null hypothesis models the world, so we have no reason to reject it.

A big P-value doesn’t prove that the null hypothesis is true, but it certainly offers no evidence that it is _____________________.

Thus, when we see a large P-value, all we can say is that we “_______________ _____________________________________.”

Alpha Levels Sometimes we need to make a firm decision about whether or not to reject

the null hypothesis. When the P-value is small, it tells us that our data are rare given the null

hypothesis. How rare is “rare”? We can define “rare event” arbitrarily by setting a threshold for our P-value.

If our P-value falls below that point, we’ll reject H0. We call such results __________________________________.

The threshold is called an _______________________, denoted by α. Common alpha levels are _________, __________, and ___________.

You have the option—almost the obligation—to consider your alpha level carefully and choose an appropriate one for the situation.

The alpha level is also called the ______________________________. When we reject the null hypothesis, we say that the test is “significant

at that level.” What can you say if the P-value does not fall below α?

You should say that “The data have failed to provide sufficient evidence to reject the null hypothesis.”

Don’t say that you “accept the null hypothesis.” Recall that, in a jury trial, if we do not find the defendant guilty, we say the

defendant is “________________”—we don’t say that the defendant is “innocent.”

The P-value gives the reader far more information than just stating that you reject or fail to reject the null.

In fact, by providing a P-value to the reader, you allow that person to make his or her own decisions about the test.

What you consider to be statistically significant might not be the same as what someone else considers statistically significant.

There is more than one alpha level that can be used, but each test will give only one P-value.

Significant vs. Important What do we mean when we say that a test is statistically significant?

All we mean is that the test statistic had a P-value ________________ than our alpha level.

Don’t be lulled into thinking that statistical significance carries with it any sense of practical importance or impact.

For large samples, even small, unimportant (“insignificant”) deviations from the null hypothesis can be statistically __________________________.

On the other hand, if the sample is not large enough, even large, financially or scientifically “significant” differences may not be statistically significant.

It’s good practice to report the magnitude of the difference between the observed statistic value and the null hypothesis value (in the data units) along with the P-value on which we base statistical significance.
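A small numerical sketch may help show why statistical significance depends on sample size. It assumes a one-proportion z-test with a two-sided alternative; the null value 0.50, the observed proportion 0.51, and the sample sizes are hypothetical numbers chosen only for illustration, and `one_prop_two_sided_p` is a made-up helper name.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_two_sided_p(p_hat, p0, n):
    """Two-sided P-value for a one-proportion z-test of H0: p = p0."""
    sd_null = sqrt(p0 * (1 - p0) / n)   # standard deviation from the null model
    z = (p_hat - p0) / sd_null
    return 2 * (1 - NormalDist().cdf(abs(z)))

p0, p_hat = 0.50, 0.51  # hypothetical: a one-percentage-point deviation
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: P-value = {one_prop_two_sided_p(p_hat, p0, n):.4f}")
```

The deviation from the null value is the same in every row; only the sample size changes. With the smallest sample the difference is not statistically significant at α = 0.05, while with the larger samples it is, even though its practical size has not changed.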

Confidence Intervals and Hypothesis Tests Confidence intervals and hypothesis tests are built from the same

calculations. They have the same ___________________ and ___________________.

You can approximate a hypothesis test by examining a _________________________________.

Just ask whether the null hypothesis value is consistent with a confidence interval for the parameter at the corresponding confidence level.

Because confidence intervals are two-sided, they correspond to __________________________.

In general, a confidence interval with a confidence level of C% corresponds to a two-sided hypothesis test with an α-level of (100 – C)%.

The relationship between confidence intervals and ___________________________ is a little more complicated.

A confidence interval with a confidence level of C% corresponds to a one-sided hypothesis test with an α-level of ½(100 – C)%.
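The correspondence between confidence levels and alpha levels can be written out as a short sketch. The function names and the example interval (0.52, 0.60) with null value 0.50 are hypothetical and serve only to illustrate the rule stated above.

```python
def alpha_two_sided(conf_level_pct):
    """Alpha level of the two-sided test matching a C% confidence interval."""
    return (100 - conf_level_pct) / 100

def alpha_one_sided(conf_level_pct):
    """Alpha level of the one-sided test matching a C% confidence interval."""
    return 0.5 * (100 - conf_level_pct) / 100

def null_value_is_plausible(interval, null_value):
    """True when the null value lies inside the interval (fail to reject)."""
    low, high = interval
    return low <= null_value <= high

# Hypothetical example: a 95% interval of (0.52, 0.60) and a null value of 0.50.
print(alpha_two_sided(95))   # alpha for the matching two-sided test
print(alpha_one_sided(95))   # alpha for the matching one-sided test
print(null_value_is_plausible((0.52, 0.60), 0.50))  # null value outside the interval
```

When the null value falls outside the interval, the matching test rejects the null hypothesis at the corresponding alpha level; when it falls inside, the test fails to reject.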

Just Checking… 1. An experiment to test the fairness of a roulette wheel gives a z-score of 0.62.

What would you conclude?

2. In the last chapter we encountered a bank that wondered if it could get more customers to make payments on delinquent balances by sending them a DVD urging them to set up a payment plan. Well, the bank just got back the results on their test of this strategy. A 90% confidence interval for the success rate is (0.29, 0.45). Their old send-a-letter method had worked 30% of the time. Can you reject the null hypothesis that the proportion is still 30% at α = 0.05? Explain.

3. Given the confidence interval the bank found in their trial of DVDs, what would you recommend they do? Should they scrap the DVD strategy?

*A 95% Confidence Interval for Small Samples When the ____________________________________ fails, all is not lost. A simple adjustment to the calculation lets us make a confidence interval

anyway. All we do is add four phony observations, two successes and two failures. So instead of $\hat{p} = \frac{y}{n}$, where $y$ is the number of successes, we use the adjusted proportion: $\tilde{p} = \frac{y + 2}{n + 4}$

*A Better Confidence Interval for Proportions

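Now the adjusted interval is: $\tilde{p} \pm z^{*} \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{\tilde{n}}}$, where $\tilde{p} = \frac{y + 2}{n + 4}$ and $\tilde{n} = n + 4$.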

The adjusted form gives better performance overall and works much better for proportions of 0 or 1.

It has the additional advantage that we no longer need to check the Success/Failure Condition.
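As a rough sketch, the adjusted ("plus-four") 95% interval could be computed as follows. The function name `plus_four_interval` and the example of 0 successes in 10 trials are hypothetical; the critical value is taken from the standard Normal model.

```python
from math import sqrt
from statistics import NormalDist

def plus_four_interval(successes, n):
    """Adjusted 95% interval: add two phony successes and two phony failures."""
    z_star = NormalDist().inv_cdf(0.975)       # about 1.96 for 95% confidence
    n_tilde = n + 4                            # adjusted sample size
    p_tilde = (successes + 2) / n_tilde        # adjusted proportion
    margin = z_star * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    # Endpoints are returned as computed; they are not clipped to [0, 1].
    return p_tilde - margin, p_tilde + margin

# Hypothetical small sample: 0 successes in 10 trials.
low, high = plus_four_interval(0, 10)
print(f"({low:.3f}, {high:.3f})")
```

With 0 successes the unadjusted standard error would be zero, so the ordinary interval would collapse to a single point. The adjusted version still produces an interval with positive width, which is one way to see why it behaves better for proportions of 0 or 1.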

Making Errors

Here’s some shocking news for you: nobody’s perfect. Even with lots of evidence we can still make the wrong decision.

When we perform a hypothesis test, we can make mistakes in two ways:

I. The null hypothesis is true, but we mistakenly reject it. (____________________)

II. The null hypothesis is false, but we fail to reject it. (____________________)

Which type of error is more serious depends on the situation at hand. In other words, the gravity of the error is context dependent.

Here’s an illustration of the four situations in a hypothesis test:

How often will a Type I error occur?

I. Since a Type I error is rejecting a true null hypothesis, the probability of a Type I error is our α level.

When H0 is false and we reject it, we have done the right thing.

I. A test’s ability to detect a false hypothesis is called the _________________ of the test.

When H0 is false and we fail to reject it, we have made a ___________________.

I. We assign the letter β to the probability of this mistake.

II. It’s harder to assess the value of β because we don’t know what the value of the parameter really is.

III. There is no single value for β; we can think of a whole collection of β’s, one for each incorrect parameter value.

One way to focus our attention on a particular β is to think about the ______________________.

I. Ask “How big a difference would matter?”

We could reduce β for all alternative parameter values by increasing α.

I. This would reduce β but increase the chance of a Type I error.

II. This tension between Type I and Type II errors is inevitable.

The only way to reduce both types of errors is to ________________________. Otherwise, we just wind up trading off one kind of error against the other.

Power The ____________________ of a test is the probability that it correctly rejects a

false null hypothesis. When the power is _____________, we can be confident that we’ve looked hard

enough at the situation. The power of a test is 1 – β, because β is the probability that a test fails to

reject a false null hypothesis and power is the probability that it does reject.

Whenever a study fails to reject its null hypothesis, the __________________ power comes into question.

When we calculate power, we imagine that the null hypothesis is false.

The value of the power depends on how far the ________________ lies from the null hypothesis value.

I. The distance between the null hypothesis value, p0, and the truth, p, is called the _____________________________.

II. Power depends directly on _________________________.
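To see how power depends on the distance between p0 and the true p, here is a minimal sketch for an upper-tail one-proportion z-test, using the Normal model for the sampling distribution. The function name `power_one_sided` and the numbers (p0 = 0.50, n = 100, true proportions 0.55 and 0.70) are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided(p0, p_true, n, alpha=0.05):
    """Approximate power of the test H0: p = p0 vs HA: p > p0."""
    z_star = NormalDist().inv_cdf(1 - alpha)
    # Smallest sample proportion that leads to rejection, from the null model.
    cutoff = p0 + z_star * sqrt(p0 * (1 - p0) / n)
    # Sampling distribution of the sample proportion when the truth is p_true.
    sd_true = sqrt(p_true * (1 - p_true) / n)
    return 1 - NormalDist(p_true, sd_true).cdf(cutoff)

# Hypothetical comparison: a true value near the null vs one far from it.
for p_true in (0.55, 0.70):
    print(f"true p = {p_true:.2f}: power = {power_one_sided(0.50, p_true, 100):.3f}")
```

With the same sample size and alpha level, the true value farther from the null gives much higher power. If the true value equals p0, the function returns alpha, which matches the idea that the Type I error rate is the alpha level.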

Just Checking… 4. Remember our bank that’s sending out DVDs to try to get customers to make payments on delinquent loans? It is looking for evidence that the costlier DVD strategy produces a higher success rate than the letters it has been sending. Explain what a Type I error is in this context and what the consequences would be to the bank.

5. What’s a Type II error in the bank experiment context, and what would the consequences be?

6. For the bank, which situation has higher power: a strategy that works really well, actually getting 60% of the people to pay off their balances, or a strategy that barely increases the payoff rate to 32%? Explain briefly.

A Picture Worth Words

The __________________ the effect size, the easier it should be to see it. Obtaining a larger sample size ___________________ the probability of a Type II

error, so it increases the power. It also makes sense that the ________________ we’re willing to accept a Type I

error, the ________________ likely we will be to make a Type II error. This diagram shows the relationship between these concepts:

Reducing Both Type I and Type II Error

The previous figure seems to show that if we reduce Type I error, we must automatically __________________________ Type II error.

But, we can reduce both types of error by making both curves _________________.

How do we make the curves narrower? ___________________ the sample size. This figure has means that are just as far apart as in the previous figure, but

the sample sizes are larger, the standard deviations are smaller, and the error rates are reduced:

Original comparison of errors: Comparison of errors with a larger sample size:
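The effect of a larger sample on both error rates can be checked numerically. The sketch below assumes an upper-tail one-proportion z-test and the Normal model; the function name `type_two_error` and all numbers (p0 = 0.50, true p = 0.55, sample sizes 100 and 1000) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def type_two_error(p0, p_true, n, alpha):
    """Approximate beta for the test H0: p = p0 vs HA: p > p0."""
    z_star = NormalDist().inv_cdf(1 - alpha)
    cutoff = p0 + z_star * sqrt(p0 * (1 - p0) / n)   # rejection cutoff from the null model
    sd_true = sqrt(p_true * (1 - p_true) / n)        # spread when the truth is p_true
    # Beta is the chance the sample proportion falls below the cutoff.
    return NormalDist(p_true, sd_true).cdf(cutoff)

# Same effect size in both rows; the second uses a larger sample and a smaller alpha.
for n, alpha in ((100, 0.05), (1000, 0.01)):
    beta = type_two_error(0.50, 0.55, n, alpha)
    print(f"n = {n:>4}, alpha = {alpha:.2f}: beta = {beta:.3f}")
```

In this illustration the larger sample lets us lower alpha from 0.05 to 0.01 and still end up with a smaller beta, because both sampling distributions become narrower.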

What Can Go Wrong? Don’t interpret the P-value as the probability that H0 is true.

I. The P-value is about the ____________, not the hypothesis.

II. It’s the probability of observing data this unusual, given that H0 is true, not the other way around.

Don’t believe too strongly in arbitrary alpha levels.

I. It’s better to report your P-value and a confidence interval so that the reader can make her/his own decision.

Don’t confuse practical and statistical significance.

I. Just because a test is __________________________________ doesn’t mean that it is significant in practice.

II. And, sample size can impact your decision about a null hypothesis, making you miss an important difference or find an “______________________” difference.

Don’t forget that in spite of all your care, you might make a wrong decision.

Chapter 22: Comparing Two Proportions Comparing Two Proportions

Comparisons between two percentages are much more common than questions about isolated percentages. And they are more interesting.

We often want to know how two groups differ, whether a treatment is better than a placebo control, or whether this year’s results are better than last year’s.

Another Ruler In order to examine the difference between two proportions, we need another

ruler—the standard deviation of the sampling distribution model for the difference between two proportions.

Recall that standard deviations don’t add, but variances do. In fact, the variance of the sum or difference of two independent random quantities is the ___________ of their individual variances.

The Standard Deviation of the Difference Between Two Proportions Proportions observed in independent random samples are independent. Thus,

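we can add their variances. So… The standard deviation of the difference between two sample proportions is $SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$, where $q = 1 - p$. Thus, the standard error is: $SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$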

Assumptions and Conditions ____________________________________:

_______________________________: The data in each group should be drawn independently and at random from a homogeneous population or generated by a randomized comparative experiment.

_______________________________: If the data are sampled without replacement, the sample should not exceed 10% of the population.

_______________________________: The two groups we’re comparing must be independent of each other.

____________________________________: Each of the groups must be big enough… ________________________________: Both groups are big enough that at

least 10 successes and at least 10 failures have been observed in each.

The Sampling Distribution We already know that for large enough samples, each of our proportions has

an approximately Normal sampling distribution. The same is true of their difference.

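Provided that the sampled values are ______________________, the samples are independent, and the sample sizes are large enough, the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is modeled by a Normal model with

Mean: $\mu = p_1 - p_2$

Standard deviation: $SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$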

Two-Proportion z-Interval When the conditions are met, we are ready to find the confidence interval for

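the difference of two proportions: The confidence interval is $(\hat{p}_1 - \hat{p}_2) \pm z^{*} \times SE(\hat{p}_1 - \hat{p}_2)$

where $SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$. The critical value z* depends on the particular confidence level, C, that you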

specify.
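The two-proportion z-interval can be sketched in Python as follows. The function name `two_prop_z_interval` and the counts (120 successes out of 400 and 90 out of 400) are hypothetical and chosen only to show the arithmetic; the conditions listed earlier still need to be checked before using the interval.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_interval(successes1, n1, successes2, n2, conf_level=0.95):
    """Confidence interval for p1 - p2 from two independent samples."""
    p1_hat = successes1 / n1
    p2_hat = successes2 / n2
    # Standard error of the difference: variances add for independent samples.
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf(0.5 + conf_level / 2)   # critical value z*
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se

# Hypothetical counts: 120 of 400 in group 1, 90 of 400 in group 2.
low, high = two_prop_z_interval(120, 400, 90, 400)
print(f"95% interval for p1 - p2: ({low:.3f}, {high:.3f})")
```

The interval is centered at the observed difference in sample proportions, and its width is controlled by the critical value and the standard error.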

Just Checking… A public broadcasting station plans to launch a special appeal for additional contributions from current members. Unsure of the most effective way to contact people, they run an experiment. They randomly select two groups of current members. They send the same request for donations to everyone, but it goes to one group by e-mail and another group by regular mail. The station was successful in getting contributions from 26% of the members they e-mailed but only 15% of those who received the request by regular mail. A 90% confidence interval estimated the difference in donation rates to be 11% ± 7%.

1. Interpret the confidence interval in context.

2. Based on this confidence interval, what conclusion would we reach if we tested the hypothesis that there’s no difference in the response rates to the two methods of fundraising? Explain.

Everyone into the Pool The typical hypothesis test for the difference in two proportions is the one of

no difference. In symbols, H0: p1 – p2 = 0. Since we are hypothesizing that there is no difference between the two

proportions, that means that the standard deviations for each proportion are _______________________.

Since this is the case, we combine (______________) the counts to get one overall proportion.

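The pooled proportion is $\hat{p}_{pooled} = \frac{Success_1 + Success_2}{n_1 + n_2}$

where $Success_1 = n_1 \hat{p}_1$ and $Success_2 = n_2 \hat{p}_2$. If the numbers of successes are not whole numbers, round them first.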

(This is the only time you should round values in the middle of a calculation.)

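We then put this pooled value into the formula, substituting it for _______ sample proportions in the standard error formula: $SE_{pooled}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_{pooled}\hat{q}_{pooled}}{n_1} + \frac{\hat{p}_{pooled}\hat{q}_{pooled}}{n_2}}$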

Compared to What? We’ll _________________ our null hypothesis if we see a large enough difference

in the two proportions. How can we decide whether the difference we see is large?

Just compare it with its ___________________________. Unlike previous hypothesis testing situations, the null hypothesis doesn’t

provide a standard deviation, so we’ll use a __________________________ (here, pooled).

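Two-Proportion z-Test The conditions for the two-proportion z-test are the same as for the two-proportion z-interval. We are testing the hypothesis H0: p1 – p2 = 0, or, equivalently, H0: p1 = p2. Because we hypothesize that the proportions are equal, we pool them to find $\hat{p}_{pooled} = \frac{Success_1 + Success_2}{n_1 + n_2}$

We use the pooled value to estimate the standard error: $SE_{pooled}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_{pooled}\hat{q}_{pooled}}{n_1} + \frac{\hat{p}_{pooled}\hat{q}_{pooled}}{n_2}}$

Now we find the test statistic: $z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{SE_{pooled}(\hat{p}_1 - \hat{p}_2)}$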

When the conditions are met and the null hypothesis is true, this statistic follows the standard Normal model, so we can use that model to obtain a P-value.
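The pooled two-proportion z-test can be sketched the same way. The function name `two_prop_z_test` and the counts (120 of 400 versus 90 of 400) are hypothetical; the sketch reports a two-sided P-value and assumes the conditions have already been checked.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(successes1, n1, successes2, n2):
    """Pooled z-test of H0: p1 - p2 = 0; returns (z, two-sided P-value)."""
    p1_hat = successes1 / n1
    p2_hat = successes2 / n2
    # Pool the counts: under H0 both samples estimate the same proportion.
    p_pooled = (successes1 + successes2) / (n1 + n2)
    q_pooled = 1 - p_pooled
    se_pooled = sqrt(p_pooled * q_pooled / n1 + p_pooled * q_pooled / n2)
    z = ((p1_hat - p2_hat) - 0) / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 120 of 400 in group 1, 90 of 400 in group 2.
z, p_value = two_prop_z_test(120, 400, 90, 400)
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```

For a one-sided alternative, the P-value would instead be the area in the single tail named by the alternative hypothesis.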

Just Checking… 3. A June 2004 public opinion poll asked 1000 randomly selected adults whether the United States should decrease the amount of immigration allowed; 49% of those responding said “yes.” In June of 1995, a random sample of 1000 had found that 65% of adults thought immigration should be curtailed. To see if the percentage has decreased, why can’t we just use a one-proportion z-test of H0: p = 0.65 and see what the P-value for $\hat{p} = 0.49$ is?

4. For opinion polls like this, which has more variability: the percentage of respondents answering “yes” in either year or the difference in percentages between the two years?

What Can Go Wrong? Don’t use ________________________________________ methods when the

samples aren’t independent. These methods give wrong answers when the independence

assumption is violated. Don’t apply __________________________ methods when there was no

randomization. Our data must come from representative random samples or from a

properly randomized experiment. Don’t interpret a significant difference in proportions _________________.

Be careful not to jump to conclusions about causality.
