Type I and Type II Errors

An analogy may help us to understand two types of errors we can make with inference.

Consider the judicial system in the US.

There is an obvious goal of convicting guilty people and acquitting innocent ones.

When the system works, that’s what happens.

Sometimes, however, the system fails; the guilty go free or the innocent go to jail.

Consider this square:

                 Truth:
Action:          Innocent         Guilty
Convict          Type I error     Correct
Acquit           Correct          Type II error

One side represents the truth, the other represents the action we take.

In truth, a person is innocent or guilty.

In action, we may convict or acquit.

When we convict the guilty or acquit the innocent, the system works.

When we convict the innocent or acquit the guilty, it fails, and we call these failures Type I and Type II errors, respectively.

We do not consider the two types of errors to be equivalent.

We especially try to avoid convicting innocent people and decide on procedures and rules to prevent that.

As times change we may change the rules and change the probabilities of these two types of errors.

As one type goes down, the other goes up.

More rarely, an improvement in technique, such as with DNA technology, results in a decrease in both types of errors.

Now back to Statistics:

                      Truth:
Action:               H0 is True       H0 is False
Reject H0             Type I error     Correct
Fail to reject H0     Correct          Type II error

One side of our square represents truth, the other action.

In truth, a null hypothesis is true or false.

In action, we reject or fail to reject H0.

Sometimes our system works: we reject a false H0 or fail to reject a true one.

Sometimes it fails, and we also call these failures Type I and Type II errors. Rejecting a true H0 is a Type I error; failing to reject a false H0 is a Type II error.
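The square can also be written as a small lookup, which may help when checking which cell a given outcome belongs to. This is only an illustrative sketch; the function name `classify_decision` is made up for this example.

```python
def classify_decision(h0_is_true: bool, reject_h0: bool) -> str:
    """Name the outcome for one cell of the truth/action square."""
    if h0_is_true and reject_h0:
        return "Type I error"       # rejected a true null hypothesis
    if not h0_is_true and not reject_h0:
        return "Type II error"      # failed to reject a false null hypothesis
    return "Correct decision"       # the two remaining cells


# Walk through all four cells of the square.
for h0_is_true in (True, False):
    for reject_h0 in (True, False):
        outcome = classify_decision(h0_is_true, reject_h0)
        print(f"H0 true: {h0_is_true!s:5}  reject H0: {reject_h0!s:5}  ->  {outcome}")
```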

For Type I and Type II errors to exist, we must have null and alternate hypotheses.

For our example to follow:

H0: µ = 40   The mean of the distribution is 40.

Ha: µ ≠ 40   The mean of the distribution is not 40.

We can define:

P(Type I error) = α

P(Type II error) = β

When we choose a fixed level of significance we set α. Here we have a mean of 40 and standard deviation of 10.

The shaded regions represent regions of Type I error and have total probability α. In this example α is 5%. (The significance level is 5%; the corresponding confidence level is 95%.)

We get a Type I error when our distribution is centered at 40, but our sample mean happens to be larger than about 60 or smaller than about 20.

If we change the confidence level to 90%, we change α. Here α is 10%.

As you can see, we have increased the probability of Type I error.
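As a rough check on where the shaded regions begin, the cutoffs can be computed from the null distribution N(40,10). The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `rejection_cutoffs` is an assumption made for illustration. The exact 5% cutoffs come out near 20.4 and 59.6, which the slides round to 20 and 60.

```python
from statistics import NormalDist


def rejection_cutoffs(mu0, sd, alpha):
    """Return the (lower, upper) two-sided cutoffs for the sample mean under H0."""
    null = NormalDist(mu0, sd)
    lower = null.inv_cdf(alpha / 2)        # area alpha/2 lies below this value
    upper = null.inv_cdf(1 - alpha / 2)    # area alpha/2 lies above this value
    return lower, upper


low5, high5 = rejection_cutoffs(40, 10, 0.05)
low10, high10 = rejection_cutoffs(40, 10, 0.10)

print(f"alpha = 5%:  reject H0 below {low5:.2f} or above {high5:.2f}")
print(f"alpha = 10%: reject H0 below {low10:.2f} or above {high10:.2f}")
```

Raising α pulls both cutoffs toward 40, so the rejection regions grow.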

Type II error occurs only when the null hypothesis is false. It cannot occur if the null hypothesis is true, by our very definition.

When we speak of Type II error we must know that the null hypothesis is not true.

Let’s start with our hypothetical distribution:

Now we see an alternate distribution. Our samples will come from this distribution, N(68,10), instead of the hypothetical distribution.

Now we see both.

The region shaded pink is our probability of Type I error, here 5%.

The region shaded blue is the probability of Type II error. Notice that it is under the alternate (blue) distribution.

We make a Type II error whenever the null hypothesis is false, but we get a sample mean that falls into a range that will cause us to fail to reject the null hypothesis.

Let’s take a closer look:

Sample means between 20 and 60 will “look good” to us; we will not reject the null hypothesis.

Now we check the alternate distribution. Are there times when sample means from this distribution will give us values between 20 and 60?

In fact, there are.
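One way to see this is to simulate it. The sketch below draws many sample means from the alternate distribution N(68,10) and counts how often they land between 20 and 60. The number of trials and the fixed seed are arbitrary choices made for this illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable (arbitrary choice)

ALT_MEAN, SD = 68, 10       # alternate sampling distribution N(68,10)
LOWER, UPPER = 20, 60       # sample means in this range do not reject H0
TRIALS = 100_000            # arbitrary number of simulated sample means

misses = 0
for _ in range(TRIALS):
    sample_mean = random.gauss(ALT_MEAN, SD)
    if LOWER < sample_mean < UPPER:
        misses += 1         # H0 is false, yet we would fail to reject it

estimated_beta = misses / TRIALS
print(f"Estimated P(Type II error): {estimated_beta:.3f}")
```

The estimate should land close to the exact area under the alternate curve, which is about 21%.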

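To find the probability of Type II error we find the area under the alternate curve between 20 and 60. The area below 20 is negligible, so we only need the area below 60.

P(X < 60) = P(Z < (60 − 68)/10) = P(Z < −0.8) = 0.2119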

So the probability of Type II error is 21%.

That is, when the true mean is 68, there is a 21% probability that we will fail to reject the null hypothesis.
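The same area can be checked numerically. This is a minimal sketch using the standard-library `statistics.NormalDist` and the rounded cutoffs 20 and 60 from the slides; the function name `type_ii_probability` is an assumption made for illustration.

```python
from statistics import NormalDist


def type_ii_probability(alt_mean, sd, lower, upper):
    """Area under the alternate distribution between the two cutoffs."""
    alt = NormalDist(alt_mean, sd)
    return alt.cdf(upper) - alt.cdf(lower)


beta = type_ii_probability(68, 10, 20, 60)
below_20 = NormalDist(68, 10).cdf(20)   # the part ignored in the hand calculation

print(f"P(Type II error) = {beta:.4f}")
print(f"Area below 20    = {below_20:.2e}")
```

The area below 20 is so small that dropping it does not change the answer at four decimal places.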

How can we reduce the probability of Type II error?

Examine the following figures:

Can you see that β is less now, but α is greater?

P(X < 56.4) = P(Z < (56.4485 − 68)/10) = P(Z < −1.1551) = 0.1240

Here the probability of Type II error is 12.4%.

Increasing α does result in a decrease in β.

This does not necessarily get you very far ahead, because we have only traded one type of error for the other.
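The trade-off can be tabulated for several values of α. The sketch below assumes the same N(40,10) and N(68,10) distributions and uses exact cutoffs rather than the rounded 20 and 60, so the 5% row comes out near 0.20 instead of 0.21. The extra 1% row is added only for comparison.

```python
from statistics import NormalDist

null = NormalDist(40, 10)   # hypothetical distribution under H0
alt = NormalDist(68, 10)    # alternate distribution


def beta_for_alpha(alpha):
    """P(Type II error) for a two-sided test at significance level alpha."""
    lower = null.inv_cdf(alpha / 2)
    upper = null.inv_cdf(1 - alpha / 2)
    return alt.cdf(upper) - alt.cdf(lower)


betas = {alpha: beta_for_alpha(alpha) for alpha in (0.01, 0.05, 0.10)}
for alpha, beta in betas.items():
    print(f"alpha = {alpha:.2f}  ->  beta = {beta:.4f}")
```

As α goes up, β goes down, and vice versa.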

Suppose we could have a different alternate distribution. Suppose we could make it have a larger mean, perhaps 72 instead of 68. Would this change β?

Now we have a new alternate distribution N(72,10) and so a new probability.

P(X < 60) = P(Z < (60 − 72)/10) = P(Z < −1.2) = 0.1151

So we now have 11.5% Type II error. While moving the alternate distribution further away reduces Type II error, usually we cannot do this, for practical reasons.
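To see how β depends on the distance between the two means, the calculation can be repeated for a few alternate means. This is a sketch with the cutoffs held at the rounded values 20 and 60; the mean of 76 is an extra, hypothetical value added only to extend the pattern.

```python
from statistics import NormalDist

LOWER, UPPER = 20, 60   # rounded cutoffs for alpha = 5% under N(40,10)
SD = 10

betas = {}
for alt_mean in (68, 72, 76):            # 76 is a hypothetical extra case
    alt = NormalDist(alt_mean, SD)
    betas[alt_mean] = alt.cdf(UPPER) - alt.cdf(LOWER)
    print(f"alternate mean {alt_mean}: beta = {betas[alt_mean]:.4f}")
```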

Another approach is to decrease the standard deviation. Any way we can accomplish this will have the same effect. Usually the practical way is to increase the sample size, since the standard deviation of the sample mean is σ/√n.

If our sampling distributions are now N(40,8) and N(68,8) we can find the effect on the probability of Type II error. With α still at 5%, the upper cutoff moves in from 60 to near 55.6.

P(X < 55.6) = P(Z < (55.6 − 68)/8) = P(Z < −1.55) = 0.06057

With a decrease in the standard deviation we see the probability of Type II error decrease to 6%. Decreasing the standard deviation reduces the amount of overlap between the two distributions, thus reducing the Type II error.

Increasing sample size will be our most effective way to minimize Type II error.
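The effect of sample size can be sketched directly. The slides do not give a population standard deviation or sample size, so the example below assumes a hypothetical population standard deviation of 40, which makes the standard deviation of the sample mean 10 when n = 16 and 8 when n = 25. It uses exact 5% cutoffs, so the numbers differ slightly from the rounded hand calculations.

```python
from math import sqrt
from statistics import NormalDist

POPULATION_SD = 40   # hypothetical value, chosen so n = 16 gives 10 and n = 25 gives 8


def beta_two_sided(mu0, mu_alt, sd, alpha=0.05):
    """P(Type II error) when both sampling distributions share the same sd."""
    null = NormalDist(mu0, sd)
    alt = NormalDist(mu_alt, sd)
    lower = null.inv_cdf(alpha / 2)
    upper = null.inv_cdf(1 - alpha / 2)
    return alt.cdf(upper) - alt.cdf(lower)


betas = {}
for n in (16, 25, 64):
    sd_of_mean = POPULATION_SD / sqrt(n)     # standard deviation of the sample mean
    betas[n] = beta_two_sided(40, 68, sd_of_mean)
    print(f"n = {n:2d}  sd = {sd_of_mean:4.1f}  beta = {betas[n]:.4f}")
```

A larger n shrinks both curves around their means, so less of the alternate curve falls inside the "fail to reject" range.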

We have seen the difference between Type I and Type II errors.

We set the probability of Type I error when we choose a level of significance.

The probability of Type II error can be reduced by increasing α, by reducing the standard deviation (perhaps by increasing sample size), or by increasing the distance between the hypothetical and alternate means.

THE END
