Inference for Regression BPS chapter 24 © 2006 W.H. Freeman and Company


Linear regression

Which point represents “a” in our least-squares regression equation?

a) Point Q

b) Point S

c) Point R

d) Point T

Linear regression (answer)

Which point represents “a” in our least-squares regression equation?

a) Point Q

b) Point S

c) Point R

d) Point T

Correlation

If two quantitative variables, X and Y, have a correlation coefficient r = 0.80, which graph could be a scatterplot of the two variables?

a) Plot A

b) Plot B

c) Plot C

Correlation (answer)

If two quantitative variables, X and Y, have a correlation coefficient r = 0.80, which graph could be a scatterplot of the two variables?

a) Plot A

b) Plot B

c) Plot C

Correlation

Which of the following statements is true?

a) $r_{\text{Plot A}} > r_{\text{Plot B}}$

b) $r_{\text{Plot C}} > r_{\text{Plot A}}$

c) $r_{\text{Plot C}} > r_{\text{Plot B}}$

d) The correlation coefficient is the same in all plots.

Correlation (answer)

Which of the following statements is true?

a) $r_{\text{Plot A}} > r_{\text{Plot B}}$

b) $r_{\text{Plot C}} > r_{\text{Plot A}}$

c) $r_{\text{Plot C}} > r_{\text{Plot B}}$

d) The correlation coefficient is the same in all plots.

Residual

The following scatterplot shows the number of gold medals earned by countries in 1992 versus how many earned in 1996. Which of the points would have the smallest residual?

a) Point A

b) Point B

c) Point C

d) Point D

Residual (answer)

The following scatterplot shows the number of gold medals earned by countries in 1992 versus how many earned in 1996. Which of the points would have the smallest residual?

a) Point A

b) Point B

c) Point C

d) Point D
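A residual is the observed value minus the value the regression line predicts, so a point's residual measures its vertical distance from the line. The sketch below illustrates the calculation, reading "smallest" as "closest to zero". The medal counts and the fitted line are made up for illustration; neither comes from the slide's scatterplot.

```python
# Hypothetical data: (medals in 1992, medals in 1996) for four labeled points.
points = {"A": (5, 9), "B": (10, 9), "C": (20, 14), "D": (30, 33)}

# Hypothetical fitted line y_hat = intercept + slope * x (not the slide's equation).
intercept, slope = 1.0, 0.9


def residual(x, y):
    """Observed y minus the y predicted by the line."""
    return y - (intercept + slope * x)


residuals = {label: residual(x, y) for label, (x, y) in points.items()}

# The point whose residual is closest to zero lies nearest the line vertically.
closest = min(residuals, key=lambda label: abs(residuals[label]))
print(residuals, closest)
```

On a scatterplot the same comparison can be made by eye: the point lying closest to the line in the vertical direction has the residual nearest zero.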

Regression line

In the previous question about gold medals, the least-squares regression equation is:

where x is the number of medals earned in 1992 and $\hat{y}$ is the predicted number of medals earned in 1996. What is the best interpretation of b in this example?

a) Countries that earned ten medals in the 1992 Olympics are predicted to earn an average of nine medals in 1996.

b) For all countries participating in the 1992 Olympics, 89% earned medals in 1996.

c) If a country earned zero medals in 1992, they would have an 89% chance of earning one in 1996.

d) All countries who earned medals in 1992 had an 89% probability of earning a medal in 1996.

Regression line (answer)

In the previous question about gold medals, the least-squares regression equation is:

where x is the number of medals earned in 1992 and $\hat{y}$ is the predicted number of medals earned in 1996. What is the best interpretation of b in this example?

a) Countries that earned ten medals in the 1992 Olympics are predicted to earn an average of nine medals in 1996.

b) For all countries participating in the 1992 Olympics, 89% earned medals in 1996.

c) If a country earned zero medals in 1992, they would have an 89% chance of earning one in 1996.

d) All countries who earned medals in 1992 had an 89% probability of earning a medal in 1996.

Appropriate analysis

Edwin Hubble collected data on the distance a galaxy is from the earth and the velocity with which it appears to be receding. If he wanted to investigate whether there was a linear relationship between the distance and the velocity, what type of analysis did he perform?

a) Two-sample t-test on means

b) χ² analysis on proportions

c) Linear regression analysis

d) Matched pairs experiment

Appropriate analysis (answer)

Edwin Hubble collected data on the distance a galaxy is from the earth and the velocity with which it appears to be receding. If he wanted to investigate whether there was a linear relationship between the distance and the velocity, what type of analysis did he perform?

a) Two-sample t-test on means

b) χ² analysis on proportions

c) Linear regression analysis

d) Matched pairs experiment

Linear regression

Edwin Hubble collected data on the distance a galaxy is from the earth and the velocity with which it appears to be receding. He used the following model: $\mu_y = \alpha + \beta x$, where x represents the distance the galaxy is from the earth (in megaparsecs) and $\mu_y$ represents the mean velocity (in km/sec) for all galaxies at that distance. What does $\beta$ represent in this problem?

a) The average velocity for a galaxy that is extremely close to earth.

b) The average change in velocity for a one-megaparsec increase in distance for those galaxies in the sample.

c) The average velocity for all galaxies in the universe.

d) The average change in velocity for a one-megaparsec increase in distance of all galaxies.

Linear regression (answer)

Edwin Hubble collected data on the distance a galaxy is from the earth and the velocity with which it appears to be receding. He used the following model: $\mu_y = \alpha + \beta x$, where x represents the distance the galaxy is from the earth (in megaparsecs) and $\mu_y$ represents the mean velocity (in km/sec) for all galaxies at that distance. What does $\beta$ represent in this problem?

a) The average velocity for a galaxy that is extremely close to earth.

b) The average change in velocity for a one-megaparsec increase in distance for those galaxies in the sample.

c) The average velocity for all galaxies in the universe.

d) The average change in velocity for a one-megaparsec increase in distance of all galaxies.

Linear regression

Edwin Hubble collected data on the distance a galaxy is from the earth and the velocity with which it appears to be receding. Summarizing his data with a scatterplot and generating the least-squares regression line gave the following table:

Based on the information in the table, what is the correct equation for the least-squares regression line?

a)

b)

c)

d)

e)

Linear regression (answer)

Edwin Hubble collected data on the distance a galaxy is from the earth and the velocity with which it appears to be receding. Summarizing his data with a scatterplot and generating the least-squares regression line gave the following table:

Based on the information in the table, what is the correct equation for the least-squares regression line?

a)

b)

c)

d)

e)

Residuals

Edwin Hubble collected data on the distance a galaxy is from the earth and the velocity with which it appears to be receding. By looking at the following residual plot and histogram of the residuals, what conclusion should be made about the conditions for performing the linear regression?

a) Because the residual plot shows no pattern and the histogram is approximately bell-shaped, the conditions are met.

b) The residual plot implies that the data violate the assumption of normality.

c) The histogram of the residuals shows that the data are extremely right-skewed.

d) Neither plot tells us anything about the assumptions for doing inference for regression.

e) The residual plot implies that the data violate the assumption of linearity.

Residuals (answer)

Edwin Hubble collected data on the distance a galaxy is from the earth and the velocity with which it appears to be receding. By looking at the following residual plot and histogram of the residuals, what conclusion should be made about the conditions for performing the linear regression?

a) Because the residual plot shows no pattern and the histogram is approximately bell-shaped, the conditions are met.

b) The residual plot implies that the data violate the assumption of normality.

c) The histogram of the residuals shows that the data are extremely right-skewed.

d) Neither plot tells us anything about the assumptions for doing inference for regression.

e) The residual plot implies that the data violate the assumption of linearity.

Linear relationship

Edwin Hubble collected data on the distance a galaxy is from the earth and the velocity with which it appears to be receding. If the researchers want to test whether there is a positive linear relationship between the distance and velocity, what hypotheses could be used?

a)

b)

c)

d)

Linear relationship (answer)

Edwin Hubble collected data on the distance a galaxy is from the earth and the velocity with which it appears to be receding. If the researchers want to test whether there is a positive linear relationship between the distance and velocity, what hypotheses could be used?

a)

b)

c)

d)

Linear regression

Edwin Hubble collected data on the distance a galaxy is from the earth and the velocity with which it appears to be receding.

For a confidence interval for $\beta$ we use the general form for a confidence interval: estimate ± (table value) × (SE of the estimate)

According to the printout above, what value should we use for the standard error of the estimate?

a) 83.4389

b) 75.2371

c) -40.7836

d) 454.2584

Linear regression (answer)

Edwin Hubble collected data on the distance a galaxy is from the earth and the velocity with which it appears to be receding.

For a confidence interval for $\beta$ we use the general form for a confidence interval: estimate ± (table value) × (SE of the estimate)

According to the printout above, what value should we use for the standard error of the estimate?

a) 83.4389

b) 75.2371

c) -40.7836

d) 454.2584
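The general form of the interval, estimate plus or minus the table value times the standard error, is a one-line calculation. For a regression slope the table value is a t critical value with n − 2 degrees of freedom. The sketch below uses hypothetical inputs chosen only to keep the arithmetic easy to follow; they are not read from the printout, and the function name is illustrative.

```python
def confidence_interval(estimate, table_value, standard_error):
    """Return (lower, upper) for estimate ± table_value × standard_error."""
    margin = table_value * standard_error
    return estimate - margin, estimate + margin


# Hypothetical inputs (not taken from the slide's printout).
lower, upper = confidence_interval(estimate=2.0, table_value=2.0, standard_error=0.5)
print(lower, upper)
```

For a slope interval, the standard error to plug in is the one reported on the same row of the printout as the slope estimate.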

Confidence interval

Edwin Hubble collected data on the distance a galaxy is from the earth and the velocity with which it appears to be receding. If a 95% confidence interval for $\beta$ is (298.12, 610.20), what conclusion could be made about $\beta$ at a significance level of $\alpha$ = 0.05?

a) We have sufficient evidence to conclude that there is no linear relationship between velocity and distance.

b) We have sufficient evidence to conclude that there is a linear relationship between velocity and distance.

c) There is insufficient evidence to conclude that there is a linear relationship between velocity and distance.

d) The confidence interval does not give us enough information to answer this question.

Confidence interval (answer)

Edwin Hubble collected data on the distance a galaxy is from the earth and the velocity with which it appears to be receding. If a 95% confidence interval for $\beta$ is (298.12, 610.20), what conclusion could be made about $\beta$ at a significance level of $\alpha$ = 0.05?

a) We have sufficient evidence to conclude that there is no linear relationship between velocity and distance.

b) We have sufficient evidence to conclude that there is a linear relationship between velocity and distance.

c) There is insufficient evidence to conclude that there is a linear relationship between velocity and distance.

d) The confidence interval does not give us enough information to answer this question.
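A 95% confidence interval for the slope and a two-sided test of "slope = 0" at the 0.05 level always agree: the test rejects exactly when the interval excludes 0. The sketch below applies that check to the interval quoted in the question; the helper name is illustrative.

```python
def excludes_zero(interval):
    """True when 0 lies outside the interval (lower, upper)."""
    lower, upper = interval
    return lower > 0 or upper < 0


slope_interval = (298.12, 610.20)  # 95% interval quoted in the question
evidence_of_linear_relationship = excludes_zero(slope_interval)
print(evidence_of_linear_relationship)
```

Because every value in the interval is positive, a slope of 0 is not a plausible value at this confidence level.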

Prediction intervals

For a house of size 1500 ft², the 95% prediction interval for its selling price will be _________ the 95% confidence interval for the average selling price of all homes that are 1500 ft².

a) Wider than

b) The same as

c) Narrower than

d) Not comparable with

Prediction intervals (answer)

For a house of size 1500 ft², the 95% prediction interval for its selling price will be _________ the 95% confidence interval for the average selling price of all homes that are 1500 ft².

a) Wider than

b) The same as

c) Narrower than

d) Not comparable with
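The comparison follows from the standard errors of the two intervals. The formulas below are the usual textbook ones for simple linear regression rather than anything printed on the slide: s is the regression standard error, n the number of observations, x* the value at which the interval is computed, and t* the critical value with n − 2 degrees of freedom.

```latex
% Confidence interval for the mean response at x^*:
\hat{y} \pm t^* \, SE_{\hat{\mu}}, \qquad
SE_{\hat{\mu}} = s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x - \bar{x})^2}}

% Prediction interval for a single new observation at x^*:
\hat{y} \pm t^* \, SE_{\hat{y}}, \qquad
SE_{\hat{y}} = s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x - \bar{x})^2}}
```

The extra 1 under the square root accounts for the variation of an individual observation around the mean, so $SE_{\hat{y}} > SE_{\hat{\mu}}$ whenever s > 0.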

Prediction intervals

True or false: If we give a prediction interval for one home whose size is 1500 ft², this interval estimates the mean selling price for all homes whose size is 1500 ft².

a) True

b) False

Prediction intervals (answer)

True or false: If we give a prediction interval for one home whose size is 1500 ft², this interval estimates the mean selling price for all homes whose size is 1500 ft².

a) True

b) False

Prediction intervals

True or false: If we compute a prediction interval for one home whose size is 1100 ft² and a 95% confidence interval for the mean selling price of all homes whose size is 1100 ft², the centers of the intervals will be the same.

a) True

b) False

Prediction intervals (answer)

True or false: If we compute a prediction interval for one home whose size is 1100 ft² and a 95% confidence interval for the mean selling price of all homes whose size is 1100 ft², the centers of the intervals will be the same.

a) True

b) False
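Both intervals are built around the same fitted value and differ only in their standard errors, which a small calculation can show. The sketch below fits a least-squares line to made-up house data (sizes and prices are hypothetical, not from the slides) and computes both intervals at 1100 ft². The critical value t* = 3.182 is the 95% value for n − 2 = 3 degrees of freedom.

```python
import math

# Hypothetical data: house size (ft^2) and selling price (thousands of dollars).
sizes = [1000, 1200, 1500, 1800, 2000]
prices = [150, 170, 210, 240, 265]
n = len(sizes)

# Least-squares slope and intercept.
x_bar = sum(sizes) / n
y_bar = sum(prices) / n
sxx = sum((x - x_bar) ** 2 for x in sizes)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes, prices)) / sxx
intercept = y_bar - slope * x_bar

# Regression standard error s, with n - 2 degrees of freedom.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(sizes, prices))
s = math.sqrt(sse / (n - 2))


def intervals(x_star, t_star):
    """Return (confidence interval for the mean, prediction interval) at x_star."""
    y_hat = intercept + slope * x_star
    core = 1 / n + (x_star - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(core)       # standard error for the mean response
    se_pred = s * math.sqrt(1 + core)   # standard error for one new observation
    ci = (y_hat - t_star * se_mean, y_hat + t_star * se_mean)
    pi = (y_hat - t_star * se_pred, y_hat + t_star * se_pred)
    return ci, pi


ci, pi = intervals(x_star=1100, t_star=3.182)
print("confidence interval:", ci)
print("prediction interval:", pi)
```

Both intervals are centered at the fitted value for 1100 ft²; only the prediction interval's width grows to cover the scatter of individual homes.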

Hypothesis tests

Researchers at The Ohio State University wanted to know if they could use the number of beers consumed by a student to predict the student’s blood alcohol content (BAC). The following scatterplot shows the data. In order to know if the number of beers consumed was a good predictor of BAC, they tested $H_0\colon \beta = 0$ against $H_a\colon \beta \neq 0$. From the following table, what is the test statistic for performing this test?

a) 0.0126

b) 0.0180

c) 0.3320

d) 7.4796

e) -1.050

Hypothesis tests (answer)

Researchers at The Ohio State University wanted to know if they could use the number of beers consumed by a student to predict the student’s blood alcohol content (BAC). The following scatterplot shows the data. In order to know if the number of beers consumed was a good predictor of BAC, they tested $H_0\colon \beta = 0$ against $H_a\colon \beta \neq 0$. From the following table, what is the test statistic for performing this test?

a) 0.0126

b) 0.0180

c) 0.3320

d) 7.4796

e) -1.050
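The test statistic for a hypothesis that the slope is zero is the estimated slope divided by its standard error, compared against a t distribution with n − 2 degrees of freedom. The sketch below shows the calculation with hypothetical values; they are not the entries of the slide's table, and the function name is illustrative.

```python
def slope_t_statistic(slope_estimate, slope_standard_error):
    """t = b / SE_b for testing that the population slope is 0."""
    return slope_estimate / slope_standard_error


# Hypothetical values (not taken from the slide's table).
t = slope_t_statistic(slope_estimate=1.5, slope_standard_error=0.5)
print(t)
```

In regression output this ratio usually appears in the slope's row, next to the estimate and its standard error, so it can be read off directly or recomputed as a check.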

Hypothesis tests

Researchers at The Ohio State University wanted to know if they could use the number of beers consumed by a student to predict the student’s blood alcohol content (BAC). In order to know if the number of beers consumed was a good predictor of BAC, they tested $H_0\colon \beta = 0$ against $H_a\colon \beta \neq 0$. What can we conclude from the following table?

a) Because the P-value is 0.3320, there is a significant linear relationship between the number of beers consumed and BAC.

b) Because the P-value is 0.0000, there is a significant linear relationship between the number of beers consumed and BAC.

c) Because the P-value is 0.3320, there is no significant linear relationship between the number of beers consumed and BAC.

d) Because the P-value is 0.0000, there is no significant linear relationship between the number of beers consumed and BAC.

Hypothesis tests (answer)

Researchers at The Ohio State University wanted to know if they could use the number of beers consumed by a student to predict the student’s blood alcohol content (BAC). In order to know if the number of beers consumed was a good predictor of BAC, they tested $H_0\colon \beta = 0$ against $H_a\colon \beta \neq 0$. What can we conclude from the following table?

a) Because the P-value is 0.3320, there is a significant linear relationship between the number of beers consumed and BAC.

b) Because the P-value is 0.0000, there is a significant linear relationship between the number of beers consumed and BAC.

c) Because the P-value is 0.3320, there is no significant linear relationship between the number of beers consumed and BAC.

d) Because the P-value is 0.0000, there is no significant linear relationship between the number of beers consumed and BAC.

Prediction

Researchers at The Ohio State University wanted to know if they could use the number of beers consumed by a student to predict the student’s blood alcohol content (BAC). We want to predict the mean BAC for students who have had seven beers. Should we use the 95% confidence interval for $\mu_y$, which is (0.0976, 0.1290), or the 95% prediction interval for Y for X = x*, which is (0.0667, 0.1599)?

a) Confidence interval

b) Prediction interval

Prediction (answer)

Researchers at The Ohio State University wanted to know if they could use the number of beers consumed by a student to predict the student’s blood alcohol content (BAC). We want to predict the mean BAC for students who have had seven beers. Should we use the 95% confidence interval for $\mu_y$, which is (0.0976, 0.1290), or the 95% prediction interval for Y for X = x*, which is (0.0667, 0.1599)?

a) Confidence interval

b) Prediction interval
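The two intervals quoted in the question can be compared directly. The sketch below uses only the endpoints given above and checks two things: whether the intervals share a center, and which one is wider. The helper names are illustrative.

```python
confidence_interval = (0.0976, 0.1290)  # for the mean BAC at seven beers
prediction_interval = (0.0667, 0.1599)  # for one student's BAC at seven beers


def center(interval):
    """Midpoint of an interval (lower, upper)."""
    return (interval[0] + interval[1]) / 2


def width(interval):
    """Length of an interval (lower, upper)."""
    return interval[1] - interval[0]


# Compare centers after rounding to the four decimals the slide reports.
same_center = round(center(confidence_interval), 4) == round(center(prediction_interval), 4)
prediction_is_wider = width(prediction_interval) > width(confidence_interval)
print(same_center, prediction_is_wider)
```

Both intervals are centered on the same fitted BAC; the confidence interval is the narrower one because it describes the mean of all such students rather than a single student.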

Conclusions

The following scatterplot shows a linear regression analysis of the relationship between the time (in seconds), y, to run a marathon versus the year the marathon was run, x. A statistics student used the regression equation y = 337,047 – 165.6809x to predict how fast the marathon would be run in 2004. She got an answer of 5022 seconds, or about 1 hour and 24 minutes. This conclusion is:

a) Believable because the results came from the regression equation.

b) Believable because looking at the graph you can see that the time to run a marathon is indeed decreasing.

c) Unbelievable because no one will ever be able to run a marathon that quickly.

d) Unbelievable because using 2004 to predict the running time would be considered extrapolation.

Conclusions (answer)

The following scatterplot shows a linear regression analysis of the relationship between the time (in seconds), y, to run a marathon versus the year the marathon was run, x. A statistics student used the regression equation y = 337,047 – 165.6809x to predict how fast the marathon would be run in 2004. She got an answer of 5022 seconds, or about 1 hour and 24 minutes. This conclusion is:

a) Believable because the results came from the regression equation.

b) Believable because looking at the graph you can see that the time to run a marathon is indeed decreasing.

c) Unbelievable because no one will ever be able to run a marathon that quickly.

d) Unbelievable because using 2004 to predict the running time would be considered extrapolation.
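The student's arithmetic can be checked from the equation quoted in the question. The sketch below reproduces the 2004 prediction and then asks in which year the same line would predict a running time of zero, which illustrates how a fitted line can stop making sense once it is pushed beyond the range of the data. The variable names are illustrative.

```python
INTERCEPT = 337_047   # seconds, from the equation quoted in the question
SLOPE = -165.6809     # seconds per year, from the same equation


def predicted_seconds(year):
    """Predicted marathon time (seconds) for a given year."""
    return INTERCEPT + SLOPE * year


seconds_2004 = predicted_seconds(2004)
hours, remainder = divmod(round(seconds_2004), 3600)
minutes = round(remainder / 60)

# Year at which the line predicts a marathon time of zero seconds.
zero_year = INTERCEPT / -SLOPE
print(round(seconds_2004), hours, minutes, zero_year)
```

The line gives roughly 5022 seconds for 2004, matching the student's figure, and it reaches zero seconds during the 2030s, an impossible result.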

Conclusions

An article in a newspaper said that students who major in subjects that have higher expected incomes after graduation are more likely to be married. This conclusion is:

a) Correct because the data were collected in a scientific way.

b) Incorrect because the results are likely biased due to lurking variables.

c) Not reliable because it does not sound plausible.

Conclusions (answer)

An article in a newspaper said that students who major in subjects that have higher expected incomes after graduation are more likely to be married. This conclusion is:

a) Correct because the data were collected in a scientific way.

b) Incorrect because the results are likely biased due to lurking variables.

c) Not reliable because it does not sound plausible.

Relationships

The following plot shows a person’s score on a sobriety test versus their blood alcohol content. Which statement is NOT true about this plot?

a) An outlier is present in the dataset.

b) A relationship exists between BAC and the test score.

c) The relationship could be modeled with a straight line.

d) There is a positive relationship between the two variables.

Relationships (answer)

The following plot shows a person’s score on a sobriety test versus their blood alcohol content. Which statement is NOT true about this plot?

a) An outlier is present in the dataset.

b) A relationship exists between BAC and the test score.

c) The relationship could be modeled with a straight line.

d) There is a positive relationship between the two variables.

Conclusions

The average height of people in the United States has been increasing for decades. Similarly, there is evidence that the number of plant species has been decreasing over these decades. An appropriate conclusion to draw from these observations would be that:

a) Even though they appear to be associated, we could not conclude association.

b) Growing adults are causing the number of plant species to decrease.

c) There is a positive relationship between the two variables.

Conclusions (answer)

The average height of people in the United States has been increasing for decades. Similarly, there is evidence that the number of plant species has been decreasing over these decades. An appropriate conclusion to draw from these observations would be that:

a) Even though they appear to be associated, we could not conclude association.

b) Growing adults are causing the number of plant species to decrease.

c) There is a positive relationship between the two variables.