Chapter 9: Regression Wisdom AP Statistics


Issues and Problems with Regression

• Subsets and curves

• Dangers of extrapolation

• Possible effects of outliers, high leverage, and influential points

• Problems with regression of summary data

• Mistakes of inferring causation

What else can residuals tell us?

• Histograms (and other graphs) of residuals can reveal “Subsets” of data that will enhance our understanding of the original data.

• This may lead us to analyze the “subsets” separately.
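The idea that residuals can expose subsets can be sketched with a small made-up data set. Everything below is an assumption for illustration (the two groups, their values, and the helper name `fit_line` are hypothetical, not from the slides): two groups share a slope but sit at different levels, and one line is fit to all of them.

```python
from collections import Counter
from statistics import mean


def fit_line(xs, ys):
    """Least-squares (slope, intercept) for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical data: both groups have slope 2 but different levels.
xs_a = list(range(1, 11))
ys_a = [2 * x + 5 for x in xs_a]
xs_b = list(range(1, 11))
ys_b = [2 * x - 5 for x in xs_b]

# Fit a single line to the pooled data and compute residuals.
xs, ys = xs_a + xs_b, ys_a + ys_b
slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Crude text histogram of the rounded residuals: two separate clumps.
counts = Counter(round(r) for r in residuals)
for value in sorted(counts):
    print(f"{value:+3d} {'*' * counts[value]}")

# Analyzing the subsets separately recovers each group's own line.
print("group A:", fit_line(xs_a, ys_a))
print("group B:", fit_line(xs_b, ys_b))
```

In this sketch the pooled residuals form two clumps with nothing near zero, which is the kind of pattern that suggests fitting each subset on its own.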

What else can residuals tell us?

[Figures: histogram of residuals; scatterplot of residuals]

Hard to See Curves

Sometimes the scatterplot looks “straight enough”, but a non-linear relationship only comes to light after you look at the residual plot.
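A minimal sketch of this effect, using made-up data with a slight bend (the values and the helper name `fit_line` are assumptions for illustration only): the line explains more than 99% of the variation, yet the residuals are positive at both ends and negative in the middle.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares (slope, intercept) for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical data: mostly linear, with a small quadratic bend.
xs = list(range(0, 11))
ys = [2 * x + 0.1 * x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# R-squared = 1 - SSE / SST
y_bar = mean(ys)
sse = sum(r ** 2 for r in residuals)
sst = sum((y - y_bar) ** 2 for y in ys)
r_squared = 1 - sse / sst

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r_squared:.4f}")
for x, r in zip(xs, residuals):
    print(f"x = {x:2d}  residual = {r:+.2f}")
```

The U-shaped run of signs in the residual column is what a residual plot would show as a curve.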

Extrapolation

• The farther our x value is from the mean of x, the less we trust our predicted value.

• Once we venture into new x territory our predicted value is an extrapolation.

• Our extrapolations are not reliable because we are operating under the assumption that the relationship between x and y has not changed, even for these extreme values of x.

• Be especially wary of extrapolating into the future!
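The bullets above can be illustrated with a small sketch. The data are hypothetical: y is assumed to follow 10·sqrt(x), which levels off, and the line is fit only on x from 1 to 9. The names `true_y`, `fit_line`, and `predict` are made up for this example.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Least-squares (slope, intercept) for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def true_y(x):
    """Assumed true relationship (hypothetical): growth that levels off."""
    return 10 * sqrt(x)


# Fit a line using only x values from 1 to 9.
xs = list(range(1, 10))
ys = [true_y(x) for x in xs]
slope, intercept = fit_line(xs, ys)


def predict(x):
    return intercept + slope * x


# Compare the line with the assumed truth inside and outside the data range.
for x in (5, 12, 100):
    print(f"x = {x:3d}  predicted = {predict(x):7.1f}  actual = {true_y(x):6.1f}")
```

Near the mean of x the line tracks the data closely; far outside the observed range it keeps climbing at the same rate while the assumed relationship does not.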

Outliers, Leverage and Influence

Unusual point vocabulary:

High Leverage Points: Points that have an x value that is far from x̄ (the mean of the x values).

Influential Points: Points that change the model (change the slope of the line).

High leverage points can also be influential, but do not need to be.
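One common way to put a number on "far from x̄" in simple linear regression is the leverage h_i = 1/n + (x_i − x̄)² / Σ(x_j − x̄)². This formula is not from the slides; it is offered as a sketch, with hypothetical x values and a made-up function name `leverages`.

```python
from statistics import mean


def leverages(xs):
    """Leverage of each point in simple linear regression:
    h_i = 1/n + (x_i - x_bar)^2 / sum_j (x_j - x_bar)^2
    """
    n = len(xs)
    x_bar = mean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]


# Hypothetical x values: five clustered points and one far from the rest.
xs = [1, 2, 3, 4, 5, 20]
h = leverages(xs)

for x, h_i in zip(xs, h):
    print(f"x = {x:2d}  leverage = {h_i:.3f}")
```

Leverage depends only on the x values, which matches the vocabulary above: a point can have high leverage whether or not its y value agrees with the pattern.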

Outliers, Leverage and Influence

Three types of unusual points:

1. High leverage points with small residuals. These points confirm the pattern, but have extreme x values. The slope and intercept are mostly unaffected, but the R-squared value will increase. Don’t be misled into thinking that the model is now stronger.

Outliers, Leverage, and Influence

2. Outliers (not high leverage, not influential, large residual): The x value is near the mean of the x values. These points do not affect the slope, but they aren’t consistent with the pattern, and they will change the intercept. Don’t throw them away.

Outliers, Leverage, and Influence

3. Influential points (also high leverage, and probably a small residual because the point pulls the line toward itself): These are the most troublesome. They aren’t consistent with the pattern of the other points, and if the point is removed the slope of the line changes dramatically, which changes the model. Don’t throw it out without thinking.
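The three cases can be compared side by side in a short sketch. The base data and the three added points are hypothetical, chosen only to illustrate each case; `fit` is a made-up helper name.

```python
from statistics import mean


def fit(points):
    """Return (slope, intercept, r_squared) of the least-squares line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar, sxy ** 2 / (sxx * syy)


# Hypothetical base data: roughly y = 1 + 2x for x = 1..10.
noise = [0.5, -0.5, 0.3, -0.3, 0.4, -0.4, 0.2, -0.2, 0.1, -0.1]
base = [(x, 1 + 2 * x + e) for x, e in zip(range(1, 11), noise)]

cases = {
    "base": base,
    "leverage": base + [(30, 61)],     # 1. far in x, on the pattern
    "outlier": base + [(5.5, 40)],     # 2. x at the mean, y far off
    "influential": base + [(30, 10)],  # 3. far in x, off the pattern
}
results = {name: fit(points) for name, points in cases.items()}

for name, (slope, intercept, r2) in results.items():
    print(f"{name:12s} slope = {slope:6.3f}  intercept = {intercept:6.3f}  R^2 = {r2:.4f}")

# Residual of the influential point from the line it helped create.
s, i, _ = results["influential"]
influential_residual = 10 - (i + s * 30)
print(f"residual of influential point: {influential_residual:+.2f}")
```

In this sketch the influential point ends up with a residual much smaller than its distance from the base line, because the fitted line has been pulled toward it.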

Lurking Variables and Causation

• With observational data, as opposed to designed experiments, there is no way to be sure that a lurking variable is not the cause of any apparent association.

• The lurking variable is some third variable (neither the explanatory nor the response variable) that is driving both variables you have observed.

Lurking Variables and Causation

[Diagram: z is the lurking variable]

Lurking Variables and Causation

Many studies have shown a strong positive association between hours spent in religious activities (going to church, attending religious classes, praying, etc.) and life expectancy. This is NOT CAUSATION. There is confounding: on average, people who attend religious activities also take better care of themselves than those who do not. They are less likely to smoke, more likely to exercise, and less likely to be overweight. The effects of these good habits (lurking variables) are confounded with any direct effect of attending religious activities.
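A simulation can make the lurking-variable idea concrete. The sketch below is an assumption-laden toy, not real data: a lurking variable z drives both x and y, and x has no effect on y at all. The variable names, noise levels, and the helper `correlation` are all hypothetical.

```python
import random
from statistics import mean


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# z is the lurking variable; x and y each equal z plus independent noise.
z = [rng.gauss(0, 1) for _ in range(n)]
noise_x = [rng.gauss(0, 0.5) for _ in range(n)]
noise_y = [rng.gauss(0, 0.5) for _ in range(n)]
x = [zi + e for zi, e in zip(z, noise_x)]
y = [zi + e for zi, e in zip(z, noise_y)]

r_xy = correlation(x, y)
# What is left of x and y once z is removed: just the independent noise.
r_without_z = correlation(noise_x, noise_y)

print(f"r between x and y:          {r_xy:.2f}")
print(f"r after removing z's part:  {r_without_z:.2f}")
```

Here x and y are strongly associated even though neither influences the other, and the association largely disappears once z's contribution is taken out.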

Working With Summary Values

• Be cautious when working with data values that are summaries, such as means and medians.

• Summary values vary less than the individual values they summarize, so a regression on them overstates the strength of the relationship (correlation).
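The inflation can be seen in a small simulation. The setup is entirely hypothetical (10 groups of 200 individuals, made-up centers and noise levels, and a made-up helper `correlation`): individuals scatter widely around their group's center, and the correlation is computed once on all individuals and once on the group means.

```python
import random
from statistics import mean


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)  # fixed seed so the run is repeatable

all_x, all_y = [], []      # individual values
mean_x, mean_y = [], []    # one summary point per group

# Hypothetical groups: centers rise steadily, individuals scatter widely.
for g in range(10):
    center = 66 + 2 * g
    gx = [center + rng.gauss(0, 10) for _ in range(200)]
    gy = [0.8 * center + 14 + rng.gauss(0, 10) for _ in range(200)]
    all_x += gx
    all_y += gy
    mean_x.append(mean(gx))
    mean_y.append(mean(gy))

r_individual = correlation(all_x, all_y)
r_means = correlation(mean_x, mean_y)

print(f"r using all individuals: {r_individual:.2f}")
print(f"r using group means:     {r_means:.2f}")
```

Averaging removes most of the individual scatter, so the group means line up far more tightly than the individuals do.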

Summary Data

MeanFin = 0.808 MeanMid + 13.6; r² = 0.98

[Figure: “Collection 2” scatter plot of MeanFin against MeanMid, with the residual plot below]

All Data Points

Fin = 0.0500 Mid + 72; r² = 0.00052

[Figure: “Collection 2” scatter plot of Fin against Mid, with the residual plot below]