Regression and Prediction Chapter 15 plus extra May 2, 2012 Prediction Vertical Chimneys Regression Line Equation of the Regression Line Regression and Least Squares Regression Fallacy

Regression and Prediction - UW Faculty Web Server · faculty.washington.edu/grover4/class11.pdf · 2012-04-29

Regression and Prediction

Chapter 15 plus extra

May 2, 2012

Prediction

Vertical Chimneys

Regression Line

Equation of the Regression Line

Regression and Least Squares

Regression Fallacy

1.0 Prediction

If we have two quantitative variables X and Y that are linearly related to each other, then knowing the particular value of X for one individual can help us to estimate (or predict) the value of Y for that individual.

We will explore the best prediction of the response variable (Y) given a value of the explanatory variable (X).

What is the likely size of the prediction error?

1.1 Fundamental Principle of Prediction

Incoming students at a large law school have an average L.S.A.T. score of 163 and an S.D. of 8. You may assume the histogram of these data follows a normal curve approximately. Tomorrow one of these students will be chosen at random.

What is your best guess for their score?

The guess will be compared to their actual score to see how far off it is. What is the likely size for the error in your guess?
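The principle behind this question can be checked numerically. The sketch below is an illustration added here, not part of the original slides. It assumes scores drawn from a normal curve with average 163 and S.D. 8 (the simulated sample and the names `rms_error` and `scores` are hypothetical) and compares the root-mean-square error of several guesses.

```python
import math
import random

def rms_error(guess, values):
    """Root-mean-square distance between a single guess and each value."""
    return math.sqrt(sum((v - guess) ** 2 for v in values) / len(values))

# Assumption: simulated L.S.A.T. scores following a normal curve
# with average 163 and S.D. 8, as described in the slide.
rng = random.Random(0)
scores = [rng.gauss(163, 8) for _ in range(10_000)]

sample_mean = sum(scores) / len(scores)
errors = {guess: rms_error(guess, scores)
          for guess in (155, 160, sample_mean, 166, 171)}

for guess, err in errors.items():
    print(f"guess {guess:7.2f} -> r.m.s. error {err:5.2f}")
```

Among the guesses tried, the average gives the smallest r.m.s. error, and that error is the S.D. of the scores.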

2.0 Vertical Chimneys in a Scatterplot

[Figure: two scatterplots. x-axis: Father's height (inches), 55 to 80. y-axis: Son's height (inches), 60 to 75.]

2.0 Vertical Chimneys in a Scatterplot

[Figure: scatterplot. x-axis: Father's height (inches), 55 to 80. y-axis: Son's height (inches), 60 to 75.]

The graph of averages shows the average son's height for each father's height.

It is close to a straight line in the middle.

At the ends, it is quite bumpy.
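One way to build a graph of averages is sketched below. This is an illustration only: the `pairs` data are hypothetical, the function name `graph_of_averages` is made up, and grouping fathers by height rounded to the nearest inch is an assumption about how the chimneys are formed.

```python
from collections import defaultdict

def graph_of_averages(pairs):
    """Average y for each x rounded to the nearest whole unit."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[round(x)].append(y)
    return {x: sum(ys) / len(ys) for x, ys in sorted(groups.items())}

# Hypothetical (father, son) heights in inches, for illustration only.
pairs = [(64.2, 66.0), (63.9, 67.5), (68.1, 68.0), (67.8, 70.0),
         (68.3, 69.0), (72.0, 71.5), (71.6, 70.5)]

averages = graph_of_averages(pairs)
for father, son_avg in averages.items():
    print(father, round(son_avg, 2))
```

Chimneys near the ends of the range usually contain few points, which is one reason the graph of averages tends to be bumpy there.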

2.1 Prediction in a Scatterplot

Use the mean of the relevant sub-group of data as our predictor.

S.D. of the group gives the “likely size” of the error in our prediction.
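A minimal sketch of this rule follows. It assumes the "relevant sub-group" is the vertical chimney of points whose X lies within half a unit of the value of interest. The data, the `half_width` parameter, and the name `chimney_prediction` are hypothetical, and the S.D. divides by the number of points in the group.

```python
import math

def chimney_prediction(pairs, x_value, half_width=0.5):
    """Mean and S.D. of y among points whose x lies within half_width of x_value."""
    ys = [y for x, y in pairs if abs(x - x_value) <= half_width]
    if not ys:
        raise ValueError("no data points in this chimney")
    mean = sum(ys) / len(ys)
    sd = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys))
    return mean, sd

# Hypothetical (father, son) heights in inches, for illustration only.
pairs = [(67.8, 68.0), (68.1, 70.0), (68.3, 69.0), (72.0, 71.5), (64.2, 66.0)]

prediction, likely_error = chimney_prediction(pairs, 68)
print(f"predicted son's height: {prediction:.1f} in., likely error: {likely_error:.2f} in.")
```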

3.0 Regression Line

[Figure: scatterplot. x-axis: Father's height (inches), 55 to 80. y-axis: Son's height (inches), 60 to 75.]

The regression line is a line fit to the graph of averages.

It smooths away some of the chance variation in the data.

If the graph of averages is close to a straight line, then we use the regression line to predict Y for a given X.

If the graph of averages is non-linear, it is better to use it instead.

3.1 Predicting using a Regression Line

Estimate the average weight of the men whose height is 69 inches.

If you used the regression method to estimate weight from height, would your estimates generally be a little too high, low or about right, for men in the sample with height between 72 in. and 74 in.?

4.0 The Regression Line

The regression line for predicting Y from X passes through the point of averages $(\bar{X}, \bar{Y})$ and has slope

$$r \times \frac{\text{S.D. of } Y}{\text{S.D. of } X}.$$

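5.0 The Equation of the Regression Line

The regression line for predicting Y from X has the form:

$$Y = a + bX = \text{intercept} + \text{slope} \times X.$$

Here b is the slope,

$$b = r \times \frac{\text{S.D. of } Y}{\text{S.D. of } X},$$

and a is the intercept,

$$a = \bar{Y} - b\bar{X} = \bar{Y} - \left(r \times \frac{\text{S.D. of } Y}{\text{S.D. of } X}\right)\bar{X}.$$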
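The formulas for the slope and intercept can be applied directly to raw data. The sketch below is illustrative only: the function name `regression_line` and the five data points are hypothetical, and the S.D. is computed with a divisor of n (an assumption about the convention in use).

```python
import math

def regression_line(xs, ys):
    """Return (intercept a, slope b) with b = r * SD_y / SD_x and a = mean_y - b * mean_x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    # r is the average of the products of the values in standard units.
    r = sum((x - mean_x) / sd_x * (y - mean_y) / sd_y
            for x, y in zip(xs, ys)) / n
    b = r * sd_y / sd_x
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = regression_line(xs, ys)
print(f"intercept = {a:.2f}, slope = {b:.2f}")
```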

5.1 Prediction from a Regression Line

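The predicted value of Y for a given value of X, say $X^*$, has the form:

$$Y = a + bX^* = \left(\bar{Y} - r \times \frac{\text{S.D. of } Y}{\text{S.D. of } X}\,\bar{X}\right) + \left(r \times \frac{\text{S.D. of } Y}{\text{S.D. of } X}\right)X^*.$$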
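Collecting the two terms that contain the slope gives an equivalent form that is often easier to use by hand. This rearrangement is added here for clarity and is not stated on the slide:

$$Y = \bar{Y} + r \times \frac{\text{S.D. of } Y}{\text{S.D. of } X}\,\left(X^* - \bar{X}\right).$$

In words: start at the average of Y, then move up or down by the slope times the distance of $X^*$ from the average of X.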

5.2 Predicting Sons’ Heights

The heights of 1,078 father-son pairs were measured.

  • Average height of fathers is ≈ 68 in.
  • S.D. of height of fathers is ≈ 2.7 in.
  • Average height of sons is ≈ 69 in.
  • S.D. of height of sons is ≈ 2.8 in.
  • r is ≈ 0.5.

What are the co-ordinates for the point of averages?

What is the slope of the regression line?

What is the intercept of the regression line?

Write the equation of the regression line.

Suppose a father has a height of 72 inches. What would you predict for his son’s height?

Suppose a father has a height of 62 inches. What wouldyou predict for his son’s height?
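These questions can be worked through with a few lines of arithmetic. The sketch below uses only the approximate summary statistics quoted above, with father's height as X and son's height as Y. The variable and function names are hypothetical, and the results inherit the rounding of those statistics.

```python
# Summary statistics quoted in the slide (all approximate).
mean_father, sd_father = 68.0, 2.7
mean_son, sd_son = 69.0, 2.8
r = 0.5

# Slope = r * (S.D. of Y) / (S.D. of X); intercept = mean of Y - slope * mean of X.
slope = r * sd_son / sd_father
intercept = mean_son - slope * mean_father

def predict_son_height(father_height):
    """Regression estimate of a son's height (inches) from his father's height."""
    return intercept + slope * father_height

print(f"slope = {slope:.2f}, intercept = {intercept:.1f}")
for height in (72, 62):
    print(f"father {height} in. -> predicted son {predict_son_height(height):.1f} in.")
```

With these inputs the slope is about 0.52 and the intercept about 33.7 inches, giving predictions of roughly 71.1 inches and 65.9 inches. Both predictions lie closer to the sons' average than the fathers' heights lie to the fathers' average.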

5.3 Interpreting the Regression Coefficients

Associated with a unit increase in X, there is some average change in Y. The slope of the regression line estimates this change. The formula for the slope is:

$$r \times \frac{\text{S.D. of } Y}{\text{S.D. of } X}$$

That is, associated with an increase of one S.D. in X, there is an increase of r S.D.s in Y, on the average.

The intercept is just the predicted value for Y when X equals zero. Be wary of extrapolation: X = 0 often lies far outside the range of the data.
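The link between the slope formula and the statement about S.D.s can be checked in one line (added here as a sketch, with b denoting the slope). Raising X by one S.D. of X changes the predicted Y by the slope times that amount:

$$b \times (\text{S.D. of } X) = r \times \frac{\text{S.D. of } Y}{\text{S.D. of } X} \times (\text{S.D. of } X) = r \times (\text{S.D. of } Y).$$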

6.0 Regression and Least Squares

The regression line is often called the least squares line. This is because, among all lines, it minimizes the sum of the squares of the vertical distances from the data points to the line.

[Figure: a data point and its vertical distance to the regression line, plotted on x and y axes.]
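The least squares property can be spot-checked numerically. The sketch below uses five hypothetical data points whose regression line (slope r × S.D. of y / S.D. of x = 0.6, intercept 2.2) was worked out by hand. It compares that line with a handful of nearby lines, which is an illustration rather than a proof.

```python
def sum_squared_errors(a, b, xs, ys):
    """Sum of squared vertical distances from the points to the line y = a + b x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Regression line for these points: intercept 2.2, slope 0.6.
a, b = 2.2, 0.6
best = sum_squared_errors(a, b, xs, ys)

# Nudge the intercept and slope in each direction and recompute the sum.
nearby = [sum_squared_errors(a + da, b + db, xs, ys)
          for da in (-0.5, 0.0, 0.5)
          for db in (-0.2, 0.0, 0.2)
          if (da, db) != (0.0, 0.0)]

print(f"regression line: {best:.2f}; best nearby line: {min(nearby):.2f}")
```

Every perturbed line gives a larger sum of squared vertical distances than the regression line.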

7.0 The Regression Fallacy

In virtually every scatterplot with less than perfect correlation, the data points that are extreme along the x axis tend not to be as extreme on the y axis. This is called the regression effect.

Definition: Thinking that the regression effect must be due to something important, not just chance error, is called the regression fallacy.

7.1 Example

An instructor standardizes both her midterm and the final each semester so the class average is 50 and the S.D. is 10 on both tests. The correlation between the tests is around 0.5. One semester she took all the students who scored below 30 on the midterm and gave them special tutoring. On average, they gained 10 points on the final. She claims that her tutoring worked. Can you give her an alternative explanation?
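A small simulation can suggest how large a gain the regression effect alone might produce. The sketch below is an illustration under stated assumptions, not the instructor's data: scores are simulated from a bivariate normal model with average 50, S.D. 10 and correlation 0.5, and no tutoring effect is included at all.

```python
import math
import random

rng = random.Random(1)
r, mean, sd = 0.5, 50.0, 10.0

gains = []
for _ in range(200_000):
    z_mid = rng.gauss(0, 1)
    # Final score in standard units: correlated with the midterm at r, plus chance error.
    z_final = r * z_mid + math.sqrt(1 - r ** 2) * rng.gauss(0, 1)
    midterm = mean + sd * z_mid
    final = mean + sd * z_final
    # Keep only the students who would have been selected for tutoring.
    if midterm < 30:
        gains.append(final - midterm)

average_gain = sum(gains) / len(gains)
print(f"{len(gains)} students below 30; average gain {average_gain:.1f} points")
```

Under these assumptions, students selected for scoring below 30 gain about 12 points on average with no tutoring, so a 10-point gain is not by itself evidence that the tutoring worked.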