1
The R.M.S. Error
Regression for Prediction
2
Prediction Errors
y
x
Regression Line
Error or “residual”
3
A Measure of the Size of the Errors: RMS Error
$$\text{RMS Error} = \sqrt{\frac{(\text{error}_1)^2 + (\text{error}_2)^2 + \cdots + (\text{error}_n)^2}{n}}$$
where $n$ is the number of errors.
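A minimal Python sketch of this definition follows. The function name `rms_error` and the sample residuals are made up for illustration; they are not from the slides.

```python
import math

def rms_error(errors):
    """Square each error, average the squares, then take the square root."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical residuals, for illustration only.
residuals = [3.0, -4.0, 0.0, 4.0, -3.0]
print(rms_error(residuals))  # square root of 50/5 = 10
```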
4
Quantitative GMAT Predicts MBA GPA? (r = .34)
[Figure: scatter diagram of MBA GPA (vertical axis, 0.5 to 4.5) against Quantitative GMAT (horizontal axis, 10 to 60).]
5
RMS Error = .46
[Figure: scatter diagram of MBA GPA against Quantitative GMAT, with two lines drawn in: the regression line plus one RMS error and the regression line minus one RMS error.]
6
Interpretation of the RMS Error
• It can be shown that the residuals have average = 0. The RMS error is thus their SD.
• Rule of thumb: about 68% of the residuals are smaller in magnitude than one RMS error, and about 95% are smaller in magnitude than two RMS errors.
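The first bullet can be checked numerically. The sketch below fits a least-squares regression line to a small made-up data set (the `xs` and `ys` values are hypothetical) and confirms that the residuals average out to 0, so their RMS and their SD agree.

```python
import math

# Hypothetical (x, y) data, made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept of the regression line.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed minus predicted.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

mean_resid = sum(residuals) / n
rms = math.sqrt(sum(e * e for e in residuals) / n)
sd = math.sqrt(sum((e - mean_resid) ** 2 for e in residuals) / n)

print(mean_resid)  # zero up to floating-point rounding
print(rms, sd)     # the two agree because the mean residual is zero
```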
7
The RMS error is a measure of the error around the regression line, in the same sense that the SD is a measure of variability around the mean.
8
[Figure, two panels. Scatter Diagram: MBA GPA against Quantitative GMAT. Histogram of Residuals: counts (0 to 300) of residuals (-2.5 to 1.5), with the RMS error marked.]
9
Match the RMS error: .2 1 5
10
Predicting MBA GPA
• Using the GMAT, the error measure would be the RMS error = .46
• Without knowledge of the GMAT, the average would be your best prediction. A measure of the error would be the SD of MBA GPAs. SD = .49
So you don’t gain much by using the GMAT.
11
RMS Error, Correlation, the SD of Y: The Picture
[Figure, two panels, each plotting Y against X: Low Correlation and High Correlation.]
12
RMS Error, Correlation, and the SD of Y: The Formula
$$\text{RMS Error} = \sqrt{1 - r^2} \times \text{SD of } Y$$
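As a quick check, the formula can be applied to the GMAT numbers quoted earlier (r = .34, SD of MBA GPA = .49). The function name `rms_error_from_r` is an illustrative choice, not something from the slides.

```python
import math

def rms_error_from_r(r, sd_y):
    """RMS error of the regression line: sqrt(1 - r^2) times the SD of Y."""
    return math.sqrt(1 - r ** 2) * sd_y

# Values quoted in the slides for predicting MBA GPA from Quantitative GMAT.
print(round(rms_error_from_r(0.34, 0.49), 2))  # 0.46

# Extremes: with r = 0 the RMS error equals the SD of Y; with r = 1 it is 0.
print(rms_error_from_r(0.0, 0.49), rms_error_from_r(1.0, 0.49))
```

With a correlation as low as .34, the factor in front of the SD is about .94, which is why the RMS error (.46) is barely smaller than the SD (.49).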
13
Residual Plot: Focus on Prediction Errors
Residual = Observed minus Predicted
[Figure, two panels: Scatter Diagram and Residual Plot. The residual plot has a horizontal line at 0.]
14
Example: Predicting MBA GPA from Quantitative GMAT
[Figure, two panels. Scatter Diagram: MBA GPA (0.5 to 4.5) against Quantitative GMAT (10 to 60). Residual Plot: residuals (-2.5 to 1.5) against Quantitative GMAT.]
15
Predicting MBA GPA from Undergrad GPA
[Figure, two panels. Scatter Diagram: MBA GPA (0.5 to 4.5) against Undergrad GPA (1 to 4). Residual Plot: residuals (-2.5 to 1.5) against Undergrad GPA.]
16
Stream Flow Rate versus Depth
[Figure, two panels: Scatter Diagram and Residual Plot.]
17
Inside Vertical Strips
18
SDs in Vertical Strips
[Figure: scatter diagram of MBA GPA (0.5 to 4.5) against Quantitative GMAT (10 to 60).]
19
Terminology
• Homoscedastic: same RMS errors in each vertical strip.
• Heteroscedastic: different RMS errors in different vertical strips.
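One way to check this in practice is to compute the RMS error separately inside each vertical strip. The sketch below is only an illustration: the helper `rms_by_strip`, the strip width, and the sample points are all hypothetical.

```python
import math
from collections import defaultdict

def rms_by_strip(points, strip_width):
    """Group (x, residual) pairs into vertical strips of the given width and
    return {strip_index: RMS of the residuals in that strip}."""
    strips = defaultdict(list)
    for x, residual in points:
        strips[math.floor(x / strip_width)].append(residual)
    return {
        key: math.sqrt(sum(e * e for e in errs) / len(errs))
        for key, errs in sorted(strips.items())
    }

# Hypothetical (x, residual) pairs: the spread grows with x.
points = [(1, 1.0), (2, -1.0), (11, 3.0), (12, -3.0)]
print(rms_by_strip(points, strip_width=10))  # {0: 1.0, 1: 3.0}
```

Roughly equal values across strips suggest a homoscedastic diagram. Values that change systematically with x, as in this made-up example, suggest a heteroscedastic one.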
20
A Heteroscedastic Scatter Plot
21
A Football-Shaped Scatter Diagram is Homoscedastic?
22
The Normal Curve Approximation within a Vertical Strip: The Picture
[Figure: scatter diagram with the Regression Line and one vertical strip marked.]
The data in this vertical strip have an average given by the regression line and an SD equal to the RMS error. The normal approximation can be used with this average and SD.
23
The Normal Curve Approximation within a Vertical Strip: The Algorithm
• Find the average in the strip from the regression line
• The SD within the strip is the RMS error
• Convert to standard units using this mean and SD
• Refer to table of normal curve
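The four steps can be sketched as a single Python function. This is an illustration under stated assumptions: the function and parameter names are hypothetical, and `math.erf` stands in for the printed normal table, so results can differ slightly from table lookups.

```python
import math

def percent_above_in_strip(x, threshold, mean_x, sd_x, mean_y, sd_y, r):
    """Approximate percent of y-values above `threshold` in the vertical
    strip at `x`, assuming a football-shaped (homoscedastic) scatter diagram."""
    # Step 1: average in the strip, read off the regression line.
    strip_mean = mean_y + r * ((x - mean_x) / sd_x) * sd_y
    # Step 2: SD in the strip is the RMS error.
    strip_sd = math.sqrt(1 - r ** 2) * sd_y
    # Step 3: convert the threshold to standard units.
    z = (threshold - strip_mean) / strip_sd
    # Step 4: area under the normal curve to the right of z.
    return 100 * 0.5 * (1 - math.erf(z / math.sqrt(2)))

# Sanity check with made-up numbers: at the point of averages, half of the
# strip lies above the average of y.
print(percent_above_in_strip(x=50, threshold=3.0, mean_x=50, sd_x=10,
                             mean_y=3.0, sd_y=0.5, r=0.3))  # 50.0
```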
24
Example
E3: Average height of father = 68 inches; SD = 2.7
Average height of son = 69 inches; SD = 2.7
r = .50
Scatter diagram is football shaped.
Q: What percent of the sons were over 6 feet tall?
6 feet = 72 inches.
$$\text{Standard Unit} = \frac{72 - 69}{2.7} = 1.11$$
25
From the Table:
[Figure: normal curve. The area between -1.10 and 1.10 is 72.87 %, leaving 13.57 % above 1.10.]
So about 14 % of the sons are taller than 72 inches.
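The table lookup can be reproduced in Python. In this sketch `math.erf` replaces the printed table, and the helper name `normal_percent_above` is made up for illustration.

```python
import math

def normal_percent_above(z):
    """Percent of the area under the normal curve to the right of z."""
    return 100 * 0.5 * (1 - math.erf(z / math.sqrt(2)))

z = (72 - 69) / 2.7
print(round(z, 2))  # 1.11

# The table entry at 1.10 reproduces the 13.57 % tail quoted above.
print(round(normal_percent_above(1.10), 2))  # 13.57

# Using the unrounded z gives a slightly smaller tail area.
print(normal_percent_above(z))
```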
26
E3: Average height of father = 68 inches; SD = 2.7
Average height of son = 69 inches; SD = 2.7
r = .50
Scatter diagram is football shaped.
Q: What percent of the 6 foot fathers had sons over 6 feet tall?
27
1. Find the average height of the sons from the regression line:
A 6 foot father is
$$\frac{72 - 68}{2.7} = 1.48$$
SDs higher than the father average.
28
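Regression Line and SD Line
[Figure: along the SD line, a run of one Father SD goes with a rise of one Son SD; along the regression line, the same run goes with a rise of only r x Son SD.]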
29
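[Figure: regression line through the Point of Averages. A run of 1.48 father SDs goes with a rise of r x 1.48 son SDs.]
Average son height in the strip = son average + r x 1.48 son SDs
= 69 + .5 x 1.48 x 2.7 = 71 inches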
30
So the average in this strip is 71 inches. What is the SD?
$$\text{RMS Error} = \sqrt{1 - r^2} \times \text{SD of } Y = \sqrt{1 - .5^2} \times 2.7 = 2.33$$
31
So: In the 72 inch father strip the average son height is 71 inches and the SD is 2.33.
To answer the question, “What percent in this strip are over 6 feet tall?”, use the normal curve.
6 feet = 72 inches
= (72 - 71)/2.33 standard units
= .43 standard units
[Figure: normal curve with 34.73 % in the middle and about 32.6 % in each tail.]
About 33% of the sons of 6 foot fathers are taller than 6 feet.
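The whole calculation for the 6 foot fathers can be retraced step by step in Python. This is a sketch: the variable names are illustrative, `math.erf` replaces the normal table, and the values are left unrounded, so the last digit can differ from the slides, which round at each step.

```python
import math

# Summary statistics from the example.
father_avg, father_sd = 68.0, 2.7
son_avg, son_sd = 69.0, 2.7
r = 0.5

# Step 1: a 72 inch father in standard units (about 1.48).
father_z = (72 - father_avg) / father_sd

# Step 2: average son height in that strip, from the regression line (71 inches).
strip_avg = son_avg + r * father_z * son_sd

# Step 3: SD in the strip is the RMS error.
strip_sd = math.sqrt(1 - r ** 2) * son_sd

# Step 4: 72 inches in standard units within the strip (about .43).
z = (72 - strip_avg) / strip_sd

# Step 5: percent of the strip above 72 inches, from the normal curve.
percent = 100 * 0.5 * (1 - math.erf(z / math.sqrt(2)))

print(father_z, strip_avg, strip_sd, z, percent)
```

The result is close to the 33% read from the table, and noticeably higher than the roughly 14% for all sons: knowing that the father is tall raises the chance that the son is tall.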