logan-thornton
Chapter 8: Linear Regression
Fat Versus Protein: An Example
• 30 items on the Burger King menu:
• A linear model is an equation of a straight line through the data.
  – The points don’t all line up.
  – A straight line can summarize the general pattern.
Residuals
• The scattered points are the actual data.
• The corresponding values on the line are the predicted values, $\hat{y}$.
• A residual is an error: the difference between the actual value and the predicted value (the line).
$\text{residual} = \text{actual} - \text{predicted} = y - \hat{y}$
• A negative residual means the predicted value’s too big (an overestimate).
• A positive residual means the predicted value’s too small (an underestimate).
• A residual of zero means the line predicted the value exactly.
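A minimal sketch of the residual calculation, assuming made-up data and a made-up line (`xs`, `ys`, and the coefficients in `predict` are hypothetical, not the Burger King values):

```python
# Hypothetical (x, y) data and a hypothetical line; not the Burger King values.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.5, 3.5, 6.5, 7.5]


def predict(x, b0=1.0, b1=1.5):
    """Predicted value y-hat on the assumed line y-hat = b0 + b1 * x."""
    return b0 + b1 * x


# residual = actual - predicted
residuals = [y - predict(x) for x, y in zip(xs, ys)]

for x, y, e in zip(xs, ys, residuals):
    if e > 0:
        verdict = "underestimate"   # predicted value too small
    elif e < 0:
        verdict = "overestimate"    # predicted value too big
    else:
        verdict = "exact"
    print(f"x={x}: actual={y}, predicted={predict(x)}, residual={e:+.1f} ({verdict})")
```

The sign of each residual tells you which side of the line the point falls on.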
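“Best Fit” Line
• We want the residuals, taken together, to be as small as possible (minimizing error). Positive and negative residuals cancel when added, so we square each residual and add up the squares.
• The smaller the sum of squared residuals, the better the line fits.
• This is why the linear model is called the line of best fit, the regression line, or the least squares line.
• The line always passes through the point (mean of the x variable, mean of the y variable).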
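A small sketch of what "best fit" means, assuming a made-up data set (`xs`, `ys`) whose least squares line works out to y-hat = 2.2 + 0.6x. It compares the sum of squared residuals for that line against slightly nudged lines and checks that the line passes through the two means:

```python
# Hypothetical toy data; the coefficients below are the least squares
# values for this data only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def sse(b0, b1):
    """Sum of squared residuals for the line y-hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Least squares line for this toy data: y-hat = 2.2 + 0.6x
B0, B1 = 2.2, 0.6
best = sse(B0, B1)

# Nudge the intercept or the slope in either direction: the sum only grows.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
nudged = [sse(B0 + d0, B1 + d1) for d0, d1 in nudges]

# The line passes through (mean of x, mean of y).
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
passes_through_means = abs((B0 + B1 * x_bar) - y_bar) < 1e-9

print(f"best SSE = {best:.2f}, nudged SSEs = {[round(s, 2) for s in nudged]}")
print("passes through the means:", passes_through_means)
```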
Regression Line
• Algebra: $y = mx + b$
• Statistics: $\hat{y} = b_1 x + b_0$
• If the model is a good one, the data values will scatter closely around it.
• Slope ($b_1$) formula: $b_1 = r \frac{s_y}{s_x}$
  – In units of y per unit of x
  – Interpretation:
• Intercept ($b_0$)
  – In units of y
  – Usefulness of intercept:
  – Formula: $b_0 = \bar{y} - b_1 \bar{x}$
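As an illustration, the slope and intercept formulas (slope = r times s_y over s_x, intercept = mean of y minus slope times mean of x) can be sketched in Python. The data and the helper names `correlation` and `regression_line` are assumptions made for this example:

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r: average product of z-scores, dividing by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


def regression_line(xs, ys):
    """Return (b0, b1) using b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    r = correlation(xs, ys)
    b1 = r * stdev(ys) / stdev(xs)
    b0 = mean(ys) - b1 * mean(xs)
    return b0, b1


# Hypothetical data, not the Burger King menu.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = regression_line(xs, ys)
print(f"y-hat = {b1:.2f} x + {b0:.2f}")
```

The slope comes out in units of y per unit of x, and the intercept in units of y, as the bullets above say.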
Correlation and the Line
• The correlation coefficient tells us “how linear” the relationship is.
• So we will use r in the calculation of the slope.
• Neg. slope = neg. correlation
• Pos. slope = pos. correlation
• Moving 1 st. dev. away from the mean in x moves us r st. dev. away from the mean in y (and vice versa).
Ex: If you’re 1.5 st. dev. above avg. in GPA, then how many st. dev. above avg. would you predict on the SAT?
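The GPA example can be sketched as a one-line calculation: the predicted number of standard deviations in y is r times the number of standard deviations in x. The slides do not give r for GPA and SAT, so `r_assumed = 0.5` below is purely hypothetical:

```python
def predicted_z_y(z_x, r):
    """Predicted st. devs. from the mean in y, given z_x st. devs. in x."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    return r * z_x


r_assumed = 0.5  # hypothetical correlation; not given in the slides
z_sat = predicted_z_y(1.5, r_assumed)
print(f"Predicted SAT: {z_sat} st. dev. above average")
```

Because r is never bigger than 1 in size, the prediction for y is never farther from its mean (in st. devs.) than x is from its own mean.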
Ways to find the Regression Line
1. Use the formulas for slope and intercept
2. Use the raw data and run STAT -> CALC -> LinReg
3. Use a given computer output
• The linear model fits the Burger King data well:
  – The equation is
  – Predict the fat of a sandwich with 30 g of protein:
Check the same conditions:
  – Quantitative Variables
  – Straight Enough
  – No Outliers
• To check whether a linear model is appropriate:
1. Look at the scatterplot of the residuals vs. the x variable or the predicted y values (run a LinReg first).
2. If there is a pattern, a linear model is NOT appropriate.
3. If there is NO pattern, a linear model is appropriate.
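To see what a "pattern" in the residuals looks like, here is a sketch that fits a straight line to deliberately curved, made-up data (y = x squared). The data and variable names are assumptions for illustration; the slope line uses the usual least squares formula, which is equivalent to r times s_y over s_x:

```python
from statistics import mean

# Hypothetical curved data (y = x squared); not the Burger King menu.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x ** 2 for x in xs]

# Least squares fit (what a LinReg would report).
x_bar, y_bar = mean(xs), mean(ys)
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

# Residuals listed in order of x, plus their signs.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
signs = "".join("+" if e > 0 else "-" if e < 0 else "0" for e in residuals)

for x, e in zip(xs, residuals):
    print(f"x={x}: residual={e:+.1f}")
print("sign pattern:", signs)
```

The residuals are positive at both ends and negative in the middle, a clear curve, so a linear model is not appropriate for this data.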
• The residuals for the BK menu regression look appropriately boring:
The Residual St. Dev.
• We want the st. dev. of the residuals, $s_e$, to be relatively small
= the spread/scatter around the line is small
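A sketch of the residual standard deviation, using the standard regression formula (square root of the sum of squared residuals divided by n - 2), which the slides do not spell out. The residuals listed are hypothetical:

```python
from math import sqrt


def residual_std(residuals):
    """s_e = sqrt(sum of squared residuals / (n - 2))."""
    n = len(residuals)
    if n <= 2:
        raise ValueError("need more than 2 residuals")
    return sqrt(sum(e ** 2 for e in residuals) / (n - 2))


# Hypothetical residuals from a least squares fit to 5 points.
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
s_e = residual_std(residuals)
print(f"s_e = {s_e:.3f}")
```

The result is in the units of y, so it can be read as a typical distance of the points from the line.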
$R^2$: Coefficient of Strength
• It’s the square of r.
• It’s a percent.
• It’s between 0 and 100%.
• The closer to 100%, the stronger the linear model is at predicting.
• Interpretation: “the % of the variation in the y variable that is explained by the x variable”
EX: Burger King Fat vs. Protein had $R^2$ = 69%.
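A short sketch of the "% of variation explained" reading, assuming made-up actual values and the predictions from their least squares line. For a least squares line, 1 minus (leftover variation / total variation) equals the square of r. The last lines work backward from the 69% on the slide to the size of r (its sign comes from the direction of the slope):

```python
from math import sqrt


def r_squared(actual, predicted):
    """Fraction of the variation in y explained by the model."""
    y_bar = sum(actual) / len(actual)
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # leftover variation
    sst = sum((y - y_bar) ** 2 for y in actual)                 # total variation
    return 1 - sse / sst


# Hypothetical data; predictions come from the least squares line
# y-hat = 2.2 + 0.6x at x = 1, 2, 3, 4, 5.
actual = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
r2 = r_squared(actual, predicted)
print(f"R^2 = {r2:.0%}")

# Burger King example: R^2 = 69%, so the size of r is sqrt(0.69).
r_size = sqrt(0.69)
print(f"|r| is about {r_size:.2f}")
```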
Summary
• In order to use linear regression:
1. Check the original x, y scatterplot for straightness.
2. Run a LinReg, then make a scatterplot of the residuals vs. the x variable and check for no pattern.
3. Turn on diagnostics and run a LinReg to check $R^2$.
Chapter 8 Assignment
Pg. 192: #1-11 odd, 19-23 odd, 27, 31, 35, 37, 45, 49