ambrose-bates
Linear Regression Model
• In regression, x = independent (predictor) variable and y = dependent (response) variable
• Regression line (prediction line): ŷ = a + bx
• Linear regression model
– r, r² (explained variance), and prediction line
Regression Line
• regression line (prediction line) ŷ = a + bx
• b = [Σxy – (Σx)(Σy)/n] / [Σx² – (Σx)²/n]
• a = My – bMx
– My = mean of the y scores
– Mx = mean of the x scores
– b = slope just computed
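The two formulas above translate almost directly into code. The following is a minimal sketch, not part of the original slides: the function name `fit_line` and the four sample points are illustrative assumptions, chosen so the points lie exactly on y = 1 + 2x and the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: b = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n]
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: a = My - b*Mx
    a = sum_y / n - b * (sum_x / n)
    return a, b


# Hypothetical points lying exactly on y = 1 + 2x (not from the slides)
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

Because the sample points fall on a straight line, the fitted intercept and slope reproduce that line exactly.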
X     Y     XY    X²    Y²
8     -2    -16   64    4
4     2     8     16    4
5     1     5     25    1
-1    6     -6    1     36
1     4     4     1     16
2     3     6     4     9
6     -1    -6    36    1
Σx = 25   Σy = 13   Σxy = -5   Σx² = 147   Σy² = 71
Mx = 3.57   My = 1.86   n = 7
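The column totals and means in the table can be reproduced with a few lines of code. This is a small sketch using the seven (x, y) pairs from the table; the variable names are illustrative choices, not part of the slides.

```python
# The seven (x, y) pairs from the table
xs = [8, 4, 5, -1, 1, 2, 6]
ys = [-2, 2, 1, 6, 4, 3, -1]

n = len(xs)
sum_x = sum(xs)                               # Σx
sum_y = sum(ys)                               # Σy
sum_xy = sum(x * y for x, y in zip(xs, ys))   # Σxy
sum_x2 = sum(x * x for x in xs)               # Σx²
sum_y2 = sum(y * y for y in ys)               # Σy²
mean_x = sum_x / n                            # Mx
mean_y = sum_y / n                            # My

print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 7 25 13 -5 147 71
print(round(mean_x, 2), round(mean_y, 2))       # 3.57 1.86
```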
Regression Line
• b = [-5 – (25)(13)/7] / [147 – (25)²/7]
• b = [-5 – 325/7] / [147 – 625/7]
• b = (-5 – 46.43) / (147 – 89.29) = -51.43 / 57.71 = -0.89
Regression Line
• a = 1.86 – (-0.89)(3.57)
• a = 1.86 – (-3.18)
• a = 5.04
Regression Line
• Regression line (prediction line): ŷ = a + bx
• a = 5.04
• b = -0.89
• The line of best fit for the data is
– ŷ = 5.04 – 0.89x
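Once a and b are known, the prediction line can be used to estimate y for any x. The sketch below recomputes the unrounded coefficients from the sums reported in the slides and wraps the line in a helper; the function name `predict` and the query value x = 3 are illustrative assumptions.

```python
# Sums reported in the slides for the seven data points
n, sum_x, sum_y, sum_xy, sum_x2 = 7, 25, 13, -5, 147

# Unrounded slope and intercept from the computational formulas
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * (sum_x / n)


def predict(x):
    """Predicted value y_hat = a + b*x on the fitted line."""
    return a + b * x


print(round(a, 2), round(b, 2))  # 5.04 -0.89
print(round(predict(3), 2))      # 2.37 (x = 3 is a hypothetical query value)
```

Keeping the coefficients unrounded until the final step avoids carrying the small rounding error from 5.04 and -0.89 into the predictions.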
Regression Line of Fit
• Line of fit is plotted on a scatter plot
[Figure: scatter plot of the Y-values against X with the fitted line f(x) = −0.8911x + 5.0396, R² = 0.9780]
Coefficient of Determination r²
• Measures the proportion of the variability in the DV (y) that is explained by the IV (x)
• Basic Properties
– Obtained by squaring the value of r
– Values range from 0.00 to 1.00, or 0% to 100%
– 0.00 ≤ r² ≤ 1.00 or 0% ≤ r² ≤ 100%
– 1.00 or 100% = a perfect model (explains all of the variation in the dependent variable)
– 0.00 or 0% = a model that explains none of the variation in the dependent variable
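"Proportion of variability explained" can be made concrete by comparing the variation left over around the fitted line with the total variation in y. The sketch below assumes the standard identity r² = 1 – SS_residual / SS_total, which holds for a least-squares line with an intercept; it uses the seven data points from this deck, and the variable names are illustrative.

```python
xs = [8, 4, 5, -1, 1, 2, 6]
ys = [-2, 2, 1, 6, 4, 3, -1]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept (deviation form of the same formulas)
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = mean_y - b * mean_x

# Total variation in y, and the part the line leaves unexplained
ss_total = sum((y - mean_y) ** 2 for y in ys)
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - ss_residual / ss_total
print(round(r_squared, 3))  # 0.978
```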
Coefficient of Determination r²
• The regression model can explain 97.8% of the variation in the y values.
• r = [nΣxy – (Σx)(Σy)] / √{[nΣx² – (Σx)²] × [nΣy² – (Σy)²]}
• r = [7(-5) – (25)(13)] / √{[7(147) – (25)²] × [7(71) – (13)²]} = -360 / √(404 × 328) = -0.989
• r² = (-0.989)² = 0.978
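The correlation coefficient can be checked with a short function that follows the computational formula term by term. This is a minimal sketch; the function name `pearson_r` is an illustrative choice, and the data are the seven points from this deck.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation via the computational (raw-sums) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


xs = [8, 4, 5, -1, 1, 2, 6]
ys = [-2, 2, 1, 6, 4, 3, -1]
r = pearson_r(xs, ys)
print(round(r, 3), round(r ** 2, 3))  # -0.989 0.978
```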
Linear Regression
• r = -0.989, r² = 0.978, explaining 97.8% of the variance in y.
[Figure: scatter plot with labeled X Axis and Y Axis, showing the fitted line f(x) = −0.8911x + 5.0396, R² = 0.9780]
Regression
• Widely used for prediction (including forecasting and time-series data)
• Also used to understand how the IVs (x) are related to the DVs (y) and to explore these relationships
• In controlled studies, can be used to infer causal relationships between the IVs and DVs
• In addition to r, r², and ŷ, a test of statistical significance is typically reported
Regression
• Types of errors that can cause problems with the relationships explored
– Chance that the data show a relationship between 2 variables when the relationship is purely coincidental (a false positive)
– Chance that a relationship that exists does not show up in the sample data, purely because of an unlucky random selection of data points (a false negative)
Residuals & Residual Error
• Errors
– The difference between the observed (actual, y) value and the predicted (ŷ) value
• Residual Error
– Residual = (observed – predicted)
– Residual = (y – ŷ)
– When plotted, the residuals should scatter randomly around zero if the form of the model (linear or non-linear) matches the data
• If they instead show a systematic pattern (e.g., a curve under a linear model), then the model (variables included) should be adjusted
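The residual definition above can be applied point by point. The following sketch computes y – ŷ for the seven data points from this deck, refitting the line so the block is self-contained; the variable names are illustrative. For a least-squares line with an intercept, the residuals sum to zero up to floating-point error, which makes a quick sanity check.

```python
xs = [8, 4, 5, -1, 1, 2, 6]
ys = [-2, 2, 1, 6, 4, 3, -1]
n = len(xs)

# Fit the line with the computational formulas
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * (sum_x / n)

# Residual = observed - predicted = y - y_hat
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, y, e in zip(xs, ys, residuals):
    print(f"x={x:>2}  y={y:>2}  y_hat={a + b * x:6.3f}  residual={e:6.3f}")

# Sanity check: least-squares residuals sum to (approximately) zero
print(abs(sum(residuals)) < 1e-9)  # True
```

Plotting `residuals` against `xs` is the usual next step when checking whether the linear form is adequate.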
Regression
• p-values (p) are statistical values, typically reported to show the “statistical significance” of other values
– The probability of observing a relationship (e.g., between variables) at least as strong as the one in the sample by pure chance
– That is, when no such relationship actually exists in the population from which the sample was drawn
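One way to make "occurred by pure chance" concrete is a permutation test: shuffle the y values across the x values in every possible order and count how often the shuffled data show a correlation at least as strong as the observed one. This is a sketch of that idea only. The slides do not say how their p-values are computed, and statistical software more commonly uses a t-test on r, so the approach and the names below are assumptions made for illustration.

```python
import math
from itertools import permutations


def pearson_r(xs, ys):
    """Pearson correlation via the computational (raw-sums) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


xs = [8, 4, 5, -1, 1, 2, 6]
ys = [-2, 2, 1, 6, 4, 3, -1]
observed = abs(pearson_r(xs, ys))

# Exhaustive two-sided permutation test over all 7! orderings of y
total = 0
as_extreme = 0
for shuffled in permutations(ys):
    total += 1
    # Small tolerance so the original ordering counts despite rounding
    if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
        as_extreme += 1

p_value = as_extreme / total
print(total, p_value)
```

With only seven points the full enumeration is cheap (5040 orderings); for larger samples a random subset of shuffles is the usual substitute.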
Regression
• p-values
– Results are said to be statistically significant when p is less than a preset threshold value (alpha, α)
• Typically α = .05, .01, .005, or .001
– Represents a 5%, 1%, 0.5%, or 0.1% significance level
– Want the p-value to be equal to or lower than the threshold (p < α or p ≤ α)
• p ≤ .05, p < .01, p < .001
– Typically calculated and reported with the r value (r = 0.61, p < 0.01)
– Relationships with p over .05 (p > .05) are typically not viewed as statistically significant
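The decision rule above amounts to a single comparison. The helper below is a minimal sketch: the function name `is_significant` and the sample p-values are hypothetical, and it uses the p ≤ α form of the rule (some authors use strict p < α instead).

```python
def is_significant(p, alpha=0.05):
    """Return True when the p-value meets the threshold (p <= alpha)."""
    return p <= alpha


# Hypothetical p-values checked against common thresholds
print(is_significant(0.03))              # True  (0.03 <= .05)
print(is_significant(0.03, alpha=0.01))  # False (0.03 >  .01)
print(is_significant(0.20))              # False (0.20 >  .05)
```

The same p-value can be significant at one threshold and not at another, which is why α should be chosen before looking at the data.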