
Linear Regression Model

• In regression, x = independent (predictor) variable; y = dependent (response) variable

• regression line (prediction line): ŷ = a + bx

• linear regression model: r, r² (explained variance), and the prediction line


Regression Line

• regression line (prediction line): ŷ = a + bx

• b = [Σxy – (Σx)(Σy)/n] / [Σx² – (Σx)²/n]

• a = My – bMx, where My = mean of the y scores, Mx = mean of the x scores, and b = the slope just computed


 X    Y    XY    X²    Y²
 8   -2   -16    64     4
 4    2     8    16     4
 5    1     5    25     1
-1    6    -6     1    36
 1    4     4     1    16
 2    3     6     4     9
 6   -1    -6    36     1

Σx = 25   Σy = 13   Σxy = -5   Σx² = 147   Σy² = 71   (n = 7)

Mx = 3.57   My = 1.86
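The column sums and means in the table can be checked with a short Python sketch (only the seven data pairs from the table are assumed):

```python
# Data pairs from the table (n = 7)
x = [8, 4, 5, -1, 1, 2, 6]
y = [-2, 2, 1, 6, 4, 3, -1]
n = len(x)

sum_x = sum(x)                                 # Σx  = 25
sum_y = sum(y)                                 # Σy  = 13
sum_xy = sum(xi * yi for xi, yi in zip(x, y))  # Σxy = -5
sum_x2 = sum(xi ** 2 for xi in x)              # Σx² = 147
sum_y2 = sum(yi ** 2 for yi in y)              # Σy² = 71

mean_x = sum_x / n  # Mx ≈ 3.57
mean_y = sum_y / n  # My ≈ 1.86
```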


Regression Line

• b = [-5 – (25)(13)/7] / [147 – (25)²/7]

• b = (-5 – 325/7) / (147 – 625/7)

• b = (-5 – 46.43) / (147 – 89.29) = -51.43 / 57.71 = -0.89

(From the table: Σx = 25, Σy = 13, Σxy = -5, Σx² = 147, Σy² = 71; Mx = 3.57, My = 1.86)



Regression Line

• a = 1.86 – (-0.89)(3.57)

• a = 1.86 – (-3.18)

• a = 5.04

(Using My = 1.86, Mx = 3.57, and the slope b = -0.89 computed on the previous slide)


Regression Line

• regression line (prediction line): ŷ = a + bx

• a = 5.04, b = -0.89

• The best line of fit for the data is ŷ = 5.04 – 0.89x
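The slope and intercept worked out on the preceding slides follow directly from the sum formulas; a minimal Python sketch using the data from the table (the `predict` helper is added here for illustration, it is not part of the slides):

```python
x = [8, 4, 5, -1, 1, 2, 6]
y = [-2, 2, 1, 6, 4, 3, -1]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# b = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n]
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
# a = My - b·Mx
a = sum_y / n - b * (sum_x / n)

def predict(x_new):
    """Prediction line: ŷ = a + bx."""
    return a + b * x_new

print(round(b, 2), round(a, 2))  # -0.89 5.04
```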


Regression Line of Fit

• The line of fit is plotted on a scatter plot

[Scatter plot of the Y-values against X with the fitted line f(x) = -0.891x + 5.040, R² = 0.978]


Coefficient of Determination r2

• Measures the proportion of the variability in the DV (y) that is explained by the IV (x)

• Basic properties
– Obtained by squaring the value of r
– Values range from 0.00 to 1.00, or 0% – 100% (0.00 ≤ r² ≤ 1.00)
– 1.00, or 100%, = a perfect model (explains all of the variation in the dependent variable)
– 0.00, or 0%, = an imperfect model (explains none of the variation in the dependent variable)


Coefficient of Determination r2

• r = [7(-5) – (25)(13)] / √{[7(147) – (25)²] × [7(71) – (13)²]} = -360 / √(404 × 328) = -0.989

• r² = (-0.989)² = 0.978

• The regression model can explain 97.8% of the variation in the y values.
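The same r and r² values fall out of the sum formula in a few lines of Python (data taken from the table earlier in the deck):

```python
import math

x = [8, 4, 5, -1, 1, 2, 6]
y = [-2, 2, 1, 6, 4, 3, -1]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

# r = [nΣxy - (Σx)(Σy)] / √{[nΣx² - (Σx)²][nΣy² - (Σy)²]}
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r2 = r ** 2  # coefficient of determination
```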


Linear Regression

• r = -0.989, r2 = 0.978, explaining 97.8% of the variance in y.

[Scatter plot of the Y-values against X with the fitted line f(x) = -0.891x + 5.040, R² = 0.978]


Regression

• Widely used for prediction (including forecasting and time-series data)

• Also used to understand how the IVs (x) are related to the DVs (y) and to explore these relationships

• In controlled studies, can be used to infer causal relationships between the IVs and DVs

• In addition to r, r², and ŷ, a test of statistical significance is typically reported


Regression

• Types of errors that can cause problems with the relationships explored
– The chance that the data show a relationship between two variables when the relationship is pure coincidence
– The chance that a relationship that does exist fails to show up in the sample data, purely because of bad luck in randomly selecting the data points


Residuals & Residual Error

• Error
– The difference between the observed (actual, y) value and the predicted (ŷ) value

• Residual error
– Residual = observed – predicted = (y – ŷ)
– When plotted, the residuals should resemble either a linear pattern (if the model is linear) or a non-linear pattern (if the model is non-linear)
– If the two do not match, the model (the variables included) should be adjusted
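A quick residual check for the example data can be sketched in Python; the slope and intercept are recomputed at full precision so that the standard least-squares property (residuals sum to zero, up to floating-point rounding) holds:

```python
x = [8, 4, 5, -1, 1, 2, 6]
y = [-2, 2, 1, 6, 4, 3, -1]
n = len(x)

# Refit the line ŷ = a + bx from the sum formulas (≈ 5.04 - 0.89x)
b = (sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n) / (
    sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
)
a = sum(y) / n - b * sum(x) / n

# Residual = observed - predicted = (y - ŷ), one per data point
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# For a least-squares fit, the residuals sum to (numerically) zero
total = sum(residuals)
```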


Regression

• p-values (p) are statistical values, typically reported to show the "statistical significance" of other values
– The probability that the observed relationship (e.g., between variables) in a sample occurred by pure chance
– Or that, in the population from which the sample was drawn, no such relationship actually exists


Regression

• p-values
– Results are said to be statistically significant when p is less than a preset threshold value (alpha, α)
• Typically α = .05, .01, .005, or .001, representing a 5%, 1%, 0.5%, or 0.1% chance
– We want the p value to be equal to or lower than the threshold (p < α or p ≤ α), e.g., p ≤ .05, p < .01, p < .001
– p is typically calculated and reported with the r value (e.g., r = 0.61, p < 0.01)
– Relationships with p over .05 (p > .05) are typically not viewed as statistically significant
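The slides do not show how p is computed; one common approach (an illustration here, not necessarily the deck's method) converts r into a t statistic with n – 2 degrees of freedom and compares it to a critical value from a t table. The 2.571 critical value below is the standard two-tailed value for α = .05 at df = 5:

```python
import math

x = [8, 4, 5, -1, 1, 2, 6]
y = [-2, 2, 1, 6, 4, 3, -1]
n = len(x)

# Pearson r from the sum formula, as on the earlier slides
r = (n * sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y)) / math.sqrt(
    (n * sum(xi ** 2 for xi in x) - sum(x) ** 2)
    * (n * sum(yi ** 2 for yi in y) - sum(y) ** 2)
)

# t statistic for testing r against 0, with df = n - 2
t = r * math.sqrt((n - 2) / (1 - r ** 2))

# Two-tailed critical t for α = .05 at df = 5 (standard t table)
T_CRIT = 2.571
significant = abs(t) > T_CRIT  # True here, so p < .05
```

Since |t| ≈ 14.9 far exceeds 2.571, the relationship in the example data would be reported as statistically significant (p < .05).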