CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited
Ref: Cohen, Cohen, West, & Aiken (2003), ch. 2


Page 1: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Ref: Cohen, Cohen, West, & Aiken (2003), ch. 2

Page 2: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Pearson Correlation

r_xy = [ Σ_i (x_i − m_x)(y_i − m_y) / (n − 1) ] / (s_x s_y) = s_xy / (s_x s_y)

     = Σ_i z_xi z_yi / (n − 1)

     = 1 − Σ_i (z_xi − z_yi)² / [2(n − 1)]

     = 1 − Σ_i (d_zi)² / [2(n − 1)], where d_zi = z_xi − z_yi

     = COVARIANCE / (SD_x SD_y)
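These identities are easy to check numerically. Below is a minimal sketch with made-up data (the x and y vectors are hypothetical, and numpy is assumed available) that computes r from the covariance, from the z-score cross products, and from the z-score differences, and compares them to numpy's built-in correlation.

```python
# Minimal sketch: Pearson r computed three equivalent ways, matching the
# slide's formulas. Data are hypothetical; n-1 denominators throughout.
import numpy as np

x = np.array([520., 600., 680., 750., 430.])   # hypothetical SAT-Math scores
y = np.array([2.3, 2.9, 3.4, 3.8, 2.0])        # hypothetical calculus grades
n = len(x)

sx, sy = x.std(ddof=1), y.std(ddof=1)
sxy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)   # covariance

r_cov = sxy / (sx * sy)                                    # covariance / (SDx * SDy)

zx, zy = (x - x.mean()) / sx, (y - y.mean()) / sy
r_z = np.sum(zx * zy) / (n - 1)                            # mean cross-product of z-scores

r_dz = 1 - np.sum((zx - zy) ** 2) / (2 * (n - 1))          # 1 - sum(dz^2) / (2(n-1))

print(r_cov, r_z, r_dz, np.corrcoef(x, y)[0, 1])           # all four agree
```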

Page 3: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Fig. 3.6: Geometric representation of r² as the overlap of two squares (Variance of X = 1, Variance of Y = 1); r² = percent overlap of the two squares.

a. Nonzero correlation (the squares overlap)

b. Zero correlation (no overlap)

Page 4: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Figure: Sums of Squares (SS_x, SS_y) and Cross Product S_xy (Covariance)

Page 5: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Figure 3.4: Path model representation of correlation between SAT-Math scores and Calculus Grades.

SAT-Math → Calc Grade: .00364 (.40)
error → Calc Grade: .932 (.955)

R² = .40² = .16

Page 6: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Path Models

• path coefficient – standardized coefficient next to arrow, covariance in parentheses

• error coefficient – the correlation between the errors (the discrepancies between observed and predicted Calc Grade scores) and the observed Calc Grade scores

• Predicted(Calc Grade) = .00364 SAT-Math + 2.5

• errors are sometimes called disturbances
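A small sketch of how the prediction equation and the disturbance on this slide work in practice. The coefficient .00364 and intercept 2.5 come from the slide; the student's SAT-Math score and observed grade below are hypothetical.

```python
# Minimal sketch of this slide's prediction equation and its error (disturbance).
def predict_calc_grade(sat_math):
    return 0.00364 * sat_math + 2.5   # Predicted(Calc Grade) = .00364 * SAT-Math + 2.5

sat_math, observed_grade = 400, 3.7          # hypothetical student
predicted = predict_calc_grade(sat_math)     # 3.956
disturbance = observed_grade - predicted     # observed minus predicted: -0.256
print(predicted, disturbance)
```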

Page 7: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Figure 3.2: Path model representations of correlation between X and Y (panels a, b, and c)

Page 8: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

SUPPRESSED SCATTERPLOT

• NO APPARENT RELATIONSHIP

Figure: Y plotted against X, with separate prediction lines for MALES and FEMALES

Page 9: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

IDEALIZED SCATTERPLOT

• POSITIVE CURVILINEAR RELATIONSHIP

Figure: Y plotted against X, with a linear prediction line and a quadratic prediction line

Page 10: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

LINEAR REGRESSION- REVISITED

Page 11: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Single predictor linear regression

• Regression equations:
  ŷ = b_yx x + b_0  (predicting y from x)
  x̂ = b_xy y + b_0′  (predicting x from y)

• Regression coefficients:
  b_yx = r_xy s_y / s_x
  b_xy = r_xy s_x / s_y

Page 12: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Two variable linear regression

• Path model representation (unstandardized):
  x → y, with path coefficient b_1 and error e

Page 13: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Linear regression

y = b_1 x + b_0

If the correlation coefficient has already been calculated, b_1 can be obtained from it directly:

b_1 = r_xy s_y / s_x

The intercept b_0 follows by substituting the means of x and y into the regression equation and solving:

b_0 = m_y − (r_xy s_y / s_x) m_x
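As a quick numerical check, here is a sketch with hypothetical data (numpy assumed available) that computes b_1 and b_0 from these formulas and compares them against numpy's least-squares fit.

```python
# Sketch: slope and intercept from b1 = r*sy/sx and b0 = mean(y) - b1*mean(x),
# checked against np.polyfit. Data are hypothetical.
import numpy as np

x = np.array([2., 4., 5., 7., 9.])
y = np.array([1.1, 2.0, 2.4, 3.1, 4.2])

r = np.corrcoef(x, y)[0, 1]
b1 = r * y.std(ddof=1) / x.std(ddof=1)     # slope
b0 = y.mean() - b1 * x.mean()              # intercept

print(b1, b0)
print(np.polyfit(x, y, 1))                 # [slope, intercept] -- should match
```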

Page 14: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Linear regression

• Path model representation (standardized):
  z_x → z_y, with path coefficient r_xy and error e

Page 15: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Least squares estimation

The best estimate is the one for which the sum of squared differences between each score and the estimate is smallest; among all linear unbiased estimates it is the best linear unbiased estimate (BLUE).

Page 16: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Least squares estimation

• The errors, or disturbances, represent in this case the part of each y score not predictable from x:

  e_i = y_i − ŷ_i = y_i − (b_1 x_i + b_0)

• The sum of squares for the errors follows:

  SS_e = Σ_{i=1}^{n} e_i²
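A short sketch, again with hypothetical data, of the errors e_i and their sum of squares SS_e.

```python
# Sketch: errors e_i = y_i - yhat_i and their sum of squares SSe,
# using the least-squares line from numpy. Data are hypothetical.
import numpy as np

x = np.array([2., 4., 5., 7., 9.])
y = np.array([1.1, 2.0, 2.4, 3.1, 4.2])

b1, b0 = np.polyfit(x, y, 1)
y_hat = b1 * x + b0
e = y - y_hat                 # errors (disturbances)
SSe = np.sum(e ** 2)          # sum of squared errors
print(e, SSe)
```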

Page 17: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Figure: scatterplot of y against x with the fitted regression line; each vertical distance from a point to the line is an error e.

SS_e = Σ e_i²

Page 18: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Matrix representation of least squares estimation.

• We can represent the regression model in matrix form:

• y = Xb + e

Page 19: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Matrix representation of least squares estimation

• y = Xb + e

  [ y_1 ]   [ 1  x_1 ]            [ e_1 ]
  [ y_2 ]   [ 1  x_2 ]   [ b_0 ]  [ e_2 ]
  [ y_3 ] = [ 1  x_3 ] * [ b_1 ] + [ e_3 ]
  [ y_4 ]   [ 1  x_4 ]            [ e_4 ]
  [  .  ]   [ 1   .  ]            [  .  ]
  [  .  ]   [ 1   .  ]            [  .  ]
  [  .  ]   [ 1   .  ]            [  .  ]

Page 20: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Matrix representation of least squares estimation

• y = Xb + e

• The least squares criterion is satisfied by the following matrix equation:

  b = (X′X)⁻¹ X′y

• The term X′ is the transpose of the X matrix: the matrix turned on its side. When X′X is formed, the result is a 2 × 2 matrix:

  [   n      Σ x_i  ]
  [ Σ x_i   Σ x_i²  ]
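The normal-equations solution is straightforward to write out with numpy. The sketch below uses hypothetical data, builds the design matrix with a column of ones for the intercept, forms the 2 × 2 matrix X′X shown above, and solves b = (X′X)⁻¹X′y; in practice np.linalg.lstsq is the numerically safer route to the same answer.

```python
# Sketch: b = (X'X)^-1 X'y with an explicit intercept column,
# compared with numpy's least-squares solver. Data are hypothetical.
import numpy as np

x = np.array([2., 4., 5., 7., 9.])
y = np.array([1.1, 2.0, 2.4, 3.1, 4.2])

X = np.column_stack([np.ones_like(x), x])        # rows [1, x_i]
b = np.linalg.inv(X.T @ X) @ X.T @ y             # b = (X'X)^-1 X'y  ->  [b0, b1]

print(X.T @ X)                                   # the 2x2 matrix [[n, sum x], [sum x, sum x^2]]
print(b)
print(np.linalg.lstsq(X, y, rcond=None)[0])      # numerically safer solver, same answer
```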

Page 21: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

SUMS OF SQUARES

• SS_e = (n − 2) s_e²

• SS_reg = Σ (ŷ_i − m_y)², where ŷ_i = b_1 x_i + b_0

• SS_y = SS_reg + SS_e
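A quick numerical check, with hypothetical data, that the decomposition SS_y = SS_reg + SS_e holds for a least-squares fit.

```python
# Sketch: verifying SSy = SSreg + SSe for the fitted line. Data are hypothetical.
import numpy as np

x = np.array([2., 4., 5., 7., 9.])
y = np.array([1.1, 2.0, 2.4, 3.1, 4.2])

b1, b0 = np.polyfit(x, y, 1)
y_hat = b1 * x + b0

SSy   = np.sum((y - y.mean()) ** 2)
SSreg = np.sum((y_hat - y.mean()) ** 2)
SSe   = np.sum((y - y_hat) ** 2)
print(SSy, SSreg + SSe)     # equal up to floating-point error
```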

Page 22: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

SUMS OF SQUARES - Venn Diagram

Fig. 8.3: Venn diagram for linear regression with one predictor and one outcome measure. The SS_x and SS_y circles overlap; the overlap is SS_reg, and the part of SS_y outside the overlap is SS_e.

Page 23: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

STANDARD ERROR OF ESTIMATE

s_y² = s_ŷ² + s_e²

s_zy² = 1 = r_y.x² + s_ez²

s_e = s_y √(1 − r_y.x²) = √( SS_e / (n − 2) )

Review slide 17: this is the standard deviation of the errors shown there
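A sketch, with hypothetical data, of the standard error of estimate computed from both expressions above. Note that in small samples the two differ by a factor of √((n − 1)/(n − 2)), which becomes negligible as n grows.

```python
# Sketch: standard error of estimate from s_y*sqrt(1 - r^2) and from
# sqrt(SSe/(n-2)); the two differ by sqrt((n-1)/(n-2)). Data are hypothetical.
import numpy as np

x = np.array([2., 4., 5., 7., 9.])
y = np.array([1.1, 2.0, 2.4, 3.1, 4.2])
n = len(x)

r = np.corrcoef(x, y)[0, 1]
b1, b0 = np.polyfit(x, y, 1)
SSe = np.sum((y - (b1 * x + b0)) ** 2)

se_from_r  = y.std(ddof=1) * np.sqrt(1 - r ** 2)
se_from_SS = np.sqrt(SSe / (n - 2))
print(se_from_r, se_from_SS, se_from_SS / se_from_r)   # ratio = sqrt((n-1)/(n-2))
```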

Page 24: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

SUMS OF SQUARES- ANOVA Table

SOURCE    df      Sum of Squares    Mean Square        F
x         1       SS_reg            SS_reg / 1         (SS_reg / 1) / (SS_e / (n−2))
e         n−2     SS_e              SS_e / (n−2)
Total     n−1     SS_y              SS_y / (n−1)

Table 8.1: Regression table for Sums of Squares

Page 25: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Confidence Intervals Around b and Beta weights

s_b = (s_y / s_x) √[ (1 − r_y.x²) / (n − 2) ]

Standard deviation of the sampling error of the estimate of the regression weight b

s_β = √[ (1 − r_y.x²) / (n − 2) ]

Note: this is formally correct only for a regression equation, not for the Pearson correlation
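A sketch, with hypothetical data, of s_b and a 95% confidence interval for b built from it; scipy.stats.t supplies the critical value (numpy and scipy assumed available).

```python
# Sketch: standard error of the regression weight and a 95% confidence interval.
# s_b = (s_y / s_x) * sqrt((1 - r^2) / (n - 2)); data are hypothetical.
import numpy as np
from scipy import stats

x = np.array([2., 4., 5., 7., 9.])
y = np.array([1.1, 2.0, 2.4, 3.1, 4.2])
n = len(x)

r = np.corrcoef(x, y)[0, 1]
b1, b0 = np.polyfit(x, y, 1)

s_b    = (y.std(ddof=1) / x.std(ddof=1)) * np.sqrt((1 - r ** 2) / (n - 2))
s_beta = np.sqrt((1 - r ** 2) / (n - 2))

t_crit = stats.t.ppf(0.975, df=n - 2)            # two-sided 95% critical value
print(s_b, s_beta)
print(b1 - t_crit * s_b, b1 + t_crit * s_b)      # confidence interval for b
```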

Page 26: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Distribution around parameter estimates: b-weight

Figure: sampling distribution centered at b_estimate with standard deviation s_b; confidence interval: b_estimate ± t · s_b

Page 27: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Hypothesis testing for the regression weight

Null hypothesis: b_population = 0
Alternative hypothesis: b_population ≠ 0

Test statistic: t = b_sample / se_b

Student’s t-distribution with degrees of freedom = n-2
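A sketch of this test with hypothetical data; the two-sided p value comes from scipy's t distribution with n − 2 degrees of freedom.

```python
# Sketch: t test of H0: b = 0 with df = n - 2. Data are hypothetical;
# numpy and scipy assumed available.
import numpy as np
from scipy import stats

x = np.array([2., 4., 5., 7., 9.])
y = np.array([1.1, 2.0, 2.4, 3.1, 4.2])
n = len(x)

r = np.corrcoef(x, y)[0, 1]
b1, b0 = np.polyfit(x, y, 1)
se_b = (y.std(ddof=1) / x.std(ddof=1)) * np.sqrt((1 - r ** 2) / (n - 2))

t_stat = b1 / se_b
p = 2 * stats.t.sf(abs(t_stat), df=n - 2)   # two-sided p value
print(t_stat, p)
```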

Page 28: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Model Summary

Model 1:   R = .539(a)   R Square = .291   Adjusted R Square = .268   Std. Error of the Estimate = 3.121
a. Predictors: (Constant), LOCUS OF CONTROL

ANOVA(b)

Model 1       Sum of Squares    df    Mean Square    F        Sig.
Regression    123.867           1     123.867        12.714   .001(a)
Residual      302.012           31    9.742
Total         425.879           32
a. Predictors: (Constant), LOCUS OF CONTROL
b. Dependent Variable: SOCIAL STRESS

Coefficients(a)

Model 1              Unstandardized B    Std. Error    Standardized Beta    t        Sig.
(Constant)           -4.836              2.645                              -1.828   .077
LOCUS OF CONTROL     .190                .053          .539                 3.566    .001
a. Dependent Variable: SOCIAL STRESS

Test of b=0 rejected at .05 level

SPSS Regression Analysis option predicting Social Stress from Locus of Control in a sample of 16 year olds
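The printed SPSS values can be cross-checked by hand. The sketch below recomputes R², the ANOVA F, the standard error of estimate, and the t for the LOCUS OF CONTROL weight from the numbers shown above; the small discrepancy in t reflects rounding in the displayed B and Std. Error.

```python
# Sanity checks on the SPSS output above, using the values as printed.
SS_reg, SS_res, SS_total = 123.867, 302.012, 425.879
df_reg, df_res = 1, 31
b, se_b = 0.190, 0.053

R2 = SS_reg / SS_total                        # 0.291, matches "R Square"
F = (SS_reg / df_reg) / (SS_res / df_res)     # 12.71, matches the ANOVA F
se_est = (SS_res / df_res) ** 0.5             # 3.12, matches "Std. Error of the Estimate"
t = b / se_b                                  # ~3.58 vs the printed 3.566 (rounding of B, SE)

print(R2, F, se_est, t)
```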

Page 29: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Figure 3.4: Path model representation of prediction of Social Stress from Locus of Control.

Locus of Control → Social Stress: .190 (.539)   [b (β)]
error → Social Stress: 3.12 (.842)   [se (√(1 − R²))]

R² = .291
√(1 − R²) = .842

Page 30: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Difference between Independent b-weights

Compare two groups' regression weights to see if they differ (e.g., boys vs. girls).

Null hypothesis: b_boys = b_girls

Test statistic: t = (b_boys − b_girls) / s_(b_boys − b_girls), where
s_(b_boys − b_girls) = √( s²_b_boys + s²_b_girls )

Student's t distribution with n_1 + n_2 − 4 degrees of freedom

Page 31: CORRELATION AND SIMPLE LINEAR REGRESSION - Revisited

Coefficients(a)

Model 1              Unstandardized B    Std. Error    Standardized Beta    t        Sig.
(Constant)           -.416               3.936                              -.106    .917
LOCUS OF CONTROL     .106                .081          .289                 1.314    .205
a. Dependent Variable: SOCIAL STRESS

Coefficients(a)

Model 1              Unstandardized B    Std. Error    Standardized Beta    t        Sig.
(Constant)           -9.963              2.970                              -3.354   .007
LOCUS OF CONTROL     .281                .058          .835                 4.807    .001
a. Dependent Variable: SOCIAL STRESS

boys n = 22
girls n = 12

t = (.281 − .106) / √(.081² + .058²) = 1.76
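The same arithmetic as a short sketch, using the two slopes and standard errors printed above, paired as in the slide's computation.

```python
# Difference between two independent b-weights, using the values from the slide.
import math

b_a, se_a = 0.281, 0.058    # slope and SE from the second Coefficients table
b_b, se_b = 0.106, 0.081    # slope and SE from the first Coefficients table

se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
t = (b_a - b_b) / se_diff
print(t)                    # ~1.76, compared to t with n1 + n2 - 4 = 30 df
```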