Correlation and Covariance R. F. Riesenfeld (Based on web slides by James H. Steiger)




CS5961 Comp Stat 2

Goals

⇨ Introduce the concepts of covariance and correlation

⇨ Develop computational formulas

R F Riesenfeld Sp 2010


Covariance

⇨ Variables may change in relation to each other

⇨ Covariance measures how much the movement in one variable predicts the movement in a corresponding variable


Smoking and Lung Capacity

⇨ Example: investigate relationship between cigarette smoking and lung capacity

⇨ Data: for a sample group, responses on smoking habits and the corresponding measured lung capacities


Smoking v Lung Capacity Data

N    Cigarettes (X)    Lung Capacity (Y)
1          0                  45
2          5                  42
3         10                  33
4         15                  31
5         20                  29


Smoking and Lung Capacity

[Figure: scatter plot of Lung Capacity (Y) against Smoking (yrs); horizontal axis from -5 to 25, vertical axis from 20 to 50]


Smoking v Lung Capacity

⇨ Observe that as smoking exposure goes up, corresponding lung capacity goes down

⇨ Variables covary inversely

⇨ Covariance and correlation quantify the relationship


Covariance

⇨ Variables that covary inversely, like smoking and lung capacity, tend to appear on opposite sides of their group means: when smoking is above its group mean, lung capacity tends to be below its group mean.

⇨ The average product of deviations measures the extent to which variables covary, the degree of linkage between them


The Sample Covariance

⇨ As with variance, for theoretical reasons the average is typically computed using (N - 1), not N. Thus,

$$ S_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y}) $$
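The sample covariance defined on this slide (the sum of products of deviations from the means, divided by N - 1) can be sketched in a few lines of Python. The function name `sample_covariance` and the variable names are illustrative choices, not part of the original slides; the data are the five smoking and lung capacity pairs from the example.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by N - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of (X_i - mean X) * (Y_i - mean Y) over all pairs.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Data from the smoking and lung capacity example.
cigarettes = [0, 5, 10, 15, 20]
lung_capacity = [45, 42, 33, 31, 29]

print(sample_covariance(cigarettes, lung_capacity))  # -53.75
```

The negative sign reflects the inverse relationship: above-average smoking pairs with below-average lung capacity.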


Calculating Covariance

        Cigs (X)    Lung Cap (Y)
            0           45
            5           42
           10           33
           15           31
           20           29
Mean:   X̄ = 10      Ȳ = 36


Calculating Covariance

Cigs (X)    (X - X̄)    (X - X̄)(Y - Ȳ)    (Y - Ȳ)    Cap (Y)
    0         -10           -90              9         45
    5          -5           -30              6         42
   10           0             0             -3         33
   15           5           -25             -5         31
   20          10           -70             -7         29

                        ∑ = -215
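The deviation table for this example can be reproduced programmatically. The following is a minimal sketch, assuming the same five data pairs; the variable names are illustrative, and `fractions.Fraction` is used only so that the arithmetic stays exact.

```python
from fractions import Fraction

cigarettes = [0, 5, 10, 15, 20]
lung_capacity = [45, 42, 33, 31, 29]

n = len(cigarettes)
# Exact rational means avoid floating-point artifacts such as a "-0.0" product.
mean_x = Fraction(sum(cigarettes), n)
mean_y = Fraction(sum(lung_capacity), n)

rows = []
for x, y in zip(cigarettes, lung_capacity):
    dx = x - mean_x  # deviation of X from its mean
    dy = y - mean_y  # deviation of Y from its mean
    rows.append((x, dx, dx * dy, dy, y))

print(f"{'X':>4} {'dx':>5} {'dx*dy':>6} {'dy':>5} {'Y':>4}")
for x, dx, product, dy, y in rows:
    print(f"{x:>4} {str(dx):>5} {str(product):>6} {str(dy):>5} {y:>4}")

sum_of_products = sum(product for _, _, product, _, _ in rows)
print("sum of products:", sum_of_products)  # -215
```

Every nonzero product is negative, which is the "opposite sides of the group means" pattern in numerical form.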


Covariance Calculation (2)

Evaluation yields,

$$ S_{xy} = \frac{1}{4}(-215) = -53.75 $$


Covariance under Affine Transformation

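Let

$$ L_i = aX_i + b \quad \text{and} \quad M_i = cY_i + d. $$

Then,

$$ l_i = a x_i, \qquad m_i = c y_i, $$

where lowercase letters denote deviations from the mean,

$$ u_i = U_i - \bar{U}, $$

and

$$ S_{LM} = \frac{1}{N-1} \sum_{i=1}^{N} l_i m_i . $$

Evaluating, in turn, gives,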


Covariance under Affine Transf (2)

$$ S_{LM} = \frac{1}{N-1} \sum_{i=1}^{N} l_i m_i
         = \frac{1}{N-1} \sum_{i=1}^{N} (a x_i)(c y_i)
         = ac \, \frac{1}{N-1} \sum_{i=1}^{N} x_i y_i $$

Evaluating further,

$$ S_{LM} = ac \, S_{xy} $$
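The result can be checked numerically. The additive constants b and d shift the means by the same amount as the data, so they cancel in the deviations and only the factor ac survives. The sketch below assumes the example data from earlier slides; the constants a, b, c, d are hypothetical values chosen only for illustration.

```python
def sample_covariance(xs, ys):
    """Sample covariance with the N - 1 divisor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


cigarettes = [0, 5, 10, 15, 20]
lung_capacity = [45, 42, 33, 31, 29]

# Hypothetical affine rescalings L = aX + b and M = cY + d.
a, b = 2.0, 3.0
c, d = -0.5, 7.0

transformed_x = [a * x + b for x in cigarettes]
transformed_y = [c * y + d for y in lung_capacity]

s_xy = sample_covariance(cigarettes, lung_capacity)
s_lm = sample_covariance(transformed_x, transformed_y)

# The two printed values should agree: S_LM = a * c * S_xy.
print(s_lm, a * c * s_xy)
```

Because c is negative here, the covariance changes sign as well as scale, which is why covariance alone is hard to interpret across different units.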


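(Pearson) Correlation Coefficient r_xy

⇨ Like covariance, but uses Z-values (standard scores) instead of raw deviations. Hence, its magnitude is invariant under linear transformation of the raw data; the sign flips only if exactly one variable is rescaled by a negative factor.

$$ r_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} z_{x_i} z_{y_i} $$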
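A minimal Python sketch of the Z-value definition follows, assuming each Z-value is the deviation from the mean divided by the sample standard deviation (N - 1 divisor). The function names are illustrative, and the 20-cigarettes-per-pack conversion is a hypothetical rescaling used only to illustrate invariance.

```python
import math


def sample_std(values):
    """Sample standard deviation with the N - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def z_scores(values):
    """Standard scores: deviation from the mean in units of the sample std."""
    mean = sum(values) / len(values)
    std = sample_std(values)
    return [(v - mean) / std for v in values]


def pearson_r(xs, ys):
    """Correlation as the (N - 1)-averaged product of paired Z-values."""
    n = len(xs)
    return sum(zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))) / (n - 1)


cigarettes = [0, 5, 10, 15, 20]
lung_capacity = [45, 42, 33, 31, 29]

r = pearson_r(cigarettes, lung_capacity)
print(round(r, 4))  # -0.9615

# Hypothetical change of units: cigarettes -> packs (assuming 20 per pack).
packs = [x / 20 for x in cigarettes]
print(math.isclose(r, pearson_r(packs, lung_capacity)))  # True
```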


Alternative (common) Expression

$$ r_{xy} = \frac{s_{xy}}{s_x s_y} $$
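This expression follows in one step from the Z-value definition, assuming each Z-value is the deviation from the mean divided by the corresponding sample standard deviation, so that the constant factors can be pulled out of the sum:

$$ r_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} \frac{X_i - \bar{X}}{s_x} \cdot \frac{Y_i - \bar{Y}}{s_y}
         = \frac{1}{s_x s_y} \cdot \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})
         = \frac{s_{xy}}{s_x s_y} $$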


Computational Formula 1

$$ s_{xy} = \frac{1}{N-1} \left[ \sum_{i=1}^{N} X_i Y_i - \frac{\sum_{i=1}^{N} X_i \, \sum_{i=1}^{N} Y_i}{N} \right] $$
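This formula needs only three running sums, so it can be evaluated in a single pass over the data. A minimal sketch, with an illustrative function name and the example data from earlier slides:

```python
def covariance_from_sums(xs, ys):
    """Sample covariance from raw sums: (sum XY - sum X * sum Y / N) / (N - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - sum_x * sum_y / n) / (n - 1)


cigarettes = [0, 5, 10, 15, 20]
lung_capacity = [45, 42, 33, 31, 29]

# sum XY = 1585, sum X = 50, sum Y = 180, so (1585 - 1800) / 4.
print(covariance_from_sums(cigarettes, lung_capacity))  # -53.75
```

In floating point, subtracting two large, nearly equal sums can lose precision when the means are large relative to the spread, so the deviation form is often the safer choice for such data.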


Computational Formula 2

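$$ r_{xy} = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left[ N \sum X^2 - \left( \sum X \right)^2 \right] \left[ N \sum Y^2 - \left( \sum Y \right)^2 \right]}} $$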
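The same formula can be sketched in Python using five raw sums. The function name is illustrative, and the guard for a zero denominator is an added assumption covering the case where one variable is constant.

```python
import math


def pearson_r_from_sums(xs, ys):
    """Pearson r from the raw sums of X, Y, X^2, Y^2 and XY."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return numerator / denominator


cigarettes = [0, 5, 10, 15, 20]
lung_capacity = [45, 42, 33, 31, 29]

# Numerator: 5 * 1585 - 50 * 180 = -1075; denominator: sqrt(1250 * 1000).
print(round(pearson_r_from_sums(cigarettes, lung_capacity), 4))  # -0.9615
```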


Table for Calculating r_xy

       Cigs (X)     X²      XY      Y²     Cap (Y)
           0         0       0     2025      45
           5        25     210     1764      42
          10       100     330     1089      33
          15       225     465      961      31
          20       400     580      841      29

∑ =       50       750    1585     6680     180


Computing r_xy from Table

$$ r_{xy} = \frac{5(1585) - 50(180)}{\sqrt{\left[ 5(750) - 50^2 \right] \left[ 5(6680) - 180^2 \right]}}
         = \frac{7925 - 9000}{\sqrt{(3750 - 2500)(33400 - 32400)}} $$


Computing Correlation

$$ r_{xy} = \frac{-1075}{\sqrt{1250 \times 1000}} $$

$$ r_{xy} = -0.9615 $$


Conclusion

⇨ r_xy = -0.96 is a very strong inverse linear relationship, implying near certainty that a smoker in this sample will have diminished lung capacity

⇨ Greater smoking exposure implies greater likelihood of lung damage


End Covariance & Correlation

Notes
