37
20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation Coefficient. Reviewed 05 June 05/MODULE 20

20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

Embed Size (px)

Citation preview

Page 1: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 1

Module 20: Correlation

This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation Coefficient.

Reviewed 05 June 05/MODULE 20

Page 2: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 2

Correlation In Module 19, we examined how two variables, x and y, relate to each other by using the simple linear regression tool. In that context, x was the independent variable and y was the dependent variable. Typical examples for the independent variable include measures of time, including age; whereas, typical examples for the dependent variable are continuous measurements such as blood cholesterol level. The general assumption is that there are separate normal distributions of the dependent variable y for each value of the independent variable x. Further, we need to assume that these separate normal distributions for the dependent variable all have the same population variance.

Page 3: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 3

Clearly these assumptions are quite restrictive in that we are often interested in the relationship between two variables, x and y, where it is not at all clear which should be labeled the independent variable and which the dependent one. An example is the relationship between blood cholesterol level and blood pressure level.

Page 4: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 4

For this situation, we have another tool to measure and test hypotheses about the relationship between these two variables. The tool is correlation and we focus here only on what is usually called the Pearson Product Moment Correlation Coefficient. There are other measures of correlation which we will not discuss here. There are restrictions for the use of this correlation tool as well, which include the basic assumption that the x and y variables together have a joint frequency distribution which is called the bivariate normal distribution. This distribution looks like a three-dimensional bell in a manner similar to the way a normal distribution for one variable looks like a cross section of a bell.

Page 5: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 5

The degree of association or correlation between two variables is measured by the correlation coefficient. This is done in a manner similar to that for other population parameters and estimates of these parameters obtained by calculating statistics from samples. That is, there is a value for the population parameter for this coefficient which is estimated by selecting a random sample and calculating the appropriate coefficient using the data from this sample. We can also use the information from the sample to test hypotheses about the population. 

Page 6: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 6

2 2

( )( )

( ) ( )

x yxy

x y

x u y u

x u y u

2 2

2 2 2 2

( )( )

( ) ( )

( )( ) / =

[ ( ) / ][ ( ) / ]

( )

( ) ( )

xy

x x y yr

x x y y

xy x y n

x x n y y n

SS xy

SS x SS y

The population parameter for the Pearson Product Moment Correlation Coefficient is defined as

which is typically called Rho, for the Greek letter it represents. The estimate of ρ calculated from the sample data is the statistic

Page 7: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 7

Patient x = Fecal Fat y = Urinary Oxalate 1 16 312 14 403 38 414 8 705 15 756 22 657 28 858 27 1159 14 128

10 45 14511 46 140

Sum 273 935Mean 24.8 85.0

Fecal Fat (g/24 hr) and Urinary Oxalate (mg/24 hr) secreted by a random sample of n = 11 persons

Fecal Fat and Urinary Oxalate Example

Page 8: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 8

0

20

40

60

80

100

120

140

160

0 10 20 30 40 50

Fecal Fat (g/24hr)

Uri

nar

y O

xal

ate

(mg/2

4h

r)

Scatter plot for Fecal Fat and Urinary Oxalate Data

Page 9: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 9

Person x x2y y2

xy1 16 256 31 961 4962 14 196 40 1,600 5603 38 1,444 41 1,681 1,5584 8 64 70 4,900 5605 15 225 75 5,625 1,1256 22 484 65 4,225 1,4307 28 784 85 7,225 2,3808 27 729 115 13,225 3,1059 14 196 128 16,384 1,792

10 45 2,025 145 21,025 6,52511 46 2,116 140 19,600 6,440

Sum 273 8,519 935 96,451 25,971Mean 24.8 85.0

Sum2/n 6,775.36 79,475.00SS 1,743.64 16,976.00 2,766.00

Variance 174.36 1,697.60SD 13.2 41.2

Fecal Fat Urinary Oxalate

Calculations for Regressing Urinary Oxalate on Fecal Fat

Page 10: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 10

85.0

24.8

( ) 1,743.64

( ) 16,976.00

( ) 2,766.00

y

x

SS x

SS y

SS xy

Regression Tools

Page 11: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 11

( ) 2,766.001.59

( ) 1,743.64

85.0 1.59(24.8) 45.54

SS xySlope b

SS x

Intercept a y bx

ˆ

ˆ 45.54 1.59

y a bx

y x

At x = 40, the regression estimate for y is:

ˆ 45.54 1.59(40) 109.14y

So we can calculate

The straight line depicting the regression relationship of y on x is

Page 12: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 12

With this information, we can add the regression line

xy 59.154.45ˆ to the scatter plot, as shown below.

0

20

40

60

80

100

120

140

160

0 20 40 60

Fecal Fat (g/24hr)

Urin

ary

Ox

ala

te (

mg

/24

hr)

09.1174559.154.45ˆ xy

Page 13: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 13

1. The hypothesis: H0: β = 0 vs. H1: β ≠0

2. The level: = 0.05

3. The assumptions: Random normal samples for y-variable from populations defined by x-variable 

4. The test statistic: ANOVA as specified by

Test for regression of Urinary Oxalate on Fecal Fat

ANOVA Source df SS MS F Regression 1 bSS (xy) SS(Reg)/1 MS(Reg)/MS(Res) Residual n-2 SS(Res)a SS(Res)/(n-2) Total n-1 SS (y)

a SS(Residual) = SS(y) – SS(Regression)

Page 14: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 14

5. The critical region: Reject H0: β = 0 if the value

calculated for F > F0.95(1, 9) = 5.12

6. The result: SS(Reg) = bSS(xy) = 1.59 (2,766.00) = 4,397.94

SS(Total) = SS(y) = 16,976.00

SS(Res) = 16,976.00 – 4,397.94 = 12,578.06

7. The conclusion: Accept H0: β = 0 since F < 5.12

ANOVA Source df SS MS F Regression 1 4,397.94 4,397.94 3.15 Residual 9 12,578.06 1,397.56 Total 10 16,976.00

Page 15: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 15

y = 45.54+1.59x

R2 = 0.2585

0

20

40

60

80

100

120

140

160

0 10 20 30 40 50 60

Fecal Fat (g/24hr)

Uri

nary

Oxa

late

(mg/

24hr

)

(45, 85)

(x, y)

(45,145)

)45(59.154.45ˆ y

6085145 yy

3285117 A

28117145 B

y = Regression estimate = 45.54 + 1.59x = 117

C = A + B = yy Total deviation SS(y)

A = yy ˆ Explained by line SS(Reg)

B = yy ˆ Left over SS(Residual)

45.54

Page 16: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 16

Correlation Tools

( ) 1,743.64

( ) 16,976.00

( ) 2,766.00

SS x

SS y

SS xy

The estimate of the correlation coefficient is:

( )

( ) ( )xy

SS xyr

SS x SS y

2,766.000.5084

[1,743.64][16,976.00]xyr

Page 17: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 17

r measures linear association

r has values between -1 ≤ r ≤ + 1

r + 1 implies strong positive linear association

r - 1 implies strong negative linear association

r 0 implies no linear association

The Correlation Coefficient

Page 18: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 18

..

x

y

.

..

..

.

...

.

.

.

x

y

.. .

..

.

.

.

..

r +1 r -1

. .

x

y

... .

. . ....

..

x

y

..

.

.

.

.

.

... .

r 0 r +1

..

x

y

..

.

.

.. .

...

.

..

r 0 r > 0

.

.

x

y.

.

.

.. .

..

..

.

.

..

.

..

Page 19: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 19

22 ( )

½2

1n

nt r t

r

Note that this calculation requires only the sample estimate r of the correlation coefficient ρ and the sample size n and that we need to use the t distribution with n - 2 degrees of freedom.

Correlation Hypothesis Testing

The hypothesis of interest deals with whether there is linear association between x and y. If there is no such association, we would have = 0. Hence, the hypotheses of interest are: H0: = 0 vs. H1: 0

which we can test by using the test statistic:

Page 20: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 20

1. The hypothesis: H0: = 0 vs. H1: ≠ 0

2. The level: = 0.05

3. The assumptions: Random sample from bivariate normal distribution.

4. The test statistic:

22 ( )

½2

1n

nt r t

r

Test of Correlation between Urinary Oxalate and Fecal Fat, n = 11, r = 0.5084

Page 21: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 21

5. The critical region: Reject H0: = 0 if the value

calculated for t is not between ± t0.975(9) = 2.262

6. The result: r = 0.5084, n = 11

7. The conclusion: Accept H0: = 0 since t = 1.77 is between ± t0.975(9) = 2.262

2

½ ½9 9

0.5084 0.5084 1.771 0.25851 (0.5084)

t

Page 22: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 22

21

2

r

nrt

Test of Correlation for Tono-Pen vs. Goldman intraocular pressure, n = 40, r = 0.6574

1. The hypothesis: H0: = 0 vs. H1: ≠ 0

2. The level: = 0.05

3. The assumptions: Random sample from bivariate normal distribution.

4. The test statistic:

Page 23: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 23

5. The rejection region: Reject H0: = 0 , if t is not

between t0.975(38) = 2.02

6. The result: n = 40, r = 0.6574, r2 = 0.44,

7. The conclusion: Reject H0: = 0 since t = 5.44 is not between 2.02

44.556.0

3866.0

6574.01

386574.0

2

t

Page 24: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 24

Example : AJPH, 1995; 85: 1397-1401

Page 25: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 25Source: AJPH, 1995; 85: 1397-1401

Page 26: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 26

21

2

r

nrt

Test of Correlation between infant mortality rate and gross domestic product, n = 17, r = -0.64

1. The hypothesis: H0: = 0 vs. H1: ≠ 0

2. The level: = 0.05

3. The assumptions: Random sample from bivariate normal distribution.

4. The test statistic:

Page 27: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 27

5. The rejection region: Reject H0: = 0 , if t is not

between t0.975(15) = 2.13

6. The result: n = 17, r = -0.64

7. The conclusion: Reject H0: = 0 since t = -3.23 is not between t0.975(15) = 2.13

23.359.0

1564.0

)64.0(1

1564.0

2

t

Page 28: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 28

Test of correlation hypothesis for life expectancy for males and females, n = 17, r = 0.67

21

2

r

nrt

1. The hypothesis: H0: = 0 vs. H1: ≠ 0

2. The level: = 0.05

3. The assumptions: Random sample from bivariate normal distribution.

4. The test statistic:

Page 29: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 29

5. The rejection region: Reject H0: = 0 , if t is not

between t0.975(15) = 2.13

6. The result: n = 17, r = 0.67

7. The conclusion: Reject H0: = 0 since t = 3.49 is not between t0.975 (15) = 2.13

2

15 150.67 0.67 3.49

1 0.67 1 0.45t

Page 30: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 30

Example : AJPH, 1997; 87: 1491-1498

Page 31: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 31

Page 32: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 32

Correlation between Mortality and Social Mistrust, n = 39, r = 0.79

21

2

r

nrt

1. The hypothesis: H0: = 0 vs. H1: ≠ 0

2. The level: = 0.05

3. The assumptions: Random sample from bivariate normal distribution.

4. The test statistic:

Page 33: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 33

5. The rejection region: Reject H0: = 0 , if t is not

between t0.975(37) = 2.02

6. The result: n = 39, r = 0.79

7. The conclusion: Reject H0: = 0 since t = 7.8  is not between

t0.975 (37) = 2.02

8.73759.0

3779.0

79.01

3779.0

2

t

Page 34: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 34

Example : AJPH, 1998; 88: 1496-1502

Page 35: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 35

Source: AJPH, 1998; 88: 1496-1502

Note: very few outliers can have a large impact on the location of the line

Page 36: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 36

Test for Correlation between Gonorrhea rate and Chlamydia rate

21

2

r

nrt

1. The hypothesis: H0: = 0 vs. H1: ≠ 0

2. The level: = 0.05

3. The assumptions: Random sample from bivariate normal distribution.

4. The test statistic:

Page 37: 20 - 1 Module 20: Correlation This module focuses on the calculating, interpreting and testing hypotheses about the Pearson Product Moment Correlation

20 - 37

5. The rejection region: Reject H0: = 0 , if t is not

between t0.975(320) 2.00

6. The result: n = 322, r = 0.83

7. The conclusion: Reject H0: = 0 since  t = 26.67 is not between

t0.975(320) = 2.00

2

320 3200.83 0.83 26.67

1 0.83 1 0.69t