Introduction to measures of relationship: covariance, and Pearson r

Introduction to measures of relationship: covariance, and Pearson r Ivan Jacob Agaloos Pesigan

Outline

•  Defining correlation •  Describing relationships: Scatter plots •  Measuring relationships: Covariance •  Measuring relationships: Pearson r •  Measuring relationships: Issues •  Uses of Correlation

2

Defining correlation

Introduction to measures of relationship: covariance, and Pearson r

3


•  A correlation is a statistical method used to describe and measure the relationship between two variables.

•  A relationship exists when changes in one variable tend to be accompanied by consistent and predictable changes in the other variable.

4


Although you should not make too much of the distinction between relationships and differences (if treatments have different means, then means are related to treatments), the distinction is useful in terms of the interests of the experimenter and the structure of the experiment.

5

Describing Relationships: ��Scatter Plots


6

7

Child Vocabulary Digit Symbol 1 5 12 2 8 15 3 7 14 4 9 18 5 10 19 6 8 18 7 6 14 8 6 17 9 10 20 10 9 17 11 7 15 12 7 16 13 9 16 14 6 13 15 8 16

8

5 6 7 8 9 10

1214

1618

20

A strong, positive linear relationship

Vocabulary

Dig

it S

ymbo

l

9

Child Age Time to complete 1 7.5 21 2 6.5 43 3 7.5 26 4 6 37 5 7 25 6 6.5 34 7 5.5 35 8 7 33 9 5.5 44 10 6.5 35 11 7 41 12 5 41 13 5 50 14 5 56 15 5.5 51 16 6 40

10

5.0 5.5 6.0 6.5 7.0 7.5

2025

3035

4045

5055

A negative linear relationship

Age

Tim

e to

com

plet

e

Describing relationships: ��Scatter plots

•  Linear relationship •  Positive linear relationship •  Negative linear relationship •  Curvilinear relationship •  No linear relationship

11

Linear relationship

12

5 6 7 8 9 10

1214

1618

20

A strong, positive linear relationship

Vocabulary

Dig

it S

ymbo

l

5.0 5.5 6.0 6.5 7.0 7.5

2025

3035

4045

5055

A negative linear relationship

Age

Tim

e to

com

plet

e

Curvilinear Relationship

13

1 2 3 4 5

510

1520

A curvilinear relationship

Motivation

Tim

e to

Com

plet

e th

e Te

st

4.0 4.5 5.0 5.5 6.0 6.5

810

1214

1618

A negative, curved line relationship

Age

Num

ber o

f Err

ors

No linear relationship

14

60 80 100 120 140

5060

7080


IQ

Dep

ress

ion

Sco

re

Measuring relationships: Covariance


15


16

100 105 110 115 120 125

7075

8085

9095

100

IQ and final examination grade

IQ

Exam


•  The simplest way to look at whether two variables are associated is to look at whether they covary.

•  To understand what covariance is, we first need to think back to the concept of variance.

17


•  Remember that the variance of a single variable represents the average amount that the data vary from the mean.

18

variance = sX2 =

Xi−M

X( )2

∑nX−1

variance = sX2 =

Xi− X( )

2

∑nX−1


•  If we are interested in whether two variables are related, then we are interested in whether changes in one variable are met with similar changes in the other variable.

•  Therefore, when one variable deviates from its mean we would expect the other variable to deviate from its mean in a similar way or the directly opposite way.

19


20

2 4 6 8 10

100

105

110

115

120

125

IQ Scores per Student

Student

IQ S

core

s

2 4 6 8 10

7075

8085

9095

100

Final Exam Scores per Student

Student

Fina

l Exa

m S

core

s


21

100 105 110 115 120 125

05

1015

2025

30

IQ and Errors in the Final Examination

IQ

Exam


22

2 4 6 8 10

100

105

110

115

120

125


Student

IQ S

core

s

2 4 6 8 10

05

1015

2025

30

Final Exam Errors per Student

Student

Fina

l Exa

m E

rror

s

23

Student IQ Final Exam Score 1 97 71

2 98 71

3 108 74

4 114 84

5 115 86

6 119 86

7 120 87

8 121 90

9 122 96

10 127 99

24

Student IQ Final Exam Score 1 97 71

2 98 71

3 108 74

4 114 84

5 115 86

6 119 86

7 120 87

8 121 90

9 122 96

10 127 99

SUM 1141 844

MEAN 114.1 84.4

25

Student IQ X - Mx Final Exam

Score Y - MY

1 97 -17.1 71 -13.4

2 98 -16.1 71 -13.4

3 108 -6.1 74 -10.4

4 114 -0.1 84 -0.4

5 115 0.9 86 1.6

6 119 4.9 86 1.6

7 120 5.9 87 2.6

8 121 6.9 90 5.6

9 122 7.9 96 11.6

10 127 12.9 99 14.6

SUM 1141 844

MEAN 114.1 84.4


26

2 4 6 8 10

100

105

110

115

120

125


Student

IQ S

core

s

-17.1-16.1

-6.1

-0.10.9

4.95.9

6.97.9

12.9

2 4 6 8 10

7075

8085

9095

100

Final Exam Scores per Student

Student

Fina

l Exa

m S

core

s

-13.4 -13.4

-10.4

-0.4

1.6 1.62.6

5.6

11.6

14.6


•  When we multiply the deviations of one variable by the corresponding deviations of a second variable, we get what is known as the cross-product deviations.

•  As with the variance, if we want an average value of the combined deviations for the two variables, we must divide by the number of observations (n− 1)

27

28

Student IQ X - Mx

Final Exam Score

Y - MY (X - Mx) (Y – MY)

1 97 -17.1 71 -13.4 229.14

2 98 -16.1 71 -13.4 215.74

3 108 -6.1 74 -10.4 63.44

4 114 -0.1 84 -0.4 0.04

5 115 0.9 86 1.6 1.44

6 119 4.9 86 1.6 7.84

7 120 5.9 87 2.6 15.34

8 121 6.9 90 5.6 38.64

9 122 7.9 96 11.6 91.64

10 127 12.9 99 14.6 188.34

SUM 1141 844 851.6

MEAN 114.1 84.4 cov = 94.62


•  This averaged sum of combined deviations is known as the covariance.

29

covariance = covX ,Y

=Xi−M

X( ) Yi −MY( )∑n−1


=Xi− X( ) Yi −Y( )∑n−1


30


=Xi-M

X( ) Yi -MY( )∑n -1

covX ,Y

=851.610 -1

=851.69

≈ 94.622


•  Calculating the covariance is a good way to assess whether two variables are related to each other.

•  A positive covariance indicates that as one variable deviates from the mean, the other variable deviates in the same direction.

•  On the other hand, a negative covariance indicates that as one variable deviates from the mean (e.g. increases), the other deviates from the mean in the opposite direction (e.g. decreases).

31


•  There is, however, one problem with covariance as a measure of the relationship between variables and that is that it depends upon the scales of measurement used.

•  So, covariance is not a standardized measure.

32


•  For example, if we use the data above and assume that they represented two variables measured in miles then the covariance is 94.62 (as calculated above). If we then convert these data into kilometers (by multiplying all values by 1.609) and calculate the covariance again then we should find that it increases to 244.97.

33


•  This dependence on the scale of measurement is a problem because it means that we cannot compare covariances in an objective way – so, we cannot say whether a covariance is particularly large or small relative to another data set unless both data sets were measured in the same units.

34

Measuring Relationships: ��Pearson r


35

Measuring relationships: Pearson r

•  To overcome the problem of dependence on the measurement scale, we need to convert the covariance into a standard set of units.

•  This process is known as standardization.

36


•  A very basic form of standardization would be to insist that all experiments use the same units of measurement, say meters – that way, all results could be easily compared.

•  However, what happens if you want to measure attitudes – you’d be hard pushed to measure them in meters!

37


•  Therefore, we need a unit of measurement into which any scale of measurement can be converted. The unit of measurement we use is the standard deviation.

•  If we divide any distance from the mean by the standard deviation, it gives us that distance in standard deviation units.

38


39

zX=Xi−M

X

sX

=Xi− X

sX

zY=Yi−M

Y

sY

=Yi−Y

sY

Measuring relationships: Pearson r •  Since the properties of z scores form the foundation

necessary for understanding the Pearson product moment correlation coefficient (r) they will be briefly reviewed: 1.  The sum of a set of z score (Σ z) and therefore

also the mean equal 0. 2.  The variance (s2) of the set of z scores equals 1, as

does the standard deviation (s).

3.  Neither the shape of the distribution of X, nor of its relationship to any other variable, is affected by transforming it to zX

40


41

rX ,Y

=zXzY∑

n−1

zX=Xi−M

X

sX

=Xi− X

sX

zY=Yi−M

Y

sY

=Yi−Y

sY

then

rX ,Y

=Xi−M

X( ) Yi −MY( )∑sXsY( ) n−1( )

=Xi− X( ) Yi −Y( )∑sXsY( ) n−1( )

where

covX ,Y

=Xi−M

X( ) Yi −MY( )∑n−1( )

=Xi− X( ) Yi −Y( )∑n−1( )

then

rX ,Y

=cov

X ,Y

sXsY

rX ,Y

=covariability of X and Y

variability of X and Y separately

rX ,Y

=degree to which X and Y vary together

degree to which X and Y vary separately

42

Student IQ zx = (X – Mx)/sx

Final Exam Score

zy = (Y – MY)/sy

zxzy

1 97 -1.69 71 -1.37 2.31

2 98 -1.59 71 -1.37 2.18

3 108 -0.60 74 -1.06 0.64

4 114 -0.01 84 -0.04 0.00

5 115 0.09 86 0.16 0.01

6 119 0.48 86 0.16 0.08

7 120 0.58 87 0.27 0.15

8 121 0.68 90 0.57 0.39

9 122 0.78 96 1.19 0.93

10 127 1.27 99 1.49 1.90

SUM 1141 844 8.60

MEAN 114.1 s = 10.14 84.4 s = 9.77 r = 0.96

43

Student IQ X - Mx

Final Exam Score

Y - MY (X - Mx) (Y – MY)

1 97 -17.1 71 -13.4 229.14

2 98 -16.1 71 -13.4 215.74

3 108 -6.1 74 -10.4 63.44

4 114 -0.1 84 -0.4 0.04

5 115 0.9 86 1.6 1.44

6 119 4.9 86 1.6 7.84

7 120 5.9 87 2.6 15.34

8 121 6.9 90 5.6 38.64

9 122 7.9 96 11.6 91.64

10 127 12.9 99 14.6 188.34

SUM 1141 844 851.6

MEAN 114.1 s = 10.14 84.4 s = 9.77 cov = 94.62


44

rX ,Y

=cov

X ,Y

sXsY

=94.622

10.14( ) 9.77( )

rX ,Y

=94.622

99.00331457= 0.96


45

radj=1−

1− r2( ) n−1( )n− 2

Measuring relationships: Pearson r There are three general strategies for determining the size of the population effect which a research is trying to detect:

1.  To the extent that studies which have been carried out by the current investigator or others are closely related to the present investigation, the ESs found in these studies reflect the magnitude which can be expected.

2.  In some research areas an investigator may posit some minimum population effect that would have either practical of theoretical significance.

3.  A third strategy in deciding what ES values to use in determining the power of a study is to use certain suggested conventional definitions of “small”, “medium”, and “large” effects.

46


47

Size of Effect / Magnitude r r2 (% of variance)

small 0.1 0.01 (1%)

medium 0.3 0.09 (9%)

large 0.5 0.25 (25%)


r = +1.00 r = -1.00

48

-3 -2 -1 0 1 2 3

-3-2

-10

12

3

Perfect positive linear relationship

X

Y

-2 -1 0 1 2 3

-3-2

-10

12

Perfect negative linear relationship

X

Y


r = .80 r = .30

49

-3 -2 -1 0 1 2

-2-1

01

23

Strong positive linear relationship

X

Y

-2 -1 0 1 2 3

-3-2

-10

12

Moderate positive linear relationship

X

Y


r = .10 r = 0

50

-2 -1 0 1 2 3

-3-2

-10

12


X

Y

-3 -2 -1 0 1 2

-3-2

-10

12

Weak positive linear relationship

X

Y

Measuring relationships: Pearson r •  Pearson Product Moment Correlation Coefficient:

–  It is a pure number and independent of the units of measurement. –  Its absolute value varies between zero, when the variables have no

linear relationship, and 1, when each variable is perfectly predicted by the other. The absolute value thus gives the degree of relationship.

–  Its sign indicates the direction of the relationship. A positive sign indicates a tendency for high values of one variable to occur with high values of the other, and low values to occur with low. A negative sign indicates a tendency for high values of one variable to be associated with low values of the other. Reversing the direction of measurement of one of the variables will produce a coefficient of the same value bur of opposite sign. Coefficients with equal value but opposite sign (for example, +.50 and -.50) thus indicate equally strong linear relationship, but in opposite directions.

51

Measuring Relationships: Issues


52

-2 -1 0 1 2

-3-2

-10

12

3

Restriction of range

X

Y

Restriction of Range

53

Heterogeneous Subsamples

62 64 66 68 70 72 74

100

120

140

160

180

200

Relationship of Height and WeightBlue Line = Males; Red Line = Females; Black Line = Overall

Height

Weight

malefemale

54

Outliers

55

Correlation and Causation

56

Uses of Correlation


57

Uses of Correlation

•  Prediction. If two variables are known to be related in a systematic way, then it is possible to use one of the variables to make accurate predictions about the other.

58

Uses of Correlation

•  Validity. Suppose that a psychologist develops a new test for measuring intelligence. How could you show that this test truly measures what it claims; that is, how could you demonstrate the validity of the test? One common technique for demonstrating validity is to use a correlation.

59

Uses of Correlation

•  Reliability. In addition to evaluating the validity of a measurement procedure, correlations are used to determine reliability. A measurement procedure is considered reliable to the extent that it produces stable, consistent measurements.

60

Uses of Correlation

•  Theory Verification. Many psychological theories make specific predictions about the relationship between two variables.

61

People behind the concepts Scatter plot

Bivariate plot Pearson Product Moment

Correlation Coefficient

62

Slides Prepared by: Ivan Jacob Agaloos Pesigan Lecturer, Psychology Department

Ateneo de Manila University Assistant Professor Lecturer, Psychology Department

De La Salle University Lecturer, College of Education, Psychology Department, Mathematics Department Miriam College

63