Introduction to measures of relationship: covariance, and Pearson r
Ivan Jacob Agaloos Pesigan
Outline
• Defining correlation
• Describing relationships: Scatter plots
• Measuring relationships: Covariance
• Measuring relationships: Pearson r
• Measuring relationships: Issues
• Uses of Correlation
Defining correlation
• A correlation is a statistical method used to describe and measure the relationship between two variables.
• A relationship exists when changes in one variable tend to be accompanied by consistent and predictable changes in the other variable.
Although you should not make too much of the distinction between relationships and differences (if treatments have different means, then means are related to treatments), the distinction is useful in terms of the interests of the experimenter and the structure of the experiment.
Describing Relationships: Scatter Plots
Child   Vocabulary   Digit Symbol
1       5            12
2       8            15
3       7            14
4       9            18
5       10           19
6       8            18
7       6            14
8       6            17
9       10           20
10      9            17
11      7            15
12      7            16
13      9            16
14      6            13
15      8            16
[Figure: scatter plot of Digit Symbol (y-axis) against Vocabulary (x-axis). A strong, positive linear relationship.]
Child   Age   Time to complete
1       7.5   21
2       6.5   43
3       7.5   26
4       6     37
5       7     25
6       6.5   34
7       5.5   35
8       7     33
9       5.5   44
10      6.5   35
11      7     41
12      5     41
13      5     50
14      5     56
15      5.5   51
16      6     40
[Figure: scatter plot of Time to complete (y-axis) against Age (x-axis). A negative linear relationship.]
Describing relationships: Scatter plots
• Linear relationship
– Positive linear relationship
– Negative linear relationship
• Curvilinear relationship
• No linear relationship
Linear relationship
[Figure, two panels: a strong, positive linear relationship (Digit Symbol against Vocabulary); a negative linear relationship (Time to complete against Age).]
Curvilinear Relationship
[Figure, two panels: a curvilinear relationship (Time to Complete the Test against Motivation); a negative, curved line relationship (Number of Errors against Age).]
No linear relationship
[Figure: scatter plot of Depression Score (y-axis) against IQ (x-axis). No linear relationship.]
Measuring relationships: Covariance
[Figure: scatter plot of Exam (y-axis) against IQ (x-axis). IQ and final examination grade.]
• The simplest way to look at whether two variables are associated is to look at whether they covary.
• To understand what covariance is, we first need to think back to the concept of variance.
• Remember that the variance of a single variable represents the average amount that the data vary from the mean.
$$
\text{variance} = s_X^2 = \frac{\sum (X_i - M_X)^2}{n_X - 1} = \frac{\sum (X_i - \bar{X})^2}{n_X - 1}
$$
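The variance formula can be sketched in a few lines of Python. This is only an illustration: the function name `sample_variance` is made up for this example, and the data are the IQ scores of the ten students used in the worked example later in these slides. The standard library's `statistics.variance` is used as a cross-check.

```python
from statistics import variance

# IQ scores of the ten students from the worked example in these slides.
iq = [97, 98, 108, 114, 115, 119, 120, 121, 122, 127]


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


s2_iq = sample_variance(iq)
s_iq = s2_iq ** 0.5  # the standard deviation is the square root of the variance

# The hand-written formula should agree with the standard library.
assert abs(s2_iq - variance(iq)) < 1e-9

print(round(s2_iq, 2))  # sample variance of the IQ scores
print(round(s_iq, 2))   # 10.14, the value of s reported for IQ in the later tables
```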
• If we are interested in whether two variables are related, then we are interested in whether changes in one variable are met with similar changes in the other variable.
• Therefore, when one variable deviates from its mean we would expect the other variable to deviate from its mean in a similar way or the directly opposite way.
[Figure, two panels: IQ Scores per Student (IQ Scores against Student); Final Exam Scores per Student (Final Exam Scores against Student).]
[Figure: scatter plot of Exam (y-axis) against IQ (x-axis). IQ and Errors in the Final Examination.]
[Figure, two panels: IQ Scores per Student (IQ Scores against Student); Final Exam Errors per Student (Final Exam Errors against Student).]
Student   IQ      Final Exam Score
1         97      71
2         98      71
3         108     74
4         114     84
5         115     86
6         119     86
7         120     87
8         121     90
9         122     96
10        127     99
SUM       1141    844
MEAN      114.1   84.4
Student   IQ (X)   X - MX   Final Exam Score (Y)   Y - MY
1         97       -17.1    71                     -13.4
2         98       -16.1    71                     -13.4
3         108      -6.1     74                     -10.4
4         114      -0.1     84                     -0.4
5         115      0.9      86                     1.6
6         119      4.9      86                     1.6
7         120      5.9      87                     2.6
8         121      6.9      90                     5.6
9         122      7.9      96                     11.6
10        127      12.9     99                     14.6
SUM       1141              844
MEAN      114.1             84.4
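As a rough check on the deviation columns, the sketch below recomputes them in Python. The helper name `deviations` is hypothetical. It also shows that the deviations of each variable sum to zero (up to floating-point error), so raw deviations cannot simply be averaged.

```python
# Scores of the ten students from the table.
iq = [97, 98, 108, 114, 115, 119, 120, 121, 122, 127]
exam = [71, 71, 74, 84, 86, 86, 87, 90, 96, 99]


def deviations(xs):
    """Return each score's deviation from the mean of its own variable."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


dev_iq = deviations(iq)
dev_exam = deviations(exam)

print([round(d, 1) for d in dev_iq])    # the X - MX column
print([round(d, 1) for d in dev_exam])  # the Y - MY column

# Deviations around the mean always cancel out.
print(abs(sum(dev_iq)) < 1e-9, abs(sum(dev_exam)) < 1e-9)
```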
[Figure, two panels: IQ Scores per Student and Final Exam Scores per Student, with each score labeled by its deviation from the mean (IQ: -17.1 to 12.9; Final Exam: -13.4 to 14.6).]
• When we multiply the deviations of one variable by the corresponding deviations of a second variable, we get what is known as the cross-product deviations.
• As with the variance, if we want an average value of the combined deviations for the two variables, we must divide by the number of observations minus one (n − 1).
Student   IQ (X)   X - MX   Final Exam Score (Y)   Y - MY   (X - MX)(Y - MY)
1         97       -17.1    71                     -13.4    229.14
2         98       -16.1    71                     -13.4    215.74
3         108      -6.1     74                     -10.4    63.44
4         114      -0.1     84                     -0.4     0.04
5         115      0.9      86                     1.6      1.44
6         119      4.9      86                     1.6      7.84
7         120      5.9      87                     2.6      15.34
8         121      6.9      90                     5.6      38.64
9         122      7.9      96                     11.6     91.64
10        127      12.9     99                     14.6     188.34
SUM       1141              844                             851.6
MEAN      114.1             84.4                            cov = 94.62
• This averaged sum of combined deviations is known as the covariance.
$$
\text{covariance} = \mathrm{cov}_{X,Y} = \frac{\sum (X_i - M_X)(Y_i - M_Y)}{n - 1} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}
$$

For the IQ and final exam data:

$$
\mathrm{cov}_{X,Y} = \frac{851.6}{10 - 1} = \frac{851.6}{9} \approx 94.622
$$
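The covariance calculation can be reproduced with a short Python sketch. The function name `covariance` and its two-value return are choices made for this illustration only, not part of the slides.

```python
# Scores of the ten students from the worked example.
iq = [97, 98, 108, 114, 115, 119, 120, 121, 122, 127]
exam = [71, 71, 74, 84, 86, 86, 87, 90, 96, 99]


def covariance(xs, ys):
    """Return (covariance, cross-product deviations) for paired scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # One cross-product deviation per student: (X - MX)(Y - MY).
    cross_products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(cross_products) / (n - 1), cross_products


cov_xy, cross_products = covariance(iq, exam)

print(round(sum(cross_products), 1))  # 851.6, the sum of cross-products
print(round(cov_xy, 3))               # 94.622, the covariance
```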
• Calculating the covariance is a good way to assess whether two variables are related to each other.
• A positive covariance indicates that as one variable deviates from the mean, the other variable deviates in the same direction.
• On the other hand, a negative covariance indicates that as one variable deviates from the mean (e.g. increases), the other deviates from the mean in the opposite direction (e.g. decreases).
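To illustrate the sign of the covariance, the sketch below applies the same formula to the two scatter-plot data sets shown earlier in these slides (Vocabulary and Digit Symbol; Age and Time to complete). The function name `covariance` is again just an illustrative helper.

```python
# Data from the two scatter-plot tables earlier in these slides.
vocabulary = [5, 8, 7, 9, 10, 8, 6, 6, 10, 9, 7, 7, 9, 6, 8]
digit_symbol = [12, 15, 14, 18, 19, 18, 14, 17, 20, 17, 15, 16, 16, 13, 16]

age = [7.5, 6.5, 7.5, 6, 7, 6.5, 5.5, 7, 5.5, 6.5, 7, 5, 5, 5, 5.5, 6]
time_to_complete = [21, 43, 26, 37, 25, 34, 35, 33, 44, 35, 41, 41, 50, 56, 51, 40]


def covariance(xs, ys):
    """Sum of cross-product deviations divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


cov_positive = covariance(vocabulary, digit_symbol)
cov_negative = covariance(age, time_to_complete)

# Positive: the two variables tend to deviate in the same direction.
print(cov_positive > 0)
# Negative: the two variables tend to deviate in opposite directions.
print(cov_negative < 0)
```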
• There is, however, one problem with covariance as a measure of the relationship between variables and that is that it depends upon the scales of measurement used.
• So, covariance is not a standardized measure.
• For example, suppose the data above represented two variables measured in miles; the covariance is then 94.62 (as calculated above). If we convert these data into kilometers (by multiplying all values by 1.609) and calculate the covariance again, we find that it increases to 244.97, because rescaling both variables by 1.609 multiplies the covariance by 1.609² ≈ 2.589.
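The miles-to-kilometers example can be checked with a small sketch. It treats the IQ and exam scores as if they were distances in miles, as the slide does, purely for illustration; `covariance` and `MILES_TO_KM` are names introduced here.

```python
MILES_TO_KM = 1.609  # conversion factor used in the slide

# The worked-example data, treated here as if measured in miles.
x_miles = [97, 98, 108, 114, 115, 119, 120, 121, 122, 127]
y_miles = [71, 71, 74, 84, 86, 86, 87, 90, 96, 99]


def covariance(xs, ys):
    """Sum of cross-product deviations divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


cov_miles = covariance(x_miles, y_miles)
cov_km = covariance(
    [x * MILES_TO_KM for x in x_miles],
    [y * MILES_TO_KM for y in y_miles],
)

print(round(cov_miles, 2))  # 94.62
print(round(cov_km, 2))     # 244.97

# Rescaling both variables by a factor multiplies the covariance by its square.
print(abs(cov_km - cov_miles * MILES_TO_KM ** 2) < 1e-9)
```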
• This dependence on the scale of measurement is a problem because it means that we cannot compare covariances in an objective way – so, we cannot say whether a covariance is particularly large or small relative to another data set unless both data sets were measured in the same units.
Measuring Relationships: Pearson r
• To overcome the problem of dependence on the measurement scale, we need to convert the covariance into a standard set of units.
• This process is known as standardization.
• A very basic form of standardization would be to insist that all experiments use the same units of measurement, say meters – that way, all results could be easily compared.
• However, what happens if you want to measure attitudes – you’d be hard pushed to measure them in meters!
• Therefore, we need a unit of measurement into which any scale of measurement can be converted. The unit of measurement we use is the standard deviation.
• If we divide any distance from the mean by the standard deviation, it gives us that distance in standard deviation units.
$$
z_X = \frac{X_i - M_X}{s_X} = \frac{X_i - \bar{X}}{s_X}
\qquad
z_Y = \frac{Y_i - M_Y}{s_Y} = \frac{Y_i - \bar{Y}}{s_Y}
$$
• Since the properties of z scores form the foundation necessary for understanding the Pearson product moment correlation coefficient (r), they will be briefly reviewed:
1. The sum of a set of z scores (Σz), and therefore also the mean, equals 0.
2. The variance (s²) of the set of z scores equals 1, as does the standard deviation (s).
3. Neither the shape of the distribution of X, nor its relationship to any other variable, is affected by transforming it to zX.
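The first two properties can be verified numerically. The following is a minimal sketch using the IQ scores from the worked example; the helper name `z_scores` is hypothetical, and the standard deviation uses the n − 1 denominator, as elsewhere in these slides.

```python
from math import sqrt

# IQ scores from the worked example.
iq = [97, 98, 108, 114, 115, 119, 120, 121, 122, 127]


def z_scores(xs):
    """Convert raw scores to z scores using the n - 1 standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    sd = sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return [(x - mean) / sd for x in xs]


z_iq = z_scores(iq)
n = len(z_iq)

mean_z = sum(z_iq) / n
var_z = sum((z - mean_z) ** 2 for z in z_iq) / (n - 1)

print(abs(mean_z) < 1e-9)     # property 1: the mean (and sum) of z scores is 0
print(abs(var_z - 1) < 1e-9)  # property 2: the variance of z scores is 1
print(round(z_iq[0], 2))      # -1.69, the first z score in the later table
```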
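$$
r_{X,Y} = \frac{\sum z_X z_Y}{n - 1}
$$

With

$$
z_X = \frac{X_i - M_X}{s_X} = \frac{X_i - \bar{X}}{s_X}
\qquad
z_Y = \frac{Y_i - M_Y}{s_Y} = \frac{Y_i - \bar{Y}}{s_Y}
$$

then

$$
r_{X,Y} = \frac{\sum (X_i - M_X)(Y_i - M_Y)}{(s_X s_Y)(n - 1)} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{(s_X s_Y)(n - 1)}
$$

where

$$
\mathrm{cov}_{X,Y} = \frac{\sum (X_i - M_X)(Y_i - M_Y)}{n - 1} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}
$$

then

$$
r_{X,Y} = \frac{\mathrm{cov}_{X,Y}}{s_X s_Y}
$$

$$
r_{X,Y} = \frac{\text{covariability of } X \text{ and } Y}{\text{variability of } X \text{ and } Y \text{ separately}}
= \frac{\text{degree to which } X \text{ and } Y \text{ vary together}}{\text{degree to which } X \text{ and } Y \text{ vary separately}}
$$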
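The equivalence of the two forms of r (the averaged product of z scores, and the covariance divided by the product of the standard deviations) can be sketched as follows. The function names are invented for this illustration; the data are the IQ and final exam scores from the worked example.

```python
from math import sqrt

iq = [97, 98, 108, 114, 115, 119, 120, 121, 122, 127]
exam = [71, 71, 74, 84, 86, 86, 87, 90, 96, 99]


def mean(xs):
    return sum(xs) / len(xs)


def sd(xs):
    """Standard deviation with the n - 1 denominator."""
    m = mean(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def r_from_z(xs, ys):
    """r as the sum of z-score products divided by n - 1."""
    mx, my, sx, sy = mean(xs), mean(ys), sd(xs), sd(ys)
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)


def r_from_cov(xs, ys):
    """r as the covariance divided by the product of the standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (sd(xs) * sd(ys))


r_z = r_from_z(iq, exam)
r_cov = r_from_cov(iq, exam)

print(abs(r_z - r_cov) < 1e-12)  # both forms give the same value
print(round(r_cov, 2))           # 0.96
```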
Student   IQ (X)   zX = (X - MX)/sX   Final Exam Score (Y)   zY = (Y - MY)/sY   zX zY
1         97       -1.69              71                     -1.37              2.31
2         98       -1.59              71                     -1.37              2.18
3         108      -0.60              74                     -1.06              0.64
4         114      -0.01              84                     -0.04              0.00
5         115      0.09               86                     0.16               0.01
6         119      0.48               86                     0.16               0.08
7         120      0.58               87                     0.27               0.15
8         121      0.68               90                     0.57               0.39
9         122      0.78               96                     1.19               0.93
10        127      1.27               99                     1.49               1.90
SUM       1141                        844                                       8.60
MEAN      114.1    s = 10.14          84.4                   s = 9.77           r = 0.96
Student   IQ (X)   X - MX      Final Exam Score (Y)   Y - MY     (X - MX)(Y - MY)
1         97       -17.1       71                     -13.4      229.14
2         98       -16.1       71                     -13.4      215.74
3         108      -6.1        74                     -10.4      63.44
4         114      -0.1        84                     -0.4       0.04
5         115      0.9         86                     1.6        1.44
6         119      4.9         86                     1.6        7.84
7         120      5.9         87                     2.6        15.34
8         121      6.9         90                     5.6        38.64
9         122      7.9         96                     11.6       91.64
10        127      12.9        99                     14.6       188.34
SUM       1141                 844                               851.6
MEAN      114.1    s = 10.14   84.4                   s = 9.77   cov = 94.62
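$$
r_{X,Y} = \frac{\mathrm{cov}_{X,Y}}{s_X s_Y} = \frac{94.622}{(10.14)(9.77)}
$$

$$
r_{X,Y} = \frac{94.622}{99.00331457} \approx 0.96
$$

(The denominator 99.00331457 is the product of the unrounded standard deviations; 10.14 and 9.77 are their rounded values.)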
$$
r_{adj} = \sqrt{1 - \frac{(1 - r^2)(n - 1)}{n - 2}}
$$
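A small sketch of the adjusted correlation follows. It assumes the formula takes the square root of 1 − (1 − r²)(n − 1)/(n − 2), which is how the slide's expression is read here. The function name `adjusted_r` and the choice to return 0.0 when the quantity under the root is negative are assumptions made for this example, not something the slides state.

```python
from math import sqrt


def adjusted_r(r, n):
    """Adjusted r, assuming the form sqrt(1 - (1 - r^2)(n - 1) / (n - 2))."""
    inside = 1 - (1 - r ** 2) * (n - 1) / (n - 2)
    # Assumption for this sketch: report 0.0 when the radicand is negative.
    return sqrt(inside) if inside > 0 else 0.0


# Using the rounded r and the sample size from the worked example.
r_adj = adjusted_r(0.96, 10)

print(round(r_adj, 2))  # slightly smaller than the unadjusted r of 0.96
```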
There are three general strategies for determining the size of the population effect (effect size, ES) that a researcher is trying to detect:
1. To the extent that studies carried out by the current investigator or others are closely related to the present investigation, the ESs found in these studies reflect the magnitude that can be expected.
2. In some research areas, an investigator may posit some minimum population effect that would have either practical or theoretical significance.
3. A third strategy in deciding what ES values to use in determining the power of a study is to use certain suggested conventional definitions of “small”, “medium”, and “large” effects.
Size of Effect / Magnitude   r     r² (% of variance)
small                        0.1   0.01 (1%)
medium                       0.3   0.09 (9%)
large                        0.5   0.25 (25%)
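The conventions in the table can be applied with a short helper. This sketch assumes the three values of r act as lower cut-offs on the absolute value of r; that reading, the function names, and the "below small" label are all choices made here for illustration.

```python
def effect_size_label(r):
    """Label |r| using the conventional values, read here as lower cut-offs."""
    magnitude = abs(r)
    if magnitude >= 0.5:
        return "large"
    if magnitude >= 0.3:
        return "medium"
    if magnitude >= 0.1:
        return "small"
    return "below small"  # hypothetical label for values under 0.1


def percent_of_variance(r):
    """r squared expressed as a percentage of variance."""
    return 100 * r ** 2


for r in (0.1, 0.3, 0.5):
    print(r, effect_size_label(r), round(percent_of_variance(r)))
```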
[Figure, two panels, Y against X: perfect positive linear relationship (r = +1.00); perfect negative linear relationship (r = -1.00).]
[Figure, two panels, Y against X: strong positive linear relationship (r = .80); moderate positive linear relationship (r = .30).]
[Figure, two panels, Y against X: weak positive linear relationship (r = .10); no linear relationship (r = 0).]
• Pearson Product Moment Correlation Coefficient:
– It is a pure number and independent of the units of measurement.
– Its absolute value varies between zero, when the variables have no linear relationship, and 1, when each variable is perfectly predicted by the other. The absolute value thus gives the degree of relationship.
– Its sign indicates the direction of the relationship. A positive sign indicates a tendency for high values of one variable to occur with high values of the other, and low values to occur with low. A negative sign indicates a tendency for high values of one variable to be associated with low values of the other. Reversing the direction of measurement of one of the variables will produce a coefficient of the same value but of opposite sign. Coefficients with equal value but opposite sign (for example, +.50 and -.50) thus indicate equally strong linear relationships, but in opposite directions.
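Two of these properties can be checked numerically: r does not change when a variable is converted to other units, and reversing the direction of measurement of one variable flips the sign of r. The sketch below uses the worked-example data; the rescaling constants and the "100 minus score" reversal are hypothetical choices for illustration.

```python
from math import sqrt

iq = [97, 98, 108, 114, 115, 119, 120, 121, 122, 127]
exam = [71, 71, 74, 84, 86, 86, 87, 90, 96, 99]


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (sx * sy)


r = pearson_r(iq, exam)

# Hypothetical change of units for X: multiply by 1.609 and add 10.
r_rescaled = pearson_r([1.609 * x + 10 for x in iq], exam)

# Hypothetical reversal of Y: count points lost (100 - score) instead of points earned.
r_reversed = pearson_r(iq, [100 - y for y in exam])

print(abs(r_rescaled - r) < 1e-9)  # unchanged by the change of units
print(abs(r_reversed + r) < 1e-9)  # same magnitude, opposite sign
```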
Measuring Relationships: Issues
Restriction of Range
[Figure: scatter plot of Y against X illustrating restriction of range.]
Heterogeneous Subsamples
[Figure: relationship of height and weight (Weight against Height). Blue line = males; red line = females; black line = overall.]
Outliers
Correlation and Causation
Uses of Correlation
• Prediction. If two variables are known to be related in a systematic way, then it is possible to use one of the variables to make accurate predictions about the other.
• Validity. Suppose that a psychologist develops a new test for measuring intelligence. How could you show that this test truly measures what it claims; that is, how could you demonstrate the validity of the test? One common technique for demonstrating validity is to use a correlation.
• Reliability. In addition to evaluating the validity of a measurement procedure, correlations are used to determine reliability. A measurement procedure is considered reliable to the extent that it produces stable, consistent measurements.
• Theory Verification. Many psychological theories make specific predictions about the relationship between two variables.
People behind the concepts
• Scatter plot (bivariate plot)
• Pearson Product Moment Correlation Coefficient
Slides prepared by: Ivan Jacob Agaloos Pesigan
Lecturer, Psychology Department, Ateneo de Manila University
Assistant Professor Lecturer, Psychology Department, De La Salle University
Lecturer, College of Education, Psychology Department, Mathematics Department, Miriam College