COMPILED BY
MUTHAMA, JAPHETH MUTINDA
CORRELATION
INTRODUCTION
Objectives of the presentation
After going through this presentation, the listener is expected to:
1. Be able to present the results of analysed research data.
2. Make an effective interpretation of the relationship between research variables.
3. Draw implications or inferences from the variables in the study model.
Definition
Correlation (r) is a statistical measure of how two variables move in
relation to each other.
It measures the relative strength of the relationship between the two
variables.
Correlation is expressed as a number known as the correlation
coefficient, which ranges between -1 and +1.
Coefficient of correlation
The coefficient of correlation is a technique for determining the degree of
correlation between two or more variables across the observed values of the
study variables.
The correlation, if any, found through this approach is then applied in a
statistical method that formulates a mathematical model of the relationship
among the variables. That model can be used to predict the values of the
dependent variable, given the values of the independent variable.
The sample correlation coefficient (r) measures the degree of linearity in the relationship between X and Y.
-1 ≤ r ≤ +1
r = 0 : Indicates no linear relationship between the research variables
The + and – signs indicate positive and negative linear correlations respectively.
[Figure: Coefficient of Correlation Analysis, with the labels "Strong negative relationship" and "Strong positive relationship"]
Interpreting Correlation Coefficient (r)
1) Strong correlation: r > 0.70 or r < –0.70
2) Moderate correlation: r is between 0.30 and 0.70, or between –0.30 and –0.70
3) Weak correlation: r is between 0 and 0.30, or between 0 and –0.30
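The cut-offs above can be sketched as a small Python function. This is only an illustration: the function name `correlation_strength` is made up here, and the handling of the exact boundary values (0.30 and 0.70) is an assumption, since the list does not say which band they belong to.

```python
def correlation_strength(r: float) -> str:
    """Label r using the 0.30 and 0.70 cut-offs from the list above.

    Assumption: |r| of exactly 0.30 or 0.70 is treated as "moderate".
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength depends on the magnitude, not the sign
    if size > 0.70:
        return "strong"
    if size >= 0.30:
        return "moderate"
    return "weak"


for r in (0.92, -0.82, 0.45, -0.10):
    direction = "positive" if r > 0 else "negative"
    print(f"r = {r:+.2f}: {correlation_strength(r)} {direction} correlation")
```

Note that a value of exactly r = 0 falls into the "weak" band here, although it is better described as no linear relationship at all.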
Methods of studying Correlation
Correlation can be determined by use of the following methods:
1. Scatter Diagram Method
2. Karl Pearson's Coefficient of Correlation Method
3. Spearman’s Rank Correlation Method
SCATTER DIAGRAMS
This is a graph in which the individual data points are plotted in two dimensions.
[Figure: two scatter plots, labelled "Very good fit" and "Moderate fit"]
Points clustered closely around a line show a strong correlation; the line is a good
predictor (a good fit) of the data. The more spread out the points, the weaker the
correlation and the poorer the fit.
The line is a REGRESSION line (Y = a + bX).
A strong relationship simply means a good linear fit.
Coefficient of determination and the regression line
NOTE:
1. The coefficient of determination (r²) is a measure of how well the regression
line represents the data: it is the proportion of the total variation in Y
that is explained by the line of best fit.
2. If the regression line passed exactly through every point on the scatter
plot, it would be able to explain all of the variation.
3. The further the points lie from the line, the less of the variation the line
is able to explain.
For example, in the case of variables X and Y:
If r = 0.922, then r² = 0.850,
which means that 85% of the total variation in Y can be explained by the
linear relationship between X and Y (as described by the regression
equation).
The other 15% of the total variation in Y therefore remains
unexplained.
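The link between r, the regression line and the explained variation can be checked numerically. The sketch below uses made-up data (not from this presentation) and assumes the line Y = a + bX is fitted by ordinary least squares. It then compares r² with the explained share of variation, 1 - SS_res / SS_tot.

```python
import math

# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products of the deviations from the means
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)  # total variation in Y
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least-squares regression line Y = a + bX (assumed fitting method)
b = sxy / sxx
a = mean_y - b * mean_x

# Pearson correlation coefficient
r = sxy / math.sqrt(sxx * syy)

# Share of the variation in Y explained by the line
ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
explained = 1 - ss_res / syy

print(f"r^2 = {r * r:.4f}, explained share = {explained:.4f}")

# The figures quoted in the example: r = 0.922
print(f"{0.922 ** 2:.3f}")  # 0.850
```

For a straight-line fit with one X variable the two quantities agree, which is why r² can be read directly as the explained proportion.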
Karl Pearson’s coefficient of correlation (or simple correlation)
This is the most widely used method of measuring the degree of
relationship between two variables.
It is defined as the measure of the strength of the linear relationship between
two variables, computed as the (sample) covariance of the
variables divided by the product of their (sample) standard deviations.
This coefficient assumes the following:
(i) that there is a linear relationship between the two variables;
(ii) that the two variables are causally related, which means that one of the
variables is independent and the other one is dependent;
(iii) that a large number of independent causes are operating in both variables
so as to produce a normal distribution.
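Karl Pearson’s coefficient of correlation can be worked out thus:

$$ r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - \left(\sum X\right)^2}\;\sqrt{n\sum Y^2 - \left(\sum Y\right)^2}} $$

- Shared variability of the X and Y variables: on the top (numerator)
- Individual variability of the X and Y variables: at the bottom (denominator)

OR

$$ r = \frac{\operatorname{cov}(x, y)}{\sigma_x \cdot \sigma_y} $$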
Illustration
From the following data, find the coefficient of correlation by Karl Pearson's method.
X: 6, 2, 10, 4, 8
Y: 9, 11, 5, 8, 7
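Solution (cont.)

$$ \bar{X} = \frac{\sum X}{N} = \frac{30}{5} = 6, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{40}{5} = 8 $$

With the deviations x = X - X̄ and y = Y - Ȳ:

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} = \frac{-26}{\sqrt{40 \times 20}} = \frac{-26}{\sqrt{800}} = -0.92 $$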
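As a cross-check, the illustration data (X: 6, 2, 10, 4, 8 and Y: 9, 11, 5, 8, 7) can be run through both ways of writing Pearson's coefficient: the raw-sums form and the deviation form. This is a minimal sketch in plain Python; the variable names are chosen here for illustration.

```python
import math

X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
n = len(X)

# Raw-sums form:
# r = (n*SumXY - SumX*SumY) / (sqrt(n*SumX2 - SumX^2) * sqrt(n*SumY2 - SumY^2))
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(a * b for a, b in zip(X, Y))
sum_x2 = sum(a * a for a in X)
sum_y2 = sum(b * b for b in Y)
r_raw = (n * sum_xy - sum_x * sum_y) / (
    math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
)

# Deviation form: r = Sum(xy) / sqrt(Sum(x^2) * Sum(y^2)),
# where x and y are deviations from the means
mean_x, mean_y = sum_x / n, sum_y / n
dev_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(X, Y))
dev_x2 = sum((a - mean_x) ** 2 for a in X)
dev_y2 = sum((b - mean_y) ** 2 for b in Y)
r_dev = dev_xy / math.sqrt(dev_x2 * dev_y2)

print(dev_xy, dev_x2, dev_y2)            # -26.0 40.0 20.0
print(round(r_raw, 2), round(r_dev, 2))  # -0.92 -0.92
```

Both forms give r of about -0.92: a strong negative correlation, so Y tends to fall as X rises.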
Spearman's rank coefficient
This is the technique of determining the degree of correlation between two
variables in the case of ordinal data, where ranks are given to the different
values of the variables.
The main objective of the coefficient is to determine the extent to which
the two sets of rankings are similar or dissimilar.
This method is typically used when the data are not available in exact
numerical form and only their order is known.
When the values of the two variables are converted to their ranks and
the correlation is computed on those ranks, the result is known as rank correlation.
Computation of Rank Correlation
Spearman’s rank correlation coefficient ρ can be calculated when:
• Actual ranks are given
• Ranks are not given, but grades are given and are not repeated
• Ranks are not given, and grades are given and are repeated
$$ R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} $$

where
D = R_x - R_y
R_x = rank of X
R_y = rank of Y
Illustration
Calculate Spearman’s rank correlation coefficient between advertisement cost and sales from the following data:
Advertisement cost : 39, 65, 62, 90, 82, 75, 25, 98, 36, 78
Sales(Shs): 47, 53, 58, 86, 62, 68, 60, 91, 51, 84
X     Y     R-x   R-y   D     D²
39    47    8     10    -2    4
65    53    6     8     -2    4
62    58    7     7     0     0
90    86    2     2     0     0
82    62    3     5     -2    4
75    68    5     4     1     1
25    60    10    6     4     16
98    91    1     1     0     0
36    51    9     9     0     0
78    84    4     3     1     1
ΣD² = 30
$$ R = 1 - \frac{6\sum D^2}{N^3 - N} = 1 - \frac{6(30)}{10^3 - 10} = 1 - \frac{180}{990} = 0.82 $$
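The rank correlation for the advertisement cost and sales data can be reproduced with a short Python sketch. The helper `descending_ranks` is a name made up for this example; it gives rank 1 to the largest value and assumes no value is repeated, which holds for this data set.

```python
def descending_ranks(values):
    """Rank 1 goes to the largest value. Assumes no repeated values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


cost = [39, 65, 62, 90, 82, 75, 25, 98, 36, 78]
sales = [47, 53, 58, 86, 62, 68, 60, 91, 51, 84]

rank_x = descending_ranks(cost)
rank_y = descending_ranks(sales)

# Squared rank differences D^2 for each pair
d_squared = [(rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y)]

n = len(cost)
R = 1 - 6 * sum(d_squared) / (n ** 3 - n)

print(rank_x)                        # [8, 6, 7, 2, 3, 5, 10, 1, 9, 4]
print(rank_y)                        # [10, 8, 7, 2, 5, 4, 6, 1, 9, 3]
print(sum(d_squared), round(R, 2))   # 30 0.82
```

When values are repeated, this simple helper is not suitable, because tied values would need a shared rank.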
Nonlinear Relationships
In correlation analysis, not all relationships are linear.
In cases where there is clear evidence of a nonlinear relationship, DO NOT use Pearson’s Product Moment Correlation (r) to summarize the strength of the relationship between Y and X.
[Figure: scatter graph of a nonlinear correlation]
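A small made-up example shows why. In the sketch below Y is completely determined by X (Y = X²), yet Pearson's r comes out as zero because the relationship is not linear. The data and the helper name `pearson_r` are illustrative assumptions, not part of the presentation.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: a perfect but U-shaped (nonlinear) relationship
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(f"r = {r:.2f}")
```

Here the rising and falling halves of the curve cancel each other out, so r alone would wrongly suggest that X and Y are unrelated.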
Conclusions
Correlation is the linear association between two numeric variables, e.g. variables X and Y.
The correlation (r) ranges from -1 to +1, that is,
-1 ≤ r ≤ +1
If r < 0 then there is a negative correlation between X and Y, i.e. as X increases Y generally decreases.
If r > 0 then there is a positive correlation between X and Y, i.e. as X increases Y generally increases.
The closer r is to 0, the weaker the linear association between X and Y.
A diagram explaining different strengths of correlations
The value of r ranges between -1 and +1.
The value of r denotes the strength of the association, as illustrated by the following scale:
r = -1: perfect correlation (indirect)
-1 to -0.75: strong
-0.75 to -0.25: intermediate
-0.25 to 0: weak
r = 0: no relation
0 to 0.25: weak
0.25 to 0.75: intermediate
0.75 to 1: strong
r = +1: perfect correlation (direct)
Example of graphs and their interpretation
Negative and positive correlations
No Relationship (r = .00)
Information about Explanatory Flexibility tells you nothing about Emotional Insight
[Figure: scatter plot with "Explanatory Flexibility" on the horizontal axis and "ASIS - Emotional Insight" on the vertical axis]
THANK YOU