Additional Reading
• For additional reading, see Chapter 6 of Michael R. Middleton's Data Analysis Using Excel (Duxbury/Thomson Learning, 2000).
• See also Chapter 4, Section 7, of Keller and Warrack's Statistics for Management and Economics, 5th edition (Duxbury/Thomson Learning, 2000).
• For more on correlation, consult any introductory statistics textbook.
Which Approach Is Appropriate When?
• Choosing the right method for the data is the key statistical skill you need.
• You may want to review the decision tool we have prepared to help you choose the right statistical method.
Do I Need to Know the Formulas?
• You do not need to know the exact formulas.
• You do need to know where they are in your reference book.
• You do need to understand the concepts behind them and the general statistical concepts embedded in the use of the formulas.
• You do not need to be able to do correlation and regression by hand. You must be able to do them on a computer using Excel or other software.
Table of Contents
• Objectives
• Independent and dependent variables
• Example
• Scatter plot
• Correlation coefficient
• Range of correlation coefficient
• Formula for correlation coefficient
• Example for correlation coefficient
• Possible relationships between variables
Objectives
• To learn the assumptions behind and the interpretation of correlation.
• To use Excel to calculate correlations.
Purpose of Correlation
Correlation determines whether values of one variable are related to values of another.
Independent and Dependent Variables
• Independent variable: a variable that can be controlled or manipulated.
• Dependent variable: a variable that cannot be controlled or manipulated. Its values are predicted from the independent variable.
Example
• In this example, the independent variable is the number of hours studied.
• The grade the student receives is the dependent variable.
• The grade a student receives depends on the number of hours he or she studies.
• Are these two variables related?
Student   Hours studied   Grade (%)
A         6               82
B         2               63
C         1               57
D         5               88
E         3               68
F         2               75
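To put a number on the question, the sketch below computes r for the six students in Python (Python is an assumption for illustration; the slides themselves use Excel). The helper name `pearson_r` is hypothetical. It uses the deviation form r = Sxy / sqrt(Sxx * Syy), which is algebraically equivalent to the computational formula given later in these slides.

```python
import math

# Data from the table: hours studied (x) and grade in percent (y) for students A-F.
hours = [6, 2, 1, 5, 3, 2]
grades = [82, 63, 57, 88, 68, 75]


def pearson_r(x, y):
    """Pearson correlation coefficient computed from deviations about the means."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    s_xy = sum(a * b for a, b in zip(dx, dy))  # sum of cross-products
    s_xx = sum(a * a for a in dx)              # sum of squared x deviations
    s_yy = sum(b * b for b in dy)              # sum of squared y deviations
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(hours, grades)
print(f"r = {r:.3f}")
```

For these data it prints r = 0.860, a fairly strong positive relationship between hours studied and grade.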
Scatter Plot
• The independent and dependent variables can be plotted on a graph called a scatter plot.
• By convention, the independent variable is plotted on the horizontal x-axis. The dependent variable is plotted on the vertical y-axis.
Example of Scatter Plot
• A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable, x, and the dependent variable, y.
• Please use Excel to create a scatter plot.
[Figure: Scatter Plot. Grade (%) on the vertical axis against Hours Studied on the horizontal axis.]
Interpret a Scatter Plot
The graph suggests a positive relationship between hours studied and grades.
[Figure: Scatter Plot. Grade (%) on the vertical axis against Hours Studied on the horizontal axis.]
Correlation Coefficient
• The correlation coefficient, computed from sample data, measures the strength and direction of a linear relationship between two variables.
• The correlation coefficient is denoted by r, and its value ranges from -1 to +1.
Positive and Negative Correlations
• A positive relationship exists when both variables increase or decrease together (for example, height and weight).
• A negative relationship exists when one variable increases as the other decreases (for example, age and strength).
Range of correlation coefficient
• In the case of an exact positive linear relationship, the value of r is +1.
• In the case of a strong positive linear relationship, the value of r will be close to +1.
[Figure: Correlation = +1. Scatter plot of the dependent variable against the independent variable.]
Range of correlation coefficient
• In the case of an exact negative linear relationship, the value of r is -1.
• In the case of a strong negative linear relationship, the value of r will be close to -1.
[Figure: Correlation = -1. Scatter plot of the dependent variable against the independent variable.]
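The two slides on exact linear relationships can be checked numerically. This Python sketch uses hypothetical data on roughly the same scales as the plots and builds y as an exact linear function of x, once with a positive slope and once with a negative slope. The helper `pearson_r` is an illustrative name, not an Excel function.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from deviations about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical independent-variable values between 10 and 20.
x = [10, 12, 14, 16, 18, 20]
y_up = [0.5 * xi + 10 for xi in x]     # exact positive linear relationship
y_down = [-0.5 * xi + 30 for xi in x]  # exact negative linear relationship

# Rounding guards against tiny floating-point errors in general.
print(round(pearson_r(x, y_up), 10))    # 1.0
print(round(pearson_r(x, y_down), 10))  # -1.0
```

Any straight line with a positive slope gives r = +1 and any line with a negative slope gives r = -1; the size of the slope does not matter.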
Range of correlation coefficient
In the case of a weak relationship, the value of r will be close to 0.
[Figure: Correlation = 0. Scatter plot of the dependent variable against the independent variable.]
Range of correlation coefficient
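In the case of a nonlinear relationship such as the one shown, r can be close to 0 even though the variables are related, because r measures only the strength of a linear relationship.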
[Figure: Correlation = 0 (nonlinear relationship). Scatter plot of the dependent variable against the independent variable.]
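The nonlinear case can be reproduced with a hypothetical U-shaped relationship, y = (x - 6)^2, in which y is completely determined by x. In this Python sketch (illustrative data; `pearson_r` is a hypothetical helper) r comes out as 0, because the cross-products on the falling half of the curve cancel those on the rising half.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from deviations about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical U-shaped data: x from 0 to 12, y falls and then rises.
x = list(range(13))
y = [(xi - 6) ** 2 for xi in x]

print(pearson_r(x, y))  # 0.0, although y depends entirely on x
```

A scatter plot would reveal this pattern immediately, which is one reason to plot the data before relying on r.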
Formula for correlation coefficient
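The formula to compute a correlation coefficient is:

$$ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}} $$

where n is the number of data pairs, x is the independent variable, y is the dependent variable, and each Σ denotes a sum over all n data pairs.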
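The formula translates directly into code. The Python sketch below (Python and the function name `correlation_from_sums` are assumptions for illustration) computes the five sums and substitutes them into the formula, using the age and blood pressure data from the next slide.

```python
import math


def correlation_from_sums(x, y):
    """Pearson r using the computational (raw-sums) formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be equal-length lists with at least two pairs")
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Age (x) and blood pressure (y) for the six people in the example.
age = [43, 48, 56, 61, 67, 70]
blood_pressure = [128, 120, 135, 143, 141, 152]

print(round(correlation_from_sums(age, blood_pressure), 3))  # 0.897
```

With integer data, as here, every sum is exact. With large floating-point values, subtracting nearly equal sums can lose precision, so a deviation-based form is often preferred in numerical code.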
Example for correlation coefficient
• Let's work through an example.
• Using the data on age and blood pressure, calculate xy, x², and y² for each pair, then sum the x, y, xy, x², and y² columns.
Student   Age (x)   Blood Pressure (y)   Age*BP (xy)   Age² (x²)   BP² (y²)
A         43        128                  5504          1849        16384
B         48        120                  5760          2304        14400
C         56        135                  7560          3136        18225
D         61        143                  8723          3721        20449
E         67        141                  9447          4489        19881
F         70        152                  10640         4900        23104
Sum       345       819                  47634         20399       112443
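The table, including its Sum row, can be reproduced with a short script. This Python sketch is only an illustration of the arithmetic; the row layout and variable names are assumptions.

```python
# Rows of the table: (student, age x, blood pressure y).
rows = [
    ("A", 43, 128),
    ("B", 48, 120),
    ("C", 56, 135),
    ("D", 61, 143),
    ("E", 67, 141),
    ("F", 70, 152),
]

# Running column totals for x, y, xy, x^2 and y^2 (integer arithmetic, so exact).
totals = [0, 0, 0, 0, 0]
for student, x, y in rows:
    columns = (x, y, x * y, x * x, y * y)
    for i, value in enumerate(columns):
        totals[i] += value
    print(student, *columns)

print("Sum", *totals)  # Sum 345 819 47634 20399 112443
```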
Example for correlation coefficient
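• Substitute the sums into the formula and solve for r:

$$ r = \frac{(6)(47634) - (345)(819)}{\sqrt{\left[(6)(20399) - 345^{2}\right]\left[(6)(112443) - 819^{2}\right]}} = \frac{3249}{\sqrt{(3369)(3897)}} \approx 0.897 $$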
• The correlation coefficient suggests a strong positive relationship between age and blood pressure.
Possible Relationships Between Variables
• Direct cause and effect: x causes y (for example, water causes plants to grow).
• Both cause and effect: x causes y and y also causes x (for example, coffee consumption causes nervousness, and nervous people also drink more coffee).
• Relationship caused by a third variable: deaths due to drowning and soft drink consumption are related during summer. Both variables are related to heat and humidity (the third variable).
Possible Relationships Between Variables
• Complexity of interrelationships among many variables: a student's high school grades are related to his or her college grades, but other variables are involved too, such as IQ, hours of study, influence of parents, motivation, age, and instructors.
• Coincidental relationship: for example, an increase in the number of people exercising alongside an increase in the number of people committing crimes.
Interpretation
• The correlation is 0.90.
• There is a strong positive relationship between age and blood pressure.

Age and Blood Pressure: r = 0.90
Test of Correlation
• Null hypothesis: the population correlation is zero.
• The test statistic is:

$$ t = r\sqrt{\frac{n-2}{1-r^{2}}} $$

• The statistic follows Student's t distribution with n - 2 degrees of freedom.
• Excel does not calculate this statistic directly, but you can calculate it manually.
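Because the statistic has to be computed by hand, here is a minimal Python sketch of that calculation for the age and blood pressure example. It recomputes r from the column sums in the example table and compares t with a two-tailed critical value. The 5% significance level and the critical value 2.776 (Student's t with 4 degrees of freedom, read from a standard t table) are assumptions added for illustration; the slides do not specify a significance level.

```python
import math

# Column sums from the age and blood pressure example.
n = 6
sum_x, sum_y = 345, 819
sum_xy, sum_x2, sum_y2 = 47634, 20399, 112443

# Correlation coefficient from the computational formula (about 0.897).
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Test statistic t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom.
t = r * math.sqrt((n - 2) / (1 - r ** 2))

# Assumed two-tailed critical value for alpha = 0.05 and 4 degrees of freedom.
t_critical = 2.776

print(f"t = {t:.2f}")  # t = 4.05
if abs(t) > t_critical:
    print("Reject the null hypothesis: the correlation differs significantly from zero.")
else:
    print("Do not reject the null hypothesis.")
```

Here t is about 4.05, which exceeds 2.776, so under these assumptions the correlation between age and blood pressure is statistically significant.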