Correlation: Measuring Association (PowerPoint presentation uploaded by zephania-camacho)
Correlation
• Measuring association
• Establishing a degree of association between two or more variables gets at the central objective of the scientific enterprise. Scientists spend most of their time figuring out how one thing relates to another and structuring these relationships into explanatory theories. The question of association comes up in everyday discourse as well, as in "like father, like son."
Scatterplots
A. Scatter diagram
A list of 1,078 pairs of father-and-son heights would be impossible to grasp, so we need some method that can take this data and convert it into a more comprehensible format. One method is to plot the data for the two variables (father's height and son's height) in a graph called a scatter diagram.
B. The Correlation Coefficient
This scatter plot looks like a cloud of points. Visually, it gives us a nice representation and a gut feeling for the strength of the relationship, and it is especially useful for spotting outliers or data anomalies. But statistics isn't too fond of simply providing a gut feeling: it is interested in summarizing and interpreting masses of numerical data, so we need to summarize this relationship numerically. How do we do that? With a correlation coefficient.
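The correlation coefficient r ranges from -1 to +1. A value of +1 means the points fall exactly on an upward-sloping straight line, -1 means they fall exactly on a downward-sloping line, and 0 means there is no linear relationship.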
[Figure: example scatterplots for r = 1.0, .85, .42, .17, -.94, -.54, and -.33]
• Computing Pearson's r correlation coefficient
• Definitional formula:
$$ r = \frac{\sum Z_X Z_Y}{N} $$
Convert each variable to standard units (z-scores). The average of the products of the paired z-scores gives the correlation coefficient. But this formula requires you to calculate a z-score for every observation, which means you have to calculate the standard deviations of X and Y before you can even get started. For example, look at what you have to do for only 5 cases.
Dividing the sum of $Z_X Z_Y$ (2.50) by N (5) gives the correlation coefficient: r = 2.50 / 5 = .50.
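The definitional formula can be sketched step by step in Python using only the standard library. This is a minimal sketch. The five (X, Y) pairs are hypothetical and are not the slide's example cases. The function name `pearson_r_definitional` is made up for illustration. Standard deviations are population SDs (dividing by N), which is what makes the average of the z-score products equal r.

```python
from statistics import mean, pstdev


def pearson_r_definitional(xs, ys):
    """Pearson's r as the mean of the products of paired z-scores.

    Uses population standard deviations (divide by N), so that
    sum(Zx * Zy) / N equals r exactly.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x, mean_y = mean(xs), mean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable has no variance")
    # Step 1: convert each observation to standard units (z-scores).
    z_x = [(x - mean_x) / sd_x for x in xs]
    z_y = [(y - mean_y) / sd_y for y in ys]
    # Step 2: average the products of the paired z-scores.
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / n


# Hypothetical data: five (X, Y) pairs, not the slide's example.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r_definitional(x, y), 3))  # 0.775
```

If you compute z-scores with sample SDs (dividing by N - 1) instead, divide the sum of products by N - 1 to get the same r.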
• With a little algebraic magic, we get the computational formula, which is a bit more manageable:
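$$ r = \frac{\sum XY - N\bar{X}\bar{Y}}{\sqrt{\left(\sum X^2 - N\bar{X}^2\right)\left(\sum Y^2 - N\bar{Y}^2\right)}} $$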
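The computational formula needs only the raw sums ΣX, ΣY, ΣXY, ΣX² and ΣY², so it can be sketched as a single pass over the data. This is a minimal sketch, and the function name `pearson_r_computational` is hypothetical. For the made-up pairs X = 1, 2, 3, 4, 5 and Y = 2, 4, 5, 4, 5 it gives r of about .775, the same value the z-score method gives.

```python
from math import sqrt


def pearson_r_computational(xs, ys):
    """Pearson's r from raw sums, following the computational formula:

    r = (sum XY - N*Xbar*Ybar) / sqrt((sum X^2 - N*Xbar^2) * (sum Y^2 - N*Ybar^2))
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - n * mean_x * mean_y
    ss_x = sum_x2 - n * mean_x ** 2  # sum of squared deviations of X
    ss_y = sum_y2 - n * mean_y ** 2  # sum of squared deviations of Y
    if ss_x <= 0 or ss_y <= 0:
        raise ValueError("r is undefined when a variable has no variance")
    return numerator / sqrt(ss_x * ss_y)


# Hypothetical data: five (X, Y) pairs.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r_computational(x, y), 3))  # 0.775
```

One caveat applies in floating-point arithmetic. Subtracting N times the squared mean from a large sum of squares can lose precision when the values are large relative to their spread. For that reason, code usually works with deviations from the mean, and the computational form's advantage is mainly for hand calculation.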
Interpreting correlation coefficients
• Strong association versus weak association. With a strong association, knowing one variable helps a lot in predicting the other. With a weak association, information about one variable does not help much in guessing the other. Rough benchmarks for the size of r (ignoring its sign): 0 = none; .25 = weak; .50 = moderate; .75 and above = strong.
• Index of association
• R-squared (r²) is defined as the proportion of the variance of one variable accounted for by another variable, a.k.a. a PRE statistic (Proportionate Reduction of Error).
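To see why r² counts as a PRE statistic, compare two ways of guessing Y. The first always guesses the mean of Y. The second guesses from the least-squares line on X. The sketch below assumes that "error" means the sum of squared prediction errors. The function name and the data are hypothetical. For X = 1, 2, 3, 4, 5 and Y = 2, 4, 5, 4, 5, the error drops from 6 to 2.4. That is a proportional reduction of .6, which equals r² = 36/60.

```python
from statistics import mean


def pre_of_linear_prediction(xs, ys):
    """Proportional reduction in error when predicting Y from X.

    E1: squared error when every Y is guessed as the mean of Y.
    E2: squared error when Y is guessed from the least-squares line on X.
    PRE = (E1 - E2) / E1, which equals r squared.
    """
    mean_x, mean_y = mean(xs), mean(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    e1 = sum((y - mean_y) ** 2 for y in ys)  # error without knowing X
    e2 = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))  # error using X
    return (e1 - e2) / e1


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pre_of_linear_prediction(x, y), 3))  # 0.6
```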
Significance of the correlation
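• Null hypothesis? There is no correlation in the population (ρ = 0).
• Formula, with N - 2 degrees of freedom:
$$ t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}} $$
• Then look to Table C in Appendix B
• Or just look at Table F in Appendix B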
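The t statistic itself is easy to compute directly. This is a minimal sketch, and the function name `t_for_r` is hypothetical. The critical value still comes from a t table, because Python's standard library has no t distribution.

```python
from math import sqrt


def t_for_r(r, n):
    """t statistic for testing H0: population correlation = 0.

    t = r * sqrt(N - 2) / sqrt(1 - r^2), with N - 2 degrees of freedom.
    Returns (t, degrees_of_freedom).
    """
    if n < 3:
        raise ValueError("need at least 3 cases")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2), n - 2


# The 5-case example: r = .50, N = 5.
t, df = t_for_r(0.50, 5)
print(round(t, 3), df)  # 1.0 3
```

Compare |t| with the critical value in a t table for the given df. For 3 df, the two-tailed .05 critical value is 3.182. So a correlation of .50 based on only five cases is not statistically significant.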
Limitations of Pearson's r
• 1) At best, one must speak of "strong" and "weak," "some" and "none": precisely the vagueness statistical work is meant to cure.
• 2) Assumes interval-level data: variables measured at different levels require that different statistics be used to test for association.
• 3) Outliers and nonlinearity: the correlation coefficient does not always give a true indication of the clustering. There are two main exceptional cases, outliers and nonlinearity.
[Figure: two example scatterplots, r = .457 and r = .336]
• 4) Assumes a linear relationship
[Figure: scatterplot of Salary (0 to 60,000) against Education (0 to 30)]
• 5) Christopher Achen (1977) argues, and shows empirically, that two correlations can differ because the variances in the samples differ, not because the underlying relationship has changed.
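The limitations above about outliers, nonlinearity and differing sample variances can be seen numerically. The sketch below uses small made-up data sets, all hypothetical, to show three effects:

(a) a perfect but curved relationship that yields r = 0;
(b) a single outlier that turns r = 0 into roughly .97;
(c) the same rule Y = X + noise giving a lower r when the range of X is restricted.

Case (c) illustrates the kind of variance effect Achen describes. It is not a reproduction of his analysis.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviation sums (same value as the z-score formula)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# (a) Nonlinearity: Y is perfectly determined by X, but along a curve.
x_curve = [-2, -1, 0, 1, 2]
y_curve = [v * v for v in x_curve]
print("curved:", round(pearson_r(x_curve, y_curve), 2))  # 0.0

# (b) Outlier: five points with no linear trend, then one extreme point.
x_cloud = [1, 2, 3, 4, 5]
y_cloud = [2, 4, 3, 4, 2]
print("no outlier:", round(pearson_r(x_cloud, y_cloud), 2))  # 0.0
print("with outlier:", round(pearson_r(x_cloud + [20], y_cloud + [20]), 2))  # 0.97

# (c) Restricted range: same rule Y = X + noise, narrower spread of X.
x_full = list(range(1, 13))
noise = [1, -1, -1, 1] * 3  # hypothetical noise, uncorrelated with X
y_full = [x + e for x, e in zip(x_full, noise)]
print("full range:", round(pearson_r(x_full, y_full), 2))  # 0.96
pairs = [(x, y) for x, y in zip(x_full, y_full) if 5 <= x <= 8]
x_narrow = [x for x, _ in pairs]
y_narrow = [y for _, y in pairs]
print("X from 5 to 8:", round(pearson_r(x_narrow, y_narrow), 2))  # 0.75
```

In case (c) the relationship between X and Y is identical in both samples. Only the spread of X changes, yet r falls from about .96 to about .75.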
Solution?
Regression analysis