martin-merritt
CORRELATIONS: TESTING RELATIONSHIPS BETWEEN TWO METRIC VARIABLES
Lecture 18:
Agenda
Reminder about Lab 3
Brief Update on Data for Final
Correlations
Probability Revisited
To make a reasonable decision, we must know:
Probability Distribution: What would the distribution be like if it were only due to chance?
Decision Rule: What criteria do we need in order to determine whether an observation is just due to chance or not?
Quick Recap of an Earlier Issue: Why N-1?
If we have a randomly distributed variable in a population, extreme cases (i.e., the tails) are less likely to be selected than common cases (i.e., within 1 SD of the mean). One result of this: on average, sample variance computed by dividing by n is lower than the actual population variance. Dividing by n-1 corrects this bias when calculating sample statistics.
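The bias described above can be checked with a small simulation. This is a minimal sketch, not part of the original lecture: the population (normal, mean 50, SD 10), the sample size of 5, and the number of repetitions are all assumptions chosen for illustration.

```python
import random

def variance(values, ddof):
    """Sum of squared deviations from the sample mean, divided by (n - ddof)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)

random.seed(18)      # arbitrary seed so the run is repeatable
POP_SD = 10.0        # assumed population SD, so the true variance is 100
SAMPLE_SIZE = 5      # small samples make the bias easy to see
N_SAMPLES = 20000    # number of repeated samples (assumption)

biased_total = 0.0
unbiased_total = 0.0
for _ in range(N_SAMPLES):
    sample = [random.gauss(50.0, POP_SD) for _ in range(SAMPLE_SIZE)]
    biased_total += variance(sample, ddof=0)    # divide by n
    unbiased_total += variance(sample, ddof=1)  # divide by n - 1

mean_biased = biased_total / N_SAMPLES
mean_unbiased = unbiased_total / N_SAMPLES

print(f"true population variance:  {POP_SD ** 2:.1f}")
print(f"average when dividing by n:   {mean_biased:.1f}")
print(f"average when dividing by n-1: {mean_unbiased:.1f}")
```

With samples this small, the divide-by-n average should fall noticeably below the true variance, while the divide-by-(n-1) average should land close to it.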
Checking for simple linear relationships
Pearson’s correlation coefficient
Measures the extent to which two metric or interval-type variables are linearly related
Statistic is Pearson r, or the linear or product-moment correlation
Or, the correlation coefficient is the average of the cross products of the corresponding z-scores.
Correlations
$$r_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} z_{x_i} z_{y_i}$$
Ranges from -1 to 1, where -1 or 1 = perfect linear relationship between the two variables and 0 = no linear relationship. Negative relations: r < 0. Positive relations: r > 0.
Remember: correlation ONLY measures linear relationships, not all relationships!
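The definition of r as the average cross product of z-scores can be sketched in a few lines. The data below are invented for illustration, and the function names `z_scores` and `pearson_r` are hypothetical helpers, not from the lecture. The curved example shows a perfectly predictable relationship that r does not pick up because it is not linear.

```python
import math

def z_scores(values):
    """Standardize values using the sample SD (divide by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def pearson_r(x, y):
    """Average cross product of paired z-scores, dividing by n - 1."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

# Invented data, symmetric around zero
x = [-3, -2, -1, 0, 1, 2, 3]
y_positive = [2 * v + 1 for v in x]   # perfect positive linear relation
y_negative = [-v for v in x]          # perfect negative linear relation
y_curved = [v ** 2 for v in x]        # exact, but non-linear, relation

r_positive = pearson_r(x, y_positive)
r_negative = pearson_r(x, y_negative)
r_curved = pearson_r(x, y_curved)

print(f"linear, positive: r = {r_positive:.3f}")
print(f"linear, negative: r = {r_negative:.3f}")
print(f"curved (y = x^2): r = {abs(r_curved):.3f}")
```

The curved case comes out at essentially zero even though y is fully determined by x, which is why a scatterplot is worth checking before relying on r.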
Interpretation
Recall that correlation is a precondition for causality, but by itself it is not sufficient to show causality (why?)
Correlation is a proportional, unit-free measure; it does not depend on the units in which the variables are measured
Correlation interpretation:
Direction (+/-)
Magnitude of Effect (-1 to 1); shown as r
Statistical Significance (p<.05, p<.01, p<.001)
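The point that correlation does not depend on specific measurement units can be illustrated by rescaling both variables. This is a sketch with invented numbers: the `price_usd` and `mpg` values are made up for illustration and are not the data behind any output in these slides.

```python
import math

def pearson_r(x, y):
    """Pearson r from sums of squares and cross products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented illustration data
price_usd = [4000, 5000, 6500, 9000, 12000, 15000]
mpg = [30, 28, 24, 20, 17, 14]

# Same cars, different units: thousands of dollars and km per liter
price_thousands = [p / 1000 for p in price_usd]
km_per_liter = [m * 0.425144 for m in mpg]

r_original = pearson_r(price_usd, mpg)
r_rescaled = pearson_r(price_thousands, km_per_liter)

print(f"r in dollars and mpg:          {r_original:.4f}")
print(f"r in $1000s and km per liter:  {r_rescaled:.4f}")
```

Both lines should print the same value, because z-scores remove the mean and the scale of each variable before the cross products are averaged.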
Correlation: Null and Alt Hypotheses
Null versus Alternative Hypothesis: H0 versus H1, H2, etc. For a correlation, H0 is typically that there is no linear relationship in the population.
Test Statistics and Significance Level
Test statistic: Calculated from the data; has a known probability distribution
Significance level: Usually reported as a p-value (the probability that a result at least this extreme would occur if the null hypothesis were true).
          price      mpg
price    1.0000
mpg     -0.4686   1.0000
         0.0000   (p-value)
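One way to see what a p-value for a correlation means is to build the "only due to chance" distribution directly by shuffling one variable. The sketch below uses a permutation test, which is a swapped-in technique for illustration and not the t-based test that statistical packages typically report. The data, the number of shuffles, and the function names are all assumptions. The conventional test statistic, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, is also computed for reference.

```python
import math
import random

def pearson_r(x, y):
    """Pearson r from sums of squares and cross products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # shuffling breaks any real link between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            extreme += 1
    return extreme / n_permutations

# Invented illustration data with a strong negative relationship
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [9.5, 9.0, 7.8, 8.1, 6.0, 5.5, 4.9, 3.2, 3.0, 1.1]

r = pearson_r(x, y)
n = len(x)
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))  # compared to a t distribution, df = n - 2
p_value = permutation_p_value(x, y)

print(f"r = {r:.4f}, t = {t_stat:.2f} on {n - 2} df")
print(f"permutation p-value = {p_value:.4f}")
```

A p-value printed as 0.0000 does not mean the probability is exactly zero; it means the value is smaller than the display precision, or here, that none of the shuffles matched the observed correlation.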
Factors which limit Correlation coefficient
Homogeneity of sample group
Non-linear relationships
Censored or limited scales
Unreliable measurement instrument
Outliers
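Three of the factors listed above can be simulated to show how they pull r toward zero. This is a sketch under stated assumptions: the simulated population (true correlation about 0.8), the narrow band used to mimic a homogeneous group, the ceiling used to mimic a censored scale, and the amount of measurement noise are all invented for illustration.

```python
import math
import random

def pearson_r(x, y):
    """Pearson r from sums of squares and cross products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(9)  # arbitrary seed so the run is repeatable

# Simulated population with a true correlation of about 0.8 (assumption)
x = [random.gauss(0, 1) for _ in range(5000)]
y = [0.8 * a + random.gauss(0, 0.6) for a in x]
r_full = pearson_r(x, y)

# Homogeneous group: keep only cases whose x falls in a narrow band
band = [(a, b) for a, b in zip(x, y) if -0.5 < a < 0.5]
r_homogeneous = pearson_r([a for a, _ in band], [b for _, b in band])

# Censored scale: y cannot be recorded above a ceiling of 0
y_censored = [min(b, 0.0) for b in y]
r_censored = pearson_r(x, y_censored)

# Unreliable instrument: random measurement error added to both variables
x_noisy = [a + random.gauss(0, 1) for a in x]
y_noisy = [b + random.gauss(0, 1) for b in y]
r_noisy = pearson_r(x_noisy, y_noisy)

print(f"full sample:           r = {r_full:.2f}")
print(f"homogeneous subgroup:  r = {r_homogeneous:.2f}")
print(f"censored scale:        r = {r_censored:.2f}")
print(f"unreliable instrument: r = {r_noisy:.2f}")
```

In each case the underlying relationship between x and y is unchanged; only the sampling or the measurement differs, yet the observed r is smaller than in the full sample.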
Homogeneous Groups
Homogeneous Groups: Adding Groups
Homogeneous Groups: Adding More Groups
Separate Groups (non-homogeneous)
Non-Linear Relationships
Censored or Limited Scales
Unreliable Instrument
Outliers
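The effect of a single outlier can be seen with a handful of invented points. This sketch is not from the lecture; the data are made up so that one extra case either creates a strong correlation where there was almost none, or reverses one that was perfect.

```python
import math

def pearson_r(x, y):
    """Pearson r from sums of squares and cross products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Case 1: a weakly related cloud, then one extreme point far from the rest
x_cloud = [1, 2, 3, 4, 5]
y_cloud = [3, 1, 4, 2, 3]
r_cloud = pearson_r(x_cloud, y_cloud)
r_cloud_outlier = pearson_r(x_cloud + [15], y_cloud + [15])

# Case 2: a perfect line, then one point far off the line
x_line = [1, 2, 3, 4, 5]
y_line = [1, 2, 3, 4, 5]
r_line = pearson_r(x_line, y_line)
r_line_outlier = pearson_r(x_line + [6], y_line + [-10])

print(f"cloud without outlier: r = {r_cloud:.2f}")
print(f"cloud with outlier:    r = {r_cloud_outlier:.2f}")
print(f"line without outlier:  r = {r_line:.2f}")
print(f"line with outlier:     r = {r_line_outlier:.2f}")
```

With samples this small, one case can dominate the sums of cross products, so it is worth inspecting a scatterplot for outliers before interpreting r.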
Examples with Real Data…