Topics for Today
Scatterplots
Relationship between 2 Continuous Variables
Pearson’s Correlation
Facts and Myths
Correlation as a Statistic
Stat203 Page 1 of 31Fall2011 – Week 9, Lecture 1
Two Continuous Variables
Using the 2-sample Chi-square test we were able to investigate the relationship between two discrete variables.
E.g.:
- radio format and age
- weather and city
Now we will examine the relationship between two __________ variables.
The first tool we will discuss is called ___________.
but even before that … Scatter Plots
Shows the relationship between 2 continuous variables measured on the same ___________.
Values of the one variable (X) are plotted on the horizontal axis and values of the other variable (Y) are plotted on the vertical axis. Each individual appears as a single point.
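A rough sketch of this plotting rule in Python rather than SPSS. The values are made up for illustration (they are not from any dataset in these notes), and `ascii_scatter` is a hypothetical helper that assumes X and Y each take at least two distinct values.

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Return a rough text scatterplot: X across, Y up, one '*' per individual."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column (X) or row (Y) of the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # flip rows so larger Y sits higher up
    return "\n".join("".join(line) for line in grid)


# Hypothetical measurements: one (x, y) pair per individual.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
plot = ascii_scatter(xs, ys)
print(plot)
```

Each individual contributes exactly one `*`, so five rows of data give five points.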
Let’s look at this in SPSS …
Let’s look at a dataset called Detroit that has information from the city for years 1961 to 1973. It contains 6 variables:
- year
- homicide rate (per 100,000 population)
- # of police (per 100,000 population)
- unemployment rate (%)
- # registered handguns (per 10,000 population)
- average weekly income ($)
Let’s create a scatterplot of two of these variables.
A scatterplot of the # of registered handguns and the # of police officers:
Let’s look at the first row of the data table, and then identify that point (circle it) in the scatterplot on the previous page:
Each row in the data table corresponds to exactly one point in the scatter plot.
What sort of relationship between the # of registered handguns and the # of police officers does this scatterplot show?
Correlation
The term ___________ is often used in common language and has a general interpretation as implying a ____________ between two events … including two discrete events:
“Autism is correlated with vaccination”
… or things that can’t really be measured
“there’s a correlation between my mood and my partner’s behavior”
However, in statistics the term correlation means something specific.
Statistical Correlation
___________ measures the _________ and ________ of a ______ relationship between two continuous variables (X and Y). Pearson’s correlation is the most commonly used:
$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n} (y_i - \bar{y})^2\right]}} $$

Note:
- this is ONLY a linear relationship
- there are many types of relationships that are not linear
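To make the formula less abstract, here is a minimal Python sketch that follows it term by term. The function name `pearson_r` and the five data points are assumptions made up for illustration; in practice the software does this step for us.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation, computed directly from the definition."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of deviations from the two means.
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares.
    sum_sq_x = sum((x - x_bar) ** 2 for x in xs)
    sum_sq_y = sum((y - y_bar) ** 2 for y in ys)
    return numerator / math.sqrt(sum_sq_x * sum_sq_y)


# Hypothetical data: 5 individuals measured on X and Y.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)
print(r)  # 0.8
```

Here the numerator is 8 and both sums of squares are 10, so r = 8 / √(10 × 10) = 0.8.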
I only give you the formula for completeness; we will not be calculating it by hand (it is extremely tedious).
In this class, as whenever you analyze data in the future, we will make the software calculate the correlation.
However, it is important that you understand that it’s just another statistic calculated from the data, just like the mean, the standard deviation, or the odds ratio.
Some Facts about Correlation
1. Correlation can only be used when both variables are interval or ratio level
2. Correlation does not change when we change the units of measurement of X and Y
Height in cm or in inches will give the same correlation with weight in kg or lbs
3. Positive correlation indicates positive association between the variables and negative correlation indicates negative association
4. Correlation is always between __ and _.
Values near 0 indicate a very ____ relationship
-1 or 1 will occur only if points fall on a straight line
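Fact 2 is easy to check numerically. The sketch below uses made-up heights and weights (not real data) and a small helper, `pearson_r`, written only for this illustration; it converts cm to inches and kg to lbs and compares the two correlations.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation from the definition (illustrative helper)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - x_bar) ** 2 for x in xs)
                    * sum((y - y_bar) ** 2 for y in ys))
    return num / den


# Hypothetical measurements for 5 people.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 77, 90]

# Same people, different units (1 in = 2.54 cm, 1 kg is about 2.20462 lbs).
heights_in = [h / 2.54 for h in heights_cm]
weights_lb = [w * 2.20462 for w in weights_kg]

r_metric = pearson_r(heights_cm, weights_kg)
r_imperial = pearson_r(heights_in, weights_lb)
print(r_metric, r_imperial)  # equal up to floating-point rounding
```

The conversion factors cancel between the numerator and the denominator of r, which is why the units do not matter.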
Examples
The following are scatter plots of two variables with the correlation between the two listed above the plot.
Pearson Correlation of 1
As in the definition, correlation is the strength of the linear relationship. All of these figures have the ____ correlation!
Important note! The strength of the correlation doesn’t depend on the slope of the line, just how _______ clustered the points are to a _____________ … any straight line!
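A small check of this note, as a sketch: three made-up straight lines with very different slopes. The helper `pearson_r` and the lines themselves are assumptions chosen for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation from the definition (illustrative helper)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - x_bar) ** 2 for x in xs)
                    * sum((y - y_bar) ** 2 for y in ys))
    return num / den


xs = [1, 2, 3, 4, 5, 6]
shallow = [0.1 * x for x in xs]       # gentle upward slope
steep = [5 * x + 3 for x in xs]       # steep upward slope
falling = [-2 * x + 10 for x in xs]   # downward slope

r_shallow = pearson_r(xs, shallow)
r_steep = pearson_r(xs, steep)
r_falling = pearson_r(xs, falling)
print(r_shallow, r_steep, r_falling)
```

Both upward lines give a correlation of 1 (up to rounding) despite their different slopes, and the downward line gives -1: only the sign of the slope matters, not its size.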
Examples of a relationship with Pearson Correlation of 0
Facts in a video
http://www.youtube.com/watch?v=Ypgo4qUBt5o
Let’s do some examples – Correlation guessing
Q15, pg 370 – correlation between poverty and rates of teen pregnancy in 8 US states.
a) [-0.95, -0.5)
b) [-0.5, 0)
c) (0, 0.5)
d) [0.5, 0.95)
Q16, pg 370 (edited) – Hours studied and exam grade
a) [-0.95, -0.5)
b) [-0.5, 0)
c) (0, 0.5)
d) [0.5, 0.95)
Q19, pg 371 – Hours watching TV vs # books read
a) [-0.95, -0.5)
b) [-0.5, 0)
c) (0, 0.5)
d) [0.5, 0.95)
An Example
In which of these two scatter plots is the correlation higher?
The correlation of the x and y in the two figures is _________, only the _____ of the axes is different!
Don’t trust your eye, always calculate the correlation.
… but don’t trust the correlation … always check by eye.
Myths about Correlation
1. Correlation implies causation
There could be a third, unknown variable which influences both X and Y
2. A correlation coefficient of zero implies no relationship between two variables
WRONG! it only implies no LINEAR relationship!
Remember the funky shaped figures!
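One such "funky" shape can be checked directly. In this sketch (made-up values, with a helper `pearson_r` written just for the illustration), Y is completely determined by X through Y = X², yet the correlation is 0 because the relationship is not linear.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation from the definition (illustrative helper)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - x_bar) ** 2 for x in xs)
                    * sum((y - y_bar) ** 2 for y in ys))
    return num / den


# X is symmetric around 0 and Y is an exact (but curved) function of X.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # 0.0
```

The positive and negative deviations in X cancel exactly in the numerator, so r is 0 even though knowing X tells you Y perfectly.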
Myths explained in video
http://www.youtube.com/watch?v=MTbZoKEOkUg
http://www.youtube.com/watch?v=VW1IEqKuf6s (Only to 2:48)
Correlation as a statistic
As with the mean, the Odds Ratio and the other statistics we have looked at, a correlation is a characteristic of a population that we estimate with our ______:
                 Population     Sample
                 (Parameter)    (Statistic)
Mean             µ              X̄
Proportion       p              p̂
Odds Ratio       OR
Correlation      _              _
The r tells part of the story
Remember, the correlation (r) we calculate from a sample is only one of the _____________ correlations we could have obtained from one of many possible _______. It’s possible that the true population correlation, ρ, has another value … say 0, or ρ0.
So … there is some variability in our estimate r: its standard error.

$$ se(r) = \sqrt{\frac{1 - r^2}{n - 2}} $$

Hypotheses for Associations between Continuous Variables
H0: there is no linear relationship between X and Y
Ha: there is a linear relationship between X and Y
Is the same as:
H0: ρ = 0
Ha: ρ ≠ 0
And as in our other hypothesis tests, we will use a _________ (r) to approximate a _________ (ρ).
Testing for Correlation = 0
Recall that for our hypothesis tests of μ = 0, we used a t-test:

$$ t = \frac{\bar{x} - 0}{se(\bar{x})} = \frac{\bar{x}}{s / \sqrt{n}} $$

If both X and Y are normally distributed, the test for H0: ρ = 0 is very similar:

$$ t = \frac{r - 0}{se(r)} = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} $$
and we look up our t value in the appropriate table to find the p-value!
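The arithmetic of this test can be sketched in a few lines of Python. The function name `correlation_t` and the inputs r = 0.6 and n = 13 are made up for illustration; they are not results from any dataset in these notes. The sketch stops at the t value and its degrees of freedom (n − 2), leaving the p-value to the table or the software.

```python
import math


def correlation_t(r, n):
    """Return (se, t, df) for testing H0: rho = 0 from a sample correlation r."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    if abs(r) >= 1:
        raise ValueError("r must be strictly between -1 and 1")
    se = math.sqrt((1 - r ** 2) / (n - 2))  # standard error of r
    t = (r - 0) / se                        # how many standard errors r is from 0
    df = n - 2                              # degrees of freedom for the t table
    return se, t, df


# Hypothetical sample: correlation of 0.6 from 13 observations.
se, t, df = correlation_t(0.6, 13)
print(se, t, df)
```

With the same r, a larger sample gives a smaller standard error and therefore a larger t, so the same correlation is stronger evidence against H0 when n is bigger.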
New Topics Covered Today
Pearson’s Correlation
- Most commonly calculated correlation statistic
- No definition of response or predictor
- Always between -1 and 1
Hypothesis testing for Correlation
- Does a correlation exist?
- Reject null = a non-zero correlation
Reading:
Chapter 10 up to page 360