
Source: people.stat.sfu.ca/.../Stat203/Fall2011/Stat203_W10L1.docx

Topics for Today

Scatterplots

Relationship between 2 Continuous Variables

Pearson’s Correlation

Facts and Myths

Correlation as a Statistic

Stat203 Page 1 of 31 Fall2011 – Week 9, Lecture 1

Two Continuous Variables

Using the 2-sample Chi-square test we were able to investigate the relationship between two discrete variables.

E.g.:
- Radio format and age
- Weather and city

Now we will examine the relationship between two __________ variables.

The first tool we will discuss is called ___________.

but even before that … Scatter Plots

Shows the relationship between 2 continuous variables measured on the same ___________.

Values of the one variable (X) are plotted on the horizontal axis and values of the other variable (Y) are plotted on the vertical axis. Each individual appears as a single point.

Let’s look at this in SPSS …

Let’s look at a dataset called Detroit that has information from the city for years 1961 to 1973. It contains 6 variables:

- year
- homicide rate (per 100,000 population)
- # of police (per 100,000 population)
- unemployment rate (%)
- # registered handguns (per 10,000 population)
- average weekly income ($)

Let’s create a scatterplot of two of these variables.

A scatterplot of the # of registered handguns and the # of police officers:

Let’s look at the first row of the data table, and then identify that point (circle it) in the scatterplot on the previous page:

Each row in the data table corresponds to exactly one point in the scatter plot.
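
The row-to-point idea can be sketched in a few lines of Python. This is only an illustration: the table below is made up (it is not the Detroit data), and `ascii_scatter` is a hypothetical helper written for this example, standing in for the plot that SPSS draws.

```python
def ascii_scatter(points, width=20, height=8):
    """Draw (x, y) points as '*' on a character grid (illustration only)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # X sets the column (horizontal axis), Y sets the row (vertical axis).
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the picture
    return "\n".join("".join(line) for line in grid)


# Made-up data table: one row per individual, two measurements per row.
table = [
    {"x": 1.0, "y": 2.0},
    {"x": 2.0, "y": 1.0},
    {"x": 3.0, "y": 4.0},
    {"x": 4.0, "y": 3.0},
    {"x": 5.0, "y": 5.0},
]

# Each row becomes exactly one (x, y) point.
points = [(row["x"], row["y"]) for row in table]
plot = ascii_scatter(points)
print(plot)
```

Five rows in the table give five stars in the picture, one per individual.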

What sort of relationship between the # of registered handguns and the # of police officers does this scatterplot show?

Correlation

The term ___________ is often used in common language and has a general interpretation as implying a ____________ between two events … including two discrete events:

“Autism is correlated with vaccination”

… or things that can’t really be measured

“there’s a correlation between my mood and my partner’s behavior”

However, in statistics the term correlation means something specific.

Statistical Correlation

___________ measures the _________ and ________ of a ______ relationship between two continuous variables (X and Y). Pearson’s correlation is the most commonly used:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n} (y_i - \bar{y})^2\right]}}$$

Note:

- this is ONLY a linear relationship
- there are many types of relationships that are not linear
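
For anyone curious how the formula turns into a calculation, here is a minimal Python sketch that follows it term by term. The function name `pearson_r` and the small data set are assumptions made for this example; in the course, the software does this step.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed directly from the formula above."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of deviations from the two means.
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares.
    sum_sq_x = sum((x - x_bar) ** 2 for x in xs)
    sum_sq_y = sum((y - y_bar) ** 2 for y in ys)
    # Note: this divides by zero if either variable is constant.
    return numerator / sqrt(sum_sq_x * sum_sq_y)


# Made-up data, chosen only so the arithmetic is easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))
```

For this data the numerator is 6 and the two sums of squares are 10 and 6, so r = 6 / √60, roughly 0.77.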

I only give you the formula for completeness; we will not be calculating it by hand (it is extremely tedious).

In this class, as whenever you analyze data in the future, we will make the software calculate the correlation.

However, it is important that you understand that it’s just another statistic calculated from the data, just like the mean, the standard deviation, or the odds-ratio.

Some Facts about Correlation

1. Correlation can only be used when both variables are interval or ratio level

2. Correlation does not change when we change the units of measurement of X and Y

Height in cm or inches will give the same correlation with weight in kg or lbs.

3. Positive correlation indicates positive association between the variables and negative correlation indicates negative association

4. Correlation is always between __ and _.

Values near 0 indicate a very ____ relationship.

A correlation of -1 or 1 will occur only if the points fall exactly on a straight line.
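
Fact 2 can be checked numerically. The sketch below uses made-up heights and weights and a hypothetical helper `pearson_r`; the conversion factors (2.54 cm per inch, about 2.20462 lbs per kg) are the usual ones.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means (illustrative helper)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up measurements for five people.
height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weight_kg = [52.0, 61.0, 64.0, 79.0, 83.0]

# The same people, measured in different units.
height_in = [h / 2.54 for h in height_cm]
weight_lb = [w * 2.20462 for w in weight_kg]

r_metric = pearson_r(height_cm, weight_kg)
r_imperial = pearson_r(height_in, weight_lb)
print(r_metric, r_imperial)
```

The two values agree up to floating-point rounding, because rescaling a variable by a positive constant rescales the numerator and the denominator of r by the same factor.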

Examples

The following are scatter plots of two variables with the correlation between the two listed above the plot.

Pearson Correlation of 1

As in the definition, correlation is the strength of the linear relationship. All of these figures have the ____ correlation!

Important note! The strength of the correlation doesn’t depend on the slope of the line, just how _______ clustered the points are to a _____________ … any straight line!
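
A quick numerical check of this note, as a sketch with made-up points: every set below lies exactly on a straight line, but the slopes differ widely. `pearson_r` is a hypothetical helper written for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means (illustrative helper)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]

# Points exactly on y = slope * x + 3, for shallow, moderate and steep lines.
r_by_slope = {}
for slope in (0.1, 1.0, 50.0, -2.0):
    ys = [slope * x + 3.0 for x in xs]
    r_by_slope[slope] = pearson_r(xs, ys)

print(r_by_slope)
```

The three upward-sloping lines all give r equal to 1 (up to rounding) and the downward-sloping line gives -1. Only the sign of the slope matters, not its size.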

Examples of a relationship with Pearson Correlation of 0

Facts in a video

http://www.youtube.com/watch?v=Ypgo4qUBt5o

Let’s do some examples – Correlation guessing

Q15, pg 370 – correlation between poverty and rates of teen pregnancy in 8 US states.

a) [-0.95, -0.5)
b) [-0.5, 0)
c) (0, 0.5)
d) [0.5, 0.95)

Q16, pg 370 (edited) – Hours studied and exam grade

a) [-0.95, -0.5)
b) [-0.5, 0)
c) (0, 0.5)
d) [0.5, 0.95)

Q19, pg 371 – Hours watching TV vs # books read

a) [-0.95, -0.5)
b) [-0.5, 0)
c) (0, 0.5)
d) [0.5, 0.95)

An Example

In which of these two scatter plots is the correlation higher?

The correlation of the x and y in the two figures is _________, only the _____ of the axes is different!

Don’t trust your eye, always calculate the correlation.

… but don’t trust the correlation … always check by eye.

Myths about Correlation

1. Correlation implies causation

There could be a third, unknown variable which influences both X and Y

2. A correlation coefficient of zero implies no relationship between two variables

WRONG! It only implies no LINEAR relationship!

Remember the funky shaped figures!
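
Myth 2 is easy to demonstrate with a small made-up example. In the sketch below, Y is completely determined by X (y = x²), yet Pearson's correlation is zero because the relationship is not linear. `pearson_r` is a hypothetical helper written for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means (illustrative helper)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# X values symmetric about 0; Y is an exact (but curved) function of X.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_parabola = pearson_r(xs, ys)
print(r_parabola)
```

The positive and negative halves of the U shape cancel in the numerator of r, so the correlation is 0 even though knowing X tells you Y exactly.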

Myths explained in video

http://www.youtube.com/watch?v=MTbZoKEOkUg

http://www.youtube.com/watch?v=VW1IEqKuf6s (Only to 2:48)

Correlation as a statistic

As with the mean, the Odds Ratio and the other statistics we have looked at, a correlation is a characteristic of a population that we estimate with our ______:

                Population      Sample
                (Parameter)     (Statistic)
Mean            µ               X̄
Proportion      p               p̂
Odds Ratio      OR
Correlation     _               _

The r tells part of the story

Remember, the correlation (r) we calculate from a sample is only one of the _____________ correlations we could have obtained from one of many possible _______. It’s possible that the true population correlation, ρ, has another value … say 0, or ρ₀.

So … there is some variability in our estimate r: its standard error.
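
A small simulation makes this variability concrete. The sketch below is an illustration under stated assumptions, not part of the course software: X and Y are drawn independently from a standard normal distribution (so the population correlation is 0 by construction), the sample size of 15, the 200 repetitions and the seed are arbitrary choices, and `pearson_r` is a hypothetical helper.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means (illustrative helper)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


random.seed(203)  # arbitrary seed so the run is repeatable
n = 15            # size of each sample
sample_rs = []
for _ in range(200):
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [random.gauss(0, 1) for _ in range(n)]  # drawn independently of xs
    sample_rs.append(pearson_r(xs, ys))

print("smallest r:", min(sample_rs))
print("largest r: ", max(sample_rs))
```

Even though the population correlation is 0, the 200 sample values of r scatter on both sides of 0. That spread is what the standard error of r describes.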

$$se(r) = \sqrt{\frac{1 - r^2}{n - 2}}$$

Hypotheses for Associations between Continuous Variables

H0: there is no linear relationship between X and Y

Ha: there is a linear relationship between X and Y

Is the same as:

H0: ρ = 0

Ha: ρ ≠ 0

And as in our other hypothesis tests, we will use a _________ (r) to approximate a _________ (ρ).

Testing for Correlation = 0

Recall that for our hypothesis tests of μ = 0, we used a t-test.

$$t = \frac{\bar{x} - 0}{se(\bar{x})} = \frac{\bar{x}}{s / \sqrt{n}}$$

If both X and Y are normally distributed, the test for H0: ρ = 0 is very similar:

$$t = \frac{r - 0}{se(r)} = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$$

and we look up our t value in the appropriate table to find the p-value!
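
As a worked illustration, the sketch below computes the standard error and the t statistic for a made-up result (r = 0.6 from n = 27 pairs). The function name is hypothetical. It also returns n − 2 as the degrees of freedom, which is the usual choice for this test although the slides do not state it; the p-value itself still comes from a t table or from software.

```python
from math import sqrt


def correlation_t_statistic(r, n):
    """Return (se, t, df) for testing H0: rho = 0 with a sample correlation r."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    se = sqrt((1 - r ** 2) / (n - 2))  # standard error of r
    t = (r - 0) / se                   # distance of r from 0, in standard errors
    df = n - 2                         # assumed degrees of freedom for the table lookup
    return se, t, df


# Made-up example values.
se, t, df = correlation_t_statistic(r=0.6, n=27)
print(se, t, df)  # se is about 0.16, t is about 3.75, df is 25
```

Here (1 − 0.36) / 25 = 0.0256, whose square root is 0.16, so t = 0.6 / 0.16 = 3.75.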

New Topics Covered Today

Pearson’s Correlation
- Most commonly calculated correlation statistic
- No definition of response or predictor
- Always between -1 and 1

Hypothesis testing for Correlation
- Does a correlation exist?
- Reject null = a non-zero correlation

Reading:

Chapter 10 up to page 360
