Bivariate Data Pick up a formula sheet, Notes for Bivariate Data – Day 1, and a calculator.


Archaeopteryx is an extinct beast having feathers like a bird but teeth and a long bony tail like a reptile. Only six fossil specimens are known. Because these specimens differ greatly in size, some scientists think they are different species rather than individuals from the same species. If the specimens belong to the same species and differ in size because some are younger than others, there should be a positive linear relationship between the lengths of a pair of bones across all individuals. An outlier from this relationship would suggest a different species. Here are data on the lengths in centimeters of the femur (a leg bone) and the humerus (a bone in the upper arm) for the five specimens that preserve both bones.

femur length (cm)     38  56  59  64  74
humerus length (cm)   41  63  70  72  84

Load data into list 1 and list 2 and make a scatterplot.


A scatterplot with no labels or scale is not enough. What do we need?


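[Scatterplot: humerus length in cm (vertical axis) vs. femur length in cm (horizontal axis). The traced points (38, 41) and (64, 72) supply the scale: 38 and 64 on the femur axis, 41 and 72 on the humerus axis.]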

A “cheater” way to put scale on a scatterplot is to trace two points and label each axis with those two values.


Explanatory variable? Femur length in cm.

Response variable? Humerus length in cm.

But does it really matter here?

No. But often it does.


Find the correlation coefficient and explain what it means.


Did you get it?


If you did not get the correlation coefficient, you must turn your diagnostics on.

Push 2nd, then 0, to open the catalog.

Scroll down to DiagnosticOn.

Push “enter” twice and the little calculator guy will say “Done”.


r = .994

The correlation coefficient is ALWAYS between -1 and 1. It does not change when the units or scale of either variable are changed. Let’s check out the formula sheet.
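To check the calculator's answer by hand, here is a minimal Python sketch. It is not part of the original notes, and it assumes the common z-score form of r (sum of products of z-scores, divided by n - 1); your formula sheet may write it differently. The function name `correlation` is made up for illustration. The sketch also converts both lists to inches to show that r does not move when the units change.

```python
import math

femur = [38, 56, 59, 64, 74]      # cm, the five specimens
humerus = [41, 63, 70, 72, 84]    # cm

def correlation(xs, ys):
    """Correlation as the sum of z-score products divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    z_products = (((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
                  for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)

r_cm = correlation(femur, humerus)

# Change of units: centimeters to inches (1 in = 2.54 cm).
femur_in = [x / 2.54 for x in femur]
humerus_in = [y / 2.54 for y in humerus]
r_in = correlation(femur_in, humerus_in)

print(round(r_cm, 3), round(r_in, 3))
```

Both printed values should agree with the r = .994 from the calculator, since z-scores have no units.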


r = .994 What does it mean?

The correlation coefficient describes the strength of the linear relationship. The closer it is to 1 or -1 the more the points line up.

These points line up pretty well with a negative slope.

The correlation coefficient would be close to -1.

(See the graph at the bottom of your notes.)


These points line up pretty well with a positive slope.

The correlation coefficient would be close to 0.8 or 0.9.


These points don’t line up at all.

The correlation coefficient would be nearly 0.


These points line up sort of well with a negative slope.

The correlation coefficient might be -0.6 or -0.7.


These points don’t line up at all.

The correlation coefficient would be fairly close to 0.


r = .994 Here’s what you write:

This suggests a strong, positive, linear relationship between femur length and humerus length.


So what’s the rest of this stuff?


The rest of the output: slope, y-intercept, and coefficient of determination.

equation: ŷ = 1.197x - 3.660

The hat on ŷ is hugely important! It means the predicted y.


equation: ŷ = 1.197x - 3.660, where x = femur length and y = humerus length

slope = 1.197; for every 1 cm increase in femur length, the predicted humerus length increases by 1.197 cm, on average.

y-intercept = -3.660; when the femur length is 0 cm, the predicted humerus length is about -3.660 cm.

Of course, this is ridiculous and is an example of extrapolation.
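For anyone curious where the slope and y-intercept come from, here is a rough Python sketch of the standard least-squares formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x). This is an illustration, not the calculator's actual code, and the names `sxx`, `sxy`, `syy`, and `predict` are made up here.

```python
femur = [38, 56, 59, 64, 74]      # x, in cm
humerus = [41, 63, 70, 72, 84]    # y, in cm

n = len(femur)
mean_x = sum(femur) / n
mean_y = sum(humerus) / n

# Sums of squared deviations and cross-products.
sxx = sum((x - mean_x) ** 2 for x in femur)
syy = sum((y - mean_y) ** 2 for y in humerus)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(femur, humerus))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r_squared = sxy ** 2 / (sxx * syy)   # coefficient of determination

def predict(femur_cm):
    """Predicted humerus length (y-hat) for a given femur length."""
    return intercept + slope * femur_cm

print(round(slope, 3), round(intercept, 3), round(r_squared, 3))
print(predict(0))   # extrapolation: a femur of 0 cm gives a negative humerus
```

Rounded to three decimals, the slope and intercept should match the equation above. The last line shows the silly negative prediction at x = 0, far outside the 38 to 74 cm range of the data.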


Residuals

Since our line misses many of the points, a residual is a measure of the “miss.”

residual = y - ŷ (actual - predicted)

A residual is the signed vertical distance from the point to the line: positive when the point is above the line, negative when it is below.


What is the residual for the point (56, 63)?

residual = y – ŷ

ŷ = 1.197x – 3.660

ŷ = 1.197(56) – 3.660 = 63.372

residual = y – ŷ = 63 – 63.372 = -.372

Find the residual for the point (74, 84).

residual = y - ŷ = 84 - 84.918 = -.918
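The same arithmetic can be run for all five points at once. This is a small Python sketch that uses the rounded equation ŷ = 1.197x - 3.660 from these notes; the function name `predicted_humerus` is just an illustrative choice.

```python
femur = [38, 56, 59, 64, 74]      # x, in cm
humerus = [41, 63, 70, 72, 84]    # y, in cm

def predicted_humerus(femur_cm):
    """y-hat from the rounded regression equation in the notes."""
    return 1.197 * femur_cm - 3.660

# residual = actual - predicted, one per specimen
residuals = [y - predicted_humerus(x) for x, y in zip(femur, humerus)]

for x, y, res in zip(femur, humerus, residuals):
    print(x, y, round(res, 3))
```

A negative residual means the line overpredicts that humerus; a positive one means it underpredicts.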


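A residual plot is a graph of all the residuals against the explanatory variable.

To get RESID, push 2nd, then STAT, and pick RESID from the list names.

This only works if the little guy knows the equation of the line, so run the regression first.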


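Residual Plot

[Residual plot: residuals (vertical axis) vs. femur length in cm (horizontal axis). Traced points: residual -.8 at femur length 38 and residual 3 at femur length 59.]

This is a horrible residual plot. We’d like the points to be equally scattered above and below the horizontal line at residual = 0.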


That’s it for dinosaurs today.


Limitations of Correlation and Regression

Correlation measures linear relationships only.

One influential point or incorrectly entered data point can greatly change the regression line. Correlation is not robust (that is, not resistant to outliers).

Correlations based on averages are usually too high when applied to individuals.

Extrapolation can yield silly results. Predictions for y should be made only for x-values within the range of the data.

Correlation does not imply a cause-and-effect relationship.
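To see how fragile correlation is, here is a short Python sketch with one made-up typo: the last humerus length entered as 48 instead of 84. That typo is a hypothetical for illustration only and is not in the original data. The function name `correlation` is also an assumption of this sketch.

```python
import math

def correlation(xs, ys):
    """r = Sxy / sqrt(Sxx * Syy), equivalent to the z-score formula."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]
humerus_typo = [41, 63, 70, 72, 48]   # hypothetical: 84 keyed in as 48

r_good = correlation(femur, humerus)
r_typo = correlation(femur, humerus_typo)

print(round(r_good, 3), round(r_typo, 3))
```

One bad entry out of five is enough to drag a very strong correlation down to a weak one, which is why checking the lists before running the regression is worth the time.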


Examples of a perfect linear fit

A job pays $10 per hour: the relationship between hours worked and pay.

[Graph: pay (vertical axis) vs. hours worked (horizontal axis)]
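A quick Python sketch of the $10-per-hour example, using made-up hours of 1 through 5 (those values are hypothetical; any set of hours would do). Because every point sits exactly on the line pay = 10 × hours, the correlation comes out to 1.

```python
import math

hours = [1, 2, 3, 4, 5]            # hypothetical hours worked
pay = [10 * h for h in hours]      # $10 per hour, so pay = 10 * hours

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(pay) / n
sxx = sum((x - mean_x) ** 2 for x in hours)
syy = sum((y - mean_y) ** 2 for y in pay)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, pay))

r = sxy / math.sqrt(sxx * syy)
slope = sxy / sxx                  # should recover the hourly rate

print(r, slope)
```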


The association between hours worked and time spent pursuing hedonistic pleasures.

[Graph: hedonistic pleasure time (vertical axis) vs. hours worked (horizontal axis)]

Here we could switch x and y.

Sometimes we are simply curious about an association.


Time for more data?