amos-wheeler
Bivariate Data
Pick up a formula sheet, Notes for Bivariate Data – Day 1, and a calculator.
Archaeopteryx is an extinct beast having feathers like a bird but teeth and a long bony tail like a reptile. Only six fossil specimens are known. Because these specimens differ greatly in size, some scientists think they are different species rather than individuals from the same species. If the specimens belong to the same species and differ in size because some are younger than others, there should be a positive linear relationship between the lengths of a pair of bones from all individuals. An outlier from this relationship would suggest a different species. Here are data on the lengths in centimeters of the femur (a leg bone) and the humerus (a bone in the upper arm) for the five specimens that preserve both bones.
femur 38 56 59 64 74
humerus 41 63 70 72 84
Load data into list 1 and list 2 and make a scatterplot.
This is not enough. What do we need?
[Scatterplot: femur length in cm (x-axis) vs. humerus length in cm (y-axis); axes marked at the traced points (38, 41) and (64, 72)]
A “cheater” way to put scale on a scatterplot is to trace two points and label each axis with those two values.
explanatory variable? femur length in cm
response variable? humerus length in cm
But does it really matter here?
No. But often it does.
Find the correlation coefficient and explain what it means.
Did you get it?
If you did not get the correlation coefficient, you must turn your diagnostics on.
Push 2nd then 0.
Scroll down to diagnostics on.
Push “enter” twice and little calculator guy will say “done”.
r = .994
The correlation coefficient is ALWAYS between -1 and 1. It does not change when the units or scale of either variable are changed. Let’s check out the formula sheet.
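For anyone checking the calculator's answer by hand, here is a minimal Python sketch of the correlation calculation. The function name `correlation` and the choice to convert femur lengths to inches are made up for this illustration; the data are the five specimens from the table.

```python
from math import sqrt

femur = [38, 56, 59, 64, 74]      # cm, from the table
humerus = [41, 63, 70, 72, 84]    # cm, from the table


def correlation(xs, ys):
    """Pearson correlation coefficient r of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r_cm = correlation(femur, humerus)

# Change the units of one variable only: femur in inches, humerus in cm.
femur_inches = [x / 2.54 for x in femur]
r_mixed = correlation(femur_inches, humerus)

print(round(r_cm, 3))     # 0.994
print(round(r_mixed, 3))  # 0.994
```

Both lines print the same value because rescaling a variable rescales the numerator and the denominator of r by the same factor.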
r = .994 What does it mean?
The correlation coefficient describes the strength of the linear relationship. The closer it is to 1 or -1 the more the points line up.
These points line up pretty well with a negative slope.
The correlation coefficient would be close to -1.
graph on the bottom of your notes
These points line up pretty well with a positive slope.
The correlation coefficient would be close to 0.8 or 0.9.
These points don’t line up at all.
The correlation coefficient would be nearly 0.
These points line up sort of well with a negative slope.
The correlation coefficient might be -0.6 or -0.7.
These points don’t line up at all.
The correlation coefficient would be fairly close to 0.
r = .994 Here’s what you write:
This suggests a strong, positive, linear relationship between femur length and humerus length.
So what’s the rest of this stuff?
slope
y-intercept
coefficient of determination
equation: ŷ = 1.197x – 3.660
The hat on the y is hugely important! ŷ means the predicted y.
equation: ŷ = 1.197x – 3.660
where x = femur length and y = humerus length
slope = 1.197: For every 1 cm increase in femur length, the predicted humerus length increases by 1.197 cm, on average.
y-intercept = -3.660: When the femur length is 0 cm, the predicted humerus length is -3.660 cm.
Of course, this is ridiculous and is an example of extrapolation.
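The slope and y-intercept above come from the calculator's linear regression. As a rough check, the usual least-squares formulas can be sketched in Python; the function name `least_squares` is an assumption for this example, not anything on the calculator.

```python
femur = [38, 56, 59, 64, 74]      # x, cm
humerus = [41, 63, 70, 72, 84]    # y, cm


def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares(femur, humerus)
print(f"y-hat = {slope:.3f}x + ({intercept:.3f})")  # y-hat = 1.197x + (-3.660)
```

The intercept is just the line's value at x = 0, which is far below the smallest femur length in the data (38 cm), so it has no physical meaning here.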
Residuals
Since our line misses many of the points, a residual is a measure of the “miss.”
residual = y – ŷ (actual – predicted)
a residual is the vertical distance from the point to the line
What is the residual for the point (56, 63)?
residual = y – ŷ
ŷ = 1.197x – 3.660
ŷ = 1.197(56) – 3.660 = 63.372
residual = y – ŷ = 63 – 63.372 = -.372
Find the residual for the point (74, 84).
residual = -.918
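The two residual calculations can be double-checked with a short Python sketch. It uses the rounded equation ŷ = 1.197x – 3.660 exactly as written; the function names `predict` and `residual` are made up for this example.

```python
def predict(femur_cm):
    """Predicted humerus length (cm) from the rounded regression equation."""
    return 1.197 * femur_cm - 3.660


def residual(femur_cm, humerus_cm):
    """Residual = actual - predicted."""
    return humerus_cm - predict(femur_cm)


print(round(residual(56, 63), 3))  # -0.372
print(round(residual(74, 84), 3))  # -0.918
```

Both residuals are negative, so both points sit slightly below the line.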
A residual plot is a graph of all the residuals.
To get RESID, push 2nd, then STAT, then choose RESID.
This only works if the little guy knows the equation of the line.
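The same dependency shows up if the residuals are computed by hand: the line has to be fitted first, and only then can every point's residual be listed. A minimal Python sketch, assuming the unrounded least-squares line (the variable names are invented for this example):

```python
femur = [38, 56, 59, 64, 74]      # x, cm
humerus = [41, 63, 70, 72, 84]    # y, cm

# Step 1: fit the line (without this there are no residuals to list).
n = len(femur)
mean_x = sum(femur) / n
mean_y = sum(humerus) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(femur, humerus))
sxx = sum((x - mean_x) ** 2 for x in femur)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Step 2: residual = actual - predicted, one per data point.
residuals = [y - (slope * x + intercept) for x, y in zip(femur, humerus)]

for x, e in zip(femur, residuals):
    print(x, round(e, 2))
# 38 -0.82
# 56 -0.37
# 59 3.04
# 64 -0.94
# 74 -0.91
```

Four of the five residuals are negative and one is large and positive. The residuals of a least-squares line always add up to zero, apart from rounding.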
[Residual plot: femur length in cm (x-axis) vs. residuals (y-axis); axes marked at the traced points (38, -.8) and (59, 3)]
This is a horrible residual plot. We’d like the points to be equally scattered above and below the line.
Residual Plot
That’s it for dinosaurs today.
Limitations of Correlation and Regression
Correlation measures linear relationships only.
One influential point or incorrectly entered data point can greatly change the regression line. Correlation is not robust (that is, it is not resistant to outliers).
Correlations based on averages are usually too high when applied to individuals.
Extrapolation can yield silly results. Predictions for y should be made only for x-values within the range of the data.
Correlation does not imply a cause-and-effect relationship.
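To see how much one incorrectly entered data point can matter, here is a small Python sketch using the Archaeopteryx data. The typo is invented for this example: the last humerus length is entered as 48 instead of 84. The function name `fit` is also made up.

```python
from math import sqrt

femur = [38, 56, 59, 64, 74]           # cm
humerus = [41, 63, 70, 72, 84]         # cm, correct data
humerus_typo = [41, 63, 70, 72, 48]    # hypothetical typo: 84 entered as 48


def fit(xs, ys):
    """Return (r, slope) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy), sxy / sxx


r_good, slope_good = fit(femur, humerus)
r_bad, slope_bad = fit(femur, humerus_typo)

print(f"correct data: r = {r_good:.3f}, slope = {slope_good:.3f}")
print(f"with typo:    r = {r_bad:.3f}, slope = {slope_bad:.3f}")
```

With the single swapped entry, r falls from about 0.99 to roughly 0.37 and the slope drops to about a third of its original value.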
Examples of a perfect linear fit
A job pays $10 per hour. The relationship between hours worked and pay.
[Graph: hours worked (x-axis) vs. pay (y-axis)]
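A quick Python sketch of the $10-per-hour example shows what a perfect linear fit looks like numerically. The particular hours values are made up for this illustration.

```python
from math import sqrt

hours = [1, 2, 3, 4, 5]           # hypothetical hours worked
pay = [10 * h for h in hours]     # $10 per hour

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(pay) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, pay))
sxx = sum((x - mean_x) ** 2 for x in hours)
syy = sum((y - mean_y) ** 2 for y in pay)

r = sxy / sqrt(sxx * syy)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
residuals = [y - (slope * x + intercept) for x, y in zip(hours, pay)]

print(round(r, 3))                 # 1.0
print(slope, intercept)            # 10.0 0.0
print(max(abs(e) for e in residuals))  # 0.0
```

Every point lies exactly on the line, so r is 1 and every residual is 0.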
The association between hours worked and time spent pursuing hedonistic pleasures.
[Graph: hours worked (x-axis) vs. hedonistic pleasure time (y-axis)]
Here we could switch x and y.
Sometimes we are simply curious about an association.
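One thing to keep in mind when switching x and y: the correlation coefficient stays the same, but the regression line does not. A minimal Python sketch, reusing the femur and humerus data from earlier (the function name `summary` is made up for this example):

```python
from math import sqrt, isclose

femur = [38, 56, 59, 64, 74]      # cm
humerus = [41, 63, 70, 72, 84]    # cm


def summary(xs, ys):
    """Return (r, slope) for the least-squares line predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy), sxy / sxx


r_fh, slope_fh = summary(femur, humerus)   # predict humerus from femur
r_hf, slope_hf = summary(humerus, femur)   # predict femur from humerus

print(isclose(r_fh, r_hf))                    # True
print(round(slope_fh, 3), round(slope_hf, 3)) # 1.197 0.826
```

So the choice of explanatory variable does not matter when the goal is only to describe the association with r, but it does matter as soon as a line is used to make predictions.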
Time for more data?