collin-tucker
Chapter 7 Scatterplots, Association, and Correlation
Scatterplots
Displays the relationship between two quantitative variables measured on the same cases
Very common
Very effective way to display relationships
See patterns and trends
Examples
Relationships between variables are often at the heart of what we would like to learn from data.
Are grades actually higher now than they used to be?
Do people tend to reach puberty at a younger age than in previous generations?
Does applying magnets to parts of the body relieve pain? If so, are stronger magnets more effective?
Do students learn better with the use of computer technology?
These questions relate two quantitative variables and ask whether there is an association between them.
Direction
Positive Negative
Form
Straight Curved
Strength
How much scatter??
Weak Strong
Unusual Features
Be sure to mention any outliers or subgroups
Cartesian Plane
Created by René Descartes (1596 – 1650)
Variables
x - variable Explanatory variable
Predictor variable
Accounts for, explains, predicts or is otherwise responsible for the y – variable
y - variable Response variable
The variable you hope to predict or explain
Assigning the Variables
We want to compare peak period freeway speed to cost per person per year.
x = speed and y = cost: the slower you go, the more it costs in delays
x = cost and y = speed: the more you spend on highway improvements, the more the speed would increase
Determining Variables
Do heavier smokers develop lung cancer at younger ages?
Is birth order an important factor in predicting future income?
Can we estimate a person’s % body fat more simply by just measuring waist or wrist size?
Examples: Describe what the scatterplot might look like.
Drug dosage and degree of pain relief
Calories consumed and weight loss
Hours of sleep and score on a test
Shoe size and grade point average
Time for a mile run and age
Age of car and cost of repairs
Calculator
Making scatterplots
Naming lists
Correlation
Measures the strength of the linear association between two quantitative variables
The sign of the correlation coefficient gives the direction of the association
Always between -1 and 1
A correlation of exactly -1 or 1 means the points fall on a perfect straight line (possible but very rare)
Correlation treats x and y symmetrically
No units
NOT affected by changes in the center or scale of either variable
Correlation depends on the z-scores
Measures the strength of LINEAR association ONLY
Sensitive to outliers: a single value can drastically change your coefficient
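A minimal sketch of the z-score idea, assuming the usual textbook definition (correlation = sum of the products of the z-scores, divided by n - 1). The data values and the function names are made up for illustration; they are not from the slides.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation (divides by n - 1)
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def z_scores(values):
    m, s = mean(values), sample_sd(values)
    return [(v - m) / s for v in values]

def correlation(x, y):
    # r = sum(zx * zy) / (n - 1)
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

# Hypothetical data, invented for this sketch
x = [2, 4, 5, 7, 9]
y = [10, 14, 13, 19, 22]

r = correlation(x, y)
r_swapped = correlation(y, x)                        # x and y treated symmetrically
r_shifted = correlation([v + 10 for v in x], y)      # change in center
r_rescaled = correlation([v * 2.5 for v in x], y)    # change in scale

print(r, r_swapped, r_shifted, r_rescaled)
```

All four printed values should agree (up to rounding), because swapping, shifting, or rescaling the variables leaves the z-scores, and therefore the correlation, unchanged.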
Correlation Conditions
Quantitative Variables Condition: correlation applies only to quantitative variables. Check to make sure you know the variables' units and what they measure.
Straight Enough Condition: the correlation coefficient tells us the strength of LINEAR scatterplots only
Outlier Condition: outliers can distort the correlation dramatically. When you see an outlier, you should report the correlation with AND without the outlier.
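The Straight Enough and Outlier conditions can be illustrated with a small sketch. The data sets below are invented for illustration, and the `correlation` helper is a hypothetical stand-in for what the calculator computes.

```python
import math

def correlation(x, y):
    # Pearson correlation from sums of squares and cross-products
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Straight Enough: a perfect but CURVED relationship (y = x squared).
# Correlation measures only linear association, so r comes out as zero here.
x_curved = [-3, -2, -1, 0, 1, 2, 3]
y_curved = [v ** 2 for v in x_curved]
r_curved = correlation(x_curved, y_curved)

# Outlier: a nearly straight pattern, then the same data plus one unusual point.
x_base = [1, 2, 3, 4, 5, 6]
y_base = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
r_without = correlation(x_base, y_base)
r_with = correlation(x_base + [20], y_base + [1.0])

print("curved:", r_curved)
print("without outlier:", r_without)
print("with outlier:", r_with)
```

In this made-up data, one added point is enough to turn a strong positive correlation into a negative one, which is why the correlation should be reported both with and without the outlier.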
Checking In
Your Statistics teacher tells you that the correlation between the scores (points out of 50) on Exam 1 and Exam 2 was .75.
Before answering any questions about the correlation, what would you like to see? Why?
If she added 10 points to each Exam 1 score, how would this change the correlation?
If she standardizes both scores, how will this affect the correlation?
In general, if someone does poorly on Exam 1, are they likely to do poorly or well on Exam 2? Explain.
If someone does poorly on Exam 1, will they definitely do poorly on Exam 2 as well?
Looking at Association
When your blood pressure is measured, it is reported as two values, systolic blood pressure and diastolic blood pressure. How are these variables related to each other? Do they tend to be both high or both low?
Think!!
Plan: I’ll examine the relationship between two measures of blood pressure.
Variables: Systolic blood pressure and diastolic blood pressure, both measured in millimeters of mercury
W’s: 1406 participants in a health study in Framingham, MA
Plot Create a scatterplot
Check the Conditions
Quantitative Variables??
Straight Enough??
Outliers??
Show!!
Mechanics: We will calculate correlation on the calculator.
Correlation = .792
Tell!!
Conclusion: The scatterplot shows a positive direction, with a higher SBP going with a higher DBP. The plot is generally straight with a moderate amount of scatter. The correlation of .792 is consistent with what I saw in the scatterplot. A few cases stand out with unusually high SBP compared with their DBP. It seems far less common for the DBP to be high by itself.