Lecture 25: Scatterplots for Bivariate Data
Section 3.1 and 3.2
Announcement
• Grades of Exam 2 are posted.
• Blackboard Signal Intervention will re-run.
3.1 Visually Display Bivariate …
Scatterplot for H/W data
[Figure: scatterplot; horizontal axis ticks from 155 to 180, vertical axis ticks from 40 to 70]
Scatter Plots
Example 3.1 with SAS Code
Scatterplot of Example 3.1
Scatterplots
• Plot bivariate data
• Plot the (x,y) pairs directly on plot
• Pattern within plot can indicate certain relationships between x and y
  – Linear
    • we like these A LOT!
  – Quadratic, Cubic?
  – Nonlinear? Exponential or Log?
  – Other? Random?
  – Etc.
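As a rough illustration of plotting the (x,y) pairs directly, the sketch below draws a text-only scatterplot. The function name `ascii_scatter` and the height/weight values are hypothetical, made up for illustration; in practice one would use SAS or a plotting library instead.

```python
def ascii_scatter(xs, ys, width=30, height=10):
    """Return a text scatterplot of the (x, y) pairs.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each coordinate to a column / row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        # Row 0 is printed first, so flip vertically to put small y at the bottom.
        grid[height - 1 - row][col] = "*"
    return "\n".join("".join(line) for line in grid)


# Hypothetical height (cm) and weight (kg) pairs, not the lecture's data.
heights_cm = [155, 160, 165, 170, 175, 180]
weights_kg = [45, 50, 52, 60, 63, 70]

plot = ascii_scatter(heights_cm, weights_kg)
print(plot)
```

The points climb from the bottom-left to the top-right corner, which is the kind of roughly linear pattern the list above describes.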
3.2 Pearson’s Correlation Coefficient
• Suppose a scatterplot shows a linear (or roughly linear) relationship between X and Y (note: both must be quantitative)
• The correlation coefficient, r, measures the strength and direction of the linear relationship
  – Formally called Pearson’s correlation coefficient
• Examples:
  – Age and Bone Density
  – Weight and Blood Pressure
  – Etc.
How to calculate Correlation
• Where:
• Typo: On the right-hand side of the “Sxy” equation above, the second term in the numerator should be the sum of yi, not the sum of xi.
• See Example 3.3 in text on page 108.
• Or by calculator
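The calculation can also be sketched in Python. This assumes the usual computational form r = Sxy / sqrt(Sxx · Syy), with Sxy = Σxiyi − (Σxi)(Σyi)/n, Sxx = Σxi² − (Σxi)²/n and Syy = Σyi² − (Σyi)²/n, which agrees with the correction noted above. The function name `pearson_r` and the data values are hypothetical; they are not the data of Example 3.3.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    # Note the second term of Sxy uses the sum of x times the sum of y.
    sxy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    sxx = sum(a * a for a in x) - sum_x ** 2 / n
    syy = sum(b * b for b in y) - sum_y ** 2 / n
    return sxy / sqrt(sxx * syy)


# Hypothetical height (cm) and weight (kg) pairs.
heights_cm = [155, 160, 165, 170, 175, 180]
weights_kg = [45, 50, 52, 60, 63, 70]

r = pearson_r(heights_cm, weights_kg)
print(f"r = {r:.3f}")
```

For these made-up values r comes out close to +1, reflecting a strong positive linear pattern.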
More about the Correlation
• Takes values between -1 and 1
  – Sign indicates type of relationship
    • Positive, i.e., as X increases, Y also increases
    • Negative, i.e., as X increases, Y decreases (and vice versa)
  – Value indicates strength; farther from 0 is stronger
    • If r is near 0, it implies a weak (or no) linear relationship
    • Closer to +1 or -1 suggests a very strong linear pattern
    • See page 109 indicating “strengths”
• If we switch the roles of X and Y → r doesn’t change
• Unit free: unaffected by changes of units (linear transformations of X or Y with positive slope; a negative slope only flips the sign of r)
Visual understanding
Concerns with Correlation
• r is affected by outliers, see formula
• Captures only the strength of the “linear” relationship
  – it could be true that Y and X have a very strong non-linear relationship but r is close to zero
• r = +1 or -1 only when points lie perfectly on a straight line (e.g., Y = 2X + 3)
  – Rarely, if ever, true for real data!
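A small sketch of both points, using a hypothetical helper `pearson_r` and made-up x values: Y = X² is a perfect (non-linear) relationship, yet r is 0 when the x values are symmetric about zero, while Y = 2X + 3 gives r = 1.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    sxx = sum(a * a for a in x) - sum_x ** 2 / n
    syy = sum(b * b for b in y) - sum_y ** 2 / n
    return sxy / sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]

y_quadratic = [v ** 2 for v in x]      # exact, but non-linear, relationship
y_line = [2 * v + 3 for v in x]        # points exactly on Y = 2X + 3

r_quadratic = pearson_r(x, y_quadratic)
r_line = pearson_r(x, y_line)

print(f"Y = X^2:    r = {r_quadratic:.3f}")
print(f"Y = 2X + 3: r = {r_line:.3f}")
```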
Do datasets with the same r value have the same relationship? …
• All four datasets have the same r = 0.816
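The slide's figure is not reproduced here. Assuming the four datasets are Anscombe's quartet (a well-known published example whose correlations are all about 0.816; the identification is an assumption, since the slide does not name them), the shared r can be checked as follows. The helper `pearson_r` is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    sxx = sum(a * a for a in x) - sum_x ** 2 / n
    syy = sum(b * b for b in y) - sum_y ** 2 / n
    return sxy / sqrt(sxx * syy)


# Anscombe's quartet (assumed to be the four datasets on the slide).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

correlations = {name: pearson_r(x, y) for name, (x, y) in quartet.items()}
for name, r in correlations.items():
    print(f"Dataset {name}: r = {r:.4f}")
```

The four values agree to about three decimal places even though the scatterplots look very different: one roughly linear, one curved, one linear with a single outlier, and one driven entirely by a single point.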
More about r
• Does a small r indicate that x and y are NOT associated?
  – Not exactly, although maybe
  – Linear association is weak between x and y BUT another association may still exist!
  – Are there outliers?
  – Are there clusters?
• Does a large r indicate that x and y are always linearly associated?
  – Not always, could have clusters that look linear
• Always check your scatterplot!!
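To see why the scatterplot matters, the sketch below shows one outlier turning r = 0 into a large r. The data values and the helper `pearson_r` are hypothetical, constructed only for this illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    sxx = sum(a * a for a in x) - sum_x ** 2 / n
    syy = sum(b * b for b in y) - sum_y ** 2 / n
    return sxy / sqrt(sxx * syy)


# Made-up data with no linear pattern at all.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [5, 3, 6, 2, 7, 4, 6, 3]
r_without = pearson_r(x, y)

# The same data plus a single far-away point at (20, 20).
r_with = pearson_r(x + [20], y + [20])

print(f"without outlier: r = {r_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}")
```

A large r here says nothing about the eight original points; only the plot reveals that one observation is responsible.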
What about a similar idea for populations?
• Yes! We can define the correlation for populations as well, designated as ρ
  – Called the population correlation coefficient
  – Maintains similar properties as r, i.e.
    • It is between −1 and 1
    • It is 1 for a perfectly increasing linear relationship and −1 for a perfectly decreasing linear relationship
    • A value in between indicates the degree of linear association between the variables
  – We are not required to calculate ρ at this point
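For reference only, the standard textbook definition of ρ (not given on the slide, and not needed for calculations at this point) divides the population covariance by the two population standard deviations:

```latex
\rho = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y},
\qquad -1 \le \rho \le 1
```

It plays the same role for the population that r plays for a sample.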
Correlation and Causation
• A correlation, even a very strong one, DOES NOT IMPLY CAUSATION!!!
• Examples
  – For children, there is an extremely strong correlation between shoe size and math scores
  – Very strong correlation between ice cream sales and number of deaths by drowning
  – Very strong correlation between number of churches in a town and number of bars in a town
  – A large correlation between height and weight of a person only means that there is a positive association between height and weight
  – Heavy weight does not cause a person to grow tall
  – Examples of common response…NOT causation!
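A common response can be sketched with a small simulation. Everything here is hypothetical: the numbers are invented, and the model simply assumes that a child's age drives both shoe size and math score, with no direct link between the two.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    sxx = sum(a * a for a in x) - sum_x ** 2 / n
    syy = sum(b * b for b in y) - sum_y ** 2 / n
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Invented model: age is the common cause of both variables.
ages = [rng.randint(5, 12) for _ in range(500)]
shoe_size = [age + rng.gauss(0, 1) for age in ages]
math_score = [10 * age + rng.gauss(0, 10) for age in ages]

# Across all children the two variables are strongly correlated.
r_all = pearson_r(shoe_size, math_score)

# Among children of the same age (here, age 8) the link largely disappears.
same_age = [(s, m) for a, s, m in zip(ages, shoe_size, math_score) if a == 8]
r_age8 = pearson_r([s for s, _ in same_age], [m for _, m in same_age])

print(f"all children: r = {r_all:.2f}")
print(f"age 8 only:   r = {r_age8:.2f}")
```

Bigger shoes do not raise math scores in this model; both simply increase with age.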
After Class …
• Review Section 3.1 and 3.2
• Read Section 3.3
• Hw#9, 5pm today
• This Wed: Lab#5