Correlation and Regression
• Paired Data
  • Is there a relationship?
    • Do the numbers appear to increase or decrease together?
    • Does one set increase as the other decreases?
    • How consistent is the pattern?
  • If so can we…
    • Quantify it?
    • Model it with an equation?
    • Use the equation for prediction?
x Plastic (lb)   y Household
0.27 2
1.41 3
2.19 3
2.83 6
2.19 4
1.81 2
0.85 1
3.05 5
Linear Correlation Coefficient
• The Linear Correlation Coefficient measures strength and direction of the linear relationship between paired x and y values in a sample.
• ρ (rho) is the population’s linear correlation coefficient.
• r is the sample’s linear correlation coefficient.
[Scale for r: -1 (negative), 0 (no correlation), 1 (positive)]
Example/Homework
• Estimate r for the following relationships
1. Household size and amount of trash
2. Car weight and gas mileage
3. Car length and braking distance
4. Height and shoe size
5. Facebook friends and time spent online
6. Car cost and number of cup holders
7. Time watching television and SAT scores
8. Outside temperature and student absences
9. Number of pages for the term paper and its grade
10. Number of accidents and car insurance premiums
Calculation
• The value of r is not affected by the units of measurement or by which variable is assigned to x and which to y.
• Round to three decimal places
• The sample of paired data (x, y) is a random sample.
• The pairs of (x, y) data have a bivariate normal distribution.
  1. For every x, the paired y values are normally distributed
  2. For every y, the paired x values are normally distributed
\[ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}} \]
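The formula translates directly into code, which also makes it easy to check the claim that r ignores units and the x/y assignment. The following is a minimal Python sketch, not part of the original slides: the helper name `pearson_r` and the pound-to-kilogram factor 0.4536 are illustrative assumptions, and the data are the household and plastic values used in this section.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample linear correlation coefficient from the sum formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

household = [2, 3, 3, 6, 2, 4, 1, 5]
plastic_lb = [0.27, 1.41, 2.19, 2.83, 1.81, 2.19, 0.85, 3.05]

# Assumed conversion factor, used only to change the units of one variable.
plastic_kg = [w * 0.4536 for w in plastic_lb]

r_lb = pearson_r(household, plastic_lb)       # original units
r_kg = pearson_r(household, plastic_kg)       # different units for plastic
r_swapped = pearson_r(plastic_lb, household)  # x and y exchanged

print(round(r_lb, 3), round(r_kg, 3), round(r_swapped, 3))
```

All three printed values should agree, since rescaling a variable or swapping the roles of x and y leaves the ratio in the formula unchanged.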
Calculating r
X Y XY X² Y²
2 0.27
3 1.41
3 2.19
6 2.83
2 1.81
4 2.19
1 0.85
5 3.05
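The worksheet above can be filled in mechanically. This is a rough Python sketch of that bookkeeping, assuming the X and Y columns as listed; the variable names are illustrative and not from the slides. It builds the XY, X² and Y² columns, totals each column, and then substitutes the totals into the formula for r.

```python
from math import sqrt

x = [2, 3, 3, 6, 2, 4, 1, 5]
y = [0.27, 1.41, 2.19, 2.83, 1.81, 2.19, 0.85, 3.05]

# One worksheet row per data pair: X, Y, XY, X^2, Y^2
rows = [(a, b, a * b, a * a, b * b) for a, b in zip(x, y)]

# Column totals, in the same order as the worksheet columns
totals = [sum(col) for col in zip(*rows)]
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = totals
n = len(rows)

# Substitute the totals into the formula for r
r = (n * sum_xy - sum_x * sum_y) / (
    sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
)

for row in rows:
    print("{:>3} {:>6.2f} {:>7.2f} {:>4} {:>8.4f}".format(*row))
print("sums:", [round(t, 4) for t in totals])
print("r =", round(r, 3))  # 0.842 for this sample
```

Working from column totals mirrors the hand calculation, so each intermediate sum can be compared against a worksheet filled in by hand.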
Calculating r
• Excel
  • The CORREL function
• Calculator
  • Enter the data into two lists
  • STAT > TESTS > E: LinRegTTest
  • Enter the two lists
  • Highlight Calculate, press ENTER
  • Find r (and t and p)
Budget Gross
18.5 81.8
72 75
0.25 12
55 68.75
10 138.3
70 19.8
17 72
8 107.9
Formal Hypothesis Test
• Test whether the linear correlation is significant
• Hypothesis
  • H0: ρ = 0 (no significant linear correlation)
  • H1: ρ ≠ 0 (significant linear correlation)
• Two-tailed test
• Still need a significance level
• Two methods for calculating the test statistic and critical value
Test Statistic and Critical Value
• Test statistic:
\[ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \]
• Critical values:
  – t-table
  – Two-tailed alpha heading
  – Degrees of freedom = n - 2
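As an illustration of this first method, here is a small Python sketch under stated assumptions: r = 0.842 with n = 8 is the value the formula gives for the household and plastic sample, and the critical value 2.447 (two-tailed, α = .05, 6 degrees of freedom) is read from a standard t-table rather than from these slides. The function name `t_statistic` is hypothetical.

```python
from math import sqrt

def t_statistic(r, n):
    """Test statistic t = r / sqrt((1 - r^2) / (n - 2))."""
    return r / sqrt((1 - r ** 2) / (n - 2))

r, n = 0.842, 8            # household/plastic sample
t = t_statistic(r, n)
df = n - 2                 # degrees of freedom

# Assumption: two-tailed critical t for alpha = .05 and df = 6,
# taken from a standard t-table (not listed in the slides).
t_critical = 2.447

# Two-tailed test: reject H0 when |t| exceeds the critical value
reject_h0 = abs(t) > t_critical
print("t =", round(t, 3), "df =", df, "reject H0:", reject_h0)
```

Because the test is two-tailed, the comparison uses the absolute value of t, so a strong negative correlation is treated the same way as a strong positive one.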
Test Statistic and Critical Value
• Test statistic: r
• Critical value
  • Use Table A-5
n   α = .05   α = .01
4   .950   .999
5   .878   .959
6   .811   .917
7   .754   .875
8   .707   .834
9   .666   .798
10   .632   .765
11   .602   .735
12   .576   .708
13   .553   .684
14   .532   .661
15   .514   .641
16   .497   .623
17   .482   .606
18   .468   .590
19   .456   .575
20   .444   .561
25   .396   .505
30   .361   .463
35   .335   .430
40   .312   .402
45   .294   .378
50   .279   .361
60   .254   .330
70   .236   .305
80   .220   .286
90   .207   .269
100   .196   .256
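The second method amounts to a table lookup followed by one comparison. The sketch below is one possible way to express it in Python, assuming the critical values of Table A-5 as listed above (the smaller value in each pair taken as α = .05); the names `TABLE_A5`, `critical_r` and `is_significant` are illustrative, not from the slides.

```python
# Critical values of r from Table A-5: n -> (alpha = .05, alpha = .01)
TABLE_A5 = {
    4: (0.950, 0.999), 5: (0.878, 0.959), 6: (0.811, 0.917),
    7: (0.754, 0.875), 8: (0.707, 0.834), 9: (0.666, 0.798),
    10: (0.632, 0.765), 11: (0.602, 0.735), 12: (0.576, 0.708),
    13: (0.553, 0.684), 14: (0.532, 0.661), 15: (0.514, 0.641),
    16: (0.497, 0.623), 17: (0.482, 0.606), 18: (0.468, 0.590),
    19: (0.456, 0.575), 20: (0.444, 0.561), 25: (0.396, 0.505),
    30: (0.361, 0.463), 35: (0.335, 0.430), 40: (0.312, 0.402),
    45: (0.294, 0.378), 50: (0.279, 0.361), 60: (0.254, 0.330),
    70: (0.236, 0.305), 80: (0.220, 0.286), 90: (0.207, 0.269),
    100: (0.196, 0.256),
}

def critical_r(n, alpha=0.05):
    """Look up the critical value of r for sample size n."""
    if n not in TABLE_A5:
        raise ValueError("n = {} is not listed in the table".format(n))
    if alpha not in (0.05, 0.01):
        raise ValueError("the table only covers alpha = .05 and .01")
    column = 0 if alpha == 0.05 else 1
    return TABLE_A5[n][column]

def is_significant(r, n, alpha=0.05):
    """The correlation is significant when |r| exceeds the critical value."""
    return abs(r) > critical_r(n, alpha)

print(critical_r(8), is_significant(0.842, 8))
```

The lookup deliberately refuses sample sizes that are not in the table instead of interpolating, since the slides do not say how to handle them.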
For Example
Is there a correlation between engine size and mileage? If so, is it significant?
r =
Size Mileage
2.2 23
2 23
3 19
2.3 23
4.6 17
2.5 20
4 17
2.4 22
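One way to work this example is sketched below in Python. It uses the deviation form of the correlation coefficient, which is algebraically equivalent to the sum formula, and compares |r| with the Table A-5 critical value for n = 8 at α = .05 (.707). The variable names are illustrative and not part of the slides.

```python
from math import sqrt

size = [2.2, 2, 3, 2.3, 4.6, 2.5, 4, 2.4]
mileage = [23, 23, 19, 23, 17, 20, 17, 22]

n = len(size)
mean_x = sum(size) / n
mean_y = sum(mileage) / n

# Deviation form: r = S_xy / sqrt(S_xx * S_yy)
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(size, mileage))
s_xx = sum((a - mean_x) ** 2 for a in size)
s_yy = sum((b - mean_y) ** 2 for b in mileage)
r = s_xy / sqrt(s_xx * s_yy)

# Critical value of r for n = 8, alpha = .05 (Table A-5)
critical = 0.707
significant = abs(r) > critical

print("r =", round(r, 3), "significant:", significant)
```

The sign of r gives the direction of the relationship, while the comparison with the critical value uses only its magnitude.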
Common Errors Involving Correlation
1. Causation: It is wrong to conclude that correlation implies causality.
   1. If x and y are strongly correlated, we cannot always assume “x causes y”
      1. y might cause x
      2. Both might be caused by z
2. Averages: Averages suppress individual variation and may inflate the correlation coefficient.
3. There may be some relationship between x and y even when there is no significant linear correlation.
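The third point can be seen with a small made-up data set. In this Python sketch the data are hypothetical (not from the slides): y is completely determined by x through y = x², yet the linear correlation coefficient comes out as zero because the relationship is not linear. The helper name `pearson_r` is an illustrative choice.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample linear correlation coefficient from the sum formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    return (n * sum_xy - sum_x * sum_y) / (
        sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    )

# Hypothetical data: a perfect, but nonlinear, relationship
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print("r =", r)
```

A scatterplot of these points would show a clear parabola, which is why a plot is worth checking before concluding that two variables are unrelated.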
Homework
• For each of the following pairs of data find the linear correlation coefficient and determine if the correlation is significant.
Math   Critical Reading
720 690
720 590
690 500
680 490
550 470
480 560
664 654
750 710
650 680
560 610
Length   Braking Distance
194 131
183 136
194 129
191 127
198 146
196 146
200 155
188 139
197 133
200 131
191 131
Depth (ft)   Velocity (ft/sec)
0.7 1.55
2.0 1.11
2.6 1.42
3.3 1.39
4.6 1.39
5.9 1.14
7.3 0.91
8.6 0.59
9.9 0.59
10.6 0.41
11.2 0.22
Altitude (km)   Temp (°C)
0.0 15.0
0.5 11.8
1.0 8.5
1.5 5.3
2.0 2.0
2.5 -1.2
3.0 -4.5
3.5 -7.7
4.0 -11.0
4.5 -14.2
5.0 -17.5