Chapter 3 – Examining Relationships

Chapter 3 – Examining Relationships. Scatterplots and Correlation - 3.1


Scatterplots and Correlation - 3.1


Scatterplot: Shows the relationship between two quantitative variables.

Explanatory Variable: Variable on the x-axis. It influences the response.

Response Variable: Variable on the y-axis. It responds to changes in the explanatory variable.


Looking at Scatterplots:

• Direction: Positive: as x increases, y increases. Negative: as x increases, y decreases.

• Form: Is there a linear relationship between the two variables?

• Strength: Do the points follow a single stream that is tight to the line or is there considerable spread (or variability) around the line?

(DFS!) Direction, Form, Strength: Describe For Scatter!


Can the NOAA predict where a hurricane will go?


• The example in the text shows a negative association between central pressure and maximum wind speed.

• As the central pressure increases, the maximum wind speed decreases.


Calculator Tip: Diagnostics On!

Catalog – Alpha “D” – Diagnostics On - Enter


Calculator Tip: Scatterplots

L1: Explanatory Variable

L2: Response Variable

Use statplot to graph


Scientists are interested in seeing if global temperature has been increasing. They measured the average global temperature per year (in Celsius). What graph should they make?


Histogram of Global Temp.


What does the scatterplot tell us that the histogram didn’t?


Are female Oscar winners getting older?


Example #1: Suppose you were to collect data for each pair of variables below. Which variable is the explanatory and which is the response? Determine the likely direction and strength of the relationship.

1. T-shirts at a store: Price of each, Number Sold

x (explanatory): Price of shirt

y (response): # sold

D: negative

S: strong

[Sketch: price of shirt on the x-axis ($5 to $50), # sold on the y-axis (1 to 100)]


2. Drivers: Reaction Time, Blood Alcohol Level

x (explanatory): BAC

y (response): Time

D: positive

S: strong

[Sketch: BAC on the x-axis (.01 to .5), reaction time on the y-axis (1 to 10)]


3. Cars: Age of Owner, Weight of the Car

Makes no sense! Neither variable plausibly explains the other.


Example #2: “I have never found a quantifiable predictor in 25 years of grading that was anywhere as strong as this one. If you just graded them based on length without ever reading them, you’d be right over 90 percent of the time.” The table below shows the data set that Dr. Perlman used to draw his conclusions.

Carry out your own analysis of the data. Then write a few sentences in response to each of Dr. Perlman’s conclusions.

Essay score and length for a sample of SAT essays

Words 460 422 402 365 357 278 236 201 168 156 133

Score 6 6 5 5 6 5 4 4 4 3 2

Words 114 108 100 403 401 388 320 258 236 189 128

Score 2 1 1 5 6 6 5 4 4 3 2

Words 67 697 387 355 337 325 272 150 135 73

Score 1 6 6 5 5 4 4 2 3 1


D: positive

F: linear, one unusual point

S: strong


Example #3: Regraph #2 with score as the dependent variable now. Do you see any differences in the graph?

**You may want to store these lists for tomorrow…


Correlation (“r”):

Measures the direction and strength of the linear relationship (D and S only; the form is assumed to be linear)

Both variables must be quantitative


Attributes of the Correlation

1. The correlation coefficient is a unitless measurement, denoted with the letter r, and has values between -1 and 1.

2. When r = 1 all the data points form a perfect straight line relationship with a positive slope.

3. When r = -1 all the data points form a perfect straight line relationship with a negative slope.


4. Correlation treats x and y symmetrically:

– The correlation of x with y is the same as the correlation of y with x.

5. Correlation is not affected by changes in the center or scale of either variable.

– Correlation depends only on the z-scores, and they are unaffected by changes in center or scale.


6. Values of r close to 0 mean that the linear relationship is weak. There may be a general linear trend, but there is a lot of variability around that trend.

7. When r = 0 there is no linear relationship between the two variables (a curved relationship is still possible). In other words, the best-fitting line has a slope of zero.


8. Outliers have a large influence on the correlation coefficient. The correlation is NOT resistant to outliers.


9. Correlation does not describe curved relationships! (ONLY LINEAR)


Guidelines: How strong is the linear relationship?

0 < r < 0.3 = weak positive
0.4 < r < 0.7 = moderate positive
0.8 < r < 1 = strong positive

-0.3 < r < 0 = weak negative
-0.7 < r < -0.4 = moderate negative
-1 < r < -0.8 = strong negative


Data collected from students in Statistics classes included their heights (in inches) and weights (in pounds):


• If we had to put a number on the strength, we would not want it to depend on the units we used.

• A scatterplot of heights (in centimeters) and weights (in kilograms) doesn’t change the shape of the pattern:
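A quick way to check the unit claim: the sketch below computes r for a small set of heights and weights in inches and pounds, then again in centimeters and kilograms. The five data points are made up for illustration (they are not the class data), and `correlation` is a helper written for this sketch.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r computed from sums of deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical heights (inches) and weights (pounds), for illustration only.
heights_in = [62, 65, 68, 70, 73]
weights_lb = [120, 140, 155, 160, 190]

# Same individuals, different units.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.45359237 for w in weights_lb]

r_imperial = correlation(heights_in, weights_lb)
r_metric = correlation(heights_cm, weights_kg)
print(r_imperial, r_metric)  # the two values agree up to rounding error
```

Changing units rescales every deviation by the same factor, and that factor cancels in the ratio, which is why r stays the same.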


Types of Correlation:

[Six scatterplots with r = 0, r = -0.3, r = 0.5, r = -0.7, r = 0.9, r = -0.99]

Example #4

[Matching exercise: six scatterplots with r = 0, r = -0.7, r = 0.5, r = -0.99, r = -0.3, r = 0.9]


• Don’t assume the relationship is linear just because the correlation coefficient is high.

Here the correlation is 0.979, but the relationship is actually bent.


Example #5: What is wrong with the following statements?

a. There is a strong correlation between the gender of American workers and their income.

Gender is categorical; correlation requires two quantitative variables.


b. We found a high correlation (r = 1.09) between students’ rating of faculty teaching and ratings made by other faculty members.

r can’t be bigger than 1


c. We found a very weak correlation (r = -0.95) which suggests little relationship between income and hours spent at casinos.

r = -0.95 is a strong negative relationship


d. We found a very weak correlation (r = 0.01) which suggests little relationship between age and death rate.

Should be a very strong relationship!


HOW TO CALCULATE THE CORRELATION COEFFICIENT

Remember how to calculate the z-score? We used this calculation to determine how many standard deviations an observation was from the mean.

RECALL:

z-score: $z = \frac{x - \bar{x}}{s}$


In this case, we were only concerned with one variable.

Now, we are considering two variables and each must be standardized.


Notation:

$r$ = correlation

$\bar{x}$ = sample mean of the x's

$x_i$ = the i-th observation of the x's

$S_x$ = sample standard deviation of the x's

$\bar{y}$ = sample mean of the y's

$y_i$ = the i-th observation of the y's

$S_y$ = sample standard deviation of the y's

$n$ = total number of observations


FORMULA:

$r = \frac{1}{n - 1} \sum \left( \frac{x_i - \bar{x}}{S_x} \right) \left( \frac{y_i - \bar{y}}{S_y} \right)$


Example #6:

Speed (x) 20 30 40

MPG (y) 25 35 45

Step #1: Find the following summary statistics:

n = 3

SPEED: $\bar{x} = 30$, $S_x = 10$

MPG: $\bar{y} = 35$, $S_y = 10$


Step #2: Calculate z-scores

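SPEED: $Z(x_1) = \frac{20 - 30}{10} = -1$, $Z(x_2) = \frac{30 - 30}{10} = 0$, $Z(x_3) = \frac{40 - 30}{10} = 1$

MPG: $Z(y_1) = \frac{25 - 35}{10} = -1$, $Z(y_2) = \frac{35 - 35}{10} = 0$, $Z(y_3) = \frac{45 - 35}{10} = 1$

PRODUCT: $Z(x_1)Z(y_1) = 1$, $Z(x_2)Z(y_2) = 0$, $Z(x_3)Z(y_3) = 1$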


Step #3: Calculate the Correlation

$r = \frac{1}{3 - 1}(1 + 0 + 1)$

$r = \frac{1}{2}(2)$

$r = 1$
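The three steps above (summary statistics, z-scores, average the products with n - 1) can be sketched in a few lines of Python. The function names `sample_sd` and `correlation_from_z` are made up for this sketch; the data are the speed and MPG values from the example.

```python
from math import sqrt

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def correlation_from_z(xs, ys):
    """r = 1/(n - 1) * sum of z_x * z_y, following the formula on the slide."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    z_x = [(x - x_bar) / s_x for x in xs]  # Step 2: standardize x
    z_y = [(y - y_bar) / s_y for y in ys]  # Step 2: standardize y
    products = [zx * zy for zx, zy in zip(z_x, z_y)]
    return sum(products) / (n - 1)         # Step 3

speed = [20, 30, 40]
mpg = [25, 35, 45]
r = correlation_from_z(speed, mpg)
print(r)  # 1.0, a perfect positive linear relationship
```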


Calculator Tip: Correlation

L1: Explanatory Variable

L2: Response Variable

Stat-calc-LinReg(a+bx), L1, L2

(make sure your diagnostic is on!!!)


Example #7: Use your calculator to find the correlation for #2. Comment on what it means.

Words 460 422 402 365 357 278 236 201 168 156 133

Score 6 6 5 5 6 5 4 4 4 3 2

Words 114 108 100 403 401 388 320 258 236 189 128

Score 2 1 1 5 6 6 5 4 4 3 2

Words 67 697 387 355 337 325 272 150 135 73

Score 1 6 6 5 5 4 4 2 3 1

r = 0.888

D: positive

S: strong


3.2 – Least-Squares Regression


Regression line: straight line that describes the linear relationship between an explanatory variable and a response variable.


LEAST SQUARES REGRESSION LINE:

• This is the best-fitting line to the data.

• The goal is to minimize the sum of the squared (vertical) distances of your observations (data) from your line.

• We square the distances (as in the calculation of the variance) because some data points lie above the line (positive distance) and some lie below the line (negative distance), so the raw distances would cancel each other out. Squaring makes every term non-negative.


We can use this line to predict a response, y, from a given value of the explanatory variable, x.


Remember graphing??

Slope-Intercept formula for a line:

y = mx + b where m = slope and b = y-intercept

Do you remember the SLOPE?

$\text{slope} = \frac{\text{rise}}{\text{run}} = \frac{\Delta y}{\Delta x}$

In statistics, we write it

$\hat{y} = a + bx$ (also written $\hat{y} = b_0 + b_1 x$)

1. Slope: $b = r \frac{S_y}{S_x}$ (Calculate this first!)

2. Y-intercept: $a = \bar{y} - b\bar{x}$
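The two formulas translate directly into code. This is a minimal sketch: `lsrl_from_summary` is a name made up here, and the summary statistics passed to it are hypothetical values chosen so the arithmetic is easy to follow.

```python
def lsrl_from_summary(r, s_x, s_y, x_bar, y_bar):
    """Return (a, b) for y-hat = a + b*x from summary statistics."""
    b = r * s_y / s_x      # slope first
    a = y_bar - b * x_bar  # then the y-intercept
    return a, b

# Hypothetical summary statistics, for illustration only.
a, b = lsrl_from_summary(r=0.5, s_x=2.0, s_y=8.0, x_bar=10.0, y_bar=50.0)
print(a, b)  # 30.0 2.0

# The line passes through (x-bar, y-bar): 30 + 2 * 10 = 50.
prediction_at_mean = a + b * 10.0
```

The slope has to be computed first because the intercept formula uses it.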


Facts about Least Squares Regression:

1. The distinction between explanatory and response variables is essential (which variable is used to predict which?).

2. It always passes through the point $(\bar{x}, \bar{y})$.

3. Correlation ‘r’ describes the direction and strength of the straight-line relationship, but tells us nothing more about the slope than whether it is positive, negative, or zero.


Extrapolation: Predicting outside the range of the x values


• Here is a timeplot of the Energy Information Administration (EIA) predictions and the actual price of a barrel of oil. How did forecasters do?


Example #8: Wildlife researchers monitor many wildlife populations by taking aerial photographs in order to estimate the weights of alligators. Here is the regression line of the weights of adult alligators (in pounds) and their lengths (in inches) based on the data collected from captured alligators.

Predicted Weight = – 393 + 5.9(length)

a. What is the slope of the line? What does it mean?

m = 5.9

For each additional inch in length, the predicted weight increases by 5.9 pounds.


b. What is the y-intercept of the line? What does it mean?

b = -393

If an alligator is 0 inches long, its predicted weight is -393 lbs. This makes no sense!


c. Describe the relationship between weight and length of alligators.

As the length increases, their weight increases.


d. What is the predicted weight for an alligator 90 inches long?

Predicted Weight = -393 + 5.9(90)

= -393 + 531

= 138 lbs


Slope formula:

$\hat{y} = b_0 + b_1 x$

$b_1 = r \frac{s_y}{s_x}$

Find slope first!

Our slope is always in units of y per unit of x


Y-intercept formula:

$\hat{y} = b_0 + b_1 x$

$b_0 = \bar{y} - b_1 \bar{x}$

Our intercept is always in units of y


Fat Versus Protein

• The regression line for the Burger King data fits the data well:

– The equation is predicted fat = 6.8 + 0.97(protein)

• The predicted fat content for a BK Broiler chicken sandwich (with 30 g of protein) is 6.8 + 0.97(30) = 35.9 grams of fat.


Example #9: Is there a relationship between wine consumption (in liters) and yearly deaths from heart disease (deaths per 100,000)? Here are the summary statistics:

Mean wine consumption: 3,026
SD of wine consumption: 2,510
Mean deaths from heart disease: 191,053
SD of heart disease deaths: 68,396

Correlation coefficient between wine consumption and yearly deaths from heart disease = -0.0843

a. Interpret the value of the correlation coefficient in the context of the problem.

Negative association: as wine consumption increases, yearly deaths from heart disease tend to decrease.


b. Calculate the least-squares regression line predicting death rate from wine consumption.

Slope: $b = r \frac{S_y}{S_x} = -0.0843 \left( \frac{68{,}396}{2{,}510} \right) = -2.2971$

Y-intercept: $a = \bar{y} - b\bar{x} = 191{,}053 - (-2.2971 \times 3{,}026) = 198{,}004.0991$

$\hat{y} = 198{,}004.0991 - 2.2971x$


c. Use your line to predict death rate for an average adult who consumes 4 liters of wine.

$\hat{y} = 198{,}004.0991 - 2.2971x$

$\hat{y} = 198{,}004.0991 - 2.2971(4)$

$\hat{y} = 197{,}994.9107$


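Example #10: Consider n pairs of numbers. Suppose $\bar{x} = 4$, $S_x = 3$, $\bar{y} = 2$, and $S_y = 5$.

Of the following, which could be the least squares regression line?
(A) y = 2 + x
(B) y = -6 + 2x
(C) y = -10 + 3x
(D) y = 5/3 – x
(E) y = 6 – x

Slope: $b = r \frac{S_y}{S_x} = \frac{5}{3} r$

r can be between -1 and 1, so the slope is between $-\frac{5}{3} \le b \le \frac{5}{3}$. This rules out (B) and (C).

Passes through $(\bar{x}, \bar{y}) = (4, 2)$:

(A) 2 = 2 + 4? No: 2 ≠ 6

(D) 2 = 5/3 - 4? No: 2 ≠ -2.33

(E) 2 = 6 - 4? Yes: 2 = 2

Only (E) could be the least squares regression line.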
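The same two checks can be run mechanically. In this sketch each answer choice is stored as an (intercept, slope) pair; the helper name `could_be_lsrl` is made up for illustration.

```python
x_bar, s_x, y_bar, s_y = 4, 3, 2, 5

# Since -1 <= r <= 1, the slope b = r * s_y / s_x cannot exceed s_y / s_x in size.
max_slope = s_y / s_x

# Answer choices as (intercept a, slope b) for y = a + b*x.
candidates = {
    "A": (2, 1),
    "B": (-6, 2),
    "C": (-10, 3),
    "D": (5 / 3, -1),
    "E": (6, -1),
}

def could_be_lsrl(a, b):
    """Check the slope bound and that the line passes through (x-bar, y-bar)."""
    slope_ok = abs(b) <= max_slope
    passes_through_means = abs((a + b * x_bar) - y_bar) < 1e-9
    return slope_ok and passes_through_means

possible = [label for label, (a, b) in candidates.items() if could_be_lsrl(a, b)]
print(possible)  # ['E']
```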


Calculator Tip: LSRL

L1: Explanatory Variable

L2: Response Variable

Stat-calc-LinReg(a+bx), L1, L2, vars/y-vars/Function/Y1

[Screenshots: just the line; line graphed]


Calculator Tip: Tables

2nd – window, then 2nd - graph


Example #11: It's easy to measure the circumference of a tree's trunk, but not so easy to measure its height. Foresters need to develop a model for ponderosa pines that they use to predict the tree's height (in feet) from the circumference of its trunk (in inches):

Trunk Diameter

8 9 7 6 13 7 11 12

Tree Height

35 49 27 33 60 21 45 51

a. Make a scatterplot of the data and find the LSRL. Define any variables used in this equation.

$\hat{y} = -1.31467 + 4.54133x$

Where x = trunk diameter and $\hat{y}$ = predicted tree height

b. How strong of an association is there?

Strong, positive correlation, r = 0.89

c. They need to cut a tree down that is 10 inches in diameter. What is the predicted height of the tree?

$\hat{y} = -1.31467 + 4.54133x$

$\hat{y} = -1.31467 + 4.54133(10)$

$\hat{y} = 44.10$ ft

d. Oops! When they cut it down, it was actually 50 ft tall. How much were they off?

The tree was 5.9 ft taller than predicted (50 - 44.10 = 5.9 ft).
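For anyone without the calculator handy, the fit for Example #11 can be reproduced from the raw data. This is a sketch, not the calculator's own routine; `least_squares` is a helper name made up here, and the variable names follow the table's "Trunk Diameter" label.

```python
from math import sqrt

# Data from the Example #11 table.
diameter = [8, 9, 7, 6, 13, 7, 11, 12]     # inches
height = [35, 49, 27, 33, 60, 21, 45, 51]  # feet

def least_squares(xs, ys):
    """Return intercept a, slope b, and correlation r for y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx
    a = y_bar - b * x_bar
    r = sxy / sqrt(sxx * syy)
    return a, b, r

a, b, r = least_squares(diameter, height)
predicted = a + b * 10     # part c: predicted height at 10 inches
residual = 50 - predicted  # part d: observed minus predicted
print(round(a, 5), round(b, 5))                # -1.31467 4.54133
print(round(predicted, 2), round(residual, 2))
```

The printed intercept and slope match the LSRL above; the prediction and the gap from the observed 50 ft agree with parts c and d after rounding.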


Residual: How close is the data to the line?

Residual = observed y – predicted y

$\text{residual} = y - \hat{y}$


• The linear model assumes that the relationship between the two variables is a perfect straight line. The residuals are the part of the data that hasn’t been modeled.

Data = Model + Residual

or (equivalently)

Residual = Data – Model

[Figure: the residual shown on the scatterplot for the 50 ft tree]


• A negative residual means the predicted value’s too big (an overestimate).

• A positive residual means the predicted value’s too small (an underestimate).

• In the figure, the estimated fat of the BK Broiler chicken sandwich is 36 g, while the true value of fat is 25 g, so the residual is –11 g of fat.


• Some residuals are positive, others are negative, and, on average, they cancel each other out.

• Similar to what we did with deviations, we square the residuals and add the squares.

• The smaller the sum, the better the fit.

• The line of best fit is the line for which the sum of the squared residuals is smallest, the least squares line.
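To see the "smallest sum of squared residuals" idea in action, the sketch below fits a line to five made-up points and compares its sum of squared residuals with two nearby lines. The data and the helper names are hypothetical, chosen only to keep the arithmetic small.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - b * x_bar, b

def sum_squared_residuals(xs, ys, a, b):
    """Add up (observed - predicted)^2 over all points."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)  # a = 2.2, b = 0.6 for these points
residual_total = sum(y - (a + b * x) for x, y in zip(xs, ys))  # cancels to ~0

best = sum_squared_residuals(xs, ys, a, b)           # about 2.4
shifted = sum_squared_residuals(xs, ys, a + 0.5, b)  # same slope, moved up
steeper = sum_squared_residuals(xs, ys, a, b + 0.2)  # same intercept, steeper
print(best, shifted, steeper)  # the least squares line gives the smallest value
```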


Residual Plot: A plot that shows the residuals for all the data. A good line has no pattern in the residual plot.


Calculator Tip: Residual Plot

1. Calculate the LSRL

2. Graph L1 and RESID (in list)


Example of linear residual plots


Example of curved residual plots

Not a linear model, curved


Example of fanning residual plots

Less accurate for larger x values (fanning)


Remember BK?


Example #12: Graph the residual plot of #2 and comment on what the graph tells you.


Slight curve, might not be a linear model, one unusual point


Reading Computer Output:

Predictor     Coef            StDev   T   P
Constant      (y-intercept)
x-variable    (slope)

S =     R-Sq = (r²)     R-Sq(adj) =


Example #13: The number of students taking AP Statistics at a high school during the years of 2000-2007 is fitted with a least squares regression line. The graph of the residuals and some computer output are as follows.

How many students took AP Statistics in the year 2003?

Dependent variable is: Students

Variable   Coeff     s.e.     t      p
Constant   11        6.299    1.75   0.1313
Years      13.9286   1.0506   9.25   0.0001

s = 9.758   R-sq = 93.4%   R-sq(adj) = 92.4%

# Students = 11 + 13.9286(Year)

# Students = 11 + 13.9286(Year), with Year = 3 for 2003

# Students = 11 + 13.9286(3)

# Students = 11 + 41.7858

# Students = 52.7858

Residual = actual – predicted (the residual plot shows a residual of 5 for 2003)

5 = actual – 52.7858

57.7858 = actual

About 58 students took AP stats in 2003


Example #14: An important factor in the amount of gasoline a car uses is the size of the engine. Called “displacement”, engine size measures the volume of the cylinders in cubic inches. The regression analysis is shown.

Dependent variable is: MPG
89 total cases of which 0 are missing
R-squared = 60.9%   R-squared (adjusted) = 60%
s = 3.056 with 89 – 2 = 87 degrees of freedom

Variable        Coefficient   s.e. of Coeff   t-ratio   prob
Constant        34.9799       1.231           28.4      0.0001
Eng. Displcmt   -0.066196     0.0077          -8.64     0.0001

A car you are thinking of buying is available in two different size engines, 190 cubic inches or 240 cubic inches. How much difference might this make in your gas mileage?

240 – 190 = 50

50(-0.066196) = -3.3098

About 3 miles less per gallon


Standard Deviation of the residuals:

Used to measure the prediction error of the line

$s = \sqrt{\frac{\sum \text{residuals}^2}{n - 2}}$
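The formula can be sketched as a short function. The five points and the line used here are hypothetical (a = 2.2 and b = 0.6 happen to be the least squares line for these points), and `residual_sd` is a name made up for this sketch.

```python
from math import sqrt

def residual_sd(xs, ys, a, b):
    """s = sqrt(sum of squared residuals / (n - 2)) for the line y-hat = a + b*x."""
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return sqrt(sum(e ** 2 for e in residuals) / (len(xs) - 2))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

s = residual_sd(xs, ys, a=2.2, b=0.6)
print(round(s, 4))  # 0.8944, since the squared residuals sum to 2.4 and n - 2 = 3
```

The divisor is n - 2 rather than n - 1 because two quantities (slope and intercept) were estimated from the data.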


The Residual Standard Deviation

• The standard deviation of the residuals measures how much the points spread around the regression line.

• Check to make sure the residual plot has about the same amount of scatter throughout.


• We don’t need to subtract the mean because the mean of the residuals = 0

• Make a histogram or normal probability plot of the residuals. It should look unimodal and roughly symmetric.

• Then we can apply the 68-95-99.7 Rule to see how well the regression model describes the data.


• The variation in the residuals is the key to assessing how well the model fits.

• In the BK menu items example, total fat has a standard deviation of 16.4 grams. The standard deviation of the residuals is 9.2 grams.


http://bcs.whfreeman.com/tps3e/

Two-variable Statistical Calculator

Exercise


3.2 & 3.3 – Coefficient of Determination, Lurking Variables


Coefficient of Determination: (r²)

How much of the variation in y is explained by x


Assessing the Predictive Power of the Equation:

1. Coefficient of Determination: r² = the correlation coefficient, squared

2. It is the fraction (or percent) of the variation in the values of y that is explained by the least-squares regression of y on x.

3. The closer r² is to 1, the better the regression line describes the connection between x and y – in particular, predictions made with the equation will be more accurate.
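One way to see why r² is "the fraction of variation explained": compare the variation left over after the fit (squared residuals) with the total variation in y. The sketch below does this for five hypothetical points and checks that the result equals the correlation squared; the function name `fit_summary` is made up here.

```python
from math import sqrt

def fit_summary(xs, ys):
    """Return (r, fraction of variation in y explained by the least squares line)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)  # total variation in y
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    r = sxy / sqrt(sxx * syy)
    explained = 1 - sse / syy
    return r, explained

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r, explained = fit_summary(xs, ys)
print(round(r ** 2, 4), round(explained, 4))  # 0.6 0.6
```

For these points, 60% of the variation in y is explained by the line and 40% is left in the residuals.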


Example #15: The correlation between alcohol and yearly deaths from heart disease was -0.843. What percent of the variation in the yearly deaths from heart disease can be explained by the regression of yearly deaths on alcohol consumption?

r = -0.843

r² = 0.710649

71% of the variation in yearly deaths from heart disease can be explained by the regression on alcohol consumption.


Example #16: Is there a linear relationship between marijuana consumption and other drug usage? For this regression, the percent of variability in other drug usage explained by the regression of other drugs on marijuana use is 66.5%. What is the correlation coefficient?

r² = 0.665

r = $\sqrt{0.665}$ = 0.815475

Moderately strong, positive relationship


Example #17: Fast Food Sandwiches: The mean serving size for fast food sandwiches is 7.557 ounces with a standard deviation of 2.008 ounces. The mean number of calories per sandwich is 446.9 with a standard deviation of 143. The correlation between serving size and calories is 0.849.

a. Calculate the LSRL.

Slope: $b = r \frac{S_y}{S_x} = 0.849 \left( \frac{143}{2.008} \right) = 60.46165339$

Y-intercept: $a = \bar{y} - b\bar{x} = 446.9 - (60.46165339 \times 7.557) = -10.00871464$

$\hat{y} = -10.0087 + 60.4617x$

$\hat{y}$ is the predicted number of calories and x is the serving size.

b. What percent of the variability in calories is explained by the least squares line with serving size?

r² = 0.849² = 0.720801

72% of the variability in calories is explained by serving size

c. Use this regression line to predict the average number of calories in a 35-ounce serving. Explain if the least squares line would be appropriate to use in this situation.

$\hat{y} = -10.0087 + 60.4617x$

$\hat{y} = -10.0087 + 60.4617(35)$

$\hat{y} = 2106.1508$

No. This is extrapolation: a 35-ounce serving is far outside the range of typical values (mean 7.557 ounces, standard deviation 2.008 ounces).
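All three parts of Example #17 can be checked from the summary statistics alone. In this sketch the variable names are made up; the "how many standard deviations from the mean" check is just one rough way to flag extrapolation when the raw data range is not given, not a rule stated on the slides.

```python
# Summary statistics from Example #17.
r = 0.849
x_bar, s_x = 7.557, 2.008  # serving size (ounces)
y_bar, s_y = 446.9, 143    # calories

b = r * s_y / s_x      # part a: slope
a = y_bar - b * x_bar  # part a: y-intercept
r_squared = r ** 2     # part b

def predict_calories(serving_oz):
    """Predicted calories from the LSRL y-hat = a + b*x."""
    return a + b * serving_oz

prediction = predict_calories(35)  # part c
z_of_35 = (35 - x_bar) / s_x       # how unusual is a 35-ounce serving?

print(round(a, 4), round(b, 4), round(r_squared, 6))
print(round(prediction, 1), round(z_of_35, 1))
```

A serving more than 13 standard deviations above the mean is nowhere near the data used to fit the line. The prediction differs from the slide's 2106.1508 in the third decimal only because the slide rounds the coefficients before plugging in.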


Example #18: Find the coefficient of determination and correlation coefficient for #13 and explain its meaning.

Dependent variable is: Students

Variable   Coeff     s.e.     t      p
Constant   11        6.299    1.75   0.1313
Years      13.9286   1.0506   9.25   0.0001

s = 9.758   R-sq = 93.4%   R-sq(adj) = 92.4%

93.4% of the variation in the number of students that take AP Stats is explained by the year.

r = $\sqrt{0.934}$ = 0.9664. Strong, positive association between the number of AP stats students and the year.


Cautions in Making Predictions with Regression Lines:

1. If the correlation is not strong, predictions will not be accurate.

2. Extrapolation: Do not make predictions outside of the range for which you have data.

3. Correlation simply does not imply causation

• The correlation may be a coincidence

• Both correlated variables might be directly influenced by some common underlying cause


Lurking Variables:

A lurking variable is a variable that is not among the explanatory or response variables, but influences the interpretation of the relationship.

[Diagrams: causation (X → Y); lurking variable (Z → X and Z → Y), where Z = lurking variable]


Are you looking hard enough?


Example #19: There is a positive correlation between the number of deaths by drowning and the number of ice cream cones sold. Is this evidence that people are not heeding the old advice to wait 2 hours after eating before swimming and are paying the price for it?

No! Summer is the lurking variable


Example #20: Smoke Causes Coughs: A strong relationship is found between weekly sales of firewood and weekly sales of cough drops from September to March. Can we conclude that smoke from the fires causes coughs?

No! Winter is the lurking variable


Outlier: Observation away from the other data points

Influential Point:

Observation that drastically changes the LSRL
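A small sketch of what "drastically changes the LSRL" can look like: fit the slope with and without one point that sits far out in the x direction. The data are hypothetical and `slope` is a helper name made up for this illustration.

```python
def slope(xs, ys):
    """Least squares slope b for y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope_without = slope(xs, ys)             # positive slope (0.6)
slope_with = slope(xs + [15], ys + [2])   # add one far-out point at (15, 2)
print(round(slope_without, 3), round(slope_with, 3))
```

Adding the single point at x = 15 flips the slope from positive to negative, so that point is influential for this data set.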


• The following scatterplot shows that something was awry in Palm Beach County, Florida, during the 2000 presidential election…


• The red line shows the effects that one unusual point can have on a regression:


• The extraordinarily large shoe size gives the data point high leverage. Wherever the IQ is, the line will follow!


http://bcs.whfreeman.com/tps3e/

Two-variable Statistical Calculator

Outlier vs. Influential