Page 1: Overview
Page 2: Overview

Overview

4.2 Introduction to Correlation

4.3 Introduction to Regression

Page 3: Overview

Scatterplots

Used to summarize the relationship between two quantitative variables that have been measured on the same element

Graph of points (x, y) each of which represents one observation from the data set

One of the variables is measured along the horizontal axis and is called the x variable

The other variable is measured along the vertical axis and is called the y variable

Page 4: Overview

Predictor Variable and Response Variable

The value of the x variable can be used to predict or estimate the value of the y variable

The x variable is referred to as the predictor variable

The y variable is called the response variable

Page 5: Overview

Scatterplot Terminology

Note the terminology in the caption to Figure 4.2.

When describing a scatterplot, always indicate the y variable first and use the term versus (vs.) or against the x variable.

This terminology reinforces the notion that the y variable depends on the x variable.

Page 6: Overview

FIGURE 4.2 Scatterplot of sales price versus square footage.

Page 7: Overview

Positive relationship

As the x variable increases in value, the y variable also tends to increase.

FIGURE 4.3 (a) Scatterplot of a positive relationship

Page 8: Overview

Negative relationship

As the x variable increases in value, the y variable tends to decrease.

FIGURE 4.3 (b) Scatterplot of a negative relationship

Page 9: Overview

No apparent relationship

As the x variable increases in value, the y variable tends to remain unchanged.

FIGURE 4.3 (c) Scatterplot of no apparent relationship

Page 10: Overview

4.2 Introduction to Correlation

Objective: By the end of this section, I will be able to…

1) Calculate and interpret the value of the correlation coefficient.

Page 11: Overview

Correlation Coefficient r

Measures the strength and direction of the linear relationship between two variables.

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} $$

s_x is the sample standard deviation of the x data values.

s_y is the sample standard deviation of the y data values.
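The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's worked solution: the function names `sample_std` and `correlation` are hypothetical, and the data are made up for the example (they are not the values from Table 4.11, which this transcript does not reproduce).

```python
from math import sqrt


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """Correlation coefficient r from the definition formula."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of the products of the deviations from the two means.
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * sample_std(xs) * sample_std(ys))


# Made-up illustrative data, NOT the temperatures in Table 4.11.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 4))
```

For these made-up points the deviations multiply out to 6 in the numerator and sqrt(60) in the denominator, so r is roughly 0.77.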

Page 12: Overview

Example 4.5 - Calculating the correlation coefficient r

Find the value of the correlation coefficient r for the temperature data in Table 4.11.

Table 4.11 High and low temperatures, in degrees Fahrenheit, of 10 American cities

Page 13: Overview

Interpreting the Correlation Coefficient r

1) Values of r close to 1 indicate a positive relationship between the two variables.

The variables are said to be positively correlated.

As x increases, y tends to increase as well.

Page 14: Overview

Interpreting the Correlation Coefficient r

2) Values of r close to -1 indicate a negative relationship between the two variables.

The variables are said to be negatively correlated.

As x increases, y tends to decrease.

Page 15: Overview

Interpreting the Correlation Coefficient r

3) Other values of r indicate the lack of either a positive or negative linear relationship between the two variables.

The variables are said to be uncorrelated.

As x increases, y tends to neither increase nor decrease linearly.

Page 16: Overview

Guidelines for Interpreting the Correlation Coefficient r

If the correlation coefficient between two variables is

greater than 0.7, the variables are positively correlated.

between 0.33 and 0.7, the variables are mildly positively correlated.

between –0.33 and 0.33, the variables are not correlated.

between –0.7 and –0.33, the variables are mildly negatively correlated.

less than –0.7, the variables are negatively correlated.
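These guidelines amount to a simple lookup, which could be sketched as follows. The function name `describe_correlation` is hypothetical, and the handling of values that fall exactly on a cutoff (0.33, 0.7, and their negatives) is an assumption, since the guidelines do not say which band the endpoints belong to.

```python
def describe_correlation(r):
    """Map a correlation coefficient to the guideline wording.

    Assumption: a value exactly on a cutoff is placed in the
    weaker band; the guidelines leave the endpoints unspecified.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r > 0.7:
        return "positively correlated"
    if r > 0.33:
        return "mildly positively correlated"
    if r >= -0.33:
        return "not correlated"
    if r >= -0.7:
        return "mildly negatively correlated"
    return "negatively correlated"


print(describe_correlation(0.9761))
```

The value 0.9761 is the one reported for the temperature data in Example 4.5, and it falls in the top band.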

Page 17: Overview

Example 4.6 - Interpreting the correlation coefficient

Interpret the correlation coefficient found in Example 4.5.

Page 18: Overview

Example 4.6 continued

Solution

In Example 4.5, we found the correlation coefficient for the relationship between high and low temperature to be r = 0.9761.

r = 0.9761 is very close to 1. We would therefore say that high and low temperatures for these 10 American cities are strongly positively correlated.

As low temperature increases, high temperature also tends to increase.

Page 19: Overview

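Equivalent Computational Formula for Calculating the Correlation Coefficient r

$$ r = \frac{\sum xy - \left(\sum x \sum y\right)/n}{\sqrt{\left[\sum x^2 - \left(\sum x\right)^2/n\right]\left[\sum y^2 - \left(\sum y\right)^2/n\right]}} $$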
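A short Python sketch of the computational formula, working only from the five sums it needs. The function name `correlation_computational` is hypothetical and the data are made up for illustration; they are not the Glen Ellyn values from Table 4.6.

```python
from math import sqrt


def correlation_computational(xs, ys):
    """Correlation coefficient r from the computational formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt(
        (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
    )
    return numerator / denominator


# Made-up illustrative data, NOT the values in Table 4.6.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_comp = correlation_computational(xs, ys)
print(round(r_comp, 4))
```

Here the sums are 15, 20, 66, 55 and 86, giving a numerator of 6 and a denominator of sqrt(10 × 6), which is the same value the definition formula produces for these points.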

Page 20: Overview

Example 4.7

Use the computational formula to calculate the correlation coefficient r for the relationship between square footage and sales price of the eight home lots for sale in Glen Ellyn from Table 4.6 (Example 4.3 in Section 4.1).

Page 21: Overview

Summary

Section 4.2 introduces the correlation coefficient r, a measure of the strength of linear association between two numeric variables.

Values of r close to 1 indicate that the variables are positively correlated.

Values of r close to –1 indicate that the variables are negatively correlated.

Values of r close to 0 indicate that the variables are not correlated.

Page 22: Overview

4.3 Introduction to Regression

Objectives: By the end of this section, I will be able to…

1) Calculate the value and understand the meaning of the slope and the y intercept of the regression line.

2) Predict values of y for given values of x.

Page 23: Overview

Equation of the Regression Line

Approximates the relationship between x and y

The equation is

$$ \hat{y} = b_0 + b_1 x $$

where the regression coefficients are the slope, b1, and the y intercept, b0.

The “hat” over the y (pronounced “y-hat”) indicates that this is an estimate of y and not necessarily an actual value of y.
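This transcript does not reproduce the formulas used to obtain b0 and b1. As a sketch under that caveat, the standard least-squares expressions (slope equal to the sum of cross-deviations divided by the sum of squared x-deviations, and intercept equal to ȳ minus b1 times x̄) can be coded as below. The function name `regression_coefficients` is hypothetical, the data are made up, and the textbook's own steps may be organized differently.

```python
def regression_coefficients(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x.

    Assumption: standard least-squares formulas, which the slides
    do not spell out.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # y intercept
    return b0, b1


# Made-up illustrative data, NOT the temperatures in Table 4.11.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = regression_coefficients(xs, ys)
print(round(b0, 4), round(b1, 4))
```

For these points the slope works out to 6/10 and the intercept to 4 minus 0.6 × 3.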

Page 24: Overview

Example 4.8 - Calculating the regression coefficients b0 and b1

Find the value of the regression coefficients b0 and b1 for the temperature data in Table 4.11.

Table 4.11 High and low temperatures, in degrees Fahrenheit, of 10 American cities

Page 25: Overview

Example 4.8 continued

Step 4:

Thus, the equation of the regression line for the temperature data is

$$ \hat{y} = 10.0533 + 0.9865x $$

Page 26: Overview

Example 4.8 continued

Since y and x represent high and low temperatures, respectively, this equation is read as follows:

“The estimated high temperature for an American city is 10.0533 degrees Fahrenheit plus 0.9865 times the low temperature for that city.”

Page 27: Overview

Using the Regression Equation to Make Predictions

For any particular value of x, the predicted value for y lies on the regression line.

Example 4.11

Suppose we are considering moving to a city that has a low temperature of 47 degrees Fahrenheit (ºF) on this particular winter’s day. What would the estimated high temperature be for this city?

Page 28: Overview

Example 4.11 continued

Solution

Plug the value of 47ºF for the variable low into the regression equation from Example 4.8:

$$ \hat{y} = 10.0533 + 0.9865(\text{low}) = 10.0533 + 0.9865(47) = 56.4188 $$

We would say: “The estimated high temperature for an American city with a low of 47ºF is 56.4188ºF.”
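The same substitution can be checked with a few lines of Python. The coefficients are the ones reported in Example 4.8; the names `B0`, `B1` and `predict_high` are hypothetical and chosen only for this sketch.

```python
# Regression coefficients reported in Example 4.8.
B0 = 10.0533  # y intercept
B1 = 0.9865   # slope


def predict_high(low):
    """Estimated high temperature (ºF) for a given low temperature (ºF)."""
    return B0 + B1 * low


estimate = predict_high(47)
print(round(estimate, 4))  # 56.4188
```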

Page 29: Overview

Interpreting the Slope

Relationship Between Slope and Correlation Coefficient

The slope b1 of the regression line and the correlation coefficient r always have the same sign.

b1 is positive if and only if r is positive.

b1 is negative if and only if r is negative.
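One way to see why the signs must agree, assuming the standard least-squares slope (an identity that is not stated in the slides themselves): the slope is the correlation coefficient rescaled by the ratio of the two sample standard deviations, and both standard deviations are positive whenever the data vary at all.

```latex
b_1 = r \cdot \frac{s_y}{s_x},
\qquad s_x > 0,\ s_y > 0
\;\Longrightarrow\;
\operatorname{sign}(b_1) = \operatorname{sign}(r)
```

By the same relation, b1 is zero exactly when r is zero.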