Exploring Relationships, Lesson 1: Correlation

Notes 06 - Correlation


Brief discussion on correlation


Chapter Three Numerically Summarizing Data

Exploring Relationships

Lesson 1

Correlation

Lesson 1: Correlation
Bivariate Data

Bivariate data is data in which two variables are measured on an individual.

The response variable is the variable whose value can be explained or determined based upon the value of the predictor variable.

A lurking variable is one that is related to the response and/or predictor variable, but is excluded from the analysis.

Unit 2: Probability Distributionstzf

Scatter Diagrams

A scatter diagram shows the relationship between two quantitative variables measured on the same individual.

The value of the predictor is read on the horizontal axis and the response variable on the vertical axis.

Each individual in the data set is represented by a point in the scatter diagram.

Do not connect the points when drawing a scatter diagram.

Example 1: Drawing a Scatter Diagram

P 202, #16. An engineer wanted to determine how the weight of a car affected the gas mileage. The data represent the weight of various domestic cars and their city mileage rating (in mpg) for the 2001 model year.

Weight (pounds)   Miles per gallon
3565              19
3440              20
3970              17
3305              19
3340              20
3200              20
3230              19
2560              28
2520              28
3065              20
3600              18
3300              19
3625              19
3590              19
2605              23
2370              28

(a) Determine which is the likely predictor variable and which is the likely response variable.

Predictor variable: weight
Response variable: mileage

(b) Draw a scatter diagram.

[Figure: scatter diagram "Weight vs. Mileage", with Weight (lbs) from 2000 to 4000 on the horizontal axis and City Mileage (MPG) from 15 to 30 on the vertical axis.]
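The scatter diagram for the car data can be roughed out in plain text. The sketch below is only an illustration, not the plotting method used in the lesson: the helper name text_scatter and the grid size are assumptions, and the data are the 16 cars from Example 1. The predictor (weight) runs along the horizontal axis and the response (mileage) up the vertical axis, and the points are not connected.

```python
# Car data from Example 1: weight in pounds, city mileage in mpg.
weights = [3565, 3440, 3970, 3305, 3340, 3200, 3230, 2560,
           2520, 3065, 3600, 3300, 3625, 3590, 2605, 2370]
mpg = [19, 20, 17, 19, 20, 20, 19, 28,
       28, 20, 18, 19, 19, 19, 23, 28]


def text_scatter(xs, ys, width=40, height=12):
    """Return a rough character-grid scatter diagram (hypothetical helper).

    Each individual becomes one '*'; x is scaled to a column and y to a row.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


print(text_scatter(weights, mpg))
```

The lightest car lands in the top-left corner and the heaviest in the bottom-right, which is the downward drift the slide's figure shows.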

Relationships Between Two Variables

Scatter diagrams reveal the type of relationship or trend that exists between two variables.

[Figure: five example scatter diagrams, labeled Linear (Increasing), Linear (Decreasing), Nonlinear, Nonlinear, and No trend.]

Example 2: Identifying the Trend

P 199, #1-4. Determine whether the relationship between the variables is linear or non-linear. If linear, indicate whether there is a positive or negative trend.

[Figure: four scatter diagrams, numbered 1 to 4, carrying the answer labels Nonlinear; Linear, Negative; Linear, Positive; Nonlinear.]

Positive Linear Relationships

Two variables that are linearly related are said to be positively associated when above-average values of one variable are associated with above-average values of the corresponding variable.

That is, two variables are positively associated if, as the values of the predictor variable increase, the values of the response variable also increase.

[Figure: scatter diagram on x and y axes, divided into quadrants I, II, III, and IV.]

Negative Linear Relationships

Two variables that are linearly related are said to be negatively associated when above-average values of one variable are associated with below-average values of the corresponding variable.

That is, two variables are negatively associated if, as the values of the predictor variable increase, the values of the response variable decrease.

[Figure: scatter diagram on x and y axes, divided into quadrants I, II, III, and IV.]
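One way to see the "above average / below average" idea is to split the scatter diagram at the two means and count the points in each quadrant. The sketch below assumes the usual quadrant numbering (I upper right, II upper left, III lower left, IV lower right) with the dividing lines drawn at the mean of x and the mean of y; the function name quadrant_counts is hypothetical, and the data are the cars from Example 1.

```python
from statistics import mean

# Car data from Example 1: weight in pounds, city mileage in mpg.
weights = [3565, 3440, 3970, 3305, 3340, 3200, 3230, 2560,
           2520, 3065, 3600, 3300, 3625, 3590, 2605, 2370]
mpg = [19, 20, 17, 19, 20, 20, 19, 28,
       28, 20, 18, 19, 19, 19, 23, 28]


def quadrant_counts(xs, ys):
    """Count points in each quadrant around (mean of x, mean of y)."""
    x_bar, y_bar = mean(xs), mean(ys)
    counts = {"I": 0, "II": 0, "III": 0, "IV": 0}
    for x, y in zip(xs, ys):
        if x == x_bar or y == y_bar:
            continue  # a point exactly on a dividing line belongs to no quadrant
        if x > x_bar and y > y_bar:
            counts["I"] += 1    # above average in both
        elif x < x_bar and y > y_bar:
            counts["II"] += 1   # below-average x, above-average y
        elif x < x_bar and y < y_bar:
            counts["III"] += 1  # below average in both
        else:
            counts["IV"] += 1   # above-average x, below-average y
    return counts


print(quadrant_counts(weights, mpg))
# → {'I': 0, 'II': 4, 'III': 2, 'IV': 10}
```

Points in quadrants I and III support a positive association, and points in II and IV support a negative one. Here 14 of the 16 cars fall in II or IV, which fits a negative association between weight and mileage.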

Measuring the Strength of the Linear Relationship

The linear correlation coefficient (or Pearson product moment correlation coefficient) is a measure of the strength of linear relation between two quantitative variables.

We use the Greek letter ρ (rho) to represent the population correlation coefficient and r to represent the sample correlation coefficient.

We shall only present the formula for the sample correlation coefficient:

\[ r = \frac{\sum \left( \dfrac{x_i - \bar{x}}{s_x} \right) \left( \dfrac{y_i - \bar{y}}{s_y} \right)}{n - 1} \]

The correlation coefficient is a unitless measure of association. The units of measure for x and y play no role in the interpretation of r.
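The formula above can be sketched directly in code: standardize each x and each y, multiply the pairs of z-scores, add them up, and divide by n - 1. The function names below are illustrative, not part of the lesson. As a check on the "unitless" claim, the sketch also converts the Example 1 weights from pounds to kilograms (using 1 lb = 0.45359237 kg) and recomputes r.

```python
from math import sqrt


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    m = sum(values) / n
    return sqrt(sum((v - m) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """Sample correlation r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


# Car data from Example 1: weight in pounds, city mileage in mpg.
weights_lb = [3565, 3440, 3970, 3305, 3340, 3200, 3230, 2560,
              2520, 3065, 3600, 3300, 3625, 3590, 2605, 2370]
mpg = [19, 20, 17, 19, 20, 20, 19, 28,
       28, 20, 18, 19, 19, 19, 23, 28]
weights_kg = [w * 0.45359237 for w in weights_lb]  # same cars, different unit

r_lb = correlation(weights_lb, mpg)
r_kg = correlation(weights_kg, mpg)
print(round(r_lb, 4), round(r_kg, 4))
```

The two printed values agree, because rescaling x rescales both the deviations and the standard deviation by the same factor, so the z-scores are unchanged.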

Properties of the Linear Correlation Coefficient

The linear correlation coefficient is always between -1 and 1.

If r = +1, there is a perfect positive linear relation between the two variables.

The closer r is to +1, the stronger the evidence of positive association between the two variables.

[Figure: example scatter diagrams for r = 1, r ≈ 0.9, and r ≈ 0.4.]

If r = -1, there is a perfect negative linear relation between the two variables.

The closer r is to -1, the stronger the evidence of negative association between the two variables.

[Figure: example scatter diagrams for r = -1, r ≈ -0.9, and r ≈ -0.4.]

If r is close to 0, there is little or no linear relation between the two variables.

[Figure: two example scatter diagrams, one labeled "r ≈ 0, no relationship" and one labeled "r ≈ 0, nonlinear relationship".]
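These properties can be checked on small made-up data sets. The values below are hypothetical, chosen only so the arithmetic is exact: one set lies on a rising line, one on a falling line, and one on a symmetric parabola. The correlation helper is an illustrative implementation based on sums of deviations.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation r computed from sums of deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


xs = [-3, -2, -1, 0, 1, 2, 3]           # hypothetical predictor values
rising = [2 * x + 1 for x in xs]        # points exactly on a rising line
falling = [4 - 0.5 * x for x in xs]     # points exactly on a falling line
parabola = [x * x for x in xs]          # strong but nonlinear pattern

print(correlation(xs, rising))    # perfect positive linear relation
print(correlation(xs, falling))   # perfect negative linear relation
print(correlation(xs, parabola))  # no linear relation, despite a clear pattern
```

The parabola case is the one to remember: y is completely determined by x, yet r is 0, because r measures only the linear part of a relationship.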

Example 3: Estimating Correlation from a Scatter Plot

P 200, #6. Match the correlation coefficient to the scatter diagram.

(a) r = 0.969
(b) r = 0.049
(c) r = 1
(d) r = 0.992

[Figure: four scatter diagrams, labeled in order with the answers (c) r = 1, (d) r = 0.992, (b) r = 0.049, and (a) r = 0.969.]

Example 4: Anticipating Correlation

P 205, #27. For each of the following statements, state whether you think the variables will have a positive correlation, negative correlation, or no correlation.

(a) Number of children in the household under the age of 3 and expenditures on diapers.
Positive correlation

(b) Interest rates on car loans and the number of cars sold.
Negative correlation

(c) Number of hours per week on the treadmill and cholesterol level.
Negative correlation

(d) Price of a Big Mac and the number of McDonald's french fries sold in a week.
Negative correlation

(e) Shoe size and IQ.
No correlation

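Calculating the Correlation Coefficient

A more efficient formula for computing the correlation coefficient is

\[ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \]

where

\[ S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left( \sum x_i \right)^2}{n} \]

\[ S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{\left( \sum y_i \right)^2}{n} \]

\[ S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\left( \sum x_i \right) \left( \sum y_i \right)}{n} \]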
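The slides state this computing formula without derivation. A short sketch of why it agrees with the z-score definition of r, assuming the usual sample standard deviations written in terms of the sums of squares, runs as follows.

```latex
% Sample standard deviations in terms of the sums of squares:
s_x = \sqrt{\frac{S_{xx}}{n - 1}}, \qquad s_y = \sqrt{\frac{S_{yy}}{n - 1}}

% Substituting into the definition of r:
r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}
  = \frac{S_{xy}}{(n - 1) \sqrt{\dfrac{S_{xx}}{n - 1}} \sqrt{\dfrac{S_{yy}}{n - 1}}}
  = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}

% The shortcut for S_xx follows by expanding the square and using \sum x_i = n \bar{x}:
\sum (x_i - \bar{x})^2 = \sum x_i^2 - 2\bar{x} \sum x_i + n\bar{x}^2
  = \sum x_i^2 - \frac{\left( \sum x_i \right)^2}{n}
```

The same expansion gives the shortcuts for S_yy and S_xy, so only the five column totals (sums of x, y, x squared, y squared, and xy) are needed.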

Example 5: Computing a Correlation

P 200, #8. Given the data:

x    y
2    5.7
3    5.2
5    2.8
6    1.9
6    2.2

(a) Draw a scatter diagram.

[Figure: scatter diagram of the data, with x from 1 to 6 on the horizontal axis and y from 0 to 6 on the vertical axis.]

(b) Compute the correlation coefficient.

Compute x², y², and xy. Sum all columns. Calculate S_xx, S_yy, and S_xy.

x     y      x²    y²      xy
2     5.7    4     32.49   11.4
3     5.2    9     27.04   15.6
5     2.8    25    7.84    14.0
6     1.9    36    3.61    11.4
6     2.2    36    4.84    13.2
22    17.8   110   75.82   65.6   (column totals)

\[ S_{xx} = 110 - \frac{22^2}{5} = 13.2 \]

\[ S_{yy} = 75.82 - \frac{17.8^2}{5} = 12.452 \]

\[ S_{xy} = 65.6 - \frac{(22)(17.8)}{5} = -12.72 \]

Calculate the correlation:

\[ r = \frac{-12.72}{\sqrt{(13.2)(12.452)}} = -0.99 \]
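The hand computation can be double-checked with a few lines of code that follow the same steps: build the column totals, then apply the computing formulas for S_xx, S_yy, and S_xy. This is only a sketch for checking the arithmetic; the variable names are illustrative.

```python
from math import sqrt

# Data from Example 5.
x = [2, 3, 5, 6, 6]
y = [5.7, 5.2, 2.8, 1.9, 2.2]
n = len(x)

# Column totals, as in the worked table.
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Computing formulas for the sums of squares and cross products.
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

r = s_xy / sqrt(s_xx * s_yy)
print(round(s_xx, 3), round(s_yy, 3), round(s_xy, 3), round(r, 2))
# → 13.2 12.452 -12.72 -0.99
```

Note the sign: S_xy is negative because the sum of the products xy (65.6) is smaller than (22)(17.8)/5 = 78.32, and that sign carries through to r.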

(c) Comment on the relationship between x and y.

The correlation coefficient indicates there is a strong negative linear relationship between x and y.

Example 6: Weight vs. Mileage Rating

P 202, #16 (continued from Example 1). The data represent the weight of various domestic cars and their city mileage rating (in mpg) for the 2001 model year.

(c) What type of relation appears to exist between the weight of a car and its city mileage rating?

There is a negative linear relationship between weight and mileage.

[Figure: scatter diagram "Weight vs. Mileage", with Weight (lbs) on the horizontal axis and City Mileage (MPG) on the vertical axis.]

(d) Compute the linear correlation coefficient between the weight of a car and its city mileage rating.

r = -0.92

Correlation & Causation

A word of caution when interpreting the correlation coefficient:

A linear correlation coefficient that implies a strong positive or negative association that is computed using observational data does not imply causation among the variables.

The predictor and response variables may both be determined by an unknown lurking variable.

If data are obtained through a controlled experiment, then a strong linear correlation also implies causation.
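A small simulation can make the lurking-variable warning concrete. Everything in this sketch is hypothetical: a made-up lurking variable drives both x and y, while x and y have no direct effect on each other. The seed, sample size, and noise levels are arbitrary choices for illustration.

```python
import random
from math import sqrt


def correlation(xs, ys):
    """Sample correlation r computed from sums of deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


random.seed(1)  # arbitrary seed so the run is repeatable
n = 2000

# Hypothetical lurking variable that drives both x and y.
lurking = [random.gauss(0, 1) for _ in range(n)]
x = [z + random.gauss(0, 0.5) for z in lurking]
y = [z + random.gauss(0, 0.5) for z in lurking]

r_xy = correlation(x, y)  # strong, although x never influences y

# Remove the lurking variable's contribution from each variable.
x_adj = [a - z for a, z in zip(x, lurking)]
y_adj = [b - z for b, z in zip(y, lurking)]
r_adj = correlation(x_adj, y_adj)  # close to zero

print(round(r_xy, 2), round(r_adj, 2))
```

In this setup x and y are strongly correlated only because both follow the lurking variable. Once its contribution is subtracted, what remains is independent noise and the correlation is close to zero.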

Example 7: Brain Size and Intelligence

P 203, #21. Researchers interested in whether a person's brain size is related to mental capacity selected a sample of 20 students who had SAT scores higher than 1350 and administered an IQ test. Brain size was determined by an MRI scan.

Gender   MRI Count   IQ
Female   816932      133
Female   951545      137
Female   991305      138
Female   833868      132
Female   856472      140
Female   852244      132
Female   790619      135
Female   866662      130
Female   857782      133
Female   948066      133
Male     949395      140
Male     1001121     140
Male     1038437     139
Male     965353      133
Male     955466      133
Male     1079549     141
Male     924059      135
Male     955003      139
Male     935494      141
Male     949589      144

(a) Use the TI-83 to draw a scatter diagram treating MRI count as the predictor variable and IQ as the response variable.

(b) Use the TI-83 to compute the correlation coefficient between the MRI count and IQ. Do they appear to be linearly related?

(c) Gender is a lurking variable in the analysis. Draw separate scatter diagrams for each gender. What do you notice?

(d) Calculate the correlation coefficient separately for males and females. Do you still believe that MRI count and IQ are linearly related?
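The slides ask for these values on a TI-83. As an alternative sketch, the same correlations can be computed in a few lines of code, assuming the MRI counts and IQ scores are paired in the order listed in the data table. The correlation helper and variable names are illustrative.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation r computed from sums of deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Data from Example 7, paired in the order given in the table.
female_mri = [816932, 951545, 991305, 833868, 856472,
              852244, 790619, 866662, 857782, 948066]
female_iq = [133, 137, 138, 132, 140, 132, 135, 130, 133, 133]
male_mri = [949395, 1001121, 1038437, 965353, 955466,
            1079549, 924059, 955003, 935494, 949589]
male_iq = [140, 140, 139, 133, 133, 141, 135, 139, 141, 144]

r_all = correlation(female_mri + male_mri, female_iq + male_iq)  # part (b)
r_female = correlation(female_mri, female_iq)                    # part (d)
r_male = correlation(male_mri, male_iq)                          # part (d)

print(round(r_all, 2), round(r_female, 2), round(r_male, 2))
```

With this pairing, the combined correlation comes out at roughly 0.55, while the within-gender values are noticeably weaker, roughly 0.36 for females and 0.24 for males. Part of the combined association reflects the difference between the two groups, which is the point of treating gender as a lurking variable.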