
AP Statistics Chapter 7: Scatterplots, Association, and Correlation
Concord High School, RNBriones

Starter Ch. 7
DO: Just Checking 1-5, p. 153
• Association/Direction: positive, negative, no direction
• Form: linear, curved, clusters, no pattern
• Strength: how closely the points fit the “form”
• Outliers: deviations from the pattern

Correlation Conditions
• Quantitative Variables Condition:
  - Correlation applies only to quantitative variables.
  - Don’t apply correlation to categorical data masquerading as quantitative.
  - Check that you know the variables’ units and what they measure.

Correlation Conditions (continued)
• Straight Enough Condition:
  - You can calculate a correlation coefficient for any pair of variables.
  - But correlation measures the strength only of the linear association, and will be misleading if the relationship is not linear.
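A minimal sketch of why the Straight Enough Condition matters. The data are hypothetical (not from the slides), and `pearson_r` is an illustrative helper that uses the standard sums-of-squares form of the correlation coefficient. The curved data follow y = x² exactly, yet the correlation comes out at essentially zero.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y is determined by x, but along a curve (y = x^2).
x_values = [-3, -2, -1, 0, 1, 2, 3]
y_curved = [x ** 2 for x in x_values]

# Hypothetical data: y is determined by x along a line (y = 2x + 1).
y_linear = [2 * x + 1 for x in x_values]

r_curved = pearson_r(x_values, y_curved)
r_linear = pearson_r(x_values, y_linear)

print(f"curved relationship: r = {r_curved:.3f}")
print(f"linear relationship: r = {r_linear:.3f}")
```

Both relationships are perfectly predictable, but only the linear one is picked up by r, which is why the scatterplot should be checked first.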

Correlation Conditions (continued)
• Outlier Condition:
  - Outliers can distort the correlation dramatically.
  - An outlier can make an otherwise small correlation look big or hide a large correlation.
  - It can even give an otherwise positive association a negative correlation coefficient (and vice versa).
  - When you see an outlier, it’s often a good idea to report the correlations with and without the point.
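The advice to report the correlation with and without the outlier can be sketched as follows. The six points are hypothetical (chosen only for illustration), and `pearson_r` is an assumed helper, not something from the slides. Five points lie on a rising line; the sixth sits far from the pattern.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: the last point (20, -10) is the outlier.
x_values = [1, 2, 3, 4, 5, 20]
y_values = [1, 2, 3, 4, 5, -10]
outlier_index = 5

# Drop the outlier to get the "without" version of the data.
x_without = x_values[:outlier_index] + x_values[outlier_index + 1:]
y_without = y_values[:outlier_index] + y_values[outlier_index + 1:]

r_with = pearson_r(x_values, y_values)
r_without = pearson_r(x_without, y_without)

print(f"with the outlier:    r = {r_with:.3f}")
print(f"without the outlier: r = {r_without:.3f}")
```

In this made-up case a single point turns a perfect positive association into a negative correlation coefficient, so reporting both values tells the reader far more than either one alone.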

Chapter 7: Scatterplots, Association and Correlation
HW Ch. 7
1) Email me at [email protected]
   Subject: Ch. 7 additional assignments
   no later than 11 pm tonight, Oct. 22
2) 1-37 odds, p. 164-169

Chapter 7: Scatterplots, Association and Correlation
CA Standards
2.12: Find the line of best fit to a given distribution of data by using least squares regression. (cont. for Ch 8-9)
2.13: Know what the correlation coefficient of two variables means and be familiar with the coefficient's properties.


Chapter Objectives:
At the end of this chapter students should be able to:
1) create a scatterplot to graphically depict the relationship between 2 quantitative variables
2) describe the information that a scatterplot conveys about the relationship between 2 quantitative variables: form, direction, strength, points that depart from the overall pattern
3) calculate the correlation coefficient between 2 quantitative variables using technology
4) interpret the value of the correlation coefficient
5) describe when it is appropriate to use the correlation to describe the relationship between 2 quantitative variables
6) list the properties of the correlation coefficient
7) apply the properties of the correlation coefficient to determine the correlation when the units of the original variables are changed
8) describe the difference between association, correlation and cause-and-effect

Chapter 7 looks at relationships between two quantitative variables, x and y:
• Scatterplot / line of best fit
• Correlation

People might ask questions like these in real life:
1) Is the price of sneakers related to how long they last?
2) Is smoking related to lung cancer?
3) Do baseball teams that score more runs sell more tickets to their games?

Which average?

Mean
• not appropriate for describing highly skewed distributions
• not appropriate for describing nominal and ordinal data

Median
• choose median when mean is inappropriate, except when describing nominal data

Mode
• choose mode when describing nominal data. However, for nominal data, an average may not be needed (use percentage instead)

Order of the three averages, from left to right:
• Positive skew: Mode, Median, Mean
• Negative skew: Mean, Median, Mode
• Symmetric: Mean = Median = Mode

[Figure: three distribution curves (negatively skewed; symmetric, not skewed; positively skewed) with the mode, median, and mean marked on each.]

Starter Ch. 7
Seven different families drove to their vacation destinations. The table below shows the distance they drove (in miles) and the time it took them (in hours). Represent the data graphically and write a description of the data.

Distance (miles)   400   411   247   385   229   217   325
Time (hours)       6.5   7     4.1   6.5   3.5   3.8   5.4
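One possible way to back up a written description of the starter data with a number, and to illustrate chapter objective 7 at the same time. This is a sketch, not the expected classroom answer (the slides use a TI calculator). `pearson_r` is an assumed helper, and the choice of kilometers and minutes as the new units is arbitrary.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Starter data from the slide: distance driven and time taken.
distance_miles = [400, 411, 247, 385, 229, 217, 325]
time_hours = [6.5, 7, 4.1, 6.5, 3.5, 3.8, 5.4]

r_original = pearson_r(distance_miles, time_hours)

# Change the units of both variables (assumed target units for illustration).
KM_PER_MILE = 1.609344
distance_km = [d * KM_PER_MILE for d in distance_miles]
time_minutes = [t * 60 for t in time_hours]

r_converted = pearson_r(distance_km, time_minutes)

print(f"miles and hours:        r = {r_original:.4f}")
print(f"kilometers and minutes: r = {r_converted:.4f}")
```

The two values agree up to floating-point rounding, since rescaling either variable by a positive constant leaves the correlation unchanged. The value close to +1 is consistent with describing the association as strong, positive, and linear.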

Data Analysis Toolbox
To answer a statistical question of interest:
• Data: Organize and Examine
  - Who are the individuals described?
  - What are the variables?
  - Why were the data gathered?
  - When, Where, How, By Whom were data gathered?
• Graph: Construct an appropriate graphical display
  - Describe SOCS
• Numerical Summary: Calculate appropriate center and spread (mean and s or 5-number summary)
• Interpretation: Answer question in context!


Looking at Scatterplots
Can the NOAA predict where a hurricane will go?
• As the years have passed, the predictions have improved (errors have decreased).
• The figure shows a negative direction between the years since 1970 and the prediction errors made by NOAA.

Scatterplots
• Scatterplots are the best way to start observing the relationship between two quantitative variables. They show the relationship between two quantitative variables measured on the same cases.
• The association between two quantitative variables can be shown on one graph by plotting data points as ordered pairs on axes. Such a graph is called a scatterplot.
• In a scatterplot, you can see patterns, trends, relationships, and even the occasional extraordinary value sitting apart from the others.
• If it seems that one variable is a response to the other, then plot that variable on the y-axis. It is called the response (or predicted) variable.
• The x-axis then has the explanatory (or predictor) variable.

Response and explanatory variables
• Response (or predicted) variable: the variable which we intend to model; the variable of interest
  - we intend to explain through statistical modeling
• Explanatory (or predictor) variable: the variable which may be used to model the response variable
  - values may be related to the response variable
• Does nicotine gum reduce cigarette smoking?
  Response variable =
  Explanatory variable =

Role for Variables
• It is important to determine which of the two quantitative variables goes on the x-axis and which on the y-axis.
• This determination is made based on the roles played by the variables.
• When the roles are clear, the explanatory or predictor variable goes on the x-axis, and the response variable (variable of interest) goes on the y-axis.

*If the relationship between the variables is unclear, it does not matter which one we identify as the explanatory/response variable. Always THINK about the dataset and what you are measuring!


Role for Variables (continued)
• The roles that we choose for variables are more about how we think about them rather than about the variables themselves.
• Just placing a variable on the x-axis doesn’t necessarily mean that it explains or predicts anything. And the variable on the y-axis may not respond to it in any way.

Two quantitative variables
What type of relationship exists between the two variables, and is the association significant?

A relationship between two variables:

Explanatory (independent) variable, x    Response (dependent) variable, y
Hours of training                        Number of accidents
Shoe size                                Height
Cigarettes smoked per day                Lung capacity
Score on SAT                             Grade point average
Height                                   IQ

Scatterplots
The following are some questions that ask whether there is an association between the two variables:
• Do older houses sell for less than newer ones for comparable size and quality?
• Do students learn better with more use of computer technology?
• Does economic status influence the amount of physical activity?

Scatterplots are the ideal way to picture such associations.

Recall: Bivariate relationships
• An extension of univariate descriptive statistics
• Used to detect evidence of association in the sample
• Two variables are said to be associated if the distribution of one variable differs across groups or values defined by the other variable

Recall: Bivariate Relationships
• Two quantitative variables
  - Scatter plot
  - Side by side stem and leaf plots
• Two qualitative variables
  - Tables
  - Bar charts
• One quantitative and one qualitative variable
  - Side by side box plots
  - Bar chart

Describing Associations
• Three main concepts make up the description of an association between two variables: direction, form, and strength.
1. Direction is positive or negative and agrees with the slope of the line
   - In positive associations, an increase in the explanatory variable tends to go with an increase in the response variable
   - ALWAYS ASK: What’s my sign: positive, negative, or neither?
2. Form is a description of the shape of the graph
   - A straight line is typical, but not the only shape possible.
   - LOOK for FORM: straight, curved, something exotic (?), no pattern?


Describing Associations (continued)
3. Strength is a description of how clearly the data follow the form stated.
   - LOOK for STRENGTH: how much scatter?
4. Look for deviation from patterns or unusual features
   - Are there outliers or subgroups?

Describing Associations: Looking at Scatterplots
• DIRECTION
  - A pattern that runs from the upper left to the lower right is said to have a negative direction.
  - A trend running the other way has a positive direction.
  [Figure panels: positive linear relationship; no relationship; negative linear relationship.]
• FORM
  - If there is a straight line (linear) relationship, it will appear as a cloud or swarm of points stretched out in a generally consistent, straight form.

Describing Associations: Looking at Scatterplots (continued)
• FORM
  - If the relationship isn’t straight, but curves gently, while still increasing or decreasing steadily, we can often find ways to make it more nearly straight.
  - If the relationship curves sharply, the methods of this book cannot really help us.
• STRENGTH
  - At one extreme, the points appear to follow a single stream (whether straight, curved, or bending all over the place).
  - At the other extreme, the points appear as a vague cloud with no discernable trend or pattern.
  - Note: we will quantify the amount of scatter soon.
• UNUSUAL FEATURES
  - Look for the unexpected. Often the most interesting thing to see in a scatterplot is the thing you never thought to look for.
  - One example of such a surprise is an outlier standing away from the overall pattern of the scatterplot.
  - Clusters or subgroups should also raise questions.

Typical Patterns of Scatterplots
[Figure: six example scatterplots. Panels: no relationship; negative nonlinear relationship; weak linear relationship (a nonlinear relationship seems to fit the data better); nonlinear (concave) relationship; positive linear relationship; negative linear relationship.]

Drawing Scatterplots by hand
• Plot the explanatory variable, if there is one, on the horizontal axis of the scatterplot.
  Note: We usually call the explanatory variable x and the response variable y. If there is no explanatory-response distinction, either variable can go on the horizontal axis.
• Label both axes.
• Scale the horizontal and vertical axes. The intervals must be uniform; i.e., the distance between tick marks must be the same.

Drawing Scatterplots
By hand:
- Graph on a normal x-y plane
- Make sure to label and scale axes (including units if known!)
- You do not have to show the origin!
By TI:
- Enter data
- 2nd: Stat Plot, 1st type of graph
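The scaling advice above (uniform intervals, origin not required) can be sketched as a small helper. `uniform_ticks` is a hypothetical function written for this illustration, and the step sizes of 50 miles and 1 hour are assumptions, not values given in the slides. The data are the distance and time values from the Starter table.

```python
import math


def uniform_ticks(values, step):
    """Return evenly spaced tick marks that cover every value.

    The axis starts at the multiple of `step` at or below the smallest
    value, so the origin does not have to be shown.
    """
    start = math.floor(min(values) / step) * step
    end = math.ceil(max(values) / step) * step
    count = round((end - start) / step)
    return [start + i * step for i in range(count + 1)]


# Starter data from the slides.
distance_miles = [400, 411, 247, 385, 229, 217, 325]
time_hours = [6.5, 7, 4.1, 6.5, 3.5, 3.8, 5.4]

# Assumed step sizes; any convenient uniform step would do.
x_ticks = uniform_ticks(distance_miles, 50)
y_ticks = uniform_ticks(time_hours, 1)

print("x-axis (miles):", x_ticks)
print("y-axis (hours):", y_ticks)
```

Every gap between neighboring ticks equals the chosen step, which is the "distance between tick marks must be the same" rule, and neither axis needs to start at zero.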



Describing the Association
• Which variable should go on the x-axis?
  - Do cold days cause gas usage, or does gas usage cause cold days(!)?
  - Since cold days cause gas usage, degree-days is the explanatory variable and goes on the x-axis.
  - Gas usage responds to degree-days, so it is the response variable and goes on the y-axis.
• Set up Stat Plot 1 to show the data as a scatterplot and display the graph.
  - Plot the data
• Write a description of the association addressing direction, shape and strength.
  - There is a moderately strong positive linear association between coldness (degree-days) and gas usage.

Example:
Here we have two quantitative variables for each of 16 students:
1. How many beers they drank, and
2. Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: What are the variables? How is one affected by changes in the other one?

Student   Number of Beers   BAC
1         5                 0.1
2         2                 0.03
3         9                 0.19
4         8                 0.12
5         3                 0.04
6         7                 0.095
7         3                 0.07
8         5                 0.06
9         3                 0.02
10        5                 0.05
11        4                 0.07
12        6                 0.1
13        5                 0.085
14        7                 0.09
15        1                 0.01
16        4                 0.05

In a scatterplot one axis is used to represent each of the variables, and the data are plotted as points on the graph.
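To go one step beyond plotting, the strength of the linear association in the beer and BAC table can be quantified. The sketch below assumes the usual textbook definition of the correlation coefficient as the sum of products of z-scores divided by n - 1. That formula is not stated on these slides, and the helper names are illustrative.

```python
import statistics

# Data from the table, listed in order of student number (1 through 16).
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]


def z_scores(values):
    """Standardize values using the sample standard deviation (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def correlation(xs, ys):
    """Correlation as the sum of z-score products divided by n - 1."""
    zx = z_scores(xs)
    zy = z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


r = correlation(beers, bac)
print(f"r = {r:.3f}")
```

Number of beers is the natural explanatory variable here, but the same r results if the two lists are swapped, because the formula treats x and y symmetrically.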

Some plots don’t have clear explanatory and response variables.
• Do calories explain sodium amounts?
• Does percent return on Treasury bills explain percent return on common stocks?

Recap: Interpreting scatterplots
• After plotting two variables on a scatterplot, we describe the relationship by examining the form, direction, and strength of the association. We look for an overall pattern …
  - Form: linear, curved, clusters, no pattern
  - Direction: positive, negative, no direction
  - Strength: how closely the points fit the “form” (how much variation, or scatter, there is around the main form)
• … and deviations from that pattern.
  - Outliers

Interpreting scatterplots
• Strength: how closely the points fit the “form” (how much variation, or scatter, there is around the main form)
  - With a strong relationship, you can get a pretty good estimate of y if you know x.
  - With a weak relationship, for any x you might get a wide range of y values.


This is a very strong relationship. The daily amount of gas consumed can be predicted quite accurately for a given temperature value.

This is a weak relationship. For a particular state median household income, you can’t predict the state per capita income very well.

Looking at Scatterplots
[Figure: scatterplot of the size of a diamond ring in carats and the price in dollars.]
After plotting two variables on a scatterplot, we describe the relationship by examining the form, direction, and strength of the association. We look for an overall pattern:
• Form: linear, curved, clusters, no pattern
• Association/Direction: positive, negative, no direction
• Strength: how closely the points fit the “form”
• Outliers: deviations from the pattern

Association
• Suppose you were to collect data for each pair of variables. You want to make a scatterplot. Which variable would you use as the explanatory variable and which as the response variable? Why? What would you expect to see in the scatterplot? Discuss the likely direction, form, and strength.
a) T-shirts at a store: price each, number sold
b) Scuba diving: depth, water pressure
c) Scuba diving: depth, visibility
d) All elementary school students: weight, score on a reading test


Explanatory | Response | Direction | Form | Strength
a) T-shirt price | Number of T-shirts sold | Negative | Linear/straight | Moderate
   (A very low price would likely lead to very high sales, and a very high price would lead to low sales.)
b) Depth of the water | Water pressure | Positive | Straight | Strong
   (The deeper you dive, the greater the water pressure.)


Explanatory | Response | Direction | Form | Strength
c) Depth of the water | Visibility | Negative | Possibly straight | Moderate
   (If a sample of different bodies of water is used. If the same body of water has visibility measured at different depths, the association would be strong.)
d) Weight | Reading test score | Positive | Straight | Moderate
   (Older students generally weigh more and generally are better readers. Therefore, students who weigh more are likely to be better readers. This does not mean that weight causes higher reading scores.)



Looking at Scatterplots

Which of the scatterplots show:
a) Little or no association?
b) A negative association?
c) A linear association?
d) A moderately strong association?
e) A very strong association?

Answers (first set of scatterplots):
a) #1
b) #4
c) #2 and #4
d) #3: moderately strong, curved association
e) #2 and #4 each show a very strong association, although some might classify the association as merely "strong".

Answers (second set of scatterplots):
a) None, although #4 is weak
b) #3 and #4. Increases in one variable are generally related to decreases in the other variable.
c) #2, #3, and #4
d) #2
e) #1 and #3. #1 shows a curved association and #3 shows a straight association.

#8, p. 165

Winning speeds in the Kentucky Derby have generally increased over time. The association between year and speed is moderately strong, and seems slightly curved, with a greater rate of increase in winning speed before 1950 and a smaller rate of increase after 1950, suggesting that winning speeds have leveled off over time.

Quantifying Strength

• When describing the strength of the association shown in a scatterplot, we would like a numerical value that indicates the strength of the relationship between the explanatory and response variables. This numerical value is called the correlation coefficient, r.

$$ r = \frac{\sum z_x z_y}{n - 1} $$
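The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for the calculator: the function name `correlation` and the five data pairs are made up for the example, and the z-scores use the sample standard deviation (dividing by n − 1) to match the n − 1 in the formula.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r = sum(z_x * z_y) / (n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Standardize each value, then average the products over n - 1.
    z_x = [(x - mean_x) / sd_x for x in xs]
    z_y = [(y - mean_y) / sd_y for y in ys]
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)

# Hypothetical data, chosen only to keep the arithmetic small.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 3))  # about 0.775
```

Because every value is converted to a z-score first, the result carries no units.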

Correlation Conditions

• Correlation measures the strength of the linear association between two quantitative variables.
• Before you use correlation, you must check several conditions:
  • Quantitative Variables Condition
  • Straight Enough Condition
  • Outlier Condition


Correlation Conditions

• Quantitative Variables Condition:
  • Correlation applies only to quantitative variables.
  • Don't apply correlation to categorical data masquerading as quantitative.
  • Check that you know the variables' units and what they measure.

Correlation Conditions

• Straight Enough Condition:
  • You can calculate a correlation coefficient for any pair of variables.
  • But correlation measures the strength only of the linear association, and will be misleading if the relationship is not linear, so watch for curvature!
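A small Python sketch shows why curvature matters. The data are made up for illustration: every point lies exactly on the curve y = x², so the association is as strong as it can be, yet r comes out at essentially zero because the pattern is not a straight line. The helper name `correlation` is an assumption of this sketch.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r = sum(z_x * z_y) / (n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum((x - mean_x) / sd_x * (y - mean_y) / sd_y
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: a perfect U-shaped (curved) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curved = correlation(xs, ys)
print(abs(r_curved) < 1e-9)  # True: r is essentially 0
```

A correlation near zero here says only that there is no linear association, not that the variables are unrelated.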

Correlation Conditions

• Outlier Condition:
  • Outliers can distort the correlation dramatically.
  • An outlier can make an otherwise small correlation look big or hide a large correlation.
  • It can even give an otherwise positive association a negative correlation coefficient (and vice versa).
  • When you see an outlier, it's often a good idea to report the correlations with and without the point.
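Reporting the correlation with and without a suspicious point might look like the sketch below. All values are invented for the illustration: six points with a clear negative trend, plus one hypothetical outlier at (20, 20) that is enough to flip the sign of r.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r = sum(z_x * z_y) / (n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum((x - mean_x) / sd_x * (y - mean_y) / sd_y
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data with a strong negative association.
xs = [1, 2, 3, 4, 5, 6]
ys = [6.1, 4.8, 4.2, 2.9, 2.2, 0.9]

# The same data with one made-up outlier far from the pattern.
xs_out = xs + [20]
ys_out = ys + [20]

r_without = correlation(xs, ys)
r_with = correlation(xs_out, ys_out)
print("without outlier:", round(r_without, 3))
print("with outlier:   ", round(r_with, 3))
```

The first value is strongly negative and the second is positive, which is why both are worth reporting.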

Correlation Properties

• The sign of a correlation coefficient gives the direction of the association.
• Correlation is always between –1 and +1.
  • Correlation can be exactly equal to –1 or +1, but these values are unusual in real data because they mean that all the data points fall exactly on a single straight line.
• A correlation near zero corresponds to a weak linear association.

Correlation Properties

• "r" ranges from –1 to +1.
• "r" quantifies the strength and direction of a linear relationship between two quantitative variables.
  • Strength: How closely the points follow a straight line.
  • Direction is positive when individuals with higher x values tend to have higher values of y.

Correlation Properties

• As a rule of thumb, the following guidelines on strength of relationship are often useful (though many experts would somewhat disagree on the choice of boundaries). The ranges apply to the size of r, ignoring its sign.

Size of r            Strength
0.0 ≤ |r| ≤ 0.1      None/Very weak
0.1 < |r| ≤ 0.3      Small/Weak
0.3 < |r| ≤ 0.5      Medium/Moderate
0.5 < |r| ≤ 1.0      Strong
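The rule-of-thumb table can be written as a small lookup. This is only a sketch of the guideline: the function name `describe_strength` is made up, the cutoffs are read as applying to the absolute value of r, and the boundaries are the same debatable ones the table uses.

```python
def describe_strength(r):
    """Label a correlation using the rule-of-thumb cutoffs on |r|."""
    if not -1.0 <= r <= 1.0:
        # A correlation outside [-1, 1] signals a calculation error.
        raise ValueError("correlation must be between -1 and +1")
    size = abs(r)  # strength ignores the sign; the sign gives direction
    if size <= 0.1:
        return "none/very weak"
    if size <= 0.3:
        return "small/weak"
    if size <= 0.5:
        return "medium/moderate"
    return "strong"

# Example values, used here only to exercise each branch.
for r in (0.006, 0.25, -0.487, 0.777, -0.923):
    print(r, describe_strength(r))
```

A value such as 1.22 raises an error rather than getting a label, since no correlation can exceed 1 in size.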


Correlation Properties

[Figure: four example scatterplots with correlations r = 0.006, 0.777, -0.923, and -0.487]

Correlation Properties

• Correlation treats x and y symmetrically:
  • The correlation of x with y is the same as the correlation of y with x.
• Correlation has no units.
• Correlation is not affected by changes in the center or scale of either variable.
  • Correlation depends only on the z-scores, and they are unaffected by changes in center or scale.
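These properties are easy to check numerically. In the sketch below the heights and weights are invented, and the conversions (inches to centimeters, adding 10 to every weight) are just examples of a change of scale and a change of center; the rescaling uses a positive multiplier.

```python
from math import sqrt, isclose

def correlation(xs, ys):
    """Correlation coefficient r = sum(z_x * z_y) / (n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum((x - mean_x) / sd_x * (y - mean_y) / sd_y
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical heights (inches) and weights (pounds).
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [115, 120, 140, 155, 160, 180]

r_xy = correlation(heights_in, weights_lb)
r_yx = correlation(weights_lb, heights_in)                # swap x and y
heights_cm = [h * 2.54 for h in heights_in]               # change of scale
weights_shifted = [w + 10 for w in weights_lb]            # change of center
r_rescaled = correlation(heights_cm, weights_shifted)

print(isclose(r_xy, r_yx), isclose(r_xy, r_rescaled))  # True True
```

The z-scores are identical before and after the conversions, so r cannot change.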

Correlation Properties

• Correlation measures the strength of the linear association between the two variables.
  • Variables can have a strong association but still have a small correlation if the association isn't linear.
• Correlation is sensitive to outliers. A single outlying value can make a small correlation large or make a large one small.

Correlation Properties Summary

• Sign of r gives the direction of association
• Correlation is always between -1 and +1
• Flipping x and y does NOT affect r
• r has NO units!! It has been standardized
• Changing units on x or y does not affect r
• r measures a LINEAR relationship only!
• r is non-resistant to outliers

Correlation ≠ Causation

• Just because a correlation exists between two factors doesn't mean one factor causes the other factor, or in fact, that there is any relationship at all between the two factors.

Correlation ≠ Causation

• Whenever we have a strong correlation, it is tempting to explain it by imagining that the predictor variable has caused the response to change.
• Scatterplots and correlation coefficients never prove causation.
• A hidden variable that stands behind a relationship and determines it by simultaneously affecting the other two variables is called a lurking variable.


Beware the "Lurking Variable"

• A lurking variable is a variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among variables.
• An association between two variables x and y can reflect many types of relationship among x, y, and one or more lurking variables. In other words, association does not imply causation.
• Correlations based on averages are usually too high when applied to individuals.
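A simulation can make the lurking variable visible. The sketch below revisits the weight and reading-score example with age as the lurking variable. Everything in it is an assumption made for illustration: the ages 6 to 11, the 200 students per age, and the numbers that tie age to weight and to reading score are invented, and weight has no effect on score at all.

```python
import random
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r = sum(z_x * z_y) / (n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum((x - mean_x) / sd_x * (y - mean_y) / sd_y
               for x, y in zip(xs, ys)) / (n - 1)

rng = random.Random(7)  # fixed seed so the sketch is repeatable
ages, weights, scores = [], [], []
for age in range(6, 12):
    for _ in range(200):
        ages.append(age)
        # Age drives both variables; weight never enters the score.
        weights.append(45 + 7 * (age - 6) + rng.gauss(0, 6))
        scores.append(50 + 10 * (age - 6) + rng.gauss(0, 8))

def r_within(age):
    """Correlation of weight and score among students of one age."""
    rows = [(w, s) for a, w, s in zip(ages, weights, scores) if a == age]
    return correlation([w for w, _ in rows], [s for _, s in rows])

r_all = correlation(weights, scores)
r_by_age = {age: r_within(age) for age in range(6, 12)}
print("all students:", round(r_all, 2))
for age, r in r_by_age.items():
    print("age", age, ":", round(r, 2))
```

Across all students the correlation is clearly positive, but among students of the same age it is close to zero, which is what we would expect if age, not weight, lies behind the association.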

Finding Correlation Using the TI

• Press Stat: Calc: 4:LinReg
• If r does not show, you will need to turn DiagnosticsOn.
• Go to 2nd:0 (Catalog), scroll down to DiagnosticsOn and hit Enter twice.

Straightening Scatterplots

• Straight line relationships are the ones that we can measure with correlation.
• When a scatterplot shows a bent form that consistently increases or decreases, we can often straighten the form of the plot by re-expressing one or both variables.
• A scatterplot of f/stop vs. shutter speed shows a bent relationship.
  [Figure: scatterplot of f/stop vs. shutter speed]
• Re-expressing f/stop vs. shutter speed by squaring the f/stop values straightens the relationship.
  [Figure: scatterplot of squared f/stop vs. shutter speed]
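The effect of squaring can be checked with a short Python sketch. The shutter speeds and f/stops below are standard camera settings chosen for illustration; they are an assumption of this example and are not read from the slide's plot.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r = sum(z_x * z_y) / (n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum((x - mean_x) / sd_x * (y - mean_y) / sd_y
               for x, y in zip(xs, ys)) / (n - 1)

# Illustrative camera settings (assumed values, in seconds and f-numbers).
shutter_speed = [1/1000, 1/500, 1/250, 1/125, 1/60, 1/30, 1/15, 1/8]
f_stop = [2.8, 4, 5.6, 8, 11, 16, 22, 32]

# Re-express the response by squaring each f/stop value.
f_stop_squared = [f ** 2 for f in f_stop]

r_raw = correlation(shutter_speed, f_stop)
r_squared_fstop = correlation(shutter_speed, f_stop_squared)
print("original:     ", round(r_raw, 3))
print("re-expressed: ", round(r_squared_fstop, 3))
```

The correlation is already high for the bent plot, which is a reminder that a large r does not guarantee a straight form. After squaring, the points fall much closer to a line and r moves closer to 1.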

Find the Errors…

Your economics instructor assigns your class to investigate factors associated with the gross domestic product (GDP) of nations. Each student examines a different factor (such as life expectancy, literacy rate, etc.) for a few countries and reports to the class. Explain the mistakes in the statements below:

a) My correlation of -0.772 shows that there is almost no association between GDP and infant mortality rate.
   A correlation of -0.772 is fairly strong.
b) There was a correlation of 0.44 between GDP and continent.
   Continent is not a quantitative variable. Correlation cannot be calculated here.



c) There was a very strong correlation of 1.22 between life expectancy and GDP.
   Correlation cannot be higher than 1.
d) The correlation between literacy rate and GDP was 0.83. This shows that countries wanting to increase their standard of living should invest heavily in education.
   Correlation does not imply causation.



Chapter 7 Exercises

a) None of the scatterplots show little or no association, although #4 is very weak.
b) #3 and #4 show negative association. Increases in one variable are generally related to decreases in the other variable.
c) #2, #3, and #4 each show a straight association.
d) #2 shows a moderately strong association.
e) #1 and #3 each show a very strong association. #1 shows a curved association and #3 shows a straight association.

Chapter 7 Exercises, #33
