AP Statistics Chapter 7: Scatterplots, Association, and Correlation
Concord High School RNBriones
Starter Ch. 7
DO: Just Checking 1-5, p. 153
• Association/Direction: positive, negative, no direction
• Form: linear, curved, clusters, no pattern
• Strength: how closely the points fit the “form”
• Outliers: deviations from the pattern
Correlation Conditions
• Quantitative Variables Condition:
  - Correlation applies only to quantitative variables.
  - Don’t apply correlation to categorical data masquerading as quantitative.
  - Check that you know the variables’ units and what they measure.
Correlation Conditions
• Straight Enough Condition:
  - You can calculate a correlation coefficient for any pair of variables.
  - But correlation measures the strength only of the linear association, and will be misleading if the relationship is not linear.
Correlation Conditions
• Outlier Condition:
  - Outliers can distort the correlation dramatically.
  - An outlier can make an otherwise small correlation look big or hide a large correlation.
  - It can even give an otherwise positive association a negative correlation coefficient (and vice versa).
  - When you see an outlier, it’s often a good idea to report the correlations with and without the point.
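The Outlier Condition can be illustrated with a small numerical sketch. The data below are hypothetical (not from the slides), and pearson_r is a hand-written helper introduced only for this illustration. It reports the correlation with and without one far-off point, as the last bullet recommends.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: five points on a rising line, plus one far-off point.
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]
x_with_outlier = x + [20]
y_with_outlier = y + [-10]

r_without = pearson_r(x, y)
r_with = pearson_r(x_with_outlier, y_with_outlier)

print(f"r without the outlier: {r_without:.3f}")
print(f"r with the outlier:    {r_with:.3f}")
```

In this made-up example a single point is enough to turn a perfectly positive association into a negative correlation coefficient, which is why both values are worth reporting.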
Chapter 7: Scatterplots, Association and Correlation
HW Ch. 7
1) Email me at [email protected]
   Subject: Ch. 7 additional assignments
   no later than 11 pm tonight, Oct. 22
2) 1-37 odds, p. 164-169
CA Standards
2.12: Find the line of best fit to a given distribution of data by using least squares regression. (cont. for Ch 8-9)
2.13: Know what the correlation coefficient of two variables means and be familiar with the coefficient's properties.
Chapter Objectives:
At the end of this chapter students should be able to:
1) create a scatterplot to graphically depict the relationship between 2
quantitative variables
2) describe the information that a scatterplot conveys about the
relationship between 2 quantitative variables: form, direction,
strength, points that depart from the overall pattern.
3) calculate the correlation coefficient between 2 quantitative variables
using technology.
4) interpret the value of the correlation coefficient
5) describe when it is appropriate to use the correlation to describe the
relationship between 2 quantitative variables
6) list the properties of the correlation coefficient
7) apply the properties of the correlation coefficient to determine the
correlation when the units of the original variables are changed
8) describe the difference between association, correlation and cause-
and-effect.
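Objective 7 can be checked numerically. The following is a minimal sketch, assuming the distance and time values from this chapter's Starter and a hand-written helper named pearson_r (not from the slides). Converting miles to kilometers and hours to minutes rescales each variable, but the correlation should come out the same.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Starter data: distance driven (miles) and travel time (hours).
distance_mi = [400, 411, 247, 385, 229, 217, 325]
time_hr = [6.5, 7, 4.1, 6.5, 3.5, 3.8, 5.4]

# Change of units: miles -> kilometers, hours -> minutes.
distance_km = [d * 1.609344 for d in distance_mi]
time_min = [t * 60 for t in time_hr]

r_original = pearson_r(distance_mi, time_hr)
r_converted = pearson_r(distance_km, time_min)

print(f"r in miles and hours:        {r_original:.4f}")
print(f"r in kilometers and minutes: {r_converted:.4f}")
```

Because correlation is built from standardized values, it has no units, so multiplying a variable by a positive constant leaves r unchanged.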
Chapter 7 will look at relationships between two
quantitative variables x and y.
Scatterplot/Line of best Fit
Correlation
People might ask questions like these in real life:
1) Is the price of sneakers related to how long
they last?
2) Is smoking related to lung cancer?
3) Do baseball teams that score more runs sell
more tickets to their games?
Which average?
Mean:
• not appropriate for describing highly skewed distributions
• not appropriate for describing nominal and ordinal data
Median:
• choose median when mean is inappropriate, except when describing nominal data
Mode:
• choose mode when describing nominal data. However, for nominal data, an average may not be needed (use percentage instead)
Positive Skew: Mode < Median < Mean
Negative Skew: Mean < Median < Mode
Symmetric: Mean = Median = Mode
[Figure: three distribution curves labeled Negatively Skewed, Symmetric (Not Skewed), and Positively Skewed, each marking the mode, median, and mean.]
Starter Ch. 7
Seven different families drove to their vacation destinations. The table below shows the distance they drove (in miles) and the time it took them (in hours). Represent the data graphically and write a description of the data.
Distance (miles)   400   411   247   385   229   217   325
Time (hours)       6.5   7     4.1   6.5   3.5   3.8   5.4
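One way to start a description of the Starter data is to look at each family's average speed. This is only a sketch: the variable names are made up for illustration, and the speed check is an added idea, not part of the slides. If the speeds are all similar, time rises roughly in proportion to distance, which points to a positive, roughly linear association.

```python
# Starter data: distance driven (miles) and travel time (hours).
distance_mi = [400, 411, 247, 385, 229, 217, 325]
time_hr = [6.5, 7, 4.1, 6.5, 3.5, 3.8, 5.4]

# Average speed for each family, in miles per hour.
speeds = [d / t for d, t in zip(distance_mi, time_hr)]

# List the families from shortest to longest trip.
for d, t, s in sorted(zip(distance_mi, time_hr, speeds)):
    print(f"{d:>4} mi  {t:>4} h  {s:5.1f} mph")

print(f"slowest: {min(speeds):.1f} mph, fastest: {max(speeds):.1f} mph")
```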
Data Analysis Toolbox
To answer a statistical question of interest:
Data: Organize and Examine
• Who are the individuals described?
• What are the variables?
• Why were the data gathered?
• When, Where, How, By Whom were data gathered?
Graph: Construct an appropriate graphical display
• Describe SOCS
Numerical Summary: Calculate appropriate center and spread (mean and s or 5-number summary)
Interpretation: Answer question in context!
Looking at Scatterplots
Can the NOAA predict where a hurricane will go?
• As the years have passed, the predictions have improved (errors have decreased).
• The figure shows a negative direction between the years since 1970 and the prediction errors made by NOAA.
Scatterplots
• Scatterplots are the best way to start observing the relationship between two quantitative variables. A scatterplot shows the relationship between two quantitative variables measured on the same cases.
• The association between two quantitative variables can be shown on one graph by plotting data points as ordered pairs on axes. Such a graph is called a scatterplot.
• In a scatterplot, you can see patterns, trends, relationships, and even the occasional extraordinary value sitting apart from the others.
• If it seems that one variable is a response to the other, then plot that variable on the y-axis. It is called the response (or predicted) variable.
• The x-axis then has the explanatory (or predictor) variable.
Response and explanatory variables
• Response (or predicted) variable: the variable which we intend to model; the variable of interest
  - the variable we intend to explain through statistical modeling
• Explanatory (or predictor) variable: the variable which may be used to model the response variable
  - its values may be related to the response variable
• Does nicotine gum reduce cigarette smoking?
Response variable =
Explanatory variable =
Role for Variables
• It is important to determine which of the two quantitative variables goes on the x-axis and which on the y-axis.
• This determination is made based on the roles played by the variables.
• When the roles are clear, the explanatory or predictor variable goes on the x-axis, and the response variable (variable of interest) goes on the y-axis.
* If the relationship between the variables is unclear, it does not matter which one we identify as the explanatory/response variable. Always THINK about the dataset and what you are measuring!
Role for Variables
• The roles that we choose for variables are more about how we think about them than about the variables themselves.
• Just placing a variable on the x-axis doesn’t necessarily mean that it explains or predicts anything. And the variable on the y-axis may not respond to it in any way.
Two quantitative variables
What type of relationship exists between the two variables, and is the association significant?
x: Explanatory (Independent) Variable    y: Response (Dependent) Variable
Hours of Training                        Number of Accidents
Shoe Size                                Height
Cigarettes smoked per day                Lung Capacity
Score on SAT                             Grade Point Average
Height                                   IQ
A relationship between two variables.
Scatterplots
The following are some questions that ask whether there is an association between the two variables:
• Do older houses sell for less than newer ones of comparable size and quality?
• Do students learn better with more use of computer technology?
• Does economic status influence the amount of physical activity?
Scatterplots are the ideal way to picture such associations.
Recall: Bivariate relationships
• An extension of univariate descriptive statistics
• Used to detect evidence of association in the sample
• Two variables are said to be associated if the distribution of one variable differs across groups or values defined by the other variable
Recall: Bivariate Relationships
• Two quantitative variables
  - Scatter plot
  - Side by side stem and leaf plots
• Two qualitative variables
  - Tables
  - Bar charts
• One quantitative and one qualitative variable
  - Side by side box plots
  - Bar chart
Describing Associations
• Three main concepts make up the description of an association between two variables: direction, form, and strength.
1. Direction is positive or negative and agrees with the slope of the line
  • In positive associations, an increase in the explanatory variable tends to go with an increase in the response variable
  • ALWAYS ASK: What’s my sign: positive, negative, or neither?
2. Form is a description of the shape of the graph
  • A straight line is typical, but not the only shape possible.
  • LOOK for FORM: straight, curved, something exotic (?), no pattern?
Describing Associations
• Three main concepts make up the description of an association between two variables: direction, form, and strength.
3. Strength is a description of how clearly the data follow the form stated.
  • LOOK for STRENGTH: how much scatter?
4. Look for deviation from patterns or unusual features
  • Are there outliers or subgroups?
Describing Associations: Looking at Scatterplots
• DIRECTION
  - A pattern that runs from the upper left to the lower right is said to have a negative direction.
  - A trend running the other way has a positive direction.
[Figure: three scatterplots labeled "Positive linear relationship", "No relationship", and "Negative linear relationship".]
• FORM
  - If there is a straight line (linear) relationship, it will appear as a cloud or swarm of points stretched out in a generally consistent, straight form.
Describing Associations: Looking at Scatterplots
• FORM
  - If the relationship isn’t straight, but curves gently, while still increasing or decreasing steadily, we can often find ways to make it more nearly straight.
Describing Associations: Looking at Scatterplots
• FORM
  - If the relationship curves sharply, the methods of this book cannot really help us.
Describing Associations: Looking at Scatterplots
• STRENGTH
  - At one extreme, the points appear to follow a single stream (whether straight, curved, or bending all over the place).
Describing Associations: Looking at Scatterplots
• STRENGTH
  - At the other extreme, the points appear as a vague cloud with no discernible trend or pattern.
  - Note: we will quantify the amount of scatter soon.
Describing Associations: Looking at Scatterplots
• UNUSUAL FEATURES
  - Look for the unexpected.
  - Often the most interesting thing to see in a scatterplot is the thing you never thought to look for.
  - One example of such a surprise is an outlier standing away from the overall pattern of the scatterplot.
  - Clusters or subgroups should also raise questions.
Typical Patterns of Scatterplots
[Figure: example scatterplots with panel labels "Positive linear relationship", "Negative linear relationship", "No relationship", "Negative nonlinear relationship", and "Nonlinear (concave) relationship". One panel carries the note: "This is a weak linear relationship. A nonlinear relationship seems to fit the data better."]
Drawing Scatterplots by hand
• Plot the explanatory variable, if there is one, on the horizontal axis of the scatterplot.
Note: We usually call the explanatory variable x and the response variable y. If there is no explanatory-response distinction, either variable can go on the horizontal axis.
• Label both axes.
• Scale the horizontal and vertical axes. The intervals must be uniform; i.e., the distance between tick marks must be the same.
Drawing Scatterplots
By hand:
- Graph on a normal x-y plane
- Make sure to label and scale axes (including units if known!)
- You do not have to show the origin!
By TI:
- Enter data
- 2nd: Stat Plot, 1st type of graph
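For readers without a calculator at hand, the same steps can be sketched in code. This is an illustrative stand-in, not the TI procedure: ascii_scatter is a hypothetical helper written for this example, and it uses the Starter distance and time data. It scales both axes uniformly, puts the explanatory variable on the horizontal axis, and does not force the origin into view.

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Return a text scatterplot: x runs left to right, y runs bottom to top.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Uniform scaling: equal steps in the data map to equal steps on the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # flip rows so larger y sits higher
    lines = ["|" + "".join(row) for row in grid]
    lines.append("+" + "-" * width)
    return "\n".join(lines)

# Starter data: distance (explanatory, x) and time (response, y).
distance_mi = [400, 411, 247, 385, 229, 217, 325]
time_hr = [6.5, 7, 4.1, 6.5, 3.5, 3.8, 5.4]

plot = ascii_scatter(distance_mi, time_hr)
print("Time (hours) vs. Distance (miles)")
print(plot)
print(f"x from {min(distance_mi)} to {max(distance_mi)} miles, "
      f"y from {min(time_hr)} to {max(time_hr)} hours")
```

The printed axis ranges play the role of the axis labels and scale that a hand-drawn plot would show.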
Describing the Association
• Which variable should go on the x-axis?
  - Do cold days cause gas usage, or does gas usage cause cold days(!)?
  - Since cold days cause gas usage, degree-days is the explanatory variable and goes on the x-axis.
• Gas usage responds to degree-days, so it is the response variable and goes on the y-axis.
• Set up Stat Plot 1 to show the data as a scatterplot and display the graph.
  - Plot the data
• Write a description of the association addressing direction, shape and strength.
  - There is a moderately strong positive linear association between coldness (degree-days) and gas usage.
Student   Number of Beers   BAC
1         5                 0.1
2         2                 0.03
3         9                 0.19
4         8                 0.12
5         3                 0.04
6         7                 0.095
7         3                 0.07
8         5                 0.06
9         3                 0.02
10        5                 0.05
11        4                 0.07
12        6                 0.1
13        5                 0.085
14        7                 0.09
15        1                 0.01
16        4                 0.05
Here we have two quantitative
variables for each of 16 students.
1. How many beers they drank,
and
2. Their blood alcohol level
(BAC)
We are interested in the
relationship between the two
variables: What are the
variables? How is one affected
by changes in the other one?
Example:
In a scatterplot one axis is used to represent each of the variables, and the data are plotted as points on the graph.
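Before plotting, the beers and BAC table can be organized as (x, y) pairs with beers as the explanatory variable and BAC as the response. The sketch below is only an illustration (the grouping step and the variable names are additions, not part of the slides). It averages BAC at each beer count to give a first look at how BAC changes as the number of beers increases.

```python
from collections import defaultdict
from statistics import mean

# (student, beers, BAC) rows copied from the table in this chapter.
data = [
    (1, 5, 0.1), (2, 2, 0.03), (3, 9, 0.19), (4, 8, 0.12),
    (5, 3, 0.04), (6, 7, 0.095), (7, 3, 0.07), (8, 5, 0.06),
    (9, 3, 0.02), (10, 5, 0.05), (11, 4, 0.07), (12, 6, 0.1),
    (13, 5, 0.085), (14, 7, 0.09), (15, 1, 0.01), (16, 4, 0.05),
]

# Group the response (BAC) by the explanatory variable (number of beers).
bac_by_beers = defaultdict(list)
for _student, beers, bac in data:
    bac_by_beers[beers].append(bac)

for beers in sorted(bac_by_beers):
    values = bac_by_beers[beers]
    print(f"{beers} beers: mean BAC {mean(values):.3f} (n={len(values)})")
```

Several students share the same beer count but have different BAC values, which is the vertical scatter a scatterplot of these data would show.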
Some plots don’t have clear explanatory and response variables.
Do calories explain
sodium amounts?
Does percent return on
Treasury bills explain percent
return on common stocks?
Recap: Interpreting scatterplots
• After plotting two variables on a scatterplot, we describe the relationship by examining the form, direction, and strength of the association. We look for an overall pattern ...
  - Form: linear, curved, clusters, no pattern
  - Direction: positive, negative, no direction
  - Strength: how closely the points fit the “form” (how much variation, or scatter, there is around the main form)
• ... and deviations from that pattern.
  - Outliers
Interpreting scatterplots
• Strength: how closely the points fit the “form” (how much variation, or scatter, there is around the main form)
With a strong relationship,
you can get a pretty good
estimate of y if you know x.
With a weak relationship, for
any x you might get a wide
range of y values.
This is a very strong relationship.
The daily amount of gas consumed
can be predicted quite accurately for
a given temperature value.
This is a weak relationship. For a
particular state median household
income, you can’t predict the state
per capita income very well.
Looking at Scatterplots
Scatterplot of the size of a diamond ring in carats and the price in dollars.
After plotting two variables on a scatterplot, we describe the relationship by examining the form, direction, and strength of the association. We look for an overall pattern:
• Form: linear, curved, clusters, no pattern
• Association/Direction: positive, negative, no direction
• Strength: how closely the points fit the “form”
• Outliers: deviations from the pattern
Association
• Suppose you were to collect data for each pair of variables. You want to make a scatterplot. Which variable would you use as the explanatory variable and which as the response variable? Why? What would you expect to see in the scatterplot? Discuss the likely direction, form, and strength.
a) T-shirts at a store: price each, number sold
b) Scuba diving: depth, water pressure
c) Scuba diving: depth, visibility
d) All elementary school students: weight, score on a reading test
Association
a) T-shirts at a store: price each, number sold
   Explanatory: T-shirt price
   Response: Number of T-shirts sold
   Direction: Negative
   Form: Linear/straight
   Strength: Moderate (A very low price would likely lead to very high sales, and a very high price would lead to low sales.)
b) Scuba diving: depth, water pressure
   Explanatory: Depth of the water
   Response: Water pressure
   Direction: Positive
   Form: Straight
   Strength: Strong (The deeper you dive, the greater the water pressure.)
Association
c) Scuba diving: depth, visibility
   Explanatory: Depth of the water
   Response: Visibility
   Direction: Negative
   Form: Possibly straight
   Strength: Moderate (If a sample of different bodies of water is used. If the same body of water has visibility measured at different depths, the association would be strong.)
d) All elementary school students: weight, score on a reading test
   Explanatory: Weight
   Response: Reading test score
   Direction: Positive
   Form: Straight
   Strength: Moderate (Older students generally weigh more and generally are better readers. Therefore, students who weigh more are likely to be better readers. This does not mean that weight causes higher reading scores.)
Looking at Scatterplots
Which of the scatterplots show:
a) Little or no association? #1
b) A negative association? #4
c) A linear association? #2 and #4
d) A moderately strong association? #3 (a moderately strong, curved association)
e) A very strong association? #2 and #4 each show a very strong association, although some might classify the association as merely “strong”.
Looking at Scatterplots
Which of the scatterplots show:
a) Little or no association? None, although #4 is weak
b) A negative association? #3 and #4. Increases in one variable are generally related to decreases in the other variable
c) A linear association? #2, #3, and #4
d) A moderately strong association? #2
e) A very strong association? #1 and #3. #1 shows a curved association and #3 shows a straight association
#8, p. 165
Winning speeds in the Kentucky Derby have generally increased
over time. The association between year and speed is moderately
strong, and seems slightly curved, with a greater rate of increase
in winning speed before 1950 and a smaller rate of increase after
1950, suggesting that winning speeds have leveled off over time.
Quantifying Strength
• When determining the strength of the association shown in a scatterplot, we would like a numerical value that indicates the strength of the relationship between the explanatory and response variables. This numerical value is called the correlation coefficient, r.
$$r = \frac{\sum z_x z_y}{n - 1}$$
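The formula for r can be worked through step by step. The following is a minimal sketch using the Starter distance and time data; the function name correlation_from_z_scores is made up for this illustration. It standardizes each variable with the sample standard deviation, multiplies the paired z-scores, sums the products, and divides by n - 1.

```python
from statistics import mean, stdev

def correlation_from_z_scores(xs, ys):
    """Correlation r = sum(z_x * z_y) / (n - 1)."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    sd_x, sd_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    z_x = [(x - mean_x) / sd_x for x in xs]
    z_y = [(y - mean_y) / sd_y for y in ys]
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

# Starter data: distance driven (miles) and travel time (hours).
distance_mi = [400, 411, 247, 385, 229, 217, 325]
time_hr = [6.5, 7, 4.1, 6.5, 3.5, 3.8, 5.4]

r = correlation_from_z_scores(distance_mi, time_hr)
print(f"r = {r:.3f}")
```

A value of r close to 1 here agrees with what the scatterplot of these data suggests: a strong, positive, roughly linear association.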
Correlation Conditions
• Correlation measures the strength of the linear association between two quantitative variables.
• Before you use correlation, you must check several conditions:
  - Quantitative Variables Condition
  - Straight Enough Condition
  - Outlier Condition
Correlation Conditions
• Quantitative Variables Condition:
  • Correlation applies only to quantitative variables.
  • Don’t apply correlation to categorical data masquerading as quantitative.
  • Check that you know the variables’ units and what they measure.
Correlation Conditions
• Straight Enough Condition:
  • You can calculate a correlation coefficient for any pair of variables.
  • But correlation measures the strength only of the linear association, and will be misleading if the relationship is not linear, so watch for curvature!
Correlation Conditions
• Outlier Condition:
  • Outliers can distort the correlation dramatically.
  • An outlier can make an otherwise small correlation look big or hide a large correlation.
  • It can even give an otherwise positive association a negative correlation coefficient (and vice versa).
  • When you see an outlier, it’s often a good idea to report the correlations with and without the point.
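A small sketch of the Outlier Condition: the same correlation is computed with and without one unusual point. The data and the `correlation` helper are hypothetical, chosen so that a single outlier flips a strong positive correlation to a negative one.

```python
import math

def correlation(xs, ys):
    """Pearson correlation from sums of squared deviations and cross products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical points that follow a positive, roughly linear pattern.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 3, 5, 4, 6, 7]

# One point far from the pattern (invented for illustration).
outlier_x, outlier_y = 20, -10

r_without = correlation(xs, ys)
r_with = correlation(xs + [outlier_x], ys + [outlier_y])

print("without outlier:", round(r_without, 3))
print("with outlier:   ", round(r_with, 3))
```

Reporting both values, as the slide suggests, lets the reader see how much of the result depends on that one point.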
Correlation Properties
• The sign of a correlation coefficient gives the direction of the association.
• Correlation is always between –1 and +1.
  • Correlation can be exactly equal to –1 or +1, but these values are unusual in real data because they mean that all the data points fall exactly on a single straight line.
• A correlation near zero corresponds to a weak linear association.
Correlation Properties
• “r” ranges from –1 to +1.
• “r” quantifies the strength and direction of a linear relationship between two quantitative variables.
  • Strength: How closely the points follow a straight line.
  • Direction is positive when individuals with higher x values tend to have higher values of y.
Correlation PropertiesCorrelation Properties
rr StrengthStrength
0.0 ≤≤≤≤ r r ≤≤≤≤ 0.1 None/Very weak
0.1 < r r ≤≤≤≤ 0.3 Small/Weak
0.3 < r r ≤≤≤≤ 0.5 Medium/Moderate
0.5 < r r ≤≤≤≤ 1.0 Strong
�As a rule of thumb, the following guidelines on strength of relationship are often useful (though many experts would somewhat disagree on the choice of boundaries).
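The rule-of-thumb table can be written as a small lookup. This is only a sketch: the function name `strength_label` is made up, and it assumes the boundaries apply to the absolute value of r, so that negative correlations are graded by size as well.

```python
def strength_label(r):
    """Return the rule-of-thumb strength label for a correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must be between -1 and +1")
    size = abs(r)  # assumption: strength depends on the size of r, not its sign
    if size <= 0.1:
        return "None/Very weak"
    if size <= 0.3:
        return "Small/Weak"
    if size <= 0.5:
        return "Medium/Moderate"
    return "Strong"

# Example values of r, including negative ones.
for r in (0.006, 0.777, -0.923, -0.487):
    print(r, strength_label(r))
```

Since experts disagree on the boundaries, the labels are a starting point for description, not a substitute for looking at the scatterplot.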
Correlation Properties
[Figure: four example scatterplots with correlations r = 0.006, r = 0.777, r = -0.923, and r = -0.487]
Correlation Properties
• Correlation treats x and y symmetrically:
  • The correlation of x with y is the same as the correlation of y with x.
• Correlation has no units.
• Correlation is not affected by changes in the center or scale of either variable.
  • Correlation depends only on the z-scores, and they are unaffected by changes in center or scale.
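These properties can be checked numerically. The sketch below uses invented temperature and sales figures. It converts Celsius to Fahrenheit, which changes both center and scale, and it swaps the roles of x and y; the `correlation` helper is an illustrative implementation.

```python
import math

def correlation(xs, ys):
    """Pearson correlation from sums of squared deviations and cross products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: daily high temperature (Celsius) and drinks sold.
celsius = [10, 15, 20, 25, 30]
sales = [12, 18, 21, 30, 33]

# Changing units shifts the center (+32) and rescales (x 9/5).
fahrenheit = [c * 9 / 5 + 32 for c in celsius]

r_celsius = correlation(celsius, sales)
r_fahrenheit = correlation(fahrenheit, sales)
r_swapped = correlation(sales, celsius)

print(round(r_celsius, 4), round(r_fahrenheit, 4), round(r_swapped, 4))
```

All three printed values agree, because the z-scores of a variable are the same in either unit and the product of z-scores does not depend on which variable is called x.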
Correlation Properties
• Correlation measures the strength of the linear association between the two variables.
  • Variables can have a strong association but still have a small correlation if the association isn’t linear.
• Correlation is sensitive to outliers. A single outlying value can make a small correlation large or make a large one small.
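A minimal illustration of a strong but non-linear association: every y below is determined exactly by x (y = x squared), yet the correlation is zero because the pattern is a symmetric curve and not a line. The data are constructed for this purpose, and the `correlation` helper is an illustrative implementation.

```python
import math

def correlation(xs, ys):
    """Pearson correlation from sums of squared deviations and cross products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# y is a perfect function of x, but the relationship is curved.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
print(r)  # 0.0
```

A scatterplot of these points would show the pattern immediately, which is why the Straight Enough Condition is checked by eye before r is computed.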
Correlation Properties Summary
• Sign of r gives the direction of association
• Correlation is always between -1 and +1
• Flipping x and y does NOT affect r
• r has NO units!! It has been standardized
• Changing units on x or y does not affect r
• r measures a LINEAR relationship only!
• r is non-resistant to outliers
Correlation ≠ Causation
• Just because a correlation exists between two
factors doesn’t mean one factor causes the
other factor, or in fact, that there is any
relationship at all between the two factors.
Correlation ≠ Causation
• Whenever we have a strong correlation, it is tempting to explain it by imagining that the predictor variable has caused the response to change.
• Scatterplots and correlation coefficients never prove causation.
• A hidden variable that stands behind a relationship and determines it by simultaneously affecting the other two variables is called a lurking variable.
Beware the “Lurking Variable”
• A lurking variable is a variable that is not among
the explanatory or response variables in a study
and yet may influence the interpretation of
relationships among variables.
• An association between two variables x and y
can reflect many types of relationship among x,
y, and one or more lurking variables. In other
words, association does not imply causation.
• Correlations based on averages are usually too
high when applied to individuals.
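The last point, about averages, can be seen in a small constructed example. In the sketch below, individuals within each group vary a lot, but the group averages fall exactly on a line, so the correlation of the averages is much higher than the correlation for individuals. All numbers and the `correlation` helper are invented for illustration.

```python
import math
from collections import defaultdict

def correlation(xs, ys):
    """Pearson correlation from sums of squared deviations and cross products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical individuals: x is a group value, y varies within each group.
xs = [1, 1, 1, 2, 2, 2, 3, 3, 3]
ys = [0, 2, 4, 1, 3, 5, 2, 4, 6]

# Average y within each x group.
groups = defaultdict(list)
for x, y in zip(xs, ys):
    groups[x].append(y)
group_x = sorted(groups)
group_mean_y = [sum(groups[g]) / len(groups[g]) for g in group_x]

r_individuals = correlation(xs, ys)
r_averages = correlation(group_x, group_mean_y)

print("individuals:", round(r_individuals, 3))
print("averages:   ", round(r_averages, 3))
```

Averaging removes the variation among individuals inside each group, which is the scatter that keeps the individual-level correlation low.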
Finding Correlation Using the TI
• Press Stat: Calc: 4:LinReg
• If r does not show, you will need to turn DiagnosticsOn.
• Go to 2nd:0 (Catalog), scroll down to DiagnosticsOn and hit Enter twice.
Straightening Scatterplots
• Straight line relationships are the ones that we can measure with correlation.
• When a scatterplot shows a bent form that consistently increases or decreases, we can often straighten the form of the plot by re-expressing one or both variables.
Straightening Scatterplots
• A scatterplot of f/stop vs. shutter speed shows a bent relationship:
Straightening Scatterplots
• Re-expressing f/stop vs. shutter speed by squaring the f/stop values straightens the relationship:
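The effect of this re-expression can be sketched numerically. The shutter speeds and f/stops below are typical camera settings assumed for illustration, not values read from the slides, and the `correlation` helper is an illustrative implementation. The code compares the correlation before and after squaring the f/stop.

```python
import math

def correlation(xs, ys):
    """Pearson correlation from sums of squared deviations and cross products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Assumed settings: shutter speed in seconds and the matching f/stop.
shutter_speed = [1/1000, 1/500, 1/250, 1/125, 1/60, 1/30, 1/15, 1/8]
f_stop = [2.8, 4, 5.6, 8, 11, 16, 22, 32]

# Re-express the response by squaring each f/stop value.
f_stop_squared = [f ** 2 for f in f_stop]

r_raw = correlation(shutter_speed, f_stop)
r_squared_fstop = correlation(shutter_speed, f_stop_squared)

print("f/stop vs. shutter speed:        ", round(r_raw, 3))
print("(f/stop)^2 vs. shutter speed:    ", round(r_squared_fstop, 3))
```

The correlation is already high for the bent plot, so the number alone would not reveal the curvature. The scatterplot is what shows that the squared version is the straight one.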
Find the Errors…
• Your economics instructor assigns your class to investigate factors associated with the gross domestic product (GDP) of nations. Each student examines a different factor (such as life expectancy, literacy rate, etc.) for a few countries and reports to the class. Explain the mistakes in the statements below:
a) My correlation of -0.772 shows that there is almost no association between GDP and infant mortality rate.
   A correlation of -0.772 is fairly strong.
b) There was a correlation of 0.44 between GDP and continent.
   Continent is not a quantitative variable. Correlation cannot be calculated here.
c) There was a very strong correlation of 1.22 between life expectancy and GDP.
   Correlation cannot be higher than 1.
d) The correlation between literacy rate and GDP was 0.83. This shows that countries wanting to increase their standard of living should invest heavily in education.
   Correlation does not imply causation.
Chapter 7 Exercises
a) None of the scatterplots show little or no association, although #4 is very weak.
b) #3 and #4 show negative association. Increases in one variable are generally related to decreases in the other variable.
c) #2, #3, and #4 each show a straight association.
d) #2 shows a moderately strong association.
e) #1 and #3 each show a very strong association. #1 shows a curved association and #3 shows a straight association.
Chapter 7 Exercises, #33