jonas-woods
Chapter 6 1
Where Are We?
Population and Sample
Getting Data: Experiments and Observational Studies
Quantitative versus Categorical Variables
Related Variables (Causation, Non-Causal, Other)
– Ex1: Experiment to look at relationship between nicotine patches and smoking cessation
– Ex2: Obs study to look at relationship between IQ and number of spankings
Describing Data: graphs, summary statistics (mean & stdev, median & quartiles), distributions (stemplots, histograms, boxplots)
Percent quitting     Nicotine   Placebo
Smoker at home       31%        20%
No smoker at home    58%        20%
Chapter 14 2
Chapter 14
Describing Relationships: Scatterplots and Correlation
Chapter 14 3
Thought Question 1
For all cars manufactured in the U.S., there is a positive correlation between the size of the engine and horsepower. There is a negative correlation between the size of the engine and gas mileage. What does it mean for two variables to have a positive correlation or a negative correlation?
Chapter 14 4
Thought Question 2
What type of correlation would the following pairs of variables have – positive, negative, or none?
1. Temperature during the summer and electricity bills
2. Temperature during the winter and heating costs
3. Number of years of education and height
4. Frequency of brushing and number of cavities
5. Number of churches and number of bars in cities in your state
6. Height of husband and height of wife
Chapter 14 5
Thought Question 3
Consider the two scatterplots below. How does the outlier impact the correlation for each plot?
– does the outlier increase the correlation, decrease the correlation, or have no impact?
Chapter 14 6
Statistical versus Deterministic Relationships
Distance versus Speed (when travel time is constant).
Income (in millions of dollars) versus total assets of banks (in billions of dollars).
Chapter 14 7
Distance versus Speed
Distance = Speed × Time
Suppose time = 1.5 hours
Each subject drives a fixed speed for the 1.5 hrs
– speed chosen for each subject varies from 10 mph to 50 mph
Distance does not vary for those who drive the same fixed speed
Deterministic relationship
[Figure: scatterplot of distance (vertical axis, 0 to 80) versus speed (horizontal axis, 0 to 60)]
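The deterministic relationship on this slide can be sketched in a few lines of Python. This is only an illustration: the function name and the particular speeds are assumptions chosen to span the 10 to 50 mph range, not values taken from the original plot.

```python
# Deterministic relationship: with travel time fixed at 1.5 hours,
# distance is completely determined by speed (Distance = Speed x Time).
TIME_HOURS = 1.5  # value stated on the slide


def distance(speed_mph: float, time_hours: float = TIME_HOURS) -> float:
    """Return the distance in miles covered at a constant speed."""
    return speed_mph * time_hours


# Illustrative speeds (hypothetical choices within the 10-50 mph range).
speeds = [10, 20, 30, 40, 50]
distances = [distance(s) for s in speeds]

for s, d in zip(speeds, distances):
    print(f"{s} mph for {TIME_HOURS} h -> {d} miles")
```

Two subjects who drive at the same speed always get the same distance, which is what distinguishes this from the statistical relationship on the next slide.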
Chapter 14 8
Income versus Assets
[Figure: scatterplot of income (millions, 0 to 300) versus assets (billions, 0 to 60)]
Income = a + b × Assets
Assets vary from 3.4 billion to 49 billion
Income varies from bank to bank, even among those with similar assets
Statistical relationship
Chapter 14 9
Strength and Statistical Significance
A strong relationship seen in the sample may indicate a strong relationship in the population.
The sample may exhibit a strong relationship simply by chance and the relationship in the population is not strong or is zero.
The observed relationship is considered to be statistically significant if it is stronger than a large proportion of the relationships we could expect to see just by chance.
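One way to make "stronger than a large proportion of the relationships we could expect to see just by chance" concrete is a permutation test. The sketch below is an assumption about how one might check this, not a method given in these slides: it shuffles the y values many times to break any real pairing and counts how often the shuffled correlation is at least as strong as the observed one. The data are the per capita GDP and life expectancy values from the case study later in this chapter; the function names, seed, and number of shuffles are arbitrary choices.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_shuffles=10_000, seed=0):
    """Proportion of shuffled data sets whose |r| is at least the observed |r|."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_strong = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # destroys any real x-y pairing
        if abs(pearson_r(xs, shuffled)) >= observed:
            at_least_as_strong += 1
    return at_least_as_strong / n_shuffles


# Case study data from this chapter (10 Western European countries).
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

observed_r = pearson_r(gdp, life)
p_value = permutation_p_value(gdp, life)
print(f"observed r = {observed_r:.3f}, permutation p-value = {p_value:.4f}")
```

A small proportion means chance alone rarely produces a relationship this strong, which is the sense in which the observed relationship would be called statistically significant.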
Chapter 14 10
Warnings about Statistical Significance
“Statistical significance” does not imply the relationship is strong enough to be considered “practically important”.
Even weak relationships may be labeled statistically significant if the sample size is very large.
Even very strong relationships may not be labeled statistically significant if the sample size is very small.
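The effect of sample size can be illustrated with the usual t statistic for testing whether a correlation is zero, t = r·sqrt(n − 2)/sqrt(1 − r²). This test does not appear in these slides; it is a standard result that assumes roughly normal data, and the r and n values below are hypothetical. The critical values used for comparison are the familiar two-sided 5% cutoffs (about 1.96 for very large samples, 3.182 for 3 degrees of freedom).

```python
from math import sqrt


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


# Hypothetical case 1: weak relationship, very large sample.
weak_large = t_statistic(0.10, 10_000)

# Hypothetical case 2: strong relationship, very small sample.
strong_small = t_statistic(0.70, 5)

# Two-sided 5% critical values (standard t-table entries).
CRIT_LARGE_DF = 1.96  # approximate cutoff for very large degrees of freedom
CRIT_DF_3 = 3.182     # cutoff for 3 degrees of freedom (n = 5)

print(f"r = 0.10, n = 10000: t = {weak_large:.2f}, significant: {weak_large > CRIT_LARGE_DF}")
print(f"r = 0.70, n = 5:     t = {strong_small:.2f}, significant: {strong_small > CRIT_DF_3}")
```

Under these assumptions the weak correlation is labeled significant and the strong one is not, matching the two warnings above.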
Chapter 14 11
Linear Relationship
In some relationships, the points of a scatterplot tend to fall along a straight line. This is called a linear relationship.
Chapter 14 12
Examples of Relationships
[Figure: four scatterplots]
– Health Status Measure versus Income
– Health Status Measure versus Age
– Education Level versus Age
– Mental Health Score versus Physical Health Score
Chapter 14 13
Measuring Strength & Direction of a Linear Relationship
How closely does a non-horizontal straight line fit the points of a scatterplot?
The correlation coefficient (often referred to as just correlation): r
– measure of the strength of the relationship: the stronger the relationship, the larger the magnitude of r.
– measure of the direction of the relationship: positive r indicates a positive relationship, negative r indicates a negative relationship.
Chapter 14 14
Correlation Coefficient
Special values for r:
– a perfect positive linear relationship would have r = +1
– a perfect negative linear relationship would have r = -1
– if there is no linear relationship, or if the scatterplot points are best fit by a horizontal line, then r = 0
Note: r must be between -1 and +1, inclusive
r > 0: as one variable changes, the other variable tends to change in the same direction
r < 0: as one variable changes, the other variable tends to change in the opposite direction
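The special values can be checked on small made-up data sets. The numbers below are hypothetical and chosen only so that the points fall exactly on a rising line, exactly on a falling line, or on a symmetric U shape with no linear trend.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]

# Points exactly on a rising line, y = 2x + 1.
perfect_pos = pearson_r(x, [2 * v + 1 for v in x])

# Points exactly on a falling line, y = 10 - 3x.
perfect_neg = pearson_r(x, [10 - 3 * v for v in x])

# Symmetric U shape, y = (x - 3)^2: a clear pattern, but no linear trend.
no_linear = pearson_r(x, [4, 1, 0, 1, 4])

print(perfect_pos, perfect_neg, no_linear)
```

The last case shows that r = 0 means "no linear relationship", not "no relationship".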
Chapter 14 15
Examples of Correlations
Husband’s versus wife’s ages: r = .94
Husband’s versus wife’s heights: r = .36
Professional golfers’ putting success (distance of putt in feet versus percent success): r = -.94
Chapter 14 16
Not all Relationships are Linear
Miles per Gallon versus Speed
Linear relationship? MPG = a + b × Speed
Speed chosen for each subject varies from 20 mph to 60 mph
MPG varies from trial to trial, even at the same speed
Statistical relationship
y = -0.013x + 26.9
r = -0.06
[Figure: scatterplot of miles per gallon (0 to 35) versus speed (0 to 100)]
Chapter 14 17
Not all Relationships are Linear
Miles per Gallon versus Speed
Curved relationship (r is misleading)
Speed chosen for each subject varies from 20 mph to 60 mph
MPG varies from trial to trial, even at the same speed
Statistical relationship
[Figure: scatterplot of miles per gallon (0 to 35) versus speed (0 to 100)]
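A small sketch shows how a curved relationship can produce a correlation near zero even though speed clearly affects mileage. The mileage readings below are hypothetical (they are not the data behind the slide's plot) and are deliberately symmetric around 40 mph, so the rising and falling halves cancel.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical readings: mileage peaks at 40 mph and drops off on both sides.
speed = [20, 30, 40, 50, 60]
mpg = [22, 28, 30, 28, 22]

r_all = pearson_r(speed, mpg)           # whole range: halves cancel
r_rising = pearson_r(speed[:3], mpg[:3])   # 20-40 mph only
r_falling = pearson_r(speed[2:], mpg[2:])  # 40-60 mph only

print(f"all speeds: r = {r_all:.2f}")
print(f"20-40 mph:  r = {r_rising:.2f}")
print(f"40-60 mph:  r = {r_falling:.2f}")
```

Looking at the scatterplot before trusting r is what reveals the curve.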
Chapter 14 18
Problems with Correlations
Outliers can inflate or deflate correlations
Groups combined inappropriately may mask relationships (a third variable)
– groups may have different relationships when separated
Chapter 14 19
Outliers and Correlation
[Figure: two scatterplots, labeled A and B, each with an outlier]
For each scatterplot above, how does the outlier affect the correlation?
A: outlier decreases the correlation
B: outlier increases the correlation
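Both effects can be reproduced with small invented data sets. The points below are hypothetical stand-ins for plots A and B, not the data from the original figure: in A a single point far from an otherwise tight line lowers r, and in B a single distant point added to a patternless cloud raises r.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Plot A (hypothetical): points close to a rising line, plus one point far below it.
ax = [1, 2, 3, 4, 5, 6]
ay = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
r_a_without = pearson_r(ax, ay)
r_a_with = pearson_r(ax + [7], ay + [0.0])

# Plot B (hypothetical): a cloud with no trend, plus one point far up and to the right.
bx = [1, 2, 3, 1, 2, 3]
by = [3, 1, 3, 1, 3, 1]
r_b_without = pearson_r(bx, by)
r_b_with = pearson_r(bx + [10], by + [10])

print(f"A: r = {r_a_without:.2f} without outlier, {r_a_with:.2f} with outlier")
print(f"B: r = {r_b_without:.2f} without outlier, {r_b_with:.2f} with outlier")
```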
Chapter 14 20
Price of Books versus Size
[Figure: scatterplot of price (dollars, 0 to 140) versus # of pages (0 to 400)]
Relationship between price of books and the number of pages?
Positive?
Look at paperbacks:
Look at hard covers:
All books together: overall correlation is negative!
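The following sketch shows how combining two groups can reverse the sign of a correlation. The page counts and prices are invented for illustration (the slide's actual book data are not available here), and they assume the hard covers are shorter but more expensive than the paperbacks.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paperbacks: long and cheap; price rises with pages.
paper_pages = [250, 300, 350, 400]
paper_price = [10, 12, 14, 16]

# Hypothetical hard covers: short and expensive; price also rises with pages.
hard_pages = [50, 100, 150, 200]
hard_price = [80, 90, 100, 110]

r_paper = pearson_r(paper_pages, paper_price)
r_hard = pearson_r(hard_pages, hard_price)
r_all = pearson_r(paper_pages + hard_pages, paper_price + hard_price)

print(f"paperbacks:  r = {r_paper:.2f}")
print(f"hard covers: r = {r_hard:.2f}")
print(f"all books:   r = {r_all:.2f}")
```

Within each group the correlation is positive, yet the pooled correlation is negative because the type of book (the third variable) separates the two clusters.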
Chapter 14 21
Key Concepts
Statistical vs. Deterministic Relationships
Statistically Significant Relationship
Strength of Linear Relationship
Direction of Linear Relationship
Correlation Coefficient
Problems with Correlations
Chapter 14 22
Correlation Calculation
Suppose we have data on variables X and Y for n individuals: $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$
Each variable has a mean and std dev: $(\bar{x}, s_x)$ and $(\bar{y}, s_y)$ (see ch. 12)
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$
Chapter 14 23
Case Study
Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe
Chapter 14 24
Case Study
Country Per Capita GDP (x) Life Expectancy (y)
Austria 21.4 77.48
Belgium 23.2 77.53
Finland 20.0 77.32
France 22.7 78.63
Germany 20.8 77.17
Ireland 18.6 76.39
Italy 21.5 78.51
Netherlands 22.0 78.15
Switzerland 23.8 78.99
United Kingdom 21.2 77.37
Chapter 14 25
Case Study
x      y       $(x_i-\bar{x})/s_x$   $(y_i-\bar{y})/s_y$   product
21.4   77.48   -0.078   -0.345    0.027
23.2   77.53    1.097   -0.282   -0.309
20.0   77.32   -0.992   -0.546    0.542
22.7   78.63    0.770    1.102    0.849
20.8   77.17   -0.470   -0.735    0.345
18.6   76.39   -1.906   -1.716    3.271
21.5   78.51   -0.013    0.951   -0.012
22.0   78.15    0.313    0.498    0.156
23.8   78.99    1.489    1.555    2.315
21.2   77.37   -0.209   -0.483    0.101
$\bar{x}$ = 21.52, $\bar{y}$ = 77.754, sum of products = 7.285
$s_x$ = 1.532, $s_y$ = 0.795
Chapter 14 26
Case Study
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right) = \frac{1}{10-1}(7.285) = 0.809$$
There is a strong, positive linear relationship between Per Capita GDP (x) and Life Expectancy (y).
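The case study arithmetic can be reproduced step by step with Python's standard library. This is a sketch that follows the standardize, multiply, and sum recipe from the correlation calculation slide; the variable names are arbitrary, and `statistics.stdev` is used because it divides by n − 1, matching the sample standard deviations reported above.

```python
from statistics import mean, stdev

# Per capita GDP (x) and life expectancy (y) for the ten countries in the case study.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

n = len(gdp)
x_bar, y_bar = mean(gdp), mean(life)
s_x, s_y = stdev(gdp), stdev(life)  # sample standard deviations (n - 1 divisor)

# Standardize each value, then multiply the paired standardized values.
z_x = [(x - x_bar) / s_x for x in gdp]
z_y = [(y - y_bar) / s_y for y in life]
products = [zx * zy for zx, zy in zip(z_x, z_y)]

sum_products = sum(products)
r = sum_products / (n - 1)

print(f"x_bar = {x_bar:.2f}, y_bar = {y_bar:.3f}")
print(f"s_x = {s_x:.3f}, s_y = {s_y:.3f}")
print(f"sum of products = {sum_products:.3f}")
print(f"r = {r:.3f}")
```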
Chapter 14 27
Examples of Correlations