
Page 1: Chapter 61 Where Are We? u Population and Sample u Getting Data: Experiments and Observational Studies u Quantitative versus Categorical Variables u Related

Chapter 6 1

Where Are We?

Population and Sample
Getting Data: Experiments and Observational Studies
Quantitative versus Categorical Variables
Related Variables (Causation, Non-Causal, Other)
Ex1: Experiment to look at relationship between nicotine patches and smoking cessation
Ex2: Obs study to look at relationship between IQ and number of spankings
Describing Data: graphs, summary statistics (mean & stdev, median & quartiles), distributions (stemplots, histograms, boxplots)

Percent quitting      Nicotine   Placebo
Smoker at home        31%        20%
No smoker at home     58%        20%


Chapter 14 2

Chapter 14

Describing Relationships: Scatterplots and Correlation


Chapter 14 3

Thought Question 1

For all cars manufactured in the U.S., there is a positive correlation between the size of the engine and horsepower. There is a negative correlation between the size of the engine and gas mileage. What does it mean for two variables to have a positive correlation or a negative correlation?


Chapter 14 4

Thought Question 2

What type of correlation would the following pairs of variables have – positive, negative, or none?

1. Temperature during the summer and electricity bills

2. Temperature during the winter and heating costs

3. Number of years of education and height

4. Frequency of brushing and number of cavities

5. Number of churches and number of bars in cities in your state

6. Height of husband and height of wife


Chapter 14 5

Thought Question 3

Consider the two scatterplots below. How does the outlier impact the correlation for each plot?

– does the outlier increase the correlation, decrease the correlation, or have no impact?


Chapter 14 6

Statistical versus Deterministic Relationships

Distance versus Speed (when travel time is constant).

Income (in millions of dollars) versus total assets of banks (in billions of dollars).


Chapter 14 7

Distance versus Speed

Distance = Speed × Time

Suppose time = 1.5 hours

Each subject drives a fixed speed for the 1.5 hrs
– speed chosen for each subject varies from 10 mph to 50 mph

Distance does not vary for those who drive the same fixed speed

Deterministic relationship

[Scatterplot: distance versus speed]


Chapter 14 8

Income versus Assets

[Scatterplot: income (millions) versus assets (billions)]

Income = a + b × Assets

Assets vary from 3.4 billion to 49 billion

Income varies from bank to bank, even among those with similar assets

Statistical relationship
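The contrast between the two kinds of relationship can be sketched in a few lines of Python. The 1.5-hour travel time comes from the slides; the intercept `a`, slope `b`, and `noise_sd` in the income function are made-up values chosen only for illustration, not estimates from the bank data.

```python
import random

TIME_HOURS = 1.5  # travel time used on the slide


def distance(speed_mph):
    """Deterministic: the same speed always gives the same distance."""
    return speed_mph * TIME_HOURS


def income(assets_billions, rng, a=5.0, b=4.0, noise_sd=20.0):
    """Statistical: a linear trend plus bank-to-bank variation.

    a, b and noise_sd are hypothetical values, not taken from the slides.
    """
    return a + b * assets_billions + rng.gauss(0.0, noise_sd)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Two subjects driving 30 mph cover exactly the same distance.
print([distance(s) for s in (10, 30, 30, 50)])

# Three banks with the same assets still report different incomes.
print([round(income(20.0, rng), 1) for _ in range(3)])
```

In the first list the two entries for 30 mph are identical, while the incomes in the second list differ even though the assets are the same.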


Chapter 14 9

Strength and Statistical Significance

A strong relationship seen in the sample may indicate a strong relationship in the population.

The sample may exhibit a strong relationship simply by chance and the relationship in the population is not strong or is zero.

The observed relationship is considered to be statistically significant if it is stronger than a large proportion of the relationships we could expect to see just by chance.
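One way to make "stronger than a large proportion of the relationships we could expect to see just by chance" concrete is a shuffle (permutation) check: scramble one variable many times and count how often the scrambled data look at least as strongly related as the real data. The sketch below is an illustration under assumptions, not the method used in the slides; the data, the number of shuffles, and the function names are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def chance_proportion(xs, ys, n_shuffles=2000, seed=0):
    """Share of random pairings whose |r| is at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    hits = 0
    for _ in range(n_shuffles):
        shuffled = rng.sample(ys, len(ys))  # y values in random order
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / n_shuffles


# Hypothetical sample with a strong upward trend.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.2, 7.8, 9.1]
print(chance_proportion(x, y))
```

A small proportion means that chance pairings rarely match the observed strength, which is the sense in which the relationship would be called statistically significant.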


Chapter 14 10

Warnings about Statistical Significance

“Statistical significance” does not imply the relationship is strong enough to be considered “practically important”.

Even weak relationships may be labeled statistically significant if the sample size is very large.

Even very strong relationships may not be labeled statistically significant if the sample size is very small.
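The effect of sample size can be illustrated with the usual t statistic for testing whether a correlation differs from zero, t = r·sqrt((n − 2)/(1 − r²)). This test is not mentioned in the slides; it is brought in here only as one common way to show the point, and the r and n values are hypothetical. The cutoffs are the standard two-sided 5% critical values (about 1.96 for very large samples, 3.182 for 3 degrees of freedom).

```python
import math


def t_statistic(r, n):
    """t statistic for testing 'population correlation is zero' (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Weak relationship, very large sample (hypothetical numbers).
t_weak = t_statistic(0.05, 10_000)
print(t_weak > 1.96)    # exceeds the large-sample cutoff

# Strong relationship, very small sample (hypothetical numbers).
t_strong = t_statistic(0.8, 5)
print(t_strong > 3.182)  # compared with the cutoff for df = 3
```

Under these assumptions the weak correlation clears its cutoff while the strong one does not, which matches the two warnings above.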


Chapter 14 11

Linear Relationship

Some relationships are such that the points of a scatterplot tend to fall along a straight line — linear relationship


Chapter 14 12

Examples of Relationships

[Four scatterplots: Health Status Measure versus Income; Health Status Measure versus Age; Education Level versus Age; Mental Health Score versus Physical Health Score]


Chapter 14 13

Measuring Strength & Direction of a Linear Relationship

How closely does a non-horizontal straight line fit the points of a scatterplot?

The correlation coefficient (often referred to as just correlation): r
– measure of the strength of the relationship: the stronger the relationship, the larger the magnitude of r.
– measure of the direction of the relationship: positive r indicates a positive relationship, negative r indicates a negative relationship.



Chapter 14 14

Correlation Coefficient

special values for r:
– a perfect positive linear relationship would have r = +1
– a perfect negative linear relationship would have r = -1
– if there is no linear relationship, or if the scatterplot points are best fit by a horizontal line, then r = 0

Note: r must be between -1 and +1, inclusive

r > 0: as one variable changes, the other variable tends to change in the same direction

r < 0: as one variable changes, the other variable tends to change in the opposite direction
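The special values and the sign rule can be checked on small made-up data sets. The sketch below computes r directly from its definition; the data and variable names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
rising = [2 * v + 1 for v in x]    # points exactly on a rising line
falling = [10 - 3 * v for v in x]  # points exactly on a falling line
noisy = [2, 1, 4, 3, 6]            # upward trend with some scatter

r_rising = pearson_r(x, rising)
r_falling = pearson_r(x, falling)
r_noisy = pearson_r(x, noisy)
print(r_rising, r_falling, r_noisy)
```

The exact lines give r at the two extremes, and the scattered upward trend gives a positive value strictly between 0 and 1.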



Chapter 14 15

Examples of Correlations

Husband’s versus Wife’s ages: r = .94

Husband’s versus Wife’s heights: r = .36

Professional Golfer’s Putting Success: Distance of putt in feet versus percent success: r = -.94


Chapter 14 16

Not all Relationships are Linear
Miles per Gallon versus Speed

Linear relationship?
MPG = a + b × Speed

Speed chosen for each subject varies from 20 mph to 60 mph

MPG varies from trial to trial, even at the same speed

Statistical relationship

y = -0.013x + 26.9, r = -0.06

[Scatterplot: miles per gallon versus speed, with fitted straight line]


Chapter 14 17

Not all Relationships are Linear
Miles per Gallon versus Speed

Curved relationship (r is misleading)

Speed chosen for each subject varies from 20 mph to 60 mph

MPG varies from trial to trial, even at the same speed

Statistical relationship

[Scatterplot: miles per gallon versus speed, with fitted curve]
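A small numeric sketch shows why r can mislead for a curved pattern. The speeds match the 20 to 60 mph range on the slide, but the MPG values are invented to rise and then fall symmetrically; they are not the data behind the plot.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


speed = [20, 30, 40, 50, 60]
mpg = [22, 28, 30, 28, 22]  # hypothetical: best mileage at mid speeds

r_curved = pearson_r(speed, mpg)
print(r_curved)
```

Here MPG clearly depends on speed, yet the correlation is essentially zero because the rise and the fall cancel out in a straight-line summary.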


Chapter 14 18

Problems with Correlations

Outliers can inflate or deflate correlations

Groups combined inappropriately may mask relationships (a third variable)
– groups may have different relationships when separated


Chapter 14 19

Outliers and Correlation

For each scatterplot above, how does the outlier affect the correlation?

[Scatterplots A and B]

A: outlier decreases the correlation
B: outlier increases the correlation
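Both effects can be reproduced with small invented data sets. The points below are hypothetical and are not the points in plots A and B; they only mimic the two situations (an outlier that falls off an otherwise tight line, and an outlier far from an otherwise patternless cluster).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Situation like A: points on a line, plus one point far off the line.
line_x = [1, 2, 3, 4, 5, 6]
line_y = [2, 4, 6, 8, 10, 12]
r_a_without = pearson_r(line_x, line_y)
r_a_with = pearson_r(line_x + [6], line_y + [1])

# Situation like B: a patternless cluster, plus one distant point.
cloud_x = [1, 2, 1, 2]
cloud_y = [1, 1, 2, 2]
r_b_without = pearson_r(cloud_x, cloud_y)
r_b_with = pearson_r(cloud_x + [10], cloud_y + [10])

print(r_a_without, r_a_with)
print(r_b_without, r_b_with)
```

In the first pair the single stray point pulls r down from a perfect fit; in the second pair the single distant point creates a high r where the cluster alone shows none.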


Chapter 14 20

Price of Books versus Size

[Scatterplot: price (dollars) versus # of pages]

Relationship between price of books and the number of pages?

Positive?
Look at paperbacks:
Look at hard covers:
All books together:
Overall correlation is Negative!
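The reversal can be reproduced with invented numbers. In the sketch below the page counts and prices are hypothetical (short, expensive hard covers and long, cheap paperbacks); they are not the books plotted on the slide.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: (pages, price in dollars)
hard_pages, hard_price = [50, 100, 150], [80, 90, 100]
paper_pages, paper_price = [200, 250, 300, 350], [10, 12, 14, 16]

r_hard = pearson_r(hard_pages, hard_price)
r_paper = pearson_r(paper_pages, paper_price)
r_all = pearson_r(hard_pages + paper_pages, hard_price + paper_price)

print(r_hard, r_paper, r_all)
```

Within each type of book, more pages go with a higher price, but pooling the two types gives a negative correlation because cover type (the third variable) drives both page count and price.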


Chapter 14 21

Key Concepts

Statistical vs. Deterministic Relationships
Statistically Significant Relationship
Strength of Linear Relationship
Direction of Linear Relationship
Correlation Coefficient
Problems with Correlations


Chapter 14 22

Correlation Calculation

Suppose we have data on variables X and Y for n individuals:
$x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$

Each variable has a mean and std dev: $(\bar{x}, s_x)$ and $(\bar{y}, s_y)$ (see ch. 12)

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right) $$


Chapter 14 23

Case Study

Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe


Chapter 14 24

Case Study

Country          Per Capita GDP (x)   Life Expectancy (y)
Austria          21.4                 77.48
Belgium          23.2                 77.53
Finland          20.0                 77.32
France           22.7                 78.63
Germany          20.8                 77.17
Ireland          18.6                 76.39
Italy            21.5                 78.51
Netherlands      22.0                 78.15
Switzerland      23.8                 78.99
United Kingdom   21.2                 77.37


Chapter 14 25

Case Study

x      y       $(x_i-\bar{x})/s_x$   $(y_i-\bar{y})/s_y$   product
21.4   77.48   -0.078                -0.345                 0.027
23.2   77.53    1.097                -0.282                -0.309
20.0   77.32   -0.992                -0.546                 0.542
22.7   78.63    0.770                 1.102                 0.849
20.8   77.17   -0.470                -0.735                 0.345
18.6   76.39   -1.906                -1.716                 3.271
21.5   78.51   -0.013                 0.951                -0.012
22.0   78.15    0.313                 0.498                 0.156
23.8   78.99    1.489                 1.555                 2.315
21.2   77.37   -0.209                -0.483                 0.101

$\bar{x}$ = 21.52, $\bar{y}$ = 77.754, sum of products = 7.285
$s_x$ = 1.532, $s_y$ = 0.795


Chapter 14 26

Case Study

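$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right) = \frac{1}{10-1}(7.285) = 0.809 $$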

There is a strong, positive linear relationship between Per Capita GDP (x) and Life Expectancy (y).
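The case-study arithmetic can be reproduced with a short script that follows the same steps: standardize each variable, multiply the standardized values pair by pair, sum, and divide by n − 1. The data are the ten countries from the table; the function and variable names are my own.

```python
import math

# Per capita GDP (x) and life expectancy (y) for the ten countries.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r as the sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


r = correlation(gdp, life)
print(round(sample_sd(gdp), 3), round(sample_sd(life), 3), round(r, 3))
```

The printed standard deviations and correlation should agree with the slide values to three decimal places.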



Chapter 14 27

Examples of Correlations
