
Chapter 6: Scatterplots, Association, and Correlation

D. Raffle

5/26/2015

Chapter 6 http://stat.wvu.edu/~draffle/111/week2/ch6/ch6.h...

1 of 43 05/25/2015 08:22 PM


Review: Comparing Variables

In previous chapters, we looked for relationships (associations) between variables by:

· Comparing categorical variables with contingency tables and stacked barplots
· Comparing numeric variables across groups with side-by-side boxplots
· Looking at how variables change over time with timeplots

In this chapter, we will:

· Look for relationships between two numeric variables


The Data

Recall the Motor Trend Cars data from previous chapters:

##                    mpg cyl disp  hp    wt  qsec vs     am
## Mazda RX4         21.0   6  160 110 2.620 16.46  V   auto
## Mazda RX4 Wag     21.0   6  160 110 2.875 17.02  V   auto
## Datsun 710        22.8   4  108  93 2.320 18.61  S   auto
## Hornet 4 Drive    21.4   6  258 110 3.215 19.44  S manual
## Hornet Sportabout 18.7   8  360 175 3.440 17.02  V manual
## Valiant           18.1   6  225 105 3.460 20.22  S manual

We might want to know:

· Is there a relationship between engine displacement (size) and horsepower?
· Is weight related to fuel efficiency?


Overview

How do we find relationships between numeric (quantitative) variables?

· Visually: using scatterplots
· Numerically: using the correlation coefficient
· Usually, we do both
· In this course, we will focus only on linear relationships



Scatterplots

How to make scatterplots:

· Define one variable as the X variable, and one as Y
· Draw a point for each observation, using the values of the X and Y variables as coordinates
· Typically, the X variable is on the horizontal axis and the Y variable on the vertical axis

What we look for:

· Is there a trend or pattern?
· Are there any outliers or unusual points?


Figure: Horsepower vs. Displacement

Horsepower vs. Displacement

Is there a relationship?

· As engines get bigger, they tend to have more horsepower
· We call this a positive association

Are there any unusual points?

· There is a point well above the rest
· Notice that its engine size is right in the middle (about 300 cu. in.), but its horsepower is larger than any other car's


Figure: Weight vs. Fuel Efficiency

Weight vs. Fuel Efficiency

Is there a relationship?

· As cars get heavier, they tend to have lower fuel efficiency
· We call this a negative association

Are there any outliers?

· No points fall far away from the rest


Types of Relationships

There are many types of trends that can come up when we make scatterplots. In this class, we will focus on the most common:

· Linear: The trend can be described fairly well by a straight line
· Non-linear: Any other type of trend

Directions of Relationships

· Positive: As one variable goes up, so does the other one
· Negative: As one variable goes up, the other goes down

Why lines?

· In statistics, we often try to find the simplest adequate method. Lines are simple.


Figure: Strong Positive Linear Trend

Figure: Moderate Negative Linear Trend

Figure: Weak Positive Linear Trend

Figure: No Trend

Figure: Non-Linear Trend

Figure: Non-Linear Trend

Roles of Variables

How do we decide which is X and which is Y?

The X Variable is:

· The explanatory or independent variable
· We want to know if changes in this variable explain changes in Y

The Y Variable is:

· The response or dependent variable
· We want to see if this variable responds when we change X

Which is which depends on what question we're asking.


Variable Role Examples

Horsepower vs. Engine Displacement

· It makes sense that giving a car a bigger engine gives it more power.
· We can't just give a car more horsepower; horsepower responds to changes we make to the car.
· Horsepower should be Y, and Engine Displacement should be X.

Fuel Efficiency vs. Weight

· When we make a car heavier, it should take more fuel to move it.
· Fuel efficiency responds to changes in the properties of the car.
· Fuel Efficiency should be Y, and Weight should be X.


Measuring the Strength

How do we measure how strong the relationship is?

· We use the correlation coefficient
· $r = \frac{\sum z_x z_y}{n - 1}$
· StatCrunch will find this for us

What is the correlation coefficient?

· r measures the strength of the linear relationship between two numeric variables
· It tells us how well a straight line explains the relationship
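The formula can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how StatCrunch works internally; the function name `correlation` is made up for this example, and the data are the `disp` and `hp` columns of the six cars in the table shown earlier.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation coefficient: r = sum(z_x * z_y) / (n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    z_x = [(x - x_bar) / s_x for x in xs]  # z-scores of X
    z_y = [(y - y_bar) / s_y for y in ys]  # z-scores of Y
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (len(xs) - 1)

# First six cars only: engine displacement (X) and horsepower (Y)
disp = [160, 160, 108, 258, 360, 225]
hp = [110, 110, 93, 110, 175, 105]

r = correlation(disp, hp)
print(round(r, 3))
```

Because only six of the cars are used here, the value printed will not match the correlation for the full data set.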


Interpreting Correlation

· −1 ≤ r ≤ 1
· The magnitude (absolute value) of r tells us the strength
· The sign of r tells us the direction
· r = 1: the points make a perfect straight line with a positive slope
· r = −1: the points make a perfect straight line with a negative slope
· r = 0: there is no linear relationship at all

Notes:

· You can sometimes get high correlations even if the relationship isn't linear
· You should always see a scatterplot along with a correlation coefficient to know whether or not it's meaningful
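A small sketch can make these cases concrete. The numbers below are invented purely for illustration, and `correlation` is a hypothetical helper that applies the z-score formula for r. Two sets of points lie exactly on a line, one follows a cubic curve, and one follows a symmetric U shape.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation coefficient: r = sum(z_x * z_y) / (n - 1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

x = [1, 2, 3, 4, 5]                                 # made-up X values
r_up = correlation(x, [2 * v + 1 for v in x])       # points on a rising line
r_down = correlation(x, [10 - 3 * v for v in x])    # points on a falling line
r_cubic = correlation(x, [v ** 3 for v in x])       # curved, but still increasing

u = [-2, -1, 0, 1, 2]                               # made-up, symmetric around 0
r_u_shape = correlation(u, [v ** 2 for v in u])     # clear pattern, not linear

print(r_up, r_down, r_cubic, r_u_shape)
```

Up to floating-point rounding, the two straight-line cases give 1 and −1 and the U shape gives 0, even though its pattern is obvious in a plot. The cubic case gives a high r although the trend is not a straight line, which is why the scatterplot should always accompany the number.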


Figure: Strong Positive Linear Trend

Figure: Moderate Negative Linear Trend

Figure: Weak Positive Linear Trend

Figure: No Trend

Figure: Non-Linear Trend

Figure: Non-Linear Trend

Using the Correlation Coefficient

So how do we use r?

· First, make a scatterplot
· There needs to be a linear association, or r is meaningless
· Check the sign: is the relationship positive or negative?
· Check the value: how strong is the relationship?
· Are there outliers? The correlation is very sensitive to them.

Note:

· We often use the terms weak, moderate, and strong to describe the relationship, but these are up to interpretation.
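To see how sensitive r is to outliers, here is a toy sketch with invented numbers (not the car data). Six points lie exactly on a line, and then a single point far below the line is added. The helper `correlation` is a hypothetical name that applies the z-score formula for r.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation coefficient: r = sum(z_x * z_y) / (n - 1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

# Made-up points that fall exactly on the line y = 2x
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 12]
r_clean = correlation(xs, ys)

# The same points plus one unusual observation at (7, 0)
r_with_outlier = correlation(xs + [7], ys + [0])

print(round(r_clean, 3), round(r_with_outlier, 3))
```

In this toy data a single unusual point is enough to pull r from 1 down to 0.25, so the scatterplot is worth checking before reporting the number.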


Figure: Horsepower vs. Engine Displacement

Figure: Horsepower vs. Engine Displacement

More Properties of Correlation

· r is unitless
· r is not affected by changes of center or scale
· If we change units, the correlation will not change (e.g., lbs → kg)
· The correlation of X and Y is the same as the correlation between $z_x$ and $z_y$ (their z-scores)
· The correlation stays the same if we flip X and Y
· Correlation only applies to relationships between numeric variables. If there is an association involving categorical variables, it is not correlation.
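These properties can be checked numerically. The sketch below uses the `wt` and `mpg` columns of the six cars in the table shown earlier. It assumes `wt` is recorded in thousands of pounds, as in R's mtcars data set. The helper names `correlation` and `z_scores` are made up for this example.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def correlation(xs, ys):
    """Correlation coefficient: r = sum(z_x * z_y) / (n - 1)."""
    return sum(zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))) / (len(xs) - 1)

# First six cars only: weight (assumed to be in 1000s of lbs) and miles per gallon
wt = [2.620, 2.875, 2.320, 3.215, 3.440, 3.460]
mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1]

KG_PER_LB = 0.45359237
wt_kg = [w * 1000 * KG_PER_LB for w in wt]  # change of units: lbs -> kg

r_original = correlation(wt, mpg)
r_kg = correlation(wt_kg, mpg)                    # after changing units
r_z = correlation(z_scores(wt), z_scores(mpg))    # using z-scores as the data
r_flipped = correlation(mpg, wt)                  # X and Y swapped

print(r_original, r_kg, r_z, r_flipped)
```

All four values agree up to floating-point rounding, and they are negative, matching the negative association between weight and fuel efficiency.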


Figure: Weight (lbs) vs. Fuel Efficiency

Figure: Weight (kg) vs. Fuel Efficiency

Figure: Weight vs. Fuel Efficiency (Z-Scores)

Figure: Fuel Efficiency vs. Weight

In StatCrunch

Scatterplots:

1. Graph → Scatter Plot
2. X Column → Select your explanatory variable (X)
3. Y Column → Select your response variable (Y)
4. Compute!

Correlation:

1. Stat → Summary Stats → Correlation
2. Select Column(s) → Hold Shift/Ctrl/Command to select multiple variables (note: if you select more than two variables, it will find all pair-wise correlations)
3. Compute!


Correlation ≠ Causation

Most people are familiar with the phrase "correlation does not equal causation," but what does that really mean?

· Even if we find a correlation between two variables, it does not mean that one causes the other.
· This is especially common when two things both increase or decrease over time.
· Both may be caused by other, unknown variables.
· We call these unknown variables lurking variables or confounding variables.

For example:

· What if we looked at the correlation between national ice cream sales and the number of forest fires, recorded for each month of the year?


Figure: Ice Cream Sales and Forest Fires

Ice Cream Sales and Forest Fires

It certainly looks like there is a relationship.

· As the number of forest fires increases, the amount of ice cream being sold does as well
· If you open a pint of Ben & Jerry's, does this light a patch of brush in California?
· The more likely explanation is that there is at least one lurking variable

What could it be?

· Both could be related to the month in which the information was collected
· Additionally, certain months tend to be hotter and drier.
· Both of these conditions lead to people wanting ice cream and forest fires being easier to start


Figure: Forest Fires vs. Month

Figure: Ice Cream Sales vs. Month

Reporting Correlation

Employee Salaries and Productivity, r = 0.8:

· Bad: Raising salaries increases productivity.
· Good: Employees with higher salaries tend to be more productive.

Red Wine and Cholesterol, r = −0.99:

· Bad: This proves that drinking more red wine lowers cholesterol.
· Good: There is a strong negative association between red wine consumption and cholesterol level.

Parents' and Children's Education Levels (association, not correlation):

· Bad: A child who has two educated parents will graduate from college.
· Good: Children whose parents are educated are more likely to graduate from college.


Summary

· We can use scatterplots to find relationships between two numeric variables and to spot outliers
· The X variable is the explanatory variable
· The Y variable is the response variable
· A relationship between variables is called an association
· We can measure the strength of a linear relationship between two numeric variables using correlation
· Correlation doesn't necessarily imply causation; there may be lurking variables
