39
Linear Correlations Vignon S. Oussa December 7, 2011 Vignon S. Oussa () Linear Correlations December 7, 2011 1/8

Linear Correlations

  • Upload
    others

  • View
    15

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Linear Correlations

Linear Correlations

Vignon S. Oussa

December 7, 2011

Vignon S. Oussa () Linear Correlations December 7, 2011 1 / 8

Page 2: Linear Correlations

Content

Introduction

Measure of dispersion

Concept of linear correlation of variables

Computations in Mathematica

Misc

questions

Vignon S. Oussa () Linear Correlations December 7, 2011 2 / 8

Page 3: Linear Correlations

Content

Introduction

Measure of dispersion

Concept of linear correlation of variables

Computations in Mathematica

Misc

questions

Vignon S. Oussa () Linear Correlations December 7, 2011 2 / 8

Page 4: Linear Correlations

Content

Introduction

Measure of dispersion

Concept of linear correlation of variables

Computations in Mathematica

Misc

questions

Vignon S. Oussa () Linear Correlations December 7, 2011 2 / 8

Page 5: Linear Correlations

Content

Introduction

Measure of dispersion

Concept of linear correlation of variables

Computations in Mathematica

Misc

questions

Vignon S. Oussa () Linear Correlations December 7, 2011 2 / 8

Page 6: Linear Correlations

Content

Introduction

Measure of dispersion

Concept of linear correlation of variables

Computations in Mathematica

Misc

questions

Vignon S. Oussa () Linear Correlations December 7, 2011 2 / 8

Page 7: Linear Correlations

Content

Introduction

Measure of dispersion

Concept of linear correlation of variables

Computations in Mathematica

Misc

questions

Vignon S. Oussa () Linear Correlations December 7, 2011 2 / 8

Page 8: Linear Correlations

What does it mean when we say smoking causes lung cancer?

Does it mean you will get cancer if you smoke a single cigarette?

People smoke for several years but never had lung cancer

Statistical meaning (You are much likely to get lung cancer if yousmoke more than if you don�t smoke)

Studies showed a clear correlation between smoking and lung cancer.

These studies are bases on statistics

.

Vignon S. Oussa () Linear Correlations December 7, 2011 3 / 8

Page 9: Linear Correlations

What does it mean when we say smoking causes lung cancer?

Does it mean you will get cancer if you smoke a single cigarette?

People smoke for several years but never had lung cancer

Statistical meaning (You are much likely to get lung cancer if yousmoke more than if you don�t smoke)

Studies showed a clear correlation between smoking and lung cancer.

These studies are bases on statistics

.

Vignon S. Oussa () Linear Correlations December 7, 2011 3 / 8

Page 10: Linear Correlations

What does it mean when we say smoking causes lung cancer?

Does it mean you will get cancer if you smoke a single cigarette?

People smoke for several years but never had lung cancer

Statistical meaning (You are much likely to get lung cancer if yousmoke more than if you don�t smoke)

Studies showed a clear correlation between smoking and lung cancer.

These studies are bases on statistics

.

Vignon S. Oussa () Linear Correlations December 7, 2011 3 / 8

Page 11: Linear Correlations

What does it mean when we say smoking causes lung cancer?

Does it mean you will get cancer if you smoke a single cigarette?

People smoke for several years but never had lung cancer

Statistical meaning (You are much likely to get lung cancer if yousmoke more than if you don�t smoke)

Studies showed a clear correlation between smoking and lung cancer.

These studies are bases on statistics

.

Vignon S. Oussa () Linear Correlations December 7, 2011 3 / 8

Page 12: Linear Correlations

What does it mean when we say smoking causes lung cancer?

Does it mean you will get cancer if you smoke a single cigarette?

People smoke for several years but never had lung cancer

Statistical meaning (You are much likely to get lung cancer if yousmoke more than if you don�t smoke)

Studies showed a clear correlation between smoking and lung cancer.

These studies are bases on statistics

.

Vignon S. Oussa () Linear Correlations December 7, 2011 3 / 8

Page 13: Linear Correlations

What does it mean when we say smoking causes lung cancer?

Does it mean you will get cancer if you smoke a single cigarette?

People smoke for several years but never had lung cancer

Statistical meaning (You are much likely to get lung cancer if yousmoke more than if you don�t smoke)

Studies showed a clear correlation between smoking and lung cancer.

These studies are bases on statistics

.

Vignon S. Oussa () Linear Correlations December 7, 2011 3 / 8

Page 14: Linear Correlations

De�nitionA correlation exists between two variables when higher values of onevariable consistently go with higher values of another, or when highervalues of one variable consistently go with lower values of another.

ExampleThere is a correlation between the variables height and weight for people.Taller people trend to weight more than shorter people.

ExampleThere is a correlation between the variables demand for apples and price ofapples; demands tend to be lower when the price is higher.

ExampleThere is a correlation between practice time and skill among piano players;those who practice more tend to be more skilled

Vignon S. Oussa () Linear Correlations December 7, 2011 4 / 8

Page 15: Linear Correlations

Given x1, � � � , xN be N measurement of observations in population of Nindividuals with average µ.

Population mean µ =x1 + � � �+ xN

N= average.

Population Variance σ2 =∑Ni=1 (xi � µ)2

N

Pop Stand Deviation σ =p

σ2 =

s∑Ni=1 (xi � µ)2

N.

Vignon S. Oussa () Linear Correlations December 7, 2011 5 / 8

vignon
Typewritten Text
s
vignon
Typewritten Text
vignon
Typewritten Text
vignon
Typewritten Text
vignon
Typewritten Text
vignon
Typewritten Text
Page 16: Linear Correlations

Now let x1, � � � , xn be n observations in the sample of the originalpopulation with mean x

Sample mean x =x1 + � � �+ xn

n

Sample Variance s2 =∑ni=1 (xi � x)

2

n� 1

Sample Stand Deviation s =ps2 =

s∑ni=1 (xi � x)

2

n� 1 .

Vignon S. Oussa () Linear Correlations December 7, 2011 6 / 8

Page 17: Linear Correlations

Some Sample data

Company Revenue in 1998GM 178.1Wal-Mart 119.3IBM 78.5Boeing 45.8

Sample mean = 105.425

Sample variance = 3251.42

Sample standard deviation = 57.02

Vignon S. Oussa () Linear Correlations December 7, 2011 7 / 8

Page 18: Linear Correlations

De�nitionsBivariate data are data in which 2 variables are measured on anindividual.

We are interested in the fundamental question

ProblemHow can the value of one variable be used to predict the value of the othervariable?

Vignon S. Oussa () Linear Correlations December 7, 2011 8 / 8

Page 19: Linear Correlations

Scatter Diagrams and Correlation

4-2

Objectives fJ Draw and interpret scatter diagrams

S Describe the properties of the linear correlation coefficient

£J Compute and interpret the linear correlation coefficient

I!J Determine whether a linear relation exists between two variables

fa Explain the difference between correlation and causation

Page 20: Linear Correlations

4-33

IJ Dra'W' a:n.d Interpret Se!atter Diagra:m.s

Page 21: Linear Correlations

4-4

The reS()Onse variable is the variable whose value can be explained by the value of the eX()Ianatory or 1nedictor variable.

Page 22: Linear Correlations

4-5

A scatter diagram is a graph that shows the relationship between two quantita­tive variables measured on the same individual. Each individual in the data set is represented by a point in the scatter diagram. The explanatory variable is plotted on the h01izontal axis, and the response variable is plotted on the vertical axis.

Page 23: Linear Correlations

EXAMPLE Drawing and Interpreting a Scatter Diagram

The data shown to the right are based on a study for drilling rock. The researchers wanted to determine whether the time it takes to dry drill a distance of 5 feet in rock increases with the depth at which the drilling begins. So, depth at which drilling begins is the explanatory variable, x, and time (in minutes) to drill five feet is the response variable, y. Draw a scatter diagram of the data.Source: Penner, R., and Watts, D.G. “Mining Information.” The American Statistician, Vol. 45, No. 1, Feb. 1991, p. 6.

4-6

vignon
Callout
We want to test the linear correlation between the 2 variables.
vignon
Typewritten Text
vignon
Typewritten Text
Page 24: Linear Correlations

4-7

8-

(1.)

E_ jZ:

6- • • I

50

Drilling Time versus Depth

•• • • •

• • •

• •

I I I

100 150 200

Depth

vignon
Text Box
This graph can be obtained using Mathematica
Page 25: Linear Correlations

Various Types of Relations in a Scatter Diagram

Response

Response

•• ••• ••• • ••• •• • • • • •• Explanatory (a) Linear

••• -~···· •• •• •• • • •• •• Explanatory

(c) Nonlinear

Response

Response

•• • •• • • • •• • • • •• ••• • • Explanatory (b) Linear

•• • •• •••• •• • ••••• •• • • •• • • Explanatory

(d) Nonlinear

Response

• • • • • ••••• • • • •

Explanatory (e) No relation

vignon
Text Box
Be aware that a scatter diagram depends on the scaling of each axis.
Page 26: Linear Correlations
vignon
Typewritten Text
• Two variables that are linearly related are said to be positively associated if whenever the value of one variable increases, the value of the other variable also. • Two variables that are linearly related are said to be negatively associated when the value of one variable increases, the value of the other variable decreases
Page 27: Linear Correlations

4-10

fJ Des~ibe the Properties of the Linear Correlation Coeffi~ient

Page 28: Linear Correlations

4-11

The linear correlation coeflicient or Pearson product moment correlation coef­ficient is a measure of the strength and direction of the linear relation between two quantitative va1iables. We use the Greek letter p (rho) to represent the pop­ulation correlation coefficient and r to represent the sample conelation coeffi­cient. We present only the formula for the sample correlation coefficient.

Samtlle Linear Correlation Coefficient':'

r= n -1

where x is the sample mean of the explanatory variable

Sx is the sample standard deviation of the explanatory vmiable

y is the sample mean of the response vmiable

Sy is the sample standard deviation of the response va1iable

n is the number of individuals in the sample

(1)

vignon
Callout
The sign of r is dictated by these values.
vignon
Rectangle
Page 29: Linear Correlations

4-12

Properties of the Linear Correlation Coefficient

1. The linear correlation coefficient is always between -1 and 1, inclusive. That is, -1 < r < 1. 2. If r = + 1, then a perfect positive linear relation exists between the two vati­ables. See Figure 4(a). 3. If r = -1, then a perfect negative linear relation exists between the two vati­ables. See Figure 4( d). 4. The closer r is to + 1, the stronger is the evidence of positive association between the two variables. See Figures 4(b) and 4(c).

5. The closer r is to -1, the stronger is the evidence of negative association between the two variables. See Figures 4( e) and 4(f). 6. If r is close to 0, then little or no evidence exists of a linear relation between the two variables. Because the linear correlation coefficient is a measme of the strength of the linear relation, r close to 0 does not imt)ly no relation, just no linear relation. See Figures 4(g) and 4(h). 7. The linear correlation coefficient is a unitless measure of association. So the unit of measure for x and y plays no role in the interpretation of r. 8. The correlation coefficient is not resistant. Therefore, an observation that does not follow the overall pattern of the data could affect the value of the lin­ear correlation coefficient.

Page 30: Linear Correlations

4-13

• • • • • • • • • (;1) Perfect positive

linear relation, r = 1

I • • • •

• : I • • I • • • •

(e) Strong negative

linear relation, r = - 0.9

• •

I I •

! I ~ • • • I I

• • I I •

(b) Strong positive linear relation, r = 0.9

I • • • • • • •

• • • • • • I • • • • • • (I) Moderate negative

• I

linear relation, r = - 0.4

• I

• • I • • • • • • • •

• • I • •••• I • • • • •

(c) Moderate positive linear relation, r = 0.4

I • I •

• • I • • • • • • • • • •

(g) No linear relation, r close to 0.

I •

I •

• • I

• •

• • • • • • • • • • • (d) Perfect negative

linear relation, r = - 1

• • I • • • I • • • • • I I

(h) No linear relation, r close to 0.

I I

Page 31: Linear Correlations

4-14

fJ Co:tnpute and Interpret the Linear Correlation Coeffie!ient

Page 32: Linear Correlations

EXAMPLE Determining the Linear Correlation Coefficient

Determine the linear correlation coefficient of the drilling data.

4-15

vignon
Text Box
We will show how to use CAS to compute the linear correlation coefficient.
Page 33: Linear Correlations

4-16

Depth, x Time, y

35 5.88 -1.74712 -1.41633 2.474501

50 5.99 -1.45992 -1.27544 1.862051

75 6.74 -0.98126 -0.31486 0.308958

95 6.1 -0.59833 -1.13456 0.678839

120 7.47 -0.11967 0.620111 -0.07421

130 6.93 0.0718 -0.07151 -0.00513

145 6.42 0.358998 -0.72471 -0.26017

155 7.97 0.550463 1.260501 0.693859

160 7.92 0.646196 1.196462 0.773149

175 7.62 0.933394 0.812228 0.758129

185 6.89 1.12486 -0.12274 -0.13807

190 7.9 1.220592 1.170846 1.429126

126.25 6.985833 8.501037

sx = 52.2287 4 sv = 0. 78077 4

Page 34: Linear Correlations

17

18.501037

12 10.773

i i

x y

x x y ys s

rn

− − =

=−

=

4-17

Page 35: Linear Correlations

184-18

fJ Deter:tnine Whether a Linear Relation Exists bet'W'een TW'o Variables

Page 36: Linear Correlations

Does a Linear Relation Exist?

Since is fairly close to +1, we can say there is a positive linear relation between time to drill five feet and depth at which drilling begins.

194-19

vignon
Highlight
Page 37: Linear Correlations

204-20

fEJ Explain the Differen~e bet'Ween Correlation and Causation

Page 38: Linear Correlations

21

According to data obtained from the Statistical Abstract of the United States, the correlation between the percentage of the female population with a bachelor’s degree and the percentage of births to unmarried mothers since 1990 is 0.940.

Does this mean that a higher percentage of females with bachelor’s degrees causes a higher percentage of births to unmarried mothers?

Certainly not! The correlation exists only because both percentages have been increasing since 1990. It is this relation that causes the high correlation. In general, time series data (data collected over time) will have high correlations because each variable is moving in a specific direction over time (both going up or down over time; one increasing, while the other is decreasing over time).

When data are observational, we cannot claim a causal relation exists between two variables. We can only claim causality when the data are collected through a designed experiment.

4-21

vignon
Highlight
Page 39: Linear Correlations

Another way that two variables can be related even though there is not a causal relation is through a lurking variable.

A lurking variable is related to both the explanatory and response variable.

For example, ice cream sales and crime rates have a very high correlation. Does this mean that local governments should shut down all ice cream shops? No! The lurking variable is temperature. As air temperatures rise, both ice cream sales and crime rates rise.

4-22

vignon
Highlight