
Crash Course in Correlation and Regression


• MEASURING ASSOCIATION

• Establishing a degree of association between two or more variables gets at the central objective of the scientific enterprise. Scientists spend most of their time figuring out how one thing relates to another and structuring these relationships into explanatory theories.


Scatterplots

A. Scatter diagram

A list of 1,000 data points would be impossible to grasp, so we need a method that converts the data into a more comprehensible format. One method is to plot the data for two variables (education and income; father’s height and son’s height; team spending in baseball and % wins) against each other in a graph called a scatter diagram.
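To make the idea concrete, here is a minimal Python sketch that draws a rough text-mode scatter diagram. The helper name `ascii_scatter` and its `width`/`height` parameters are hypothetical, and the toy values are made up purely for illustration; a real analysis would normally use a plotting library instead.

```python
def ascii_scatter(xs, ys, width=40, height=12, mark="*"):
    """Render paired (x, y) data as a rough text scatter diagram (hypothetical helper)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1.0  # avoid dividing by zero for a constant variable
    y_span = (y_max - y_min) or 1.0
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value onto the character grid.
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = mark  # row 0 prints at the top, so flip the y axis
    # A "|" marks the y axis on every row; the final line is the x axis.
    return ["|" + "".join(r) for r in grid] + ["+" + "-" * width]


# Toy values for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 3, 5, 4, 6, 7, 9, 8]
print("\n".join(ascii_scatter(xs, ys, width=20, height=8)))
```

Even in this crude form, the upward drift of the points from lower left to upper right is visible at a glance, which is the whole point of a scatter diagram.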

[Figures: example scatter diagrams for r = 1.0, r = .85, r = .42, r = .17, r = -.94, r = -.54, and r = -.33]
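The example r values above can be reproduced approximately with simulated data. The sketch below assumes bivariate normal data (an assumption; the slides do not say how their plots were generated) and builds Y as r·X plus independent noise weighted by sqrt(1 - r²), which gives a population correlation of exactly r. The function names `simulate_pair` and `sample_r` are hypothetical.

```python
import random
from math import sqrt


def simulate_pair(target_r, n=20000, seed=0):
    """Draw n (x, y) pairs whose population correlation is target_r.

    Uses y = r*x + sqrt(1 - r^2)*e, with x and e independent standard normals.
    """
    if not -1.0 <= target_r <= 1.0:
        raise ValueError("target_r must lie in [-1, 1]")
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise_weight = sqrt(1.0 - target_r ** 2)
    ys = [target_r * x + noise_weight * rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def sample_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


for target in (1.0, 0.85, 0.42, 0.17, -0.94, -0.54, -0.33):
    xs, ys = simulate_pair(target)
    print(f"target r = {target:+.2f}   sample r = {sample_r(xs, ys):+.3f}")
```

Plotting each simulated pair should give clouds that tighten toward a straight line as r approaches +1 or -1 and become shapeless as r approaches 0; the sign only sets whether the cloud slopes up or down.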


• Formula for the Correlation Coefficient

$$r = \frac{\sum XY - N\bar{X}\bar{Y}}{\sqrt{\left(\sum X^2 - N\bar{X}^2\right)\left(\sum Y^2 - N\bar{Y}^2\right)}}$$
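The computational formula for r translates directly into code. This Python sketch (the function name `pearson_r` is illustrative) follows it term by term: ΣXY, N·X̄·Ȳ, ΣX², and ΣY².

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r via r = (ΣXY - N·X̄·Ȳ) / sqrt((ΣX² - N·X̄²)(ΣY² - N·Ȳ²))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - n * mean_x * mean_y
    denominator = sqrt((sum_x2 - n * mean_x ** 2) * (sum_y2 - n * mean_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable has zero variance")
    return numerator / denominator


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # about 0.775
print(pearson_r([1, 2, 3], [3, 2, 1]))              # -1.0 (perfect negative association)
```

This sum-of-products form is convenient for hand calculation, but with large values it can lose precision to floating-point cancellation; a version based on deviations from the means is mathematically equivalent and numerically safer.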


Interpreting correlation coefficients

• Ranges from -1 to +1. [Rules of thumb for the absolute value of r: 0 = no association; about .25 = weak; about .5 = moderate; .75 or above = strong]

• Squaring the correlation coefficient gives “R-squared,” the proportion of the variance of one variable accounted for by the other variable, a.k.a. a PRE statistic (Proportionate Reduction of Error)

• Which brings us to regression
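The PRE reading of R-squared can be checked numerically. In this sketch, which assumes an ordinary least-squares line (the usual bivariate fit), E1 is the squared error made by guessing the mean of Y for every case, E2 is the squared error left after using the fitted line, and (E1 - E2)/E1 comes out equal to r². The function name `pre_r_squared` is hypothetical and the data are toy values.

```python
from math import sqrt


def pre_r_squared(xs, ys):
    """Return (r**2, PRE) for a bivariate least-squares fit.

    PRE = (E1 - E2) / E1, where E1 is the squared error from predicting every Y
    with the mean of Y, and E2 is the squared error from the fitted line.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    e1 = syy  # error when the mean of Y is the only information used
    e2 = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    pre = (e1 - e2) / e1
    return r ** 2, pre


print(pre_r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # both values about 0.6
```

Here knowing X cuts the prediction error for Y from 6.0 to 2.4, a 60 percent reduction, which is exactly r².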


MLB spending and performance example (Hoover & Donovan 2001):

$$Y\ [\text{team finish}] = \alpha + \beta X\ [\text{spending}]$$

Expressing the model in words: values of the Y variable (team finish: 1st place, 2nd place, etc.) are a function of some constant (α), plus some amount (β) of the X variable (spending).

How much change in the Y variable (team finish) is associated with a change in the X variable (spending)? The answer lies in β (beta), a.k.a. the regression coefficient. In the baseball example, it is the amount of improvement in team finish associated with an additional $1 million spent on players’ salaries.
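Assuming α and β are estimated by ordinary least squares (the slides do not name the estimation method, but OLS is the standard choice for this kind of bivariate model), they can be computed from deviations about the means: β = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)² and α = Ȳ - βX̄. The function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of alpha and beta in Y = alpha + beta*X."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X must vary to estimate a slope")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx          # change in Y per one-unit change in X
    alpha = my - beta * mx    # the line passes through the point (mean X, mean Y)
    return alpha, beta


print(fit_line([0, 1, 2], [1, 3, 5]))              # (1.0, 2.0): the data lie exactly on Y = 1 + 2X
print(fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # roughly (2.2, 0.6)
```

In the baseball model, X would be spending in $ millions and Y team finish, so β is the change in finish per extra $1 million.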


Using 1999 MLB season data and a bivariate regression, Hoover and Donovan found:

Team finish = 4.4 - 0.03 × spending (in $ millions)

Interpretation: The beta (a.k.a. the slope) of -0.03 describes the relationship between spending and team finish: each additional $1 million a team spends is associated with an improvement of only 0.03 positions (3 percent of one place) in division finish. Because a lower number means a better finish, the negative sign indicates that more spending goes with a better finish. By this equation, a team spending $70 million on players would be predicted to finish close to second place (4.4 - 0.03 × 70 = 2.3). Likewise, a team would have to spend roughly $33 to $34 million more to improve its finish by one position (-0.03 × 34 = -1.02).

The correlation was -0.39, which means that spending explains only about 15 percent of the variation in team finish (r-squared = (-0.39) × (-0.39) ≈ .15).
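The arithmetic in this interpretation can be checked with a few lines of Python. The coefficients are the ones reported above; the function names are hypothetical, and because the published coefficients are rounded, the results are approximate.

```python
# Coefficients reported by Hoover and Donovan for the 1999 MLB season:
# team finish = 4.4 - 0.03 * spending (spending in $ millions; 1 = first place).
ALPHA = 4.4
BETA = -0.03
R = -0.39


def predicted_finish(spending_millions):
    """Predicted division finish for a given payroll (lower is better)."""
    return ALPHA + BETA * spending_millions


def spending_per_place():
    """Extra spending ($ millions) associated with moving up one place."""
    return 1 / abs(BETA)


print(round(predicted_finish(70), 2))  # 2.3, i.e. close to second place
print(round(spending_per_place(), 1))  # 33.3
print(round(R ** 2, 2))                # 0.15
```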


Another Baseball Example

• Testing Causality Between Team Performance and Payroll: The Cases of Major League Baseball and English Soccer

• By Stephen Hall, Stefan Szymanski and Andrew S. Zimbalist

• Journal of Sports Economics, 2002


Multiple Regression

Multiple regression has a single dependent variable and two or more independent variables. It is particularly appropriate when the causes (independent variables) are intercorrelated, which is usually the case.


Multivariate regression (strictly speaking, multiple regression) is a powerful tool for examining how several factors (independent variables) influence a dependent variable.

It differs from bivariate regression in that it can identify the independent effect of each variable on the dependent variable while holding all other variables in the model constant.

What other variables would we include in the baseball model to predict winning %?
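A minimal sketch of what “holding other variables constant” does, assuming OLS with two predictors. The toy data are constructed (hypothetically) so that Y = 1 + 2·X1 + 3·X2 exactly, with X1 and X2 correlated. The bivariate slope of Y on X1 alone absorbs part of X2's effect, while the multiple-regression coefficient recovers the 2 that belongs to X1. The function names are illustrative.

```python
def fit_two_predictors(x1, x2, y):
    """OLS estimates (a, b1, b2) for Y = a + b1*X1 + b2*X2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]  # deviations from the means
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * w for u, w in zip(d1, d2))
    s1y = sum(u * w for u, w in zip(d1, dy))
    s2y = sum(u * w for u, w in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("X1 and X2 are perfectly collinear")
    # Each slope is the effect of one predictor with the other held constant.
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


def bivariate_slope(x, y):
    """OLS slope of Y on X alone, ignoring every other variable."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (w - my) for u, w in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    return sxy / sxx


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]    # correlated with x1
y = [9, 8, 19, 18, 29]  # toy data: exactly 1 + 2*x1 + 3*x2
a, b1, b2 = fit_two_predictors(x1, x2, y)
print(round(a, 6), round(b1, 6), round(b2, 6))  # 1.0 2.0 3.0
print(round(bivariate_slope(x1, y), 6))         # 5.0: X1 gets credit for part of X2's effect
```

In the baseball setting, X2 might be one of the additional variables the question above asks about; leaving it out would let spending take credit for its effect.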

[Figure 1: diagram of variables Y, X1, and X2]

[Figure 2: diagram of variables X1, Y, and X2, with shared area c]


In Figure 1, X1 and X2 do not overlap, which means they are not correlated with each other, although each is correlated with Y. This is the easy case: we don’t need sophisticated analysis, just two separate bivariate regressions.

In Figure 2, X1 and X2 are correlated. Area c is created by the correlation between X1 and X2; it represents the proportion of the variance in Y that is shared jointly with X1 and X2.

How do we deal with c? We can’t count it twice, or the explained variation in Y could exceed 100%. The solution is multivariate regression.
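The double-counting problem can be shown numerically. For OLS with two predictors, R² can be computed from the three pairwise correlations as R² = (r_y1² + r_y2² - 2·r_y1·r_y2·r_12) / (1 - r_12²). When r_12 = 0 (the Figure 1 case) this reduces to the sum of the two bivariate r² values. When X1 and X2 are correlated (the Figure 2 case), simply adding the bivariate r² values counts area c twice. The toy data and function names below are hypothetical.

```python
from math import sqrt


def corr(u, v):
    """Pearson correlation from deviations about the means."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


def r_squared_two_predictors(x1, x2, y):
    """R-squared for Y regressed on X1 and X2 (OLS), from pairwise correlations."""
    ry1, ry2, r12 = corr(y, x1), corr(y, x2), corr(x1, x2)
    if abs(r12) == 1:
        raise ValueError("X1 and X2 are perfectly collinear")
    return (ry1 ** 2 + ry2 ** 2 - 2 * ry1 * ry2 * r12) / (1 - r12 ** 2)


# Figure 1 case: X1 and X2 uncorrelated, so the bivariate r-squared values simply add.
a1, a2, ya = [1, -1, 1, -1], [1, 1, -1, -1], [3, 1, 1, -2]
print(round(corr(ya, a1) ** 2 + corr(ya, a2) ** 2, 4))  # about 0.9804
print(round(r_squared_two_predictors(a1, a2, ya), 4))   # about 0.9804

# Figure 2 case: X1 and X2 correlated (toy data with Y = 1 + 2*X1 + 3*X2 exactly).
b1, b2, yb = [1, 2, 3, 4, 5], [2, 1, 4, 3, 6], [9, 8, 19, 18, 29]
print(round(corr(yb, b1) ** 2 + corr(yb, b2) ** 2, 2))  # about 1.81: area c counted twice
print(round(r_squared_two_predictors(b1, b2, yb), 2))   # 1.0: the joint model cannot exceed 100%
```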