Data Analysis: Relationships Continued Regression Research Methods Dr. Gail Johnson


Data Analysis: Relationships Continued

Regression

Research Methods

Dr. Gail Johnson

Simple Regression

• Enables us to estimate:
  • The strength of the relationship, expressed as the percent of variance explained
  • How much change you can expect in the dependent variable based on a one-unit change in the independent variable
• Enables you to make predictive estimates

Relationships

Correlation is not causation.

Statistical measurement includes the measurement of relationships. There are 2 ways to measure the strength of a relationship:

1. How great a difference the independent variable makes on the dependent variable (sometimes called the effect-description). This allows you to predict the effect of the IV on the DV but you have to have interval data.

Relationships

2. How completely the dependent variable is explained by the independent variable (correlational; measured by R squared).

Simple Regression

• Assumes a linear relationship

• Interval level (or dichotomous: which means coded 0 or 1) data

• Independent Variable: interval level

• Random sample or census

Simple Regression

Y = a + bX + error

Where:
a = the constant, or Y intercept
b = the regression coefficient, or slope
Y = predicted value of the dependent variable
X = the independent variable

Simple Regression

• Estimate car repair costs for the motor pool
  Y = car repair costs
  X = miles driven

• Collect data and crunch it. You get these results:

Y = -267 + .018X

Simple Regression

• Estimate car repair costs

Y = -267 + .018X

• Interpretation: for every mile driven, repair costs go up by 1.8 cents. For every 100 miles driven, costs go up by $1.80.

Simple Regression

• Y = -267 + .018X

• If you expect the cars to be driven a total of 100,000 miles, how much will car repair costs likely be? 100,000 x .018 = $1,800

• Solve equation: Y = -267 + 1,800 = $1,533
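The motor pool estimate can be checked with a few lines of Python. This is a minimal sketch that plugs the slide's coefficients into Y = a + bX; the function name `predict_repair_cost` is an illustrative assumption, not part of the original material.

```python
def predict_repair_cost(miles, intercept=-267.0, slope=0.018):
    """Predicted repair cost in dollars for a given number of miles driven."""
    return intercept + slope * miles


# Mileage-driven portion of the cost for 100,000 miles.
mileage_portion = 0.018 * 100_000

# Full prediction: the intercept plus the mileage portion.
total = predict_repair_cost(100_000)

print(f"mileage portion: ${mileage_portion:,.0f}")
print(f"predicted cost:  ${total:,.0f}")
```

Because the intercept is negative, the equation returns a negative cost for low mileage (below roughly 14,800 miles), so it is best read as an estimate for mileages in the range the data covered.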

Simple Regression

r = correlation coefficient (overall fit; a measure of association, but non-directional; the zero-order correlation coefficient)

r² = proportion of explained variation

1 - r² = proportion of unexplained variation
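To make these quantities concrete, here is a minimal sketch that fits Y = a + bX by ordinary least squares and reports r, r squared, and the unexplained share. The data points and the function name `simple_regression` are hypothetical, invented only for illustration; they are not the motor pool data behind the slides.

```python
from math import sqrt

# Hypothetical (miles driven, repair cost) pairs, invented for illustration only.
miles = [10_000, 20_000, 30_000, 40_000, 50_000]
costs = [50.0, 120.0, 260.0, 430.0, 640.0]


def simple_regression(x, y):
    """Return intercept a, slope b, and correlation r for Y = a + bX."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    b = sxy / sxx                # slope
    a = mean_y - b * mean_x      # intercept
    r = sxy / sqrt(sxx * syy)    # correlation coefficient
    return a, b, r


a, b, r = simple_regression(miles, costs)
r_squared = r ** 2  # proportion of explained variation

print(f"Y = {a:.1f} + {b:.4f}X")
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}, unexplained = {1 - r_squared:.3f}")
```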

Life is more complex

• Rarely will any one single variable cause something to happen

• Life is inherently multivariate

• What are the possible causes for urban decay?

What are the possible causes for urban decay?

• lack of jobs
• high % of absentee landlords
• low % of homeowners
• poor quality of schools
• increased concentration of poor
• increase in drugs, crime
• aging housing stock
• flight of middle class to suburbs
• corruption
• aging infrastructure
• business flight to suburbs

What caused drop in crime?

• Changing demographics?
• Better policing?
• Strong economy?
• Gun control laws?
• Concealed weapons laws?
• Increased use of death penalty?
• Increase in number of police?
• Rising prison population?
• Waning crack epidemic?
• Legalization of abortion?

Multiple Regression

• Multiple regression lets you do four things:
  • Test your hypothesis
  • Predict the dependent variable if you know the values of the independent variables
  • Estimate the independent effect of each independent variable while controlling for the others
  • Gauge the relative strength of each independent variable using the beta weights

Multiple Regression

Y = a + b1X1 + b2X2 + b3X3 + b4X4 + e

Y = dependent variable
X1 = independent variable 1, controlling for X2, X3, X4
X2 = independent variable 2, controlling for X1, X3, X4
X3 = independent variable 3, controlling for X1, X2, X4
X4 = independent variable 4, controlling for X1, X2, X3

Multiple Regression

Income as a function of education and seniority?

Y = income (dependent variable)

Y (income) = a + b1(education) + b2(seniority)

Y = 6000 + 400X1 + 200X2, where X1 = years of education and X2 = years of seniority (based on Lewis-Beck example)

Multiple Regression

Y= 6000 + 400X1 + 200X2

R squared = .67

67% of the variation in income is explained by these two variables. Excellent!

For every year of education, holding seniority constant, income increases by $400.

For every year of seniority, holding education constant, income increases by $200.

Multiple Regression

Y= 6000 + 400X1 + 200X2

Example:

Estimate the income of someone who has 10 years of education and 5 years of seniority:

Y = 6000 + 400(10) + 200(5)

Y = $11,000
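The same calculation, and the meaning of "holding the other variable constant", can be sketched in Python. The function name `predict_income` and its parameter names are illustrative assumptions; the coefficients are the ones on the slide.

```python
def predict_income(education_years, seniority_years,
                   intercept=6000, b_education=400, b_seniority=200):
    """Predicted income in dollars from Y = 6000 + 400*X1 + 200*X2."""
    return (intercept
            + b_education * education_years
            + b_seniority * seniority_years)


# The worked example: 10 years of education, 5 years of seniority.
income = predict_income(10, 5)
print(f"predicted income: ${income:,}")

# Holding seniority constant, one more year of education changes the
# prediction by exactly the education coefficient.
education_effect = predict_income(11, 5) - predict_income(10, 5)

# Holding education constant, one more year of seniority changes the
# prediction by exactly the seniority coefficient.
seniority_effect = predict_income(10, 6) - predict_income(10, 5)

print(f"one more year of education: ${education_effect}")
print(f"one more year of seniority: ${seniority_effect}")
```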

Multiple Regression

Campaign contributions as a function of age and income?

Y = campaign contribution (dollars)

X1 = age (years)

X2 = income (dollars)

Multiple Regression

Campaign contributions as a function of age and income.

Y = 8 + 2X1 + .010X2, where X1 = age and X2 = income

For every additional year of age, holding income constant, contributions go up by $2.

For every additional dollar of income, holding age constant, contributions go up by $.01.

Multiple Regression

Y = 8 + 2X1 + .010X2

Y= campaign contribution (dollars)

But which is stronger?

Need to look at the Beta weights

Age = .15

Income = .45
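A beta weight is a regression coefficient rescaled into standard-deviation units: beta = b x (standard deviation of X / standard deviation of Y). That rescaling is why income can have a tiny coefficient per dollar yet the larger beta weight. The sketch below illustrates the conversion. The standard deviations in it are hypothetical, are not given in the source, and were chosen only so the arithmetic lands on the slide's weights; the function name `beta_weight` is also an assumption.

```python
def beta_weight(b, sd_x, sd_y):
    """Standardized coefficient: b rescaled to standard-deviation units."""
    return b * sd_x / sd_y


# Hypothetical standard deviations (NOT from the source), chosen only so the
# arithmetic lands on the slide's beta weights of .15 and .45.
sd_contribution = 200.0   # dollars
sd_age = 15.0             # years
sd_income = 9000.0        # dollars

# Unstandardized coefficients from Y = 8 + 2X1 + .010X2.
beta_age = beta_weight(2, sd_age, sd_contribution)
beta_income = beta_weight(0.010, sd_income, sd_contribution)

print(f"beta(age) = {beta_age:.2f}, beta(income) = {beta_income:.2f}")
```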

Beta Weights

• Workbook: Table 10.15

Quick Analysis

• Whenever you are dealing with a correlation or regression analysis:
  • First check the R squared value.
  • A good study will report it.
  • If it is low, then you know that it is not a strong model and the authors shouldn't be making grand conclusions.
  • Make sure the study meets the 4 conditions necessary for causality.

Example

• A study tried to determine why some cities introduced reinvention.
• Sent out a survey; respectable response rate
• Tested 13 factors they thought would explain reinvention
• R squared was .05
• What do you conclude?

Example

• They ran a second model

• Included managers’ attitudes about innovation

• R squared was .22

• What do you conclude?

The Levitt Article

• What data does he show?

• What kind of question is he asking?

• Does he show correlations?

• Does he build a multivariate model?

• Did anyone see an R squared in this article?