1 Regression Analysis: Outline • Review on Regression Analysis • Regression with Categorical explanatory variables • Pooled Regression: Fixed Effect and Random Effect models

Regression Analysis: Outline

• Review on Regression Analysis

• Regression with Categorical explanatory variables

• Pooled Regression: Fixed Effect and Random Effect models

Regression Analysis in the overall context of Research

• Research Purpose
  – Research questions, objectives, hypotheses

• Methodology
  – Type of study
  – Sampling plan and sample size determination
  – Data collection methods
  – Data analysis plan

• Execution
  – Data collection and data analysis
  – Discussion and conclusion
  – Research evaluations

Regression Analysis: Review

• What is Regression?
  – Dependence measure ~ estimate the overall relationships between the dependent and independent variables
  – Examples of dependent and independent variables?
  – Regression and Causality (~ experiment, theory)
  – Regression (~ predict dependent) and Correlation (~ linear association)

• Uses of Regression
  – Descriptive ~ describe the relationship and how strong it is
  – Inference ~ which variables are most important/significant?
  – Predictive ~ forecasting
  – Hypothesis testing

• Sample Size

Type of Variables in Regression Analysis

• Independent

• Dependent

• Moderating

• Mediating

• Moderation-mediation

Moderating Variables

• Moderating Variables

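• Testing Moderation
  – Y = b0 + b1*X + b2*Z + b3*XZ + e
  – Rearranged (omitting e): Y = [b1 + b3*Z] X + [b0 + b2*Z], so both the slope and the intercept of Y on X depend on the moderator Z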
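The rearranged moderation equation can be read as a separate line of Y on X for each value of the moderator Z. A minimal sketch of that reading follows; the function name `conditional_line` and the coefficient values are hypothetical and chosen only for illustration.

```python
def conditional_line(b0, b1, b2, b3, z):
    """Return (intercept, slope) of Y on X at a fixed moderator value z.

    Follows Y = [b1 + b3*Z] X + [b0 + b2*Z] from the moderation model
    Y = b0 + b1*X + b2*Z + b3*X*Z + e (error term omitted).
    """
    intercept = b0 + b2 * z
    slope = b1 + b3 * z
    return intercept, slope


# Hypothetical coefficients, not estimates from any real data set.
b0, b1, b2, b3 = 1.0, 0.5, 2.0, 0.25

for z in (0, 1, 4):
    intercept, slope = conditional_line(b0, b1, b2, b3, z)
    print(f"Z = {z}: Y = {slope:.2f}*X + {intercept:.2f}")
```

If b3 is zero the slope no longer changes with Z, which is why testing moderation amounts to testing the coefficient on the XZ term.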

Mediator Variables

• Mediator Variables

[Figure: mediation path diagram with the labels Attitude, B and BI, and paths a, b and c]

Multivariate Research Methods: Regression Analysis: Review

• How does it work?

• Formalization of the regression model:
  – y = b0 + b1*X1 + b2*X2 + … + bk*Xk + error
  – b0 + b1*X1 + … + bk*Xk is the systematic part; the error is the unsystematic part
  – intercept, slope, error
  – Examples??

• What do we observe? Y and the X's; we estimate the b's

• Which variables to include?
  – Theory, prior research, common sense
  – If you don't have any idea?
    » statistical criteria: stepwise, forward and backward (in cases of only metric data??)

• Moderator Effects ~ Interaction Variables

• How to Obtain Estimates?
  – Least squares method of regression
  – Any straight line you fit will have some error
  – The objective is to minimize those errors, e.g. the sum of squared differences between Y and Y-predicted
  – Or: minimize the sum of squared errors
  – Y = a + b*X + e leads to e = Y - a - b*X
  – e^2 = (Y - a - b*X)^2 ~ minimize the sum of e^2
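The least squares idea for Y = a + b*X + e can be sketched with the closed-form solution for one explanatory variable. The data values below are made up for illustration, and `fit_line` and `sum_squared_errors` are hypothetical helper names.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for Y = a + b*X + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sum_squared_errors(x, y, a, b):
    """Sum of e^2 = (Y - a - b*X)^2 over all observations."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


# Made-up observations.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(x, y)
sse = sum_squared_errors(x, y, a, b)
print(f"a = {a:.3f}, b = {b:.3f}, sum of squared errors = {sse:.3f}")

# Any other line has a sum of squared errors at least as large.
print(sum_squared_errors(x, y, a + 0.1, b) >= sse)
print(sum_squared_errors(x, y, a, b + 0.1) >= sse)
```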

Multivariate Research Methods: Regression Analysis: Review

• Interpretation of parameter estimates?

• Intercept
  – Mean of the dependent variable when the values of all independent variables are zero
  – Mean of the dependent variable when all slopes are zero
  – Not always meaningful

• Slopes
  – Change in Y as we change X by one unit
  – Zero slope ⇒ X does not affect Y

• b1, b2, …, bk: partial regression coefficients
  – e.g. b1 = change in the value of Y if X1 is changed by one unit while all other explanatory variables (X2 … Xk) are kept constant

Multivariate Research Methods: Regression Analysis: Review

• Interpretation of parameter estimates?

• Size of the regression coefficient
  – Depends on the scale of the explanatory variable
  – The size of a coefficient is therefore not a good indicator of whether a variable is a good explanatory variable
  – Scale of the independent variables ~ within 10 times

• Beta coefficients (standardized coefficients)
  – Provide relative importance

• Elasticity: measures the percentage change in the dependent variable for a 1% change in the independent variable
  – $\text{elasticity} = b \cdot \bar{X} / \bar{Y}$
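A small sketch of why the raw coefficient depends on the scale of X while the beta coefficient and the elasticity do not. It assumes the usual definitions (beta = b * sd(X) / sd(Y); elasticity evaluated at the means = b * mean(X) / mean(Y)). The price and sales figures are invented for illustration.

```python
from statistics import mean, stdev


def slope(x, y):
    """Least-squares slope b of Y on X."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Invented data: the same prices expressed in cents and in dollars.
price_cents = [100, 150, 200, 250, 300]
price_dollars = [p / 100 for p in price_cents]
sales = [50, 44, 41, 33, 30]

b_cents = slope(price_cents, sales)
b_dollars = slope(price_dollars, sales)      # 100 times larger in size

# Standardized (beta) coefficients are unaffected by the unit change.
beta_cents = b_cents * stdev(price_cents) / stdev(sales)
beta_dollars = b_dollars * stdev(price_dollars) / stdev(sales)

# Elasticity at the means is also unit-free.
elasticity_cents = b_cents * mean(price_cents) / mean(sales)
elasticity_dollars = b_dollars * mean(price_dollars) / mean(sales)

print(b_cents, b_dollars)
print(beta_cents, beta_dollars)
print(elasticity_cents, elasticity_dollars)
```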

Multivariate Research Methods: Regression Analysis: Review

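• Is the regression coefficient significant?

• Is the regression significant?

• Overall goodness of fit?
  – r2: $r^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$, with $0 \le r^2 \le 1$
  – r ~ coefficient of multiple correlation
  – adjusted r2

[Figure: scatter of Y against X with the fitted line Y = b0 + bX, showing TSS split into ESS and RSS (error)]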
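The goodness-of-fit quantities can be computed directly from a fitted line. In the sketch below TSS is the total, ESS the explained and RSS the residual (error) sum of squares. The adjusted r2 formula used, 1 - (1 - r2)(n - 1)/(n - k - 1) with k explanatory variables, is the common textbook definition and is an assumption here, since the slide does not spell it out. The data are made up.

```python
def sums_of_squares(x, y):
    """Return (TSS, ESS, RSS) for the least-squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b * x_bar
    fitted = [b0 + b * xi for xi in x]
    tss = sum((yi - y_bar) ** 2 for yi in y)
    ess = sum((fi - y_bar) ** 2 for fi in fitted)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return tss, ess, rss


def adjusted_r2(r2, n, k):
    """Adjusted r2 for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Made-up observations.
x = [1, 2, 3, 4, 5, 6]
y = [1.2, 2.3, 2.9, 4.1, 4.8, 6.3]

tss, ess, rss = sums_of_squares(x, y)
r2 = ess / tss                       # equals 1 - rss / tss
r2_adj = adjusted_r2(r2, n=len(x), k=1)
print(f"r2 = {r2:.4f}, adjusted r2 = {r2_adj:.4f}")
```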

Multivariate Research Methods: Regression Analysis: Review

Major assumptions
• Heteroscedasticity: the variance of the error term is constant (i.e. no heteroscedasticity)
• Autocorrelation: there is no autocorrelation in the error term
• Multicollinearity: there is no exact linear relationship among the independent variables
• There must be variability in the independent variables
• The regression model is correctly specified
• The regression model is linear in parameters
• The mean value of the error term is zero
• No covariation between the errors and the independent variables
• The error term is normally distributed

Multivariate Research Methods: Regression Analysis: Review

• Detecting problems with the assumptions?

• Heteroscedasticity
  – Error variances are not the same
  – Occurs when errors are related to either the dependent or the independent variables
  – e.g. more stable saving (or consumption) among lower-income families; larger variances among brand switchers than among brand-loyal customers

[Figure: saving against income, annotated with its variance]

• Remedy?? If we know the nature of the heteroscedasticity, we can use weighted least squares (WLS)

• Volatility ~ Finance ??
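A minimal sketch of the WLS remedy for one explanatory variable. It assumes, purely for illustration, that the error standard deviation is proportional to income, so each observation is weighted by 1/income^2. The numbers and the name `wls_line` are hypothetical.

```python
def wls_line(x, y, w):
    """Weighted least squares (a, b) for Y = a + b*X + e with weights w."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Invented data: saving becomes more variable as income grows.
income = [10, 20, 30, 40, 50]
saving = [1.1, 1.8, 3.5, 3.0, 6.5]

# Assumed variance structure: Var(e) proportional to income^2.
weights = [1 / inc ** 2 for inc in income]

a_ols, b_ols = wls_line(income, saving, [1.0] * len(income))  # equal weights = OLS
a_wls, b_wls = wls_line(income, saving, weights)
print(f"OLS: a = {a_ols:.3f}, b = {b_ols:.4f}")
print(f"WLS: a = {a_wls:.3f}, b = {b_wls:.4f}")
```

Observations with a large assumed error variance get a small weight, so the noisy high-income points pull the fitted line less than they do under ordinary least squares.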

Regression Analysis : Review

• Detecting problems with the assumptions?

• Autocorrelation ~ more a time-series problem
  – Occurs when the errors of consecutive observations are correlated
  – Reasons?
    » Omitted variables
    » Model mis-specification
  – Detection
    » Graphical methods
    » Durbin-Watson ~ DW = 2(1 - r); DW varies between 0 and 4; the ideal value is 2
  – Problem?
    » Overestimates the coefficient of determination and underestimates the standard errors

[Figure: plots of Y against X and of e(t) against e(t-1), showing positive and negative autocorrelation]
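The Durbin-Watson statistic can be computed from the residuals as the sum of squared successive differences divided by the sum of squared residuals. The sketch below also computes the lag-1 autocorrelation r to show that DW = 2(1 - r) is an approximation. Both residual series are artificial.

```python
def durbin_watson(e):
    """DW = sum (e_t - e_{t-1})^2 / sum e_t^2."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v ** 2 for v in e)
    return num / den


def lag1_autocorrelation(e):
    """Approximate first-order autocorrelation r of the residuals."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(v ** 2 for v in e)
    return num / den


# Artificial residual series.
positive = [1, 1, 1, 1, 1, -1, -1, -1, -1, -1]    # long runs of the same sign
negative = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]    # sign flips every period

for name, e in (("positive", positive), ("negative", negative)):
    dw = durbin_watson(e)
    r = lag1_autocorrelation(e)
    print(f"{name}: DW = {dw:.2f}, 2(1 - r) = {2 * (1 - r):.2f}")
```

Values well below 2 point to positive autocorrelation and values well above 2 to negative autocorrelation.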

Multivariate Research Methods: Regression Analysis: Review

• Detecting problems with the assumptions?

• Multicollinearity
  – Presence of very high interrelations among the explanatory variables (does not violate any assumption)
  – Symptoms: the standard errors are likely to be high; estimates are not reliable?

• Detection
  – Bivariate correlation
  – Variance Inflation Factor (VIF) ~ 10: $VIF_i = \frac{1}{1 - r_i^2}$
  – Tolerance = 1/VIF

• Remedies
  – Drop variables
  – Composite variables, e.g. family life cycle, social status
  – Factor analysis

[Figure: two diagrams relating Y to X1 and X2]
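A sketch of the VIF calculation for the special case of two explanatory variables, where the r2 in the VIF formula is simply the squared bivariate correlation between them. With more than two predictors, that r2 would instead come from regressing one predictor on all the others, which this sketch does not cover. The data are made up.

```python
import math


def correlation(u, v):
    """Pearson correlation between two equally long lists."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    suv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    suu = sum((ui - u_bar) ** 2 for ui in u)
    svv = sum((vi - v_bar) ** 2 for vi in v)
    return suv / math.sqrt(suu * svv)


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - r^2) when the model has exactly two predictors."""
    r = correlation(x1, x2)
    return 1 / (1 - r ** 2)


# Made-up predictors: x2 nearly duplicates x1, x3 is unrelated to x1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9]
x3 = [3, 1, 2, 2, 1, 3]

vif_high = vif_two_predictors(x1, x2)
vif_low = vif_two_predictors(x1, x3)
print(f"x1 with x2: VIF = {vif_high:.1f}, tolerance = {1 / vif_high:.4f}")
print(f"x1 with x3: VIF = {vif_low:.1f}, tolerance = {1 / vif_low:.4f}")
```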

Multivariate Research Methods: Regression Analysis: Review

• Detecting problems with the assumptions?

• Linear in parameters
  – Y = a + b*X^2 + e ~ linear in parameters but non-linear in variables
  – Y = a + b^2*X1 + b*X2 + e ~ non-linear in parameters: non-linear regression

• The regression model is correctly specified
  – Functional form, e.g. new consumer durable sales

• Influential observations
  – Outliers
  – Whether one or a few observations??

Regression Analysis: Review

• Outliers: In linear regression, an outlier is an observation with a large residual. Problem with dependent variable??

• Leverage: An observation with an extreme value on an independent variable is called a point with high leverage. Leverage is a measure of how far an independent variable deviates from its mean. These leverage points can have an unusually large effect on the estimates of the regression coefficients.

• Influence: An observation is said to be influential if removing the observation substantially changes the estimate of coefficients.

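• Detection
• RESIDUAL CHECK
  – Standardized residual: $e_i^* = \frac{e_i}{s\sqrt{1 - h_{ii}}}$
  – Studentized residual: $e_i^* = \frac{e_i}{s_{(i)}\sqrt{1 - h_{ii}}}$
  – Here $s$ is the residual standard error, $h_{ii}$ the leverage of observation $i$, and $s_{(i)}$ the residual standard error with observation $i$ left out
  – Problem approx.: abs. value > 2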
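The residual check can be sketched for a single explanatory variable, where the leverage has the closed form h_ii = 1/n + (x_i - mean(x))^2 / Sxx. The definitions used (standardized residual with the overall residual standard error, studentized residual with the standard error recomputed without observation i) are the common textbook ones and are assumed here. The data are invented, with one planted outlier.

```python
import math


def residual_diagnostics(x, y):
    """Return (leverages, standardized residuals, studentized residuals)."""
    n, p = len(x), 2                    # p = intercept + one slope
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    e = [yi - a - b * xi for xi, yi in zip(x, y)]
    h = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    s2 = sum(ei ** 2 for ei in e) / (n - p)

    standardized, studentized = [], []
    for ei, hi in zip(e, h):
        standardized.append(ei / math.sqrt(s2 * (1 - hi)))
        # Residual variance with observation i deleted.
        s2_deleted = ((n - p) * s2 - ei ** 2 / (1 - hi)) / (n - p - 1)
        studentized.append(ei / math.sqrt(s2_deleted * (1 - hi)))
    return h, standardized, studentized


# Invented data; the fifth observation (y = 15.0) is a planted outlier.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 15.0, 12.1, 13.8, 16.2, 17.9, 20.1]

h, standardized, studentized = residual_diagnostics(x, y)
flagged = [i for i, r in enumerate(standardized) if abs(r) > 2]
print("flagged observations (0-based index):", flagged)
```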

Regression Analysis: Review

• Transformation of variables
  – Dependent variable should be normally distributed, have constant variance, etc.
  – e.g. GNP per capita, Log(Price), etc.
  – Retransformation ??

• Forecasting
  – Model fit versus forecasting
  – Forecasting independent variables

• Model Selection / comparing models
  – Adjusted R-sq

• Model Validation
  – Cross-validation
  – Jackknife validation
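One way to make the "model fit versus forecasting" distinction concrete is leave-one-out validation, which is often what is meant by jackknife validation: each observation is predicted from a model fitted without it. The sketch below assumes that reading and uses made-up data; the helper names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for Y = a + b*X + e."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b


def in_sample_mse(x, y):
    """Mean squared error of the line fitted on all observations."""
    a, b = fit_line(x, y)
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / len(x)


def leave_one_out_mse(x, y):
    """Mean squared prediction error, each point predicted without itself."""
    errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        errors.append((y[i] - (a + b * x[i])) ** 2)
    return sum(errors) / len(errors)


# Made-up observations.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.9, 4.2, 5.8, 8.3, 9.7, 12.4, 13.6, 16.5]

fit_mse = in_sample_mse(x, y)
loo_mse = leave_one_out_mse(x, y)
print(f"in-sample MSE = {fit_mse:.4f}, leave-one-out MSE = {loo_mse:.4f}")
```

The leave-one-out error is never smaller than the in-sample error for a least squares fit, which is why in-sample fit tends to overstate forecasting accuracy.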

Multivariate Research Methods: Regression Analysis: Limitations

• Nominal independent variables ~ dummy variable regression
  – gender, income groups, ethnicity, region, race, etc.

• Measurement error ~ Structural equation models
  – XTrue = Xobs + ex
  – Y = b0 + b1*XTrue + eY
  – Y = b0 + b1*(Xobs + ex) + eY
  – Y = b0 + b1*Xobs + b1*ex + eY
  – The error term (b1*ex + eY) is correlated with the x-variable ~ this violates the regression assumption
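The consequence of this violated assumption can be illustrated with a small simulation: regressing Y on the error-laden Xobs gives a slope pulled toward zero. All parameter values, the sample size and the seed are arbitrary assumptions, and the variable names are hypothetical.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(0)            # fixed seed so the run is repeatable
n = 5000
b0, b1 = 1.0, 2.0                 # assumed true coefficients

x_true = [rng.gauss(0, 1) for _ in range(n)]
e_x = [rng.gauss(0, 1) for _ in range(n)]       # measurement error in X
e_y = [rng.gauss(0, 0.5) for _ in range(n)]     # regression error

# XTrue = Xobs + ex, so the observed value is Xobs = XTrue - ex.
x_obs = [xt - ex for xt, ex in zip(x_true, e_x)]
y = [b0 + b1 * xt + ey for xt, ey in zip(x_true, e_y)]

slope_true = slope(x_true, y)     # close to b1
slope_obs = slope(x_obs, y)       # attenuated toward zero
print(f"slope using XTrue: {slope_true:.3f}")
print(f"slope using Xobs:  {slope_obs:.3f}")
```

With equal variances for XTrue and ex, as assumed here, the slope on Xobs settles near half of b1.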

Regression Analysis: Limitations

• Limited dependent variable
  – Censored dependent variable ~ lots of zeros ~ Tobit Regression
    » Expenditures in home buying
    » Demand in a supply-restricted situation
    » Vacation expenditures
  – Truncated dependent variable ~ duration analysis, available in LIMDEP
    » Interpurchase times
    » Duration of unemployment

[Figure: Y (e.g. housing exp.) plotted against X (e.g. income)]

Regression with Categorical Explanatory Variables

• Some modeling problems
  – Is gender important in determining the level of expenditure on medical expenses?
  – Do Nescafe’s supermarket coffee sales vary by state?
  – How would you model the impact of local crime on housing prices if the crime rate were rated none, moderate or high?
  – How do I include income as a determinant of cigarette demand when data have only been collected by income class?

• Examples
  – Medical expenditure = intercept + b1*Gender + b2*age group + error
  – Sales = intercept + b1*Provinces + error

Interpretation of regression coefficients: Binary Coding

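• Midterm exam scores by sex

• Average score of female and male students:
  – $Y_i = \beta_0 + \beta_1 D_i + \varepsilon_i$
  – $Y_i$ = score; $D_i = 1$ if male, $D_i = 0$ if female
  – $\bar{Y}_{avg,fem} = \beta_0$
  – $\bar{Y}_{avg,male} = \beta_0 + \beta_1$

[Figure: score plotted against D (0 = female, 1 = male)]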
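A sketch of binary (0/1) coding, assuming the dummy D is 1 for male and 0 for female students. Under that assumption the intercept is the female average and the slope is the male-minus-female difference. The exam scores are invented.

```python
def fit_line(d, y):
    """Least-squares (intercept, slope) for Y = b0 + b1*D + e."""
    n = len(d)
    d_bar, y_bar = sum(d) / n, sum(y) / n
    b1 = (sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
          / sum((di - d_bar) ** 2 for di in d))
    return y_bar - b1 * d_bar, b1


# Invented midterm scores.
scores_female = [70, 80, 90]
scores_male = [60, 70, 80]

# Binary coding (assumed direction): D = 0 for female, D = 1 for male.
d = [0] * len(scores_female) + [1] * len(scores_male)
y = scores_female + scores_male

b0, b1 = fit_line(d, y)
print(f"b0 = {b0:.1f}  (female average)")
print(f"b0 + b1 = {b0 + b1:.1f}  (male average)")
print(f"b1 = {b1:.1f}  (male minus female)")
```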

Interpretation of regression coefficients: Effect Coding

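• Midterm exam scores by sex

• Average score of female and male students:
  – $Y_i = \beta_0 + \beta_1 D_i + \varepsilon_i$
  – $Y_i$ = score; $D_i = 1$ if male, $D_i = -1$ if female
  – $\bar{Y}_{overall\ mean} = \beta_0$
  – $\bar{Y}_{avg.male} = \beta_0 + \beta_1$
  – $\bar{Y}_{avg.female} = \beta_0 + \beta_2$, with $\beta_2 = -\beta_1$

[Figure: score plotted against sex (female, male)]

Note: we are not estimating $\beta_2$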
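A sketch of effect coding, assuming D is +1 for male and -1 for female students. Under that assumption the intercept is the mean of the two group averages (the overall mean when the groups are the same size) and the slope is the male group's deviation from it. The scores are invented.

```python
def fit_line(d, y):
    """Least-squares (intercept, slope) for Y = b0 + b1*D + e."""
    n = len(d)
    d_bar, y_bar = sum(d) / n, sum(y) / n
    b1 = (sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
          / sum((di - d_bar) ** 2 for di in d))
    return y_bar - b1 * d_bar, b1


# Invented midterm scores, equal group sizes.
scores_female = [70, 80, 90]
scores_male = [60, 70, 80]

# Effect coding (assumed direction): D = -1 for female, D = +1 for male.
d = [-1] * len(scores_female) + [1] * len(scores_male)
y = scores_female + scores_male

b0, b1 = fit_line(d, y)
b2 = -b1                      # implied female effect; not estimated separately
print(f"b0 = {b0:.1f}  (overall mean)")
print(f"b0 + b1 = {b0 + b1:.1f}  (male average)")
print(f"b0 + b2 = {b0 + b2:.1f}  (female average)")
```

The fitted group averages are the same as under 0/1 coding; only the meaning of the intercept and slope changes.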

Regression Analysis: Non-Linear Regression

• Example: Sales and Price dynamics of New Product Sales

[Figure: two panels plotted against Time: First Purchase Sales and Price]

Pooled Regression: Fixed Effect and Random Effect models

• Panel Data – Cross Sectional Time Series Data

• Observations on “n” individuals (or countries, firms etc), each measured at T points in time (T can be different for each measuring unit)

• Observations are not independent

• use panel structure to get better parameter estimates

• Control for fixed or random individual differences

• Example of Data Setup….

• Software : LIMDEP ( also SAS…)

• Example: Cross-sectional survey 50% Female Participation in Labor Force??

Pooled Regression: Fixed Effect and Random Effect models

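• Fixed Effect – individual intercepts are different: each individual's regression line is shifted by a “fixed” amount
  – $y_{it} = \alpha_i + \beta' X_{it} + e_{it}$

• Random Effect – individual differences are random rather than fixed: random intercept terms. The intercept is a function of a mean intercept value plus a random error
  – $y_{it} = \alpha_i + \beta' X_{it} + e_{it}$, with $\alpha_i = \alpha + u_i$
  – $y_{it} = \alpha + \beta' X_{it} + (e_{it} + u_i)$
  – $u_i$: unobserved heterogeneity that is stable over time
  – This $u_i$ is uncorrelated with the X's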
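A sketch of the fixed effect idea using the "within" transformation: subtracting each individual's own means removes the individual-specific shift, and the common slope is then estimated from the demeaned data. This is one standard way to estimate a fixed effect model with a single X, not necessarily the procedure LIMDEP or SAS uses. The tiny panel is invented.

```python
from collections import defaultdict


def pooled_slope(x, y):
    """Least-squares slope ignoring the panel structure."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))


def within_estimator(ids, x, y):
    """Fixed effect slope and per-individual intercepts via demeaning."""
    groups = defaultdict(list)
    for i, xi, yi in zip(ids, x, y):
        groups[i].append((xi, yi))

    sxy = sxx = 0.0
    means = {}
    for i, obs in groups.items():
        x_bar = sum(o[0] for o in obs) / len(obs)
        y_bar = sum(o[1] for o in obs) / len(obs)
        means[i] = (x_bar, y_bar)
        for xi, yi in obs:
            sxy += (xi - x_bar) * (yi - y_bar)
            sxx += (xi - x_bar) ** 2

    beta = sxy / sxx
    alphas = {i: y_bar - beta * x_bar for i, (x_bar, y_bar) in means.items()}
    return beta, alphas


# Invented panel: two firms, three periods each, y = alpha_i + 2*x exactly.
ids = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 0, 1, 2]
y = [3, 5, 7, 10, 12, 14]

beta, alphas = within_estimator(ids, x, y)
print(f"within slope = {beta:.2f}, intercepts = {alphas}")
print(f"pooled slope ignoring firms = {pooled_slope(x, y):.2f}")
```

In this constructed example the pooled slope even has the wrong sign, because firm B combines a high intercept with low X values.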

Pooled Regression: Fixed Effect and Random Effect models

• The Hausman Test:

• Model Selection – Fixed Effect vs Random Effect
  – H0: the random effects estimator is consistent and efficient, versus
  – H1: the random effects estimator is inconsistent
  – The test statistic follows a chi-square distribution
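A sketch of the Hausman statistic for the simplest case of a single slope coefficient, using the usual form H = (b_FE - b_RE)^2 / (Var(b_FE) - Var(b_RE)), which is chi-square with one degree of freedom under H0. The formula is the common textbook version and is an assumption here, since the slide does not state it. The input numbers are hypothetical, not estimates from real data.

```python
import math


def hausman_one_coefficient(b_fe, var_fe, b_re, var_re):
    """Return (H, p-value) for one coefficient; chi-square with 1 df."""
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("Var(b_FE) - Var(b_RE) must be positive")
    h = (b_fe - b_re) ** 2 / diff_var
    # Survival function of chi-square(1): P(Z^2 > h) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(h / 2))
    return h, p_value


# Hypothetical fixed effect and random effect estimates with their variances.
h, p = hausman_one_coefficient(b_fe=2.0, var_fe=0.04, b_re=1.5, var_re=0.03)
print(f"H = {h:.2f}, p-value = {p:.2g}")

if p < 0.05:
    print("Reject H0: prefer the fixed effect model")
else:
    print("Do not reject H0: the random effect model is acceptable")
```

With several coefficients the statistic becomes a quadratic form in the vector of differences, with the difference of the two covariance matrices inverted in the middle.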