avery-straughan
1
Regression Analysis: Outline
• Review on Regression Analysis
• Regression with Categorical explanatory variables
• Pooled Regression: Fixed Effect and Random Effect models
2
Regression Analysis in the overall context of Research
• Research Purpose
– Research questions, objectives, hypotheses
• Methodology
– Type of study
– Sampling plan and sample size determination
– Data collection methods
– Data analysis plan
• Execution
– Data collection and data analysis
– Discussion and conclusion
– Research evaluations
3
Regression Analysis: Review
• What is Regression?
– Dependence measure ~ estimates the overall relationship between the dependent and independent variables
– Examples of dependent and independent variables?
– Regression and causality (~ experiment, theory)
– Regression (~ predict the dependent variable) and correlation (~ linear association)
• Uses of Regression
– Descriptive ~ describe the relationship and how strong it is
– Inference ~ which variables are most important / significant?
– Predictive ~ forecasting
– Hypothesis testing
• Sample Size
4
Type of Variables in Regression Analysis
• Independent
• Dependent
• Moderating
• Mediating
• Moderation-mediation
5
Moderating Variables
• Moderating Variables
Testing Moderation
• Y = b0 + b1*X + b2*Z + b3*XZ + e
• Rearranged: Y = [b1 + b3*Z]*X + [b0 + b2*Z] + e, so the slope of Y on X depends on the level of Z
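The moderation model can be sketched in code as follows. This is a minimal illustration on hypothetical, noise-free data (the values 1, 2, 0.5 and 1.5 are made up), assuming NumPy is available; the helper name `simple_slope` is not from the slides.

```python
import numpy as np

# Hypothetical noise-free data generated from Y = 1 + 2*X + 0.5*Z + 1.5*X*Z
x = np.array([0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0])
z = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
y = 1.0 + 2.0 * x + 0.5 * z + 1.5 * x * z

# Design matrix: intercept, X, Z and the interaction term X*Z
design = np.column_stack([np.ones_like(x), x, z, x * z])
b0, b1, b2, b3 = np.linalg.lstsq(design, y, rcond=None)[0]

def simple_slope(z_value):
    """Slope of Y on X at a given level of the moderator Z: b1 + b3*Z."""
    return b1 + b3 * z_value

print("slope of X when Z = 0:", simple_slope(0.0))
print("slope of X when Z = 1:", simple_slope(1.0))
```

A non-zero b3 is what signals moderation: the effect of X on Y changes with Z.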
6
Mediator Variables
• Mediator Variables
[Figure: mediation path diagram with nodes labeled Attitude, B and BI, and paths labeled a, b and c]
7
Multivariate Research Methods: Regression Analysis: Review
• How it works?
• Formalization of regression model:
• Y = b0 + b1*X1 + b2*X2 + … + bk*Xk + error
– intercept, slope, error
– Examples??
• What do we observe? Y and the X’s; we estimate the b’s
• Which variables to include?
– Theory, prior research, common sense
– If you don’t have any idea?
» Statistical criteria: stepwise, forward and backward selection (in cases of only metric data??)
• Moderator Effects ~ Interaction Variables
• How to Obtain Estimates?
– Least squares method of regression
– Any straight line you fit will have some error
– The objective is to minimize that error, i.e. the sum of squared differences between Y and Y-predicted
– Y = a + b*X + e leads to e = Y - a - b*X
– e^2 = (Y - a - b*X)^2 ~ minimize the sum of e^2
– In Y = a + b*X + e, a + b*X is the systematic part and e is the unsystematic part
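The least squares idea for a single X can be sketched as below. The data are hypothetical, and `fit_line` and `sum_squared_errors` are illustrative names; the closed-form solution used is the standard one for Y = a + b*X + e.

```python
def fit_line(x, y):
    """Return (a, b) that minimize the sum of (y - a - b*x)**2."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b

def sum_squared_errors(x, y, a, b):
    """Sum of e^2, where e = Y - a - b*X."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(x, y)
sse_best = sum_squared_errors(x, y, a, b)
sse_other = sum_squared_errors(x, y, a + 0.1, b)  # any other line has a larger sum

print("a =", a, "b =", b)
print("sum of squared errors:", sse_best, "vs shifted line:", sse_other)
```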
8
Multivariate Research Methods: Regression Analysis: Review
• Interpretation of parameter estimates?
• Intercept
– Mean of the dependent variable ~ when the values of all independent variables are zero
– Mean of the dependent variable ~ when all slopes are zero
– Not always meaningful
• Slopes
– Change in Y as we change X by one unit
– Zero slope ⇒ X does not affect Y
• b1, b2, …, bk: partial regression coefficients
– e.g. b1 = change in the value of Y if X1 is changed by one unit while all other explanatory variables (X2 … Xk) are kept constant
9
Multivariate Research Methods: Regression Analysis: Review
• Interpretation of parameter estimates?
• Size of the regression coefficient
– Depends on the scale of the explanatory variable
– So the size of a coefficient is not a good indicator of which variable is the better explanatory variable
– Scale of the independent variables ~ within 10 times
• Beta coefficients (standardized coefficients)
– Provide relative importance
• Elasticity: measures the percentage change in the dependent variable for a 1% change in the independent variable
– $\text{elasticity} = b \cdot \bar{X} / \bar{Y}$
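A small sketch of why the raw coefficient depends on scale while the beta coefficient does not. The data are hypothetical, `slope_and_beta` is an illustrative helper, and the elasticity is assumed to be evaluated at the sample means (b times mean X over mean Y).

```python
from math import sqrt

def slope_and_beta(x, y):
    """Return the unstandardized slope b and the standardized (beta) coefficient."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    return b, b * sqrt(sxx / syy)  # beta = b * sd(X) / sd(Y)

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b, beta = slope_and_beta(x, y)
# Express X in units 1000 times larger: b changes, beta does not
b_rescaled, beta_rescaled = slope_and_beta([xi / 1000 for xi in x], y)

# Elasticity at the sample means (assumed form): b * mean(X) / mean(Y)
elasticity = b * (sum(x) / len(x)) / (sum(y) / len(y))

print("b:", b, "-> after rescaling X:", b_rescaled)
print("beta:", beta, "-> after rescaling X:", beta_rescaled)
print("elasticity at the means:", elasticity)
```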
10
Multivariate Research Methods: Regression Analysis: Review
• Is Regression coefficient Significant?
• Is Regression Significant?
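• Overall goodness of fit?
– r2 (coefficient of determination): $r^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$, with $0 \le r^2 \le 1$
– r ~ coefficient of multiple correlation
– adjusted r2
[Figure: scatter of Y against X with the fitted line Y = b0 + b*X, showing TSS split into ESS and RSS (error)]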
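The decomposition of the total sum of squares can be checked numerically. This sketch uses hypothetical data and the usual degrees-of-freedom formula for adjusted r2, 1 - (1 - r2)(n - 1)/(n - k - 1), which is a standard definition rather than something stated on the slide.

```python
# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n, k = len(x), 1  # k = number of explanatory variables

x_mean, y_mean = sum(x) / n, sum(y) / n
b = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
     / sum((xi - x_mean) ** 2 for xi in x))
b0 = y_mean - b * x_mean
y_hat = [b0 + b * xi for xi in x]

tss = sum((yi - y_mean) ** 2 for yi in y)               # total variation in Y
ess = sum((yh - y_mean) ** 2 for yh in y_hat)           # explained by the regression
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # residual (error)

r2 = ess / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print("TSS:", tss, "ESS:", ess, "RSS:", rss)
print("r2:", r2, "adjusted r2:", adj_r2)
```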
11
Multivariate Research Methods: Regression Analysis: Review
Major assumptions
• No heteroscedasticity: the variance of the error term is constant
• No autocorrelation: there is no autocorrelation in the error term
• No perfect multicollinearity: there is no exact linear relationship among the independent variables
• There must be variability in the independent variables
• The regression model is correctly specified
• The regression model is linear in parameters
• The mean value of the error term is zero
• No covariation between errors and independent variables
• The error term is normally distributed
12
Multivariate Research Methods: Regression Analysis: Review
• Detecting problems with the assumptions?
• Heteroscedasticity
– Error variances are not the same
– Arises when errors are related to either the dependent or the independent variables
– e.g. more stable saving (or consumption) among lower-income families; larger variances for brand switchers than for brand-loyal customers
[Figure: Saving against Income, with the variance of saving annotated]
• Remedy?? If we know the nature of the heteroscedasticity, we can use weighted least squares (WLS)
• Volatility ~ Finance??
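A minimal WLS sketch, assuming NumPy and assuming we know the error standard deviation is proportional to income (so the weights are 1/income^2). The income and saving figures are hypothetical and `wls` is an illustrative helper, not a library function.

```python
import numpy as np

def wls(x, y, weights):
    """Weighted least squares for Y = a + b*X: minimizes sum(w_i * e_i**2)."""
    design = np.column_stack([np.ones_like(x), x])
    root_w = np.sqrt(weights)
    # Scaling each row by sqrt(w_i) turns WLS into ordinary least squares
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return coef

# Hypothetical data: saving becomes more variable as income rises
income = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
saving = np.array([1.2, 1.8, 3.5, 3.6, 6.0, 4.9])

# Assumption: error s.d. proportional to income, so weight = 1 / income^2
weights = 1.0 / income ** 2
a_wls, b_wls = wls(income, saving, weights)
a_ols, b_ols = wls(income, saving, np.ones_like(income))  # equal weights = OLS

print("WLS: a =", a_wls, "b =", b_wls)
print("OLS: a =", a_ols, "b =", b_ols)
```

The weights downplay the noisy high-income observations; with a different variance pattern the weights would have to change accordingly.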
13
Regression Analysis : Review
• Detecting problems with the assumptions?
• Autocorrelation ~ more a time-series problem
– Occurs when errors are correlated across consecutive observations
– Reasons?
» Omitted variables
» Model mis-specification
– Detection
» Graphical methods
» Durbin-Watson ~ DW = 2(1 - r), where r is the correlation between consecutive residuals; DW varies between 0 and 4, and the ideal value is 2
– Problem?
» Overestimates the coefficient of determination and underestimates the standard errors
[Figure: plots involving Y, X, e_t and e_(t-1), with panels labeled Positive and Negative]
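The Durbin-Watson statistic can be computed directly from the residuals. The sketch below uses the standard definition (sum of squared successive differences over the sum of squared residuals) on two hypothetical residual series; DW = 2(1 - r) is only an approximation to it.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); lies between 0 and 4."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residual series
positive = [2.0, 1.5, 1.0, 0.5, -0.5, -1.0, -1.5, -2.0]  # neighbours look alike
negative = [1.0, -1.0, 1.0, -1.0]                         # neighbours alternate in sign

print("positive autocorrelation, DW =", durbin_watson(positive))  # well below 2
print("negative autocorrelation, DW =", durbin_watson(negative))  # above 2
```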
14
Multivariate Research Methods: Regression Analysis: Review
• Detecting problems with the assumptions?
• Multicollinearity
– Presence of very high interrelations among the explanatory variables (does not violate any assumption)
– Symptoms: the standard errors are likely to be high; estimates are not reliable?
• Detection
– Bivariate correlation
– Variance Inflation Factor (VIF) ~ 10: $VIF_i = \frac{1}{1 - r_i^2}$
– Tolerance = 1/VIF
• Remedies
– Drop variables
– Composite variables, e.g. family life cycle, social status
– Factor analysis
[Figure: two diagrams relating Y to X1 and X2]
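A sketch of the VIF calculation, assuming NumPy: each explanatory variable is regressed on the others and VIF_i = 1/(1 - r_i^2) is formed from that auxiliary r-squared. The two columns of data are hypothetical (their correlation is 0.8), and `vif` is an illustrative helper.

```python
import numpy as np

def vif(X):
    """VIF for each column of X, from regressing it on the remaining columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for i in range(k):
        target = X[:, i]
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Hypothetical explanatory variables X1 and X2
X = np.column_stack([[1, 2, 3, 4, 5], [2, 1, 4, 3, 5]])
vifs = vif(X)
tolerance = 1.0 / vifs

print("VIF:", vifs)
print("Tolerance:", tolerance)
```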
15
Multivariate Research Methods: Regression Analysis: Review
• Detecting problems with the assumptions?
• Linear in parameters
– Y = a + b*X^2 + e ~ linear in parameters but non-linear in variables
– Y = a + b^2*X1 + b*X2 + e ~ non-linear in parameters: non-linear regression
• The regression model is correctly specified
– Functional form, e.g. new consumer durable sales
• Influential observations
– Outliers
– Whether one or a few observations??
16
Regression Analysis: Review
• Outliers: In linear regression, an outlier is an observation with a large residual. Problem with dependent variable??
• Leverage: An observation with an extreme value on an independent variable is called a point with high leverage. Leverage is a measure of how far an independent variable deviates from its mean. These leverage points can have an unusually large effect on the estimates of the regression coefficients.
• Influence: An observation is said to be influential if removing the observation substantially changes the estimate of coefficients.
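• Detection
– Residual check
» Standardized residual: $e_i^* = \frac{e_i}{s_e \sqrt{1 - h_{ii}}}$
» Studentized residual: $e_i^* = \frac{e_i}{s_{e(i)} \sqrt{1 - h_{ii}}}$, where $s_{e(i)}$ is estimated without observation i
» Approximate problem threshold: absolute value > 2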
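A sketch of the residual check, assuming NumPy. It computes the leverages h_ii from the hat matrix and the residuals scaled by s_e * sqrt(1 - h_ii); the data are hypothetical, with one observation deliberately moved off the line.

```python
import numpy as np

# Hypothetical data on an exact line, with one outlier in the dependent variable
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
y[5] = 30.0  # the clean value would be 11

X = np.column_stack([np.ones_like(x), x])
hat = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
h = np.diag(hat)                         # leverages h_ii
resid = y - hat @ y                      # residuals e_i

n, p = X.shape
s_e = np.sqrt(resid @ resid / (n - p))   # residual standard error
standardized = resid / (s_e * np.sqrt(1.0 - h))

# Rule of thumb from the slide: absolute value above 2 deserves a look
flagged = np.flatnonzero(np.abs(standardized) > 2)

print("leverages:", np.round(h, 3))
print("standardized residuals:", np.round(standardized, 3))
print("flagged observations:", flagged)
```

The leverages sum to the number of estimated parameters, so values far above p/n point to unusual X values.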
17
Regression Analysis: Review
• Transformation of variables
– Dependent variable should be normally distributed, have constant variance, etc.
– e.g. GNP per capita, Log(Price), etc.
– Retransformation??
• Forecasting
– Model fit versus forecasting
– Forecasting the independent variables
• Model selection / comparing models
– Adjusted R-sq
• Model validation
– Cross-validation
– Jackknife validation
18
Multivariate Research Methods: Regression Analysis: Limitations
• Nominal independent variables ~ dummy variable regression
– gender, income groups, ethnicity, region, race, etc.
• Measurement error ~ structural equation models
– XTrue = Xobs + ex
– Y = b0 + b1*XTrue + eY
– Y = b0 + b1*(Xobs + ex) + eY
– Y = b0 + b1*Xobs + b1*ex + eY
– The error term (b1*ex + eY) is correlated with the x-variable ~ this violates the regression assumption
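The consequence of this violation can be illustrated with a small simulation, assuming NumPy. All values are hypothetical (b0 = 1, b1 = 2, unit variances), and the measurement error is assumed to be independent of XTrue; under that assumption the slope estimated from Xobs is pulled toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
b0, b1 = 1.0, 2.0  # hypothetical true coefficients

x_true = rng.normal(0.0, 1.0, n)
e_x = rng.normal(0.0, 1.0, n)   # measurement error, assumed independent of x_true
e_y = rng.normal(0.0, 1.0, n)
x_obs = x_true - e_x            # so that XTrue = Xobs + ex, as on the slide
y = b0 + b1 * x_true + e_y

slope_true = np.polyfit(x_true, y, 1)[0]  # regression on the true X
slope_obs = np.polyfit(x_obs, y, 1)[0]    # regression on the mismeasured X

# The composite error b1*ex + eY is correlated with Xobs
composite_error = b1 * e_x + e_y
corr = np.corrcoef(x_obs, composite_error)[0, 1]

print("slope using XTrue:", slope_true)
print("slope using Xobs:", slope_obs)
print("corr(Xobs, composite error):", corr)
```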
19
Regression Analysis: Limitations
• Limited dependent variable
– Censored dependent variable ~ lots of zeros: Tobit regression
» Expenditures in home buying
» Demand in a supply-restricted situation
» Vacation expenditures
– Truncated dependent variable ~ duration analysis, available in LIMDEP
» Interpurchase times
» Duration of unemployment
[Figure: Y (e.g. housing exp.) plotted against X (e.g. income)]
20
Regression with Categorical Explanatory Variables
• Some modeling problems
– Is gender important in determining the level of expenditure on medical expenses?
– Do Nescafe’s supermarket coffee sales vary by state?
– How would you model the impact of local crime on housing prices if crime rate were rated none, moderate or high?
– How do I include income as a determinant of cigarette demand when data have only been collected by income class?
• Examples
– Medical expenditure = intercept + b1*Gender + b2*age group + error
– Sales = intercept + b1*Provinces + error
21
Interpretation of regression coefficients: Binary Coding
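• Midterm exam scores by sex
• Average score of female and male students:
– $Y_i = \beta_0 + \beta_1 D_i$, where $Y_i$ = score, $D_i = 1$ if female and $D_i = 0$ if male
– $\bar{Y}_{\text{avg, fem}} = \beta_0 + \beta_1$
– $\bar{Y}_{\text{avg, male}} = \beta_0$
[Figure: score by group (female, male), with $\beta_0$ and $\beta_1$ marked]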
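With binary (0/1) coding, the intercept is the mean of the group coded 0 and the slope is the difference between the group means. A minimal check on four hypothetical exam scores:

```python
# Hypothetical midterm scores: two male students, then two female students
scores = [70.0, 80.0, 85.0, 95.0]
female = [0, 0, 1, 1]  # binary coding: D = 1 if female, 0 if male

n = len(scores)
d_mean = sum(female) / n
y_mean = sum(scores) / n
b1 = (sum((d - d_mean) * (s - y_mean) for d, s in zip(female, scores))
      / sum((d - d_mean) ** 2 for d in female))
b0 = y_mean - b1 * d_mean

print("b0 (male average):", b0)
print("b0 + b1 (female average):", b0 + b1)
```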
22
Interpretation of regression coefficients: Effect Coding
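• Midterm exam scores by sex
• Average score of female and male students:
– $Y_i = \beta_0 + \beta_1 D_i$, where $Y_i$ = score, $D_i = 1$ if male and $D_i = -1$ if female
– $\bar{Y}_{\text{overall mean}} = \beta_0$
– $\bar{Y}_{\text{avg. male}} = \beta_0 + \beta_1$
– $\beta_1 + \beta_2 = 0$, so $\beta_2 = -\beta_1$ (the female effect)
[Figure: score by group (female, male), with $\beta_0$, $\beta_1$ and $\beta_2$ marked]
Note: we are not estimating $\beta_2$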
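With effect (+1/-1) coding, the intercept becomes the overall mean and the slope is the deviation of the +1 group from it. The sketch below assumes NumPy, uses four hypothetical scores with equal group sizes, and codes male as +1 and female as -1.

```python
import numpy as np

# Hypothetical midterm scores: two male students, then two female students
scores = np.array([70.0, 80.0, 85.0, 95.0])
d = np.array([1.0, 1.0, -1.0, -1.0])  # effect coding: +1 if male, -1 if female

design = np.column_stack([np.ones_like(d), d])
b0, b1 = np.linalg.lstsq(design, scores, rcond=None)[0]
b2 = -b1  # female effect; implied by b1 + b2 = 0, not estimated separately

print("b0 (overall mean):", b0)
print("b0 + b1 (male average):", b0 + b1)
print("b0 + b2 (female average):", b0 + b2)
```

With unequal group sizes, b0 is the unweighted mean of the two group means rather than the mean of all observations.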
23
Regression Analysis: Non-Linear Regression
• Example: Sales and Price dynamics of New Product Sales
[Figure: two panels, First Purchase Sales against Time and Price against Time]
24
Pooled Regression: Fixed Effect and Random Effect models
• Panel Data – Cross Sectional Time Series Data
• Observations on “n” individuals (or countries, firms etc), each measured at T points in time (T can be different for each measuring unit)
• Observations are not independent
• use panel structure to get better parameter estimates
• Control for fixed or random individual differences
• Example of Data Setup….
• Software : LIMDEP ( also SAS…)
• Example: Cross-sectional survey 50% Female Participation in Labor Force??
25
Pooled Regression: Fixed Effect and Random Effect models
• Fixed Effect – individual intercepts are different - shifted by a “fixed” amount
– $y_{it} = \alpha_i + \beta' X_{it} + e_{it}$
• Random Effect – individual differences are random rather than fixed – random intercept terms. The intercept is a function of the mean intercept value plus a random error
– $y_{it} = \alpha_i + \beta' X_{it} + e_{it}$, with $\alpha_i = \alpha + u_i$
– $y_{it} = \alpha + \beta' X_{it} + (e_{it} + u_i)$
– $u_i$: unobserved heterogeneity that is stable over time
– This $u_i$ is uncorrelated with the X’s
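One common way to estimate the fixed effect model is the within (demeaning) estimator: subtract each individual's own mean from y and X, then run least squares on the deviations. The sketch below is one possible illustration on a hypothetical noise-free panel in which every individual follows y = alpha_i + 2*x; the helper names are made up.

```python
from collections import defaultdict

def slope(x, y):
    """Least squares slope of y on x."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sxy / sxx

def demean_by_group(values, groups):
    """Subtract each individual's own mean (removes anything fixed over time)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

# Hypothetical panel: 3 individuals observed at 3 points in time
ids = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
alpha = {"A": 20.0, "B": 10.0, "C": 0.0}  # individual-specific intercepts
y = [alpha[i] + 2.0 * xi for i, xi in zip(ids, x)]

pooled_slope = slope(x, y)  # ignores the panel structure
fe_slope = slope(demean_by_group(x, ids), demean_by_group(y, ids))

print("pooled OLS slope:", pooled_slope)
print("fixed effect (within) slope:", fe_slope)
```

In this constructed example the pooled slope even has the wrong sign, because the individual intercepts are correlated with X.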
26
Pooled Regression: Fixed Effect and Random Effect models
• The Hausman Test:
• Model Selection – Fixed Effect vs Random Effect
– H0: that random effects would be consistent and efficient, versus
– H1: that random effects would be inconsistent. Chi-Square Test Statistic.
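The test statistic in its usual textbook form is H = (b_FE - b_RE)' [Var(b_FE) - Var(b_RE)]^(-1) (b_FE - b_RE), compared with a chi-square distribution whose degrees of freedom equal the number of coefficients compared. The sketch below assumes NumPy; the estimates and variances passed in are hypothetical numbers, not results from any data set.

```python
import numpy as np

def hausman_statistic(b_fe, b_re, cov_fe, cov_re):
    """Hausman statistic from fixed and random effect estimates and covariances."""
    diff = np.atleast_1d(np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float))
    cov_diff = np.atleast_2d(np.asarray(cov_fe, dtype=float)
                             - np.asarray(cov_re, dtype=float))
    return float(diff @ np.linalg.solve(cov_diff, diff))

# Hypothetical single-coefficient example
h_stat = hausman_statistic(b_fe=2.0, b_re=1.5, cov_fe=0.05, cov_re=0.03)

CHI2_CRIT_1DF_5PCT = 3.841  # 5% critical value of chi-square with 1 degree of freedom
reject_random_effects = h_stat > CHI2_CRIT_1DF_5PCT

print("Hausman statistic:", h_stat)
print("reject H0 (prefer fixed effect):", reject_random_effects)
```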