Econometrics

ASSIGNMENT ON ‘Multiple Regression Analysis’

Prepared For: Dr. Md. Kamal Uddin, Professor, Department of International Business, University of Dhaka

Prepared By: Hazera Akter, Roll No: 01, 8th Semester, BBA 1st Batch, Department of International Business, University of Dhaka

Date of Submission: 7th April, 2012




Assignment Topic

‘Multiple Regression Analysis with Test of Heteroskedasticity, Autocorrelation and Multicollinearity’

Table of Contents

Topics

Analysis Summary

Data Set

ANALYSIS SUMMARY


In multiple regression analysis, we study the relationship between an explained variable and a number of explanatory variables. In this assignment, the current salary structure is analyzed in terms of several factors that influence how salary is set. The purposes of this kind of analysis include:

Cause analysis: Learn more about the relationship between several independent variables and a dependent variable.

Impact analysis: Assess the impact of a change in an independent variable on the value of the dependent variable.

Time series analysis: Predict values of a time series, using either previous values of just that one series, or values from other series as well.

The detailed multiple regression analysis leads to the following interpretation:

• Considering the R2 value (0.491), we can infer that the model is not strong for overall estimation.

• The model for estimating the salary of employees of the Coca-Cola company includes strongly collinear variables, most notably Age and Work Experience.

• However, the model shows very little heteroskedasticity or autocorrelation, which makes it useful in those respects.

So, these overall results would help the management of the Coca-Cola company to set or estimate salaries in a revised decision round.

Data Set


A multinational corporation, “The Coca-Cola Company”, would like to study the salary structure of its employees in its Bangladesh subsidiary venture by predicting salary from influential factors such as the gender, age and education level of the employees. A sample of 30 employees’ current salary data is randomly drawn to perform a regression analysis. The data set is exhibited below.

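In this data set, the dependent variable is Y = Current Salary.

| ID | Current Salary (Tk) | Gender | Job Seniority | Age | Education Level | Work Experience | Minority Class |
|---|---|---|---|---|---|---|---|
| 1 | 16080 | 0 | 81 | 28.50 | 16 | 0.25 | 0 |
| 2 | 41400 | 0 | 73 | 40.33 | 16 | 12.50 | 1 |
| 3 | 21960 | 1 | 83 | 31.08 | 15 | 4.08 | 0 |
| 4 | 19200 | 0 | 93 | 31.17 | 16 | 1.83 | 1 |
| 5 | 28350 | 0 | 83 | 41.92 | 19 | 13.00 | 0 |
| 6 | 27250 | 1 | 80 | 29.50 | 18 | 2.42 | 0 |
| 7 | 16080 | 0 | 79 | 28.00 | 15 | 3.17 | 0 |
| 8 | 14100 | 0 | 67 | 28.75 | 15 | 0.50 | 0 |
| 9 | 12420 | 1 | 96 | 27.42 | 15 | 1.17 | 1 |
| 10 | 12300 | 1 | 77 | 52.92 | 12 | 26.42 | 0 |
| 11 | 15720 | 0 | 84 | 33.50 | 15 | 6.00 | 1 |
| 12 | 8880 | 1 | 88 | 54.33 | 12 | 27.00 | 0 |
| 13 | 22000 | 0 | 93 | 32.33 | 17 | 2.67 | 0 |
| 14 | 22800 | 0 | 98 | 41.17 | 15 | 12.00 | 0 |
| 15 | 19020 | 1 | 64 | 31.92 | 19 | 2.25 | 1 |
| 16 | 12300 | 1 | 94 | 46.25 | 12 | 20.00 | 0 |
| 17 | 22200 | 1 | 81 | 30.75 | 19 | 5.17 | 0 |
| 18 | 10380 | 1 | 72 | 32.67 | 15 | 6.92 | 1 |
| 19 | 8520 | 0 | 70 | 58.50 | 15 | 31.00 | 0 |
| 20 | 27500 | 0 | 89 | 34.17 | 17 | 3.17 | 0 |
| 21 | 11460 | 1 | 79 | 46.58 | 15 | 21.75 | 1 |
| 22 | 20500 | 0 | 83 | 35.17 | 16 | 5.75 | 0 |
| 23 | 27700 | 0 | 85 | 43.25 | 20 | 11.17 | 1 |
| 24 | 28000 | 1 | 65 | 28.00 | 16 | 1.58 | 1 |
| 25 | 22000 | 1 | 65 | 39.75 | 19 | 10.75 | 0 |
| 26 | 27250 | 0 | 78 | 30.08 | 19 | 2.92 | 0 |
| 27 | 27000 | 0 | 83 | 30.17 | 17 | 0.75 | 1 |
| 28 | 9000 | 1 | 70 | 44.50 | 12 | 18.00 | 0 |
| 29 | 31300 | 0 | 91 | 30.17 | 18 | 3.92 | 1 |
| 30 | 11760 | 0 | 70 | 26.83 | 15 | 1.25 | 0 |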


Independent Variable,

X1= Sex of Employee

X2= Job Seniority

X3= Age of Employee

X4= Education Level

X5= Work Experience

X6= Minority Classification

Type of Scales Used Here

The attributes of the measurement objects in this analysis are measured on different types of scales:

Nominal Scale: X1= Sex of Employee, where Male = 0 and Female = 1

X6= Minority Classification, where White = 0 and Nonwhite = 1

Ratio Scale: X2= Job Seniority (years in Coca-Cola only)

X3= Age of Employee (years)

X4= Education Level (scores)

X5= Work Experience (years, overall job life)

All of the ratio-scale variables take numeric values and have an absolute zero.

So, on this multivariate data set we perform a multiple regression analysis to predict the current salary of an employee.

NOTE: All the analysis has been performed with the “SPSS” software. For ease of presentation, the variables are discussed with their detailed names/meanings.
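As a cross-check outside SPSS, the regression described here can be sketched in plain Python. This is only an illustrative sketch, not the SPSS procedure: the names `DATA`, `solve` and `fit_ols` are hypothetical helpers, the rows are transcribed from the data set in this assignment, and the fit solves the normal equations by Gaussian elimination.

```python
# Columns: salary (Tk), sex, job seniority, age, education, work experience, minority
DATA = [
    (16080, 0, 81, 28.50, 16, 0.25, 0), (41400, 0, 73, 40.33, 16, 12.50, 1),
    (21960, 1, 83, 31.08, 15, 4.08, 0), (19200, 0, 93, 31.17, 16, 1.83, 1),
    (28350, 0, 83, 41.92, 19, 13.00, 0), (27250, 1, 80, 29.50, 18, 2.42, 0),
    (16080, 0, 79, 28.00, 15, 3.17, 0), (14100, 0, 67, 28.75, 15, 0.50, 0),
    (12420, 1, 96, 27.42, 15, 1.17, 1), (12300, 1, 77, 52.92, 12, 26.42, 0),
    (15720, 0, 84, 33.50, 15, 6.00, 1), (8880, 1, 88, 54.33, 12, 27.00, 0),
    (22000, 0, 93, 32.33, 17, 2.67, 0), (22800, 0, 98, 41.17, 15, 12.00, 0),
    (19020, 1, 64, 31.92, 19, 2.25, 1), (12300, 1, 94, 46.25, 12, 20.00, 0),
    (22200, 1, 81, 30.75, 19, 5.17, 0), (10380, 1, 72, 32.67, 15, 6.92, 1),
    (8520, 0, 70, 58.50, 15, 31.00, 0), (27500, 0, 89, 34.17, 17, 3.17, 0),
    (11460, 1, 79, 46.58, 15, 21.75, 1), (20500, 0, 83, 35.17, 16, 5.75, 0),
    (27700, 0, 85, 43.25, 20, 11.17, 1), (28000, 1, 65, 28.00, 16, 1.58, 1),
    (22000, 1, 65, 39.75, 19, 10.75, 0), (27250, 0, 78, 30.08, 19, 2.92, 0),
    (27000, 0, 83, 30.17, 17, 0.75, 1), (9000, 1, 70, 44.50, 12, 18.00, 0),
    (31300, 0, 91, 30.17, 18, 3.92, 1), (11760, 0, 70, 26.83, 15, 1.25, 0),
]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows):
    """Return (coefficients, residuals, R^2); coefficient 0 is the constant."""
    y = [r[0] for r in rows]
    X = [[1.0] + list(r[1:]) for r in rows]
    k = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, xi)) for xi in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    ybar = sum(y) / len(y)
    r2 = 1 - sum(e * e for e in resid) / sum((yi - ybar) ** 2 for yi in y)
    return beta, resid, r2


beta, resid, r2 = fit_ols(DATA)
names = ["const", "sex", "seniority", "age", "education", "experience", "minority"]
for name, b in zip(names, beta):
    print(f"{name:>10}: {b:12.3f}")
print(f"R^2 = {r2:.3f}")
```

If the rows are transcribed correctly, the printed coefficients and R2 should come out close to the SPSS output reported in the results section; solving the normal equations directly is chosen here only for brevity and is not the most numerically robust method.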

MULTIPLE REGRESSION ANALYSIS RESULTS


Variables Entered/Removed

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | MINORITY CLASSIFICATION, JOB SENIORITY, AGE OF EMPLOYEE, SEX OF EMPLOYEE, EDUCATIONAL LEVEL, WORK EXPERIENCE (a) | . | Enter |

a. All requested variables entered.

Model Summary (b)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .701 (a) | .491 | .358 | 6458.883 |

a. Predictors: (Constant), MINORITY CLASSIFICATION, JOB SENIORITY, AGE OF EMPLOYEE, SEX OF EMPLOYEE, EDUCATIONAL LEVEL, WORK EXPERIENCE

b. Dependent Variable: CURRENT SALARY

ANOVA (b)

| Model 1 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 9.246E8 | 6 | 1.541E8 | 3.694 | .010 (a) |
| Residual | 9.595E8 | 23 | 4.172E7 | | |
| Total | 1.884E9 | 29 | | | |
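The summary figures reported by SPSS can be recomputed from the ANOVA sums of squares alone. The short sketch below does that arithmetic; the variable names are illustrative, and the inputs are the rounded values printed in the ANOVA table, so the results agree with the SPSS output only up to rounding.

```python
# Values taken from the ANOVA table (rounded as printed by SPSS)
ss_regression = 9.246e8
ss_residual = 9.595e8
n, k = 30, 6                      # observations, explanatory variables

ss_total = ss_regression + ss_residual
df_residual = n - k - 1           # 23

r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual
f_stat = (ss_regression / k) / (ss_residual / df_residual)
std_error_estimate = (ss_residual / df_residual) ** 0.5

print(f"R^2          = {r_squared:.3f}")
print(f"Adjusted R^2 = {adj_r_squared:.3f}")
print(f"F            = {f_stat:.3f}")
print(f"Std. error   = {std_error_estimate:.1f}")
```

The gap between R2 and adjusted R2 reflects the penalty for using six explanatory variables with only 30 observations.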


Coefficients (a)

| Model 1 | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | -25969.540 | 23234.542 | | -1.118 | .275 |
| SEX OF EMPLOYEE | -2126.081 | 2778.333 | -.133 | -.765 | .452 |
| JOB SENIORITY | 82.398 | 130.286 | .100 | .632 | .533 |
| AGE OF EMPLOYEE | 263.053 | 829.669 | .286 | .317 | .754 |
| EDUCATIONAL LEVEL | 2026.429 | 707.189 | .564 | 2.865 | .009 |
| WORK EXPERIENCE | -298.406 | 870.804 | -.329 | -.343 | .735 |
| MINORITY CLASSIFICATION | 1846.496 | 2528.644 | .112 | .730 | .473 |

a. Dependent Variable: CURRENT SALARY

Thus, the estimated multiple regression equation is

$$Y = -25969.54 - 2126.081X_1 + 82.398X_2 + 263.053X_3 + 2026.429X_4 - 298.406X_5 + 1846.496X_6 + U_i$$

(regression of Y on X), with R2 = 0.491 and Ui = errors.

Commentary on the resulting model

This equation suggests that Education Level is far more important than all the other independent variables; it is the only coefficient that is statistically significant (Sig. = .009). The equation says that one more score on educational background, holding all other independent variables constant, results in an increase in salary of TK. 2026. That is, if we compare persons who are alike in all other respects, the one with one more score of education can be expected to have a salary higher by TK. 2026.

After Education Level, Minority Classification has the next largest positive coefficient. If we compare people who are alike in all other independent variables, a Nonwhite employee (coded 1) can be expected to have a salary higher by TK. 1846 than a White employee (coded 0), although this coefficient is not statistically significant (Sig. = .473).

The equation also says that one more year of job seniority, holding all other independent variables constant, results in an increase in salary of TK. 82. That is, if we compare persons who are alike in all other respects, the one with one more year on the job at the Coca-Cola company can be expected to have a salary higher by TK. 82.

The equation also shows that one more year of age, holding all other independent variables constant, results in an increase in salary of TK. 263. That is, if we compare persons who are alike in all other respects, the one who is one year older can be expected to have a salary higher by TK. 263. This suggests that the age of an employee is more influential than the years on the job at the company.

If we compare people who are alike in all other independent variables, a female employee (coded 1) can be expected to have a salary lower by TK. 2126 than a male employee (coded 0), which points to a discriminatory salary structure. Of course, all these numbers are subject to uncertainty; since this coefficient is not statistically significant (Sig. = .452), we could consider dropping the variable X1 completely.

Similarly, if we compare two people with the same education level, holding all other independent variables constant, the one with one more year of work experience can be expected to have a salary lower by TK. 298. Again, these numbers are subject to uncertainty, and since this coefficient is not statistically significant either (Sig. = .735), we could consider dropping the variable X5 completely.
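To make the coefficient interpretation concrete, the estimated equation can be applied to one employee from the sample. The sketch below uses the unstandardized coefficients from the SPSS output; `COEF` and `predict_salary` are hypothetical names introduced only for illustration, and the example employee is ID 1 from the data set.

```python
# Unstandardized coefficients from the SPSS Coefficients table
COEF = {
    "const": -25969.540,
    "sex": -2126.081,         # X1: 0 = male, 1 = female
    "seniority": 82.398,      # X2
    "age": 263.053,           # X3
    "education": 2026.429,    # X4
    "experience": -298.406,   # X5
    "minority": 1846.496,     # X6: 0 = white, 1 = nonwhite
}


def predict_salary(sex, seniority, age, education, experience, minority):
    """Fitted salary (Tk) from the estimated regression equation."""
    return (COEF["const"]
            + COEF["sex"] * sex
            + COEF["seniority"] * seniority
            + COEF["age"] * age
            + COEF["education"] * education
            + COEF["experience"] * experience
            + COEF["minority"] * minority)


# Employee ID 1: male, seniority 81, age 28.50, education 16, experience 0.25, white
predicted = predict_salary(0, 81, 28.50, 16, 0.25, 0)
actual = 16080
print(f"predicted = {predicted:.2f}, residual = {actual - predicted:.2f}")

# Effect of one more education score, everything else held constant
effect = predict_salary(0, 81, 28.50, 17, 0.25, 0) - predicted
print(f"one more education score adds {effect:.3f}")
```

The difference between the two predictions is exactly the education coefficient, which is what "holding all other independent variables constant" means in practice.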

Interpretation of the constant term:

The constant term is the salary one would get with a value of zero on every explanatory variable, that is, with no qualifications beyond the minimum needed to be recruited by the company. Here the constant is negative (TK. −25969.54), but a negative salary is not possible. So what would the salary be if a person had just joined the firm? The equation cannot answer this, because such a person lies far outside the range of the sample.

In conclusion, we have to state that the sample is not fully representative of all people working in the company. We cannot extrapolate the results too far outside the sample range, and we cannot use the equation to predict what a new entrant would earn. So, at the inference stage, we can say that this regression model should also not be used for making other generalized decisions about any salary structure.

Simple Regressions for the Negatively Influencing Factors

Variables Entered/Removed (b)

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | SEX OF EMPLOYEE (a) | . | Enter |

a. All requested variables entered.

b. Dependent Variable: CURRENT SALARY

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .343 (a) | .118 | .086 | 7705.174 |

a. Predictors: (Constant), SEX OF EMPLOYEE

Coefficients (a)

| Model 1 | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 22191.765 | 1868.779 | | 11.875 | .000 |
| SEX OF EMPLOYEE | -5486.380 | 2838.880 | -.343 | -1.933 | .063 |

a. Dependent Variable: CURRENT SALARY

The simple regression of Current Salary on Sex of Employee still shows a negative influence when the influence of all other variables is left out. But the initial salary (α), i.e. the constant term, is positive here.
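With a single 0/1 explanatory variable, the simple regression reduces to a comparison of group means: the constant is the mean salary of the group coded 0 (male) and the slope is the difference between the two group means. The sketch below illustrates this with the salary and sex columns of the data set; `simple_ols` is a hypothetical helper written for this illustration.

```python
SALARY = [16080, 41400, 21960, 19200, 28350, 27250, 16080, 14100, 12420, 12300,
          15720, 8880, 22000, 22800, 19020, 12300, 22200, 10380, 8520, 27500,
          11460, 20500, 27700, 28000, 22000, 27250, 27000, 9000, 31300, 11760]
SEX = [0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
       1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0]   # 0 = male, 1 = female


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


alpha, slope = simple_ols(SEX, SALARY)

male = [s for s, g in zip(SALARY, SEX) if g == 0]
female = [s for s, g in zip(SALARY, SEX) if g == 1]
mean_male = sum(male) / len(male)
mean_female = sum(female) / len(female)

print(f"alpha = {alpha:.3f}  (mean male salary   = {mean_male:.3f})")
print(f"slope = {slope:.3f}  (female minus male  = {mean_female - mean_male:.3f})")
```

Read this way, the coefficient on Sex of Employee is simply the raw average salary gap in the sample, before any other factor is taken into account.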


Now,

Variables Entered/Removed (b)

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | WORK EXPERIENCE (a) | . | Enter |

a. All requested variables entered.

b. Dependent Variable: CURRENT SALARY

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .391 (a) | .153 | .123 | 7549.967 |

a. Predictors: (Constant), WORK EXPERIENCE

Coefficients (a)

| Model 1 | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 22884.178 | 1940.377 | | 11.794 | .000 |
| WORK EXPERIENCE | -355.087 | 157.964 | -.391 | -2.248 | .033 |

a. Dependent Variable: CURRENT SALARY

Again, the simple regression of Current Salary on Work Experience still shows a negative influence when the influence of all other variables is left out, and the initial salary (α) is also positive here.

However, after allowing for the effects of the other variables, the multiple regression equation also yields a lower salary for Sex of Employee and for Work Experience, just as the simple regressions do. So omitting the other variables changes the initial salary (α) to a positive value, but leaves the direction of the effect of these independent variables the same.
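In a simple regression, the slope and constant follow directly from the correlation, the two standard deviations and the two means. The sketch below reproduces the Work Experience regression from the summary statistics reported by SPSS in this assignment (r = −.391; means and standard deviations as in the Descriptive Statistics table). The variable names are illustrative, and because the inputs are rounded the match is approximate.

```python
# Summary statistics as reported by SPSS (rounded)
r = -0.391                 # correlation of salary with work experience
mean_salary = 19814.33
sd_salary = 8060.314
mean_experience = 8.6453
sd_experience = 8.87542

# Simple-regression identities: b = r * s_y / s_x, a = ybar - b * xbar
slope = r * sd_salary / sd_experience
intercept = mean_salary - slope * mean_experience
r_squared = r ** 2

print(f"slope     = {slope:.3f}")
print(f"intercept = {intercept:.3f}")
print(f"R^2       = {r_squared:.3f}")
```

The standardized coefficient (Beta) in a simple regression equals the correlation itself, which is why the SPSS table shows −.391 in both places.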


HETEROSKEDASTICITY IN MULTIPLE REGRESSION

In multiple regression, one of the assumptions we have made until now is that the errors have a common variance. This is known as the homoskedasticity assumption. If the errors do not have a constant variance, we say they are heteroskedastic.

Analyzing our data set through SPSS, we get:

Descriptive Statistics

| Variable | Mean | Std. Deviation | N |
|---|---|---|---|
| CURRENT SALARY | 19814.33 | 8060.314 | 30 |
| SEX OF EMPLOYEE | .43 | .504 | 30 |
| JOB SENIORITY | 80.47 | 9.748 | 30 |
| AGE OF EMPLOYEE | 36.3227 | 8.76549 | 30 |
| EDUCATIONAL LEVEL | 16.00 | 2.244 | 30 |
| WORK EXPERIENCE | 8.6453 | 8.87542 | 30 |
| MINORITY CLASSIFICATION | .37 | .490 | 30 |

Residuals Statistics (a)

| | Minimum | Maximum | Mean | Std. Deviation | N |
|---|---|---|---|---|---|
| Predicted Value | 10342.00 | 29286.66 | 19814.33 | 5313.421 | 30 |
| Residual | -8926.251 | 21585.666 | .000 | 6061.042 | 30 |
| Std. Predicted Value | -1.783 | 1.783 | .000 | 1.000 | 30 |
| Std. Residual | -1.447 | 3.499 | .000 | .983 | 30 |

a. Dependent Variable: CURRENT SALARY


Here the residuals plot is trumpet-shaped, which indicates that the residuals do not have constant variance.

This histogram of the residuals is associated with the dependent variable, leaving out the independent variables for ease of getting the error variance. The graph shows that the distribution is not exactly normal; there are some disturbances in this data set.

So heteroskedasticity is present here, but at a low level.

Model Summary (b)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .701 (a) | .491 | .358 | 6458.883 |

a. Predictors: (Constant), MINORITY CLASSIFICATION, JOB SENIORITY, AGE OF EMPLOYEE, SEX OF EMPLOYEE, EDUCATIONAL LEVEL, WORK EXPERIENCE

b. Dependent Variable: CURRENT SALARY

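The White and Glejser tests measure heteroskedasticity through the R2 of an auxiliary regression of the squared (White) or absolute (Glejser) residuals on the explanatory variables. Taking R2 < 0.50 as the criterion, we do not reject the hypothesis of homoskedasticity here; strictly, however, the R2 of 0.491 reported above belongs to the salary regression itself rather than to such an auxiliary regression.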
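The idea behind these auxiliary-regression tests can be sketched as follows. This is a simplified illustration and not the SPSS output of this assignment: it uses a single regressor, synthetic data from a seeded random generator, and the n·R2 statistic in the Breusch-Pagan/Koenker form (the full White test would also add squares and cross products of the regressors). The helper names are hypothetical.

```python
import random


def simple_fit(x, y):
    """Least-squares line of y on x: returns (residuals, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    return resid, slope * slope * sxx / syy


def lm_statistic(x, y):
    """n * R^2 from regressing the squared residuals on x."""
    resid, _ = simple_fit(x, y)
    squared = [e * e for e in resid]
    _, r2_aux = simple_fit(x, squared)
    return len(x) * r2_aux


rng = random.Random(0)                       # fixed seed, synthetic data only
x = [rng.uniform(1, 10) for _ in range(400)]
y_hom = [2 + 3 * a + rng.gauss(0, 1) for a in x]   # constant error variance
y_het = [2 + 3 * a + rng.gauss(0, a) for a in x]   # error spread grows with x

lm_hom = lm_statistic(x, y_hom)
lm_het = lm_statistic(x, y_het)
print(f"LM (homoskedastic errors)   = {lm_hom:.2f}")
print(f"LM (heteroskedastic errors) = {lm_het:.2f}")
print("5% critical value, chi-square with 1 df = 3.84")
```

A statistic above the chi-square critical value leads to rejecting homoskedasticity; the relevant R2 is that of the auxiliary regression, not of the original model.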


In this Normal P-P Plot, the points lie very near the straight line expected under normality. So here, too, we find a very low heteroskedasticity problem.


Again, plotting the standardized residuals against the standardized predicted values, we find a very low heteroskedasticity problem, since the plot shows no particular trend.

Although the heteroskedasticity problem is very low, we can deal with what remains by:

“Possible correction => log transformation of variable weight”

The R2 of this log-linear form is not comparable with that of the linear form, since the variance of the dependent variable is different.


AUTOCORRELATION IN MULTIPLE REGRESSION

In multiple regression analysis, correlation between the error terms is called autocorrelation. For detecting an autocorrelation problem, the Durbin-Watson test is the simplest and most commonly used. Here ϕ denotes the correlation between successive error terms, and the hypothesis tested is whether there is autocorrelation in the data set (ϕ ≠ 0) or not (ϕ = 0).

Model Summary (b)

| Model | Durbin-Watson |
|---|---|
| 1 | 2.168 (a) |

a. Predictors: (Constant), MINORITY CLASSIFICATION, JOB SENIORITY, AGE OF EMPLOYEE, SEX OF EMPLOYEE, EDUCATIONAL LEVEL, WORK EXPERIENCE

b. Dependent Variable: CURRENT SALARY

Coefficients (a): Correlations

| Model 1 | Zero-order | Partial | Part |
|---|---|---|---|
| SEX OF EMPLOYEE | -.343 | -.158 | -.114 |
| JOB SENIORITY | .094 | .131 | .094 |
| AGE OF EMPLOYEE | -.313 | .066 | .047 |
| EDUCATIONAL LEVEL | .659 | .513 | .426 |
| WORK EXPERIENCE | -.391 | -.071 | -.051 |
| MINORITY CLASSIFICATION | .224 | .151 | .109 |

a. Dependent Variable: CURRENT SALARY


Residuals Statistics (a)

| | Minimum | Maximum | Mean | Std. Deviation | N |
|---|---|---|---|---|---|
| Predicted Value | 8323.94 | 31453.22 | 19814.33 | 5646.471 | 30 |
| Residual | -7812.773 | 20206.270 | .000 | 5752.046 | 30 |
| Std. Predicted Value | -2.035 | 2.061 | .000 | 1.000 | 30 |
| Std. Residual | -1.210 | 3.128 | .000 | .891 | 30 |

a. Dependent Variable: CURRENT SALARY


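Correlations

Pearson Correlation

| | EDUCATIONAL LEVEL | WORK EXPERIENCE | CURRENT SALARY | SEX OF EMPLOYEE | JOB SENIORITY | AGE OF EMPLOYEE | MINORITY CLASSIFICATION |
|---|---|---|---|---|---|---|---|
| CURRENT SALARY | .659 | -.391 | 1.000 | -.343 | .094 | -.313 | .224 |
| SEX OF EMPLOYEE | -.274 | .271 | -.343 | 1.000 | -.225 | .183 | .033 |
| JOB SENIORITY | -.085 | -.035 | .094 | -.225 | 1.000 | .003 | .000 |
| AGE OF EMPLOYEE | -.411 | .979 | -.313 | .183 | .003 | 1.000 | -.196 |
| EDUCATIONAL LEVEL | 1.000 | -.497 | .659 | -.274 | -.085 | -.411 | .188 |
| WORK EXPERIENCE | -.497 | 1.000 | -.391 | .271 | -.035 | .979 | -.200 |
| MINORITY CLASSIFICATION | .188 | -.200 | .224 | .033 | .000 | -.196 | 1.000 |

Sig. (1-tailed)

| | EDUCATIONAL LEVEL | WORK EXPERIENCE | CURRENT SALARY | SEX OF EMPLOYEE | JOB SENIORITY | AGE OF EMPLOYEE | MINORITY CLASSIFICATION |
|---|---|---|---|---|---|---|---|
| CURRENT SALARY | .000 | .016 | . | .032 | .311 | .046 | .117 |
| SEX OF EMPLOYEE | .071 | .074 | .032 | . | .116 | .166 | .432 |
| JOB SENIORITY | .327 | .428 | .311 | .116 | . | .494 | .498 |
| AGE OF EMPLOYEE | .012 | .000 | .046 | .166 | .494 | . | .150 |
| EDUCATIONAL LEVEL | . | .003 | .000 | .071 | .327 | .012 | .160 |
| WORK EXPERIENCE | .003 | . | .016 | .074 | .428 | .000 | .144 |
| MINORITY CLASSIFICATION | .160 | .144 | .117 | .432 | .498 | .150 | . |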


Here the D-W statistic is 2.168, which is very near to 2. We know that a D-W statistic of 2 indicates zero correlation (ϕ = 0) between the error terms. So in our data set there is a very low autocorrelation problem.

To examine autocorrelation further, we can also apply other tests, such as the LM test and the BKW test.
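The Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals, and it is approximately 2(1 − ϕ). The sketch below shows the calculation on small made-up residual series (not the residuals of this assignment) and the value of ϕ implied by the reported statistic of 2.168; the function names are hypothetical.

```python
def durbin_watson(residuals):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def implied_phi(d):
    """Approximate first-order autocorrelation from d ~ 2 * (1 - phi)."""
    return 1 - d / 2


# Made-up residual series, for illustration only
alternating = [1, -1, 1, -1]       # strong negative autocorrelation
constant = [1, 1, 1, 1]            # extreme positive autocorrelation

print(durbin_watson(alternating))  # well above 2
print(durbin_watson(constant))     # 0
print(round(implied_phi(2.168), 3))
```

A value slightly above 2, as found here, corresponds to a small negative ϕ, which is consistent with the conclusion of very low autocorrelation.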

MULTICOLLINEARITY IN MULTIPLE REGRESSION

One important problem in the application of multiple regression analysis involves the possible collinearity of the explanatory variables. This condition refers to situations in which some of the explanatory variables are highly correlated with each other.

One method of measuring multicollinearity uses the Variance Inflation Factor (VIF) for each explanatory variable. The VIF values obtained through SPSS are shown below:

Coefficients (a): Collinearity Statistics

| Model 1 | Tolerance | VIF |
|---|---|---|
| SEX OF EMPLOYEE | .734 | 1.362 |
| JOB SENIORITY | .939 | 1.065 |
| AGE OF EMPLOYEE | .033 | 29.964 |
| WORK EXPERIENCE | .032 | 31.372 |
| MINORITY CLASSIFICATION | .950 | 1.053 |

a. Dependent Variable: EDUCATIONAL LEVEL

Coefficients (a): Collinearity Statistics

| Model 1 | Tolerance | VIF |
|---|---|---|
| SEX OF EMPLOYEE | .848 | 1.179 |
| JOB SENIORITY | .924 | 1.082 |
| AGE OF EMPLOYEE | .810 | 1.235 |
| MINORITY CLASSIFICATION | .937 | 1.068 |
| EDUCATIONAL LEVEL | .756 | 1.322 |

a. Dependent Variable: WORK EXPERIENCE

Coefficients (a): Collinearity Statistics

| Model 1 | Tolerance | VIF |
|---|---|---|
| JOB SENIORITY | .918 | 1.089 |
| AGE OF EMPLOYEE | .031 | 32.365 |
| MINORITY CLASSIFICATION | .947 | 1.056 |
| EDUCATIONAL LEVEL | .572 | 1.749 |
| WORK EXPERIENCE | .028 | 35.927 |

a. Dependent Variable: SEX OF EMPLOYEE

Coefficients (a): Collinearity Statistics

| Model 1 | Tolerance | VIF |
|---|---|---|
| AGE OF EMPLOYEE | .028 | 35.540 |
| MINORITY CLASSIFICATION | .938 | 1.066 |
| EDUCATIONAL LEVEL | .602 | 1.662 |
| WORK EXPERIENCE | .025 | 40.063 |
| SEX OF EMPLOYEE | .755 | 1.324 |

a. Dependent Variable: JOB SENIORITY

Coefficients (a): Collinearity Statistics

| Model 1 | Tolerance | VIF |
|---|---|---|
| MINORITY CLASSIFICATION | .938 | 1.066 |
| EDUCATIONAL LEVEL | .721 | 1.388 |
| WORK EXPERIENCE | .718 | 1.392 |
| SEX OF EMPLOYEE | .890 | 1.124 |

a. Dependent Variable: AGE OF EMPLOYEE

Coefficients (a): Collinearity Statistics

| Model 1 | Tolerance | VIF |
|---|---|---|
| EDUCATIONAL LEVEL | .610 | 1.641 |
| WORK EXPERIENCE | .025 | 40.044 |
| SEX OF EMPLOYEE | .763 | 1.311 |
| AGE OF EMPLOYEE | .028 | 35.539 |

a. Dependent Variable: MINORITY CLASSIFICATION

The tolerance for a variable is (1 - R-squared) for the regression of that variable on all the other independents, ignoring the dependent. When tolerance is close to 0 there is high multicollinearity of that variable with other independents and the coefficients will be unstable.

VIF is the variance inflation factor, which is simply the reciprocal of tolerance. Therefore, when VIF is high there is high multicollinearity and instability of the coefficients.
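The relation between R-squared, tolerance and VIF can be written out in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the two-variable example uses the correlation of .979 between Age of Employee and Work Experience from the correlation table, which on its own already implies a VIF well above common rule-of-thumb limits.

```python
def tolerance(r_squared):
    """Tolerance = 1 - R^2 of a regressor on the other regressors."""
    return 1 - r_squared


def vif(tol):
    """Variance inflation factor = reciprocal of tolerance."""
    return 1 / tol


# With only two regressors, R^2 is the squared correlation between them.
r_age_experience = 0.979
tol_pair = tolerance(r_age_experience ** 2)
vif_pair = vif(tol_pair)

print(f"tolerance = {tol_pair:.4f}")
print(f"VIF       = {vif_pair:.2f}")
print("multicollinearity indicated:", tol_pair < 0.20)
```

Adding further regressors can only lower the tolerance, so the VIF values reported by SPSS for these two variables are larger still.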


As a rule of thumb, if tolerance is less than .20, a problem with multicollinearity is indicated.

From the results above and the VIF values, we can conclude that there is very high collinearity among some of the independent variables, most notably Age of Employee and Work Experience.

We can address this problem through:

• Ridge regression

• Principal component regression

• Dropping the most influential variables

• Using Ratios or First Differences

• Using Extraneous Estimates

• Getting more data

Concluding Comments:

From the multiple regression analysis, considering the R2 value (0.491), we can infer that the model is not strong for overall estimation.

Again, we have found that the model for estimating the salary of employees of the Coca-Cola company includes strongly collinear variables. However, the model shows very low heteroskedasticity and autocorrelation problems, which makes it useful in those respects.
