Regression and Multiple Regression Analysis

# Regression


- Regression is a technique used for the modeling and analysis of numerical data consisting of values of a dependent variable (response variable) and of one or more independent variables (explanatory variables).

It can be used for prediction (including forecasting of time-series data), inference, hypothesis testing, and modeling of causal relationships.

Regression concepts were first published in the early 1800s.

## Applications

Applications of regression are numerous and occur in almost every field, including:
- engineering
- physical sciences
- economics
- management
- life and biological sciences
- social sciences

In fact, regression analysis may be the most widely used statistical technique.

## Types of Regression Models

Regression models can be classified as:
- Simple (1 independent variable): linear or non-linear
- Multiple (2+ independent variables): linear or non-linear

Simple linear regression model: a regression model that involves only one independent variable. It can be expressed as

$$Y_i = \beta_0 + \beta_1 X_i + e_i, \qquad i = 1, 2, 3, \ldots, n$$

where $Y_i$ is the yield (dependent variable), $X_i$ is the independent variable, and $e_i$ is the error or disturbance.
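The slides do not spell out how the two parameters of the simple model are estimated. As a minimal sketch, the standard closed-form least-squares estimates are $\hat\beta_1 = S_{xy}/S_{xx}$ and $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$; the function name `fit_simple_linear` and the data below are hypothetical, chosen only for illustration.

```python
# Minimal sketch: closed-form least-squares fit of Y_i = b0 + b1*X_i + e_i.
# The function name and the data are hypothetical.

def fit_simple_linear(x, y):
    """Return (b0, b1) minimizing sum((y_i - b0 - b1*x_i)**2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sum of squares of x and corrected cross-products of x and y.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple_linear(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")  # prints b0 = 0.050, b1 = 1.990
```

Here $\bar{x} = 3$, $\bar{y} = 6.02$, $S_{xx} = 10$ and $S_{xy} = 19.9$, so the slope is 1.99 and the intercept is $6.02 - 1.99 \times 3 = 0.05$.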

Multiple linear regression model: a regression model that involves more than one regressor (independent) variable.

The general form can be expressed as

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} + e_i, \qquad i = 1, 2, 3, \ldots, n$$

where $Y_i$ is the yield (dependent variable), $X_{ij}$ is the value of the $j$-th independent variable for observation $i$, and $e_i$ is the error or disturbance.

## Objectives

1. The general purpose of regression (multiple regression) is to learn about the relationship between several independent (predictor) variables and a dependent variable.

2. The specific objectives of regression are to:
   - estimate the unknown parameters in the regression model (fitting the model to the data);
   - predict or forecast the response variable; these predictions are helpful in planning a project.

## Underlying Principles

The Gaussian, standard, or classical linear regression model (CLRM), which is the cornerstone of most econometric theory, rests on several assumptions:

- Assumption 1: The regression model is linear in the parameters.
- Assumption 2: X values are fixed in repeated sampling.
- Assumption 3: The disturbances (errors) have zero mean, $E(e_i \mid X_i) = 0$.
- Assumption 4: The error variance is constant (homoscedasticity), $\operatorname{Var}(e_i \mid X_i) = \sigma^2$.
- Assumption 5: There is no autocorrelation between the disturbances, $\operatorname{Cov}(e_i, e_j \mid X_i, X_j) = 0$ for $i \neq j$.
- Assumption 6: There is zero covariance between $e_i$ and $X_i$, i.e. $\operatorname{Cov}(e_i, X_i) = 0$.
- Assumption 7: There are no perfect linear relationships among the independent variables.
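Assumption 7 is the easiest of these to check mechanically: when the model has an intercept, the design matrix with rows $[1, x_{i1}, \ldots, x_{ik}]$ must have full column rank. The sketch below performs that check in pure Python; the helper names `matrix_rank` and `design_matrix` and the data are hypothetical, and in practice a numerical library's rank routine would normally be used instead.

```python
# Sketch: Assumption 7 holds only if the design matrix [1, X1, ..., Xk]
# has full column rank (no column is an exact linear combination of others).
# Helper names and data are hypothetical.

def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix (list of rows) by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in rows]
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) <= tol:
            continue  # no pivot: this column depends on the earlier ones
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            factor = a[r][col] / a[rank][col]
            for c in range(col, n_cols):
                a[r][c] -= factor * a[rank][c]
        rank += 1
    return rank

def design_matrix(columns):
    """Rows [1, x1_i, ..., xk_i]: an intercept column followed by the predictors."""
    n = len(columns[0])
    return [[1.0] + [float(col[i]) for col in columns] for i in range(n)]

x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
x3 = [2 * v for v in x1]  # exact multiple of x1, so Assumption 7 fails

ok = design_matrix([x1, x2])
bad = design_matrix([x1, x2, x3])
print(matrix_rank(ok), "of", len(ok[0]), "columns")    # prints 3 of 3 columns
print(matrix_rank(bad), "of", len(bad[0]), "columns")  # prints 3 of 4 columns
```

A rank smaller than the number of columns means $X^\top X$ is singular, so the least-squares estimates are not uniquely defined.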

## Methods of Estimation

Some well-known methods for estimating the regression model are:

1. The method of moments
2. The method of least squares
3. The method of maximum likelihood

The Ordinary Least Squares (OLS) method of estimation is the most popular one and is widely used because of its flexibility.

## The Ordinary Least Squares (OLS)

The main aim of the least-squares method is to estimate the parameters of the linear regression model by minimizing the error sum of squares.

For a multiple linear model of the form

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + e,$$

we may write the sample regression model as

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + e_i, \qquad i = 1, 2, \ldots, n.$$

The least-squares function is

$$S(\beta_0, \beta_1, \ldots, \beta_k) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \Big)^2.$$

The least-squares estimators $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ are obtained by minimizing $S$ with respect to $\beta_0, \beta_1, \ldots, \beta_k$.
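Setting the partial derivatives of $S$ to zero gives the normal equations $(X^\top X)\hat\beta = X^\top y$, where $X$ is the design matrix whose first column is all ones. A minimal pure-Python sketch that forms and solves these equations follows; the function names `solve` and `ols` and the data are hypothetical, and for real work a library least-squares routine (which avoids forming $X^\top X$ explicitly) is usually preferable.

```python
# Sketch: OLS estimates from the normal equations (X^T X) b = X^T y.
# Pure Python so it runs without extra packages; names and data are hypothetical.

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular (perfect multicollinearity?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(columns, y):
    """Return [b0, b1, ..., bk] for y = b0 + sum_j bj * X_j + e."""
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(row[r] * row[c] for row in X) for c in range(p)] for r in range(p)]
    xty = [sum(row[r] * yi for row, yi in zip(X, y)) for r in range(p)]
    return solve(xtx, xty)

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
# Responses generated exactly from y = 1 + 2*x1 - 0.5*x2 (no noise), so
# OLS should recover these coefficients up to floating-point rounding.
y = [1 + 2 * a - 0.5 * b for a, b in zip(x1, x2)]
print([round(b, 6) for b in ols([x1, x2], y)])  # prints [1.0, 2.0, -0.5]
```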

## Techniques for Determining Model Accuracy

The following techniques are used to determine the accuracy of the model:

- (i) Standard error of the coefficients
- (ii) t-test of the coefficients
- (iii) Coefficient of determination, R²
- (iv) Residual standard deviation
- (v) ANOVA for the overall fit

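(i) Standard error of the coefficients

For the slope of a simple linear regression, the standard error is

$$\operatorname{se}(\hat\beta_1) = \sqrt{\frac{MS_{res}}{S_{xx}}}$$

where $MS_{res}$ is the residual mean square and $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$ is the corrected sum of squares of the independent variable. In multiple regression the standard error of $\hat\beta_j$ is $\sqrt{MS_{res}\, C_{jj}}$, where $C_{jj}$ is the corresponding diagonal element of $(X^\top X)^{-1}$ and $X$ is the design matrix whose first column is all ones.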
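A sketch of the simple-regression formula: it fits the line, computes $MS_{res}$ with $n - 2$ degrees of freedom, and divides by $S_{xx}$. The function name `slope_standard_error` and the data are hypothetical.

```python
import math

def slope_standard_error(x, y):
    """Standard error of the slope in y = b0 + b1*x + e: sqrt(MS_res / S_xx)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Residual sum of squares and residual mean square (n - 2 degrees of freedom,
    # because two parameters, b0 and b1, were estimated).
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ms_res = ss_res / (n - 2)
    return math.sqrt(ms_res / s_xx)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(f"se(b1) = {slope_standard_error(x, y):.4f}")  # prints se(b1) = 0.0597
```

For these data the residuals are 0.06, -0.13, 0.18, -0.21 and 0.10, so $SS_{res} = 0.107$, $MS_{res} = 0.107/3$ and $S_{xx} = 10$.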

(ii) t-test of the coefficients

Suppose that we wish to test the hypothesis that the slope equals a constant, say $\beta_{j0}$. The appropriate hypotheses are

$$H_0: \beta_j = \beta_{j0} \qquad H_1: \beta_j \neq \beta_{j0}$$

where we have specified a two-sided alternative.

The t statistic is defined as

$$T_0 = \frac{\hat\beta_j - \beta_{j0}}{\operatorname{se}(\hat\beta_j)}$$

where $\operatorname{se}(\hat\beta_j)$ is the standard error of the coefficient; for the slope of a simple linear regression this is $\sqrt{MS_{res}/S_{xx}}$. $H_0$ is rejected at significance level $\alpha$ if $|T_0| > t_{\alpha/2,\, n-k-1}$, where $n-k-1$ is the residual degrees of freedom.
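A minimal sketch of the two-sided test. The critical value is passed in rather than computed, because Python's standard library has no t-distribution quantile function; 3.182 is the tabulated $t_{0.025,3}$ (two-sided 5% level, 3 degrees of freedom). The estimate, standard error, and function names are hypothetical illustration values.

```python
# Sketch of the two-sided t-test for one coefficient,
# H0: beta_j = beta_j0  versus  H1: beta_j != beta_j0.

def t_statistic(b_hat, se, b_null=0.0):
    """T0 = (b_hat - b_null) / se(b_hat)."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return (b_hat - b_null) / se

def reject_h0(t0, t_crit):
    """Two-sided decision: reject H0 when |T0| exceeds t_{alpha/2, df}."""
    return abs(t0) > t_crit

# Hypothetical slope estimate and standard error from a fit with n = 5 points,
# so the residual degrees of freedom are n - 2 = 3.
b1_hat, se_b1 = 1.99, 0.0597
T_CRIT_3DF = 3.182  # tabulated t_{0.025, 3}

for b_null in (0.0, 2.0):
    t0 = t_statistic(b1_hat, se_b1, b_null)
    print(f"H0: beta1 = {b_null}: T0 = {t0:.3f}, reject = {reject_h0(t0, T_CRIT_3DF)}")
```

Testing $\beta_1 = 0$ gives $T_0 \approx 33.3$, far beyond 3.182, so $H_0$ is rejected; testing $\beta_1 = 2$ gives $|T_0| \approx 0.17$, so it is not.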

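(iii) Coefficient of determination, R²

R² is a PRE (proportional-reduction-in-error) measure of association: it is the proportion of the total variation in the dependent variable that is explained by the regression,

$$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_{res}}{SS_T}$$

where $SS_T = \sum_{i=1}^{n}(y_i - \bar{y})^2$ is the total sum of squares, $SS_R$ is the regression sum of squares, and $SS_{res} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ is the residual sum of squares.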

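(iv) Residual standard deviation

This is the standard deviation of the residuals (residuals = differences between observed and predicted values). It is calculated as

$$s = \sqrt{MS_{res}} = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n - k - 1}}$$

where $k$ is the number of independent variables.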
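A sketch computing both R² and the residual standard deviation from observed values $y$ and fitted values $\hat{y}$ for a model with $k$ independent variables. Note that $1 - SS_{res}/SS_T$ equals $SS_R/SS_T$ only for least-squares fits that include an intercept. The function name is hypothetical, and the fitted values come from a hypothetical simple regression with $\hat\beta_0 = 0.05$, $\hat\beta_1 = 1.99$.

```python
import math

def r_squared_and_residual_sd(y, y_hat, k):
    """Return (R^2, s) for observed y, fitted y_hat, and k independent variables."""
    n = len(y)
    if n != len(y_hat) or n <= k + 1:
        raise ValueError("need len(y) == len(y_hat) > k + 1")
    y_bar = sum(y) / n
    ss_t = sum((yi - y_bar) ** 2 for yi in y)                 # total SS
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual SS
    # 1 - SS_res/SS_T equals SS_R/SS_T for a least-squares fit with an intercept.
    r2 = 1.0 - ss_res / ss_t
    s = math.sqrt(ss_res / (n - k - 1))                       # sqrt(MS_res)
    return r2, s

y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat = [2.04, 4.03, 6.02, 8.01, 10.00]  # from b0 = 0.05, b1 = 1.99
r2, s = r_squared_and_residual_sd(y, y_hat, k=1)
print(f"R^2 = {r2:.4f}, s = {s:.4f}")
```

Here $SS_T = 39.708$ and $SS_{res} = 0.107$, so almost all of the variation in $y$ is explained by the line.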

(v) ANOVA for the overall fit

The analysis of variance table divides the total variation in the dependent variable into two components: the first can be attributed to the regression model (labeled Regression), and the second cannot (labeled Residual).

If the significance level for the F-test is small (less than 0.05), then the hypothesis that there is no (linear) relationship can be rejected, and the multiple correlation coefficient can be called statistically significant. The F statistic can be written as

$$F_0 = \frac{MS_R}{MS_{res}}$$

where $MS_R = SS_R/k$ is the regression mean square, $MS_{res} = SS_{res}/(n-k-1)$ is the residual mean square, and $SS_R$ and $SS_{res}$ are the regression and residual sums of squares. Under $H_0: \beta_1 = \cdots = \beta_k = 0$, $F_0$ follows an F distribution with $k$ and $n-k-1$ degrees of freedom.

## Application of the OLS Method

Here we consider a seven-variable multiple linear regression model (one response and six explanatory variables). The model can be written in linear form as

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_6 X_6 + e$$

where
- Y = overall rating of the job being done by the supervisor
- X1 = handles employee complaints
- X2 = does not allow special privileges
- X3 = opportunity to learn new things
- X4 = raises based on performance
- X5 = too critical of poor performance
- X6 = rate of advancing to better jobs
- e = error term
- β0, β1, β2, …, β6 are the unknown parameters.

Our ultimate goal is to estimate the unknown parameters of the model.

To estimate the model we used SPSS version 11.5. The SPSS output is given below:

Summary of coefficients

| Model | Coefficients | Std. Error of Coefficients | t | Sig. |
|---|---:|---:|---:|---:|
| (Constant) | 10.787 | 11.589 | .931 | .362 |
| X1 | .613 | .161 | 3.809 | .001 |
| X2 | -.073 | .136 | -.538 | .596 |
| X3 | .320 | .169 | 1.901 | .070 |
| X4 | .082 | .221 | .369 | .715 |
| X5 | .038 | .147 | .261 | .796 |
| X6 | -.217 | .178 | -1.218 | .236 |

From the summary of coefficients we see that X1 and X3 are more significant than the other variables (they have the smallest p-values): X1 is significant at the 5% level and X3 at the 10% level.
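As a consistency check on the table, each t value should equal the coefficient divided by its standard error ($t = \hat\beta_j / \operatorname{se}(\hat\beta_j)$ for $H_0: \beta_j = 0$). The sketch below recomputes these ratios from the rounded values reported above, so small differences (below 0.01) are expected from rounding alone.

```python
# Values as reported in the SPSS coefficient table: (coefficient, std. error, t).
reported = {
    "(Constant)": (10.787, 11.589, 0.931),
    "X1": (0.613, 0.161, 3.809),
    "X2": (-0.073, 0.136, -0.538),
    "X3": (0.320, 0.169, 1.901),
    "X4": (0.082, 0.221, 0.369),
    "X5": (0.038, 0.147, 0.261),
    "X6": (-0.217, 0.178, -1.218),
}

# t = coefficient / standard error for H0: beta_j = 0.
recomputed = {name: b / se for name, (b, se, _) in reported.items()}

for name, (b, se, t_rep) in reported.items():
    print(f"{name:>10}: t = {recomputed[name]:7.3f}   reported {t_rep:7.3f}")
```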

The R² value is 0.73 and the standard error of the estimate is 7.07.

Since R² is fairly high (the six predictors explain about 73% of the variation in the overall rating), the fitted model appears appropriate for this data set.

ANOVA

| Model | Sum of Squares | df | Mean Square | F | Sig. |
|---|---:|---:|---:|---:|---:|
| Regression | 3147.966 | 6 | 524.661 | 10.502 | .000 |
| Residual | 1149.000 | 23 | 49.957 | | |
| Total | 4296.967 | 29 | | | |

From the ANOVA table we can also conclude that the overall fit of the model is significant (F = 10.502 with 6 and 23 degrees of freedom, significant at α = 0.01).
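The ANOVA entries can be checked by hand: each mean square is a sum of squares divided by its degrees of freedom, $F_0$ is their ratio, $R^2 = SS_R/SS_T$, and the standard error of the estimate is $\sqrt{MS_{res}}$. A short sketch using the values reported in the table above:

```python
import math

# Values from the ANOVA table above.
ss_reg, df_reg = 3147.966, 6    # regression: k = 6 predictors
ss_res, df_res = 1149.000, 23   # residual: n - k - 1 = 30 - 6 - 1
ss_tot = 4296.967               # total (df = n - 1 = 29)

ms_reg = ss_reg / df_reg        # regression mean square
ms_res = ss_res / df_res        # residual mean square
f0 = ms_reg / ms_res            # F statistic
r2 = ss_reg / ss_tot            # coefficient of determination
s = math.sqrt(ms_res)           # standard error of the estimate

print(f"MS_R = {ms_reg:.3f}, MS_res = {ms_res:.3f}, F0 = {f0:.3f}")
# prints MS_R = 524.661, MS_res = 49.957, F0 = 10.502
print(f"R^2 = {r2:.3f}, s = {s:.3f}")
# prints R^2 = 0.733, s = 7.068
```

The recomputed values agree with the SPSS output, and $df_{reg} + df_{res} = 29$ matches the total degrees of freedom for $n = 30$ observations.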

## Conclusion

1. Regression can describe the relationship between several independent variables and a dependent variable.

2. Regression can estimate the unknown parameters of a regression model.

3. It can also be used to forecast the response variable, and these predictions are helpful in planning a project.