
MODEL BUILDING IN REGRESSION MODELS

  • Upload
    boaz



Page 1: MODEL BUILDING IN REGRESSION MODELS

MODEL BUILDING IN REGRESSION MODELS

Page 2: MODEL BUILDING IN REGRESSION MODELS

Model Building and Multicollinearity

• Suppose we have five factors that we feel could linearly affect y. If all 5 are included we have:

y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + ε

• But while the p-value for the F-test (Significance F) might be small, one or more (if not all) of the p-values for the individual t-tests may be large.

• Question: Which factors make up the “best” model?
– This is called model building

Page 3: MODEL BUILDING IN REGRESSION MODELS

Model Building

• There are many approaches to model building
– Elimination of some (or all) of the variables with high p-values is one approach

• Forward stepwise regression “builds” the model by adding one variable at a time.

• Modified F-tests can be used to test whether a certain subset of the variables should be included in the model.

Page 4: MODEL BUILDING IN REGRESSION MODELS

The Stepwise Regression Approach

• y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + ε

• Step 1: Run five simple linear regressions:
– y = β0 + β1x1
– y = β0 + β2x2
– y = β0 + β3x3
– y = β0 + β4x4
– y = β0 + β5x5

• Check the p-values for each
– Note: for simple linear regression, Significance F = p-value for the t-test.

Suppose the x4 model has the lowest p-value (< α), so x4 enters the model first.
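The note that Significance F equals the t-test p-value in simple linear regression follows from the identity F = t² for a one-variable model. The sketch below checks that identity numerically. The data are made up for illustration and do not come from the slides.

```python
import math

# Made-up illustrative data (an assumption, not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares slope and intercept.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Error and regression sums of squares.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ssr = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
mse = sse / (n - 2)          # error degrees of freedom = n - 2

t_stat = b1 / math.sqrt(mse / sxx)   # t statistic for the slope
f_stat = ssr / mse                   # F statistic (regression df = 1)

print(t_stat ** 2, f_stat)   # the two values agree up to rounding
```

Because the two statistics carry the same information, the two p-values coincide whenever the model has a single x.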

Page 5: MODEL BUILDING IN REGRESSION MODELS

Stepwise Regression

• Step 2: Run four 2-variable linear regressions:

Check Significance F and p-values for:
– y = β0 + β4x4 + β1x1
– y = β0 + β4x4 + β2x2
– y = β0 + β4x4 + β3x3
– y = β0 + β4x4 + β5x5

Suppose the model with x4 and x3 has the lowest p-values (all < α): add x3.

Page 6: MODEL BUILDING IN REGRESSION MODELS

Stepwise Regression

• Step 3: Run three 3-variable linear regressions:
– y = β0 + β3x3 + β4x4 + β1x1
– y = β0 + β3x3 + β4x4 + β2x2
– y = β0 + β3x3 + β4x4 + β5x5

• Suppose none of these models has all p-values < α: STOP. The best model is the one with x3 and x4 only.
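The three steps above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the procedure used to produce the slides' Excel output. The function names, the made-up dataset with three candidate variables, and the rule of ranking trial models by the p-value of the newly added variable are all assumptions. The t-test p-values are approximated with Simpson's rule.

```python
import math

def solve_ols(columns, y):
    """Least squares with intercept; returns (coefficients, std errors, SSE, error df)."""
    n, k = len(y), len(columns) + 1
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    # X'X augmented with the identity, so Gauss-Jordan leaves (X'X)^-1 on the right.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        d = a[c][c]
        a[c] = [v / d for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    inv = [row[k:] for row in a]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    df = n - k
    mse = sse / df
    se = [math.sqrt(mse * inv[i][i]) for i in range(k)]
    return beta, se, sse, df

def t_pvalue(t, df, steps=2000):
    """Two-sided p-value of a t statistic (Simpson's rule on the t density)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    b = abs(t)
    h = b / steps
    area = pdf(0) + pdf(b) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return max(0.0, 1.0 - 2.0 * area * h / 3.0)

def forward_stepwise(candidates, y, alpha=0.05):
    """candidates: dict of name -> values. Returns the selected names, in entry order."""
    selected = []
    while len(selected) < len(candidates):
        best = None
        for name in candidates:
            if name in selected:
                continue
            trial = selected + [name]
            beta, se, _, df = solve_ols([candidates[v] for v in trial], y)
            pvals = [t_pvalue(b / s, df) for b, s in zip(beta[1:], se[1:])]
            # A trial model qualifies only if every variable in it has p < alpha.
            if max(pvals) < alpha and (best is None or pvals[-1] < best[0]):
                best = (pvals[-1], name)
        if best is None:      # no qualifying model at this size: stop
            break
        selected.append(best[1])
    return selected

# Made-up data: y depends strongly on x1 and x2 and only negligibly on x3.
data = {
    "x1": [-1, -1, -1, -1, 1, 1, 1, 1],
    "x2": [-1, -1, 1, 1, -1, -1, 1, 1],
    "x3": [-1, 1, -1, 1, -1, 1, -1, 1],
}
y = [4.7, 4.3, 7.7, 7.3, 11.1, 11.9, 16.1, 16.9]

selected = forward_stepwise(data, y)
print(selected)
```

On this toy dataset x1 enters first, x2 enters second, and the search stops because adding x3 leaves x3 with a large p-value.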

Page 7: MODEL BUILDING IN REGRESSION MODELS

Example

Page 8: MODEL BUILDING IN REGRESSION MODELS

Regression on 5 Variables

Page 9: MODEL BUILDING IN REGRESSION MODELS

Summary of Results from 1-Variable Tests

Page 10: MODEL BUILDING IN REGRESSION MODELS

Performing Tests With More Than One Variable

• Remember that the range for X must be contiguous

• Use CUT and INSERT CUT CELLS to arrange the X columns so that they are next to each other

Page 11: MODEL BUILDING IN REGRESSION MODELS

Summary of Results from 2-Variable Tests

Page 12: MODEL BUILDING IN REGRESSION MODELS

Summary of Results from 3-Variable Tests

Page 13: MODEL BUILDING IN REGRESSION MODELS

Summary of Results from 4-Variable Tests

Page 14: MODEL BUILDING IN REGRESSION MODELS

Best Model

• The best model is the three-variable model that includes x1, x4, and x5.

ŷ = -2782.66 + 130.5134x1 + 931.9743x4 + 21.36134x5

Page 15: MODEL BUILDING IN REGRESSION MODELS

TESTING PARTS OF THE MODEL

• Sometimes we wish to see whether to keep a set of variables “as a group” or eliminate them from the model.
– Example: Model might include 3 dummy variables to account for how the dependent variable is affected by a particular season (or quarter) of the year.
• Will either keep all seasons or will keep none

• The general approach is to assess how much “extra value” these additional variables will add to the model.
– Approach is a Modified F-test
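As a small illustration of the seasonal example, the sketch below shows one common way to code four quarters with three dummy variables. The function name and the choice of quarter 4 as the baseline are assumptions made for illustration. The slides do not say which season is the baseline.

```python
def season_dummies(quarter):
    """Return three 0/1 indicators for quarters 1-3; quarter 4 is the baseline."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    return [1 if quarter == k else 0 for k in (1, 2, 3)]

# One row of dummy values per quarter.
rows = [season_dummies(q) for q in (1, 2, 3, 4)]
print(rows)
```

The three columns enter or leave the model together, which is why they are tested as a group rather than one at a time.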

Page 16: MODEL BUILDING IN REGRESSION MODELS

Approach: Compare Two Models –The Full Model and The Reduced Model

• Suppose a model consists of p variables and we wish to consider whether or not to keep a set of p-q of those p variables in the model.

• Two models
– Full model – p variables
– Reduced model – q variables

• For notational convenience, assume the last p-q of the p variables are the ones that would be eliminated.

– Sample of size n is taken

Page 17: MODEL BUILDING IN REGRESSION MODELS

The Modified F-Test

• Modified F-Test:

H0: βq+1 = βq+2 = … = βp = 0

HA: At least one of these p-q β’s ≠ 0

• This is an F-test of the form:

Reject H0 (Accept HA) if: F > Fα,p-q,n-p-1

– p-q = # variables considered for elimination (numerator degrees of freedom)
– n-p-1 = degrees of freedom for the error term of the full model (denominator degrees of freedom)

Page 18: MODEL BUILDING IN REGRESSION MODELS

The Modified F-Statistic

• For this model, the F-statistic is defined by:

F = (Mean Reduction in the Squared Errors) / (Mean Square Error of the Full Model)
  = ((SSEReduced – SSEFull)/(p-q)) / MSEFull
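The definition translates directly into a short function. The sketch below is illustrative only. The function name and the SSE values in the usage line are made up, and MSEFull is computed as SSEFull divided by the full model's error degrees of freedom, n-p-1.

```python
def modified_f(sse_reduced, sse_full, p, q, n):
    """Modified (partial) F statistic for dropping p - q variables from a p-variable model."""
    df_num = p - q            # number of variables considered for elimination
    df_den = n - p - 1        # error degrees of freedom of the full model
    mse_full = sse_full / df_den
    f = ((sse_reduced - sse_full) / df_num) / mse_full
    return f, df_num, df_den

# Hypothetical sums of squares, chosen only to make the arithmetic easy to follow.
f, df1, df2 = modified_f(sse_reduced=500.0, sse_full=320.0, p=5, q=3, n=38)
print(f, df1, df2)   # (180 / 2) / (320 / 32) = 9.0, with 2 and 32 degrees of freedom
```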

Page 19: MODEL BUILDING IN REGRESSION MODELS

Example

• A housing price model (Full model) is proposed for homes in Laguna Hills that takes into account p = 5 factors:
– House size, Lot size, Age, Whether or not there is a pool, # Bedrooms

• A reduced model that takes into account only the first three of these (q = 3) was discussed earlier.

• Based on a sample of n = 38 sales, can we conclude that adding these p-q = 2 additional variables (Pool, # Bedrooms) is significant?

Page 20: MODEL BUILDING IN REGRESSION MODELS
Page 21: MODEL BUILDING IN REGRESSION MODELS

The Modified F-Test For This Example

• Modified F-Test:

H0: β4 = β5 = 0

HA: At least one of β4 and β5 ≠ 0

For α = .05, the test is

Reject H0 (Accept HA) if: F > F.05,2,32

The denominator degrees of freedom are n-p-1 = 38-5-1 = 32. F.05,2,32 can be generated in Excel by FINV(.05,2,32) = 3.29.
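The critical value can also be checked without Excel. When the numerator degrees of freedom equal 2, as here, the upper-tail probability of the F distribution has the closed form P(F > x) = (1 + 2x/df2)^(-df2/2), which can be solved for x. The sketch below relies on that special case. The function name is an assumption, and the formula does not apply to other numerator degrees of freedom.

```python
def f_critical_df1_2(alpha, df2):
    """Upper-alpha critical value of F(2, df2); closed form valid only when df1 == 2."""
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)

crit = f_critical_df1_2(0.05, 32)
print(round(crit, 2))   # 3.29, in line with FINV(.05,2,32)
```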

Page 22: MODEL BUILDING IN REGRESSION MODELS

Full Model

[Excel regression output for the full model, highlighting SSEFull, MSEFull, and DFEFull]

Page 23: MODEL BUILDING IN REGRESSION MODELS

Reduced Model

[Excel regression output for the reduced model, highlighting SSEReduced]

Page 24: MODEL BUILDING IN REGRESSION MODELS

The Partial F-Test

=((G3-C13)/2)/D13   (modified F-statistic; G3 holds the SSE from the Output Reduced worksheet)

=FINV(.05,2,B13)   (critical value)

Page 25: MODEL BUILDING IN REGRESSION MODELS

The Modified F-Statistic

• For this model, the modified F-statistic is:

F = ((SSEReduced – SSEFull)/(p-q)) / MSEFull = 21.43522834

• The critical value of F = F.05,2,32 = 3.29453087

• 21.43522834 > 3.29453087

There is enough evidence to conclude that including Pool and Bedrooms is significant.
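The same conclusion can be expressed as a p-value. With 2 numerator degrees of freedom, the upper-tail probability of the F distribution is (1 + 2F/df2)^(-df2/2). The sketch below applies that special-case formula to the F-statistic reported on this slide. The function name is an assumption.

```python
def f_pvalue_df1_2(f_stat, df2):
    """P(F > f_stat) for an F(2, df2) distribution; valid only when df1 == 2."""
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)

p_value = f_pvalue_df1_2(21.43522834, 32)   # F-statistic reported on the slide
reject_h0 = p_value < 0.05
print(p_value, reject_h0)
```

The p-value is far below α = .05, which agrees with the comparison of F against the critical value.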

Page 26: MODEL BUILDING IN REGRESSION MODELS

Review

• Stepwise regression helps determine a “best model” from a series of possible independent variables (x’s)
– Approach:
• Step 1 – Run one-variable regressions
– If there is a p-value < α, keep the variable with the lowest p-value as a variable in the model
• Step 2 – Run 2-variable regressions
– One of the two variables in each model is the one determined in Step 1
– Keep the one with the lowest p-values if both are < α
• Repeat with 3, 4, 5 variables, etc. until no model has all p-values < α

• Modified F-test for testing the significance of parts of the model
– Compare F to Fα,p-q,DFE(Full), where
F = ((SSEReduced – SSEFull)/(# terms removed))/MSEFull