Multiple Regression

Page 1: Multiple Regression

Page 2: Multiple regression

Previously we discussed the one-predictor scenario

Multiple regression is the case of having two or more independent variables predicting some outcome variable

The basic idea is the same as in simple regression; however, more will need to be considered in its interpretation

Page 3: The best fitting plane

Before, we attempted to find the best-fitting line to our 2-d scatterplot of values

With the addition of another predictor, our cloud of values becomes 3-d, and we are now looking for what amounts to the best-fitting plane

With 3 or more predictors we get into hyperspace and are dealing with a regression surface

Regression equation: \hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_p X_p
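As a quick illustration, a minimal sketch of fitting such an equation in R (simulated data; all variable names here are hypothetical, not from the example data):

set.seed(123)
x1 <- rnorm(100)
x2 <- rnorm(100)
y  <- 2 + 1.5 * x1 + 0.5 * x2 + rnorm(100)
fit <- lm(y ~ x1 + x2)        # fit the plane
coef(fit)                     # b0 (intercept), b1, b2 as in the equation above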

Page 4: Linear combination

The notion of a linear combination is important for you to understand for MR and multivariate techniques in general

Again, what MR analysis does is create a linear combination (weighted sum) of the predictors

The weights are important to help us assess the nature of the predictor-DV relationships with consideration of the other variables in the model

We then look to see how the linear combination, in a sense, matches up with the DV

One way to think about it: we extract relevant information from the predictors to help us understand the DV
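A small continuation of the sketch above, showing that the model's fitted values are exactly this weighted sum of the predictors:

b <- coef(fit)
lin_comb <- b[1] + b[2] * x1 + b[3] * x2     # the linear combination
all.equal(unname(lin_comb), unname(fitted(fit)))   # TRUE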

Page 5: MR Example

[Path diagram: the predictors (X1) Pros of Condom Use, (X2) Cons of Condom Use, (X3) Self-Efficacy of Condom Use, and (X4) Psychosexual Functioning are weighted and summed into a new linear combination X', which predicts Stage of Condom Use]

Page 6: Considerations in multiple regression

Assumptions
Overall fit
Parameter estimates and variable importance
Variable entry
IV relationships
Prediction

Page 7: Assumptions: Normality

The assumptions for simple regression will continue to hold: normality, homoscedasticity, independence

Multivariate normality can be at least partially checked through examination of the individual variables for normality, linearity, and heteroscedasticity

Tests for multivariate normality seem to be easily obtained in every package except SPSS
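One simple partial check can be sketched in base R (here X is assumed to be a numeric matrix of the variables; under multivariate normality, squared Mahalanobis distances roughly follow a chi-square distribution with df equal to the number of variables):

d2 <- mahalanobis(X, colMeans(X), cov(X))     # squared distances from the centroid
qqplot(qchisq(ppoints(nrow(X)), df = ncol(X)), d2,
       xlab = "Chi-square quantiles",
       ylab = "Squared Mahalanobis distance")
abline(0, 1)    # points far off this line suggest non-normality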

Page 8: Assumptions: Model misspecification

In addition, we must worry about model misspecification: omitting relevant variables, including irrelevant ones, incorrect paths

Not much one can do about omitting relevant variables, but it may produce biased and less valid results

However, we can't just throw in all the variables we can think of either: overfitting, violation of Ockham's razor

Including irrelevant variables contributes to the standard error of estimate (and thus the SE for our coefficients), which will affect the statistical tests on individual variables

Coefficients (Dependent Variable: Mental health symptoms)

                                   B      Std. Error   Beta     t       Sig.   Zero-order  Partial  Part
(Constant)                        .845    .409                  2.067   .039
Visits to health professionals   -.001    .017        -.003     -.075   .940   .256        -.003    -.003
Physical health symptoms          .761    .078         .434     9.818   .000   .505         .416     .381
Stressful life events             .007    .001         .238     5.754   .000   .370         .259     .223

Coefficients (Dependent Variable: Mental health symptoms)

                              B      Std. Error   Beta     t        Sig.   Zero-order  Partial  Part
(Constant)                   .849    .403                  2.105    .036
Physical health symptoms     .759    .071         .432     10.622   .000   .505        .443     .412
Stressful life events        .007    .001         .238     5.840    .000   .370        .262     .226

Page 9: Example data

Current salary predicted by educational level, time since hire, and previous experience (N = 474)

As with any analysis, initial data analysis should be extensive prior to examination of the inferential analysis

Page 10: Initial examination of data

We can use the descriptives to give us a general feel for what's going on with the variables in question

Here we can also see that months since hire and previous experience are not too well correlated with our dependent variable of current salary. Ack!

We'd also want to look at the scatterplots to further aid our assessment of the predictor-DV relationships

Descriptive Statistics

                                 Mean        Std. Deviation   N
Current Salary                   34419.57    17075.661        474
Educational Level (years)        13.49       2.885            474
Beginning Salary                 17016.09    7870.638         474
Months since Hire                81.11       10.061           474
Previous Experience (months)     95.86       104.586          474

Correlations (Pearson r; 2-tailed Sig. in parentheses; N = 474 throughout)

                               Educ. Level      Months since Hire   Prev. Experience   Current Salary
Educational Level (years)      1                .047 (.303)         -.252** (.000)     .661** (.000)
Months since Hire              .047 (.303)      1                   .003 (.948)        .084 (.067)
Previous Experience (months)   -.252** (.000)   .003 (.948)         1                  -.097* (.034)
Current Salary                 .661** (.000)    .084 (.067)         -.097* (.034)      1

** Correlation is significant at the 0.01 level (2-tailed).
*  Correlation is significant at the 0.05 level (2-tailed).

Page 11: Starting point: Statistical significance of the model

The ANOVA summary table tells us whether our model is statistically significant: R² is different from zero; the equation is a better predictor than the mean

As with simple regression, the analysis involves the ratio of variance predicted to residual variance

As we can see, it is reflective of the relationship of the predictors to the DV (R²), the number of predictors in the model, and the sample size

ANOVA (Dependent Variable: Current Salary)

              Sum of Squares   df    Mean Square    F         Sig.
Regression    6.13E+10         3     2.042E+10      125.176   .000
Residual      7.67E+10         470   163112654.7
Total         1.38E+11         473

Predictors: (Constant), Previous Experience (months), Months since Hire, Educational Level (years)

F = \frac{R^2 (N - p - 1)}{(1 - R^2)\, p}, \qquad df = p,\; N - p - 1
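Plugging the values from the output above into the formula as a quick check:

R2 <- .444; p <- 3; N <- 474      # from the Model Summary and ANOVA tables
(R2 * (N - p - 1)) / ((1 - R2) * p)   # ~125, matching F = 125.176 (rounding)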

Page 12: Multiple correlation coefficient

The multiple correlation coefficient is the correlation between the DV and the linear combination of predictors that minimizes the sum of the squared residuals

More simply, it is the correlation between the observed values and the values that would be predicted by our model

Its squared value (R²) is the amount of variance in the dependent variable accounted for by the independent variables
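Continuing the simulated sketch from earlier, this is easy to verify:

cor(y, fitted(fit))       # multiple R: correlation of observed with fitted values
cor(y, fitted(fit))^2     # R^2; equals summary(fit)$r.squared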

Page 13: R²

Here it appears we have an OK model for predicting current salary

Model Summary

R      R Square   Adjusted R Square   Std. Error of the Estimate
.666   .444       .441                $12,771.556

Predictors: (Constant), Previous Experience (months), Months since Hire, Educational Level (years)

Page 14: Variable importance: Statistical significance

After noting that our model is viable, we can begin our interpretation of the predictors' relative contributions

To begin with, we can examine the output to determine which variables statistically significantly contribute to the model

Standard error: a measure of the variability that would be found among the different slopes estimated from other samples drawn from the same population

Page 15: Variable importance: Statistical significance

We can see from the output that only previous experience and education level are statistically significant predictors

Coefficients (Dependent Variable: Current Salary)

                                  B          Std. Error   Beta     t        Sig.   95% CI for B
(Constant)                     -27886.3     5529.479               -5.043   .000   [-38751.849, -17020.730]
Educational Level (years)       4004.576    210.628       .677     19.013   .000   [3590.687, 4418.466]
Months since Hire                 87.951     58.441       .052      1.505   .133   [-26.887, 202.788]
Previous Experience (months)      11.936      5.803       .073      2.057   .040   [.533, 23.340]

Page 16: Variable importance: Weights

Statistical significance, as usual, is only a starting point for our assessment of results

What we'd really want is a measure of the unique contribution of an IV to the model

Unfortunately the regression coefficient, though useful in understanding that particular variable's relationship to the DV, is not useful for comparisons with other IVs that are on a different scale

[Coefficients table repeated from the previous slide]

Page 17: Variable importance: Standardized coefficients

Standardized regression coefficients get around that problem

Now we can see how much the DV will change, in standard deviation units, with a one standard deviation unit change in the IV (all others held constant)

Here we can see that education level seems to have much more influence on the DV: another 3 years of education means a >$11,000 bump in salary

[Coefficients table repeated from the previous slide]
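A standardized coefficient can be recovered from the unstandardized one as b * sd(X) / sd(Y); checking education level against the descriptives table from page 10:

# b = 4004.576, sd(education) = 2.885, sd(current salary) = 17075.661
4004.576 * 2.885 / 17075.661   # ~.677, matching the Beta column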

Page 18: Variable importance

However, we still have other output to help us understand variable contribution

Partial correlation is the contribution of an IV after the contributions of the other IVs have been taken out of both the IV and the DV

Semi-partial correlation is the unique contribution of an IV after the contributions of the other IVs have been taken out of only the predictor in question

Page 19: Variable importance: Partial correlation

A+B+C+D represents all the variability in the DV to be explained; A+B+C = R²

The squared partial correlation is the amount a variable explains relative to the amount in the DV that is left to explain after the contributions of the other IVs have been removed from both the predictor and the criterion

For IV1 it is A/(A+D); for IV2 it would be B/(B+D)

Page 20: Variable importance: Semipartial correlation

The semipartial correlation (squared) is perhaps the more useful measure of contribution

It refers to the unique contribution of A to the model, i.e. the relationship between the DV and the IV after the contributions of the other IVs have been removed from the predictor

It is A/(A+B+C+D); for IV2, B/(A+B+C+D)

Interpretation (of the squared value): out of all the variance to be accounted for, how much does this variable explain that no other IV does? Or: how much would R² drop if the variable were removed?

R^2 = r^2_{y1} + sr^2_2 + \dots + sr^2_p
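A sketch of both quantities via residualizing, continuing the simulated two-predictor example (x1, x2, y) from earlier:

x1_res <- resid(lm(x1 ~ x2))              # x2's contribution removed from x1 only
sr1 <- cor(y, x1_res)                     # semi-partial correlation for x1
pr1 <- cor(resid(lm(y ~ x2)), x1_res)     # partial: x2 removed from y as well
c(semipartial = sr1, partial = pr1)
# sr1^2 equals the drop in R^2 when x1 is removed from the full model
summary(lm(y ~ x1 + x2))$r.squared - summary(lm(y ~ x2))$r.squared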

Page 21: Variable importance

[Venn diagram: the DV's variance overlapping with circles for IV1 and IV2, defining regions A, B, C, and D]

Note that exactly how the partial and semi-partial correlations are figured will depend on the type of multiple regression employed

The previous examples concerned a standard multiple regression situation

For sequential (i.e. hierarchical) regression, the partial correlations would be IV1 = (A+C)/(A+C+D) and IV2 = B/(B+D)

Page 22: Variable importance

For the semi-partial correlation, IV1 = (A+C)/(A+B+C+D); IV2 is the same as before

The result for the addition of the second variable is the same as it would be in standard MR

Thus, if the goal is to see the unique contribution of a single variable after all others have been controlled for, there is no real reason to perform a sequential rather than a standard MR

In general terms, it is the unique contribution of the variable at the point it enters the equation (sequential or stepwise)
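A sequential comparison is just a pair of nested models; a sketch with the same simulated data:

step1 <- lm(y ~ x2)
step2 <- lm(y ~ x2 + x1)
summary(step2)$r.squared - summary(step1)$r.squared   # R^2 change for x1 (= sr1^2 above)
anova(step1, step2)                                   # F test of that R^2 change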

Page 23: Variable importance: Example data

The semipartial correlation is labeled as the 'part' correlation in SPSS

Here we can see that education level is really doing all the work in this model

Obviously from some alternate universe

Coefficients (Dependent Variable: Current Salary)

                                  B          Std. Error   Beta     t        Sig.   Zero-order  Partial  Part
(Constant)                     -27886.3     5529.479               -5.043   .000
Educational Level (years)       4004.576    210.628       .677     19.013   .000   .661        .659     .654
Months since Hire                 87.951     58.441       .052      1.505   .133   .084        .069     .052
Previous Experience (months)      11.936      5.803       .073      2.057   .040   -.097       .094     .071

Page 24: Another example

Mental health symptoms predicted by number of doctor visits, physical health symptoms, and number of stressful life events

Model Summary

R      R Square   Adjusted R Square   Std. Error of the Estimate
.553   .306       .302                3.504

ANOVA (Dependent Variable: Mental health symptoms)

              Sum of Squares   df    Mean Square   F        Sig.
Regression    2498.626         3     832.875       67.820   .000
Residual      5661.387         461   12.281
Total         8160.013         464

Predictors: (Constant), Stressful life events, Visits to health professionals, Physical health symptoms

Page 25: Another example (continued)

Here we see that physical health symptoms and stressful life events both significantly contribute to the model

Physical health symptoms is the more 'important' of the two

Coefficients (Dependent Variable: Mental health symptoms)

                                   B      Std. Error   Beta     t       Sig.   Zero-order  Partial  Part
(Constant)                        .845    .409                  2.067   .039
Visits to health professionals   -.001    .017        -.003     -.075   .940   .256        -.003    -.003
Physical health symptoms          .761    .078         .434     9.818   .000   .505         .416     .381
Stressful life events             .007    .001         .238     5.754   .000   .370         .259     .223

Page 26: Variable importance: Comparison

Comparison of standardized coefficients, partial, and semi-partial correlation coefficients

All of them are 'partial' correlations

\beta_1 = \frac{r_{y1} - r_{y2}\, r_{12}}{1 - r_{12}^2}

Partial: \quad pr_1 = \frac{r_{y1} - r_{y2}\, r_{12}}{\sqrt{1 - r_{y2}^2}\,\sqrt{1 - r_{12}^2}}

Semi-partial: \quad sr_1 = \frac{r_{y1} - r_{y2}\, r_{12}}{\sqrt{1 - r_{12}^2}}

Page 27: Another Approach to Variable Importance

The methods just provided give us a glimpse as to variable importance, but interestingly we don't have a unique contribution statistic that is a true decomposition of R-squared, i.e. one where the measures of importance for each variable add up to the overall R-squared

One measure that does do so provides the average R² increase a variable gives, depending on the order in which it enters the model; 3-predictor example: A B C; B A C; C A B; etc.

One way to think about it, using what you've just learned, is as the squared semi-partial correlation a variable gets when it enters first, second, third, etc.

Note that the average is taken over all possible permutations; e.g. the R-square contribution for B being first in the model includes B A C and B C A, both of which would of course be the same value

The following example comes from the survey data

Page 28: Another Approach to Variable Importance (continued)

Model Summary (entry order: war on terror, then mathematical ability and grade for bush)

Model  R      R Square  Adj. R Square  Std. Error  R Sq. Change  F Change  df1  df2  Sig. F Change
1      .793   .629      .618           1.46685     .629          54.346    1    32   .000
2      .809   .654      .619           1.46367     .025          1.070     2    30   .356

Model Summary (entry order: mathematical ability, war on terror, grade for bush)

Model  R      R Square  Adj. R Square  Std. Error  R Sq. Change  F Change  df1  df2  Sig. F Change
1      .096   .009      -.022          2.39850     .009          .295      1    32   .591
2      .805   .649      .626           1.45116     .639          56.417    1    31   .000
3      .809   .654      .619           1.46367     .005          .472      1    30   .497

Model Summary (entry order: grade for bush, war on terror, mathematical ability)

Model  R      R Square  Adj. R Square  Std. Error  R Sq. Change  F Change  df1  df2  Sig. F Change
1      .742   .551      .537           1.61427     .551          39.296    1    32   .000
2      .799   .638      .615           1.47194     .087          7.488     1    31   .010
3      .809   .654      .619           1.46367     .016          1.351     1    30   .254

Model Summary (entry order: mathematical ability and grade for bush, then war on terror)

Model  R      R Square  Adj. R Square  Std. Error  R Sq. Change  F Change  df1  df2  Sig. F Change
1      .746   .557      .528           1.63025     .557          19.453    2    31   .000
2      .809   .654      .619           1.46367     .098          8.458     1    30   .007

For war on terror, as Predictor 1: R² = .629 (note there are 2 orderings in which war enters first, both giving .629)

As Predictor 2: R² change = .639 and .087

As Predictor 3: R² change = .098 (there are 2 orderings in which war enters last, both giving .098)

Page 29: Interpretation

The average of these is the average contribution to R square for a particular variable over all possible orderings

In this case, for war it is ~.36; i.e. on average it increases R square by .36, or 36% of variance accounted for

Furthermore, if we add up the average R-squared contributions for all three, .36 + .28 + .01 = .65, which is the R² for the model
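A quick check of that average for war, using the R² changes from the previous page (each change counted once per ordering; 3! = 6 orderings in all):

mean(c(.629, .629, .639, .087, .098, .098))   # ~.363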

Page 30: R program example

library(relaimpo)    # relative importance metrics for linear models
RegModel.1 <- lm(SOCIAL ~ BUSH + MTHABLTY + WAR, data = Dataset)
calc.relimp(RegModel.1, type = c("lmg", "last", "first", "betasq", "pratt"))

Output: LMG is what we were just talking about; it stands for Lindemann, Merenda and Gold, the authors who introduced it

'Last' is simply the squared semi-partial correlation

'First' is just the square of the simple bivariate correlation between predictor and DV

'Betasq' is the square of the beta coefficient with 'all in'

'Pratt' is the product of the standardized coefficient and the simple bivariate correlation; it too will add up to the model R², but is not recommended, one reason being that it can actually be negative

        lmg    last   first  betasq  pratt
BUSH    0.278  0.005  0.551  0.024   0.116
MATH    0.012  0.016  0.009  0.016   0.012
WAR     0.363  0.098  0.629  0.439   0.526

*Note the relaimpo package is equipped to provide bootstrapped estimates

Page 31: Different methods

Note that one's assessment of relative importance may depend on the method

Much of the time those methods will largely agree, but they may not, so use multiple estimates to help you decide

One might typically go with the LMG, as it is both intuitive and a decomposition of R²

        lmg    last   first  betasq  pratt
BUSH    0.278  0.005  0.551  0.024   0.116
MATH    0.012  0.016  0.009  0.016   0.012
WAR     0.363  0.098  0.629  0.439   0.526

Page 32: Relative importance summary

There are multiple ways to estimate a variable's contribution to the model, and some may be better than others

A general approach: check the simple bivariate relationships first. If you don't see worthwhile correlations with the DV there, you shouldn't expect much from your results regarding the model. Check for outliers and compare with robust measures also. You may detect that some variables are so highly correlated that one is redundant

Statistical significance is not a useful means of assessing relative importance, nor is the raw coefficient

Standardized coefficients and partial correlations are a first step; compare the standardized coefficients to the simple correlations as a check on possible suppression

Of typical output, the semi-partial correlation is probably the more intuitive assessment

The LMG is also intuitive, and is a natural decomposition of R², unlike the others

Page 33: Relative importance summary

One thing to keep in mind is that determining variable importance, while possible for a single sample, should not be overgeneralized

Variable orderings likely will change upon repeated sampling; e.g. while one might think that war and bush are better predictors than math (it certainly makes theoretical sense), saying that either would be better than the other would be quite a stretch with just one sample

What you see in your sample is specific to it, and it would be wise not to make any bold claims without validation

Page 34: Regression diagnostics

Of course, all of the previous information would be relatively useless if we are not meeting our assumptions and/or have overly influential data points

In fact, you shouldn't really be looking at the results unless you test assumptions and look for outliers, even though this requires running the analysis to begin with

Various tools are available for the detection of outliers

Classical methods: Standardized Residuals (ZRESID), Studentized Residuals (SRESID), Studentized Deleted Residuals (SDRESID)

Ways to think about outliers: leverage, discrepancy, influence

Thinking 'robustly'

Page 35: Regression diagnostics

Standardized Residuals (ZRESID): standardized errors in prediction, with mean 0 and SD equal to the standard error of estimate. To standardize, divide each residual by the s.e.e. At best an initial indicator (e.g. the ±2 rule of thumb), but because the case itself determines what the mean residual would be, almost useless

Studentized Residuals (SRESID): the same idea, but the studentized residual recognizes that the error associated with predicting values far from the mean of X is larger than the error associated with predicting values closer to the mean of X; the standard error is multiplied by a value that takes this into account

Studentized Deleted Residuals (SDRESID): a studentized residual in which the standard error is calculated with the case in question removed from the others

Page 36: Regression diagnostics

Mahalanobis Distance: the distance of a case from the centroid of the remaining points (the point where the means meet in n-dimensional space)

Cook's Distance: identifies an influential data point, whether in terms of predictor or DV. A measure of how much the residuals of all cases would change if a particular case were excluded from the calculation of the regression coefficients. With larger (relative) values, excluding a case would change the coefficients substantially

DfBeta: the change in a regression coefficient that results from the exclusion of a particular case. Note that you get DfBetas for each coefficient associated with the predictors
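All of these are available for a fitted lm object in base R; a sketch (rough SPSS analogues noted in the comments):

rstandard(fit)           # internally studentized residuals (~ SRESID)
rstudent(fit)            # studentized deleted residuals (~ SDRESID)
resid(fit) / sigma(fit)  # residuals scaled by the s.e.e. (~ ZRESID)
cooks.distance(fit)      # Cook's distance
dfbetas(fit)             # standardized DfBetas, one column per coefficient
hatvalues(fit)           # leverage (hat) values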

Page 37: Regression diagnostics

Leverage assesses outliers among the IVs: Mahalanobis distance; a relatively high Mahalanobis distance suggests an outlier on one or more variables

Discrepancy measures the extent to which a case is in line with the others

Influence is a product of leverage and discrepancy: how much would the coefficients change if the case were deleted? Cook's distance, DfBetas

Page 38: Outliers: Influence plots

With a couple measures of 'outlierness' we can construct a scatterplot to note especially problematic cases

After fitting a regression model in R Commander, i.e. running the analysis, this graph is available via point and click

Here we have what is actually a 3-d plot, with 2 outlier measures on the x and y axes (studentized residuals and 'hat' values, a measure of leverage) and a third in terms of the size of the circle (Cook's distance)

For this example, case 35 appears to be a problem
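The same plot can be produced directly with the car package (on which R Commander builds); a minimal sketch:

library(car)
# x axis: hat values (leverage); y axis: studentized residuals;
# circle size: Cook's distance; noteworthy cases are labeled
influencePlot(fit)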

Page 39: Outliers

It should be made clear to interested readers whatever has been done to deal with outliers

Applications such as S-PLUS, R, and even SAS and Stata (pretty much all but SPSS) provide methods of robust regression analysis, and these would be preferred
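As one sketch of the idea in R, M-estimation via rlm() in the MASS package (which ships with R), using the earlier simulated data:

library(MASS)
rfit <- rlm(y ~ x1 + x2)   # downweights cases with large residuals
summary(rfit)
# Compare coef(rfit) with coef(lm(y ~ x1 + x2)); large differences
# suggest influential cases are driving the least-squares fit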

Page 40: Summary: Outliers

No matter the analysis, some cases will be the 'most extreme'; however, none may really qualify as being overly influential

Whatever you do, always run some diagnostic analysis, and do not ignore influential cases

It should be made clear to interested readers whatever has been done to deal with outliers

As noted before, the best approach to dealing with outliers, when they do occur, is to run a robust regression with capable software