Introduction to testing statistical significance of interactions

Jane E. Miller, PhD

The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.

Overview

• Testing statistical significance of individual coefficients

• Testing effect of interaction terms on overall model fit

• Approaches to testing statistical significance of interactions
  – Alternative model specification
  – The “TEST” statement
  – Simple slopes calculations for compound coefficients
  – Changing the reference category


Statistical significance of an interaction

• To evaluate statistical significance of an interaction, use a set of approaches
  – t-tests for individual coefficients
  – F-tests for the collective contribution of a set of terms to the overall fit of an OLS model

• The corresponding statistics for a logistic model are
  – z-statistics for individual coefficients
  – The –2 log likelihood statistic for overall model fit

• Methods for testing differences among values of variables involved in the interaction
  – Contrasts within the overall shape of the pattern


Estimated coefficients from an OLS model of birth weight in grams

                                    Model A:                  Model B:
                                    Without interactions      With interactions
                                    β          t-statistic    β          t-statistic
Main effects terms
Race (ref. = non-Hisp. white)
  Non-Hispanic Black (NHB)          –172.6**   –9.86          –168.1**   –5.66
  Mexican American (MA)             –23.1      –1.02          –104.2**   –2.16
Mother’s ed. (ref. = > HS)
  Less than high school (<HS)       –55.5**    –2.88          –54.2**    –2.35
  High school graduate (=HS)        –53.9**    –3.64          –62.0**    –3.77
Interactions: race & education
  NHB_<HS                                                     –38.5      –0.88
  MA_<HS                                                      99.4       1.72
  NHB_=HS                                                     18.4       0.47
  MA_=HS                                                      93.7       1.49

F-statistic                         94.08                     65.59
Degrees of freedom (df)             9                         13

Statistical significance of βs on individual interaction terms

• Statistical significance of coefficients on each of the interaction terms is assessed as for any other independent variable in a multivariate regression model

• In the example from the previous slide, none of the βs on the individual interaction terms between race/ethnicity and mother’s education achieve statistical significance as assessed by their t-statistics
  – E.g., βNHB_<HS = –38.5, with a t-statistic of –0.88
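The check described in this slide can be sketched in a few lines of Python. The coefficients and t-statistics are copied from the Model B column of the table; the cutoff of 1.96 is an assumption (a two-tailed test at p < 0.05 with a large sample), because the slides do not state the significance level behind this comparison. The names `interaction_terms`, `CRITICAL_T`, and `is_significant` are illustrative only.

```python
# (β, t-statistic) for each interaction term, copied from Model B in the table.
interaction_terms = {
    "NHB_<HS": (-38.5, -0.88),
    "MA_<HS": (99.4, 1.72),
    "NHB_=HS": (18.4, 0.47),
    "MA_=HS": (93.7, 1.49),
}

# Assumption: two-tailed test at p < 0.05 with a large sample, so the
# critical value is approximately the standard normal value of 1.96.
CRITICAL_T = 1.96


def is_significant(t_statistic, critical_value=CRITICAL_T):
    """Return True when |t| exceeds the chosen critical value."""
    return abs(t_statistic) > critical_value


# Collect the interaction terms whose t-statistics pass the cutoff.
significant_terms = [
    name for name, (_, t) in interaction_terms.items() if is_significant(t)
]

for name, (beta, t) in interaction_terms.items():
    verdict = "significant" if is_significant(t) else "not significant"
    print(f"{name:8s} beta = {beta:7.1f}  t = {t:5.2f}  {verdict}")
```

Under that assumed cutoff, `significant_terms` comes out empty, which matches the slide's statement that none of the four interaction βs is individually significant.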


Overall shape of an interaction

• But recall that βs on main effect and interaction terms cannot be interpreted in isolation from one another

• E.g., in a model of birth weight with an interaction between race and education, the difference in birth weight for non-Hispanic black infants born to mothers with < HS compared to the reference category involves βs on three variables
  = βNHB + β<HS + βNHB_<HS

• More than one β is involved in this calculation, so looking only at the statistical significance of each of those three βs does not tell us the statistical significance of differences between groups defined by combinations of the two IVs in the interaction
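Written out, the comparison takes the following form. This is a sketch that assumes Model B has the usual dummy-variable specification; the intercept β0 and the "other covariates" term are placeholders for parts of the model that the slides do not itemize.

```latex
\begin{aligned}
\widehat{BW} ={}& \beta_0
  + \beta_{\mathrm{NHB}}\,\mathrm{NHB}
  + \beta_{\mathrm{MA}}\,\mathrm{MA}
  + \beta_{<\mathrm{HS}}\,({<}\mathrm{HS})
  + \beta_{=\mathrm{HS}}\,({=}\mathrm{HS}) \\
 &+ \beta_{\mathrm{NHB}\_<\mathrm{HS}}\,(\mathrm{NHB}\times{<}\mathrm{HS})
  + \beta_{\mathrm{MA}\_<\mathrm{HS}}\,(\mathrm{MA}\times{<}\mathrm{HS}) \\
 &+ \beta_{\mathrm{NHB}\_=\mathrm{HS}}\,(\mathrm{NHB}\times{=}\mathrm{HS})
  + \beta_{\mathrm{MA}\_=\mathrm{HS}}\,(\mathrm{MA}\times{=}\mathrm{HS})
  + \text{(other covariates)} \\[6pt]
\widehat{BW}_{\mathrm{NHB},\,<\mathrm{HS}} - \widehat{BW}_{\mathrm{reference}}
  ={}& \beta_{\mathrm{NHB}} + \beta_{<\mathrm{HS}} + \beta_{\mathrm{NHB}\_<\mathrm{HS}}
\end{aligned}
```

For a non-Hispanic black infant whose mother has < HS, three dummy variables equal 1 (NHB, <HS, and their product) and the rest of the race and education dummies equal 0, so with the other covariates held constant only those three βs remain in the difference.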


What do inferential statistics for individual terms in a model tell us?

• If the coefficient on an interaction term is statistically significant in a model that includes the corresponding main effects terms
  – We know only that that combination of characteristics has a joint effect on the DV over and above the main effects

• E.g., if <HS_NHB is statistically significant in a model of birth weight that also includes the main effects of education and race
  – We know only that that combination of race and education has a different effect on birth weight than would be implied by the βs on the main effects of NHB and <HS alone


Assessing effects of interactions on overall model goodness-of-fit (GOF)

• To assess whether the interaction terms collectively improve overall model fit, calculate the difference in F-statistics for models with and without those terms
  – Model A: Main effects only
  – Model B: Main effects and interactions

• Compare against the critical value of the F-statistic for the number of degrees of freedom (df) for the model
  – df for the numerator is based on the difference in number of covariates in models with and without interaction terms
  – df for the denominator depends on the sample size

• If the difference in F > the critical value, the interaction terms statistically significantly improve the overall fit of the model


Example difference in model GOF

• The difference in F-statistics between models A and B
  = Fmodel A – Fmodel B = 94.1 – 65.6 = 28.5

• The difference in the number of degrees of freedom between models A and B = 13 df – 9 df = 4 df

• For an F distribution with
  – 4 degrees of freedom for the numerator
  – ∞ degrees of freedom for the denominator (based on the sample size used to estimate the model)
  – The critical value for p = 0.001 is 10.8

• The difference in F exceeds the critical value (28.5 > 10.8)
  – Thus we conclude that inclusion of interaction terms improves the overall fit of the model at p < 0.001
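The arithmetic on this slide can be reproduced with a short Python sketch. The function name `compare_model_fit` and its parameters are hypothetical. The F-statistics and df come from the table, and the critical value is passed in exactly as quoted on the slide rather than computed, so for other models it should be looked up in an F table for the relevant df.

```python
def compare_model_fit(f_without, f_with, df_without, df_with, critical_value):
    """Follow the slide's steps for comparing models with and without interactions.

    The critical value is supplied by the caller; it is not computed here.
    """
    f_difference = f_without - f_with      # Fmodel A - Fmodel B, as on the slide
    df_difference = df_with - df_without   # number of added interaction terms
    return {
        "f_difference": round(f_difference, 1),
        "df_difference": df_difference,
        "exceeds_critical_value": f_difference > critical_value,
    }


# Values from the table (Model A without, Model B with interactions);
# 10.8 is the critical value quoted on the slide for p = 0.001.
result = compare_model_fit(
    f_without=94.08,
    f_with=65.59,
    df_without=9,
    df_with=13,
    critical_value=10.8,
)
print(result)
```

Using the unrounded F-statistics from the table gives the same difference of 28.5 once rounded to one decimal place.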


How can overall fit improve if individual terms aren’t statistically significant?

• In models that include several main effect and interaction terms, one or more of those terms may not be needed to capture relevant variation in the DV
  – Could be collapsed into the reference category or combined with other subgroups based on empirical testing
  – Might yield statistical significance for some interaction terms

• Models that include many interaction terms may be affected by multicollinearity
  – Can explain why the t-statistics show a lack of significance even if the F-statistic indicates statistical significance

What do inferential statistics for individual terms in a model tell us?

• If <HS_NHB is statistically significant in a model of birth weight that also includes the main effects of education and race
  – We know only that that combination of race and education has a different effect on birth weight than would be implied by the βs on the main effects of NHB and <HS alone


What don’t inferential statistics for individual terms in a model tell us?

• The separate test statistics for each of the individual main effect and interaction βs alone cannot assess statistical significance of differences in predicted birth weight
  – For example, for non-Hispanic blacks born to mothers with < HS compared to non-Hispanic whites born to mothers with > HS (the reference category)
  – Across racial/ethnic groups within the < HS group
  – Across education levels among non-Hispanic blacks

• Remember: each of these comparisons involves comparing values calculated from more than one β, e.g.,
  – For non-Hispanic black < HS: β<HS + βNHB + β<HS_NHB


Calculating overall effect for non-Hispanic blacks with < HS education

βNHB = –168*
β<HS = –54*
βNHB_<HS = –39

βNHB + β<HS + βNHB_<HS = (–168) + (–54) + (–39) = –261

We want to know whether that sum is statistically significantly different from 0; i.e., whether there is a difference in birth weight compared to infants born to non-Hispanic white women with more than a high school education (the reference category)

* p < 0.05 based on t-tests for individual coefficients
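One standard way to frame that test is sketched below. It is offered as background on how a sum of coefficients is tested in general, not as the specific procedure the slides go on to use. The variances and covariances come from the model's estimated variance-covariance matrix of the coefficients, which the slides do not report, so no numeric result is shown.

```latex
\begin{aligned}
S &= \hat\beta_{\mathrm{NHB}} + \hat\beta_{<\mathrm{HS}} + \hat\beta_{\mathrm{NHB}\_<\mathrm{HS}} \\[4pt]
\widehat{\mathrm{Var}}(S) &=
    \widehat{\mathrm{Var}}(\hat\beta_{\mathrm{NHB}})
  + \widehat{\mathrm{Var}}(\hat\beta_{<\mathrm{HS}})
  + \widehat{\mathrm{Var}}(\hat\beta_{\mathrm{NHB}\_<\mathrm{HS}}) \\
 &\quad + 2\,\widehat{\mathrm{Cov}}(\hat\beta_{\mathrm{NHB}},\hat\beta_{<\mathrm{HS}})
  + 2\,\widehat{\mathrm{Cov}}(\hat\beta_{\mathrm{NHB}},\hat\beta_{\mathrm{NHB}\_<\mathrm{HS}})
  + 2\,\widehat{\mathrm{Cov}}(\hat\beta_{<\mathrm{HS}},\hat\beta_{\mathrm{NHB}\_<\mathrm{HS}}) \\[4pt]
t &= \frac{S}{\sqrt{\widehat{\mathrm{Var}}(S)}}
\end{aligned}
```

Because the covariance terms enter the standard error, the t-statistic for the sum cannot be recovered from the three individual t-statistics alone.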

Substantive question behind the interaction model: “Does race modify the association between education and birth weight?”

• The bar for each race/education combination involves the sum of the intercept and one to three other coefficients

• t-tests for individual βs won’t tell us about statistical significance of differences in those sums
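As a sketch of how those sums are built, the Python below computes the difference in predicted birth weight from the reference category for each race/education combination, using the Model B coefficients from the table. The intercept is not reported in the slides, so the code returns differences from the reference group rather than predicted birth weights. The label "NHW" for non-Hispanic white and the function name `difference_from_reference` are illustrative choices.

```python
# Model B main-effect coefficients from the table; reference categories are 0.
race_main = {"NHW": 0.0, "NHB": -168.1, "MA": -104.2}
education_main = {">HS": 0.0, "=HS": -62.0, "<HS": -54.2}

# Model B interaction coefficients; combinations involving a reference
# category have no interaction term.
interaction = {
    ("NHB", "<HS"): -38.5,
    ("MA", "<HS"): 99.4,
    ("NHB", "=HS"): 18.4,
    ("MA", "=HS"): 93.7,
}


def difference_from_reference(race, education):
    """Sum the main-effect and interaction coefficients for one combination.

    Returns the difference in predicted birth weight (grams) relative to
    non-Hispanic white infants whose mothers have more than high school.
    """
    return (
        race_main[race]
        + education_main[education]
        + interaction.get((race, education), 0.0)
    )


for race in race_main:
    for education in education_main:
        diff = difference_from_reference(race, education)
        print(f"{race:>3} {education:>3}: {diff:+.1f} g")
```

For non-Hispanic black infants with < HS mothers, the unrounded coefficients give –260.8 grams, which the slides round to –261.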


Tests of differences across groups other than the reference category

• Conduct formal inferential tests of whether the predicted value of the dependent variable is statistically significantly different across categories

• Possible approaches
  – Use “TEST” statement to contrast coefficients
  – Revise the model specification
    • Estimate a model with dummies for all interaction combinations
    • Reestimate the model with different reference categories
      – See separate podcast on that topic
  – Conduct post-hoc tests of differences between βs from one model
    • See separate podcast on simple slopes

Summary

• Inferential tests for individual coefficients in a regression model test whether each β is statistically significantly different from 0

• In models using main effects and interaction terms, calculating the overall shape of an interaction requires summing several βs
  – Tests of the individual component βs don’t address statistical significance of differences in the overall interaction pattern


Suggested resources

• Miller, J. E. 2013. The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. University of Chicago Press, chapters 11, 15, and 16.

• Cohen, Jacob, Patricia Cohen, Stephen G. West, and Leona S. Aiken. 2003. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 3rd Edition. Florence, KY: Routledge, chapters 7 and 9.


Suggested online resources

• Podcasts on
  – Specifying models to test for interactions
  – Calculating the overall shape of an interaction pattern from regression coefficients
  – Comparing overall goodness-of-fit across models
  – Approaches to testing statistical significance of interactions
  – Conducting post-hoc tests of compound coefficients using the simple slopes technique
  – Using alternative reference categories to test statistical significance of interactions


Contact information

Jane E. Miller, [email protected]

Online materials available at http://press.uchicago.edu/books/miller/multivariate/index.html

