© Willett, Harvard University Graduate School of Education, 03/22/2022

S052/ § I.1(e): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


© Willett, Harvard University Graduate School of Education, 04/19/2023

S052/I.1(e) – Slide 1

More details can be found in the “Course Objectives and Content” handout on the course webpage.

Multiple Regression Analysis (MRA):

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

Do your residuals meet the required assumptions?
• Test for residual normality.
• Use influence statistics to detect atypical datapoints.

If your residuals are not independent, replace OLS by GLS regression analysis:
• Use individual growth modeling.
• Specify a multi-level model.

Special cases of MRA:
• If your sole predictor is continuous, MRA is identical to correlational analysis.
• If your sole predictor is dichotomous, MRA is identical to a t-test.
• If your several predictors are categorical, MRA is identical to ANOVA.

If time is a predictor, you need discrete-time survival analysis…

If your outcome is categorical, you need to use…
• Binomial logistic regression analysis (dichotomous outcome).
• Multinomial logistic regression analysis (polytomous outcome).

If you have more predictors than you can deal with,
• Create taxonomies of fitted models and compare them.
• Form composites of the indicators of any common construct.
• Conduct a Principal Components Analysis.
• Use Cluster Analysis.

If your outcome vs. predictor relationship is non-linear,
• Use non-linear regression analysis.
• Transform the outcome or predictor.

How do you deal with missing data?

S052/I.1(e) – Slide 2

S052/§I.1(e): Applied Data Analysis
Printed Syllabus – What Is Today’s Topic?

Please check inter-connections among the Roadmap, the Daily Topic Area, the Printed Syllabus, and the content of today’s class when you pre-read the day’s materials.

Syllabus Section I.1(e), on Interpreting Findings, includes:
• Three things you must report, in a statistical analysis (Slide 3).
• The difference between analysis & reporting (Slide 4).
• Deciding what to report (Slides 5-6).
• Sidelining an effect by setting a predictor to a prototypical value (Slide 7).
• Plotting fitted trend lines for prototypical children, and crafting a conclusion (Slides 8-13).
• Using the GLH strategy to test “post-hoc” hypotheses (Slides 14-18).
• Carrying out the “post-hoc” GLH test by hand (Slides 19-21).
• Appendix 1: Main effects of multiple dummies (Slides 22-25).
• Appendix 2: Interactions with multiple dummies (Slides 26-29).
• Appendix 3: What happens to main effects when interactions are added? (Slides 30-31).

S052/I.1(e) – Slide 3

S052/§I.1(e): Interpreting Findings
Three Things You Must Include, In The Report Of Any Statistical Analysis

When you report your analyses, you must address three important things:

• Report whether you have detected an effect, or not. Report your hypothesis testing to stipulate whether the effect is different from zero in the population.
• Report the direction of the effect you have detected. Stipulate whether the effect is positive or negative.
• Report the magnitude of the effect you have detected. Stipulate whether the effect is small, medium, or large.

When you report differences between groups, a difference of:
• .2 st. dev. is small
• .5 st. dev. is medium
• .8 st. dev. is large

When you report relationships, a correlation of:
• .10 is small
• .30 is medium
• .50 is large

You can often achieve all this simultaneously, by plotting fitted trend lines for individuals who are prototypical for the population.
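The magnitude benchmarks above can be turned into a small lookup. This is only a sketch: the function names are hypothetical, and treating each benchmark as a lower cut-off (so that, say, .35 counts as "medium") is an assumption, since the slide gives reference points rather than ranges.

```python
# Benchmarks copied from the slide; using them as lower cut-offs is an assumption.
BENCHMARKS = {
    "difference": (0.2, 0.5, 0.8),     # group difference, in st. dev. units
    "correlation": (0.10, 0.30, 0.50), # correlation coefficient
}


def direction_label(value):
    """Report the direction of an effect from its sign."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "zero"


def magnitude_label(value, kind):
    """Label the size of an effect as small, medium, or large."""
    small, medium, large = BENCHMARKS[kind]
    size = abs(value)  # magnitude is judged separately from direction
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "below the 'small' benchmark"


if __name__ == "__main__":
    r = -0.35  # illustrative value only, not a result from the course data
    print(direction_label(r), magnitude_label(r, "correlation"))
```

Whether an effect was detected at all still has to come from the hypothesis test; these labels only cover direction and magnitude.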

S052/I.1(e) – Slide 4

S052/§I.1(e): Interpreting Findings
Which Leads To An Obvious, But Critically Important Distinction!!!

In your work, make sure you distinguish carefully between…

“ANALYSIS”
In analysis, you must include all important effects in your hypothesized regression models:
• If you don’t, then you may not have represented the underlying population process credibly.

“REPORTING”
In reporting, you don’t need to report every effect included in your analysis:
• Focus on the main story and present evidence selected to tell that story in the most effective way.
• You need not report effects you’ve detected in the same form in which they were specified in the analyses (and vice versa!).
• You can always conduct additional GLH tests to check out interesting post-hoc hypotheses.

Your report (thesis, paper, presentation) is NOT “A Diary of the Data-Analysis I Did Last Summer”.

Effective reporters?

Possible structure for a: Research Proposal, Journal Article, Conference Presentation.

S052/I.1(e) – Slide 5

S052/§I.1(e): Interpreting Findings
You’ve Done The Analysis, But What Should You Report?

These were the models that I fitted in order to get to the “final” model … But, which ones should I report … ?

We’ll report the “final” model, of course. But, to decide whether there are others that should be reported too, ask:
• Do we want to tell more than one story?
• Which stories will we forefront, which background?
• Which model(s) support the story we want to forefront? Which the one we want to background?
• Should we alert the audience to particular models that were key links in the analytic chain?
• Is there a model that a particular audience would “expect” to see in our report?
• When the refrigerator door is closed and the little light goes out, do the vegetables get scared in the dark?
• …?

S052/I.1(e) – Slide 6

S052/§I.1(e): Interpreting Findings
Deciding What We Should Report!

Here’s one possible subset of the taxonomy that we could effectively report, in a journal article or thesis …

Before we get to the actual report, let’s think about how best to interpret the main and interaction effects in this “final” model in the taxonomy.

S052/I.1(e) – Slide 7

S052/§I.1(e): Interpreting Findings
Algebraic Representation of the “Final” Fitted Model … What Should We Interpret?

$$M6:\quad \widehat{ILLCAUSE} = 2.136 + 0.030\,ILL + 0.020\,AGE - 0.006\,(ILL \times AGE) - 0.100\,SES$$

The “hat” over the outcome variable indicates that this is the equation for the predicted values, and so this must be the fitted model. Notice that there is no residual term in a fitted model.

The fitted model contains all the important effects detected, but which effects should feature in the interpretation?
• We must interpret the effect of health status (H) -- it’s the question predictor.
• We should interpret the effect of AGE, because the P.I. had a “developmental” interest in the children.
• Perhaps we don’t need to interpret the effect of SES – because it was only included as a covariate to ensure that the “health” story was captured correctly?

How to present findings, without interpreting SES? Don’t ignore SES, just set it to some sensible prototypical value:
• Perhaps the sample mean -- here, 2.3 (Handout I_1e_1)?
• Or, to some other substantively-reasonable value?

How is this done?
• Substitute this value for SES in the fitted model, & simplify.
• Proceed with interpretation of remaining effects as usual.
• But, describe them as applying to a “child of average SES.”

S052/I.1(e) – Slide 8

S052/§I.1(e): Interpreting Findings
Sidelining A Detected Effect By Setting The Predictor To A Prototypical Value

$$M6:\quad \widehat{ILLCAUSE} = 2.136 + 0.030\,ILL + 0.020\,AGE - 0.006\,(ILL \times AGE) - 0.100\,SES$$

Let’s try this – set SES to its sample average (2.3), and simplify & collect terms in the fitted model …

$$[\widehat{ILLCAUSE} \mid SES = 2.3] = 2.136 + 0.030\,ILL + 0.020\,AGE - 0.006\,(ILL \times AGE) - 0.100\,(2.3)$$

$$[\widehat{ILLCAUSE} \mid SES = 2.3] = 2.136 + 0.030\,ILL + 0.020\,AGE - 0.006\,(ILL \times AGE) - 0.230$$

$$[\widehat{ILLCAUSE} \mid SES = 2.3] = 1.906 + 0.030\,ILL + 0.020\,AGE - 0.006\,(ILL \times AGE)$$

The notation on the left-hand side indicates that the fitted values of the outcome are now expressed at the “prototypical” substituted value of SES (= 2.3).

We can interpret this new specification of the fitted model to illustrate the simultaneous impact of ILL and AGE for children of average SES.

Make Sure You Can Explain The Steps Illustrated On This Page, In Words.
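The substitute-and-simplify step can also be sketched in code. The coefficients are the ones shown for the fitted model M6; the dictionary keys (`"Intercept"`, `"ILLxAGE"`) and the helper name `sideline` are illustrative choices, not part of the handout. The simple fold into the intercept is valid here only because SES does not interact with any other predictor in M6.

```python
# Fitted coefficients of M6 as shown on the slide; key names are illustrative.
M6 = {
    "Intercept": 2.136,
    "ILL": 0.030,
    "AGE": 0.020,
    "ILLxAGE": -0.006,
    "SES": -0.100,
}


def sideline(coefs, predictor, value):
    """Fix `predictor` at `value` and absorb its contribution into the intercept.

    Assumes `predictor` enters the model as a main effect only (no interactions).
    """
    reduced = {name: b for name, b in coefs.items() if name != predictor}
    reduced["Intercept"] = coefs["Intercept"] + coefs[predictor] * value
    return reduced


if __name__ == "__main__":
    at_average_ses = sideline(M6, "SES", 2.3)
    for name, b in at_average_ses.items():
        print(f"{name:<10} {b:.3f}")
```

Passing a different prototypical value (for example 2 or 3) re-expresses the same fitted model for children at that level of SES.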

S052/I.1(e) – Slide 9

S052/§I.1(e): Interpreting Findings
Figuring Out Which Fitted Lines To Plot, For The Children Of Average SES?

So, here’s the new version of the final fitted model for a prototypical child of average SES (=2.3) …

$$M6:\quad [\widehat{ILLCAUSE} \mid SES = 2.3] = 1.906 + 0.030\,ILL + 0.020\,AGE - 0.006\,(ILL \times AGE)$$

Create a fitted plot in which AGE is plotted along the horizontal axis, and there are two fitted regression lines:
• One for chronically-ill children (ILL=1),
• One for healthy children (ILL=0).

Fitted values of ILLCAUSE (computed at the average value of SES) will be plotted on the vertical axis (ordinate).

And here are the fitted models representing the required fitted trend lines that we must plot:

Healthy, Average SES

$$[\widehat{ILLCAUSE} \mid ILL = 0;\ SES = 2.3] = 1.906 + 0.030\,(0) + 0.020\,AGE - 0.006\,(0 \times AGE)$$

$$[\widehat{ILLCAUSE} \mid ILL = 0;\ SES = 2.3] = 1.906 + 0.020\,AGE$$

Chronically Ill, Average SES

$$[\widehat{ILLCAUSE} \mid ILL = 1;\ SES = 2.3] = 1.906 + 0.030\,(1) + 0.020\,AGE - 0.006\,(1 \times AGE)$$

$$[\widehat{ILLCAUSE} \mid ILL = 1;\ SES = 2.3] = 1.906 + 0.030 + 0.020\,AGE - 0.006\,AGE$$

$$[\widehat{ILLCAUSE} \mid ILL = 1;\ SES = 2.3] = 1.936 + 0.014\,AGE$$

Can You Explain The Steps In The Process Illustrated On This Page, In Words?
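One way to summarize the two substitutions is to collect terms before choosing a value of ILL. The grouping below is a restatement of the fitted model at SES = 2.3, not an additional result: the ILL main effect shifts the intercept, and the ILL×AGE interaction shifts the slope.

```latex
[\widehat{ILLCAUSE} \mid ILL;\ SES = 2.3]
  = \underbrace{(1.906 + 0.030\,ILL)}_{\text{intercept}}
  + \underbrace{(0.020 - 0.006\,ILL)}_{\text{slope}}\,AGE
```

Setting ILL = 0 gives intercept 1.906 and slope 0.020; setting ILL = 1 gives intercept 1.936 and slope 0.014, matching the two fitted lines.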

S052/I.1(e) – Slide 10

S052/§I.1(e): Interpreting Findings
Plotting the Fitted Lines For Prototypical Children

Fitted values of ILLCAUSE at average SES (first rows of the worksheet):

Age (Months)   Ill     Healthy
61             2.790
62             2.804
63             2.818
64             2.832
65             2.846   3.206
66             2.860   3.226
67             2.874   3.246
68             2.888   3.266
69             2.902   3.286
70             2.916   3.306
71             2.930   3.326
72             2.944   3.346
73             2.958   3.366
74             2.972   3.386
75             2.986   3.406
76             3.000   3.426
77             3.014   3.446
78             3.028   3.466
79             3.042   3.486
80             3.056   3.506
81             3.070   3.526
82             3.084   3.546
83             3.098   3.566

Age (months)   Min   Max
Healthy        64    203
Ill            61    200

[Plot: Predicted ILLCAUSE (vertical axis, 2 to 6) versus Age (Months) (horizontal axis, 60 to 210), with one fitted line for Healthy children and one for Chronically Ill children.]

Figure I.1(e).1. Predicted understanding of illness causality as a function of child age (in months) by health status, for children of average socio-economic status (SES=2.3).

Do You Know How To Use MS Excel To Create Such Plots?
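The worksheet columns behind such a plot can also be generated programmatically. The sketch below uses Python rather than Excel, which is a substitution on my part; the function names are hypothetical, and the coefficients are those of the fitted model at SES = 2.3.

```python
def fitted_illcause(age, ill):
    """Fitted ILLCAUSE at average SES (2.3) for a given AGE (months) and ILL (0 or 1)."""
    return 1.906 + 0.030 * ill + 0.020 * age - 0.006 * ill * age


def fitted_table(ages):
    """Return one row per age: (age, fitted value if ill, fitted value if healthy)."""
    return [(age, fitted_illcause(age, 1), fitted_illcause(age, 0)) for age in ages]


if __name__ == "__main__":
    print(f"{'Age':>5} {'Ill':>7} {'Healthy':>8}")
    for age, ill_value, healthy_value in fitted_table(range(61, 84)):
        print(f"{age:>5} {ill_value:>7.3f} {healthy_value:>8.3f}")
```

This version computes both columns at every age. The Min/Max table on the slide suggests that each line should be drawn only over the ages actually observed in that group, so a plotting routine would presumably restrict each series to its own range.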

S052/I.1(e) – Slide 11

S052/§I.1(e): Interpreting Findings
How Would You Include The Effect Of SES Explicitly In The Interpretation?

$$M6:\quad \widehat{ILLCAUSE} = 2.136 + 0.030\,ILL + 0.020\,AGE - 0.006\,(ILL \times AGE) - 0.100\,SES$$

It may be important to show the impact of SES on the outcome, along with the impact of HEALTH and AGE (it would be even more important if SES had interacted with other predictors in the model!).

The main effect of SES can be surfaced explicitly by:
• Using the earlier approach to interpreting the effects of ILL and AGE, but offering them at contrasting prototypical values of SES.
• For example, at the 25th (“Hi SES”) and 75th (“Lo SES”) percentiles of socio-economic status, say (SES=2 and 3, respectively).

Panel Of Plots For Four Prototypical Children:
• “Hi SES” Plot: Fitted ILLCAUSE vs AGE, for Healthy & Ill children.
• “Lo SES” Plot: Fitted ILLCAUSE vs AGE, for Healthy & Ill children.

What Features Must You Consider When Deciding How To Assemble Such A Panel Of Plots?

S052/I.1(e) – Slide 12

S052/§I.1(e): Interpreting Findings
Creating A Panel Of Plots To Display Simultaneous Effects Of HEALTH, AGE & SES

$$M6:\quad \widehat{ILLCAUSE} = 2.136 + 0.030\,ILL + 0.020\,AGE - 0.006\,(ILL \times AGE) - 0.100\,SES$$

High socio-economic status (SES=2):

$$[\widehat{ILLCAUSE} \mid ILL = 0;\ SES = 2] = 2.136 + 0.030\,(0) + 0.020\,AGE - 0.006\,(0 \times AGE) - 0.100\,(2)$$

$$[\widehat{ILLCAUSE} \mid ILL = 0;\ SES = 2] = 2.136 + 0.020\,AGE - 0.200 = 1.936 + 0.020\,AGE$$

$$[\widehat{ILLCAUSE} \mid ILL = 1;\ SES = 2] = 2.136 + 0.030\,(1) + 0.020\,AGE - 0.006\,(1 \times AGE) - 0.100\,(2)$$

$$[\widehat{ILLCAUSE} \mid ILL = 1;\ SES = 2] = 2.136 + 0.030 + 0.020\,AGE - 0.006\,AGE - 0.200 = 1.966 + 0.014\,AGE$$

Low socio-economic status (SES=3):

$$[\widehat{ILLCAUSE} \mid ILL = 0;\ SES = 3] = 2.136 + 0.030\,(0) + 0.020\,AGE - 0.006\,(0 \times AGE) - 0.100\,(3)$$

$$[\widehat{ILLCAUSE} \mid ILL = 0;\ SES = 3] = 2.136 + 0.020\,AGE - 0.300 = 1.836 + 0.020\,AGE$$

$$[\widehat{ILLCAUSE} \mid ILL = 1;\ SES = 3] = 2.136 + 0.030\,(1) + 0.020\,AGE - 0.006\,(1 \times AGE) - 0.100\,(3)$$

$$[\widehat{ILLCAUSE} \mid ILL = 1;\ SES = 3] = 2.136 + 0.030 + 0.020\,AGE - 0.006\,AGE - 0.300 = 1.866 + 0.014\,AGE$$

Can You Explain The Steps In The Process Illustrated On This Page, In Words?
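The four prototypical lines can be produced by one small helper that takes the chosen values of ILL and SES and returns the intercept and slope of fitted ILLCAUSE on AGE. The helper name, dictionary keys, and panel labels are hypothetical; the coefficients are those of the fitted model M6.

```python
# Fitted coefficients of M6 as shown on the slide; key names are illustrative.
M6 = {
    "Intercept": 2.136,
    "ILL": 0.030,
    "AGE": 0.020,
    "ILLxAGE": -0.006,
    "SES": -0.100,
}


def prototypical_line(ill, ses, coefs=M6):
    """Return (intercept, slope) of fitted ILLCAUSE vs AGE at fixed ILL and SES."""
    intercept = coefs["Intercept"] + coefs["ILL"] * ill + coefs["SES"] * ses
    slope = coefs["AGE"] + coefs["ILLxAGE"] * ill
    return intercept, slope


PANELS = {"Hi SES (SES=2)": 2, "Lo SES (SES=3)": 3}
GROUPS = {"Healthy": 0, "Chronically Ill": 1}

if __name__ == "__main__":
    for panel, ses in PANELS.items():
        for group, ill in GROUPS.items():
            intercept, slope = prototypical_line(ill, ses)
            print(f"{panel:<15} {group:<16} {intercept:.3f} + {slope:.3f} * AGE")
```

Because SES enters M6 only as a main effect, changing SES moves both lines in a panel up or down by the same amount and leaves the slopes untouched.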

S052/I.1(e) – Slide 13

S052/§I.1(e): Interpreting Findings
Panel Of Plots To Display Simultaneous Effects Of HEALTH, AGE & SES

Fitted values of ILLCAUSE (first rows of the worksheet):

               High SES          Low SES
AGE (Months)   Healthy   Ill     Healthy   Ill
61                                         2.720
62                                         2.734
63                                         2.748
64                       2.862             2.762
65             3.236     2.876             2.776
66             3.256     2.890             2.790
67             3.276     2.904             2.804
68             3.296     2.918             2.818
69             3.316     2.932             2.832
70             3.336     2.946             2.846
71             3.356     2.960             2.860
72             3.376     2.974             2.874
73             3.396     2.988             2.888
74             3.416     3.002             2.902
75             3.436     3.016             2.916
76             3.456     3.030   3.356     2.930
77             3.476     3.044   3.376     2.944
78             3.496     3.058   3.396     2.958
79             3.516     3.072   3.416     2.972
80             3.536     3.086   3.436     2.986
81             3.556     3.100   3.456     3.000
82             3.576     3.114   3.476     3.014
83             3.596     3.128   3.496     3.028
84             3.616     3.142   3.516     3.042

[Plot: Understanding of Illness Causality (vertical axis, 2 to 6) versus Age (Months) (horizontal axis, 60 to 210), with fitted lines for Healthy and Chronically Ill children at SES=2 and SES=3.]

Figure I.1(e).2. Fitted values of understanding of illness causality versus the child’s age (in months) by health status, for children of low (SES=3) and high (SES=2) socio-economic status.

What Conclusion Would You Craft To Accompany This Panel Of Plots?

S052/I.1(e) – Slide 14

S052/§I.1(e): Interpreting Findings
Generating Interesting Post-Hoc Questions About Detected Effects

[Plot: Predicted ILLCAUSE (vertical axis, 2 to 6) versus Age (Months) (horizontal axis, 60 to 200), with fitted lines for Healthy and Chronically Ill children.]

Whichever way you plot them and whatever stories you decide to tell … inspection of fitted plots often suggests interesting follow-up questions whose answers may provide a useful perspective in your written account!

e.g., because the ILL×AGE interaction is present in the fitted model, differences between healthy and chronically ill children in the predicted values of ILLCAUSE are greater among older children.

This does not mean, however, that there are statistically significant differences between healthy and ill children at every age!

For instance, perhaps children of average SES really start off their lives understanding illness at the same levels regardless of their health status (on average, in the population)?

Perhaps there are really no differences between healthy and ill children (of average SES) when they are young, at 60 months, say?

You can provide an answer to a “post-hoc” question like this by conducting a General Linear Hypothesis Test.
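The size of the fitted gap at any age follows directly from the fitted coefficients on ILL (0.030) and ILL×AGE (−0.006). The sketch below, with a hypothetical function name, computes that gap; it describes the fitted lines only and says nothing about statistical significance, which is what the GLH test is for.

```python
def fitted_gap(age, b_ill=0.030, b_ill_x_age=-0.006):
    """Fitted ILLCAUSE for chronically ill minus healthy children at a given age.

    SES cancels out of the difference because it has no interaction in the model.
    """
    return b_ill + b_ill_x_age * age


if __name__ == "__main__":
    for age in (60, 120, 200):
        print(f"AGE = {age:>3} months: ill minus healthy = {fitted_gap(age):+.3f}")
```

At 60 months the fitted gap is 0.030 − 0.006(60) = −0.330, and it grows in absolute size with age, which is the pattern visible in the plot.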

S052/I.1(e) – Slide 15

S052/§I.1(e): Interpreting Findings
Testing More Complex Post-Hoc Hypotheses: Distinguishing Structural and Stochastic Parts of the Hypothesized Regression Model

Let’s conduct a GLH test to determine whether the value of ILLCAUSE at age 60 months is the same for healthy and chronically-ill children, on average, in the population ...

To do this, we must first work with the hypothesized, not the fitted, final model to figure out the algebraic form of the null hypothesis that we must test in order to address our post-hoc question … we start, therefore, with:

$$M6:\quad ILLCAUSE_i = \beta_0 + \beta_1 ILL_i + \beta_2 AGE_i + \beta_3 (ILL_i \times AGE_i) + \beta_4 SES_i + \varepsilon_i$$

The terms from $\beta_0$ through $\beta_4 SES_i$ are called the Structural Part of the right hand side of the hypothesized regression model, because they contain our hypotheses about the structure of the underlying population process. We often write it out separately as:

$$E[ILLCAUSE_i] = \beta_0 + \beta_1 ILL_i + \beta_2 AGE_i + \beta_3 (ILL_i \times AGE_i) + \beta_4 SES_i$$

The residual $\varepsilon_i$ is called the Stochastic part of the hypothesized regression model, because it contains our hypotheses about the randomness in the underlying population.

If we are postulating and testing hypotheses about population process, we use the structural part of the model to express our hypotheses about the expected value of the outcome …

This “Expectation” notation – represented by E[…] – indicates that we are concerned with the average value of the outcome for different types of folk in the population, as designated by their ILL, AGE & SES values.

S052/I.1(e) – Slide 16

S052/§I.1(e): Interpreting Findings
Does the Understanding Of Illness Causality of Healthy Children Differ From That Of Chronically Ill Children, When They Are Young?

$$M6:\quad E[ILLCAUSE_i] = \beta_0 + \beta_1 ILL_i + \beta_2 AGE_i + \beta_3 (ILL_i \times AGE_i) + \beta_4 SES_i$$

We use the “expectation” form of the hypothesized regression model to figure out a null hypothesis that we can test, to address our post-hoc question about the possible equivalence of the values of ILLCAUSE for chronically-ill and healthy children at age 60 months, on average, in the population … as follows.

First, let’s substitute chosen prototypical values of the predictors into the structural part of the model to specify the particular groups of children whose values of ILLCAUSE we want to compare, on average, in the population … they are:

Healthy (ILL=0) children of avg. socio-economic status (SES=2.3) at age 60 months (AGE=60):

$$E[ILLCAUSE \mid ILL = 0;\ AGE = 60;\ SES = 2.3] = \beta_0 + \beta_1 (0) + \beta_2 (60) + \beta_3 (0 \times 60) + \beta_4 (2.3)$$

$$E[ILLCAUSE \mid ILL = 0;\ AGE = 60;\ SES = 2.3] = \beta_0 + 60\beta_2 + 2.3\beta_4$$

Chronically-ill (ILL=1) children of avg. socio-economic status (SES=2.3) at age 60 months (AGE=60):

$$E[ILLCAUSE \mid ILL = 1;\ AGE = 60;\ SES = 2.3] = \beta_0 + \beta_1 (1) + \beta_2 (60) + \beta_3 (1 \times 60) + \beta_4 (2.3)$$

$$E[ILLCAUSE \mid ILL = 1;\ AGE = 60;\ SES = 2.3] = \beta_0 + \beta_1 + 60\beta_2 + 60\beta_3 + 2.3\beta_4$$

These two expressions are the hypothesized population values of ILLCAUSE for the two types of prototypical children … let’s carry them forward …

Page 17: S052/ § I.1(e): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?

60 80 100 120 140 160 180 2002

3

4

5

6

Age (Months)

Pre

dict

ed I

LL

CA

USE

Healthy

Chronically Ill

© Willett, Harvard University Graduate School of Education, 04/19/2023

S052/I.1(e) – Slide 17

S052/§I.1(e): Interpreting Findings Does the Understanding Of Illness Causality of Healthy Children

Differ From That Of Chronically Ill Children, When They Are Young?

S052/§I.1(e): Interpreting Findings Does the Understanding Of Illness Causality of Healthy Children

Differ From That Of Chronically Ill Children, When They Are Young?

Chronically ill, 60 months, avg. SESChronically ill, 60 months, avg. SES

43210 3.26060of valuePop.

ILLCAUSE

Healthy, 60 months, avg. SESHealthy, 60 months, avg. SES

420 3.260of valuePop.

ILLCAUSE

3143

210

4

20 603.260

603.260

So, the difference in the population values of ILLCAUSE between the chronically ill and

healthy groups at age=60 and avg. SES

And here’s the focus of our GLH test … we must test:And here’s the focus of our GLH test … we must test:

060or

060 31

AGEILLILL0

0

:H

:H
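The substitution above can be checked numerically. The following is a minimal Python sketch (Python rather than the handout's PC-SAS), assuming the full model's structural part has the form $\beta_0 + \beta_1 ILL + \beta_2 AGE + \beta_3 (ILL \times AGE) + \beta_4 SES$. The coefficient values are hypothetical, chosen only to illustrate the algebra; they are not estimates from the course data.

```python
# Hypothetical coefficient values (NOT fitted estimates), for illustration only.
beta = {"b0": 2.0, "b1": 0.5, "b2": 0.02, "b3": -0.01, "b4": -0.1}
AVG_SES = 3.2  # average SES, as read from the slide


def expected_illcause(ill, age, ses, b=beta):
    """Structural part of the full model: main effects plus ILL x AGE."""
    return (b["b0"] + b["b1"] * ill + b["b2"] * age
            + b["b3"] * ill * age + b["b4"] * ses)


# Prototypical children: both 60 months old and of average SES.
chronically_ill = expected_illcause(ill=1, age=60, ses=AVG_SES)
healthy = expected_illcause(ill=0, age=60, ses=AVG_SES)

# The difference should collapse to beta1 + 60*beta3, whatever the other betas are.
difference = chronically_ill - healthy
contrast = beta["b1"] + 60 * beta["b3"]

print(f"difference = {difference:.4f}, beta1 + 60*beta3 = {contrast:.4f}")
```

Changing `b0`, `b2`, or `b4` leaves `difference` unchanged, which is why the null hypothesis involves only $\beta_1$ and $\beta_3$.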


While this appears complex, it can actually be handled easily by the General Linear Hypothesis testing strategy introduced earlier … the new test is carried out in Data-Analytic Handout I.1(e).2, as follows:

*-----------------------------------------------------------*
  Use a GLH test to address an interesting post-hoc question
*-----------------------------------------------------------*;
* Here's the final "full" model, yet again;
PROC REG DATA=ILLCAUSE;
  VAR ILLCAUSE ILL AGE SES;
  M6F: MODEL ILLCAUSE = ILL AGE ILLxAGE SES;
  * Here's the post-hoc test;
  TEST ILL + 60*ILLxAGE = 0;


Test: Results for Dependent Variable ILLCAUSE

                          Mean
Source          DF      Square    F Value    Pr > F
Numerator        1     1.41407       4.06    0.0452
Denominator    189     0.34803

Since p < .05, reject $H_0: \beta_{ILL} + 60\beta_{ILL \times AGE} = 0$.

Conclude that the understanding of illness causality of chronically ill and healthy children, of average SES, is not the same at age 60 months, on average, in the population.


As with any GLH strategy, you can always do the same test by hand – because it’s all about comparing the fit of competing models … and, even though you need not do it by hand, it is worth knowing how!

The Challenge Is Figuring Out What Should Serve as the Reduced Model for this Post-Hoc Test!


Here’s our Full Model:

$ILLCAUSE_i = \beta_0 + \beta_1 ILL_i + \beta_2 AGE_i + \beta_3 (ILL_i \times AGE_i) + \beta_4 SES_i + \varepsilon_i$

And here’s the Null Hypothesis we want to test:

$H_0: \beta_{ILL} + 60\beta_{ILL \times AGE} = 0$ (that is, $\beta_1 + 60\beta_3 = 0$)

Since we know that the Reduced Model is just the Full Model with the Null Hypothesis forced into it as a constraint, why don’t we just go ahead and “force” the null hypothesis into the full model? You can see what this means by writing out the Full Model with the exact combination of parameters specified in the null hypothesis, $\beta_{ILL} + 60\beta_{ILL \times AGE}$, forced to be zero within the model.

How is this done? By setting $\beta_{ILL}$ equal to $-60\beta_{ILL \times AGE}$ in the model, like this …

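Reduced Model (here $\beta_3$ is $\beta_{ILL \times AGE}$, and $\beta_{ILL}$ has been replaced by $-60\beta_3$):

$ILLCAUSE_i = \beta_0 + (-60\beta_3) ILL_i + \beta_2 AGE_i + \beta_3 (ILL_i \times AGE_i) + \beta_4 SES_i + \varepsilon_i$
$ILLCAUSE_i = \beta_0 + \beta_2 AGE_i + \beta_3 (ILL_i \times AGE_i) - 60\beta_3 ILL_i + \beta_4 SES_i + \varepsilon_i$
$ILLCAUSE_i = \beta_0 + \beta_2 AGE_i + \beta_3 [(ILL_i \times AGE_i) - 60 ILL_i] + \beta_4 SES_i + \varepsilon_i$

This tells us that, if we go into the dataset, form a new composite predictor whose values are equal to “ILL times AGE, minus 60 times ILL” for each child, and then regress ILLCAUSE on (a) AGE, (b) the new predictor, and (c) SES, we will have fitted the requisite reduced model … and can conduct the GLH test by hand!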
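To see why the composite predictor does the job, here is a small Python sketch (an illustration only, not part of the handout). It assumes the full and reduced model forms written out above and uses hypothetical coefficient values. With $\beta_{ILL}$ set to $-60\beta_{ILL \times AGE}$, the full model and the composite-predictor model should give identical predictions for any child.

```python
def full_model(ill, age, ses, b0, b1, b2, b3, b4):
    """Structural part of the full model."""
    return b0 + b1 * ill + b2 * age + b3 * (ill * age) + b4 * ses


def reduced_model(ill, age, ses, b0, b2, b3, b4):
    """Reduced model: ILL and ILL x AGE enter only through the composite."""
    compvar = ill * age - 60 * ill  # same definition as COMPVAR
    return b0 + b2 * age + b3 * compvar + b4 * ses


# Hypothetical coefficients (NOT fitted estimates).
b0, b2, b3, b4 = 1.9, 0.022, -0.010, -0.13
b1 = -60 * b3  # the null hypothesis, imposed as a constraint

# A few hypothetical children: (ILL, AGE in months, SES).
children = [(0, 60, 3.2), (1, 60, 3.2), (1, 120, 2.0), (0, 150, 4.0)]

max_gap = max(
    abs(full_model(ill, age, ses, b0, b1, b2, b3, b4)
        - reduced_model(ill, age, ses, b0, b2, b3, b4))
    for ill, age, ses in children
)

# Under the constraint, ill and healthy children coincide at 60 months.
gap_at_60 = (reduced_model(1, 60, 3.2, b0, b2, b3, b4)
             - reduced_model(0, 60, 3.2, b0, b2, b3, b4))

print(f"largest full-vs-reduced gap: {max_gap:.2e}")
print(f"ill minus healthy at 60 months: {gap_at_60:.2e}")
```

The second printed value is zero because the composite equals zero for every child at AGE = 60, which is exactly what the null hypothesis asserts.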


Here’s the corresponding PC-SAS code:

* Create composite predictor, COMPVAR, to include in reduced model;
DATA ILLCAUSE;
  SET ILLCAUSE;
  COMPVAR = ILLxAGE-(60*ILL);
* Fit the reduced model;
PROC REG DATA=ILLCAUSE;
  VAR ILLCAUSE ILL AGE SES;
  M6R: MODEL ILLCAUSE = AGE COMPVAR SES;

Here, I create the new composite predictor, COMPVAR.

And add it, along with AGE and SES, as a predictor of ILLCAUSE in order to obtain the fitted Reduced Model.

Here’s the corresponding regression output:

Analysis of Variance

                              Sum of        Mean
Source             DF        Squares      Square    F Value    Pr > F
Model               3      134.35548    44.78516     126.64    <.0001
Error             190       67.19166     0.35364
Corrected Total   193      201.54714

Parameter Estimates

                  Parameter    Standard
Variable    DF     Estimate       Error    t Value    Pr > |t|
Intercept    1      1.91296     0.18774      10.19      <.0001
AGE          1      0.02204     0.00119      18.58      <.0001
COMPVAR      1     -0.01000     0.00115      -8.70      <.0001
SES          1     -0.12882     0.04878      -2.64      0.0090

The critical fit statistic from the Reduced Model is SSModel = 134.3555.


And the GLH test can be carried out by the comparison of fits of full and reduced models, as usual …

Model   Role of Model   Predictors in Model        SSModel    Change in SSModel   dfModel   Change in dfModel
M6F     Full            ILL, AGE, ILL×AGE, SES     135.7695                       4
M6R     Reduced         AGE, COMPVAR, SES          134.3555   1.4140              3         1

Constraint imposed to force the Full Model to become the Reduced Model: $\beta_{ILL} + 60\beta_{ILL \times AGE} = 0$

The constraint that was imposed to make the full model become the reduced model is actually a statement of the null hypothesis being tested.

Key Question: Is losing 1.414 units of fit from SSModel worth gaining 1 extra degree of freedom?

We can check whether the difference is “statistically significant” by converting these differences in SSModel and dfModel into an F-statistic, as usual:

$F_{observed} = \frac{(\text{Change in } SS_{Model}) / (\text{Change in } df_{Model})}{\text{Residual Variance in Full Model}} = \frac{1.414 / 1}{0.348} = 4.06$

This is the observed F-statistic provided by the GLH test.

And because $F_{observed}$ is larger than the critical value, $F_{critical} = F_{1,189}(\alpha = .05) = 3.89$, we can reject $H_0: \beta_{ILL} + 60\beta_{ILL \times AGE} = 0$.

This is the critical F-statistic implicit in the GLH test. The “denominator” df are those of the residual variance in the full model.
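The by-hand arithmetic can be reproduced in a few lines. This is a minimal Python sketch (not the handout's PC-SAS) that uses only numbers quoted on the slides: the two SSModel values, the model degrees of freedom, the residual variance of the full model (the denominator mean square in the TEST output), and the quoted critical value. The variable names are illustrative.

```python
# Values quoted on the slides.
ss_model_full = 135.7695         # SSModel, full model M6F
ss_model_reduced = 134.3555      # SSModel, reduced model M6R
df_model_full = 4
df_model_reduced = 3
residual_variance_full = 0.34803  # denominator mean square, 189 df
f_critical = 3.89                 # F(1, 189) at alpha = .05, as quoted

# Fit lost and degrees of freedom gained by imposing the constraint.
change_in_ss = ss_model_full - ss_model_reduced
change_in_df = df_model_full - df_model_reduced

# Observed F-statistic for the general linear hypothesis test.
f_observed = (change_in_ss / change_in_df) / residual_variance_full
reject_null = f_observed > f_critical

print(f"Change in SSModel = {change_in_ss:.4f}")
print(f"F observed        = {f_observed:.2f}")
print(f"Reject H0?          {reject_null}")
```

The result agrees, to two decimal places, with the F value of 4.06 reported by the TEST statement in PROC REG.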


S052/§I.1(e): Interpreting Findings Appendix 1: Understanding the Main Effects of Multiple Dummies

For pedagogic simplicity, let’s consider a model that is not the final model in the taxonomy, Model M2, and ask: How can you get three fitted lines from a fitted regression model that contains only the main effects of two health status dummies, D & A (and the predictor AGE)? You proceed as follows:

M2: $\widehat{ILLCAUSE} = 2.389 - 0.946D - 0.897A + 0.017AGE$

Estimated intercept tells you the average predicted ILLCAUSE for healthy children.

Estimated slopes associated with the health status dummies describe the differences in average predicted ILLCAUSE between diabetic & healthy children and between asthmatic & healthy children, respectively.

It’s easy to recover three fitted lines that describe this relationship. You just substitute appropriate values for the health status dummies, as follows:
For prototypical Healthy children, substitute D = 0 and A = 0.
For prototypical Diabetic children, substitute D = 1 and A = 0.
For prototypical Asthmatic children, substitute D = 0 and A = 1.


Substitute the prototypical values of D and A into the fitted model:

M2: $\widehat{ILLCAUSE} = 2.389 - 0.946D - 0.897A + 0.017AGE$

Healthy Child (D=0; A=0):
$\widehat{ILLCAUSE} \mid Healthy = 2.389 - 0.946(0) - 0.897(0) + 0.017AGE$
$\widehat{ILLCAUSE} \mid Healthy = 2.389 + 0.017AGE$

Diabetic Child (D=1; A=0):
$\widehat{ILLCAUSE} \mid Diabetic = 2.389 - 0.946(1) - 0.897(0) + 0.017AGE$
$\widehat{ILLCAUSE} \mid Diabetic = 2.389 - 0.946 + 0.017AGE$
$\widehat{ILLCAUSE} \mid Diabetic = 1.443 + 0.017AGE$

Asthmatic Child (D=0; A=1):
$\widehat{ILLCAUSE} \mid Asthmatic = 2.389 - 0.946(0) - 0.897(1) + 0.017AGE$
$\widehat{ILLCAUSE} \mid Asthmatic = 2.389 - 0.897 + 0.017AGE$
$\widehat{ILLCAUSE} \mid Asthmatic = 1.492 + 0.017AGE$

Plot these!
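The substitution for the main-effects model can be sketched in Python as follows. The coefficients are those of fitted Model M2 as read from the slides, with signs inferred from the simplified lines (for example, 2.389 - 0.946 = 1.443); the function and variable names are illustrative assumptions.

```python
# Fitted Model M2 (main effects only), as reconstructed from the slides.
M2 = {"intercept": 2.389, "D": -0.946, "A": -0.897, "AGE": 0.017}

# Prototypical values of the health status dummies (D, A).
GROUPS = {"Healthy": (0, 0), "Diabetic": (1, 0), "Asthmatic": (0, 1)}


def fitted_line_m2(d, a, model=M2):
    """Return (intercept, AGE slope) of the fitted line for one group."""
    intercept = model["intercept"] + model["D"] * d + model["A"] * a
    slope = model["AGE"]  # no interaction, so the slope is shared
    return intercept, slope


lines_m2 = {name: fitted_line_m2(d, a) for name, (d, a) in GROUPS.items()}

for name, (intercept, slope) in lines_m2.items():
    at_64 = intercept + slope * 64  # predicted value at 64 months
    print(f"{name:<9} {intercept:.3f} + {slope:.3f} AGE   (AGE=64: {at_64:.3f})")
```

The three lines share one AGE slope and differ only in intercept, so they plot as parallel lines.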


Figure A1.1. Fitted values of the understanding of illness causality versus the child's chronological age, by health status, from Model M2 of the fitted taxonomy of models.

[Plot: predicted ILLCAUSE (vertical axis, 1 to 6) versus Age (Months) (horizontal axis, 50 to 210), with fitted lines for Healthy, Diabetic, and Asthmatic children.]


You can obtain a similar explanation by substituting prototypical values of D and A into the population model …

M2: $E\{ILLCAUSE\} = \beta_0 + \beta_D D + \beta_A A + \beta_{AGE} AGE$

Healthy Child (D=0; A=0):
$E\{ILLCAUSE \mid Healthy\} = \beta_0 + \beta_D(0) + \beta_A(0) + \beta_{AGE} AGE$
$E\{ILLCAUSE \mid Healthy\} = \beta_0 + \beta_{AGE} AGE$

Diabetic Child (D=1; A=0):
$E\{ILLCAUSE \mid Diabetic\} = \beta_0 + \beta_D(1) + \beta_A(0) + \beta_{AGE} AGE$
$E\{ILLCAUSE \mid Diabetic\} = \beta_0 + \beta_D + \beta_{AGE} AGE$
$E\{ILLCAUSE \mid Diabetic\} = (\beta_0 + \beta_D) + \beta_{AGE} AGE$

Asthmatic Child (D=0; A=1):
$E\{ILLCAUSE \mid Asthmatic\} = \beta_0 + \beta_D(0) + \beta_A(1) + \beta_{AGE} AGE$
$E\{ILLCAUSE \mid Asthmatic\} = \beta_0 + \beta_A + \beta_{AGE} AGE$
$E\{ILLCAUSE \mid Asthmatic\} = (\beta_0 + \beta_A) + \beta_{AGE} AGE$

Intercepts differ by the main effects of the health status dummies.

AGE-slopes are identical in all three population models.


S052/§I.1(e): Interpreting Findings Appendix 2: Understanding Two-Way Interactions With Multiple Dummies

For pedagogic simplicity, let’s consider a model that is not the final model in the taxonomy, Model M3, and ask: How do you get three fitted lines with different slopes, when there are only interactions between two health status dummies, D & A, and predictor AGE in the fitted model? You proceed as follows:

M3: $\widehat{ILLCAUSE} = 1.978 + 0.098D - 0.141A + 0.020AGE - 0.008(D \times AGE) - 0.006(A \times AGE)$

It’s easy to recover the three fitted lines that describe this interaction, by substituting appropriate values for the health status dummies, as follows:
For prototypical Healthy children, set D=0 and A=0.
For prototypical Diabetic children, set D=1 and A=0.
For prototypical Asthmatic children, set D=0 and A=1.

Presence of this two-way HEALTH by AGE interaction ensures that:
Either the relationship between ILLCAUSE and HEALTH differs by AGE,
Or the relationship between ILLCAUSE and AGE differs by HEALTH.


Substitute the prototypical values of D and A into the fitted model:

M3: $\widehat{ILLCAUSE} = 1.978 + 0.098D - 0.141A + 0.020AGE - 0.008(D \times AGE) - 0.006(A \times AGE)$

Healthy Child:
$(\widehat{ILLCAUSE} \mid Healthy) = 1.978 + 0.098(0) - 0.141(0) + 0.020AGE - 0.008(0)AGE - 0.006(0)AGE$
$(\widehat{ILLCAUSE} \mid Healthy) = 1.978 + 0.020AGE$

Diabetic Child:
$(\widehat{ILLCAUSE} \mid Diabetic) = 1.978 + 0.098(1) - 0.141(0) + 0.020AGE - 0.008(1)AGE - 0.006(0)AGE$
$(\widehat{ILLCAUSE} \mid Diabetic) = 1.978 + 0.098 + 0.020AGE - 0.008AGE$
$(\widehat{ILLCAUSE} \mid Diabetic) = 2.076 + 0.012AGE$

Asthmatic Child:
$(\widehat{ILLCAUSE} \mid Asthmatic) = 1.978 + 0.098(0) - 0.141(1) + 0.020AGE - 0.008(0)AGE - 0.006(1)AGE$
$(\widehat{ILLCAUSE} \mid Asthmatic) = 1.978 - 0.141 + 0.020AGE - 0.006AGE$
$(\widehat{ILLCAUSE} \mid Asthmatic) = 1.837 + 0.014AGE$

Plot these!
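With interactions in the model, each group gets its own slope as well as its own intercept. The Python sketch below illustrates this, assuming the M3 coefficients as read from the slides (signs inferred from the simplified lines, for example 0.020 - 0.008 = 0.012); names are illustrative.

```python
# Fitted Model M3 (main effects plus health-by-AGE interactions),
# as reconstructed from the slides.
M3 = {
    "intercept": 1.978, "D": 0.098, "A": -0.141,
    "AGE": 0.020, "DxAGE": -0.008, "AxAGE": -0.006,
}

GROUPS = {"Healthy": (0, 0), "Diabetic": (1, 0), "Asthmatic": (0, 1)}


def fitted_line_m3(d, a, model=M3):
    """Return (intercept, AGE slope) of the fitted line for one group."""
    # Main effects of the dummies shift the intercept ...
    intercept = model["intercept"] + model["D"] * d + model["A"] * a
    # ... and the interaction terms shift the AGE slope.
    slope = model["AGE"] + model["DxAGE"] * d + model["AxAGE"] * a
    return intercept, slope


lines_m3 = {name: fitted_line_m3(d, a) for name, (d, a) in GROUPS.items()}

for name, (intercept, slope) in lines_m3.items():
    at_64 = intercept + slope * 64  # predicted value at 64 months
    print(f"{name:<9} {intercept:.3f} + {slope:.3f} AGE   (AGE=64: {at_64:.3f})")
```

Because the slopes now differ by group, the fitted lines are no longer parallel.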


Figure A2.1. Fitted values of the understanding of illness causality versus chronological age, by health status, from Model M3 of the fitted taxonomy of models.

[Plot: predicted ILLCAUSE (vertical axis, 1 to 6) versus Age (Months) (horizontal axis, 50 to 210), with fitted lines for Healthy, Diabetic, and Asthmatic children.]


A similar explanation can be obtained by substituting prototypical values of D and A into the population model …

M3: $E\{ILLCAUSE\} = \beta_0 + \beta_D D + \beta_A A + \beta_{AGE} AGE + \beta_{D \times AGE}(D \times AGE) + \beta_{A \times AGE}(A \times AGE)$

Healthy Child (D=0; A=0):
$E\{ILLCAUSE \mid Healthy\} = \beta_0 + \beta_D(0) + \beta_A(0) + \beta_{AGE} AGE + \beta_{D \times AGE}(0)AGE + \beta_{A \times AGE}(0)AGE$
$E\{ILLCAUSE \mid Healthy\} = \beta_0 + \beta_{AGE} AGE$

Diabetic Child (D=1; A=0):
$E\{ILLCAUSE \mid Diabetic\} = \beta_0 + \beta_D(1) + \beta_A(0) + \beta_{AGE} AGE + \beta_{D \times AGE}(1)AGE + \beta_{A \times AGE}(0)AGE$
$E\{ILLCAUSE \mid Diabetic\} = \beta_0 + \beta_D + \beta_{AGE} AGE + \beta_{D \times AGE} AGE$
$E\{ILLCAUSE \mid Diabetic\} = (\beta_0 + \beta_D) + (\beta_{AGE} + \beta_{D \times AGE}) AGE$

Asthmatic Child (D=0; A=1):
$E\{ILLCAUSE \mid Asthmatic\} = \beta_0 + \beta_D(0) + \beta_A(1) + \beta_{AGE} AGE + \beta_{D \times AGE}(0)AGE + \beta_{A \times AGE}(1)AGE$
$E\{ILLCAUSE \mid Asthmatic\} = \beta_0 + \beta_A + \beta_{AGE} AGE + \beta_{A \times AGE} AGE$
$E\{ILLCAUSE \mid Asthmatic\} = (\beta_0 + \beta_A) + (\beta_{AGE} + \beta_{A \times AGE}) AGE$

Intercepts differ by the main effects of the health status dummies.

AGE-slopes differ by the effects of the interactions between AGE and the health status dummies.


S052/§I.1(e): Interpreting Findings Appendix 3: What Happens to the Main Effects When Two-Way Interactions Are Included?

For pedagogic simplicity, let’s consider models that are not the final models in the taxonomy, Models M2 and M3, and ask: Why does the interpretation of the main effects differ when an interaction term is added to the model?

M2: $\widehat{ILLCAUSE} = 2.389 - 0.946D - 0.897A + 0.017AGE$

M3: $\widehat{ILLCAUSE} = 1.978 + 0.098D - 0.141A + 0.020AGE - 0.008(D \times AGE) - 0.006(A \times AGE)$

It’s pretty easy to see what’s going on, if you plot prototypical trajectories for both models.

Notice that the magnitudes and directions of the main effects of health status differ when the health status by AGE interaction is added to convert Model M2 into Model M3!

What does this mean, or imply?


[Figure: two panels, “Interaction Model” (M3) and “Main Effects Model” (M2), each plotting predicted ILLCAUSE (vertical axis, 1 to 6) versus Age (Months) (horizontal axis, 0 to 210).]

Differences in the main effects of health status between M2 and M3 reflect the differences in intercept that occur as a result of the fitted lines “tilting” due to the presence of the Health by Age interaction.

Situation in the middle of the data remains pretty much the same!
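The “tilting” can be made concrete by computing the fitted gap between each ill group and healthy children at several ages under both models. This Python sketch uses the M2 and M3 coefficients as read from the slides; taking 120 months as “the middle of the data” is an assumption based on the plotted age range, and the function names are illustrative.

```python
# Health-status coefficients from the two fitted models (from the slides).
M2 = {"D": -0.946, "A": -0.897}
M3 = {"D": 0.098, "A": -0.141, "DxAGE": -0.008, "AxAGE": -0.006}


def gap_m2(group):
    """Fitted difference from healthy children under M2 (same at every age)."""
    return M2[group]


def gap_m3(group, age):
    """Fitted difference from healthy children under M3 at a given age."""
    return M3[group] + M3[group + "xAGE"] * age


for age in (0, 60, 120, 180):
    print(f"AGE={age:>3}  "
          f"Diabetic: M2 {gap_m2('D'):+.3f}, M3 {gap_m3('D', age):+.3f}   "
          f"Asthmatic: M2 {gap_m2('A'):+.3f}, M3 {gap_m3('A', age):+.3f}")
```

At AGE = 0, where the main effects are read off as intercept differences, the two models disagree sharply; near 120 months the M3 gaps are close to the constant M2 gaps.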
