Curvilinear 2 Modeling Departures from the Straight Line (Curves and Interactions)



Skill Set

• How does polynomial regression test for quadratic and cubic trends?

• What are orthogonal polynomials? When can they be used? Describe an advantage of using orthogonal polynomials over simple polynomial regression.

• Suppose we have one IV and we analyze this IV twice, once through linear regression and once as a categorical variable. What does the test for the difference in R-square between the two tell us? What doesn’t it tell us? That is, if the result is significant, what is left to do?


More skills

• Why is collinearity likely to be a problem in using polynomial regression?

• Describe the sequence of tests used to model curves in polynomial regression.

• How do you model interactions of continuous variables with regression?

• What is the difference between a moderator and a mediator? How do you test for the presence of each?


Nonlinear Trends in Experimental Research

• Suppose we go to Busch Gardens and measure reactions to a roller coaster as a function of time.

• We ask for excitement ratings (1 to 10 scale) either immediately after the ride or at 5, 10, or 15 minutes after.


Roller Coaster Ratings

Rating (DV)   Time (contin. IV)   V1   V2   V3   Stats
10            0                   1    0    0
9             0                   1    0    0
10            0                   1    0    0
8             0                   1    0    0    M = 9.2
9             0                   1    0    0    SD = .84
8             5                   0    1    0
7             5                   0    1    0
7             5                   0    1    0
8             5                   0    1    0    M = 7.8
9             5                   0    1    0    SD = .84
7             10                  0    0    1
6             10                  0    0    1
8             10                  0    0    1
5             10                  0    0    1    M = 6.6
7             10                  0    0    1    SD = 1.14
5             15                  0    0    0
6             15                  0    0    0
7             15                  0    0    0
7             15                  0    0    0    M = 6.6
8             15                  0    0    0    SD = 1.14

Note that the IV is represented in two ways: (a) as a continuous IV, and (b) as a dummy-coded (three-vector) categorical IV, with 15 minutes as the reference group.

Excitement as a function of time.


SAS boxplots of roller coaster data.


The GLM Procedure
Dependent Variable: rating

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1      20.25000000   20.25000000     19.49   0.0003
Error             18      18.70000000    1.03888889
Corrected Total   19      38.95000000

R-Square   Coeff Var   Root MSE   rating Mean
0.519897    13.50012   1.019259      7.550000

Source   DF   Type I SS     Mean Square   F Value   Pr > F
time      1   20.25000000   20.25000000     19.49   0.0003

Source   DF   Type III SS   Mean Square   F Value   Pr > F
time      1   20.25000000   20.25000000     19.49   0.0003

Parameter   Estimate       Standard Error   t Value   Pr > |t|
Intercept    8.900000000       0.38137179     23.34     <.0001
time        -0.180000000       0.04077036     -4.41     0.0003

Roller coaster data analysis with time as a continuous IV (note the R-square).


The GLM Procedure
Dependent Variable: rating

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3      22.95000000    7.65000000      7.65   0.0022
Error             16      16.00000000    1.00000000
Corrected Total   19      38.95000000

R-Square   Coeff Var   Root MSE   rating Mean
0.589217    13.24503    1.00000      7.550000

Source   DF   Type I SS     Mean Square   F Value   Pr > F
v1        1   18.15000000   18.15000000     18.15   0.0006
v2        1    4.80000000    4.80000000      4.80   0.0436
v3        1    0.00000000    0.00000000      0.00   1.0000

Source   DF   Type III SS   Mean Square   F Value   Pr > F
v1        1   16.90000000   16.90000000     16.90   0.0008
v2        1    3.60000000    3.60000000      3.60   0.0760
v3        1    0.00000000    0.00000000      0.00   1.0000

Parameter   Estimate       Standard Error   t Value   Pr > |t|
Intercept    6.600000000       0.44721360     14.76     <.0001
v1           2.600000000       0.63245553      4.11     0.0008
v2           1.200000000       0.63245553      1.90     0.0760
v3           0.000000000       0.63245553      0.00     1.0000

Analysis with time as a categorical IV (note the R-square).
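As a cross-check on the categorical analysis, here is a minimal pure-Python sketch (the slides use SAS; this is an illustration, not the original analysis). It fits the dummy-coded model by ordinary least squares; `ols` and `r_squared` are hypothetical helper names written for this example. With 15 minutes as the reference group, the intercept should equal that group's mean (6.6) and each b weight the difference between a group mean and 6.6.

```python
# Coaster ratings by minutes after the ride (five riders per time)
TIMES = [0, 5, 10, 15]
RATINGS = {0: [10, 9, 10, 8, 9], 5: [8, 7, 7, 8, 9],
           10: [7, 6, 8, 5, 7], 15: [5, 6, 7, 7, 8]}


def ols(X, y):
    """Solve the normal equations (X'X)b = X'y by Gauss-Jordan elimination.
    Each row of X must start with 1.0 for the intercept."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * e for a, e in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def r_squared(X, y, b):
    """Proportion of variance in y accounted for by the fitted values."""
    mean = sum(y) / len(y)
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    ss_total = sum((yi - mean) ** 2 for yi in y)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1 - ss_error / ss_total


time = [t for t in TIMES for _ in RATINGS[t]]
y = [r for t in TIMES for r in RATINGS[t]]
# Dummy vectors V1, V2, V3 flag 0, 5 and 10 minutes; 15 minutes is the reference
X = [[1.0, float(t == 0), float(t == 5), float(t == 10)] for t in time]
b = ols(X, y)
print("intercept, v1, v2, v3:", [round(v, 4) for v in b])
print("R-square:", round(r_squared(X, y, b), 6))
```

Dummy coding of this kind reproduces the four group means exactly, which is why its R-square is the ceiling that any curve fit to these four levels can reach.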


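Testing for Curves

Compare the R-square values. Linear: .51987. Categorical: .5892.

$$F = \frac{(R^2_{y.123} - R^2_{y.x})/(k_1 - k_2)}{(1 - R^2_{y.123})/(N - k_1 - 1)} = \frac{(.5892 - .51987)/(3 - 1)}{(1 - .5892)/(20 - 3 - 1)} = \frac{.034665}{.025675} = 1.35$$

Here $R^2_{y.123}$ comes from the three dummy vectors ($k_1 = 3$), $R^2_{y.x}$ comes from time as a single continuous IV ($k_2 = 1$), and $N = 20$.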

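The critical value of F(2, 16) at alpha = .05 is 3.63, so the observed F of 1.35 is n.s.

If significant, the F test indicates a departure from linearity, but not where or how the line bends. With M levels of the IV you can fit trends up to degree M - 1, that is, up to M - 2 bends (quadratic is one bend, cubic is two).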
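The arithmetic of the R-square change test can be wrapped in a small function. This is a sketch; `r2_change_f` is a hypothetical name, and the critical value 3.63 is the tabled value quoted on the slide rather than something the code computes.

```python
def r2_change_f(r2_full, r2_reduced, k_full, k_reduced, n):
    """F test for the increase in R-square from a reduced model (k_reduced
    predictors) to a full model (k_full predictors) fit to the same n cases."""
    df1 = k_full - k_reduced
    df2 = n - k_full - 1
    f = ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)
    return f, df1, df2


# Values from the SAS output: categorical (3 dummy vectors) vs. linear time
f, df1, df2 = r2_change_f(0.589217, 0.519897, k_full=3, k_reduced=1, n=20)
critical = 3.63  # tabled F(2, 16) at alpha = .05, as quoted on the slide
print(f"F({df1}, {df2}) = {f:.2f}; significant: {f > critical}")
# prints: F(2, 16) = 1.35; significant: False
```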


Review

Suppose we have one IV and we analyze this IV twice, once through linear regression and once as a categorical variable. What does the test for the difference in R-square between the two tell us? What doesn’t it tell us? That is, if the result is significant, what is left to do?


Orthogonal Polynomials

Sometimes orthogonal polynomials can be used to analyze experimental data to test for curves. Two restrictive assumptions must be met to use them: (1) equal spacing of the IV levels, and (2) equal numbers of observations (people) in each cell, as in the coaster data (0, 5, 10, and 15 minutes, five riders each). Orthogonal polynomials are special sets of coefficients that test for bends but remain uncorrelated with one another. This gives them an advantage in statistical power and in ease of interpretation.


Orthogonal Poly

Rating (DV)   Time (contin. IV)   Linear   Quadratic   Cubic   Stats
10            0                   -3        1          -1
9             0                   -3        1          -1
10            0                   -3        1          -1
8             0                   -3        1          -1      M = 9.2
9             0                   -3        1          -1      SD = .84
8             5                   -1       -1           3
7             5                   -1       -1           3
7             5                   -1       -1           3
8             5                   -1       -1           3      M = 7.8
9             5                   -1       -1           3      SD = .84
7             10                   1       -1          -3
6             10                   1       -1          -3
8             10                   1       -1          -3
5             10                   1       -1          -3      M = 6.6
7             10                   1       -1          -3      SD = 1.14
5             15                   3        1           1
6             15                   3        1           1
7             15                   3        1           1
7             15                   3        1           1      M = 6.6
8             15                   3        1           1      SD = 1.14

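Coaster data with orthogonal polynomial vectors. Note the pattern in the vectors: each change in direction of the coefficients marks a bend (none for linear, one for quadratic, two for cubic). Orthogonal polynomial coefficients can be looked up in a table.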


Orthogonal Poly Table

Three levels of the IV:

Polynomial   X = 1    2    3
Linear          -1    0    1
Quadratic        1   -2    1

Four levels of the IV:

Polynomial   X = 1    2    3    4
Linear          -3   -1    1    3
Quadratic        1   -1   -1    1
Cubic           -1    3   -3    1

Note. Rows in the table become columns (vectors) in the data; columns in the table represent levels of the IV.


Correlations Among Vectors

r                  Time   Excite (Rating)   Linear   Quad   Cubic
Time               1
Excite (Rating)    -.72   1
Linear             1.00   -.72              1
Quad               .00    .25               .00      1
Cubic              .00    .08               .00      .00    1

These are correlations among the vectors for the coaster data. Time in minutes since leaving the coaster correlated -.72 with excitement ratings. Time correlates 1.0 with the linear trend. Note that the Linear, Quadratic and Cubic vectors are uncorrelated (orthogonal).
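A short pure-Python sketch that rebuilds the linear, quadratic, and cubic vectors from the orthogonal polynomial table and recomputes the correlations in this matrix. It is an illustration only; `pearson` is a hand-written helper, not a library function.

```python
from math import sqrt

# Coaster ratings by minutes after the ride (five riders per time)
TIMES = [0, 5, 10, 15]
RATINGS = {0: [10, 9, 10, 8, 9], 5: [8, 7, 7, 8, 9],
           10: [7, 6, 8, 5, 7], 15: [5, 6, 7, 7, 8]}
# Orthogonal polynomial coefficients for four equally spaced levels
LINEAR = {0: -3, 5: -1, 10: 1, 15: 3}
QUAD = {0: 1, 5: -1, 10: -1, 15: 1}
CUBIC = {0: -1, 5: 3, 10: -3, 15: 1}


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


time = [t for t in TIMES for _ in RATINGS[t]]
excite = [r for t in TIMES for r in RATINGS[t]]
# Each person gets the coefficient for his or her time group
vectors = {"Linear": [LINEAR[t] for t in time],
           "Quad": [QUAD[t] for t in time],
           "Cubic": [CUBIC[t] for t in time]}

print(f"r(Time, Linear) = {pearson(time, vectors['Linear']):.2f}")
for name, v in vectors.items():
    print(f"r(Excite, {name}) = {pearson(excite, v):.2f}")
names = list(vectors)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"r({a}, {b}) = {pearson(vectors[a], vectors[b]):.2f}")
```

The zero correlations among the vectors depend on both assumptions stated earlier: equal spacing (so the table coefficients apply) and equal numbers per cell (so the coefficient columns stay uncorrelated once repeated across people).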


Regression with Orthogonal Polynomials

Source      df   Estimate   Type I & Type III SS   F       p
Intercept         7.55
Linear       1    -.45      20.25                  20.25   .0004
Quad         1     .35       2.45                   2.45   .1371
Cubic        1     .05       0.25                   0.25   .6239

Note that R-square for the model using orthogonal polynomials is the same as that using the dummy vectors (.589). The F for the linear component is larger using orthogonal polynomials (20.25) than it was for the simple linear regression (19.49) because the error term is smaller (mean square error 1.00 rather than 1.04) once the quadratic and cubic terms are taken out of it. Orthogonal polynomials provide a powerful test of effects. They can also be used to graph results to show bends.
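Because the vectors are orthogonal and every group has five riders, each row of this table can be computed from the group means alone: b = Σcȳ / Σc² and SS = n(Σcȳ)² / Σc². The sketch below assumes those two conditions and takes the error mean square (1.00) from the dummy-coded SAS output; it is an illustration, not the GLM computation itself.

```python
# Group means at 0, 5, 10 and 15 minutes (five riders per group)
MEANS = [9.2, 7.8, 6.6, 6.6]
N_PER_GROUP = 5
MS_ERROR = 1.0  # within-groups mean square (16.0 / 16 df) from the SAS output
CONTRASTS = {"Linear": [-3, -1, 1, 3],
             "Quad": [1, -1, -1, 1],
             "Cubic": [-1, 3, -3, 1]}

results = {}
for name, c in CONTRASTS.items():
    dot = sum(ci * mi for ci, mi in zip(c, MEANS))  # sum of c times group mean
    sum_sq_c = sum(ci ** 2 for ci in c)
    estimate = dot / sum_sq_c                        # regression weight b
    ss = N_PER_GROUP * dot ** 2 / sum_sq_c           # 1-df sum of squares
    f = ss / MS_ERROR                                # F with 1 and 16 df
    results[name] = (estimate, ss, f)
    print(f"{name:6s} b = {estimate:6.3f}  SS = {ss:6.2f}  F = {f:6.2f}")

# The three 1-df pieces add up to the 3-df model SS of the dummy-coded analysis
print("Total SS =", round(sum(ss for _, ss, _ in results.values()), 2))
```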


Review

• How does polynomial regression test for quadratic and cubic trends?

• What are orthogonal polynomials? When can they be used?


Nonlinear Relations in Nonexperimental Research

• Create power terms (the IV taken to successive powers).

• Test for increasing numbers of bends by adding terms.

• Quit when adding a term does not significantly increase the variance accounted for.


Rating (DV)   Time   Time**2   Time**3
10            0      0         0
9             0      0         0
10            0      0         0
8             0      0         0
9             0      0         0
8             5      25        125
7             5      25        125
7             5      25        125
8             5      25        125
9             5      25        125
7             10     100       1000
6             10     100       1000
8             10     100       1000
5             10     100       1000
7             10     100       1000
5             15     225       3375
6             15     225       3375
7             15     225       3375
7             15     225       3375
8             15     225       3375

Polynomials to model bends in nonexperimental research


Correlations among terms

          Excite   Time   Time**2   Time**3
Excite    1
Time      -.72     1
Time**2   -.62     .96    1
Time**3   -.55     .91    .99       1

Note that terms with higher exponents are VERY highly correlated. There WILL be problems with collinearity.

Sequence of tests. Start with time, add time squared. If significant, add time cubed. Stop when adding a term doesn’t help. Each power adds a bend. Quadratic is one bend, cubic is two, and so forth.
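To see the collinearity directly, this pure-Python sketch builds Time, Time**2, and Time**3 from the coaster data and recomputes their correlations. It is our own illustration; `pearson` is a hand-written helper.

```python
from math import sqrt

# Coaster ratings by minutes after the ride (five riders per time)
TIMES = [0, 5, 10, 15]
RATINGS = {0: [10, 9, 10, 8, 9], 5: [8, 7, 7, 8, 9],
           10: [7, 6, 8, 5, 7], 15: [5, 6, 7, 7, 8]}


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


time = [t for t in TIMES for _ in RATINGS[t]]
excite = [r for t in TIMES for r in RATINGS[t]]
# Raw (uncentered) power terms, as in the data listing
terms = {"Time": time,
         "Time**2": [t ** 2 for t in time],
         "Time**3": [t ** 3 for t in time]}

names = list(terms)
for i, name in enumerate(names):
    with_excite = pearson(excite, terms[name])
    with_lower = ", ".join(f"{other}: {pearson(terms[name], terms[other]):.2f}"
                           for other in names[:i])
    print(f"{name:8s} r with Excite = {with_excite:.2f}  {with_lower}")
```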


Results of Polynomial Regression

Model   Predictors               Intercept   b1     b2     b3     R2    R2 change
1       Time                     8.90        -.18                 .52   .52
2       Time, Time**2            9.25        -.39   .014          .58   .06
3       Time, Time**2, Time**3   9.20        -.23   -.02   .001   .59   .01

Note that polynomial regression is a special case of hierarchical regression.
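The hierarchical sequence in the results table can be reproduced with a few lines of pure Python: fit the linear, quadratic, and cubic models in turn and track the change in R-square. This sketch solves the normal equations directly; `ols` and `r_squared` are hypothetical helpers, and a production analysis would use a numerically stabler routine (such as a QR decomposition) for raw power terms.

```python
# Coaster ratings by minutes after the ride (five riders per time)
TIMES = [0, 5, 10, 15]
RATINGS = {0: [10, 9, 10, 8, 9], 5: [8, 7, 7, 8, 9],
           10: [7, 6, 8, 5, 7], 15: [5, 6, 7, 7, 8]}


def ols(X, y):
    """Solve the normal equations (X'X)b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * e for a, e in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def r_squared(X, y, b):
    """Proportion of variance in y accounted for by the fitted values."""
    mean = sum(y) / len(y)
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    ss_total = sum((yi - mean) ** 2 for yi in y)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1 - ss_error / ss_total


time = [t for t in TIMES for _ in RATINGS[t]]
excite = [r for t in TIMES for r in RATINGS[t]]

fits = {}
previous_r2 = 0.0
for degree in (1, 2, 3):
    # Hierarchical step: intercept plus Time, Time**2, ... up to this degree
    X = [[float(t) ** p for p in range(degree + 1)] for t in time]
    b = ols(X, excite)
    r2 = r_squared(X, excite, b)
    fits[degree] = (b, r2)
    print(f"Model {degree}: b = {[round(v, 3) for v in b]}, "
          f"R2 = {r2:.2f}, R2 change = {r2 - previous_r2:.2f}")
    previous_r2 = r2
```

Each R2 change would then be tested with the same F-for-change formula used earlier to compare the linear and categorical analyses.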


Polynomial Results (2)

Suppose the term for time squared had been significant. The regression equation would be Y' = 9.25 - .39X + .014X². The results graphed:

[Figure: Roller Coaster Evaluations, plotting Excitement (0 to 12) against Time (-4 to 20).]


Interpreting Weights in Polynomial Regression

• All power terms for an IV work together to define the curve relating Y to X.

• Do not interpret the b weights for polynomial terms. They change if you subtract the mean from the raw data (center the IV).

• To estimate ‘importance’, look to the change in R-square for the block of variables that represent the IV.

• Never use polynomials in a variable selection algorithm (e.g., stepwise regression).

• There is a specialized literature on nonlinear terms in path analysis and SEM (they are hard to do).
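The warning about b weights is easy to demonstrate: fit the same quadratic with raw and with mean-centered time. In this pure-Python sketch (our own illustration, with hypothetical helpers `ols` and `r_squared`), the linear weight changes while R-square does not.

```python
# Coaster ratings by minutes after the ride (five riders per time)
TIMES = [0, 5, 10, 15]
RATINGS = {0: [10, 9, 10, 8, 9], 5: [8, 7, 7, 8, 9],
           10: [7, 6, 8, 5, 7], 15: [5, 6, 7, 7, 8]}


def ols(X, y):
    """Solve the normal equations (X'X)b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * e for a, e in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def r_squared(X, y, b):
    """Proportion of variance in y accounted for by the fitted values."""
    mean = sum(y) / len(y)
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    ss_total = sum((yi - mean) ** 2 for yi in y)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1 - ss_error / ss_total


time = [t for t in TIMES for _ in RATINGS[t]]
excite = [r for t in TIMES for r in RATINGS[t]]
mean_t = sum(time) / len(time)

fits = {}
for label, xs in (("raw", time), ("centered", [t - mean_t for t in time])):
    X = [[1.0, x, x * x] for x in xs]  # intercept, X, X squared
    b = ols(X, excite)
    fits[label] = (b, r_squared(X, excite, b))
    print(f"{label:8s} b0 = {b[0]:.4f}, b1 = {b[1]:.4f}, "
          f"b2 = {b[2]:.4f}, R2 = {fits[label][1]:.4f}")
```

The quadratic weight and the fitted curve are identical in both runs; only the way the curve is split among the intercept and the lower-order weight changes, which is why the change in R-square for the whole block is the safer summary.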


Review

• Describe an advantage of using orthogonal polynomials over simple polynomial regression.

• Why is collinearity likely to be a problem in using polynomial regression?

• Describe the sequence of tests used to model curves in polynomial regression.


Interactions

• An interaction means that the ‘importance’ of one variable depends upon the value of another.

• The variable that changes the relation is sometimes called a moderator, as in “Z moderates the relation between X and Y.”

• In regression, we look to see if the slope relating the DV to the IV changes depending on the value of a second IV.


Example Interaction

[Figure: three panels (Low, Medium, and High Cognitive Ability), each plotting Productivity (-10 to 10) against Creativity (-10 to 10).]

For those with low cognitive ability, there is a small correlation between creativity and productivity.

As cognitive ability increases, the relation between creativity and productivity becomes stronger. The slope of productivity on creativity depends on cognitive ability.


Interaction Response Surface

[Figure: response surface labeled Interaction, with Y (0 to 80) plotted over X1 and X2 (each 0 to 10).]

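The slope of X1 depends on the value of X2, and vice versa. When we do the customary analysis for interactions with continuous IVs (adding the product X1*X2 to the model), regression looks to fit this response surface and no other. The product term captures only this linear-by-linear form of interaction, so the test is more restrictive than the interaction test in ANOVA.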
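A minimal sketch of the surface that the product-term model fits, Y' = b0 + b1X1 + b2X2 + b3X1X2. The coefficients are hypothetical, chosen only for illustration; the point is that the slope of X1 at a given value of X2 is b1 + b3X2.

```python
# Hypothetical coefficients (not from the slides): Y' = b0 + b1*X1 + b2*X2 + b3*X1*X2
B0, B1, B2, B3 = 2.0, 1.0, 1.0, 0.5


def predict(x1, x2):
    """Height of the fitted response surface at (x1, x2)."""
    return B0 + B1 * x1 + B2 * x2 + B3 * x1 * x2


def slope_of_x1(x2):
    """Simple slope of Y' on X1 when X2 is held at x2: b1 + b3 * x2."""
    return B1 + B3 * x2


for x2 in (0, 5, 10):
    # A one-unit step in X1 changes Y' by exactly the simple slope
    step = predict(4, x2) - predict(3, x2)
    print(f"X2 = {x2:2d}: slope of X1 = {slope_of_x1(x2):.1f} (check: {step:.1f})")
```

With b3 = 0 the slope of X1 would be the same at every X2 (a flat, tilted plane); a nonzero b3 twists the plane, which is the only kind of interaction the product term can detect.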

[Figure: a second view of the same interaction surface, Y over X1 and X2.]


Significance Tests for Interactions

• Subtract the mean from each IV (optional).

• Compute the product of the IVs.

• Compute the significance of the change in R-square when the interaction(s) are added.

• If the R-square change is n.s., no interaction(s) are present.

• If the R-square change is significant, find the significant interaction(s).

• Graph the interaction(s).
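The steps listed above can be sketched in pure Python on the creativity and cognitive ability data from the SAS data step (the slides run this in SAS PROC GLM; the `ols` and `r_squared` helpers here are hypothetical). Centering is applied as the optional first step; it does not change either R-square, so the results should match the SAS output (.737935 without and .743586 with the product term).

```python
# person, product, create, cog (from the SAS data step)
DATA = [(1, 50, 40, 100), (2, 35, 45, 80), (3, 40, 50, 90), (4, 50, 55, 105),
        (5, 55, 60, 110), (6, 35, 40, 95), (7, 45, 45, 100), (8, 55, 50, 105),
        (9, 50, 55, 95), (10, 40, 60, 90), (11, 45, 40, 110), (12, 50, 45, 115),
        (13, 60, 50, 120), (14, 65, 55, 125), (15, 55, 60, 105), (16, 50, 40, 110),
        (17, 55, 45, 95), (18, 55, 50, 115), (19, 60, 60, 120), (20, 65, 65, 140)]


def ols(X, y):
    """Solve the normal equations (X'X)b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * e for a, e in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def r_squared(X, y, b):
    """Proportion of variance in y accounted for by the fitted values."""
    mean = sum(y) / len(y)
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    ss_total = sum((yi - mean) ** 2 for yi in y)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1 - ss_error / ss_total


y = [row[1] for row in DATA]
create = [row[2] for row in DATA]
cog = [row[3] for row in DATA]
# Step 1 (optional): subtract the mean from each IV
mc, mg = sum(create) / len(create), sum(cog) / len(cog)
cr = [v - mc for v in create]
cg = [v - mg for v in cog]
# Step 2: add the product of the (centered) IVs as a third predictor
X_main = [[1.0, a, g] for a, g in zip(cr, cg)]
X_full = [[1.0, a, g, a * g] for a, g in zip(cr, cg)]
# Step 3: significance of the change in R-square (1 and n - 4 df)
r2_main = r_squared(X_main, y, ols(X_main, y))
r2_full = r_squared(X_full, y, ols(X_full, y))
n, k_full, k_main = len(y), 3, 2
f = ((r2_full - r2_main) / (k_full - k_main)) / ((1 - r2_full) / (n - k_full - 1))
print(f"R2 main = {r2_main:.6f}, R2 with product = {r2_full:.6f}, F(1, 16) = {f:.2f}")
```

Because only one product term is added, this F equals the square of the t for `inter` in the SAS output ((-0.59)² ≈ .35), and it is far below any conventional critical value.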


data d1;
  input person product create cog;
  inter = create*cog;  /* product term: creativity x cognitive ability */
  cards;

1 50 40 100

2 35 45 80

3 40 50 90

4 50 55 105

5 55 60 110

6 35 40 95

7 45 45 100

8 55 50 105

9 50 55 95

10 40 60 90

11 45 40 110

12 50 45 115

13 60 50 120

14 65 55 125

15 55 60 105

16 50 40 110

17 55 45 95

18 55 50 115

19 60 60 120

20 65 65 140

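;
proc print data=d1; run;
proc corr data=d1; run;
/* Main effects only */
proc glm data=d1;
  model product = create cog;
run;
/* Add the product term to test for the interaction */
proc glm data=d1;
  model product = create cog inter;
run;
quit;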

Data to test for an interaction between creativity and cognitive ability in predicting productivity.


Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho=0

          person    product   create    cog       inter
person    1.00000   0.65629   0.32531   0.66705   0.57538
                    0.0017    0.1616    0.0013    0.0079
product   0.65629   1.00000   0.50470   0.83568   0.78465
          0.0017              0.0232    <.0001    <.0001
create    0.32531   0.50470   1.00000   0.38414   0.84954
          0.1616    0.0232              0.0945    <.0001
cog       0.66705   0.83568   0.38414   1.00000   0.80732
          0.0013    <.0001    0.0945              <.0001
inter     0.57538   0.78465   0.84954   0.80732   1.00000
          0.0079    <.0001    <.0001    <.0001

Correlation Matrix


The GLM Procedure
Dependent Variable: product

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              2      1080.151718    540.075859     23.93   <.0001
Error             17       383.598282     22.564605
Corrected Total   19      1463.750000

R-Square   Coeff Var   Root MSE   product Mean
0.737935    9.360042   4.750222       50.75000

Source   DF   Type I SS     Mean Square   F Value   Pr > F
create    1   372.8504184   372.8504184     16.52   0.0008
cog       1   707.3012991   707.3012991     31.35   <.0001

Source   DF   Type III SS   Mean Square   F Value   Pr > F
create    1    57.9376107    57.9376107      2.57   0.1275
cog       1   707.3012991   707.3012991     31.35   <.0001

Parameter   Estimate        Standard Error   t Value   Pr > |t|
Intercept   -11.31387459        9.26553773     -1.22     0.2387
create        0.23848700        0.14883268      1.60     0.1275
cog           0.47077911        0.08408699      5.60     <.0001

Results for 2 IVs (Main Effects)


Dependent Variable: product

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3      1088.423539    362.807846     15.47   <.0001
Error             16       375.326461     23.457904
Corrected Total   19      1463.750000

R-Square   Coeff Var   Root MSE   product Mean
0.743586    9.543519   4.843336       50.75000

Source   DF   Type I SS     Mean Square   F Value   Pr > F
create    1   372.8504184   372.8504184     15.89   0.0011
cog       1   707.3012991   707.3012991     30.15   <.0001
inter     1     8.2718212     8.2718212      0.35   0.5609

Source   DF   Type III SS   Mean Square   F Value   Pr > F
create    1   15.35701934   15.35701934      0.65   0.4303
cog       1   49.38823497   49.38823497      2.11   0.1661
inter     1    8.27182120    8.27182120      0.35   0.5609

Parameter   Estimate        Standard Error   t Value   Pr > |t|
Intercept   -45.48374780       58.31267387     -0.78     0.4468
create        0.87233592        1.07813934      0.81     0.4303
cog           0.79009054        0.54451482      1.45     0.1661
inter        -0.00587585        0.00989498     -0.59     0.5609

Result for interaction


Moderator and Mediator

• Moderator means interaction: the slope of one IV depends on the value of the other. Use moderated regression (a test for an interaction) to test for it.

• Mediator means there is a causal chain of events. The mediating variable is the proximal cause of the DV. A more distal cause changes the mediator. Use path analysis to test. In this graph, X2 is the mediator.

[Figure: path diagrams for a moderator (X1 and X2 predicting Y) and a mediator (X1 → X2 → Y).]
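The slides say only to use path analysis for mediation. One common way to estimate the paths (our choice, not the author's stated method) is three OLS regressions: path a from regressing X2 on X1, paths c' (direct) and b from regressing Y on X1 and X2 together, and the total effect c from regressing Y on X1 alone. The data below are hypothetical, `ols` is a hand-written helper, and the sketch estimates the paths only; it does not test their significance.

```python
# Hypothetical scores (not from the slides) for a causal chain X1 -> X2 -> Y
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 3, 5, 4, 7, 8, 8, 10]
y = [3, 5, 6, 7, 9, 12, 11, 14]


def ols(X, yv):
    """Solve the normal equations (X'X)b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, yv))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * e for a, e in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


a = ols([[1.0, v] for v in x1], x2)[1]        # path a: X1 -> X2
c_total = ols([[1.0, v] for v in x1], y)[1]   # total effect c: X1 -> Y
# Y on X1 and X2 together: direct path c' (X1 -> Y) and path b (X2 -> Y)
_, c_direct, b_path = ols([[1.0, u, v] for u, v in zip(x1, x2)], y)
indirect = a * b_path                         # effect carried through X2
print(f"a = {a:.3f}, b = {b_path:.3f}, direct = {c_direct:.3f}, indirect = {indirect:.3f}")
print(f"total = {c_total:.3f}; direct + indirect = {c_direct + indirect:.3f}")
```

With OLS estimates the total effect splits exactly into the direct part (c') and the indirect part (a × b), which is what the last line checks; a mediator in the sense of the slide is one that carries a substantial share of X1's effect.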


Review

• How do you model interactions of continuous variables with regression?

• What is the difference between a moderator and a mediator? How do you test for the presence of each?