Lecture 21 – Thurs., Nov. 20 Review of Interpreting Coefficients and Prediction in Multiple Regression Strategy for Data Analysis and Graphics (Chapters 9.4 – 9.5) Specially Constructed Explanatory Variables (Chapter 9.3) Polynomial terms for curvature Interaction terms Sets of indicator variables for nominal variables


Lecture 21 – Thurs., Nov. 20

• Review of Interpreting Coefficients and Prediction in Multiple Regression

• Strategy for Data Analysis and Graphics (Chapters 9.4 – 9.5)

• Specially Constructed Explanatory Variables (Chapter 9.3)
  – Polynomial terms for curvature
  – Interaction terms
  – Sets of indicator variables for nominal variables


Interpreting Coefficients

• Multiple Linear Regression Model:

$\mu\{Y \mid X_1,\ldots,X_p\} = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$

• Interpretation of coefficient $\beta_j$: The change in the mean of Y that is associated with increasing Xj by one unit and not changing X1,…,Xj-1, Xj+1,…,Xp

• Interpretation holds even if X1,…,Xp are correlated.

• Same warning about extrapolation beyond the observed X1,…,Xp points as in simple linear regression.


Coefficients in Mammal Study

• It is estimated that
  – A 1 kg increase in body weight with gestation period and litter size held fixed is associated with a 0.99 g mean increase in brain weight [95% CI: (0.80, 1.17)]
  – A 1 day increase in gestation period with body weight and litter size held fixed is associated with a 1.81 g mean increase in brain weight [95% CI: (1.10, 2.51)]
  – A 1 animal increase in litter size with body weight and gestation period held fixed is associated with a 27.65 g mean increase in brain weight [95% CI: (-6.94, 62.23)]

Parameter Estimates
Term        Estimate    Std Error  Prob>|t|  Lower 95%   Upper 95%
Intercept   -225.2921   83.05875   0.0080    -390.254    -60.33028
BODY        0.9858781   0.094283   <.0001    0.7986246   1.1731315
GESTATION   1.8087434   0.354449   <.0001    1.1047774   2.5127094
LITTER      27.648639   17.41429   0.1158    -6.937651   62.234929


Prediction from Multiple Regression

• Estimated mean brain weight (= predicted brain weight) for a mammal which has a body weight of 3 kg, a gestation period of 180 days and a litter size of 1:

$\widehat{\text{brainsize}} = -225.29 + 0.99 \times 3 + 1.81 \times 180 + 27.65 \times 1 = 131.13$

Parameter Estimates
Term        Estimate    Std Error  Prob>|t|  Lower 95%   Upper 95%
Intercept   -225.2921   83.05875   0.0080    -390.254    -60.33028
BODY        0.9858781   0.094283   <.0001    0.7986246   1.1731315
GESTATION   1.8087434   0.354449   <.0001    1.1047774   2.5127094
LITTER      27.648639   17.41429   0.1158    -6.937651   62.234929
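
The prediction is the fitted equation evaluated at the chosen values of the explanatory variables. A minimal Python sketch of that arithmetic, using the estimates from the Parameter Estimates table (the dictionary and function names are illustrative and are not part of JMP):

```python
# Coefficient estimates copied from the Parameter Estimates table.
COEF = {
    "Intercept": -225.2921,
    "BODY": 0.9858781,        # g of brain weight per kg of body weight
    "GESTATION": 1.8087434,   # g of brain weight per day of gestation
    "LITTER": 27.648639,      # g of brain weight per additional animal in the litter
}


def predict_brain_weight(body_kg, gestation_days, litter_size, coef=COEF):
    """Estimated mean brain weight (g) from the fitted multiple regression."""
    return (coef["Intercept"]
            + coef["BODY"] * body_kg
            + coef["GESTATION"] * gestation_days
            + coef["LITTER"] * litter_size)


# Mammal with body weight 3 kg, gestation period 180 days, litter size 1.
full_precision = predict_brain_weight(3, 180, 1)

# Same calculation with every coefficient rounded to two decimals first.
rounded = {name: round(value, 2) for name, value in COEF.items()}
rounded_value = predict_brain_weight(3, 180, 1, coef=rounded)

print(f"full precision: {full_precision:.2f} g")
print(f"rounded coefficients: {rounded_value:.2f} g")
```

The two results differ slightly because rounding each coefficient before multiplying (especially the gestation coefficient, which is multiplied by 180) shifts the total by a few tenths of a gram.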


Strategy for Data Analysis and Graphics

• Strategy for Data Analysis: Display 9.9 in Chapter 9.4

• Good graphical method for initial exploration of data is a matrix of pairwise scatterplots. To display this in JMP, click on Analyze, Multivariate and then put all the variables in Y, Columns.


Specially Constructed Explanatory Variables

• The scope of multiple linear regression can be dramatically expanded by using specially constructed explanatory variables:
  – Powers of the explanatory variables, $X_j^k$, can be used to model curvature in the regression function.
  – Indicator variables can be used to model the effect of nominal variables.
  – Products of explanatory variables, $X_j X_k$, can be used to model interactive effects of explanatory variables.


Curved Regression Functions

• Linearity assumption in simple linear regression is violated. Transformations wouldn’t work because function isn’t monotonic.

[Figure: Bivariate Fit of YIELD By RAINFALL (YIELD vs. RAINFALL), with a plot of residuals vs. RAINFALL]


Squared Term for Curvature

• Multiple Linear Regression Model:

$\mu\{\text{yield} \mid \text{rain}\} = \beta_0 + \beta_1\,\text{rain} + \beta_2\,\text{rain}^2$

[Figure: Bivariate Fit of YIELD By RAINFALL (YIELD vs. RAINFALL), with a plot of residuals vs. RAINFALL]

Parameter Estimates
Term                    Estimate    Std Error  t Ratio  Prob>|t|
Intercept               21.660175   3.094868   7.00     <.0001
RAINFALL                1.0572654   0.293956   3.60     0.0010
(RAINFALL-10.7842)^2    -0.229364   0.088635   -2.59    0.0140


Terms for Curvature

• Two ways to incorporate squared or higher polynomial terms for curvature in JMP

– Fit Model, create a variable rainfall2

– Fit Y by X, under red triangle next to Bivariate Fit of Yield by Rainfall, click Fit Polynomial then 2, Quadratic instead of Fit Line (a model with both a squared and cubed term can be fit by clicking 3, Cubic)

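• Coefficients are not directly interpretable. Change in the mean of Y that is associated with a one unit increase in X depends on X:

$\mu\{Y \mid X+1\} - \mu\{Y \mid X\}$
$\quad = [\beta_0 + \beta_1 (X+1) + \beta_2 (X+1)^2] - [\beta_0 + \beta_1 X + \beta_2 X^2]$
$\quad = \beta_1 + \beta_2 [(2 \cdot X) + 1]$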
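
The dependence on X can be checked numerically with the yield and rainfall estimates reported by JMP. The sketch below is illustrative only: JMP reports the squared term centered at 10.7842, so the same centering is assumed here, which turns the change per unit into b1 + b2[2(X - 10.7842) + 1]. The function names are made up for the example.

```python
# Estimates from the yield vs. rainfall Parameter Estimates table.
# The squared term is centered at 10.7842, as in the JMP output.
B0, B1, B2, CENTER = 21.660175, 1.0572654, -0.229364, 10.7842


def mean_yield(rain):
    """Fitted mean yield at a given rainfall."""
    return B0 + B1 * rain + B2 * (rain - CENTER) ** 2


def change_per_unit(rain):
    """Change in fitted mean yield when rainfall goes from rain to rain + 1.

    Expanding the squares gives B1 + B2 * (2 * (rain - CENTER) + 1).
    """
    return B1 + B2 * (2 * (rain - CENTER) + 1)


for rain in (8, 11, 14):
    direct = mean_yield(rain + 1) - mean_yield(rain)
    print(f"rainfall {rain}: formula {change_per_unit(rain):.3f}, direct difference {direct:.3f}")
```

At low rainfall the fitted change is positive and at high rainfall it is negative, which is why a single slope cannot summarize the effect of rainfall.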


Interaction Terms

• Two variables are said to interact if the effect that one of them has on the mean response depends on the value of the other.

• An explanatory variable for interaction can be constructed as the product of the two explanatory variables that are thought to interact.


Interaction in Meadowfoam

• Does the effect of light intensity on mean number of flowers depend on the timing of light regime?

• Multiple linear regression model that has term for interaction:

$\mu\{\text{flowers} \mid \text{light}, \text{early}\} = \beta_0 + \beta_1\,\text{light} + \beta_2\,\text{early} + \beta_3 (\text{light} \times \text{early})$

• Model is equivalent to

$\mu\{\text{flowers} \mid \text{light}, \text{early}\} = (\beta_0 + \beta_2\,\text{early}) + (\beta_1 + \beta_3\,\text{early}) \times \text{light}$

• Change in mean of flowers for a one unit increase in light intensity is $\beta_1 + \beta_3\,\text{early}$, so it depends on timing onset.

• Coefficients are not easily interpretable. Best method for communicating findings with interaction is table or graph of estimated means at various combinations of interacting variables.


Interaction in Meadowfoam

• There is not much evidence of an interaction. The p-value for the test that the interaction coefficient is zero is 0.9096.

Parameter Estimates
Term           Estimate    Std Error  t Ratio  Prob>|t|
Intercept      71.623333   4.343305   16.49    <.0001
Early          11.523333   6.142361   1.88     0.0753
INTENS         -0.041076   0.007435   -5.52    <.0001
Early*Intens   0.0012095   0.010515   0.12     0.9096
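
A table of estimated means at combinations of the interacting variables can be produced directly from these estimates. The sketch below is one way to do it; the intensity levels are illustrative values chosen for the example (substitute the levels present in the data), and the function names are hypothetical.

```python
# Estimates from the meadowfoam Parameter Estimates table.
B0, B_EARLY, B_INTENS, B_INTERACT = 71.623333, 11.523333, -0.041076, 0.0012095


def mean_flowers(intens, early):
    """Fitted mean number of flowers; early is 1 for early timing, 0 for late."""
    return B0 + B_EARLY * early + B_INTENS * intens + B_INTERACT * early * intens


def slope_in_intensity(early):
    """Change in fitted mean flowers per one unit increase in light intensity."""
    return B_INTENS + B_INTERACT * early


# Illustrative intensity levels (an assumption, not taken from the slides).
levels = (150, 300, 450, 600, 750, 900)

print("INTENS    late   early")
for intens in levels:
    print(f"{intens:6d} {mean_flowers(intens, 0):7.1f} {mean_flowers(intens, 1):7.1f}")

print(f"slope (late):  {slope_in_intensity(0):.6f}")
print(f"slope (early): {slope_in_intensity(1):.6f}")
```

The two slopes are nearly identical, which matches the large p-value for the interaction coefficient.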


Displaying Interaction – Coded Scatterplots (Section 9.5.2)

• A coded scatterplot is a scatterplot with different symbols to distinguish two or more groups

[Figure: Overlay Plot of Y vs. INTENS, with different symbols for groups 0 and 1]


Coded Scatterplots in JMP

• Split the Y variable by the group identity variables (Click Tables, Split, then put Y variable in Split and Group Identity variable in Col ID).

• Graph, Overlay Plot, put the columns corresponding to the Y’s for the different group identity variables in Y and put the X variable (light intensity) in X.


Parallel vs. Separate Regression Lines

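• Model without interaction between time onset and light intensity is a “parallel regression lines” model:

$\mu\{\text{flowers} \mid \text{light}, \text{early}\} = \beta_0 + \beta_1\,\text{light} + \beta_2\,\text{early}$
$\mu\{\text{flowers} \mid \text{light}, \text{early}=0\} = \beta_0 + \beta_1\,\text{light}$
$\mu\{\text{flowers} \mid \text{light}, \text{early}=1\} = (\beta_0 + \beta_2) + \beta_1\,\text{light}$

• Model with interaction is a “separate regression lines” model:

$\mu\{\text{flowers} \mid \text{light}, \text{early}\} = \beta_0 + \beta_1\,\text{light} + \beta_2\,\text{early} + \beta_3 (\text{light} \times \text{early})$
$\mu\{\text{flowers} \mid \text{light}, \text{early}=0\} = \beta_0 + \beta_1\,\text{light}$
$\mu\{\text{flowers} \mid \text{light}, \text{early}=1\} = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) \times \text{light}$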


Polynomials and Interactions Example

• An analyst working for a fast food chain is asked to construct a multiple regression model to identify new locations that are likely to be profitable. For a sample of 25 locations, the analyst has the annual gross revenue of the restaurant (y), the mean annual household income, and the mean age of children in the area. Data are in fastfoodchain.jmp.

• Relationship between y and each explanatory variable might be quadratic because restaurants attract mostly middle-income households and children in the mid age ranges.


fastfoodchain.jmp results

• Strong evidence of a quadratic relationship between revenue and age, revenue and income. Moderate evidence of an interaction between age and income.

Parameter Estimates
Term             Estimate    Std Error  t Ratio  Prob>|t|
Intercept        -1133.981   320.0193   -3.54    0.0022
Income           173.20317   28.20399   6.14     <.0001
Age              23.549963   32.23447   0.73     0.4739
Income sq        -3.726129   0.542156   -6.87    <.0001
Age sq           -3.868707   1.179054   -3.28    0.0039
(Income)(Age)    1.9672682   0.944082   2.08     0.0509
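
With squared terms and an interaction in the same model, the fitted effect of income depends on both income and age. The sketch below illustrates this by finding the income at which fitted revenue peaks for a fixed age. It assumes that "Income sq", "Age sq" and "(Income)(Age)" are the raw, uncentered square and product of the data columns, which the output does not state; the ages used and the function names are illustrative.

```python
# Estimates from the fastfoodchain.jmp Parameter Estimates table.
# Assumption: the squared and product terms are uncentered.
B0, B_INC, B_AGE = -1133.981, 173.20317, 23.549963
B_INC2, B_AGE2, B_INC_AGE = -3.726129, -3.868707, 1.9672682


def mean_revenue(income, age):
    """Fitted mean revenue at the given income and age (units as in the data)."""
    return (B0 + B_INC * income + B_AGE * age
            + B_INC2 * income ** 2 + B_AGE2 * age ** 2
            + B_INC_AGE * income * age)


def income_at_peak(age):
    """Income that maximizes fitted revenue for a fixed age.

    The derivative with respect to income is
    B_INC + 2 * B_INC2 * income + B_INC_AGE * age; setting it to zero and
    solving gives the peak, since B_INC2 is negative.
    """
    return -(B_INC + B_INC_AGE * age) / (2 * B_INC2)


for age in (5, 10, 15):  # illustrative ages
    peak = income_at_peak(age)
    print(f"age {age}: fitted revenue peaks at income {peak:.2f}")
```

Because the interaction coefficient is positive, the income at which fitted revenue peaks shifts upward as the mean age of children increases.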


Nominal Variables

• To incorporate nominal variables in multiple regression analysis, we use indicator variables.

• Indicator variable to distinguish between two groups: The time onset (early vs. late) is a nominal variable. To incorporate it into multiple regression analysis, we used the indicator variable early, which equals 1 if early, 0 if late.

$\mu\{\text{flowers} \mid \text{light}, \text{early}\} = \beta_0 + \beta_1\,\text{light} + \beta_2\,\text{early}$


Nominal Variables with More than Two Categories

• To incorporate nominal variables with more than two categories, we use multiple indicator variables. If there are k categories, we need k-1 indicator variables.
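
The k-1 rule can be sketched in a few lines of Python. The helper below is hypothetical (it is not a JMP feature): it builds one 0/1 column for every category except a chosen reference category, which is then the case where all indicators equal 0. The color values are made-up example data.

```python
def indicator_columns(values, reference):
    """Return k-1 indicator (0/1) columns for a nominal variable with k categories.

    The reference category gets no column of its own; an observation in the
    reference category has every indicator equal to 0.
    """
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Made-up example: three categories, so two indicator columns are needed.
colors = ["white", "white", "other", "other", "silver", "silver"]
columns = indicator_columns(colors, reference="other")

for name, column in columns.items():
    print(name, column)
```

Adding a k-th indicator for the reference category would make the indicators sum to 1 for every observation, duplicating the intercept, which is why only k-1 are used.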


Nominal Explanatory Variables Example: Auction Car Prices

• A car dealer wants to predict the auction price of a car.
  – The dealer believes that odometer reading and the car color are variables that affect a car’s price (data from sample of cars in auctionprice.JMP)
  – Three color categories are considered:
    • White
    • Silver
    • Other colors

Note: Color is a nominal variable.


Indicator Variables in Auction Car Prices

I1 = 1 if the color is white, 0 if the color is not white

I2 = 1 if the color is silver, 0 if the color is not silver

The category “Other colors” is defined by: I1 = 0; I2 = 0


Auction Car Price Model

• Solution
  – The proposed model is

$\mu\{Y \mid \text{odometer}, \text{color}\} = \beta_0 + \beta_1\,\text{odometer} + \beta_2 I_1 + \beta_3 I_2$

  – The data

Price   Odometer  I-1  I-2
14636   37388     1    0     White car
14122   44758     1    0     White car
14016   45833     0    0     Other color
15590   30862     0    0     Other color
15568   31705     0    1     Silver color
14718   34010     0    1     Silver color
.       .         .    .
.       .         .    .


Example: Auction Car Price – The Regression Equation

From JMP we get the regression equation
PRICE = 16701 - .0555(Odometer) + 90.48(I-1) + 295.48(I-2)

The equation for an “other color” car:
Price = 16701 - .0555(Odometer) + 90.48(0) + 295.48(0) = 16701 - .0555(Odometer)

The equation for a white color car:
Price = 16701 - .0555(Odometer) + 90.48(1) + 295.48(0) = 16791.48 - .0555(Odometer)

The equation for a silver color car:
Price = 16701 - .0555(Odometer) + 90.48(0) + 295.48(1) = 16996.48 - .0555(Odometer)

[Figure: Price vs. Odometer, showing the three regression lines]


Example: Auction Car Price – The Regression Equation

From JMP we get the regression equation
PRICE = 16701 - .0555(Odometer) + 90.48(I-1) + 295.48(I-2)

A white car sells, on average, for $90.48 more than a car of the “Other color” category with the same odometer reading.

A silver color car sells, on average, for $295.48 more than a car of the “Other color” category with the same odometer reading.

For each additional mile on the odometer, the mean auction price decreases by 5.55 cents for a car of a given color.
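
These interpretations can be checked by evaluating the regression equation for each color at the same odometer reading. A minimal sketch using the rounded estimates from the equation above; the function name and the 40,000-mile reading are illustrative choices.

```python
# Rounded estimates from the regression equation reported by JMP.
INTERCEPT, B_ODOMETER, B_WHITE, B_SILVER = 16701.0, -0.0555, 90.48, 295.48


def mean_price(odometer, color):
    """Estimated mean auction price; any color other than white or silver is 'other'."""
    i1 = 1 if color == "white" else 0   # indicator I-1
    i2 = 1 if color == "silver" else 0  # indicator I-2
    return INTERCEPT + B_ODOMETER * odometer + B_WHITE * i1 + B_SILVER * i2


# Compare the three color categories at an illustrative odometer reading.
for color in ("other", "white", "silver"):
    print(f"{color:6s} {mean_price(40000, color):.2f}")
```

Because the model has no interaction between odometer and color, the differences between colors are the same at every odometer reading: the three fitted lines are parallel.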


Example: Auction Car Price – The Regression Equation

Parameter Estimates
Term        Estimate    Std Error  t Ratio  Prob>|t|
Intercept   16700.646   184.3331   90.60    <.0001
Odometer    -0.05554    0.004737   -11.72   <.0001
I-1         90.481959   68.16886   1.33     0.1876
I-2         295.47602   76.36998   3.87     0.0002

There is insufficient evidence to infer that a white color car and a car of “other color” sell for a different auction price.

There is sufficient evidence to infer that a silver color car sells for a larger price than a car of the “other color” category.


Shorthand Notation for Nominal Variables

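• Shorthand notation for regression models with nominal variables: use all capital letters for nominal variables.
  – Parallel Regression Lines model:

$\mu\{\text{flowers} \mid \text{light}, \text{TIME}\} = \text{light} + \text{TIME}$

  – Separate Regression Lines model:

$\mu\{\text{flowers} \mid \text{light}, \text{TIME}\} = \text{light} + \text{TIME} + (\text{light} \times \text{TIME})$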


Nominal Variables in JMP

• It is not necessary to create indicator variables yourself to represent a nominal variable.

• Make sure that the nominal variable’s modeling type is in fact nominal.

• Include the nominal variable in the Construct Model Effects box in Fit Model

• JMP will create indicator variables. The brackets indicate the category of the nominal variable for which the indicator variable is 1.