
Page 1: The Use of Dummy Variables

The Use of Dummy Variables

Page 2: The Use of Dummy Variables

• In the examples so far the independent variables are continuous numerical variables.

• Suppose that some of the independent variables are categorical.

• Dummy variables are artificially defined variables designed to convert a model including categorical independent variables to the standard multiple regression model.

Page 3: The Use of Dummy Variables

Example: Comparison of Slopes of k Regression Lines with Common Intercept

Page 4: The Use of Dummy Variables

Situation:
• k treatments or k populations are being compared.
• For each of the k treatments we have measured both
  – Y (the response variable) and
  – X (an independent variable)
• Y is assumed to be linearly related to X with
  – the slope dependent on treatment (population), while
  – the intercept is the same for each treatment

Page 5: The Use of Dummy Variables

The Model:

$$Y = \beta_0 + \beta_1^{(i)} X + \varepsilon \quad \text{for treatment } i \;(i = 1, 2, \ldots, k)$$

Graphical Illustration of the above Model

[Figure: the k regression lines (Treat 1, Treat 2, Treat 3, ..., Treat k) plotted as y against x, all sharing a common intercept but having different slopes.]

Page 6: The Use of Dummy Variables

• This model can be artificially put into the form of the Multiple Regression model by the use of dummy variables to handle the categorical independent variable Treatments.

• Dummy variables are variables that are artificially defined

Page 7: The Use of Dummy Variables

In this case we define a new variable for each category of the categorical variable.

That is we will define Xi for each category of treatments as follows:

$$X_i = \begin{cases} X & \text{if the subject receives treatment } i \\ 0 & \text{otherwise} \end{cases}$$

Page 8: The Use of Dummy Variables

Then the model can be written as follows:

The Complete Model:

$$Y = \beta_0 + \beta_1^{(1)} X_1 + \beta_1^{(2)} X_2 + \cdots + \beta_1^{(k)} X_k + \varepsilon$$

where

$$X_i = \begin{cases} X & \text{if the subject receives treatment } i \\ 0 & \text{otherwise} \end{cases}$$

Page 9: The Use of Dummy Variables

In this case

Dependent Variable: Y

Independent Variables: X1, X2, ... , Xk
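As a concrete illustration of how these dummy variables can be constructed and the model fitted (this sketch is not part of the original slides; it assumes NumPy and uses made-up treatment labels and data):

```python
import numpy as np

# Hypothetical data: one row per subject, with its treatment label, X and Y.
treatment = np.array(["T1", "T1", "T2", "T2", "T3", "T3"])
X = np.array([2.0, 4.0, 2.0, 4.0, 2.0, 4.0])
Y = np.array([5.1, 9.8, 6.2, 12.1, 7.0, 14.2])

labels = ["T1", "T2", "T3"]            # the k treatments
# X_i = X if the subject receives treatment i, and 0 otherwise
dummies = np.column_stack([np.where(treatment == lab, X, 0.0) for lab in labels])

# Design matrix: a single column of 1s (the common intercept) plus the k dummy columns
design = np.column_stack([np.ones(len(Y)), dummies])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
print(coef)   # [b0, b1_(1), b1_(2), b1_(3)]: common intercept, one slope per treatment
```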

Page 10: The Use of Dummy Variables

In the above situation we would likely be interested in testing the equality of the slopes. Namely the Null Hypothesis

$$H_0: \beta_1^{(1)} = \beta_1^{(2)} = \cdots = \beta_1^{(k)}$$

(q = k – 1)

Page 11: The Use of Dummy Variables

The Reduced Model:

Dependent Variable: Y

Independent Variable: $X = X_1 + X_2 + \cdots + X_k$

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

Page 12: The Use of Dummy Variables

Example:

In the following example we are measuring Yield (Y) as it depends on the amount (X) of a pesticide.

Again we will assume that the dependence of Y on X will be linear.

(I should point out that the concepts that are used in this discussion can easily be adapted to the non-linear situation.)

Page 13: The Use of Dummy Variables

• Suppose that the experiment is going to be repeated for three brands of pesticide: A, B and C.
• The quantity, X, of pesticide in this experiment was set at three different levels:
  – 2 units/hectare,
  – 4 units/hectare and
  – 8 units/hectare.
• Four test plots were randomly assigned to each of the nine combinations of brand and level of pesticide.

Page 14: The Use of Dummy Variables

• Note that we would expect a common intercept for each brand of pesticide, since when the amount of pesticide, X, is zero the three brands of pesticide would be equivalent.

Page 15: The Use of Dummy Variables

The data for this experiment is given in the following table:

Brand    X = 2    X = 4    X = 8

A 29.63 28.16 28.45

31.87 33.48 37.21

28.02 28.13 35.06

35.24 28.25 33.99

B 32.95 29.55 44.38

24.74 34.97 38.78

23.38 36.35 34.92

32.08 38.38 27.45

C 28.68 33.79 46.26

28.70 43.95 50.77

22.67 36.89 50.21

30.02 33.56 44.14

Page 16: The Use of Dummy Variables

[Figure: scatter plot of yield Y (vertical axis, 0 to 60) against amount of pesticide X (horizontal axis, 0 to 8) for brands A, B and C.]

Page 17: The Use of Dummy Variables

Pesticide X (Amount) X1 X2 X3 Y

A 2 2 0 0 29.63
A 2 2 0 0 31.87
A 2 2 0 0 28.02
A 2 2 0 0 35.24
B 2 0 2 0 32.95
B 2 0 2 0 24.74
B 2 0 2 0 23.38
B 2 0 2 0 32.08
C 2 0 0 2 28.68
C 2 0 0 2 28.70
C 2 0 0 2 22.67
C 2 0 0 2 30.02
A 4 4 0 0 28.16
A 4 4 0 0 33.48
A 4 4 0 0 28.13
A 4 4 0 0 28.25
B 4 0 4 0 29.55
B 4 0 4 0 34.97
B 4 0 4 0 36.35
B 4 0 4 0 38.38
C 4 0 0 4 33.79
C 4 0 0 4 43.95
C 4 0 0 4 36.89
C 4 0 0 4 33.56
A 8 8 0 0 28.45
A 8 8 0 0 37.21
A 8 8 0 0 35.06
A 8 8 0 0 33.99
B 8 0 8 0 44.38
B 8 0 8 0 38.78
B 8 0 8 0 34.92
B 8 0 8 0 27.45
C 8 0 0 8 46.26
C 8 0 0 8 50.77
C 8 0 0 8 50.21
C 8 0 0 8 44.14

The data as it would appear in a data file. The variables X1, X2 and X3 are the

“dummy” variables

Page 18: The Use of Dummy Variables

Fitting the complete model: ANOVA

             df   SS            MS            F             Significance F
Regression   3    1095.815813   365.2719378   18.33114788   4.19538E-07
Residual     32   637.6415754   19.92629923
Total        35   1733.457389

             Coefficients
Intercept    26.24166667
X1           0.981388889
X2           1.422638889
X3           2.602400794

Page 19: The Use of Dummy Variables

Fitting the reduced model: ANOVA

             df   SS            MS            F             Significance F
Regression   1    623.8232508   623.8232508   19.11439978   0.000110172
Residual     34   1109.634138   32.63629818
Total        35   1733.457389

             Coefficients
Intercept    26.24166667
X            1.668809524

Page 20: The Use of Dummy Variables

The ANOVA table for testing the equality of slopes

                    df   SS            MS            F             Significance F
Common slope zero   1    623.8232508   623.8232508   31.3065283    3.51448E-06
Slope comparison    2    471.9925627   235.9962813   11.84345766   0.000141367
Residual            32   637.6415754   19.92629923
Total               35   1733.457389
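The 'Slope comparison' row above can be reproduced by hand: it is the drop in residual sum of squares from the reduced model to the complete model, divided by q = 2 and by the MSE of the complete model. A quick check (a sketch, assuming Python with SciPy available for the p-value):

```python
from scipy.stats import f

rss_reduced  = 1109.634138    # residual SS of the reduced model (common slope)
rss_complete = 637.6415754    # residual SS of the complete model (separate slopes)
q, df_resid  = 2, 32          # q = k - 1 restrictions; residual df of the complete model
mse_complete = rss_complete / df_resid

F = ((rss_reduced - rss_complete) / q) / mse_complete
print(F, f.sf(F, q, df_resid))   # approx. 11.84 and 0.00014, as in the table above
```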

Page 21: The Use of Dummy Variables

Example: Comparison of Intercepts of k Regression Lines with a Common Slope (One-way Analysis of Covariance)

Page 22: The Use of Dummy Variables

Situation:
• k treatments or k populations are being compared.
• For each of the k treatments we have measured both
  – Y (the response variable) and
  – X (an independent variable)
• Y is assumed to be linearly related to X with the intercept dependent on treatment (population), while the slope is the same for each treatment.
• Y is called the response variable, while X is called the covariate.

Page 23: The Use of Dummy Variables

The Model:

$$Y = \beta_0^{(i)} + \beta_1 X + \varepsilon \quad \text{for treatment } i \;(i = 1, 2, \ldots, k)$$

Graphical Illustration of the One-way Analysis of Covariance Model

[Figure: the k regression lines (Treat 1, Treat 2, Treat 3, ..., Treat k) plotted as y against x, with a common slope but different intercepts.]

Page 24: The Use of Dummy Variables

Equivalent Forms of the Model:

1) $Y = \mu_i + \beta_1(X - \bar{X}) + \varepsilon$ for treatment $i$, where $\mu_i$ = adjusted mean for treatment $i$.

2) $Y = \mu + \tau_i + \beta_1(X - \bar{X}) + \varepsilon$ for treatment $i$, where $\mu$ = overall adjusted mean response and $\tau_i$ = adjusted effect for treatment $i$, so that $\mu_i = \mu + \tau_i$.

Page 25: The Use of Dummy Variables

• This model can be artificially put into the form of the Multiple Regression model by the use of dummy variables to handle the categorical independent variable Treatments.

Page 26: The Use of Dummy Variables

In this case we define a new variable for each category of the categorical variable.

That is, we will define $X_i$ for categories $i = 1, 2, \ldots, (k - 1)$ of treatments as follows:

$$X_i = \begin{cases} 1 & \text{if the subject receives treatment } i \\ 0 & \text{otherwise} \end{cases}$$

Page 27: The Use of Dummy Variables

Then the model can be written as follows:

The Complete Model:

$$Y = \beta_0 + \delta_1 X_1 + \delta_2 X_2 + \cdots + \delta_{k-1} X_{k-1} + \beta_1 X + \varepsilon$$

where

$$X_i = \begin{cases} 1 & \text{if the subject receives treatment } i \\ 0 & \text{otherwise} \end{cases}$$

(Here $\beta_0$ is the intercept for treatment $k$ and $\beta_0 + \delta_i$ is the intercept for treatment $i$.)

Page 28: The Use of Dummy Variables

In this case

Dependent Variable: Y

Independent Variables:

X1, X2, ... , Xk-1, X
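As a sketch of how this design matrix looks in practice (not from the slides; NumPy assumed, with made-up data and k = 3 treatments), the dummies are 0/1 indicators for the first k - 1 treatments and the covariate X enters as its own column:

```python
import numpy as np

# Hypothetical ANCOVA data: treatment label, covariate X and response Y per subject.
treatment = np.array(["T1", "T1", "T2", "T2", "T3", "T3"])
X = np.array([12.0, 15.0, 11.0, 14.0, 13.0, 16.0])
Y = np.array([30.1, 33.4, 35.0, 38.2, 28.9, 31.7])

dummy_labels = ["T1", "T2"]    # indicators only for the first k - 1 treatments
dummies = np.column_stack([(treatment == lab).astype(float) for lab in dummy_labels])

# Design matrix: intercept, the k - 1 indicator columns, and the covariate X.
design = np.column_stack([np.ones(len(Y)), dummies, X])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
print(coef)   # [b0, d1, d2, b1]: baseline intercept, intercept shifts, common slope
```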

Page 29: The Use of Dummy Variables

In the above situation we would likely be interested in testing the equality of the intercepts. Namely the Null Hypothesis

$$H_0: \delta_1 = \delta_2 = \cdots = \delta_{k-1} = 0$$

(q = k – 1)

Page 30: The Use of Dummy Variables

The Reduced Model:

Dependent Variable: Y

Independent Variable: X

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

Page 31: The Use of Dummy Variables

Example:

In the following example we are interested in comparing the effects of five workbooks (A, B, C, D, E) on the performance of students in Mathematics. For each workbook, 15 students are selected (Total of n = 15×5 = 75). Each student is given a pretest (pretest score ≡ X) and given a final test (final score ≡ Y). The data is given on the following slide

Page 32: The Use of Dummy Variables

The data

        Workbook A     Workbook B     Workbook C     Workbook D     Workbook E
        Pre    Post    Pre    Post    Pre    Post    Pre    Post    Pre    Post
        43.0   46.4    43.6   52.5    57.5   61.9    59.9   56.1    43.2   46.0
        55.3   43.9    45.2   61.8    49.3   57.5    50.5   49.6    60.7   59.7
        59.4   59.7    54.2   69.1    48.0   52.5    45.0   46.1    42.7   45.4
        51.7   49.6    45.5   61.7    31.3   42.9    55.0   53.2    46.6   44.3
        53.0   49.3    43.4   53.3    65.3   74.5    52.6   50.8    42.6   46.5
        48.7   47.1    50.1   57.4    47.1   48.9    62.8   60.1    25.6   38.4
        45.4   47.4    36.2   48.7    34.8   47.2    41.4   49.5    52.5   57.7
        42.1   33.3    55.1   61.9    53.9   59.8    62.1   58.3    51.2   47.1
        60.0   53.2    48.9   55.0    42.7   49.6    56.4   58.1    48.8   50.4
        32.4   34.1    52.9   63.3    47.6   55.6    54.2   56.8    44.1   52.7
        74.4   66.7    51.7   64.7    56.1   62.4    51.6   46.1    73.8   73.6
        43.2   43.2    55.3   66.4    39.7   52.1    63.3   56.0    52.6   50.8
        44.5   42.5    45.2   59.4    32.3   49.7    37.3   48.8    67.8   66.8
        47.1   51.3    37.6   56.9    59.5   67.1    39.2   45.1    42.9   47.2
        57.0   48.9    41.7   51.3    46.2   55.2    62.1   58.0    51.7   57.0

The Model:

$$Y = \beta_0^{(i)} + \beta_1 X + \varepsilon \quad \text{for workbook } i \;(i = A, B, C, D, E)$$

Page 33: The Use of Dummy Variables

Graphical display of data

[Figure: scatter plot of Final Score (vertical axis, 0 to 80) against Pretest Score (horizontal axis, 0 to 80), with separate symbols for Workbook A, Workbook B, Workbook C, Workbook D and Workbook E.]

Page 34: The Use of Dummy Variables

Some comments

1. The linear relationship between Y (Final Score) and X (Pretest Score) models the differing aptitudes for mathematics.

2. The shifting up and down of this linear relationship measures the effect of workbooks on the final score Y.

Page 35: The Use of Dummy Variables

The Model:

$$Y = \beta_0^{(i)} + \beta_1 X + \varepsilon \quad \text{for workbook } i \;(i = A, B, C, D, E)$$

Graphical Illustration of the One-way Analysis of Covariance Model

[Figure: parallel regression lines for workbooks A, B, C and D plotted as y against x, with a common slope but different intercepts.]

Page 36: The Use of Dummy Variables

The data as it would appear in a data file.

Pre    Final   Workbook
43     46.4    A
55.3   43.9    A
59.4   59.7    A
51.7   49.6    A
53     49.3    A
48.7   47.1    A
45.4   47.4    A
42.1   33.3    A
60     53.2    A
32.4   34.1    A
74.4   66.7    A
43.2   43.2    A
44.5   42.5    A
47.1   51.3    A
57     48.9    A
43.6   52.5    B
45.2   61.8    B
54.2   69.1    B
45.5   61.7    B
43.4   53.3    B
…
54.2   56.8    D
51.6   46.1    D
63.3   56      D
37.3   48.8    D
39.2   45.1    D
62.1   58      D
43.2   46      E
60.7   59.7    E
42.7   45.4    E
46.6   44.3    E
42.6   46.5    E
25.6   38.4    E
52.5   57.7    E
51.2   47.1    E
48.8   50.4    E
44.1   52.7    E
73.8   73.6    E
52.6   50.8    E
67.8   66.8    E
42.9   47.2    E
51.7   57      E

Page 37: The Use of Dummy Variables

The data as it would appear in a data file with the dummy variables (X1, X2, X3, X4) added.

Pre    Final   Workbook   X1  X2  X3  X4
43     46.4    A          1   0   0   0
55.3   43.9    A          1   0   0   0
59.4   59.7    A          1   0   0   0
51.7   49.6    A          1   0   0   0
53     49.3    A          1   0   0   0
48.7   47.1    A          1   0   0   0
45.4   47.4    A          1   0   0   0
42.1   33.3    A          1   0   0   0
60     53.2    A          1   0   0   0
32.4   34.1    A          1   0   0   0
74.4   66.7    A          1   0   0   0
43.2   43.2    A          1   0   0   0
44.5   42.5    A          1   0   0   0
47.1   51.3    A          1   0   0   0
57     48.9    A          1   0   0   0
43.6   52.5    B          0   1   0   0
45.2   61.8    B          0   1   0   0
…
37.3   48.8    D          0   0   0   1
39.2   45.1    D          0   0   0   1
62.1   58      D          0   0   0   1
43.2   46      E          0   0   0   0
60.7   59.7    E          0   0   0   0
42.7   45.4    E          0   0   0   0
46.6   44.3    E          0   0   0   0
42.6   46.5    E          0   0   0   0
25.6   38.4    E          0   0   0   0
52.5   57.7    E          0   0   0   0
51.2   47.1    E          0   0   0   0
48.8   50.4    E          0   0   0   0
44.1   52.7    E          0   0   0   0
73.8   73.6    E          0   0   0   0
52.6   50.8    E          0   0   0   0
67.8   66.8    E          0   0   0   0
42.9   47.2    E          0   0   0   0
51.7   57      E          0   0   0   0

Page 38: The Use of Dummy Variables

Here is the data file in SPSS with the dummy variables (X1, X2, X3, X4) added. They can be added within SPSS.

Page 39: The Use of Dummy Variables

Fitting the complete model

The dependent variable is the final score, Y. The independent variables are the pre-score X and the four dummy variables X1, X2, X3, X4.

Page 40: The Use of Dummy Variables

The Output

Variables Entered/Removed(b)
Model   Variables Entered        Variables Removed   Method
1       X4, PRE, X3, X1, X2(a)   .                   Enter
a. All requested variables entered.
b. Dependent Variable: FINAL

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .908(a)   .825       .812                3.594
a. Predictors: (Constant), X4, PRE, X3, X1, X2

Page 41: The Use of Dummy Variables

The Output - continued

ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   4191.378         5    838.276       64.895   .000(a)
Residual     891.297          69   12.917
Total        5082.675         74
a. Predictors: (Constant), X4, PRE, X3, X1, X2
b. Dependent Variable: FINAL

Coefficients(a)
Model 1      Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)   16.954             2.441                            6.944    .000
PRE          .709               .045         .809                15.626   .000
X1           -4.958             1.313        -.241               -3.777   .000
X2           8.553              1.318        .416                6.489    .000
X3           5.231              1.317        .254                3.972    .000
X4           -1.602             1.320        -.078               -1.214   .229
a. Dependent Variable: FINAL

Page 42: The Use of Dummy Variables

The interpretation of the coefficients

Coefficients(a)
Model 1      Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)   16.954             2.441                            6.944    .000
PRE          .709               .045         .809                15.626   .000
X1           -4.958             1.313        -.241               -3.777   .000
X2           8.553              1.318        .416                6.489    .000
X3           5.231              1.317        .254                3.972    .000
X4           -1.602             1.320        -.078               -1.214   .229
a. Dependent Variable: FINAL

The common slope: the coefficient of PRE (.709).

Page 43: The Use of Dummy Variables

The interpretation of the coefficients

(The Coefficients table from the previous slide is repeated here.)

The intercept for workbook E: the constant term (16.954).

Page 44: The Use of Dummy Variables

The interpretation of the coefficients

(The Coefficients table from the previous slide is repeated here.)

The coefficients of X1, X2, X3 and X4: the changes in the intercept when we change from workbook E to the other workbooks.

Page 45: The Use of Dummy Variables

The Complete Model:

$$Y = \beta_0 + \delta_1 X_1 + \delta_2 X_2 + \delta_3 X_3 + \delta_4 X_4 + \beta_1 X + \varepsilon$$

The model can be written as follows:

1. When the workbook is E then X1 = 0, …, X4 = 0 and

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

2. When the workbook is A then X1 = 1 and X2 = X3 = X4 = 0, and

$$Y = (\beta_0 + \delta_1) + \beta_1 X + \varepsilon$$

hence $\delta_1$ is the change in the intercept when we change from workbook E to workbook A.
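Plugging the estimates from the Coefficients table into this form gives the fitted intercept for each workbook (plain Python arithmetic; the numbers are copied from the SPSS output above):

```python
b0 = 16.954                                                   # intercept for workbook E
shifts = {"A": -4.958, "B": 8.553, "C": 5.231, "D": -1.602}   # coefficients of X1..X4

intercepts = {"E": b0, **{w: round(b0 + d, 3) for w, d in shifts.items()}}
print(intercepts)   # {'E': 16.954, 'A': 11.996, 'B': 25.507, 'C': 22.185, 'D': 15.352}
# The common slope (the PRE coefficient, 0.709) is the same for every workbook.
```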

Page 46: The Use of Dummy Variables

Testing for the equality of the intercepts, i.e.

$$H_0: \delta_1 = \delta_2 = \delta_3 = \delta_4 = 0$$

The reduced model:

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

The only independent variable is X (the pre-score).

Page 47: The Use of Dummy Variables

Fitting the reduced model

The dependent variable is the final score, Y. The only independent variable is the pre-score, X.

Page 48: The Use of Dummy Variables

The Output for the reduced model

Variables Entered/Removed(b)
Model   Variables Entered   Variables Removed   Method
1       PRE(a)              .                   Enter
a. All requested variables entered.
b. Dependent Variable: FINAL

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .700(a)   .490       .483                5.956
a. Predictors: (Constant), PRE

Lower R² (.490, compared with .825 for the complete model).

Page 49: The Use of Dummy Variables

The Output - continued

ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   2492.779         1    2492.779      70.263   .000(a)
Residual     2589.896         73   35.478
Total        5082.675         74
a. Predictors: (Constant), PRE
b. Dependent Variable: FINAL

Increased R.S.S. (residual sum of squares) compared with the complete model.

Coefficients(a)
Model 1      Unstandardized B   Std. Error   Standardized Beta   t       Sig.
(Constant)   23.105             3.692                            6.259   .000
PRE          .614               .073         .700                8.382   .000
a. Dependent Variable: FINAL

Page 50: The Use of Dummy Variables

The F Test

$$F = \frac{\text{Reduction in R.S.S.}\,/\,q}{\text{MSE for the complete model}}$$
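This formula can be wrapped in a small helper and checked against the two ANOVA tables just shown (a sketch, assuming Python with SciPy; the function name is just for illustration):

```python
from scipy.stats import f

def partial_f_test(rss_reduced, rss_complete, q, df_complete):
    """F = (reduction in R.S.S. / q) / MSE of the complete model, with its p-value."""
    F = ((rss_reduced - rss_complete) / q) / (rss_complete / df_complete)
    return F, f.sf(F, q, df_complete)

# Workbook example: reduced model (PRE only) vs complete model (PRE + X1..X4)
print(partial_f_test(2589.896, 891.297, q=4, df_complete=69))
# approx. (32.87, 2.5e-15), matching the tables on the following slides
```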

Page 51: The Use of Dummy Variables

The Reduced model

ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   2492.779         1    2492.779      70.263   .000(a)
Residual     2589.896         73   35.478
Total        5082.675         74
a. Predictors: (Constant), PRE
b. Dependent Variable: FINAL

The Complete model

ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   4191.378         5    838.276       64.895   .000(a)
Residual     891.297          69   12.917
Total        5082.675         74
a. Predictors: (Constant), X4, PRE, X3, X1, X2
b. Dependent Variable: FINAL

Page 52: The Use of Dummy Variables

The F test

Reduced ANOVA
             Sum of Squares   df   Mean Square    F         Sig.
Regression   2492.77885       1    2492.77885     70.2626   4.56272E-13
Residual     2589.89635       73   35.47803219
Total        5082.6752        74

Complete ANOVA
             Sum of Squares   df   Mean Square    F          Sig.
Regression   4191.377971      5    838.2755942    64.89532   9.99448E-25
Residual     891.297229       69   12.91735115
Total        5082.6752        74

                   Sum of Squares   df   Mean Square    F          Sig.
Slope              2492.77885       1    2492.77885     192.9791   1.13567E-21
Equality of int.   1698.599121      4    424.6497803    32.87437   2.46006E-15
Residual           891.297229       69   12.91735115
Total              5082.6752        74

The 'Equality of int.' row is the test for the equality of the intercepts.

Page 53: The Use of Dummy Variables

Testing for zero slope, i.e.

$$H_0: \beta_1 = 0$$

The reduced model:

$$Y = \beta_0 + \delta_1 X_1 + \delta_2 X_2 + \delta_3 X_3 + \delta_4 X_4 + \varepsilon$$

The independent variables are X1, X2, X3, X4 (the dummies).

Page 54: The Use of Dummy Variables

The Complete model

ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   4191.378         5    838.276       64.895   .000(a)
Residual     891.297          69   12.917
Total        5082.675         74
a. Predictors: (Constant), X4, PRE, X3, X1, X2
b. Dependent Variable: FINAL

The Reduced model

ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F       Sig.
Regression   1037.475         4    259.369       4.488   .003(a)
Residual     4045.200         70   57.789
Total        5082.675         74
a. Predictors: (Constant), X4, X3, X2, X1
b. Dependent Variable: FINAL

Page 55: The Use of Dummy Variables

The F test

Reduced
             Sum of Squares   df   Mean Square    F          Sig.
Regression   1037.4752        4    259.3688       4.488237   0.002757501
Residual     4045.2           70   57.78857143
Total        5082.6752        74

Complete
             Sum of Squares   df   Mean Square    F          Sig.
Regression   4191.377971      5    838.2755942    64.89532   9.99448E-25
Residual     891.297229       69   12.91735115
Total        5082.6752        74

Zero slope
             Sum of Squares   df   Mean Square    F          Sig.
Regression   1037.4752        4    259.3688       20.0791    5.30755E-11
Zero slope   3153.902771      1    3153.902771    244.1602   2.3422E-24
Residual     891.297229       69   12.91735115
Total        5082.6752        74

Page 56: The Use of Dummy Variables

The Analysis of Covariance

• This analysis can also be performed by using a package that can perform Analysis of Covariance (ANACOVA)

• The package sets up the dummy variables automatically

Page 57: The Use of Dummy Variables

Here is the data file in SPSS. The dummy variables are no longer needed.

Page 58: The Use of Dummy Variables

In SPSS, to perform ANACOVA you select from the menu: Analyze -> General Linear Model -> Univariate.

Page 59: The Use of Dummy Variables

This dialog box will appear

Page 60: The Use of Dummy Variables

You now select:
1. The dependent variable Y (Final Score)
2. The fixed factor (the categorical independent variable – workbook)
3. The covariate (the continuous independent variable – pretest score)

Page 61: The Use of Dummy Variables

The output: The ANOVA TABLE

Tests of Between-Subjects Effects
Dependent Variable: FINAL
Source            Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model   4191.378(a)               5    838.276       64.895    .000
Intercept         837.590                   1    837.590       64.842    .000
PRE               3153.903                  1    3153.903      244.160   .000
WORKBOOK          1698.599                  4    424.650       32.874    .000
Error             891.297                   69   12.917
Total             219815.6                  75
Corrected Total   5082.675                  74
a. R Squared = .825 (Adjusted R Squared = .812)

Compare this with the previously computed table:

                   Sum of Squares   df   Mean Square    F          Sig.
Slope              2492.77885       1    2492.77885     192.9791   1.13567E-21
Equality of int.   1698.599121      4    424.6497803    32.87437   2.46006E-15
Residual           891.297229       69   12.91735115
Total              5082.6752        74

Page 62: The Use of Dummy Variables

The output: The ANOVA TABLE

(The Tests of Between-Subjects Effects table from the previous slide is repeated here.)

The PRE sum of squares (3153.903) is the sum of squares in the numerator when we test whether the slope is zero (while allowing the intercepts to be different).

Page 63: The Use of Dummy Variables

Another application of the use of dummy variables

• The dependent variable, Y, is linearly related to X, but the slope changes at one or several known values of X (nodes).

[Figure: Y plotted against X as a piecewise-linear curve whose slope changes at the nodes.]

Page 64: The Use of Dummy Variables

The model

$$Y = \begin{cases} \beta_0 + \beta_1 X + \varepsilon & X \le x_1 \\ \beta_0 + (\beta_1 - \beta_2)x_1 + \beta_2 X + \varepsilon & x_1 \le X \le x_2 \\ \beta_0 + (\beta_1 - \beta_2)x_1 + (\beta_2 - \beta_3)x_2 + \beta_3 X + \varepsilon & x_2 \le X \le x_3 \\ \quad\vdots \end{cases}$$

or

$$Y = \begin{cases} \beta_0 + \beta_1 X + \varepsilon & X \le x_1 \\ \beta_0 + \beta_1 x_1 + \beta_2 (X - x_1) + \varepsilon & x_1 \le X \le x_2 \\ \beta_0 + \beta_1 x_1 + \beta_2 (x_2 - x_1) + \beta_3 (X - x_2) + \varepsilon & x_2 \le X \le x_3 \\ \quad\vdots \end{cases}$$

[Figure: a piecewise-linear curve of Y against X with nodes at x₁, x₂, ..., xₖ and a different slope (β₁, β₂, ...) on each segment.]

Page 65: The Use of Dummy Variables

Now define

$$X_1 = \begin{cases} X & \text{if } X \le x_1 \\ x_1 & \text{if } x_1 \le X \end{cases}$$

$$X_2 = \begin{cases} 0 & \text{if } X \le x_1 \\ X - x_1 & \text{if } x_1 \le X \le x_2 \\ x_2 - x_1 & \text{if } x_2 \le X \end{cases}$$

$$X_3 = \begin{cases} 0 & \text{if } X \le x_2 \\ X - x_2 & \text{if } x_2 \le X \le x_3 \\ x_3 - x_2 & \text{if } x_3 \le X \end{cases}$$

Etc.

Page 66: The Use of Dummy Variables

Then the model

$$Y = \begin{cases} \beta_0 + \beta_1 X + \varepsilon & X \le x_1 \\ \beta_0 + \beta_1 x_1 + \beta_2 (X - x_1) + \varepsilon & x_1 \le X \le x_2 \\ \beta_0 + \beta_1 x_1 + \beta_2 (x_2 - x_1) + \beta_3 (X - x_2) + \varepsilon & x_2 \le X \le x_3 \end{cases}$$

can be written

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$$

Page 67: The Use of Dummy Variables

An Example: In this example we are measuring Y at time X.

Y is growing linearly with time.

At time X = 10, an additive is added to the process which may change the rate of growth.

The data

X    0.0   1.0   2.0   3.0   4.0   5.0   6.0
Y    3.9   5.9   6.4   6.3   7.5   7.9   8.5
X    7.0   8.0   9.0   10.0  11.0  12.0  13.0
Y    10.7  10.0  12.4  11.0  11.5  13.9  17.6
X    14.0  15.0  16.0  17.0  18.0  19.0  20.0
Y    18.2  16.8  21.8  23.1  22.9  26.2  27.7

Page 68: The Use of Dummy Variables

Graph

[Figure: scatter plot of Y (0 to 30) against X (0 to 20) for the data above.]

Page 69: The Use of Dummy Variables

Now define the dummy variables

$$X_1 = \begin{cases} X & \text{if } X \le 10 \\ 10 & \text{if } X \ge 10 \end{cases} \qquad X_2 = \begin{cases} 0 & \text{if } X \le 10 \\ X - 10 & \text{if } X \ge 10 \end{cases}$$
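A minimal sketch of this construction and the resulting least-squares fit (assuming NumPy; the data are those listed two slides back):

```python
import numpy as np

X = np.arange(0.0, 21.0)    # X = 0, 1, ..., 20
Y = np.array([3.9, 5.9, 6.4, 6.3, 7.5, 7.9, 8.5, 10.7, 10.0, 12.4, 11.0,
              11.5, 13.9, 17.6, 18.2, 16.8, 21.8, 23.1, 22.9, 26.2, 27.7])

# The two segment variables: X1 = min(X, 10) and X2 = max(X - 10, 0)
X1 = np.minimum(X, 10.0)
X2 = np.maximum(X - 10.0, 0.0)

design = np.column_stack([np.ones(len(Y)), X1, X2])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
print(coef)   # approximately [4.71, 0.67, 1.58], in line with the SPSS output below
```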

Page 70: The Use of Dummy Variables

The data as it appears in SPSS – x1, x2 are the dummy variables

Page 71: The Use of Dummy Variables

We now regress y on x1 and x2.

Page 72: The Use of Dummy Variables

The Output

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .990(a)   .980       .978                1.0626
a. Predictors: (Constant), X2, X1

ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   1015.909         2    507.954       449.875   .000(a)
Residual     20.324           18   1.129
Total        1036.232         20
a. Predictors: (Constant), X2, X1
b. Dependent Variable: Y

Coefficients(a)
Model 1      Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)   4.714              .577                             8.175    .000
X1           .673               .085         .325                7.886    .000
X2           1.579              .085         .761                18.485   .000
a. Dependent Variable: Y

Page 73: The Use of Dummy Variables

Graph

[Figure: Y plotted against X (0 to 20) with the fitted two-segment line (slope change at X = 10).]

Page 74: The Use of Dummy Variables

Testing for no change in slope

Here we want to test

$$H_0: \beta_1 = \beta_2 \quad \text{vs} \quad H_A: \beta_1 \ne \beta_2$$

The reduced model is

$$Y = \beta_0 + \beta_1 (X_1 + X_2) + \varepsilon = \beta_0 + \beta_1 X + \varepsilon$$

Page 75: The Use of Dummy Variables

Fitting the reduced model

We now regress y on x.

Page 76: The Use of Dummy Variables

The Output

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .971(a)   .942       .939                1.7772
a. Predictors: (Constant), X

ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   976.219          1    976.219       309.070   .000(a)
Residual     60.013           19   3.159
Total        1036.232         20
a. Predictors: (Constant), X
b. Dependent Variable: Y

Coefficients(a)
Model 1      Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)   2.559              .749                             3.418    .003
X            1.126              .064         .971                17.580   .000
a. Dependent Variable: Y

Page 77: The Use of Dummy Variables

Graph – fitting a common slope

[Figure: Y plotted against X (0 to 20) with the single common-slope fitted line.]

Page 78: The Use of Dummy Variables

The test for the equality of the slopes

Reduced Model
             Sum of Squares   df   Mean Square    F          Sig.
Regression   976.2194805      1    976.2194805    309.0697   3.27405E-13
Residual     60.01290043      19   3.158573707
Total        1036.232381      20

Complete Model
             Sum of Squares   df   Mean Square    F          Sig.
Regression   1015.908579      2    507.9542895    449.8753   0
Residual     20.32380204      18   1.129100113
Total        1036.232381      20

Equality of slope
                    Sum of Squares   df   Mean Square   F          Sig.
Slope               976.2194805      1    976.2194805   864.5996   1.14256E-16
Equality of slope   39.6890984       1    39.6890984    35.15109   1.30425E-05
Residual            20.32380204      18   1.129100113
Total               1036.232381      20
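The 'Equality of slope' row is again the partial F computed from the two residual sums of squares above; a quick check (SciPy assumed for the p-value):

```python
from scipy.stats import f

rss_reduced, rss_complete = 60.01290043, 20.32380204   # residual SS of the two fits
q, df_resid = 1, 18
F = ((rss_reduced - rss_complete) / q) / (rss_complete / df_resid)
print(F, f.sf(F, q, df_resid))   # approx. 35.15 and 1.3e-05, matching the table
```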