Chapter 8 Indicator Variable

Ray-Bing Chen

Institute of Statistics

National University of Kaohsiung

8.1 The General Concept of Indicator Variables

• The variables in regression analysis:

– Quantitative variables: have a well-defined scale of measurement. For example: temperature, distance, income, …

– Qualitative (categorical) variables: usually have no natural scale of measurement. For example: operators, employment status (employed or unemployed), shifts (day, evening, or night), and sex (male or female).

• Assign a set of levels to a qualitative variable to account for the effect that the variable may have on the response. A variable used this way is called an indicator variable or dummy variable.

• For example: the effective life of a cutting tool (y) vs. the lathe speed (x1) and the type of cutting tool (x2).
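
As a minimal sketch of how a two-level factor enters a regression model, the code below assumes the common coding x2 = 0 for tool type A and x2 = 1 for tool type B in a model of the form y = b0 + b1*x1 + b2*x2. The function names and the coefficient values are hypothetical and chosen only to show the geometry; they are not estimates from the tool life data.

```python
# Sketch only: the coding (0 = type A, 1 = type B) and all numbers are assumptions.

def tool_indicator(tool_type):
    """Return the indicator x2: 0 for tool type 'A', 1 for tool type 'B'."""
    codes = {"A": 0, "B": 1}
    return codes[tool_type]


def design_row(speed, tool_type):
    """Row [1, x1, x2] of the design matrix for y = b0 + b1*x1 + b2*x2."""
    return [1.0, float(speed), float(tool_indicator(tool_type))]


def predict(row, coef):
    """Fitted response for one design row."""
    return sum(r * c for r, c in zip(row, coef))


# Hypothetical coefficients (b0, b1, b2), not fitted to any real data.
coef = [40.0, -0.03, 15.0]

life_a = predict(design_row(600, "A"), coef)
life_b = predict(design_row(600, "B"), coef)

# At any fixed speed the two tool types differ by exactly b2,
# so the model describes two parallel lines.
print(life_a, life_b, life_b - life_a)
```

Under this coding, b2 is the vertical shift between the two lines, which is why a single indicator is enough for a two-level factor.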

Example 8.1 Tool Life Data

• The scatter diagram is in Figure 8.2.

• Two different regression lines.

• Two separate straight-line models vs. a single model with an indicator variable:

– The single-model approach is preferred because it gives a simpler practical result.

– Because the same slope is assumed, it makes sense to combine the data from both tool types to produce a single estimate of this common parameter.

– The single model gives one estimate of the common error variance $\sigma^2$ and more residual degrees of freedom.
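
The degrees-of-freedom argument can be made concrete with a short count. The sketch below assumes $n_A$ and $n_B$ observations on the two tool types (symbols introduced here for illustration) and a common-slope model with three parameters:

```latex
\begin{aligned}
\text{separate fits:}\quad & \mathrm{df}_A = n_A - 2, \qquad \mathrm{df}_B = n_B - 2 \\
\text{single model:}\quad  & \mathrm{df}_E = (n_A + n_B) - 3, \qquad
  \hat{\sigma}^2 = \frac{SS_{\mathrm{Res}}}{n_A + n_B - 3}
\end{aligned}
```

Each separate fit estimates the error variance from its own subset only, while the single model pools all $n_A + n_B$ observations into one estimate.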

• Models that differ in both intercept and slope:
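
A standard way to write such a model, assuming $x_2$ is a 0/1 indicator for tool type, is to add the product term $x_1 x_2$. The coefficient names below follow the usual convention and are an assumption, not a transcription of the slide:

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon
\quad\Longrightarrow\quad
\begin{cases}
x_2 = 0: & y = \beta_0 + \beta_1 x_1 + \varepsilon \\
x_2 = 1: & y = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\, x_1 + \varepsilon
\end{cases}
```

Here $\beta_2$ is the change in intercept and $\beta_3$ the change in slope between the two tool types.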

Example 8.2 The Tool Life Data:

Example 8.3 An Indicator Variable with More Than Two Levels

• Total electricity consumption (y) vs. the size of the house (x1) and the type of air conditioning system.

• Four types of air conditioning systems:

• $\beta_3 - \beta_4$: the relative efficiency of a heat pump compared to central air conditioning.

• Assume the error variance does not depend on the type of air conditioning system.
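
A qualitative factor with a levels is usually represented by a − 1 indicator variables, with one level serving as the baseline. The sketch below illustrates that coding for a four-level factor. The function name is hypothetical, and the level names "none" and "window" are assumptions made for illustration (only the heat pump and central air conditioning are named in the text above).

```python
def indicator_columns(level, levels, baseline):
    """Code one observation of an a-level factor as a-1 indicator values.

    The baseline level is coded as all zeros; every other level gets a
    single 1 in its own column.
    """
    if level not in levels:
        raise ValueError(f"unknown level: {level!r}")
    return [1 if level == other else 0 for other in levels if other != baseline]


# Hypothetical level names for a four-level factor.
LEVELS = ["none", "window", "heat_pump", "central"]

for level in LEVELS:
    print(level, indicator_columns(level, LEVELS, baseline="none"))
```

With this coding each coefficient compares one level with the baseline, and a difference of two coefficients compares two non-baseline levels with each other.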

Example 8.4 More Than One Indicator Variable

• In Example 8.1, add the type of cutting oil used as a second qualitative factor.

8.2 Comments on the Use of Indicator Variables

8.2.1 Indicator Variables versus Regression on Allocated Codes

• Another approach to representing the levels of a qualitative variable is to use an allocated code.

• In Example 8.3,

• The allocated codes impose a particular metric on the levels of the qualitative factor.

• Indicator variables are more informative because they do not force any particular metric on the levels of the qualitative factor.

• Searle and Udell (1970): regression using indicator variables always leads to a larger $R^2$ than does regression on allocated codes.
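
The comparison can be checked numerically in the simplest setting, where the qualitative factor is the only regressor. In that case the indicator-variable fit reproduces the level means, while the allocated-code fit is a straight line in the code. The sketch below uses made-up responses for a three-level factor coded 1, 2, 3; the function names are hypothetical.

```python
from statistics import mean


def r2_allocated_code(codes, y):
    """R^2 of the straight-line fit of y on a single allocated code."""
    cbar, ybar = mean(codes), mean(y)
    sxy = sum((c - cbar) * (v - ybar) for c, v in zip(codes, y))
    sxx = sum((c - cbar) ** 2 for c in codes)
    syy = sum((v - ybar) ** 2 for v in y)
    return sxy ** 2 / (sxx * syy)


def r2_indicators(codes, y):
    """R^2 of the indicator-variable fit, whose fitted values are level means."""
    ybar = mean(y)
    groups = {}
    for c, v in zip(codes, y):
        groups.setdefault(c, []).append(v)
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    syy = sum((v - ybar) ** 2 for v in y)
    return 1 - ss_within / syy


# Made-up responses: the middle level has the highest mean, so the
# equally spaced metric 1, 2, 3 fits poorly.
codes = [1, 1, 2, 2, 3, 3]
y = [10.0, 12.0, 20.0, 22.0, 15.0, 17.0]

print(r2_allocated_code(codes, y), r2_indicators(codes, y))
```

The allocated-code model is a restricted version of the indicator model (the level means are forced onto a line in the code), so its $R^2$ cannot exceed the indicator model's.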

8.2.2 Indicator Variables as a Substitute for a Quantitative Regressor

• A quantitative regressor can also be represented by indicator variables.

• In Example 8.3, for income factor:

• Use four indicator variables to represent the factor “income”.

• Disadvantages:

– More parameters are required to represent the information content of the quantitative factor (a − 1 vs. 1), which increases the complexity of the model.

– The degrees of freedom for error are reduced.

• Advantage: it does not require the analyst to make any prior assumptions about the functional form of the relationship between the response and the regressor variable.
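
The trade-off is easier to see with the coding written out. The sketch below groups a quantitative income value into five classes and returns four indicators, so the factor costs four parameters instead of the single slope a linear term would need. The cut points and function names are hypothetical and serve only as an illustration.

```python
from bisect import bisect_right

# Hypothetical income cut points, giving a = 5 classes.
CUTS = [20000, 40000, 60000, 80000]


def income_class(income):
    """Index 0..4 of the income class; a value equal to a cut point
    falls in the higher class."""
    return bisect_right(CUTS, income)


def income_indicators(income):
    """Four indicators for five classes; the lowest class is the baseline."""
    k = income_class(income)
    return [1 if k == j else 0 for j in range(1, len(CUTS) + 1)]


for income in (15000, 45000, 95000):
    print(income, income_indicators(income))
```

Because each class gets its own coefficient, the fitted effect of income is a step function, which is what frees the analyst from choosing a functional form.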

8.3 Regression Approach to Analysis of Variance

• The analysis of variance (ANOVA) is a technique frequently used to analyze data from planned or designed experiments.

• Any ANOVA problem can be treated as a linear regression problem.

• Ordinarily we do not recommend that regression methods be used for ANOVA because the specialized computing techniques are usually quite efficient.

• However, there are some ANOVA situations, particularly those involving unbalanced designs, where the regression approach is helpful.

• Essentially, any ANOVA problem can be treated as a regression problem in which all of the regressors are indicator variables.

• Define the treatment effects in the balanced case (an equal number of observations per treatment) so that $\tau_1 + \tau_2 + \dots + \tau_k = 0$.

• $\mu_i = \mu + \tau_i$ is the mean of the ith treatment.

• Test $H_0: \tau_1 = \tau_2 = \dots = \tau_k = 0$ vs. $H_1: \tau_i \neq 0$ for at least one i.

Example: 3 treatments

• Model: $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, i = 1, 2, 3, j = 1, 2, …, n
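
The three-treatment model can be rewritten as a regression on two indicator variables. The sketch below assumes the coding x1 = 1 for treatment 1 and x2 = 1 for treatment 2, with treatment 3 as the baseline, and uses made-up balanced data. Under that coding the least-squares coefficients are the baseline mean and the differences of the other two means from it; the code verifies this through the normal equations and then forms the F statistic for the hypothesis of no treatment effect.

```python
from statistics import mean

# Made-up balanced data: k = 3 treatments, n = 4 observations each.
data = {
    1: [12.0, 14.0, 11.0, 13.0],
    2: [18.0, 17.0, 19.0, 20.0],
    3: [15.0, 16.0, 14.0, 15.0],
}

# Design rows [1, x1, x2]; treatment 3 is the baseline (assumed coding).
X, y = [], []
for treatment, obs in data.items():
    for value in obs:
        X.append([1.0, float(treatment == 1), float(treatment == 2)])
        y.append(value)

means = {t: mean(obs) for t, obs in data.items()}
# Least-squares coefficients under this coding.
b = [means[3], means[1] - means[3], means[2] - means[3]]

fitted = [sum(xj * bj for xj, bj in zip(row, b)) for row in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Normal equations X'(y - Xb) = 0 confirm that b is the least-squares solution.
normal_eq = [sum(row[j] * e for row, e in zip(X, residuals)) for j in range(3)]

# F statistic for H0: tau_1 = tau_2 = tau_3 = 0, from the regression sums of squares.
k, n = len(data), len(data[1])
grand = mean(y)
ss_regression = sum((f - grand) ** 2 for f in fitted)
ss_error = sum(e ** 2 for e in residuals)
f_stat = (ss_regression / (k - 1)) / (ss_error / (k * (n - 1)))
print(b, f_stat)
```

The regression sum of squares here equals the ANOVA treatment sum of squares, so the regression F test and the one-way ANOVA F test coincide.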
