Discrete Multivariate Analysis

Analysis of Multivariate Categorical Data


Example 1

Data Set #1: a two-way frequency table

Serum          Systolic blood pressure
cholesterol    <127    127-146   147-166   167+    Total
<200           117     121       47        22      307
200-219        85      98        43        20      246
220-259        119     209       68        43      439
260+           67      99        46        33      245
Total          388     527       204       118     1237

In this study we examine n = 1237 individuals, measuring X, systolic blood pressure, and Y, serum cholesterol.

Example 2

The following data were taken from a study of parole success involving 5587 parolees in Ohio between 1965 and 1972 (a ten percent sample of all parolees during this period).

The study involved a dichotomous response Y:
– Success (no major parole violation), or
– Failure (returned to prison, either as a technical violator or with a new conviction),
based on a one-year follow-up. The predictors of parole success included are:

1. Type of committed offense (person offense or other offense),
2. Age (25 or older or under 25),
3. Prior record (no prior sentence or prior sentence), and
4. Drug or alcohol dependency (no drug or alcohol dependency or drug and/or alcohol dependency).


• The data were randomly split into two parts. The counts for each part are displayed in the table, with those for the second part in parentheses.

• The second part of the data was set aside for a validation study of the model to be fitted in the first part.

Table (counts for the first part; counts for the second part in parentheses)

            No drug or alcohol dependency   Drug and/or alcohol dependency
            25 or older     Under 25        25 or older     Under 25
            Person  Other   Person  Other   Person  Other   Person  Other
No prior sentence of any kind
  Success   48      34      37      49      48      28      35      57
            (44)    (34)    (29)    (58)    (47)    (38)    (37)    (53)
  Failure   1       5       7       11      3       8       5       18
            (1)     (7)     (7)     (5)     (1)     (2)     (4)     (24)
Prior sentence
  Success   117     259     131     319     197     435     107     291
            (111)   (253)   (131)   (320)   (202)   (392)   (103)   (294)
  Failure   23      61      20      89      38      194     27      101
            (27)    (55)    (25)    (93)    (46)    (215)   (34)    (102)

(Person = person offense; Other = other offense.)

Multiway Frequency Tables

[Figure: schematic layouts of a two-way (A × B), three-way (A × B × C), and four-way (A × B × C × D) frequency table.]

Analysis of a Two-way Frequency Table:

Frequency Distribution (Serum Cholesterol and Systolic Blood Pressure)

Serum          Systolic blood pressure
cholesterol    <127    127-146   147-166   167+    Total
<200           117     121       47        22      307
200-219        85      98        43        20      246
220-259        119     209       68        43      439
260+           67      99        46        33      245
Total          388     527       204       118     1237

Joint and Marginal Distributions (Serum Cholesterol and Systolic Blood Pressure), in percent

Serum          Systolic blood pressure                 Marginal distn
cholesterol    <127    127-146   147-166   167+    (serum chol.)
<200           9.46    9.78      3.80      1.78    24.82
200-219        6.87    7.92      3.48      1.62    19.89
220-259        9.62    16.90     5.50      3.48    35.49
260+           5.42    8.00      3.72      2.67    19.81
Marginal
distn (BP)     31.37   42.60     16.49     9.54    100.00

The marginal distributions allow you to look at the effect of one variable, ignoring the other. The joint distribution allows you to look at the two variables simultaneously.
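The percentages in the joint and marginal table can be recomputed directly from the counts. The following is a minimal Python sketch (standard library only); `COUNTS` and `joint_and_marginals` are illustrative names, not part of the original slides.

```python
# Serum cholesterol (rows) x systolic blood pressure (columns), Data Set #1.
COUNTS = [
    [117, 121, 47, 22],   # <200
    [85, 98, 43, 20],     # 200-219
    [119, 209, 68, 43],   # 220-259
    [67, 99, 46, 33],     # 260+
]


def joint_and_marginals(x):
    """Joint, row-marginal (R_i / N) and column-marginal (C_j / N) distributions, in percent."""
    n = sum(sum(row) for row in x)
    joint = [[100 * xij / n for xij in row] for row in x]
    row_marg = [100 * sum(row) / n for row in x]
    col_marg = [100 * sum(col) / n for col in zip(*x)]
    return joint, row_marg, col_marg


joint, row_marg, col_marg = joint_and_marginals(COUNTS)
for row, rm in zip(joint, row_marg):
    print(" ".join(f"{p:6.2f}" for p in row), f"| {rm:6.2f}")
print(" ".join(f"{p:6.2f}" for p in col_marg))
```

Every entry is a count divided by the same N = 1237; only the grouping of counts differs between the joint and the marginal distributions.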

Conditional Distributions (Systolic Blood Pressure given Serum Cholesterol), in percent

The conditional distribution allows you to look at the effect of one variable when the other variable is held fixed or known.

Serum          Systolic blood pressure
cholesterol    <127    127-146   147-166   167+    Total
<200           38.11   39.41     15.31     7.17    100.00
200-219        34.55   39.84     17.48     8.13    100.00
220-259        27.11   47.61     15.49     9.79    100.00
260+           27.35   40.41     18.78     13.47   100.00
Marginal
distn (BP)     31.37   42.60     16.49     9.54    100.00

Conditional Distributions (Serum Cholesterol given Systolic Blood Pressure), in percent

Serum          Systolic blood pressure                 Marginal distn
cholesterol    <127    127-146   147-166   167+    (serum chol.)
<200           30.15   22.96     23.04     18.64   24.82
200-219        21.91   18.60     21.08     16.95   19.89
220-259        30.67   39.66     33.33     36.44   35.49
260+           17.27   18.79     22.55     27.97   19.81
Total          100.00  100.00    100.00    100.00  100.00
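Both conditional tables divide the same counts by different totals: by row totals for blood pressure given cholesterol, and by column totals for cholesterol given blood pressure. A minimal Python sketch, again with illustrative names:

```python
# Serum cholesterol (rows) x systolic blood pressure (columns), Data Set #1.
COUNTS = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]


def conditional_on_rows(x):
    """Percent distribution of the column variable within each row (rows sum to 100)."""
    return [[100 * xij / sum(row) for xij in row] for row in x]


def conditional_on_columns(x):
    """Percent distribution of the row variable within each column (columns sum to 100)."""
    col_tot = [sum(col) for col in zip(*x)]
    return [[100 * xij / col_tot[j] for j, xij in enumerate(row)] for row in x]


bp_given_chol = conditional_on_rows(COUNTS)
chol_given_bp = conditional_on_columns(COUNTS)
for row in bp_given_chol:
    print(" ".join(f"{p:6.2f}" for p in row))
```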

GRAPH: Conditional distributions of Systolic Blood Pressure given Serum Cholesterol

[Figure: percentage of individuals in each systolic blood pressure category (<127, 127-146, 147-166, 167+) within each serum cholesterol group (<200, 200-219, 220-259, 260+), together with the marginal distribution; vertical axis in percent (10% to 50%).]

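Notation:

Let $x_{ij}$ denote the frequency (no. of cases) where X (row variable) is i and Y (column variable) is j. The row totals, column totals and overall total are

$$R_i = x_{i\cdot} = \sum_{j=1}^{c} x_{ij}, \qquad C_j = x_{\cdot j} = \sum_{i=1}^{r} x_{ij},$$

$$N = x_{\cdot\cdot} = \sum_{i=1}^{r}\sum_{j=1}^{c} x_{ij} = \sum_{i=1}^{r} x_{i\cdot} = \sum_{j=1}^{c} x_{\cdot j}.$$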

Different Models

Let $\pi_{ij} = P(X = i, Y = j)$.

The Multinomial Model: Here the total number of cases N is fixed and the $x_{ij}$ follow a multinomial distribution with parameters $\pi_{ij}$:

$$f(x_{11}, \ldots, x_{rc}) = \frac{N!}{x_{11}!\,x_{12}!\cdots x_{rc}!}\,\pi_{11}^{x_{11}}\pi_{12}^{x_{12}}\cdots\pi_{rc}^{x_{rc}}, \qquad E(x_{ij}) = N\pi_{ij}.$$

The Product Multinomial Model: Here the row (or column) totals $R_i$ are fixed and, for a given row i, the $x_{ij}$ follow a multinomial distribution with parameters $\pi_{j|i}$:

$$f(x_{11}, \ldots, x_{rc}) = \prod_{i=1}^{r}\frac{R_i!}{x_{i1}!\,x_{i2}!\cdots x_{ic}!}\,\pi_{1|i}^{x_{i1}}\pi_{2|i}^{x_{i2}}\cdots\pi_{c|i}^{x_{ic}}, \qquad E(x_{ij}) = R_i\,\pi_{j|i}.$$

The Poisson Model: In this case we observe over a fixed period of time, and all counts in the table (including the row, column and overall totals) follow a Poisson distribution. Let $\lambda_{ij}$ denote the mean of $x_{ij}$, so that $E(x_{ij}) = \lambda_{ij}$ and

$$f(x_{ij}) = \frac{\lambda_{ij}^{x_{ij}}}{x_{ij}!}\,e^{-\lambda_{ij}}, \qquad f(x_{11}, \ldots, x_{rc}) = \prod_{i=1}^{r}\prod_{j=1}^{c}\frac{\lambda_{ij}^{x_{ij}}}{x_{ij}!}\,e^{-\lambda_{ij}}.$$
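The three sampling models differ only in which totals are fixed. The sketch below evaluates each log-probability in Python, using `math.lgamma(k + 1)` for ln k!; the function names are hypothetical, and cells are passed as flat lists for simplicity.

```python
import math


def log_multinomial(x, p):
    """ln f for one multinomial: ln N! - sum ln x! + sum x ln p (cells as a flat list)."""
    n = sum(x)
    out = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in x)
    # Skip empty cells so that a zero probability paired with a zero count is allowed.
    return out + sum(k * math.log(pk) for k, pk in zip(x, p) if k > 0)


def log_product_multinomial(rows, cond_p):
    """Product multinomial: independent multinomials, one per row with fixed total R_i."""
    return sum(log_multinomial(r, p) for r, p in zip(rows, cond_p))


def log_poisson(x, lam):
    """Independent Poisson cells: sum of x ln(lambda) - lambda - ln x!."""
    return sum(k * math.log(l) - l - math.lgamma(k + 1) for k, l in zip(x, lam))


# A tiny 2 x 2 table flattened to four cells, all cell probabilities 1/4.
print(math.exp(log_multinomial([1, 1, 2, 0], [0.25] * 4)))
```

Working on the log scale avoids overflow in N! for realistic table sizes such as N = 1237.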


Independence

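Multinomial Model

$$\pi_{ij} = P(X = i, Y = j) = P(X = i)\,P(Y = j) = \pi_{i\cdot}\,\pi_{\cdot j} \quad \text{if independent,}$$

and

$$\mu_{ij} = N\pi_{ij} = N\pi_{i\cdot}\pi_{\cdot j}.$$

The estimated expected frequency in cell (i,j) in the case of independence is:

$$\hat m_{ij} = \hat\mu_{ij} = N\hat\pi_{i\cdot}\hat\pi_{\cdot j} = N\,\frac{x_{i\cdot}}{N}\,\frac{x_{\cdot j}}{N} = \frac{x_{i\cdot}\,x_{\cdot j}}{N} = \frac{R_i\,C_j}{N}.$$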

The same can be shown for the other two models, the Product Multinomial model and the Poisson model; namely, the estimated expected frequency in cell (i,j) in the case of independence is:

$$\hat m_{ij} = \hat\mu_{ij} = \frac{R_i\,C_j}{N} = \frac{x_{i\cdot}\,x_{\cdot j}}{x_{\cdot\cdot}}.$$

Standardized residuals are defined for each cell:

$$r_{ij} = \frac{x_{ij} - \hat m_{ij}}{\sqrt{\hat m_{ij}}}.$$
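A minimal Python sketch of $\hat m_{ij} = R_iC_j/N$ and the standardized residuals $r_{ij}$, applied to the serum cholesterol and blood pressure counts (illustrative names; standard library only):

```python
import math

# Serum cholesterol (rows) x systolic blood pressure (columns), Data Set #1.
COUNTS = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]


def expected_independence(x):
    """m_ij = R_i * C_j / N, the estimated expected frequency under independence."""
    row_tot = [sum(row) for row in x]
    col_tot = [sum(col) for col in zip(*x)]
    n = sum(row_tot)
    return [[ri * cj / n for cj in col_tot] for ri in row_tot]


def standardized_residuals(x, m):
    """r_ij = (x_ij - m_ij) / sqrt(m_ij)."""
    return [[(xij - mij) / math.sqrt(mij) for xij, mij in zip(xr, mr)]
            for xr, mr in zip(x, m)]


m = expected_independence(COUNTS)
res = standardized_residuals(COUNTS, m)
for mr, rr in zip(m, res):
    print(" ".join(f"{v:7.2f}" for v in mr), "|", " ".join(f"{v:5.2f}" for v in rr))
```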

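The Chi-Square Statistic

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} r_{ij}^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(x_{ij} - \hat m_{ij})^2}{\hat m_{ij}}$$

The Chi-Square test for independence

Reject H0: independence if

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(x_{ij} - \hat m_{ij})^2}{\hat m_{ij}} > \chi^2_{\alpha}, \qquad df = (r-1)(c-1).$$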

Table: Expected frequencies, observed frequencies (in parentheses), and standardized residuals

Serum          Systolic blood pressure
cholesterol    <127      127-146   147-166   167+      Total
<200           96.29     130.79    50.63     29.29     307
               (117)     (121)     (47)      (22)
               2.11      -0.86     -0.51     -1.35
200-219        77.16     104.80    40.57     23.47     246
               (85)      (98)      (43)      (20)
               0.89      -0.66     0.38      -0.72
220-259        137.70    187.03    72.40     41.88     439
               (119)     (209)     (68)      (43)
               -1.59     1.61      -0.52     0.17
260+           76.85     104.38    40.40     23.37     245
               (67)      (99)      (46)      (33)
               -1.12     -0.53     0.88      1.99
Total          388       527       204       118       1237

χ² = 20.85 (p = 0.0133)
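The chi-square statistic and its degrees of freedom in the table above can be reproduced with a few lines of Python. This sketch hard-codes the 5% critical value χ²₀.₀₅ = 16.919 for 9 d.f. from standard chi-square tables instead of computing it; the names are illustrative.

```python
# Serum cholesterol (rows) x systolic blood pressure (columns), Data Set #1.
COUNTS = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]

CRITICAL_05_DF9 = 16.919  # upper 5% point of chi-square with 9 d.f., from standard tables


def chi_square_independence(x):
    """Pearson chi-square for independence and its degrees of freedom (r-1)(c-1)."""
    row_tot = [sum(row) for row in x]
    col_tot = [sum(col) for col in zip(*x)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(x):
        for j, xij in enumerate(row):
            m = row_tot[i] * col_tot[j] / n  # expected frequency under independence
            stat += (xij - m) ** 2 / m
    return stat, (len(row_tot) - 1) * (len(col_tot) - 1)


stat, df = chi_square_independence(COUNTS)
print(f"chi-square = {stat:.2f} on {df} d.f.; reject at 5%: {stat > CRITICAL_05_DF9}")
```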

Example

In this example, N = 57,407 cases in which individuals were victimized twice by crimes were studied.

The crime of the first victimization (X) and the crime of the second victimization (Y) were noted.

The data are tabulated on the following slide.

Table 1: Frequencies (rows: first victimization in pair; columns: second victimization in pair)

         Ra     A      Ro     PP/PS  PL     B      HL     MV     Total
Ra       26     50     11     6      82     39     48     11     273
A        65     2997   238    85     2553   1083   1349   216    8586
Ro       12     279    197    36     459    197    221    47     1448
PP/PS    3      102    40     61     243    115    101    38     703
PL       75     2628   413    229    12137  2658   3689   687    22516
B        52     1117   191    102    2649   3210   1973   301    9595
HL       42     1251   206    117    3757   1962   4646   391    12372
MV       3      221    51     24     678    301    367    269    1914
Total    278    8645   1347   660    22558  9565   12394  1960   57407

Table 2: Expected Frequencies (assuming independence)

         Ra       A         Ro       PP/PS    PL        B         HL        MV       Total
Ra       1.32     41.11     6.41     3.14     107.27    45.49     58.94     9.32     273
A        41.58    1292.98   201.46   98.71    3373.86   1430.58   1853.69   293.14   8586
Ro       7.01     218.06    33.98    16.65    568.99    241.26    312.62    49.44    1448
PP/PS    3.40     105.87    16.50    8.08     276.24    117.13    151.78    24.00    703
PL       109.04   3390.72   528.32   258.86   8847.63   3751.56   4861.14   768.75   22516
B        46.46    1444.92   225.14   110.31   3770.34   1598.69   2071.53   327.59   9595
HL       59.91    1863.12   290.30   142.24   4861.56   2061.39   2671.08   422.41   12372
MV       9.27     288.23    44.91    22.00    752.10    318.91    413.23    65.35    1914
Total    278      8645      1347     660      22558     9565      12394     1960     57407

Table 3: Standardized residuals (rows: first victimization in pair; columns: second victimization in pair)

         Ra     A      Ro     PP/PS  PL     B      HL     MV
Ra       21.5   1.4    1.8    1.6    -2.4   -1.0   -1.9   0.6
A        3.6    47.4   2.6    -1.4   -14.1  -9.2   -11.7  -4.5
Ro       1.9    4.1    28.0   4.7    -4.6   -2.8   -5.2   -0.3
PP/PS    -0.2   -0.4   5.8    18.6   -2.0   -0.2   -4.1   2.9
PL       -3.3   -13.1  -5.0   -1.9   35.0   -17.9  -16.8  -2.9
B        0.8    -8.6   -2.3   -0.8   -18.3  40.3   -2.2   -1.5
HL       -2.3   -14.2  -4.9   -2.1   -15.8  -2.2   38.2   -1.5
MV       -2.1   -4.0   0.9    0.4    -2.7   -1.0   -2.3   25.2

χ² = 11,430 (highly significant)

Table 4: Conditional distribution of second victimization given the first victimization (%)

           Ra    A     Ro    PP/PS  PL    B     HL    MV    Total
Ra         9.5   18.3  4.0   2.2    30.0  14.3  17.6  4.0   100.0
A          0.8   34.9  2.8   1.0    29.7  12.6  15.7  2.5   100.0
Ro         0.8   19.3  13.6  2.5    31.7  13.6  15.3  3.2   100.0
PP/PS      0.4   14.5  5.7   8.7    34.6  16.4  14.4  5.4   100.0
PL         0.3   11.7  1.8   1.0    53.9  11.8  16.4  3.1   100.0
B          0.5   11.6  2.0   1.1    27.6  33.5  20.6  3.1   100.0
HL         0.3   10.1  1.7   0.9    30.4  15.9  37.6  3.2   100.0
MV         0.2   11.5  2.7   1.3    35.4  15.7  19.2  14.1  100.0
Marginal   0.5   15.1  2.3   1.1    39.3  16.7  21.6  3.4   100.0


Log Linear Model

Recall, if the two variables, rows (X) and columns (Y), are independent then

$$\mu_{ij} = N\pi_{ij} = N\pi_{i\cdot}\pi_{\cdot j}$$

and

$$\ln\mu_{ij} = \ln N + \ln\pi_{i\cdot} + \ln\pi_{\cdot j}.$$

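In general let

$$u = \frac{1}{rc}\sum_{i=1}^{r}\sum_{j=1}^{c}\ln\mu_{ij}, \qquad u_{1(i)} = \frac{1}{c}\sum_{j=1}^{c}\ln\mu_{ij} - u, \qquad u_{2(j)} = \frac{1}{r}\sum_{i=1}^{r}\ln\mu_{ij} - u,$$

$$u_{12(i,j)} = \ln\mu_{ij} - u_{1(i)} - u_{2(j)} - u;$$

then

$$\ln\mu_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(i,j)} \qquad (1)$$

where

$$\sum_{i} u_{1(i)} = \sum_{j} u_{2(j)} = \sum_{i} u_{12(i,j)} = \sum_{j} u_{12(i,j)} = 0.$$

Equation (1) is called the log-linear model for the frequencies $x_{ij}$.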
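The u-terms in (1) are just row, column and grand means of $\ln\mu_{ij}$. A minimal Python sketch, assuming all $\mu_{ij} > 0$ and, for illustration, using the observed counts as $\mu_{ij}$ (the saturated fit); `u_terms` is a hypothetical helper name:

```python
import math

# Serum cholesterol (rows) x systolic blood pressure (columns), Data Set #1.
COUNTS = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]


def u_terms(mu):
    """u, u1(i), u2(j), u12(i,j) from the row, column and grand means of ln(mu_ij)."""
    logs = [[math.log(v) for v in row] for row in mu]
    r, c = len(logs), len(logs[0])
    u = sum(map(sum, logs)) / (r * c)
    u1 = [sum(row) / c - u for row in logs]
    u2 = [sum(col) / r - u for col in zip(*logs)]
    u12 = [[logs[i][j] - u - u1[i] - u2[j] for j in range(c)] for i in range(r)]
    return u, u1, u2, u12


u, u1, u2, u12 = u_terms(COUNTS)
print(f"u = {u:.4f}")
print("u1:", [round(v, 4) for v in u1])
print("u2:", [round(v, 4) for v in u2])
```

The sum-to-zero constraints are not imposed separately; they follow from defining each term as a deviation from a mean.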

Note: X and Y are independent if

$$\ln\mu_{ij} = u + u_{1(i)} + u_{2(j)}.$$

In this case the log-linear model has

$$u_{12(i,j)} = 0 \quad \text{for all } i, j.$$
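To see the link between independence and $u_{12(i,j)} = 0$, the sketch below applies the same decomposition to the independence fit $\hat m_{ij} = R_iC_j/N$: every interaction term vanishes up to floating-point rounding, while the observed counts give nonzero interactions. The helper names are illustrative.

```python
import math

COUNTS = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]


def u_terms(mu):
    """u, u1(i), u2(j), u12(i,j) from the row, column and grand means of ln(mu_ij)."""
    logs = [[math.log(v) for v in row] for row in mu]
    r, c = len(logs), len(logs[0])
    u = sum(map(sum, logs)) / (r * c)
    u1 = [sum(row) / c - u for row in logs]
    u2 = [sum(col) / r - u for col in zip(*logs)]
    u12 = [[logs[i][j] - u - u1[i] - u2[j] for j in range(c)] for i in range(r)]
    return u, u1, u2, u12


def independence_fit(x):
    """m_ij = R_i * C_j / N."""
    row_tot = [sum(row) for row in x]
    col_tot = [sum(col) for col in zip(*x)]
    n = sum(row_tot)
    return [[ri * cj / n for cj in col_tot] for ri in row_tot]


u12_fit = u_terms(independence_fit(COUNTS))[3]
u12_obs = u_terms(COUNTS)[3]
max_fit = max(abs(v) for row in u12_fit for v in row)
max_obs = max(abs(v) for row in u12_obs for v in row)
print(f"max |u12|: independence fit {max_fit:.1e}, observed counts {max_obs:.3f}")
```

Because $\ln(R_iC_j/N) = \ln R_i + \ln C_j - \ln N$ is purely additive in i and j, nothing is left over for the interaction term.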

Comment: The log-linear model for a two-way frequency table,

$$\ln\mu_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(i,j)},$$

is similar to the model for a two-factor experiment,

$$y_{ijk} = \mu_{ij} + \varepsilon_{ijk}, \qquad \mu_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij},$$

where $\mu_{ij}$ is the mean of $y_{ijk}$.


Three-way Frequency Tables

Example: Data from the Framingham Longitudinal Study of Coronary Heart Disease (Cornfield [1962])

Variables:
1. Systolic Blood Pressure (X): < 127, 127-146, 147-166, 167+
2. Serum Cholesterol: <200, 200-219, 220-259, 260+
3. Heart Disease: Present, Absent

The data are tabulated on the next slide.

Three-way Frequency Table

Coronary    Serum Cholesterol    Systolic Blood Pressure (mm Hg)
Heart       (mm/100 cc)          <127   127-146   147-166   167+
Disease
Present     <200                 2      3         3         4
            200-219              3      2         0         3
            220-259              8      11        6         6
            260+                 7      12        11        11
Absent      <200                 117    121       47        22
            200-219              85     98        43        20
            220-259              119    209       68        43
            260+                 67     99        46        33

Log-Linear model for three-way tables

Let $\mu_{ijk}$ denote the expected frequency in cell (i,j,k) of the table; then in general

$$\ln\mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)} + u_{23(j,k)} + u_{123(i,j,k)}$$

where

$$\sum_i u_{1(i)} = \sum_j u_{2(j)} = \sum_k u_{3(k)} = \sum_i u_{12(i,j)} = \sum_j u_{12(i,j)} = 0,$$

$$\sum_i u_{13(i,k)} = \sum_k u_{13(i,k)} = \sum_j u_{23(j,k)} = \sum_k u_{23(j,k)} = 0,$$

$$\sum_i u_{123(i,j,k)} = \sum_j u_{123(i,j,k)} = \sum_k u_{123(i,j,k)} = 0.$$

Hierarchical Log-linear models for categorical Data

For three way tables

The hierarchical principle: If an interaction is in the model, also keep the lower-order interactions and main effects associated with that interaction.

1. Model (all main effects model): ln μijk = u + u1(i) + u2(j) + u3(k)

i.e. u12(i,j) = u13(i,k) = u23(j,k) = u123(i,j,k) = 0.

Notation: [1][2][3]

Description: Mutual independence between all three variables.

2. Model: ln μijk = u + u1(i) + u2(j) + u3(k) + u12(i,j)

i.e. u13(i,k) = u23(j,k) = u123(i,j,k) = 0.

Notation: [12][3]

Description: Independence of Variable 3 with variables 1 and 2.

3. Model: ln μijk = u + u1(i) + u2(j) + u3(k) + u13(i,k)

i.e. u12(i,j) = u23(j,k) = u123(i,j,k) = 0.

Notation: [13][2]

Description: Independence of Variable 2 with variables 1 and 3.

4. Model: ln μijk = u + u1(i) + u2(j) + u3(k) + u23(j,k)

i.e. u12(i,j) = u13(i,k) = u123(i,j,k) = 0.

Notation: [23][1]

Description: Independence of Variable 1 with variables 2 and 3.

5. Model: ln μijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k)

i.e. u23(j,k) = u123(i,j,k) = 0.

Notation: [12][13]

Description: Conditional independence between variables 2 and 3 given variable 1.

6. Model: ln μijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u23(j,k)

i.e. u13(i,k) = u123(i,j,k) = 0.

Notation: [12][23]

Description: Conditional independence between variables 1 and 3 given variable 2.

7. Model: ln μijk = u + u1(i) + u2(j) + u3(k) + u13(i,k) + u23(j,k)

i.e. u12(i,j) = u123(i,j,k) = 0.

Notation: [13][23]

Description: Conditional independence between variables 1 and 2 given variable 3.

8. Model: ln μijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k)

i.e. u123(i,j,k) = 0.

Notation: [12][13][23]

Description: Pairwise relations among all three variables, with each two-variable interaction unaffected by the value of the third variable.

9. Model (the saturated model): ln μijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) + u123(i,j,k)

Notation: [123]

Description: No simplifying dependence structure.

Hierarchical Log-linear models for 3 way table

Model           Description
[1][2][3]       Mutual independence between all three variables.
[1][23]         Independence of Variable 1 with variables 2 and 3.
[2][13]         Independence of Variable 2 with variables 1 and 3.
[3][12]         Independence of Variable 3 with variables 1 and 2.
[12][13]        Conditional independence between variables 2 and 3 given variable 1.
[12][23]        Conditional independence between variables 1 and 3 given variable 2.
[13][23]        Conditional independence between variables 1 and 2 given variable 3.
[12][13][23]    Pairwise relations among all three variables, with each two-variable interaction unaffected by the value of the third variable.
[123]           The saturated model
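Under the hierarchical principle a model is determined by its highest-order terms (the bracket notation), with every lower-order term implied. A small Python sketch that expands the bracket notation into the full set of u-terms; `hierarchical_terms` is a hypothetical helper, not a library function:

```python
from itertools import combinations


def hierarchical_terms(generating_class):
    """All u-terms implied by bracket notation such as [[1, 2], [3]] for [12][3].

    Every subset of each bracket is included (the hierarchical principle);
    the empty tuple () stands for the overall mean u.
    """
    terms = set()
    for bracket in generating_class:
        bracket = tuple(sorted(bracket))
        for size in range(len(bracket) + 1):
            terms.update(combinations(bracket, size))
    return terms


def term_name(term):
    return "u" + "".join(map(str, term))


for model in ([[1], [2], [3]], [[1, 2], [3]], [[1, 2], [1, 3]],
              [[1, 2], [1, 3], [2, 3]], [[1, 2, 3]]):
    label = "".join("[" + "".join(map(str, b)) + "]" for b in model)
    names = sorted((term_name(t) for t in hierarchical_terms(model)), key=lambda s: (len(s), s))
    print(f"{label:14s}", ", ".join(names))
```

For example, [12][13] expands to u, u1, u2, u3, u12, u13: u23 and u123 are absent, which is exactly the statement u23(j,k) = u123(i,j,k) = 0 for Model 5.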


Maximum Likelihood Estimation

Log-Linear Model

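For any model it is possible to determine the maximum likelihood estimators of the parameters.

Example: Two-way table, independence, multinomial model

$$f(x_{11}, \ldots, x_{rc}) = \frac{N!}{x_{11}!\cdots x_{rc}!}\,\pi_{11}^{x_{11}}\cdots\pi_{rc}^{x_{rc}} = \frac{N!}{x_{11}!\cdots x_{rc}!}\left(\frac{\mu_{11}}{N}\right)^{x_{11}}\cdots\left(\frac{\mu_{rc}}{N}\right)^{x_{rc}}$$

since

$$E(x_{ij}) = N\pi_{ij} = \mu_{ij} \quad\text{or}\quad \pi_{ij} = \frac{\mu_{ij}}{N}.$$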

Log-likelihood

$$l(\mu_{11}, \ldots, \mu_{rc}) = \ln N! - \sum_i\sum_j\ln x_{ij}! + \sum_i\sum_j x_{ij}\ln\mu_{ij} - \sum_i\sum_j x_{ij}\ln N = K + \sum_i\sum_j x_{ij}\ln\mu_{ij}$$

where

$$K = \ln N! - \sum_i\sum_j\ln x_{ij}! - N\ln N.$$

With the model of independence

$$\ln\mu_{ij} = u + u_{1(i)} + u_{2(j)}.$$

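and

$$l(u, u_{1(1)}, \ldots, u_{1(r)}, u_{2(1)}, \ldots, u_{2(c)}) = K + \sum_i\sum_j x_{ij}\left(u + u_{1(i)} + u_{2(j)}\right) = K + Nu + \sum_i x_{i\cdot}u_{1(i)} + \sum_j x_{\cdot j}u_{2(j)}$$

with

$$\sum_i u_{1(i)} = \sum_j u_{2(j)} = 0;$$

also

$$\sum_i\sum_j\mu_{ij} = \sum_i\sum_j e^{u + u_{1(i)} + u_{2(j)}} = e^{u}\sum_i e^{u_{1(i)}}\sum_j e^{u_{2(j)}} = N.$$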

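Let

$$g(u, u_{1(1)}, \ldots, u_{1(r)}, u_{2(1)}, \ldots, u_{2(c)}, \lambda_1, \lambda_2, \lambda_3) = K + Nu + \sum_i x_{i\cdot}u_{1(i)} + \sum_j x_{\cdot j}u_{2(j)} + \lambda_1\sum_i u_{1(i)} + \lambda_2\sum_j u_{2(j)} + \lambda_3\left(e^{u}\sum_i e^{u_{1(i)}}\sum_j e^{u_{2(j)}} - N\right).$$

Now

$$\frac{\partial g}{\partial u} = N + \lambda_3 e^{u}\sum_i e^{u_{1(i)}}\sum_j e^{u_{2(j)}} = N + \lambda_3 N = 0,$$

so $\lambda_3 = -1$.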

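$$\frac{\partial g}{\partial u_{1(i)}} = x_{i\cdot} + \lambda_1 + \lambda_3 e^{u}e^{u_{1(i)}}\sum_j e^{u_{2(j)}} = 0.$$

Since $\lambda_3 = -1$ and $e^{u}\sum_j e^{u_{2(j)}} = N/\sum_{i'} e^{u_{1(i')}}$, this gives

$$x_{i\cdot} + \lambda_1 - N\,\frac{e^{u_{1(i)}}}{\sum_{i'} e^{u_{1(i')}}} = 0 \quad\text{or}\quad \frac{e^{u_{1(i)}}}{\sum_{i'} e^{u_{1(i')}}} = \frac{x_{i\cdot} + \lambda_1}{N}.$$

Summing over i,

$$1 = \sum_i\frac{x_{i\cdot} + \lambda_1}{N} = 1 + \frac{r\lambda_1}{N} \quad\text{since}\quad \sum_i x_{i\cdot} = N, \quad\text{and so}\quad \lambda_1 = 0.$$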

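Now, with $\lambda_1 = 0$,

$$e^{u_{1(i)}} = x_{i\cdot}K_1, \quad\text{where}\quad K_1 = \frac{1}{N}\sum_{i'} e^{u_{1(i')}} \text{ does not depend on } i,$$

or

$$u_{1(i)} = \ln x_{i\cdot} + \ln K_1,$$

and

$$\sum_i u_{1(i)} = \sum_i\ln x_{i\cdot} + r\ln K_1 = 0.$$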

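Hence

$$\ln K_1 = -\frac{1}{r}\sum_i\ln x_{i\cdot} \quad\text{and}\quad u_{1(i)} = \ln x_{i\cdot} - \frac{1}{r}\sum_{i'}\ln x_{i'\cdot}.$$

Similarly

$$u_{2(j)} = \ln x_{\cdot j} - \frac{1}{c}\sum_{j'}\ln x_{\cdot j'}.$$

Finally

$$\sum_i\sum_j\mu_{ij} = e^{u}\sum_i e^{u_{1(i)}}\sum_j e^{u_{2(j)}} = N.$$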

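Hence

$$e^{u_{1(i)}} = \frac{x_{i\cdot}}{\left(\prod_{i'=1}^{r} x_{i'\cdot}\right)^{1/r}} \quad\text{and}\quad e^{u_{2(j)}} = \frac{x_{\cdot j}}{\left(\prod_{j'=1}^{c} x_{\cdot j'}\right)^{1/c}}.$$

Now

$$N = e^{u}\sum_i e^{u_{1(i)}}\sum_j e^{u_{2(j)}} = e^{u}\,\frac{\sum_i x_{i\cdot}\,\sum_j x_{\cdot j}}{\left(\prod_i x_{i\cdot}\right)^{1/r}\left(\prod_j x_{\cdot j}\right)^{1/c}} = e^{u}\,\frac{N^2}{\left(\prod_i x_{i\cdot}\right)^{1/r}\left(\prod_j x_{\cdot j}\right)^{1/c}}$$

and so

$$e^{u} = \frac{1}{N}\left(\prod_{i=1}^{r} x_{i\cdot}\right)^{1/r}\left(\prod_{j=1}^{c} x_{\cdot j}\right)^{1/c}.$$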

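Hence

$$u = \frac{1}{r}\sum_i\ln x_{i\cdot} + \frac{1}{c}\sum_j\ln x_{\cdot j} - \ln N.$$

Note

$$\ln\hat\mu_{ij} = u + u_{1(i)} + u_{2(j)} = \frac{1}{r}\sum_{i'}\ln x_{i'\cdot} + \frac{1}{c}\sum_{j'}\ln x_{\cdot j'} - \ln N + \ln x_{i\cdot} - \frac{1}{r}\sum_{i'}\ln x_{i'\cdot} + \ln x_{\cdot j} - \frac{1}{c}\sum_{j'}\ln x_{\cdot j'} = \ln x_{i\cdot} + \ln x_{\cdot j} - \ln N$$

or

$$\hat\mu_{ij} = \frac{x_{i\cdot}\,x_{\cdot j}}{N}.$$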
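A numerical check of the derivation: plugging the closed-form estimates of $u$, $u_{1(i)}$ and $u_{2(j)}$ back into the model should reproduce $\hat\mu_{ij} = x_{i\cdot}x_{\cdot j}/N$. Minimal Python sketch on the cholesterol and blood pressure counts (illustrative names):

```python
import math

COUNTS = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]


def mle_independence(x):
    """Closed-form MLEs of u, u1(i), u2(j) under independence, from the derivation above."""
    row_tot = [sum(row) for row in x]
    col_tot = [sum(col) for col in zip(*x)]
    n = sum(row_tot)
    mean_log_row = sum(math.log(v) for v in row_tot) / len(row_tot)
    mean_log_col = sum(math.log(v) for v in col_tot) / len(col_tot)
    u = mean_log_row + mean_log_col - math.log(n)
    u1 = [math.log(v) - mean_log_row for v in row_tot]
    u2 = [math.log(v) - mean_log_col for v in col_tot]
    return u, u1, u2


u, u1, u2 = mle_independence(COUNTS)
mu_hat = [[math.exp(u + a + b) for b in u2] for a in u1]
for row in mu_hat:
    print(" ".join(f"{v:8.2f}" for v in row))
```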

Comments
• Maximum likelihood estimates can be computed for any hierarchical log-linear model, including models for tables with more than 2 variables.
• In certain situations the equations need to be solved numerically.
• For the saturated model (all interactions and main effects), the estimate of μijk… is xijk… .
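When no closed form exists, as for the model [12][13][23], one standard numerical method is iterative proportional fitting (IPF), which repeatedly rescales the fitted table so that it matches each margin in the model's bracket notation. The slides do not say which algorithm they used, so treat this as a swapped-in sketch: it fits [BC][BH][CH] to the Framingham three-way table with plain Python lists, and all names are illustrative.

```python
import math

# Framingham counts X[h][c][b]:
# h: coronary heart disease (0 = present, 1 = absent)
# c: serum cholesterol (<200, 200-219, 220-259, 260+)
# b: systolic blood pressure (<127, 127-146, 147-166, 167+)
X = [
    [[2, 3, 3, 4], [3, 2, 0, 3], [8, 11, 6, 6], [7, 12, 11, 11]],
    [[117, 121, 47, 22], [85, 98, 43, 20], [119, 209, 68, 43], [67, 99, 46, 33]],
]
CELLS = [(h, c, b) for h in range(2) for c in range(4) for b in range(4)]


def margin(t, keep):
    """Marginal totals of t over the axes listed in `keep` (0 = H, 1 = C, 2 = B)."""
    out = {}
    for cell in CELLS:
        key = tuple(cell[a] for a in keep)
        out[key] = out.get(key, 0.0) + t[cell[0]][cell[1]][cell[2]]
    return out


def ipf(x, generators, tol=1e-10, max_iter=1000):
    """Start from all ones and rescale to each observed margin in turn until stable."""
    fit = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
    targets = {g: margin(x, g) for g in generators}
    for _ in range(max_iter):
        biggest = 0.0
        for g in generators:
            current = margin(fit, g)
            for h, c, b in CELLS:
                key = tuple((h, c, b)[a] for a in g)
                new = fit[h][c][b] * targets[g][key] / current[key]
                biggest = max(biggest, abs(new - fit[h][c][b]))
                fit[h][c][b] = new
        if biggest < tol:
            break
    return fit


def g_squared(x, fit):
    """G^2 = 2 * sum x ln(x / fit); cells with x = 0 contribute 0."""
    return 2 * sum(x[h][c][b] * math.log(x[h][c][b] / fit[h][c][b])
                   for h, c, b in CELLS if x[h][c][b] > 0)


# [BC][BH][CH]: match the C x B, H x B and H x C margins.
GENERATORS = [(1, 2), (0, 2), (0, 1)]
fit = ipf(X, GENERATORS)
print(f"G^2 = {g_squared(X, fit):.2f} on 9 d.f.")
```

The printed G² can be compared with the BC,BH,CH row of the goodness-of-fit table for these data. For models that do have closed-form estimates, IPF reaches them too, typically in a single sweep.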


Log Linear Model

Two-way table

$$\ln\mu_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(i,j)}$$

where

$$\sum_i u_{1(i)} = \sum_j u_{2(j)} = \sum_i u_{12(i,j)} = \sum_j u_{12(i,j)} = 0.$$

The multiplicative form:

$$\mu_{ij} = e^{u}\,e^{u_{1(i)}}\,e^{u_{2(j)}}\,e^{u_{12(i,j)}}.$$


Log-Linear model for three-way tables

Let $\mu_{ijk}$ denote the expected frequency in cell (i,j,k) of the table; then in general

$$\ln\mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)} + u_{23(j,k)} + u_{123(i,j,k)}$$

or the multiplicative form

$$\mu_{ijk} = e^{u}\,e^{u_{1(i)}}\,e^{u_{2(j)}}\,e^{u_{3(k)}}\,e^{u_{12(i,j)}}\,e^{u_{13(i,k)}}\,e^{u_{23(j,k)}}\,e^{u_{123(i,j,k)}}.$$

Comments
• The log-linear model is similar to the ANOVA models for factorial experiments.
• The ANOVA models are used to understand the effects of categorical independent variables (factors) on a continuous dependent variable (Y).
• The log-linear model is used to understand dependence amongst categorical variables.
• The presence of interactions indicates dependence between the variables present in the interactions.


Goodness of Fit Statistics

These statistics can be used to check whether a log-linear model fits the observed frequency table.

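Goodness of Fit Statistics

The Chi-squared statistic:

$$\chi^2 = \sum\frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \sum_{i,j,k}\frac{(x_{ijk} - \hat\mu_{ijk})^2}{\hat\mu_{ijk}}$$

The Likelihood Ratio statistic:

$$G^2 = 2\sum\text{Observed}\,\ln\left(\frac{\text{Observed}}{\text{Expected}}\right) = 2\sum_{i,j,k}x_{ijk}\ln\left(\frac{x_{ijk}}{\hat\mu_{ijk}}\right)$$

d.f. = # cells - # parameters fitted

We reject the model if $\chi^2$ or $G^2$ is greater than $\chi^2_{\alpha}$ with these d.f.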
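Both statistics compare observed counts with fitted counts cell by cell. A minimal Python sketch, applied here to the independence fit of the two-way cholesterol by blood pressure table, where d.f. = 16 cells - 7 parameters (u, 3 row terms, 3 column terms) = 9; the function names are illustrative.

```python
import math

COUNTS = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]


def pearson_x2(observed, expected):
    """Chi-squared statistic: sum (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def likelihood_ratio_g2(observed, expected):
    """G^2 = 2 sum O ln(O / E); a cell with O = 0 contributes 0."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


# Flatten the table and its independence fit m_ij = R_i C_j / N in the same (row-major) order.
R = [sum(row) for row in COUNTS]
C = [sum(col) for col in zip(*COUNTS)]
N = sum(R)
obs = [x for row in COUNTS for x in row]
fitted = [ri * cj / N for ri in R for cj in C]
df = len(obs) - (1 + (len(R) - 1) + (len(C) - 1))  # cells - fitted parameters

print(f"X^2 = {pearson_x2(obs, fitted):.2f}, G^2 = {likelihood_ratio_g2(obs, fitted):.2f}, d.f. = {df}")
```

Both functions take flat lists, so the same code works for three- and four-way tables once the observed and fitted cells are listed in the same order.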

Example: Variables

Coronary    Serum Cholesterol    Systolic Blood Pressure (mm Hg)
Heart       (mm/100 cc)          <127   127-146   147-166   167+
Disease
Present     <200                 2      3         3         4
            200-219              3      2         0         3
            220-259              8      11        6         6
            260+                 7      12        11        11
Absent      <200                 117    121       47        22
            200-219              85     98        43        20
            220-259              119    209       68        43
            260+                 67     99        46        33

1. Systolic Blood Pressure (B)
2. Serum Cholesterol (C)
3. Coronary Heart Disease (H)

Goodness of fit testing of Models

MODEL        DF   LIKELIHOOD-RATIO CHISQ   PROB.    PEARSON CHISQ   PROB.
B,C,H.       24   83.15                    0.0000   102.00          0.0000
B,CH.        21   51.23                    0.0002   56.89           0.0000
C,BH.        21   59.59                    0.0000   60.43           0.0000
H,BC.        15   58.73                    0.0000   64.78           0.0000
BC,BH.       12   35.16                    0.0004   33.76           0.0007
BH,CH.       18   27.67                    0.0673   26.58           0.0872   n.s.
CH,BC.       12   26.80                    0.0082   33.18           0.0009
BC,BH,CH.    9    8.08                     0.5265   6.56            0.6824   n.s.

Possible Models:
1. [BH][CH]: B and C independent given H.
2. [BC][BH][CH]: all two-factor interaction model.

Model 1: [BH][CH] Log-linear parameters

Heart Disease - Blood Pressure Interaction

uHB(i,j):
                 Blood pressure
Heart disease    <127     127-146   147-166   167+
Present          -0.256   -0.241    0.066     0.431
Absent           0.256    0.241     -0.066    -0.431

z = uHB(i,j) / s.e.(uHB(i,j)):
                 Blood pressure
Heart disease    <127     127-146   147-166   167+
Present          -2.607   -2.733    0.660     4.461
Absent           2.607    2.733     -0.660    -4.461

Multiplicative effect

exp(uHB(i,j)) = e^uHB(i,j):
                 Blood pressure
Heart disease    <127     127-146   147-166   167+
Present          0.774    0.786     1.068     1.538
Absent           1.291    1.272     0.936     0.650

Log-Linear Model

$$\ln\mu_{ijk} = u + u_{H(i)} + u_{B(j)} + u_{C(k)} + u_{HB(i,j)} + u_{HC(i,k)}$$

$$\mu_{ijk} = e^{u}\,e^{u_{H(i)}}\,e^{u_{B(j)}}\,e^{u_{C(k)}}\,e^{u_{HB(i,j)}}\,e^{u_{HC(i,k)}}$$
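The multiplicative effects are simply exponentials of the log-linear parameters. The sketch below recomputes them from the rounded uHB(i,j) values in the table, so the last digit may differ slightly from the slide. Because the terms sum to zero over heart disease status, the Present and Absent factors in each column multiply to 1. Names are illustrative.

```python
import math

# Estimated uHB(i,j) under model [BH][CH], copied from the table above
# (columns: blood pressure <127, 127-146, 147-166, 167+).
U_HB = {
    "Present": [-0.256, -0.241, 0.066, 0.431],
    "Absent": [0.256, 0.241, -0.066, -0.431],
}

# Multiplicative effect: the expected count in each cell is scaled by exp(uHB(i,j)).
MULT_HB = {hd: [math.exp(u) for u in row] for hd, row in U_HB.items()}

for hd, row in MULT_HB.items():
    print(f"{hd:8s}", " ".join(f"{m:.3f}" for m in row))
```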

Heart Disease - Cholesterol Interaction

uHC(i,k):
                 Serum cholesterol
Heart disease    <200     200-219   220-259   260+
Present          -0.233   -0.325    0.063     0.494
Absent           0.233    0.325     -0.063    -0.494

z = uHC(i,k) / s.e.(uHC(i,k)):
                 Serum cholesterol
Heart disease    <200     200-219   220-259   260+
Present          -1.889   -2.268    0.677     5.558
Absent           1.889    2.268     -0.677    -5.558

Multiplicative effect

exp(uHC(i,k)) = e^uHC(i,k):
                 Serum cholesterol
Heart disease    <200     200-219   220-259   260+
Present          0.792    0.723     1.065     1.640
Absent           1.262    1.384     0.939     0.610

Model 2: [BC][BH][CH] Log-linear parameters

Blood Pressure - Cholesterol interaction: uBC(j,k)

                 Blood pressure
Cholesterol      <127     127-146   147-166   167+
<200             0.222    -0.019    -0.034    -0.169
200-219          0.114    -0.041    0.013     -0.086
220-259          -0.114   0.154     -0.058    0.018
260+             -0.221   -0.094    0.079     0.237

z = uBC(j,k) / s.e.(uBC(j,k)):
                 Blood pressure
Cholesterol      <127     127-146   147-166   167+
<200             2.68     -0.236    -0.326    -1.291
200-219          1.27     -0.472    0.117     -0.626
220-259          -1.502   2.253     -0.636    0.167
260+             -2.487   -1.175    0.785     2.051

Multiplicative effect exp(uBC(j,k)) = e^uBC(j,k):
                 Blood pressure
Cholesterol      <127     127-146   147-166   167+
<200             1.248    0.981     0.967     0.844
200-219          1.120    0.960     1.013     0.918
220-259          0.892    1.166     0.944     1.018
260+             0.802    0.910     1.082     1.267

Heart Disease - Blood Pressure Interaction

uHB(i,j):
                 Blood pressure
Heart disease    <127     127-146   147-166   167+
Present          -0.211   -0.232    0.055     0.389
Absent           0.211    0.232     -0.055    -0.389

z = uHB(i,j) / s.e.(uHB(i,j)):
                 Blood pressure
Heart disease    <127     127-146   147-166   167+
Present          -2.125   -2.604    0.542     3.938
Absent           2.125    2.604     -0.542    -3.938

Multiplicative effect

exp(uHB(i,j)) = e^uHB(i,j):
                 Blood pressure
Heart disease    <127     127-146   147-166   167+
Present          0.809    0.793     1.056     1.475
Absent           1.235    1.261     0.947     0.678

Heart Disease - Cholesterol Interaction

uHC(i,k):
                 Serum cholesterol
Heart disease    <200     200-219   220-259   260+
Present          -0.212   -0.316    0.069     0.460
Absent           0.212    0.316     -0.069    -0.460

z = uHC(i,k) / s.e.(uHC(i,k)):
                 Serum cholesterol
Heart disease    <200     200-219   220-259   260+
Present          -1.712   -2.199    0.732     5.095
Absent           1.712    2.199     -0.732    -5.095

Multiplicative effect

exp(uHC(i,k)) = e^uHC(i,k):
                 Serum cholesterol
Heart disease    <200     200-219   220-259   260+
Present          0.809    0.729     1.071     1.584
Absent           1.237    1.372     0.933     0.631

Another Example

In this study the following were determined for N = 4353 males:

1. Occupation category
2. Educational level
3. Academic aptitude

1. Occupation categories
   a. Self-employed Business
   b. Teacher/Education
   c. Self-employed Professional
   d. Salaried Employed

2. Education levels
   a. Low
   b. Low/Med
   c. High/Med
   d. High

3. Academic Aptitude
   a. Low
   b. Low/Med
   c. Med
   d. High/Med
   e. High

Table: Aptitude (rows) by Education (columns), for each occupation category

Self-employed, Business
Aptitude   Low    LMed   HMed   High   Total
Low        42     55     22     3      122
LMed       72     82     60     12     226
Med        90     106    85     25     306
HMed       27     48     47     8      130
High       8      18     19     5      50
Total      239    309    233    53     834

Teacher, Education
Aptitude   Low    LMed   HMed   High   Total
Low        0      0      1      19     20
LMed       0      3      3      60     66
Med        1      4      5      86     96
HMed       0      0      2      36     38
High       0      0      1      14     15
Total      1      7      12     215    235

Self-employed, Professional
Aptitude   Low    LMed   HMed   High   Total
Low        1      2      8      19     30
LMed       1      2      15     33     51
Med        2      5      25     83     115
HMed       2      2      10     45     59
High       0      0      12     19     31
Total      6      11     70     199    286

Salaried Employed
Aptitude   Low    LMed   HMed   High   Total
Low        172    151    107    42     472
LMed       208    198    206    92     704
Med        279    271    331    191    1072
HMed       99     126    179    97     501
High       36     35     99     79     249
Total      794    781    922    501    2998

Two-way Tables (with χ²)

Education vs Aptitude (χ² = 178.6)
Aptitude   Low    LMed   HMed   High   Total
Low        215    208    138    83     644
LMed       281    285    284    197    1047
Med        372    386    446    385    1589
HMed       128    176    238    186    728
High       44     53     131    117    345
Total      1040   1108   1237   968    4353

Education vs Occupation (χ² = 1254.1)
Occupation   Low    LMed   HMed   High   Total
SEB          239    309    233    53     834
SEP          6      11     70     199    286
TCHR         1      7      12     215    235
SEM          794    781    922    501    2998
Total        1040   1108   1237   968    4353

Aptitude vs Occupation (χ² = 35.8)
Aptitude   SEB    SEP    TCHR   SEM    Total
Low        122    30     20     472    644
LMed       226    51     66     704    1047
Med        306    115    96     1072   1589
HMed       130    59     38     501    728
High       50     31     15     249    345
Total      834    286    235    2998   4353

• It is common to handle a multiway table by testing for independence in all two-way tables.
• This is similar to looking at all the bivariate correlations.
• In this example we learn that:
1. Education is related to Aptitude
2. Education is related to Occupational category
3. Occupational category is related to Aptitude

Can we do better than this?

Fitting various log-linear models

Goodness of fit

Model                        Likelihood Ratio   DF   Sig.     Pearson     DF   Sig.
[Occ][Ed][Apt]               1356.9702          69   0.0000   1519.802    69   0.0000
[Occ,Ed][Apt]                228.2215           60   0.0000   226.6615    60   0.0000
[Apt,Ed][Occ]                1179.6403          57   0.0000   1336.765    57   0.0000
[Apt,Occ][Ed]                1319.561           57   0.0000   1424.1488   57   0.0000
[Occ,Ed][Occ,Apt]            190.8123           48   0.0000   184.6386    48   0.0000
[Apt,Ed][Occ,Apt]            1142.2311          45   0.0000   1301.1317   45   0.0000
[Apt,Ed][Occ,Ed]             50.8915            48   0.3605   48.0105     48   0.4724
[Apt,Ed][Occ,Ed][Occ,Apt]    25.1048            36   0.9134   23.6465     36   0.9436

The simplest model that fits is [Apt,Ed][Occ,Ed]. This model implies conditional independence between Aptitude and Occupation given Education.

Log-linear Parameters: Aptitude - Education Interaction

              Education
Aptitude      Low       Low-Med   High-Med  High
Low           0.4602    0.3225    -0.2752   -0.5075
Low-Med       0.1857    0.0953    -0.0957   -0.1853
Med           0.0399    -0.0277   -0.0706   0.0584
High-Med      -0.2250   -0.0111   0.1032    0.1329
High          -0.4607   -0.3791   0.3383    0.5015

Aptitude - Education Interaction (Multiplicative)

              Education
Aptitude      Low       Low-Med   High-Med  High
Low           1.584     1.381     0.759     0.602
Low-Med       1.204     1.100     0.909     0.831
Med           1.041     0.973     0.932     1.060
High-Med      0.799     0.989     1.109     1.142
High          0.631     0.684     1.403     1.651

Occupation - Education Interaction

              Occupation
Education     SEB       T         SEP       SAL
Low           1.241     -1.528    -0.718    1.005
LowMed        0.800     -0.280    -0.810    0.290
HighMed       -0.050    -0.309    0.472     -0.112
High          -1.991    2.117     1.057     -1.182

Occupation - Education Interaction (Multiplicative)

              Occupation
Education     SEB       T         SEP       SAL
Low           3.460     0.217     0.488     2.731
LowMed        2.226     0.756     0.445     1.336
HighMed       0.951     0.734     1.603     0.894
High          0.137     8.303     2.877     0.307

Page 107: Discrete Multivariate Analysis

Conditional Test Statistics

Page 108: Discrete Multivariate Analysis

• Suppose that we are considering two Log-linear models and that Model 2 is a special case of Model 1.

• That is, the parameters of Model 2 are a subset of the parameters of Model 1.

• Also assume that Model 1 has been shown to adequately fit the data.

Page 109: Discrete Multivariate Analysis

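In this case one is interested in testing whether the differences in the expected frequencies between Model 1 and Model 2 are simply due to random variation. The likelihood ratio chi-square statistic that achieves this goal is:

$$G^2(2|1) = G^2(2) - G^2(1) = 2\sum \text{Observed}\,\ln\left(\frac{\text{Expected}_1}{\text{Expected}_2}\right), \qquad df(2|1) = df(2) - df(1)$$

where Expected_1 and Expected_2 are the expected frequencies under Model 1 and Model 2.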

Page 110: Discrete Multivariate Analysis

Example

Table 1: Cross-Classification of a Sample of 1008 Consumers according to (1) the Softness of the Laundry Used, (2) the Previous Use of Detergent Brand M, (3) the Temperature of the Laundry Water Used, (4) the Preference of Detergent Brand X over Brand M in a Consumer Blind Trial.

                                Previous user of M       Previous nonuser of M
Water       Brand               High       Low           High       Low
Softness    Preference          Temp.      Temp.         Temp.      Temp.
Soft        X                   19         57            29         63
            M                   29         49            27         53
Medium      X                   23         47            33         66
            M                   47         55            23         50
Hard        X                   24         37            42         68
            M                   43         52            30         42
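To make Table 1 concrete, here is a minimal Python sketch that stores the counts in a dictionary and fits the mutual-independence model [1][2][3][4], whose expected frequencies have the closed form n × (product of the four marginal proportions). The key order (softness, preference, previous use, temperature) and the level names are choices made for this sketch only.

```python
import math

# Table 1 counts keyed by (softness, preferred brand, previous use of M, temperature).
COLS = [("user", "high"), ("user", "low"), ("nonuser", "high"), ("nonuser", "low")]
ROWS = {
    ("Soft", "X"): [19, 57, 29, 63],
    ("Soft", "M"): [29, 49, 27, 53],
    ("Medium", "X"): [23, 47, 33, 66],
    ("Medium", "M"): [47, 55, 23, 50],
    ("Hard", "X"): [24, 37, 42, 68],
    ("Hard", "M"): [43, 52, 30, 42],
}
counts = {
    (soft, pref, use, temp): x
    for (soft, pref), row in ROWS.items()
    for (use, temp), x in zip(COLS, row)
}
n = sum(counts.values())

# One-way margins for each of the four variables (indexed by key position).
margins = [{} for _ in range(4)]
for key, x in counts.items():
    for axis, level in enumerate(key):
        margins[axis][level] = margins[axis].get(level, 0) + x

# Mutual independence: expected = n * product of the four marginal proportions.
expected = {
    key: n * math.prod(margins[axis][level] / n for axis, level in enumerate(key))
    for key in counts
}
g2 = 2 * sum(x * math.log(x / expected[key]) for key, x in counts.items())

# d.f. = cells - (1 constant + sum of (levels - 1) main-effect parameters).
df = len(counts) - (1 + sum(len(m) - 1 for m in margins))
print(f"n = {n}, G2 = {g2:.1f}, d.f. = {df}")
```

With these counts the sketch gives G2 of about 42.9 on 18 d.f., the value reported for [1][2][3][4] in the next table.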

Page 111: Discrete Multivariate Analysis

Goodness of Fit test for the all k-factor models

Model                        d.f.   G2     p-value
All k-factor models
[1][2][3][4]                 18     42.9   0.00083    G2(1)
[12][13][14][23][24][34]      9      9.9   0.35864    G2(2)
[123][124][134][234]          2      0.7   0.70469    G2(3)
[1234]                        0      0.0              G2(4)

Conditional tests for zero k-factor interactions

Interactions tested          d.f.   G2     p-value
two-factor interactions       9     33.0   0.00013    G2(1|2) = G2(1) - G2(2)
three-factor interactions     7      9.2   0.23861    G2(2|3) = G2(2) - G2(3)
four-factor interaction       2      0.7   0.70469    G2(3|4) = G2(3) - G2(4)
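The p-values in these two tables are upper-tail chi-square probabilities, and the conditional statistics are differences of adjacent G2 values. A minimal Python sketch that recomputes them from the reported G2 values; `chi2_sf` is a hand-written helper assumed for this sketch (it uses the closed forms of the regularized incomplete gamma function for integer d.f.), and `scipy.stats.chi2.sf(x, df)` would serve the same purpose where SciPy is available.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with a positive integer df.

    Uses closed forms of the regularized upper incomplete gamma Q(df/2, x/2).
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Q(m, y) = exp(-y) * sum_{k=0}^{m-1} y^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Q(m + 1/2, y) = erfc(sqrt(y)) + sum_{k=0}^{m-1} y^(k+1/2) e^(-y) / Gamma(k + 3/2)
    total = math.erfc(math.sqrt(y))
    for k in range(df // 2):
        total += math.exp((k + 0.5) * math.log(y) - y - math.lgamma(k + 1.5))
    return total


# All k-factor models for the detergent data: (label, d.f., G2), from the table above.
ALL_K = [("G2(1)", 18, 42.9), ("G2(2)", 9, 9.9), ("G2(3)", 2, 0.7), ("G2(4)", 0, 0.0)]

for label, df, g2 in ALL_K[:-1]:  # the saturated model G2(4) has nothing to test
    print(f"{label}: d.f. = {df}, G2 = {g2}, p = {chi2_sf(g2, df):.5f}")

# Conditional tests: G2(k|k+1) = G2(k) - G2(k+1) on df(k) - df(k+1) d.f.
conditional = []
for (lo, df_lo, g_lo), (hi, df_hi, g_hi) in zip(ALL_K, ALL_K[1:]):
    g_diff, df_diff = round(g_lo - g_hi, 1), df_lo - df_hi
    p = chi2_sf(g_diff, df_diff)
    conditional.append((df_diff, g_diff, p))
    print(f"{lo} - {hi}: d.f. = {df_diff}, G2 = {g_diff}, p = {p:.5f}")
```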

Page 112: Discrete Multivariate Analysis

Conclusions

1. The four-factor interaction is not significant: G2(3|4) = 0.7 (p = 0.705).

2. The all-three-factor model provides an adequate fit: G2(3) = 0.7 (p = 0.705).

3. The three-factor interactions are jointly not significantly different from 0: G2(2|3) = 9.2 (p = 0.239).

4. The all-two-factor model provides an adequate fit: G2(2) = 9.9 (p = 0.359).

5. There are significant two-factor interactions: G2(1|2) = 33.0 (p = 0.00013).

Conclude that the model should contain main effects and some two-factor interactions.

Page 113: Discrete Multivariate Analysis

There also may be a natural sequence of progressively complicated models that one might want to identify. In the laundry detergent example the variables are:

1. Softness of laundry used
2. Previous use of Brand M
3. Temperature of laundry water used
4. Preference of brand X over brand M

Page 114: Discrete Multivariate Analysis

A natural order for increasingly complex models which should be considered might be:

1. [1][2][3][4]: the all-main-effects model (independence amongst all four variables).
2. [1][3][24]: since previous use of brand M may be highly related to preference for brand M, add the 2-4 interaction first.
3. [1][34][24]: brand M is recommended for hot water, so add the 3-4 interaction second.
4. [13][34][24]: brand M is also recommended for soft laundry, so add the 1-3 interaction third.
5. [13][234]
6. [123][234]

Models 5 and 6 finally add some possible three-factor interactions.

Page 115: Discrete Multivariate Analysis

Likelihood Ratio G2 for various models

Model                     d.f.   G2
[1][3][24]                17     22.4
[1][24][34]               16     18.0
[13][24][34]              14     11.9
[13][23][24][34]          13     11.2
[12][13][23][24][34]      11     10.1
[1][234]                  14     14.5
[134][24]                 10     12.2
[13][234]                 12      8.4
[24][34][123]              9      8.4
[123][234]                 8      5.6

Page 116: Discrete Multivariate Analysis

Table 2: A Partitioning of the Likelihood Ratio Chi-Square Statistic for Complete Independence
(Model (a) = [1][2][3][4], Model (b) = [1][3][24], Model (c) = [1][24][34], Model (d) = [13][24][34], Model (e) = [13][234], Model (f) = [123][234])

Component                                   d.f.   G2
Model (a)                                   18     42.9*
Difference between models (b) and (a)        1     20.5*
Model (b)                                   17     22.4
Difference between models (c) and (b)        1      4.4*
Model (c)                                   16     18.0
Difference between models (d) and (c)        2      6.1*
Model (d)                                   14     11.9
Difference between models (e) and (d)        2      3.5
Model (e)                                   12      8.4
Difference between models (f) and (e)        4      2.8
Model (f)                                    8      5.6

* significant at the 5% level
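The partition is additive: the G2 of each model splits into the conditional G2 for the next step plus the G2 of the larger model, so the components telescope back to G2 = 42.9 on 18 d.f. for model (a). A small Python check of that arithmetic, using the values in Table 2:

```python
# Nested sequence from Table 2: (model label, d.f., G2).
MODELS = [("a", 18, 42.9), ("b", 17, 22.4), ("c", 16, 18.0),
          ("d", 14, 11.9), ("e", 12, 8.4), ("f", 8, 5.6)]

# Each step: d.f. and G2 of the simpler model minus those of the next model.
steps = []
for (prev, df_prev, g_prev), (cur, df_cur, g_cur) in zip(MODELS, MODELS[1:]):
    steps.append((f"({cur}) vs ({prev})", df_prev - df_cur, round(g_prev - g_cur, 1)))

# The step components plus the last model's G2 telescope back to model (a).
total_df = sum(df for _, df, _ in steps) + MODELS[-1][1]
total_g2 = sum(g for _, _, g in steps) + MODELS[-1][2]

for name, df, g2 in steps:
    print(f"{name}: d.f. = {df}, G2 = {g2}")
print(f"total: d.f. = {total_df}, G2 = {total_g2:.1f}")
```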

Page 117: Discrete Multivariate Analysis

Discrete Multivariate Analysis

Analysis of Multivariate Categorical Data

Page 118: Discrete Multivariate Analysis

Log-Linear model for three-way tables

Let $\mu_{ijk}$ denote the expected frequency in cell (i,j,k) of the table; then in general

$$\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)} + u_{23(j,k)} + u_{123(i,j,k)}$$

where

$$\sum_i u_{1(i)} = \sum_j u_{2(j)} = \sum_k u_{3(k)} = 0, \qquad \sum_i u_{12(i,j)} = \sum_j u_{12(i,j)} = 0,$$

$$\sum_i u_{13(i,k)} = \sum_k u_{13(i,k)} = \sum_j u_{23(j,k)} = \sum_k u_{23(j,k)} = 0,$$

$$\sum_i u_{123(i,j,k)} = \sum_j u_{123(i,j,k)} = \sum_k u_{123(i,j,k)} = 0.$$

Page 119: Discrete Multivariate Analysis

Hierarchical Log-linear models for categorical Data

For three way tables

The hierarchical principle: if an interaction is in the model, also keep the lower-order interactions and main effects associated with that interaction.

Page 120: Discrete Multivariate Analysis

Models for three-way tables

Page 121: Discrete Multivariate Analysis

1. Model (all-main-effects model): ln μijk = u + u1(i) + u2(j) + u3(k)

i.e. u12(i,j) = u13(i,k) = u23(j,k) = u123(i,j,k) = 0.

Notation: [1][2][3]

Description: Mutual independence between all three variables.

Comment: For any model the parameters (u, u1(i), u2(j), u3(k)) can be estimated in addition to the expected frequencies (μijk) in each cell.

Page 122: Discrete Multivariate Analysis

2. Model: ln μijk = u + u1(i) + u2(j) + u3(k) + u12(i,j)

i.e. u13(i,k) = u23(j,k) = u123(i,j,k) = 0.

Notation: [12][3]

Description: Independence of Variable 3 with variables 1 and 2.

Page 123: Discrete Multivariate Analysis

3. Model: ln μijk = u + u1(i) + u2(j) + u3(k) + u13(i,k)

i.e. u12(i,j) = u23(j,k) = u123(i,j,k) = 0.

Notation: [13][2]

Description: Independence of Variable 2 with variables 1 and 3.

Page 124: Discrete Multivariate Analysis

4. Model: ln μijk = u + u1(i) + u2(j) + u3(k) + u23(j,k)

i.e. u12(i,j) = u13(i,k) = u123(i,j,k) = 0.

Notation: [23][1]

Description: Independence of Variable 1 with variables 2 and 3.

Page 125: Discrete Multivariate Analysis

5. Model: ln μijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k)

i.e. u23(j,k) = u123(i,j,k) = 0.

Notation: [12][13]

Description: Conditional independence between variables 2 and 3 given variable 1.

Page 126: Discrete Multivariate Analysis

6. Model: ln μijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u23(j,k)

i.e. u13(i,k) = u123(i,j,k) = 0.

Notation: [12][23]

Description: Conditional independence between variables 1 and 3 given variable 2.

Page 127: Discrete Multivariate Analysis

7. Model: ln μijk = u + u1(i) + u2(j) + u3(k) + u13(i,k) + u23(j,k)

i.e. u12(i,j) = u123(i,j,k) = 0.

Notation: [13][23]

Description: Conditional independence between variables 1 and 2 given variable 3.

Page 128: Discrete Multivariate Analysis

8. Model: ln μijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k)

i.e. u123(i,j,k) = 0.

Notation: [12][13][23]

Description: Pairwise relations among all three variables, with each two-variable interaction unaffected by the value of the third variable.

Page 129: Discrete Multivariate Analysis

9. Model (the saturated model): ln μijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) + u123(i,j,k)

Notation: [123]

Description: No simplifying dependence structure.
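The d.f. for any of these hierarchical models can be computed mechanically: list every u-term implied by the generating class (the hierarchical principle), let each term contribute the product of (levels - 1) over its variables, and subtract the total from the number of cells. A minimal Python sketch; the function names `hierarchical_terms` and `model_df` are illustrative only:

```python
from itertools import combinations
from math import prod


def hierarchical_terms(generators):
    """All u-terms implied by a generating class such as [(1, 2), (1, 3)].

    The empty tuple stands for the constant u.
    """
    terms = {()}
    for gen in generators:
        for r in range(1, len(gen) + 1):
            terms.update(combinations(sorted(gen), r))
    return terms


def model_df(generators, levels):
    """d.f. = number of cells - number of independent parameters.

    `levels` maps each variable to its number of categories; each u-term
    contributes the product of (levels - 1) over its variables.
    """
    cells = prod(levels.values())
    n_params = sum(prod(levels[v] - 1 for v in term)
                   for term in hierarchical_terms(generators))
    return cells - n_params


two_by_two_by_two = {1: 2, 2: 2, 3: 2}
for name, gens in [("[1][2][3]", [(1,), (2,), (3,)]),
                   ("[12][3]", [(1, 2), (3,)]),
                   ("[12][13]", [(1, 2), (1, 3)]),
                   ("[12][13][23]", [(1, 2), (1, 3), (2, 3)]),
                   ("[123]", [(1, 2, 3)])]:
    print(name, model_df(gens, two_by_two_by_two))
```

The same function reproduces the d.f. of the Occupation, Education, Aptitude models earlier (4 × 4 × 5 table), e.g. 48 for [Apt,Ed][Occ,Ed].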

Page 130: Discrete Multivariate Analysis

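Goodness of Fit Statistics

The chi-squared statistic:

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \sum_{i,j,k} \frac{(x_{ijk} - \hat{\mu}_{ijk})^2}{\hat{\mu}_{ijk}}$$

The likelihood ratio statistic:

$$G^2 = 2\sum \text{Observed}\,\ln\left(\frac{\text{Observed}}{\text{Expected}}\right) = 2\sum_{i,j,k} x_{ijk} \ln\left(\frac{x_{ijk}}{\hat{\mu}_{ijk}}\right)$$

where $x_{ijk}$ is the observed frequency and $\hat{\mu}_{ijk}$ the expected (fitted) frequency in cell (i,j,k).

d.f. = # cells - # parameters fitted

We reject the model if χ² or G² is greater than $\chi^2_{\alpha}$, the upper α critical value of the chi-square distribution with these d.f.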

Page 131: Discrete Multivariate Analysis

Conditional Test Statistics

Page 132: Discrete Multivariate Analysis

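In this case one is interested in testing whether the differences in the expected frequencies between Model 1 and Model 2 are simply due to random variation. The likelihood ratio chi-square statistic that achieves this goal is:

$$G^2(2|1) = G^2(2) - G^2(1) = 2\sum \text{Observed}\,\ln\left(\frac{\text{Expected}_1}{\text{Expected}_2}\right), \qquad df(2|1) = df(2) - df(1)$$

where Expected_1 and Expected_2 are the expected frequencies under Model 1 and Model 2.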

Page 133: Discrete Multivariate Analysis

Stepwise selection procedures

• Forward Selection

• Backward Elimination

Page 134: Discrete Multivariate Analysis

Forward Selection: Starting with a model that underfits the data, log-linear parameters that are not in the model are added step by step until a model that does fit is achieved. At each step the log-linear parameter that is most significant is added to the model. To determine the significance of a parameter added we use the statistic:

G2(2|1) = G2(2) - G2(1)

where Model 1 contains the parameter and Model 2 does not contain the parameter.

Page 135: Discrete Multivariate Analysis

Backward Elimination: Starting with a model that overfits the data, log-linear parameters that are in the model are deleted step by step until a model is achieved that continues to fit the data and has the smallest number of parameters. At each step the log-linear parameter that is least significant is deleted from the model.

To determine the significance of a parameter deleted we use the statistic:

G2(2|1) = G2(2) - G2(1)

where Model 1 contains the parameter and Model 2 does not contain the parameter.

Page 136: Discrete Multivariate Analysis

Example: Fitting a Log-linear Model – Forward Selection

Table: Dyke-Patterson Data. N = 1729 individuals classified according to five variables: (1) Reading Newspapers, (2) Listen to radio, (3) Do "solid" reading, (4) Attend Lectures, (5) Knowledge regarding cancer.

                               Radio                                No Radio
                       Solid Reading   No Solid Reading    Solid Reading   No Solid Reading
Newspaper   Lectures   Good   Poor     Good   Poor         Good   Poor     Good   Poor
Newspaper   Lectures     23      8        8      4           27     18        7      6
            None        102     67       35     59          201    177       75    156
None        Lectures      1      3        4      3            3      8        2     10
            None         16     16       13     50           67     83       84    393

Page 137: Discrete Multivariate Analysis

MODEL                           D.F.  CHI-SQUARE  PROB    CHI-SQUARE  PROB
-----                           ----  ----------  ----    ----------  ----
K,L,N,S,R.                        26      596.84  0.0000      751.31  0.0000

MODELS FORMED BY ADDING TERMS TO MODEL -- K,L,N,S,R.

                                      LIKELIHOOD-RATIO       PEARSON
MODEL                           D.F.  CHI-SQUARE  PROB    CHI-SQUARE  PROB
-----                           ----  ----------  ----    ----------  ----
KL,N,S,R.                         25      579.68  0.0000      691.18  0.0000
  DIFF. DUE TO ADDING KL.          1       17.16  0.0000
KN,L,S,R.                         25      491.06  0.0000      533.89  0.0000
  DIFF. DUE TO ADDING KN.          1      105.78  0.0000
KS,L,N,R.                         25      446.39  0.0000      497.12  0.0000
  DIFF. DUE TO ADDING KS.          1      150.45  0.0000
KR,L,N,S.                         25      572.59  0.0000      674.61  0.0000
  DIFF. DUE TO ADDING KR.          1       24.25  0.0000
K,LN,S,R.                         25      575.24  0.0000      688.89  0.0000
  DIFF. DUE TO ADDING LN.          1       21.60  0.0000
K,LS,N,R.                         25      573.09  0.0000      692.25  0.0000
  DIFF. DUE TO ADDING LS.          1       23.74  0.0000
K,LR,N,S.                         25      577.89  0.0000      698.17  0.0000
  DIFF. DUE TO ADDING LR.          1       18.95  0.0000
K,L,NS,R.                         25      343.13  0.0000      383.90  0.0000
  DIFF. DUE TO ADDING NS.          1      253.71  0.0000
K,L,NR,S.                         25      522.61  0.0000      615.20  0.0000
  DIFF. DUE TO ADDING NR.          1       74.23  0.0000
K,L,N,SR.                         25      575.76  0.0000      680.88  0.0000
  DIFF. DUE TO ADDING SR.          1       21.08  0.0000

STEP 1. BEST MODEL FOUND IS -- K,L,NS,R.

K = knowledge

N = Newspaper

R = Radio

S = Reading

L = Lectures
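For step 1 the DIFF. values need no iterative fitting: adding one two-factor term to the all-main-effects model gives a model with closed-form fitted values, and the drop in G2 equals the G2 for independence in the corresponding two-way marginal table. A minimal Python sketch that reproduces the step-1 comparison from the counts in the table above; the cell layout, the 0/1 level coding and the helper names are assumptions of this sketch, not the program that produced the output.

```python
import math
from itertools import combinations

# Dyke-Patterson counts. Rows: (Newspaper, Lectures) = (yes, yes), (yes, no),
# (no, yes), (no, no); columns: (Radio, Solid reading, Knowledge) with
# Knowledge varying fastest. Level 0 = yes/good, 1 = no/poor.
ROWS = [
    [23, 8, 8, 4, 27, 18, 7, 6],
    [102, 67, 35, 59, 201, 177, 75, 156],
    [1, 3, 4, 3, 3, 8, 2, 10],
    [16, 16, 13, 50, 67, 83, 84, 393],
]
VARS = "KLNSR"  # position of each variable in a cell key


def cells():
    """Yield ((k, l, n, s, r), count) for all 32 cells."""
    for row, counts in enumerate(ROWS):
        n, l = divmod(row, 2)
        for col, x in enumerate(counts):
            r, rest = divmod(col, 4)
            s, k = divmod(rest, 2)
            yield (k, l, n, s, r), x


def margin(axes):
    """Marginal counts over the given key positions."""
    table = {}
    for key, x in cells():
        sub = tuple(key[a] for a in axes)
        table[sub] = table.get(sub, 0) + x
    return table


def g2(pairs):
    """Likelihood-ratio statistic 2 * sum(x * ln(x / m)) over (x, m) pairs."""
    return 2 * sum(x * math.log(x / m) for x, m in pairs if x > 0)


N = sum(x for _, x in cells())
one_way = [margin((a,)) for a in range(len(VARS))]

# Baseline K,L,N,S,R (mutual independence): m = N * product of marginal proportions.
g2_indep = g2(
    (x, N * math.prod(one_way[a][(key[a],)] / N for a in range(len(VARS))))
    for key, x in cells()
)


def diff_due_to(a, b):
    """Drop in G2 from adding the a-b term to the independence model,
    which equals G2 for independence in the two-way a x b margin."""
    ab = margin((a, b))
    return g2((x, one_way[a][(i,)] * one_way[b][(j,)] / N) for (i, j), x in ab.items())


diffs = {VARS[a] + VARS[b]: diff_due_to(a, b) for a, b in combinations(range(len(VARS)), 2)}
best = max(diffs, key=diffs.get)

print(f"K,L,N,S,R.  G2 = {g2_indep:.2f}")
for term, d in sorted(diffs.items(), key=lambda kv: -kv[1]):
    print(f"DIFF. DUE TO ADDING {term}: {d:.2f}")
print("best term to add:", best)
```

Later steps add terms to models without closed-form estimates, so each candidate there needs iterative proportional fitting (or another maximum likelihood routine).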

Page 138: Discrete Multivariate Analysis

KL,NS,R.                          24      325.97  0.0000      339.14  0.0000
  DIFF. DUE TO ADDING KL.          1       17.16  0.0000
KN,L,NS,R.                        24      237.35  0.0000      258.87  0.0000
  DIFF. DUE TO ADDING KN.          1      105.78  0.0000
KS,L,NS,R.                        24      192.68  0.0000      216.12  0.0000
  DIFF. DUE TO ADDING KS.          1      150.45  0.0000
KR,L,NS.                          24      318.88  0.0000      329.40  0.0000
  DIFF. DUE TO ADDING KR.          1       24.25  0.0000
K,LN,NS,R.                        24      321.53  0.0000      341.35  0.0000
  DIFF. DUE TO ADDING LN.          1       21.60  0.0000
K,LS,NS,R.                        24      319.39  0.0000      348.68  0.0000
  DIFF. DUE TO ADDING LS.          1       23.75  0.0000
K,LR,NS.                          24      324.18  0.0000      341.62  0.0000
  DIFF. DUE TO ADDING LR.          1       18.95  0.0000
K,L,NR,NS.                        24      268.90  0.0000      280.86  0.0000
  DIFF. DUE TO ADDING NR.          1       74.23  0.0000
K,L,SR,NS.                        24      322.05  0.0000      347.33  0.0000
  DIFF. DUE TO ADDING SR.          1       21.08  0.0000

STEP 2. BEST MODEL FOUND IS -- KS,L,NS,R.

Page 139: Discrete Multivariate Analysis

KL,KS,NS,R.                       23      175.52  0.0000      182.86  0.0000
  DIFF. DUE TO ADDING KL.          1       17.16  0.0000
KN,KS,L,NS,R.                     23      152.96  0.0000      163.87  0.0000
  DIFF. DUE TO ADDING KN.          1       39.72  0.0000
KR,KS,L,NS.                       23      168.43  0.0000      173.32  0.0000
  DIFF. DUE TO ADDING KR.          1       24.25  0.0000
KS,LN,NS,R.                       23      171.08  0.0000      184.56  0.0000
  DIFF. DUE TO ADDING LN.          1       21.60  0.0000
LS,KS,NS,R.                       23      168.93  0.0000      202.28  0.0000
  DIFF. DUE TO ADDING LS.          1       23.74  0.0000
KS,LR,NS.                         23      173.73  0.0000      178.08  0.0000
  DIFF. DUE TO ADDING LR.          1       18.95  0.0000
KS,L,NR,NS.                       23      118.45  0.0000      128.83  0.0000
  DIFF. DUE TO ADDING NR.          1       74.23  0.0000
SR,KS,L,NS.                       23      171.60  0.0000      198.23  0.0000
  DIFF. DUE TO ADDING SR.          1       21.08  0.0000

STEP 3. BEST MODEL FOUND IS -- KS,L,NR,NS.

Page 140: Discrete Multivariate Analysis

LN,KL,SR,KR,KN,LR,LS,KS,NR,NS.    16       19.56  0.2406       21.21  0.1706
  DIFF. DUE TO ADDING SR.          1        0.42  0.5147
KLN,KR,LR,LS,KS,NR,NS.            16       18.86  0.2762       21.53  0.1589
  DIFF. DUE TO ADDING KLN.         1        1.13  0.2878
LN,KLS,KR,KN,LR,NR,NS.            16       15.99  0.4538       15.63  0.4794
  DIFF. DUE TO ADDING KLS.         1        4.00  0.0456
LN,KLR,KN,LS,KS,NR,NS.            16       19.28  0.2543       20.81  0.1860
  DIFF. DUE TO ADDING KLR.         1        0.70  0.4015
LN,KL,KR,KNS,LR,LS,NR.            16       16.78  0.4000       18.74  0.2821
  DIFF. DUE TO ADDING KNS.         1        3.21  0.0733
LN,KL,KNR,LR,LS,KS,NS.            16       19.90  0.2247       21.27  0.1682
  DIFF. DUE TO ADDING KNR.         1        0.09  0.7704
LNS,KL,KR,KN,LR,KS,NR.            16       19.58  0.2397       20.98  0.1794
  DIFF. DUE TO ADDING LNS.         1        0.41  0.5239
LNR,KL,KR,KN,LS,KS,NS.            16       18.11  0.3176       18.80  0.2790
  DIFF. DUE TO ADDING LNR.         1        1.88  0.1706

STEP 10. BEST MODEL FOUND IS -- LN,KLS,KR,KN,LR,NR,NS.

Continuing after 10 steps

Page 141: Discrete Multivariate Analysis

LN,SR,KLS,KR,KN,LR,NR,NS.         15       15.55  0.4127       15.15  0.4406
  DIFF. DUE TO ADDING SR.          1        0.44  0.5072
KLN,KLS,KR,LR,NR,NS.              15       12.98  0.6041       13.84  0.5379
  DIFF. DUE TO ADDING KLN.         1        3.01  0.0827
LN,KLR,KLS,KN,NR,NS.              15       15.10  0.4446       15.06  0.4471
  DIFF. DUE TO ADDING KLR.         1        0.89  0.3446
LN,KNS,KLS,KR,LR,NR.              15       13.21  0.5861       13.19  0.5878
  DIFF. DUE TO ADDING KNS.         1        2.78  0.0955
LN,KLS,KNR,LR,NS.                 15       15.93  0.3870       15.48  0.4173
  DIFF. DUE TO ADDING KNR.         1        0.06  0.8034
LNS,KLS,KR,KN,LR,NR.              15       15.87  0.3905       15.60  0.4089
  DIFF. DUE TO ADDING LNS.         1        0.12  0.7343
LNR,KLS,KR,KN,NS.                 15       14.23  0.5085       13.75  0.5446
  DIFF. DUE TO ADDING LNR.         1        1.76  0.1842

STEP 11. BEST MODEL FOUND IS -- KLN,KLS,KR,LR,NR,NS.

The final step

Page 142: Discrete Multivariate Analysis

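The best model was found at the previous step:

• [LN][KLS][KR][KN][LR][NR][NS]

At step 11 the best candidate, KLN, reduces G2 by only 3.01 on 1 d.f. (p = 0.0827), which is not significant at the 5% level, so no further term is added.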

Page 143: Discrete Multivariate Analysis

Modelling of response variables

Independent → Dependent

Page 144: Discrete Multivariate Analysis

Logit Models

To date we have not worried whether any of the variables were dependent or independent variables. The logit model is used when we have a single binary dependent variable.

Page 145: Discrete Multivariate Analysis

Example: Logit Models

Table: The Effect of Planting Depth on Mortality of Pine Seedlings

                     Longleaf Seedlings        Slash Seedlings
Depth of Planting    Dead   Alive   Totals     Dead   Alive   Totals
Too High              41     59      100        12     88      100
Too Low               11     89      100         5     95      100
Totals                52    148      200        17    183      200

Table: Loglinear Models Fit to Data in Above Table and their Goodness of Fit Statistics

Model            X2       G2      df
[12][13][23]      1.37     1.28   1
[13][23]         26.54    27.79   2
[12][13]         24.03    25.03   2
[13][2]          54.70    50.10   3
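For [13][2] (Mortality independent of Depth and Type jointly) the expected frequencies have a closed form: each Depth × Type total times the overall proportion dead or alive. A minimal Python sketch that computes both goodness-of-fit statistics for that model from the seedling counts (the dictionary layout is a choice made for this sketch):

```python
import math

# Pine seedling counts: (type, depth) -> (dead, alive).
COUNTS = {
    ("Longleaf", "Too high"): (41, 59),
    ("Longleaf", "Too low"): (11, 89),
    ("Slash", "Too high"): (12, 88),
    ("Slash", "Too low"): (5, 95),
}

n = sum(dead + alive for dead, alive in COUNTS.values())
dead_total = sum(dead for dead, _ in COUNTS.values())

# Model [13][2]: expected = (Depth x Type total) * (Mortality total) / n.
x2 = g2 = 0.0
for dead, alive in COUNTS.values():
    row_total = dead + alive
    for observed, mort_total in ((dead, dead_total), (alive, n - dead_total)):
        expected = row_total * mort_total / n
        x2 += (observed - expected) ** 2 / expected
        g2 += 2 * observed * math.log(observed / expected)

# d.f. = 8 cells - (constant + 3 main effects + Depth x Type term) = 3.
print(f"X2 = {x2:.2f}, G2 = {g2:.2f}, d.f. = 3")
```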

Page 146: Discrete Multivariate Analysis

The variables

1. Type of seedling (T)
   a. Longleaf seedling
   b. Slash seedling

2. Depth of planting (D)
   a. Too low
   b. Too high

3. Mortality (M) (the dependent variable)
   a. Dead
   b. Alive

Page 147: Discrete Multivariate Analysis

The Log-linear Model

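Note: μij1 = # dead when T = i and D = j, and μij2 = # alive when T = i and D = j.

$$\ln \mu_{ijk} = u + u_{T(i)} + u_{D(j)} + u_{M(k)} + u_{TD(i,j)} + u_{TM(i,k)} + u_{DM(j,k)} + u_{TDM(i,j,k)}$$

$$\frac{\mu_{ij1}}{\mu_{ij2}} = \frac{\text{dead}}{\text{alive}} = \text{mortality ratio when } T = i \text{ and } D = j.$$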

Page 148: Discrete Multivariate Analysis

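Hence

$$\ln \mu_{ij1} = u + u_{T(i)} + u_{D(j)} + u_{M(1)} + u_{TD(i,j)} + u_{TM(i,1)} + u_{DM(j,1)} + u_{TDM(i,j,1)}$$

$$\ln \mu_{ij2} = u + u_{T(i)} + u_{D(j)} + u_{M(2)} + u_{TD(i,j)} + u_{TM(i,2)} + u_{DM(j,2)} + u_{TDM(i,j,2)}$$

and the log-mortality ratio is

$$\ln\frac{\mu_{ij1}}{\mu_{ij2}} = \ln \mu_{ij1} - \ln \mu_{ij2} = 2u_{M(1)} + 2u_{TM(i,1)} + 2u_{DM(j,1)} + 2u_{TDM(i,j,1)}$$

since

$$u_{M(2)} = -u_{M(1)}, \quad u_{TM(i,2)} = -u_{TM(i,1)}, \quad u_{DM(j,2)} = -u_{DM(j,1)}, \quad u_{TDM(i,j,2)} = -u_{TDM(i,j,1)}.$$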

Page 149: Discrete Multivariate Analysis

The logit model:

$$\ln\frac{\mu_{ij1}}{\mu_{ij2}} = v + v_{T(i)} + v_{D(j)} + v_{TD(i,j)} \quad (\text{log-mortality ratio})$$

where

$$v = 2u_{M(1)}, \quad v_{T(i)} = 2u_{TM(i,1)}, \quad v_{D(j)} = 2u_{DM(j,1)}, \quad v_{TD(i,j)} = 2u_{TDM(i,j,1)}.$$

Page 150: Discrete Multivariate Analysis

Thus, corresponding to a loglinear model there is a logit model predicting the log ratio of the expected frequencies of the two categories of the dependent variable.

Also, (k + 1)-factor interactions with the dependent variable in the loglinear model determine k-factor interactions in the logit model:

• k + 1 = 1: the constant term in the logit model
• k + 1 = 2: main effects in the logit model

Page 151: Discrete Multivariate Analysis

Example: Logit Models

Table: The Effect of Planting Depth on Mortality of Pine Seedlings

                     Longleaf Seedlings        Slash Seedlings
Depth of Planting    Dead   Alive   Totals     Dead   Alive   Totals
Too High              41     59      100        12     88      100
Too Low               11     89      100         5     95      100
Totals                52    148      200        17    183      200

Table: Loglinear Models Fit to Data in Above Table and their Goodness of Fit Statistics

Model            X2       G2      df
[12][13][23]      1.37     1.28   1
[13][23]         26.54    27.79   2
[12][13]         24.03    25.03   2
[13][2]          54.70    50.10   3

1 = Depth, 2 = Mort, 3 = Type

Page 152: Discrete Multivariate Analysis

Log-Linear Parameters for Model: [TM][TD][DM]

Main Effects:

Mort      Dead    -0.946     Alive    0.946
Type      Lleaf    0.240     Slash   -0.240
Depth     high     0.257     low     -0.257

Two-Factor Interactions:

Type-Mort        Dead      Alive
Lleaf            0.354    -0.354
Slash           -0.354     0.354

Depth-Mort       Dead      Alive
high             0.376    -0.376
low             -0.376     0.376

Type-Depth       Lleaf     Slash
high            -0.063     0.063
low              0.063    -0.063

Page 153: Discrete Multivariate Analysis

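Logit Model for predicting the Mortality

$$\ln MR_{ij} = v + v_{T(i)} + v_{D(j)}$$

or

$$MR_{ij} = \frac{\text{dead}}{\text{alive}} = e^{v}\, e^{v_{T(i)}}\, e^{v_{D(j)}}$$

where MR_ij is the mortality ratio when T = i and D = j.

                   Log-Linear   Logit    Mult
Constant             -0.946     -1.892   0.151
Depth   High          0.376      0.752   2.121
        Low          -0.376     -0.752   0.471
Type    Longleaf      0.354      0.708   2.030
        Slash        -0.354     -0.708   0.493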
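The [TM][TD][DM] model has no closed-form estimates, so its fitted values come from iterative proportional fitting: the fitted table is rescaled in turn to match the observed Type × Depth, Type × Mortality and Depth × Mortality margins. A minimal pure-Python sketch (Too high coded as the first depth level; function names are hypothetical) that fits the model, recovers the effect-coded log-linear parameters and forms the logit parameters as twice the corresponding interactions with Mortality:

```python
import math
from itertools import product

# Observed counts X[t][d][m]: t = 0 Longleaf, 1 Slash;
# d = 0 Too high, 1 Too low; m = 0 Dead, 1 Alive.
X = [[[41, 59], [11, 89]],
     [[12, 88], [5, 95]]]
CELLS = list(product(range(2), repeat=3))


def get(table, cell):
    t, d, m = cell
    return table[t][d][m]


def fit_no_three_factor(x, sweeps=200):
    """Iterative proportional fitting of [TD][TM][DM] (no three-factor term)."""
    fit = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
    for _ in range(sweeps):
        for keep in ((0, 1), (0, 2), (1, 2)):  # TD, TM and DM margins in turn
            obs, cur = {}, {}
            for cell in CELLS:
                key = tuple(cell[i] for i in keep)
                obs[key] = obs.get(key, 0.0) + get(x, cell)
                cur[key] = cur.get(key, 0.0) + get(fit, cell)
            for t, d, m in CELLS:
                key = tuple((t, d, m)[i] for i in keep)
                fit[t][d][m] *= obs[key] / cur[key]
    return fit


def u_term(fit, axes):
    """Effect-coded u-term at level 0 of each variable in `axes`:
    (1/8) * sum over cells of (+1 or -1 per variable) * ln(fitted count)."""
    total = 0.0
    for cell in CELLS:
        sign = 1
        for a in axes:
            sign *= 1 if cell[a] == 0 else -1
        total += sign * math.log(get(fit, cell))
    return total / len(CELLS)


fit = fit_no_three_factor(X)
g2 = 2 * sum(get(X, c) * math.log(get(X, c) / get(fit, c)) for c in CELLS)

u_m = u_term(fit, (2,))      # Mort: Dead
u_tm = u_term(fit, (0, 2))   # Type-Mort: Longleaf, Dead
u_dm = u_term(fit, (1, 2))   # Depth-Mort: Too high, Dead
u_d = u_term(fit, (1,))      # Depth: Too high
u_td = u_term(fit, (0, 1))   # Type-Depth: Longleaf, Too high

# Logit parameters for ln(dead/alive) are twice the log-linear ones.
print(f"G2 = {g2:.2f} on 1 d.f.")
print(f"logit: constant {2 * u_m:.3f}, Longleaf {2 * u_tm:.3f}, Too high {2 * u_dm:.3f}")
```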

Page 154: Discrete Multivariate Analysis

Example: Fitting a Log-linear Model – Forward Selection

Table: Dyke-Patterson Data. N = 1729 individuals classified according to five variables: (1) Reading Newspapers, (2) Listen to radio, (3) Do "solid" reading, (4) Attend Lectures, (5) Knowledge regarding cancer.

                               Radio                                No Radio
                       Solid Reading   No Solid Reading    Solid Reading   No Solid Reading
Newspaper   Lectures   Good   Poor     Good   Poor         Good   Poor     Good   Poor
Newspaper   Lectures     23      8        8      4           27     18        7      6
            None        102     67       35     59          201    177       75    156
None        Lectures      1      3        4      3            3      8        2     10
            None         16     16       13     50           67     83       84    393

Page 155: Discrete Multivariate Analysis

The best model found by forward selection was [LN][KLS][KR][KN][LR][NR][NS].

To fit a logit model to predict K (Knowledge) we need to fit a loglinear model with the important interactions with K (Knowledge), namely

[LNRS][KLS][KR][KN]

The logit model will contain:

• main effects for L (Lectures), N (Newspapers), R (Radio) and S (Reading)
• a two-factor interaction effect for L and S

Page 156: Discrete Multivariate Analysis

The Logit Parameters for the Model: LNSR, KLS, KR, KN
(Multiplicative effects are given in brackets; Logit Parameters = 2 × Loglinear parameters. The response is the log of K = the good/poor ratio for Knowledge.)

The constant term: -0.226 (0.798)

The main effects on Knowledge:

Lectures     Lect     0.268 (1.307)     None   -0.268 (0.765)
Newspaper    News     0.324 (1.383)     None   -0.324 (0.723)
Reading      Solid    0.340 (1.405)     Not    -0.340 (0.712)
Radio        Radio    0.150 (1.162)     None   -0.150 (0.861)

The two-factor interaction effect of Reading and Lectures on Knowledge:

              Reading
Lectures      Solid             Not
Lect          -0.180 (0.835)     0.180 (1.197)
None           0.180 (1.197)    -0.180 (0.835)
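Because the logit parameters are additive on the log-odds scale, the fitted log-odds of good versus poor knowledge for a respondent profile is the constant plus the relevant main effects plus the Lectures × Reading term; exponentiating gives the odds. A minimal Python sketch using the parameters above; it assumes the logit is ln(good/poor), and the dictionary keys are shortened labels chosen for the sketch.

```python
import math

# Logit parameters for ln(good/poor knowledge), model LNSR, KLS, KR, KN.
CONST = -0.226
MAIN = {
    "Lectures": {"Lect": 0.268, "None": -0.268},
    "Newspaper": {"News": 0.324, "None": -0.324},
    "Reading": {"Solid": 0.340, "Not": -0.340},
    "Radio": {"Radio": 0.150, "None": -0.150},
}
# Lectures x Reading interaction: (lectures level, reading level) -> effect.
LECT_BY_READING = {
    ("Lect", "Solid"): -0.180, ("Lect", "Not"): 0.180,
    ("None", "Solid"): 0.180, ("None", "Not"): -0.180,
}


def knowledge_logit(lectures, newspaper, reading, radio):
    """Fitted ln(good/poor) for one respondent profile."""
    levels = {"Lectures": lectures, "Newspaper": newspaper,
              "Reading": reading, "Radio": radio}
    logit = CONST + sum(MAIN[var][lvl] for var, lvl in levels.items())
    return logit + LECT_BY_READING[(lectures, reading)]


all_sources = knowledge_logit("Lect", "News", "Solid", "Radio")
no_sources = knowledge_logit("None", "None", "Not", "None")
print(f"all sources: logit {all_sources:.3f}, odds good:poor {math.exp(all_sources):.2f}")
print(f"no sources:  logit {no_sources:.3f}, odds good:poor {math.exp(no_sources):.2f}")
```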

Page 157: Discrete Multivariate Analysis

Fitting a Logit Model with a Polytomous Response Variable

Page 158: Discrete Multivariate Analysis

Example: Table

Observed Cross-Classification of 2294 Males Who Failed to Pass the Armed Forces Qualification Test

                   Father's        Respondent's Education
Race    Age        Education       Grammar School   Some HS   HS Graduate
White   < 22       GS               39               29         8
                   Some HS           4                8         1
                   HS Grad          11                9         6
                   NA               48               17         8
        ≥ 22       GS              231              115        51
                   Some HS          17               21        13
                   HS Grad          18               28        45
                   NA              197              111        35
Black   < 22       GS               19               40        19
                   Some HS           5               17         7
                   HS Grad           2               14         3
                   NA               49               79        24
        ≥ 22       GS              110              133       103
                   Some HS          18               38        25
                   HS Grad          11               25        18
                   NA              178              206        81

NA – Not available

Page 159: Discrete Multivariate Analysis

The variables

1. Race – white, black
2. Age – < 22, ≥ 22
3. Father's education – GS, some HS, HS grad, NA
4. Respondent's education – GS, some HS, HS grad – the response (dependent) variable

Page 160: Discrete Multivariate Analysis

Table: Various Loglinear Models Fit to the 3 × 4 × 2 × 2 Table above (in these model labels 1 = respondent's education, 2 = father's education, 3 = age, 4 = race)

Model                  d.f.   G2      p-value
[234][1]               30     254.8   0.0000
[234][12]              24     162.6   0.0000
[234][13]              28     242.7   0.0000
[234][14]              28     152.8   0.0000
[234][12][13]          22     151.5   0.0000
[234][12][14]          22      46.7   0.0016
[234][13][14]          26     142.5   0.0000
[234][12][13][14]      20      36.9   0.0120
[234][123][14]         14      27.9   0.0147
[234][124][13]         14      18.1   0.2023
[234][134][12]         18      33.2   0.0158
[234][123][124]         8       9.7   0.2867

Page 161: Discrete Multivariate Analysis

Techniques for handling a Polytomous Response Variable

Approaches:

1. Consider the categories 2 at a time. Do this for all possible pairs of the categories.
2. Look at the continuation ratios:
   i. 1 vs 2
   ii. 1,2 vs 3
   iii. 1,2,3 vs 4
   iv. etc.
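Both approaches turn the K response counts in one cell of the explanatory variables into K - 1 (continuation ratios) or K(K - 1)/2 (all pairs) log ratios. The pairwise logits are not all independent: the third equals the difference of the other two, which is why the "Some HS vs HS Grad" column in the effect tables that follow is the second column minus the first. A minimal Python sketch with hypothetical helper names, applied to the first row of the table above:

```python
import math


def pairwise_logits(counts):
    """ln(m_a / m_b) for every pair of response categories a < b (0-based)."""
    k = len(counts)
    return {(a, b): math.log(counts[a] / counts[b])
            for a in range(k) for b in range(a + 1, k)}


def continuation_logits(counts):
    """ln(m_{j+1} / (m_1 + ... + m_j)) for j = 1, ..., K - 1:
    category 2 vs 1, category 3 vs categories 1 and 2, and so on."""
    out, cumulative = [], 0
    for j in range(len(counts) - 1):
        cumulative += counts[j]
        out.append(math.log(counts[j + 1] / cumulative))
    return out


# Respondent's education (GS, Some HS, HS Grad) for White, age < 22,
# father's education GS.
row = [39, 29, 8]
print(pairwise_logits(row))
print(continuation_logits(row))
```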

Page 162: Discrete Multivariate Analysis

Table: Estimated Logit Effects for the Three Logit Models Corresponding to the Log-Linear Model [234][124][13]

                               Grammar vs Some HS   Grammar vs HS Grad   Some HS vs HS Grad
                               log(m1jkl/m2jkl)     log(m1jkl/m3jkl)     log(m2jkl/m3jkl)
Constant                       -0.289                0.451                0.740
Race           White            0.395                0.390               -0.005
               Black           -0.395               -0.390                0.005
Age            < 22            -0.120                0.099                0.219
               ≥ 22             0.120               -0.099               -0.219
Father's       Grammar          0.380                0.406                0.026
Education      Some HS         -0.371               -0.355                0.016
               HS Grad         -0.441               -0.918               -0.477
               NA               0.432                0.867                0.435

Race - Father's Education Interaction
White by       Grammar          0.063                0.345                0.282
               Some HS         -0.128               -0.016                0.112
               HS Grad          0.030               -0.429               -0.459
               NA               0.035                0.101                0.066
Black by       Grammar         -0.063               -0.345               -0.282
               Some HS          0.128                0.016               -0.112
               HS Grad         -0.030                0.429                0.459
               NA              -0.035               -0.101               -0.066

Page 163: Discrete Multivariate Analysis

Table: Multiplicative Logit Effects for the Three Logit Models Corresponding to the Log-Linear Model [234][124][13]

                               Grammar vs Some HS   Grammar vs HS Grad   Some HS vs HS Grad
                               m1jkl/m2jkl          m1jkl/m3jkl          m2jkl/m3jkl
Constant                       0.749                1.570                2.096
Race           White           1.484                1.477                0.995
               Black           0.674                0.677                1.005
Age            < 22            0.887                1.104                1.245
               ≥ 22            1.127                0.906                0.803
Father's       Grammar         1.462                1.501                1.026
Education      Some HS         0.690                0.701                1.016
               HS Grad         0.643                0.399                0.621
               NA              1.540                2.380                1.545

Race - Father's Education Interaction
White by       Grammar         1.065                1.412                1.326
               Some HS         0.880                0.984                1.119
               HS Grad         1.030                0.651                0.632
               NA              1.036                1.106                1.068
Black by       Grammar         0.939                0.708                0.754
               Some HS         1.137                1.016                0.894
               HS Grad         0.970                1.536                1.582
               NA              0.966                0.904                0.936

Page 164: Discrete Multivariate Analysis

Table: Various Logit Models for the Log Continuation Ratios in the First Table

(a) log(m2jkl/m1jkl)        (b) log(m3jkl/(m1jkl + m2jkl))

                         (a)             (b)             Combined Fit
Model                    d.f.   G2       d.f.   G2       d.f.   G2
[234][1]                 15     131.5    15     123.3    30     254.8
[234][12]                12      97.9    12      64.7    24     162.6
[234][13]                14     123.3    14     119.4    28     242.7
[234][14]                14      49.0    14     102.8    28     152.8
[234][12][13]            11      91.9    11      60.3    22     152.2
[234][12][14]            11      16.1    11      35.6    22      51.7
[234][13][14]            13      43.7    13      98.7    26     142.4
[234][12][13][14]        10      12.4    10      29.8    20      42.2
[234][123][14]            7       9.3     7      23.2    14      32.5
[234][124][13]            7       9.3     7      23.2    14      18.5
[234][134][12]            9       8.6     9      29.7    18      38.3
[234][123][124]           4       8.5     4       1.2     8       9.7

Page 165: Discrete Multivariate Analysis

Causal or Path Analysis for Categorical Data

Page 166: Discrete Multivariate Analysis

When the data are continuous, a causal pattern may be assumed to exist amongst the variables.

The path diagram: a diagram summarizing causal relationships.

• Straight arrows are drawn from a variable that has some causal effect on another variable (X → Y).
• Curved double-sided arrows are drawn between variables that are simply correlated (X ↔ Y).

Page 167: Discrete Multivariate Analysis

Example 1: The variables are Job Stress, Smoking and Heart Disease.

The path diagram: [figure: path diagram linking Job Stress, Smoking and Heart Disease]

In path analysis for continuous variables, one is interested in determining the contribution along each path (the path coefficients).

Page 168: Discrete Multivariate Analysis

Example 2: The variables are Job Stress, Alcoholic Drinking, Smoking and Heart Disease.

The path diagram: [figure: path diagram linking Job Stress, Drinking, Smoking and Heart Disease]

Page 169: Discrete Multivariate Analysis

In analysis of categorical data there are no path coefficients but path diagrams can point to the appropriate logit analysis

Example: In this example the data consist of two-wave, two-variable panel data for a sample of n = 3398 schoolboys. The study looks at "membership" in and "attitude towards" the leading crowd.

Page 170: Discrete Multivariate Analysis

The path diagram: [figure: path diagram over A, B, C and D]. This suggests predicting B from A, then C from A and B, and finally D from A, B and C.

Examples of Causal Analysis Using Recursive Systems of Logit Models

Example 1: Two-Wave Two-Variable Panel Data for 3398 Schoolboys: Membership in and Attitude toward the "Leading Crowd".

                                        Second Interview
                          Membership    +      +      -      -
First Interview           Attitude      +      -      +      -
Membership   Attitude
+            +                          458    140    110     49
+            -                          171    182     56     87
-            +                          184     75    531    281
-            -                           85     97    338    554

A = Membership at first interview, B = Attitude at first interview, C = Membership at second interview, D = Attitude at second interview

Page 171: Discrete Multivariate Analysis

Two-way Analysis for determining the effect of A on B

                        Attitude (B)
                        +        -
Membership (A)    +     757      496
                  -     1071     1074
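In a two-way table the effect of A on B is summarized by the difference between the logits of B within the two levels of A (the log odds ratio), and it can be tested with G2 for independence on 1 d.f. A minimal Python sketch for the table above (the dictionary layout is a choice made for this sketch):

```python
import math

# Membership at the first interview (A) by attitude at the first interview (B).
TABLE = {("+", "+"): 757, ("+", "-"): 496, ("-", "+"): 1071, ("-", "-"): 1074}

# Log-odds of a positive attitude within each membership group.
logit_b = {a: math.log(TABLE[(a, "+")] / TABLE[(a, "-")]) for a in "+-"}

# Effect of A on B: difference of the logits, i.e. the log odds ratio.
odds_ratio = math.exp(logit_b["+"] - logit_b["-"])

# G2 for independence of A and B (1 d.f.); expected = row * col / n.
n = sum(TABLE.values())
row = {a: TABLE[(a, "+")] + TABLE[(a, "-")] for a in "+-"}
col = {b: TABLE[("+", b)] + TABLE[("-", b)] for b in "+-"}
g2 = 2 * sum(x * math.log(x * n / (row[a] * col[b])) for (a, b), x in TABLE.items())

print(f"odds ratio = {odds_ratio:.3f}, G2 = {g2:.1f} on 1 d.f.")
```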

Page 172: Discrete Multivariate Analysis

Goodness of Fit Statistics for determining the effect of A, B on C

1. [AB][AC][BC] (1 df; G2 = 0.0)
2. [AB][BC] (2 df; G2 = 1005.1)
3. [AB][AC] (2 df; G2 = 27.2)

Identified Logit Model (Model # 1. [AB][AC][BC])

$$\text{logit}^{AB|C}_{ij} = \log\frac{m^{AB|C}_{ij1}}{m^{AB|C}_{ij2}} = w^{AB|C} + w^{AB|C}_{1(i)} + w^{AB|C}_{2(j)}$$
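Models 2 and 3 each have two generators sharing one variable, so their fitted values have a closed form (for [AB][BC], m = n(AB) × n(BC) / n(B)) and need no iteration. Because C comes before D in the causal order, the sketch works with the A × B × C table obtained by collapsing the panel data over D. A minimal Python sketch; the helper names are hypothetical:

```python
import math

# Panel counts collapsed over D (attitude at the second interview).
# Key: (A, B, C) = (membership 1st, attitude 1st, membership 2nd), levels "+"/"-".
ABC = {
    ("+", "+", "+"): 598, ("+", "+", "-"): 159,
    ("+", "-", "+"): 353, ("+", "-", "-"): 143,
    ("-", "+", "+"): 259, ("-", "+", "-"): 812,
    ("-", "-", "+"): 182, ("-", "-", "-"): 892,
}


def marginal(keep):
    """Marginal counts over the key positions in `keep`."""
    out = {}
    for cell, x in ABC.items():
        key = tuple(cell[i] for i in keep)
        out[key] = out.get(key, 0) + x
    return out


def g2_two_generators(first, second, shared):
    """G2 for a model like [AB][BC]: fitted m = n(first) * n(second) / n(shared)."""
    m1, m2, ms = marginal(first), marginal(second), marginal(shared)
    total = 0.0
    for cell, x in ABC.items():
        fitted = (m1[tuple(cell[i] for i in first)]
                  * m2[tuple(cell[i] for i in second)]
                  / ms[tuple(cell[i] for i in shared)])
        total += x * math.log(x / fitted)
    return 2 * total


g2_ab_bc = g2_two_generators((0, 1), (1, 2), (1,))  # model 2: no A-C term
g2_ab_ac = g2_two_generators((0, 1), (0, 2), (0,))  # model 3: no B-C term
print(f"[AB][BC]: G2 = {g2_ab_bc:.1f} on 2 d.f.")
print(f"[AB][AC]: G2 = {g2_ab_ac:.1f} on 2 d.f.")
```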

Page 173: Discrete Multivariate Analysis

Goodness of Fit Statistics for determining the effect of A, B, C on D

4. [ABC][AD][BD][CD] (4 df; G2 = 1.2)
5. [ABC][BD][CD] (5 df; G2 = 4.0)
6. [ABC][AD][CD] (5 df; G2 = 262.5)
7. [ABC][AD][BD] (5 df; G2 = 15.7)

Identified Logit Model (Model # 5. [ABC][BD][CD])

$$\text{logit}^{ABC|D}_{ijk} = w^{ABC|D} + w^{ABC|D}_{2(j)} + w^{ABC|D}_{3(k)}$$

Page 174: Discrete Multivariate Analysis

Example 2: In this example we are looking at

1. Social Economic Status (SES)
2. Sex
3. IQ
4. Parental Encouragement for Higher Education (PE)
5. College Plans (CP)

Page 175: Discrete Multivariate Analysis

Social Class, Parental Encouragement, IQ, and Educational Aspirations

                  College   Parental           SES
Sex   IQ          Plans     Encouragement      L      LM     UM     H
M     L           Yes       Low                  4      2      8      4
                            High                13     27     47     39
                  No        Low                349    232    166     48
                            High                64     84     91     57
      LM          Yes       Low                  9      7      6      5
                            High                33     64     74    123
                  No        Low                207    201    120     47
                            High                72     95    110     90
      UM          Yes       Low                 12     12     17      9
                            High                38     93    148    224
                  No        Low                126    115     92     41
                            High                54     92    100     65
      H           Yes       Low                 10     17      6      8
                            High                49    119    198    414
                  No        Low                 67     79     42     17
                            High                43     59     73     54
F     L           Yes       Low                  5     11      7      6
                            High                 9     29     36     36
                  No        Low                454    285    163     50
                            High                44     61     72     58
      LM          Yes       Low                  5     19     13      5
                            High                14     47     75    110
                  No        Low                312    236    193     70
                            High                47     88     90     76
      UM          Yes       Low                  8     12     12     12
                            High                20     62     91    230
                  No        Low                216    164    174     48
                            High                35     85    100     81
      H           Yes       Low                 13     15     20     13
                            High                28     72    142    360
                  No        Low                 96    113     81     49
                            High                24     50     77     98

Page 176: Discrete Multivariate Analysis

The Path Diagram

[figure: path diagram linking SES, Sex, IQ, PE and CP]

Page 177: Discrete Multivariate Analysis

The path diagram suggests

1. Predicting Parental Encouragement from Sex, SocioEconomic status, and IQ, then

2. Predicting College Plans from Parental Encouragement, Sex, SocioEconomic status, and IQ.

Page 178: Discrete Multivariate Analysis

Goodness of Fit Statistics for determining the effect of A, B, C on D
(A = Social class, B = IQ, C = Sex, D = Parental Encouragement, E = College Plans)

1. [ABC][AD][BD][CD] (24 df; G2 = 55.81)
2. [ABC][ABD][CD] (15 df; G2 = 34.60)
3. [ABC][BCD][ACD] (18 df; G2 = 31.48)
4. [ABC][ABD][BCD] (12 df; G2 = 22.44)
5. [ABC][ABD][ACD] (12 df; G2 = 22.45)
6. [ABC][ABD][ACD][BCD] (9 df; G2 = 9.22)

Page 179: Discrete Multivariate Analysis

Logit Parameters: Model [ABC][ABD][ACD][BCD]

Constant term $w^{ABC|D}$ = 0.124

Main Effects:

Social Class (L, LM, UM, H): $w^{ABC|D}_{1(i)}$ = -1.178, -0.384, 0.222, 1.340

IQ (L, LM, UM, H): $w^{ABC|D}_{2(j)}$ = -0.772, -0.226, 0.210, 0.788

Sex (M, F): $w^{ABC|D}_{3(k)}$ = 0.304, -0.304

Page 180: Discrete Multivariate Analysis

Two factor Interactions

IQ by Social Class

                   IQ
Social Class       L         LM        UM        H
L                 -0.016     0.098    -0.058    -0.026
LM                 0.066     0.032     0.144    -0.244
UM                 0.074    -0.044    -0.138     0.108
H                 -0.126    -0.086     0.048     0.164

Page 181: Discrete Multivariate Analysis

Social Class by Sex

Social Class       M         F
L                  0.140    -0.140
LM                -0.052     0.052
UM                 0.018    -0.018
H                 -0.106     0.106

IQ by Sex

IQ                 M         F
L                 -0.126     0.126
LM                -0.016     0.016
UM                 0.018    -0.018
H                  0.122    -0.122

Page 182: Discrete Multivariate Analysis

Goodness of Fit Statistics for determining the effect of A, B, C, D on E
(A = Social class, B = IQ, C = Sex, D = Parental Encouragement, E = College Plans)

7. [ABCD][E][CD] (63 df; G2 = 4497.51)
8. [ABCD][AE][BE][CE][DE] (55 df; G2 = 73.82)
9. [ABCD][BCE][AE][DE] (52 df; G2 = 59.55)

Page 183: Discrete Multivariate Analysis

Logit Parameters for Predicting College Plans Using Model 9:[ABCD][BCE][AE][DE]

Constant term $w^{ABCD|E}$ = -1.292

Main Effects:

Social Class (L, LM, UM, H): $w^{ABCD|E}_{1(i)}$ = -0.650, -0.200, 0.062, 0.790

IQ (L, LM, UM, H): $w^{ABCD|E}_{2(j)}$ = -0.840, -0.300, 0.266, 0.876

Sex (M, F): $w^{ABCD|E}_{3(k)}$ = 0.082, -0.082

Parental Encouragement (L, H): $w^{ABCD|E}_{4(l)}$ = -1.214, 1.214

Page 184: Discrete Multivariate Analysis

Two-Factor Interactions

IQ by Sex

IQ                 M         F
L                 -0.134     0.134
LM                -0.078     0.078
UM                 0.094    -0.094
H                  0.118    -0.118
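Under Model 9 the fitted logit for College Plans is the constant plus the Social Class, IQ, Sex and Parental Encouragement main effects plus the IQ × Sex interaction. A minimal Python sketch that evaluates it for a given profile. The slides do not state the direction of the logit, so the sketch assumes it is the log-odds of having college plans (Yes vs No); the function names are hypothetical.

```python
import math

# Logit parameters for College Plans under Model 9 [ABCD][BCE][AE][DE].
CONST = -1.292
SES = {"L": -0.650, "LM": -0.200, "UM": 0.062, "H": 0.790}
IQ = {"L": -0.840, "LM": -0.300, "UM": 0.266, "H": 0.876}
SEX = {"M": 0.082, "F": -0.082}
PE = {"L": -1.214, "H": 1.214}
IQ_BY_SEX = {
    ("L", "M"): -0.134, ("L", "F"): 0.134,
    ("LM", "M"): -0.078, ("LM", "F"): 0.078,
    ("UM", "M"): 0.094, ("UM", "F"): -0.094,
    ("H", "M"): 0.118, ("H", "F"): -0.118,
}


def college_plans_logit(ses, iq, sex, pe):
    """Fitted logit given social class, IQ, sex and parental encouragement."""
    return CONST + SES[ses] + IQ[iq] + SEX[sex] + PE[pe] + IQ_BY_SEX[(iq, sex)]


def to_probability(logit):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-logit))


high = college_plans_logit("H", "H", "M", "H")
low = college_plans_logit("L", "L", "F", "L")
print(f"high profile: logit {high:.3f}, probability {to_probability(high):.3f}")
print(f"low profile:  logit {low:.3f}, probability {to_probability(low):.3f}")
```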