Categorical Data Analysis 1 - University of Kansas

Categorical Data Analysis 1

ChangHwan Kim

KU

January 2011


Types of Variable

Continuous Variables: In principle, can assume an infinite number of values

Categorical Variables: Those variables that can be measured using only a limited number of values or categories


Types of Variable

1 Quantitative Variables
  1.1 Continuous
  1.2 Discrete

2 Qualitative Variables
  2.1 Ordinal
  2.2 Nominal

⇒ 1.2, 2.1, and 2.2 are categorical variables.




Interpretation of OLS

$$y = \alpha + \beta(\text{years of sch}) + \gamma_1(\text{age}) + \gamma_2(\text{age}^2) + \varepsilon$$

where y refers to log annual earnings.

When years of schooling increase by 1 year, expected log annual earnings grow by β, after controlling for age and age-squared.

Equivalently, when years of schooling increase by 1 year, expected annual earnings grow by 100 × [exp(β) − 1] percent, after controlling for age and age-squared.
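The conversion from a log-earnings coefficient to a percent change can be checked numerically. The sketch below is illustrative only: the function name and the β values are hypothetical, not estimates from any data. It compares the exact figure, 100 × [exp(β) − 1], with the common shortcut 100 × β.

```python
import math

def pct_change_from_log_coef(beta: float) -> float:
    """Exact percent change in y per one-unit increase in x when the outcome is ln(y)."""
    return 100.0 * (math.exp(beta) - 1.0)

# Hypothetical coefficients, chosen only for illustration.
for beta in (0.05, 0.10, 0.50):
    exact = pct_change_from_log_coef(beta)
    approx = 100.0 * beta  # the "100 * beta percent" shortcut
    print(f"beta={beta:.2f}  exact={exact:.2f}%  shortcut={approx:.2f}%")
```

The shortcut is close for small β and drifts away from the exact value as β grows.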

$$\frac{\partial y}{\partial(\text{years of sch})} = \beta$$



Categorical Variables as Independent Variables

$$y = \alpha + \beta_1(\text{HSG}) + \beta_2(\text{BA+}) + \gamma_1(\text{age}) + \gamma_2(\text{age}^2) + \varepsilon$$

where education is represented by two dummy variables, HSG (high school graduate) and BA+ (bachelor's degree or higher), with LTHS (less than high school) as the reference group.

Compared with LTHS, the log annual earnings of HSG are larger by β1 on average, after controlling for age and age-squared.
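A minimal sketch of dummy coding with a reference group, assuming NumPy is available. The data are simulated and the "true" coefficients are invented for the illustration; only the category labels come from the slide. LTHS gets no column of its own, so each dummy coefficient is a contrast against LTHS.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3000

# Simulated education categories and ages (made-up data).
educ = rng.choice(["LTHS", "HSG", "BA+"], size=n)
age = rng.uniform(25, 60, size=n)

# Two dummy variables; LTHS is the omitted reference group.
hsg = (educ == "HSG").astype(float)
ba = (educ == "BA+").astype(float)

# Assumed coefficients, used only to generate log earnings.
alpha, beta1, beta2, gamma1, gamma2 = 9.0, 0.30, 0.80, 0.05, -0.0005
y = (alpha + beta1 * hsg + beta2 * ba
     + gamma1 * age + gamma2 * age**2
     + rng.normal(0.0, 0.3, size=n))

# Design matrix: intercept, HSG, BA+, age, age squared.
X = np.column_stack([np.ones(n), hsg, ba, age, age**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b1_hat, b2_hat = coef[1], coef[2]

print(f"HSG vs. LTHS (beta1 estimate): {b1_hat:.3f}")
print(f"BA+ vs. LTHS (beta2 estimate): {b2_hat:.3f}")
```

Changing the omitted category changes what each coefficient is compared against, not the fitted values.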

$$\frac{\partial y}{\partial(\text{HSG})} = \beta_1$$



$$y = \alpha + \delta(\text{life-satisfaction}) + \gamma_1(\text{age}) + \gamma_2(\text{age}^2) + \varepsilon$$

where life-satisfaction is measured on a Likert scale:
1. Very satisfied
2. Somewhat satisfied
3. Neutral
4. Somewhat unsatisfied
5. Very unsatisfied
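The model above enters the 1 to 5 code as a single number, so one slope δ applies to every step of the scale. A hedged sketch of the two usual codings follows. The helper names `as_score` and `as_dummies` are hypothetical, and the choice of level 1 as the reference is an assumption made for the example.

```python
LEVELS = {
    1: "Very satisfied",
    2: "Somewhat satisfied",
    3: "Neutral",
    4: "Somewhat unsatisfied",
    5: "Very unsatisfied",
}

def as_score(code: int) -> list[float]:
    """One column: the code entered as a number, which implies equal steps."""
    return [float(code)]

def as_dummies(code: int, reference: int = 1) -> list[float]:
    """Four columns: one 0/1 indicator per level except the reference level."""
    return [1.0 if code == level else 0.0
            for level in LEVELS if level != reference]

for code, label in LEVELS.items():
    print(f"{label:22s} score={as_score(code)}  dummies={as_dummies(code)}")
```

The dummy coding spends four coefficients instead of one, and in exchange does not assume that the gap between adjacent categories is constant.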



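$$y = \alpha + \beta(\text{years of sch}) + \theta(\text{female}) + \gamma_1(\text{age}) + \gamma_2(\text{age}^2) + \varepsilon$$

$$y = \alpha + \beta(\text{years of sch}) + \theta(\text{female}) + \eta(\text{years of sch} \times \text{female}) + \gamma_1(\text{age}) + \gamma_2(\text{age}^2) + \varepsilon$$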

$$\frac{\partial y}{\partial(\text{female})} = \theta + \eta(\text{years of sch})$$

$$\frac{\partial y}{\partial(\text{age})} = \gamma_1 + 2\gamma_2(\text{age})$$
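The two marginal effects above depend on where they are evaluated. The following sketch simply plugs numbers into those expressions. All coefficient values are hypothetical, and the turning-point helper is an addition obtained by setting γ1 + 2γ2(age) to zero.

```python
def effect_of_female(theta: float, eta: float, years_of_sch: float) -> float:
    """Female vs. male gap in y at a given level of schooling."""
    return theta + eta * years_of_sch

def effect_of_age(gamma1: float, gamma2: float, age: float) -> float:
    """Slope of y with respect to age at a given age."""
    return gamma1 + 2.0 * gamma2 * age

def age_turning_point(gamma1: float, gamma2: float) -> float:
    """Age at which the age slope equals zero."""
    return -gamma1 / (2.0 * gamma2)

# Hypothetical coefficients, for illustration only.
theta, eta = -0.10, -0.01
gamma1, gamma2 = 0.06, -0.0006

for sch in (8, 12, 16):
    print(f"schooling={sch:2d}  female effect={effect_of_female(theta, eta, sch):+.3f}")
for a in (30, 50, 60):
    print(f"age={a}  age slope={effect_of_age(gamma1, gamma2, a):+.4f}")
print(f"turning point: age {age_turning_point(gamma1, gamma2):.1f}")
```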


Categorical Variables as Dependent Variables

$$\text{Employment} = \alpha + \beta(\text{years of sch}) + \gamma_1(\text{age}) + \gamma_2(\text{age}^2) + \varepsilon$$

where 1 = employment and 0 = unemployment.

How to interpret β?

Might be a probability: β as the change in the probability of employment for one more year of schooling.

Distribution of log earnings vs. Distribution of employment
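A small sketch of what happens when OLS is fit to a 0/1 outcome. It uses a one-regressor closed form and made-up data (schooling from 0 to 20 years, employed from 10 years upward), and it omits the age terms for brevity. These are simplifying assumptions for the example, not the model on the slide.

```python
def ols_simple(x, y):
    """Intercept and slope of a one-regressor OLS fit (closed form)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data: one person at each schooling level from 0 to 20 years.
years = list(range(21))
employed = [1 if s >= 10 else 0 for s in years]

a, b = ols_simple(years, employed)
fitted = [a + b * s for s in years]

print(f"slope = {b:.4f}")
print(f"fitted values range from {min(fitted):.3f} to {max(fitted):.3f}")
```

In this example the slope reads as a change in the employment probability per year of schooling, yet the fitted values at the extremes fall below 0 and above 1, which is one reason the probability reading is only tentative.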








Assumptions of OLS

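$$E(y_i \mid x_i) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki}$$

$$E(y \mid x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K$$

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K$$

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K + \varepsilon = \sum_{k=0}^{K} \beta_k x_k + \varepsilon = x'B + \varepsilon$$

where $x_0 = 1$.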


Assumptions of OLS

1 Linearity

2 Independence of x's and ε

3 i.i.d. of ε


Assumptions of OLS

1 Linearity
  The conditional mean of y is a linear function of the x variables:
  $$E(y \mid x) = \sum_{k=0}^{K} \beta_k x_k$$
  Linear function = additive function.


Assumptions of OLS

1 Linearity
  If $y = \alpha x^{\beta} \gamma$, then taking logs on both sides gives
  $$\ln y = \ln\alpha + \beta \ln x + \varepsilon, \qquad \varepsilon = \ln\gamma$$
  β = the expected % change in y as x grows by 1%.
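A hedged simulation of the log transformation, using only the standard library. The values of α and β, the range of x, and the spread of the multiplicative error are all assumptions made for the example. Regressing ln y on ln x should recover β as the elasticity.

```python
import math
import random

def ols_simple(x, y):
    """Closed-form intercept and slope for a one-regressor OLS fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

random.seed(0)
alpha, beta = 2.0, 0.7  # assumed values, used only to simulate data
x = [random.uniform(1.0, 100.0) for _ in range(2000)]
# gamma = exp(noise) is the multiplicative error, so ln(gamma) plays the role of epsilon.
y = [alpha * xi ** beta * math.exp(random.gauss(0.0, 0.1)) for xi in x]

ln_alpha_hat, beta_hat = ols_simple([math.log(v) for v in x],
                                    [math.log(v) for v in y])

print(f"estimated elasticity: {beta_hat:.3f}")
print(f"estimated ln(alpha):  {ln_alpha_hat:.3f}")
```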


Assumptions of OLS

2 Independence of x's and ε
  cov(x, ε) = 0, therefore the estimates of the β's are unbiased.
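A simulation sketch of why cov(x, ε) = 0 matters, using only the standard library. The data-generating process is invented for the illustration: x and ε share a common component u in the second case, so cov(x, ε) = var(u) = 1 and var(x) = 2, and the OLS slope should drift from the true value 2 toward 2 + 1/2.

```python
import random

def ols_slope(x, y):
    """Slope of a one-regressor OLS fit (closed form)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

random.seed(1)
n = 5000
true_beta = 2.0  # assumed value for the simulation

u = [random.gauss(0.0, 1.0) for _ in range(n)]  # component shared by x and epsilon
v = [random.gauss(0.0, 1.0) for _ in range(n)]
e = [random.gauss(0.0, 1.0) for _ in range(n)]
x = [ui + vi for ui, vi in zip(u, v)]

# Case 1: epsilon = e, independent of x.
y_indep = [1.0 + true_beta * xi + ei for xi, ei in zip(x, e)]
# Case 2: epsilon = u + e, correlated with x through u.
y_corr = [1.0 + true_beta * xi + ui + ei for xi, ui, ei in zip(x, u, e)]

slope_indep = ols_slope(x, y_indep)
slope_corr = ols_slope(x, y_corr)

print(f"slope when cov(x, eps) = 0: {slope_indep:.3f}")
print(f"slope when cov(x, eps) > 0: {slope_corr:.3f}")
```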


Assumptions of OLS

3 i.i.d. of ε
  The ε's are independently and identically distributed.
  Independence = no autocorrelation.
  Identically distributed = homoscedasticity = constant variance.
  $$\varepsilon \sim N(0, \sigma^2)$$


OLS: Least Square Estimation

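$$\hat{y}_i = \sum_{k=0}^{K} \beta_k x_{ik}$$

thus, $\varepsilon_i = y_i - \hat{y}_i$

$$S(\beta) = \sum_{i=1}^{n} \Bigl(y_i - \sum_{k=0}^{K} \beta_k x_{ik}\Bigr)^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \varepsilon_i^2$$

LS finds the estimates of the β's that make the sum of squared errors around the conditional means as small as possible.

$$\frac{\partial S(\beta)}{\partial \beta_k} = -2 \sum_{i=1}^{n} \Bigl(y_i - \sum_{j=0}^{K} \beta_j x_{ij}\Bigr) x_{ik} = 0$$

(The inner sum is indexed by j to keep it distinct from the k being differentiated.)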
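The first-order condition can be verified numerically. The sketch below assumes NumPy and uses simulated data with invented coefficients. It solves the normal equations and then checks that the gradient of S(β) is zero, up to rounding, at the solution.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 200, 2

# Design matrix with x_0 = 1 in the first column (simulated regressors).
X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

def S(beta):
    """Sum of squared errors for a candidate coefficient vector."""
    resid = y - X @ beta
    return float(resid @ resid)

def grad_S(beta):
    """Vector of dS/d(beta_k) = -2 * sum_i (y_i - yhat_i) * x_ik."""
    return -2.0 * X.T @ (y - X @ beta)

# Solve the K + 1 normal equations X'X b = X'y.
b = np.linalg.solve(X.T @ X, X.T @ y)

print("b =", b)
print("gradient at b =", grad_S(b))
print("S(b) =", S(b), " S(b shifted) =", S(b + np.array([0.1, 0.0, 0.0])))
```

Any move away from b raises S, which is what "as small as possible" means here.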


OLS: Least Square Estimation

Each of the K + 1 coefficients (k = 0, …, K) yields one equation, for a total of K + 1 normal equations.

In matrix form,
$$b = (X'X)^{-1} X'y$$
$$\mathrm{var}(b) = \sigma_\varepsilon^2 (X'X)^{-1}$$

var(b) is a (K + 1) × (K + 1) matrix, with diagonal elements equal to the variances of the estimates and off-diagonal elements equal to the covariances between estimates. → a variance-covariance matrix.
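A minimal NumPy sketch of the matrix formulas on simulated data. One assumption goes beyond the slide: the error variance is unknown in practice, so the code replaces it with the usual estimate, the residual sum of squares divided by n − (K + 1).

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 500, 2

# Simulated design matrix with an intercept column, and an invented coefficient vector.
X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])
y = X @ np.array([0.5, 1.5, -1.0]) + rng.normal(scale=2.0, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                        # b = (X'X)^{-1} X'y

resid = y - X @ b
sigma2_hat = resid @ resid / (n - (K + 1))   # estimate of the error variance
var_b = sigma2_hat * XtX_inv                 # (K+1) x (K+1) variance-covariance matrix
se_b = np.sqrt(np.diag(var_b))               # standard errors from the diagonal

print("b      =", b)
print("se(b)  =", se_b)
print("var(b) =\n", var_b)
```

For larger or ill-conditioned problems, `np.linalg.lstsq` or `np.linalg.solve` is usually preferred over forming the inverse explicitly. The inverse is used here only to mirror the formula.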


Weaker Assumptions

What if some assumptions are violated?

