Categorical Data Analysis 1
ChangHwan Kim
KU
January 2011
ChangHwan Kim (KU) Categorical Data Analysis 1 January 2011 1 / 1
Types of Variable
Continuous Variables: In principle, can assume an infinite number of values
Categorical Variables: Those variables that can be measured using only a limited number of values or categories
Types of Variable
1 Quantitative Variables
  1.1 Continuous
  1.2 Discrete
2 Qualitative Variables
  2.1 Ordinal
  2.2 Nominal
⇒ 1.2; 2.1; and 2.2 are categorical variables.
Interpretation of OLS
\[ y = \alpha + \beta(\text{years of sch}) + \gamma_1(\text{age}) + \gamma_2(\text{age}^2) + \varepsilon \]
where y refers to log annual earnings.
As years of schooling increases by 1 year, the expected log annual earnings grow by β, after controlling for age and age-squared.
As years of schooling increases by 1 year, the expected annual earnings grow by 100 × [exp(β) − 1] percent, after controlling for age and age-squared.
\[ \frac{\partial y}{\partial(\text{years of sch})} = \beta \]
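The difference between reading β directly and using the exact exp(β) − 1 conversion can be checked numerically. The following is a minimal sketch in Python; the coefficient value 0.10 is hypothetical and is not an estimate from this lecture.

```python
import math


def pct_change_from_log_coef(beta: float) -> float:
    """Percent change in y for a one-unit increase in x when the outcome is ln(y)."""
    return 100.0 * (math.exp(beta) - 1.0)


beta = 0.10  # hypothetical coefficient on years of schooling (assumption)
exact = pct_change_from_log_coef(beta)
approx = 100.0 * beta  # the common "beta times 100" shortcut

print(f"exact: {exact:.2f}%  shortcut: {approx:.2f}%")
```

For small β the two numbers are close; the gap widens as β grows, which is why the exact conversion is preferred for large coefficients.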
Categorical Variables as Independent Variables
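\[ y = \alpha + \beta_1(\text{HSG}) + \beta_2(\text{BA+}) + \gamma_1(\text{age}) + \gamma_2(\text{age}^2) + \varepsilon \]
where education is represented by two dummy variables, HSG (high school graduate) and BA+ (bachelor's degree or higher), setting LTHS (less than high school) as the reference group.
Compared to LTHS, the log annual earnings of HSG are larger by β1 on average, after controlling for age and age-squared.
\[ \frac{\partial y}{\partial(\text{HSG})} = \beta_1 \]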
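The reference-group reading of β1 and β2 can be illustrated with a toy regression. This sketch uses NumPy and six made-up observations (the data and variable names are assumptions for illustration only), and it omits the age terms so that the coefficients reduce to plain differences in group means.

```python
import numpy as np

# Hypothetical data: education category and log annual earnings (assumptions).
edu = ["LTHS", "LTHS", "HSG", "HSG", "BA+", "BA+"]
log_earn = np.array([9.0, 9.2, 9.5, 9.7, 10.0, 10.4])

# Two dummies; LTHS is the omitted reference group.
hsg = np.array([1.0 if e == "HSG" else 0.0 for e in edu])
ba = np.array([1.0 if e == "BA+" else 0.0 for e in edu])

# Design matrix: intercept, HSG, BA+.
X = np.column_stack([np.ones(len(edu)), hsg, ba])
coef = np.linalg.lstsq(X, log_earn, rcond=None)[0]
alpha, beta1, beta2 = coef

print("alpha (LTHS mean):", alpha)
print("beta1 (HSG minus LTHS):", beta1)
print("beta2 (BA+ minus LTHS):", beta2)
```

Without other controls, the intercept is the mean of the reference group and each dummy coefficient is that group's mean minus the reference mean. With age and age-squared added, the same comparison holds conditional on those controls.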
Categorical Variables as Independent Variables
\[ y = \alpha + \delta(\text{life-satisfaction}) + \gamma_1(\text{age}) + \gamma_2(\text{age}^2) + \varepsilon \]
where life-satisfaction is measured on a Likert scale:
1. Very satisfied
2. Somewhat satisfied
3. Neutral
4. Somewhat unsatisfied
5. Very unsatisfied
Categorical Variables as Independent Variables
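\[ y = \alpha + \beta(\text{years of sch}) + \theta(\text{female}) + \gamma_1(\text{age}) + \gamma_2(\text{age}^2) + \varepsilon \]
\[ y = \alpha + \beta(\text{years of sch}) + \theta(\text{female}) + \eta(\text{years of sch} \times \text{female}) + \gamma_1(\text{age}) + \gamma_2(\text{age}^2) + \varepsilon \]
\[ \frac{\partial y}{\partial(\text{female})} = \theta + \eta(\text{years of sch}) \]
\[ \frac{\partial y}{\partial(\text{age})} = \gamma_1 + 2\gamma_2(\text{age}) \]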
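With an interaction or a squared term, the effect of a variable is no longer a single number; it depends on where it is evaluated. The sketch below evaluates the two derivatives at a few points. All coefficient values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def effect_of_female(theta: float, eta: float, years_sch: float) -> float:
    """Gender gap in y at a given level of schooling: theta + eta * years_sch."""
    return theta + eta * years_sch


def effect_of_age(gamma1: float, gamma2: float, age: float) -> float:
    """Slope of y with respect to age at a given age: gamma1 + 2 * gamma2 * age."""
    return gamma1 + 2.0 * gamma2 * age


# Hypothetical coefficients (assumptions, not estimates from the lecture).
theta, eta = -0.30, 0.01
gamma1, gamma2 = 0.08, -0.001

for s in (12, 16):
    print(f"gap at {s} years of schooling: {effect_of_female(theta, eta, s):.3f}")
for a in (30, 40):
    print(f"age slope at age {a}: {effect_of_age(gamma1, gamma2, a):.3f}")
```

Under these assumed values the gender gap narrows as schooling rises (η > 0), and the age slope shrinks toward zero as age increases (γ2 < 0).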
Categorical Variables as Dependent Variables
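\[ \text{Employment} = \alpha + \beta(\text{years of sch}) + \gamma_1(\text{age}) + \gamma_2(\text{age}^2) + \varepsilon \]
where Employment = 1 if employed and 0 if unemployed.
How to interpret β? It might be read as the change in the probability of employment associated with one more year of schooling.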
Distribution of log earnings vs. Distribution of employment
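One way to see why a 0/1 outcome behaves differently from log earnings is to fit OLS to a binary variable and inspect the fitted values. The sketch below uses seven made-up observations (an assumption for illustration) and drops the age terms to keep the example small.

```python
import numpy as np

# Hypothetical data: years of schooling and employment status (1 = employed).
years_sch = np.array([8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
employed = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0])

# OLS of the 0/1 outcome on an intercept and years of schooling.
X = np.column_stack([np.ones_like(years_sch), years_sch])
coef = np.linalg.lstsq(X, employed, rcond=None)[0]
alpha, beta = coef

# Fitted values, read as predicted probabilities of employment.
fitted = X @ coef
print("beta:", beta)
print("fitted values:", np.round(fitted, 3))
print("any fitted value above 1?", bool((fitted > 1).any()))
```

If β is read as a change in probability, the fitted values should stay between 0 and 1. In this toy data the straight line overshoots 1 at the highest schooling level, which is one reason the probability reading is only tentative.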
Assumptions of OLS
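\[ E(y_i \mid x_i) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki} \]
\[ E(y \mid x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K \]
\[ \hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K \]
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K + \varepsilon = \sum_{k=0}^{K} \beta_k x_k + \varepsilon = \mathbf{x}'\boldsymbol{\beta} + \varepsilon \]
where x_0 = 1, so that β0 is the intercept.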
Assumptions of OLS
1 Linearity
2 Independence of x's and ε
3 i.i.d. of ε
Assumptions of OLS
1 Linearity
  The conditional mean of y is a linear function of the x variables: \( E(y \mid x) = \sum_{k} \beta_k x_k \)
  linear function = additive function
2 Independence of x's and ε
3 i.i.d. of ε
Assumptions of OLS
1 Linearity
  If \( y = \alpha x^{\beta} \gamma \), then taking logs on both sides gives \( \ln y = \ln\alpha + \beta \ln x + \varepsilon \), where ε = ln γ.
  β = the expected % change of y, as x grows by 1%.
2 Independence of x's and ε
3 i.i.d. of ε
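The log transformation can be checked on data generated from an exact power law. In the sketch below, the values α = 2 and β = 0.5 are hypothetical, and the multiplicative error is set to 1 so that the regression of ln y on ln x recovers the parameters exactly.

```python
import numpy as np

# Hypothetical parameters of y = alpha * x**beta (assumptions for illustration).
alpha_true, beta_true = 2.0, 0.5

x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y = alpha_true * x ** beta_true  # no error term: gamma = 1, so epsilon = 0

# Regress ln(y) on an intercept and ln(x).
X = np.column_stack([np.ones_like(x), np.log(x)])
coef = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
ln_alpha_hat, beta_hat = coef

print("estimated beta:", beta_hat)
print("estimated alpha:", np.exp(ln_alpha_hat))

# Elasticity reading: a 1% rise in x changes y by roughly beta percent.
pct_change_y = 100.0 * (1.01 ** beta_hat - 1.0)
print("percent change in y for a 1% rise in x:", pct_change_y)
```

A model that is multiplicative in levels becomes additive in logs, which is what lets OLS handle it under the linearity assumption.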
Assumptions of OLS
1 Linearity
2 Independence of x's and ε
  cov(x, ε) = 0, which gives unbiasedness of the estimates of the β's
3 i.i.d. of ε
Assumptions of OLS
1 Linearity
2 Independence of x's and ε
3 i.i.d. of ε
  ε's are independently and identically distributed.
  independence = no autocorrelation
  identically distributed = homoscedasticity = constant variance
  \( \varepsilon \sim N(0, \sigma^2) \)
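OLS: Least Squares Estimation
\[ \hat{y}_i = \sum_{k=0}^{K} \beta_k x_{ik} \]
thus, \( \varepsilon_i = y_i - \hat{y}_i \)
\[ S(\beta) = \sum_{i=1}^{n} \Big( y_i - \sum_{k=0}^{K} \beta_k x_{ik} \Big)^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \varepsilon_i^2 \]
LS is to find estimates of β's that make the sum of squared errors around the conditional means as small as possible.
\[ \frac{\partial S(\beta)}{\partial \beta_k} = -2 \sum_{i=1}^{n} \Big( y_i - \sum_{j=0}^{K} \beta_j x_{ij} \Big) x_{ik} = 0 \]
where j indexes the inner sum over the regressors and k is the coefficient being differentiated.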
OLS: Least Squares Estimation
Each kth variable will have one equation, yielding a total of K + 1 normal equations.
In matrix form,
\[ \mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} \]
\[ \operatorname{var}(\mathbf{b}) = \sigma^2_{\varepsilon}(\mathbf{X}'\mathbf{X})^{-1} \]
var(b) is a (K + 1) × (K + 1) matrix, with diagonal elements equal to the variances of the estimates and off-diagonal elements equal to the covariances between estimates.
→ a variance-covariance matrix.
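The matrix formulas can be computed directly with NumPy. The sketch below uses simulated data with K = 2 regressors; the true coefficients, sample size, and seed are arbitrary assumptions. Because σ²ε is unknown in practice, the sketch substitutes the usual estimate, the sum of squared residuals divided by n − (K + 1), which is an addition not stated on the slide.

```python
import numpy as np

# Simulated data (all values are assumptions for illustration).
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
eps = rng.normal(scale=0.3, size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + eps

# Design matrix with a leading column of ones: n x (K + 1), here K = 2.
X = np.column_stack([np.ones(n), x1, x2])

# b = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y

# Residuals and the estimated error variance (assumed estimator: SSR / (n - K - 1)).
resid = y - X @ b
sigma2_hat = resid @ resid / (n - X.shape[1])

# var(b) = sigma^2 (X'X)^{-1}: variances on the diagonal, covariances off it.
var_b = sigma2_hat * XtX_inv
std_err = np.sqrt(np.diag(var_b))

print("b:", b)
print("standard errors:", std_err)
```

The explicit inverse mirrors the slide's formula. For larger or ill-conditioned problems, a least-squares solver such as `np.linalg.lstsq` is generally the more numerically stable way to obtain b.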
Weaker Assumptions
What if some assumptions are violated?