Categorical Data Analysis 1 - University of Kansas

Categorical Data Analysis 1

ChangHwan Kim

KU

January 2011

ChangHwan Kim (KU) Categorical Data Analysis 1 January 2011 1 / 1

Types of Variable

Continuous Variables: in principle, can assume an infinite number of values

Categorical Variables: variables that can be measured using only a limited number of values or categories

Types of Variable

1 Quantitative Variables
    1 Continuous
    2 Discrete

2 Qualitative Variables
    1 Ordinal
    2 Nominal

⇒ 1.2, 2.1, and 2.2 are categorical variables.

Interpretation of OLS

$$y = \alpha + \beta(\text{years of sch}) + \gamma_1(\text{age}) + \gamma_2(\text{age}^2) + \varepsilon$$

where y refers to log annual earnings.

As years of schooling increase by 1 year, expected log annual earnings grow by β, after controlling for age and age-squared.

As years of schooling increase by 1 year, expected annual earnings grow by 100 × [exp(β) − 1] percent, after controlling for age and age-squared.

$$\frac{\partial y}{\partial(\text{years of sch})} = \beta$$
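
Because the dependent variable is ln(earnings), a one-year increase in schooling multiplies earnings by exp(β), a change of 100 × [exp(β) − 1] percent. The Python sketch below compares that exact conversion with the common shortcut of reading 100 × β as a percentage; the coefficient values are hypothetical, chosen only for illustration.

```python
import math


def log_point_to_percent(beta: float) -> float:
    """Percent change in y implied by a change of beta in ln(y): 100 * (exp(beta) - 1)."""
    return 100.0 * (math.exp(beta) - 1.0)


# Hypothetical coefficients on years of schooling (not estimates from any data).
for beta in (0.05, 0.10, 0.30):
    shortcut = 100.0 * beta  # reading beta directly as a proportional change
    exact = log_point_to_percent(beta)
    print(f"beta = {beta:.2f}: shortcut {shortcut:.1f}%, exact {exact:.1f}%")
```

The two readings agree closely for small β (5.0% vs. about 5.1% at β = 0.05) and drift apart as β grows (30.0% vs. about 35.0% at β = 0.30).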

Categorical Variables as Independent Variables

$$y = \alpha + \beta_1(\text{HSG}) + \beta_2(\text{BA+}) + \gamma_1(\text{age}) + \gamma_2(\text{age}^2) + \varepsilon$$

where education is represented by two dummy variables, HSG and BA+, with LTHS as the reference group.

Compared to LTHS, the log annual earnings of HSG are larger by β₁ on average, after controlling for age and age-squared.

$$\frac{\partial y}{\partial(\text{HSG})} = \beta_1$$
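
With LTHS as the omitted reference group, HSG = 1 only for members of the HSG group, BA+ = 1 only for members of the BA+ group, and LTHS cases get zeros on both. A minimal Python sketch of that coding, using hypothetical helper names and made-up coefficient values, shows that the predicted HSG vs. LTHS gap at a fixed age is exactly β₁.

```python
# Category labels as used on the slide; LTHS is the omitted reference group.
EDUCATION_LEVELS = ("LTHS", "HSG", "BA+")
REFERENCE = "LTHS"


def dummy_code(education: str) -> dict:
    """Return 0/1 indicators for every non-reference education level."""
    if education not in EDUCATION_LEVELS:
        raise ValueError(f"unknown education level: {education!r}")
    return {level: int(education == level)
            for level in EDUCATION_LEVELS if level != REFERENCE}


def predicted_log_earnings(education, age, alpha, beta1, beta2, gamma1, gamma2):
    """E(y | education, age) for y = a + b1*HSG + b2*BA+ + g1*age + g2*age^2 + e."""
    d = dummy_code(education)
    return (alpha + beta1 * d["HSG"] + beta2 * d["BA+"]
            + gamma1 * age + gamma2 * age ** 2)


# Hypothetical coefficients, for illustration only.
coefs = dict(alpha=9.0, beta1=0.30, beta2=0.70, gamma1=0.05, gamma2=-0.0005)
gap = (predicted_log_earnings("HSG", 40, **coefs)
       - predicted_log_earnings("LTHS", 40, **coefs))
print(dummy_code("LTHS"), dummy_code("BA+"), round(gap, 6))
```

Only two dummies are used for three categories: adding an LTHS dummy as well would make the three indicators sum to the constant term, so the regressors would be perfectly collinear.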

Categorical Variables as Independent Variables

$$y = \alpha + \delta(\text{life-satisfaction}) + \gamma_1(\text{age}) + \gamma_2(\text{age}^2) + \varepsilon$$

where life-satisfaction is measured on a Likert scale:
1. Very satisfied
2. Somewhat satisfied
3. Neutral
4. Somewhat unsatisfied
5. Very unsatisfied
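
Entering the 1 to 5 code as one variable with a single coefficient δ treats the response categories as equally spaced: every one-step move along the scale shifts expected y by the same δ. The Python sketch below contrasts that coding with four dummy variables, in the style of the education example; taking "Very satisfied" as the reference category, and the helper names, are assumptions made for illustration.

```python
# Response categories in scale order (1 = Very satisfied ... 5 = Very unsatisfied).
LIKERT = ("Very satisfied", "Somewhat satisfied", "Neutral",
          "Somewhat unsatisfied", "Very unsatisfied")


def linear_score(response: str) -> int:
    """Code a response as 1..5, the coding implied by a single coefficient delta."""
    return LIKERT.index(response) + 1


def likert_dummies(response: str) -> list:
    """Four 0/1 indicators; 'Very satisfied' is the (assumed) reference category."""
    return [int(response == level) for level in LIKERT[1:]]


def linear_gap(delta: float, a: str, b: str) -> float:
    """Predicted difference in y between categories b and a under the single-delta coding."""
    return delta * (linear_score(b) - linear_score(a))


# Under one delta, these two one-step moves must have identical effects on y.
print(linear_gap(0.1, "Very satisfied", "Somewhat satisfied"))
print(linear_gap(0.1, "Neutral", "Somewhat unsatisfied"))
print(likert_dummies("Neutral"))
```

The dummy coding estimates a separate coefficient for each non-reference category, so it does not impose equal spacing, at the cost of three more parameters.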

Categorical Variables as Independent Variables

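$$y = \alpha + \beta(\text{years of sch}) + \theta(\text{female}) + \gamma_1(\text{age}) + \gamma_2(\text{age}^2) + \varepsilon$$

$$y = \alpha + \beta(\text{years of sch}) + \theta(\text{female}) + \eta(\text{years of sch} \times \text{female}) + \gamma_1(\text{age}) + \gamma_2(\text{age}^2) + \varepsilon$$

$$\frac{\partial y}{\partial(\text{female})} = \theta + \eta(\text{years of sch})$$

$$\frac{\partial y}{\partial(\text{age})} = \gamma_1 + 2\gamma_2(\text{age})$$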
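
Both derivatives depend on the value of another variable: the female gap changes with years of schooling through η, and the age effect changes with age because of the squared term. A minimal Python sketch that evaluates them at a few points; the coefficient values and function names are hypothetical. Setting ∂y/∂(age) = 0 also gives the age at which the age profile turns, −γ₁/(2γ₂).

```python
def female_effect(theta: float, eta: float, years_of_sch: float) -> float:
    """dy/d(female) = theta + eta * (years of sch) in the interaction model."""
    return theta + eta * years_of_sch


def age_effect(gamma1: float, gamma2: float, age: float) -> float:
    """dy/d(age) = gamma1 + 2 * gamma2 * age."""
    return gamma1 + 2.0 * gamma2 * age


def age_turning_point(gamma1: float, gamma2: float) -> float:
    """Age where dy/d(age) = 0: a peak if gamma2 < 0, a trough if gamma2 > 0."""
    return -gamma1 / (2.0 * gamma2)


# Hypothetical coefficients, for illustration only.
theta, eta = -0.40, 0.01
gamma1, gamma2 = 0.08, -0.001

for years in (8, 12, 16):
    print(f"female gap at {years} years of schooling: {female_effect(theta, eta, years):.3f}")
for age in (25, 40, 55):
    print(f"age effect at age {age}: {age_effect(gamma1, gamma2, age):.4f}")
print(f"age profile peaks at about {age_turning_point(gamma1, gamma2):.1f}")
```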

Categorical Variables as Dependent Variables

$$\text{Employment} = \alpha + \beta(\text{years of sch}) + \gamma_1(\text{age}) + \gamma_2(\text{age}^2) + \varepsilon$$

where 1 = employment and 0 = unemployment.

How should β be interpreted? It might be read as the change in the probability of employment associated with one more year of schooling.

Distribution of log earnings vs. Distribution of employment
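
Fitting this equation by OLS with a 0/1 outcome is usually called a linear probability model: the fitted value is read as the probability of employment and β as the change in that probability per year of schooling. The contrast with log earnings shows up directly in the data: at any given x the outcome, and hence the residual, can take only two values, and nothing keeps fitted values inside [0, 1]. A minimal Python sketch with one regressor (age terms omitted for brevity) and made-up data:

```python
def simple_ols(x, y):
    """Least squares intercept and slope for y = a + b*x + e."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: years of schooling and employment (1 = employed, 0 = unemployed).
years = [8, 10, 11, 12, 12, 13, 14, 16, 16, 18, 20]
employed = [0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1]

a, b = simple_ols(years, employed)
fitted = [a + b * xi for xi in years]
residuals = [yi - fi for yi, fi in zip(employed, fitted)]

print(f"slope (change in employment probability per year): {b:.4f}")
for xi, fi in zip(years, fitted):
    flag = "  <- outside [0, 1]" if not 0.0 <= fi <= 1.0 else ""
    print(f"years = {xi:2d}, fitted = {fi:.3f}{flag}")
```

With these data the fitted values at 18 and 20 years of schooling exceed 1, which a probability cannot do.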

Assumptions of OLS

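$$E(y_i \mid x_i) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki}$$

$$E(y \mid x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K$$

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K$$

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K + \varepsilon = \sum_{k=0}^{K} \beta_k x_k + \varepsilon = \mathbf{x}'\mathbf{B} + \varepsilon$$

where $x_0 = 1$, so that $\beta_0$ is the intercept.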

Assumptions of OLS

1 Linearity
    The conditional mean of y is a linear function of the x variables, $E(y \mid x) = \sum_k \beta_k x_k$.
    Here, a linear function means an additive function.
    If $y = \alpha x^{\beta} \gamma$, taking logs on both sides gives $\ln y = \ln\alpha + \beta \ln x + \varepsilon$, where $\varepsilon = \ln\gamma$; β is then the expected % change in y as x grows by 1%.

2 Independence of x's and ε
    $\operatorname{cov}(x, \varepsilon) = 0$, which makes the estimates of the β's unbiased.

3 i.i.d. of ε
    The ε's are independently and identically distributed.
    Independence = no autocorrelation.
    Identically distributed = homoscedasticity = constant variance.
    $\varepsilon \sim N(0, \sigma^2)$
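
The log transformation under the linearity assumption can be checked by simulation: generate y = α x^β γ with a multiplicative error γ = exp(ε), regress ln y on ln x, and the slope should land close to the true β. A minimal Python sketch; the true values α = 2 and β = 0.6, the range of x, the sample size, and the error standard deviation of 0.2 are all arbitrary choices for illustration.

```python
import math
import random


def simple_ols(x, y):
    """Least squares intercept and slope for y = a + b*x + e."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


random.seed(2011)
TRUE_ALPHA, TRUE_BETA = 2.0, 0.6  # hypothetical population values

xs = [random.uniform(1.0, 100.0) for _ in range(2000)]
# gamma = exp(eps) with eps ~ N(0, 0.2^2), so ln y = ln(alpha) + beta * ln(x) + eps.
ys = [TRUE_ALPHA * x ** TRUE_BETA * math.exp(random.gauss(0.0, 0.2)) for x in xs]

intercept, slope = simple_ols([math.log(x) for x in xs], [math.log(y) for y in ys])
print(f"estimated beta  = {slope:.3f}")
print(f"estimated alpha = {math.exp(intercept):.3f}")
```

The estimated slope is the elasticity: a 1% increase in x is associated with roughly a β% change in y.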

OLS: Least Squares Estimation

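$$\hat{y}_i = \sum_{k=0}^{K} \beta_k x_{ik}$$

thus $\varepsilon_i = y_i - \hat{y}_i$, and

$$S(\beta) = \sum_{i=1}^{n} \Big(y_i - \sum_{k=0}^{K} \beta_k x_{ik}\Big)^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \varepsilon_i^2$$

Least squares (LS) finds the estimates of the β's that make the sum of squared errors around the conditional means as small as possible. Setting the partial derivative of S(β) with respect to each β_k to zero gives

$$\frac{\partial S(\beta)}{\partial \beta_k} = -2 \sum_{i=1}^{n} \Big(y_i - \sum_{j=0}^{K} \beta_j x_{ij}\Big) x_{ik} = 0$$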
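
For the simplest case K = 1 (an intercept, with x_{i0} = 1, plus one regressor x), setting both partial derivatives to zero gives two normal equations that can be solved directly. The Python sketch below solves them by Cramer's rule and then checks that the gradient of S(β) is zero at the solution and that moving away from it raises S; the function names and data are made up for illustration.

```python
def sse(b0, b1, x, y):
    """S(beta) = sum_i (y_i - b0 - b1 * x_i)^2."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


def gradient(b0, b1, x, y):
    """dS/db_k = -2 * sum_i (y_i - yhat_i) * x_ik, with x_i0 = 1 and x_i1 = x_i."""
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    return (-2.0 * sum(resid),
            -2.0 * sum(r * xi for r, xi in zip(resid, x)))


def solve_normal_equations(x, y):
    """Solve  n*b0 + (sum x)*b1 = sum y  and  (sum x)*b0 + (sum x^2)*b1 = sum x*y."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sum_xx - sum_x ** 2  # zero only if x is constant
    b0 = (sum_y * sum_xx - sum_x * sum_xy) / det
    b1 = (n * sum_xy - sum_x * sum_y) / det
    return b0, b1


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.0, 2.9, 4.2, 4.8, 6.1]

b0, b1 = solve_normal_equations(x, y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print("gradient at solution:", gradient(b0, b1, x, y))
print("S at solution:", round(sse(b0, b1, x, y), 6),
      "| S with b1 + 0.1:", round(sse(b0, b1 + 0.1, x, y), 6))
```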

OLS: Least Squares Estimation

Each β_k (k = 0, 1, ..., K) yields one such equation, giving K + 1 normal equations in total.

In matrix form,

$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

$$\operatorname{var}(\mathbf{b}) = \sigma_\varepsilon^2 (\mathbf{X}'\mathbf{X})^{-1}$$

var(b) is a (K + 1) × (K + 1) matrix whose diagonal elements are the variances of the estimates and whose off-diagonal elements are the covariances between estimates; it is therefore called a variance-covariance matrix.
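
In practice σ_ε² is unknown; the sketch below plugs in the usual unbiased estimate s² = Σeᵢ²/(n − K − 1), an addition not stated on the slide. It is a minimal, dependency-free Python illustration of b = (X′X)⁻¹X′y and var(b) = s²(X′X)⁻¹, with X′X inverted by Gauss-Jordan elimination; the helper names and the data are hypothetical. Square roots of the diagonal of var(b) give the standard errors.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Product of an (r x p) and a (p x q) matrix stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfectly collinear regressors?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(X, y):
    """Return b = (X'X)^-1 X'y and the estimated var(b) = s^2 (X'X)^-1."""
    Xt = transpose(X)
    xtx_inv = inverse(matmul(Xt, X))
    b = [row[0] for row in matmul(matmul(xtx_inv, Xt), [[v] for v in y])]
    resid = [yi - sum(bk * xk for bk, xk in zip(b, row)) for row, yi in zip(X, y)]
    n, k_plus_1 = len(X), len(X[0])
    s2 = sum(e * e for e in resid) / (n - k_plus_1)  # assumed estimator of sigma_eps^2
    var_b = [[s2 * v for v in row] for row in xtx_inv]
    return b, var_b


# Hypothetical data; the first column of ones plays the role of x_i0 = 1.
X = [[1.0, xi] for xi in (1, 2, 3, 4, 5)]
y = [2.0, 2.9, 4.2, 4.8, 6.1]
b, var_b = ols(X, y)
print("b =", [round(v, 4) for v in b])
print("standard errors =", [round(var_b[k][k] ** 0.5, 4) for k in range(len(b))])
```

The same `ols` function accepts any number of columns in X, so K + 1 regressors give a (K + 1) × (K + 1) `var_b`.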

Weaker Assumptions

What if some assumptions are violated?
