28
Statistical Reasoning Week 10 Sciences Po - Louis de Charsonville Spring 2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Statistical ReasoningWeek 10

Sciences Po - Louis de Charsonville

Spring 2018

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Page 2: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Outline

Research Paper

RegressionReminderAssumptionsInterpretationControl and BiasCategorical variables

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 2 / 25

Page 3: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Research Paper

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 3 / 25

Page 4: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Research Paper

Timeline

2nd draft 10 AprilWeek 11 17 April

Final draft 24 April

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 4 / 25

Page 5: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Writing

Explore associationsÏ Stata Guide, Sec. 10 ttest, prtest, tab, chi2, pwcorrÏ Stata Guide, Sec. 11 sc, lowess, pwcorr, reg, rfvplot

Write up substantive results as sentences ; cite significance testsand other statistics in brackets : (ρ = .7) (p < .05).

Go through editingÏ Remove technical contentÏ Rewrite until concisionÏ Keep your message clear

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 5 / 25

Page 6: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Regression

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 6 / 25

Page 7: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

15

20

25

Mile

ag

e (

mp

g)

2,000 2,500 3,000 3,500 4,000Weight (lbs.)

Data values

Fitted line

Fitted values

Residuals

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 7 / 25

Page 8: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Regression

Mathematical Form

Y =α+βX +∑iγi Ci +ε (1)

Key ingredientsÏ Dependent variable, or outcome variable Y

Ï Treatment variable X

Ï Control variables Ci

Key outcomesÏ Intercept α

Ï Effect of treatment β

Ï Effect of controls γi

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 8 / 25

Page 9: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Assumptions of the regression

Simple Linear Regression

Y =α+βX +ε (2)

β and α are correctly estimated under the following assumptions :

1. H1 : Sample variation in X

2. H2 :Random sampling : {Yi , Xi } are independent andindentically distributed (i.i.d.)

3. H3 : Zero Conditional mean : E(ε|X ) = 0 or in plain English :"values of the residuals, ε, does not depend on X .

4. H4 : Linear in parameters5. H5 : Heteroscedasticity : V ar (ε|X ) =V ar (ε). Variance of the

residuals does not depend of X

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 9 / 25

Page 10: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Break H1 : No sample variation in X

H1 not true ⇒ X is constant

10

20

30

40

Y

2000 4000

Constant X

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 10 / 25

Page 11: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

H3 : ε does not depend of X

H3 : E(ε|X ) = 0

10

20

30

40

2,000 3,000 4,000 5,000Weight (lbs.)

Mileage (mpg)

Fitted values

tw (sc mpg weight) (lfit mpg weight)

−5

0

5

10

15

2,000 3,000 4,000 5,000Weight (lbs.)

Residuals

Fitted values

tw (sc epsilon weight) (lfit epsilon weight)

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 11 / 25

Page 12: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

H4 : Linear in parameters

H4 not true ⇒ non-linearity in parameters

Example

Y =α+β2X +ε

Y =α+eβX +ε

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 12 / 25

Page 13: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

H5 : Homoskedasticity

H5 : Homoskedasticity, variance residuals should be independent ofX , e.g. V ar (ε|X ) =V ar (ε)

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 13 / 25

Page 14: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Example of heteroskedasticity

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 14 / 25

Page 15: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Interpretation of R2

Ï R2 measures the fraction of the sample variance in Y explainedby the regressors, X .

Ï Low R2 does not say anything about whether we estimatecausal effects.

Ï Low R2 says that the model is not useful for prediction.

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 15 / 25

Page 16: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Interpretation of coefficients

ModelY =α+βX +ε

Ï An increase in one unit of X is associated with an increase ofβ units of Y .

StandardizationÏ Each variable can be normalized to fit N (0,1) so that theirstandardized coefficients have comparable standarddeviations units.

Ï Interpret unstandardized coefficientsÏ Use standardization for comparisons

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 16 / 25

Page 17: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Why controls ? 1/4

Do hospitals make people healthier ?

Group Sample size Mean Health Status Std.Error

Hospital 7774 2.74 0.014No Hospital 900049 2.07 0.003NHIS data

Ï Mean difference : 0.71 (t-stat : 58.9)Ï People who have been hospitalised in the past 12 monthsdeclare a significantly lower health status.

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 17 / 25

Page 18: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Why controls ? 1/4

Do hospitals make people healthier ?

Group Sample size Mean Health Status Std.Error

Hospital 7774 2.74 0.014No Hospital 900049 2.07 0.003NHIS data

Ï Mean difference : 0.71 (t-stat : 58.9)Ï People who have been hospitalised in the past 12 monthsdeclare a significantly lower health status.

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 17 / 25

Page 19: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Why controls ? 2/4

Do hospitals make people healthier ?

Let’s denoteÏ Yi health status of obs i

Ï Di = {0,1} a binary variable for hospitalisation.Ï Rephrase question with notation : "Is Yi affected by hospitalcare ?"

Two potential outcomes :Ï Y1i if Di = 1 (individual status if he goes to hospital)Ï Y0i if Di = 0 (individual status had he not gone to hospital,

irrespective of whether he actually went)

We would like to know Y1i −Y0i .

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 18 / 25

Page 20: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Why controls ? 3/4Naive comparison of average

Average difference in average health=+Average health of hospitalised people

−Average health of non-hospitalised people

Decomposition in 4 terms :Average difference in average health=

+Average health of hospitalised people

−Average health of HP had they not gone to hospital

+Average health of HP had they not gone to hospital

−Average health of non-hospitalised people

Average treatment effect on the treated+Average health of hospitalised people

−Average health of HP had they not gone to hospital

Selection bias+Average health of HP had they not gone to hospital

−Average health of non-hospitalised peopleSciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 19 / 25

Page 21: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Why controls ? 4/4

NotationsÏ E [Yi |Di = 1] = Average health of hospitalised peopleÏ E [Yi |Di = 0] = Average health of non-hospitalised peopleÏ E [Y0i |Di = 1] = Average health of hospitalised people had theynot gone to hospital

Ï E [Y0i |Di = 0] = Average health of unhospitalised people

Mathematically

E [Yi |Di = 1]−E [Yi |Di = 0] = E [Yi |Di = 1]−E [Y0i |Di = 1]

+E [Y0i |Di = 1]−E [Y0i |Di = 0]

Observed difference in averages = average treatment effecton the treated + selection bias

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 20 / 25

Page 22: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Controls and causality

Ï There is causality if the variable of interest is independent ofpotential outcomes.

Ï In randomized controlled trials, this is typically the case.Ï In non-random assignment, we need to assume that aftercontrolling for Ci , both the treated and non-treated groups areequivalent in their remaining characteristics

Conditional Independence AssumptionThe dependent variable (or outcome) is independent of theindependent variable of interest (or treatment), conditionals oncontrol.Formally :

Y ⊥X |C

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 21 / 25

Page 23: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Omitted variable bias

Let’s assume the underlying true model is :

Y =α+β1X1 +β2X2 (3)

We estimate :Y = α̃+ β̃1X1 +ε (4)

How different is β̃1 from β1 ?

Bias on β̃1 depends on the correlation between X1 and X2 :

Cor r (X1, X2) > 0 Cor r (X1, X2) < 0

β̃1 > 0 + -β̃1 < 0 - +

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 22 / 25

Page 24: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Omitted variable bias

Let’s assume the underlying true model is :

Y =α+β1X1 +β2X2 (3)

We estimate :Y = α̃+ β̃1X1 +ε (4)

How different is β̃1 from β1 ?Bias on β̃1 depends on the correlation between X1 and X2 :

Cor r (X1, X2) > 0 Cor r (X1, X2) < 0

β̃1 > 0 + -β̃1 < 0 - +

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 22 / 25

Page 25: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Bad controls

Ï More controls are not always betterÏ Bad controls : variables that are themselves outcome variablesin the experiment

Ï Good control : have been fixed at the time the dependentvariable was determined.

Ï Timing uncertain / Unknown ? → Explicit assumptions aboutwhat happened first, or assumption that none of the controlvariables are themselves caused by the regressor of interest.

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 23 / 25

Page 26: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Dummy Variables 1/2

Single coefficient of dummy X3

Y =α0 +α1X1 +δ0 ∗0+ε

Y =α0 +α1X1 +δ0 ∗1+ε

The omitted category X3 = 0 is called the reference category andis part of the baseline model Y =α, for which all coefficients are null

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 24 / 25

Page 27: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Dummy variables 2/2

Example

Income =α0 +α1 ∗educati on +0∗ f emal e +ε

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 25 / 25

Page 28: Week10 SciencesPo-LouisdeCharsonville Spring2018 · Week10 SciencesPo-LouisdeCharsonville Spring2018 Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 1 / 25

Practice

Ï Rerun week9.do and analyze residuals.

Sciences Po - Louis de Charsonville Statistical Reasoning Spring 2018 26 / 25