IPSS Ch 2. Selection Problem
• 2.1. The Nature of the Problem
Sources: non-response, records dropped from a census, sample attrition in a longitudinal survey, censored data.
We (social scientists) are interested in treatment effects, e.g., what is the effect of Treatment on Y?
  Schooling → Market wages
  Welfare → Labor supply
  Sentencing policy → Crime commission
  New drug → AIDS patients
  Surgery → Life span
  Chemotherapy → Life span
We can never observe both potential outcomes for the same individual, so these differences are not directly observable.
Selection Problem
Example: market wage depends on schooling, work experience, and demographic background (covariates).
Note: the selection problem is logically separate from the extrapolation problem.
• Extrapolation problem: arises from the fact that random sampling does not yield observations of y off the support of x.
• Selection problem: arises when a censored random-sampling process does not fully reveal the behavior of y on the support of x.
• Binary selection indicator z for (y, z, x): z = 1 if y is observed, z = 0 if not.
We observe y only when z = 1.
Example:
  y: market wage
  x: education, work experience, race, sex, … (covariates)
  z: observation indicator; z = 1 observed, z = 0 not observed
(2.1) P(y|x) = P(y|x, z = 1) P(z = 1|x) + P(y|x, z = 0) P(z = 0|x)   (Law of Total Probability)
Selection probability: P(z = 1|x)
Censoring probability: P(z = 0|x)
The conditional distribution P(y|x) is not identified, because P(y|x, z = 0) is unobservable.
(2.2) P(y|x) = P(y|x, z = 1) P(z = 1|x) + g P(z = 0|x),
where g denotes the unknown censored-outcome distribution P(y|x, z = 0).
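To make the non-identification in (2.2) concrete, here is a minimal simulation (not from the text; the 70% selection rate and 60% success rate are invented for illustration). Two populations that agree on every observable quantity but differ in the unobserved g produce indistinguishable censored data, yet different values of P(y = 1):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def simulate(g, n=n):
    """Censored sampling with P(z=1) = 0.7, P(y=1|z=1) = 0.6,
    and an unobservable censored-outcome probability g = P(y=1|z=0)."""
    z = rng.random(n) < 0.7                # selection indicator
    y = np.where(z, rng.random(n) < 0.6,   # outcome among respondents
                    rng.random(n) < g)     # outcome among the censored
    y_obs = np.where(z, y, np.nan)         # the analyst sees y only when z = 1
    return y, y_obs, z

# Two hypothetical "worlds" differing only in g
y_a, obs_a, z_a = simulate(g=0.1)
y_b, obs_b, z_b = simulate(g=0.9)

# Every identifiable quantity agrees across worlds...
print(np.nanmean(obs_a), np.nanmean(obs_b))  # both ≈ 0.6
print(z_a.mean(), z_b.mean())                # both ≈ 0.7
# ...but the population quantity P(y = 1) differs:
print(y_a.mean(), y_b.mean())                # ≈ 0.45 vs ≈ 0.69
```

No amount of censored data can distinguish world A from world B, which is exactly why (2.3)-type assumptions can never be refuted empirically.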
• Outline of Chapter 2
2.2 Worst-case scenario: no information on g
2.3 An empirical illustration
2.4 Identifying power of prior information
2.5–2.8 Problems of identifying treatment effects
2.2. Identification from Censored Samples Alone
Two Negative Facts
Fact 1. Conditional Probability
Assume exogenous (ignorable) selection:
(2.3) P(y| x, z = 0) = P(y| x, z = 1)
⇒ P(y| x) = P(y| x, z = 1)
Can we refute the validity of (2.3)? No! Since the data carry no information about P(y|x, z = 0), assumption (2.3) is necessarily consistent with the empirical evidence.
Fact 2. Conditional Expectation
(2.4) E(y| x) = E(y| x, z = 1) P(z = 1| x) + E(y| x, z = 0) P(z = 0| x)
E(y|x, z = 1), P(z = 1|x), and P(z = 0|x) are identifiable; E(y|x, z = 0) is not, so E(y|x) is not identified.
Bounds on conditional probabilities
The selection problem is not fatal in the absence of prior information: we can still find informative and interpretable bounds.
Let B be a set of outcomes (e.g., “success”).
(2.5) P(y ∈ B|x) = P(y ∈ B|x, z = 1) P(z = 1|x) + P(y ∈ B|x, z = 0) P(z = 0|x).
P(y ∈ B|x, z = 1), P(z = 1|x), and P(z = 0|x) are identifiable, but there is no information on P(y ∈ B|x, z = 0).
Can we say anything about it? Yes! We can find bounds [lower limit, upper limit]:
(2.6) P(y ∈ B|x, z = 1) P(z = 1|x) ≤ P(y ∈ B|x) ≤ P(y ∈ B|x, z = 1) P(z = 1|x) + P(z = 0|x)
The lower bound corresponds to g = 0 and the upper bound to g = 1.
Taking B to be the event (y ≤ t):
(2.7) P(y ≤ t|x, z = 1) P(z = 1|x) ≤ P(y ≤ t|x) ≤ P(y ≤ t|x, z = 1) P(z = 1|x) + P(z = 0|x).
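The bound (2.7) can be computed directly from a censored sample. The sketch below uses invented data (the function name and the normal outcome distribution are my own choices); note that the width of the bound is exactly the censoring probability P(z = 0):

```python
import numpy as np

def cdf_bounds(y_obs, z, t):
    """Worst-case bounds (2.7) on P(y <= t) from a censored sample.

    y_obs: outcomes for respondents only (those with z == 1)
    z:     0/1 selection indicators for the whole sample
    Lower bound: every censored y exceeds t (g puts no mass at or below t);
    upper bound: every censored y falls at or below t.
    """
    p_sel = z.mean()                      # estimate of P(z = 1)
    p_obs = (y_obs <= t).mean()           # estimate of P(y <= t | z = 1)
    lower = p_obs * p_sel
    upper = p_obs * p_sel + (1 - p_sel)
    return lower, upper

# Hypothetical data: ~70% response, standard-normal outcomes
rng = np.random.default_rng(1)
z = (rng.random(1000) < 0.7).astype(int)
y_obs = rng.normal(size=z.sum())          # seen only when z = 1
lo, hi = cdf_bounds(y_obs, z, t=0.0)
print(f"P(y <= 0) lies in [{lo:.2f}, {hi:.2f}]")  # width = P(z = 0) ≈ 0.3
```

The interval is informative whenever P(z = 0) < 1, but never collapses to a point unless there is no attrition at all.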
Statistical Inference
• The selection problem is a failure of identification. The bounds are functions of P(y|x, z = 1) and P(z|x). We can estimate the features of these distributions and thereby obtain estimates of the bounds.
Example: to estimate the bound (2.6) on P(y ∈ B|x), estimate P(y ∈ B|x, z = 1) and P(z = 1|x) as in Section 1.3.
The precision of an estimate of the bound can be measured by a confidence interval around the estimate.
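A bootstrap is one simple way to attach a confidence interval to the estimated bound. This is a sketch on invented data using a naive percentile interval for each endpoint; sharper intervals designed for partially identified parameters exist (e.g., Imbens–Manski), but the sampling idea is the same:

```python
import numpy as np

def bound(success, z):
    """Point estimate of the worst-case bound (2.6) on P(y in B).
    success: indicator of y in B (meaningful only where z == 1)."""
    p_sel = z.mean()
    p_b = success[z == 1].mean()
    return p_b * p_sel, p_b * p_sel + 1 - p_sel

def bootstrap_ci(success, z, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval covering the estimated bound."""
    rng = np.random.default_rng(seed)
    n = len(z)
    lows, highs = [], []
    for _ in range(n_boot):
        i = rng.integers(0, n, n)          # resample individuals with replacement
        lo, hi = bound(success[i], z[i])
        lows.append(lo)
        highs.append(hi)
    return np.quantile(lows, alpha / 2), np.quantile(highs, 1 - alpha / 2)

# Hypothetical sample: ~70% response, ~60% success among respondents
rng = np.random.default_rng(42)
z = (rng.random(500) < 0.7).astype(int)
success = (rng.random(500) < 0.6).astype(int)
print(bound(success, z))        # the estimated bound
print(bootstrap_ci(success, z)) # a slightly wider interval around it
```

The bootstrap interval is wider than the estimated bound in finite samples and shrinks toward it as n grows, which is exactly the distinction drawn below.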
Distinction between the bound and the confidence interval (around its estimate)
• The bound on P(y ∈ B|x) is a population concept: what could be learned about P(y ∈ B|x) if one knew P(y ∈ B|x, z = 1) and P(z|x).
• The confidence interval is a sampling concept: the precision with which the bound is estimated when estimates of P(y ∈ B|x, z = 1) and P(z|x) are obtained from a sample of fixed size.
The confidence interval is typically wider than the bound but narrows to match the bound as the sample size increases.
2.3. Bounding the Probability of Exiting Homelessness
Population: homeless people at time t0
Outcome (y): y = 1 has a home, y = 0 still homeless
Background (x): race, sex, education, etc.
Selection: z = 1 interviewed at t1, z = 0 not interviewed
Conditioning variable: sex
Males
Sample size at t0: 106
Sample size at t1: 64, of whom 21 had exited homelessness
P(y = 1|male, z = 1) = 21/64, P(z = 1|male) = 64/106
Bound on P(y = 1|male): [21/106, 63/106] = [0.20, 0.59]
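The arithmetic of this bound is easy to reproduce. A small helper (my own naming, using exact fractions) recovers the intervals reported in this section from the three attrition counts:

```python
from fractions import Fraction

def exit_bound(n0, n1, n_exit):
    """Worst-case bound (2.6) on P(y = 1) from attrition counts.

    n0: sample size at t0; n1: interviewed at t1; n_exit: exits among them.
    Lower bound: every attriter stayed homeless; upper: every attriter exited.
    """
    p_obs = Fraction(n_exit, n1)   # P(y = 1 | z = 1)
    p_sel = Fraction(n1, n0)       # P(z = 1)
    lower = p_obs * p_sel          # = n_exit / n0
    upper = lower + (1 - p_sel)    # = (n_exit + n0 - n1) / n0
    return lower, upper

lo, hi = exit_bound(106, 64, 21)   # males
print(float(lo), float(hi))        # ≈ 0.198 and ≈ 0.594, i.e. [0.20, 0.59]
```

The same call with the female counts, `exit_bound(31, 14, 3)`, gives [3/31, 20/31].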
Females
Sample size at t0: 31
Sample size at t1: 14, of whom 3 had exited homelessness
Bound on P(y = 1|female): [3/31, 20/31] = [0.10, 0.65]
Point: without any restrictions on the attrition process, we have obtained meaningful bounds.
Continuous case
Conditioning variables: sex, income
Income: “What was the best job you ever had?” ($/week)
Sample sizes: male 89, female 22
Fig. 2.1 Attrition Probabilities P(z = 0|x)
Fig. 2.2 Estimated Bounds on P(y = 1|x)
Lower bound: P(y = 1|x, z = 1) P(z = 1|x)
Upper bound: P(y = 1|x, z = 1) P(z = 1|x) + P(z = 0|x)
• The estimated bound is tightest at the low end of the income domain and spreads as income increases: the interval is [.24, .55] at income $50 and [.23, .66] at income $600.
• This spreading reflects the fact that the estimated probability of attrition increases with income.
Is the Cup Part Empty or Part Full?
P(male exits homelessness) = P(y = 1|male) ∈ [.20, .59], an improvement on the trivial bound [0.0, 1.0].
Can we narrow the interval? Can we pin down P(y = 1|male)?