Upload
sabina-bryan
View
223
Download
0
Tags:
Embed Size (px)
Citation preview
Quantile Regression
Quantile Regression
• The Problem• The Estimator• Computation• Properties of the Regression• Properties of the Estimator• Hypothesis Testing• Bibliography• Software
Quantile Regression
The Problem
Quantile Regression
• Problem– The distribution of Y, the “dependent” variable,
conditional on the covariate X, may have thick tails.– The conditional distribution of Y may be asymmetric.– The conditional distribution of Y may not be
unimodal.
Neither regression nor ANOVA will give us robust results. Outliers are problematic, the mean is pulled toward the skewed tail, multiple modes will not be revealed.
Quantile Regression
• Problem– ANOVA and regression provide information
only about the conditional mean.– More knowledge about the distribution of the
statistic may be important.– The covariates may shift not only the location
or scale of the distribution, they may affect the shape as well.
Quantile Regression
0
5
10
15
20
25
Five Treatments, 100 Patients
1 2 3 4 5
Raw Data
Treatment
Quantile Regression
1 2 3 4 5
Treatment
0
4
8
12
16
Sco
re
Five Treatments, 100 PatientsMeans with Error Bars
Quantile Regression
Quantiles
0
4
8
12
16
20
0 1 2 3 4 5 6
Treatment
Sco
re
10th %tile
25th %tile
50th %tile
75th %tile
90th %tile
Mean
Quantile Regression
The Estimator
Quantile Regression
• Ordinarily we specify a quadratic loss function. That is, L(u) = u2
• Under quadratic loss we use the conditional mean, via regression or ANOVA, as our predictor of Y for a given X=x.
Quantile Regression
• Definition: Given p ∈ [0, 1]. A pth quantile of a random variable Z is any number ζp such that Pr(Z< ζ p ) ≤ p ≤ Pr(Z ≤ ζ p ). The solution always exists, but need not be unique.Ex: Suppose Z={3, 4, 7, 9, 9, 11, 17, 21} and p=0.5 then Pr(Z<9) = 3/8 ≤ 1/2 ≤ Pr(Z ≤ 9) = 5/8
Quantile Regression
• Quantiles can be used to characterize a distribution:– Median– Interquartile Range– Interdecile Range
– Symmetry = (ζ.75- ζ.5)/(ζ.5- ζ.25)
– Tail Weight = (ζ.90- ζ.10)/(ζ.75- ζ.25)
Quantile Regression
• Suppose Z is a continuous r.v. with cdf F(.), then Pr(Z<z) = Pr(Z≤z)=F(z) for every z in the support and a pth quantile is any number ζ p such that F(ζ p) = p
• If F is continuous and strictly increasing then the inverse exists and ζ p =F-1(p)
Quantile Regression• Definition: The asymmetric absolute loss function is
• Where u is the prediction error we have made and I(u) is an indicator function of the sort u)0u(Ip
u)0u(I)p1()0u(IpLp
0uif0
0uif1
)0u(I
Quantile RegressionAbsolute Loss vs. Quadratic Loss
0
0.5
1
1.5
2
2.5
3
-2 -1 0 1 2
Quad
p=.5
p=.7
Quantile Regression
• Proposition: Under the asymmetric absolute loss function Lp a best predictor of Y given X=x is a pth conditional quantile. For example, if p=.5 then the best predictor is the median.
)x(p
Quantile Regression• Definition: A parametric quantile regression model is correctly specified if, for example,
That is, is a particular linear combination of the independent variable(s) such that
where F( ) is some univariate distribution.
x),x(q)x(p
)x)x((F)xYPr(
)x|)x((F)xX)x(YPr(p
p
pp
x
Quantile Regression
.25
x)x(25.
Quantile Regression
• Definition: A quantile regression model is identifiable if
has a unique solution.
)xY(LEmin pF,
Quantile Regression
• A family of conditional quantiles of Y given X=x.– Let Y=α + βx + u with α = β = 1
x
Y
90%
75%
50%
25%
10%
Quantile Regression
Daily High Temperature
0
5
10
15
20
25
30
35
40
45
50
0 10 20 30 40 50
Yesterday
To
day
Quantile Regression
5 10 15 20
20
40
60
80Cool Yesterday (n=259)
Temperature Today
Freq
uenc
y75
1
X 1
18.47.6 X 0
Quantile Regression
15 20 25 30 35 40 45
20
40
60
80Hot Yesterday (n=259)
Temperature Today
Freq
uenc
y61
6
X 1
42.5514 X 0
Quantile Regression• Quantiles at .9, .75, .5, .25, and .10. Today’s temperature fitted to a quartic on yesterday’s temperature.
Temperature Quantiles
0
10
20
30
40
50
60
5 15 25 35 45
Yesterday
To
da
y
Quantile Regression
Computation of the Estimate
Quantile RegressionEstimation
• The quantile regression coefficients are the solution to
• The k first order conditions are
)1(xyxysgnpn
1min
n
1i
'ii
'ii2
121
)2(0xˆxysgn2
1
2
1p
n
1 n
1iip
'ii
Quantile Regression Estimation
• The fitted line will go through k data points.• The # of negative residuals ≤ np ≤ # of neg
residuals + # of zero residuals• The computational algorithm is to set up the
objective function as a linear programming problem
• The solution to (1) – (2) [previous slide] need not be unique.
Quantile Regression
Properties of the Regression
Quantile RegressionProperties of the regression
• Transformation equivarianceFor any monotone function, h(.),
since P(T<t|x) = P(h(T)<h(t)|x). This is especially important where the response variable has been censored, I.e. top coded.
))x|(Q(h)x|(Q T)T(h
Properties
• The mean does not have transformation equivariance since Eh(Y) ≠ h(E(Y))
Quantile RegressionEquivariance
• Practical implications
gularsinnonAx,y,ˆAxA,y,ˆ
x,y,ˆx,xy,ˆ
0x,y,ˆx,y,1ˆ
0x,y,ˆx,y,ˆ
1
k
Equivariance
• (i) and (ii) imply scale equivariance
• (iii) is a shift or regression equivariance
• (iv) is equivariance to reparameterization of design
Quantile RegressionProperties
• Robust to outliers. As long as the sign of the residual does not change, any Yi may be changed without shifting the conditional quantile line.
• The regression quantiles are correlated.
Quantile Regression
Properties of the Estimator
Quantile RegressionProperties of the Estimator
• Asymptotic Distribution
• The covariance depends on the unknown f(.) and the value of the vector x at which the covariance is being evaluated.
1iiiii
1iii
L
xx)x|0(fExxExx)x|0(fE)1(
where
,0Nˆn
Quantile RegressionProperties of the Estimator
• When the error is independent of x then the coefficient covariance reduces to
where
1
2u
xxE0f
1
n
1i ii xxn
1)xx(E
Quantile RegressionProperties of the Estimator
• In general the quantile regression estimator is more efficient than OLS
• The efficient estimator requires knowledge of the true error distribution.
Quantile RegressionCoefficient Interpretation
• The marginal change in the Θth conditional quantile due to a marginal change in the jth element of x. There is no guarantee that the ith person will remain in the same quantile after her x is changed.
ij
ii
x
x|yQ
Quantile Regression
Hypothesis Testing
Quantile RegressionHypothesis Testing
• Given asymptotic normality, one can construct asymptotic t-statistics for the coefficients
• The error term may be heteroscedastic. The test statistic is, in construction, similar to the Wald Test.
• A test for symmetry, also resembling a Wald Test, can be built relying on the invariance properties referred to above.
Heteroscedasticity
• Model: yi = βo+β1xi+ui , with iid errors.
– The quantiles are a vertical shift of one another.
• Model: yi = βo+β1xi+σ(xi)ui , errors are now heteroscedastic.– The quantiles now exhibit a location shift as
well as a scale shift.
• Khmaladze-Koenker Test Statistic
Quantile RegressionBibliography
• Koenker and Hullock (2001), “Quantile Regression,” Journal of Economic Perspectives, Vol. 15, Pps. 143-156.
• Buchinsky (1998), “Recent Advances in Quantile Regression Models”, Journal of Human Resources, Vo. 33, Pps. 88-126.
Quantile Regression
• S+ Programs - Lib.stat.cmu.edu/s
• www.econ.uiuc.edu/~roger
• http://Lib.stat.cmu.edu/R/CRAN
• TSP
• Limdep