13
Stat The ISI’s Journal for the Rapid (wileyonlinelibrary.com) DOI: 10.1002/sta4.27 Dissemination of Statistics Research A location-scale model for non-crossing expectile curves Sabine K. Schnabel a and Paul H. C. Eilers a,b Received 5 June 2013; Accepted 15 July 2013 In quantile smoothing, crossing of the estimated curves is a common nuisance, in particular with small data sets and dense sets of quantiles. Similar problems arise in expectile smoothing. We propose a novel method to avoid crossings. It is based on a location-scale model for expectiles and estimates all expectile curves simultaneously in a bundle using iterative least asymmetrically weighted squares. In addition, we show how to estimate a density non-parametrically from a set of expectiles. The model is applied to two data sets. Copyright © 2013 John Wiley & Sons, Ltd. Keywords: density estimation; expectiles; location-scale model; P-splines 1 Introduction Mean and median describe only the central tendency of a data sample. Standard deviation and (inter-quartile) range inform us about spread. A set of quantiles provides an almost complete description of the distribution of the data. The same is true for expectiles, which were proposed by Newey & Powell (1987) as a least squares-based analog of quantiles. In recent years, there has been a growing interest in quantile and—to a lesser extent—expectile smoothing. Both provide powerful tools for visualizing the central tendency and spread of data in a scatterplot. A common problem in quantile and expectile smoothing, especially with small data sets, is that curves for different amounts of asymmetry cross each other. In theory, this is not possible, but in practice, because of sampling variation, it is commonly encountered. This behavior is visually disturbing, and it might hamper further analysis for which monotone behavior is mandatory. An example for the occurrence of this problem is shown in Figure 1. The data set consists of daily air quality measurements in New York from 1973. In the first row of the panel, we used a simple least asymmetrically weighted squares (LAWS) model to estimate a range of expectile curves; see Section 3. The smoothing parameters are chosen by asymmetric generalized cross-validation (see Schnabel & Eilers (2009b) for a detailed description). Especially, the more extreme expectile curves show cross-overs. In the second row of the panel, we used our proposed location-scale model (Section 4) to overcome this problem. There have been a number of proposals for improving quantile smoothing to avoid crossing curves. In the next section, we provide a short literature overview. In our previous work on expectile smoothing (e.g. in Schnabel & Eilers, 2009a,b), we also encountered minor problems with crossing expectile curves. However, in our experience, intersecting curves are a less frequent problem in expectiles than with the currently available quantile estimation techniques. Nevertheless, as a remedy to crossing expectile curves, we propose the expectile bundle model. It is a location-scale model and assumes that the relative spacing a Biometris, Wageningen University and Research Centre, PO Box 100, 6700 AC Wageningen, The Netherlands b Department of Biostatistics, Erasmus Medical Center, PO Box 2040, 3000 CA Rotterdam, The Netherlands Email: [email protected] Stat 2013; 2: 171–183 Copyright © 2013 John Wiley & Sons Ltd

A location-scale model for non-crossing expectile curves

  • Upload
    paul-hc

  • View
    212

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A location-scale model for non-crossing expectile curves

StatThe ISI’s Journal for the Rapid (wileyonlinelibrary.com) DOI: 10.1002/sta4.27Dissemination of Statistics Research

A location-scale model for non-crossingexpectile curvesSabine K. Schnabela� and Paul H. C. Eilersa,b

Received 5 June 2013; Accepted 15 July 2013

In quantile smoothing, crossing of the estimated curves is a common nuisance, in particular with small data setsand dense sets of quantiles. Similar problems arise in expectile smoothing. We propose a novel method to avoidcrossings. It is based on a location-scale model for expectiles and estimates all expectile curves simultaneously ina bundle using iterative least asymmetrically weighted squares. In addition, we show how to estimate a densitynon-parametrically from a set of expectiles. The model is applied to two data sets. Copyright © 2013 John Wiley &Sons, Ltd.

Keywords: density estimation; expectiles; location-scale model; P-splines

1 IntroductionMean and median describe only the central tendency of a data sample. Standard deviation and (inter-quartile) rangeinform us about spread. A set of quantiles provides an almost complete description of the distribution of the data.The same is true for expectiles, which were proposed by Newey & Powell (1987) as a least squares-based analog ofquantiles. In recent years, there has been a growing interest in quantile and—to a lesser extent—expectile smoothing.Both provide powerful tools for visualizing the central tendency and spread of data in a scatterplot.

A common problem in quantile and expectile smoothing, especially with small data sets, is that curves for differentamounts of asymmetry cross each other. In theory, this is not possible, but in practice, because of sampling variation,it is commonly encountered. This behavior is visually disturbing, and it might hamper further analysis for whichmonotone behavior is mandatory. An example for the occurrence of this problem is shown in Figure 1. The data setconsists of daily air quality measurements in New York from 1973. In the first row of the panel, we used a simpleleast asymmetrically weighted squares (LAWS) model to estimate a range of expectile curves; see Section 3. Thesmoothing parameters are chosen by asymmetric generalized cross-validation (see Schnabel & Eilers (2009b) for adetailed description). Especially, the more extreme expectile curves show cross-overs. In the second row of the panel,we used our proposed location-scale model (Section 4) to overcome this problem.

There have been a number of proposals for improving quantile smoothing to avoid crossing curves. In the next section,we provide a short literature overview.

In our previous work on expectile smoothing (e.g. in Schnabel & Eilers, 2009a,b), we also encountered minor problemswith crossing expectile curves. However, in our experience, intersecting curves are a less frequent problem in expectilesthan with the currently available quantile estimation techniques. Nevertheless, as a remedy to crossing expectilecurves, we propose the expectile bundle model. It is a location-scale model and assumes that the relative spacing

aBiometris, Wageningen University and Research Centre, PO Box 100, 6700 AC Wageningen, The NetherlandsbDepartment of Biostatistics, Erasmus Medical Center, PO Box 2040, 3000 CA Rotterdam, The Netherlands∗Email: [email protected]

Stat 2013; 2: 171–183 Copyright © 2013 John Wiley & Sons Ltd

Page 2: A location-scale model for non-crossing expectile curves

S. K. Schnabel and P. H. C. Eilers Stat(wileyonlinelibrary.com) DOI: 10.1002/sta4.27 The ISI’s Journal for the Rapid

Dissemination of Statistics Research

Ozo

ne

simple LAWS

5 10 15 20

5 10 15 20

050

100

150

Ozo

ne0

5010

015

0

050

100

150

050

100

150

60 70 80 90

60 70 80 90

p = 0.99p = 0.98p = 0.95p = 0.9p = 0.8p = 0.5p = 0.2p = 0.1p = 0.05p = 0.02p = 0.01

Wind speed (mph)

location− scale

Temperature (F)

Figure 1. Expectile curves (p D 0.01, : : : , 0.99; bottom to top, blue to red) for air quality measurements estimated with thesimple LAWS model (first row). Expectile curves estimated with the location-scale model (second row).

between expectiles is the same everywhere, but its absolute size changes smoothly along the independent variable. Inaddition, it assumes a smooth trend, which however is not necessarily the central tendency. It might also be similar toa baseline curve in case the error distribution is asymmetric. All expectile curves follow from three curves: the smoothtrend, the smooth amplitude curve, and a (generally smooth) asymmetry function, which represents the standardizedexpectiles. The trend and amplitude curves are modeled by P-splines (Eilers & Marx, 1996). The components of thebundle model are estimated by iterative asymmetrically weighted (penalized) least squares.

Expectile curves generally are smoother and seem to do more justice to the data than quantile curves. They give agood impression of the central tendency and the spread of the data in a scatterplot. We also present an algorithm toestimate a smooth non-negative density from a set of empirical expectiles. A color-scale representation of this densitycan be plotted as a background to the data, to give a good impression of their conditional distribution.

The article is structured as follows: In the next section, we give an overview of the existing literature on quantilecrossing. Section 3 describes the derivation of LAWS. The bundle model as an extension of the simple LAWS modelis proposed in Section 4. In Section 5, we show first how to calculate theoretical expectiles for given distributions.Subsequently, we estimate the underlying density from the standardized expectiles as a result of the bundle model.We apply these methods to two data sets: the first one is the well-known textbook data from the light detection andranging (LIDAR) experiment and the second a data set monitoring growth of children. The results can be found inSection 6. The manuscript concludes with a short discussion in Section 7.

2 Literature reviewIn this section, we give a short overview of recent developments in the literature on quantile smoothing that attemptto solve the problem of crossings.

Already in He (1997), two methods were proposed to avoid crossing quantiles: The first, restricted regressionquantiles, provides a flexible framework with favorable computational properties. The second was a model for

Copyright © 2013 John Wiley & Sons Ltd 172 Stat 2013; 2: 171–183

Page 3: A location-scale model for non-crossing expectile curves

Stat Location-scale model for non-crossing expectiles

The ISI’s Journal for the Rapid (wileyonlinelibrary.com) DOI: 10.1002/sta4.27Dissemination of Statistics Research

non- parametric conditional quantile functions. The properties of these restricted regression quantiles were exploredin Zhao (2000). We compare He’s (1997) approach and our expectile bundle model in Section 4.1. on page 6 Yu &Jones (1998) suggested a double kernel estimator for quantiles with the property of non-crossing curves. The paperby Hall et al. (1999) was motivated by Yu & Jones (1998) and uses their class of estimators to estimate a con-ditional distribution. Shortly after this, Heagerty & Pepe (1999) published on a related area. Although they did notexplicitly propose a non-crossing method for quantile estimation, their semi-parametric location-scale model includedthis property. A non-parametric quantile estimator and several possible extensions to the model were described inTakeuchi et al. (2006). They also suggested how to overcome the problem of crossing quantiles. Chernozhukov et al.(2010) investigated the natural monotonization of empirical quantile curves in order to prevent possible crossings.This can be seen as a simple regularization for estimating conditional quantiles using rearrangement. They showedthat the transformed curves were true curves in a finite sample. In contrast, Dette & Volgushev (2008) proposed anew non-parametric estimator for conditional quantiles in order to prevent crossing curves. They use an initial esti-mate of the conditional distribution function and solve inversion and monotonization of the distribution simultaneously.Shim et al. (2009) suggested a new method for non-crossing quantile regression that uses a doubly penalized kernelmachine with four free parameters. This location-scale model can be also used in a multi-dimensional context. Thepaper includes suggestions for model selection criteria. Bollaerts et al. (2006) and later Bondell et al. (2010) imposedmonotonicity restrictions to enforce non-crossing quantile curves. The aforementioned papers attempt to solve theproblem of crossing quantile curves using different approaches such as restricted regression quantiles, non-parametricand semi-parametric techniques, and natural monotonization. To our knowledge, none of these techniques have beensuccessfully applied to the problem of crossing expectiles. This is why we propose in the following a location-scalemodel to estimate non-crossing expectile curves. The expectile bundle model is based on LAWS and determines allexpectile curves simultaneously.

Like the model in He (1997), we fit trends for location and scale. However, our estimation procedure is different. He(1997) proposed to compute a smooth median curve, subtract it from the data, take absolute values of the residuals,and smooth them to obtain an estimate of the error amplitude. This works well with a more or less symmetric errordistribution. When considering an asymmetric error distribution, such as the exponential distribution, with a relativelywavy amplitude, on top of a very smooth or even constant baseline, a smooth curve for mean or median will be wavyas well. Consequently, we will miss the proper description of the data generation process. A similar reasoning holds fora wavy trend line and asymmetric errors with a constant amplitude. Our algorithm is designed to avoid these problems.

Location-scale models are attractive because they give a clear picture of the distribution of the data, summarized bythree curves. Our bundle model is of this type. In practice, we have encountered data sets for which it was inade-quate, because the shape of the error distribution changes with the independent variable. More advanced tools areneeded then.

To handle more complicated data, we have developed expectile sheets, a two-dimensional surface on the .x, p/domain. The surface is estimated with penalized tensor products of B-splines, increasing monotonically with theasymmetry p everywhere. To compute expectile curves for any p, one computes the value of the surface at the chosenp. Expectile sheets present a very general approach to expectile smoothing (Schnabel & Eilers, 2013a). With smallchanges, they also work for quantiles (Schnabel & Eilers, 2013b).

3 Theory of Least Asymmetrically Weighted SquaresIn ordinary least squares (OLS) estimation, one seeks to minimize the sum of squares

SOLS DX

i

.yi � �i/2 (1)

Stat 2013; 2: 171–183 173 Copyright © 2013 John Wiley & Sons Ltd

Page 4: A location-scale model for non-crossing expectile curves

S. K. Schnabel and P. H. C. Eilers Stat(wileyonlinelibrary.com) DOI: 10.1002/sta4.27 The ISI’s Journal for the Rapid

Dissemination of Statistics Research

where yi is the response variable and �i is the expected value to be estimated. For example, in linear regression, it is�i D ˇ0 C ˇ1xi with ˇ the coefficient vector.

Least asymmetrically weighted squares estimation is a weighted generalization of OLS and seeks to minimize thefollowing weighted objective function for a range of values of the asymmetry parameter p, 0 < p < 1:

S DX

i

wi.p/ .yi � �.xi,ˇ, p//2 (2)

with weights

wi.p/ D²

p if yi > �.xi,ˇ, p/1 � p if yi � �.xi,ˇ, p/

(3)

where yi is the response variable and �.xi,ˇ, p/ is the value to be estimated according to a statistical model evaluatedat xi for asymmetry p and with parameter ˇ. The obtained functions �.xi,ˇ, p/ are called p-expectiles as introducedin Newey & Powell (1987). To simplify the notation, we will drop the dependence on x, ˇ, or p where convenientand unambiguous. OLS estimation is a special case of LAWS with p D 0.5. Fitting the LAWS model is carried out byiterating between weighted regression and re-computing the weights. The objective function is convex, guaranteeing aunique minimum. In most examples, convergence is reached after only a few iteration steps.

To estimate each p-expectile, we combine LAWS with a flexible functional form for the expectile curve. We chosepenalized B-splines (Eilers & Marx, 1996): �i D

Pk bikak, where B D[ bik] is the matrix of B-spline basis functions

and a the coefficient vector. The basis covers the domain of x with a generous number of splines. P-splines include apenalty for assuring a smooth function. Thus, we are seeking to minimize the penalized LAWS function:

S� D .y � Ba/TW.y � Ba/C � kDdak2 (4)

with respect to coefficient vector a. W is a matrix with weights (3) on the main diagonal. Dd is a matrix that formsd-th order differences of a. In practice, second-order or third-order differences are used. This estimation is carried outseparately for each p-expectile. We have a different specification for each curve for the selected values of asymmetry p.Using P-splines requires a choice of the smoothing parameter �. To tackle this issue, we adapted two methods for thesimple LAWS model: the classic concept of cross-validation for this asymmetric setting and Schall’s algorithm (Schall,1991)—originally designed for the estimation of variance components in mixed models. These two approaches forfinding the optimal smoothing parameters are described in detail in Schnabel & Eilers (2009b).

4 Extension of the basic model: the expectile bundle modelUsing the previous approach, estimated expectile curves can cross. Theoretically, this is impossible, but it is commonlyencountered in applications. The curves are estimated in isolation (for different values of p), neighboring expectilesare not taken into account, and so cross-overs may occur.

Examples for the occurrence of this problem are shown in Figure 2, where we use the simple LAWS model to estimatea range of expectile curves. We used asymmetric generalized cross-validation to choose the amount of smoothing.The expectile curves for p D 0.01 and p D 0.02 intersect at a few places. These data are explained in more detail inSection 6.1.

With the expectile bundle model, we offer a solution to this problem. As seen from the literature review in Section 2,there are some suggestions for the solution of a similar problem in quantile estimation. However, to our knowledge, thisproblem has not been investigated for expectiles. In the following, we present a technique for estimating non-crossingexpectiles using the expectile bundle model.

Copyright © 2013 John Wiley & Sons Ltd 174 Stat 2013; 2: 171–183

Page 5: A location-scale model for non-crossing expectile curves

Stat Location-scale model for non-crossing expectiles

The ISI’s Journal for the Rapid (wileyonlinelibrary.com) DOI: 10.1002/sta4.27Dissemination of Statistics Research

400 450 500 550 600 650 700

−1.

0−

0.8

−0.

6−

0.4

−0.

20.

0

Range

Logr

atio

p = 0.99p = 0.98p = 0.5p = 0.02p = 0.01

Figure 2. Expectile curves (p D 0.01, 0.02, 0.5, 0.98, and 0.99; bottom to top, blue to red) with smoothing according togeneralized cross-validation for the LIDAR data (for data description, see Ruppert et al., 2003). Crossing expectiles can befound for p D 0.01 and p D 0.02 up to values of 550 for the independent variable.

4.1. TheoryIn the simple LAWS model, each expectile curve is modeled by a P-spline �i D

Pk bikak separately for each asymmetry

parameter p. In the bundle model, the expectiles �.x, p/ are defined by

�.x, p/ D t.x/C c.p/s.x/ (5)

where t.x/ is a common smooth trend of all expectile curves specified by a P-spline; c.p/ is the asymmetry function,the set of standardized expectiles that represent the error distribution; s.x/ represents the local width of the bundleand is also constructed with P-splines. The expectile bundle model is a location-scale model.

The objective function is

Sb D

mXiD1

nXjD1

wij�yi � t.xi/ � cjs.xi/

�2 (6)

where the vector c with elements cj D c.pj/, j D 1, : : : , J contains the standardized expectiles and wij is the asymmetricweight that follows from the sign of yi � .t.xi/C cjs.xi// according to (3).

We present two algorithms. One is fast and simple and adequate when the error distribution is more or less symmetric.When that is not the case, we have another algorithm, which, however, is slower.

The first algorithm consists of two steps. In step 1, we use straightforward P-spline estimation for the trend curve,and we minimize

St D�y � Bat�T �y � Bat�C �t

��Ddat��2 (7)

with t.x/ D Bat, that is, the common trend is described by a penalized B-spline over the independent variable xincluding the smoothing parameter �t. The estimated coefficients are at.

In step 2, we use the residuals ri D yi � t.xi/ and model them as the product of functions c.p/ and s.x/. This iscarried out in an iterative procedure alternating between calculating the amplitude function s.x/ D Bas with smoothingparameter �s and determining the asymmetry function c.p/ until convergence—which usually occurs after only a fewiterations. To make the model identifiable, the norm of as is set to some arbitrary value. We use

PKkD1.a

sk/2 D K.

In the present set-up, smoothing is involved in both steps of the estimation procedure, namely in modeling the commontrend t.x/ of all expectiles and in estimating the amplitude function s.x/. Generally, for determining the optimal

Stat 2013; 2: 171–183 175 Copyright © 2013 John Wiley & Sons Ltd

Page 6: A location-scale model for non-crossing expectile curves

S. K. Schnabel and P. H. C. Eilers Stat(wileyonlinelibrary.com) DOI: 10.1002/sta4.27 The ISI’s Journal for the Rapid

Dissemination of Statistics Research

smoothing parameters, there are several options. Here, we opted to use cross-validation. We define the weighted(leave-one-out) cross-validation score as

WCV D1

nJ

nXiD1

JXjD1

wij�yi � O��i,j

�2 (8)

where n is the sample size, J is the number of estimated p-expectiles, and O��i,j D O�.x�i, pj/ is the predicted valueat xi from the remaining observations. In this context, we use this score to perform a 10-fold cross-validation over a10 � 10 values grid of possible combinations of the smoothing parameters .�t,�s/.

As already mentioned in the literature review in Section 2, the model presented in He (1997) has similarities to ourexpectile bundle model. This model for computing non-parametric conditional quantile functions takes the followingform:

y D f.x/C s.x/e (9)

The estimation procedures of both approaches differ substantially. He (1997) determined first the conditional medianfunction and then in a second step estimated the smooth non-negative amplitude function. The third step consistedof the step-wise calculation of the “asymmetry factor” c˛ for each ˛-quantile curve separately. In our expectile bundlemodel, we also first determine the smooth trend function. But in the second step, the estimation of the smoothamplitude function s.x/ and the asymmetry function c.p/ is performed iteratively, using all values of asymmetryparameter p simultaneously.

For some data, it is not a good idea to first determine the mean trend and to model the residuals, as is the case inthe first algorithm. When the error distribution is asymmetric with a non-constant amplitude, the mean curve is not agood estimate of t.x/. The simulated data in Figure 3 present such a case. Here, t.x/ is a straight line and s.x/ is acosine, modulating a Gamma distribution (the sum of two exponential variates).

In the second algorithm, we alternate between updating the estimates of t.x/ and s.x/ jointly and updating c.p/. Givena current estimate Qs.x/, we form a regression basis MB D[ B : QSB] to estimate the vector [ .at/T : .as/T]T. Here, QS isa diagonal matrix with Qs on the diagonal. This algorithm is much slower than the first one, but it gives the desiredresults, as shown in Figure 3.

5 From densities to expectiles and vice versaIn this section, we lay out the connection between expectiles and the underlying distribution of the data. First, weshow the theoretical derivation of expectiles from the distribution function. Then in Section 5.2., we present a novelapproach to estimate the underlying density from a given set of expectiles.

5.1. From densities to expectilesThe theoretical derivation of an expectile can be found in more detail in Newey & Powell (1987). In short, assume wehave a probability density function f.x/ with

F.x/ D

xZ�1

f.u/du and G.x/ D

xZ�1

uf.u/du (10)

Here, F.x/ is the distribution function, that is, the cumulative density, and G.x/ is the (first) partial moment function.In this context, we denote the theoretical p-expectile by ep. In expectile estimation, we minimize (2) with weights

Copyright © 2013 John Wiley & Sons Ltd 176 Stat 2013; 2: 171–183

Page 7: A location-scale model for non-crossing expectile curves

Stat Location-scale model for non-crossing expectiles

The ISI’s Journal for the Rapid (wileyonlinelibrary.com) DOI: 10.1002/sta4.27Dissemination of Statistics Research

0.0 0.2 0.4 0.6 0.8 1.0

01

23

4

xx

0.0 0.2 0.4 0.6 0.8 1.0xx

yy

Data, trend curve and expectile bundle

0.0

0.5

1.0

1.5

2.0

2.5

3.0

r

After subtraction of the trend

Figure 3. The bundle model for simulated data with an asymmetric distribution and a very smooth trend, using the algorithmupdating trend and error amplitude simultaneously (xx 2 [0,1], t(xx) straight line as trend, s(xx) following a cosine plus agamma-distributed error) . The thick blue line in the upper panel indicates the estimated trend. Note that it differs from themean curve (the middle expectile curve), which would have been the trend estimate with the simple algorithm.

according to (3). For a continuous distribution, this is equivalent to

minep

.1 � p/

epZ�1

.u � ep/2f.u/duC p

1Zep

.u � ep/2f.u/du (11)

Minimizing this form leads to

.1 � p/

epZ�1

.u � ep/f.u/duC p

1Zep

.u � ep/f.u/du D 0 (12)

After some algebra and insertion of F.x/ and G.x/, we can determine the theoretical p-expectile ep by

ep D.1 � p/G.ep/C p.� � G.ep//

.1 � p/F.ep/C p.1 � F.ep//(13)

with � as the mean of the underlying distribution F and G.1/ D �. Solving for p, we obtain

p DG.ep/ � epF.ep/

2�G.ep/ � epF.ep/

�C .ep � �/

(14)

Stat 2013; 2: 171–183 177 Copyright © 2013 John Wiley & Sons Ltd

Page 8: A location-scale model for non-crossing expectile curves

S. K. Schnabel and P. H. C. Eilers Stat(wileyonlinelibrary.com) DOI: 10.1002/sta4.27 The ISI’s Journal for the Rapid

Dissemination of Statistics Research

This relationship follows from Newey & Powell (1987) and Jones (1994). According to Theorem 1 in the formerpublication, expectiles shift and scale like expected values with changes in mean � and standard deviation � off.xj�, �/.

Equation (14) gives an explicit relation for p given ep. Thus, for a given expectile, it is straightforward to compute thecorresponding asymmetry. Unfortunately, this is not the case for ep in (13): one has to use numerical inversion to findthe expectile ep that corresponds to a given asymmetry p.

5.2. From expectiles to densitiesIn addition to calculating the expectiles from a known density, we can also determine the underlying distribution froma given set of expectiles.

Equation (12) implicitly defines expectiles as roots of an equation. If we write

.u, p/ D².1 � p/.u � ep/ u < ep

p.u � ep/ u � ep(15)

then .u, p/ is an (asymmetric) estimating function (Godambe, 1991) and1Z�1

f.t/ .t, p/dt D 0 (16)

has to hold for every pair .p, ep/. Suppose we know ep for a number of values of p. Then we can define the inverseproblem: estimate a density from expectiles. For a finite number of values of p, this is an ill-conditioned problem: onecan construct infinitely many densities that are compatible with the given expectiles—assuming that the expectilesare consistent: ep0 > ep for p0 > p. We need to constrain the density to obtain a well-posed problem. A natural choiceis to require smoothness. To simplify the computations, we assume a fine grid for u. We index u by i and p by j. Let˛ij D .ui, pj/ and ' a discrete approximation to the density f. Then we have thatX

i

'i˛ij D 0 8j

In addition,P

i 'i D 1 has to hold. We add this equation and give it a large weight. Let the roughness of ' bedetermined by

P.�3'i/

2, that is, a third-order difference penalty. We form the penalized least squares problem andminimize

D DJC1XjD1

kj �

Xi

˛ij'i

!2C �d

Xi

��3'i

�2(17)

Here, k is a vector of J zeros and one 1. We have extended the matrix A D[˛ij] with a row of ones to accommodatethe additional condition on ' for a discrete density. We can reformulate this in matrix notation

V D�

A1

�and k D[ 0, : : : , 1]T

D D jjV' � kjj22 C �djjD3'jj22 (18)

where jj.jj2 is the L2 norm.

There is no guarantee that all elements of the discrete density ' will be non-negative. Indeed, in practice, we encounternegative values (see also the first example in Section 6). Therefore, we refine the computation and set ' D e� with

Copyright © 2013 John Wiley & Sons Ltd 178 Stat 2013; 2: 171–183

Page 9: A location-scale model for non-crossing expectile curves

Stat Location-scale model for non-crossing expectiles

The ISI’s Journal for the Rapid (wileyonlinelibrary.com) DOI: 10.1002/sta4.27Dissemination of Statistics Research

a roughness penalty on �. The modified objective function can be optimized by Newton–Raphson iterations usinglog.' �min.'/C 0.02max.'// from the results of the unconstrained estimation as starting values. This is to preventtaking logarithms of negative numbers.

We can use any set of estimated expectiles, irrespective of how it was estimated. It is also possible to calculatethe density from the results of the simple LAWS model where each p-expectile curve is determined individually.This estimation would result in a set of densities determined at every value of the independent variable. For somesituations, this approach might be of interest. We show an example using growth data in Section 6.2.. However, in thefollowing, we describe how to use the results of the bundle model for the estimation of one underlying density. Usingthe asymmetry function c.p/ for the density estimation results in the advantage that we have only a single densitythat is modulated over the independent variable. We choose a grid for u that is in the same order of magnitude ofc.p/, for example, by having a closer look at the range of the standardized residuals .y � t.x//=s.x/. From there, weset up the model matrix V and then proceed to calculate the discrete density ' as described earlier. By this set-up, wecan estimate the underlying global discrete density at values of asymmetry p. It is on the scale of c.p/, which can belinked to the associated values of the response y via the standardized residuals. An example for the density estimationis shown in Section 6.

Additionally, the estimation of the underlying density gives us the opportunity to determine quantiles from expectiles.With this density, we can express a set of quantiles of interest on the scale of the asymmetry function c.p/. Usingthe location-scale model (5) with the relevant values of c.p/, we can calculate quantile curves. Because of theconstruction of the expectile bundle model, this procedure results in smooth non-crossing quantile curves. More detailsand comparison with existing implementations of non-crossing quantile curves exceed the scope of this paper and willbe reported elsewhere.

6 Applications6.1. LIDAR dataAs a first example, we use the LIDAR data set (as described also in Ruppert et al., 2003). This data set consists of221 observations. The independent variable is the distance that the light traveled before it was reflected back to itssource. The dependent variable is the logarithm of the ratio of received light from two laser sources. The smoothingparameters were determined via 10-fold cross-validation over a 10� 10 grid of .�t,�s/ on log10.�2, 5/. The results aredepicted in Figure 4.

The estimated expectiles are a smooth bundle of non-crossing curves. As laid out earlier, it is possible to estimate theunderlying density from the results of the bundle model. The unconstrained and non-negative densities are depictedin Figure 5.

For these data, the underlying density seems to be unimodal and has a fairly symmetric shape. We can overlay thedensity with the data as depicted in Figure 6. This is helpful to give a better impression of the density. Darker shadedareas indicate a higher density while lighter regions symbolize a lower density level. The density is modulated over theabscissa, and we observe a wider spread for longer distances traveled by the light.

6.2. Dutch data on age and height of boysOur second example concerns human growth. The Fourth Dutch Growth study collected cross-sectional data on height,weight, and head circumference of Dutch children (Van Buuren & Fredriks, 2001). The used subset consists of about7000 observations of age and height of Dutch boys (Van Buuren, 2007). Given this large sample size, simple LAWSestimation will by itself already result in smooth non-crossing curves. However, in the context of growth curves, it

Stat 2013; 2: 171–183 179 Copyright © 2013 John Wiley & Sons Ltd

Page 10: A location-scale model for non-crossing expectile curves

S. K. Schnabel and P. H. C. Eilers Stat(wileyonlinelibrary.com) DOI: 10.1002/sta4.27 The ISI’s Journal for the Rapid

Dissemination of Statistics Research

400 450 500 550 600 650 700

−1.

2−

0.8

−0.

40.

0

Range

Logr

atio

p = 0.01p = 0.02p = 0.05p = 0.1p = 0.2p = 0.5p = 0.8p = 0.9p = 0.95p = 0.98p = 0.99

Figure 4. Expectile curves of the LIDAR data using the expectile bundle model. Asymmetry p D 0.01, : : : , 0.99; bottom totop, blue to red.

Standardized residual

Den

sity

−0.3 −0.2 −0.1 0.0 0.1 0.2 0.3

01

23

45

Figure 5. Estimated density from expectile bundle results for the LIDAR data. Black line: unconstrained estimation. Light-colored/red line: estimated non-negative density.

400 450 500 550 600 650 700

−1.

2−

0.8

−0.

40.

0

Range

Logr

atio

Figure 6. LIDAR data with estimated density. Light to dark colors express increasing levels of the underlying density.

might be more interesting to have a look at the side result of the bundle model: namely the possibility to estimatethe underlying global density. By modulating the density along the x-axis, we can overlay the density with the data asdepicted in Figure 7.

In addition to the density estimation using the bundle model, we directly estimated the density from the results of theLAWS simple model and determined the corresponding densities for different values of x.

Copyright © 2013 John Wiley & Sons Ltd 180 Stat 2013; 2: 171–183

Page 11: A location-scale model for non-crossing expectile curves

Stat Location-scale model for non-crossing expectiles

The ISI’s Journal for the Rapid (wileyonlinelibrary.com) DOI: 10.1002/sta4.27Dissemination of Statistics Research

1 2 3 4

5010

015

020

0

sqrt(age)

heig

ht

Figure 7. Data on age and height of Dutch boys with estimated density. Light to dark colors express increasing levels of theunderlying density. Data points represent a 10% random sample of the whole data set.

50 100 150 200

0.00

0.05

0.10

0.15

0.20

height

dens

ity

Figure 8. Comparison density estimation at different ages (10 days to 17.5 years— from left to right) using the results fromthe bundle model (solid lines) and from the simple LAWS estimation (dashed lines).

In Figure 8, we compare the results at different ages (10 age points between 10 days and 17.5 years). One canobserve that the estimated densities are very similar and that both techniques lead to approximately the same results.This confirms the validity of density estimation irrespective of the expectile estimation procedure. However, the resultsfrom the bundle model seem to give smoother densities with the additional advantage of a smaller computationaleffort as the density only has to be calculated once.

7 ConclusionIn this paper, we discuss the problem of crossing expectiles and propose an approach to overcome this complication.In theory, expectiles for different values of asymmetry p (as well as quantiles) cannot cross, but it is often encounteredin practice. Our approach consists of a location-scale model with three functions: a smooth trend function, a smoothfunction of the amplitude of the residuals, and an asymmetry function. As implied by the model set-up, we canestimate non-crossing smooth expectile curves. An additional feature of the model comes from the asymmetry function.We can make use of this monotonically increasing function for estimating and smoothing a global underlying densityof the data. Furthermore, the cumulative distribution of the density could be used to determine smooth non-crossing

Stat 2013; 2: 171–183 181 Copyright © 2013 John Wiley & Sons Ltd

Page 12: A location-scale model for non-crossing expectile curves

S. K. Schnabel and P. H. C. Eilers Stat(wileyonlinelibrary.com) DOI: 10.1002/sta4.27 The ISI’s Journal for the Rapid

Dissemination of Statistics Research

quantiles. The presented expectile bundle model can also be extended to a two-dimensional location-scale model toaccommodate two explanatory variables.

A package for R, expectreg, is available in Sobotka et al. (2013). It can fit the simple LAWS model (Schnabel &Eilers, 2009b) well as the expectile bundle described in this manuscript. Figures 2 and 4 can be produced with thispackage. In an upcoming package update, we will also include the density estimation based on the expectile bundlemodel along with illustrating plots.

The location-scale model decomposes the pattern in the data into a trend and a distribution of which the shapeis independent of x but with a variable width. If the model holds, it is attractive, but of course real data will notalways allow this simple view of their nature. As indicated in Section 2, the expectile sheet is a more general solutionwhenever the bundle model is inadequate.

AcknowledgementsThis paper was partly written while the first author was at the Max Planck Institute for Demographic Research. Wethank Peter van der Heijden from Utrecht University and Jutta Gampe from the Max Planck Institute for DemographicResearch for helpful comments on an earlier version of the paper.

ReferencesBollaerts, K, Eilers, PHC & Aerts, M (2006), ‘Quantile regression with monotonicity restrictions using P-splines and

the L1-norm’, Statistical Modelling, 6, 189–207. DOI: 10.1191/1471082X06st118oa.

Bondell, HD, Reich, BJ & Wang, H (2010), ‘Noncrossing quantile regression curve estimation’, Biometrika, 97(4),825–838. DOI: 10.1093/biomet/asq048.

Chernozhukov, V, Fernandez-Val, I & Galichon, A (2010), ‘Quantile and probability curves without crossing’,Econometrica, 78(3), 1093–1125. DOI: 10.3982/ECTA7880.

Dette, H & Volgushev, S (2008), ‘Non-crossing non-parametric estimates of quantile curves’, Journal of the RoyalStatistical Society: Series B, 70(3), 609–627. DOI: 10.1111/j.1467-9868.2008.00651.x.

Eilers, PHC & Marx, BD (1996), ‘Flexible smoothing with B-splines and penalties’, Statistical Science, 11 (2),89–121. DOI: 10.1214/ss/1038425655.

Godambe, VP. (1991). ‘Confidence intervals for quantiles’, in Godambe, VP (ed.), Estimating Functions , OxfordUniversity Press, New York, pp. 211–217.

Hall, P, Wolff, RCL & Yao, Q (1999), ‘Methods for estimating a conditional distribution function’, Journal of theAmerican Statistical Association, 94(445), 154–163. DOI: 10.1214/009053604000001282.

He, X (1997), ‘Quantile curves without crossing’, The American Statistician, 51 (2), 186–192. DOI:10.1080/00031305.1997.10473959.

Heagerty, PJ & Pepe, MS (1999), ‘Semiparametric estimation of regression quantiles with application to standardizingweight for height and age in us children’, Applied Statistics, 48(4), 533–551. DOI: 10.1111/1467-9876.00170.

Jones, MC (1994), ‘Expectiles and M-quantiles are quantiles’, Statistics & Probability Letters, 20, 149–153. DOI:10.1016/0167-7152(94)90031-0.

Copyright © 2013 John Wiley & Sons Ltd 182 Stat 2013; 2: 171–183

Page 13: A location-scale model for non-crossing expectile curves

Stat Location-scale model for non-crossing expectiles

The ISI’s Journal for the Rapid (wileyonlinelibrary.com) DOI: 10.1002/sta4.27Dissemination of Statistics Research

Newey, WK & Powell, JL (1987), ‘Asymmetric least squares estimation and testing’, Econometrica, 55(4), 819–847.

Ruppert, D, Wand, MP & Caroll, RJ (2003), Semiparametric Regression, Cambridge Series in Statistical andProbabilistic Mathematics, Cambridge University Press, Cambridge, UK.

Schall, R (1991), ‘Estimation in generalized linear models with random effects’, Biometrika, 78(4), 719–727. DOI:10.1093/biomet/78.4.719.

Schnabel, SK & Eilers, PHC (2009a), ‘An analysis of life expectancy and economic production using expectile frontierzones’, Demographic Research, 21(5), 109–134. DOI: 10.4054/DemRes.2009.21.5.

Schnabel, SK & Eilers, PHC (2009b), ‘Optimal expectile smoothing’, Computational Statistics & Data Analysis, 53,4168–4177. DOI: 10.1016/j.csda.2009.05.002.

Schnabel, SK & Eilers, PHC (2013a), Expectile sheets for joint estimation of expectile curves. In Revision.

Schnabel, SK & Eilers, PHC (2013b), ‘Simultaneous estimation of quantile curves using quantile sheets’, AStAAdvances in Statistical Analysis, 97, 77–87. DOI: 10.1007/s10182-012-0198-1.

Schnabel, SK & Eilers, PHC (2009), ‘Optimal expectile smoothing’, Computational Statistics & Data Analysis, 53,4168–4177. DOI: 10.1016/j.csda.2009.05.002.

Shim, J, Hwang, C & Seok, KH (2009), ‘Non-crossing quantile regression via doubly penalized kernel machine’,Computational Statistics, 23, 83–94. DOI: 10.1007/s00180-008-0123-y.

Sobotka, F, Kneib, T, Schnabel, S & Eilers, P (2013), expectreg: Expectile regression. R package version 0.37.

Takeuchi, I, Le, QV, Sears, TD & Smola, AJ (2006), ‘Nonparametric quantile estimation’, Journal of Machine LearningResearch, 7, 1231–1264.

Van Buuren, S (2007), Worm plot in quantile regression: code and data [Internet]. http://www.stefvanbuuren.nl/wormplot/dutchdata.boys.sdd.txt.

Van Buuren, S & Fredriks, AM (2001), ‘Worm plot: a simple diagnostic device for modeling growth reference curves’,Statistics in Medicine, 20, 1259–1277. DOI: 10.1002/sim.746.

Yu, K & Jones, MC (1998), ‘Local linear quantile regression’, Journal of the American Statistical Association, 93(441), 228–237. DOI: 10.1080/01624159.1998.10474104.

Zhao, Q (2000), ‘Restricted regression quantiles’, Journal of Multivariate Analysis, 72, 78–99. DOI:10.1006/jmva.1999.1849.

Stat 2013; 2: 171–183 183 Copyright © 2013 John Wiley & Sons Ltd