22
Test (2010) 19: 558–579 DOI 10.1007/s11749-010-0186-2 ORIGINAL PAPER Small area estimators based on restricted mixed models Cristina Rueda · José A. Menéndez · Federico Gómez Received: 15 July 2008 / Accepted: 11 February 2010 / Published online: 6 March 2010 © Sociedad de Estadística e Investigación Operativa 2010 Abstract This paper proposes a new model-based approach to estimate small areas that extends the Fay–Herriot methodology. The new model is additive, with a ran- dom term to characterize the inter-area variability and a nonparametric mean function specification, defined using the information on an auxiliary variable. The most signif- icant advantage of the proposal is that it avoids the model misspecification problem. The monotonicity is the only assumption about the functional form of the relationship between the variable of interest and the auxiliary one. Estimators for the area means are derived combining “Order Restricted Inference” and standard mixed model ap- proaches. A large simulation experiment shows how the new approach outperforms the Fay–Herriot methodology in many scenarios. Besides, the new method is applied to the Australian farms data. Keywords James–Stein estimator · Small areas · Order restricted inference · Mixed models · EBLUP estimator · Fay–Herriot model Mathematics Subject Classification (2000) 62F30 · 62J07 1 Introduction Small area estimation is receiving a lot of attention due to its important applica- tions, particularly in official statistics. The EBLUP (empirical best linear unbiased predictor) estimator based on the Fay–Herriot (1979) model is one of the most pop- ular methodologies used by private and public agencies because of its flexibility in combining different sources of information and explaining different sources of errors. C. Rueda ( ) · J.A. Menéndez · F. Gómez Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, 47005 Valladolid, Spain e-mail: [email protected]

Small area estimators based on restricted mixed models

Embed Size (px)

Citation preview

Test (2010) 19: 558–579DOI 10.1007/s11749-010-0186-2

O R I G I NA L PA P E R

Small area estimators based on restricted mixed models

Cristina Rueda · José A. Menéndez ·Federico Gómez

Received: 15 July 2008 / Accepted: 11 February 2010 / Published online: 6 March 2010© Sociedad de Estadística e Investigación Operativa 2010

Abstract This paper proposes a new model-based approach to estimate small areasthat extends the Fay–Herriot methodology. The new model is additive, with a ran-dom term to characterize the inter-area variability and a nonparametric mean functionspecification, defined using the information on an auxiliary variable. The most signif-icant advantage of the proposal is that it avoids the model misspecification problem.The monotonicity is the only assumption about the functional form of the relationshipbetween the variable of interest and the auxiliary one. Estimators for the area meansare derived combining “Order Restricted Inference” and standard mixed model ap-proaches. A large simulation experiment shows how the new approach outperformsthe Fay–Herriot methodology in many scenarios. Besides, the new method is appliedto the Australian farms data.

Keywords James–Stein estimator · Small areas · Order restricted inference · Mixedmodels · EBLUP estimator · Fay–Herriot model

Mathematics Subject Classification (2000) 62F30 · 62J07

1 Introduction

Small area estimation is receiving a lot of attention due to its important applica-tions, particularly in official statistics. The EBLUP (empirical best linear unbiasedpredictor) estimator based on the Fay–Herriot (1979) model is one of the most pop-ular methodologies used by private and public agencies because of its flexibility incombining different sources of information and explaining different sources of errors.

C. Rueda (�) · J.A. Menéndez · F. GómezDepartamento de Estadística e Investigación Operativa, Universidad de Valladolid, 47005 Valladolid,Spaine-mail: [email protected]

Small area estimators based on restricted mixed models 559

Policy decisions, such as allocation of government funds across states or counties, areoften based on EBLUP estimates, and any contribution to refining the methodology isimportant in providing more precise estimators. An example of this is the Small AreaIncome and Poverty Estimates (SAIPE) program created by the U.S. Census Bureau,to provide more current estimates of poverty statistics than the most recent decen-nial census. The estimates are neither direct counts from administrative records nordirect estimates from sample surveys because they do not provide good estimatorsfor all counties. Instead, the estimators are developed using a relationship betweenthe direct estimates from the U.S. Current Population Survey (CPS) and administra-tive records data and age group poverty estimates from the previous decennial census(http://www.census.gov/hhes/www/saipe).

Typically, a Fay–Herriot model assumes a D × p matrix (D being the numberof areas) of scaled auxiliary variables x related to the D-dimensional parameter ofinterest Y by a linear model Yd = x′

dβ + ud , where β is the p-dimensional vector ofunknown regression coefficients that is estimated from the data, and ud is the modelerror. Moreover, it is also assumed that the direct estimates satisfy yd = Yd + ed,

where ed is the survey error. In the SAIPE program, D is the number of states orcounties, x is the administrative and census data in the log scale, and (yd,Yd) arethe CPS direct estimate poverty ratio and the true population poverty ratio for area d ,respectively.

In SAIPE and many other applications, one or two of the following shortcomingsoccur: first, the relationship between the response and the auxiliary variables cannotbe specified as linear a priori, and second, that outliers are likely to be present (Huangand Bell 2006; Chambers and Tzavidis 2006; Opsomer et al. 2008, among others).Under these circumstances, there could be an erroneous specification and estimationof the model as well as less precise small area estimators.

An interesting route to improve the predictive power of the small area estimatorsmight be to relax the linear assumption in the fixed effects, defining alternative andmore robust forms for the regression. In this paper, we propose an approach to smallarea estimation in this direction.

To motivate the definition of the proposed model, we will first comment on the roleof auxiliary variables in the small area estimation problem. A general assumption inthis context is that some kind of shrinkage would provide more precise estimators inthe presence of high sample variability. The simplest way of achieving this shrinkageis by using the James–Stein estimator based on shrinkage towards the mean, whichtherefore implies the use of the a priori guess of equality among the areas. This,mathematically, means that the shrinkage region is of dimension 1. In many appli-cations this seems to be a too strong assumption, and the availability of auxiliaryinformation is used to define other (higher-dimensional) shrinkage regions. The Fay–Herriot methodology uses auxiliary variables to define linear shrinkage subspaces ofdimension p. In this approach, an a priori linear relationship between x and Y is thusassumed.

A linear relation is a simplification of a monotone relationship, and this latterkind of relationship very often arises, or is easily assumed, in applications. The ideapresented in this paper is to use only the knowledge of the monotonicity to definethe shrinkage region. We do that assuming an ordering among the areas defined by

560 C. Rueda et al.

the auxiliary information. The corresponding shrinkage space is a cone in RD, whichwe use to define composite estimators. The mathematical formulation of this ideacan be done as a generalization of the Fay–Herriot model. As will be shown in thepaper using simulation results, it gives more precise estimators than the Fay–Herriotmethodology in a variety of scenarios.

The new model fits especially well in the case of having an ordinal variate asauxiliary information. An example of this situation could be the determination ofpoverty rankings (David 2003), where the auxiliary variate is the ranking obtainedfrom the principal component score from commune-level indicators.

The new approach is based on assuming a mixed model where the fixed part is anonlinear model for the monotone relationship of response with the auxiliary infor-mation. The model is estimated using Order Restricted Inference (ORI) methodology.In this first attempt we consider a basic area level model with one auxiliary variateto explain and show the advantages of the new methodology using a simple modeloften considered in practice. However, extensions of the simple model in many direc-tions are also admissible in this framework. These include models defined for otherassumptions about the shape of the relationship between the response and the auxil-iary information, such as unimodality, unit-level models, and multivariate auxiliaryvariables. To estimate the MSE of the ORI estimates, we propose to use a bootstrapapproach, similar to the one proposed by Chatterjee et al. (2008) and Hall and Maiti(2006). We illustrate the performance of the methodology in Sect. 4.2.

Nonlinear and nonparametric approaches are not as popular as they should be insmall area estimation, due in large part to the methodological difficulties. Two re-cent references are Chambers and Tzavidis (2006), where M-quantile regression issuggested in the presence of outliers, and Opsomer et al. (2008), where a penalizedspline regression mixed model is used. Also, an interesting recent contribution onnonlinear and nonparametric approaches is Ugarte et al. (2009a, 2009b). These alter-native approaches differ from ours in that the first one assumes a linear model for theconditional quantile Qq(X) = X′βq , avoiding problems associated with the specifi-cation of random effects. On the other hand, the curve-fitting approach of Opsomer etal. (2008) proposes an additive model, as in the Fay–Herriot, with a random effect tocharacterize area variability, but it proposes modeling the deviation from a parametricspecification of the mean function (X′β), adding a second random effect term to thefixed term. The approach is robust against the linearity assumption but does not avoidthe problem of model specification. Our proposal also assumes an additive modelwith a random area effect term, but the mean function is modeled in a pure nonpara-metric fashion, with only the assumption of monotonicity, so the approach is robustagainst the linearity assumption also avoiding the problem of the model specification.

The proposed approach is based on ORI methods. The most important referencesin the subject are the book by Robertson et al. (1988) and the book by Silvapulle andSen (2005). Basics results in ORI guarantees that the methodology proposed in thispaper will work to reduce the MSE of the direct estimator, although we do not knowa priori how it will perform in comparison with standard composite estimators.

The particular problem of estimation in constrained mixed models has not beenvery well studied in the literature. We can cite the paper by Pan and Khattree (1999),where the MLE for order restricted balanced mixed models are analyzed. However,

Small area estimators based on restricted mixed models 561

the estimation problem for order restricted unbalanced models has not been dealtwith in the literature. We consider this latter case in this paper, and we also introducealternatives to the MLE estimators for the variance components. Also, a first proposalfor using ORI methods in small area problems was made by Rueda and Menéndez(2005), where alternatives to James–Stein are given when no auxiliary information isprovided. We have included an appendix where some basic notation and results forrestricted normal models are commented on.

In Sect. 2, we define a general mixed model with a random effect for the smallareas and also, using Gaussian errors with known variances, derive maximum like-lihood estimators for the small area means. The estimator reduces to a compositeestimator as in the Fay–Herriot model. In this case, the synthetic part is an orderrestricted estimator. Different estimators for the variance of the random effect areproposed in Sect. 3, which are compared using theoretical and simulated results.The corresponding empirical maximum likelihood estimators are also defined in thissection. In Sect. 4, two numerical experiments are presented to show how the newmethod behaves when compared with the more traditional Fay–Herriot and James–Stein methods. In Sect. 5, the new approach is applied to data from Australian farms.Finally, the conclusions are addressed in Sect. 6.

2 Restricted area-level model

Let Yd = μd , d = 1, . . . ,D (D > 3), be the parameters of interest that we will as-sume, for simplicity, to be the area means. We have, for each area, a direct estimatoryd and the information on an auxiliary variable xd . Let us assume that the observa-tions are ordered according to x (xi = x(i)). Consider the general restricted mixedmodel that is given by the two-level model:

Level 1: Sampling model: yd |μd ∼ N(μd,σ 2d ), d = 1, . . . ,D.

Level 2: Linking model: μd ∼ N(θd, σ 2u ), d = 1, . . . ,D, θ = (θ1, . . . , θD) ∈

C(x).Level 2 links the true small area means μd to the auxiliary variable by using C(x),

a region in �D whose definition states the type of relationship between x and Y.An important particular case is the Fay–Herriot model, where C(x) = L(x) = {θ =α1 +βx} is a linear subspace generated by (1, x). Besides, the James–Stein estimatorarises whenever C(x) = L = {θ ∈ �D/θ1 = · · · = θD} (Rao 2003).

The particular case that we deal with in this paper is where C(x) is the cone thatdefines the order induced by x. As the areas are ordered according to x, we have thatC(x) = C = {θ ∈ �D/θ1 ≤ · · · ≤ θD}, which is usually known as the simple ordercone. This latter cone represents mathematically the a priori guess that Y increaseswith x and then that the auxiliary information is used to derive an order between thearea parameters: θ1 ≤ · · · ≤ θD .

The two-level model can be viewed as a mixed model:

yd = μd + ed = θd + ud + ed, d = 1, . . . ,D, θ ∈ C, (1)

where ed ’s and ud ’s are independent, ed ∼ N(0, σ 2d ), ud ∼ N(0, σ 2

u ), and yd ∼

N(θd, σ 2d + σ 2

u ).

562 C. Rueda et al.

The log-likelihood is given by

l = logL = −1

2D log(2π) − 1

2

D∑

d=1

log(σ 2

d + σ 2u

) − 1

2

D∑

d=1

(yd − θd)2

σ 2d + σ 2

u

.

Consider V = var(y) = diag(σ 21 + σ 2

u , . . . , σ 2D + σ 2

u ) and W = V−1. When matrixV is known, the MLE for θ is given by the projection of y onto C using the metricgiven by W (see Appendix A.1):

θC = arg minθ∈C

D∑

d=1

(yd − θd)2

σ 2d + σ 2

u

= Pw(y|C).

By using the conditional mean as the value that gives the minimum quadraticmean error in prediction, that is, E(ud |yd) = E(ud)+ Cov(ud, yd)(Var(yd))−1(yd −E(yd)) = σ 2

u

σ 2d +σ 2

u

(yd − θd), we get the random effect predictor

uCd = σ 2

u

σ 2d + σ 2

u

(yd − θC

d

),

and also, we have the estimators for the area means:

μCd =

(1 − σ 2

u

σ 2d + σ 2

u

)θCd + σ 2

u

σ 2d + σ 2

u

yd, d = 1, . . . ,D. (2)

In a similar way to the BLUP estimator (Rao 2003), μCd is expressed as a weighted

average of the direct estimator yd and the order restricted-synthetic estimator θCd ,

where σ 2u

σ 2d +σ 2

u

is called the shrinkage factor and measures the uncertainty in model-

ing θ ′ sd , namely the model variance σ 2

u relative to the total variance: σ 2d + σ 2

u .

3 Estimation under unknown random effect variance

Assume that σ 2u is unknown. To obtain the empirical MLE, we deal with two cases

that are usually considered in applications: σ 2d known and equal or possibly unequal.

3.1 CASE A: σ 2d known and equal, σ 2

u unknown

Let σ 2d = σ 2. Then W =( 1

σ 2+σ 2u)I, so that θC = arg minθ∈C

∑Dd=1 (yd − θd)2 =

P(y|C), the orthogonal projection of y onto C.We introduce several possible estimators for σ 2

u . Some properties of these estima-tors are analyzed in Proposition 1 below, which is proven in Appendix A.2.

The first candidate is the MLE of σ 2u , which is given by Pan and Khattree (1999).

It is known that this estimator is biased:

σ 2,MLEu =

∑Dd=1(yd − θC

d )2

D− σ 2.

Small area estimators based on restricted mixed models 563

Our second option is to consider

σ 2,Lu =

∑Dd=1(yd − θC

d )2

∑Dd=1 P(d)(D − d)

− σ 2,

where P(d) are known quantities, called level probabilities, that can be obtainedfrom closed formulas (see Robertson et al. 1988, Corollary B, p. 82) and are definedin Appendix A.1.

σ2,Lu arises as a natural unbiased estimator under the assumption θ ∈ L, as shown

in Proposition 1 (i) below. However, θ ∈ L is a strong assumption in many applica-tions. We next define a new estimator that is almost unbiased in a wide parameterspace.

For any B ⊂ {1, . . . ,D − 1}, consider the subspace LB = {v ∈ �D/vd = vd+1,

d ∈ B} and the region RB = {yε�D/θC = P(y|LB)}. Consider also the cone CB =LB ∩ C and the associated level probabilities P(d,CB). Whenever y ∈ RB , θC =P(y|LB), a new estimator is given by

σ 2,Cu =

∑Dd=1(yd − θC

d )2

D − dim(LB)− σ 2. (3)

From the definitions we see that the three estimators defined above are quite simi-lar, the only difference being the denominator in the first term. With the aid of Propo-sition 1, we will show that the three are negatively biased and that under generalassumptions the best choice, using the bias criterium, is σ 2C

u .

Proposition 1

(i) Eθ

2,Lu

) = σ 2u , θ ∈ L.

(ii) Eθ

2,Cu

) = σ 2u −

(1

D!)(

σ 2 + σ 2u

), θ ∈ L.

(iii) Eθ

2,Cu

∣∣y ∈ RB

) = σ 2u , θ ∈ LB, B ⊂ {1, . . . ,D − 1}.

(iv) σ2,MLEu ≤ σ

2,Mu , M = L,C.

(v) Eθ

2,Mu

) ≤ Eθ0

2,Mu

), θ0 ∈ L, θ ∈ C, M = MLE,L.

(vi) Limλ→∞Eλθ

2,Cu

) = σ 2u −

(1

(D − dim LB)!)(

σ 2 + σ 2u

),

θ ∈ LB, θd < θd+1 ∀d /∈ B, B ⊂ {1, . . . ,D − 1}.

(vii) Limλ→∞Eλθ

2,Lu

) =(∑D

d=dim LBP (d,CB)(D − d)

∑Dd=1 P(d)(D − d)

)σ 2

u ,

θ ∈ LB, θd < θd+1 ∀d /∈ B, B ⊂ {1, . . . ,D − 1}.

Proposition 1(iv) shows that both σ2,Lu and σ

2,Cu are greater than σ

2,MLEu , and (v)

proves that σ2,Lu is negative biased for other values of the parameters. Therefore, from

564 C. Rueda et al.

Proposition 1 (iv) and (v) we conclude that the bias term of σ2,MLEu is always greater

than that of σ2,Lu , and we therefore discard σ

2,MLEu in the following. On the other

hand, for θ ∈ L, Proposition 1(ii) provides the negative bias term for σ2,Cu , which

is negligible for moderate and high D, and for θ ∈ LB , Proposition 1(iii) shows thatσ

2,Cu is conditionally unbiased.

In most small area applications, D > 15 and D > 20 very often, which impliesthat

(σ 2,C

u

) � σ 2u , θ ∈ L.

Besides, from Proposition 1(vi) we conjecture that σ2,Cu is also negative biased

when θ ∈ C and also, from Proposition 1(vii), that for values of θ ∈ C far from L,σ

2,Cu is less biased than σ

2,Lu . For instance, consider λ > 0, θ ∈ LB , θd < θd+1, d /∈ B,

σ 2 = σ 2u , and D = 30. In this case, we have, from Proposition 1(vi) and (vii), that

limλ→0

Eλθ

(σ 2,C

u

) = σ 2u

(1 − 2

30!)

limλ→∞Eλθ

(σ 2,C

u

)

= σ 2u

(1 − 2

(30 − dimLB)!)

={= σ 2

u (1 − 227! ) ≥ ( 23.05

25.9 )σ 2u = limλ→∞ Eλθ (σ

2,Lu ), dim LB = 3,

= σ 2u (1 − 2

20! ) ≥ ( 16.4025.9 )σ 2

u = limλ→∞ Eλθ (σ2,Lu ), dim LB = 10.

The equalities and inequalities above show that along the line λθ, λ > 0, the biasof σ

2,Cu is negligible if dim LB is not too high. However, the bias of σ

2,Lu increases a

lot with λ and for moderate values of dim LB.

In real applications, negative values of the estimators defined above are possible.In order to get coherent results, we consider the positive versions that we define asfollows:

σ 2,Mu = max

(σ 2,M

u ,0), M = L,C. (4)

From this definition we have that Eθ (σ2,Mu ) ≥ Eθ(σ

2,Mu ), M = L,C, so the nega-

tive bias of σ2,Mu is partly corrected.

In Sect. 4, both estimators σ2,Lu and σ

2,Cu are compared, using the MSE, with that

obtained from the Fay–Herriot model, σ2,FHu , the latter being better for θ∈ L(x), but

the former better for θ ∈ C − L(x).Between the two estimators, σ

2,Cu gives the best results in simulations and is com-

putationally as simple as σ2,Lu . Moreover, as we have commented above, σ

2,Cu is less

biased, a fact that is also shown with the simulations results. We select this estimatorfor the rest of the presentation.

Although it is not shown in the paper so as to simplify the results in Sect. 4, wewould like to note that the MSE values obtained in the simulations for the discardedestimator based on the MLE are greater than those for the selected estimators in everyconsidered scenario.

Small area estimators based on restricted mixed models 565

In particular, using σ2,Cu to estimate σ 2

u , the empirical predictor of μd is definedas follows:

μCd =

(1 − σ

2,Cu

σ 2 + σ2,Cu

)θCd + σ

2,Cu

σ 2 + σ2,Cu

yd, d = 1, . . . ,D. (5)

Next, we state in Proposition 2 some theoretical findings about the behavior of theestimators of μ. These results show that when the ORI model is correct but not theFay–Harriot model, i.e., θ /∈ L(x) and θ ∈ LB ∩C, and ‖θ‖ increases, the Fay–Herriotestimator becomes the direct estimator. Therefore, the misspecification of the modelresults in a total loss of efficiency of the estimator. However, the behavior exhibited bythe ORI estimator is different, even at the limit. The results in Proposition 2 support,in part, the numerical findings exhibited in the next section.

Proposition 2 Let θ0 ∈ LB , θ0d < θ0d+1 ∀d /∈ B , B ⊂ {1, . . . ,D − 1}, λ > 0, andθ = λθ0.

(i) y − μFH a.s→λ→∞ 0;

(ii) μC − μCBa.s.→

λ→∞ 0;

(iii) y − μC�

λ→∞ 0.

3.2 CASE B: σ 2d known but possibly unequal, σ 2

u unknown

In this case, θC depends on σ 2u because W have unequal diagonal elements.

We propose an iterative procedure to obtain σ 2u and θC . The procedure is similar

to the one proposed by Fay and Herriot (1979) and is based on the next equality thatfollows from (6) in Appendix A.1:

[h(σ 2

u

)∣∣y ∈ RB

] = D − dim LB, θ ∈ LB,

where h(σ 2u ) = ‖y − Pw(y|C)‖2

w.

θC = Pw(y|C) and σ 2u are obtained by solving the equation h(σ 2

u ) = D − dim LB

iteratively and letting σ 2u = 0 when no positive solution exists. The procedure in-

cludes a control variable co to assure that dim LB does not vary in the next iterationonce dim LB has attained the same value in two successive iterations.

1. Start with σ2(0)u = 0, co = 0, and dim L(0)

B = D.

2. Obtain W(i) = diag((σ2(i)u + σ 2

1 )−1, . . . , (σ2(i)u + σ 2

D)−1), PW(i) (y|C), and

dim L(i)B , where L(i)

B satisfy PW(i) (y|C) = PW(i) (y|L(i)B ).

3. If dim L(i)B = dim L(i−1)

B = D, or c0 > 0 and dim L(i)B �= dim L(i−1)

B , make

σ 2(i)u = σ 2(i−1)

u .

566 C. Rueda et al.

If co = 0 or dim L(i)B = dim L(i−1)

B < D, define

σ 2(i)u = max

(0, σ 2(i−1)

u + D − dim L(i)B − h(σ

2(i−2)u )

h′∗(σ2(i−1)u )

),

where h′∗(σ 2u ) = −∑D

d=1(yd−PW (y|C)d )2

(σ 2d +σ 2

u )2 approximates the derivative of h(σ 2u ).

4. If dim L(i)B = dim L(i−1)

B , then co = co + 1.

5. Repeat Steps 2 to 4 until convergence.

Convergence is rapid, in most cases requiring less than 10 iterations. However,in some cases, in simulations, the procedure does not converge, and then the equal-ity h(σ 2

u ) = D − dim LB is not achieved. Even in these cases, the procedure gives

admissible σ2(10)u values.

Simulation experiments show that the performance of the estimator, obtained fromthe iterative procedure described above, has a similar behavior, in terms of bias andMSE, to that of σ

2,Cu for the equal sample variances case introduced in Sect. 3.1.

Using h(σ 2u ) = ∑D

d=1 P(d)(D − d) instead of h(σ 2u ) = D − dim LB , a similar

iterative procedure could be defined to obtain σ2,Lu . Although convergence is attained

in all cases, both estimators σ2,Lu and μL have a greater MSE than σ

2,Cu and μC in

most simulated scenarios, and we use σ2,Cu to obtain the empirical predictor for the

area means. As in the equal variances case, the predictor is given by

μdC =

(1 − σ

2,Cu

σ 2d + σ

2,Cu

)θd

C + σ2,Cu

σ 2d + σ

2,Cu

yd, d = 1, . . . ,D.

4 Results of simulation studies

To evaluate the behavior of the new estimators, we have conducted two simulationexperiments. In the first one, we generate an artificial auxiliary variable, representinga wide range of scenarios with different parameter values. In the second experiment,we have used as a base for the simulation the well-known baseball data set. Moreover,in this latter case, we have used bootstrap to illustrate the estimation of the MSPE ofthe ORI estimators.

4.1 Simulation experiments with an artificial auxiliary variable

Let us assume that the information on an auxiliary variable x = (x1, . . . , xD) is avail-able. In order to simplify the experiment, we get (x1, . . . , xD) = (1, . . . ,D), for whichC = {θ ∈ �D/θ1 ≤ · · · ≤ θD} and L(x) = {θ ∈ �D/θd − θd−1 = θd+1 − θd}. We sim-ulate the model yd = μd + ed = θd +ud + ed, d = 1, . . . ,D, where e = (e1, . . . , eD)

and u = (u1, . . . , uD) are independent, N(0,diag(σ 21 , . . . , σ 2

D)) and N(0,σ 2u I), re-

spectively. We consider the equal and unequal sample variances cases. In the former,we assume σ 2

d = σ 2 = 1, and, in the latter, values with mean equal 1 and ranging

Small area estimators based on restricted mixed models 567

Fig. 1 θ0 ∈ L(x) represents an increasing linear relationship; θc1 to θc5 (∈ C−L(x)) represent increasingrelationships different from the linear one; θx1 to θx4 (/∈ C) represent different nonincreasing relationshipsbetween the areas

from 2.2 to 0.3 have been selected for σ 2d . We simulate the model under two different

σ 2u values, (1,0.25).

We also select different standardized θ values verifying: θ ∈ L(x) (the Fay–Herriotmodel is true), θ ∈ C− L(x) (the mixed order model is true, but the Fay–Herriotmodel is not true), and θ /∈ C (neither the Fay–Herriot nor the mixed order model istrue).

Figure 1 shows the selected θ values and the type of relationships between re-sponse and auxiliary that the values represent: θ0 ∈ L(x) represents an increasing lin-ear relationship, θc1 to θc5(∈ C − L(x)) represent different increasing relationshipsfrom the linear one, see the upper graph in Fig. 1, and θX1 to θX4(/∈ C) representdifferent nonincreasing relationships, as shown in the bottom graph in Fig. 1.

In all of these scenarios, we simulate 11 different models increasing ‖θ‖ from 1to 50. Let

∑Dd=1(θd − θ)2/D be the measure of the inter-area variation. Then, sce-

narios with high values of ‖θ‖ correspond, in our setup, to high inter-area variationand represent situations where the area means are far from the total mean. In thesecases, the synthetic estimator, the sample mean, is expected to perform badly as anestimator for the individual areas, as it is also expected that the James–Stein approach

568 C. Rueda et al.

Table 1 Bias and MSE for σ 2Lu , σ 2C

u , and σ 2FHu . σ 2

d= σ 2

u = 1

θ ‖θ‖ BIAS MSE

σ 2Lu σ 2C

u σ 2 FHu σ 2L

u σ 2Cu σ 2 FH

u

θ0 5 −0.415 −0.195 −0.023 0.395 0.325 0.278

θ0 25 −0.910 −0.466 −0.023 0.867 0.493 0.278

θ0 50 −0.993 −0.645 −0.023 0.988 0.665 0.278

θc1 5 −0.416 −0.193 −0.001 0.394 0.325 0.289

θc1 25 −0.912 −0.470 0.546 0.870 0.488 0.757

θc1 50 −0.992 −0.610 2.250 0.987 0.668 5.998

θc2 5 −0.402 −0.182 0.067 0.386 0.324 0.315

θc2 25 −0.885 −0.414 2.187 0.834 0.466 5.728

θc2 50 −0.982 −0.526 8.804 0.970 0.617 80.360

θc3 5 −0.340 −0.145 0.363 0.352 0.312 0.520

θc3 25 −0.735 −0.276 9.365 0.651 0.397 90.570

θc3 50 −0.887 −0.347 37.405 0.837 0.470 1409.670

θc4 5 −0.321 −0.142 0.051 0.355 0.328 0.300

θc4 25 −0.685 −0.240 2.004 0.610 0.404 4.860

θc4 50 −0.831 −0.268 8.175 0.771 0.458 69.420

θc5 5 −0.288 −0.102 0.141 0.336 0.319 0.330

θc5 25 −0.355 −0.030 4.054 0.362 0.343 17.780

θc5 50 −0.355 −0.030 16.275 0.363 0.344 269.590

will not be a good choice. These scenarios represent the situations where more so-phisticated estimation methods, which use auxiliary information, should be used andshould work.

To summarize, we have performed a total of 440 experiments (with 1000 repeti-tions each) to represent a wide range of real and extreme situations.

We first compare, in Table 1, the MSE of the new estimators σ2,Lu and σ

2,Cu with

that of the σ2,FHu obtained using the procedure in Fay and Herriot (1979). The MSE

of σ2,FHu is clearly smaller only for θ0. Besides, σ

2,FHu performs very badly when

θ /∈ L(x) and ‖θ‖ > 5. Moreover, σ2,Cu has a smaller bias and MSE than σ

2,Cu in all

scenarios.To compare the performance of estimators of the small area means, we have used

the mean squared prediction error (MSPE), E[∑Dd=1(μd − μd)2].

In Fig. 2, the MSPE of the new estimator μC is compared with that of the Fay–Herriot μFH and the direct estimator, y, in the unequal variances scenario. The figurefor the equal variances case is similar, so it is not included. The MSPE for the directestimator is equal to 30 and estimated as 30.50. The 220 points in the graph representthe different parameter configurations considered. Points in the upper triangle (ingrey) correspond to experiments for which the MSPE for μC is smaller than μFH,and points in the lower triangle represent experiments where the opposite occurs. It

Small area estimators based on restricted mixed models 569

Fig. 2 The unequal variances scenarios. The MSPE of μFH vs. the new estimator μC . Larger pointscorrespond to σ 2

u = 0.25, and the smaller to σ 2u = 1

is clear from the figure that, in most cases, the new estimator has a better behavior.The experiments where the Fay–Herriot estimate is clearly superior correspond toparameter combinations verifying θ ∈ L(x), θ0 in the figure, represented by the pointsdrawing horizontal lines in the lower triangle, the upper one for scenarios with σ 2

u =0.25, and the lowest for σ 2

u = 1. Scenarios corresponding to θ /∈ L(x), ‖θ‖ < 10, arerepresented by points close to the diagonal and to the bottom left corner. In thesescenarios, both estimators, μC and μFH, give smaller MSPE values as compared tothe MSPE of the direct estimator. Other scenarios where the Fay–Herriot model isnot true but the restricted model is true, θ ∈ C − L(x), ‖θ‖ ≥ 10, are represented bypoints in the upper triangle far from the upper right corner and are scenarios whereμC clearly gives smaller MSPE values than μFH. In particular, Proposition 2 explainsthe simulation findings for the configuration θc5 , and more approximately for otherconfigurations such as θc3 , θc4 , and θc2 . Finally, points in the upper triangle but closeto the upper right corner represent scenarios where θ /∈ C; in these cases, the newestimator also gives smaller MSPE values.

570 C. Rueda et al.

Table 2 Baseball example: MSE for estimators with different methods

Method Model MSE Observed data MSE

Order (C) 9.60 5.81

Fay–Herriot (FH) 8.64 5.44

James–Stein (JS) 8.53 4.45

Direct 17.91 13.71

4.2 Simulation experiment based on baseball data

We now revisit the baseball data example given in Efron and Morris (1975) and usedby many other authors.

For the player d (d = 1, . . . ,18), let pd and πd be the batting average for thefirst 45 “at-bat” and the true season batting average of the 1970 season, respec-tively. Consider, also used by other authors, the arc-sine transformation: yd =√

n arcsin(2pd − 1), μd = √n arcsin(2πd − 1). We use as an auxiliary information,

x, the previous “at-bat,” and consider the players ordered using x.Assume the following model: yd ∼ N(μd,1), μd ∼ N(θd, σ 2

u ), θd = f (x).In order to find a plausible θd , we fit a polynomial regression of μ against x.

σ 2u = 0.5 is also obtained from this fitted model.

We have simulated the above model (k = 100 iterations).The results of the MSPE for the different estimators of μ appear in Table 2. Figures

in the first column show that the James–Stein estimator, μJS, is the best and alsothat the auxiliary information based estimators (AIBE), μC and μFH, perform verywell, compared to the MSPE values with those of the direct estimator. This is due tothe relatively low inter-area variance. The second column of Table 2 compares theestimators using only the observed y to estimate μ, the arcsin transformation of thetrue season batting average of 1970.

The good behavior of the James–Stein estimators shown in Table 2 implies thatthe auxiliary information is not useful in this particular case.

Figure 3 displays the MSPE of the new estimator μC along with that of μFH, μJS

and the direct estimator y for different scenarios, using the model above for differentvalues of θ = λθ0, log(λ) ranging from −1 to 4. From Fig. 3 we conclude that whenthe inter-area variance increases, μJS is no longer the best estimator and that in thesecases, the new estimator, μC , is the one with the best behavior and the estimator thatuses the auxiliary information more efficiently.

Next, we illustrate the use of bootstrap to estimate the MSPE of the ORI, FH,and JS estimators. We follow a similar parametric bootstrap approach as the one inChatterjee et al. (2008) and Hall and Maiti (2006).

Consider θC = P(y/C) and calculate σ2,Cu using the definition in (3) and (4) with

σ 2 = 1. A bootstrap sample is given by y∗d = θC

d + u∗d + e∗

d , where u∗d and e∗

d are

the values generated from independent N(0, σ2,Cu ) and N(0,1), respectively. Now,

obtaining θC∗ and σ2,C∗u as before, but applied to y∗, we get the bootstrap empirical

predictor μC∗ using (5). By selecting B bootstrap samples, the bootstrap estimator

for the MSPE in the area d is defined by MSPEC

d,B = 1B

∑Bb=1(μ

C∗d,b − θC

d − u∗d).

Small area estimators based on restricted mixed models 571

Fig. 3 Baseball example: MSPE of the predictors μFH, μJS, μC and y

Fig. 4 Baseball example: Bootstrap estimation, MSPEC

d,B , for the ORI approach and the 18 players

Also, we define MSPEC

B = ∑18d=1

MSPEC

d,B . In a similar way, MSPEFHB (MSPE

JSB )

could be obtained using the corresponding FH (JS) estimators for θ and μ.

In Fig. 4, we have included the bootstrap estimates, MSPEC

d,B , for each player d

obtained from the real sample with the ORI approach.

572 C. Rueda et al.

Table 3 Baseball example: Mean values of bootstrap estimates, MSPEB , from 100 simulated samplesunder different values of ‖θ‖

Method log(‖θ‖)2 3 4 5 6

Order (C) 8.47 8.66 9.74 10.20 11.87

Fay–Herriot (FH) 7.06 7.50 10.88 16.05 17.65

James–Stein (JS) 7.10 8.56 13.71 17.14 17.84

Direct 17.99 18.00 17.99 18.00 17.97

In order to validate the bootstrap estimation, we have included a summary, inTable 3, of the results from 100 simulated samples, B = 1000, and different valuesof ‖θ‖. The mean values of MSPEB for the three estimators replicate the valuesin Fig. 3, demonstrating the good performance of the proposed bootstrap approach,which could be adopted to select the best estimator in a given scenario and to estimatethe MSPEs.

5 Analyzing real data: Australian farms

The data for 1652 farms comes from the Agriculture and Grazing Industries Surveys(AAGIS). This data has been analyzed by several authors, Chambers and Tzavidis(2006), among others. The small areas are the within-state regions with D = 29.

The response variable is the logarithm of the total cash costs (y), and the fourauxiliaries are the logarithm of the total area (x1), the logarithm of its cultivated area(x2), the number of beef cattle (x3), and the number of sheep (x4). The AAGIS datawere used to generate a population of 33040 farms reproducing 20 times the originalsample. From the global population we have extracted 100 independently selectedsamples in such a way that each farm has a probability of 0.05 to be in each sample,which gives samples with mean size equal to the original sample. We assume thatσ 2

d = σ 2/nd , where nd is the region sample size, and σ 2 = 0.41 is the estimate ofthe residual variance in a linear mixed model adjusted to the population data with thefour auxiliaries as fixed effects and a random effect for the regions.

We have considered six different models in which auxiliary information is used toobtain the estimators. For each model, we have calculated the MSE of the estimatorsusing the ORI and the Fay–Herriot approaches.

Models m1 to m4 correspond to the use of {x1} to {x4}, respectively, m5 is themodel using the auxiliaries {x1,x4}, and m6 is the model including {x1,x2,x3,x4}.In the last two models, the cones associated with the linear regression hyperplanes,estimated with the Fay–Herriot approach, are used to derive the estimators under theORI approach.

The different models represent a variety of relationships between the response andthe auxiliary information giving us different scenarios to test the new approach.

The multiple weighted correlation coefficients ρω between the response and theauxiliary variables (the total number of farms per region being the weight variable)

Small area estimators based on restricted mixed models 573

Fig. 5 Australian farms: Relative MSE for model m6 under Fay–Herriot and ORI. The order of the regionsis inverse to the MSE of the direct estimator

and the average MSE values over the 100 samples for the six models appear in Ta-ble 4. The MSE values for m1 to m5 are smaller using the ORI approach, and theMSE of m6 is smaller for the Fay–Herriot approach. The ORI approach performsbetter than Fay–Herriot in those situations where the relationship between the re-sponse and the auxiliaries is not close to being linear, which occurs with models m1to m5. These conclusions are in agreement with the results in the simulation study inSect. 4.

In order to show the performance of the estimators at the region level, in Fig. 5, wehave represented the RMSE, the MSE relative to the MSE of the direct estimator, forthe 29 regions, and model m6 using Fay–Herriot and ORI approaches. The RMSE issmaller than 1 in 28 out of 29 regions with both approaches, the maximum RMSE be-ing smaller with the ORI approach. The Fay–Herriot approach gives a smaller RMSEthan the ORI in 18 regions, and the opposite occurs in 11 regions. Coefficients ofvariation and bias estimators also have been calculated at the region level and forother models. Conclusions are according to those derived from Fig. 5 and Table 4.

574 C. Rueda et al.

Table 4 Australian farms:Weighted correlationcoefficients and mean MSE forsmall area estimators underFay–Herriot and ORIapproaches

Model ρw MSE MSE

Fay–Herriot ORI

m1 0.81 0.5056 0.4867

m2 0.18 0.6462 0.6009

m3 0.29 0.6343 0.6171

m4 0.06 0.6844 0.6236

m5 0.84 0.5130 0.4959

m6 0.95 0.4209 0.4486

6 Conclusions

In this paper an extension of the Fay–Herriot model to estimate small area parametersis presented. The model formulation is simple, and the estimates of the parametersinvolved are obtained using simple iterative methods based on the normal likelihood.The new approach is based on assuming other monotone relationships, different fromthe linear one, between the response and the auxiliary variables. In this sense it couldbe considered as a robust approach. The result of an extensive simulation experimenthighlights the improvement using the new methodology in most scenarios. In par-ticular, in scenarios with high inter-area variation, when auxiliary information basedmethods, AIBE, should be preferred against the more simple approach of James–Stein, which does not use the auxiliary information.

The Fay–Herriot estimator is only preferred to the one derived with the new ORIapproach in two sorts of scenarios. The first one is when an “exact” Fay–Herriotstructure is behind the data. This is a model with more mathematical than practicalrelevance. In applications it would be reasonable to assume this model only if the cor-relation is high and the dispersion diagram does not show deviations from linearity.The example in Sect. 5 illustrates this fact, showing that only in the case with a highcorrelation and a good fit of the linearity does the Fay–Herriot estimator give betterMSE values. The other type of scenarios is characterized by small inter-area variation.From the simulations we know that in these latter cases the behavior of the estima-tor under both the Fay–Herriot and the new ORI approach greatly improves the directestimator, giving substantially smaller MSPE. Even the classical James–Stein estima-tor has a very good behavior in these scenarios. Results of the simulation experimentbased on baseball data give an scenario within this framework. In this case the bestchoice is neither the estimator obtained from the new approach nor the Fay–Herriotone, but the James–Stein. The artificial modification of the baseball data, in order togive scenarios with higher inter-area variation, shows that the new ORI methodologyuses the auxiliary information more efficiently than the Fay–Herriot approach.

We conclude that, in applications characterized by a moderate or high inter-areavariation, such as the SAIPE program, where an exact linear relationship among re-sponse and explicative is not clear, where an AIBE, such as the Fay–Herriot, is pre-ferred to simpler methods like James–Stein, the alternative of using the new ORIapproach should be seriously considered because more precise estimators are likelyto be obtained, as simulations results show.

Small area estimators based on restricted mixed models 575

Three important questions remain open: the first, the extension of the ORI method-ology to the case of several auxiliary variables; the second, the extension to unit levelmodels; and the third, the validation, in the general model, of different approaches tothe estimation of standard errors and the development of confidence intervals. De-tailed research into these questions will be part of our future work. However, toadvance the good perspectives for the ORI methodology as a general approach toestimation in small area problems, we now include, a short discussion on the threetopics.

For the first question, a simple proposal is to generate a new summary variable us-ing the auxiliary information (like the first principal component or a linear regression)which can be used to define the simple order cone in the ORI model. This alternativeis considered in the example in Sect. 5. More sophisticated procedures could also bedeveloped to define other regions, not necessarily a simple order cone, in RD , thatbetter reflect the relation between parameters and auxiliary information. For instance,the intersection cone given by the order cones associated with two or more auxiliaryvariables or the region defined by the sum of order cones as in Mammen and Yu(2007). The generalization of the procedure is possible using the corresponding al-gorithms from ORI and similar models and estimators such as those presented in thispaper.

On the other hand, an isotonic unit level model could be defined in terms of amonotone function f (x) as follows: ydj = f (xdj )+ud +edj . Under this model, if the

population size is large, the area means are given by Yd = 1Nd

∑Nd

j=1 f (xdj ) + ud . Anestimate for f (x) can be easily derived from the sample isotonic regression estimatorusing the unit observations, and ui can be predicted as in the area level framework.

Finally, for the third question, we propose to consider the bootstrap scheme usedin Sect. 4.2 in the general setting and alternatives, to estimate standard errors of dif-ferent parameter functions and to derive confidence intervals. Also, naive predictionintervals based on known analytical expressions for the MSPE for the EBLUP, butcentered on the ORI estimator, would be considered. The structure of the modelas a mixed model permits a similar treatment of the problem in this more gen-eral framework than in the standard linear mixed model (Hall and Maiti 2006;Chatterjee et al. 2008; González-Manteiga et al. 2008). Besides, the similarities inthe model structure and in the optimization problem in the estimation step betweenthe ORI and the linear mixed model approaches equally facilitate the incorporationof other extensions initially designed for the latter models, such as the inclusion ofbenchmarking restrictions (Ugarte et al. 2009a, 2009b).

Acknowledgements We would like thank the referees for their careful reading, comments, and sug-gestions, which have contributed to improving the presentation of the paper. We are grateful to professorIsabel Molina for providing us with Australian farms data. This research was partially supported by Span-ish DGES (grant MTM 2009-11161).

Appendix

A.1 The projection of a normal vector onto a simple order cone

Consider a vector y ∈ �D and the simple order cone C = {v ∈ �D/v1 ≤ · · · ≤ vD}.

576 C. Rueda et al.

We show here some interesting properties of the projection of a vector onto C andthe distribution of the corresponding norm for a normal distribution. See Silvapulleand Sen (2005) and Robertson et al. (1988) for a complete development of these andother questions in order restricted inference.

Consider the metric given by the scalar product (u,v) =(u′Wv), where W =diag(w1, . . . ,wD) is a matrix of weights, and the corresponding norm, ‖v‖W =(v′Wv)1/2. For each B ⊂ {1, . . . ,D − 1}, consider the subspace LB = {v ∈ �D/vd =vd+1, d ∈ B}; the cones CB = LB ∩ C are called faces of C. These faces aresimple order cones themselves, where dim(CB) = dim(LB) = D − card(B). Theleast-dimensional face is called the linearity of the cone and is given by the one-dimensional subspace L = {v ∈ �D/vd = · · · = vD}.

The projection of y onto C is Pw(y|C) = arg minv∈C∑D

d=1 wd(yd − vd)2.A unique solution always exists. It is of interest to note that ∀y ∈ �D, ∃B ⊂ {1, . . . ,

D − 1} such that Pw(y|C) = Pw(y|LB) = Pw(y|CB).Let y ∼ N(θ,W−1), θ ∈ L. The distribution of ‖y−Pw(y|C)‖2

W is called Chi-Bar-Squared distribution, a basic distribution in order restricted inference that is stated inthe following paragraph.

For B ⊂ {1, . . . ,D − 1}, let RB = {y ∈ �D/Pw(y|C) = Pw(y|CB) = Pw(y|LB)}.The regions {RB}B are a partition of �D . A well-known result (Lemma 3.13.3 inSilvapulle and Sen 2005) states that

(∥∥y − Pw(y|C)∥∥2

W

∣∣y ∈ RB

) ∼ χ2D−d , where d = dim(LB). (6)

Now, conditioning on every R and using (6), we have that

prθ{∥∥y − Pw(y|C)

∥∥2W ≤ t

} =D∑

d=1

P(d)pr(χ2

D−d ≤ t), (7)

where P(d) = ∑B:dim(LB)=d prθ (y ∈ RB). P(d) depends on C and W.

In practical applications we need to know P(d), which are called level probabili-ties. In the simple case of W = I, these probabilities are tabulated (see Robertson etal. 1988). In general cases numerical procedures can be easily designed to obtain thecorresponding values.

A.2 Proof of Proposition 1

Proof (i), (ii), and (iii) From (6) the distribution of∑D

d=1(yd−θC

d )2

σ 2+σ 2u

, when θ ∈ L,

conditional on RB , is a χ2D−d , where d = dim(LB). The same result is true when

θ ∈ LB. As a consequence of this property, ∀B ⊂ {1, . . . ,D − 1},∀ θ ∈ L ⊂ LB, wehave that

(σ 2,L

u

) = E

(Eθ

( ∑Dd=1(yd − θC

d )2

∑Dd=1 P(d)(D − d)

∣∣∣∣∣y ∈ RB

))− σ 2

Small area estimators based on restricted mixed models 577

=(

σ 2 + σ 2u∑D

d=1 P(d)(D − d)

)Eθ

(D∑

d=1

P(d)χ2D−d

)− σ 2 = σ 2

u ,

and (i) follows. Using similar arguments, Eθ(σ2,Cu |y ∈ RB) = σ 2

u , and (ii) follows.Also,

(σ 2,C

u

) = E

(Eθ

(∑Dd=1(yd − p(y/LB)d)2

(D − dim(LB))|y ∈ RB

))− σ 2

= (σ 2 + σ 2

u

)D−1∑

d=1

P(d) − σ 2 = σ 2u −

(1

D!)(

σ 2 + σ 2u

), (8)

where the last equality follows from Robertson et al. (1988, Corollary A, p. 81) as∑D−1d=1 P(d) = 1 − p(D) = 1 − 1

D! , and (iii) follows.

Part (iv) is straightforward because∑D

d=1 P(d)(D − d) ≤ D, since∑Dd=1 P(d) = 1.From Robertson et al. (1988, p. 102) we know that, ∀ θ ∈ C,

∥∥y + θ − P(y + θ |C)∥∥2 ≤ ∥∥y − P(y|C)

∥∥2.

Then, ∀θ ∈ C, θ0 ∈ L,

prθ[∥∥y − P(y|C)

∥∥2 ≥ c] = prθ0

[∥∥y + θ − P(y + θ |C)∥∥2 ≥ c

]

≤ prθ0

[∥∥y − P(y|C)∥∥2 ≥ c

] = prθ0

[∥∥y − P(y|C)∥∥2 ≥ c

],

where the last equality follows from (7).Now,

(∥∥y − P(y|C)∥∥2)

=∫

prθ[∥∥y − P(y|C)

∥∥2 ≥ c]dc

≤∫

prθ0

[∥∥y − P(y|C)∥∥2 ≥ c

]dc = Eθ0

(∥∥y − P(y|C)∥∥2)

, (9)

and the result (v) follows from this inequality.(vi) and (vii). Let B ⊂ {1, . . . ,D − 1}, θ ∈ LB , θd < θd+1, d /∈ B, λ > 0. Then,

P(y|C) − P(y|CB)a.s.→

λ→∞ 0, and(10)∥∥y − P(y|C)

∥∥2 d→λ→∞

∥∥y − P(y|CB)∥∥2

.

578 C. Rueda et al.

Using (7), similar arguments as in (8), and denoting by P(d,CB) the level proba-bilities corresponding to the cone CB , we have that

Limλ→∞Eλθ

(σ 2,C

u

) = Eλθ

(σ 2,CB

u

) = (σ 2 + σ 2

u

) dimCB∑

d=1

P(d,CB) − σ 2

= σ 2u −

(1

(D − dim CB)!)(

σ 2 + σ 2u

)

= σ 2u −

(1

(D − dim LB)!)(

σ 2 + σ 2u

)

and also that

Limλ→∞Eλθ

(σ 2,L

u

) =(∑dimCB

d=1 P(d,CB)(D − d)∑D

d=1 P(d)(D − d)

)σ 2

u ,

and (vi) and (vii) follow. �

A.3 Proof of Proposition 2

Proof It is straightforward to write the estimators as:

μFH = wF P(y|L(x)

) + (1 − wF )y, where wF = σ 2(D − 2)

‖y − P(y|L(x))‖2;

μC = wCP(y|C) + (1 − wC)y, where wC = σ 2(D − dim LB)

‖y − P(y|C)‖2, for y ∈ RB;

μCB = wCBP (y|CB) + (1 − wCB

)y, where wCB= σ 2(D − dim LB∗)

‖y − P(y|CB)‖2,

for y ∈ RB∗.

For any fixed B ⊂ {1, . . . ,D − 1}, LB∗ (LB∗ ⊇ LB) is the subspace on whichP(y|CB) would be attained. The weights wF , wC , and wCB

depend on λ and y.Now, let z = y − θ , which does not depend on λ. Then, as θ = λθ0, we can write

y − μFH = wF P(y|L(x)⊥

) = wF

[P

(z|L(x)⊥

) + λP(θ0|L(x)⊥

)],

wF = σ 2(D − 2)

‖P(z|L(x)⊥) + λP (θ0|L(x)⊥)‖2

a.s.→λ→∞ 0,

and also λwFa.s.→

λ→∞ 0, and (i) follows.

Also, as θ = λθ0 and B∗ ⊆ B , it follows that θ ∈ LB ⊆ LB∗, and we have thefollowing:

P(z|CB) = P(y − θ |CB) = P(y − θ |LB∗) = P(y|LB∗) − θ = P(y|CB) − θ. (11)

Small area estimators based on restricted mixed models 579

Now, from (11) we have that

wCB= σ 2(D − dim LB∗)

‖y − P(y|CB)‖2= σ 2(D − dim LB∗)

‖z − P(z|CB)‖2(12)

in such a way that wCBis positive and does not depend on λ. Also, from (10) and

(11) we have

y − P(y|C) = z − P(y|C) + P(y|CB) − P(z|CB)a.s.→

λ→∞ z − P(z|CB). (13)

From (13) and (12) we conclude that

wCa.s.→

λ→∞ wCB, (14)

and (ii) follows from this last statement, (11), and (13), since

μC − μCB = wCB

(y − P(y|CB)

) − wC

(y − P(y|C)

) a.s.→λ→∞ 0.

Finally, (iii) follows from (13) and (14) because

y − μC = wC

(y − P(y|C)

) a.s.→λ→∞ wCB

(z − P(z|CB)

). �

References

Chambers R, Tzavidis N (2006) M-quantile models for small area estimation. Biometrika 93(2):255–268Chatterjee S, Lahiri P, Li H (2008) Parametric bootstrap approximation to the distribution of EBLUP and

related prediction intervals in linear mixed models. Ann Stat 36(3):1221–1245David B (2003) Choosing a method for poverty mapping. Food and Agriculture Organization of the United

Nations, RomeEfron B, Morris CN (1975) Data analysis using Stein’s estimator and its generalizations. J Am Stat Assoc

70:311–319Fay RE, Herriot RA (1979) Estimates of income for small places: an application of James–Stein procedures

to census data. J Am Stat Assoc 74:341–353González-Manteiga W, Lombardía MJ, Molina I, Morales D, Santamaría L (2008) Bootstrap mean squared

error of a small-area EBLUP. J Stat Comput Simul 78(5):443–462Hall P, Maiti T (2006) On parametric bootstrap methods for small area prediction. J R Stat Soc, Ser B

68(2):221–238Huang ET, Bell W (2006) Using the t -distribution in small area estimation: an application to SAIPE state

poverty models. SAIPE publications. US Census BureauMammen E, Yu K (2007) Additive isotonic regression. Lecture notes monograph series, vol 55, pp 179–

195Opsomer J, Claeskens G, Ranalli M, Kauermann G, Breidt FJ (2008) Non-parametric small area estimation

using penalized spline regression. J R Stat Soc B 70(1):265–286Pan G, Khattree R (1999) On estimation and testing arising from order restricted balanced mixed models.

J Stat Plan Inference 77:281–292Rao JNK (2003) Small area estimation. Wiley, New YorkRobertson T, Wright FT, Dykstra RL (1988) Order restricted statistical inference. Wiley, New YorkRueda C, Menéndez JA (2005) A restricted model approach to improve the precision of estimators. Stat

Transit J 7(3):697Silvapulle MJ, Sen PK (2005) Constrained statistical inference. Wiley, New YorkUgarte MD, Goicoa T, Militino AF, Durbán M (2009a) Spline smoothing in small area trend estimation

and forecasting. Comput Stat Data Anal 53:3616–3629Ugarte MD, Militino AF, Goicoa T (2009b) Benchmarked estimators in small areas using linear mixed

models with restrictions. Test 18(2):342–364