16
Interfaces with Other Disciplines How to measure the impact of environmental factors in a nonparametric production model Luiza Ba ˘din a,b , Cinzia Daraio c , Léopold Simar d,a Department of Applied Mathematics, Bucharest Academy of Economic Studies, Bucharest, Romania b Gh. Mihoc – C. Iacob Institute of Mathematical Statistics and Applied Mathematics, Bucharest, Romania c Department of Computer, Control and Management Engineering Antonio Ruberti (DIAG), Sapienza University of Rome, Italy d Institut de Statistique, Biostatistique et Sciences Actuarielles, Université Catholique de Louvain, Voie du Roman Pays 20, B 1348 Louvain-la-Neuve, Belgium article info Article history: Received 14 July 2011 Accepted 18 June 2012 Available online 26 June 2012 Keywords: Data Envelopment Analysis (DEA) Free Disposal Hull (FDH) Conditional efficiency measures Nonparametric frontiers Managerial efficiency Two-stage efficiency analysis abstract The measurement of technical efficiency allows managers and policy makers to enhance existing differ- entials and potential improvements across a sample of analyzed units. The next step involves relating the obtained efficiency estimates to some external or environmental factors which may influence the produc- tion process, affect the performances and explain the efficiency differentials. Recently introduced condi- tional efficiency measures (Daraio and Simar, 2005, 2007a,b), including conditional FDH, conditional DEA, conditional order-m and conditional order-a, have rapidly developed into a useful tool to explore the impact of exogenous factors on the performance of Decision Making Units in a nonparametric framework. This paper contributes in a twofold fashion. It first extends previous studies by showing that a careful analysis of both full and partial conditional measures allows the disentangling of the impact of environ- mental factors on the production process in its two components: impact on the attainable set and/or impact on the distribution of the efficiency scores. The authors investigate these interrelationships, both from an individual and a global perspective. Second, this paper examines the impact of environmental factors on the production process in a new two-stage type approach but using conditional measures to avoid the flaws of the traditional two-stage analysis. This novel approach also provides a measure of inef- ficiency whitened from the main effect of the environmental factors allowing a ranking of units according to their managerial efficiency, even when facing heterogeneous environmental conditions. The paper includes an illustration on simulated samples and a real data set from the banking industry. Ó 2012 Elsevier B.V. All rights reserved. 1. Introduction and basic notations Productivity analysis is concerned with the evaluation of the performances of firms to identify inefficient units wherein improvements could help increase profitability or reduce costs. Most of the efficiency analysis literature has focused on the estima- tion of the production frontier that provides the benchmark against which the economic producers are evaluated. Nevertheless, a very important component that concerns recent studies is the explana- tion of efficiency differentials by including in the analysis exoge- nous variables or environmental factors that cannot be controlled by the producer but may influence the production process. From a managerial point of view, it is important to identify the particu- larities of the production process or the economic conditions that might be responsible for inefficiency as well as to detect and ana- lyze possible influential factors that can determine changes in pro- ductivity patterns. The choice of the environmental variables has to be done on a case-by-case basis by taking into account the eco- nomic field of application. We will first introduce the notations and the basic assumptions on the Data Generating Process (DGP) characterizing the produc- tion process in the presence of environmental factors. Let X 2 R p þ denote the vector of inputs and let Y 2 R q þ denote the vector of out- puts. We consider a vector of environmental factors Z 2 Z R r that may influence the process and the productivity patterns. Firms transform quantities of inputs into outputs, but the environmental variables may affect this process. Let ðX; A; PÞ be the probability space on which the random variables are defined; we denote by P the support of the joint distribution of (X, Y, Z); and we denote a particular DGP by P 2 P. The impact and influence of Z on the production process may be multiple and can be quite different from one application to an- other. The effect of Z on the production may either affect the range of achievable values for the couples (X, Y), including the shape of the boundaries of the attainable set, or it may only affect the distri- bution of the inefficiencies inside a set with boundaries not depending on Z (only the probability of being more or less far from the efficient frontier may depend on Z), or it can affect both. Finally, 0377-2217/$ - see front matter Ó 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.ejor.2012.06.028 Corresponding author. E-mail addresses: [email protected] (L. Ba ˘din), [email protected] (C. Daraio), [email protected] (L. Simar). European Journal of Operational Research 223 (2012) 818–833 Contents lists available at SciVerse ScienceDirect European Journal of Operational Research journal homepage: www.elsevier.com/locate/ejor

How to measure the impact of environmental factors in a nonparametric production model

  • Upload
    leopold

  • View
    222

  • Download
    0

Embed Size (px)

Citation preview

Page 1: How to measure the impact of environmental factors in a nonparametric production model

European Journal of Operational Research 223 (2012) 818–833

Contents lists available at SciVerse ScienceDirect

European Journal of Operational Research

journal homepage: www.elsevier .com/locate /e jor

Interfaces with Other Disciplines

How to measure the impact of environmental factors in a nonparametricproduction model

Luiza Badin a,b, Cinzia Daraio c, Léopold Simar d,⇑a Department of Applied Mathematics, Bucharest Academy of Economic Studies, Bucharest, Romaniab Gh. Mihoc – C. Iacob Institute of Mathematical Statistics and Applied Mathematics, Bucharest, Romaniac Department of Computer, Control and Management Engineering Antonio Ruberti (DIAG), Sapienza University of Rome, Italyd Institut de Statistique, Biostatistique et Sciences Actuarielles, Université Catholique de Louvain, Voie du Roman Pays 20, B 1348 Louvain-la-Neuve, Belgium

a r t i c l e i n f o a b s t r a c t

Article history:Received 14 July 2011Accepted 18 June 2012Available online 26 June 2012

Keywords:Data Envelopment Analysis (DEA)Free Disposal Hull (FDH)Conditional efficiency measuresNonparametric frontiersManagerial efficiencyTwo-stage efficiency analysis

0377-2217/$ - see front matter � 2012 Elsevier B.V. Ahttp://dx.doi.org/10.1016/j.ejor.2012.06.028

⇑ Corresponding author.E-mail addresses: [email protected] (L. Ba

(C. Daraio), [email protected] (L. Simar).

The measurement of technical efficiency allows managers and policy makers to enhance existing differ-entials and potential improvements across a sample of analyzed units. The next step involves relating theobtained efficiency estimates to some external or environmental factors which may influence the produc-tion process, affect the performances and explain the efficiency differentials. Recently introduced condi-tional efficiency measures (Daraio and Simar, 2005, 2007a,b), including conditional FDH, conditional DEA,conditional order-m and conditional order-a, have rapidly developed into a useful tool to explore theimpact of exogenous factors on the performance of Decision Making Units in a nonparametric framework.This paper contributes in a twofold fashion. It first extends previous studies by showing that a carefulanalysis of both full and partial conditional measures allows the disentangling of the impact of environ-mental factors on the production process in its two components: impact on the attainable set and/orimpact on the distribution of the efficiency scores. The authors investigate these interrelationships, bothfrom an individual and a global perspective. Second, this paper examines the impact of environmentalfactors on the production process in a new two-stage type approach but using conditional measures toavoid the flaws of the traditional two-stage analysis. This novel approach also provides a measure of inef-ficiency whitened from the main effect of the environmental factors allowing a ranking of units accordingto their managerial efficiency, even when facing heterogeneous environmental conditions. The paperincludes an illustration on simulated samples and a real data set from the banking industry.

� 2012 Elsevier B.V. All rights reserved.

1. Introduction and basic notations

Productivity analysis is concerned with the evaluation of theperformances of firms to identify inefficient units whereinimprovements could help increase profitability or reduce costs.Most of the efficiency analysis literature has focused on the estima-tion of the production frontier that provides the benchmark againstwhich the economic producers are evaluated. Nevertheless, a veryimportant component that concerns recent studies is the explana-tion of efficiency differentials by including in the analysis exoge-nous variables or environmental factors that cannot be controlledby the producer but may influence the production process. Froma managerial point of view, it is important to identify the particu-larities of the production process or the economic conditions thatmight be responsible for inefficiency as well as to detect and ana-lyze possible influential factors that can determine changes in pro-ductivity patterns. The choice of the environmental variables has to

ll rights reserved.

din), [email protected]

be done on a case-by-case basis by taking into account the eco-nomic field of application.

We will first introduce the notations and the basic assumptionson the Data Generating Process (DGP) characterizing the produc-tion process in the presence of environmental factors. Let X 2 Rp

þdenote the vector of inputs and let Y 2 Rq

þ denote the vector of out-puts. We consider a vector of environmental factors Z 2 Z � Rr

that may influence the process and the productivity patterns. Firmstransform quantities of inputs into outputs, but the environmentalvariables may affect this process. Let ðX; A; PÞ be the probabilityspace on which the random variables are defined; we denote byP the support of the joint distribution of (X,Y,Z); and we denotea particular DGP by P 2 P.

The impact and influence of Z on the production process may bemultiple and can be quite different from one application to an-other. The effect of Z on the production may either affect the rangeof achievable values for the couples (X,Y), including the shape ofthe boundaries of the attainable set, or it may only affect the distri-bution of the inefficiencies inside a set with boundaries notdepending on Z (only the probability of being more or less far fromthe efficient frontier may depend on Z), or it can affect both. Finally,

Page 2: How to measure the impact of environmental factors in a nonparametric production model

L. Badin et al. / European Journal of Operational Research 223 (2012) 818–833 819

the environmental factors Z may also be completely independentof (X,Y).

Daraio and Simar (2005, 2007a,b), extending previous work ofCazals et al. (2002), provide a quite general and unrestricted frame-work to investigate the joint behavior of (X,Y,Z) from a productiv-ity point of view. They consider a probability model that generatesthe variables (X,Y,Z) where the conditional distribution of (X,Y) gi-ven a particular value of Z will be of particular interest. This condi-tional process can be described by

Hðx; yjzÞ ¼ ProbðX 6 x;Y P yjZ ¼ zÞ; ð1:1Þ

or any equivalent variation of it (the joint conditional density func-tion or the joint conditional cumulative distribution function, etc.).The function H(x,yjz) is simply the probability for a unit operating atlevel (x,y) to be dominated by firms facing the same environmentalconditions z. Given that Z = z, the range of possible combinations ofinputs � outputs, Wz, is the support of H(x,yjz):

Wz ¼ fðx; yÞj Z ¼ z; x can produce yg: ð1:2Þ

If H(x,y) denotes the unconditional probability of being domi-nated, we have

Hðx; yÞ ¼ZZ

Hðx; yjzÞ f ZðzÞ dz; ð1:3Þ

having support W, the marginal (unconditional) attainable set de-fined as

W ¼ fðx; yÞj x can produceyg ¼[z2Z

W z: ð1:4Þ

Remember that the joint support of the variables (X,Y,Z) is de-noted by P. It is clear that, by construction, for all z 2 Z; Wz # W.

A large part of the literature on this topic has focused on so-called two-stage analysis, where typically, some first stage esti-mates of the efficiency of the firms are regressed in a second stageon these additional factors to investigate their effect on efficiency. 1

Simar and Wilson (2007) clarified that these two-stage approachesare restricted to models where these factors do not influence theshape of the production set. This is the separability condition whichstates that the support of (X,Y) is not dependent of Z, equivalently

separability condition : Wz ¼ W; for all z 2 Z: ð1:5Þ

In this latter case, the support of (X,Y,Z) can be written asP ¼ W� Z, where � represents the cartesian product. As clearlyillustrated by Figs. 1 and 2 in Simar and Wilson (2011b), it isimportant to understand the implications of condition (1.5). Ifthe condition is verified, the only potential remaining impact ofthe environmental factors on the production process may be onthe distribution of the efficiencies. This justifies the use of two-stage approaches as illustrated in Simar and Wilson (2007). If thecondition (1.5) is not verified, the measure of the distance of a unit(x,y) to the boundary of W, even if it can be well defined and esti-mated (details on this follow), has little economic interest, becauseit ignores the heterogeneity introduced by Z on the attainable set ofvalues for (X,Y). Whether or not Wz is independent of z is an empir-ical issue, and Daraio et al. (2010) provide a statistical procedure totest this hypothesis. The test is a global test of separability since ittests the null hypothesis Wz ¼ W; 8z 2 Z against its complement:9z 2 Z such that Wz – W.

1 Simar and Wilson (2007) cite more than 50 articles that use this approach. Aspointed out by Simar and Wilson (2011b), a Google Scholar search returned about1590 articles after a search on ‘‘efficiency’’, ‘‘two-stage’’, and ‘‘dea’’ for the period2007–2010. A large number of these papers incorrectly use either ordinary leastsquares (OLS) or tobit regression in the second stage and rely on conventionalmethods for inference.

Banker and Natarajan (2008) suggest another model where atwo-stage approach is valid, but the model heavily depends onquite restrictive and unrealistic assumptions on the productionprocess, as described and commented in detail in Simar and Wilson(2011b). If the two-stage approach is validated by the appropriatetest (Daraio et al., 2010), one can indeed estimate in the first stagethe efficiency scores of the units relative to the boundary of theunconditional attainable set in the inputs � outputs space andthen regress, in a second stage, the obtained efficiencies on theenvironmental factors. We know that even if an appropriate modelis used (e.g. Logit, Truncated Normal, nonparametric truncatedregression), the inference on the impact of Z on the efficiency mea-sures has to be carefully conducted, using adapted bootstrap tech-niques (i.e. traditional inference is wrong: see Simar and Wilson,2007, 2011b for details).

Another line of research, that we could call one-stage ap-proaches, has been proposed in the literature (see e.g. Bankerand Morey, 1986a,b for categorical external factors; Färe et al.,1989; or Färe et al., 1994, p. 223–6). Here the factors Z are consid-ered as free disposal inputs and/or outputs which contribute todefining the attainable set W � Rp

þ � Rqþ � Rr , but which are not ac-

tive in the optimization process defining the efficiency scores. Inthis case, and for giving just an example in the input-oriented case,the efficiency score of a unit facing external conditions z, would bedefined as:

hðx; y; zÞ ¼ inffhjðhx; y; zÞ 2 Wg: ð1:6Þ

The estimator of W would be obtained as usual, by adding thevariables Z in defining the Free Disposal Hull (FDH) and/or the DataEnvelopment Analysis (DEA) enveloping set, with a variable Zbeing considered as an input if it is favorable to efficiency and asan output if it is detrimental to efficiency. The drawback of this ap-proach is twofold: first we have to know a priori Z’s role on the pro-duction process, and second we assume the free disposability (andeventually convexity, if DEA is used) of the corresponding attain-able extended set W. Finally, this approach only admits variableZ having a monotone effect on the production process (no U-shape,or inversed-U shaped impacts are allowed; see Daraio and Simar,2005 for further discussion).

As described, e.g. in Daraio and Simar (2007a), the two mea-sures H(x,yjz) and H(x,y) allow us to define conditional andmarginal efficiency scores that can be estimated by nonparametricmethods. The comparison of the conditional and marginalefficiency scores can be used to investigate the impact of Z onthe production process. One of the objectives of this paper is toclarify what can be learned from the analysis of these conditionalefficiency scores. We will also focus on the particular role of effi-ciency scores relative to partial order frontiers (order-m frontiersfrom Cazals et al., 2002 and order-a quantile type frontiers fromDaouia and Simar, 2007), that not only provide robust versions ofthe efficient frontier, but also allow us to investigate different as-pects of the role of Z on the production process: impact on theattainable set and/or impact on the distribution of efficiencies.

In this paper, we also propose a regression-type procedureallowing us to make inference on the impact of Z on the conditionalefficiency scores. Confidence intervals for the local impact of Z willbe obtained by adapting subsampling ideas from Simar and Wilson(2011a). The latter analysis can be seen as a two-stage method, asdescribed earlier, but with the great difference that here, the objectregressed on Z (the conditional efficiency) is economically mean-ingful. The unexplained part of the conditional efficiencies can thenbe interpreted as a measure of managerial efficiency allowing us torank the performance of firms facing different environmentalconditions.

Summing up, in this paper, we develop a nonparametric pro-duction model where the role of these environmental factors is

Page 3: How to measure the impact of environmental factors in a nonparametric production model

820 L. Badin et al. / European Journal of Operational Research 223 (2012) 818–833

explicitly introduced in a non restrictive way, through conditionalefficiency measures. The paper contributes in a twofold fashion. Itfirst extends previous research on conditional measures by show-ing that a careful analysis of both full and partial conditional mea-sures allows us to disentangle the impact of environmental factorson the production process in its two components: impact on theattainable set in the input � output space, and/or impact on thedistribution of the efficiency level. To the best of our knowledgethis was never developed before. Inference tools are then proposedby adapting appropriate bootstrap algorithms. Second, for the firsttime, the impact of environmental factors on the production pro-cess is also examined by using a novel two-stage type approachon conditional measures of efficiency to avoid the limitations ofthe traditional two-stage analysis. Our approach also provides ameasure of inefficiency whitened from the main effect of the envi-ronmental factors. This allows us to rank the firms according totheir managerial efficiency, even when facing heterogeneous envi-ronmental conditions.

The paper is organized as follows. Section 2 reviews the basicdefinition of marginal and conditional efficiency scores, with re-spect to full frontier and also to more robust partial frontiers. ThenSection 3 explains how we can disentangle the impact of externalfactors on the production process (i.e. impact on the support of theproduction set and impact on the distribution of the inefficiencyscores) by the analysis and comparison of conditional and uncon-ditional efficiencies. In addition, we propose a flexible model totry to whiten the conditional efficiencies from the effect of Z, in or-der to derive a measure of managerial efficiency. Section 4 providesthe various nonparametric estimates of the quantities of interestand offers useful guidelines to conduct inference, by using thebootstrap. We illustrate the procedure with a simulated data set,and we apply the approach to a real data set from the banking sec-tor in Section 5.2. Section 6 summarizes the main findings and con-cludes the paper.

2. Marginal and conditional efficiency measures

2.1. Farrell efficiency scores

The literature on efficiency analysis proposes several ways formeasuring the distance of a firm operating at the level (x0,y0) tothe efficient boundary of the attainable set. In the lines of the pio-neering work of Debreu (1951), Farrell (1957) and Shephard(1970), radial distances became very popular in the efficiency liter-ature. They can be input or output oriented (maximal radial con-traction of the inputs or maximal radial expansion of the outputsto reach the efficient boundary). Färe et al. (1985) introducedhyperbolic radial distances that avoid some of the ambiguity inchoosing output or input orientation. In this case, input and outputlevels are adjusted simultaneously. We can define these radialmeasures as follows:

hðx0; y0Þ ¼ inffh > 0jðhx0; y0Þ 2 Wg ð2:1Þkðx0; y0Þ ¼ supfk > 0jðx0; ky0Þ 2 Wg ð2:2Þcðx0; y0Þ ¼ supfc > 0jðc�1x0; cy0Þ 2 Wg: ð2:3Þ

In this section, we limit the technical presentation to the outputorientation, but it is easy to adapt the formulae to the input-ori-ented and hyperbolic cases. From Cazals et al. (2002) and Daraioand Simar (2005), we know that under the assumption of free dis-posability of the inputs and of the outputs, these measures can becharacterized by some appropriate probability function deter-mined by H(x,y). We have, for the marginal Farrell output measureof efficiency,

kðx0; y0Þ ¼ supfk > 0jSY jXðky0jX 6 x0Þ > 0g; ð2:4Þ

where SYjXðy0jX 6 x0Þ ¼ ProbðY P y0jX 6 x0Þ ¼ Hðx0 ;y0ÞHðx0 ;0Þ

is the (non-standard) conditional survival function of Y, nonstandard becausethe condition is X 6 x0 and not X = x0.

If the firm is facing environmental factors Z = z0, then Daraioand Simar (2005) define the conditional Farrell output measureof efficiency as

kðx0; y0jz0Þ ¼ supfk > 0jðx0; ky0Þ 2 Wz0g ð2:5Þ¼ supfk > 0jSY jX;Zðky0jX 6 x0; Z ¼ z0Þ > 0g; ð2:6Þ

whereSY jX;Zðy0jX 6 x0; Z ¼ z0Þ ¼ ProbðY P y0jX 6 x0; Z ¼ z0Þ ¼ Hðx0 ;y0 jz0Þ

Hðx0 ;0jz0Þis

the conditional survival function of Y, here we condition on X 6 x0

and Z = z0. Since for all z0 2 Z, Wz0 # W, we have for allðx0; y0; z0Þ 2 P the relations 1 6 k(x0,y0jz0) 6 k(x0,y0).

Daraio et al. (2010) use these two measures to conduct a globaltest of separability. In their approach, using unconditional and con-ditional efficiency measures, they propose to estimate (by usingFDH or DEA techniques) a mean integrated square difference be-tween the boundaries of P and W� Z. This provides a test statisticwith a sampling distribution approximated by the bootstrap.

2.2. Partial order frontiers

The literature has proposed partial frontiers and the resultingpartial efficiency scores to provide robust measures of efficiencies,robust to extreme data points or outliers (a survey and a detailedanalysis of these approaches can be found in Daraio and Simar,2007a). In our setup here, this remains true when we will use par-tial frontiers of extreme orders, as explained later. However, whenusing partial frontiers of lower order, we will see that we obtainuseful complementary information on the impact of Z on the distri-bution of the inefficiencies inside the attainable set. To save space,we limit the presentation to the output-oriented case and to theorder-a quantile frontiers. The extension to other orientations (in-put and hyperbolic) is immediate. The case of the partial order-mfrontier is described in Cazals et al. (2002) and in Daraio and Simar(2005); see also Daraio and Simar (2007a) for a general presenta-tion and applications to real data.

2.2.1. Order-a quantile frontiersDaouia and Simar (2007) define for any a 2 (0,1] the order-a

output efficiency score as

kaðx0; y0Þ ¼ supfk > 0jSY jXðky0jX 6 x0Þ > 1� ag: ð2:7Þ

We see that if a ? 1, ka(x0,y0) ? k(x0,y0). If ka(x0,y0) = 1, thepoint (x0,y0) belongs to the order-a quantile frontier, meaning thatonly (1 � a) � 100% of the firms using fewer resources than x0,dominate the unit (x0,y0). A value ka(x0,y0) < 1 indicates a firm pro-ducing more than the level determined by the order-a frontier at x0.

By conditioning on Z = z0, Daouia and Simar (2007) similarly de-fine the conditional order-a output efficiency score of (x0,y0) as

kaðx0; y0jz0Þ ¼ supfk > 0jSY jX;Zðky0jX 6 x0; Z ¼ z0Þ > 1� ag: ð2:8Þ

Again, if a ? 1, ka(x0,y0jz0) ? k(x0,y0jz0).

3. What do we learn by the analysis of conditional andunconditional efficiency scores?

3.1. Individual analysis

The individual efficiency scores k(x,y) and k(x,yjz) have theirusual interpretation: they measure the radial feasible proportion-ate increase of output a unit operating at the level (x,y) should per-form to reach the efficient boundary of W and Wz, respectively. Incase the environmental factor Z has an effect on this boundary, the

Page 4: How to measure the impact of environmental factors in a nonparametric production model

L. Badin et al. / European Journal of Operational Research 223 (2012) 818–833 821

first measure k(x,y) suffers from a lack of economic sounding, be-cause, facing the external conditions z, this firm may not be able toreach the frontier of W, that may be quite different from the one ofWz. So, the conditional measure is more appropriate to evaluatethe effort a firm must exert to be considered efficient. Note, however,that ranking firms according to these conditional measures can al-ways be done, but as far as managerial efficiency is concerned, thisranking is meaningless because firms face different operating condi-tions, and, some external conditions may be easier (or harder) tohandle than others to reach the frontier. We will see later how toderive a measure of managerial efficiency allowing us to rank theunits even when they face different environmental conditions.

The analysis of the individual ratios may also be of interest:they allow us to measure, for a unit (x,y), the local effect of Z onthe reachable frontier, independently of the inherent inefficiencyof the unit (x,y). Indeed, RO(x,yjz) = k(x,yjz)/k(x,y) 6 1 is the ratioof the radial distances of (x,y) to the two frontiers. The inherent le-vel of inefficiency of the unit (x,y) has been cleaned off, in the fol-lowing sense:

ROðx; yjzÞ ¼kðx; yjzÞkðx; yÞ ¼

kykkðx; yjzÞkykkðx; yÞ ¼

ky@;zx kky@xk

ð3:9Þ

where kyk is the modulus (Euclidean norm) of y and y@x and y@;zx arethe projections of (x,y) on the efficient frontiers (unconditional andconditional, respectively), along the ray y and orthogonally to x.Clearly ky@;zx k and ky@xk are both independent of the inherent ineffi-ciency of the unit (x,y). So, the ratio measures the shift of the fron-tier in the output direction, due to the particular value of z, alongthe ray y and for an input level x, whatever being the modulus of y.

This becomes even easier to see if we consider the particularcase of univariate y. To be specific, in this case, the efficient bound-aries can be described by maximal production functions:

uðxÞ ¼ supfyjSYjXðyjX 6 xÞ > 0g ð3:10ÞuðxjzÞ ¼ supfyjSY jX;ZðyjX 6 x; Z ¼ zÞ > 0g: ð3:11Þ

Here we have RO(x,yjz) = u(xjz)/u(x) 6 1, and we note thatu(x) = supzu(xjz). We observe that the ratio is indeed independentof the level of output y. So, to summarize, these ratios allow us toinvestigate the local effect of Z of the attainable frontier itself, fora given x and a given output mix. Using efficiency scores is particu-larly useful when y is multidimensional.

The same can be said for the input orientation, whereRI(x,yjz) = h(x,yjz)/h(x,y) P 1. In the particular case where x is uni-variate, the efficient boundaries can be described by the minimalinput functions:

/ðyÞ ¼ inffx 2 RþjFXjY ðxjY P yÞ > 0g; ð3:12Þ/ðyjzÞ ¼ inffx 2 RþjFXjY ;ZðxjY P y; Z ¼ zÞ > 0g; ð3:13Þ

where the notation introduced here is unambiguous. In this case,the ratio can be written as RI(x,yjz) = /(yjz)//(y) P 1, with the rela-tion /(y) = infz/(yjz). The same analysis as the one described earlier,can be done, mutatis mutandis, for the input orientation. The toppanel of Fig. 9, reported in Appendix A, illustrates possible behav-iors of these minimal input functions, conditional and uncondi-tional, in the simple case of p = q = r = 1.

3.2. Global analysis

The first important global analysis required in this setup is theone provided by Daraio et al. (2010), where conditional and uncon-ditional efficiency scores are used to build some test statistics totest the separability condition (1.5). This statistic is built by mea-suring, in some way, the difference between the two efficientboundaries. The bootstrap is then used to find critical values. Tosave space, we refer the reader to Daraio et al. (2010) for the details.

Besides a global test of separability, the comparison of the indi-vidual ratios of conditional to unconditional scores as a function ofZ may also be useful. However, this comparison could be mislead-ing when wrongly conducted. In this section, we clarify exactlywhat can be done and how to interpret the resulting pictures,extending the previous methodologies suggested in Daraio and Si-mar (2005, 2007a) to more general setups.

Indeed, Daraio and Simar described the usefulness of analyzingthe ratios considered as a function of z. This allows us to capturethe marginal effect of Z on the frontier shifts, but this effect maychange according to the level of the inputs, when frontier outputratios are considered or according to the level of the outputs, whenfrontier input ratios are analyzed.

This situation is explained and illustrated in detail in AppendixA, for a simple univariate scenario in the input-oriented frame-work. To summarize the Appendix, the interpretation of the ratiosas a function of z exclusively can always be done to explore themarginal effect of z on the frontier shifts, but the picture mightbe rather difficult to interpret when some dependence exists be-tween Z and both the efficient input levels and the outputs Y.Therefore, in the absence of any information, it is better to firstanalyze the behavior of the ratios RI(x,yjz) as a function of z, forfixed levels of the outputs y (multivariate analysis), or as illustratedin the applications below, as a joint function of both y and z. Ofcourse, if Y is independent of Z, or in a less restrictive way, underthe assumption that the shape of the boundaries of P in the sec-tions Y = y (in the (X,Z) space) would not change with the level y(which formally defines what we call partial separability), the anal-ysis of the ratios RI(x,yjz) as a function of z is largely simplified.

For the output orientation, and for the same reasons, the analysisof the ratios RO(x,yjz) as a function of z should first be conducted forfixed levels of the inputs X. Here, for given values of the inputs x, anincreasing shape for RO(x,yjz) as a function of z, would correspond toa favorable effect of Z (higher values of Z allow us to reach higheroutputs, Z is acting as a freely available input) and the opposite fora decreasing shape (Z is acting as an undesirable output). Here again,under the additional assumption of partial separability, i.e. the shapeof the boundaries of P in the sections X = x (in the (Y,Z) space) wouldnot change with the level x, the ratios RO(x,yjz) would have the sameshape for all values of x, and so the analysis of the effect of Z on theefficient frontier, as a function of z only, would be simplified.

3.3. Full frontier or partial frontier?

Partial frontiers are very popular nowadays because they pro-duce robust estimators of the efficient frontiers and of the effi-ciency scores, sharing nice statistical properties. Here, robustnessrefers to outliers or extreme data points, and we know that some-times, outliers can mask the effect of Z on the production process(see Daraio and Simar, 2007a, Section 5.4.1 for details). In our setuphere, we clarify what the partial scores and their corresponding ra-tios, e.g. RO,a(x,yjz) = ka(x,yjz)/ka(x,y) can add to the analysis of theeffect of Z on the production process. Here we will focus the pre-sentation on the output orientation.

First, as mentioned earlier, it is important to remember that theratios RO(x,yjz), when defined relative to the full frontiers, onlybring information on potential differences between the boundariesof W and Wz. They are not sensitive to changes in the distributionof inefficiencies. We saw earlier that the measure RO(x,yjz) 6 1 for afixed point (x,y) only depends on the relative position of theboundaries of W and Wz (in the radial direction given by y). It isno more the case for partial frontiers: the values of ka(x,y) andka(x,yjz) do not depend only on the boundary, they also dependon the effect of Z on the distribution of the output Y inside Wz,conditionally to X 6 x. It is easy to see that the ratiosRO,a(x,yjz) = ka(x,yjz)/ka(x,y) could be either 61 or P1, depending

Page 5: How to measure the impact of environmental factors in a nonparametric production model

822 L. Badin et al. / European Journal of Operational Research 223 (2012) 818–833

on the actual effect of Z on the distribution of Y given X 6 x (whenconditioning on Z = z). We illustrate these facts in Appendix B, inthe simple case of a univariate output y and a univariate z.

So, to summarize the analysis of Appendix B, we see that if a isnear 1, the partial measures provide the same information as the fullmeasure but they use more robust frontiers. Using small values of acould be misleading, without having a clear picture of the separabil-ity condition. Under the latter, the analysis with small values of a(e.g. a = 0.5: median frontier) provides complementary informationon the effect of Z on the distribution of the inefficiencies (its medianvalue when a = 0.5). The same analysis could be done, mutatismutandis, for the input-orientation and for partial order-m frontiers.

3.4. Second-stage regression and managerial efficiency

The idea of regressing efficiency scores on the environmentalvariables to estimate the average effect of Z on the efficiency isquite old. However, as pointed out in Section 1 earlier, and in de-tails in Simar and Wilson (2007, 2011b), this second-stage regres-sion is meaningless, or at minimum difficult to interpret, whenusing the unconditional efficiency scores k(x,y). Indeed, if the sep-arability assumption (1.5) is not verified, the unconditional effi-ciencies are relative to the boundary of W defined by (1.4), whichhas no economic meaning for firms facing different environmentalconditions, i.e. facing different attainable sets Wz.2 Note that com-paring two firms facing different values of z by comparing their valuek(x,yjZ = z) is unfair.

It is therefore much more meaningful to analyze the averagebehavior of k(x,yjz) as a function of z, to capture the main effectof Z on these conditional measures.3 It is clear that here too, theconditional measures k(x,yjZ = z) may vary with both x and z.However, here we want to capture the marginal effect of Z on theefficiency scores, so it is legitimate to analyze the regressionEðkðX;Y jZÞjZ ¼ zÞ as a function of z. We suggest using a flexibleregression model defining lðzÞ ¼ EðkðX;YjZÞjZ ¼ zÞ, and r2ðzÞ ¼VðkðX;Y jZÞjZ ¼ zÞ thus, we may write

kðX;YjZ ¼ zÞ ¼ lðzÞ þ rðzÞe; ð3:14Þ

where EðejZ ¼ zÞ ¼ 0 and VðejZ ¼ zÞ ¼ 1. In the next section we willbriefly address the problem of estimating the functions l(z) andr(z) in a nonparametric way, with some guidelines for a bootstrapprocedure for getting confidence interval for l(z), at any given valuez. Whereas l(z) measures the average effect of z on the efficiency,r(z) provides additional information on the dispersion of the effi-ciency distribution as a function of z.

Another important result of this approach is the analysis of theresiduals. For a particular given unit (x,y,z), we can define the errorterm

e ¼ kðx; yjzÞ � lðzÞrðzÞ : ð3:15Þ

This can be viewed as the unexplained part of the conditionalefficiency score. If Z and e do not show a strong correlation in(3.14), this quantity can be interpreted as a pure efficiencymeasure of the unit (x,y).4 If Z and e show some correlation, still

2 This is a relevant empirical issue due to the great number of papers that haveappeared in recent years. See e.g. Kao and Hwang (2008); Chen et al. (2009a,b); andZha and Liang (2010), just to cite a few of the most recent.

3 As a matter of fact, since k(X, YjZ = z) is a ratio of two output levels, and since anadditive model will be suggested in (3.14), it might be more appropriate to performthis second-stage regression on the log k(X,YjZ = z). This is an empirical issue, and wecome back to this in the empirical illustrations.

4 Our approach may be seen as a recent advanced and robust interpretation ofLeibenstein’s (1966, 1979) X-inefficiency theory which has among its proximatecauses those related to the performance of management (see also Leibenstein andMaital, 1992).

the quantity defined in (3.15) can be used as a proxy for measuringthe managerial efficiency, since it is the remaining part of the condi-tional efficiency after removing the location and scale effect due to Z.We call it managerial because it depends only upon the managers’ability and not upon the environmental factors. What we have doneis a kind of whitening of the conditional efficiency scores from theeffects due to the environmental conditions Z. We can use thesequantities, which are standardized (mean zero and variance one),to compare the firms between them: a large value of e indicates aunit which has poor performance, even after eliminating the maineffects of the environmental factors. A small (negative) value, onthe contrary, indicates very good managerial performance of the firm(x,y,z). It allows us to rank the firms facing different environmentalconditions, because the main effects of these factors have beeneliminated. Extreme (unexpected) values of e would also warn forpotential outliers.

This analysis could also be performed using partial efficiencyscores, like ka(X,YjZ = z). When a is near 1, it would provide a ro-bust version of this analysis. We will also later review that thequality of the estimation is better when using partial efficiencymeasures (better rates of convergence).

4. Nonparametric estimator

4.1. Efficiency estimators

Nonparametric estimators of the conditional and unconditionalefficiency scores are very easy to obtain. We summarize the nota-tions and properties here as to what is needed for the remainderpaper (details can be found in Daraio and Simar, 2007a, or Simarand Wilson, 2008). We will denote Sn ¼ fðXi; Yi; ZiÞ j i ¼ 1; . . . ;ngthe sample of n iid observations on (X,Y,Z) generated in P accordingto the DGP P 2 P. If we plug nonparametric estimators of SYjX andSYjX, Z into all the formulae mentioned earlier, we obtain very natu-ral nonparametric estimators of the efficiencies. For the SYjX we canuse the empirical probabilities

bSY jXðy0jX 6 x0Þ ¼1=n

Pni¼1IðXi 6 x0;Yi P y0Þ

1=nPn

i¼1IðXi 6 x0Þ; ð4:1Þ

where Ið�Þ is the indicator function. This provides the popular FDHestimator of k(x0,y0)

kðx0; y0Þ ¼ maxfijXi6x0g

minj¼1;...;q

Yji

yj0

( ); ð4:2Þ

whose statistical properties are well known (see e.g. Simar andWilson, 2008). To summarize, under mild regularity conditions:

n1=ðpþqÞðkðx0; y0Þ � kðx0; y0ÞÞ!L

Weibullðlpþq0 ;pþ qÞ; ð4:3Þ

where l0 is a constant depending on the DGP P 2 P that is describedin Park et al. (2000). For the conditional (conditional to Z = z0) somesmoothing techniques are required. We have the estimator

bSY jX;Zðy0jX 6 x0; Z ¼ z0Þ

¼ 1=nPn

i¼1IðXi 6 x0;Yi P y0ÞKððz0 � ZiÞ=bÞ1=n

Pni¼1IðXi 6 x0ÞKððz0 � ZiÞ=bÞ

; ð4:4Þ

where for simplicity, we wrote the expression for a univariate Z.Here K(�) is a kernel with compact support and b > 0 is the band-width. For the general multivariate case, see Daraio and Simar(2007a). In the general multivariate setup, an optimal bandwidthselection procedure has been suggested in Badin et al. (2010), basedon a least-squares cross validation technique. This leads to the con-ditional efficiency estimator

Page 6: How to measure the impact of environmental factors in a nonparametric production model

L. Badin et al. / European Journal of Operational Research 223 (2012) 818–833 823

kðx0; y0jz0Þ ¼ maxfijXi6x0 ;kZi�z0k6bg

minj¼1;...;q

Yji

yj0

( ): ð4:5Þ

So, it appears that the estimation of the conditional efficiencyscore is a kind of restricted FDH program (restricted to data pointshaving kZi � z0k 6 b). The statistical properties of the estimators ofthe conditional measures have been determined in Jeong et al.,2010. To summarize and roughly speaking, these estimators main-tain similar properties as the FDH estimator but with an effectivesample size depending on the bandwidth: n is replaced by nbr,where r is the dimension of Z. In practice, since the optimal band-width has a size n�1/(r+4) (see Badin et al., 2010, for details), thisgives a rate of convergence for the conditional measures estimatorsof n4/((r+4)(p+q)) in place of the better rate n1/(p+q) achieved by theFDH estimators. It is important to report these rates to ensure con-sistent subsequent bootstrap algorithms.

The nonparametric partial frontier efficiency estimates are ob-tained in a similar way, by plugging the estimators bSY jX and bSY jX;Zin the expressions defining the partial efficiency measures: algo-rithms have been proposed in Cazals et al. (2002), Daraio and Simar(2005, 2007a) for the order-m case and in Daouia and Simar (2007)and Daraio and Simar (2007a) for the order-a quantile case. Theirstatistical properties have been also established. Under mild regu-larity conditions, we have for instance

ffiffiffinp

kaðx0; y0Þ � kaðx0; y0Þ� �

!L N ð0;r2ða; x0ÞÞ; ð4:6Þ

where an expression for r2(a,x0) is given in Daraio and Simar(2006). A similar result holds for the order-m case (see Cazalset al., 2002).

For the estimators of the conditional partial measures, we havesimilar results where the rate of convergence

ffiffiffinp

deteriorates toffiffiffiffiffiffiffinbrp

¼ n2=ðrþ4Þ when the optimal bandwidth of Badin et al.(2010) described here is used.

5 As pointed out in Jeong and Simar (2006), if the point of interest (x, y,z) is ratherextreme with respect to the cloud of data Sn , some of the FDH estimators (conditionaland unconditional) may be undefined when computed relative to a bootstrap sampleS�m . This is a small sample issue, and this event should disappear asymptotically. Inthis case, as Jeong and Simar recommend, we define the estimators as being equal to1. This does not alter the asymptotic consistency of the bootstrap.

4.1.1. Robust estimators of the full frontierAs explained earlier, the partial frontiers have their particular

usefulness providing less extreme surfaces to benchmark individualunits and allowing us to investigate the impact of Z on the distribu-tion of the efficiencies. In particular, for m = 1, the order-m frontieris not looking to an optimal behavior but rather to an average behav-ior of firms (the same is true for the order-a frontier with a = 0.50).

But as discussed and illustrated in Daraio and Simar (2007a), itmay happen that outliers or extreme data points can hide the realeffect of environmental factors. So, in this case, it is particularlyuseful to build robust estimators of the full frontier. This can beachieved by using partial order frontier with extreme orders.

Indeed, if we let a = a(n) ? 1 (or m = m(n) ?1) when n ?1fast enough (see Cazals et al., 2002 and Daraio and Simar (2006),for details), the respective partial frontier estimators will convergeto the full frontier sharing the same properties as the FDH estima-tor (with the same limiting Weibull distribution). But for finite n(as we use in practice), a(n) will be less than 1 (and m(n) will beless than infinity) and so the corresponding estimate of the fullfrontier will not envelop all the data points being more robustand resistant to outliers and extreme values than the standardenvelopment estimators like FDH or DEA.

Simar (2003) and Daraio and Simar (2007a) have suggestedsome data-driven techniques to select reasonable values of a andm by analyzing the proportion of data points remaining outsidethe corresponding partial frontiers over a grid of values of the or-ders. This allows us to detect potential outliers. Daouia and Gijbels(2011a) provide a theoretical background for the comparison ofboth partial frontiers in terms of their robustness properties; seealso Daouia and Gijbels (2011b), where a theoretical rule is given

to select the appropriate order of the partial frontiers for obtainingrobust estimators of the full frontier in the presence of outliers.

4.1.2. Estimation of the ratios RO(x,yjz) and RO,a(x,yjz)Consistent estimators of the ratios are directly obtained by

plugging the nonparametric estimators derived earlier into the cor-responding formulae. So we have

bROðx; yjzÞ ¼bkðx; yjzÞbkðx; yÞ : ð4:7Þ

For any given point (x,y,z), it is easy to prove that bROðx; yjzÞ is aconsistent estimator of RO(x,yjz), sharing the worst rate of conver-gence of its components, i.e. the numerator. The limiting distribu-tion of the error is rather complicated, but it can be shown (seeDaraio et al., 2010, for details) that

njðbROðx; yjzÞ � ROðx; yjzÞÞ!L

Q zPð�Þ; ð4:8Þ

where the rate of convergence was given, j = 4/((r + 4)(p + q)) andwhere Qz

P is a nondegenerate distribution (i.e. it is not a Dirac distri-bution with mass 1 at one single value) depending on the currentvalue of z and on characteristics of the DGP P.

The partial measures will benefit from the better rate of conver-gence of the individual efficiency estimators. We have

bRO;aðx; yjzÞ ¼bkaðx; yjzÞbkaðx; yÞ

ð4:9Þ

and the asymptotic distribution of the error of estimation follows

nc bRO;aðx; yjzÞ � RO;aðx; yjzÞ� �

!L Q z;aP ð�Þ; ð4:10Þ

where here the rate is c = 2/(r + 4) and where Qz;aP is another nonde-

generate limiting distribution.Note that the unit (x,y,z) of interest can be any point in P, even

if in practice we will be interested to estimate these ratios at theobserved data points (Xi,Yi,Zi).

Since the limiting distributions are unknown, the bootstrap isthe only available route to draw inference on these individual ra-tios. Here we can directly apply the subsampling procedure de-scribed in Simar and Wilson (2011a) to derive confidenceintervals for RO(x,yjz) (and for the partial correspondents). To savespace we refer simply to the algorithms described in Simar and Wil-son’s paper (Section 4.2), the adaptation being straightforward. Justto avoid misunderstandings, we sketch the subsampling algorithm:

[1] First, we compute from the sample Sn ¼ fðXi;Yi; ZiÞji ¼ 1; . . . ;ng the efficiency score kðx; yÞ and its conditionalversion kðx; yjzÞ . By doing so, we compute bn,z the optimalbandwidth for the conditional survival function at z (we dothis by using the Badin et al. (2010) approach). We computethe ratio bROðx; yjzÞ.

[2] For a given value of m < n, we repeat the next steps [2.1]–[2.2] L times, for ‘ = 1, . . . ,L, where L is large enough (say,L = 2000).[2.1] Draw a random sample S�m;‘ ¼ X�;‘i ;Y

�;‘i ; Z

�;‘i

� �j

�i ¼ 1; . . . ;mg without replacement from Sn.

[2.2] Compute the ratio bR�;‘O ðx; yjzÞ by the same techniques asin [1]. Note that here we have to rescale the corre-sponding bandwidth at the appropriate size. So we willuse the bandwidths bm,z = (n/m)1/(r+4)bn,z for computingthe conditional scores with the bootstrap sample S�m;‘.

5

Page 7: How to measure the impact of environmental factors in a nonparametric production model

824 L. Badin et al. / European Journal of Operational Research 223 (2012) 818–833

[3] From the collection of L values bR�;‘O ðx; yjzÞ with bROðx; yjzÞ,build the confidence interval obtained for this particularvalue of m.

[4] Select the appropriate value of the subsample size m, whichwill correspond to a value where the results show low vola-tility with respect to m (see Simar and Wilson, 2011a).

4.2. Second-stage regression of the conditional efficiency scores

To save space we present only the full-frontier case, where wewant to estimate l(z) and r(z) in model (3.14) by using basic toolsfrom the nonparametric econometrics literature. Our presentationis for continuous Z.6

Several flexible nonparametric estimators could be providedwhen working with the full-frontier efficiency scores. We know in-deed that, by definition, k(X,YjZ = z) P 1 with probability one. So,we could for instance estimate in a first step l(z), by local constantmethods (Pagan and Ullah, 1999) or local exponential smoothing(see Ziegelmann, 2002) on the values k(X,YjZ = z) � 1. The estima-tion of the variance function r2(z) is rather standard (see Fan andGijbels, 1996; Fan and Yao, 1998; Pagan and Ullah, 1999; and thereferences therein) and is obtained by regressing the squares ofthe residuals obtained from the first step, on Z. Here again, localconstant or local exponential methods can be used. Once lðzÞand r2ðzÞ are obtained, we can compute the residuals by applying(3.15).7

Whatever the selected nonparametric estimators, they sharetypically similar asymptotic properties with the same rate of con-vergence. As pointed out in Simar and Wilson (2007), the main sta-tistical issue in this second-stage regression comes from the factthat we have neither observations of k(Xi,YijZ = z) nor observationsk(Xi,YijZi) because the lambda’s are unknown. What we have is theset of the n estimators kðXi;YijZiÞ, obtained from the sample Sn. Sowe have a sample of n pairs Zi; bkðXi;YijZiÞ

� �, i = 1, . . . , n from which

we will estimate l(z) and r(z), by one of the methods describedearlier. For the regression, the resulting estimator will be writtenlnðzÞ. All these techniques involve smoothing (localizing) tech-niques requiring the selection of bandwidths hz, for the Z variables.Bandwidths hz with appropriate size (i.e. hz = c n�1/(r+4)) can be ob-tained by least-squares cross validation criterion (see e.g. Li andRacine, 2007 for details).

If the true but unavailable independent k(Xi,YijZi) would be usedas dependent variable in the regression, standard tools would pro-vide an estimator ~lnðzÞwith standard properties, i.e., as n ?1 andhz ? 0 with nhr

z !1ffiffiffiffiffiffiffiffinhr

z

q~lnðzÞ � lðzÞ � h2

z Bz� �

!L N ð0;VzÞ; ð4:11Þ

where the bias Bz and the variance Vz are bounded constantsdepending on the model and on the chosen estimator. Balancing be-tween bias and variance, the optimal bandwidth should indeed beof the order hz = cn�1/(r+4), providing the asymptotic result

n2=ðrþ4Þð~lnðzÞ � lðzÞÞ!L N ðBz;VzÞ: ð4:12Þ

When replacing k(Xi,YijZi) by kðXi;YijZiÞ, which are estimatorswith nj rate of convergence, we obtain the estimator lnðzÞ. Byusing the same arguments as in Kneip et al. (2012), it can be shownthat it is a consistent estimator of l(z) but at the rate of

6 For more details on how to handle discrete variables Z in a similar framework, seeBadin, 2011 and the references therein.

7 As mentioned earlier, in some applications, the additive model (3.14) could bemore appropriate for the logk(X, YjZ = z). In a particular application, it is important touse some standard checks to see if the hypothesis of the independence between e andZ, is more reasonable or not in the log scale (see the empirical illustrations that followin this paper).

convergence n4/((r+4)(p+q)), which becomes lower as soon asp + q > 2. Asymptotic behavior of the error ðlnðzÞ � lðzÞÞ shouldalso be available along the recent theoretical results developed inKneip et al. (2012), allowing us to develop bootstrap algorithmsand derive confidence intervals for l(z), but this is rather technicaland beyond the scope of this paper.

5. Numerical illustrations

5.1. Simulated examples

To illustrate how the procedure can work in practice, we firstintroduce some simulated examples, because there we know whatwe expect to find. We will use, as simulated scenario, an exampleinspired from Simar and Wilson (2011b) where we see clearly thetwo different ways an environmental factor can influence the pro-duction process. We analyze the three following different DGPs:

Y ¼ gðXÞe�U ð5:1ÞY� ¼ gðXÞe�UjZ�2j ð5:2ÞY�� ¼ gðXÞð1þ jZ � 2j=2Þ1=2 e�U ; ð5:3Þ

where g(X) = [1 � (X � 1)2]1/2 with X � U(0,1) and Z � U(0,4). FinallyU P 0 with U � N

þ 0;r2U

� �, and we choose for the illustration

r2U ¼ 0:05.

In DGP1 (5.1), Z has no effect on the production process (Z isindependent of (X,Y)). In DGP2 (5.2), we have the separability con-dition Wz �W, "z but Z influences the distribution of the ineffi-ciencies (higher probability of being inefficient when jZ � 2jincreases). In DGP3 (5.3), the effect of Z is only on the boundaryof the attainable (X,Y), violating the separability condition, the shift(increasing the level of the attainable frontier) is multiplicative andmore important when jZ � 2j increases.

5.1.1. Analysis of the ratios bROðx; yjzÞWe first investigate how the ratios bROðx; yjzÞ can inform us of

the potential shifts of the frontier due to the environmental factorZ. We look at bROðx; yjzÞ as a function of Z and of X. Fig. 1 displays asummary of the results for the case n = 200 (the case n = 100 gavesimilar plots with more sampling noise). Here, DGP1 correspondsto the top panels, then DGP2 is in the middle and DGP3 in the bot-tom panels. Since it is not easy to visualize three-dimensional pic-tures without rotating the pictures (as most up-to-date softwareallows on screen), we display in the right panels the two marginalviews from the X-side (marginal effect of X) and from the Z-side(marginal effect of Z). The results are as we expected from com-ments in Section 3.2, taking into account the fact that we use esti-mates with low rates of convergence (n4/((r+4)(p+q)) = n2/5) in place oftrue values. For the three DGPs, the clouds of points are flat fromthe perspective of X, because in the three cases, the effect of Z onthe efficient frontier, if any, is independent of X. For the first twoDGPs the separability condition is verified, the cloud is really flatwith respect to the two dimensions. For the DGP3, the U-shapewith respect to Z that appears in both figures is exactly what weexpected from (5.3). To conclude, the pictures of these ratios asfunctions of x and z are clearly informative. In our examples here,a marginal analysis of the effect of Z on the shifts of the efficientfrontiers would also provide meaningful interpretations.

The analysis of the partial ratios, bRO;aðx; yjzÞ, with a not far from1, would provide the same pictures (we do not have outliers here).However, it is interesting to look at these ratios for smaller valuesof a to detect potential effects of Z on the distribution of the inef-ficiencies. Fig. 2 does this, for the median value (a = 0.5). Hereagain the results confirm mostly what we expected (again, spuri-ous unexpected behaviors could come from the fact that we use

Page 8: How to measure the impact of environmental factors in a nonparametric production model

00.2

0.40.6

0.81

01

23

40

0.2

0.4

0.6

0.8

1

1.2

xz

RO

(x,y

|z)

0 0.2 0.4 0.6 0.8 10

0.5

1

Marginal Effect of X on the Ratios RO(x,y|z)

values of x

RO

(x,y

|z)

0 0.5 1 1.5 2 2.5 3 3.5 40

0.5

1

Marginal Effect of Z on the Ratios RO(x,y|z)

values of z

RO

(x,y

|z)

00.2

0.40.6

0.81

01

23

40

0.2

0.4

0.6

0.8

1

1.2

xz

RO

(x,y

|z)

0 0.2 0.4 0.6 0.8 10

0.5

1

Marginal Effect of X on the Ratios RO(x,y|z)

values of x

RO

(x,y

|z)

0 0.5 1 1.5 2 2.5 3 3.5 40

0.5

1

Marginal Effect of Z on the Ratios RO(x,y|z)

values of z

RO

(x,y

|z)

00.2

0.40.6

0.81

01

23

40

0.2

0.4

0.6

0.8

1

1.2

xz

RO

(x,y

|z)

0 0.2 0.4 0.6 0.8 10

0.5

1

Marginal Effect of X on the Ratios RO(x,y|z)

values of x

RO

(x,y

|z)

0 0.5 1 1.5 2 2.5 3 3.5 40

0.5

1

Marginal Effect of Z on the Ratios RO(x,y|z)

values of z

RO

(x,y

|z)

Fig. 1. Effect of X and Z on the ratios bROðx; yjzÞ. From top to bottom: DGP1, DGP2 and DGP3. Here n = 200 and the circles are the estimated ratios.

L. Badin et al. / European Journal of Operational Research 223 (2012) 818–833 825

estimates with low rates of convergence). For DGP1, top panel, thecloud of points is flat: we see no effect of Z on the efficiencies. Inthe case of DGP2, where we have the separability, we see somecurvature in the z direction and flat behavior in the x direction; in-deed, the marginal and conditional frontiers have the same supportbut the distribution of the inefficiencies is changing with the valueof z. Near the center (z = 2), the sampling variations of bRO;aðx; yjzÞare near 1, and for larger values of jz � 2j we have more valuessmaller than 1. For DGP3, we reproduce for the median the U-shaped effect of the shift of the frontier we have observed by look-ing to the bottom panel of Fig. 1.

5.1.2. Second-stage regressionFig. 3 summarises the results of the analysis of the second stage

regression of log kðx; yjzÞ on z. The analysis done with kðx; yjzÞ gavevery similar results. For each DGP, we show on the left the resultsof the nonparametric regression for l(z) and r(z), in the middle thehistogram of the resulting residuals that can be interpreted as

managerial efficiency and on the right, the clouds of n pointsðZi; eiÞ to check if any pattern is still apparent after whitening theeffect of Z on the conditional efficiencies.

Again, the results are as we expected. We do not see any effectof z for DGP1, for DGP2, we observe a visible U-shaped effect forboth l(z) and r(z), confirming that the distribution of the ineffi-ciencies varies with z. For DGP3, a small spurious effect (due tosampling uncertainties) shows, but it still maintains, roughly a sta-ble l(z) and a constant r(z), as it should (the distribution of theinefficiencies does not depend on z). Note that the histograms ofthe managerial efficiencies ei, recover in the three cases the shapeof the half-normal distribution that has been simulated for U. Thecorrelations between Ui and ei are quite high in each case: 0.94,0.87, 0.84 for DGP1, DGP2 and DGP3, respectively. So, it seemslegitimate and meaningful to use these residuals to rank the firmsaccording their managerial inefficiencies. The scatter plots be-tween ei and Zi do not show a particular structure; the correlationbetween the two variables are indeed very low: �0.05, 0.02, 0.01for DGP1, DGP2 and DGP3, respectively.

Page 9: How to measure the impact of environmental factors in a nonparametric production model

00.2

0.40.6

0.81

01

23

40

0.5

1

1.5

2

xz

RO

, α(x

,y|z

)

0 0.2 0.4 0.6 0.8 10

0.5

1

1.5

2

Marginal Effect of X on the Ratios RO,α(x,y|z)

values of x

RO

,α(x

,y|z

)

0 0.5 1 1.5 2 2.5 3 3.5 40

0.5

1

1.5

2

Marginal Effect of Z on the Ratios RO,α(x,y|z)

values of z

RO

, α(x

,y|z

)

00.2

0.40.6

0.81

01

23

40

0.5

1

1.5

2

xz

RO

,α(x

,y|z

)

0 0.2 0.4 0.6 0.8 10

0.5

1

1.5

2

Marginal Effect of X on the Ratios RO,α(x,y|z)

values of x

RO

,α(x

,y|z

)

0 0.5 1 1.5 2 2.5 3 3.5 40

0.5

1

1.5

2

Marginal Effect of Z on the Ratios RO,α(x,y|z)

values of z

RO

,α(x

,y|z

)

00.2

0.40.6

0.81

01

23

40

0.5

1

1.5

2

xz

RO

,α(x

,y|z

)

0 0.2 0.4 0.6 0.8 10

0.5

1

1.5

2

Marginal Effect of X on the Ratios RO,α(x,y|z)

values of x

RO

, α(x

,y|z

)

0 0.5 1 1.5 2 2.5 3 3.5 40

0.5

1

1.5

2

Marginal Effect of Z on the Ratios RO,α(x,y|z)

values of z

RO

,α(x

,y|z

)

Fig. 2. Effect of X and Z on the ratios bRO;aðx; yjzÞ, with a = 0.5. From top to bottom: DGP1, DGP2 and DGP3. Here n = 200 and the circles are the estimated ratios.

826 L. Badin et al. / European Journal of Operational Research 223 (2012) 818–833

5.2. Efficiency in the banking sector

Simar and Wilson (2007) include an empirical example basedon Aly et al. (1990) using data on 6.955 US commercial banks ob-served at the end of the fourth quarter, 2002.8 They run a truncatedregression on the input-oriented DEA estimates of efficiency in a sec-ond stage (as suggested in Aly et al. (1990)). Daraio et al. (2010) usedthe same data set to test the separability condition which was re-jected at any reasonable level, indicating that any two-stage proce-dure is meaningless for this dataset. This was a global test; we willproceed here to a local analysis and try to detect the size and direc-tion of the detected effect.

The original data set contains three inputs (purchased funds,core deposits and labor) and four outputs (consumer loans, busi-ness loans, real estate loans, and securities held) for banks. Alyet al. (1990) considered two continuous environmental factors,the size of the banks Z1, and a measure of the diversity of the

8 We would like to thank Paul W. Wilson who provided us this data set.

services proposed by the banks Z2 (see Aly et al. (1990), for details)and one binary variable indicating if the banks belong to a Metro-politan Statistical Area (MSA). We use, as in Simar and Wilson(2007), a measure of the size of the banks by the log of the total as-sets, rather than the total deposits as in Aly et al. (1990). For sim-plifying the presentation, we will illustrate our procedure with asubsample of 322 banks (also used in Simar and Wilson, 2007).

Some prior exploratory data analysis indicates that the three in-puts are highly correlated and the same is true for the four outputs.So, due to the dimensionality of the problem (three inputs, fouroutputs, and three environmental factors) with the limited sampleused here (322 units), we first reduce the dimension in the in-put � output space by using the methodology suggested in Daraioand Simar (2007a).

Since the radial measures are scale invariant, we divide each in-put and output by their mean (to be unit free) and replace the threescaled inputs by their best (non-centered) linear combination (weuse here a kind of non-centered PCA, as explained in details in Dar-aio and Simar, 2007a), and we check that we did not loose much

Page 10: How to measure the impact of environmental factors in a nonparametric production model

0 0.5 1 1.5 2 2.5 3 3.5 40

0.1

0.2

0.3

0.4

0.5

0.6

0.72nd−stage regression

Z values

Log

Con

ditio

nal E

ffici

enci

es

μ(z)σ(z)data points

−1.5 −1 −0.5 0 0.5 1 1.5 2 2.5 3 3.50

10

20

30

40

50

60

70Histogram of estimated managerial efficiencies

Values of εi

0 0.5 1 1.5 2 2.5 3 3.5 4−1.5

−1

−0.5

0

0.5

1

1.5

2

2.5

3

3.5

Values of Zi

Valu

es o

f εi

0 0.5 1 1.5 2 2.5 3 3.5 40

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.82nd−stage regression

Z values

Log

Con

ditio

nal E

ffici

enci

es

μ(z)σ(z)data points

−1.5 −1 −0.5 0 0.5 1 1.5 2 2.5 30

10

20

30

40

50

60Histogram of estimated managerial efficiencies

Values of εi

0 0.5 1 1.5 2 2.5 3 3.5 4−1.5

−1

−0.5

0

0.5

1

1.5

2

2.5

3

Values of Zi

Valu

es o

f εi

0 0.5 1 1.5 2 2.5 3 3.5 40

0.1

0.2

0.3

0.4

0.5

0.6

0.72nd−stage regression

Z values

Log

Con

ditio

nal E

ffici

enci

es

μ(z)σ(z)data points

−1.5 −1 −0.5 0 0.5 1 1.5 2 2.5 3 3.50

10

20

30

40

50

60Histogram of estimated managerial efficiencies

Values of εi

0 0.5 1 1.5 2 2.5 3 3.5 4−1.5

−1

−0.5

0

0.5

1

1.5

2

2.5

3

3.5

Values of Zi

Valu

es o

f εi

Fig. 3. Results of second-stage regression. From left to right: location and scale estimates of log bkðx; yjzÞ as a function of z, histograms of managerial efficiencies, scatter plot ofei against Zi. From top to bottom: DGP1, DGP2 and DGP3. Here n = 200 and the circles are the estimated ratios.

L. Badin et al. / European Journal of Operational Research 223 (2012) 818–833 827

information by doing so, and that the resulting univariate inputfactor is highly correlated with the three original inputs. We followthe same procedure with the four outputs. The results are

IF ¼ 0:5707X1 þ 0:5731X2 þ 0:5881X3;

OF ¼ 0:4851Y1 þ 0:4875Y2 þ 0:5095Y3 þ 0:5172Y4;

indicating that both the input and the output factor represent a kindof average of the scaled inputs and outputs, respectively (theweights are equal). We obtain the following correlationsqIF;Xj

¼ ð0:972; 0:971; 0:996Þ for j = 1, 2, 3 and IF explains 96% of to-tal inertia of the original data (X1,X2,X3). We obtain similar resultswhen reducing the dimension in the output space:qOF;Yj

¼ ð0:924; 0:938; 0:975; 0:990Þ for j = 1, . . . ,4, and OF explains92% of total inertia of the original data (Y1, . . . ,Y4). Hence, we canconclude that we do not loose much information by this dimensionreduction, and the factors IF and OF are good representatives of theinput and output activities of the banks.

Remember that with the full data set and with all the originalvariables, Daraio et al. (2010) rejected the null hypothesis of globalseparability. We will here illustrate in our simplified version of the

examples what we can learn by the methodology we proposed inthis paper.

We first investigate the effect of Z1 = SIZE on the productionprocess. It is clear that in this example Z1 is highly correlated withY (the linear correlation is 0.57, but the Spearman rank correlationis 0.97). Fig. 4 shows the ratios as function of Y and Z1. Here it is theinput orientation: bRIðXi;YijZiÞ and for the partial frontiersbRI;aðXi; YijZiÞwith a = 0.95, to see if some extreme data points couldhide some effect and with a = 0.5 to investigate the effect on themiddle of the distribution of the inefficiencies. Without being ableto rotate the three-dimensional figures on the left panels, we havean idea of what happens complementing the left picture with thetwo marginal views. It is not clear from the full-frontier ratios ifY has some effect on the frontier levels, but looking to the picturefor the extreme quantile 0.95, it is more clear. For Z1 it is also clearfor the three pictures that Z1 has a negative (unfavorable) effect onthe frontier levels. When a = 0.5, the effect is also visible, confirm-ing the effect of the shift of the frontier. This short descriptive anal-ysis also confirms that the separability condition for Z1 seemsunrealistic.

Fig. 5 shows the results of the analysis for the full-frontierconditional efficiencies as a function of Z1 (the analysis was done

Page 11: How to measure the impact of environmental factors in a nonparametric production model

00.1

0.20.3

0.40.5

10

12

14

1

1.5

2

2.5

3

yz

RI(x

,y|z

)

0 0.1 0.2 0.3 0.4 0.51

1.5

2

2.5

3

Marginal Effect of Y on the Ratios RI(x,y|z)

values of y

RI(x

,y|z

)

9 10 11 12 13 14 151

1.5

2

2.5

3

Marginal Effect of Z on the Ratios RI(x,y|z)

values of z

RI(x

,y|z

)

00.1

0.20.3

0.40.5

10

12

14

0

0.5

1

1.5

yz

RI,α

(x,y

|z)

0 0.1 0.2 0.3 0.4 0.50

0.5

1

1.5

Marginal Effect of Y on the Ratios RI,α(x,y|z)

values of y

RI,α

(x,y

|z)

9 10 11 12 13 14 150

0.5

1

1.5

Marginal Effect of Z on the Ratios RI,α(x,y|z)

values of z

RI,α

(x,y

|z)

00.1

0.20.3

0.40.5

10

12

14

0

0.5

1

1.5

yz

RI,α

(x,y

|z)

0 0.1 0.2 0.3 0.4 0.50

0.5

1

Marginal Effect of Y on the Ratios RI,α(x,y|z)

values of y

RI,α

(x,y

|z)

9 10 11 12 13 14 150

0.5

1

Marginal Effect of Z on the Ratios RI,α(x,y|z)

values of z

RI,α

(x,y

|z)

Fig. 4. Effect of Y and Z1 = SIZE on the ratios bRIðXi;YijZiÞ (top panel) and bRI;aðXi;YijZiÞ (middle panel a = 0.95 and bottom panel a = 0.5).

9 10 11 12 13 14 15

−1

−0.8

−0.6

−0.4

−0.2

0

0.2

2nd−stage regression

Z values

Log

Con

ditio

nal E

ffici

enci

es

μ(z)σ(z)data points

−3 −2 −1 0 1 20

10

20

30

40

50

60

70Histogram of estimated managerial efficiencies

Values of εi

9 10 11 12 13 14 15 16 17 18−3

−2.5

−2

−1.5

−1

−0.5

0

0.5

1

1.5

2

Values of Zi

Valu

es o

f εi

Fig. 5. Effect of Z1 = SIZE on conditional efficiencies log bkðx; yjzÞ, histograms of managerial efficiencies, scatter plot of ei against Z1.

828 L. Badin et al. / European Journal of Operational Research 223 (2012) 818–833

on the logs, but the picture in original units is very similar). Theregression line l(z) has a global shape not far from the horizontal

line, indicating that the effect of Z1 on the distribution of the effi-ciencies is rather low. This is confirmed by the rather constant

Page 12: How to measure the impact of environmental factors in a nonparametric production model

00.1

0.20.3

0.40.5

0.40.6

0.81

1.2

1

1.5

2

2.5

3

yz

RI(x

,y|z

)0 0.1 0.2 0.3 0.4 0.5

1

1.5

2

2.5

3

Marginal Effect of Y on the Ratios RI(x,y|z)

values of y

RI(x

,y|z

)

0.4 0.6 0.8 1 1.2 1.41

1.5

2

2.5

3Marginal Effect of Z on the Ratios RI(x,y|z)

values of z

RI(x

,y|z

)

00.1

0.20.3

0.40.5

0.40.6

0.81

1.2

0

0.5

1

1.5

yz

RI, α

(x,y

|z)

0 0.1 0.2 0.3 0.4 0.50

0.5

1

1.5

Marginal Effect of Y on the Ratios RI,α

(x,y|z)

values of y

RI,α

(x,y

|z)

0.4 0.6 0.8 1 1.2 1.40

0.5

1

1.5

Marginal Effect of Z on the Ratios RI,α(x,y|z)

values of z

RI,α

(x,y

|z)

00.1

0.20.3

0.40.5

0.40.6

0.81

1.2

0

0.5

1

1.5

yz

RI,α

(x,y

|z)

0 0.1 0.2 0.3 0.4 0.50

0.5

1

1.5

Marginal Effect of Y on the Ratios RI,α(x,y|z)

values of y

RI,α

(x,y

|z)

0.4 0.6 0.8 1 1.2 1.40

0.5

1

1.5

Marginal Effect of Z on the Ratios RI,α(x,y|z)

values of z

RI, α

(x,y

|z)

Fig. 6. Effect of Y and Z2 = DIVERSE on the ratios bRIðXi;YijZiÞ (top panel) and bRI;aðXi;YijZiÞ (middle panel a = 0.95 and bottom panel a = 0.5).

0.4 0.6 0.8 1 1.2 1.4

−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

2nd−stage regression

Z values

Log

Con

ditio

nal E

ffici

enci

es

μ(z)σ(z)data points

−3 −2 −1 0 1 20

10

20

30

40

50

60

70Histogram of estimated managerial efficiencies

Values of εi

0.2 0.4 0.6 0.8 1 1.2 1.4 1.6−3

−2.5

−2

−1.5

−1

−0.5

0

0.5

1

1.5

2

Values of Zi

Valu

es o

f εi

Fig. 7. Effect of Z2 = DIVERSE on conditional efficiencies log bkðx; yjzÞ, histograms of managerial efficiencies, scatter plot of ei against Z2.

L. Badin et al. / European Journal of Operational Research 223 (2012) 818–833 829

Page 13: How to measure the impact of environmental factors in a nonparametric production model

10

12

14

0.40.6

0.81

1.21.4−1

−0.8

−0.6

−0.4

−0.2

0

0.2

Z1Z2

μ(Z)

10

12

14

0.40.6

0.81

1.21.4−1

−0.8

−0.6

−0.4

−0.2

0

0.2

Z1Z2

σ(Z)

−3.5 −3 −2.5 −2 −1.5 −1 −0.5 0 0.5 10

50

100

150Histogram of estimated managerial efficiencies

Values of εi

Fig. 8. Effect of Z = (Z1,Z2) on conditional efficiencies log hðx; yjzÞ (left panel l(z), middle panel r(z)) and the histogram of estimated managerial efficiencies.

830 L. Badin et al. / European Journal of Operational Research 223 (2012) 818–833

shape of r(z). The distribution of managerial efficiencies has a rea-sonable shape (typically not far from a half-normal). The effect ofZ1 on the conditional efficiency scores has been nicely whitened:the Pearson linear correlation between Zi and ei is �0.009 andthe Spearman rank correlation is �0.008. So the ranking of thebanks according to ei is cleaned from the effect of the size variableZ1. Note that the resulting ranking is different from the ranking ob-tained by the marginal FDH scores kðXi;YiÞ, but since the effect ofZ1 is not a big effect, there remains some correlation (the correla-tion between the two rankings is 0.64).

We did the same univariate exercise to investigate the marginaleffect of the variable Z2 (DIVERSE: a measure of the diversity of theproducts of the Banks). Figs. 6 and 7 display the results. We sum-marize very shortly the conclusions and let the reader completethe analysis. The effect of Z2 seems quite small (see top panel ofFig. 6). Z2 is not responsible for the rejection of the separabilitycondition. However, we can see a small effect on the partial ratiosbRI;aðXi;YijZiÞ, indicating a small effect on the distribution of the effi-ciencies, but a favorable one: banks having more diversity seems tohave a distribution of their efficiency slightly more concentratednear the efficient frontier. This is confirmed by looking at Fig. 7,where l(z) is slightly increasing, combined with a slightly decreas-ing r(z). The effect of Z2 on the conditional efficiencies has beenwell removed, and the correlation between ei and Zi is 0.06 (theright panel of Fig. 7 reveals no clear remaining pattern).

The bivariate analysis would consist of using the location-scaleregression model with Z = (Z1,Z2). Pictures to see the joint effect ofZ on the frontier levels for fixed levels of the outputs are difficult todisplay (four dimensions), but we know that the assumption ofseparability was rejected in Daraio et al. (2010). The analysis ofthe conditional efficiency scores as a function of Z is similar towhat we did earlier for one dimension. Fig. 8 displays the resultsfor the surfaces l(z) and r(z). To summarize, we confirm typicallythe two marginal analysis we performed, and we do not see anyinteraction between Z1 and Z2 in the effect on the conditional effi-ciencies. The resulting managerial estimates have correlation�0.0461 and �0.0339 with Z1 and Z2, respectively, so most of theeffects of Z have been removed by our location-scale model. Theright panel of Fig. 8 shows the resulting histogram of these residu-als. The shape looks very similar to those obtained by the marginalanalysis completed earlier, but we have as expected more massnear the efficient frontier, because we explain the conditional effi-ciencies by two environmental factors here.

In the case of the discrete Z3 = MSA, the analysis could be per-formed separately on the two groups, or Z3 could be introducedin these models by using appropriate kernels (see, for details).The procedure would go along the same lines as the one illustratedabove. Of course, increasing the number of variables will give

estimators with less precision and the descriptive tools presentedearlier are limited to pictures in three dimensions.

6. Conclusions

This paper develops the conditional nonparametric methodol-ogy to investigate the role of environmental variables Z by intro-ducing these external factors in a non restrictive way.

The paper clarifies what can be learned by analyzing conditionalefficiency measures and proposes a general approach to measureand draw inference about the impact of these factors on the pro-duction process.

By using conditional efficiency measures we can indeed mea-sure the impact of external factors on the attainable set in the in-put–output space, and/or we can investigate the impact of theexternal factors on the distribution of inefficiency scores. We ex-tend existing methodological tools to explore these interrelation-ships, both from individual and global perspectives.

Further, we propose a two-stage type approach of the condi-tional efficiencies on the factors that do not suffer from the limita-tions of traditional two-stage approaches. We suggest a flexiblemodel to eliminate the location and the scale effect of Z on the effi-ciencies. The analysis of the obtained residuals provides a measureof efficiency whitened from the main effects of the environmentalfactors. This allows us to rank the firms according to their manage-rial efficiency, even when facing heterogeneous environmentalconditions.

The procedure is illustrated through simulated samples andwith a real data set on US commercial banks.

Future research should address the following issues: testing thepartial separability condition to simplify the analysis; testing theindependence between the error term and the environmental fac-tors in the second-stage regression; and, finally establishing theasymptotic properties of the second-stage regression. The lattertwo should be interrelated.

Acknowledgements

Financial support from the Romanian National Authority forScientific Research, CNCS-UEFISCDI, Project number PN-II-ID-PCE-2011-3-0893, from the Inter-university Attraction Pole, PhaseVI (No. P6/03) of the Belgian Government (Belgian Science Policy)and from the INRA-GREMAQ, Toulouse, France are gratefullyacknowledged. This work was completed during an academic visitin 2012 of L. Simar at the DIAG, Sapienza University of Rome,Italy.

Page 14: How to measure the impact of environmental factors in a nonparametric production model

L. Badin et al. / European Journal of Operational Research 223 (2012) 818–833 831

Appendix A. Effect of Z on the frontier levels

We illustrate the basic ideas for the input orientation in a sim-ple scenario where p = q = r = 1. In this case, we can describe the(input) efficient frontier and its conditional version by the func-tions /(y) and /(yjz). Suppose that when Y = y1, Z is favorable tothe production process (it acts like a free disposal input), thefrontier /(y1jz) is displayed in dashed line in the top panel ofFig. 9, we see also the marginal input-frontier /(y1). Suppose thatwhen Y = y2, Z is favorable until level za and then unfavorable(acting like an undesired output), the conditional frontier /(y2jz)is represented by the solid line in the figure. Finally, suppose thatwhen Y = y3, Z is unfavorable, the conditional frontier /(y3jz) isthen displayed as the dotted line in the figure. We see that forall cases, RI(x,yjz) P 1, but the shape of the ratios as a functionof z can be different according to the values of Y (see the bottompanels for the three different levels of Y). In the cases illustratedhere, the analysis of these ratios RI(x,yjz) as a function of z only,would be problematic, so in general, without additional assump-tions, it is better to carry the analysis for fixed levels of the out-puts Y.

Of course, the example illustrated in Fig. 9 is rather extreme,and in many situations, the interactions between Z and Y on thefrontier levels will be less complicated. In particular, if Y is inde-pendent of Z, or in a less restrictive way, under the assumption thatthe shape of the boundaries of P in the sections Y = y (in the (X,Z)space) would not change with the level y, the conditional frontiersin the top panel of Fig. 9 would be parallel, so that the ratiosRI(x,yjz) would have the same shape when considered as a functionof z for all values of Y. For instance, this would be the case if Zwould act as a free disposal input for all the values of Y (Panel Iin the bottom panels of Fig. 9). In the lines of Simar and Wilson

Fig. 9. Various scenarios for interpreting the effect of Z. Top panel, the minimal inpcorresponding ratios RI(x,yjz) as a function of z.

(2007), this corresponds to an assumption of partial separability,which was implicitly assumed and illustrated, in Daraio and Simar(2005, 2007a). In this case, the analysis of the effect of Z on the effi-cient frontier is largely simplified. Testing this partial separabilityassumption remains an open issue for future work, in the numer-ical illustrations provided in the paper we give some descriptivetools to investigate this issue.

Appendix B. Complementarity of full-frontier and partial-frontier measures

In Fig. 10 we illustrate the basic ideas of Section 3.3 for the out-put orientation, for the particular case of a univariate output, for afixed level of input x0, and for a fixed value z0. The figure displaysthe conditional distributions F(yjX 6 x0) and F(yjX 6 x0, Z = z0), forvarious scenarios, along with the upper boundary of their supportand their a-quantiles. Remember that in this univariate outputcase, RO(x0,y0jz0) = u(x0jz0)/u(x0), with a similar expression forRO,a(x0,y0jz0). In the left panels, where the separability conditionis verified (see panels II and III), the conditional and unconditionaldistributions of the inefficiencies are different, but they share thesame support; this results in ratios RO(x0,y0jz0) = 1 (we see indeedthat u(x0jz0) � u(x0)).

Second, the information carried by the ratios R0,a(x,yjz), whendefined relative to the partial frontiers, is multiple. Suppose thatWz = W and so RO(x,yjz) = 1 for all points (x,y,z) (left panels inFig. 10). Then, if the distribution of the inefficiencies is affectedby Z, the quantiles of SYjX,Z will be different from those of SYjX.Therefore, for all (x,y) 2Wz, the ratios RO,a(x,yjz) will be affected.Note that in this case (Wz = W), the changes can go in twodirections for the partial parameter: if the distribution of the

ut frontiers, in the coordinates (x,z) for different levels of Y. Bottom panels, the

Page 15: How to measure the impact of environmental factors in a nonparametric production model

Fig. 10. Various scenarios for F(yjX 6 x0) and F(yjX 6 x0, Z = z0). In the left panels the separability condition is verified at (x0,z0), while on the right panels, this condition isviolated. In all six panels, the dashed black line represents F(yjX 6 x0), with upper boundary of support u(x0), and the solid blue line is F(yjX 6 x0, Z = z0), with upper boundaryof support u(x0jz0). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

832 L. Badin et al. / European Journal of Operational Research 223 (2012) 818–833

inefficiency is more spread in the direction of less efficient behav-ior (as in panel II), we observe ua(x0jz0) < ua(x0) givingRO,a(x0,y0jz0) < 1. On the contrary, if z0 provides a favorable envi-ronment to efficient behavior of the firms (without affecting theupper boundary), the distribution of Y will be more concentratednear the efficient boundary when Z = z0 (as in panel III), we haveua(x0jz0) > ua(x0) giving RO,a(x0,y0jz0) > 1. This is, of course, lessprobable when a ? 1. That is the reason why the global test of sep-arability of Daraio et al. (2010) uses test statistics based only on thefull measures of efficiency and not on the partial efficiency scores,unless a is not far from one.

Third, if there is a shift on the frontier Wz �W, it is much moredifficult to interpret the ratios RO,a(x,yjz). It is clear that a shift of theboundary will be transferred to the partial frontier, at least for largevalues of a, near 1, but this effect can either be increased or com-pensated by a simultaneous change of the distribution of the ineffi-ciencies from the unconditional to the conditional. So, in the case ofa shift of the boundary (see the right panels of Fig. 10), we could ob-serve RO,a(x0,y0jz0) less, equal or greater than 1. We illustrate threecases in Fig. 10. We see that in panel IV, the shift of ua(x0jz0) withrespect to ua(x0) is the same as the shift of u(x0jz0) with respectto u(x0), giving here RO,a(x0,y0jz0) < RO(x0,y0jz0) < 1. In panel V, we

have more spread toward inefficiencies when conditioning on z0,the shift of the quantile of the conditional distribution is muchmore important so RO,a(x0,y0jz0) RO(x0,y0jz0) < 1. But we couldobserve, as in panel VI, a different behavior when for a given z0 itis more probable to reach the efficient frontier u(x0jz0) implyingthat we could obtain for some quantiles RO,a(x0,y0jz0) > RO(x0,y0jz0).So, even if RO(x0,y0jz0) < 1, we could have in extreme casesRO,a(x0,y0jz0) P 1 (as in panel VI of Fig. 10).

To summarize the second and third points if Wz = W, the ratiosRO,a(x,yjz) are useful to shed light on the local impact of Z on theshape of the distribution of the inefficiencies. But it does not allowus to detect, when considered alone, a local shift of the boundary ofthe support of (X,Y). Unless a ? 1, because in this case, the partialfrontier can serve as a robust estimator of the full frontier.

In any case, these partial measures bring useful complementaryinformation of the relative position of the quantiles of SYjX,Z withrespect to those of SYjX. It will therefore be useful to provide somedetailed analysis of the ratios ROðx; yjzÞ; RO;a1 ðx; yjzÞ; . . . ;RO;ak

ðx; yjzÞ,as described in Section 3.1, for a grid of selected values for a like,0.99, 0.95, 0.90 . . . ,0.50. The latter case a = 0.50 provides for in-stance, a picture on the impact of z on the median of the ineffi-ciency distribution.

Page 16: How to measure the impact of environmental factors in a nonparametric production model

L. Badin et al. / European Journal of Operational Research 223 (2012) 818–833 833

The same would be true for the order-m partial ratios RO,m(x,yjz)where the particular case m = 1 would allow us to investigate theeffect of Z on the average frontier. Here, the choice of large valuesof m would provide the same information as for the full-frontiercase.

References

Aly, H.Y., Grabowski, C.P.R.G., Rangan, N., 1990. Technical, scale, and allocativeefficiencies in U.S. banking: an empirical investigation. Review of Economicsand Statistics 72, 211–218.

Banker, R.D., Morey, R.C., 1986a. Efficiency analysis for exogenously fixed inputsand outputs. Operations Research 34 (4), 513–521.

Banker, R.D., Morey, R.C., 1986b. The use of categorical variables in dataenvelopment analysis. Management Science 32 (12), 1613–1627.

Banker, R.D., Natarajan, R., 2008. Evaluating contextual variables affectingproductivity using data envelopment analysis. Operations Research 56 (1),48–58.

Badin, L., Daraio, C., 2011. Explaining efficiency in nonparametric frontier models.Recent developments in statistical inference. In: Van Keilegom, I., Wilson, P.W.(Eds.), Exploring Research Frontiers in Contemporary Statistics andEconometrics. Springer-Verlag, Berlin Heidelberg.

Badin, L., Daraio, C., Simar, L., 2010. Optimal bandwidth selection for conditionalefficiency measures: a data-driven approach. European Journal of OperationalResearch 201 (2), 633–640.

Cazals, C., Florens, J.P., Simar, L., 2002. Nonparametric frontier estimation: a robustapproach. Journal of Econometrics 106, 1–25.

Chen, Y., Cook, W.D., Li, N., Zhu, J., 2009a. Additive efficiency decomposition in twostage DEA. European Journal of Operational Research 196 (3), 1170–1176.

Chen, Y., Liang, L., Zhu, J., 2009b. Equivalence in two-stage DEA approaches.European Journal of Operational Research 193, 600–604.

Daouia, A., Gijbels, I., 2011a. Robustness and inference in nonparametric partial-frontier modeling. Journal of Econometrics 161, 147–165.

Daouia, A., Gijbels, I., 2011b. Estimating frontier cost models using extremiles. In:Van Keilegom, I., Wilson, P.W. (Eds.), Exploring Research Frontiers inContemporary Statistics and Econometrics. Springer-Verlag, Berlin Heidelberg.

Daouia, A., Simar, L., 2007. Nonparametric efficiency analysis: a multivariateconditional quantile approach. Journal of Econometrics 140, 375–400.

Daraio, C., Simar, L., 2005. Introducing environmental variables in nonparametricfrontier models: a probabilistic approach. Journal of Productivity Analysis 24,93–121.

Daraio, C., Simar, L., 2006. A robust nonparametric approach to evaluate and explainthe performance of mutual funds. European Journal of Operational Research175 (1), 516–542.

Daraio, C., Simar, L., 2007a. Advanced Robust and Nonparametric Methods inEfficiency Analysis. Methodology and Applications. Springer, New York.

Daraio, C., Simar, L., 2007b. Conditional nonparametric Frontier models for convexand non convex technologies: a unifying approach. Journal of ProductivityAnalysis 28, 13–32.

Daraio, C., Simar, L., Wilson, P., 2010. Testing whether two-stage estimation ismeaningful in nonparametric models of production. Discussion Paper #1031.Institut de Statistique, Université Catholique de Louvain, Louvain-la-Neuve,Belgium.

Debreu, G., 1951. The coefficient of resource utilization. Econometrica 19 (3), 273–292.

Fan, J., Gijbels, I., 1996. Local Polynomial Modelling and Its Applications. Chapmanand Hall.

Fan, J., Yao, Q., 1998. Efficient estimation of conditional variance functions instochastic regression. Biometrika 85, 645–660.

Farrell, M.J., 1957. The measurement of the productive efficiency. Journal of theRoyal Statistical Society, Series A CXX (Part 3), 253–290.

Färe, R., Grosskopf, S., Lovell, C.A.K., 1985. The Measurement of Efficiency ofProduction. Kluwer-Nijhoff Publishing, Boston.

Färe, R., Grosskopf, S., Lovell, C.A.K., 1994. Production Frontiers. CambridgeUniversity Press, Cambridge.

Färe, R., Grosskopf, S., Lovell, C.A.K., Pasurka, C., 1989. Multilateral productivitycomparisons when some outputs are undesirable: a nonparametric approach.Review of Economics and Statistics 71 (1), 90–98.

Jeong, S.O., Park, B.U., Simar, L., 2010. Nonparametric conditional efficiencymeasures: asymptotic properties. Annals of Operations Research 173, 105–122.

Jeong, S.O., Simar, L., 2006. Linearly interpolated FDH efficiency score for nonconvexfrontiers. Journal of Multivariate Analysis 97, 2141–2161.

Kao, C., Hwang, S.N., 2008. Efficiency decomposition in two-stage data envelopmentanalysis: an application to non-life insurance companies in Taiwan. EuropeanJournal of Operational Research 185 (1), 418–429.

Kneip, A., Simar, L., Wilson, P.W., 2012. Central Limit Theorems for DEA Estimators:When Bias Can Kill Variance, manuscript.

Leibenstein, H., 1966. Allocative efficiency vs ‘‘X-efficiency’’. American EconomicReview 56, 392–415.

Leibenstein, H., 1979. A branch of economics is missing: micro–micro theory.Journal of Economic Literature XVII, 477–502.

Leibenstein, H., Maital, S., 1992. Empirical estimation and partitioning of X-inefficiency: a data-envelopment approach. AEA Papers and Proceedings 82 (2),428–433.

Li, Q., Racine, J., 2007. Nonparametric Econometrics: Theory and Practice. PrincetonUniversity Press.

Pagan, A., Ullah, A., 1999. Nonparametric Econometrics. Cambridge University Press.Park, B., Simar, L., Weiner, C., 2000. The FDH estimator for productivity efficiency

scores: asymptotic properties. Econometric Theory 16, 855–877.Shephard, R.W., 1970. Theory of Cost and Production Function. Princeton University

Press, Princeton, New-Jersey.Simar, L., 2003. Detecting outliers in frontiers models: a simple approach. Journal of

Productivity Analysis 20, 391–424.Simar, L., Wilson, P.W., 2007. Estimation and inference in two-stage, semi-

parametric models of production processes. Journal of Econometrics 136 (1),31–64.

Simar, L., Wilson, P.W., 2008. Statistical inference in nonparametric frontier models:recent developments and perspectives. In: Harold, Fried, KnoxLovell, C.A.,Schmidt, Shelton. (Eds.), The Measurement of Productive Efficiency, 2nd ed.Oxford University Press.

Simar, L., Wilson, P.W., 2011a. Inference by the m out of n bootstrap innonparametric frontier models. Journal of Productivity Analysis 36, 33–53.

Simar, L., Wilson, P.W., 2011b. Two-stage DEA: Caveat Emptor. Journal ofProductivity Analysis 36, 205–218.

Zha, Y., Liang, L., 2010. Two-stage cooperation model with input freely distributedamong the stages. European Journal of Operational Research 205 (2), 332–338.

Ziegelmann, F.A., 2002. nonparametric estimation of volatility functions: the localexponential estimator. Econometric Theory 18 (4), 985–991.