169
Data Envelopment Analysis models for a mixture of non-ratio and ratio variables written by Sanaz Sigaroudi A report submitted in conformity with the requirements for the degree of Doctor of Philosophy Graduate Department of Mechanical and Industrial Engineering University of Toronto Copyright c 2016 by Sanaz Sigaroudi

Data Envelopment Analysis models for a mixture of non ... · Data Envelopment Analysis models for a mixture of non-ratio and ratio variables Sanaz Sigaroudi Doctor of Philosophy Graduate

Embed Size (px)

Citation preview

Data Envelopment Analysis models for a mixture ofnon-ratio and ratio variables

written by

Sanaz Sigaroudi

A report submitted in conformity with the requirementsfor the degree of Doctor of Philosophy

Graduate Department of Mechanical and Industrial EngineeringUniversity of Toronto

Copyright c© 2016 by Sanaz Sigaroudi

Abstract

Data Envelopment Analysis models for a mixture of non-ratio and ratio variables

Sanaz Sigaroudi

Doctor of Philosophy

Graduate Department of Mechanical and Industrial Engineering

University of Toronto

2016

Performance comparison is a delicate business, even among organizations of the same

kind. The simplest of all is usually the ratio of a single output to a single input. The

problem lies in the fact that one aspect of the business could hardly represent the whole

picture and the landscape the business is operating in. Businesses have complex struc-

tures and offer variety of products so it is only fair to take all into consideration to judge

their performance against others in an industry. Data Envelopment Analysis (DEA) is

one method suitable when there are multiple inputs and outputs to be considered. It is

a non-parametric method conceptualized by Farrell in 1957. However, it was not untill

20 years later, that Charnes, Cooper and Rhodes brought this concept into practice by

finding a way to realize this idea and make it work. The breakthrough came from the

fact that under certain assumptions Farrell’s idea could be formulated as a linear math-

ematical program (LP) which could be solved using the simplex and similar methods.

One limitation of the existing DEA models is their inability to work with ratio variables

because the linear combination of DMUs do not generally translate to linear combina-

tion of inputs and outputs in the ratio form. In this work, our contribution to the field

includes extending Farrell’s idea to include ratio inputs and outputs and operationalizing

four models under variable returns to scale assumption. Three non-oriented models are

formulated and linearized and one non-linear model is solved using a heuristic.

ii

Acknowledgements

Thanks God for the greatest gift of being, for the opportunities and the wonderful people I

have come across in life and work with. Among those people, first and foremost, I would

like to express my sincere gratitude to my supervisor, Professor Joseph C. Paradi for

supporting me with his knowledge, patience, and sincerity. This PhD journey has had

many dimensions beyond the academic aspect and it has been a life experience which

made it worthwhile. He has been so kind to let me follow my love and life outside the

country, and enjoy my time as a new mom. Professor Paradi’s devotion to the wellbeing

of his students is exceptional and exemplary.

I am grateful to my committee members Professor Y. Lawryshyn, Professor R. Kwon

for providing me with constructive comments and insightful feedback, as well as Professor

C. Lee, and in particular Professor E. Thanassoulis for serving on my defense committee.

Professor Thanassoulis comments and feedback have greatly helped us to improve the

work.

I would also like to thank the staff at the MIE graduate office, mostly Brenda Fung

for helping me out through the administrative parts of the process. I would also like

to acknowledge the Department of Mechanical and Industrial Engineering and Rotman

Business School, the Graduate Management Consulting Association and Environmental

management Committee, who gave me the opportunity to get involved and enhance my

personal and professional development beyond standard education. I would also like to

thank my friends at the Centre for Management of Technology and Entrepreneurship,

present and alumni for their friendship and help.

My life in Canada has been a rich experience due to the friends I have made, the ones

I feel they have been with me all through my life. I have to specially thank my relatives

Solmaz, Makhmal and Anoosh who have offered me their home, care and love during my

frequent visits to Toronto.

I also like to thank Professor R. Thorpe who trusted me and gave me a place and

iii

the opportunity to work on my research in the Leeds University Business School, where

my life took me. I also like to thank Professor K. Pandza for giving me the flexibility to

finish my PhD while working.

I wish to extend my personal thanks to my family. To my parents for their uncondi-

tional love, my brother and sister-in-law, for being there for me. I am also blessed with

loving and caring parents-in-law who keep me in their good prayers. My sisters-in-law

and their families have been always there to cheer for me and support me. Also a per-

sonal tribute to my best friend’s mom, who is not with us anymore, but always believed

in me and encouraged me to get my PhD, I hope she is watching this from heavens.

Most importantly, thanks to my husband, Mohsen, for all he has been for me, for all

he means to me, and all he will be. For his understanding, love and encouragement along

this long process. Thanks to my beautiful daughter, Nika, and soon to come son, Iliya for

bringing hope, happiness and joy to my life. Thanks for your patience and cooperation,

enduring my absence and long working days and nights. I am truly blessed.

iv

Contents

1 Introduction 1

1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.3 Thesis Objectives and Contributions . . . . . . . . . . . . . . . . . . . . 6

2 Data Envelopment Analysis: Theory, Assumptions and Realization

Techniques 8

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.2 DEA Basic assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.2.1 Production Possibility Set . . . . . . . . . . . . . . . . . . . . . . 10

2.2.2 Frontier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.2.3 Efficiency Definition . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.2.4 Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.3 DEA basic models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.3.1 CCR Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.3.2 BCC Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.3.3 Additive model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.4 Deterministic Frontier Estimators . . . . . . . . . . . . . . . . . . . . . . 19

2.4.1 Free Disposal Hull Frontier . . . . . . . . . . . . . . . . . . . . . . 20

2.4.2 Variable Returns to Scale Frontiers . . . . . . . . . . . . . . . . . 20

v

2.4.3 Constant Returns to Scale Frontier . . . . . . . . . . . . . . . . . 21

2.5 Probabilistic Frontier Estimators . . . . . . . . . . . . . . . . . . . . . . 21

2.5.1 Partial m Frontier . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.5.2 Quantile Frontier . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2.5.3 Practical Frontiers . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.6 Linkage between Data Envelopment Analysis and Ratio Analysis . . . . . 26

2.6.1 Comparing DEA and RA . . . . . . . . . . . . . . . . . . . . . . . 26

2.6.2 Combining DEA and RA . . . . . . . . . . . . . . . . . . . . . . . 27

3 Literature review of non-oriented models 30

3.1 Russell Graph Efficiency Model . . . . . . . . . . . . . . . . . . . . . . . 31

3.2 Refined Russell Graph Efficiency Model . . . . . . . . . . . . . . . . . . . 34

3.3 Multiplicative Model (log measure) . . . . . . . . . . . . . . . . . . . . . 36

3.4 Invariant Multiplicative Model . . . . . . . . . . . . . . . . . . . . . . . . 37

3.5 Pareto efficiency test model (Additive) . . . . . . . . . . . . . . . . . . . 40

3.6 Extended Additive model . . . . . . . . . . . . . . . . . . . . . . . . . . 43

3.7 Constant Weighted Additive Model . . . . . . . . . . . . . . . . . . . . . 45

3.8 Normalized Weighted Additive Model . . . . . . . . . . . . . . . . . . . . 46

3.9 Global Efficiency Measure (GEM) . . . . . . . . . . . . . . . . . . . . . . 47

3.10 Enhanced Russell Graph Efficiency Measure (enhanced GEM) . . . . . . 48

3.11 Range Adjusted Model (RAM) . . . . . . . . . . . . . . . . . . . . . . . 51

3.12 BAM: a bounded adjusted measure . . . . . . . . . . . . . . . . . . . . . 52

3.13 Slack-based Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

3.14 Directional slack-based measure and distance function . . . . . . . . . . . 55

3.15 Graph Hyperbolic measure of efficiency . . . . . . . . . . . . . . . . . . . 57

3.16 Benefit function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

3.17 Range Directional Model and Inverse Range Directional Model . . . . . . 59

3.18 Modified Slack-based Measure . . . . . . . . . . . . . . . . . . . . . . . . 62

vi

3.19 Directional distance functions and slack-based measures of efficiency . . . 64

3.20 Universal model for ranking . . . . . . . . . . . . . . . . . . . . . . . . . 65

3.21 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

4 Literature review of approximation models 67

4.1 Bootstrapping and DEA . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

4.2 Sampling techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

5 Methodology:Proposed Non-oriented Model 75

5.1 Required adjustments to the basics of DEA . . . . . . . . . . . . . . . . . 75

5.1.1 Defining PPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

5.1.2 Disposability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

5.1.3 Convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

5.1.4 Identifying the efficient units . . . . . . . . . . . . . . . . . . . . . 78

5.1.5 Calculating the relative efficiency score . . . . . . . . . . . . . . . 78

5.2 Building the right measure of Efficiency . . . . . . . . . . . . . . . . . . 80

5.2.1 Proposed non-oriented model . . . . . . . . . . . . . . . . . . . . 81

5.2.2 Model in the making . . . . . . . . . . . . . . . . . . . . . . . . . 87

5.2.3 Making sense of the inefficiency score . . . . . . . . . . . . . . . . 97

6 Methodology: Approximating the Frontier in BBC Model 99

6.1 Partial Improvement: Approximation methods . . . . . . . . . . . . . . . 101

6.1.1 How to generate PPS progressively . . . . . . . . . . . . . . . . . 102

6.1.2 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

6.1.3 Pseudo Monte Carlo method . . . . . . . . . . . . . . . . . . . . . 106

6.1.4 Keep or discard, an LP feasibility problem . . . . . . . . . . . . . 109

6.1.5 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

6.2 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

vii

7 Realization, Case Study and Results 119

7.1 Realization of the non-oriented model using MATLAB . . . . . . . . . . 120

7.2 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

7.2.1 Bank branch data: choice of model, inputs and outputs . . . . . . 122

7.2.2 Comparing the proposed model against traditional additive model 124

7.3 Case study, nonlinear BCC Model: approximation method . . . . . . . . 127

8 Recommendations and future work 131

8.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

8.2 Discussion of the Results: Proposed model . . . . . . . . . . . . . . . . . 132

8.2.1 Efficiency Scores . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

8.2.2 Direction of improvement . . . . . . . . . . . . . . . . . . . . . . 133

8.3 Discussion of the results: approximation method . . . . . . . . . . . . . . 134

8.4 Recommendation, limitations and future directions . . . . . . . . . . . . 135

References 143

viii

List of Tables

1.1 Lasik Equipment Information . . . . . . . . . . . . . . . . . . . . . . . . 5

1.2 Comprehensive Lasik Equipment Information . . . . . . . . . . . . . . . 5

5.1 Non-oriented models and their properties . . . . . . . . . . . . . . . . . . 83

5.2 Summary of models and desired properties . . . . . . . . . . . . . . . . . 85

7.1 Input and output variables, rev=revenue and res=resources . . . . . . . . 124

7.2 Missed potential on savings at input side . . . . . . . . . . . . . . . . . . 126

7.3 Missed opportunity for higher return on revenue . . . . . . . . . . . . . . 126

8.1 Further savings on inputs (million $) . . . . . . . . . . . . . . . . . . . . 134

ix

List of Figures

1.1 Difference between the facets generated, based on correct PPS estimator

(black) and conventional DEA (red) with ratio variables . . . . . . . . . 6

2.1 DEA basic models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.2 Shapes of conventional frontiers: FDH, BCC, and CCR . . . . . . . . . . 21

5.1 The blue print to construct a non-oriented DEA model . . . . . . . . . . 86

6.1 Average non-zero weighs of size p vs Resolution . . . . . . . . . . . . . . 103

6.2 Sparse Matrix: Average number of non-zero weights vs Resolution . . . . 105

6.3 Number of iterations grows exponentially with smaller resolution when the

number of DMUs increases. . . . . . . . . . . . . . . . . . . . . . . . . . 107

6.4 for same completeness ratio, larger sample size wins . . . . . . . . . . . . 113

6.5 Sample size affect is small if the number of hypothetical DMUs generated

stays the same . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

6.6 Increasing the number of unobserved DMUs, Sample Size 5 . . . . . . . . 115

6.7 The same number of unobserved DMUs but different constructs, Sample

Size 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

7.1 Comparing efficiency scores against traditional model . . . . . . . . . . . 125

7.2 drop/raise in the efficiency score of branches after adding unobserved

DMUs generated by the approximation method . . . . . . . . . . . . . . 128

x

7.3 Efficiency score of the unobserved DMUs generated by the approximation

method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

xi

Chapter 1

Introduction

Performance comparison is a delicate business, even among organizations of the same

kind. The simplest of all is usually the ratio of a single output to a single input. The

problem lies in the fact that one aspect of the business could hardly represent the whole

picture and the landscape the business is operating in. Businesses have complex struc-

tures and offer variety of products so it is only fair to take all into consideration to

judge their performance against others in the industry. Data Envelopment Analysis

(DEA) is one method suitable when there are multiple inputs and outputs to be con-

sidered. It is a non-parametric method conceptualized by Farrel in 1957 [Farr 57]. One

limitation of the existing DEA models is their inability to work with ratio variables

[Holl 03, Emro 09, Siga 09]. In this work, our contribution to the field includes extend-

ing Farrel’s idea to include ratio inputs and outputs and operationalizing two models:

non-oriented additive model and a variable returns to scale (VRS) model with ratio

variables at the side of orientation. For the latter we operationalized an existing concept.

This chapter defines the problem we aim to solve and provide some background infor-

mation around it. This chapter is structured as follows: first we provide some background

information then define the problem we aim to solve and, at the end, the thesis objectives

and our contributions to the field are listed.

1

Chapter 1. Introduction 2

1.1 Background

Ratio analysis(RA) is an easy-to-understand and straightforward method to measure

relative efficiency on a single aspect. When we talk about the term “efficiency”, the

ratio of a single output to a single input, such as return on assets, may come to mind.

Relative efficiency then is defined by dividing the aforesaid ratio by the corresponding

“best performer’s” efficiency. Best performer is the one unit with maximum or minimum

ratio value depending on what is desirable in the problem. Ratio variables, simple and

straightforward, mask some of the information by nature. Looking on only one ratio

might be misleading, two stocks with the same return on equity might be very different

in the amount of equity they hold, their profit margin, the amount they can borrow and

their future earnings. Despite its limitations ratio analysis has been thus far the preferred

method in industry [Kriv 08].

Data Envelopment Analysis (DEA) is a method to measure overall relative perfor-

mance of every unit in a group with multiple inputs and outputs; units are usually referred

to as Decision Making Units or, in short, DMUs. The definition of performance is not

unique and may depend on the certain issues that need to be addressed in an industry

or business. DMUs are characterized by their input and output variables and can, for

instance, represent manufacturing sites, bank branches, hospitals, schools or even human

beings. What makes this method so distinctive is its holistic approach to consider all

the inputs and outputs simultaneously, in contrast with other methods such as indexing

or ratio analysis. In 1957 Farrell [Farr 57] produced an activity analysis approach (the

methodology behind DEA) to correct what he believed were deficiencies in commonly

used index number approaches to productivity (and similar) measurements. However,

it was not untill 20 years later, that Charnes, Cooper and Rhodes brought this concept

into practice by finding a way to realize this idea and make it work [Char 78]. The

breakthrough came from the fact that under certain assumptions Farrell’s idea could be

formulated as a linear mathematical program (LP) which could be solved using the sim-

Chapter 1. Introduction 3

plex and similar methods. Despite DEA’s popularity among academics, it is not widely

used as a practical tool for performance assessment in industry. It seems that the major

hurdle in the DEA deployment is its different language in communicating the results to

managers. It is expected that the language that involves RA in explaining DEA results

would make it more understandable to management and, as a result, would be more

appealing to the industry. This new language may require the use of desired ratios as

the inputs and/or outputs of the DMUs.

Over the years, DEA gained acceptance and became popular as an efficiency mea-

surement tool. Scientists and a very few practitioners from different fields started to use

it. Some attempted to combine RA and DEA, overlooking the assumptions that made

this idea a computational reality in the first place. One of these common mistakes is

to feed the ratio variables in either inputs or outputs to the original Charnes, Cooper

and Rhodes (CCR) model, or other existing models in the DEA literature, which would

inevitably result in distorted outcomes. For example imagine three job-shops, for which

the headquarters have provided training classes for operating a new machine. Job-shop

A has 4 staff members on the machine and after training they were able to generate 8

products (productivity=2) with the highest quality 5. Job-shop B has 5 staff and they

have produced 15 units (productivity=3) of quality 4. Job-shop C with 6 staff produced

24 units (productivity=4) with quality of 3. If the productivity and quality are the

two output metrics that job-shops are evaluated against, and the input (for example

equipment) be the same for all, the traditional DEA will report all the job-shops equally

efficient. However note that the job-shop B is not as efficient as the average of job-shops

A and C combined with 5 staff who produce 16 units (1 more than 15) with the same

quality ratings as job-shop B. A preliminary study on finding such optimal solutions was

done in [Siga 09]. This doctoral thesis stems from that initial work and goes beyond that

to form a new branch in the DEA and ratios.

The problem with ratio variables in the DEA has been raised a few times by scholars.

Chapter 1. Introduction 4

In 2001, Dyson et al. [RGDy 01] listed several shortcomings of the DEA approach in use

and among them was the problem of mixing index, ratio data with volumes. In 2003,

Hollingsworth et al. [Holl 03], pointed out the problem of using ratios as inputs or outputs

in one DEA model (CCR). Later on, in 2008, Emrouznejad et al. [Emro 09] highlighted

the convexity axiom violation when using ratios in inputs or outputs in another DEA

model (BCC) and proposed a modified model when ratios may be present. The model

is only applicable to DEA models with specific orientation (input or output) where the

ratio variables do not exist on the orientation side. Moreover, the presented model was

conceptual rather than computational since the existing commercial software packages

did not handle ratios. In 2009, in our work leading to this thesis [Siga 09], we augmented

the model in [Emro 09] by adding a second phase to it and made that computationally

feasible and illustrated it with MATLAB coded example. We also proposed an additive

model, which did not have either the limitations of the required orientation or the one-

side only requirement of ratio variables. Our model, in its original form, did not however

provide a comprehensive measure of efficiency. It’s score was not bounded by unity and

it was not units invariant. Hence, it was highly sensitive to measurement errors. Our

goal in this thesis is to develop a model based on the Farrell’s idea that can be applied to

DMUs with ratio variables and, even more importantly, make that idea a computational

reality. We study the linkage between the RA and DEA in Chapter 2.

1.2 Problem Statement

Including ratio variables “as is” in the well-known DEA models, may lead to incorrect

results. One of the fundamental assumptions in DEA is that the production possibility

set (PPS), which consists of all feasible DMUs (observed and unobserved), could be con-

structed by a convex combination of the observed DMUs. This assumption is jeopardized

when ratio variables are involved and, as a result, the true best practice is missed. The

Chapter 1. Introduction 5

Table 1.1: Lasik Equipment Information

Per season Branch A Branch B Branch C

Number of Returning customers 30 24 96

Technical Staff hours 600 300 800

Number of units sold 130 144 10

Sales Staff hours 650 1200 200

Commercial Expenses 100 70 110

Table 1.2: Comprehensive Lasik Equipment Information

output indicator output indicator input indicator

Customer Satisfaction : Revenue generation: Cost

Number of returning customersTechnical Staff hours

Number of units soldSales Staff hours

Branch A 5% 20% 100

Branch B 8% 12% 70

Branch C 12% 5% 110

problem was extensively studied in [Siga 09] and here, I borrow an example from there

to explain the problem.

A company active in selling laser equipment has received the data in Table 1.1 regard-

ing its branches, which has then to be rearranged in the managerial form of choice for

better communication. In practice, management prefers to see the information in ratio

form, as shown in Table 1.2. Here, we show in Figure 1.1 how the existing DEA mod-

els, without consideration for ratio variables, results in an incorrect frontier and might

result in missing some of the potential for improvement. In this thesis, we create models

that can correctly use ratios, hence, define the right, feasible production units and the

benchmark. Here, we need to comment on what Cook et al. has recently published

[Cook 14], they pointed out that not every use of ratios in a model imposes a potential

problem and it is too restrictive to reach the conclusion that two forms of data (ratio

Chapter 1. Introduction 6

Figure 1.1: Difference between the facets generated, based on correct PPS estimator

(black) and conventional DEA (red) with ratio variables

and normal) cannot coexist in a model. It is true that the outcome depends on the ratio

and how it is generated. For example when inputs and outputs are created using the

same denominator (for instance, all inputs have been divided by the largest input), an

easy transformation like multiplication by a constant would make ratios into a normal

variable. In such cases well-known DEA models can still be used, with no adjustments.

The general advice is: whenever possible, try to replace a ratio variable with a proxy

measure (e.g. instead of a poverty index, use the number of people seeking jobs or are

on benefits) but when that is not an option, we have a solution. Here, we focus on the

general case of using ratios where such transformations are not available.

1.3 Thesis Objectives and Contributions

The major goal in this thesis is to develop a mechanism for correctly incorporating ratio

variables into DEA. We develop a robust non-oriented model, which is units and trans-

lation invariant. It provides a comprehensive measure of efficiency bounded by zero and

one, which is easy to explain to management. We also propose a method to overcome

nonlinearity for a radially oriented model (BCC). We then convert this conceptual model

into a computational reality by transforming it into an LP. We use approximation meth-

Chapter 1. Introduction 7

ods to develop a semi Monte Carlo mechanism for solving the nonlinear cases. On this

journey, we had made a catalogue of all developed non-oriented models as well as a pro-

cedure to create a non-oriented model. For our purposes we also make a list of desired

properties for a DEA model and compare existing models with ours, based on these. Our

models are tested on a case study to create a visual showcase and demonstrate the tan-

gible benefits. We believe the techniques presented are helpful for researchers trying to

develop DEA models for specific industries which are traditionally more inclined towards

using ratio variables.

This thesis is structured as follows:

• In Chapter 2, we cover the DEA concept, assumptions and realization methods.

• Chapters 3 and 4 collectively provide the literature review required for this work.

Chapter 3 is dedicated to the non-oriented DEA models. Chapter 4 presents infor-

mation on the approximation methods in connection with the DEA.

• Chapters 5 and 6 outline the proposed methodologies in this work. Chapter 5 deals

with creating and realizing a new non-oriented DEA model, which supports the

use of ratios. Chapter 6 outlines the methodology of the approximation method

proposed for the special case of BCC with ratio variables.

• In Chapter 7, models are compared on a small but real set of data from 132 branches

of a major Canadian bank.

• Chapter 8 summarizes the contributions and comments on the future prospects of

this work.

Chapter 2

Data Envelopment Analysis:

Theory, Assumptions and

Realization Techniques

2.1 Introduction

In his seminal econometric work [Farr 57], Farrell proposed an activity analysis approach

to correct what he believed were deficiencies in the commonly used index number ap-

proaches to productivity (and alike) measurements. His main concern was to generate an

overall measure of efficiency that accounts for the measurements of multiple inputs and

outputs. The concept was materialized over twenty years later. In 1978, Charnes, Cooper

and Rhodes (CCR) [Char 78] generalized Farrell’s work and formulated it in a mathe-

matical form. Charnes et al. [Char 78] described DEA as a “mathematical programming

model applied to observational data to provide a new way of obtaining empirical estimates

of relationships — such as production functions and/or efficient production possibility

surfaces — that are the cornerstones of modern economics” [Coop 04].

DEA is a “data oriented” approach for evaluating the performance of a set of peer

8

Chapter 2. DEA: Theory, Assumptions and Realization Techniques 9

entities called Decision Making Units (DMUs), which provides a single efficiency score

while simultaneously considering multiple inputs and multiple outputs. It is essential

that all DMUs have the same operational and cultural environment, otherwise they are

not comparable, e.g. we cannot compare bank branches with grocery stores. Because

it requires very few assumptions, DEA has opened up possibilities for use in cases that

have been resistant to other approaches because of the complex (often unknown) nature

of the relationships between the multiple inputs and multiple outputs involved in the

operation of the DMUs.

Formally, DEA is a methodology directed to frontiers rather than central tendencies.

Instead of trying to fit a regression plane through the data, as in statistical regression, for

example, one “floats” a piecewise linear surface to rest on top of the observations. Because

of this perspective, DEA proves to be particularly adept at uncovering relationships that

remain hidden in other methodologies [Coop 04].

Researchers in a number of fields have recognized that DEA is an excellent method-

ology for modeling operational processes. Its empirical orientation and minimization of

a-priori assumptions have resulted in its use in a number of studies involving efficient

frontier estimation in the nonprofit sector, in the regulated sector, and in the private

sector. DEA encompasses a variety of applications in evaluating the performances of

different kinds of entities such as hospitals, universities, cities, courts, business firms,

and banks, among others. According to the latest bibliography available [Emro 08], over

4000 papers were published on DEA by 2007. Our recent search using Google Scholar

showed that the number of publications between 2008 and 2015 with “Data Envelopment

Analysis” in the title is above 3000. In total, the subject has generated enough interest

among scholars to write more than 7000 papers in peer reviewed journals. Such a rapid

growth and widespread acceptance of the methodology of DEA are testimonies to its

strengths and perceived applicability by academics.

Chapter 2. DEA: Theory, Assumptions and Realization Techniques 10

2.2 DEA Basic assumptions

We have talked about the history of DEA, the motivation behind it and the flexibility

it offers compared to or used in conjunction with statistical methods. Here, we review

the principles and assumptions around various DEA models that we may refer to in this

work.

2.2.1 Production Possibility Set

In the productivity analysis, or efficiency measurement in general, when the DMUs

consume s different inputs to produce m different outputs, the production possibil-

ity set is the collection of all feasible DMUs that are capable of producing output

Y = (y1, y2, ..., ym) by consuming input X = (x1, x2, ..., xs). The PPS is defined as

the set:

Ψ =

(X, Y ) ∈ Rm+s‖X can produce Y

(2.1)

As mentioned in section 2.1, DEA is very data oriented. This means that we build the

production possibility set, based on observed data points and some assumptions, which

in some aspects, relates to our model. We briefly introduce some of the assumptions used

but leave the detailed evaluation and the choice of the appropriate model for the next

chapter.

Free disposability axiom: A fundamental assumption to form the PPS out of the

available data is “disposability”. If X can produce Y so does any X‘ ≥ X and if Y

could be produced by X so could be any Y ′ ≤ Y . Formally, each observed set of data

X = (x1, ..., xm), Y = (y1, ..., ys) brings along part of the unobserved piece of the PPS

which is defined as:

Ψ′ ⊆ Ψ =

(X ′, Y ′) ∈ Rm+s|X ′ ≥ X and Y ′ ≤ Y

This is like saying, if DMUi could be realized, then any DMU that is doing worse is

Chapter 2. DEA: Theory, Assumptions and Realization Techniques 11

feasible, too. This assumption leads to the Free Disposal Hull (FDH) model [Depr 84],

which shares its PPS with many of the other models.

Convexity: Any convex linear combination of realized DMUs is feasible. In other words,

if two DMUs are in the PPS, so is the line connecting them (or any linear combination

of them). More generally, this holds for the linear combination of n DMUs defined

by: DMUcomposite = ∑n

i=1 λi · DMUi|∑n

i=1 λi = 1. This assumption leads to the BCC

model, a variable returns to scale model which will be explained later in 2.3.2.

Ray Unboundedness: Scaling up or down of any realized DMU generates a new feasible

DMU. ∀ DMUi ∈ PPS and γ ≥ 0, γ · DMUi ∈ PPS. This assumption, added to the

convexity assumption, is the basis of CCR, a constant returns to scale model which we

will visit later in 2.3.1.

2.2.2 Frontier

Once we generate the desired PPS, set Ψ in (2.2.3), then it is time to define the potential

benchmarks or the frontier. The frontier is composed of one or more estimated lines or

surfaces (depending on dimensions) enveloping only but no less than the whole PPS. It is

the line/hyperplane that separates the feasible DMUs from infeasible ones. We might be

interested in certain facets of the frontier, depending on our intention to reduce inputs

or augment outputs. The projection to the frontier may be based on an input or output

facet (segment) of the frontier or a combination of both, defined as follows:

∂ΨX = Y |(X, Y ) ∈ Ψ, (X, η · Y ) /∈ Ψ,∀η > 1 ,

∂ΨY = X|(X, Y ) ∈ Ψ, (θ ·X, Y ) /∈ Ψ, 0 < θ < 1 .

Figure 2.1 shows different frontiers based on disposability, convexity and ray unbound-

edness assumptions about PPS.

Chapter 2. DEA: Theory, Assumptions and Realization Techniques 12

FDH Frontier

E

A

B

D

Output Slack

Input Slack

Input

CCRFrontier

BCCFrontier

XX

X

T1

T3

T2

Figure 2.1: DEA basic models

2.2.3 Efficiency Definition

What do we mean by “efficiency”, or more generally, by saying that one DMU is more

efficient than another DMU? Relative efficiency in DEA provides us with the following

definition, which has the advantage of avoiding the need for assigning a-priori measures

of relative importance to any input or output.

Full Efficiency: Full efficiency is attained by any DMU if, and only if, none of its inputs

or outputs can be improved without worsening some of its other inputs or outputs. In

most management or social science applications, the theoretically possible levels of effi-

ciency will not be known. The preceding definition is therefore replaced by emphasizing

its uses with only the information that is empirically available, as in the following defi-

nition.

Full Relative Efficiency (Pareto efficiency): In DEA we speak of relative efficiency,

because we compare the DMU against a set of reference peers. Full efficiency is attained

Chapter 2. DEA: Theory, Assumptions and Realization Techniques 13

by any DMU if, and only if, compared to other observed DMUs, under certain assump-

tions relevant to the case such as control over inputs and/or outputs, it is not possible to

reduce the amount of any inputs and/or attain more of any outputs without using more

of at least one another input and/or reducing the levels of at least one another output.

A DMU is pareto-efficient if it bears no slacks/shortfalls in any of its inputs/outputs.

Mathematically if the production possibility set, the collection of all feasible DMUs with

output Y = (y1, y2, ..., ym) and input X = (x1, x2, ..., xs) be:

Ψ =

(X, Y ) ∈ Rm+s‖X can produce Y

DMUk is pareto-efficient if there exist no DMUj, j 6= k in PPS such that Yj ≥ Yk while

Xj ≤ Xk. Note: In compact form Yj ≥ Yk means yij > yik for some i ∈ 1...m and

yi′j ≥ yi′k for the rest, i′ ∈ 1...m 6= i, the same principal applies to X.

Technical efficiency: Assuming that the inputs or outputs can only contract or expand

radially, input technical efficiency of a unit is defined as the maximum proportion that

any of its inputs can contract without making other inputs infeasible and/or worsening

any of the outputs. Similarly output technical efficiency is the maximum proportion

any of the outputs can expand without using more of any inputs or making the unit

infeasible. So the individual inputs or outputs may carry slacks/shortfalls even though

they are technically efficient. Mathematically let’s assume the PPS is convex then the

technical input efficiency for DMUk with input and output vectors, Xk and Yk as de-

noted above, is θ∗ = minθ θ : (θ ·Xk, Yk) ∈ Ψ, 0 < θ and technical output efficiency is

η∗ = maxη (Xk, η · Yk) ∈ Ψ, η > 1 Note: η · Yk in compact orm means (η · y1k, ...η · ysk),

the same is for θ ·X.

Technical change: Technical change is the relative efficiency of the entity when com-

pared to a broader or newer peer groups over time. It represents the difference of the

organization’s environment and technology adoption or, technically speaking, the bench-

mark shift [Grif 99]. In light of new technology in certain industry, the frontier could

shift and DMUk would appear less efficient in the new settings. In Figure 2.1 the green

Chapter 2. DEA: Theory, Assumptions and Realization Techniques 14

frontier shows the frontir shift. The technical efficiency can be measured by dividing the

new efficiency score by the old score.

Scale efficiency: Banker et al. [Bank 84] identify the difference between the “variable

returns to scale” model, BCC, and the “constant returns to scale” model, CCR, as a

production scale effect. Scale efficiency represents the failure in achieving the most pro-

ductive scale size, reflected by the score difference between CCR and BCC models. It is

computed as the CCR efficiency score,θ∗CCR divided by the BCC efficiency score:θ∗BCC . In

Figure 2.1 for DMU D, the scale efficiency captures the distance between “T2” and “T3”.

This means that under VRS assumptions point “T2” is already using the best practice

and because of economies of scale it is impossible to achieve the same productivity as

“T3” (which equals the productivity of A and B).

Input slack factor: For every DMU on the frontier, the input slack factor for input xi

addresses the unused capacity of that input meaning that the input xi could have been

further reduced while staying technically efficient. Input slack factor of one indicates there

is no slack for input xi. Mathematically for any DMU on the frontier (X, Y ) : X ∈ ∂ΨY

the input slack factor for xi is defined as min γi : (x1, ..., γi · xi, ..., xm) ∈ ΨY . It is evi-

dent that (1− γ∗i · xi is the slack for input xi.

Input substitution factor: For a DMU on the frontier where the input xi carries no

slack, input substitution factor identifies the lowest level of input xi that is feasible at

the cost of increasing at least one other output. Mathematically for any DMU on the

frontier with no slack in xi, (X, Y ) : X ∈ ∂ΨY , γ∗i = 1 the input substitution factor for

xi is defined as min κi : (κ1 · x1, ..., κi · xi, ..., κm cotxm) ∈ ΨY , κj > 0j = 1, ...,m. It is

the least amount of input i being able to produce output Y in the PPS. So there would

exist no other DMU with lower input i (no matter what the rest of inputs are, to produce

output Y .

Output Slack factor: For every DMU on the frontier, the output slack factor for

output yi addresses the unmet potential of that output meaning that the output yi

Chapter 2. DEA: Theory, Assumptions and Realization Techniques 15

could have been further expanded while staying technically efficient. Output slack

factor of one indicates there is no shortfall for output yi. Mathematically for any

DMU on the frontier (X, Y ) : Y ∈ ∂ΨX the output slack factor for yi is defined as

max γi : (y1, ..., γi · yi, ..., ys) ∈ ΨX. It is evident that (γ∗i − 1) · yi is the shortfall for

output yi.

Output substitution factor: It is the maximum amount of output i achievable con-

suming X in the PPS. So there would exist no other DMU consuming X to generate

higher output i (no matter what the rest of outputs are).

2.2.4 Orientation

DMUs are represented by their inputs and outputs. Efficiency scores depend on how far

the DMU is located from the frontier. Depending on the problem, DMUs can reduce

their inputs or increase their outputs, or target improvements in inputs and outputs,

simultaneously, in order to move to a point on the frontier. The models that focus on

minimizing inputs are called input oriented and the models that focus on maximizing

outputs are called output oriented. There are models with the goal of minimizing inputs

and maximizing outputs simultaneously, they are called non-oriented models.

2.3 DEA basic models

Depending on how one defines the PPS, the frontier and how to measure the distance,

there are several models that can be used. Each model has its applications and is suitable

for certain subject areas or cases. As there are dozens of special DEA models, we will

only describe the basic CCR, BCC, and input and output orientation for both. What

is also worth paying attention to is that defining a model which is theoretically sound

but cannot be operationalized is hardly of any use in practical applications. The models

Chapter 2. DEA: Theory, Assumptions and Realization Techniques 16

that have gained traction and were put into use were those with which the scholars also

provided a guide or solution on how to realize them.

2.3.1 CCR Model

In the CCR model [Char 78],[Char 81], which is named after its developers, Charnes,

Cooper and Rhodes, the PPS is based on ray unboundedness and disposability assump-

tion. The authors simply generalize the ratio efficiency for a one-input, one-output case

to include multiple inputs and outputs. They reduce the nonlinear form to a linear

model. The model can be either input or output oriented and it mainly deals with scale

efficiency. They scale all the inputs down or outputs up to achieve a better efficiency. A

given DMUk has a relative efficiency, θk, defined as the maximum of ratio of the weighted

sum of s outputs, yk = (y1k, ..., ysk, to the m inputs, xk = (x1k, ..., xmk), in other words

for each DMU a virtual aggregated output and input is formed and is maximized.

θk =u1y1k + ...+ ukyskv1x1k + ...+ vmxmk

.

The input and output weights, (v1, ..., vm) and (u1, ..., us) respectively, are not fixed and

are chosen to show DMUk under the best possible light. Weight and efficiency score are

formulated in the following non-linear optimization for DMUk:

max θk =

∑si=1 uiyik∑mi=1 vixik

,

s.t.∑si=1 uiyij∑mi=1 vixij

≤ 1, j = 1, ..., n,

ui, vi ≥ 0.

To make this computationally viable, first the above fractional form was linearized by

assuming∑m

i=1 vixij = 1 and adding it to the constraints. The dual of the resulted LP is

then used. The dual of the above is transformed into the following LP form, introducing

Chapter 2. DEA: Theory, Assumptions and Realization Techniques 17

weights for each DMU, λj, this is known as output oriented CCR:

min θk (2.2)

s.t.

n∑j=1

λjxij ≤ θkxik, i = 1, ...,m, (2.3)

n∑j=1

λjyij ≥ yik i = 1, ..., s, (2.4)

λj ≥ 0. (2.5)

If in the original form, instead of maximizing the virtual output to the virtual input,

we choose to minimize the inverse, the resulting dual LP will be as the following which

is known as the output oriented version. For the CCR model the input efficiency score

would be the inverse of the output efficiency score.

max ηk (2.6)

s.t.

n∑j=1

λjxij ≤ xik, i = 1, ...,m, (2.7)

n∑j=1

λjyij ≥ ηkyik i = 1, ..., s, (2.8)

λj ≥ 0. (2.9)

2.3.2 BCC Model

In the BCC model [Bank 84], the PPS assumptions are more restrictive and the convexity

postulate replaces the ray unboundedness but the rest are the same as the CCR model.

As a result, the efficiency score in the BCC is usually higher than that of the CCR.

Almost the same technique is used to reduce the problem to a linear form, the dual LP

Chapter 2. DEA: Theory, Assumptions and Realization Techniques 18

for input oriented model is given by:

min θk (2.10)

s.t.

n∑j=1

λjxij ≤ θkxik, i = 1, ...,m, (2.11)

n∑j=1

λjyij ≥ yik i = 1, ..., s, (2.12)

n∑j=1

λj = 1, λj ≥ 0. (2.13)

With the aid of the simplex method and advances in computer and mathematical algo-

rithms, the LP forms became widespread. The output-oriented model would be the same

as CCR with the additional convexity constraint,∑n

j=1 λj = 1. Although the frontier

will not be affected by the input or output orientation, the inefficient DMUs will have

different efficiency scores in BCC model, depending on the orientation, because they

target different parts of the frontier.

2.3.3 Additive model

While the CCR and BCC models are either focused on minimizing inputs (input oriented)

or maximizing outputs (output oriented), the additive model focuses on decreasing inputs

(eliminating input slacks, s−i ) and increasing outputs (eliminating output shortfalls, s−i )

simultaneously and therefore, has no orientation. The original additive model[Char 85]

was formulated for VRS case and shares the same PPS with the BCC model, through out

this work we will refer to VRS additive model in general. However additive model can

also be formulated under CRS assumption by eliminating the convexity constraint as was

done in [Ali 93]. In Figure 2.1, point “E” is an inefficient DMU and in input orientation

BCC model, “E” needs to reduce inputs and some output slacks to reach “A”, and

the same analogy is true for output orientation which leads “E” to “G”. However, in

Chapter 2. DEA: Theory, Assumptions and Realization Techniques 19

the Additive model, point “B” is optimum because reaching that point requires overall

maximum cuts in waste and shortfalls. The LP for additive model is given by:

maxλ,s±i

m∑i=1

si− +

s∑i=1

si+ (2.14)

s.t.

n∑j=1

λj · xij − xik + si− = 0 i = 1, ...,m (2.15)

n∑j=1

λj · yij − yik − si+ = 0 i = 1, ..., s (2.16)

n∑j=1

λj = 1 (2.17)

λj ≥ 0 (2.18)

2.4 Deterministic Frontier Estimators

All DMUs belong to the subspace between the origin and the frontier or the frontier and

infinity depending on output or input orientation, respectively. The concept of a frontier

is more general and easier to understand than the concept of a “production function”,

which has been regarded as a fundamental concept in economics. The frontier concept

allows each DMU to be seen under the best possible light contrary to production function

that remains the same for every DMU. Based on assumptions about the production e.g.

VRS, CRS, and the scope for improvements e.g. control over input/output, a benchmark

is constructed and a role model on that frontier is identified for every DMU. It is then vital

to get that benchmark right because otherwise, we would set an improper objective for the

DMU under study. Hence, we might either underestimate or overestimate the efficiency

score and, as a result, give an unrealistic projection. Number of basic assumptions in

deterministic frontier estimations is explained below.

Chapter 2. DEA: Theory, Assumptions and Realization Techniques 20

2.4.1 Free Disposal Hull Frontier

The free disposal hull (FDH) assumption adds the unobserved production points with

output levels equal to or lower than those of some observed points, and with at least

one improved input; or, with input levels equal to or higher than those of some observed

points, and at least one improved output compared to the observed production data

[Depr 84]. In other words, if X generates Y , then more X can still generate Y , and X

can generate less Y , too. FDH is assumed, in the literature, to be sufficient to induce a

reference set that has all the properties that the economic theory requires of a production

set [Tulk 93]. However, strong disposability assumptions exclude congestion, which is

frequently observed, e.g. in agriculture and transportation, and undesired outputs (or

inputs), e.g. in oil production. As a simple example, if 100 trucks can deliver goods along

a specific route, within a certain time, then 1000 trucks (more input) might not necessarily

perform at the same level because the entire route might not have the capacity to handle

1000 trucks. Assessment of congestion analysis within DEA and ways to deal with it could

be found in a work by Fare et al. [Fare 83a], Brockett et al. [Broc 98] and Cherchye

et al. [Cher 01]. For more information on the undesired output and disposability issue,

refer to Yang and Pollitt’s work on environmental efficiency [Yang 07]. The FDH frontier

looks like a staircase for the one-input, one-output case, as seen in Figure 2.2.

2.4.2 Variable Returns to Scale Frontiers

The convexity assumption adds any non-observed data, which is a convex combination of

some points in the FDH, to the PPS. Although there are notable arguments and evidence

favoring convexity, some researchers have found this axiom very restrictive and proposed

to drop or weaken it. For a complete study on this issue, see Cherchye et al. [Cher 99].

Chapter 2. DEA: Theory, Assumptions and Realization Techniques 21

x

y

x

CCR

BCC FDH

Figure 2.2: Shapes of conventional frontiers: FDH, BCC, and CCR

2.4.3 Constant Returns to Scale Frontier

The full proportionality assumption includes any non-observed production point that is

proportional to some data points in the FDH. This assumption is in accordance with the

original DEA model (CCR) and the Farrell efficiency measure. It is critical to know that

in DEA, we assume that a linear combination of DMUs is possible and real. The CCR

frontier contains the other frontiers and so, if a DMU is on the CCR frontier, it will also

be on the FDH and BCC frontiers. The reverse, however, is not true.

2.5 Probabilistic Frontier Estimators

The nonparametric deterministic estimators envelop all the data points and so are very

sensitive to noise. They may be seriously affected by the presence of outliers (units

which are significantly different from others), as well as data errors, which may lead to a

substantial underestimation of the overall efficiency scores. Therefore, in order to assure

credibility of the efficiency indices, it is important to adopt some additional methods

to correct for such discrepancies. Only then may one hope to obtain estimators that

could be useful for the decision-making process. We look at probabilistic models for this

Chapter 2. DEA: Theory, Assumptions and Realization Techniques 22

purpose.

In probabilistic models, we leave some room for errors in the PPS. The production

process is defined with a joint probability. For (x, y), a realization of random variables

of input/output (X, Y ), to be in the PPS, it should be either a dominant point (on the

frontier) or a dominated point (enveloped by the frontier):

F (x, y) = Prob(X ≥ x,Y ≤ y) > 0 dominant point;

H(x, y) = Prob(X ≤ x,Y ≥ y) > 0 dominated point.

Based on the n observed data points, the FDH estimator of the PPS,H, the frontier,F ,

and the efficiency score estimator,θ for an input-oriented case given by:

Fn(x, y) =n∑i=1

Prob(Xi ≥ x,Yi ≤ y), (2.19)

Hn(x, y) =n∑i=1

Prob(Xi ≤ x,Yi ≥ y), (2.20)

θn(x, y) = infθ|Fn(θ · x|y) > 0, and (2.21)

θn(x, y) = infθ|Hn(θ · x|y) > 0 (2.22)

It has been proven by Park et al. [Park 00] that θ(x, y) is a consistent estimator of

θ(x, y) with the convergence rate of n−1/(m+s) where m=number of inputs and s=number

of outputs.

2.5.1 Partial m Frontier

Cazals et al. [Caza 02] introduced the concept of partial frontiers (order-m frontiers)

with a nonparametric estimator that does not envelop all the data points. While keeping

its nonparametric nature, the expected order-m frontier does not impose convexity on

the production set and allows for noise (with zero expected values). For example, to

measure the input efficiency of (x, y), we pick m random DMUs that fit the criterion

of producing at the same level or better than y. We estimate the FDH PPS and the

Chapter 2. DEA: Theory, Assumptions and Realization Techniques 23

efficiency score, based on those m DMUs. (x, y) is then compared to a set of m peers

producing more than its level y with the expectation of the minimal achievable input

being the benchmark. This would replace the absolute minimal achievable input given

by:

θm(x, y) = infθ|(θ · x, y) ∈ Ψm(y),

Ψm(y) = (x∗, y∗)|x∗ ≥ xi=1...m, y∗ ≤ y.

In the probabilistic model m DMUs are represented by random variables, so do Ψm(y)

and θm(x, y). The input efficiency score, on average, is then given by:

θm(x, y) = E(θm(x, y)|Y ≥ y). (2.23)

Therefore, instead of looking for the lower boundary, input orientation frontier, the

order-m efficiency score can be viewed as the expectation of the minimum input efficiency

score of the unit (x, y), when compared to m units randomly drawn from the population

of units producing more outputs than the level y. This is a less extreme benchmark

for the unit (x, y) than the absolute minimal achievable level of inputs. The order-m

efficiency score is not bounded by one: a value greater than one indicates that the unit

operating at the level (x, y) is more efficient than the average of m peers randomly drawn

from the population of units (n observed DMUs) producing more output than y.

θm,n(x, y) = θn(x, y) +

∫ ∞θn(x,y)

(1− FX(ux|y)m)du

lim θm,n(x, y) = θn(x, y), as m→∞

For a finite m, the frontier may not envelop all data points. The value of m may

be considered as a trimming parameter and as m increases, the partial order-m frontier

converges to the full-frontier. It is shown that by selecting the value of m as an appropriate

function of n, the nonparametric estimator of the order-m efficiency scores provides a

robust estimator of the corresponding efficiency scores, sharing the same asymptotic

Chapter 2. DEA: Theory, Assumptions and Realization Techniques 24

properties as the FDH estimators, but being less sensitive to outliers and/or extreme

values. In the literature, numerical methods like the Monte Carlo procedure are being

used instead of evaluating multivariate integrals. In chapter 6 of this work, we will return

to the idea of m-frontiers and build our own Monte Carlo method to derive the frontier

estimator to deal with nonlinearity.

2.5.2 Quantile Frontier

Aragon et al. [Arag 05] proposed an alternative approach to order-m partial frontiers by

introducing quantile-based partial frontiers. The intention is to replace the concept of

the “discrete” order-m partial frontier by a “continuous” order− α partial frontier, where

α ∈ [0, 1] corresponds to the level of an appropriate nonstandard conditional quantile

frontier. This method is more robust in relation to the effects of outliers. The original

α-quantile approach proposed in [Arag 05] was limited to one-dimensional input for the

input oriented frontier and to one-dimensional output for the output oriented frontier.

Daouia and Simar [Daou 07] developed the α-quantile model for multiple inputs and

outputs. Similar to equation 2.22, α− quantile input efficiency is defined as:

θα(x, y) = infθ|H(θ · x, y) > 1− α (2.24)

Unit (x, y) consumes, by a ratio α, less than all other units producing output larger than,

or equal to, y and consumes, by a ratio (1−α), more than remaining units. If θ(x, y) = 1,

we will say that the unit is input efficient at the level α. Clearly, when α = 1, this is

the same as the Farrell-Debreu input efficiency score, sharing the same properties of the

FDH estimator, but since it does not envelop all the data points, it will be more robust

in relation to extreme and/or outlying observations [Daou 07].

Chapter 2. DEA: Theory, Assumptions and Realization Techniques 25

2.5.3 Practical Frontiers

As we have seen, DEA is very data oriented and it builds the PPS, based on certain

assumptions. DEA does not have any benchmark to rank the efficient units against and,

since its vision is limited to the sampled data, it cannot perceive any potential improve-

ment beyond the already identified efficient DMUs. Moreover, although the fundamental

assumptions hold, on average, in practical cases, there might be exceptions for some

entities due to either managerial or natural restrictions. For example, although we may

assume that any linear combination of DMUs could be realized, we cannot guarantee if

any inefficient DMU projected to a target, can imitate that production by changing its

inputs/outputs accordingly.

One of DEA’s limitations is associated with its inability to provide any further in-

sight into the DMUs on the frontier. However, there might be a possibility for the

DEA-efficient DMUs to improve and it is important for management to set targets for

their efficient units if the organization is to advance as a whole. Sowlati and Paradi

[Sowl 04] looked at the problem and formed a new practical frontier, by possible changes

in the inputs/outputs of the already efficient DMUs. They introduced a novel linear

programming approach to create those hypothetical DMUs, and formed a new practical

frontier. Other researchers have worked on a practical frontier by introducing weight

restrictions on inputs/outputs to prevent DEA from setting a practically impossible tar-

get on the frontier for an inefficient unit. The 73rd Annals of Operations Research was

dedicated to “extending the frontiers of DEA” and it includes various papers on the issue

[Lewi 97]. In our work, we will use numerical methods to generate hypothetical DMUs

to build the practical frontier that dominates the conventional DEA frontier.

Chapter 2. DEA: Theory, Assumptions and Realization Techniques 26

2.6 Linkage between Data Envelopment Analysis and

Ratio Analysis

A number of researchers have studied DEA and RA and noted their positive and negative

aspects. While some papers have compared DEA and RA [Cron 02], [Fero 03], others

attempted to combine or relate the two methods [Bowl 04],[Chen 02b, Wu 05, Desp 07,

Chen 07]. In this section we study how the two techniques compare in the DEA context

and how they can be combined for better results.

2.6.1 Comparing DEA and RA

There have been several studies tat compare DEA and RA. Below we provide a sum-

mary of those studies. Cronje [Cron 02] compared the use of the DuPont system with

DEA in measuring the profitability of local and foreign-controlled banks in South Africa.

The DuPont system [Gall 03] is an analysis technique to determine what processes the

company does well in and what processes can be improved by focusing on the interrela-

tionship between return on assets, profit margins and asset turnover. The results show

that DEA gives a more accurate classification because it provides a combined comparison

of the performance of the banks with regard to different financial ratios, beyond the three

ratios involved in the DuPont system.

Feroz et al. [Fero 03] tested the null hypothesis that there is no relationship between

DEA and traditional accounting ratios as measures of the performance of a firm. Their

results reject the null hypothesis indicating that DEA can provide information to analysts

that is additional to that provided by traditional ratio analysis. They applied DEA to

the oil and gas industry to demonstrate how financial analysis can employ DEA as a

complement to ratio analysis.

Thanassoulis et al. [Than 96] studied DEA and performance indicators (output to

input ratios) as alternative instruments of performance assessment, using data from the

Chapter 2. DEA: Theory, Assumptions and Realization Techniques 27

provision of prenatal care in England. They compared the two aspects of performance

measurement and target settings. As far as performance measures are concerned, in a

typical multi-input multi-output situation, various ratios should be defined. However,

this makes it difficult to gain an overview of the unit’s performance, particularly when

the different ratios of that unit do not agree on the unit’s performance, as is often the

case; yet, selecting only some ratios can bias the assessment. They found that their DEA

and individual ratios agree weakly on unit performance and this is because DEA reflects

the overall efficiency, while RA only reflects specific ones. On the second aspect, namely,

target setting, DEA identifies input/output levels that would render a unit efficient.

Ratio-based targets may result in unrealistic projections because they are derived with

reference to one input and one output at a certain point in time, regardless of the rest of

the input-output levels. However, the authors believe that ratios could give some useful

guidance for further improvements of the efficient units in DEA.

Finally, Bowlin [Bowl 04] used DEA and ratio analysis to assess the financial health

of companies participating in the Civil Reserve Air Fleet — an important component

of the Department of Defense’s airlift capability — over a 10-year period. He employed

DEA and then tried to explain the observations based on ratio analysis. He believes the

two methods together gave a better insight to the study.

2.6.2 Combining DEA and RA

There are several studies that attempted to combine or relate the two methods. As

proved in [Siga 09], ratio analysis (RA) is the same as the CCR model when the DMUs

have only a single input and a single output. Below we provide a summary of the studies

combining DEA and RA.

Chen and Agha [Chen 02b] characterized the inherent relationships between the DEA

frontier DMUs and output-input ratios. They showed that top-ranked performance by

ratio is a DEA frontier point and DEA subsumes the premise of the RA, however, it fails

Chapter 2. DEA: Theory, Assumptions and Realization Techniques 28

to identify all types of dominating units, as DEA does.

Gonzalez-Bravo[Gonz 07] proposed a Prior-Ratio-Analysis procedure which is based

on the existence of a relationship of individual ratios to DEA efficiency scores. He listed

the efficient units whose efficiencies are overestimated by DEA because they perform

highly in a single dimension, and the inefficient units whose efficiencies are underesti-

mated because they perform reasonably well in all the considered dimensions, but do not

stand out in any of them.

Being motivated by Chen and Agha’s [Chen 02b] paper, Wu and his colleagues

[Wu 05] proposed an aggregated ratio analysis model in DEA. This ratio model has

been proven to be equivalent to the CCR model. However, we believe the inclusion of all

possible ratios in the model does not necessarily make sense and the number of possible

ratios grows in size exponentially, as the number of inputs-outputs increase. To illustrate

this, consider a three-input, three-output case. We will have (23 − 1) aggregated inputs

and (23 − 1) aggregated outputs, so the model optimizes 7 × 7 = 49 aggregated ratios,

where some of them do not represent a meaningful concept, e.g. the ratio of cured people

in ICU to the number of children admitted in ER. The authors have also proven that

a subset of all the possible aggregated ratios is also equivalent to the CCR model and,

in our example, the number of variables decrease significantly, which is a substantial

improvement. However, the necessities of including unrelated ratios still remain unad-

dressed. To deal with some meaningless ratios Despic et al. [Desp 07] proposed a DEA-R

efficiency model, in which all possible ratios (output/input) are considered as outputs.

This model enables the analyst to easily translate some of the expert opinions into weight

restrictions, in terms of ratios, thereby creating an immediate communication between

the experts and the model.

In another interesting study, Chen and McGinnis [Chen 07] showed there is a bridge

between ratio efficiency and technical efficiency. Therefore, RA(m,s,k), where m is a one

of the input elements from input X, s is one of the outputs from output Y and k repre-

Chapter 2. DEA: Theory, Assumptions and Realization Techniques 29

sents the specific DMU, is a product of seven different component measurements. They

are technical efficiency, technical change, scale efficiency, input slack factor, input sub-

stitution factor, output slack factor and output substitution factor. Technical efficiency,

technical change and scale efficiency are DMU dependent only, i.e. for DMUk, they will

be the same, no matter what input m and output s are selected for RA(m,s,k). Input

slack factors and input substitution factors are DMU and input dependent. For a par-

ticular RA(m,s,k), they depend on the selection of input m and DMUk but not output

s. However, output slack and substitution factors are functions of DMUk, input m and

output s. This relationship provides a basis for concluding that the conventional partial

productivity metric is not a proper performance index for system benchmarking. This is

because it depends on other effects, in addition to the system-based technical efficiency

between a given DMU and a “benchmark” DMU. Furthermore, RA(m,s,k) is the product

of technical efficiency and the other six factors which are all less than, or equal to, one.

Therefore, RA(m,s,k) being close to one indicates that all seven factors should be close to

one, and, in fact, larger than RA(m,s,k). This property partly explains why the DMUs

with the largest output-input ratio will be technically efficient when their RA equals one

[Chen 02a].

Other researchers have started to use the financial ratios as inputs and outputs in

DEA, with an expectation that they can get the best out of that. In a Magyar Nemzeti

Bank Working Paper, Hollo and Nagy [Holl 06] employed ratios in their production model

to assess 2459 banks in the European Union. Hollingsworth and Smith, for the first time,

pointed out the inaccuracy of the CCR model [Holl 03], when data is in the form of

ratios. Then Emrouznejad et al. [Emro 09] examined the problem of ratios in more

detail and proposed a series of modified DEA models. We focused on the full efficiency

and complemented their model by adding a second phase and avoided nonlinearity with

algebraic transformations [Siga 09]. In this thesis we develop models suitable to employ

ratio variables in DEA and offer solutions to operationalize the developed concepts.

Chapter 3

Literature review of non-oriented

models

As discussed in the introduction this chapter forms the first part of the literature review

required for our work. It has been essential for us to examine the non-oriented models

in the literature and see if they can be altered in anyway to take in ratio variables.

In addition to make our model comparable, we wanted to understand the way they

others have operationalized their models and the attributes they offer. This chapter

discusses the existing non-oriented DEA models and concepts, their realization methods

and characteristics.

“Non-oriented models” is a general term associated with the DEA models that mea-

sure efficiency by simultaneously decreasing the input and increasing output. In the

literature of DEA, there are only a small number of non-oriented models, with different

applications. Non-oriented models differ from each other in how the distance to the best

practice is calculated, how much importance (weight) is put on every input or output,

and how the final score is interpreted. In this chapter, we review all the non-oriented

models in the literature and examine their properties such as units and translation in-

variance, the efficiency score bounds, and their computational complexity. This is the

30

Chapter 3. Literature review of non-oriented models 31

starting point to understand the field and a guide for us in designing our proposed model,

knowing what characteristics we need to include and how our model stands out in relation

to other models.

3.1 Russell Graph Efficiency Model

Fare et al. [Fare 85] built upon their suggested models, named input and output Russell

measure of technical efficiency to combine both. Their first idea was presented in 1978

[Fare 78] in which they extended the Farrell’s [Farr 57] one-input one-output efficiency

measure to multiple-input case. Their model overcomes the four shortcomings of Farrell’s

input measure of efficiency, the two important ones being: a) The score is one if, and

only if, the input set is technically efficient; and b) it is monotonic so an increase in input

should inversely affect the efficiency score.

They operationalized their concept by converting the radial measure to a non-radial

measure. L(y) consists of X that can produce at least y. For the input-oriented problem

and one-output case, efficiency measures defined by Farrell and Fare and Lovell [Fare 78]

are given by the following where strictly positive input elements of input x are sorted

from 1 to l and the rest are zero:

minl∑1

θil|(θ1X1, θ2X2, ...θlXl, 0, ..0) ∈ L(y)

minλ|λX ∈ L(y).

Zero elements do not come into play in efficiency calculations. It is clear that in Fare and

Lovell model for non-zero inputs, slacks are eliminated as well. They later expanded this

to a multi-output case [Fare 83b]. In a similar fashion, they defined the Russell output

technical efficiency [Fare 85] and defined that by max∑ φi

o|((Y1.φ1, Y2.φ2, ...Yo.φ0, 0, ..0) ∈

L(x), φi ≥ 1. L(x) is the set consisting of output Y that use at most x. Combining the

two input and output efficiencies, they produced the input-output, or as they called it,

Chapter 3. Literature review of non-oriented models 32

“graph” efficiency measure [Fare 85].

The Russell graph efficiency considers both input and output simultaneously and sup-

posedly the unit is efficient if, and only if, R = 1. However, this property is questioned,

as we discuss later. The downside is that R < 1 does not convey a readily meaningful

message to management, and it is non-linear. Recently, Levkoff et al. [Levk 12] pointed

out the model’s failure to distinguish between efficient and inefficient units at the bound-

ary of the output space. In addition, it does not satisfy weak monotonicity at inefficient

units on the boundary (an increase in any output lowers the efficiency score). This tends

to create problems with zero values, in some outputs.

The way to compute the Russell graph efficiency is through the following simplified

nonlinear programming approach:

minR =

∑li=1 θi +

∑oi=1

1φi

l + o, l ∈ 1, ..,m, o ∈ 1, .., s (3.1a)

s.t.

n∑j=1

λj · xij ≤ θi · xik ∀i = 1, .., l (3.1b)

n∑j=1

λj · yij ≥ φi · yik ∀i = 1, .., o (3.1c)

λj ≥ 0 0 ≤ θi ≤ 1 φi ≥ 1. (3.1d)

It is not clear to us why the two input and output Russell measure objectives were not

added to each other, and averaged as a whole. It is worth mentioning that instead of

having an arithmetic mean of input contractions and an inverse harmonic mean of output

expansions, the objective is now an unweighted mean of aggregated input contractions

and inverse output expansions for positive input and outputs only. Zero inputs and

outputs do not come into play. Although not considered by the authors, with a simple

investigation, we can prove that the formulation is units invariant but not translation

invariant. To overcome the computational difficulty, whenever the goal is merely to group

the units into two Russell efficient and inefficient categories, and data is strictly positive,

Chapter 3. Literature review of non-oriented models 33

Cooper et al. [Coop 99a] devised the following model named MIP (measure of inefficiency

proportions) and proved a unit is efficient in the Russell model, if it is MIP efficient. The

optimization to be solved for MIP is given by:

maxλ,s±i

m∑i=1

sik−

xik+

s∑i=1

sik+

yik, (3.2a)

s.t.

n∑j=1

λj · xij − xik + si− = 0 i = 1..m (3.2b)

n∑j=1

λj · yij − yik − si+ = 0 i = 1..s (3.2c)

λj, s±ik ≥ 0. (3.2d)

Moreover, they have shown that if optimum output slacks to MIP equal zero, then the

two optimal objectives will be equal; of course, this does not hold for zero input slacks.

Cooper, Park and Pastor also provided an algebraic approximation for the Russell

measure under special circumstances: positive values ands+ikyik

< 1. They first transformed

Russell measure through algebraic manipulation and then used algebra again to approx-

imate the nonlinear objective into a linear version and guarantee it will be between zero

and one. Their attempt to develop a routine to solve nonlinearity caused by the sum of

fractions by algebraic approximation is worth further attention and work, but has not

been investigated to date, to our knowledge. The approximation for Russell is linear and

they have proved the following:∑mi=1 θi

∗+∑si=1

1φi

m+s≈ 1 −

∑mi=1

sik∗

xik+∑si=1

s∗ikyik

m+s. The optimal

values of the two differ less than∑si=1

s∗ikyik

2

m+s. It is also worth mentioning that Ruggiero et

al. [Rugg 98] produced a weighted Russell measure for the one sided case. The intention

is to give priority to a few variables preferred by management. The outcome of the model

depends on the right choice of weights and if the relative weights are biased, distortions

might be introduced. For the one-output case, ordinary least square regression can be

used to choose weights, while for the multi-output case, the canonical regression analysis

Chapter 3. Literature review of non-oriented models 34

is the optimum method. For the two sided case, there is no suggestion on how to choose

weights.

3.2 Refined Russell Graph Efficiency Model

As briefly mentioned above, Levkoff et al. [Levk 12] established that the Russell graph

measure does not behave as expected at the output boundary, and, in particular, once

the output has zero elements, the inefficient unit might be classified as efficient with score

one, while increasing those output levels from zero will decrease the efficiency score. This

is because it brings the once ignored output into play. Then, for two outputs, the one with

the lower level of output (zero) has a higher efficiency score. This can be shown using a

simple example of one-input two-outputs case; please see [Levk 12]. They have proposed

the following to rectify the problem: instead of excluding yi = 0 from the formulation,

the zero elements of outputs are excluded only if any increase in that element takes the

unit outside the PPS, assuming that the PPS is defined by technology, T . They defined

an indicator function to differentiate between efficient and inefficient outputs with zero

values (on the frontier) as follows:

ψj(x, y, T ) = 1 if yj ≥ 0 oryj = 0∧ < x, (y1, ..yj + ε+ ys >∈ T for some ε ≥ 0

ψj(x, y, T ) = 0 if yj = 0∧ < x, (y1, ..yj + ε+ ys >/∈ T ∀ε

δ(xi) = 0 if xi = 0 and δ(xi) = 1 if xi ≥ 0.

The modified Russell measure is then defined as

infθ,φ

∑i δ(xi) θi +

∑j ψj(x, y, T ) φ−1

j∑i δ(xi) +

∑j ψj(x, y, T )

|(x θ, y φ) ∈ T

.

Computationally formulating the above is not easy. Indicator functions ψj depend on

infinitesimal comparisons ε, which are called the non-Archimedean element. It is assumed

to be smaller than any positive number and remains so, even if it is multiplied by a

large number. The constraint set is not closed and the minimum does not always exist.

Chapter 3. Literature review of non-oriented models 35

The authors suggested replacing zero outputs with ε if the technology is known, then

calculating ψj in the same fashion as before, if the technology is known and convex, then

shadow prices can help. For zero output elements, if the shadow price is positive for any

shadow price vector supporting < x, y >, then ψj = 0, otherwise, ψj = 1. However,

most of the time, the technology is not known and will be estimated using data points

available, as is typically with DEA. The difficulty still lies in getting ψj right. The authors

suggest calculations in three steps: In step 1, for each selected DMU, the zero outputs

are examined to see if they are efficient or not, and this is done by replacing zeros in

output with small ε and solving the following:

minR =l∑

i=1

θi +s∑i=1

φi, l ∈ 1, ..,m, (3.3a)

s.t.

n∑j=1

λj · xij ≤ θi · xik ∀i = 1, .., l (3.3b)

n∑j=1

λj · yεij ≥yεikφi

∀i = 1, .., s (3.3c)

λj ≥ 0, 0 ≤ θi ≤ 1, 0 ≤ φi ≤ 1. (3.3d)

We take note of Set=i|yi = 0∧φ∗i ≤ 1 as they are output elements belonging to the

inefficient production vector. Step 2 involves finding the minimum for the numerator of

the modified function, by setting φi = 0 for zero outputs that are inefficient and solving

the following non-linear optimization:

minR =l∑

i=1

θi +∑i/∈set

φi l ∈ 1..m, (3.4a)

s.t.

n∑j=1

λj · xij ≤ θi · xik ∀i = 1..l (3.4b)

n∑j=1

λj · yεij ≥yεikφi

∀i /∈ set (3.4c)

Chapter 3. Literature review of non-oriented models 36

λj ≥ 0 0 ≤ θi ≤ 1 0 ≤ φi ≤ 1 (3.4d)

And finally, step 3 involves dividing the above objective function by l + o+ |Set|.

The above procedure is complicated since two steps involve nonlinear programming

and at each step, outputs with zero values have to be replaced with infinitesimal values.

3.3 Multiplicative Model (log measure)

In this model, instead of the “summation” used in the CCR model to build the virtual

outputs/inputs, “multiplication” is used [Char 82]. The initial formulation looks like:

maxµ,ν

∏si=1 y

µir0∏m

i=1 xνii0

, (3.5a)

s.t.∏si=1 y

µiij∏m

i=1 yνiij

≤ 1 ∀j = 1, .., n (3.5b)

µi, νi ≥ 1∀i. (3.5c)

Taking the logarithm of the above, we will get

maxµi,νi

s∑i=1

µiyr0 −m∑i=1

νixi0, (3.6a)

s.t.

s∑i=1

µiyij −m∑i=1

νixij ≤ 0 j = 1, .., n (3.6b)

µi, νi ≥ 1. (3.6c)

And the dual is

maxλ,s±i

m∑i=1

si− +

s∑i=1

si+, (3.7a)

s.t.

n∑j=1

λj · xij − xik + s−i = 0 i = 1, ..,m (3.7b)

Chapter 3. Literature review of non-oriented models 37

n∑j=1

λj · yij − yik − s+i = 0 i = 1, .., s (3.7c)

λj, si−, si

+ ≥ 0, (3.7d)

which is exactly like the additive model for log inputs and log outputs, whereby it mea-

sures log efficiency. Note that a DMU is efficient if, and only if, it has a log-efficiency

of zero. It can be proven that if DMUo appears in the optimal basic solution of (3.7a),

then DMUo is efficient. This is the same as what we had in the case of CCR where the

reference set is all efficient. The easiest way to solve (3.7a) is to continue to use the dual

log measure. For every DMU, only the right-hand side of the constraints will change.

In terms of computation, there is no need to compute efficiency for all observed DMUs,

because for each DMU, once we have the efficient subset, it is not required to solve the

LP (3.7a). This only needs to be solved for the DMUs not recorded so far. The model

is neither units nor translation invariant and the efficiency score below zero does not

convey any meaningful message other than “inefficient”. The score is not positive and

not bounded from below. This method is only useful to pinpoint efficient units (frontier)

rather than estimating the efficiency score of inefficient ones.

3.4 Invariant Multiplicative Model

In this model, Charnes et al. [Char 83] built upon their log measure model and enhanced

it to be units invariant. This is done by including a virtual input and output element for

DMUo equal to e to make the method units invariant [Char 83]. The exponents η, ξ are

used for this virtual input and output, respectively, where intensity variables µi, νi are

used for real inputs and outputs. So the formulation becomes:

maxµ,ν,η,ξ

eη∏s

i=1 yµii0

eξ∏m

i=1 xνii0

, (3.8a)

s.t.

Chapter 3. Literature review of non-oriented models 38

eη∏s

i=1 yµiij

eξ∏m

i=1 xνiij

≤ 1 ∀j = 1, .., n (3.8b)

η, ξ ≥ 0, µi, νi ≥ δ, ∀i δ > 0. (3.8c)

Scaling outputs by ai and inputs by bi, where ai, bi > 0, will transform the above to the

following:

maxµ,ν,η,ξ

eη∏s

i=1 yµii0

∏si=1 a

µii

eξ∏m

i=1 bνii

∏mi=1 x

νii0

, (3.9a)

s.t.

eη∏s

i=1 yµiij

∏si=1 a

µii

eξ∏m

i=1 xνiij

∏mi=1 b

νii

≤ 1 ∀j = 1, .., n (3.9b)

ξ, η ≥ 0 µi, νi ≥ δ∀i. (3.9c)

Given we have optimal solution to 3.8, a feasible solution to 3.9 can be formed with

the same objective value, so 3.9a≥3.8a. Similarly a feasible solution can be constructed

from the optimal solution to 3.8 with the same objective value which implies 3.8a≥3.9a.

Hence the optimal objective scores, have to be equal, and therefore the efficiency value

is invariant under change of units; however, keep in mind that the optimal values for

variables will not necessarily be the same. After taking the log and making this into

compact form, we will arrive at the following were the hat sign of input/output means

logarithm:

max η − ξ + µT Y0 − νT X0, (3.10a)

s.t.

ηeT − ξeT + µT Y0 − νT X0 ≤ 0, ξ, η, δ > 0; µT , νT ≥ −δeT . (3.10b)

Taking the dual will result in:

maxλ,s±i

m∑i=1

δ.si− + δ.

s∑i=1

si+, (3.11a)

s.t.

Chapter 3. Literature review of non-oriented models 39

n∑j=1

λj − θ+ = 1 (3.11b)

n∑j=1

λj + θ− = 1 (3.11c)

n∑j=1

λj · xij − xik + s−i = 0 i = 1, ..,m (3.11d)

n∑j=1

λj · yij − yik − s+i = 0 i = 1, .., s (3.11e)

λj, θ+, θ+, si−, si

+ ≥ 0. (3.11f)

Adding the first two equation will result in θ+ + θ− = 0 and because the two are non-

negative both should be zero. As a result the dual will be reduced to:

maxλ,s±i

m∑i=1

δ.si− + δ.

s∑i=1

si+, (3.12a)

s.t.

n∑j=1

λj · xij − xik + s−i = 0 i = 1, ..,m (3.12b)

n∑j=1

λj · yij − yik − s+i = 0 i = 1, .., s (3.12c)

n∑j=1

λj = 1 (3.12d)

λj, si−, si

+ ≥ 0. (3.12e)

Therefore, in the log domain, the log input and output of DMUo are enveloped by the

convex combinations of log inputs and outputs. In the log domain and from a DEA

perspective, (3.12a) is based on variable returns to scale technology, as opposed to (3.7a)

initial constant return to scale model. Having the optimal solution to (3.12a), we can

write Y0 =∏n

j=1 Yλ∗j e−s∗j

j and because∑n

j=1 λj = 1, Y0 is a Cobb-Douglas function. The

same can be said for X0. It is useful to explain what a Cobb-Douglas function is. In

economics, the Cobb-Douglas production function usually represents the relationship

between two or more inputs producing one overall output. The function is of the form

Chapter 3. Literature review of non-oriented models 40

Y = A.LβKα, where Y, for example, is the real value of all goods produced, L is labour

input, K is capital input, A is productivity, and α and β are output elasticities, which

are constant and defined by technology. They tell us if the capital usage increases by a

percentage, how the output will be affected. If α + β = 1, then it is constant returns

to scale, meaning that doubling all the inputs will double the output, but if the sum is

less than one, then it is decreasing returns to scale. Finally if α + β is greater than one

it is increasing returns to scale. For the case of α + β = 1, α and β show the input

shares of output. Because only positive λjs come into play in Y0 =∏n

j=1 Yλ∗j e−s∗j

j , we

can say DMUj with λj > 0 is efficient and is part of frontier (efficient facet) for DMU0.

This represents a new method for estimation of piecewise the Cobb-Douglas production

function directly from empirical data. Very recently, Cook and Zhu [Cook 14] have built

a model based on the invariant multiplicative DEA model, which enables the ranking of

units or, as they call it, cross-efficiency. Ranking in DEA is often the subject of criticism

because it depends on an optimal set of weights, which is not unique. The authors have,

however, proved that using the above model for calculating cross-efficiency will lead to a

unique score, without the need to impose secondary goals. Moreover, it is linear. Since

ranking is not in the scope of this work, we do not cover the details here; this was just

mentioned to show one of the capabilities of this model.

3.5 Pareto efficiency test model (Additive)

The Pareto efficiency test model, which was later labeled as additive, was developed by

Charnes et al. [Char 85]. Pareto efficiency is when no element of output could be bigger

without producing less of another; or, from the consumption point of view, decreasing

an input will not be possible without increasing another. Nonzero slacks are identified

as the source of inefficiency. It is worth mentioning that the meaning or amount of loss

or gain is not considered. One unit of loss in output 1 might result in three units of

Chapter 3. Literature review of non-oriented models 41

gain in output 2, which, although overall, might be preferable, is not considered Pareto

improvement.

Given the empirical points and observed units, the empirical production set (EPS) is

defined as the convex hull of observed data, and it is extended to the empirical production

possibility set (EPPS) with inputs from the production set and outputs not greater than

those in the production set (disposability of outputs). Let EPPS’ be the set corresponding

to EPPS. A frontier function is defined as f(x) = max y, (x, y) ∈ EPPS’. It is proven

that f(x) is concave and piecewise linear on EPS. The Pareto-efficient empirical frontier

function is determined by first pinpointing the Pareto-efficient units from n observations.

Then the function is defined on the convex hull of the inputs by the convex combination of

the outputs. Authors had shown before that the necessary and sufficient condition for a

point x∗ to be Pareto efficient is to be the optimal solution to the following: min∑gk(x)

subject to gk(x) ≤ gk(x∗),∀k where gk(x) is a function representing our objectives. Here,

our goal is to achieve technical efficiency and to maximize outputs and minimize inputs,

as given by the following optimization problem.

minλ,s±i

m∑i=1

n∑j=1

λj · xij −s∑i=1

n∑j=1

λj · yij, (3.13a)

s.t.

n∑j=1

λj · xij − xik + si− = 0 i = 1, ..,m (3.13b)

n∑j=1

λj · yij − yik − si+ = 0 i = 1, .., s (3.13c)

n∑j=1

λj = 1 (3.13d)

λj ≥ 0. (3.13e)

Because the solution will not change if we add constants to the objective, they have

cleverly rewritten the above in the following form, thereby giving birth to the additive

Chapter 3. Literature review of non-oriented models 42

model. The intention behind the above formulation is mainly to obtain a test for unit k:

if the optimal objective is zero, then unit k is optimal, thus a Pareto-efficient point. The

authors at this point were not concerned with relative efficiency and score, rather, they

just wanted to identify efficient points [Char 85].

minλ,s±i

−m∑i=1

si− −

s∑i=1

si+ (3.14a)

s.t.

n∑j=1

λj · xij − xik + si− = 0 i = 1..m (3.14b)

n∑j=1

λj · yij − yik − si+ = 0 i = 1..s (3.14c)

n∑j=1

λj = 1 (3.14d)

λj ≥ 0 (3.14e)

The linear program in (3.14) maximizes the L1 distance of a point in the convex hull

of observation to (xk, yk). In the basic additive model, units with zero slacks are the

efficient ones and inefficiencies are measured in terms of the summation of all slacks.

Since inputs/outputs have different scales, trying to merely maximize the size of the

slack, regardless of the percentage of change required, or the value of that variable,

might result in unwise targets. For example a unit wasting 500 ml of water, and 0.5

grams of gold in making an alloy will be recommended to use 500 ml of water because

simply 500 > 0.5 and the goal is to shrink the waste in size.

The Additive model is not units invariant. To achieve a units invariant measure,

the authors modified the objective by dividing each slack by the corresponding input or

output of unit k to achieve a units invariant state. They also used a scaler, δ, to map the

objective onto a desired range. The authors suggested, for instance δ = 10 1m+s

will make

the objective between zero and −10. However, they were not right and the range would

be between zero and −∑ymaxi − ymini . Although for one output case the Pareto-efficient

Chapter 3. Literature review of non-oriented models 43

empirical frontier function is isotonic, this is not the case for multiple outputs. This

means that the frontier function is not monotonically increasing. Thus moving on the

frontier, the one with more input might not have, strictly, more output, rather, it could

have fewer of some outputs and more of others and still be Pareto efficient. However, we

can always find a cone of directions in the output space, on which the output projections

are isotonic. As already mentioned, in the formulation (3.14), the production possibility

set was based on convexity and disposability of outputs. If one chooses to add other

assumptions like disposability of inputs, as in in the BCC, the extended frontier is not

only composed of units with zero slacks but also with ones that have slacks in, at most,

n− 1 inputs and m− 1 outputs. If they have slacks in all inputs or outputs, they cannot

be on the frontier because they will have a radial efficiency score less than one. The other

issue is that since the goal is to maximize the slacks, the unit is projected to the farthest

part of the frontier, which might not necessarily be in the vicinity of the unit. Then the

only good outcomes will be those with zero slacks, are fully efficient and, of course, are

part of the frontier.

3.6 Extended Additive model

By switching from minimization to maximization in the above Pareto efficiency model,

Charnes et al. [Char 87] could measure inefficiency. Further to their idea of assigning

variable weights to the slacks to achieve units invariance, they tried to correct the range

issue by a post-optimization treatment. Weights, as suggested above, are the input and

output values of the unit being evaluated, but please bear in mind that the input and

output values need to be strictly positive. The extended additive model is formulated as

the LP below:

maxλ,s±i

m∑i=1

s−ixik

+s∑i=1

s+i

yik, (3.15a)

s.t.

Chapter 3. Literature review of non-oriented models 44

n∑j=1

λj · xij − xik + s−i = 0 i = 1, ..,m (3.15b)

n∑j=1

λj · yij − yik − s+i = 0 ∀i = 1, .., s (3.15c)

n∑j=1

λj = 1 (3.15d)

λj ≥ 0. (3.15e)

The other point is that this model is not translation invariant and cannot handle zero

or negative data. The score is not bounded by zero and one, and has no natural inter-

pretation as relative efficiency. Green et al. [Gree 97] suggested an intuitive objective,

which is bounded by zero and one, despite various routes that can be tried to transform

the above score to a meaningful measure. They suggested thinking of efficiency as the

ratio of the current state to the best practice. In this case output efficiency=ykj

ykj+s+j

and

input efficiency=xkj−s−jxkj

. The objective measures inefficiency which is 1-efficiency. For

the output inefficiency, we gets+j

ykj+s+j

and for the input,s−jxkj

. The input part is the same

as the units invariant extended model but the output part is slightly different. This way

the objective is bounded between zero and one but the cost becomes nonlinear. Cooper

el al. [Coop 99a] later decided to use the idea but implemented it ex post facto. This

is, rather than solving a nonlinear case, they used the idea of Green et al. [Gree 97] to

map the score after the additive model is solved. They generated a meaningful efficiency

measure, which is bounded by zero and one for this model and is presented below:

0 ≤ 1

m+ s

m∑i=1

s−∗ixik

+s∑i=1

s+∗i

yik≤ 1

yik = yik + s+i ∀i = 1, .., s.

This score is calculated after the original format is solved and the optimum slacks are

known. To keep the score below one, Y has been changed to Y , as Green et al. [Gree 97]

suggested. Although the input’s maximum slack cannot be bigger than the input itself

Chapter 3. Literature review of non-oriented models 45

and the first term is bounded by m, output shortfalls can be bigger than the output

itself, and the second term can be bigger than s, if Y was used (this happens if the target

produces more than double the present value). Finally, to make the score a real measure

of efficiency (score one=100 efficiency), we subtract it from one:

0 ≤ 1− 1

m+ s

m∑i=1

s−∗ixik

+s∑i=1

s+∗i

yik≤ 1.

3.7 Constant Weighted Additive Model

Pastor changed the objective in additive model by multiplying the input excess and

output shortfalls by some constant nonnegative weights [Past 96]. The LP is given by:

maxλ,s±i

m∑i=1

ω−i · s−i +s∑i=1

ω+i · s+

i , (3.16a)

s.t.

n∑j=1

λj · xij − xik + s−i = 0 i = 1, ..,m (3.16b)

n∑j=1

λj · yij − yik − s+i = 0 i = 1, .., s (3.16c)

n∑j=1

λj = 1, (3.16d)

λj ≥ 0. (3.16e)

He has presented theories which can be used to guide weight choices in a very flexible

manner. For instance, one can assign weights to the slacks in the objective function

and leave the constraints unaltered or, equivalently, one may assign these weights to the

constraints, while leaving the objective function unaltered. The weights are chosen case

by case, and the main intention is that such weights can give different inputs/outputs

yet obtain the same advantage. In other words, the weights will mitigate the problem we

encounter in the above example of water and gold. The model is translation invariant,

but no weights exist that make the model units invariant [Love 95a, Past 99a, Coop 95].

Chapter 3. Literature review of non-oriented models 46

This is, in particular, due to the theory he proved in additive model scaling, where an

input/output is the equivalent of leaving the input/output unaltered and scaling the

corresponding slacks in the objective function. This tells us that there does not exist a

weighted additive model with constant weights that gives the same objective value if any

of the input/output variables are scaled.

To give a meaning to the objective, after finding the optimal slacks from the above for-

mulation, Pastor suggested calculating the following, which reflects the relative efficiency

and score of 1.0 means fully efficient.

0 ≤ 1− 1

m+ s

(m∑i=1

s−∗ixik − xi

+s∑i=1

s+∗i

yi − yik

)≤ 1

xi is the minimum of all xis and yi is the maximum of yi. It is not clear to us why

the weights in the weighted additive model are not replaced by 1m+s

1xik−xi

and 1m+s

1yi−yik

,

respectively, in the first place. If this happens, then the model will become translation

invariant too and negative data would not be a problem. This objective is what Cooper

et al. suggested to be used in the RAM model to increase the discrimination power of

RAM [Coop 99a]. Later, Cooper et al. took up this suggestion and introduced the BAM

model, as we will discuss later [Coop 11].

3.8 Normalized Weighted Additive Model

Lovell and Pastor [Love 95b] retained the constraints of the extended additive model

[Char 87] but introduced an objective with variable weights, which was dimensionless

and also bearing the desired translation invariance property.

minλ,s±i

−m∑i=1

1/σ−i · s−i +s∑i=1

1/σ+i · s+

i , (3.17a)

s.t.

n∑j=1

λj · xij − xik + s−i = 0 i = 1, ..,m (3.17b)

Chapter 3. Literature review of non-oriented models 47

n∑j=1

λj · yij − yik − s+i = 0 i = 1, .., s (3.17c)

n∑j=1

λj = 1 (3.17d)

λj ≥ 0. (3.17e)

Where σ+i and σ−i are the sample standard deviation of input (i = 1, ..,m) and output

(i = 1, .., s). It is rare, but worth mentioning, that if the sample standard deviation of

a variable is zero, then that can be completely removed from formulation, because if a

variable is constant for every unit, it reflects no change. The model is proved to be both

translation and units invariant. This would be a perfect model if the score is bounded

by unity.

3.9 Global Efficiency Measure (GEM)

The extended additive model would classify a unit as efficient if the objective function is

zero, and inefficient when it is greater than zero. It is units invariant but the score does

not make relative sense. In an attempt to make the objective function (score) meaningful,

relative to the others, Lovell et al. [Love 95a] retained the constraints but proposed a

new fractional objective, in a way to be bounded between zero and one. In their original

paper, they only considered an output oriented case and omitted inputs, not only from

objective but also from the constraints, claiming input constraints to be redundant. We

have built the following, including the inputs, also based on the same methodology.

minλ,s±i

[1 +

1

m

m∑i=1

s−ixik

+1

s

s∑i=1

s+i

yik

]−1

, (3.18a)

s.t.

n∑j=1

λj · xij − xik + s−i = 0 i = 1, ..,m (3.18b)

Chapter 3. Literature review of non-oriented models 48

n∑j=1

λj · yij − yik − s+i = 0 i = 1, .., s (3.18c)

n∑j=1

λj = 1 (3.18d)

λj ≥ 0. (3.18e)

Because an optimal solution for the extended model will be an optimal solution for GEM,

to avoid solving a nonlinear fractional program,in practice, the GEM’s approach is first

to find the slacks by solving the extended model, which is a simple linear program. Then it

uses the slacks to calculate a score following this formulation:[1 + 1

m

∑mi=1

s−∗ixik

+ 1s

∑si=1

s+∗iyik

]−1

.

The authors proved that this measure

• is greater than zero and bounded by one;

• equals to one if, and only if, the unit is fully efficient;

• is strictly monotonic;

• is units invariant.

Our observation is that the lower bound of the efficiency measure (objective) is 11+φi

where φi =∑s φis

and φi is the scalar of shortfalls of each yi. The efficiency measure here

is not translation invariant and cannot handle negative data.

3.10 Enhanced Russell Graph Efficiency Measure (en-

hanced GEM)

In the Russell graph measure, Fare et al. [Fare 85] averaged the individual input and

output efficiencies. Prior to that, they had developed Russell input and Russell output

measures, which were the arithmetic mean of positive inputs’ shrinking and the arithmetic

mean of positive outputs’ augmentations. Pastor et al. built the ratio of those averages

Chapter 3. Literature review of non-oriented models 49

[Past 99b]. They have assumed, however, that all the input/output variables are strictly

positive, (more limited than the Russell graph which allows zero values). Needless to say,

minimizing ensures the Russell input measure is minimized, while the Russell output

measure is maximized, which is the desired outcome. Their formulation is given by:

minR =

∑mi=1 θim∑si=1 φis

, (3.19a)

s.t.

n∑j=1

λj · xij ≤ θi · xik ∀i = 1, ..,m (3.19b)

n∑j=1

λj · yij ≥ φi · yik ∀i = 1, .., s (3.19c)

λj ≥ 0 θi ≤ 1 φi ≥ 1. (3.19d)

This measure makes it easy to separate and interpret the input and output efficien-

cies, on average. In other words, the unit should decrease the use of inputs by∑mi=1 θim

and increase outputs by∑si=1 φis

, on average, to become a point on the frontier. So the

ratio shows how much the DMU has been successful, in terms of transforming inputs to

outputs, on average.

The above formulation has several desirable properties. The measure is greater than

zero and less than, or equal to, one, and one means Pareto efficient (zero slacks). The

objective is isotonic and units invariant but not translation invariant. For the proofs,

please consult the appendix in [Past 99a]. From the computational aspect, the enhanced

Russell measure is computed more easily, compared to the original Russell graph measure.

Nevertheless, both are nonlinear although the formulation of enhanced Russell can be

linearized by some re-arrangements and smart change of variables. With a transformation

of variables using total slacks as:

θi =xik − s−ikxik

= 1− s−ikxik

i = 1, ..,m

φi =yik + s+

ik

yik= 1 +

s+ik

yiki = 1, .., s

Chapter 3. Literature review of non-oriented models 50

We will have:

minλ,s±i

1− 1m

∑mi=1

s−ixik

1 + 1s

∑si=1

s+iyik

, (3.20a)

s.t.

n∑j=1

λj · xij − xik + s−i = 0 i = 1, ..,m (3.20b)

n∑j=1

λj · yij − yik − s+i = 0 i = 1, .., s (3.20c)

λj, s±i ≥ 0. (3.20d)

This is a fractional linear program and with the following change of variable method

(similar to Charnes and Cooper, 1962) it will lend itself to a linear program like:

minβ,t±ik,µ

β − 1

m

m∑i=1

t−ikxik

, (3.21a)

s.t.

β +1

s

s∑i=1

t+ikyik

= 1 (3.21b)

− βxik +n∑j=1

µj · xij + t−ik = 0 i = 1, ..,m (3.21c)

− βyik +n∑j=1

µj · yij − t+ik = 0 i = 1, .., s (3.21d)

µj, t±ik, β ≥ 0. (3.21e)

Where

β =

(1 +

1

s

s∑i=1

s+ik

yik

)−1

t−ik = β.s−ik i = 1..m

t+ik = β.s+ik i = 1..s

µj = β.λj j = 1..n

Chapter 3. Literature review of non-oriented models 51

The objectives are the same for the projections, and from the optimal solution to the

above, we can construct an optimal solution to the fractional program.

3.11 Range Adjusted Model (RAM)

Cooper et al. wanted a model to generate efficiency scores that were: a) bounded by zero

and one, and b) not only 1 had to mean 100% efficient but also zero had to mean fully

inefficient. In addition, they required units and translation invariance and isotonic to

hold. They retained the constraints of the additive model and defined a new objective,

whose formulation is called RAM (Range Adjusted Model)[Coop 99a]. The general form∑mi=1 ω

−i · s−i +

∑si=1 ω

+i · s+

i measures inefficiency (when it is zero, it means that the unit

is efficient). The objective is invariant to an alternative optimum solution. To make the

measure more comprehensive, Cooper et al. changed the above by subtracting it from

one (see the equation below) to measure efficiency. To accommodate all the above, they

chose the “range” of input and outputs for the weights in the following form:

0 ≤ 1− 1

m+ s

m∑i=1

s−∗ixi − xi

+s∑i=1

s+∗i

yi − yi≤ 1,

where xi and yi

are the minimum of all xis and yis and xi and yi are the maximum of xis

and yis respectively. The measure becomes zero in a case where every input and output

of the unit is the worst possible and the target is the overall best, which is a very rare

situation. In the unlikely event of a zero range, it is ignored and corresponding constraints

are omitted (they are redundant). The other property is the “ranking” potential of this

model as the authors claim because the weights are constant for every slack element.

Steinmann et al. examined the RAM and listed several limitations. They claim the

model is misleading because it classifies large and inefficient units as being less efficient

than small and inefficient units. For the proof, refer to [Stei 01]. The reader should bear in

mind that the ranges might require updating when new observations are introduced, since

Chapter 3. Literature review of non-oriented models 52

data orientedness is the nature of DEA, in general. The authors claim this formulation

is fairly robust even if the maximum and minimum of elements are breached [Aida 98].

The relatively large denominators in this measure, compared to the others, lead to higher

efficiency measures. To correct this, one suggestion is to replace xi with xik and yi

with yik to make denominators slightly smaller. The price of that is that the ranking

capability of RAM will be lost, since the divisor will differ at every element. The large

denominator also decreases the discriminating power of this measure, as detected by Aida

et al. [Aida 98], where a large group of water suppliers in Japan scored above 98%. As

we observed in GEM, this is because the full range between zero and one is not used. A

decade later, Cooper et al. [Coop 11] have addressed this in the BAM model, which we

discuss next.

3.12 BAM: a bounded adjusted measure

As mentioned before, RAM has little discriminating power and it is defined under VRS

technology only. It is easy to show that under non-increasing returns to scale, the RAM

score could be negative, as Cooper el al. showed in [Coop 11]. Authors have used the

suggestion made in the original paper to modify the objective denominators to L−i =

xi − xi and L+i = yi − yi to make them smaller and this will increase the discriminating

power. The BAM would lose the ranking potential carried by RAM and although BAM

is isotonic, it is not strongly monotonic. To make this available to other technologies

(CRS, NIRS, NDRS) they introduced bounds (extra constraints) to confine the score

to positive values. The general bounds are∑n

j=1 λj · xij ≥ xi and∑n

j=1 λj · yij ≤ yi,

in addition to the usual bounds on λs for constant, non-increasing and non-decreasing

returns to scale. For example,∑n

j=1 λj ≤ 1 for NIRS. It can be shown that the BAM-VRS

score never surpasses that of RAM. This model was later generalized by Pastor et al.

in tow attempts, [Past 13a, Past 13b] to ensure free disposability, provisions for partial

Chapter 3. Literature review of non-oriented models 53

bounds, as well as projection onto the strong efficient frontier under CRS technology.

3.13 Slack-based Measure

In his paper, Tone [Tone 01] suggested a similar model to GEM, as following:

minλ,s±i

1− 1m

∑mi=1

s−ixik

1 + 1s

∑si=1

s+iyik

, (3.22a)

s.t.

n∑j=1

λj · xij − xik + s−i = 0 i = 1, ..,m (3.22b)

n∑j=1

λj · yij − yik − s+i = 0 i = 1, .., s (3.22c)

λj, s±i ≥ 0. (3.22d)

His approach in linearizing is what made it unique. By multiplying the numerator and

denominator by t and rearranging variables, the above can be transformed into the fol-

lowing linear formulation

min τ = t− 1

m

m∑i=1

S−ixik

, (3.23a)

s.t.

1 = t+1

s

s∑i=1

S+i

yik(3.23b)

n∑j=1

Λj ·Xij − t.xik + S−i = 0 i = 1, ..,m (3.23c)

n∑j=1

Λj · yij − t.yik − S+i = 0 i = 1, .., s (3.23d)

Λj, S±i , t ≥ 0. (3.23e)

He also studies the dual and shows that SBM deals with profit rather than cost.

Albeit the model is not translation invariant, Tone had considered zeros in input and

Chapter 3. Literature review of non-oriented models 54

output. For the zero inputs, the corresponding slack variable is omitted completely from

the formulation. For the zero outputs, if the unit does not have the facilities to produce

the output, this is omitted but if it has but not using it, then the zero should be replaced

by a small number.

We attempted to use Tone’s method to linearize the basic additive model we devel-

oped in [Siga 09] with no success. The reason is explained below, in two steps. In the

first step we have transformed the original model using∑n

j=1

∑ri=1 ∆ij =

∑ri=1 s

+i and∑n

j=1

∑qi=1 Φij =

∑qi=1 s

−i to the following:

min1− 1

m

∑qi=1

s−ixik− 1

m

∑mi=q+1

s−ixik

1 + 1s

∑ri=1

s+iyik

+ 1s

∑si=r+1

s+iyik

, (3.24a)

s.t.∑nj=1 λj · nxij∑nj=1 λj · dxij

− xik + s−i = 0 xik = nxik/dxik∀i = 1, .., q (3.24b)

n∑j=1

λj · xij − xik + s−i = 0 i = q + 1, ..,m (3.24c)

n∑j=1

λj · yij − yik − s+i = 0 ∀i = r + 1, .., s (3.24d)∑n

j=1 λj · nyij∑nj=1 λj · dyij

− yik − s+i = 0 yik = nyik/dyik∀i = 1, .., r (3.24e)

n∑j=1

λj = 1 (3.24f)

λj ≥ 0. (3.24g)

Applying Tone’s technique to the above will result in:

min τ = t− 1

m

m∑i=1

S−ixik

, (3.25a)

s.t.

1 = t+1

s

s∑i=1

S+i

yik(3.25b)

Chapter 3. Literature review of non-oriented models 55

n∑j=1

Λj ·Xij − t.xik + S−i = 0 i = q + 1, ..,m (3.25c)

n∑j=1

Λj · yij − t.yik − S+i = 0 i = r + 1, .., s (3.25d)

n∑j=1

Λj = t (3.25e)

n∑j=1

Λj · σij +n∑j=1

Φij · dxij = 0 ∀i = 1, .., q (3.25f)

n∑j=1

Λj · wij −n∑j=1

∆ij · dyij = 0 ∀i = 1, .., r (3.25g)

Λj, S±i , t ≥ 0. (3.25h)

Although (t∗, S±i∗,Λ∗) gives us s±i

∗, λ∗i it does not guarantee Φ∗ij equals s−i

∗ · λ∗j · t∗. The

same is true for ∆∗ij.

3.14 Directional slack-based measure and distance

function

Considering the production possibility P (x) = X can produce Y and assuming weak dis-

posability, the directional distance function is defined as ~D(X, Y, g) = sup β : Y + βg ∈ P (x).

Weak disposability for undesirable outputs is a must since disposing of them should come

at a cost [Cham 96]. Chung et al. [Chun 97] have shown that if we take g = Y , the Shep-

hard [Shep 70] output distance function, D(X, Y ) = infθ : Y

θ∈ P (x)

, will become a

special case of directional distance function, D = 1

1+ ~D. β measures the technical ineffi-

ciency. The bundle g is arbitrary, and the intention is that with different choices of g,

you can define a model suitable to the application on hand. For example g could be in

the same direction as good outputs and the opposite side of undesirable outputs. Or,

the g of (−xk, yk) will project unit k to a point on the frontier with x∗ = (1− β).xk and

Chapter 3. Literature review of non-oriented models 56

y∗ = (1 + β).yk inputs and outputs. In LP form, it is given by:

max β, (3.26a)

s.t.

n∑j=1

λj · xij + βgxi ≤ xik i = 1, ..,m (3.26b)

n∑j=1

λj · yij − β.gyi ≥ yik ∀i = 1, .., s (3.26c)

n∑j=1

λj = 1 (3.26d)

λj ≥ 0. (3.26e)

We emphasize that although the model tries to maximize the radial input contraction

and output expansion along the preferred bundle simultaneously, it fails to detect all

the sources of inefficiency, because: a) it is dependent on the choice of direction, and

b) all the elements are forced to grow/contract at the same rate. This is clearly shown

in a numerical example in Ray’s book [Ray 00]. In an attempt to correct the above,

Fukuyama and Weber [Fuku 09] proposed a directional slack based inefficiency measure

(DSBI) which gauges the slacks according to gx and gy as shown below:

maxλ,s±i

1m

∑mi=1

s−igxi

+ 1s

∑si=1

s+igyi

2, (3.27a)

s.t.

n∑j=1

λj · xij − xik + s−i = 0 i = 1, ..,m (3.27b)

n∑j=1

λj · yij − yik − s+i = 0 i = 1, .., s (3.27c)

n∑j=1

λj = 1 (3.27d)

λj ≥ 0. (3.27e)

Chapter 3. Literature review of non-oriented models 57

If the optimum slacks for the directional distance formulation 3.26a are called t−i and

t+i for inputs and outputs, respectively, the objective of DSBI can be written as the

following, which shows the new measure is at least as directional distance function and

the two are equal, if no slack exists: β∗ +1m

∑mi=1

t−igxi

+ 1s

∑si=1

t+igyi

2

The model is translation invariant for a fixed directional vector and is units invariant if

gx = xk and gy = yk and it is monotonic and homogeneous of degree minus 1.

3.15 Graph Hyperbolic measure of efficiency

The hyperbolic graph efficiency extends the radial input and output measures by com-

bining them together. Fare el al. [Fare 85] (page 125) have mentioned why this is called

hyperbolic: “the model constraints the search for more efficient production planes to

a hyperbolic path along which all inputs are reduced, and all outputs are increased,

by the same proportion.” Suppose technology is defined as T = (x, y) : y ≤ f(x),

then the graph of technology is G = (x, y) : y =≤ (x) and (x, y) ∈ G and it is tech-

nically efficient. A hyperbolic efficiency score of (xk, yk) will be 1δ

if (1δxk, δyk) ∈ G.

For the CRS technology, the nonlinear formulation is easily linearized, and for the

VRS technology, Ray [Ray 00] suggested the use of first-order Taylor series approxi-

mation for f(δ) = 1δ, so at an arbitrary point δ0, the Taylor series would result in

f(δ) ≈ f(δ0) + f ′(δ0)(δ− δ0) = 2δ0−δδ0

. Then, assuming δ0 = 1, f(δ) ≈ 2− δ, the resulting

linear programming will be:

max δ, (3.28a)

s.t.

n∑j=1

λj · xij + δxik ≤ 2xik i = 1, ..,m (3.28b)

n∑j=1

λj · yij ≥ δ.yik ∀i = 1, .., s (3.28c)

Chapter 3. Literature review of non-oriented models 58

n∑j=1

λj = 1 (3.28d)

λj ≥ 0. (3.28e)

By construction, the observed unit and its efficient projection lie on a rectangular hy-

perbola. This is, of course, a limiting factor which does not allow for the full efficiency

projection. One possible improvement is using a different rate for inputs and outputs,

but this will not solve the problem. One other suggestion is to create another objec-

tive function based on available constructs here. For example Portela and Thanassoulis

[Port 02, Port 07] introduced the concept of geometric distance function GDF=(∏i φi)

1m

(∏r βr) 1

s

.

3.16 Benefit function

Benefit functions come from consumer theory rather than production theory and are de-

rived from personal preferences.[Luen 92],[Cham 96],[Fare 00]. Benefit function is suited

to maximizing the welfare of a group because the benefit of a preferred choice by indi-

viduals can be aggregated meaningfully. For any g, x, y with g 6= 0, g ∈ R+d , x ∈ X, y ∈ Y

let b(x, y, g) = max β : x− β g ∈ X,U(x− β g) ≥ y otherwise−∞, U is the utility

function defining the outcome of any decision or choice in X. g is a reference vector

defining the measure by which alternative bundles are compared. g is said to be a

good bundle if u(x + α g) ≥ u(x), given that x + α g ∈ X and α ≥ 0. The bundle

is weakly good if u(x + α g) ≥ u(x); if we assume U is monotonic, the ∀g ≥ 0 g is

weakly good. The benefit function has certain properties: (a)Monotonic with respect to

y, (b)b(g;x + αg, y) = α + b(g;x, y), (c)If g is weakly good, then b(g;x, y) ≥ 0 implies

U(x) ≥ y.

In contrast with distance function [Shep 70], d(x, y) = maxγ : U(x

γ) ≥ y

, which is

used in individual consumer theory, the benefit function has use in group welfare re-

lations. However, under appropriate assumptions, the dual of both functions will give

Chapter 3. Literature review of non-oriented models 59

the expenditure (cost) function. Because the benefit function is easily transferable to LP form, and with minor modification it can be applied to production theory, it is of interest to us. In fact, Chambers et al. [Cham 96] have extensively studied the relations of benefit and distance functions in consumer theory and modified them for use in production theory. The input distance function in production theory is $d(x, y) = \max\{\gamma : x/\gamma \in L(y)\}$, where $L(y) = \{x : x \text{ can produce } y\}$ is a subsection of the production possibility set. We can argue that, for the utility of $x/\gamma$ to be at least $y$, it is strictly required that $x/\gamma \in L(y)$. Chambers et al. [Cham 96] proposed the input directional distance function $\vec{D}(x, y, g) = \sup\{\beta : x - \beta g \in L(y)\}$, which is basically the same as the Luenberger benefit function [Luen 92] translated into production theory. Under weak input disposability, because $x \in L(y)$ if and only if $d(x, y) \ge 1$, we can write $\vec{D}(x, y, g) = \sup\{\beta : d(x - \beta g, y) \ge 1\}$, which clearly shows the relation between the benefit function and the input distance function. In the case of choosing $g = x$, it is proven that $\vec{D}(x, y, x) = 1 - \frac{1}{d(x, y)}$ and $d(x, y) = \frac{1}{\vec{D}(0, y, -x)}$. Later, Fare and Grosskopf [Fare 00] extended the function to expand output and contract input simultaneously, so that the dual is the profit function: $\vec{D}(x, y, -g_x, g_y) = \sup\{\beta : x - \beta g_x \in L(y + \beta g_y)\}$.

3.17 Range Directional Model and Inverse Range

Directional Model

Portela et al. introduced these models initially to deal with negative data directly [Port 04]. They built on the directional distance function and, by defining a new range along with a pretreatment of the data, made it possible to deal with negative data and to set targets which are easier to achieve (closer to the DMU, in contrast with the farthest target, i.e., the largest slack). RDM defines the direction of improvement as the path towards the super champion (ideal point), which is the same for all units and is the best of the best (it might not even exist). First, we look at their RDM model:

\begin{align}
\max\quad & \beta_k, \tag{3.29a}\\
\text{s.t.}\quad & \sum_{j=1}^{n}\lambda_j x_{ij} \le x_{ik} - \beta_k R_{ik}, \quad i = 1,\dots,m \tag{3.29b}\\
& \sum_{j=1}^{n}\lambda_j y_{rj} \ge y_{rk} + \beta_k R_{rk}, \quad r = 1,\dots,s \tag{3.29c}\\
& \sum_{j=1}^{n}\lambda_j = 1 \tag{3.29d}\\
& R_{rk} = \max_j\{y_{rj}\} - y_{rk}, \quad r = 1,\dots,s \tag{3.29e}\\
& R_{ik} = x_{ik} - \min_j\{x_{ij}\}, \quad i = 1,\dots,m \tag{3.29f}\\
& \lambda_j \ge 0. \tag{3.29g}
\end{align}
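Purely as an illustration (not the authors' code), the ranges and the RDM LP above could be assembled as in the following sketch; the function name and the use of scipy are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def rdm(X, Y, k):
    """Range Directional Model (3.29) for DMU k. X: (m, n) inputs, Y: (s, n) outputs."""
    m, n = X.shape
    s, _ = Y.shape
    R_in = X[:, k] - X.min(axis=1)       # R_ik = x_ik - min_j x_ij
    R_out = Y.max(axis=1) - Y[:, k]      # R_rk = max_j y_rj - y_rk
    # Variables: [lambda_1, ..., lambda_n, beta]; maximize beta.
    c = np.zeros(n + 1); c[-1] = -1.0
    A_ub = np.vstack([np.hstack([X, R_in[:, None]]),      # sum lambda x + beta R_ik <= x_ik
                      np.hstack([-Y, R_out[:, None]])])   # -sum lambda y + beta R_rk <= -y_rk
    b_ub = np.concatenate([X[:, k], -Y[:, k]])
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))]) # convexity constraint
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (n + 1))
    beta = res.x[-1]
    return beta, 1.0 - beta              # inefficiency beta* and RDM efficiency 1 - beta*
```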

The above model is non-oriented and input contraction and output expansion are looked

at simultaneously. Setting either of the ranges, Rik or Rrk, to zero would make the above

output or input oriented, respectively. The range is an upper bound on the slacks for

each variable. The authors have proved RDM is also translation and units invariant.

β is an inefficiency score, but β does not encapsulate all sources of inefficiency and

some inputs or outputs might have nonzero slacks at the optimal value for β. 1 − β is

considered as the RDM efficiency score which is bounded by one. However, the direction

towards the production frontier is biased towards the factor with the highest potential

for improvement (as in most slack-based measures).

To make targets closer to the unit (give priority of improvement to those factors of

a unit which are closer to the best practice) IRDM is suggested, which uses the inverse

range. Whenever the range is zero, the division by zero is avoided and the inverse is

replaced by zero, which is reasonable, since the range zero means the unit is already


efficient on that front. IRDM is defined through the following LP:

\begin{align}
\max\quad & \beta_k, \tag{3.30a}\\
\text{s.t.}\quad & \sum_{j=1}^{n}\lambda_j x_{ij} \le x_{ik} - \frac{\beta_k}{R_{ik}}, \quad i = 1,\dots,m \tag{3.30b}\\
& \sum_{j=1}^{n}\lambda_j y_{rj} \ge y_{rk} + \frac{\beta_k}{R_{rk}}, \quad r = 1,\dots,s \tag{3.30c}\\
& \sum_{j=1}^{n}\lambda_j = 1 \tag{3.30d}\\
& R_{rk} = \max_j\{y_{rj}\} - y_{rk}, \quad r = 1,\dots,s \tag{3.30e}\\
& R_{ik} = x_{ik} - \min_j\{x_{ij}\}, \quad i = 1,\dots,m \tag{3.30f}\\
& \lambda_j \ge 0. \tag{3.30g}
\end{align}

The above is translation invariant but not units invariant. The authors suggested a

pretreatment of the data (normalization) to achieve units invariance artificially. The

intention is to divide every output by the largest output and the same for the input.

Please note that the ranges need to be re-evaluated after this normalization stage. IRDM

efficiency score measures the distance from an observed point to a target point with

reference to some ideal point. But contrary to the RDM, this time ideal point changes

for every DMU and as a result, interpreting the efficiency score or comparing scores or

rankings is not an option here. This model is merely used for target setting.

Asmild and Pastor [Asmi 10] extended the RDM (by adding a second phase) to account for Pareto efficiency. The projection by RDM might land on the weakly efficient frontier, and the benchmarks may still carry non-directional slacks. The second phase is a weighted additive model with the aim of detecting those slacks:

\begin{align}
\max\quad & \sum_{i=1}^{m}\frac{\tau_{ik}^-}{R_{ik}} + \sum_{r=1}^{s}\frac{\tau_{rk}^+}{R_{rk}}, \tag{3.31a}\\
\text{s.t.}\quad & \sum_{j=1}^{n}\lambda_j x_{ij} + \tau_{ik}^- = x_{ik} - \beta_k^* R_{ik}, \quad i = 1,\dots,m \tag{3.31b}\\
& \sum_{j=1}^{n}\lambda_j y_{rj} - \tau_{rk}^+ = y_{rk} + \beta_k^* R_{rk}, \quad r = 1,\dots,s \tag{3.31c}\\
& \sum_{j=1}^{n}\lambda_j = 1 \tag{3.31d}\\
& R_{rk} = \max_j\{y_{rj}\} - y_{rk}, \quad r = 1,\dots,s \tag{3.31e}\\
& R_{ik} = x_{ik} - \min_j\{x_{ij}\}, \quad i = 1,\dots,m \tag{3.31f}\\
& \lambda_j, \tau_{ik}^-, \tau_{rk}^+ \ge 0. \tag{3.31g}
\end{align}

The measure is then defined ex post facto as
\[
1 - \frac{1}{m+s}\left(\sum_{i=1}^{m}\frac{R_{ik}^{-*}}{R_{ik}} + \sum_{r=1}^{s}\frac{R_{rk}^{+*}}{R_{rk}}\right) = (1 - \beta_k^*) - \frac{1}{m+s}\left(\sum_{i=1}^{m}\frac{\tau_{ik}^{-*}}{R_{ik}} + \sum_{r=1}^{s}\frac{\tau_{rk}^{+*}}{R_{rk}}\right),
\]
where $R_{ik}^{-*} = \beta_k^* R_{ik} + \tau_{ik}^{-*}$ and $R_{rk}^{+*} = \beta_k^* R_{rk} + \tau_{rk}^{+*}$ are the total input reductions and output augmentations. In this way the contribution of every input and output to the measure is clear.

3.18 Modified Slack-based Measure

Sharp et al. built their MSBM model [Shar 07] by using the SBM model by Tone [Tone 01] as the base and incorporating the ranges of Portela et al. [Port 04] in order to enable the SBM to deal with naturally negative data. A naturally negative variable is a variable with a meaningful zero (like most undesirable outputs). The model overcomes two drawbacks of SBM with negative data: it is translation invariant and does not generate negative inefficiency scores. Their model is based on the assumption that at least one positive input and one positive output exist. To avoid dividing by zero whenever the corresponding ranges are zero, the corresponding term is dropped from the objective. Here is what they suggest:

\begin{align}
\min_{\lambda, s_i^\pm}\quad & \rho = \frac{1 - \frac{1}{m}\sum_{i=1}^{m}\frac{w_i s_i^-}{P_{i0}^-}}{1 + \frac{1}{s}\sum_{r=1}^{s}\frac{v_r s_r^+}{P_{r0}^+}}, \tag{3.32a}\\
\text{s.t.}\quad & \sum_{j=1}^{n}\lambda_j x_{ij} + s_i^- = x_{i0}, \quad i = 1,\dots,m \tag{3.32b}\\
& \sum_{j=1}^{n}\lambda_j y_{rj} - s_r^+ = y_{r0}, \quad r = 1,\dots,s \tag{3.32c}\\
& \sum_{j=1}^{n}\lambda_j = 1 \tag{3.32d}\\
& \sum_{i=1}^{m} w_i = 1 \tag{3.32e}\\
& \sum_{r=1}^{s} v_r = 1 \tag{3.32f}\\
& P_{i0}^- = x_{i0} - \min_j\{x_{ij}\}, \quad i = 1,\dots,m \tag{3.32g}\\
& P_{r0}^+ = \max_j\{y_{rj}\} - y_{r0}, \quad r = 1,\dots,s \tag{3.32h}\\
& \lambda_j, s_i^\pm, w_i, v_r \ge 0. \tag{3.32i}
\end{align}

They have proved the measure ρ is between zero and one and the model is both units

and translation invariant. The model can be linearized in the same fashion as SBM:

\begin{align}
\min\quad & \tau = t - \sum_{i=1}^{m}\frac{w_i S_i^-}{P_{i0}^-} \tag{3.33a}\\
\text{s.t.}\quad & X\Lambda + S^- = t\,x_0 \tag{3.33b}\\
& Y\Lambda - S^+ = t\,y_0 \tag{3.33c}\\
& t + \sum_{r=1}^{s}\frac{v_r S_r^+}{P_{r0}^+} = 1 \tag{3.33d}\\
& \sum_{i=1}^{m} w_i = 1 \tag{3.33e}\\
& \sum_{r=1}^{s} v_r = 1 \tag{3.33f}\\
& \Lambda, S^\pm, t \ge 0 \tag{3.33g}
\end{align}
At the optimal solution, we have $\rho^* = \tau^*$, $\lambda^* = \frac{\Lambda^*}{t^*}$, $s^{-*} = \frac{S^{-*}}{t^*}$ and $s^{+*} = \frac{S^{+*}}{t^*}$.

The MSBM score cannot be greater than the RDM score as they have shown through


an example. The model allows for slack weight alteration, depending upon strategic or

managerial preferences.

3.19 Directional distance functions and slack-based

measures of efficiency

Fare and Grosskopf [Fare 10a, Fare 10b] also worked on Tone's SBM model and proposed the following, which does not require any adjustment for zero components in inputs or outputs, while being both translation and units invariant.

\begin{align}
\alpha_0 = \max\quad & \beta_1 + \dots + \beta_m + \gamma_1 + \dots + \gamma_s, \tag{3.34a}\\
\text{s.t.}\quad & \sum_{j=1}^{n}\lambda_j x_{ij} \le x_{i0} - \beta_i I_i, \quad i = 1,\dots,m \tag{3.34b}\\
& \sum_{j=1}^{n}\lambda_j y_{rj} \ge y_{r0} + \gamma_r I_r, \quad r = 1,\dots,s \tag{3.34c}\\
& \sum_{j=1}^{n}\lambda_j = 1 \tag{3.34d}\\
& \lambda_j, \beta_i, \gamma_r \ge 0 \quad \forall j, i, r. \tag{3.34e}
\end{align}

$I_i$ and $I_r$ are directional vectors of one unit in the measurement units of the corresponding input or output. This choice of vectors is necessary to keep the units-invariance characteristic: if input two is measured in kilograms, $I_2$ is one kilogram, and if it changes to grams, then $I_2$ is one gram. Please note that $\alpha_0 = 0$ if, and only if, all slacks are zero; it acts like an efficiency score of one in the SBM model.


3.20 Universal model for ranking

Paterson's extended non-oriented model is aimed at measuring super-efficiency and ranking the units [Pate 00]. This is not our focus at all; however, because the idea of how to treat our scores was formed while reading his work, we briefly mention it here. Two existing methods for ranking units in DEA are calculating the Malmquist index and measuring super-efficiency [Ande 93]. For the latter, oriented models suffer from some shortcomings, which is why Paterson proposed two non-oriented models to measure super-efficiency properly. We skip the radial model he proposed; the second one is an additive model used in the Andersen and Petersen procedure, i.e., each DMU is evaluated against the data set excluding the DMU itself. This implies the DMU under study might be inside or outside the convex hull of the others. Since this information is not available a priori, the following is solved under two different situations:

$\phi_i, \phi_o \ge 0$ for points inside the hull, and

$\phi_i, \phi_o \le 0$ for points outside the hull.

If the first has a feasible answer, the point is inside; if not, the second set of constraints is applied to (3.35), as presented below:

\begin{align}
\max_{\lambda, \phi_i, \phi_o}\quad & \sum_{i=1}^{m}\frac{1}{\sigma_i}\phi_i x_{ik} + \sum_{o=1}^{s}\frac{1}{\sigma_o}\phi_o y_{ok}, \tag{3.35a}\\
\text{s.t.}\quad & \sum_{j=1}^{n-1}\lambda_j x_{ij} \le (1 - \phi_i)\, x_{ik}, \quad \forall i = 1,\dots,m \tag{3.35b}\\
& \sum_{j=1}^{n-1}\lambda_j y_{oj} \ge (1 + \phi_o)\, y_{ok}, \quad \forall o = 1,\dots,s \tag{3.35c}\\
& \sum_{j=1}^{n-1}\lambda_j = 1 \tag{3.35d}\\
& \lambda_j \ge 0. \tag{3.35e}
\end{align}


The $\sigma_i$ and $\sigma_o$ are the standard deviations of each input and output over all n points. It is easy to show that the above formulation is units invariant. One criticism of this formulation is that the actual inputs and outputs are included in the objective, which brings the actual size of the unit into account and might give big units an advantage, something we prefer to avoid in DEA. The formulation is similar to the Russell graph measure; however, instead of a summation of scalars, the actual radial reduction of inputs and expansion of outputs is maximized. The novelty of this approach, from our point of view, is how Paterson extended it and normalized the scores. He added a worst possible DMU to the data set, a DMU having the maximum of every input and producing the minimum of every output. He calculated the super-efficiency of this worst unit and then normalized the scores of the rest with respect to the worst player.
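A minimal sketch of the worst-unit normalization idea follows; the super-efficiency solver is left as a hypothetical placeholder (it would solve model (3.35) for each unit), and dividing by the worst unit's score is one plausible reading of "normalized with respect to the worst player".

```python
import numpy as np

def normalize_against_worst(X, Y, super_efficiency):
    """X: (m, n) inputs, Y: (s, n) outputs.
    super_efficiency(X, Y, k) is a placeholder for a solver of model (3.35)."""
    # Append an artificial 'worst' DMU: maximum of every input, minimum of every output.
    Xw = np.hstack([X, X.max(axis=1, keepdims=True)])
    Yw = np.hstack([Y, Y.min(axis=1, keepdims=True)])
    scores = np.array([super_efficiency(Xw, Yw, k) for k in range(Xw.shape[1])])
    worst_score = scores[-1]
    return scores[:-1] / worst_score   # original units' scores, rescaled by the worst unit's score
```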

3.21 Remarks

We have reviewed about 20 of the existing models (almost all of them that we have come

across), yet none of them properly supported the ratio variables. In addition, none of

them with a linear computational complexity had the desired characteristics of units and

translation invariance in addition to providing an efficiency score between zero and one.

However, the method for transforming scores and how to normalize or achieve units or

translation invariant properties all represent valuable information, which we use when

developing our own model. The methodology pertinent to this literature review will

follow in Chapter 5.

Chapter 4

Literature review of approximation models

As mentioned in the first chapter, this chapter forms the second part of the literature review pertinent to this thesis. The methodology will follow in Chapter 6. The literature around the non-linearity problem that arises in the case of the BCC model with ratio variables on the side of orientation is, in fact, scarce. The only paper that discusses the issue directly, which was mentioned in Chapter 1, dates back to 2009 and was indeed the motivation for this thesis [Emro 09].

The LP formulation discussed in Chapter 2 is a tool to mathematically solve the DEA concept. The DEA concept remains the same for our case with ratio variables; however, the well-known tool, LP, for estimating the boundaries of the PPS based on the observed sample cannot be employed. We recast this problem as finding an estimate of the true production frontier from a sample of n observations (DMUs). There is a vast and

rich literature focusing on the goodness of the estimate of the frontier in DEA and the

statistical inferences about it [Knei 03, Dyso 10].

We have reviewed the literature on estimating the true frontier in DEA as you will

find below. The authors all pursue a goal like ours: estimating the unobserved boundaries


of the PPS based on an observed sample of DMUs. The sources of uncertainty studied in DEA are embedded in the nature of choosing, defining, judging, and measuring the

variables. A very good study by Dyson and Shale details various uncertainties in real

world situations about the true efficient frontiers, and the methods to deal with them

[Dyso 10]. Although for reasons that will become apparent at the end, we produce a

heuristic of our own, the literature has been helpful to shape our thoughts and give us

insight into the techniques employed mainly by statisticians. This is relevant to our case

as we also deal with problems in which a few samples of the PPS are available, and we

would like to know more about the actual population.

4.1 Bootstrapping and DEA

Bootstrapping is a well-established re-sampling technique to approximate the distribution

of a random variable in order to estimate a value such as the mean [Efro 79, Efro 82,

Efro 94]. It entails three basic steps:

• Construct the sample probability distribution Θ, giving each observation the chance

of 1/n;

• draw a random sample of size n with replacement and call this the bootstrap sample;

and

• approximate the bootstrap distribution of the variable of interest induced by the

random mechanism above, with Θ fixed.

The difficult part of the bootstrap procedure is the actual calculation of the bootstrap

distribution. Although in a few cases direct theoretical calculation is possible, Monte

Carlo approximation is often used. Repeated resamples of size n from the observed data are drawn with replacement and the corresponding values of the variable of interest are recorded. The pattern of these values is an approximation of the actual bootstrap

distribution. This is the method used in DEA.

In our case the variable of interest is an estimate of the true frontier. We have a

sample of the PPS, our observed DMUs, and would like to estimate the boundary of

PPS.

The application of bootstrapping in the context of Data Envelopment Analysis dates back to 1995 [Gsta 95] and was later developed by Simar and Wilson [Sima 98, Sima 00, Sima 99a, Sima 99c, Sima 99b]. They claim DEA models measure efficiency relative to a

non-parametric maximum likelihood estimate of an unobserved true frontier, conditional

on observed data resulting from an underlying and usually unknown data generating pro-

cess. They argue that because efficiency is measured relative to an estimate of the true

frontier, estimates of efficiency from DEA models are subject to uncertainty due to sam-

pling variation. Others, such as [Loth 99, Tzio 12, Ferr 97, Ferr 99], have also extended and modified Simar and Wilson's bootstrapping approach. The idea has been applied to real problems, as in [Alex 10, Sadj 10], to name a few. A full review of the methods is presented in [Sima 08].

The bootstrap algorithm has the following steps:

1. Calculate the original efficiency estimate using the LP presented in Chapter 2 and transform the observed input-output vectors as follows:
\[
\theta_i = \min_{\theta,\lambda}\left\{\theta : y_i \le Y\lambda,\ \theta x_i \ge X\lambda,\ \sum_{i=1}^{n}\lambda_i = 1,\ \lambda_i \ge 0\right\} \tag{4.1}
\]
\[
\left(x_i^f, y_i\right) = \left(\theta_i x_i,\ y_i\right) \tag{4.2}
\]

2. Resample independently with replacement n efficiency scores from the n original estimates $\theta_i$. Let $\delta_i^*$, $i = 1, \dots, n$, denote the resampled efficiencies. In some methods a noise term is added to the resampled efficiencies, followed by a correction mechanism; a so-called smoothing procedure championed by Simar and Wilson serves to increase the consistency of the bootstrap estimator. For a detailed discussion of this, please refer to [Sima 00]. Independent of the procedure chosen, we end up with n randomly selected efficiency scores to generate bootstrap pseudo-data.

3. Let the bootstrap pseudo-data be given by
\[
\left(x_i^*, y_i^*\right) = \left(x_i^f / \delta_i^*,\ y_i\right). \tag{4.3}
\]

4. Estimate the bootstrap efficiencies using the pseudo-data and the linear program of step 1:
\[
\theta_i^{*b} = \min_{\theta,\lambda}\left\{\theta : y_i \le Y\lambda,\ \theta x_i^* \ge X^*\lambda,\ \sum_{i=1}^{n}\lambda_i = 1,\ \lambda_i \ge 0\right\}. \tag{4.4}
\]
Again, there is a debate among scholars as to whether the efficiency estimate of the ith DMU should be evaluated as the efficiency of the original input or of the pseudo-input relative to the boundary of the convex and free-disposal hull of the pseudo-observations. Lothgren claims the latter eliminates the complex smoothing procedure of Simar and Wilson [Loth 98].

5. Repeat steps 2-4 B times to create B bootstrap estimates of each DMU's efficiency. These estimate the distribution of the efficiency score, its mean (which is of interest), and the confidence intervals; the 95% confidence interval lies between the 2.5th and 97.5th percentiles. B is usually chosen as 1000, following the recommendation of Efron and Tibshirani [Efro 94].

There is another variation of bootstrapping used in DEA, which is called bootstrap with

sub-sampling. This method essentially allows a smaller sample size m < n from the

original n observations to be used for bootstrapping. Similar to the original bootstrap,

repeated samples are drawn uniformly, independently and with replacement [Knei 03].

Through an algorithm detailed in their work, the procedure creates a consistent estimate similar to the smoothed version.


The main motivation to employ bootstrapping in Simar and Wilson's work was to account for the measurement error in inputs and outputs. They wanted to study statistical

properties of the non-parametric frontier and were interested in the confidence intervals.

In the presence of ratio variables, because of the non-linear formulation for the efficiency

estimates, we have no estimate of the true frontier to begin with. With the knowledge

of the case under study, for instance bank branch performance, we can hypothesize

a sensible distribution for the efficiency scores but without an operational mathematical

program we have no way of estimating original frontier inputs for each branch. The data

generating process assumed by most of these studies is based on a random deviation from

the input frontier for a given output, hence we need a sample of efficiency scores and

the original outputs and the estimate of the efficient input level which is not available

to us. Nevertheless we use the idea and define a data generating process for PPS based

on re-sampling with replacement from the observed DMUs. Because we are interested

in the samples, perceived to be closer to the true frontier, the sub-sampling idea is of

interest and the next logical step is to study sampling techniques.

4.2 Sampling techniques

The Monte Carlo method was first publicly introduced in 1949 to solve integrals that were otherwise hard to evaluate [Metr 49]. It was quickly developed as applications in physics, business,

computing, finance, and engineering adopted it. Every Monte Carlo calculation such

as the one used in the bootstrapping, as seen in the previous section, requires repeated

sampling of random events that, in some way, represents or defines the phenomenon of

interest. This repeated sampling is a way of simulating the behaviour of the phenomenon.

In this way, the samples or simulation can be used to make approximations of properties

of interest. For well-known distributions standard uniform sampling methods exist which

have been incorporated in most computer packages. For real-life problems, where the variables representing the event do not fit the standard distributions, other techniques have been developed to decrease the chance of misrepresenting the population. Popular ones are importance sampling, rejection sampling (Stan Ulam and John von Neumann), Metropolis sampling [Metr 53], Metropolis-Hastings sampling [Hast 70] and Gibbs sampling [Gema 84]. Among these, rejection sampling is of practical interest to us, as explained next.

Rejection sampling

The idea behind rejection sampling is to use standard techniques to draw samples from an envelope distribution that lies over the required distribution. The goal is that, by accepting some draws and rejecting others, the resulting sample mimics the population of interest. We use this idea later, in Chapter 6, to reject samples that have a small chance of being close to the efficient frontier.
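A generic sketch of the accept/reject step is given below; the target density f, the envelope g, and the bounding constant M are illustrative assumptions, not quantities defined in this thesis.

```python
import numpy as np

def rejection_sample(f, g_sample, g_pdf, M, size, seed=0):
    """Draw `size` samples from density f using an envelope g with f(x) <= M * g_pdf(x)."""
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < size:
        x = g_sample(rng)                            # propose from the envelope distribution
        if rng.uniform() <= f(x) / (M * g_pdf(x)):   # accept with probability f(x) / (M g(x))
            out.append(x)
    return np.array(out)

# Example: sample a standard normal using a Cauchy envelope (M = 1.53 suffices).
# samples = rejection_sample(
#     f=lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi),
#     g_sample=lambda rng: rng.standard_cauchy(),
#     g_pdf=lambda x: 1.0 / (np.pi * (1 + x**2)),
#     M=1.53, size=1000)
```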

Markov Chain Monte Carlo Methods (MCMC)

The Monte Carlo simulation algorithm simulates independent random values from the probability distribution of interest. In MCMC algorithms, by contrast, there is a dependence between simulated values: an MCMC algorithm works by small random jumps that explore the distribution of interest.

Metropolis is one of the MCMC algorithms. This algorithm constructs a Markov chain by proposing small probabilistic symmetric jumps centered on the current state of the chain. These are either accepted or rejected according to a specified probability, and in the case of rejection the next step in the chain equals the previous step [Metr 53]. Hastings later generalized the algorithm so that the jumps in the Metropolis-Hastings algorithm do not have to be symmetric, with the acceptance probability adjusted accordingly [Hast 70]. Gibbs sampling is a special case of the Metropolis-Hastings approach that uses conditional probabilities for the steps.
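A small sketch of the random-walk Metropolis step described above; the target density (supplied as a log-density) and the Gaussian proposal are illustrative assumptions.

```python
import numpy as np

def metropolis(log_target, x0, n_steps, step_size=0.5, seed=0):
    """Random-walk Metropolis: symmetric Gaussian jumps, accepted with probability min(1, ratio)."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    chain = np.empty(n_steps)
    for t in range(n_steps):
        proposal = x + step_size * rng.standard_normal()    # symmetric jump around the current state
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal                                    # accept the jump
        # otherwise the chain repeats the previous state, as described above
        chain[t] = x
    return chain

# Example: explore an unnormalized density proportional to exp(-x**4 / 4).
# chain = metropolis(lambda x: -x**4 / 4, x0=0.0, n_steps=5000)
```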


There are examples in the literature that show the above mentioned techniques used

alongside DEA. For instance Gibbs sampling has been used in stochastic frontier analysis

together with DEA to improve estimates of the efficiency [Tsio 03]. For a two output and

one input case under VRS assumptions, Gstach used DEA to generate a frontier estimate

and devised a DGP based on output proportions. The author used the Metropolis-Hastings method to statistically define the unobserved output targets given an input and output mix [Gsta 03]. However, in none of these works was the Monte Carlo method used to estimate the DEA efficiency scores as we intend to do.

4.3 Summary

Although bootstrap and Monte Carlo techniques are both used to simulate a statistical

measure by repetitive sampling, they are different in many ways. The main difference is

that in bootstrap there is a readily available sample of size n from the population, and

re-sampling is based on that, whereas in Monte Carlo, we have no samples to begin with,

and we try to construct a proper data generating mechanism to provide us with a sample

of size n to represent the population.

Monte Carlo is mostly used in designing experiments, where the production function,

inputs and outputs and true efficiency scores are known. The goal is to compare different

models, according to the efficiency scores they produce and how those scores deviate from

the true frontier. The general form of production function is Y = f(X)+error, where Y

and X are random variables representing output and input vectors. The production func-

tion needs a-priori information about the underlying technology (CRS, VRS ...). Errors

consist of noise and technical inefficiency. Here x, a realization of X, is drawn randomly and the corresponding y is derived according to the formula. Banker et al. [Bank 93, Bank 87]

used Monte Carlo to simulate decision making units to compare DEA with some statisti-

cal measures. The same is done by [Gong 92] to compare DEA and stochastic frontiers.


For our work, however, we are interested in the frontier, and, as with Monte Carlo, we have no samples of it to begin with: we have a few samples of the PPS, but no direct mathematical formula to generate the frontier. We do, however, have a sample of the PPS (the observed DMUs), as in the bootstrap, and we can sub-sample from it to explore the PPS. We can estimate the PPS partially from those samples and then, via an LP, find close-to-frontier estimates. The details of this appear in the methodology in Chapter 6.

Chapter 5

Methodology: Proposed Non-oriented Model

In Chapter 3 we reviewed the DEA literature around the existing non-oriented models, and in Chapter 1 we delved into the difficulties encountered with ratio variables [Holl 03, Emro 09, Siga 09] due to the nonlinear DEA formulation. Here, we are seeking a model that can be transformed into a form that is amenable to being solved by linear programming or, at least, fractional linear programming when ratio variables are mixed with non-ratio variables. This is accomplished in two parts. The first part is to establish how to redefine the DEA model with the use of ratios in mind, avoiding the incorrect calculation of the PPS. The second is to build a non-oriented model that satisfies a list of desirable properties, which we compiled from the literature. More importantly, our model should remain computationally feasible when the ideas from the first part are applied to it.

5.1 Required adjustments to the basics of DEA

Let us first try to rework DEA by going back to the original concept by Farrell [Farr 57]. We break the DEA procedure into three parts: first, we define the production possibility set (PPS) to include all the feasible units, i.e., the observed units as well as interpolations of them; second, we identify the efficient units; and third, we measure the changes the inefficient units need to make in order to become efficient.

5.1.1 Defining PPS

DEA is very data oriented. This means that the perceived potential of a business wholly

depends on the observed data. There are various assumptions/rules that allow inter-

polation/extrapolation of the possible units based on the observed points. This means

that no matter what the assumption is, once the sampled/observed data varies, the PPS

may change and, as a result, the sensitivity of DEA to the data at hand is significant.

A remedy for this is comprehensively sampled data from good performers, average per-

formers and weak performers, based on educated guesses or the recommendation by the

management. For example, in the banking industry having the data from branches well

spread over different regions (provinces, urban, metropolitan, and rural) would mitigate

the sensitivity to the data. It is worth noting that adding new data (a new unit) can

only decrease the efficiency score. Popular assumptions, or in other words the accepted rules based on which the points of the PPS are interpolated/extrapolated from the observed points, are listed below, with the choice of an appropriate model depending on the knowledge domain regarding the industry.

5.1.2 Disposability

If a unit with input X and output Y is observed, it means that any unit worse than

this, with inputs larger than/equal to X and outputs smaller than/equal to Y can be

realized. This is a fundamental assumption to form the PPS out of the available data.

Free disposability or strong disposability does not impose any rules on “being smaller”

or “larger”. Weak disposability defines a rule, for example when (X, Y ) is observed,

(X,αY ) can be realized if 0 ≤ α ≤ 1 [Rolf 89]. A combination of weak and strong


disposability might be applied, depending on how the inputs/outputs are defined, the

nature of those inputs/outputs, and to what degree various outputs are linked together.

For example in an energy production sector, for the same input, the harmful emissions

cannot be reduced without reducing the actual electricity generated or introducing new

technology.

5.1.3 Convexity

If any two units are attainable, any unit representing a weighted average of them is feasible. The convexity assumption has its roots in economic theory. Although it sounds practical, some have challenged the limitation that convexity imposes in attaining the PPS. Deprins et al. [Depr 84] claim that it is not a good fit for the data at hand to generate better performers that eventually reduce the efficiency of the observed points. They point out that it is, instead, good for creating future projections [Tulk 93]. There are other valid concerns about convexity in a number of industries, specifically in production. Cherchye et al. have documented this issue quite well [Cher 99].

Here are some of the limitations that the convexity axiom has. Convexity requires production activities (which usually translates to inputs and outputs) to be divisible [Farr 59], and this is not certain in every business. For example, some raw materials come in batches and 1.2 batches is infeasible, or you cannot have 1.5 workers for labour. In general, though, indivisibility is not a huge issue because variables are divisible to some extent and the problem can be solved by approximation. The other issue is economies of scale, or increasing marginal productivity. Say increasing inputs twofold results in output levels being more than doubled, but only above a certain level, like X > 3.5T; then the weighted average of two units, one enjoying economies of scale (4T, 6Y) and one not (2T, 2Y), might be something infeasible like (3T, 4Y). A simple remedy would be to partition the data into sets with similar characteristics, like units with X < 3.5T in one group and X > 3.5T in another. The same argument applies to diseconomies of scope, such as selling as a bundle, which would cause a problem, but treating the bundle as a product, rather than breaking it down, would at least partially solve the problem.

The aforementioned problems with economies of scale, divisibility, and diseconomies

of scope [Bous 09], have little relevance to our problem of interest. Therefore we use the convexity assumption in our modeling. For research dealing with risk aversion, the

convexity assumption should be used with caution or avoided entirely. While generating

the PPS, we also need to consider customized rules/limitations based on practicality,

experts’ opinion and so on.

5.1.4 Identifying the efficient units

Contrary to the general perception that identifying efficient units depends on how one defines efficiency, we argue that the efficient units depend on how one defines the PPS in the DEA model and have less to do with the type of efficiency. Basically, efficient units are sitting on the

frontier of the PPS. We might be looking only at specific parts of this frontier, or frontier

facets [Ali 93, Oles 03, Apar 07, Frei 99], depending on the type of efficiency/orientation.

All the frontier units are radially efficient units either in terms of input or output, whereas

slack-based efficient units are a subset of the radially efficient units and might not include

all of them. Frontier units do not change for a specific PPS; the efficiency score of the

inefficient ones might change though, depending on how we measure the distance (radial,

slack, input/output or orientation) to the efficient frontier. In the next part we look at

the ways the distance/path to frontier is defined.

5.1.5 Calculating the relative efficiency score

Relative efficiency scores are widely known as scalars which determine how much a unit must decrease all its inputs or increase all its outputs to become technically efficient. As explained in Chapter 2, a DMU is Pareto-efficient if it bears no slacks/shortfalls in any of its inputs/outputs. For a Pareto-efficient unit, any reduction in any of the inputs, or any increase in any of the outputs, would make the unit infeasible unless we, respectively, increase some other inputs or decrease some other outputs. Note that the amount of

substitution and trade-off is not part of the decision-making process here. As a result,

radial efficiency scores do not reflect the possibility of input or output substitution, or

generally removing slacks. The CCR model of Charnes et al. [Char 78] and the BCC

model of Banker et al. [Bank 84] and even the FDH model [Depr 84, Tulk 93] calculate

radial efficiency. Cooper et al. and Tone later modified the BCC and CCR models by

adding a second stage to examine if the target (a unit on the frontier) can be further

improved by eliminating slacks [Coop 04, Tone 99].

Radial efficiency has its own merits, such as good “reference” selection. Close targets require minimal changes for an inefficient unit to become efficient [Conc 03]. Since

the goal is to project the unit to a part of the frontier that requires minimum change

in inputs/outputs, the unit is compared with the ones in its league. In other words, we

evaluate the unit in the best possible light. Radial efficiency is sometimes called the

Farrell-Debreu efficiency measure [Coop 99a].

Non-radial models first appeared in Fare and Lovell’s work in 1978 [Fare 78]. The

model was designed to reduce slacks in input or output, but not both, by individual

scalars for every input or output instead of the same scalar for all, as in CCR. They later

combined the two and defined the Russell graph measure [Fare 85]. Then, Charnes et

al. [Char 85] produced the additive model with the flexibility to change both inputs and

outputs and the goal of eliminating slacks as much as possible. Others also worked on the

non-radial efficiency measures, such as Zieschang [Zies 84], who created a hybrid measure of Farrell and Russell's input efficiency, and Green et al. [Gree 97], who suggested calculating efficiency as the ratio of current levels to the optimum ones and adding them up; the latter, in fact, created a nonlinear additive model with a meaningful bounded measure.

Thanassoulis and Dyson [Than 92] made a hybrid model of the Russell and additive


models, forcing the preferred variables to have zero slacks while other variables’ slack is

reduced as much as possible. The score, however, does not convey an operational mean-

ing and the method is mainly used for target setting. The selection of weights is also an

issue and fully subjective. Ruggiero et al. [Rugg 98] came up with a weighted Russell

measure, in which the performance depends on the right choice of weights, and if the

relative weights are biased, distortions might be introduced. If there is one output, ordi-

nary least square regression can be used and to choose weights for multiple output cases,

the canonical regression analysis is the optimum method. To avoid biases, it is recom-

mended that the canonical regression is performed on the group of Farrell efficient units.

Other models include: the model for imprecise data by Cooper et al. [Coop 99b], models

for dealing with congestion in [Fare 85], [Coop 01a], [Coop 01b], [Broc 98], Cherchye et

al. [Cher 01], and a model for environmental performance when undesirable outputs are

present by Zhou et al. [Zhou 07].

With the prior knowledge that radial models do not work at all and cannot be lin-

earized when ratio and non-ratio variables are mixed, we focus on non-radial models to

build a desired non-oriented model.

5.2 Building the right measure of Efficiency

When inputs and/or outputs are in the form of ratios, conventional linear programming

methods may fail to build the correct frontier and, as a result, the scores and projections

are distorted. Recall that the main concern about using ratios as input or output variables

in the context of the conventional DEA is due to the fact that DEA estimates the PPS out

of the n available data points. Conventional DEA identifies each DMUi by its production

process (xi, yi) and works with a linear combination of (xi, yi)s. As long as xi and yi do

not contain any ratios, working with inputs and outputs translates exactly into working

with DMUs. However, merging inputs and outputs is not equivalent to merging DMUs


when ratios are involved. For example, the composite output $Y_r$ is given by:
\[
\left[Y_r = \frac{N}{D}\right]_{DMU_{composite}} = (\lambda_1\%)\left[Y_{r1} = \frac{N_1}{D_1}\right] + \dots + (\lambda_n\%)\left[Y_{rn} = \frac{N_n}{D_n}\right], \tag{5.1}
\]
which is not equal to the output of a composite DMU, $Y_r^*$, in (5.2) given by:
\[
\left[Y_r^* = \frac{N}{D}\right]_{DMU_{composite}} = \frac{(\lambda_1\%)(N_1) + \dots + (\lambda_n\%)(N_n)}{(\lambda_1\%)(D_1) + \dots + (\lambda_n\%)(D_n)}. \tag{5.2}
\]
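A tiny hypothetical illustration of the mismatch, with two DMUs whose ratio outputs are 10/20 and 30/40 and equal weights $\lambda_1 = \lambda_2 = 50\%$:
\[
(0.5)\frac{10}{20} + (0.5)\frac{30}{40} = 0.625 \;\neq\; \frac{(0.5)(10) + (0.5)(30)}{(0.5)(20) + (0.5)(40)} = \frac{20}{30} \approx 0.667,
\]
so combining the ratio outputs directly, as in (5.1), does not reproduce the ratio formed from the combined numerators and denominators in (5.2).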

Hollingsworth et al. [Holl 03] believed that the BCC formulation is the appropriate model when ratios are involved because $Y_r^* = Y_r \Rightarrow \sum_i \lambda_i = 1$. However, as we proved in [Siga 09], for the radial DEA approach to be valid when using ratios, the BCC model is a necessary condition, but not a "sufficient" one. Emrouznejad [Emro 09]

examined the problem and suggested two models. The first breaks down ratios to their

numerator and denominator parts and treats one part as input and the other as output.

The second one is based on the idea of working with DMUs rather than their production

processes. We will use the latter to build our models. We do not go through the details

of Emrouznejad’s models because they have been covered in [Siga 09]. In [Siga 09], we

added a second stage to the Emrouznejad second model to eliminate any remaining slacks

after the radial projection onto the frontier. Success with this case has inspired us to

build a model that seeks targets with zero slacks.

5.2.1 Proposed non-oriented model

This research is directed toward non-oriented measures of efficiency in situations where

the use of ratio form in elements of inputs and outputs is selected due to managerial

judgments. We want our model to capture all inefficiencies (zero slacks) and our measure

to incorporate all the identified inefficiencies in the form of a single real number, which, by

itself, requires the whole process to be operationalized by the use of linear programming.

As we have seen in the literature review of Chapter 3, the development of non-oriented models that aim to assess performance and decrease inputs while increasing outputs has been a topic of interest in the literature since 1992, with the last paper dating to 2010.

Among the scholars working on this topic, Pastor has contributed the most, in general. A

quick review is presented in Table 5.1. We would like to restate the definitions to clarify

the types of improvements which non-oriented models suggest. There exist two constructs

in every model: input inefficiency and output inefficiency. For each of the constructs, the

potential for improvement is measured. How it is measured depends on the technology and methodology the analyst has in mind: it could be overall radial (scaling all the components of the input or the output up/down simultaneously); component-wise (scaling every input/output component individually while keeping the others constant, and aggregating the scalars by summation or by multiplication); or based on the distances from the frontier, with the preference/importance of the distances set by value judgment and again aggregated by summation or multiplication.

The next step usually involves aggregating the two constructs of input efficiency and

output efficiency, which could be simply done by adding them up or taking the ratio.

In this stage, the concern is to define this measure in a way that satisfies certain re-

quired/preferred properties [Coop 99a, Love 95a, Coop 11]. We have compiled a list of

properties we found in the literature. Hardly any existing model meets all the requirements. Here, we list the desired properties with a brief description to identify what, for us, has been important and considered while developing our model. The mathematical properties that we want our model and the resulting efficiency score, Ω, to satisfy are:

1. 0 ≤ Ω ≤ 1;

2. Ω = 1 only if the unit is fully efficient and Ω = 0 means fully inefficient;

3. Ω is dimensionless: invariant to change of units;

4. Ω is not affected by change of origin or shifts in data: invariant to translation

5. Ω is invariant to alternative optima;

Table 5.1: Non-oriented models and their properties

Year | Name | Score range | Meaning | Type | Units invariant | Translation invariant | Zero inputs/outputs | Computational degree
1985 | Russell Model | 0-1 | No slack | non-Radial | Y | N | Y/N | Non-linear fractional
2012 | Refined Russell Model | 0-1 | No slack | non-Radial | Y | N | Y/Y | Non-linear multi-step
1982 | Multiplicative Model (log efficiency) | 0-1 | log efficient | Log non-Radial | N | N | N/N | Log-linear multi-step
1983 | Invariant Multiplicative Model | 0-1 | log efficient | Log non-Radial | Y | N | N/N | Log-linear multi-step
1985 | Pareto-efficient empirical production function model | -infinity-0 | No slack | non-Radial | N/Y | Y/N | N/N | Linear
1995 | Global Efficiency Measure | 0-1 | No slack | non-Radial | Y | N | N/N | Non-linear/Linear
1999 | Range Adjusted Measure | 0-1 | No slack | non-Radial | Y | Y | N/N | Linear
1994 | Constant weighted additive model | 0-1 | No slack | non-Radial | N | N | | Linear
1995 | Normalized weighted additive model | -infinity-0 | No slack | non-Radial | Y | Y | Y/Y | Linear
1999 | Enhanced Russell Measure | 0-1 | No slack | non-Radial | Y | N | N/N | Fractional non-linear/Linear
1996 | Directional Distance Function | | radial eff | radial directional | Y | N | N | Linear
1985 | Graph Hyperbolic measure | 0-1 | radial eff | radial efficiency | Y | N | N/N | Non-linear
2011 | Bounded Adjusted Measure | 0-1 | No slack | non-Radial | Y | N (Y in the VRS case) | N/N | Linear
2001 | Slack Based Measure | 0-1 | No slack | non-Radial | Y | N | Y/Y | Linear
2004 | Range Directional Model | 0-1 | technical eff | non-Radial | Y | Y | Y/Y | Linear
2007 | Modified Slack-based Measure | 0-1 | No slack | non-Radial | Y | Y | Y/Y | Linear
2010 | Slack-free RDM | 0-1 | No slack | non-Radial | Y | Y | Y/Y | Linear

6. Ω is isotonic; and

7. Ω possesses discriminating power.

The first and second properties conform to the common practice in other disciplines

as well as main DEA literature, where 100% means fully efficient. The third property

is of great value and guarantees that the solution is not affected by change in units of

measurement. We saw an example earlier of how the absence of this property could lead to misjudgments. The fourth property is essential when it comes to dealing with zero and negative data. It allows for adding a constant to the data in order to transform the negative/zero elements into positive values and remain assured that this will not

affect the Ω. However, the reader should bear in mind that there exist models which do

not possess this property but can deal with negative or zero data in some other fashion.

Property five is required to maintain the independence of the efficiency score from the

reference set (benchmark), so the route to efficiency might be different but the amount

of overall inefficiency to address would not be affected by the choice. Property six is

saying that for every inefficient unit, improvement in every element, while holding the

rest fixed, will result in improvement in the efficiency score. We will not seek this strict

monotonicity for the efficient units though because Pareto efficiency, by definition, means

no further improvement is possible. Moreover, weak monotonicity at extreme points

ensures robustness [Coop 99a]. The last property is desired so that the model/measure

makes use of the full range between zero and one. In many cases, the density function of

the score is very sharp in a specific range, which means that the majority of scores are compressed into that range, which prevents proper discrimination. This can be checked using

kernel density estimation, as used in a very recent research paper [Chen 14]. A summary

of the models and to what extent they meet the desired characteristics is demonstrated

in Table 5.2.

The thought process we went through in the search for the proper model is depicted in Figure 5.1. With prior knowledge that radial models will not work at all in


Table 5.2: Summary of models and desired properties (P: the model possesses the property; O: it does not)

Models (columns, left to right): RGM, RRM, MUM, IMUM, ADDM, GEM, RAM, CWADDM, NWADDM, ERM, SBM, DDF, DSBMI, GHM, BAM, RDM, IRDM, SFRDM, MSBM

0 ≤ Ω ≤ 1: P P O O O P P P O P P P O P P P P P P
Ω = 1 iff fully efficient: P P O O O P P P O P P O O O P O O P P
Ω = 0 iff fully inefficient: O O O O O P O O O O O O O P O O P P
Ω is units invariant: P P O P O P P O P P P P O P P P P P P
Ω is translation invariant: O O O O P O P O P O O O P O P P O P P
Ω is strongly isotonic: O P P P P P O P P P P O P O O P P O P
Ω is invariant to alternative optima: P P P P P P P P P P P P P P P P P P P
Ω possesses discriminating power: P P P O O O O P O P P P O P P O O P P
Computational degree: O O O P P P O P P P O P P P P P P P P P

Abbreviations: Russell Graph Model = RGM, Refined Russell Model = RRM, Multiplicative Model (log efficiency) = MUM, Invariant Multiplicative Model = IMUM, Additive model = ADDM, Global Efficiency Measure = GEM, Range Adjusted Measure = RAM, Constant weighted additive model = CWADDM, Normalized weighted additive model = NWADDM, Enhanced Russell Measure = ERM, Slack-based Measure = SBM, Directional Distance Function = DDF, Directional slack-based measure of inefficiency = DSBMI, Graph Hyperbolic Measure = GHM, Bounded Adjusted Measure = BAM, Range Directional Model = RDM, Inverse Range Directional Model = IRDM, Slack-free RDM = SFRDM, Modified Slack-based Measure = MSBM.


Figure 5.1: The blueprint to construct a non-oriented DEA model (a map of the design choices: input minimization and output maximization may each be treated full-radially, item-radially, non-radially (additively), or multiplicatively, with the two constructs aggregated by ratio, summation, or multiplication, taking the min or max).

our case, we focus on non-radial models. We will deal with n DMUs in technology $T$, with m inputs $x \in \mathbb{R}^m$ and s outputs $y \in \mathbb{R}^s$. We assume all data are strictly positive for now; we will relax this assumption later. The production possibility set (PPS) is defined as $\{(x, y) \mid x \text{ can produce } y\}$. The production function $P : \mathbb{R}^m \rightarrow \mathbb{R}^s$ is given by $P(x) = \max\{y \mid (x, y) \in PPS\}$, so $P(x)$ is the maximum attainable output from resource level $x$. In the context of relative evaluation, in practice we study a bundle like $(x_0, y_0)$: we can define $P(x_0)$ on a subset of the PPS and look for the maximum attainable output from resource level $x_0$ that produces at least $y_0$. We also define the set $L(y) = \{x \mid P(x) \ge y\}$, the set of resources that can produce at least $y$. The set $isoq\,L(y)$ consists of the $x$ that cannot be reduced radially, in other words at least one of the inputs is fully efficient (zero slack), whereas the set $eff\,L(y)$ contains the $x$ that are fully efficient in every input, with zero slacks. It is clear that $eff\,L(y) \subseteq isoq\,L(y)$. These two sets are defined as follows:
\[
isoq\,L(y) = \left\{x \mid x \in L(y),\ \theta x \notin L(y)\ \forall \theta \in [0, 1)\right\},
\]
\[
eff\,L(y) = \left\{x \mid x \in L(y),\ x' \le x,\ x' \ne x \Rightarrow x' \notin L(y)\right\}.
\]

From here, we could define various functions to capture the input inefficiency and the output inefficiency separately and aggregate them appropriately in order to satisfy the desired attributes mentioned earlier. Then comes the important question of operationalizing the idea, which enables us to compute the measure.

5.2.2 Model in the making

Going back to Figure 5.1, we go through every possible known model and evaluate whether it is a good fit for our case, which essentially means investigating whether ratios can be mixed with non-ratio (normal) variables in the formulation, where the convexity axiom holds, without any distortion of the frontier. As mentioned in Chapter 1, with reference to [Cook 14], by ratio variable we mean a variable composed of two other variables specific to the DMU. Therefore a change of units/scaling, e.g., the various forms of reporting grades in school subjects, is not seen as a ratio variable, and any standard DEA model which is units invariant would give valid results. We are considering the cases where the convexity assumption is applicable and the components of the ratios are known. To this end, we ask the following research questions:

• How can we formulate the convex hull of PPS?

• Can it be operationalized via linear programming?

• To what extent does it meet our preferred properties?

Individual Scalars

Let us start with individual scalars, as in the Russell measure [Fare 85], in which the

efficiency function is given by aggregating the following constructs:

\[
\Omega_I(x, y) = \min\left\{\sum_i \theta_i \;\middle|\; (\theta_1 x_{1k}, \dots, \theta_m x_{mk}) \in L(y),\ 0 < \theta_i \le 1\right\}
\]
\[
\Omega_O(x, y) = \max\left\{\sum_i \phi_i \;\middle|\; x \in L(y_{1k}\phi_1, \dots, y_{sk}\phi_s),\ \phi_i \ge 1\right\}
\]

Before deciding on the measure formulation, we need to clarify what $(\theta_1 x_{1k}, \dots, \theta_m x_{mk}) \in L(y_{1k}\phi_1, \dots, y_{sk}\phi_s)$ means for the ratio variables. Given the observed data, we can construct the empirical PPS, which is the convex hull of the observed units. The collection of constraints that defines the empirical PPS is given by:

\[
\frac{\sum_{j=1}^{n}\lambda_j\, nx_{ij}}{\sum_{j=1}^{n}\lambda_j\, dx_{ij}} \le \theta_i x_{ik}, \quad x_{ik} = nx_{ik}/dx_{ik} \quad \forall i = 1,\dots,q;
\]
\[
\sum_{j=1}^{n}\lambda_j x_{ij} \le \theta_i x_{ik} \quad i = q+1,\dots,m;
\]
\[
\sum_{j=1}^{n}\lambda_j y_{ij} \ge \phi_i y_{ik} \quad \forall i = r+1,\dots,s;
\]
\[
\frac{\sum_{j=1}^{n}\lambda_j\, ny_{ij}}{\sum_{j=1}^{n}\lambda_j\, dy_{ij}} \ge \phi_i y_{ik}, \quad y_{ik} = ny_{ik}/dy_{ik} \quad \forall i = 1,\dots,r;
\]
\[
\sum_{j=1}^{n}\lambda_j = 1; \qquad \lambda_j \ge 0.
\]

We can define some possible measures such as:
\[
1)\ \frac{\Omega_I/m}{\Omega_O/s}, \qquad 2)\ \frac{\Omega_I}{m} + \frac{\Omega_O^{-1}}{s}, \qquad \text{and} \qquad 3)\ \frac{\Omega_I + \Omega_O^{-1}}{m + s}.
\]

It is trivial but worth mentioning that in the process of aggregating constructs, the

constraints of each construct should hold for the others as well, since aggregating is not

done ex post facto but considers both constructs simultaneously. The above constraints

can be linearized by the following change of variables:

\[
\theta_i\lambda_j = \omega_{ij}, \qquad \phi_i\lambda_j = \chi_{ij}.
\]

The first measure, although nonlinear, can theoretically be transformed into a fractional linear program as in [Past 99b]. The way we rewrite the formulation with this change of variables turns it into something similar to an enhanced Russell measure, so the ratio variables have been incorporated without adding to the complexity of the formulation, as seen below:

\begin{align}
\min\quad & \Omega = \frac{\sum_{i=1}^{m}\theta_i / m}{\sum_{i=1}^{s}\phi_i / s}, \tag{5.3a}\\
& \sum_{j=1}^{n}\lambda_j\, nx_{ij} \le \sum_{j=1}^{n}\omega_{ij}\, dx_{ij}\, x_{ik}, \quad \forall i = 1,\dots,q \tag{5.3b}\\
& \sum_{j=1}^{n}\lambda_j x_{ij} \le \theta_i x_{ik}, \quad i = q+1,\dots,m \tag{5.3c}\\
& \sum_{j=1}^{n}\lambda_j y_{ij} \ge \phi_i y_{ik}, \quad \forall i = r+1,\dots,s \tag{5.3d}\\
& \sum_{j=1}^{n}\lambda_j\, ny_{ij} \ge y_{ik}\sum_{j=1}^{n}\chi_{ij}\, dy_{ij}, \quad \forall i = 1,\dots,r \tag{5.3e}\\
& \sum_{j=1}^{n}\omega_{ij} = \theta_i, \quad \forall i = 1,\dots,q \tag{5.3f}\\
& \sum_{j=1}^{n}\chi_{ij} = \phi_i, \quad \forall i = 1,\dots,r \tag{5.3g}\\
& \sum_{j=1}^{n}\lambda_j = 1 \tag{5.3h}\\
& \lambda_j \ge 0. \tag{5.3i}
\end{align}

Despite having successfully included ratio variables without distorting the PPS, the above

model does not meet all the desired properties we listed before. Model (5.3a) is similar to

the enhanced Russell model we discussed in Chapter 3 in terms of complexity, and in that the objective function is not linear. Let us transform this into something linear using the

fractional linear programming technique. We introduce new variables as:

\[
\beta = \left(\frac{\sum_{i=1}^{s}\phi_i}{s}\right)^{-1}, \qquad t_i^- = \beta\theta_i, \qquad t_i^+ = \beta\phi_i;
\]
\[
\mu_j = \beta\lambda_j, \qquad \omega_{ij} = \beta\omega_{ij}, \qquad \chi_{ij} = \beta\chi_{ij},
\]
where, with a slight abuse of notation, the rescaled $\omega_{ij}$ and $\chi_{ij}$ keep their names. Using the above, any optimal solution to (5.4) gives an optimal solution to (5.3).

The LP formulation is given by:
\begin{align}
\min\quad & \frac{\sum_{i=1}^{m} t_i^-}{m} \tag{5.4a}\\
\text{s.t.}\quad & \frac{\sum_{i=1}^{s} t_i^+}{s} = 1 \tag{5.4b}\\
& \sum_{j=1}^{n}\mu_j\, nx_{ij} \le \sum_{j=1}^{n}\omega_{ij}\, dx_{ij}\, x_{ik}, \quad \forall i = 1,\dots,q \tag{5.4c}\\
& \sum_{j=1}^{n}\mu_j x_{ij} \le t_i^- x_{ik}, \quad i = q+1,\dots,m \tag{5.4d}\\
& \sum_{j=1}^{n}\mu_j y_{ij} \ge t_i^+ y_{ik}, \quad \forall i = r+1,\dots,s \tag{5.4e}\\
& \sum_{j=1}^{n}\mu_j\, ny_{ij} \ge y_{ik}\sum_{j=1}^{n}\chi_{ij}\, dy_{ij}, \quad \forall i = 1,\dots,r \tag{5.4f}\\
& \sum_{j=1}^{n}\omega_{ij} = t_i^-, \quad \forall i = 1,\dots,q \tag{5.4g}\\
& \sum_{j=1}^{n}\chi_{ij} = t_i^+, \quad \forall i = 1,\dots,r \tag{5.4h}\\
& \sum_{j=1}^{n}\mu_j = \beta \tag{5.4i}\\
& \mu_j, t_i^\pm, \beta \ge 0 \tag{5.4j}
\end{align}

The above has been transformed into a linear program, of course at the cost of extra variables and constraints. It still does not meet the translation invariance property, so we go on building a different measure.

To try the next form of aggregation, we select $\frac{\Omega_I + \Omega_O^{-1}}{m+s}$ as the measure, with the note that if any zeros exist in the inputs or outputs, they are omitted and the denominator should count only the strictly positive variables. This measure is less biased than the other choice, $\frac{\Omega_I}{m} + \frac{\Omega_O^{-1}}{s}$, which averages input efficiency and output efficiency separately and might assign more importance to the side with fewer variables, because fewer inputs or outputs make the denominator smaller. For this one, we choose another change of variables, as noted below. In addition, we multiply $y_{ik}$ by $\sum_{j=1}^{n}\lambda_j$, which will not change anything since at the optimal point this sum equals one. The transformations are given by:

\[
\phi_i^{-1} = z_i, \qquad \frac{\lambda_j}{z_i} = \lambda_j\phi_i = \chi_{ij}, \qquad \theta_i\lambda_j = \omega_{ij},
\]
\[
\sum_{j=1}^{n}\lambda_j y_{ij} \ge y_{ik}\,\frac{\sum_{j=1}^{n}\lambda_j}{z_i} \quad \forall i = r+1,\dots,s.
\]

With the new changes the LP will look like the following:
\begin{align}
\min\quad & \Omega = \frac{\sum_{i=1}^{m}\theta_i + \sum_{i=1}^{s} z_i}{m + s} \tag{5.5a}\\
& \sum_{j=1}^{n}\lambda_j\, nx_{ij} \le \sum_{j=1}^{n}\omega_{ij}\, dx_{ij}\, x_{ik}, \quad \forall i = 1,\dots,q \tag{5.5b}\\
& \sum_{j=1}^{n}\lambda_j x_{ij} \le \theta_i x_{ik}, \quad i = q+1,\dots,m \tag{5.5c}\\
& \sum_{j=1}^{n}\lambda_j y_{ij} \ge \sum_{j=1}^{n}\chi_{ij}\, y_{ik}, \quad \forall i = r+1,\dots,s \tag{5.5d}\\
& \sum_{j=1}^{n}\lambda_j\, ny_{ij} \ge y_{ik}\sum_{j=1}^{n}\chi_{ij}\, dy_{ij}, \quad \forall i = 1,\dots,r \tag{5.5e}\\
& \sum_{j=1}^{n}\omega_{ij} = \theta_i, \quad \forall i = 1,\dots,q \tag{5.5f}\\
& \sum_{j=1}^{n}\chi_{ij} = z_i, \quad \forall i = 1,\dots,r \tag{5.5g}\\
& \sum_{j=1}^{n}\lambda_j = 1 \tag{5.5h}\\
& \lambda_j \ge 0 \tag{5.5i}
\end{align}

In (5.5) we have successfully linearized the model in the presence of ratio variables. It is

interesting to know that the technique here can easily be applied to the normal Russell

graph measure we described in Chapter 3. There are various ways to approximate the

Russell graph measure, e.g. MIP [Coop 99a], but no linearization that we know of to

date has been proposed to solve the Russell graph measure through an LP. Although we have made (5.5) computationally practical, the model is not translation invariant and we

continue our search, this time through additive models.


Slack based measures

Now, we turn our attention to the additive constructs, which would be:
\[
\Omega_I(x, y) = \max\left\{\sum_i w_i^- s_i^- \;\middle|\; (x_{1k} - s_1^-, \dots, x_{mk} - s_m^-) \in L(y),\ s_i^- \ge 0\right\} \tag{5.6}
\]
\[
\Omega_O(x, y) = \max\left\{\sum_i w_i^+ s_i^+ \;\middle|\; x \in L(s_1^+ + y_{1k}, \dots, s_s^+ + y_{sk}),\ s_i^+ \ge 0\right\} \tag{5.7}
\]

The goal is to eliminate the maximum slacks possible. Some of the possible choices are:
\[
1)\ \frac{a + \alpha\Omega_I}{b - \beta\Omega_O}, \qquad 2)\ \alpha\Omega_I - \frac{\beta}{b + \Omega_O}, \qquad 3)\ \alpha\Omega_I\Omega_O, \qquad \text{and} \qquad 4)\ \alpha\Omega_I + \beta\Omega_O.
\]

For the measure, we leave the first two choices out due to imbalance between input and

output slacks, and the third choice because of non-linear nature. We go with the general

form αΩI + βΩO. The weights are included in the definition of ΩI and ΩO and as can be

seen in (5.6), are defined for each input and output element individually. The scalars α

and β are applied to the summation of weighted slacks as a whole. As we reviewed in the

previous section, there are a variety of choices for the weight and scalar, which gives us

the freedom to search for a measure which possesses the desired properties listed at the

beginning of this section. Among the options available after close investigation, the (5.8)

fits the purpose. It is a hybrid measure inspired by the normalized weighted model by

Lovell and Pastor [Love 95b] and range adjusted model by Cooper et al [Coop 99a]. The

σxi and σyi are standard deviations of inputs and outputs, respectively. The motivation

behind these weights is to make the model units invariant. Weights are constant and will

not change for each DMU, and there is a claim in the literature that constant weights

have the advantage of potentially ranking DMUs [Coop 99a]. By choosing the standard

deviation as weights, we also achieve translation invariance, as we prove later.

\begin{align}
S_k = \max_{\lambda, s_i^\pm}\quad & \sum_{i=1}^{m}\frac{s_i^-}{\sigma_{x_i}} + \sum_{i=1}^{s}\frac{s_i^+}{\sigma_{y_i}} \tag{5.8a}\\
\text{s.t.}\quad & \frac{\sum_{j=1}^{n}\lambda_j\, nx_{ij}}{\sum_{j=1}^{n}\lambda_j\, dx_{ij}} - x_{ik} + s_i^- = 0, \quad x_{ik} = nx_{ik}/dx_{ik}, \quad \forall i = 1,\dots,q \tag{5.8b}\\
& \sum_{j=1}^{n}\lambda_j x_{ij} - x_{ik} + s_i^- = 0, \quad i = q+1,\dots,m \tag{5.8c}\\
& \sum_{j=1}^{n}\lambda_j y_{ij} - y_{ik} - s_i^+ = 0, \quad \forall i = r+1,\dots,s \tag{5.8d}\\
& \frac{\sum_{j=1}^{n}\lambda_j\, ny_{ij}}{\sum_{j=1}^{n}\lambda_j\, dy_{ij}} - y_{ik} - s_i^+ = 0, \quad y_{ik} = ny_{ik}/dy_{ik}, \quad \forall i = 1,\dots,r \tag{5.8e}\\
& \sum_{j=1}^{n}\lambda_j = 1 \tag{5.8f}\\
& \lambda_j \ge 0. \tag{5.8g}
\end{align}

Theorem Changing x to ax+ b and/or y to cy + d will not change the solution and the

objective value of (5.8). If we assume that S∗O is the optimum inefficiency for DMUO,

with input XO = [x1O...xmO] and output YO = [y1O...ysO], the claim is that changing any

xi to axi + b and/or any yi to cyi + d would result in the same inefficiency score S∗O.

Proof: With no loss of generality, we assume non-ratio input xm and output ys have

been transformed to xmnew = axm + b and output ysnew = cys + d. We try to find DMU

C’s inefficiency score in this new setting. We know that the standard deviation is scaled

but is not affected by a shift in data. So σxmCnew= a.σxmC and σysCnew

= c.σysC holds.

Through the following equations, we can also see that s−mCnew= a.s−mC .

1.∑n

j=1 λj · xmjnew = xmCnew − s−mCnew

substituting xmjnew and xmCnew with a.xmj + b and a.xmC + b we have:

2.∑n

j=1 λj · (a.xmj + b) = a.xmC + b− s−mCnew

a.∑n

j=1 λjxmj + b = a.xmC + b− s−mCnewwe know

3.∑n

j=1 λjxmj = xmC − s−mC substituting this into above gives:

4. a.(xmC − s−mC

)+ b = a.xmC + b− s−mCnew

this reduces to

5. −a.s−mC = −s−mCnew

Chapter 5. Methodology:Proposed Non-oriented Model 94

SCnew = maxm−1∑i=1

s−iCσxi

+smCnew

a.σxm+

s−1∑i=1

s+iC

σyi+ssCnew

c.σyr(5.9a)

s.t.∑nj=1 λj · nxij∑nj=1 λj · dxij

− xiC + s−i = 0 xiC = nxiC/dxiC∀i = 1, .., q (5.9b)

n∑j=1

λj · xij − xiC + s−i = 0 i = q + 1, ..,m− 1 (5.9c)

n∑j=1

λj · xmjnew − xmCnew + s−mCnew= 0 (5.9d)

n∑j=1

λj · yij − yik − s+i = 0 ∀i = r + 1, .., s− 1 (5.9e)

n∑j=1

λj · ysjnew − ysCnew − s+sCnew

= 0 (5.9f)∑nj=1 λj · nyij∑nj=1 λj · dyij

− yiC − s+i = 0 yiC = nyiC/dyiC∀i = 1, .., r (5.9g)

n∑j=1

λj = 1 (5.9h)

λj ≥ 0. (5.9i)

In the same fashion, we can deduce that s+sCnew

= c.s+sC and it is clear that (5.9) becomes

the same as (5.8) for DMU C (C instead of DMUk). So the optimum objectives have to

be the same as well. This is a useful property and allows us to relax the strictly positive

condition on inputs and outputs, which we mentioned earlier. For the ratio variables, it

is easy to see that scaling has no effect whether it is applied to the ratio as a whole or

the components, because any scaling to the components can easily be represented by an

scaler to the whole ratio which would reflect on the standard deviation as well and like

the above it will result in the same optimum. When it comes to the shift in the origin

for a ratio variable, it is strictly bounded to the whole ratio rather than its components.

The reason behind this is we are seeking the maximum slack of the ratio variables as a

whole and the slacks for denominator and numerator are not considered separately. The

components of the ratio variable is used merely to build the right PPS and the shift from

Chapter 5. Methodology:Proposed Non-oriented Model 95

the origin should be applied to the whole PPS, in a sense that the original PPS can be

obtained by a simple reverse transformation. The shift in the components of the ratio is

not allowed because it will result in a completely different PPS with no clear formula to

get us back to the original PPS.

Now, to make the concept that we have already developed into a computational reality,

we have to transform the above to LP. We linearize it by introducing new variables as

follows:

ωij = nyij − dyij · yik, λj · s+i = ∆ij,

σij = nxij − dxij · xik, λj · s−i = Φij.

The additive model then becomes:

maxn∑j=1

q∑i=1

Φij

σxi+

m∑i=q+1

s−iσxi

+n∑j=1

r∑i=1

∆ij

σyi+

s∑i=r+1

s+i

σyi(5.10a)

s.t.n∑j=1

λj · xij − xik + s−i = 0 ∀i = q + 1, ..,m (5.10b)

n∑j=1

λj · σij +n∑j=1

Φij · dxij = 0 ∀i = 1, .., q (5.10c)

n∑j=1

λj · yij − yik − s+i = 0 ∀i = r + 1, .., s (5.10d)

n∑j=1

λj · wij −n∑j=1

∆ij · dyij = 0 ∀i = 1, .., r (5.10e)

n∑j=1

λj = 1 (5.10f)

λj ≥ 0. (5.10g)

In terms of our desired properties, this model readily satisfies propertes 3, 4, 5 and,

6 mentioned in 5.2.1. Properties 3 and 4 are proven above and we will prove 5 and 6

next. We need to investigate property 7 and see if we can amend the model to satisfy

properties 1 and 2.

Chapter 5. Methodology:Proposed Non-oriented Model 96

Theorem Changing any of the xik to x′ik < xik and/or yik to y′ik > yik will decrease

the objective value of (5.8) thus making the unit more efficient which means less ineffi-

cient since our model, at this stage, measures inefficiency. Proof: Let us assume that

everything is kept the same except for xik, which is replaced by x′ik < xik. There are

two situations x′ik ≤ xik − s−∗i or xik − s−∗i < x′ik < xik. The first case will reduce

to x′ik = xik − s−∗i because further reduction will push the point out of the PPS. It is

obvious that the corresponding slack will be zero and the efficiency score improves. For

the second scenario, let us assume that the optimal objective for the improved point is

Ω′∗. It is clear that taking s−i = s′−∗i + xik − x′ik and keeping the rest of the variables the

same will lead to a feasible solution to 5.8 with a bigger objective value, so the optimal

objective value would certainly be bigger than Ω′∗.

Theorem The objective value in (5.8) is invariant to alternative optima. Proof: The

way the objective is constructed guarantees this property since the alternative solution

is by a different set of intensity variables λs and they are not included in the objective;

additionally, the whole aggregate of slacks, which is of interest, is maximized and would

stay the same even if alternative solutions exist.

We can see that properties 1 and 2 are not satisfied by the model in its current

form. In the literature, whenever the goal has been to eliminate all slacks, a number of

techniques have been employed to bring the score into some meaningful range. Among

them, we can refer to MIP, extended additive model, constant weighted additive model,

normalized weighted additive model, GEM, enhanced Russell (which is like SBM), and

RAM. In some cases bringing the score into the zero to one range is done ex post facto, in

other words, after optimal slacks, s−∗i and s+∗i , are calculated. Here are some examples:

Ω =m∑i=1

sik−

xik+

s∑i=1

sik+

yik,

Ω = 1− 1

m+ s

m∑i=1

s−∗ixik

+s∑i=1

s+∗i

yik + s+∗i

,

Chapter 5. Methodology:Proposed Non-oriented Model 97

Ω = 1− 1

m+ s

(m∑i=1

s−∗ixik − xi

+s∑i=1

s+∗i

yi − yik

),

Ω = −m∑i=1

1/σ−i · s−i +s∑i=1

1/σ+i · s+

i ,

Ω =

[1 +

1

m

m∑i=1

s−ixik

+1

s

s∑i=1

s+i

yik

]−1

,

Ω =1− 1

m

∑mi=1

s−ixik

1 + 1s

∑si=1

s+iyik

,

Ω = 1− 1

m+ s

m∑i=1

s−∗ixi − xi

+s∑i=1

s+∗i

yi − yi, and

Ω =1− 1

m

∑mi=1

wi·s−ixi0−minj xij

1 + 1s

∑sr=1

vr·s+rmaxj yrj−yr0

.

5.2.3 Making sense of the inefficiency score

Although we could have used some of the above ex post facto transformations, they

did not bear all the characteristics we wanted, so we decided to do something novel.

To make the score meaningful and bounded, we propose the following ex post facto

treatment. First, let us define a dummy DMU, DMU D, with the following input and

output characteristics: xiD = maximum(xij) for every i = 1, .., s and j = 1, .., n

yiD = minimum(yij) for every i = 1..r and j = 1..n.

Let us also assume that equation (5.8) could be solved, as we will show later, and generate

the objective S∗d . We then normalize the efficiency score for each DMUk by 1− S∗kS∗d

. Since

the DMU k is the worst DMU, the normalized score for each DMU is bounded by zero

and one and reflects the relative efficiency.

Another way to normalize is to choose the worst DMU, the one with the highest slack,

from existing n DMUs; this will make the DMU with the largest slack attain the score of

zero and overall scores would be lower in general. This is because the inefficiency of the

worst DMU in the set is still smaller than, if not equal to, the dummy DMU. However,

Chapter 5. Methodology:Proposed Non-oriented Model 98

what we could gain is a slight increase in the discrimination power. The reader should

bear in mind that the scores are relative and true for the data set under study and should

not be taken out of context. In both situations, any slight change in the data would affect

the scores, although DEA is, in general, data oriented and it is not specific to this case.

The case of heuristically adding DMUs to the PPS has been practiced for other

purposes. Thanassoulis et al. [Than 12] proposed the idea of adding unobserved DMUs

with a similar mix to the anchor DMUs to get better envelopment. The added DMUs

reflect a combination of technical information and the decision-maker’s value judgment.

They have a complex procedure for doing this and the result is an extended frontier,

which envelops the data better.

Our model accounts for individual variations in input and output rather than attempt-

ing a uniform shrinkage or expansion. The units will then be less efficient compared to

radial models. Our model, however, does not satisfy the second part of property 2, scoring

zero very rarely (except when we normalize scores using the worst performer). Another

possible criticism is that dividing the inefficiency with a notably large number (the worst

performer has the highest slack, and in the case of the dummy DMU, the slacks could

be very large, in particular, if the variables are far apart) will weaken the discrimination

power of our measure and cause the crowdedness of ex post facto inefficiencies within a

certain interval. This problem exists for most of the non-oriented models in the litera-

ture and is not specific to ours but there is room for improvement here. To preserve the

fairness of DEA and the fact that we like each DMU to be seen under the best light, and

increase the discrimination power, we suggest clustering the DMUs and for each cluster,

designating a dummy weak point. For example when dealing with a large database of

bank branches, clustering the data into three: small, medium and large, and creating a

dummy DMU for each group, would result in a better discrimination power.

Chapter 6

Methodology: Approximating the

Frontier in BBC Model

We demonstrated in the Chapter 1 mixing ratio with normal variables, in conventional

DEA, distorts the frontier. By conventional DEA we mean known DEA models without

any provisions for using ratio variables. We have proposed a non-oriented model to deal

with ratios in Chapter 5, however, in the specific case of the BCC model, where the ratio

variables are on the side of orientation, the problem has remained unsolved. In other

words, we failed to linearize the BCC model, when we aim to reduce the inputs, only

some in a ratio form, or focus on raising the output levels, where only some outputs

are in the form of a ratio, and not all. We did a review of the limited literature on

non-analytical techniques to attain the production frontier in Chapter 4 in the context

of DEA. As mentioned in Chapter 4, Emrouznejad et al. [Emro 09] clearly showed the

BCC model cannot be linearized when the ratios exist on the orientation side. Let us

remind ourselves why the BCC output-oriented model with output ratios could not be

linearized mathematically. The original BCC model with consideration of the proper

convexity looks like:

99

Chapter 6. Methodology: Approximating the Frontier in BBC Model100

maxλ,η

η (6.1a)

s.t.n∑j=1

λj · xij ≤ xik i = 1, ...,m (6.1b)

n∑j=1

λj · nyij ≥ yik

n∑j=1

η · λj · dyij yik = nyik/dyik, i = 1, ..., r (6.1c)

n∑j=1

λj · yij ≥ η · yik i = r + 1, ..., s (6.1d)

n∑j=1

λj = 1 (6.1e)

λj ≥ 0. (6.1f)

Even by substituting η · λj by γj and adding the auxiliary constraints, still there is no

guarantee that the LP (6.2) solution satisfies η · λj = γj. As seen here:

maxλ,η,γ

η (6.2a)

s.t.n∑j=1

λj · xij ≤ xik i = 1, ...,m (6.2b)

n∑j=1

λj · nyij ≥ yik

n∑j=1

γj · dyij yik = nyik/dyik, i = 1, ..., r (6.2c)

n∑j=1

λj · yij ≥ η · yik i = r + 1, ..., s (6.2d)

n∑j=1

λj = 1 (6.2e)

n∑j=1

γj = η (6.2f)

λj ≥ γj ≥ 0. (6.2g)

As a matter of fact, we aim to find a solution for such cases with approximation methods

and find close to optimal solutions. The method, which has been developed by us and

Chapter 6. Methodology: Approximating the Frontier in BBC Model101

will be discussed in this chapter, is a heuristic one since the problem does not have a

clear-cut solution. While going through different techniques, as discussed in Chapter

3, we were inspired by the Monte Carlo method to generate samples of the nonlinear

frontier. The method is not exactly a Monte Carlo method, as a point on the frontier

does not have a simple mathematical formulation. The rest of this chapter is organized

as follows: we look at approximation methods to generate parts of PPS in theory, we will

study the challenges in practice next and at the end we present the developed heuristic

by us.

6.1 Partial Improvement: Approximation methods

The production possibility set consists of observed DMUs plus any convex combination

of them. We also know that the frontier consists of DMUs on the desired edges (facets)

of the PPS. Our goal is to create samples of the PPS and retain the best performers

at each iteration. We hope that these local best performers will lead us collectively to

the true frontier (at least parts of the true frontier). In the Monte Carlo method, three

basic things should exist: a) a mathematical formulation, b) a reasonable variation of

each input, and c) an idea of an acceptable output. In our case, we are seeking the

best performers, and we know what is acceptable as a best performer: it should use the

minimum input to produce the maximum output. We also know how the players vary

(convex combination of DMUs), however, we do not have one formula that transforms the

DMUs into a best performer, rather, we have an LP that pinpoints the best performer

for a specific DMU. This means that our best performer is not represented by a number

or, better said, does not have a distribution/average and a confidence interval which are

usually the output of the Monte Carlo approach. This is why we call our method a pseudo

Monte Carlo method. Theoretically, if we repeat this procedure a sufficient number of

times for a specific DMU, the best performer (known as the target) will become stable

Chapter 6. Methodology: Approximating the Frontier in BBC Model102

and we can conclude that this is a point on the true frontier.

6.1.1 How to generate PPS progressively

As mentioned above, the main assumption behind PPS is that any combination of DMUs

will be feasible unless stated otherwise. The exceptions are the cases with weight restric-

tions or multipliers’ restriction in the envelopment form as well as the production nature,

which may require convexity. Convexity limits the summation of DMU weights to one.

Each weight combination applied to a set of DMUs leads to a point in PPS. Now that we

know what is a reasonable variation of weights, what we struggle with is that there are

endless possibilities for these weights within the limit and in conjunction with the others.

If we had access to every possible weight set, we could generate the entire PPS, but

the number of weight sets is infinite. The weight matrix is generated either by random

weights or scanning the space for weights. Each row (vector) provides us with a unique

set of weights, which can generate a hypothetical DMU, or can be assumed as one input

in the Monte Carlo method.

We assume that the convexity axiom holds, so each weight should be less than, or

equal to, one; while the summation of all weights should equal one. Even with this

assumption, our job is not easy, as there are infinitely many real numbers between zero

and one. One idea is to limit the choices, for example defining a 0.1 resolution so that

the choices will be 0.1, 0.2, .., 1. Another way to deal with this problem is to generate

random weights that are less than one and satisfy the convexity axiom. Theoretically,

when we have a finite set of numbers for each weight, we should be able to produce all

combinations under the convexity constraint with “for loops”.

6.1.2 Challenges

The idea o generate the entire PPS works in theory but in practice it is hardly doable

due to challenges discussed here.

Chapter 6. Methodology: Approximating the Frontier in BBC Model103

Figure 6.1: Average non-zero weighs of size p vs Resolution

Sparsity

If we limit the choices between zero and one for each weight with a specific resolution,

while imposing the convexity on top of that, we will end up with a matrix with many zero

elements. This sparse matrix creates computational challenges for software programs.

This is mainly because of the limited number of options to choose from, such as 5 nonzero

choices when the resolution is 15, and the fact that, on average, the number of choices

will be reduced to half since they have to add up to one. So the speed of losing choices is

12n

. As we will prove, the resolution should go to zero to avoid ending up with a sparse

matrix as shown in Figure 6.1.

Lemma:At each selection the interval breaks into half on average. Proof: For each

weight, we can select from the choices within the interval, from minimum to maximum,

with the same probability. As a result, on average, at each iteration, we cut the inter-

Chapter 6. Methodology: Approximating the Frontier in BBC Model104

val into two parts. Assuming that the interval between zero and one is divided by p

or the resolution is 1p, the probability of each selection is 1/(p + 1) and the length of

interval for the next selection will be 0, 1p, 2p, ...p

p, so, on average, the next interval will

be 1p+1∗ p(p+1)

2p= 1

2and similarly for an interval of length a, the length of the remaining

interval, after one coefficient selection, will on average be a2.

Theorem: The expected number of nonzero weights for the weight vector of size

w = p is: p2

2p−1

Proof: Please note that with the resolution p the maximum number of nonzero weights

in any sets will be p and this as the rest w − p will be set to zero, when w > p. We are

looking at w = p which gives the maximum non zero weights. Assuming that 1/p is a

defined resolution, then you can treat the [0, 1] interval like p identical balls. The ways

we can have k nonzero weights is like dividing p identical balls into k distinct bins in a

way that each bin has at least one ball (each bin represent a weight). Once we decide

which k weights from 1, .., p to choose, then partitioning between them can be done in(pk

)×(p−1k−1

)ways.

The total number of ways that p balls can be assigned into n weights (empty weights

are accepted) can be thought of as allocating 2p balls into p bins as before, and then take

one ball out of each bin to create those empty bins or zero weights.(

2p−1p−1

)Now that we have the probability of having k nonzero weights, we can find the

expected value asM=

∑pk=1 k∗(

pk)(

p−1k−1)

(2p−1p−1 )

.

We have: k ∗(pk

)= p ∗

(p−1k−1

)then∑p

k=1 k ∗(pk

)(p−1k−1

)= p ∗

∑pk=1

(p−1k−1

)2= p ∗

(2p−2p−1

). Noting that

(2p−1p−1

)= 2p−1

p∗(

2p−2p−1

), the

expected M is given by:∑p

k=1

k∗(pk)(p−1p−1)

(nk)(p+k−2k−1 )

= p2

2p−1

In general, the weight vector size does not necessarily equal p and in fact itself is a

variable, for which the optimum number is not yet known. When the sampling method

is chosen, then the weight vector length must equal the sample size. Moreover, the

Chapter 6. Methodology: Approximating the Frontier in BBC Model105

Figure 6.2: Sparse Matrix: Average number of non-zero weights vs Resolution

resolution 1p

should have p large enough to accommodate the desired sample size: p

requires to be at least equal to the sample size. We calculated that for weight vector, of

size w, the average number of nonzero elements is given by the following:∑w

k=1

k∗(wk)(p−1k−1)

(p+w−1p−1 )

.

However, independent from the weight vector size, to have all elements at nonzero

resolution, we need to approach zero in a way seen in the following theorem and seen in

figure 6.1.2.

Theorem: For having all the weights nonzero on average, the resolution should go to

zero. Proof: Using the above lemma, if the resolution is 1p, which means we have p

choices to start with, then for the k weight to be nonzero, we should have p2k−1 ≥ 1 ,

k ≤ p. We transform the previous inequality to 1p≥ 1

2k−1 and then 1p≥ 21−k. Since for

k >> 1, we can say k − 1 ≈ k, then k ≥ log 11p

, which implies that for the number of

nonzero weights to grow, resolution should go to zero.

Chapter 6. Methodology: Approximating the Frontier in BBC Model106

Exponential number of iterations

From the previous part, we concluded that the smaller the resolution, the better, as a

sparse matrix will shrink. However, this will lead to another challenge. The problem is

that the number of iterations grows exponentially when the resolution becomes smaller

and the number of DMUs, n, increases, as can be seen in figure 6.1.2. This imposes a

computational issue with software packages. Small resolutions and the use of for loops is

not a recipe for success.

The number of iterations is directly related to the number of weight sets. Finding the

number of weight sets that totals one is like assigning p identical weights to k DMUs,

given that zero can also be assigned. As highlighted before, this is like having 2p identical

weights assigned to k DMUs with at least one to each and then taking one weight out of

each unit. This is a classic problem and it can be done in(

2p−1k−1

)ways. If we only look at

all nonzero weight sets, the number equals(p−1k−1

). For instance having only 4 DMUs and

0.1 resolution, 969 weight sets are generated with 84 all-positive sets. By changing the

resolution to 0.05, the weight sets will grow to 9,139 with 969 all-positive sets. Add one

more DMU to the set and weight sets will reach 82,251, with 3,876 all-positive sets. This

is why, as we will see later, we have moved from a for loops strategy to random selection.

6.1.3 Pseudo Monte Carlo method

We have explained the Monte Carlo method in Chapter 4. Recall that once a relationship

between inputs and outputs is established, random values for each input are drawn from

their respective distributions and for each input value, an output value is calculated.

Based on these values, the most probable value for the output is identified. In this

research, we have defined inputs and outputs in a significantly different manner than is

done in a typical Monte Carlo simulation. Our method takes the idea of generating inputs

Chapter 6. Methodology: Approximating the Frontier in BBC Model107

14000000

12000000

10000000

t sets

6000000

8000000

Num

ber o

f weigh 4 DMUs

5 DMUs

6 DMUs

7 DMUs

4000000

6000000N

8 DMUs

2000000

00.05 0.1 0.2

Resolution

Figure 6.3: Number of iterations grows exponentially with smaller resolution when the

number of DMUs increases.

Chapter 6. Methodology: Approximating the Frontier in BBC Model108

(in our case, DMUs) by continuing to use the convexity rule rather than a distribution;

it then uses not an explicit mathematical formula but an optimization procedure to

establish the output benchmark, for the DMU under evaluation. Last, but not least, we

do not focus on the most probable benchmark, but instead, on the best possible one. This

is why our method is not exactly a Monte Carlo simulation but similar to it. Another

thing that distinguishes our model from a conventional simulation is the learning process.

At each step, if there has been an improvement in the benchmark, we then check to see

the constructs of the benchmark and add them to our inputs (DMUs).

Now, we explain how we have made this idea work. When we have finite resolution,

we have limited options for weights but by using nested “for loops”, although time con-

suming, we can produce all possible hypothetical DMUs with the options available. The

method of fixed resolution resulted in a number of complexities, as explained in Section

6.1.2. Our solution was to, instead of covering all possibilities, draw weights randomly,

while meeting the convexity axiom. Choosing weights randomly without restricting our-

selves to a fixed resolution means that we will have unlimited options for weights between

zero and one, in contrast to p options at most with 1p

resolution. However, we cannot

exhaust all possible combinations because their number is infinite. When the number of

DMUs is large, the sparse matrix issue still poses a challenge. To fix the problem, we

will limit the number of DMUs by choosing only a sample of them. The trick here is that

this procedure is repeated many times until we visit most DMUs and produce enough

hypothetical ones to build the frontier.

We believe that a sample of existing DMUs and a randomly generated weight set

under convexity axiom is all we need to generate a point that belongs to the PPS. In this

study, we are more interested in finding the best performer, or, at least, better performers,

that envelope the traditional frontier. So it makes sense to choose our building blocks

from traditionally identified DMUs and, in the process, keep the ones that perform better

and discard the rest. Keeping underperformers will only use some of our storage capacity

Chapter 6. Methodology: Approximating the Frontier in BBC Model109

without generating any valuable outcome.

The procedure can be summarized in the following steps:

1. Run DEA and find the DMUs on the conventional frontier (here the BCC frontier,

treating ratios as normal variables);

2. Select randomly p DMUs from the data set;

3. Generate hypothetical DMUs out of p parents;

4. Select the ones that happen to be above the DEA conventional frontier, and add

them to the frontier set;

5. Discard all the other hypothetical ones; and

6. Go to step 2 if the convergence criteria has not yet been met.

In the following, we elaborate on the keep or discard rule (heuristic) and, of course, the

convergence criteria.

6.1.4 Keep or discard, an LP feasibility problem

In other sampling methods like Gibbs sampling and the Monte Carlo Markov chain

process, the effort is made to get the samples that, indeed, represent the population.

They usually have a burn-in period and they throw away approximately the first 5000

iterations to make sure that the chain has reached its stationary state and samples are

truly from the population. For us, the population we are seeking and want to generate

samples from are the relatively better performers and we should have a way to eliminate

the samples not representing our interest.

Because of the nonlinear nature of our problem, we cannot know in advance if a

hypothetical DMU is a high performer compared to the rest. For each point generated,

we keep it if it happens to be outside the enveloped space. If the new point is enveloped

Chapter 6. Methodology: Approximating the Frontier in BBC Model110

by the current frontier (which means it satisfies the constraints of the conventional BCC

model), we leave it out. Because if the point is feasible it has already been enveloped and

does not provide us with any potential improvement. It is worth noting that the points

below the conventional frontier (inside the conventional PPS but not on the frontier)

could convey some information if they are identified as being benchmarks. Because of

the nonlinear nature of our problem with the ratios involved, it is possible that the target

for some DMUs occur below the traditional frontier. In this research, however, we focus

on the potential improvement and do not worry about possible underestimation of some

units. Just to clarify: if the actual target for a specific DMU is below the traditional

frontier but we had measured its performance against a point on the frontier, presumably

we assessed that as being less efficient than what it actually is. This means that we had

put pressure on the unit and expected it to improve more than it could.

From the standpoint of computational complexity, we need to check if the point is

feasible for conventional DEA. Establishing if an LP model has a feasible solution is

essentially as hard as actually finding the optimal LP solution. The former cannot take,

on average, twice as many operations as the latter within a factor of 2 on average, in the

simplex method. As we are concerned only about the feasibility, we can solve the problem

by a normal LP solver, setting the objective function to be a constant. Feasibility study is

even difficult for a mixed integer program because if no feasible solution exists, then it is

necessary to go through the entire branch-and-bound procedure (or whatever algorithm

we use) to prove this. There are no shortcuts in general, unless we know something useful

about our model’s structure. For example, if we are solving some form of a transportation

problem, then we may be able to ensure feasibility by checking that the sources add up

to at least as great a number as the sum of the destinations.

Here, the intention is that as soon as the point is proved to be feasible in a conventional

LP, it is ignored and the next candidate is tested. Alternatively, we can check if the new

candidate is any better than the rest (not likely to be feasible in the conventional DEA)

Chapter 6. Methodology: Approximating the Frontier in BBC Model111

by checking if at least one input is less than, and one output is more than, or equal to,

the existing ones. This way we may save solving a few unnecessary LPs.

6.1.5 Convergence

It is important to know how many times the procedure described in steps 1-5 in sections

6.1.3 should be repeated. Simply put, we need to know when to stop and decide that

the best benchmark for the DMU has been found. If there was a way to generate all

data in the PPS, then pinpointing the best benchmark would be easy. But this approach

would be computationally infeasible and, as a result, deciding on when to stop becomes a

function of variables affecting how many data points are visited and how that affects the

benchmark. How many points are visited depends on the constructs: the quality of the

weight matrix, the sample size and the number of times we sample. Sample size could

be fixed or could be a random variable itself. Having many different factors has made it

a real challenge to create and agree upon rules for convergence.

We try to create as many data points as possible from data at hand, and we also need

to check if the benchmark improves or not. For the former, we know a weight vector

applied to a set of DMUs will generate one hypothetical unit. To generate as many

hypothetical DMUs as possible, there are two strategies to consider. One is to generate

as many hypothetical DMUs as possible from all DMUs, all at once, in other words,

having a long weight vector for each iteration (resulting in a huge matrix of weights

which will lead to a sparsity problem). The other is to have samples of DMUs drawn

from the pool of DMUs we have and use a smaller weight vector (size of the vector equals

the sample size).

There is another question to be answered: what sample size should we choose? We

will discuss this in the next section. However, no matter what sample size we choose, we

need to sample enough to decrease the chance of a DMU being missed out. How many

times we sample and the size of different weight sets we employ will eventually define

Chapter 6. Methodology: Approximating the Frontier in BBC Model112

the number of iterations. In the following, we will discuss the sample size effect on the

frontier.

Effects of sample size on the nonlinear frontier

Having n DMUs and a fixed sample size, k, mathematically, there is an answer for how

many non-identical samples you can draw. This is famously known as the “N choose

K”, denoted by(nk

). This is the minimum number of samples to draw to ensure that all

DMUs are visited. A simple calculation shows that this solution is not practical, in that

even for a small problem involving 50 DMUs, two million samples of size five are required

to exhaust the options. To control that, we define the completeness ratio, as the number

of samples we use divided by(nk

). It should be noted that sample sizes of k and n − k

require the same number of repetitions to get completeness ratio of 100%.

It is obvious that a sample size k implies that we have included sample sizes 1, .., k−1

in some way because weight sets can include zeros. Therefore, between sample size k and

N − k, the larger sample size is expected to generate a more accurate frontier, given

that the quality of the weight vectors are comparable. To quickly demonstrate that, for

example in Chapter 4 where we had 100 DMUs, we tested sample sizes of 10 and 90. The

procedure was repeated 100 times using 500 weight vectors. The larger sample size took

just 1% more time, but found DMUs with higher relative performance. Surprisingly, the

number of above the linear frontier DMUs was 36% fewer but with a higher efficiency

score, on average. You can see in the Figure 6.4 that more DMU efficiency scores were

dropped when we re-evaluated them introducing the unobserved DMUs generated by

sample size 90. It is worth mentioning that for the same number of hypothetical DMUs

generated (fix number of samples from PPS), the effect on the nonlinear frontier caused

by sample size was not significant, as seen in figure 6.5. However, we caution the reader

not to generalize this observation because the initial sample size affects the quality of

posterior samples and how well mixed they are (exploring all parts of the sample space).

Chapter 6. Methodology: Approximating the Frontier in BBC Model113

100%

80%

90%

50%

60%

70%

fficcien

cy scores

30%

40%

50%

Reevalua

ted ef Sample size 10

Sample size 90

0%

10%

20%

0%0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80 84 88 92 96 100

DMUs

Figure 6.4: for same completeness ratio, larger sample size wins

Chapter 6. Methodology: Approximating the Frontier in BBC Model114

50.00%

60.00%

70.00%

80.00%

90.00%

100.00%

Effic

ienc

y Sc

ore

3 DMUs

5 DMUs

7 DMUs

10 DMUs

20 DMUs

20.00%

30.00%

40.00%

DMUs

Figure 6.5: Sample size affect is small if the number of hypothetical DMUs generated

stays the same

Chapter 6. Methodology: Approximating the Frontier in BBC Model115

90%

100%

70%

80%

50%

60%

ncy

scor

e

100 rep, 500 ws

30%

40%

50%

Effic

ien 100 rep, 500 ws

300 rep, 200 ws

500 rep, 200 ws

10%

20%

30%

0%

10%

0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80 84 88 92 96 100DMUs

Figure 6.6: Increasing the number of unobserved DMUs, Sample Size 5

Effects of the number of iterations and weight vectors on the nonlinear frontier

Intuitively increasing the pool of unobserved DMUs must make the approximation better

since the ones over the conventional frontier are chosen from a bigger pool (more choices)

and they may eventually contribute to the formation of the nonlinear frontier. However,

as depicted in figure 6.6, observation for a sample size of 5 shows that although going

from 50,000 to 60,000 makes a difference and increase the number of unobserved, and

better performer DMUs, the gain from 60,000 to 100,000 has low impact, mostly because

of saturation. This saturation phenomenon is what we will look for to claim that con-

vergence has happened. It is interesting to mention that not only the pool of unobserved

DMUs are important but also how this size has been constructed, as these do make a

difference. There are two ways to change the pool of unobserved DMUs (in terms of

Chapter 6. Methodology: Approximating the Frontier in BBC Model116

90%

100%

Chart Title

70%

80%

90%

50%

60%

70%

ncy

scor

e

1000 rep, 100 ws

30%

40%

%

Effic

ien 500 rep, 200 ws

250 rep, 400 ws

200 rep, 500 ws

10%

20%

0%0 2 4 6 8 10121416182022242628303234363840424446485052545658606264666870727476

DMU

Figure 6.7: The same number of unobserved DMUs but different constructs, Sample Size

5

quantity): the number of weights vectors (matrix rows), and the number of samplings

(repetition). The question is: does it matter which to change? Observation in figure 6.7

shows that it does, and for the same pool size, depending on the weight set share, and

repetition share, approximation of the nonlinear frontier can improve.

Ad-hoc rules for convergence

We have already seen how different factors contribute to the quality of the estimated

nonlinear frontier and of course, the time we need to reach saturation. The complex

nature of the interactions between the above factors makes it even more difficult to

provide the analyst with a precise and robust solution, and perhaps none at all.

There is no definite answer for how many weight vectors are sufficient or how many

times the sampling should be repeated or what sample size is the best. However, an

Chapter 6. Methodology: Approximating the Frontier in BBC Model117

optimum balance between the above factors (sample size, weight sets and repetition)

is a sensible starting point, which can be improved, via trial and error. The reality is

that there is no closed form mathematical formula to generate the magic number of each

parameter before convergence or shorten the time to converge, thus, we rely on heuristics

and ad hoc approaches.

Each case should be studied separately, when the approximation is used to find the

ultimate benchmark. Having said that, through our experience, there are certain steps,

which we advise the reader to take for every problem:

1. Start by looking at the data; exclude all DMUs that are less than 50% efficient in

a conventional DEA model or the 25% bottom percentile (whichever is greater).

This eliminates the low quality DMUs from entering the selection process, as they

are less likely to contribute to the nonlinear frontier.

2. Assume a sample size of P% of the remaining DMUs. Using judgment and depend-

ing on the number of DMUs. If the number of DMUs is very large, then clustering

them before sampling is recommended as explained in [Naja 05]. Clustering cri-

terion is mostly based on envirnmental factors such as population concentration

in the area, income per capita, in order to create a peer group for the DMU un-

der evaluation. The intention behind this suggestion is mainly practicality. If the

sample size becomes very large, then the sparsity problem will occur and compu-

tation will be hard due to the length of the weight vector. We know that in DEA

each unit is benchmarked against units in its peer group, so when we have a huge

number of DMUs, grouping them makes sense and reduces the size of the pool and

consequently, the sample size.

3. Choose x number of weight vectors. There is no magic number in here but we

started with 500.

4. After one full run, the unobserved DMUs, outside the envelopment surface of the

Chapter 6. Methodology: Approximating the Frontier in BBC Model118

conventional efficient frontier, are kept and added to the set.

5. DMUs on the conventional frontier are re-evaluated and the efficiency scores are

calculated. The procedure is assumed to be final when, after a number of consecu-

tive runs, the efficiency scores have not dropped further, which means the nonlinear

frontier has not changed. This number could be two, or more depending on how well

mixed the sampling is. Well mixed means samples are equally drawn from all parts

of space. A rule of thumb is that without sampling, two times with no change can

indicate saturation and with sampling, and sample size = P%sample space then at

least P iteration with no change is a sign of convergence.

6.2 Remarks

The problem we had was that the BCC model with ratio variables at the side or orien-

tation became nonlinear and there was no technique we knew of to that would linearize

it. If the conventional DEA is employed, then a linear frontier is generated but we know

this is not the true one with the ratios in place.

The approximation method proposed here aims to find a better frontier compared to

previous models. We want to emphasize that this is not necessarily the absolute frontier

but it is a close approximation. This is why we advise the user to reformulate the problem

and try to avoid ratio variables, if possible. However, if that is not an option, then this

method should be used.

The technique suggested here is intended to build parts of, if not all, the true frontier.

We advise the users that this method should not be used as a stand-alone solution to

make decisions on but rather help the analyst to probe further for possible improvement

and use this alongside other methods and their managerial judgment to come to a final

decision.

Chapter 7

Realization, Case Study and Results

In this chapter, we test the models presented in Chapters 5 and 6 to have a better

understanding regarding how the models work in practice. We can also compare the

results to the more conventional methods and compare the outcomes. To achieve this,

it is necessary to build the theoretically proposed model, using computer software as

there is no ready-made code for our model. We have coded the models using MATLAB

by MathWorks. This has proven to be a very time-consuming and difficult task, not

directly contributing to the core research but nevertheless essential, as we need to show

how the developed theory can be used. Our contribution in the realization of the model,

although not yet a commercial product, has shown itself to work well and sufficiently fit

for the purpose of testing. The code, however, could be incorporated in already available

DEA software to extend its capabilities in dealing with ratios. We briefly explain how we

transform the closed form linear programming formulation into a matrix form to make it

work with MATLAB. We also apply the code to a case study of about 130 bank branches,

extracted from one of the main Canadian banks in a certain region.

119

Chapter 7. Realization, Case Study and Results 120

7.1 Realization of the non-oriented model using MAT-

LAB

To refresh our memory, let us look back at our theoretical additive non-oriented linearized

model, appeared in Chapter 5, which is units and translation invariant:

maxn∑j=1

q∑i=1

Φij

σxi+

m∑i=q+1

s−iσxi

+n∑j=1

r∑i=1

∆ij

σyi+

s∑i=r+1

s+i

σyi(7.1a)

s.t.n∑j=1

λj · xij − xik + s−i = 0 ∀i = q + 1, ..,m (7.1b)

n∑j=1

λj · σij +n∑j=1

Φij · dxij = 0 ∀i = 1, .., q (7.1c)

n∑j=1

λj · yij − yik − s+i = 0 ∀i = r + 1, .., s (7.1d)

n∑j=1

λj · wij −n∑j=1

∆ij · dyij = 0 ∀i = 1, .., r (7.1e)

n∑j=1

λj = 1 (7.1f)

λj∆ijΦij ≥ 0 (7.1g)

ωij = nyij − dyij · yik (7.1h)

σij = nxij − dxij · xik (7.1i)

MATLAB works with matrices. The mission has been to transform the above into

the form [Aeq][variables]=[Beq] with the objective of [f][variables]. We start from the

variables: there are n weights λj (one for each DMU), r × n transformed slacks for the

ratio outputs, ∆ij, s− r output slacks(shortfalls), q ∗ n transformed slacks for the ratio

inputs and finally, m − q input slacks (waste) variables. In total, the number of our

variables will equal the number of units, plus the number of normal variables (non-ratio

inputs/outputs) added to the number of ratio variables times the number of units. It

is evident that each ratio variable will add n variables to the LP. This is not a concern

Chapter 7. Realization, Case Study and Results 121

because the number of variables is not critical for the solvers, whereas the number of

constraints is.

Now each line of the constraints above needs to be written in the matrix form to fit

the [Aeq][variables]=[Beq] format. The (7.1c) non-ratio input part is re-written as:[[X](m−q).n , [0](m−q).(n×r+s−r+n×q) , Identity(m−q)2

][variables] = xk.

The (7.1d) ratio input part is expressed as:[[σ]q.n , [0]q.(n∗r+s−r) , [dx]q.(n∗q) , [0]q.(m−q)

][variables] = [0]q.1,

where dx is a matrix composed of denominators for every ratio input in all DMUs on a

diagonal block of size n, given by:

[dx1,1, dx1,2, .., dx1,n, 0, .., 0, 0, .., 0

0, .., 0, dx2,1, dx2,2, .., dx2,n, 0, .., 0

... ...

0, .., 0, 0, .., 0, 0, .., 0, dxq,1, dxq,2, .., dxq,n].

We skip the explanation on the output constraints as they are rearranged in the matrix

format in the same fashion as inputs. The objective function should be constructed as

[f]*[variables]. The slack variables for the non-ratio variables in the objective function is

easy to produce — just a vector of inverse standard deviation for each input and output.

For the ratio variables, because of the double summation in the objective, each reverse

standard deviation is transformed to a vector of size n and all the vectors come together

to build the [f ]. f = The artificial low performer DMU is added to the set from the

beginning, so bear in mind that n above included the low performers.

Upon testing the code, we realized that because of some nearly zero variables, the code

did not run as expected and MATLAB could not optimize the LP. To overcome this, we

have slightly changed the formulation to enable us to scale the inputs and outputs easily

so that MATLAB can handle the numbers. It is worth noting that our formulation is

units invariant so scaling up will not affect our actual results; it just helps the MATLAB

Chapter 7. Realization, Case Study and Results 122

program. For ease of formulation in MATLAB, we also learned that normalizing the

inputs and outputs by the use of standard deviation from the beginning worked better

than normalizing at the end, in the objective function. Although on paper this means

a simple change of variable Snew = Sold/std, which does not affect the final solution, in

practice, the former worked better with MATLAB.

7.2 Case Study

The case we have chosen is derived from real data by one of the major Canadian banks.

For our purpose, we have selected all urban branches (132 in total) in one province of

Canada. It should be emphasized that the goal of this chapter is not to evaluate bank

branches but rather to show the merits of the newly developed models.

7.2.1 Bank branch data: choice of model, inputs and outputs

For our case study, let us assume that we are given a task of evaluating 130+ urban

bank branches in a certain region, in terms of resource allocation and profitability. The

model evaluates the way a branch converts its expenses into revenues through its six

revenue-generating streams. The information could assist the regional manager to spot

the best practices and pinpoint the weaknesses of low performers. It could, for example

help to better allocate resources to gain more also in a certain line of business such as

home mortgages. This could be achieved by either efforts in attracting more mortgages,

recruiting better advisors, upgrading the online system or by better managing/investing

the funds/commitments already in place and perhaps other things in real practice.

The output metrics are the return rates on six major lines of business: everyday

banking, mortgages, commercial loans, commercial deposits, wealth management, and

consumer lending. Rate of return is a simple notion and readily makes sense to everyone.

Such information makes decision-making easier for management. Because the resources

Chapter 7. Realization, Case Study and Results 123

that each branch has are limited, they might be better off focusing on a few lines of

business they are good at to make more profit for the branch. We understand that

branches do not have the possibility of dropping out from any line of business, but upon

seeing the results, top management can decide on how to shift or add resources to better

deal with home mortgages, if that side of the business shows promise. The resources that

each branch has are: a combination of professional personnel and office staff (human

capital), which also provide size information, the equipment (IT hardware and software),

and the fixed assets as they are correlated highly with the status of the location of the

branch (economically affluent or deprived areas). Also, because of the importance of the

loan loss experience in the literature and in light of 2008 financial crisis, we included

the loan loss on the input side. This is to control the reward a branch might get for a

temporally high return rate on the loans, on the basis of taking high-risk clients. Input

metrics are merely the expenses related to the items mentioned. Summary of variables

are listed in Table 7.1.

In the following section, we will go over a few figures and compare the results of our

model to the traditional DEA additive model. We use the same inputs/outputs with the

traditional model, without any special treatment regarding the ratio variables. We also

include the artificial “low performer” DMU in the set. The traditional DEA additive

model will not generate a score: it just reports back the slacks. To make the comparison

fair, we transform the slacks into a score between zero and one for the traditional model

as well. For this purpose, we have normalized the slacks using the slacks for the low

performers, which obviously use the highest slacks and this will result in a score “zero”

for these units.

We would like to emphasize one more time that our goal here is just to show the

models’ merits rather than to solve a banking performance problem. The case study is

for illustration purposes only, and this is why we did not get ourselves into the selection

of right weights or imposing bounds for certain variables. We did not consult with the

Chapter 7. Realization, Case Study and Results 124

Table 7.1: Input and output variables, rev=revenue and res=resources

INPUTS (expenses): OUTPUTS (rate of return):

Personnel Expense Non-interest Earnings

Equipment Expense Consumer Deposit rev. Consumer Deposits res.

Fixed Assets Consumer Lending rev. Consumer Lending res.

Loan Losses Wealth Management rev. Wealth Management res.

Cross Charges Home Owner Mortgages rev. Home Owner Mortgages res.

Commercial Deposits rev. Commercial Deposits res.

Commercial Loans rev. Commercial Loans res.

Other Expense

management of any branch to set lower and upper bounds for a specific input or output.

The important thing is that we keep everything in both models identical so that we can

focus on their differences, based on the proposed formulation of ratios and linearization

techniques. It is worth adding that we also controlled for the units and translation

invariance effects as the traditional additive model is not units invariant. We normalized

each input and output for the traditional additive model to eliminate the effect of units

of measurement.

7.2.2 Comparing the proposed model against traditional addi-

tive model

To be able to compare the results, we also solved the same problem with a traditional

DEA package, EMS. As described above, the values were normalized before being fed

into the program because the additive model in its original form is not units invariant

Chapter 7. Realization, Case Study and Results 125

and an input or output might be favored just depending on the units of measurement.

The raw results from the EMS package were transformed to an efficiency score between

zero and one, using the highest slack which belongs to the artificial low performer. This

artificial low performer is a DMU which has the lowest level of every output and highest

in every input. The results are shown in figure 7.2.2, and they match our expectations.

Business profitability Average return rate jump Everyday Banking 3.72% Wealth Management 16.65% Home Mortgages 6.19% Consumer Lending 1.52% Commercial Deposits 0.04% Commercial Loans 3.48%

Expenses Average reduction (million $) Fixed Assets/Accruals -0.50 Loan Loss Experience -1.65 Employee Expense -2.70 Equipment Expenses -0.81 Other Losses -1.34 Cross Charges -1.80

0

0.2

0.4

0.6

0.8

1

1.2

0 20 40 60 80 100 120 140 160

Effic

ienc

y Sc

ore

DMUs' ID

Gap in potential for improvement

Traditional Additive Model New Additive Model

Figure 7.1: Comparing efficiency scores against traditional model

The traditional model fails to capture the right frontier and in several cases, misses the

improvement opportunity. The frontier defined by our model sits on top of the traditional

frontier and sets the bar higher than the traditional one. The artificial low performer, of

course, gets a zero score in both models.

In total, the difference between scores is about 0.93 in this example and the range is

from 0 to 33%. This means that for some DMUs the efficiency score calculated by our

Chapter 7. Realization, Case Study and Results 126

Table 7.2: Missed potential on savings at input side

Business profitability Average return rate jump Everyday Banking 3.72% Wealth Management 16.65% Home Mortgages 6.19% Consumer Lending 1.52% Commercial Deposits 0.04% Commercial Loans 3.48%

Expenses Average reduction (million $) Fixed Assets/Accruals -0.50 Loan Loss Experience -1.65 Employee Expense -2.70 Equipment Expenses -0.81 Other Losses -1.34 Cross Charges -1.80

0

0.2

0.4

0.6

0.8

1

1.2

0 20 40 60 80 100 120 140 160

Effic

ienc

y Sc

ore

DMUs' ID

Gap in potential for improvement

Traditional Additive Model New Additive Model

Table 7.3: Missed opportunity for higher return on revenue

Business profitability Average return rate jump Everyday Banking 3.72% Wealth Management 16.65% Home Mortgages 6.19% Consumer Lending 1.52% Commercial Deposits 0.04% Commercial Loans 3.48%

Expenses Average reduction (million $) Fixed Assets/Accruals -0.50 Loan Loss Experience -1.65 Employee Expense -2.70 Equipment Expenses -0.81 Other Losses -1.34 Cross Charges -1.80

0

0.2

0.4

0.6

0.8

1

1.2

0 20 40 60 80 100 120 140 160

Effic

ienc

y Sc

ore

DMUs' ID

Gap in potential for improvement

Traditional Additive Model New Additive Model

model is 33% lower than the efficiency score reported using the traditional DEA model.

This means there is a scope for those units to be better that has been overlooked. The

units could save 33% in expenses or generate 33% more or a combination of both. The

differences for all the units add up to 93%. Because the model collectively looks at input

reduction and output augmentation, how the efforts are devided between the two is not

readily readable from the final score. To see how this difference translates into better

use of resources and bettering performance in creating value from them (higher rates of

return), a closer look at inputs and outputs of the suggested targets is required to reveal

the net difference between the two models.

We calculated the difference in every input and output of every target (projection)

and summed them up in a general view. In some cases, our model has suggested the use

of more of a certain input compared to what EMS suggested, and this is because our

model focuses on redistribution of resources, which might mean using more in a certain

line of business that the branch is good at and less in another. However, considering all

Chapter 7. Realization, Case Study and Results 127

the inputs, overall, the targets found by our new model resulted in savings of an extra

153.4 million dollars. The details of suggested further reductions in every input is in

Table 7.2. On the output side, on average, the rate of return rates can improve by 5%,

the details are listed in Table 7.3. A closer look at the seven highly referenced DMUs in

the new model shows that all of them are fully efficient in the traditional model and they

also include all four popular DMUs in the construction of benchmarks, in the traditional

model.

The PPS remains the same if the convexity and variable returns to scale assumptions

stay unchanged. This means that the choice of model (additive or BCC) does not alter

the PPS. This is indeed a fact that could assist us in solving the nonlinear case presented

in Chapter 6.

7.3 Case study, nonlinear BCC Model: approxima-

tion method

In the case of the BCC model, where the ratio variables existed on the side of orientation,

the case could not be linearized and we were left with no option except a heuristic one

to approximate the frontier. The algorithm randomly generated unobserved parts of the

PPS, using a linear combination of existing DMUs and recording them if they turned out

to be outside of the traditional PPS.

We tested this algorithm on our case study of 130 urban branches. We ran the algorithm with a sample size of 50 and 300 sampling repetitions (samples drawn from the observed DMUs), and for each sample we tried 100 different random weight combinations. In 11 runs of the algorithm, we found 124 DMUs above the traditional frontier, and overall this took 2.5 minutes. We also tried the algorithm with a sample size of 20 and 200 random weight sets at the same sampling rate; repeating the procedure four times added 9 more unobserved DMUs and took 52 seconds on the Juno server. In the end, we decided that 133 unobserved DMUs was acceptable and stopped there. There is no golden rule on when to stop; our decision was based on the quality of the unobserved DMUs (lying outside the conventional envelopment) and their spread (covering most parts of the frontier rather than clustering in one section). We need to stress that, depending on the results, a few runs of the algorithm may suffice, but there is no formula for the number of runs required for an arbitrary number of DMUs. In our work, for example, one run found 84 unobserved DMUs, whereas another run with identical settings found nothing new. This is random sampling, and it is not possible to predict the outcome of each trial. Having said that, on average, the number of required trials is not large.

[Figure 7.2: Drop/rise in the efficiency score of branches after adding unobserved DMUs generated by the approximation method.]

We added these unobserved DMUs to the

original set and recalculated the efficiency score using an output-oriented BCC model. The average reduction in the score is 1.7%, ranging from zero to 11.5%. Therefore, our approximation model created a benchmark which is above the traditional one and pushes the branches to aim for higher performance. This is not the whole story, as there could exist DMUs that are labeled efficient in the traditional model yet, in reality, are operating at 88% efficiency, as we can see in Figure 7.2.
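The output-oriented BCC score used in this recalculation comes from the standard VRS envelopment LP; the following MATLAB sketch shows one way to set it up with linprog (the function name and data layout are illustrative assumptions, not the code behind the reported results).

    % Output-oriented BCC (VRS) efficiency of DMU o, with inputs X (m-by-n)
    % and outputs Y (s-by-n); phi >= 1 is the expansion factor, score = 1/phi.
    function [score, lam] = bcc_output_oriented(X, Y, o)
        [m, n] = size(X);
        s = size(Y, 1);
        f = [zeros(n, 1); -1];             % variables z = [lambda; phi], maximize phi
        A = [X,  zeros(m, 1);              % X*lambda <= x_o
             -Y, Y(:, o)];                 % Y*lambda >= phi*y_o
        b = [X(:, o); zeros(s, 1)];
        Aeq = [ones(1, n), 0];  beq = 1;   % sum(lambda) = 1 (VRS)
        lb = zeros(n + 1, 1);
        opts = optimoptions('linprog', 'Display', 'off');
        [z, ~, exitflag] = linprog(f, A, b, Aeq, beq, lb, [], opts);
        if exitflag ~= 1
            error('LP not solved to optimality for DMU %d (exit flag %d).', o, exitflag);
        end
        lam = z(1:n);
        score = 1 / z(end);                % efficiency score in (0, 1]
    end

Appending the retained unobserved DMUs as extra columns of X and Y before scoring each branch is the kind of recalculation behind the score drops reported above.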

[Figure 7.3: Efficiency score of the unobserved DMUs generated by the approximation method.]

As seen in Figure 7.3, the

unobserved DMUs mostly achieve an efficiency score of one, which is expected. However,

there are a few with a score below one. Recall that these units are reported as inefficient in

comparison with the other unobserved DMUs we added later. If we add each unobserved

DMU individually to the original data, that unit could achieve full efficiency, but this is

not necessarily the case when other unobserved DMUs are added as well to the original

data. Although every unobserved DMU is slightly outside the traditional envelopment,

it may need to work harder to get to the approximated frontier which is closer to the true

frontier. We know that the PPS for the BCC model and additive model is the same and

the difference in results lies in how we measure the distance to the frontier. The targets

identified by our linearized additive model can also be added to the original data set for

approximating the nonlinear BCC frontier. In this case, there was no significant benefit


when we just used the efficient units from our non-oriented model to judge the quality of

the random unobserved DMUs. Another option is to include the efficient units from the

linearized additive model in the approximation algorithm to generate unobserved DMUs.

Chapter 8

Recommendations and future work

This chapter offers a synopsis of the work in this thesis and the contributions made, in

addition to the conclusions drawn from the results of the case study in the previous

chapter. Opportunities for future work will also be described.

8.1 Contributions

We had a problem to address: using ratios as they are in the existing DEA models leads to distortions in the frontier, which in turn misrepresents the opportunities for improvement because of faulty targets. In this theoretical work, we have offered solutions to this problem:

1. Developed two non-oriented models, similar to the Russell Graph Measure and the Enhanced Russell Measure, modified to take in ratio variables;

2. Developed a new non-oriented model to deal with ratios, which satisfies almost all

of the desired properties we collected going through the models in the literature;

3. Reduced all the models to a linear form to make them work in practice; and

4. Proposed an approximation model to deal with the case of the BCC model with ratios on the side of orientation, where the model could not be reduced to a linear form.

Both developed models were tested on a small case of 130 urban bank branches of a major Canadian bank in one Canadian province, and the results demonstrated the superiority of our models over other existing techniques.

8.2 Discussion of the Results: Proposed model

Upon closer inspection of the results from the previous chapter, we realized that in some instances the MATLAB optimization package was unable to solve the LP to the final optimal value (we can tell this from the exit flag returned by the MATLAB function). In those cases MATLAB still reported the best solution it found, but did not guarantee that it was optimal; in some instances, for example, it was unable to find the best weights. This is not a surprise, as optimization algorithms are not perfect and cannot always avoid degeneracy and cycling, in which the algorithm fails to converge and becomes trapped in a loop [Gass 04]. This happened in only 7% of the cases. The affected DMUs achieved reasonable scores (between 0.7 and 0.9), but the program could not decide where on the frontier they should best be targeted. This is not a shortcoming of our model; it is due to linear programming in general and the optimization methods embedded in MATLAB, over which we have no control. To avoid any doubt, we excluded those instances from both models and base the rest of this discussion on the remaining 121 branches. We also removed the bad dummy DMU to make sure that the large improvement suggested for this non-existing DMU does not inflate our results.
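As an illustration of how such cases can be detected automatically, the sketch below (an assumed workflow, not the thesis code; an output-oriented VRS LP stands in for whichever formulation is being solved) records the exit flag for every DMU and excludes those that did not reach a guaranteed optimum.

    % Screen linprog exit flags across all DMUs and exclude non-optimal cases.
    % X (m-by-n) and Y (s-by-n) are assumed to be in the workspace.
    n = size(X, 2);
    exitflags = zeros(n, 1);
    opts = optimoptions('linprog', 'Display', 'off');
    for o = 1:n
        f = [zeros(n, 1); -1];                          % maximize phi
        A = [X, zeros(size(X, 1), 1); -Y, Y(:, o)];     % X*lam <= x_o, Y*lam >= phi*y_o
        b = [X(:, o); zeros(size(Y, 1), 1)];
        Aeq = [ones(1, n), 0];  beq = 1;                % VRS convexity constraint
        lb = zeros(n + 1, 1);
        [~, ~, exitflags(o)] = linprog(f, A, b, Aeq, beq, lb, [], opts);
    end
    keep = (exitflags == 1);                            % 1 = converged to an optimum
    fprintf('Excluding %d of %d DMUs with non-optimal exit flags.\n', sum(~keep), n);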


8.2.1 Efficiency Scores

The efficiency scores reported are between zero and one, which makes the comparison

more convenient. The average efficiency score obtained through the proposed model was

85%, which is fairly consistent with the scores that are generally obtained from DEA

branch analyses. Banking is an established and profitable business, and it is expected that, on average, branches operate reasonably well and are about 85% efficient [Akhi 03], [McNu 05], [Fuen 03]. The additive DEA model with no consideration for ratios reports an average of 92% efficiency, which is an overestimation and does not provide good discrimination; this is mainly because of the incorrect formulation caused by the ratio variables.

Highly referenced DMUs in our proposed model were also marked as best performers

in the traditional DEA evaluations. Examining the characteristics of the highly refer-

enced DMUs could provide further insight into why they performed better than their

counterparts and would help management consider other factors pertinent to their better

performance and create guidelines for other units to emulate. In our dataset, we only

focused on one sector and region, and the specialty branches were excluded to have bal-

anced data and eliminate the risk of outliers. Our goal was to test the models rather than to provide consulting advice to management for decision-making.

8.2.2 Direction of improvement

DEA not only identifies the inefficient units but also proposes a path on how to improve,

usually by setting a target which is a point on the frontier and is made up of one or a

combination of efficient units. Our proposed model has, as expected, set the bar higher

and as a result, has envisioned greater improvements for the inefficient units. Looking at

the absolute values of the targets, the additional savings suggested by our model are shown in Table 8.1.

Table 8.1: Further savings on inputs (million $)

Fixed Assets/Accruals    -14.90
Loan Loss Experience     -13.36
Employee Expense         -66.95
Equipment Expenses       -10.21
Other Losses              -6.33
Cross Charges            -22.03

In the additive model, under the variable returns to scale assumption, it is possible that the model advises acquiring more of certain inputs if, by comparison, it finds that more input will make the unit more efficient due to economies of scale. Of

course, the use of weights and cone ratios can help us achieve more attainable targets;

the tolerances and lower and upper bounds for each variable need to be developed in close collaboration with management. Our goal here was merely to test the models and not to aid decision-making.
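One simple way to express such tolerances, written here in generic notation (the bounds L_i and U_r are illustrative placeholders to be agreed with management, not values used in this study), is to bound each projected input and output directly:

    \[
    x_{io} - s_i^{-} \;\ge\; L_i, \qquad y_{ro} + s_r^{+} \;\le\; U_r,
    \]

where s_i^- and s_r^+ are the input and output slacks of the additive-type model, so that the suggested targets stay within ranges the branch can realistically reach.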

8.3 Discussion of the results: approximation method

None of the branches were left without a direction of improvement in our approximation

method, so we base our discussion on the 130 branches with which the algorithm began. The average efficiency score of the unobserved DMUs is 96%, with the majority

being one and a few identified as inefficient. We aimed to find unobserved DMUs which

could take the frontier to the next level and expected most of them to attain an efficiency

score of one. We need to bear in mind that the selection of the best hypothetical DMUs

in any round depends on what data we have accumulated up to that point, and we cannot predict future rounds. As a result, we may end up having a few unobserved DMUs that turn out to be inefficient compared with the other unobserved DMUs we find. The average efficiency score after the update of the PPS is 93%, which is still high but an improvement over the 95% average obtained using the traditional model. When compared with the proposed model's average efficiency of 87%, we can see the gap between our estimate and the optimal reality; with more repetitions or a different sample size and number of weight sets, the results could be improved further. The average might sound discouraging, but a closer look shows that some units have an 11% drop in their efficiency score, which is equivalent to an opportunity for further improvement that was masked before by a false estimate of the frontier.

8.4 Recommendations, limitations and future directions

Our work involved the development of two very different models to enable the use of ratio variables in DEA and, more importantly, to make the models work computationally. The goal has been to properly define the PPS and reduce the nonlinear form to a linear program, as well as to propose a heuristic method for the cases that remain nonlinear. Overall, we examined the existing models and identified the desirable characteristics that we eventually wanted our model to accommodate while identifying the PPS correctly. In total, more than 20 models were thoroughly examined, and a methodology was developed for defining new models that could also be used for creating other models. Our final models were tested on a simple yet powerful case study with real data, and the results confirm the applicability of our models. We also found that our method for reducing the nonlinear form to a linear form could be applicable to the popular Russell model and could make it more accessible to practitioners. Our models were able to find the best practices and showed good discriminatory power.

We were not concerned much with the practicality of the targets. If found unrealistic,

it is not difficult to add a few weight restrictions as new constraints to our LP in the

proposed model to fix this issue. For the approximation model, an extra step can be

added after random number generation so that the new points in the PPS are checked against constraints on the variables.

We proposed some heuristic rules as a stopping criterion in our approximation method. Heuristic rules are not set in stone, and there is always room to devise new


forms with quicker convergence or more deterministic procedures.

Due to the complexity of using ratio variables and the need to have the information

about the numerator and the denominator, it is advisable that, whenever possible, normal data or some proxy measures be used. To report back to management, a desirable

ratio format could be reconstructed from normal data and proxies (to some extent). Our

method is to be used only if the data is fully available and the use of ratio variables will

result in better targets and insights.

In computer programming, overflow occurs when an arithmetic operation attempts to create a numeric value that is too large to be stored in the available space. For instance, to calculate the average of some numbers, as done in many search algorithms, the data values are first added up and then divided by the number of data points; this causes errors or unexpected results if the sum (not necessarily the resulting mean) is too large. Similarly, underflow can happen when the result of a calculation is smaller than the smallest value representable in the package. For instance, a small number in the denominator might be stored as zero in the computer and hence generate an error when the value of the fraction is accessed. In such instances the code does not perform as expected. We faced underflow issues, so we multiplied our numbers by factors of 100 and 1000 and, of course, rescaled them back at the end to be in the same range as the actual data. One must also check the exit flags of the optimization algorithm in MATLAB to make sure that the results are indeed optimal. There could be some cases, as we experienced, where a final optimal solution to the objective function, though defined, could not be attained. Often, an ad-hoc remedy could be achieved by changing the input/output variables by one tenth of a percentage point. Overall, depending on the data at hand, customized adjustments might be required.
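For illustration, the rescaling remedy can be as simple as the following sketch (variable names and the 1e-2 threshold are assumptions; bsxfun is used so the scaling also works on older MATLAB releases):

    % Scale up rows with very small magnitudes before solving the LPs, then map
    % the results back to the original units (factors of 100/1000 as in the text).
    scale = ones(size(X, 1), 1);
    small = max(abs(X), [], 2) < 1e-2;                 % rows prone to underflow
    scale(small) = 1000;
    Xs = bsxfun(@times, X, scale);                     % scaled data fed to the solver
    % ... run the DEA LPs on Xs (outputs scaled analogously) ...
    targetsScaled = Xs;                                % placeholder for solver output
    targets = bsxfun(@rdivide, targetsScaled, scale);  % back to the original range
    % Ad-hoc remedy for LPs that still fail: perturb the data by 0.1% and re-solve.
    XsPerturbed = Xs * 1.001;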

Several areas for future development arise from this work on embedding ratio variables in DEA models, and the opportunities are mainly associated with the im-

plementation techniques of the models developed. These further directions of research


include:

• Extending the computer codes to be able to adjust variable ranges automatically,

and adding various checkpoints at the appropriate steps to avoid non-convergence

and impossible solutions, while using the MATLAB optimization tool.

• Providing solutions for the DMUs for which the optimization toolbox is unable to

find the optimum.

• Implementing the reduced model in different coding languages or software pack-

ages that might have better capabilities and making it possible to integrate the

model into existing DEA software packages or making it an add-on to the existing

commercial packages.

• Creating better heuristics for non-optimal and non-deterministic procedures such as our approximation model, where there is always room for improvement.

• Alongside the implementation routes suggested above, one other non-technical as-

pect, which might revive the use of ratios in DEA and broaden the DEA market,

is to use non-ratio data but then rework the presentation and craft results into

popular ratio forms.

Glossary

Additive Model DEA model which has no orientation and measures ef-

ficiency maximizing both the input and output slacks

simultaneously.

BCC or VRS Model DEA model which assumes a variable returns to scale

relationship between inputs and outputs.

CCR or CRS Model DEA model which assumes a constant returns to scale

relationship between inputs and outputs.

Ray Unboundedness Scaling up or down of any realized DMU generates a

new feasible DMU.

CRS Constant Returns to Scale A measure where a proportionate increase in inputs

results in an identical proportionate increase in out-

puts.

VRS Variable Returns to scale A measure where a proportionate increase in inputs

does not result in an identical proportionate increase

in outputs.

Convexity An axiom that requires the multipliers to sum to one when creating a linear combination of DMUs.

Input-Oriented Model DEA model whose objective is to minimize inputs

while keeping outputs constant.

DEA Data Envelopment Analysis A non-parametric, linear programming technique

used for measuring the relative efficiency of

units, considering multiple inputs and outputs simul-

taneously.


RA Ratio Analysis A technique that uses the ratio of a single output to

a single input and generates a relative efficiency score

by dividing the aforesaid ratio by the corresponding “best performer's” ratio for that specific ratio definition.

DMU Decision Making Unit Term used to describe a unit under study such as

bank branch, hospital, firm, etc.

Free Disposability axiom An assumption that says if DMUi is feasible then any

DMU that is doing worse, producing less or consum-

ing more, can be realized too.

Efficient Frontier The facets and edges of the PPS, representing the

most efficient units.

Output-Oriented DEA model whose objective is to maximize outputs

while keeping inputs constant.

Reference Group Set of efficient units to which the inefficient unit has

been most directly compared when calculating its ef-

ficiency rating in DEA.

PPS Production Possibility Set Given the observed data, the set of all possible in-

put/output combinations that could exist.

Profitability Efficiency Model DEA Model that captures the business operations of

a bank branch using revenue ratios as outputs and

branch expenses as inputs.

Full Efficiency Full efficiency is attained by any DMU if and only if

none of its inputs or outputs can be improved without

worsening some of its other inputs or outputs.


Full Relative or Technical efficiency Full technical efficiency is attained by any DMU if and

only if, compared to other observed DMUs, none of its

inputs or outputs can be improved without worsening

some of its other inputs or outputs.

Technical change The relative efficiency of the entity when compared to broader or newer peer groups.

Scale efficiency Scale efficiency represents the failure to achieve the

most productive scale size and is the difference be-

tween CRS and VRS models.

Input Slack factor Identifies how much one of the inputs can be reduced

without changing other inputs or outputs.

Input substitution factor Identifies the smallest value for one specific input

among the DMUs belonging to the PPS.

Output Slack factor Identifies how much one of the outputs can be in-

creased without changing other outputs or inputs.

Output substitution factor Identifies the largest value for one specific output

among the DMUs belonging to the PPS.

FDH Free Disposal Hull assumption adds to the observed

production data, the unobserved production points

with output levels equal to or lower than those of

some observed points and more of at least one input;

or with input levels equal to or higher than those of

some observed points and less of at least one output.

Partial m Frontier A method for forming the frontier that does not im-

pose convexity on the production set and allows for

noise (with zero expected values) and as a result is

less sensitive to outliers.


Quantile Frontier A continuous version of partial m method to form the

frontier and more robust to the presence of outliers.

Proposed Non-oriented Model A modified additive DEA model presented in this

work which imposes convexity on the DMUs when

ratio variables are involved and it is units and trans-

lation invariant. The efficiency score is between zero

and one.

LP Linear programming A method to achieve the best outcome (such as max-

imum output or minimum input) in a mathematical

model whose requirements are represented by linear

relationships.

Bootstrapping Bootstrapping is a re-sampling technique to approx-

imate the distribution of a random variable in order

to estimate a specific statistic of the population.

Monte Carlo A broad class of computational algorithms that rely

on repeated random sampling to obtain numerical re-

sults.

MCMC Markov Chain Monte Carlo

Methods

A class of algorithms for sampling from a probability

distribution based on constructing a Markov chain

that has the desired distribution as its equilibrium

distribution.

Sparse matrix In numerical analysis, a sparse matrix is a matrix in

which most of the elements are zero.

EMS Efficiency Measurement Sys-

tem

A Data Envelopment Analysis (DEA) Software by

Holger Scheel.


MATLAB (matrix laboratory) A multi-paradigm numerical computing environment and fourth-generation programming language developed by MathWorks.

References

[Aida 98] Aida, K., Cooper, W., Pastor, J., and Sueyoshi, T. “Evaluating water sup-

ply services in Japan with RAM a range adjusted measure of inefficiency”.

OMEGA: International Journal of Management Science, Vol. 26, pp. 207–

232, 1998.

[Akhi 03] Akhigbe, A. and McNulty, J. E. “The profit efficiency of small US commercial

banks”. Journal of Banking and Finance, Vol. 27, pp. 307–325, 2003.

[Alex 10] Alexander, W. R. J., Haug, A. A., and Jaforullah, M. “A two-stage double-

bootstrap data envelopment analysis of efficiency differences of New Zealand

secondary schools”. Journal of Productivity Analysis, Vol. 34, No. 2, pp. 99–

110, 2010.

[Ali 93] Ali, A. I. and Seiford, L. “Computational accuracy and infinitesimals in Data Envelopment Analysis”. INFOR, Vol. 31, pp. 290–297, 1993.

[Ande 93] Andersen, P. and Petersen, N. C. “A Procedure for Ranking Efficient Units in

Data Envelopment Analysis”. Management Science, Vol. 39, pp. 1261–1264,

1993.

[Apar 07] Aparicio, J., Ruiz, J. L., and Sirvent, I. “Closest targets and minimum dis-

tance to the Pareto-efficient frontier in DEA”. Journal of Productivity Anal-

ysis, Vol. 28, pp. 209–218, 2007.



[Arag 05] Aragon, Y., Daouia, A., and Thomas-Agnan, C. “Nonparametric frontier

estimation: A conditional quantile-based approach”. Econometric Theory,

Vol. 21, No. 2, pp. 358–389, 2005.

[Asmi 10] Asmild, M. and Pastor, J. “Slack free MEA and RDM with comprehensive

efficiency measure”. OMEGA: International Journal of Management Science,

Vol. 38, pp. 475–483, 2010.

[Bank 84] Banker, R., Charnes, A., and Cooper, W. “Models for the estimation of tech-

nical and scale inefficiencies in Data Envelopment Analysis”. Management

Science, Vol. 30, No. 9, pp. 1078–1092, 1984.

[Bank 87] Banker, R., Charnes, A., Cooper, W., and Maindiratta, A. “A comparison

of data envelopment analysis and translog estimates of production frontiers

using simulated observations from a known technology”. Applications in

Modern Production Theory Inefficiency and Productivity, 1987.

[Bank 93] Banker, R., Gadh, V., and Gorr, W. “A Monte Carlo comparison of two

production frontier estimation methods: corrected ordinary least squares

and data envelopment analysis”. European Journal of Operational Research,

Vol. 67, 1993.

[Bous 09] Boussemart, J.-P. and Leleu, H. “Measuring potential gains from specializa-

tion under non-convex technologies”. IESEG School of Management Working

Papers, No. 2, 2009.

[Bowl 04] Bowlin, W. F. “Financial analysis of civil reserve air fleet participants us-

ing data envelopment analysis”. European Journal of Operational Research,

Vol. 154, pp. 691–709, 2004.


[Broc 98] Brockett, P., Cooper, W., Shin, H., and Wang, Y. “Inefficiency and Conges-

tion in Chinese Production Before and After the 1978 Economic Reforms”.

Socio-Economic Planning Sciences, Vol. 32, pp. 1–20, 1998.

[Caza 02] Cazals, C., Florens, J. P., and Simar, L. “Nonparametric frontier estimation:

a robust approach.”. Journal of Econometrics, Vol. 106, pp. 1–25, 2002.

[Cham 96] Chambers, R. G., Chung, Y., and Fare, R. “Benefit and distance functions”.

Journal of Economic theory, Vol. 70, 1996.

[Char 78] Charnes, A., Cooper, W. W., and Rhodes, E. “Measuring the efficiency of

decision making units”. European Journal of Operational Research, Vol. 2,

pp. 429–444, 1978.

[Char 81] Charnes, A., Cooper, W. W., and Rhodes, E. “EVALUATING PROGRAM

AND MANAGERIAL EFFICIENCY: with an illustrative APPLICATION

to the PROGRAM FOLLOW THROUGH experiment in U.S. public school

education”. Management Science, Vol. 27, pp. 668–697, 1981.

[Char 82] Charnes, A., Cooper, W., Seiford, L., and Stutz, J. “A multiplicative model

for efficiency analysis”. Socio-Economic Planning Sciences, Vol. 16, No. 5,

pp. 223–224, 1982.

[Char 83] Charnes, A., Cooper, W., Seiford, L., and Stutz, J. “Invariant multiplicative efficiency and piecewise Cobb-Douglas envelopments”. Operations Research

Letters, Vol. 2, No. 3, pp. 101–103, 1983.

[Char 85] Charnes, A., Cooper, W., Golany, B., Seiford, L., and Stutz, J. “Foundations of data envelopment analysis for Pareto-Koopmans efficient empirical production functions”. Journal of Econometrics, Vol. 30, pp. 91–107, 1985.


[Char 87] Charnes, A., Cooper, W., Rousseau, J., and Semple, J. “Data Envelopment

Analysis and Axiomatic notions of efficiency and reference sets”. Tech. Rep.,

Center for Cybernetic studies, the university of texas, austin, 1987.

[Chen 02a] Chen, Y. and Ali, A. I. “Output-input ratio analysis and DEA frontier”.

European Journal of Operational Research, Vol. 142, pp. 476–479, 2002.

[Chen 02b] Chen, Y. and Ali, A. I. “Output-input ratio analysis and DEA frontier”.

European Journal of Operational Research, Vol. 142, pp. 476–479, 2002.

[Chen 07] Chen, W.-C. and McGinnis, L. F. “Reconciling ratio analysis and DEA as

performance assessment tools”. European Journal of Operational Research,

Vol. 178, pp. 277–291, 2007.

[Chen 14] Chen, K. and Kou, M. “Weighted Additive DEA models associated with

dataset standardization techniques”. Tech. Rep., Chinese Academy of sci-

ences, 2014.

[Cher 01] Cherchye, L., Kuosmanen, T., and Post, G. T. “Alternative Treatments of

Congestion in DEA: A rejoinder to Cooper, Gu, and Li”. European Journal

of Operational Research, Vol. 132, No. 1, pp. 75–80, 2001.

[Cher 99] Cherchye, L., Kuosmanen, T., and Post, T. “Why convexify ? An assessment

of convexity axioms in DEA”. Helsinki School of Economics and Business

Administration Working Papers, 1999.

[Chun 97] Chung, Y. H., Fare, R., and Grosskopf, S. “Productivity and undesirable out-

puts: A directional distance function approach”. Journal of Environmental

Management, Vol. 51, 1997.

[Conc 03] Conceicao, M., Portela, A. S., Borges, P. C., and Thanassoulis, E. “Finding

closest targets in non-oriented Data Envelopment Analysis models: the case


of convex and non-convex technologies”. Journal of Productivity Analysis,

Vol. 19, 2003.

[Cook 14] Cook, W. D., Tone, K., and Zhu, J. “Data Envelopment Analysis: Prior to

choosing a model”. Omega International, Journal of Management Science,

Vol. 44, pp. 1–4, 2014.

[Coop 01a] Cooper, W. W., Gu, B., and Li, S. “Comparison and evaluation of alternative

approaches to the Treatments of Congestion in DEA”. European Journal of

Operational Research, Vol. 132, No. 1, pp. 62–74, 2001.

[Coop 01b] Cooper, W. W., Gu, B., and Li, S. “Note: Alternative Treatments of Con-

gestion in DEA- a response to the Cherchye, Kuosmanen and Post critique”.

European Journal of Operational Research, Vol. 132, No. 1, pp. 81–87, 2001.

[Coop 04] Cooper, W. W., Seiford, L. M., and Zhu, J. Handbook on data envelopment

analysis. Kluwer Academic Publishers, 2004.

[Coop 11] Cooper, W. W., Pastor, J. T., Borras, F., and Pastor, A. D. “BAM: a

bounded adjusted measure of efficiency for use with bounded additive mod-

els”. Journal of Productivity Analysis, Vol. 35, 2011.

[Coop 95] Cooper, W. W. and Pastor, J. T. “Global Efficiency Measurement in DEA”.

Working paper, Depto Este Inv. Oper. Universidad Alicante, Alicante, Spain.,

1995.

[Coop 99a] Cooper, W., Park, K. S., and Pastor, J. “RAM: A Range Adjusted Measure

of Inefficiency for Use with Additive Models, and Relations to Other Models

and Measures in DEA”. Journal of Productivity Analysis, Vol. 11, pp. 5–42,

1999.


[Coop 99b] Cooper, W., Park, K. S., and Yu, G. “IDEA and AR-IDEA: Models for

dealing with imprecise data in DEA”. Management Science, Vol. 45, No. 4,

pp. 597–607, 1999.

[Cron 02] Cronje, J. J. L. “Data Envelopment Analysis as a measure for technical

efficiency measurement in banking - a research framework”. Southern African

Business Review, Vol. 6, No. 2, pp. 32–41, 2002.

[Daou 07] Daouia, A. and Simar, L. “Nonparametric Frontier estimation: A Multi-

variate Conditional Quantile Approach”. Journal of Econometrics, Vol. 140,

No. 2, pp. 375–400, 2007.

[Depr 84] Deprins, D., Simar, L., and Tulkens, H. “Measuring labor-efficiency in post

offices”. In: Marchand, M., Pestieau, P., and Tulkens, H., Eds., The perfor-

mance of public enterprises: concepts and measurement, Amsterdam, North-

Holland, 1984.

[Desp 07] Despic, O., Despic, M., and Paradi, J. C. “DEA-R: ratio-based comparative

efficiency model, its mathematical relation to DEA and its use in applica-

tions”. Journal of Productivity Analysis, Vol. 28, pp. 33–44, 2007.

[Dyso 10] Dyson, R. G. and Shale, E. “Data envelopment analysis, operational research

and uncertainty”. Journal of the Operational Research Society, Vol. 61, No. 1,

pp. 25–34, 2010.

[Efro 79] Efron, B. “Bootstrap methods: another look at the jackknife”. The annals

of Statistics, pp. 1–26, 1979.

[Efro 82] Efron, B. The jackknife, the bootstrap and other resampling

plans. Vol. 38, SIAM, 1982.


[Efro 94] Efron, B. and Tibshirani, R. J. An introduction to the bootstrap. CRC press,

1994.

[Emro 08] Emrouznejad, A., Parker, B. R., and Tavares, G. “Evaluation of research

in efficiency and productivity: A survey and analysis of the first 30 years of

scholarly literature in DEA”. Socio-Economic Planning Sciences, 2008.

[Emro 09] Emrouznejad, A. and Amin, G. R. “DEA models for ratio data: Convexity

consideration”. Applied Mathematical Modelling, Vol. 33, No. 1, pp. 486–498,

2009.

[Fare 00] Fare, R. and Grosskopf, S. “Theory and application of directional distance

functions”. Journal of Productivity Analysis, Vol. 2, pp. 93–104, 2000.

[Fare 10a] Fare, R. and Grosskopf, S. “Directional distance functions and slacks-based

measures of efficiency”. European Journal of Operational Research, Vol. 200,

pp. 320–322, 2010.

[Fare 10b] Fare, R. and Grosskopf, S. “Directional distance functions and slacks-based

measures of efficiency: Some clarifications”. European Journal of Operational

Research, Vol. 206, p. 702, 2010.

[Fare 78] Fare, R. and Lovell, C. “Measuring the Technical Efficiency of Production”.

Journal of Economic Theory, Vol. 19, pp. 150–162, 1978.

[Fare 83a] Fare, R. and Grosskopf, S. “Measuring Congestion in Production”.

Zeitschrift fur Nationalokonomie, Vol. 43, pp. 257–271, 1983.

[Fare 83b] Fare, R., Lovell, C. A. K., and Zieschang, K. “Measuring the Technical

Efficiency of Multiple Output Production Technologies”. In: Eichhorn, W.,

Henn, R., Neumann, K., and Sheppard, R. W., Eds., Quantitative Studies

on Production and Prices, Springer, Wien, 1983.


[Fare 85] Fare, R., Grosskopf, S., and Lovell, C. The Measurement of Efficiency of

Production. Boston: Kluwer-Nijhoff Publishing, 1985.

[Farr 57] Farrell, M. “The measurement of productive efficiency”. Journal of Royal

Statistical Society, Series A, Vol. 120, No. 3, pp. 253–281, 1957.

[Farr 59] Farrell, M. “Convexity assumption in theory of competitive markets”. Jour-

nal of Political Economy, Vol. 67, 1959.

[Fero 03] Feroz, E. H., Kim, S., and Raab, R. L. “Financial statement analysis: A

Data Envelopment Analysis approach”. Journal of the Operational Research

Society, Vol. 54, pp. 48–58, 2003.

[Ferr 97] Ferrier, G. D. and Hirschberg, J. G. “Bootstrapping confidence intervals

for linear programming efficiency scores: With an illustration using Italian

banking data”. Journal of Productivity Analysis, Vol. 8, No. 1, pp. 19–33,

1997.

[Ferr 99] Ferrier, G. D. and Hirschberg, J. G. “Can we bootstrap DEA scores?”.

Journal of Productivity Analysis, Vol. 11, No. 1, pp. 81–92, 1999.

[Frei 99] Frei, F. X. and Harker, P. T. “Projections onto efficient frontiers: Theoret-

ical and conceptual extensions to DEA”. Journal of Productivity Analysis,

Vol. 11, No. 3, pp. 275–300, 1999.

[Fuen 03] Fuentes, R. and Vergara, M. “Explaining Bank Efficiency: Bank Size or

Ownership Structure?”. Proceedings of the VIII Meeting of the Research

Network of Central Banks of the Americas, 2003.

[Fuku 09] Fukuyama, H. and Weber, W. “A directional slack based measure of technical

inefficiency”. Socio-Economic Planning Sciences, Vol. 43, pp. 274–287, 2009.


[Gall 03] Gallagher, T. J. and Andrew, J. D. Financial Management. Freeload Press

Ltd., 2003.

[Gass 04] Gass, S. I. and Vinjamuri, S. “Cycling in linear programming problems”.

Computers and Operations Research, Vol. 31, pp. 303–311, 2004.

[Gema 84] Geman, S. and Geman, D. “Stochastic relaxation, Gibbs distributions, and

the Bayesian restoration of images”. Pattern Analysis and Machine Intelli-

gence, IEEE Transactions on, No. 6, pp. 721–741, 1984.

[Gong 92] Gong, B. and Sickles, R. “Finite sample evidence on the performance of

stochastic frontiers and data envelopment analysis using panel data”. Journal

Economet, Vol. 51, 1992.

[Gonz 07] Gonzalez-Bravo, M. I. “Prior-Ratio-Analysis procedure to improve data en-

velopment analysis for performance measurement”. Journal of the Opera-

tional Research Society, Vol. 58, pp. 1214–1222, 2007.

[Gree 97] Green, R. H., Cook, W., and Doyle, J. “A Note on the Additive Data

Envelopment Analysis Model”. The Journal of the Operational Research

Society, Vol. 48, No. 4, pp. 446–448, 1997.

[Grif 99] Grifell-Tatje, E. and Lovell, C. “Profits and productivity”. Management

Science, Vol. 45, No. 9, pp. 1177–1193, 1999.

[Gsta 03] Gstach, D. “A Statistical Framework for Estimating Output-Specific Effi-

ciencies”. 2003.

[Gsta 95] Gstach, D. “Comparing structural efficiency of unbalanced subsamples: A

resampling adaptation of data envelopment analysis”. Empirical Economics,

Vol. 20, No. 3, pp. 531–542, 1995.


[Hast 70] Hastings, W. K. “Monte Carlo sampling methods using Markov chains and

their applications”. Biometrika, Vol. 57, No. 1, pp. 97–109, 1970.

[Holl 03] Hollingsworth, B. and Smith, P. “Use of ratios in data envelopment analysis”.

Applied Economics Letters, Vol. 10, pp. 733–735, 2003.

[Holl 06] Hollo, D. and Nagy, M. “Bank Efficiency in the Enlarged European Union”.

MNB working papers, 2006.

[Knei 03] Kneip, A., Simar, L., Wilson, P. W., et al. “Asymptotics for DEA estimators

in nonparametric frontier models”. Tech. Rep., Discussion paper, 2003.

[Kriv 08] Krivonozhko, V. E., Utkin, O. B., Safin, M. M., and Lychev, A. V. “On

comparison of the Ratio Analysis and the DEA approach in financial area”.

Conference on the Uses of Frontier Efficiency Methodologies for Performance

Measurement in the Financial Services Sector, 2008.

[Levk 12] Levkoff, S. B., Russell, R. R., and Schworm, W. “Boundary problems with

the Russell graph measure of technical efficiency: a refinement”. Journal of

Productivity Analysis, Vol. 37, No. 3, pp. 239–248, 2012.

[Lewi 97] Lewin, A. Y. and Seiford, L. M. “Extending the frontiers of Data Envelop-

ment Analysis”. Annals of Operations Research, Vol. 73, No. 0, pp. 1–11,

1997.

[Loth 98] Lothgren, M. “How to bootstrap DEA estimators: a Monte Carlo compari-

son”. WP in Economics and Finance, No. 223, 1998.

[Loth 99] Lothgren, M. and Tambour, M. “Bootstrapping the data envelopment anal-

ysis Malmquist productivity index”. Applied Economics, Vol. 31, No. 4,

pp. 417–425, 1999.


[Love 95a] Lovell, C. K., Pastor, J. T., and Turner, J. A. “Measuring macroeconomic

performance in the OECD: A comparison of European and non-European

countries”. European Journal of Operational Research, Vol. 87, No. 3,

pp. 507–518, 1995.

[Love 95b] Lovell, C. K. and Pastor, J. “Units Invariant and Translation invariant DEA

models”. Operations Research Letters, Vol. 18, 1995.

[Luen 92] Luenberger, D. “Benefit functions and duality”. Journal of Mathematical

Economics, Vol. 21, 1992.

[McNu 05] McNulty, J. “Profit Efficiency Sources and Differences among Small and

Large U.S. Commercial Banks”. Journal of Economics and Finance, 2005.

[Metr 49] Metropolis, N. and Ulam, S. “The monte carlo method”. Journal of the

American statistical association, Vol. 44, No. 247, pp. 335–341, 1949.

[Metr 53] Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and

Teller, E. “Equation of state calculations by fast computing machines”. The

journal of chemical physics, Vol. 21, No. 6, pp. 1087–1092, 1953.

[Naja 05] Najadat, H., Nygard, K. E., and Schesvold, D. “Clustering-Based Method

for Data Envelopment Analysis.”. In: MSV, pp. 255–264, 2005.

[Oles 03] Olesen, O. and Petersen, N. “Identification and use of efficient faces and facets

in DEA”. Journal of Productivity Analysis, Vol. 20, pp. 323–360, 2003.

[Park 00] Park, B., Simar, L., and Weiner, C. “The FDH estimator for productivity

efficiency scores”. Econometric Theory, Vol. 16, pp. 855–877, 2000.

[Past 13a] Pastor, J., Aparicio, J., Monge, J., and Pastor, D. “Modeling CRS bounded

additive DEA models and characterizing their Pareto-efficient points”. Jour-

nal of Productivity Analysis, Vol. 40, No. 3, pp. 285–292, 2013.


[Past 13b] Pastor, J., Aparicio, J., Monge, J., and Pastor, D. “Modeling CRS bounded

additive DEA models and characterizing their Pareto-efficient points”. Jour-

nal of Productivity Analysis, Vol. 40, No. 3, pp. 285–292, 2013.

[Past 96] Pastor, J. “Chapter 3 Translation invariance in data envelopment analysis:

A generalization”. Annals of Operations Research, Vol. 66, No. 2, pp. 91–102,

1996.

[Past 99a] Pastor, J. T., Ruiz, J. L., and Sirvent, I. “An enhanced DEA Russell graph

efficiency measure”. European Journal of Operational Research, Vol. 115,

pp. 596–607, 1999.

[Past 99b] Pastor, J., Ruiz, J., and Sirvent, I. “An enhanced DEA Russell graph effi-

ciency measure”. European Journal of Operational Research, Vol. 115, 1999.

[Pate 00] Paterson, I. “New Models for Data Envelopment Analysis Measuring Effi-

ciency Out with the VRS Frontier”. 2000.

[Port 02] Portela, A. S. and Thanassoulis, E. “Profit efficiency in DEA”. Darmstadt

Discussion Papers in Economics , Aston Business School, 2002.

[Port 04] Portela, M. C. A. S., Thanassoulis, E., and Simpson, G. “Negative Data in

DEA, A directional distance approach applied to bank branches”. Journal

of the Operational Research Society, Vol. 55, pp. 1111–1121, 2004.

[Port 07] Portela, M. and Thanassoulis, E. “Developing a decomposable measure of

profit efficiency using DEA”. Journal of the Operational Research Society,

Vol. 58, No. 4, pp. 481–490, 2007.

[Ray 00] Ray, S. C. Data Envelopment Analysis: Theory and Techniques for Eco-

nomics and Operations Research. Kluwer Academic Publishers, 2000.


[RGDy 01] Dyson, R. G., Allen, R., Camanho, A., Podinovski, V., Sarrico, C., and Shale,

E. “Pitfalls and protocols in DEA”. European Journal of Operational Re-

search, pp. 245–259, 2001.

[Rolf 89] Rolf, F., Grosskopf, S., Lovell, C., and Pasurka, C. “Multilateral Produc-

tivity Comparisons When Some Outputs Are Undesirable: A Nonparametric

Approach”. Review of Economics and Statistics, Vol. 71, pp. 90–98, 1989.

[Rugg 98] Ruggiero, J. and Bretschneider, S. “The weighted Russell measure of tech-

nical efficiency”. European Journal of Operational Research, Vol. 108, No. 2,

pp. 438–451, 1998.

[Sadj 10] Sadjadi, S. and Omrani, H. “A bootstrapped robust data envelopment analy-

sis model for efficiency estimating of telecommunication companies in Iran”.

Telecommunications Policy, Vol. 34, No. 4, pp. 221–232, 2010.

[Shar 07] Sharp, J. A., Meng, W., and Liu, W. “A Modified Slacks-based measure

model for data envelopement analysis with natural negative outputs and

inputs”. The Journal of the Operational Research Society, Vol. 58, 2007.

[Shep 70] Sheppard, R. W. Theory of cost and production. Princeton University Press,

1970.

[Siga 09] Sigaroudi, S. Incorporating Ratios in DEA-An application to real data. Mas-

ter’s thesis, The University of Toronto, 2009.

[Sima 00] Simar, L. and Wilson, P. W. “A general methodology for bootstrapping in

non-parametric frontier models”. Journal of applied statistics, Vol. 27, No. 6,

pp. 779–802, 2000.

[Sima 08] Simar, L. and Wilson, P. W. “Statistical inference in nonparametric fron-

tier models: recent developments and perspectives”. The Measurement of


Productive Efficiency (H. Fried, CAK Lovell and SS Schmidt Eds), Oxford

University Press, Inc, pp. 421–521, 2008.

[Sima 98] Simar, L. and Wilson, P. W. “Sensitivity analysis of efficiency scores: How to

bootstrap in nonparametric frontier models”. Management science, Vol. 44,

No. 1, pp. 49–61, 1998.

[Sima 99a] Simar, L. and Wilson, P. W. “Estimating and bootstrapping Malmquist

indices”. European Journal of Operational Research, Vol. 115, No. 3, pp. 459–

471, 1999.

[Sima 99b] Simar, L. and Wilson, P. W. “Of course we can bootstrap DEA scores!

But does it mean anything? Logic trumps wishful thinking”. Journal of

Productivity Analysis, Vol. 11, No. 1, pp. 93–97, 1999.

[Sima 99c] Simar, L. and Wilson, P. W. “Some problems with the Ferrier/Hirschberg

bootstrap idea”. Journal of Productivity Analysis, Vol. 11, No. 1, pp. 67–80,

1999.

[Sowl 04] Sowlati, T. and Paradi, J. C. “Establishing the “practical frontier” in data

envelopment analysis”. Omega, The International Journal of Management

Science, Vol. 32, pp. 261–272, 2004.

[Stei 01] Steinmann, L. and Zweifel, P. “The Range Adjusted Measure (RAM) in

DEA: Comment”. Journal of Productivity Analysis, Vol. 15, No. 2, pp. 139–

144, 2001.

[Than 12] Thanassoulis, E., Kortelainen, M., and Allen, R. “Improving envelopment

in data envelopment analysis under variables returns to scale”. European

Journal of Operational Research, Vol. 218, 2012.


[Than 92] Thanassoulis, E. and Dyson, R. “Estimating preferred target input-output

levels using data envelopment analysis”. European Journal of Operational

Research, Vol. 56, 1992.

[Than 96] Thanassoulis, E., Boussofiane, A., and Dyson, R. “A Comparison of Data

Envelopment Analysis and Ratio Analysis as Tools for Performance Assess-

ment”. Omega International, Journal of Management Science, Vol. 24, No. 3,

pp. 229–244, 1996.

[Tone 01] Tone, K. “A slacks-based measure of efficiency in data envelopment analysis”.

European Journal of Operational Research, Vol. 130, 2001.

[Tone 99] Tone, K. “An extension of the two Phase Process in the CCR model”. 1999.

[Tsio 03] Tsionas, E. G. “Combining DEA and stochastic frontier models: An empir-

ical Bayes approach”. European Journal of Operational Research, Vol. 147,

No. 3, pp. 499–510, 2003.

[Tulk 93] Tulkens, H. “On FDH Analysis: Some Methodological Issues and Applica-

tions to Retail Banking, Courts and Urban Transit”. Journal of Productivity

Analysis, Vol. 4, No. 1–2, pp. 183–210, 1993.

[Tzio 12] Tziogkidis, P. “The Simar and Wilson's bootstrap DEA approach: a cri-

tique”. Tech. Rep., Cardiff University, Cardiff Business School, Economics

Section, 2012.

[Wu 05] Wu, D., Liang, L., Huang, Z., and X., S. “Aggregated Ratio Analysis in

DEA”. International Journal of Information Technology and Decision Mak-

ing, Vol. 4, No. 3, pp. 369–384, 2005.

[Yang 07] Yang, H. and Pollitt, M. “Distinguishing Weak and Strong Disposability

among Undesirable Outputs in DEA: The Example of the Environmental


Efficiency of Chinese Coal-Fired Power Plants”. Electricity Policy Research,

2007.

[Zhou 07] Zhou, P., Poh, K. L., and Ang, B. W. “A non-radial DEA approach to

measuring environmental performance”. European Journal of Operational Research,

Vol. 178, No. 1, pp. 1–9, 2007.

[Zies 84] Zieschang, K. “An Extended Farrell Efficiency Measure”. Journal of Eco-

nomic Theory, Vol. 33, pp. 387–396, 1984.