

Application-Oriented Networking

through

Virtualization and Service Composition

by

Hadi Bannazadeh

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

Electrical and Computer Engineering Department
University of Toronto

Copyright © 2010 by Hadi Bannazadeh


Abstract

Application-Oriented Networking

through

Virtualization and Service Composition

Hadi Bannazadeh

Doctor of Philosophy

Electrical and Computer Engineering Department

University of Toronto

2010

Future networks will face major challenges in accommodating emerging and future networked applications, including significant architecture and management issues. In this thesis, we study several of these challenges, including configurability, application-awareness, rapid application creation and deployment, and scalable QoS management. To address these challenges, we propose a novel Application-Oriented Network (AON) architecture: a converged computing and communication network in which application providers can flexibly configure in-network resources on demand. The resources in an AON are virtualized and offered to application providers through service-oriented approaches.

To enable large-scale experimentation with future network architectures and applications, in the second part of this thesis we present the Virtualized Application Networking Infrastructure (VANI) as a prototype of an Application-Oriented Network. VANI uses a service-oriented control and management plane that provides flexible and dynamic allocation, release, programming and configuration of the resources used for creating applications or performing network research experiments from layer three and up. Moreover, VANI resources allow the development of network architectures that require a converged network of computing and communications resources, such as in-network processing, storage, and software- and hardware-based reprogrammable resources. We also present a Distributed Ethernet Traffic Shaping (DETS) system, used for bandwidth virtualization in VANI and designed to guarantee send and receive Ethernet traffic rates in VANI, in a computing cluster, or in a datacenter.

The third part of this thesis addresses the problem of scalable QoS and admission control in service-oriented environments where a limited number of instances of service components are shared among different application classes. We first use Markov Decision Processes to find optimal solutions to this problem. Next, we present a scalable and distributed heuristic algorithm able to guarantee the probability of successful completion of a composite application. The proposed algorithm does not assume a specific distribution type for service execution times or application request inter-arrival times, and hence is suitable for systems with stationary or non-stationary request arrivals. We use simulations and experimental measurements to show the effectiveness of the proposed solutions and algorithms in the various parts of this thesis.


to the memory of my father


Acknowledgements

The completion of this thesis would not have been possible without the support of many people. First and foremost, I owe my deepest gratitude to my supervisor, Professor Alberto Leon-Garcia, for his guidance and generous support throughout my research. I would like to thank him for the insightful discussions and ideas that shaped my research and led me to the completion of this thesis. Professor Leon-Garcia is not only a great supervisor but also an admirable person whom I will see as a role model in the future.

I would like to thank the honorable members of my committee, Professors Ben Liang, Paul Chow, Baochun Li, Gordon Agnew and Ashish Khisti, for their evaluation of my thesis and their invaluable comments and feedback.

I would also like to thank the university staff members, especially Ms. Linda Espeut, Mr. Vladimirio Cirillo and Ms. Darlene Gorzo, for their generous help and administrative support.

During my years at UofT, I received support, feedback and encouragement from my dear friends and teammates at the Network Architecture Lab, especially from Alireza Bigdeli, Armin Ghayoori, Keith Redmond, Ali Tizghadam, Ramy Farha, Ivan Hernandez, Agop Koulakezian and Houman Rastegarfar. I would like to express my gratitude and thanks to all of them.

I also had the privilege of working with many students at UofT as part of their education. I wish to thank them all for their dedication, hard work and willingness to experiment. They are Arbab Khan, Gordon Tam, Saleh Dani, Justin Seto, Andrew Mehes, Michael Ens, Ian Gartley, Tom Yue, Darryl Chung, Mingliang Ma, Maxim Galash, Wenyu Li and Anthony Das Santos.

Throughout my Ph.D. years I received family-like friendship from many friends. I would like to thank all of them for the memorable moments: Amin Farbod, Reza Safian, Amirali Basri, Maryam Bahrami, Mostafa Haghiri, Kamran Farzan, Mehdi Lotfinezhad, and David Brown.

I would also like to thank my mother, brothers and sisters for their unconditional love and support, without which the completion of this thesis would not have been possible.

Meeting my wife was one of the most wonderful events of my Ph.D. years. I would like to thank my beloved wife, Sara, for her selfless love and support. I am grateful for her sacrifices and patience.


Contents

1 Introduction
1.1 Vision of A Future Network
1.1.1 Motivating Application Scenarios
1.2 Research Goals and Challenges
1.3 Proposed Solutions Overview
1.4 Thesis Structure

I Application-Oriented Networking

2 Background and Requirement Analysis
2.1 New Computing Models
2.2 New Applications through Composition
2.3 Emergence of Cloud Computing
2.4 Evolution of Traditional Service Providers
2.5 Introduction of Smart Phones
2.6 Advancements in Content Delivery Networks
2.7 Future Networks Architecture

3 Application-Oriented Networking
3.1 AON Application Plane
3.2 AON Control and Management Planes
3.3 Application-Oriented Routers
3.4 Application-Oriented Routers Use Cases
3.4.1 Telecom Service Providers
3.4.2 Enterprise Networks
3.4.3 Overlay Networks and Content Distribution Networks
3.5 Related Work

II Virtualized Application Networking Infrastructure

4 Virtualized Application Networking Infrastructure
4.1 VANI Design Requirements
4.1.1 VANI Architecture
4.1.2 Current Physical Resources in VANI (VANIv1 Resources)
4.1.3 Example: Requesting a Resource in VANI
4.2 VANI Control and Management Plane (VANI-CMP)
4.2.1 User Management
4.2.2 Authentication Authorization Accounting
4.2.3 Resource Allocation
4.2.4 Generic Resources and Registration
4.3 SOA-Based Implementation of VANI-CMP
4.4 Security in VANI
4.5 Guaranteeing Bandwidth in VANI
4.5.1 Interconnecting VANI Nodes in IP Layer
4.5.2 Interconnecting VANI Nodes in Ethernet Layer
4.5.3 Experimentation with L3 Protocols
4.6 SW-Based Resources in VANI
4.7 Federation with GENI
4.8 A VANI Node
4.9 Performance Evaluations
4.9.1 Reprogrammable Hardware Resource
4.9.2 Processing Service and Network Virtualization
4.10 Experiments & Applications

5 A Distributed Ethernet Traffic Shaping System
5.1 Distributed Ethernet Traffic Shaping (DETS) System
5.1.1 DETS Protocol
5.1.2 DETS for Linux OS
5.2 DETS System Design
5.2.1 Rate Allocator Module
5.2.2 Performance Improvements
5.3 Performance Evaluations
5.4 Modifications to Ethernet Control Plane

III QoS & Admission Control in Service-Oriented Systems

6 Allocating Services to Applications using Markov Decision Processes
6.1 Concurrent Service Executions
6.1.1 Problem Formulation
6.1.2 Markov Decision Process Formulation
6.1.3 Optimal Policy with Different Services
6.1.4 The Optimal Policy and Performance Comparison
6.2 Sequential Service Executions
6.2.1 Problem Formulation
6.2.2 Markov Decision Process Formulation
6.2.3 Optimal Policy and Performance Comparison

7 A Distributed Probabilistic Commitment-Control Algorithm
7.1 QoS Control in a Service-Oriented System
7.2 Probabilistic Modeling of Service Commitment
7.3 Computing Over-Commitment Probability
7.4 Distributed Algorithm for Service Commitment
7.4.1 DASC Complexity Analysis
7.5 DASC Performance Evaluation
7.6 Queue-Enabled Distributed Algorithm for Service Commitment
7.6.1 Problem Formulation and Description
7.6.2 Q-DASC Performance Evaluation
7.7 Related Work

8 Application Admission Control System
8.1 Problem Statement
8.2 Steady-State Based Application Admission Control System
8.3 Online Optimization-Based Application Admission Control System
8.3.1 Feasibility Check
8.3.2 Scenario Generation
8.3.3 Optimal Admission Decisions For Generated Scenarios
8.3.4 Final Decision Making
8.4 Performance Evaluation

9 Conclusions
9.1 Contributions
9.1.1 Application-Oriented Networking
9.1.2 Virtualized Application Networking Infrastructure
9.1.3 Scalable and Distributed QoS and Admission Control
9.1.4 Related Educational Contributions
9.2 Future Work

Appendices

A Queue-Enabled Service Commitment
A.1 Time to Enter Service in a G/G/C/N System
A.2 TES for G/D/C/N System
A.3 TES for G/M/C/N System

B Computing Over-Commitment Probability using Chernoff's Bound

C Derivation of Gk(t) Probability

D Simulation Environment Description

Bibliography

Glossary


List of Tables

4.1 Average maximum FPGA programming time
4.2 UDP and TCP traffic measurements in a VANI node in MBytes per second (MBps)


List of Figures

1.1 Vision of a future network
1.2 Example of a future application: Smart Grids
2.1 Basic Service-Oriented Architecture model (source: http://www.w3.org)
3.1 Three planes in an Application-Oriented Network
3.2 Application Plane Resources
3.3 Multiple Applications in AON
3.4 Application Plane Architecture
3.5 Application-Oriented Network Reference Model
3.6 Overall view of an Application-Oriented Network with multiple AORs and applications
3.7 Telecommunication services in an AON
3.8 Enterprise Service Bus and AON
3.9 Peer-to-Peer network in AON
4.1 VANI design requirements
4.2 VANI architecture
4.3 Researcher interaction with VANI planes
4.4 Virtualizing physical resources in VANI
4.5 A sample interaction between a researcher and VANI to secure a resource
4.6 A sample schema for generic XML content in a getRequest response message
4.7 Connecting VANI nodes in IP layer
4.8 Connecting VANI nodes in Ethernet layer
4.9 Large-scale experimentation with new L3 protocols
4.10 Connecting VANI to GENI
4.11 Reprogrammable Hardware (BEE2 Board)
4.12 Traffic measurement experiment topology
5.1 A system with five nodes and two virtual nodes on each
5.2 TCP rate back-off due to interfering UDP traffic
5.3 DETS measurement and rate control points
5.4 DETS System Internal Modules
5.5 DETS performance evaluations for the system shown in Figure 5.1
5.6 Performance evaluation of rate allocation algorithms: a) RAA-SlowProbe, b) RAA-FastProbe
5.7 Performance evaluation of rate allocation algorithms: a) RAA-FairShare, b) RAA-ForwardExplicit
5.8 DETS in Ethernet control plane
6.1 A system with m different service types and N instances of each type
6.2 A system with three types of services and two classes of applications
6.3 A system with three types of services, two classes of applications and two types of instances for service type 3
6.4 Optimal policy when the system is in state (n1, n2), and α = 1, β = 0.1
6.5 Optimal policy when the system is in state (n1, n2), and α = 1, β = 0.5
6.6 Performance comparison between Complete Sharing, Complete Partitioning and MDP-based partitioning mechanisms
6.7 A system with m different service types and N instances of each type
6.8 A system with three types of services and two classes of applications
6.9 Optimal policy when the system is in state (n11, n12, n22), and γ = 0.1: a) n22 = 1, b) n22 = 4
6.10 Optimal policy when the system is in state (n11, n12, n22), and γ = 0.3: a) n22 = 1, b) n22 = 4
6.11 Performance comparison between No Commitment Policy, Full Commitment Policy and MDP-based partitioning mechanisms (α = −0.1, β = 0.5, γ = 0.1)
6.12 A sample beta distribution
6.13 Performance comparison between No Commitment Policy, Full Commitment Policy and MDP-based partitioning with a beta distribution for service execution time and (α = −0.1, β = 0.5, γ = 0.1)
7.1 A sample service-oriented environment
7.2 Composition Operations
7.3 A service-oriented system with three agents, each controlling one service type
7.4 Distributed Algorithm for Service Commitment in SDL (Specification and Description Language)
7.5 Beta pdf for service execution time with parameters α = 2.333 and β = 4.666
7.6 Application failure ratio for a system with two application classes and two service types
7.7 Comparing DASC throughput with a bottleneck-based admission control algorithm
7.8 A service-oriented environment consisting of twelve service types and three applications
7.9 Application failure ratios in the system
7.10 Failure ratios in services 1 to 6 vs. application request rates
7.11 Comparison between four admission control mechanisms with stationary request arrivals
7.12 Comparison between four admission control mechanisms with on-off bursty request arrivals with burst time (T)
7.13 Application queuing probability with an ample number of queuing spaces using the Q-DASC algorithm
7.14 Application failure probability based on queue size in the Q-DASC algorithm
8.1 A sample service-oriented environment
8.2 Application Admission Control System using Online Optimization
8.3 System reward for four different techniques
8.4 Application 1 and application 2 failure rates based on the application request rate
A.1 Distributions for residual service times in a service with uniform execution time
A.2 Distributions for residual service times in a service with Normal execution time
A.3 Distributions for residual service times in a service with a Beta execution time, α = 2.333, β = 4.666
A.4 TES distribution and calculated bound for a beta distribution with α = 2.333, β = 4.666
B.1 A sample d(s) for a service with 900 instances, and random pi's for 1000 application instances


Chapter 1

Introduction

The present Internet has become an essential infrastructure in modern society in spite of its glaring and serious shortcomings with regard to security, reliability, and performance [1]. A constant in the thirty-year history of the Internet has been continual growth in scale, both in the number of Internet users and in the diversity of Internet applications. This growth has been fueled by the steady improvement in the cost and performance of computing and communications technology.

The Internet is currently entering a new phase of more dramatic and diverse growth, driven by: 1. new computing models in which a new application can be created with the same ease as designing a new web page, through the linking of service building blocks; and 2. new Internet users in the form of communicating devices such as smart wireless phones and tablets, high-definition monitors, smart sensors, alarms and controllers.

The next-generation Internet will be challenged to support a much more diverse and much greater number of applications, as well as a new generation of communicating devices. The goal of this thesis is to study how future networks can support and facilitate the creation, deployment and management of these emerging applications. In particular, we study how service composition techniques in application creation and virtualization of resources can empower future networks in this support.

In the following, we first present our vision of future networks, followed by brief descriptions of some example application scenarios that require a new network architecture. We then outline our research goals and challenges, and briefly preview our solutions. Finally, we describe the thesis structure and research contributions.

1.1 Vision of A Future Network

Figure 1.1 shows our vision of the emerging future networks. In this vision, the network will mainly comprise an optical backbone network, core/metro/access networks, and finally the terminals, datacenters and end users. The backbone optical network will be responsible for transferring massive volumes of data between core network components. The access network will provide very high bandwidth connectivity to network users over various wireless and optical technologies.

Terminals and users in future networks fall into various classes. One major class will be mobile computing nodes that combine advanced features such as high processing power, long battery life and sophisticated user interfaces, including touch screens, speech and image recognition, etc. Moreover, users will be highly mobile, requiring hand-offs between various types of access technologies.

Another class of network end terminals will be smart sensors and/or actuators, such as smart grid sensors, that require a very high level of responsiveness and reliability from the network and will be deployed en masse. The sheer volume of these communicating devices will challenge the scale and cost-points of future networks.

Another major type of network "users" will be massive datacenters that exploit inexpensive commodity computing and storage resources and constitute factories for creating applications that require massive processing, storage and bandwidth at low cost. These datacenters will be connected to the network using very high bandwidth optical networking technologies.

[Figure 1.1: Vision of a future network — an optical backbone connecting core/metro/access networks, enterprises, datacenters (cloud computing) and users]

In combination, these new network users at the edge of future networks, together with the deployment of a very high bandwidth optical core network, will open the door to a vast universe of applications and intelligence serving different purposes.

In this application universe, different classes of applications (e.g., sensors, human communications, machine-to-machine, content distribution, etc.) will have different and sometimes contradictory expectations of the network. These diverging requirements might force the creation of separate networks, unless future networks become capable of serving these applications on a shared infrastructure.

In this thesis, our main research objective is to study future network architectures that can provide customizable support for applications. In the next subsection we briefly examine several motivating application scenarios to illustrate the type of challenges that future networks will face and to motivate our research on future network architectures.

1.1.1 Motivating Application Scenarios

Future networks will support a diverse range of applications over a variety of access technologies. In this section, we discuss four sample application scenarios in two general categories to show the types of challenges future networks will face and the application requirements that need to be addressed.

Smart Infrastructures

Smart infrastructures such as smart utility grids and smart transportation systems are an

important class of future applications. In this class of application, sensors and actuators

will be deployed in massive scale throughout the network. The sensors allow real-time

environmental information to be gathered in a variety of settings and the actuators allow

control actions to be exercised in response to environmental conditions according to

various policies and objectives.

On the other hand, inexpensive computing and storage in massive datacenters allow

introduction of applications that can receive the sensors data, process it, and generate

and forward commands to the actuators. The combination of smart sensors, actuators

and affordable large scale processing and storage will enable introduction of smart in-

frastructures that will revolutionize the way we live. One of the major requirements of

these future smart infrastructures is having a responsive network to reliably transfer the

sensor data and actuator commands between the sensors, actuators and datacenters.

Smart grids are an example of smart infrastructures that deploy sensors and actuators

in homes and housing and industrial complexes as shown in Figure 1.2. The smart

infrastructure in smart grids enables not only improved energy efficiency, but also new

business models for energy pricing and trading as well as energy consumption that is

sensitive to carbon emissions. The sensors in smart grids will generate large volumes


Figure 1.2: Example of a future application: Smart Grids

of data. The generated data needs to be securely transferred through the network to datacenters that are able to process it at massive scale and produce commands to be forwarded to the actuators residing in homes, housing complexes, and manufacturing complexes. The amount of data that smart grids can potentially generate, as well as the levels of reliability and responsiveness needed, will surpass those of existing applications supported by the current Internet.

Another type of smart infrastructure involves smart transportation systems comprising traffic sensors and traffic signals, as well as networked cars and passengers equipped

with wireless devices, GPS and cameras. These smart transportation systems generate

a very large amount of data that needs to be securely passed through the network to the datacenters for processing. The commands generated in these datacenters will not only guide human-driven traffic but may also direct smart, machine-controlled moving elements. This automated control will be essential to addressing future energy challenges and maximizing the use of green and renewable sources of energy.


From these two sample applications, we can see that future networks need to be highly reliable and responsive, and able to handle large numbers of wireless and mobile users at massive scale with high security and accountability.

Content Distribution Networks

Content distribution and human-to-human communications and interactions are driving

the emergence of many new applications that will be empowered by the availability of

high bandwidth in the backbone optical networks, inexpensive computing resources and

smart mobile terminals. These applications need to handle a large number of mobile

and heterogeneous users. The heterogeneity and mobility are driven by the universal acceptance of smart wireless phones, netbooks, and new devices such as the iPad, which use and build on high-bandwidth wireless access technologies.

One example application is high-quality streaming of an event to a large number of users. The end-users in this application use heterogeneous devices and different access technologies, and are mostly mobile. In this application, the distribution and streaming need to be adaptive to each user's available bandwidth, device playback capabilities, and user preferences. In a mobile environment, devices may experience temporary disconnections, so novel caching techniques are required to maximize Quality of Experience. In this class of applications, users need to interact with each other

and produce metadata that can be consumed by other users. The user experience will also be improved by a large volume of metadata (speech-to-text transcripts, etc.) that needs to be automatically generated using powerful computing resources. Also, the distribution model in this class of applications needs to be customized to the application's business-model requirements, the type of content, and the target end-users.

In this class of applications, many novel functionalities are required, including efficient content multicasting, smart caching, content conversion, image rendering and recognition, specialized encryption and decryption, and speech recognition and speech-to-text conversion,


as well as AAA (Authentication/Authorization/Accounting) operations. These functionalities are also required to be highly reliable, robust, and affordable, and sometimes they may be needed only for a short period of time.

Another example of content distribution applications is 3D presence systems [2] that

will enable a group of individuals to interact in 3D across a wide area network. This

application, unlike the previous scenario, might involve a small group of users, how-

ever it requires very high speed and high bandwidth connections and large amounts

of processing. Introduction of 3D technologies, inexpensive computing resources in the

datacenters used for image processing operations, together with very high bandwidth

backbone optical network will enable introduction of these types of content distribution

and streaming applications that will need different types of functionalities than what the

current networks can provide.

Having seen these scenarios, we can expect that future networks will face unprece-

dented challenges from a diverse range of applications. These challenges will push networks to match the advancements in the technologies at their periphery (users, commoditized computing, and access technologies) and within them (optical networks). In this thesis, our goal is to study future network architectures and the types of capabilities that these networks need to provide.

1.2 Research Goals and Challenges

In the previous section, we presented our vision of future networks and we briefly pre-

viewed a few sample future application classes. We also discussed some challenging

requirements of these applications. To enable the introduction of such applications, it is fundamental for the network to address these challenges and more. In this section, we name a few of those challenges and outline the main research questions studied in this thesis. These challenges are grouped into four main categories: configurability


and application-orientation, facilitating application creation, scalable service management and QoS control, and mobility and security.

• Configurability and Application-Orientation: Traditional networks are usu-

ally designed, installed and configured once for all applications that operate over

them. Future networks need to be flexible and configurable to adapt to application requirements. This configurability should exist at different levels, from lower-level link configuration up to application-specific routing functions. For instance, applications should be able to customize the network architecture according to their

distribution model as well as their chosen caching, forwarding, broadcasting and

multicasting approaches for their specific content. Future networks require config-

urable and application-oriented components which enable each application to fulfill

its own set of requirements. We call such a network an Application-Oriented Net-

work. One of the main research goals in this thesis is to develop a framework in

which application-orientation and configurability can be realized in future networks.

• Facilitating Application Creation: Future distributed applications need to be

created, deployed and retired rapidly to adapt to future agile and competitive

business models. Networks can foster this agility by offering common services

used in the full life cycle of an application. Many of these common services will

use computing and communication resources. Future networks should provide a

network of computing and communication resources on which these services and

ultimately applications can operate. To provide such support, virtualization and

composition techniques will be heavily used in future networks.

Virtualization has been introduced as a technique to hide the underlying hardware

resources from the applications. Virtualization will be used at different levels in future networks to facilitate application creation and to fulfill application-orientation

and configurability requirements. Virtualization will be both a solution and a


challenge in future networks, since the success and scope of future networks will depend on advancements in virtualization technologies in domains such as bandwidth virtualization and computing virtualization.

To facilitate application creation, we also need to be able to compose new applica-

tions using these common service components and virtualized resources. Another

major challenge in future networks is to define open and flexible interfaces to these

common services to enable their incorporation in different applications in the face

of heterogeneity.

• Scalable Service Management and QoS Control: Guaranteeing QoS has

been and will be a major challenge for any network. The scope and diversity of the

applications that will operate over future networks depend directly on the level of QoS, responsiveness, and predictability that these networks will provide.

Future networks will offer a much more diverse range of features to applications. Therefore, their management scope will cover managing these new features as well.

To limit the costs associated with management in future networks, scalable and

automated service and resource management solutions will be required. A major

challenge in the introduction of future application-oriented networks is to design and develop such scalable service management systems and their associated algorithms. Another major research goal in this thesis is to study QoS control mechanisms that are able to guarantee desirable network behavior at large scale.

• Mobility and Security: Two of the main challenges in future networks are mo-

bility and security management. Generally, networks with an IP-centric transport stratum have difficulty handling mobility and security issues. With the emergence of a new generation of applications and users, these two challenging issues will continue to play a major role in the success of any network architecture. Although we do not directly address these challenges in this study, one of our main research goals


is to provide a flexible platform for incorporating novel mobility and

security management systems in future applications.

Addressing the complete list of challenges in a dissertation is impossible. Nev-

ertheless, we focus on a subset of these challenges in areas including configurability,

application-orientation, network-facilitated application creation, virtualization and scal-

able QoS management. We also direct the interested reader to other studies in our

research group on autonomous service management [3] and core network management [4]

that have addressed other challenges in future networks.

1.3 Proposed Solutions Overview

In this section, we briefly overview our proposed solutions for some of the studied chal-

lenges. In this study, we propose a new network architecture called Application-Oriented

Network architecture to address the configurability and application-orientation challenges

in future networks. An Application-Oriented Network (AON) is a converged computing and communication network that facilitates the creation of a diverse range of applications through virtualization of resources and a service-oriented application-creation paradigm.

In AON, an application is a high-level distributed function that is composed of several

lower-level service components and is designed either to deliver a service to end-users or to be used in another, higher-level application. One of the main objectives in AON

is to enable application marketplaces in which application providers can find available

components that are designed and developed separately and use them in their applica-

tions. To do so, AON follows the Service-Oriented Architecture (SOA) [5] application

creation paradigm.

In SOA, high-level applications and business processes are created by composing

service components that can be accessed through well-defined and standard interfaces.

Service-Oriented Architecture enables loose coupling and higher interoperability among


service components that are used in creating an application while each can be developed

and deployed independently. Our proposal for AON utilizes this paradigm to facilitate

application creation in future networks.

Virtualization is another major technique that we heavily use to facilitate application

creation and to provide configurability and application-orientation. "Virtualization" corresponds to different technologies in areas such as computer hardware, software, memory, storage, and data [6]. Nevertheless, we refer to a virtualized resource as a resource

that provides the essential capabilities of the real physical resource and is abstracted

from the physical resource. Through virtualization, AON allows application providers to

rapidly deploy and retire an application. Application providers are also able to flexibly

and dynamically configure the virtualized resources to satisfy their applications' requirements.

Our proposed architecture for AON consists of three main planes: control, manage-

ment and application planes. The resources that are required for creating an application

are virtualized and abstracted as service components in the application plane. In the AON application plane, a virtual network of computing and communication resources is allocated to each application, in which that application operates. Application providers secure access to these resources through the open and well-defined interfaces

of the control plane. The AON management plane, on the other hand, is responsible for

managing the resources in the application plane.

Although applications in the application plane can follow any network architecture

that they choose, we propose a generic application plane architecture to address future

application requirements. This generic architecture includes two enriched layers: a service layer and a transport layer that covers content-delivery as well as data-delivery functions.

To validate this architecture, we designed and developed a prototype of this network

called Virtualized Application Networking Infrastructure (VANI) [7, 8] that allows ap-


plication providers and networking researchers to create new distributed applications. In

the realm of network experimentation testbeds, VANI is a major contribution as it en-

ables network researchers and application providers to experiment with new distributed

applications and network architectures. Moreover, VANI allows experimentation with

new layer-three protocols in place of the Internet Protocol (IP). VANI also includes a repro-

grammable hardware resource that allows application providers to perform customized

hardware-based processing in the network.

Another major contribution of this study in the field of network virtualization is the

introduction of a Distributed Ethernet Traffic Shaping system (DETS) [9] that is able to

guarantee the send and receive bandwidth on virtual networks created for applications

in VANI. Although DETS is proposed for VANI, it is also capable of operating in any

virtual machine-based computing cluster or large datacenter such as cloud computing

[10] datacenters to improve network performance and minimize interference between different users' traffic. One of the main advantages of DETS is that, unlike other Ethernet congestion control mechanisms, it does not require any changes to Ethernet equipment and can operate on the host systems. We propose four algorithms for the DETS core module, the rate allocator. We compare the performance of these four algorithms and describe their characteristics through experimentation and measurement.
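The four DETS rate-allocation algorithms themselves are presented in Chapter 5; the objective a rate allocator pursues can, however, be illustrated with a classic max-min fair discipline. The following is a deliberately centralized, hypothetical sketch (the function name and structure are illustrative, not the DETS algorithms): flows with small demands are fully satisfied, and the capacity they leave unused is redistributed equally among the rest.

```python
def max_min_fair(capacity, demands):
    """Allocate link capacity to flows max-min fairly.

    demands: mapping of flow id -> requested rate (same units as capacity).
    Returns a mapping of flow id -> allocated rate.
    """
    alloc = {}
    remaining = capacity
    # Process flows from smallest demand to largest: a flow whose demand
    # is below the current fair share is fully satisfied, and its unused
    # share is redistributed among the remaining flows.
    pending = sorted(demands.items(), key=lambda kv: kv[1])
    while pending:
        fair_share = remaining / len(pending)
        fid, demand = pending[0]
        if demand <= fair_share:
            alloc[fid] = demand
            remaining -= demand
            pending.pop(0)
        else:
            # Every remaining demand exceeds the fair share:
            # split what is left equally among the remaining flows.
            for fid, _ in pending:
                alloc[fid] = fair_share
            pending = []
    return alloc
```

For example, `max_min_fair(10, {"a": 2, "b": 5, "c": 8})` satisfies flow `a` in full and splits the remaining 8 units equally between `b` and `c`. A distributed system such as DETS must reach decisions of this kind without a central allocator, across the hosts sharing the Ethernet segment.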

One of the main challenges in a service-oriented system such as VANI is to guarantee

an agreed level of QoS for the composite applications. To address this issue, we inves-

tigate large scale QoS and admission control mechanisms and propose new algorithms

to guarantee the QoS for applications created through service composition in service-

oriented systems. Specifically, we focus on application admission and QoS control to

guarantee successful application completion in service-oriented systems where a set of

service components with a limited number of instances are shared between different ap-

plications. The goal is to allocate these limited resources in a way that the system revenue

is maximized and an agreed level of application QoS is met. In this problem, there are


several defining parameters that affect the possible solutions, including service execution-time distributions, application request arrival processes, and scalability concerns.

We first formulate this problem using Markov Decision Processes (MDP) [11, 12]

for small-scale systems that have exponentially distributed service execution times and are subject to stationary Poisson request arrival processes. Next, we introduce the

problem of QoS control in these environments [13] and we propose distributed heuristics

to guarantee the probability of successful completion of an admitted application instance

[13, 14]. The proposed algorithm is called Distributed Algorithm for Service Commitment

(DASC) [15]. DASC is able to operate with both stationary and non-stationary request

arrival processes and covers both queue-less and queue-enabled services. Moreover, it does not require the service execution-time distributions to be exponential. DASC uses a

probabilistic model to predict future resource usage in the system and makes admission

decisions based on the current and the projected state of the system.
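The flavor of such a decision rule can be conveyed with a deliberately simplified sketch (this is an illustration, not the DASC model itself: we assume each component's instances are independently busy with a single predicted probability, whereas DASC derives its predictions from the actual execution-time distributions and the current system state):

```python
def p_success(components):
    """Predicted probability that a new application instance completes.

    components: list of (n_instances, p_instance_busy) pairs, one per
    service component the application composes.  A step succeeds iff at
    least one of the component's n instances is free when the application
    reaches it; assuming independence across instances and components,
    the overall success probability is the product over components.
    """
    p = 1.0
    for n_instances, p_busy in components:
        p *= 1.0 - p_busy ** n_instances  # at least one instance free
    return p

def admit(components, threshold=0.95):
    """Admit the request only if the predicted completion probability
    meets the agreed QoS threshold."""
    return p_success(components) >= threshold
```

For instance, an application composed of one two-instance component and one three-instance component, each instance busy with probability 0.5, would be predicted to complete with probability 0.75 x 0.875, and admission would depend on whether that clears the agreed threshold.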

We present alternative steady-state based admission control approaches that can guar-

antee the probability of successful completion in the steady-state. Through simulations

and performance comparisons we show that DASC is able to operate with both station-

ary and non-stationary request arrivals, while the steady-state based approaches can only

operate with stationary request arrivals.

We also propose an application admission control system to both guarantee an agreed

level of QoS using DASC and maximize the system revenue by admitting more valuable

application classes to the system [16]. For the application admission control system, we

investigate the steady-state based solutions as well as online combinatorial optimization

approaches to maximize system revenue.


1.4 Thesis Structure

The thesis is composed of three parts that correspond to the major contributions of this

study:

• Part I: Application-Oriented Networking: In this part, we present the AON

architecture. We start, in Chapter 2, by analyzing the background and require-

ments for AON and studying the major trends in computer and communication

technologies [17]. The AON discussion is followed by a description of the AON's main planes and layers, as well as their responsibilities and functionalities,

in Chapter 3. We also present the main component of an AON network, the Application-Oriented Router (AOR), which utilizes hardware and software components to support content processing and delivery. We present several sample use cases in which AORs can add value to different application scenarios.

• Part II: Virtualized Application Networking Infrastructure: In Chapter 4,

we describe the VANI architecture and its main resources, and we show how a researcher or an application provider can interact with VANI. We also discuss the different functionalities of the VANI control and management planes, and describe how new resources can be created and registered in VANI. Performance measurements on VANI's reprogrammable resources and internal fabric are presented as well.

In Chapter 5, we present the DETS system that is able to guarantee the send and

receive bandwidth on virtual networks created for applications in VANI. In this

chapter, we also present DETS's main modules and their corresponding algorithms, as well as measurements and performance evaluations.

• Part III: QoS & Admission Control in Service-Oriented Systems: In this

final part, we investigate the problem of service allocation in service-oriented sys-

tems. We first formulate this problem using Markov Decision Processes in Chapter


6. Next, we introduce the problem of QoS control in these environments and we de-

scribe the DASC system in Chapter 7 and present the probabilistic and predictive

model used in DASC for both queue-less and queue-enabled systems. In Chapter

8, we propose an application admission control system in service-oriented systems.

Performance evaluations of each of the proposed algorithms appear in the related chapters. Finally, in Chapter 9, we offer concluding remarks and

discuss our contributions as well as our future work.


Part I

Application-Oriented Networking



Chapter 2

Background and Requirement Analysis

In this chapter, we focus on the challenges to network architecture presented by new ap-

plications. We examine how the well-known trends in the commoditization of hardware,

software, and communications technology have enabled new distributed computing mod-

els. New applications based on these models have been very disruptive because of the

clear advantages that they have over traditional ones. We examine the features of these

new applications that have made them successful and identify the potential additional

benefits that may result from their associated computing models. We discuss how these

new models are leading to a new service or application provider infrastructure in which

computing and communications technologies converge in new ways.

2.1 New Computing Models

Relentless technology advance, captured by the rubric of "Moore's Law", has been a

steady driver for change in networking equipment, devices, services, and applications.

Improvements in computation power and cost have facilitated the execution of more

complex software, which in turn has stimulated more demand for improved hardware.



This virtuous cycle has taken a dramatic turn in the last few years as the basic enabling

computing, communications, and software technologies have become commoditized. New

distributed computing models have appeared that are fundamentally disrupting tradi-

tional models for offering services and applications by leveraging commodity resources

and introducing new business models.

Peer-to-peer applications are a prime example of these disruptive trends [18]. Com-

modity computing, communications, and software have also enabled new applications

that attain entirely new levels of scale, with Google search the preeminent example.

Peer-to-peer applications and Google search represent extremes of distributed comput-

ing in terms of ownership and control of resources, but they also share the advantages

inherent in distributed computing. Both examples can achieve huge levels of scale. Their

design provides for the delivery of the application through systems that are loosely cou-

pled so that faults can be addressed through simple mechanisms that exploit inexpensive

redundancy. Both designs incorporate self-organizing mechanisms to address the man-

agement of a huge aggregation of resources. Self-organizing mechanisms are also used

to ensure connectivity and basic levels of performance. It is clear that these new infrastructures, built atop shared and/or commodity resources, can achieve very large scale while having the potential to provide higher reliability, performance, and much lower operating costs.

2.2 New Applications through Composition

We have seen that the Internet has become the platform to support new applications

and that the associated infrastructure is becoming more decentralized. Moreover, the

approaches to creating new applications are also changing in a direction where innovation

becomes more decentralized. In this environment, new applications are created through

composition of service components, both in the form of "mashups", as well as in a more


rigorous form by using a Service Oriented Architecture [19, 20, 5].

The success of Web protocols and standards in enabling the deployment of a massive

system through the uncoordinated efforts of a global community provides support for

efforts to create applications through the linking of interoperable, loosely coupled software

components that are accessed through Internet protocols. New application providers

such as Google and Yahoo now offer access to components that provide services such as

search, map, chat, and photo sharing, to other application developers through Application

Programming Interfaces (APIs). The term "mashup" is used to denote web applications

where several sources are used to create a new service [21]. For example, Google Maps has provided the basis for a huge number of mapping mashups. The importance of the

mashup phenomenon is that it marks the emergence of a new mode of application creation

where applications are created through a distributed and collaborative process and where

the application at any given point in time is the cumulative result of a community effort.

The term Web 2.0 refers to this network-centric platform [22].

The emergence of Service Oriented Architecture (SOA) standards represents another

related major trend for the delivery of applications [19, 20]. The key architectural con-

cept in SOA is the service orientation that enables the rapid and easy composition and

management of large-scale distributed services in the face of component autonomy and

heterogeneity. Service composition concepts are not new but have increasingly come into

the spotlight in recent years with the emergence of new technologies such as XML

and Web Services (WS).

The SOA model follows a three-step cycle: register, find, and invoke, as shown in Figure 2.1. The Web Services set of specifications [23] is an instantiation of this model. Web Services specifications provide uniform interfaces to loosely coupled software components.

These specifications provide a messaging framework for the transfer of information, an

XML-based [24] grammar for defining web services [23], and the means for locating web

services. As in mashups, a key concept of interest to us in Web Services is the ability


Figure 2.1: Basic Service-Oriented Architecture model (source: http://www.w3.org)

to create new applications through the composition of service components that can be

accessed through standard Web Service interfaces. However, SOA goes further through

the development of the Business Process Execution Language (BPEL) [25], which al-

lows business processes to be implemented as workflows involving multiple web services.

From a network architecture perspective, SOA shifts the focus to an overlay network of

computing resources where messages are exchanged according to content and in doing so

opens the way to application-oriented networking.
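The register/find/invoke cycle of Figure 2.1 can be sketched in miniature (a toy in-memory registry in Python; the class and method names are illustrative, not part of the Web Services specifications):

```python
class ServiceRegistry:
    """Toy broker illustrating the SOA register/find/invoke cycle."""

    def __init__(self):
        self._services = {}  # interface name -> callable endpoint

    def register(self, interface, endpoint):
        """Provider publishes a service under a well-defined interface."""
        self._services[interface] = endpoint

    def find(self, interface):
        """Requester discovers a provider for the interface it needs."""
        return self._services[interface]


registry = ServiceRegistry()
registry.register("speech-to-text", lambda audio: f"transcript({audio})")

# The requester binds to the interface alone; it never needs to know
# how or where the provider is implemented (loose coupling).
stt = registry.find("speech-to-text")
result = stt("frame-001")
```

The point of the sketch is the decoupling: the requester's code depends only on the published interface name, so the provider can be replaced or relocated without touching the composed application — the property SOA relies on for composition in the face of autonomy and heterogeneity.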

The emergence of SOA as a new paradigm for service provisioning highlights the

importance of service orientation in the future application-oriented networks. SOA-based

loosely coupled systems are giving enterprises greater agility when it comes to adjusting the structure of their businesses to meet changing business requirements. This model of flexible and decentralized application creation has enabled the introduction of many new service and application providers, and future networks need to adopt this application-creation paradigm to facilitate agility in application creation.


2.3 Emergence of Cloud Computing

Recently, the cloud computing model has emerged as a platform for the deployment of applications and services. This model relies on very large-scale datacenters attached to the

Internet cloud [10, 26]. These datacenters heavily utilize virtualization techniques on

top of commodity hardware and software components. The resources in the cloud are

consequently inexpensive and affordable for many application providers that find it more economically viable to use these resources than to invest in in-house deployments.

Cloud computing resources are primarily based on the virtualization of two pillar resources: computing and storage. Application providers can use these virtualized resources to store and process data. Moreover, they can dynamically acquire or release resources based on the current or anticipated load on their applications, using programmable and open WS interfaces as in Amazon Elastic Compute Cloud (EC2) [27], or open-source systems based on that interface, as in the Eucalyptus project [28].
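This elastic acquire/release pattern can be sketched in a few lines of Python. The names here (CloudClient, acquire, release, rescale) are hypothetical stand-ins, not the actual EC2 or Eucalyptus API; the sketch only illustrates how a provider might resize its pool of virtual machines to match anticipated load through a programmable interface.

```python
# Hypothetical sketch of elastic resource management over a WS-style
# cloud interface. CloudClient and its methods are invented names.

class CloudClient:
    """Stand-in for a WS client to a cloud provider's control interface."""
    def __init__(self):
        self.instances = []

    def acquire(self, n):
        # In a real system this would be a web-service call to launch VMs.
        new = [f"vm-{len(self.instances) + i}" for i in range(n)]
        self.instances.extend(new)
        return new

    def release(self, n):
        # In a real system this would terminate VMs via the same interface.
        removed, self.instances = self.instances[-n:], self.instances[:-n]
        return removed

def rescale(client, current_load, capacity_per_vm):
    """Grow or shrink the VM pool so capacity matches the anticipated load."""
    needed = -(-current_load // capacity_per_vm)  # ceiling division
    delta = needed - len(client.instances)
    if delta > 0:
        client.acquire(delta)
    elif delta < 0:
        client.release(-delta)
    return len(client.instances)

client = CloudClient()
print(rescale(client, current_load=950, capacity_per_vm=100))  # 10
print(rescale(client, current_load=240, capacity_per_vm=100))  # 3
```

The point of the sketch is the control loop, not the arithmetic: the provider observes (or predicts) load and drives the pool size through an open programmatic interface rather than through manual provisioning.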

Cloud computing datacenters are connected to the network with very high speed optical connections. Together with massive processing power and storage, these facilitate the introduction of applications that require processing large volumes of data. Another advantage of the cloud computing model is that it has significantly shortened the deployment phase of the application lifecycle and made the creation of new applications considerably faster than before. It also enables applications to dynamically adapt to changes in load by increasing or decreasing the amount of resources they use. Nevertheless, improvements are still needed in network virtualization, since it has been shown [9, 29, 30] that even traffic internal to cloud datacenters does not enjoy guaranteed performance, and there is considerable interference between different users' traffic.

The introduction and success of the cloud computing model is another indicator that future applications are moving toward platforms that enable rapid application creation through the composition of basic service components over shared platforms. It is also a prime example of how commodity resources and virtualization techniques can facilitate the creation of low-cost and short-lived applications.

2.4 Evolution of Traditional Service Providers

The infrastructure of traditional service providers (i.e., telephone companies) is also changing towards one based on the Internet Protocol, under pressure from technology advances and new application providers. While suffering from some clear disadvantages relative to new application providers, traditional service providers remain superior in terms of mobility as well as reliability, security, and well-established business models.

The infrastructure of the traditional service provider is undergoing a fundamental transition to a multi-service packet-switching architecture based on the Internet Protocol (IP). This transition includes the introduction of a new control plane based on the Session Initiation Protocol (SIP) [31] that enables the replacement of existing services such as voice, and the introduction of services such as instant messaging and presence. This IP-based network architecture is articulated by the Next Generation Networks Focus Group at the International Telecommunication Union [32].

According to the ITU-T definition, a Next Generation Network (NGN) [32, 33] is a packet-based network able to provide services, including telecommunication services, able to make use of multiple broadband, QoS-enabled transport technologies, and in which service-related functions are independent of the underlying transport-related technologies. The NGN architecture allows decoupling of the network's transport and service layers. This means that whenever a provider wants to enable a new service, it can do so by defining it directly at the service layer without considering the transport layer; i.e., services are independent of transport details.

IP Multimedia Subsystem (IMS) [34] is an effort by telecom-oriented standards bodies to realize the NGN concepts and extend the new control plane to any access network; it presents a natural evolution from the traditional closed signaling system to the NGN service control system. IMS was developed for controlling access to services by customers of third-generation wireless access networks. In the IMS approach, servers in the user's home service provider's network control access to all services. Consequently, the service provider can determine what services are delivered, at what quality level, and at what cost.

IPSphere and its Service Signaling Stratum (SSS) [35] are another telecom-industry effort to enable end-to-end services across multiple service providers. An interesting aspect of the SSS is that web services are used in its implementations: operators can publish the services they are willing to provide, and other operators can use web services to negotiate and secure the resources needed to enable end-to-end services.

The development of SSS presents interesting possibilities for an environment where traditional and emerging providers work together in the delivery of services and applications. Further development of systems such as SSS can provide the means for these players to interact dynamically in a distributed fashion and, in doing so, create an open market for applications.

While initial implementations of IMS are based on traditional client/server architec-

tures, it is clear that the emerging distributed models associated with new disruptive

applications are applicable. We can therefore anticipate that future application and ser-

vice provider infrastructures will be based on similar, if not identical, infrastructures

that converge computing and communications to accommodate the emerging paradigms

for application creation and delivery. In addition, the main principle of the NGN and IMS architectures, the independence of service-related functionality from transport-related functionality, will shape future application-oriented networks. The main advantage of this separation model is the emergence of numerous new service and application providers that utilize telecommunication infrastructure to deliver their services to users.

2.5 Introduction of Smart Phones

The introduction of smart phones and associated applications has been another major trend in the past few years, marking the transition to mobile computers as the default user devices. This success is mainly due to advances in powerful low-power processors, multi-touch interactive displays, and high-bandwidth 3G and 3G+ wireless networks [36], as well as the upcoming fourth generation (Long Term Evolution, 4G) of wireless access networks [37]. The introduction of these devices has triggered an explosion in the number and diversity of Internet-based applications. This success has also stimulated the introduction of other novel devices, e.g. the iPad, which in turn will generate another wave of applications.

Applications on smart phones benefit from enhancements in other services such as location-based services and Instant Messaging (IM) services. They are also able to utilize cloud-based computing for processing-intensive tasks such as speech recognition. Another interesting trend in smart phone applications is the popularity and universal acceptance of application marketplaces, such as the Apple App Store [38] and the open Android application marketplace [39], in which users and application providers interact and the required transactions are managed.

In combination with other trends and technologies such as cloud computing and SOA-based technologies, the success of these devices exemplifies how the Internet, and the infrastructure that has emerged around it, have truly become the platform for the delivery of an unlimited number of applications. It also indicates that wireless, highly mobile, high-bandwidth terminals will constitute the majority of users in future networks, in contrast to the traditional view of network users as mainly fixed and wired.


2.6 Advancements in Content Delivery Networks

During the last decade, content delivery networks have seen major advancements and technological breakthroughs. A Content Distribution (or Delivery) Network (CDN) is an overlay network upon which content (e.g., video) is distributed and delivered to end users. In a CDN, content is usually copied to multiple servers across a wide area, and users connect to one of these servers to receive a copy of the content rather than contacting a central server. Akamai [40], a pioneer in this field, is a major content delivery network in which content producers push their content to Akamai edge servers and users receive the content from these servers. The major shortcoming of this model is that different content delivery networks are not usually interoperable, and users' interactivity with the provided content is limited.
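The replicate-and-redirect idea described above can be sketched in a few lines. The server names, latency figures, and helper functions (pick_edge_server, fetch) below are invented for illustration; they are not part of any real CDN's interface.

```python
# Illustrative CDN sketch: users are directed to a nearby edge replica,
# and an edge cache pulls content from the origin only on a miss.
# All names and numbers are hypothetical.

def pick_edge_server(measured_rtts):
    """Direct the user to the edge server with the lowest round-trip time."""
    return min(measured_rtts, key=measured_rtts.get)

def fetch(content_id, edge_cache, origin):
    """Serve from the edge cache, pulling from the origin on a miss."""
    if content_id not in edge_cache:
        edge_cache[content_id] = origin[content_id]  # fill the replica
    return edge_cache[content_id]

rtts_ms = {"edge-toronto": 12, "edge-frankfurt": 95, "edge-tokyo": 180}
print(pick_edge_server(rtts_ms))  # edge-toronto

edge_cache, origin = {}, {"video-1": "<video bytes>"}
fetch("video-1", edge_cache, origin)  # first request populates the edge
print("video-1" in edge_cache)        # True
```

The design point is that subsequent requests for the same content are served entirely from the nearby replica, shortening paths and offloading the origin, which is the core economic argument for CDNs.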

Another prime example of content delivery networks is peer-to-peer file-sharing networks such as BitTorrent [41], which consume a large portion of global Internet traffic with minimal centralized management and have proved how innovative, distributed, self-managing systems can operate effectively over the commodity, shared resources of ordinary Internet users.

Publish/subscribe systems are another important class of content delivery networks [42]. Publish/subscribe systems use an asynchronous messaging paradigm to link publishers and subscribers of event information. One of the main protocols for pub/sub systems is the Extensible Messaging and Presence Protocol (XMPP) [43], an XML-based protocol first developed for instant messaging services that is now becoming one of the main candidates for asynchronous message delivery.

A big advantage of the publish/subscribe paradigm is that publishers are loosely coupled to subscribers. Publishers need not know of the existence of specific subscribers, and they can remain ignorant of the system topology. Publish/subscribe provides the opportunity for better scalability than traditional client-server paradigms, through parallel operation, message caching, and tree-based routing. This scalability, however, requires hardware-based message processing and rule matching.
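The loose coupling described above can be sketched with a minimal in-process broker. This is illustrative only, not XMPP or any real pub/sub system (which add federation, persistence, and access control); the names are invented.

```python
# Minimal publish/subscribe sketch: publishers address a topic, never a
# subscriber, so neither side needs to know the system topology.
from collections import defaultdict

class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher only names the topic; the broker fans the message
        # out to whichever subscribers happen to exist at that moment.
        for callback in self.subscribers[topic]:
            callback(message)

broker = Broker()
received = []
broker.subscribe("news/sports", received.append)
broker.publish("news/sports", "final score: 3-1")
broker.publish("news/politics", "no subscribers: silently dropped")
print(received)  # ['final score: 3-1']
```

Because the broker mediates all delivery, it is also the natural place to add the caching and tree-based routing mentioned above, and the rule-matching loop in publish is exactly the hot path that hardware acceleration would target.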

Multimedia streaming applications are another emerging class of content delivery applications, and adaptive streaming [44] is the latest trend in this class. In adaptive streaming, the streamed content format, and consequently the required bandwidth, is adapted to the end-user device capabilities and the available bandwidth; the HTTP protocol is widely used for this purpose. This class of application will face another major challenge with the emergence of 3D streaming applications [2]. Moreover, with high bandwidth availability and the growing demand for short-delay, high-quality video, streaming uncoded, raw high-definition multimedia content will become more attractive, especially since content must be converted to many formats and played on heterogeneous devices.
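The rendition-selection logic at the heart of adaptive streaming can be sketched as follows. The bitrate ladder and the safety margin are illustrative assumptions, not values taken from [44] or any particular player.

```python
# Adaptive streaming sketch: for each segment, pick the highest-bitrate
# rendition the measured bandwidth can sustain. Ladder values are invented.

RENDITIONS_KBPS = [235, 750, 1750, 4300, 8000]  # available encodings

def choose_bitrate(measured_kbps, renditions=RENDITIONS_KBPS, safety=0.8):
    """Highest rendition not exceeding a safety fraction of throughput;
    fall back to the lowest rendition on very poor links."""
    budget = measured_kbps * safety
    fitting = [r for r in renditions if r <= budget]
    return max(fitting) if fitting else min(renditions)

print(choose_bitrate(5000))   # budget 4000 -> 1750
print(choose_bitrate(12000))  # budget 9600 -> 8000
print(choose_bitrate(100))    # below the ladder -> 235
```

Real players refine this with buffer occupancy and smoothing, but the principle is the same: the client, not the server, adapts the format to the device and the available bandwidth, segment by segment over plain HTTP.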

According to a recent Cisco Visual Networking Index report [45], video traffic currently accounts for more than a third of Internet traffic, and another third is associated with peer-to-peer file-sharing networks. It is expected that by the end of 2014 all video traffic (P2P, TV, on-demand, and Internet) will account for more than 91 percent of global consumer traffic, and that 57 percent of consumer Internet traffic in 2014 will be Internet video, mainly due to expected advancements in HDTV and 3D video.

Considering these statistics and other major content delivery services such as YouTube [46], Hulu [47], and IPTV [48], we conclude that content delivery will continue to be among the most bandwidth-, processing- and storage-intensive applications on future networks. Therefore, any future network architecture has to offer solid solutions for efficient content delivery, including smart caching, forwarding, broadcasting, and multicasting of live as well as on-demand content (and associated metadata) to a large number of heterogeneous (and mostly mobile) devices.


2.7 Future Networks Architecture

Although the Internet has become the essential infrastructure of modern society and has enjoyed enormous success in delivering a myriad of services, it suffers from major shortcomings in several areas, such as security flaws, mobility support, QoS guarantees, traffic interference and isolation, and addressing and forwarding (multicasting/broadcasting). With the introduction of new systems and applications on the Internet, these problems will become more significant and will affect network performance more than before.

Although many patch solutions have been proposed for these problems (e.g., firewalls, proxies, NAT, TCP-friendly protocols), it is widely accepted that the current Internet has reached its limits and suffers from ossification [1, 49], and research into new architectures is needed to address the challenges that future networks will face.

There have also been several proposals for new network architectures and protocols [50, 51, 52]. Among them, the authors of [53] propose a new network architecture based on the pub/sub model, and Palo Alto Research Center (PARC) researchers propose content-centric networking [54], in contrast to the location-centric view of the network: in a content-centric network, a packet address points to content rather than to a location.

Our work in this thesis falls into this body of research; we study the problem from the applications' point of view. Our goal is to create an environment in which networks can participate more actively in the full application lifecycle, including application creation, deployment and retirement. In the next chapter, we describe our view on future network architectures and present a new architecture called the Application-Oriented Network (AON). To address the challenges imposed by future applications, AON is designed as a converged computing and communication network. To arrive at the AON architecture, we considered the major trends and architectures discussed in this chapter, such as the Next Generation Networks architecture, Service-Oriented Architecture, Content Delivery Networks and mobile networks.

One of the major obstacles in introducing new network architectures was, and still is, experimentation with proposed architectures in a large-scale environment, possibly with massive numbers of end users. To address this problem, there have been several initiatives to build large-scale testbeds for networking research. Examples of these initiatives are GENI [55, 56], PlanetLab [57, 58], ProtoGENI (Emulab) [59, 60], and ORCA [61] in the United States, FEDERICA in Europe [62], G-Lab in Germany, and i2CAT in Spain. In the second part of this thesis we present the Virtualized Application and Networking Infrastructure (VANI), which is designed and developed based on AON principles and enables experimentation with new network architectures, protocols, and distributed applications for future networks.


Chapter 3

Application-Oriented Networking

In the previous chapter, we analyzed several trends in computer and communication

networking and in particular we discussed how commodity hardware and software led to

new paradigms in application creation and to the introduction of numerous applications

on the Internet platform. In this chapter, we consider the role that future networks can

play to further advance the creation of new applications and services. We introduce

an Application-Oriented Network (AON) as a converged computing and communications

network that provides flexible and dynamic support to application providers for delivering

diversified compositional services. AON support is provided through enriched transport

and service strata and a service-oriented approach to utilization of virtualized shared

resources.

AON is a converged network; in AON we eliminate the separation of computing

and communication technologies and combine them in a new approach. In particular, a

collection of networking and computing resources can be secured through AON to create

a distributed application. Unlike many other networks that deliver their services to the

end-users, the AON users are application providers. Application providers, on the other

hand, deal with the end-users of their applications. Applications in AON are created

through composition of service components and virtualized resources and can span a diversified range of applications such as telecommunication services, enterprise services and content delivery networks. In AON, multiple applications are able to coexist and have on-demand access to network resources, and each can flexibly configure and manage these resources according to its own requirements. Applications might also have short life cycles and can be easily deployed, grown or shrunk in scale, and finally retired.

[Figure: management plane (manages the AON and the resources in the application plane); control plane (allocates resources in the application plane); application plane (virtualized resources and service components)]

Figure 3.1: Three planes in an Application-Oriented Network

In the rest of this chapter, we present a reference model for an Application-Oriented

Network. The reference model is designed to describe how the AON goals can be fulfilled

and also to describe the framework of collaboration and interaction between the main

players in an AON, namely service providers and application providers.

The AON reference model has three main planes: management plane, control plane

and AON user plane, also called AON application plane (Figure 3.1). As we explained

earlier, AON users are application providers. Application providers can deploy applica-

tions in the application plane using the resources instantiated in this plane. The control

plane, on the other hand, is used by the application providers to secure access to the

resources and service components in the application plane. The management plane is

responsible for managing these resources as well as the Application-Oriented Network. In

the next section, we describe the AON application plane's characteristics and architecture.


[Figure: application plane resources — networking resources, processing, storage, and facilitating service components (SC)]

Figure 3.2: Application Plane Resources

3.1 AON Application Plane

The AON application plane is composed of the virtualized resources and service components required for creating an application. These resources can be communication resources (e.g., virtual links) and computing resources such as virtual processing, reprogrammable hardware, and storage, as well as any hardware-based or software-based service components needed for creating applications. These service components can include, for example, database services, orchestration services, and content conversion

services (Figure 3.2). Other examples are general service components and software-as-a-service components such as application-specific authentication, authorization and accounting, as well as security-related services such as encryption/decryption services.

In the AON application plane, all resources are virtualized and represented by one or more service components. Virtualization, as we explained earlier, is a technique used to instantiate a virtual resource that provides the essential capabilities of the real physical resource. Virtual resources are abstracted from the physical resource and can be shared among many users without interference. For instance, a virtual computing resource is a processing resource that might share a physical processing resource with other virtual resources.

[Figure: the AON application plane hosting three applications (App1, App2, App3), each composed of service components (SC); AON control and AON management shown alongside. Virtual resources are assigned to each application by the AON control plane and managed by the AON management plane.]

Figure 3.3: Multiple Applications in AON

The virtualized resources and service components expose their functionalities through

well-defined open interfaces, such as Web Services, that are platform-independent and can

be invoked from heterogeneous environments. Application providers are able to program,

configure and compose these resources using SOA technologies according to their own

requirements to create a more complex application or a service that can be used by other

applications. For instance, an orchestrator service can be built on top of processing and

storage services, or as another example, a content conversion service can be built using

a virtualized reprogrammable hardware resource.
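As an illustration of this composition style, the following sketch builds a content conversion service from a processing component and a storage component, each exposing a small uniform interface. All class and method names here are hypothetical; this is not an actual AON or Web Services API, only a picture of the composition pattern.

```python
# Hypothetical sketch of SOA-style composition: virtualized resources are
# wrapped as service components, and a provider composes them into a
# higher-level service. Names are invented for illustration.

class StorageSC:
    """Stand-in for a virtualized storage service component."""
    def __init__(self):
        self._store = {}
    def put(self, key, data):
        self._store[key] = data
    def get(self, key):
        return self._store[key]

class ProcessingSC:
    """Stand-in for a processing component (e.g., on reprogrammable hardware)."""
    def transform(self, data, target_format):
        # placeholder for real content conversion
        return f"{data}@{target_format}"

class ContentConversionService:
    """Composite service built from processing and storage components."""
    def __init__(self, processing, storage):
        self.processing = processing
        self.storage = storage
    def convert(self, key, target_format):
        converted = self.processing.transform(self.storage.get(key), target_format)
        self.storage.put(f"{key}.{target_format}", converted)
        return converted

storage = StorageSC()
storage.put("clip1", "raw-video")
service = ContentConversionService(ProcessingSC(), storage)
print(service.convert("clip1", "h264"))  # raw-video@h264
```

The composite service itself exposes a single method and could in turn be composed into a still larger application, which is the recursive composition property the text attributes to SOA-based application creation.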

The virtualized resources and service components are assigned to each application upon the application provider's request. These resources are secured for each application provider through the AON control plane. The AON management plane performs management-related tasks, such as monitoring and fault management, on these virtualized resources. In other words, the AON control and management planes cooperatively create a resource pool in which each application can operate. Consequently, multiple applications are able to coexist in an AON, with each owning one of the created resource pools (Figure 3.3).

Applications in the application plane can follow any layered network architecture that satisfies their requirements. However, to address the application requirements discussed in the previous chapter, we propose a generic architecture for the application plane. In proposing this generic architecture, we study the functionalities as well as the resources that need to be embedded in the application plane.

To arrive at the application plane architecture, we considered major trends in the

computing and communication fields (described in the previous chapter), especially Next

Generation Networks architecture [32], Service-Oriented Architecture [5] as well as Con-

tent Distribution Networks. The AON application plane architecture has the main char-

acteristics of the NGN and SOA architectures: the separation of services from transport

and a service-oriented design for the service layer.

Traditional transport layers are mostly designed to perform pure digital data delivery between two geographically separated points. As new applications emerge, however, the need to perform content-delivery functions in a network, in addition to data delivery, becomes more significant, as content delivery becomes the default and dominant communication transfer mode in future networks. For this reason, the transport layer in the AON application plane incorporates content-delivery related functionalities to accommodate content distribution applications.

In comparison to the Next Generation Networks (NGN) reference model, the AON reference model can be seen as a new interpretation of NGN principles. In AON, the key NGN principle, the separation of the service layer from the transport layer, is adapted so that the "abstraction level" of the delivery concept in the transport layer changes from raw digital data delivery to the more advanced content delivery. The AON reference model also acknowledges the key principle of SOA, namely service orientation in the service layer. This application architecture therefore appears as an evolution of the NGN reference model and SOA, providing a platform for achieving the benefits of both in a converged network.

Figure 3.4 shows the AON application plane architecture, which includes the two main strata and the internal planes of an application. As can be seen, within the AON application plane each application can have its own user plane, control plane and management plane operating on its allocated resource pool.

[Figure: a service stratum and a transport stratum, each spanned by an application's user plane, control plane and management plane]

Figure 3.4: Application Plane Architecture

Service Stratum

The service stratum in the AON application plane embraces the functions that facilitate the development and deployment of services and applications, as well as the service modules and applications themselves. Based on service-orientation concepts, the functionalities inside the service stratum are those that enable the rapid development and deployment of services, including search, location, identity, instant messaging, and application-specific authentication, authorization and accounting. Other example components in this layer are modules responsible for orchestrating services and creating new complex services and applications.

As we stated before, the cornerstones of this layer are the service-orientation concepts and the provision of facilities for creating new applications. In-network service layer components can also include third-party services usable in service-oriented application creation, among them alternative accounting services, orchestration engines, inter-networking services, instant messaging services, and localization services. The AON control and management planes provide the functionality necessary for interactions between application providers and generic service providers; we discuss this functionality in more detail in the control plane description.

Transport Stratum

The main differentiating characteristic of the transport layer in future application-oriented networks, compared to conventional networks, is the inclusion of content-delivery tasks in addition to the pure data-delivery tasks. As we described in the previous chapters, the majority of global Internet traffic is currently consumed by content-delivery and file-sharing applications, and these types of applications are becoming dominant, while traditional one-to-one human communication is becoming a special-case scenario. Therefore, we propose the inclusion of content delivery to accommodate applications in this class and to fulfill the requirements discussed in the previous chapters, such as efficient and smart content distribution, caching, conversion, encryption, and decryption. Moreover, this change enables improved and efficient handling of mobility and security challenges in future networks.

The inclusion of content-delivery tasks in the transport layer implies the inclusion of major content-delivery related resources in this layer, in addition to traditional networking resources. The most important of these resources are processing and storage, the most basic needs of future applications at the transport level. In the rest of this subsection, we elaborate on the requirements for, and the advantages of, including these resources in the transport stratum.

Processing

In-network processing resources in AON can be used in many content-delivery related

functionalities such as content conversion, compression and decompression, encryption

and decryption, content validation, content-based routing and content transformation.
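Content-based routing, for example, forwards a message according to predicates over its content rather than a destination address. A minimal sketch follows; the rules, message fields, and node names are invented for illustration, and a hardware implementation would accelerate exactly this match loop.

```python
# Illustrative content-based routing: a node evaluates predicate rules
# over each message's content and forwards to the first matching next hop.
# Rules and fields are hypothetical.

def route(message, rules, default="drop"):
    """Return the next hop of the first rule whose predicate matches."""
    for predicate, next_hop in rules:
        if predicate(message):
            return next_hop
    return default

rules = [
    (lambda m: m.get("type") == "video" and m.get("resolution") == "1080p",
     "hd-conversion-node"),
    (lambda m: m.get("type") == "video", "streaming-node"),
    (lambda m: "stock" in m.get("topic", ""), "pubsub-engine"),
]

print(route({"type": "video", "resolution": "1080p"}, rules))  # hd-conversion-node
print(route({"topic": "stock/TSX"}, rules))                    # pubsub-engine
```

Because every message traverses the rule list, the per-message matching cost grows with the rule set, which is the scalability argument the text makes for configurable, hardware-based rule matching.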

In-network transport nodes equipped with processing resources can also host a range of essential services for application creation provided by third parties, among them pub-sub engines, message-passing services, security engines, conversion services, and compression and decompression engines, all deployed on the processing resources. The AON control plane provides the functionality necessary for interactions between application providers and these third-party service providers.

Although most current content processing systems are software-based, hardware-based content processors may be needed to fulfill the scalability requirements of future applications and to empower in-network content processing functions. These hardware-based content processors have to be configurable, customizable, and reprogrammable to meet application-specific requirements. An example network architecture that will benefit from this capability is the pub-sub architecture, especially for hardware-based rule matching and content-based routing. 3D video distribution networks and even software-defined mobile networks can also benefit from such reprogrammable hardware resources.

The inclusion of powerful processors in the transport layer also allows for significant improvements in privacy-related and security-related operations. For instance, processing-intensive tasks such as rigorous security checks on packets, messages and content can be performed in the network to meet applications' security requirements. Later in this chapter, we discuss security concerns in AON in more detail.

Processing can also be used in mobile networks to improve quality and efficiency in

mobility-related functions. For instance, it can be used when performing handover during


video streaming to a mobile user. In this scenario, in order to provide a smooth handover

experience, multiple format conversions can be performed on a video stream and the

generated streams can be forwarded to heterogeneous devices or end-points associated

with the user. Combining this with adaptive streaming approaches in content distribution

will lead to far better video and multimedia streaming experiences in future networks.

Other application scenarios in which in-network processors can be useful are nu-

merous. As another example, network-coding-based content distribution networks can

significantly benefit from in-network processors to perform image processing and content

coding/decoding tasks using general processors, Graphics Processing Units (GPUs),
network processors, as well as reprogrammable hardware resources.

Storage

Storage is another basic requirement of most applications especially in content distribu-

tion. Applications choose various methods for dealing with the problem of storing content

according to constraints such as the content type and target users. Some applications

exploit distributed commodity storage resources spread throughout the network such as

BitTorrent [41] which utilizes a peer-to-peer based configuration while others use a more

centralized approach.

Storage is also required for reliable and efficient caching and delivery of content, espe-
cially to temporarily unavailable nodes in mobile networks [63] as well as in pub/sub

systems. Storage is also needed for efficient content multicasting and broadcasting with

advanced functionalities such as playback.

Content storage together with processing is also useful in live and adaptive streaming

applications where a large number of end-users need to receive a metadata-enriched

multimedia stream over heterogeneous access technologies and mobile devices. Different

conditions in the access network, such as handover or signal loss may lead to temporary

disconnection of the mobile node. In a mobile network, in-network and near-to-the-


user storage capabilities become very useful especially for performing smart and efficient

content caching. The need for in-network storage in delivering content to mobile nodes

has also been discussed in [63], in which the authors propose a clean-slate cache-

and-forward architecture for video delivery in mobile networks.
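The cache-and-forward behaviour just described can be illustrated with a minimal sketch (an illustration of the concept, not the architecture of [63] itself). Node and chunk names are invented for the example; an in-network node buffers content for a temporarily disconnected receiver and flushes it on reconnect.

```python
# Sketch of the cache-and-forward idea for temporarily disconnected
# mobile receivers; all identifiers here are illustrative.

class CacheAndForwardNode:
    def __init__(self):
        self.connected = set()
        self.pending = {}   # receiver -> list of buffered chunks

    def deliver(self, receiver, chunk, send):
        """Send immediately if the receiver is reachable, else buffer."""
        if receiver in self.connected:
            send(receiver, chunk)
        else:
            self.pending.setdefault(receiver, []).append(chunk)

    def reconnect(self, receiver, send):
        """On reconnect, flush everything buffered for this receiver."""
        self.connected.add(receiver)
        for chunk in self.pending.pop(receiver, []):
            send(receiver, chunk)

sent = []
node = CacheAndForwardNode()
node.deliver("phone-A", "frame-1", lambda r, c: sent.append((r, c)))  # buffered, not sent
node.reconnect("phone-A", lambda r, c: sent.append((r, c)))           # flushed on reconnect
```

A real node would combine this with smart eviction and adaptive stream processing; the sketch only shows the store-and-forward step.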

Networking

Networking resources are the resources that provide pure data-delivery between different

resources inside an AON or between the AON resources and applications' end-users.

These resources have traditionally been the main part of transport layers. To satisfy
future applications' data-delivery requirements, the applications in an AON are able to

specify their networking requirements. Many applications require guaranteed network

connections while others prefer the traditional best-effort connectivity. For instance, the

Internet transport has been designed based on a simple best-effort packet-forwarding

approach. Although there have been many efforts to introduce advanced QoS guarantees

in IP-based transport networks, many applications increasingly find this model

of data-delivery simplistic and insufficient. AON enables the application providers to

request different levels of Quality of Service by specifying rate, delay, etc.
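As a rough sketch, such a networking request to the AON control plane might be expressed as a small record. The field names and validation rule below are assumptions for illustration, not a defined AON interface.

```python
# Illustrative sketch of an application provider's networking request;
# field names and the validation rule are assumptions.

from dataclasses import dataclass
from typing import Optional

@dataclass
class NetworkRequest:
    rate_mbps: float = 0.0                 # guaranteed rate; 0 means none requested
    max_delay_ms: Optional[float] = None   # None means no delay bound
    best_effort: bool = False

    def validate(self) -> "NetworkRequest":
        if self.best_effort and (self.rate_mbps > 0 or self.max_delay_ms is not None):
            raise ValueError("a best-effort request cannot carry QoS guarantees")
        if not self.best_effort and self.rate_mbps <= 0:
            raise ValueError("a guaranteed request needs a positive rate")
        return self

# A guaranteed low-latency link and a plain best-effort one:
video_link = NetworkRequest(rate_mbps=25, max_delay_ms=50).validate()
bulk_link = NetworkRequest(best_effort=True).validate()
```

The point of the record is that the control plane can check a request for consistency before attempting allocation.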

Configurability is one of the main requirements of an Application-Oriented Network.

Therefore, in an AON, unlike traditional transport layers, the application providers are

even allowed to configure the data-delivery network topology to adapt it to application
requirements. This functionality has also been demonstrated in the CANARIE network
using User Controlled Light Path (UCLP) Web Services [64, 65]. An Application-Oriented

Network also has to provide different levels of communication services in varying granu-

larity to applications that require such services. These communication services include
(but are not limited to) optical light-path connections, circuit-switched, packet-switched, or

MPLS-based connections, multicast and broadcast network links, and connections to

multi-homed end-points. The networking resources in AON are virtualized and made


available to application providers through well-defined and open interfaces of the AON

control plane.

In comparison to the current CDNs, the combination of content-delivery (processing,

storage) and data-delivery (networking) in the AON transport layer enables a more ad-
vanced content-delivery that can be customized for an application and a specific content.

The inclusion of processing, storage, and content-delivery in general, in future networks,

is the true manifestation of the convergence between computing and communications re-

sources in an Application-Oriented Network, as we discussed in the previous chapters.

3.2 AON Control and Management Planes

The AON reference model is composed of an application plane (which itself can have

multiple stacks of applications) and a control plane which is responsible for dynamic

allocation of application plane resources to each application (shown in Figure 3.5). The

AON control plane is also responsible for fast failure management while the manage-

ment plane performs long-term management tasks on the application plane virtualized

resources, such as provisioning, re-provisioning, prediction, pricing, and fault
monitoring and management.

The three-plane model is traditionally used in telecommunication networks to draw

a boundary between different functionalities required in high-quality operation of a net-

work. The Internet Protocol and TCP/IP model of communications, however, lacks a

clear identification of the control and management plane. While this fact has been a

major advantage point for IP and has enabled IP to grow in scale and be managed, it is

generally believed that the lack of a well-designed control and management plane in the

Internet will be a major reason for replacing this architecture in future networks. There-

fore, AON success directly depends to its control and management plane architecture

and the flexibility, scalability and the type of functionalities that they can provide.

[Figure 3.5: Application-Oriented Network Reference Model. Multiple application stacks
in the AON application plane (each comprising user, control, and management planes over
service and transport layers), flanked by the AON control plane and the AON management
plane.]

The most important feature of an Application-Oriented Network is that it has to be

configurable and application-oriented. The AON control plane provides mechanisms for

on-demand allocation and configuration of the network resources per application provider

request. For example, in most conventional networks, transport-level network topologies

are mainly determined by the topology of physical links connecting the routers. In an

application-oriented transport layer, however, network topology should be determined

based on the application requirements. It is also possible for an application to dy-

namically change the topology according to various factors, such as changes in load or

operation costs (e.g., power).

Another main functionality of the AON control and management planes is to enable

interaction between application providers and service providers. Service providers can

provide services which can be used by other application providers. Therefore, AON

provides the main functionalities and a framework for these types of interactions and

handles the accounting, management, and monitoring issues related to them.

Service providers can register their services in the management plane and the control


plane allocates them to the application providers on-demand. The control and manage-

ment plane together handle the authentication, authorization and accounting aspects of

this interaction between the application providers and service providers. In other words,

the service producers do not need to know the identity of the service consumers and vice

versa.

In AON, services follow generic, platform-independent and well-defined interfaces so

that they can be allocated, controlled and managed by the AON control and management

planes. The generic interface is also needed to enable application providers to
incorporate the resources and service components in their applications as simply as possible.

In AON, each class of resources follows a generic interface template. For instance, pro-

grammable resources (e.g., processing resources) can follow one generic interface, while

storage resources can follow another generic interface. Network resources are another

class of resources that need their own generic interface.
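A minimal sketch of such per-class generic interface templates can be given with abstract base classes. The method names below are illustrative assumptions rather than a standardized AON interface, and the in-memory storage class exists only to make the templates concrete.

```python
# Sketch of per-class generic interface templates for AON resources.
# Method names are illustrative assumptions, not a defined standard.

from abc import ABC, abstractmethod

class Resource(ABC):
    """Operations common to every class of virtualized AON resource."""
    @abstractmethod
    def allocate(self, provider_id: str) -> None: ...
    @abstractmethod
    def release(self) -> None: ...

class ProcessingResource(Resource):
    """Template for programmable (processing) resources."""
    @abstractmethod
    def load_program(self, image: bytes) -> None: ...

class StorageResource(Resource):
    """Template for storage resources."""
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...
    @abstractmethod
    def get(self, key: str) -> bytes: ...

class NetworkResource(Resource):
    """Template for networking resources."""
    @abstractmethod
    def connect(self, endpoint_a: str, endpoint_b: str, rate_mbps: float) -> None: ...

class InMemoryStorage(StorageResource):
    """Toy realization of the storage template, for illustration only."""
    def __init__(self):
        self.owner, self.blobs = None, {}
    def allocate(self, provider_id): self.owner = provider_id
    def release(self): self.owner = None
    def put(self, key, data): self.blobs[key] = data
    def get(self, key): return self.blobs[key]
```

The design point is that the control and management planes only ever see the template, so any concrete resource implementing it can be allocated and managed uniformly.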

Another important function of the AON management plane is monitoring and measurement.
Monitoring and measurement are needed in order to control QoS and, moreover, to allow
applications to adapt to changes in the environments in which they operate.

All the communications between the control plane and management plane need to be

secured and authenticated so that the security risks in an AON are minimized. In terms

of interactions between the application providers and the AON control and management

planes, all such communications are done through secure channels and different levels of

authentication and authorization are performed to enable secure access to AON resources.

Also, different access levels for application providers are defined to efficiently authenticate

and authorize the usage of the resources in the application plane. The AON control and

management plane then allocates the resources to the application providers based on

their profile limit, and provides the isolation between the resources allocated to different

applications in the application plane so that they cannot interfere with each other's


operation. This also stops applications from performing a security attack, such as sniffing,

on another application.

In the application plane, however, the application providers are free to follow any

security approach they choose and AON does not limit them to use a particular method

or technique. The communications between the application providers and the resources
in the application plane extend beyond AON security checks, and AON does not impose
any specific protocol, format, or encryption technique on these interactions either.

3.3 Application-Oriented Routers

In this section, we focus on Application-Oriented Routers (AORs) which are network-level

nodes in Application-Oriented Networks. We examine the types of functionalities which

should be embedded in these emerging network elements. We also discuss several use

cases of AORs based on the proposed network architecture, including AORs in enterprise
networks, telecommunication networks, and content-delivery networks.

Due to the inclusion of content-delivery functions at the transport level, application-
oriented routers are not only able to perform the conventional routers' task of pure
data delivery, but can also perform content-delivery-related tasks. The AOR operations also

include the previously described tasks of content processing and content storage.

In order to meet the high throughput and low latency requirements, applications

need processing technology that goes beyond the conventional software processing. For

instance, in the case of XML processing, relative to transactional database processing, it

has been found that the desired response times and transaction rates cannot be achieved

without major improvements in XML parsing [66, 67]. To reach the required throughput,

AORs will exploit hardware techniques for processing intensive operations especially in

the form of hardware-based XML processing, validation, transformation, encryption,

decryption, compression, decompression, and content-based routing [68]. Thus AORs


emerge as a networking component that is high in performance, has high reliability,
and includes traditional layer-three routing capabilities as well as the described content-

delivery tasks.

Another important requirement for application-oriented networks is the ability to con-

figure the network elements based on the applications' requirements. In most conventional
networks, transport-level network topologies are mainly determined by the topology of

physical links connecting the data routers. In an application-oriented transport layer,

however, network topology should be determined based on the application requirements.

Some applications are best suited for a flat peer-to-peer topology and others might require

a hierarchical architecture.

In AON, applications share the same infrastructure for application development and
deployment. However, there is no “one size fits all” configuration in AON, and network

resources and elements are configured based on the applications’ requirements. As a

result, Application-Oriented Routers are not pre-configured devices which provide some

basic functionality to all applications and force the application providers to follow the

predefined configuration. On the contrary, AORs open the doors of the network-level

entities to application providers and give application providers the option to configure

their allocated resources as they prefer. This will be achieved through virtualization

of resources and providing well-defined open interfaces to configure the resources on-

demand.

Figure 3.6 shows our overall view of AORs in an Application-Oriented Network. As can
be seen, a multiplicity of applications shares a converged communication and computing

infrastructure as well as the hardware and software components embedded in application-

oriented routers. Each application in this view has a resource pool in a set of AORs and

the end-users and terminals can be part of one application or more.

Application-oriented routers will also have to meet many of the traditional requirements
of existing service provider infrastructure. The traditional capabilities to engineer
reliability and performance into the overall system will be needed, as will the
incorporation of novel self-management mechanisms that reduce operating expenses.

[Figure 3.6: Overall view of an Application-Oriented Network with multiple AORs and
applications. Enterprise nodes and end-users connect through AORs to a shared AON
hosting applications App.1, App.2, and App.3.]

3.4 Application-Oriented Routers Use Cases

The success of Application-Oriented Routers depends on the value they add to the current

applications and also the facilities they provide for future ones. Therefore, in this section,

we present some use cases for application-oriented routers in the context of the proposed

architecture for application-oriented networks.

3.4.1 Telecom Service Providers

IP Multimedia Subsystem (IMS) [34] is one of the major candidate architectures for

the next generation telecommunication networks. In this subsection we focus on the

potential contributions of AOR to IMS networks. IMS is an effort by the telecom-oriented
standard bodies to realize the NGN concepts and to extend the new control plane to any
access network; it represents a natural evolution from the traditional closed signaling
system to the NGN service control system.

[Figure 3.7: Telecommunication services in an AON. Users A and B communicate through
AORs at the edge of an Application-Oriented Network.]

the access to services by customers of Third Generation wireless access networks. In the

IMS approach, servers in the user's home service provider's network control access to

all services. Consequently, the service provider can determine what services are delivered

at what quality level and at what cost.

In the context of future application-oriented networks, IMS service providers can uti-

lize the content-processing and storage functionalities embedded in the AORs to increase

the quality level of their current services and to introduce new services using newly

available functionalities.

Among these newly available functionalities we can mention content transformation

and transcoding, which enable connectivity between heterogeneous devices, and also

content multicasting and other sophisticated content processing tasks such as encryp-

tion/decryption, pattern matching, and compression/decompression. The need for these

types of network support for content delivery has recently gained more attention: in [69],
the authors propose network support for content delivery in ambient networks, an idea
that is aligned with our view on application-oriented networks.

For example, consider the case shown in Figure 3.7. User A has an active


multimedia session with user B. At one point, user A decides to change his or her device

from a SIP Phone with a limited set of capabilities to a more powerful device like a laptop

or to another SIP phone device with a different type of capabilities. To do so, user A

initiates a transfer procedure and transfers the session from his or her first device to the

second device. If, for any reason, user B's device does not handle the coding required for
the new device, the transfer procedure would fail. Also, the transfer procedure might be

unsuccessful due to the incompatibilities between protocol stacks.

The storage and processing capabilities in AORs will be very useful in handling
mobility- and security-related tasks in IMS networks. For instance, in the above scenario,
in the case of hand-off, content can be temporarily stored in the AOR since the user
device might be temporarily disconnected from the access network. In this scenario, smart
caching combined with adaptive stream processing in AORs enables a fast and efficient

connection resume phase in which the user experiences minimal disruption.

Issues like content transformation from one format to another can be solved in a

much easier way using AORs. For example, an intermediate AOR is able to perform the

necessary conversion between different media formats, or to transform one SIP message to

another. Also, it can compress, decompress, encrypt or decrypt the content. In addition,

content validation, another common and processing-intensive task, can be performed in AORs.

Another use case for AORs is content-based policy enforcement. For example, if user

A in Figure 3.7 sends a high-priority message in an emergency situation, an AOR, based
on a policy, can identify the priority of the message and treat it differently from
ordinary messages. In another use case, if a SIP device needs to access an XML-based

service, an AOR can perform the transformation required between the protocols.
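Such content-based policy enforcement amounts to matching a message against an ordered policy table and picking the first matching treatment. The policies, field names, and queue names below are illustrative assumptions, not a defined AOR policy language.

```python
# Minimal sketch of content-based policy enforcement in an AOR;
# the policy table and treatment names are illustrative.

POLICIES = [
    # (predicate over the message, treatment)
    (lambda m: m.get("priority") == "emergency", "expedited-queue"),
    (lambda m: m.get("content-type") == "application/sdp", "sip-processing"),
]
DEFAULT_TREATMENT = "ordinary-queue"

def classify(message):
    """Return the treatment of the first matching policy, else the default."""
    for predicate, treatment in POLICIES:
        if predicate(message):
            return treatment
    return DEFAULT_TREATMENT

classify({"priority": "emergency", "body": "help"})   # -> "expedited-queue"
classify({"content-type": "text/plain"})              # -> "ordinary-queue"
```

An AOR would evaluate such predicates over parsed message content at line rate, which is exactly where the hardware-based content processing discussed earlier becomes relevant.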

[Figure 3.8: Enterprise Service Bus and AON. Service requesters and service providers
attach to an Enterprise Service Bus, whose processing-intensive tasks are offloaded to
AORs inside an Application-Oriented Network.]

3.4.2 Enterprise Networks

Enterprise Service Buses (ESBs) [70] are used to provide the necessary functionalities
for deployment of enterprise applications, mainly in the context of Service-Oriented
Architecture [5]. In the context of Application-Oriented Networks, ESBs can perform their

processing-intensive tasks on the Application-Oriented Routers. These tasks include

content validation, compression/decompression, pub/sub message delivery, rule-based

matching and content forwarding, content encryption/decryption, and content transfor-

mation.

Current proprietary ESBs use XML-processing appliances to perform content-processing
tasks, especially encryption/decryption and content transformation [71]. These

XML-processing appliances, however, are not standardized, are not available to small
vendors, and are not affordable for many enterprises. In an Application-Oriented

Network, an enterprise can exploit the content processing facilities, embedded in the


AOR, to increase its application’s quality and decrease expenses, eventually leading to

better quality of services and reaching very large scales. In addition, enterprises can use

the storage capacity provided in the AORs to store their content with lower costs and for

reliable message delivery to temporarily unavailable nodes in the pub/sub model of message
delivery, especially mobile nodes.

For example, consider the case shown in Figure 3.8, where an enterprise is

using an ESB to support its SOA-based operations. One of the main tasks usually done

by an ESB is content validation, which is very processing-intensive and thus places a
heavy burden on the ESB. An AOR can perform this task for the ESB, and it can also
perform security-related tasks such as content encryption and decryption.

Another use case for AOR in an enterprise environment is content-based routing and

multicasting. This functionality is valuable when there is a request for an unspecified

service and AOR can forward the request to a server based on a policy in a way similar to

a distributed ESB. Given these use-case scenarios, we can conclude that by using

AORs, ESBs and enterprise networks can scale and adapt to the business demand with

high agility.

3.4.3 Overlay Networks and Content Distribution Networks

In the context of media and content distribution networks, one of the major issues is

effective content caching and multicasting, especially for live content streaming [72].

In Application-Oriented Networks, application providers can utilize AORs’ capabilities

in both content storage and content processing, and deliver high quality services to

their users. Among these applications, we can point to the Video-On-Demand and TV

applications.

AON content delivery can cover functionalities provided by traditional CDNs such as

Akamai, and it can also cover advanced content delivery functions by enabling application

providers to create their own application-specific CDN architecture using features such
as locality and identity to provide customized content delivery for their users. Users
can interact with the content and with other users, and can produce metadata that can
be used by other application users.

[Figure 3.9: Peer-to-Peer network in AON. AORs inside an Application-Oriented Network
back a peer-to-peer overlay of end-user nodes.]

As another example, we can mention peer-to-peer networks and other flat or hierar-

chical overlay networks which are currently unable to flourish due to their need for robust

nodes with processing and storage capabilities in the network. An example of these hi-

erarchical architectures has been studied in our research group [73]. In this structure,

unpopular content can be stored on a smaller number of powerful nodes, while popular

ones are copied on many computers at the edge of the network and distributed in a

peer-to-peer topology. AORs' content-delivery features allow these types of new network

architectures to flourish.

In peer-to-peer networks, AORs can be used to store critical data that needs to
be stored on a robust node. In addition, as shown in Figure 3.9, AORs can be used

to store popular content based on the usage patterns and users’ locations. As a result,

these networks can deliver better-quality services while being efficient in bandwidth

usage in the network. In this application scenario, AORs’ content storage functionality

can be used to store content in places near the interested users.
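The placement idea described here can be sketched as a simple popularity threshold: popular items are replicated to edge peers while unpopular items stay on a few robust AORs. The threshold, tier names, and content names below are illustrative assumptions.

```python
# Sketch of popularity-based content placement across storage tiers;
# threshold and tier names are illustrative.

def place(request_counts, popularity_threshold=100):
    """Map each content item to a storage tier based on its request count."""
    return {
        item: "edge-peers" if count >= popularity_threshold else "robust-aor"
        for item, count in request_counts.items()
    }

placement = place({"hit-show-ep1": 50_000, "archival-doc": 12})
# hit-show-ep1 -> "edge-peers"; archival-doc -> "robust-aor"
```

A real system would also weigh users' locations and usage patterns, as the text notes, but the tiering decision has this basic shape.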


3.5 Related Work

There have been many attempts at developing new architectures for future networks [50,
51, 52, 53, 54]; however, our proposed AON is different from many other proposals. This

is mainly because AON is designed to serve application providers instead of end-users,

and allows a multiplicity of applications with different internal architectures to coexist
over the same shared virtualized infrastructure. In other words, multiple proposed network

architectures, each with its own rationale for its design and existence, can be deployed

inside an Application-Oriented Network.

Nevertheless, among the proposed network architectures, we need to distinguish
our work from two related works. The first is Cisco's Application-Oriented

Network [74]. Cisco has introduced its Application-Oriented Network line of products

that performs on-the-edge XML processing functionalities especially for Cisco’s enterprise

customers. The Application-Oriented Network that we define in this chapter is different
from Cisco's AON in many aspects. In our AON we provide content-delivery in addition

to message processing, and we provide a framework for allocating virtualized in-network

resources to different application providers. In other words, we create a network of

computing and communication resources and allocate them to applications on-demand.

Another related work is the Slice-based Facility Architecture (SFA) proposed by GENI

[75] for federating network research testbeds in the United States. In SFA, researchers

are able to request instantiation of a slice of testbed resources across a federated network

of testbeds to perform an experiment on new network architectures. SFA and AON

architectures are different in many ways. SFA’s goal is to enable experimentation over

multiple testbeds, and in this regard, it is not designed to address future network chal-

lenges. SFA does not have a three-plane architecture, nor a clear statement about content-
delivery support in future networks. Moreover, SFA does not acknowledge the fact that

the service-oriented application-creation paradigm and its related set of technologies
are crucial to the success of future networks and central to addressing their requirements.
The last, but not least, important difference is that SFA is not concerned with the
management of the resources to deliver the required quality of service.


Part II

Virtualized Application Networking

Infrastructure



Chapter 4

Virtualized Application Networking

Infrastructure

In the past few years, the idea of clean-slate network design has circulated in the

networking community and there have been several proposals for introducing new net-

work architectures and protocols [50, 51, 52]. One of the major obstacles in introducing

new network architectures was, and still is, experimentation with proposed network archi-
tectures in a large-scale environment, possibly with massive numbers of end users.

To address this problem, there have been several initiatives to build large scale testbeds

for networking research.

GENI [55] is one of these initiatives that tries to create a testbed by federating

different testbeds such as PlanetLab [57, 58] and Emulab (ProtoGENI) [60] on top of

a research-dedicated network in the United States. GENI is still in the design and

development phase, but currently it follows a slice-based architecture [56, 75]. In GENI

different testbeds would be able to connect to each other through GENI wrappers. The

exact communication protocol between the GENI wrapper and the testbed is left to each

testbed’s control plane and currently there are a few major control planes in GENI that

are trying to federate using the wrappers.



Among the above testbeds, PlanetLab [57] is probably the most developed. PlanetLab provides edge hosts on the Internet and implements a slice-based architecture using the Linux vServer [76] technology. PlanetLab, however, does not have a clear solution for experimentation with new layer three protocols, and it is not clear how it would facilitate building large-scale new routers that require hardware-based acceleration.

In Canada, a research-dedicated optical network called CANARIE [77] provides light paths connecting universities and research centers across the country. CANARIE has sponsored the design and development of the User Controlled Light Path (UCLP) [65] software, which enables researchers to configure CANARIE network elements on demand through Web Services (WS) interfaces.

Another major initiative is FEDERICA [62] in Europe, which is under development through the federation of several European research network platforms such as i2CAT in Spain and HEAnet in Ireland. FEDERICA uses the WS-based UCLP software to create on-demand virtual networks atop the participating test platforms.

Another project for experimentation with lower layer protocols and networking al-

gorithms is NetFPGA [78]. NetFPGA is a PCI card with a Field Programmable Gate

Array (FPGA) chip, and four Gigabit Ethernet interfaces that could be used for de-

veloping different networking components such as a layer three router or a hardware

accelerator.

In this chapter, we present a new testbed for networking experiments and networked

systems. This testbed differs from the above-mentioned projects in several respects. It benefits from a novel architecture for control and management functions capable of managing various hardware-based and software-based resources. It also allows experimenting

with new network architectures that require in-network content processing and storage

capabilities. Moreover, it includes a new high-performance and high-throughput hardware resource that makes experimentation with hardware-based or hardware-accelerated networking algorithms and protocols as easy as experimentation with software-based


protocols.

Our vision in designing this testbed was to develop a converged application-oriented

computing and communications infrastructure to support an open applications marketplace. We investigated architectural aspects of this application-oriented network and

presented a proposal in the first part of this thesis. We also investigated autonomic

management issues and proposed an approach using virtual networks in [4, 79].

The essential aspects of enabling the above application-oriented environment are: 1. service-oriented application creation; 2. Infrastructure as a Service methods for configuring and scaling resources to support applications; and 3. virtualization of physical resources.

Based on this view of an Application-Oriented Network, we began the development

of a testbed that would allow university researchers and application providers to develop new networked systems and networking architectures. This testbed, the Virtualized Application Networking Infrastructure (VANI), allows the creation of virtual networks

of computing and communications resources. A VANI node consists of resources such as

processing, storage, networking and programmable hardware. A service-oriented control

and management plane allows VANI nodes to be interconnected into virtual networks to

support applications operating in the applications plane.

In the rest of this chapter, we describe the main requirements in the VANI design, along with its architecture and main components. We also explain how our design satisfies these requirements. Moreover, we present performance evaluations of the resources developed for this infrastructure, including a virtualized reprogrammable hardware resource that enables hardware-based experimentation with networking algorithms and protocols.


4.1 VANI Design Requirements

The Virtualized Application Networking Infrastructure (VANI) is a testbed that allows university researchers and application providers to utilize its internal resources to rapidly create and deploy networked systems, and even to experiment with new layer three protocols. Although the underlying concepts of the VANI testbed come from our view of Application-Oriented Networking [17], networked systems running in the VANI environment can follow any architecture at any networking layer. The only limitation researchers face in VANI is that their experiments must run on top of Ethernet as their layer two. Next, we describe the main requirements in designing VANI.

The first requirement for the VANI testbed is that it should allow experimentation with future network architectures that might not fit the traditional layer three definitions. Currently, networks are primarily responsible for delivering raw data, but future network architectures may shift network tasks up to new functionalities required by emerging applications. Among these functionalities could be content delivery in addition to data delivery (as in the network architecture discussed in [17]), which would imply having content processing and storage functions in the infrastructure.

The second main requirement was to allow researchers to experiment with new layer

three protocols (as in the traditional definition of L3) instead of the current Internet

Protocol. To do so, we designed the testbed assuming that everything above layer two

could be redesigned and experimented with, and we chose the Ethernet protocol as the basis

of our layer two design.

Another main requirement is the ability to set up experiments or create new applications rapidly using already developed, ready-to-use components that can be accessed through open interfaces. These components could be virtualized resources, such as processing, low-latency hardware processing, and accelerator nodes, or software components, such as the event processors that are used in many experiments for

[Figure: VANI at the center, surrounded by its five design requirements: testing new L3 protocols, monitoring/testing, rapid experiment setup and application creation, future network architectures, and isolation/security.]

Figure 4.1: VANI design requirements

data gathering and analysis. This requirement can be satisfied through the use of SOA technologies and standards, which allow flexible and dynamic composition of reusable service components.

The fourth main requirement was to provide an isolated and secure environment for researchers to carry out their experiments and develop their networked applications. This requirement has to be satisfied at different levels, such as traffic separation, bandwidth allocation, storage access, secure access to physical resources, and isolation between different physical resources. The fifth main requirement was to provide monitoring and debugging mechanisms. In our design, we envisioned powerful complex event processing components that can be customized to gather and analyze test and debugging data for each experiment separately, as well as for the testbed itself.

4.1.1 VANI Architecture

Based on these main requirements, we designed a two-plane architecture for our platform:

control and management plane (VANI-CMP) and applications plane (VANI-AP).

VANI-CMP is responsible for virtualizing physical resources and allocating them to

[Figure: the two main planes of VANI, the Control and Management Plane and the Application Plane. All resources needed for experiment setup reside in the application plane; the control and management plane is used for allocating a resource pool to a researcher or application provider. Applications can have their own architecture inside the application plane; example applications include a new network with a new layering architecture replacing the IP network, or a new content delivery network.]

Figure 4.2: VANI architecture

the researchers and application providers. On the other hand, researchers deploy their

applications and experiments in the VANI applications plane (VANI-AP). Applications

operating in the applications plane can have their own architecture inside an applications

plane slice that is created by VANI-CMP.

For example, an experiment or application could be a new layer three protocol that covers OSI layer three and four functions and replaces the TCP/IP layer, or it could be a new content delivery network. Figure 4.2 shows this architecture, including its two planes.

All virtualized resources and service components that can be used by researchers for

creating an application reside in the applications plane. Researchers can ask for these

resources through the testbed control and management plane, and then connect directly to the virtualized resource in the applications plane through any resource-specific protocol such as HTTP, UDP/IP, or SSH.

For example, a user can ask to upload or download a file to or from the storage service through the control plane; if the control plane permits the request, the user then contacts the storage file service directly over an HTTP/TLS connection to upload or download the files.

[Figure: a researcher interacts with the control and management plane through its Web Service interface, and connects directly to virtualized resources in the application plane through resource-specific protocols such as WS, HTTP/SSL, SSH, or UDP/IP. A virtualization layer with virtualization agents sits between the virtualized resources and the underlying physical resources.]

Figure 4.3: Researcher interaction with VANI planes

The VANI control and management plane (VANI-CMP) is responsible for allocating testbed resources to researchers. Researchers ask VANI-CMP for a resource using its Web Service (WS) interface. The WS interface was chosen due to its universal acceptance for SOA and the abundance of available tools for orchestrating and creating new applications from independent Web Services.

After receiving a resource request from a researcher, VANI-CMP authenticates the researcher, authorizes the request, and then forwards the request to the resource virtualization layer, which abstracts a physical resource and offers it as a service to the control and management layer. If the allocation is successful, VANI-CMP records the allocation and replies to the researcher with a success result.


VANI-CMP also programs and releases the resource whenever an authorized re-

searcher wants to do so. Figure 4.3 depicts the logical view of the VANI testbed and how

a researcher interacts with VANI planes.

4.1.2 Current Physical Resources in VANI (VANIv1 Resources)

Currently, several physical resources have been virtualized and made available to VANI

users. In [8], the design and development details of these resources have been presented,

and here we briefly review these resources and the types of functionality they can offer to researchers.

In VANI all physical resources are virtualized. Through virtualization, we separate

applications from their underlying physical resources. To do so, we developed a virtualization layer and virtualization agents for each physical resource, as shown in Figure 4.4. The virtualization layer coordinates the system-wide virtualization of a resource and exposes it as a service component with a Web Service interface to the rest of the system; the agents' task is to launch or destroy virtual resources on top of each physical resource.

The first physical resource that we have virtualized is the reprogrammable hardware

resource. To develop this resource, we used BEE2 boards [80]. Each BEE2 board has four high-end Xilinx Field Programmable Gate Arrays (FPGAs), each connected to four 10GE interfaces. We have virtualized all four FPGAs on a BEE2 board so that a researcher can ask for one or more FPGAs and program them as s/he likes.

Researchers can ask for an FPGA through the control plane and then program it,

configure it, or release it. They also have access to the libraries for controlling the 10

GE interfaces and some other commonly used hardware blocks such as DDR2 memory

modules. After programming an FPGA, a researcher can directly connect to the FPGA

through the 10GE interfaces using whatever protocol was designed for that FPGA. For

example, a researcher can use one FPGA or all four FPGAs to develop a layer three router

[Figure: the physical resources in a VANI node (10GE fabric, processor blades, BEE2 FPGAs, and storage/file servers), each fronted by a virtualization layer with virtualization sub-agents and exposed through Web Service interfaces.]

Figure 4.4: Virtualizing physical resources in VANI

with 4x10GE ports or 16x10GE ports, or a content-based router that routes packets based on the packet payload rather than the headers. We present the performance

evaluation results for this hardware resource in the performance evaluation section of

this chapter.

Another physical resource in the VANI testbed is the processing resource. The pro-

cessing service is developed based on the Linux vServer [76] technology. Linux vServer is OS-level virtualization software that creates a virtual processing node on top of a

Linux kernel. Researchers are able to get a processing resource through VANI-CMP, and

release it whenever they wish to do so. Once a virtual processing node is allocated, the

researcher can directly SSH to the node. Researchers are also able to program the virtual processing node with a specific image, create an image of their own, save it on the storage service, and share it with others or program other virtual nodes with that image.

We have also virtualized the internal fabric of the testbed for creating virtual networks.

The internal fabric consists of a set of high capacity Ethernet switches that are able to


isolate traffic between different applications and experiments by creating separate virtual

LANs. Moreover, it allows different experiments to intercommunicate by creating shared virtual LANs to which they all have access. This resource, together with the processing resource, enables VANI to guarantee bandwidth for an experiment. Later, in the bandwidth

guarantee section, we will discuss this feature in more detail.

The gateway and bridge resource is another developed resource that enables com-

munication between different VANI nodes. If one of the resources in VANI needs to be

accessible from the Internet or from a resource in another VANI node, it can ask for

a public address through the gateway service and get an address for the duration that

external access is needed. The researcher can release the public address when it is no

longer needed.

The bridge service is used for experiments involving new layer three protocols on top of an Ethernet network. Using the bridge service, a researcher can send and receive layer

two Ethernet frames to any other VANI node, and hence, would be able to develop and

test new layer three protocols over a wide area network. This functionality would only

be available if the VANI nodes are connected using a wide area Ethernet network. We

will discuss this case later in more detail.

Another physical resource developed for VANI is the storage resource. The storage resource is implemented on a set of distributed file servers that emulate one large storage server. Researchers are able to connect to the storage service through VANI-CMP and then directly connect to a file server for uploading and downloading files. All direct communications with the file servers for uploading and downloading files are done over a secure HTTP/TLS connection. Researchers can use this service to store images for programming other resources, such as the processing resource and the reprogrammable hardware resource, and they can also share files with other researchers through this service.
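The direct upload path can be sketched as a small client. A minimal sketch follows, assuming a hypothetical endpoint path, header names, and token scheme; the actual VANI storage file service interface may differ.

```python
import urllib.request

def build_upload_request(file_server_url, file_name, data, token):
    """Build an HTTPS PUT request for a direct file upload.

    The /files/<name> path and the Authorization header are illustrative
    assumptions, not the actual VANI storage service interface.
    """
    url = "%s/files/%s" % (file_server_url.rstrip("/"), file_name)
    req = urllib.request.Request(url, data=data, method="PUT")
    req.add_header("Authorization", "Bearer %s" % token)
    req.add_header("Content-Type", "application/octet-stream")
    return req

# Construct (but do not send) a request against a hypothetical file server.
req = build_upload_request("https://fileserver.vani.example", "disk.img",
                           b"...image bytes...", "session-token")
print(req.get_method())   # PUT
print(req.full_url)
```

Sending the request over HTTPS (for example with `urllib.request.urlopen`) would give the encrypted HTTP/TLS transfer described above.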

[Figure: a sequence diagram between the researcher, the control plane, and a virtualized resource. The researcher invokes getResource, programResource, and releaseResource on the control plane; each call passes through authentication/authorization and accounting/record keeping before being forwarded to the resource. After allocation, the researcher connects directly to the resource.]

Figure 4.5: A sample interaction between a researcher and VANI to secure a resource

4.1.3 Example: Requesting a Resource in VANI

Figure 4.5 shows a sample message exchange scenario between a researcher, the VANI

control and management plane and physical resources inside a VANI node. A researcher

requests a resource by invoking the getResource operation of the VANI-CMP WS interface. In the request, the researcher includes the type of resource and the duration and number of required resources.

VANI-CMP authenticates and authorizes the request and forwards the request to the

resource. All resources in the testbed expose their operations to VANI-CMP through a

generic WSDL interface. This makes it possible to easily extend the types of resources

and services in the testbed without changing the control and management software.

The resource responds to the control plane request with a success result and

a Universally Unique IDentifier (UUID) for the resource. The control plane stores this

returned UUID and passes it to the researcher. The researcher can program the resource


identified by returned UUID, and release it at a later time.
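The getResource/programResource/releaseResource exchange above can be sketched with an in-memory mock of the control plane and a resource. The Python class and method names below are illustrative stand-ins for the WS operations described in the text, not the actual implementation.

```python
import uuid

class MockResource:
    """Stands in for a virtualized resource behind the generic WS interface."""
    def __init__(self):
        self.allocations = {}          # uuid -> programmed image (or None)

    def get_resource(self):
        rid = str(uuid.uuid4())        # the resource generates the UUID
        self.allocations[rid] = None
        return rid

    def program_resource(self, rid, image):
        self.allocations[rid] = image

    def release_resource(self, rid):
        del self.allocations[rid]

class MockControlPlane:
    """Authenticates, forwards to the resource, and records the allocation."""
    def __init__(self, resource, users):
        self.resource = resource
        self.users = users             # user -> password
        self.records = []              # accounting log

    def _auth(self, user, password):
        if self.users.get(user) != password:
            raise PermissionError("authentication failed")

    def get_resource(self, user, password, plan):
        self._auth(user, password)
        rid = self.resource.get_resource()
        self.records.append((user, plan, rid, "booked"))
        return rid

    def release_resource(self, user, password, rid):
        self._auth(user, password)
        self.resource.release_resource(rid)
        self.records.append((user, None, rid, "released"))

cmp_ = MockControlPlane(MockResource(), {"alice": "secret"})
rid = cmp_.get_resource("alice", "secret", plan="exp-1")
cmp_.release_resource("alice", "secret", rid)
print([r[3] for r in cmp_.records])    # ['booked', 'released']
```

Note how the UUID is generated by the resource and merely recorded and relayed by the control plane, mirroring Figure 4.5.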

In the next section, we delve into the control and management design and we describe

its main functionalities in detail.

4.2 VANI Control and Management Plane (VANI-CMP)

In this section, we describe in detail the main functions of VANI-CMP. We also discuss the main technologies used in the design and development of VANI-CMP. These are mainly SOA-based technologies such as the Enterprise Service Bus (ESB) [81] and the Business Process Execution Language (BPEL) [25] orchestration engine.

VANI-CMP is responsible for performing AAA operations and allocating resources to researchers and application providers. In addition, it performs user management

functions, and stores and manages the testbed configuration data. It also has a registry

for all services and resources that can be used by researchers for creating a new application

or experiment setup.

VANI-CMP is designed and developed using BPEL and deployed on an Enterprise

Service Bus. Like the resources and services inside the testbed, all internal components and functions of VANI-CMP have been developed as independent service components and are accessed through Web Services.

The use of ESB and Web Services enables VANI-CMP to be easily extended in func-

tionality and accessed through other types of interfaces in the future. This design choice

also enables independent development, testing, and redeployment of internal functions of

VANI-CMP, such as the AAA operations, configuration management, etc. Moreover, the use of the BPEL language for VANI-CMP enables a high-level description of the VANI control and management operations. This enables rapid and easy modification of the control

and management logic.


In the next subsections, we examine each of the functionalities of the control and

management plane and we describe the design steps and interfaces of each of the modules.

4.2.1 User Management

Three concepts are used to manage users in VANI: application plans, service levels, and

plan administrator levels. Application plans are used to represent different experiments and to organize resources and resource usage in each experiment. When booking a resource,

the researcher must specify which plan (experiment) the resource is being booked on.

Any researcher belongs to a service level which governs what control operations s/he is

allowed to call and also how much of each resource s/he is allowed to book. Custom

service levels may be designed for specific users in order to maintain flexibility. Lastly,

plan administrator levels are used to govern access to certain resources. Resource users

will be granted specific levels of access defining their ability to release, program, save,

etc.
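A minimal sketch of how service levels and booking quotas might be enforced is given below; the data layout, quota numbers, and method names are assumptions for illustration, not the actual VANI implementation.

```python
class ServiceLevel:
    """A service level governs which control operations a user may call
    and how much of each resource the user may book."""
    def __init__(self, allowed_ops, quotas):
        self.allowed_ops = set(allowed_ops)
        self.quotas = dict(quotas)          # resource type -> max count

class UserManager:
    def __init__(self):
        self.levels = {}                    # user -> ServiceLevel
        self.usage = {}                     # (user, rtype) -> booked count

    def can_book(self, user, rtype, count=1):
        level = self.levels[user]
        if "getResource" not in level.allowed_ops:
            return False
        used = self.usage.get((user, rtype), 0)
        return used + count <= level.quotas.get(rtype, 0)

    def book(self, user, rtype, count=1):
        if not self.can_book(user, rtype, count):
            raise PermissionError("quota exceeded or operation not allowed")
        key = (user, rtype)
        self.usage[key] = self.usage.get(key, 0) + count

um = UserManager()
um.levels["alice"] = ServiceLevel({"getResource", "releaseResource"},
                                  {"fpga": 2, "processing": 4})
um.book("alice", "fpga", 2)
print(um.can_book("alice", "fpga"))   # False: the quota of 2 FPGAs is used up
```

Custom service levels for specific users, as mentioned above, would simply be additional `ServiceLevel` instances with tailored quotas.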

4.2.2 Authentication, Authorization, and Accounting

The control software is responsible for handling authentication of users. All operations

in the control plane require users to provide credentials. Currently, credentials are in the

form of a user name and password combination; however, the implementation allows this

to be easily changed. On every call to the control software, the user is authenticated and

a check is made to ensure that the user has the rights to execute the requested operation.

In addition to authentication, the control software is responsible for authorizing access

to resources. Every access to a resource involves two checks: that the resource belongs to the user, and that the user has the rights to manipulate the resource as requested.

In order to prevent outsiders from directly accessing resources and bypassing the

control plane, all requests to resources require credentials known only to the control

plane. This credential is generated when resources are initialized.
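The two authorization checks described above can be sketched as follows; the dictionary-based data layout is an illustrative assumption.

```python
def authorize_resource_access(allocations, rights, user, rid, op):
    """Apply the two checks from the text: the resource must belong to the
    user, and the user must hold the right for the requested operation."""
    owner = allocations.get(rid)
    if owner != user:
        return False                      # check 1: ownership
    return op in rights.get(user, set())  # check 2: operation rights

allocations = {"res-42": "alice"}                 # rid -> owning user
rights = {"alice": {"program", "release"}}        # user -> permitted operations
print(authorize_resource_access(allocations, rights, "alice", "res-42", "program"))  # True
print(authorize_resource_access(allocations, rights, "bob", "res-42", "program"))    # False
```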


The control software keeps a record every time a resource is booked or released. This

keeps an account of which resource was used by which user (on which plan) and for

how long, as well as of all resources currently in use. Resources are identified by a UUID

generated by the resource and passed back through the control plane.

4.2.3 Resource Allocation

Resources are booked through the control plane whether the user is a researcher or an

application provider building a resource on top of another. Users provide their credentials

and specify which resource they wish to book (on which VANI node) and the plan to

which the resource will belong. The control plane ensures the user is allowed to book the

resource and determines the location (WSDL address) of the resource in the network. A

getResource request is then made to the resource. The resource does not know who is

requesting the resource as this information is hidden by the control software. If successful,

the resource will return a UUID identifying the resource as well as any other relevant

data, which is then passed back to the user. The UUID is used by the control plane for

accounting purposes.

4.2.4 Generic Resources and Registration

New resources can be made available dynamically in the control plane through a reg-

istration operation. The new resource must provide a unique name, a service name, a port name, one or more WSDL addresses, and optionally a JNLP address for the resource's GUI. The service and port names are used to create an endpoint reference, which is assigned to the partner link when the resource is to be accessed. The resource may

have multiple WSDL addresses if there are different instances of the resource on differ-

ent VANI nodes. The control software will select the appropriate address depending on

which node the user is attempting to access. Lastly, a JNLP address may be included, which allows resource creators to design and deploy their own GUI using Java Web Start technology [82].

<xsd:element name="getRequestGenericContents">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="internalIP" type="xsd:string"></xsd:element>
      <xsd:element name="uuid" type="xsd:string"></xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

Figure 4.6: A sample schema for generic XML content in a getRequest response message
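A registry holding this registration data might look as follows. The storage layout, node names, and URLs are hypothetical, but the fields mirror the registration data described above (unique name, service name, port name, per-node WSDL addresses, and an optional JNLP address).

```python
class ResourceRegistry:
    """Sketch of a registry for dynamically registered resources."""
    def __init__(self):
        self.entries = {}

    def register(self, name, service, port, wsdl_by_node, jnlp=None):
        self.entries[name] = {
            "service": service,            # used to build the service QName
            "port": port,                  # used to build the port NCName
            "wsdl": dict(wsdl_by_node),    # one WSDL address per VANI node
            "jnlp": jnlp,                  # optional GUI descriptor address
        }

    def endpoint_for(self, name, node):
        """Select the WSDL address for the node the user is accessing."""
        entry = self.entries[name]
        return entry["wsdl"][node], entry["service"], entry["port"]

reg = ResourceRegistry()
reg.register("fpga", service="FPGAService", port="FPGAPort",
             wsdl_by_node={"node1": "https://node1.example/fpga?wsdl",
                           "node2": "https://node2.example/fpga?wsdl"})
print(reg.endpoint_for("fpga", "node2")[0])   # https://node2.example/fpga?wsdl
```

The selected address, service name, and port name are exactly the inputs the dynamic partner link generator (Section 4.3) needs to build an endpoint reference.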

In order for resource creators to dynamically add new resources to the control plane,

it is necessary to use a generic WSDL interface for all resources. The main objective

with the generic interface is to provide a template that makes creating resources easy

while providing flexibility. This is accomplished by providing a number of operations and messages that are common across many resources, such as get, release, and program. To maintain flexibility, each operation contains an optional XML string that can be used to customize the data passed in and out (Figure 4.6). Furthermore, a generic operation is included in the WSDL to support operations not already covered by the template.
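As an illustration, the optional XML content of Figure 4.6 can be produced and consumed as follows; this is a sketch that assumes only the element layout shown in the schema.

```python
import xml.etree.ElementTree as ET

def build_generic_contents(internal_ip, resource_uuid):
    """Serialize the optional XML string carried in a getRequest response,
    following the schema of Figure 4.6."""
    root = ET.Element("getRequestGenericContents")
    ET.SubElement(root, "internalIP").text = internal_ip
    ET.SubElement(root, "uuid").text = resource_uuid
    return ET.tostring(root, encoding="unicode")

def parse_generic_contents(xml_string):
    root = ET.fromstring(xml_string)
    return root.findtext("internalIP"), root.findtext("uuid")

payload = build_generic_contents("10.0.0.7", "a1b2c3")
print(parse_generic_contents(payload))   # ('10.0.0.7', 'a1b2c3')
```

Because the string is opaque to the generic WSDL template, each resource type can define its own schema for it without changing the control software.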

4.3 SOA-Based Implementation of VANI-CMP

The Control and Management software is implemented as a collection of web services and

BPEL models. The design is modular and flexible, allowing components to be replaced

or changed as required. The control plane is a BPEL model wrapped with a WSDL

exposing a number of operations for application providers and researchers. Currently,

there are five key components, each implemented as a BPEL model: authentication, data

store, resource manager, storage manager, and the dynamic partner link generator. In


this section, a brief description of each component is provided before focusing on how

the components fit together. For more information on each component, please refer to

the relevant section.

The data store stores all the data required by the control software. This may include

user authentication data, resource allocation and accounting, and network data. The

authentication component is responsible for checking user credentials as well as ensuring

users have the rights to execute operations. The storage and resource manager are used

to access the resources on the network. The managers determine the location (stored

as a WSDL address) of the resource in the network before forwarding requests to the

appropriate location. The dynamic partner link generator is used throughout the control

plane to dynamically choose an endpoint reference. This allows calls to be made to

different web-services determined at run time (provided they have the same interface).

The data store consists of a MySQL database, a BPEL model and three web services:

query generator, database, and result processor. The query generator has a number of

operations used to generate different SQL queries. The database has one operation which

takes in a SQL query and returns the result of the query in XML. This web service has

a socket connection to a database subagent which executes the query on the MySQL

database. The third web service processes the XML result.
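The three-stage pipeline above (query generator, database service, result processor) can be sketched as below. SQLite stands in for the MySQL database and its socket-connected subagent, and the table layout is hypothetical; only the division of labor among the three services follows the text.

```python
import sqlite3
import xml.etree.ElementTree as ET

def generate_query(user):                      # query generator service
    """One of several operations that produce SQL queries."""
    return "SELECT rid, plan FROM allocations WHERE user = ?", (user,)

def run_query(conn, sql, params):              # database service + subagent
    """Execute the query and return the result encoded as XML."""
    rows = conn.execute(sql, params).fetchall()
    root = ET.Element("result")
    for rid, plan in rows:
        row = ET.SubElement(root, "row")
        ET.SubElement(row, "rid").text = rid
        ET.SubElement(row, "plan").text = plan
    return ET.tostring(root, encoding="unicode")

def process_result(xml_string):                # result processor service
    """Turn the XML result back into Python values."""
    root = ET.fromstring(xml_string)
    return [(r.findtext("rid"), r.findtext("plan")) for r in root]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE allocations (user TEXT, rid TEXT, plan TEXT)")
conn.execute("INSERT INTO allocations VALUES ('alice', 'res-1', 'exp-1')")
sql, params = generate_query("alice")
print(process_result(run_query(conn, sql, params)))   # [('res-1', 'exp-1')]
```

In the actual system each stage is a separate web service orchestrated by a BPEL model rather than three local functions.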

The authentication component is implemented using a BPEL model and makes use

of some of the operations provided by the data store. It provides a number of operations

to check user login credentials and to ensure users have permission to execute the requested

operation. In addition, this component is responsible for ensuring users have permission

to book or manipulate (release, program etc.) booked resources.

The dynamic partner link generator is used to dynamically assign an endpoint refer-

ence to a partner link. First, a call is made to the data store to determine the WSDL address, service name, and port name; these are then passed to a web service that wraps the service name as a QName and the port as an NCName. An endpoint reference is then


created using the WSDL address, service name, and port name, and is returned.

The resource and storage manager provide an interface for accessing resources avail-

able on the network and storage. A call is made to the dynamic partner link generator to

dynamically assign an endpoint reference to the partner link depending on which resource

is being accessed.

4.4 Security in VANI

One of the basic requirements in the VANI design was to ensure that experiments are carried out in an environment that is secure and isolated from other applications and experiments. To

create this secure environment we have to consider security issues in various parts of the

system architecture.

The first part is to secure the communications between the researchers and VANI-

CMP. In VANI, all communications between these two entities are encrypted using secure SSL connections and the WS-Security specification. To do so, each researcher has to share his/her public key with VANI (and vice versa). On top of that, VANI-CMP authenticates the researchers and application providers using the credentials provided in all transactions, and then authorizes the researcher's access level to the resource.

The second part is the communications between the resources and VANI-CMP. These communications are also encrypted. Moreover, credentials known only to the resource and VANI-CMP are included in all communications from VANI-CMP to the resources.

All internal traffic within one experiment is separated from other experiments using tagged Ethernet VLANs. By properly configuring the testbed's internal fabric resource, we are able to isolate these tagged VLANs from each other. This case is discussed in more detail in the bandwidth guarantee section.

Communications inside the applications plane, internal to one experiment, or coming


to and from that experiment may or may not be encrypted, depending on the experiment; this choice is therefore outside the scope of the VANI design. This allows researchers to freely design and develop new encryption and decryption algorithms at different layers inside their application plane slices.

4.5 Guaranteeing Bandwidth in VANI

In order to make sure that one experiment cannot undermine another experiment's capability to send and receive traffic, we need a bandwidth guarantee mechanism in place. Likewise, for communications between different VANI nodes, there should be a rate guarantee in place so that a distributed experiment has guaranteed access to the available bandwidth.

Since all communication in VANI is carried over VLAN-tagged Ethernet frames, an Ethernet rate limiting mechanism has been developed in the processing nodes. By doing so, we limit the rate at which each virtual processing node sends and receives traffic to/from other virtual processing nodes inside a VANI node. To guarantee the send and receive rates, we designed and developed a novel Ethernet traffic shaping system, called the Distributed Ethernet Traffic Shaping system (DETS) [9], which we describe in the next chapter.
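The kind of per-VLAN rate limiting involved can be illustrated with a minimal token-bucket shaper. This is a deliberately simplified sketch, not the actual DETS implementation described in the next chapter; the rate and burst values are arbitrary examples.

```python
class TokenBucket:
    """Minimal token-bucket shaper: admits frames at `rate_bps` on average,
    allowing bursts up to `burst_bytes`."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0            # refill rate in bytes per second
        self.burst = burst_bytes
        self.tokens = burst_bytes
        self.last = 0.0

    def allow(self, frame_len, now):
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= frame_len:
            self.tokens -= frame_len          # admit the frame
            return True
        return False                          # drop (or queue) the frame

# One bucket per (virtual node, VLAN): e.g. 100 Mbit/s with a 150 kB burst.
bucket = TokenBucket(rate_bps=100_000_000, burst_bytes=150_000)

sent = 0
t = 0.0
while t < 1.0:                 # one simulated second of back-to-back frames
    if bucket.allow(1500, t):  # 1500-byte frames offered at 15 MB/s
        sent += 1500
    t += 0.0001                # one frame offered every 100 microseconds
print(sent)                    # close to 12.5 MB (the 100 Mbit/s limit) plus the burst
```

Offered load here is 15 MBps, but the shaper holds the admitted traffic near the configured 12.5 MBps, which is the per-VLAN guarantee behavior the text describes.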

Also, the gateway and bridge services control the rate at which an experiment sends (receives) traffic to (from) the VANI wide area network. The wide area network that connects the VANI nodes would be a research-dedicated network such as CANARIE [77] or ORION [83] that can guarantee the aggregated traffic to/from the VANI nodes. If the wide area network is able to provide dynamic and on-demand bandwidth allocation, VANI can use this functionality whenever an experiment asks to send/receive traffic to/from the wide area network. VANI nodes could also be connected to the public Internet; however, bandwidth could not be guaranteed for


the experiments in this case.

To request a bandwidth guarantee in VANI, a researcher can specify the bandwidth requirements of a virtual processing node in the resource get request. Likewise, a bandwidth requirement can be specified when access to the VANI wide area network is requested. The virtualization layer in the VANI control and management plane makes sure that the specified requirements are met when allocating virtual resources to the experiment.
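For illustration, a resource get request carrying bandwidth requirements might look like the following. All field names here are hypothetical; VANI's real requests follow its generic resource WSDL template, not this dictionary layout.

```python
# Hypothetical resource "get" request with bandwidth requirements.
# Field names are illustrative only.
request = {
    "resource_type": "virtual_processing_node",
    "count": 2,
    "requirements": {
        "cpu_cores": 2,
        "memory_mb": 2048,
        "bandwidth_mbps": 250,      # guaranteed intra-node send/receive rate
        "wan_bandwidth_mbps": 100,  # guaranteed rate to the wide area network
    },
    "experiment_id": "exp-101",
}

def admissible(req, free_bandwidth_mbps):
    """The virtualization layer grants the request only if the guaranteed
    rates can still be met on the shared links."""
    needed = req["requirements"]["bandwidth_mbps"] * req["count"]
    return needed <= free_bandwidth_mbps

print(admissible(request, free_bandwidth_mbps=600))  # 500 <= 600, so True
```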

If an experiment needs more VLANs, it can simply ask for a new VLAN to be added to the experiment. Also, if separate experiments or applications need to intercommunicate over VLANs, one of them can ask the control plane to create a shared VLAN and then add the other experiments to it.

Another way for experiments to communicate is through the gateway, using the public addresses allocated by the bridge and gateway services.

4.5.1 Interconnecting VANI Nodes in IP Layer

Figure 4.7: Connecting VANI nodes in IP layer (GW = gateway, VR = virtual resource; each VANI node hosts VRs on VLANs #10, #20, and #30 with local 10.X.X.X/20 addresses, and its GW holds a public IP address on the IP network)


Figure 4.7 shows how we can set up an experiment or create a distributed application across a wide area IP network. In this setting, all resources inside an experiment in a VANI node get a local IP address in the 10.X.X.X range. All resources can send traffic to the wide area network using the NAT functionality implemented in the gateway service (shown as GW in figure 4.7). It is possible to put multiple gateways in place and direct outgoing traffic to different gateways to avoid bottlenecks in the system.

On the other hand, if a resource needs to be accessible from the wide area network, the researcher can ask the gateway service for a public address/name, and the gateway service redirects all traffic destined to that public address to the resource's internal IP address/VLAN.
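A toy model of the gateway's two directions (outbound NAT and inbound public-address redirection) is sketched below. The addresses, port range, and function names are illustrative assumptions, not VANI's actual gateway interface.

```python
from itertools import count

# Toy model of the gateway service: outbound NAT plus inbound
# redirection of a public address to an internal IP/VLAN.
PUBLIC_IP = "203.0.113.10"   # illustrative public address
_ports = count(40000)        # illustrative NAT port range

# Public address -> (internal IP, VLAN), registered by a researcher.
inbound_map = {
    "203.0.113.10": ("10.0.1.5", 101),
}

def nat_outbound(internal_ip, port, nat_table):
    """Map an internal (ip, port) to a public (ip, port), reusing an
    existing mapping for repeat flows, as the gateway's NAT does."""
    key = (internal_ip, port)
    if key not in nat_table:
        nat_table[key] = (PUBLIC_IP, next(_ports))
    return nat_table[key]

def redirect_inbound(public_ip):
    """Deliver traffic for a public address to the internal IP/VLAN."""
    return inbound_map[public_ip]

table = {}
print(nat_outbound("10.0.1.5", 5000, table))  # ('203.0.113.10', 40000)
print(redirect_inbound("203.0.113.10"))       # ('10.0.1.5', 101)
```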

4.5.2 Interconnecting VANI Nodes in Ethernet Layer

Figure 4.8: Connecting VANI nodes in Ethernet layer (BR = bridge; each VANI node's internal VLANs #10, #20, and #30 reach peer nodes' MACs over a wide area Ethernet network using Q-in-Q tag #100)

Figure 4.8 shows an Ethernet-connected VANI. Ethernet-connected VANIs use the bridge service instead of the gateway service to interconnect. Inside a VANI node, all


resources in an experiment communicate using a specific VLAN that is unique within the VANI node. If an experiment needs to operate across multiple VANI nodes (for instance, to test a new layer three protocol), the VANI wide area network has to be able to transfer Ethernet frames. In this case, a unique Q-in-Q tag [84] would be assigned to the experiment. The bridge service re-frames the internal tagged Ethernet frames into wide area Q-in-Q frames, and the destination bridge performs the reverse operation, delivering the Ethernet frames to the destination MAC/VLAN in the destination VANI node.

Since Q-in-Q tagged Ethernet frames might not be supported in a wide area network, we are able to define public MACs that the bridge service can use to redirect traffic to an internal MAC/VLAN. This functionality enables any other Ethernet-based experiment to send Ethernet frames to a resource in another experiment through the bridge service.
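The re-framing step can be illustrated at the byte level: the source bridge pushes an outer 802.1ad tag onto the experiment's internally tagged frame, and the destination bridge pops it. The Q-in-Q TPID 0x88A8 is the standard 802.1ad value; everything else in this sketch (the zeroed MACs, the payload, ignoring PCP/DEI bits) is simplified for illustration.

```python
import struct

TPID_QINQ = 0x88A8  # IEEE 802.1ad outer-tag TPID

def push_qinq(frame: bytes, outer_vlan: int) -> bytes:
    """Insert an outer 802.1ad tag after the 12 bytes of dst/src MAC."""
    tag = struct.pack("!HH", TPID_QINQ, outer_vlan & 0x0FFF)
    return frame[:12] + tag + frame[12:]

def pop_qinq(frame: bytes) -> bytes:
    """Remove the outer tag, restoring the original inner-tagged frame."""
    assert struct.unpack("!H", frame[12:14])[0] == TPID_QINQ
    return frame[:12] + frame[16:]

# An internally tagged frame: zeroed MACs, an 802.1Q tag (TPID 0x8100,
# VLAN 10), and a dummy payload.
inner = bytes(12) + struct.pack("!HH", 0x8100, 10) + b"payload"

wan = push_qinq(inner, outer_vlan=100)  # the experiment's Q-in-Q tag #100
assert pop_qinq(wan) == inner           # destination bridge restores the frame
```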

4.5.3 Experimentation with L3 Protocols

Figure 4.9 shows how the testbed could be used to test a new layer three protocol in a large-scale and distributed environment using proxy nodes. In this setting, the new L3 protocol is tunneled within an IP payload to a resource inside a VANI node; that resource then strips off the IP header and feeds the new L3 packet onto the VANI wide area Ethernet network.

4.6 SW-Based Resources in VANI

One of the main contributions of our testbed control and management plane is that we can encapsulate any software or hardware resource in our testbed as a service. To do so, the resource is virtualized and abstracted as a service component that follows a generic resource WSDL template. Then it can be registered in the control plane and


Figure 4.9: Large scale experimentation with new L3 protocols (example: a "Red" network protocol stack deployed in slices of VANI nodes, accessed through IP tunnels over the testbed network, and tested at scale)

made available to other researchers. Details on how this task can be accomplished have

been discussed in the control and management plane section in this chapter.

Examples of such resources-as-a-service include any hardware function or resource that could be reused in different applications and experiments, such as hardware accelerators for encryption, decryption, content conversion, and content compression/decompression. Other reconfigurable hardware modules, such as NetFPGA, could also be virtualized and offered to researchers on an on-demand basis.

Other types of processing nodes could also be offered to researchers as resources. For example, Amazon Elastic Compute Cloud (EC2) nodes [27], GENI virtual processing nodes, VMware-based virtualized processing nodes [85], or Graphics Processing Units (GPUs) could be controlled and managed by VANI-CMP.

Moreover, software services such as a BPEL orchestration engine and a Complex Event Processing (CEP) engine could be developed and/or deployed on top of the current virtual resources and made available to researchers through VANI-CMP.

Currently, we have developed and deployed several software-based resources as service components in VANI. In this section, we briefly go over these resources and describe the functionality each one provides:

1. BPEL orchestrator as a service is able to execute a BPEL project and to orchestrate


a composite application.

2. Complex Event Processing as a service is customizable to receive events from different sources using different protocols (JMS, SNMP, etc.). This service is able to analyze the received events, produce notifications and events, and send them to different destinations using different protocols. We have used this service for performance monitoring and analysis of VANI.

3. Database as a service is able to store, search, and retrieve data on demand. Researchers can obtain this resource, program it using a database file, and query it by sending SQL commands over the WS interface to the database resource. The DB resource uses the MySQL engine and stores its data on VANI's storage service.

4. Sensor as a service is able to manage data from different sensors and forward it, on demand, to wherever a researcher requests. For example, a researcher can ask for sensor data on wind or sun conditions at a specific location for a limited time. This allows the creation of many new applications and experiments using the sensor service.

5. The GENI federation service enables researchers connected to VANI to access PlanetLab GENI resources through VANI-CMP. We discuss this service in more detail in the next section, where we describe the interconnection between VANI and GENI.

4.7 Federation with GENI

GENI is an initiative to create a large-scale experimental environment through federation between different testbeds. Federation in GENI is done using GENI wrappers: a GENI wrapper is developed for each testbed, and testbeds connect to each other through them. In VANI, we developed a wrapper for the control and management plane, and through it we invoke GENI wrapper operations to get a node on any GENI testbed. We tested our


Figure 4.10: Connecting VANI to GENI (a researcher reaches physical nodes through VANI-CMP and its virtualization layer; the VANI wrapper's GeniWrapper client talks to the GeniWrapper server fronting the GENI nodes across the VANI/GENI interface)

wrapper with the PlanetLab GENI wrapper and managed to obtain a PlanetLab processing node through our VANI-CMP. In VANI, researchers are able to get PlanetLab processing resources using the VANI generic resource template. Since PlanetLab does not support a storage service, and also does not support other VANI requirements such as processing and bandwidth requirements, access to PlanetLab processing resources does not include these functionalities. Figure 4.10 shows the structure of the interconnection between VANI and PlanetLab through the GENI wrappers. Currently, we are in the development phase of offering VANI resources to GENI researchers through the VANI wrapper.

4.8 A VANI Node

A VANI node is composed of the resources described in this chapter, their corresponding

virtualization software, control and management software, and the storage service. A


VANI node can be deployed entirely on a computer cluster composed of ordinary computing blades and manageable Ethernet networking elements. The basic resources in a VANI node are the processing resource, the storage service, and the fabric service for network virtualization, which are deployed on a computer cluster.

All other resources and the control and management software are deployed on these basic services. In particular, all other software-based resources, the virtualization layer for resources such as the reconfigurable hardware resource, and the VANI wrapper for connecting to GENI testbeds are deployed on these basic resources.

The only elements that cannot be found in a normal computer cluster are the reconfigurable hardware resources, the gateway and bridge services, and the required 10GE Ethernet switches. These resources are co-located with the computing cluster to provide WAN connectivity and to enable experimentation with the reconfigurable hardware resource.

In the future, we will publish instruction manuals on how to connect to the VANI control and management plane and how to access resources through the developed GUI as well as the secure WS interfaces. We will also describe how application providers can access all the features described in this chapter, including registering a new service in VANI.

4.9 Performance Evaluations

Up to now, we have presented the VANI architecture and discussed different aspects of its design. To find out whether the currently developed resources can meet the VANI design requirements, we performed several experiments on those resources. In this section, we present performance measurements on two key physical resources that have been virtualized and offered to researchers in VANI: the reprogrammable hardware resource and the processing resource. Our main focus in this part is to see


Figure 4.11: Reprogrammable hardware (BEE2 board): one control FPGA and four user FPGAs, each with a 10 Gbps Ethernet port and a DDR2 DIMM slot, interconnected by 20 Gbps and 40 Gbps channels

whether we could guarantee the promised quality of service to the researchers that use these resources in their experiments.

4.9.1 Reprogrammable Hardware Resource

By introducing a virtualized and reprogrammable hardware resource in VANI, we enable researchers to test new networking algorithms and protocols using high-performance and high-throughput hardware resources. To do so, we virtualized the BEE2 boards developed at the University of California at Berkeley. A BEE2 board consists of one controlling FPGA and four high-capacity Xilinx Virtex-II FPGAs (figure 4.11) that can be programmed by users. Each FPGA has four 10GE interfaces and 4 GB of memory.

In VANI, a researcher can get a set of FPGAs on a BEE2 board and can ask for on-board inter-chip communication channels that can carry up to 5 gigabytes per second (GBps). The detailed design of the BEE2 virtualization system and its introduction as a resource in VANI can be found in [8]. Here, we present the performance measurements


on this resource. The parameters of interest are the programming time of the FPGAs through the virtualization software, as well as the speed with which the FPGAs can send and receive data.

The first parameter is the time in which a researcher can program an FPGA through the testbed control plane. We would also like to know how this time changes if four researchers program all four FPGAs concurrently. To do so, we developed a bitstream that initializes all 10GE interfaces on the FPGAs and starts sending a burst of UDP/IP packets on one of its 10GE interfaces, and we programmed the FPGAs through VANI-CMP using the generated bitstream several times. Table 4.1 shows the average maximum programming time for programming one, two, three, and four FPGAs. As can be seen, it takes only 30 seconds on average to program an FPGA in the case where all four FPGAs are programmed concurrently, and around 11 seconds if only one FPGA is programmed at a time.

This fast programming time allows a researcher to get an FPGA with four 10GE interfaces in less than a minute, run an experiment, and return the FPGA to the VANI resource pool as soon as it is no longer required.

In the next experiment, we measured the speed at which the FPGAs can send and receive traffic. To do so, we developed a traffic generator in the Verilog hardware description language, started sending traffic from one 10GE interface to another 10GE interface on the same FPGA, and recorded the maximum bandwidth that we could receive in the hardware resource. We also compared this with the traffic statistics gathered by the Ethernet switch connected to the FPGA. We repeated this experiment several times and were able to send and receive Ethernet frames at a rate of 1 GBps, which is equal to 8 Gbps. The reason that we could not send more traffic is the 8b/10b encoding mechanism of the 10GE-CX4 interfaces, which makes 8 Gbps the maximum achievable traffic rate per port on a BEE2 board. In our measurements, this rate did not change when all ports sent and received traffic at the same time, since separate


FPGAs programmed concurrently     1    2    3    4
Programming time (s)             11   17   24   30

Table 4.1: Average maximum FPGA programming time

internal modules control each port. This experiment shows that one FPGA alone can send and receive 32 Gbps of traffic. If a researcher gets all four FPGAs on a BEE2 board, it is possible to send/receive traffic at a rate of 4x32 = 128 Gbps.
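The per-port and aggregate figures follow directly from the 8b/10b encoding overhead; as a quick arithmetic check:

```python
# Per-port rate: 10GE-CX4 line rate minus 8b/10b encoding overhead.
line_rate_gbps = 10
payload_rate_gbps = line_rate_gbps * 8 / 10          # 8.0 Gbps per port (= 1 GBps)

ports_per_fpga = 4
per_fpga_gbps = payload_rate_gbps * ports_per_fpga   # 32 Gbps per FPGA

fpgas_per_board = 4
per_board_gbps = per_fpga_gbps * fpgas_per_board     # 128 Gbps per BEE2 board

print(payload_rate_gbps, per_fpga_gbps, per_board_gbps)  # 8.0 32.0 128.0
```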

We have used this reprogrammable resource to develop the high-capacity gateway and bridge services for VANI, and we have developed a bandwidth control mechanism on this resource that controls and guarantees the rate at which one experiment can send and receive traffic to/from a wide area network. In the future, we will present our design for the gateway and bridge service, along with our performance measurements for this service.

node01 from/to         UDP         UDP (rl)    TCP          TCP (rl)
node02 (12.50 MBps)    24.5/24.3   12.4/12.4   15∼35/24.7   12.3/12.3
node03 (18.75 MBps)    24.5/24.3   18.8/18.8   15∼35/24.3   18.4/18.4
node04 (25.00 MBps)    24.5/24.3   25.3/25.3   15∼35/24.1   24.8/24.6
node05 (31.25 MBps)    24.5/24.3   31.7/31.6   15∼35/22.1   31.3/31.1
node06 (31.25 MBps)    24.5/24.3   31.7/31.6   15∼35/23.2   31.3/31.1

Table 4.2: UDP and TCP traffic measurements in a VANI node, in MBytes per second (MBps)

4.9.2 Processing Service and Network Virtualization

Another main physical resource that we have virtualized is the processing service, which uses the Linux vServer software. There have been studies on processing virtualization techniques [86], and specifically on Linux vServer [76]. Linux vServer performance evaluations show that this virtualization module has a very low overhead on overall system


performance.

Figure 4.12: Traffic measurement experiment topology (virtual processing nodes VN_1_1 to VN_1_5 on node01, each paired over the VANI internal fabric with a virtual node on one of node02 to node06; experiments Exp#1 to Exp#5 use VLANs #101 to #105, and all processing servers attach via 1GE links)

However, since we are doing network virtualization in addition to processing node virtualization, we conducted two more experiments that we believe were necessary to show that virtual processing nodes can have guaranteed access to the VANI network.

In our experiment, we virtualized cluster blades with dual Xeon 1530 CPUs, 2 GB of RAM, and one 1GE interface. The Linux kernel version we used was 2.6.16, and we used the vServer 2.3.2 patch. The developed virtualization layer allows up to ten virtual nodes on a physical node. For this experiment, we initialized and launched five virtual nodes on a node named node01. We also launched five other virtual processing nodes on five separate servers with the same capabilities as node01; these nodes are named node02 to node06. Each of the virtual nodes on node01 belongs to an experiment that includes one other virtual node running on one of the other nodes. The topology and VLAN tags for the experiments are shown in figure 4.12.

In this experiment, we measured the UDP and TCP traffic rate that each virtual


node in an experiment could send and receive in different cases. The first case finds the maximum achievable rate when no limit is placed on the traffic rate and only one experiment is active. This rate is 122 MB per second (MBps) for both UDP and TCP traffic, which is equal to 976 Mbit per second (Mbps). Table 4.2 shows the achievable rates in different cases when all experiments are active and send as fast as they can. Since all experiments running on node01 try to send and receive on one 1 Gbps Ethernet link concurrently, they get different shares of the available bandwidth in different cases.

In table 4.2, we show the maximum traffic rate in MBps between a virtual node on node01 and its corresponding virtual node on node02 to node06. The UDP column shows the maximum rate when all virtual nodes in all experiments try to send and receive UDP traffic concurrently, without any rate limiting mechanism in place. The TCP column shows the TCP rate in this case. As can be seen, because of the massive packet loss in this case, TCP cannot achieve a stable rate, and its rate fluctuates between 15 and 35 MBps. These measurements demonstrate the need for a rate limiting mechanism when different experiments run on a shared virtualized infrastructure.

The columns with (rl) show measurements when we limit the send and receive rates of the experiments to 12.5, 18.75, 25, 31.25, and 31.25 MBps respectively, totaling 118.75 MBps (950 Mbps). As can be seen, using the rate limit functionality we can achieve the bandwidth guarantee requirements (with at most 1% deviation from the target rate) in a VANI node. Another case that we studied is the one where all virtual nodes in one experiment start sending traffic to one virtual node concurrently. This results in congestion on the shared link serving the destination virtual node. To solve this problem, we have developed a novel traffic control mechanism that we present in the next chapter of this thesis.
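The rate limits in Table 4.2 were chosen to fit within the measured capacity of the shared 1GE link; as a quick check:

```python
# Per-experiment send/receive limits from Table 4.2, in MBytes per second.
limits_mbps = [12.5, 18.75, 25.0, 31.25, 31.25]

total = sum(limits_mbps)          # 118.75 MBps, i.e. 950 Mbps
link_capacity = 122.0             # measured maximum on the 1GE link (MBps)

print(total, total * 8)           # 118.75 950.0
assert total <= link_capacity     # the limits leave headroom on the shared link
```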


4.10 Experiments & Applications

The testbed can be used to run large-scale experiments on networked systems, applications, and network architectures from layer three up. In particular, by providing processing and storage services in all testbed nodes, it is designed to enable experimentation with applications that need responsiveness and quality of service guarantees. Example applications that could use these functionalities are video streaming applications and smart power grid networked applications.

Due to the ability to change an experiment's configuration on demand and on the fly, together with the everything-as-a-service foundation of the testbed's network architecture, architectures such as green network architectures can be tested on this testbed. In a green network architecture, the network topology and configuration change in response to changes in the status of renewable energy generation and consumption.

Based on the same functionalities, we are in the process of building a green orchestrator engine that uses many aspects of the testbed, including the on-demand configuration, short-lived resource leases, and the testbed's status and performance monitoring tools. The outcome of this application will be published soon.

Also, due to the availability of storage and processing resources in the testbed nodes, the testbed can be used to experiment with various content delivery networks, such as hybrid peer-to-peer networks. In hybrid p2p networks, peers and in-network resources can be organized and structured so that content is delivered to users with lower search and delivery times. Implementing content-based routers and distributed publish/subscribe systems would also be possible in our testbed, and these services could be offered to researchers as stand-alone, reusable service components to facilitate experiment setup and application creation.


Chapter 5

A Distributed Ethernet Traffic

Shaping System

The architecture of local area networks is facing new challenges with the emergence of cloud computing [87] and the deployment of massive data centers [26]. This new computing paradigm allows users to access a virtual network of resources in the cloud that can be called upon to deploy applications on demand. At the same time, the networking research community has moved toward creating similar platforms for experimenting with new networking concepts and architectures [7]. As in cloud computing, these networking testbeds offer a virtual network of resources to researchers so that they can evaluate their networked systems at large scale.

The creation of these research testbeds and cloud computing platforms has become possible mainly due to the advancement of virtualization techniques, which have made separating virtual computing resources from the underlying physical resources much easier and have allowed multiple virtual machines to operate on one physical resource.

Inherent in such shared resource environments is the potential for disruptive interac-

tion among users and hence the need for new techniques to provide network and resource

isolation. The Virtualized Application Networking Infrastructure (VANI) [7, 8], presented


in the previous chapter of this thesis, is an example of a networking research testbed that

allocates a virtual network of resources to researchers. An important requirement in

VANI is to guarantee network access rates and isolation between different experiments.

In this chapter, we present the Distributed Ethernet Traffic Shaping (DETS) system and its corresponding algorithms, designed to provide guaranteed network access rates in VANI. The DETS system is applicable not only to VANI but also to computing clusters and data centers that virtualize and share their resources among different virtual networks. DETS deployment in a cluster or a data center does not require any changes to system hardware, and can be deployed on top of ordinary computing blades and Ethernet switches.

Figure 5.1: A system with five physical nodes (PN1 to PN5) and two virtual nodes on each (VN11 to VN15 on VLAN #1, VN21 to VN25 on VLAN #2), all attached to an Ethernet switch

The primary role of DETS is to control and regulate the traffic sent and received on VLANs. In particular, this is required where more than one virtual machine runs on a physical node and each has to send and receive a guaranteed rate of traffic on a dedicated VLAN over a shared Ethernet access link. Figure 5.1 shows a sample scenario for DETS. In this sample system, we have five physical nodes (PNs), each having two


Figure 5.2: TCP rate back-off due to interfering UDP traffic (received TCP rate on VLAN 1, in Mbps, versus time)

running virtual nodes (VNs). All these PNs are connected to an Ethernet network, and the VNs running on them require a guaranteed access rate to the Ethernet network. For the sake of simplicity, we show an Ethernet network with just one Ethernet switch, but in general it is possible to have many switches in a network. In this topology, the VNs running on a node work separately and can only communicate with their peer VNs on other physical nodes.

If VN11, VN12, VN13, and VN14 start sending traffic to VN15, they can consume all the available bandwidth on the Ethernet link that connects PN5 to the Ethernet switch. This causes problems for traffic sent from nodes VN21, VN22, VN23, and VN24 to node VN25, which shares the Ethernet link with VN15. Therefore, there is a need for traffic shaping or rate control to limit the rate at which PN5 receives traffic for VN15, so that VN25 can also receive traffic at a guaranteed rate.

This problem would become very evident and observable if the interfering traffic

(traffic for V N15) is UDP and the underdog traffic (traffic for VN25) is TCP. The high


volume of UDP packets on the link to PN5 would virtually disable TCP traffic to VN25, as the experimental results in Figure 5.2 show. In the figure, VN25 receives the maximum possible TCP rate as long as no traffic is sent to node VN15. However, as soon as UDP traffic is sent to node VN15 (around time 300 in Figure 5.2), the TCP rate drops to almost zero until the UDP traffic stops (at around time 1000). This experiment shows not only the sensitivity of a TCP flow's rate to a competing UDP flow, but also the importance of having a traffic shaping and rate control system to guarantee an agreed access rate for the different virtual nodes on a physical node that share one Ethernet link. Although there have been proposals for TCP-friendly transport protocols [88, 89], in many systems and environments, such as VANI, it is not desirable to impose a specific flavor of transport protocol on the virtual machines. The problem of network performance degradation in virtualized environments has also been studied in [29]; the authors, through measurements on Amazon Elastic Computing services, concluded that virtualization techniques can cause significant throughput instability.

Current Ethernet flow control uses PAUSE signals [90]. When multiple ports flood a port, the Ethernet switch sends PAUSE signals back to the flooding ports so that they stop sending for an amount of time specified in the PAUSE message. It is generally accepted that the PAUSE mechanism in Ethernet flow control is not suitable for solving the new challenges facing these networks [26]. To address Ethernet congestion problems, two new IEEE task forces (802.1Qau [91] and 802.1Qbb [91]) have been created. The main approach in these task forces is to perform flow control at the level of class of service by marking frames at Ethernet switches. In contrast to these approaches, our proposed system operates at the edge of the Ethernet network, on the computing hosts in a cluster or a data center.

We direct interested readers to [26, 30, 92, 93, 94] for a survey of recent work on Ethernet congestion control for data centers. The currently proposed methods for congestion management entail modifying Ethernet network elements. Moreover, the


majority of the proposed systems are Congestion Notification based systems with no explicit rate information [30, 92], which have been shown to have drawbacks, such as slow recovery in comparison to explicit rate systems [92].

The salient explicit rate congestion management system, Forward Explicit Congestion Notification (FECN) [92, 94], passes an explicit rate from the congestion point to the source, based on the utilization ratio of the congested link. Our system is also an explicit rate system, but it differs from FECN in several aspects.

The DETS system is more than just an Ethernet congestion management system. In particular, DETS allows setting guaranteed limits on the send and receive rates of each virtual network, and shapes the traffic so that virtual networks do not interfere with each other's ability to send and receive traffic. Unlike FECN, DETS does not require any change to current Ethernet equipment, and can be deployed in current computing clusters and data centers. Moreover, our system is capable of supporting both fair and weighted fair bandwidth allocation mechanisms. In addition, in allocating rates to the sending nodes, the system considers the available sending capacity of the sending nodes, which results in higher throughput. Nevertheless, we emphasize that DETS is designed to address congestion at the egress ports of Ethernet networks. Consequently, it does not directly address congestion inside the network.

The DETS operation is transparent to the virtual machines running on the host system; virtual machines only see the decrease and increase in send and receive traffic rates on certain flows. In other words, the applications need not report their bandwidth requirements, since the measurements are done in DETS. However, since our system runs on the host system, its rate-setting and measurement periods are limited by the system's timer resolution (about 55 ms).

The organization of this chapter is as follows: Section 5.1 describes our proposed system, identifies key control and measurement points, and presents the DETS protocol. Section 5.2 presents the DETS system design and its main internal modules. In this section, we also


Figure 5.3: DETS measurement and rate control points (virtual nodes VN11–VN15 and VN21–VN25 on physical nodes PN1–PN5 around an Ethernet switch; rate measurement reports flow from the measure and rate control points at the sending nodes to the measure point and rate allocator at the receiving node, and rate control commands flow back).

propose four different algorithms developed for DETS. The DETS system performance measurements are presented in Section 5.3, and in Section 5.4 we describe the modifications to the Ethernet control plane needed to port the DETS system to Ethernet network elements. Finally, we present concluding remarks and our future work.

5.1 Distributed Ethernet Traffic Shaping (DETS) system

The DETS system is designed to control the rate of the traffic generated by each virtual

machine according to the total traffic rate at the destination virtual node. DETS controls

the sending rate of the traffic in the originating VN before it enters the Ethernet network

based on a target rate imposed by the receiving virtual node.

In the VANI system, a virtual LAN is created for the virtual nodes that are in one


group, and an "over-the-top" rate controller software module runs on each of the physical nodes. This software is able to control the rate at which each virtual machine sends traffic to any other virtual machine in that virtual network. The module is also able to measure the traffic received by each virtual node, and to detect whether the receive rate limit is violated. If the receive rate limit is violated, the receiving node is declared the congested node. The controller then monitors the traffic sent to the congested node and controls its rate at the sending node. This system is depicted in Figure 5.3, which shows the control and measurement points.

Each agent in DETS has two separate modules: a send rate controller and a receive rate allocator. The send rate controller monitors the sending traffic rate to any virtual machine that is facing congestion, and reports it to the rate allocator in the congested node (node PN5 in the example scenario, called the receiving node in the remainder of this document). The rate allocator at the receiving node (PN5) allocates a rate to each sending node and sends set-rate commands to the corresponding send rate controller modules in the sending nodes. The send rate controllers apply the received set-rate commands (at the rate control points shown in Figure 5.3), and subsequently the traffic sent to the congested node (PN5) is shaped accordingly.

The DETS system can be implemented in any cluster with any operating system that

is able to control the egress Ethernet traffic rate. In the next section, we focus on a

cluster of Linux-based computing nodes, and we describe the system design and protocol

for deploying DETS in such a cluster.

5.1.1 DETS Protocol

The DETS protocol has five types of messages:

1. Traffic Report message, sent from a sending to a receiving node; includes the measured rate, current rate limit, and available rate.


2. Initialize Traffic Control message, sent from a receiving to a sending node to initialize the traffic controller for that receiving node.

3. Set Rate message, sent from a receiving to a sending node; includes the allocated rate that the sending node has been granted.

4. Keep Alive message, sent from a receiving to a sending node while traffic control on the receiving node is active.

5. Deactivate Traffic Control message, sent from a receiving to a sending node to deactivate traffic control for that receiving node.
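To make the message vocabulary concrete, the five message types might be modeled as follows. The field layout and names are illustrative assumptions; the thesis does not specify a wire format at this point.

```python
from dataclasses import dataclass
from enum import Enum

class MsgType(Enum):
    TRAFFIC_REPORT = 1        # sender -> receiver: measured/limit/available rates
    INIT_TRAFFIC_CONTROL = 2  # receiver -> sender: start controlling traffic
    SET_RATE = 3              # receiver -> sender: granted rate
    KEEP_ALIVE = 4            # receiver -> sender: traffic control still active
    DEACTIVATE = 5            # receiver -> sender: stop controlling traffic

@dataclass
class DetsMessage:
    msg_type: MsgType
    sender: str                   # physical node id, e.g. "PN1" (illustrative)
    receiver: str                 # congested node id, e.g. "PN5"
    measured_rate: float = 0.0    # Mbps, used by TRAFFIC_REPORT
    rate_limit: float = 0.0       # Mbps, current limit (TRAFFIC_REPORT)
    available_rate: float = 0.0   # Mbps, spare send capacity (TRAFFIC_REPORT)
    granted_rate: float = 0.0     # Mbps, used by SET_RATE

# A sending node reporting its traffic toward the congested node
report = DetsMessage(MsgType.TRAFFIC_REPORT, "PN1", "PN5",
                     measured_rate=80.0, rate_limit=100.0, available_rate=20.0)
```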

5.1.2 DETS for Linux OS

In Linux, traffic shaping can be performed on both egress and ingress traffic. The main command for traffic shaping is the 'tc' command [95]. This command can operate on a virtual interface (serving a VLAN), and can also be used for measuring the send and receive rates. The shaping in our system is done on the Linux hosts, and it is transparent to the virtual machines running on them.
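For illustration, host-side shaping of this kind could be expressed as 'tc' invocations built per VLAN sub-interface. The interface name, the HTB qdisc choice, and the handle/classid values below are illustrative assumptions, not the thesis's exact configuration.

```python
def tc_shaping_commands(vlan_if, rate_mbps):
    """Build (but do not run) 'tc' commands that would cap egress traffic
    on a VLAN sub-interface using an HTB qdisc."""
    return [
        # Root HTB qdisc on the VLAN sub-interface
        f"tc qdisc add dev {vlan_if} root handle 1: htb default 10",
        # Single class capped at the allocated rate
        f"tc class add dev {vlan_if} parent 1: classid 1:10 "
        f"htb rate {rate_mbps}mbit ceil {rate_mbps}mbit",
    ]

cmds = tc_shaping_commands("eth0.2", 400)
```

On a real host these strings would be executed with root privileges; deleting the root qdisc undoes the shaping.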

5.2 DETS System Design

Figure 5.4 shows the design of DETS. In the send rate control module, there is one state machine for each receiving node. There are also two internal sub-modules in the receive rate allocator module: the first is responsible for communicating with the sending nodes, and the second allocates the rates to the sending nodes.

5.2.1 Rate Allocator Module

The core part of the DETS system is the rate allocator module that allocates the sending

rate to each sending node. The rate allocator module utilizes a Rate Allocation Algorithm


Figure 5.4: DETS system internal modules (a send rate control subsystem with send rate measurement and control, and a receive rate allocator comprising a rate allocator and a sending node communication sub-module, all built on Linux traffic measurement and shaping).

(RAA) to determine the rate at which each sending node can send traffic to the receiving

node.

In the RAA design, we need to consider that the measurements in the send rate control modules are capped by the rate set by the RAA. To better explain this limitation and its implication for algorithm design, we use an example scenario. Assume that in Figure 5.3, the system is in a steady state with four virtual nodes (VN11 to VN14) sending traffic to VN15 at rates of (80, 80, 20, 20) Mbps, respectively. At this point, if VN11 stops sending traffic to VN15, the rate allocation algorithm may reallocate the vacant rate to other nodes. However, since there are no measurements of sending rates above the rate limits, the RAA needs a mechanism to probe VN12 to VN14 to see whether these sending nodes need to send more traffic. Without a probing mechanism, the RAA could allocate rate to a node that does not need the extra allocation, and the available bandwidth would be wasted.

The probing mechanism also allows us to provide fairness in rate allocation to virtual nodes. Assume that in the above example, all nodes have similar importance and equal amounts of traffic to send to VN15; the above allocation is then unfair, since two of the virtual nodes have been allocated rates (80 Mbps each) that are much more


input : Active nodes list and their send capacities
output: Granted rate for each node

1) Calculate the fair rate:
   fairRate ← totalRate / activeNodes
2) Assign fairRate to all active nodes considering their send capacities:
   while there is unallocated rate and nodes with sending capacity do
       grantRate[i] ← min(fairRate, maxRate[i])
       if fairRate > maxRate[i] then
           fairly distribute the extra rate among the other nodes
       end
   end

Algorithm 1: RAA-FairShare

than the rates allocated to the other two nodes. If VN13 and VN14 had more traffic to send, this rate allocation would be unfair. In this case, the probing mechanism in the RAA starts probing the nodes with lower allocated rates to see whether they have more traffic to send and need a higher allocated rate.

The probing in the RAA is done through gradual increases and decreases in the rate allocations to different nodes, while monitoring the corresponding increases and decreases in the rate measurements. The probing mechanism may reduce bandwidth utilization, but this may be acceptable in order to overcome the above-mentioned problem.

Another important factor in RAA design is to consider the available traffic sending capacity of the sending nodes during rate allocation. Assume that in Figure 5.3, VN14 is sending 20 Mbps to VN15 and 80 Mbps to VN12, and its total send limit is 100 Mbps. VN14 therefore cannot send any more traffic to VN15, and the rate allocation algorithm in PN5 should take the available sending capacity of the sending nodes into account.

A number of allocation algorithms can be used in this system. Next, we propose four such rate allocation algorithms: the Fair Share algorithm (RAA-FS), the Slow Probe algorithm (RAA-SP), the Fast Probe algorithm (RAA-FP), and the Forward Explicit algorithm (RAA-FE).


input : Active nodes list with their requested rates and send capacities
output: Granted rate for each node

1) Inflate the requested rate of the nodes that fully use their allocated rate by 10%
2) Calculate the total requested rate
3) Calculate the scaling ratio of the requested rates based on the available rate:
   ratio ← totalAvailableRate / totalReqRate
4) while there is unallocated rate and nodes with sending capacity do
       grantRate[i] ← min(reqRate[i] × ratio, maxRate[i])
       if reqRate[i] × ratio > maxRate[i] then
           fairly distribute the extra rate among the other nodes
       end
   end

Algorithm 2: RAA-SlowProbe

The fair share algorithm (RAA-FS) calculates a fair rate by dividing the receive rate limit by the number of sending nodes that have traffic to send, and allocates that fair share to each of the active sending nodes. This algorithm is suitable for cases where the sending nodes should be treated identically in the rate allocation process, independent of the amount of traffic they require, as shown in the pseudo code presented in Algorithm 1. In this rate allocation mechanism, if the calculated fair rate is more than the sending capacity of a sending node, the extra rate is fairly distributed among the other sending nodes with available sending capacity. This algorithm is oblivious to differences in the rates requested by the active nodes, and does not perform any probing to see whether the nodes have more traffic to send. Although RAA-FS is fair, it may result in bandwidth underutilization, since some sending nodes might not need all of their allocated rate.
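A minimal Python sketch of this fair-share allocation follows; it is our own rendering of Algorithm 1, and the function name, dictionary representation, and the exact redistribution loop are assumptions rather than the thesis implementation.

```python
def raa_fair_share(total_rate, max_rate):
    """Split total_rate evenly among active nodes, redistributing any share
    a node cannot absorb because its send capacity max_rate[i] is lower."""
    grant = {i: 0.0 for i in max_rate}
    active = set(max_rate)            # nodes that can still absorb rate
    remaining = total_rate
    while remaining > 1e-9 and active:
        fair = remaining / len(active)
        remaining = 0.0
        for i in list(active):
            room = max_rate[i] - grant[i]
            give = min(fair, room)
            grant[i] += give
            remaining += fair - give  # excess to redistribute next round
            if grant[i] >= max_rate[i] - 1e-9:
                active.remove(i)      # node is at its send capacity
    return grant

# 400 Mbps shared by three nodes, one of which is capped at 50 Mbps
g = raa_fair_share(400, {"PN1": 1000, "PN2": 1000, "PN3": 50})
```

In this run the capped node gets its full 50 Mbps, and the freed share is split evenly between the other two nodes.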

The second algorithm, the slow probe algorithm (RAA-SP), allocates rates to the sending nodes based on the rate measurement reports received from their send rate control modules. The algorithm identifies the nodes that are fully utilizing their allocated rate, and inflates their rate requests by a percentage (for example, 10%) to give them an opportunity to increase their rate relative to the other sending nodes that are not fully using their allocated rate. RAA-SP then calculates the total requested rate and allocates a portion of the available bandwidth to each node. This portion is calculated based on the inflated request rates and the receive rate limit, as presented in this algorithm's pseudo code (Algorithm 2).

input : Active nodes list with their requested rates and send capacities
output: Granted rate for each node

1) Execute the Slow Probe algorithm:
   grantRate ← RAASlowProbe()
2) Sort all nodes that fully utilized their allocated rate according to their granted rate, and calculate the mean of the rates granted to them
3) while a node with the highest rate above the mean (upper) can be picked do
       while a node with the lowest rate below the mean (lower) can be picked do
           multiply the rate of the lower node by d and deduct the increase from the upper node, considering the lower node's send capacity
           if the upper node's new rate goes below the mean then
               average the lower and upper rates and assign the average to both
           end
       end
   end

Algorithm 3: RAA-FastProbe

RAA-SP gradually probes the sending nodes that are fully utilizing their allocated rate, and gives them a better chance of obtaining a higher allocated rate. RAA-SP, however, does not address the fairness problem, since it does not reallocate rate from the nodes with high allocations to the nodes with low allocations.
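A compact sketch of the slow probe allocation is shown below. We fold the send capacity constraint in by capping each request before scaling, whereas Algorithm 2 instead redistributes excess after granting; all names here are ours.

```python
def raa_slow_probe(avail_rate, req_rate, max_rate, fully_used, inflate=1.10):
    """Grant each node a share proportional to its (possibly inflated)
    request, scaled so the grants sum to the available receive rate."""
    # 1) Inflate requests of nodes that fully used their previous allocation,
    #    and cap every request at the node's send capacity.
    req = {i: min(r * (inflate if i in fully_used else 1.0), max_rate[i])
           for i, r in req_rate.items()}
    total_req = sum(req.values())
    if total_req <= avail_rate:
        return dict(req)              # every request fits as-is
    # 2) Scale all requests by the same ratio (available / requested).
    ratio = avail_rate / total_req
    return {i: r * ratio for i, r in req.items()}

# Node "A" fully used its allocation last round, so its request is inflated.
g = raa_slow_probe(300, {"A": 200, "B": 200, "C": 20},
                   {"A": 1000, "B": 1000, "C": 1000}, fully_used={"A"})
```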

The third algorithm, the fast probe algorithm (RAA-FP), extends the slow probe algorithm by reallocating sending rates from the nodes with higher allocated rates to the nodes with lower allocated rates. In contrast to the two previous algorithms, RAA-FP addresses both fairness and bandwidth utilization concerns. This algorithm sorts the nodes that fully utilize their allocated rate and calculates the mean rate allocated to these nodes (shown in the pseudo code presented in Algorithm 3). RAA-FP then picks the nodes with


the highest and the lowest allocated rates. RAA-FP multiplies the rate allocated to the lowest rate node by a parameter (d > 1) and deducts the extra allocated rate from the node with the highest allocated rate, provided the resulting deducted rate does not go below the mean allocated rate. Otherwise, it takes the average of the highest and lowest allocated rates and allocates this average rate to both nodes. This change in the allocated rates is made considering the free sending capacity of the node with the lower allocated rate. The operation is repeated on the next pair of nodes with the next highest and lowest allocated rates until all rates allocated to the fully utilizing nodes have been revised.
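The pairing step of RAA-FP might be sketched as follows, assuming the slow-probe grants have already been computed and that send_room holds each node's spare sending capacity. The names and the exact bounding of the boost are our reading of Algorithm 3, not a definitive implementation.

```python
def raa_fast_probe(grants, send_room, d=2.0):
    """Rebalance the grants of fully-utilizing nodes: repeatedly pair the
    highest- and lowest-granted nodes, scale the low grant by d (bounded
    by that node's spare send capacity), and deduct the increase from the
    high grant; if the high grant would fall below the mean, average the
    pair instead."""
    nodes = sorted(grants, key=grants.get)        # ascending by grant
    mean = sum(grants.values()) / len(grants)
    new = dict(grants)
    lo, hi = 0, len(nodes) - 1
    while lo < hi and new[nodes[lo]] < mean < new[nodes[hi]]:
        low, high = nodes[lo], nodes[hi]
        boost = min(new[low] * (d - 1), send_room[low])  # extra for low node
        if new[high] - boost >= mean:
            new[low] += boost
            new[high] -= boost
        else:                                     # averaging fallback
            avg = (new[low] + new[high]) / 2
            new[low] = new[high] = avg
        lo += 1
        hi -= 1
    return new

new = raa_fast_probe({"A": 300, "B": 100, "C": 100, "D": 60},
                     {"A": 500, "B": 500, "C": 500, "D": 500}, d=2.0)
```

Note that the total granted rate is preserved; only its distribution changes.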

Our performance evaluations show that the fast probe rate allocation algorithm (RAA-FP) achieves the probing goals rather quickly, since it gives the nodes that are fully utilizing their allocated rate more opportunity to send more traffic. Moreover, it achieves better fairness in rate allocation, since it reduces the gap between the nodes with high allocated rates and the nodes with low allocated rates. The choice of the parameter d controls the trade-off between fairness and bandwidth utilization. A small d value results in higher bandwidth utilization but lower fairness in the rate allocations; a large d results in lower bandwidth utilization in exchange for higher fairness.

The fourth algorithm is inspired by the FERA algorithm introduced in [92] for FECN-based Ethernet congestion management. This algorithm is designed to enable a comparison between a DETS-based rate allocation system and a FECN-based system. It has been shown [94] that the FERA algorithm has a better convergence time than other proposals for Ethernet congestion control. The essence of FERA is to control the queue length of an outgoing Ethernet switch port by assigning a fair share rate to the flows passing through that port. The algorithm uses a linear (or a hyperbolic) control function to adjust the allocated (fair) rate to achieve a target queue length (Qeq).

We modified this algorithm to arrive at a target receiving rate at the receiving node.


This algorithm (called RAA-FE) calculates a fair rate r(i+1) at the (i+1)th interval, based on the value r(i) at the ith interval and a control function f(r) = 1 − k(r − Rt)/Rt, in which k is a constant, r is the measured receiving rate, and Rt is the target rate.
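Assuming the fair rate is updated multiplicatively, i.e. r(i+1) = r(i) · f(r) with r the measured receive rate (the chapter does not spell out the update rule explicitly, so this is one plausible reading), a single RAA-FE interval could look like:

```python
def raa_fe_step(r_alloc, r_measured, target, k=0.5):
    """One RAA-FE interval: scale the allocated fair rate by the linear
    control function f(r) = 1 - k * (r - Rt) / Rt, so the allocation grows
    when the measured receive rate r is below the target Rt and shrinks
    when it is above (k is a tuning constant; 0.5 is an arbitrary choice)."""
    f = 1.0 - k * (r_measured - target) / target
    return r_alloc * f

# Node currently receives 500 Mbps against a 400 Mbps target:
# f = 1 - 0.5 * (100/400) = 0.875, so the allocation shrinks.
r_next = raa_fe_step(100.0, 500.0, 400.0)
```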

DETS sends the calculated rates back to the sending nodes, and the sending nodes apply them in their rate controller modules. Compared to the previous algorithms, this algorithm does not require rate measurements at the sending nodes, and it does not support weighted fair allocation.

In the original FERA, the intervals are as short as 1 ms, but in DETS the intervals are about 55 ms. Rate regulation is therefore performed only every 55 ms, which makes rate convergence a challenge for this algorithm. Although the linear control function leads to a faster convergence time than the hyperbolic function, our experiments show that RAA-FE takes about 40 intervals (> 2 s) to converge to the fair rate. The analytical results in [92] show this slow convergence as well. This is mainly because the algorithm does not include the sending rate measurements.

5.2.2 Performance Improvements

To improve the performance of the DETS system, we have embedded several performance improvement mechanisms in the system. These improvements mainly aim to reduce the number of exchanged messages and to better predict the required sending rates of the sending nodes.

The first improvement concerns the rate measurement reports. The send rate control module can send a measurement report only when there is a major change in the measured rate, so that the measured rates are reported with a lower frequency. Also, if the measured rate falls below a minimum threshold, the send rate control module can stop reporting it, and the rate allocator automatically allocates a minimum rate to that node.

The send rate control module can also use a prediction algorithm to estimate the rate at which


Figure 5.5: DETS performance evaluations for the system shown in Figure 5.1 (TCP rate on VLAN 1 and UDP rate on VLAN 2, in Mbps, versus time).

a sending node will generate traffic to a receiving node during the next time period, and send that estimate to the rate allocator module. The predicted rate can be calculated from the current and past measurements. This prediction improves the rate allocation algorithm's performance, since the algorithm then considers a node's predicted rate requirement instead of its past sending rate measurements.

To reduce the number of rate allocation messages generated by the rate allocator module, the module can send these messages to the send rate control modules only when there is a major change in the allocated rate.

To make sure that the messages of the DETS protocol are delivered to the distributed modules with minimum delay, DETS messages can be conveyed on a separate physical or virtual network. They can also be marked with a high priority, so that they have a better chance of arriving at their destination when the network is congested.


5.3 Performance Evaluations

In this section, we present experimental results showing that DETS can achieve isolation between virtual LANs. We implemented the DETS system in C++ and deployed it on 11 nodes with 1 GE Ethernet connections in a computing cluster, and we created two VLANs on the Ethernet switches. As in our VANI processing virtualization service [8], we used Linux vServer technology for virtualization and deployed two virtual nodes on each physical server. One virtual node in a physical node is connected to VLAN 1, and the other is connected to VLAN 2. This setting is similar to the one depicted in Figure 5.1, except that we used eleven physical nodes instead of five.

We set the send and receive rate limits for all virtual nodes in the first VLAN to 400 Mbps, and in the second VLAN to 500 Mbps, and we used the fast probe rate allocation algorithm on both VLANs with parameter d = 2. We started sending TCP traffic from 10 nodes to one node. We expect DETS to control the rate at which the receiving node receives traffic, limiting it to 400 Mbps. We also expect that if the nodes in the second VLAN start sending UDP traffic to the receiving node, the TCP flows destined to that machine are not overwhelmed by the interfering UDP traffic.

Our results (presented in Figure 5.5) show that DETS is able to achieve both goals. In this figure, the rate measurements are shown for every time unit (every 55 ms). As can be seen, when all nodes in the second VLAN simultaneously start sending UDP traffic to the receiving node (around time unit 320 in Figure 5.5), the TCP traffic on the first VLAN is momentarily disrupted, and it takes two time units for the control algorithm to receive the measurements, make the decision, and apply the limits on the sending nodes. After this short transient period, the TCP traffic bounces back quickly and continues sending at the limit rate of 400 Mbps.

We also evaluated and compared the performance of the four allocation algorithms.

To do so, we set up a VLAN with 10 virtual nodes sending a mix of UDP and TCP

traffic to one virtual node, and we monitored the received traffic on the receiving node.

Figure 5.6: Performance evaluation of rate allocation algorithms: a) RAA-SlowProbe, b) RAA-FastProbe (received rate, and mean and standard deviation of the allocated rates, in Mbps, versus time).

Figure 5.7: Performance evaluation of rate allocation algorithms: a) RAA-FairShare, b) RAA-ForwardExplicit (received rate and mean allocated rate, in Mbps, versus time).


We also limited the peak rate of three of the sending nodes to a low limit (20 Mbps). This helps us better compare the performance of the proposed algorithms.

We developed an on/off burst traffic generator that generates a burst of UDP or TCP traffic for a random period between 0 and T, and then stops sending traffic for another random period between 0 and T. We used various values of T, ranging from 0.5 s to 10 s, on different nodes. This traffic generator enables DETS performance evaluation under time-varying and bursty UDP and TCP traffic.
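The generator's timing can be sketched as follows; this is a schedule skeleton only (the real generator sends UDP or TCP bursts during the "on" periods), and the function and field names are ours.

```python
import random

def on_off_schedule(T, total_time, seed=None):
    """Return (start, duration, state) tuples alternating 'on' and 'off'
    periods, each drawn uniformly from [0, T], until total_time elapses."""
    rng = random.Random(seed)
    t, state = 0.0, "on"
    periods = []
    while t < total_time:
        dur = rng.uniform(0, T)
        periods.append((t, dur, state))
        t += dur
        state = "off" if state == "on" else "on"
    return periods

sched = on_off_schedule(T=2.0, total_time=10.0, seed=1)
```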

Figures 5.6(a1, b1) and 5.7(a1, b1) show the received rate measurements on the receiving node for all four algorithms over a period of 82 seconds (1500 time units). Figures 5.6(a2, b2) show the measured mean and standard deviation of the rates allocated to the nodes by the slow probe and fast probe (d = 2) algorithms, respectively. Figures 5.7(a2, b2) show the mean rate allocated by RAA-FS and RAA-FE. The fluctuations in the received rate measurements are due to the on-off nature of the generated traffic.

It can be seen that the fast probe and slow probe algorithms achieve better utilization of the receive bandwidth than the fair share algorithm, especially since some of the nodes have less sending capacity than the others. As expected, the slow probe algorithm outperforms the fast probe algorithm in terms of receive bandwidth utilization. However, the fast probe algorithm achieves a lower standard deviation between the flows coming from different virtual nodes than the slow probe algorithm.

The RAA-FE algorithm performs poorly compared to the other algorithms; it has a slow convergence rate and has difficulty stabilizing. This is mainly because of the fluctuations in the generated traffic. RAA-FE also does not consider the sending rate measurements, and it has no probing mechanism.

In general, the fast probe algorithm is preferable when weighted fairness is required, while a user who needs strict fairness in rate allocation can pick the fair share scheme. The slow probe algorithm suits cases where the user wants to increase


Figure 5.8: DETS in the Ethernet control plane (DETS modules embedded in Ethernet switches SW1–SW3; DETS control messages travel between the sending node's edge switch port, port 8 on SW1, and the receiving node's edge switch port, port 5 on SW3).

the bandwidth utilization at the expense of fairness, and prefers gradual rather than sudden changes in a traffic flow's rate. In DETS, it is possible to run different rate allocation algorithms on different virtual networks, as long as the algorithms satisfy the network isolation requirement. This allows users to pick an algorithm that suits their needs.

5.4 Modifications to Ethernet Control Plane

Here we discuss the inclusion of the DETS protocol in the Ethernet control plane so that Ethernet switching equipment can perform DETS operations with minimal (or no) help from the hosts attached to the Ethernet network.

We propose that in an Ethernet network, the distributed modules in the DETS system

be embedded in Ethernet switches, and DETS messages be added to the Ethernet control

messages. To do so, traffic destined for a receiving node has to be controlled at the ingress ports of the edge Ethernet switches. These messages could be added to the MAC Control type of Ethernet frames (EtherType = 0x8808) as specified in the IEEE 802.3 family of


specifications [90]. The only message currently defined in this type of frame is the PAUSE

message (opcode = 0x0001). The DETS messages can use other free opcodes in this frame

type. These messages have to be in VLAN-tagged frames, since DETS is designed to

control the rate on VLANs.
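As an illustration of this proposal, the following sketch builds a hypothetical VLAN-tagged DETS "set rate" MAC Control frame. The opcode 0x0002 and the 4-byte rate payload are illustrative assumptions only; IEEE 802.3 defines just the PAUSE opcode (0x0001), and an actual DETS deployment would have to standardize its own opcodes and payload layout.

```python
import struct

def build_dets_set_rate_frame(dst_mac: bytes, src_mac: bytes,
                              vlan_id: int, rate_mbps: int) -> bytes:
    """Hypothetical DETS 'set rate' frame: 802.1Q VLAN tag followed by a
    MAC Control frame (EtherType 0x8808) carrying an assumed opcode and
    a 4-byte rate field."""
    DETS_SET_RATE_OPCODE = 0x0002   # assumed free opcode (PAUSE is 0x0001)
    tci = vlan_id & 0x0FFF          # priority and DEI bits left at 0
    frame = (
        dst_mac + src_mac +
        struct.pack("!HH", 0x8100, tci) +         # 802.1Q tag (TPID + TCI)
        struct.pack("!H", 0x8808) +               # MAC Control EtherType
        struct.pack("!HI", DETS_SET_RATE_OPCODE, rate_mbps)
    )
    # Pad to the 60-byte minimum Ethernet frame size (before FCS)
    return frame.ljust(60, b"\x00")

frame = build_dets_set_rate_frame(b"\x01\x80\xc2\x00\x00\x01",
                                  b"\x00\x11\x22\x33\x44\x55",
                                  vlan_id=42, rate_mbps=500)
```

The destination address shown is one of the reserved 802.1 multicast addresses; a real design would pick an address that edge switches are configured to intercept.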

Figure 5.8 shows an Ethernet network equipped with DETS. The rate allocator module operates on the receiving port of an edge Ethernet switch (SW3, port 5), and the send rate control module and the traffic shaper operate on the sending port of the originating edge Ethernet switch (SW1, port 8). The set rate messages are sent from the receiving port to the sending port. The sending port applies the allocated rate to the sent traffic, and can forward the rate control messages to the sending host if it (or its NIC) is able to do the traffic shaping.


Part III

QoS & Admission Control in Service-Oriented Systems


Chapter 6

Allocating Services to Applications using Markov Decision Processes

In the first two parts of this thesis, we analyzed the impact of service-oriented approaches to application creation on future network architectures, and more specifically their central role in network-facilitated application creation in an Application-Oriented Network. In this part of the thesis, we focus on improving the quality of experience for applications created based on this paradigm.

In the Service-Oriented application creation paradigm, services that are designed and developed independently can be composed with other service components to create new applications or more complex service components. Nowadays we can see the effect of this paradigm on different aspects of networking, such as the development of new applications through the composition of service components, both in the form of “mashups” [21] and in a more rigorous form using the Service-Oriented Architecture [96, 19]. For example, Google Maps has provided the basis for a huge number of mapping mashups. The importance of the mashup phenomenon is that it marks the emergence of a new mode of application creation in which applications are created through a distributed and collaborative process. The term Web 2.0 refers to this emerging network-centric


platform [22]. In addition, SOA-based loosely-coupled IT systems have given enterprises greater agility when it comes to adjusting the structure of their businesses to meet changing business requirements. Another example is the application of this paradigm to multimedia applications, by composing multimedia services [97].

There is a large body of literature on service composition. In [98], the authors

have discussed the service composition problem from the QoS-awareness point of view.

They have argued that the problem of composing services with different QoS parameters

for creating an application with a set of constraints on different QoS parameters is a

Linear Programming problem, and they have used the simplex method to find the best

service set for satisfying the application’s constraints.

A QoS-aware middleware for composing multimedia services for providing multimedia

applications has been proposed in [97]. The authors have shown that the problem of

composing services is an NP-hard problem and they have proposed a heuristic algorithm

for composing services in both centralized and P2P manner for satisfying the overall QoS

constraints of multimedia applications. In their peer-to-peer algorithm, upon receiving a request from the user, the system starts finding candidate services that satisfy the overall QoS constraints, and at the end it decides which services have to be chosen to properly serve the user's interests and QoS constraints. In [99], the authors have proposed

a Markov Decision Process (MDP) model for combining services while having multiple

choices for each service to increase the overall reward for work flows while exploring

different possibilities.

The problem of scheduling workflows while composing web services has been discussed

in [100]. The authors have proposed a genetic search approach that searches among the possible orderings of a vast number of business processes and tries to find the best order for satisfying the overall QoS constraints of the business processes. In this chapter, we address the service allocation problem when there are conflicting requests for different services in composite applications. We study this problem in two different


cases. The first case assumes applications that require simultaneous execution of service

components, and in the second case we investigate the applications that execute service

components in sequence. We propose optimal policies for assigning service instances to

different applications using Markov Decision Processes in both cases. After formulating the problem as an MDP, we obtain the optimal policy, and we compare the performance of a system following this policy with that of systems using the Complete Sharing (CS) or Complete Partitioning (CP) [101] mechanisms.

The rest of this chapter is organized as follows. In the next section, we define the problem of service allocation in the case of concurrent service executions. In Section 6.1.2, we formulate this problem as an MDP and analyze its optimal solution, and in Section 6.1.3, we analyze the problem in the case where a service has instances with different QoS parameters. In Section 6.1.4, we present the optimal policy for a sample system and compare its performance with the CS and CP methods.

In the second part of this chapter, we extend the MDP-based service allocation to

the applications that execute service components in sequence. Similar to the first case,

we define and formulate this problem using MDP, and we obtain the optimal policy for

a sample system and present the performance evaluations and comparison results.

6.1 Concurrent Service Executions

6.1.1 Problem Formulation

Consider an environment with m types of services and k classes of composite applications

(Figure 6.1). For simplicity, we assume that all instances of one service have similar

QoS parameters. Each class of composite application is composed of a set of services.

For example, a class 1 application is composed of services 1, 3 and m, while a class 2

application is composed of services 1, 2, 4 and m and a class 3 application is composed

of only one service of type m. Therefore, a request for a class 1 application will be


accepted whenever there are free instances of services of type 1, 3 and m. Also, a request

for a class 2 application will be accepted whenever there are free instances of services

of type 1, 2, 4 and m, and similarly a request for a class 3 application will be accepted

whenever there is a free instance of service of type m. As can be seen, the service requirements of class 1, 2 and 3 applications conflict. Consider a case where a high request rate for class 3 applications results in allocating all services of type m, and hence decreases the chance of accepting other classes of applications that require a type m service instance. This leaves instances of the other types of services underutilized while the requests for the other applications are being rejected.

Figure 6.1: A system with m different service types and N instances of each type

To solve this problem, and consequently to maximize the overall utilization, there

should be a mechanism to allow or deny acceptance of requests for different classes of

application. In this section, we propose an MDP-based partitioning model for achieving

an optimal policy for accepting or denying the requests, and we show that enforcing this policy achieves higher utilization than other policies, including Complete Sharing and Complete Partitioning [101].

Under the CS policy, a request for each class of application will be accepted whenever


there is a free instance of each corresponding service. In this algorithm, no reservation for any of the applications is carried out. The CS policy, as described before, results in a non-optimal allocation of services to applications. Under the CP policy, a constant number of services is allocated to each application class and cannot be shared with other classes of applications. While this policy seems fair, it underutilizes the services.

We propose a mechanism for accepting or rejecting the requests for each class of application at the time of the request. We assume that for each class of application the interarrival and holding times are exponentially distributed, where λi, (1 ≤ i ≤ k) is the arrival rate for class i applications, and µi, (1 ≤ i ≤ k) is the service rate for class i applications. Also, ni, (1 ≤ i ≤ k) is the number of class i applications currently being served in the system.

First we assume a simple model consisting of only 2 classes of applications (k = 2)

and 3 types of services (m = 3). The application class 1 is composed of services 1, 2 and

3. The application class 2 is composed of services 1 and 2 (Figure 6.2). We assume that

all services satisfy the QoS requirements of all classes of applications. Also, we have N instances of each type of service in our system, and n1 and n2 represent the numbers of class 1 and class 2 applications currently in the system, respectively. Therefore, the state vector (n1, n2) represents the current state of the system.

Let S = {s = (n1, n2) | 0 ≤ n1, n2 ≤ N, n1 + n2 ≤ N} be the set of system states, and st be the system state at time t. Based on the statistical assumptions, {st, t ≥ 0} is a continuous-time Markov chain whose transitions are the arrival or departure events of applications.

We try to formulate our problem as a Markov Decision Process [102]. Our objective

is to maximize the utilization of the services and increase the revenue. Therefore, our

decision process is to find how we should treat the next request arrival while the system

is in state s. The system can either accept only a request for a class 1 application, or


Figure 6.2: A system with three types of service and two classes of applications

accept only a request for a class 2 application, or accept requests for both classes of applications. Therefore, whenever the system enters state (n1, n2), it knows whether it will serve or reject the next request for each class of application. We

assume that rejected requests do not interfere with the system. As a result, the possible

next actions based on the state s are:

A(s) = {0}: only accept a request for class 1.

A(s) = {1}: only accept a request for class 2.

A(s) = {2}: accept requests for both classes.

Our objective is to find an optimal policy for each state to maximize the reward, which is the weighted sum of the applications currently being served in the system.

6.1.2 Markov Decision Process Formulation

This initial continuous-time Markov Decision Process can be converted into an equivalent

discrete-time MDP by applying the uniformization technique [102]. In order to do so, we

define the sampling time c := N(µ1 + µ2) + λ1 + λ2, and during each sample time only


one transition can occur which corresponds to either arrival of a request, departure of a

request, or a fictitious event.

To maximize the utilization in our problem we try to maximize the reward function

which is the weighted sum of different classes of applications in the system. Therefore

we use the MDP infinite-horizon discounted reward model [102, 103], and we define our

one-step reward function as follows:

R(s) = αn1 + βn2 (6.1)

The optimal discounted value function and the optimal policy can be computed using the value iteration algorithm [103]:

V_{n+1}(s) = max_a [ R(s) + ε ∑_{s′} P^a_{ss′} V_n(s′) ]    (6.2)

in which ε is the discount factor and P^a_{ss′} is the transition probability from state s to state s′ under action a; its value is as follows:

• When a request for a class i application arrives and we accept it, the probability is λi/c.

• When a request for a class i application arrives and we reject it, the probability (of the corresponding self-transition) is λi/c.

• The probability of departure of a class i application from the system is niµi/c.

• The probability of the fictitious event is 1 − (∑_i niµi + ∑_i λi)/c.

Now we can recursively compute the sequence of n-stage values Vn(s) using the method of successive approximations [103] and take the limit of this sequence as n goes to infinity. It is shown that V(s) := lim_{n→∞} Vn(s) exists and is the solution of the infinite-horizon discounted problem [103].
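The value iteration of Equation 6.2 can be sketched for the two-class example, assuming (as in the text) that services 1 and 2 are shared by both classes, so that n1 + n2 ≤ N is the binding feasibility constraint. The action codes match the sets A(s) above, and the transition probabilities follow the uniformized chain just described. This is an illustrative sketch under those assumptions, not the exact implementation used in the thesis.

```python
def value_iteration(N=10, lam=(5.0, 5.0), mu=(1.0, 1.0),
                    alpha=1.0, beta=0.1, eps=0.99, iters=2000):
    """Value iteration (Equation 6.2) for the two-class sample system.
    Action codes: 0 = accept only class 1, 1 = accept only class 2,
    2 = accept both."""
    c = N * sum(mu) + sum(lam)                 # uniformization constant
    states = [(n1, n2) for n1 in range(N + 1)
              for n2 in range(N + 1) if n1 + n2 <= N]
    actions = {0: (True, False), 1: (False, True), 2: (True, True)}
    V = {s: 0.0 for s in states}

    def q_value(s, accept, V):
        n1, n2 = s
        q, stay = 0.0, 0.0
        for i in range(2):
            n = s[i]
            up = (n1 + 1, n2) if i == 0 else (n1, n2 + 1)
            down = (n1 - 1, n2) if i == 0 else (n1, n2 - 1)
            if accept[i] and n1 + n2 < N:      # accepted arrival, prob lam_i/c
                q += lam[i] / c * V[up]
            else:                              # rejected arrival: self-loop
                stay += lam[i] / c
            if n > 0:                          # departure, prob n*mu_i/c
                q += n * mu[i] / c * V[down]
            stay += (N - n) * mu[i] / c        # share of the fictitious event
        return alpha * n1 + beta * n2 + eps * (q + stay * V[s])

    for _ in range(iters):                     # successive approximations
        V = {s: max(q_value(s, a, V) for a in actions.values())
             for s in states}
    return {s: max(actions, key=lambda k: q_value(s, actions[k], V))
            for s in states}

policy = value_iteration()
```

With the defaults above (the sample parameters of Section 6.1.4, α = 1, β = 0.1), the computed policy should reproduce the general shape of Figure 6.4: accept both classes at light load, and protect class 1 as the system fills.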


Figure 6.3: A system with three types of services, two classes of applications, and two types of instances (S3.1 and S3.2) for service type 3

6.1.3 Optimal Policy with Different Services

In the previous section, we formulated the problem considering the case when all services of one type are similar. But in some situations, there are different service components with different QoS parameters that cover similar functionalities and can be substituted for each other. For example, consider the case where we have two classes of applications and three types of services. Application class 1 is composed of services 1, 2 and 3, while application class 2 is composed of services 2 and 3. Also, we have two types of service 3 instances in the system: among the N instances of service 3, L instances are similar from the QoS point of view (S3.1 services), and the remaining (N − L) instances have QoS properties similar to each other but different from the first L instances (S3.2) (Figure 6.3).

We assume that after solving a Linear Programming (LP) problem for satisfying the

constraints of each class of application, we have found that a class 2 application can use

both types of service 3 instances, but a class 1 application can only use the instances of type S3.1.

Now the problem is to propose a policy for accepting or rejecting requests for the


application classes 1 and 2 so as to maximize the utilization of service instances. We show that this problem is similar to the previous one, and the policy maker can use the previously proposed model for obtaining the optimal policy and making optimal decisions. Since the services of type S3.2 can only be used by class 2 applications, the decision is whether to use the S3.1 instances for a class 1 application or for a class 2 application. Therefore, applications compete for a limited number of instances instead of competing for all available instances. Upon the arrival of a request for a class 2 application, the system assigns an S3.2 instance if one is free. If there is no available S3.2 instance, the system, based on the MDP model, decides whether it should give an instance of type S3.1 to this request or keep it for later use by a class 1 application.

The formulation of this problem is similar to that of the previous problem, except that among the N services of type 3 in the system we have L instances of S3.1, and the arrival rate of class 2 applications is λ2pf instead of λ2, in which pf is the probability that a request for a class 2 application arrives at the system when there is no free S3.2 service instance. Note that pf can be simply obtained using

Erlang B formula as follows:

pf = ((λ/µ)^m / m!) / (∑_{n=0}^{m} (λ/µ)^n / n!),   m = N − L    (6.3)
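Equation 6.3 can be evaluated directly, although the standard Erlang B recursion is numerically safer than computing factorials. The following is a small sketch; the parameter values (λ = 5, µ = 1, N = 10, L = 6) are illustrative assumptions, not values from the thesis.

```python
def erlang_b(a, m):
    """Erlang B blocking probability for offered load a = lambda/mu and
    m servers, via the numerically stable recursion
    B(0) = 1, B(n) = a*B(n-1) / (n + a*B(n-1))."""
    b = 1.0
    for n in range(1, m + 1):
        b = a * b / (n + a * b)
    return b

# Probability that a class 2 request finds all m = N - L instances of
# S3.2 busy (illustrative parameters: lambda = 5, mu = 1, N = 10, L = 6).
p_f = erlang_b(5.0 / 1.0, 10 - 6)
```

The recursion and the closed form in Equation 6.3 are algebraically equivalent; the recursion simply avoids overflow for large m.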

Based on this problem formulation, in order to find the optimal policy we use the following one-step reward function:

R(s) = αn1 + βn2    (6.4)

where n2 represents the number of class 2 applications that have come into the system

and have not found any free S3.2 instances. Again, we try to maximize this reward using

MDP for the discounted reward model with infinite-horizon. Similar to the previous

problem, we can use the method of successive approximations for finite-period Markov


Decision Processes for finding the optimal policy that maximizes the weighted-sum reward

function.

6.1.4 The Optimal Policy and Performance Comparison

Based on the presented MDP problem, we computed the optimal policy for the first problem described and formulated earlier. We found the optimal policies for request arrival rates λ1 = λ2 = 5, service rates µ1 = µ2 = 1, and N = 10, and we set ε to 0.99 in Equation 6.2. We obtained the optimal decision in each state for the cases (α = 1, β = 0.1) and (α = 1, β = 0.5).

n2  9 | 0
    8 | 0 0
    7 | 0 0 0
    6 | 0 0 0 0
    5 | 0 0 0 0 0
    4 | 0 0 0 0 0 0
    3 | 2 0 0 0 0 0 0
    2 | 2 2 0 0 0 0 0 0
    1 | 2 2 2 2 0 0 0 0 0
    0 | 2 2 2 2 2 0 0 0 0 1
        0 1 2 3 4 5 6 7 8 9  n1

Figure 6.4: Optimal policy when the system is in state (n1, n2), and α = 1, β = 0.1

Figures 6.4 and 6.5 show the optimal policy for each case when the system is in state (n1, n2). In both figures, '0' indicates that the system will only accept a request for a class 1 application, '1' indicates that it will only accept a request for a class 2 application, and '2' indicates that it will accept requests for both classes of applications. As can be seen, when the weight of the class 2 applications is low and plenty of them are currently being served in the system, our decision-making mechanism suggests rejecting new requests for class 2 applications (Figure 6.4). However, if the weight of the class 2 applications is high, we have to accept


n2  9 | 0
    8 | 0 0
    7 | 0 0 0
    6 | 0 0 0 0
    5 | 2 2 0 0 0
    4 | 2 2 2 0 0 0
    3 | 2 2 2 2 0 0 0
    2 | 2 2 2 2 2 0 0 0
    1 | 2 2 2 2 2 2 0 0 0
    0 | 2 2 2 2 2 2 2 2 1 1
        0 1 2 3 4 5 6 7 8 9  n1

Figure 6.5: Optimal policy when the system is in state (n1, n2), and α = 1, β = 0.5

more requests for that class of application (Figure 6.5).

We simulated the system and compared its performance under the MDP-based partitioning mechanism with the Complete Sharing (CS) and Complete Partitioning (CP) mechanisms [101].

As we described before, in the CS method the system accepts any request for any class of application if it has enough room to serve that request. In other words, the system does not reserve any of its resources for any class of application. In the CP method, the system keeps a constant number of service instances for each application class and does not allocate that portion to any other class of application. In our implementation of the CP method, we divided the resources based on the weights of each class.

Figure 6.6 shows the comparison between these three methods. Figure 6.6(a) shows the case where α = 1 and β = 0.1, and Figure 6.6(b) shows the case where α = 1 and β = 0.5.

The x-axis in both figures represents the request rates λ1 and λ2; in both figures λ1 = λ2, and they vary from 1 to 30. The y-axis represents the reward value, i.e., the weighted sum of the number of applications currently in the system, under each of the partitioning methods. As can be seen, the MDP-based partitioning mechanism outperforms the other two mechanisms,


Figure 6.6: Performance comparison between Complete Sharing, Complete Partitioning and MDP-based partitioning mechanisms. (a) α = 1 and β = 0.1; (b) α = 1 and β = 0.5. Both panels plot the reward against λ1 = λ2 for the CS, CP and MDP-based methods.


especially when the request rate is high. When the request rate is low, there is no significant difference between CS, CP, and MDP-based partitioning. However, when the load is high and the weight of the second class of applications is low, using MDP-based partitioning results in 60% more reward than the CS method and 10% more reward than the CP method.

In the next section, we revisit this problem by relaxing some of the assumptions. We

again study the optimal policy and we present MDP-based solutions for this problem.

6.2 Sequential Service Executions

In the previous section [11], we studied the problem of optimally allocating services to different applications, and we proposed a Markov Decision Process approach for solving it. One of the main assumptions we made in that section was that all service instances are committed by the system to the application throughout its lifetime. In this section, we relax this assumption further. We propose an optimal policy for reserving service instances for different applications and business processes using Markov Decision Processes. We obtain the optimal policy for a sample case and compare its performance with that of a system that uses a Full Commitment Policy or a No Commitment Policy in assigning service instances to applications.

6.2.1 Problem Formulation

Consider an environment with m types of services and k classes of applications or business processes (Figure 6.7). Each class of application is composed of a set of services. For example, a class 1 application is composed of services 1, 3 and m, while a class 2 application is composed of services 1, 2, 4 and m, and a class 3 application is composed of only one service of type m.

Each application uses a service for a limited time during its lifetime and the service


is free for the rest of the time. Whenever the system receives a request for a class of

application, it can accept the request or deny it. If the system accepts the request, one

policy is to put all corresponding instances of services on hold until the application

execution finishes. We call this policy a Full Commitment Policy (FCP). Under this

policy, the system can accept a request for a class 1 application whenever there are free

instances of service types 1, 3 and m. Also, a request for a class 2 application will be

accepted whenever there are free instances of service types 1, 2, 4 and m, and similarly

a request for a class 3 application will be accepted whenever there is a free instance

of service type m. As can be seen, the service requirements of application classes 1, 2 and 3 conflict.

Figure 6.7: A system with m different service types and N instances of each type

Consider a case where a high request rate for class 3 applications results in the consumption of all service instances of type m and hence decreases the chance of accepting other classes of applications that require a type m service. This leaves instances of other types of services underutilized while the requests for the other applications are being rejected. In the previous section, we analyzed this problem and provided optimal solutions for a sample scenario. Since some types of applications or business


processes do not need the services’ instances throughout their lifetime, under the FCP

policy, the services’ instances could be underutilized, even with the optimal assignments

of services’ instances to applications or business processes.

An alternative policy is to accept any request for any type of application whenever there is a free instance of the first service it needs. We call this policy a No Commitment Policy (NCP). Although this policy seems simple, it has a significant drawback: applications and business processes that are composed of only one service and have high request rates can easily consume all instances of that service and force other applications to fail when they need that particular service.

Another policy is the Partial Commitment Policy (PCP). Under this policy, the system assigns service instances to applications considering the fact that applications do not need all instances throughout their lifetime, and also the fact that the system should guarantee some level of service availability to all accepted applications. In this section, we analyze this policy, formulate the corresponding problem, and propose an optimal solution using Markov Decision Processes. Using the obtained policy, we show that applying it achieves higher service utilization than the other policies.

Our proposed mechanism is to accept or reject the requests for each class of application at the time of the request. In other words, by rejecting a request, we reserve available service instances for future use by other classes of applications.

For each class of application, the interarrival times are exponentially distributed, where λi^{-1} (1 ≤ i ≤ k) is the mean interarrival time of class i applications, and µj^{-1} (1 ≤ j ≤ m) is the mean execution time of service type j. Each application class is composed of a set of services, and nij (1 ≤ i ≤ k, 1 ≤ j ≤ m) is the number of class i applications currently being served by service instances of type j in the system. Also, we have N instances of each type of service in our system. As a result, the state vector of the system is:

s = (n11, n21, ..., nk1, n12, n22, ..., nk2, ..., n1m, ..., nkm)


If an application class does not need a specific type of service at all, its corresponding nij will be 0 throughout the system lifetime, and therefore it can be omitted from the state vector. The set of all possible states, S, is given by:

S = { s : nij ≥ 0, 1 ≤ i ≤ k, 1 ≤ j ≤ m, ∑_i nij ≤ N for each j }    (6.5)

Also, each application class starts from a service and executes a sequence of services step by step, according to a plan specified in an execution language such as the Business Process Execution Language (BPEL). Therefore, the state space is limited to the states that are valid based on the planned execution path. Throughout this chapter, we only consider execution plans with no conditional branches.

For example, Figure 6.8 demonstrates a sample scenario consisting of only 2 classes of applications (k = 2) and 2 types of services (m = 2). Application class 1 is composed of services 1 and 2, while application class 2 is composed only of service 2. Therefore, the state vector (n11, n12, n22) represents the current state of the system.

Let S = {(n11, n12, n22) | 0 ≤ n11 ≤ N, 0 ≤ n12 + n22 ≤ N} be the system state space, and

st be the system state at time t. Based on the statistical assumptions, {st, t ≥ 0} is a

continuous-time Markov chain whose transitions are the arrival or departure of an

application, or the transition of an application from one service to the next

according to its execution plan.

Ultimately, for each state s, the optimal solution should tell us whether we should

accept the next request for each class of application or not. Thus the action vector is:

a = (a11, a21, ..., ak1, a12, a22, ..., ak2, ..., a1m, ..., akm) (6.6)

in which aij ∈ {0, 1} is the act of accepting or rejecting a request for a class i application

entering the system at service j. Consequently, the action space of the system

is A = {a : aij ∈ {0, 1}, 1 ≤ i ≤ k, 1 ≤ j ≤ m}. This action space, however, can be


Figure 6.8: A system with two types of service and two classes of applications

simplified based on the execution plan of each application class. Later in this section, we

will present a sample action space.

We formulate our problem as a Markov Decision Process [102, 103]. Our

objective is to maximize the utilization of the services and increase the revenue. Our

decision process determines how to treat the next request arrival while the system

is in state s. For example, whenever the sample system enters the state (n11, n12, n22),

the system decides whether it will serve the next request for either class of

application or reject it. We assume that rejected requests do not interfere with

the system.

Therefore, in state s, the possible actions are to accept the next request only for

a class 1 application, only for a class 2 application, or for both classes of

application. The possible actions in state s are therefore A(s) =

{{0, 1}, {1, 0}, {1, 1}}. For simplicity, we use the following action representation in the rest

of this chapter:

A(s) = 0, which means only accept a request for class 1.


A(s) = 1, which means only accept a request for class 2.

A(s) = 2, which means accept requests for both classes.

Our objective is to find an optimal policy for each state to maximize the reward, which

is the weighted sum of the applications currently being served in the system.

6.2.2 Markov Decision Process formulation

This initial continuous-time Markov Decision Process can be converted into an equivalent

discrete-time MDP by applying the uniformization technique [103]. To do so,

we define the uniformization constant c := N Σj µj + Σi λi. During each sampling

period only one transition can occur, corresponding to either a change in state or a fictitious

event. To maximize the utilization in our problem, we maximize the reward function,

which is the weighted sum of the different classes of applications in the system. Therefore

we use the MDP infinite-horizon discounted reward model [102, 103], and we define our

one-step reward function as follows:

R(s, s′) = α ∆+(s, s′)n11 + β ∆+(s, s′)n12 + γ ∆+(s, s′)n22 (6.7)

∆+(s, s′)nij = max {nij(s′) − nij(s), 0}

in which ∆+(s, s′)nij denotes the amount of increase in nij due to the transition from

state s to state s′.

The optimal discounted value function and the optimal policy can be computed using dy-

namic programming techniques and the value iteration algorithm [102, 103]:

Vn+1(s) = max_a { Σ_s′ P^a_ss′ ( R(s, s′) + ε Vn(s′) ) } (6.8)


in which ε is the discounting factor and P^a_ss′ is the transition probability from state s to

state s′ under action a, given as follows:

• When a request for a class i application arrives and we accept it, the transition probability is λi/c.

• When a request for a class i application arrives and we reject it, the corresponding self-transition probability is λi/c.

• The probability that an instance of service j completes is Σi nij µj/c.

• The probability of the fictitious event is 1 − (Σj Σi nij µj + Σi λi)/c.

Now we can recursively compute the sequence of n-stage values Vn(s) using the method

of successive approximations [103] and take the limit of this sequence as n goes to infinity. It

is shown that V(s) := lim_{n→∞} Vn(s) exists and is the solution of the infinite-horizon

discounted problem.
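As an illustration of the whole procedure, the sketch below (ours, not code from the thesis) runs uniformized value iteration on the sample system of Figure 6.8 with the parameters used later in Section 6.2.3; the treatment of a blocked class-1 instance (it leaves with no further reward, having already paid the cost α on admission) is our modelling assumption, so the resulting policy need not match the figures exactly:

```python
# Uniformized value iteration for the two-class, two-service sample system.
N = 6                                  # instances per service type (assumed)
lam1 = lam2 = 1 / 60.0                 # arrival rates (mean inter-arrival 60)
mu1, mu2 = 1 / 30.0, 1 / 40.0          # service rates (mean executions 30, 40)
alpha, beta, gamma = -0.1, 0.5, 0.1    # reward weights
eps = 0.99                             # discount factor
c = N * (mu1 + mu2) + lam1 + lam2      # uniformization constant

states = [(n11, n12, n22)
          for n11 in range(N + 1)
          for n12 in range(N + 1)
          for n22 in range(N + 1 - n12)]

def transitions(s, a):
    """Yield (prob, next state, one-step reward) under action a, where
    a = 0 accepts only class 1, a = 1 only class 2, a = 2 both."""
    n11, n12, n22 = s
    used = 0.0
    if a in (0, 2) and n11 < N:                    # admit a class-1 request
        yield lam1 / c, (n11 + 1, n12, n22), alpha
        used += lam1 / c
    if a in (1, 2) and n12 + n22 < N:              # admit a class-2 request
        yield lam2 / c, (n11, n12, n22 + 1), gamma
        used += lam2 / c
    if n11 > 0:                                    # class 1 finishes service 1
        p = n11 * mu1 / c
        if n12 + n22 < N:
            yield p, (n11 - 1, n12 + 1, n22), beta   # moves on to service 2
        else:
            yield p, (n11 - 1, n12, n22), 0.0        # blocked: forced to leave
        used += p
    if n12 > 0:                                    # class 1 finishes service 2
        yield n12 * mu2 / c, (n11, n12 - 1, n22), 0.0
        used += n12 * mu2 / c
    if n22 > 0:                                    # class 2 finishes service 2
        yield n22 * mu2 / c, (n11, n12, n22 - 1), 0.0
        used += n22 * mu2 / c
    yield 1.0 - used, s, 0.0                       # fictitious self-transition

def q(V, s, a):
    return sum(p * (r + eps * V[s2]) for p, s2, r in transitions(s, a))

V = {s: 0.0 for s in states}
for _ in range(400):                               # successive approximations
    V = {s: max(q(V, s, a) for a in (0, 1, 2)) for s in states}
policy = {s: max((0, 1, 2), key=lambda a: q(V, s, a)) for s in states}
```

The action encoding (0, 1, 2) mirrors the representation introduced above; `policy` then gives the greedy action with respect to the converged value function in each of the 196 states.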

6.2.3 Optimal policy and performance comparison

Based on the presented MDP problem, we computed the optimal policy for the sample

system, which is composed of two types of services and two classes of business processes.

We found the optimal policies for mean request inter-arrival times of λ1⁻¹ = λ2⁻¹ = 60,

mean execution times of µ1⁻¹ = 30 and µ2⁻¹ = 40, (α = −0.1, β = 0.5), and N = 6, and we

set ε to 0.99 in Equation 6.8. We obtained the optimal decision in each state for γ = 0.1 and γ = 0.3.

To reflect the importance of not terminating a business process or application midway

through its execution path, we chose a negative value for α and a positive value for β.

The sum (α + β) reflects the importance of a class 1 application or business process

relative to a class 2 one, which is represented by γ. We chose a negative value for α

because if the system lets an application enter and then, at the completion of the first

step, forces it to leave due to the unavailability of a free service instance, the system

pays a cost of α.

Figures 6.9 and 6.10 show the optimal policy for each case when the system is in

state (n11, n12, n22). We show the results for n22 = 1 (Figures 6.9a and 6.10a)

and n22 = 4 (Figures 6.9b and 6.10b).

a) n22 = 1:

n11 = 6:   1  0  0  0  0  0
n11 = 5:   2  2  0  0  0  0
n11 = 4:   2  2  2  0  0  0
n11 = 3:   2  2  2  0  0  0
n11 = 2:   2  2  2  2  0  0
n11 = 1:   2  2  2  2  0  0
n11 = 0:   2  2  2  2  2  0
n12:       0  1  2  3  4  5

b) n22 = 4:

n11 = 6:   0  0  0
n11 = 5:   0  0  0
n11 = 4:   0  0  0
n11 = 3:   0  0  0
n11 = 2:   2  0  0
n11 = 1:   2  0  0
n11 = 0:   2  2  0
n12:       0  1  2

Figure 6.9: Optimal policy when the system is in state (n11, n12, n22) and γ = 0.1: a) n22 = 1, b) n22 = 4

In all figures, '0' indicates that the system only accepts a request for a class 1 application,

'1' that it only accepts a request for a class 2 application, and '2' that it accepts requests

for both classes of applications. As can be seen, when the weight of a class 2 application

is low and plenty of them are currently being served in the system, our decision-making

mechanism suggests rejecting new requests for class 2 applications (Figure 6.9) and

thereby reserving the remaining resources for class 1 applications. However, if the weight

of a class 2 application or business process is high, we should accept more requests for

that class of application (Figure 6.10). The results also show that if the number of

class 2 applications in the system is high, the system should reject new requests for that

class of application or business process, and reserve the free service instances for the

other class of application.

a) n22 = 1:

n11 = 6:   1  1  1  1  0  0
n11 = 5:   2  2  2  2  0  0
n11 = 4:   2  2  2  2  0  0
n11 = 3:   2  2  2  2  2  0
n11 = 2:   2  2  2  2  2  0
n11 = 1:   2  2  2  2  2  0
n11 = 0:   2  2  2  2  2  0
n12:       0  1  2  3  4  5

b) n22 = 4:

n11 = 6:   1  0  0
n11 = 5:   2  0  0
n11 = 4:   2  0  0
n11 = 3:   2  2  0
n11 = 2:   2  2  0
n11 = 1:   2  2  0
n11 = 0:   2  2  0
n12:       0  1  2

Figure 6.10: Optimal policy when the system is in state (n11, n12, n22) and γ = 0.3: a) n22 = 1, b) n22 = 4

We simulated the described system and compared the performance achieved using the

optimal MDP-based partitioning mechanism with two other policies: the Full Commitment

Policy (FCP) and the No Commitment Policy (NCP). For FCP, we use the Complete

Partitioning (CP) mechanism [101]. In the CP method, the system keeps a constant

number of service instances for each application class and does not allocate that portion

to any other class of application. In our implementation of the CP method, we divided

the resources based on the weight of each class.
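The weight-proportional split used for Complete Partitioning can be sketched as follows; this is our illustration, and the rounding rule (leftover instances go to the highest-weight class) is an assumption, since the text does not specify how fractional shares are handled:

```python
def complete_partition(n_instances, weights):
    """Split a pool of service instances among application classes in
    proportion to their weights (Complete Partitioning baseline)."""
    total = sum(weights)
    shares = [int(n_instances * w / total) for w in weights]
    # Rounding rule (our assumption): any instances left over after
    # truncation go to the highest-weight class.
    leftover = n_instances - sum(shares)
    shares[max(range(len(weights)), key=lambda i: weights[i])] += leftover
    return shares

# Class weights from the experiment: alpha + beta = 0.4 for class 1 and
# gamma = 0.1 for class 2, with N = 6 instances of the shared service type.
shares = complete_partition(6, [0.4, 0.1])
```

With these weights, class 1 receives 5 of the 6 instances and class 2 receives 1.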

Figure 6.11 shows the comparison results between these three methods for the case

where (α = −0.1, β = 0.5, γ = 0.1). The x-axis in this figure represents the mean

request inter-arrival time λ1⁻¹, with λ2 = λ1, as λ1⁻¹ varies from 8 to 60. The

y-axis represents the system revenue or reward, which is the weighted sum of the

number of applications currently being served in the system under each of the

partitioning policies.

Figure 6.11: Performance comparison between the No Commitment Policy, the Full Commitment Policy and the MDP-based partitioning mechanism (α = −0.1, β = 0.5, γ = 0.1)

As can be seen, the MDP-based partitioning policy outperforms the other two mechanisms,

especially when the request rates are high. It can be seen that, when the request

rate is low (inter-arrival time is high), there is no significant difference between the FCP,

NCP and MDP-based partitioning. However, when the load is high, MDP-based partitioning

yields 60% more reward than the No Commitment Policy and 30% more reward than the

Full Commitment Policy.

We also carried out an experiment on the service execution time distribution. So far,

we have used the exponential distribution for the service execution time. For some types

of services, however, this assumption might not be accurate. Using the exponential

distribution, we can assume memoryless properties for the problem and consequently use

the Markov Decision Process approach to obtain the optimal policy. The exponential

distribution is also helpful in studying the problem behavior in the mean sense.

Therefore, we investigated how effective the computed policy would remain if the service

execution times followed another type of distribution. To do so, we assumed a Beta

distribution for the execution time of each service instance.

Figure 6.12: A sample beta distribution

Figure 6.13: Performance comparison between the No Commitment Policy, the Full Commitment Policy and MDP-based partitioning with a beta distribution for service execution time and (α = −0.1, β = 0.5, γ = 0.1)

The Beta distribution has some interesting properties that make it a good candidate

for modeling many types of services and processes [104]. Figure 6.12 shows the beta

probability density function that we used for this experiment. Using this distribution,

we can specify an optimistic estimate, a pessimistic estimate, and a most likely estimate

of the service execution time.

Based on these assumptions, we replaced the execution time of both services with a

beta distribution having the same mean as the exponential distributions µ1⁻¹ and

µ2⁻¹. Figure 6.13 shows the result of this experiment. As can be seen, the optimal

policy found for the exponential distribution achieves satisfactory results for

the Beta distribution case as well. The Beta parameters used for this experiment are

α = 2.33, β = 4.66, m1 = 30, m2 = 40.
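Such service times can be sampled with the standard library alone; this sketch is ours, and the scaling rule (stretching Beta(2.33, 4.66), whose mean is a/(a + b) ≈ 1/3, so that the sample mean matches the target) is an assumption about how the exponential means were matched:

```python
import random

random.seed(42)

def beta_service_time(mean, a=2.33, b=4.66):
    """Sample a service execution time from a scaled Beta(a, b) whose mean
    equals `mean`; Beta(a, b) has mean a / (a + b), so scale by the ratio."""
    return random.betavariate(a, b) * mean * (a + b) / a

# Match the exponential means used earlier: 30 for service 1 (40 for service 2).
samples = [beta_service_time(30.0) for _ in range(20000)]
avg = sum(samples) / len(samples)
```

Unlike the exponential distribution, the scaled Beta is bounded (here by 3 × mean), which is what allows the optimistic, pessimistic and most-likely estimates mentioned above.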

In this section, we presented optimal policies for making admission decisions in service-

oriented systems. These policies, however, are suitable only for small-scale systems, since

their computation becomes infeasible for large-scale systems. Also, these policies are for

service-oriented systems that are exposed to stationary Poisson request arrival processes.

In the next chapters, we extend this work and propose heuristics that can operate

distributedly in large-scale systems and handle both stationary and non-stationary

demands.


Chapter 7

A Distributed Probabilistic Commitment-Control Algorithm

In the previous chapter, we introduced the problem of optimal allocation of services to

applications, and we proposed a Markov Decision Process [103] approach to solve it.

In that chapter, we first studied this problem assuming that applications require all

corresponding service instances throughout their lifetime [11]. We also assumed

exponential distributions for the application request inter-arrival times and for the

application execution times. We next addressed the case in which applications do not

need all corresponding service instances throughout their lifetime [12], again assuming

exponential distributions for application inter-arrival times and service execution

times. In this chapter, we propose an algorithm for the problem of service commitment

with the following desirable properties: the proposed heuristic algorithm does not restrict

the distributions of service execution times and application request inter-arrival times

to any specific type, and it can be implemented in a distributed and scalable environment.

Moreover, the heuristic algorithm can guarantee an important QoS parameter in

a service-oriented environment.

A key challenge in application creation through service composition is to guarantee the

quality of service of the created applications [97, 105, 106, 107, 108]. Guaranteeing quality

of service in a service-oriented environment has received increasing attention as

new types of large-scale applications are built based on this new paradigm [109, 110, 111].

We consider the QoS guarantee problem using an important QoS metric for composite

applications: the probability of successful completion, or equivalently its complement,

the probability of failure. We propose a Distributed Algorithm for Service Commitment

(DASC) that guarantees this QoS parameter. This algorithm can be part of a

service-oriented system that orchestrates the execution of composite applications such

as business workflows, telecommunication applications or mixed IT/telecommunication

applications.

The orchestrator system's main task is to invoke different services according to the

application's execution plan. Generally, each invoked service has an execution

time that is stochastic [112]. In order to guarantee the successful completion of an application,

a service component provider has to provide a characterization of this stochastic behavior

to the system, and the system has to consider this behavior when admitting requests for

composite applications.

If a system overlooks these stochastic characteristics it may excessively invoke a service,

causing it to: serve an excessive number of application instances, resulting in performance

degradation for applications in the system; refuse to serve some application

instances; or queue instances, resulting in unwanted delays in application execution. To

avoid these undesirable events, the system includes an admission control mechanism to

control its commitments to application instances and to guarantee the probability of

successful completion.

The design of an admission controller depends on the properties of the demand for

the applications. For example, if the demand is stationary (stationary arrival rate and

stationary service times), the admission controller can be designed using off-line, steady-

state analyses. In this case, techniques and approximations such as decomposition-based

methods [113, 114, 115] can be used to find the acceptable region of request arrival

rates, and admission mechanisms are then used to enforce arrivals using rate regulators.

If the demand is non-stationary, other techniques are required for admission control.

For example, for each arriving application request, an online admission controller can

calculate the likelihood that all service components of the application can be completed

given the current state of the system.

The DASC algorithm is designed to operate in non-stationary demand environments

(namely, with non-stationary request arrivals), and it uses a predictive model that

delivers a target level of the probability of successful completion for admitted application

instances. DASC does not assume any specific distribution type for application request

inter-arrival times, and it is capable of functioning in a distributed and scalable environment.

DASC assumes that the service execution time distributions in a service-oriented

system are known and remain unchanged over time.

We present two versions of the DASC algorithm: one in which no queuing is permitted

for application instances, and one in which the system queues application instances

instead of dropping them; the latter is discussed later in this chapter.

We present simulations showing that without a commitment control mechanism,

the successful application completion probability can be very low. We also show that our

algorithms are able to meet QoS goals. Moreover, we compare DASC's performance with

alternative steady-state based admission controllers, and we show that DASC performs

better, especially where demand is bursty and non-stationary.

The chapter is organized as follows. In the next three sections, we state the problem

of service commitment in a service-oriented environment, discuss the mathematical

basis of the problem, and present the corresponding modeling and formulation. In

Section 4, we present the DASC algorithm, followed by performance evaluation results in

Section 5.

We extend the proposed algorithm to systems that can provide a limited number of

queuing spots for services. We present the modifications to the formulations and the

performance evaluation results. Finally, we review the related work and discuss our

contribution to this problem.

Figure 7.1: A sample service-oriented environment

Another issue that we do not discuss in this chapter is system revenue maximization.

In this chapter, we assume that all application classes generate the same revenue for

the system, and we are only interested in guaranteeing the quality of service. In the last

chapter of this part, we propose techniques that accompany the DASC algorithm to

maximize the system revenue by admitting more valuable application classes and rejecting

less valuable ones.

7.1 QoS Control in a Service-Oriented System

We are interested in guaranteeing the probability of successful completion of an application

in a service-oriented environment. To clarify the problem, we begin with a simple

example. Figure 7.1 shows a service-oriented environment in which two different

application classes use two different service types. In this example, application class one

begins by executing service one, followed by service two. Application class two uses

only service type two.

The problem of interest in this chapter arises when there are contending requests for

shared services. For example, a high number of requests for application class two might

result in the consumption of all available service two instances. Consequently, application one

instances completing service one would fail to continue their execution because no service two

instances are available. Thus, application one instances have to either leave the system

without completion or face unwanted delays. To avoid this problem, a service-oriented

system can use an admission control mechanism to control its service commitments by

only admitting application instances when they are highly likely to complete successfully.

To design an admission controller for a service-oriented system, we need to consider

the environment in which it operates. In a stationary environment with stationary

demand, we can design an admission controller using off-line, steady-state analyses

of the system. In such an environment, we can find the acceptable arrival rate region in

which the successful completion probability can be guaranteed in the steady state. To

enforce operation within this acceptable region, we can use token bucket regulators at the

portal to a service-oriented system. Although, to the best of our knowledge, there are no

exact closed-form solutions for the associated Finite Capacity Queuing Network (FCQN),

in general there are a variety of approximation- and decomposition-based [113, 114, 115],

simulation-based [116] or bottleneck analysis methods [117] that can be used to find the

region of acceptable request arrival rates in the steady state.
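The rate-regulator idea can be sketched as a token bucket; this is an illustrative implementation of the standard mechanism, not code from the thesis, with rates and timestamps chosen arbitrarily:

```python
class TokenBucket:
    """Minimal token-bucket regulator: a request is admitted only if a token
    is available; tokens refill at `rate` per second up to `burst`."""
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = float(burst), 0.0

    def admit(self, now):
        # Refill tokens for the time elapsed since the last decision.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

tb = TokenBucket(rate=2.0, burst=2)            # at most 2 requests/s on average
decisions = [tb.admit(now) for now in (0.0, 0.1, 0.2, 1.2)]
```

A burst of three requests in 0.2 s exceeds the bucket, so the third request is refused; after the arrival stream pauses, tokens refill and admission resumes.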

When demand is non-stationary we need a different approach that can make admission

decisions based on the current state of the system including the current application

instances that are being served in the system. The design of this type of system requires

a transient-state analysis of the system and involves on-line decision making based on

these transient-state analyses. In this chapter, we present DASC as an algorithm that is

able to handle non-stationary arrivals in a distributed and scalable environment.

In DASC, each service component has an agent that tracks service component usage

as well as future commitments. When a new request for an application arrives at a

service-oriented system (for example, at a service-oriented orchestrator engine), DASC

first queries all the corresponding service component agents in parallel. Each agent

accepts or rejects the admission of that request based on the current state and its anticipation

of future usage, and the admission controller then makes the final admission decision

according to the agent responses. We will show that DASC achieves higher

application throughput than the steady-state based approaches when the service-

oriented system is exposed to bursty request arrivals, while it can still guarantee the

application successful completion (or failure) probability.

DASC requires knowledge of the application execution plans. We assume that in a

service-oriented environment the structure of an application, in terms of its logical service

components and their interconnections, is known. This is not an unreasonable

assumption, especially since the service components involved in an application and the

application execution flow are known in most SO-based applications. DASC also requires

the probabilistic properties of service execution times. This allows DASC to anticipate

an application's future service usage and commit the necessary service instances to each

admitted application instance.
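The query-all-agents round can be sketched as follows. This is a deliberately simplified illustration of ours (each agent merely counts committed instances), whereas DASC's actual agents use the probabilistic anticipation developed in the following sections; all names are hypothetical:

```python
class ServiceAgent:
    """One agent per service component, tracking usage and commitments."""
    def __init__(self, instances):
        self.instances = instances      # number of service instances available
        self.commitments = 0            # instances promised to admitted apps

    def can_commit(self):
        # Accept only if committing one more instance stays within capacity.
        return self.commitments < self.instances

    def commit(self):
        self.commitments += 1

def admit(agents):
    """Admit a request only if every corresponding service agent accepts."""
    if all(agent.can_commit() for agent in agents):
        for agent in agents:
            agent.commit()
        return True
    return False

# Two service components with two instances each; the third request
# finds no free commitments and is refused.
s1, s2 = ServiceAgent(2), ServiceAgent(2)
decisions = [admit([s1, s2]) for _ in range(3)]
```

In the real algorithm the `can_commit` check is replaced by a probabilistic test that the target completion probability would still be met.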

Figure 7.2: Composition Operations (sequential, conditional, parallel and loop)

To model a service-oriented system for the DASC algorithm, we first need to present some

definitions. Assume that in a service-oriented system there are L different application

classes Ai (i = 1, .., L); there exist M different service types represented by Sj (j =

1, ...,M), and each service type has Nj instances. Also, S represents the set of all service

types: S = {Sj : 1 ≤ j ≤ M}, and Ui ⊆ S represents the set of services that are required

for the creation of application i based on the composition function Ci(Ui).

The composition function Ci uses basic operators for service composition:

Definition 1) In a service-oriented environment, services can be composed using

five types of operations (four of which are shown in Figure 7.2):

a) Sequential operation ⊗; Sj ⊗ Sk indicates that service Sk will be executed after

the completion of the execution of service Sj.

b) Conditional operation ○; Sj ○ Sk indicates that the system executes either service

Sj or service Sk. We assume that the probability of choosing service Sj is pj, and of

choosing service Sk is pk, with pj + pk = 1.

c) Parallel operation ⊕; Sj ⊕ Sk means that services Sj and Sk will be executed in

parallel, and the output will not be available until both services finish their execution.

d) Loop operation ⊗l; ⊗l Sj means that the system must execute l sequential iterations

of service type j before continuing the execution of the application.

e) End operation ⊙ for the end of execution.

The sequential operator ⊗ is also used as the fork and join operator together with the

parallel and conditional operators.

We use these operators to analyze the application instance execution time in terms

of each service execution time, according to the application's execution plan.

Definition 2) An application execution path (or execution path) is the path that a given

instance of an application follows, starting from one service type and ending with another.

By definition, there are no conditional operators within a single execution path of

an application.

Definition 3) An application execution plan (or execution plan) is a plan that outlines

all execution paths of an application, including all of its conditional operations, and

describes the sequence of service executions of an application from start to finish.
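Execution paths of a plan can be enumerated mechanically. The nested-tuple representation below is our own illustration, not the thesis's notation; the parallel operator is omitted, since a path through a parallel block is not a simple service sequence:

```python
def paths(plan):
    """Enumerate the execution paths of a plan built from the operators
    above; conditional branches multiply the number of paths."""
    kind = plan[0]
    if kind == "svc":                  # a single service, e.g. ("svc", "S1")
        return [[plan[1]]]
    if kind == "seq":                  # sequential: Sj followed by Sk
        return [a + b for a in paths(plan[1]) for b in paths(plan[2])]
    if kind == "cond":                 # conditional: either branch is a path
        return paths(plan[1]) + paths(plan[2])
    if kind == "loop":                 # loop: l sequential iterations
        return [p * plan[2] for p in paths(plan[1])]
    raise ValueError(kind)

# S1, then either (S2 followed by S3) or two iterations of S4.
plan = ("seq", ("svc", "S1"),
        ("cond", ("seq", ("svc", "S2"), ("svc", "S3")),
                 ("loop", ("svc", "S4"), 2)))
```

By Definition 2, each returned sequence contains no conditional operator: the branch choice has already been resolved.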

Our goal in this chapter is to present an algorithm that can guarantee the probability

of application completion. In other words, we would like to have:

Pfi ≤ π, ∀i ∈ {1, .., L} (7.1)

where Pfi is the probability of failure of one instance of application class i, and π is

an agreed-upon threshold. Furthermore, this algorithm should be scalable, capable

of operating in a distributed environment, and should not involve excessive computation

overhead.

7.2 Probabilistic Modeling of Service Commitment

In this section we consider random variables corresponding to the execution time of

an application. For simplicity of notation, we assume that the service

components that appear in the execution plan for an application are numbered S1, S2,

..., and that each such service appears only once in the plan, so that the corresponding

execution times can be unambiguously denoted by X1, X2, .... The execution time of a

service j instance is a random variable Xj with a probability density function (pdf) and

cumulative distribution function (cdf) given by fj(t) and Fj(t), respectively. The pdf of

the application execution time for one path can be computed by combining the pdfs

of the corresponding services in that path.

We begin with the sequential operator. The execution time of Sj ⊗ Sk is a random

variable Y⊗ which is the sum of the two random variables representing the service

execution times: Y⊗ = Xj + Xk, with pdf fY⊗(t) = fj(t) ∗ fk(t), assuming independence

between execution times and no waiting time between the execution of two consecutive

services.

Similar to the sequential operation, the execution time of the loop operation on one

service is the l-fold convolution of the execution time pdf of that service. In other words,

the execution time of ⊗l Sj is Y⊗l = Σ_{c=1}^{l} Xjc, in which the Xjc are i.i.d. random

variables with pdf fj(t). Thus the pdf of the loop operation is fY⊗l(t) = fj^(l)(t), which

denotes the l-fold convolution of fj(t).

The execution time of the parallel operator (Sj ⊕ Sk) is a random variable Y⊕ equal

to max(Xj, Xk), with pdf fY⊕(t) = Fk(t)fj(t) + Fj(t)fk(t). The execution time of the

conditional operator (Sj ○ Sk) is the random variable:

Y○ = Xj with probability pj, or Xk with probability pk, where pj + pk = 1,

with pdf fY○(t) = pj fj(t) + pk fk(t).
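These operator rules can be checked numerically on a discrete time grid. The sketch below is ours; the two exponential pdfs (means 1 and 2) are arbitrary stand-ins for service execution times, and the computed means should match the analytic values up to discretization error:

```python
import math

dt, T = 0.02, 1000                          # time grid t = 0, dt, ..., ~20
t = [k * dt for k in range(T)]

def expo_pdf(rate):                         # illustrative service-time pdfs
    return [rate * math.exp(-rate * tk) for tk in t]

f1, f2 = expo_pdf(1.0), expo_pdf(0.5)       # means 1 and 2

# Sequential: pdf of Xj + Xk is the convolution f1 * f2.
f_seq = [sum(f1[j] * f2[k - j] for j in range(k + 1)) * dt for k in range(T)]

# Parallel: pdf of max(Xj, Xk) is F2*f1 + F1*f2, with cdfs via running sums.
F1, F2, a1, a2 = [], [], 0.0, 0.0
for v1, v2 in zip(f1, f2):
    a1 += v1 * dt
    a2 += v2 * dt
    F1.append(a1)
    F2.append(a2)
f_par = [F2[k] * f1[k] + F1[k] * f2[k] for k in range(T)]

# Conditional: mixture p*f1 + (1-p)*f2.
p = 0.3
f_cond = [p * v1 + (1 - p) * v2 for v1, v2 in zip(f1, f2)]

mean = lambda f: sum(tk * fk for tk, fk in zip(t, f)) * dt
```

The sequential mean comes out near 1 + 2 = 3, the parallel mean near E[max] = 3 − 1/1.5 ≈ 2.33, and the conditional mean near 0.3·1 + 0.7·2 = 1.7.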

For an instance of application i, suppose we start service j at time zero and consider

the time until we complete a subsequent service k. Let hijk(t) denote the pdf for this

elapsed time and let Hijk(t) be the corresponding cdf. Now suppose we are interested in

the probability that, having started service j at time zero, the application execution is

in service k at time t. Let m be the service that precedes service k in the execution plan;

then the application execution is in service k at time t if: 1. the application execution has

completed service m by time t; and 2. it has not yet completed service k by time t. This

implies that the probability that, having started service j at time zero, the application

execution is in service k at time t is given by:

Gijk(t) = Hijm(t)−Hijk(t) (7.2)

See Appendix C for a derivation of this result. Also, the probability that application i

just started executing service j still be at the same service at time t is simply equal to:

Gijj(t) = 1−Hijj(t) = 1− Fj(t).

Similarly, we can compute G_ijk(t|t_0), the probability that an application class i instance that has been in service j for t_0 seconds will be at service k at time t, by replacing the pdf f_j(t) with its corresponding conditional pdf: f_j(t|t_0) = f_j(t − t_0)/(1 − F_j(t_0)).

Using G_ijk(t), we can now characterize the random variable for the number of busy service instances of any service at any given future time t. Define an indicator function for the event that the application execution will be in service k at time t, having started at service j at time zero:

I_ijk(t) = 1 with probability G_ijk(t), and I_ijk(t) = 0 with probability 1 − G_ijk(t). (7.3)

For applications with multiple execution paths, the probabilities in the above indicator function need to be multiplied by p_ijk, the probability that application i, now in service type j, will visit service type k in the future according to its execution plan.

Similarly, for application instances that have already been in service j for t_0 seconds, the probabilities in the indicator function (7.3) are replaced with their conditional version G_ijk(t|t_0).

The number of busy service instances of type k at time t is found by adding the indicator functions of all instances of applications in the system, of any class i (index i_l), that are being served by some service in the system (index j) at time 0 and can therefore be at service type k at time t:

S_k(t) = Σ_l I_{i_l jk}(t) (7.4)

We can now specify the probability of over-commitment in service type k at a future time t, P_ock(t): the probability that the number of admitted applications needing service of type k at time t exceeds the number N_k of service instances that have been provisioned. In other words, P_ock(t) is the probability that service type k is over-committed at a future time t due to the admission of too many applications:

P_ock(t) = P{ S_k(t) > N_k } (7.5)

In the DASC algorithm, the system computes this probability at the time of receiving a request for an application, to ensure that the system is highly likely to have the necessary free instances to serve the application instance in each of the succeeding service types along the application execution paths, for the time needed. Furthermore, DASC needs to compute the above probability for every future time t in order to meet an agreed service level (Toc). P_ock is a major parameter in our algorithm, and we discuss its computation in later sections. In contrast to other admission control systems, the incoming request rate is not a factor in computing this probability; we only require that the service execution time distributions remain unchanged over time. For this reason our proposed algorithm can operate in systems with bursty or non-stationary request arrivals, and can handle transient surges in the application request rate without compromising the service-level agreements.

We note that in this model, not only can different service components have different service execution time distributions, but a single service type can also have different execution time distributions for different application classes. However, in the rest of this chapter, for the sake of simplicity, we assume that each service component has only one execution time distribution for all application classes.

Consider the application failure probability: the probability that an application instance cannot complete its execution plan due to the unavailability of a free instance of service type k at the time the application needs service k. We call this probability P_fijk, and it can be computed as follows:

P_fijk = ∫₀^∞ P_ock(t) h_ijm(t) dt (7.6)

in which h_ijm(t) is the pdf of the time to complete the execution of all services preceding service type k (up to service m), or equivalently, the pdf for the start of execution of service type k.

The DASC system always keeps P_ock(t) below an over-commitment threshold Toc, so an upper bound for the application failure probability is the over-commitment threshold:

P_fijk ≤ Toc (7.7)

In other words, Toc is the upper bound on the failure probability of an application class i at service k, if it starts its execution from service j. Consequently, to find an upper bound on the total application failure probability, we need to consider the failure probabilities at all services based on the application execution plan, as follows:

P_fij = Σ_{k=j+1}^{l} ( P_fijk · Π_{m=j+1}^{k−1} (1 − P_fijm) ) (7.8)

where we assume that the last possible service is service l. Each term in the above sum is the probability that the application execution fails at service k. By taking partial derivatives of the above equation, it can be shown that P_fij is a monotonically increasing function of P_fijk. Therefore, an upper bound for P_fij can be obtained by substituting the upper bound Toc for P_fijk:

P_fij ≤ 1 − (1 − Toc)^(l−j) (7.9)

in which (l − j) represents the maximum number of services that an application i instance has to traverse to finalize its execution.
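Equations (7.8) and (7.9) are straightforward to evaluate numerically. The sketch below is our own illustration (not thesis code; the function and argument names are assumptions): it composes per-service failure probabilities into the total failure probability and evaluates the matching upper bound.

```python
def total_failure_prob(per_service_pf):
    """Eq. (7.8): the application fails at some service along the path,
    i.e. it survives every earlier service and then fails at service k."""
    total, survive = 0.0, 1.0
    for pf in per_service_pf:
        total += survive * pf       # fail here, having survived so far
        survive *= 1.0 - pf         # survive this service too
    return total

def failure_bound(toc, n_services):
    """Eq. (7.9): the bound 1 - (1 - Toc)^(l - j), reached when every
    per-service failure probability equals its maximum Toc."""
    return 1.0 - (1.0 - toc) ** n_services
```

When every per-service failure probability equals Toc, (7.8) collapses exactly to the bound (7.9), which is a quick sanity check on both functions.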

In the next section, we focus on the over-commitment probability, and to compute

this probability we use the Central Limit Theorem (CLT).


7.3 Computing Over-Commitment Probability

The random variable in (7.3) is a Bernoulli random variable at time t, and S_k(t) in (7.4) is the sum of multiple non-identically distributed Bernoulli random variables. Therefore, the mean and variance of the indicator function are:

E[I_ijk(t)] = G_ijk(t)
VAR[I_ijk(t)] = G_ijk(t) − G_ijk(t)²

Consequently, the mean and variance of the sum random variable are:

η_k(t) = E[S_k(t)] = Σ_l E[I_{i_l jk}(t)] = Σ_l G_{i_l jk}(t) (7.10)

σ_k(t)² = VAR[S_k(t)] = Σ_l VAR[I_{i_l jk}(t)] + Σ_{l≠l′} COV(I_{i_l jk}(t), I_{i_{l′} nk}(t)) (7.11)

in which l and l′ index the instances of applications currently in the system.

Now imagine that there is an unlimited number of servers available to support each service type. In that case, the application instances all flow along their execution paths without having to contend with each other for servers, so they do not interact at all, and their corresponding indicator functions are independent random variables. Because the over-commitment probability will be small, we can suppose the numbers of servers of the various types are ample. We therefore assume that the Bernoulli random variables in equation (7.4) are independent, so the covariance terms above are zero and the variance of the sum random variable is:

σ_k(t)² = Σ_l VAR[I_{i_l jk}(t)] = Σ_l G_{i_l jk}(t) − Σ_l (G_{i_l jk}(t))² = η_k(t) − Σ_l (G_{i_l jk}(t))² (7.12)

We know from the Central Limit Theorem (CLT) [118] (p. 278) that the sum of n independent random variables approaches a Gaussian random variable whose mean and variance equal the sums of the means and variances, respectively, of the individual random variables. Therefore, the over-commitment probability can be approximated using the CLT as:

P_ock(t) = P{ S_k(t) > N_k } = 1 − Φ( (N_k − η_k(t)) / σ_k(t) ) (7.13)

in which Φ is the cdf of a Gaussian random variable with mean η = 0 and variance σ² = 1. The Central Limit Theorem approximation of the over-commitment probability becomes more accurate as the number of application instances in the system and the number of service instances grow large, which is the case for many real systems.

In summary, whenever an application request enters the service-oriented environment, the application admission control system computes the probability of over-commitment at every time during the application's lifetime, for every service along its execution path, and if that probability is less than the permitted threshold (Toc), it allows the application request to enter the system.
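Under the CLT approximation, this admission check reduces to a mean and variance accumulation over the per-instance probabilities G_{i_l jk}(t). The following is a minimal sketch of ours (function and argument names are assumptions; the standard-normal cdf is computed via the error function):

```python
import math

def overcommit_prob(g_values, n_k):
    """Eqs. (7.10)-(7.13): treat each in-flight instance as an independent
    Bernoulli with success probability g = G_{i_l jk}(t), then approximate
    P{S_k(t) > N_k} with a Gaussian of matching mean and variance."""
    eta = sum(g_values)                          # eq. (7.10)
    var = sum(g * (1.0 - g) for g in g_values)   # eq. (7.12)
    if var == 0.0:
        return 0.0 if n_k >= eta else 1.0
    z = (n_k - eta) / math.sqrt(var)
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal cdf
    return 1.0 - phi                             # eq. (7.13)

def admit(g_values, n_k, toc):
    """Admit the request only if the over-commitment probability at this
    time stays below the agreed threshold Toc."""
    return overcommit_prob(g_values, n_k) < toc
```

In the full algorithm this check is repeated for every future time t and every service type along the execution path; the sketch evaluates a single (service type, time) pair.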

Another technique for computing the over-commitment probability is to use the theory of large deviations and Chernoff's bound. Chernoff's bound yields better approximations when the target threshold Toc is very small, while the CLT-based technique gives good approximations for target thresholds of about 10^−3 and higher. In the Appendix, we discuss this alternative method for computing the probability using the theory of large deviations and Chernoff's bound. In the next section, we present the Distributed Algorithm for Service Commitment in more detail, and we discuss how this system can be implemented in a distributed environment.

Figure 7.3: A service-oriented system with three agents, each controlling one service type

7.4 Distributed Algorithm for Service Commitment

Figure 7.3 shows the decentralized implementation of the service commitment function. Each service type is controlled by one agent, whose task is to monitor the instances of that service type. Whenever an agent starts serving an application instance, it informs the agents responsible for the succeeding service types that it has just started the execution of that instance. The recipient agents store the relevant information regarding each particular application instance and use it to compute their over-commitment probabilities every time a request for an application arrives. In other words, the agent for service type k computes the parameters of the random variable S_k(t) for all future t.

Figure 7.4: Distributed Algorithm for Service Commitment in SDL (Specification and Description Language)

In this distributed algorithm, when the admission controller receives a request for an application, it asks the corresponding agents whether they will have enough resources to serve that application during the period in which the application is anticipated to be served by their associated service types. Since all agents keep records of the applications that are likely to use their service type, they can reply to the query with a simple 'yes' or 'no'. If the replies are all yes, the admission controller admits the application and tells the corresponding agents to commit the necessary resources for the newly admitted application instance.

It is noteworthy that the agents do not need to compute the relevant distribution

functions each time they receive a query from their preceding agents. Those distributions

can be provided to each agent by another computing module in the system, and the agents

can store them in memory and use them as the need arises. Also, to avoid over-commitment, a queried agent that agrees to serve the application can temporarily commit its resources for a limited time, until it receives another message from the admission controller confirming the acceptance or rejection of the application request.
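This query, temporary-commit, and confirm-or-release exchange can be sketched in-process as follows. This is a simplified illustration of ours: class and method names are our own assumptions, and a real deployment would exchange these as network messages between distributed agents rather than direct method calls.

```python
class Agent:
    """Controls one service type: answers over-commitment queries and
    holds temporary commitments until they are confirmed or released."""

    def __init__(self, name, check_fn):
        self.name = name
        self.check_fn = check_fn   # returns True if Pock stays under Toc
        self.tentative = set()     # temporarily committed requests
        self.committed = set()     # confirmed commitments

    def check(self, req_id):
        """Over-commitment check; on success, hold a temporary commitment."""
        if self.check_fn(req_id):
            self.tentative.add(req_id)
            return True
        return False

    def commit(self, req_id):
        """Admission confirmed: promote the temporary commitment."""
        self.tentative.discard(req_id)
        self.committed.add(req_id)

    def release(self, req_id):
        """Admission rejected: drop the temporary commitment."""
        self.tentative.discard(req_id)

def admission_control(agents, req_id):
    """Admit only if every agent along the execution path answers yes;
    otherwise release the temporary commitments already taken."""
    said_yes = []
    for agent in agents:
        if agent.check(req_id):
            said_yes.append(agent)
        else:
            for a in said_yes:
                a.release(req_id)
            return False
    for agent in agents:
        agent.commit(req_id)
    return True
```

The temporary commitment prevents two concurrently evaluated requests from both counting the same free capacity, which is exactly the race the limited-time commitment in the text guards against.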

The SDL (Specification and Description Language) [119] diagram in Figure 7.4 presents the DASC algorithm. The admission controller queries the corresponding agents upon receiving a request to admit an application instance. The diagram shows the messages exchanged between the admission controller and the agents responsible for the services used in creating an application, as well as the agents' internal states and their interactions in DASC.

7.4.1 DASC Complexity Analysis

DASC is a distributed algorithm in which each agent is responsible for controlling one

service type. Therefore, for complexity analysis, we focus on one agent and we investigate

its processing and memory requirements.

If we represent the maximum lifetime of the longest living application in the system


by T, then the memory needed to store the future estimate of instance usage for a service type is O(T). Further, an agent has to store some information for each application instance in the system that might use its service in the future. If we denote the maximum number of application instances in the system by Na, then the memory for storing application-specific data is O(Na). In total, each agent therefore needs O(T + Na) memory to store the data required by the algorithm.

In addition, for each incoming request, an agent has to compute the over-commitment probability over the maximum lifetime of the longest-living application in the system. The processing complexity per agent is therefore O(T).

Since the algorithm is distributed, we also need to analyze its communication overhead. In DASC, the admission controller has to query the corresponding agents in order to admit a request, so the admission overhead is O(K), where K is the maximum number of services in the system. Also, as the application proceeds through its execution, each agent must notify the succeeding agents of the latest change in the instance's location. The total communication overhead for an admitted application instance is therefore O(K(K − 1)).

For example, for a system with 12 service components (presented in the performance evaluation section), when a request enters the system the admission controller communicates with 12 other agents (in the worst case) to make an admission decision. These 12 messages are sent and processed in parallel, and each combined communication and computation takes less than 10 ms, so the total decision-making time is under 10 ms. We believe this decision-making time is acceptable for many systems and applications. In addition, in this system each agent needs at most 500 KB of memory.

The number of exchanged messages could be reduced if the bottleneck services in a system are identified through off-line analysis and only the agents responsible for those services are queried when making the decision. In some systems, this reduction would be significant if only a small portion of the services are bottleneck services.

Figure 7.5: Beta pdf for service execution time with parameters α = 2.333 and β = 4.666

7.5 DASC Performance Evaluation

In this section, we present performance evaluation results for our proposed algorithm on two different systems. The performance metric of interest is the application failure ratio: the ratio of the number of failed applications to the number of applications admitted to the system. We would like this ratio to stay below the threshold set for the failure probability. We also evaluate the application failure ratio in each service type. Moreover, we compare the DASC algorithm against steady-state-based admission control systems in terms of both application failure ratio and application throughput.

We begin by simulating the simple system described in the first section and depicted in Figure 7.1, which is composed of two service types and two application classes. We assume that service provisioning has been performed and that 100 instances of each of service types one and two have been provisioned. We also assumed identical beta distributions for the service execution times of both services (Figure 7.5). We chose the Beta distribution because it can represent many pdf shapes and is hence useful in modeling many types of services [104]. The Beta pdf parameters used in our experiment are α = 2.333 and β = 4.666.

Figure 7.6: Application failure ratio for a system with two application classes and two service types

To generate application request inter-arrival times, we used a geometric distribution with parameter p ranging from 0.01 to 0.1 in steps of 0.01. Figure 7.6 shows the failure ratio of application class one at service type two for four different cases: no commitment control, and the DASC algorithm with thresholds of 0.5, 0.1, and 0.01. Without the DASC algorithm the performance is very poor, i.e., more than 50% application failure at high request rates. By applying the DASC algorithm, the system achieves its target QoS even when the request rate is high.

Figure 7.7: Comparing DASC throughput with the bottleneck-based admission control algorithm

We compared the DASC performance with an alternative steady-state-based admission control mechanism designed using bottleneck analysis of the system [117]. In this method, we identify the bottleneck service (S2 in our system) and approximate the system performance by the bottleneck service performance. At the bottleneck service, we used the Erlang-B formula to find the acceptable region of steady-state arrival rates that keeps the probability of overflow at S2 at the target levels of 10^−2 and 10^−3. We then use token regulators on the request arrival processes to enforce the acceptable arrival rates.
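The Erlang-B blocking probability used by this baseline has a standard, numerically stable recurrence. The sketch below is ours: `erlang_b` implements the recurrence, and `max_load` is a simple search we add to find the largest admissible offered load for a given blocking target.

```python
def erlang_b(offered_load, n_servers):
    """Erlang-B blocking probability B(E, N), computed with the stable
    recurrence B(E, 0) = 1;  B(E, n) = E*B(E, n-1) / (n + E*B(E, n-1))."""
    b = 1.0
    for n in range(1, n_servers + 1):
        b = offered_load * b / (n + offered_load * b)
    return b

def max_load(n_servers, target, step=0.01):
    """Largest offered load (in erlangs) whose blocking stays below the
    target; a coarse linear search is enough for a rate-region estimate."""
    load = 0.0
    while erlang_b(load + step, n_servers) < target:
        load += step
    return load
```

A token regulator configured with the rate returned by `max_load` would then enforce the acceptable arrival region, as described in the text.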

We compared the throughput of a system controlled using Erlang-B and a system controlled by the DASC algorithm, for geometric request arrivals (Figure 7.7a) and for on-off bursty request arrivals (Figure 7.7b). For the bursty arrivals, we generated a burst of request arrivals using a geometric distribution with parameter 0.01 for a period of T, followed by another burst of arrivals with parameter 0.1 for a period of T. Figure 7.7b shows the application throughput of the on-off bursty arrival process as a function of T. DASC outperforms the steady-state-based admission control system in both the stationary and bursty arrival cases in terms of application throughput, while meeting the target QoS. The improvement is more significant in the bursty case, since DASC makes the admission decision based on the current state of the system and its anticipation of future usage.
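The on-off arrival pattern can be reproduced with a small generator. This is a sketch under our own parameter names: inter-arrival gaps are geometric, the two alternating periods use the low and high parameters from the text, and a gap that crosses a period boundary is simply truncated at the boundary.

```python
import random

def onoff_arrivals(T, p_low=0.01, p_high=0.1, cycles=2, seed=1):
    """Absolute arrival times for an on-off process: geometric inter-arrival
    gaps with parameter p_low for a period of T, then p_high for T, and so
    on, alternating for the given number of periods."""
    rng = random.Random(seed)
    t, times = 0, []
    for c in range(cycles):
        p = p_low if c % 2 == 0 else p_high
        end = (c + 1) * T
        while True:
            gap = 1
            while rng.random() > p:   # draw a geometric(p) gap
                gap += 1
            if t + gap >= end:
                t = end               # gap crosses the boundary: truncate
                break
            t += gap
            times.append(t)
    return times
```

The mean arrival rate within each period equals the geometric parameter, so the second (p = 0.1) period carries roughly ten times the traffic of the first.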

Figure 7.8: A service-oriented environment consisting of twelve service types and three applications

Figure 7.9: Application failure ratios in the system

Our simulations also show that the choice of target failure probability affects system throughput. The lower we set this target, the more conservative the system becomes in admitting application requests, which reduces application throughput. Also, as we increase the number of service instances, the approximations become more accurate, mainly because the CLT approximation improves.

Next we simulated the more complex system depicted in Figure 7.8, which consists of twelve service types and three application classes. The applications in this system have sequential, conditional, and parallel operations, and application class two has one loop operation. Again, we assume that provisioning has been performed and that 200 instances of each service type have been provisioned. For the service execution times, we assumed identical beta distributions for all twelve service types. We set the threshold for the total application failure to 10^−2 and used bound (7.9) to set the threshold for the over-commitment probability of each service type (Toc) to 1.5 × 10^−3.

Figure 7.10: Failure ratios in services 1 to 6 vs. application request rates

Figure 7.11: Comparison between four admission control mechanisms with stationary request arrivals

Figure 7.12: Comparison between four admission control mechanisms with on-off bursty request arrivals with burst time (T)

The parameters that we evaluated in this simulation are the total application failure ratio and the application failure ratios in each service type separately. Moreover, we compared the DASC performance on this system with three other admission control mechanisms. The simulation period consisted of 750,000 time units. For generating application

requests, we used geometric distributions with parameter p ranging from 0.01 to 0.1. In

this sample service-oriented environment, our analysis shows that the bottleneck service

is S3. Therefore, we computed the required parameters for different values of p ranging

from 0.01 to 0.1 in 0.01 steps, which covers a low request rate up to a request rate that

loads the system with twice its provisioned capacity at the bottleneck service (S6).

Figure 7.9 shows that even under very high request rates, the total application failure ratio using DASC remains under the guaranteed level of 10^−2. We also measured the individual application class failure ratios at each service component. Figure 7.10 shows these measured failure ratios at services S1 to S6; they are all below the target threshold (1.5 × 10^−3), even under very high request rates.

We also compared the DASC performance against three other admission controllers with both stationary and non-stationary request arrivals. Two of the admission controllers are token bucket regulators that enforce an acceptable region of arrival rates on the arrival process. In one of these, the acceptable region is obtained using the bottleneck analysis of the system and applying the Erlang-B formula as described before. The other admission controller uses simulation-based techniques to find the best steady-state arrival rates that maximize throughput while keeping the failure probability below the target threshold of 10^−2. The third controller applies no admission control on the arrival process and admits an application request if there exists a free instance of the first service component in its execution plan.

Figure 7.11 shows the measured application throughput and failure ratios for stationary arrivals as a function of the request rate, and Figure 7.12 shows the same parameters for the on-off bursty request arrivals as a function of the burst period (T). DASC outperforms the other mechanisms in both cases in terms of total throughput, and it meets the QoS target. With stationary arrivals and high request rates, the improvement is approximately 20% compared to the next best method (simulation-based). Note that the bottleneck approach is overly conservative: it provides lower throughput and very low application failure ratios. The improvement becomes much more visible with non-stationary request arrivals, mainly because DASC is able to take advantage of the "openings" through transient-state analysis of the system. The DASC throughput is higher than the other methods when the burst period is large, and it achieves throughput comparable to the no-commitment algorithm when the burst period is small, while still meeting the target QoS. Interestingly, due to the transient-analysis property of the DASC algorithm, for some small burst periods DASC can find more openings, and hence achieve a higher throughput than at other small burst periods, while still keeping the failure probability below the threshold.


7.6 Queue-enabled Distributed Algorithm for Service Commitment

In the previous sections, we presented the Distributed Algorithm for Service Commitment (DASC) as an application admission control mechanism for service-oriented environments, able to guarantee the probability of successful completion for admitted application instances. So far, we assumed that the system is not allowed to queue application instances: if an application instance finds no free instance of a service at the time it needs that service, the application instance leaves the system. In this section, we modify our algorithm so that a service offers a small number of queuing spaces to mitigate application failures. We allow queuing, but keep its usage under an agreed level.

The number of required queuing spaces in a DASC-controlled queue-enabled system is very small compared to the number of service instances, since the DASC algorithm keeps the probability of over-commitment very low. For instance, if the threshold for the probability of over-commitment is T_oc and the total number of instances of a service type is N, then we roughly need at least T_oc · N queuing spaces to mitigate the application failures. Since T_oc is usually very low, the number of queuing spaces is significantly smaller than the total number of service instances.
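As a back-of-the-envelope sketch of this sizing rule (the threshold and instance count below are hypothetical, and the helper name is ours):

```python
import math

def min_queue_spaces(t_oc: float, n_instances: int) -> int:
    """Rough lower bound on queuing spaces: T_oc * N, rounded up."""
    return max(1, math.ceil(t_oc * n_instances))

# With a 10^-2 over-commitment threshold, even 500 instances of a
# service type call for only about 5 queuing spaces.
print(min_queue_spaces(1e-2, 500))  # 5
```

The point is the ratio: the queue budget grows with T_oc · N, not with N alone.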

This section is organized as follows. In the next subsection, we present the modifications to the formulations that accommodate the queuing capability in the system. Then, we discuss the extensions to the distributed algorithm that allow it to make the admission decision in a distributed environment. This subsection is followed by the performance evaluation section.

In the appendix, we present a set of theorems and corollaries that are used in obtain-

ing the required parameters in the Queue-enabled DASC algorithm (Q-DASC) and are

referred to in the formulation and algorithm section.


7.6.1 Problem Formulation and Description

To start applying this extension to DASC, we need a brief analytical description of the parts of the DASC algorithm that we will modify.

One of the main parameters in DASC is G_{ijk}(t), the probability that an application instance i, having just started execution of service type j, will be at service k at time t, as formulated in (7.2). We need to consider the effect of adding a queue on this parameter.

Assume that an application instance i arrives at service j and finds itself at the q-th spot in the queue (the 1st spot being the head of the queue), and assume that there will be no further queuing for that application instance along its way to service k. Then we have:

h^q_{ijk}(t) = g^q_j(t) * h_{ijk}(t)

h^q_{ijm}(t) = g^q_j(t) * h_{ijm}(t)        (7.14)

in which g^q_j(t) is the pdf of the Time to Enter Service (TES) for the queued application instance.

By replacing h_{ijk}(t) with its queue-enabled representation h^q_{ijk}(t), we have:

G^q_{ijk}(t) = H^q_{ijm}(t) − H^q_{ijk}(t)        (7.15)

in which H^q_{ijk}(t) is the cdf of h^q_{ijk}(t).

Similarly, the probability that application instance i, having just joined the q-th spot in service j's queue, is still in the queue or is executing service j at time t is G^q_{ijj}(t) = 1 − H^q_{ijj}(t).

Finding a closed form for this distribution in the general case is quite difficult and impractical. In the appendix, we present a series of results that are used to find a lower bound for this TES distribution for the queued instances. In queue-enabled systems, we


use this bound to compute the required over-commitment probabilities.

In the Q-DASC algorithm, the agent responsible for the queue computes the TES mean (η^q_j) and variance ((σ^q_j)^2) for the queued application instance using the results in the last section (Theorem 3 and Corollary 3). It then reports these parameters to the succeeding agents. The succeeding agents, in turn, compute the convolutions in (7.14) using the received parameters, assuming that the TES distribution is a Normal distribution with parameters (η^q_j, (σ^q_j)^2), apply them to (7.15), and update their future resource usage:

h^q_{ijk}(t) = N(t; η^q_j, (σ^q_j)^2) * h_{ijk}(t)        (7.16)

If (σ^q_j)^2 is much less than the variance of h_{ijk}(t), the above Normal distribution can, in comparison to the h_{ijk}(t) distribution, be treated as a delta function centered at η^q_j, i.e. δ(t − η^q_j). This can easily be shown using a frequency-domain analysis of the two distributions. In this case, the equations in (7.14) and (7.15) become:

h^q_{ijk}(t) = g^q_j(t) * h_{ijk}(t) ≈ h_{ijk}(t − η^q_j), ∀t > η^q_j

G^q_{ijk}(t) = H^q_{ijm}(t) − H^q_{ijk}(t) ≈ H^q_{ijm}(t − η^q_j) − H^q_{ijk}(t − η^q_j), ∀t > η^q_j        (7.17)

As can be seen, in this case the future estimates are simply shifted versions of the estimates used in the queue-less DASC algorithm.
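To illustrate the shift approximation numerically, the following sketch (all distribution parameters are made up for illustration) convolves a hypothetical service-time density with a narrow Normal TES density and compares the result against a plain time shift by η^q_j:

```python
import numpy as np

dt = 0.01
t = np.arange(0.0, 40.0, dt)

# Hypothetical h_ijk(t): Normal density with mean 20 and std 3.
h = np.exp(-0.5 * ((t - 20.0) / 3.0) ** 2) / (3.0 * np.sqrt(2.0 * np.pi))

# Narrow TES density g_j^q(t): Normal with eta = 5 and sigma = 0.2 << 3.
eta, sigma = 5.0, 0.2
g = np.exp(-0.5 * ((t - eta) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Queue-enabled density: discrete convolution of g and h (times dt).
h_q = np.convolve(g, h)[: len(t)] * dt

# Delta-function approximation: shift h right by eta.
h_shift = np.interp(t - eta, t, h, left=0.0)

# When sigma is small relative to h's spread, the two curves agree closely.
print(np.max(np.abs(h_q - h_shift)))
```

With these numbers the maximum pointwise gap is on the order of 10^-4, so the shifted density is a very close stand-in for the full convolution.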

7.6.2 Q-DASC Performance Evaluation

In order to evaluate Q-DASC performance and examine its effect on the quality of service, we simulated the complex system introduced in Figure 7.8.

We first assumed that all service types had an ample number of queuing spaces, and we


[Figure: "Applications Queuing Ratio" (×10^-3) vs. "Applications Request Rate" for APP1, APP2, and APP3.]

Figure 7.13: Applications queuing probability with ample number of queuing spaces using Q-DASC algorithm

measured the applications queuing ratio instead of the applications failure ratio. The queuing ratio is obtained by dividing the number of queued application instances by the total number of admitted application instances. In particular, we wanted to check whether the proposed Q-DASC algorithm would operate below a target queuing ratio.
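The queuing-ratio computation itself is just a division; a minimal sketch with made-up counts:

```python
def queuing_ratio(num_queued: int, num_admitted: int) -> float:
    """Fraction of admitted application instances that were ever queued."""
    if num_admitted == 0:
        return 0.0
    return num_queued / num_admitted

# Hypothetical counts: 37 queued out of 12,500 admitted instances.
ratio = queuing_ratio(37, 12_500)
print(ratio)           # 0.00296
print(ratio < 1e-2)    # True: below the 10^-2 target
```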

In another simulation, we assumed that services have only a few queuing spots, and we determined whether these few spots would translate into lower application failure ratios.

The reason for this experiment is to show that the system rarely needs to queue the appli-

cation instances since the commitment control mechanism restrains the over-commitment

probability.

In this section, we assume service component characteristics identical to those described in Section 7.5, and we also use the same geometric-distribution-based request arrival generator and thresholds.

Our first measurement is the application queuing ratio. The target level for this ratio is 10^-2. Figure 7.13 depicts the queuing ratio for all three application classes as a function of the applications request rate, assuming that there is an ample number of queuing spaces. Clearly, the Q-DASC algorithm keeps the queuing ratio under the threshold even when


[Figure: "Applications Failure Ratio" (log scale, 10^-5 to 10^-2) vs. "Queue Size" (0 to 7) for APP1, APP2, and APP3.]

Figure 7.14: Applications failure probability based on queue size in Q-DASC algorithm

the offered load to the system is very high.

Figure 7.14 shows the effect of the number of queuing spots on the applications failure probability. In this simulation, we assumed that the requests for applications follow a geometric distribution with parameter p equal to 0.1, which offers a load of almost twice the system's capacity. It is evident that by adding a few queuing spaces for the over-committed application instances, we can significantly reduce the applications failure ratio, in comparison to a queue-less system, even when the offered load is very high.

7.7 Related Work

One of the main issues in service-oriented systems that has been extensively studied is the problem of QoS-aware service composition. This problem deals with cases where each service component has a specific set of QoS parameters and an overall QoS constraint has to be met for a composite application [98, 97, 105, 106, 107, 108]. Among the papers discussing this problem, we can mention [98], in which the authors formulated the problem as a linear program, and [105], in which the authors proposed heuristics for optimal service composition considering general distributions for services.


We consider our work an extension of these works, since we guarantee successful completion of an application according to the service-level agreements.

The probabilistic nature of service execution and its influence on contracts between service providers and application providers has also been studied in [112], in which the authors argued that, instead of contracts based on hard bounds, probability distributions can be used in soft contracts between web service providers and their clients.

In the first chapter of this part of the thesis, the problem of service allocation in service-oriented environments was introduced, and the optimal solution of the problem using Markov Decision Processes [103] was presented. The computation of optimal admission and allocation policies using MDPs has limitations for real large-scale systems, especially due to the problem of state-space explosion and the assumptions on execution distributions. In this chapter, however, we extended this work and proposed algorithms for guaranteeing quality of service in service-oriented systems.

In addition to the area of application creation through service composition, the work in this chapter touches on other research fields. In the operations research field, for example, we can point to relevant research on admission control to a network of loss queues. In [120], the authors proposed optimal solutions for admission control to two queues in tandem, assuming only two user classes and exponential distributions. In [121], the authors extended this work to multiple queues in tandem and also presented a heuristic algorithm. However, guaranteeing QoS is not a concern in their work.

In queuing theory, there has been a vast amount of research on analyzing queuing

network performance metrics [116]. While there are many types of queuing networks, very

few have exact analytical solutions for performance parameters [116, 122, 115, 114], and

many approximation techniques have been proposed to find approximate performance

metrics (especially throughput) [113, 123, 124, 125, 126, 127]. We modeled our problem as an open Finite Capacity Queuing Network (FCQN) with limited or no waiting spaces and with loss [113]. These networks do not have closed-form solutions [114], and generally


approximations are used to analyze their performance metrics in steady state. For example, in [113], the authors presented a technique based on the queuing network analyzer [123] that approximates the throughput and expected waiting time. The authors found that the approximations are more accurate under light and moderate load, and become less accurate when the system is under heavy load. This and other methods are based on decomposing the network into individual queues. We direct interested readers to [114] for a complete survey of these methods.

The inclusion of fork-join queues in a network makes its analysis more complicated. In fact, for fork-join queues exact analytical results exist only for the mean response time of a two-server system [128, 129]. Although these types of queues appear in many applications, they have not received much research attention because they are very difficult to analyze. For example, in [126], the authors proposed an approximation technique for an open network containing both fork-join and normal queues, but they assumed a blocking-type open FCQN composed of M/M/C/K queues.

Another decomposition-based approximation method is the bottleneck analysis discussed in [117] and [130]. In this approach, the bottleneck queue is determined, and through its analysis, approximations for the network of queues can be obtained.

To the best of our knowledge, our work is the first to use a probabilistic approach to control admission in an open FCQN with losses that can guarantee the loss probability for systems with non-stationary request arrival processes.

Admission to networks of queues has also been studied by the telecommunications research community in the context of admission to wireless networks. A comprehensive survey of this field can be found in [131]. The context of our problem, however, is different, since we are dealing with composing multiple services and creating new applications. In addition, while assuming exponential distributions for calls in wireless networks seems reasonable, it would not be an accurate assumption in service-oriented environments. Moreover, the parallel and loop operations in service composition have no match in the


wireless cellular networks CAC problem.

The closest wireless CAC algorithm to our problem is introduced in [132], in which the authors also used a convolution-based approach to predict future resource usage in the cellular network. However, in that paper the authors stopped short of analytically computing the call-dropping probabilities in the way we formulated the over-commitment and application failure probabilities.

Other prediction-based papers in the field of CAC in wireless networks use linear predictors and Wiener-process-based predictors for future resource usage [133], which basically anticipate the future based on the past. In our problem, however, by utilizing knowledge of the execution plans and service execution times, we can anticipate future resource usage analytically and much more accurately.

In the real-time operating systems field, there are numerous articles on scheduling and admission control mechanisms for real-time tasks [134, 135]. Recently, the focus has shifted toward tasks and jobs with probabilistic execution times [136, 137]. For example, in [137], the authors, in addition to presenting a survey of the relevant publications, described a technique for computing the probability of missed deadlines in a uniprocessor real-time operating system. In [138], the authors approximated the task execution distribution by Coxian distributions of exponentials and performed the schedulability analysis for a multiprocessor real-time application. Although our work in this chapter is presented for a system that orchestrates the execution of applications, and operates mainly at the application level in a distributed, service-oriented environment, variations of this modeling could also be applied to real-time, large-scale, distributed multiprocessor systems for the purpose of schedulability analysis.

In the next chapter, we study another issue in making admission decisions in service-oriented systems: the problem of system revenue maximization. The system revenue in service-oriented systems can be maximized by admitting more valuable application classes to the system and rejecting less valuable ones, considering the applications'


request arrival rates. We also present an application admission control system that com-

bines the DASC algorithm and the reward-based admission controller to maximize the

system revenue as well as to guarantee QoS.


Chapter 8

Application Admission Control System

In a service-oriented environment, service instances are allocated to composite applications so that the required performance is provided. Application admission control can be used to ensure that appropriate numbers of instances are committed to applications, given the revenue each application brings to the system and the system's current commitments. The techniques described so far are able to control the over-commitment and failure probabilities and guarantee the application success probability. However, they do not address the issue of maximizing the system's overall revenue.

In this chapter, we extend our study by proposing an application admission control system for service-oriented environments. The proposed system makes the admission decision in two steps. Upon receiving a request for an application, in the first step the system checks, according to its current commitments, whether it can guarantee the target QoS in terms of the probability of successful completion. This check is the feasibility-check part of the admission control system, and it uses the Distributed Algorithm for Service Commitment (DASC) [14] described in the previous chapter.

In the second step, a revenue maximization unit is used to maximize the system


Application Admission Control System 166

revenue by accepting more valuable applications into the system. For this unit, we propose two approaches. The first is a steady-state-based revenue maximization approach that is simple to implement but does not capture the transient state of the system. The second is a more elaborate method based on online optimization techniques, which itself consists of three sub-blocks. The main sub-block is an online optimizer that solves a binary integer programming problem to maximize system revenue. The approaches proposed in this chapter differ from the MDP-based solutions of Chapter 6 in that they avoid the MDP methods' assumptions of exponential service execution times and request inter-arrival times.

In this chapter, we first state the problem of reward-based admission. The steady-state-based approach is discussed next, and then the online optimization approach to application admission control and its main blocks are discussed in Section 8.3. These blocks are the feasibility check block, the scenario generator block, the online optimizer block, and the final decision maker. The binary integer programming problem is formulated in that section as well. Lastly, we present the performance evaluation and comparison results.

8.1 Problem Statement

Assume a service-oriented environment in which there are different service types, and in which different applications can be created by composing sets of these service types. Each instance of an application requires each given service type during part of the application lifetime. Service instances can be reused by other application instances as soon as they become idle.

Figure 8.1 shows an example system with 3 types of applications and 3 service types:

Application 1 is composed of service types 1, 2 and 3; application 2 is composed of service

types 2 and 3; and application 3 is composed solely of service type 3.

In the example, application 1 first executes service type 1, and then executes service


[Figure: Application 1 executes services S1 → S2 → S3; Application 2 executes S2 → S3; Application 3 executes S3; applications leave the system after their last service.]

Figure 8.1: A sample service-oriented environment

type 2, and finally it goes to the last service type. Similarly, application 2 executes service

type 2 followed by service type 3.

Multiple applications can contend for the same service, and we suppose that each

application brings a different reward to the system. For example if application 2 brings

a low reward to the system while applications 1 and 3 bring higher rewards, then the

system should avoid over-committing service type 3 to application 2 at the expense of

applications 1 and 3. Application admission control entails regulating the admission of

applications so that application requirements are met while system revenue is maximized.

In the previous chapter, we proposed a distributed heuristic algorithm, called DASC [15], for the problem of service commitment in service-oriented systems. The DASC algorithm makes sure that the system delivers a guaranteed level of quality of service in terms of the success probability for each accepted application instance.

Another aspect of the problem of application admission control is ensuring maximization of the system revenue. In this chapter, two revenue maximization methods are proposed: one is a steady-state-based method, and the other is an online optimization-based method that maximizes the system revenue by solving a linear programming problem. In the next section, we first describe the steady-state-based method for revenue maximization.


8.2 Steady-State Based Application Admission Control System

In this section, we study the revenue maximization problem in service-oriented systems in steady state. Our goal is to obtain a set of admission parameters for the application classes that maximize the system's overall revenue.

Assume that there is a service-oriented environment with M different service types and L different application classes, in which the reward for serving an instance of application class i is R_i. In this system, the probability that an application class i instance uses a service type k instance during its execution is p_ik.

We assume that the arrival process of application class i is a renewal process [139] with rate λ_i; in other words, its interarrival time follows a general distribution with mean 1/λ_i. We also assume that the execution time of service type k follows a general distribution with mean m_k. From [139], if there were no limit on the number of service instances in the system, the expected number of application class i instances being served at service type k in steady state would be p_ik λ_i m_k. Summing over all application classes, the expected number of busy service instances of type k in steady state is:

∑_{i=1}^{L} p_ik λ_i m_k        (8.1)
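A quick numerical illustration of (8.1), with made-up probabilities, rates, and mean execution time:

```python
# Expected number of busy instances of service type k, per (8.1):
# sum over application classes i of p_ik * lambda_i * m_k.
# All numbers below are hypothetical.
p_k = [1.0, 0.8, 0.5]   # p_ik for classes i = 1..3
lam = [2.0, 4.0, 1.0]   # lambda_i: request rates (requests/s)
m_k = 3.0               # mean execution time of service type k (s)

expected_busy = sum(p * l * m_k for p, l in zip(p_k, lam))
print(expected_busy)
```

With these numbers the expected load is about 17 busy instances, so roughly 18 instances of this service type would be needed to carry it in steady state.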

On the other hand, since there is a finite number of service instances of each service type in the system, we define an admission control parameter z_i for application class i, indicating the portion of requests for class i applications that may enter the system. We therefore define a linear programming problem to find the optimal values of the z_i that maximize the system's overall reward in steady state, subject to the limitations on the number of service instances (N_k) of each service type, as follows:

max  ∑_{i=1}^{L} (λ_i R_i) z_i        (8.2)

s.t.  m_k ∑_{i=1}^{L} (λ_i p_ik) z_i ≤ N_k,  ∀k ∈ {1, 2, ..., M}

      z_i ∈ [0, 1],  ∀i ∈ {1, 2, ..., L}
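A linear program of this shape can be handed to any off-the-shelf LP solver. A minimal sketch using scipy.optimize.linprog on a hypothetical two-class, one-service instance (all rates, rewards, and capacities are made up):

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical instance: L = 2 application classes, M = 1 service type.
lam = np.array([10.0, 5.0])     # lambda_i: request arrival rates
R = np.array([1.0, 3.0])        # R_i: reward per served instance
p = np.array([[1.0], [1.0]])    # p_ik: prob. class i uses service type k
m = np.array([2.0])             # m_k: mean execution time of type k
N = np.array([20.0])            # N_k: number of instances of type k

# linprog minimizes, so negate the objective sum_i (lambda_i R_i) z_i.
c = -(lam * R)

# One constraint row per service type: m_k * sum_i (lambda_i p_ik) z_i <= N_k.
A_ub = m[:, None] * (lam * p.T)   # shape (M, L)
b_ub = N

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, 1.0)] * len(lam))
print(res.x)      # optimal admission fractions z_i
print(-res.fun)   # maximized steady-state revenue rate
```

Here class 2 has the higher reward per unit of capacity, so the solver saturates it (z_2 = 1) and fills the remaining capacity with class 1 (z_1 = 0.5).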

The optimal values obtained by solving the above linear program are used by our proposed service-oriented system to control the incoming request rate of each application class entering the system. This control can be enforced using token bucket mechanisms that regulate the admission rate of each application class into the system.
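A fractional admission parameter z_i can be enforced with a token bucket refilled at rate z_i λ_i. A schematic sketch (the class rate and the decisecond time base are arbitrary choices for the example):

```python
class TokenBucket:
    """Admit at most rate * elapsed + burst requests over any time window."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate      # token refill rate: z_i * lambda_i
        self.burst = burst    # bucket depth
        self.tokens = burst
        self.last = 0.0       # time of the last refill

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the bucket depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Admit half of a 10 req/s stream: z_i = 0.5, so refill at 5 tokens/s,
# i.e. 0.5 tokens per decisecond with one request arriving each decisecond.
bucket = TokenBucket(rate=0.5, burst=1.0)
admitted = sum(bucket.allow(now=float(i)) for i in range(100))
print(admitted)  # 50
```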

The rate-control parameters required by this algorithm can be calculated in an optimizer module. If the request arrival process is stationary with a known arrival rate for each application, these rates can be provided to the module manually. Otherwise, automatic rate measurement techniques can be used when the application request arrival processes are not stationary; in this case, the optimizer can recalculate the parameters every time the incoming request rates change.

It is important to note that in the steady-state method, the reward-based admission block is completely separate from the commitment block: if an application request passes the reward-based admission control mechanism, it still needs to pass the service commitment checks in order to enter the system. This keeps the implementation of the system simple, and, as will be shown in the performance evaluation section, it can achieve acceptable performance results.

In the next section, we describe another revenue maximization approach using online optimization techniques. Although this alternative method is more complicated than the steady-state method, it can better capture the transient states of the system and


make better admission decisions in those states.

8.3 Online Optimization-based Application Admission Control System

If we assume exponential distributions for the service execution times and for the application request inter-arrival times, finding the optimal solution to the problem leads to solving a dynamic programming problem using Markov Decision Processes, which we studied in [12]. In the general-distribution case, however, the search for the optimal solution involves solving a multi-stage stochastic programming problem [140].

In a multi-stage stochastic programming problem, in contrast to a deterministic programming problem, we try to find optimal decisions at each stage, considering the stochastic nature of the problem and the uncertainty about future events. In our case, for instance, whenever a request enters the service-oriented system, we would like to know whether we should accept the request and gain its corresponding reward, or wait for later arrivals of requests for more valuable application classes. The uncertainty in this problem lies in the times and types of future request arrivals, and in the execution times of the corresponding service types. In a multi-stage stochastic problem, the decisions should be made when a request for an application arrives at the system, while somehow accounting for the uncertainty in future stages, when future requests might come and leave.

Due to the enormous number of uncertainties, this approach to finding optimal solutions for application admission control in service-oriented systems becomes computationally intensive and infeasible for real systems, especially systems that require online decision making.

Another approach to this problem, which we follow in this section, is a heuristic technique that finds near-optimal decisions using online optimization [140]. In the online optimization approach, we try to find the best decision for accepting or rejecting a


[Figure: block diagram of the online optimization-based admission control system. A request for an application enters the Feasibility Check (DASC) block, then the Scenario Generator, the Online Optimizer, and the Final Decision Maker; the feasibility check and the final decision maker can each reject the request, and otherwise the request is accepted.]

Figure 8.2: Application Admission Control System using Online Optimization

request for an application class as requests arrive at the system, in an online manner. As described in [140], online stochastic combinatorial optimization approaches have been used to solve many different decision-making problems, such as scheduling and resource allocation. For example, in [141], the authors studied an online optimization technique for the problem of admission control to a media-on-demand system.

The online optimization approach for our problem consists of finding the optimal decision for some sample scenarios of the system trajectory, instead of finding the optimal decision that would be obtained from a computation-intensive multi-stage stochastic programming problem. In particular, our online optimization approach considers a few sample scenarios up to a finite horizon and finds the optimal decisions for those scenarios, instead of considering all the uncertainty about future events. To do so, we have to consider a few factors, such as the number and status of the application instances already being served in the system as well as the number of available service instances.

An additional important factor in making the decision is the reward that each application class instance brings to the system. The system has to decide between accepting the newly arrived request and waiting for future, more valuable requests. Two other important factors in the decision making are the time between arrivals


for different application classes, and the time for executing services for each application

class.

Therefore, we propose the following algorithm for the problem of application admission control in service-oriented systems using the online optimization approach; we elaborate on each of the following steps later in this section.

1) Upon receiving a request for an application class, we check the feasibility of accepting the request, and we reject the request if accepting it is not feasible.

2) We generate some scenarios for the possible future trajectory of the system.

3) In each generated scenario, we find the optimal decision of either accepting or rejecting the newly arrived request for the application.

4) We make the final decision of accepting or rejecting the request based on the output of each decision-making process in step 3.

Figure 8.2 shows the block diagram of this proposed algorithm.
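The four steps can be sketched as a small decision pipeline. Everything here is schematic: feasibility_check, generate_scenarios, and optimal_decision are hypothetical stand-ins for the DASC check, the scenario generator, and the per-scenario optimizer described in the following subsections:

```python
import random

def feasibility_check(request: str, commitments: list) -> bool:
    # Stand-in for the DASC feasibility check: here we simply cap the
    # number of concurrent commitments instead of bounding the
    # over-commitment probability.
    return len(commitments) < 10

def generate_scenarios(n_scenarios: int, horizon: float, rng: random.Random):
    # Stand-in for the scenario generator: each scenario is a sorted list
    # of hypothetical future arrival times up to the horizon.
    return [sorted(rng.uniform(0.0, horizon) for _ in range(5))
            for _ in range(n_scenarios)]

def optimal_decision(request: str, scenario: list) -> bool:
    # Stand-in for the per-scenario optimizer: accept if the scenario
    # leaves some slack before the next competing arrival.
    return scenario[0] > 1.0

def admit(request: str, commitments: list, rng: random.Random,
          n_scenarios: int = 9) -> bool:
    # Step 1: feasibility check; reject immediately on failure.
    if not feasibility_check(request, commitments):
        return False
    # Step 2: sample possible future system trajectories.
    scenarios = generate_scenarios(n_scenarios, horizon=100.0, rng=rng)
    # Step 3: an accept/reject decision per scenario.
    votes = [optimal_decision(request, s) for s in scenarios]
    # Step 4: final decision, here a majority vote over the scenarios.
    return sum(votes) > len(votes) / 2

rng = random.Random(42)
print(admit("app1", commitments=[], rng=rng))
```

The majority vote in step 4 is one of several plausible final-decision rules; the actual rule used by the system is discussed later in the chapter.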

8.3.1 Feasibility Check

The feasibility check function in our algorithm evaluates the system's current commitments to the application instances already being served, in order to guarantee an agreed level of quality of service. To do so, we use our previously proposed service commitment algorithm. In the previous chapter [14], we stated the problem of service commitment in service-oriented systems and proposed a distributed algorithm for it, called DASC. In DASC, we define a threshold for the over-commitment probability, and we keep the over-commitment probability under this threshold by rejecting requests for application classes that might push it above the threshold. By doing so, we guarantee an agreed level of application success probability for the admitted application instances.

In our admission control system, we utilize the DASC algorithm to check the feasibility

of admitting an application instance. It is important to note that by feasibility, we mean


guaranteeing the agreed level of success for all of the application instances that are already

being served, as well as the newly arrived request. If this check shows that we cannot deliver the guaranteed level, we reject the request immediately; otherwise we proceed to the next step of the algorithm.
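A minimal sketch of this gate, with a hypothetical `estimate_overcommitment_probability` standing in for the full DASC computation described in the previous chapter:

```python
DASC_THRESHOLD = 0.01  # over-commitment probability threshold (1%)

def feasibility_check(new_request, admitted_instances):
    """Admit only if the estimated over-commitment probability, with the
    new request included, stays under the DASC threshold."""
    p = estimate_overcommitment_probability(admitted_instances + [new_request])
    return p <= DASC_THRESHOLD

# Toy stand-in: assume each instance adds a fixed amount of
# over-commitment risk (the real DASC estimate uses the Central Limit
# Theorem or Chernoff bounds over the committed service instances).
def estimate_overcommitment_probability(instances):
    return 0.004 * len(instances)
```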

8.3.2 Scenario Generation

The second step in our online optimization approach is to generate a number of sample

scenarios for the system trajectory. These scenarios cover both the application instances that are currently in the system and the applications that will arrive in the future.

The scenario generating mechanism identifies exact times for service executions as

well as the execution path of the application. For instance, if an application class 1

instance in Figure 8.1 is currently being served in service type 1, the scenario generator

might tell us that this instance finishes that service at time unit 700, continues to service type 2 and finishes it at time unit 2500, and then executes service type 3 and leaves the system at time unit 4200. As in this example, each generated scenario assigns an exact time to each of these transitions and specifies the exact execution path.

To generate these scenarios, we can use the distributions for execution times for

each service type, distributions of request inter-arrival times, and different probabilities

associated with choosing each service and, consequently, the application's execution path.

We have discussed these distributions and probabilities in the previous chapters [12, 14].

An approach to obtain the required distributions is to use the system’s historical data.

To do so, the historical data of the system’s activity has to be recorded and analyzed to

find the required distributions and probabilities. In the rest of this chapter, we assume

that these distributions are already available to the online admission control system,

and the scenario generator block uses these distributions for generating scenarios for the


current application instances and future arrivals.
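As a concrete sketch, such a generator can be as simple as sampling future arrivals and per-service transition times from the fitted distributions. The exponential distributions and the fixed three-service path below are placeholder assumptions, not the fitted models discussed above:

```python
import random

def generate_scenario(horizon, mean_interarrival, service_means, seed=0):
    """Sample one system trajectory up to `horizon`: arrival times of
    future requests and exact transition times out of each service."""
    rng = random.Random(seed)
    scenario = []
    t = rng.expovariate(1.0 / mean_interarrival)   # first future arrival
    while t < horizon:
        now, transitions = t, []
        for mean in service_means:                 # fixed path s1 -> s2 -> s3 (toy)
            now += rng.expovariate(1.0 / mean)     # sampled execution time
            transitions.append(now)
        scenario.append({"arrival": t, "transitions": transitions})
        t += rng.expovariate(1.0 / mean_interarrival)
    return scenario
```

Each entry plays the role of the example above: a request arrives, then finishes service types 1, 2, and 3 at the sampled transition times.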

Another issue in generating scenarios is specifying a finite horizon for these scenarios.

In other words, we have to decide how far into the system's future we want to look when generating scenarios. To some extent, assuming a finite horizon for generating scenarios resembles defining a sliding window in discrete-time signal processing systems: a limited window of the accumulated data is used for processing, and decisions are made based on the windowed data.

There are various factors in determining the horizon, such as the storage and processing limitations of each system. Application lifetimes are another important factor in determining the horizon. The decision on the length of this horizon can therefore be made by the system designers based on each system's resources and scale. The approach we take here is to assume a finite horizon based on the maximum lifetime of the application classes.

After generating the required scenarios, we are ready to proceed to the next step of

our online optimization approach, which is discussed in the next subsection.

8.3.3 Optimal Admission Decisions For Generated Scenarios

The online optimizer in our proposed system is responsible for finding the optimal decision

for accepting or rejecting the requests in each scenario. To do so, we formulate a binary integer programming problem.

In our online optimizer, the reward for serving an application request r is represented

by w(r), and the decision for accepting or rejecting that request is denoted by a(r), which takes one of two values: 0 for rejecting and 1 for accepting request r.

There are also K service types in the system, S = {s_j : 1 ≤ j ≤ K}, each with N(s_j) instances. The scenario generator block produces a series of events and the timings associated with each event for the online optimization block, as described in


the previous section.

Based on these definitions, we can define the following Binary Integer Programming

(BIP) problem for finding the optimal admission decision for each scenario:

\max \; W = \sum_{r \in R} w(r)\, a(r), \qquad a(r) \in \{0, 1\} \qquad (8.3)

\text{s.t.} \quad \sum_{r \in R} e_r(s_j, t_e)\, a(r) \le N(s_j), \qquad \forall\, t_e \in T,\; s_j \in S \qquad (8.4)

T = \bigcup_{r} T_{e_r}, \qquad S = \{ s_j : 1 \le j \le K \}

in which R represents the set of all application requests, including the newly arrived request, denoted by r = 0. T_{e_r} is the set of all event times associated with a particular request r, and e_r is the execution path of request r; both are provided by the scenario generator block.

The objective of this binary integer programming problem is to maximize the system reward W by accepting or rejecting each request in the generated scenario. The maximization is subject to the service capacities at the time of each transition in the applications' execution paths. The constraint is evaluated through e_r(s_j, t_e), which indicates whether request r is in service type j at transition time t_e. The set of all these transition times, T, is produced by the scenario generation mechanism for each scenario. The formulated BIP finds the optimal admission decision for all requests in one scenario; however, our main concern is a(0), the decision for accepting or rejecting the newly arrived request in that particular scenario.

The above integer programming problem can be solved efficiently using techniques

such as branch and bound. The two main outputs of this step are a(0) and the maximum achievable reward (W), which are fed into the next step of our proposed online


admission control system, explained in the next subsection. It is important to note that this optimization problem is solved for each scenario; hence the number of scenarios that can be evaluated is limited by the time available for making the decision, as well as the time required to solve the stated BIP given the available processing power. Therefore, in the general case, the number of scenarios, and consequently the number of optimizations, will be determined by the system designers based on the system's specifications and resources.
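For the small scenarios used here, the BIP can also be solved by exhaustive search over the 2^|R| assignments. This brute-force stand-in (the thesis solves it with branch and bound via lpsolve) makes the constraint structure of (8.3)-(8.4) concrete:

```python
from itertools import product

def solve_bip(rewards, usage, capacity):
    """Maximize W = sum_r w(r) a(r) subject to
    sum_r e_r(s_j, t_e) a(r) <= N(s_j) for every (s_j, t_e).
    `usage[r]` maps (service, time) -> 1 when request r occupies that
    service at that transition time; `capacity[s]` is N(s)."""
    keys = {k for u in usage for k in u}        # all (s_j, t_e) pairs observed
    best_W, best_a = float("-inf"), None
    for a in product((0, 1), repeat=len(rewards)):
        feasible = all(
            sum(a[r] * usage[r].get(k, 0) for r in range(len(usage)))
            <= capacity[k[0]]
            for k in keys)
        if feasible:
            W = sum(w * x for w, x in zip(rewards, a))
            if W > best_W:
                best_W, best_a = W, a
    return best_W, best_a  # best_a[0] plays the role of a(0)
```

For example, `solve_bip([0.4, 0.8], [{(1, 100): 1}, {(1, 100): 1}], {1: 1})` returns `(0.8, (0, 1))`: with a single instance of service type 1 at the contested transition time, only the more valuable request is admitted.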

8.3.4 Final Decision Making

In the previous subsection, we found the optimal decision for accepting or rejecting the application request in each scenario. The next step is to make the final admission decision. To do so, the outputs of the online optimizer block (i.e., a(0) and W for each scenario) are fed to the final decision maker block. Based on these parameters, we can adopt one of the following approaches for making the admission decision:

1) Voting: accept the request if the majority of the optimal decisions for the generated

scenarios are in favor of accepting the request. This approach is similar to a voting mechanism in which a decision is taken when a majority of the voters agree.

2) Conservative: accept the request if all of the decisions are in favor of accepting the

request.

3) Greedy: accept the request if at least one of the decisions is in favor of accepting

the request.

4) Maximum reward: accept the request if the total reward gained by accepting the

request is more than the total reward gained by rejecting the request.


8.4 Performance Evaluation

To evaluate the performance of the proposed algorithm, we simulated the system depicted in Figure 8.1. We wrote a C++ program and used the open-source lpsolve library [142] to solve the binary integer programming problem.

We set the number of instances per service type (i.e., N1, N2, and N3) to 20. For the service execution times, we assumed a beta distribution [118] for all three service types with parameters α = 2.333 and β = 4.666, an optimistic value of 1000, a pessimistic value of 2000, and a mean of 1333 time units.
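These beta parameters are internally consistent: scaling the standard beta mean α/(α+β) onto the [optimistic, pessimistic] interval (the usual three-point scaling, which we assume is what is meant) reproduces the stated mean:

```python
alpha, beta = 2.333, 4.666
optimistic, pessimistic = 1000, 2000

# Mean of a beta variable rescaled from [0, 1] to [optimistic, pessimistic].
mean = optimistic + (pessimistic - optimistic) * alpha / (alpha + beta)
print(round(mean))  # 1333
```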

The simulation period is 180,000 time units, which is 30 times the maximum lifetime of an application class 1 instance. We also chose similar geometric

distributions for the request inter-arrival times for all three classes of applications with

parameter p ranging from 0.002 to 0.010. We also assumed the following rewards for

successful termination of each application instance: 0.4 for an application class 1 instance,

0.2 for an application class 2 instance, and 0.8 for an application class 3 instance. The penalties for unsuccessful termination are 0.3 for an application class 1 instance that fails in service type 2, 0.8 for an application class 1 instance that fails in service type 3, and 0.4 for an application class 2 instance that fails in service type 3. No cost is associated with rejecting a request for an application, and rejected requests leave the system and do not interfere with it in the future.

We evaluated the system performance using four different techniques. The first is the No Commitment Policy (NCP), in which the system neither tries to maximize the system revenue nor avoids over-commitments.

For the second mechanism, we used only the DASC algorithm, which guarantees the quality of service but does not address the problem of reward maximization. We set the threshold parameter in this algorithm to 1%, meaning that the system guarantees a 99% success probability for the admitted application requests.

For the third technique, we used the steady-state based application admission control

algorithm [14]. This technique works based on the steady-state analysis of the system,


[Figure: system reward (500–1300) versus applications request rate (3–10 ×10⁻³) for NCP, DASC1% Alone, DASC1%−SS, and DASC1%−Online]

Figure 8.3: System reward for four different techniques

and uses a linear-programming technique to obtain admission regulation parameters

that maximize the system revenue in the steady state.

For the fourth mechanism, we used the online optimization-based system composed of the feasibility check block, the scenario-generating block, the online-optimization block, and the decision making block. For the feasibility check block, we used the DASC algorithm with a threshold parameter equal to 1%. For the scenario generation block, we generated three different scenarios over a period twice as long as the lifetime of the longest-living application (i.e., application class 1). As mentioned earlier, we used the lpsolve library [142] for solving the binary integer programming problem, and for decision making, we used the voting mechanism.

Figure 8.3 shows the system revenue over the simulation period for these four mechanisms. Figure 8.4 shows the failure probability for application classes 1 and 2 on a logarithmic scale.

As can be seen, the NCP technique's performance is acceptable when the system


[Figure: application 1 and application 2 failure rates (log scale, 10⁻³–10⁰) versus applications request rate (3–9 ×10⁻³) for NCP, DASC1% Alone, DASC1%−SS, and DASC1%−Online]

Figure 8.4: Application 1 and application 2 failure rates versus the applications request rate

is lightly loaded, but the system revenue degrades drastically as the system's load increases. Moreover, the application failure results for this technique are extremely poor, as expected given its best-effort nature.

The second observation is the performance of the DASC algorithm when it is the sole

mechanism in place. As can be seen, although the application failure probability stays under the threshold, the system revenue does not improve under this algorithm.

The steady-state based admission control algorithm, combined with the DASC algorithm, performs better than the two previous techniques. As can be observed, the system revenue increases with this combination while, at the same time, the required quality of service is delivered.

However, the online optimization approach outperforms the steady-state based technique, especially when the system is not heavily overloaded. The main reason for this improvement is the ability of the online optimization technique to capture the transient conditions in the system, as opposed to the steady-state


based admission control technique, which uses the steady-state conditions of the system. This observation is confirmed by the fact that the performance improvements occur mainly when the system is not heavily overloaded, when there is still further potential in the system for revenue maximization. As the system becomes overloaded, its capacity saturates, and therefore both the steady-state based technique and the online optimization based system perform well.


Chapter 9

Conclusions

Future networks should cope with the challenges imposed by emerging and future generations of applications; otherwise, the range and scope of applications over future networks will be limited by the design choices of the past. In this thesis, we studied the requirements of future networks and applications, and we addressed various challenges in future networks by proposing an architecture, a network research testbed, and scalable and distributed QoS control algorithms.

9.1 Contributions

While most present proposals for future network architectures have been designed to address the requirements of a particular class of applications, we have taken research on future network architectures a step further by proposing an application-oriented network architecture as a configurable converged communication and computing network.

Based on this new network architecture, we designed a Virtualized Application Network-

ing Infrastructure that enables networking researchers to experiment with new network

architectures and distributed applications. We have also proposed a novel scalable and

distributed QoS and admission control algorithm for service-oriented systems and for Finite Capacity Queuing Networks in general. Overall, the contributions of this thesis can be


listed as follows:

9.1.1 Application-Oriented Networking

We proposed a novel network architecture, the Application-Oriented Network (AON) architecture, which addresses challenges posed by future network applications, such as configurability and application-awareness, and facilitates application creation through the virtualization of the processing, storage, reprogrammable hardware, and software resources commonly used in application creation. We proposed a three-plane architecture for AON comprising

a control plane, a management plane, and an application plane. Applications are able to

configure the resources in the application plane to satisfy their own requirements. These

resources are virtualized computing, storage, hardware and software resources, and other

resources and functionalities needed for rapid application creation.

A multiplicity of applications can coexist over the same shared virtualized infrastructure in the AON application plane, which is managed and controlled by the other two AON planes: AON management and AON control. The latter is responsible for control-related functions such as allocation and release of resources as well as failure recovery operations, while the former performs management-related functions such as monitoring, provisioning, re-provisioning, and long-term fault management.

We also proposed an architecture for applications in the application plane that has three

main characteristics: a two-layer (service and transport) architecture, a service-oriented

service layer, and a transport layer that provides content and data delivery.

The proposed architecture can be helpful in a diverse range of applications that re-

quire responsiveness, reliability, security, smart caching, and efficient content broadcast-

ing/multicasting. Mobile networks can also utilize the processing and storage capabilities

embedded in the architecture for performing smart and adaptive content conversion and

distribution to mobile nodes that experience hand-off as well as temporary disconnec-

tions.


9.1.2 Virtualized Application Networking Infrastructure

In this thesis, we presented the Virtualized Application Networking Infrastructure (VANI)

as a networking research testbed that allows experimentation with new networked systems and distributed applications. Compared to other networking research testbeds, VANI utilizes a service-oriented control and management plane that provides flexible and dynamic allocation, release, programming, and configuration of the resources used for performing

large-scale experiments in a wide area network from layer three up. VANI resources

in the application plane allow the development of network architectures that require a converged network of computing and communications resources and in-network processing and storage.

Another main contribution in VANI is the introduction of a reprogrammable hardware resource that can be allocated to experiments that require high-performance and high-throughput computing on demand. This resource is designed based on the virtualization of hardware resources, in particular FPGAs, and provides well-defined interfaces for researchers to program and configure it. Through experimentation and measurements,

we showed that the reprogrammable hardware resource can be programmed rapidly and

can achieve very high throughput using its 16x10GE interfaces.

VANI also allows registration of new hardware and software resources in the control

and management plane. This facilitates experimentation, since researchers can set up new

experiments rapidly using the available service components developed independently by

other researchers.

VANI is, in essence, a prototype of our proposed Application-Oriented Network Architecture and a proof of concept showcasing how the proposed AON concepts can be realized

and how distributed applications and new network architectures can be built on such a

network.

Another major contribution of this study was the design and development of DETS, a novel system to shape and regulate Ethernet traffic in VANI as well as in a


computing cluster, or a datacenter. The DETS system is required where there is a host

node connected to several virtual local area networks, and the sending and receiving traffic

rate on each of these virtual networks has to be guaranteed and controlled. Without this

control, an excess of received traffic on one of these virtual networks could disturb the other virtual networks' ability to receive traffic at a guaranteed rate.

While most current solutions for Ethernet congestion control rely on simple Congestion Notification-based mechanisms, and virtually all of them require changes to the Ethernet hardware equipment, our proposed DETS system does not require any hardware changes. It is also able to operate in a distributed fashion using one of the four algorithms

proposed for rate allocation. Through experimentation on an actual Linux-based computing cluster, we showed the effectiveness of DETS, compared the performance of the four algorithms, and discussed their characteristics. We also proposed

modifications to the Ethernet control plane so that DETS can be natively supported by

Ethernet networking elements.

9.1.3 Scalable and Distributed QoS and Admission Control

In this thesis, we studied the problem of QoS and admission control and allocating

instances of services to different applications in service-oriented environments. In this

problem, a limited number of service instances from each service component are shared among different application classes. The major concerns in this problem are twofold:

maximizing the system revenue by allocating the service instances to the more valuable

application classes considering the service execution times and request inter-arrival times

of each application class; and guaranteeing the successful completion of an admitted

application instance.

We presented a method for obtaining the optimal policy for maximizing system revenue using Markov Decision Processes for small-scale systems with exponential service

execution times and request inter-arrival times. We analyzed the case where the constituent service components of an application are executed concurrently throughout the application lifetime, as well as the case where the service components are executed sequentially and hence are not required throughout the application lifetime.

We presented the optimal policy for prototype examples, and we compared the perfor-

mance of applying this policy to the system with the performance of a system that uses

Complete Sharing or Complete Partitioning mechanisms. In all cases, we showed that

applying the policies obtained from Markov Decision Processes results in a considerable improvement in system revenue compared to the other two mechanisms,

especially when the request rates for the applications are high.

As another major contribution of this study, we presented a Distributed Algorithm for

Service Commitment (DASC) that guarantees a specified level of probability of successful

completion for an application in a service-oriented system in settings that have stationary

as well as non-stationary arrivals. We showed that the Central Limit Theorem can help us

in computing this probability, and we also described an alternative approach for computing

this probability using Chernoff’s bound. The DASC algorithm can be implemented in a

distributed environment and does not assume any specific distribution type for service

execution times and application request inter-arrival times.

For stationary systems, we proposed two steady-state based alternative approaches

(one based on bottleneck analysis, and the other based on simulation) that use token

bucket regulators to control the admission of application requests to the system. These algorithms are simpler to implement than the DASC algorithm, but they cannot operate

in non-stationary environments. DASC, however, is able to perform in both stationary

and non-stationary environments using transient-state analysis of the system. We pre-

sented performance evaluation results showing the effectiveness of the DASC algorithm

in a simple service-oriented system as well as in a complex system with both stationary

and non-stationary request arrivals.

We also showed that by adding a few queuing spaces, we can guarantee a specified


level of queuing probability for an application instance, and at the same time, significantly

reduce the application failure probability. In doing so, we presented a series of theorems

and corollaries that can be used to find bounds on the time-to-enter-service distribution

in general queuing systems.

To maximize the system revenue in addition to guaranteeing QoS, we proposed an

application admission control system for service-oriented systems. The proposed system

is able to use a simple steady-state or an online optimization approach for maximizing

the system revenue, in addition to the DASC algorithm that guarantees the required

level of probability of success.

The online optimization block of our system is composed of three sub-blocks: the scenario generating block, the online optimizer, and the final decision maker. We elaborated on the functionalities of each block, and we discussed the important factors in designing each

of them. We also formulated a binary integer programming problem which maximizes the

system revenue in the online optimizer block. The simulation results and performance

comparisons show that the proposed system can achieve its objectives and can improve

the system performance.

9.1.4 Related Educational Contributions

Last but not least, this study contributed to the education of several University of Toronto (UofT) students, especially through their involvement in performing experiments with the AON architecture and in the design and development of various parts of VANI.

In the early stages of this study, we were conducting experiments on the AON architecture and applications. Justin Seto and Andrew Mehes helped us in this process by

implementing a prototype of a new network architecture in AON for their final year

design project at the Electrical and Computer Engineering (ECE) department, UofT. The developed system has an XML-delivery function in its transport layer and uses a peer-to-peer mechanism to organize its network. The two other students involved in


this process were Michael Ens and Ian Gartley, Engineering Science students who performed experiments with the NaradaBrokering pub/sub system as well as a new

open-source XML-parser.

A major force in the VANI project was Keith Redmond, a MASc student at the University of Toronto. We worked very closely together on the design and development of the virtualization layer for the main resources in VANI, including processing, storage, reprogrammable hardware, and the internal fabric. In the summer of 2008, Tom Yue was a summer student who

worked with us on the development of parts of the VANI virtualization layer, specifically

on the WS interfaces of the reprogrammable hardware resource. Darryl Chung was also a summer student who developed the base for a Graphical User Interface for VANI in the

summer of 2008.

Gordon Tam was an Engineering Science student who helped us in the development of the VANI control and management plane software. He started working with us on his final year design project and continued his collaboration during the summer of 2009 as a summer student. In the summer of 2009, a group of summer students helped us in the development

of various software resources in VANI, including the database resource, orchestrator resource,

the hardware-based gateway resource, and the GENI-VANI interworking resource. These

students were Arbab Khan, Saleh Dani, Mingliang Ma, Maxim Galash, and Wenyu Li.

Three of these students (Arbab Khan, Saleh Dani, Maxim Galash) together with Anthony

Das Santos worked on a prototype of a green orchestrator engine and developed a sensor

resource for VANI as their final year design project. Mingliang Ma also helped us explore some of our future work regarding automatic application deployment in VANI.

Arbab Khan is still cooperating with us as a summer student to integrate, maintain, and improve the VANI control and management software and the developed processing, storage, gateway, and internal fabric resources.

The author takes pride in working with these students and in being a part of their

education process at the University of Toronto.


9.2 Future Work

This dissertation has covered many subjects in dealing with challenges in future networks.

In terms of future work, there are many possibilities in each of the covered topics. In

Application-Oriented Networks in general, and in VANI in particular, an important direction for future work is to develop large-scale applications based on this architecture and the developed

testbed.

One application that we are currently investigating is a green application orchestrator

engine. In the green orchestrator engine, we intend to create a distributed follow-the-sun

system that is able to move service components to VANI nodes that have better access

to green energy sources such as solar or wind power. The green orchestrator system is built

on VANI using a variety of software-based resources developed for VANI including the

complex event processing service, and sensor service.

Another application of VANI is in software-defined radio. In wireless networks, VANI

is capable of processing large amounts of aggregated and digitized radio signals in its

reprogrammable hardware resources. This capability facilitates advanced research on

software-radio systems and future wireless technologies.

A major extension to the AON control and management plane, as well as to VANI, is

to develop functionality to automate application creation and deployment. In an automated

system, an Application-Provider would be able to specify the high-level business

goals of an application, and the system would identify the appropriate service components

and deploy them in the right places in an AON to deliver the required functionality.

Inclusion of autonomous management techniques in VANI is another possible extension

of work on the VANI testbed.

Additional future work in VANI includes interconnecting VANI to GENI testbeds so

that GENI researchers can use VANI resources to carry out federated experiments, as

well as setting up VANI nodes at different sites across a wide area network to enable large-

scale experimentation. In addition, we plan to include new hardware resources such as


the new BEE3 boards and GPU-based hardware in VANI. We hope that VANI could

serve as a breeding ground for research on large-scale and advanced networked systems

in Canada in the future.

In terms of future work on Distributed Ethernet Traffic Shaping system, we intend

to further explore the DETS protocol modifications to the Ethernet control plane, and

develop proof-of-concept Ethernet switches with this capability using the hardware

resources developed for VANI.

In scalable QoS and admission control in service-oriented systems, we intend to fur-

ther explore the transient-state analysis potential in maximizing the system revenue by

predicting the revenue that a system would lose or receive by admitting a request for an

application, especially when the system is not overloaded and there is room for gaining

more revenue.

Another extension to this work could be including scheduling mechanisms for the

queued application instances in the system. Further development of the proposed com-

mitment algorithm to reduce power consumption in a service oriented system through

anticipation of future resource requirements and putting the surplus resources in the low

power mode could be another area of future research. Finally, incorporating the proposed

QoS-control mechanisms in a real service-oriented system such as AON is another major

extension of this work that we would like to explore in the future.


Appendices


Appendix A

Queue-Enabled Service Commitment

In Q-DASC, we use the pdf of Time to Enter Service (TES) in a G/G/C/N queuing

system. Finding exact solutions for the TES distribution is, in general, very difficult. Therefore,

in this appendix, we introduce several results that help us find approximations for TES.

A.1 Time to Enter Service in a G/G/C/N System

Assume that there is a G/G/C/N system with a general distribution for the

request arrivals. It also has C independent instances of one service with execution time

pdf f(t) and mean µ, and has N queue spots in front of them.

We are interested in the distribution for the Time to Enter Service (TES) for the

queued requests, assuming that all service instances are busy. To do so, we define the

system state at time t as s(t) = (t − t1, t − t2, ..., t − tC), t1 ≤ t2 ≤ ... ≤ tC , in which ti

represents the time at which the ith service instance started serving a request.

Finding the closed form representation of the distribution of the time to enter service

(TES) for each of the requests in the queue is quite difficult and impractical in the general

case. However, in this section, we develop a series of theorems that lead to bounds for these

distributions.

Our intuition is that we only need to study the residual times of the j longest served


requests that are already in the system to find out the time to enter service for a queued

request that is in spot j of the queue (with spot 1 being the head of the queue). Following

this intuition, we would then find the relation between the TES and the residual times

of the requests that are being served. We will use concepts from stochastic orders [143]

to determine when our intuition is correct.

Definition 1: Let X and Y be two random variables which have the following prop-

erty:

P{X > t} ≤ P{Y > t} ∀t ∈ (−∞,∞) (A.1)

then X is said to be smaller than Y in the usual stochastic order, shown by X ≤st Y .

This property can be also represented in terms of cumulative distribution functions (cdf),

as follows:

FX(t) ≥ FY (t) ∀t ∈ (−∞,∞) (A.2)

In other words, the distribution of X is lower bounded by distribution of Y .

Definition 2: A nonnegative random variable X with distribution function F and

survival function F̄(t) ≡ 1 − F(t) is said to be Increasing Failure Rate (IFR) if −log F̄ is

convex on {t : F̄(t) > 0}. Also, X is said to be Decreasing Failure Rate (DFR) if −log F̄

is concave on {t : F̄(t) > 0}.

The next theorem gives a necessary and sufficient condition for a random variable

to be IFR or DFR.

Theorem 1) The random variable X is IFR [DFR] if, and only if, [X−t1|X > t1] ≥st

[≤st][X − t2|X > t2] whenever t1 ≤ t2.

proof: Theorem 1.A.13 in [143].

According to this theorem, if the execution time of a service has the IFR property,

then the application instances that are already being served in the system are more likely

to finish their execution in the order of their arrival to that service. Similarly, if it is

DFR, the instances are more likely to finish their execution in the reverse order of their


Figure A.1: Distributions for residual service times in a service with uniform execution time [plot: cdfs F1, F2, F3, F4, and F for a uniform distribution]

arrival.

Therefore, the next lemma follows at once from the above definitions and theorem:

Lemma 1) Assuming f(t) to be pdf of an IFR [DFR] service execution time, and

F (t) as its cumulative distribution function (cdf), and the system is in state s(t), then

F1(t) ≥ F2(t) ≥ ... ≥ FC(t) ≥ F(t), [F1(t) ≤ F2(t) ≤ ... ≤ FC(t) ≤ F(t)], in which Fi(t) is

the cdf of the random variable Tri that denotes the residual time of the ith service instance.

The uniform distribution is an IFR distribution. Among other IFR distributions are

the Normal distribution, the Gamma and Weibull distributions for α > 1, and the

modified extreme value distribution [144]. DFR distributions are rare, but one example

is the log-normal distribution [144]. Equality in Lemma 1 holds for the exponential

distribution, which has a constant failure rate and is the boundary between the IFR and

DFR types of distributions.

Example 1: Assume that a service execution time has a uniform distribution U(10,20).

Figure A.1 shows Fi(t) distributions for a system that has four service instances (C = 4),

and is in the state s(t) = (t − 15, t − 12, t − 8, t − 4).
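Lemma 1 and Example 1 can be checked numerically. The sketch below is ours, not thesis code, and the helper names are illustrative: it computes the residual-time cdf Fi(t) = (F(a + t) − F(a))/(1 − F(a)) for a server of age a = t − ti and verifies the IFR ordering F1(t) ≥ F2(t) ≥ F3(t) ≥ F4(t) ≥ F(t) for the U(10, 20) execution time.

```python
# Residual service-time cdfs F_i(t) for Example 1: service time X ~ U(10, 20),
# C = 4 busy servers with ages (t - t_i) = 15, 12, 8, 4.
# For a server of age a, the residual time T_a = X - a | X > a has cdf
#   F_a(t) = (F(a + t) - F(a)) / (1 - F(a)).
# Illustrative sketch; helper names are ours, not from the thesis.

def uniform_cdf(x, lo=10.0, hi=20.0):
    """cdf of U(lo, hi)."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)

def residual_cdf(t, age, lo=10.0, hi=20.0):
    """cdf of the residual time for a server that has been busy for `age`."""
    surv = 1.0 - uniform_cdf(age, lo, hi)
    return (uniform_cdf(age + t, lo, hi) - uniform_cdf(age, lo, hi)) / surv

ages = [15, 12, 8, 4]          # longest-served first, as in s(t)
for t in [2, 4, 8, 12]:
    vals = [residual_cdf(t, a) for a in ages] + [uniform_cdf(t)]
    # IFR case (Lemma 1): F1(t) >= F2(t) >= F3(t) >= F4(t) >= F(t)
    assert all(vals[i] >= vals[i + 1] - 1e-12 for i in range(4))
    print(t, [round(v, 3) for v in vals])
```

The ordering confirms the intuition behind Lemma 1: for an IFR service, the longer a request has been in service, the stochastically smaller its residual time.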


Figure A.2: Distributions for residual service times in a service with Normal execution time [plot: cdfs F1, F2, F3, F4, and F for a Normal distribution]

Example 2: Figure A.2 shows the distributions for a service with Normal distribution

N(200, 10). In this system there are four service instances (C = 4), and the system is in

the state s(t) = (t − 150, t − 120, t − 80, t − 40).

Example 3: Figure A.3 shows the distributions for a system with four servers. The

service execution time is the Beta distribution used in the previous sections, and depicted

in figure 7.5. This Beta distribution is also an IFR distribution. The figure is depicted

when the system is in the state s(t) = (t − 1500, t − 1200, t − 800, t − 400).

Using the above definitions, theorem and lemma, we can now focus back on the

properties of the Time to Enter Service (TES) for a queued application instance.

Theorem 2) In a G/G/C/N system the distribution of TES for the first instance

in the queue (head of the queue) is lower bounded by the distribution of the residual

time of any of the requests Already Being Served (ABS) in the system. In other words,

G1C(t) ≥ Fi(t), if G1C(t) is the cdf for the TES of the first request in the system.

proof : Assume that the system is in state s(t) and there is one request in queue. The

time to enter service (TES) for that request is a random variable shown by Tw1C and is


Figure A.3: Distributions for residual service times in a service with a Beta execution time, α = 2.333, β = 4.666 [plot: cdfs F1, F2, F3, F4, and F for a Beta distribution]

equal to min(Tr1, Tr2, ..., TrC), in which Tri is the residual service time of the ith server.

The cdf of Tw1C is denoted G1C(t), and the cdf of Tri is denoted Fi(t).

From the definitions, we wish to prove that Tw1C ≤st Tri, ∀0 < i ≤ C. To do so, we

have to show:

P{Tw1C < t} ≥ P{Tri < t} ∀t > 0 (A.3)

To prove the above inequality, we have to show that the event Tri < t is a subset of

event Tw1C < t. This, however, is true, since we know that if Tri < t then Tw1C will be

less than t. As a result, the above inequality is true, and the theorem is proved.

The next two corollaries discuss the properties of this random variable in terms of its

mean as well as its characteristics for an IFR [DFR] distribution.

Corollary 1) In a G/G/C/N system, the mean TES for the first request in the queue

(head of the queue) is not more than the mean residual time of any of the requests in

the system. In other words, m1C ≤ mi.

proof : This follows from Theorem 2 and the fact that m1C = ∫_0^∞ (1 − G1C(t)) dt.

Corollary 2) In a G/G/C/N system with IFR [DFR] service time, we will have

Tw1C ≤st Tr1 [TrC], and consequently m1C ≤ m1 [mC].

proof : This corollary follows from Theorem 2, Lemma 1, and Corollary 1.

The above corollary interestingly states that in a system with IFR [DFR] service

time, the distribution of the TES for the first request in queue is lower bounded by the

distribution of the longest [shortest] Already Being Served (ABS) request in the system.
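As a quick sanity check of Theorem 2 and Corollary 2, the following Monte Carlo sketch (ours, with illustrative names) reuses the Example 1 setting: it samples the four residual times, takes Tw1C as their minimum, and verifies empirically that G1C(t) ≥ F1(t), i.e. Tw1C ≤st Tr1 for this IFR service time.

```python
import random

# Monte Carlo check of Theorem 2 / Corollary 2 for the Example 1 setting:
# service time U(10, 20), server ages 15, 12, 8, 4 (an IFR case).
# Tw1C = min of the residual times; Corollary 2 says Tw1C <=st Tr1,
# i.e. the cdf of Tw1C dominates the cdf of the longest-served residual.
# Illustrative sketch, not code from the thesis.

random.seed(1)

def residual_sample(age, lo=10.0, hi=20.0):
    # X | X > age is uniform on (max(age, lo), hi); the residual is X - age.
    return random.uniform(max(age, lo), hi) - age

ages = [15, 12, 8, 4]
n = 200_000
tw, tr1 = [], []
for _ in range(n):
    res = [residual_sample(a) for a in ages]
    tw.append(min(res))        # TES of the request at the head of the queue
    tr1.append(res[0])         # residual of the longest-served ABS request

for t in [1.0, 2.0, 3.0, 4.0]:
    g1 = sum(x < t for x in tw) / n    # empirical G1C(t)
    f1 = sum(x < t for x in tr1) / n   # empirical F1(t)
    assert g1 >= f1                    # G1C(t) >= F1(t), as Corollary 2 states
print("mean TES:", sum(tw) / n, " mean Tr1:", sum(tr1) / n)
```

Since the minimum is taken over a set that contains Tr1, the dominance holds sample by sample, which is exactly the subset-of-events argument in the proof of Theorem 2.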

The next theorem considers the properties of the time to enter service (TES) random

variable for other application instances in the service queue.

Theorem 3) In a G/G/C/N system, the distribution of TES of the jth request

(2 ≤ j ≤ C) in the queue is lower bounded by the distribution of the residual time of the

maximum of any combination of j ABS requests in the system.

proof : If we define TwjC as the random variable representing the TES of the jth

request in the queue, we can define a set VC as follows:

VC = {Tr1, Tr2, ..., TrC} (A.4)

We define VjC ⊂ VC as any subset of random variables in VC having |VjC| = j,

assuming 2 ≤ j ≤ C. We need to prove:

TwjC ≤st max(VjC)

P{TwjC < t} ≥ P{max(VjC) < t}

Again, to show that the above inequality is true, we have to prove that the event

max(VjC) < t is a subset of the event TwjC < t. This is true, since if max(VjC) < t then

TwjC will surely be less than t. As a result, the above inequality is true and the theorem

is proved.

Corollary 3) In a G/G/C/N system with IFR [DFR] service time, the cdf of TES


for the jth instance in queue is lower bounded by the cdf of the maximum of the first [last]

j ABS instances in the system:

GjC(t) ≥ ∏_{k=1}^{j} [∏_{k=C−j+1}^{C}] Fk(t), j ≤ C (A.5)
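The bound in Corollary 3 rests on the fact that, for independent residual times, the cdf of their maximum is the product of the individual cdfs. Below is a small Monte Carlo check of that product form, again for the Example 1 residuals; it is a sketch with illustrative names, not thesis code.

```python
import random

# For independent residuals, P{max(Tr_1..Tr_j) < t} = F_1(t) * ... * F_j(t),
# which is the quantity bounding G_jC(t) in Corollary 3.  We verify the
# product form for the j = 3 longest-served residuals of Example 1
# (service time U(10, 20), ages 15, 12, 8).  Illustrative sketch only.

random.seed(7)

def residual_sample(age, lo=10.0, hi=20.0):
    return random.uniform(max(age, lo), hi) - age

def residual_cdf(t, age, lo=10.0, hi=20.0):
    lo_a = max(age, lo)
    return min(1.0, max(0.0, (age + t - lo_a) / (hi - lo_a)))

ages = [15, 12, 8]             # the first j = 3 (longest-served) residuals
n = 100_000
maxima = [max(residual_sample(a) for a in ages) for _ in range(n)]
for t in [2.0, 5.0, 8.0]:
    empirical = sum(m < t for m in maxima) / n
    product = 1.0
    for a in ages:
        product *= residual_cdf(t, a)     # cdf of the max = product of cdfs
    assert abs(empirical - product) < 0.01
```

The independence of the residuals, given the system state s(t), is what makes the product form (and hence the computable bound) valid.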

The next theorem finds an upper bound for the distribution of TES in a G/G/C/N

system.

Theorem 4) In a G/G/C/N system the distribution of TES of the jth request (j ≥ 2)

in the queue is upper bounded by the distribution of the (j − 1)th request in the queue.

proof: We know that TwjC ≥ Tw(j−1)C, therefore GjC(t) ≤ G(j−1)C(t), ∀t, j ≥ 2.

From Corollary 3 and Theorem 4 we can see that for IFR [DFR] systems, GjC(t) is

bounded by G(j−1)C(t) and ∏_{k=1}^{j} [∏_{k=C−j+1}^{C}] Fk(t).

In summary, we showed that to find bounds on the TES distribution for the jth request

in the queue, we only need to analyze the residual times of j requests that are already being

served in the system. If the service time distribution is IFR, these j requests can be the

longest-served ones. Since many distributions in real systems can be characterized as

IFR distributions, it can be concluded that our first intuition is correct for most real

systems. For DFR distributions, however, better bounds can be obtained by analyzing the j

shortest-served requests.

In Q-DASC, if an application instance is queued, we find the TES mean and variance

using lower bounds, and we distribute them to other agents so that they can update their

future usage estimation.

As mentioned earlier, finding the exact TES distribution for general service execution

times is very difficult because it depends not only on the service execution time

distribution but also on the current state of the system as well as the start times of ABS instances.

Therefore, we performed performance evaluations on the beta distribution that we used

for the DASC performance evaluations in Chapter 7 in order to assess the tightness of the


Figure A.4: TES distribution and calculated bound for beta distribution with α = 2.333, β = 4.666 [figure: TES cdfs for queue spots 1–5, bound vs. simulation — spot 1: bound mean 1.5, stdev 0.7, sim mean 2.3, stdev 0.3; spot 2: bound 3.3/2.0, sim 2.7/0.5; spot 3: bound 5.8/3.2, sim 3.2/0.7; spot 4: bound 8.6/4.4, sim 3.7/0.9; spot 5: bound 11.7/5.5, sim 4.3/1.1 — plus the service execution time distribution, mean 1333.3, stdev 166.7]


bound. To do so, we assumed a queue with 200 busy servers with beta execution times

with a maximum service time of 2000. The start times of all ABS instances are uniformly

chosen from 5 to 1995 in steps of 10.

Figure A.4 shows the TES distributions and corresponding means and standard

deviations for the first five spots in the queue, obtained through simulation as

well as from the bounds of Corollary 3. As can be seen, the bound on the distribution is

lower than the distribution found through simulation, as expected. We can also observe

that the bound is tighter for the first spots in the queue and as we move further down the

queue the bound becomes more conservative. In the next subsections we study the time

to enter service distribution properties for G/D/C/N and G/M/C/N queuing systems.

A.2 TES for G/D/C/N System

A G/D/C/N queuing system is a system that has a deterministic service time of d

seconds. We assume that all C servers in the system are busy and the system is in state

s(t) = (t − t1, t − t2, ..., t − tC), in which t1 ≤ t2 ≤ ... ≤ tC, and t − tj < d. Also, we know

that a deterministic distribution is an IFR distribution.

Therefore, the TES for the first request in the queue will be equal to the residual time

of the longest served request in the system d − (t − t1). Similarly, we can see that the

TES for the jth request in queue is a deterministic value equal to:

TwjC = d − (t − tj), j ≤ C (A.6)

A.3 TES for G/M/C/N System

Assume that there is a G/M/C/N system with service rate µ. It can be easily shown that

TES in a G/M/C/N system follows the m-Erlang distribution. The TES distribution

for the first request in queue is exponential (m-Erlang with parameters m = 1 and rate Cµ),

and the distribution for the jth request in queue is m-Erlang with parameters

m = j and rate Cµ. Also, the residual time distributions in a G/M/C/N system are all i.i.d.

exponential with rate µ. In this type of system, if we study the TES for the

jth request in queue, we can see that its mean and variance would be:

E[TdjC] = E[TwjC] + E[Ts] = j/(Cµ) + 1/µ = (j + C)/(Cµ)

VAR[TdjC] = VAR[TwjC] + VAR[Ts] = j/(Cµ)² + 1/µ² = (j + C²)/(Cµ)²

in which Ts is the request's service time.

The interesting observation from the above equations is that for j ≪ C² the variance

of the delay in the system is almost equal to the variance of the service time. In other

words, in systems with an ample number of servers, the variance of the TES for the first few

requests in the queue (j/(Cµ)², j ≪ C²) is almost negligible compared to the variance

of the service time (1/µ²).
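The observation above is easy to illustrate numerically. The snippet below is our sketch, and the parameter values C = 100 and µ = 1 are arbitrary examples: it evaluates E[TdjC] = (j + C)/(Cµ) and VAR[TdjC] = (j + C²)/(Cµ)², and confirms that the delay variance stays within one percent of the service-time variance 1/µ² for j ≪ C².

```python
# Numeric illustration for the G/M/C/N system: for j << C^2 the TES
# contribution j/(C*mu)^2 to the delay variance is negligible next to the
# service-time variance 1/mu^2.  C and mu are arbitrary example values.

mu, C = 1.0, 100
for j in (1, 5, 20):
    mean_delay = j / (C * mu) + 1.0 / mu           # E[Td_jC] = (j + C)/(C mu)
    var_delay = j / (C * mu) ** 2 + 1.0 / mu ** 2  # VAR[Td_jC] = (j + C^2)/(C mu)^2
    ratio = var_delay / (1.0 / mu ** 2)            # relative to service variance
    print(f"j={j:2d}  E[Td]={mean_delay:.4f}  VAR[Td]={var_delay:.6f}  ratio={ratio:.6f}")
    assert ratio < 1.01                            # within 1% for j << C^2
```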


Appendix B

Computing Over-Commitment

Probability using Chernoff’s Bound

The Central Limit Theorem gives a good approximation of the over-commitment probability

when the total number of service instances is within a few standard deviations

(σk(t)) of the mean ηk(t). Therefore, if this range is more than a few standard deviations

and the required over-commitment probability threshold (Toc) is less than 0.001, it is

better to use a tighter bound on the probability. To do so, we use the theory of large

deviations and Chernoff's bound [145] to compute the probability of over-commitment.

The following is the definition of the Chernoff’s bound:

P{Sk(t) ≥ Nk} ≤ e^{−(sNk − µS(s))}, s > 0, ∀t > 0 (B.1)

in which Sk(t) is the sum random variable (7.4), and µS(s) = ln ψS(s) is the logarithmic

moment generating function of the random variable Sk(t). Since the right-hand

side of the above inequality holds for any s > 0, we can minimize it by finding

the s∗ that satisfies the following equation:


Nk = µ′S(s) (B.2)

Substituting the definition of the sum random variable from (7.4), we have:

P{Sk(t) ≥ Nk} ≤ e^{−s∗Nk + Σ_{i,j} ln((1 − Gijk(t)) + Gijk(t)e^{s∗})}

in which s∗ is the solution of the following equation:

Nk = Σ_{i,j} Gijk(t) / ((1 − Gijk(t))e^{−s} + Gijk(t)).

It has been shown that the probability in (B.1) can be approximated for the random

variables which are the sum of finitely many random variables (like our defined sum

random variable (7.4)) as follows [146]:

P{Sk(t) ≥ Nk} ≈ (1 / (s∗ √(2π µ″S(s∗)))) e^{−(s∗Nk − µS(s∗))} (B.3)

To make sure that the probability of over-commitment remains less than the threshold,

we should compute s∗ for all times t, and compute the probability in approximation (B.3).

To do so, we use characteristics of the sum random variable Sk(t). We know that

Sk(t) is the sum of n independent Bernoulli random variables, in which n represents the

number of applications that can be served by service type k at time t. Therefore, we

analyze the problem in the general case as follows:

Assume Xi, i = 1, 2, ..., n, are n independent Bernoulli random variables with parameters

(pi, qi), where pi + qi = 1. We define the random variable Y as Y = Σ_{i=1}^{n} Xi. We

have:


η := E[Y] = Σ_{i=1}^{n} pi (B.4)

σ² := VAR[Y] = Σ_{i=1}^{n} VAR[Xi] = Σ_{i=1}^{n} piqi (B.5)

Chernoff's bound is:

P{Y ≥ N} ≤ e^{−sN} E[e^{sY}] = e^{−sN} ∏_{i=1}^{n} (qi + pi e^s), s > 0 (B.6)

After taking the derivative with respect to s, we find s∗ as the root of the following function:

d(s) = Σ_{i=1}^{n} pi / (pi + qi e^{−s}) − N, s > 0 (B.7)

Also, the derivative of d(s) (equivalently, the second derivative of the exponent on the

right-hand side of Chernoff's bound) is:

d′(s) = Σ_{i=1}^{n} piqi e^{−s} / (pi + qi e^{−s})², s > 0 (B.8)

As can be seen, d′(s) is always positive, and therefore d(s) is a strictly increasing

function with at most one root. Figure B.1 shows a sample of this function, depicting

d(s) for a service type with N = 900 instances and n = 1000 application instances with

random pi's. As expected, the function is strictly increasing and, in this case, has one

root. We therefore present a five-step algorithm for finding the root. At each step, the

algorithm examines the cases where the function has no root, or has one root that is

much larger than one, much less than one, or close to one. Without getting further into

the mathematical details, the algorithm for finding the optimum s∗ is as follows:


Figure B.1: A sample d(s) for a service with 900 instances and random pi's for 1000 application instances

1) If N ≥ n then s∗ = ∞, and the bound is 0, which means the system has more

service instances than the number of admitted applications and the probability of over-

commitment is zero. Otherwise go to the next step.

2) If N ≤ η then s∗ = 0, and the bound is 1. This means that the number of service

instances is less than the mean number of admitted applications, and by using CLT we

can see that the over-commitment probability is more than 0.5. Otherwise go to the next

step.

3) If η < N < n, then d(s) is a strictly increasing function with exactly one root.

Therefore, if that root is much less than 1 (s∗ ≪ 1), we have:

s∗ = (N − η)/σ², s∗ ≪ 1

If the above equation yields s∗ < 0.5, then s∗ is the answer. Otherwise proceed to

the next step.

4) For s∗ ≫ 1, we have:

s∗ = ln((Σ_{i=1}^{n} pi^{−1} − n) / (n − N)), s∗ ≫ 1

If the above equation yields s∗ > 5, then s∗ is the answer. Otherwise proceed to the

next step.

5) Otherwise, s∗ is in the range (0.5, 5). In this case, we can compute the root very

efficiently using Newton's method.

Our simulations show that in most cases the above algorithm ends at the fourth step

and there is no need to use Newton's method. However, even when it is needed,

Newton's method can achieve a sufficiently accurate answer for our problem in a

few iterations.
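The five steps above can be sketched as follows. This is our illustrative implementation, not thesis code; in particular, the step-4 formula is garbled in the text and we read it as s∗ = ln((Σᵢ pᵢ⁻¹ − n)/(n − N)), which should be treated as an assumption. Function and variable names are ours.

```python
import math, random

# Sketch of the five-step procedure for locating the Chernoff optimizer s*,
# the root of d(s) = sum_i p_i/(p_i + q_i e^{-s}) - N  (equation B.7).

def d(s, p, N):
    return sum(pi / (pi + (1 - pi) * math.exp(-s)) for pi in p) - N

def d_prime(s, p):
    out = 0.0
    for pi in p:
        qi, e = 1 - pi, math.exp(-s)
        out += pi * qi * e / (pi + qi * e) ** 2
    return out

def find_s_star(p, N, tol=1e-9):
    n, eta = len(p), sum(p)
    if N >= n:                        # step 1: more instances than admissions
        return math.inf
    if N <= eta:                      # step 2: over-commitment prob. > 0.5
        return 0.0
    sigma2 = sum(pi * (1 - pi) for pi in p)
    s = (N - eta) / sigma2            # step 3: small-s approximation
    if s < 0.5:
        return s
    s = math.log((sum(1 / pi for pi in p) - n) / (n - N))  # step 4 (assumed form)
    if s > 5:
        return s
    s = 1.0                           # step 5: Newton's method on d(s)
    for _ in range(50):
        step = d(s, p, N) / d_prime(s, p)
        s -= step
        if abs(step) < tol:
            break
    return s

random.seed(3)
p = [random.uniform(0.3, 0.9) for _ in range(1000)]  # 1000 application instances
s_star = find_s_star(p, N=900)                       # 900 service instances
assert abs(d(s_star, p, 900)) < 1e-6                 # s* is (numerically) a root
```

With these example parameters the algorithm falls through to step 5, where Newton's method converges in a handful of iterations, matching the remark above.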

As we explained earlier, to compute the over-commitment probability at all future

times, we have to compute s∗ for all times t at which the application is likely to

be in that service. By calculating s∗ and obtaining Poc(t), we can make

sure that the application failure probability is less than the agreed threshold Toc at all

times. The computation of s∗ for all t, however, can be a

computationally intensive task in some systems. To overcome this obstacle, we propose a practical technique

for computing the root in equation (B.2).

Our solution is to combine the CLT-based method and the Chernoff's bound method.

In this technique, the system computes the over-commitment probability from the

mean and variance values using the central limit theorem, as described in the previous

subsection. Moreover, the system keeps track of the time th at which the CLT-based method

gives the highest value for the over-commitment probability. If the highest CLT-based

probability is less than 0.001, the system computes the root of

equation (B.2) using the above-mentioned technique. Consequently, Chernoff's bound

for that particular time th can be computed using s∗.


Appendix C

Derivation of Gk(t) Probability

Assume that there is an application that can be created by cascading m different services

as follows: S1 ⊗ S2 ⊗ ... ⊗ Sj ⊗ Sk ⊗ ... ⊗ Sm. The execution times of all services are

independent random variables denoted Xi (i = 1, ..., m), with pdfs fi(t) (i = 1, ..., m).

We want to find the probability that at time t the application has finished the execution

of all services before service k and is currently executing service k:

Gk(t) = P{Σ_{i=1}^{j} Xi < t < Σ_{i=1}^{j} Xi + Xk}

We define Yj as Yj := Σ_{i=1}^{j} Xi, with pdf fYj(t)

and cdf FYj(t). Now we have:

Gk(t) = P{Yj < t < Yj + Xk}

= ∫_0^t fYj(τ) P{τ < t < Yj + Xk | Yj = τ} dτ

= ∫_0^t fYj(τ) P{t < τ + Xk} dτ

= ∫_0^t fYj(τ)(1 − Fk(t − τ)) dτ


= FYj(t) − ∫_0^t fYj(τ) Fk(t − τ) dτ

= FYj(t) − ∫_0^t ∫_0^{t−τ} fYj(τ) fk(λ) dλ dτ

With the change of variable λ to ν − τ, we have:

∫_0^t ∫_0^{t−τ} fYj(τ) fk(λ) dλ dτ = ∫_0^t ∫_τ^t fYj(τ) fk(ν − τ) dν dτ

= ∫_0^t (fYj ∗ fk)(ν) dν = FYk(t)

Therefore, the probability Gk(t) is equal to:

Gk(t) = FYj(t) − FYk(t)
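The identity Gk(t) = FYj(t) − FYk(t) can be checked by Monte Carlo. The sketch below is ours; the choice of i.i.d. exponential execution times is only for convenience, since Yj is then Erlang with a closed-form cdf, while the thesis result holds for any independent execution times.

```python
import math, random

# Monte Carlo check of G_k(t) = F_Yj(t) - F_Yk(t) for a cascade of i.i.d.
# exponential service times (rate lam): Y_j is Erlang(j, lam) and
# Y_k = Y_j + X_k is Erlang(j+1, lam).  Illustrative sketch, not thesis code.

random.seed(5)

def erlang_cdf(t, m, lam):
    """cdf of an Erlang(m, lam) random variable."""
    return 1.0 - sum(math.exp(-lam * t) * (lam * t) ** i / math.factorial(i)
                     for i in range(m))

j, lam, t = 3, 1.0, 4.0
n = 200_000
hits = 0
for _ in range(n):
    yj = sum(random.expovariate(lam) for _ in range(j))   # Y_j = X_1 + ... + X_j
    xk = random.expovariate(lam)                          # execution time of S_k
    if yj < t < yj + xk:       # finished services 1..j, currently executing S_k
        hits += 1
g_empirical = hits / n
g_closed = erlang_cdf(t, j, lam) - erlang_cdf(t, j + 1, lam)
assert abs(g_empirical - g_closed) < 0.01
print(g_empirical, g_closed)
```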


Appendix D

Simulation Environment Description

In this thesis, we have frequently used simulation techniques to evaluate the performance of

the proposed systems and algorithms. The simulation environment and techniques used

for each of the performance evaluations have been described in the relevant parts of each

chapter. In this appendix, we present an overall description of the simulation

environment and techniques used for the purpose of this study.

The simulations in this thesis were all conducted on a 56-node computing cluster

in the Network Architecture Lab in the Department of Electrical and Computer Engineering,

University of Toronto. Each of these 56 computing nodes has two 1.7 GHz Xeon

processors, two 40 GB local hard drives, and 2 GB of RAM. This considerable amount

of processing power allowed us to easily repeat each simulation many times (> 20 per

point) and use the calculated mean values of the obtained results to evaluate the

performance of the proposed algorithms. We have also calculated the confidence intervals

for these results and found that, since the number of trial runs is quite large, the

confidence intervals are very narrow.
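The repeated-runs procedure described above can be sketched as follows; the "simulation" here is a stand-in random workload, not the thesis simulator, and all names are illustrative.

```python
import math, random, statistics

# Repeat a simulation point many times (> 20, as in the text) and report the
# mean with a normal-approximation 95% confidence interval.  The one_run()
# body is a placeholder workload standing in for a real simulation trial.

random.seed(11)

def one_run():
    # placeholder for a single simulation trial
    return statistics.mean(random.expovariate(1.0) for _ in range(500))

runs = [one_run() for _ in range(30)]
mean = statistics.mean(runs)
half_width = 1.96 * statistics.stdev(runs) / math.sqrt(len(runs))
print(f"mean = {mean:.4f} +/- {half_width:.4f} (95% CI)")
assert half_width < 0.1 * mean   # narrow interval, as reported in the text
```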

To make sure that the simulations are correct, we followed a step-by-step and modular

approach. In each case, we started the simulation process by simulating simpler cases,

and we analyzed the extensive logs produced by the simulator to make sure the internal

states and variables were correct. We also followed a modular design approach for our

simulations and tested each module in isolation, which increased the quality of the simulations

by simplifying the debugging process.

We have also evaluated the correctness of the random number generators by performing

statistical analysis on the generated random numbers. The inputs and outputs of each

simulation are described in the performance evaluation sections of each chapter.


Bibliography

[1] T. Anderson, L. Peterson, S. Shenker, and J. Turner. Overcoming the internet

impasse through virtualization. Computer, 38(4):34–41, April 2005.

[2] Zhenyu Yang, Wanmin Wu, Klara Nahrstedt, Gregorij Kurillo, and Ruzena Bajcsy.

Enabling multi-party 3d tele-immersive environments with viewcast. ACM Trans.

Multimedia Comput. Commun. Appl., 6(2):1–30, 2010.

[3] A. Tizghadam and A. Leon-Garcia. Autonomic traffic engineering for network

robustness. Selected Areas in Communications, IEEE Journal on, 28(1):39–50,

January 2010.

[4] R. Farha and A. Leon-Garcia. Blueprint for an Autonomic Service Architecture.

In Autonomic and Autonomous Systems, 2006. ICAS ’06. 2006 International Con-

ference on, July 2006.

[5] K.A. Abuosba and A.A. El-Sheikh. Formalizing service-oriented architectures. IT

Professional, 10(4):34–38, July–August 2008.

[6] Virtualization. http://en.wikipedia.org/wiki/Virtualization.

[7] Hadi Bannazadeh, Alberto Leon-Garcia, et al. Virtualized Application Networking

Infrastructure. In Proc. of the 6th International Conference on Testbeds

and Research Infrastructures for the Development of Networks and Communities,

Berlin, Germany, May 2010.


[8] Keith Redmond, Hadi Bannazadeh, Alberto Leon-Garcia, and Paul Chow. Devel-

opment of a Virtualized Application Networking Infrastructure Node. In Proc. of

the 3rd IEEE Workshop on Enabling the Future Service-Oriented Internet, Hon-

olulu, Hawaii, December 2009.

[9] Hadi Bannazadeh and Alberto Leon-Garcia. A Distributed Ethernet Traffic Shap-

ing System. In Proc. of the 17th IEEE Workshop on Local and Metropolitan Area

Networks (LANMAN 2010), Long Branch, NJ, May 2010.

[10] Michael Cusumano. Cloud computing and SaaS as new computing platforms.

Communications of the ACM, 53(4):27–29, 2010.

[11] Hadi Bannazadeh and Alberto Leon-Garcia. Allocating Services to Applications

using Markov Decision Processes. In proc. of IEEE Int. Conf. on Service-Oriented

Computing and Applications, SOCA’07, pages 141–146, Newport Beach, California,

June 2007.

[12] Hadi Bannazadeh and Alberto Leon-Garcia. Service Commitment Strategies in

Allocating Services to Applications. In proc. of IEEE Int. Conf. on Service Com-

puting, SCC’07, pages 91–97, Salt Lake City, Utah, July 2007.

[13] Hadi Bannazadeh and Alberto Leon-Garcia. A Distributed Algorithm for Ser-

vice Commitment in Allocating Services to Applications. In proc. of 2nd IEEE

Asia-Pacific Service Computing Conference, APSCC’07, pages 446–453, Tsukuba,

Japan, Dec 2007.

[14] Hadi Bannazadeh and Alberto Leon-Garcia. Probabilistic Approach to Service

Commitment in Service-Oriented Systems. In in the proc. of IEEE Congress on

Services, Honolulu, Hawaii, July 2008.


[15] Hadi Bannazadeh and Alberto Leon-Garcia. A distributed probabilistic commitment

control algorithm for service-oriented systems. IEEE Transactions on Network and

Service Management (TNSM), to appear.

[16] Hadi Bannazadeh and Alberto Leon-Garcia. Online optimization in application

admission control for service oriented systems. In Asia-Pacific Services Computing

Conference, 2008. APSCC ’08. IEEE, pages 482–487, Yilan, Taiwan, Dec 2008.

[17] Hadi Bannazadeh and Alberto Leon-Garcia. On the Emergence of an Application-

Oriented Network Architecture. In proc. of IEEE Int. Conf. on Service-Oriented

Computing and Applications, SOCA’07, pages 47–54, Newport Beach, California,

June 2007.

[18] Stephanos Androutsellis-Theotokis and Diomidis Spinellis. A survey of peer-to-peer

content distribution technologies. ACM Comput. Surv., 36(4):335–371, 2004.

[19] Service-Oriented Architecture. www.ibm.com/soa.

[20] OASIS Reference Model for Service Oriented Architecture 1.0. http://www.oasis-open.org.

[21] Francis Shanahan. Amazon.com Mashups. Wrox Press Ltd., Birmingham, UK,

2007.

[22] Tim O’Reilly. What is web 2.0: Design patterns and business models for the next

generation of software. Available online at http://oreilly.com/web2/archive/what-is-web-20.html.

[23] W3C Working Group Note. Web services architecture. Available online at http://www.w3.org/TR/ws-arch/.

[24] W3C. Extensible Markup Language (XML). Available online at http://www.w3.org/XML/.


[25] Poornachandra Sarang, Matjaz Juric, and Benny Mathew. Business Process Execution

Language for Web Services BPEL and BPEL4WS. Packt Publishing, Birmingham,

UK, 2006.

[26] Krishna Kant. Data center evolution: A tutorial on state of the art, issues, and

challenges. Computer Networks, 53(17):2939–2965, December 2009.

[27] James Murty. Programming Amazon Web Services: S3, EC2, SQS, FPS, and

SimpleDB. O’Reilly Media Inc, California, 2008.

[28] D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, and

D. Zagorodnov. The Eucalyptus open-source cloud-computing system. In Cluster

Computing and the Grid, 2009. CCGRID ’09. 9th IEEE/ACM International

Symposium on, pages 124–131, Shanghai, May 2009.

[29] Guohui Wang and T. S. Eugene Ng. The impact of virtualization on network

performance of Amazon EC2 data center. In Proceedings of the 29th IEEE Conference

on Computer Communications, INFOCOM 2010, San Diego, CA, March 2010.

[30] M. Alizadeh, B. Atikoglu, A. Kabbani, A. Lakshmikantha, Rong Pan, B. Prab-

hakar, and M. Seaman. Data center transport mechanisms: Congestion control

theory and IEEE standardization. In Communication, Control, and Computing,

2008 46th Annual Allerton Conference on, pages 1270–1277, Sept. 2008.

[31] Alan B. Johnston. SIP: Understanding the Session Initiation Protocol. Artech

House Publishers, 2009.

[32] ITU-T. Next generation networks global standards initiative. Available online at

http://www.itu.int/ITU-T/ngn.


[33] K. Knightson, N. Morita, and T. Towle. NGN architecture: generic principles,

functional architecture, and implementation. Communications Magazine, IEEE,

43(10):49–56, Oct. 2005.

[34] Gonzalo Camarillo and Miguel A. Garcia-Martin. The 3G IP Multimedia Subsystem

(IMS). John Wiley & Sons Ltd, England, 2006.

[35] TM Forum. IPsphere Forum. Available online at http://www.tmforum.org/ipsphere.

[36] Cornelia Kappler. UMTS Networks and Beyond. John Wiley & Sons, England,

2009.

[37] Pierre Lescuyer and Thierry Lucidarme. Evolved Packet System, The LTE and

SAE Evolution of 3G UMTS. John Wiley & Sons, England, 2008.

[38] Alasdair Allan. Learning iPhone Programming: From Xcode to App Store. O’Reilly

Media, CA, USA, 2010.

[39] Reto Meier. Professional Android 2 Application Development. Wiley Publishing,

USA, 2010.

[40] Akamai. http://www.akamai.com.

[41] R.L. Xia and J.K. Muppala. A survey of BitTorrent performance. Communications

Surveys & Tutorials, IEEE, 12(2):140–158, Second Quarter 2010.

[42] Gero Mühl, Ludger Fiege, and Peter Pietzuch. Distributed Event-Based Systems.

Springer, Germany, 2006.

[43] P. Saint-Andre. XMPP: lessons learned from ten years of XML messaging.

Communications Magazine, IEEE, 47(4):92–96, April 2009.


[44] Jacob Chakareski and Pascal Frossard. Adaptive systems for improved media

streaming experience. Communications Magazine, IEEE, 45(1):77–83, Jan. 2007.

[45] Cisco. Cisco visual networking index: Forecast and methodology, 2009-2014. Avail-

able online at http://www.cisco.com.

[46] YouTube. http://www.youtube.com.

[47] Hulu. http://www.hulu.com.

[48] E. Mikoczy, D. Sivchenko, Bangnan Xu, and J.I. Moreno. IPTV systems, standards

and architectures: Part II – IPTV services over IMS: Architecture and

standardization. Communications Magazine, IEEE, 46(5):128–135, May 2008.

[49] J.S. Turner and D.E. Taylor. Diversifying the internet. In Global Telecommunica-

tions Conference, 2005. GLOBECOM ’05. IEEE, volume 2, Dec 2005.

[50] Steven M. Bellovin, David D. Clark, Adrian Perrig, and Dawn Song. A

Clean-Slate Design for the Next-Generation Secure Internet, 2005. Available

at http://sparrow.ece.cmu.edu/group/pub/bellovin_clark_perrig_song_nextGenInternet.pdf.

[51] Stanford University Clean Slate Design For Internet: An Interdisciplinary Research

Program. http://cleanslate.stanford.edu.

[52] 100x100 project. http://100x100network.org.

[53] Särelä M., Rinta-aho T., and Tarkoma S. RTFM: Publish/subscribe internetworking

architecture. ICT-MobileSummit 2008 Conference Proceedings, Paul Cunningham

and Miriam Cunningham (Eds), IIMC International Information Management

Corporation, 2008.


[54] Van Jacobson, Diana K. Smetters, James D. Thornton, Michael F. Plass,

Nicholas H. Briggs, and Rebecca L. Braynard. Networking named content. In

CoNEXT ’09: Proceedings of the 5th international conference on Emerging net-

working experiments and technologies, pages 1–12, New York, NY, USA, 2009.

ACM.

[55] GENI System Overview, September 2008. Available at http://www.geni.net.

[56] GENI Control Framework Requirements, January 2009. Available at http://www.

geni.net.

[57] Peterson L. PlanetLab: A Blueprint for Introducing Disruptive Technology into

the Internet. http://www.planet-lab.org, January 2004.

[58] PlanetLab GENI Control Framework Overview, January 2009. Available at http://www.geni.net.

[59] Emulab - network emulation testbed. http://www.emulab.net.

[60] Mike Hibler, Robert Ricci, Leigh Stoller, Jonathon Duerig, Shashi Guruprasad,

Tim Stack, Kirk Webb, and Jay Lepreau. Large-scale Virtualization in the Emulab

Network Testbed. In Proceedings of the 2008 USENIX Annual Technical Confer-

ence, pages 113–128, June 2008.

[61] Open resource control architecture. http://nicl.cod.cs.duke.edu/orca/about.html.

[62] P. Szegedi, S. Figuerola, M. Campanella, V. Maglaris, and C. Cervello-Pastor.

With evolution for revolution: managing FEDERICA for future Internet research.

Communications Magazine, IEEE, 47(7):34–39, July 2009.

[63] Snehapreethi Gopinath, Shweta Jain, Shivesh Makharia, and Dipankar Raychaud-

huri. An experimental study of the cache-and-forward network architecture in


multi-hop wireless scenarios. In Proc. of the 17th IEEE Workshop on Local and

Metro Area Networks (LANMAN 2010), Long Branch, NJ, May 2010.

[64] E. Grasa, G. Junyent, S. Figuerola, A. Lopez, and M. Savoie. UCLPv2: a network

virtualization framework built on web services [web services in telecommunications,

part II]. Communications Magazine, IEEE, 46(3):126–134, March 2008.

[65] E. Grasa et al. UCLPv2: A Network Virtualization Framework Built on Web

Services. Communications Magazine, IEEE, 46(3):126–34, March 2008.

[66] Matthias Nicola and Jasmi John. XML parsing: A threat to database performance.

In proc. of 12th Intl. Conference on Information and Knowledge Management,

pages 175–178, New Orleans, Louisiana, 2003.

[67] D. Davis and M.P. Parashar. Latency performance of SOAP implementations. In

Cluster Computing and the Grid, 2002. 2nd IEEE/ACM International Symposium

on, New Orleans, Louisiana, May 2002.

[68] Hadi Bannazadeh. Hardware-based Content Processing, May 2007.

[69] F. Hartung, N. Niebert, A. Schieder, R. Rembarz, S. Schmid, and L. Eggert. Ad-

vances in network-supported media delivery in next-generation mobile systems.

Communications Magazine, IEEE, 44(8):82–89, Aug. 2006.

[70] D. Chappell. Theory in Practice: Enterprise Service Bus. O’Reilly Media, USA,

2004.

[71] IBM. WebSphere DataPower SOA Appliances. http://www-01.ibm.com/software/integration/datapower/.

[72] Bo Li and Hao Yin. Peer-to-peer live video streaming on the Internet: issues,

existing approaches, and challenges [peer-to-peer multimedia streaming].

Communications Magazine, IEEE, 45(6):94–99, June 2007.


[73] I. Hernandez-Serrano, S. Sharma, and A. Leon-Garcia. Reliable P2P networks:

TrebleCast and TrebleCast. In Parallel Distributed Processing, 2009. IPDPS 2009.

IEEE International Symposium on, pages 1–8, 2009.

[74] Cisco. Application-oriented networking. http://www.cisco.com.

[75] Larry Peterson, Soner Sevinc, Jay Lepreau, Robert Ricci, John Wroclawski, Ted

Faber, Stephen Schwab, and Scott Baker. Slice-based facility architecture. Avail-

able online at http://www.geni.net.

[76] Herbert Pötzl and Marc E. Fiuczynski. Linux-VServer, Resource Efficient OS-Level

Virtualization, June 2007. Available at http://ols.108.redhat.com/2007/Reprints/potzl-Reprint.pdf.

[77] CANARIE Inc. CANARIE: Canadian Network for the Advancement of Research,

Industry and Education. http://www.canarie.ca.

[78] Glen Gibb, John W. Lockwood, Jad Naous, Paul Hartke, and Nick McKeown.

NetFPGA: An Open Platform for Teaching How to Build Gigabit-Rate Network

Switches and Routers. Trans. on Education, 51(3):364–369, August 2008.

[79] Yu Cheng, R. Farha, A. Tizghadam, Myung Sup Kim, M. Hashemi, A. Leon-Garcia,

and J.W.-K. Hong. Virtual network approach to scalable IP service deployment and

efficient resource management. Communications Magazine, IEEE, 43(10):76–84,

Oct. 2005.

[80] C. Chang, J. Wawrzynek, and R.W. Brodersen. BEE2: a high-end reconfigurable

computing system. Design and Test of Computers, IEEE, 22(2):114–125, March-

April 2005.

[81] Sun Microsystems Inc. OpenESB: The Open Enterprise Service Bus. http://open-esb.dev.java.net.


[82] Sun Microsystems Inc.: Java Web Start Technologies. http://java.sun.com/javase/technologies/desktop/javawebstart.

[83] Ontario Research and Innovation Optical Network (ORION). http://www.orion.on.ca.

[84] IEEE 802.1ad-2005, Virtual Bridged Local Area Networks Amendment 4: Provider

Bridges, 2006. Available at http://standards.ieee.org.

[85] VMware Inc. VMware: A Virtual Computing Environment. http://www.vmware.com, 2001.

[86] Padala P., Zhu X., Wang Z., Singhal S., and Shin K.G. Performance Evaluation

of Virtualization Technologies for Server Consolidation, 2007. Available at http://www.hpl.hp.com/techreports/2007/HPL-2007-59R1.html.

[87] Cloud Computing Definition, National Institute of Standards and Technology,

Version 15, 2006. Available at http://csrc.nist.gov/groups/SNS/cloud-computing/index.html.

[88] The Internet Engineering Task Force (IETF). RFC 3448: TCP Friendly Rate

Control (TFRC). http://www.ietf.org/rfc/rfc3448.txt.

[89] S. Biyani and J. Martin. A comparison of TCP-friendly congestion control protocols.

In Computer Communications and Networks, 2004. ICCCN 2004. Proceedings. 13th

International Conference on, pages 255–260, Oct 2004.

[90] IEEE 802.3x-1997, Local and Metropolitan Area Networks: Specification for 802.3

Full Duplex Operation, 1997. Available at http://standards.ieee.org.

[91] IEEE 802.1au, Virtual Bridged Local Area Networks Amendment Congestion No-

tification. Available at www.ieee802.org/1/pages/802.1au.html.


[92] Jinjing Jiang, R. Jain, and Chakchai So-In. An explicit rate control framework for

lossless Ethernet operation. In Communications, 2008. ICC ’08. IEEE International

Conference on, pages 5914–5918, May 2008.

[93] Gary McAlpine, Manoj Wadekar, Tanmay Gupta, Alan Crouch, and Don Newell.

An architecture for congestion management in Ethernet clusters. In IPDPS ’05:

Proceedings of the 19th IEEE International Parallel and Distributed Processing

Symposium - Workshop 9, page 211.1, 2005.

[94] Chakchai So-In, R. Jain, and Jinjing Jiang. Enhanced forward explicit congestion

notification (E-FECN) scheme for datacenter Ethernet networks. In Performance

Evaluation of Computer and Telecommunication Systems, 2008. SPECTS 2008.

International Symposium on, pages 542–546, June 2008.

[95] Linux Advanced Routing and Traffic Control. Available at http://lartc.org/.

[96] M. Bichler and K-J. Lin. Service-Oriented Computing. IT Systems Perspectives,

39(3):99–101, March 2006.

[97] X. Gu and K. Nahrstedt. Distributed Multimedia Service composition with statis-

tical QoS Assurances. IEEE Transactions on Multimedia, 8(1):141–151, Feb 2006.

[98] L. Zeng, B. Benatallah, A.H.H Ngu, M. Dumas, J.Kalagnanam, and H. Chang.

QoS-Aware Middleware for Web Service Composition. IEEE Transactions on Soft-

ware Engineering, 30(5):311–327, May 2004.

[99] P. Doshi, R. Goodwin, R. Akkiraju, and K. Verma. Dynamic workflow composition

using Markov decision processes. In Proc. IEEE International Conference on Web

Services, pages 576–582, July 2004.

[100] Thomas Phan and Wen-Syan Li. Heuristics-based scheduling of composite web

service workloads. In MW4SOC ’06: Proceedings of the 1st workshop on Middleware


for Service Oriented Computing (MW4SOC 2006), pages 30–35, New York, NY,

USA, 2006. ACM.

[101] K.W. Ross and D.H.K. Tsang. The stochastic knapsack problem. Communications,

IEEE Transactions on, 37(7):740–747, July 1989.

[102] D.P. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena

Scientific, Belmont, Massachusetts, third edition, 2005.

[103] M.L. Puterman. Markov Decision Processes. Wiley Inter-Science, New York, 1994.

[104] S.D. Moitra. Skewness and the Beta Distribution. Journal of Operation Research

Society, 41(10):953–961, Oct 1990.

[105] Menasce Daniel A., Casalicchio Emiliano, and Dubey Vinod. A heuristic approach

to optimal service selection in service oriented architectures. In WOSP ’08: Pro-

ceedings of the 7th international workshop on Software and performance, pages

13–24, New York, NY, USA, 2008. ACM.

[106] Danilo Ardagna and Barbara Pernici. Adaptive service composition in flexible

processes. IEEE Transactions on Software Engineering, 33:369–384, 2007.

[107] Valeria Cardellini, Emiliano Casalicchio, Vincenzo Grassi, Francesco Lo Presti, and

Raffaela Mirandola. QoS-driven runtime adaptation of service oriented architectures.

In ESEC/FSE ’09: Proceedings of the 7th joint meeting of the European

software engineering conference and the ACM SIGSOFT symposium on The foun-

dations of software engineering, pages 131–140, New York, NY, USA, 2009. ACM.

[108] Tao Yu, Yue Zhang, and Kwei-Jay Lin. Efficient algorithms for web services

selection with end-to-end QoS constraints. ACM Trans. Web, 1(1):6, 2007.

[109] David Chappell and David Berry. SOA – ready for primetime: The next-generation,

grid-enabled service-oriented architecture. SOA Magazine, September 2007.


[110] Menasce Daniel A., Ruan Honglei, and Gomaa Hassan. QoS management in

service-oriented architectures. Perform. Eval., 64(7-8):646–663, 2007.

[111] Markus Schmid and Reinhold Kroeger. Decentralised QoS-management in service

oriented architectures. In Distributed Applications and Interoperable Systems, vol-

ume 5053/2008, pages 44–57. Springer Berlin / Heidelberg, 2008.

[112] S. Rosario, A. Benveniste, S. Haar, and C. Jard. Probabilistic QoS and Soft Con-

tracts for Transaction-Based Web Services Orchestrations. IEEE Transaction on

Services Computing, 1(4):187–200, October-December 2008.

[113] Leyuan Shi. Approximate analysis for queueing networks with finite capacity and

customer loss. European Journal of Operational Research, 85(1):178 – 191, 1995.

[114] Boualem Rabta. Rapid Modelling for Increasing Competitiveness, chapter A Review

of Decomposition Methods for Open Queueing Networks, pages 25–42. 2009.

[115] Carolina Osorio and Michel Bierlaire. An analytic finite capacity queueing network

model capturing the propagation of congestion and blocking. European Journal of

Operational Research, 196(3):996 – 1007, 2009.

[116] H. Kobayashi and B. Mark. System Modeling and Analysis: Foundations of System

Performance Evaluation. Pearson Education, Inc., Upper Saddle River, New Jersey,

2009.

[117] Raj Jain. The art of computer systems performance analysis : techniques for ex-

perimental design, measurement, simulation, and modeling. John Wiley & Sons,

Inc., New York, NY, 1991.

[118] A. Papoulis and S. U. Pillai. Probability, Random Variables and Stochastic

Processes. McGraw-Hill, New York, 2002.


[119] Z.100, Specification and Description Language. Available online at

http://www.itu.int/rec/T-REC-Z.100-200711-I/en, 2007.

[120] Cheng-Yuan Ku, Din-Yuen Chan, and Lain-Chyr Hwang. Optimal reservation

policy for two queues in tandem. Inf. Process. Lett., 85(1):27–30, 2003.

[121] Cheng-Yuan Ku and Scott Jordan. Near optimal admission control for multiserver

loss queues in series. European Journal of Operational Research, 144(1):166–178,

2003.

[122] S. Balsamo, V. Nitto Persone, and R. Onvural. Analysis of Queueing Networks

with Blocking. Kluwer’s International Series, 2001.

[123] W. Whitt. The queueing network analyzer. The Bell System Technical Journal,

62(9):2779–2815, 1983.

[124] A. Heindl. Approximate analysis of queueing networks with finite buffers and losses

by decomposition. Technical Report 1998-8, 1998.

[125] J.C. Strelen. Loss queueing networks with bursty arrival processes and phase type

service times: Approximate analysis. In In Proceedings of the 5th IFIP Workshop

on Performance Modelling and Evaluation of ATM Networks, pages 87/1–10, 1997.

[126] Sushant Jain and J. MacGregor Smith. Open finite queueing networks with

M/M/C/K parallel servers. Computers & Operations Research, 21(3):297–317,

1994.

[127] R. Sadre, B. Haverkort, and A. Ost. An efficient and accurate decomposition

method for open finite and infinite buffer queueing networks. In in proc. of the

Third International Workshop on Numerical Solution of Markov Chains, page 120,

1999.


[128] Abigail Lebrecht and William J. Knottenbelt. Response time approximations in

fork-join queues. In in proceedings of 23rd Annual UK Performance Engineering

Workshop (UKPEW), June 2007.

[129] R. Nelson and A.N. Tantawi. Approximate analysis of fork/join synchronization in

parallel queues. Computers, IEEE Transactions on, 37(6):739–743, June 1988.

[130] Edward D. Lazowska, John Zahorjan, G. Scott Graham, and Kenneth C. Sevcik.

Quantitative system performance: computer system analysis using queueing net-

work models. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1984.

[131] Majid Ghaderi and Raouf Boutaba. Call admission control in mobile cellular net-

works: a comprehensive survey: Research articles. Wirel. Commun. Mob. Comput.,

6(1):69–93, 2006.

[132] D.A. Levine, I.F. Akyldiz, and M. Naghshineh. A Resource Estimation and Call

Admission Algorithm for Wireless Multimedia Networks using Shadow Cluster Con-

cept. IEEE/ACM Transactions on Networking, 5(1):1–12, Feb 1997.

[133] T. Zhang, E. van den Berg, J. Chennikara, P. Agrawal, Jyh-Cheng Chen, and

T. Kodama. Local predictive resource reservation for handoff in multimedia wireless

ip networks. Selected Areas in Communications, IEEE Journal on, 19(10):1931–

1941, Oct 2001.

[134] Ti-Yen Yen and Wayne Wolf. Performance estimation for real-time distributed

embedded systems. IEEE Trans. Parallel Distrib. Syst., 9(11):1125–1136, 1998.

[135] Lei Ju, Abhik Roychoudhury, and Samarjit Chakraborty. Schedulability analysis of

msc-based system models. In RTAS ’08: Proceedings of the 2008 IEEE Real-Time

and Embedded Technology and Applications Symposium, pages 215–224, Washing-

ton, DC, USA, 2008. IEEE Computer Society.


[136] Firat Kart, Louise E. Moser, and P. Michael Melliar-Smith. Building a distributed

e-healthcare system using soa. IT Professional, 10(2):24–30, 2008.

[137] Sorin Manolache, Petru Eles, and Zebo Peng. Schedulability analysis of applica-

tions with stochastic task execution times. Trans. on Embedded Computing Sys.,

3(4):706–735, 2004.

[138] Sorin Manolache, Petru Eles, and Zebo Peng. Schedulability analysis of multipro-

cessor real-time applications with stochastic task execution times. In ICCAD ’02:

Proceedings of the 2002 IEEE/ACM international conference on Computer-aided

design, pages 699–706, New York, NY, USA, 2002. ACM.

[139] Sheldon M. Ross. Stochastic Processes. John Wiley & Sons, 1996.

[140] P.V. Hentenryck and R. Bent. Online Stochastic Combinatorial Optimization. The

MIT Press, Cambridge, Massachusetts, 2006.

[141] Martin Bichler and Thomas Setzer. Admission control for media on demand ser-

vices. Service Oriented Computing and Applications, 1(1):65–73, Apr 2007.

[142] Mixed Integer Linear Programming (MILP) solver lp_solve.

http://sourceforge.net/projects/lpsolve.

[143] M. Shaked and J.G. Shanthikumar. Stochastic Orders and Their Applications.

Academic Press, Boston, Massachusetts, 1994.

[144] Richard E. Barlow, Frank Proschan, and Larry C. Hunter. Mathematical Theory

of Reliability. SIAM, New York, NY, 1996.

[145] Alberto Leon-Garcia. Probability, Statistics, and Random Processes For Electrical

Engineering. Addison-Wesley, New York, 2008.

[146] Joseph Y. Hui. Switching and traffic theory for integrated broadband networks.

Kluwer Academic Publishers, Massachusetts, 1990.


Glossary

ABS Already Being Served.

AON Application-Oriented Network.

AOR Application-Oriented Router.

BEE2 Berkeley Emulation Engine 2.

BIP Binary Integer Programming.

BPEL Business Process Execution Language.

CAC Call Admission Control.

CDN Content Distribution (Delivery) Network.

CEP Complex Event Processing.

CLT Central Limit Theorem.

CP Complete Partitioning.

CS Complete Sharing.

DASC Distributed Algorithm for Service Commitment.

DETS Distributed Ethernet Traffic Shaping.

DFR Decreasing Failure Rate.

EC2 Amazon Elastic Compute Cloud.

ESB Enterprise Service Bus.

FCP Full Commitment Policy.

FCQN Finite Capacity Queuing Network.


FECN Forward Explicit Congestion Notification.

FPGA Field-Programmable Gate Array.

GENI Global Environment for Network Innovations.

GPU Graphics Processing Unit.

GUI Graphical User Interface.

HTTP Hypertext Transfer Protocol.

IFR Increasing Failure Rate.

IMS IP Multimedia Subsystem.

IP Internet Protocol.

JMS Java Message Service.

LP Linear Programming.

MDP Markov Decision Processes.

NCP No Commitment Policy.

NGN Next Generation Network.

PCP Partial Commitment Policy.

Q-DASC Queue-enabled Distributed Algorithm for Service Commitment.

RAA Rate Allocation Algorithm.

RAA-FE Rate Allocation Algorithm-Forward Explicit.

RAA-FP Rate Allocation Algorithm-Fast Probe.

RAA-FS Rate Allocation Algorithm-Fair Share.

RAA-SP Rate Allocation Algorithm-Slow Probe.

SDL Specification and Description Language.

SIP Session Initiation Protocol.

PN Physical Node.

SNMP Simple Network Management Protocol.


VN Virtual Node.

SOA Service-Oriented Architecture.

SSL Secure Socket Layer.

SSS Service Signaling Stratum.

TES Time to Enter Service.

TLS Transport Layer Security.

UCLP User Controlled Light Path.

UUID Universally Unique IDentifier.

VANI Virtualized Application Networking Infrastructure.

VANI-AP VANI Application Plane.

VANI-CMP VANI Control and Management Plane.

VLAN Virtual Local Area Network.

WS Web Service.

WSDL Web Service Description Language.

XML Extensible Markup Language.

XMPP Extensible Messaging and Presence Protocol.