Software Quality of Service in Composite Applications Built with Web Services PhD Thesis

  • 7/28/2019 Software Quality of Service in Composite Applications Built with Web Services PhD Thesis

    Software Quality of Service

    in Composite Applications

    Built with Web Services

    Shelly Saunders

    A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy of

    Nottingham Trent University and Southampton Solent University

    November 2010

    Abstract

    Over recent years businesses have evolved their enterprise architectures to include packaged applications, legacy systems, and bespoke line-of-business (LOB) applications that are integrated using an architectural approach called service-oriented architecture (SOA). This architecture promotes agile and reconfigurable application development, which is ideal for 21st century businesses. Modern SOAs now extend into the Cloud using pay-per-use Software as a Service (SaaS).

    Composite applications built using Web Services are one way in which SOA principles are being introduced into enterprise architectures. Although there are many business reasons for adopting this type of architecture, methods and techniques for estimating the overall Quality of Service (QoS) of a composite application built using Web Services do not exist at the moment.

    This thesis attempts to address a number of questions in this area. Firstly, how can we predict the performance of a composite application? For example, what is the effect on performance of replacing one Web Service with another of equivalent functionality, or of dynamically changing the steps in a workflow? Secondly, how can we maximise the performance of that application through effective use and exploitation of the resources available to it? Thirdly, what strategies are there for improving our ability to meet QoS metrics in a composite application? As the provider of a composite application to one or more clients, how can we manage situations where resources are overloaded? Under these conditions it would be useful to be able to selectively admit or reject requests from clients based on criteria that maximise the provider's profits or business objectives. Fourthly, how can we define and manage SLAs for performance metrics in composite applications?

    This thesis makes the following five contributions. The first is a set of detailed test results demonstrating that the Mean Value Analysis (MVA) algorithm can be applied to a queuing network description of a composite application built using Web Services. The second is the demonstration of the MVA algorithm as the fitness function for a Genetic Algorithm (GA). The third is a practical example of applying a GA to the dynamic management of a real workflow implemented as a set of Web Services across multiple servers. The fourth is the demonstration of strategies for meeting QoS metrics under a number of different real-life overload conditions. The fifth is a proposal for improvements that could be made to existing SLA design methodologies and SLA languages to define QoS metrics for composite applications.

    Dedication

    To Greta 1993-2008 and Jenny 1970-2009

    May flights of angels sing thee to thy rest.

    Acknowledgements

    I would like to acknowledge the encouragement and support of my supervisors at Southampton Solent University: Eur Ing Professor Margaret Ross MBE, Eur Ing Geoff Staples and Dr Sean Wellington, Head of the Technology Research Centre. I would also like to thank Professor Mike Barnett and Professor John Rees, who both provided helpful advice and direction.

    The final version of this thesis benefited greatly from the comments and input made by Edwin Gray during the viva.

    Steve White, of IBM's Autonomic Computing laboratory at the Thomas J Watson Research Centre in Hawthorne, gave up time to read and comment on some of this work in a very valuable session.

    I have also had useful conversations about SOA in general with colleagues at my former employer, ACE Group, as well as with IBM staff from the Hursley labs near Winchester. I would also like to thank Marlborough Stirling plc, who gave permission for me to use the results of a performance testing exercise conducted on their systems.

    Table of Contents

    Chapter 1 Introduction
    1.1 Motivation
    1.2 Hypothesis
    1.3 Methodology
    1.4 Contributions
    1.5 Thesis Roadmap
    Chapter 2 Analysis of the Problem
    Chapter 3 Related Work
    3.1 Background
    3.1.1 Discovery and Negotiation
    3.1.2 Service Level Agreement
    3.1.3 Service Provision
    3.1.4 Monitoring
    3.2 Adaptive Control of Web Applications and Services
    3.3 Queuing Theory
    3.3.1 Dynamic Resource Configuration
    3.3.2 Admission Control
    3.3.3 Dynamic Provisioning of Idle Resources
    3.3.4 Extending to Multiple Tiers
    3.4 Control Theory
    3.4.1 Admission Control
    3.4.2 Degraded Service
    3.4.3 Extending to Multiple Tiers
    3.4.4 Fuzzy Controllers
    3.5 Combined Approaches
    3.6 Solving Optimization Problems
    3.6.1 Utility Functions
    3.6.2 Integer Linear Programming
    3.6.3 Genetic Algorithms
    3.7 Concluding Remarks
    3.8 Publications
    Chapter 4 Designing SLAs for Composite Applications
    4.1 Service Level Management
    4.1.1 Service Monitoring
    4.1.2 Key Quality Indicators and Key Performance Indicators
    4.2 Service Level Agreement Design
    4.2.1 COSMA
    4.2.2 MoDe4SLA
    4.2.3 Differential QoS Support
    4.3 Proposals
    Chapter 5 An MVA Performance Model for a SOA
    5.1 Introduction
    5.2 Performance Requirements
    5.3 Business Demand Modelling
    5.4 Workload Characterisation
    5.4.1 Task Distribution
    5.4.2 Arrival Time Distribution
    5.4.3 Service Time Distribution
    5.4.4 Load-Dependence of Service Times
    5.5 Modelling the Application
    5.5.1 Mean Value Analysis
    5.5.2 The Queuing Network Model of an N-Tier Application
    5.6 Management Software
    5.6.1 Capturing Application Metrics
    5.6.2 Statistical Analysis of Raw Metrics
    5.6.3 MVA Modeller
    5.7 Results
    5.7.1 Accuracy of the Model
    5.8 Using the Model to Predict the Performance of a New Workflow
    5.9 Summary and Discussion
    5.10 Publications
    Chapter 6 A Genetic Algorithm with an MVA Fitness Function for Runtime Performance Improvements of a Composite Application Built Using Web Services
    6.1 Introduction to Genetic Algorithms
    6.2 Comparison with Other Techniques
    6.3 A GA for the Sample Application
    6.3.1 Chromosome Encoding
    6.3.2 Initial Population
    6.3.3 Fitness Evaluation
    6.3.4 Fitness Selector
    6.3.5 Constraints
    6.3.6 Crossover
    6.3.7 Mutation
    6.3.8 Population Evolution
    6.4 Technical Design of the Management Solution
    6.4.1 The ESB
    6.4.2 Dynamic Routing
    6.4.3 Logical Design
    6.5 Test Harness
    6.5.1 Sample Workflows
    6.5.2 Baseline Performance Test Results
    6.5.3 Post GA Results
    6.6 Summary and Discussion
    Chapter 7 Strategy for a QoS-aware Composite Applications in the Cloud
    7.1 Enterprise SOA and Cloud Computing
    7.1.1 Cloud Computing and Software as a Service
    7.1.2 A Unified Architecture
    7.1.3 Challenges with SLA Management
    7.2 Modelling a Composite Application in the Cloud
    7.3 Strategies for Automated QoS Control in the Cloud
    7.3.1 Changes in Workload
    7.3.2 Loss of Service
    7.3.3 Increases in Latency
    7.3.4 Differentiated Services
    7.4 Conclusions
    7.5 Publications
    Chapter 8 Evaluation and Conclusions
    8.1 Discussion of Results
    8.2 Evaluation of Results and Methodologies
    8.3 Contributions of this Thesis
    8.4 Limitations of this Thesis
    8.5 Future Work
    References
    Appendix A Publications Linked to This Thesis
    Journals
    Conferences
    Appendix A2 Other Research Outputs Not Directly Relevant To This Thesis
    Software Engineering
    Optoelectronics
    Patents

    List of Figures

    Figure 1.1 Service-oriented application integration.

    Figure 1.2 A Virtual Enterprise.

    Figure 4.1 Service Level Management

    Figure 4.2 Composite Web Services

    Figure 4.3 Simplified COSMAdoc schema

    Figure 5.1 Probability of client requesting each job class.

    Figure 5.2 Distribution of tasks across three tiers for each of the 26 classes of work.

    Figure 5.3 Inter-arrival time distribution from web logs.

    Figure 5.4 The tail of the distribution for inter-arrival times beyond two seconds.

    Figure 5.5 Inter-arrival times of tasks on the job queue.

    Figure 5.6 Service time distributions for all tasks executing in under 2 sec.

    Figure 5.7 Service time distributions for all tasks executing in over 2 sec

    Figure 5.8 Increase in task service time with load

    Figure 5.9 Queuing Network Model of an N-tier Application

    Figure 5.10 An ESB executing a sequence of tasks via Web Services

    Figure 5.11 Execution times of each class in a simple workflow, together with the total response time

    Figure 5.12 Comparison of the response times predicted by the model and the actual response times at different loads.

    Figure 5.13 Accuracy of the model

    Figure 5.14 The differences between the predicted and observed results when the model is used in a predictive manner

    Figure 6.1 Layers of services become increasingly more coarse-grained, with the top layer of orchestration services providing a standards based aggregation and process framework.

    Figure 6.2 Logical Design

    Figure 6.3 Sample Workflows

    Figure 6.4 Baseline Results

    Figure 7.1 SOA and SaaS used to create a composite application

    Figure 7.2 Generalised example queuing network model including SaaS services

    List of Tables

    Table 5.1 MVA Algorithm

    Table 6.1 Example Chromosome Encoding

    Table 6.2 Logical Design

    Table 6.3 Sample Workload

    Table 6.4 Job Distribution

    Table 6.5 Measured Execution Times

    Table 6.6 Optimised Job Distribution

    Glossary of Terms

    Admission Control - a QoS procedure which determines the rate at which jobs are accepted into a system or network, or indeed whether the jobs will be accepted at all

    Artificial Intelligence (AI) - a branch of computer science dealing with simulating intelligent behaviour in computers

    Autonomic Computing - a term created by IBM to describe self-managing computer systems

    BPEL - an XML language for describing business processes

    Cloud Computing - the provision of dynamically scalable and often virtualised resources as a service over the Internet. Cloud computing services often provide common business applications online that are accessed from a web browser, while the software and data are stored on the servers.

    Control Theory - a technique from engineering whereby one or more input variables are tracked by a controller in order to manipulate one or more output variables

    Decision Theory - a branch of AI concerned with decision making. In particular, decision theory addresses problems such as how to measure the outcome of a decision to ensure that it is optimal, and how to make decisions with incomplete knowledge (choice under uncertainty).

    E-Commerce Transaction - a business transaction that occurs over a network between two partners. The transaction is likely to consist of a number of discrete business processes that automatically engage other IT systems.

    ESB (Enterprise Service Bus) - a layer of abstraction on top of a messaging service stack that supports Web services standards, synchronous and asynchronous messaging patterns, content-based routing, rules-based content filtering or enrichment, XML transformation services, and standards-based adapters (such as JCA, JMS)

    Event-driven architecture (EDA) - a software architecture pattern promoting the production, detection, consumption of, and reaction to events. Event-driven architecture can complement service-oriented architecture (SOA) because services can be activated by triggers fired on incoming events.

    Fuzzy Logic - a form of multi-valued logic derived from fuzzy set theory to deal with reasoning that is approximate rather than precise

    Genetic Algorithm - a genetic algorithm (GA) is a search technique used in computing to find exact or approximate solutions to optimization and search problems. Genetic algorithms are categorized as global search heuristics. Genetic algorithms are a particular class of evolutionary algorithms (also known as evolutionary computation) that use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover (also called recombination).

    Grid Computing - an emerging architecture whereby many networked computers are used to parallel process work by packaging the work up into many small jobs

    J2EE - a multi-platform framework that provides software developers with a huge number of pre-coded solutions for common tasks in the Java language

    Kendall Notation - a system for describing the characteristics of a queuing system. Letters are used to describe the shape of a distribution: M for Markovian, G for general. The first letter defines the job interarrival distribution and the second letter describes the service time distribution. A number then gives the number of servers, so M/M/1 is a queue where job interarrival and service times have a Markovian distribution and there is a single server.

    Linear Programming - in mathematics, linear programming (LP) problems involve the optimization of a linear objective function, subject to linear equality and inequality constraints

    Mean Value Analysis (MVA) - a technique for analysing closed multichain queuing networks

    .NET - a framework for the Windows operating system that provides software developers with a huge number of pre-coded solutions for common tasks. It supports development in multiple languages but C#.NET and VB.NET are the most popular.

    PI Controller - in control theory, a controller with both proportional and integral feedback control. It is popular because it can have a nonzero constant value under steady-state conditions even when the error signal is zero.

    Queuing Theory - the mathematical analysis of queues

    QoS - Quality of Service, in a software application sense, refers to non-functional attributes such as response time, availability, and reliability. QoS attributes are used to provide measurable constraints in an SLA.

    SLA - a Service Level Agreement is a formal contract between an IT service provider and a service consumer

    Service-oriented Architecture (SOA) - "a style of multi-tier computing that helps organizations share logic and data among multiple applications and usage modes." [Natis and Schulte, 1996]

    SOAP - a protocol for exchanging XML-based messages between software components

    Software as a Service (SaaS) - an element of Cloud Computing, SaaS is a model of software deployment whereby a provider licenses an application to customers for use as a service on demand. SaaS software vendors may host the application on their own web servers or download the application to the consumer device, disabling it after use or after the on-demand contract expires.

    UDDI - an XML-based registry for listing the WSDL and URLs of Web services

    UML - Unified Modelling Language

    Utility Functions - utility is a measure of preference, expressed through utility functions. Utility functions assign numbers to members of a choice set in order to rank the choices.

    URL (URI) - a unique identifier for the location of a Web service, application or site (on a corporate network or on the Internet)

    Web Service - a Web service is commonly defined as a software service that uses WSDL to define its interface and SOAP envelopes for message exchange

    Workflow - a business process implemented as a composite Web Service comprised of a number of steps, each consuming finer-grained Web Services

    Workload - in e-commerce terms, this is the rate at which requests are made to system resources

    WSDL - an XML format for describing the public interface of Web Services
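
    As an illustration of the Mean Value Analysis entry in this glossary, the following is a minimal sketch of the textbook exact MVA recursion. It is not the algorithm as implemented in this thesis: it assumes a single job class (the thesis refers to multichain networks), load-independent queueing stations, and an optional client think time. The function name `exact_mva` and its parameters are hypothetical and chosen only for this example.

```python
from typing import List, Tuple


def exact_mva(demands: List[float], n_customers: int,
              think_time: float = 0.0) -> Tuple[float, float, List[float]]:
    """Exact single-class MVA for a closed queueing network (illustrative sketch).

    demands[k] is the service demand at station k in seconds
    (visit ratio multiplied by mean service time). All stations are
    assumed to be load-independent queueing stations.
    Returns (throughput, response_time, mean_queue_lengths).
    """
    # With zero customers in the network every queue is empty.
    queue = [0.0] * len(demands)
    throughput = 0.0
    response = 0.0
    for n in range(1, n_customers + 1):
        # Arrival theorem: an arriving job sees the queue lengths of the
        # same network with one customer fewer.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        # Little's law applied to the whole network, including think time.
        throughput = n / (think_time + response)
        # Little's law applied to each station.
        queue = [throughput * r for r in residence]
    return throughput, response, queue


if __name__ == "__main__":
    # Hypothetical three-tier demands in seconds (web, application, database).
    x, r, q = exact_mva([0.010, 0.025, 0.040], n_customers=20, think_time=1.0)
    print(f"throughput={x:.2f} jobs/s, response time={r:.3f} s")
```

    Under these assumptions, throughput is bounded by the reciprocal of the largest service demand, so adding customers beyond the saturation point mainly increases response time.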

    Chapter 1

    Introduction

    It is commonplace for organisations to automate complex business processes using service-oriented architectures (SOA) [Erl, 2008]. A service-oriented architecture is a distributed architecture that models components as services. It is built upon a collection of open standards, including the Web Services Description Language (WSDL) [W3C, 2001], SOAP [W3C, 2007], WS-Security [OASIS, 2006a], WS-Policy [IBM, 2006], WS-ReliableMessaging [OASIS, 2006b] and the Business Process Execution Language (BPEL) [OASIS, 2007].

    A SOA encourages enterprise application integration and composite application development because it is intrinsically loosely coupled [Erl, 2008]. For example, Web Services can provide wrappers to applications built on legacy systems, allowing the functionality of those legacy applications to be integrated with new functionality and collectively delivered via a single portal (Figure 1.1).

    Figure 1.1 Service-oriented application integration. The legacy functionality of back-end systems is exposed via Web Services. An integration layer provides business process orchestration and the portal layer provides the user interface.

    This type of composite application built using Web Services not only helps companies maximise their investment in legacy systems, but also helps streamline business processes. The business and technical benefits of such applications collectively offer what is often called a virtual enterprise or virtual organisation [Khoshafian, 2002] (Figure 1.2).

    [Figure 1.2 diagram. Recoverable labels: Service Consumer; UDDI; Internet; Intranet; In-house services;
    CRM; ERP; Legacy Apps; Service Providers; Enterprise Application Integration; B2B Integration via
    On Demand Service Providers; SOAP, WS-Addressing; BPEL / OWL-S; WSDL, WS-Policy;
    WS-Transaction, WS-Security.]

    Figure 1.2 A Virtual Enterprise (adapted from [Khoshafian, 2002]). A service-oriented architecture can
    flexibly integrate applications, functionality and data not only across legacy applications on the
    organisation's own intranet, but also across enterprise boundaries to consume external third-party
    services. Furthermore, the organisation can expose its own composite services to its own clients.

    Cloud Computing allows SOAs to reach out across the globe, consuming software services from
    around the world, a concept usually referred to as Software as a Service (SaaS) [Lakshmanan,
    2009].

    The ability to automatically discover services, compose those services into a business process and
    invoke them as part of a workflow in an on-demand fashion opens up some of the most exciting
    features of dynamic e-business. The Universal Description, Discovery and Integration (UDDI) initiative

    Managing Overload Conditions: It is a common scenario when providing workflows to multiple
    clients that resources can become overloaded. Under these conditions it would be useful to be
    able to selectively admit or reject requests from clients based on some criteria that maximise
    the provider's profits or business objectives.

    Performance Prediction: How can we predict the effect on performance of replacing one Web
    Service with another one of equivalent functionality, or of dynamically changing the steps in a
    workflow?

    Performance Improvement: How can an organisation improve the performance of that application
    through effective use and exploitation of the resources available to it?

    Service Level Agreement (SLA) Management: How can we define and manage SLAs for performance
    metrics in composite applications?

    SLA Strategies: What strategies are there for improving our ability to meet SLA performance
    targets in a composite application?

    1.2 Hypothesis

    This thesis examines the following hypothesis:

    There exist solutions and strategies that will allow providers of composite applications built using

    Web Services to manage the QoS metrics of that application in such a way that they can ensure they

    meet SLA targets containing those metrics.

    We make no attempt to determine the best solutions in this thesis as the scope of the work involved

    would be too broad for a thesis. Instead we attempt to provide evidence that such solutions exist.

    As suggested by the Software Engineering Institute, this is itself extremely valuable. Within the

    financial services industry, in which the thesis author has worked since 1997, there is widespread

    belief among system management professionals that these are potentially intractable problems.

    This thesis aims to demonstrate that solutions do exist.

    1.3 Methodology

    It is the aim of this research to explore how the scenarios outlined above could be addressed at the

    application level through the construction of QoS-aware software components that could be offered

    as a generic management service in a typical composite application built using Web Services.

    Fundamental to this effort are two major pieces of work: firstly, the creation of a model for

    measuring and predicting the essential QoS metrics: response time and throughput, and secondly

    the development of a methodology for efficiently solving the optimization problem that results.

    We will use a qualitative, empirical approach for the major pieces of software engineering involved

    in which we will attempt to apply candidate solutions to a real insurance application built using Web

    Services. In electing to use this application we have chosen to follow an exploratory case-study

    methodology. The results of exploratory research such as this are not useful for decision-making by

    themselves, but they can provide significant insight into a given situation and this is therefore

    considered a good approach to address our hypothesis. We also believe that the composite

    application used in the study is very typical of a general class of applications used in the insurance

    and financial services industries. This view is based on the thesis author's many years' experience

    working as a technical architect in this sector. In selecting the software engineering aspects of this

    thesis we are attempting to generate ideas for a design space and evaluate our design choices

    through prototyping the proposed design solutions in real use with actual components of the case-

    study application.

    1.4 Contributions

    This thesis makes the following main contributions to the subject:

    1. This is the first time that detailed test results have been published to prove that the Mean

    Value Analysis (MVA) algorithm can be applied to a queuing network description of a

    composite application using Web Services.

    2. This thesis is the first published work to use MVA as the fitness function for a Genetic
    Algorithm (GA).

    3. This thesis is the first published work to apply a GA to dynamic run-time QoS management

    of a real insurance application implemented as a set of Web Services across multiple

    servers. Previous published work on using GAs to optimize service composition has

    restricted itself to numerical simulations.

    4. This thesis demonstrates strategies for meeting QoS targets under a number of different

    real-life overload conditions.

    5. This thesis discusses improvements that could be made to existing SLA design

    methodologies and SLA languages to incorporate QoS metrics for composite applications.

    1.5 Thesis Roadmap

    Chapter 2 provides a background discussion to the issues of QoS in service-oriented architectures as

    well as introducing related work in the field to the two main components of the thesis: the model

    and the optimization methodology.

    In order to derive and use a model for a composite application using Web Services we need to

    undertake the following steps:

    1. Define the performance requirements of the composite application.

    2. Model the business demand of the composite application.

    3. Build a performance model by characterising the workload of real systems.

    Chapter 3 describes these steps as applied to a real insurance application and shows how a
    performance model was developed based on queuing network theory. The chapter then shows how

    the model can be used for adaptive control of a composite application using Web Services by

    addressing some simple performance prediction problems.

    In Chapter 4, a Genetic Algorithm is introduced for performance management of composite
    applications using Web Services. The key aspects of the GA are the chromosome encoding and the

    crossover strategy. It is shown how the GA can optimize the overall response times of workflows

    using the MVA model as a fitness function to identify whether the workflow suggested by each

    chromosome will meet the QoS targets defined.

    In Chapter 5 we demonstrate from an architectural perspective how the models and optimization

    techniques introduced in this thesis can be applied to workflows for enterprise applications built

    using Service-oriented architectures that extend beyond the local enterprise and consume third-

    party services in the Cloud.

    Finally, in Chapter 6 we review the most recent proposals for SLA management of composite

    applications and identify areas where these proposals could be extended to include provision for the

    adaptive strategies described in the previous chapters.

    Chapter 2

    Analysis of the Problem

    The Software Engineering Institute reported in their review paper on Service Level Agreements in

    Service-Oriented Architectures that one of the most important areas for further research is the need

    to understand and determine the QoS of composite services [Bianco et. al. 2008]. The problem
    has also been raised with respect to applications built using services in the Cloud by Panzieri et. al.
    [2010], who observe that QoS in clouds is not sufficiently investigated as yet but that there is
    growing interest in both industry and academia.

    In terms of the SOA methodology, composition of services allows the business to realize flexibility,

    reusability and adaptability of its software assets. However, the application must still meet such

    important QoS attributes as performance. Since the components may be provided by multiple

    stakeholders and the configuration could change at run-time these are important additional issues

    to consider over a more traditional distributed architecture.

    Menasce [2002] first highlighted the need for a QoS definition in Web Services and identified the

    need to take into consideration both the needs of the service provider and the service consumer.

    QoS requirements for Web Services include the following [Yu et. al., 2007]: Performance, Reliability,

    Scalability, Transactions, Capacity, Accuracy and Integrity, Regulatory, Availability, Interoperability

    and Security.

    Performance:

    Service time is the length of time a service takes to provide a response to various
    types of requests [Bhoj et al, 2000; Chandrasekaran et al, 2002; Menasce, 2002;
    Agarwal et al, 2005].

    Response time is the total time required to complete a service request [Mani and

    Nagarajan, 2002; Papazoglou and Georgakopoulos, 2003; Looker et al, 2004;

    D'Ambrogio, 2006].

    Reliability refers to the capability of maintaining the service and service quality [Jin et al,

    2002; Silver et al, 2003; Cardoso et al, 2004; Burstein et al, 2005].

    Security refers to authentication mechanisms, message encryption and access control,
    confidentiality, non-repudiation and resilience to denial-of-service attacks [Sahai et al, 2002;
    Ran, 2003; Wang et al, 2004; D'Ambrogio, 2006].

    Accessibility refers to the capability of satisfying a web service request [Gu et al, 2002; Mani

    and Nagarajan, 2002; Looker et al, 2004; Mathijssen, 2005].

    Transactions relates typically to properties such as the transactional durability and
    consistency of results [Mani and Nagarajan, 2002; Menasce, 2002; Ran, 2003; Schmit and
    Dustdar, 2005].

    Capacity is the maximum number of concurrent requests that a server can process while
    guaranteeing performance, or the number of concurrent connections that is permitted by the
    service [Al-Ali et al, 2002; Ran, 2003; Mathijssen, 2005].

    Accuracy and Integrity refers to the maintaining of correct and consistent interaction [Mani

    and Nagarajan, 2002; Papazoglou and Georgakopoulos, 2003; Looker et al, 2004].

    Regulatory refers to the conformance and compliance to the rules, laws, standards and

    specifications [Mani and Nagarajan, 2002; Ran, 2003; Looker et al, 2004].

    Availability is the time as a percentage that the composite application is available to service

    requests [Hu et. al. 2009].

    Interoperability is the ability of the composite application to interoperate with systems in a

    way that is agnostic of the platform they run on or the programming language used to write

    them.

    Many of these are now well addressed, for example, Security through WS-Security [OASIS, 2006a]

    and Interoperability [OASIS 2010]. Performance in the context of Composite Web Services remains a

    challenge, however [Dyachuk et. al. 2007]. This is reflected in the fact that within the SaaS (Software

    as a Service) industry many vendors, e.g. Amazon, only make SLA statements that cover availability

    and reliability. SLA assurances about performance metrics such as response times are not widely

    available. The thesis author raised this topic on the discussion forum of the SaaS group on the

    LinkedIn business networking site. Despite the fact that this group has almost 6000 members

    worldwide (as of December 2009) just two SaaS vendors voluntarily offered performance related

    SLA metrics for their services. Of these two companies only one (Intactt) publicly displays those

    figures on their website.

    Within the financial services industry, the author has noted through her work as a consultant that
    many companies recognise this as a problem without any readily available automation solution.
    Instead, the state of the art today is to monitor each individual resource in a composite application
    for its availability on a large monitor visually inspected by Help Desk staff, whilst performance
    metrics of individual resources are only analysed offline on a periodic basis (daily, weekly) from web
    logs. There is no published literature on this issue from these companies as the subject is, for
    obvious reasons, commercially sensitive.

    Where solutions exist to monitor performance metrics and to pro-actively take remedial action,

    these are based primarily on the use of redundant virtual machines. Lodi et. al. [2007] is an example
    of an approach using large-scale clustering of available Virtual Machines (VMs) and adaptive
    load-balancing that has been trialled on J2EE application servers. Two particular problems with this
    approach are:

    A large number of VMs may give rise to scalability problems in collateral subsystems (e.g. a
    shared database may become a bottleneck).

    VM allocation time may cause SLA violations.

    An alternative solution would be to make better use of the resources that are available. There are

    two aspects to this. Firstly, individual resources must be monitored to capture live performance
    metrics, and a model of the composite application must be used to understand the impact on the
    workflows being executed in real time. Secondly, this data must be used to automatically take
    remedial action where SLA targets are in danger of not being met. We attempt to address our hypothesis

    with a focus on these two pieces of software engineering.

    There have been many published strategies for modelling the performance of QoS of Composite

    Web Services, for example, through the use of integer programming [Cardoso, 2002; Zeng, et. al.,

    2004; Gao et. al., 2005; Kelly, 2003], as a multiple choice knapsack problem [Yu, et al. 2007],

    probability theory [Hwang et. al., 2007], event-driven rule-based programming [Zeng et. al., 2010],

    numerical simulation [Silver et. al., 2003], game theory [Esmaeilsabzali et. al., 2005], layered

    queuing network models [D'Ambrogio et. al., 2007], fuzzy logic [Lin et. al., 2005; Diao et. al. 2002b,

    2003], analytical models using queuing networks and hill-climbing algorithms [Menasce and Bennani

    2003], utility functions based on simple queuing networks [Pacifici et. al., 2003], approximate Mean

    Value Analysis [Menasce et. al., 2004], exact Mean Value Analysis [Urgaonkar et. al. 2005a], job

    scheduling [Urgaonkar and Shenoy 2005], control theory [Abdelzaher et. al. 2001; Lu et. al., 2002;

    Diao et. al., 2002a; Lu et. al., 2004; Wand et. al., 2004] and genetic algorithms [Canfora et. al. 2005;

    Zomaya and Teh 2001; Page and Naughton 2006; Canfora et. al. 2008]. There is no published work

    that attempts to compare and contrast these different approaches.

    Chapter 3

    Related Work

    3.1 Background

    Although there has been an enormous amount of published material regarding SOA, the quality of
    service aspects have only more recently been addressed [Bichler and Lin, 2006]. Furthermore, the
    typical software QoS challenges of any Web Service [Mani and Nagarajan, 2002], namely availability,
    integrity, performance (throughput and latency), reliability, interoperability, regulatory factors and
    security, are compounded in more advanced SOAs by their dynamic nature. Some of these issues are
    introduced below.

    3.1.1 Discovery and Negotiation

    For all but the simplest of agreements, some form of negotiation is required if services are to be

    consumed on demand [Stantchev et. al. 2009]. WS-Negotiation was proposed [Hung et. al., 2004]

    with the principal goals of describing a negotiation process and publishing an XML negotiation

    language through the use of Web Services architecture technologies.

    The proposed standard leaves the negotiation decision-making process to some internal algorithm

    that could be based on metrics such as price, service level objectives, or business policy. From the

    service provider's perspective, a decision must also take into consideration the resources that are

    available at the time. Many proposals have recently appeared regarding this complex problem.

    Suggested solutions include modelling the problem as a multi-constraint knapsack [Yu and Lin,

    2005], as a fuzzy constraint problem [Lin et. al., 2005] or using integer programming models [Gao

    et. al., 2005].

    3.1.2 Service Level Agreement

    The first attempt at describing a machine-readable (i.e. XML) language for specifying service-level
    agreements was IBM's Web Service Level Agreement (WSLA) language [Ludwig et. al., 2003]. The

    rationale behind the specification [Keller and Ludwig, 2003] was the desire to create a flexible but

    formal language that could be applied end-to-end. WSLA is still the most actively cited SLA

    specification [Patel et. al. 2009] for researchers in the area of SOA and Cloud Computing.

    Another commonly cited specification is a joint proposal put forward via the Global Grid Forum,
    entitled WS-Agreement [Andrieux et. al., 2007]. However, WS-Agreement's primary focus is the
    management of Grid architectures [Foster et. al., 2004]. It does not, therefore, directly address all of
    the needs of SLA definition from a business user's perspective. Recently, modifications to
    WS-Agreement were proposed to allow it to better model composite business services [Di
    Modica et. al. 2009].

    A much simpler approach is described by the Web Service Offerings Language (WSOL) [Tosic et. al.,
    2003]. The objective of WSOL is to create a series of classes of service in a standard format that
    would sit alongside a service's WSDL file. The WSOL descriptions act as advertisements for a

    matchmaking engine to examine. The service consumer, via their matchmaking engine, selects the

    offering that is most appropriate. The WSOL team suggest that classes of service could differ in

    terms of usage privileges, priorities or response times. The value of WSOL is that it vastly simplifies

    the negotiation process. Additionally, the authors suggest that the management infrastructure

    required to support WSOL is also much simplified [Tosic et. al., 2004]. Many other SLA languages

    have been suggested [Greiner and Rahm, 2004, Tian et. al., 2004, Sahai et. al., 2002 and Lamanna et.

    al., 2003], each with their own relative merits. More recently these initial attempts at SLA definition

    have been enhanced to cater for composite applications. Two such examples are MoDe4SLA

    [Bodenstaff et. al. 2008] and COSMA [Ludwig et. al. 2008]. These two specifications will be described

    in more detail in the following chapter.

    3.1.3 Service Provision

    Resource provisioning is one of the biggest challenges for a service provider in an on-demand

    environment. Amongst the considerations are the identity of the client, the service being requested,

    the SLA, business policy, the measurement service, specific service provisioning operations, and the

    concurrent activities of requests from other clients [Dan et. al., 2004]. The problem in an on-

    demand, e-commerce environment where requests are stochastic and the physical resources

    satisfying those requests behave non-linearly and are not only spread across multiple application

    tiers, but could also be distributed around the globe, is potentially a major exercise. Workflow

    management must be able to distinguish requests based on performance objectives [Dan et. al.,

    2003]. Service provisioning is at the heart of autonomic computing [IBM, 2003] and much of the

    recent research into these two fields is related. A primer on control theoretic techniques for
    resource management can be found in Diao et. al. [2004]. A detailed overview of the work that has
    been conducted in this area is provided in Section 3.2.

    3.1.4 Monitoring

    The monitoring of QoS metrics must consider what to measure, how to measure, who does it

    (service provider, consumer etc), and where the measurements are taken [Menasce, 2004]. The task

    of monitoring is implicit to the task of provisioning and it is assumed that a provider will need to

    have in place mechanisms for efficiently collecting and storing resource metrics as a basis for any

    adaptive provisioning. However, it is also in the consumer's interest to undertake monitoring

    activities: the question of trust is one issue, but perhaps more importantly, the consumer might also

    be acting as a composite service provider to someone else. In this scenario, the service consumer

    must not only monitor the quality of third-party services he/she consumes but must also monitor

    the quality of his/her own offering. Not only could monitoring become a fairly complex activity, it
    could in itself become a resource-intensive activity. To this end, it has been proposed that a new
    breed of service provider could become a reality: one that exists to provide monitoring services
    independently of both service provider and consumer, easing trust issues and limiting performance
    penalties [Benjamin et. al., 2004].

    An example of how monitoring and provisioning can be used to solve dynamic resource allocation

    problems is the WebQ framework developed by Patel et. al. [2004]. WebQ dynamically monitors

    QoS parameters from more than one provider of a given service. To begin with, the framework

    distributes requests equally across all of the providers. As the metrics database grows, the

    framework dynamically shifts load to the better performing services using a weighted algorithm.

    Since it continues to send some of its load to the slower services (these could, in fact, be test

    messages), the framework can re-adapt itself should the performance of the slower service improve

    again. The authors have used multi-level rule modelling in OWL-S to create a flexible framework that
    can manage complex QoS requirements involving a large number of parameters.
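
    The load-shifting behaviour described for WebQ can be illustrated with a minimal sketch. It is not
    the WebQ algorithm itself, whose weighting scheme is not reproduced here. The inverse-latency
    weighting, the floor parameter and the provider names are assumptions made purely for illustration.
    The floor keeps a small share of traffic flowing to slower providers so that a recovery in their
    performance can still be observed.

```python
import random


def selection_weights(mean_response_times, floor=0.05):
    """Map each provider to a selection probability.

    Faster providers (lower mean response time) get a larger share.
    `floor` is an assumed minimum share so that slow providers keep
    receiving some traffic and can be re-evaluated if they recover.
    """
    count = len(mean_response_times)
    if count == 0 or floor * count > 1.0:
        raise ValueError("need at least one provider and floor * count <= 1")
    inverse = {name: 1.0 / t for name, t in mean_response_times.items()}
    total = sum(inverse.values())
    spare = 1.0 - floor * count  # probability mass shared out by speed
    return {name: floor + spare * inv / total for name, inv in inverse.items()}


def choose_provider(weights, rng=random):
    """Pick one provider at random according to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical mean response times in seconds for two providers of one service.
weights = selection_weights({"provider_a": 0.1, "provider_b": 0.4})
```

    Recomputing the weights as new measurements arrive gives the adaptive behaviour; how often to
    recompute, and over what measurement window, is a design choice that this sketch leaves open.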

    3.2 Adaptive Control of Web Applications and Services

    Research into adaptive management has concentrated on two main modelling techniques: those

    using queuing theory [Kleinrock, 1976] and those using feedback control theory [Franklin et. al.,

    2002]. Combining the two, it is also possible to use queuing theory to derive the system model for a

    control theoretic approach.

    Derived from research into decision theory in artificial intelligence [Russell and Norvig, 2003],

    optimization problems can be approached using utility functions to mathematically model

    preference. Given a certain event, the action to take is determined by ranking all possible actions
    and choosing the action with the best expected outcome. An alternative is the use of genetic

    algorithms (GAs) [Holland, 1992]. GAs are adaptive techniques for search and optimization problems

    that were inspired by some of the processes involved in natural evolution and specifically the notion

    of survival of the fittest.
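
    To make the idea concrete, the following is a minimal GA sketch. It assumes a workflow in which
    each step can be served by one of several hypothetical candidate providers, encodes a candidate
    workflow as a list of provider indices (the chromosome) and uses the sum of assumed response
    times as the fitness value. The encoding, the parameter values and the fitness function are
    illustrative assumptions and are not taken from the cited work.

```python
import random


def total_response_time(chromosome, latencies):
    """Fitness: summed response time of the provider chosen per step (lower is better)."""
    return sum(latencies[step][gene] for step, gene in enumerate(chromosome))


def evolve(latencies, population_size=20, generations=50, mutation_rate=0.1, seed=0):
    """Search for a low-latency choice of provider for every workflow step."""
    rng = random.Random(seed)
    steps = len(latencies)
    population = [
        [rng.randrange(len(latencies[s])) for s in range(steps)]
        for _ in range(population_size)
    ]
    for _ in range(generations):
        # Selection: keep the fitter half of the population unchanged.
        population.sort(key=lambda c: total_response_time(c, latencies))
        survivors = population[: population_size // 2]
        children = []
        while len(survivors) + len(children) < population_size:
            mother, father = rng.sample(survivors, 2)
            # Single-point crossover between two surviving parents.
            cut = rng.randrange(1, steps) if steps > 1 else 0
            child = mother[:cut] + father[cut:]
            # Mutation: occasionally replace a gene with a random provider.
            for s in range(steps):
                if rng.random() < mutation_rate:
                    child[s] = rng.randrange(len(latencies[s]))
            children.append(child)
        population = survivors + children
    return min(population, key=lambda c: total_response_time(c, latencies))


# Hypothetical response times (seconds): three steps with 3, 2 and 3 candidates.
example = [[0.3, 0.1, 0.2], [0.5, 0.4], [0.2, 0.9, 0.1]]
best = evolve(example)
```

    Because the fitter half of each generation is carried over unchanged, the best fitness found never
    gets worse from one generation to the next. A GA of this kind does not guarantee the global optimum.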

    The goals of published work on adaptive management fall into one of four categories:

    Dynamically reconfigure available resources to optimise the throughput or response of the
    current workload. This is a common goal of queuing-related techniques and utility-driven
    approaches.

    Admission control through the rejection of excess requests. Ideally the controller should

    also attempt to only service the important ones, rather than rejecting requests at random.

    Both queuing and control theoretic techniques have used this approach.

    Dynamically provision idle resources, if these are available. Queuing models have been used

    to predict when to do this, based on a demand threshold being exceeded.

    Degrade the performance of admitted requests, possibly paying penalties to the client.

    Control theoretic techniques have been applied to providing relative guarantees between

    service classes, rather than absolute guarantees.

    3.3 Queuing Theory

    3.3.1 Dynamic Resource Configuration

    An example of the use of queuing theory for adaptive resource configuration is Welsh's SEDA

    architecture [Welsh et. al., 2001]. SEDA applications consist of event-driven stages connected by

    queues. Dynamic resource controllers keep stages within their operating regimes during load

    changes via thread pool management. A stage is a self-contained component consisting of an event

    handler, a simple, incoming event queue and a thread pool. The handler processes events and

    dispatches them onto successive stages. Multiple stages can share the same thread pool; hence the

    architecture can dynamically adjust the pool based on the load at each stage. SEDA can be found at

    the heart of the Mule [Mule, 2010] open source message bus. Dynamic thread adaptation was

    shown to be effective in dealing with bursty Internet traffic but had limitations when it came to

    dealing with overload conditions.

    Menasce and Bennani [2003] used an analytical performance model to design controllers that run

    periodically to determine the best current resource configuration of a web server given its current

    workload. The authors used a QoS controller that monitors system performance, including the
    utilisation of server resources, and periodically executes an algorithm to determine the
    appropriate reconfiguration commands. Data is collected from metrics such as CPU utilisation, which
    allows the current service demand to be calculated as the ratio of the resource utilisation to the
    system throughput. Mean Value Analysis (MVA) [Lazowska et. al., 1984] is used to create a model in

    which average response time, the probability of rejection, and average throughput can be predicted.

    In this particular paper, the network model is extremely simple, assuming only that an incoming

    request is serviced by one of m threads. When all m threads are busy, the request is rejected. A hill-

    climbing search algorithm is used to find a close-to-optimal configuration by constantly re-applying

    the algorithm to all possible configurations.
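
    The calculation of service demand from monitored data is the Service Demand Law, D = U / X,
    where U is the utilisation of a resource and X is the system throughput. A minimal sketch follows.
    The measurement values and resource names are hypothetical and serve only as an illustration.

```python
def service_demand(utilisation, throughput):
    """Service Demand Law: D = U / X.

    utilisation: fraction of time the resource is busy (0..1)
    throughput:  completed requests per second for the whole system
    returns:     seconds of this resource consumed per request
    """
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must be between 0 and 1")
    if throughput <= 0.0:
        raise ValueError("throughput must be positive")
    return utilisation / throughput


# Hypothetical monitoring sample: 40 requests/s, CPU 60% busy, disk 30% busy.
measured = {"cpu": 0.60, "disk": 0.30}
demands = {name: service_demand(u, 40.0) for name, u in measured.items()}
```

    With these assumed figures each request consumes 15 ms of CPU time and 7.5 ms of disk time.
    Demands obtained in this way are the inputs that an MVA model requires.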

    Pacifici et. al. [2003] extended this work to clusters of servers, supporting multiple classes of web traffic.
    The content of the inbound request's SOAP header is examined to determine its class of service and

    the server farm is partitioned into clusters, each one managing different classes of traffic. A utility

    function is defined for each class of traffic, which is simply a construct to weight the deviation of the

    actual response times from the desired response time. A combined utility function is also derived to

    calculate how to allocate tasks across all of the resources in the farm. In this work, a traffic class set

    is created for each tuple . In Kendall notation an M/M/1

    queuing model is used to predict the average response time. The dynamic model is used to allocate

    resources and dynamically load-balance work across the available resources.
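
    For an M/M/1 queue with arrival rate lambda and service rate mu, the mean response time is
    R = 1 / (mu - lambda), valid only while lambda < mu. The sketch below combines this with a simple
    per-class utility. The utility shown (weighted relative slack against the target) is an assumption
    chosen for illustration and is not the function defined in the cited work.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue: R = 1 / (mu - lambda)."""
    if arrival_rate < 0 or service_rate <= arrival_rate:
        raise ValueError("queue is only stable when 0 <= lambda < mu")
    return 1.0 / (service_rate - arrival_rate)


def class_utility(response_time, target, weight=1.0):
    """Assumed utility: weighted relative slack against the target.

    Positive when the class is faster than its target, negative when slower.
    """
    return weight * (target - response_time) / target


def combined_utility(classes):
    """Sum of per-class utilities; each entry is (lambda, mu, target, weight)."""
    return sum(
        class_utility(mm1_response_time(lam, mu), target, weight)
        for lam, mu, target, weight in classes
    )
```

    An allocator could evaluate the combined utility for each candidate assignment of capacity to
    traffic classes and prefer the assignment with the highest value.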

    3.3.2 Admission Control

    Menasce et. al. [2004a] used an analytical model to make real-time admission control decisions.

    Every time a new request is received the performance network model, using approximate MVA,

    solves a closed loop queuing network. An algorithm then determines whether the request can be

    serviced or not based on the current commitments and the possible solutions suggested by the

    model. Each client session is modelled as an individual class.
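
    An admission test of this kind can be sketched as follows. For brevity the sketch uses a single
    class of sessions rather than one class per session, and it uses Schweitzer's fixed-point
    approximation as the approximate MVA step. Both simplifications, together with the function
    names and parameters, are assumptions for illustration only.

```python
def approximate_response_time(demands, sessions, think_time=0.0,
                              tolerance=1e-6, max_iterations=10_000):
    """Schweitzer's approximate MVA for a single-class closed network.

    demands:  service demand in seconds at each queueing resource
    sessions: number of concurrent client sessions (N)
    returns:  predicted mean response time in seconds
    """
    if sessions <= 0:
        return 0.0
    queue = [sessions / len(demands)] * len(demands)  # initial guess
    response = sum(demands)
    for _ in range(max_iterations):
        # An arriving request is assumed to see (N - 1) / N of each mean queue.
        seen = (sessions - 1) / sessions
        residence = [d * (1.0 + seen * q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = sessions / (think_time + response)
        updated = [throughput * r for r in residence]
        converged = max(abs(a - b) for a, b in zip(updated, queue)) < tolerance
        queue = updated
        if converged:
            break
    return response


def admit(demands, active_sessions, sla_response_time, think_time=0.0):
    """Admit a new session only if the predicted response time still meets the SLA."""
    predicted = approximate_response_time(demands, active_sessions + 1, think_time)
    return predicted <= sla_response_time
```

    For example, with assumed demands of 50 ms and 30 ms, a think time of 1 s and an SLA target of
    200 ms, the first session is admitted, while a hundredth concurrent session would be rejected
    because the slower resource is saturated well before that load.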

    Urgaonkar and Shenoy [2005] discuss a policy mechanism that emphasises the need to ensure that

    the policy mechanism itself does not create a significant performance overhead. Requests are

    mapped to a service class and then scheduled either FIFO (first-in-first-out) or shortest job first.

    Requests of lower class are deliberately delayed. This prevents them from denying access to more

    important requests. Requests of a higher class are subject to the admission control tests first. If the

    highest class fails, there is obviously no need to test lower classes. Requests are admitted so long as

    the system believes it has sufficient capacity to meet the SLA. Furthermore, batching requests

    reduces policing overhead. Buckets are defined in each class, with a range of service times. All

    requests in a bucket are then treated as equal. When admission control is invoked it considers each
    non-empty bucket in the class it is testing and conducts an all-or-nothing test on those requests. A
    predictive technique is also used to further reduce overhead: the number of requests to admit can
    be pre-computed if the number of requests arriving in the next time interval can be estimated
    reliably.


    3.3.3 Dynamic Provisioning of Idle Resources

    Urgaonkar and Shenoy [2005] also use a G/G/1 queuing model in conjunction with online

    measurements to determine the need to replicate applications across idle, virtual servers if the

    number of requests gets so high that a threshold is breached. This threshold is simply the known

    bound on the job arrival rate of a G/G/1 queue [Kleinrock, 1976].
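
    The text above does not state the bound itself. As a sketch only, we assume here that it is the Kingman-style upper bound on the mean waiting time of a G/G/1 queue, W <= lambda (var_a + var_s) / (2 (1 - lambda s)), rearranged to give the highest arrival rate one server can sustain for a target response time d. The function and parameter names are hypothetical.

```python
import math


def max_arrival_rate(target_response, mean_service, var_interarrival, var_service):
    """Largest per-server arrival rate that keeps the bounded mean response
    time within target_response (assumes a Kingman-style G/G/1 bound)."""
    if target_response <= mean_service:
        raise ValueError("target response time must exceed the mean service time")
    slack = target_response - mean_service
    return 1.0 / (mean_service + (var_interarrival + var_service) / (2.0 * slack))


def servers_needed(total_rate, per_server_rate):
    """Number of replicas required to carry total_rate (at least one)."""
    return max(1, math.ceil(total_rate / per_server_rate))
```

    Comparing the measured arrival rate with this per-server limit gives a simple trigger for replicating the application onto an idle server.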

    3.3.4 Extending to Multiple Tiers

    Urgaonkar et. al. [2005a] have specifically considered the problems associated with tackling

    bottlenecks in a multi-tier distributed application. They show that independent per-tier provisioning

    is not sufficient as it can fail to capture the way in which bottlenecks can shift across tiers. In a

    related paper, Urgaonkar et. al. present a multi-tier model based on MVA in [2005b]. The model

    deals with scenarios such as a single request in the web tier spawning multiple tasks on the

    application tier through the use of closed-loops creating multiple visits to each resource. They also

    deal with long-lived sessions using an infinite queue at the front of the model, which also serves as

    the re-entry point for requests that have been completed, thus forming a completely closed-loop.

    This models think time at the client. The model uses an exact MVA algorithm. They suggest it can

    be extended in several ways: to deal with scenarios where service times increase with load, where

    resources are replicated on the same tier (load-balancing), for overload conditions at a given tier

    causing dropped requests and for multiple session classes, but provide no specific details. Liu et. al.

    [2005] developed an approximate MVA model for a three-tiered architecture that uses a multi-

    station queuing centre to model the ability of web servers to multi-thread incoming requests.
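
    For reference, the textbook exact MVA recursion for a closed, single-class network with think time can be sketched as follows. This is the standard algorithm, not the specific multi-tier model of the papers cited above; the service demand of each tier is taken as visits multiplied by service time, and all names are illustrative.

```python
def exact_mva(demands, think_time, customers):
    """Exact Mean Value Analysis for a closed single-class queuing network.

    demands    -- service demand per queuing station (visits * service time)
    think_time -- client think time (modelled as a delay station)
    customers  -- number of concurrent sessions in the closed loop
    Returns (throughput, response_time, queue_lengths).
    """
    queue = [0.0] * len(demands)
    throughput, response = 0.0, 0.0
    for n in range(1, customers + 1):
        # Residence time at each station seen by an arriving customer
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n / (think_time + response)
        # Little's law applied per station
        queue = [throughput * r for r in residence]
    return throughput, response, queue
```

    The loop adds one customer at a time, which is why the exact algorithm becomes expensive for large populations and approximate MVA is often preferred.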


    3.4 Control Theory

    3.4.1 Admission Control

    The first published work on the use of control theory for admission control appears to be

    Abdelzaher et. al. [Abdelzaher, 2001]. In this paper, the authors attempted to keep the utilisation of

    a web server at a fixed percentage where the web server was known to achieve optimum

    performance. A simple linear expression was derived relating the utilisation to the number of

    admitted job requests and the bandwidth of pages being served. A PI controller was used. Also using

    a PI controller, Kihl et. al. [2003] used an M/G/1 queue for their model using a non-linear

    approximation for the utilisation of the server expressed in terms of the number of requests in the

    system and the service time distribution.
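
    The idea of holding utilisation at a set point with a PI controller can be sketched as below. The linear plant (utilisation proportional to the admitted request rate), the gains and the set point of 0.7 are assumptions chosen for illustration; they are not the models or parameters used in the papers cited above.

```python
class PIController:
    """Discrete proportional-integral controller in positional form."""

    def __init__(self, kp, ki, setpoint):
        self.kp = kp
        self.ki = ki
        self.setpoint = setpoint
        self.integral = 0.0

    def update(self, measured):
        error = self.setpoint - measured
        self.integral += error
        return self.kp * error + self.ki * self.integral


def simulate(steps=50, gain=0.01):
    """Drive a hypothetical linear server model towards 70% utilisation."""
    controller = PIController(kp=20.0, ki=40.0, setpoint=0.7)
    admitted = 0.0      # admitted request rate (control output)
    utilisation = 0.0   # measured utilisation (controlled variable)
    for _ in range(steps):
        utilisation = gain * admitted  # assumed plant: utilisation = gain * rate
        admitted = max(0.0, controller.update(utilisation))
    return utilisation
```

    With these gains the error decays geometrically, so the simulated utilisation settles at the set point within a few tens of control intervals.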

    Lu et. al. [2002] presented an approach using two SISO (single-input single-output) controllers. The

    controlled variables were the deadline miss ratio and the CPU utilisation. The adaptive system is

    characterised in terms of the following performance metrics: stability (the miss ratio and utilisation

    are bounded at all times), transient state response (overshoot and settling time), steady-state error

    and sensitivity to workload variations.

    As an alternative to using multiple SISO controllers, Diao et. al. [2002a] constructed a true MIMO

    (multiple-input multiple-output) controller. They controlled the Keep Alive and Max Clients

    parameters of an Apache Web Server in order to optimise its CPU and memory utilisation. They

    conclude that MIMO design techniques such as the Linear Quadratic Regulator, LQR [Franklin et. al.,

    2002], are beneficial for balancing design trade-offs.

    3.4.2 Degraded Service

    An alternative approach is to degrade the service levels of admitted requests. Instead of offering

    customers absolute delay guarantees, Lu et. al. [2001] describe an approach that offers a


    differentiated service. Only the relative delay between two service classes is guaranteed, e.g. the

    ratio of gold response time to silver response time. They point out that under conditions of heavy

    load many of the reported approaches will only ensure better service to premium customers, but do

    not provide any guarantees as to how much better the service will be. Their proportional delay

    model specifies a fixed ratio between the delays seen by each service class. They also introduce a

    hybrid policy, one that uses proportional delay in normal operating conditions and switches to

    absolute delay under very heavy load. This is because extreme load could lead to very long response

    times even for the premium customers if the target is simply to maintain a fixed proportional delay.

    They show how the relative delays can be used as the control variable for a proportional feedback

    controller.

    3.4.3 Extending to Multiple Tiers

    Lu et. al. [2004] extended control theoretic techniques to multi-tier distributed systems. Their paper

    presents the EUCON (End-to-end Utilization CONtrol) algorithm, which adaptively manages CPU

    utilisation using feedback control and a MIMO model predictive controller. They point out that most

    papers on feedback control methods assume a single CPU operating on a single task while most

    applications consist of tasks spawning multiple other tasks and are deployed on multi-CPU

    platforms. The performance of one task is coupled to the performance of other tasks. Changing the

    rate of one task affects the utilisation of dependent tasks on the processors that they are using. This

    paper derived a dynamic model to capture coupling amongst processors, developed a model

    predictive controller approach for QoS control and designed a distributed MIMO feedback control

    loop. When the number of servers is large, the overhead of a centralised controller could become

    significant. For this reason, an enhanced version, DEUCON, was presented in [Wand et. al., 2004].

    This is the distributed controller version of EUCON. A peer-to-peer control structure and localised


    utilisation control algorithm are used based on distributed model predictive controller theory where

    a controller for each CPU cooperates only with local neighbours, i.e. only those that are executing

    sub-tasks. Simulation results show that the overhead compared to a centralized solution is much

    lower.

    3.4.4 Fuzzy Controllers

    One of the major limitations of control theoretic approaches is the need to derive a suitable model

    of the system. Diao et. al. [2002b, 2003] demonstrated that fuzzy controllers offer significant

    advantages. They defined a set of simple business related metrics to describe revenue, cost

    (penalty), and profit and then adapted their MIMO controller [2002a] to use a set of fuzzy rules,

    such as:

    IF change_in_MaxUsers IS neglarge AND change_in_profit IS neglarge THEN

    next_change_in_MaxUsers IS poslarge.
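
    To make the form of such a rule concrete, the sketch below evaluates its firing strength. The ramp-shaped membership function, the normalisation of inputs to the range -1 to 1 and the use of min for AND are common fuzzy-logic conventions that we assume here; the cited papers may use different membership functions.

```python
def neg_large(x):
    """Membership of x in 'neglarge': 1 for x <= -1, 0 for x >= 0,
    linear in between (assumed shape, inputs normalised to [-1, 1])."""
    return min(1.0, max(0.0, -x))


def rule_strength(change_in_max_users, change_in_profit):
    """Firing strength of:
    IF change_in_MaxUsers IS neglarge AND change_in_profit IS neglarge
    THEN next_change_in_MaxUsers IS poslarge.
    AND is taken as the minimum of the two memberships."""
    return min(neg_large(change_in_max_users), neg_large(change_in_profit))


def next_change(change_in_max_users, change_in_profit, pos_large_step=1.0):
    """Crude defuzzification: scale a 'poslarge' step by the firing strength."""
    return rule_strength(change_in_max_users, change_in_profit) * pos_large_step
```

    A complete controller would combine several such rules; the point here is only that no explicit system model is needed to write them down.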

    They show that a PI controller achieves better results in the region of the workload for which it was

    designed, but for all other workloads, the fuzzy controller outperforms it. It is frequently the case, in

    any engineering discipline, that the derivation of a suitable model [Franklin et. al., 2002] can be a

    challenging task. In the case of complex distributed IT systems, the results presented demonstrate

    that it is particularly challenging. The conclusion is that the less rigorous demands of deriving a fuzzy

    model mean that this could be an extremely valuable approach.

    3.5 Combined Approaches

    Although control theory has been successfully used to provide improvements in the throughput and

    response times of web applications, the technique is limited due to the highly non-linear behaviour

    of such systems. Queuing models, on the other hand, are very good at modelling these systems due to

    their statistical approach. Liu et. al. [2006] applied a simple queuing model to an adaptive control


    algorithm [Astrom and Wittenmark, 1994] to demonstrate its applicability as an admission control mechanism in

    an overloaded web site. They compare their technique with three other approaches:

    A queuing model only

    Adaptive control only

    A queuing model with a PI controller (an approach proposed by Kamra et. al. [2004]).

    They show how their approach provided the smallest difference between the target response time

    and the actual response time. Their controller did not exploit their previous work using MVA [Liu et.

    al., 2005] and they express their intent to extend it with this in mind.

    3.6 Solving Optimization Problems

    3.6.1 Utility Functions

    Utility functions have been commonly used in artificial intelligence (AI) as a means of expressing

    preference. Recent research has begun to explore their application to self-optimisation problems in

    autonomic computing [Walsh et. al., 2004]. Utility is the measure of the desirability of an outcome.

    It is usually measured in terms of the cost, benefit or risk of an action. A utility function assigns a

    cardinal number to the desirability of an outcome and can depend on one or more dimensions.

    These could be related to business level objectives as well as service level objectives. Expected utility

    is the combined utility of combinations of actions. By defining the optimal decision to be when the

    maximum expected utility is achieved, the regret of a decision can be defined as the difference

    between the maximum expected utility and the actual expected utility. A common AI algorithm

    known as minimax attempts to minimise the maximum possible regret [Wang and Boutilier, 2003].

    One of the key advantages of this approach is that decisions can be taken in the absence of a

    complete description of the constraints that define a utility function [Boutilier et. al., 2004]. The


    methodology has been demonstrated in an autonomic, self-optimising application architecture at

    IBM [Tesauro, et. al., 2004].
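
    The notion of regret and the minimax rule described above can be illustrated with a small decision table. In this sketch each action has one utility value per scenario (for example, per candidate utility function); the data layout and names are assumptions made for illustration.

```python
def minimax_regret(utilities):
    """Choose the action whose worst-case regret is smallest.

    utilities -- dict mapping action name to a list of utilities,
                 one entry per scenario (all lists the same length).
    Returns (chosen_action, max_regret_per_action).
    """
    actions = list(utilities)
    scenarios = range(len(utilities[actions[0]]))
    # Best achievable utility in each scenario
    best = [max(utilities[a][s] for a in actions) for s in scenarios]
    # Regret = best achievable utility minus the utility actually obtained
    max_regret = {
        a: max(best[s] - utilities[a][s] for s in scenarios) for a in actions
    }
    choice = min(actions, key=lambda a: max_regret[a])
    return choice, max_regret
```

    The chosen action need not be the best in any single scenario; it is the one that can never turn out to be very wrong.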

    3.6.2 Integer Linear Programming

    Several papers have been published that propose the use of Integer Linear Programming (IP)

    methods for web service composition and resource allocation problems [see for example Gao et. al.,

    2005 and Kelly, 2003]. In terms of the current discussion, the most relevant is the work of Zeng et. al.

    [2004] who have applied IP to the problem of finding an optimal execution plan for a sequence of

    tasks in a composite web service. IP problems are a form of linear programming where the variables

    are integers (usually 0 and 1). In this case, the variables are 1 if a service x can execute task y and 0

    otherwise. The objective function is a linear weighted calculation of the QoS using parameters such

    as price, availability, service time etc. IP attempts to maximize or minimize the value of the objective

    function by adjusting the values of the variables while enforcing any known constraints. The output

    of an IP problem is the maximum (or minimum) value of the objective function and the values of

    variables at this maximum (minimum). As the authors discuss, though, IP has a large computation

    cost, especially as the number of services and tasks increases, because IP problems are generally

    NP-hard.
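
    The structure of this selection problem can be shown with a small sketch. Note that it deliberately swaps the IP solver for exhaustive enumeration, which is only feasible for tiny instances and makes the computational cost visible; the weights, the single time constraint and the data layout are illustrative assumptions rather than the formulation of Zeng et. al.

```python
from itertools import product


def best_plan(candidates, weights, max_total_time):
    """Pick one service per task, minimising a weighted sum of price and time.

    candidates     -- dict: task -> dict: service -> (price, time)
    weights        -- dict with keys "price" and "time"
    max_total_time -- constraint on the summed service time of the plan
    Returns (plan, score), where plan maps each task to a service,
    or (None, inf) if no plan satisfies the constraint.
    """
    tasks = list(candidates)
    best, best_score = None, float("inf")
    # Each combination corresponds to one setting of the 0/1 variables
    for choice in product(*(candidates[t] for t in tasks)):
        price = sum(candidates[t][s][0] for t, s in zip(tasks, choice))
        time = sum(candidates[t][s][1] for t, s in zip(tasks, choice))
        if time > max_total_time:
            continue  # violates the constraint
        score = weights["price"] * price + weights["time"] * time
        if score < best_score:
            best, best_score = dict(zip(tasks, choice)), score
    return best, best_score
```

    The number of combinations is the product of the number of candidate services per task, which grows exponentially with the number of tasks.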

    3.6.3 Genetic Algorithms

    Whilst genetic algorithms (GAs) have been applied to multiple and diverse applications [Goldberg, 1989], their use in

    software systems optimization problems is currently quite limited, focussing primarily on the rather

    different problem of Job Shop Scheduling [see for example Mahmood, 2000, Fayad and Petrovic,

    2005, Petrovic and Fayad, 2005, Montana et. al., 1998, Wang et. al. 1997]. An exception is the work

    of Canfora et. al. [2005] who attempt to solve an optimization problem for a set of Web Services

    comprising a complex workflow using a GA. To evaluate the fitness of their solution, they use a


    weighted combination of parameters including Availability, Response Time, Cost and Reliability,

    although there is no discussion of how these parameters might be evaluated in real-time, based on

    real measurements. Using a numerical simulation, they do, however, provide an interesting

    comparison with the Integer Programming method of Zeng et. al. described above, and demonstrate

    that the GA provides a faster solution as the number of Web Services increases.

    In a related paper [Canfora et. al., 2004] the authors take a step towards considering the dynamic use of

    GAs for service composition by discussing the question of service re-planning and propose adding a

    trigger to the workflow engine to re-evaluate the optimum service composition.

    A similar piece of work was presented by Jaeger and Mühl [2007] who also use numerical simulation

    to describe the effectiveness of GAs in this problem domain. They provide detailed results

    comparing the impact of different parameters (e.g. mutation rate, fitness function) on the

    optimisation capability of the genetic algorithm.
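
    A minimal GA for this kind of service selection problem might look like the sketch below. The chromosome holds one candidate index per task, and the fitness is a weighted sum of time and cost to be minimised. The operators (one-point crossover, uniform mutation, elitism) and all parameter values are generic choices assumed for illustration; they are not those of Canfora et. al. or Jaeger and Mühl.

```python
import random


def fitness(assignment, candidates, w_time=1.0, w_cost=1.0):
    """Weighted sum of (time, cost) over the chosen services; lower is better."""
    return sum(
        w_time * candidates[t][s][0] + w_cost * candidates[t][s][1]
        for t, s in enumerate(assignment)
    )


def run_ga(candidates, generations=30, pop_size=20, mutation_rate=0.1, seed=0):
    """candidates[t] is the list of (time, cost) options for task t.
    Returns (best_assignment, best_fitness_per_generation)."""
    rng = random.Random(seed)
    n = len(candidates)
    population = [
        [rng.randrange(len(candidates[t])) for t in range(n)]
        for _ in range(pop_size)
    ]
    history = []
    for _ in range(generations):
        population.sort(key=lambda ind: fitness(ind, candidates))
        history.append(fitness(population[0], candidates))
        next_gen = [population[0][:]]          # elitism: keep the best
        parents = population[: pop_size // 2]  # truncation selection
        while len(next_gen) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]          # one-point crossover
            for t in range(n):
                if rng.random() < mutation_rate:
                    child[t] = rng.randrange(len(candidates[t]))
            next_gen.append(child)
        population = next_gen
    best = min(population, key=lambda ind: fitness(ind, candidates))
    history.append(fitness(best, candidates))
    return best, history
```

    Because the best individual is always carried forward, the recorded best fitness never gets worse from one generation to the next; parameters such as the mutation rate only affect how quickly good solutions are found.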

    From the world of task scheduling two papers provide useful background into the problem of using a

    GA in a dynamic scenario. Zomaya and Teh [2001] used a cycle crossover to load balance a discrete

    set of tasks over a set of resources, whilst Page and Naughton [2006] added a heuristic on the

    mutation operator to improve the performance.

    3.7 Concluding Remarks

    In conclusion, there is a large body of previous work to draw inspiration from. Approaches using

    control theory appear to be difficult to apply to distributed systems, as is evidenced by the

    complexity of the DEUCON model [Wand et. al., 2004]. Statistical approaches using queuing theory

    have proven successful in related disciplines such as network performance analysis and

    telecommunications queuing. For this reason, the queuing model approach is used in this thesis.


    In terms of choosing an optimization strategy, the discussion has focussed on general search and

    optimization techniques rather than heuristic methods confining themselves to a narrow domain.

    The results of Canfora et. al. [2005] suggest that GAs offer a promising candidate, particularly

    compared with Integer Programming. GAs are traditionally strong in problem spaces where heuristic

    approaches are too complex to be practical. Weise et al [2007], in a review of web service

    composition challenges, conclude that "especially in practical applications, additional requirements

    will be imposed onto a service composition engine. Such requirements could include quality of

    service (QoS) ... or the generation of complete BPEL processes ... In this case, heuristic search will

    most probably become insufficient but genetic algorithms and genetic programming will still be able

    to deliver good results".

    3.8 Publications

    Parts of this chapter were published in the Software Quality Journal:

    Shelly Saunders, Margaret Ross, Geoff Staples, and Sean Wellington, 2006. The Software Quality

    Challenges of Service Oriented Architectures in e-Commerce, Software Quality Journal, 14 (1),

    65-76, March 2006.

    This article has been cited at least 11 times by the start of 2011.

    It was also presented at SQM 2005 conference as:

    Shelly Saunders, Margaret Ross, Geoff Staples, and Sean Wellington, 2005. The Software Quality

    Challenges of Service Oriented Architectures in e-Commerce, In: Current Issues in Software

    Quality, Thirteenth International Conference on Software Quality Management (SQM 2005),

    pp87-100.


    [Figure: diagram relating Application (KQI), Service Performance Indicators (KPI), Monitoring Instrumentation, Service Level Agreement and Service Level Monitoring]

    Figure 4.1 Service Level Management

    Key Quality Indicators (KQI) of the application are derived from the performance metrics of the

    underlying composite services. These performance metrics are known as Key Performance

    Indicators (KPI). For each service these will be obtained from monitoring instrumentation which is

    the core of the whole process.

    4.1.1 Service Monitoring

    Many Cloud vendors who offer Web Services that can be composed into composite enterprise

    applications are only just beginning to provide actual data about the performance of their services.


    Even where this data is provided, the question of trust will always be an issue. Furthermore, the

    complete round trip time of a particular service from a particular vendor is also dependent on

    multiple elements lying between the composite application and the service itself, including ISPs'

    hardware, communication links etc. For these reasons, we propose that even if all services provide

    their own performance metrics, service monitoring is also performed centrally. In our work we have

    used the Enterprise Service Bus to achieve this. Apart from raw data obtained from live monitoring,

    we can also use information from service registries, test data, SLA statements on contracted

    QoS values and feedback from other service consumers. However, the most weight should be put on

    the service execution history data as the most reliable source of information. This process has been

    termed service profiling [Abramowicz et. al., 2006].

    4.1.2 Key Quality Indicators and Key Performance Indicators

    Whilst there are many QoS metrics applicable to services [Kritikos and Plexousakis, 2009] the Key

    Quality Indicator of interest to this thesis is primarily the end-to-end time required to execute a

    particular workflow. This KQI can be mapped directly into the SLA. The KQI is derived from KPIs of

    each service used by that workflow. The KPIs we are interested in are the execution times of each

    service call. For differentiated services based on priority sessions, we would also be interested in the

    cost of each service as another KPI. Further, from a business intelligence perspective, understanding

    the costs involved in operating a composite service is also very desirable.

    Once KPIs are defined, we can generate our end-to-end workflow execution time KQIs from the KPIs

    for example using the techniques described by Menasce [2004]. In the example of Figure 4.2, Service

    A invokes B with probability p1 and it invokes C with probability p2 = 1 - p1.


    Likewise C invokes D with probability p3 and it invokes E with probability p4 = 1 - p3. Finally F is

    invoked when either D or E finish, or when B finishes. In this example the total execution time, T, is

    given by:

    T = tA + p1 tB + p2 (tC + p3 tD + p4 tE) + tF

    where tX is the execution time of service X and each p is the probability of that execution path being chosen.
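
    The expression for T can be evaluated directly, as in the following sketch. The dictionary layout and the sample timings are illustrative assumptions; only the formula itself comes from the text.

```python
def expected_execution_time(t, p1, p3):
    """Expected end-to-end time for the workflow of Figure 4.2.

    t  -- dict of per-service execution times with keys "A" to "F"
    p1 -- probability that A invokes B (A invokes C with 1 - p1)
    p3 -- probability that C invokes D (C invokes E with 1 - p3)
    """
    p2 = 1.0 - p1
    p4 = 1.0 - p3
    return (
        t["A"]
        + p1 * t["B"]
        + p2 * (t["C"] + p3 * t["D"] + p4 * t["E"])
        + t["F"]
    )


# Hypothetical timings in seconds, for illustration only
times = {"A": 1.0, "B": 2.0, "C": 1.0, "D": 4.0, "E": 2.0, "F": 1.0}
T = expected_execution_time(times, p1=0.5, p3=0.25)
```

    Substituting per-service costs for the execution times gives the expected cost of the workflow in the same way.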

    Figure 4.2 Composite Web Services

    Likewise, the total cost will have exactly the same form. The KQI is based on the value of T that the

    application is expected to meet. Likewise KPIs are based on values of tn that each service is expected

    to be able to meet. Since the KQI is a composite measure each individual KPI can be defined with a

    certain degree of tolerance. An individual service could exceed its KPI whilst the overall application

    execution time remains within its SLA targets. This allows us to add flexibility to the KPIs by adding

    performance thresholds. There could be a warning threshold as well as an error threshold.
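
    One simple way of expressing such thresholds is as multiples of the KPI target, as sketched below. The factor values and status names are assumptions made for illustration and would in practice be set per service.

```python
def kpi_status(measured, target, warning_factor=1.0, error_factor=1.5):
    """Classify a measured execution time against its KPI target.

    A measurement above target * error_factor is an error, one above
    target * warning_factor is a warning, anything else is within the KPI.
    """
    if measured > target * error_factor:
        return "error"
    if measured > target * warning_factor:
        return "warning"
    return "ok"
```

    A warning on a single service need not imply an SLA breach, since the KQI depends on the combined execution time of the whole workflow.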

    4.2 Service Level Agreement Design

    4.2.1 COSMA

    The collection of service execution data and its use in the definition of KPIs has also been suggested

    as an important method of service profiling of composite services [Ludwig et. al. 2009a] based on

    COSMA, an approach for managing SLAs in composite services [Ludwig et. al. 2008]. The concept


    therefore, find it very useful to review MoDe4SLA and compare it with the work we have presented

    in this thesis so far.

    The MoDe4SLA approach begins with a dependency model which for our purposes is similar to what

    we have produced already in Figure 4.2. We can produce models of this sort not only for response

    times but also for cost dependencies.

    Next the approach advocates that we analyse the impact the dependent services have on the

    composite service. An example is a service that is called repeatedly. If a workflow calls service A

    three times and its response time is 3 seconds and it calls service B once and its response time is 4

    seconds we could represent the impact on the workflow of service A as 3 x 3 s = 9 s and the impact of

    service B as 1 x 4 s = 4 s.

    Additional measures of impact might also be desirable. We mentioned in section 5.3.4 that some

    services could have a far greater impact on our composite service than other services, for example,

    if only one external vendor could supply that service. The MoDe4SLA approach does not cover this

    kind of scenario so we propose that a "uniqueness" impact is also derived for each service. If a

    service can only be sourced from one location it has a uniqueness of one. If we can source the

    service from two locations it has a uniqueness of 0.5.
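
    Both measures are straightforward to compute, as the sketch below shows. The rule uniqueness = 1 / (number of sources) is our generalisation of the two values given above, and the ranking key is a hypothetical choice; only the relative ordering matters.

```python
def impact(calls, response_time):
    """Response-time impact of a service: number of calls * response time."""
    return calls * response_time


def uniqueness(sources):
    """1.0 if only one source exists, 0.5 for two, and so on (assumed 1/n)."""
    if sources < 1:
        raise ValueError("a service needs at least one source")
    return 1.0 / sources


def rank_services(services):
    """Order service names from most to least important.

    services -- dict: name -> (calls, response_time, sources)
    Ranks by impact first and uniqueness second (an illustrative choice).
    """
    return sorted(
        services,
        key=lambda name: (
            impact(services[name][0], services[name][1]),
            uniqueness(services[name][2]),
        ),
        reverse=True,
    )
```

    For the example above, service A (three calls of 3 seconds) ranks ahead of service B (one call of 4 seconds).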

    In both the impact derivation and the uniqueness, the important thing to understand is that we are

    at this stage simply creating a method which allows us to rank services as being important or less

    important to us in meeting our own service level objectives. The actual values are of no importance

    as long as we are consistent with how we derive them. Note also that MoDe4SLA was extended in a

    recent paper [Bodenstaff et. al., 2009a] to study availability as a metric alongside response time and

    cost, rather than as an impact. This is also an important consideration, especially if it is combined with

    uniqueness.


    Next, MoDe4SLA suggests that we structure our monitoring results. All of the data indicated by

    MoDe4SLA is captured by our management solution described in section 4.3.3 and it consists of the

    following:

    An audit trail of all the messages exchanged

    The services invoked

    Which workflow the service invocation belonged to, e.g. New Business, Renewal.