www.s-cube-network.eu
University of Duisburg-Essen (UniDue)
Universitat Politècnica de Catalunya (UPC)
South East European Research Centre (SEERC)
Osama Sammodi (UniDue)
S-Cube Learning Package
Quality Assurance and Quality Prediction:
Online Testing for Proactive Adaptation
© UniDue
S-Cube
Quality Definition, Negotiation and Assurance
Quality Assurance and Quality Prediction
Online Testing for Proactive Adaptation
Learning Package Categorization
Learning Package Overview
Motivation
– Failure Prediction and Proactive Adaptation
Failure Prediction through Online Testing (OT)
Discussions
Summary
Service-based Applications: Current Situation
Shared ownership and adaptive systems
[Diagram: changing requirements and dynamic context aspects influence the development process and the system/application, which responds through self-adaptation]
Assume a citizen wants to renew a vehicle’s registration online:
1. The citizen provides a renewal identification number or the license plate number for identification
2. The citizen will have to pay the renewal fee (for example, using an ePay service)
3. The application renews the registration of the vehicle and updates its record to reflect the registration renewal
4. Finally, a confirmation of the renewal process is e-mailed to the citizen (for example, using Yahoo). In parallel to that, a validation sticker is mailed to the citizen
Service-based Applications: Example (eGovernment Application)
[Diagram: the composed services of the application, some of which lie outside the organization boundary]
The previous slides showed that Service-Based Applications (SBAs) run in highly dynamic settings with respect to
– 3rd party services, service providers, …
– requirements, user types, end-user devices, network connectivity, …
Differences from traditional software systems:
– Unprecedented level of change
– No guarantee that a 3rd party service fulfils its contract
– Hard to assess the behaviour of the infrastructure (e.g., Internet, Cloud, …) at design time
SBAs cannot be specified, realized and analyzed completely in advance (i.e., during design time), so decisions and checks during the operation of the SBA are needed (i.e., at run-time)
Service-based Applications: The Need for Adaptation
The Need for Adaptation: The S-Cube SBA Lifecycle
[Diagram: the S-Cube SBA lifecycle. The evolution cycle (design time) comprises Requirements Engineering, Design, Realization, and Deployment & Provisioning; the adaptation cycle (run-time „MAPE“ loop, incl. Monitor) comprises Identify Adaptation Need (Analyse), Identify Adaptation Strategy (Plan), and Enact Adaptation (Execute); both cycles meet at Operation & Management]
Background: S-Cube Service Life-Cycle
A life cycle model is a process model that covers the activities related to the entire life cycle of a service, a service-based application, or a software component or system [S-Cube KM]
MAPE Loop
Reactive Adaptation
– Repair/compensate for an external failure visible to the end-user
– Drawbacks: execution of faulty services, reduction of performance, inconsistent end-states, ...
Preventive Adaptation
– An internal failure/deviation occurs: will it lead to an external failure?
– If “yes”: repair/compensate the internal failure/deviation to prevent the external failure
Proactive Adaptation
– Is an internal failure/deviation imminent (but has not occurred yet)?
– If “yes”: modify the system before the internal failure actually occurs
Key enabler: Online Failure Prediction
Types of Adaptation: General Differences
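To make the differences between the three adaptation types above concrete, here is a minimal Python sketch; all predicate and action names are illustrative stand-ins, not part of the S-Cube material:

```python
def adaptation_decision(external_failure, internal_failure, internal_failure_imminent,
                        leads_to_external_failure):
    """Return which kind of adaptation (if any) would fire in the given situation."""
    if internal_failure_imminent:
        # proactive: the failure has not occurred yet; the system is modified beforehand
        return "proactive adaptation"
    if internal_failure and leads_to_external_failure:
        # preventive: an internal failure occurred; repair before the end-user notices
        return "preventive adaptation"
    if external_failure:
        # reactive: the end-user already observed the failure; repair/compensate afterwards
        return "reactive adaptation"
    return "no adaptation"

print(adaptation_decision(False, False, True, False))   # -> proactive adaptation
print(adaptation_decision(False, True, False, True))    # -> preventive adaptation
print(adaptation_decision(True, False, False, False))   # -> reactive adaptation
```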
Prediction must be efficient
– Time available for prediction and repairs/changes is limited
– If prediction is too slow, there is not enough time left to adapt
Prediction must be accurate
– Unnecessary adaptations can lead to
- higher costs (e.g., use of expensive alternatives)
- delays (possibly leaving less time to address real faults)
- follow-up failures (e.g., if alternative service has severe bugs)
– Missed proactive adaptation opportunities diminish the benefit of proactive adaptation (e.g., because reactive compensation actions are needed)
Need for Accuracy: Requirements on Online Prediction Techniques
Learning Package Overview
Motivation
– Failure Prediction and Proactive Adaptation
Failure Prediction through Online Testing (OT)
Discussions
Summary
Quality Assurance Techniques – Background: Two Important Dynamic Checks
Testing (prominent for traditional software)
– Systematically execute the software
1. Software is fed with concrete pre-determined inputs (test cases)
2. Produced outputs* are observed
3. Deviation = failure
Monitoring (prominent for SBAs)
– Observe the software during its current execution (i.e., actual use / operation)
1. End-user interacts with the system
2. Produced outputs* are observed
3. Deviation = failure
[Diagram: a tester feeding inputs to and observing outputs* of the software vs. an end-user interacting with the software while its outputs* are observed]
[for more details, see deliverable JRA-1.3.1; S-Cube KM] * incl. internal data collected for QA purposes
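As a small illustration of the two checks, here is a hedged Python sketch; the stock-quote service, test cases and expectations are invented examples, not an S-Cube API:

```python
def stock_quote_service(symbol):
    """Stand-in for a service under test / under observation."""
    quotes = {"ACME": 42.0}
    return quotes.get(symbol)

# Testing: feed pre-determined inputs (test cases), compare observed vs. expected outputs.
test_cases = [("ACME", 42.0), ("UNKNOWN", None)]
for test_input, expected in test_cases:
    observed = stock_quote_service(test_input)
    print("test:", test_input, "failure" if observed != expected else "no failure")

# Monitoring: passively observe outputs produced for real end-user requests during operation.
def monitored_invocation(symbol):
    observed = stock_quote_service(symbol)
    print("monitored:", symbol, "failure" if observed is None else "no failure")

monitored_invocation("ACME")   # request issued by an end-user, only observed by the monitor
```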
• Problem: Monitoring only (passively) observes services or SBAs during their actual use in the field
– It cannot guarantee comprehensive / timely coverage of the ’test object’
– This can reduce the accuracy of failure prediction
• Solution: Online Testing = extend testing to the operation phase
– “Actively (& systematically) execute services in parallel to their normal use in the SBA”
[Diagram: the S-Cube SBA lifecycle (Requirements Engineering, Design, Realization, Deployment & Provisioning, Operation & Management, and the adaptation activities), with online testing attached to the run-time part of the lifecycle]
Online Failure Prediction through OT: Motivation
PROSA: Predict violation of QoS
– For stateless services (i.e., services that don't persist any state between requests)
– E.g., predict that “response time” of “stock quote” service is slower than 1000 ms
– See [Sammodi et al. 2011, Metzger 2011, Metzger et al. 2010, Hielscher et al. 2008]
JITO: Predict violation of protocol
– For conversational services (i.e., services that only accept specific sequences of operation invocations)
– E.g., predict that “checkout” of “shopping basket” service fails after all products have been selected
– See [Dranidis et al. 2010]
Online Failure Prediction through OT: Two S-Cube Approaches
In this learning package we focus on PROSA
Note: Both approaches support the “Service Integrator”, who integrates in-house and 3rd party services to compose an SBA
Idea of the PROSA approach
Inverse usage-based testing:
– Assume: a service has seldom been “used” in a given time period
– This implies that not enough “monitoring data” (i.e., data collected from monitoring its usage) has been gathered
– If we want to predict the service’s QoS from the available monitoring data alone, the prediction accuracy might not be good
– To improve the prediction accuracy, dedicated online tests are performed to collect additional evidence for the quality of the service (this evidence is called “test data”)
- But how much to test? See the next slides!
– Both “monitoring data” and “test data” are used for prediction
PROSA Online Testing of QoS
Usage-based Testing: Background
Usage-based (aka. operational profile) testing is a technique aimed at testing software from the users’ perspective [Musa 1993, Trammell 1995]
It drives the allocation of test cases in accordance with use, and ensures that the most-used operations will be the most tested
The approach was proposed for assuring reliability
Typically, either flat operational profiles or Markov chain based models are used to represent usage models
– Markov chains represent the system states and transitions between those states, together with probabilities for those state transitions (thus they capture structure) [Trammell 1995]
– Operational profiles are defined as a set of operations and their probabilities [Musa 1993]
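For illustration, a minimal Python sketch of a flat operational profile; the operation names and probabilities are invented for the eGovernment example. Classical usage-based testing allocates test cases proportionally to usage:

```python
# A flat operational profile: each operation gets a usage probability (invented values;
# a Markov chain based usage model would additionally capture state transitions).
operational_profile = {
    "identify_citizen": 0.40,
    "pay_renewal_fee": 0.35,
    "renew_registration": 0.20,
    "send_confirmation": 0.05,
}

def allocate_tests(profile, total_tests):
    """Classical usage-based testing: allocate test cases proportionally to usage."""
    return {operation: round(p * total_tests) for operation, p in profile.items()}

print(allocate_tests(operational_profile, 100))
# {'identify_citizen': 40, 'pay_renewal_fee': 35, 'renew_registration': 20, 'send_confirmation': 5}
```

PROSA inverts this idea: seldom-used services receive extra online tests, because little monitoring data is available for them (see the following slides).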
[Diagram: the PROSA framework. Monitoring events from the running SBA instances (services s1 … sn) provide usage frequencies and monitoring data. The testing loop comprises 1. Test Initiation, 2. Test Case Selection (from a test case repository, driven by the usage model), 3. Test Execution (test input/output against the services), and 4. Aggregation of Monitoring Data; the monitoring loop comprises 5. Usage Model Building/Updating, 6. Prediction, and 7. Adaptation, which issues adaptation triggers for adaptation enactment]
PROSA Online Testing of QoS: General Framework
The framework consists of two main loops: one for testing and another for monitoring.
1) Test initiation: includes all preparatory activities for online test selection and execution, such as the definition of potential test cases
2) Test case selection: selects the test cases to be executed. This is the central activity of our framework; the next slides provide further details about our usage-based test case selection approach
3) Test execution: executes the test cases that have been selected by the previous activity
4) Aggregation of monitoring data: collects monitoring data during the operation of the SBA, which is used both for updating the “usage model” as the SBA operates (usage frequencies) and for making predictions
PROSA Online Testing of QoS: Framework Activities (1)
5) Usage model building/updating: the initial usage model can be built from the results of requirements engineering. During the operation of the SBA, usage frequencies computed from monitoring events are used to automatically update the “usage model”
6) Prediction: augments testing data with monitoring data and makes the actual QoS prediction for the services in the SBA
7) Adaptation: based on the prediction results, adaptation requests are issued if the expected quality is predicted to be violated. We focus on adaptation by dynamic service binding (services are selected and dynamically substituted at run-time). A sketch of one period of this loop follows below
PROSA Online Testing of QoS: Framework Activities (2)
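A minimal, self-contained Python sketch of one period of the framework, with the seven activities marked in comments; the data structures, the random stubs and the SLA threshold are illustrative simplifications under assumed numbers, not the PROSA implementation:

```python
import random

def monitor_sba(services, usage_model, samples=50):
    """4. Aggregation of monitoring data: observe real invocations during SBA operation (stubbed)."""
    weights = [usage_model[s] for s in services]
    return [(random.choices(services, weights=weights)[0], random.uniform(200, 1400))
            for _ in range(samples)]

def run_period(services, usage_model, max_tests=10, sla_ms=1000):
    monitoring_data = monitor_sba(services, usage_model)                          # 4.
    # 5. Usage model building/updating: usage frequencies from the monitoring events
    usage_model = {s: sum(1 for m, _ in monitoring_data if m == s) / len(monitoring_data)
                   for s in services}
    # 1.-3. Test initiation, usage-based test case selection, and test execution:
    #       seldom-used services receive additional online tests (stubbed invocations)
    test_data = []
    for s in services:
        observed = sum(1 for m, _ in monitoring_data if m == s)
        test_data += [(s, random.uniform(200, 1400)) for _ in range(max(0, max_tests - observed))]
    # 6. Prediction: average response time per service over monitoring data + test data
    # 7. Adaptation: issue an adaptation trigger (dynamic re-binding) on a predicted SLA violation
    triggers = {}
    for s in services:
        values = [rt for m, rt in monitoring_data + test_data if m == s]
        triggers[s] = sum(values) / len(values) > sla_ms
    return usage_model, triggers

print(run_period(["s1", "s2", "s3"], {"s1": 0.7, "s2": 0.2, "s3": 0.1}))
```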
Steps of the approach:
1. Build Usage Model
– We divide the execution of the SBA into periods P_i; between periods, the usage model is updated
– Let ψ_k,i denote the usage probability for a service S_k in period P_i
2. Exploit Usage Model for Testing
– For simplification, let:
  m = number of time points within a period
  q_k = maximum number of tests allowed for service S_k per period
– We compute the number of data points expected from monitoring in P_i: m_monitoring,k,i = ψ_k,i · m
– Based on the above, we compute the number of additional data points to be collected by testing in P_i: m_testing,k,i = max(0, q_k − m_monitoring,k,i) (a worked example follows below)
[Diagram: the SBA execution divided into periods P_1, P_2, …, P_i, each with time points t_i,1 … t_i,m; at these time points the framework decides whether to test, based on the usage model of the period (e.g., the usage model for P_2)]
PROSA Online Testing of QoS: Technical Solution
Note: For 3rd party services, the number of allowable tests q_k can be limited due to economic considerations (e.g., pay per service invocation) and technical considerations (testing can impact the availability of a service)
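A small worked example of step 2; the usage probabilities, period length and test budget are invented numbers:

```python
def tests_to_run(usage_probability, time_points, max_tests):
    """Additional online tests for service S_k in the upcoming period P_i."""
    expected_from_monitoring = usage_probability * time_points        # m_monitoring,k,i = psi_k,i * m
    return max(0, round(max_tests - expected_from_monitoring))        # m_testing,k,i = max(0, q_k - m_monitoring,k,i)

# A rarely used service (psi = 0.02) with m = 100 time points and a budget of q_k = 10 tests:
print(tests_to_run(0.02, 100, 10))   # -> 8 additional online tests
# A frequently used service (psi = 0.40) already yields enough monitoring data:
print(tests_to_run(0.40, 100, 10))   # -> 0 additional online tests
```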
Measuring Accuracy: Introducing TP, FP, FN and TN
To measure the accuracy of failure prediction, we take into account the following four cases:
• True Positives (TP): the prediction predicts a failure and the service indeed fails when invoked during the actual execution of the SBA (i.e., an actual failure)
• False Positives (FP): the prediction predicts a failure although the service works as expected when invoked during the actual execution of the SBA (i.e., no actual failure)
• False Negatives (FN): the prediction does not predict a failure although the service fails when invoked during the actual execution of the SBA (i.e., an actual failure)
• True Negatives (TN): the prediction does not predict a failure and the service works as expected when invoked during the actual execution of the SBA (i.e., no actual failure)

                        Actual Failure    Actual Non-Failure
Predicted Failure          TP                 FP
Predicted Non-Failure      FN                 TN
[Diagram: monitored response time of service S2 over time in a running SBA (S1, S2, S3), together with the predictor for the response time; a false positive leads to an unnecessary adaptation, a false negative to a missed adaptation]
Measuring Accuracy: Computing TP, FP, FN and TN
The four cases are counted over the predictions made during operation:

                        Actual Failure    Actual Non-Failure
Predicted Failure          TP                 FP
Predicted Non-Failure      FN                 TN

Measuring Accuracy: Contingency Table Metrics (see [Salfner et al. 2010])
Based on the previous cases, we compute the following metrics:
– Precision: p = TP / (TP + FP). How many of the predicted failures were actual failures?
– Recall (true positive rate): r = TP / (TP + FN). How many of the actual failures have been correctly predicted as failures?
– Negative predictive value: v = TN / (TN + FN). How many of the predicted non-failures were actual non-failures?
– False positive rate: f = FP / (FP + TN). How many of the actual non-failures have been incorrectly predicted as failures? (Note: a smaller f is preferable.)
– Accuracy: a = (TP + TN) / (TP + FP + FN + TN). How many predictions were correct?
Note: Actual failures are rare, so a prediction that always predicts “non-failure” can achieve a high accuracy a. …
f corresponds to unnecessary adaptations; 1 − r corresponds to missed adaptations
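A minimal Python sketch that derives the four cases and the metrics above from prediction/outcome pairs; the example data is invented:

```python
def contingency_metrics(predicted, actual):
    """predicted / actual: lists of booleans, True = (predicted / actual) failure."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(not p and a for p, a in zip(predicted, actual))
    tn = sum(not p and not a for p, a in zip(predicted, actual))
    return {
        "precision p": tp / (tp + fp) if tp + fp else None,
        "recall r": tp / (tp + fn) if tp + fn else None,
        "neg. predictive value v": tn / (tn + fn) if tn + fn else None,
        "false positive rate f": fp / (fp + tn) if fp + tn else None,
        "accuracy a": (tp + tn) / len(predicted) if predicted else None,
    }

predicted = [True, False, True, False, False, True]   # predicted failures
actual    = [True, False, False, False, True, True]   # actual failures observed at run-time
print(contingency_metrics(predicted, actual))
```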
Learning Package Overview
Motivation
– Failure Prediction and Proactive Adaptation
Failure Prediction through Online Testing (OT)
– PROSA: Violation of Quality of Service (QoS)
– JITO: Violation of Protocol
Discussions
Summary
To evaluate PROSA, we conducted an exploratory experiment with the following setup:
– Prototypical implementation of the prediction approaches (see next slide)
– Simulation of an example abstract service-based application (the workflow S1, S2, S3) with 100 runs, with 100 running applications each
– (Post-mortem) monitoring data from real Web services (e.g., Google; 2000 data points per service; QoS = performance) [Cavallo et al. 2010]
– Measuring contingency table metrics (for S1 and S3)
PROSA Online Testing of QoS: Evaluation
Prediction model = arithmetic average of the n most recent data points (monitored or tested QoS values)
Initial exploratory experiments indicated that the number of past data points (n) impacts accuracy
Thus, in the experiment, three variations of the model were considered (see the sketch below):
– n = 1, aka. “point prediction”: prediction value = current value
– n = 5: prediction value = average of the last 5 data points
– n = 10: prediction value = average of the last 10 data points
PROSA Online Testing of QoS: Prediction Models
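A minimal Python sketch of the moving-average prediction model and its variations, applied to invented response-time data with the 1000 ms threshold from the earlier example:

```python
def predict(history, n):
    """Predict the next QoS value as the arithmetic average of the last n data points."""
    window = history[-n:]
    return sum(window) / len(window)

response_times = [820, 950, 930, 870, 990, 1210]   # invented response times in ms
for n in (1, 5, 10):
    value = predict(response_times, n)
    verdict = "predicted violation" if value > 1000 else "no predicted violation"
    print(f"n = {n:>2}: predicted {value:.0f} ms -> {verdict}")
# n = 1 ("point prediction") uses only the current value; larger n smooths over more history
# (with only 6 data points, n = 10 simply averages everything available).
```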
[Chart: contingency table metrics for service S3 under the three prediction models]
Considering the different prediction models:
• no significant difference in precision (p) and negative predictive value (v)
• recall (r) and false positive rate (f) are “conflicting”!
• accuracy (a) is best for “point prediction”
PROSA Online Testing of QoS: Results
[Chart: contingency table metrics for service S1 under the three prediction models]
Considering the different prediction models:
• no significant difference in precision (p) and negative predictive value (v)
• recall (r) and false positive rate (f) are “conflicting”!
• accuracy (a) is best for “point prediction”
• difference from S3: “last 5” has the highest recall for S1
PROSA Online Testing of QoS: Results
[Chart: prediction based on online testing (ot) vs. monitoring only (mon) for service S3]
Comparing PROSA with Monitoring:
• For S3, prediction based on online testing (ot) improves along all metrics when compared with prediction based on monitoring (mon) only
PROSA Online Testing of QoS: Results
[Chart: prediction based on online testing (ot) vs. monitoring only (mon) for service S1]
Comparing PROSA with Monitoring:
• The improvement is not as high for S1 (there is already a lot of monitoring data)
PROSA Online Testing of QoS: Results
PROSA Online Testing of QoS: Discussions
Pros:
– Generally improves the accuracy of failure prediction
– Exploits available monitoring data
– Beneficial in situations where prediction accuracy is critical but the available past monitoring data is not sufficient
– Can complement approaches that make predictions based on available monitoring data (e.g., approaches based on data mining) and require lots of data for accurate prediction
– Can be combined with approaches for preventive adaptation, e.g.:
- SLA violation prevention with machine learning based on predicted service failures
- Run-time verification to check whether an “internal” service failure leads to an “external” violation of the SLA
Cons:
– Assumes that testing a service doesn’t produce side effects
– Can have associated costs due to testing:
- One can use the usage model to determine the need for the testing activities
- Requires further investigation into cost models that relate the costs of testing to the costs of compensating wrong adaptations
Learning Package Overview
Motivation
– Failure Prediction and Proactive Adaptation
Failure Prediction through Online Testing (OT)
Discussions
Summary
Two complementary solutions for failure prediction based on Online Testing:
– PROSA: Prediction of QoS violation
– JITO: Prediction of protocol violation
An internal failure does not necessarily imply an external failure (i.e., a violation of an SLA / requirement of the composed service)
Combine “internal” failure prediction approaches with “external” failure prediction:
- TUW & USTUTT: SLA violation prevention with machine learning based on predicted service failures
- UniDue: Run-time verification to check whether an “internal” service failure leads to an “external” violation of the SLA
Summary
• [Sammodi et al. 2011] O. Sammodi, A. Metzger, X. Franch, M. Oriol, J. Marco, and K. Pohl. Usage-based online testing for proactive adaptation of service-based applications. In COMPSAC 2011
• [Metzger 2011] A. Metzger. Towards Accurate Failure Prediction for the Proactive Adaptation of Service-oriented Systems (Invited Paper). In ASAS@ESEC 2011
• [Metzger et al. 2010] A. Metzger, O. Sammodi, K. Pohl, and M. Rzepka. Towards pro-active adaptation with confidence: Augmenting service monitoring with online testing. In SEAMS@ICSE 2010
• [Hielscher et al. 2008] J. Hielscher, R. Kazhamiakin, A. Metzger, and M. Pistore. A framework for proactive self-adaptation of service-based applications based on online testing. In ServiceWave 2008
• [Dranidis et al. 2010] D. Dranidis, A. Metzger, and D. Kourtesis. Enabling proactive adaptation through just-in-time testing of conversational services. In ServiceWave 2010
Further S-Cube Reading
[Salehie et al. 2009] Salehie, M., Tahvildari, L.: Self-adaptive software: Landscape and research challenges. ACM Transactions on Autonomous and Adaptive Systems 4(2), 14:1 – 14:42 (2009)
[Di Nitto et al. 2008] Di Nitto, E.; Ghezzi, C.; Metzger, A.; Papazoglou, M.; Pohl, K.: A Journey to Highly Dynamic, Self-adaptive Service-based Applications. Automated Software Engineering (2008)
[PO-JRA-1.3.1] S-Cube deliverable # PO-JRA-1.3.1: Survey of Quality Related Aspects Relevant for Service-based Applications; http://www.s-cube-network.eu/results/deliverables/wp-jra-1.3
[PO-JRA-1.3.5] S-Cube deliverable # PO-JRA-1.3.5: Integrated principles, techniques and methodologies for specifying end-to-end quality and negotiating SLAs and for assuring end-to-end quality provision and SLA conformance; http://www.s-cube-network.eu/results/deliverables/wp-jra-1.3
[S-Cube KM] S-Cube Knowledge Model: http://www.s-cube-network.eu/knowledge-model
[Trammell 1995] Trammell, C.: Quantifying the reliability of software: statistical testing based on a usage model. In ISESS’95. Washington, DC: IEEE Computer Society, 1995, p. 208
[Musa 1993] Musa, J.: Operational profiles in software-reliability engineering. IEEE Software, vol. 10, no. 2, pp. 14–32, March 1993
[Salfner et al. 2010] F. Salfner, M. Lenk, and M. Malek. A survey of online failure prediction methods. ACM Comput. Surv., 42(3), 2010
[Cavallo et al. 2010] B. Cavallo, M. Di Penta, and G. Canfora. An empirical comparison of methods to support QoS-aware service selection. In PESOS@ICSE 2010
References
The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] under grant agreement 215483 (S-Cube).
Acknowledgment