www.s-cube-network.eu
University of Duisburg-Essen (UniDue)
Universitat Politècnica de Catalunya (UPC)
South East European Research Centre (SEERC)
Osama Sammodi (UniDue)
S-Cube Learning Package
Quality Assurance and Quality Prediction:
Online Testing for Proactive Adaptation
© UniDue
S-Cube
Quality Definition, Negotiation and Assurance
Quality Assurance and Quality Prediction
Online Testing for Proactive Adaptation
Learning Package Categorization
Learning Package Overview
Motivation
– Failure Prediction and Proactive Adaptation
Failure Prediction through Online Testing (OT)
Discussions
Summary
Service-based Applications: Current Situation
Shared ownership and adaptive systems
[Diagram: changing requirements and dynamic context aspects influence the development process and the system/application, which responds through self-adaptation]
Assume a citizen wants to renew a vehicle’s registration online:
1. The citizen provides a renewal identification number or the license plate number for identification
2. The citizen will have to pay the renewal fee (for example, using an ePay service)
3. The application renews the registration of the vehicle and updates its record to reflect the registration renewal
4. Finally, a confirmation of the renewal process is e-mailed to the citizen (for example, using Yahoo). In parallel to that, a validation sticker is mailed to the citizen
Service-based Applications: Example (eGovernment Application)
[Diagram: the composed services of the application, some of which lie outside the organization boundary]
The previous slides showed that Service-Based Applications (SBAs) run in highly dynamic settings with respect to
– 3rd party services, service providers, …
– requirements, user types, end-user devices, network connectivity, …
Differences from traditional software systems:
– Unprecedented level of change
– No guarantee that a 3rd party service fulfils its contract
– Hard to assess the behaviour of the infrastructure (e.g., Internet, Cloud, …) at design time
SBAs cannot be specified, realized and analyzed completely in advance (i.e., during design time), so decisions and checks during the operation of the SBA are needed (i.e., at run-time)
Service-based Applications: The Need for Adaptation
The Need for Adaptation: The S-Cube SBA Lifecycle
[Diagram: the S-Cube SBA lifecycle. The evolution cycle (design time) comprises Requirements Engineering, Design, Realization, and Deployment & Provisioning; the adaptation cycle (run-time „MAPE“ loop, incl. Monitor) comprises Identify Adaptation Need (Analyse), Identify Adaptation Strategy (Plan), and Enact Adaptation (Execute); both cycles meet at Operation & Management]
Background: S-Cube Service Life-Cycle
A life cycle model is a process model that covers the activities related to the entire life cycle of a service, a service-based application, or a software component or system [S-Cube KM]
MAPE Loop
Reactive Adaptation
– Repair/compensate for an external failure visible to the end-user
– Drawbacks: execution of faulty services, reduction of performance, inconsistent end-states, ...
Preventive Adaptation
– An internal failure/deviation occurs: will it lead to an external failure?
– If “yes”: repair/compensate the internal failure/deviation to prevent the external failure
Proactive Adaptation
– Is an internal failure/deviation imminent (but has not occurred yet)?
– If “yes”: modify the system before the internal failure actually occurs
Key enabler: Online Failure Prediction
Types of Adaptation: General Differences
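To make the differences between the three adaptation types above concrete, here is a minimal Python sketch; all predicate and action names are illustrative stand-ins, not part of the S-Cube material:

```python
def adaptation_decision(external_failure, internal_failure, internal_failure_imminent,
                        leads_to_external_failure):
    """Return which kind of adaptation (if any) would fire in the given situation."""
    if internal_failure_imminent:
        # proactive: the failure has not occurred yet; the system is modified beforehand
        return "proactive adaptation"
    if internal_failure and leads_to_external_failure:
        # preventive: an internal failure occurred; repair before the end-user notices
        return "preventive adaptation"
    if external_failure:
        # reactive: the end-user already observed the failure; repair/compensate afterwards
        return "reactive adaptation"
    return "no adaptation"

print(adaptation_decision(False, False, True, False))   # -> proactive adaptation
print(adaptation_decision(False, True, False, True))    # -> preventive adaptation
print(adaptation_decision(True, False, False, False))   # -> reactive adaptation
```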
Prediction must be efficient
– Time available for prediction and repairs/changes is limited
– If prediction is too slow, there is not enough time left to adapt
Prediction must be accurate
– Unnecessary adaptations can lead to
- higher costs (e.g., use of expensive alternatives)
- delays (possibly leaving less time to address real faults)
- follow-up failures (e.g., if alternative service has severe bugs)
– Missed proactive adaptation opportunities diminish the benefit of proactive adaptation (e.g., because reactive compensation actions are needed)
Need for Accuracy: Requirements on Online Prediction Techniques
Learning Package Overview
Motivation
– Failure Prediction and Proactive Adaptation
Failure Prediction through Online Testing (OT)
Discussions
Summary
Quality Assurance Techniques – Background: Two Important Dynamic Checks
Testing (prominent for traditional software)
– Systematically execute the software
1. Software is fed with concrete pre-determined inputs (test cases)
2. Produced outputs* are observed
3. Deviation = failure
Monitoring (prominent for SBAs)
– Observe the software during its current execution (i.e., actual use / operation)
1. End-user interacts with the system
2. Produced outputs* are observed
3. Deviation = failure
[Diagram: a tester feeding inputs to and observing outputs* of the software vs. an end-user interacting with the software while its outputs* are observed]
[for more details, see deliverable JRA-1.3.1; S-Cube KM] * incl. internal data collected for QA purposes
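As a small illustration of the two checks, here is a hedged Python sketch; the stock-quote service, test cases and expectations are invented examples, not an S-Cube API:

```python
def stock_quote_service(symbol):
    """Stand-in for a service under test / under observation."""
    quotes = {"ACME": 42.0}
    return quotes.get(symbol)

# Testing: feed pre-determined inputs (test cases), compare observed vs. expected outputs.
test_cases = [("ACME", 42.0), ("UNKNOWN", None)]
for test_input, expected in test_cases:
    observed = stock_quote_service(test_input)
    print("test:", test_input, "failure" if observed != expected else "no failure")

# Monitoring: passively observe outputs produced for real end-user requests during operation.
def monitored_invocation(symbol):
    observed = stock_quote_service(symbol)
    print("monitored:", symbol, "failure" if observed is None else "no failure")

monitored_invocation("ACME")   # request issued by an end-user, only observed by the monitor
```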
• Problem: Monitoring only (passively) observes services or SBAs during their actual use in the field
– It cannot guarantee comprehensive / timely coverage of the ’test object’
– This can reduce the accuracy of failure prediction
• Solution: Online Testing = extend testing to the operation phase
– “Actively (& systematically) execute services in parallel to their normal use in the SBA”
[Diagram: the S-Cube SBA lifecycle (Requirements Engineering, Design, Realization, Deployment & Provisioning, Operation & Management, and the adaptation activities), with online testing attached to the run-time part of the lifecycle]
Online Failure Prediction through OT: Motivation
PROSA: Predict violation of QoS
– For stateless services (i.e., services that don't persist any state between requests)
– E.g., predict that “response time” of “stock quote” service is slower than 1000 ms
– See [Sammodi et al. 2011, Metzger 2011, Metzger et al. 2010, Hielscher et al. 2008]
JITO: Predict violation of protocol
– For conversational services (i.e., services that only accept specific sequences of operation invocations)
– E.g., predict that “checkout” of “shopping basket” service fails after all products have been selected
– See [Dranidis et al. 2010]
Online Failure Prediction through OT: Two S-Cube Approaches
In this learning package we focus on PROSA
Note: Both approaches support the “Service Integrator”, who integrates in-house and 3rd party services to compose an SBA
Idea of the PROSA approach
Inverse usage-based testing:
– Assume: a service has seldom been “used” in a given time period
– This implies that not enough “monitoring data” (i.e., data collected from monitoring its usage) has been gathered
– If we want to predict the service’s QoS from the available monitoring data alone, the prediction accuracy might not be good
– To improve the prediction accuracy, dedicated online tests are performed to collect additional evidence for the quality of the service (this evidence is called “test data”)
- But how much to test? See the next slides!
– Both “monitoring data” and “test data” are used for prediction
PROSA Online Testing of QoS
Usage-based Testing: Background
Usage-based (aka. operational profile) testing is a technique aimed at testing software from the users’ perspective [Musa 1993, Trammell 1995]
It drives the allocation of test cases in accordance with use, and ensures that the most-used operations will be the most tested
The approach was proposed for assuring reliability
Typically, either flat operational profiles or Markov chain based models are used to represent usage models
– Markov chains represent the system states and transitions between those states, together with probabilities for those state transitions (thus they capture structure) [Trammell 1995]
– Operational profiles are defined as a set of operations and their probabilities [Musa 1993]
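For illustration, a minimal Python sketch of a flat operational profile; the operation names and probabilities are invented for the eGovernment example. Classical usage-based testing allocates test cases proportionally to usage:

```python
# A flat operational profile: each operation gets a usage probability (invented values;
# a Markov chain based usage model would additionally capture state transitions).
operational_profile = {
    "identify_citizen": 0.40,
    "pay_renewal_fee": 0.35,
    "renew_registration": 0.20,
    "send_confirmation": 0.05,
}

def allocate_tests(profile, total_tests):
    """Classical usage-based testing: allocate test cases proportionally to usage."""
    return {operation: round(p * total_tests) for operation, p in profile.items()}

print(allocate_tests(operational_profile, 100))
# {'identify_citizen': 40, 'pay_renewal_fee': 35, 'renew_registration': 20, 'send_confirmation': 5}
```

PROSA inverts this idea: seldom-used services receive extra online tests, because little monitoring data is available for them (see the following slides).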
[Diagram: the PROSA framework. Monitoring events from the running SBA instances (services s1 … sn) provide usage frequencies and monitoring data. The testing loop comprises 1. Test Initiation, 2. Test Case Selection (from a test case repository, driven by the usage model), 3. Test Execution (test input/output against the services), and 4. Aggregation of Monitoring Data; the monitoring loop comprises 5. Usage Model Building/Updating, 6. Prediction, and 7. Adaptation, which issues adaptation triggers for adaptation enactment]
PROSA Online Testing of QoS: General Framework
The framework consists of two main loops: one for testing and another for monitoring.
1) Test initiation: includes all preparatory activities for online test selection and execution, such as the definition of potential test cases
2) Test case selection: selects the test cases to be executed. This is the central activity of our framework; the next slides provide further details about our usage-based test case selection approach
3) Test execution: executes the test cases that have been selected by the previous activity
4) Aggregation of monitoring data: collects monitoring data during the operation of the SBA, which is used both for updating the “usage model” as the SBA operates (usage frequencies) and for making predictions
PROSA Online Testing of QoS: Framework Activities (1)
5) Usage model building/updating: the initial usage model can be built from the results of requirements engineering. During the operation of the SBA, usage frequencies computed from monitoring events are used to automatically update the “usage model”
6) Prediction: augments testing data with monitoring data and makes the actual QoS prediction for the services in the SBA
7) Adaptation: based on the prediction results, adaptation requests are issued if the expected quality is predicted to be violated. We focus on adaptation by dynamic service binding (services are selected and dynamically substituted at run-time). A sketch of one period of this loop follows below
PROSA Online Testing of QoS: Framework Activities (2)
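A minimal, self-contained Python sketch of one period of the framework, with the seven activities marked in comments; the data structures, the random stubs and the SLA threshold are illustrative simplifications under assumed numbers, not the PROSA implementation:

```python
import random

def monitor_sba(services, usage_model, samples=50):
    """4. Aggregation of monitoring data: observe real invocations during SBA operation (stubbed)."""
    weights = [usage_model[s] for s in services]
    return [(random.choices(services, weights=weights)[0], random.uniform(200, 1400))
            for _ in range(samples)]

def run_period(services, usage_model, max_tests=10, sla_ms=1000):
    monitoring_data = monitor_sba(services, usage_model)                          # 4.
    # 5. Usage model building/updating: usage frequencies from the monitoring events
    usage_model = {s: sum(1 for m, _ in monitoring_data if m == s) / len(monitoring_data)
                   for s in services}
    # 1.-3. Test initiation, usage-based test case selection, and test execution:
    #       seldom-used services receive additional online tests (stubbed invocations)
    test_data = []
    for s in services:
        observed = sum(1 for m, _ in monitoring_data if m == s)
        test_data += [(s, random.uniform(200, 1400)) for _ in range(max(0, max_tests - observed))]
    # 6. Prediction: average response time per service over monitoring data + test data
    # 7. Adaptation: issue an adaptation trigger (dynamic re-binding) on a predicted SLA violation
    triggers = {}
    for s in services:
        values = [rt for m, rt in monitoring_data + test_data if m == s]
        triggers[s] = sum(values) / len(values) > sla_ms
    return usage_model, triggers

print(run_period(["s1", "s2", "s3"], {"s1": 0.7, "s2": 0.2, "s3": 0.1}))
```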
Steps of the approach:
1. Build Usage Model
– We divide the execution of the SBA into periods P_i; between periods, the usage model is updated
– Let ψ_k,i denote the usage probability for a service S_k in period P_i
2. Exploit Usage Model for Testing
– For simplification, let:
  m = number of time points within a period
  q_k = maximum number of tests allowed for service S_k per period
– We compute the number of data points expected from monitoring in P_i: m_monitoring,k,i = ψ_k,i · m
– Based on the above, we compute the number of additional data points to be collected by testing in P_i: m_testing,k,i = max(0, q_k − m_monitoring,k,i) (a worked example follows below)
[Diagram: the SBA execution divided into periods P_1, P_2, …, P_i, each with time points t_i,1 … t_i,m; at these time points the framework decides whether to test, based on the usage model of the period (e.g., the usage model for P_2)]
PROSA Online Testing of QoS: Technical Solution
Note: For 3rd party services, the number of allowable tests q_k can be limited due to economic considerations (e.g., pay per service invocation) and technical considerations (testing can impact the availability of a service)
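A small worked example of step 2; the usage probabilities, period length and test budget are invented numbers:

```python
def tests_to_run(usage_probability, time_points, max_tests):
    """Additional online tests for service S_k in the upcoming period P_i."""
    expected_from_monitoring = usage_probability * time_points        # m_monitoring,k,i = psi_k,i * m
    return max(0, round(max_tests - expected_from_monitoring))        # m_testing,k,i = max(0, q_k - m_monitoring,k,i)

# A rarely used service (psi = 0.02) with m = 100 time points and a budget of q_k = 10 tests:
print(tests_to_run(0.02, 100, 10))   # -> 8 additional online tests
# A frequently used service (psi = 0.40) already yields enough monitoring data:
print(tests_to_run(0.40, 100, 10))   # -> 0 additional online tests
```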
Measuring Accuracy: Introducing TP, FP, FN and TN
To measure the accuracy of failure prediction, we take into account the following four cases:
• True Positives (TP): the prediction predicts a failure and the service indeed fails when invoked during the actual execution of the SBA (i.e., an actual failure)
• False Positives (FP): the prediction predicts a failure although the service works as expected when invoked during the actual execution of the SBA (i.e., no actual failure)
• False Negatives (FN): the prediction does not predict a failure although the service fails when invoked during the actual execution of the SBA (i.e., an actual failure)
• True Negatives (TN): the prediction does not predict a failure and the service works as expected when invoked during the actual execution of the SBA (i.e., no actual failure)

                        Actual Failure    Actual Non-Failure
Predicted Failure          TP                 FP
Predicted Non-Failure      FN                 TN
[Diagram: monitored response time of service S2 over time in a running SBA (S1, S2, S3), together with the predictor for the response time; a false positive leads to an unnecessary adaptation, a false negative to a missed adaptation]
Measuring Accuracy: Computing TP, FP, FN and TN
The four cases are counted over the predictions made during operation:

                        Actual Failure    Actual Non-Failure
Predicted Failure          TP                 FP
Predicted Non-Failure      FN                 TN

Measuring Accuracy: Contingency Table Metrics (see [Salfner et al. 2010])
Based on the previous cases, we compute the following metrics:
– Precision: p = TP / (TP + FP). How many of the predicted failures were actual failures?
– Recall (true positive rate): r = TP / (TP + FN). How many of the actual failures have been correctly predicted as failures?
– Negative predictive value: v = TN / (TN + FN). How many of the predicted non-failures were actual non-failures?
– False positive rate: f = FP / (FP + TN). How many of the actual non-failures have been incorrectly predicted as failures? (Note: a smaller f is preferable.)
– Accuracy: a = (TP + TN) / (TP + FP + FN + TN). How many predictions were correct?
Note: Actual failures are rare, so a prediction that always predicts “non-failure” can achieve a high accuracy a. …
f corresponds to unnecessary adaptations; 1 − r corresponds to missed adaptations
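A minimal Python sketch that derives the four cases and the metrics above from prediction/outcome pairs; the example data is invented:

```python
def contingency_metrics(predicted, actual):
    """predicted / actual: lists of booleans, True = (predicted / actual) failure."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(not p and a for p, a in zip(predicted, actual))
    tn = sum(not p and not a for p, a in zip(predicted, actual))
    return {
        "precision p": tp / (tp + fp) if tp + fp else None,
        "recall r": tp / (tp + fn) if tp + fn else None,
        "neg. predictive value v": tn / (tn + fn) if tn + fn else None,
        "false positive rate f": fp / (fp + tn) if fp + tn else None,
        "accuracy a": (tp + tn) / len(predicted) if predicted else None,
    }

predicted = [True, False, True, False, False, True]   # predicted failures
actual    = [True, False, False, False, True, True]   # actual failures observed at run-time
print(contingency_metrics(predicted, actual))
```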
Learning Package Overview
Motivation
– Failure Prediction and Proactive Adaptation
Failure Prediction through Online Testing (OT)
– PROSA: Violation of Quality of Service (QoS)
– JITO: Violation of Protocol
Discussions
Summary
To evaluate PROSA, we conducted an exploratory experiment with the following setup:
– Prototypical implementation of the prediction approaches (see next slide)
– Simulation of an example abstract service-based application (the workflow S1, S2, S3) with 100 runs, with 100 running applications each
– (Post-mortem) monitoring data from real Web services (e.g., Google; 2000 data points per service; QoS = performance) [Cavallo et al. 2010]
– Measuring contingency table metrics (for S1 and S3)
PROSA Online Testing of QoS: Evaluation
Prediction model = arithmetic average of the n most recent data points (monitored or tested QoS values)
Initial exploratory experiments indicated that the number of past data points (n) impacts accuracy
Thus, in the experiment, three variations of the model were considered (see the sketch below):
– n = 1, aka. “point prediction”: prediction value = current value
– n = 5: prediction value = average of the last 5 data points
– n = 10: prediction value = average of the last 10 data points
PROSA Online Testing of QoS: Prediction Models
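A minimal Python sketch of the moving-average prediction model and its variations, applied to invented response-time data with the 1000 ms threshold from the earlier example:

```python
def predict(history, n):
    """Predict the next QoS value as the arithmetic average of the last n data points."""
    window = history[-n:]
    return sum(window) / len(window)

response_times = [820, 950, 930, 870, 990, 1210]   # invented response times in ms
for n in (1, 5, 10):
    value = predict(response_times, n)
    verdict = "predicted violation" if value > 1000 else "no predicted violation"
    print(f"n = {n:>2}: predicted {value:.0f} ms -> {verdict}")
# n = 1 ("point prediction") uses only the current value; larger n smooths over more history
# (with only 6 data points, n = 10 simply averages everything available).
```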
[Chart: contingency table metrics for service S3 under the three prediction models]
Considering the different prediction models:
• no significant difference in precision (p) and negative predictive value (v)
• recall (r) and false positive rate (f) are “conflicting”!
• accuracy (a) is best for “point prediction”
PROSA Online Testing of QoS: Results
[Chart: contingency table metrics for service S1 under the three prediction models]
Considering the different prediction models:
• no significant difference in precision (p) and negative predictive value (v)
• recall (r) and false positive rate (f) are “conflicting”!
• accuracy (a) is best for “point prediction”
• difference from S3: “last 5” has the highest recall for S1
PROSA Online Testing of QoS: Results
[Chart: prediction based on online testing (ot) vs. monitoring only (mon) for service S3]
Comparing PROSA with Monitoring:
• For S3, prediction based on online testing (ot) improves along all metrics when compared with prediction based on monitoring (mon) only
PROSA Online Testing of QoS: Results
[Chart: prediction based on online testing (ot) vs. monitoring only (mon) for service S1]
Comparing PROSA with Monitoring:
• The improvement is not as high for S1 (there is already a lot of monitoring data)
PROSA Online Testing of QoS: Results
PROSA Online Testing of QoS: Discussions
Pros:
– Generally improves the accuracy of failure prediction
– Exploits available monitoring data
– Beneficial in situations where prediction accuracy is critical but the available past monitoring data is not sufficient
– Can complement approaches that make predictions based on available monitoring data (e.g., approaches based on data mining) and require lots of data for accurate prediction
– Can be combined with approaches for preventive adaptation, e.g.:
- SLA violation prevention with machine learning based on predicted service failures
- Run-time verification to check whether an “internal” service failure leads to an “external” violation of the SLA
Cons:
– Assumes that testing a service doesn’t produce side effects
– Can have associated costs due to testing:
- One can use the usage model to determine the need for the testing activities
- Requires further investigation into cost models that relate the costs of testing to the costs of compensating wrong adaptations
Learning Package Overview
Motivation
– Failure Prediction and Proactive Adaptation
Failure Prediction through Online Testing (OT)
Discussions
Summary
Two complementary solutions for failure prediction based on Online Testing:
– PROSA: Prediction of QoS violation
– JITO: Prediction of protocol violation
An internal failure does not necessarily imply an external failure (i.e., a violation of an SLA / requirement of the composed service)
Combine “internal” failure prediction approaches with “external” failure prediction:
- TUW & USTUTT: SLA violation prevention with machine learning based on predicted service failures
- UniDue: Run-time verification to check whether an “internal” service failure leads to an “external” violation of the SLA
Summary
• [Sammodi et al. 2011] O. Sammodi, A. Metzger, X. Franch, M. Oriol, J. Marco, and K. Pohl. Usage-based online testing for proactive adaptation of service-based applications. In COMPSAC 2011
• [Metzger 2011] A. Metzger. Towards Accurate Failure Prediction for the Proactive Adaptation of Service-oriented Systems (Invited Paper). In ASAS@ESEC 2011
• [Metzger et al. 2010] A. Metzger, O. Sammodi, K. Pohl, and M. Rzepka. Towards pro-active adaptation with confidence: Augmenting service monitoring with online testing. In SEAMS@ICSE 2010
• [Hielscher et al. 2008] J. Hielscher, R. Kazhamiakin, A. Metzger, and M. Pistore. A framework for proactive self-adaptation of service-based applications based on online testing. In ServiceWave 2008
• [Dranidis et al. 2010] D. Dranidis, A. Metzger, and D. Kourtesis. Enabling proactive adaptation through just-in-time testing of conversational services. In ServiceWave 2010
Further S-Cube Reading
[Salehie et al. 2009] Salehie, M., Tahvildari, L.: Self-adaptive software: Landscape and research challenges. ACM Transactions on Autonomous and Adaptive Systems 4(2), 14:1 – 14:42 (2009)
[Di Nitto et al. 2008] Di Nitto, E.; Ghezzi, C.; Metzger, A.; Papazoglou, M.; Pohl, K.: A Journey to Highly Dynamic, Self-adaptive Service-based Applications. Automated Software Engineering (2008)
[PO-JRA-1.3.1] S-Cube deliverable # PO-JRA-1.3.1: Survey of Quality Related Aspects Relevant for Service-based Applications; http://www.s-cube-network.eu/results/deliverables/wp-jra-1.3
[PO-JRA-1.3.5] S-Cube deliverable # PO-JRA-1.3.5: Integrated principles, techniques and methodologies for specifying end-to-end quality and negotiating SLAs and for assuring end-to-end quality provision and SLA conformance; http://www.s-cube-network.eu/results/deliverables/wp-jra-1.3
[S-Cube KM] S-Cube Knowledge Model: http://www.s-cube-network.eu/knowledge-model
[Trammell 1995] Trammell, C.: Quantifying the reliability of software: statistical testing based on a usage model. In ISESS’95. Washington, DC: IEEE Computer Society, 1995, p. 208
[Musa 1993] Musa, J.: Operational profiles in software-reliability engineering. IEEE Software, vol. 10, no. 2, pp. 14–32, March 1993
[Salfner et al. 2010] F. Salfner, M. Lenk, and M. Malek. A survey of online failure prediction methods. ACM Comput. Surv., 42(3), 2010
[Cavallo et al. 2010] B. Cavallo, M. Di Penta, and G. Canfora. An empirical comparison of methods to support QoS-aware service selection. In PESOS@ICSE 2010
References
The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] under grant agreement 215483 (S-Cube).
Acknowledgment