

Envisioning Defect Free Software through Innovative Process Optimization

Pavan Kulkarni, Lead QA Engineer

Sanjan Bora, Lead QA Engineer

Nanthini Sukumaran, Senior QA Specialist

Software AG Bangalore Technologies Pvt. Ltd

1. Abstract

“A stitch in time saves nine”... no! We will go with “A SMART stitch in time saves ninety-nine”. Delivering a Product Suite without defects is a myth, especially in a complex Middleware customer environment where the platform can be customized to suit business requirements in numerous ways. In this paper, we explore innovative process methodologies with a vision of achieving 100% test adequacy in our process, leading to “defect-free” applications and products.

Firstly, a robust defect fix verification and release process is very crucial, particularly when it involves customer defects which are critical to any business. At Software AG, we call it the “Fix Verification Process”. The process involves multiple scheduled stages of functional and Integration testing of the defect fixes before being released to the customer.

The second topic is the Continuous Customer Defect Triaging Process. An established process that continuously triages customer defects, filters out Integration scenarios, cross-checks whether these scenarios are part of the existing Suite, and enhances the test suites with these scenarios using our automation framework can be very beneficial in improving test adequacy and boosting customer confidence.

Lastly, intelligent RCA through Machine Learning algorithms can be used to quickly predict root causes of problems and arrive at possible solutions. We will discuss a step-by-step mechanism for achieving this and the challenges ahead of us.

With the disruptive process methodologies we are discussing in this paper, we intend to change the approach towards testing and defect resolution, leading to lean and agile organizations. Although we have mainly highlighted the various topics from an integration perspective in a complex middleware platform, we believe the process can be molded and adapted to make it relevant to any team or process.

2. What is middleware?

Middleware is software which lies between an operating system and the applications running on it. Essentially functioning as a hidden translation layer, middleware enables communication and data management for distributed applications. Middleware masks the heterogeneity of computer architectures, operating systems, programming languages, and networking technologies to facilitate application programming and management. Middleware is what allows a user to submit a form in a web browser, or a web server to return dynamic web pages based on a user’s profile.

Middleware also provides messaging services so that different applications can communicate through web services built on protocols and formats such as SOAP, REST and JSON. It is the backbone of modern-day digital technology, enabling disparate applications to interact with one another and create a desired outcome which is not possible to achieve in standalone systems. It takes care of interoperability, simplifies the complexity of distributed systems, provides application programming interfaces to build on, and gives us the leverage to scale as much as we want, when we want.

3. Challenges with Customer Defects in Middleware

In a Middleware customer environment, the platform can be customized according to business requirements. Our products and platforms are developed so that customers can integrate two or more products in numerous ways to build complex solutions. Though the products are being tested at both functional and integration levels, delivering a product suite without defects is nearly impossible. On top of the customary challenges in Middleware testing, there are additional challenges related to defect fixes.

Let us take a couple of instances here:

1) Defect A in Product A is fixed and delivered to Customer A. Defect B in Product B is fixed and delivered to Customer B. However, due to a dependency with Product C, Defect A resurfaces in Customer A's environment, leading to a bad situation with Customer A.

2) Defect A in Product A is fixed and delivered to Customer A. Meanwhile, Customers B and C, using the same functionality in a different version, report the same defect, which leads to multiple customer escalations.

It is a daunting task to sort out the dependencies between products and sync the defect fixes in such a way that changes in one product do not affect some other product at an integration level. However, with ever-increasing customer expectations and competitive environments, we have to look at innovative and efficient methods to minimize defects in our software. Test Adequacy is the most important piece of the release process towards that goal. For many years now, organizations have been implementing testing methodologies and processes to improve Test Adequacy and keep the defect count as low as possible. However, along with the traditional methods, we need to adopt some disruptive processes, with the help of current technologies, which will further our ambition towards defect-free software. We will discuss a few of those ideas and implementations in the coming sections.

4. Fix Verification Process

The idea for a Fix Verification process originated from the necessity to make sure all the dependencies between products are resolved and defect fixes coming in from different products are in sync and tested at an integration level.

To begin with, the product teams register on an internal web application to participate in the Fix Verification cycle. This cycle is repeated every two weeks for each release, and product teams can opt to participate in the current cycle or the next one. The process consists of two phases. In Phase 1, the defect fixes coming in from different products are checked in to the internal product repositories, and the product teams execute their regression tests on the fixes. In Phase 2, the product teams move the verified fixes to an Integration repository before the next Integration test cycle starts. Once all the registered product fixes are in the Integration repository, the Integration Regression Suite is executed to validate integration scenarios for the products. If test cases related to a product fail, that product team is notified. All the validated fixes are moved to GA repositories and made available to customers.

Below is a diagrammatic representation of the Fix Verification process:

The Fix Verification process conceived in the Integration team has been hugely instrumental in bringing all the defect fixes under one umbrella and has enabled us to uncover any Integration issues which may have been introduced while fixing the defects.
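The two-phase cycle described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Fix` and `Cycle` classes, their fields, and the product names are hypothetical and not taken from the internal web application. In particular, the choice to hold back from GA every fix belonging to a product with failing integration tests is an assumption; the text only says that the product team is notified.

```python
from dataclasses import dataclass, field


@dataclass
class Fix:
    product: str
    defect_id: str
    regression_passed: bool = False  # result of the product team's Phase 1 regression run


@dataclass
class Cycle:
    registered: set = field(default_factory=set)        # products opted in to this cycle
    integration_repo: list = field(default_factory=list)
    ga_repo: list = field(default_factory=list)

    def phase1(self, fixes):
        """Keep fixes from registered products that passed product-level regression."""
        return [f for f in fixes
                if f.product in self.registered and f.regression_passed]

    def phase2(self, verified, integration_failures):
        """Stage verified fixes for integration testing and promote the validated ones.

        integration_failures is the set of products with failing integration
        test cases. Returns the set of product teams to notify.
        """
        self.integration_repo = list(verified)
        notify = {f.product for f in verified if f.product in integration_failures}
        # Assumption: fixes of a failing product are held back from GA.
        self.ga_repo = [f for f in verified if f.product not in notify]
        return notify
```

A team that skips a cycle is simply absent from `registered`, so its fixes wait for the next two-week cycle.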

5. Continuous Defect Triaging Process

Even with all these improvements in the Fix Verification process, quite a few defects have been leaking into the customer environment. One of the biggest challenges for Integration teams on middleware platforms has always been understanding how customers use our platforms to customize their solutions. We started thinking about how we could leverage the knowledge we are exposed to while working on customer defects to enhance our Test Adequacy. This thought process led to what we now call the Continuous Defect Triaging Process.

Let us elaborate the process further by taking a step-by-step approach:

Automated Triaging of Customer Defects

Customer issues are logged in our defect tracking system. A Java-based framework runs in the background, constantly looking for customer issues being logged and fixed in the defect tracking system. The main challenge here is to identify which of these defects are related to Integration. This is achieved with the help of two artefacts: 1) the Fix ReadMe document and 2) the Integration matrix.

The Readme document consists of details about multiple defects and the fixes, written in a specific format. First, a ReadMe Validator is run to check if the ReadMe document is in the correct format. Next, using Java APIs, we extract details of the defects from the ReadMe document. Once the defect IDs are available, JIRA APIs and JQL are used to extract more information about the defects. By looking for keywords in the defect summary and description, the framework can determine if the defect involves more than one product, with a fair degree of accuracy.
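The validate, extract, and keyword-scan steps above might look roughly like the sketch below. The authors' framework is Java based and its ReadMe format is not given in the paper, so everything here is an assumption made for illustration: the `<DEFECT-ID>: <summary>` entry format, the product names, and the keyword lists are all hypothetical. The sketch only builds a JQL string (using the standard `issuekey in (...)` clause) and does not call JIRA.

```python
import re

# Hypothetical ReadMe entry format: "<DEFECT-ID>: <summary>", one entry per line.
README_ENTRY = re.compile(r"^([A-Z][A-Z0-9]+-\d+):\s*(.+)$")

# Hypothetical product keywords; the real suite has 80+ products.
PRODUCT_KEYWORDS = {
    "Broker": ["broker"],
    "Gateway": ["gateway"],
    "Portal": ["portal"],
}


def validate_readme(text):
    """Return the non-blank lines that do not follow the expected entry format."""
    return [ln for ln in text.splitlines()
            if ln.strip() and not README_ENTRY.match(ln.strip())]


def extract_defects(text):
    """Map defect ID to summary for every well-formed entry."""
    defects = {}
    for ln in text.splitlines():
        m = README_ENTRY.match(ln.strip())
        if m:
            defects[m.group(1)] = m.group(2)
    return defects


def build_jql(defect_ids):
    """Build a JQL query string selecting the given issues by key."""
    return "issuekey in ({})".format(", ".join(sorted(defect_ids)))


def products_mentioned(text):
    """Return the products whose keywords appear in the defect text."""
    lowered = text.lower()
    return {product for product, kws in PRODUCT_KEYWORDS.items()
            if any(kw in lowered for kw in kws)}
```

A defect whose summary and description mention two or more products becomes a candidate integration defect; a plain substring scan like this can over-match, which is one reason the result is only accurate to a fair degree.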

However, it is always possible that the products mentioned in the defects do not have any common integration points at all and hence the defect is not an integration scenario. To determine if the products can practically be integrated in any logical way, we use the Integration Matrix.

The Integration matrix is an exhaustive 2D matrix which just tells us if any two products in the SAG Suite can be integrated in a solution. All the 80+ products are lined along both the axes and intersection co-ordinates which can be integrated are marked.

Our framework extracts the products involved from the ReadMe document and compares them with the Integration matrix to determine if the defect is actually an integration issue.
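As a minimal sketch of this lookup, the symmetric 2D matrix can be stored sparsely as a set of unordered product pairs. The product names and marked pairs below are hypothetical placeholders, not entries from the real matrix, and the function names are illustrative.

```python
from itertools import combinations

# Each marked intersection of the matrix is stored once as an unordered pair.
# These pairs are made up for illustration.
INTEGRATION_MATRIX = {
    frozenset({"Broker", "Gateway"}),
    frozenset({"Gateway", "Portal"}),
}


def can_integrate(product_a, product_b):
    """True if the two products have a marked intersection in the matrix."""
    return frozenset({product_a, product_b}) in INTEGRATION_MATRIX


def is_integration_defect(products):
    """True if at least two of the mentioned products can be integrated."""
    return any(can_integrate(a, b)
               for a, b in combinations(sorted(products), 2))
```

Storing unordered pairs makes the lookup independent of argument order, which mirrors the fact that the same products are lined up along both axes.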

Intelligent Review of Triaged Defects

The next step is to determine whether there are existing test scenarios already written around the defect. This is achieved by an intelligent tool which extracts keywords from the defect summary and description and compares them with the existing test cases using Java partial match search algorithms. The output of this step is a set of existing test cases relevant to the defect in question.

Once it is determined that there are gaps in the test cases, existing test cases are re-factored or new test cases added to the Suite.
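The paper does not specify which partial match algorithm the tool uses, so the sketch below substitutes a simple stand-in: a keyword counts as matched if it occurs as a substring of the test case description. The stopword list, the `min_hits` threshold, and the test case IDs are all assumptions made for illustration.

```python
import re

# Small illustrative stopword list; a real tool would use a fuller one.
STOPWORDS = {"the", "and", "are", "for", "when", "after", "with", "from"}


def keywords(text):
    """Lower-cased words of three or more characters, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if len(w) > 2 and w not in STOPWORDS}


def relevant_test_cases(defect_text, test_cases, min_hits=2):
    """Rank test cases by how many defect keywords partially match them.

    test_cases maps a test case ID to its description. Returns a list of
    (test_id, hits) pairs, best match first.
    """
    kws = keywords(defect_text)
    scored = []
    for test_id, description in test_cases.items():
        lowered = description.lower()
        hits = sum(1 for kw in kws if kw in lowered)  # substring = partial match
        if hits >= min_hits:
            scored.append((test_id, hits))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

An empty result suggests a gap in the suite for that defect; a non-empty result still needs a human review to decide whether the matched test cases really cover the scenario.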

Adding test cases to the automated test suite

This is a manual part of the process. A QA Engineer designs new test cases or refactors existing test cases for the defect scenario and all the functionality being affected by the scenario. These test cases are then quickly automated and added to the existing regression test scenarios to be part of our continuous integration system.

Propagation to other releases

Customers use different releases or versions of our products or platforms. So when one customer finds a defect in one of the releases, the fix is applied to all the releases being used by different customers. In the same way, the automated test cases written for the release where the defect was actually found have to be propagated to all the older and newer releases. This is again done by our Jenkins framework. This step is very important as there is a high probability that another customer using the same functionality will face the same issue.

6. Architecture of Continuous Triaging Implementation

Below is the architecture for the Continuous Triaging Process, highlighting the tools and technologies used at each stage of the process:

7. Intelligent Root Cause Analysis

Yet another area which needs a lot of manual intervention and can benefit vastly from the latest technologies is the root cause analysis of a defect or issue. Intelligent Root Cause Analysis (RCA) can be used to quickly pinpoint the root cause so that the developer or tester can focus entirely on fixing the issue and verifying it. This can be achieved through Machine Learning methodologies. The RCA system can also embed learning mechanisms that feed themselves continuously and predict solutions automatically.

The input for the analysis can come from any source, for instance log files, which are essentially text data. The first step involves pre-processing the data to extract exceptions from the log files. The second step involves framing the observations for each exception and creating Feature Vectors, which represent the exception and facilitate further processing.
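The first two steps could be sketched as below, assuming Java-style logs in which an exception appears as a fully qualified class name optionally followed by a colon and a message. The regular expression, the bag-of-words representation, and the vocabulary are illustrative assumptions; the paper does not prescribe a particular log format or feature encoding.

```python
import re
from collections import Counter

# Assumed Java-style pattern: "pkg.sub.SomeException: optional message".
# Other log formats would need a different pattern.
EXCEPTION_LINE = re.compile(
    r"\b((?:[a-zA-Z_]\w*\.)+(?:[A-Z]\w*)?(?:Exception|Error))\b(?::\s*(.*))?"
)


def extract_exceptions(log_text):
    """Step 1: return (exception_class, message) pairs found in the log."""
    found = []
    for line in log_text.splitlines():
        m = EXCEPTION_LINE.search(line)
        if m:
            found.append((m.group(1), m.group(2) or ""))
    return found


def feature_vector(exception_class, message, vocabulary):
    """Step 2: bag-of-words counts over a fixed, ordered vocabulary."""
    simple_name = exception_class.rsplit(".", 1)[-1]
    tokens = re.findall(r"[a-z]+", (simple_name + " " + message).lower())
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]
```

Each exception becomes a fixed-length list of counts, which is the kind of input a classifier or a similarity search in the third step can consume.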

In the third step, pinpointing the root cause, the system can take one of two approaches. It can refer to a database which tracks the products and functionalities corresponding to a root cause and list the highest-probability solutions. Alternatively, it can dig through existing similar defects and list the highest-probability root causes and solutions for the given defect or exception. From a Machine Learning perspective, this is a multi-label classification problem. Methodologies such as Random Forests or Neural Networks are well suited to this scenario. Random Forests learn by constructing many decision trees and predict the mode (for classification) or the mean (for regression) of the individual trees' predictions, whereas Neural Networks learn the weights of layered, interconnected units from labeled examples.
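As a rough illustration of the "dig through existing similar defects" approach, the sketch below ranks root-cause labels by cosine similarity between a new exception text and previously resolved defects. It is deliberately not a Random Forest or Neural Network; it is a much simpler nearest-neighbour stand-in using only the standard library. The defect texts and label names are hypothetical, and each known defect may carry several labels, which reflects the multi-label nature of the problem.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term counts for a piece of text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_root_causes(exception_text, known_defects, top_n=3):
    """Rank root-cause labels by their most similar known defect.

    known_defects is a list of (defect_text, [root_cause_labels]) pairs.
    Returns up to top_n (label, score) pairs with a non-zero score.
    """
    query = vectorize(exception_text)
    scores = defaultdict(float)
    for text, labels in known_defects:
        similarity = cosine(query, vectorize(text))
        for label in labels:
            scores[label] = max(scores[label], similarity)
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [(label, round(score, 3)) for label, score in ranked[:top_n] if score > 0]
```

Ambiguous log messages show up here as several labels with near-identical scores, which is exactly the situation where pinpointing a single root cause becomes difficult.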

However, there are challenges. Sometimes the errors in the log files are quite ambiguous, which might point to a large number of root causes and pin-pointing one becomes extremely difficult. Another challenge is predicting a correct root cause for an integration scenario, which is sometimes difficult for humans too. And these are just scratching the surface. Intelligent RCA with Machine Learning is a challenging topic and is still in its nascent stage. However, extensive research is being done and it will not be long before we take huge leaps in achieving an accurate error analysis system with no or minimal human intervention.

8. Improvements in Test Adequacy

The Customer Triaging process which we have been using has been instrumental in improving our overall Test Adequacy. Based on the knowledge gained during the triaging of different customer issues, we have come up with an exhaustive checklist of types of Test cases which have to be part of Test Design phase. The QA Engineer designing the test cases uses this checklist as reference to ensure test coverage during the test design phase. The checklist also acts as a guideline for the test case reviewer. We are continuously adding categories to this checklist through our learnings from each release.

9. Future Enhancements

We are working on a few enhancements to further our goal of zero-defect software.

1) As discussed in one of the previous sections, intelligent Root Cause Analysis is one of the main areas we have been working on. In a complex integrated setup, pinpointing the root cause of a problem takes up a major chunk of time because of the number of components involved. If we can build a system or framework which analyzes the various probable root causes, predicts the exact one and simultaneously learns to fix problems, the cost benefits for an organization can be enormous.

2) Another important aspect we are working on is automated generation of test cases for the defects which are being fixed. The idea is to use NLP to read the defect details and generate test cases based on the details. This should help reduce quantitative effort during the test design phase.

10. Conclusion

While reaching the goal of defect-free software seems like a far-fetched idea, there are a lot of cutting-edge technologies available now which are bringing us towards that goal faster than before. If we put our minds together, come up with innovative ways of improving the process using these technologies, and develop a continuous process of learning and applying it, the vision which we are trying to achieve will not seem that far away at all.


Author Biography

Pavan Kulkarni is a Lead QA Engineer in the Suite Integration team at Software AG. He has 11 years of experience and has worked in multiple domains, including Telecom, Networking and now Middleware. Having been exposed to different testing methodologies in his career, he is passionate about optimizing and improving Automation and QA processes in the organization.

Sanjan Bora is a Lead QA Engineer at Software AG with more than 13 years of experience. He worked in the financial domain early in his career and is currently in middleware testing, where he has contributed widely in areas such as automation, platform testing, SOA testing and fix release testing, to name a few.

Nanthini Sukumaran is a Senior QA Specialist at Software AG, with 9 years of experience in multiple domains. Being an automation specialist, she has worked on organization-level unified frameworks for mobile as well as web automation. She was also instrumental in coming up with new utilities and process changes resulting in more efficient QA processes. Her current interests include Machine Learning and Data Science.