Linköping University | Department of Computer and Information Science
Master's thesis, 30 ECTS | Information Technology
2020 | LIU-IDA/LITH-EX-A-20/024-SE
Net benefits analysis of Visual Regression Testing in a Continuous Integration environment: An industrial case study
Axel Löjdquist
Examiner: Lena Buffoni
Supervisor: John Tinnerholm
Company supervisor: David Alcobero
Linköpings universitet, SE-581 83 Linköping, +46 13 28 10 00
www.liu.se
Upphovsrätt (Swedish copyright notice, translated)
This document is held available on the Internet - or its possible replacement - for a period of 25 years from the date of publication, provided that no exceptional circumstances arise.
Access to the document implies permission for anyone to read, download, and print single copies for personal use, and to use it unchanged for non-commercial research and for teaching. Subsequent transfers of copyright cannot revoke this permission. All other use of the document requires the author's consent. To guarantee authenticity, security, and accessibility, solutions of a technical and administrative nature are in place.
The author's moral rights include the right to be named as the author, to the extent required by good practice, when the document is used as described above, as well as protection against the document being altered or presented in a form or context that is offensive to the author's literary or artistic reputation or character.
For additional information about Linköping University Electronic Press, see the publisher's home page: http://www.ep.liu.se/.
Copyright
The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.
The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.
According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.
For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.
© Axel Löjdquist
Dissertation for Master's Degree
(Master of Engineering)
Net benefit analysis of Visual Regression Testing in a
Continuous integration environment: An industrial
Case study
Axel Löjdquist
September 2020
Linköping University
Classified Index: TP311    School Code: 10213
U.D.C: 681    Security Classification: Public
Dissertation for the Master's Degree in Engineering
(Master of Engineering)
Net benefit analysis of Visual Regression Testing in a
Continuous integration environment: An industrial
Case study
Candidate: Axel Löjdquist
Supervisor: Tianyi Zang (HIT)
Associate Supervisor: Lena Buffoni (LiU)
Industrial Supervisor: David Alcobero, CTO
Academic Degree Applied for: Master of Engineering
Speciality: Software Engineering
Affiliation: School of Software
Date of Defence: June 2020
Degree-Conferring-Institution: Harbin Institute of Technology
Classified Index: TP311
U.D.C: 681
Dissertation for the Master’s Degree in Engineering
Net benefit analysis of Visual Regression Testing in a
Continuous integration environment: An industrial
Case study
Candidate: Axel Löjdquist
Supervisor: Lena Buffoni
Associate Supervisor: John Tinnerholm
Industrial Supervisor: David Alcobero, CTO
Academic Degree Applied for: Master of Engineering
Speciality: Software Engineering
Affiliation: School of Software
Date of Defence: September, 2020
Degree-Conferring-Institution: Harbin Institute of Technology
Thesis for Master’s Degree at HIT and LiU
Abstract (translated from the Chinese)
Maintaining software quality is a difficult task for many reasons, such as company growth, time-to-market demands, and code complexity. GUI testing tools and Continuous Integration (CI) are common practices today used to address some of the problems of maintaining software quality. However, these techniques bring a set of challenges. Visual Regression Testing (VRT) is a special kind of GUI testing technique that focuses on image-based assertions. This study presents the implementation and an investigation of the benefits and drawbacks of introducing VRT into a CI environment in an industrial setting. In addition, the thesis examines the factors that need to be considered during this transition. The results show that the approach brings several benefits, such as faster feedback times and increased testing frequency. However, drawbacks and implications were also found, such as test maintenance and organizational issues, indicating that a company needs to deliberate carefully before implementation.
Keywords: Visual Regression Testing, Continuous Integration, Test automation, GUI testing
Abstract
Maintaining quality in software is a difficult task for several reasons, such as company growth, time-to-market demands, and code complexity. GUI testing tools and Continuous Integration (CI) are common practice today for tackling some of the issues of maintaining software quality. However, these techniques bring a set of challenges. Visual Regression Testing (VRT) is a special kind of GUI testing technique focused on image-based assertions. This study presents an implementation and an investigation of the benefits and drawbacks of introducing VRT into a CI environment in an industrial context. Additionally, the thesis investigates the factors that need to be considered in this transition. The results show that benefits are associated with this approach, such as quicker feedback times and an increase in testing frequency. However, drawbacks and implications were also identified, such as test maintenance and organizational concerns, indicating that an organization needs to consider carefully before proceeding with an implementation.
Keywords: Visual Regression Testing; Continuous Integration; Test automation; GUI testing
Glossary
SaaS Software as a Service
ATE Automatic Test Execution
CI Continuous Integration
VGT Visual GUI Testing
e2e End-to-end
R&R Record and Replay
VCS Version Control Systems
CD Continuous Deployment
DRS Design Research Study
VRT Visual Regression Testing
DI Dependency Injection
IoC Inversion of Control
PaaS Platform as a Service
Table of Contents
Abstract (Chinese)
Abstract
Glossary
Table of Figures
Table of Tables
Chapter 1 Introduction
  1.1 Motivation & Background
  1.2 Aim and Approach
  1.3 Research Questions
  1.4 Delimitations
  1.5 Main Content and Organization of the Thesis
Chapter 2 Theory
  2.1 Test Automation
    2.1.1 The 1st Generation
    2.1.2 The 2nd Generation
    2.1.3 The 3rd Generation
    2.1.4 Visual Regression Testing
  2.2 Continuous Integration and Visual GUI Testing: Benefits and Drawbacks in Industrial Practice
  2.3 Transitioning Manual System Test Suites to Automated Testing: An Industrial Case Study
  2.4 Automated System Testing Using Visual GUI Testing Tools: A Comparative Study in Industry
  2.5 Too Much Automation
  2.6 Trade-offs Between Automated and Manual Testing
  2.7 When and What to Automate in Software Testing?
    2.7.1 SUT-related factors
    2.7.2 Test-related factors
    2.7.3 Test tool-related factors
    2.7.4 Human and organizational factors
    2.7.5 Cross-cutting and other factors
  2.8 Data Collection
Chapter 3 Method
  3.1 Design Research Study
    3.1.1 Cycle 1
    3.1.2 Cycle 2
    3.1.3 Cycle 3
  3.2 Qualitative Data Collection
    3.2.1 Semi-structured interviews
    3.2.2 Unstructured observations
    3.2.3 Informal literature review
  3.3 Collected Metrics
  3.4 Triangulation
Chapter 4 System Requirement Analysis
  4.1 The Problem Situation
  4.2 The Goal of the System
  4.3 Selected & Relevant Technologies
    4.3.1 NestJS
    4.3.2 CircleCI
    4.3.3 JWT
    4.3.4 Cypress
  4.4 The Functional Requirements
    4.4.1 Scheduler
    4.4.2 CI platform
  4.5 The Non-functional Requirements
  4.6 Brief Summary
Chapter 5 System Design
  5.1 Processes & Architecture Prior to the Implementation
    5.1.1 Freshdesk
    5.1.2 Github
    5.1.3 Jira
    5.1.4 Vizlib Notifier (Slack bot)
    5.1.5 Extension Builder
  5.2 Architecture for the New Solution
    5.2.1 CI platform - CircleCI
    5.2.2 Test Server
    5.2.3 Scheduler
    5.2.4 Github
  5.3 Key Techniques
    5.3.1 Dependency injection
    5.3.2 Designing for testability
  5.4 Brief Summary
Chapter 6 System Implementation
  6.1 The Environment of System Implementation
    6.1.1 Scheduler
    6.1.2 CircleCI
    6.1.3 Test server
  6.2 Key Program Flow Charts
    6.2.1 CircleCI workflows
    6.2.2 Implemented workflow on CircleCI
  6.3 Key Interfaces of the Software System
    6.3.1 The Scheduler
    6.3.2 Github
    6.3.3 CircleCI
  6.4 Brief Summary
Chapter 7 Results
  7.1 Interview Sessions
    7.1.1 Test Maintenance
    7.1.2 Implementation Effort
    7.1.3 Testing Tool
    7.1.4 Feedback Time
    7.1.5 Test Reliability
    7.1.6 Test Automation
    7.1.7 Development Process
    7.1.8 Organization
  7.2 Unstructured Observations
    7.2.1 Testing tools
    7.2.2 Tested system
    7.2.3 Continuous Integration platform
    7.2.4 Support software
Chapter 8 Discussion
  8.1 Results
    8.1.1 Test maintenance
    8.1.2 Implementation effort
    8.1.3 Testing tool
    8.1.4 Feedback time
    8.1.5 Test reliability
    8.1.6 Test automation
    8.1.7 Development process
    8.1.8 Organization
    8.1.9 Unstructured Observations
  8.2 Method
    8.2.1 Construct validity
    8.2.2 Internal validity & Reliability
    8.2.3 External validity
  8.3 Work in a Wider Context
Chapter 9 Conclusion
  9.1 Future Work
References
Appendix A: Interview Questions Cycle 1
Appendix B: Interview Questions Cycle 2
Appendix C: Interview Questions Cycle 3
Statement of Originality and Letter of Authorization
Acknowledgement
Table of Figures
Figure 1.1 Overview of DRS methodology. Adapted from E. Alégroth et al. [7].
Figure 3.1 Overview of the hierarchical tree diagram. Adapted from E. Alégroth et al. [3].
Figure 4.1 Use case diagram for the scheduler.
Figure 4.2 Use case diagram for the CI platform.
Figure 5.1 Vizlib's prior system architecture.
Figure 5.2 Example of a Slack bot message for a release. Names and personal images have been censored.
Figure 5.3 Vizlib's system architecture after the implementation.
Figure 6.1 Overview of the implemented workflow on CircleCI.
Figure 6.2 Sequence diagram over how the different systems communicate upon a triggered event from Github.
Figure 6.3 Status of triggered workflow viewed from Github.
Figure 6.4 Example of CircleCI job 2, viewed in CircleCI.
Figure 6.5 An overview of the CircleCI workflow viewed from the platform webpage.
Figure 6.6 An overview of the testing artifacts stored in CircleCI.
Figure 6.7 An overview of the failed tests in CircleCI, showing the tests that have failed during an execution.
Figure 7.1 Documented observations throughout all cycles.
Table of Tables
Table 1 Codes used for the coding process.
Table 2 Functional requirements for the scheduler.
Table 3 Functional requirements for the CI platform.
Table 4 Non-functional requirements for the scheduler.
Table 5 Non-functional requirements for the CI platform.
Table 6 Best practices for testability, summarized, as presented by Alwardt et al. [35].
Table 7 Run-time environment for the scheduler.
Table 8 Run-time environment for the jobs on the CI platform.
Table 9 Run-time environment for the test server.
Table 10 Endpoints for the scheduler.
Table 11 Total occurrences of codes from the coding process.
Table 12 Code appearances from each interview linked to the corresponding cycle.
Table 13 Background and collected metrics from Cycle 1 interviews.
Table 14 Background and collected metrics from Cycle 2 interviews.
Table 15 Background and collected metrics from Cycle 3 interviews.
Table 16 Type of observations made linked to each cycle.
Chapter 1 Introduction
This chapter presents the introduction to the thesis. Section 1.1 describes the motivation and background. Section 1.2 describes the aim of this thesis, with an overview of the method used to achieve it. Section 1.3 presents the research questions. Section 1.4 explains the scope and delimitations of the research and the implementation. Section 1.5 describes the main content and organization of the thesis.
1.1 Motivation & Background
As a Software as a Service (SaaS) company grows, its product base can increase, as can the functionality of its products. This can increase a product's complexity and make it more bug-prone. Employing new developers with different experiences and backgrounds also forces the company to adopt more mature processes. Together, these developments pose a challenge for maintaining product quality. To help maintain it, Automated Test Execution (ATE) can be a powerful tool for automating certain processes within the Continuous Integration (CI) environment and, by extension, the software development life cycle [1].
However, introducing automation brings a set of challenges. A test that is very simple to check manually can become very complex from an ATE perspective, e.g. for applications with nondeterministic behavior or for modifications that make minor appearance changes to a GUI [2]. Computers can easily determine whether two results differ, but are less effective at judging the significance of the difference, such as telling that difference A is acceptable while difference B is not. Consequently, ensuring test validity and minimizing false positives becomes a complication in GUI testing.
With many of the testing techniques used today, it can be hard to capture the end-user perspective; unit testing, for example, is a common practice for which automated scripts are created [3]. Visual Regression Testing (VRT) is a technique that tackles this problem by exercising the application as an end-user and capturing images of certain areas of the application in specific states. It then uses image comparison algorithms to check whether the screenshot taken of the System Under Test (SUT) matches a baseline image. The baseline defines how the application should look in a specific state. If the image differs beyond a given threshold, the test fails. Determining the feasibility, applicability, benefits, and drawbacks of introducing VRT to a CI environment in industry is what this thesis aims to address.
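As an illustration only (the concrete tool and comparison algorithm are introduced later in the thesis), the core VRT assertion can be sketched as a pixel-level diff against a threshold. Here both "images" are simplified to flat arrays of grayscale pixel values; a real tool compares rendered screenshots:

```javascript
// Fraction of pixels that differ between the baseline and the new screenshot.
function diffRatio(baseline, screenshot) {
  // A size change is treated as a full mismatch.
  if (baseline.length !== screenshot.length) return 1;
  let differing = 0;
  for (let i = 0; i < baseline.length; i++) {
    if (baseline[i] !== screenshot[i]) differing++;
  }
  return differing / baseline.length;
}

// The test passes only if the differing fraction stays within the threshold.
function vrtPasses(baseline, screenshot, threshold) {
  return diffRatio(baseline, screenshot) <= threshold;
}

// 1 of 8 pixels differs (12.5%): a 10% threshold fails, a 20% threshold passes.
const baselineImage   = [0, 0, 0, 0, 255, 255, 255, 255];
const screenshotImage = [0, 0, 0, 0, 255, 255, 255, 0];
console.log(vrtPasses(baselineImage, screenshotImage, 0.10)); // false
console.log(vrtPasses(baselineImage, screenshotImage, 0.20)); // true
```

The threshold is exactly the tuning knob the delimitations in Section 1.4 exclude from evaluation: too low and minor rendering noise fails the suite, too high and real regressions slip through.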
This thesis was conducted at Vizlib, a small to mid-size company with 50+ employees. Vizlib produces software for a business intelligence platform called Qlik, which performs data integration, user-driven business intelligence, and conversational analytics [4]. Qlik offers extensions to its platform, enabling third-party vendors to create custom visualizations used for data analytics. These visualizations are the kind of software that Vizlib produces and are referred to as extensions. Vizlib currently develops and maintains 28 extensions, and its product base is increasing. Testing at Vizlib today is performed only manually, in three different scenarios. First, whenever a new feature is completed or a bug is fixed, a tester performs exploratory testing to confirm that the change works properly. Second, one or two developers code review the changes when a pull request is made. Lastly, prior to a new release, the testers perform a smoke test to ensure that everything appears to be working.
Introducing automated testing at Vizlib can be a helpful step in maintaining the quality of its growing product base. The smoke tests performed at Vizlib today are a natural starting point for ATE. In these smoke tests, a tester gives input or performs a sequence of steps and observes the output based on a predefined protocol of tests, which amounts to End-to-End (e2e) testing [5]. However, this process becomes very labor-intensive as the release frequency and the product base increase.
1.2 Aim and Approach
The aim of this thesis is to investigate and evaluate some of the benefits and drawbacks of introducing Visual Regression Testing (VRT) into a CI environment in an industrial context. The chosen method is an empirical Design Research Study (DRS) [6] divided into three cycles. The first cycle involves configuring the VRT testing tool and ensuring the technical approach is feasible. Once the test suite is in place, the study enters the second cycle, which involves integrating the test execution into a CI environment. After the CI implementation is completed, the study enters the final phase: during cycle three, the study results are evaluated. The evaluation includes qualitative data collected throughout all three cycles. Figure 1.1 shows an overview of the method. The qualitative data is collected from semi-structured interviews, observations, and an informal literature review. The observations are used in each cycle to capture the progression of the DRS. Throughout the study, the design is reviewed and adjusted based on gained experience, and the study can fall back to a past cycle if needed, e.g. when the CI environment requires the testing tool to be further configured. The author of this thesis performed nine interviews: three for each of Cycle 1, Cycle 2, and Cycle 3. The data collection methods aim to capture the areas affected by the implementation in order to identify the benefits and drawbacks of VRT. They additionally capture experiences in order to determine which factors should be considered when introducing VRT to a CI environment.
Figure 1.1 Overview of DRS methodology. Adapted from E. Alégroth et al. [7].
1.3 Research Questions
a) What practical benefits and drawbacks are associated with introducing
visual regression testing to an industrial CI environment?
b) Which factors should be considered when implementing VRT in a CI
environment?
1.4 Delimitations
The scope of the thesis is limited to evaluating the Vizlib extensions (the developed software) that have tests written for them. The thesis does not evaluate the tools selected for the CI implementation, the testing framework used for executing the VRT tests, or the different CI platforms offered on the market. All technologies used for the implementation were selected by Vizlib; they are listed and described in Section 4.3. Additionally, the thesis does not investigate appropriate threshold values for the image comparison algorithms. Lastly, this thesis used only qualitative data sources; no quantitative data sources were used.
1.5 Main content and organization of the thesis
The thesis is structured as follows. Chapter 2 presents the theory and related work within test automation and visual testing. Chapter 3 describes the methodology used for the DRS and the qualitative data collection methods used in this thesis, and explains how these methods aim to answer the stated research questions. Chapters 4 and 5 present the system requirements and the system design for the implementation, and define the problem situation the implementation aims to address. These chapters include architectural overviews, use-case diagrams, lists of requirements, and the key techniques that have been used. Chapter 6 presents the implementation and testing of the system, including flow charts describing the system and how it was tested. The following chapters present the results and the discussion of those results. Lastly, the thesis conclusions are presented.
Chapter 2 Theory
This chapter presents the theory for this thesis. Section 2.1 describes the
background of test automation and GUI testing. Sections 2.2-2.7 present related work
in the investigated area. Section 2.8 provides the theory for data collection methods.
2.1 Test Automation
As the demand for quick time-to-market releases increases, many companies are limited or compelled to focus on a regression testing strategy for maintaining high software quality. A regression testing strategy should be very flexible, so that it can be utilized after e.g. a system modification or prior to a product release [7]. ATE addresses this by speeding up the process and providing feedback on modifications more quickly [8]. It achieves this by decreasing the test execution time compared to manual testing practices. Additionally, since a computer performs the regression tests, the human effort can be spent elsewhere. Furthermore, ATE enables test execution during non-working hours. However, this requires the automated tests to operate at a higher level than e.g. unit testing or other lower-level testing techniques. It is uncertain whether lower-level techniques can be reliable enough to capture the entire system or the end-user's point of view for high-level systems [9], [10]. Acceptance testing is an intensive, complex, expensive, and labor-intensive process for GUI applications [11]–[13]. However, there exist testing techniques that operate on the system level, such as GUI testing [14]. GUI testing has evolved through three generations, described below.
2.1.1 The 1st Generation
The first generation of GUI testing interacts with the SUT by using screen
coordinates. Record and Replay (R&R) is an often-used, though not required,
technique to create test scripts for GUI tests [15]–[17]. It involves a two-step
process. First, it records the inputs and interactions performed by a tester while using
the SUT, such as keystrokes, mouse movements, and clicks. When one of these
events is triggered, the recording captures e.g. the screen coordinates of the mouse.
After the recording is completed, these sequences of events are saved to an
executable test script. Since this technique uses screen coordinates, the tests become
fragile to visual updates because the tests are dependent on the exact placement of
GUI components [18]. This leads to high maintenance costs for applications that
experience visual changes.
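The coordinate fragility described above can be sketched as follows. This is an invented illustration, not code from the thesis or from any real R&R tool: the layout, coordinates, and helper names are all hypothetical.

```python
# The GUI is modelled as a map from component name to its bounding box
# (x, y, width, height).
original_layout = {"submit_button": (100, 200, 80, 30)}

# A 1st-generation recorded script stores raw coordinates, not components.
recorded_clicks = [(120, 210)]  # the tester clicked inside the submit button

def component_at(layout, x, y):
    """Return the name of the component covering screen point (x, y)."""
    for name, (cx, cy, w, h) in layout.items():
        if cx <= x <= cx + w and cy <= y <= cy + h:
            return name
    return None

def replay(layout, clicks):
    """Replay recorded clicks; a click that hits nothing fails the test."""
    return [component_at(layout, x, y) for x, y in clicks]

# The script passes against the original layout ...
assert replay(original_layout, recorded_clicks) == ["submit_button"]

# ... but a purely visual change (the button moves) breaks it, even though
# the application still works correctly.
moved_layout = {"submit_button": (300, 200, 80, 30)}
assert replay(moved_layout, recorded_clicks) == [None]
```

The final assertion is exactly the maintenance problem described above: every visual update invalidates the recorded coordinates.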
2.1.2 The 2nd Generation
The second generation of GUI testing improved the robustness of the first
generation. Instead of capturing screen coordinates of triggered events, it captures
attributes associated with the elements rendered on the GUI [19]. These attributes
vary depending on what type of application is being tested. For web applications, the
test runner finds GUI components by accessing the HTML DOM and identifies the
component by matching the attributes associated with an HTML tag, i.e. the id
attribute. It can be argued that this difference makes 2nd generation GUI testing
more robust than the 1st generation, since it is no longer sensitive to the placement of
visual components. However, this approach sets some requirements on the SUT.
First, in order to identify the sought-after element, it requires that the GUI
component has a unique attribute associated with it. Otherwise, this might lead to
some undefined behavior as it might interact with the wrong GUI component.
Secondly, even identifying GUI components with uniquely produced identifiers
might pose another challenge. The test runner needs to know, prior to its execution,
which element attribute it should search for. If the unique identifier is generated for
the SUT during execution, the test runner will not be able to find the sought-after
GUI component.
The 2nd generation of GUI testing takes advantage of element-based assertions.
This means the test runner can expect certain elements to appear in, e.g., the HTML
DOM. If an element does not appear when expected, the test runner fails the test.
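A minimal sketch of this element-based approach is given below, using Python's standard-library HTML parser purely for illustration. The page content, the id value, and the helper names are invented; real 2nd generation tools offer much richer locator mechanisms.

```python
from html.parser import HTMLParser

class IdFinder(HTMLParser):
    """Collect the id attributes of all elements in an HTML document."""
    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id":
                self.ids.add(value)

def assert_element_present(html, element_id):
    """Element-based assertion: fail if no element carries the expected id."""
    finder = IdFinder()
    finder.feed(html)
    if element_id not in finder.ids:
        raise AssertionError(f"element '{element_id}' not found")

page = '<form><button id="submit">Send</button></form>'
assert_element_present(page, "submit")     # passes
# assert_element_present(page, "cancel")   # would raise AssertionError
```

Note that moving the button elsewhere on the page would not break this lookup, which is the robustness gain over the 1st generation; conversely, if the id were generated at runtime, the lookup would fail, which is the requirement on the SUT discussed above.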
2.1.3 The 3rd Generation
The 3rd generation of GUI testing tries to tackle the issues with the 2nd
generation GUI testing by interacting with the SUT using image recognition [19].
This approach is often referred to as Visual GUI testing (VGT). The test runner
utilizes images of GUI components in order to determine which GUI component to
interact with. By using images rather than element attributes in order to identify GUI
components, the technique overcomes some of the obstacles with the previous
generation. However, this approach is dependent on how well the image recognition
algorithm performs. If the image comparison algorithm performs poorly, the test
suite is subject to false positives, i.e., tests failing for correctly rendered components.
The 3rd generation of GUI testing utilizes image-based assertions by comparing
images taken by the test runner with an image baseline that defines the expected
appearance.
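The image-based assertion can be illustrated with a deliberately simplified sketch. Real tools compare actual screenshots with more sophisticated algorithms; here, images are modelled as invented 2D lists of grayscale values, and the function names are hypothetical.

```python
def mismatch_ratio(baseline, screenshot):
    """Fraction of pixels that differ between the two images."""
    total = len(baseline) * len(baseline[0])
    diff = sum(
        1
        for row_b, row_s in zip(baseline, screenshot)
        for b, s in zip(row_b, row_s)
        if b != s
    )
    return diff / total

def assert_visually_equal(baseline, screenshot, tolerance=0.01):
    """Image-based assertion: allow at most `tolerance` differing pixels."""
    ratio = mismatch_ratio(baseline, screenshot)
    if ratio > tolerance:
        raise AssertionError(f"{ratio:.1%} of pixels differ from the baseline")

baseline = [[255, 255], [255, 0]]
identical = [[255, 255], [255, 0]]
changed = [[255, 255], [0, 0]]  # one of four pixels differs

assert_visually_equal(baseline, identical)       # passes
assert mismatch_ratio(baseline, changed) == 0.25  # would fail the assertion
```

A tolerance that is too strict makes the suite prone to the false positives discussed above, for instance from anti-aliasing or rendering differences between environments.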
2.1.4 Visual Regression Testing
VRT is not concerned with how the test runner interacts with the SUT but is
defined by its assertion technique. VRT is based upon image assertions, the 3rd
generation approach. There exist 2nd generation tools that support this assertion
technique, either as a plugin or by default. In a case study by E. Alégroth et al. [19], the
authors investigate why a large tech company, Spotify, transitioned to a 2nd
generation tool after long-term use of a 3rd generation tool. Spotify experienced
difficulties with handling dynamic/non-deterministic data and 3rd-party GUI
components. This led to high test script maintenance whenever a 3rd-party
provider updated the interface of their GUI component. As a result, the robustness of
the test scripts was reduced, as Spotify could not know when these updates occurred.
Another issue Spotify experienced was distinguishing different rows when, e.g.,
scrolling in their application, because the data in their lists was dynamically produced.
This could be resolved by testing against a configured test database, making search
results deterministic. However, by using this approach, they were not testing against
a production environment anymore, which is one of the benefits with GUI testing.
The benefit of transitioning to a 2nd generation tool was the increased flexibility in
targeting dynamically produced data, as the test scripts could target attributes
associated with elements in the GUI. However, this set some requirements on how the
application was developed, to ensure that these targeting attributes existed on
specific elements. E. Alégroth et al. [19] concluded that the tool Spotify used was too
limited to fully support their requirements, but argue that VGT is still a valid testing
approach, provided that the best-practice guidelines regarding the adoption and use of
VGT in practice listed in their paper are followed.
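The dynamic-data problem described above can be sketched as follows. This is a hypothetical illustration: the data sources and helper names are invented, and non-determinism is simulated with a run counter rather than a real production database.

```python
def render_rows(rows):
    """Stand-in for a screenshot: the rendered text of a list component."""
    return "\n".join(rows)

def fetch_rows_from_production(run):
    """Production data is dynamic; here the row order depends on the run."""
    rows = ["Song A", "Song B", "Song C"]
    return rows if run % 2 == 0 else list(reversed(rows))

def fetch_rows_from_test_database():
    """A configured test database returns the same rows in the same order."""
    return ["Song A", "Song B", "Song C"]

baseline = render_rows(["Song A", "Song B", "Song C"])

# Against the configured test database, the image assertion is stable ...
assert render_rows(fetch_rows_from_test_database()) == baseline

# ... while against production it passes on some runs and fails on others,
# even though the application works correctly (a false positive).
assert render_rows(fetch_rows_from_production(0)) == baseline
assert render_rows(fetch_rows_from_production(1)) != baseline
```

Seeding a test database makes the assertion deterministic, but, as noted above, it also means the suite no longer tests against the production environment.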
2.2 Continuous Integration and Visual GUI Testing: Benefits
and Drawbacks in Industrial Practice
In a case study, E. Alégroth et al. [3] investigate the applicability
of VGT in an industrial CI environment and try to identify some of the benefits and
drawbacks of introducing this technique. E. Alégroth et al. [3] recommend VGT
because it mimics the steps taken by a tester when performing manual testing, by
either using R&R or writing scripts describing the actions. Furthermore, VGT allows
for assertions based upon comparing images during the test execution with an image
baseline. This is one of the advantages of using VGT because it allows testing
against a production environment and captures the end user’s perspective, which is
difficult with other testing techniques such as unit testing. However, the authors
argue that there are some drawbacks related to this method, such as the test execution
run time and fault detection. The results show that during the study, 16% of all test
outcomes were false positives, that is, good-quality items being rejected.
Additionally, they reported that no false negatives were found, meaning no bad-quality
items were approved, indicating that the testing technique is pessimistic. E. Alégroth et
al. [3] argue that VGT can be a good method for finding that there is a failure in the
application, but it will not show where the defect lies, which can lead to a costly
root-cause analysis.
2.3 Transitioning Manual System Test Suites to Automated
Testing: An Industrial Case Study
In another case study performed by E. Alégroth et al. [8], the authors investigate
transitioning from manual to automated testing suites. The benefits found were
quicker feedback times to the developers which enable higher test execution
frequency, a decrease in the overall testing effort, and an estimated positive Return
on Investment (ROI) after 6–13 VGT test suite executions. Another observed benefit
was that the automated tests were not only able to find all the faults found by the previous
manual testing suites, but also faults that had not been identified earlier. However,
the authors argue that additional data is needed in order to determine if VGT is
feasible in an industrial setting. Furthermore, E. Alégroth et al. [8] found that the test
subjects thought that the testing tools suffered from different limitations, which led
to the conclusion that the automated testing was still complementary to manual
test execution. The test subjects, however, emphasized the substantial value gained
from the testing tools, even given their limitations. One of the perceived benefits was
flexibility, that the testing technique is independent of the platform and implemented
language.
2.4 Automated System Testing using Visual GUI Testing Tools:
A Comparative Study in Industry
Whether VGT is a feasible acceptance testing technique, while being
cost-effective compared to manual testing, is in general still debatable. There exist many
examples of test automation projects that have ended up with major expenditures or
failures [1], [2], [9], [20], meanwhile other research papers show positive support for
VGT [3], [8], [11]. A comparative study performed by E. Börjesson et al. [11]
showed an example where VGT was applicable as an acceptance testing technique in
an industrial setting. Their results show that they were able to automate 98 percent of
their test cases with two different testing tools. Furthermore, their analysis showed
that VGT overcame many of the limitations associated with R&R and other GUI
testing techniques. These limitations were a lack of flexibility and robustness, as
VGT is not bound by the physical placement of GUI components on the display and
can find a component regardless of its placement [11]. Additionally, the authors state that,
since the technique always executes the same sequence of steps, the risk of fault
slip-throughs caused by human error is reduced. However, the authors observed that the
technique was not able to find faults outside the defined testing scenarios and
therefore conclude that it cannot replace manual testing. Lastly, the authors estimated
a 78% reduction in test suite execution time compared to manual testing by an
experienced tester. Thus, they conclude that VGT could potentially decrease cost
while increasing the testing frequency as the testing effort is decreased.
2.5 Too much automation
An important aspect of automating tests is the effort required by a company,
both regarding cost and time. According to [2], [21], introducing automated tests is
time consuming, both when a test fails and a manual analysis has to be made, and
when maintaining and updating the tests for new releases. The papers also state that
automated tests are generally more expensive than manual tests if they are only
executed very seldom. This is because it is generally more efficient to perform the
test once manually than to write the code for the automated test procedure. This
naturally leads to the question of which tests should be automated. There, the papers
state that what should be tested is, among other factors, very dependent on repetition.
If a test is written to be repeated multiple times, automation of the test is probably
worthwhile. On the other hand, if a test is only meant to be run a few times, the costs
of automating that test exceed its value. Likewise, if the software that is being tested
is subject to large or frequent changes, it could prove unnecessary labor to write
specific automated tests for it. Finally, the papers also state that automated testing is
more reliable than manual testing when it comes to detecting changes to the system.
If the aim of the testing approach is to check that the output remains the same after
changing the software, an automated test is performed in the same way every time,
whereas it can be harder to perform manual tests in the exact same way every time,
and it could, therefore, be harder to reproduce found bugs.
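The repetition argument above can be made concrete with a simple break-even estimate. The cost figures below are invented for illustration and are not taken from the cited papers.

```python
import math

def break_even_runs(automation_cost, manual_cost_per_run, automated_cost_per_run):
    """Smallest number of executions after which automation is cheaper.

    Returns None when automation never pays off (an automated run costs
    at least as much as a manual one).
    """
    saving_per_run = manual_cost_per_run - automated_cost_per_run
    if saving_per_run <= 0:
        return None
    return math.ceil(automation_cost / saving_per_run)

# Writing the automated test costs 8 hours; a manual run costs 1 hour,
# while an automated run still costs 0.5 hours of result analysis.
assert break_even_runs(8.0, 1.0, 0.5) == 16
```

A test run only a handful of times never reaches this break-even point, which is why the papers tie the automation decision so strongly to repetition.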
2.6 Trade-offs between automated and manual testing
Net benefits from automated and manual testing seem to be dependent on many
variables. Results from a study by Taipale et al. [20] show that test automation could
provide benefits such as quality improvements and being able to perform more tests
in less time. It would also save time for the testers and enable them to reuse many of
the automated test cases. The same study also shows that the drawbacks of
automating tests could be the implementation cost, especially in complex and often-
changing code. Another less anticipated cost of automating tests is the maintenance
cost and the cost of training the testers. Taipale et al. [20] state that all tests need some
human intervention, even automated ones, because 100% automatic testing is not
a realistic goal. Depending on what type of software a company produces, automated
testing may have different outcomes. For a company that produces complex software
that is subject to large changes and frequent releases, it can be more expensive to
maintain and update automated tests than to perform them manually. But for a
company that produces simpler software with mostly smaller changes on software
releases, automated testing can work as a quality control tool for testing if
implemented correctly. The study finally states that the reusability of tests is
essential to make implementing test automation a worthwhile task.
2.7 When and what to automate in software testing?
Deciding the scope of what to automate may be a difficult task. The idea
that it is worth automating everything within the testing process is a dangerous line
of thought that will most likely result in disappointment or major expenditures. In a
multi-vocal literature review by Garousi et al. [1], the authors do an extensive review
of what and when to automate in software testing and try to determine important
factors to take into consideration. They state that automation cannot replace manual
testing completely or eliminate personnel costs. However, there are many benefits to
be found if automatization is done in the correct context with an appropriate
approach, such as decreased testing costs and increased software quality.
Garousi et al. [1] categorized the 15 most important factors found into 5
categories: SUT-related factors, Test-related factors, Test tool-related factors, Human
and organizational factors, and lastly Cross-cutting and other factors.
2.7.1 SUT-related factors
The system under test (SUT) factors relate to the maturity of the system. If a
system is not mature enough, e.g. undergoing many major changes or having features
re-implemented, this will have a negative impact on the automatization effort.
It will produce "broken tests" which require maintenance, i.e. repairing
automated tests that yield false-positive results. In general, implementing
automatization in an immature project will result in a lot of effort to adapt to changes
[1].
2.7.2 Test-related factors
The test-related factors relate to the test cases and test suites used, i.e. the
decisions on what to test and what not to test have an impact on the automatization
aspect. Garousi et al. [1] concluded that the need for regression testing was the most
mentioned factor in test automation decisions. Furthermore, tests that humans
dislike performing are strong candidates for automatization, whereas testing
of usability and user experience offers little payback from test
automatization.
2.7.3 Test tool-related factors
Test tool factors relate to the quality and adoption of proper, suitable
testing tools for automatization. Selecting a suitable tool has a large impact on the
potential benefits gained from test automatization. Garousi et al. [1]
argue that there is a risk involved in selecting a testing tool. Such risks include a
price increase or a sudden halt of further development, and should be
accounted for during the selection process. The authors recommend selecting tools
with a large user base as a risk mitigation strategy.
2.7.4 Human and organizational factors
The human and organizational factors relate to the competence and skills the
organization holds. This factor is of great significance, as the competence required
for enabling automatization is both different from, and often additional to, the skills
needed to carry out manual testing. Garousi et al. [1] claim that if an organization
lacks the resources for training, or is missing competencies, it
might be better not to automate, as the risk of failure is high.
2.7.5 Cross-cutting and other factors
Garousi et al. [1] define cross-cutting factors as factors applicable to more than
one group, such as economic factors, the automatability of testing, and development
process factors. Normally, the initial cost of introducing automatization is greater than
that of introducing manual testing. However, the cost of automated test execution is
lower than that of manual testing, especially when the test execution is repeated several times.
Furthermore, the authors argue that the development cycle used at the organization
will impact the benefits of automatization.
2.8 Data collection
Lethbridge et al. [22] argue that data collection techniques can be categorized into three
different degrees. The first degree comprises direct methods, meaning the researcher has
direct contact with the data collection source and collects the data in real time. These
direct methods include observations, interviews, or focus groups. The advantages
of first-degree methods are that the researcher has control over what data is
collected, the quality of the data, and how it is collected. However, the drawback is
that they are expensive methods, as they require a lot of effort to apply [23].
The second degree comprises indirect methods, which involve collecting raw data without
interacting with the subjects during a data collection session. These indirect methods
include automatically monitoring software engineering tools, or observations drawn
from video recordings of meetings etc. The advantages of second-degree methods
are similar to those of first-degree methods with regard to the researcher's ability to
control the data collection, but they are less expensive to carry out [23]. Observations
are a good technique for distinguishing between the official view of a specific matter
and how it is in the real case. This technique can provide the researcher with a deeper
understanding of the subject investigated. However, second-degree methods are still
expensive to apply, even though they are generally cheaper than first-degree methods [23].
Third-degree methods include independent analysis of artifacts already created
before or during the research period. These artifacts could be documents such as
manual testing reports, requirement specifications or failure reports. Third-degree
methods are advantageous since they are less expensive to carry out than first- and
second-degree methods, but they reduce the researcher's ability to
control the quality of the data and what data is produced. This is because these
artifacts were generally not produced with the intent of investigating the
area under study [23].
Runeson et al. [23] argue that interviews are important in the data collection
process for a case study. The typical focus of semi-structured interviews is on
acquiring both the individual's qualitative and quantitative experience of the investigated
phenomenon. The interview questions are generally a mixture of open and closed
questions. Runeson et al. [23] further describe three general principles
for how the mixture of open and closed interview questions can be ordered
in an interview session:
Funnel – The funnel principle involves beginning the interview session with
open-ended questions and, towards the end of the session, asking more
specific closed questions.
Pyramid – The pyramid principle involves beginning with specific closed
questions; as the interview session proceeds, the questions become more open-
ended.
Time-glass – The time-glass principle involves beginning the interview with
open-ended questions, progressing to specific closed questions, and ending the
interview with open-ended questions again.
Chapter 3 Method
This chapter presents the method used to answer the stated research questions.
Section 3.1 explains the DRS and describes what was done in each step of the used
model. Section 3.2 describes which qualitative data collection methods were used.
Section 3.3 describes which metrics were used, their relevance, and from which data
the metrics were derived. Section 3.4 describes the triangulation procedure used for
this thesis.
3.1 Design research study
The DRS for this thesis was divided into three cycles, described as activities
below; see Figure 1.1 for an overview of the cycles. Each cycle included semi-
structured interviews with different aims. The characteristic of DRS is progressive
refinement in design. It involves bringing a solution to a real-world problem and
investigating how the solution works in its context [24], then constantly revising the
solution until a wanted behavior is reached, since it is difficult to account for all
factors of a solution before the implementation. It was therefore considered appropriate
to use a DRS approach for this thesis. The DRS followed the guidelines presented by
Kitchenham et al. [6] for experimental design research. Since the DRS aims for
progressive refinement based upon learned experiences, the approach allows falling
back into a previous cycle [24]. However, the interviews were only carried out once
for each cycle. The author's observations were included and linked throughout all
three cycles.
3.1.1 Cycle 1
The first cycle of the DRS included configuring the VRT testing tool, ensuring
that relevant testing data is produced (e.g. testing reports), configuring the structure of
how the tests should be executed, and storing the testing results. Semi-structured
interviews were carried out with the aim of defining the need for ATE, competences,
experiences, organisational processes, and practices.
3.1.2 Cycle 2
The second cycle of the DRS included implementing and integrating the test
execution in the CI environment. The goal of the implementation was to enable
ATE in a CI environment. Semi-structured interviews were carried out at the end
of this cycle, with the aim of investigating the adoption of the implementation from the
testers' point of view.
3.1.3 Cycle 3
The third cycle included extracting and processing the data collected throughout
the first two cycles and then evaluating the implemented system
based upon the qualitative data collected. The interviews during this cycle aimed to
address the perceived benefits and drawbacks from the developers' perspective. The
qualitative data was gathered from the author's observations and the semi-structured
interviews from each cycle. The collected data was grouped by the cycle it was
collected in and lastly compared with the results from the informal literature review.
3.2 Qualitative Data Collection
The qualitative data collection for this thesis consisted of three different
methods. Section 3.2.1 describes how the semi-structured interviews were conducted
and what the interviews aimed to investigate in each cycle; the interview questions for
each cycle are given in Appendices A, B, and C. Section 3.2.2 describes how the data
from the unstructured observations were organized and collected. Section 3.2.3
describes how the informal literature review was used.
3.2.1 Semi-structured interviews
Throughout the thesis, three different types of interview sessions were held.
Each type had a different objective for its cycle; see Sections 3.2.1.4, 3.2.1.5 and
3.2.1.6. In each cycle, 3 different interview subjects were interviewed, and no subject
was interviewed twice. This section describes how the interview selection
process was carried out, how the interview data was analyzed, and lastly the aim
of the interviews in each cycle. All interviews were held online, due to the physical
distance between the author and the interviewees, as the company has offices both in
London and Poznan, and because of the COVID-19 outbreak in 2020 causing
travel restrictions.
3.2.1.1 Selection process
The interviewees were selected based upon their relevance to the
objective defined in each cycle. The author had the possibility to select any interview
subjects at the investigated company. The author selected interview subjects after
consulting with the company supervisor in order to determine their relevance.
3.2.1.2 Pilot interview
Pilot interviews were conducted before the actual interview sessions in order to
ensure the interview questions were formulated properly. This measure aims to
increase the quality of the collected data, as misconceptions caused by ambiguous
question formulations can be reduced and the ordering of the questions reviewed.
The pilot interview was held with another master student, with a similar background
as the author of this thesis.
3.2.1.3 Interview data analysis
The interviews were recorded if consent was given by the interviewee. After
the interview data had been collected, the interviews were transcribed. This reduces
the risk of information being lost, as the interviewer can review the generated
material after the interview has been conducted and can focus on guiding the
interview during the session. Commonalities among the interviewees' answers were
later identified using a Grounded Theory approach, open coding [25]. The
coding process was performed by going through the transcribed material and
matching statements made by the interviewees with a set of code words. One
individual statement could be linked to several codes, since one sentence can refer to
several areas of concern. After the tagging process was completed, the statements for
each code were analyzed in order to draw conclusions for that code. Table 1
presents the codes used for this thesis. These codes were defined prior to the data
collection process. In this study, test reliability is defined as the confidence one
has in the test results.
Table 1 Codes used for the coding process
# Code Description
1 Test Maintenance: Statements concerning maintaining the tests.
2 Implementation Effort: Statements concerning the efforts for implementing test suites.
3 Testing Tool: Statements concerning the testing tool used at the investigated company.
4 Feedback Time: Statements concerning the feedback time to the developers.
5 Test Reliability: Statements concerning the reliability of test results.
6 Test Automation: Statements concerning the test automation.
7 Development Process: Statements related to the development process.
8 Organization: Statements related to organizational concerns.
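The tagging step of the open-coding process can be sketched as follows. This is a hypothetical illustration only: the keyword lists and the example statement are invented, and the actual coding for this thesis was performed manually on the transcripts.

```python
# A subset of the codes from Table 1, each with invented trigger keywords.
CODE_KEYWORDS = {
    "Test Maintenance": ["maintain", "broken test", "update the tests"],
    "Feedback Time": ["feedback", "waiting"],
    "Test Reliability": ["trust", "flaky", "false positive"],
}

def tag_statement(statement):
    """Return every code whose keywords appear in the statement."""
    lowered = statement.lower()
    return [
        code
        for code, keywords in CODE_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]

# One statement can be linked to several codes, as described above.
statement = "We spend a lot of time on broken tests and waiting for feedback."
assert set(tag_statement(statement)) == {"Test Maintenance", "Feedback Time"}
```

After all statements are tagged, the statements collected under each code are read together to draw conclusions for that code.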
3.2.1.4 Cycle 1
Objective – The objective of the interviews for the first cycle was to define the
interview subject's experience within software development, testing and the subject's
prior experience with VRT. The interview questions then focus on the company's
drive for introducing automated test execution and on defining the current
organizational processes and practices. Lastly, the interview questions focus on
the perceived benefits and drawbacks of introducing automated
test execution, more specifically VRT; see Appendix A.
Interviewees – The interview subjects in this cycle were selected based
upon whether they were drivers behind the initiative of introducing automated test
execution. At the company, there are 3 primary advocates driving the introduction of
VRT. Therefore, these 3 people were selected to be interviewed.
3.2.1.5 Cycle 2
Objective – The objective of the interviews in the second cycle was to define the
transition to the new solution from the testers' point of view. Firstly, the interview
questions seek the background and experience of the testers. Then, the interview
questions investigate their roles as software testers and their experience working with
VRT. Lastly, the questions focus on the adoption of the new setting and which
benefits and drawbacks are perceived from the testers' point of view; see Appendix
B.
Interviewees – The interview subjects selected for this cycle are the testers in
the QA-team at the investigated company. At the company, the QA-team consists of
6 testers. From this pool, the candidates were selected after consulting with the
company supervisor.
3.2.1.6 Cycle 3
Objective – The objective of the interviews in the last cycle was to find out the
perceived benefits and drawbacks from the developers' perspective. The interview
questions aim to investigate how the developers have been affected by the transition
to automated test execution; see Appendix C.
Interviewees – The interview subjects selected for this cycle were developers
involved in projects where the thesis implementation was present. At the
investigated company, there are 25 developers employed. From this pool, the
candidates were selected after consulting with the company supervisor.
3.2.2 Unstructured observations
Observations were documented throughout the DRS and mapped to a
hierarchical tree diagram, see Figure 3.1. Each noted observation was mapped to
the area it concerns. The observations were divided into three
categories: challenges, limitations, and problems, and documented as the kind of
problem they were considered to be at the time of detection. These
observations aim to provide a deeper understanding of the data collected from the
interviews. The author also investigated whether the noted observations had been
reported in the literature found through the informal literature review.
Figure 3.1 Overview of the hierarchical tree diagram. Adapted from E. Alégroth et al.
[3].
3.2.3 Informal literature review
An informal literature review was conducted to find the latest research within the
area. This assisted in identifying concerns, metrics, approaches, and findings.
Furthermore, the data collected from the literature review aided triangulation, by
allowing the findings of this thesis to be compared with findings in other settings.
These results are summarized in Chapter 2.
3.3 Collected Metrics
Industry experience – The interviews in each cycle collected the industry
experience of the interviewee. For the Cycle 1 and 3 interviews, this metric
describes the interviewee's experience in the software industry. For the Cycle 2
interviews, this metric describes the interviewee's experience as a software tester.
This metric aims to provide a better understanding of the background experience in
software development that the organization holds.
Prior experience with VRT – This metric was only collected from the interviews
carried out during Cycles 1 and 2, as these interviewees are the only subjects
concerned with the development and introduction of VRT. This metric aims to
provide a better understanding of the competences the organization holds regarding VRT.
Perception of VRT – All the interviewees in each cycle were asked whether they
believe VRT is a complement to or a replacement for manual testing practices. The
aim of this metric is to gain a better understanding of the approach and intent
Vizlib is taking in introducing VRT.
Perception of test automation – During the first-cycle interviews, the
interviewees were asked whether higher quality could be achieved by introducing test
automation. This metric aims to provide a better understanding of the intent
behind introducing VRT.
VRT test script development time – During the Cycle 2 interviews, the
interviewees were asked to estimate the time it takes for them to produce a VRT
test script. This metric aims to provide a better understanding of the
implementation effort.
Feedback time – This metric is collected during the Cycle 3 interviews. The
developers are asked if they believe that the feedback time has decreased as a result
of the CI integration with VRT. This metric aims to provide a better
understanding of how the feedback time has been affected. The feedback time in this
thesis is defined as the time from when a developer has finished resolving an
issue until feedback from the testers is given.
3.4 Triangulation
Using different methods for the qualitative data collection enables these
data sources to be triangulated. If more than one data source points towards a specific
finding, a stronger conclusion can be drawn, as more evidence supports the claim. As
this thesis was carried out in only one industrial setting, triangulation is important in
order to verify claims from other sources in other contexts, to confirm general
conclusions, and to support any findings in this study. For this thesis, the triangulation
process involved finding support for conclusions made in other scientific papers and
investigating whether the same observations were found in this study. The triangulation
process lastly included linking the conclusions to the stated research questions in
Section 1.3.
Chapter 4 System Requirement Analysis
This chapter presents the system requirement analysis for this thesis
implementation. Section 4.1 describes the problem situation at Vizlib and defines
how the implementation fits the situation. Section 4.2 describes the goal of the
system. Section 4.3 presents the frameworks and technologies Vizlib has selected for
the implementation. Section 4.4 displays use-case diagrams over the implemented
systems and lists their functional requirements. Section 4.5 lists the non-functional
requirements for the implemented systems.
4.1 The problem situation
The software developed at the investigated company is referred to as extensions.
Extensions are different kinds of interactive and configurable visualizations with the
sole purpose of visualizing data to gain insight. Extensions range from bar charts
and pie charts to heatmaps; in theory, an extension can be anything a user would
want to interact with in a GUI. The
extensions are JavaScript and HTML based web applications. These extensions are
dependent on and tightly coupled with a third-party software platform, Qlik, which is
the engine that performs data analytics and produces the data to be visualized. This
poses a challenge as the developed software cannot be executed on its own but
requires an instance of the Qlik engine to be running in order to function.
Consequently, this raises some limitations for the implementation. One of these
limitations is that Qlik can only hold one version of an extension at any given time.
This will have implications for the test automation since two different versions of the
same extension cannot be tested at the same time on one instance of Qlik. It would
either require testing against another test server running its own instance of Qlik or
implementing a scheduler keeping track of what is being tested on the test server
currently. However, maintaining several testing servers would require a scheduler
dictating which servers are currently in use. Therefore, a scheduler supporting both
options will be implemented for this thesis, as it makes the solution scalable to
support a higher level of parallelism of concurrent test executions.
Another challenge that arises because of the software’s dependency on Qlik is
licensing. As Qlik is a Software as a Service (SaaS) product [26], the pricing is based
upon monthly subscription packages that limit how many concurrent active users
may exist under one license. This limits the implementation, since the number of
supported parallel test executions is bounded by how many concurrent test
executions can be held under one user license.
Furthermore, the implementation will require allocating dedicated testing
accounts for the test automation, as it needs to ensure that a license is only used by
one testing instance and that a license is available. This further strengthens the
importance of a scheduler in order to enable test automation in this context.
Currently, Vizlib uses a self-built web application to build new versions of the
released software. It involves obfuscating code, replacing generic tags used for
licensing purposes and other build specific processes. It achieves this primarily by
using a JavaScript task runner called Gulp. Gulp is a popular JavaScript framework
for automating workflows [27]. Gulp allows a user to define a set of tasks and
determine the sequence these tasks should be executed in. In order to test a
production version of an extension, the extension code must run through this build
process to later be uploaded to the test server. The current version of the self-built
application only supports sequential builds. This would most likely become
problematic if the ATE were integrated into this platform, because it would result
in a bottleneck. Therefore, now that Vizlib wants to introduce VRT, this build
script will also need to be added to the new CI platform.
4.2 The goal of the system
The goal of the implementation is to enable the automatic execution of VRT test
suites for the investigated company’s CI pipeline. The implementation consists of
three parts: building a scheduler application, configuring the new CI platform, and
adding the ATE for VRT tests to the CI pipeline. The implementation is required to
be triggered by specific events in the software development cycle, such as pull
requests, commits and releases of a new version of the developed product. This thesis
does not include writing tests for the testing framework, as this practice is performed
by the QA-team.
4.3 Selected and Relevant Technologies
This section presents relevant technologies and frameworks Vizlib has selected
for the implementation.
4.3.1 NestJS
Vizlib selected NestJS as the framework with which the scheduler will be
developed. NestJS is a framework for building server-side applications that
uses TypeScript as the development language while still supporting
JavaScript [28]. NestJS is built upon NodeJS, a popular asynchronous,
event-driven JavaScript runtime used for building network-based
applications [29].
4.3.2 CircleCI
Vizlib has selected CircleCI, a Platform as a Service (PaaS) provider, as the CI
platform from which the VRT tests will be executed. CircleCI connects
to different Version Control Systems (VCS) and allows user-defined jobs to
be executed when certain events, such as commits or pull requests, are triggered
on the VCS [30]. CircleCI fetches the code associated with the triggering event and
executes these user-defined jobs inside Docker containers or virtual machines. As
the user-defined jobs are performed within their own environments, users of this
service can execute a wide variety of tasks, such as Vizlib’s build process
and VRT testing. The ATE for VRT testing will reside on this platform, which
enables practices such as CI and Continuous Deployment (CD).
4.3.3 JWT
JSON Web Tokens (JWT) provide a secure way of transmitting information
between parties using JSON objects and are commonly used for authorization
[31]. A JWT consists of three parts: a header, a payload, and a signature. The header
typically states the type of token and the hashing algorithm used. The payload
typically contains user information, and the signature is computed over the header
and payload using a secret, allowing the receiver to validate the token. The parts
are Base64URL-encoded, concatenated with dots, and attached to a header in HTTP
requests. This structure makes it possible for servers to determine whether a request
is authenticated. JWT will be used in the
implementation to authenticate to Qlik on the test server and to the scheduler.
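The token structure described above can be sketched with Node’s built-in crypto module. This is a simplified illustration, not the library code used in the implementation: real JWT libraries additionally handle expiry claims and algorithm checks, and the function names here are invented.

```typescript
import { createHmac } from "node:crypto";

// Base64URL-encode a JSON value (JWT uses Base64URL, not plain Base64).
const encode = (obj: object): string =>
  Buffer.from(JSON.stringify(obj)).toString("base64url");

// Create a signed token of the form header.payload.signature.
export function signJwt(payload: object, secret: string): string {
  const header = encode({ alg: "HS256", typ: "JWT" });
  const body = encode(payload);
  const signature = createHmac("sha256", secret)
    .update(`${header}.${body}`)
    .digest("base64url");
  return `${header}.${body}.${signature}`;
}

// Recompute the signature; return the payload if valid, null if tampered with.
export function verifyJwt(token: string, secret: string): object | null {
  const [header, body, signature] = token.split(".");
  const expected = createHmac("sha256", secret)
    .update(`${header}.${body}`)
    .digest("base64url");
  if (signature !== expected) return null;
  return JSON.parse(Buffer.from(body, "base64url").toString());
}
```

Because only a holder of the secret can produce a matching signature, the scheduler and the test server can validate incoming tokens without storing session state.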
4.3.4 Cypress
Cypress is the tool that Vizlib has selected for VRT testing; see Chapter 2
for more information about VRT. Cypress is a front-end test runner for web
applications that enables users to write E2E, integration, and unit tests [32]. The tool
provides snapshot images of the application when a test fails, as well as a video of the
entire test execution. Cypress enables cross-browser testing as it supports different
browsers such as Google Chrome, Electron, and Firefox [33].
4.4 The functional requirements
This section presents the functional requirements for the two implemented
systems. Section 4.4.1 provides a use-case diagram of the scheduler and lists
the functional requirements for the system. Section 4.4.2 presents a use-case diagram
for the CI platform and lists the system’s functional requirements.
4.4.1 Scheduler
The scheduler will require three main functionalities: requesting available
licenses, releasing an allocated license, and viewing which licenses are currently
available or allocated. Figure 4.1 displays a use-case diagram of the system.
Table 2 lists the functional requirements for the scheduler.
Figure 4.1 Use case diagram for the scheduler.
Table 2 Functional requirements for the scheduler
ID Description
S-FR1 The system shall be able to display the current number of licenses in use.
S-FR2 The system shall be able to generate a valid JWT for a test user to be able to log in.
S-FR3 The system shall respond with a valid JWT for the test server if a license is available.
S-FR4 The system shall be able to deallocate a previously allocated test user, making the allocated license available again.
S-FR5 The system shall only allow one license to be allocated per extension at any given point in time.
S-FR6 If no license is available for a request, the system shall respond with a message informing that no licenses are currently available.
S-FR7 A license shall automatically be deallocated after it has been active for 30 minutes.
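The allocation rules in Table 2 can be illustrated with a small in-memory sketch. This is a hypothetical model, not Vizlib’s actual scheduler; the class name, token format, and method names are invented for illustration.

```typescript
// Simplified in-memory model of the scheduler's allocation rules.
type Allocation = { extension: string; token: string; expires: number };

export class LicensePool {
  private allocations = new Map<string, Allocation>();

  constructor(
    private capacity: number,
    private ttlMs: number = 30 * 60 * 1000, // S-FR7: 30-minute lease
  ) {}

  // S-FR3/S-FR5/S-FR6: hand out a token if a license is free and the
  // extension is not already under test; otherwise return null.
  allocate(extension: string, now: number = Date.now()): string | null {
    this.evictExpired(now);
    if (this.allocations.has(extension)) return null; // S-FR5
    if (this.allocations.size >= this.capacity) return null; // S-FR6
    const token = `token-for-${extension}-${now}`; // placeholder for a real JWT
    this.allocations.set(extension, { extension, token, expires: now + this.ttlMs });
    return token;
  }

  // S-FR4: explicit deallocation frees the license again.
  release(extension: string): boolean {
    return this.allocations.delete(extension);
  }

  // S-FR1: number of licenses currently in use.
  inUse(now: number = Date.now()): number {
    this.evictExpired(now);
    return this.allocations.size;
  }

  // S-FR7: leases older than the TTL are reclaimed automatically.
  private evictExpired(now: number): void {
    for (const [ext, a] of this.allocations)
      if (a.expires <= now) this.allocations.delete(ext);
  }
}
```

Passing the current time as a parameter makes the lease-expiry rule easy to exercise in unit tests without waiting 30 minutes.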
4.4.2 CI platform
The CI platform will be configured to use Vizlib’s VCS, GitHub. Figure 4.2
displays a use-case diagram for the CI platform and Table 3 lists the functional
requirements for the CI platform.
Figure 4.2 Use case diagram for the CI platform.
Table 3 Functional requirements for the CI platform.
ID Description
CI-FR1 The VRT testing shall be executed when a pull request is made towards predefined branches.
CI-FR2 The VRT testing shall be executed when regression testing is requested.
CI-FR3 The VRT testing shall be executed when a release is requested (a tag is created on GitHub).
CI-FR4 The system shall store and display build and testing results.
CI-FR5 The system shall upload the built version to the test server if the event that triggered the CI pipeline requires VRT testing.
CI-FR6 The system shall be able to run Vizlib’s build process.
CI-FR7 The system shall wait to execute the VRT test suite until a valid license is received from the scheduler.
4.5 The non-functional requirements
This section lists the non-functional requirements for the implemented scheduler
and CI platform. Table 4 lists the requirements for the scheduler and Table 5 lists the
requirements for the CI platform.
Table 4 Non-functional requirements for the scheduler.
ID Description
S-NFR1 The system implementation language shall be TypeScript.
S-NFR2 The system shall be implemented using the framework NestJS.
S-NFR3 The system shall support HTTP request methods such as GET, POST, DELETE, and PUT.
S-NFR4 The implementation code shall follow Vizlib’s coding convention format.
Table 5 Non-functional requirements for the CI platform.
ID Description
CI-NFR1 The configuration files for the CI system shall be written in YML syntax.
CI-NFR2 The CI platform shall support docker containers.
CI-NFR3 The CI platform shall support storing test results and execution log.
4.6 Brief summary
The goal of this system is to introduce VRT execution in a CI environment at
Vizlib. The implementation will require a scheduler to be built because of the
limitations of what can be tested concurrently on the test server. Furthermore, the
implementation will include a transition to a new CI platform because of the
limitations of the current build platform. Two use case diagrams have been created in
order to illustrate the functionalities of the system.
Chapter 5 System Design
This chapter presents an overview of the system design, processes and
architecture for the implementation. Section 5.1 describes the architecture and the
processes Vizlib used prior to the implementation. Section 5.2 describes the
modifications needed for the implementation. Section 5.3 describes the key
techniques used for the implementation. Lastly, Section 5.4 gives a brief summary of
the chapter.
5.1 Processes & architecture prior to the implementation
It is important that the system design of the new solution fits Vizlib’s software
development processes without altering their current processes too drastically.
During the Cycle 1 interviews, the architecture and processes were presented to the
author. Vizlib’s system architecture prior to the implementation is depicted in
Figure 5.1. The figure includes the actors for the different components of the system.
Figure 5.1 Vizlib's prior system architecture.
The components in the system are described as follows:
5.1.1 Freshdesk
Freshdesk is a support ticketing system used as a community platform for
Vizlib. On this platform, customers can report bugs, request features or discuss
various topics. A dedicated support team at Vizlib handles all requests posted on this
platform, including product support, bug verification, and feature requests. If a bug is
verified and replicated, the support team reports the issue to Vizlib’s issue tracking
system, Jira.
5.1.2 Github
Github is Vizlib’s VCS. Vizlib uses a git strategy called git flow. Git flow
uses four different types of branches. The first is the master branch, which
holds the code of the latest production version. The second is the
development branch, which holds the code for the next release. When performing a
new release, a release candidate branch is created from the development branch.
The product owner is the actor who decides when a release should be
made. The release candidate branch is the branch that gets tested before a release; at
Vizlib, the smoke tests are performed on this branch. If the release candidate branch
is deemed ready for production, it gets merged into the master branch; otherwise, the
branch is discarded. Lastly, a tag is created on the master branch. Github is
configured to send a webhook to a slack bot upon the tag creation event.
The last type of branch is referred to as a feature/bug/issue branch. This branch
is created from the development branch and is the branch from which a developer
resolves an issue. When a developer has resolved an issue, a pull request with the code
changes is made towards the development branch. For each pull request, another
developer will code review the changes. An approval from the reviewer is needed for
an issue to be merged. Additionally, a tester will review and test the application to
determine if the changes are acceptable. If the testers approve the update, they
merge the pull request into the development branch.
5.1.3 Jira
Jira is Vizlib’s issue tracking system. The software development methodology
Vizlib uses is a Kanban approach where each extension has its own individual Kanban
board. The Kanban board consists of the following columns: Selected for
development, In Progress, Awaiting QA, In testing, QA Approved, Done, and
Released. Here, the product owner prioritizes the issues at hand and selects which
issues should be developed. The developers work together with the product owner to
select which issues the developer should work on from the product backlog. Once an
issue is in the Awaiting QA column, the testers review the changes, ensuring the
code changes have not affected the software in unexpected ways and that the solution
resolves the issue. If the issue gets approved by the testers, the issue transitions from
the Awaiting QA column to the QA Approved column. When the pull request is
merged into the development branch, the issue is moved to the Done column. Once a
release is made, all issues in the Done column are moved to the Released column.
5.1.4 Vizlib Notifier (Slack bot)
The Vizlib Notifier is a slack bot developed in-house by Vizlib. When a webhook is
received from Github upon tag creation, the slack bot produces an interactive
message with meta information about the release. It fetches the meta information
from JIRA describing which issues are about to be released. The message also
includes buttons to perform different actions, such as updating the status of the related
issues to released on JIRA or notifying customers that their issue has been resolved
with this release. The slack bot acts as an interface for the product owner to
perform different steps in the release process. See Figure 5.2 for an example message
from the slack bot. The slack bot sends a webhook to the extension builder
application when a release is triggered.
5.1.5 Extension Builder
The extension builder is another application developed in-house by Vizlib. Upon
receiving requests, the application triggers build scripts for a tagged version of the
developed software, as previously explained in Section 4.1. After the build script has
completed, the build artifacts are uploaded to AWS S3, a third-party storage provider.
Lastly, the extension builder notifies the slack bot whether the build completed
with or without errors.
Figure 5.2 Example of a slack bot message for a release. Names and personal images
have been censored.
5.2 Architecture for the new solution
The architecture for the new solution is presented in Figure 5.3. The following
updates and systems will be added to the previous architecture:
5.2.1 CI platform - CircleCI
The build script and the test runner Cypress will be added to the new CI
platform, CircleCI. Cypress will be configured to request and allocate licenses from
the scheduler or to wait if no available licenses are found. Furthermore, if a license
has been allocated, Cypress will upload the build artifacts produced by the build
script to the test server. Additionally, the test runner will download the image
baseline that it should use for the test run from AWS S3. Lastly, the test runner will
execute the test suites against the test server and deallocate the acquired license after
the test suites have been completed.
5.2.2 Test Server
A test server will be created, hosting a version of Qlik Sense. The test server will
be the SUT for the test runner Cypress. On this test server, testing accounts for Qlik
will be created, dedicated solely to the CI solution.
5.2.3 Scheduler
A scheduler will be implemented as an API. The scheduler will be responsible
for keeping track of what is currently being tested on the test server. It will generate
licenses for the test runner, and also notify the test runner if the test server is already
testing the same type of extension or if no licenses are available.
5.2.4 Github
CircleCI will be integrated with Github. This will trigger builds on CircleCI when
specific events happen on Github. It will also enable viewing build and test results
produced by CircleCI and enforcing rules, such as requiring tests to pass before a
pull request can be approved.
Figure 5.3 Vizlib’s system architecture after the implementation.
5.3 Key techniques
This section describes the key concepts and techniques used for the
implementation and design of the system. Section 5.3.1 describes the technique
dependency injection and how it is used. Section 5.3.2 describes how testability was
ensured in the design of the system.
5.3.1 Dependency injection
The scheduler is developed with the framework NestJS. NestJS supports a
technique called Dependency Injection (DI) [34], an architectural pattern the
author has selected to use. DI is an Inversion of Control (IoC) technique used to
achieve loose coupling between objects. When using DI, a class is not responsible for
instantiating its own dependencies. Instead, each dependency is instantiated by a
dedicated component, which then injects the instantiated dependency into the class
that needs it. Not letting a class control the instantiation of its own dependencies is
what is referred to as IoC [34]. Traditionally, without DI, object instantiation is done
imperatively by the programmer. In NestJS, the injection is handled by a
dedicated IoC container during runtime. In order to create a dependency, NestJS uses
a so-called decorator on every class, which defines what kind of class it is. For
example, if the Injectable decorator is used, NestJS registers that class as a provider,
and the class can then be injected into any other class.
Dependency injection enables code reusability across the modules used
throughout the project, caching of instantiated objects, and a simple way of creating
singleton objects.
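The principle can be illustrated without the framework. The sketch below performs by hand the wiring that NestJS’s IoC container does at runtime; the class names are hypothetical and not taken from the thesis implementation.

```typescript
// The dependency is described by an interface, so the consumer is
// loosely coupled to any concrete implementation.
interface LicenseStore {
  count(): number;
}

class InMemoryStore implements LicenseStore {
  private n = 0;
  count(): number { return ++this.n; }
}

// SchedulerService declares what it needs in its constructor; it never
// calls `new InMemoryStore()` itself. That inversion is the IoC idea.
class SchedulerService {
  constructor(private readonly store: LicenseStore) {}
  licensesSeen(): number { return this.store.count(); }
}

// In NestJS this wiring is done by the IoC container for classes marked
// with the Injectable decorator; here the composition root does it by
// hand, reusing one instance to obtain singleton behaviour.
const sharedStore = new InMemoryStore();
export const schedulerA = new SchedulerService(sharedStore);
export const schedulerB = new SchedulerService(sharedStore);
```

Because the dependency arrives through the constructor, a test can inject a fake `LicenseStore` instead of the real one, which is what makes the pattern attractive for testability.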
5.3.2 Designing for testability
To ensure that the implementation fulfils its system requirements, it is important
to test it. As the implementation consists of two primary components using
different systems, it is important to take a reasonable testing approach for each
component. Garousi et al. [1] present a checklist for determining what and when to
automate in software testing. The primary factors of the checklist are described in
Section 2.7. This checklist was used when deciding whether automated tests should
be written for the CI platform and the scheduler. The CI platform was deemed a poor
candidate for automated testing and was tested manually instead, primarily because
of the complexity and difficulty of testing the CI platform in an automated manner.
The scheduler was deemed an appropriate candidate.
When designing a system and its test suites, it is important to consider
testability. Alwardt et al. [35] present some best practices for achieving
testability when designing software, see Table 6. These best practices were
considered when designing, implementing, and writing tests for the scheduler.
Table 6 Best practices for testability, summarized from Alwardt et al. [35].
# Practice
1 Keep unit tests separate from integration tests.
2 Tests should not depend on the order in which they run.
3 Unit tests should be atomic.
4 Design a loosely coupled system.
5 Tests should be maintainable.
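Practices 2–4 in Table 6 can be illustrated with a small sketch in which the unit under test receives a hand-written test double, so each test builds its own fresh state and touches no shared server. The classes here are invented for illustration and are not taken from the thesis implementation.

```typescript
// The unit under test depends on an abstraction, not on the real clock
// (practice 4: a loosely coupled system).
interface Clock { now(): number; }

class Lease {
  constructor(private clock: Clock, private expires: number) {}
  isExpired(): boolean { return this.clock.now() >= this.expires; }
}

// A hand-written test double: fully controlled, no shared global state.
class FakeClock implements Clock {
  constructor(private t: number) {}
  now(): number { return this.t; }
  advance(ms: number): void { this.t += ms; }
}

export function runLeaseTests(): void {
  // The test constructs its own fixtures, so it is atomic (practice 3)
  // and does not depend on any other test running first (practice 2).
  const clock = new FakeClock(0);
  const lease = new Lease(clock, 1000);
  if (lease.isExpired()) throw new Error("fresh lease must not be expired");
  clock.advance(1000);
  if (!lease.isExpired()) throw new Error("lease must expire at its deadline");
}
```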
5.4 Brief summary
This chapter presents two high-level architectures: one describing the
components of the previous architecture, and the other describing how the new
implementation will affect and fit into the previous architecture. Furthermore, it
describes how the different actors of the system interact with the components in the
architecture. Finally, it presents how NestJS enables the use of DI and which
considerations were made in order to achieve a system with high testability.
Chapter 6 System Implementation
This chapter describes the system implementation. Section 6.1 describes the
different environments the system runs on. Section 6.2 describes the key flows
throughout the system, including execution flow charts and sequence diagrams.
Section 6.3 describes the key interfaces of the system, consisting of which actors
interact with the different components in the system. Lastly, Section 6.4 gives a brief
summary of the chapter.
6.1 The environment of system implementation
The system implementation runs on three different environments, as it consists
of three different platforms. The first environment is for hosting the scheduler
application. The second environment is the CI platform. The third is the environment
for the test server.
6.1.1 Scheduler
The scheduler is hosted on a Platform as a Service (PaaS) provider called
Heroku. Heroku is a cloud platform for hosting and deploying web applications [36].
They offer different execution environments depending on the selected price plan
[37], [38]. Vizlib selected the standard-1x plan. The content of the plan is
summarized in Table 7. To the author’s knowledge, Heroku has not published any
more detailed hardware specifications.
Table 7 Run-time environment for the scheduler.
Operating System Memory (RAM) vCPU
Ubuntu 18.04 (linux) 512 MB 1x
6.1.2 CircleCI
CircleCI is a PaaS and provides different run-time environments depending on
the selected price plan [39]. Vizlib has selected the Medium+ price plan. The content
of the plan is summarized in Table 8. To the author’s knowledge, CircleCI has not
published any more detailed hardware specifications.
Table 8 Run-time environment for the jobs on the CI platform.
Operating System Memory (RAM) vCPU
Docker container (Debian:jessie) 4 GB 2x
6.1.3 Test server
Table 9 describes the run-time environment for the test server hosting an
instance of Qlik Sense. This server hosts the latest stable version of Qlik Sense,
which is typically released every three months.
Table 9 Run-time environment for the test server.
Operating System Memory (RAM) vCPU
Windows Server 2016 16 GB 4x
6.2 Key program flow charts
The system as a whole has three different events that can initiate the
VRT testing. This section describes what happens and how the system acts upon
these different events.
6.2.1 CircleCI workflows
In CircleCI, workflows are divided into jobs. A job contains a sequence of steps
to be executed, where a step is an arbitrary command defined inside a configuration
file. Each job is executed inside its own Docker container. This enables jobs to be
executed in parallel or in sequence, as they run within their own contexts. When
creating a workflow, dependencies are defined among the jobs in order to determine
whether a specific job can run in parallel or must wait for another job to complete
before executing. Specifying when a workflow should be triggered on CircleCI is
done by defining filters, which declare which branches or tags should trigger the
workflow. Workflows also enable something CircleCI refers to as attaching and
persisting to a workspace. Persisting something to a workspace means that the
specified build artifact is stored and made available to other jobs. To retrieve build
artifacts from previously executed jobs, a job simply attaches the workspace, and all
previously persisted artifacts become available within the current job’s context.
6.2.2 Implemented workflow on CircleCI
Figure 6.1 shows what the implemented workflow executes upon different
events triggered on Github.
Figure 6.1 Overview of the implemented workflow on CircleCI.
6.2.2.1 CircleCI Job 1: Build Extension
The first job in the workflow builds a production version of the extension. It
begins by fetching the code associated with the event that triggered the workflow
from Github. After the code has been fetched, it installs all the project dependencies
needed to build a version of the extension. It then executes the
build script, building the extension. If the build fails, the CircleCI job aborts, stores
logs of what failed, and displays on Github that the associated commit has
failed. If the build succeeds, the produced build artifacts are uploaded and persisted
to the workspace, and the workflow continues with the second job.
6.2.2.2 CircleCI Job 2: Execute VRT tests
The second and last job in the workflow executes the VRT tests. The job begins
by attaching the workspace created by the previously executed jobs in the
workflow. After the workspace has been attached, the job starts the test runner,
Cypress. The first task the test runner performs is authenticating towards the
scheduler, ensuring that only authorized requests can allocate licenses. It
authenticates by making a POST HTTP request with credentials provided in a
configuration file for the test runner. The second task the test runner performs is
allocating a license from the scheduler, which it does by making a GET HTTP
request. If the scheduler fails to find an available license, the test
runner retries every 10 seconds until the scheduler finds an unallocated license.
Once an available license is found, the test runner uploads the build artifacts created
by CircleCI Job 1 to the test server. The test runner authenticates to the test server
using the license retrieved from the scheduler. After the build artifacts have been
uploaded, the test runner downloads the image baseline it should use. After the
download is complete, the test runner executes the VRT tests. When the test runner
has completed running all test suites, it sends a DELETE HTTP request
to the scheduler, deallocating the used license. After the license has been deallocated,
a test report is created. Lastly, the testing artifacts produced by Cypress, such as a
video of the execution and the testing report, are stored in CircleCI. Figure 6.2 provides
an overview of how the different systems communicate with each other upon the pull
request event as a sequence diagram.
Figure 6.2 Sequence diagram over how the different systems communicate upon a
triggered event from Github.
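The license-polling step in this flow can be sketched as a small helper that keeps asking the scheduler until a license is granted, pausing between attempts. The request function is injected, so the sketch runs without a real scheduler; the function names are hypothetical, and the real implementation waits 10 seconds between retries.

```typescript
// The scheduler either grants a license or reports that none is free.
type LicenseResponse = { license: string } | null;

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Retry the injected request (a GET to the scheduler in the real
// system) until it returns a license, sleeping retryMs between tries.
export async function acquireLicense(
  request: () => Promise<LicenseResponse>,
  retryMs: number,
): Promise<string> {
  for (;;) {
    const res = await request();
    if (res !== null) return res.license;
    await sleep(retryMs); // no license free yet; try again later
  }
}
```

Injecting the request function also makes the retry logic unit-testable with a fake scheduler that refuses the first few attempts.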
6.3 Key Interfaces of the software system
This section describes the key interfaces of all systems involved and which
components the different actors interact with. Section 6.3.1 describes the developed
API endpoints for the scheduler. Section 6.3.2 describes what has been added to the
interface on Github. Section 6.3.3 describes how the CircleCI interface is used.
6.3.1 The Scheduler
The scheduler is responsible for keeping track of what is being tested on
which test server, how many licenses are available for every server at any given
point in time, and providing the test runner with a valid license to use. The external
interfaces of the scheduler are the developed API endpoints. These endpoints are
presented in Table 10, including the required HTTP method, the URL, and a
short description of what the endpoint is for.
Table 10 Endpoints for the scheduler.
# Method URL Description
1 GET /token/list Gets all licenses currently allocated by the scheduler.
2 GET /token/{extension}?server={server} Requests a license for an extension.
3 DELETE /token Releases an acquired license.
4 DELETE /token/freeall Releases all licenses currently allocated on the scheduler.
5 POST /auth/login Authenticates towards the scheduler.
When the scheduler allocates a license, it uses a first-come, first-served algorithm. This choice was based on the fact that each testing occasion was deemed equally important, so no advanced prioritization algorithm was required. The test runner, Cypress, is the primary component in the system that interacts with the scheduler. The scheduler contains a configuration file where all the test server meta information is defined. The configuration file expects the following two items to be defined for each server: a test server URL and the number of available testing accounts on the server. Based upon the information provided in the configuration file, the scheduler sets up the expected resources for each defined server.
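As an illustration, the configuration and the resource setup described above could look like the following sketch. This is not the actual scheduler source; the field names and server URLs are assumptions made for the example.

```javascript
// Illustrative sketch: a configuration with two test servers, holding two and
// one available testing accounts respectively (names and URLs are assumed).
const config = [
  { url: "https://test-server-1.example.com", accounts: 2 },
  { url: "https://test-server-2.example.com", accounts: 1 },
];

// Expand the configuration into a pool of testing instances: one slot per
// available testing account on each server, initially unallocated.
function buildInstancePool(servers) {
  return servers.flatMap(({ url, accounts }) =>
    Array.from({ length: accounts }, (_, i) => ({
      server: url,
      account: i,
      extension: null, // no extension holds this slot yet
    }))
  );
}

const pool = buildInstancePool(config);
console.log(pool.length); // 3 slots in total
```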
When the test runner requests a license from the scheduler, it uses endpoint #2 in Table 10. If the query parameter server is not provided, the scheduler searches among all listed servers and investigates whether a server has an available license and the extension is not currently being tested on that server. If an available testing instance is found, the scheduler allocates it, generates a license using JWT and responds with the generated license and which server is available. When the query parameter is specified, the scheduler searches for an available license only on the server provided.
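The first-come, first-served search can be sketched as follows. This is a minimal illustration, not the thesis implementation: the pool representation is an assumption, and the JWT signing step is replaced by a placeholder string.

```javascript
// First-come, first-served allocation sketch. A slot is usable when it is free,
// matches the optionally requested server, and its server is not already
// running a test for the same extension.
function allocate(pool, extension, server = null) {
  const candidate = pool.find(
    (slot) =>
      slot.extension === null &&
      (server === null || slot.server === server) &&
      !pool.some((s) => s.server === slot.server && s.extension === extension)
  );
  if (!candidate) return null; // no license available anywhere
  candidate.extension = extension;
  // Placeholder for the signed JWT the real scheduler would generate.
  const license = `jwt-for-${extension}@${candidate.server}`;
  return { server: candidate.server, license };
}
```

Because the search simply takes the first matching slot, earlier requests are always served before later ones, which matches the equal-priority assumption stated above.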
After the test runner has completed its execution, it uses endpoint #3 in Table 10 to release the allocated license. The body of the DELETE request contains which extension has been tested and the server it has been tested against. This is needed for the scheduler to determine which testing instance the request refers to. When the scheduler receives this request, it simply searches for the allocated testing instance and marks it as available again.
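The deallocation step can be sketched in the same style (an illustration under the same assumed pool representation, not the thesis source): the scheduler locates the testing instance by extension and server and marks it available.

```javascript
// Release the testing instance identified by (extension, server).
// Returns false when no such allocation exists, the case in which the
// scheduler responds with status code 202 as described below.
function release(pool, extension, server) {
  const slot = pool.find(
    (s) => s.server === server && s.extension === extension
  );
  if (!slot) return false; // nothing to deallocate
  slot.extension = null;
  return true;
}
```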
Endpoints #1 and #4 in Table 10 are primarily used by a system observer to observe what is currently being tested on the servers and to reset all allocations currently held by the scheduler.
Endpoint #5 in Table 10 is used to authenticate towards the scheduler, ensuring that unauthorized requests cannot allocate licenses. This is always the first request the test runner performs in order to allocate a license. The body of the POST request contains predefined credentials for the scheduler. Upon a successful authentication, the scheduler responds to the request with a JWT. This JWT must be present in the header of every request in order to use endpoints #1, #2, #3 and #4 in Table 10. If the token is not present, the scheduler responds with status code 402, meaning the request is unauthorized. Upon successful requests, the scheduler always responds with status code 200, for all listed endpoints. If a request is authenticated but the scheduler has failed to allocate a license because every server was unavailable, or the scheduler tries to deallocate a license that is not allocated, the status code of the response is 202. If endpoint #2 in Table 10 is used and the query parameter contains a server that is not defined in the scheduler's configuration file, the response has status code 404, meaning the resource is not found.
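The response codes described above can be summarized in a small sketch. The outcome names are assumptions introduced for the example; the codes themselves follow the scheduler behaviour described in this section.

```javascript
// Map a request outcome to the scheduler's response status code.
function statusFor(outcome) {
  switch (outcome) {
    case "missing-token":
      return 402; // unauthorized request, as defined by the scheduler
    case "unknown-server":
      return 404; // server not present in the configuration file
    case "no-license-available":
    case "not-allocated":
      return 202; // authenticated, but nothing could be (de)allocated
    default:
      return 200; // successful request
  }
}
```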
6.3.2 Github
Another key interface of the system is Github, as it is the interface the developers and testers interact with. When CircleCI is used with Github, it automatically links the status of triggered CircleCI jobs to the commits in the repository, see Figure 6.3. Github has been configured to check the status of the CircleCI jobs and requires all jobs to have passed for a pull request to be approved. This ensures the tests have been executed before any new code gets merged to specific branches, see Section 5.1.2.
Figure 6.3 Status of triggered workflow viewed from Github.
6.3.3 CircleCI
An additional key interface of the system is the CircleCI platform. The platform stores all executed workflows, including the results of the builds and the testing artifacts produced by them. When viewing the status of a build in Github, a link to the specific jobs that have been executed on the CircleCI platform is automatically provided, see Figure 6.3. Upon clicking one of the jobs in the workflow, a view is shown including all tasks executed for that job. In this view, CircleCI highlights which tasks have succeeded, which have failed and their execution times, see Figure 6.4.
Figure 6.4 Example of CircleCI job 2, viewed in CircleCI.
On CircleCI’s webpage, an overview of the workflow and the relations between its jobs can also be viewed, see Figure 6.5.
Figure 6.5 An overview of the CircleCI workflow viewed from the platform webpage.
CircleCI has been configured to store the testing artifacts produced by Cypress. This includes a video of the test execution and images highlighting the difference between the baseline and the failed tests, see Figure 6.6 and Figure 6.7.
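A CircleCI configuration fragment for storing such artifacts could look like the sketch below. The paths are assumptions for illustration, not the project's actual configuration; `store_artifacts` is the CircleCI step that attaches files to a build.

```yaml
# Illustrative fragment of a CircleCI job's steps (paths are assumed).
# store_artifacts uploads the Cypress videos and snapshot diffs so they
# can be browsed per build in the CircleCI web interface.
- store_artifacts:
    path: cypress/videos
- store_artifacts:
    path: cypress/snapshots
```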
Figure 6.6 An overview of the testing artifacts stored in CircleCI.
Figure 6.7 An overview of the failed tests in CircleCI, showing the tests that have failed
during an execution.
6.4 Brief summary
This chapter describes the system implementation. It begins by describing the system environments used by the different components in the system. The chapter then describes the flow through the system and how the different components interact with each other, using both an execution diagram and a sequence diagram. Lastly, the key interfaces of the system are described, including how they are used and how they work.
Chapter 7 Results
This chapter describes the results gained from the conducted interviews and the
unstructured observations. Section 7.1 presents the results from the coding process of
the interview sessions. Section 7.2 presents the observations made throughout all the
cycles.
7.1 Interview sessions
From the coding process, a total of 99 statements were linked to a code. Table 11 shows the number of occurrences of each code.
Table 11 Total occurrences of codes from coding process.
# Code Description Occurrences
1 Test Maintenance Statements concerning maintaining the tests. 10
2 Implementation Effort Statements concerning the efforts for implementing test suites. 8
3 Testing Tool Statements concerning the testing tool used at the investigated company. 13
4 Feedback Time Statements concerning the feedback time to the developers. 15
5 Test Reliability Statements concerning the reliability of test results. 14
6 Test Automation Statements concerning the test automation. 11
7 Development Process Statements related to the development process. 11
8 Organization Statements related to organizational concerns. 17
Since these interviews were held in different cycles with different intentions, the code appearances have been linked to each cycle. Table 12 shows from which interview each code was detected. The letters A, B and C represent an interview subject in each cycle.
Table 12 Code appearances from each interview linked to the corresponding cycle.
Cycle 1 Cycle 2 Cycle 3
Code # A B C Total A B C Total A B C Total
1 0 1 1 2 3 1 1 5 0 1 2 3
2 1 2 1 4 1 2 1 4 0 0 0 0
3 4 1 0 5 2 4 2 8 0 0 0 0
4 2 1 1 4 2 1 1 4 1 3 3 7
5 1 4 1 6 4 1 1 6 1 0 1 2
6 3 3 1 7 2 1 1 4 0 0 0 0
7 0 3 0 3 2 2 2 6 0 1 1 2
8 2 3 1 6 2 3 3 8 2 0 1 3
Total 13 18 6 37 18 15 12 45 4 5 8 17
Cycle 1 had a total of 37 code appearances, Cycle 2 a total of 45 and Cycle 3 a total of 17. Table 13, Table 14 and Table 15 summarize the interviewees' backgrounds and the metrics collected from all interview sessions.
Table 13 Background and collected metrics from Cycle 1 interviews.
Cycle 1 A B C
Background in software industry 11 years 13 years 20 years
Experience with VRT Indirect experience 6 months 3.5 years
Believes VRT is a complement yes yes yes
Confidence in VRT test results yes yes yes
Believes higher quality with test automation yes yes yes
Table 14 Background and collected metrics from Cycle 2 interviews.
Cycle 2 A B C
Working experience as tester 3-4 months 2.5 years 4 years
Confidence in VRT test results yes yes yes
Prior experience with VRT no yes yes
Believes VRT is a complement yes yes yes
Time to produce a test on average 2 days on average Between 1 hour and some days Some days
Table 15 Background and collected metrics from Cycle 3 interviews.
Cycle 3 A B C
Working experience as software developer 3 years 11 years 2 years
Confidence in VRT test results yes yes yes
Proportion maintaining vs developing new features 60-40 50-50 75-25
Difficulty implementing new features without introducing new bugs because of code complexity Impossible Very difficult In general, difficult
Believes VRT is a complement yes yes yes
7.1.1 Test Maintenance
Two points of discussion were linked to test maintenance. The first concerned the high coupling of the developed product with Qlik. Four interviewees discussed that if the version of Qlik on the test server were updated, there could be implications for the test suites and the image baseline of the VRT tests. Two quotes describing the issue have been extracted from the transcribed material below.
“Sometimes when Qlik releases a new product, they can update new CSS classes
or something similar, so if we are dependent on some asset in the HTML and they
would change, we would have to update the tests, again and again.”
“Another thing that might become an issue, is if we decide to upgrade the
version of Qlik on our tests server, this means that the baseline will probably need to
be updated, and some test suites might break as they might update their API.”
The second point of discussion was identified by three interview subjects. They were concerned that new features could require the image baseline to be updated, as the application might change visually. Two quotes describing the issue have been extracted below.
“That is a big one, if a new version comes there is a risk of us needed to revisit all our tests. Also maintaining the image baseline. If we release a new feature, we need to ensure the baseline is correct, as the application might look different.”
“I would say it would be the test maintenance, because if something changes in
the extension code or even the Qlik sense version we are testing against, can impact
our tests. So, it will be a challenge to keep the tests updated. I think however, this
will be manageable, but it is important to keep an eye on your tests and check that
they are up to date.”
This was experienced and mentioned during the Cycle 3 interviews, see the two
quotes below.
“Right now, a thing we have an issue with is, if someone from QA updates
something on the server, the test may fail when checking the difference between my
feature branch and the demo application.”
“So recently we change our rebranding, which made the tests fail on all our
branches.”
7.1.2 Implementation Effort
All the interview subjects shared the same view on the implementation effort: it has increased with the introduction of the testing tool, but they believed it would decrease over time. Two quotes expressing their thoughts have been selected below.
“I think theoretically it should decrease, but I kind of see it as a kind of
logarithmic function. At the beginning it might be a lot of work, then it hopefully
turns to zero”
“Over time it decreases, because you can avoid regressions, context switching,
revisiting and diving into old tasks. I think it really pays off to do that.”
Furthermore, two of the interviewees discussed some issues encountered when developing tests. The two quotes below describe the issues.
“When working with servers, it is sometimes hard to decide about waiting times
or timeouts. We must specify, how much time should we allow it to take before
something should render. This can be tricky to decide. Sometimes it is about
animations, and other times it is about latency.”
“So sometimes, because of the nature of the visual testing and because we have
to wait for somethings to render, we are affected of the certain load time of elements.
Although, you can wait, we might sometimes assume an element is loaded but it is not
actually loaded, so one thing gets rendered but not the other. You can run the same
tests twice and have different outputs.”
Lastly, one of the interview subjects compared the implementation effort to that of another common testing framework.
“I remember in protractor, one time, we had a big problem to integrate image
comparison and was surprised how easy this was with Cypress.”
7.1.3 Testing Tool
Five interviewees expressed that the testing tool, Cypress, was easy to use. In the interviews during Cycle 2, when the testers were interviewed, all of them agreed upon this, even though no interview question was designed to investigate it. Four quotes in which the interviewees express how easy Cypress is to use have been selected from the interview material.
“Cypress is really cool since the entry level threshold is really small. It is very
easy to use. You understand the thought process really easy.”
“Easy to use, even though I don't have a background in development. I only
have learned a little bit of simple JavaScript.”
“I think it was pretty easy, for the first tests. To start to develop anything. I'm
experienced with developing so a transitioning to a test tool wasn't difficult for me. I
thought it was easy to understand how to write the tests. The first setup, it took me
maybe a week to understand the tool.”
“Easy to use for sure. The entrance point is very small, if we are talking about
cypress in comparison with protractor, which is working on top of angular.
Protractor needed a lot more knowledge about different things working in the web
browser. It is easier for testers.”
Two interview subjects mentioned that the support for cross-browser testing is
beneficial. The quote below has been selected from when an interview subject
discussed the area.
“Cypress cross-browser testing is also beneficial. Currently, IE is not
supported, however, just by being able to run the tests upon a different browser,
really helps QA.”
One interviewee mentioned another benefit with the testing tool, expressed in
the quote below.
“Cypress can easily work with image comparison techniques.”
7.1.4 Feedback Time
When asked whether they believed the feedback time would decrease, all the interview subjects stated that it would. Four quotes have been selected from their answers.
“Definitely lowers the feedback time. For me the testing and CI pipeline is
about the feedback loop, the sooner I know which line that broke something, then I'm
not going to write that line. I believe that everything that contributes to a shorter
feedback loop is beneficial, which is both visual testing but also other forms of
automated tests.”
“If we are talking about manual testing, like smoke tests, it should help because
all existing automated tests are connected to some tests in the smoke tests, can be
done automatically. So instead of 15 minutes of clicking inside the application, we
can look into the screen and in two minutes view the execution of the automation and
determine if the application behaved in the same way.”
“When we are using CircleCI, we are running these tests much more frequently,
which means that these tests will be been passed before they reach the manual
testers. There is a problem with issues bouncing back between the developers and
testers. And right now, the developers don't need to speak with the testers and wait
for the rejection, because the tests are going to be run upon the commit
automatically.”
“With the integration with CircleCI the developers receive quick answers and
they can fix it. This leads to less work for QA.”
One of the interview subjects mentioned that the feedback time depends on the
situation. The quote below describes the concern raised.
“That depends probably about specific situations, sometimes automated tests
can help, sometimes they can be totally irrelevant.”
This was additionally mentioned in the Cycle 3 interviews when the developers
were interviewed, displayed in the three quotes below.
“With our current solution with CircleCI I have almost instant feedback. We
can talk about minutes, I have a feedback that the version is built correctly, results
from your tests and so on. In terms of machine feedback, it is in terms of 5-10
minutes.”
“This is similar to our previous questions, like how many days do I have to wait
for QA and now this is something that is automatic, so it happens immediately after
pushing something to Github. So quick feedback and performance.”
“So, the feedback that we receive through the CI is really fast, but not as fast as
you'd might expect. When you run these tests locally, you have many things
preinstalled which is not the case for the CI were dependencies need to be installed
for each run, even with caching, takes a little bit of time. So, when running the tests
locally, the execution time is maybe around 1-3 minutes, meanwhile in the CI
environment it can take up to 5 to 10 minutes. It is still very fast and okay, but these
minutes adds up each iteration.”
Lastly, in Cycle 3, two interviewees mentioned that the tests have been helpful. However, all the interviewees mentioned that the test coverage is too low, as the implementation is in its early stages, displayed in the three quotes below.
“When we are talking about Cypress at Vizlib, we don't have too many
functional tests today since we have been more focused upon the smoke tests.
However, with greater coverage, I'm sure we are able to prevent a lot of different
bugs.”
“I found cypress very helpful, as it sometimes it has found a couple of bugs that
I've couldn't find or didn't look thorough enough. For example, I maybe didn't expect
the changes to affect a certain part because somethings are so complex it hard to
know the implications and cypress has been able find these issues. As the state of
right now, I don't think we have enough coverage that we can only rely on these tests
only. The cypress tests find something, I'm very thankful for that, but we have too few
tests. So, this means we are probably not spotting other errors. I want to be more
confident in these tests, but at the time being I am not. They have found bugs in my
work and I'm very grateful for that, but we don't have enough coverage currently. It
is very good that it covers the basic functionality that our clients use, but I would like
more.”
“Actually yes, recently we had a situation with Cypress, so thanks to that, it was
actually quite big, we were able to spot 2 different issues. So, I would say yeah, it
covers the bugs we usually deal with.”
7.1.5 Test Reliability
In the interviews, all the interview subjects were asked if they felt confident in the test results produced. All of them said that they were, see Table 13, Table 14 and Table 15. One of the purposes of the first cycle interviews was to discuss the intent of introducing VRT tests. Three quotes have been extracted below, describing two different concerns raised by two interviewees during the first cycle interviews.
“Another drawback would be accuracy. So, the accuracy might damage the
reliability that I'm after. So, for example, if you run the VRT tests and it works for
you and then you say yeah, the development is on point and the tests work. And then I
test it again and it doesn't work, it makes me think, if I run it later, will it work? I
would say it is tricky to get a full deterministic output, were every time the
application behaves the same way. This is more of a challenge rather than a
drawback.”
“Another drawback would might be relying on the tests too much. That's the
thing, on one hand I can see that it can save us time at QA, on the other hand if it is
not 100% accurate or reliable, it means I'm still unsure if it is working or not. That
basically means that I need to double up the time, in that case we are running the
Cypress tests but then we are still doing some manual exploration. So even though, it
is supposed to save time, at the moment it is an overhead as we need to do this
double check.”
“If you don't have trust or confidence, I have had this feeling as well, uhh, i'm
not sure if it works. I better check. But in that case, it means that we haven't covered
it enough and that simply means there is still work to do. If I have automatic testing
and they don't give me confidence, it means I don't have enough coverage, or I
simply don't trust my test which means I not testing what I should be testing.”
In the second cycle interviews, when the testers were interviewed, all testers mentioned that they feel confident in the results when Cypress yields positive results. A quote from each tester is listed below.
“I really trust the results that Cypress produces. So, if everything passes, I will
not doublecheck it manually.”
“I am really confident when Cypress passes all tests, I believe there is no need
to explore further if a test exists for what is being tested.”
“We have some issues that we need to update some timeouts and the image
baseline. However, if Cypress passes all tests, then I am confident in the results. If
something is bad Cypress is really good at reporting this.”
Furthermore, one interviewee mentioned a difficulty caused by the high coupling with Qlik. The quote from the interviewee is listed below.
“We have specific kinds of projects (Extensions) that are included in a different
bigger project which is Qlik sense. This is sometimes problematic, since we tend to in
a little way, also test Qlik sense which we don't have control over.”
7.1.6 Test Automation
From all the interviews, no one believed automated tests could completely
replace manual testers. However, all the interviewees held the view that the
automated tests are a good addition to the testing process. The two quotes below
come from the first cycle interviews, investigating why they wanted to introduce
VRT.
“I see it as a repetitive task, so instead of spending 3 hours manually for one
person performing the task, I believe we can automatize that and make it more robust
to reduce the human error from repetitive tasks. So, we can focus the QA-team on the
more manual exploration, which actually requires a higher level of brain power.”
“Manual testers aren't able to test everything in each release. This is why work
that is repeatable, is what we want to automize. We want to save the time, as we can
write the test once, we can save time by doing the exact same thing by person. Right
now, after each release, I am asking myself, well hmm, we didn't actually check 50
other bugs that we had earlier with a client with this extension.”
Two quotes have been selected, describing why the interviewees believe
automated tests cannot replace manual testers.
“But if we are talking about the overall picture of the extensions and somewhere
where human intelligence is important to check if something is nice or not nice,
automated tests for now, will never replace this.”
“It is rather hard to implement every manual test into a testing script.
Especially new features, they have to be tested manually first to find edge cases and
to seek how it works. Testers notice many visual things, such as "this doesn't look
good", which I believe cannot be achieved easily with Cypress.”
During the second cycle interviews, two of the interviewees mentioned they had
trouble automating an export feature within the extensions. The quote below
describes the issue at hand.
“For example, we haven't still found a way to test the export feature properly, in
the way that the file produced by our extension looks good. I would say it is
challenging.”
Another issue detected from the second cycle interviews is described by the
quote below.
“We have some difficulties with the baseline, when testing locally versus a
production environment, this is because the locally produced version of the extension
will slightly appear different than the production version. This means the tests when
comparing images will fail. But this is a very small issue.”
Additionally, two insights were brought up during the first cycle interviews concerning the cost of running the VRT tests and the benefits of running them. The quotes have been extracted below.
“They might be quite expensive to run, because they are more costly, time
consuming and memory costly to run as they run upon a browser. (Compared to
other testing approach such as unit testing)”
“I think the benefit is that you can have this ultimate check to ensure the
product works as it actually should. Which I think is really hard to have this
confidence by using functional testing, were you test in isolation.”
7.1.7 Development Process
During the second cycle interviews, the testers described how long the manual procedure for performing a smoke test takes. Their answers ranged from half an hour to one hour, see the quote below.
“Smoke tests (Manual) takes around half an hour to do, meanwhile expletory
features can take around 5-6 hours.”
All testers mentioned that efficiency and effectiveness have increased as a result of the VRT tests. The three quotes below show what the testers expressed.
“It will decrease, because for example the things I was doing previously, like
smoke tests took around 1 hour. Right now, we are covered in the most cases with
Cypress. So, it is like, two clicks, and then you run the tests and that's all you have to
do. In the end it will decrease. But now in the beginning, it has increased, as we have
spent some time writing the tests, before we can use them. We are removing things
that we previously had to do manually, so we are saving a lot of time. We are sure
that for each iteration, we cover the same things.”
“Also running the Cypress upon smoke tests, I believe it finds more bugs and is
quicker, which leads to quicker releases. Smokes tests usually take some time for the
tester and now they are quicker.”
Two interviewees mentioned an increase in test coverage compared to only
performing manual testing. See the two quotes below.
“Also, Cypress found some cases that the QA team didn't find. Even when I was
writing some new test cases for some extensions, we found 2-3 new bugs that hadn't
been detected before. However, overall, I believe it will decrease the time for the
testers.”
“I actually saw that happen already, the QA-team was developing an Cypress
test for an extension and by doing that, they found several bugs, that later were
reported and fixed and that means we now have a test to prevent them from
happening.”
Additionally, an interviewee mentioned a benefit of writing tests. See the quote
below.
“Because we are able to catch things in a production environment and also
increase quality awareness within the product, because we are being quality
conscious. That means that if we create a new feature, we are going to having
implement a new test, so that will mean it will be like having a second manual test .”
In the first cycle interviews, an interviewee mentioned that one intent of introducing VRT was to prevent the testers from having to test faulty issues, as this increases the workload with redundant tasks, see the two quotes below.
“To prevent faulty issues to QA. So, by doing that, we could optimize the QA
process, leading to less false positives. So, if something doesn't pass our current
standards there is no point for QA to start testing it, because now it is a bottleneck.”
“There is nothing more frustrating from a QA-perspective then testing
something that doesn't work, then it gets back to development, then comes back to
QA and gets rejected and bounces back and forth like this. That is the worst thing
ever. Whatever we can do to reduce the risk of this happening, is worth
investigating.”
When the developers were interviewed during the Cycle 3 interviews, they all mentioned that it is difficult to implement new features without introducing bugs, see the two quotes below.
“Yeah, it is very difficult to introduce something new, there is a lack of
documentation for our extensions and additionally unit tests testing the documented
functionality, so it is difficult to find out if you broke something. It is very time-
consuming to investigate that something is working right or even to prove that
something is working right, without unit or e2e-tests. As the products have grown, it
takes longer time for something simple to implemented as compare to previous
features when the products were less complex. Because you need to fit your solution
carefully and ensure should haven't introduced any regression or so. It is not an easy
task.”
“No, or it depends on the features, if it is something like breaking changes,
which we did recently I suppose it is very difficult do without bugs. The complexity
for each extension is almost the same, I would say they are very complex.”
7.1.8 Organization
During the second cycle interviews, the testers were asked which tasks they found most time-consuming or tedious. The four quotes below were extracted from their answers.
“Manual testing is often very monotonic, with sometimes very repetitive tasks. It
is very important to change projects, at Vizlib, that means working on a different
extension.”
“For me, personally, compare reviews. With the snapshot testing this really
assists us with having a compare view directly. If you compare doing this task
manually, it becomes really tedious.”
“I would say, repeatable tests, because it is more exciting to test new features,
rather than testing the same thing again over and over again. Also, that issues
reappear, so after you have rejected something, you know the same tests will be
needed to check it again, testing the same thing all the time.”
“For example, that the server is down or that the building process is not
working as expected. It is very time consuming, for example if we are testing against
a server, and it is not working, then we must check against another server and then if
that server doesn't work you have to check it locally.”
When the testers were instead asked which tasks they found the most enjoyable,
they gave the following answers.
“Testing totally new things is the most fun to test. It's all about being creative
and finding new use cases to test from the customers perspective.”
“New features, because they are more creative rather than going for all the
steps required to reproduce a bug.”
“The most interesting ones, the most complex testing. I for example, did a test
where I put the Mario game into one of our extensions. It was very exciting,
challenging and fun.”
From the first cycle interviews, two of the interviewees mentioned one of the
reasons why they wanted to introduce VRT testing, see the two quotes below.
“One of the main concerns we have is regression bugs.”
“Fight with regression bugs.”
From the second cycle interviews, one tester estimated how often these bugs
occur; see the quote below.
“Bug fixing, I would say around 10-20 % of the bugs are regression bugs.”
Lastly, from the Cycle 3 interviews, two concerns were raised; see the two
quotes below.
“In an ideal world, the test should investigate if something has been broken, but
at Vizlib I find it difficult. Let's say we spend 1-2 years implementing functional tests,
there is still will be still some extensions that are unmaintainable, so the tests
execution can't be able to cover everything. We could have 100% test coverage, and
it is still not enough to be sure that everything is fine. I like the way we are heading,
the idea with CI, CD, integration tests, functional tests, fuzzy tests etc. But without
good quality of development, the tests can only be so helpful.”
“From the developer perspective, there is a lack of communication in what
CircleCI returns and the developers. When I see, oh the e2e tests failed. I go to a
specific person or someone from the QA team, and ask what is wrong. Because I
don't understand the logs. So CircleCI is great and CI etc, but it also requires some
kind of introduction, presentation, learning of the new technology.”
7.2 Unstructured Observations
A total of seven observations were documented throughout the cycles. Three
observations were linked to the first cycle, three to the second cycle and one
to the third cycle, see Table 16. A hierarchical tree map has been created,
mapping each observation to an area of concern, see Figure 7.1.
Table 16 Types of observations linked to each cycle.
Type of Observation Cycle 1 Cycle 2 Cycle 3
Limitation L1 L2 -
Challenge C1, C2 C3 C4
Problem - P1 -
Figure 7.1 Documented observations throughout all cycles.
7.2.1 Testing tools
The first observation concerning the testing tool, Cypress, was a limitation
observed during the first cycle: a hook that was not supported by Cypress. A
hook is a listener that triggers a user-defined function when an event is
emitted. Cypress supports several types of hooks, such as before or after a
test suite is executed. However, Cypress does not support the sought-after
event of all test suites having finished execution. This was troublesome,
because the test runner needed to release a requested token back to the
scheduler after all test suites had been executed, which required another
approach. The implemented solution was to create a test suite solely
responsible for releasing the acquired token and to ensure that this suite was
always executed last.
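The workaround above can be sketched as follows. This is a minimal illustration, not Vizlib's actual code: the Scheduler class, suite names and run_all_suites function are all hypothetical stand-ins showing how appending a dedicated cleanup suite emulates a missing "after all suites" hook.

```python
class Scheduler:
    """Hypothetical stand-in for the scheduler that hands out tokens."""
    def __init__(self):
        self.token_held = False

    def acquire(self):
        self.token_held = True

    def release(self):
        self.token_held = False


def run_all_suites(suites, scheduler):
    """Run every suite, then the dedicated token-release suite last."""
    executed = []
    scheduler.acquire()
    # The cleanup suite is appended after all ordinary suites, which
    # guarantees the token is released once everything else has run.
    for name in suites + ["release-token"]:
        if name == "release-token":
            scheduler.release()
        executed.append(name)  # a real runner would execute tests here
    return executed
```

The essential design point is ordering: because the tool offers no hook after the final suite, the release logic is packaged as a suite itself and forced to the end of the execution order.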
The second observation concerning the testing tool was the challenge of
finding a strategy for maintaining the image baseline. This observation was
made during the first cycle. The standard strategy offered by Cypress links
VRT test cases to specific images in a folder. If no images exist in the
folder, Cypress saves the images produced in that test run and skips those
comparisons for that run, since no baseline exists to compare against; the
produced images are then used in future test runs. Vizlib found this approach
problematic, because the images would likely differ depending on which VCS
branch a developer was working on. Additionally, upon merging, developers
could accidentally update the baseline with incorrect images. As a result, the
baseline images were instead uploaded to a 3rd party storage provider, AWS S3.
The test runner downloads these images before its execution, ensuring that all
branches use the same baseline images and minimizing the risk of accidental
updates.
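The shared-baseline strategy can be sketched as below. This is an illustrative model only: the shared store (AWS S3 at Vizlib) is represented as a plain dict, and the function name and folder layout are assumptions, not the actual implementation.

```python
import os


def sync_baseline(shared_store, baseline_dir):
    """Copy every shared baseline image into the local baseline folder,
    overwriting whatever branch-local images may already be there, so the
    test run compares against the same references on every branch."""
    os.makedirs(baseline_dir, exist_ok=True)
    for name, data in shared_store.items():
        with open(os.path.join(baseline_dir, name), "wb") as f:
            f.write(data)
    return sorted(os.listdir(baseline_dir))
```

Because the local baseline folder is overwritten before each run, branch-local screenshots can never silently become the reference, which is the accidental-update risk described above.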
The third observation concerning the testing tool was the challenge of
generating the images for the baseline. Depending on which operating system
the test runner executed from, or which screen resolution was used when
generating the baseline, the test results could yield false positives. This
occurs because different operating systems may render images differently, and
images may be compressed depending on the resolution, affecting the
pixel-comparison algorithms used later by the testing tool. This observation
was made during the third cycle. The strategy for handling this challenge was
to generate all the baseline images directly from the CI platform, ensuring
that the operating system used for generating the baseline images was the same
as the one the test runner executed from.
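Why rendering differences cause spurious failures can be illustrated with a simplified pixel comparison. This sketch is not the tool's actual algorithm; the tolerance value and flat-list image representation are assumptions for illustration.

```python
def mismatch_ratio(baseline, candidate):
    """Fraction of differing pixels; both images are modeled as
    equal-length flat lists of pixel values."""
    differing = sum(1 for a, b in zip(baseline, candidate) if a != b)
    return differing / len(baseline)


def vrt_passes(baseline, candidate, tolerance=0.01):
    """A screenshot passes when the mismatch stays within the tolerance.
    OS- or resolution-dependent rendering shifts pixels and can push a
    visually identical screenshot over the threshold."""
    return mismatch_ratio(baseline, candidate) <= tolerance
```

Even a small rendering difference, e.g. 2 % of pixels anti-aliased differently by another operating system, exceeds a 1 % tolerance and fails the test although no real defect exists, which is why the baseline is generated on the same CI operating system that executes the tests.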
7.2.2 Tested system
Two observations were linked to the tested system. The first observation was
made during the first cycle. The challenge was to upload the extensions to the
test server in a convenient way, from the CI platform or locally. After
discussing with the company supervisor, it came to the author's attention that
the company already had a script for uploading extensions to the test server.
This script was reused and integrated into the test runner.
The second observation was a problem observed during the second cycle. The
author noticed that a specific test always failed when using the dedicated CI
testing accounts on the test server. This problem was caused by an
authorization issue. When the system was first implemented, only one specific
testing account was used. This account was an owner of the application the
tests ran against, and the functionality being tested required the user to be
an owner of an application in Qlik. This became troublesome once the dedicated
CI test accounts were used, since these accounts were not owners of the
application. The solution was to configure the security rules for the
dedicated CI accounts, making it possible to perform the action required by
the test suites.
7.2.3 Continuous Integration platform
Only one observation was linked to the continuous integration platform. This
observation was made during the second cycle. The limitation was triggering a
specific workflow on the CI platform on pull requests and tag creations while
triggering a different workflow upon commits. This configuration was not
achievable; it was only possible to do one or the other, but not both. After
discussing the limitation with the company supervisor, it was chosen to
proceed with triggering workflows upon commits and tag creations. Since a pull
request refers to its latest commit, the workflow is indirectly triggered upon
pull requests. To approve a pull request, the latest commit must have passed
all steps in the CI workflow, ensuring that the tests have run and passed
before a pull request can be approved.
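The resulting trigger model can be stated compactly. This is a conceptual sketch, not CI platform configuration; the event names and status string are illustrative assumptions.

```python
def should_trigger_workflow(event):
    """One workflow is triggered for commits and tag creations only;
    pull requests carry no trigger of their own."""
    return event in {"commit", "tag"}


def pull_request_approvable(head_commit_status):
    """A pull request may be approved only if the workflow run for its
    latest (head) commit has passed all steps."""
    return head_commit_status == "passed"
```

Pull requests are thus covered indirectly: every push creates a commit, the commit triggers the workflow, and the approval gate checks that commit's workflow status.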
7.2.4 Support software
One observation was made concerning the support software. This observation
was made during the second cycle, when the CI solution was being integrated with
more extensions. It was noticed that build destination from the build script was
different for some extensions. This had implications as the upload script needed to
know from which file destination it should upload the extension from to the test
server. The solution implemented for handling different the build destinations was
integrated into the test runner. In the configuration file for the test runner, a new
variable was added which declared the build location for that specific extension. The
build script later used this variable to determine were the file location was for the
build artifacts produced by the build script.
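The per-extension configuration variable can be sketched as below. The key name `buildDestination`, the JSON format and the default folder are assumptions for illustration, not the actual configuration schema.

```python
import json


def build_location(config_text, default="dist"):
    """Read an extension's test-runner config and return its declared
    build destination, falling back to a default when none is declared."""
    config = json.loads(config_text)
    return config.get("buildDestination", default)
```

The upload step then asks each extension's config where its artifacts live instead of assuming one fixed destination for every project.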
Chapter 8 Discussion
This chapter discusses the results, the method and the work in a wider
context. Section 8.1 discusses the results from the interview sessions and the
unstructured observations. Section 8.2 discusses the method and the threats to
validity. Lastly, Section 8.3 discusses the work in a wider context.
8.1 Results
This section discusses the results from the coding process for each code word
and triangulates them with the literature review. Additionally, it discusses
the results from the unstructured observations.
8.1.1 Test maintenance
The first concern brought up in the interviews was that test maintenance is
affected by the high coupling the extensions have with Qlik. If the version of
Qlik were upgraded on the test server, the test suites would risk needing
revision, as the APIs, the appearance of components and the assets could have
changed. This drawback is not unique to VRT, but rather associated with test
automation in general [1], [3], [9]. It can be mitigated by an architectural
strategy of adding custom identifiers for the test runner to interact with,
rather than using the existing identifiers in the 3rd party software,
decreasing the test runner's dependency. This strategy may, however, not
always be possible, as some components are not accessible for modification.
The same type of issue was identified in the case study conducted by E.
Alégroth et al. [19] when using VGT. The difference is that VGT is sensitive
to visual updates, while VRT is sensitive to asset updates, as VGT uses image
recognition rather than assets to interact with the SUT.
The second concern brought up was the maintenance of the image baseline used
by the VRT technique. Whenever a new feature is developed, it may contain a
visual change requiring the image baseline to be updated, adding a maintenance
aspect to VRT tests that other testing techniques lack. This was also observed
when the company rebranded, causing the test suites to break. This forces a
strategy for deciding how the application should look and when the new image
baseline reference should be used. The strategy used at Vizlib was to upload
the baseline to a 3rd party storage provider; the test runner later downloads
the baseline prior to its execution. Maintaining the image baseline can be
perceived as both a benefit and a drawback. The benefit is that a clear
illustration of how the application should look exists in the form of an
image, revised for every visual feature update. E. Alégroth et al. [19] argue
that adopting a frequent test maintenance strategy is important to avoid test
script degradation and helps lower maintenance cost to a large extent, because
simultaneous maintenance of both logic and images is more complex than doing
it separately. However, constantly revising the image baseline could become
time-consuming. Additionally, when a branch on a VCS is out of sync with the
latest visual feature updates, it will fail upon test execution, since the
image baseline expects the new visual updates to exist. This factor must be
considered when incorporating a strategy for maintaining an image baseline. No
research papers have been found concerning the maintenance of an image
baseline.
8.1.2 Implementation effort
The interviewees throughout Cycles 1 and 2 stated that the implementation
effort increased upon introducing VRT but believed it will decrease over time.
The main concern regarding the implementation effort for VRT raised by the
interviewees was handling load times, animations and latency issues when
developing test scripts. Animation and load-time events are often not
represented as GUI events and are therefore difficult for the test runner to
capture. This is described as a fundamental problem in GUI testing without a
simple solution by M. Jovic et al. [15]; as the number of time-driven
animations in a user interface increases, so does the importance of this
problem. The approach taken at Vizlib is to use timeouts, setting a value to
wait for an animation or load to complete. The issue with this approach is
that too short a timeout causes tests to fail, yielding false positives, while
too long a timeout results in unnecessarily long execution times. When the
test runner can capture these events, timeouts are unnecessary, since the
runner waits until the expected element appears and then continues. This is a
benefit of 2nd and 3rd generation GUI testing tools. Another strategy is to
rerun all failed tests and set a threshold on how many passes must occur for a
test to be deemed passed. A drawback of this strategy is that it increases the
test execution time.
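The rerun strategy can be sketched in a few lines. The parameter names and default values are illustrative assumptions, not a specific tool's API.

```python
def deemed_passed(run_test, attempts=3, required_passes=2):
    """Re-run a (possibly flaky) test up to `attempts` times and deem it
    passed only when it passes at least `required_passes` runs. This
    filters out sporadic timing failures, at the cost of multiplying the
    execution time of every failing test."""
    passes = sum(1 for _ in range(attempts) if run_test())
    return passes >= required_passes
```

A test that fails once because an animation outlasted its timeout but passes on the other runs is still accepted, while a genuinely broken test keeps failing on every attempt.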
Furthermore, latency issues are another challenge observed by the
interviewees: determining how long the test runner should wait before failing
a test. Some actions may take substantial time to perform, and in combination
with latency issues, the execution time may vary from 30 seconds to 2 minutes.
A mitigation strategy suggested in two papers by E. Alégroth et al. [3], [19]
is to minimize remote test execution. This strategy is, however, not currently
possible at Vizlib, since the CI is hosted by a PaaS provider, requiring the
test execution to occur remotely.
8.1.3 Testing tool
Both the first- and second-cycle interviewees reported that the testing tool,
Cypress, was easy to use and easy to set up. It is not uncommon for testers to
lack a background in programming; however, the results indicate that even
these individuals found the testing tool easy to use and the barrier of entry
low for developing test scripts. M. Rafi et al. [21] argue that testing
techniques with an easy learning curve can reduce the high initial investment
of introducing test automation. Furthermore, V. Garousi et al. [1] list easy
test automation tools as an important beneficial factor when transitioning to
test automation. The results show that such tools exist for VRT. E. Alégroth
et al. [19] discuss that VGT is often perceived as easy to use, which can make
it tempting to automate various types of test cases. However, they argue that
the technique should primarily be used for system and acceptance testing, as
maintenance may become costly if the technique is used for immature or
frequently changing functionality.
8.1.4 Feedback time
The results show that the feedback time has decreased, which has been helpful
for both the developers and the testers. With the help of the implementation,
the time it takes for the testers to perform the smoke tests has decreased.
Additionally, the testing frequency has increased for the developers, as the
tests run automatically on each commit. This is an expected result, since it
is usually one of the primary reasons organizations choose to introduce
automated testing in a CI environment, and it was also among the incentives
observed at Vizlib during the first cycle. The results additionally show that
the tests have been able to spot faults that could otherwise have been missed
due to human error, such as forgetting to check an edge case. This is further
discussed in Section 8.1.6.
Two concerns were raised during the Cycle 3 interviews regarding the feedback
received. The first was that the current test coverage is too low to rely on
the test results alone. As the implementation is in its early phase, this
result is not unexpected. The interviewees further explained that as the
implementation matures, the test coverage will increase. However, increasing
the test coverage should be done with caution, as the implementation effort
and test maintenance are affected by this factor, as discussed in Sections
8.1.1, 8.1.3 and 8.1.6.
The second concern was that the execution time, while quick, was not as quick
as one might expect. GUI testing techniques tend to have longer execution
times than other techniques. The interviewees considered this a small issue;
however, it has the potential to become a larger factor as the test suites
grow, and it is another reason why test cases should be introduced with
caution.
8.1.5 Test reliability
The results show that test reliability is important; without it, the
uncertainty can increase the overall workload due to double-checking. One
interviewee mentioned that if one does not trust the test results, it probably
means one is testing the wrong thing, or not thoroughly enough; either way,
there is still work to be done. This is an important mindset when introducing
test automation, because it keeps the focus on the scope of the problem the
test automation tries to solve. Furthermore, the metrics collected in Cycle 2
show that all the interviewees trust the test results when the test runner
passes all test suites. This indicates that VRT is not sensitive to reporting
false negatives, that is, approving poor-quality items under test. The same
result was observed by E. Alégroth et al. [3]. However, as discussed in
Section 8.1.2, there are cases where false positives have been observed, which
can lead to a costly root cause analysis.
The results show that VRT is effective in reporting that a failure has
occurred. The testing tool provides an image highlighting the differences
between the baseline and the image taken by the test runner. Additionally, it
provides a video of the entire test execution, making it easy for a tester to
follow what has happened. However, when a failed test is reported, a root
cause analysis must always be performed, even though the testing tool can
provide some indication of how an error occurred. Therefore, it is important
to minimize false positives by increasing the robustness of the tests. The
same issue was reported by E. Alégroth et al. [3].
8.1.6 Test automation
The results from the first cycle interviews show two reasons why Vizlib wanted
to introduce test automation. The first was to increase test coverage, as they
do not have time to test everything for every release, nor is it feasible to
exhaustively test everything manually. The second was to automate repeatable
tasks to become more effective and to direct the testers' focus away from
monotonous and often boring tasks, minimizing human error. None of the
interviewees believed that VRT could replace manual testing; rather, they saw
it as a helpful complement to manual testing. This result is supported by the
following studies: [1], [3], [8], [9], [19]. These studies argue that
automated or visual tests cannot replace manual testing practices because
automated tests can only detect failures that are explicitly asserted.
Therefore, practices such as manual exploratory testing are needed to
complement the test automation scripts. E. Alégroth et al. [19] and M. Jovic
et al. [9] argue that this is a common misconception in industry, where the
expectation leans towards test automation performing all tasks that humans
can. The interviewees mainly argued that VRT cannot replace manual testing
because it cannot determine whether an application "looks nice", i.e. how
colours match each other or the overall feel of the application; human
intelligence is required for these tasks. Two testers mentioned a test case,
checking the data produced when exporting data from one extension, that they
have not yet found a good way to automate, indicating that some test cases are
difficult to automate. Technical debt increases in the form of test
maintenance and test execution time as the number of test cases grows, which
is a factor to consider when selecting test cases. This applies especially to
VRT at Vizlib, as one interviewee mentioned that execution times tend to be
longer with this testing approach than with other techniques. Additionally,
the more test cases Vizlib holds, the more maintenance is likely required upon
updating the version of Qlik, as discussed in Sections 8.1.1 and 8.1.2.
Lastly, one interviewee mentioned the benefit of using VRT as an ultimate
check, ensuring the application works as it should. The interviewee further
described that this is difficult to achieve with other testing approaches such
as functional testing. This is a benefit of VRT: it captures the end-user
perspective.
8.1.7 Development process
The results show that it takes between half an hour and one hour for the
testers to perform a smoke test, while testing a feature can take up to 6
hours. All the interviewees from the second cycle believed this process is now
more effective, both in time and efficiency. However, they mentioned that
their workload has increased because of the implementation effort of writing
tests, but believe this will decrease over time, as discussed in Section
8.1.2. That the execution time of the tests would decrease and the testing
frequency would increase was an expected result, as discussed in Section
8.1.4. However, the interesting result is the total effort required from a
long-term perspective when transitioning to ATE of VRT. The study period for
this thesis is too short to draw any conclusions in that regard, but the
results indicate potential for positive long-term outcomes.
Furthermore, the interviewees mentioned the benefit of finding previously
unobserved issues while writing tests. One interviewee described this as a
benefit of being quality conscious: the task of writing a test increases
quality awareness and works as a second manual test. S. Berner et al. [9]
discuss that during a test automation effort, 60-80 % of all bugs are found
during the development of the tests. A further investigation would be required
to confirm the proportion of bugs detected; however, the results indicate an
increase in quality consciousness.
The metrics collected during Cycle 3 show that it can take up to 1-2 weeks
before the developers receive feedback from the testers on whether their
changes are approved. The results additionally show a problem with issues
bouncing back and forth between the testers and developers, causing an
overhead of testing bad artifacts from the testers' point of view, and context
switching and revisiting old issues from the developers' point of view. As
discussed in Section 8.1.4, the developers reported quicker feedback, which
could tackle this issue. However, additional data is required to draw any
conclusions.
8.1.8 Organization
The results show that the testers' views on which tasks were the most tedious
were aligned. All the testers mentioned that monotonous or repetitive testing
is the most tedious. Additionally, one tester mentioned that coping with
server uptime tends to become an irritating overhead. The results were also
aligned when the testers were asked which tasks they found the most enjoyable:
exploratory, challenging or complex testing was considered the most
stimulating. As discussed in Section 8.1.6, one of the incentives for
introducing VRT was to automate repetitive tasks, matching the tedious testing
tasks described by the testers. Additionally, the results show that regression
bugs are a challenge that Vizlib is facing. V. Garousi et al. [1] argue that
smoke tests, large numbers of similar tests, and frequent regression tests,
among other factors, are beneficial areas for introducing test automation,
which matches what Vizlib is striving for. Furthermore, V. Garousi et al. [1]
list non-beneficial factors, such as tight integration with 3rd party
software; this has been observed in this study and is discussed in Section
8.1.1. The authors further argue that it is important for an organization to
hold adequate knowledge and competence to succeed in introducing test
automation. Judging by the collected metrics, one could argue this is the case
for Vizlib. However, one interviewee reported difficulty in understanding the
test results produced by the CI system, suggesting that some form of education
plan should be in place. This is an important factor to incorporate, ensuring
all actors can utilize the system and maximize its benefits.
Furthermore, two interviewees mentioned that tests are not always necessarily
helpful; it depends on the issue at hand. The collected metrics show that all
the developers felt it was difficult to develop new features without risking
the introduction of new bugs, because of the code complexity. This indicates
that, beyond testing, other good software engineering practices are important
for achieving products with low defect levels. Considering the various aspects
mentioned above, Vizlib appears to have been a good candidate for introducing
test automation. However, this set of factors may vary in other organizations
and environments; they should therefore be weighed carefully in the decision
process of introducing test automation, as the end results may differ because
of them.
8.1.9 Unstructured Observations
The observations made throughout the cycles indicated factors that had to be
considered and that in turn influenced the implementation design. Some of
these factors are context-based, such as the tight coupling the extensions
have with Qlik. However, it is not unlikely for another organization to face
similar types of dependencies, making these factors important to consider when
deciding whether to introduce VRT, even if the context-based variables are not
the same. Additionally, some general factors may not have been observed in
this thesis because of the context-based variables at Vizlib; i.e. because of
the tight coupling, other factors were not relevant for this type of setting
and therefore not observed.
8.1.9.1 Testing tools
The first testing tool observation, L1, indicates that it is important to
ensure the testing tool fits and performs according to the sought-after needs.
This supports the observations made by Garousi et al. [1] and S. Berner et al.
[9]. Had no solution been found, it could have jeopardized the success of the
implementation, making this important to investigate. Even if other solutions
to this limitation exist, such as implementing the resource allocation inside
the CI platform, such a solution may have been too complex to pursue.
The second testing tool observation, C1, concerned maintaining the image
baseline. The standard solution offered differs depending on which testing
tool is used; regardless of tool, a strategy for maintaining the baseline is
required. The strategy used at Vizlib, as described in the results, was to
upload the images to a 3rd party storage provider and download them before
execution. Factors such as which Git strategy the organization uses may
influence the choice of strategy. For example, a drawback of the strategy used
at Vizlib is that a feature branch containing visual updates will fail the
affected test cases until the image baseline is updated. Additionally, upon
updating the image baseline, all other branches that do not contain the new
changes will fail until they have been updated with the latest code changes.
To the author's knowledge, no research papers exist investigating strategies
for maintaining image baselines for GUI testing.
The third testing tool observation, C4, concerned generating the image
baseline from the same operating system as the one the test runner later
executed from. This observation also concerned the fact that different
resolutions of the baseline images may affect the test results. This indicates
that a strategy for determining how the image baseline should be generated is
needed, as it affects the reliability of the test results. The strategy used
at Vizlib was to use the CI system to generate the image baseline, ensuring
the images were always produced on the same operating system as the one the
test runner executed from. However, this strategy may have implications,
because the operating system on a local computer may differ from the one used
on the CI platform, which may become troublesome when, for example, developing
tests. To resolve this issue, the implementation only downloads the image
baseline when the test runner executes from the CI platform. If no baseline
images are present when the test runner executes, it automatically saves and
later refers to the images taken during that test execution. However, a
strategy must still be utilized for generating the baseline images the CI
platform should use.
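The decision logic described above can be summarized in a small sketch. The function and the return labels are illustrative assumptions, not the implementation's actual names.

```python
def resolve_baseline(on_ci, local_baseline_exists):
    """Decide which baseline a test run compares against:
    - on CI: always download the shared baseline,
    - locally with a baseline already present: reuse it,
    - locally with none present: save this run's screenshots as the
      baseline and compare against them in later runs."""
    if on_ci:
        return "download-shared"
    if local_baseline_exists:
        return "use-local"
    return "generate-from-current-run"
```

This keeps CI runs strictly pinned to the shared, CI-generated baseline while letting developers iterate on tests locally without fighting OS rendering differences.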
8.1.9.2 Tested System
The first observation concerning the tested system, C2, concerned uploading
the developed software to the test server. This factor is only relevant if the
SUT must be hosted on an independent server. However, to introduce a CI
solution, it is important to ensure that the tested software can be uploaded
to the independent server in an automated manner; otherwise, it would not be
possible to test the code changes made.
The second observation concerning the tested system, P1, concerned
authorization problems when using dedicated testing accounts for the CI
solution. This problem was first noticed during the second cycle, when
implementing the test execution inside the CI environment. Such problems can
be difficult to anticipate prior to an implementation and may have severe
implications. The lesson to be learned from this observation is that a locally
working solution will likely require configuration of various components in
the system to function in a CI environment. This factor should be accounted
for when considering the implementation effort.
8.1.9.3 Continuous Integration platform
Only one observation was made concerning the CI platform, L2. The limitation
observed was that a sought-after configuration, triggering different workflows
based on pull request and commit events, was not achievable. The solution was
to trigger the same workflow for every commit. The consequences of this
observation were mild but could have had a more drastic effect. When selecting
a CI solution, it is therefore important to investigate whether the solution
satisfies the sought-after needs.
8.1.9.4 Support Software
Only one observation was made concerning the support software, C3. The
challenge observed was that the build destination used by the build script was
not standardized. A benefit of introducing test automation is that the
solution can be reused across different projects consisting of the same type
of software. However, some components may need to be standardized in order to
reuse a CI configuration and experience this benefit, which is a factor to
consider.
8.2 Method
This section discusses the method's threats to validity based upon the
guidelines presented by P. Runeson [23].
8.2.1 Construct validity
The aim of the study is to investigate the effects of introducing VRT to a CI platform in an industrial environment. The data collected from the interviews is aimed at capturing the practitioners' experience when transitioning from a fully manual testing suite to a semi-automated testing suite with VRT. In this context, a semi-automated testing suite means that VRT will not replace all manual testing practices, but will replace some of them and additionally add new practices. In order to increase the comprehension of the interview questions posed, a pilot interview was held before each type of interview, minimizing the risk of potential misinterpretations. The data collected from the observations is aimed at capturing decisions the author had to make during the progression of the implementation for the system.
8.2.2 Internal validity & Reliability
The conclusions drawn in this thesis were acquired by careful triangulation of the data collected from the various methods used. The primary data source in this study is the data collected from the interview sessions. A threat to the validity of this data source is how the respondents answered the posed questions, as the respondents knew their answers would be reflected in this study, adding the
risk that the interviewees would not answer truthfully. Two methods have been used to mitigate this risk. The first has been to completely anonymize the interviewees' answers, ensuring they cannot be traced back to any individual and giving the interviewees space to answer truthfully. The second has been triangulation: finding support for claims in other data sources, increasing the likelihood of a statement being true.
This thesis investigates a topic in a complex environment dependent on multiple factors, such as the developed product, the selected frameworks and tools, and the organization's competence and commitment to the adoption, to name a few. These factors could have a large impact on the success and results of the study. However, since the implementation is considered a success by the organization, the likelihood of these factors impacting the results negatively is considered low. Regardless of the success of the implementation, these factors would have clearly shown their impact in the results if they had had an apparent influence.
The risk of bias introduced by the author is an additional factor that needs to be discussed. When presenting the results, the author could have selected only statements that yielded beneficial results, or taken statements out of context, obscuring what was meant by a statement. The first measure taken to mitigate this risk has been to present the raw data from the transcribed material. The second measure has been a grounded theory approach: coding the data during analysis, linking statements to specific codewords and analysing the codewords independently, reducing the author's bias. However, the coding process itself automatically introduces some form of bias, as the author must do some form of interpretation in order to code the data. This has been done to the best of the author's ability, and the data produced from the coding process, together with the transcribed material, is made available upon request.
Lastly, the results in this thesis are based solely upon qualitative data. Future work with additional quantitative data is required to provide stronger support for the claims made in this thesis.
8.2.3 External validity
The largest threat to validity is that the study was conducted at only one company, meaning the results may have low external validity for other companies and environments [23]. This affects the generalizability and replicability of the results as
they may be dependent on various context-based variables. However, similar studies have been carried out using similar methods, technologies and contexts. Similarities among these studies and their results have been observed, indicating some form of replicability of the results and applicability of the conclusions in other contexts and companies.
Furthermore, distinguishing what is caused by introducing VRT, as opposed to introducing some other testing technique with test automation, is difficult. Therefore, the effects of introducing VRT to a context where test automation is already in place, compared to transitioning from a completely manual testing suite, may not be as apparent, as some benefits may already be experienced. However, since VRT is a subset of test automation, general test automation benefits can also be experienced with VRT.
8.3 Work in a wider context
It is important to be aware of, and to raise awareness among the various stakeholders of, the environmental impact of developing and consuming software. When introducing well-utilized practices such as CI, the organisation may become more effective [40]. This could in turn mean that the company would not need to employ as many people, reducing the organisation's carbon footprint. However, it is not common today to take into account the overall efficiency or energy consumption involved in developing, testing and maintaining software [41]. It is therefore an important factor to consider when introducing CI practices: optimizing which jobs are run in the CI pipeline reduces redundant builds and test runs, consequently reducing the energy impact and making software development more efficient.
Chapter 9 Conclusion
This thesis aimed to investigate the research questions stated in Section 1.3. It did so by providing an implementation of VRT in a CI environment. The implementation was divided into three cycles with different purposes, to capture the perspectives of the various stakeholders. Additionally, the author performed an informal literature review and collected his own observations throughout each cycle. This was done in order to gain better insight into the implications raised by the implementation and to be able to triangulate findings. Based upon these findings, the following can be concluded regarding each research question:
RQ1: What practical benefits and drawbacks are associated with introducing
visual regression testing to an industrial CI environment?
Benefits:
- A clear definition, with images, of how the application/system is expected to look.
- VRT is a pessimistic technique and is therefore very unlikely to report false negatives.
- Potential to decrease the total testing effort over time.
- VRT interacts with, and captures, the end-user perspective.
- There exist VRT tools that are easy to use, even without a developer background.
- There exist tools with an easy initial setup.
- There exist tools that offer videos of test runs, making them easier to debug.
- Upon failed test runs, side-by-side images are produced with the differing parts of the image highlighted.
- Frequent revision of how the application is expected to look.
- Enables replacing repetitive manual testing tasks, reducing human error and freeing focus for more enjoyable tasks.
- Increased efficiency when performing smoke tests compared to performing them completely manually.
- Increased quality consciousness; the practice of writing tests acts as an additional testing step.
- Offers wider test coverage that can possibly spot bugs that have not been detected before.
- Increased testing frequency, reducing feedback time.
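To make the pessimistic nature of VRT concrete, the sketch below shows the kind of pixel-level comparison that underlies VRT tools. It is an illustration under assumptions (same-sized images given as flat RGBA byte arrays), not the comparison algorithm of any specific tool: any pixel whose channels differ by more than a tolerance is reported, so visual changes are flagged eagerly rather than missed.

```javascript
// Minimal sketch of VRT-style image comparison: two equally sized images as
// flat RGBA arrays; returns the indices of pixels that differ by more than
// `tolerance` in any channel. Real tools add perceptual metrics and diff
// image rendering on top of this idea.
function diffPixels(baseline, current, tolerance = 0) {
  if (baseline.length !== current.length) {
    throw new Error('Images must have the same dimensions');
  }
  const changed = [];
  for (let i = 0; i < baseline.length; i += 4) {
    for (let c = 0; c < 4; c += 1) {
      if (Math.abs(baseline[i + c] - current[i + c]) > tolerance) {
        changed.push(i / 4); // pixel index
        break;
      }
    }
  }
  return changed;
}

module.exports = { diffPixels };
```

Because any above-tolerance difference is reported, a test only passes when the rendered output matches the baseline; this is why false negatives are unlikely, at the cost of false positives from rendering noise.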
Drawbacks:
- When the developed software is tightly coupled with 3rd-party software, test suites risk breaking upon updates from the 3rd-party software provider, leading to test maintenance.
- Maintaining an appropriate strategy for keeping the image baseline of the VRT tests up to date.
- The challenge of dealing with animations, latency and load times during the implementation of test suites.
- Increased work effort when introducing the implementation.
- A root cause analysis still needs to be performed upon failed test runs.
- Test execution time tends to be longer compared to other techniques such as unit testing.
- A strategy is needed for handling false positive test results.
- Cannot replace manual testing practices completely; it remains a complement to manual testing.
RQ2: Which factors should be considered when implementing VRT in a CI environment?
Factors:
- If possible, keep the test execution and the CI solution in a local network to decrease latency and load time issues.
- Early investigation of whether the testing tool supports the functional and performance needs.
- Early investigation of whether the CI platform supports the functional needs of the intended testing implementation.
- Investigation of the developed software's dependencies on 3rd-party software: how often do their updates occur, and how severe are the implications of their updates?
- A strategy for how the image baseline should be updated and maintained.
- If the application contains animations, an investigation of how the VRT test suites could be affected.
- Ensure a version of the developed software can be uploaded to the SUT.
- If the developed software depends on a 3rd-party SaaS provider, a strategy must be investigated for handling licenses for testing accounts.
- Ensure the organization holds enough knowledge and competence around the testing techniques and tools used.
- The reliability of the test results is important; ensuring the robustness of the test suites is therefore crucial.
- Introduce automation only for tests that are considered repetitive tasks.
- Test cases should be carefully selected in order to keep the execution time low, as VRT tends to have a longer execution time than other testing techniques.
- An investigation of whether the CI platform supports parallel test execution, if execution speed is crucial.
- An education plan for all the various actors in the system, ensuring that all actors know how to draw knowledge from the system and maximize its benefits.
- A strategy must exist for generating the baseline images, as these images are affected by both resolution and operating system.
- An investigation of which browsers are supported by the CI platform in order to enable cross-browser testing, if this trait is important.
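As a small illustration of the baseline-related factors above (an assumed naming scheme, not the strategy used in the thesis), baseline images can be keyed by operating system, browser and resolution so that runs in different environments never compare against, or overwrite, each other's baselines:

```javascript
// Hypothetical sketch: derive a baseline image file name from the test name
// and the environment it was captured in, so each OS/browser/resolution
// combination keeps its own baseline.
function baselineName(testName, { os, browser, width, height }) {
  return `${testName}.${os}.${browser}.${width}x${height}.png`;
}

module.exports = { baselineName };
```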
9.1 Future work
Based upon the results and the discussion in this thesis, it would be interesting to further investigate strategies for maintaining an image baseline. To the author's knowledge, there seems to be a lack of research papers within the area. Additionally, future work involving a stricter comparison of when it is more beneficial to select a VRT tool over a VGT tool, or vice versa, would be interesting, as these tools have different attributes.
Lastly, similar work in different industrial contexts and environments, with support from quantitative data, is needed to further support the conclusions in this thesis.
References
[1] V. Garousi and M. V. Mäntylä, “When and What to Automate in Software Testing? A Multi-vocal Literature Review,” Inf. Softw. Technol., vol. 76, no. C, pp. 92–117, Aug. 2016, doi: 10.1016/j.infsof.2016.04.015.
[2] K. Stobie, “Too much automation or not enough? When to automate
testing.,” in Pacific NW Software Quality Conference, 2009.
[3] E. Alégroth, A. Karlsson, and A. Radway, “Continuous Integration
and Visual GUI Testing: Benefits and Drawbacks in Industrial Practice,” in
2018 IEEE 11th International Conference on Software Testing, Verification
and Validation (ICST), 2018, pp. 172–181, doi: 10.1109/ICST.2018.00026.
[4] Qlik, “Qlik Sense.” [Online]. Available:
https://www.qlik.com/us/products/qlik-sense. [Accessed: 20-Feb-2020].
[5] M. Leotta, D. Clerissi, F. Ricca, and P. Tonella, Advances in
Computers, vol. 101. Elsevier, 2016.
[6] B. A. Kitchenham et al., “Preliminary guidelines for empirical
research in software engineering,” IEEE Trans. Softw. Eng., vol. 28, no. 8, pp.
721–734, Aug. 2002, doi: 10.1109/TSE.2002.1027796.
[7] R. Miller and C. Collins, “Acceptance Testing,” 2002, doi:
10.1007/978-1-4419-6488-5_14.
[8] E. Alegroth, R. Feldt, and H. Olsson, “Transitioning Manual System
Test Suites to Automated Testing: An Industrial Case Study,” in Proceedings -
IEEE 6th International Conference on Software Testing, Verification and
Validation, ICST 2013, 2013, pp. 56–65, doi: 10.1109/ICST.2013.14.
[9] S. Berner, R. Weber, and R. K. Keller, “Observations and Lessons
Learned from Automated Testing,” in Proceedings of the 27th International
Conference on Software Engineering, 2005, pp. 571–579, doi:
10.1145/1062455.1062556.
[10] T. L. Graves, M. J. Harrold, J.-M. Kim, A. Porter, and G.
Rothermel, “An Empirical Study of Regression Test Selection Techniques,”
ACM Trans. Softw. Eng. Methodol., vol. 10, no. 2, pp. 184–208, Apr. 2001,
doi: 10.1145/367008.367020.
[11] E. Borjesson and R. Feldt, “Automated System Testing Using Visual
GUI Testing Tools: A Comparative Study in Industry,” in 2012 IEEE Fifth
International Conference on Software Testing, Verification and Validation,
2012, pp. 350–359, doi: 10.1109/ICST.2012.115.
[12] T.-H. Chang, T. Yeh, and R. C. Miller, “GUI Testing Using
Computer Vision,” in Proceedings of the SIGCHI Conference on Human
Factors in Computing Systems, 2010, pp. 1535–1544, doi:
10.1145/1753326.1753555.
[13] P. Li, T. Huynh, M. Reformat, and J. Miller, “A practical approach
to testing GUI systems,” Empir. Softw. Eng., vol. 12, no. 4, pp. 331–357, Aug.
2007, doi: 10.1007/s10664-006-9031-3.
[14] I. Banerjee, B. Nguyen, V. Garousi, and A. Memon, “Graphical User
Interface (GUI) Testing: Systematic Mapping and Repository,” Inf. Softw.
Technol., vol. 55, no. 10, pp. 1679–1694, 2013, doi:
10.1016/j.infsof.2013.03.004.
[15] M. Jovic, A. Adamoli, D. Zaparanuks, and M. Hauswirth,
“Automating Performance Testing of Interactive Java Applications,” in
Proceedings of the 5th Workshop on Automation of Software Test, 2010, pp. 8–
15, doi: 10.1145/1808266.1808268.
[16] J. Andersson and G. Bache, “The Video Store Revisited Yet Again:
Adventures in GUI Acceptance Testing,” in Extreme Programming and Agile
Processes in Software Engineering, 2004, pp. 1–10.
[17] A. Adamoli, D. Zaparanuks, M. Jovic, and M. Hauswirth,
“Automated GUI Performance Testing,” Softw. Qual. J., vol. 19, no. 4, pp.
801–839, Dec. 2011, doi: 10.1007/s11219-011-9135-x.
[18] M. Grechanik, Q. Xie, and C. Fu, “Creating GUI Testing Tools
Using Accessibility Technologies,” in 2009 International Conference on
Software Testing, Verification, and Validation Workshops, 2009, pp. 243–250,
doi: 10.1109/ICSTW.2009.31.
[19] E. Alegroth and R. Feldt, “On the long-term use of visual GUI testing in industrial practice: a case study,” Empir. Softw. Eng., 2017, doi: 10.1007/s10664-016-9497-6.
[20] O. Taipale, J. Kasurinen, K. Karhu, and K. Smolander, “Trade-off
between automated and manual software testing,” Int. J. Syst. Assur. Eng.
Manag., vol. 2, no. 2, pp. 114–125, Jun. 2011, doi: 10.1007/s13198-011-0065-6.
[21] D. M. Rafi, K. R. K. Moses, K. Petersen, and M. V. Mäntylä, “Benefits and Limitations of Automated Software Testing: Systematic Literature Review and Practitioner Survey,” in Proceedings of the 7th International Workshop on Automation of Software Test, 2012, pp. 36–42.
[22] T. C. Lethbridge, S. E. Sim, and J. Singer, “Studying Software
Engineers: Data Collection Techniques for Software Field Studies,” Empir.
Softw. Eng., vol. 10, no. 3, pp. 311–341, Jul. 2005, doi: 10.1007/s10664-005-
1290-x.
[23] P. Runeson and M. Höst, “Guidelines for conducting and reporting
case study research in software engineering,” Empir. Softw. Eng., vol. 14, no.
2, p. 131, Dec. 2008, doi: 10.1007/s10664-008-9102-8.
[24] A. Collins, D. Joseph, and K. Bielaczyc, “Design Research:
Theoretical and Methodological Issues,” J. Learn. Sci., vol. 13, no. 1, pp. 15–
42, 2004, doi: 10.1207/s15327809jls1301_2.
[25] K.-J. Stol, P. Ralph, and B. Fitzgerald, “Grounded Theory in
Software Engineering Research: A Critical Review and Guidelines,” 2016, doi:
10.1145/2884781.2884833.
[26] Qlik, “Qlik Pricing.” [Online]. Available:
https://www.qlik.com/us/pricing. [Accessed: 20-Feb-2020].
[27] “GulpJS.” [Online]. Available: https://gulpjs.com/. [Accessed: 27-
Feb-2020].
[28] “NestJS.” [Online]. Available: https://docs.nestjs.com/. [Accessed:
27-Feb-2020].
[29] “NodeJS.” [Online]. Available: https://nodejs.org/en/about/.
[Accessed: 27-Feb-2020].
[30] “CircleCI.” [Online]. Available: https://circleci.com/docs/2.0/about-
circleci/. [Accessed: 27-Feb-2020].
[31] “Introduction to JSON Web Tokens.” [Online]. Available:
https://jwt.io/introduction/. [Accessed: 27-Feb-2020].
[32] “Why Cypress?” [Online]. Available:
https://docs.cypress.io/guides/overview/why-cypress.html#In-a-nutshell.
[Accessed: 27-Feb-2020].
[33] “Cross Browser Testing.” [Online]. Available: https://docs.cypress.io/guides/guides/cross-browser-testing.html#Continuous-Integration-Strategies. [Accessed: 27-Feb-2020].
[34] “Custom Providers.” [Online]. Available: https://docs.nestjs.com/fundamentals/custom-providers. [Accessed: 28-Mar-2020].
[35] A. L. Alwardt, N. Mikeska, R. J. Pandorf, and P. R. Tarpley, “A lean
approach to designing for software testability,” in 2009 IEEE
AUTOTESTCON, 2009, pp. 178–183.
[36] “Deploy and run apps on today’s most innovative Platform as a
Service.” [Online]. Available: https://www.heroku.com/platform. [Accessed:
16-Mar-2020].
[37] “Dyno Types.” [Online]. Available:
https://devcenter.heroku.com/articles/dyno-types. [Accessed: 16-Mar-2020].
[38] “Stacks.” [Online]. Available:
https://devcenter.heroku.com/articles/stack. [Accessed: 16-Mar-2020].
[39] “Build fast. Start for free.” [Online]. Available:
https://circleci.com/pricing/. [Accessed: 16-Mar-2020].
[40] S. Dösinger, R. Mordinyi, and S. Biffl, “Communicating continuous
integration servers for increasing effectiveness of automated testing,” in 2012
Proceedings of the 27th IEEE/ACM International Conference on Automated
Software Engineering, 2012, pp. 374–377.
[41] J. Drangmeister, E. Kern, M. Hirsch-Dick, S. Naumann, G.
Sparmann, and A. Guldner, “Greening Software with Continuous Energy
Efficiency Measurement,” in GI-Jahrestagung, 2013.
Appendix A: Interview Questions Cycle 1
Background
1. How long have you been working in the Software industry?
2. What is your prior experience with testing techniques? What was your
experience/attitude with/towards it?
3. Have you ever been involved in a company or project that utilizes VRT? If
so, what was your experience with it?
Automated testing VRT
4. What are the main drivers behind the incentive for test automation?
5. Considering VRT, what are the associated main benefits achieved by
introducing test automation?
6. What are the associated drawbacks with introducing test automation, more
specifically with VRT?
7. Is VRT a complement or an alternative to manual testing? If a complement: what does manual testing contribute compared to the automated test execution with VRT? If an alternative: explain why.
Perceived quality
8. If you transition to automatic testing, do you think your released product
will obtain a higher overall quality? High product quality is defined as a low
defect level in the product.
9. Do you think automatic testing will affect your confidence in a working
product after a release?
Processes
10. How does the process work, from when a bug or feature is reported until the code changes exist in a production environment? (Describe the steps.)
11. How can automated test execution affect the process described earlier?
Why/Why not?
12. Will automated test execution increase or decrease the feedback time if a
failure or defect is found, more specifically VRT?
13. Will automated test execution decrease or increase the total effort (in time)
of testing?
Appendix B: Interview Questions Cycle 2
Background
1. How long have you been working as a software tester?
2. What’s your prior experience with software testing tools?
3. What’s your prior experience with VRT testing tools?
4. What are your assignments as a software tester?
Software Testing
5. What are the most time-consuming tasks when working as a software tester
at Vizlib?
6. As a software tester, what tasks do you find most enjoyable?
7. What tasks or issues do you find tedious as a software tester?
8. Is VGT a complement or an alternative to manual testing? If a complement: what does manual testing contribute compared to the automated test execution with VRT? If an alternative: explain why.
Experience gained from VGT
9. What challenges did you experience when writing tests for Cypress?
10. How much effort did it require before you were able to produce tests for
Cypress? How long did it take before you learned the tool?
11. How long would you say it takes for you to produce a test with Cypress?
12. Do you believe your workload will increase or decrease with Cypress?
13. Have there been any test cases that were difficult to test? If so, what was the issue?
Perceived benefits & Drawbacks
14. How reliable do you feel the test results are after executing a test suite with
Cypress?
15. What do you believe are the benefits, and which have you experienced since the transition to Cypress? Why?
16. What do you believe are the drawbacks, and which have you experienced since the transition to Cypress? Why?
Appendix C: Interview Questions Cycle 3
Background
1. How long have you been working as a software developer?
2. How long have you been working for Vizlib?
Perceived issues
3. How much of your time is spent on maintaining the quality of the products rather than developing new features?
4. How long does it usually take before you receive feedback on whether your
changes are approved or not?
5. How would you consider the code complexity of the extensions?
6. Considering the complexity of the extensions, do you find it difficult to
implement new features without possibly introducing new bugs? Are the
extensions bug-prone?
Perceived Benefits & Drawbacks
7. Do you believe VRT testing techniques are able to detect bugs you usually
deal with?
8. From your perspective, have you experienced any benefits since the
transition to the CI solution? Elaborate.
9. Have you experienced any drawbacks from the CI solution? Elaborate.
Statement of Originality and Letter of Authorization
Statement of Originality
I solemnly declare that the thesis submitted here, 《中文题目 English Title》, is the result of independent research work that I carried out under the guidance of my supervisor while studying for a degree at Harbin Institute of Technology. Except for the parts with marked citations, the thesis contains no research results completed or published by others. All individuals and collectives that made important contributions to the research work of this thesis have been clearly acknowledged in the text.
Author's signature: Date:
Letter of Authorization
The thesis is a result completed by the graduate student while studying for a degree at Harbin Institute of Technology, and its intellectual property belongs to Harbin Institute of Technology. The usage rights of the thesis are as follows: (1) the university may preserve the submitted thesis by photocopying, reduced-size printing or other means of reproduction, and submit it to the National Library; (2) the university may include part or all of the thesis in relevant databases for retrieval and provide corresponding reading services; (3) when publishing academic papers or other results related to the research of this thesis after graduation, the graduate student shall obtain the supervisor's consent, and the first-named affiliation shall be Harbin Institute of Technology. Confidential theses shall follow the relevant confidentiality regulations during the confidentiality period; these usage rights apply after declassification.
I am aware of the usage rights of the thesis and will comply with the relevant regulations.
Author's signature: Date:
Supervisor's signature: Date:
Acknowledgement
I would like to thank everyone at Vizlib for supporting me in my work: assisting me with technical issues, bouncing around and brainstorming ideas, and their overall involvement in the study. A special thanks to David Alcobero, who made this thesis possible. I would also like to express my gratitude to everyone involved in the interviews, as they are the backbone of this thesis. Furthermore, I'd like to thank Lena Buffoni and John Tinnerholm for their supervision. Additionally, I would like to thank my opponent Oscar Andell for his feedback, Andreas Lundquist for our mutual support, and Tong Zhang for translating my abstract to Chinese. Lastly, I would like to thank friends and loved ones for their support.
Confirmation of Supervisors
HIT Supervisor
Signature: Date:
LiU Supervisor
Signature: Date:
Internship Supervisor
Signature: Date: