Linköping University | Department of Computer and Information Science
Master's thesis, 30 ECTS | Information Technology
2020 | LIU-IDA/LITH-EX-A-20/024-SE
Net benefits analysis of Visual Regression Testing in a Continuous Integration environment: An industrial case study
Axel Löjdquist
Examiner: Lena Buffoni
Supervisor: John Tinnerholm
Company supervisor: David Alcobero
Linköpings universitet, SE-581 83 Linköping, +46 13 28 10 00
www.liu.se
Upphovsrätt (Swedish copyright notice, translated)
This document is held available on the Internet - or its possible replacement - for a period of 25 years from the date of publication, provided that no exceptional circumstances arise.
Access to the document implies permission for anyone to read, download, and print single copies for personal use, and to use it unchanged for non-commercial research and for teaching. Subsequent transfers of copyright cannot revoke this permission. All other use of the document requires the author's consent. To guarantee authenticity, security, and accessibility, solutions of a technical and administrative nature are in place.
The author's moral rights include the right to be named as the author, to the extent required by good practice, when the document is used as described above, as well as protection against the document being altered or presented in a form or context that is offensive to the author's literary or artistic reputation or character.
For additional information about Linköping University Electronic Press, see the publisher's home page: http://www.ep.liu.se/.
Copyright
The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.
The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.
According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.
For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.
© Axel Löjdquist
Dissertation for Master's Degree
(Master of Engineering)
Net benefit analysis of Visual Regression Testing in a
Continuous integration environment: An industrial
Case study
Axel Löjdquist
September 2020
Linköping University
Classified Index: TP311    School Code: 10213
U.D.C: 681    Security Classification: Public
Dissertation for the Master's Degree in Engineering
(Master of Engineering)
Net benefit analysis of Visual Regression Testing in a
Continuous integration environment: An industrial
Case study
Candidate: Axel Löjdquist
Supervisor: Tianyi Zang (HIT)
Associate Supervisor: Lena Buffoni (LiU)
Industrial Supervisor: David Alcobero, CTO
Academic Degree Applied for: Master of Engineering
Speciality: Software Engineering
Affiliation: School of Software
Date of Defence: June 2020
Degree-Conferring-Institution: Harbin Institute of Technology
Classified Index: TP311
U.D.C: 681
Dissertation for the Master’s Degree in Engineering
Net benefit analysis of Visual Regression Testing in a
Continuous integration environment: An industrial
Case study
Candidate: Axel Löjdquist
Supervisor: Lena Buffoni
Associate Supervisor: John Tinnerholm
Industrial Supervisor: David Alcobero, CTO
Academic Degree Applied for: Master of Engineering
Speciality: Software Engineering
Affiliation: School of Software
Date of Defence: September, 2020
Degree-Conferring-Institution: Harbin Institute of Technology
Thesis for Master’s Degree at HIT and LiU
Abstract (translated from the Chinese)
Maintaining software quality is a difficult task for many reasons, such as company growth, time-to-market demands, and code complexity. GUI testing tools and Continuous Integration (CI) are common practices today used to address some of the problems of maintaining software quality. However, these techniques bring a set of challenges. Visual Regression Testing (VRT) is a special kind of GUI testing technique that focuses on image-based assertions. This study presents the implementation and an investigation of the benefits and drawbacks of introducing VRT into a CI environment in an industrial setting. In addition, the thesis examines the factors that need to be considered during this transition. The results show that the approach brings several benefits, such as faster feedback times and increased testing frequency. However, drawbacks and implications were also found, such as test maintenance and organizational issues, indicating that a company needs to deliberate carefully before implementation.
Keywords: Visual Regression Testing, Continuous Integration, Test automation, GUI testing
Abstract
Maintaining quality in software is a difficult task for several reasons, such as company growth, time-to-market demands, and code complexity. GUI testing tools and Continuous Integration (CI) are common practice today for tackling some of the issues of maintaining software quality. However, these techniques bring a set of challenges. Visual Regression Testing (VRT) is a special kind of GUI testing technique focused on image-based assertions. This study presents an implementation and an investigation of the benefits and drawbacks of introducing VRT into a CI environment in an industrial context. Additionally, the thesis investigates the factors that need to be considered in this transition. The results show that benefits are associated with this approach, such as quicker feedback times and an increase in testing frequency. However, drawbacks and implications were also identified, such as test maintenance and organizational concerns, indicating that an organization needs to consider carefully before proceeding with an implementation.
Keywords: Visual Regression Testing; Continuous Integration; Test automation; GUI testing
Glossary
SaaS Software as a Service
ATE Automatic Test Execution
CI Continuous Integration
VGT Visual GUI Testing
e2e End-to-end
R&R Record and Replay
VCS Version Control Systems
CD Continuous Deployment
DRS Design Research Study
VRT Visual Regression Testing
DI Dependency Injection
IoC Inversion of Control
PaaS Platform as a Service
Table of Contents
Abstract (Chinese)
Abstract
Glossary
Table of Figures
Table of Tables
Chapter 1 Introduction
  1.1 Motivation & Background
  1.2 Aim and Approach
  1.3 Research Questions
  1.4 Delimitations
  1.5 Main Content and Organization of the Thesis
Chapter 2 Theory
  2.1 Test Automation
    2.1.1 The 1st Generation
    2.1.2 The 2nd Generation
    2.1.3 The 3rd Generation
    2.1.4 Visual Regression Testing
  2.2 Continuous Integration and Visual GUI Testing: Benefits and Drawbacks in Industrial Practice
  2.3 Transitioning Manual System Test Suites to Automated Testing: An Industrial Case Study
  2.4 Automated System Testing Using Visual GUI Testing Tools: A Comparative Study in Industry
  2.5 Too Much Automation
  2.6 Trade-offs Between Automated and Manual Testing
  2.7 When and What to Automate in Software Testing?
    2.7.1 SUT-related factors
    2.7.2 Test-related factors
    2.7.3 Test tool-related factors
    2.7.4 Human and organizational factors
    2.7.5 Cross-cutting and other factors
  2.8 Data Collection
Chapter 3 Method
  3.1 Design Research Study
    3.1.1 Cycle 1
    3.1.2 Cycle 2
    3.1.3 Cycle 3
  3.2 Qualitative Data Collection
    3.2.1 Semi-structured interviews
    3.2.2 Unstructured observations
    3.2.3 Informal literature review
  3.3 Collected Metrics
  3.4 Triangulation
Chapter 4 System Requirement Analysis
  4.1 The Problem Situation
  4.2 The Goal of the System
  4.3 Selected & Relevant Technologies
    4.3.1 NestJS
    4.3.2 CircleCI
    4.3.3 JWT
    4.3.4 Cypress
  4.4 The Functional Requirements
    4.4.1 Scheduler
    4.4.2 CI platform
  4.5 The Non-functional Requirements
  4.6 Brief Summary
Chapter 5 System Design
  5.1 Processes & Architecture Prior to the Implementation
    5.1.1 Freshdesk
    5.1.2 Github
    5.1.3 Jira
    5.1.4 Vizlib Notifier (Slack bot)
    5.1.5 Extension Builder
  5.2 Architecture for the New Solution
    5.2.1 CI platform - CircleCI
    5.2.2 Test Server
    5.2.3 Scheduler
    5.2.4 Github
  5.3 Key Techniques
    5.3.1 Dependency injection
    5.3.2 Designing for testability
  5.4 Brief Summary
Chapter 6 System Implementation
  6.1 The Environment of System Implementation
    6.1.1 Scheduler
    6.1.2 CircleCI
    6.1.3 Test server
  6.2 Key Program Flow Charts
    6.2.1 CircleCI workflows
    6.2.2 Implemented workflow on CircleCI
  6.3 Key Interfaces of the Software System
    6.3.1 The Scheduler
    6.3.2 Github
    6.3.3 CircleCI
  6.4 Brief Summary
Chapter 7 Results
  7.1 Interview Sessions
    7.1.1 Test Maintenance
    7.1.2 Implementation Effort
    7.1.3 Testing Tool
    7.1.4 Feedback Time
    7.1.5 Test Reliability
    7.1.6 Test Automation
    7.1.7 Development Process
    7.1.8 Organization
  7.2 Unstructured Observations
    7.2.1 Testing tools
    7.2.2 Tested system
    7.2.3 Continuous Integration platform
    7.2.4 Support software
Chapter 8 Discussion
  8.1 Results
    8.1.1 Test maintenance
    8.1.2 Implementation effort
    8.1.3 Testing tool
    8.1.4 Feedback time
    8.1.5 Test reliability
    8.1.6 Test automation
    8.1.7 Development process
    8.1.8 Organization
    8.1.9 Unstructured Observations
  8.2 Method
    8.2.1 Construct validity
    8.2.2 Internal validity & Reliability
    8.2.3 External validity
  8.3 Work in a Wider Context
Chapter 9 Conclusion
  9.1 Future Work
References
Appendix A: Interview Questions Cycle 1
Appendix B: Interview Questions Cycle 2
Appendix C: Interview Questions Cycle 3
Statement of Originality and Letter of Authorization
Acknowledgement
Table of Figures
Figure 1.1 Overview of DRS methodology. Adapted from E. Alégroth et al. [7].
Figure 3.1 Overview of the hierarchical tree diagram. Adapted from E. Alégroth et al. [3].
Figure 4.1 Use case diagram for the scheduler.
Figure 4.2 Use case diagram for the CI platform.
Figure 5.1 Vizlib's prior system architecture.
Figure 5.2 Example of a Slack bot message for a release. Names and personal images have been censored.
Figure 5.3 Vizlib's system architecture after the implementation.
Figure 6.1 Overview of the implemented workflow on CircleCI.
Figure 6.2 Sequence diagram over how the different systems communicate upon a triggered event from Github.
Figure 6.3 Status of triggered workflow viewed from Github.
Figure 6.4 Example of CircleCI job 2, viewed in CircleCI.
Figure 6.5 An overview of the CircleCI workflow viewed from the platform webpage.
Figure 6.6 An overview of the testing artifacts stored in CircleCI.
Figure 6.7 An overview of the failed tests in CircleCI, showing the tests that have failed during an execution.
Figure 7.1 Documented observations throughout all cycles.
Table of Tables
Table 1 Codes used for the coding process.
Table 2 Functional requirements for the scheduler.
Table 3 Functional requirements for the CI platform.
Table 4 Non-functional requirements for the scheduler.
Table 5 Non-functional requirements for the CI platform.
Table 6 Best practices for testability, summarized, as presented by Alwardt et al. [35].
Table 7 Run-time environment for the scheduler.
Table 8 Run-time environment for the jobs on the CI platform.
Table 9 Run-time environment for the test server.
Table 10 Endpoints for the scheduler.
Table 11 Total occurrences of codes from the coding process.
Table 12 Code appearances from each interview linked to the corresponding cycle.
Table 13 Background and collected metrics from Cycle 1 interviews.
Table 14 Background and collected metrics from Cycle 2 interviews.
Table 15 Background and collected metrics from Cycle 3 interviews.
Table 16 Type of observations made linked to each cycle.
Chapter 1 Introduction
This chapter presents the introduction to the thesis. Section 1.1 describes the motivation and background. Section 1.2 describes the aim of this thesis, with an overview of the method used to achieve it. Section 1.3 presents the research questions. Section 1.4 explains the scope and delimitations of the research and the implementation. Section 1.5 describes the main content and organization of the thesis.
1.1 Motivation & Background
As a Software as a Service (SaaS) company grows, its product base can increase, as can the functionality of its products. This can increase a product's complexity and make it more bug-prone. Employing new developers with different experiences and backgrounds also forces the company to adopt more mature processes. Together, these developments pose a challenge for maintaining product quality. To help maintain it, Automated Test Execution (ATE) can be a powerful tool for automating certain processes within the Continuous Integration (CI) environment and, by extension, the software development life cycle [1].
However, introducing automation brings a set of challenges. A test that is very simple to check manually can become very complex from an ATE perspective, e.g. for applications with nondeterministic behavior or for modifications that make minor appearance changes to a GUI [2]. Computers can easily determine whether two results differ, but are less effective at judging the significance of the difference, such as telling that difference A is acceptable while difference B is not. Consequently, ensuring test validity and minimizing false positives becomes a complication in GUI testing.
With many of the testing techniques used today, it can be hard to capture the end-user perspective; unit testing, for example, is a common practice for which automated scripts are created [3]. Visual Regression Testing (VRT) is a technique that tackles this problem by exercising the application as an end-user and capturing images of certain areas of the application in specific states. It then uses image comparison algorithms to check whether the screenshot taken of the System Under Test (SUT) matches a baseline image. The baseline defines how the application should look in a specific state. If the image differs beyond a given threshold, the test fails. Determining the feasibility, applicability, benefits, and drawbacks of introducing VRT to a CI environment in industry is what this thesis aims to address.
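As an illustration only (the concrete tool and comparison algorithm are introduced later in the thesis), the core VRT assertion can be sketched as a pixel-level diff against a threshold. Here both "images" are simplified to flat arrays of grayscale pixel values; a real tool compares rendered screenshots:

```javascript
// Fraction of pixels that differ between the baseline and the new screenshot.
function diffRatio(baseline, screenshot) {
  // A size change is treated as a full mismatch.
  if (baseline.length !== screenshot.length) return 1;
  let differing = 0;
  for (let i = 0; i < baseline.length; i++) {
    if (baseline[i] !== screenshot[i]) differing++;
  }
  return differing / baseline.length;
}

// The test passes only if the differing fraction stays within the threshold.
function vrtPasses(baseline, screenshot, threshold) {
  return diffRatio(baseline, screenshot) <= threshold;
}

// 1 of 8 pixels differs (12.5%): a 10% threshold fails, a 20% threshold passes.
const baselineImage   = [0, 0, 0, 0, 255, 255, 255, 255];
const screenshotImage = [0, 0, 0, 0, 255, 255, 255, 0];
console.log(vrtPasses(baselineImage, screenshotImage, 0.10)); // false
console.log(vrtPasses(baselineImage, screenshotImage, 0.20)); // true
```

The threshold is exactly the tuning knob the delimitations in Section 1.4 exclude from evaluation: too low and minor rendering noise fails the suite, too high and real regressions slip through.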
This thesis was conducted at Vizlib, a small to mid-size company with 50+ employees. Vizlib produces software for a business intelligence platform called Qlik, which performs data integration, user-driven business intelligence, and conversational analytics [4]. Qlik offers extensions to its platform, enabling third-party vendors to create custom visualizations used for data analytics. These visualizations are the kind of software that Vizlib produces and are referred to as extensions. Vizlib currently develops and maintains 28 extensions, and its product base is increasing. Testing at Vizlib today is performed only manually, in three different scenarios. First, whenever a new feature is completed or a bug is fixed, a tester performs exploratory testing to confirm that the change works properly. Second, one or two developers code review the changes when a pull request is made. Lastly, prior to a new release, the testers perform a smoke test to ensure that everything appears to be working.
Introducing automated testing at Vizlib can be a helpful step in maintaining the quality of its growing product base. The smoke tests performed at Vizlib today are a natural starting point for ATE. In these smoke tests, a tester gives input or performs a sequence of steps and observes the output based on a predefined protocol of tests, which amounts to End-to-End (e2e) testing [5]. However, this process becomes very labor-intensive as the release frequency and the product base increase.
1.2 Aim and Approach
The aim of this thesis is to investigate and evaluate some of the benefits and drawbacks of introducing Visual Regression Testing (VRT) into a CI environment in an industrial context. The chosen method is an empirical Design Research Study (DRS) [6] divided into three cycles. The first cycle involves configuring the VRT testing tool and ensuring the technical approach is feasible. Once the test suite is in place, the study enters the second cycle, which involves integrating the test execution into a CI environment. After the CI implementation is completed, the study enters the final phase: during cycle three, the study results are evaluated. The evaluation includes qualitative data collected throughout all three cycles. Figure 1.1 shows an overview of the method. The qualitative data is collected from semi-structured interviews, observations, and an informal literature review. The observations are used in each cycle to capture the progression of the DRS. Throughout the study, the design is reviewed and adjusted based on gained experience, and the study can fall back to a past cycle if needed, e.g. when the CI environment requires the testing tool to be further configured. The author of this thesis performed nine interviews: three for each of Cycle 1, Cycle 2, and Cycle 3. The data collection methods aim to capture the areas affected by the implementation in order to identify the benefits and drawbacks of VRT. They additionally capture experiences in order to determine which factors should be considered when introducing VRT to a CI environment.
Figure 1.1 Overview of DRS methodology. Adapted from E. Alégroth et al. [7].
1.3 Research Questions
a) What practical benefits and drawbacks are associated with introducing
visual regression testing to an industrial CI environment?
b) Which factors should be considered when implementing VRT in a CI
environment?
1.4 Delimitations
The scope of the thesis is limited to evaluating the Vizlib extensions (the developed software) that have tests written for them. The thesis does not evaluate the tools selected for the CI implementation, the testing framework used for executing the VRT tests, or the different CI platforms offered on the market. All technologies used for the implementation were selected by Vizlib; they are listed and described in Section 4.3. Additionally, the thesis does not investigate appropriate threshold values for the image comparison algorithms. Lastly, this thesis used only qualitative data sources; no quantitative data sources were used.
1.5 Main content and organization of the thesis
The thesis is structured as follows. Chapter 2 presents the theory and related work within test automation and visual testing. Chapter 3 describes the methodology used for the DRS and the qualitative data collection methods used in this thesis, and explains how these methods aim to answer the stated research questions. Chapters 4 and 5 present the system requirements and the system design for the implementation, and define the problem situation the implementation aims to address. These chapters include architectural overviews, use-case diagrams, lists of requirements, and the key techniques that have been used. Chapter 6 presents the implementation and testing of the system, including flow charts describing the system and how it was tested. The following chapters present the results and the discussion of those results. Lastly, the thesis conclusions are presented.
Chapter 2 Theory
This chapter presents the theory for this thesis. Section 2.1 describes the
background of test automation and GUI testing. Sections 2.2-2.7 present related work
in the investigated area. Section 2.8 provides the theory for data collection methods.
2.1 Test Automation
As the demand for quick time-to-market releases increases, many companies are limited or compelled to focus on a regression testing strategy for maintaining high software quality. A regression testing strategy should be very flexible, so that it can be utilized after e.g. a system modification or prior to a product release [7]. ATE addresses this by speeding up the process and providing feedback on modifications more quickly [8]. It achieves this by decreasing the test execution time compared to manual testing practices. Additionally, since a computer performs the regression tests, the human effort can be spent elsewhere. Furthermore, ATE enables test execution during non-working hours. However, this requires the automated tests to operate at a higher level than e.g. unit testing or other lower-level testing techniques. It is uncertain whether lower-level techniques can be reliable enough to capture the entire system or the end-user's point of view for high-level systems [9], [10]. Acceptance testing is an intensive, complex, expensive, and labor-intensive process for GUI applications [11]–[13]. However, there exist testing techniques that operate on the system level, such as GUI testing [14]. GUI testing has evolved through three generations, described below.
2.1.1 The 1st Generation
The first generation of GUI testing interacts with the SUT by using screen
coordinates. Record and Replay (R&R) is an often-used, though not required,
technique to create test scripts for GUI tests [15]–[17]. It involves a two-step
process. First, it records the inputs and interactions performed by a tester while using
the SUT, such as keystrokes, mouse movements, and clicks. When one of these
events is triggered, the recording captures e.g. the screen coordinates of the mouse.
After the recording is completed, these sequences of events are saved to an
executable test script. Since this technique uses screen coordinates, the tests become
fragile to visual updates because the tests are dependent on the exact placement of
GUI components [18]. This leads to high maintenance costs for applications that
experience visual changes.
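The coordinate fragility described above can be sketched as follows. This is an invented illustration, not code from the thesis or from any real R&R tool: the layout, coordinates, and helper names are all hypothetical.

```python
# The GUI is modelled as a map from component name to its bounding box
# (x, y, width, height).
original_layout = {"submit_button": (100, 200, 80, 30)}

# A 1st-generation recorded script stores raw coordinates, not components.
recorded_clicks = [(120, 210)]  # the tester clicked inside the submit button

def component_at(layout, x, y):
    """Return the name of the component covering screen point (x, y)."""
    for name, (cx, cy, w, h) in layout.items():
        if cx <= x <= cx + w and cy <= y <= cy + h:
            return name
    return None

def replay(layout, clicks):
    """Replay recorded clicks; a click that hits nothing fails the test."""
    return [component_at(layout, x, y) for x, y in clicks]

# The script passes against the original layout ...
assert replay(original_layout, recorded_clicks) == ["submit_button"]

# ... but a purely visual change (the button moves) breaks it, even though
# the application still works correctly.
moved_layout = {"submit_button": (300, 200, 80, 30)}
assert replay(moved_layout, recorded_clicks) == [None]
```

The final assertion is exactly the maintenance problem described above: every visual update invalidates the recorded coordinates.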
2.1.2 The 2nd Generation
The second generation of GUI testing improved the robustness of the first
generation. Instead of capturing screen coordinates of triggered events, it captures
attributes associated with the elements rendered on the GUI [19]. These attributes
vary depending on what type of application is being tested. For web applications, the
test runner finds GUI components by accessing the HTML DOM and identifies the
component by matching the attributes associated with an HTML tag, i.e. the id
attribute. It can be argued that this difference makes 2nd generation GUI testing
more robust than the 1st generation, since it is no longer sensitive to the placement of
visual components. However, this approach sets some requirements on the SUT.
First, in order to identify the sought-after element, it requires that the GUI
component has a unique attribute associated with it. Otherwise, this might lead to
some undefined behavior as it might interact with the wrong GUI component.
Secondly, even identifying GUI components with uniquely produced identifiers
might pose another challenge. The test runner needs to know, prior to its execution,
which element attribute it should search for. If the unique identifier is generated for
the SUT during execution, the test runner will not be able to find the sought-after
GUI component.
The 2nd generation of GUI testing takes advantage of element-based assertions.
This means the test runner can expect certain elements to appear in, e.g., the HTML
DOM. If an element does not appear when expected, the test runner fails the test.
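A minimal sketch of this element-based approach is given below, using Python's standard-library HTML parser purely for illustration. The page content, the id value, and the helper names are invented; real 2nd generation tools offer much richer locator mechanisms.

```python
from html.parser import HTMLParser

class IdFinder(HTMLParser):
    """Collect the id attributes of all elements in an HTML document."""
    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id":
                self.ids.add(value)

def assert_element_present(html, element_id):
    """Element-based assertion: fail if no element carries the expected id."""
    finder = IdFinder()
    finder.feed(html)
    if element_id not in finder.ids:
        raise AssertionError(f"element '{element_id}' not found")

page = '<form><button id="submit">Send</button></form>'
assert_element_present(page, "submit")     # passes
# assert_element_present(page, "cancel")   # would raise AssertionError
```

Note that moving the button elsewhere on the page would not break this lookup, which is the robustness gain over the 1st generation; conversely, if the id were generated at runtime, the lookup would fail, which is the requirement on the SUT discussed above.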
2.1.3 The 3rd Generation
The 3rd generation of GUI testing tries to tackle the issues with the 2nd
generation GUI testing by interacting with the SUT using image recognition [19].
This approach is often referred to as Visual GUI testing (VGT). The test runner
utilizes images of GUI components in order to determine which GUI component to
interact with. By using images rather than element attributes in order to identify GUI
components, the technique overcomes some of the obstacles with the previous
generation. However, this approach is dependent on how well the image recognition
algorithm performs. If the image comparison algorithm performs poorly, the test
suite is subject to false positives, i.e., tests failing for correctly rendered components.
The 3rd generation of GUI testing utilizes image-based assertions by comparing
images taken by the test runner with an image baseline that defines the expected
appearance.
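The image-based assertion can be illustrated with a deliberately simplified sketch. Real tools compare actual screenshots with more sophisticated algorithms; here, images are modelled as invented 2D lists of grayscale values, and the function names are hypothetical.

```python
def mismatch_ratio(baseline, screenshot):
    """Fraction of pixels that differ between the two images."""
    total = len(baseline) * len(baseline[0])
    diff = sum(
        1
        for row_b, row_s in zip(baseline, screenshot)
        for b, s in zip(row_b, row_s)
        if b != s
    )
    return diff / total

def assert_visually_equal(baseline, screenshot, tolerance=0.01):
    """Image-based assertion: allow at most `tolerance` differing pixels."""
    ratio = mismatch_ratio(baseline, screenshot)
    if ratio > tolerance:
        raise AssertionError(f"{ratio:.1%} of pixels differ from the baseline")

baseline = [[255, 255], [255, 0]]
identical = [[255, 255], [255, 0]]
changed = [[255, 255], [0, 0]]  # one of four pixels differs

assert_visually_equal(baseline, identical)       # passes
assert mismatch_ratio(baseline, changed) == 0.25  # would fail the assertion
```

A tolerance that is too strict makes the suite prone to the false positives discussed above, for instance from anti-aliasing or rendering differences between environments.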
2.1.4 Visual Regression Testing
VRT is not concerned with how the test runner interacts with the SUT but is
defined by its assertion technique. VRT is based upon image assertions, the 3rd
generation approach. There exist 2nd generation tools that support this assertion
technique, either as a plugin or by default. In a case study by E. Alégroth et al. [19], the
authors investigate why a large tech company, Spotify, transitioned to a 2nd
generation tool after long-term use of a 3rd generation tool. Spotify experienced
difficulties with handling dynamic/non-deterministic data and 3rd-party GUI
components. This led to high test script maintenance whenever a 3rd-party
provider updated the interface of their GUI component. As a result, the robustness of
the test scripts was reduced, as Spotify could not know when these updates occurred.
Another issue Spotify experienced was distinguishing different rows when, e.g.,
scrolling in their application, because the data in their lists was dynamically produced.
This could be resolved by testing against a configured test database, making search
results deterministic. However, by using this approach, they were not testing against
a production environment anymore, which is one of the benefits with GUI testing.
The benefit of transitioning to a 2nd generation tool was the increased flexibility in
targeting dynamically produced data, as the test scripts could target attributes
associated with elements in the GUI. However, this set some requirements on how the
application was developed, to ensure that these targeting attributes existed on
specific elements. E. Alégroth et al. [19] concluded that the tool Spotify used was too
limited to fully support their requirements, but argue that VGT is still a valid testing
approach, provided that the best-practice guidelines regarding the adoption and use of
VGT in practice listed in their paper are followed.
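The dynamic-data problem described above can be sketched as follows. This is a hypothetical illustration: the data sources and helper names are invented, and non-determinism is simulated with a run counter rather than a real production database.

```python
def render_rows(rows):
    """Stand-in for a screenshot: the rendered text of a list component."""
    return "\n".join(rows)

def fetch_rows_from_production(run):
    """Production data is dynamic; here the row order depends on the run."""
    rows = ["Song A", "Song B", "Song C"]
    return rows if run % 2 == 0 else list(reversed(rows))

def fetch_rows_from_test_database():
    """A configured test database returns the same rows in the same order."""
    return ["Song A", "Song B", "Song C"]

baseline = render_rows(["Song A", "Song B", "Song C"])

# Against the configured test database, the image assertion is stable ...
assert render_rows(fetch_rows_from_test_database()) == baseline

# ... while against production it passes on some runs and fails on others,
# even though the application works correctly (a false positive).
assert render_rows(fetch_rows_from_production(0)) == baseline
assert render_rows(fetch_rows_from_production(1)) != baseline
```

Seeding a test database makes the assertion deterministic, but, as noted above, it also means the suite no longer tests against the production environment.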
2.2 Continuous Integration and Visual GUI Testing: Benefits
and Drawbacks in Industrial Practice
In a case study, E. Alégroth et al. [3] investigate the applicability
of VGT in an industrial CI environment and try to identify some of the benefits and
drawbacks of introducing this technique. E. Alégroth et al. [3] recommend VGT
because it mimics the steps taken by a tester when performing manual testing, by
either using R&R or writing scripts describing the actions. Furthermore, VGT allows
for assertions based upon comparing images during the test execution with an image
baseline. This is one of the advantages of using VGT because it allows testing
against a production environment and captures the end user’s perspective, which is
difficult with other testing techniques such as unit testing. However, the authors
argue that there are some drawbacks related to this method, such as the test execution
run time and fault detection. The results show that during the study, 16% of all test
outcomes were false positives, that is, good-quality items being rejected.
Additionally, they reported that no false negatives were found, meaning no bad-quality
items were approved, indicating that the testing technique is pessimistic. E. Alégroth et
al. [3] argue that VGT can be a good method for finding that there is a failure in the
application, but it will not show where the defect lies, which can lead to a costly
root-cause analysis.
2.3 Transitioning Manual System Test Suites to Automated
Testing: An Industrial Case Study
In another case study performed by E. Alégroth et al. [8], the authors investigate
transitioning from manual to automated testing suites. The benefits found were
quicker feedback times to the developers which enable higher test execution
frequency, a decrease in the overall testing effort, and an estimated positive Return
on Investment (ROI) after 6–13 VGT test suite executions. Another observed benefit
was that the automated tests were not only able to find all the faults found by the previous
manual testing suites, but also faults that had not been identified earlier. However,
the authors argue that additional data is needed in order to determine if VGT is
feasible in an industrial setting. Furthermore, E. Alégroth et al. [8] found that the test
subjects thought that the testing tools suffered from different limitations, which led
to the conclusion that the automated testing was still complementary to manual
test execution. The test subjects, however, emphasized the substantial value gained
from the testing tools, even given their limitations. One of the perceived benefits was
flexibility, that the testing technique is independent of the platform and implemented
language.
2.4 Automated System Testing using Visual GUI Testing Tools:
A Comparative Study in Industry
Whether VGT is a feasible acceptance testing technique, while being
cost-effective compared to manual testing, is in general still debatable. There exist many
examples of test automation projects that have ended up with major expenditures or
failures [1], [2], [9], [20], meanwhile other research papers show positive support for
VGT [3], [8], [11]. A comparative study performed by E. Börjesson et al. [11]
showed an example where VGT was applicable as an acceptance testing technique in
an industrial setting. Their results show that they were able to automate 98 percent of
their test cases with two different testing tools. Furthermore, their analysis showed
that VGT overcame many of the limitations associated with R&R and other GUI
testing techniques. These limitations were a lack of flexibility and robustness, as
VGT is not bound by the physical placement of GUI components on the display and
can find a component regardless of its placement [11]. Additionally, the authors state that,
since the technique always executes the same sequence of steps, the risk of fault
slip-throughs caused by human error is reduced. However, the authors observed that the
technique was not able to find faults outside the defined testing scenarios and
therefore conclude that it cannot replace manual testing. Lastly, the authors estimated
a 78% reduction in test suite execution time compared to manual testing by an
experienced tester. Thus, they conclude that VGT could potentially decrease cost
while increasing the testing frequency as the testing effort is decreased.
2.5 Too much automation
An important aspect of automating tests is the effort required by a company,
both regarding cost and time. According to [2], [21], introducing automated tests is
time consuming, both when a test fails and a manual analysis has to be made, and
when maintaining and updating the tests for new releases. The papers also state that
automated tests are generally more expensive than manual tests if they are only
executed very seldom. This is because it is generally more efficient to perform the
test once manually than to write the code for the automated test procedure. This
naturally leads to the question of which tests should be automated. There, the papers
state that what should be tested is, among other factors, very dependent on repetition.
If a test is written to be repeated multiple times, automation of the test is probably
worthwhile. On the other hand, if a test is only meant to be run a few times, the costs
of automating that test exceed its value. Likewise, if the software that is being tested
is subject to large or frequent changes, it could prove unnecessary labor to write
specific automated tests for it. Finally, the papers also state that automated testing is
more reliable than manual testing when it comes to detecting changes to the system.
If the aim of the testing approach is to check that the output remains the same after
changing the software, an automated test is performed in the same way every time,
whereas it can be harder to perform manual tests in the exact same way every time,
and it could, therefore, be harder to reproduce found bugs.
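The repetition argument above can be made concrete with a simple break-even estimate. The cost figures below are invented for illustration and are not taken from the cited papers.

```python
import math

def break_even_runs(automation_cost, manual_cost_per_run, automated_cost_per_run):
    """Smallest number of executions after which automation is cheaper.

    Returns None when automation never pays off (an automated run costs
    at least as much as a manual one).
    """
    saving_per_run = manual_cost_per_run - automated_cost_per_run
    if saving_per_run <= 0:
        return None
    return math.ceil(automation_cost / saving_per_run)

# Writing the automated test costs 8 hours; a manual run costs 1 hour,
# while an automated run still costs 0.5 hours of result analysis.
assert break_even_runs(8.0, 1.0, 0.5) == 16
```

A test run only a handful of times never reaches this break-even point, which is why the papers tie the automation decision so strongly to repetition.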
2.6 Trade-offs between automated and manual testing
Net benefits from automated and manual testing seem to be dependent on many
variables. Results from a study by Taipale et al. [20] show that test automation could
provide benefits such as quality improvements and being able to perform more tests
in less time. It would also save time for the testers and enable them to reuse many of
the automated test cases. The same study also shows that the drawbacks of
automating tests could be the implementation cost, especially in complex and often-
changing code. Another less anticipated cost of automating tests is the maintenance
cost and the cost of training the testers. Taipale et al. [20] state that all tests need some
human intervention, even automated ones, because 100% automatic testing is not
a realistic goal. Depending on what type of software a company produces, automated
testing may have different outcomes. For a company that produces complex software
that is subject to large changes and frequent releases, it can be more expensive to
maintain and update automated tests than to perform them manually. But for a
company that produces simpler software with mostly smaller changes on software
releases, automated testing can work as a quality control tool for testing if
implemented correctly. The study finally states that the reusability of tests is
essential to make implementing test automation a worthwhile task.
2.7 When and what to automate in software testing?
Deciding the scope of what to automate may be a difficult task. The idea
that it is worth automating everything within the testing process is a dangerous line
of thought that will most likely result in disappointment or major expenditures. In a
multi-vocal literature review by Garousi et al. [1], the authors do an extensive review
of what and when to automate in software testing and try to determine important
factors to take into consideration. They state that automation cannot replace manual
testing completely or eliminate personnel costs. However, there are many benefits to
be found if automatization is done in the correct context with an appropriate
approach, such as decreased testing costs and increased software quality.
Garousi et al. [1] categorized the 15 most important factors found into 5
categories: SUT-related factors, Test-related factors, Test tool-related factors, Human
and organizational factors, and lastly Cross-cutting and other factors.
2.7.1 SUT-related factors
The system under test (SUT) factors relate to the maturity of the system. If a
system is not mature enough, e.g. undergoing many major changes or having features
re-implemented, this will have a negative impact on the automatization effort.
It will produce "broken tests" which require maintenance, i.e. repairing
automated tests that yield false-positive results. In general, implementing
automatization in an immature project will result in a lot of effort to adapt to changes
[1].
2.7.2 Test-related factors
The test-related factors relate to the test cases and test suites used, i.e. the
decisions on what to test and what not to test have an impact on the automatization
aspect. Garousi et al. [1] concluded that the need for regression testing was the most
mentioned factor in test automation decisions. Furthermore, tests that humans
dislike performing are strong candidates for automatization, whereas testing
of usability and user experience offers little payback from test
automatization.
2.7.3 Test tool-related factors
Test tool factors relate to the quality and adoption of proper, suitable
testing tools for automatization. Selecting a suitable tool has a large impact on the
potential benefits gained from test automatization. Garousi et al. [1]
argue that there is a risk involved in selecting a testing tool. Such risks include a
price increase or a sudden halt of further development, and should be
accounted for during the selection process. The authors recommend selecting tools
with a large user base as a risk mitigation strategy.
2.7.4 Human and organizational factors
The human and organizational factors relate to the competence and skills the
organization holds. This factor is of great significance, as the competence required
for enabling automatization is both different from, and often additional to, the skills
needed to carry out manual testing. Garousi et al. [1] claim that if an organization
lacks the resources for training, or is missing competencies, it
might be better not to automate, as the risk of failure is high.
2.7.5 Cross-cutting and other factors
Garousi et al. [1] define cross-cutting factors as factors applicable to more than
one group, such as economic factors, the automatability of testing, and development
process factors. Normally, the initial cost of introducing automatization is greater than
that of introducing manual testing. However, the cost of automated test execution is
lower than that of manual testing, especially when the test execution is repeated several times.
Furthermore, the authors argue that the development cycle used at the organization
will impact the benefits of automatization.
2.8 Data collection
Lethbridge et al. [22] argue that data collection techniques can be categorized into three
different degrees. The first degree comprises direct methods, meaning the researcher has
direct contact with the data collection source and collects the data in real time. These
direct methods include observations, interviews, or focus groups. The advantages
of first-degree methods are that the researcher has control over what data is
collected, the quality of the data, and how it is collected. However, the drawback is
that they are expensive methods, as they require a lot of effort to apply [23].
The second degree comprises indirect methods, which involve collecting raw data without
interacting with the subjects during a data collection session. These indirect methods
include automatically monitoring software engineering tools, or observations drawn
from video recordings of meetings etc. The advantages of second-degree methods
are similar to those of first-degree methods with regard to the researcher's ability to
control the data collection, but they are less expensive to carry out [23]. Observations
are a good technique for distinguishing between the official view of a specific matter
and how it is in the real case. This technique can provide the researcher with a deeper
understanding of the subject investigated. However, second-degree methods are still
expensive to apply, even though they are generally cheaper than first-degree methods [23].
Third-degree methods include independent analysis of artifacts already created
before or during the research period. These artifacts could be documents such as
manual testing reports, requirement specifications or failure reports. Third-degree
methods are advantageous since they are less expensive to carry out than first- and
second-degree methods, but they reduce the researcher's ability to
control the quality of the data and what data is produced. This is because these
artifacts were generally not produced with the intent of investigating the
area under study [23].
Runeson et al. [23] argue that interviews are important in the data collection
process for a case study. The typical focus of semi-structured interviews is on
acquiring both the individual's qualitative and quantitative experience of the investigated
phenomenon. The interview questions are generally a mixture of open and closed
questions. Runeson et al. [23] further describe three general principles
for how the mixture of open and closed interview questions can be ordered
in an interview session:
Funnel – The funnel principle involves beginning the interview session with
open-ended questions and, towards the end of the session, asking more
specific closed questions.
Pyramid – The pyramid principle involves beginning with specific closed
questions; as the interview session proceeds, the questions become more open-
ended.
Time-glass – The time-glass principle involves beginning the interview with
open-ended questions, progressing to specific closed questions, and ending the
interview with open-ended questions again.
Chapter 3 Method
This chapter presents the method used to answer the stated research questions.
Section 3.1 explains the DRS and describes what was done in each step of the used
model. Section 3.2 describes which qualitative data collection methods were used.
Section 3.3 describes which metrics were used, their relevance, and from which data
the metrics were derived. Section 3.4 describes the triangulation procedure used for
this thesis.
3.1 Design research study
The DRS for this thesis was divided into three cycles, described as activities
below; see Figure 1.1 for an overview of the cycles. Each cycle included semi-
structured interviews with different aims. The characteristic of DRS is progressive
refinement in design. It involves bringing a solution to a real-world problem and
investigating how the solution works in its context [24], then constantly revising the
solution until a wanted behavior is reached, since it is difficult to account for all
factors of a solution before the implementation. It was therefore considered appropriate
to use a DRS approach for this thesis. The DRS followed the guidelines presented by
Kitchenham et al. [6] for experimental design research. Since the DRS aims for
progressive refinement based upon learned experiences, the approach allows falling
back into a previous cycle [24]. However, the interviews were only carried out once
for each cycle. The author's observations were included and linked throughout all
three cycles.
3.1.1 Cycle 1
The first cycle of the DRS included configuring the VRT testing tool, ensuring
that relevant testing data is produced (e.g. testing reports), configuring the structure of
how the tests should be executed, and storing the testing results. Semi-structured
interviews were carried out with the aim of defining the need for ATE, competences,
experiences, organisational processes, and practices.
3.1.2 Cycle 2
The second cycle of the DRS included implementing and integrating the test
execution in the CI environment. The goal of the implementation was to enable
ATE in a CI environment. Semi-structured interviews were carried out at the end
of this cycle, with the aim of investigating the adoption of the implementation from the
testers' point of view.
3.1.3 Cycle 3
The third cycle included extracting and processing the data collected throughout
the first two cycles and then evaluating the implemented system
based upon the qualitative data collected. The interviews during this cycle aimed to
address the perceived benefits and drawbacks from the developers' perspective. The
qualitative data was gathered from the author's observations and the semi-structured
interviews from each cycle. The collected data was grouped by the cycle it was
collected in and lastly compared with the results from the informal literature review.
3.2 Qualitative Data Collection
The qualitative data collection for this thesis consisted of three different
methods. Section 3.2.1 describes how the semi-structured interviews were conducted
and what the interviews aimed to investigate in each cycle; the interview questions for
each cycle are given in Appendices A, B, and C. Section 3.2.2 describes how the data
from the unstructured observations were organized and collected. Section 3.2.3
describes how the informal literature review was used.
3.2.1 Semi-structured interviews
Throughout the thesis, three different types of interview sessions were held.
Each type had a different objective for its cycle; see Sections 3.2.1.4, 3.2.1.5 and
3.2.1.6. In each cycle, 3 different interview subjects were interviewed, and no subject
was interviewed twice. This section describes how the interview selection
process was carried out, how the interview data was analyzed, and lastly the aim
of the interviews in each cycle. All interviews were held online, due to the physical
distance between the author and the interviewees, as the company has offices both in
London and Poznan, and because of the COVID-19 outbreak in 2020 causing
travel restrictions.
3.2.1.1 Selection process
The interviewees were selected based upon their relevance to the
objective defined in each cycle. The author had the possibility to select any interview
subjects at the investigated company. The author selected interview subjects after
consulting with the company supervisor in order to determine their relevance.
3.2.1.2 Pilot interview
Pilot interviews were conducted before the actual interview sessions in order to
ensure the interview questions were formulated properly. This measure aims to
increase the quality of the collected data, as misconceptions caused by ambiguous
question formulations can be reduced and the ordering of the questions reviewed.
The pilot interview was held with another master student, with a similar background
as the author of this thesis.
3.2.1.3 Interview data analysis
The interviews were recorded if consent was given by the interviewee. After
the interview data had been collected, the interviews were transcribed. This reduces
the risk of information being lost, as the interviewer can review the generated
material after the interview has been conducted and can focus on guiding the
interview during the session. Commonalities among the interviewees' answers were
later identified using a Grounded Theory approach, open coding [25]. The
coding process was performed by going through the transcribed material and
matching statements made by the interviewees with a set of code words. One
individual statement could be linked to several codes, since one sentence can refer to
several areas of concern. After the tagging process was completed, the statements for
each code were analyzed in order to draw conclusions for that code. Table 1
presents the codes used for this thesis. These codes were defined prior to the data
collection process. In this study, test reliability is defined as the confidence one
has in the test results.
Table 1 Codes used for the coding process
# Code Description
1 Test Maintenance: Statements concerning maintaining the tests.
2 Implementation Effort: Statements concerning the efforts for implementing test suites.
3 Testing Tool: Statements concerning the testing tool used at the investigated company.
4 Feedback Time: Statements concerning the feedback time to the developers.
5 Test Reliability: Statements concerning the reliability of test results.
6 Test Automation: Statements concerning the test automation.
7 Development Process: Statements related to the development process.
8 Organization: Statements related to organizational concerns.
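The tagging step of the open-coding process can be sketched as follows. This is a hypothetical illustration only: the keyword lists and the example statement are invented, and the actual coding for this thesis was performed manually on the transcripts.

```python
# A subset of the codes from Table 1, each with invented trigger keywords.
CODE_KEYWORDS = {
    "Test Maintenance": ["maintain", "broken test", "update the tests"],
    "Feedback Time": ["feedback", "waiting"],
    "Test Reliability": ["trust", "flaky", "false positive"],
}

def tag_statement(statement):
    """Return every code whose keywords appear in the statement."""
    lowered = statement.lower()
    return [
        code
        for code, keywords in CODE_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]

# One statement can be linked to several codes, as described above.
statement = "We spend a lot of time on broken tests and waiting for feedback."
assert set(tag_statement(statement)) == {"Test Maintenance", "Feedback Time"}
```

After all statements are tagged, the statements collected under each code are read together to draw conclusions for that code.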
3.2.1.4 Cycle 1
Objective – The objective of the interviews for the first cycle was to define the
interview subject's experience within software development, testing and the subject's
prior experience with VRT. The interview questions then focus on the company's
drive for introducing automated test execution and on defining the current
organizational processes and practices. Lastly, the interview questions focus on
the perceived benefits and drawbacks of introducing automated
test execution, more specifically VRT; see Appendix A.
Interviewees – The interview subjects in this cycle were selected based
upon whether they were drivers behind the initiative of introducing automated test
execution. At the company, there are 3 primary advocates driving the introduction of
VRT. Therefore, these 3 people were selected to be interviewed.
3.2.1.5 Cycle 2
Objective – The objective of the interviews in the second cycle was to define the
transition to the new solution from the testers' point of view. Firstly, the interview
questions seek the background and experience of the testers. Then, the interview
questions investigate their roles as software testers and their experience working with
VRT. Lastly, the questions focus on the adoption of the new setting and which
benefits and drawbacks are perceived from the testers' point of view; see Appendix
B.
Interviewees – The interview subjects selected for this cycle are the testers in
the QA-team at the investigated company. At the company, the QA-team consists of
6 testers. From this pool, the candidates were selected after consulting with the
company supervisor.
3.2.1.6 Cycle 3
Objective – The objective of the interviews in the last cycle was to find out the
perceived benefits and drawbacks from the developers' perspective. The interview
questions aim to investigate how the developers have been affected by the transition
to automated test execution; see Appendix C.
Interviewees – The interview subjects selected for this cycle were developers
involved in projects where the thesis implementation was present. At the
investigated company, there are 25 developers employed. From this pool, the
candidates were selected after consulting with the company supervisor.
3.2.2 Unstructured observations
Observations were documented throughout the DRS and mapped to a
hierarchical tree diagram, see Figure 3.1. Each noted observation was mapped to
the area it concerns. The observations were divided into three
categories: challenges, limitations, and problems, and documented as the kind of
problem they were considered to be at the time of detection. These
observations aim to provide a deeper understanding of the data collected from the
interviews. The author also investigated whether the noted observations had been
reported in the literature found through the informal literature review.
Figure 3.1 Overview of the hierarchical tree diagram. Adapted from E. Alégroth et al.
[3].
3.2.3 Informal literature review
An informal literature review was conducted to find the latest research within the
area. This assisted in identifying concerns, metrics, approaches, and findings.
Furthermore, the data collected from the literature review aided triangulation, by
allowing the findings of this thesis to be compared with findings in other settings.
These results are summarized in Chapter 2.
3.3 Collected Metrics
Industry experience – The interviews in each cycle collected the industry
experience of the interviewee. For the Cycle 1 and 3 interviews, this metric
describes the interviewee's experience in the software industry. For the Cycle 2
interviews, this metric describes the interviewee's experience as a software tester.
This metric aims to provide a better understanding of the background experience in
software development that the organization holds.
Prior experience with VRT – This metric was only collected from the interviews
carried out during Cycles 1 and 2, as these interviewees are the only subjects
concerned with the development and introduction of VRT. This metric aims to
provide a better understanding of the competences the organization holds regarding VRT.
Perception of VRT – All the interviewees in each cycle were asked whether they
believe VRT is a complement to or a replacement for manual testing practices. The
aim of this metric is to gain a better understanding of the approach and intent
Vizlib is taking in introducing VRT.
Perception of test automation – During the first-cycle interviews, the
interviewees were asked whether higher quality could be achieved by introducing test
automation. This metric aims to provide a better understanding of the intent
behind introducing VRT.
VRT test script development time – During the Cycle 2 interviews, the
interviewees were asked to estimate the time it takes for them to produce a VRT
test script. This metric aims to provide a better understanding of the
implementation effort.
Feedback time – This metric is collected during the Cycle 3 interviews. The
developers are asked if they believe that the feedback time has decreased as a result
of the CI integration with VRT. This metric aims to provide a better
understanding of how the feedback time has been affected. The feedback time in this
thesis is defined as the time from when a developer has finished resolving an
issue until feedback from the testers is given.
3.4 Triangulation
Using different methods for the qualitative data collection enables these
data sources to be triangulated. If more than one data source points towards a specific
finding, a stronger conclusion can be drawn, as more evidence supports the claim. As
this thesis was carried out in only one industrial setting, triangulation is important in
order to verify claims from other sources in other contexts, to confirm general
conclusions, and to support any findings in this study. For this thesis, the triangulation
process involved finding support for conclusions made in other scientific papers and
investigating whether the same observations were found in this study. The triangulation
process lastly included linking the conclusions to the stated research questions in
Section 1.3.
Chapter 4 System Requirement Analysis
This chapter presents the system requirement analysis for this thesis
implementation. Section 4.1 describes the problem situation at Vizlib and defines
how the implementation fits the situation. Section 4.2 describes the goal of the
system. Section 4.3 presents the frameworks and technologies Vizlib has selected for
the implementation. Section 4.4 displays use-case diagrams over the implemented
systems and lists their functional requirements. Section 4.5 lists the non-functional
requirements for the implemented systems.
4.1 The problem situation
The software developed at the investigated company is referred to as extensions.
Extensions are different kinds of interactive and configurable visualizations with the
sole purpose of visualizing data to gain insight. Extensions range from bar charts
and pie charts to heatmaps; in theory, an extension can be anything a user would
want to interact with in a GUI. The
extensions are JavaScript and HTML based web applications. These extensions are
dependent on and tightly coupled with a third-party software platform, Qlik, which is
the engine that performs data analytics and produces the data to be visualized. This
poses a challenge as the developed software cannot be executed on its own but
requires an instance of the Qlik engine to be running in order to function.
Consequently, this raises some limitations for the implementation. One of these
limitations is that Qlik can only hold one version of an extension at any given time.
This will have implications for the test automation since two different versions of the
same extension cannot be tested at the same time on one instance of Qlik. It would
either require testing against another test server running its own instance of Qlik or
implementing a scheduler keeping track of what is being tested on the test server
currently. However, maintaining several testing servers would require a scheduler
dictating which servers are currently in use. Therefore, a scheduler supporting both
options will be implemented for this thesis, as it makes the solution scalable to
support a higher level of parallelism of concurrent test executions.
Another challenge that arises because of the software’s dependency on Qlik is
licensing. As Qlik is a Software as a Service (SaaS) product [26], the pricing is based
upon monthly subscription packages that limit how many concurrent active users
may exist under one license. This limits the implementation, since the number of
supported parallel test executions is bounded by how many concurrent test
executions can be held under one user license.
Furthermore, the implementation will require allocating dedicated testing
accounts for the test automation, as it needs to ensure that a license is only used by
one testing instance and that a license is available. This further strengthens the
importance of a scheduler in order to enable test automation in this context.
Currently, Vizlib uses a self-built web application to build new versions of the
released software. It involves obfuscating code, replacing generic tags used for
licensing purposes and other build specific processes. It achieves this primarily by
using a JavaScript task runner called Gulp. Gulp is a popular JavaScript framework
for automating workflows [27]. Gulp allows a user to define a set of tasks and
determine the sequence these tasks should be executed in. In order to test a
production version of an extension, the extension code must run through this build
process to later be uploaded to the test server. The current version of the self-built
application only supports sequential builds. This would most likely become
problematic if the ATE were integrated into this platform, because it would result
in a bottleneck. Therefore, now that Vizlib wants to introduce VRT, this build
script will also need to be added to the new CI platform.
4.2 The goal of the system
The goal of the implementation is to enable the automatic execution of VRT test
suites for the investigated company’s CI pipeline. The implementation consists of
three parts: building a scheduler application, configuring the new CI platform, and
adding the ATE for VRT tests to the CI pipeline. The implementation is required to
be triggered by specific events in the software development cycle, such as pull
requests, commits and releases of a new version of the developed product. This thesis
does not include writing tests for the testing framework, as this practice is performed
by the QA-team.
4.3 Selected and Relevant Technologies
This section presents relevant technologies and frameworks Vizlib has selected
for the implementation.
4.3.1 NestJS
Vizlib selected NestJS as the framework with which the scheduler will be
developed. NestJS is a framework for building server-side applications that
uses TypeScript as the development language while still supporting
JavaScript [28]. NestJS is built upon NodeJS, a popular asynchronous,
event-driven JavaScript runtime used for building network-based
applications [29].
4.3.2 CircleCI
Vizlib has selected CircleCI, a Platform as a Service (PaaS) provider, as the CI
platform from which the VRT tests will be executed. CircleCI connects
to different Version Control Systems (VCS) and allows user-defined jobs to
be executed when certain events, such as commits or pull requests, are triggered
on the VCS [30]. CircleCI fetches the code associated with the triggering event and
executes these user-defined jobs inside Docker containers or virtual machines. As
the user-defined jobs are performed within their own environments, users of this
service can execute a wide variety of tasks, such as Vizlib’s build process
and VRT testing. The ATE for VRT testing will reside on this platform, which
enables practices such as CI and Continuous Deployment (CD).
4.3.3 JWT
JSON Web Tokens (JWT) provide a secure way of transmitting information
between parties using JSON objects and are commonly used for authorization
[31]. A JWT consists of three parts: a header, a payload, and a signature. The header
typically states the type of token and the hashing algorithm used. The payload
typically contains user information, and the signature is computed over the header
and payload using a secret, allowing the receiver to validate the token. The parts
are Base64URL-encoded, concatenated with dots, and attached to a header in HTTP
requests. This structure makes it possible for servers to determine whether a request
is authenticated. JWT will be used in the
implementation to authenticate to Qlik on the test server and to the scheduler.
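The token structure described above can be sketched with Node’s built-in crypto module. This is a simplified illustration, not the library code used in the implementation: real JWT libraries additionally handle expiry claims and algorithm checks, and the function names here are invented.

```typescript
import { createHmac } from "node:crypto";

// Base64URL-encode a JSON value (JWT uses Base64URL, not plain Base64).
const encode = (obj: object): string =>
  Buffer.from(JSON.stringify(obj)).toString("base64url");

// Create a signed token of the form header.payload.signature.
export function signJwt(payload: object, secret: string): string {
  const header = encode({ alg: "HS256", typ: "JWT" });
  const body = encode(payload);
  const signature = createHmac("sha256", secret)
    .update(`${header}.${body}`)
    .digest("base64url");
  return `${header}.${body}.${signature}`;
}

// Recompute the signature; return the payload if valid, null if tampered with.
export function verifyJwt(token: string, secret: string): object | null {
  const [header, body, signature] = token.split(".");
  const expected = createHmac("sha256", secret)
    .update(`${header}.${body}`)
    .digest("base64url");
  if (signature !== expected) return null;
  return JSON.parse(Buffer.from(body, "base64url").toString());
}
```

Because only a holder of the secret can produce a matching signature, the scheduler and the test server can validate incoming tokens without storing session state.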
4.3.4 Cypress
Cypress is the tool that Vizlib has selected for VRT testing; see Chapter 2
for more information about VRT. Cypress is a front-end test runner for web
applications that enables users to write E2E, integration, and unit tests [32]. The tool
provides snapshot images of the application when a test fails, as well as a video of the
entire test execution. Cypress enables cross-browser testing as it supports different
browsers such as Google Chrome, Electron, and Firefox [33].
4.4 The functional requirements
This section presents the functional requirements for the two implemented
systems. Section 4.4.1 provides a use-case diagram of the scheduler and lists
the functional requirements for the system. Section 4.4.2 presents a use-case diagram
for the CI platform and lists the system’s functional requirements.
4.4.1 Scheduler
The scheduler will require three main functionalities: requesting available
licenses, releasing an allocated license, and viewing which licenses are currently
available or allocated. Figure 4.1 displays a use-case diagram of the system.
Table 2 lists the functional requirements for the scheduler.
Figure 4.1 Use case diagram for the scheduler.
Table 2 Functional requirements for the scheduler
ID Description
S-FR1 The system shall be able to display the current number of licenses in use.
S-FR2 The system shall be able to generate a valid JWT for a test user to be able to log in.
S-FR3 The system shall respond with a valid JWT for the test server if a license is available.
S-FR4 The system shall be able to deallocate a previously allocated test user, making the allocated license available again.
S-FR5 The system shall only allow one license to be allocated per extension at any given point in time.
S-FR6 If no license is available for a request, the system shall respond with a message informing that no licenses are currently available.
S-FR7 A license shall automatically be deallocated after it has been active for 30 minutes.
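The allocation rules in Table 2 can be illustrated with a small in-memory sketch. This is a hypothetical model, not Vizlib’s actual scheduler; the class name, token format, and method names are invented for illustration.

```typescript
// Simplified in-memory model of the scheduler's allocation rules.
type Allocation = { extension: string; token: string; expires: number };

export class LicensePool {
  private allocations = new Map<string, Allocation>();

  constructor(
    private capacity: number,
    private ttlMs: number = 30 * 60 * 1000, // S-FR7: 30-minute lease
  ) {}

  // S-FR3/S-FR5/S-FR6: hand out a token if a license is free and the
  // extension is not already under test; otherwise return null.
  allocate(extension: string, now: number = Date.now()): string | null {
    this.evictExpired(now);
    if (this.allocations.has(extension)) return null; // S-FR5
    if (this.allocations.size >= this.capacity) return null; // S-FR6
    const token = `token-for-${extension}-${now}`; // placeholder for a real JWT
    this.allocations.set(extension, { extension, token, expires: now + this.ttlMs });
    return token;
  }

  // S-FR4: explicit deallocation frees the license again.
  release(extension: string): boolean {
    return this.allocations.delete(extension);
  }

  // S-FR1: number of licenses currently in use.
  inUse(now: number = Date.now()): number {
    this.evictExpired(now);
    return this.allocations.size;
  }

  // S-FR7: leases older than the TTL are reclaimed automatically.
  private evictExpired(now: number): void {
    for (const [ext, a] of this.allocations)
      if (a.expires <= now) this.allocations.delete(ext);
  }
}
```

Passing the current time as a parameter makes the lease-expiry rule easy to exercise in unit tests without waiting 30 minutes.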
4.4.2 CI platform
The CI platform will be configured to use Vizlib’s VCS, GitHub. Figure 4.2
displays a use-case diagram for the CI platform and Table 3 lists the functional
requirements for the CI platform.
Figure 4.2 Use case diagram for the CI platform.
Table 3 Functional requirements for the CI platform.
ID Description
CI-FR1 The VRT testing shall be executed when a pull request is made towards predefined branches.
CI-FR2 The VRT testing shall be executed when regression testing is requested.
CI-FR3 The VRT testing shall be executed when a release is requested (a tag is created on GitHub).
CI-FR4 The system shall store and display build and testing results.
CI-FR5 The system shall upload the built version to the test server if the event that triggered the CI pipeline requires VRT testing.
CI-FR6 The system shall be able to run Vizlib’s build process.
CI-FR7 The system shall wait to execute the VRT test suite until a valid license is received from the scheduler.
4.5 The non-functional requirements
This section lists the non-functional requirements for the implemented scheduler
and CI platform. Table 4 lists the requirements for the scheduler and Table 5 lists the
requirements for the CI platform.
Table 4 Non-functional requirements for the scheduler.
ID Description
S-NFR1 The system implementation language shall be TypeScript.
S-NFR2 The system shall be implemented using the framework NestJS.
S-NFR3 The system shall support HTTP request methods such as GET, POST, DELETE, and PUT.
S-NFR4 The implementation code shall follow Vizlib’s coding convention format.
Table 5 Non-functional requirements for the CI platform.
ID Description
CI-NFR1 The configuration files for the CI system shall be written in YML syntax.
CI-NFR2 The CI platform shall support docker containers.
CI-NFR3 The CI platform shall support storing test results and execution log.
4.6 Brief summary
The goal of this system is to introduce VRT execution in a CI environment at
Vizlib. The implementation will require a scheduler to be built because of the
limitations of what can be tested concurrently on the test server. Furthermore, the
implementation will include a transition to a new CI platform because of the
limitations of the current build platform. Two use case diagrams have been created in
order to illustrate the functionalities of the system.
Chapter 5 System Design
This chapter presents an overview of the system design, processes and
architecture for the implementation. Section 5.1 describes the architecture and the
processes Vizlib used prior to the implementation. Section 5.2 describes the
modifications needed for the implementation. Section 5.3 describes the key
techniques used for the implementation. Lastly, Section 5.4 gives a brief summary of
the chapter.
5.1 Processes & architecture prior to the implementation
It is important that the system design of the new solution fits Vizlib’s software
development processes without altering their current processes too drastically.
During the Cycle 1 interviews, the architecture and processes were presented to the
author. Vizlib’s system architecture prior to the implementation is depicted in
Figure 5.1. The figure includes the actors for the different components of the system.
Figure 5.1 Vizlib's prior system architecture.
The components in the system are described as follows:
5.1.1 Freshdesk
Freshdesk is a support ticketing system used as a community platform for
Vizlib. On this platform, customers can report bugs, request features or discuss
various topics. A dedicated support team at Vizlib handles all requests posted on this
platform, including product support, bug verification, and feature requests. If a bug is
verified and replicated, the support team reports the issue to Vizlib’s issue tracking
system, Jira.
5.1.2 Github
Github is Vizlib’s VCS. Vizlib uses a git strategy called git flow. Git flow
uses four different types of branches. The first is the master branch, which
holds the code of the latest production version. The second is the
development branch, which holds the code for the next release. When performing a
new release, a release candidate branch is created from the development branch.
The product owner is the actor who decides when a release should be
made. The release candidate branch is the branch that gets tested before a release; at
Vizlib, the smoke tests are performed on this branch. If the release candidate branch
is deemed ready for production, it gets merged into the master branch; otherwise, the
branch is discarded. Lastly, a tag is created on the master branch. Github is
configured to send a webhook to a slack bot upon the tag creation event.
The last type of branch is referred to as a feature/bug/issue branch. This branch
is created from the development branch and is the branch from which a developer
resolves an issue. When a developer has resolved an issue, a pull request with the code
changes is made towards the development branch. For each pull request, another
developer will code review the changes. An approval from the reviewer is needed for
an issue to be merged. Additionally, a tester will review and test the application to
determine if the changes are acceptable. If the testers approve the update, they
merge the pull request into the development branch.
5.1.3 Jira
Jira is Vizlib’s issue tracking system. The software development methodology
Vizlib uses is a Kanban approach where each extension has its own individual Kanban
board. The Kanban board consists of the following columns: Selected for
development, In Progress, Awaiting QA, In testing, QA Approved, Done, and
Released. Here, the product owner prioritizes the issues at hand and selects which
issues should be developed. The developers work together with the product owner to
select which issues the developer should work on from the product backlog. Once an
issue is in the Awaiting QA column, the testers review the changes, ensuring the
code changes have not affected the software in unexpected ways and that the solution
resolves the issue. If the issue gets approved by the testers, the issue transitions from
the Awaiting QA column to the QA Approved column. When the pull request is
merged into the development branch, the issue is moved to the Done column. Once a
release is made, all issues in the Done column are moved to the Released column.
5.1.4 Vizlib Notifier (Slack bot)
The Vizlib Notifier is a slack bot developed in-house by Vizlib. When a webhook is
received from Github upon tag creation, the slack bot produces an interactive
message with meta information about the release. It fetches the meta information
from JIRA describing which issues are about to be released. The message also
includes buttons to perform different actions, such as updating the status of the related
issues to released on JIRA or notifying customers that their issue has been resolved
with this release. The slack bot acts as an interface for the product owner to
perform different steps in the release process. See Figure 5.2 for an example message
from the slack bot. The slack bot sends a webhook to the extension builder
application when a release is triggered.
5.1.5 Extension Builder
The extension builder is another application developed in-house by Vizlib. Upon
receiving requests, the application triggers build scripts for a tagged version of the
developed software, as previously explained in Section 4.1. After the build script has
completed, the build artifacts are uploaded to AWS S3, a third-party storage provider.
Lastly, the extension builder notifies the slack bot whether the build completed
with or without errors.
Figure 5.2 Example of a slack bot message for a release. Names and personal images
have been censored.
5.2 Architecture for the new solution
The architecture for the new solution is presented in Figure 5.3. The following
updates and systems will be added to the previous architecture:
5.2.1 CI platform - CircleCI
The build script and the test runner Cypress will be added to the new CI
platform, CircleCI. Cypress will be configured to request and allocate licenses from
the scheduler or to wait if no available licenses are found. Furthermore, if a license
has been allocated, Cypress will upload the build artifacts produced by the build
script to the test server. Additionally, the test runner will download the image
baseline that it should use for the test run from AWS S3. Lastly, the test runner will
execute the test suites against the test server and deallocate the acquired license after
the test suites have been completed.
5.2.2 Test Server
A test server will be created, hosting a version of Qlik Sense. The test server will
be the SUT for the test runner Cypress. On this test server, testing accounts for Qlik
will be created, dedicated solely to the CI solution.
5.2.3 Scheduler
A scheduler will be implemented as an API. The scheduler will be responsible
for keeping track of what is currently being tested on the test server. It will generate
licenses for the test runner, and also notify the test runner if the test server is already
testing the same type of extension or if no licenses are available.
5.2.4 Github
CircleCI will be integrated with Github. This will trigger builds on CircleCI when
specific events happen on Github. It will also enable viewing build and test results
produced by CircleCI and enforcing rules, such as requiring tests to pass before a
pull request can be approved.
Figure 5.3 Vizlib’s system architecture after the implementation.
5.3 Key techniques
This section describes the key concepts and techniques used for the
implementation and design of the system. Section 5.3.1 describes the technique
dependency injection and how it is used. Section 5.3.2 describes how testability was
ensured in the design of the system.
5.3.1 Dependency injection
The scheduler is developed with the framework NestJS. NestJS supports a
technique called Dependency Injection (DI) [34], an architectural pattern the
author has selected to use. DI is an Inversion of Control (IoC) technique used to
achieve loose coupling between objects. When using DI, a class is not responsible for
instantiating its own dependencies. Instead, each dependency is instantiated by a
dedicated component, which then injects the instantiated dependency into the class
that needs it. Not letting a class control the instantiation of its own dependencies is
what is referred to as IoC [34]. Traditionally, without DI, object instantiation is done
imperatively by the programmer. In NestJS, the injection is handled by a
dedicated IoC container during runtime. In order to create a dependency, NestJS uses
a so-called decorator on every class, which defines what kind of class it is. For
example, if the Injectable decorator is used, NestJS registers that class as a provider,
and the class can then be injected into any other class.
Dependency injection enables code reusability across the modules used
throughout the project, caching of instantiated objects, and a simple way of creating
singleton objects.
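The principle can be illustrated without the framework. The sketch below performs by hand the wiring that NestJS’s IoC container does at runtime; the class names are hypothetical and not taken from the thesis implementation.

```typescript
// The dependency is described by an interface, so the consumer is
// loosely coupled to any concrete implementation.
interface LicenseStore {
  count(): number;
}

class InMemoryStore implements LicenseStore {
  private n = 0;
  count(): number { return ++this.n; }
}

// SchedulerService declares what it needs in its constructor; it never
// calls `new InMemoryStore()` itself. That inversion is the IoC idea.
class SchedulerService {
  constructor(private readonly store: LicenseStore) {}
  licensesSeen(): number { return this.store.count(); }
}

// In NestJS this wiring is done by the IoC container for classes marked
// with the Injectable decorator; here the composition root does it by
// hand, reusing one instance to obtain singleton behaviour.
const sharedStore = new InMemoryStore();
export const schedulerA = new SchedulerService(sharedStore);
export const schedulerB = new SchedulerService(sharedStore);
```

Because the dependency arrives through the constructor, a test can inject a fake `LicenseStore` instead of the real one, which is what makes the pattern attractive for testability.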
5.3.2 Designing for testability
To ensure that the implementation fulfils its system requirements, it is important
to test it. As the implementation consists of two primary components using
different systems, it is important to take a reasonable testing approach for each
component. Garousi et al. [1] present a checklist for determining what and when to
automate in software testing. The primary factors of the checklist are described in
Section 2.7. This checklist was used when deciding whether automated tests should
be written for the CI platform and the scheduler. The CI platform was deemed a poor
candidate for automated testing and was tested manually instead, primarily because
of the complexity and difficulty of testing the CI platform in an automated manner.
The scheduler was deemed an appropriate candidate.
When designing a system and its test suites, it is important to consider
testability. Alwardt et al. [35] present some best practices for achieving
testability when designing software, see Table 6. These best practices were
considered when designing, implementing, and writing tests for the scheduler.
Table 6 Best practices for testability, summarized from Alwardt et al. [35].
# Practice
1 Keep unit tests separate from integration tests.
2 Tests should not depend on the order in which they run.
3 Unit tests should be atomic.
4 Design a loosely coupled system.
5 Tests should be maintainable.
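Practices 2–4 in Table 6 can be illustrated with a small sketch in which the unit under test receives a hand-written test double, so each test builds its own fresh state and touches no shared server. The classes here are invented for illustration and are not taken from the thesis implementation.

```typescript
// The unit under test depends on an abstraction, not on the real clock
// (practice 4: a loosely coupled system).
interface Clock { now(): number; }

class Lease {
  constructor(private clock: Clock, private expires: number) {}
  isExpired(): boolean { return this.clock.now() >= this.expires; }
}

// A hand-written test double: fully controlled, no shared global state.
class FakeClock implements Clock {
  constructor(private t: number) {}
  now(): number { return this.t; }
  advance(ms: number): void { this.t += ms; }
}

export function runLeaseTests(): void {
  // The test constructs its own fixtures, so it is atomic (practice 3)
  // and does not depend on any other test running first (practice 2).
  const clock = new FakeClock(0);
  const lease = new Lease(clock, 1000);
  if (lease.isExpired()) throw new Error("fresh lease must not be expired");
  clock.advance(1000);
  if (!lease.isExpired()) throw new Error("lease must expire at its deadline");
}
```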
5.4 Brief summary
This chapter presents two high-level architectures: one describing the
components of the previous architecture, and the other describing how the new
implementation will affect and fit into the previous architecture. Furthermore, it
describes how the different actors of the system interact with the components in the
architecture. Finally, it presents how NestJS enables the use of DI and which
considerations were made in order to achieve a system with high testability.
Chapter 6 System Implementation
This chapter describes the system implementation. Section 6.1 describes the
different environments the system runs on. Section 6.2 describes the key flows
throughout the system, including execution flow charts and sequence diagrams.
Section 6.3 describes the key interfaces of the system, consisting of which actors
interact with the different components in the system. Lastly, Section 6.4 gives a brief
summary of the chapter.
6.1 The environment of system implementation
The system implementation runs on three different environments, as it consists
of three different platforms. The first environment is for hosting the scheduler
application. The second environment is the CI platform. The third is the environment
for the test server.
6.1.1 Scheduler
The scheduler is hosted on a Platform as a Service (PaaS) provider called
Heroku. Heroku is a cloud platform for hosting and deploying web applications [36].
They offer different execution environments depending on the selected price plan
[37], [38]. Vizlib selected the standard-1x plan. The content of the plan is
summarized in Table 7. To the author’s knowledge, Heroku has not published any
more detailed hardware specifications.
Table 7 Run-time environment for the scheduler.
Operating System Memory (RAM) vCPU
Ubuntu 18.04 (linux) 512 MB 1x
6.1.2 CircleCI
CircleCI is a PaaS and provides different run-time environments depending on
the selected price plan [39]. Vizlib has selected the Medium+ price plan. The content
of the plan is summarized in Table 8. To the author’s knowledge, CircleCI has not
published any more detailed hardware specifications.
Table 8 Run-time environment for the jobs on the CI platform.
Operating System Memory (RAM) vCPU
Docker container (Debian:jessie) 4 GB 2x
6.1.3 Test server
Table 9 describes the run-time environment for the test server hosting an
instance of Qlik Sense. This server hosts the latest stable version of Qlik Sense,
which is typically released every three months.
Table 9 Run-time environment for the test server.
Operating System Memory (RAM) vCPU
Windows Server 2016 16 GB 4x
6.2 Key program flow charts
The system as a whole has three different events that can initiate the
VRT testing. This section describes what happens and how the system acts upon
these different events.
6.2.1 CircleCI workflows
In CircleCI, workflows are divided into jobs. A job contains a sequence of steps
to be executed, where a step is an arbitrary command defined inside a configuration
file. Each job is executed inside its own Docker container. This enables jobs to be
executed in parallel or in sequence, as they run within their own contexts. When
creating a workflow, dependencies are defined among the jobs in order to determine
whether a specific job can run in parallel or must wait for another job to complete
before executing. Specifying when a workflow should be triggered on CircleCI is
done by defining filters, which declare which branches or tags should trigger the
workflow. Workflows also enable something CircleCI refers to as attaching and
persisting to a workspace. Persisting something to a workspace means that the
specified build artifact is stored and made available to other jobs. To retrieve build
artifacts from previously executed jobs, a job simply attaches the workspace, and all
previously persisted artifacts become available within the current job’s context.
6.2.2 Implemented workflow on CircleCI
Figure 6.1 shows what the implemented workflow executes upon different
events triggered on Github.
Figure 6.1 Overview of the implemented workflow on CircleCI.
6.2.2.1 CircleCI Job 1: Build Extension
The first job in the workflow builds a production version of the extension. It
begins by fetching the code associated with the event that triggered the workflow
from Github. After the code has been fetched, it installs all the project dependencies
needed to build a version of the extension. It then executes the
build script, building the extension. If the build fails, the CircleCI job aborts, stores
logs of what failed, and displays on Github that the associated commit has
failed. If the build succeeds, the produced build artifacts are uploaded and persisted
to the workspace, and the workflow continues with the second job.
6.2.2.2 CircleCI Job 2: Execute VRT tests
The second and last job in the workflow executes the VRT tests. The job begins
by attaching the workspace created by the previously executed jobs in the
workflow. After the workspace has been attached, the job starts the test runner,
Cypress. The first task the test runner performs is authenticating towards the
scheduler, ensuring that only authorized requests can allocate licenses. It
authenticates by making a POST HTTP request with credentials provided in a
configuration file for the test runner. The second task the test runner performs is
allocating a license from the scheduler, which it does by making a GET HTTP
request. If the scheduler fails to find an available license, the test
runner retries every 10 seconds until the scheduler finds an unallocated license.
Once an available license is found, the test runner uploads the build artifacts created
by CircleCI Job 1 to the test server. The test runner authenticates to the test server
using the license retrieved from the scheduler. After the build artifacts have been
uploaded, the test runner downloads the image baseline it should use. After the
download is complete, the test runner executes the VRT tests. When the test runner
has completed running all test suites, it sends a DELETE HTTP request
to the scheduler, deallocating the used license. After the license has been deallocated,
a test report is created. Lastly, the testing artifacts produced by Cypress, such as a
video of the execution and the testing report, are stored in CircleCI. Figure 6.2 provides
an overview of how the different systems communicate with each other upon the pull
request event as a sequence diagram.
Figure 6.2 Sequence diagram over how the different systems communicate upon a
triggered event from Github.
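The license-polling step in this flow can be sketched as a small helper that keeps asking the scheduler until a license is granted, pausing between attempts. The request function is injected, so the sketch runs without a real scheduler; the function names are hypothetical, and the real implementation waits 10 seconds between retries.

```typescript
// The scheduler either grants a license or reports that none is free.
type LicenseResponse = { license: string } | null;

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Retry the injected request (a GET to the scheduler in the real
// system) until it returns a license, sleeping retryMs between tries.
export async function acquireLicense(
  request: () => Promise<LicenseResponse>,
  retryMs: number,
): Promise<string> {
  for (;;) {
    const res = await request();
    if (res !== null) return res.license;
    await sleep(retryMs); // no license free yet; try again later
  }
}
```

Injecting the request function also makes the retry logic unit-testable with a fake scheduler that refuses the first few attempts.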
6.3 Key Interfaces of the software system
This section describes the key interfaces of all systems involved and which
components the different actors interact with. Section 6.3.1 describes the developed
API endpoints for the scheduler. Section 6.3.2 describes what has been added to the
interface on Github. Section 6.3.3 describes how the CircleCI interface is used.
6.3.1 The Scheduler
The scheduler is responsible for keeping track of what is being tested on
which test server, how many licenses are available for every server at any given
point in time, and providing the test runner with a valid license to use. The external
interfaces of the scheduler are the developed API endpoints. These endpoints are
presented in Table 10, including the required HTTP method, the URL, and a
short description of what the endpoint is for.
Table 10 Endpoints for the scheduler.
# Method URL Description
1 GET /token/list Gets all licenses currently allocated by the scheduler.
2 GET /token/{extension}?server={server} Requests a license for an extension.
3 DELETE /token Releases an acquired license.
4 DELETE /token/freeall Releases all licenses currently allocated on the scheduler.
5 POST /auth/login Authenticates towards the scheduler.
When the scheduler allocates a license, it uses a first-come, first-served algorithm. This choice was based on the fact that each testing occasion was deemed equally important, so no advanced prioritization algorithm was required. The test runner, Cypress, is the primary component in the system that interacts with the scheduler. The scheduler contains a configuration file where all the test server meta information is defined. The configuration file expects the following two items to be defined for each server: a test server URL and the number of available testing accounts on the server. Based upon the information provided in the configuration file, the scheduler sets up the expected resources for each defined server.
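As an illustration, the configuration and the resource setup described above could look like the following sketch. This is not the actual scheduler source; the field names and server URLs are assumptions made for the example.

```javascript
// Illustrative sketch: a configuration with two test servers, holding two and
// one available testing accounts respectively (names and URLs are assumed).
const config = [
  { url: "https://test-server-1.example.com", accounts: 2 },
  { url: "https://test-server-2.example.com", accounts: 1 },
];

// Expand the configuration into a pool of testing instances: one slot per
// available testing account on each server, initially unallocated.
function buildInstancePool(servers) {
  return servers.flatMap(({ url, accounts }) =>
    Array.from({ length: accounts }, (_, i) => ({
      server: url,
      account: i,
      extension: null, // no extension holds this slot yet
    }))
  );
}

const pool = buildInstancePool(config);
console.log(pool.length); // 3 slots in total
```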
When the test runner requests a license from the scheduler, it uses endpoint #2 in Table 10. If the query parameter server is not provided, the scheduler searches among all listed servers and investigates whether a server has an available license and the extension is not currently being tested on that server. If an available testing instance is found, the scheduler allocates it, generates a license using JWT and responds with the generated license and which server is available. When the query parameter is specified, the scheduler searches for an available license only on the server provided.
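The first-come, first-served search can be sketched as follows. This is a minimal illustration, not the thesis implementation: the pool representation is an assumption, and the JWT signing step is replaced by a placeholder string.

```javascript
// First-come, first-served allocation sketch. A slot is usable when it is free,
// matches the optionally requested server, and its server is not already
// running a test for the same extension.
function allocate(pool, extension, server = null) {
  const candidate = pool.find(
    (slot) =>
      slot.extension === null &&
      (server === null || slot.server === server) &&
      !pool.some((s) => s.server === slot.server && s.extension === extension)
  );
  if (!candidate) return null; // no license available anywhere
  candidate.extension = extension;
  // Placeholder for the signed JWT the real scheduler would generate.
  const license = `jwt-for-${extension}@${candidate.server}`;
  return { server: candidate.server, license };
}
```

Because the search simply takes the first matching slot, earlier requests are always served before later ones, which matches the equal-priority assumption stated above.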
After the test runner has completed its execution, it uses endpoint #3 in Table 10 to release the allocated license. The body of the DELETE request contains which extension has been tested and the server it has been tested against. This is needed for the scheduler to determine which testing instance the request refers to. When the scheduler receives this request, it simply searches for the allocated testing instance and marks it as available again.
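The deallocation step can be sketched in the same style (an illustration under the same assumed pool representation, not the thesis source): the scheduler locates the testing instance by extension and server and marks it available.

```javascript
// Release the testing instance identified by (extension, server).
// Returns false when no such allocation exists, the case in which the
// scheduler responds with status code 202 as described below.
function release(pool, extension, server) {
  const slot = pool.find(
    (s) => s.server === server && s.extension === extension
  );
  if (!slot) return false; // nothing to deallocate
  slot.extension = null;
  return true;
}
```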
Endpoints #1 and #4 in Table 10 are primarily used by a system observer to observe what is currently being tested on the servers and to reset all allocations currently held by the scheduler.
Endpoint #5 in Table 10 is used to authenticate towards the scheduler, ensuring that unauthorized requests cannot allocate licenses. This is always the first request the test runner performs in order to allocate a license. The body of the POST request contains predefined credentials for the scheduler. Upon a successful authentication, the scheduler responds to the request with a JWT. This JWT must be present in the header of every request in order to use endpoints #1, #2, #3 and #4 in Table 10. If the token is not present, the scheduler responds with status code 402, meaning the request is unauthorized. Upon successful requests, the scheduler always responds with status code 200, for all listed endpoints. If a request is authenticated but the scheduler has failed to allocate a license because every server was unavailable, or the scheduler tries to deallocate a license that is not allocated, the status code of the response is 202. If endpoint #2 in Table 10 is used and the query parameter contains a server that is not defined in the scheduler's configuration file, the response has status code 404, meaning the resource is not found.
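The response codes described above can be summarized in a small sketch. The outcome names are assumptions introduced for the example; the codes themselves follow the scheduler behaviour described in this section.

```javascript
// Map a request outcome to the scheduler's response status code.
function statusFor(outcome) {
  switch (outcome) {
    case "missing-token":
      return 402; // unauthorized request, as defined by the scheduler
    case "unknown-server":
      return 404; // server not present in the configuration file
    case "no-license-available":
    case "not-allocated":
      return 202; // authenticated, but nothing could be (de)allocated
    default:
      return 200; // successful request
  }
}
```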
6.3.2 Github
Another key interface of the system is Github, as it is the interface the developers and testers interact with. When CircleCI is used with Github, it automatically links the status of triggered CircleCI jobs to the commits in the repository, see Figure 6.3. Github has been configured to check the status of the CircleCI jobs and requires all jobs to have passed for a pull request to be approved. This ensures the tests have been executed before any new code gets merged to specific branches, see Section 5.1.2.
Figure 6.3 Status of triggered workflow viewed from Github.
6.3.3 CircleCI
An additional key interface of the system is the CircleCI platform. The platform stores all executed workflows, including the results of the builds and the testing artifacts produced by them. When viewing the status of a build in Github, a link to the specific jobs that have been executed on the CircleCI platform is automatically provided, see Figure 6.3. Upon clicking one of the jobs in the workflow, a view is shown including all tasks executed for that job. In this view, CircleCI highlights which tasks have succeeded, which have failed and their execution times, see Figure 6.4.
Figure 6.4 Example of CircleCI job 2, viewed in CircleCI.
On CircleCI’s webpage, an overview of the workflow and the relations between its jobs can also be viewed, see Figure 6.5.
Figure 6.5 An overview of the CircleCI workflow viewed from the platform webpage.
CircleCI has been configured to store the testing artifacts produced by Cypress. This includes a video of the test execution and images highlighting the difference between the baseline and the failed tests, see Figure 6.6 and Figure 6.7.
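A CircleCI configuration fragment for storing such artifacts could look like the sketch below. The paths are assumptions for illustration, not the project's actual configuration; `store_artifacts` is the CircleCI step that attaches files to a build.

```yaml
# Illustrative fragment of a CircleCI job's steps (paths are assumed).
# store_artifacts uploads the Cypress videos and snapshot diffs so they
# can be browsed per build in the CircleCI web interface.
- store_artifacts:
    path: cypress/videos
- store_artifacts:
    path: cypress/snapshots
```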
Figure 6.6 An overview of the testing artifacts stored in CircleCI.
Figure 6.7 An overview of the failed tests in CircleCI, showing the tests that have failed
during an execution.
6.4 Brief summary
This chapter describes the system implementation. It begins by describing the system environments used by the different components in the system. The chapter then describes the flow through the system and how the different components interact with each other, using both an execution diagram and a sequence diagram. Lastly, the key interfaces of the system are described, including how they are used and how they work.
Chapter 7 Results
This chapter describes the results gained from the conducted interviews and the
unstructured observations. Section 7.1 presents the results from the coding process of
the interview sessions. Section 7.2 presents the observations made throughout all the
cycles.
7.1 Interview sessions
From the coding process, a total of 99 statements were linked to a code. Table 11 shows the number of occurrences of each code.
Table 11 Total occurrences of codes from coding process.
# Code Description Occurrences
1 Test Maintenance Statements concerning maintaining the tests. 10
2 Implementation Effort Statements concerning the efforts for implementing test suites. 8
3 Testing Tool Statements concerning the testing tool used at the investigated company. 13
4 Feedback Time Statements concerning the feedback time to the developers. 15
5 Test Reliability Statements concerning the reliability of test results. 14
6 Test Automation Statements concerning the test automation. 11
7 Development Process Statements related to the development process. 11
8 Organization Statements related to organizational concerns. 17
Since these interviews were held in different cycles with different intentions, the code appearances have been linked to each cycle. Table 12 shows from which interview each code was detected. The letters A, B and C represent an interview subject in each cycle.
Table 12 Code appearances from each interview linked to the corresponding cycle.
Cycle 1 Cycle 2 Cycle 3
Code # A B C Total A B C Total A B C Total
1 0 1 1 2 3 1 1 5 0 1 2 3
2 1 2 1 4 1 2 1 4 0 0 0 0
3 4 1 0 5 2 4 2 8 0 0 0 0
4 2 1 1 4 2 1 1 4 1 3 3 7
5 1 4 1 6 4 1 1 6 1 0 1 2
6 3 3 1 7 2 1 1 4 0 0 0 0
7 0 3 0 3 2 2 2 6 0 1 1 2
8 2 3 1 6 2 3 3 8 2 0 1 3
Total 13 18 6 37 18 15 12 45 4 5 8 17
Cycle 1 had a total of 37 code appearances, Cycle 2 a total of 45 and Cycle 3 a total of 17. Table 13, Table 14 and Table 15 summarize the interviewees' backgrounds and the metrics collected from all interview sessions.
Table 13 Background and collected metrics from Cycle 1 interviews.
Cycle 1 A B C
Background in software industry 11 years 13 years 20 years
Experience with VRT Indirect experience 6 months 3.5 years
Believes VRT is a complement yes yes yes
Confidence in VRT test results yes yes yes
Believes higher quality with test automation yes yes yes
Table 14 Background and collected metrics from Cycle 2 interviews.
Cycle 2 A B C
Working experience as tester 3-4 months 2.5 years 4 years
Confidence in VRT test results yes yes yes
Prior experience with VRT no yes yes
Believes VRT is a complement yes yes yes
Time to produce a test on average 2 days on average Between 1 hour and some days Some days
Table 15 Background and collected metrics from Cycle 3 interviews.
Cycle 3 A B C
Working experience as software developer 3 years 11 years 2 years
Confidence in VRT test results yes yes yes
Proportion maintaining vs developing new features 60-40 50-50 75-25
Difficulty implementing new features without introducing new bugs because of code complexity Impossible Very difficult In general, difficult
Believes VRT is a complement yes yes yes
7.1.1 Test Maintenance
Two points of discussion were linked to test maintenance. The first concerned the high coupling of the developed product with Qlik. Four interviewees discussed that if the version of Qlik on the test server were updated, there could be implications for the test suites and the image baseline of the VRT tests. Two quotes describing the issue have been extracted from the transcribed material below.
“Sometimes when Qlik releases a new product, they can update new CSS classes
or something similar, so if we are dependent on some asset in the HTML and they
would change, we would have to update the tests, again and again.”
“Another thing that might become an issue, is if we decide to upgrade the
version of Qlik on our tests server, this means that the baseline will probably need to
be updated, and some test suites might break as they might update their API.”
The second point of discussion was identified by three interview subjects. They were concerned that new features could require the image baseline to be updated, as the application might change visually. Two quotes describing the issue have been extracted below.
“That is a big one, if a new version comes there is a risk of us needed to revisit all our tests. Also maintaining the image baseline. If we release a new feature, we need to ensure the baseline is correct, as the application might look different.”
“I would say it would be the test maintenance, because if something changes in
the extension code or even the Qlik sense version we are testing against, can impact
our tests. So, it will be a challenge to keep the tests updated. I think however, this
will be manageable, but it is important to keep an eye on your tests and check that
they are up to date.”
This was experienced and mentioned during the Cycle 3 interviews, see the two
quotes below.
“Right now, a thing we have an issue with is, if someone from QA updates
something on the server, the test may fail when checking the difference between my
feature branch and the demo application.”
“So recently we change our rebranding, which made the tests fail on all our
branches.”
7.1.2 Implementation Effort
All the interview subjects shared the same view on the implementation effort: it has increased with the introduction of the testing tool, but they believed it would decrease over time. Two quotes expressing their thoughts have been selected below.
“I think theoretically it should decrease, but I kind of see it as a kind of
logarithmic function. At the beginning it might be a lot of work, then it hopefully
turns to zero”
“Over time it decreases, because you can avoid regressions, context switching,
revisiting and diving into old tasks. I think it really pays off to do that.”
Furthermore, two of the interviewees discussed some issues encountered when developing tests. The two quotes below describe the issues.
“When working with servers, it is sometimes hard to decide about waiting times
or timeouts. We must specify, how much time should we allow it to take before
something should render. This can be tricky to decide. Sometimes it is about
animations, and other times it is about latency.”
“So sometimes, because of the nature of the visual testing and because we have
to wait for somethings to render, we are affected of the certain load time of elements.
Although, you can wait, we might sometimes assume an element is loaded but it is not
actually loaded, so one thing gets rendered but not the other. You can run the same
tests twice and have different outputs.”
Lastly, one of the interview subjects compared the implementation effort to that of another common testing framework.
“I remember in protractor, one time, we had a big problem to integrate image
comparison and was surprised how easy this was with Cypress.”
7.1.3 Testing Tool
Five interviewees expressed that the testing tool, Cypress, was easy to use. In the interviews during Cycle 2, when the testers were interviewed, all of them agreed upon this, even though no interview question was designed to investigate it. Four quotes in which the interviewees express how easy Cypress is to use have been selected from the interview material.
“Cypress is really cool since the entry level threshold is really small. It is very
easy to use. You understand the thought process really easy.”
“Easy to use, even though I don't have a background in development. I only
have learned a little bit of simple JavaScript.”
“I think it was pretty easy, for the first tests. To start to develop anything. I'm
experienced with developing so a transitioning to a test tool wasn't difficult for me. I
thought it was easy to understand how to write the tests. The first setup, it took me
maybe a week to understand the tool.”
“Easy to use for sure. The entrance point is very small, if we are talking about
cypress in comparison with protractor, which is working on top of angular.
Protractor needed a lot more knowledge about different things working in the web
browser. It is easier for testers.”
Two interview subjects mentioned that the support for cross-browser testing is
beneficial. The quote below has been selected from when an interview subject
discussed the area.
“Cypress cross-browser testing is also beneficial. Currently, IE is not
supported, however, just by being able to run the tests upon a different browser,
really helps QA.”
One interviewee mentioned another benefit with the testing tool, expressed in
the quote below.
“Cypress can easily work with image comparison techniques.”
7.1.4 Feedback Time
When asked whether they believed the feedback time would decrease, all the interview subjects stated that it would. Four quotes have been selected from their answers.
“Definitely lowers the feedback time. For me the testing and CI pipeline is
about the feedback loop, the sooner I know which line that broke something, then I'm
not going to write that line. I believe that everything that contributes to a shorter
feedback loop is beneficial, which is both visual testing but also other forms of
automated tests.”
“If we are talking about manual testing, like smoke tests, it should help because
all existing automated tests are connected to some tests in the smoke tests, can be
done automatically. So instead of 15 minutes of clicking inside the application, we
can look into the screen and in two minutes view the execution of the automation and
determine if the application behaved in the same way.”
“When we are using CircleCI, we are running these tests much more frequently,
which means that these tests will be been passed before they reach the manual
testers. There is a problem with issues bouncing back between the developers and
testers. And right now, the developers don't need to speak with the testers and wait
for the rejection, because the tests are going to be run upon the commit
automatically.”
“With the integration with CircleCI the developers receive quick answers and
they can fix it. This leads to less work for QA.”
One of the interview subjects mentioned that the feedback time depends on the
situation. The quote below describes the concern raised.
“That depends probably about specific situations, sometimes automated tests
can help, sometimes they can be totally irrelevant.”
This was additionally mentioned in the Cycle 3 interviews when the developers
were interviewed, displayed in the three quotes below.
“With our current solution with CircleCI I have almost instant feedback. We
can talk about minutes, I have a feedback that the version is built correctly, results
from your tests and so on. In terms of machine feedback, it is in terms of 5-10
minutes.”
“This is similar to our previous questions, like how many days do I have to wait
for QA and now this is something that is automatic, so it happens immediately after
pushing something to Github. So quick feedback and performance.”
“So, the feedback that we receive through the CI is really fast, but not as fast as
you'd might expect. When you run these tests locally, you have many things
preinstalled which is not the case for the CI were dependencies need to be installed
for each run, even with caching, takes a little bit of time. So, when running the tests
locally, the execution time is maybe around 1-3 minutes, meanwhile in the CI
environment it can take up to 5 to 10 minutes. It is still very fast and okay, but these
minutes adds up each iteration.”
Lastly, in Cycle 3, two interviewees mentioned that the tests have been helpful. However, all the interviewees mentioned that the test coverage is too low, as the implementation is in its early stages, displayed in the three quotes below.
“When we are talking about Cypress at Vizlib, we don't have too many
functional tests today since we have been more focused upon the smoke tests.
However, with greater coverage, I'm sure we are able to prevent a lot of different
bugs.”
“I found cypress very helpful, as it sometimes it has found a couple of bugs that
I've couldn't find or didn't look thorough enough. For example, I maybe didn't expect
the changes to affect a certain part because somethings are so complex it hard to
know the implications and cypress has been able find these issues. As the state of
right now, I don't think we have enough coverage that we can only rely on these tests
only. The cypress tests find something, I'm very thankful for that, but we have too few
tests. So, this means we are probably not spotting other errors. I want to be more
confident in these tests, but at the time being I am not. They have found bugs in my
work and I'm very grateful for that, but we don't have enough coverage currently. It
is very good that it covers the basic functionality that our clients use, but I would like
more.”
“Actually yes, recently we had a situation with Cypress, so thanks to that, it was
actually quite big, we were able to spot 2 different issues. So, I would say yeah, it
covers the bugs we usually deal with.”
7.1.5 Test Reliability
In the interviews, all the interview subjects were asked if they felt confident in the test results produced. All of them said that they were, see Table 13, Table 14 and Table 15. One of the purposes of the first cycle interviews was to discuss the intent of introducing VRT tests. Three quotes have been extracted below, describing two different concerns raised by two interviewees during the first cycle interviews.
“Another drawback would be accuracy. So, the accuracy might damage the
reliability that I'm after. So, for example, if you run the VRT tests and it works for
you and then you say yeah, the development is on point and the tests work. And then I
test it again and it doesn't work, it makes me think, if I run it later, will it work? I
would say it is tricky to get a full deterministic output, were every time the
application behaves the same way. This is more of a challenge rather than a
drawback.”
“Another drawback would might be relying on the tests too much. That's the
thing, on one hand I can see that it can save us time at QA, on the other hand if it is
not 100% accurate or reliable, it means I'm still unsure if it is working or not. That
basically means that I need to double up the time, in that case we are running the
Cypress tests but then we are still doing some manual exploration. So even though, it
is supposed to save time, at the moment it is an overhead as we need to do this
double check.”
“If you don't have trust or confidence, I have had this feeling as well, uhh, i'm
not sure if it works. I better check. But in that case, it means that we haven't covered
it enough and that simply means there is still work to do. If I have automatic testing
and they don't give me confidence, it means I don't have enough coverage, or I
simply don't trust my test which means I not testing what I should be testing.”
In the second cycle interviews, when the testers were interviewed, all testers mentioned that they feel confident in the results when Cypress yields positive results. A quote from each tester is listed below.
“I really trust the results that Cypress produces. So, if everything passes, I will
not doublecheck it manually.”
“I am really confident when Cypress passes all tests, I believe there is no need
to explore further if a test exists for what is being tested.”
“We have some issues that we need to update some timeouts and the image
baseline. However, if Cypress passes all tests, then I am confident in the results. If
something is bad Cypress is really good at reporting this.”
Furthermore, one interviewee mentioned a difficulty caused by the high coupling with Qlik. The quote from the interviewee is listed below.
“We have specific kinds of projects (Extensions) that are included in a different
bigger project which is Qlik sense. This is sometimes problematic, since we tend to in
a little way, also test Qlik sense which we don't have control over.”
7.1.6 Test Automation
From all the interviews, no one believed automated tests could completely
replace manual testers. However, all the interviewees held the view that the
automated tests are a good addition to the testing process. The two quotes below
come from the first cycle interviews, investigating why they wanted to introduce
VRT.
“I see it as a repetitive task, so instead of spending 3 hours manually for one
person performing the task, I believe we can automatize that and make it more robust
to reduce the human error from repetitive tasks. So, we can focus the QA-team on the
more manual exploration, which actually requires a higher level of brain power.”
“Manual testers aren't able to test everything in each release. This is why work
that is repeatable, is what we want to automize. We want to save the time, as we can
write the test once, we can save time by doing the exact same thing by person. Right
now, after each release, I am asking myself, well hmm, we didn't actually check 50
other bugs that we had earlier with a client with this extension.”
Two quotes have been selected, describing why the interviewees believe
automated tests cannot replace manual testers.
“But if we are talking about the overall picture of the extensions and somewhere
where human intelligence is important to check if something is nice or not nice,
automated tests for now, will never replace this.”
“It is rather hard to implement every manual test into a testing script.
Especially new features, they have to be tested manually first to find edge cases and
to seek how it works. Testers notice many visual things, such as "this doesn't look
good", which I believe cannot be achieved easily with Cypress.”
During the second cycle interviews, two of the interviewees mentioned they had
trouble automating an export feature within the extensions. The quote below
describes the issue at hand.
“For example, we haven't still found a way to test the export feature properly, in
the way that the file produced by our extension looks good. I would say it is
challenging.”
Another issue detected from the second cycle interviews is described by the
quote below.
“We have some difficulties with the baseline, when testing locally versus a
production environment, this is because the locally produced version of the extension
will slightly appear different than the production version. This means the tests when
comparing images will fail. But this is a very small issue.”
Additionally, two insights were brought up during the first cycle interviews concerning the cost of running the VRT tests and the benefits of running them. The quotes have been extracted below.
“They might be quite expensive to run, because they are more costly, time
consuming and memory costly to run as they run upon a browser. (Compared to
other testing approach such as unit testing)”
“I think the benefit is that you can have this ultimate check to ensure the
product works as it actually should. Which I think is really hard to have this
confidence by using functional testing, were you test in isolation.”
7.1.7 Development Process
During the second cycle interviews, the testers described how long the manual procedure for performing a smoke test takes. Their answers ranged from half an hour to one hour, see the quote below.
“Smoke tests (Manual) takes around half an hour to do, meanwhile expletory
features can take around 5-6 hours.”
All testers mentioned that efficiency and effectiveness have increased as a result of the VRT tests. The three quotes below show what the testers expressed.
“It will decrease, because for example the things I was doing previously, like
smoke tests took around 1 hour. Right now, we are covered in the most cases with
Cypress. So, it is like, two clicks, and then you run the tests and that's all you have to
do. In the end it will decrease. But now in the beginning, it has increased, as we have
spent some time writing the tests, before we can use them. We are removing things
that we previously had to do manually, so we are saving a lot of time. We are sure
that for each iteration, we cover the same things.”
“Also running the Cypress upon smoke tests, I believe it finds more bugs and is
quicker, which leads to quicker releases. Smokes tests usually take some time for the
tester and now they are quicker.”
Two interviewees mentioned an increase in test coverage compared to only
performing manual testing. See the two quotes below.
“Also, Cypress found some cases that the QA team didn't find. Even when I was
writing some new test cases for some extensions, we found 2-3 new bugs that hadn't
been detected before. However, overall, I believe it will decrease the time for the
testers.”
“I actually saw that happen already, the QA-team was developing an Cypress
test for an extension and by doing that, they found several bugs, that later were
reported and fixed and that means we now have a test to prevent them from
happening.”
Additionally, an interviewee mentioned a benefit of writing tests. See the quote
below.
“Because we are able to catch things in a production environment and also
increase quality awareness within the product, because we are being quality
conscious. That means that if we create a new feature, we are going to having
implement a new test, so that will mean it will be like having a second manual test .”
In the first cycle interviews, an interviewee mentioned that one intent of introducing VRT was to prevent the testers from having to test faulty issues, as this increases the workload with redundant tasks, see the two quotes below.
“To prevent faulty issues to QA. So, by doing that, we could optimize the QA
process, leading to less false positives. So, if something doesn't pass our current
standards there is no point for QA to start testing it, because now it is a bottleneck.”
“There is nothing more frustrating from a QA-perspective then testing
something that doesn't work, then it gets back to development, then comes back to
QA and gets rejected and bounces back and forth like this. That is the worst thing
ever. Whatever we can do to reduce the risk of this happening, is worth
investigating.”
When the developers were interviewed during the Cycle 3 interviews, they all mentioned that it is difficult to implement new features without introducing bugs, see the two quotes below.
“Yeah, it is very difficult to introduce something new, there is a lack of
documentation for our extensions and additionally unit tests testing the documented
functionality, so it is difficult to find out if you broke something. It is very time-
consuming to investigate that something is working right or even to prove that
something is working right, without unit or e2e-tests. As the products have grown, it
takes longer time for something simple to implemented as compare to previous
features when the products were less complex. Because you need to fit your solution
carefully and ensure should haven't introduced any regression or so. It is not an easy
task.”
“No, or it depends on the features, if it is something like breaking changes,
which we did recently I suppose it is very difficult do without bugs. The complexity
for each extension is almost the same, I would say they are very complex.”
7.1.8 Organization
During the second cycle interviews, the testers were asked which tasks they found most time-consuming or tedious. The four quotes below were extracted from their answers.
“Manual testing is often very monotonic, with sometimes very repetitive tasks. It
is very important to change projects, at Vizlib, that means working on a different
extension.”
“For me, personally, compare reviews. With the snapshot testing this really
assists us with having a compare view directly. If you compare doing this task
manually, it becomes really tedious.”
“I would say, repeatable tests, because it is more exciting to test new features,
rather than testing the same thing again over and over again. Also, that issues
reappear, so after you have rejected something, you know the same tests will be
needed to check it again, testing the same thing all the time.”
“For example, that the server is down or that the building process is not
working as expected. It is very time consuming, for example if we are testing against
a server, and it is not working, then we must check against another server and then if
that server doesn't work you have to check it locally.”
When the testers were instead asked which tasks they found the most enjoyable,
they gave the following answers.
“Testing totally new things is the most fun to test. It's all about being creative
and finding new use cases to test from the customers perspective.”
“New features, because they are more creative rather than going for all the
steps required to reproduce a bug.”
“The most interesting ones, the most complex testing. I for example, did a test
where I put the Mario game into one of our extensions. It was very exciting,
challenging and fun.”
From the first cycle interviews, two of the interviewees mentioned one of the
reasons why they wanted to introduce VRT testing, see the two quotes below.
“One of the main concerns we have is regression bugs.”
“Fight with regression bugs.”
From the second cycle interviews, one tester estimated how often these bugs
occur; see the quote below.
“Bug fixing, I would say around 10-20 % of the bugs are regression bugs.”
Lastly, from the Cycle 3 interviews, two concerns were raised; see the two
quotes below.
“In an ideal world, the test should investigate if something has been broken, but
at Vizlib I find it difficult. Let's say we spend 1-2 years implementing functional tests,
there is still will be still some extensions that are unmaintainable, so the tests
execution can't be able to cover everything. We could have 100% test coverage, and
it is still not enough to be sure that everything is fine. I like the way we are heading,
the idea with CI, CD, integration tests, functional tests, fuzzy tests etc. But without
good quality of development, the tests can only be so helpful.”
“From the developer perspective, there is a lack of communication in what
CircleCI returns and the developers. When I see, oh the e2e tests failed. I go to a
specific person or someone from the QA team, and ask what is wrong. Because I
don't understand the logs. So CircleCI is great and CI etc, but it also requires some
kind of introduction, presentation, learning of the new technology.”
7.2 Unstructured Observations
A total of seven observations were documented throughout the cycles. Three
observations were linked to the first cycle, three to the second cycle and one
to the third cycle, see Table 16. A hierarchical tree map has been created,
mapping each observation to an area of concern, see Figure 7.1.
Table 16 Types of observations linked to each cycle.
Type of Observation Cycle 1 Cycle 2 Cycle 3
Limitation L1 L2 -
Challenge C1, C2 C3 C4
Problem - P1 -
Figure 7.1 Documented observations throughout all cycles.
7.2.1 Testing tools
The first observation concerning the testing tool, Cypress, was a limitation
observed during the first cycle: a hook that was not supported by Cypress. A
hook is a listener that triggers a user-defined function when an event is
emitted. Cypress supports several types of hooks, such as before or after a
test suite is executed. However, Cypress does not support the sought-after
event of all test suites having finished execution. This was troublesome,
because the test runner needed to release a requested token back to the
scheduler after all test suites had been executed, which required another
approach. The implemented solution was to create a test suite solely
responsible for releasing the acquired token and to ensure that this suite was
always executed last.
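The workaround above can be sketched as follows. This is a minimal illustration, not Vizlib's actual code: the Scheduler class, suite names and run_all_suites function are all hypothetical stand-ins showing how appending a dedicated cleanup suite emulates a missing "after all suites" hook.

```python
class Scheduler:
    """Hypothetical stand-in for the scheduler that hands out tokens."""
    def __init__(self):
        self.token_held = False

    def acquire(self):
        self.token_held = True

    def release(self):
        self.token_held = False


def run_all_suites(suites, scheduler):
    """Run every suite, then the dedicated token-release suite last."""
    executed = []
    scheduler.acquire()
    # The cleanup suite is appended after all ordinary suites, which
    # guarantees the token is released once everything else has run.
    for name in suites + ["release-token"]:
        if name == "release-token":
            scheduler.release()
        executed.append(name)  # a real runner would execute tests here
    return executed
```

The essential design point is ordering: because the tool offers no hook after the final suite, the release logic is packaged as a suite itself and forced to the end of the execution order.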
The second observation concerning the testing tool was the challenge of
finding a strategy for maintaining the image baseline. This observation was
made during the first cycle. The standard strategy offered by Cypress links
VRT test cases to specific images in a folder. If no images exist in the
folder, Cypress saves the images produced in that test run and skips those
comparisons for that run, since no baseline exists to compare against; the
produced images are then used in future test runs. Vizlib found this approach
problematic, because the images would likely differ depending on which VCS
branch a developer was working on. Additionally, upon merging, developers
could accidentally update the baseline with incorrect images. As a result, the
baseline images were instead uploaded to a 3rd party storage provider, AWS S3.
The test runner downloads these images before its execution, ensuring that all
branches use the same baseline images and minimizing the risk of accidental
updates.
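The shared-baseline strategy can be sketched as below. This is an illustrative model only: the shared store (AWS S3 at Vizlib) is represented as a plain dict, and the function name and folder layout are assumptions, not the actual implementation.

```python
import os


def sync_baseline(shared_store, baseline_dir):
    """Copy every shared baseline image into the local baseline folder,
    overwriting whatever branch-local images may already be there, so the
    test run compares against the same references on every branch."""
    os.makedirs(baseline_dir, exist_ok=True)
    for name, data in shared_store.items():
        with open(os.path.join(baseline_dir, name), "wb") as f:
            f.write(data)
    return sorted(os.listdir(baseline_dir))
```

Because the local baseline folder is overwritten before each run, branch-local screenshots can never silently become the reference, which is the accidental-update risk described above.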
The third observation concerning the testing tool was the challenge of
generating the images for the baseline. Depending on which operating system
the test runner executed from, or which screen resolution was used when
generating the baseline, the test results could yield false positives. This
occurs because different operating systems may render images differently, and
images may be compressed depending on the resolution, affecting the
pixel-comparison algorithms used later by the testing tool. This observation
was made during the third cycle. The strategy for handling this challenge was
to generate all the baseline images directly from the CI platform, ensuring
that the operating system used for generating the baseline images was the same
as the one the test runner executed from.
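Why rendering differences cause spurious failures can be illustrated with a simplified pixel comparison. This sketch is not the tool's actual algorithm; the tolerance value and flat-list image representation are assumptions for illustration.

```python
def mismatch_ratio(baseline, candidate):
    """Fraction of differing pixels; both images are modeled as
    equal-length flat lists of pixel values."""
    differing = sum(1 for a, b in zip(baseline, candidate) if a != b)
    return differing / len(baseline)


def vrt_passes(baseline, candidate, tolerance=0.01):
    """A screenshot passes when the mismatch stays within the tolerance.
    OS- or resolution-dependent rendering shifts pixels and can push a
    visually identical screenshot over the threshold."""
    return mismatch_ratio(baseline, candidate) <= tolerance
```

Even a small rendering difference, e.g. 2 % of pixels anti-aliased differently by another operating system, exceeds a 1 % tolerance and fails the test although no real defect exists, which is why the baseline is generated on the same CI operating system that executes the tests.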
7.2.2 Tested system
Two observations were linked to the tested system. The first observation was
made during the first cycle. The challenge was to upload the extensions to the
test server in a convenient way, from the CI platform or locally. After
discussing with the company supervisor, it came to the author's attention that
the company already had a script for uploading extensions to the test server.
This script was reused and integrated into the test runner.
The second observation was a problem observed during the second cycle. The
author noticed that a specific test always failed when using the dedicated CI
testing accounts on the test server. This problem was caused by an
authorization issue. When the system was first implemented, only one specific
testing account was used. This account was an owner of the application the
tests ran against, and the functionality being tested required the user to be
an owner of an application in Qlik. This became troublesome once the dedicated
CI test accounts were used, since these accounts were not owners of the
application. The solution was to configure the security rules for the
dedicated CI accounts, making it possible to perform the action required by
the test suites.
7.2.3 Continuous Integration platform
Only one observation was linked to the continuous integration platform. This
observation was made during the second cycle. The limitation was triggering a
specific workflow on the CI platform on pull requests and tag creations while
triggering a different workflow upon commits. This configuration was not
achievable; it was only possible to do one or the other, but not both. After
discussing the limitation with the company supervisor, it was chosen to
proceed with triggering workflows upon commits and tag creations. Since a pull
request refers to its latest commit, the workflow is indirectly triggered upon
pull requests. To approve a pull request, the latest commit must have passed
all steps in the CI workflow, ensuring that the tests have run and passed
before a pull request can be approved.
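The resulting trigger model can be stated compactly. This is a conceptual sketch, not CI platform configuration; the event names and status string are illustrative assumptions.

```python
def should_trigger_workflow(event):
    """One workflow is triggered for commits and tag creations only;
    pull requests carry no trigger of their own."""
    return event in {"commit", "tag"}


def pull_request_approvable(head_commit_status):
    """A pull request may be approved only if the workflow run for its
    latest (head) commit has passed all steps."""
    return head_commit_status == "passed"
```

Pull requests are thus covered indirectly: every push creates a commit, the commit triggers the workflow, and the approval gate checks that commit's workflow status.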
7.2.4 Support software
One observation was made concerning the support software. This observation
was made during the second cycle, when the CI solution was being integrated with
more extensions. It was noticed that build destination from the build script was
different for some extensions. This had implications as the upload script needed to
know from which file destination it should upload the extension from to the test
server. The solution implemented for handling different the build destinations was
integrated into the test runner. In the configuration file for the test runner, a new
variable was added which declared the build location for that specific extension. The
build script later used this variable to determine were the file location was for the
build artifacts produced by the build script.
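The per-extension configuration variable can be sketched as below. The key name `buildDestination`, the JSON format and the default folder are assumptions for illustration, not the actual configuration schema.

```python
import json


def build_location(config_text, default="dist"):
    """Read an extension's test-runner config and return its declared
    build destination, falling back to a default when none is declared."""
    config = json.loads(config_text)
    return config.get("buildDestination", default)
```

The upload step then asks each extension's config where its artifacts live instead of assuming one fixed destination for every project.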
Chapter 8 Discussion
This chapter discusses the results, the method and the work in a wider
context. Section 8.1 discusses the results from the interview sessions and the
unstructured observations. Section 8.2 discusses the method and the threats to
validity. Lastly, Section 8.3 discusses the work in a wider context.
8.1 Results
This section discusses the results from the coding process for each code word
and triangulates them with the literature review. Additionally, it discusses
the results from the unstructured observations.
8.1.1 Test maintenance
The first concern brought up in the interviews was that test maintenance is
affected by the high coupling the extensions have with Qlik. If the version of
Qlik were upgraded on the test server, the test suites would risk needing
revision, as the APIs, the appearance of components and the assets could have
changed. This drawback is not unique to VRT, but rather associated with test
automation in general [1], [3], [9]. It can be mitigated by an architectural
strategy of adding custom identifiers for the test runner to interact with,
rather than using the existing identifiers in the 3rd party software,
decreasing the test runner's dependency. This strategy may, however, not
always be possible, as some components are not accessible for modification.
The same type of issue was identified in the case study conducted by E.
Alégroth et al. [19] when using VGT. The difference is that VGT is sensitive
to visual updates, while VRT is sensitive to asset updates, as VGT uses image
recognition rather than assets to interact with the SUT.
The second concern brought up was the maintenance of the image baseline used
by the VRT technique. Whenever a new feature is developed, it may contain a
visual change requiring the image baseline to be updated, adding a maintenance
aspect to VRT tests that other testing techniques lack. This was also observed
when the company rebranded, causing the test suites to break. This forces a
strategy for deciding how the application should look and when the new image
baseline reference should be used. The strategy used at Vizlib was to upload
the baseline to a 3rd party storage provider; the test runner later downloads
the baseline prior to its execution. Maintaining the image baseline can be
perceived as both a benefit and a drawback. The benefit is that a clear
illustration of how the application should look exists in the form of an
image, revised for every visual feature update. E. Alégroth et al. [19] argue
that adopting a frequent test maintenance strategy is important to avoid test
script degradation and helps lower maintenance cost to a large extent, because
simultaneous maintenance of both logic and images is more complex than doing
it separately. However, constantly revising the image baseline could become
time-consuming. Additionally, when a branch on a VCS is out of sync with the
latest visual feature updates, it will fail upon test execution, since the
image baseline expects the new visual updates to exist. This factor must be
considered when incorporating a strategy for maintaining an image baseline. No
research papers have been found concerning the maintenance of an image
baseline.
8.1.2 Implementation effort
The interviewees throughout Cycles 1 and 2 stated that the implementation
effort increased upon introducing VRT but believed it will decrease over time.
The main concern regarding the implementation effort for VRT raised by the
interviewees was handling load times, animations and latency issues when
developing test scripts. Animation and load-time events are often not
represented as GUI events and are therefore difficult for the test runner to
capture. This is described as a fundamental problem in GUI testing without a
simple solution by M. Jovic et al. [15]; as the number of time-driven
animations in a user interface increases, so does the importance of this
problem. The approach taken at Vizlib is to use timeouts, setting a value to
wait for an animation or load to complete. The issue with this approach is
that too short a timeout causes tests to fail, yielding false positives, while
too long a timeout results in unnecessarily long execution times. When the
test runner can capture these events, timeouts are unnecessary, since the
runner waits until the expected element appears and then continues. This is a
benefit of 2nd and 3rd generation GUI testing tools. Another strategy is to
rerun all failed tests and set a threshold on how many passes must occur for a
test to be deemed passed. A drawback of this strategy is that it increases the
test execution time.
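The rerun strategy can be sketched in a few lines. The parameter names and default values are illustrative assumptions, not a specific tool's API.

```python
def deemed_passed(run_test, attempts=3, required_passes=2):
    """Re-run a (possibly flaky) test up to `attempts` times and deem it
    passed only when it passes at least `required_passes` runs. This
    filters out sporadic timing failures, at the cost of multiplying the
    execution time of every failing test."""
    passes = sum(1 for _ in range(attempts) if run_test())
    return passes >= required_passes
```

A test that fails once because an animation outlasted its timeout but passes on the other runs is still accepted, while a genuinely broken test keeps failing on every attempt.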
Furthermore, latency issues are another challenge observed by the
interviewees: determining how long the test runner should wait before failing
a test. Some actions may take substantial time to perform, and in combination
with latency issues, the execution time may vary from 30 seconds to 2 minutes.
A mitigation strategy suggested in two papers by E. Alégroth et al. [3], [19]
is to minimize remote test execution. This strategy is, however, not currently
possible at Vizlib, since the CI is hosted by a PaaS provider, requiring the
test execution to occur remotely.
8.1.3 Testing tool
Both the first- and second-cycle interviewees reported that the testing tool,
Cypress, was easy to use and easy to set up. It is not uncommon for testers to
lack a background in programming; however, the results indicate that even
these individuals found the testing tool easy to use and the barrier of entry
low for developing test scripts. M. Rafi et al. [21] argue that testing
techniques with an easy learning curve can reduce the high initial investment
of introducing test automation. Furthermore, V. Garousi et al. [1] list easy
test automation tools as an important beneficial factor when transitioning to
test automation. The results show that such tools exist for VRT. E. Alégroth
et al. [19] discuss that VGT is often perceived as easy to use, which can make
it tempting to automate various types of test cases. However, they argue that
the technique should primarily be used for system and acceptance testing, as
maintenance may become costly if the technique is used for immature or
frequently changing functionality.
8.1.4 Feedback time
The results show that the feedback time has decreased, which has been helpful
for both the developers and the testers. With the help of the implementation,
the time it takes for the testers to perform the smoke tests has decreased.
Additionally, the testing frequency has increased for the developers, as the
tests run automatically on each commit. This is an expected result, since it
is usually one of the primary reasons organizations choose to introduce
automated testing in a CI environment, and it was also among the incentives
observed at Vizlib during the first cycle. The results additionally show that
the tests have been able to spot faults that could otherwise have been missed
due to human error, such as forgetting to check an edge case. This is further
discussed in Section 8.1.6.
Two concerns were raised during the Cycle 3 interviews regarding the feedback
received. The first was that the current test coverage is too low to rely on
the test results alone. As the implementation is in its early phase, this
result is not unexpected. The interviewees further explained that as the
implementation matures, the test coverage will increase. However, increasing
the test coverage should be done with caution, as the implementation effort
and test maintenance are affected by this factor, as discussed in Sections
8.1.1, 8.1.3 and 8.1.6.
The second concern was that the execution time, while quick, was not as quick
as one might expect. GUI testing techniques tend to have longer execution
times than other techniques. The interviewees considered this a small issue;
however, it has the potential to become a larger factor as the test suites
grow, and it is another reason why test cases should be introduced with
caution.
8.1.5 Test reliability
The results show that test reliability is important; without it, the
uncertainty can increase the overall workload due to double-checking. One
interviewee mentioned that if one does not trust the test results, it probably
means one is testing the wrong thing, or not thoroughly enough; either way,
there is still work to be done. This is an important mindset when introducing
test automation, because it keeps the focus on the scope of the problem the
test automation tries to solve. Furthermore, the metrics collected in Cycle 2
show that all the interviewees trust the test results when the test runner
passes all test suites. This indicates that VRT is not sensitive to reporting
false negatives, that is, approving poor-quality items under test. The same
result was observed by E. Alégroth et al. [3]. However, as discussed in
Section 8.1.2, there are cases where false positives have been observed, which
can lead to a costly root cause analysis.
The results show that VRT is effective in reporting that a failure has
occurred. The testing tool provides an image highlighting the differences
between the baseline and the image taken by the test runner. Additionally, it
provides a video of the entire test execution, making it easy for a tester to
follow what has happened. However, when a failed test is reported, a root
cause analysis must always be performed, even though the testing tool can
provide some indication of how an error occurred. Therefore, it is important
to minimize false positives by increasing the robustness of the tests. The
same issue was reported by E. Alégroth et al. [3].
8.1.6 Test automation
The results from the first cycle interviews show two reasons why Vizlib wanted
to introduce test automation. The first was to increase test coverage, as they
do not have time to test everything for every release, nor is it feasible to
exhaustively test everything manually. The second was to automate repeatable
tasks to become more effective and to direct the testers' focus away from
monotonous and often boring tasks, minimizing human error. None of the
interviewees believed that VRT could replace manual testing; rather, they saw
it as a helpful complement to manual testing. This result is supported by the
following studies: [1], [3], [8], [9], [19]. These studies argue that
automated or visual tests cannot replace manual testing practices because
automated tests can only detect failures that are explicitly asserted.
Therefore, practices such as manual exploratory testing are needed to
complement the test automation scripts. E. Alégroth et al. [19] and M. Jovic
et al. [9] argue that this is a common misconception in industry, where the
expectation leans towards test automation performing all tasks that humans
can. The interviewees mainly argued that VRT cannot replace manual testing
because it cannot determine whether an application "looks nice", i.e. how
colours match each other or the overall feel of the application; human
intelligence is required for these tasks. Two testers mentioned a test case,
checking the data produced when exporting data from one extension, that they
have not yet found a good way to automate, indicating that some test cases are
difficult to automate. Technical debt increases in the form of test
maintenance and test execution time as the number of test cases grows, which
is a factor to consider when selecting test cases. This applies especially to
VRT at Vizlib, as one interviewee mentioned that execution times tend to be
longer with this testing approach than with other techniques. Additionally,
the more test cases Vizlib holds, the more maintenance is likely required upon
updating the version of Qlik, as discussed in Sections 8.1.1 and 8.1.2.
Lastly, one interviewee mentioned the benefit of using VRT as an ultimate
check, ensuring the application works as it should. The interviewee further
described that this is difficult to achieve with other testing approaches such
as functional testing. This is a benefit of VRT: it captures the end-user
perspective.
8.1.7 Development process
The results show that it takes between half an hour and one hour for the
testers to perform a smoke test, while testing a feature can take up to 6
hours. All the interviewees from the second cycle believed this process is now
more effective, both in time and efficiency. However, they mentioned that
their workload has increased because of the implementation effort of writing
tests, but believe this will decrease over time, as discussed in Section
8.1.2. That the execution time of the tests would decrease and the testing
frequency would increase was an expected result, as discussed in Section
8.1.4. However, the interesting result is the total effort required from a
long-term perspective when transitioning to ATE of VRT. The study period for
this thesis is too short to draw any conclusions in that regard, but the
results indicate potential for positive long-term outcomes.
Furthermore, the interviewees mentioned the benefit of finding previously
unobserved issues while writing tests. One interviewee described this as a
benefit of being quality conscious: the task of writing a test increases
quality awareness and works as a second manual test. S. Berner et al. [9]
discuss that during a test automation effort, 60-80 % of all bugs are found
during the development of the tests. A further investigation would be required
to confirm the proportion of bugs detected; however, the results indicate an
increase in quality consciousness.
The metrics collected during Cycle 3 show that it can take up to 1-2 weeks
before the developers receive feedback from the testers on whether their
changes are approved. The results additionally show a problem with issues
bouncing back and forth between the testers and developers, causing an
overhead of testing bad artifacts from the testers' point of view, and context
switching and revisiting old issues from the developers' point of view. As
discussed in Section 8.1.4, the developers reported quicker feedback, which
could tackle this issue. However, additional data is required to draw any
conclusions.
8.1.8 Organization
The results show that the testers' views on which tasks were the most tedious
were aligned. All the testers mentioned that monotonous or repetitive testing
is the most tedious. Additionally, one tester mentioned that coping with
server uptime tends to become an irritating overhead. The results were also
aligned when the testers were asked which tasks they found the most enjoyable:
exploratory, challenging or complex testing was considered the most
stimulating. As discussed in Section 8.1.6, one of the incentives for
introducing VRT was to automate repetitive tasks, matching the tedious testing
tasks described by the testers. Additionally, the results show that regression
bugs are a challenge that Vizlib is facing. V. Garousi et al. [1] argue that
smoke tests, large numbers of similar tests, and frequent regression tests,
among other factors, are beneficial areas for introducing test automation,
which matches what Vizlib is striving for. Furthermore, V. Garousi et al. [1]
list non-beneficial factors, such as tight integration with 3rd party
software; this has been observed in this study and is discussed in Section
8.1.1. The authors further argue that it is important for an organization to
hold adequate knowledge and competence to succeed in introducing test
automation. Judging by the collected metrics, one could argue this is the case
for Vizlib. However, one interviewee reported difficulty in understanding the
test results produced by the CI system, suggesting that some form of education
plan should be in place. This is an important factor to incorporate, ensuring
all actors can utilize the system and maximize its benefits.
Furthermore, two interviewees mentioned that tests are not always necessarily
helpful; it depends on the issue at hand. The collected metrics show that all
the developers felt it was difficult to develop new features without risking
the introduction of new bugs, because of the code complexity. This indicates
that, beyond testing, other good software engineering practices are important
for achieving products with low defect levels. Considering the various aspects
mentioned above, Vizlib appears to have been a good candidate for introducing
test automation. However, this set of factors may vary in other organizations
and environments; they should therefore be weighed carefully in the decision
process of introducing test automation, as the end results may differ because
of them.
8.1.9 Unstructured Observations
The observations made throughout the cycles indicated factors that had to be
considered and that in turn influenced the implementation design. Some of
these factors are context-based, such as the tight coupling the extensions
have with Qlik. However, it is not unlikely for another organization to face
similar types of dependencies, making these factors important to consider when
deciding whether to introduce VRT, even if the context-based variables are not
the same. Additionally, some general factors may not have been observed in
this thesis because of the context-based variables at Vizlib; i.e. because of
the tight coupling, other factors were not relevant for this type of setting
and therefore not observed.
8.1.9.1 Testing tools
The first testing tool observation, L1, indicates that it is important to
ensure the testing tool fits and performs according to the sought-after needs.
This supports the observations made by Garousi et al. [1] and S. Berner et al.
[9]. Had no solution been found, it could have jeopardized the success of the
implementation, making this important to investigate. Even if other solutions
to this limitation exist, such as implementing the resource allocation inside
the CI platform, such a solution may have been too complex to pursue.
The second testing tool observation, C1, concerned maintaining the image
baseline. The standard solution offered differs depending on which testing
tool is used; regardless of tool, a strategy for maintaining the baseline is
required. The strategy used at Vizlib, as described in the results, was to
upload the images to a 3rd party storage provider and download them before
execution. Factors such as which Git strategy the organization uses may
influence the choice of strategy. For example, a drawback of the strategy used
at Vizlib is that a feature branch containing visual updates will fail the
affected test cases until the image baseline is updated. Additionally, upon
updating the image baseline, all other branches that do not contain the new
changes will fail until they have been updated with the latest code changes.
To the author's knowledge, no research papers exist investigating strategies
for maintaining image baselines for GUI testing.
The third testing tool observation, C4, concerned generating the image
baseline from the same operating system as the one the test runner later
executed from. This observation also concerned the fact that different
resolutions of the baseline images may affect the test results. This indicates
that a strategy for determining how the image baseline should be generated is
needed, as it affects the reliability of the test results. The strategy used
at Vizlib was to use the CI system to generate the image baseline, ensuring
the images were always produced on the same operating system as the one the
test runner executed from. However, this strategy may have implications,
because the operating system on a local computer may differ from the one used
on the CI platform, which may become troublesome when, for example, developing
tests. To resolve this issue, the implementation only downloads the image
baseline when the test runner executes from the CI platform. If no baseline
images are present when the test runner executes, it automatically saves and
later refers to the images taken during that test execution. However, a
strategy must still be utilized for generating the baseline images the CI
platform should use.
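The decision logic described above can be summarized in a small sketch. The function and the return labels are illustrative assumptions, not the implementation's actual names.

```python
def resolve_baseline(on_ci, local_baseline_exists):
    """Decide which baseline a test run compares against:
    - on CI: always download the shared baseline,
    - locally with a baseline already present: reuse it,
    - locally with none present: save this run's screenshots as the
      baseline and compare against them in later runs."""
    if on_ci:
        return "download-shared"
    if local_baseline_exists:
        return "use-local"
    return "generate-from-current-run"
```

This keeps CI runs strictly pinned to the shared, CI-generated baseline while letting developers iterate on tests locally without fighting OS rendering differences.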
8.1.9.2 Tested System
The first observation concerning the tested system, C2, concerned uploading
the developed software to the test server. This factor is only relevant if the
SUT must be hosted on an independent server. However, to introduce a CI
solution, it is important to ensure that the tested software can be uploaded
to the independent server in an automated manner; otherwise, it would not be
possible to test the code changes made.
The second observation concerning the tested system, P1, concerned
authorization problems when using dedicated testing accounts for the CI
solution. This problem was first noticed during the second cycle, when
implementing the test execution inside the CI environment. Such problems can
be difficult to anticipate prior to an implementation and may have severe
implications. The lesson to be learned from this observation is that a locally
working solution will likely require configuration of various components in
the system to function in a CI environment. This factor should be accounted
for when considering the implementation effort.
8.1.9.3 Continuous Integration platform
Only one observation was made concerning the CI platform, L2. The limitation
observed was that a sought-after configuration, triggering different workflows
based on pull request and commit events, was not achievable. The solution was
to trigger the same workflow for every commit. The consequences of this
observation were mild but could have had a more drastic effect. When selecting
a CI solution, it is therefore important to investigate whether the solution
satisfies the sought-after needs.
8.1.9.4 Support Software
Only one observation was made concerning the support software, C3. The
challenge observed was that the build destination used by the build script was
not standardized. A benefit of introducing test automation is that the
solution can be reused across different projects consisting of the same type
of software. However, some components may need to be standardized in order to
reuse a CI configuration and experience this benefit, which is a factor to
consider.
8.2 Method
This section discusses the method's threats to validity based upon the
guidelines presented by P. Runeson [23].
8.2.1 Construct validity
The aim of the study is to investigate the effects of introducing VRT to a CI platform in an industrial environment. The data collected from the interviews is aimed at capturing the practitioners' experience when transitioning from a fully manual testing suite to a semi-automated testing suite with VRT. In this context, a semi-automated testing suite means that VRT will not replace all manual testing practices, but will replace some of them and additionally add new practices. In order to increase the comprehension of the interview questions posed, a pilot interview was held before each type of interview, minimizing the risk of potential misinterpretations. The data collected from the observations is aimed at capturing decisions the author had to make during the progression of the implementation for the system.
8.2.2 Internal validity & Reliability
The conclusions drawn in this thesis were acquired by careful triangulation of the data collected from the various methods used. The primary data source in this study is the data collected from the interview sessions. A threat to the validity of this data source is how the respondents answered the posed questions, as the respondents knew their answers would be reflected in this study, adding the
risk that the interviewees would not answer truthfully. Two methods have been used to mitigate this risk. The first has been to completely anonymize the interviewees' answers, ensuring they cannot be traced back to any individual and giving the interviewees space to answer truthfully. The second has been triangulation: finding support for claims in other data sources, increasing the likelihood of a statement being true.
This thesis investigates a topic in a complex environment dependent on multiple factors, such as the developed product, the selected frameworks and tools, and the organization's competence and commitment to the adoption, to name a few. These factors could have a large impact on the success and results of the study. However, since the implementation is considered a success by the organization, the likelihood of these factors impacting the results negatively is considered low. Regardless of the success of the implementation, these factors would have clearly shown their impact in the results if they had had an apparent influence.
The risk of bias introduced by the author is an additional factor that needs to be discussed. When presenting the results, the author could have selected only statements that yielded beneficial results, or taken statements out of context, obscuring what was meant by a statement. The first measure taken to mitigate this risk has been to present the raw data from the transcribed material. The second measure has been a grounded theory approach: coding the data during analysis, linking statements to specific codewords and analysing the codewords independently, reducing the author's bias. However, the coding process itself automatically introduces some form of bias, as the author must do some form of interpretation in order to code the data. This has been done to the best of the author's ability, and the data produced from the coding process, together with the transcribed material, is made available upon request.
Lastly, the results in this thesis are based solely upon qualitative data. Future work with additional quantitative data is required to provide stronger support for the claims made in this thesis.
8.2.3 External validity
The largest threat to validity is that the study was conducted at only one company, meaning the results may have low external validity for other companies and environments [23]. This affects the generalizability and replicability of the results as
they may be dependent on various context-based variables. However, similar studies have been carried out using similar methods, technologies and contexts. Similarities among these studies and their results have been observed, indicating some form of replicability of the results and applicability of the conclusions in other contexts and companies.
Furthermore, distinguishing what is caused by introducing VRT, as opposed to introducing some other testing technique with test automation, is difficult. Therefore, the effects of introducing VRT to a context where test automation is already in place, compared to transitioning from a completely manual testing suite, may not be as apparent, as some benefits may already be experienced. However, since VRT is a subset of test automation, general test automation benefits can also be experienced with VRT.
8.3 Work in a wider context
It is important to be aware of, and to raise awareness among the various stakeholders of, the environmental impact of developing and consuming software. When introducing well-utilized practices such as CI, the organisation may become more effective [40]. This could in turn mean that the company would not need to employ as many people, reducing the organisation's carbon footprint. However, it is not common today to take into account the overall efficiency or energy consumption involved in developing, testing and maintaining software [41]. It is therefore an important factor to consider when introducing CI practices: optimizing which jobs are run in the CI pipeline reduces redundant builds and test runs, consequently reducing the energy impact and making software development more efficient.
Chapter 9 Conclusion
This thesis aimed to investigate the research questions stated in Section 1.3. It did so by providing an implementation of VRT in a CI environment. The implementation was divided into three cycles with different purposes, to capture the perspectives of the various stakeholders. Additionally, the author performed an informal literature review and collected his own observations throughout each cycle. This was done in order to gain better insight into the implications raised by the implementation and to be able to triangulate findings. Based upon these findings, the following can be concluded regarding each research question:
RQ1: What practical benefits and drawbacks are associated with introducing
visual regression testing to an industrial CI environment?
Benefits:
- A clear definition, with images, of how the application/system is expected to look.
- VRT is a pessimistic technique and is therefore very unlikely to report false negatives.
- Potential to decrease the total testing effort over time.
- VRT interacts with, and captures, the end-user perspective.
- There exist VRT tools that are easy to use, even without a developer background.
- There exist tools with an easy initial setup.
- There exist tools that offer videos of test runs, making them easier to debug.
- Upon failed test runs, side-by-side images are produced with the differing parts of the image highlighted.
- Frequent revision of how the application is expected to look.
- Enables replacing repetitive manual testing tasks, reducing human error and freeing focus for more enjoyable tasks.
- Increased efficiency when performing smoke tests compared to performing them completely manually.
- Increased quality consciousness; the practice of writing tests acts as an additional testing step.
- Offers wider test coverage that can possibly spot bugs that have not been detected before.
- Increased testing frequency, reducing feedback time.
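To make the pessimistic nature of VRT concrete, the sketch below shows the kind of pixel-level comparison that underlies VRT tools. It is an illustration under assumptions (same-sized images given as flat RGBA byte arrays), not the comparison algorithm of any specific tool: any pixel whose channels differ by more than a tolerance is reported, so visual changes are flagged eagerly rather than missed.

```javascript
// Minimal sketch of VRT-style image comparison: two equally sized images as
// flat RGBA arrays; returns the indices of pixels that differ by more than
// `tolerance` in any channel. Real tools add perceptual metrics and diff
// image rendering on top of this idea.
function diffPixels(baseline, current, tolerance = 0) {
  if (baseline.length !== current.length) {
    throw new Error('Images must have the same dimensions');
  }
  const changed = [];
  for (let i = 0; i < baseline.length; i += 4) {
    for (let c = 0; c < 4; c += 1) {
      if (Math.abs(baseline[i + c] - current[i + c]) > tolerance) {
        changed.push(i / 4); // pixel index
        break;
      }
    }
  }
  return changed;
}

module.exports = { diffPixels };
```

Because any above-tolerance difference is reported, a test only passes when the rendered output matches the baseline; this is why false negatives are unlikely, at the cost of false positives from rendering noise.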
Drawbacks:
- When the developed software is tightly coupled with 3rd-party software, test suites risk breaking upon updates from the 3rd-party software provider, leading to test maintenance.
- Maintaining an appropriate strategy for keeping the image baseline of the VRT tests up to date.
- The challenge of dealing with animations, latency and load times during the implementation of test suites.
- Increased work effort when introducing the implementation.
- A root cause analysis still needs to be performed upon failed test runs.
- Test execution time tends to be longer compared to other techniques such as unit testing.
- A strategy is needed for handling false positive test results.
- Cannot replace manual testing practices completely; it remains a complement to manual testing.
RQ2: Which factors should be considered when implementing VRT in a CI environment?
Factors:
- If possible, keep the test execution and the CI solution in a local network to decrease latency and load time issues.
- Early investigation of whether the testing tool supports the functional and performance needs.
- Early investigation of whether the CI platform supports the functional needs of the intended testing implementation.
- Investigation of the developed software's dependencies on 3rd-party software: how often do their updates occur, and how severe are the implications of their updates?
- A strategy for how the image baseline should be updated and maintained.
- If the application contains animations, an investigation of how the VRT test suites could be affected.
- Ensure a version of the developed software can be uploaded to the SUT.
- If the developed software depends on a 3rd-party SaaS provider, a strategy must be investigated for handling licenses for testing accounts.
- Ensure the organization holds enough knowledge and competence around the testing techniques and tools used.
- The reliability of the test results is important; ensuring the robustness of the test suites is therefore crucial.
- Introduce automation only for tests that are considered repetitive tasks.
- Test cases should be carefully selected in order to keep the execution time low, as VRT tends to have a longer execution time than other testing techniques.
- An investigation of whether the CI platform supports parallel test execution, if execution speed is crucial.
- An education plan for all the various actors in the system, ensuring that all actors know how to draw knowledge from the system and maximize its benefits.
- A strategy must exist for generating the baseline images, as these images are affected by both resolution and operating system.
- An investigation of which browsers are supported by the CI platform in order to enable cross-browser testing, if this trait is important.
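As a small illustration of the baseline-related factors above (an assumed naming scheme, not the strategy used in the thesis), baseline images can be keyed by operating system, browser and resolution so that runs in different environments never compare against, or overwrite, each other's baselines:

```javascript
// Hypothetical sketch: derive a baseline image file name from the test name
// and the environment it was captured in, so each OS/browser/resolution
// combination keeps its own baseline.
function baselineName(testName, { os, browser, width, height }) {
  return `${testName}.${os}.${browser}.${width}x${height}.png`;
}

module.exports = { baselineName };
```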
9.1 Future work
Based upon the results and the discussion in this thesis, it would be interesting to further investigate strategies for maintaining an image baseline. To the author's knowledge, there seems to be a lack of research papers within the area. Additionally, future work involving a stricter comparison of when it is more beneficial to select a VRT tool over a VGT tool, or vice versa, would be interesting, as these tools have different attributes.
Lastly, similar work in different industrial contexts and environments, with support from quantitative data, is needed to further support the conclusions in this thesis.
References
[1] V. Garousi and M. V. Mäntylä, “When and What to Automate in Software Testing? A Multi-vocal Literature Review,” Inf. Softw. Technol., vol. 76, no. C, pp. 92–117, Aug. 2016, doi: 10.1016/j.infsof.2016.04.015.
[2] K. Stobie, “Too much automation or not enough? When to automate
testing.,” in Pacific NW Software Quality Conference, 2009.
[3] E. Alégroth, A. Karlsson, and A. Radway, “Continuous Integration
and Visual GUI Testing: Benefits and Drawbacks in Industrial Practice,” in
2018 IEEE 11th International Conference on Software Testing, Verification
and Validation (ICST), 2018, pp. 172–181, doi: 10.1109/ICST.2018.00026.
[4] Qlik, “Qlik Sense.” [Online]. Available:
https://www.qlik.com/us/products/qlik-sense. [Accessed: 20-Feb-2020].
[5] M. Leotta, D. Clerissi, F. Ricca, and P. Tonella, Advances in
Computers, vol. 101. Elsevier, 2016.
[6] B. A. Kitchenham et al., “Preliminary guidelines for empirical
research in software engineering,” IEEE Trans. Softw. Eng., vol. 28, no. 8, pp.
721–734, Aug. 2002, doi: 10.1109/TSE.2002.1027796.
[7] R. Miller and C. Collins, “Acceptance Testing,” 2002, doi:
10.1007/978-1-4419-6488-5_14.
[8] E. Alegroth, R. Feldt, and H. Olsson, “Transitioning Manual System
Test Suites to Automated Testing: An Industrial Case Study,” in Proceedings -
IEEE 6th International Conference on Software Testing, Verification and
Validation, ICST 2013, 2013, pp. 56–65, doi: 10.1109/ICST.2013.14.
[9] S. Berner, R. Weber, and R. K. Keller, “Observations and Lessons
Learned from Automated Testing,” in Proceedings of the 27th International
Conference on Software Engineering, 2005, pp. 571–579, doi:
10.1145/1062455.1062556.
[10] T. L. Graves, M. J. Harrold, J.-M. Kim, A. Porter, and G.
Rothermel, “An Empirical Study of Regression Test Selection Techniques,”
ACM Trans. Softw. Eng. Methodol., vol. 10, no. 2, pp. 184–208, Apr. 2001,
doi: 10.1145/367008.367020.
[11] E. Borjesson and R. Feldt, “Automated System Testing Using Visual
GUI Testing Tools: A Comparative Study in Industry,” in 2012 IEEE Fifth
International Conference on Software Testing, Verification and Validation,
2012, pp. 350–359, doi: 10.1109/ICST.2012.115.
[12] T.-H. Chang, T. Yeh, and R. C. Miller, “GUI Testing Using
Computer Vision,” in Proceedings of the SIGCHI Conference on Human
Factors in Computing Systems, 2010, pp. 1535–1544, doi:
10.1145/1753326.1753555.
[13] P. Li, T. Huynh, M. Reformat, and J. Miller, “A practical approach
to testing GUI systems,” Empir. Softw. Eng., vol. 12, no. 4, pp. 331–357, Aug.
2007, doi: 10.1007/s10664-006-9031-3.
[14] I. Banerjee, B. Nguyen, V. Garousi, and A. Memon, “Graphical User
Interface (GUI) Testing: Systematic Mapping and Repository,” Inf. Softw.
Technol., vol. 55, no. 10, pp. 1679–1694, 2013, doi:
10.1016/j.infsof.2013.03.004.
[15] M. Jovic, A. Adamoli, D. Zaparanuks, and M. Hauswirth,
“Automating Performance Testing of Interactive Java Applications,” in
Proceedings of the 5th Workshop on Automation of Software Test, 2010, pp. 8–
15, doi: 10.1145/1808266.1808268.
[16] J. Andersson and G. Bache, “The Video Store Revisited Yet Again:
Adventures in GUI Acceptance Testing,” in Extreme Programming and Agile
Processes in Software Engineering, 2004, pp. 1–10.
[17] A. Adamoli, D. Zaparanuks, M. Jovic, and M. Hauswirth,
“Automated GUI Performance Testing,” Softw. Qual. J., vol. 19, no. 4, pp.
801–839, Dec. 2011, doi: 10.1007/s11219-011-9135-x.
[18] M. Grechanik, Q. Xie, and C. Fu, “Creating GUI Testing Tools
Using Accessibility Technologies,” in 2009 International Conference on
Software Testing, Verification, and Validation Workshops, 2009, pp. 243–250,
doi: 10.1109/ICSTW.2009.31.
[19] E. Alegroth and R. Feldt, “On the long-term use of visual GUI testing in industrial practice: a case study,” Empir. Softw. Eng., 2017, doi: 10.1007/s10664-016-9497-6.
[20] O. Taipale, J. Kasurinen, K. Karhu, and K. Smolander, “Trade-off
between automated and manual software testing,” Int. J. Syst. Assur. Eng.
Manag., vol. 2, no. 2, pp. 114–125, Jun. 2011, doi: 10.1007/s13198-011-0065-6.
[21] D. M. Rafi, K. R. K. Moses, K. Petersen, and M. V. Mäntylä, “Benefits and Limitations of Automated Software Testing: Systematic Literature Review and Practitioner Survey,” in Proceedings of the 7th International Workshop on Automation of Software Test, 2012, pp. 36–42.
[22] T. C. Lethbridge, S. E. Sim, and J. Singer, “Studying Software
Engineers: Data Collection Techniques for Software Field Studies,” Empir.
Softw. Eng., vol. 10, no. 3, pp. 311–341, Jul. 2005, doi: 10.1007/s10664-005-
1290-x.
[23] P. Runeson and M. Höst, “Guidelines for conducting and reporting
case study research in software engineering,” Empir. Softw. Eng., vol. 14, no.
2, p. 131, Dec. 2008, doi: 10.1007/s10664-008-9102-8.
[24] A. Collins, D. Joseph, and K. Bielaczyc, “Design Research:
Theoretical and Methodological Issues,” J. Learn. Sci., vol. 13, no. 1, pp. 15–
42, 2004, doi: 10.1207/s15327809jls1301_2.
[25] K.-J. Stol, P. Ralph, and B. Fitzgerald, “Grounded Theory in
Software Engineering Research: A Critical Review and Guidelines,” 2016, doi:
10.1145/2884781.2884833.
[26] Qlik, “Qlik Pricing.” [Online]. Available:
https://www.qlik.com/us/pricing. [Accessed: 20-Feb-2020].
[27] “GulpJS.” [Online]. Available: https://gulpjs.com/. [Accessed: 27-
Feb-2020].
[28] “NestJS.” [Online]. Available: https://docs.nestjs.com/. [Accessed:
27-Feb-2020].
[29] “NodeJS.” [Online]. Available: https://nodejs.org/en/about/.
[Accessed: 27-Feb-2020].
[30] “CircleCI.” [Online]. Available: https://circleci.com/docs/2.0/about-
circleci/. [Accessed: 27-Feb-2020].
[31] “Introduction to JSON Web Tokens.” [Online]. Available:
https://jwt.io/introduction/. [Accessed: 27-Feb-2020].
[32] “Why Cypress?” [Online]. Available:
https://docs.cypress.io/guides/overview/why-cypress.html#In-a-nutshell.
[Accessed: 27-Feb-2020].
[33] “Cross Browser Testing.” [Online]. Available: https://docs.cypress.io/guides/guides/cross-browser-testing.html#Continuous-Integration-Strategies. [Accessed: 27-Feb-2020].
[34] “Custom Providers.” [Online]. Available: https://docs.nestjs.com/fundamentals/custom-providers. [Accessed: 28-Mar-2020].
[35] A. L. Alwardt, N. Mikeska, R. J. Pandorf, and P. R. Tarpley, “A lean
approach to designing for software testability,” in 2009 IEEE
AUTOTESTCON, 2009, pp. 178–183.
[36] “Deploy and run apps on today’s most innovative Platform as a
Service.” [Online]. Available: https://www.heroku.com/platform. [Accessed:
16-Mar-2020].
[37] “Dyno Types.” [Online]. Available:
https://devcenter.heroku.com/articles/dyno-types. [Accessed: 16-Mar-2020].
[38] “Stacks.” [Online]. Available:
https://devcenter.heroku.com/articles/stack. [Accessed: 16-Mar-2020].
[39] “Build fast. Start for free.” [Online]. Available:
https://circleci.com/pricing/. [Accessed: 16-Mar-2020].
[40] S. Dösinger, R. Mordinyi, and S. Biffl, “Communicating continuous
integration servers for increasing effectiveness of automated testing,” in 2012
Proceedings of the 27th IEEE/ACM International Conference on Automated
Software Engineering, 2012, pp. 374–377.
[41] J. Drangmeister, E. Kern, M. Hirsch-Dick, S. Naumann, G.
Sparmann, and A. Guldner, “Greening Software with Continuous Energy
Efficiency Measurement,” in GI-Jahrestagung, 2013.
Appendix A: Interview Questions Cycle 1
Background
1. How long have you been working in the Software industry?
2. What is your prior experience with testing techniques? What was your
experience/attitude with/towards it?
3. Have you ever been involved in a company or project that utilizes VRT? If
so, what was your experience with it?
Automated testing VRT
4. What are the main drivers behind the incentive for test automation?
5. Considering VRT, what are the associated main benefits achieved by
introducing test automation?
6. What are the associated drawbacks with introducing test automation, more
specifically with VRT?
7. Is VRT a complement or an alternative to manual testing? If a complement: what does manual testing contribute compared to the automated test execution with VRT? If an alternative: explain why.
Perceived quality
8. If you transition to automatic testing, do you think your released product
will obtain a higher overall quality? High product quality is defined as a low
defect level in the product.
9. Do you think automatic testing will affect your confidence in a working
product after a release?
Processes
10. How does the process work, from when a bug or feature is reported until the code changes exist in a production environment? (Describe the steps.)
11. How can automated test execution affect the process described earlier?
Why/Why not?
12. Will automated test execution increase or decrease the feedback time if a
failure or defect is found, more specifically VRT?
13. Will automated test execution decrease or increase the total effort (in time)
of testing?
Appendix B: Interview Questions Cycle 2
Background
1. How long have you been working as a software tester?
2. What’s your prior experience with software testing tools?
3. What’s your prior experience with VRT testing tools?
4. What are your assignments as a software tester?
Software Testing
5. What are the most time-consuming tasks when working as a software tester
at Vizlib?
6. As a software tester, what tasks do you find most enjoyable?
7. What tasks or issues do you find tedious as a software tester?
8. Is VGT a complement or an alternative to manual testing? If a complement: what does manual testing contribute compared to the automated test execution with VRT? If an alternative: explain why.
Experience gained from VGT
9. What challenges did you experience when writing tests for Cypress?
10. How much effort did it require before you were able to produce tests for
Cypress? How long did it take before you learned the tool?
11. How long would you say it takes for you to produce a test with Cypress?
12. Do you believe your workload will increase or decrease with Cypress?
13. Have there been any test cases that were difficult to test? If so, what was the issue?
Perceived benefits & Drawbacks
14. How reliable do you feel the test results are after executing a test suite with
Cypress?
15. What do you believe are the benefits, and which have you experienced since the transition to Cypress? Why?
16. What do you believe are the drawbacks, and which have you experienced since the transition to Cypress? Why?
Appendix C: Interview Questions Cycle 3
Background
1. How long have you been working as a software developer?
2. How long have you been working for Vizlib?
Perceived issues
3. How much of your time is spent on maintaining the quality of the products rather than developing new features?
4. How long does it usually take before you receive feedback on whether your
changes are approved or not?
5. How would you consider the code complexity of the extensions?
6. Considering the complexity of the extensions, do you find it difficult to
implement new features without possibly introducing new bugs? Are the
extensions bug-prone?
Perceived Benefits & Drawbacks
7. Do you believe VRT testing techniques are able to detect bugs you usually
deal with?
8. From your perspective, have you experienced any benefits since the
transition to the CI solution? Elaborate.
9. Have you experienced any drawbacks from the CI solution? Elaborate.
Statement of Originality and Letter of Authorization
Statement of Originality
I solemnly declare that the thesis submitted here, 《中文题目 English Title》, is the result of independent research work that I carried out under the guidance of my supervisor while studying for a degree at Harbin Institute of Technology. Except for the parts with marked citations, the thesis contains no research results completed or published by others. All individuals and collectives that made important contributions to the research work of this thesis have been clearly acknowledged in the text.
Author's signature: Date:
Letter of Authorization
The thesis is a result completed by the graduate student while studying for a degree at Harbin Institute of Technology, and its intellectual property belongs to Harbin Institute of Technology. The usage rights of the thesis are as follows: (1) the university may preserve the submitted thesis by photocopying, reduced-size printing or other means of reproduction, and submit it to the National Library; (2) the university may include part or all of the thesis in relevant databases for retrieval and provide corresponding reading services; (3) when publishing academic papers or other results related to the research of this thesis after graduation, the graduate student shall obtain the supervisor's consent, and the first-named affiliation shall be Harbin Institute of Technology. Confidential theses shall follow the relevant confidentiality regulations during the confidentiality period; these usage rights apply after declassification.
I am aware of the usage rights of the thesis and will comply with the relevant regulations.
Author's signature: Date:
Supervisor's signature: Date:
Acknowledgement
I would like to thank everyone at Vizlib for supporting me in my work: assisting me with technical issues, bouncing around and brainstorming ideas, and their overall involvement in the study. A special thanks to David Alcobero, who made this thesis possible. I would also like to express my gratitude to everyone involved in the interviews, as they are the backbone of this thesis. Furthermore, I'd like to thank Lena Buffoni and John Tinnerholm for their supervision. Additionally, I would like to thank my opponent Oscar Andell for his feedback, Andreas Lundquist for our mutual support, and Tong Zhang for translating my abstract to Chinese. Lastly, I would like to thank friends and loved ones for their support.
Confirmation of Supervisors
HIT Supervisor
Signature: Date:
LiU Supervisor
Signature: Date:
Internship Supervisor
Signature: Date: