
Browz: Automatic Visual Regression Testing using containers for Web Apps running on Chrome and Firefox

Authors: Sergio GUZMÁN-MAYORGA, Julián MANRIQUE-PUERTO
Advisor: Mario LINARES-VÁSQUEZ

A thesis submitted in fulfillment of the requirements for the degree of Systems and Computation Engineering

Systems and Computing Engineering Department

July 27, 2020


“You can’t connect the dots looking forward; you can only connect them looking backwards. So you have to trust that the dots will somehow connect in your future. You have to trust in something - your gut, destiny, life, karma, whatever.”

Steve Jobs


Abstract

Browz: Automatic Visual Regression Testing using containers for Web Apps running on Chrome and Firefox

Browz is an open source JavaScript solution that explores a given web app inside a Docker container and detects differences across browsers with visual regression testing (snapshot comparisons). With it, any software developer can analyze and visualize the results and decide how to improve the given app.


Contents

Abstract

List of Figures

List of Tables

1 Introduction
  1.1 Problem Statement
  1.2 Thesis Goals
  1.3 Thesis Contribution
  1.4 Document Structure

2 Context
  2.1 XBI
  2.2 Visual Regression Testing
  2.3 Docker Container
  2.4 Regular Web Application Export

3 Related work
  3.1 Proprietary Solutions
    3.1.1 BrowserStack [18]
    3.1.2 LambdaTest [17]
  3.2 Crowd Sourced Testing
  3.3 Open Source Software
    3.3.1 Galen [8]
    3.3.2 BackstopJS [9]
  3.4 Scientific Research
    3.4.1 Visual Regression Testing
    3.4.2 Code Analysis

4 Solution Design
  4.1 Architecture Diagrams
  4.2 Code Layout: Container and Host
  4.3 Visual Regression Implementation
    4.3.1 Snapshot Processor Protocol
    4.3.2 Image Comparison with ResembleJS
  4.4 Client Instantiation: Handlers
  4.5 Report
    4.5.1 Report Generation
    4.5.2 Visualizer
  4.6 Comparison with Other Solutions

5 Deployment and Documentation
  5.1 Running From Source Code
  5.2 NPM Deployment

6 Conclusion

7 Future Work

Bibliography


List of Figures

1.1 Session cookies support MDN [15]
2.1 React JS Build example
3.1 X-Check architecture [28]
4.1 Deployment Diagram
4.2 General Sequence Diagram
4.3 Container Sequence Diagram
4.4 Browz Code Layout
4.5 Container Code Organization
4.6 Resemble.js Generated Metadata
4.7 ResembleJS Default Input JSON
4.8 Run Details on Visualizer
4.9 Run Details Logs on Visualizer
4.10 Event Details Firefox
4.11 Event Details Side By Side
4.12 Event Details Comparison
4.13 Event Details Comparison with Huge Difference


List of Tables

4.1 State of the art feature comparison against proprietary solutions
4.2 State of the art feature comparison against open source solutions
4.3 State of the art feature comparison against academic research


Chapter 1

Introduction

1.1 Problem Statement

The development process of a web application is not an easy task. Besides the costs related to learning curves, task division, implementation and others, the product must look and function well on any device and screen size. Because of this, web development is greatly fragmented: there is a huge number of possible combinations of software and hardware on which a web application may run. Here lies the problem for web developers: there are no open source tools that allow testing how the same web application behaves on different versions and types of browsers and operating systems. This fragmentation has several concrete effects, such as:

  • Functionalities not fully supported on all browsers: for example, session cookies have all their features supported on Mozilla Firefox, but some of them are missing on both Google Chrome and Microsoft Edge, as shown in Figure 1.1.

  • Differences in visual properties: among the most common pitfalls in this area is the management of CSS vendor prefixes, which browsers introduced to allow more expressiveness in their CSS engines without breaking the standard. For example, this is how one would define a gradient background for Safari (WebKit) and Mozilla Firefox back when the property required vendor-specific syntax:

    background: -moz-linear-gradient(left, green, yellow);                                     /* Mozilla Firefox */
    background: -webkit-gradient(linear, left center, right center, from(green), to(yellow));  /* WebKit */
    background: linear-gradient(to right, green, yellow);                                      /* CSS standard */

  • Differences between versions of the same browser: this is not as frequent as in mobile app development, but there are cases such as the deprecation of APIs. For instance, with the arrival of Chrome 64 at the end of 2017, support was dropped for "chrome.loadTimes()", a non-standard API that collected metrics about data loading and network traffic [11].

  • Implementation differences in cases where the W3C standards do not specify a behavior, for instance error management and return values for invalid inputs to built-in JavaScript APIs.

FIGURE 1.1: Session cookies support on MDN [15]


1.2 Thesis Goals

Consequently, the goal of this thesis is to build a functional prototype that allows web developers to evaluate how their app behaves under the same flow of actions across different browsers and to take decisions based on the results. We focus on supporting automated tests on two specific browsers: Chrome and Firefox.

Hence, the specific objectives of this thesis are:

  • Design an architecture to automatically deploy a web application, test it on different browsers and compare the results.

  • Implement a functional prototype that locally deploys a web application exported in a basic manner and tests it on two types of browsers: Chrome and Firefox. A basic export is defined as a folder that only contains graphical or other media assets and the three types of standard files handled on a web page (HTML, CSS and JS), without requiring server-side rendering (SSR).

  • Publish a first version of the prototype in a public package manager (NPM, for instance) so that any developer can use it as a CLI tool.

1.3 Thesis Contribution

Offering tools that mitigate a development team's lack of time and/or resources for testing their web application on different browsers, versions and/or devices allows:

  • The generation of software components that behave consistently regardless of where they are executed.

  • An improved user experience, since users on a browser or version different from the one the development team tested are less likely to encounter inconsistencies or errors.

  • Increased accessibility of the developed web apps, in terms of use from different devices and screen sizes.

Ultimately, as an alternative to proprietary solutions, we offer Browz, an open source tool [14] that runs a given web application on several browsers and determines whether there are differences between them.


1.4 Document Structure

After this short introduction, we start by defining some basic terms as part of the context, then discuss the related work and the state of the art of Visual Regression Testing. With this general background we present our solution design for the prototype and its deployment and documentation on NPM, so that anyone can use it. Finally, we close with a small set of conclusions and the future work to continue this project.


Chapter 2

Context

2.1 XBI

“Cross-browser incompatibilities (XBIs) are discrepancies between a web application’s appearance, behavior, or both, when the application is run on two different environments. An environment consists of a web browser together with the host operating system.“ [6]. With our solution we aim to give developers an open-source tool to detect structure (layout errors), content (e.g., text appearance and length) and behavior (different actions depending on the browser type) XBIs [6] displayed on the User Interface when using different browsers and screen resolutions.

2.2 Visual Regression Testing

“The regression testing method called Visual Regression Testing checks whether application screens display correctly without any presentation failures. This method uses image comparison tools that detect differences in screen elements, e.g. the disappearance of or change in the position of a button, by comparing images of the correct screen and the target screen by using computer vision techniques.“ [1]. The proposed functional prototype belongs to this field of regression testing, since we aim to run the web app and compare screenshots while performing multiple actions on it.

2.3 Docker Container

Containers have been regarded as a unit of software that can be executed everywhere. For our specific case we chose to package our execution software using Docker. “A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.“ [7] We only need to create a standard image that can be executed in any environment, without worrying about different computer architectures, missing libraries, etc.

2.4 Regular Web Application Export

For this project we only accept regular web application exports, which consist solely of static assets (images, GIFs, videos, audio, animations, resource files, etc.) and standard HTML, CSS and JS files, for example the public build of a React app shown in Figure 2.1.

FIGURE 2.1: React JS Build example.

Currently we do not accept apps with files that require client-side interpretation beyond the basics (frameworks like Vue.js [27] or libraries like React [23], for example) or server-side logic (PHP [21], Nuxt.js [2], etc.). This decision was taken on the basis that all web applications can be translated to the standard web files (HTML, CSS and JS).
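As an illustration only, a regular web app export could look like the following directory tree; the file names are placeholders, not a required layout:

    dist/
      index.html
      css/styles.css
      js/app.js
      assets/logo.png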


Chapter 3

Related work

The state of the art related to this project can be classified into four types of solutions that focus on giving developers tools to test their web applications on different kinds of platforms: proprietary solutions, crowd sourced testing, open source software and scientific research.

3.1 Proprietary Solutions

This is a niche of solutions for testing fragmentation in web applications: BrowserStack, Sauce Labs, LambdaTest, Browserling, CrossBrowserTesting, just to name a few. Below we describe some of them that offer a free trial to evaluate what they provide.

3.1.1 BrowserStack [18]

A platform for executing tests on the cloud. It offers two modes: manual and automatic. The first one consists of virtual machines offered as "real devices" in which you can instantiate a great variety of browsers, versions and operating systems. The latter provisions virtual machines transparently to the user by running their Selenium WebDriver script/test; this grants great flexibility and speed, since as a user you are able to declare all the characteristics of the machine to provision in a single file.

One of its biggest advantages is that it extracts lots of information from the tests: network usage, console logs, photos, videos and even errors. However, among its worst aspects is the instability of the connection with the virtual machines: approximately every 10 seconds the connection is reset, which negatively affects the interaction with the machine.


3.1.2 LambdaTest [17]

Similar to BrowserStack, it is a platform that offers the two functionalities previously described. In comparison, the interaction with the virtual machines is better, but the provisioning process can take longer.

Additionally, it offers three types of visual UI (User Interface) testing. The first one consists of taking a screenshot of the first render of a given URL. The second one compares images and shows the differences between them (similar to Resemble.js [13]), including an interactive slider that lets you see what changed from one image to the other. The last one is a responsive test where you get screenshots of the same page on multiple devices; it ends up being of limited use, since the images cannot be downloaded as a batch and each generated picture has to be saved manually.

3.2 Crowd Sourced Testing

Crowd sourced testing consists of distributing an application for testing on different machines and paying those who are willing to execute tests for such software. Although it is common for other types of software like games and mobile apps, it is not viable for the web, since it has high costs compared to automation tools. Besides, it does not scale as easily as automated tests do: one round of crowd sourced testing covers a specific version of the software, and in order to test another version with corrections another round is required, which consumes time and money. Since web applications change constantly, this option becomes impractical in real life scenarios.

3.3 Open Source Software

3.3.1 Galen [8]

An open source framework for functional and web page layout testing. It offers a syntax to run assertions on the layout of the page, allowing simple and complex tests. The project has been inactive for more than a year (last activity on March 10, 2019). It uses Java and JavaScript, and does not contemplate any type of browser virtualization; it uses Selenium and the WebDrivers of the different browsers to execute the aforementioned assertions.


3.3.2 BackstopJS [9]

An application for visual regression testing of web apps. Its main functionality lies in watching the changes of a webpage over a period of time. It runs exclusively on headless Chrome, but it incorporated Docker virtualization to support tests using different operating systems on the same host. It also allows the use of scripts with Puppeteer [22] or ChromyJS [20] to automate interactions during the visual regression testing.

3.4 Scientific Research

Looking into the literature, there are two main methods that offer developers the opportunity to know how their application looks in different browsers: visual regression testing and code analysis.

3.4.1 Visual Regression Testing

It consists of generating images that show the places or actions where the same application differs when rendered in another browser. Along this line of research you can find tools like WEBDIFF and CrossCheck.

WEBDIFF. The main idea is to collect information from the DOM of each browser and capture a screenshot of the web page at a given time. Then, having a reference browser, all elements that vary when compared to its DOM are marked. Finally, the information is compared using the images, and a list of differing elements is generated [24].

CrossCheck. It combines the best of two different tools: WebDiff (explained above) and CROSST. While WebDiff is very useful for detecting XBIs related to how the screen looks in different browsers, CROSST is used for finding trace-level XBIs; by trace-level we mean detecting differences in the elements of the DOM by performing a static code check. On top of this, CrossCheck introduces machine learning to build a classifier for the visual comparison of elements [5].

3.4.2 Code Analysis

Instead of checking screenshots taken during the execution of a specific user flow, this technique consists of analyzing the client-side code to determine whether a specific action differs in any of a given set of browsers.


X-PERT and X-Check

X-PERT [6] introduces a new method for comparing two browsers running the same web page, called the alignment graph. The main idea is to declare two types of relations between nodes: siblings (HTML nodes on the same level) and parents (a node that directly contains another). Once this graph is created, together with a specification of the position of one node relative to another (above, centered, left, etc.), both structures are compared to check whether the layout of an element is different.

X-Check [28] is another interesting tool, developed later, based on a capture/replay method. Using an automated approach (an automatic web crawler) or a manual one, several commands are run on a reference browser, logged to a central database and then replayed on different browsers. To complement this, a proxy is implemented in front of the web application to store every JavaScript method call and network response, so that runs on different platforms get the same answers for their queries. The main idea is to verify, after every new event, the existence of XBIs by checking JavaScript functions, CSS and HTML (here they use X-PERT). The architecture can be found in Figure 3.1.

FIGURE 3.1: X-Check architecture [28]


X-Diag

Built on top of X-Check, X-Diag [29] is an automated tool that not only identifies when an XBI happens, but also points out the root cause of the inconsistency. In order to do this, it performs a three-step check:

  • Execution of JavaScript functions over the DOM APIs, using Jalangi to monitor their execution.

  • Rendering of CSS properties, searching the external, internal and inline styles, and the default styles of the browser if none of the previous ones generated the inconsistency.

• HTML syntax, checking the different attribute values.


Chapter 4

Solution Design

4.1 Architecture Diagrams

Our solution consists of executing a previously provisioned Docker container image that runs all headless browsers against a given regular web app export. The deployment can be seen in Figure 4.1; as shown there, we have three shared volumes between the container and the host (currently supporting Linux-based operating systems):

• App volume: Directory where the given web app is found.

  • Snapshot Destination Volume: Directory where all screenshots and Resemble.js data will be saved.

  • Configuration Volume: Directory that contains files to configure both host and container runtimes.

The main idea is for the host (which must already have Node JS and NPM [19] installed) to execute this project from a command line terminal; it will in turn start Docker, pull the predefined image if necessary, perform the internal container operations and, after stopping the container, run an HTTP server to show a detailed report of the results found. This general flow of action can be found in Figure 4.2.

The Docker execution receives configuration parameters from the host, related to the maximum memory use, the location of the shared volumes (directories) and the execution command that downloads the latest version of this open source project and runs it (more details in Chapter 5). After this, several actors explore the given web app, as shown in Figure 4.3:


FIGURE 4.1: Deployment Diagram

  • Docker container runtime: Responsible for starting the servers and the exploration process of the regular web app export. It is also responsible for returning a success or failure response to the host.

  • Snapshot processor server: Server responsible for receiving incoming snapshots and logs from the browser instances and comparing them between instances.

  • HTTP Server: Server responsible for hosting the files of the regular web app export given by the end user.

  • Headless browser instances: Instances controlled by a handler that performs specific actions on the app provided by the HTTP Server (more details in Section 4.4).

4.2 Code Layout: Container and Host

First, we have the code layout of our open source repository on GitHub, as shown in Figure 4.4.

In it, we have the following folders:


FIGURE 4.2: General Sequence Diagram

FIGURE 4.3: Container Sequence Diagram


FIGURE 4.4: Browz Code Layout


  • bin: Starting point for the code executed on the host. In it we verify the input parameters (the regular web app export location and the image destination path) and manage the general execution flow (start the container, stop the container and start the report).

  • browser-execution: Folder containing the code that will run inside the container.

  • config: Configuration files used across the entire process. To ensure the same configuration between the host and the container, this folder is mounted as a volume when launching the container.

  • shared: Modules used throughout the entire application, including the retrieval of configuration properties (config.js), the names of the browsers (browsers.js) and the logging of messages (logger.js).

  • src: Directory where the execution code of the host is located. It includes the Docker manager (in charge of the command line operations related to Docker) and the report manager (in charge of generating the report data files and starting the visualizer).

• misc: Other files that do not match any of the above, test scripts for instance.

In order to run the Docker container, we use the child process API from Node [4]. In this regard, the identification of messages and errors is done through system exit codes (0 means the execution was successful, otherwise there was an error) and output channels (output on the error channel generally means an error, except when working with git [10]).
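As a minimal sketch (not the actual Browz implementation), this is roughly how a host process can launch such a container with Node's child process API; the image name, volume paths and memory limit below are illustrative placeholders:

    const { spawn } = require('child_process');

    // Illustrative values only: the real image name, volumes and limits come from the configuration.
    const dockerArgs = [
      'run', '--rm',
      '-m', '2g',                                  // maximum memory for the container
      '-v', '/tmp/app/dist:/app',                  // app volume
      '-v', '/tmp/images:/snapshots',              // snapshot destination volume
      '-v', `${process.cwd()}/config:/config`,     // configuration volume
      'browz-container-image'                      // hypothetical image name
    ];

    const child = spawn('docker', dockerArgs, { stdio: 'inherit' });
    child.on('exit', (code) => {
      // Exit code 0 means the container run was successful; anything else is treated as an error.
      if (code !== 0) {
        console.error(`Container execution failed with exit code ${code}`);
      }
    });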

As for the code that is executed in the container, it is organized as shown in Figure 4.5.

FIGURE 4.5: Container Code Organization

The browser-execution/index.js file serves as an entry point that starts all the processes previously mentioned in the container sequence diagram (Figure 4.3).

With this general context, we can now explain in detail what composes the browser-execution folder in the next sections.

4.3 Visual Regression Implementation

The main goal of visual regression testing is to detect unintended changes in the UI of an app; to this end, we implemented the Snapshot Processor Server, which receives incoming requests from the browser instances so that we can save and compare images, and ultimately detect these XBIs.

4.3.1 Snapshot Processor Protocol

When the server starts, it looks up the active browsers in the configuration and sets a 5 minute initialization timeout for each browser. Later on, after receiving a request from any of these browsers, it resets the timeout for that browser to the value configured by the user (30 seconds by default).

The protocol provided by the Snapshot Processor to make requests is a REST API with two endpoints.

The first one is a POST method to the root of the server (/) that includes the information of the event executed on the browser (event name, event type, browser, timestamp at which the event was sent and a specific id) and two images that correspond to what happened before and after the event.

When all browsers have sent their information for a specific event id, the server begins a comparison process and stores the given information inside the destination directory provided by the user (if none is provided, it defaults to the "runs" folder).


The second endpoint is also a POST method, used to send the logs generated by the browser (hence the resource is /logs); it includes the log type (warning, log, info or error), the browser, a timestamp and the log contents. Each of these logs is appended to an in-memory list and later stored in a separate, dedicated JSON file.

The idea of keeping a separate endpoint for logs is to have a complete track of what happened in each browser. When creating the report, we associate the logs with events by using their timestamps. This way, the final user knows whether an action triggered a log, easing the debugging of cross-browser inconsistencies.
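The following is a minimal sketch of how a two-endpoint protocol like this can be exposed with Express and Multer; the field names, port and handler bodies are illustrative assumptions, not the exact Browz code:

    const express = require('express');
    const multer = require('multer');

    const app = express();
    const upload = multer({ dest: '/tmp/snapshots' });   // illustrative destination

    // POST /: event metadata plus the "before" and "after" snapshots (field names assumed)
    app.post('/', upload.fields([{ name: 'before', maxCount: 1 }, { name: 'after', maxCount: 1 }]), (req, res) => {
      const { eventName, eventType, browser, timestamp, id } = req.body;
      // ... store the file paths per event id and trigger the comparison once all browsers have reported
      res.sendStatus(200);
    });

    // POST /logs: browser console logs, kept in memory and written to a JSON file at the end
    app.post('/logs', express.json(), (req, res) => {
      const { type, browser, timestamp, contents } = req.body;
      // ... append to the in-memory log list
      res.sendStatus(200);
    });

    app.listen(8081);   // the port is an assumption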

4.3.2 Image Comparison with ResembleJS

The image comparison used in this project is not a new algorithm; as a matter of fact, we are using a visual regression analysis tool written in JavaScript: Resemble.js [13].

Every time the snapshot processor receives an image, it automatically saves it with Multer [12]. It also stores the file path in a structure that maps each event id to its respective snapshot file paths. These snapshots are taken before the event is executed and immediately afterwards. The directory structure where this data is saved is:

    <date_string>/snapshots/<event_id>/

The date string is the same for the entire execution of the browsers over the given web app, and inside this folder we store the images per browser.

In order to compare the images, the snapshot processor loads each of the files into memory using the stored paths and passes them to Resemble.js. After this, we store the comparison data returned by Resemble.js in the same directory, as shown in Figure 4.6.

We use the "misMatchPercentage" value to establish an alert threshold configurable by the user. The closer it is to 100, the more different the images are.

Finally, as part of Resemble.js, we can configure different parameters of the comparison, such as:

  • ignore: Specify which aspects should be ignored in the comparison. The options are "nothing", "less", "colors", "antialiasing" and "alpha".

  • scaleToSameSize: Scale the images being compared to the same size.


FIGURE 4.6: Resemble.js Generated Metadata

  • output: Object that specifies how the output image will be generated. This includes errorColor, errorType (flat, movement, flat with diff intensity, movement with diff intensity, diff portion from the input), transparency (compare images as "opaque" or "transparent"), useCrossOrigin (whether to use CORS when comparing remote images), boundingBox or boundingBoxes (compare only a specific part of the images), ignoredBox or ignoredBoxes (ignore a specific part of the images), and you can even declare which colors to ignore during the comparison (ignoreAreasColoredWith). There are other attributes that are not as well documented (like largeImageThreshold).

These parameters are part of the general process configuration defined by the user. When it is not present, we use the JSON properties shown in Figure 4.7.
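As a rough sketch of how this comparison step can be wired up with Resemble.js's promise-based compareImages helper, assuming the file paths, option values and threshold below are only illustrative (the actual defaults are the ones in Figure 4.7):

    const fs = require('fs');
    const compareImages = require('resemblejs/compareImages');

    async function compareSnapshots(basePath, otherPath, threshold = 10) {
      const options = {
        ignore: 'antialiasing',                      // illustrative; see the "ignore" options above
        scaleToSameSize: true,
        output: { errorType: 'movement', transparency: 0.3 }
      };

      const data = await compareImages(fs.readFileSync(basePath), fs.readFileSync(otherPath), options);

      // Persist the diff image and flag the event when the mismatch exceeds the user threshold
      fs.writeFileSync(`${otherPath}.diff.png`, data.getBuffer());
      const misMatchPercentage = Number(data.misMatchPercentage);
      return { misMatchPercentage, aboveThreshold: misMatchPercentage > threshold };
    }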

4.4 Client Instantiation: Handlers

In order to run tests on the regular web app export, we create instances of headless browsers through an abstraction that we named a handler. A handler is a function that instantiates browsers with a specific method of exploration. Each handler is linked to a platform used to drive and explore the web app. This way, we can have different handlers depending on which platform the final user decides to instantiate the browsers with (for example Puppeteer [22], Playwright [16] or Cypress).


FIGURE 4.7: ResembleJS Default Input JSON

As a starting point, we are using a smart monkey created by TheSWDesignLab research group at Los Andes University using Cypress [25], which handles two types of browsers: Chrome and Firefox. We can set as parameters the base URL, the number of events to execute and a run seed; the seed can be used to retry a specific sequence of events, which is why the user can provide it.

When the main handler promise resolves, the Docker container runtime stops the browsers and proceeds to report either an error or a successful result to the host.
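A handler can be sketched as an async function that receives the exploration parameters and resolves when every browser has finished; the signature and parameter names below are assumptions for illustration, not the exact Browz interface:

    // Hypothetical handler shape: one function per exploration platform (Cypress, Puppeteer, ...).
    // "runMonkey" is a stand-in for the platform-specific exploration, e.g. launching the Cypress smart monkey.
    async function cypressHandler({ baseUrl, numEvents, seed, browsers }, runMonkey) {
      // Launch the exploration once per browser and resolve when every run has finished.
      await Promise.all(browsers.map((browser) => runMonkey({ baseUrl, numEvents, seed, browser })));
    }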

4.5 Report

Since not every user will end up reading the logs generated by our app, we show all the results in a web UI. There are two main parts to this: the report generation and the visualizer.

4.5.1 Report Generation

As previously discussed in the Snapshot Processor protocol section (4.3.1), the image comparison paths and the log data are kept in in-memory structures, which are ultimately saved inside the directory:

    <date_string>/


All logs are saved inside "log.json", separating each browser with its respective logs (as previously stated, these include the timestamp, log type and contents of the log).

As for the rest of the data, in "run.json" we store all the general information of the run, including:

  • numEvents: Maximum number of events configured.

  • startDate, endDate and startTimestamp: Date attributes to determine the dates of the run.

• baseBrowser: Base browser to compare images against.

• browsers: Browsers where the execution was performed.

• appDirname: Directory of the app specified by the user.

  • events: Array of events, stored as defined by the Snapshot Processor. If logs were matched to an event, we add them as well.

Finally, we create a new file named "runs.json" that stores an array of strings with all the runs created by the user in the specified snapshot destination directory, and copy it, alongside all the snapshots and report files, to a "runs" folder inside the report visualizer distribution directory.
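An illustrative (not verbatim) run.json entry built from the fields above could look like this; the concrete values and the internal structure of each event entry are placeholders:

    {
      "numEvents": 50,
      "startDate": "2020-07-25",
      "endDate": "2020-07-25",
      "startTimestamp": 1595700000000,
      "baseBrowser": "chrome",
      "browsers": ["chrome", "firefox"],
      "appDirname": "/tmp/app/dist",
      "events": [
        { "id": "event-1", "name": "click", "misMatchPercentage": 0.42, "logs": [] }
      ]
    }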

4.5.2 Visualizer

The report visualizer is a Vue.js project that loads all the data generated from previous execution runs. It is hosted locally with a basic HTTP server.

For each execution run, the UI shows its details, including browsers, starting seed, dates, logs and a list of events, each with the mismatch percentage between browsers, as seen in Figure 4.8.

There is another screen that matches each log to a specific event on the browser, as shown in Figure 4.9.

As for the event details, we offer the user a funnel so that they can select the desired visualization target (a single browser, Figure 4.10, or the comparison between browsers, Figure 4.11), the moment (before or after the event) and the type of visualization (for comparisons we support side by side, Figure 4.11, and visual differences on the same screen, that is, the result image of Resemble.js, Figure 4.12).


FIGURE 4.8: Run Details on Visualizer

FIGURE 4.9: Run Details Logs on Visualizer

FIGURE 4.10: Event Details Firefox


FIGURE 4.11: Event Details Side By Side

FIGURE 4.12: Event Details Comparison


There are often cases in which the difference between the two browsers is so noticeable that it is advisable to use the "Visual diff" type, as can be seen in Figure 4.13.

FIGURE 4.13: Event Details Comparison with Huge Difference

4.6 Comparison with Other Solutions

Following the solution design we have described, Tables 4.1, 4.2 and 4.3 show how Browz compares against the other alternatives of the state of the art. Cells marked "Doesn't say" indicate that we could not find this information for the specified tool. We did not include X-Diag here, since it not only detects an XBI but also tries to repair it, making it superior to all the other tools discussed in this document, including Browz.


TABLE 4.1: State of the art feature comparison against proprietary solutions

Feature | BrowserStack | LambdaTest | Browz
Cloud execution | Yes | Yes | Yes
Local execution | No | No | Yes
Browser virtualization | Yes | Yes | Yes
Execute tests on different OS | Yes | Yes | No
Execution logs | Yes | Yes | Yes
Stable connection during execution | No | Yes | Yes
Browser exploration | Yes | Yes | Support for multiple handlers, currently only Cypress
Visual regression comparison between browsers | No | Yes | Resemble.js
Responsive viewports support | No | Yes | Yes

TABLE 4.2: State of the art feature comparison against open source solutions

Feature | Galen | BackstopJS | Browz
Cloud execution | Yes | Yes | Yes
Local execution | Yes | Yes | Yes
Browser virtualization | No | No | Yes
Execute tests on different OS | No | Yes | No
Execution logs | Yes | Yes | Yes
Stable connection during execution | Yes | Yes | Yes
Browser exploration | Selenium | Puppeteer and ChromyJS | Support for multiple handlers, for starters Cypress
Visual regression comparison between browsers | No | No, it compares the same app against itself over a time period | Resemble.js
Responsive viewports support | No | Yes | Yes


TABLE 4.3: State of the art feature comparison against academic research

Feature | CrossCheck | X-Check | Browz
Cloud execution | Yes | Yes | Yes
Local execution | Yes | Yes | Yes
Browser virtualization | Doesn't say | Yes | Yes
Execute tests on different OS | No | No | No
Execution logs | Yes | Yes | Yes
Stable connection during execution | Doesn't say | Doesn't say | Yes
Browser exploration | Yes | In-house method of capturing and replaying events | Support for multiple handlers, for starters Cypress
Visual regression comparison between browsers | Static code, image comparison and machine learning | X-PERT (alignment graph) | Resemble.js
Responsive viewports support | Doesn't say | Doesn't say | Yes


Chapter 5

Deployment and Documentation

5.1 Running From Source Code

This section details how to run Browz locally from the source code [14] and explains the parameters of the CLI command.

In order to run Browz locally, it is required to have NPM, NodeJS and Docker installed. The source code can be cloned with git or downloaded directly from GitHub [14].

First, make sure your current working directory is the root of the project. Then, based on the NPM scripts declared in "package.json", type on the command line:

    npm run browz -- <bin_params> <app_flags>

The app flags allow a small customization of the process by the end user. In the first release we include:

  • --visualize: The container execution is not started and we go straight to the report visualization (it shows the history view by default).

  • --skip-report: The report visualization is skipped. This flag is useful to automate tests in bash scripts, as otherwise the run does not finish without user input.

As for the bin params, there are two as well:

  • HTTP APP DIR: Directory (relative or absolute path) where the regular web app export is located. If the "--visualize" flag is present, this parameter is ignored.

  • SNAPSHOT DESTINATION DIR: Directory (relative or absolute path) where the images for the comparison will be stored. When it is not present, the default value is the "runs" folder of the project, and it is ignored when the "--visualize" flag is present.

In this regard, some examples for the execution of this script would be:

    npm run browz -- /tmp/app/dist /tmp/images     # Explore the app in /tmp/app/dist, storing images in /tmp/images

    npm run browz -- /tmp/app/dist --skip-report   # Explore the app in /tmp/app/dist without showing the report, storing images in the runs folder

    npm run browz -- --visualize                   # Visualize the report history

5.2 NPM Deployment

Browz is currently deployed on NPM [3].

To install it in your machine just type:

    sudo npm install -g browz

Since we manage files inside root-owned folders of the device, for the moment it only works with admin privileges. Similar to the documentation for running the project from source, we have the following examples:

    sudo browz /tmp/app/dist /tmp/images     # Explore the app in /tmp/app/dist, storing images in /tmp/images

    sudo browz /tmp/app/dist --skip-report   # Explore the app in /tmp/app/dist without showing the report, storing images in the runs folder

    sudo browz --visualize                   # Visualize the report history

For more detail on what each parameter does, refer to the "Running From Source Code" section (5.1).


Chapter 6

Conclusion

In this thesis we presented a client-server architecture, based on the use of a Docker container (Chapter 4), to explore a given regular web app export and show the XBIs found by comparing different browsers (currently Chrome and Firefox). Apart from the design, the implementation allows customization of the exploration methods (use of handlers), the process configuration (maximum number of events, default waiting time per request, management of logs, Resemble.js difference outputs, etc.) and the visualization of differences across runs.

Compared to proprietary solutions, this prototype offers the execution of comparisons of the given application both on premise and on the cloud, and the capability (not yet implemented) to use different viewports. Although our functionalities fall short in comparison to the implementations found in scientific research (mostly the machine learning identification techniques related to X-PERT and X-Check, Section 3.4.2, and the XBI correction of X-Diag, Section 3.4.2), the architecture is open to incorporating the techniques implemented in these papers and even extending them with more options in the future (viewports, different operating systems, etc.).

Bear in mind that the main goal of this thesis was to build a functional prototype that allows web developers to evaluate how their app behaves under the same flow of actions across different browsers and to take decisions based on the results. In this way, we aim to provide a public solution that web developers can use to detect cross-browser inconsistencies in their applications.

For this reason, apart from the design and implementation, we published the prototype in a public repository [14] and as a public package on NPM [3], so that anyone can use it and give us feedback.

Nevertheless, we recognize the solution has several areas of improvement regarding the options available to the user and its current execution flow.


Chapter 7

Future Work

In this chapter we propose improvements and specific tasks that could be done after this first stage of development:

First, ensure that all the images being processed by the Snapshot Processor Server get written; currently the server stops without verifying whether it is still saving data or processing comparisons. Another improvement is the creation of a proxy server inside the container execution environment, to make sure that requests to external websites remain consistent throughout the whole execution in all the browsers. Also, the handling of directories could improve, as it currently requires super user permissions when running from the global install.

Since our current implementation encapsulates the whole browser comparison in one container, the prototype could increase its reach by including multiple operating systems (right now we only include the latest Ubuntu version available on DockerHub [26]) and multiple viewports (currently only desktop screen views), by making parallel or sequential executions of these possible combinations.

After that, we propose reviewing the exploration tool we are currently using (a variation of the SW Design Lab smart monkey using Cypress.js [25]), focusing on the reasons why it suddenly stops its execution (mostly when used on Firefox), and evaluating other tools to generate action flows on the application, among these, implementations of rippers or monkeys with Puppeteer [22] or Playwright [16]. A ripper would give us the flexibility of exploring multiple action flows for the same application instead of just one flow of pseudo-random actions generated with a seed (the current implementation).

Finally, we consider that a more comprehensive search of related work must be done to assess the use of open-source tools for automated visual regression testing and their reception in the developer community.


Bibliography

[1] Yu Adachi, Haruto Tanno, and Yu Yoshimura. "Reducing Redundant Checking for Visual Regression Testing". In: 2018 25th Asia-Pacific Software Engineering Conference (APSEC). Dec. 2018, pp. 721-722. DOI: 10.1109/APSEC.2018.00106.

[2] alexchopin. nuxtjs.org. URL: https://nuxtjs.org/guide/ (visited on 03/29/2020).

[3] browz. npm. URL: https://www.npmjs.com/package/browz (visited on 07/25/2020).

[4] Child Process | Node.js v14.4.0 Documentation. URL: https://nodejs.org/api/child_process.html (visited on 06/28/2020).

[5] Shauvik Roy Choudhary, Mukul R. Prasad, and Alessandro Orso. "CrossCheck: Combining Crawling and Differencing to Better Detect Cross-browser Incompatibilities in Web Applications". In: 2012 IEEE Fifth International Conference on Software Testing, Verification and Validation. Apr. 2012, pp. 171-180. DOI: 10.1109/ICST.2012.97.

[6] Shauvik Roy Choudhary, Mukul R. Prasad, and Alessandro Orso. "X-PERT: Accurate identification of cross-browser issues in web applications". In: 2013 35th International Conference on Software Engineering (ICSE). May 2013, pp. 702-711. DOI: 10.1109/ICSE.2013.6606616.

[7] Docker. What is a Container? | App Containerization | Docker. URL: https://www.docker.com/resources/what-container (visited on 03/29/2020).

[8] Galen Framework. galenframework/galen. Mar. 27, 2020. URL: https://github.com/galenframework/galen (visited on 03/29/2020).

[9] Garris. garris/BackstopJS. Mar. 29, 2020. URL: https://github.com/garris/BackstopJS (visited on 03/29/2020).

[10] Git - git-pull Documentation. URL: https://git-scm.com/docs/git-pull#Documentation/git-pull.txt---progress (visited on 06/28/2020).

[11] Google. Chrome 64 to deprecate the chrome.loadTimes() API. Google Developers. URL: https://developers.google.com/web/updates/2017/12/chrome-loadtimes-deprecated (visited on 03/29/2020).

[12] Express JS. expressjs/multer. June 28, 2020. URL: https://github.com/expressjs/multer (visited on 06/28/2020).

[13] Resemble JS. Resemble.js: Image analysis. URL: https://rsmbl.github.io/Resemble.js/ (visited on 03/29/2020).

[14] Sergio Guzmán Mayorga. sguzmanm/browz. July 25, 2020. URL: https://github.com/sguzmanm/browz (visited on 07/25/2020).

[15] MDN. Handling common HTML and CSS problems. MDN Web Docs. URL: https://developer.mozilla.org/en-US/docs/Learn/Tools_and_testing/Cross_browser_testing/HTML_and_CSS (visited on 03/29/2020).

[16] Microsoft. microsoft/playwright. June 7, 2020. URL: https://github.com/microsoft/playwright (visited on 06/08/2020).

[17] Most Powerful Cross Browser Testing Tool Online | LambdaTest. URL: https://www.lambdatest.com (visited on 06/07/2020).

[18] Most Reliable App & Cross Browser Testing Platform. BrowserStack. URL: https://www.browserstack.com (visited on 06/07/2020).

[19] Node.js. Node.js. URL: https://nodejs.org/es/ (visited on 06/28/2020).

[20] OnetapInc. OnetapInc/chromy. May 30, 2020. URL: https://github.com/OnetapInc/chromy (visited on 06/07/2020).

[21] PHP. PHP: Hypertext Preprocessor. URL: https://www.php.net/ (visited on 03/29/2020).

[22] Puppeteer v3.3.0. URL: https://pptr.dev/ (visited on 06/07/2020).

[23] React. React - A JavaScript library for building user interfaces. URL: https://reactjs.org/ (visited on 03/29/2020).

[24] Shauvik Roy Choudhary, Husayn Versee, and Alessandro Orso. "WEBDIFF: Automated identification of cross-browser issues in web applications". In: 2010 IEEE International Conference on Software Maintenance. Sept. 2010, pp. 1-10. DOI: 10.1109/ICSM.2010.5609723.

[25] TheSWDesignLab. TheSoftwareDesignLab/monkey-cypress. Apr. 1, 2020. URL: https://github.com/TheSoftwareDesignLab/monkey-cypress (visited on 06/08/2020).

[26] ubuntu Tags - Docker Hub. URL: https://hub.docker.com/_/ubuntu?tab=tags (visited on 06/08/2020).

[27] Vue.js. Vue.js. URL: https://vuejs.org/ (visited on 03/29/2020).

[28] Guoquan Wu et al. "Detect Cross-Browser Issues for JavaScript-Based Web Applications Based on Record/Replay". In: 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME). Oct. 2016, pp. 78-87. DOI: 10.1109/ICSME.2016.28.

[29] Shaopeng Xu et al. "X-Diag: Automated Debugging Cross-Browser Issues in Web Applications". In: 2018 IEEE International Conference on Web Services (ICWS). July 2018, pp. 66-73. DOI: 10.1109/ICWS.2018.00016.