Automated software testing for cross-platform systems

Gustaf Brännström

January 29, 2012
Master's Thesis in Computing Science, 30 credits

Supervisor at CS-UmU: Mikael Rännar
Examiner: Fredrik Georgsson

Umeå University
Department of Computing Science
SE-901 87 UMEÅ
SWEDEN


Abstract

SILK is the preferred audio codec to use in a call between Skype clients. Every time the source code is changed there is a risk that the code is no longer bit-exact between all the different platforms. The main task of this thesis is to make it possible to test bit-exactness between platforms automatically, to save resources for the company.

During this thesis a literature study about software testing was carried out to find a good way of testing bit-exactness between different platforms. The advantages and disadvantages of the different testing techniques were examined during this study.

The result of the thesis is a framework for testing bit-exactness between several different platforms. Based on the conclusions from the literature study, the framework uses a technique called data-driven testing to carry out the bit-exactness tests on SILK.

Page 4: Automated software testing for cross-platform systems · 2012-03-08 · Automated software testing for cross-platform systems Gustaf Br annstr om January 29, 2012 Master’s Thesis

ii

Page 5: Automated software testing for cross-platform systems · 2012-03-08 · Automated software testing for cross-platform systems Gustaf Br annstr om January 29, 2012 Master’s Thesis

Contents

1 Introduction
  1.1 Skype
  1.2 SILK
  1.3 Definitions
  1.4 Outline

2 Problem description
  2.1 Problem statement
  2.2 Goal
  2.3 Purpose

3 Software testing
  3.1 Introduction
  3.2 Unit testing
  3.3 White box testing
  3.4 Black box testing
  3.5 Regression testing
  3.6 Automated software testing
    3.6.1 Recorded testing
    3.6.2 Data-driven testing
    3.6.3 Keyword-driven testing
  3.7 Conclusions

4 Accomplishment
  4.1 Preliminaries
  4.2 How the work was done

5 Results
  5.1 Test specifications
  5.2 Record
  5.3 Replay

6 Conclusions
  6.1 Limitations
  6.2 Future work

7 Acknowledgements

References

Page 7: Automated software testing for cross-platform systems · 2012-03-08 · Automated software testing for cross-platform systems Gustaf Br annstr om January 29, 2012 Master’s Thesis

List of Figures

3.1 The cost of fixing a bug depending on the time.
3.2 The relation between the cost of performing the tests and the number of defects.
3.3 The figure shows what is included in white box testing.
3.4 The figure shows how data-driven testing could work.
5.1 A system overview. The output from the recordings is used as input to the functions the replay module is testing.
5.2 An example of how the network of the computers participating in the test could look.


List of Tables

3.1 Example of how the data could look in a simple example of data-driven testing.
3.2 Example of how the data could look in a simple example of keyword-driven testing.
4.1 Preliminary schedule for the thesis work.


Chapter 1

Introduction

When a company is developing new software, problems and bugs will occur in the source code. As the number of lines of code increases, it becomes more difficult to locate where in the source code the bugs are. Since bugs in the code cannot be avoided, locating them becomes a common but also very time-consuming task, and thus ends up being a large company expense. Being able to automatically locate all the incorrect functions in the source code would therefore be highly desirable, since it could help a company save a lot of resources.

This thesis focuses on automated testing for cross-platform software. During the thesis a framework will be implemented to solve the task of testing bit-exactness between functions in Skype's open source audio codec SILK.

1.1 Skype

Skype is a Luxembourg-based company, founded in 2003 by Niklas Zennström and Janus Friis [15]. Skype develops software for both video and voice communication. Their software can be used on regular desktop computers, phones and TVs. During 2010 the users of Skype talked a total of 207 billion minutes, and in the last quarter of the same year there was an average of 180 million users connected every month [15].

1.2 SILK

SILK is the default audio codec between two devices running Skype [17] and is open source. The codec can be used in real-time applications, supports several different sampling frequencies and can adjust to both network and CPU changes [16]. SILK is implemented in C, but some of the functions are optimised with assembly code, and the codec supports several different platforms.


1.3 Definitions

The definitions below will be used throughout the entire thesis:

  • Bug: A bug is the same thing as a software defect, error or fault. The word bug has been used for a long time for defects in products, not only in computer software.

  • Bit-exact: Two binary units are considered bit-exact when the bits they are represented by are exactly the same. For instance, the outputs from two programs are bit-exact if the outputs are identical.

  • Codec: A codec consists of two parts, an encoder and a decoder. The encoder encodes data which can then be decoded by the decoder. A codec can be used for several reasons, e.g. an audio codec can be used to compress and decompress audio data.

1.4 Outline

A brief description of the following chapters in the thesis:

  • Chapter 2, Problem description, a description of the problem, the goals and the purpose of the thesis

  • Chapter 3, Software testing, an in-depth study of existing software testing methods

• Chapter 4, Accomplishments, a description of the work process

• Chapter 5, Results, a system overview of the test framework

  • Chapter 6, Conclusions, the conclusions reached during the thesis

• Chapter 7, Acknowledgements


Chapter 2

Problem description

In the following sections the problem of this thesis is stated, and the goal and the purpose of the thesis are described.

2.1 Problem statement

When the developers at Skype have modified the existing source code or have written new functions, they want to make sure SILK is still bit-exact. It is important that SILK remains bit-exact between all the different platforms it will be executed on. If not, the resulting output signal from SILK will differ between the platforms. The following steps describe one way to test if the codec is bit-exact:

1. Prepare all the devices

2. Build the codec for all devices

3. Copy the binaries to the devices

4. Run the test for the component on all devices

5. Analyse the results
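For illustration, the five steps above can be sketched as an automation plan. This is a hedged sketch: the device names and command descriptions are hypothetical and not Skype's actual setup, and a real framework would execute each step (e.g. over ssh) rather than just listing it.

```python
# Hypothetical sketch of automating the manual test procedure.
# Device names are invented; a real framework would run these commands.

def make_plan(devices):
    """Return the ordered commands a framework would execute."""
    plan = []
    for device in devices:
        plan.append(f"build codec for {device}")        # step 2
        plan.append(f"copy binary to {device}")         # step 3
        plan.append(f"run component tests on {device}") # step 4
    plan.append("collect and compare results")          # step 5
    return plan

plan = make_plan(["linux-x86", "arm-device"])
print(len(plan))  # -> 7 commands for two hypothetical devices
```

Step 1, preparing the devices, is assumed to be done once up front; the per-device steps then scale linearly with the number of platforms, which is exactly the cost the framework is meant to absorb.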

SILK is supported on several platforms, which makes it a time-consuming task to test it manually on each platform. Since the electronics market is growing fast, the number of platforms the codec will have to support is increasing. With each new platform the test procedure becomes more and more time-consuming to perform.

If the codec is not bit-exact, the tester needs to report the problem to the developers. The developers then need to do the tedious work of finding the function causing the codec to not be bit-exact.


For each step that can be automated, including the debugging step done by the developers, the time required for the testing procedure is reduced. In the best case all of the steps above are automated, which would give the company a certain level of quality assurance.

2.2 Goal

The goal of this thesis is to implement a test framework to compare whether the output from different functions executed on different platforms is bit-exact. The framework should automate the steps in the test procedure stated in section 2.1.

2.3 Purpose

The main purpose of this thesis was to find different types of existing testing methods and decide which of these could be used in an automated test framework for cross-platform systems. This framework is supposed to help the audio developers at Skype verify that all the functions in SILK continue to be bit-exact between platforms after new code has been committed. The developers at Skype can currently only test whether the entire encoder and decoder are bit-exact between platforms, not a specific function.


Chapter 3

Software testing

This chapter is the result of the literature study about software testing that was carried out during the thesis.

3.1 Introduction

In today's society people come into contact with computers almost every day, and all of these computers contain some sort of software. As the software becomes larger and more complex, the probability of bugs occurring increases. As the number of lines of code grows, it also becomes more difficult to find a bug. According to [12] it is very important to find bugs as early as possible during development, as they only become more expensive to fix later on in the project. This is illustrated in figure 3.1 from [12, p. 9].

Figure 3.1: The cost of fixing a bug depending on the time.


Companies should therefore try to test the software as early as possible during development to lower the cost. According to the author of [3], these are some of the consequences a company with a low-quality product can expect:

• protracted delays in delivering new applications,

  • loss of customers to competitors,

• high maintenance costs due to poor quality and

• high customer support costs due to poor quality.

If the company does not test the software thoroughly and extensively enough, it will probably miss several of the bugs, since the software is under-tested. On the other hand, the software can be over-tested if a company puts too many resources into testing it, and the expense will be unnecessarily high. These two scenarios are shown in figure 3.2 from [3, p. 48].

Figure 3.2: The relation between the cost of performing the tests and the number of defects.

3.2 Unit testing

The idea with unit testing, according to [7], is that each unit test should verify that a specific part of the source code has a particular behaviour. Each unit test should clearly state whether the test passed or failed given a certain input, and then quickly give that feedback to the developer or tester. Today it is common to use unit tests, and they are a key element in test-driven development [7]. Companies applying this development method implement the functions after the tests have been written. The idea is that the output of each function should be predictable given a certain input, and therefore it should be possible to write the tests before implementing the functions. There exist many different unit test frameworks; here is a list of some of the members of the xUnit family:

• SUnit (Smalltalk)

• JUnit (Java)

• CppUnit (C++)

• PyUnit (Python)
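As a minimal sketch of the xUnit style, a PyUnit (unittest) test for a hypothetical add function could look as follows. The function under test and the test cases are invented for illustration; they are not part of SILK.

```python
# Minimal PyUnit (unittest) sketch. The add function is a hypothetical
# unit under test with predictable output for a given input.
import unittest

def add(a, b):
    return a + b

class TestAdd(unittest.TestCase):
    def test_small_numbers(self):
        # Given a certain input, the expected output is known in advance.
        self.assertEqual(add(1, 2), 3)

    def test_negative_numbers(self):
        self.assertEqual(add(-1, 1), 0)

suite = unittest.TestLoader().loadTestsFromTestCase(TestAdd)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("passed" if result.wasSuccessful() else "failed")
```

Each test method clearly states pass or fail for one input, which is exactly the quick-feedback property described above.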

Even though unit tests are commonly used, the concept has several drawbacks. It is a time-consuming task to write all the tests and to run all of them manually. If the project is big and contains a lot of branches, it will not be feasible to write tests covering all the branches. Another drawback of unit tests, according to the author of [8], is the difficulty of writing good tests. If the tests are badly written, they will either cover the function poorly, or unnecessarily many tests will need to be written. It should be straightforward to write unit tests for functions with simple input and output. But as the input or output of the function under test gets more complex, it becomes harder for the tester to write the test cases, and there is a risk that the function will be poorly tested.

In [1] the authors describe the psychology behind testing. If the developer writes the tests, he or she may become blind to their own errors. But if someone else writes the tests instead, this will probably be even more time consuming due to poor knowledge of the code.

If a lot of test cases have been written for a certain software application and the application needs to be redesigned, all the original test cases might also need to be rewritten. This is not only because interfaces might change, but also because the behaviour of functions might change, which makes the original test cases invalid. It is therefore important that the code to be tested is stable.

3.3 White box testing

The author of [14] describes white box testing as ”a way of testing the external functionality of the code by examining and testing the program code that realises the external functionality” (p. 48). During white box testing the source code is required, since it will be analysed in different ways. Software companies can use this type of testing to get an overview of which parts of the source code are being used and which parts should be improved.

White box testing can be divided into two parts, static testing and structural testing, as shown in figure 3.3 from [14, p. 48].

Figure 3.3: The figure shows what is included in white box testing.

The analysis of the code during static testing can be done either manually or automatically, according to [14]. During static testing it is possible to see, e.g., whether the code fulfils the functional requirements, whether any part of the source code is unreachable, and whether all the declared variables are being used. The source code is never compiled during static testing.

Structural testing includes, as shown in figure 3.3, unit testing (described in 3.2), code coverage testing and code complexity testing. Code coverage testing runs the program with predefined test cases and profiles the software. There are four different ways of measuring code coverage according to [14]:

  • function coverage, how many times each function has been called,

  • statement coverage, how many of the statements have been executed,

  • path coverage, how many of the different paths have been executed and

  • condition coverage, how many of the conditions have been evaluated.

The information returned from the code coverage tool helps the developers to see, e.g., whether any code exists that has not been executed during the tests. The developers can then decide whether such code should be removed or whether they should add more test cases to cover that code.

To measure code complexity one can use cyclomatic complexity [10]. It ”measures program unit complexity in term of control flows, specifically branching” according to [2, p. 284]. To compute the cyclomatic complexity, the control flow graph is required. The complexity for a single function is calculated as:

M = E − N + 2

where

  • M = the cyclomatic complexity

  • E = the number of edges in the control flow graph

  • N = the number of nodes in the control flow graph

The complexity M represents the number of independent paths through the function, and can be helpful when writing unit tests.
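As a worked example (the node and edge counts below are for a hypothetical function, not one from SILK), consider a function with a single if/else branch. Its control flow graph has four nodes (condition, then-branch, else-branch, exit) and four edges, so the formula gives two independent paths, i.e. at least two unit test cases are needed to cover the function.

```python
# Worked example of the formula M = E - N + 2.

def cyclomatic_complexity(edges, nodes):
    """Cyclomatic complexity for a single connected function."""
    return edges - nodes + 2

# Hypothetical function with one if/else:
# nodes: condition, then-branch, else-branch, exit            -> N = 4
# edges: cond->then, cond->else, then->exit, else->exit       -> E = 4
print(cyclomatic_complexity(4, 4))  # -> 2: two independent paths
```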

3.4 Black box testing

In [14] the authors explain black box testing as testing ”without knowledge of the internals of the system under test” (p. 74). Black box testing can be used to test the software against a list of specifications or requirements. This type of testing should be used on software that is ready for delivery. The tester should be able to look at the software specifications, not the source code, and then give the software some input. The output from the software should then be verified against the expected output according to the software specifications. Black box testing should not be used for finding errors in the software, only for verifying it. This is why this type of testing can be useful for a company that has paid another company to develop the software application.

When performing black box testing the tester should use both positive and negative testing. Positive testing can be used to assure the customer that the software works as expected, and negative testing can be used to show that the software does not crash due to unexpected input.

3.5 Regression testing

When new source code has been developed or old source code has been modified, new bugs can occur and old bugs might reoccur. To detect these types of bugs a quality engineer can use regression testing. This is described in [14] as ”regression testing is done to ensure that enhancements or defect fixes made to the software works properly and does not affect the existing functionality” (p. 194). According to [9] it is also important to perform regression testing on the software as soon as something, e.g. the hardware the software is executed on, has changed. This is important since some parts of the source code might be hardware dependent, and therefore some bugs might only occur on certain hardware.

Usually it is not feasible to run all the test cases for the software every time new source code is committed. The first step in regression testing is therefore usually to perform a smoke test. A smoke test is essentially a test of all the basic functionality of the software. If this test fails, the bug causing the failure needs to be fixed before more detailed testing can be done. If the tester does not have time to run all the tests, the tester should instead select and run a subset of tests that covers the new source code.

3.6 Automated software testing

The testing techniques in the previous sections of this chapter are traditionally carried out manually during scripted testing done by a quality engineer [4]. Another way of doing the testing is to automate the testing procedure. The execution of the test cases can be automated by using a specialised test framework or software. The authors of [14] describe a test framework as a ”module that combines ’what to execute’ and ’how they have to be executed’” (p. 398).

According to [5] these are some of the benefits of automating the test procedure:

  • Saves time and resources: An automated test framework is most likely more efficient than a quality engineer running the same tests manually.

  • More reliable testing: Test cases which include many steps can be hard for a manual tester to carry out without doing anything wrong.

  • Run more tests: A company will get better test coverage of the software, since an automated framework is likely to be more efficient than manual testing. The quality engineer can write new test cases or improve existing test cases instead of running the tests manually.

  • Run other types of tests: With an automated test framework it is possible to run tests such as stress tests and long-running tests. Tests like these can be hard to do manually.

However, there are also drawbacks with automated software testing. Here is a list of some of the drawbacks:

  • The return on investment: It is not always worth setting up an automated test framework for the testing, according to [5]. If the return on investment is not big enough, the company should continue with manual testing. The company should keep in mind that it may take a while before the investment pays off.

  • Not all tests can be automated: Tests such as different types of ad hoc testing [14] and verification of graphical user interfaces are hard to automate.

  • Tests that require human interaction: Some tests require human interaction, such as connecting and disconnecting different hardware, and are thus not suitable for automation according to [14].

  • Often-changing software: If the software under test changes often, it will be a large overhead to set up the automated test framework for the software every time it has changed, according to [6].

As described in [5], the need for manual testers does not disappear when the tests become automated. The manual testers are experts on how to test the software. Their knowledge can instead be used to improve the test cases the automated test framework executes.

The following subsections describe different techniques that can be used in an automated test framework.

3.6.1 Recorded testing

This method records all the interactions with the software, according to the author of [11]. The interactions could be, e.g., mouse clicks and keyboard actions. The recorded interactions can then be used later to test whether a new version of the software does the same thing when the recorded interactions are replayed to it.

This method can, e.g., be used to test graphical user interfaces, but it is important to do the recordings in the correct way. Depending on how the recordings of the graphical user interface are done, the playback step can be sensitive to changes in the user interface. If so, different recordings will be required for each version of the graphical user interface.
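The record/replay idea can be sketched as follows. The Calculator class and the "key press" events are hypothetical stand-ins for a real application and real recorded interactions; the point is that the same event sequence is replayed against the old and the new version and the observed outputs are compared.

```python
# Hedged sketch of recorded testing: interactions are recorded as a
# sequence of events and replayed against each version of the software.
# The Calculator class is a hypothetical system under test.

class Calculator:
    def __init__(self):
        self.value = 0

    def press(self, digit):
        """Simulate pressing a digit key; return the displayed value."""
        self.value = self.value * 10 + digit
        return self.value

def replay(app, events):
    """Replay recorded events against app, collecting observed outputs."""
    return [app.press(e) for e in events]

events = [1, 2, 3]                        # recorded "keyboard" input
baseline = replay(Calculator(), events)   # outputs from the old version
candidate = replay(Calculator(), events)  # outputs from the new version
print(baseline == candidate)  # -> True when behaviour is unchanged
```

If the new version's outputs differ from the baseline, the replay has detected a behaviour change, which is exactly the sensitivity to interface changes mentioned above.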

3.6.2 Data-driven testing

According to the authors of [6], data-driven testing consists of two parts: the data and the script that uses the data. For each of the test cases the script runs, it first reads data, which could be stored in a database or in files, then runs the test and compares the results against a database or a file containing the expected results. This is illustrated in figure 3.4.

Figure 3.4: The figure shows how data-driven testing could work.

The data for the testing needs to be recorded or generated in one way or another before the testing can begin. If data-driven testing is done with a large database of test data, the test coverage of the software will be good.

Below, table 3.1 shows an example of how the data could look when a calculator's addition function is tested:

Input     Result
1   1     2
1   2     3
2   3     5
3   5     8

Table 3.1: Example of how the data could look in a simple example of data-driven testing.

In this example the script reads the elements in the first two columns of a row, runs the function under test and compares the result from the function with the expected result in the third column. This procedure is repeated for each row in the table.
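The procedure just described can be sketched directly over the rows of table 3.1. This is a hedged sketch: the add function stands in for the calculator's addition, and the rows are stored in a plain list here rather than read from a file or database as a real framework would do.

```python
# Hedged sketch of data-driven testing over the rows of table 3.1.
# In a real framework the rows would come from a file or database.

def add(a, b):  # hypothetical function under test
    return a + b

rows = [  # (input 1, input 2, expected result) - one test case per row
    (1, 1, 2),
    (1, 2, 3),
    (2, 3, 5),
    (3, 5, 8),
]

failures = [(a, b, exp, add(a, b)) for a, b, exp in rows if add(a, b) != exp]
print("all passed" if not failures else f"failed: {failures}")
```

Adding a test case is now just adding a row of data; the script itself never changes, which is the main appeal of the technique.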

3.6.3 Keyword-driven testing

Keyword-driven testing is similar to data-driven testing; the two methods are sometimes referred to as table-driven testing. Both methods use databases or files that contain the input and the result. The difference is that the table for keyword-driven testing also contains a keyword; table 3.2 is an example of how this could look if the system under test is a calculator. Each keyword corresponds to a predefined action, and when the test script reads a keyword it knows what to do.

Keyword   Input     Result
add       1   1     2
sub       2   1     1
mul       3   2     6
div       6   2     3

Table 3.2: Example of how the data could look in a simple example of keyword-driven testing.

Each row in the table represents one test case. The script first reads the keyword on the current line and translates it into a specific set of actions. The next two columns are used as input to the actions, and the result from the actions is compared against the result in the fourth column.

In this example the script translates add into a call to the calculator's function for addition with 1 and 1 as input. The result from the function is compared against the expected result value from the table, which is 2.
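The keyword lookup in table 3.2 can be sketched as a dispatch table. As before this is a hedged sketch with the calculator operations invented for illustration; a real framework would map each keyword to a richer set of actions.

```python
# Hedged sketch of keyword-driven testing over the rows of table 3.2.
# Each keyword maps to a predefined action on a hypothetical calculator.

actions = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a // b,  # integer division matches the table
}

rows = [  # (keyword, input 1, input 2, expected result)
    ("add", 1, 1, 2),
    ("sub", 2, 1, 1),
    ("mul", 3, 2, 6),
    ("div", 6, 2, 3),
]

for keyword, a, b, expected in rows:
    result = actions[keyword](a, b)  # translate keyword to an action
    assert result == expected, (keyword, result, expected)
print("all keyword-driven test cases passed")
```

Compared with the data-driven sketch, the extra keyword column lets one table exercise several different functions instead of a single hard-coded one.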

3.7 Conclusions

Not all types of testing are suited to automation. Stress, reliability, regression and functional testing are four types of testing that are well suited for automation, since they are repetitive tasks. As stated earlier, ad hoc testing and different types of static testing are on the other hand not suited for automation, mainly because they require human interaction.

If a company is thinking about setting up a test framework for their application, they should consider how expensive it would be to fix a bug compared to how many resources they would have to spend to avoid such bugs. An advanced and complex application would probably require more resources than a very simple one. For a simple application it would probably be enough to do exploratory testing and unit testing, which is less expensive than implementing or buying an automated testing framework.

If it is decided to set up a test framework for the application, it should be easy to test the source code. If possible, as much of the source code as possible should be tested automatically on each code commit, since it is easier and cheaper to fix bugs in an early phase than in a late phase. The results of the tests should be easily accessible and visualised for the developers fixing the bugs. If they do not know the bugs exist, they cannot fix them.

It is also important to remember to do integration testing, since it is very likely that new bugs occur when the sub-components interact with each other. Even if a sub-component passes all the unit and regression tests, it may still fail when integrated with the other sub-components. For instance, this can happen in an application that runs each sub-component in a different thread with some shared resources. The threading issues such an application could have are not possible to test with unit tests. They should instead be tested in an automated test framework which could stress test the application.

All three automated methods for testing software have similarities. They all require predefined input to the software they will test. Data-driven testing is suitable for testing the lower levels of the software, e.g. a specific function or an entire component. Keyword-driven testing adds another layer on top of data-driven testing, since each keyword can correspond to a set of components and functions to be tested. If the keywords in the database test things at the higher levels of the software, this can in fact test similar things as recorded testing. But recorded testing cannot be used to test the lower levels of the application, since it records the input to the application at the higher level.

To test bit-exactness between functions, the same input should be used for all the functions that will be compared. This data can either be generated automatically or be recorded from an execution, where the input is captured every time the functions under test are called. If the latter approach is used, the input data to the functions under test will be input they actually might get in a real situation, and not automatically generated dummy data. If the functions under test are called with varying input data during the recording, the test coverage will be good when the data is used during the bit-exactness test. For an audio codec it is possible to use a long and varying audio file as input during the execution to generate recordings with good code coverage.

With a large set of varying input it is possible to use data-driven testing to test a system. With data-driven testing it is possible to run regression testing and different types of functional testing, which is exactly what should be done when SILK is tested for bit-exactness. These are the reasons why the framework implemented during this thesis uses data-driven testing.
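As an illustration of the data-driven idea, the sketch below keeps the test data separate from the test logic and flags any input vector for which two implementations disagree. It is a Python sketch with made-up stand-in functions; the real framework compares recorded output from SILK functions, not these toy encoders.

```python
# Minimal data-driven bit-exactness sketch. The two encoders are
# hypothetical stand-ins for a reference and an optimised implementation
# that are supposed to produce identical output.

def encode_reference(samples):
    # Stand-in for the reference implementation.
    return bytes((s * 3) % 256 for s in samples)

def encode_optimised(samples):
    # Stand-in for an optimised implementation that should be bit-exact.
    return bytes((s + s + s) % 256 for s in samples)

# The test data is kept separate from the test logic: each row is one case.
test_vectors = [
    [0, 1, 2, 3],
    [255, 128, 64],
    [17, 99, 42, 7, 200],
]

def run_bit_exactness_tests(vectors):
    # Collect every input vector where the two implementations differ.
    failures = []
    for vec in vectors:
        if encode_reference(vec) != encode_optimised(vec):
            failures.append(vec)
    return failures

print(run_bit_exactness_tests(test_vectors))  # → []
```

Adding a new test case is just adding a new row of data, which is the main attraction of the technique.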


Chapter 4

Accomplishment

This section describes the preliminaries for the thesis, how the work was planned, and how the work was actually done.

4.1 Preliminaries

Below, Table 4.1 shows an outline of the preliminary schedule that was written for the project plan. The table does not include the preparation work, which consisted of writing a proposal and a project plan for the thesis and was done before the thesis started. All the work during the thesis was planned to be done at Skype's office in Stockholm. Even though it is not visible in Table 4.1, the plan was to start writing the report when the implementation reached its final stage.

Weeks   Work tasks
4       Literature study, study current test system and design the test framework
9       Implementation and testing
6       Write report and evaluate the framework
1       Prepare presentation and opposition

Table 4.1: Preliminary schedule for the thesis work.

4.2 How the work was done

This section describes how the separate parts of the thesis were done.


Literature study, study of current test systems and designing the framework
I was supposed to do the literature study, the study of the current test systems and the design of the test framework during the first four weeks of the thesis. Due to work related travelling and a conference, the literature study was delayed a little. The consequence was that I had to do the literature study in parallel with some parts of the implementation instead. Luckily this did not cause any big problems. Thanks to a deal the company where I did my thesis had with a company that provides e-books, it was easy to find a lot of literature about software testing. The outcome of this part of the thesis was a design of the test framework and knowledge of how the company does its testing.

Implementation and testing
I was supposed to spend the next nine weeks on implementing the test framework. During these weeks I first ran into some problems with pointers in C and then some problems with setting up Cygwin to work with the framework. The framework had to work with Cygwin, since this was a requirement from the company. Due to the problems with Cygwin, the implementation phase took me one extra week and thus ten weeks in total. During this phase of the thesis the framework was also tested, and the outcome of this phase was the test framework.

Write the report and evaluate the test framework
After the implementation was done I started to write the report. I did not do this simultaneously with the implementation as planned, but thanks to different kinds of documentation and notes this was not a problem.


Chapter 5

Results

The outcome of the in-depth study and the implementation was a framework. It consists of two separate modules, the record module and the replay module. The record module makes it possible to capture input data to a function. The data captured by the record module is later used as input to the functions the replay module wants to test. The result from the execution of the function under test is captured by the replay module and compared against the result from other functions. This is shown in figure 5.1.

Figure 5.1: A system overview. The output from the recordings is used as input to the functions the replay module is testing.

The framework can compare whether the result from two or more functions is the same. This makes it possible to test if a function that has been implemented differently for different platforms, due to e.g. optimisations, is bit-exact between the platforms. By giving the same input to all the different implementations of a function and comparing their results, it is possible to make sure the function is bit-exact between the implementations.


5.1 Test specifications

XML files are used to specify which functions to test and how to build and run the executables. The framework uses three different types of XML files. The first file contains information about the functions and their parameters, and the second file contains the information that is needed to build and execute the files. These two files are used by both the record and the replay module. The last file contains the test cases: which functions the test framework will compare and on which platform each function should be executed. This file is only used by the replay module.
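The actual XML schemas are not reproduced in this report, but a hypothetical test-case file of the third kind could be parsed along the following lines. The element and attribute names below are invented for illustration and are not the framework's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical shape of the test-case XML file: which functions to
# compare and on which platform each should run.
TESTCASES_XML = """
<testcases>
  <testcase name="encode_frame_bitexact">
    <function name="silk_encode_frame" platform="x86-linux"/>
    <function name="silk_encode_frame" platform="arm-android"/>
  </testcase>
</testcases>
"""

def parse_testcases(xml_text):
    # Return a list of (test case name, [(function, platform), ...]) pairs.
    root = ET.fromstring(xml_text)
    cases = []
    for case in root.iter("testcase"):
        targets = [(f.get("name"), f.get("platform"))
                   for f in case.iter("function")]
        cases.append((case.get("name"), targets))
    return cases

print(parse_testcases(TESTCASES_XML))
```

Each parsed test case then drives one comparison between the listed function/platform pairs.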

5.2 Record

The recording module captures input to a function. The idea with this module is to generate input data for the functions the replay module will test. It is important that all the different implementations of the function that the replay module will test use the same recording as input. These are the main steps in the recording module:

1. Rename functions

2. Create new functions

3. Build and run program with input

During the first step all the functions that have been specified in the XML file will be renamed. A prefix (in this case "__") will be added to the original function name of each function. All the occurrences of the function name in the file will be changed, not only the function head. The framework will also search the directory where the original C code is located for a file with optimised code. It will search for a file with the same file name but ending with "_arm.S" instead of ".c". If such a file exists, all the occurrences of the function name will be renamed in the same way as in the C code.
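The renaming step can be sketched as a simple textual substitution. The snippet below is a Python illustration, not the framework's actual code, and assumes that whole-word matching on the function name is sufficient (file handling is omitted):

```python
import re

# Sketch of the renaming step: every occurrence of the function name
# gets a "__" prefix, both in the .c file and in a matching *_arm.S
# file if one exists. Only the substitution itself is shown here.

def rename_function(source, name):
    # \b keeps e.g. "my_function_helper" from being renamed along
    # with "my_function", since "_" is a word character in C identifiers.
    return re.sub(r"\b%s\b" % re.escape(name), "__" + name, source)

c_code = "int my_function(int a) { return my_function_helper(a); }"
print(rename_function(c_code, "my_function"))
# → int __my_function(int a) { return my_function_helper(a); }
```

The same substitution can then be applied unchanged to the assembly file, since it operates on the identifier text rather than on C syntax.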

The second step for the recording module is to create new functions that replace all the original functions, since those have been renamed. Each new function will write to two files per parameter. To the first file the function will write the parameter's binary data, and to the second file it will write how many bytes it wrote to the first file. When this is done the function will call the renamed function. The second step is illustrated in the following pseudo code:

    function __my_function(a, b):
        ...
    end

    function my_function(a, b):
        write_to_file(a)
        write_to_file(sizeof(a))
        write_to_file(b)
        write_to_file(sizeof(b))
        return __my_function(a, b)
    end

Every time the original function would have been called, the new function will be called instead. For all the other functions there will be no difference, since the new function takes and returns the same things as the original function.

The last step in the recording module is to build and run the project with the modified code. The framework will build it with the command specified in the XML file. After the build has finished, the framework will execute the binary and all the parameters will be recorded.
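A minimal sketch of the build-and-run step, assuming the build command from the XML file is run through a shell and that a failing build should abort the recording (the thesis does not show the framework's actual implementation):

```python
import subprocess

# Sketch of the build-and-run step. check=True raises
# CalledProcessError, so a broken build or a crashing binary
# stops the recording run immediately.

def build_and_run(build_cmd, run_cmd):
    subprocess.run(build_cmd, shell=True, check=True)
    return subprocess.run(run_cmd, shell=True, check=True).returncode

# e.g. build_and_run("make silk_record", "./silk_record input.wav")
# with command strings taken from the XML build specification.
```

The recorded parameter files are produced as a side effect of running the instrumented binary, so nothing else needs to be collected here.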

Before the module starts modifying the code it will first create a backup of all the files it will modify. When the framework has built the modified code it will restore the files.

5.3 Replay

The replay module can be used to call specific functions with prerecorded data as input. It is possible to run everything on the local computer, but the local computer can also start the replay module on a remote computer. This gives the user of the framework the opportunity to test whether a function is bit-exact between different platforms and architectures. An example of what such a setup could look like is shown in figure 5.2.


Figure 5.2: An example of what the network of the computers participating in the test could look like.

When the replay module is started on the local computer it first parses the XML files. This gives the module a list of all the comparisons to do. Each comparison is done between two or more functions, or against a list of files containing prerecorded data. Since it is possible to compare files containing prerecorded data, it is possible to use reference output files.

The first step after the parsing is to send all the new files to all the remote computers specified in the XML files. The module uses the secure copy command (scp) to transfer the files to the remote computers.
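The transfer step could be sketched as follows. The host name and paths are illustrative, and the helper only assembles the scp command line the module would run, which keeps the sketch inspectable without a real remote machine:

```python
import subprocess

# Sketch of the file-transfer step: the framework shells out to scp.
# Separating command construction from execution makes it easy to
# verify the command line in isolation.

def scp_command(local_path, host, remote_path):
    return ["scp", local_path, "%s:%s" % (host, remote_path)]

def send_file(local_path, host, remote_path):
    # check=True surfaces a failed transfer as an exception.
    subprocess.run(scp_command(local_path, host, remote_path), check=True)

print(scp_command("recordings/a.bin", "buildhost", "/tmp/a.bin"))
# → ['scp', 'recordings/a.bin', 'buildhost:/tmp/a.bin']
```

The same pattern extends naturally to the ssh invocation used later to start remote subprocesses.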

After the remote computers have received the new files, the module will start handling each separate comparison. These are the steps the module will take to generate output from each function in the comparison:

1. Send XML files to specified computer

2. Start a subprocess on the specified computer

3. Retrieve the files from the specified computer

Depending on whether a function is specified to be executed on the local or on a remote computer, the module will handle this a little differently. It will use secure copy to transfer the XML files between the local computer and the remote computer. To start a subprocess on a remote computer, the module will use the remote login program ssh.

The subprocess that will be started will do the following:

1. Create the new main function

2. Build the new code

3. Run the new code

To avoid executing unnecessary code when generating output from a function, the original main function is replaced with a new main function. The new main function will first create a variable for each parameter and then load the prerecorded data into it. For each parameter it will first read from the sizes file how many bytes to read, and then read that amount of data from the file containing the actual binary data.

The next step is to call the function under test with the parameters and record the output from the function. The framework will write all the parameters and the return value to files. Since the parameters might be pointers and may have been changed inside the function, it is important to also write the parameters to file, not only the return value. This is what the pseudo code for the new main function can look like:

    function main():
        a = read_from_file(nr_of_bytes(a))
        b = read_from_file(nr_of_bytes(b))
        ret_val = call my_function(a, b)
        write_to_file(a)
        write_to_file(b)
        write_to_file(ret_val)
    end
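The loading of prerecorded parameters can be sketched as below. The 4-byte little-endian length per parameter is an assumption made for the illustration; the thesis does not specify the framework's actual on-disk format.

```python
import io
import struct

# Sketch of loading prerecorded parameters: the sizes stream says how
# many bytes each parameter occupies in the data stream. A 4-byte
# little-endian unsigned length per parameter is assumed here.

def read_parameters(sizes_stream, data_stream, count):
    params = []
    for _ in range(count):
        (nbytes,) = struct.unpack("<I", sizes_stream.read(4))
        params.append(data_stream.read(nbytes))
    return params

# Two parameters: 2 bytes and 3 bytes of recorded binary data.
sizes = io.BytesIO(struct.pack("<II", 2, 3))
data = io.BytesIO(b"\x01\x02\xaa\xbb\xcc")
print(read_parameters(sizes, data, 2))  # → [b'\x01\x02', b'\xaa\xbb\xcc']
```

Keeping sizes and data in separate files, as the record module does, means the loader never needs to know the parameters' C types.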

After the code has been replaced, the subprocess will build and execute the new code. The result of the execution of the new main function is one file per parameter containing the binary data. These output files are retrieved back to the local computer, and when the output files from all the executions have been retrieved, the framework tests whether the files are bit-exact.
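The final comparison itself is a byte-for-byte equality check, which can be sketched as:

```python
# Sketch of the bit-exactness check on the retrieved output files:
# the outputs are bit-exact when their contents are byte-for-byte
# identical across all executions.

def bit_exact(contents):
    # All outputs must equal the first one; any mismatch fails the case.
    return all(c == contents[0] for c in contents[1:])

print(bit_exact([b"\x00\x7f", b"\x00\x7f", b"\x00\x7f"]))  # → True
print(bit_exact([b"\x00\x7f", b"\x00\x7e"]))               # → False
```

A failed comparison pinpoints exactly which function and platform pair diverged, which is the information the developer needs to start debugging.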

This module also makes a backup of the file containing the original main function before it replaces the code. When all the comparisons are done, the module restores the file.


Chapter 6

Conclusions

All software developing companies need to test their software to be able to guarantee a certain level of quality. Some software needs to be tested more thoroughly than other software, and thus companies need to find a balance between the cost of running a lot of tests and the cost of missing a bug. Automating the testing can be a good way of getting good test coverage at a low cost, but sometimes the return on automating the testing is not bigger than the investment.

The level of automation depends on which programming language is used for the system under test. SILK is mainly implemented in C, which has caused some problems with the automation procedure. A pointer in C may refer to a single element or to a list of elements, and the number of elements it refers to is not connected to the pointer. In other programming languages, e.g. Java, the size of an array is always known. This would have made it easier to implement a framework that required less information from its user, and therefore easier to automate more steps.

The outcome of this thesis was a test framework which fulfills the goals of the thesis by automating the testing procedure stated in section 2.1. The framework is currently being used by audio developers at Skype to further improve the quality of the source code and the final client.

Overall the thesis was successful, and it was possible to sustain the time plan from the project plan. All the requirements from the company could be fulfilled within the 20 week time frame of the thesis. The main reason the time plan could be sustained is that the time estimate for each phase took into account that problems will occur. This extra time made it possible to solve the problems properly and avoid quick fixes that would probably have caused bigger problems later on.


6.1 Limitations

The framework has one major drawback. For each function the user of the framework wants to test, the user has to specify how many bytes should be recorded. Even though it is possible to use C code, such as variables and the sizeof() function, to specify the number of bytes, and the user only has to specify it once per function, this is still a drawback. If this was not required, the entire test procedure could be automated except for an initial set up.

6.2 Future work

Although the goals of the thesis were reached and fulfilled, the framework can be improved in several different ways.

Currently the framework needs to be tested more thoroughly. It has only been tested with a Windows 7 machine as the local computer, but it should also be tested on more platforms, such as Linux, Mac OS X and other versions of Windows. It is important that the framework is stable and robust. If the test framework crashes all the time on a platform, the tester will not trust the results from the framework. The question is how to test the test framework: by implementing another test framework, or should this work be done manually?

It would also be great if it was possible to use the framework on other components and other software than SILK. This was kept in mind during the design and the implementation to make it possible in the future. The framework currently expects the component or software under test to be implemented in C, but with some minor changes it would be possible to add support for more programming languages.

The framework could also be improved by extending it to be executed as soon as new source code has been committed. For each step that is automated, less work is required by the tester. The problem described in section 6.1 is another thing that should be automated in the future. It would make the framework more stand-alone.


Chapter 7

Acknowledgements

I would like to thank everybody in the Audio team at Skype, with a special thanks to Jon Bergenheim and my external supervisor Yao Yi. It has been very inspiring and worthwhile to work with all of you. I would also like to thank my internal supervisor at the university, Mikael Rannar, and my family and friends for all the support.


References

[1] A. Spillner, T. Linz, H. Schaefer. Software Testing Foundations: A Study Guide for the Certified Tester Exam. Rocky Nook, 2011.

[2] R. Black. Pragmatic Software Testing: Becoming an Effective and Efficient Test Professional. John Wiley & Sons, 2007.

[3] C. Jones, J. Subramanyam. The Economics of Software Quality. Addison-Wesley Professional, 2011.

[4] C. Kaner, J. Bach, B. Pettichord. Lessons Learned in Software Testing: A Context-Driven Approach. John Wiley & Sons, 2001.

[5] E. Dustin, T. Garrett, B. Gauf. Implementing Automated Software Testing: How to Save Time and Lower Costs While Raising Quality. Addison-Wesley Professional, 2009.

[6] G. Bath, J. McKay. The Software Test Engineer's Handbook. Rocky Nook, 2008.

[7] P. Hamill. Unit Test Frameworks. O'Reilly Media Inc., 2004.

[8] C. Johansen. Test-Driven JavaScript Development. Addison-Wesley Professional, 2010.

[9] A. P. Mathur. Foundations of Software Testing: Fundamental Algorithms and Techniques. Pearson Education India, 2007.

[10] T. J. McCabe. A complexity measure. IEEE Transactions on Software Engineering, 2:308–320, 1976.

[11] G. Meszaros. xUnit Test Patterns: Refactoring Test Code. Addison-Wesley Professional, 2007.

[12] R. Patton. Software Testing, Second Edition. Sams, 2005.

[13] W. E. Perry. Effective Methods for Software Testing. John Wiley & Sons, 2006.


[14] S. Desikan, G. Ramesh. Software Testing: Principles and Practices. Pearson Education India, 2006.

[15] Skype. About Skype. http://about.skype.com/ (visited 2011-06-20).

[16] Skype. SILK speech codec. http://developer.skype.com/resources/draft-vos-silk-01.txt (visited 2011-06-20).

[17] Skype. SILK: super wideband audio codec. http://developer.skype.com/silk (visited 2011-06-20).