Unit Testing Tool Competition Round Four

Urko Rueda, René Just, Juan P. Galeotti, Tanja E. J. Vos

The 9th International Workshop on Search-Based Software Testing


Contents

1. About the Tool Competition
2. The Tools
3. The Methodology
4. The Results
5. Lessons learned

4th Java unit testing competition

About the Tool Competition

Unit Testing Tool Competition: benchmarked Java unit testing at the class level

Columns: round (venue); FITTEST (crest.cs.ucl.ac.uk/fittest); coverage metrics; mutation metrics; CUTs / Projects / Tools; tools (SBST & non-SBST)

§ 2012 (ICST'13): FITTEST ✓; coverage: Cobertura; mutation: Javalanche; 77 / 5 / 2; Manual & Randoop (baselines)
§ 2013, Round Two (FITTEST'13): coverage: JaCoCo; mutation: PITest; 63 / 9 / 4; 1st + T3 & EvoSuite
§ 2014, Round Three (SBST'15): FITTEST ✗; 63 / 9 / 8; 2nd + Commercial & GRT & jTexPert & Mosa (EvoSuite)
§ 2015, Round Four (SBST'16): FITTEST ✗; Defects4J (github.com/rjust/defects4j) + real fault finding metric; 68 / 5 / 4; Randoop (baseline) & T3 & EvoSuite & jTexPert

About the Tool Competition

§ Why?
  § Towards testing field maturity: this is just Java…
  § Tools improvements, future developments insight
§ What is new in the 4th edition?
  § Benchmark infrastructure, split into
    § Test generation
    § Test execution & test assessment (Defects4J)
  § Benchmark subjects (from the Defects4J dataset)
  § Time budgets (1, 2, 4 & 8 minutes)
  § Flaky tests (non-compilable, non-reliable pass)


The Tools

Tool               | Technique              | Static analysis | Edition 2012 | 2013 | 2014 | 2015
Randoop (baseline) | Random                 | ✗               | ✓            | ✓    | ✓    | ✓
T3                 | Random                 | ✗               | ✗            | ✓    | ✓    | ✓
jTexPert           | Random (guided)        | ✓               | ✗            | ✗    | ✓    | ✓
EvoSuite           | Evolutionary algorithm | ✓               | ✗            | ✓    | ✓    | ✓

§ SBST and non-SBST tools
§ Command line tools
§ Fully automated: no human intervention


The Methodology

§ Tool deployment
  § Installation: Linux environment
  § Wrapper implementation: runtool script
    § Std. IN/OUT communication protocol
    § 4th edition has a time budget
  § Tune-up cycle: setup, run, resolve issues
§ Benchmark infrastructure
  § Defects4J integration
  § Decoupling test generation from test execution/assessment
§ Tool: run over non-contest benchmark samples


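The Methodology: runtool protocol

Message sequence between the benchmark framework and the runtool wrapper for Tool T:

Preparation
  1. Framework sends "BENCHMARK"
  2. Framework sends Src Path / Bin Path / ClassPath
  3. Framework sends ClassPath for JUnit Compilation
  4. runtool answers "READY"

Loop, subject to the time budget, repeated for each CUT
  1. Framework sends the name of the CUT
  2. runtool generates file in ./temp/testcases
  3. runtool answers "READY"
  4. Framework compiles, executes and measures the test case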
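The exchange on this slide can be sketched as a small wrapper loop. This is a minimal sketch, not the contest's actual runtool: the one-message-per-line framing, the names `run_protocol` and `generate_tests`, and stopping at end of input are assumptions made for illustration. Only the "BENCHMARK" and "READY" tokens, the three paths, the JUnit class path and the ./temp/testcases directory come from the slide.

```python
import io


def run_protocol(inp, out, generate_tests):
    """Speak a line-based version of the runtool protocol.

    Assumption (not confirmed by the slides): every message is one line.
    """
    # Preparation phase: the framework announces the benchmark and its paths.
    if inp.readline().strip() != "BENCHMARK":
        raise ValueError("expected BENCHMARK")
    src_path = inp.readline().strip()
    bin_path = inp.readline().strip()
    class_path = inp.readline().strip()
    junit_class_path = inp.readline().strip()
    out.write("READY\n")

    # Loop phase: one CUT name per line until the framework closes the stream.
    handled = []
    for line in inp:
        cut = line.strip()
        if not cut:
            continue
        # A real tool would write JUnit files under ./temp/testcases here.
        generate_tests(cut, src_path, bin_path, class_path, junit_class_path)
        handled.append(cut)
        out.write("READY\n")
    return handled


if __name__ == "__main__":
    # Simulated framework side, using in-memory streams instead of real pipes.
    framework = io.StringIO(
        "BENCHMARK\nsrc\nbin\nlib/dep.jar\nlib/junit.jar\n"
        "org.example.Foo\norg.example.Bar\n"
    )
    replies = io.StringIO()
    print(run_protocol(framework, replies, lambda *args: None))
```

Passing the streams in as parameters keeps the sketch testable; a real wrapper would hand it `sys.stdin` and `sys.stdout` and flush after every "READY".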


The Methodology

§ Benchmark infrastructure
  § Two HP Z820 workstations, each:
    § 2 CPU sockets for a total of 20 cores
    § 256 GB RAM
  § 32 virtual machines (16 per workstation)
    § Test generation
      § 1 core: control tool multi-threading capability
      § 8 GB RAM
    § Test execution/assessment (tool independent)
      § 2 cores
      § 16 GB RAM: resolves out of memory issues


The Methodology

[Figure: the benchmark tool replicated across 32 VMs. Each of the two HP Z820 workstations (20-core CPU, 256 GB RAM) hosts 16 VMs, shown in two sizes: 1-core CPU / 8 GB RAM and 2-core CPU / 16 GB RAM. The runtool wrappers for T3, jTexpert, EvoSuite and Randoop are run over 80 CUTs with time budgets of 1, 2, 4 and 8 minutes. One workstation covers RUNs 1, 2, 3 and the other RUNs 4, 5, 6; each generates test cases and collects metrics, and an aggregator calculates the score.]

The Methodology

[Figure: per-CUT workflow. For each time budget (1, 2, 4, 8 min), the benchmark tool calls runtool for Randoop, T3, EvoSuite and jTexpert to generate test classes from the CUT (fixed). A compilable check (Y/N) follows; compilable test classes are run to detect and remove flaky tests. The remaining test classes (no flaky tests) are run to collect metrics, with CUT (fixed), CUT (1 real fault) and CUT (mutated) as inputs, and the score is calculated.]

The Methodology

§ Flaky tests
  § Pass during generation
  § But might fail during execution/assessment
  § False-positive warnings
    § Non-reliable fault detection
    § Non-reliable mutation analysis
§ Defects4J flaky tests sanity
  § Non-compiling test classes
  § Failing tests over 5 executions (fixed CUT versions)
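The sanity step for flaky tests can be illustrated with a short filter. This is a sketch under stated assumptions, not the Defects4J implementation: it reads "failing tests over 5 executions" as "a test is dropped if it fails in any of 5 runs against the fixed CUT version", and `split_flaky` and `run_test` are hypothetical names.

```python
def split_flaky(tests, run_test, executions=5):
    """Partition tests into (stable, flaky).

    run_test(test) must return True when the test passes on the fixed CUT.
    Assumption: a single failure within `executions` runs marks a test flaky.
    """
    stable, flaky = [], []
    for test in tests:
        # all() stops at the first failing execution.
        passed_every_time = all(run_test(test) for _ in range(executions))
        (stable if passed_every_time else flaky).append(test)
    return stable, flaky


if __name__ == "__main__":
    # Scripted outcomes stand in for real JUnit executions.
    outcomes = {
        "testAdd": iter([True] * 5),
        "testClock": iter([True, True, False, True, True]),
    }
    print(split_flaky(list(outcomes), lambda name: next(outcomes[name])))
```

Only the stable tests would then feed coverage, mutation and fault-detection measurements, which is what keeps those measurements reliable.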


The Methodology

§ The Metrics: test effectiveness
  § Code coverage (fixed benchmark versions)
    § Defects4J <- Cobertura
      § Statement coverage
      § Condition coverage
  § Mutation score
    § Defects4J <- Major framework (all mutation operators)
  § Real fault detection (buggy benchmark versions)
    § 1 real fault per benchmark
    § 0 or 1 score, independent of how many tests reveal it
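Two of these metrics reduce to simple arithmetic, sketched below. The function names are hypothetical, and two conventions are assumptions rather than statements from the slides: a test "reveals" the fault when it fails on the buggy version, and the mutation score is 0.0 when no mutants exist.

```python
def mutation_score(killed, total):
    """Fraction of mutants killed by the test suite."""
    # Assumed convention: no mutants means a score of 0.0.
    return killed / total if total else 0.0


def real_fault_score(failing_tests_on_buggy_version):
    """Binary score: 1 if any test reveals the real fault, else 0."""
    # The count is deliberately collapsed: 1 revealing test scores like 20.
    return 1 if failing_tests_on_buggy_version > 0 else 0


if __name__ == "__main__":
    print(mutation_score(3, 4), real_fault_score(7))
```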


The Methodology

§ The Scoring formula

covScore(T,L,C,r) := w_i · cov_i + w_b · cov_b + w_m · cov_m + (real fault found ? w_f : 0)

T = Tool; L = Time budget; C = CUT; r = RUN (1..6)
Coverages: cov_i = statement; cov_b = condition; cov_m = mutants kill ratio
Weights: w_i = 1; w_b = 2; w_m = 4; w_f = 4
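As a worked instance of the formula, the sketch below evaluates covScore for one run. The weights are the ones on the slide; expressing the coverages as ratios in [0, 1] and the name `cov_score` are assumptions for illustration.

```python
# Weights from the slide: statement, condition, mutation, real fault.
W_STATEMENT, W_CONDITION, W_MUTATION, W_FAULT = 1, 2, 4, 4


def cov_score(cov_i, cov_b, cov_m, fault_found):
    """covScore for one (tool, budget, CUT, run) combination.

    Assumption: cov_i, cov_b and cov_m are ratios between 0 and 1.
    """
    return (
        W_STATEMENT * cov_i
        + W_CONDITION * cov_b
        + W_MUTATION * cov_m
        + (W_FAULT if fault_found else 0)
    )


if __name__ == "__main__":
    # 50% statement, 50% condition, 25% mutants killed, fault not found.
    print(cov_score(0.5, 0.5, 0.25, False))
```

Under the ratio assumption the maximum is 1 + 2 + 4 + 4 = 11, so finding the real fault is worth as much as killing every mutant.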


The Methodology

§ The Scoring formula: time penalty
  § Test generation slot: L .. 2·L
  § No penalty if genTime <= L
  § Penalty for extra time taken (genTime - L)
  § Half covScore if the Tool must be killed (> 2·L)
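The time penalty rules can be sketched as follows. The slide does not state the exact penalty for the interval between L and 2·L, so the proportional factor L / genTime used here is an assumption, chosen because it equals one half at 2·L and so lines up with the half score for killed tools. The name `t_score` and its parameters are hypothetical.

```python
def t_score(cov_score, gen_time, budget, killed=False):
    """Apply the time penalty to a covScore.

    gen_time and budget share one unit (e.g. seconds).
    Assumption: between L and 2*L the score scales by budget / gen_time.
    """
    if killed or gen_time > 2 * budget:
        # The tool overran the generation slot and had to be killed.
        return cov_score / 2
    if gen_time <= budget:
        # Finished within the budget: no penalty.
        return cov_score
    return cov_score * budget / gen_time


if __name__ == "__main__":
    # 60 s budget, 90 s taken: the score is scaled by 60/90.
    print(t_score(9.0, gen_time=90, budget=60))
```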


The Methodology

§ The Scoring formula: tests penalty

#Classes = generated test classes; #uClasses = uncompilable test classes
#Tests = test cases; #fTests = flaky test cases


The Methodology

§ The Scoring formula: Tool score

Score(T,L,C,r) := tScore(T,L,C,r) - penalty(T,L,C,r)

Score(T,L,C) := avg(Score(T,L,C,r)) for all r executions
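The two definitions combine as in the sketch below: each run's score is its time-adjusted score minus its tests penalty, and the score for a tool, budget and CUT is the average over the runs. The names `run_score` and `tool_score` and the sample values are illustrative assumptions, not contest data.

```python
from statistics import mean


def run_score(t_score, penalty):
    """Score(T,L,C,r): time-adjusted score minus the tests penalty."""
    return t_score - penalty


def tool_score(runs):
    """Score(T,L,C): average of the per-run scores.

    `runs` is a list of (t_score, penalty) pairs, one pair per run r.
    """
    return mean([run_score(t, p) for t, p in runs])


if __name__ == "__main__":
    # Two made-up runs for one tool, budget and CUT.
    print(tool_score([(6.0, 1.0), (4.0, 0.0)]))
```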


The Methodology

§ Conclusion validity
  § Reliability of treatment implementation
    § Tool deployment instructions EQUAL for all participants
  § Reliability of measures
    § Efficiency: wall clock time by Java System.currentTimeMillis()
    § Effectiveness: Defects4J
    § Tools' non-deterministic nature: 6 runs (HW capacity)


The Methodology

§ Internal validity
  § CUTs from Defects4J (uniform and arbitrary selection from 5 open source projects)
  § Tools and benchmark infrastructure
    § Tune-up samples
    § Contest benchmarks
  § Wrappers runtool: implemented by Tools side
§ Construct validity
  § Scoring formula weights: quality indicators value
  § Empirical studies: correlation of proxy metrics for test effectiveness and fault finding capability


The Results

Contest run for ~1 week

Test generation, execution and assessment

x32 VMs

A single virtual machine would use 8 CPU months!
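The order of magnitude of that figure can be checked with a quick calculation. This is a rough sketch: it takes "~1 week" as exactly 7 days and assumes a 30-day month, so it lands slightly below the 8 months quoted on the slide.

```python
VMS = 32             # virtual machines used in the contest
CONTEST_DAYS = 7     # "~1 week" on the slide, taken as exactly 7 days
DAYS_PER_MONTH = 30  # assumed month length

vm_days = VMS * CONTEST_DAYS
vm_months = vm_days / DAYS_PER_MONTH
print(f"{vm_days} VM-days, about {vm_months:.1f} months on one VM")
```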


Lessons learned

§ Testing Tools improvements
  § Automation, test effectiveness, comparability
§ Benchmarking infrastructure improvements
  § Decoupling test generation from execution/assessment
  § Flaky tests identification and sanity
  § Fault finding capability measurement
  § Test effectiveness due to test generation time
§ What next?
  § Automated parallelization of the benchmark contest
  § More Tools, new languages? (e.g. C#?)


Contact us

Universidad Politécnica de Valencia, [email protected], [email protected]

Open Universiteit Heerlen, [email protected]

University of Massachusetts Amherst, MA, [email protected]

University of Buenos Aires, [email protected]

web: http://sbstcontest.dsic.upv.es/
