87
Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments Shane McIntosh @shane_mcintosh [email protected]

Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Embed Size (px)

DESCRIPTION

Build systems specify how sources are transformed into deliverables, and hence must be carefully maintained to ensure that deliverables are assembled correctly. Similar to source code, build systems tend to grow in complexity unless specifications are refactored. This paper describes how clone detection can aid in quality assessments that determine if and where build refactoring effort should be applied. We gauge cloning rates in build systems by collecting and analyzing a benchmark comprising 3,872 build systems. Analysis of the benchmark reveals that: (1) build systems tend to have higher cloning rates than other software artifacts, (2) recent build technologies tend to be more prone to cloning, especially of configuration details like API dependencies, than older technologies, and (3) build systems that have fewer clones achieve higher levels of reuse via mechanisms not offered by build technologies. Our findings aided in refactoring a large industrial build system containing 1.1 million lines.

Citation preview

Page 1: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Collecting and Leveraging a Benchmark of Build System Clones

to Aid in Quality Assessments

Shane McIntosh

@[email protected]

Page 2: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Collecting and Leveraging a Benchmark of Build System Clones

to Aid in Quality Assessments

Shane McIntosh

@[email protected]

Martin Poehlmann

Elmar Juergens

Page 3: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Collecting and Leveraging a Benchmark of Build System Clones

to Aid in Quality Assessments

Shane McIntosh

@[email protected]

Martin Poehlmann

Elmar Juergens

Audris Mockus

Page 4: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Collecting and Leveraging a Benchmark of Build System Clones

to Aid in Quality Assessments

Shane McIntosh

@[email protected]

Bram Adams

Ahmed E. Hassan

Martin Poehlmann

Elmar Juergens

Audris Mockus

Page 5: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Collecting and Leveraging a Benchmark of Build System Clones

to Aid in Quality Assessments

Shane McIntosh

@[email protected]

Bram Adams

Ahmed E. Hassan

Martin Poehlmann

Elmar Juergens

Audris Mockus

Brigitte Haupt

Christian Wagner

Page 6: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

What is a build system?

Source code

Page 7: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

What is a build system?

Source code

Deliverable

Page 8: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

.tex

.c

.cc

.o

.o

.dvi

.a

.exe

.pdf

.deb

Build systems describe how sources are translated into deliverables

3

Page 9: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

4

Step 1 - Configuration

Step 2 - Construction

Step 3 - Certification

Step 4 - Packaging

Step 5 - Deployment

Page 10: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

5

Step 1 - Configuration

Page 11: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

6

Step 2 - Construction

Page 12: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

7

Step 3 - Certification

Page 13: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

8

Step 4 - Packaging

Page 14: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

9

Step 5 - Deployment

Page 15: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Continuous Integration:!Enabled by the

build system

10

Page 16: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Continuous Integration:!Enabled by the

build system

10

Commit

Commit 9719cf0

Page 17: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Continuous Integration:!Enabled by the

build system

10

Commit

BuildCommit 9719cf0

Page 18: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Continuous Integration:!Enabled by the

build system

10

Commit

Build

Test

Commit 9719cf0

Page 19: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Continuous Integration:!Enabled by the

build system

10

Commit

Build

Test

ReportCommit 9719cf0 broke the build!

Commit 9719cf0

Page 20: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Continuous Integration:!Enabled by the

build system

10

Commit

Build

Test

ReportCommit 9719cf0 broke the build!

Commit 9719cf0

Page 21: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

“...nothing can be said to be certain, except death and taxes” - Benjamin Franklin

The Build “Tax”

An Empirical Study of Build Maintenance Effort!

S. McIntosh, B. Adams, T. H. D. Nguyen, Y. Kamei, A. E. Hassan

[ICSE 2011]

Up to 27% of source changes require build

changes, too!

11

Page 22: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

“...nothing can be said to be certain, except death and taxes” - Benjamin Franklin

The Build “Tax”

An Empirical Study of Build Maintenance Effort!

S. McIntosh, B. Adams, T. H. D. Nguyen, Y. Kamei, A. E. Hassan

[ICSE 2011]

Up to 27% of source changes require build

changes, too!

How do practitioners cope with build maintenance?

11

Page 23: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Cloning!

Useful build logic

12

Useful build logic

Page 24: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Cloning!

Useful build logic

12

Copy&

paste Useful build logic

Page 25: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

13

Page 26: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Excessive cloning makes build maintenance painful

!

!

!

!

!

13

Page 27: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Excessive cloning makes build maintenance painful

!

!

!

!

!

13

Page 28: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Excessive cloning makes build maintenance painful

!

!

!

!

!

30 custom business

applications

13

Page 29: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Excessive cloning makes build maintenance painful

!

!

!

!

!

30 custom business

applications

One monolithic build system

13

Page 30: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Excessive cloning makes build maintenance painful

!

!

!

!

!

30 custom business

applications

One monolithic build system

1.1 million lines

of build logic

13

Page 31: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Excessive cloning makes build maintenance painful

!

!

!

!

!

30 custom business

applications

One monolithic build system

1.1 million lines

of build logic

Clone coverage of 94%-99%

13

Page 32: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Excessive cloning makes build maintenance painful

!

!

!

!

!

30 custom business

applications

One monolithic build system

1.1 million lines

of build logic

Clone coverage of 94%-99%

Inflation due to cloning of

10x-23x

13

Page 33: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Excessive cloning makes build maintenance painful

!

!

!

!

!

30 custom business

applications

One monolithic build system

1.1 million lines

of build logic

Clone coverage of 94%-99%

Inflation due to cloning of

10x-23x

Build changes manually duplicated 30 times13

Page 34: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

14

Page 35: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

14

How much cloning is typical?

Page 36: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

14

How much cloning is typical?

Measured

Page 37: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

14

How much cloning is typical?

Measured Typical

Page 38: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

14

How much cloning is typical?

What type of logic is being

cloned?

Configuration details

Packaging specifications

Measured Typical

Page 39: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

14

How much cloning is typical?

What type of logic is being

cloned?

Configuration details

Packaging specifications

What can be done to mitigate

cloning?

Measured Typical

Page 40: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

14

How much cloning is typical?

What type of logic is being

cloned?

Configuration details

Packaging specifications

What can be done to mitigate

cloning?

Measured Typical

Page 41: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

1515

Page 42: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

15

Large collection of open source

15

Page 43: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

15

Large collection of open source3,872 projects

15

Page 44: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

15

Large collection of open source3,872 projects

1,275 Java projects

Ant Maven

15

Page 45: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

15

Large collection of open source3,872 projects

2,597 C/C++ projects

Autotools CMake

1,275 Java projects

Ant Maven

15

Page 46: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

16

How much cloning is typical?

What type of logic is being

cloned?

Configuration details

Packaging specifications

What can be done to mitigate

cloning?

Measured Typical

Page 47: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Apache Github GNU Sourceforge

●●

●●

● ●

●●

●●

●●

●●

●●

●●●●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

0.00

0.25

0.50

0.75

1.00

Ant Maven Ant Maven Ant Ant Maven

Clo

ne C

over

age

MinimumLength

5101520

17

Lots of small clones in Java build systems

Page 48: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Apache Github GNU Sourceforge

●●

●●

● ●

●●

●●

●●

●●

●●

●●●●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

0.00

0.25

0.50

0.75

1.00

Ant Maven Ant Maven Ant Ant Maven

Clo

ne C

over

age

MinimumLength

5101520

17

Build logic clones tend to be small

Lots of small clones in Java build systems

Page 49: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Apache Github GNU Sourceforge

●●

●●

● ●

●●

●●

●●

●●

●●

●●●●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

0.00

0.25

0.50

0.75

1.00

Ant Maven Ant Maven Ant Ant Maven

Clo

ne C

over

age

MinimumLength

5101520

17

Half of Maven systems have clone coverage of at least 47%-50%

Build logic clones tend to be small

Lots of small clones in Java build systems

Page 50: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Apache Github GNU Sourceforge

●●

●●

● ●

●●

●●

●●

●●

●●

●●●●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

0.00

0.25

0.50

0.75

1.00

Ant Maven Ant Maven Ant Ant Maven

Clo

ne C

over

age

MinimumLength

5101520

17

Half of Maven systems have clone coverage of at least 47%-50%

Build logic clones tend to be small1/4 of Ant systems have

clone coverage of >50%

Lots of small clones in Java build systems

Page 51: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

18

Apache Github GNU Sourceforge

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

0.0

0.2

0.4

0.6

Autotools CMake Autotools CMake Autotools CMake Autotools CMake

Clo

ne C

over

age

MinimumLength

5101520

Less cloning in C/C++ build systems

Page 52: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

18

Apache Github GNU Sourceforge

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

0.0

0.2

0.4

0.6

Autotools CMake Autotools CMake Autotools CMake Autotools CMake

Clo

ne C

over

age

MinimumLength

5101520

Less cloning in C/C++ build systems

Page 53: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

18

Apache Github GNU Sourceforge

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

0.0

0.2

0.4

0.6

Autotools CMake Autotools CMake Autotools CMake Autotools CMake

Clo

ne C

over

age

MinimumLength

5101520

Less cloning in C/C++ build systems

Clone coverage rarely exceeds 30% in C/C++ build systems

Page 54: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

18

Apache Github GNU Sourceforge

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

0.0

0.2

0.4

0.6

Autotools CMake Autotools CMake Autotools CMake Autotools CMake

Clo

ne C

over

age

MinimumLength

5101520

Less cloning in C/C++ build systems

Clone coverage rarely exceeds 30% in C/C++ build systems

Yet there are still some C/C++ build systems with plenty of clones!27%-66% 21%-48%

Page 55: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

19

How much cloning is typical?

What type of logic is being

cloned?

Configuration details

Packaging specifications

What can be done to mitigate

cloning?

Measured Typical

Page 56: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

19

How much cloning is typical?

What type of logic is being

cloned?

Configuration details

Packaging specifications

What can be done to mitigate

cloning?

Measured Typical>50% clone coverage

is common

<30% clone coverage

is common

Page 57: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

20

How much cloning is typical?

What type of logic is being

cloned?

Configuration details

Packaging specifications

What can be done to mitigate

cloning?

Measured Typical>50% clone coverage

is common

<30% clone coverage

is common

Page 58: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

21

Page 59: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Manual analysis of a statistically representative sample of clones

21

Page 60: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Manual analysis of a statistically representative sample of clones

Autotools CMakeAnt Maven Total

21

Page 61: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Manual analysis of a statistically representative sample of clones

Autotools CMakeAnt Maven Total56,521 71,543 23,723 3,746All clones 155,533

21

Page 62: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Manual analysis of a statistically representative sample of clones

Autotools CMakeAnt Maven Total56,521 71,543 23,723 3,746All clones 155,533

Sample!(95%±5%) 382 382 378 349 1,491

21

Page 63: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Manual analysis of a statistically representative sample of clones

Autotools CMakeAnt Maven Total56,521 71,543 23,723 3,746All clones 155,533

Sample!(95%±5%) 382 382 378 349 1,491

Config.Const.Cert.Pkg.Depl.

32% 79% 22% 40%64% 17% 56% 66%12% 4% 13% 11%25% 21% 21% 2%11% 1% 9% 7%

21

Page 64: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Manual analysis of a statistically representative sample of clones

Autotools CMakeAnt Maven Total56,521 71,543 23,723 3,746All clones 155,533

Sample!(95%±5%) 382 382 378 349 1,491

Config.Const.Cert.Pkg.Depl.

32% 79% 22% 40%64% 17% 56% 66%12% 4% 13% 11%25% 21% 21% 2%11% 1% 9% 7%

Cloning shifts from construction to configuration

21

Page 65: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Manual analysis of a statistically representative sample of clones

Autotools CMakeAnt Maven Total56,521 71,543 23,723 3,746All clones 155,533

Sample!(95%±5%) 382 382 378 349 1,491

Config.Const.Cert.Pkg.Depl.

32% 79% 22% 40%64% 17% 56% 66%12% 4% 13% 11%25% 21% 21% 2%11% 1% 9% 7%

Cloning shifts from construction to configuration

Construction is the most heavily cloned C/C++ build phase

21

Page 66: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Manual analysis of a statistically representative sample of clones

Autotools CMakeAnt Maven Total56,521 71,543 23,723 3,746All clones 155,533

Sample!(95%±5%) 382 382 378 349 1,491

Config.Const.Cert.Pkg.Depl.

32% 79% 22% 40%64% 17% 56% 66%12% 4% 13% 11%25% 21% 21% 2%11% 1% 9% 7%

Cloning shifts from construction to configuration

Construction is the most heavily cloned C/C++ build phase

Rarely cloned due to CPack

21

Page 67: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

22

How much cloning is typical?

What type of logic is being

cloned?

Configuration details

Packaging specifications

What can be done to mitigate

cloning?

Measured Typical>50% clone coverage

is common

<30% clone coverage

is common

Page 68: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

22

How much cloning is typical?

What type of logic is being

cloned?

Configuration details

Packaging specifications

What can be done to mitigate

cloning?

Measured Typical Cloning shifts from const. to

config.

Mostly construction

clones

>50% clone coverage

is common

<30% clone coverage

is common

Page 69: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

23

How much cloning is typical?

What type of logic is being

cloned?

Configuration details

Packaging specifications

What can be done to mitigate

cloning?

Measured Typical Cloning shifts from const. to

config.

Mostly construction

clones

>50% clone coverage

is common

<30% clone coverage

is common

Page 70: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

24

Teams often migrate from one technology to another

Autotools CMakeAnt Maven

Page 71: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

24

Teams often migrate from one technology to another

Autotools CMakeAnt MavenCould technology migration help to reduce cloning?

Page 72: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Migration is not a silver bullet

0.390.47

0.52

0.700.75

0.82

0.00 0.00 0.04

0.220.30

0.47

0.15

0.25

0.36

0.680.77

0.84

0.00 0.00 0.00

0.180.26

0.39

0.00

0.25

0.50

0.75

1.00

0.00 0.25 0.50 0.75 1.00Proportion of Systems

Clo

ne C

over

age

AbnormalityVery highHighModerately highNormalModerately lowLowVery low

Technology● Ant

AutotoolsCMakeMaven

25

Page 73: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Migration is not a silver bullet

0.390.47

0.52

0.700.75

0.82

0.00 0.00 0.04

0.220.30

0.47

0.15

0.25

0.36

0.680.77

0.84

0.00 0.00 0.00

0.180.26

0.39

0.00

0.25

0.50

0.75

1.00

0.00 0.25 0.50 0.75 1.00Proportion of Systems

Clo

ne C

over

age

AbnormalityVery highHighModerately highNormalModerately lowLowVery low

Technology● Ant

AutotoolsCMakeMaven

25

More cloning in

Maven than Ant

Page 74: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Migration is not a silver bullet

0.390.47

0.52

0.700.75

0.82

0.00 0.00 0.04

0.220.30

0.47

0.15

0.25

0.36

0.680.77

0.84

0.00 0.00 0.00

0.180.26

0.39

0.00

0.25

0.50

0.75

1.00

0.00 0.25 0.50 0.75 1.00Proportion of Systems

Clo

ne C

over

age

AbnormalityVery highHighModerately highNormalModerately lowLowVery low

Technology● Ant

AutotoolsCMakeMaven

25

More cloning in

Maven than Ant

High thresholds!are very similar

Page 75: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

Migration is not a silver bullet

0.390.47

0.52

0.700.75

0.82

0.00 0.00 0.04

0.220.30

0.47

0.15

0.25

0.36

0.680.77

0.84

0.00 0.00 0.00

0.180.26

0.39

0.00

0.25

0.50

0.75

1.00

0.00 0.25 0.50 0.75 1.00Proportion of Systems

Clo

ne C

over

age

AbnormalityVery highHighModerately highNormalModerately lowLowVery low

Technology● Ant

AutotoolsCMakeMaven

25

More cloning in

Maven than Ant

High thresholds!are very similar

Q: How are they avoiding build cloning?

Page 76: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

26

Using abstraction mechanisms not!provided by the build technologies

Java files as inputs using wildcards. Hence, the compile tar-get is generic, and often cloned in several Ant specifications.

While construction accounts for many of the clones in Antbuild systems (64%), most Maven clones impact configura-tion (79%). The next most frequently cloned phase (pack-aging) appears in less than a third as many clones (21%).

We observed that many of the Maven clones replicatethird-party dependency lists or plugin configuration amongsubsystems. While this ensures that each subsystem can bebuilt independently of the others, it imposes a heavy load onmaintainers, who will need to update several pom.xml filesin order to modify third-party dependency lists or updateplugin configurations.Construction is the most heavily cloned build phasein C/C++ build systems. Table 3 shows that thebuild subcategory represents 48% and 65% of Autotools andCMake clones respectively. The filesystem category is alsodetected in 19% of Autotools and 1% of CMake clones. Allin all, the construction phase accounts for 56% of Autotoolsclones and 66% of CMake clones.

While the Autotools construction phase is most frequentlycloned (56%), Table 3 shows that packaging details are alsocloned often (21%). Yet packaging details are rarely clonedin CMake (2%). Many of these Autotools packaging cloneshave to do with repetition of data file lists among subsys-tems. Since Autotools build specifications generate recursivemake build systems, variables are not shared among subsys-tems [23]. Although Autotools offers developers an include

directive, we have observed that rather than place sharedvariables in a header-like file, developers often clone vari-ables that have a shared scope. Similar to Maven, wheredependency lists and plugin configuration were replicated,developers likely duplicate shared variables to facilitate sub-system independence. However, this makes system-widechanges more difficult.

Conversely, we observe that CMake packaging details arerarely cloned. We observe that developers leverage built-in CPack functionality of CMake [20], where packaging de-tails are typically specified in a single location: CPackCon-

fig.cmake. This eliminates the need to replicate packagingdetails in subsystem specifications.There are more configuration clones in the morerecent build technologies. Table 3 shows that thereare more than twice as many configuration clones in Mavenbuild systems (79%) than Ant ones (32%). Similarly, thereare almost twice as many configuration clones in CMake(40%) than there are in Autotools (22%). In Section 5, wereport that these more recent technologies are more proneto cloning (RQ2). The shift of cloning tendencies towardsconfiguration likely contributes to the inflated cloning valueswe observe in the more recent technologies.

Configuration details are cloned more often in the morerecent CMake and Maven build technologies. For Javabuild systems, Maven clones favour the configurationphase, while Ant clones (and clones in C/C++ builds)favour construction. CMake packaging support (CPack)helps to reduce cloning in the packaging phase.

(RQ5) How do build systems with few clonesachieve low clone rates?

To address RQ5, we analyze clones in the systems with thelowest and highest cloning rates for each studied technology.

<!-- Define references to files containing common targets --><!DOCTYPE project [

<!ENTITY modules -common SYSTEM "../ modules -common.ent">]>

...<project name="bea" default="all">

<!-- Include the file containing common targets. -->&modules -common;

</project >

Listing 1: Using XML entity expansion to importcommon build code in the Keel system.

Much Java build cloning can be avoided by exploit-ing the underlying XML representation. We observethat entire files are duplicated in the Ant and Maven systemswith the highest clone coverage. In these cases, developmentteams duplicate existing build specifications to rapidly de-velop new subsystems. However, developers have referredto maintaining such build systems fraught with clones asa “nightmare” (cf. Section 3). Defect fixes or updates todependency lists, tool configuration, and packaging detailsmust be carefully replicated among the clones to ensure thatbuilds continue to assemble deliverables correctly.On the other hand, we have observed that in addition to

abstraction mechanisms provided by Ant and Maven (e.g.,the include and import tasks), XML-based build systemsavoid cloning by leveraging the underlying XML represen-tation. In prior work, we note that the JBoss build sys-tem leverages XML entity expansion in Ant to implement aframework-driven build system referred to as“buildmagic”[21].Indeed, Listing 1 shows how one can use XML entity expan-sion to avoid duplicating shared build code in subsystembuild specifications. We also find that the Ant test suite in-cludes regression tests to ensure that entity expansion con-tinues to work, suggesting that it is not a workaround, butinstead is intentionally supported functionality.Many C/C++ build logic clones can be avoided byduplicating templates automatically when building.Many of the studied C/C++ systems provide developmentAPIs. As such, they ship examples of how to use vari-ous API functionality with their deliverables. These exam-ples include accompanying build specifications. However,in the C/C++ systems with the highest clone coverage, wefind that many of these example build specifications are fileclones of each other, which poses maintainability problems.One of the studied systems with a low clone coverage

avoids these clones by duplicating and specializing templatebuild specification automatically using shell scripts duringthe construction phase. Using this approach, cloning sharedbuild code in example build specifications can be avoided.

XML entity expansion can be used to avoid cloning sharedbuild code in Java build systems. Cloning of build logicshipped with API usage examples can be avoided by au-tomatically deriving specifications at build-time.

7. THREATS TO VALIDITYWe now discuss the threats to the validity of our analysis.

Construct validity. Our clone detection tool is config-ured to only detect Type I (exact) clones. Since we do notdetect Type II, III, or IV clones, our cloning results shouldbe interpreted as lower bounds rather than exact values.Internal validity. We assume that large values of cloningmetrics suggest maintenance problems illustrated in our Mu-

Page 77: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

26

Using abstraction mechanisms not!provided by the build technologies

Java files as inputs using wildcards. Hence, the compile tar-get is generic, and often cloned in several Ant specifications.

While construction accounts for many of the clones in Antbuild systems (64%), most Maven clones impact configura-tion (79%). The next most frequently cloned phase (pack-aging) appears in less than a third as many clones (21%).

We observed that many of the Maven clones replicatethird-party dependency lists or plugin configuration amongsubsystems. While this ensures that each subsystem can bebuilt independently of the others, it imposes a heavy load onmaintainers, who will need to update several pom.xml filesin order to modify third-party dependency lists or updateplugin configurations.Construction is the most heavily cloned build phasein C/C++ build systems. Table 3 shows that thebuild subcategory represents 48% and 65% of Autotools andCMake clones respectively. The filesystem category is alsodetected in 19% of Autotools and 1% of CMake clones. Allin all, the construction phase accounts for 56% of Autotoolsclones and 66% of CMake clones.

While the Autotools construction phase is most frequentlycloned (56%), Table 3 shows that packaging details are alsocloned often (21%). Yet packaging details are rarely clonedin CMake (2%). Many of these Autotools packaging cloneshave to do with repetition of data file lists among subsys-tems. Since Autotools build specifications generate recursivemake build systems, variables are not shared among subsys-tems [23]. Although Autotools offers developers an include

directive, we have observed that rather than place sharedvariables in a header-like file, developers often clone vari-ables that have a shared scope. Similar to Maven, wheredependency lists and plugin configuration were replicated,developers likely duplicate shared variables to facilitate sub-system independence. However, this makes system-widechanges more difficult.

Conversely, we observe that CMake packaging details arerarely cloned. We observe that developers leverage built-in CPack functionality of CMake [20], where packaging de-tails are typically specified in a single location: CPackCon-

fig.cmake. This eliminates the need to replicate packagingdetails in subsystem specifications.There are more configuration clones in the morerecent build technologies. Table 3 shows that thereare more than twice as many configuration clones in Mavenbuild systems (79%) than Ant ones (32%). Similarly, thereare almost twice as many configuration clones in CMake(40%) than there are in Autotools (22%). In Section 5, wereport that these more recent technologies are more proneto cloning (RQ2). The shift of cloning tendencies towardsconfiguration likely contributes to the inflated cloning valueswe observe in the more recent technologies.

Configuration details are cloned more often in the morerecent CMake and Maven build technologies. For Javabuild systems, Maven clones favour the configurationphase, while Ant clones (and clones in C/C++ builds)favour construction. CMake packaging support (CPack)helps to reduce cloning in the packaging phase.

(RQ5) How do build systems with few clonesachieve low clone rates?

To address RQ5, we analyze clones in the systems with thelowest and highest cloning rates for each studied technology.

<!-- Define references to files containing common targets --><!DOCTYPE project [

<!ENTITY modules -common SYSTEM "../ modules -common.ent">]>

...<project name="bea" default="all">

<!-- Include the file containing common targets. -->&modules -common;

</project >

Listing 1: Using XML entity expansion to importcommon build code in the Keel system.

Much Java build cloning can be avoided by exploit-ing the underlying XML representation. We observethat entire files are duplicated in the Ant and Maven systemswith the highest clone coverage. In these cases, developmentteams duplicate existing build specifications to rapidly de-velop new subsystems. However, developers have referredto maintaining such build systems fraught with clones asa “nightmare” (cf. Section 3). Defect fixes or updates todependency lists, tool configuration, and packaging detailsmust be carefully replicated among the clones to ensure thatbuilds continue to assemble deliverables correctly.On the other hand, we have observed that in addition to

abstraction mechanisms provided by Ant and Maven (e.g.,the include and import tasks), XML-based build systemsavoid cloning by leveraging the underlying XML represen-tation. In prior work, we note that the JBoss build sys-tem leverages XML entity expansion in Ant to implement aframework-driven build system referred to as“buildmagic”[21].Indeed, Listing 1 shows how one can use XML entity expan-sion to avoid duplicating shared build code in subsystembuild specifications. We also find that the Ant test suite in-cludes regression tests to ensure that entity expansion con-tinues to work, suggesting that it is not a workaround, butinstead is intentionally supported functionality.Many C/C++ build logic clones can be avoided byduplicating templates automatically when building.Many of the studied C/C++ systems provide developmentAPIs. As such, they ship examples of how to use vari-ous API functionality with their deliverables. These exam-ples include accompanying build specifications. However,in the C/C++ systems with the highest clone coverage, wefind that many of these example build specifications are fileclones of each other, which poses maintainability problems.One of the studied systems with a low clone coverage

avoids these clones by duplicating and specializing templatebuild specification automatically using shell scripts duringthe construction phase. Using this approach, cloning sharedbuild code in example build specifications can be avoided.

XML entity expansion can be used to avoid cloning sharedbuild code in Java build systems. Cloning of build logicshipped with API usage examples can be avoided by au-tomatically deriving specifications at build-time.

7. THREATS TO VALIDITYWe now discuss the threats to the validity of our analysis.

Construct validity. Our clone detection tool is config-ured to only detect Type I (exact) clones. Since we do notdetect Type II, III, or IV clones, our cloning results shouldbe interpreted as lower bounds rather than exact values.Internal validity. We assume that large values of cloningmetrics suggest maintenance problems illustrated in our Mu-

Store an external block of XML in a macro

Page 78: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

26

Using abstraction mechanisms not!provided by the build technologies

Java files as inputs using wildcards. Hence, the compile tar-get is generic, and often cloned in several Ant specifications.

While construction accounts for many of the clones in Antbuild systems (64%), most Maven clones impact configura-tion (79%). The next most frequently cloned phase (pack-aging) appears in less than a third as many clones (21%).

We observed that many of the Maven clones replicatethird-party dependency lists or plugin configuration amongsubsystems. While this ensures that each subsystem can bebuilt independently of the others, it imposes a heavy load onmaintainers, who will need to update several pom.xml filesin order to modify third-party dependency lists or updateplugin configurations.Construction is the most heavily cloned build phasein C/C++ build systems. Table 3 shows that thebuild subcategory represents 48% and 65% of Autotools andCMake clones respectively. The filesystem category is alsodetected in 19% of Autotools and 1% of CMake clones. Allin all, the construction phase accounts for 56% of Autotoolsclones and 66% of CMake clones.

While the Autotools construction phase is most frequentlycloned (56%), Table 3 shows that packaging details are alsocloned often (21%). Yet packaging details are rarely clonedin CMake (2%). Many of these Autotools packaging cloneshave to do with repetition of data file lists among subsys-tems. Since Autotools build specifications generate recursivemake build systems, variables are not shared among subsys-tems [23]. Although Autotools offers developers an include

directive, we have observed that rather than place sharedvariables in a header-like file, developers often clone vari-ables that have a shared scope. Similar to Maven, wheredependency lists and plugin configuration were replicated,developers likely duplicate shared variables to facilitate sub-system independence. However, this makes system-widechanges more difficult.

Conversely, we observe that CMake packaging details arerarely cloned. We observe that developers leverage built-in CPack functionality of CMake [20], where packaging de-tails are typically specified in a single location: CPackCon-

fig.cmake. This eliminates the need to replicate packagingdetails in subsystem specifications.There are more configuration clones in the morerecent build technologies. Table 3 shows that thereare more than twice as many configuration clones in Mavenbuild systems (79%) than Ant ones (32%). Similarly, thereare almost twice as many configuration clones in CMake(40%) than there are in Autotools (22%). In Section 5, wereport that these more recent technologies are more proneto cloning (RQ2). The shift of cloning tendencies towardsconfiguration likely contributes to the inflated cloning valueswe observe in the more recent technologies.

Configuration details are cloned more often in the morerecent CMake and Maven build technologies. For Javabuild systems, Maven clones favour the configurationphase, while Ant clones (and clones in C/C++ builds)favour construction. CMake packaging support (CPack)helps to reduce cloning in the packaging phase.

(RQ5) How do build systems with few clonesachieve low clone rates?

To address RQ5, we analyze clones in the systems with thelowest and highest cloning rates for each studied technology.

<!-- Define references to files containing common targets --><!DOCTYPE project [

<!ENTITY modules -common SYSTEM "../ modules -common.ent">]>

...<project name="bea" default="all">

<!-- Include the file containing common targets. -->&modules -common;

</project >

Listing 1: Using XML entity expansion to importcommon build code in the Keel system.

Much Java build cloning can be avoided by exploit-ing the underlying XML representation. We observethat entire files are duplicated in the Ant and Maven systemswith the highest clone coverage. In these cases, developmentteams duplicate existing build specifications to rapidly de-velop new subsystems. However, developers have referredto maintaining such build systems fraught with clones asa “nightmare” (cf. Section 3). Defect fixes or updates todependency lists, tool configuration, and packaging detailsmust be carefully replicated among the clones to ensure thatbuilds continue to assemble deliverables correctly.On the other hand, we have observed that in addition to

abstraction mechanisms provided by Ant and Maven (e.g.,the include and import tasks), XML-based build systemsavoid cloning by leveraging the underlying XML represen-tation. In prior work, we note that the JBoss build sys-tem leverages XML entity expansion in Ant to implement aframework-driven build system referred to as“buildmagic”[21].Indeed, Listing 1 shows how one can use XML entity expan-sion to avoid duplicating shared build code in subsystembuild specifications. We also find that the Ant test suite in-cludes regression tests to ensure that entity expansion con-tinues to work, suggesting that it is not a workaround, butinstead is intentionally supported functionality.Many C/C++ build logic clones can be avoided byduplicating templates automatically when building.Many of the studied C/C++ systems provide developmentAPIs. As such, they ship examples of how to use vari-ous API functionality with their deliverables. These exam-ples include accompanying build specifications. However,in the C/C++ systems with the highest clone coverage, wefind that many of these example build specifications are fileclones of each other, which poses maintainability problems.One of the studied systems with a low clone coverage

avoids these clones by duplicating and specializing templatebuild specification automatically using shell scripts duringthe construction phase. Using this approach, cloning sharedbuild code in example build specifications can be avoided.

XML entity expansion can be used to avoid cloning sharedbuild code in Java build systems. Cloning of build logicshipped with API usage examples can be avoided by au-tomatically deriving specifications at build-time.

7. THREATS TO VALIDITYWe now discuss the threats to the validity of our analysis.

Construct validity. Our clone detection tool is config-ured to only detect Type I (exact) clones. Since we do notdetect Type II, III, or IV clones, our cloning results shouldbe interpreted as lower bounds rather than exact values.Internal validity. We assume that large values of cloningmetrics suggest maintenance problems illustrated in our Mu-

Store an external block of XML in a macro

Expand the macro

Page 79: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

27

How much cloning is typical?

What type of logic is being

cloned?

Configuration details

Packaging specifications

What can be done to mitigate

cloning?

Measured Typical Cloning shifts from const. to

config.

Mostly construction

clones

>50% clone coverage

is common

<30% clone coverage

is common

Page 80: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments

27

How much cloning is typical?

What type of logic is being

cloned?

Configuration details

Packaging specifications

What can be done to mitigate

cloning?

Measured Typical Cloning shifts from const. to

config.

Mostly construction

clones

Ant Maven may reduce

cloning

Use of “creative”

abstraction

>50% clone coverage

is common

<30% clone coverage

is common

Page 81: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments
Page 82: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments
Page 83: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments
Page 84: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments
Page 85: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments
Page 86: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments
Page 87: Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments