54
PVS Empirical Study of Usage and Performance of Java Collections 1 Diego Costa , 1 Artur Andrzejak, 1 Janos Seboek, 2 David Lo 1 Heidelberg University, 2 Singapore Management University 1

Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

PVS

Empirical Study of Usage and Performance of Java Collections

1Diego Costa, 1Artur Andrzejak, 1Janos Seboek, 2David Lo

1Heidelberg University, 2Singapore Management University

1

Page 2: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Published as:

Paper/slides available at: https://pvs.ifi.uni-heidelberg.de/publications/

Empirical Study of Usage and Performance of Java Collections

1Diego Costa, 1Artur Andrzejak, 1Janos Seboek, 2David Lo

2

1Heidelberg University, 2Singapore Management University

Diego Costa, Artur Andrzejak, Janos Seboek, and David Lo. 2017. Empirical Study of Usage and Performance of Java Collections. In Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering (ICPE '17), L'Aquila, Italy — April 22 - 26, 2017

Page 3: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Collections

• Collections are objects that groups multiple elements into a single unit.

• Use its metadata to track, access and manipulate its elements

2

Page 4: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Collections

• Collections are objects that groups multiple elements into a single unit.

• Use its metadata to track, access and manipulate its elements

Object[]

Integer

ArrayList<Integer>

2

Page 5: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Collections

• Collections are objects that groups multiple elements into a single unit.

• Use its metadata to track, access and manipulate its elements

CollectionOverhead

ElementsFootprint

Object[]

Integer

ArrayList<Integer>

2

Page 6: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Motivation

• Numerous studies have identified the inefficient use of collections as the main cause of runtime bloat

Execution Time

+17% Improv.

Configuration of one HashMap instance

[Liu et al. 2009]

Memory Usage

+54% Improv.

Use of ArrayMapsinstead of HashMaps

[Ohad et al. 2009]

Energy Consumption

+300% Improv.

Use of ArrayList instead of LinkedList

[Jung et al. 2016]

3

Page 7: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Collection Frameworks

• The Java Collection Framework offers a Standardimplementation of the major collection abstractions

• Stable and reliable framework

• Easy to use

• However, there exist alternative libraries that provide a myriad of different implementations:

• Primitive Collections (IntArrayList)

• Immutable Collections

• Multimaps (Map<K, Collection>)

• Multisets (Map<K, int>)

4

Page 8: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Collection Frameworks

• The Java Collection Framework offers a Standardimplementation of the major collection abstractions

• Stable and reliable framework

• Easy to use

• However, there exist alternative libraries that provide a myriad of different implementations:

• Primitive Collections (IntArrayList)

• Immutable Collections

• Multimaps (Map<K, Collection>)

• Multisets (Map<K, int>)

4

Unsupported features

Page 9: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Collection Frameworks

• The Java Collection Framework offers a Standardimplementation of the major collection abstractions

• Stable and reliable framework

• Easy to use

• However, there exist alternative libraries that provide a myriad of different implementations:

• Primitive Collections (IntArrayList)

• Immutable Collections

• Multimaps (Map<K, Collection>)

• Multisets (Map<K, int>)

4

Unsupported features

Simplified API

Page 10: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Analysis of Performance Impact of Alternative Collections

1. Study on Collections Usage• How often do programmers use alternative implementations?

2. Experimental Evaluation of popular Java Collection Libraries• Are there better alternatives to the most commonly used Collections

regarding performance?

5

Goal: Can we find alternatives to the Standard collection

types which improve performance on time/memory?

Page 11: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Study on Usage of Collections

• Dataset• We analyze the GitHub Java Corpus

• 10K projects

• 268 MLOC

• Static Analysis• Use of Java Parser to extract variable declaration and allocation sites

of Types with suffix:

{List, Map, Set, Queue, Vector}

• Manually removed false-positives

6

Page 12: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Developers Rarely Use non-Standard Collections

ArrayList48%

HashSet10%

HashMap22%

Lists57%

Sets15%

Maps28%

7

Page 13: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

ArrayList48%

LinkedList5%

HashSet10%

LinkedHashSet1%

HashMap22%

LinkedHashMap2%

8

Lists57%

Sets15%

Maps28%

Developers Rarely Use non-Standard Collections

Page 14: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

ArrayList48%

LinkedList5%Other Lists

4%

HashSet10%

TreeSet1%

LinkedHashSet…

Other Sets3%

HashMap22%

LinkedHashMap2%

TreeMap1%

Other Maps3%

• Top 4 represent 86% of all declared instantiations

• Non-Standard collections are declared <4%

9

Lists57%

Sets15%

Maps28%

Developers Rarely Use non-Standard Collections

Evaluate alternatives to

ArrayList, HashMap,

HashSet and LinkedList

Page 15: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Commonly Used Element Data Types

From the categorized data types:

• Strings are the most commonly held data type, followed by Numeric

Lists Sets Map Keys Map Values

Top50

String

Numeric

Collections

Other

0%

20%

40%

60%

80%

100%

Lists Sets MapKeys

MapValues

10

Page 16: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Commonly Used Element Data Types

From the categorized data types:

• Strings are the most commonly held data type, followed by Numeric

Lists Sets Map Keys Map Values

Top50

String

Numeric

Collections

Other

0%

20%

40%

60%

80%

100%

Lists Sets MapKeys

MapValues

10

Evaluate collections holding

Strings, Integer and Long

Page 17: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Initial Capacity is Rarely Specified

ArrayList HashMap HashSet

Top50

Specified

Default

Copy

0%

20%

40%

60%

80%

100%

List Map Set

11

Page 18: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Initial Capacity is Rarely Specified

ArrayList HashMap HashSet

Top50

Specified

Default

Copy

0%

20%

40%

60%

80%

100%

List Map Set

11

Evaluate collections with

default Initial Capacity

Page 19: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Superior Alternatives

• Superior Alternative: a Non-Standard implementation that can outperform a Standard counterpart in terms of execution time and/or memory consumption.

• Can we find a superior alternative to the most commonly used collection types?

12

Page 20: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Experimental Study on Java Collections

• We selected 6 alternative libraries:• Repository Popularity (GitHub)

• Appearance in previous partial benchmarks

13

Libraries Version JCF

Compatible

Available at

Trove 3.0.3 yes trove.starlight-systems.com

Guava 18.0 yes

github

/google/guava

GSCollections 6.2.0 yes /goldmansachs/gs-collections

HPPC 0.7.1 no /carrotsearch/hppc

Fastutil 7.0.10 yes /vigna/fastutil

Koloboke 0.6.8 yes /leventov/Koloboke

Page 21: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Experimental Study on Java Collections

• Seven typical scenarios evaluated• populate, iterate, contains, get, add, remove, copy

• Collections holding from 100 to 1 million elements

• Alternatives to the most commonly used collections• JDK 1.8.0_65

• ArrayList, HashMap, HashSet and LinkedList• Object collection alternatives

• Primitive collection alternatives

14

Page 22: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

CollectionsBench Suite

• We create a benchmark suite: CollectionsBench• Open Java Microbenchmark Harness

@Setuppublic void setup() {fullList = this.createNewList();fullList.addAll(values); // Randomly

} generated

@Benchmarkpublic void iterate() {for (T element : fullList) {blackhole.consume(element);// Blackholes avoid dead-code // optimization}

} 15

Page 23: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

CollectionsBench Suite

• We create a benchmark suite: CollectionsBench

@Setuppublic void setup() {fullList = this.createNewList();fullList.addAll(values); // Randomly

} generated

return new ArrayList<T>();

return new LinkedList<T>();

Only the instantiation is needed for each collection type*

return new FastList<T>();

Here we measure:

• Execution time (ns)

• Collection Overhead (allocation)• GC Profiler

@Benchmarkpublic void iterate() {for (T element : fullList) {blackhole.consume(element);// Blackholes avoid dead-code // optimization}

} 16

Page 24: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Experimental Planning

• To accomplish a steady state performance evaluation:

@Benchmarkpublic void iterate()

5s

Iteration

17

Page 25: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Experimental Planning

• To accomplish a steady state performance evaluation:

@Benchmarkpublic void iterate()

5s

Iteration

17

IterationWarmup

Measured Iteration

10𝑥

30𝑥

Fork 1

Page 26: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Experimental Planning

• To accomplish a steady state performance evaluation:

@Benchmarkpublic void iterate()

5s

Iteration

17

IterationWarmup

Measured

Iteration

Iteration Iteration

10𝑥

30𝑥

10𝑥

30𝑥

Fork 1 Fork 2

Page 27: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Experimental Planning

• To accomplish a steady state performance evaluation:

@Benchmarkpublic void iterate()

5s

Iteration

17

IterationWarmup

Measured

Iteration

Iteration Iteration

AVG Time ± CI (99%)AVG Memory allocated ± CI (99%)

10𝑥

30𝑥

10𝑥

30𝑥

Fork 1 Fork 2

Page 28: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Reporting Speedup/Slowdown

• We present the results of alternatives normalized to the Standard implementation performances.

• Means with overlapping CI are set to zero

• We use the following speedup/slowdown definitions:

𝑆 =

𝑇𝑠𝑡𝑑𝑇𝑎𝑙𝑡

, 𝑖𝑓 𝑇𝑠𝑡𝑑 > 𝑇𝑎𝑙𝑡

−𝑇𝑎𝑙𝑡𝑇𝑠𝑡𝑑

, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒

18

Page 29: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Reporting Speedup/Slowdown

• We present the results of alternatives normalized to the Standard implementation performances.

• Means with overlapping CI are set to zero

• We use the following speedup/slowdown definitions:

𝑆 =

𝑇𝑠𝑡𝑑𝑇𝑎𝑙𝑡

, 𝑖𝑓 𝑇𝑠𝑡𝑑 > 𝑇𝑎𝑙𝑡

−𝑇𝑎𝑙𝑡𝑇𝑠𝑡𝑑

, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒

1 1 1

2 7

1 1 1

2 7

𝑆 = {0, 1.5, 1.9, 2, 7.8}

𝑆 = {0,−1.5,−1.9, −2,−7.8}

Heat map

Overlapping

Confidence Intervals 19

Page 30: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Reporting Memory Overhead

• For the memory comparison we present the collection overhead reduction per element (with compressed object pointers)

20

Page 31: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Reporting Memory Overhead

• For the memory comparison we present the collection overhead reduction per element (with compressed object pointers)

For instance

Collection X: 100 bytes per element

Collection Y: 10 bytes per element

• Evaluated on the copy scenario

20

Page 32: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Reporting Memory Overhead

• For the memory comparison we present the collection overhead reduction per element (with compressed object pointers)

For instance

Collection X: 100 bytes per element

Collection Y: 10 bytes per element

• Evaluated on the copy scenario

90%

Reduction of Collection Overhead

20

Page 33: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Superior Alternatives: LinkedList

• LinkedList was outperformed by everyArrayList alternative

contains add get remove copyJCF

2 2 2 2

31 0 0 1 0

* * * * *1

4 6 32

11 12 16 11 12

FU2 2 2 2

31 1 1 1 1

* * * * *1

4 6 32

3 3 4 3 4

GS2 2 2 2

31 1 1 1 1

* * * * *1

4 6 32

10 13 16 12 12

HP1 2 2 2 2 2 1 1 1 1

* * * * *1

3 6 32 2 2 1 -1 1

100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M

84%

JCF LinkedList$Entryconsumes 24 bytes

21

StandardFastutil

GSCollec.HPPC

Reduction of Collection Overhead

Page 34: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Superior Alternatives: LinkedList

• LinkedList was outperformed by everyArrayList alternative

contains add get remove copyJCF

2 2 2 2

31 0 0 1 0

* * * * *1

4 6 32

11 12 16 11 12

FU2 2 2 2

31 1 1 1 1

* * * * *1

4 6 32

3 3 4 3 4

GS2 2 2 2

31 1 1 1 1

* * * * *1

4 6 32

10 13 16 12 12

HP1 2 2 2 2 2 1 1 1 1

* * * * *1

3 6 32 2 2 1 -1 1

100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M

Asymptotic disadvantage

84%

JCF LinkedList$Entryconsumes 24 bytes

21

StandardFastutil

GSCollec.HPPC

Reduction of Collection Overhead

Page 35: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Superior Alternatives: LinkedList

• LinkedList was outperformed by everyArrayList alternative

contains add get remove copyJCF

2 2 2 2

31 0 0 1 0

* * * * *1

4 6 32

11 12 16 11 12

FU2 2 2 2

31 1 1 1 1

* * * * *1

4 6 32

3 3 4 3 4

GS2 2 2 2

31 1 1 1 1

* * * * *1

4 6 32

10 13 16 12 12

HP1 2 2 2 2 2 1 1 1 1

* * * * *1

3 6 32 2 2 1 -1 1

100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M

Asymptotic disadvantage Asymptotic advantage

public boolean remove(Object o)

84%

JCF LinkedList$Entryconsumes 24 bytes

21

StandardFastutil

GSCollec.HPPC

Reduction of Collection Overhead

Page 36: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Superior Alternatives: ArrayList

• GSCollections provides a superior alternative• Faster when populating the list (no time penalty)

populate iterate copy

FU1 1 1 1 1 0 0 0 0 0

-4 -4 -4 -3 -3

GS1 1 1 0 1 0 0 0 0 0 0 0 0 0 0

HP1 2 2 1 2 -1 -1 -1 -1 -1

-6 -7 -14 -14 -9

100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M

No memory difference

22

FastutilGSCollec.

HPPC

Page 37: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Superior Alternatives: ArrayList

• GSCollections provides a superior alternative• Faster when populating the list (no time penalty)

populate iterate copy

FU1 1 1 1 1 0 0 0 0 0

-4 -4 -4 -3 -3

GS1 1 1 0 1 0 0 0 0 0 0 0 0 0 0

HP1 2 2 1 2 -1 -1 -1 -1 -1

-6 -7 -14 -14 -9

100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M

Distinct Array copy calls

Std: Arrays.copyOf();

Alt: System.arraycopy();

No memory difference

22

FastutilGSCollec.

HPPC

Page 38: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Superior Alternatives: ArrayList

• GSCollections provides a superior alternative• Faster when populating the list (no time penalty)

populate iterate copy

FU1 1 1 1 1 0 0 0 0 0

-4 -4 -4 -3 -3

GS1 1 1 0 1 0 0 0 0 0 0 0 0 0 0

HP1 2 2 1 2 -1 -1 -1 -1 -1

-6 -7 -14 -14 -9

100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M

Distinct Array copy calls

Std: Arrays.copyOf();

Alt: System.arraycopy();

HPPC adds each element instead of copying the array

No memory difference

22

FastutilGSCollec.

HPPC

Page 39: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Superior Alternatives: HashSet

• Koloboke provides a superior alternative• Fastutil is a solid 2nd option

populate iterate contains add remove copyFU

0 -1 -1 0 -2 1 2 2 1

30 0 0 1 1 0 0 -2 0 0 0 -1 -1 1 1 2 1 0 2 1

GS-1 -1 0 0 -2 -1 1 0 0 0 -1 -1 0 0 0 0 0 0 0 0 0 0 0 2 1 1 0 1 2 1

Gu0 0 0 0 -1 -2 0 -1 -1 -1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 -1 -2 -2 0

HP-2 -2 -1 0 -2 0 2 1 0

30 0 0 1 1 0 0 -2 0 0 0 -1 -2 1 0 1 -1 -1 1 0

Ko0 0 0 1 0 0 2 2 1

30 0 0 1 1 0 0 -2 0 0 0 -1 -1 1 1

13 16 29 29 51

Tr-2 -2 -2 0 -2 -2 -2 0 0 2 0 0 0 0 1 -2 -2 -2 -2 -2 -2 -2 -2 -1 -2 0 -2 -2 1 -2

100 1K 10K 100K 1M 100 1K 10K100

K1M 100 1K 10K

100K

1M 100 1K 10K100

K1M 100 1K 10K

100K

1M 100 1K 10K100

K1M

75%

Std HashSet$Nodeobject consumes 32 bytes

23

FastutilGSCollec.

GuavaHPPC

KolobokeTrove

Reduction of Collection Overhead

Page 40: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Superior Alternatives: HashSet

• Koloboke provides a superior alternative• Fastutil is a solid 2nd option

populate iterate contains add remove copyFU

0 -1 -1 0 -2 1 2 2 1

30 0 0 1 1 0 0 -2 0 0 0 -1 -1 1 1 2 1 0 2 1

GS-1 -1 0 0 -2 -1 1 0 0 0 -1 -1 0 0 0 0 0 0 0 0 0 0 0 2 1 1 0 1 2 1

Gu0 0 0 0 -1 -2 0 -1 -1 -1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 -1 -2 -2 0

HP-2 -2 -1 0 -2 0 2 1 0

30 0 0 1 1 0 0 -2 0 0 0 -1 -2 1 0 1 -1 -1 1 0

Ko0 0 0 1 0 0 2 2 1

30 0 0 1 1 0 0 -2 0 0 0 -1 -1 1 1

13 16 29 29 51

Tr-2 -2 -2 0 -2 -2 -2 0 0 2 0 0 0 0 1 -2 -2 -2 -2 -2 -2 -2 -2 -1 -2 0 -2 -2 1 -2

100 1K 10K 100K 1M 100 1K 10K100

K1M 100 1K 10K

100K

1M 100 1K 10K100

K1M 100 1K 10K

100K

1M 100 1K 10K100

K1M

75%

Koloboke performs a memory copy of its HashTable

Std HashSet$Nodeobject consumes 32 bytes

Impact of memory efficiency on time

23

FastutilGSCollec.

GuavaHPPC

KolobokeTrove

Reduction of Collection Overhead

Page 41: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Superior Alternatives: HashMap

• Standard HashMap is a solidimplementation

• No superior alternatives on time

populate iterate contains remove copyFU

0 -1 -1 0 1 -1 -2 -1 0 1 -1 0 0 0 1 -1 -1 -1 -1 0 -1 -1 -1 -2 -1

GS-1 -1 -1 -1 0 -2

-3-2 -1 0 -1 0 -1 0 0 0 0 0 1 1 0 -1 -1 -1 -1

Gu -3-2 -2 -2 -2

-7 -9 -7 -8 -70 0 -1 0 0 -2 -2 -2

-3-2 -9 -10 -9 -13 -9

HP-2 -2 -1 -1 0 -1 -1 0 0 1 -1 0 0 0 0 -1 -1 -1 -1 0 -1 -2 -2 -2 -2

Ko-2

-6 -4 -4 -2 -3 -3-2 -1 1 -1 -2

-3 -3-1 -2

-11-13-15 -4 4 6 9 11 11

Tr -3-2 -2 -1 -1

-7 -9 -5 -4 -31 1 1 1 2 -2 -2 -2 -1 -1 -4 -4 -6 -5 -7

100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M

24

FastutilGSCollec.

GuavaHPPC

KolobokeTrove

Page 42: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Superior Alternatives: HashMap

• Standard HashMap is a solidimplementation

• No superior alternatives on time

populate iterate contains remove copyFU

0 -1 -1 0 1 -1 -2 -1 0 1 -1 0 0 0 1 -1 -1 -1 -1 0 -1 -1 -1 -2 -1

GS-1 -1 -1 -1 0 -2

-3-2 -1 0 -1 0 -1 0 0 0 0 0 1 1 0 -1 -1 -1 -1

Gu -3-2 -2 -2 -2

-7 -9 -7 -8 -70 0 -1 0 0 -2 -2 -2

-3-2 -9 -10 -9 -13 -9

HP-2 -2 -1 -1 0 -1 -1 0 0 1 -1 0 0 0 0 -1 -1 -1 -1 0 -1 -2 -2 -2 -2

Ko-2

-6 -4 -4 -2 -3 -3-2 -1 1 -1 -2

-3 -3-1 -2

-11-13-15 -4 4 6 9 11 11

Tr -3-2 -2 -1 -1

-7 -9 -5 -4 -31 1 1 1 2 -2 -2 -2 -1 -1 -4 -4 -6 -5 -7

100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M

55%

Std. HashMap$Node object consumes 32 bytes

• Fastutil provides a superior alternative on memory consumption

24

FastutilGSCollec.

GuavaHPPC

KolobokeTrove

Reduction of Collection Overhead

Page 43: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Object vs Primitive Collections

• Reducing collection footprint: overhead + element footprint

int[]Object[]

Object collectionArrayList<>

Primitive collectionIntArrayList

25

Page 44: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Object vs Primitive Collections

• Reducing collection footprint: overhead + element footprint

int[]Object[]

Object collectionArrayList<>

Primitive collectionIntArrayList

SmallerCollection Overhead

25

Page 45: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Object vs Primitive Collections

• Reducing collection footprint: overhead + element footprint

int[]Object[]

Object collectionArrayList<>

Primitive collectionIntArrayList

Integer

(16 bytes)

int

(4 bytes)

SmallerCollection Overhead

25

Page 46: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Object vs Primitive Collections

• Reducing collection footprint: overhead + element footprint

int[]Object[]

Object collectionArrayList<>

Primitive collectionIntArrayList

Integer

(16 bytes)

int

(4 bytes)

SmallerCollection Overhead

SmallerElement Footprint

25

Page 47: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Object vs Primitive Collections

• Reducing collection footprint: overhead + element footprint

int[]Object[]

Object collectionArrayList<>

Primitive collectionIntArrayList

Integer

(16 bytes)

int

(4 bytes)

SmallerCollection Overhead

SmallerElement Footprint

25

Page 48: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Object vs Primitive Collections

• Reducing collection footprint: overhead + element footprint

int[]Object[]

Object collectionArrayList<>

Primitive collectionIntArrayList

Integer

(16 bytes)

int

(4 bytes)

SmallerCollection Overhead

SmallerElement Footprint

25

80%

Collection Footprint Reduction

Page 49: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Superior Alternatives: Primitive Collections

• We found superior alternatives to all three abstraction types • Data Type: Integer to int

• Performance varies considerably from distinct libraries for multiple reasons

For instance, ArrayList primitive implementations

Libs populate iterate contains copy

FU 3 3 32 2 -2 -2 -2 -1 -2 1 2

3 3 30 0 0 0 0

GS 3 3 32 2 1 1 1 1 1 2 2

3 3 40 0 0 0 0

HP2 1 2 2 2 1 1 1 1 2 1 2

3 3 3 -5 -7 -8 -9 -7

Tr 3 3 32 2 1 1 1 1 1 0 0 0 1 1

-3 -8 -9 -7 -4100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M

26

FastutilGSCollec.

HPPCTrove

Page 50: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Superior Alternatives: Primitive Collections

• We found superior alternatives to all three abstraction types • Data Type: Integer to int

• Performance varies considerably from distinct libraries for multiple reasons

For instance, ArrayList primitive implementations

Libs populate iterate contains copy

FU 3 3 32 2 -2 -2 -2 -1 -2 1 2

3 3 30 0 0 0 0

GS 3 3 32 2 1 1 1 1 1 2 2

3 3 40 0 0 0 0

HP2 1 2 2 2 1 1 1 1 2 1 2

3 3 3 -5 -7 -8 -9 -7

Tr 3 3 32 2 1 1 1 1 1 0 0 0 1 1

-3 -8 -9 -7 -4100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M

for(Object..)

5x slower than for(IntProcedure) commonly implemented

26

FastutilGSCollec.

HPPCTrove

Page 51: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Superior Alternatives: Primitive Collections

• We found superior alternatives to all three abstraction types • Data Type: Integer to int

• Performance varies considerably from distinct libraries for multiple reasons

For instance, ArrayList primitive implementations

Libs populate iterate contains copy

FU 3 3 32 2 -2 -2 -2 -1 -2 1 2

3 3 30 0 0 0 0

GS 3 3 32 2 1 1 1 1 1 2 2

3 3 40 0 0 0 0

HP2 1 2 2 2 1 1 1 1 2 1 2

3 3 3 -5 -7 -8 -9 -7

Tr 3 3 32 2 1 1 1 1 1 0 0 0 1 1

-3 -8 -9 -7 -4100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M 100 1K 10K 100K 1M

for (int = offset; i-- > 0; )

30% more branch misses

for(Object..)

5x slower than for(IntProcedure) commonly implemented

26

FastutilGSCollec.

HPPCTrove

Page 52: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Superior Alternative: Primitive Collections

• GSCollections, Koloboke and Fastutil provide solid superior alternatives

ArrayList HashMap HashSet

80% 76% 84%

Footprint Reduction

Integer → int

Time impact

(1M elements)12x

+2x populate

+4x contains

+2x remove12x 12x

+2x populate

-2x iterate

+6x/+4x copy12x 12x

+4x populate

+4x/+3x iterate

+33x/+7x copy12x27

Page 53: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Summary

• There are performance opportunities on Alternative Collection Frameworks

• Time/memory improvement with moderate refactor effort

• We provide a Guideline to assist developers on:• Identifying superior alternative implementations

• Which scenarios an alternative could lead to a substantial performance improvement

• CollectionsBench is open-source and available at GitLab

28

Page 54: Empirical Study of Usage and Performance of Java Collections · PVS Empirical Study of Usage and Performance of Java Collections 1Diego Costa , 1Artur Andrzejak 1Janos Seboek 2David

Thank you for your time

Questions?

[email protected]

29