Do the Dependency Conflicts in My Project Matter?

Ying Wang, Ming Wen, Zhenwei Liu, Rongxin Wu, Rui Wang, Bo Yang, Hai Yu, Zhiliang Zhu and Shing-Chi Cheung

1. Northeastern University
2. The Hong Kong University of Science and Technology

Ying Wang
2018-12-19

Slides: sccpu2.cse.ust.hk/castle/materials/fse18-ying-slides.pdf




Example 1

java -cp a.jar;b.jar …

[Figure: a.jar and b.jar each contain foo.class]


Example 2

[Figure: lib1 and lib2]


Observations

Among 2289 popular Java projects:

⚫ 1457 (64%) contain the same library in different versions

⚫ 1003 (44%) contain duplicate classes in different libraries

⚫ 954 (42%) contain both conflicting classes and libraries


Example 3

879 days!

176 downstream clients!

Maven-shade-plugin


Motivation

The dependency conflict (DC) problem is very common in practice.

Most build tools do not guarantee loading the most appropriate class for the client project.

Build tools do not differentiate benign DC warnings from harmful ones (e.g., those causing runtime exceptions).


Our work

Empirical study

⚫ Manifestation patterns

⚫ Fixing patterns

Automated diagnosis

⚫ Detection

⚫ Assessing DC severity levels

Evaluation

⚫ Effectiveness

⚫ Usefulness


Empirical study---Research questions

RQ1 (Issue manifestation patterns): What are the common manifestations of DC issues? Are there patterns that can be extracted to enable automated detection of these problems?

RQ2 (Issue fixing patterns): How do developers fix DC issues in practice? Are there factors that affect developers’ choices of different fixing solutions?


Empirical study---Data Collection

Java open source projects built by Maven from the Apache ecosystem are selected as the subjects for our empirical study, due to the following reasons:

Key words:

1) “library”, “dependency” or “compatibility”, etc.

2) “conflict” or “NoSuchMethodError”, etc.

135 DC issues (128 of them have been fixed)


Empirical study-RQ1: Issue manifestation patterns

A. Conflicts in library versions: 29%

B. Conflicts in classes among libraries: 67%

C. Conflicts in classes between host project and libraries: 4%


Empirical study-RQ1: Issue manifestation patterns

A. Conflicts in library versions (39 out of 135 issues)

⚫ If there are multiple versions of the same library, Maven’s nearest-wins strategy chooses the version that appears nearest to the root (host project) of the dependency tree.

If the host project references features defined only in the shadowed library (i.e., Lib2 v2.0), a runtime exception will occur.

NoClassDefFoundError

NoSuchMethodError

System Failure
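
The nearest-wins rule can be illustrated with a small breadth-first walk over a dependency tree. This is a minimal sketch, not Maven's actual resolver: the function name `resolve_nearest_wins`, the tree encoding, and the example libraries are all assumptions made for illustration, and it ignores scopes, exclusions, and dependency management.

```python
from collections import deque


def resolve_nearest_wins(tree, root):
    """Pick one version per library: the one nearest to the root wins.

    `tree` maps a (library, version) node to its dependencies, listed in
    declaration order. Returns (chosen, shadowed).
    """
    chosen = {}    # library name -> version that ends up being used
    shadowed = []  # (library, version) pairs that lost the conflict
    queue = deque(tree.get(root, []))
    while queue:
        name, version = queue.popleft()
        if name in chosen:
            if chosen[name] != version:
                shadowed.append((name, version))
            continue  # a node that lost (or was already seen) is not expanded
        chosen[name] = version
        queue.extend(tree.get((name, version), []))
    return chosen, shadowed


# Hypothetical tree: the host depends on Lib1 v1.0 and Lib2 v1.0 directly,
# while Lib1 v1.0 pulls in Lib2 v2.0 transitively.
host = ("host", "1.0")
tree = {
    host: [("Lib1", "1.0"), ("Lib2", "1.0")],
    ("Lib1", "1.0"): [("Lib2", "2.0")],
}
chosen, shadowed = resolve_nearest_wins(tree, host)
print(chosen)    # {'Lib1': '1.0', 'Lib2': '1.0'}
print(shadowed)  # [('Lib2', '2.0')]
```

In this hypothetical tree Lib2 v2.0 is shadowed, so any feature that exists only in v2.0 would be missing at runtime.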


Empirical study-RQ1: Issue manifestation patterns

B. Conflicts in classes among libraries (90 out of 135 issues)

⚫ Under Maven’s first-declaration-wins strategy, the duplicate classes within the first declared library (i.e., lib2) shadow the ones included in the others (lib1).

If the host project references features defined only in the shadowed classes (i.e., classes A, B, C in Lib1), a runtime exception will occur.

NoSuchMethodError

System Failure
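
A first-declaration-wins lookup can be sketched as a scan over the classpath in order, where the first JAR that provides a class name supplies it and later copies are shadowed. The function `load_classes`, the JAR names, and the class names below are hypothetical and only model the ordering rule, not a real class loader.

```python
def load_classes(classpath):
    """Simulate first-declaration-wins over an ordered classpath.

    `classpath` is a list of (jar_name, class_names) in declaration order.
    Returns (loaded, shadowed): the JAR each class is loaded from, and the
    JARs whose copy of a class is never loaded.
    """
    loaded = {}
    shadowed = {}
    for jar, classes in classpath:
        for cls in classes:
            if cls in loaded:
                shadowed.setdefault(cls, []).append(jar)
            else:
                loaded[cls] = jar
    return loaded, shadowed


# Hypothetical classpath: lib2 is declared before lib1, and both define A, B, C.
classpath = [
    ("lib2.jar", ["A", "B", "C"]),
    ("lib1.jar", ["A", "B", "C", "D"]),
]
loaded, shadowed_classes = load_classes(classpath)
print(loaded)            # A, B, C come from lib2.jar; D comes from lib1.jar
print(shadowed_classes)  # lib1.jar's copies of A, B, C are never loaded
```

Swapping the two entries of `classpath` reverses which copies are loaded, which is why the order of declarations matters for this pattern.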


Empirical study-RQ1: Issue manifestation patterns

C. Conflicts in classes between host project and libraries (6 out of 135 issues)

⚫ If the host project and its referenced library (i.e., Lib1) include duplicate classes (i.e., A, B and C), then only those included in the library (i.e., Lib1) will be included during the packaging process.

The classes included in library Lib1 shadowed those defined in the host project, which led to a runtime failure.

NoSuchMethodError

System Failure


Empirical study-RQ1: Issue manifestation patterns

[Figure: the Referenced and Loaded feature sets]


Empirical study-RQ2: Issue fixing patterns

Pattern 1: Shading the conflicting libraries (25 out of 128 solutions)

Maven-Shade-Plugin provides the capability to package the project in an Uber Jar, including its third party libraries. It will also shade (i.e., rename) the packages of some of the libraries.

Pattern 2: Adjusting the classpath order of dependencies (42 out of 128 solutions)

Forcing a particular dependency order on the classpath is a strategy commonly used by developers for fixing DC issues at a relatively low cost.

[Figure: example from issue #HDFS-10570, showing HDFS, Hdfsproxy, Netty 2.0, and Hadoop Netty 2.8]


Pattern 3: Harmonizing library versions (51 out of 128 solutions)

Solutions of this pattern upgrade or downgrade some of the JARs to resolve the version inconsistencies.


Pattern 4: Classloader customization (5 out of 128 solutions)

This solution uses dynamic module system frameworks, such as OSGi and WildFly, to allow different versions of the same libraries or classes to coexist in one project by creating multiple classloaders.

Pattern 5: Other workarounds (5 out of 128 solutions)

The remaining issues are resolved in miscellaneous ways.
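
To make Pattern 1 concrete, the renaming step of shading can be pictured as a prefix rewrite on fully qualified class names, in the spirit of the `pattern` and `shadedPattern` settings of a Maven-Shade-Plugin relocation. The sketch below is only a model under that assumption: the function `relocate` and all package names are hypothetical, and the real plugin also rewrites the bytecode references to the renamed classes.

```python
def relocate(class_names, pattern, shaded_pattern):
    """Rename every class under package `pattern` to live under `shaded_pattern`."""
    prefix = pattern + "."
    relocated = []
    for name in class_names:
        if name.startswith(prefix):
            relocated.append(shaded_pattern + "." + name[len(prefix):])
        else:
            relocated.append(name)
    return relocated


# Hypothetical uber JAR contents: the project bundles its own copy of Netty.
bundled = ["io.netty.channel.Channel", "com.example.client.Main"]
shaded = relocate(bundled, "io.netty", "com.example.shaded.io.netty")

# Another copy of the same class, brought onto the classpath by a different library.
other_copy = {"io.netty.channel.Channel"}

print(shaded)
print(other_copy & set(shaded))  # set(): the two copies no longer share a name
```

Because the bundled copy now lives under a different package name, both copies can be loaded side by side without one shadowing the other.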


Dependency conflict diagnosis

Manifestation patterns

Maintenance efforts on fixing solutions

Detect dependency conflict issues

Assess their severity levels


Dependency conflict diagnosis

[Figure: diagnosis workflow. Inputs: library dependency management script and binary code files. Step 1: extract the library dependency tree. Step 2: identify duplicate libraries or classes (e.g., Lib3 v1.0 and Lib3 v2.0). Step 3: analyze relations between the different feature sets (loaded, shadowed, referenced). Step 4: assess warning severity levels (L1, L2, L3, L4).]
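
The relation between the referenced, loaded, and shadowed feature sets can be explored with plain set operations. The sketch below is not Decca's algorithm and does not reproduce its four severity levels; it only flags the situation described earlier, where the host references features that exist solely in the shadowed version. The function name `features_at_risk` and the feature names are hypothetical.

```python
def features_at_risk(referenced, loaded, shadowed):
    """Compare three feature sets (e.g., method signatures).

    referenced: features the host project uses
    loaded:     features defined by the version that is actually loaded
    shadowed:   features defined by the version that lost the conflict
    """
    missing = referenced - loaded
    return {
        "missing_at_runtime": missing,           # may raise NoSuchMethodError
        "only_in_shadowed": missing & shadowed,  # present only in the shadowed version
    }


# Hypothetical feature sets for two conflicting versions of one library.
referenced = {"Lib3.parse(String)", "Lib3.render()"}
loaded = {"Lib3.parse(String)"}
shadowed = {"Lib3.parse(String)", "Lib3.render()"}

report = features_at_risk(referenced, loaded, shadowed)
print(report["missing_at_runtime"])  # {'Lib3.render()'}
print(report["only_in_shadowed"])    # {'Lib3.render()'}
```

An empty `missing_at_runtime` set would suggest a benign conflict in this simplified model, since every referenced feature is still available from the loaded version.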


Evaluation

RQ3 (Effectiveness): How effectively can Decca detect real DC issues and assess their severity levels?

RQ4 (Usefulness): Can Decca detect unknown DC issues in real-world projects and facilitate developers in diagnosing them?


Evaluation: Effectiveness of Decca

A high quality dataset containing high-severity (i.e., Level 3 and 4) and low-severity (i.e., Level 1 and 2) DC issues.

Subjects:

Assumption: across different projects, bugs are usually repaired within 2 years of being introduced.

True Positive (TP) : the conflict identified as a high-severity issue (i.e., Level 3 or Level 4) is a high-severity issue.

False Positive (FP) : the conflict identified as a high-severity issue (i.e., Level 3 or Level 4) is a low-severity issue.

True Negative (TN) : the conflict identified as a low-severity issue (i.e., Level 1 or Level 2) is a low-severity issue.

False Negative (FN) : the conflict identified as a low-severity issue (i.e., Level 1 or Level 2) is a high-severity issue.

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

F-measure = 2 × Precision × Recall / (Precision + Recall)

Precision: 0.923, Recall: 0.766, and F-measure: 0.837
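
The three formulas translate directly into code. In this small sketch the function names are ours and the TP/FP/FN counts are purely hypothetical (the slides do not give the raw counts); only the last line uses the reported precision and recall, to check that they are consistent with the reported F-measure.

```python
def precision(tp, fp):
    """Fraction of conflicts flagged as high-severity that really are."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of real high-severity conflicts that were flagged."""
    return tp / (tp + fn)


def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical counts, for illustration only.
tp, fp, fn = 3, 1, 1
p, r = precision(tp, fp), recall(tp, fn)
print(p, r, f_measure(p, r))  # 0.75 0.75 0.75

# Consistency check on the reported figures.
print(round(f_measure(0.923, 0.766), 3))  # 0.837
```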


Evaluation: Usefulness of Decca

Decca successfully identified 466 DC issues in 24 of the 30 projects analyzed.

Results (number of issues per severity level):

ID  Project            L1  L2  L3  L4  Bug ID
1   Spark              40  1   0   0   SPARK-23509
2   Beam               17  2   0   0   BEAM-3690
3   Bahir              22  0   1   1   BAHIR-159
4   Wicketstuff/Core   16  1   0   0   Issue #621
5   Javasoze clue      18  1   0   0   Issue #61
6   ActiveMQ Artemis   24  0   0   0   -
7   Apex Core          34  0   0   0   -
8   Ignite             7   0   0   0   -
9   Wicket             2   0   0   0   -
10  Closure-Compiler   4   1   0   0   Issue #2815
11  Orientdb           8   0   0   1   Issue #8111
12  Cm                 5   0   0   1   Issue #1
13  Brooklyn           20  0   1   0   BROOKLYN-581
14  CarbonData         25  4   0   0   CARBONDATA-2169
15  Prestodb           16  1   0   0   Issue #29
16  Solr               10  1   0   0   DATASOLR-447
17  Tomcat exporter    10  2   0   0   Issue #8
18  Hadoop Common      16  0   0   1   HADOOP-15261
19  Oozie              25  0   1   0   OOZIE-3185
20  Accumulo           33  1   0   0   ACCUMULO-4812
21  Eclipse jetty      6   2   0   0   Issue #2232
22  Parquet            2   1   0   0   PARQUET-1236
23  Apex Malhar        34  1   0   0   APEXMALHAR-2556
24  Atlas              44  1   1   0   ATLAS-2437

438 (93.9%) of them are at Level 1, 20 (4.2%) of them are at Level 2, 4 (0.86%) of them are at Level 3, and 4 (0.86%) of them are at Level 4.

Bug report contents: severity, root cause, and fixing suggestions.


11 of the 20 reported bugs (55%) were confirmed by developers as real issues within a few days.

6 of the 11 confirmed bugs (55%) were quickly fixed using our suggestions.

3 of the 11 confirmed bugs (27%) are in the process of being fixed.

2 confirmed bugs are to be resolved by the developers of upstream third party libraries


Evaluation: Usefulness of Decca

Subjects: popular Java projects on the Maven platform

ID  Project            Category         Revision  Size (LOC)  Stars
1   Spark              Big data         8077bb0   130.0k      16262
2   Beam               Big data         a750128   337.0k      1722
3   Bahir              Extension tool   6ea42a8   0.9k        152
4   Wicketstuff/Core   Container        5cc41f5   228.5k      314
5   Javasoze clue      Command          23c9da4   2.8k        103
6   ActiveMQ Artemis   Network server   f6c5408   557.8k      271
7   Apex Core          Platform         4fb580f   87.0k       277
8   Ignite             OSGI             4e86660   2218.4k     1505
9   Wicket             Web framework    b728c69   352.5k      412
10  Closure-Compiler   JS compiler      900251b   427.6k      4005
11  Orientdb           Database         56ab1ac   496.3k      3366
12  Cm                 Web application  9e6f45b   19.k        12
13  Brooklyn           Cloud            48dbcc3   276.1k      69
14  CarbonData         Big data         9f2884a   127.9k      612
15  Prestodb           Big data         89fed3a   0.8k        15
16  Solr               Network Server   d32048c   31.7k       295
17  Tomcat exporter    Exporter         70ac377   0.9k        19
18  Hadoop Common      Database         1e85a99   2042.8k     5883
19  Oozie              Big data         9e662c7   198.6k      364
20  Accumulo           Database         d98843b   563.8k      343
21  Eclipse jetty      Debugging        b71cd70   375.9k      1868
22  Parquet            Big data         b82d962   0.9k        550
23  Apex Malhar        Big data         0d98d05   243.7k      110
24  Atlas              Framework        6770091   123.4k      33


Evaluation: Feedback from developers

“This seems like a handy report, is the tool you used to identify this error open source? I am curious to give it a try (also for other stuff).”

------------BEAM-3690

“Related, but not the same: I have tried turning on dependency convergence in the Maven-enforcer-plugin. We need the same for gradle to ensure long-term health and protect from regressions. Maybe the tool that generated this fine-grained conflicts report can also fail the build? That would be nice.”

------------SPARK-23509


Conclusion

First empirical study of DC issues between host project and third-party libraries.

Formulation of the dependency conflict problem and its root cause.

An automated technique, Decca, to detect DC issues and assess their severity levels.


Thank you!