The Binary Compatibility Challenge

Martin Odersky, Typesafe and EPFL

Scalax



Page 1: Scalax

The Binary Compatibility Challenge

Martin Odersky

Typesafe and EPFL

Page 2: Scalax

The Problem in a Nutshell

•  Binary compatibility has been an issue ever since Scala became popular.

•  Causes grief when building, friction for upgrading.

•  The community has learned to deal with this by becoming more conservative.

•  But this makes it harder to innovate and improve.

Break your clients’ builds vs. freeze and stop improving

Is there no third way?


Page 3: Scalax

What is Binary Compatibility?

Binary compatibility ≠ source compatibility.

Source & binary incompatible:

    // client
    object Client { import Server.msg; msg.length }

    // server, version 1
    object Server { val msg = "abc" }

    // server, version 2
    object Server { val msg = Some("abc") }
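The binary half of this breakage comes from how the JVM links calls: by owner class, member name and erased descriptor. The following is a toy model of that lookup, as a sketch only (`MemberRef`, `ToyLinker` and `ServerVersions` are made-up names, and the real JVM does considerably more). On a real JVM the failed lookup shows up as a `NoSuchMethodError` when the old `Client` runs against the new `Server`.

```scala
// Toy model of JVM linking, for illustration only: a compiled call site records
// the owner class, the member name and the erased descriptor, and it links only
// if the class present at run time declares a member with exactly that pair.
final case class MemberRef(owner: String, name: String, descriptor: String)

object ToyLinker {
  // class name -> set of (member name, erased descriptor) it declares
  type ClassTable = Map[String, Set[(String, String)]]

  def links(ref: MemberRef, classes: ClassTable): Boolean =
    classes.get(ref.owner).exists(_.contains((ref.name, ref.descriptor)))
}

object ServerVersions {
  // Server v1: `val msg = "abc"` gives the accessor msg()Ljava/lang/String;
  val v1: ToyLinker.ClassTable =
    Map("Server$" -> Set(("msg", "()Ljava/lang/String;")))

  // Server v2: `val msg = Some("abc")` has type Some[String], erased to scala.Some
  val v2: ToyLinker.ClassTable =
    Map("Server$" -> Set(("msg", "()Lscala/Some;")))

  // Client.class was compiled against v1, so its call site carries v1's descriptor
  val clientCall: MemberRef = MemberRef("Server$", "msg", "()Ljava/lang/String;")
}
```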

Page 4: Scalax

What is Binary Compatibility?

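Binary compatibility ≠ source compatibility.

Source incompatible, binary compatible:

    // client
    object Client {
      import a._, b._
      val x: String = 1
    }

    // library, version 1
    object a {
      implicit def f(x: Int): String = x.toString
    }
    object b

    // library, version 2: b gains a second Int => String conversion,
    // so `val x: String = 1` becomes ambiguous in Client's source
    object a {
      implicit def f(x: Int): String = x.toString
    }
    object b {
      implicit def g(x: Int): String = "abc"
    }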
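The binary half holds because implicit conversions are resolved by the typer: the compiled `Client` already contains an explicit call to `a.f`, so a new implicit in `b` can only affect code that gets compiled again. A minimal sketch of that equivalence, where `ClientImplicit` and `ClientExplicit` are names made up for illustration:

```scala
import scala.language.implicitConversions

object a { implicit def f(x: Int): String = x.toString }

// What the programmer writes: the conversion is found by implicit search.
object ClientImplicit {
  import a._
  val x: String = 1
}

// What the typer produces: the chosen conversion becomes an ordinary call,
// and that call is all the bytecode refers to. A new implicit elsewhere can
// make the source ambiguous, but it cannot change this compiled call.
object ClientExplicit {
  val x: String = a.f(1)
}
```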

Page 5: Scalax

What is Binary Compatibility?

Binary compatibility ≠ source compatibility.

Source compatible, binary incompatible: ⇒ need to recompile on Java 1-7

    // client
    object Apple extends Edible { def joules = 500000.0 }

    // library, version 1
    trait Edible { def joules: Double }

    // library, version 2: adds a concrete method
    trait Edible {
      def joules: Double
      def calories = joules / 4.184
    }

In Java 8 it’s more complex but fundamentally the same.

Page 6: Scalax

What is Binary Compatibility?


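On Java 1-7 the new trait is split into an interface plus an implementation class, and every class that mixes it in, such as Apple, needs a forwarder:

    trait Edible {
      def joules: Double
      def calories: Double
    }
    object Edible$class {
      def calories($this: Edible): Double =
        $this.joules / 4.184
    }

    object Apple extends Edible {
      def joules = 500000.0
      def calories: Double =
        Edible$class.calories(this)
    }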
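The same encoding can be written out by hand as ordinary Scala, which makes the forwarder visible. This is a sketch: `EdibleIface` and `EdibleImpl` are made-up stand-ins for the compiler's interface and `Edible$class`, since the Scala specification reserves `$` in identifiers for compiler-generated names.

```scala
// Hand-written version of the pre-Java-8 trait encoding (illustrative names).
trait EdibleIface {
  def joules: Double
  def calories: Double // abstract in the interface: every implementor needs a body
}

object EdibleImpl {
  // The concrete trait method body becomes a static-style method taking the receiver.
  def calories(self: EdibleIface): Double = self.joules / 4.184
}

object Apple extends EdibleIface {
  def joules = 500000.0
  // The forwarder the compiler would insert into each class mixing in the trait.
  def calories: Double = EdibleImpl.calories(this)
}
```

An `Apple.class` compiled against the old trait simply has no `calories` forwarder, which is why it must be recompiled.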


Page 7: Scalax

Other Issues

Compiler optimizations and bug fixes can affect binary compatibility. Example: the implementation of lazy values.

    trait Edible {
      def joules: Double
      lazy val calories = joules / 4.184
    }

    object Apple extends Edible { def joules = 500000.0 }

is compiled to something like:

    // sketch of the generated code
    object Apple extends Edible {
      def joules = 500000.0
      private var initFlags: BitSet
      private var cals: Double = _
      def calories = {
        if (!initFlags(N)) {
          cals = Edible$class.initCals(this)
          initFlags(N) = true
        }
        cals
      }
    }

Previously: 1 bit per lazy val. To avoid deadlocks: 2 bits ⇒ all offsets change!
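A small calculation shows why doubling the flag bits moves everything: code compiled under the 1-bit scheme and code compiled under the 2-bit scheme disagree about which bits (and even which field) belong to a given lazy val. This is a toy model assuming flags are packed into 64-bit bitmap fields in declaration order; `LazyValLayout` is a made-up name and real compiler layouts differ in detail.

```scala
// Toy model of where a lazy val's init flags live (assumption: 64-bit bitmap
// fields, `bitsPerVal` consecutive bits per lazy val, in declaration order).
object LazyValLayout {
  // First flag bit of the lazy val with the given declaration index.
  def bitOffset(index: Int, bitsPerVal: Int): Int = index * bitsPerVal

  // Which 64-bit bitmap field holds that bit.
  def bitmapField(index: Int, bitsPerVal: Int): Int = bitOffset(index, bitsPerVal) / 64

  // Mask selecting that bit inside its field.
  def mask(index: Int, bitsPerVal: Int): Long = 1L << (bitOffset(index, bitsPerVal) % 64)
}
```

Under this model the lazy val with index 40 sits in bitmap field 0 with 1 bit per value, but in field 1 with 2 bits per value.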

Page 8: Scalax

Compiler Pipeline

Source (plus Symbols) → Parser → Typer → SyntheticMethods → SuperAccessors → RefChecks → ElimRepeated → ElimLocals → ExtensionMethods → TailRec → PatternMatcher → ExplicitOuter → Erasure → Mixin → Memoize → LazyVals → CapturedVars → Constructors → LambdaLift → Flatten → RestoreScopes → Cleanup → GenBCode (plus more phases) → JVM bytecode

Lots of scope for things to go wrong!

Page 9: Scalax

Where It Breaks

Diagram: C.scala → C.class after a binary incompatible source change; A.class, compiled against the old C.class, breaks.

Page 10: Scalax

Why Is This Such a Big Problem?

Diagram: MyApplication depends on DustyLegacyLib (too old, can’t rebuild), which was built against Scala Library 2.10. Between Scala Library 2.10 and 2.11, Seq.scala has a binary incompatible source change.

✗ Can’t upgrade to Scala 2.11!

Page 11: Scalax

Not Just A Problem with Scala-Library

Diagram: MyApplication depends on DustyLegacyLib (can’t rebuild), which was built against Akka 3.2. Between Akka 3.2 and 3.3, Actor.scala has a binary incompatible source change.

✗ Can’t upgrade to Akka 3.3!

Page 12: Scalax

Not Just A Problem with Scala-Library

Diagram: MyApplication depends on DustyLegacyLib (can’t rebuild), which was built against shapeless 2.0. Between shapeless 2.0 and 2.1, unions.scala has a binary incompatible source change.

✗ Can’t upgrade to shapeless 2.1!

Page 13: Scalax

Dealing With It So Far

The “MiMa” tool can detect binary incompatibilities.

Scala release policy:

–  Minor versions need to be (forwards and backwards) binary compatible.
–  Major versions are allowed to break binary compatibility.
–  Major versions are released rarely (18+ months between them).

Problem:

–  3rd-party libraries need similar policies but often don’t enforce them.
–  Innovation is stifled.
–  Simple fixes have to wait a long time to get in.
–  Lots of dev cycles are spent on dealing with binary compatibility.
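For a library that wants a similar policy, MiMa is typically wired into an sbt build roughly as follows. This is a configuration sketch assuming the sbt-mima-plugin; the organization, module name and version numbers are placeholders.

```scala
// project/plugins.sbt: replace <version> with a current sbt-mima-plugin release
addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "<version>")

// build.sbt: the released artifact(s) the current code must stay compatible with
mimaPreviousArtifacts := Set("com.example" %% "mylib" % "1.0.0")
```

Running `sbt mimaReportBinaryIssues` then compares the freshly compiled classes against that artifact and reports problems such as removed methods or changed signatures.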

Page 14: Scalax

What Do Others Do?

Java:

•  Language close to JVM bytecode.
•  Innovation happens on the JVM level.
   –  Either in the JVM itself or through reflection.
   –  E.g. Java 8 lambdas, default methods.
•  Libraries are frozen when they appear.
   –  E.g. java.util.Date
•  Language is restricted in terms of extensibility.
   –  E.g. interface1, interface2, ..., interface7 in Eclipse.

Page 15: Scalax

What Do Others Do?

OSGi: allow multiple versions of a library in an application.

•  Fragile, requires serious classloader magic.
•  Few frameworks beyond Eclipse have bought in.

Diagram: DustyLegacyLib keeps using Scala Library 2.10, while MyApplication is rebuilt against Scala Library 2.11, both in the same application.

Page 16: Scalax

What Do Others Do?

C/C++:

•  Relies on the linker for more flexibility in interfaces.
•  Not that great a story either (cf. DLL Hell).

Page 17: Scalax

What Do Others Do?

Clojure: •  Builds from source.

Page 18: Scalax

What Do Others Do?

JavaScript: •  Builds from source.

Page 19: Scalax

What Do Others Do?

Python: •  Builds from source.

Page 20: Scalax

What Do Others Do?

Ruby: •  Builds from source.

Page 21: Scalax

What Do Others Do?

Go: •  Builds from source.

Page 22: Scalax

Why Can’t Scala Build from Source?

No standard build tool: should we standardize on SBT, Gradle, Maven, Ivy, Ant?

Reproducible builds are rare. Chicken-and-egg problem: because everyone is used to binary builds, nobody* invests in making builds reproducible.

*Not quite true: Typesafe has invested in the community build, which can now build more than 1M lines of community projects. But it’s a huge effort.

Page 23: Scalax

What We Need

•  An interchange format that captures the essence of Scala dependencies.

•  This cannot be the JVM bytecode format

•  Nor can it be source


Page 24: Scalax

The Idea

Use Typed Trees as an interchange format.

–  More robust than source.

–  More stable than JVM bytecode.

–  Efficient?

Page 25: Scalax

New Compiler Pipeline

Frontend: Source → Parser → Typer → SyntheticMethods → SuperAccessors → Typed Trees

Backend: Typed Trees → RefChecks → ElimRepeated → ElimLocals → ExtensionMethods → TailRec → PatternMatcher → ExplicitOuter → Erasure → Mixin → Memoize → LazyVals → CapturedVars → Constructors → LambdaLift → Flatten → RestoreScopes → Cleanup → GenBCode → Bytecode

Page 26: Scalax

How To Build

Each unit is stored as class file | typed tree | hash.

Before:

    C.class | CTree | HashC1
    B.class | BTree | HashB1
    A.class | ATree | HashA1

After a change to C.scala:

    C.class | CTree | HashC2   (source change)
    B.class | BTree | HashB2   (rebuild from BTree)
    A.class | ATree | HashA2   (rebuild)
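One way to read this diagram, as a minimal sketch: after an edit, the build compares old and new hashes to decide which units must go back through the frontend and which only need the backend re-run on their stored trees. Treating a unit's hash as covering its own tree plus its dependencies' hashes is an assumption made here to match the diagram, where all three hashes change; `CompUnit`, `TreeBuild` and `plan` are hypothetical names.

```scala
import scala.util.hashing.MurmurHash3

// One compilation unit: a hash of its own typed tree plus the units it depends on.
final case class CompUnit(name: String, treeHash: Int, deps: List[String])

object TreeBuild {
  // Assumption: a unit's stored hash covers its own tree and its dependencies'
  // hashes, so a change propagates to every unit above it. Graph must be acyclic.
  def combinedHashes(units: Map[String, CompUnit]): Map[String, Int] = {
    val memo = scala.collection.mutable.Map.empty[String, Int]
    def hashOf(name: String): Int = memo.get(name) match {
      case Some(h) => h
      case None =>
        val u = units(name)
        val h = MurmurHash3.orderedHash(u.treeHash :: u.deps.map(hashOf))
        memo(name) = h
        h
    }
    units.keys.foreach(hashOf)
    memo.toMap
  }

  // Edited units go back through the frontend; units whose combined hash moved
  // only need the backend run again on their stored typed tree.
  def plan(changedSources: Set[String],
           oldHashes: Map[String, Int],
           units: Map[String, CompUnit]): Map[String, String] = {
    val newHashes = combinedHashes(units)
    units.keys.flatMap { name =>
      if (changedSources(name)) Some(name -> "recompile from source")
      else if (oldHashes.get(name) != newHashes.get(name)) Some(name -> "rebuild from tree")
      else None
    }.toMap
  }
}

object TreeBuildExample {
  val before: Map[String, CompUnit] = Map(
    "C" -> CompUnit("C", 1, Nil),
    "B" -> CompUnit("B", 2, List("C")),
    "A" -> CompUnit("A", 3, List("B")),
    "D" -> CompUnit("D", 4, Nil))
  // C.scala is edited, so C gets a new typed tree (and tree hash).
  val after: Map[String, CompUnit] = before.updated("C", CompUnit("C", 42, Nil))
  val result: Map[String, String] =
    TreeBuild.plan(Set("C"), TreeBuild.combinedHashes(before), after)
}
```

The unrelated unit `D` keeps its hash and is left alone; only the chain C, B, A is touched.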

Page 27: Scalax

More Robust Than Source

Frontend: Source → Parser → Typer → SyntheticMethods → SuperAccessors → Typed Trees

The frontend has to:

•  Resolve names
   –  scan packages
   –  handle imports
   –  establish implicit scopes
•  Resolve overloading
•  Find implicits
•  Apply conversions
•  Infer type parameters
•  Assign types to trees

A lot can go wrong here! (5374 lines)

Page 28: Scalax

More Robust Than Source

Backend: Typed Trees → RefChecks → ElimRepeated → ElimLocals → ExtensionMethods → TailRec → PatternMatcher → ExplicitOuter → Erasure → Mixin → Memoize → LazyVals → CapturedVars → Constructors → LambdaLift → Flatten → RestoreScopes → Cleanup → GenBCode → Bytecode

The backend only has to:

•  Assign types (311 lines)
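A toy version of why the backend's job is so much smaller: when leaves already carry their types, the type of every inner node follows from its children by a local rule, with no name resolution, overloading, implicit search or inference. This is a sketch over a made-up mini language (`Ty`, `Expr`, `TypeAssigner`), not the compiler's actual TypeAssigner.

```scala
// Types and trees of a tiny illustrative language.
sealed trait Ty
case object IntTy extends Ty
case object BoolTy extends Ty
final case class FunTy(param: Ty, result: Ty) extends Ty

sealed trait Expr
final case class Lit(value: Int) extends Expr            // leaf: always Int here
final case class Ref(name: String, tpe: Ty) extends Expr  // leaf: type stored with it
final case class App(fn: Expr, arg: Expr) extends Expr    // inner node: type derived

object TypeAssigner {
  // Purely local: each node's type is computed from its children's types.
  def assign(e: Expr): Ty = e match {
    case Lit(_)      => IntTy
    case Ref(_, tpe) => tpe
    case App(fn, arg) =>
      (assign(fn), assign(arg)) match {
        case (FunTy(p, r), a) if p == a => r
        case (f, a) => throw new IllegalArgumentException(s"cannot apply $f to $a")
      }
  }
}
```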

Page 29: Scalax

More Resilient Than Bytecode

Can
–  add fields and methods to traits
–  add lazy vals anywhere
–  change the compilation scheme in any way necessary.
None of these would be binary compatible!

Can also
–  add or remove implicits
–  add methods anywhere
–  change imports
All of these could be source incompatible!

Page 30: Scalax

Efficient?

Can typed trees be efficient enough to build million+ line systems? Possible issues:

•  Size of trees
   –  on disk
   –  in memory
•  Transformation time

Page 31: Scalax

Potential Issue: Tree Size

    xs.filter(_ >= 0)        (17 chars)

becomes a tree of Apply, Select, Ident, Block, DefDef, ValDef, TypeTree and Literal nodes, linked by :: and Nil list cells (names "xs", "filter", "anonfun", "$x", ">="; parameter type Int; constant 0).

16 nodes, not counting types.

Page 32: Scalax

Back of the Envelope Calculation:

16 nodes × 32 bytes average node size = 512 bytes total. Double that to include type info ⇒ 16 bytes of source → 1 KB of tree (a factor 64 blow-up). For a 1M-line system: 30 MB of source → 2 GB of trees.
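The same estimate, spelled out step by step. All inputs are the slide's numbers; the object name is made up.

```scala
// Back-of-the-envelope tree size estimate, using the numbers from the slides.
object TreeSizeEstimate {
  val nodes = 16
  val bytesPerNode = 32
  val untypedTreeBytes = nodes * bytesPerNode // 512
  val treeBytes = untypedTreeBytes * 2        // 1024: doubled to include type info
  val sourceBytes = 16                        // the slide's figure for xs.filter(_ >= 0)
  val blowUp = treeBytes / sourceBytes        // 64

  val systemSourceMB = 30L                    // ~1M lines of source
  val systemTreeMB = systemSourceMB * blowUp  // 1920 MB, i.e. roughly 2 GB
}
```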


Page 33: Scalax

A More Compact Representation

    Apply (34)
      SelectTermWithSig (9)
        Ident (3)
          "xs"
        "filter"
        "Function1 -> Boolean"
      Closure (23)
        ParamDef (7)
          "x"
          TypeRef "scala.Int"
        Apply (14)
          SelectTermWithSig (9)
            Ident (3)
              "x"
            ">="
            "Integer -> Boolean"
          Literal (3)
            0

Still navigable, because inner nodes contain the size of the whole subtree derived from them.

Types or symbols are given at the leaves.

Types of inner nodes are reconstituted using the TypeAssigner.
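The navigation trick can be sketched with a simple token encoding. This is an illustrative format, not the compact representation on the slide: a leaf is two tokens, and an inner node records how many tokens its children occupy, so a reader can jump over a whole subtree without decoding it. `Tree`, `Leaf`, `Node` and `SizePrefixed` are hypothetical names.

```scala
// Illustrative size-prefixed encoding: a leaf is "L", name; an inner node is
// "N", tag, n followed by its children, where n counts the children's tokens.
sealed trait Tree
final case class Leaf(name: String) extends Tree
final case class Node(tag: String, children: List[Tree]) extends Tree

object SizePrefixed {
  def encode(t: Tree): Vector[String] = t match {
    case Leaf(name) => Vector("L", name)
    case Node(tag, children) =>
      val body = children.toVector.flatMap(encode)
      Vector("N", tag, body.length.toString) ++ body
  }

  // Index just past the subtree starting at `pos`: O(1), no decoding needed.
  def skip(buf: Vector[String], pos: Int): Int = buf(pos) match {
    case "L"   => pos + 2
    case "N"   => pos + 3 + buf(pos + 2).toInt
    case other => throw new IllegalArgumentException(s"bad token $other at $pos")
  }

  // Tags or names of the direct children of the node at `pos`, found by skipping.
  def childHeads(buf: Vector[String], pos: Int): List[String] = {
    val end = skip(buf, pos)
    var p = pos + 3
    val heads = List.newBuilder[String]
    while (p < end) {
      heads += buf(p + 1)
      p = skip(buf, p)
    }
    heads.result()
  }
}

object SizePrefixedExample {
  // Roughly the shape of xs.filter(x => x >= 0), with types left out.
  val tree: Tree = Node("Apply", List(
    Node("Select", List(Leaf("xs"), Leaf("filter"))),
    Node("Closure", List(
      Leaf("x"),
      Node("Apply", List(Node("Select", List(Leaf("x"), Leaf(">="))), Leaf("0")))))))
  val buf: Vector[String] = SizePrefixed.encode(tree)
}
```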

Page 34: Scalax

Speed

Transformation + bytecode generation amounts to ~60% of total compile time. We can speed this up by

–  fusing phases, reducing the amount of intermediate trees,
–  using a fast type assigner instead of a slow typer,
–  building different files in parallel.

Besides, we can use incremental compilation.

–  Compile only those units that depend on changed libraries.
–  Need to do that only once.
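A toy illustration of what fusing phases can mean, similar in spirit to the miniphases of the dotty compiler but not its actual API: each phase is a per-node rewrite applied bottom-up, and the fused version runs all of them during a single traversal instead of building one intermediate tree per phase. `Term`, `Fusion` and `FusionExample` are made-up names.

```scala
// A tiny tree type for the illustration.
sealed trait Term
final case class Num(n: Int) extends Term
final case class Plus(l: Term, r: Term) extends Term

object Fusion {
  // A phase rewrites one node whose children have already been transformed.
  type NodeRewrite = Term => Term

  // One bottom-up traversal applying `f` to every node.
  def traverse(t: Term, f: NodeRewrite): Term = t match {
    case Plus(l, r) => f(Plus(traverse(l, f), traverse(r, f)))
    case leaf       => f(leaf)
  }

  // Unfused: one full traversal (and one intermediate tree) per phase.
  def unfused(t: Term, phases: List[NodeRewrite]): Term =
    phases.foldLeft(t)((tree, phase) => traverse(tree, phase))

  // Fused: a single traversal that runs all phases, in order, at each node.
  def fused(t: Term, phases: List[NodeRewrite]): Term =
    traverse(t, phases.foldLeft((x: Term) => x)((acc, phase) => acc.andThen(phase)))
}

object FusionExample {
  val double: Fusion.NodeRewrite = {
    case Num(n) => Num(2 * n)
    case other  => other
  }
  val fold: Fusion.NodeRewrite = {
    case Plus(Num(a), Num(b)) => Num(a + b)
    case other                => other
  }
  val phases: List[Fusion.NodeRewrite] = List(double, fold)
  val input: Term = Plus(Plus(Num(1), Num(2)), Num(3))
}
```

Fusion like this is only safe when each phase's rewrite at a node depends just on that node and its already-transformed children, as is the case for `double` and `fold`.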


Page 35: Scalax

Other Benefits

Optimization
–  Typed trees are a great format for interprocedural analyses.
–  Inlining across compilation units made simple.
–  Inlining without binary compatibility issues.

Program Analysis
–  Typed trees are close to source, but easy to traverse.
–  Ideal for context-dependent program analyses such as FindBugs.
–  Ideal for instrumentation.

Portability
–  Typed trees allow retargeting to different backends, as long as dependencies exist.
–  Allow libraries to be used on JVM, JS, LLVM... without needing explicit recompilation.

Page 36: Scalax

Common Intermediate Format

Diagram: dotc Frontend and scalac Frontend, connected through the common intermediate format to the New Backend and the Old Backend (GenBCode), producing Bytecode.

Page 37: Scalax

Conclusion

Typed trees can fix the binary compatibility problem, and they offer lots of other benefits, too.

Let’s start the work to make them real!