Polyglot: An Extensible Compiler Framework for Java

Nathaniel Nystrom, Michael R. Clarkson, Andrew C. Myers
Cornell University


Language extension

• Language designers often create extensions to existing languages
    • e.g., C++, PolyJ, GJ, Pizza, AspectJ, Jif, ArchJava, ESCJava, Polyphonic C#, ...
• Want to reuse existing compiler infrastructure as much as possible
• Polyglot is a framework for writing compiler extensions for Java


Requirements

• Language extension
    • Modify both syntax and semantics of the base language
    • Not necessarily backward compatible
• Goals:
    • Easy to build and maintain extensions
    • Extensibility should be scalable
    • No code duplication
    • Compilers for language extensions should be open to further extension


Rejected approaches

• In-place modification
• Macro languages
    • Limited to syntax extensions
    • Semantic checks after macro expansion

[Figure: in-place modification. Base compiler 1.0 is copied and modified to produce extension compiler 1.0. After bug fixes and upgrades yield base compiler 2.0, the copy-and-modify step, or the bug fixes and upgrades, must be done again to reach extension compiler 2.0.]


Polyglot

• Base compiler is a complete Java front end
    • 25K lines of Java
    • Name resolution, inner class support, type checking, exception checking, uninitialized variable analysis, unreachable code analysis, ...
• Can reuse and extend through inheritance


Scalable extensibility

• Most compiler passes are sparse:

[Figure: matrix of passes (name resolution, type checking, exception checking, constant folding) against AST nodes (+, if, x, e.f, =)]

Changes to the compiler should be proportional to changes in the language.


Non-scalable approaches

Using                                   Easy to add or modify
                                        Passes    AST nodes
Pass as AST node method ("naive OO")    no        yes
Visitors                                yes       no
Polyglot                                yes       yes


Polyglot architecture

Base Polyglot compiler:
    Java source → Java parser → AST rewriting passes → Code generator → Java target

Extension compiler:
    Ext source → Ext parser → AST rewriting passes → Code generator → Java target

Further extension:
    Ext2 source → Ext2 parser → AST rewriting passes → Code generator → Java target


Architecture details

• Parser written using PPG
    • Adds grammar inheritance to Java CUP
• AST nodes constructed using a node factory
    • Decouples node types from implementation
• AST rewriting passes:
    • Each pass lazily creates a new AST
    • From naive OO: traverse AST invoking a method at each node
    • From visitors: AST traversal factored out
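The node factory idea can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions: every name here (NodeFactoryDemo, Expr, NodeFactory, BaseNodeFactory, PrefixNodeFactory) is hypothetical and is not Polyglot's actual API. The point is only that client code asks the factory for nodes and never names a concrete node class, so an extension can substitute its own implementation.

```java
// Hypothetical names throughout; a sketch of the pattern, not Polyglot's API.
class NodeFactoryDemo {
    // What clients see of an expression node.
    interface Expr {
        String show();
    }

    // Parser actions and passes create nodes only through this interface.
    interface NodeFactory {
        Expr intLit(int value);
        Expr add(Expr lhs, Expr rhs);
    }

    // Base-language implementations of the node types.
    static class BaseNodeFactory implements NodeFactory {
        public Expr intLit(int value) {
            return () -> Integer.toString(value);
        }
        public Expr add(Expr lhs, Expr rhs) {
            return () -> "(" + lhs.show() + " + " + rhs.show() + ")";
        }
    }

    // An extension replaces one node implementation and inherits the rest.
    static class PrefixNodeFactory extends BaseNodeFactory {
        @Override
        public Expr add(Expr lhs, Expr rhs) {
            return () -> "add(" + lhs.show() + ", " + rhs.show() + ")";
        }
    }

    // Client code is written once against the factory interface.
    static Expr build(NodeFactory nf) {
        return nf.add(nf.intLit(1), nf.intLit(2));
    }

    public static void main(String[] args) {
        System.out.println(build(new BaseNodeFactory()).show());   // (1 + 2)
        System.out.println(build(new PrefixNodeFactory()).show()); // add(1, 2)
    }
}
```

The same build method yields different node implementations depending on which factory is passed in, which is the decoupling the slide refers to.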


Example: PAO

• Primitive types as subclasses of Object
• Changes type system, relaxes Java syntax
• Implementation: insert boxing and unboxing code where needed

PAO source:

    HashMap m;
    m.put("two", 2);
    int v = (int) m.get("two");

Translated Java:

    HashMap m;
    m.put("two", new Integer(2));
    int v = ((Integer) m.get("two")).intValue();


PAO implementation

• Modify parser and type-checking pass to permit e instanceof int
• Parser changes with PPG:

    include "java.cup"

    drop { rel_expr ::= rel_expr INSTANCEOF ref_type }

    extend rel_expr ::= rel_expr:a INSTANCEOF type:b
        {: RESULT = node_factory.Instanceof(a, b); :}

• Add one new pass to insert boxing and unboxing code
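To make the boxing pass concrete, here is a rough sketch over a toy AST. It is an illustration only: the classes (BoxingPassDemo, Expr, IntLit, StringLit, Box, Call) are hypothetical and far simpler than a real compiler's. It also assumes every call parameter has type Object and that int is the only primitive, which is enough to reproduce the m.put example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mini-AST for illustration; not Polyglot's real classes.
class BoxingPassDemo {
    static abstract class Expr {
        final boolean primitive;   // true if the static type is a primitive such as int
        Expr(boolean primitive) { this.primitive = primitive; }
        abstract String show();
    }

    static class IntLit extends Expr {
        final int value;
        IntLit(int value) { super(true); this.value = value; }
        String show() { return Integer.toString(value); }
    }

    static class StringLit extends Expr {
        final String value;
        StringLit(String value) { super(false); this.value = value; }
        String show() { return "\"" + value + "\""; }
    }

    // Boxing node: wraps a primitive expression e as new Integer(e).
    static class Box extends Expr {
        final Expr inner;
        Box(Expr inner) { super(false); this.inner = inner; }
        String show() { return "new Integer(" + inner.show() + ")"; }
    }

    // A call whose parameters are all assumed to have type Object.
    static class Call extends Expr {
        final String receiver, method;
        final List<Expr> args;
        Call(String receiver, String method, List<Expr> args) {
            super(false);
            this.receiver = receiver;
            this.method = method;
            this.args = args;
        }
        String show() {
            List<String> parts = new ArrayList<>();
            for (Expr a : args) parts.add(a.show());
            return receiver + "." + method + "(" + String.join(", ", parts) + ")";
        }
    }

    // The pass: build a new Call in which every primitive argument is boxed.
    static Call insertBoxing(Call call) {
        List<Expr> boxed = new ArrayList<>();
        for (Expr a : call.args) boxed.add(a.primitive ? new Box(a) : a);
        return new Call(call.receiver, call.method, boxed);
    }

    static String demo() {
        Call put = new Call("m", "put",
                Arrays.asList(new StringLit("two"), new IntLit(2)));
        return insertBoxing(put).show();
    }

    public static void main(String[] args) {
        System.out.println(demo());   // m.put("two", new Integer(2))
    }
}
```

The pass returns a new node rather than mutating the old one, in the spirit of passes that create a new AST.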


Implementing a new pass

• Want to extend Node interface with rewrite() method
    • Default implementation: identity translation
    • Specialized implementations: boxing and unboxing
• Mixin extensibility: extensions to a base class should be inherited by subclasses

[Figure: class hierarchy. Node has typeCheck() and codeGen(); its subclasses If (fields cond, then, else) and Add (fields lhs, rhs) each have typeCheck() and codeGen(). In the desired result, rewrite() is added to Node and also appears in If and Add.]


Inheritance is inadequate

[Figure: the base hierarchy (Node, If, Add, each with typeCheck() and codeGen()) next to the extended hierarchy (PaoNode, PaoIf, PaoAdd), in which each class also has rewrite().]
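The problem can be reproduced with a few stub classes. The sketch below is an assumption-laden illustration: the class names follow the diagram, but the method bodies and the hasRewrite helper are invented for the demo. Adding rewrite() in a subclass PaoNode does not reach If or Add, because they extend Node rather than PaoNode, so PaoIf and PaoAdd must each repeat the default.

```java
// Hypothetical classes mirroring the diagram; bodies are placeholders.
class InheritanceDemo {
    // Base compiler hierarchy.
    static class Node {
        public String typeCheck() { return "typeCheck"; }
    }
    static class If extends Node { }
    static class Add extends Node { }

    // The extension adds rewrite() in a subclass of Node ...
    static class PaoNode extends Node {
        public String rewrite() { return "identity"; }
    }
    // ... but If and Add extend Node, not PaoNode, so every extended
    // node class has to repeat the default implementation.
    static class PaoIf extends If {
        public String rewrite() { return "identity"; }   // duplicated code
    }
    static class PaoAdd extends Add {
        public String rewrite() { return "identity"; }   // duplicated code
    }

    // Reports whether class c declares or inherits a public rewrite() method.
    static boolean hasRewrite(Class<?> c) {
        try {
            c.getMethod("rewrite");
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasRewrite(If.class));       // false
        System.out.println(hasRewrite(PaoNode.class));  // true
        System.out.println(hasRewrite(PaoIf.class));    // true, but only via a copy
        // PaoIf is not a PaoNode, so nothing is shared between them:
        System.out.println(PaoNode.class.isAssignableFrom(PaoIf.class)); // false
    }
}
```

With single inheritance, the number of duplicated methods grows with the number of node classes, which conflicts with the no-code-duplication goal.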


Inheritance is inadequate (continued)

[Figure: the same two hierarchies at full scale. Each of the many node classes under Node, with typeCheck() and codeGen(), has a parallel class under PaoNode with rewrite(), typeCheck(), and codeGen().]


Extension objects

Use composition to mix methods and fields into AST node classes.

[Figure: Node, If (cond, then, else), and Add (lhs, rhs) each have typeCheck(), codeGen(), and an ext field. The ext field points to a PaoExt object that provides rewrite() and has its own ext field, which is null. PAO extension objects are installed into all nodes by the node factory.]


Extension objects

Extension objects have their own ext field to leave the extension open to further extension.

[Figure: Node, If, and Add each have an ext field pointing to a PaoExt object with rewrite(). PaoExt's ext field points to a further extension object holding ext_type_info and typeCheck(), whose own ext field is null.]
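A minimal sketch of the extension-object pattern follows. The field name ext and the class name PaoExt come from the slides; everything else (ExtensionObjectDemo, PaoAddExt, PaoNodeFactory, the kind field, the string results) is hypothetical scaffolding for the demo, not Polyglot's actual API.

```java
// Hypothetical sketch of the pattern; not Polyglot's real API.
class ExtensionObjectDemo {
    // Base compiler node classes: the only hook they need is an ext field.
    static class Node {
        final String kind;
        PaoExt ext;                 // extension object, or null
        Node(String kind) { this.kind = kind; }
    }
    static class If extends Node { If() { super("If"); } }
    static class Add extends Node { Add() { super("Add"); } }

    // Extension object: carries the methods mixed into a node.
    static class PaoExt {
        Node node;                  // the node this object is attached to
        Object ext;                 // stays null here; a further extension could chain on
        String rewrite() { return node.kind + ": identity"; }   // default
    }

    // Specialized implementation for one node type.
    static class PaoAddExt extends PaoExt {
        @Override
        String rewrite() { return node.kind + ": insert boxing"; }
    }

    // The extension's node factory installs an extension object into every node.
    static class PaoNodeFactory {
        private <N extends Node> N attach(N n, PaoExt e) {
            e.node = n;
            n.ext = e;
            return n;
        }
        If newIf() { return attach(new If(), new PaoExt()); }
        Add newAdd() { return attach(new Add(), new PaoAddExt()); }
    }

    static String demo() {
        PaoNodeFactory nf = new PaoNodeFactory();
        return nf.newIf().ext.rewrite() + "; " + nf.newAdd().ext.rewrite();
    }

    public static void main(String[] args) {
        System.out.println(demo());   // If: identity; Add: insert boxing
    }
}
```

If and Add are untouched, yet both gain rewrite(): the default is written once in PaoExt and overridden only where a node needs special treatment.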


Method invocation

• A method may be implemented in the node or in any one of several extension objects.
    • Extension should call node.ext.ext.typeCheck()
    • Base compiler should call node.typeCheck()
    • Cannot hardcode the calls

[Figure: Node (codeGen(), typeCheck(), ext) points to PaoExt (rewrite(), ext), which points to a further extension object (ext_type_info, typeCheck(), ext), whose ext field is null.]


Delegate objects

• Each node and extension object has a del field
• Delegate object implements same interface as node or ext
• Directs call to appropriate method implementation
    • Ex: node.del.typeCheck()
    • Ex: node.ext.del.rewrite()
• Run-time overhead < 2%

[Figure: Node (del, ext, codeGen(), typeCheck()) points through ext to PaoExt (rewrite(), ext, del), which points to a further extension object (ext_type_info, typeCheck(), ext, del) whose ext field is null. Node's del field points to a delegate object with typeCheck() { node.ext.ext.typeCheck() } and codeGen() { node.codeGen() }. The default delegate is JavaDel.]
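The dispatch through del can be sketched as follows. The names del, ext, and JavaDel appear on the slide; the rest (DelegateDemo, Ops, ExtDel, the string results) is assumed for illustration and should not be read as Polyglot's real interfaces. Callers always go through node.del, and the delegate decides which implementation runs.

```java
// Hypothetical sketch of delegate dispatch; not Polyglot's real API.
class DelegateDemo {
    // Operations that compiler passes invoke on a node.
    interface Ops {
        String typeCheck();
        String codeGen();
    }

    static class Ext {
        Ext ext;                                  // next extension object, or null
        String typeCheck() { return "extension typeCheck"; }
    }

    static class Node {
        Ops del = new JavaDel(this);              // default delegate
        Ext ext;
        String typeCheck() { return "base typeCheck"; }
        String codeGen() { return "base codeGen"; }
    }

    // Default delegate: forwards every call to the node's own methods.
    static class JavaDel implements Ops {
        final Node node;
        JavaDel(Node node) { this.node = node; }
        public String typeCheck() { return node.typeCheck(); }
        public String codeGen() { return node.codeGen(); }
    }

    // Extension delegate: redirects typeCheck() into the extension chain
    // and keeps the inherited forwarding for codeGen().
    static class ExtDel extends JavaDel {
        ExtDel(Node node) { super(node); }
        @Override
        public String typeCheck() { return node.ext.ext.typeCheck(); }
    }

    // A node as an extension's node factory might configure it.
    static Node extendedNode() {
        Node n = new Node();
        n.ext = new Ext();
        n.ext.ext = new Ext();
        n.del = new ExtDel(n);
        return n;
    }

    public static void main(String[] args) {
        Node plain = new Node();
        Node extended = extendedNode();
        System.out.println(plain.del.typeCheck());     // base typeCheck
        System.out.println(extended.del.typeCheck());  // extension typeCheck
        System.out.println(extended.del.codeGen());    // base codeGen
    }
}
```

Because the call site is the same in both cases, neither the base compiler nor the extension hardcodes where typeCheck() is implemented.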


Scalable extensibility

• To add a new pass:
    • Use an extension object to mix in a default implementation of the pass for the Node base class
    • Use extension objects to mix in specialized implementations as needed
• To change the implementation of an existing pass:
    • Use a delegate object to redirect to the method providing the new implementation
• To create an AST node type:
    • Create a new subclass of Node
    • Or, mix new fields into an existing node using an extension object


Polyglot family tree

[Figure: tree rooted at Polyglot base (Java), containing the extensions parameterized types, Coffer, PolyJ, Jif, Jif/split, PAO, JMatch, and covariant return.]


Results

• Can build small extensions in hours or days
• 10% of base code is interfaces and factories

Extension               # Tokens    % of Base
Polyglot base (Java)    166K        100
Jif                     129K        78
JMatch                  108K        65
Jif/split               99K         60
PolyJ                   79K         48
Coffer                  24K         14
PAO                     6.1K        3.6
parameterized types     3.2K        2
covariant return        1.6K        1
javac 1.1               132K        80


Related work

• Other extensible compilers
    • e.g., CoSy, SUIF
    • e.g., JastAdd, JaCo
• Macros
    • e.g., EPP, Java Syntax Extender, Jakarta
    • e.g., Maya
• Visitors
    • e.g., staggered visitors, extensible visitors


Conclusions

• Several Java extensions have been implemented with Polyglot
• Programmer effort scales well with size of difference with Java
• Extension objects and delegate objects provide scalable extensibility
• Download from: http://www.cs.cornell.edu/projects/polyglot


Acknowledgments

Brandon Bray        JMatch
Michael Brukman     PPG
Steve Chong         Jif, Jif/split, covariant return
Matt Harren         JMatch
Aleksey Kliger      JLtools, PolyJ
Jed Liu             JMatch
Naveen Sastry       JLtools
Dan Spoonhower      JLtools
Steve Zdancewic     Jif, Jif/split
Lantian Zheng       Jif, Jif/split

http://www.cs.cornell.edu/projects/polyglot


Questions?