Polyglot An Extensible Compiler Framework for Java

  • Published on
    30-Dec-2015

  • View
    25

  • Download
    0

DESCRIPTION

Polyglot An Extensible Compiler Framework for Java. Nathaniel Nystrom Michael R. Clarkson Andrew C. Myers Cornell University. Language extension. Language designers often create extensions to existing languages e.g., C++, PolyJ, GJ, Pizza, AspectJ, Jif, ArchJava, ESCJava, Polyphonic C#, ... - PowerPoint PPT Presentation

Transcript

  • PolyglotAn Extensible Compiler Framework for JavaNathaniel NystromMichael R. ClarksonAndrew C. Myers

    Cornell University

  • Language extensionLanguage designers often create extensions to existing languagese.g., C++, PolyJ, GJ, Pizza, AspectJ, Jif, ArchJava, ESCJava, Polyphonic C#, ...Want to reuse existing compiler infrastructure as much as possible

    Polyglot is a framework for writing compiler extensions for Java

  • RequirementsLanguage extensionModify both syntax and semantics of the base languageNot necessarily backward compatibleGoals:Easy to build and maintain extensionsExtensibility should be scalableNo code duplicationCompilers for language extensions should be open to further extension

  • Rejected approachesIn-place modification

    Macro languagesLimited to syntax extensionsSemantic checks after macro expansion

  • PolyglotBase compiler is a complete Java front end25K lines of JavaName resolution, inner class support, type checking, exception checking, uninitialized variable analysis, unreachable code analysis, ...Can reuse and extend through inheritance

  • Scalable extensibility

    Most compiler passes are sparse:AST NodesPassesChanges to the compiler should be proportional to changes in the language.

    +ifxe.f=name resolutiontype checkingexception checkingconstant folding

    +ifxe.f=name resolutiontype checkingexception checkingconstant folding

    +ifxe.f=name resolutiontype checkingexception checkingconstant folding

    +ifxe.f=name resolutiontype checkingexception checkingconstant folding

    +ifxe.f=name resolutiontype checkingexception checkingconstant folding

  • Non-scalable approachesVisitorspass as ASTnode method(naive OO)PolyglotEasy to add or modifyPassesAST nodesUsing

  • Polyglot architecture

  • Architecture detailsParser written using PPGAdds grammar inheritance to Java CUPAST nodes constructed using a node factoryDecouples node types from implementationAST rewriting passes:Each pass lazily creates a new ASTFrom naive OO: traverse AST invoking a method at each nodeFrom visitors: AST traversal factored out

  • Example: PAOPrimitive types as subclasses of ObjectChanges type system, relaxes Java syntaxImplementation: insert boxing and unboxing code where neededHashMap m;m.put(two, 2);int v = (int) m.get(two);HashMap m;m.put(two, new Integer(2));int v = ((Integer) m.get(two)).intValue();

  • PAO implementationModify parser and type-checking pass to permit e instanceof intParser changes with PPG:include java.cupdrop { rel_expr ::= rel_expr INSTANCEOF ref_type }extend rel_expr ::= rel_expr:a INSTANCEOF type:b {: RESULT = node_factory.Instanceof(a, b); :}Add one new pass to insert boxing and unboxing code

  • Implementing a new passWant to extend Node interface with rewrite() methodDefault implementation: identity translationSpecialized implementations: boxing and unboxingMixin extensibility: extensions to a base class should be inherited by subclassesNodeIfAdd

    typeCheck()codeGen()

    condthenelsetypeCheck()codeGen()

    lhsrhstypeCheck()codeGen()

    typeCheck()codeGen()rewrite()

    condthenelsetypeCheck()codeGen()rewrite()

    lhsrhstypeCheck()codeGen()rewrite()

  • Inheritance is inadequateNodeIfAddPaoNodePaoIfPaoAdd

    typeCheck()codeGen()

    condthenelsetypeCheck()codeGen()

    lhsrhstypeCheck()codeGen()

    typeCheck()codeGen()rewrite()

    condthenelsetypeCheck()codeGen()rewrite()

    lhsrhstypeCheck()codeGen()rewrite()

  • Inheritance is inadequateNodePaoNode

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

    typeCheck()codeGen()rewrite()

  • Extension objectsUse composition to mixin methods and fields into AST node classesNodeIfAddPaoExtPAO extension objects; installed into all nodes by node factorynull

    exttypeCheck()codeGen()

    extcondthenelsetypeCheck()codeGen()

    extlhsrhstypeCheck()codeGen()

    extrewrite()

  • Extension objectsExtension objects have their own ext field to leave extension openNodeIfAddPaoExtnull

    exttypeCheck()codeGen()

    extcondthenelsetypeCheck()codeGen()

    extlhsrhstypeCheck()codeGen()

    extrewrite()

    exttypeCheck()ext_type_info

  • Method invocationA method may be implemented in the node or in any one of several extension objects.Extension should call node.ext.ext.typeCheck()Base compiler should call: node.typeCheck()Cannot hardcode the callsNodePaoExtnull

    exttypeCheck()codeGen()

    extrewrite()

    exttypeCheck()ext_type_info

  • Delegate objectsEach node & extension object has a del fieldDelegate object implements same interface as node or extDirects call to appropriate method implementationEx: node.del.typeCheck()Ex: node.ext.del.rewrite()Run-time overhead < 2%

    NodePaoExt{ node.ext.ext.typeCheck() }{ node.codeGen() }JavaDelnull

    delexttypeCheck()codeGen()

    delextrewrite()

    delexttypeCheck()ext_type_info

    typeCheck()codeGen()

  • Scalable extensibilityTo add a new pass:Use an extension object to mixin default implementation of the pass for the Node base classUse extension objects to mixin specialized implementations as neededTo change the implementation of an existing passUse delegate object to redirect to method providing new implementationTo create an AST node type:Create a new subclass of NodeOr, mixin new fields to existing node using an extension object

  • Polyglot family treePolyglot base (Java)parameterizedtypesCofferPolyJJifPAOJif/splitJMatchcovariantreturn

  • ResultsCan build small extensions in hours or days10% of base code is interfaces and factories

    Extension# Tokens% of BasePolyglot base (Java)166K100 Jif129K78 JMatch108K65 Jif/split99K60 PolyJ79K48 Coffer24K14 PAO6.1K3.6 parameterized types3.2K2 covariant return1.6K1javac 1.1132K80

  • Related workOther extensible compilerse.g., CoSy, SUIFe.g., JastAdd, JaCoMacrose.g., EPP, Java Syntax Extender, Jakartae.g., MayaVisitorse.g., staggered visitors, extensible visitors

  • ConclusionsSeveral Java extensions have been implemented with PolyglotProgrammer effort scales well with size of difference with JavaExtension objects and delegate objects provide scalable extensibilityDownload from:http://www.cs.cornell.edu/projects/polyglot

  • Acknowledgmentshttp://www.cs.cornell.edu/projects/polyglot

    Brandon BrayJMatchMichael BrukmanPPGSteve ChongJif, Jif/split, covariant returnMatt Harren JMatchAleksey Kliger JLtools, PolyJJed LiuJMatchNaveen Sastry JLtoolsDan Spoonhower JLtoolsSteve ZdancewicJif, Jif/splitLantian ZhengJif, Jif/split

  • Questions?

  • Mixin extensibilityNodeIfAddInheritance does not provide mixin extensibility:when a base class is extended, subclasses shouldinherit the changes

    typeCheck()codeGen()

    condthenelsetypeCheck()codeGen()

    lhsrhstypeCheck()codeGen()

    typeCheck()codeGen()rewrite()

    condthenelsetypeCheck()codeGen()rewrite()

    lhsrhstypeCheck()codeGen()rewrite()

  • Other Polyglot featuresQuasi-quoting libraryUseful for translation from extension language AST to base language or intermediate language ASTqqStmt(if (%e.relabelsTo(%e)) %s; else %s;, new Object[] { L, Li, then_body, else_body });Automatic separate compilationSerialize type information and store in the ASTEncoded into the class file via javacExtracted from class file using reflectionData-flow analysis framework

  • PAO rewritingrewrite(ts) called for each AST node:

    class PaoExt extends Ext { Node rewrite(PaoTypeSystem ts) { return node(); } }

    class PaoInstanceofExt extends PaoExt { Node rewrite(PaoTypeSystem ts) { Instanceof e = (Instanceof) node(); Type rtype = e.compareType(); // e.g., e instanceof int e instanceof Integer if (rtype.isPrimitive()) return e.compareType(ts.boxedType(rtype)); else return n; } }

  • Node factoriesEach extension has a node factory (nf)To create a node of type T, call method nf.T()T() may return an extension-specific subclass of TT() attaches the extension and delegate objects to T via a call to extT()Mixin extensibility: if T is a subclass of S, then extT() will call extS()

  • ResultsCan build small extensions in hours, days10% of base compiler code is interfaces and factories

    ExtensionLexerParserTotal% of Basejavac 1.1 (excl. bytecode asm)119K72Polyglot base2.7K11K166K100Jif2.7K5.4K129K78JMatch2.7K14K108K65PolyJ2.7K3.5K79K48Coffer2.7K2.7K24K14PAO2.7K1696.1K3.6Param003.2K2covariant return001.6K1

  • Why not output bytecode?Wanted to be able to read the outputThe symmetry is satisfyingLimitations of Java as a target languageScoping rules sometimes make it difficult to output Java code, especially with inner classesLack of goto can make generated control flow inefficient

  • Hello. Im Nate Nystrom and Im going to talk about Polyglot, which isA framework we developed for writing compilers for extensions to Java.This is joint work with Michael Clarkson and Andrew Myers at Cornell.When creating a new language, language designers often base the new language on an existing one.The most famous example is probably C++, but there are thousands of research languages and domain specific languages, a few of which are listed here.When writing a compiler for these languages, we want as much as possible to reuse the existing compiler infrastructure. Polyglot is a framework that allows you to do this for extensions to Java.These extensions may modify both the syntax and semantics of the base language.For instance, an extension may add new statement forms or new types.But, these extensions are not necessarily backward compatible.For example for teaching purposes, we may wish to remove some of the more confusing features of the base language, inner classes for example. Compilers for extension languages should not only be easy to build and maintain, but also it should feel like a real compiler.Users of the compiler should not see whats going on under the hood.Furthermore, we want the framework to provide what we call scalable extensibility:A small change the base language should not require large amounts of code to implement.In particular, writing an extension should not require any code duplication.Finally, compilers produced by the framework should themselves be extensible.One obvious way to build a compiler for a language extension is to take a compiler for an existing language and modify it in-place.This approach has some serious maintenance problems.Because the code for the extension and base compiler diverge, bug fixes and other enhancements applied to the base compiler are not automatically picked up by the extension compiler.Another approach one might use is macro languages.These are great for writing syntactic extensions to the base language,But it is difficult to make large scale semantic changes to the language.Furthermore, macro languages do no semantic checking.Semantic checking is performed after macros are expanded, which may result in error messages that are difficult to understand.Polyglot does not have this problem because the framework provides a complete Java front end.Polyglot provides all the semantic checking needed to Java and extensions can reuse and extend this code.

    The primary challenge with designing Polyglot was to provide scalable extensibility.Most compiler passes are sparse, meaning each pass needs to interact with only a few abstract syntax nodes.For instance, a name resolution pass need only consider AST nodes that contain identifiers.An exception checking pass need only consider nodes that can throw or catch exceptions.Because passes are sparse, when we make add or modify an AST node, for instance the addition operator,We should not have to touch the code for all passes, but only the passes that interact with the modified node.Similarly if we were to add a new pass, say constant folding, we should not have to write code for AST nodes that cannot be folded.We should only have to write code for AST nodes that can be folded.In object-oriented languages there are several ways to implement a compiler.The most common approach is to use visitors.Visitors make it very easy to add or modify passes, but difficult to add new AST nodes.Because all existing visitors must be modified to support the new node.Another way to implement a compiler is to implement compiler passes as methods of the AST node.For instance, every AST node might have a typeCheck method and a codeGen method.Now, with this approach it is difficult to add a new pass,We have to modify all the existing node classes to add a new method for the pass.However, with this approach it is very easy to add new AST nodes or to extend existing ones.Polyglot takes this second approach and extends it to allow easy addition of new passes.So now Ill describe how Polyglot achieves these goals. First, as I said the Polyglot base compiler is a Java 1.4 frontendIt reads Java code and outputs Java code. First the extension writer needs to create a parser for their language, so weve implemented a preprocessor for JavaCUP called PPG that provides grammar inheritance. This lets you write a parser for a language that is mostly Java with very little code. Between parsing and code generation are a series of AST rewriting passes. Each pass takes an AST and produces a new, usually modified AST. For instance some passes resolve identifiers in the AST to their declarations; other passes compute and check semantic information, storing the results in the AST so that they can be passed to the next pass. Each pass performs one operation over the entire AST. This leads to a more modular programming style and allows extension writers to override behavior in a fine-grained manner. Each pass is implemented with a single AST rewriter object that invokes a method implemented for each AST node type. Ill describe how this work in a bit more detail shortly. To create an extension, the programmer extends classes in the base compiler to add, modify, or remove functionality. The extension will likely also need a pass that translates the extension language AST into a base language AST. Of course, we want extensions to be open, which means we can do this again. Note that in this case the second extension may translate its ASTs into ASTs for the first extension which in turn translates them into ASTs for the base language.

    Mention PPG, node factories, functional rewriting, one operation per pass more modular and easier to replacePolyglot provides a base Java compiler, written in Java.To extend this compiler, one must first write a parser for the extension language.Polyglot includes a parser generator called PPG which adds grammar inheritance to Java CUP.The parser creates an abstract syntax tree using a node factory.This factory allows implementations of AST nodes to be easily replacedSince the base compiler does not refer to AST node classes directly, but only to their types, or interfaces.The parser hands the AST off to a series of AST rewriting passes which traverse the AST invoking a method at each node that implements the pass.These methods may return a new AST node, and so the entire pass over the AST creates a new AST.The rewriting passes perform semantic analysis on the AST, inserting type information into the tree to be passed between passes.Extension writers may extend modify the sequence of passes run on the AST and since the AST is an extension language AST, not a Java AST the extension writer typically must write a pass that translates the AST into a Java AST.This AST is fed to a code generator that outputs Java code, on which a compiler such as javac is invoked to produce bytecode.As an example, here is an extension called PAO that allows primitive types such as int or float to be used as subclasses of Object.This extension changes the Java type systems and also relaxes Javas syntax.Since no new syntax is introduced, this language cannot be implemented using a macro language.The implementation is straightforward: after semantic checking with the new typing rules, PAO is translated to Java by inserting boxing and unboxing code as needed.In this example, we insert a mapping from the string two to the integer 2 into a hash table.We then extract the integer again. Since get returns an Object, we have to cast it to an int before the assignment.In the translation, the integer 2 is boxed before insertion into the hash table, then unboxed when it is cast to an int.

    To implement PAO, we must modify the Java parser and type checker to permit instanceof to be used on primitive types.Using PPG, we can start with the java parser, which is implemented using Java CUP.Then we drop Javas rule where instanceof can only be used on reference types.And add a rule that allows instanceof to be used with arbitrary types.We also create an AST node for the instanceof using the node factory.In addition, to the changes for instanceof, we also add a new pass to insert the boxing and unboxing code.To add this new translation pass, conceptually what we want to do is extend the Node interface with a new method called rewrite().The translation pass traverses the AST calling the rewrite() method to produce a new AST.Since most PAO terms are also Java terms, for most nodes the rewrite() method just returns the node itself.In other words it implements the identity translation.For other nodes like casts where boxing or unboxing code needs to be inserted, we write a specialized version of rewrite() that implements the translation.An important aspect of scalable extensibility that most object-oriented languages do not support is what we call mixin extensibility.Since compiler data structures are defined through inheritance, we would like changes made to the base class to automatically be inherited by subclasses.These changes include, not only adding new methods, but also new data fields.As well see in a few slides, our solution to providing mixin extensibility also provides scalable extensibility.

    To add this new rewriting pass, one might think you could use inheritance to add a method to do the translation.But, this doesnt scale.We can extend the root of the AST node class hierarchy by subclassing the Node class to produce a PaoNode class.This class implements the translation pass as a method called rewrite().The translation pass traverses the AST calling the rewrite() method to produce a new AST.Since most PAO terms are also Java terms, for most nodes the rewrite() method just returns the node itself.In other words it implements the identity translation.For other nodes like casts where boxing or unboxing code needs to be inserted, we write a specialized version of rewrite() that implements the translation.But then we have to go through each one of the subclasses of Node and multiply inherit from the new class and the existing class.Or, since were using Java, which doesnt have multiple inheritance, we have to go through all the subclasses and singly inherit and then write some delegation code to invoke the implementation in the PaoNode class.This may not seem too bad when there are only 2 subclasses of Node, but in Polyglot there are dozens of subclasses of Node.Creating dozens of new classes to add one new method does not scale.The solution we implement in Polyglot is to use composition rather than inheritance to mixin new methods and fields into the AST class hierarchy.In the Polyglot base compiler, every AST node has an ext field into which an extension can install an object implementing a new method or providing a new field.These objects are installed by the node factory with one line of code.Therefore to add a new method, the programmer has to implement an extension object class and write the line of code to install a new extension object into each new AST node.Of course, since we want language extensions to themselves be extensible, every extension object has its own ext field.Now, a method may be implemented in either the node or in any one of several extension objects.In this example, the typecheck method is implemented in both the node and in the second extension object.The code for the base compiler should invoke the method in the AST node class.The code for the language extension should invoke the method in the second extension object.These calls cannot be hardcoded.Therefore, we do the usual computer science trick and introduce another level of indirection.Each node and each extension object has a del field through which we invoke methods introduced by that object.The del field points to an object implementing the same interface as the node or extension object.This object contains code to dispatch the method to the appropriate implementation of the method.Typically, no delegate object is needed, so we can just point the del pointer back to the node or extension object itself.When we do have to override a method, the framework provides a default implementation of the delegate object that dispatches all calls back to the node.The programmer need only extend this default implementation and implement one method to redirect to the new implementation.Extension objects thus provide mixin extensibility.They also provide scalable extensibility.Heres how.To add a new pass, for instance the PAO rewriting pass, the programmer creates an extension object class for each node type that requires boxing or unboxing code in its translation to Java.The programmer also writes a single extension object class that implements the default behavior of the pass, in this case, the identity translation.Instances of this class are installed into the remaining node types.To change the implementation of an existing pass, the programmer need only install a delegate object to redirect the calls to the method to the new implementation.To create a new AST node type, the programmer just creates a new subclass of the Node root class,or mixes in new fields into an existing node type.Several extensions to Java have been implemented with Polyglot.PAO Ive already discussed.Some of these languages have been themselves extended.Jif is an extension to Java that protects privacy by controlling information flow.Jif and a few other languages are extensions of an extension that implements parameterized types.This extension is abstract.It does not implement a complete compiler, but provides some code used by in the compilers that extend it.The extensions define what the parameters are.In PolyJ, the parameters are types.In Jif, the parameters are security labels.Jif/split extends Jif to support distributed programs.Our experience is that small extensions like PAO can be implemented in a few days.One language that extends Java to allow method return types to be covariant in their class type was implemented in only a few hours.The extensions seem to scale well as the language differs from Java.As a measure of the programming effort involved, we measured the code size of the extensions in Java tokens.We use tokens rather than line counts to avoid some of the bias due to coding style.Note that Polyglot is a bit larger than Suns java compiler, javac.Much of this difference is because Polyglot makes extensive use of interfaces and factories, which are not used in javac since javac was not designed to be extensible.The larger extensions, Jif, PolyJ, and Jmatch, make substantial changes to the Java type system.Most of the Jif code consists of new type constructs and a new non-sparse semantic checking pass.Much of the PolyJ and Jmatch code is code for translating the language to Java.The smaller extensions like PAO and covariant return can be implemented with small amounts of code.There is quite a bit of related work in this area.In general, most of this work does not focus on language extension, but rather addition of new optimization or analysis passes.Some of these systems use language features not found in Java to be extensible, such as aspects.Most of this work does not consider the mixin extensibility problem.

    In conclusion, Polyglot has been successfully used to implement several Java extensions. The programmer effort required to implement a language extension seems to scale well with the difference between the extension and Java. We use extension objects and delegate objects to achieve this scalability.Finally, Polyglot is available for download from this site and youre welcome to try it out.An important aspect of scalable extensibility that most object-oriented languages do not support is what we call mixin extensibility.Since compiler data structures are defined through inheritance, we would like changes made to the base class to automatically be inherited by subclasses.These changes include, not only adding new methods, but also new data fields.As well see in a few slides, our solution to providing mixin extensibility also provides scalable extensibility.

Recommended

View more >