
IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING
IEEJ Trans 2013; 8: 380–386
Published online in Wiley Online Library (wileyonlinelibrary.com). DOI:10.1002/tee.21869

Paper

JGroovy: An Alternative Approach to Implement Extensible Java Compiler

Siwadol Sateanpattanakul∗a, Non-member

Kazuhiko Hamamoto∗∗, Member

Aranya Walairacht∗, Non-member

The main reason for the invention of computer programming languages is to express commands that control machine behavior. Some of these programming languages have specific advantages in specific environments, such as structured query language (SQL), hypertext markup language (HTML), and spreadsheets. Such languages are commonly called domain-specific languages (DSLs). Although DSLs are the best way to deal with specific systems, they are hard to use with other environments or platforms. Groovy is a dynamic programming language that runs on the Java virtual machine. Groovy has some features that allow programmers to manage DSLs within its unique style. Groovy also has some disadvantages: it does not support all Java features and syntax, and it produces unnecessary byte code during compilation. This paper proposes an extended-architecture technique to implement a computer programming language and compiler by extending Java with the Groovy language. The extensible language is called ‘JGroovy’, and it supports both the Java and Groovy languages. We implement the compiler for JGroovy and call it the ‘JGroovy compiler’ (JGC). By its extended architecture, JGC is more compatible with Java source code than Javac can claim to be. It also produces better and more compact byte code than the Groovy compiler, with an approximate improvement of 8–12%. © 2013 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.

Keywords: programming language, Java language specification, Java virtual machine, compiler construction

Received 3 October 2011; Revised 29 May 2012

1. Introduction

Computer programming languages have been deliberately designed to express computations. All programming languages are artificial languages that are used by a machine, typically a computer. They are implemented for specific purposes, and are also known by an array of other terms. Thus, every computer programming language has a particular form of written specification that defines its syntax and semantics. Some languages have been designed to solve general problems, whereas others have been intended to address a specific problem. General-purpose programming languages (GPLs) are the group of languages intended to describe a wide variety of problems across application domains. However, there are some specific complications that cannot be solved by GPLs. Consequently, domain-specific languages (DSLs) are implemented to address specific problems. There are many DSLs because each DSL has an individual characteristic that can express a particular problem, which can be a demonstration technique, an individual solution technique, or both.

Language-oriented programming (LOP) is a computer programming paradigm designed to solve a specific problem by using both DSLs and a GPL [1]. LOP can consist of a set of DSLs that manages a group of specific problems. A DSL can express an exact command designated for a single application domain. For example, structured query language (SQL) is a standard language for accessing databases. It is used only to access and manipulate databases. Similarly, the hypertext markup language (HTML) is not a programming language but a markup language that uses markup tags to describe HTML documents, also called web pages. Each DSL has been invented for a specific domain; it cannot be used to solve problems in other domains. Although programmers work with DSLs via many GPLs, each GPL has its own characteristic style of working with DSLs because each GPL has its own written specification of syntax and semantics. There are many modern GPLs that are used to develop software applications, such as Ada, C/C++, C#, Java, Lisp, Pascal, Python, and Ruby. These languages are designed to solve specific problems with several terms. Some languages are extended from Java. This is because Java is famous among object-oriented programming languages and is a GPL designed for writing software in a wide variety of application domains. Java has also been enabled to support programmers who want to write within the LOP paradigm. A Java program is compiled to byte code and run on the Java virtual machine (JVM), which enables a set of computer software programs to use a virtual machine for the execution of other computer programs and scripts. Java is also under continuous development, with new versions being released [2–4]. This is an advantage of the Java language: it supports a methodology to improve the programming language [5, 6]. Furthermore, a JVM can also be used to implement programming languages other than Java. For example, Ada source code can be compiled to Java byte code, which can be executed by a JVM. A JVM is a crucial component of the Java platform because JVMs are available for many hardware and software platforms. Therefore, many programming languages try to extend and take advantage of Java and the JVM. Java can be extended with new specifications that represent a new programming language. These new specifications change the Java language specification (JLS). In other words, the new programming language cannot support source code that is written in the common Java form. A main cause of incorrectness is the similar syntax usage between the host language and the extension language. There are many extended languages that are compatible with the JVM, such as JRuby, which is a Java implementation of the Ruby programming language [7], Metaborg [8], JMatch [9], and Groovy [10]. These languages are designed to solve specific problems with several terms. Even though each language is an extension of Java, it cannot use the same compiler that compiles the host language.

a Correspondence to: Siwadol Sateanpattanakul. E-mail: [email protected]

* Department of Computer Engineering, Faculty of Engineering, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand

** Department of Information Media Technology, School of Information and Telecommunication Engineering, Tokai University, 2-3-23 Takanawa Minato-ku, Tokyo 108-8619, Japan

In this paper, we extend the Java programming language with the Groovy programming language to improve Java programming with the external DSL technique [11, 12]. We add the Groovy programming language features on top of the Java programming language, where Groovy incorporates the Java constructions as smaller parts [13]. The compiler is designed as a separate module for each language. The host language, Java, is designed first, and this module is then extended to create a new language. The compiler must support the entire JLS [14] and should be compatible with new extension modules.

The rest of this paper is outlined as follows: Section 2 gives an overview of the JGroovy architecture. Section 3 describes how to extend the host language with Groovy features. Section 4 presents the results and discussion of the JGroovy compiler (JGC). Finally, Section 5 concludes the paper.

2. Overview of JGroovy Architecture

JGroovy is a programming language that extends the meaning of all existing Java statements and expressions while adding some new forms. It is backward-compatible with Java. Java is used as the host language and is enhanced with Groovy programming language features. The set of features added to the host language is sufficiently small and distinct to make formal proofs of type soundness tractable [15]. In addition, the Groovy language supports the meta object protocol (MOP) [16]. It has an interface that defines the Application Programming Interface (API) usable by clients of Groovy’s MOP. Because Groovy has some similarities in syntax with Java, the language can be built simply by reusing basic language definition modules such as the modules for expressions, declarations, etc. However, there are some features of Groovy that are not available in Java:

• Closure
• Native syntax for lists and maps
• Native support for regular expressions
• Embedded expressions inside strings
• Switch statement and polymorphic iteration
• Smart syntax for writing beans
• Safe navigation.

To support Groovy features, the language needs to be extended in several ways. In the lexical analyzer, the language needs to be extended with notation for dynamic types, safe navigation, and native regular expressions. In the syntactic analyzer, the language needs to be extended with the production rules of Groovy that are not available in the Java language. Finally, in the semantic analyzer, the language needs to be extended with the meaning and behavior of the new constructs.

2.1. Stack machine lexical analyzer

The lexical analyzer is a finite-state machine [17] whose output values are determined both by its current state and by the values of its inputs. It consists of a six-tuple (S, S0, Σ, Λ, T, G) [18] as follows:

• a finite set of states (S)
• a start state (also called initial state) S0, which is an element of (S)
• a finite set called the input alphabet (Σ)
• a finite set called the output alphabet (Λ)
• a transition function (T : S × Σ → S) mapping pairs of a state and an input symbol to the corresponding next state
• an output function (G : S × Σ → Λ) mapping pairs of a state and an input symbol to the corresponding output symbol.
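The six-tuple above can be read as a small table-driven machine. The following is a minimal sketch under stated assumptions: the class name `MealyMachine`, the string encoding of states, and the `rule` helper are hypothetical and are not taken from the JGC sources. It only illustrates how T and G are consulted for each input symbol.

```java
import java.util.HashMap;
import java.util.Map;

/** A tiny finite-state machine whose output depends on the current state and the input symbol. */
class MealyMachine {
    private final Map<String, String> transition = new HashMap<>(); // T : S x Sigma -> S
    private final Map<String, String> output = new HashMap<>();     // G : S x Sigma -> Lambda
    private String state;                                           // current state, initially S0

    MealyMachine(String startState) {
        this.state = startState;
    }

    /** Registers T(from, symbol) = to and G(from, symbol) = out. */
    void rule(String from, char symbol, String to, String out) {
        transition.put(from + "/" + symbol, to);
        output.put(from + "/" + symbol, out);
    }

    /** Consumes one input symbol, moves to the next state, and returns the output symbol. */
    String step(char symbol) {
        String key = state + "/" + symbol;
        if (!transition.containsKey(key)) {
            throw new IllegalStateException("no transition for " + key);
        }
        String out = output.get(key);
        state = transition.get(key);
        return out;
    }

    String state() {
        return state;
    }
}
```

A scanner built this way would encode one rule per (state, character class) pair; the stack described next is an addition on top of this basic machine.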

The stack-machine technique [13] is used for resolving some similar characters between Java and Groovy. We will briefly describe how the stack-machine lexical analyzer does this.

The stack-machine scanner reads the next token. Then the stack-machine scanner approves the input token and assigns the correct state to the scanner (Fig. 1). The next token and stack-machine function should be straightforward. The basic steps are as follows:

1. Read the first token and put the current state onto the stack [Fig. 1(a) and (b)].

2. See what the next character is [Fig. 1(c)]. Ignore comments and white spaces. Then return the token with the appropriate kind, and pop or push the state as required.

(a) If the next character is an operator or delimiter, return the appropriate kind [Fig. 1(d)] (e.g., if the character is ‘(’, return the token with the ‘kind’ value of ‘LPAREN’, go to the state above the current state, and put the current state onto the stack);

(b) If it is a number (integer), return the token with the ‘kind’ value of ‘INTEGER’ and the ‘val’ value of the whole number, not just the first digit;

(c) If it is a word (string/character array), return the token with the ‘kind’ value of ‘IDENTIFIER’ and the ‘id’ value of the whole string, not just the first character. If the string is a keyword (e.g., ‘if’, ‘else’, etc.), return the token with the proper ‘kind’ value.

Fig. 1. Stack-machine lexical analyzer. (a) Sample state machine. (b) Accept a keyword at the start state, go to state B, and put A onto the stack. (c) Accept an identifier at state B, step up to state C, and wait for the next token. (d) Accept a white space or delimiter at state C: if it is a delimiter for the end of a statement, go to A and pop the state; if it is a white space, stay at state C; if it is a line separator, go to D and wait for the next token. (Legend: possible path, selected path.)

The Java tokens must be designed independently of the extension tokens. The extension tokens are defined for each state that is reachable from the host language’s stack-machine states. Each stack-machine state is designed with only the tokens it needs. For example, the token or keyword ‘IF’ is not defined in the class state because no syntax involving ‘IF’ is used in this state. The stack-machine lexical analyzer approves the input token and assigns the right state to the scanner.
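As an illustration of per-state token tables, consider the word `in`, which is an ordinary identifier in Java (as in `System.in`) but acts as a keyword inside a Groovy for-loop header. The sketch below is hypothetical: the state names `DEFAULT` and `FOR_HEADER`, the class name, and the keyword subsets are assumptions made for this example and are not the states used by the JGC.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class StackLexerSketch {
    /** Hypothetical scanner states; the paper does not name its states. */
    enum State { DEFAULT, FOR_HEADER }

    private final Deque<State> stack = new ArrayDeque<>();
    private State current = State.DEFAULT;

    // Keywords visible in each state (illustrative subsets only).
    private static final Set<String> DEFAULT_KEYWORDS =
            new HashSet<>(Arrays.asList("for", "if", "else"));
    private static final Set<String> FOR_HEADER_KEYWORDS =
            new HashSet<>(Arrays.asList("in"));

    /** Saves the current state on the stack and enters a nested state. */
    void enter(State next) {
        stack.push(current);
        current = next;
    }

    /** Leaves the nested state and restores the saved one. */
    void leave() {
        current = stack.pop();
    }

    /** Classifies a word using the keyword table of the current state only. */
    String kindOf(String word) {
        Set<String> table = (current == State.DEFAULT) ? DEFAULT_KEYWORDS : FOR_HEADER_KEYWORDS;
        return table.contains(word) ? word.toUpperCase(Locale.ROOT) : "IDENTIFIER";
    }
}
```

With this arrangement the host-language table never has to mention the extension keyword, which is the separation of concerns that the paragraph above describes.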

2.2. Syntactic analyzer

The syntactic analyzer determines whether the stream of tokens from the lexical analyzer forms a valid sentence in the programming language grammar. The abstract syntax tree (AST, Fig. 2) is used to define the structure of the language as a class hierarchy, with general abstract classes such as Statement and Expression and concrete subclasses such as Assignment and AddExpression. Methods and fields can also be added to the classes in order to implement compilation or interpretation. The abstract grammar is used to describe production rules. It is a class hierarchy augmented with subcomponent information corresponding to production right-hand sides.

JastAdd [19] makes use of an explicit object-oriented notation for the abstract grammar because former object-oriented ASTs have a two-level hierarchy that is usually insufficient from the modeling point of view.

The following grammar productions describe how the Java 5 grammar is extended in a backward-compatible way in order to become the JGroovy grammar. The notation is BNF: on the left-hand side is a nonterminal, and on the right-hand side is one or more sequences of symbols; multiple sequences are separated by the vertical bar, ‘|’, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals (Fig. 3).

    abstract Stmt;
    abstract BranchTargetStmt : Stmt;
    GroovyForStmt : BranchTargetStmt ::= InitStmt:Stmt* Expr Stmt;

Fig. 2. An abstract syntax tree

    groovy_for_statement == FOR LPAREN groovy_for_init.i IN groovy_expression.e RPAREN statement.s
        | FOR LPAREN groovy_for_init.i IN LBRACK groovy_for_list.e? RBRACK RPAREN statement.s

Fig. 3. The production rules presented in BNF

2.3. Semantic analyzer

The structure of object-oriented ASTs directly reflects the implementation of compilers, because they use a class hierarchy in which nonterminals are modeled as abstract superclasses and productions as specialized concrete subclasses. The object-oriented AST class hierarchy is an outlined structure. The visitor pattern is used to design a solution for the AST classes. It allows a given common method to be used by all AST nodes: a helper class called a Visitor contains an abstract visit(C) method for each AST class C. Although the visitor pattern is widely used, only methods can be factored out with it; fields must still be declared directly in the classes or be handled by a separate mechanism. This limitation can be resolved by using the aspect-oriented programming (AOP) paradigm [20]. The AOP technique allows the programmer to reach every point of the outlined structure and to arrange code files by aspect and behavior. Therefore, all the behaviors of the semantic analyzer are introduced through AOP. Two types of files are used for the AOP technique. Jadd is a file type that is used to control imperative behavior, and Jrag is a file type that is used to control declarative behavior [21]. There are some differences between these two types of files. The Jadd modules use the normal Java syntax and consist of a list of class declarations, where each class is matched to an AST class. The Jrag module is written in a specific programming language that is slightly extended from the Java programming language. It consists of lists of class declarations, each class containing attributes and equations.
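For readers unfamiliar with the visitor pattern mentioned above, the following plain-Java sketch shows how behavior is kept outside the AST classes. All class names (`ExprNode`, `NumNode`, `AddNode`, `NodeVisitor`, `EvalVisitor`) are hypothetical and chosen only for this example; they are not classes of JastAdd or of the JGC.

```java
/** Visitor with one visit method per concrete AST class. */
interface NodeVisitor<R> {
    R visitNum(NumNode n);
    R visitAdd(AddNode a);
}

/** Abstract superclass standing in for a nonterminal. */
abstract class ExprNode {
    abstract <R> R accept(NodeVisitor<R> v);
}

/** Concrete subclass standing in for a production: an integer literal. */
class NumNode extends ExprNode {
    final int value;
    NumNode(int value) { this.value = value; }
    <R> R accept(NodeVisitor<R> v) { return v.visitNum(this); }
}

/** Concrete subclass standing in for a production: left + right. */
class AddNode extends ExprNode {
    final ExprNode left;
    final ExprNode right;
    AddNode(ExprNode left, ExprNode right) { this.left = left; this.right = right; }
    <R> R accept(NodeVisitor<R> v) { return v.visitAdd(this); }
}

/** Behavior defined outside the AST classes: evaluates an expression tree. */
class EvalVisitor implements NodeVisitor<Integer> {
    public Integer visitNum(NumNode n) { return n.value; }
    public Integer visitAdd(AddNode a) { return a.left.accept(this) + a.right.accept(this); }
}
```

Note that `EvalVisitor` can add methods but cannot add a field to `NumNode` or `AddNode`; this is the limitation that motivates the aspect-oriented approach described in the text.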

JastAdd is a compiler construction system that provides extensive infrastructure, including AOP support. Both the imperative Jadd aspects and the declarative Jrag aspects can be combined by JastAdd. The compiler can have a modular design and can be divided into many submodules that are developed under a declarative or imperative methodology, depending on which is most suited to the particular situation. The JastAdd system then translates the Jadd and Jrag files from all modules into ordinary Java, weaving both file types together into Java code that corresponds to the AST structure. The resulting source code is compiler source code that controls the compiler behavior. With AOP in JastAdd, the compiler system can be extended with little effect on the existing system.

3. Implementation

As described above, the JGC system comprises several submodules that work together [22]. There are four main submodules in the JGC: lexical analyzer, syntactic analyzer, semantic analyzer, and code generation and optimization (Fig. 4). The JGC is an extension of the Java compiler with the Groovy language specification. Therefore, the JGC fully supports Java 5.

Fig. 4. The JGroovy compiler architecture. (Diagram labels: source files enter the frontend, which contains the lexical, syntactic, and semantic analyzers; the extended architecture adds extended lexical, extended syntactic, and extended semantic modules; the backend contains the code generator and optimization and produces .class files.)

3.1. JGroovy lexical extensions

All keywords or reserved words must be extended in this section. Each keyword has to be added to the proper state of the stack-machine lexical analyzer. With the stack-machine lexical analyzer technique, the compiler can be extended with all Groovy keywords and characters that appear in the Groovy language specification [23].

In addition, the lexical analyzer has to provide a common operator to handle many DSLs. We use a closure, which is an operator to show and accept any number of strings. The operator * is used to represent the closure L* and denotes the concatenation of language L with itself any number of times, that is,

$$L^{*} = \bigcup_{i=0}^{\infty} L^{i}. \qquad (1)$$
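Equation (1) can be read operationally: a string belongs to L* if it can be split into zero or more pieces that each belong to L. The sketch below checks this with a simple dynamic-programming pass. The class and method names are hypothetical, and a finite set of words stands in for L; this is an illustration of the definition, not the mechanism used inside the JGC scanner.

```java
import java.util.Set;

class KleeneClosure {
    /** Returns true if s is a concatenation of zero or more words of the language. */
    static boolean inClosure(String s, Set<String> language) {
        // reachable[k] is true when the first k characters of s can be split into words of L.
        boolean[] reachable = new boolean[s.length() + 1];
        reachable[0] = true; // the empty string is the single member of L^0
        for (int end = 1; end <= s.length(); end++) {
            for (int start = 0; start < end && !reachable[end]; start++) {
                reachable[end] = reachable[start] && language.contains(s.substring(start, end));
            }
        }
        return reachable[s.length()];
    }
}
```

For example, with L = {ab, c}, the string `abcab` is accepted (ab, c, ab), the empty string is accepted through the i = 0 term, and `ba` is rejected.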

3.2. JGroovy syntax extensions

In this section, we explain the three extension formats that are often used when adding new language features: method and variable declarations, statement extensions, and expression extensions. Several composable extensions have been specified and implemented for the host Java 1.4 and Java 5 languages. We describe several of them here.

1. Method and variable declarations: Java method declaration headers are extended to include dynamic type method declarations.

    MethodDecl method_header =
        modifiers.m? dynamic_type.t simple_name.n

2. Statement extensions: Many new statements are added to the host language, such as polymorphic iteration. The syntax of polymorphic iteration should allow a way to traverse the elements of a collection without exposing its internal structure.

    x = 0
    for ( i in [0, 1, 2, 3, 4] ) {
        x += i
    }

So, we have to extend Java with the polymorphic iteration syntax as follows:

    Stmt groovy_for_statement =
        FOR LPAREN groovy_for_init.i IN groovy_expression.e RPAREN statement.s

The syntax of other statements is extended too, as shown below:

    Case switch_label =
        CASE static_arraylist_init.e1 COLON

In a switch statement, the case label is extended to allow a list or map as the condition. Many other extended grammar rules are not explained here, but they are added to the host language to support Groovy features.

3. Expression extensions: The iteration statements correspond to lists and maps.

    Expr static_arraylist_init =
        LBRACK groovy_variable_initializers.v? COMMA? RBRACK

The expressions are extended to contain values or variables as members, similar to an array declaration.
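To relate the polymorphic iteration example to the host language, the loop `for ( i in [0, 1, 2, 3, 4] ) { x += i }` can be written in plain Java 5 with the enhanced for statement. This is only one plausible reading, given as a sketch; the class name is hypothetical, and it is not claimed that the JGC lowers the Groovy loop in exactly this way.

```java
import java.util.Arrays;
import java.util.List;

class ForInEquivalent {
    /** Java 5 counterpart of: x = 0; for ( i in [0, 1, 2, 3, 4] ) { x += i } */
    static int sum() {
        int x = 0;
        // Arrays.asList stands in for the Groovy list literal [0, 1, 2, 3, 4].
        List<Integer> values = Arrays.asList(0, 1, 2, 3, 4);
        for (int i : values) { // traverses the elements without exposing the list internals
            x += i;
        }
        return x;
    }
}
```

Calling `ForInEquivalent.sum()` returns 10, the same value that `x` holds after the Groovy loop.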

3.3. JGroovy semantic extensions

The AST structure is an object-oriented structure that shapes the design of the compiler and serves as a guideline for designing the parsing grammar. Hence, Groovy syntax that is similar to Java can be inherited. Reference attribute grammars (RAGs) [24, 25] are used in the semantic module to access the AST classes and to check the correctness of the source code. RAGs are written in specific files that are used in JastAdd. RAG is a specific language designed to handle compiler behavior; it is composed of the ordinary Java language and a specific language that is a kind of AOP. Below, we explain some modules of the compiler that are used to verify source code, such as name analysis and type checking. When the methodology of the semantic analyzer changes or its performance is improved [26], the methodology of name analysis and type checking can be updated with no effect on the original code. With this methodology, the semantic analyzer can manage source code in a manner similar to Featherweight Java (FJ) [27], but it supports mixing dynamically typed and statically typed code.

1. Name analysis and type checking: As with Java programs, JGroovy programs undergo name analysis and are type-checked statically. JGroovy has been built with object-oriented ASTs, so many abstract classes and interfaces were built, while aspect modules may add interface implementations to the AST classes. As an example, consider an implementation of name analysis for a language that has many different block-like constructs, e.g., class, method, and compound statement. There is a lookup method that looks up a name among the local declarations in a block and, if it is not found there, delegates the call to some outer block-like construct. The AOP technique is used to weave and combine all abstract classes and interfaces into the AST. We consider extending Java 5 with the enhanced for loop of Groovy:

    for (e in collection) statement

A local variable e is declared and needs to be included in the set of visible declarations of the contained statement, as with the enhanced for loop in Java 5. The new loop is represented by a new AST class GroovyForInClause (see Fig. 5). It defines lookup for the contained statement, which delegates to a new specific lookup and a localLookup attribute. The specific lookup is used to find the string that is the variable name. The lookup method then matches the string against the local variable and, if there is no match, delegates to the GroovyForInClause’s own lookup attribute, which is provided by the superclass Stmt.
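The delegation scheme described above (try the local declarations first, then ask the enclosing block) can be sketched in plain Java as follows. The class `BlockScope` and its string-based declarations are assumptions made for illustration; in the JGC this logic is expressed as JastAdd attributes on AST classes such as GroovyForInClause, not as a separate scope class.

```java
import java.util.HashMap;
import java.util.Map;

class BlockScope {
    private final BlockScope outer;                             // enclosing block, or null at top level
    private final Map<String, String> locals = new HashMap<>(); // name -> declared type

    BlockScope(BlockScope outer) {
        this.outer = outer;
    }

    void declare(String name, String type) {
        locals.put(name, type);
    }

    /** Looks only at declarations made directly in this block. */
    String localLookup(String name) {
        return locals.get(name);
    }

    /** Tries the local declarations first, then delegates to the enclosing block. */
    String lookup(String name) {
        String found = localLookup(name);
        if (found != null) {
            return found;
        }
        return outer == null ? null : outer.lookup(name);
    }
}
```

In this picture, the loop variable `e` of `for (e in collection) statement` is declared in the scope of the loop, so the contained statement sees it, while code outside the loop does not.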

With all the techniques developed so far, including the extended language specification in the previous section, we are now able to build extensible compilers. The main program consists of two main components: the Java component and the Groovy component. There are two modules in the Java component: Java 1.4 and Java 5. Each module contains many submodules for supporting the corresponding JLS. The Groovy components reuse and extend the Java components [28]. Therefore, the JGC supports the entire Java API because it is built on top of the Java API (Fig. 6).

Fig. 5. JGroovy extension framework. (Diagram labels: Access, VariableDeclaration, NameKind with the values EXPRESSION, TYPE, and PACKAGE, NameKind nameType(), Concrete Class, getAccess.nameType(); regions: framework, extension.)

Fig. 6. JGroovy API. (Diagram labels: Host API with Java 1.4 and Java 5; Extended API architecture with Groovy and Another.)

3.4. Code generation and optimization

This is the compiler’s backend module. It transforms the intermediate representation (IR) approved by the previous stages into IR byte codes. The IR byte codes are then optimized by Soot [29]. Soot is a Java optimization framework that performs significant optimizations on byte code and produces new class files by applying optimizations such as loop-invariant removal and common subexpression elimination using a simple side-effect analysis [30]. Finally, the module produces byte codes according to the JVM specification, which are suitable to run on the JVM platform [31].
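The two optimizations named above can be illustrated at the source level. The following sketch shows the same computation before and after loop-invariant removal and common subexpression elimination. It is a hand-written illustration with hypothetical names; Soot performs such transformations on its own intermediate representations of byte code, not on Java source, and no Soot API is used here.

```java
class OptimizationSketch {
    /** Before: a * b is recomputed on every iteration and appears twice in the loop body. */
    static int before(int a, int b, int n) {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += (a * b) + i * (a * b);
        }
        return total;
    }

    /** After: the common, loop-invariant subexpression is computed once outside the loop. */
    static int after(int a, int b, int n) {
        int product = a * b; // hoisted out of the loop and shared by both uses
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += product + i * product;
        }
        return total;
    }
}
```

Both methods return the same value for the same arguments; the second performs one multiplication of `a` and `b` instead of two per iteration, and its loop body is shorter.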

4. Results and Discussion

To evaluate the effects of our implementation technique, we compared the JGC with the Java compiler (Javac) and the JastAddJ5 [22] compiler. We use two test suites to test our compiler. The first suite is used to test and evaluate Java code, and the second suite is used to test and evaluate Groovy code. We used the Jacks [32] compiler test suite to evaluate all compilers. The 4619 Java test cases of Jacks were run to test the performance of the three compilers. After this evaluation, 515 Groovy test cases were run to compare the JGC with the Groovy compiler. Each compiler was tested three times. The Jacks test suite provides many modules for testing a compiler. Some of the testing modules are listed below.

• Block and statements
• Classes
• Lexical structure, etc.

For the Groovy testing, we built test cases from the Groovy language specification. None of the features exercised by these test cases is available in Java. We used the Java compiler version jdk1.5.0_06 and included JastAddJ5 to test and compare with the JGC. This experiment was conducted to evaluate the errors of the JGC against the Java 5 specification after the Groovy features have been added. The result of the experiment is shown in Table I [18, 33].

The results of the experiment show that the JGC fails fewer of the test cases created from JLS 5 (51) than the Javac 5 compiler (79). Its number of failures is nearly the same as that of the JastAddJ compiler (50). The pass rates of the three compilers are almost the same when calculated as percentages, which shows that the JGC has performance similar to that of the other two compilers.

Table I. Result of the JGC experiment with other Java compilers

Number of test cases (4619 test cases)

Compiler    Pass    Fail    % Pass complete
Javac       4540    79      98.3
JastAddJ    4569    50      98.9
JGC         4568    51      98.9

Table II. Result of JGC experiment with the Groovy compiler

Number of test cases (515 test cases)

Compiler    Pass    Fail    % Pass complete
Groovy      515     0       100
JGC         515     0       100

The second experiment covers the Groovy specification and compares JGC with the Groovy compiler, version groovy-all-1.80. Table II presents the results of JGC compiling Groovy source code [33]. The experiment shows that JGC is compatible with the Groovy language and behaves like the Groovy compiler on these test cases.

The next experiment compares the sizes of the byte code produced by Javac, JGC, and Groovy under several conditions. The same kind of test cases is used for all three compilers, and the results are presented in three graphs. The first graph (Fig. 7) compares the byte code compiled by Javac, JGC, and Groovy. The JGC produces byte code that is more compact than that of Javac and Groovy, and the Groovy compiler produces the largest byte code. The performance of the interpreter is important even for a high-performance JVM that employs just-in-time (JIT) compiler technology [34] to boost steady-state performance, so byte code size is one factor in performance. The next graph (Fig. 8) focuses on Javac and JGC. It demonstrates that the code optimization module yields more compact byte code.
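The text does not state how byte code size was measured. A natural assumption is the size on disk of the emitted .class files, which the following sketch sums for one output directory. The class name and the directory layout are hypothetical; a compiler may emit more than one class file per source file (for example, for nested classes), so the sketch walks the whole directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class ClassFileSize {
    // Total size in bytes of all regular files ending in ".class" under dir.
    static long totalClassBytes(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".class"))
                    .mapToLong(p -> {
                        try {
                            return Files.size(p);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ClassFileSize <output directory of one compiler>
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(totalClassBytes(dir) + " bytes");
    }
}
```

Running the same source through each compiler into a separate output directory and comparing the totals would give one data point per compiler for a chart such as Fig. 7.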

In Fig. 9, we use the same kind of test cases to compare the Groovy compiler and JGC under several conditions.

From the final graph (Fig. 9), we can see that the Groovy compiler produces very large byte code even when compiling a simple condition, because a Groovy class builds meta-class methods inside the class. Although most meta-class methods are called at runtime, some of them are never used, so the compiler produces code that is not necessary to run the system. Compiling each method causes an intolerably slow start-up time and a huge memory footprint for the target application [35]. The first to sixth points of the final graph show a large difference in size between the two compilers: Groovy produces a lot of code for common tasks that can still be expressed in the Java manner, as the JGC does. For the seventh to ninth points, where the source code works with the Groovy API, JGC still produces less byte code than the Groovy compiler, reducing the byte code size by roughly 8–12%. Performance is a critical issue for many desktop and server systems that use Java. As Java moves from these systems to smaller embedded devices, size issues become important [36]. Therefore, smaller byte code is more efficient [37].
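One way to observe the extra methods that a compiler adds to a class is to list the methods the compiled class declares. A class compiled by Groovy implements the groovy.lang.GroovyObject interface (getMetaClass, setMetaClass, invokeMethod, getProperty, and setProperty), so such methods appear in the listing even for a trivial class. The sketch below uses a plain Java class as a hypothetical stand-in so that it runs without Groovy on the class path; all names in it are illustrative.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DeclaredMethods {
    // Hypothetical stand-in for a compiled test case with a single method.
    static class Sample {
        int twice(int x) {
            return 2 * x;
        }
    }

    // Sorted names of the methods declared directly by the given class.
    static List<String> declaredMethodNames(Class<?> type) {
        return Arrays.stream(type.getDeclaredMethods())
                .map(Method::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // For a Groovy-compiled class, pass its Class object here instead.
        System.out.println(declaredMethodNames(Sample.class));
    }
}
```

Comparing the listing of a class compiled by the Groovy compiler with that of the same class compiled by another compiler would show which methods account for the size difference.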

These results demonstrate the capability of JGroovy as a programming language. It is an extended language that supports all of the host language syntax, including code that works with the Java API. The Java 5 compiler is always under active development [3], which is an advantage of using Java 5 as the host language. JGroovy is compatible with the Groovy language syntax, too. Moreover, the


[Figure: bar chart of byte code size in bytes (axis range 0–7000) for Javac, JGC, and Groovy over nine test cases: blank code, primitive declaration (int), primitive declaration (double), assign value plus, assign value, reference type (string), variable initialization, control statement, and method invocation.]

Fig. 7. Byte code sizes of three compilers: Javac, JGC, and Groovy compiler

[Figure: bar chart of byte code size in bytes for Javac and JGC over nine test cases: blank code, primitive declaration (int), primitive declaration (double), assign value plus, assign value, reference type (string), variable initialization, control statement, and method invocation. Data labels recovered from the chart, in x-axis order: Javac 367, 391, 391, 398, 404, 419, 441, 402, 516; JGC 212, 214, 215, 216, 218, 225, 247, 228, 331.]

Fig. 8. Byte code sizes of Javac and JGC

[Figure: bar chart of byte code size in bytes for JGC and Groovy over nine test cases: dynamic declaration, assign boolean, assign double, assign long, reference type (string), method invocation, assign Groovy collection, Groovy method invocation, and Groovy for statement. Data labels recovered from the chart, in x-axis order: Groovy 5456, 5533, 5593, 5597, 5482, 5482, 5364, 5728, 6211; JGC 214, 216, 216, 216, 229, 331, 4893, 4999, 5547.]

Fig. 9. Byte code size comparison between the Groovy compiler and JGC

JGC produces more efficient byte code and avoids generating unnecessary byte code. Therefore, JGC produces more compact byte code than the Groovy compiler.

5. Conclusion

In this paper, we have given an overview of the structure of JGroovy, which uses JLS 5 as the host language for the JGC. The JGroovy language was extended from the host language to support a more complex syntax. Both languages can work together through a stack-machine lexical analyzer that provides the appropriate tokens to identify a complex language. We have also evaluated the performance of the JGC. The results of the experiments showed that JGC supports the Java 5 language features. Furthermore, it supports the extended language without affecting the host language. The Java byte code (.class) files that JGC produces have the following properties:

• more compact byte code than that of the Java compiler (Javac), with an average byte code size reduction of 43.74 ± 3.25%;

• the least byte code of the three compilers for similar source code;

• more efficient byte code, because byte code that is unnecessary to run the system is cut off. Compared with the Groovy compiler, this reduces the size by 8–12% when the source code uses the Groovy API.
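The reduction figures above can be related to the data labels that survive in Figs. 8 and 9. The sketch below assumes that those labels, read in x-axis order, are the measured sizes in bytes; this reading of the charts is an assumption, and the class and method names are hypothetical. Under that assumption, the mean and sample standard deviation of the per-case reductions of JGC relative to Javac come out at approximately 43.7% and 3.25, and the three Groovy API cases give reductions of roughly 8.8%, 12.7%, and 10.7%.

```java
import java.util.Locale;

class SizeReduction {
    // Assumption: data labels read from Fig. 8 in x-axis order (bytes).
    static final int[] JAVAC = {367, 391, 391, 398, 404, 419, 441, 402, 516};
    static final int[] JGC_JAVA = {212, 214, 215, 216, 218, 225, 247, 228, 331};
    // Assumption: data labels read from Fig. 9 for the three Groovy API cases (bytes).
    static final int[] GROOVY = {5364, 5728, 6211};
    static final int[] JGC_GROOVY = {4893, 4999, 5547};

    // Per-case relative size reduction in percent: 100 * (baseline - reduced) / baseline.
    static double[] reductions(int[] baseline, int[] reduced) {
        double[] r = new double[baseline.length];
        for (int i = 0; i < r.length; i++) {
            r[i] = 100.0 * (baseline[i] - reduced[i]) / baseline[i];
        }
        return r;
    }

    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Sample standard deviation (divides by n - 1).
    static double sampleStdDev(double[] values) {
        double m = mean(values);
        double squares = 0;
        for (double v : values) {
            squares += (v - m) * (v - m);
        }
        return Math.sqrt(squares / (values.length - 1));
    }

    public static void main(String[] args) {
        double[] vsJavac = reductions(JAVAC, JGC_JAVA);
        System.out.printf(Locale.ROOT, "JGC vs Javac: %.2f +/- %.2f %%%n",
                mean(vsJavac), sampleStdDev(vsJavac));
        for (double r : reductions(GROOVY, JGC_GROOVY)) {
            System.out.printf(Locale.ROOT, "JGC vs Groovy (Groovy API case): %.2f %%%n", r);
        }
    }
}
```

The average is taken over per-case percentages rather than over total bytes, so small test cases weigh as much as large ones.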

The JGC is an extensible compiler that uses the JGroovy language specification. In our results, the extensible compiler produces less byte code than a compiler built from scratch. Extending a compiler is a practical way to build a new compiler when the host language is popular and has a robust syntax, like Java. The extensible compiler can reuse the code structure of the host compiler, and when generating byte code it has more techniques available for producing effective byte code. As a result, it constructs compact byte code.

Acknowledgments

The authors would like to thank Somsak Walairacht, Bard Nesbø Skreien, and the anonymous reviewers for their constructive comments, and Ms. S. Chokpaiboon, who provided valuable feedback. This work was supported by the AUN/SEED-Net.

References

(1) Ward M. Language oriented programming. Software-Concepts and Tools 1994; 15:147–161.

(2) Wehr S, Lammel R, Thiemann P. JavaGI: generalized interfaces for Java. In European Conference on Object-Oriented Programming (ECOOP), 21st European Conference, Berlin, Germany, Lecture Notes in Computer Science, vol. 4609. Springer-Verlag; 2007; 347–372.

(3) Wehr S, Thiemann P. JavaGI: the interaction of type classes with interfaces and inheritance. ACM Transactions on Programming Languages and Systems (TOPLAS) 2011; 33(4):12:1–12:83.

(4) Garcia R, Jarvi J, Lumsdaine A, Siek J, Willcock J. An extended comparative study of language support for generic programming. Journal of Functional Programming 2007; 17(2):145–205.

(5) Zenger M. Keris: Evolving software with extensible modules. Journal of Software Maintenance and Evolution: Research and Practice 2005; 17(5):333–362.

(6) Keen AK, Tingjian G, Justin TM, Olsson RA. JR: flexible distributed programming in an extended Java. ACM Transactions on Programming Languages and Systems (TOPLAS) 2004; 26(3).

(7) Fernandez O. The Rails 3 Way. 2nd ed. Boston: Addison-Wesley Professional; 2010.

(8) Bravenboer M, de Groot R, Visser E. MetaBorg in action: Examples of domain-specific language embedding and assimilation using Stratego/XT. In Proceedings of the Summer School on Generative and Transformational Techniques in Software Engineering (GTTSE 2005), Braga, Portugal, Lecture Notes in Computer Science, vol. 4143. Springer-Verlag; 2006; 297–311.

(9) Liu J, Myers C. JMatch: iterable abstract pattern matching for Java. Proceedings of the 5th International Symposium on Practical Aspects of Declarative Languages (PADL’03), New Orleans, LA, USA, Lecture Notes in Computer Science, vol. 2562. Springer-Verlag; 2003; 110–127.

(10) Koenig D, Glover A, King P, Laforge G, Skeet J, Gosling J. Groovy in Action. Manning Publications: Greenwich, CT; 2007.

(11) Hudak P. Building domain-specific embedded languages. ACM Computing Surveys (CSUR) 1996; 28(4):196–196.

(12) Elliott C. An embedded modeling language approach to interactive 3D and multimedia animation. IEEE Transactions on Software Engineering, Special issue on domain-specific languages 1999; 25(3):291–308.

(13) Gosling J, Joy B, Steele G, Bracha G. The Java Language Specification, 3rd ed. Addison-Wesley: Boston; 2005.

(14) Wright KA, Felleisen M. A syntactic approach to type soundness. Information and Computation 1992; 1(115):33–94.

(15) Kiczales G, des Rivieres J, Bobrow DG. The Art of the Metaobject Protocol. Cambridge, MA: MIT Press; 1991.

(16) Mealy GH. A method for synthesizing sequential circuits. Bell System Technical Journal 1955; 34(5):1045–1079.

(17) Roth CH Jr, Kinney LL. Fundamentals of Logic Design. Thomson-Engineering: Stamford, Connecticut; 2009.


(18) Sateanpattanakul S, Walairacht A. JGroovy: an extensible programming language with Groovy. ICACT2010, 2010.

(19) Hedin G, Magnusson E. JastAdd: an aspect-oriented compiler construction system. Science of Computer Programming 2003; 47(1):37–58.

(20) Kiczales G, Lamping J, Mendhekar A, Maeda C, Lopes C, Loingtier JM, Irwin J. Aspect-oriented programming. ECOOP’97, Lecture Notes in Computer Science, vol. 1241. Springer-Verlag: Jyvaskyla, Finland; 1997; 220–242.

(21) Kiczales G, Hilsdale E, Hugunin J, Palm J, Griswold WG. An overview of AspectJ. In Proceedings of ECOOP2001, Lecture Notes in Computer Science, vol. 2072. Springer-Verlag: Budapest, Hungary; 2001; 327–355.

(22) Ekman T, Hedin G. The JastAdd extensible Java compiler. OOPSLA07, 2007.

(23) Groovy Language Specification, August 12, 2008. http://groovy.codehaus.org/jsr/spec/

(24) Hedin G, Mernik M. Interactive execution time predictions using reference attributed grammars. In Second Workshop on Attribute Grammars and their Applications, WAGA’99. Parigot D (ed); Amsterdam, The Netherlands. INRIA; 1999; 153–172.

(25) Hedin G. Reference attribute grammar. Informatica 2000; 24(3):301–317.

(26) El Boustani N, Hage J. Improving type error messages for generic Java. Higher-Order and Symbolic Computation 2011; 24(1-2):3–39.

(27) Igarashi A, Pierce BC, Wadler P. Featherweight Java: a minimal core calculus for Java and GJ. ACM Transactions on Programming Languages and Systems (TOPLAS) 2001; 23(3):396–450.

(28) Ekman T, Hedin G. Reusable language specification modules in JastAdd II. Workshop on Evolution and Reuse of Language Specifications for DSLs, ERLS, 2004.

(29) Vallee-Rai R, Hendren L, Sundaresan V, Lam P, Gagnon E, Co P. Soot - a Java optimization framework. In Proceedings of CASCON99. IBM Press: Toronto; 1999.

(30) Clausen LR. A Java bytecode optimizer using side-effect analysis. Concurrency: Practice & Experience 1997; 9(11):1031–1045.

(31) Lindholm T, Yellin F. The Java Virtual Machine Specification. 2nd ed. Addison-Wesley: Boston; 1999.

(32) The Jacks compiler test suite, October 8, 2008. http://sources.redhat.com/mauve/.

(33) Sateanpattanakul S, Walairacht A. JGroovy: an experimental of extensible Java compiler. CCCA2011, 2011.

(34) Suganuma T, Ogasawara T, Takeuchi M, Yasue T, Kawahito M, Ishizaki K, Komatsu H, Nakatani T. Overview of the IBM Java Just-in-Time compiler. IBM Systems Journal 2000; 39(1):175–193.

(35) Suganuma T, Yasue T, Kawahito M, Komatsu H, Nakatani T. A dynamic optimization framework for a Java Just-in-Time compiler. In Proceedings of Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA ’01); 2001; 180–194.

(36) Clausen LR, Schultz UP, Consel C, Muller G. Java bytecode compression for low-end embedded systems. ACM Transactions on Programming Languages and Systems (TOPLAS) 2000; 22:471–489.

(37) Brisaboa NR, Farina A, Navarro G, Esteller MF. (S,C)-dense coding: An optimized compression code for natural language text databases. In Proceedings of the Symposium String Processing and Information Retrieval, vol. 2857. Nascimento MA (ed). Springer-Verlag: Manaus, Brazil; 2003; 122–136.

Siwadol Sateanpattanakul (Non-member) received the B.E. degree in computer engineering and the M.E. degree from the Suranaree University of Technology (SUT), Thailand, in 2003 and 2007. He is currently pursuing the Ph.D. degree in the Department of Electronics, Faculty of Engineering, King Mongkut’s Institute of Technology Ladkrabang, Thailand. His research interests include software engineering, Java technology, compiler construction, and computer programming languages.

Kazuhiko Hamamoto (Member) was born in Nagasaki Prefecture, Japan, in 1966. He received the B.E. degree from the Department of Electronics Engineering, Tokyo University of Agriculture and Technology (TUAT), Japan, in 1989, and the M.E. and Ph.D. degrees from the Graduate School of Electronics and Information Technology, TUAT, in 1991 and 1994, respectively. He was an Assistant Professor with the School of Engineering, Tokai University in 1994 and became an Associate Professor in 1999. He is currently a Professor with the Department of Information Media Technology, School of Information and Telecommunication Engineering, Tokai University. His research interests include medical information technology, image processing, human interface, and virtual reality. Prof. Hamamoto is a member of the IEICE, SICE, and IEEE, and has been an ICT field coordinator of the JICA AUN/SEED-Net Project since 2006.

Aranya Walairacht (Non-member) received the D.E. degree from Tokai University, Japan. She is currently an Assistant Professor with the Department of Computer Engineering, Faculty of Engineering, King Mongkut’s Institute of Technology Ladkrabang, Bangkok, Thailand. Her research interests include artificial intelligence, genetic algorithms, and Java technology.
