(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 9, No. 6, 2011
Creating an Appropriate Programming Language for
Student Compiler Project
Elinda Kajo Mece
Department of Informatics Engineering
Polytechnic University of Tirana
Tirana, Albania
Abstract - Finding an appropriate and simple source language for a
student compiler project is a challenge, especially when the students
are not familiar with high-level programming languages. This paper
presents a new programming language intended principally for
beginners and for didactic purposes in a compiler design course.
SimJ, a reduced form of the Java programming language, is designed
for simpler and faster programming. More readable code, low
complexity, and basic functionality are the primary goals of SimJ.
The language includes the most important functions and data
structures needed for the simple programs generally found in
beginners' programming textbooks. SimJ is implemented with the
Polyglot compiler framework.
Keywords- compiler design; new programming language; polyglot
framework
I. INTRODUCTION
A compiler course takes a significant place in computer
science curricula, and it is always associated with an
implementation project. Being a multidimensional course, it
requires the students to be familiar with high-level
programming languages, among other things. The first
contact with these languages is almost always
confusing because of their complexity. This
is most obvious in object-oriented languages like Java
[8]. Object orientation [15] makes it hard to learn Java step by step
from basic principles, because right from the beginning the
learner has to define at least one class with a method
with the signature public static void main(String[] args). The
teacher therefore has two choices: explain most of the
concepts involved (classes, methods, types, arrays, etc.), or just
provide the surrounding program text and let the learner add
code to the body of the method main.
SimJ is a simple, Java-based programming language. It is
conceived and designed to ease the teaching of basic
programming to beginners. We believe that beginners should learn
the basic concepts easily before they are exposed to more
complex programming issues. It is much simpler for a new
programmer to write println("Hello world") than
a confusing line like System.out.println("Hello world"). This
small example shows the importance of the first
contact with a programming language. The role of SimJ is to
make this contact less “painful”.
Compiler frameworks are widely used as a simple tool for
implementing new languages based on existing ones. The
complexity increases as the differences between the
existing language and the new one become significant [4].
That is why we used Java as the base language for SimJ, and
why we chose Polyglot [4,5], a compiler framework
for creating compilers for languages similar to Java.
II. THE POLYGLOT FRAMEWORK
Polyglot is an extensible Java compiler toolkit designed for
experimentation with new language extensions. The base
Polyglot compiler, jlc ("Java language compiler"), is a
mostly-complete Java front end [1]; that is, it parses [1,2] and
performs semantic checking on Java source code. The
compiler outputs Java source code. Thus, the base compiler
implements the identity translation. Language extensions are
implemented on top of the base compiler by extending the
concrete and abstract syntax and the type system [4].
After the language extension has been type checked, the abstract
syntax tree (AST) [1,14] is translated into a Java AST, and the
resulting code is written to a Java source file, which can then
be compiled with javac.
Polyglot supports the easy creation of compilers for languages
similar to Java. The framework is useful for domain-specific
languages, for exploration of language design, and for
simplified versions of Java for pedagogical use. As mentioned
above, the last of these is the focus of this paper.
A Polyglot extension is a source-to-source compiler that
accepts a program written in a language extension and
translates it to Java source code [4,5]. It may also invoke a
Java compiler such as javac to convert its output to bytecode
[13]. A SimJ-oriented view of this process, including the
eventual compilation to Java bytecode, is shown in figure 1.
Figure 1. The Polyglot Compiler Framework Architecture
The first step in compilation is parsing input source code to
produce an AST. Polyglot includes an extensible parser
generator, PPG [5], which allows the implementer to define
the syntax of the language extension (SimJ in our case) as a set
of changes to the base grammar for Java [7]. The extended
AST may contain new kinds of nodes either to represent
syntax added to the base language or to record new
information in the AST.
The core of the compilation process is a series of compilation
passes applied to the abstract syntax tree. Both semantic
analysis and translation [1] to Java may comprise several such
passes. The pass scheduler selects passes to run over the AST
of a single source file, in an order defined by the extension,
ensuring that dependencies between source files are not
violated. Each compilation pass, if successful, rewrites the
AST, producing a new AST that is the input to the next pass.
A language extension may modify the base language pass
schedule by adding, replacing, reordering, or removing
compiler passes. The rewriting process is entirely functional;
compilation passes do not destructively modify the AST.
Compilation passes do their work using objects that define
important characteristics of the source and target languages. A
type system object acts as a factory for objects representing
types and related constructs such as method signatures [4,5].
The type system object also provides some type checking
functionality. A node factory [4] constructs AST nodes for its
extension. In extensions that rely on an intermediate language,
multiple type systems and node factories may be used during
compilation.
After all compilation passes complete, the usual
result is a Java AST. A Java compiler such as javac is invoked
to compile the Java code to bytecode.
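The functional rewriting style described in this section can be illustrated with a minimal sketch. It is not Polyglot code: the class PassPipelineSketch, its node types, and the constant-folding pass are hypothetical names chosen for illustration, and a real extension would use Polyglot's own AST and pass scheduler. The sketch only shows the idea that each pass returns a new tree, which becomes the input of the next pass, while the original tree stays untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: none of these names belong to the Polyglot API. */
final class PassPipelineSketch {

    /** Marker for immutable AST nodes. */
    interface Node {}

    /** Integer literal leaf. */
    static final class IntLit implements Node {
        final int value;
        IntLit(int value) { this.value = value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    /** Binary addition with two immutable children. */
    static final class Add implements Node {
        final Node left;
        final Node right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " + " + right + ")"; }
    }

    /** Example pass: folds constant additions by building new nodes. */
    static Node foldConstants(Node n) {
        if (n instanceof Add) {
            Add a = (Add) n;
            Node l = foldConstants(a.left);
            Node r = foldConstants(a.right);
            if (l instanceof IntLit && r instanceof IntLit) {
                return new IntLit(((IntLit) l).value + ((IntLit) r).value);
            }
            return new Add(l, r);
        }
        return n; // leaves are returned unchanged
    }

    /** Runs the passes in schedule order; each result feeds the next pass. */
    static Node runPasses(Node ast, List<UnaryOperator<Node>> schedule) {
        Node current = ast;
        for (UnaryOperator<Node> pass : schedule) {
            current = pass.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Node original = new Add(new IntLit(1), new Add(new IntLit(2), new IntLit(3)));
        List<UnaryOperator<Node>> schedule = new ArrayList<>();
        schedule.add(PassPipelineSketch::foldConstants);
        Node result = runPasses(original, schedule);
        // The original tree is still intact after the pass has run.
        System.out.println(original + " -> " + result);
    }
}
```

Because no pass mutates its input, adding, removing, or reordering entries in the schedule list does not affect the trees that earlier passes produced.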
III. SIMJ PROGRAMMING LANGUAGE
SimJ (short for Simple Java) is a simplified version of the
Java programming language conceived especially for
beginners. The language is very simple, easy to learn, and
very similar to Java. Previous work has been done in this field
(e.g. the J0 programming language [12]), but these languages
differ considerably from Java syntax [7]. We think that
similarity with Java is very important, because it allows
programmers to switch to Java without syntax problems
when they feel ready to explore its full potential
and advanced features.
Figure 2 shows an example of the same code written in Java
and in SimJ. This example shows, as mentioned above, that
the code in SimJ is clearly more readable than the one in Java.
Programming courses and textbooks for beginners generally
include many programs that require user input during their
execution. In Java this part is neither
simple nor easy to implement at the beginning level. We
address this problem by removing the complex part and
leaving only the “understandable” one (i.e. readLine()).
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class A {
    public static void main(String[] args) {
        try {
            // Wrap standard input so that whole lines can be read.
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(System.in));
            System.out.print("Your name:");
            String name = reader.readLine();
            System.out.print("\nHello, " + name + "!");
        } catch (IOException ioexception) {
            System.out.println(ioexception);
        }
    }
}
class A {
    main() {
        print("Your name:");
        String name = readLine();
        print("\nHello, " + name + "!");
    }
}
Figure 2. Example code written in Java and SimJ
The case for simplified printing methods is obvious, since
printing is used in almost every simple program. It is also
important to note that the structure of the program is
unchanged compared to Java, which preserves its
object-oriented character.
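To make the simplification concrete, the following sketch shows one possible Java target for the SimJ built-ins print, println, and readLine. The paper does not describe the code that the SimJ compiler actually generates, so the class SimJRuntimeSketch and all of its methods are assumptions made for illustration only. The point is that the reader setup and the checked IOException can be hidden behind a plain method call.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical runtime helper; the paper does not describe SimJ's generated code. */
final class SimJRuntimeSketch {

    // One shared reader over standard input, created once.
    private static final BufferedReader STDIN =
            new BufferedReader(new InputStreamReader(System.in));

    static void print(Object value) { System.out.print(value); }

    static void println(Object value) { System.out.println(value); }

    /** Reads one line from the given reader, converting the checked IOException. */
    static String readLine(BufferedReader in) {
        try {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** What a SimJ readLine() call could map to. Returns null at end of input. */
    static String readLine() { return readLine(STDIN); }

    public static void main(String[] args) {
        // Simulated input keeps the demo independent of the console.
        BufferedReader fake = new BufferedReader(new StringReader("Ada\n"));
        print("Your name:");
        String name = readLine(fake);
        print("\nHello, " + name + "!");
    }
}
```

With such a helper, the SimJ program in figure 2 corresponds almost line by line to the body of main above, and the beginner never sees the try/catch block.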
Another important goal of this language is to support the
teaching of compiler design [1].
The SimJ language specification [3,10,11], shown in figure 3, is
very simple and short, and is equipped with the fundamental and
most used parts of a programming language at the beginning level
[9,7]. Related work (e.g. MiniJava [1]) shows that simplicity is
the primary characteristic of these languages.
As mentioned previously, we think that similarity with Java
is important, but such languages should also keep their own
identity. In MiniJava, for example, System.out.println() is
defined to do the printing exactly as in Java, but System.out
itself has no meaning in that language. With SimJ we
try to address these problems by creating a simple but
well-defined language that, syntactically speaking, is not an exact
reduced copy of its mother language but has its own identity.
Program ::= MainClass ( Class )*
MainClass ::= "class" Identifier "{" "main" "(" ")" "{" Statement "}" "}"
Class ::= "class" Identifier "{" ( Variable )* ( Method )* "}"
Variable ::= Type Identifier ";"
Method ::= Type Identifier "(" ( Type Identifier ( "," Type Identifier )* )? ")"
    "{" ( Variable )* ( Statement )* "return" Expression ";" "}"
Type ::= "boolean"
    | "int"
    | "char"
    | "string"
    | "int" "[" "]"
    | Identifier
Statement ::= "{" ( Statement )* "}"
    | "if" "(" Expression ")" Statement "else" Statement
    | "while" "(" Expression ")" Statement
    | "for" "(" Expression ";" Expression ";" Expression ")" Statement
    | "switch" "(" Expression ")" "{" ( "case" Expression ":" Statement "break" ";" )*
        "default" ":" Statement "}"
    | "print" "(" Expression ")" ";"
    | "println" "(" Expression ")" ";"
    | "readLine" "(" ")" ";"
    | "readInt" "(" ")" ";"
    | Identifier "=" Expression ";"
    | Identifier "[" Expression "]" "=" Expression ";"
Expression ::= Expression ( "||" | "&&" | "<" | ">" | "!=" | "==" | "+" | "-" | "*" | "/" ) Expression
    | Expression "[" Expression "]"
    | Expression "." Identifier "(" ( Expression ( "," Expression )* )? ")"
    | <INTEGER>
    | <STRING>
    | <CHARACTER>
    | "true"
    | "false"
    | Identifier
    | "this"
    | "new" "int" "[" Expression "]"
    | "new" Identifier "(" ")"
    | "!" Expression
    | "(" Expression ")"
Identifier ::= <IDENTIFIER>
Figure 3: SimJ language specification
Giving the language its own identity is an important point: it helps
reduce possible ambiguities and makes the language more understandable.
SimJ includes the basic building blocks of a programming
language. From this point of view it is quite similar to Java
[8,7]. We have implemented the basic data types
(figure 3):
• boolean – true or false
• int – integers
• char – characters
• string – sequence of characters (string in SimJ for simplicity is considered a primitive data type)
• int[] – array of integers
The most commonly used control flow statements [9,8] are
implemented in SimJ (figure 3). Their syntax is the same as in
Java, since they carry no redundant complexity that could be
removed:
• if else
• for
• while
• switch
The principal operators [9,8] are also present in SimJ. These
include addition, subtraction, multiplication, division, logical
and, logical or, logical not, less than, greater than, not
equal, and equal.
IV. IMPLEMENTATION
For the implementation of SimJ we have used Polyglot, a
framework that improves and simplifies compiler design for
languages similar to Java. The process consists of creating a
new language extension. Extensions (in our case SimJ) usually
have the following subpackages [5]:
• ext.simj.ast – AST nodes specific to the SimJ language.
• ext.simj.extension – New extension and delegate objects specific to SimJ.
• ext.simj.types – Type objects and typing judgments specific to SimJ.
• ext.simj.visit – Visitors specific to SimJ.
• ext.simj.parse – The parser and lexer for the SimJ language.
In addition, our extension defines the class
ext.simj.ExtensionInfo [5], which contains the
objects that define how the language is to be parsed and
type checked. The extension also defines the class
ext.simj.Version [5], which specifies the version number of
SimJ. The Version class is used as a check when extracting
extension-specific type information from .class files.
The design process of SimJ includes the following tasks [5]:
• Syntactic differences between SimJ and Java are
defined based on the Java grammar found in
polyglot/ext/jl/parse/java12.cup.
• Any new AST nodes that SimJ requires are defined
based on the existing Java nodes found in polyglot.ast
(interfaces) and polyglot.ext.jl.ast (implementations).
• Semantic differences between SimJ and Java are
defined. The Polyglot base compiler (jlc) implements
most of the static semantics of Java as defined in the
Java Language Specification [7].
• Translation from SimJ to Java is defined. The
translation produces a legal Java program that can be
compiled by javac.
We implement SimJ by creating a Polyglot extension with
the characteristics described above. Implementation follows
these steps [5]:
• build.xml is modified and a target for SimJ is
added. This is done based on the skeleton extension
found in polyglot/ext/skel. Running the
customization script polyglot/ext/newext
copies the skeleton to polyglot/ext/simj and
substitutes our language's name at all the appropriate
places in the skeleton.
• A new parser is implemented using PPG. This is done
by modifying polyglot/ext/simj/parse/simj.ppg
according to the SimJ syntax.
• The required new AST nodes are implemented. The
node factory polyglot/ext/simj/ast/SimJNodeFactory_c.java
is modified in order to produce these nodes.
• Semantic checking for SimJ is implemented based on
its rules.
• The translation from SimJ to Java is implemented
based on the translation defined above. This is
implemented as a visitor pass that rewrites the AST
into an AST representing a legal Java program.
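The last step can be sketched as follows. This is a simplified illustration and not the SimJ implementation: the Call node and the translate method are hypothetical and do not use Polyglot's AST classes or visitor API. The sketch assumes that an unqualified SimJ call to print or println is rewritten into a call on System.out, and that every other node is passed through unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: these classes are illustrative and are not Polyglot's AST. */
final class TranslationSketch {

    /** Immutable call node such as print("hi") or System.out.print("hi"). */
    static final class Call {
        final String target;     // null for an unqualified SimJ call
        final String name;
        final List<String> args; // arguments kept as source text for brevity

        Call(String target, String name, List<String> args) {
            this.target = target;
            this.name = name;
            this.args = Collections.unmodifiableList(new ArrayList<>(args));
        }

        /** Renders the node as a Java-style statement. */
        String toSource() {
            String prefix = (target == null) ? "" : target + ".";
            return prefix + name + "(" + String.join(", ", args) + ");";
        }
    }

    private static final Set<String> PRINT_BUILTINS =
            new HashSet<>(Arrays.asList("print", "println"));

    /** Rewrites SimJ print built-ins into Java calls; other nodes pass through. */
    static Call translate(Call node) {
        if (node.target == null && PRINT_BUILTINS.contains(node.name)) {
            return new Call("System.out", node.name, node.args);
        }
        return node;
    }

    /** Applies the rewrite to every statement, producing a new list. */
    static List<Call> translateAll(List<Call> body) {
        List<Call> result = new ArrayList<>();
        for (Call c : body) {
            result.add(translate(c));
        }
        return result;
    }

    public static void main(String[] args) {
        Call simj = new Call(null, "println", Arrays.asList("\"Hello world\""));
        System.out.println(simj.toSource() + "  =>  " + translate(simj).toSource());
    }
}
```

As in the passes described in section II, the rewrite returns new nodes instead of modifying the SimJ tree, so the input AST remains available for error reporting.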
V. CONCLUSIONS
Our motivation for creating SimJ was to provide a simple,
understandable, and easy-to-learn programming language
similar to Java that makes the basic structures of programming
easier to learn and that serves as an example source language
for student compiler projects. We found that
existing approaches did not fully address the need for a
simplified, Java-like structured language that is more than a
reduced copy of Java. Our language is simple, but it improves on
existing solutions by merging their advantages and trying to
avoid their weak points.
Having used the Polyglot framework to build the compiler, we
conclude that it is an effective and easy way to produce
compilers for Java-like languages such as SimJ. It is simple and
has a well-defined structure, which makes it possible to generate
a base skeleton for a new language extension and then
add the desired specifications.
Our language, SimJ, is a well-structured, simplified version of
the Java programming language that is more than a reduced
copy of it. SimJ could be used by beginners who want to learn
Java but know nothing about object-oriented
programming. It is also a good choice for learning compiler
design because of its well-defined and easy-to-implement
structure.
REFERENCES
[1] Appel, A.W., Palsberg, J. (2002). Modern Compiler Implementation in Java (2nd ed.). Cambridge University Press.
[2] Metsker, S.J. (2001). Building Parsers with Java. Addison Wesley.
[3] Slonneger, K., Kurtz, B.L. (1995). Formal Syntax and Semantics of Programming Languages, A Laboratory Based Approach. Addison Wesley.
[4] Nystrom, N., Clarkson, M.R., Myers, A.C. (2003). Polyglot: An Extensible Compiler Framework for Java. Retrieved January 20, 2007, from http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR2002-1883.
[5] Cornell University, Department of Computer Science. (2003). How to Use Polyglot. Retrieved January 20, 2007, from http://www.cs.cornell.edu/projects/polyglot/.
[6] Cornell University, Department of Computer Science. (2003). PPG: A Parser Generator for Extensible Grammars. Retrieved January 20, 2007, from http://www.cs.cornell.edu/projects/polyglot/.
[7] Gosling, J., Joy, B., Steele, G., Bracha, G. (2005). The Java Language Specification (3rd ed.). Addison Wesley.
[8] Arnold, K., Gosling, J., Holmes, D. (2005). The Java Programming Language (4th ed.). Addison Wesley Professional.
[9] Kernighan, B.W., Ritchie, D.M. (1988). The C Programming Language (2nd ed.). Prentice Hall.
[10] Clinger, W., Rees, J. (2001). Report on the Algorithmic Language Scheme. Retrieved January 24, 2007, from http://www-swiss.ai.mit.edu/~jaffer/r4rs_toc.html.
[11] Krishnamurthi, Sh. (2006). Programming Languages: Application and Interpretation. Retrieved January 28, 2007, from http://www.cs.brown.edu/~sk/Publications/Books/ProgLangs/.
[12] Cornell University, Department of Computer Science. (2003). J0: A Java Extension for Beginning (and Advanced) Programmers. Retrieved January 20, 2007, from http://www.cs.cornell.edu/Projects/j0/.
[13] Lindholm, T., Yellin, F. (1999). The Java Virtual Machine Specification (2nd ed.). Addison Wesley.
[14] Jones, J. (2003). Abstract Syntax Tree Implementation Idioms. Retrieved February 6, 2007, from http://jerry.cs.uiuc.edu/~plop/plop2003/Papers/.
[15] Ambler, S.J. (2006). Introduction to Object-Orientation and UML. Retrieved February 11, 2007, from http://www.agiledata.org/essays/objectOrientation101.html.
[16] O’Docherty, M. (2005). Object-Oriented Analysis and Design: Understanding System Development with UML 2.0. John Wiley & Sons.
[17] Graver, J.O. (1992). The Evolution of an Object-Oriented Compiler Framework. Retrieved January 30, 2007, from http://cs.ubc.ca/rr/proceedings/spe91-95/spe/vol22/issue7/spe767jg.pdf