(IJCSIS) International Journal of Computer Science and Information Security,

Vol. 9, No. 6, 2011

Creating an Appropriate Programming Language for 

Student Compiler Project

Elinda Kajo Mece

Department of Informatics Engineering

Polytechnic University of Tirana

Tirana, Albania

[email protected]

Abstract- Finding an appropriate and simple source language for a student compiler project is a challenge, especially when the students are not familiar with high-level programming languages. This paper presents a new programming language intended principally for beginners and for didactic purposes in a compiler design course. SimJ, a reduced form of the Java programming language, is designed for simple and fast programming. More readable code, low complexity, and basic functionality are the primary goals of SimJ. The language includes the most important functions and data structures needed for the simple programs generally found in beginners' programming textbooks. The Polyglot compiler framework is used to implement SimJ.

 Keywords-  compiler design; new programming language; polyglot 

 framework 

I. INTRODUCTION

A compiler course has a significant place in computer science curricula. The course is always associated with an implementation project. Being a multidimensional course, it requires the students to be familiar with high-level programming languages, among other things. The first contact with these languages is almost always considered confusing because of their complexity. This becomes more obvious in object-oriented languages like Java [8]. Object orientation [15] makes it hard to learn Java step by step from basic principles, because right from the beginning the learner has to define at least one public class with a method with the signature public static void main(String[] args). So the teacher has two choices: try to explain most of the concepts involved (classes, methods, types, arrays, etc.), or just provide the surrounding program text and let the learner add code to the body of the method main.

SimJ is a simple, Java-based programming language. It is conceived and designed to ease the teaching of basic programming to beginners. We believe that beginners should learn the basic concepts easily before they are exposed to more complex programming issues. It is much simpler for a new programmer to write println("Hello world") than a confusing line like System.out.println("Hello world"). This short example shows the importance of the first contact with a programming language. The role of SimJ is to make this contact less “painful”.

Compiler frameworks are widely used as a simple tool for implementing new languages based on existing ones. The complexity increases as the differences between the existing language and the new one become significant [4]. That is why we used Java as the base language for SimJ. For this purpose we have chosen Polyglot [4,5], a compiler framework for creating compilers for languages similar to Java.

II. THE POLYGLOT FRAMEWORK 

Polyglot is an extensible Java compiler toolkit designed for experimentation with new language extensions. The base Polyglot compiler, jlc ("Java language compiler"), is a mostly complete Java front end [1]; that is, it parses [1,2] and performs semantic checking on Java source code. The compiler outputs Java source code. Thus, the base compiler implements the identity translation. Language extensions are implemented on top of the base compiler by extending the concrete and abstract syntax and the type system [4].

After the language extension has been type checked, the abstract syntax tree (AST) [1,14] is translated into a Java AST and the resulting code is output into a Java source file, which can then be compiled with javac.

Polyglot supports the easy creation of compilers for languages similar to Java. The Polyglot framework is useful for domain-specific languages, for exploration of language design, and for simplified versions of Java for pedagogical use. As mentioned above, the last of these is the focus of this paper.

A Polyglot extension is a source-to-source compiler that accepts a program written in a language extension and translates it to Java source code [4,5]. It may also invoke a Java compiler such as javac to convert its output to bytecode [13]. A SimJ-oriented view of this process, including the eventual compilation to Java bytecode, is shown in figure 1.

Figure 1. The Polyglot Compiler Framework Architecture

36 http://sites.google.com/site/ijcsis/

ISSN 1947-5500


The first step in compilation is parsing the input source code to produce an AST. Polyglot includes an extensible parser generator, PPG [5], which allows the implementer to define the syntax of the language extension (SimJ in our case) as a set of changes to the base grammar for Java [7]. The extended AST may contain new kinds of nodes, either to represent syntax added to the base language or to record new information in the AST.

The core of the compilation process is a series of compilation passes applied to the abstract syntax tree. Both semantic analysis and translation [1] to Java may comprise several such passes. The pass scheduler selects passes to run over the AST of a single source file, in an order defined by the extension, ensuring that dependencies between source files are not violated. Each compilation pass, if successful, rewrites the AST, producing a new AST that is the input to the next pass. A language extension may modify the base language pass schedule by adding, replacing, reordering, or removing compiler passes. The rewriting process is entirely functional; compilation passes do not destructively modify the AST.

Compilation passes do their work using objects that define important characteristics of the source and target languages. A type system object acts as a factory for objects representing types and related constructs such as method signatures [4,5]. The type system object also provides some type checking functionality. A node factory [4] constructs AST nodes for its extension. In extensions that rely on an intermediate language, multiple type systems and node factories may be used during compilation. After all compilation passes complete, the usual result is a Java AST. A Java compiler such as javac is invoked to compile the Java code to bytecode.
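The idea of functional passes that each produce a new AST can be illustrated with a minimal sketch. The classes below (Expr, IntLit, Add, Pass, ConstantFolder, PassScheduler) are hypothetical names invented for this illustration; they are not Polyglot's node, pass, or scheduler classes, and a constant-folding pass is used only because it is small.

```java
import java.util.List;

// Hypothetical, minimal AST; these are NOT Polyglot's classes.
interface Expr {}

final class IntLit implements Expr {
    final int value;
    IntLit(int value) { this.value = value; }
}

final class Add implements Expr {
    final Expr left;
    final Expr right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
}

// A pass maps an AST to a new AST and never mutates its input.
interface Pass {
    Expr run(Expr root);
}

// Example pass: fold an addition of two integer literals into one literal.
final class ConstantFolder implements Pass {
    public Expr run(Expr e) {
        if (e instanceof Add) {
            Add add = (Add) e;
            Expr l = run(add.left);
            Expr r = run(add.right);
            if (l instanceof IntLit && r instanceof IntLit) {
                return new IntLit(((IntLit) l).value + ((IntLit) r).value);
            }
            return new Add(l, r); // rebuilt node; the original is left untouched
        }
        return e; // literals are returned unchanged
    }
}

final class PassScheduler {
    // Runs the passes in the given order; each output is the next pass's input.
    static Expr runAll(Expr root, List<Pass> passes) {
        Expr current = root;
        for (Pass p : passes) {
            current = p.run(current);
        }
        return current;
    }
}
```

Because all fields are final and every pass returns fresh nodes, the tree handed to the scheduler is still intact after the run, which is the property the text describes as non-destructive rewriting.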

III. SIMJ PROGRAMMING LANGUAGE

SimJ (short for Simple Java) is a simplified version of the Java programming language conceived especially for beginners. The language is simple, easy to learn, and very similar to Java. Previous work has been done in this field (e.g. the J0 programming language [12]), but these languages differ considerably from Java syntax [7]. We think that similarity with Java is very important, because it allows programmers to switch to Java without any syntax-related problems when they feel ready to explore its full potential and advanced features.

Figure 2 shows an example of the same code written in Java and in SimJ. This example shows, as mentioned above, that the code in SimJ is clearly more readable than the one in Java. Generally, programming courses and textbooks for beginners include many programs that require user input during their execution. In Java this part is neither simple nor easy to implement at the beginner level. We address this problem by removing the complex part and leaving only the “understandable” one (i.e. readLine()).

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class A {
    public static void main(String[] args) {
        try {
            BufferedReader reader = new BufferedReader(
                new InputStreamReader(System.in));
            System.out.print("Your name:");
            String name = reader.readLine();
            System.out.print("\nHello, " + name + "!");
        }
        catch (IOException ioexception) {
            System.out.println(ioexception);
        }
    }
}

class A {
    main() {
        print("Your name:");
        String name = readLine();
        print("\nHello, " + name + "!");
    }
}

Figure 2. Example code written in Java (top) and SimJ (bottom)
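One way generated Java code could hide the reader setup and the exception handling behind readLine() and readInt() is a small runtime helper. The paper does not say how SimJ translates these calls, so the class SimJInput below, its constructor, and its factory method are assumptions made purely for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical runtime helper; not part of SimJ or Polyglot as described in the paper.
final class SimJInput {
    private final BufferedReader reader;

    SimJInput(Reader source) {
        this.reader = new BufferedReader(source);
    }

    // Convenience factory for reading from the console.
    static SimJInput fromStandardInput() {
        return new SimJInput(new InputStreamReader(System.in));
    }

    // Reads one line; the checked IOException is wrapped so callers need no try/catch.
    String readLine() {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads one line and parses it as an integer.
    int readInt() {
        String line = readLine();
        if (line == null) {
            throw new IllegalStateException("end of input");
        }
        return Integer.parseInt(line.trim());
    }
}
```

Taking a Reader in the constructor keeps the helper testable without touching System.in; a translated program would presumably use the standard-input factory.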

The simplified versions of the printing methods are an obvious choice, since printing is used in almost every simple program. It is also important to mention that, compared to Java, the structure of the program is unchanged, thus preserving its object-oriented character.

Another important goal of this language is to support the teaching of compiler design [1].

The SimJ language specification [3,10,11], shown in figure 3, is simple and short, and contains the fundamental and most frequently used parts of a programming language at the beginner level [9,7]. Related work (e.g. MiniJava [1]) shows that simplicity is the primary characteristic of these languages.

As mentioned previously, we think that similarities with Java are important, but such languages should not lose their own identity. In MiniJava, for example, System.out.println(), which is the same as in Java, is defined to do the printing, but the meaning of System.out cannot be found in this language. With SimJ we try to address these problems by creating a simple but well-defined language that, syntactically speaking, is not an exact reduced copy of the mother language but has its own identity.

Program    ::= MainClass ( Class )*
MainClass  ::= "class" Identifier "{" "main" "(" ")" "{" Statement "}" "}"
Class      ::= "class" Identifier "{" ( Variable )* ( Method )* "}"
Variable   ::= Type Identifier ";"
Method     ::= Type Identifier "(" ( Type Identifier ( "," Type Identifier )* )? ")"
               "{" ( Variable )* ( Statement )* "return" Expression ";" "}"
Type       ::= "boolean"
             | "int"
             | "char"
             | "string"
             | "int" "[" "]"
             | Identifier
Statement  ::= "{" ( Statement )* "}"
             | "if" "(" Expression ")" Statement "else" Statement
             | "while" "(" Expression ")" Statement
             | "for" "(" Expression ";" Expression ";" Expression ")" Statement
             | "switch" "(" Expression ")" "{" ( "case" Expression ":"
               Statement "break" ";" )* "default" ":" Statement "}"
             | "print" "(" Expression ")" ";"
             | "println" "(" Expression ")" ";"
             | "readLine" "(" ")" ";"
             | "readInt" "(" ")" ";"
             | Identifier "=" Expression ";"
             | Identifier "[" Expression "]" "=" Expression ";"
Expression ::= Expression ( "||" | "&&" | "<" | ">" | "!=" | "==" | "+" | "-"
               | "*" | "/" ) Expression
             | Expression "[" Expression "]"
             | Expression "." Identifier "(" ( Expression ( "," Expression )* )? ")"
             | <INTEGER>
             | <STRING>
             | <CHARACTER>
             | "true"
             | "false"
             | Identifier
             | "this"
             | "new" "int" "[" Expression "]"
             | "new" Identifier "(" ")"
             | "!" Expression
             | "(" Expression ")"
Identifier ::= <IDENTIFIER>

Figure 3: SimJ language specification
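To show how the productions can be read, the following sketch hand-codes a recognizer for just the Variable and Type rules. It is not the PPG-generated parser used for SimJ. The class VariableRecognizer is hypothetical, the input is assumed to be already split into tokens, and the lexical form of <IDENTIFIER> and the reservation of the type keywords are assumptions, since the specification does not define them.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical recognizer for two productions of the SimJ grammar:
//   Variable ::= Type Identifier ";"
//   Type     ::= "boolean" | "int" | "char" | "string" | "int" "[" "]" | Identifier
final class VariableRecognizer {
    // Assumption: the type keywords are reserved and cannot be identifiers.
    private static final Set<String> KEYWORDS =
        new HashSet<String>(Arrays.asList("boolean", "int", "char", "string"));

    private final List<String> tokens;
    private int pos = 0;

    private VariableRecognizer(List<String> tokens) {
        this.tokens = tokens;
    }

    // Returns true if the whole token list is exactly one Variable declaration.
    static boolean isVariable(List<String> tokens) {
        VariableRecognizer r = new VariableRecognizer(tokens);
        return r.type() && r.identifier() && r.accept(";") && r.pos == tokens.size();
    }

    // Consumes the next token if it equals the expected terminal.
    private boolean accept(String expected) {
        if (pos < tokens.size() && tokens.get(pos).equals(expected)) {
            pos++;
            return true;
        }
        return false;
    }

    private boolean type() {
        if (accept("int")) {
            // An optional "[" "]" turns int into the array type int[].
            if (accept("[")) {
                return accept("]");
            }
            return true;
        }
        return accept("boolean") || accept("char") || accept("string") || identifier();
    }

    // Assumption: an identifier is a letter or underscore followed by
    // letters, digits, or underscores.
    private boolean identifier() {
        if (pos < tokens.size()
                && tokens.get(pos).matches("[A-Za-z_][A-Za-z0-9_]*")
                && !KEYWORDS.contains(tokens.get(pos))) {
            pos++;
            return true;
        }
        return false;
    }
}
```

Each grammar rule becomes one method and each alternative one branch, which is the usual shape of a recursive-descent recognizer; the left-recursive Expression rule would need rewriting before it could be handled this way.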

This is an important point that helps reduce possible ambiguities and makes the language more understandable.

SimJ includes the basic building blocks of a programming language. From this point of view it is quite similar to Java [8,7]. We have implemented the basic primitive data types (figure 3):

• boolean – true or false

• int – integers

• char – characters

• string – sequence of characters (for simplicity, string is considered a primitive data type in SimJ)

• int[] – array of integers

The most commonly used control flow statements [9,8] are implemented in SimJ (figure 3). Their syntax is the same as in Java, since they have no redundant complexity to remove:

• if else

• for 

• while

• switch

The principal operators [9,8] are also present in SimJ: addition, subtraction, multiplication, division, logical and, logical or, logical not, less than, greater than, not equal, and equal.

IV. IMPLEMENTATION

For the implementation of SimJ we have used Polyglot, a framework that improves and simplifies compiler design for languages similar to Java. This process consists of creating a new language extension. Extensions (in our case SimJ) usually have the following subpackages [5]:

• ext.simj.ast – AST nodes specific to the SimJ language.

• ext.simj.extension – New extension and delegate objects specific to SimJ.

• ext.simj.types – Type objects and typing judgments specific to SimJ.

• ext.simj.visit – Visitors specific to SimJ.

• ext.simj.parse – The parser and lexer for the SimJ language.

In addition, our extension defines the class ext.simj.ExtensionInfo [5], which contains the objects that define how the language is to be parsed and type checked. There is also a class ext.simj.Version [5], which specifies the version number of SimJ. The Version class is used as a check when extracting extension-specific type information from .class files.

The design process of SimJ includes the following tasks [5]:

• Syntactic differences between SimJ and Java are defined based on the Java grammar found in polyglot/ext/jl/parse/java12.cup.

• Any new AST nodes that SimJ requires are defined based on the existing Java nodes found in polyglot.ast (interfaces) and polyglot.ext.jl.ast (implementations).

• Semantic differences between SimJ and Java are defined. The Polyglot base compiler (jlc) implements most of the static semantics of Java as defined in the Java Language Specification [7].

• Translation from SimJ to Java is defined. The translation produces a legal Java program that can be compiled by javac.

We implement SimJ by creating a Polyglot extension with the characteristics described above. The implementation follows these steps [5]:

• build.xml is modified and a target for SimJ is added. This is done based on the skeleton extension found in polyglot/ext/skel. Running the customization script polyglot/ext/newext copies the skeleton to polyglot/ext/simj and substitutes our language's name at all the appropriate places in the skeleton.

• A new parser is implemented using PPG. This is done by modifying polyglot/ext/simj/parse/simj.ppg according to the SimJ syntax.

• The required new AST nodes are implemented. The node factory polyglot/ext/simj/ast/SimJNodeFactory_c.java is modified in order to produce these nodes.

• Semantic checking for SimJ is implemented based on its rules.

• The translation from SimJ to Java is implemented based on the translation defined above. This is implemented as a visitor pass that rewrites the AST into an AST representing a legal Java program.
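The effect of the final translation step can be sketched on a very small scale. The sketch below is a simplification: it emits Java source text directly rather than rewriting into a Java AST with a Polyglot visitor, it handles only print and println inside the main class, and the names PrintStmt, SimJMainClass, and SimJToJavaSketch are hypothetical.

```java
import java.util.List;

// Hypothetical statement node: a SimJ print or println call.
final class PrintStmt {
    final String argument;  // argument expression, kept as source text for brevity
    final boolean newline;  // true for println, false for print

    PrintStmt(String argument, boolean newline) {
        this.argument = argument;
        this.newline = newline;
    }
}

// Hypothetical node for the SimJ main class: class Name { main() { ... } }
final class SimJMainClass {
    final String name;
    final List<PrintStmt> body;

    SimJMainClass(String name, List<PrintStmt> body) {
        this.name = name;
        this.body = body;
    }
}

final class SimJToJavaSketch {
    // Produces Java source text in which SimJ's shorthand is expanded:
    // main() gains its full Java signature and print/println gain System.out.
    static String translate(SimJMainClass c) {
        StringBuilder out = new StringBuilder();
        out.append("public class ").append(c.name).append(" {\n");
        out.append("    public static void main(String[] args) {\n");
        for (PrintStmt s : c.body) {
            String method = s.newline ? "println" : "print";
            out.append("        System.out.").append(method)
               .append("(").append(s.argument).append(");\n");
        }
        out.append("    }\n");
        out.append("}\n");
        return out.toString();
    }
}
```

In the actual extension this mapping would be expressed over AST nodes, so that the result can pass through the remaining compiler passes before javac is invoked.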

V. CONCLUSIONS

Our motivation for creating SimJ was to provide a simple, understandable, and easy-to-learn programming language similar to Java that improves the learning of basic programming structures and serves as an exemplar source language for student compiler projects. We found that existing approaches did not fully address the need for a simplified, Java-like structured language that is more than a reduced copy of Java. Our language is simple but improves on existing solutions by merging their advantages and trying to avoid their weak points.

Having used the Polyglot framework to build the compiler, we conclude that it is an effective and easy way to produce compilers for Java-like languages such as SimJ. It is simple and has a well-defined structure, offering the possibility to generate a base skeleton for new language extensions to which the desired specifications can be added.

SimJ is a well-structured, simplified version of the Java programming language that is more than a reduced copy of it. SimJ could be used by beginners who want to learn Java but do not know anything about object-oriented programming. It is also a good choice for learning compiler design because of its well-defined and easy-to-implement structure.

REFERENCES

[1] Appel, A.W., Palsberg, J. (2002). Modern Compiler Implementation in Java (2nd ed.). Cambridge University Press.

[2] Metsker, S.J. (2001). Building Parsers with Java. Addison Wesley.

[3] Slonneger, K., Kurtz, B.L. (1995). Formal Syntax and Semantics of Programming Languages, A Laboratory Based Approach. Addison Wesley.

[4] Nystrom, N., Clarkson, M.R., Myers, A.C. (2003). Polyglot: An Extensible Compiler Framework for Java. Retrieved January 20, 2007, from http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR2002-1883.

[5] Cornell University, Department of Computer Science. (2003). How to Use Polyglot. Retrieved January 20, 2007, from http://www.cs.cornell.edu/projects/polyglot/.

[6] Cornell University, Department of Computer Science. (2003). PPG: A Parser Generator for Extensible Grammars. Retrieved January 20, 2007, from http://www.cs.cornell.edu/projects/polyglot/.

[7] Gosling, J., Joy, B., Steele, G., Bracha, G. (2005). The Java Language Specification (3rd ed.). Addison Wesley.

[8] Arnold, K., Gosling, J., Holmes, D. (2005). The Java Programming Language (4th ed.). Addison Wesley Professional.

[9] Kernighan, B.W., Ritchie, D.M. (1988). The C Programming Language (2nd ed.). Prentice Hall.

[10] Clinger, W., Rees, J. (2001). Report on the Algorithmic Language Scheme. Retrieved January 24, 2007, from http://www-swiss.ai.mit.edu/~jaffer/r4rs_toc.html.

[11] Krishnamurthi, Sh. (2006). Programming Languages: Application and Interpretation. Retrieved January 28, 2007, from http://www.cs.brown.edu/~sk/Publications/Books/ProgLangs/.

[12] Cornell University, Department of Computer Science. (2003). J0: A Java Extension for Beginning (and Advanced) Programmers. Retrieved January 20, 2007, from http://www.cs.cornell.edu/Projects/j0/.

[13] Lindholm, T., Yellin, F. (1999). The Java Virtual Machine Specification (2nd ed.). Addison Wesley.

[14] Jones, J. (2003). Abstract Syntax Tree Implementation Idioms. Retrieved February 6, 2007, from http://jerry.cs.uiuc.edu/~plop/plop2003/Papers/.

[15] Ambler, S.J. (2006). Introduction to Object-Orientation and UML. Retrieved February 11, 2007, from http://www.agiledata.org/essays/objectOrientation101.html.

[16] O’Docherty, M. (2005). Object-Oriented Analysis and Design: Understanding System Development with UML 2.0. John Wiley & Sons.

[17] Graver, J.O. (1992). The Evolution of an Object-Oriented Compiler Framework. Retrieved January 30, 2007, from http://cs.ubc.ca/rr/proceedings/spe91-95/spe/vol22/issue7/spe767jg.pdf
