LISA The descriptive programming language
Seyed Pooria Madani Kochak
The University of Prince Edward Island
CSIT-16, July 17, 2012
Computer Science and Information Technology @ The University of Prince Edward Island


Contents

1 Introduction
  1.1 Language’s Name
  1.2 LISA Compiler
  1.3 Intermediate Representation
  1.4 Backend Computational Engine
  1.5 Formal language’s representation
  1.6 Operations

2 LISA Language Specifications
  2.1 Language Grammar
  2.2 Keywords, Terminal

3 Compiler construction
  3.1 Symbol Table
  3.2 Functions
    3.2.1 LISA’s Predefined Function List
  3.3 Adding new Pre-defined functions
  3.4 Abstract Syntax Tree (AST)
    3.4.1 Structure of a typical AST Node
  3.5 Type checking
  3.6 LISA lib
  3.7 Intermediate Code Generator
  3.8 Makefile
  3.9 Disabling Warnings
  3.10 Compiler files
  3.11 String Concatenation
  3.12 Notes on “print” function

4 ICDFA Generator
  4.1 Preliminaries
  4.2 String representation of ICDFA
  4.3 Generation of ICDFA in LISA
  4.4 Marking final states
  4.5 Random generation of ICDFA
  4.6 Big Integer - The GNU Multiple Precision Arithmetic Library
  4.7 Translating ICDFA to Grail+ format

5 LISA User Manual
  5.1 Introduction
  5.2 Requirements
  5.3 Structure of LISA Source Code
  5.4 Primitive Data-types
  5.5 Variable
  5.6 Array
  5.7 LISA’s grammar and Syntax
    5.7.1 Expression Statements
    5.7.2 Function calls
  5.8 Loop Statement
  5.9 Generating Statement
  5.10 Nested Loop and Generating Statements
  5.11 I/O Operations
    5.11.1 print Function
    5.11.2 readfile Function
    5.11.3 writefile Function
  5.12 String Concatenation
  5.13 Type conversion
  5.14 Compiling

6 Classical Tests and Examples

7 Ideas for future development

List of Figures

1 An ICDFA with no final states (ICDFA∅)
2 Flowchart for enumeration of ICDFA∅

1 Introduction

A formal language is a set of strings of symbols. Performing operations and tests on formal languages is a worthwhile task in mathematics, computer science, and linguistics, and is of great interest to researchers in these fields. Many frameworks have been built to meet this need and to equip researchers with the tools necessary to accomplish their tasks.

Software packages such as Grail+ or MachineCAT are normally modular computational engines that interact with the user through a medium such as a Linux shell terminal. As demonstrated in Listing 1, to perform three simple operations on two finite-state machines using Grail+, a typical user has to execute three Linux commands and create files to store the intermediate outputs. Such a method is time-consuming, inefficient, and rather difficult for complex test cases.

Listing 1: Linux shell commands to work with Grail+

    linuxserver:home root$ fmcat fm1 fm2 > fm3
    linuxserver:home root$ fmunion fm3 fm1 > fm4
    linuxserver:home root$ fmsize fm4

The goal of this project was to build a descriptive programming language for performing tests and operations on different classes of formal languages efficiently, with strong control over the flow of the sequences of tests and operations. As demonstrated in Listing 2, the language designed in this project enables researchers to describe the body of their tests and operations in one source-code file and to have full control over the flow of the operations. This allows researchers to focus on the objective of their tests and to keep organized test description files (the source code files of their tests).

Listing 2: LISA, the descriptive programming language for operations

    fm1 = readfile(dfa, "fm1");
    fm2 = readfile(dfa, "fm2");
    fm3 = union(fm1, fm2);
    if ( isdeterministic(fm3) )
        print("The size is:" + size( union( fm1, fm3 ) ) );
    else
        print("The size is:" + size( union( fm2, fm3 ) ) );

It is worth mentioning that researchers are free to choose their computational engine and to instruct the compiler which software package to use when performing the operations. For example, the code shown in Listing 2 can be requested to execute using either the Grail+ or the MachineCAT system.

The goal of this project is to design a descriptive programming language that abstracts all the operations that can be performed on formal languages, and that acts as a middle layer between programmers/researchers and computational engines such as Grail+, MachineCAT, FADO, etc. As a result, researchers and programmers need to learn only one uniform programming language syntax, and are free to use a variety of software packages as the back-end computational engine.

1.1 Language’s Name

The programming language designed and implemented in this project is called Language Interpreter and Structural Analyzer, or LISA for short. The name LISA was suggested by Christopher Vessey, a faculty member in the Department of Computer Science at the University of Prince Edward Island.

1.2 LISA Compiler

The LISA compiler is only the front-end of a complete compiler suite. It translates LISA source code into an Intermediate Representation (IR), which normally is a conventional programming language such as C/C++. The generated IR is able to communicate natively with the desired back-end computational engine, and is then compiled to an executable program using third-party programming tools such as the GNU GCC [3] toolkit. Thus, the LISA compiler only translates source code from the LISA language into source code in a suitable conventional programming language, depending on the requested back-end engine. However, the LISA compiler is more than a simple translator: by creating an abstract syntax tree of the source code, it is able to perform type checking and type conversion, and to handle I/O operations.

The lexical analyzer of the LISA compiler was constructed using Flex. Flex is free, open-source software for generating lexical analyzers. Flex builds the lexical analyzer by generating a C program according to the descriptions given to it, and the generated C program can be integrated with the other parts of the compiler.

The parser of the LISA compiler was constructed using Bison [6]. Bison is free, open-source software for generating parsers for context-free grammars, and is part of the GNU project. The parser generated by Bison is C source code and has been heavily modified.

All the other parts of the LISA compiler, such as the abstract syntax tree, the symbol table, and the intermediate code generator, are implemented from scratch in the C programming language and are well integrated with the generated parser and the generated lexical analyzer.

1.3 Intermediate Representation

The intermediate representation that LISA source code is translated into is the conventional programming language in which the back-end computational engine is implemented. For example, Grail+, the current back-end engine of LISA, is implemented in C/C++. Therefore, the LISA compiler translates LISA source code into C/C++ source code (the IR) so that the translated code can use Grail+ as a library. Similarly, the intermediate code generator of the LISA compiler can be modified to generate Python code in order to use FADO as the back-end computational engine.

1.4 Backend Computational Engine

The back-end of this compiler is responsible for performing all the tests and operations that are supported by LISA’s language specifications. Several frameworks exist that perform different tests and operations on formal languages. This project did not implement a complete back-end, but used existing computational engines and programming libraries as the back-end.

Grail+ is “A symbolic computation environment for finite-state machines, regular expressions, and finite languages.” In other words, Grail+ supports most of the operations that can be performed on some classes of formal languages, and Grail+ served as the major back-end for this project.

This project aimed to design an architecture that allows the back-end engine to be extended and/or replaced with a different engine without any change to the grammar of LISA.

1.5 Formal language’s representation

A representation syntax is necessary for formal languages, so that the user can introduce the desired formal languages (regular, context-free, etc.) into the system and read the results back. There are different ways to represent formal languages, which are listed below:

1. Finite-state machine (FSM):

• Deterministic finite automata (DFA)

• Non-deterministic finite automata (NFA)

• Pushdown automata

• Moore Machine

• epsilon-NFA

• Mealy Machine

2. Formal Grammar:

• Regular

• Context-free

• Context-sensitive

• Recursively enumerable

3. Regular Expression

4. Decision procedure

Note that, except for regular expressions, the user does not enter these representations directly into the program. Instead, a representation is supplied as a separate file along with the source code for execution. Since these representations are passed to the back-end engine, it is very important that they follow the syntax expected by that engine, in this case Grail+. In other words, a DFA description stored in a file has to be in the correct syntactic form for the back-end engine that is going to be used.

1.6 Operations

The following operations will be supported as language primitives on formal languages. Keywords and/or predefined functions will be allocated for each of them, and the grammar is described in detail in the grammar section.

1. String operations:

• Concatenation
• Alphabet of a string
• String substitution
• String homomorphism
• String projection
• Right quotient
• Syntactic relation
• Right cancellation
• Prefix

2. Set operations:

• Complement
• Union
• Intersection
• Set difference
• Symmetric difference
• Cartesian product

3. Intersection with regular languages

2 LISA Language Specifications

LISA’s language specifications, grammar rules, terminals, and non-terminals are discussed in depth in this section. All design decisions and potential future extensions are also introduced.

2.1 Language Grammar

     1  <program> ::= <declaration-block> <program-block>
     2  <declaration-block> ::= "declare" "{" <declaration-statements> "}"
     3  <declaration-statements> ::= <declaration-statements> <declaration-statement>
     4      | ε
     5  <declaration-statement> ::= <type> ID ";"
     6  <type> ::= "int"
     7      | "dfa"
     8      | "nfa"
     9      | "regex"
    10      | "bool"
    11      | "string"
    12  <program-block> ::= "program" "{" <statements> "}"
    13  <statements> ::= <statements> <statement>
    14      | ε
    15  <statement> ::= <expression-statement>
    16      | <if-statement>
    17      | <compound-statement>
    18      | <while-statement>
    19      | <generating-statement>
    20      | "break" ";"
    21      | "continue" ";"
    22  <generating-statement> ::= "generate" "(" <generator-type> "," INT "," INT ")" <statement>
    23  <generator-type> ::= "random"
    24      | "enumerate"
    25  <expression-statement> ::= <expression> ";"
    26  <if-statement> ::= "if" "(" <expression> ")" <statement>
    27  <compound-statement> ::= <statements>
    28  <while-statement> ::= WHILE "(" <expression> ")" <statement>
    29  <expression> ::= <variable> <exprop> <expression>
    30      | <simple-expression>
    31  <exprop> ::= "="
    32      | "-="
    33      | "*="
    34      | "/="
    35      | "+="
    36  <simple-expression> ::= <simple-expression> "||" <or-expression>
    37      | <or-expression>
    38  <or-expression> ::= <or-expression> "&&" <unary-relationexpression>
    39      | <unary-relationexpression>
    40  <unary-relationexpression> ::= "!" <unary-relationexpression>
    41      | <relation-expression>
    42  <relation-expression> ::= <add-expression> <relop> <add-expression>
    43      | <add-expression>
    44  <relop> ::= "<="
    45      | ">="
    46      | "=="
    47      | "!="
    48      | ">"
    49      | "<"
    50  <add-expression> ::= <add-expression> <addop> <term>
    51      | <term>
    52  <addop> ::= "-"
    53      | "+"
    54  <term> ::= <term> <multop> <factor>
    55      | <factor>
    56  <multop> ::= "*"
    57      | "/"
    58      | "%"
    59  <factor> ::= "(" <simple-expression> ")"
    60      | <constant>
    61  <constant> ::= INTEGER
    62      | TRUE
    63      | FALSE
    64      | "next"
    65      | "hasnext"
    66      | <variable>
    67      | STRINGLITERAL
    68      | <function>
    69  <function> ::= ID "(" <parameter-list> ")"
    70  <parameter-list> ::= <simple-expression>
    71      | <parameter-list> "," <simple-expression>
    72      | ε
    73  <variable> ::= ID

Table 1: Grammar rules

All the grammar rules listed in Table 1 are in Backus Normal Form (BNF), which is almost identical to the syntax of Bison.
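The layering of <add-expression>, <term>, and <factor> in Table 1 is what gives the multiplicative operators higher precedence than the additive ones and makes both left-associative. The following is a minimal hand-written sketch of that idea in C, not the Bison-generated parser used by the LISA compiler: it evaluates integer expressions only, replaces the left-recursive rules with loops, and all names in it (cursor, parse_factor, parse_term, parse_add, eval_expression) are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>

/* Cursor into the text being parsed (hypothetical helper type). */
typedef struct { const char *p; } cursor;

static void skip_ws(cursor *c) {
    while (isspace((unsigned char)*c->p)) c->p++;
}

static long parse_add(cursor *c);

/* <factor> ::= "(" <simple-expression> ")" | <constant>
   Sketch: only integer constants, and the parenthesised part is an
   additive expression. */
static long parse_factor(cursor *c) {
    skip_ws(c);
    if (*c->p == '(') {
        c->p++;
        long v = parse_add(c);
        skip_ws(c);
        if (*c->p == ')') c->p++;
        return v;
    }
    char *end;
    long v = strtol(c->p, &end, 10);
    c->p = end;
    return v;
}

/* <term> ::= <term> <multop> <factor> | <factor>
   The left recursion becomes a loop, which keeps left associativity. */
static long parse_term(cursor *c) {
    long v = parse_factor(c);
    for (;;) {
        skip_ws(c);
        char op = *c->p;
        if (op != '*' && op != '/' && op != '%') return v;
        c->p++;
        long rhs = parse_factor(c);
        if (op == '*') v *= rhs;
        else if (rhs == 0) return 0; /* sketch: no error reporting */
        else if (op == '/') v /= rhs;
        else v %= rhs;
    }
}

/* <add-expression> ::= <add-expression> <addop> <term> | <term> */
static long parse_add(cursor *c) {
    long v = parse_term(c);
    for (;;) {
        skip_ws(c);
        char op = *c->p;
        if (op != '+' && op != '-') return v;
        c->p++;
        long rhs = parse_term(c);
        v = (op == '+') ? v + rhs : v - rhs;
    }
}

/* Entry point: evaluate an integer expression given as text. */
long eval_expression(const char *text) {
    cursor c = { text };
    return parse_add(&c);
}
```

Because parse_add only ever combines complete terms, an input such as 1 + 2 * 3 groups as 1 + (2 * 3), which mirrors how rules 50 to 60 nest.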

2.2 Keywords, Terminal

Terminals and keywords of the grammar are described here briefly. Refer to chapter 6 for a better understanding. Single-character tokens are terminals that also represent the value of the token. For example, ’;’ (semicolon) is returned to the parser as ’;’ without any name attached to it. Moreover, all capitalized tokens are terminals and all lower-case tokens are non-terminals. Note that “Token name” is what the lexical analyzer returns to the parser after reading the “Token value” from the source code.

Token name      Token value             Description
INTEGER         [+-]?[1-9][0-9]*        Any integer.
ID              [a-zA-Z][a-z0-9A-Z]*    Identifier: any string that starts with a letter and contains only letters or digits. Used for declaring names (e.g. function names, variable names).
STRINGLITERAL                           Any sequence of characters between double quotation marks (" ").
INT             int                     Keyword for defining integer variables and values.
DFA             dfa                     Keyword for defining deterministic finite automata variables and values.
NFA             nfa                     Keyword for defining non-deterministic finite automata variables and values.
DECLARE         declare                 Keyword for creating the declaration block.
PROGRAM         program                 Keyword for creating the program-code block.
WHILE           while                   Keyword for creating a loop statement.
IF              if                      Keyword for creating a conditional statement.
ELSE            else                    Keyword used in conjunction with IF.
AE              +=                      Add and assign value.
TE              *=                      Multiply and assign value.
ME              -=                      Subtract and assign value.
DE              /=                      Divide and assign value.
BOOL            bool                    Keyword for defining boolean variables.
TRUE            true                    A boolean value.
FALSE           false                   A boolean value.
CONTINUE        continue                Keyword and statement that skips an iteration of a loop.
BREAK           break                   Keyword and statement that exits from a loop intentionally.
NEXT            next                    Keyword and statement that returns an automatically generated value.
HASNEXT         hasnext                 Keyword and statement that shows whether any value is left to be returned by the generator.
GENERATE        generate                Keyword that automatically generates ICDFA.
LENGTH          length                  Keyword.
TYPE            typed                   Keyword.
ALPHABET        alphabet                Keyword for representing an alphabet.
ENUMERATE       enumerate               Keyword.
RANDOM          random                  Keyword.
STRING          string                  Keyword for defining string variables and values.
OR              ||                      Logical OR.
AND             &&                      Logical AND.
NE              !=                      Not equal.
LE              <=                      Less than or equal.
GE              >=                      Greater than or equal.
EQ              ==                      Equal.
- , $ () <> = + ; . ! * / %             Single-character tokens that are returned directly to the parser as a single character.

Table 2: Terminals

Some of the terminals listed in Table 2 are keywords, which are listed in Table 3. The keywords listed in Table 3 cannot be used as names for variables or functions.

int bool dfa nfa regex declare program while if else true false break continue next hasnext length generate type random sequence string

Table 3: List of the keywords


Note that although pre-defined functions are not detected as keywords by the lexical analyzer, they are reserved words and may not be used as variable names.
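The token patterns of Table 2 and the reserved words of Table 3 can be checked with a few lines of C. The sketch below is only an illustration and is not taken from the Flex specification of the LISA compiler; the function names (is_keyword, matches_id, matches_integer, is_valid_variable_name) are hypothetical, and the pre-defined function names, which are also reserved, are not included.

```c
#include <ctype.h>
#include <string.h>

/* Keywords copied from Table 3. */
static const char *const KEYWORDS[] = {
    "int", "bool", "dfa", "nfa", "regex", "declare", "program", "while",
    "if", "else", "true", "false", "break", "continue", "next", "hasnext",
    "length", "generate", "type", "random", "sequence", "string"
};

/* Returns 1 if s is one of the keywords of Table 3, 0 otherwise. */
int is_keyword(const char *s) {
    size_t n = sizeof KEYWORDS / sizeof KEYWORDS[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(s, KEYWORDS[i]) == 0) return 1;
    return 0;
}

/* ID pattern from Table 2: [a-zA-Z][a-z0-9A-Z]* */
int matches_id(const char *s) {
    if (!isalpha((unsigned char)*s)) return 0;
    for (s++; *s; s++)
        if (!isalnum((unsigned char)*s)) return 0;
    return 1;
}

/* INTEGER pattern from Table 2: [+-]?[1-9][0-9]* */
int matches_integer(const char *s) {
    if (*s == '+' || *s == '-') s++;
    if (*s < '1' || *s > '9') return 0;
    for (s++; *s; s++)
        if (!isdigit((unsigned char)*s)) return 0;
    return 1;
}

/* A name is usable for a variable only if it is an ID and not a keyword. */
int is_valid_variable_name(const char *s) {
    return matches_id(s) && !is_keyword(s);
}
```

Taken literally, the ID pattern of Table 2 does not admit an underscore and the INTEGER pattern does not admit 0, so a lexer built from these exact patterns would need them extended to accept names such as my_int.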

3 Compiler construction

The construction of the LISA compiler is discussed in full detail in this chapter.

3.1 Symbol Table

LISA’s symbol table is an expandable data structure. A typical row of the symbol table contains a value name, value, token type, and symbol type. Value is a void pointer (void *) to the value that the row corresponds to. Token type only accepts symbol_token_type enum values. Once the lexical analyzer detects a symbol that is not in the table, it assigns one of the token type values from the symbol_token_type enum and adds the token to the symbol table. Symbol type is the type that is eventually determined by the parser.

For example, the following code is inserted into the symbol table as follows:

int my_int;

The lexical analyzer detects int as a keyword token. It reads int as a character pointer (char *); thus, the value_name in the symbol table points to a character pointer. Since int is a keyword, the lexical analyzer assigns the TOKENTYPE_KEYWORD enum value to the token_type of the node. Keywords are special nodes in the symbol table that do not get any value in their symbol_type (NULL).

my_int is detected by the lexical analyzer as an identifier. Since my_int is a string of characters, the lexical analyzer reads my_int as a character pointer (char *); thus, the value_name in the symbol table points to a character pointer. The corresponding token_type value set for this node is TOKENTYPE_ID. The lexical analyzer cannot detect the type that should be associated with this identifier. However, the parser will eventually determine that my_int represents an integer variable, and the node type for my_int is then set by the parser to NODETYPE_INTEGER.

Bear in mind that the symbol table is used by the parser and the lexical analyzer simultaneously. Thus, when tracing errors related to the symbol table, both of these modules should be inspected. The symboltable.c and symboltable.h files contain the definitions and implementation of the symbol table.

Listing 3: C enum representing node types in the symbol table

    enum symbol_node_type {
        NODETYPE_DFA,
        NODETYPE_NFA,
        NODETYPE_REGEX,
        NODETYPE_STRING,   // String literals
        NODETYPE_INTEGER,
        NODETYPE_BOOLEAN,  // true false
        NODETYPE_PREDEFINED_FUNCTION
    };

Listing 4: C enum representing token types in the symbol table

    enum symbol_token_type {
        TOKENTYPE_ID,
        TOKENTYPE_STRINGLITERALS,
        TOKENTYPE_NUMBER,
        TOKENTYPE_KEYWORD,
        TOKENTYPE_SYMBOL
    };

Listing 5: C struct representing the node template in the symbol table

    typedef struct {
        void *value;
        enum symbol_token_type token_type;
        enum symbol_node_type node_type;
    } symbol_node;
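The division of labour described for my_int, where the lexical analyzer fixes the token type and the parser later fills in the node type, can be sketched with the types of Listings 3 to 5. This is only an illustration under stated assumptions: NODETYPE_UNKNOWN is a hypothetical marker that is not part of Listing 3, and the helper names lexer_make_identifier and parser_set_declared_type do not come from the LISA sources.

```c
/* Node types from Listing 3, plus one hypothetical marker. */
enum symbol_node_type {
    NODETYPE_DFA,
    NODETYPE_NFA,
    NODETYPE_REGEX,
    NODETYPE_STRING,
    NODETYPE_INTEGER,
    NODETYPE_BOOLEAN,
    NODETYPE_PREDEFINED_FUNCTION,
    NODETYPE_UNKNOWN /* assumption: not part of Listing 3 */
};

/* Token types from Listing 4. */
enum symbol_token_type {
    TOKENTYPE_ID,
    TOKENTYPE_STRINGLITERALS,
    TOKENTYPE_NUMBER,
    TOKENTYPE_KEYWORD,
    TOKENTYPE_SYMBOL
};

/* Node template from Listing 5. */
typedef struct {
    void *value;
    enum symbol_token_type token_type;
    enum symbol_node_type node_type;
} symbol_node;

/* Lexer side (hypothetical helper): the lexer only knows that the
   lexeme is an identifier, so the node type stays unresolved. */
symbol_node lexer_make_identifier(const char *lexeme) {
    symbol_node node;
    node.value = (void *)lexeme;
    node.token_type = TOKENTYPE_ID;
    node.node_type = NODETYPE_UNKNOWN;
    return node;
}

/* Parser side (hypothetical helper): after reducing a declaration such
   as "int my_int;", the parser records the declared type on the node. */
void parser_set_declared_type(symbol_node *node,
                              enum symbol_node_type declared) {
    node->node_type = declared;
}
```

For the declaration int my_int; the node would first be created with TOKENTYPE_ID and an unresolved node type, and would then be updated to NODETYPE_INTEGER once the parser has seen the whole declaration.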

As a future improvement, the symbol table could be implemented as a hash table in order to improve search performance.

Table 4 lists the functions that are defined and implemented in the symbol_table.h and symbol_table.c files.

Function’s prototype
    Description

void init_symbol_table(void);
    Allocates memory for the symbol table and loads all the keywords and predefined names and functions into it.

void expand_symbol_table(void);
    Expands the symbol table once it reaches its maximum capacity.

int add_symbol_node(symbol_node *);
    Adds the given symbol node to the table and returns its index. This function checks the uniqueness of the node to be added: if the node is already in the table, it only returns the index of that node and does not add a new one. It also checks the symbol table capacity and calls the expand function if needed.

int find_symbol_node(symbol_node *);
    Finds the given node in the symbol table and returns its index. If the given node does not exist, it returns -1.

void print_symbol_table(void);
    Prints the symbol table. Used only for debugging purposes.

int add_id_symbol_node(char *);
    Wraps a given string as a TOKENTYPE_ID node, adds it to the symbol table, and returns the index of the added node.

int add_number_symbol_node(int);
    Wraps a given integer as a TOKENTYPE_NUMBER node, adds it to the symbol table, and returns the index of the added node.

int add_stringliterals_symbol_node(char *);
    Wraps a given string as a TOKENTYPE_STRINGLITERALS node, adds it to the symbol table, and returns the index of the added node. It also sets the node type to NODETYPE_STRING.

Table 4: Symbol Table’s Functions

As shown in Listing 6, INIT_CAPACITY in symbol_table.h holds the initial size of the symbol table. Every time the capacity is reached, the symbol table is expanded by the constant amount given by INCREASE_BY. The last three functions in Table 4 are helper functions; all three call the add_symbol_node function explicitly.


Listing 6: symbol_table.h (partial)

    /*************************************************************/
    // Symbol Table constants
    /*************************************************************/
    #define INIT_CAPACITY 100 // initial capacity of symbol table
    #define INCREASE_BY 100
    /*************************************************************/
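The behaviour described in Table 4 (unique insertion, lookup returning -1, and growth by INCREASE_BY once INIT_CAPACITY is reached) can be sketched as a growable array. This is not the code from symbol_table.c: the sketch assumes every node value is a C string, omits the node_type field and the preloading of keywords, and adds a hypothetical symbol_table_capacity() helper so the growth can be observed.

```c
#include <stdlib.h>
#include <string.h>

#define INIT_CAPACITY 100 /* initial capacity, as in Listing 6 */
#define INCREASE_BY 100   /* growth step, as in Listing 6 */

enum symbol_token_type {
    TOKENTYPE_ID,
    TOKENTYPE_STRINGLITERALS,
    TOKENTYPE_NUMBER,
    TOKENTYPE_KEYWORD,
    TOKENTYPE_SYMBOL
};

/* Simplified node: node_type from Listing 5 is omitted in this sketch. */
typedef struct {
    void *value; /* assumption: always points to a C string here */
    enum symbol_token_type token_type;
} symbol_node;

static symbol_node *table = NULL;
static int count = 0;
static int capacity = 0;

/* Allocates an empty table; the real function also preloads keywords. */
void init_symbol_table(void) {
    free(table);
    capacity = INIT_CAPACITY;
    count = 0;
    table = malloc((size_t)capacity * sizeof *table);
    if (table == NULL) abort();
}

/* Grows the table by a constant number of rows. */
void expand_symbol_table(void) {
    size_t rows = (size_t)(capacity + INCREASE_BY);
    symbol_node *bigger = realloc(table, rows * sizeof *table);
    if (bigger == NULL) abort();
    table = bigger;
    capacity += INCREASE_BY;
}

/* Linear search; returns the index of the node, or -1 if absent. */
int find_symbol_node(symbol_node *node) {
    for (int i = 0; i < count; i++)
        if (table[i].token_type == node->token_type &&
            strcmp((const char *)table[i].value,
                   (const char *)node->value) == 0)
            return i;
    return -1;
}

/* Adds the node unless an equal one exists; returns its index. */
int add_symbol_node(symbol_node *node) {
    int existing = find_symbol_node(node);
    if (existing >= 0) return existing;
    if (count == capacity) expand_symbol_table();
    table[count] = *node;
    return count++;
}

/* Helper: wraps a string as a TOKENTYPE_ID node and adds it. */
int add_id_symbol_node(char *name) {
    symbol_node node = { name, TOKENTYPE_ID };
    return add_symbol_node(&node);
}

/* Hypothetical helper, not in Table 4: exposes the current capacity. */
int symbol_table_capacity(void) { return capacity; }
```

The linear search in find_symbol_node is what a hash table would replace; growing by a fixed step keeps the sketch close to the description, although doubling the capacity is the more common choice when many insertions are expected.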

3.2 Functions

LISA provides pre-defined functions for operations on the primitive data types, such as union, minimization, and other set operations on formal languages. Moreover, all I/O operations are handled using pre-defined functions. The syntax for calling functions is similar to that of the C programming language. The grammar rule for functions is general: any identifier followed by a parenthesized list of parameters is a function call. If the function exists in the symbol table, it is called; otherwise an error is raised.

Functions are built into the back-end engine and are preloaded into the symbol table. Thus, extending LISA’s capabilities by introducing new pre-defined functions requires no modification to the grammar rules of LISA.

Currently, user-defined functions are not supported, but with the current function-call grammar rule, a function-definition grammar rule can be added to the language in the future without losing consistency.

3.2.1 LISA’s Predefined Function List

The pre-defined functions currently supported in LISA are listed in Table 5. Any future modification to the pre-defined functions, or creation of new pre-defined functions, is expected to be documented in this section.

Function name
    Description

size(parameter)
    Accepted parameter types: nfa, dfa, regex. Computes the size of the parameter and returns an integer.

isfinite(parameter)
    Accepted parameter types: nfa, dfa. Returns true if the parameter is finite, and false otherwise.

iscomplete(parameter)
    Accepted parameter types: nfa, dfa. Returns true if the parameter is complete, and false otherwise.

isuniversal(parameter)
    Accepted parameter types: nfa, dfa. Returns true if the parameter is universal, and false otherwise.

isdeterministic(parameter)
    Accepted parameter types: nfa, dfa. Returns true if the parameter is deterministic, and false otherwise. Since an NFA variable may also hold a deterministic automaton, it can be handy to have a function that checks whether the NFA is deterministic. It always returns true for DFAs.

readfile(parameter1, parameter2)
    The only accepted keywords for parameter1 are nfa, dfa, regex, string, and the only accepted type for parameter2 is a string literal (or string variable) that is a filesystem path. This function opens the file pointed to by parameter2, parses its content according to the type specified by parameter1, and returns an object compatible with parameter1. For example, if the description of a DFA in a standard format (Grail+) is stored in a file and the user wants to read the file content into a variable of type DFA, readfile should be called with the first parameter set to dfa and the second parameter set to the path.

writefile(parameter1, parameter2)
    The only accepted keywords for parameter1 are nfa, dfa, regex, string, and the only accepted type for parameter2 is a string literal (or string variable) that is a filesystem path. It writes parameter1 into the file whose path is given by parameter2. For example, this function can be used when a dfa variable that stores information about a DFA has to be written to a file.

parse(parameter1, parameter2)
    The only accepted keywords for parameter1 are nfa, dfa, regex, and the only accepted type for parameter2 is a string literal (or string variable). This function translates the string (parameter2) into the data type requested by parameter1 and returns the translated object.

print(parameter)
    Accepted parameter types: nfa, dfa, regex, string, int. Prints the parameter to standard output.

reverse(parameter)
    Accepted parameter types: nfa, dfa. Computes the "reverse" of the given parameter and returns the result with the same type as the parameter. The "string" type is not yet supported. Bear in mind that the reverse of a DFA may not be a DFA. In that case, the DFA has to be stored in an NFA variable and the operation performed on that variable.

union(parameter1, parameter2)
    Accepted types for parameter1 and parameter2: nfa, dfa, regex. Computes the union of the two parameters and returns the result. Bear in mind that parameter1 and parameter2 have to be of the same type; otherwise an explicit conversion should be made.

plus(parameter)
    Accepted parameter types: nfa, dfa, regex. Performs the plus operation on the given parameter.

star(parameter)
    Accepted parameter types: nfa, dfa, regex. Performs the star operation on the given parameter.

reachable(parameter)
    Accepted parameter types: nfa, dfa. Checks whether all the states of the parameter are reachable.

reduce(parameter)
    Accepted parameter types: nfa, dfa. Reduces the parameter and returns the reduced object with the same type as the parameter.

max(parameter)

complete(parameter)
    Accepted parameter types: nfa, dfa. Computes the complete version of the parameter and returns the result. The returned object has the same type as the parameter.

shuffle(parameter1, parameter2)
    Accepted parameter types: nfa, dfa. Shuffles the parameters and returns the result. The returned object has the same type as the parameters.

concat(parameter1, parameter2)
    Accepted parameter types: nfa, dfa. Concatenates the parameters and returns the result. The returned object has the same type as the parameters.

nfatodfa(parameter)
    Accepted parameter type: nfa. Converts the parameter to a DFA object and returns the result.

dfatonfa(parameter)
    Accepted parameter type: dfa. Converts the parameter to an NFA object and returns the result.

Table 5: Pre-defined functions

Some operations on DFA types might change the object from DFA to NFA, which is counted as a back-end engine bug. The LISA library takes measures to keep such objects in DFA form.
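The remark that reversing a DFA may not yield a DFA can be illustrated with a minimal sketch. The `Automaton` and `Edge` types below are hypothetical stand-ins, not Grail+'s FM class or LISA lib's DFA/NFA classes; the sketch only assumes the textbook construction (flip every transition, swap start and final states).

```cpp
#include <set>
#include <utility>
#include <vector>

// Hypothetical, simplified automaton: not the Grail+ FM class.
struct Edge { int from; char symbol; int to; };

struct Automaton {
    std::set<int> starts;
    std::set<int> finals;
    std::vector<Edge> edges;
};

// Reverse: flip every edge and swap the start and final state sets.
Automaton reverse_automaton(const Automaton& a) {
    Automaton r;
    r.starts = a.finals;
    r.finals = a.starts;
    for (const Edge& e : a.edges)
        r.edges.push_back(Edge{e.to, e.symbol, e.from});
    return r;
}

// Deterministic here means: exactly one start state and at most one
// target for every (state, symbol) pair.
bool is_deterministic(const Automaton& a) {
    if (a.starts.size() != 1) return false;
    std::set<std::pair<int, char>> seen;
    for (const Edge& e : a.edges)
        if (!seen.insert(std::make_pair(e.from, e.symbol)).second) return false;
    return true;
}
```

A DFA with two final states already shows the effect: its reverse has two start states, so it can only be stored as an NFA.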

3.3 Adding new Pre-defined functions

This section is a brief tutorial for adding new pre-defined functions to LISA's compiler.

• LISA’s compiler : symboltable.c, functions.c, icg.c, parser.y

• Back-end Engine: LISA Library, Grail+

The LISA compiler compiles LISA's syntax to C/C++ (the current LISA IR). The Intermediate Code Generator (icg.c) is responsible for generating the intermediate representation (C/C++) source code. Once the syntax of a function call is detected, parser.y calls the function_icg() function in icg.c.

Inside function_icg(), the name of the function call is checked against the symbol table's nodes to make sure that such a function exists. Once the function name is found in the symbol table, the appropriate function from functions.c is called to generate the IR. If the function does not exist, a syntax error is reported.

Functions inside functions.c generate the appropriate C/C++ code and return unsigned char * as a result. They inspect the function-call parameters and perform some level of type checking before generating the code. Therefore, there is some control over what kind of IR (C/C++) code gets generated, based on the information provided in the parameter(s).


Once the appropriate function from functions.c has been executed, the returned string and an appropriate node type are inserted into the abstract syntax tree. Based on the described process for intermediate code generation, adding a new pre-defined function is accomplished by the following sequence of operations:

• Add the function name, and (where applicable) its return type, to the symbol table by modifying the symbol table's initialization function (init_symbol_table()) inside the symboltable.c file.

• Implement the appropriate C/C++ method (or class) inside the LISA library, which will be the semantic implementation of the pre-defined function.

• Implement the appropriate C function inside functions.c that generates the IR (C/C++) code. Some level of type checking and syntax checking should be performed by this function.

• Modify the function_icg() function inside icg.c to call the function implemented in functions.c, install the returned string into the abstract syntax tree, and attach a node type to it.

By following the above steps, a new pre-defined function is added to the language and becomes part of the syntax of LISA. It is worth mentioning that introducing a new pre-defined function does not require any new grammar rule in the grammar of LISA. Therefore, even users who have no knowledge of compiler design can still add new pre-defined functions and extend LISA's functionality.
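The dispatch described above (look the name up, then call the matching code generator) can also be sketched as a lookup table. This is a hypothetical C++ illustration, not LISA's implementation, which is written in C and uses a chain of comparisons: the names `Param`, `Predefined`, `registry` and `generate_call` are assumptions introduced only for this example.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

enum class NodeType { Null, Integer, String, Regex, Nfa, Dfa };

// Hypothetical stand-in for one entry of the parameter list.
struct Param {
    NodeType type;
    std::string name;
};

using CodeGen = std::string (*)(const std::vector<Param>&);

// What the symbol table and functions.c provide for one function name.
struct Predefined {
    NodeType return_type;  // Null would mean "depends on the parameters"
    CodeGen generate;
};

// Code generator for size(...): picks the library method by parameter type.
std::string gen_size(const std::vector<Param>& params) {
    if (params.size() != 1)
        throw std::runtime_error("size expects exactly one parameter");
    const Param& p = params[0];
    switch (p.type) {
        case NodeType::Nfa:    return "NFA::size(" + p.name + ")";
        case NodeType::Dfa:    return "DFA::size(" + p.name + ")";
        case NodeType::String: return "STRING::size(" + p.name + ")";
        case NodeType::Regex:  return "REGEX::size(" + p.name + ")";
        default: throw std::runtime_error("invalid parameter type for size");
    }
}

// Registering a new pre-defined function is one more row in this table.
const std::map<std::string, Predefined>& registry() {
    static const std::map<std::string, Predefined> table = {
        {"size", Predefined{NodeType::Integer, gen_size}},
    };
    return table;
}

std::string generate_call(const std::string& name, const std::vector<Param>& params) {
    auto it = registry().find(name);
    if (it == registry().end())
        throw std::runtime_error(name + " - is not a predefined function.");
    return it->second.generate(params);
}
```

With this layout, the grammar stays untouched when a function is added, which mirrors the point made above.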

Let's explore, step by step, how the pre-defined function size(...) is implemented. The first step is to add the pre-defined function name to the symbol table and set the appropriate attributes.

Listing 7: Adding a new function name into symbol table

     1  void init_symbol_table(void) {
     2      node = (symbol_node *) malloc(sizeof(symbol_node));
     3      node->value_name = "size";
     4      node->token_type = TOKENTYPE_PREDEFINED_FUNCTION;
     5      node->node_type = NODETYPE_INTEGER;
     6      add_symbol_node(node);
     7      ....
     8      ....
     9      ....
    10  }

Pre-defined values, such as function names and keywords, are filled into the symbol table by the init_symbol_table() function. symbol_node is a C struct that represents a node in the symbol table.

On line 2 of Listing 7, a new node is created (on the heap). On line 3, the function name is introduced. Names are passed to the parser as tokens, and each token has a type that indicates its purpose. Thus, pre-defined function names are indicated by setting their token_type to TOKENTYPE_PREDEFINED_FUNCTION. Moreover, some pre-defined functions return only one type, like the size function, whereas other pre-defined functions may return different types based on their parameters. To elaborate, the function reduce reduces the finite state machine that is passed as its parameter and returns an object of the same type as that parameter. Thus, the return type of the reduce function depends solely on its parameter and can be either DFA or NFA. In contrast, the size function always returns integer.

For those functions that return only a specific type, node_type should be set to that specific type. For example, in the case of the size pre-defined function, node_type has to be set to NODETYPE_INTEGER. But when the type of the returned object is not known in advance and may differ from parameter to parameter, NODETYPE_NULL has to be chosen for node_type. After the attributes of the node have been set, the node is added to the symbol table using the add_symbol_node() function.
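The role of NODETYPE_NULL can be summarized in a few lines. This is only a sketch: the enumerator names follow the text, but their numeric values and the helper `resolve_call_type` are assumptions made for illustration.

```cpp
// Subset of the node types named in the text; the values are assumptions.
enum NodeType { NODETYPE_NULL, NODETYPE_INTEGER, NODETYPE_NFA, NODETYPE_DFA };

// Hypothetical helper: the type of a function-call node is the type
// declared in the symbol table when that type is fixed, and otherwise
// (NODETYPE_NULL) the type of the parameter passed to the call.
NodeType resolve_call_type(NodeType declared, NodeType parameter) {
    return declared == NODETYPE_NULL ? parameter : declared;
}
```

For size the declared type wins (always integer), while for reduce the parameter's type is propagated.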

Step two is to declare classes/methods in the LISA library to support retrieving the size of the requested object. Since different parameter types demand different size method implementations in the LISA library, a specific size method has been implemented for each parameter type. The DFA::size method has been declared for DFA types, NFA::size for NFA types, and so on.

All of the foundations are now in place to support the pre-defined function size, and only the intermediate code generation (C/C++) is left to be built. Inside functions.c, a function needs to be created with the same name as the pre-defined function. Having the same name is a coding standard that has been adopted by the developers of this project, and it is requested that future developers follow it. The return type and parameters of the newly created function should be as follows and should not be changed.

Listing 8: Method signature template

unsigned char * <function name> (int idSIndex, struct Parameter_List *parameterlist);

Referring to Listing 8, all functions in the functions.c file must return unsigned char *, which is a pointer to the generated IR (C/C++). The first parameter, idSIndex, is the index of the pre-defined function name in the symbol table. The second parameter, parameterlist, is a pointer to a subtree in the abstract syntax tree. That subtree contains the list of parameters that are supplied to the pre-defined function in the function call.

Listing 9: Complete implementation of size function in functions.c

     1  unsigned char *size(int idSIndex, struct Parameter_List *parameterlist) {
     2
     3      unsigned char *generated_code;
     4      int code_size = 0; //length of generated code
     5      unsigned char *functionName; //Holds the name of the function that should be called
     6      /*
     7       * (1) Size will only accept one parameter in the parameter list,
     8       * therefore we have to check there is only one single expression
     9       * in the parameter list. But, if the parameterlist had more than one
    10       * simple expression, the compilation process has to halt, and throw an error
    11       */
    12      if (parameterlist->parameterlist != NULL || parameterlist->simple_expression == NULL) {
    13          //throw syntax error.
    14          yyerror("No appropriate function's signature for 'size' function");
    15      }
    16
    17      switch (parameterlist->simple_expression->type) {
    18          case NODETYPE_NFA:
    19              functionName = "NFA::size\0";
    20              break;
    21          case NODETYPE_DFA:
    22              functionName = "DFA::size\0";
    23              break;
    24          case NODETYPE_STRING:
    25              functionName = "STRING::size\0";
    26              break;
    27          case NODETYPE_REGEX:
    28              functionName = "REGEX::size\0";
    29              break;
    30          default:
    31              yyerror("The parameter for 'size' function is invalid.");
    32              break;
    33      }
    34
    35      int variableIndex = parameterlist->simple_expression->or_expression->unary_relation_expression->
    36          relation_expression->add_expression1->term->factor->constant->sindex_constant;
    37
    38      symbol_node var_node = symbol_table[variableIndex];
    39      code_size = strlen(functionName) + strlen(var_node.value_name) + 4;
    40      generated_code = (unsigned char *)malloc(sizeof(unsigned char) * code_size);
    41      memset(generated_code,'\0',code_size); //Always zero the memory
    42
    43      /*
    44       * The pattern to be created:
    45       * <functionName> ( <var_node.value_name> )
    46       */
    47      memcpy(generated_code,functionName,strlen(functionName));
    48      generated_code[strlen(functionName)] = '(' ;
    49      memcpy(&(generated_code[strlen(functionName)+1]),var_node.value_name,strlen(var_node.value_name));
    50      generated_code[strlen(functionName) + strlen(var_node.value_name) + 1] = ')' ;
    51
    52      return generated_code;
    53
    54  }

Listing 9 contains the full implementation of intermediate code generation for the size pre-defined function. Line 12 checks the number of parameters supplied, which has to be exactly one for the size function.

Since the intermediate code differs based on the parameter's type, lines 17-33 select the C/C++ method that is most appropriate for the supplied parameter type. These are the methods/classes created in step 2. Lines 35-52 generate the complete intermediate code (parameters included) and return the generated string.
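The string assembly at the end of Listing 9 builds the pattern `<functionName>(<variable>)` with manual index arithmetic. A more compact way to produce the same pattern, shown here only as a sketch with a hypothetical helper name (`build_call`), is to let `snprintf` write into an exactly sized buffer.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical helper, not part of functions.c: returns a malloc'ed,
// NUL-terminated string "<function_name>(<variable_name>)", or nullptr
// when the allocation fails. The caller releases it with free().
char* build_call(const char* function_name, const char* variable_name) {
    // +3 accounts for '(' , ')' and the terminating '\0'.
    std::size_t size = std::strlen(function_name) + std::strlen(variable_name) + 3;
    char* code = static_cast<char*>(std::malloc(size));
    if (code == nullptr) return nullptr;
    std::snprintf(code, size, "%s(%s)", function_name, variable_name);
    return code;
}
```

For instance, `build_call("DFA::size", "d1")` yields the text `DFA::size(d1)`.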

Once the parser detects a function call, it calls the function_icg() function. In Listing 10, line 3 defines an abstract syntax tree node for representing the function call. From line 6 onward, the symbol is checked to be a pre-defined function, and from line 11 to the end of the list of ifs, the name of the function call is compared against the known names and the matching implementation from functions.c is called.

Listing 10: function_icg function implementation in icg.c file.

     1  struct Function *function_icg(int idSIndex, struct Parameter_List *parameterlist) {
     2
     3      struct Function *func = (struct Function*)malloc(sizeof(struct Function));
     4
     5      symbol_node idnode = symbol_table[idSIndex];
     6      if (idnode.token_type != TOKENTYPE_PREDEFINED_FUNCTION)
     7          yyerror(strcat(idnode.value_name, " - is not a predefined function."));
     8
     9      unsigned char *generated_stmt;
    10
    11      if (strcmp(idnode.value_name,"size") == 0) {
    12
    13          generated_stmt = size(idSIndex,parameterlist);
    14          func->type = idnode.node_type;
    15
    16      }else if (strcmp(idnode.value_name,"max") == 0) {
    17
    18          generated_stmt = max(idSIndex,parameterlist);
    19          func->type = idnode.node_type;
    20      }
    21      ....
    22      ....
    23      and many more else if for each pre-defined function
    24      ....
    25      ....
    26
    27      func->id_index = idSIndex;
    28      func->parameterlist = parameterlist;
    29      func->code = generated_stmt;
    30      return func;
    31  }

After the appropriate intermediate code generator function has been called, the returned string is stored in generated_stmt and set as an attribute of the function node, which will soon be inserted into the abstract syntax tree.

Each abstract syntax tree node has a type, and for nodes that represent a function call the type has to be set to the type of the returned object. For example, since the return type of the size pre-defined function is integer, and that has already been set in the symbol table, that definition is copied to the type attribute of the function node (for example, line 14). Line 30 returns the generated node, which becomes part of the abstract syntax tree.


3.4 Abstract Syntax Tree (AST)

Due to the complexity of LISA's grammar, an abstract syntax tree is required for parsing. An implicit syntax tree is generated by BISON, which meets our requirement for parsing the S-attributed grammar. Moreover, the "%type" keyword makes it possible to declare non-terminals with their appropriate attributes, and those attributes are helpful later on during parsing.

By using the %union keyword it is possible to specify several data types for the semantic values. The semantic value for each non-terminal is a unique C structure (struct) that holds the appropriate attribute information for that non-terminal. These structures are declared in the abstractsyntaxtree.h file. Placing all of the declared structures in the %union section makes it possible to declare non-terminals using those structures.

After introducing the structures to BISON by using %union, it is time to associate those structures with the appropriate non-terminals. This association is demonstrated on Listing 11.

Listing 11: Declaring non-terminals in YACC

    %type <variable name in %union section> non-terminal's_name;

Listing 11 tells BISON that the type of the given non-terminal is equal to the type of the declared variable inside the %union section.

Each non-terminal in LISA has its own unique function for generating the appropriate intermediate code. These functions are located in the files icg.h and icg.c and return the appropriate type according to the %type declaration of the non-terminal.

3.4.1 Structure of a typical AST Node

Let's consider the non-terminal expression (on Listing 12), which represents the grammar rule for expressions in LISA.

Listing 12: Grammar rule example in YACC/BISON syntax

    expression:
          variable exprop expression
        | simpleexpression
        ;

The node for representing "expression" in the syntax tree is shown on Listing 13.

Listing 13: Expression syntax tree node, C syntax

    struct Expression {
        enum symbol_node_type type;
        struct Simple_Expression *sm_expression;
        int operator_index;
        int variable_index;
        struct Expression *expression;
        unsigned char *code;
    };

Most of the non-terminals should have a type (int, string, and so on). Their type can either be carried up from the bottom of the tree or be set according to the non-terminal's requirement. These types are used for type checking, which is explained in the following sections. Moreover, most of the non-terminals have a code attribute, which represents the code generated from that specific non-terminal down to the bottom of the tree.

3.5 Type checking

Type checking is performed during intermediate code generation. LISA currently does not support any boxing or casting feature. The only automatic type conversion happens when concatenating a string with an integer using the "+" operator, in which case the overall result is a string.
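The single automatic conversion can be written down as a small type rule. The sketch below is illustrative only: the `Type` enumeration and `plus_type` are hypothetical names, and it additionally assumes that integer + integer stays an integer and that string + string is accepted, neither of which is stated in the text.

```cpp
// Hypothetical type tags; not the compiler's symbol_node_type values.
enum class Type { Integer, String, Nfa, Dfa, Regex, Invalid };

// Result type of "left + right" under the assumptions named above.
Type plus_type(Type left, Type right) {
    auto is_text = [](Type t) { return t == Type::Integer || t == Type::String; };
    if (left == Type::Integer && right == Type::Integer) return Type::Integer;
    if (is_text(left) && is_text(right)) return Type::String;  // int is converted
    return Type::Invalid;  // e.g. DFA + string is rejected
}
```

A result of `Type::Invalid` corresponds to the point where the code generator would report a type error.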

Type checking is required wherever it is applicable, and it is a strong standard in LISA. Type checking for a node is performed by consulting the syntax tree of the node. Each node in the syntax tree has a type associated with it, and the value of the type in the parent node is based on the types of its children.

Not all types are pre-determined. Most of the pre-defined functions have their return-value type specified in the symbol table. However, this is not true for some pre-defined functions, since they can return several possible types according to their parameters. For example, union is a pre-defined function that computes the union of its two parameters. The two parameters can only be two DFA or two NFA typed variables. The return value of the union function is a DFA if both of the supplied parameters are DFA, and an NFA if both of the supplied parameters are NFA. Therefore, the return type of a specific "union"-call node is determined during intermediate code generation by checking the types of its parameters.

Both sides of the assignment operator must have the same data type. Some functions, such as readfile, have to be told explicitly what type of object to return. Other functions, such as union, return a type that is determined by the types of their parameters.
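A parameter-dependent return type and the assignment rule can be sketched together. The names below (`Type`, `union_type`, `assignment_ok`) are hypothetical; the accepted types follow Table 5 (nfa, dfa, regex), which is slightly broader than the DFA/NFA case discussed in this section.

```cpp
// Hypothetical type tags; not the compiler's symbol_node_type values.
enum class Type { Integer, String, Regex, Nfa, Dfa, Invalid };

bool is_language_type(Type t) {
    return t == Type::Regex || t == Type::Nfa || t == Type::Dfa;
}

// union(a, b): both parameters must have the same accepted type, and the
// call node takes that type; anything else is a type error.
Type union_type(Type a, Type b) {
    if (a != b || !is_language_type(a)) return Type::Invalid;
    return a;
}

// Both sides of an assignment must carry the same type.
bool assignment_ok(Type variable, Type value) {
    return value != Type::Invalid && variable == value;
}
```

Under this rule, assigning `union(dfa1, dfa2)` to an NFA variable is rejected unless an explicit conversion such as dfatonfa is applied first.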

3.6 LISA lib

LISA lib is a C++ library that interfaces LISA's pre-defined functions with Grail+. Moreover, LISA lib contains an Initially Connected Deterministic Finite Automata (ICDFA) generator that is discussed in detail in chapter 4.5. This library is a great template for how to interface LISA with another framework, and should be studied carefully. Table 6 shows the source code files of LISA lib. These files are located under the obj/lib/lisa directory.

File Name      Description
dfa.h          Interface class for Grail+ FM class for representing DFA
dfa.c          Implementation of dfa.h
nfa.h          Interface class for Grail+ FM class for representing NFA
nfa.c          Implementation of nfa.h
io.h           LISA I/O features, such as reading and writing files
io.c           Implementation of io.h
generator.h    ICDFA generator
generator.c    Implementation of the ICDFA generator
re.h           Interface class for Grail+ RE class for representing Regular Expression
re.c           Implementation of re.h
lisastd.h      Lisa standard library
include.h      The main include file that contains all the LISA lib .h files

Table 6: Source code files

Most of the methods implemented in this library are "static". Grail+ was not designed to be used as a C++ library. Therefore, only one main file can include the Grail+ headers; otherwise the headers are linked several times, which causes a multiple-linkage compilation error. To overcome this difficulty, include.h includes all the header files as well as the implementation files. By including include.h in the final source file, all the other header files and the Grail+ source code are compiled only once.

The given solution is a trick. The author would suggest a proper modification of Grail+ so that it can be used as an external library. Due to time constraints, such modifications were not possible at the time of LISA's development, but they are strongly suggested for future projects.


3.7 Intermediate Code Generator

3.8 Makefile

The source code of LISA's compiler is compiled and linked using a GNU Makefile. Each module (headers [.h] and implementations [.c]) is compiled as an object file into the "temp/" directory, and all of them are linked into an executable called "lisa" in the "obj/" directory. The Makefile is also used for compiling the BISON and Flex files.

Some points need attention whenever the "makefile" is used:

• Due to the significant number of compiler warnings, options such as "-w" are used during compilation to disable warnings. Abandoning these options is suggested when testing and debugging.

• A debugging feature has been enabled using the "-d" option. It is suggested that programmers remove this option for the final release.

3.9 Disabling Warnings

Using Grail+ as the back-end engine results in many compilation warnings that are related to Grail+'s implementation. For clear results, and to see only errors, it is suggested that programmers use the "-w" option whenever they compile the generated C++ using GNU g++.

3.10 Compiler files

Table 7 lists the source code files of the LISA compiler. Please note that this list does not contain the LISA library files, which are the interface files for Grail+ and LISA integration.

File Name              Description
parser.y               Bison generated, parser file. Contains the main function.
lex.l                  Lexical analyzer generated using Flex.
icg.h                  Intermediate Code Generator header file.
icg.c                  Intermediate Code Generator implementation file.
functions.h            Pre-defined functions code generator.
functions.c            Pre-defined functions code generator.
symboltable.h          Symbol table definitions and constants.
symboltable.c          Symbol table maintaining functions.
abstractsyntaxtree.h   Abstract syntax tree nodes.

Table 7: Source code files

3.11 String Concatenation

String concatenation is supported only between string and integer types. At this phase, LISA does not have any plan to handle string concatenation between other types such as DFA, NFA, etc.

3.12 Notes on "print" function

The "print" pre-defined function prints primitive data types (int, string, DFA, etc.) on the standard output and returns void.


4 ICDFA Generator

One of LISA's features is the ability to enumerate or randomly generate Initially Connected Deterministic Finite Automata (ICDFA), which can be done using the generate-statement. Therefore, a generator engine that is able to enumerate and randomly generate ICDFAs has been implemented along with this project. The ICDFA generator is part of LISA lib and is part of the backend computational engine.

4.1 Preliminaries

A deterministic finite automaton (DFA) is a finite state machine that represents a regular language. Among other uses, a DFA can check whether an input string over the alphabet belongs to the regular language that the DFA represents.

Formally, a DFA is a tuple $(Q, \Sigma, \delta, q_0, F)$, where $Q$ is a finite set of states, $\Sigma$ is a finite set of symbols (the alphabet), $q_0$ is the initial state, $F \subseteq Q$ is the set of final states, and $\delta : Q \times \Sigma \to Q$ is the transition function. Readers are strongly encouraged to have a basic understanding of regular languages and DFAs.

A DFA is initially connected (ICDFA) if all of the states in $Q$ are reachable from $q_0$. In other words, as stated by M. Almeida, N. Moreira, and Rogerio Reis, 2007 [1], "for each $q \in Q$ there exists a sequence $(q'_i)_{i \in [0, j]}$ of states and a sequence $(\sigma_i)_{i \in [0, j-1]}$ of symbols, for some $j < |Q|$, such that $\delta(q'_m, \sigma_m) = q'_{m+1}$, $q'_0 = q_0$ and $q'_j = q$."

A DFA without final states is denoted by $(Q, \Sigma, \delta, q_0)$ and is referred to as a DFA∅. Likewise, an ICDFA without any final states is referred to as an ICDFA∅.
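The definition of initial connectedness translates directly into a reachability check. The sketch below assumes a hypothetical table representation (`Delta`, where `delta[q][a]` is the target of state `q` on symbol `a` and state 0 plays the role of the initial state); it is not the representation used by Grail+ or LISA lib.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// delta[q][a] = target of state q on symbol a; state 0 is the initial state.
using Delta = std::vector<std::vector<int>>;

// Breadth-first search from state 0: the automaton is initially connected
// exactly when every state is visited.
bool is_initially_connected(const Delta& delta) {
    if (delta.empty()) return false;
    std::vector<bool> seen(delta.size(), false);
    std::queue<int> pending;
    seen[0] = true;
    pending.push(0);
    std::size_t reached = 1;
    while (!pending.empty()) {
        int q = pending.front();
        pending.pop();
        for (int target : delta[q]) {
            if (!seen[target]) {
                seen[target] = true;
                ++reached;
                pending.push(target);
            }
        }
    }
    return reached == delta.size();
}
```

A two-state automaton whose initial state only loops on itself fails the check, because the second state is never reached.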

4.2 String representation of ICDFA

Because string processing is efficient on computers, representing an ICDFA as a string of characters is far more economical in computing resources than other methods. Throughout this section, the string representation of DFA∅ and ICDFA∅ is discussed.

[State diagram: states A (start), B, C, D, E, with transitions labelled 0 and 1]

Figure 1: An ICDFA with no final states (ICDFA∅)

A naive representation of an ICDFA∅ or DFA∅ can be obtained by enumerating the states and giving, for each state, the list of its transitions for each symbol. For example, the ICDFA∅ in Figure 1 can be represented as:

[[A(0 : B, 1 : C)], [B(0 : C, 1 : B)], [C(0 : D, 1 : E)], [D(0 : A, 1 : D)], [E(0 : A, 1 : A)]]    (1)


Given a complete DFA∅ $(Q, \Sigma, \delta, q_0)$, the number of states is denoted by $n = |Q|$ and the number of alphabet symbols by $k = |\Sigma|$. By considering a total order over $\Sigma$, representation (1) can be simplified by removing the alphabet symbols from the representation. Therefore, representation (1) can be rewritten as follows:

[[A(B, C)], [B(C, B)], [C(D, E)], [D(A, D)], [E(A, A)]] (2)

Since the labels chosen for the states have a standard order (in our example, alphabetic order), the representation can be simplified even further by rewriting it as follows:

s = [1, 2, 2, 1, 3, 4, 0, 3, 0, 0] (3)

The order of the labels chosen for the states in our example is A=0, B=1, C=2, D=3, E=4. Since the total number of alphabet symbols is $k = 2$ ($\Sigma = \{0, 1\}$), the first 2 ($k$) items in representation (3) are the transitions from state A, going to state B(1) on symbol 0 and to state C(2) on symbol 1; the second 2 items in representation (3) are the transitions from state B, going to state C(2) on symbol 0 and to state B(1) on symbol 1; and so forth.
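The step from representation (2) to representation (3) is a row-by-row flattening of the transition table. A minimal sketch, assuming the same hypothetical table layout as before (`delta[q][a]`, with states and symbols numbered from 0):

```cpp
#include <cstddef>
#include <vector>

// delta[q][a] = target of state q on symbol a.
using Delta = std::vector<std::vector<int>>;

// Concatenate the rows: entry q*k + a of the result is delta[q][a].
std::vector<int> flatten(const Delta& delta) {
    std::vector<int> s;
    for (const std::vector<int>& row : delta)
        s.insert(s.end(), row.begin(), row.end());
    return s;
}

// Read a transition back out of the flat string.
int target(const std::vector<int>& s, int k, int state, int symbol) {
    return s[static_cast<std::size_t>(state) * k + symbol];
}
```

With the automaton of representation (1) encoded as `{{1,2},{2,1},{3,4},{0,3},{0,0}}`, `flatten` produces the string of representation (3), and `target(s, 2, 1, 1)` returns 1 (state B on symbol 1 stays in B).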

To obtain a canonical representation of an ICDFA∅, given an order over the alphabet, an induced order on the states and transitions can be considered. Thus, representation (3) is a canonical representation of the ICDFA∅ in Figure 1. Discussing the details of canonical representations is out of the scope of this document, and readers are referred to [Almeida, Moreira, Reis, 2007] for more information.

As can be seen from representation (3), this string representation is simple and efficient and can be translated to any other form. LISA's ICDFA generator uses this representation to enumerate ICDFAs, and translates the representation to Grail+ format after creation. Therefore, a firm understanding of the canonical representation is necessary for understanding the generator code.

It is worth mentioning that, if we restrict the canonical representation to ICDFA∅, then the canonical representation is unique [Almeida, Moreira, Reis, 2007].

4.3 Generation of ICDFA in LISA

There are certain rules for enumerating ICDFA∅ which are not discussed in this paper; however, they are well covered in the technical report "Enumeration and Generation of ICDFA" from the University of Porto [1]. The generation of ICDFA∅ is accomplished by generating sequences of flags. Let $(f_j)_{j \in [0, n-1]}$ be the sequence of indexes of the first occurrence of each state label $j$. For example, for the canonical string representation (3), the array of flags is:

Listing 14: Flags array

    int f[] = {0,0,1,4,5};

f[0] represents the first occurrence of state A. However, since A (state number 0) is the starting state, there is no point in giving it an index; it is included in the array only for the sake of consistent indexing, and any value of f[0] should be ignored.

f[1] holds the index of the first occurrence of state B (state number 1) in the canonical string. In canonical representation (3), state B (#1) first occurs at index 0. The same idea holds for f[2] ... f[n − 1], where n = |Q|.
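The flags of a given canonical string can be recovered with a single left-to-right scan. The helper name `first_occurrences` is hypothetical; states that never occur are reported as -1, which is a convention chosen only for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Returns f with f[j] = index of the first occurrence of state j in s,
// for j in [1, n-1]. f[0] is set to 0 and carries no meaning; a state
// that never occurs keeps the value -1.
std::vector<int> first_occurrences(const std::vector<int>& s, int n) {
    std::vector<int> f(n, -1);
    f[0] = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        int state = s[i];
        if (state > 0 && state < n && f[state] < 0)
            f[state] = static_cast<int>(i);
    }
    return f;
}
```

Applied to the string of representation (3) with n = 5, the scan returns the array shown in Listing 14.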

In the canonical string, every entry that lies between the first occurrences of two consecutive states j and j+1 must be less than or equal to j (assuming states are ordered), because state j+1 and every larger state have not yet appeared at that point. For instance, with the flags shown on Listing 14, the entries s[2] and s[3] lie between the first occurrence of C(2) at index 1 and the first occurrence of D(3) at index 4, and both are at most 2. Any string that satisfies this informal rule is a valid canonical string, and in this paper we refer to this rule as filling the gap. Since there are many possible ways of filling the gap, every resulting representation is a valid canonical string of interest.

Therefore, enumeration of all the ICDFA∅ is accomplished by generating all the possible flag sequences and, for each flag sequence, generating all the possible canonical strings (filling the gap).
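For very small n and k, the outcome of this enumeration can be cross-checked by brute force: test every string of length n·k over the states and count the canonical ones. The sketch below is not LISA's generator; `is_canonical` and `count_canonical` are hypothetical helpers, and the canonicity test assumes the usual flag conditions (states first appear in increasing order, and state m first appears before index k·m).

```cpp
#include <vector>

// True when s (length n*k, entries in [0, n-1]) is a canonical string.
bool is_canonical(const std::vector<int>& s, int n, int k) {
    int next_new = 1;  // smallest state label that has not appeared yet
    for (int i = 0; i < n * k; ++i) {
        if (s[i] > next_new) return false;        // a larger state appeared too early
        if (s[i] == next_new) {
            if (i >= k * next_new) return false;  // first occurrence comes too late
            ++next_new;
        }
    }
    return next_new == n;  // every state has been reached
}

// Counts canonical strings by trying all n^(n*k) candidates (tiny n, k only).
long count_canonical(int n, int k) {
    std::vector<int> s(n * k, 0);
    long count = 0;
    while (true) {
        if (is_canonical(s, n, k)) ++count;
        int pos = n * k - 1;  // odometer-style increment in base n
        while (pos >= 0 && s[pos] == n - 1) s[pos--] = 0;
        if (pos < 0) break;
        ++s[pos];
    }
    return count;
}
```

For n = 2 and k = 2 the count is 12: of the 16 strings of length 4 over {0, 1}, only the four that start with 00 fail to reach state 1 in time.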


The formal rules for generating flags are:

$(\forall j \in [2, n-1])\ (f_j > f_{j-1})$    (4)

$(\forall m \in [1, n-1])\ (f_m < km)$    (5)

These imply that $f_1 \in [0, k-1]$ and $f_{j-1} < f_j < kj$ for $j \in [2, n-1]$.
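Rules (4) and (5) can be checked mechanically for a candidate flag sequence. The function below is a sketch with a hypothetical name (`flags_are_valid`); as in Listing 14, `f[0]` is ignored.

```cpp
#include <vector>

// Checks rule (5), f_m < k*m for m in [1, n-1], and rule (4),
// f_j > f_{j-1} for j in [2, n-1]. f[0] is not inspected.
bool flags_are_valid(const std::vector<int>& f, int k) {
    int n = static_cast<int>(f.size());
    for (int m = 1; m < n; ++m)
        if (f[m] < 0 || f[m] >= k * m) return false;
    for (int j = 2; j < n; ++j)
        if (f[j] <= f[j - 1]) return false;
    return true;
}
```

The flags of Listing 14, {0,0,1,4,5} with k = 2, pass both rules, whereas {0,2,3,4,5} violates rule (5) because f_1 = 2 is not below k.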

[Flowchart nodes: Initialize; Last ICDFA∅ generated? (yes/no); All current flag's ICDFA∅ generated? (yes/no); Program terminates; Generate next flag; Initialize canonical string with current flag; Return next ICDFA∅]

Figure 2: Flowchart for enumeration of ICDFA∅

Figure 2 shows the steps for generating all ICDFA∅, given k and n, in LISA's ICDFA generating engine. Bear in mind that the intent of this section is to describe all the parts of LISA's generating engine, such as classes and methods, which are all implemented in C++. The ICDFA generator is implemented in the Generator class.

The initialization step generates the first flag sequence for the given k and n, as shown on Listing 15.

Listing 15: Initialization of first flag sequence

    void Generator::init_flags(){
        flags[0] = 0;
        for(int i = 1; i < n; i++){
            flags[i] = (k * i) - 1;
        }
    }


where flags[] is the array holding the flag sequence. All the other flag sequences are generated from this first, initialized one.

For any newly generated flag sequence, a corresponding canonical string is created. The algorithm for this is demonstrated on Listing 16.

Listing 16: Creating the first canonical string from newly created flags

    void Generator::reset(){
        for(int i = 0; i < n*k; i++)
            canonical_string[i] = 0;

        for(int i = 0; i < n; i++)
            canonical_string[flags[i]] = i;
    }

Here, all the indexes except those pointed to by flags are set to 0, and the indexes pointed to by flags are set to the appropriate state number. Bear in mind that, for every newly generated flag sequence, the first canonical string is generated using this algorithm, which we call the reset() method. Thus, in LISA's generating engine, the first canonical string for given n and k is generated during the initialization step.
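The combined effect of initialization and reset can be traced with free functions that mirror Listings 15 and 16. The names `initial_flags` and `first_string` are hypothetical, and `std::vector` replaces the class's raw arrays only to keep the sketch self-contained.

```cpp
#include <vector>

// Mirrors Listing 15: the first flag sequence uses the largest allowed
// index for every state, f_i = k*i - 1.
std::vector<int> initial_flags(int n, int k) {
    std::vector<int> flags(n, 0);
    for (int i = 1; i < n; ++i) flags[i] = k * i - 1;
    return flags;
}

// Mirrors Listing 16: zero everywhere, then each state written at its flag.
// Starting at i = 1 gives the same result, since position flags[0] is
// either left at 0 or overwritten by state 1.
std::vector<int> first_string(const std::vector<int>& flags, int k) {
    int n = static_cast<int>(flags.size());
    std::vector<int> s(n * k, 0);
    for (int i = 1; i < n; ++i) s[flags[i]] = i;
    return s;
}
```

For n = 5 and k = 2 this gives the flags {0,1,3,5,7} and the first canonical string [0,1,0,2,0,3,0,4,0,0].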

The current sequence of flags and the current canonical string are stored in the int *flags and int *canonical_str instance variables of the Generator class. The size of the flags array is n, and the size of the canonical_string array is n × k.

Generation of ICDFA∅ stops once the current canonical_str is the last valid canonical string for the given n and k. Therefore, in each iteration of the generating process, a check has to be made to see whether canonical_str is the last canonical string, as demonstrated on Listing 17.

Listing 17: Check to see if the current canonical string is the last valid one

    bool Generator::is_last(){
        int index = 0;
        int i;
        for(i = 1; i <= getN()-1; i++){
            if(getCanonicalString()[index] != i) return false;
            index++;
        }
        for(i = getK()-1; i <= (getK()-1) * (getN()+1); i++){
            if(getCanonicalString()[index] != getN()-1) return false;
            index++;
        }
        return true;

    }

As shown in Listing 17, if the is_last() method returns true, the generating engine returns null, which indicates that there are no more ICDFA∅ left to be generated.

If the current canonical_string is not the last valid canonical string for the given n and k, we still have to check whether it is the last valid canonical string for the current flag sequence. If so, the current flag sequence is saturated and no more valid canonical strings can be generated with it. Thus, the next flag sequence is generated. Listing 18 demonstrates the flag saturation check.


Listing 18: Check to see if the flags sequence is saturated

    bool Generator::is_full(){
        for(int j = 1; j < getN()-1; j++){
            for(int l = flags[j]+1; l <= flags[j+1]-1; l++){
                if(getCanonicalString()[l] != j) return false;
            }
        }

        for(int l = flags[getN()-1]; l < getK()*getN(); l++){
            if(getCanonicalString()[l] != getN()-1) return false;
        }

        return true;
    }

If the is_full() method returns true, the next flag sequence is generated as demonstrated on Listing 19.

Listing 19: Generating next flags sequence

    void Generator::next_flags(int i){

        if(i == 1){
            flags[i] = flags[i]-1;
        }else{
            if(flags[i] -1 == flags[i-1]){
                flags[i] = getK() * i -1;
                next_flags(i-1);
            }else
                flags[i] = flags[i]-1;

        }

    }

Since a new flag sequence has been generated, the reset() method has to be called. However, if is_full() returns false, the current flag sequence is not saturated and it is possible to generate another ICDFA∅. This process is demonstrated on Listing 20.


Listing 20: Generating the next ICDFA∅

    void Generator::next_icdfa(unsigned long a, unsigned long b){
        int i = a * getK() + b;
        if(a < getN()-1){
            while(is_flagged(i) >= 0){
                for(int kk = i + 1; kk <= getK()*getN()-1; kk++)
                    if(is_flagged(kk) < 0)
                        getCanonicalString()[kk] = 0;

                b = b - 1;
                i = i - 1;
            }
        }

        // f[j] = the nearest flag not exceeding i
        int j = 1;
        for(int x = getN()-1; x >= 0; x--){
            if(flags[x] <= i){
                j = x;
                break;
            }
        }

        if(getCanonicalString()[i] == getCanonicalString()[ flags[j] ]){
            getCanonicalString()[i] = 0;
            if(b == 0) next_icdfa(a-1,getK()-1);
            else next_icdfa(a,b-1);
        }else{
            getCanonicalString()[i] = getCanonicalString()[i] + 1;
        }

    }

So far, the creation of ICDFA∅ in LISA has been described. However, LISA's generating engine generates ICDFAs by adding final states to the created ICDFA∅. The process of adding final states is described in the next section.

4.4 Marking final states

For a given number of states $n$, there are $2^n$ different possible configurations of final states. Therefore, for each ICDFA∅ it is possible to generate $2^n$ ICDFAs, each with a different set of final states.

For example, for the ICDFA∅ shown in Figure 1, where $n = 5$, it is possible to generate $2^5 = 32$ different ICDFAs. One would be an ICDFA with the final state marked at state A; another would be an ICDFA with final states marked at states A and B; and so forth.

Therefore, in LISA's ICDFA generator engine, after any ICDFA∅ has been generated, the set of all $2^n$ possible ICDFAs is generated and returned. In the generator class, the array finals

    int **finals;

is an integer array with $2^n$ rows, each n wide. Each row of this array holds a sequence of 0s and 1s indicating which states are final states. finals[0] has all of its elements set to 1, which means all the states are final states. The last row, finals[$2^n$ − 1], has all of its elements set to 0, which means none of the states are final states.

It is worth mentioning that the method init_finals() in the generator class constructs and initializes all 2^n possible final-state configurations.
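The enumeration of final-state configurations can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the function name all_final_configurations is hypothetical, it returns a vector of rows rather than an int** array, and the row ordering (first row all ones, last row all zeros) is an assumption, not a description of init_finals().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build every final-state configuration for n states.
// Each row has n entries: 1 marks a final state, 0 a non-final state.
// Assumed ordering: row 0 marks every state as final, the last row marks none.
std::vector<std::vector<int>> all_final_configurations(int n) {
    assert(n >= 0 && n < 31);                        // keep 2^n rows manageable
    const std::size_t total = std::size_t{1} << n;   // 2^n configurations
    std::vector<std::vector<int>> rows;
    rows.reserve(total);
    for (std::size_t r = 0; r < total; ++r) {
        const std::size_t mask = (total - 1) - r;    // count down from all ones
        std::vector<int> row(n);
        for (int s = 0; s < n; ++s) {
            row[s] = static_cast<int>((mask >> s) & 1u);
        }
        rows.push_back(row);
    }
    return rows;
}
```

For n = 3 this yields 8 rows, so pairing each row with one ICDFA∅ gives the 2^n ICDFAs mentioned above.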

4.5 Random generation of ICDFA

What has been covered so far concerns generating all of the ICDFAs for a given n and k, which includes generating ICDFA∅ and marking the final states. In order to obtain a randomly generated ICDFA, both of these processes, generating ICDFA∅ and marking final states, have to be randomized. In other words, different methods should be used to generate a random ICDFA∅ and to pick a random final-state marking scheme from the finals array.
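Picking a random final-state marking amounts to picking one of the 2^n configurations uniformly. A minimal sketch, assuming a hypothetical helper random_final_marking and the standard <random> facilities rather than LISA's actual generator code:

```cpp
#include <cassert>
#include <random>
#include <vector>

// Pick one of the 2^n final-state markings uniformly at random.
// The returned vector has n entries: 1 = final state, 0 = non-final state.
std::vector<int> random_final_marking(int n, std::mt19937_64& rng) {
    assert(n >= 0 && n < 63);   // the configuration index must fit in 64 bits
    std::uniform_int_distribution<unsigned long long> pick(0, (1ULL << n) - 1);
    const unsigned long long mask = pick(rng);   // index of the chosen marking
    std::vector<int> marking(n);
    for (int s = 0; s < n; ++s) {
        marking[s] = static_cast<int>((mask >> s) & 1ULL);
    }
    return marking;
}
```

Drawing an index instead of a row keeps the sketch independent of how the table of markings is stored.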

Although this topic is fully covered in [1], a brief implementation description is given here. Generating a random integer is a well-studied problem in computer science, so the idea is to find a method that translates an integer directly into an ICDFA∅. An algorithm is needed to find the jth ICDFA∅ for a given n and k without generating all of the ICDFA∅s that precede it.

The Generator class contains a recursive method, Generator::compute_n, which helps the process of generating ICDFA∅ from integers.

$$N_{n-1,j} = n^{nk-1-j}, \quad \text{with the range of } j \text{ as given in [1]} \tag{6}$$

The number N has the properties given in equation 6. Generator::compute_n is the method that computes N. It is worth mentioning that the computed N can get extremely large even for small values of n and k (e.g. n = 10, k = 2), so it will not fit in primitive data types such as unsigned long, which is 64 bits wide. Handling and operating on such big numbers is discussed in Section 4.6.
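To see how quickly 64 bits run out, the following sketch checks whether a power fits in an unsigned 64-bit integer without ever overflowing. The function power_fits_in_u64 is a hypothetical helper for illustration and is not part of the Generator class.

```cpp
#include <cassert>
#include <cstdint>

// Return true if base^exp is representable as an unsigned 64-bit integer.
// The check happens before each multiplication, so no overflow ever occurs.
bool power_fits_in_u64(std::uint64_t base, unsigned exp) {
    assert(base > 0);
    std::uint64_t result = 1;
    for (unsigned i = 0; i < exp; ++i) {
        if (result > UINT64_MAX / base) {
            return false;   // the next multiplication would exceed 64 bits
        }
        result *= base;
    }
    return true;
}
```

For instance, 10^19 still fits while 10^20 does not, which is why counts built from such powers call for an arbitrary-precision type.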

The method compute_n accepts three parameters. result is a big-integer value that stores the result of the operation. m and j are the parameters described by equations 3.6 and 3.7.

Different values of m and j give different results, which are interpreted differently. For instance, the total number of ICDFA∅s for a given n and k can be calculated by the code in Listing 21, which uses the compute_n method to accomplish this task.

Listing 21: Compute the total number of ICDFA∅s for given n and k

void Generator::getSize(mpz_t max){
    mpz_init_set_ui(max, 0);
    mpz_t spare;
    mpz_init(spare);

    for(int i = 0; i < getK(); i++){
        mpz_set_ui(spare, 0);
        compute_n(spare, 1, i); //Where
        mpz_add(max, max, spare);
    }
    mpz_clear(spare);
}

Another critical place where the compute_n method is used is within the Generator::generate_from_index method. This method generates an ICDFA∅ from the integer provided as its parameter. Since the number of ICDFA∅s can get very large even for small values of n and k, this parameter is a big integer.

4.6 Big Integer - The GNU Multiple Precision Arithmetic Library

As mentioned in the previous sections, some numbers generated by the compute_n method are astronomically big, therefore big-integer support is necessary for this project. The GNU Multiple Precision Arithmetic Library (GMP) is used as the primary library for dealing with big numbers [4].

All of the C/C++ files generated by the LISA compiler as output should be linked against the GMP library. Listing 22 shows a Linux command that compiles the generated file using the GMP library.

Listing 22: How to compile LISA generated C++ using GMPLib

g++ -w generated.cc -lgmp

4.7 Translating ICDFA to Grail+ format

The canonical representation is an array of numbers that denote state numbers. However, this array can be translated to any desired form. The back-end engine for the current LISA project is Grail+, therefore all the representations should be in the Grail+ format.

The Generator::canonicalstring_to_fm() method is responsible for translating the canonical representation to the Grail+ finite state machine format and marking the final states accordingly. canonicalstring_to_fm() is the method to change in order to translate the canonical representation to any desired format other than Grail+.
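A translation of this kind can be sketched as follows. This is not the actual canonicalstring_to_fm() implementation: the function name canonical_to_grail is hypothetical, and the sketch assumes that entry i * k + j of the canonical array holds the target of state i on symbol j (the same indexing as i = a * getK() + b in Listing 20), that state 0 is the initial state, and that the alphabet symbols are the letters a, b, c, and so on.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Render a canonical string as Grail+ finite state machine instructions.
// canonical[i * k + j] is assumed to be the target of state i on symbol j.
// finals[i] == 1 marks state i as final.
std::string canonical_to_grail(const std::vector<int>& canonical, int n, int k,
                               const std::vector<int>& finals) {
    assert(n > 0 && k > 0 && k <= 26);
    assert(canonical.size() == static_cast<std::size_t>(n) * k);
    assert(finals.size() == static_cast<std::size_t>(n));

    std::ostringstream out;
    out << "(START) |- 0\n";                       // assumed initial state: 0
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < k; ++j) {
            out << i << ' ' << static_cast<char>('a' + j) << ' '
                << canonical[i * k + j] << '\n';   // one transition per line
        }
    }
    for (int i = 0; i < n; ++i) {
        if (finals[i] == 1) {
            out << i << " -| (FINAL)\n";           // final-state instruction
        }
    }
    return out.str();
}
```

Targeting another back-end would only require changing the three output statements, which mirrors the role the text gives to canonicalstring_to_fm().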

5 LISA User Manual

5.1 Introduction

LISA, which stands for Language Investigator and Structural Analyzer, is a descriptive programming language for performing tests and operations on formal languages. So far, however, only regular language operations are supported.

The goal of LISA is to enable researchers in the field of Formal Languages to focus more on the tests they want to perform and be less concerned with the actual implementation of those tests. In other words, a researcher can describe the objective of different tests and operations on a set of regular languages without thinking much about how the described tests are implemented. Thus, researchers can carry out tests more elegantly and more quickly.

What makes LISA a special language is that it supports different elements of formal languages as primitive data types. For example, deterministic finite automaton, or DFA, is a primitive data type in LISA. Thus, users can create variables of type DFA and work with them without any limitation.

This document contains basic information on how to use LISA. For detailed information regarding implementation and modification, readers are encouraged to refer to LISA's technical document.

5.2 Requirements

LISA uses several third-party programming libraries and tools to compile LISA source code into an executable. Thus, for successful compilation, all of these third-party libraries and tools must be present.

LISA is developed for Linux systems. Therefore, correct operation of LISA is only expected when running on a Linux-based system. Moreover, the operating system must have the following packages installed, which LISA uses for code generation.

The Intermediate Representation (IR) code generated for LISA is C++, and LISA uses GCC [3] to compile the generated intermediate C++ code into an executable. Thus the presence of this software package is mandatory, and users of LISA must have the necessary permissions to use it.

LISA uses the astyle [2] software package to beautify the generated C++ source so that it is more readable for humans. Thus the presence of this software package is mandatory, and users of LISA must have the necessary permissions to use it.

The LISA library is part of the back-end engine of the LISA project and is designed to function well even with big integers. As a result, the GMP library [4] is used extensively inside the LISA library. Thus the presence of the GMP library is mandatory, and users of LISA must have the necessary permissions to use it.

Grail+ [5] is part of the back-end engine of the LISA project. Grail+ is a symbolic computation environment for finite-state machines, regular expressions, and finite languages. Thus the presence of this software package is mandatory, and users of LISA must have the necessary permissions to use it.

5.3 Structure of LISA Source Code

LISA source code is divided into two sections, declaration and program. Declaration is where variables are declared. As shown in Listing 23, the declaration section is denoted by the declare keyword. It is worth mentioning that the declaration section does not allow any variable initialization.

Listing 23: Simple structure of LISA source code

declare{
    //Where declaration takes place
}

program{
    //Where program source code is placed
}

The program section, which is denoted by the program keyword, is where the main logic of the user program is held. No declaration can take place in this section; however, declared variables should be initialized at the beginning of this section.

The only supported comment format inside code is the single-line comment. Single-line comments start with “//” and are terminated at the end of the line.

5.4 Primitive Data-types

The following data types are supported as the most basic elements of the LISA language. For example, Deterministic Finite State Machine is a basic data type: developers can declare variables of this type and perform basic set operations on them. Having such types as primitive data types lets developers work with them naturally and with a clear syntax.

• Integer: int is the keyword for declaring an integer variable.

• Boolean: bool is the keyword for declaring a boolean variable.

• String: string is the keyword for declaring a string variable.

• Deterministic Finite State Machine (DFA): dfa is the keyword for declaring a DFA variable.

• Non-deterministic Finite State Machine (NFA): nfa is the keyword for declaring an NFA variable.

• Regular Expression (RE): re is the keyword for declaring an RE variable.

Sections 5.5 and 5.6 describe how to declare variables and arrays of the primitive data types listed above.

5.5 Variable

Variables may only be declared in the declaration section, as discussed in section 5.3.

Listing 24: Syntax for declaring variable

<primitive data-type> <variable-name> ;

In the variable declaration syntax shown in Listing 24, <primitive data-type> is replaced with one of the primitive data-type keywords discussed in section 5.4.

<variable-name> should be replaced with an identifier. Identifiers have to start with a letter and can contain underscores (_) and digits. Listing 25 demonstrates variable declarations.

Listing 25: Example for declaring variables

int my_int;
dfa my_dfa;
nfa my_nfa;
string my_string;
re my_regular_expression;
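The identifier rule stated above (a leading letter, followed by letters, digits, or underscores) can be checked with a short routine. This is a hedged sketch: is_valid_identifier is a hypothetical helper, not part of the LISA compiler, and it does not reject reserved keywords because the text does not say how those are treated.

```cpp
#include <cctype>
#include <string>

// Check the identifier rule from the text: the first character is a letter,
// and every character is a letter, a digit, or an underscore.
bool is_valid_identifier(const std::string& name) {
    if (name.empty()) {
        return false;
    }
    if (!std::isalpha(static_cast<unsigned char>(name[0]))) {
        return false;   // must start with a letter
    }
    for (char c : name) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (!std::isalnum(u) && c != '_') {
            return false;   // only letters, digits, and '_' are allowed
        }
    }
    return true;
}
```

Under this reading, my_dfa and dfa2 are accepted, while 2dfa and _dfa are rejected.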

5.6 Array

Array data types are supported for all of the primitive data types in LISA. Array variables may only be declared in the declaration section, as discussed in section 5.3.

Listing 26: Syntax for declaring array variable

<primitive data-type> [ <size> ] <variable-name> ;

In the array declaration syntax shown in Listing 26, <primitive data-type> is replaced with one of the primitive data-type keywords discussed in section 5.4.

<size> is replaced with an integer value indicating the size of the array to be declared.

<variable-name> should be replaced with an identifier. Identifiers have to start with a letter and can contain underscores (_) and digits. Listing 27 demonstrates array declarations.

Listing 27: Example for declaring array variables

int[4] my_int;
dfa[2] my_dfa;
nfa[6] my_nfa;
string[10] my_string;
re[120] my_regular_expression;

5.7 LISA’s grammar and Syntax

LISA's grammar is defined in Backus-Naur Form in Table 1. This section covers the syntax of the different statements in a more readable format.

5.7.1 Expression Statements

Expression statements are statements that end with “;”. There are two kinds of expression statements:

1. Assignment Statements: The left-hand side of the assignment operator (i.e. =, +=, -=, *=, /=) has to be a variable's identifier, and the right-hand side has to be a primitive value or a simple expression statement that can be reduced to a primitive value (i.e. boolean, int, string, dfa, etc.).

2. Simple Expression Statements: Any form of expression that is eventually replaced by a value. Thus, algebraic expressions, relational expressions, boolean expressions, and function calls all count as simple expression statements.

For more information about the syntax of simple expression statements, please refer to Table 1.

5.7.2 Function calls

A function call is a simple expression statement and is replaced by the value returned by the function. Not all functions return a value. For example, the print function prints a string of characters on the standard output and does not return anything. Therefore, such functions cannot be used in assignment statements, and any such use results in a syntax error.

Functions are at the core of LISA. All communication with the back-end engine happens through function calls. Functions are pre-defined in LISA, and LISA does not have any syntax for function definition. Table 5 on page 15 lists the pre-defined functions with descriptions. Listing 28 demonstrates variable declarations and function calls.

Listing 28: Declaring variables and Function Call

declare{
    //Where declaration takes place
    dfa my_dfa;  //dfa type variable
    nfa my_nfa;  //nfa type variable
    int index;   //integer variable
}

program{
    //Where program source code is placed

    //Assignment Expression Statement
    index = 2+3;

    //Function that returns dfa
    my_dfa = readfile(dfa,"~/dfa.txt");

    //Function that prints my_dfa but doesn't return anything
    print(my_dfa);
}

5.8 Loop Statement

The only supported loop statement is the while loop. The syntax of a typical loop statement is shown in Listing 29. It is worth mentioning that in the while statement, <statements> has to be enclosed in curly braces.

Listing 29: while-statement’s syntax

while ( <boolean-expression> ) {
    <statements> OR break;
}

As noted in Listing 29, break statements are allowed in the body of the loop statement and cause execution to leave the loop completely.

5.9 Generating Statement

LISA's back-end computational engine includes an Initially Connected Deterministic Finite Automata (ICDFA) generator. Thus, LISA users can benefit from a built-in ICDFA generator that can generate random ICDFAs or enumerate all the described ICDFAs.

Listing 30: generate-statement’s syntax

generate( random OR enumerate , <integer - number of states> , <integer - number of alphabet symbols> )
{
    //TODO: insert code to deal with generated ICDFAs
}

Listing 30 shows the basic syntax of the generate statement. The first parameter indicates whether the created generator is random or an enumerator; both random and enumerate are keywords. The second parameter indicates the number of states (i.e. |Q|). The third parameter indicates the number of alphabet symbols (i.e. |Σ|).

The generate statement is a specialized loop statement. If the generator is described as a random generator, the generate statement becomes a non-terminating loop, and the user has to terminate the loop manually with a break statement.

However, if the generator is described as an enumerator, it continues running until the last ICDFA has been generated. The user still has control over the loop and can terminate it at any time with a break; statement.

Inside the generate statement, the latest generated ICDFA can be retrieved using the next; statement. The next statement returns a DFA, so the returned value can only be stored in a dfa typed variable.

Listing 31: generate-statement enumeration example

generate ( enumerate, 3 , 2 ) {
    //Read the generated ICDFA
    my_dfa = next;

    //Print it into standard IO
    print(my_dfa);
}

Listing 31 is an example of a generate statement that enumerates all of the ICDFAs with n = 3 states and k = 2 alphabet symbols. next returns the latest generated ICDFA. Caution is needed when using the next statement: once the final generated ICDFA has been returned, there is no ICDFA left to return for the next call. The hasnext statement can be used to check whether at least one ICDFA is still left to be generated. hasnext returns true if there are ICDFAs left to be generated, and false if the generation has finished.

Care must be taken if the next statement is used more than once inside the body of a generate statement. To prevent errors, hasnext should be used to check whether any ICDFAs are left to be generated.
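The hasnext/next discipline described above can be illustrated in C++ with a stand-in generator. Everything in this sketch is hypothetical: EnumeratingGenerator hands out integer ids instead of real ICDFAs, and it does not reflect the C++ that the LISA compiler actually emits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an enumerating generator: it hands out the
// integer ids 0..total-1 in place of real ICDFAs.
class EnumeratingGenerator {
public:
    explicit EnumeratingGenerator(std::size_t total)
        : total_(total), produced_(0) {}

    bool has_next() const { return produced_ < total_; }

    std::size_t next() {
        assert(has_next());   // calling next() past the end is a usage error
        return produced_++;
    }

private:
    std::size_t total_;
    std::size_t produced_;
};

// Consume two items per iteration, checking before every call to next().
std::vector<std::size_t> drain_in_pairs(EnumeratingGenerator& gen) {
    std::vector<std::size_t> seen;
    while (gen.has_next()) {
        seen.push_back(gen.next());
        if (!gen.has_next()) {
            break;            // odd count: nothing left for the second call
        }
        seen.push_back(gen.next());
    }
    return seen;
}
```

With an odd number of items, the second call to next() in the last iteration is skipped, which is the situation the caution above is about.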

Listing 32: generate-statement random example

generate ( random , 3 , 2 ) {
    //Read the generated ICDFA
    my_dfa = next;

    //Print it into standard IO
    print(my_dfa);

    //Exit if enough ICDFA generated
    if ( counter >= MAX_COUNTER){
        break;
    }

    counter += 1;
}

Listing 32 demonstrates how to use a random ICDFA generator. Since a random ICDFA generator never ends on its own, it has to be ended manually. Thus, at some point, as demonstrated, the loop must be terminated with a break statement.

5.10 Nested Loop and Generating Statements

Nested loop statements are supported, as in conventional programming languages such as C/C++. Nested generating statements are also supported. There are different cases in which nested generating statements are needed; more examples can be found in chapter 6.

Listing 33: Nested generating statements example

generate (enumerate , 3 , 2)
{
    mydfa_a = next; //first NEXT
    generate(enumerate, 4 , 2){
        mydfa_b = next; //second NEXT

        mydfa_union = union(mydfa_a, mydfa_b);
        print(mydfa_union);
    }
    print("Program is finished!");
}

Listing 33 demonstrates a sample program with nested generating statements. As Listing 33 shows, the next statement is called twice. Each next statement refers to the closest enclosing generate statement.

5.11 I/O Operations

I/O operations in the current version of LISA are very limited. The only supported I/O operations are reading files, writing files, and printing on standard output.

5.11.1 print Function

The pre-defined print function is responsible for printing all the primitive data types (i.e. string, int, dfa, nfa, boolean, and regex) on the standard output. String concatenation between string and integer types is supported, but concatenation between string and any type other than integer is not possible.

Listing 33 is an example of using the print function to print a DFA typed variable and a string literal on standard output.

5.11.2 readfile Function

The pre-defined readfile function is responsible for reading files and parsing their content into the requested primitive data type.

As demonstrated in Listing 34, the first parameter of this function is a primitive data type that indicates the format of the file to be parsed.

Currently, input Finite State Machine files have to be in the Grail+ format. This requirement may vary with the back-end computational engine.

Listing 34: readfile/writefile function demonstration

declare{
    dfa my_dfa1;
    dfa my_dfa2;
}
program{

    //Read two files in
    my_dfa1 = readfile(dfa, "/home/dfa1.txt");
    my_dfa2 = readfile(dfa, "/home/dfa2.txt");

    //Write the union result into file
    writefile (union(my_dfa1, my_dfa2) ,
        "/home/output.txt");

}

5.11.3 writefile Function

The pre-defined writefile function is responsible for writing data objects to a file. As demonstrated in Listing 34, writefile accepts two parameters: the first is the object to be written, and the second is the path of the output file on the local disk.

It is worth mentioning that the string representations of the DFA and NFA data types are in the Grail+ Finite State Machine format.

5.12 String Concatenation

String and Integer types can be concatenated in all cases, and the resulting object is string typed. String concatenation can happen between two string typed variables, or between one string and one integer variable, by placing a “+” sign between them. Listing 35 gives examples of string concatenation.

Listing 35: String concatenation example

...
print("This is test " + 1);

mydfa_1 = readfile(dfa, "/home/dfa" + 1 + ".txt");
...
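Since LISA programs are translated to C++, the concatenation rule (string + string, or string + integer, always yielding a string) could be modelled along the following lines. The concat overloads are a hypothetical sketch and are not taken from the generated code.

```cpp
#include <string>

// string + integer: the integer is converted to its decimal representation.
std::string concat(const std::string& left, long right) {
    return left + std::to_string(right);
}

// string + string: plain concatenation.
std::string concat(const std::string& left, const std::string& right) {
    return left + right;
}
```

For example, concat(concat("/home/dfa", 1), ".txt") builds the file name used in Listing 35.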

5.13 Type conversion

When two variables (objects) need to interact with each other, either directly through assignment or indirectly through pre-defined functions, they need to be of the same type. Type conversion is not handled implicitly; the developer is responsible for all type conversions, except for string concatenation.

The two pre-defined functions nfatodfa and dfatonfa are responsible for converting between the DFA and NFA types. Moreover, the parse function converts a regular expression given as a string to an NFA or a DFA.

Listing 36: Type conversion examples

my_dfa = nfatodfa(my_nfa); //converting NFA to DFA
my_nfa = dfatonfa(my_dfa); //converting DFA to NFA
my_dfa = parse(dfa, "regular-expression"); //converting regex to DFA
my_nfa = parse(nfa, "regular-expression"); //converting regex to NFA
my_regex = parse(regex,"regular-expression"); //convert string to regex object

As demonstrated in Listing 36, the parse function accepts two parameters. The first parameter indicates the type that the string literal should be translated to, and the second parameter is the string literal representing the regular expression.

5.14 Compiling

The LISA compiler translates LISA source code into C/C++ source code and uses the GNU Compiler Collection to compile the generated C/C++.

Listing 37: Linux shell command to compile using the LISA compiler

./lisa sourcecode.l

Two files are generated as a result of the shell command shown in Listing 37:

1. generated.cc: The generated C/C++ source code, which the user can modify further to suit their needs.

2. <filename>.o: The executable built from generated.cc, compiled and linked automatically. The output file takes its name from the source file.

If generated.cc has to be modified, it should be recompiled with at least the command shown in Listing 38.

Listing 38: Linux shell command to compile generated.cc

g++ -w generated.cc -lgmp

As demonstrated in Listing 38, the GMP library must also be linked, since the back-end engine uses it for its computations.

It is worth mentioning that compiling generated.cc manually is not necessary unless changes have been applied to it by hand.

For debugging purposes, the symbol table is also printed if the -d switch is used during compilation.

Listing 39: Linux shell command to compile a LISA program in debug mode

./lisa -d sourcefile.l

6 Classical Tests and Examples

This chapter contains examples demonstrating different aspects of LISA, including some classical examples from the field of formal languages. Listing 40 demonstrates how to create arrays and how to store information in them.

Listing 40: Example for using arrays

declare{
    int MAX;
    int index;
    dfa[10] dfa_list;
}
program{
    MAX = 10; //Maximum number of ICDFA
    index = 0;

    generate(random, 4,2){

        dfa_list[index] = next;

        index += 1;
        if (index == MAX)
        {
            break;
        }
    }

    index = 0;
    while( index < MAX)
    {
        print(index + ")\n");
        print(dfa_list[index]);
        index += 1;
    }

}

Listing 41: Classic Example 1

// Classical example
// Enumerate all the possible ICDFA n=3, k=2
// compute the union between all of two ICDFAs,
// and find the two ICDFA that generate biggest DFA
// result from union of the two.
declare{
    int Max;
    int m;
    int n;
    dfa MaxA;
    dfa MaxB;
    dfa A;
    dfa B;
    dfa C;

}
program{
    Max = 0;
    generate(enumerate,3,2){
        A = next;
        generate(enumerate,3,2){
            B = next;
            C = union(A,B);
            C = reduce(C);
            if(!iscomplete(C)){
                C = complete(C);
            }
            m = size(C);
            if(m > Max){
                Max = m;
                MaxA = A;
                MaxB = B;
            }
        }
    }

    print("The size of union is: " + Max);
    print(MaxA);
    print(MaxB);
    print(union(MaxA,MaxB));
}

Listing 42: Classic Example 2

declare{
    dfa a;
    dfa b;
    dfa c;
    dfa maxA;
    dfa maxB;
    int maxSize;
    int MAX;
    int i;
}
program{

    MAX = 1000; //Maximum number of randoms
    maxSize = 0;
    i = 0;
    generate(random, 4,2){

        a = next;
        b = next;
        c = union(a,b);
        c = reduce(c);
        if(!iscomplete(c)){
            c = complete(c);
        }
        if(size(c) > maxSize){
            maxSize = size(c);
            maxA = a;
            maxB = b;
        }
        if (i>= MAX){ break; }
        i += 1;
    }
    print("The biggest size is:" + maxSize + "\n");
    print(maxA);
    print(maxB);
    print(union(maxA,maxB));
}

Listing 43: Classic Example 3

declare{
    dfa a;
    dfa b;
    dfa c;
    dfa maxA;
    dfa maxB;
    dfa maxC;
    int maxSize;
    int MAX;
    int i;
}
program{

    MAX = 1000; //Maximum number of randoms
    maxSize = 0;
    i = 0;
    generate(random, 4,2){

        a = next;
        b = next;
        c = shuffle(a,b);
        c = reduce(c);
        if(!iscomplete(c)){
            c = complete(c);
        }
        if(size(c) > maxSize){
            maxSize = size(c);
            maxA = a;
            maxB = b;
            maxC = c;
        }
        if (i>= MAX){ break; }
        i += 1;
    }
    print("The worst case for state complexity for shuffle operation between consecutive automata for n=4 and k=2:\n");
    print(maxA);
    print(" and:\n");
    print(maxB);
    print("\ is:\n");
    print(maxC);
    print("which gives " + maxSize + " states\n");
}

Listing 44: Classic Example 4

declare{
    int Maxshu;
    int Maxcon;
    int m;
    int n;
    dfa MaxAshu;
    dfa MaxBshu;
    dfa MaxAcon;
    dfa MaxBcon;
    dfa MaxC;
    dfa MaxD;
    dfa A;
    dfa B;
    dfa C;
    dfa D;
}
program{
    Maxshu = 0;
    Maxcon = 0;
    generate(enumerate,3,2){
        A = next;
        generate(enumerate,3,2){
            B = next;
            C = shuffle(A,B);
            D = concat(A,B);
            C = reduce(C);
            D = reduce(D);
            if(!iscomplete(C)){
                C = complete(C);
            }
            if(!iscomplete(D)){
                D = complete(D);
            }
            m = size(C);
            if(m > Maxshu){
                Maxshu = m;
                MaxAshu = A;
                MaxBshu = B;
                MaxC = C;
            }
            m = size(D);
            if(m > Maxcon){
                Maxcon = m;
                MaxAcon = A;
                MaxBcon = B;
                MaxD = D;
            }

        }
    }
    print("The maximum state complexity of shuffle is: " + Maxshu + " and is obtained for:\n");
    print(MaxAshu);
    print(MaxBshu);
    print(MaxC);
    print("The maximum state complexity of concatenation is: " + Maxcon + " and is obtained for:\n");
    print(MaxAcon);
    print(MaxBcon);
    print(MaxD);
}

Listing 45: I/O handling example

declare{
    dfa my_dfa1;
    dfa my_dfa2;
    dfa dfa_union;
    dfa dfa_reverse;
}
program{

    //Read the two input dfa from file
    my_dfa1 = readfile(dfa,"dfainput1");
    my_dfa2 = readfile(dfa,"dfainput2");

    //Compute union and reverse of the DFAs
    dfa_union = union(my_dfa1,my_dfa2);
    dfa_reverse = reverse(dfa_union);

    //Write the result into files
    writefile(dfa_union,"union_output.txt");
    writefile(dfa_reverse,"reverse_output.txt");

}

7 Ideas for future development

This is the list of features that should eventually be supported by LISA. Each milestone should pick features from this list and implement them.

• Implement full support for regular expressions (size function, etc.).

• Implement finite-language support.

• Check to see if a word is in language (re/dfa/nfa)

• Integrate with Eclipse-IDE.

References

[1] Marco Almeida, Nelma Moreira, Rogério Reis, Enumeration and Generation of Initially Connected Deterministic Finite Automata, Technical Report Series: DCC-2006-07, Version 1.1, March 2007.

[2] Tal Davidson, Jim Pattee, “Home page, Artistic Style 2.02”, <http://astyle.sourceforge.net/>.

[3] GNU Compiler Collection, GNU Project, <http://gcc.gnu.org/>.

[4] GNU Multiple Precision Arithmetic Library, GNU Project, <http://gmplib.org/>.

[5] Grail+, Department of Computer Science, University of Western Ontario, Canada, <http://www.csd.uwo.ca/Research/grail/>.

[6] BISON, GNU Project, <http://www.gnu.org/software/bison/>.
