www.interrasystems.com

An Introduction To Antlr

Slide: 2

Content

What is Antlr?

Why use Antlr?

How to use Antlr?

Components of Antlr grammar file

Writing Lexer Class

Writing Parser Class

What does Antlr generate?

Predicates

Automatic Parse Tree generation

Tree Parsing

Conclusion

Slide: 3

What is Antlr?

ANTLR, ANother Tool for Language Recognition, is a pred-LL(k) parser and translator generator tool.

It generates compiler front ends and source-to-source translators from grammatical descriptions, with output in Java, C++, Python, or C#.

Slide: 4

Why Antlr?

Antlr grammars are written in EBNF for LL(k) parsing, which is often handier to write than LR grammars.

The code Antlr generates is much more readable than that of other LR/LL parser generators, which makes debugging much easier.

Re-entrant parser

Re-usability

Antlr can generate output in multiple languages.

Lower memory requirement, as it doesn’t simulate a pushdown automaton the way LALR tools (yacc/bison) do.

Slide: 5

Why Antlr?(contd.)

Antlr generates code in object-oriented languages, so you can inherit the basic functionality and add your own.

Antlr supports exception handling, which makes error recovery easy.

Same meta-language specification for lexer/parser/tree parser.

Antlr can build an AST from the input token stream.

Slide: 6

How to use Antlr?

calc.g is my Antlr grammar file, containing both the lexer and the parser.

It contains the parser, named CalcParser:

class CalcParser extends Parser;

It contains the lexer, named CalcLexer:

class CalcLexer extends Lexer;

Now invoke ANTLR on the grammar file to generate the lexer and parser code:

java -cp $ANTLR_HOME/antlr.jar antlr.Tool calc.g

Compile the generated code:

gcc -c -g -I. -I$ANTLR_HOME/lib/cpp -Wall CalcLexer.cpp

gcc -c -g -I. -I$ANTLR_HOME/lib/cpp -Wall CalcParser.cpp

Slide: 7

How to use Antlr?

Compile the main function, which creates instances of the lexer and parser classes.

Example of the body of main():

CalcLexer lexer(cin);

CalcParser parser(lexer);

parser.expr();

gcc -c -g -I. -I$ANTLR_HOME/lib/cpp -Wall main.cpp

Link the generated object files with the antlr static library to create the parser executable:

gcc main.o CalcLexer.o CalcParser.o $ANTLR_HOME/lib/cpp/src/libantlr.a -lstdc++

Slide: 8

Writing Parser Class

All parser rules must be associated with a parser class.

A parser specification in a grammar file often looks like:

{ optional class code preamble }

class YourParserClass extends Parser;

options section

tokens section

{ optional parser class members }

parser rules

Slide: 9

Options section

The section is preceded by the ‘options’ keyword and contains a series of option/value assignments.

options {

importVocab = lexerVocab;

k = 2;

buildAST = true;

defaultErrorHandler = true;

}

Slide: 10

Token section

The tokens section lists all the keywords that the parser will use in parser rules.

For example: tokens { "void"; "char"; "short"; "int"; ..... }

Slide: 11

Rule Section

The structure of an input stream of atoms is specified by a set of mutually-referential rules.

Each rule has a name, optionally a set of arguments, optionally an init-action, optionally a return value, and one or more alternatives.

Each alternative contains a series of elements that specify what to match and where.

Slide: 12

Rule Section(contd.)

The basic form of an ANTLR rule is:

rulename

: alternative_1

| alternative_2

...

| alternative_n

;

If parameters are required for the rule, use the following form:

rulename[formal parameters] : ... ;

Slide: 13

Rule Section(contd.)

If you want to return a value from the rule, use the returns keyword:

rulename[formal parameters] returns [type id] : ... ;

If you want to pass arguments to a rule reference, use the following form:

rulename : alternative_1[arg1, arg2] ;

If the rule reference returns a value, capture it by assigning it to a variable.

Slide: 14

Rule Section

rulename { type id; } : id=alternative_1[arg1, arg2] ... ;

An init-action can also be specified for a rule:

rulename { // init-action } : ....;

Slide: 15

Rule Section

A user action can follow any rule reference. It executes after that rule reference has matched, except in guessing mode (that is, while a syntactic predicate is being evaluated).

rule : rule_ref1 { // user code } rule_ref2

{

// user code

}

;

ANTLR supports extended BNF notation through the following four subrule forms.

Slide: 16

Rule Section(contd.)

( P1 | P2 | ... | Pn )

( P1 | P2 | ... | Pn )*

( P1 | P2 | ... | Pn )+

( P1 | P2 | ... | Pn )?
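
As a rough illustration of what the four subrule forms mean, the sketch below hand-codes each one over a plain string, assuming just two alternatives, P1 = 'a' and P2 = 'b'. The function names (startsAlt, matchOne, matchStar, matchPlus, matchOpt) are hypothetical and are not part of ANTLR; the point is only the control flow each form corresponds to.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true when the next character is one of the
// alternatives P1 ('a') or P2 ('b') assumed in this sketch.
static bool startsAlt(const std::string& in, std::size_t pos) {
    return pos < in.size() && (in[pos] == 'a' || in[pos] == 'b');
}

// ( P1 | P2 )   : exactly one alternative must match.
bool matchOne(const std::string& in, std::size_t& pos) {
    if (!startsAlt(in, pos)) return false;
    ++pos;
    return true;
}

// ( P1 | P2 )*  : zero or more, so a loop that may run zero times.
bool matchStar(const std::string& in, std::size_t& pos) {
    while (startsAlt(in, pos)) ++pos;
    return true;
}

// ( P1 | P2 )+  : one or more, so one mandatory match, then the star loop.
bool matchPlus(const std::string& in, std::size_t& pos) {
    if (!matchOne(in, pos)) return false;
    return matchStar(in, pos);
}

// ( P1 | P2 )?  : optional, so a single guarded match.
bool matchOpt(const std::string& in, std::size_t& pos) {
    if (startsAlt(in, pos)) ++pos;
    return true;
}
```

Only the plain and the + forms can fail; the * and ? forms always succeed, possibly without consuming any input.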

Slide: 17

Writing Lexer Class

All lexer rules must be associated with a lexer class.

A lexer specification in a grammar file often looks like:

{ optional class code preamble }

class YourLexerClass extends Lexer;

options section

tokens section

{ optional lexer class members }

lexer rules

Slide: 18

What does Antlr Generate?

Antlr will generate the following files from the calc.g grammar:

CalcLexer.hpp

CalcLexer.cpp

CalcParser.hpp

CalcParser.cpp

CalcLexerTokenTypes.hpp

CalcLexerTokenTypes.txt

For every rule, Antlr defines a member function in the parser/lexer class. For example, the code for the rule expr looks very much like this:

Slide: 19

Rule Section(contd.)

void CalcParser::expr() {

try { // for error handling

mexpr();

{ // ( ... )*

for (;;) {

if ((LA(1) == PLUS)) {

match(PLUS);

mexpr();

}

else {

goto _loop14;

}

}

_loop14:;

} // ( ... )*

match(SEMI);

}

catch (ANTLR_USE_NAMESPACE(antlr)RecognitionException& ex) {

// report the error, consume the offending token, and advance the token stream

// to a point from which the parser can resume parsing

}

}
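
The generated function above depends on the ANTLR runtime. To see the same shape in isolation, here is a self-contained, hand-written sketch of a recognizer for expr : mexpr ( PLUS mexpr )* SEMI. The class MiniCalcParser, the token enum, and the helper recognizes are hypothetical stand-ins, and mexpr is assumed to match a single NUM token, since the slides do not show that rule.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

enum TokenType { NUM, PLUS, SEMI, EOF_TOKEN };

// Hand-written stand-in for a generated parser; not ANTLR's actual classes.
class MiniCalcParser {
public:
    explicit MiniCalcParser(const std::vector<TokenType>& tokens)
        : tokens_(tokens), pos_(0) {}

    // expr : mexpr ( PLUS mexpr )* SEMI ;
    void expr() {
        mexpr();
        while (LA(1) == PLUS) {   // the ( ... )* loop, written without goto
            match(PLUS);
            mexpr();
        }
        match(SEMI);
    }

private:
    // mexpr : NUM ;   (assumed; the slides do not show this rule)
    void mexpr() { match(NUM); }

    // Lookahead: type of the i-th token ahead (1-based), EOF_TOKEN past the end.
    TokenType LA(std::size_t i) const {
        std::size_t idx = pos_ + i - 1;
        return idx < tokens_.size() ? tokens_[idx] : EOF_TOKEN;
    }

    // Consume the next token if it has the expected type, otherwise throw.
    void match(TokenType expected) {
        if (LA(1) != expected) throw std::runtime_error("unexpected token");
        ++pos_;
    }

    std::vector<TokenType> tokens_;
    std::size_t pos_;
};

// Convenience wrapper: true when the token sequence is a valid expr.
bool recognizes(const std::vector<TokenType>& tokens) {
    try {
        MiniCalcParser parser(tokens);
        parser.expr();
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

Each rule becomes one member function, and a mismatch surfaces as an exception, which mirrors the try/catch structure of the generated code.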

Slide: 20

Predicates

Antlr provides two types of predicates to resolve ambiguities between alternatives.

Semantic predicate

A semantic predicate specifies a condition that must be met (at run-time) before parsing may proceed. It is specified as {...}?

Example:

stat :

{isTypeName(LT(1))}? ID ID ";" // declaration "type varName;"

| ID "=" expr ";" // assignment

;
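
A minimal sketch of the idea behind this semantic predicate, written by hand rather than generated by ANTLR: both alternatives begin with ID, so with one token of lookahead the choice is made by a run-time test on that token. The names classify, isId, and the set-based isTypeName are assumptions for illustration, and expr is reduced to a single token.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

enum StatKind { DECLARATION, ASSIGNMENT, SYNTAX_ERROR };

// Hypothetical symbol-table lookup playing the role of isTypeName(LT(1)).
bool isTypeName(const std::set<std::string>& typeNames, const std::string& text) {
    return typeNames.count(text) != 0;
}

// Stand-in for the ID token: non-empty and starting with a letter or '_'.
bool isId(const std::string& t) {
    return !t.empty() &&
           (std::isalpha(static_cast<unsigned char>(t[0])) || t[0] == '_');
}

StatKind classify(const std::vector<std::string>& toks,
                  const std::set<std::string>& typeNames) {
    // {isTypeName(LT(1))}? ID ID ";"
    // The predicate is evaluated first; only if it holds is this
    // alternative attempted.
    if (!toks.empty() && isTypeName(typeNames, toks[0])) {
        bool ok = toks.size() == 3 && isId(toks[1]) && toks[2] == ";";
        return ok ? DECLARATION : SYNTAX_ERROR;
    }
    // ID "=" expr ";"   (expr is a single token in this sketch)
    bool ok = toks.size() == 4 && isId(toks[0]) &&
              toks[1] == "=" && toks[3] == ";";
    return ok ? ASSIGNMENT : SYNTAX_ERROR;
}
```

The same input text can therefore parse differently depending on what the symbol table holds at that moment, which is what distinguishes a semantic predicate from pure lookahead.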

Slide: 21

Predicate(contd.)

Syntactic predicate

A syntactic predicate allows you to use arbitrary lookahead when parsing decisions cannot be made deterministically with finite lookahead. It is specified as ( prediction block ) => production.

Example:

stat: ( list "=" )=> list "=" list

| list

;
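
One way to picture how such a predicate is evaluated is as a trial parse: mark the input position, attempt the prediction block in guessing mode, then rewind and commit to an alternative. The sketch below is hand-written under that assumption and is not ANTLR output. MiniStatParser and parseStat are hypothetical names, and list is assumed to be one or more tokens other than "=", since the slides do not define it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

class MiniStatParser {
public:
    explicit MiniStatParser(const std::vector<std::string>& toks)
        : toks_(toks), pos_(0), guessing_(0), actionsRun_(0) {}

    // stat : ( list "=" )=> list "=" list
    //      | list
    //      ;
    // Returns 1 or 2 for the alternative taken, 0 on a syntax error.
    int stat() {
        if (predicts()) {
            bool ok = list() && match("=") && list();
            return (ok && atEnd()) ? 1 : 0;
        }
        return (list() && atEnd()) ? 2 : 0;
    }

    // Number of (simulated) user actions executed outside guessing mode.
    int actionsRun() const { return actionsRun_; }

private:
    // Evaluate ( list "=" ): mark the position, try to match in guessing
    // mode, then rewind whether or not the match succeeded.
    bool predicts() {
        std::size_t mark = pos_;
        ++guessing_;
        bool matched = list() && match("=");
        --guessing_;
        pos_ = mark;
        return matched;
    }

    // list : ID+ ;   (assumed) where ID is any token other than "=".
    bool list() {
        std::size_t start = pos_;
        while (pos_ < toks_.size() && toks_[pos_] != "=") ++pos_;
        if (pos_ == start) return false;
        if (guessing_ == 0) ++actionsRun_;  // actions are skipped while guessing
        return true;
    }

    bool match(const std::string& text) {
        if (pos_ < toks_.size() && toks_[pos_] == text) { ++pos_; return true; }
        return false;
    }

    bool atEnd() const { return pos_ == toks_.size(); }

    std::vector<std::string> toks_;
    std::size_t pos_;
    int guessing_;
    int actionsRun_;
};

// Convenience wrapper around a single stat() call.
int parseStat(const std::vector<std::string>& toks) {
    MiniStatParser parser(toks);
    return parser.stat();
}
```

The trial parse costs a second pass over the predicted tokens, which is the price of unbounded lookahead; the guessing counter is what keeps user actions from running twice.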

Slide: 22

Automatic Parse Tree Generation

ANTLR comes with its own tree data structure.

An Antlr tree is an n-ary tree, with each node containing a list of child nodes.

Each node holds a token with a token ID and a value.

How to generate:

In the options section, set

buildAST = true;

In each rule, mark the token that becomes the parent (root) with ^, e.g.

assign: lvalue "="^ expr ;

expr: term ("+"^ term)* ;

term: ID ("*"^ ID)* ;

Slide: 23

Accessing Parse Tree

The tree is available from the parser object via the member function getAST() after parsing:

myParser.topRule(myLexer);

AST *parseTree = myParser.getAST();

The parse tree information can be accessed via the following member functions of the tree:

int getType(); // type of the token

std::string getText(); // text of the token

int getNumberOfChildren();

AST *getFirstChild();

AST *getNextSibling();
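
To show how a first-child / next-sibling interface like this is typically walked, here is a self-contained sketch. Node and Tree are minimal hypothetical stand-ins that only mirror the accessor names listed above; they are not ANTLR's AST classes, and the integer token types used in the usage note are arbitrary.

```cpp
#include <memory>
#include <string>
#include <vector>

// Minimal stand-in for an AST node (not ANTLR's AST class).
struct Node {
    int type;
    std::string text;
    Node* firstChild = nullptr;
    Node* nextSibling = nullptr;

    Node(int t, const std::string& s) : type(t), text(s) {}

    int getType() const { return type; }
    std::string getText() const { return text; }
    Node* getFirstChild() const { return firstChild; }
    Node* getNextSibling() const { return nextSibling; }

    // Children are counted by following the sibling chain.
    int getNumberOfChildren() const {
        int n = 0;
        for (const Node* c = firstChild; c; c = c->nextSibling) ++n;
        return n;
    }
};

// Owns all nodes so the sketch needs no manual delete.
class Tree {
public:
    Node* make(int type, const std::string& text) {
        nodes_.push_back(std::unique_ptr<Node>(new Node(type, text)));
        return nodes_.back().get();
    }

    // Append child as the last child of parent.
    static void addChild(Node* parent, Node* child) {
        if (!parent->firstChild) { parent->firstChild = child; return; }
        Node* last = parent->firstChild;
        while (last->nextSibling) last = last->nextSibling;
        last->nextSibling = child;
    }

private:
    std::vector<std::unique_ptr<Node>> nodes_;
};

// Render a tree in LISP-like prefix form, e.g. ( = a ( + b c ) ).
std::string toStringTree(const Node* n) {
    if (!n) return "";
    if (!n->getFirstChild()) return n->getText();
    std::string out = "( " + n->getText();
    for (const Node* c = n->getFirstChild(); c; c = c->getNextSibling())
        out += " " + toStringTree(c);
    return out + " )";
}
```

Building a root "=" with children "a" and a "+" node (itself holding "b" and "c") and passing it to toStringTree yields ( = a ( + b c ) ), which is the shape the ^ operator would produce for a = b + c.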

Slide: 24

Customizing AST parser

Put ‘!’ after a rule name to turn off automatic AST generation for that rule

Add customized tree generation:

term: explicit_mult

| implicit_mult

;

explicit_mult: ID MULT^ ID;

implicit_mult !: left:ID right:ID

{ #implicit_mult =

#(#[MULT,"*"], #left, #right); }

;

Slide: 25

Tree Parser

A browser (walker) for the AST can also be generated by ANTLR.

The parser/browser needs to be derived from TreeParser:

class myTreeParser extends TreeParser;

Rules are written like parser rules, with #( ... ) denoting a tree node in prefix form (the root first, then its children), e.g.

poly : #(ADD term poly) | term ;

term : INT | ID | #(EXP ID INT) | #(MULT INT #(EXP ID INT)) ;

Actions can be added to each rule. A rule can also create a new, modified AST.

Slide: 26

Conclusion

ANTLR is a newer, powerful substitute for the old yacc parser generator.

The input language is BNF-based and is better organized than yacc input.

A lot of free parser code for existing languages is already available as Antlr grammars.

Re-entrant parsers in true object-oriented style.

Each rule is available as a separate parser entry point, so the parser is more re-usable.

Already in use at Interra. In e2Vera and Tiger.

We should use antlr for new projects

It will probably have some porting issues, as it depends heavily on exception handling and templates.