Syntax-Directed Definition
• Syntax-Directed Definition
– Each production rule has a “to-do” list
• To-do = an associated semantic action (code)
– The to-do list is executed whenever its production rule is reduced (in LR parsing) or expanded in a derivation (in LL parsing)
– Commonly used in parser generators (e.g. yacc, ANTLR)
• The generated parser contains the semantic actions defined by the compiler developer
3
Semantic Actions
• For the actions of E → E + E
– The three E's need to be distinguished in the action code
– In yacc/bison they are referenced by position: $$, $1, $2, ... (ANTLR refers to rule elements by name or label instead)
eg. expr : expr PLUS expr {$$ = $1 + $3;}
Example 1) Calculator written in yacc
expr : NUM              {$$ = $1;}
     | expr PLUS expr   {$$ = $1 + $3;}
     | expr MULT expr   {$$ = $1 * $3;}
     | LPAR expr RPAR   {$$ = $2;}
     ;
or
expr : NUM              {$$.val = $1.val;}
     | expr PLUS expr   {$$.val = $1.val + $3.val;}
     | expr MULT expr   {$$.val = $1.val * $3.val;}
     | LPAR expr RPAR   {$$.val = $2.val;}
     ;
We call values such as val “attributes”
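To make the positional names concrete, the following is a minimal C sketch of how a bottom-up parser could keep one semantic value per grammar symbol on a value stack, so that $1 and $3 are simply slots below the top. The names ValueStack, vs_push and reduce_plus are hypothetical and are not part of yacc's generated code.

```c
#include <assert.h>

#define STACK_MAX 64

/* Hypothetical value stack: one semantic value per symbol on the parse stack. */
typedef struct {
    int vals[STACK_MAX];
    int top;                 /* number of values currently stored */
} ValueStack;

void vs_push(ValueStack *s, int v) {
    assert(s->top < STACK_MAX);
    s->vals[s->top++] = v;
}

/* Reduce by "expr : expr PLUS expr".
 * The three topmost slots hold the values of $1, $2 and $3. */
void reduce_plus(ValueStack *s) {
    assert(s->top >= 3);
    int d1 = s->vals[s->top - 3];   /* $1: left operand */
    int d3 = s->vals[s->top - 1];   /* $3: right operand */
    int result = d1 + d3;           /* $$ = $1 + $3 */
    s->top -= 3;                    /* pop the right-hand side */
    vs_push(s, result);             /* push the value of the left-hand side */
}
```

After the values for expr, PLUS and expr have been pushed, one call to reduce_plus leaves a single slot holding the sum, which is what the enclosing rule later sees as its own $n.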
5
Yacc, bison
.y file format (same layout as lex/flex):
declarations
%%
rules
%%
support code
Build flow: foo.y → (yacc, bison) → y.tab.c → (gcc) → a.out
yyparse() is the main routine
6
Declarations
• User-defined declarations: written between “%{” and “%}”
• Tokens – terminals to be used in the grammar
– %token terminal1 terminal2 ...
– or %token terminal1 val1 terminal2 val2 ...
• With the ‘-d’ option, yacc (or bison) writes the token definitions to a header file (y.tab.h) so that the lex scanner can use them
– Refer to the Internet materials!
• lex/yacc pair, flex/bison pair
If you want to use C++ in the actions, flex++/bison++ are a better choice than lex/yacc
7
Declarations (cont'd)
• Start symbol
%start non-terminal
• Associativity – (left, right, none)
%left TK_PLUS
%right TK_ASSIGNMENT
%nonassoc TK_LESSTHAN
• Precedence
%prec
– Used inside a rule to give that rule the precedence of a named token
8
Declarations (cont'd)
• Attribute values – information attached to terminal/non-terminal symbols
• Values delivered from the lexer (scanner) are declared with %union, eg.
%union {
int ival;
char *name;
double dval;
}
– The union becomes the type YYSTYPE
• Type of non-terminals: declared when a non-terminal carries a value of a specific union member
%type<union_entry> non_terminal
eg. %type<ival> IntNumber
9
Functions
• Main function
– yyparse()
• Error function
– yyerror(char *s);
• The value of the last token
– yylval of type YYSTYPE (%union decl)
note: yylval is assigned in the lex actions
[a-z] {yylval.ival = yytext[0] - 'a';
return TK_NAME;} // in lex
The parser then sees this value as the token's attribute in yacc
10
Rules
• Production rules
non-terminal : first_ | second_ | ... ;
non-terminal : ; /* ε-rule */
eg) foo : production1 | ; /* nothing */
• Actions are executed when the rhs is matched
non-terminal : rhs {action routine} ;
• Conflict
: shift/reduce or reduce/reduce conflicts
eg1) e: 'X' | e '+' e ; (shift/reduce)
• Is “X+X+X” “(X+X)+X” or “X+(X+X)”?
eg2) Z: X|Y; X:'a'; Y:'a'; (reduce/reduce)
• In yacc, a conflict will occur because 'a' can be reduced to either X or Y
11
Attribute Values ($vars)
• Each terminal/non-terminal has attribute values
• In the action of the matched rule
– $$ = LHS
– $1 = first symbol of the RHS
– $2 = second symbol, etc.
– If an action appears in the middle of a rule, as in
A: B {...} C {...} ;
the mid-rule action itself counts as $2, so C's value is $3 !!
12
Example .y File- Calculator
%union {
int value;
char *symbol;
}
%type<value> exp term factor
%type<symbol> ident
...
exp : exp '+' term {$$ = $1 + $3; };
/* Note, $1 and $3 are ints here */
factor: ident {$$ = lookup(symbolTable, $1); };
/* Note, $1 is a char* here */
13
Elimination of If-Conflict in C
• Binding else with the closest if
– Change the grammar (possible but complex!),
– or Use yacc directives
%nonassoc LOWER_THAN_ELSE
%nonassoc ELSE
statement : if expr statement %prec LOWER_THAN_ELSE
| if expr statement ELSE statement
;
14
SDD Example 2) Type Declaration
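D → T id {AddType(id, T.type);
D.type = T.type; }
D → D1 , id {AddType(id, D1.type);
D.type = D1.type; }
T → int {T.type = intType; }
T → float {T.type = floatType; }
In yacc, the action of the first rule (D → T id) becomes
{AddType($2, $1.type);
$$.type = $1.type; }
[Figure: parse tree for “int a, b”. The root D expands to D , id; the inner D expands to T id; T expands to int. intType flows upward as T.type, then D.type, then D.type.]
Values are propagated in the bottom-up direction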
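The bottom-up flow of this example can be replayed by hand. The sketch below is one possible C rendering, assuming a flat array as the symbol table; Sym, symtab and the reduce_* helpers are illustrative names, and only AddType, intType and floatType come from the slide. Each function plays the role of the action run when the corresponding rule is reduced, and its return value stands for the synthesized attribute.

```c
#include <assert.h>
#include <string.h>

typedef enum { intType, floatType } Type;

/* Hypothetical flat symbol table filled by AddType. */
#define MAX_SYMS 16
typedef struct { const char *name; Type type; } Sym;
Sym symtab[MAX_SYMS];
int nsyms = 0;

void AddType(const char *id, Type t) {
    assert(nsyms < MAX_SYMS);
    symtab[nsyms].name = id;
    symtab[nsyms].type = t;
    nsyms++;
}

/* T -> int | float : T.type is synthesized from the keyword. */
Type reduce_T(const char *keyword) {
    return strcmp(keyword, "int") == 0 ? intType : floatType;
}

/* D -> T id : record the identifier, then D.type = T.type. */
Type reduce_D_first(Type t_type, const char *id) {
    AddType(id, t_type);
    return t_type;
}

/* D -> D1 , id : record the identifier, then D.type = D1.type. */
Type reduce_D_next(Type d1_type, const char *id) {
    AddType(id, d1_type);
    return d1_type;
}

/* Replays the reductions a bottom-up parser would perform for "int a, b". */
Type declare_int_a_b(void) {
    Type t = reduce_T("int");
    Type d = reduce_D_first(t, "a");
    return reduce_D_next(d, "b");
}
```

Every value used by an action is already available from a child when the reduction happens, which is why this definition fits an LR parser without extra machinery.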
15
SDD Example 3) Type Declaration
D → T L {AddType(id, T.type);
D.type = T.type;
L.type = D.type; }
T → int {T.type = intType; }
T → float {T.type = floatType; }
L → L1 , id {AddType(id, L1.type);
??? }
L → id {AddType(id, ???); }
[Figure: parse tree for “int a, b”. The root D expands to T L; T expands to int (intType); L expands to L , id. T.type flows up to D.type, and L.type flows down through the L nodes.]
Values are propagated in both the top-down and bottom-up directions
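One way to picture the top-down part is to pass L.type as a function parameter while walking the identifier list. The C sketch below is only one plausible reading of the blanks left open on the slide (it assumes L1.type is copied from L.type and that each id receives L.type); visit_L, visit_D, Sym and symtab are hypothetical names.

```c
#include <assert.h>
#include <string.h>

typedef enum { intType, floatType } Type;

/* Hypothetical flat symbol table filled by AddType. */
#define MAX_SYMS 16
typedef struct { const char *name; Type type; } Sym;
Sym symtab[MAX_SYMS];
int nsyms = 0;

void AddType(const char *id, Type t) {
    assert(nsyms < MAX_SYMS);
    symtab[nsyms].name = id;
    symtab[nsyms].type = t;
    nsyms++;
}

/* L -> L1 , id | id
 * ids[0..n-1] are the identifiers of the list, left to right.
 * l_type is the inherited attribute L.type, handed down by the caller. */
void visit_L(const char **ids, int n, Type l_type) {
    assert(n >= 1);
    if (n > 1)
        visit_L(ids, n - 1, l_type);   /* assumed: L1.type = L.type */
    AddType(ids[n - 1], l_type);       /* assumed: id receives L.type */
}

/* D -> T L : T.type is synthesized, then handed to L as L.type. */
Type visit_D(const char *keyword, const char **ids, int n) {
    Type t_type = strcmp(keyword, "int") == 0 ? intType : floatType;
    visit_L(ids, n, t_type);           /* top-down: the type flows into L */
    return t_type;                     /* bottom-up: D.type = T.type */
}
```

The parameter models the inherited attribute and the return value models the synthesized one, so both directions of propagation appear in a single traversal.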
16
Attributes
• AST vs. SDD
– fact 1: An AST can be defined using SDDs. (eg. previous example 1)
– fact 2: An SDD can be viewed as a series of evaluations on attributes attached to the nodes of an AST.
• Categories of attributes (in A → X Y Z)
– synthesized attr.
• Evaluated from the attribute values of the children (bottom-up)
• A.attr = f(X.attr, Y.attr, Z.attr);
• Attributes of terminals are all assumed to be synthesized.
– inherited attr.
• Evaluated from the attribute values of the parent or siblings as well as the children
• Y.attr = f(A.attr, X.attr, Z.attr);
Implementation of Attribute Evaluation
• On-the-fly implementation (cf. rule-based method, parse-tree method)
– Evaluation order is the same as the AST traversal order
– Most efficient, but restricted to the following classes
• S-attributed SDD: has only synthesized attr.
• L-attributed SDD: synthesized attr. plus inherited attr. that depend only on the parent and left siblings
• Semantic analysis
– Analysis that ensures program constructs (variables, objects, expressions, statements...) are used correctly
• Related to scopes and types
– Semantic analysis = “Attribute Evaluation + Attribute value check”
• cf. single pass (with single AST) vs. multiple pass (with separate AST)
17
Listener Style in ANTLR
Each method holds what is to be done when a node of the parse tree is visited.
If the grammar is named MiniC, the interface MiniCListener and the skeleton class MiniCBaseListener are generated automatically.
For each nonterminal, enter... and exit... methods are generated automatically.
While the parse tree is traversed, the enter... and exit... methods are invoked as each nonterminal node is visited.
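The calling order of the enter/exit methods can be illustrated independently of ANTLR. The following C sketch is an analogy only, not ANTLR's generated Java API: Node, Listener, walk and the trace helpers are hypothetical names, and a pair of function pointers stands in for the generated enter.../exit... methods.

```c
#include <assert.h>
#include <string.h>

#define MAX_CHILDREN 4

typedef struct Node {
    const char *rule;                     /* nonterminal name, e.g. "expr" */
    struct Node *children[MAX_CHILDREN];
    int nchildren;
} Node;

/* Hypothetical listener: one callback on entering a node, one on leaving it. */
typedef struct {
    void (*on_enter)(const Node *n, void *ctx);
    void (*on_exit)(const Node *n, void *ctx);
} Listener;

/* Depth-first walk: enter the node, visit its children, then exit the node. */
void walk(const Node *n, const Listener *l, void *ctx) {
    l->on_enter(n, ctx);
    for (int i = 0; i < n->nchildren; i++)
        walk(n->children[i], l, ctx);
    l->on_exit(n, ctx);
}

/* Example listener state: a text buffer recording the visiting order. */
typedef struct { char buf[256]; } Trace;

void trace_append(Trace *t, const char *event, const char *rule) {
    assert(strlen(t->buf) + strlen(event) + strlen(rule) + 2 < sizeof t->buf);
    strcat(t->buf, event);
    strcat(t->buf, rule);
    strcat(t->buf, ";");
}

void trace_enter(const Node *n, void *ctx) { trace_append(ctx, "enter ", n->rule); }
void trace_exit(const Node *n, void *ctx)  { trace_append(ctx, "exit ", n->rule); }
```

Work that needs information from the children (such as a synthesized attribute) naturally goes into the exit callback, while work that prepares context for the children (such as opening a scope) goes into the enter callback.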
19
Semantic Analysis
• Scope-related checks: check that no variable is used before its definition and that no name is defined twice in the same scope
• Type-related checks: check that variables and the values assigned to them are type compatible
[Figure: Source code → Lexical Analysis → Syntax Analysis → Semantic Analysis → Abstract Syntax Tree + ...; errors are reported as a side output]
20
Scope
• Identifiers
– Variables, constants, function names, labels ...
• ‘Lexical’ scope
– A textual range of a program
• Statement block, function definition, source file, the whole program...
• Scope of an identifier
– The lexical scope within which the identifier can be referenced
eg) scope of a variable
In a block (local variables), in a function definition (formal parameters), in a source file (global variables), in the whole program (extern variables)
cf. How about fields? methods?
21
Variable Scope : PL Review
{ int a;            /* scope of variable a */
  ...
  { int b;          /* scope of variable b */
    ...
  }
  ....
}
void foo() {        /* scope of label lab (the function body in ANSI C) */
  ... goto lab; ...
  lab: i++;
  ... goto lab; ...
}
int foo(int n) {    /* scope of argument n */
  ...
}
22
Symbol Tables
• Symbol tables
– Data structure for management of symbols and related information
– Scope and types of identifiers in the data structure are referenced in Semantic analysis and code generation
– Insertion : in variable declaration phase
– Lookup: when used in expressions and in other language structures
• Table entries : identifier name + info
– eg.
NAME  KIND  TYPE            ATTRIBUTES
foo   func  int,int → int   extern
m     arg   int
n     arg   int             const
tmp   var   char            const
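A table entry of this shape and the two basic operations can be sketched in C as follows. This is a minimal illustration under simplifying assumptions: types and attributes are kept as plain strings, lookup is a linear search, and Entry, Table, table_lookup and table_insert are hypothetical names.

```c
#include <assert.h>
#include <string.h>

typedef enum { KIND_FUNC, KIND_ARG, KIND_VAR } Kind;

/* One table entry: identifier name plus its information. */
typedef struct {
    const char *name;
    Kind        kind;
    const char *type;        /* kept as text here, e.g. "int" */
    const char *attributes;  /* e.g. "extern", "const", or "" */
} Entry;

#define TABLE_MAX 32
typedef struct {
    Entry entries[TABLE_MAX];
    int   count;
} Table;

/* Lookup: linear search by name; returns NULL when the name is absent. */
const Entry *table_lookup(const Table *t, const char *name) {
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].name, name) == 0)
            return &t->entries[i];
    return NULL;
}

/* Insertion: returns 0 on success, -1 if the name is already declared. */
int table_insert(Table *t, Entry e) {
    assert(t->count < TABLE_MAX);
    if (table_lookup(t, e.name) != NULL)
        return -1;
    t->entries[t->count++] = e;
    return 0;
}
```

Rejecting a second insertion of the same name is where a "defined twice" diagnostic would be issued during the declaration phase.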
23
Scope Information in Symbol Tables
• Characteristics of block-structured languages
– Each block (lexical scope) has local variable declarations
⇒ Each lexical scope has a single symbol table
– Hierarchy of scopes:
• Each block (lexical scope) can have other subblocks
• Any variable declared in an enclosing block can be used
⇒ Hierarchy of symbol tables
int x;
void f(int m) {
float x, y;
...
{int i, j; ....; }
{int x; l: ...; }
}
int g(int n) {
char t;
... ;
}
24
Examples
int x;
void f(int m) {
float x, y;
...
{int i, j; ....; }
{int x; l: ...; }
}
int g(int n) {
char t;
... ;
}
Global symtab
  x  var   int
  f  func  int → void
  g  func  int → int
func f symtab
  m  arg  int
  x  var  float
  y  var  float
  first inner block of f
    i  var  int
    j  var  int
  second inner block of f
    x  var  int
    l  label
func g symtab
  n  arg  int
  t  var  char
25
Error Checking
int x;
void f(int m) {
float x, y;
...
{int i, j; x=1; }
{int x; l: i=2; }
}
int g(int n) {
char t;
x=3;
}
(The symbol tables are the same as in the previous example.)
i=2 ⇒ Error! “undefined variable”: i is declared only in the first inner block of f, so it is not visible in the second one
• Starting from the current scope, search upward along the hierarchy
• If no matching declaration has been found by the time the root is reached ⇒ Error!
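The upward search can be sketched in C with one parent pointer per scope. This is a minimal illustration that stores only names; Scope, declare and resolve are hypothetical names, and a real table would also carry the kind and type of each identifier.

```c
#include <assert.h>
#include <string.h>

#define SCOPE_MAX 8

typedef struct Scope {
    struct Scope *parent;            /* enclosing scope, NULL for the global one */
    const char   *names[SCOPE_MAX];
    int           count;
} Scope;

void declare(Scope *s, const char *name) {
    assert(s->count < SCOPE_MAX);
    s->names[s->count++] = name;
}

/* Returns the innermost scope that declares name,
 * or NULL when the root is passed without a match ("undefined variable"). */
const Scope *resolve(const Scope *current, const char *name) {
    for (const Scope *s = current; s != NULL; s = s->parent)
        for (int i = 0; i < s->count; i++)
            if (strcmp(s->names[i], name) == 0)
                return s;
    return NULL;
}
```

With the scopes of the example program set up this way, x=1 in the first inner block resolves to the float x of f, x=3 in g resolves to the global x, and i=2 in the second inner block resolves to nothing.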
26
Symbol Table Implementation
• Essential operations
– Table construction: after building the AST
– Insertion: in the variable declaration phase
– Lookup: when used in expressions and in other language structures (checking)
cf. forward reference ?
• For efficiency
– Identifier names in table entries
• Keep the names in a string pool and store only pointers to the pool in the table
– Local tables: hash
– Globally: N-ary tree structure
• But a tree is too expensive!
• More efficient management is possible by exploiting locality of usage:
note that once a scope has been exited, its local table is no longer needed
Global Table Hierarchy-Using A Stack
[Figure: contents of the scope stack over time, listed bottom to top]
file
file, f()
file, f(), {int i,j;}
file, f(), {int x..}
file, g()
file
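Because a local table is only needed while its scope is open, the hierarchy can be kept as a stack of tables. The C sketch below assumes fixed-size arrays and stores names only; ScopeStack, enter_scope, exit_scope, declare and lookup_level are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define STACK_MAX 16
#define TABLE_MAX 8

typedef struct {
    const char *names[TABLE_MAX];
    int count;
} LocalTable;

typedef struct {
    LocalTable tables[STACK_MAX];
    int depth;                          /* number of currently open scopes */
} ScopeStack;

void enter_scope(ScopeStack *s) {
    assert(s->depth < STACK_MAX);
    s->tables[s->depth++].count = 0;    /* fresh, empty table on top */
}

void exit_scope(ScopeStack *s) {
    assert(s->depth > 0);
    s->depth--;                         /* the local table is discarded */
}

void declare(ScopeStack *s, const char *name) {
    assert(s->depth > 0);
    LocalTable *t = &s->tables[s->depth - 1];
    assert(t->count < TABLE_MAX);
    t->names[t->count++] = name;
}

/* Searches from the top of the stack downward.
 * Returns the stack level (0 = file scope) declaring name, or -1. */
int lookup_level(const ScopeStack *s, const char *name) {
    for (int level = s->depth - 1; level >= 0; level--)
        for (int i = 0; i < s->tables[level].count; i++)
            if (strcmp(s->tables[level].names[i], name) == 0)
                return level;
    return -1;
}
```

The trade-off of this design is that information about a closed scope is gone once it is popped, so any later pass that needs it has to rebuild the table or keep its own copy.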
28
Semantic Analysis
• Semantic Analysis = Syntax Analysis + α
• In this chapter
– Abstract Syntax Tree (AST)
– Syntax-Directed Definition/Translation (SDT)
[Figure: Source Code → Lexical Analysis → Syntax Analysis → Semantic Analysis → Abstract Syntax Tree + ...; errors are reported as a side output]
30
Parse Tree
• Parse tree
– Describing derivation process
– Terminals are leaf nodes
– Non-terminals are intermediate nodes
– Cannot express the derivation order (eg. the left-most and right-most derivations give the same tree)
[Figure: parse tree for the expression (1 + 2 + (3 + 4)) + 5, with S and E as the non-terminals and the numbers, +, ( and ) as leaves]
31
Abstract Syntax Tree (AST)
• AST
– Parse tree without superfluous information
[Figure: the parse tree of (1 + 2 + (3 + 4)) + 5 next to its AST]
AST in prefix form: +( +(1, +(2, +(3, 4))), 5 )
32
AST Data Structures – Java Example
abstract class Expr {}
class Add extends Expr {
Expr left, right;
Add(Expr L, Expr R) {
left=L; right=R;
}
}
class Num extends Expr {
int value;
Num(int v) {value = v;}
}
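[Figure: the resulting object tree. The root + (Add) has a + subtree and N 5 (Num) as children; that subtree has N 1 and a further + subtree, and so on.]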
cf. Visitor Pattern
33
AST Data Structures – C Example 1
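#define MAX 2                          // maximum number of children per node
struct tokenType {
    int tokenNumber;
    char *tokenValue;
};
typedef struct nodeType {
    struct tokenType token;            // only meaningful tokens
    struct nodeType *children[MAX];    // pointers to children; here, MAX == 2
} Node;                                // typedef name added so the declaration is complete
// Space waste problem
// Might be serious for a “Statement List” (rather than nodes for ADD)
// What about using a linked list for a node? Performance overhead!
[Figure: tree of (tokenNumber, tokenValue) nodes, with (+,0) as interior nodes and (N,1) ... (N,5) as leaves]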
34
AST Data Structures – C Example 2
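struct tokenType {
    int tokenNumber;
    char *tokenValue;
};
typedef struct nodeType {
    struct tokenType token;            // only meaningful tokens
    struct nodeType *son;              // first child
    struct nodeType *brother;          // next sibling
} Node;                                // typedef name added so the declaration is complete
// n-ary tree is represented as a binary tree
[Figure: the same tree of (+,0) interior nodes and (N,1) ... (N,5) leaves, linked through son and brother pointers]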
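How the son/brother links represent an n-ary tree becomes clearer once nodes are built and traversed. The C sketch below repeats the node layout so that it stands alone; the token numbers TOK_PLUS and TOK_NUM and the helpers new_node, add_child, eval and free_tree are hypothetical, and storing numbers as text in tokenValue is an assumption made for the example.

```c
#include <assert.h>
#include <stdlib.h>

enum { TOK_PLUS, TOK_NUM };     /* hypothetical token numbers */

struct tokenType {
    int tokenNumber;
    const char *tokenValue;
};

typedef struct nodeType {
    struct tokenType token;
    struct nodeType *son;       /* first child */
    struct nodeType *brother;   /* next sibling */
} Node;

/* Allocates a node without children; the caller owns the memory. */
Node *new_node(int number, const char *value) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->token.tokenNumber = number;
    n->token.tokenValue = value;
    n->son = NULL;
    n->brother = NULL;
    return n;
}

/* Appends child at the end of parent's child list. */
void add_child(Node *parent, Node *child) {
    if (parent->son == NULL) {
        parent->son = child;
        return;
    }
    Node *last = parent->son;
    while (last->brother != NULL)
        last = last->brother;
    last->brother = child;
}

/* Evaluates a tree of additions: numbers are leaves, '+' nodes sum their children. */
int eval(const Node *n) {
    if (n->token.tokenNumber == TOK_NUM)
        return atoi(n->token.tokenValue);
    int sum = 0;
    for (const Node *c = n->son; c != NULL; c = c->brother)
        sum += eval(c);
    return sum;
}

/* Frees a node together with its children and its following siblings. */
void free_tree(Node *n) {
    if (n == NULL)
        return;
    free_tree(n->son);
    free_tree(n->brother);
    free(n);
}
```

A node can have any number of children without reserving a fixed-size array, at the cost of walking the brother chain to reach the k-th child.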