Recap
Mooly Sagiv
Outline
• Subjects Studied
• Questions & Answers
Lexical Analysis (Scanning)
• input
– program text (file)
• output
– sequence of tokens
• Read input file
• Identify language keywords and standard identifiers
• Handle include files and macros
• Count line numbers
• Remove whitespace
• Report illegal symbols
• [Produce symbol table]
The Lexical Analysis Problem
• Given
– A set of token descriptions
– An input string
• Partition the string into tokens (class, value)
• Ambiguity resolution
– The longest matching token wins
– Between two matching tokens of equal length, select the one described first
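The two ambiguity rules can be sketched as follows. This is a minimal illustration, not how Jlex is implemented: the token classes, the hand-written matcher functions and the name next_token are all hypothetical, and a generated scanner would use a finite automaton rather than trying each rule in turn.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token classes, listed in rule priority order. */
typedef enum { TOK_IF, TOK_IDENT, TOK_NUMBER, TOK_ERROR } TokenClass;

typedef struct {
    TokenClass cls;
    size_t length;   /* number of input characters covered by the token */
} Token;

/* Each matcher returns the length of the longest prefix it accepts (0 = no match). */
static size_t match_if(const char *s) {
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_ident(const char *s) {
    size_t n = 0;
    if (!isalpha((unsigned char)s[0])) return 0;
    while (isalnum((unsigned char)s[n])) n++;
    return n;
}

static size_t match_number(const char *s) {
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

typedef size_t (*Matcher)(const char *);
static const Matcher matchers[] = { match_if, match_ident, match_number };

/* Returns the next token at the start of s using longest match, first rule on ties. */
Token next_token(const char *s) {
    Token best = { TOK_ERROR, 0 };
    for (size_t i = 0; i < sizeof matchers / sizeof matchers[0]; i++) {
        size_t len = matchers[i](s);
        if (len > best.length) {   /* strictly longer, so an earlier rule wins a tie */
            best.cls = (TokenClass)i;
            best.length = len;
        }
    }
    return best;
}
```

With these rules, "if" is classified as the keyword (a tie of length 2, won by the first rule), while "iffy" is an identifier because the identifier rule matches a longer prefix.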
Jlex
• Input
– regular expressions and actions (Java code)
• Output
– A scanner program that reads the input and applies actions when an input regular expression is matched
[Figure: regular expressions → Jlex → scanner; input program → scanner → tokens]
Summary
• For most programming languages lexical analyzers can be easily constructed automatically
• Exceptions:
– Fortran
– PL/1
• Lex/Flex/Jlex are useful beyond compilers
Syntax Analysis (Parsing)
• input
– Sequence of tokens
• output
– Abstract Syntax Tree
• Report syntax errors
– unbalanced parentheses
• [Create “symbol-table”]
• [Create pretty-printed version of the program]
• In some cases the tree need not be generated (one-pass compilers)
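Pushdown Automaton
[Figure: pushdown automaton. A control unit consults the parser table, reads the input (labels: u t w, $) and manipulates a stack (labels: V, $)]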
Efficient Parsers
• Pushdown automata
• Deterministic
• Report an error as soon as the input is not a prefix of a valid program
• Not usable for all context free grammars
[Figure: context free grammar → CUP → parser, with “ambiguity errors” reported by CUP; tokens → parser → parse tree]
Kinds of Parsers
• Top-Down (Predictive Parsing) LL
– Construct parse tree in a top-down manner
– Find the leftmost derivation
– For every non-terminal and token predict the next production
– Preorder tree traversal
• Bottom-Up LR
– Construct parse tree in a bottom-up manner
– Find the rightmost derivation in reverse order
– For every potential right hand side and token decide when a production is found
– Postorder tree traversal
Top-Down Parsing
[Figure: parse tree built from the root down over the input (t1 t2); tree nodes are labeled 1 to 5]
Bottom-Up Parsing
[Figure: parse tree built from the leaves up over the input (t1 t2 t4 t5 t6 t7 t8); tree nodes are labeled 1 to 3]
Example Grammar for Predictive LL Top-Down Parsing
expression → digit | '(' expression operator expression ')'
operator → '+' | '*'
digit → '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
static int Parse_Expression(Expression **expr_p) {
    Expression *expr = *expr_p = new_expression();
    /* try to parse a digit */
    if (Token.class == DIGIT) {
        expr->type = 'D'; expr->value = Token.repr - '0'; get_next_token();
        return 1;
    }
    /* try to parse a parenthesized expression */
    if (Token.class == '(') {
        expr->type = 'P'; get_next_token();
        if (!Parse_Expression(&expr->left)) Error("missing expression");
        if (!Parse_Operator(&expr->oper)) Error("missing operator");
        /* the grammar requires a second operand; assumes Expression has a right field */
        if (!Parse_Expression(&expr->right)) Error("missing expression");
        if (Token.class != ')') Error("missing )");
        get_next_token();
        return 1;
    }
    /* neither alternative applies */
    return 0;
}
Parsing Expressions
• Try every alternative production
– For P → A1 A2 … An | B1 B2 … Bm
– If A1 succeeds
• Call A2
• If A2 succeeds
– Call A3
• If A2 fails report an error
– Otherwise try B1
• Recursive descent parsing
• Can be applied for certain grammars
• Generalization: LL(1) parsing
int P(...) {
    /* try to parse the alternative P → A1 A2 ... An */
    if (A1(...)) {
        if (!A2()) Error("Missing A2");
        if (!A3()) Error("Missing A3");
        ..
        if (!An()) Error("Missing An");
        return 1;
    }
    /* try to parse the alternative P → B1 B2 ... Bm */
    if (B1(...)) {
        if (!B2()) Error("Missing B2");
        if (!B3()) Error("Missing B3");
        ..
        if (!Bm()) Error("Missing Bm");
        return 1;
    }
    /* no alternative applies */
    return 0;
}
Predictive Parser for Arithmetic Expressions
• Grammar
1 E → E + T
2 E → T
3 T → T * F
4 T → F
5 F → id
6 F → (E)
• C-code?
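One possible answer to the C-code question is sketched below. The grammar as given is left-recursive, which a predictive parser cannot handle directly, so this sketch assumes the usual rewrite E → T { + T } and T → F { * F }, which accepts the same language. The single-character tokens ('i' standing for id) and the function names are assumptions made for illustration, and the code only recognizes input; it builds no tree.

```c
#include <stdbool.h>

/* Hypothetical token stream: one character per token, 'i' stands for id,
   and the terminating '\0' marks the end of the input. */
static const char *input;

static bool parse_E(void);

/* F -> id | ( E ) */
static bool parse_F(void) {
    if (*input == 'i') { input++; return true; }
    if (*input == '(') {
        input++;
        if (!parse_E()) return false;
        if (*input != ')') return false;
        input++;
        return true;
    }
    return false;
}

/* T -> F { * F }   (replaces the left-recursive T -> T * F | F) */
static bool parse_T(void) {
    if (!parse_F()) return false;
    while (*input == '*') {
        input++;
        if (!parse_F()) return false;
    }
    return true;
}

/* E -> T { + T }   (replaces the left-recursive E -> E + T | T) */
static bool parse_E(void) {
    if (!parse_T()) return false;
    while (*input == '+') {
        input++;
        if (!parse_T()) return false;
    }
    return true;
}

/* Returns true when the whole text is derivable from E. */
bool accepts(const char *text) {
    input = text;
    return parse_E() && *input == '\0';
}
```

Each function decides what to do by looking only at the current token, which is what makes the parser predictive. For example, accepts("(i+i)*i") holds, while accepts("i+") does not.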
Bottom-Up Syntax Analysis
• Input
– A context free grammar
– A stream of tokens
• Output
– A syntax tree or error
• Method
– Construct parse tree in a bottom-up manner
– Find the rightmost derivation in reverse order
– For every potential right hand side and token decide when a production is found
– Report an error as soon as the input is not a prefix of a valid program
Constructing LR(0) parsing table
• Add a production S’ → S$
• Construct a finite automaton accepting “valid stack symbols”
• States are sets of items A → α • β
– The states of the automaton become the states of the parsing table
– Determine shift operations
– Determine goto operations
– Determine reduce operations
– Report an error when conflicts arise
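[Figure: LR(0) automaton for the grammar S → E $, E → T | E + T, T → i | ( E ); a • marks the position reached within each item]
• {1: S → • E $, 4: E → • T, 6: E → • E + T, 10: T → • i, 12: T → • ( E )} (initial state)
– on T go to {5}; on i go to {11}; on E go to {2, 7}; on ( go to {13, 4, 6, 10, 12}
• {5: E → T •}
• {11: T → i •}
• {2: S → E • $, 7: E → E • + T}
– on $ go to {3}; on + go to {8, 10, 12}
• {3: S → E $ •}
• {13: T → ( • E ), 4: E → • T, 6: E → • E + T, 10: T → • i, 12: T → • ( E )}
– on T go to {5}; on i go to {11}; on ( stay in this state; on E go to {14, 7}
• {14: T → ( E • ), 7: E → E • + T}
– on ) go to {15}; on + go to {8, 10, 12}
• {15: T → ( E ) •}
• {8: E → E + • T, 10: T → • i, 12: T → • ( E )}
– on T go to {9}; on i go to {11}; on ( go to {13, 4, 6, 10, 12}
• {9: E → E + T •}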
Parsing “(i)$”
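The parse of (i)$ can be followed with a small table-driven sketch of the LR(0) automaton for the grammar S → E $, E → T | E + T, T → i | ( E ). The state numbers 0 to 9, the function names and the encoding of grammar symbols as characters are assumptions made for this illustration; a generated parser would store the same information in shift, goto and reduce tables.

```c
#include <stdbool.h>

/* Transition function of the automaton (shift and goto edges);
   returns -1 when no edge exists. '.' in comments marks the item position.
   0: start            1: S -> E.$, E -> E.+T     2: S -> E$.
   3: E -> T.          4: T -> i.                 5: T -> (.E) plus closure
   6: T -> (E.), E -> E.+T   7: T -> (E).   8: E -> E+.T plus closure
   9: E -> E+T.                                                         */
static int step(int state, char sym) {
    switch (state) {
    case 0: case 5:
        if (sym == 'E') return state == 0 ? 1 : 6;
        if (sym == 'T') return 3;
        if (sym == 'i') return 4;
        if (sym == '(') return 5;
        return -1;
    case 1:
        if (sym == '$') return 2;
        if (sym == '+') return 8;
        return -1;
    case 6:
        if (sym == ')') return 7;
        if (sym == '+') return 8;
        return -1;
    case 8:
        if (sym == 'T') return 9;
        if (sym == 'i') return 4;
        if (sym == '(') return 5;
        return -1;
    default:
        return -1;
    }
}

/* For a reduce state, reports the completed production's left-hand side
   and right-hand side length. */
static bool reduction(int state, char *lhs, int *rhs_len) {
    switch (state) {
    case 3: *lhs = 'E'; *rhs_len = 1; return true;   /* E -> T     */
    case 4: *lhs = 'T'; *rhs_len = 1; return true;   /* T -> i     */
    case 7: *lhs = 'T'; *rhs_len = 3; return true;   /* T -> ( E ) */
    case 9: *lhs = 'E'; *rhs_len = 3; return true;   /* E -> E + T */
    default: return false;
    }
}

/* Returns true when the input (which must end in '$') is accepted. */
bool lr0_accepts(const char *input) {
    int stack[256];
    int top = 0;
    stack[0] = 0;
    for (;;) {
        char lhs = 0;
        int len = 0;
        int state = stack[top];
        if (state == 2) return true;                 /* S -> E $ . : accept */
        if (reduction(state, &lhs, &len)) {
            top -= len;                              /* pop the right-hand side */
            int target = step(stack[top], lhs);      /* goto on the left-hand side */
            if (target < 0) return false;
            stack[++top] = target;
        } else {
            /* shift; nonterminal letters are not valid input tokens */
            if (*input == '\0' || *input == 'E' || *input == 'T') return false;
            int target = step(state, *input);
            if (target < 0 || top + 1 >= 256) return false;
            stack[++top] = target;
            input++;
        }
    }
}
```

For (i)$ the stack of states evolves as 0, 0 5, 0 5 4, 0 5 3 (reduce T → i), 0 5 6 (reduce E → T), 0 5 6 7, 0 3 (reduce T → ( E )), 0 1 (reduce E → T), 0 1 2, at which point the input is accepted. Because the parser is LR(0), every reduce decision is taken without looking at the next token.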
Summary (Bottom-Up)
• LR is a powerful technique
• Generates efficient parsers
• Generation tools exist (LALR(1))
– Bison, yacc, CUP
• But some grammars need to be tuned
– Shift/Reduce conflicts
– Reduce/Reduce conflicts
– Efficiency of the generated parser
Summary (Parsing)
• Context free grammars provide a natural way to define the syntax of programming languages
• Ambiguity may be resolved
• Predictive parsing is natural
– Good error messages
– Natural error recovery
– But not expressive enough
• LR bottom-up parsing is more expressive
Abstract Syntax
• Intermediate program representation
• Defines a tree - Preserves program hierarchy
• Generated by the parser
• Declared using an (ambiguous) context free grammar (relatively flat)
– Not meant for parsing
• Keywords and punctuation symbols are not stored (Not relevant once the tree exists)
• Big programs can be also handled (possibly via virtual memory)
Semantic Analysis
• Requirements related to the “context” in which a construct occurs
• Examples
– Name resolution
– Scoping
– Type checking
– Escape
• Implemented via AST traversals
• Guides subsequent compiler phases
Abstract Interpretation
Static analysis
• Automatically identify program properties
– No user provided loop invariants
• Sound but incomplete methods
– But can be rather precise
• Non-standard interpretation of the program operational semantics
• Applications
– Compiler optimization
– Code quality tools
• Identify potential bugs
• Prove the absence of runtime errors
• Partial correctness
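Constant Propagation
z = 3
while (x > 0)
    if (x == 1)
        y = 7
    else
        y = z + 4
    assert y == 7
[Figure: control flow graph of the program annotated with abstract states, listed here in slide order; ? marks a value not known to be constant]
[x ↦ ?, y ↦ ?, z ↦ ?]
[x ↦ ?, y ↦ ?, z ↦ 3]
[x ↦ ?, y ↦ ?, z ↦ 3]
[x ↦ ?, y ↦ ?, z ↦ 3]   [x ↦ 1, y ↦ ?, z ↦ 3]
[x ↦ 1, y ↦ 7, z ↦ 3]   [x ↦ ?, y ↦ 7, z ↦ 3]
[x ↦ ?, y ↦ ?, z ↦ 3]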
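The way the two branch states combine before the assert can be sketched with a simplified abstract domain. The sketch assumes each variable is either a known constant or unknown (the ? above) and omits the bottom element that a full constant propagation lattice would have for unreachable code. All type and function names are hypothetical.

```c
#include <stdbool.h>

/* Abstract value of one variable: a known constant or unknown ("?"). */
typedef struct {
    bool is_const;
    int value;        /* meaningful only when is_const is true */
} AbsVal;

enum { X, Y, Z, NVARS };                 /* the three variables of the example */
typedef struct { AbsVal var[NVARS]; } AbsState;

static AbsVal unknown(void)   { AbsVal v = { false, 0 }; return v; }
static AbsVal constant(int c) { AbsVal v = { true, c };  return v; }

/* Join at a control flow merge: keep a constant only if both paths agree on it. */
AbsVal join_val(AbsVal a, AbsVal b) {
    if (a.is_const && b.is_const && a.value == b.value) return a;
    return unknown();
}

AbsState join_state(AbsState a, AbsState b) {
    AbsState r;
    for (int i = 0; i < NVARS; i++) r.var[i] = join_val(a.var[i], b.var[i]);
    return r;
}

/* Transfer function for "dst = c". */
AbsState assign_const(AbsState s, int dst, int c) {
    s.var[dst] = constant(c);
    return s;
}

/* Transfer function for "dst = src + k": constant only if src is constant. */
AbsState assign_add(AbsState s, int dst, int src, int k) {
    if (s.var[src].is_const) s.var[dst] = constant(s.var[src].value + k);
    else s.var[dst] = unknown();
    return s;
}
```

Joining [x ↦ 1, y ↦ 7, z ↦ 3] (after y = 7) with [x ↦ ?, y ↦ 7, z ↦ 3] (after y = z + 4) gives [x ↦ ?, y ↦ 7, z ↦ 3], which is enough to establish the assertion y == 7 without running the program.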
/* c */
L0: a := 0
/* ac */
L1: b := a + 1
/* bc */
c := c + b
/* bc */
a := b * 2
/* ac */
if c < N goto L1
/* c */
return c
[Animation: the live-variable sets of the program above are computed iteratively, starting from return c and working backward. The first backward pass yields {c} after the conditional jump and after a := b * 2, {c, b} after c := c + b and after b := a + 1, and {c, a} after a := 0. Propagating along the back edge to L1 then adds a, and the sets converge to {c, a} after a := 0, {c, b} after b := a + 1, {c, b} after c := c + b, {c, a} after a := b * 2, and {c, a} after the conditional jump.]
Summary Iterative Procedure
• Analyze one procedure at a time
– More precise solutions exist
• Construct a control flow graph for the procedure
• Initialize the values at every node to the most optimistic value
• Iterate until convergence
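The procedure can be sketched for live-variable analysis on the six-instruction example (a := 0 through return c). The bit-set encoding, the Instr structure and the function name are assumptions for illustration; N is treated as a constant rather than a variable. For liveness the optimistic starting value is the empty set, since it assumes nothing is live until a use proves otherwise.

```c
#include <stdbool.h>

enum { A = 1, B = 2, C = 4 };          /* one bit per variable */
#define NINSTR 6

typedef struct {
    unsigned use, def;   /* variables read / written by the instruction */
    int succ[2];         /* successor instruction indices, -1 when absent */
} Instr;

/* Control flow graph of the example, one node per instruction. */
static const Instr prog[NINSTR] = {
    { 0,     A, {  1, -1 } },   /* L0: a := 0           */
    { A,     B, {  2, -1 } },   /* L1: b := a + 1       */
    { B | C, C, {  3, -1 } },   /*     c := c + b       */
    { B,     A, {  4, -1 } },   /*     a := b * 2       */
    { C,     0, {  1,  5 } },   /*     if c < N goto L1 */
    { C,     0, { -1, -1 } },   /*     return c         */
};

/* Fills in[] and out[] with the variables live before and after each instruction. */
void liveness(unsigned in[NINSTR], unsigned out[NINSTR]) {
    for (int i = 0; i < NINSTR; i++) in[i] = out[i] = 0;   /* optimistic start */
    bool changed = true;
    while (changed) {                                      /* iterate until convergence */
        changed = false;
        for (int i = NINSTR - 1; i >= 0; i--) {            /* backward order converges quickly */
            unsigned new_out = 0;
            for (int k = 0; k < 2; k++)
                if (prog[i].succ[k] >= 0) new_out |= in[prog[i].succ[k]];
            unsigned new_in = prog[i].use | (new_out & ~prog[i].def);
            if (new_in != in[i] || new_out != out[i]) {
                in[i] = new_in;
                out[i] = new_out;
                changed = true;
            }
        }
    }
}
```

The converged in[] sets are {c}, {a, c}, {b, c}, {b, c}, {a, c} and {c}, matching the comments in the annotated listing. Since a and b are never live at the same time, a register allocator could place them in the same register.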
Basic Compiler Phases
Overall Structure
Techniques Studied
• Simple code generation
• Basic blocks
• Global register allocation
• Activation records
• Object Oriented
• Assembler/Linker/Loader
Heap Memory Management
• Part of the runtime system
• Utilities for dynamic memory allocation
• Utilities for automatic memory reclamation
– Garbage Collection
Garbage Collection
• Techniques
– Mark and sweep
– Copying collection
– Reference counting
• Modes
– Generational
– Incremental vs. Stop the world
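Of the techniques listed, mark and sweep is the simplest to sketch. The toy heap below is an assumption made for illustration: a fixed table of objects with two pointer fields each, explicit roots passed by the caller, and a recursive mark phase. A production collector would find roots in registers and activation records and would avoid unbounded recursion.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_OBJECTS 16
#define MAX_FIELDS 2

/* A toy heap object with up to two pointer fields. */
typedef struct Object {
    bool allocated;
    bool marked;
    struct Object *field[MAX_FIELDS];
} Object;

static Object heap[MAX_OBJECTS];

/* Returns a fresh object, or NULL when the toy heap is full. */
Object *gc_alloc(void) {
    for (size_t i = 0; i < MAX_OBJECTS; i++) {
        if (!heap[i].allocated) {
            heap[i].allocated = true;
            heap[i].marked = false;
            heap[i].field[0] = heap[i].field[1] = NULL;
            return &heap[i];
        }
    }
    return NULL;
}

/* Mark phase: depth-first traversal of everything reachable from obj. */
static void mark(Object *obj) {
    if (obj == NULL || obj->marked) return;
    obj->marked = true;
    for (size_t i = 0; i < MAX_FIELDS; i++) mark(obj->field[i]);
}

/* Sweep phase: reclaim unmarked objects and clear marks for the next cycle. */
static size_t sweep(void) {
    size_t freed = 0;
    for (size_t i = 0; i < MAX_OBJECTS; i++) {
        if (heap[i].allocated && !heap[i].marked) {
            heap[i].allocated = false;
            freed++;
        }
        heap[i].marked = false;
    }
    return freed;
}

/* Stop-the-world collection; returns the number of objects reclaimed. */
size_t gc_collect(Object **roots, size_t nroots) {
    for (size_t i = 0; i < nroots; i++) mark(roots[i]);
    return sweep();
}
```

Unlike reference counting, this scheme reclaims cyclic garbage: an object that points only to itself is freed as soon as no root reaches it.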