Recap Mooly Sagiv. Outline Subjects Studied Questions & Answers

Recap

Mooly Sagiv

Outline

• Subjects Studied

• Questions & Answers

Lexical Analysis (Scanning)

• input

– program text (file)

• output

– sequence of tokens

• Read input file

• Identify language keywords and standard identifiers

• Handle include files and macros

• Count line numbers

• Remove whitespace

• Report illegal symbols

• [Produce symbol table]

The Lexical Analysis Problem

• Given

– A set of token descriptions

– An input string

• Partition the string into tokens (class, value)

• Ambiguity resolution

– The longest matching token

– Between two equal length tokens select the first
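The two ambiguity rules can be sketched as a small hand-written scanner in C. This is only an illustration under stated assumptions: the token classes (`TOK_IF`, `TOK_ID`, `TOK_NUM`), the `Token` struct, and `next_token` are hypothetical names, and a generated scanner would typically drive a finite automaton rather than call one matcher per class.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token classes, listed in priority order (first wins on ties). */
typedef enum { TOK_IF, TOK_ID, TOK_NUM, TOK_ERROR, TOK_EOF } TokenClass;

typedef struct {
    TokenClass cls;
    const char *start;  /* first character of the lexeme */
    size_t len;         /* lexeme length */
} Token;

/* Each matcher returns the length of the longest prefix of s it accepts. */
static size_t match_if(const char *s) { return strncmp(s, "if", 2) == 0 ? 2 : 0; }

static size_t match_id(const char *s) {
    size_t n = 0;
    if (isalpha((unsigned char)s[0]))
        while (isalnum((unsigned char)s[n])) n++;
    return n;
}

static size_t match_num(const char *s) {
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* Scan one token starting at *pp and advance *pp past it. */
Token next_token(const char **pp) {
    static size_t (*const matchers[])(const char *) = { match_if, match_id, match_num };
    const char *s = *pp;
    while (isspace((unsigned char)*s)) s++;      /* remove whitespace */
    Token t = { TOK_EOF, s, 0 };
    if (*s == '\0') { *pp = s; return t; }

    t.cls = TOK_ERROR;
    for (int i = 0; i < 3; i++) {
        size_t n = matchers[i](s);
        if (n > t.len) {     /* strictly longer only, so the earlier class wins ties */
            t.len = n;
            t.cls = (TokenClass)i;
        }
    }
    if (t.cls == TOK_ERROR) t.len = 1;           /* illegal symbol: skip one character */
    *pp = s + t.len;
    return t;
}
```

With this sketch the input `if` is classified as `TOK_IF` (equal length, first class wins), while `iffy` becomes a single `TOK_ID` because the identifier match is longer.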

Jlex

• Input

– regular expressions and actions (Java code)

• Output

– A scanner program that reads the input and applies actions when an input regular expression is matched

[Figure: Jlex turns regular expressions into a scanner; the scanner turns the input program into tokens]

Summary

• For most programming languages lexical analyzers can be easily constructed automatically

• Exceptions:

– Fortran

– PL/1

• Lex/Flex/Jlex are useful beyond compilers

Syntax Analysis (Parsing)

• input

– Sequence of tokens

• output

– Abstract Syntax Tree

• Report syntax errors

– unbalanced parentheses

• [Create “symbol-table”]

• [Create pretty-printed version of the program]

• In some cases the tree need not be generated (one-pass compilers)

Pushdown Automaton

[Figure: pushdown automaton with control, parser-table, input, and stack]

Efficient Parsers

• Pushdown automata

• Deterministic

• Report an error as soon as the input is not a prefix of a valid program

• Not usable for all context free grammars

[Figure: CUP turns a context free grammar into a parser, reporting “ambiguity errors”; the parser turns tokens into a parse tree]

Kinds of Parsers

• Top-Down (Predictive Parsing) LL

– Construct parse tree in a top-down manner

– Find the leftmost derivation

– For every non-terminal and token predict the next production

– Preorder tree traversal

• Bottom-Up LR

– Construct parse tree in a bottom-up manner

– Find the rightmost derivation in reverse order

– For every potential right hand side and token decide when a production is found

– Postorder tree traversal

Top-Down Parsing

[Figure: parse tree over the input t1 t2 …, with nodes numbered 1 to 5]

Bottom-Up Parsing

[Figure: parse tree over the input t1 t2 t4 t5 t6 t7 t8, with nodes numbered 1 to 3]

Example Grammar for Predictive LL Top-Down Parsing

expression → digit | '(' expression operator expression ')'

operator → '+' | '*'

digit → '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'


static int Parse_Expression(Expression **expr_p) {
    Expression *expr = *expr_p = new_expression();

    /* try to parse a digit */
    if (Token.class == DIGIT) {
        expr->type = 'D';
        expr->value = Token.repr - '0';
        get_next_token();
        return 1;
    }

    /* try to parse a parenthesized expression */
    if (Token.class == '(') {
        expr->type = 'P';
        get_next_token();
        if (!Parse_Expression(&expr->left)) Error("missing expression");
        if (!Parse_Operator(&expr->oper)) Error("missing operator");
        /* second operand, required by the grammar; the field name right is assumed */
        if (!Parse_Expression(&expr->right)) Error("missing expression");
        if (Token.class != ')') Error("missing )");
        get_next_token();
        return 1;
    }

    /* neither alternative applies */
    return 0;
}

Parsing Expressions

• Try every alternative production

– For P → A1 A2 … An | B1 B2 … Bm

– If A1 succeeds

• Call A2

• If A2 succeeds

– Call A3

• If A2 fails report an error

– Otherwise try B1

• Recursive descent parsing

• Can be applied for certain grammars

• Generalization: LL(1) parsing

int P(...) {
    /* try to parse the alternative P → A1 A2 ... An */
    if (A1(...)) {
        if (!A2()) Error("Missing A2");
        if (!A3()) Error("Missing A3");
        ..
        if (!An()) Error("Missing An");
        return 1;
    }
    /* try to parse the alternative P → B1 B2 ... Bm */
    if (B1(...)) {
        if (!B2()) Error("Missing B2");
        if (!B3()) Error("Missing B3");
        ..
        if (!Bm()) Error("Missing Bm");
        return 1;
    }
    return 0;
}

Predictive Parser for Arithmetic Expressions

• Grammar

1 E → E + T

2 E → T

3 T → T * F

4 T → F

5 F → id

6 F → (E)

• C-code?
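One possible answer to the C-code question, as a sketch rather than the course's own solution. The grammar as written is left-recursive (E → E + T), which a predictive parser cannot handle directly, so the code below assumes the standard rewrite E → T { + T } and T → F { * F }. The single character `i` stands in for the token id, and the names `parse_E`, `parse_T`, `parse_F`, and `accepts` are hypothetical.

```c
static const char *in;   /* current input position; one character = one token */

static int parse_E(void);

/* F -> id | ( E ) */
static int parse_F(void) {
    if (*in == 'i') { in++; return 1; }
    if (*in == '(') {
        in++;
        if (!parse_E()) return 0;
        if (*in != ')') return 0;    /* missing ')' */
        in++;
        return 1;
    }
    return 0;
}

/* T -> F { * F } */
static int parse_T(void) {
    if (!parse_F()) return 0;
    while (*in == '*') {
        in++;
        if (!parse_F()) return 0;
    }
    return 1;
}

/* E -> T { + T } */
static int parse_E(void) {
    if (!parse_T()) return 0;
    while (*in == '+') {
        in++;
        if (!parse_T()) return 0;
    }
    return 1;
}

/* Returns 1 if the whole string s is derivable from E, 0 otherwise. */
int accepts(const char *s) {
    in = s;
    return parse_E() && *in == '\0';
}
```

This version only recognizes the language; building a tree would follow the same structure, with each function returning a node instead of a flag.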

Bottom-Up Syntax Analysis

• Input

– A context free grammar

– A stream of tokens

• Output

– A syntax tree or error

• Method

– Construct parse tree in a bottom-up manner

– Find the rightmost derivation in reverse order

– For every potential right hand side and token decide when a production is found

– Report an error as soon as the input is not a prefix of a valid program

Constructing LR(0) parsing table

• Add a production S’ → S$

• Construct a finite automaton accepting “valid stack symbols”

• States are sets of items A → α • β

– The states of the automaton become the states of the parsing-table

– Determine shift operations

– Determine goto operations

– Determine reduce operations

– Report an error when conflicts arise

[Figure: LR(0) automaton for the grammar S → E$, E → T | E + T, T → i | (E), with numbered items in each state and edges labelled T, i, E, (, ), + and $]

Parsing “(i)$”

[Figure: the same automaton, used to trace the parse of “(i)$”]
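The parse of “(i)$” can be traced with a small stack machine in C. This is a sketch under assumptions: it uses the grammar S → E$, E → T | E + T, T → i | (E), the state numbers 0 to 9 are chosen here and do not correspond to the item numbers in the figure, and the function names are hypothetical. Nonterminals are encoded as the characters `E` and `T`.

```c
#include <assert.h>

#define MAX_STACK 64

/* Transition function of the automaton; returns -1 when there is no edge. */
static int next_state(int state, char sym) {
    switch (state) {
    case 0: case 4:                      /* start state, and state after '(' */
        if (sym == 'T') return 1;
        if (sym == 'i') return 2;
        if (sym == '(') return 4;
        if (sym == 'E') return state == 0 ? 3 : 5;
        break;
    case 3: if (sym == '$') return 9; if (sym == '+') return 7; break;
    case 5: if (sym == ')') return 6; if (sym == '+') return 7; break;
    case 7:                              /* state after '+' */
        if (sym == 'T') return 8;
        if (sym == 'i') return 2;
        if (sym == '(') return 4;
        break;
    }
    return -1;
}

/* For reduce states: the left-hand side and the right-hand-side length. */
static int reduction(int state, char *lhs, int *rhs_len) {
    switch (state) {
    case 1: *lhs = 'E'; *rhs_len = 1; return 1;   /* E -> T     */
    case 2: *lhs = 'T'; *rhs_len = 1; return 1;   /* T -> i     */
    case 6: *lhs = 'T'; *rhs_len = 3; return 1;   /* T -> ( E ) */
    case 8: *lhs = 'E'; *rhs_len = 3; return 1;   /* E -> E + T */
    }
    return 0;
}

/* Returns 1 if input (terminated by '$') is accepted, 0 on a syntax error. */
int lr0_parse(const char *input) {
    int stack[MAX_STACK];
    int sp = 0;
    stack[0] = 0;
    for (;;) {
        int state = stack[sp];
        char lhs;
        int n;
        if (state == 9) return 1;                      /* S -> E $ recognized */
        if (reduction(state, &lhs, &n)) {
            sp -= n;                                   /* pop the right-hand side */
            int target = next_state(stack[sp], lhs);   /* goto on the nonterminal */
            assert(target >= 0 && sp + 1 < MAX_STACK);
            stack[++sp] = target;
        } else {
            int target = next_state(state, *input);    /* shift the next token */
            if (target < 0 || sp + 1 >= MAX_STACK) return 0;
            stack[++sp] = target;
            input++;
        }
    }
}
```

For “(i)$” the machine shifts `(` and `i`, reduces T → i and E → T, shifts `)`, reduces T → (E) and E → T, and finally shifts `$` into the accepting state. Because the grammar is LR(0), a reduce state never needs to look at the next token.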

Summary (Bottom-Up)

• LR is a powerful technique

• Generates efficient parsers

• Generation tools exist for LALR(1)

– Bison, yacc, CUP

• But some grammars need to be tuned

– Shift/Reduce conflicts

– Reduce/Reduce conflicts

– Efficiency of the generated parser

Summary (Parsing)

• Context free grammars provide a natural way to define the syntax of programming languages

• Ambiguity may be resolved

• Predictive parsing is natural

– Good error messages

– Natural error recovery

– But not expressive enough

• But LR bottom-up parsing is more expressive

Abstract Syntax

• Intermediate program representation

• Defines a tree; preserves program hierarchy

• Generated by the parser

• Declared using an (ambiguous) context free grammar (relatively flat)

– Not meant for parsing

• Keywords and punctuation symbols are not stored (Not relevant once the tree exists)

• Big programs can be also handled (possibly via virtual memory)
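As an illustration, an abstract syntax tree for the digit-and-parentheses expression grammar might be declared and traversed as follows. This is a sketch: the field names follow those used in the recursive descent parser, but the exact layout of `Expression`, the `char` type of `oper`, and the function `eval` are assumptions.

```c
#include <stddef.h>

/* 'D' = digit leaf, 'P' = parenthesized binary expression.
   The parentheses themselves are not stored in the tree. */
typedef struct Expression {
    char type;
    int value;                          /* used when type == 'D' */
    struct Expression *left, *right;    /* used when type == 'P' */
    char oper;                          /* '+' or '*', used when type == 'P' */
} Expression;

/* Evaluate by a postorder traversal: children first, then the operator. */
int eval(const Expression *e) {
    if (e->type == 'D') return e->value;
    int l = eval(e->left);
    int r = eval(e->right);
    return e->oper == '+' ? l + r : l * r;
}
```

For the input `(2+(3*4))` the tree has a `'+'` node whose right child is a `'*'` node, and the traversal computes the product before the sum.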

Semantic Analysis

• Requirements related to the “context” in which a construct occurs

• Examples

– Name resolution

– Scoping

– Type checking

– Escape

• Implemented via AST traversals

• Guides subsequent compiler phases

Abstract Interpretation (Static analysis)

• Automatically identify program properties

– No user provided loop invariants

• Sound but incomplete methods

– But can be rather precise

• Non-standard interpretation of the program operational semantics

• Applications

– Compiler optimization

– Code quality tools

• Identify potential bugs

• Prove the absence of runtime errors

• Partial correctness

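Constant Propagation

[Figure: control flow graph of the program below; the right column gives the abstract state reached at each point, where ? means "not a known constant"]

                          [x↦?, y↦?, z↦?]
z = 3                     [x↦?, y↦?, z↦3]
while (x > 0)             [x↦?, y↦?, z↦3]
  if (x == 1)             [x↦1, y↦?, z↦3]
    y = 7                 [x↦1, y↦7, z↦3]
  else                    [x↦?, y↦?, z↦3]
    y = z + 4             [x↦?, y↦7, z↦3]
assert y == 7             [x↦?, y↦?, z↦3]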
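The way the abstract states are combined where control-flow paths meet can be sketched in C. This is an illustration, not the lecture's implementation: the three-level value (unreached, known constant, unknown) and the names `AbsVal`, `known`, `unknown`, `join`, and `abs_add` are assumptions.

```c
/* Abstract value of one variable: not yet reached, a known constant, or "?". */
typedef enum { V_BOTTOM, V_CONST, V_UNKNOWN } Kind;

typedef struct { Kind kind; int value; } AbsVal;

AbsVal known(int v)  { AbsVal a = { V_CONST, v };   return a; }
AbsVal unknown(void) { AbsVal a = { V_UNKNOWN, 0 }; return a; }

/* Join of two abstract values where control-flow paths merge. */
AbsVal join(AbsVal a, AbsVal b) {
    if (a.kind == V_BOTTOM) return b;
    if (b.kind == V_BOTTOM) return a;
    if (a.kind == V_CONST && b.kind == V_CONST && a.value == b.value) return a;
    return unknown();            /* different constants, or already unknown */
}

/* Abstract addition: constant only when both operands are constants. */
AbsVal abs_add(AbsVal a, AbsVal b) {
    if (a.kind == V_BOTTOM || b.kind == V_BOTTOM) {
        AbsVal r = { V_BOTTOM, 0 };
        return r;
    }
    if (a.kind == V_CONST && b.kind == V_CONST) return known(a.value + b.value);
    return unknown();
}
```

In the example, both branches of the if give y the constant 7 (since z + 4 evaluates to 7 when z is 3), so their join keeps y as 7, while x is 1 on one branch and unknown on the other, so its join is unknown. Joining again with the state that enters the loop, where y is unknown, loses the constant for y.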

/* c */
L0: a := 0
/* ac */
L1: b := a + 1
/* bc */
    c := c + b
/* bc */
    a := b * 2
/* ac */
    if c < N goto L1
/* c */
    return c

[Figure: successive frames of the iterative computation on this program, filling in sets such as {c}, {c, b}, and {c, a} between the statements until they stop changing]

Summary Iterative Procedure

• Analyze one procedure at a time

– More precise solutions exist

• Construct a control flow graph for the procedure

• Initialize the values at every node to the most optimistic value

• Iterate until convergence

• Iterate until convergence
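The procedure can be sketched in C for the live-variable example with the statements L0: a := 0 through return c. This is a minimal illustration under assumptions: each statement is one control flow graph node, variables are bits in an unsigned word, N is treated as a constant rather than a variable, and the names `succ`, `use`, `def`, and `liveness` are hypothetical.

```c
#define N_NODES 6
enum { A = 1, B = 2, C = 4 };          /* one bit per variable */

/* Control flow graph of the example; -1 marks "no successor".
   Nodes: 0 a:=0, 1 b:=a+1, 2 c:=c+b, 3 a:=b*2, 4 if c<N goto L1, 5 return c */
static const int succ[N_NODES][2] = { {1,-1}, {2,-1}, {3,-1}, {4,-1}, {1,5}, {-1,-1} };
static const unsigned use[N_NODES] = { 0, A, B|C, B, C, C };
static const unsigned def[N_NODES] = { A, B, C, A, 0, 0 };

/* Fills live_in[n] with the variables live just before node n. */
void liveness(unsigned live_in[N_NODES]) {
    int changed = 1;
    for (int n = 0; n < N_NODES; n++) live_in[n] = 0;   /* optimistic start: nothing live */
    while (changed) {                                    /* iterate until convergence */
        changed = 0;
        for (int n = N_NODES - 1; n >= 0; n--) {         /* visit nodes last to first */
            unsigned out = 0;
            for (int k = 0; k < 2; k++)
                if (succ[n][k] >= 0) out |= live_in[succ[n][k]];
            unsigned in = use[n] | (out & ~def[n]);      /* used here, or live after and not redefined */
            if (in != live_in[n]) { live_in[n] = in; changed = 1; }
        }
    }
}
```

The resulting sets match the comments in the annotated program: {c} before L0, {a, c} before L1, {b, c} before the next two assignments, {a, c} before the conditional jump, and {c} before the return.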

Basic Compiler Phases

Overall Structure

Techniques Studied

• Simple code generation

• Basic blocks

• Global register allocation

• Activation records

• Object Oriented

• Assembler/Linker/Loader

Heap Memory Management

• Part of the runtime system

• Utilities for dynamic memory allocation

• Utilities for automatic memory reclamation

– Garbage Collection

Garbage Collection

• Techniques

– Mark and sweep

– Copying collection

– Reference counting

• Modes

– Generational

– Incremental vs. Stop the world
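Of the techniques listed, mark and sweep is the simplest to sketch in C. The following is a toy, stop-the-world illustration under assumptions: the fixed-size `heap`, the two-pointer `Object` layout, and the functions `gc_alloc` and `gc_collect` are all hypothetical, and a real collector would find roots in registers and activation records rather than take them as an argument.

```c
#include <stddef.h>

#define HEAP_SIZE 8

typedef struct Object {
    int in_use;                 /* 1 if the cell is currently allocated */
    int marked;                 /* set during the mark phase */
    struct Object *child[2];    /* outgoing pointers (NULL if absent) */
} Object;

Object heap[HEAP_SIZE];         /* hypothetical fixed-size heap */

/* Allocate a cell, or return NULL when the heap is full. */
Object *gc_alloc(void) {
    for (int i = 0; i < HEAP_SIZE; i++)
        if (!heap[i].in_use) {
            heap[i].in_use = 1;
            heap[i].marked = 0;
            heap[i].child[0] = heap[i].child[1] = NULL;
            return &heap[i];
        }
    return NULL;
}

/* Mark phase: flag everything reachable from obj. */
static void mark(Object *obj) {
    if (obj == NULL || obj->marked) return;   /* the marked check also stops on cycles */
    obj->marked = 1;
    mark(obj->child[0]);
    mark(obj->child[1]);
}

/* Mark from the roots, then sweep unmarked cells. Returns the number freed. */
int gc_collect(Object **roots, int n_roots) {
    int freed = 0;
    for (int i = 0; i < n_roots; i++) mark(roots[i]);
    for (int i = 0; i < HEAP_SIZE; i++) {
        if (heap[i].in_use && !heap[i].marked) { heap[i].in_use = 0; freed++; }
        heap[i].marked = 0;                   /* reset for the next collection */
    }
    return freed;
}
```

Unlike reference counting, this scheme reclaims cyclic structures: two objects that point at each other are freed together once neither is reachable from a root.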
