CSC 415: Translators and Compilers


Dr. Chuck Lillie

Chart 2

Course Outline

Translators and Compilers
– Language Processors
– Compilation
– Syntactic Analysis
– Contextual Analysis
– Run-Time Organization
– Code Generation
– Interpretation

Major Programming Project
– Project Definition and Planning
– Implementation
– Weekly Status Reports
– Project Presentation

Chart 3

Project

Implement a Compiler for the Programming Language Triangle
– Appendix B: Informal Specification of the Programming Language Triangle
– Appendix D: Class Diagrams for the Triangle Compiler

Present Project Plan
– What and How

Weekly Status Reports
– Work accomplished during the reporting period
– Deliverable progress, as a percentage of completion
– Problem areas
– Planned activities for the next reporting period

Chart 4

Chapter 1: Introduction to Programming Languages

Programming Language: A formal notation for expressing algorithms.

Programming Language Processors: Tools to enter, edit, translate, and interpret programs on machines.

Machine Code: Basic machine instructions
– Keep track of the exact address of each data item and each instruction
– Encode each instruction as a bit string

Assembly Language: Symbolic names for operations, registers, and addresses.

Chart 5

Programming Languages

High Level Languages: Notation similar to familiar mathematical notation
– Expressions: +, -, *, /
– Data Types: truth values, characters, integers, records, arrays
– Control Structures: if, case, while, for
– Declarations: constant values, variables, procedures, functions, types
– Abstraction: separates what is to be performed from how it is to be performed
– Encapsulation (or data abstraction): group together related declarations and selectively hide some

Chart 6

Programming Languages

Language Processors: any system that manipulates programs expressed in some particular programming language
– Editors: enter, modify, and save program text
– Translators and Compilers: translate text from one language to another. A compiler translates a program from a high-level language to a low-level language, preparing it to be run on a machine, and checks the program for syntactic and contextual errors
– Interpreters: run a program without compilation
  Command languages
  Database query languages

Chart 7

Programming Languages Specifications

Syntax
– Form of the program
– Defines symbols
– How phrases are composed

Contextual constraints
– Scope: determine scope of each declaration
– Type:

Semantics
– Meaning of the program

Chart 8

Representation

Syntax
– Backus-Naur Form (BNF): context-free grammar
  Terminal symbols (>=, while, ;)
  Non-terminal symbols (Program, Command, Expression, Declaration)
  Start symbol (Program)
  Production rules (define how phrases are composed from terminals and sub-phrases)
  – N ::= a | b | …
– Syntax Tree
  Used to define language in terms of strings and terminal symbols

Chart 9

Representation

Semantics
– Abstract Syntax
  Concentrate on phrase structure alone
– Abstract Syntax Tree

Chart 10

Contextual Constraints

Scope
– Binding
  Static: determined by language processor
  Dynamic: determined at run-time

Type
– Statically typed: language processor can detect all type errors
– Dynamically typed: type errors cannot be detected until run-time

Will assume static binding and static typing

Chart 11

Semantics

Concerned with meaning of program
– Behavior when run

Usually specified informally
– Declarative sentences
– Could include side effects
– Correspond to production rules

Chart 12

Chapter 2: Language Processors

Translators and Compilers
Interpreters
Real and Abstract Machines
Interpretive Compilers
Portable Compilers
Bootstrapping
Case Study: The Triangle Language Processor

Chart 13

Translators & Compilers

Translator: a program that accepts any text expressed in one language (the translator’s source language), and generates a semantically-equivalent text expressed in another language (its target language)
– Chinese-into-English
– Java-into-C
– Java-into-x86
– x86 assembler

Chart 14


Assembler: translates from an assembly language into the corresponding machine code
– Generates one machine-code instruction per source instruction

Compiler: translates from a high-level language into a low-level language
– Generates several machine-code instructions per source command

Chart 15


Disassembler: translates a machine code into the corresponding assembly language

Decompiler: translates a low-level language into a high-level language

Question: Why would you want a disassembler or decompiler?

Chart 16


Source Program: the source language text
Object Program: the target language text

[Diagram: Source Program → Compiler (Syntax Check, Context Constraints, Semantic Analysis, Generate Object Code) → Object Program]

• Object program is semantically equivalent to source program, if the source program is well-formed

Chart 17


Why would you want to do:
– Java-into-C translator
– C-into-Java translator
– Assembly-language-into-Pascal decompiler

Chart 18


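[Diagram: program P expressed in language L]
P = Program Name
L = Implementation Language

[Diagram: machine M]
M = Target Machine

[Diagram: program P expressed in L, running on machine M]
For this to work, L must equal M, that is, the implementation language must be the same as the machine language

[Diagram: S-into-T translator expressed in language L]
S = Source Language
T = Target Language
L = Translator’s Implementation Language
An S-into-T translator is itself a program that runs on machine L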

Chart 19


• Translating a source program P
• Expressed in language S,
• Using an S-into-T translator
• Running on machine M

[Diagram: P expressed in S → S-into-T translator expressed in M, running on machine M → P expressed in T]

Chart 20


• Translating a source program sort
• Expressed in language Java,
• Using a Java-into-x86 translator
• Running on an x86 machine

[Diagram: sort (Java) → Java-into-x86 translator (expressed in x86, running on x86) → sort (x86); sort (x86) then runs on x86]

The object program is running on the same machine as the compiler

Chart 21


• Translating a source program sort
• Expressed in language Java,
• Using a Java-into-PPC translator
• Running on an x86 machine
• Downloaded to a PPC machine

[Diagram: sort (Java) → Java-into-PPC translator (expressed in x86, running on x86) → sort (PPC); sort (PPC) is downloaded to, and runs on, PPC]

Cross Compiler: The object program is running on a different machine than the compiler

Chart 22


Two-stage Compiler: The source program is translated to another language before being translated into the object program

• Translating a source program sort
• Expressed in language Java,
• Using a Java-into-C translator
• Running on an x86 machine

• Then translating the C program
• Using a C-into-x86 compiler
• Running on an x86 machine
• Into x86 object program

[Diagram: sort (Java) → Java-into-C translator (on x86) → sort (C) → C-into-x86 compiler (on x86) → sort (x86), which runs on x86]

Chart 23


Translator Rules
– Can run on machine M only if it is expressed in machine code M
– Source program must be expressed in translator’s source language S
– Object program is expressed in the translator’s target language T
– Object program is semantically equivalent to the source program
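The translator rules can be checked mechanically. The following is a minimal sketch, not part of the course material: the names Program, Translator, and translate are hypothetical, and semantic equivalence (the last rule) is simply assumed rather than checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    name: str
    language: str  # language the program text is expressed in


@dataclass(frozen=True)
class Translator:
    source: str  # S, the source language
    target: str  # T, the target language
    impl: str    # L, the language the translator itself is expressed in


def translate(program, translator, machine):
    """Return the object program, or raise ValueError if a rule is broken."""
    # Rule: the translator can run on machine M only if expressed in machine code M.
    if translator.impl != machine:
        raise ValueError(
            f"translator expressed in {translator.impl} cannot run on {machine}")
    # Rule: the source program must be expressed in the source language S.
    if program.language != translator.source:
        raise ValueError(
            f"{program.name} is expressed in {program.language}, "
            f"not {translator.source}")
    # Rule: the object program is expressed in the target language T.
    return Program(program.name, translator.target)


# Two-stage compilation of sort: Java -> C -> x86, all on an x86 machine.
sort_java = Program("sort", "Java")
java_to_c = Translator("Java", "C", "x86")
c_to_x86 = Translator("C", "x86", "x86")
sort_x86 = translate(translate(sort_java, java_to_c, "x86"), c_to_x86, "x86")
```

A cross compiler is the case where translator.target differs from the machine the translator runs on; the sketch accepts that, because only the implementation language has to match the machine.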

Chart 24

Interpreters

Accepts any program (source program) expressed in a particular language (source language) and runs that source program immediately
– Does not translate the source program into object code prior to execution

Chart 25

Interpreters

[Diagram: Source Program → Interpreter loop (Fetch Instruction → Analyze Instruction → Execute Instruction, repeated) → Program Complete]

• Source program starts to run as soon as the first instruction is analyzed
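The fetch, analyze, execute cycle can be sketched in a few lines. The three-instruction language below (set, add, print) is invented purely for illustration and is not taken from the course; the point is only that each instruction runs as soon as it has been analyzed, with no object code produced.

```python
def interpret(source_lines):
    """Run a program one instruction at a time; return the printed values."""
    variables = {}
    output = []
    pc = 0  # index of the next instruction to fetch
    while pc < len(source_lines):
        instruction = source_lines[pc]        # fetch
        op, *args = instruction.split()       # analyze
        if op == "set":                       # execute
            variables[args[0]] = int(args[1])
        elif op == "add":
            variables[args[0]] += int(args[1])
        elif op == "print":
            output.append(variables[args[0]])
        else:
            raise ValueError(f"unknown instruction: {instruction!r}")
        pc += 1
    return output


result = interpret(["set x 3", "add x 4", "print x"])
```

Because analysis is repeated every time an instruction is executed, an instruction inside a loop would be re-analyzed on every iteration, which is one reason interpretation is slow.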

Chart 26

Interpreters

When to Use Interpretation
– Interactive mode: want to see the results of an instruction before entering the next instruction
– Only use program once
– Each instruction expected to be executed only once
– Instructions have simple formats

Disadvantages
– Slow: up to 100 times slower than running in machine code

Chart 27

Interpreters

Examples
– Basic
– Lisp
– Unix Command Language (shell)
– SQL

Chart 28

Interpreters

[Diagram: S interpreter expressed in language L]

[Diagram: Program P expressed in language S, using an S interpreter expressed in M, running on machine M]

[Diagram: Program graph written in Basic, running on a Basic interpreter expressed in x86, executed on an x86 machine]

Chart 29

Real and Abstract Machines

Hardware emulation: Using software to execute one set of machine code on another machine
– Can measure everything about the new machine except its speed
– Abstract machine: emulator
– Real machine: actual hardware

An abstract machine is functionally equivalent to a real machine if they both implement the same language L

Chart 30

Real and Abstract Machines

[Diagram: New Machine Instruction (nmi) interpreter written in C]

[Diagram: the nmi interpreter written in C is translated into machine code M using the C-into-M compiler running on machine M, giving an nmi interpreter expressed in machine code M]

[Diagram: program P expressed in nmi code, running on the nmi interpreter on machine M, shown beside P running directly on an nmi machine]

Chart 31

Interpretive Compilers

Combination of compiler and interpreter
– Translate source program into an intermediate language
– It is intermediate in level between the source language and ordinary machine code
– Its instructions have simple formats, and therefore can be analyzed easily and quickly
– Translation from the source language into the intermediate language is easy and fast

An interpretive compiler combines fast compilation with tolerable running speed

Chart 32

Interpretive Compilers

[Diagram: Java-into-JVM translator running on machine M]

[Diagram: JVM-code interpreter running on machine M]

[Diagram: P (Java) → Java-into-JVM translator (on M) → P (JVM); P (JVM) then runs on the JVM-code interpreter on machine M]

A Java program P is first translated into JVM-code, and then the JVM-code object program is interpreted

Chart 33

Portable Compilers

A program is portable if it can be compiled and run on any machine, without change
– A portable program is more valuable than an unportable one, because its development cost can be spread over more copies
– Portability is measured by the proportion of code that remains unchanged when it is moved to a dissimilar machine

Language affects portability
– Assembly language: 0% portable
– High level language: approaches 100% portability

Chart 34

Portable Compilers

Language Processors
– Valuable and widely used programs
– Typically written in a high-level language
  Pascal, C, Java
– Part of the language processor is machine dependent
  Code generation part
  Language processor is only about 50% portable
– A compiler that generates intermediate code is more portable than a compiler that generates machine code

Chart 35

Portable Compilers

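[Diagram: portable compiler kit: a Java-into-JVM translator expressed in Java, a Java-into-JVM translator expressed in JVM code, and a JVM interpreter expressed in Java]

Rewrite interpreter in C

[Diagram: the JVM interpreter expressed in C is compiled by the C-into-M compiler on machine M, giving a JVM interpreter expressed in M]

[Diagram: the Java-into-JVM translator expressed in JVM code runs on the JVM interpreter on machine M and translates P (Java) into P (JVM); P (JVM) then runs on the JVM interpreter on M]

Note: a C-into-M compiler exists; rewrite the JVM interpreter from Java to C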

Chart 36

Bootstrapping

The language processor is used to process itself
– Implementation language is the source language

Bootstrapping a portable compiler
– A portable compiler can be bootstrapped to make a true compiler (one that generates machine code) by writing an intermediate-language-into-machine-code translator

Full bootstrap
– Writing the compiler in itself
– Using the latest version to upgrade the next version

Half bootstrap
– Compiler expressed in itself but targeted for another machine

Bootstrapping to improve efficiency
– Upgrade the compiler to optimize code generation as well as to improve compile efficiency

Chart 37

Bootstrapping

Bootstrap an interpretive compiler to generate machine code

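First, write a JVM-code-into-M translator in Java

Next, compile translator using existing interpreter
[Diagram: the translator expressed in Java is compiled by the Java-into-JVM translator, giving a JVM-code-into-M translator expressed in JVM code]

Use translator to translate itself
[Diagram: the JVM-code-into-M translator expressed in JVM code, running on the JVM interpreter on M, translates itself into a JVM-code-into-M translator expressed in M]

Translate Java-into-JVM-code translator into machine code
[Diagram: the Java-into-JVM translator expressed in JVM code is translated by the JVM-code-into-M translator on M, giving a Java-into-JVM translator expressed in M]

Two stage Java-into-M compiler
[Diagram: P (Java) → Java-into-JVM translator (on M) → P (JVM) → JVM-code-into-M translator (on M) → P (M)]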

Chart 38

Bootstrapping

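Full bootstrap

v1: Write Ada-S compiler in C
[Diagram: the Ada-S-into-M compiler expressed in C (v1) is compiled by the C-into-M compiler on M, giving the v1 Ada-S-into-M compiler expressed in M]

v2: Convert the C version of Ada-S into Ada-S version of Ada-S
[Diagram: the Ada-S-into-M compiler expressed in Ada-S (v2) is compiled by the v1 compiler on M, giving the v2 Ada-S-into-M compiler expressed in M]

v3: Extend Ada-S compiler to (full) Ada compiler
[Diagram: the Ada-into-M compiler expressed in Ada-S (v3) is compiled by the v2 compiler on M, giving the v3 Ada-into-M compiler expressed in M]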

Chart 39

Bootstrapping

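Half bootstrap

[Diagram: existing Ada-into-HM compiler, expressed in Ada and in HM code]

[Diagram: the Ada-into-TM compiler expressed in Ada is compiled by the Ada-into-HM compiler on HM, giving an Ada-into-TM compiler expressed in HM code]

[Diagram: P (Ada) → Ada-into-TM compiler (on HM) → P (TM); P (TM) then runs on TM]

[Diagram: the Ada-into-TM compiler expressed in Ada is compiled by the Ada-into-TM compiler expressed in HM code, on HM, giving an Ada-into-TM compiler expressed in TM code]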

Chart 40

Bootstrapping

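Bootstrap to improve efficiency

[Diagram: v1 is an Ada-into-Ms compiler, expressed in Ada and in Ms code; v2 is an Ada-into-Mf compiler expressed in Ada]

[Diagram: the v2 compiler expressed in Ada is compiled by the v1 compiler on M, giving a v2 Ada-into-Mf compiler expressed in Ms code, which can then translate a program P expressed in Ada]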

Chart 41

Chapter 3: Compilation

Phases
– Syntactic Analysis
– Contextual Analysis
– Code Generation

Passes
– Multi-pass Compilation
– One-pass Compilation
– Compiler Design Issues

Case Study: The Triangle Compiler

Chart 42

Phases

Syntactic Analysis
– The source program is parsed to check whether it conforms to the source language’s syntax, and to determine its phrase structure

Contextual Analysis
– The parsed program is analyzed to check whether it conforms to the source language's contextual constraints

Code Generation
– The checked program is translated to an object program, in accordance with the semantics of the source and target languages

Chart 43

Phases

[Diagram: Source Program → Syntactic Analysis → AST → Contextual Analysis → Decorated AST → Code Generation → Object Program; Syntactic Analysis and Contextual Analysis each produce an Error Report]

Chart 44

Syntactic Analysis

To determine the source program’s phrase structure
– Parsing
– Contextual analysis and code generation must know how the program is composed
  Commands, expressions, declarations, …
– Check for conformance to the source language’s syntax
– Construct suitable representation of its phrase structure (AST)

AST
– Terminal nodes corresponding to identifiers, literals, and operators
– Sub-trees representing the phrases of the source program
– Blanks and comments not in AST (no meaning)
– Punctuation and brackets not in AST (only separate and enclose)

Chart 45

Contextual Analysis

Analyzes the parsed program
– Scope rules
– Type rules

Produces decorated AST
– AST with information gathered during contextual analysis
– Each applied occurrence of an identifier is linked to the corresponding declaration
– Each expression is decorated by its type T

Chart 46

Code Generation

The final translation of the checked program to an object program
– After syntactic and contextual analysis is completed

Treatment of identifiers
– Constants
  Binds identifier to value
  Replace each occurrence of identifier with value
– Variables
  Binds identifier to some memory address
  Replace each occurrence of identifier by address

Target language
– Assembly language
– Machine code

Chart 47

Passes

Multi-pass compilation
– Traverses the program or AST several times

[Diagram: Compiler Driver calls the Syntactic Analyzer, Contextual Analyzer, and Code Generator in turn]

One-pass compilation
– Single traverse of program
– Contextual analysis and code generation are performed ‘on the fly’ during syntactic analysis

[Diagram: Compiler Driver calls the Syntactic Analyzer, which calls the Contextual Analyzer and Code Generator]

Chart 48

Compiler Design Issues

Speed
– Compiler run time

Space
– Storage: size of compiler + files generated

Modularity
– Multi-pass compiler more modular than one-pass compiler

Flexibility
– Multi-pass compiler is more flexible because it generates an AST that can be traversed in any order by the other phases

Semantics-preserving transformations
– To optimize code, must have multi-pass compiler

Source language properties
– May restrict compiler choice: some language constructs may require multi-pass compilers

Chart 49

Chapter 4: Syntactic Analysis

Sub-phases of Syntactic Analysis
Grammars Revisited
Parsing
Abstract Syntax Trees
Scanning
Case Study: Syntactic Analysis in the Triangle Compiler

Chart 50

Structure of a Compiler

[Diagram: Source code → Lexical Analyzer → tokens → Parser & Semantic Analyzer → parse tree → Intermediate Code Generation → intermediate representation → Optimization → Assembly Code Generation → Assembly code; a Symbol Table is shared by the phases]


Chart 51

Syntactic Analysis

Main function
– Parse source program to discover its phrase structure
– Recursive-descent parsing
– Constructing an AST
– Scanning to group characters into tokens

Chart 52

Sub-phases of Syntactic Analysis

Scanning (or lexical analysis)
– Source program transformed to a stream of tokens
  Identifiers
  Literals
  Operators
  Keywords
  Punctuation
– Comments and blank spaces discarded

Parsing
– To determine the source program’s phrase structure
– Source program is input as a stream of tokens (from the Scanner)
– Treats each token as a terminal symbol

Representation of phrase structure
– AST

Chart 53

Lexical Analysis – A Simple Example

Scan the file character by character and group characters into words and punctuation (tokens), remove white space and comments

Some tokens for this example: main ( ) { int a , b , c ;

main() {
    int a, b, c;
    char number[5];

    /* get user inputs */
    a = atoi(gets(number));
    b = atoi(gets(number));

    /* calculate value for c */
    c = 2*(a+b) + a*(a+b);

    /* print results */
    printf("%d", c);
}

Chart 54

Creating Tokens – Mini-Triangle Example

let var y: Integer
in !new year
   y := y+1

[Diagram: Input Converter → character string → Buffer → Scanner → tokens]

Character string (S = space):
l e t S v a r S y : S I n t e g e r S i n . . . .

Token stream (kind, spelling):
let       let
var       var
Ident.    y
colon     :
Ident.    Integer
in        in
Ident.    y
becomes   :=
Ident.    y
op.       +
Intlit.   1
eot
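The scanning step above can be sketched as a small hand-written scanner. This is an illustration only, not the Triangle compiler's scanner: the function name scan, the KEYWORDS set, and the token kind strings are assumptions chosen to mirror the labels in the example, and only the symbols needed for this one program are handled.

```python
KEYWORDS = {"let", "var", "in"}  # assumed subset, enough for this example


def scan(text):
    """Group characters into (kind, spelling) tokens, dropping blanks and comments."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch == "!":  # a comment runs to the end of the line
            while i < len(text) and text[i] != "\n":
                i += 1
        elif ch.isalpha():
            start = i
            while i < len(text) and text[i].isalnum():
                i += 1
            word = text[start:i]
            # Match as an identifier first, then look it up in the keyword table.
            tokens.append((word if word in KEYWORDS else "Ident.", word))
        elif ch.isdigit():
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1
            tokens.append(("Intlit.", text[start:i]))
        elif text.startswith(":=", i):  # check the longer token before ':'
            tokens.append(("becomes", ":="))
            i += 2
        elif ch == ":":
            tokens.append(("colon", ":"))
            i += 1
        elif ch in "+-*/":
            tokens.append(("op.", ch))
            i += 1
        else:
            raise ValueError(f"illegal character {ch!r}")
    tokens.append(("eot", ""))
    return tokens


tokens = scan("let var y: Integer\nin !new year\n   y := y+1")
```

Keywords are recognized here by the table lookup approach: every word is scanned as an identifier and then checked against KEYWORDS, so no separate rule is written per keyword.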

Chart 55

Tokens in Triangle

// literals, identifiers, operators...
INTLITERAL = 0, "<int>",
CHARLITERAL = 1, "<char>",
IDENTIFIER = 2, "<identifier>",
OPERATOR = 3, "<operator>",

// reserved words - must be in alphabetical order...
ARRAY = 4, "array",
BEGIN = 5, "begin",
CONST = 6, "const",
DO = 7, "do",
ELSE = 8, "else",
END = 9, "end",
FUNC = 10, "func",
IF = 11, "if",
IN = 12, "in",
LET = 13, "let",
OF = 14, "of",
PROC = 15, "proc",
RECORD = 16, "record",
THEN = 17, "then",
TYPE = 18, "type",
VAR = 19, "var",
WHILE = 20, "while",

// punctuation...
DOT = 21, ".",
COLON = 22, ":",
SEMICOLON = 23, ";",
COMMA = 24, ",",
BECOMES = 25, ":=",
IS = 26, "~",

// brackets...
LPAREN = 27, "(",
RPAREN = 28, ")",
LBRACKET = 29, "[",
RBRACKET = 30, "]",
LCURLY = 31, "{",
RCURLY = 32, "}",

// special tokens...
EOT = 33, "",
ERROR = 34, "<error>"

Chart 56

Grammars Revisited

Context-free grammars
– Generate a set of sentences
– Each sentence is a string of terminal symbols
– An unambiguous sentence has a unique phrase structure embodied in its syntax tree

Develop parsers from context-free grammars

Chart 57

Regular Expressions

A regular expression (RE) is a convenient notation for expressing a set of strings of terminal symbols

Main features
– ‘|’ separates alternatives
– ‘*’ indicates that the previous item may be repeated zero or more times
– ‘(’ and ‘)’ are grouping parentheses

Chart 58

Regular Expression Basics

ε The empty string, a special string of length 0

Regular expression operations
– | separates alternatives
– * indicates that the previous item may be repeated zero or more times (repetition)
– ( and ) are grouping parentheses

Chart 59


Algebraic Properties
– | is commutative and associative
  r|s = s|r
  r|(s|t) = (r|s)|t
– Concatenation is associative
  (rs)t = r(st)
– Concatenation distributes over |
  r(s|t) = rs|rt
  (s|t)r = sr|tr
– ε is the identity for concatenation
  εr = r
  rε = r
– * is idempotent
  r** = r*
  r* = (r|ε)*

Chart 60


Common Extensions
– r+ one or more of expression r, same as rr*
– r^k k repetitions of r
  r^3 = rrr
– ~r the characters not in the expression r
  ~[\t\n]
– r-z range of characters
  [0-9a-z]
– r? zero or one copy of expression r (used for fields of an expression that are optional)

Chart 61

Regular Expression Example

Regular Expression for Representing Months
– Examples of legal inputs
  January represented as 1 or 01
  October represented as 10
– First Try: (0|1|ε)[0-9]
  Matches all legal inputs? Yes
    1, 2, 3, …, 10, 11, 12, 01, 02, …, 09
  Matches any illegal inputs? Yes
    0, 00, 18

Chart 62


Regular Expression for Representing Months
– Examples of legal inputs
  January represented as 1 or 01
  October represented as 10
– Second Try: [1-9]|(0[1-9])|(1[0-2])
  Matches all legal inputs? Yes
    1, 2, 3, …, 10, 11, 12, 01, 02, …, 09
  Matches any illegal inputs? No
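Both attempts can be checked with Python's re module. This is a sketch under two assumptions: the first try is read as "an optional 0 or 1 followed by one digit", written (0|1)?[0-9] in Python syntax, and fullmatch is used so that the whole input must match. The helper name accepted and the extra illegal inputs 13 and 19 are additions for illustration.

```python
import re

FIRST_TRY = re.compile(r"(0|1)?[0-9]")
SECOND_TRY = re.compile(r"[1-9]|(0[1-9])|(1[0-2])")

# 1..12 plus the zero-padded forms 01..09
LEGAL = [str(m) for m in range(1, 13)] + [f"0{m}" for m in range(1, 10)]
ILLEGAL = ["0", "00", "18", "13", "19"]


def accepted(pattern, inputs):
    """Return the inputs that the pattern matches in full."""
    return [s for s in inputs if pattern.fullmatch(s)]


first_try_leaks = accepted(FIRST_TRY, ILLEGAL)
second_try_leaks = accepted(SECOND_TRY, ILLEGAL)
```

The first try accepts every entry in ILLEGAL, while the second try accepts none of them, and both accept every entry in LEGAL.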

Chart 63


Regular Expression for Floating Point Numbers
– Examples of legal inputs
  1.0, 0.2, 3.14159, -1.0, 2.7e8, 1.0E-6
  Assume that a 0 is required before numbers less than 1 and does not prevent extra leading zeros, so numbers such as 0011 or 0003.14159 are legal
– Building the regular expression
  Assume digit → 0|1|2|3|4|5|6|7|8|9
  Handle simple decimals such as 1.0, 0.2, 3.14159
    digit+.digit+
  Add an optional sign (only minus, no plus)
    (-|ε)digit+.digit+ or -?digit+.digit+

Chart 64


Regular Expression for Floating Point Numbers (cont.)
– Building the regular expression (cont.)
  Format for the exponent
    (E|e)(+|-)?(digit+)
  Adding it as an optional expression to the decimal part
    (-|ε)digit+.digit+((E|e)(+|-)?(digit+))?
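The finished expression translates almost directly into Python's re syntax. In this sketch digit is written [0-9], the literal '.' and '+' are escaped, and fullmatch requires the whole input to match; the name is_float is hypothetical.

```python
import re

# -?digit+.digit+((E|e)(+|-)?(digit+))?
FLOAT = re.compile(r"-?[0-9]+\.[0-9]+((E|e)(\+|-)?[0-9]+)?")


def is_float(text):
    """Return True if text matches the floating point expression in full."""
    return FLOAT.fullmatch(text) is not None


legal = ["1.0", "0.2", "3.14159", "-1.0", "2.7e8", "1.0E-6", "0003.14159"]
all_legal_accepted = all(is_float(s) for s in legal)
```

Inputs such as .5 (no digit before the point), +1.0 (plus sign on the number), and 2.7e (exponent with no digits) are rejected by this expression.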

Chart 65

Extended BNF

Extended BNF (EBNF)
– Combination of BNF and RE
– N ::= X, where N is a nonterminal symbol and X is an extended RE, i.e., an RE constructed from both terminal and nonterminal symbols
– EBNF
  Right hand side may use |, *, (, )
  Right hand side may contain both terminal and nonterminal symbols

Chart 66

Example EBNF

Expression ::= primary-Expression (Operator primary-Expression)*

primary-Expression ::= Identifier
                     | ( Expression )

Identifier ::= a|b|c|d|e

Operator ::= +|-|*|/

Generates
  e
  a + b
  a - b - c
  a + (b * c)
  a + (b + c) / d
  a - (b - (c - (d - e)))
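An EBNF grammar of this shape maps naturally onto a recursive-descent recognizer: one function per nonterminal, a loop for each `*`, and a branch for each `|`. The sketch below is illustrative only; the function names are hypothetical, tokens are assumed to be single characters, and it only recognizes sentences (it builds no tree).

```python
def is_expression(text):
    """Return True if text is a sentence of the Expression grammar."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def expression():
        # Expression ::= primary-Expression (Operator primary-Expression)*
        primary_expression()
        while peek() in ("+", "-", "*", "/"):
            advance()
            primary_expression()

    def primary_expression():
        # primary-Expression ::= Identifier | ( Expression )
        if peek() in ("a", "b", "c", "d", "e"):
            advance()
        elif peek() == "(":
            advance()
            expression()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
        else:
            raise SyntaxError(f"unexpected token {peek()!r}")

    try:
        expression()
    except SyntaxError:
        return False
    return pos == len(tokens)  # all input must be consumed


sentences = ["e", "a + b", "a - b - c", "a + (b * c)",
             "a + (b + c) / d", "a - (b - (c - (d - e)))"]
all_accepted = all(is_expression(s) for s in sentences)
```

Each generated sentence listed above is accepted, while strings such as "a +" or "(a" are rejected.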

Chart 67

Grammar Transformations

Left Factorization
XY | XZ is equivalent to X(Y | Z)

single-Command ::= V-name := Expression
                 | if Expression then single-Command
                 | if Expression then single-Command else single-Command

single-Command ::= V-name := Expression
                 | if Expression then single-Command (ε | else single-Command)

Chart 68


Elimination of left recursion
N ::= X | NY is equivalent to N ::= X(Y)*

Identifier ::= Letter
             | Identifier Letter
             | Identifier Digit

Identifier ::= Letter
             | Identifier (Letter | Digit)

Identifier ::= Letter (Letter | Digit)*

Chart 69


Substitution of nonterminal symbols
Given N ::= X, we can substitute each occurrence of N with X iff N ::= X is nonrecursive and is the only production rule for N

single-Command ::= for Control-Variable := Expression To-or-Downto Expression do single-Command
                 | …
Control-Variable ::= Identifier
To-or-Downto ::= to
               | downto

single-Command ::= for Identifier := Expression (to|downto) Expression do single-Command
                 | …

Chart 70

Scanning (Lexical Analysis)

The purpose of scanning is to recognize tokens in the source program, that is, to group input characters (the source program text) into tokens.

Difference between parsing and scanning:
– Parsing groups terminal symbols, which are tokens, into larger phrases such as expressions and commands, and analyzes the tokens for correctness and structure
– Scanning groups individual characters into tokens

Chart 71


[Diagram: structure of a compiler (Source code → Lexical Analyzer → tokens → Parser & Semantic Analyzer → parse tree → … → Optimization → … → Assembly code, with a shared Symbol Table)]

Chart 73

What Does a Scanner Do?

Handle keywords (reserved words)
– Recognizes identifiers and keywords
– Match explicitly
  Write regular expression for each keyword
  Identifier is any alphanumeric string which is not a keyword
– Match as an identifier, perform lookup
  No special regular expressions for keywords
  When an identifier is found, perform lookup into preloaded keyword table

How does Triangle handle keywords? Discuss in terms of efficiency and ease of coding.

Chart 74


Remove white space
– Tabs, spaces, new lines

Remove comments
– Single line
  -- Ada comment
– Multi-line, start and end delimiters
  { Pascal comment }
  /* C comment */
– Nested
– Runaway comments
  Nonterminated comments can’t be detected until end of file

Chart 75


Perform look ahead
– Multi-character tokens
  1..10 vs. 1.10
  &, &&
  <, <=
  etc.

Challenging input languages
– FORTRAN
  Keywords not reserved
  Blanks are not a delimiter
  Example (comma vs. decimal)
    DO10I=1,5   start of a do loop (equivalent to a C for loop)
    DO10I=1.5   an assignment statement, assignment to variable DO10I
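The 1..10 versus 1.10 case shows why a scanner has to look ahead: after reading the digits of 1 and seeing a '.', it cannot tell whether a fraction or a range operator follows until it inspects the character after the dot. The sketch below is illustrative only; the function name and the token kinds "int", "real", and "dotdot" are hypothetical.

```python
def scan_numbers(text):
    """Scan integers, reals, and '..' tokens, peeking past a '.' to decide."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isdigit():
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1
            # Look ahead: '.' starts a fraction only if a digit follows it.
            if i + 1 < len(text) and text[i] == "." and text[i + 1].isdigit():
                i += 1
                while i < len(text) and text[i].isdigit():
                    i += 1
                tokens.append(("real", text[start:i]))
            else:
                tokens.append(("int", text[start:i]))
        elif text.startswith("..", i):
            tokens.append(("dotdot", ".."))
            i += 2
        else:
            raise ValueError(f"illegal character {text[i]!r}")
    return tokens


range_tokens = scan_numbers("1..10")
real_tokens = scan_numbers("1.10")
```

Here range_tokens holds three tokens (int, dotdot, int) and real_tokens holds a single real token, even though the two inputs share the prefix "1.".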

Chart 76


Challenging input languages (cont.)
– PL/I, keywords not reserved
  IF THEN THEN THEN = ELSE; ELSE ELSE = THEN;

Chart 77


Error Handling
– Error token passed to parser, which reports the error
– Recovery
  Delete the characters of the current token that have been read so far, and restart scanning at the next unread character
  Delete the first character of the current lexeme and resume scanning from the next character
– Examples of lexical errors:
  3.25e   bad format for a constant
  Var#1   illegal character
– Some errors that are not lexical errors
  Mistyped keywords
  – Begim
  Mismatched parentheses
  Undeclared variables

Chart 78

Scanner Implementation

Issues
– Simpler design: parser doesn’t have to worry about white space, etc.
– Improve compiler efficiency: allows the construction of a specialized and potentially more efficient processor
– Compiler portability is enhanced: input alphabet peculiarities and other device-specific anomalies can be restricted to the scanner

Chart 79

Scanner Implementation

What are the keywords in Triangle?
How are keywords and identifiers implemented in Triangle?
Is look ahead implemented in Triangle?
– If so, how?

Chart 80


[Diagram: structure of a compiler (Source code → Lexical Analyzer → tokens → Parser → parse tree → Semantic Analyzer → … → Optimization → … → Assembly code, with a shared Symbol Table)]

Chart 81

Parsing

Given an unambiguous, context-free grammar, parsing is
– Recognition of an input string, i.e., deciding whether or not the input string is a sentence of the grammar
– Parsing of an input string, i.e., recognition of the input string plus determination of its phrase structure. The phrase structure can be represented by a syntax tree, or otherwise.

The grammar must be unambiguous so that every sentence of the grammar has exactly one syntax tree.

Chart 82

Parsing

The syntax of programming language constructs is described by context-free grammars.

Advantages of unambiguous, context-free grammars
– A precise, yet easy-to-understand, syntactic specification of the programming language
– For certain classes of grammars we can automatically construct an efficient parser that determines if a source program is syntactically well formed.
– Imparts a structure to a programming language that is useful for the translation of source programs into correct object code and for the detection of errors.
– Easier to add new constructs to the language if the implementation is based on a grammatical description of the language

Chart 83

Parsing

Check the syntax (structure) of a program and create a tree representation of the program

Programming languages have non-regular constructs
– Nesting
– Recursion

Context-free grammars are used to express the syntax for programming languages

sequence of tokens → parser → syntax tree

Chart 84

Context-Free Grammars

Comprised of
– A set of tokens or terminal symbols
– A set of non-terminal symbols
– A set of rules or productions which express the legal relationships between symbols
– A start or goal symbol

Example:
1. expr → expr - digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9

Tokens: -, +, 0, 1, 2, …, 9
Non-terminals: expr, digit
Start symbol: expr

Chart 85

Context-Free Grammars

1. expr → expr - digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9

Example input: 3 + 8 - 2

Syntax tree:

            expr
          /  |   \
      expr   -   digit
    /  |  \        |
 expr  +  digit    2
   |        |
 digit      8
   |
   3

Chart 86

Checking for Correct Syntax

Given a grammar for a language and a program, how do you know if the syntax of the program is legal?

A legal program can be derived from the start symbol of the grammar

Grammar must be unambiguous and context-free

Chart 87

Deriving a String

The derivation begins with the start symbol
At each step of a derivation the right hand side of a grammar rule is used to replace a non-terminal symbol
Continue replacing non-terminals until only terminal symbols remain

expr ⇒ expr - digit        (Rule 1)
     ⇒ expr - 2            (Rule 4)
     ⇒ expr + digit - 2    (Rule 2)
     ⇒ expr + 8 - 2        (Rule 4)
     ⇒ digit + 8 - 2       (Rule 3)
     ⇒ 3 + 8 - 2           (Rule 4)

Chart 88

Rightmost Derivation

The rightmost non-terminal is replaced in each step

expr ⇒ expr - digit                   (Rule 1)
expr - digit ⇒ expr - 2               (Rule 4)
expr - 2 ⇒ expr + digit - 2           (Rule 2)
expr + digit - 2 ⇒ expr + 8 - 2       (Rule 4)
expr + 8 - 2 ⇒ digit + 8 - 2          (Rule 3)
digit + 8 - 2 ⇒ 3 + 8 - 2             (Rule 4)
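The rightmost derivation can be replayed step by step in code. This is a sketch for illustration: sentential forms are lists of symbols, rule 4 takes the chosen digit as an extra argument, and the names RULES, rightmost_step, and derive are hypothetical.

```python
RULES = {
    1: ("expr", ["expr", "-", "digit"]),
    2: ("expr", ["expr", "+", "digit"]),
    3: ("expr", ["digit"]),
}
NONTERMINALS = {"expr", "digit"}


def rightmost_step(form, rule, digit=None):
    """Replace the rightmost non-terminal in form using the given rule."""
    index = max(i for i, sym in enumerate(form) if sym in NONTERMINALS)
    # Rule 4 (digit -> 0|1|...|9) needs to know which digit was chosen.
    lhs, rhs = ("digit", [digit]) if rule == 4 else RULES[rule]
    if form[index] != lhs:
        raise ValueError(f"rule {rule} does not apply to {form[index]}")
    return form[:index] + rhs + form[index + 1:]


def derive(steps):
    """Apply (rule, digit) steps to the start symbol; return every form."""
    form = ["expr"]
    history = [" ".join(form)]
    for rule, digit in steps:
        form = rightmost_step(form, rule, digit)
        history.append(" ".join(form))
    return history


history = derive([(1, None), (4, "2"), (2, None), (4, "8"), (3, None), (4, "3")])
```

The entries of history are the sentential forms of the derivation, starting at expr and ending at 3 + 8 - 2. Applying a rule whose left-hand side is not the rightmost non-terminal raises an error, which is what distinguishes a rightmost derivation from an arbitrary one.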

Chart 89

Leftmost Derivation

The leftmost non-terminal is replaced in each step

expr ⇒ expr - digit                                (Rule 1)
expr - digit ⇒ expr + digit - digit                (Rule 2)
expr + digit - digit ⇒ digit + digit - digit       (Rule 3)
digit + digit - digit ⇒ 3 + digit - digit          (Rule 4)
3 + digit - digit ⇒ 3 + 8 - digit                  (Rule 4)
3 + 8 - digit ⇒ 3 + 8 - 2                          (Rule 4)

Chart 90

Leftmost Derivation

The leftmost non-terminal is replaced in each step

[Diagram: the leftmost derivation of 3 + 8 - 2 shown beside its syntax tree, with the derivation steps and the corresponding tree nodes numbered 1 to 6]

Chart 91

Bottom-Up Parsing

Parser examines terminal symbols of the input string, in order from left to right

Reconstructs the syntax tree from the bottom (terminal nodes) up (toward the root node)

Bottom-up parsing reduces a string w to the start symbol of the grammar.
– At each reduction step a particular sub-string matching the right side of a production is replaced by the symbol on the left of that production, and if the sub-string is chosen correctly at each step, a rightmost derivation is traced out in reverse.

Chart 92

Bottom-Up Parsing

Types of bottom-up parsing algorithms
– Shift-reduce parsing
  At each reduction step a particular sub-string matching the right side of a production is replaced by the symbol on the left of that production, and if the sub-string is chosen correctly at each step, a rightmost derivation is traced out in reverse.
– LR(k) parsing
  L is for left-to-right scanning of the input, the R is for constructing a rightmost derivation in reverse, and the k is for the number of input symbols of look-ahead that are used in making parsing decisions.

Chart 93

Bottom-Up Parsing Example: 3+8-2

[Diagram sequence: the syntax tree for 3 + 8 - 2 is built upward from the terminal nodes, adding digit and expr nodes one reduction at a time]

Chart 94

Bottom-Up Parsing Example: 3+8-2

[Diagram sequence (cont.): further digit and expr nodes are added until the root expr node is reached]
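The bottom-up parse of 3+8-2 can be sketched as a shift-reduce loop over a stack. This is an illustration, not a general LR parser: it chooses what to reduce by a fixed priority that happens to work for this small expr/digit grammar (in particular, digit is reduced to expr only when it is the sole stack entry), and the function name and the format of the reduction strings are hypothetical.

```python
def shift_reduce(text):
    """Parse text bottom-up; return the reductions in the order applied."""
    stack = []
    reductions = []

    def reduce_once():
        if stack and stack[-1].isdigit():              # rule 4
            reductions.append(f"digit -> {stack[-1]}")
            stack[-1] = "digit"
            return True
        if stack[-3:] == ["expr", "-", "digit"]:       # rule 1
            stack[-3:] = ["expr"]
            reductions.append("expr -> expr - digit")
            return True
        if stack[-3:] == ["expr", "+", "digit"]:       # rule 2
            stack[-3:] = ["expr"]
            reductions.append("expr -> expr + digit")
            return True
        if stack == ["digit"]:                         # rule 3
            stack[:] = ["expr"]
            reductions.append("expr -> digit")
            return True
        return False

    for token in (ch for ch in text if not ch.isspace()):
        stack.append(token)        # shift
        while reduce_once():       # reduce while something on top can be reduced
            pass
    if stack != ["expr"]:
        raise SyntaxError(f"cannot reduce {stack} to expr")
    return reductions


reductions = shift_reduce("3+8-2")
```

Read from last to first, the reductions list is the rightmost derivation of 3 + 8 - 2 (rules 1, 4, 2, 4, 3, 4), which is the sense in which bottom-up parsing traces a rightmost derivation in reverse.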

Chart 95

Bottom-Up Parsing Example: abbcde

1. S → aABe
2. A → Abc | b
3. B → d

Example input: abbcde

abbcde ⇒ aAbcde

[Diagram: partial syntax tree over a b b c d e, with A above the first b]

Chart 96


1. S → aABe
2. A → Abc | b
3. B → d

aAbcde ⇒ aAde

[Diagram: partial syntax tree over a b b c d e, with a second A above A b c]

Chart 97




aAde ⇒ aABe

[Diagram: partial syntax tree over a b b c d e, with B added above d]

Chart 98




aABe ⇒ S

[Diagram: complete syntax tree over a b b c d e, with root S above a A B e]

Chart 99

Bottom-Up Parsing Example: the cat sees a rat.

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Example input: the cat sees a rat.

the cat sees a rat. ⇒ the Noun sees a rat.

[Diagram: partial syntax tree with Noun above cat]

Chart 100




the Noun sees a rat. ⇒ Subject sees a rat.

[Diagram: partial syntax tree with Subject above the Noun]

Chart 101




Subject sees a rat. ⇒ Subject Verb a rat.

[Diagram: partial syntax tree with Verb added above sees]

Chart 102




Subject Verb a rat. ⇒ Subject Verb a Noun.

[Diagram: partial syntax tree with a second Noun added above rat]

Chart 103




Subject Verb a Noun. ⇒ Subject Verb Object.

[Diagram: partial syntax tree with Object added above a Noun]

What would happen if we chose ‘Subject → a Noun’ instead of ‘Object → a Noun’?

Chart 104




Subject Verb Object. ⇒ Sentence

[Diagram: complete syntax tree with root Sentence above Subject Verb Object .]

Chart 105

Top-Down Parsing

The parser examines the terminal symbols of the input string, in order from left to right.

The parser reconstructs its syntax tree from the top (root node) down (towards the terminal nodes).

An attempt to find the leftmost derivation for an input string

Chart 106

Top-Down Parsers

General rules for top-down parsers
– Start with just a stub for the root node.
– At each step the parser takes the leftmost stub.
– If the stub is labeled by terminal symbol t, the parser connects it to the next input terminal symbol, which must be t. (If not, the parser has detected a syntactic error.)
– If the stub is labeled by nonterminal symbol N, the parser chooses one of the production rules N ::= X1…Xn, and grows branches from the node labeled by N to new stubs labeled X1, …, Xn (in order from left to right).
– Parsing succeeds when and if the whole input string is connected up to the syntax tree.

Chart 107

Top-Down Parsing

Two forms
– Backtracking parsers
  Guess which rule to apply, then back up and change the choice if they cannot proceed
– Predictive parsers
  Predict which rule to apply by using look-ahead tokens

Backtracking parsers are not very efficient. We will cover predictive parsers.

Chart 108

Predictive Parsers

Many types
– LL(1) parsing
  First L is for scanning the input from left to right; second L is for producing a left-most derivation; 1 is for using one input symbol of look-ahead
  Table driven, with an explicit stack to maintain the parse tree
– Recursive descent parsing
  Uses recursive subroutines to traverse the parse tree

Chart 109

Predictive Parsers (Lookahead)

Lookahead in predictive parsing
– The lookahead token (next token in the input) is used to determine which rule should be used next
– For example:
  1. term → num term’
  2. term’ → ‘+’ num term’ | ‘-’ num term’ | ε
  3. num → ‘0’ | ‘1’ | ‘2’ | … | ‘9’

[Figure: top-down parse of 7 + 3 - 2; term is expanded to num term’ and num is matched against 7.]

Chart 110



[Figure: the parse continues; with lookahead +, term’ is expanded to ‘+’ num term’ and num is matched against 3; with lookahead -, the next term’ is expanded to ‘-’ num term’.]

Chart 111



[Figure: num is matched against 2; at the end of the input the last term’ is expanded to ε.]
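The same lookahead decisions can be written as a small Java recognizer for the grammar term → num term’. This is a minimal sketch, not the course's parser: the class name TermParser, the use of single characters as tokens, and the '$' end marker are all assumptions made for illustration.

```java
// Hypothetical predictive recognizer for
//   term  -> num term'
//   term' -> '+' num term' | '-' num term' | empty
//   num   -> '0' | ... | '9'
class TermParser {
    private final String input;
    private int pos = 0;

    TermParser(String input) { this.input = input; }

    // The lookahead token: the next character, or '$' at end of input.
    private char lookahead() {
        return pos < input.length() ? input.charAt(pos) : '$';
    }

    private void parseTerm() {
        parseNum();
        parseTermPrime();
    }

    private void parseTermPrime() {
        char la = lookahead();
        if (la == '+' || la == '-') {
            pos++;              // consume the operator
            parseNum();
            parseTermPrime();
        }
        // Any other lookahead selects the empty alternative.
    }

    private void parseNum() {
        char la = lookahead();
        if (la < '0' || la > '9') {
            throw new IllegalStateException("digit expected at position " + pos);
        }
        pos++;
    }

    // True if the whole string is a term.
    static boolean recognizes(String s) {
        TermParser p = new TermParser(s);
        try {
            p.parseTerm();
            return p.pos == s.length();
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

For the input 7+3-2 the recognizer makes the same choices as the charts: the lookahead + or - selects an operator alternative of term’, and anything else selects the empty alternative.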

Chart 112

Recursive-Descent Parsing

Top-down parsing algorithm
– Consists of a group of methods (programs) parseN, one for each nonterminal symbol N of the grammar.
– The task of each method parseN is to parse a single N-phrase.
– These parsing methods cooperate to parse complete sentences.

Chart 113




[Figure: Sentence node with four stubs (Subject, Verb, Object, .) above the input “the cat sees a rat.”]

a. Decide which production rule to apply. There is only one, rule 1. This step creates four stubs.

Chart 114




Sentence


the cat sees a rat

Noun

Chart 115




Sentence


the cat sees a rat

Noun

Chart 116




Sentence


the cat sees a rat

Noun

Chart 117




Sentence


the cat sees a rat

Noun Noun

Chart 118




Sentence


the cat sees a rat

Noun Noun

Chart 119




Sentence


the cat sees a rat

Noun Noun

Chart 120

Recursive-Descent Parser for Micro-English

ParseSentence
ParseSubject
ParseObject
ParseVerb
ParseNoun


Chart 121


ParseSentence
    parseSubject
    parseVerb
    parseObject
    parseEnd

Sentence → Subject Verb Object .

Chart 122


ParseSubject
    if input = “I”
        accept
    else if input = “a”
        accept
        parseNoun
    else if input = “the”
        accept
        parseNoun
    else error

Subject → I | a Noun | the Noun

Chart 123


ParseNoun
    if input = “cat”
        accept
    else if input = “mat”
        accept
    else if input = “rat”
        accept
    else error

Noun → cat | mat | rat

Chart 124


ParseObject
    if input = “me”
        accept
    else if input = “a”
        accept
        parseNoun
    else if input = “the”
        accept
        parseNoun
    else error

Object → me | a Noun | the Noun

Chart 125


ParseVerb
    if input = “like”
        accept
    else if input = “is”
        accept
    else if input = “see”
        accept
    else if input = “sees”
        accept
    else error

Verb → like | is | see | sees

Chart 126


ParseEnd
    if input = “.”
        accept
    else error

End → .

Chart 127

Systematic Development of a Recursive-Descent Parser

Given a (suitable) context-free grammar
– Express the grammar in EBNF, with a single production rule for each nonterminal symbol, and perform any necessary grammar transformations
  Always eliminate left recursion
  Always left-factorize whenever possible
– Transcribe each EBNF production rule N ::= X to a parsing method parseN, whose body is determined by X
– Make the parser consist of:
  A private variable currentToken
  Private parsing methods developed in the previous step
  Private auxiliary methods accept and acceptIt, both of which call the scanner
  A public parse method that calls parseS (where S is the start symbol of the grammar), having first called the scanner to store the first input token in currentToken
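As an illustration of this recipe, here is a hedged Java sketch of a recursive-descent recognizer for the Micro-English grammar used earlier. The names currentToken, accept, acceptIt, and parse follow the slide; the class name, the word-splitting "scanner", the end-of-text marker, and the use of IllegalStateException for syntax errors are assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: a recognizer for
//   Sentence ::= Subject Verb Object .
//   Subject  ::= I | a Noun | the Noun
//   Object   ::= me | a Noun | the Noun
class MicroEnglishParser {
    private static final List<String> NOUNS = Arrays.asList("cat", "mat", "rat");
    private static final List<String> VERBS = Arrays.asList("like", "is", "see", "sees");
    private static final String EOT = "<end of text>";

    private final String[] words;   // stands in for a real scanner
    private int nextWord = 0;
    private String currentToken;

    MicroEnglishParser(String text) {
        // Treat "." as a token of its own, then split on white space.
        words = text.replace(".", " . ").trim().split("\\s+");
    }

    private String scan() {
        return nextWord < words.length ? words[nextWord++] : EOT;
    }

    private void acceptIt() { currentToken = scan(); }

    private void accept(String expected) {
        if (!currentToken.equals(expected)) {
            throw new IllegalStateException("expected " + expected);
        }
        acceptIt();
    }

    private void parseSentence() {
        parseSubject();
        parseVerb();
        parseObject();
        accept(".");
    }

    private void parseSubject() { parseNounPhraseOr("I"); }

    private void parseObject() { parseNounPhraseOr("me"); }

    // Shared body: pronoun | a Noun | the Noun
    private void parseNounPhraseOr(String pronoun) {
        if (currentToken.equals(pronoun)) {
            acceptIt();
        } else if (currentToken.equals("a") || currentToken.equals("the")) {
            acceptIt();
            parseNoun();
        } else {
            throw new IllegalStateException("unexpected " + currentToken);
        }
    }

    private void parseNoun() {
        if (!NOUNS.contains(currentToken)) throw new IllegalStateException("noun expected");
        acceptIt();
    }

    private void parseVerb() {
        if (!VERBS.contains(currentToken)) throw new IllegalStateException("verb expected");
        acceptIt();
    }

    // Public entry point: load the first token, then parse the start symbol.
    void parse() {
        currentToken = scan();
        parseSentence();
        if (!currentToken.equals(EOT)) throw new IllegalStateException("extra input");
    }

    static boolean isSentence(String text) {
        try {
            new MicroEnglishParser(text).parse();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

Subject and Object share one helper here only to keep the sketch short; a direct transcription would give each nonterminal its own method body.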

Chart 128

Quote of the Week

“C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows away your whole leg.”– Bjarne Stroustrup

Chart 129

Quote of the Week

Did you really say that?

Dr. Bjarne Stroustrup: Yes, I did say something along the lines of C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows your whole leg off. What people tend to miss is that what I said about C++ is to a varying extent true for all powerful languages. As you protect people from simple dangers, they get themselves into new and less obvious problems. Someone who avoids the simple problems may simply be heading for a not-so-simple one. One problem with very supporting and protective environments is that the hard problems may be discovered too late or be too hard to remedy once discovered. Also, a rare problem is harder to find than a frequent one because you don't suspect it.

I also said, "Within C++, there is a much smaller and cleaner language struggling to get out." For example, that quote can be found on page 207 of The Design and Evolution of C++. And no, that smaller and cleaner language is not Java or C#. The quote occurs in a section entitled "Beyond Files and Syntax". I was pointing out that the C++ semantics is much cleaner than its syntax. I was thinking of programming styles, libraries and programming environments that emphasized the cleaner and more effective practices over archaic uses focused on the low-level aspects of C.

Chart 130

Converting EBNF Production Rules to Parsing Methods

For production rule N ::= X
– Convert the production rule to a parsing method named parseN
    private void parseN() {
        parse X
    }
– Refine parse ε to a dummy statement
– Refine parse t (where t is a terminal symbol) to accept(t) or acceptIt()
– Refine parse N (where N is a nonterminal symbol) to a call of the corresponding parsing method
    parseN();
– Refine parse X Y to
    {
        parse X
        parse Y
    }
– Refine parse X | Y to
    switch (currentToken.kind) {
    cases in starters[[X]]:
        parse X
        break;
    cases in starters[[Y]]:
        parse Y
        break;
    default:
        report a syntax error
    }

Chart 131


For X | Y
– Choose parse X only if the current token is one that can start an X-phrase
– Choose parse Y only if the current token is one that can start a Y-phrase
  starters[[X]] and starters[[Y]] must be disjoint

For X*
– Choose
    while (currentToken.kind is in starters[[X]])
        parse X
  starters[[X]] must be disjoint from the set of tokens that can follow X* in this particular context

Chart 132


A grammar that satisfies both these conditions is called an LL(1) grammar

Recursive-descent parsing is suitable only for LL(1) grammars
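The first LL(1) condition (disjoint starter sets for the alternatives of X | Y) is easy to check mechanically once the starter sets are known. The sketch below assumes the sets are written out by hand; the class name StarterCheck and its helper set are hypothetical, and computing starters[[X]] from a grammar is not shown.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for checking that the alternatives of one
// production rule can be told apart by a single lookahead token.
class StarterCheck {
    // Convenience constructor for a starter set.
    static Set<String> set(String... tokens) {
        return new HashSet<>(Arrays.asList(tokens));
    }

    // True when no token appears in more than one starter set.
    static boolean pairwiseDisjoint(List<Set<String>> starterSets) {
        Set<String> seen = new HashSet<>();
        for (Set<String> starters : starterSets) {
            for (String token : starters) {
                if (!seen.add(token)) {
                    return false;   // token already starts another alternative
                }
            }
        }
        return true;
    }
}
```

For Subject ::= I | a Noun | the Noun the starter sets {I}, {a}, and {the} are disjoint, so one token of lookahead suffices. For the Mini-Triangle Command rule, the alternatives V-name := Expression and Identifier ( Expression ) both start with an Identifier, so the check fails until the rule is left-factorized.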

Chart 133

Error Repair

Good programming languages are designed with a relatively large “distance” between syntactically correct programs, to increase the likelihood that conceptual mistakes are caught as syntactic errors.

Error repair usually occurs at two levels:
– Local: repairs mistakes with little global import, such as missing semicolons and undeclared variables.
– Scope: repairs the program text so that scopes are correct. Errors of this kind include unbalanced parentheses and begin/end blocks.

Chart 134

Error Repair

Repair actions can be divided into insertions and deletions. Typically the compiler will use some look-ahead and backtracking in attempting to make progress in the parse. There is great variation among compilers, though some languages (PL/C) carry a tradition of good error repair.

Goals of error repair are:
– No input should cause the compiler to collapse
– Illegal constructs are flagged
– Frequently occurring errors are repaired gracefully
– Minimal stuttering or cascading of errors

LL-style parsing lends itself well to error repair, since the compiler uses the grammar’s rules to predict what should occur next in the input.

Chart 135

Mini-Triangle Production Rules

Program ::= Command                                    Program (1.14)

Command ::= V-name := Expression                       AssignCommand (1.15a)
         |  Identifier ( Expression )                  CallCommand (1.15b)
         |  Command ; Command                          SequentialCommand (1.15c)
         |  if Expression then Command else Command    IfCommand (1.15d)
         |  while Expression do Command                WhileCommand (1.15e)
         |  let Declaration in Command                 LetCommand (1.15f)

Expression ::= Integer-Literal                         IntegerExpression (1.16a)
            |  V-name                                  VnameExpression (1.16b)
            |  Operator Expression                     UnaryExpression (1.16c)
            |  Expression Operator Expression          BinaryExpression (1.16d)

V-name ::= Identifier                                  SimpleVname (1.17)

Declaration ::= const Identifier ~ Expression          ConstDeclaration (1.18a)
             |  var Identifier : Type-denoter          VarDeclaration (1.18b)
             |  Declaration ; Declaration              SequentialDeclaration (1.18c)

Type-denoter ::= Identifier                            SimpleTypeDenoter (1.19)

Chart 136

Abstract Syntax Trees

An explicit representation of the source program’s phrase structure

AST for Mini-Triangle

Chart 137


Program ASTs (P):

[Figure: AST shape for Program (1.14), with a single Command subtree C.]

Program ::= Command Program (1.14)

Command ASTs (C):

[Figure: AST shapes for AssignCommand (1.15a) with subtrees V and E; CallCommand (1.15b) with subtrees Identifier (carrying its spelling) and E; SequentialCommand (1.15c) with subtrees C1 and C2.]

Command ::= V-name := Expression AssignCommand (1.15a)
 | Identifier ( Expression ) CallCommand (1.15b)
 | Command ; Command SequentialCommand (1.15c)

Chart 138


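Command ASTs (C):

[Figure: AST shapes for IfCommand (1.15d), WhileCommand (1.15e), and LetCommand (1.15f); by the production rules below, their subtrees are E, C1, C2 for IfCommand, E, C for WhileCommand, and D, C for LetCommand.]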

Command ::=
 | if Expression then Command else Command IfCommand (1.15d)
 | while Expression do Command WhileCommand (1.15e)
 | let Declaration in Command LetCommand (1.15f)
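One way to represent these ASTs in Java is a small class hierarchy with one class per production rule. The sketch below is an assumption-laden simplification, covering only a few of the rules; the field names V, E, I, C, C1, and C2 mirror the letters on the charts, and this is not claimed to be the class design of the actual Triangle compiler (see Appendix D for that).

```java
// Simplified, hypothetical AST classes for part of Mini-Triangle.
abstract class AST { }
abstract class Command extends AST { }
abstract class Expression extends AST { }

class Identifier extends AST {
    final String spelling;
    Identifier(String spelling) { this.spelling = spelling; }
}

// V-name ::= Identifier                          (1.17)
class SimpleVname extends AST {
    final Identifier I;
    SimpleVname(Identifier I) { this.I = I; }
}

// Expression ::= Integer-Literal                 (1.16a)
class IntegerExpression extends Expression {
    final String spelling;
    IntegerExpression(String spelling) { this.spelling = spelling; }
}

// Expression ::= V-name                          (1.16b)
class VnameExpression extends Expression {
    final SimpleVname V;
    VnameExpression(SimpleVname V) { this.V = V; }
}

// Command ::= V-name := Expression               (1.15a)
class AssignCommand extends Command {
    final SimpleVname V;
    final Expression E;
    AssignCommand(SimpleVname V, Expression E) { this.V = V; this.E = E; }
}

// Command ::= Identifier ( Expression )          (1.15b)
class CallCommand extends Command {
    final Identifier I;
    final Expression E;
    CallCommand(Identifier I, Expression E) { this.I = I; this.E = E; }
}

// Command ::= Command ; Command                  (1.15c)
class SequentialCommand extends Command {
    final Command C1;
    final Command C2;
    SequentialCommand(Command C1, Command C2) { this.C1 = C1; this.C2 = C2; }
}

// Command ::= while Expression do Command        (1.15e)
class WhileCommand extends Command {
    final Expression E;
    final Command C;
    WhileCommand(Expression E, Command C) { this.E = E; this.C = C; }
}
```

A command such as x := 1; putint(x) would then be a SequentialCommand whose C1 is an AssignCommand and whose C2 is a CallCommand.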

Chart 139

Midterm Review: Chapter 1

Context-free Grammar
– A finite set of terminal symbols
– A finite set of non-terminal symbols
– A start symbol
– A finite set of production rules

Aspects of a programming language that need to be specified
– Syntax: form of programs
– Contextual constraints: scope rules and type rules
– Semantics: meaning of programs

Chart 140


Language specification
– Informal: written in English
– Formal: precise notation (BNF, EBNF)
  Unambiguous
  Consistent
  Complete

Context-free language
– Syntax tree
– Phrase
– Sentence

Chart 141


Syntax tree
– Terminal nodes labeled by terminal symbols
– Non-terminal nodes labeled by non-terminal symbols

Abstract Syntax Tree (AST)
– Each non-terminal node is labeled by a production rule
– Each non-terminal node has exactly one subtree for each subprogram
– Does not generate sentences

Chart 142


Translator
– Accepts any text expressed in one language (source language) and generates a semantically-equivalent text expressed in another language (target language)

Compiler
– Translates from a high-level language into a low-level language

Interpreter
– A program that accepts any program (source program) expressed in a particular language (source language) and runs that source program immediately

Chart 143


Interpretive compiler
– Combination of compiler and interpreter
  Some of the advantages of each

Portable compiler
– Compiled and run on any machine, without change
– Portability measured by proportion of code that remains unchanged
– Portability is an economic issue

Bootstrapping
– Using the language processor to process itself

Tombstone diagrams

Chart 144

Midterm Review: Chapter 3

Three phases of compilation
– Syntactic analysis
– Contextual analysis
– Code generation

Single pass compilers

Multi-pass compilers

Compiler design issues
– Speed
– Space
– Modularity
– Flexibility
– Semantic preserving transformations
– Source language properties

Chart 145


Sub-phases of syntactic analysis
– Scanning (lexical analysis)
  Source program transformed to stream of tokens
  Comments and blank spaces between tokens are discarded
– Parsing
  Source program in form of stream of tokens parsed to determine phrase structure
  Parser treats each token as a terminal symbol
– Representation of the phrase structure
  A data structure representing the source program’s phrase structure
  Typically an abstract syntax tree (AST)

Chart 146


Tokens
– An atomic symbol of the source program
– May consist of several characters
– Classified according to kind
  All tokens of the same kind can be freely interchanged without affecting the program’s phrase structure
– Each token is completely described by its kind and spelling
  Token represented by a tuple (kind, spelling)
– Only the kind of each token is examined by the parser
  Spelling examined by contextual analyzer and/or code generator
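A token as a (kind, spelling) pair, together with a toy scanner, might look like the following Java sketch. The four token kinds and the scanning rules (letters start identifiers, digits start integer literals, any other non-blank character is a one-character operator) are assumptions chosen to keep the example short; they are not the Triangle scanner.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token type: a kind plus a spelling.
class Token {
    enum Kind { IDENTIFIER, INTLITERAL, OPERATOR, EOT }

    final Kind kind;
    final String spelling;

    Token(Kind kind, String spelling) {
        this.kind = kind;
        this.spelling = spelling;
    }
}

// Toy scanner: turns source text into a list of tokens, discarding blanks.
class TinyScanner {
    static List<Token> scan(String source) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                        // white space separates tokens
                continue;
            }
            int start = i;
            if (Character.isLetter(c)) {
                while (i < source.length() && Character.isLetterOrDigit(source.charAt(i))) i++;
                tokens.add(new Token(Token.Kind.IDENTIFIER, source.substring(start, i)));
            } else if (Character.isDigit(c)) {
                while (i < source.length() && Character.isDigit(source.charAt(i))) i++;
                tokens.add(new Token(Token.Kind.INTLITERAL, source.substring(start, i)));
            } else {
                i++;                        // single-character operator
                tokens.add(new Token(Token.Kind.OPERATOR, source.substring(start, i)));
            }
        }
        tokens.add(new Token(Token.Kind.EOT, ""));   // end-of-text marker
        return tokens;
    }
}
```

A parser built on this would look only at the kind field, while later phases would read the spelling, as the chart describes.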

Chart 147


Grammars
– Regular expressions
  “|” separates alternatives
  “*” indicates that the previous item may be repeated zero or more times
  “(” and “)” are grouping parentheses
  ε is the empty string, a special string of length 0
  Algebraic properties
  Common extensions
– Grammar transformations
  Left factorization
  Elimination of left recursion
  Substitution of non-terminal symbols

Chart 148


Structure of compiler
– Source code
– Lexical analyzer
– Parser & semantic analyzer
– Intermediate code generation
– Optimization
– Assembly code generation
– Assembly code

Chart 149


Scanning (lexical analysis)
– What does it do?
  Handles keywords (reserved words)
  Removes white space (tabs, spaces, new lines)
  Removes comments
  Performs look-ahead
  Error handling
– Issues
  Simpler design
  Improve compiler efficiency
  Enhance compiler portability

Chart 150


Parsing
– Given an unambiguous, context-free grammar
  Recognition of an input string: is it a sentence in the grammar?
  Parsing an input string: determines its phrase structure
– Why is unambiguous important?
– Advantages of unambiguous, context-free grammars (see chart 81)
– How do you know the syntax of a language is legal?
  A legal program can be derived from the start symbol of the grammar

Chart 151


Parsing
– Rightmost (replace the rightmost non-terminal in each step) and leftmost (replace the leftmost non-terminal in each step) derivation
– Bottom-up (reconstructs syntax tree from terminal nodes up toward the root node) and top-down (reconstructs syntax tree from the root node down towards the terminal nodes)
– Predictive parsers
  LL(1)
  Recursive descent

Chart 152


Parsing
– Converting EBNF production rules to parsing methods
– Error repair

Chart 153

Chapter 5: Contextual Analysis

Identification
– Monolithic Block Structure
– Flat Block Structure
– Nested Block Structure
– Attributes
– Standard Environment

Type Checking

A Contextual Analysis Algorithm

Case Study: Contextual Analysis in the Triangle Compiler

Chart 154

Contextual Analysis

Given a parsed program, the purpose of contextual analysis is to check that the program conforms to the source language’s contextual constraints.
– Scope rules: rules governing declarations and applied occurrences of identifiers
– Type rules: rules that allow us to infer the types of expressions, and to decide whether each expression has a valid type

Analysis of the program to determine correctness with respect to the language definition (beyond structure)

Chart 155

Contextual Analysis

Contextual analysis consists of two sub-phases:
– Identification: applying the source language’s scope rules to relate each applied occurrence of an identifier to its declaration (if any).
– Type checking: applying the source language's type rules to infer the type of each expression, and compare that type with the expected type.

Chart 156


[Figure: compiler structure diagram showing Source code, Lexical Analyzer, tokens, Parser, parse tree, Semantic Analyzer, Optimization, Symbol Table, and Assembly code; the Semantic Analyzer is broken down into Identification and Type checking.]

Chart 157

Identification

Relate each applied occurrence of an identifier in the source program to the corresponding declaration
– Ill-formed program if no corresponding declaration: generate error

Identification could cause compiler efficiency problems

Inefficient to use the AST

Chart 158

Identification Table

Also known as symbol table

Associates identifiers with their attributes

Basic operations
– Make the identification table empty
– Add an entry associating a given identifier with a given attribute
– Retrieve the attribute (if any) associated with a given identifier

Attribute
– Consists of information relevant to contextual analysis
– Obtained from the identifier’s declaration

Chart 159

Identification Table

Each declaration in a program has a defined scope
– Portion of program over which the declaration takes effect

Block: any program phrase that delimits the scope of declarations within it

Example: Triangle block command
– let D in C
  Scope of each declaration in D extends over the subcommand C

Chart 160

Identification Table: Structure/Implementation

Maintain scope
– An identifier should be found in the table only when valid
– If an identifier is defined in multiple scopes, then a lookup in the table must provide the appropriate meaning for the use

Efficiency
– How fast is lookup?
– How fast to enter/exit a scope?
– What is the overall table size?

Chart 161

Identification Table: Structure/Implementation

Different implementations
– Organized for efficient retrieval
– Binary search tree
– Hash table

Chart 162

Identification Table: Functionality

A mapping of identifiers to their meanings

Information
– Name
– Type
– Location

Operations
– Create
– Insert
– Lookup
– Delete
– Update entry
– Entering a new scope
– Leaving a scope

Chart 163

Block Structures

Monolithic block structure
– Basic and Cobol

Flat block structure
– Fortran

Nested block structure
– Pascal, Ada, C, and Java

Chart 164

Monolithic Block Structure

The only block is the entire program

All declarations are global

Simple rules
– No identifier may be declared more than once
– For every applied occurrence of an identifier I, there must be a corresponding declaration of I
  No identifier may be used unless declared

The identification table should contain entries for all declarations in the source program
– At most, one entry for each identifier
– The table contains an identifier I and the associated attribute A

Chart 165

Monolithic Block Structure

Program
    (1) integer b = 10
    (2) integer n
    (3) char c
begin
    …
    n = n * b
    …
    write c
    …
end

Identification   Attribute
b                (1)
n                (2)
c                (3)

• Create new table: create command
• At declaration for identifier I, make table entry: insert command
• At applied occurrence of identifier I, retrieve information from table: lookup command
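For a monolithic block structure the identification table can be little more than a map. The following Java sketch is an assumption about how one might code it, not the course's implementation: the class and method names (MonolithicIdTable, enter, retrieve) are made up, and attributes are stored as plain strings such as "(1)" to match the chart.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical identification table for a single, global scope.
class MonolithicIdTable {
    private final Map<String, String> entries = new HashMap<>();

    // Insert: returns false if the identifier is already declared,
    // which the contextual analyzer would report as an error.
    boolean enter(String id, String attribute) {
        if (entries.containsKey(id)) {
            return false;
        }
        entries.put(id, attribute);
        return true;
    }

    // Lookup: returns the attribute, or null if the identifier is undeclared.
    String retrieve(String id) {
        return entries.get(id);
    }
}
```

With the program on this chart, the analyzer would enter b, n, and c at their declarations and retrieve n and b when it reaches n = n * b.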

Chart 166

Flat Block Structure

Program partitioned into several disjoint blocks

Two scope levels
– Some declarations are local in scope
  Identifiers restricted to a particular block
– Other declarations are global in scope
  Identifiers allowed anywhere in the program; the program as a whole is a block

Less simple rules
– No globally declared identifier may be redeclared globally
  But the same identifier may also be declared locally
– No locally declared identifier may be redeclared in the same block
  The same identifier may be declared locally in several different blocks
– For every applied occurrence of an identifier I in a block B, there must be a corresponding declaration of I
  Either a global declaration of I or a declaration of I local to B

Minor complication is to distinguish global and local declaration entries

Chart 167

Flat Block Structure

• Create new table: create command
• At start of a block: enter new scope command
• At end of a block: leave scope command, delete command
• At declaration for identifier I, make table entry: insert command
• At applied occurrence of identifier I, retrieve information from table: lookup command

program
    (1) procedure Q
        (2) real r
        (3) real pi = 3.14
        begin … end
    (4) procedure R
        (5) integer c
        begin … end
    (6) integer i
    (7) boolean b
    (8) char c
    begin … call R … end

Table while inside Q:
Identification   Attribute   Level
Q                (1)         global
r                (2)         local
pi               (3)         local

Table while inside R:
Identification   Attribute   Level
Q                (1)         global
R                (4)         global
c                (5)         local

Table after leaving R:
Identification   Attribute   Level
Q                (1)         global
R                (4)         global

Table while inside the main program block:
Identification   Attribute   Level
Q                (1)         global
R                (4)         global
i                (6)         local
b                (7)         local
c                (8)         local

Chart 168

Nested Block Structure

Blocks may be nested one within another

Many scope levels
– Declarations in the outermost block are global in scope
  The outermost block is at scope level 1
– Declarations inside an inner block are local to that block
  Every inner block is completely enclosed by another block
  The block next to the outermost block is at scope level 2
  If enclosed by a level-n block, the block is at scope level n+1

Chart 169

Nested Block Structure

More complex rules
– No identifier may be declared more than once in the same block
  The same identifier may be declared in different blocks, even if they are nested
– For every applied occurrence of an identifier I in a block B, there must be a corresponding declaration of I
  Must be in B itself
  Or in the block B’ immediately enclosing B
  Or in B’’ immediately enclosing B’
  Etc.
  That is, in the smallest enclosing block that contains any declaration of I

Chart 170

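Nested Block Structure

• Create new table: create command
• At start of a block: enter new scope command
• At end of a block: leave scope command, delete command
• At declaration for identifier I, make table entry: insert command
  Level number determined by number of calls to enter new scope
• At applied occurrence of identifier I, retrieve information from table using highest level for I: lookup command

let
    (1) var a: Integer;
    (2) var b: Boolean
in
    begin
        …;
        let
            (3) var b: Integer;
            (4) var c: Boolean
        in
            begin
                …;
                let
                    (5) var d: Integer
                in …;
                …
            end;
        …;
        let
            (6) var d: Boolean;
            (7) var e: Integer
        in …;
        …
    end

Table in the outermost block:
Identification   Attribute   Level
a                (1)         1
b                (2)         1

Table inside the block declaring d at (5):
Identification   Attribute   Level
a                (1)         1
b                (2)         1
b                (3)         2
c                (4)         2
d                (5)         3

Table after leaving the level-3 block:
Identification   Attribute   Level
a                (1)         1
b                (2)         1
b                (3)         2
c                (4)         2

Table inside the block declaring d at (6) and e at (7):
Identification   Attribute   Level
a                (1)         1
b                (2)         1
d                (6)         2
e                (7)         2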
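For nested blocks, one common way to get "highest level wins" lookup is a stack of per-scope maps. The Java sketch below is one possible design under stated assumptions: the names NestedIdTable, openScope, closeScope, enter, and retrieve are hypothetical, attributes are plain strings, and efficiency concerns (lookup walks every open scope) are ignored.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical identification table for nested block structure.
class NestedIdTable {
    // The innermost scope is at the head of the deque.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    NestedIdTable() {
        openScope();    // the outermost block is scope level 1
    }

    // Called at the start of a block.
    void openScope() {
        scopes.push(new HashMap<String, String>());
    }

    // Called at the end of a block: discards all entries of that block.
    void closeScope() {
        scopes.pop();
    }

    // Current scope level, determined by the number of open scopes.
    int level() {
        return scopes.size();
    }

    // Returns false if the identifier is already declared in this block.
    boolean enter(String id, String attribute) {
        Map<String, String> current = scopes.peek();
        if (current.containsKey(id)) {
            return false;
        }
        current.put(id, attribute);
        return true;
    }

    // Searches from the innermost scope outward; null if undeclared.
    String retrieve(String id) {
        for (Map<String, String> scope : scopes) {
            String attribute = scope.get(id);
            if (attribute != null) {
                return attribute;
            }
        }
        return null;
    }
}
```

Replaying the chart: after entering a and b at level 1 and then b and c at level 2, a lookup of b finds declaration (3); once the inner scope is closed, the same lookup finds declaration (2) again.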

Chart 171

Attributes

Kind
– constant
– variable
– procedure
– function
– type

Type
– boolean
– character
– integer
– record
– array

Examples

Chart 172

Attributes

Information to be extracted from declaration
– Constant, variable, procedure, function, type
– Procedure or function declaration includes a list of formal parameters, each of which may be a constant, variable, procedural, or functional parameter
– Language provides whole families of record and array types

How to manage attribute information
– Extract type information from declarations and store it in the identification table
  Could be complex for a realistic programming language
  Could require tedious programming
– Use the AST
  Pointers in the identification table point to the location in the AST of that identifier’s declaration

Chart 173

Attributes

[Figure: AST of a program with two nested LetCommands. The outer LetCommand has a SequentialDeclaration of two VarDeclarations (identifiers a and b); the inner LetCommand has a SequentialDeclaration of two VarDeclarations (identifiers d and e). Two identification tables are shown: one with a and b at level 1 (attributes (1), (2)), and one with a and b at level 1 plus d and e at level 2 (attributes (6), (7)); each attribute points to the corresponding VarDeclaration node in the AST.]

Chart 174

Standard Environment

Predefined constants, variables, types, procedures, and functions

These are loaded into the identification table

Scope rules for standard environment
– Scope enclosing the entire program
  Level 0
– Same scope level as global declarations
  Example is C

Chart 175


[Figure: compiler structure diagram showing Source code, Lexical Analyzer, tokens, Parser, parse tree, Semantic Analyzer, Optimization, Symbol Table, and Assembly code; the Semantic Analyzer is broken down into Identification and Type checking.]

Chart 176

Type Checking

The second task of the contextual analyzer is to ensure that the source program contains no type errors

Once an applied occurrence of an identifier has been identified, the contextual analyzer will check that the identifier is used in a way consistent with its declaration

Chart 177

Type Checking

In a statically typed language, the compiler can detect any type errors without actually running the program
– For every expression E in the language, the compiler can infer either that E has some type T or that E is ill-typed
  If E does have type T, then E will always yield a value of type T
  If a value of type T’ is expected, then the compiler checks that T’ is equivalent to T

Chart 178

Type Checking

Infers the type of each expression bottom-up
– Starting with literals and identifiers, and working up through larger and larger subexpressions
– Literal: The type of a literal is immediately known
– Identifier: The type of an applied occurrence of identifier I is obtained from the corresponding declaration of I
– Unary operator application:
  Consider “O E” where O is a unary operator of type T1 → T2
  Type checker ensures that E’s type is equivalent to T1
  Infers that the type of “O E” is T2
  Otherwise a type error
– Binary operator application:
  Consider “E1 O E2” where O is a binary operator of type T1 × T2 → T3
  E1’s type is equivalent to T1
  E2’s type is equivalent to T2
  “E1 O E2” is of type T3
  Otherwise a type error
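The unary and binary rules above translate almost directly into code. The following Java sketch is illustrative only: the class name TypeChecker, the three-valued Type enum (with ERROR standing for "ill-typed"), and the small operator table are assumptions, not the operators or types of the Triangle standard environment.

```java
// Hypothetical bottom-up type inference for operator applications.
class TypeChecker {
    enum Type { INT, BOOL, ERROR }

    // "O E": if O has type T1 -> T2 and E has type T1, the result is T2.
    static Type checkUnary(String op, Type operand) {
        switch (op) {
            case "-":                       // int -> int
                return operand == Type.INT ? Type.INT : Type.ERROR;
            case "not":                     // bool -> bool
                return operand == Type.BOOL ? Type.BOOL : Type.ERROR;
            default:
                return Type.ERROR;          // unknown operator
        }
    }

    // "E1 O E2": if O has type T1 x T2 -> T3 and the operand types
    // match T1 and T2, the result is T3.
    static Type checkBinary(Type left, String op, Type right) {
        switch (op) {
            case "+":
            case "-":
            case "*":                       // int x int -> int
                return left == Type.INT && right == Type.INT ? Type.INT : Type.ERROR;
            case "<":                       // int x int -> bool
                return left == Type.INT && right == Type.INT ? Type.BOOL : Type.ERROR;
            case "and":                     // bool x bool -> bool
                return left == Type.BOOL && right == Type.BOOL ? Type.BOOL : Type.ERROR;
            default:
                return Type.ERROR;          // unknown operator
        }
    }
}
```

Because each call takes the already-inferred types of the subexpressions, nesting the calls mirrors the bottom-up order: the type of (a + b) < c is found by checking + first and then feeding its result to <.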

Chart 179

Type Checking

Type of a nontrivial expression is inferred from the types of its sub-expressions, using the appropriate type rules

Must be able to test if two given types T and T’ are equivalent

Chart 180

Type Checking – Constant or Variable Identifier

[Figure: AST fragments before and after type checking. A ConstDeclaration for identifier x has an expression of type T; the applied occurrence of x (a SimpleVname node) is then decorated with type T.]

Chart 181

Type Checking – Variable Declaration

[Figure: AST fragments before and after type checking. A VarDeclaration for identifier x has type T; the applied occurrence of x (a SimpleVname node) is then decorated with type T.]

Chart 182

Type Checking – Binary Operator

[Figure: AST fragments before and after type checking. A BinaryExpression has two operands of type int and the operator <; after type checking, the BinaryExpression node is decorated with type bool.]

< is of type int × int → bool

Chart 183

Type Checking

Each applied occurrence of an identifier must be identified before type checking can proceed

Chart 184

Chapter 6: Run-time Organization

Marshal the resources of the target machine (instructions, storage, and system software) in order to implement the source language

Chart 185

Chapter 6: Run-time Organization

Data Representation
– How should we represent the values of each source-language type in the target machine?

Expression Evaluation
– How should we organize the evaluation of expressions, taking care of intermediate results?

Static Storage Allocation
– How should we organize storage for variables, taking into account the different lifetimes of global, local, and heap variables?

Stack Storage Allocation

Routines
– How should we implement procedures, functions, and parameters, in terms of low-level routines?

Heap Storage Allocation

Run-time Organization for Object-oriented Languages
– How should we represent objects and methods?

Case Study: The Abstract Machine TAM

Chart 186

Data Representation

How should we represent the values of each source-language type in the target machine?

High-Level Data Types
• Truth values
• Integers
• Characters
• Records
• Arrays
• Operations over these types

Machine Data Types
• Bits
• Bytes
• Words
• Double-words
• Low-level arithmetic and logical operations

Need to bridge the semantic gap between high-level types and machine-level types

Chart 187

Data Representation -- Fundamental Principles

Non-confusion
– Different values of a given type should have different representations
– If two different values are confused, i.e., have the same representation, then comparison of these values will incorrectly treat the values as equal
– Example: approximate representation of real numbers
  Real numbers that are slightly different mathematically might have the same approximate representation
  Difficult to avoid; need to take care during compiler design
– Must avoid confusion in the representations of discrete types such as truth values, characters, and integers
– For statically typed languages, need only be concerned with values of the same type
  00…00₂ may represent false, the integer 0, or the real number 0.0
  Compile-time type checks will distinguish the values of different types

Chart 188

Data Representation -- Fundamental Principles

Uniqueness
– Each value should always have the same representation
– Example of non-uniqueness
  Ones-complement representation of integers, in which zero is represented both by 00…00₂ and 11…11₂ (+0 and -0)
  A simple bit-string comparison would incorrectly treat these values as unequal
  A more specialized integer comparison must be used
  The alternative twos-complement representation gives us unique representations of integers

Chart 189

Data Representation – Pragmatic Issues

Constant-size representation
– The representations of all values of a given type should occupy the same amount of space
– Makes it possible for the compiler to plan the allocation of storage
– Knowing the type of a variable but not the actual value, the compiler will know exactly how much storage space the variable will occupy

Chart 190

Data Representation – Pragmatic Issues

Direct representation vs. indirect representation
– Should the values of a given type be represented directly, or indirectly through pointers?
– Direct representation
  Just the binary representation of the value, consisting of one or more bits, bytes, or words
– Indirect representation
  A handle that points to the storage area which has the binary representation of the value
  Essential for types whose values vary greatly in size, such as a list or dynamic array

Chart 191

Direct representation vs. indirect representation

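[Figure: direct representation stores the value x in place; indirect representation stores a handle that points to x. A second handle of the same size points to y, a value of the same type as x but requiring more space.]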

Chart 192

Notation

#T: cardinality of type T
– Number of distinct values of type T
– #[[Boolean]] = 2

size T: amount of space (in bits, bytes, or words) occupied by each value of type T
– For indirect representation only the handle is counted

For direct representation of type T
– size T ≥ log2(#T), or equivalently 2^(size T) ≥ #T
– size T is expressed in bits
– In n bits we can represent at most 2^n distinct values if we are to avoid confusion (the non-confusion requirement)
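The bound size T ≥ log2(#T) can be turned into a one-line calculation of the minimum number of bits for a direct representation. The class and method names in this Java sketch (TypeSize, minBits) are hypothetical; the arithmetic relies only on the standard Long.numberOfLeadingZeros method.

```java
// Hypothetical helper: minimum size in bits for a type of given cardinality.
class TypeSize {
    // Smallest n such that 2^n >= cardinality, for cardinality >= 1.
    static int minBits(long cardinality) {
        if (cardinality < 1) {
            throw new IllegalArgumentException("cardinality must be at least 1");
        }
        // The number of bits needed to write (cardinality - 1) in binary
        // is exactly the ceiling of log2(cardinality).
        return 64 - Long.numberOfLeadingZeros(cardinality - 1);
    }
}
```

For example, a Boolean (#T = 2) needs at least 1 bit, a character set with 2^8 characters needs at least 8 bits, and a type with 257 values needs 9 bits because 2^8 is one short.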

Chart 193

Primitive Types

Cannot be decomposed into simpler values

Most programming languages provide these primitive types
– Boolean, Char, Integer
– Also provide elementary logical and arithmetic operations

Machines typically support the above primitive types, so choice of representation is straightforward

Chart 194

Primitive Types Representation

Boolean
– true and false
– Since #[[Boolean]] = 2, size[[Boolean]] ≥ 1 bit
– Can represent Boolean with one bit, one byte, or one word
  For a single bit: 0 for false and 1 for true
  For a byte or word: 00…00₂ for false and either 00…01₂ or 11…11₂ for true
  Negation, conjunction, and disjunction are implemented by NOT, AND, and OR

Chart 195


Char
– Source language can specify the character set
  Ada: ISO-Latin1 character set (2^8 distinct characters)
  Java: Unicode character set (2^16 distinct characters)
– Most do not
  Allows compiler writers to choose the machine’s native character set (2^7 or 2^8 distinct characters)
– ISO defines the character representation for “A” to be 01000001₂

Can represent a character by one byte or one word

Chart 196


Integer
– Denotes an implementation-defined bounded range of integers
  Defined by the individual language processor
– Binary representation determined by target machine’s arithmetic unit and almost always occupies one word
  Can implement the language’s integer operations with the machine's integer operations
– Pascal and Triangle
  -maxint, …, -1, 0, +1, …, +maxint
  maxint is implementation defined
  #[[Integer]] = 2 × maxint + 1
  2^(size[[Integer]]) ≥ 2 × maxint + 1
  For word size of w bits, size[[Integer]] = w, maxint = 2^(w-1) - 1
– Java
  int denotes -2^31, …, -1, 0, +1, …, +2^31 - 1
  #[[int]] = 2^32

Chart 197

Record Type

Consists of several fields, each of which has an identifier
– All records of a particular type have fields with the same identifiers and types

Fundamental operation on records is field selection
– Use one field identifier to access the corresponding field

Simple representation
– Juxtapose the fields to make them occupy consecutive positions in storage
– Allows us to predict the total size of each record and the position of each field relative to the base of the record

Chart 198

Record Type

Consider the following:
    type T = record I1: T1, …, In: Tn end;
    var r: T

– $\text{size}\ T = \text{size}\ T_1 + \dots + \text{size}\ T_n$
– If $\text{size}\ T_1, \dots, \text{size}\ T_n$ are all constant, then $\text{size}\ T$ is also constant
– Implementation of field selection
    $\text{address}[[r.I_i]] = \text{address}\ r + (\text{size}\ T_1 + \dots + \text{size}\ T_{i-1})$

[Figure: record r laid out as consecutive fields r.I1, r.I2, …, r.In, holding values of types T1, T2, …, Tn]

Some machines have alignment restrictions, which force unused space to be left between record fields; these equations cannot be used on such machines
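The size and field-address equations above can be sketched in Java. This is only an illustration under stated assumptions: sizes are measured in words, there is no alignment padding, and the names RecordLayout, sizeOf, and fieldOffset are hypothetical (they are not part of the Triangle compiler).

```java
// Hypothetical helper, not part of the Triangle compiler.
// Assumes sizes in words and no alignment padding between fields.
final class RecordLayout {
    // size T = size T1 + ... + size Tn
    static int sizeOf(int[] fieldSizes) {
        int total = 0;
        for (int s : fieldSizes) total += s;
        return total;
    }

    // Offset of field i (1-based) from the record base:
    // size T1 + ... + size T(i-1)
    static int fieldOffset(int[] fieldSizes, int i) {
        int offset = 0;
        for (int k = 0; k < i - 1; k++) offset += fieldSizes[k];
        return offset;
    }

    public static void main(String[] args) {
        int[] sizes = {1, 2, 4};  // assumed sizes of T1, T2, T3
        int base = 100;           // assumed address of r
        System.out.println("size T = " + sizeOf(sizes));
        System.out.println("address[[r.I3]] = " + (base + fieldOffset(sizes, 3)));
    }
}
```

Because every field size is a compile-time constant, a compiler can evaluate both functions during code generation, so field selection costs nothing extra at run-time.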

Chart 199

Disjoint Unions

Tag and a variant part
Value of tag determines type of variant part
– $T = T_1 + \dots + T_n$
    In each value of type T, the variant part is a value chosen from one of the types T1, …, or Tn; the tag indicates which one
– $\text{size}\ T = \text{size}\ T_{\text{tag}} + \max(\text{size}\ T_1, \dots, \text{size}\ T_n)$
– $\text{address}[[u.I_{\text{tag}}]] = \text{address}\ u + 0$
– $\text{address}[[u.I_i]] = \text{address}\ u + \text{size}\ T_{\text{tag}}$

[Figure: alternative layouts of u. Each starts with the tag u.Itag (a value of type Ttag), followed by the variant u.I1, u.I2, …, or u.In (a value of type T1, T2, …, or Tn). The variant part always occupies max(size T1, …, size Tn), so smaller variants leave wasted space]
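The disjoint-union equations can be sketched the same way. This is a minimal illustration, assuming sizes in words; UnionLayout and its methods are hypothetical names introduced here only to show the arithmetic.

```java
// Hypothetical helper: layout of a disjoint union T = T1 + ... + Tn.
// Assumes sizes in words; the tag is stored first, the variant right after it.
final class UnionLayout {
    // size T = size Ttag + max(size T1, ..., size Tn)
    static int sizeOf(int tagSize, int[] variantSizes) {
        int max = 0;
        for (int s : variantSizes) max = Math.max(max, s);
        return tagSize + max;
    }

    // address[[u.Itag]] = address u + 0
    static int tagOffset() {
        return 0;
    }

    // address[[u.Ii]] = address u + size Ttag (the same for every variant)
    static int variantOffset(int tagSize) {
        return tagSize;
    }

    // Words left unused when variant i (1-based) is the current one
    static int wastedSpace(int[] variantSizes, int i) {
        int max = 0;
        for (int s : variantSizes) max = Math.max(max, s);
        return max - variantSizes[i - 1];
    }
}
```

For example, with a one-word tag and variants of 1, 2, and 4 words, every value of the union occupies 5 words, and a value holding the first variant wastes 3 of them.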

Chart 200

Static Arrays

Consists of several elements, all of the same type
– Bounded range of indices, usually integers
– Each index has exactly one element
– Fundamental operation on arrays is indexing
    Access an individual element by giving its index
    Index is evaluated at run-time
– Static array
    Index bounds are known at compile-time
    Direct representation is to juxtapose the array elements, in order of increasing indices
    Indexing is implemented by a run-time address computation

Chart 201

Static Arrays (lower index bound is 0)

Consider the following example:
    type T = array n of Telem;
    var a: T

– $\text{size}\ T = n \times \text{size}\ T_{\text{elem}}$
– The number of elements n is constant, so if $\text{size}\ T_{\text{elem}}$ is constant, then $\text{size}\ T$ is also constant
– $\text{address}[[a[i]]] = \text{address}\ a + (i \times \text{size}\ T_{\text{elem}})$
– Since i is known only at run-time, array indexing implies a run-time address computation

[Figure: array a laid out as consecutive elements a[0], a[1], a[2], …, a[n-1], each a value of type Telem]

Chart 202

Static Arrays (programmer chooses lower and upper array bounds)

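Consider the following example:
    type T = array [l..u] of Telem;
    var a: T

– $\text{size}\ T = (u - l + 1) \times \text{size}\ T_{\text{elem}}$
– The number of elements $(u - l + 1)$ is constant, so if $\text{size}\ T_{\text{elem}}$ is constant, then $\text{size}\ T$ is also constant
– $\text{address}[[a[i]]] = \text{address}\ a + ((i - l) \times \text{size}\ T_{\text{elem}}) = \text{address}\ a - (l \times \text{size}\ T_{\text{elem}}) + (i \times \text{size}\ T_{\text{elem}})$
– $\text{address}[[a[0]]] = \text{address}\ a - (l \times \text{size}\ T_{\text{elem}})$
– $\text{address}[[a[i]]] = \text{address}[[a[0]]] + (i \times \text{size}\ T_{\text{elem}})$
– Since i is known only at run-time, array indexing implies a run-time address computation
– Index check must ensure that $l \le i \le u$

[Figure: array a laid out as consecutive elements a[l], a[l+1], a[l+2], …, a[u], each a value of type Telem]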
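The indexing equations for a static array with programmer-chosen bounds can be sketched as follows. This is an illustration only: StaticArrayAddressing, origin, and elementAddress are hypothetical names, addresses and sizes are plain word counts, and the index check is modelled as a Java exception.

```java
// Hypothetical helper: address computation for a: array [l..u] of Telem.
// Addresses and sizes are in words.
final class StaticArrayAddressing {
    // address[[a[0]]] = address a - (l * size Telem); may lie outside the array itself
    static int origin(int base, int lower, int elemSize) {
        return base - lower * elemSize;
    }

    // address[[a[i]]] = address[[a[0]]] + (i * size Telem), guarded by l <= i <= u
    static int elementAddress(int base, int lower, int upper, int elemSize, int i) {
        if (i < lower || i > upper) {
            throw new IndexOutOfBoundsException("index " + i + " not in " + lower + ".." + upper);
        }
        return origin(base, lower, elemSize) + i * elemSize;
    }

    public static void main(String[] args) {
        // Assumed example: a: array [5..9] of a two-word element type, stored at address 100
        System.out.println(elementAddress(100, 5, 9, 2, 7));
    }
}
```

Since l and the element size are compile-time constants, the origin can be computed once by the compiler, leaving only one multiplication and one addition (plus the bounds check) at run-time.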

Chart 203

Dynamic Arrays

An array whose index bounds are not known until run-time
– Different dynamic arrays of the same type may have different index bounds, and therefore different numbers of elements
– Need to satisfy the constant-size requirement
– Create an array descriptor or handle
    Pointer to the array’s elements
    Index bounds
    Handle has constant size

Chart 204

Dynamic Arrays: Ada example

    type T is array (Integer range <>) of Telem;
    a: T (E1 .. E2);

– $\text{size}\ T = \text{address:size} + 2 \times \text{size}[[\text{Integer}]]$
    address:size is the amount of space required to store an address, usually one word
    Satisfies the constant-size requirement
– Declaration of array variable a:
    E1 and E2 are evaluated to yield a’s index bounds (say l and u)
    Space is allocated for (u – l + 1) elements, juxtaposed and separate from a’s handle
    $\text{address}[[a(0)]] = \text{address}[[a(l)]] - (l \times \text{size}\ T_{\text{elem}})$
    Values for address[[a(0)]], l, and u are stored in a’s handle
– The element with index i will be addressed as follows:
    $\text{address}[[a(i)]] = \text{address}[[a(0)]] + (i \times \text{size}\ T_{\text{elem}}) = \text{content}(\text{address}[[a]]) + (i \times \text{size}\ T_{\text{elem}})$
    Index check is $l \le i \le u$, where $l = \text{content}(\text{address}[[a]] + \text{address:size})$ and $u = \text{content}(\text{address}[[a]] + \text{address:size} + \text{size}[[\text{Integer}]])$
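A handle of this kind can be modelled as a small fixed-size object. The sketch below is an assumption-laden illustration, not the representation used by any particular Ada compiler: ArrayHandle and its fields are hypothetical names, and the element size is passed in because it is known from the array type at compile-time.

```java
// Hypothetical model of a dynamic-array handle: origin, lower bound, upper bound.
// The handle has constant size no matter how many elements the array has.
final class ArrayHandle {
    final int origin;  // address[[a(0)]] = address[[a(l)]] - (l * size Telem)
    final int lower;   // l, known only at run-time
    final int upper;   // u, known only at run-time

    ArrayHandle(int firstElementAddress, int lower, int upper, int elemSize) {
        this.origin = firstElementAddress - lower * elemSize;
        this.lower = lower;
        this.upper = upper;
    }

    // address[[a(i)]] = address[[a(0)]] + (i * size Telem), after the check l <= i <= u
    int elementAddress(int i, int elemSize) {
        if (i < lower || i > upper) {
            throw new IndexOutOfBoundsException("index " + i + " not in " + lower + ".." + upper);
        }
        return origin + i * elemSize;
    }
}
```

Unlike the static case, the bounds for the index check are fetched from the handle at run-time rather than being built into the generated code.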

Chart 205

Dynamic Arrays

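[Figure: the handle of a holds three values: the origin (address of a[0]), the lower bound l, and the upper bound u. The origin refers to the separately allocated elements a[l], a[l+1], a[l+2], …, a[u], each of type Telem]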

Chart 206

Status

Chapter 6: Run-time Organization
– Data Representations
    Primitive types
    Record types
    Disjoint unions
    Static arrays
    Dynamic arrays
    Recursive types
– Expression Evaluation
    Register machine
    Stack machine
– Static Storage Allocation
    Global variables
– Stack Storage Allocation
    Local variables

Chart 207

Recursive Types

Defined in terms of itself
– Values of recursive type T have components that are themselves of type T
– Examples
    List with the tail being itself a list
    Tree with the sub-trees themselves being trees

Chart 208

Recursive Types

Consider the Pascal declaration
    type IntList = ^IntNode;
         IntNode = record
                     head: Integer;
                     tail: IntList
                   end;
    var primes: IntList

– size[[IntList]] = address:size (usually 1 word)

[Figure: the variable primes, labelled as a handle]

Always use pointers to represent values of the recursive type

Chart 209

Expression Evaluation: Register Machine

How should we organize the evaluation of expressions?

The problem is the need to keep intermediate results somewhere

Consider the expression a * b + (1 - (c * 2))
– Will have intermediate results for a * b, c * 2, and 1 - (c * 2)
– For a register-based machine (non-stack machine)
    Use the registers to store intermediate results
    Problem arises when there are not enough registers for all intermediate results

Chart 210

Expression Evaluation: Example a * b + (1 - (c * 2))

    LOAD R1 a
    MULT R1 b
    LOAD R2 #1
    LOAD R3 c
    MULT R3 #2
    SUB  R2 R3
    ADD  R1 R2

a, b, c are memory addresses for the values of a, b, c

Chart 211

Expression Evaluation: Stack Machine

The machine provides a stack for holding intermediate results

For the expression a * b + (1 - (c * 2)):
    LOAD a
    LOAD b
    MULT
    LOADL 1
    LOAD c
    LOADL 2
    MULT
    SUB
    ADD

Chart 212

Expression Evaluation: Stack Machine Example a * b + (1 - (c * 2))

Stack contents (bottom to top) after each instruction; the rest of the stack is unused space:
(1) After LOAD a: value of a
(2) After LOAD b: value of a, value of b
(3) After MULT: value of a*b
(4) After LOADL 1: value of a*b, 1
(5) After LOAD c: value of a*b, 1, value of c
(6) After LOADL 2: value of a*b, 1, value of c, 2
(7) After MULT: value of a*b, 1, value of c*2
(8) After SUB: value of a*b, value of 1-(c*2)
(9) After ADD: value of (a*b)+(1-(c*2))

Operands of different types (and therefore different sizes) can be evaluated in just the same way, e.g., AND, OR, function calls. Each operation takes its operands from the top of the stack and places its result on the top of the stack
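The stack discipline above can be mimicked directly in Java. This is a minimal sketch, not a TAM interpreter: StackEval is a hypothetical class, each statement stands for one instruction of the sequence, and a java.util.ArrayDeque plays the role of the machine stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: evaluates a * b + (1 - (c * 2)) one stack instruction at a time.
final class StackEval {
    static int eval(int a, int b, int c) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(a);   // LOAD a
        stack.push(b);   // LOAD b
        mult(stack);     // MULT
        stack.push(1);   // LOADL 1
        stack.push(c);   // LOAD c
        stack.push(2);   // LOADL 2
        mult(stack);     // MULT
        sub(stack);      // SUB
        add(stack);      // ADD
        return stack.pop();  // the single remaining value is the result
    }

    // Each binary operation pops its right operand first, then its left operand.
    private static void mult(Deque<Integer> s) { int r = s.pop(), l = s.pop(); s.push(l * r); }
    private static void sub(Deque<Integer> s)  { int r = s.pop(), l = s.pop(); s.push(l - r); }
    private static void add(Deque<Integer> s)  { int r = s.pop(), l = s.pop(); s.push(l + r); }

    public static void main(String[] args) {
        System.out.println(eval(3, 4, 5));  // 3*4 + (1 - 5*2)
    }
}
```

No register allocation decisions are needed: the stack grows to at most four entries for this expression, and it grows as deep as any expression requires.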

Chart 213

Static Storage Allocation: Global Variables

Each variable in the source program requires enough storage to contain any value that might be assigned to it

As a consequence of constant-size representation, the compiler knows how much storage needs to be allocated to each variable, based on the type of the variable (size T)

Global variables
– Variables that exist and take up storage throughout the program’s run-time
– Static storage allocation: the compiler locates these variables at fixed positions in storage (it decides each global variable’s address relative to the base of the storage region in which global variables are located)

Chart 214

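Static Storage Allocation: Global Variables: Example

    let
      type Date = record
                    y: Integer,
                    m: Integer,
                    d: Integer
                  end;
      var a: array 3 of Integer;
      var b: Boolean;
      var c: Char;
      var t: Date
    in
      . . .

[Figure: storage layout, starting from the base: a[0], a[1], a[2] (variable a), b, c, t.y, t.m, t.d (variable t), followed by unused space]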
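Static allocation amounts to a running sum over the sizes of the declared variables. The sketch below is hypothetical (StaticAllocator is not a class of the Triangle compiler) and assumes that every primitive value occupies one word, so that a, b, c, and t in the example need 3, 1, 1, and 3 words.

```java
// Hypothetical sketch of static storage allocation for global variables.
// Returns each variable's address relative to the base of the global region.
final class StaticAllocator {
    static int[] allocate(int[] sizes) {
        int[] addresses = new int[sizes.length];
        int next = 0;                     // next free word, relative to the base
        for (int k = 0; k < sizes.length; k++) {
            addresses[k] = next;          // variable k starts at the next free word
            next += sizes[k];             // reserve size T words for it
        }
        return addresses;
    }

    public static void main(String[] args) {
        // Assumed sizes in words: a (array 3 of Integer), b, c, t (Date)
        int[] addresses = allocate(new int[]{3, 1, 1, 3});
        String[] names = {"a", "b", "c", "t"};
        for (int k = 0; k < names.length; k++) {
            System.out.println(names[k] + " at offset " + addresses[k]);
        }
    }
}
```

Under the one-word assumption this places a at offset 0, b at 3, c at 4, and t at 5, matching the order of the layout in the example.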

Chart 215

Stack Storage Allocation: Local Variables

A local variable v is one that is declared inside a procedure (or function).

Lifetime of v: the variable v exists (occupies storage) only during an activation of that procedure

If the same procedure is activated several times
– v will have several lifetimes
– Each activation creates a distinct variable

Chart 216

Stack Storage Allocation: Local Variables: An Example

    let
      var a: array 3 of Integer;
      var b: Boolean;
      var c: Char;
      proc Y () ~
        let
          var d: Integer;
          var e: record c: Char, n: Integer end
        in
          . . .
      proc Z () ~
        let
          var f: Integer
        in
          begin …; Y(); … end
    in
      begin …; Y(); …; Z(); … end

Chart 217

Stack Storage Allocation: Local Variables: An Example

[Figure: timeline with the events: program calls Y, return from Y, program calls Z, Z calls Y, return from Y, return from Z, program stops. Bars mark the lifetime of variables local to Y (once for each activation of Y), the lifetime of variables local to Z, and the lifetime of global variables]

Observations:
• Global variables are the only ones that exist throughout the program’s run-time
    • Use static allocation for global variables
• Lifetimes of local variables are properly nested
    • Use a stack for local variables

Chart 218

Stack Storage Allocation: Stack Frames: An Example

Stack contents (from base to top) at each point; SB marks the base of the stack, LB the base of the topmost frame, ST the stack top, and each frame holds a dynamic link to the frame below it:
(1) After program starts: globals
(2) After program calls Y: globals, frame for Y
(3) After return from Y: globals
(4) After program calls Z: globals, frame for Z
(5) After Z calls Y: globals, frame for Z, frame for Y
(6) After return from Y: globals, frame for Z
(7) After return from Z: globals

Registers
SB: Stack Base – location of global variables
LB: Local Base – local variables of the currently running procedure
ST: Stack Top – very top of stack

Chart 219

Stack Storage Allocation

The stack varies in size
– For example, the frames for the two activations of Y are at two different locations
– The position of a frame within the stack cannot be predicted in advance
– Need registers dedicated to pointing to the frames

Registers (find the address of variables relative to these registers)
– SB: stack base – is fixed, pointing to the base of the stack. This is where the global variables are located.
– LB: local base – points to the base of the topmost frame in the stack. This frame always contains the variables of the currently running procedure.
– ST: stack top – points to the very top of the stack. ST keeps track of the frame boundary as expressions are evaluated and the top of the stack expands and contracts.

Chart 220

Stack Storage Allocation

Frame contents
– Space for local variables
– Link data
    Return address: the code address to which control will be returned at the end of the procedure activation. It is the address of the instruction following the call instruction that activated the procedure in the first place.
    Dynamic link: the pointer to the base of the underlying frame in the stack. It is the old content of LB and will be restored at the end of the procedure activation.

[Figure: frame layout, from the base upward: link data (dynamic link, return address), then local data]

Since there are two words of link data, local variable addresses are offset by 2

This only considers access to local or global variables, not nested variables.
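The frame mechanism can be simulated with an integer array standing in for the data store. This sketch follows the two-word link data described above (dynamic link, then return address); the class FrameStack and its methods are hypothetical, and SB is assumed to be address 0.

```java
// Hypothetical simulation of stack frames with two words of link data.
// SB is fixed at address 0; LB and ST move as procedures are called and return.
final class FrameStack {
    final int[] data = new int[64];  // simulated data store
    int LB = 0;                      // base of the topmost frame (SB while no procedure is active)
    int ST;                          // first free word above the stack top

    FrameStack(int globalWords) {
        ST = globalWords;            // globals occupy addresses 0 .. globalWords-1
    }

    // Push a new frame: dynamic link, return address, then space for the locals.
    void call(int returnAddress, int localWords) {
        int newLB = ST;
        data[ST++] = LB;             // dynamic link: old content of LB
        data[ST++] = returnAddress;  // where to resume in the caller
        LB = newLB;
        ST += localWords;            // reserve local data
    }

    // Pop the topmost frame and report where the caller resumes.
    int ret() {
        int returnAddress = data[LB + 1];
        ST = LB;                     // discard link data and locals
        LB = data[LB];               // restore LB from the dynamic link
        return returnAddress;
    }

    // Address of the k-th local word (0-based): offset by the 2 link words.
    int localAddress(int k) {
        return LB + 2 + k;
    }
}
```

For the example program, `new FrameStack(8)` followed by `call(10, 1)` for Z and `call(20, 3)` for Y leaves Y's frame on top of Z's; two `ret()` calls then unwind the stack back to the globals. The code addresses 10 and 20 are arbitrary illustrative values.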

Chart 221

Chapter 7: Code Generation

– Code Selection
– A Code Generation Algorithm
– Constants and Variables
– Procedures and Functions
– Case Study: Code Generation in the Triangle Compiler

Chart 222

Code Generation

Translation of the source program to object code
– Dependent on source language and target machine

Target Machines
– Registers, or stack, or both for intermediate results
– Instructions with zero, one, two, or three operands, or a mixture
– Single addressing mode, or many

Chart 223

Code Generation: Major Subproblems

Code selection: deciding which sequence of target machine instructions will be the object code for each phrase
– Write code templates: a code template is a general rule specifying the object code of all phrases of a particular form (e.g., all assignment commands)
– But there are usually lots of special cases

Storage allocation: deciding the storage address of each variable in the source program
– Exact for global variables, but only relative for local variables

Register allocation: deciding which registers should be used to hold intermediate results during expression evaluation
– Complex expressions: there may not be enough registers

Since code generation for a stack machine is much simpler than for a register machine, we will only generate code for a stack machine

Chart 224

Code Generation: Code Selection

Deciding which sequence of instructions to generate for each case

Code template: specifies the object code to which a phrase is translated, in terms of the object code to which its subphrases are translated.

Object code: sequence of instructions to which the source-language phrase will be translated

Code specification: collection of code functions and code templates; must cover the entire source language

Chart 225

Abstract Machine TAM

Suitable for executing programs compiled from a block-structured language such as Triangle

All evaluation takes place on a stack

Primitive arithmetic, logical, and other operations are treated uniformly with programmed functions and procedures

Two separate stores
– Code Store: 32-bit instruction words (read only)
– Data Store: 16-bit data words (read-write)

Chart 226

Abstract Machine TAM: Code and Data Stores

Code Store
– Fixed while the program is running
– Code segment: contains the program’s instructions
    CB points to the base of the code segment
    CT points to the top of the code segment
    CP points to the next instruction to be executed
    – Initialized to CB (the program's first instruction is at the base of the code segment)
– Primitive segment: contains ‘microcode’ for elementary arithmetic, logical, input-output, heap, and general-purpose operations
    PB points to the base of the primitive segment
    PT points to the top of the primitive segment

Chart 227


Data Store
– While the program is running, segments of the data store may vary in size
– Stack grows from the low-address end of the Data Store
    SB points to the base of the stack
    ST points to the top of the stack
    – Initialized to SB
– Heap grows from the high-address end of the Data Store
    HB points to the base of the heap
    HT points to the top of the heap
    – Initialized to HB

Chart 228


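[Figure: layout of the two TAM stores. Code Store: code segment (base CB, top CT, with CP pointing into it), unused space, primitive segment (base PB, top PT). Data Store: stack (global segment at SB followed by frames, with LB at the base of the topmost frame and ST at the stack top), unused space, heap segment (top HT, base HB)]

• Stack and heap can expand and contract
• Global segment is always at the base of the stack
• Stack can contain any number of other segments, known as frames, each containing data local to an activation of some routine
• LB points to the base of the topmost frame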

Chart 229

Code Functions

run P: Run the program P and then halt, starting and finishing with an empty stack

execute C: Execute the command C, possibly updating variables, but neither expanding nor contracting the stack

evaluate E: Evaluate the expression E, pushing its result on to the stack top, but having no other effect

fetch V: Push the value of the constant or variable named V on to the stack top

assign V: Pop a value from the stack top, and store it in the variable named V

elaborate D: Elaborate the declaration D, expanding the stack to make space for any constants and variables declared therein

Chart 230

Abstract Machine TAM: Instructions

LOAD(n) d[r]: Fetch an n-word object from the data address (d+register r), and push it on to the stack
LOADA d[r]: Push the data address (d+register r) on to the stack
LOADI(n): Pop a data address from the stack, fetch an n-word object from that address, and push it on to the stack
LOADL d: Push the 1-word literal value d on to the stack
STORE(n) d[r]: Pop an n-word object from the stack, and store it at the data address (d+register r)
STOREI(n): Pop an address from the stack, then pop an n-word object from the stack and store it at that address
CALL(n) d[r]: Call the routine at code address (d+register r), using the address in register n as the static link
CALLI: Pop a closure (static link and code address) from the stack, then call the routine at that code address
RETURN(n) d: Return from the current routine: pop an n-word result from the stack, then pop the topmost frame, then pop d words of arguments, then push the result back on to the stack
PUSH d: Push d words (uninitialized) on to the stack
POP(n) d: Pop an n-word result from the stack, then pop d more words, then push the result back on to the stack
JUMP d[r]: Jump to code address (d+register r)
JUMPI: Pop a code address from the stack, then jump to that address
JUMPIF(n) d[r]: Pop a 1-word value from the stack, then jump to code address (d+register r) if and only if that value equals n
HALT: Stop execution of the program

Chart 231

While Command

execute [[while E do C]] =
        JUMP h
    g:  execute C
    h:  evaluate E
        JUMPIF(1) g

Chart 232

While Command

execute [[while i > 0 do i := i - 2]]

        30: JUMP 35          // JUMP h
    g:  31: LOAD i
        32: LOADL 2
        33: CALL sub
        34: STORE i          // 31-34: execute [[i := i - 2]]
    h:  35: LOAD i
        36: LOADL 0
        37: CALL gt          // 35-37: evaluate [[i > 0]]
        38: JUMPIF(1) 31     // JUMPIF(1) g

Chart 233

While Command

    public Object visitWhileCommand(WhileCommand ast, Object o) {
        Frame frame = (Frame) o;
        int jumpAddr, loopAddr;

        jumpAddr = nextInstrAddr;                  // address of the JUMP h instruction, so it can be patched later
        emit(Machine.JUMPop, 0, Machine.CBr, 0);   // emit JUMP h with a placeholder target
        loopAddr = nextInstrAddr;                  // this is address g:
        ast.C.visit(this, frame);                  // generate code for C
        patch(jumpAddr, nextInstrAddr);            // address h: is now known; patch it into JUMP h
        ast.E.visit(this, frame);                  // generate code for E
        emit(Machine.JUMPIFop, Machine.trueRep, Machine.CBr, loopAddr);  // JUMPIF(1) g: jump back to g if E is true
        return null;
    }

Chart 234

Repeat Command

execute [[repeat C until E]] =
    g:  execute C
        evaluate E
        JUMPIF(0) g

Chart 235

Repeat Command

execute [[repeat i := i - 2 until i < 0]]

    g:  31: LOAD i
        32: LOADL 2
        33: CALL sub
        34: STORE i          // 31-34: execute [[i := i - 2]]
        35: LOAD i
        36: LOADL 0
        37: CALL lt          // 35-37: evaluate [[i < 0]]
        38: JUMPIF(0) 31     // JUMPIF(0) g

Chart 236

Repeat Command

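    public Object visitRepeatCommand(RepeatCommand ast, Object o) {
        Frame frame = (Frame) o;
        int jumpAddr, loopAddr;
        // emit(Machine.JUMPop, 0, Machine.CBr, 0);   // not needed: repeat has no initial jump to the test
        // jumpAddr = nextInstrAddr;
        loopAddr = nextInstrAddr;                      // this is address g:
        ast.C.visit(this, frame);                      // generate code for C
        // patch(jumpAddr, nextInstrAddr);             // not needed: there is no JUMP h to patch
        ast.E.visit(this, frame);                      // generate code for E
        emit(Machine.JUMPIFop, Machine.falseRep, Machine.CBr, loopAddr);  // JUMPIF(0) g: repeat while E is false
        return null;
    }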

Chart 237

Abstract Machine TAM: Routines

Chart 238

Abstract Machine TAM: Primitive Routines

Chart 239

Extend Mini-Triangle: V1 , V2 := E1 , E2

This is a simultaneous assignment: both E1 and E2 are to be evaluated, and then their values assigned to the variables V1 and V2, respectively

    evaluate E1    // result pushed on to top of stack
    evaluate E2    // result pushed on to top of stack
    assign V2      // top of stack stored in variable V2
    assign V1      // top of stack stored in variable V1

[Figure: stack snapshots. After evaluate E1, the result of E1 is at the stack top; after evaluate E2, the result of E2 lies above it; assign V2 pops the result of E2 into V2; assign V1 pops the result of E1 into V1]

Chart 240

Extend Mini-Triangle: C1 , C2

This is a collateral command: the subcommands C1 and C2 are to be executed in any order chosen by the implementer

    execute C1    // top of stack unchanged
    execute C2    // top of stack unchanged

Chart 241

Extend Mini-Triangle: if E then C

This is a conditional command: if E evaluates to true, C is executed, otherwise nothing

        evaluate E     // result pushed on to top of stack
        JUMPIF(0) g    // jump to g if E evaluates to false
        execute C      // top of stack unchanged
    g:                 // jump location
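A code generator realizes this template with a forward jump whose target is patched once g is known, in the same style as the while-command visitor. The sketch below is self-contained and deliberately simplified: IfCodeGen, Instr, emit, patch, and genIf are hypothetical stand-ins for the Triangle compiler's encoder, and the sub-phrases are passed in as Runnable callbacks instead of AST nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini code generator illustrating backpatching for: if E then C
final class IfCodeGen {
    static final class Instr {
        final String op;  // mnemonic, e.g. "JUMPIF"
        final int n;      // length or comparison operand
        int d;            // address operand; patched later for forward jumps
        Instr(String op, int n, int d) { this.op = op; this.n = n; this.d = d; }
    }

    final List<Instr> code = new ArrayList<>();

    // Append an instruction and return its address.
    int emit(String op, int n, int d) {
        code.add(new Instr(op, n, d));
        return code.size() - 1;
    }

    // Fill in the target of a previously emitted jump.
    void patch(int addr, int d) {
        code.get(addr).d = d;
    }

    // execute [[if E then C]] = evaluate E; JUMPIF(0) g; execute C; g:
    void genIf(Runnable evaluateE, Runnable executeC) {
        evaluateE.run();                      // code for E
        int jumpAddr = emit("JUMPIF", 0, 0);  // target g not known yet
        executeC.run();                       // code for C
        patch(jumpAddr, code.size());         // g: is the address of the next instruction
    }
}
```

A possible use, with made-up operands: `gen.genIf(() -> gen.emit("LOAD", 1, 0), () -> { gen.emit("LOADL", 0, 7); gen.emit("STORE", 1, 0); })` produces four instructions, and the JUMPIF at address 1 ends up targeting address 4.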

Chart 242

Extend Mini-Triangle: repeat C until E

This is a loop command: E is evaluated at the end of each iteration (after executing C), and the loop terminates if its value is true

    g:  execute C      // top of stack unchanged
        evaluate E     // result pushed on to top of stack
        JUMPIF(0) g    // jump to g if E evaluates to false

Chart 243

Extend Mini-Triangle: repeat C1 while E do C2

This is a loop command: E is evaluated in the middle of each iteration (after executing C1 but before executing C2), and the loop terminates if its value is false

        JUMP h
    g:  execute C2     // top of stack unchanged
    h:  execute C1     // top of stack unchanged
        evaluate E     // result pushed on to top of stack
        JUMPIF(1) g    // jump to g if E evaluates to true

Chart 244

Extend Mini-Triangle: if E1 then E2 else E3

This is a conditional expression: if E1 evaluates to true, E2 is evaluated, otherwise E3 is evaluated (E2 and E3 must be of the same type)

        evaluate E1    // result pushed on to top of stack
        JUMPIF(0) g    // jump to g if E1 evaluates to false
        evaluate E2    // result pushed on to top of stack
        JUMP h         // jump to h, skipping E3
    g:  evaluate E3    // result pushed on to top of stack
    h:

Chart 245

Extend Mini-Triangle: let D in E

This is a block expression: the declaration D is elaborated, and the resultant bindings are used in the evaluation of E

    elaborate D    // expand stack for variables or constants
    evaluate E     // result pushed on to top of stack
    POP(n) s       // if s > 0: pop an n-word result from the stack, pop s more words, then push the n-word result back on to the stack

where s = amount of storage allocated by D
      n = size (type of E)

Chart 246

Extend Mini-Triangle: begin C; yield E end

Here the command C is executed (making side effects), and then E is evaluated

    execute C      // top of stack unchanged
    evaluate E     // result pushed on to top of stack

Chart 247

Extend Mini-Triangle: for I from E1 to E2 do C

First the expressions E1 and E2 are evaluated, yielding the integers m and n, respectively. Then the subcommand C is executed repeatedly, with I bound to the integers m, m+1, …, n in successive iterations. If m > n, C is not executed at all. The scope of I is C, which may fetch I but may not assign to it.

Chart 248

Extend Mini-Triangle: for I from E1 to E2 do C

        evaluate E2      // compute final value
        evaluate E1      // compute initial value of I
        JUMP h
    g:  execute C        // top of stack unchanged
        CALL succ        // increment current value of I
    h:  LOAD -1[ST]      // fetch current value of I
        LOAD -3[ST]      // fetch final value
        CALL le          // test current value <= final value
        JUMPIF(1) g      // if so, repeat
        POP(0) 2         // discard current and final values

At g and at h, the current value of I is at the stack top (at address -1[ST]), and the final value is immediately underlying (at address -2[ST])
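The stack behaviour of this template can be checked with a small simulation. The sketch below is not TAM code: ForLoopSim is a hypothetical class, an int array stands in for the stack, a Java loop replaces the jumps between g and h, and "execute C" is modelled by recording the value of I that the body would see.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the for-command template on an explicit stack.
final class ForLoopSim {
    // Returns the successive values of I seen by the loop body C.
    static List<Integer> run(int m, int n) {
        int[] s = new int[8];
        int st = 0;                       // index of the first free word (ST)
        List<Integer> seen = new ArrayList<>();

        s[st++] = n;                      // evaluate E2: final value
        s[st++] = m;                      // evaluate E1: initial value of I
        while (true) {                    // JUMP h, then loop between h and g
            s[st] = s[st - 1]; st++;      // h: LOAD -1[ST]  (current value of I)
            s[st] = s[st - 3]; st++;      //    LOAD -3[ST]  (final value)
            boolean le = s[st - 2] <= s[st - 1];
            st -= 2;                      //    CALL le and JUMPIF(1) g consume both copies
            if (!le) break;               // test failed: fall through to POP
            seen.add(s[st - 1]);          // g: execute C (I is at -1[ST])
            s[st - 1]++;                  //    CALL succ
        }
        st -= 2;                          // POP(0) 2: discard current and final values
        if (st != 0) throw new IllegalStateException("stack not restored");
        return seen;
    }
}
```

The final check confirms the property required of every command template: the stack has the same height after the for-command as before it.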

Chart 249

Chapter 8: Interpretation

Interactive Interpretation
– Interactive Interpretation of Machine Code
– Interactive Interpretation of Command Languages
– Interactive Interpretation of Simple Programming Languages
Recursive Interpretation
Case Study: The TAM Interpreter

Chart 250

Chapter 9: Conclusion

The Programming Language Life Cycle
– Design
– Specification
– Prototype
– Compilers

Error Reporting
– Compile-time Error Reporting
– Run-time Error Reporting

Efficiency
– Compile-time Efficiency
– Run-time Efficiency

Chart 251


[Figure: structure of a compiler, with boxes labelled Lexical Analyzer, Parser, Semantic Analyzer (Identification, Type checking), Optimization, and Symbol Table, and data labelled Source code, tokens, parse tree, and Assembly code]

Chart 252

Programming Language Lifecycle: Concepts

– Values
– Types
– Storage
– Bindings
– Abstractions
– Encapsulation
– Polymorphism
– Exceptions
– Concurrency

[Figure: the items above are grouped under the labels Concepts and Advanced Concepts]

Chart 253

Programming Language Lifecycle: Simplicity & Regularity

Strive for simplicity and regularity
– Simplicity: support only the concepts essential to the applications for which the language is intended
– Regularity: should combine those concepts in a systematic way, avoiding restrictions that might surprise programmers or make their task more difficult

Chart 254

Design Principles

Type completeness: no operation should be arbitrarily restricted in the types of its operands
– Operations like assignment and parameter passing should, ideally, be applicable to all types

Abstraction: for each phrase that specifies some kind of computation, there should be a way to abstract that phrase and parameterize it
– Should be possible to abstract any expression and make it a function

Correspondence: for each form of declaration there should be a corresponding parameter mechanism
– Take a block with a constant definition and transform it into a procedure (or function) with a constant parameter

Chart 255

Programming Language Lifecycle

Design

Specification

Prototype

Compilers

Manuals, textbooks

Chart 256

Specification

Precise specification for the language’s syntax and semantics must be written
– Informal or formal or hybrid

               Informal           Formal
    Syntax     English phrases    BNF, EBNF
    Semantics  English phrases    Axiomatic method (based on mathematical logic)

Chart 257

Prototypes

Cheap, low-quality implementation
Highlights features of the language that are hard to implement
Try out the language
– Interpreter might be a good prototype
– Interpretive compiler
    From source to abstract machine code

Chart 258

Compile-Time Error Reporting

Rejecting ill-formed programs
Report the location of each error with some explanation
Distinguish between the major categories of compile-time errors:
– Syntactic error: missing or unexpected characters or tokens
    Indicate what characters or tokens were expected
– Scope error: a violation of the language’s scope rules
    Indicate which identifier was declared twice, or used without declaration
– Type error: a violation of the language’s type rules
    Indicate which type rule was violated and/or what type was expected

Chart 259

Run-Time Error Reporting

Common run-time errors
– Arithmetic overflow
– Division by zero
– Out-of-range array indexing

Can be detected only at run-time, because they depend on values computed at run-time

Chart 260

Final Exam Review

Final Exam is comprehensive in that:
– Essay questions will cover Chapters 5, 6, 7, 9
– Problem-oriented questions require knowledge from the entire semester

Exam Structure
– Four questions
    Two essay questions
    – Discuss
    – Describe
    – Compare & contrast
    – Evaluate
    Two problems
    – Develop code template for new language construct
    – Determine identification table for given program
    – Calculate size and address for given type(s)

Chart 261

Final Exam Review: Chapter 5 – Contextual Analysis

Contextual analysis checks that the program conforms to the source language’s contextual constraints
– Scope rules
– Type rules

Block Structure
– Monolithic
– Flat
– Nested

Type Checking
– Literal
– Identifier
– Unary operator application
– Binary operator application

Standard Environment

Chart 262

Final Exam Review: Chapter 6 – Run-Time Organization

Key Issues
– Data representation
– Expression evaluation
– Storage allocation
– Routines

Fundamental Principles of Data Representation
– Non-confusion: different values of a given type should have different representations
– Uniqueness: each value should always have the same representation

Chart 263


Types
– Primitive types: cannot be decomposed
    Boolean
    Character
    Integer
– Records
– Disjoint unions
– Static arrays
– Dynamic arrays
– Recursive types

For the various types, be able to determine size (storage required) and address (how to locate)

Chart 264


Expression Evaluation
– Stack machine
– Register machine
Static storage allocation
– Global variables
Stack storage allocation
– Local variables

Chart 265

Final Exam Review: Chapter 7 – Code Generation

Translation of the source program to object code
– Dependent on source language and target machine

Target Machines
– Registers, or stack, or both for intermediate results
– Instructions with zero, one, two, or three operands, or a mixture
– Single addressing mode, or many

Chart 266

Final Exam Review: Chapter 7 – Code Generation

Code selection: deciding which sequence of target machine instructions will be the object code for each phrase

Storage allocation: deciding the storage address of each variable in the source program

Register allocation: deciding which registers should be used to hold intermediate results during expression evaluation

Chart 267

Final Exam Review: Chapter 9 – Programming Language Life-Cycle

Design

Specification

Prototype

Compilers

Manuals, textbooks

Chart 268

Final Exam Review: Chapter 9 – Programming Language Life-Cycle

Strive for simplicity and regularity

Design Principles
– Type completeness: no operation should be arbitrarily restricted in the types of its operands
– Abstraction: for each phrase that specifies some kind of computation, there should be a way to abstract that phrase and parameterize it
– Correspondence: for each form of declaration there should be a corresponding parameter mechanism

Specifications

Prototype

Error Reporting
– Compile-time
– Run-time

Chart 269

Final Exam Review: Structure of a Compiler

[Figure: structure of a compiler, with boxes labelled Lexical Analyzer, Parser, Semantic Analyzer (Identification, Type checking), Optimization, and Symbol Table, and data labelled Source code, tokens, parse tree, and Assembly code]
