CSC 415: Translators and Compilers Dr. Chuck Lillie

Page 1: CSC 415:  Translators and Compilers

CSC 415: Translators and Compilers

Dr. Chuck Lillie

Page 2: CSC 415:  Translators and Compilers

Chart 2

Course Outline

Translators and Compilers
– Language Processors
– Compilation
– Syntactic Analysis
– Contextual Analysis
– Run-Time Organization
– Code Generation
– Interpretation

Major Programming Project
– Project Definition and Planning
– Implementation
– Weekly Status Reports
– Project Presentation

Page 3: CSC 415:  Translators and Compilers

Chart 3

Project

Implement a Compiler for the Programming Language Triangle
– Appendix B: Informal Specification of the Programming Language Triangle
– Appendix D: Class Diagrams for the Triangle Compiler

Present Project Plan
– What and How

Weekly Status Reports
– Work accomplished during the reporting period
– Deliverable progress, as a percentage of completion
– Problem areas
– Planned activities for the next reporting period

Page 4: CSC 415:  Translators and Compilers

Chart 4

Chapter 1: Introduction to Programming Languages

Programming Language: A formal notation for expressing algorithms.

Programming Language Processors: Tools to enter, edit, translate, and interpret programs on machines.

Machine Code: Basic machine instructions
– Keep track of exact address of each data item and each instruction
– Encode each instruction as a bit string

Assembly Language: Symbolic names for operations, registers, and addresses.

Page 5: CSC 415:  Translators and Compilers

Chart 5

Programming Languages

High Level Languages: Notation similar to familiar mathematical notation
– Expressions: +, -, *, /
– Data Types: truth values, characters, integers, records, arrays
– Control Structures: if, case, while, for
– Declarations: constant values, variables, procedures, functions, types
– Abstraction: separates what is to be performed from how it is to be performed
– Encapsulation (or data abstraction): group together related declarations and selectively hide some

Page 6: CSC 415:  Translators and Compilers

Chart 6

Programming Languages

Language processor: any system that manipulates programs expressed in some particular programming language
– Editors: enter, modify, and save program text
– Translators and Compilers: translate text from one language to another. A compiler translates a program from a high-level language to a low-level language, preparing it to be run on a machine. It also checks the program for syntactic and contextual errors
– Interpreters: run a program without compilation
  Command languages
  Database query languages

Page 7: CSC 415:  Translators and Compilers

Chart 7

Programming Languages Specifications

Syntax
– Form of the program
– Defines symbols
– How phrases are composed

Contextual constraints
– Scope: determine scope of each declaration
– Type:

Semantics
– Meaning of the program

Page 8: CSC 415:  Translators and Compilers

Chart 8

Representation

Syntax
– Backus-Naur Form (BNF): context-free grammar
  Terminal symbols (>=, while, ;)
  Non-terminal symbols (Program, Command, Expression, Declaration)
  Start symbol (Program)
  Production rules (define how phrases are composed from terminals and sub-phrases)
  – N ::= a | b | …
– Syntax Tree
  Used to define language in terms of strings and terminal symbols

Page 9: CSC 415:  Translators and Compilers

Chart 9

Representation

Semantics
– Abstract Syntax
  Concentrate on phrase structure alone
– Abstract Syntax Tree

Page 10: CSC 415:  Translators and Compilers

Chart 10

Contextual Constraints

Scope
– Binding
  Static: determined by language processor
  Dynamic: determined at run-time
– Type
  Statically typed: language processor can detect all type errors
  Dynamically typed: type errors cannot be detected until run-time

Will assume static binding and static typing

Page 11: CSC 415:  Translators and Compilers

Chart 11

Semantics

Concerned with meaning of program
– Behavior when run

Usually specified informally
– Declarative sentences
– Could include side effects
– Correspond to production rules

Page 12: CSC 415:  Translators and Compilers

Chart 12

Chapter 2: Language Processors

Translators and Compilers
Interpreters
Real and Abstract Machines
Interpretive Compilers
Portable Compilers
Bootstrapping
Case Study: The Triangle Language Processor

Page 13: CSC 415:  Translators and Compilers

Chart 13

Translators & Compilers

Translator: a program that accepts any text expressed in one language (the translator's source language), and generates a semantically-equivalent text expressed in another language (its target language)
– Chinese-into-English
– Java-into-C
– Java-into-x86
– x86 assembler

Page 14: CSC 415:  Translators and Compilers

Chart 14

Translators & Compilers

Assembler: translates from an assembly language into the corresponding machine code
– Generates one machine-code instruction per source instruction

Compiler: translates from a high-level language into a low-level language
– Generates several machine-code instructions per source command

Page 15: CSC 415:  Translators and Compilers

Chart 15

Translators & Compilers

Disassembler: translates a machine code into the corresponding assembly language

Decompiler: translates a low-level language into a high-level language

Question: Why would you want a disassembler or decompiler?

Page 16: CSC 415:  Translators and Compilers

Chart 16

Translators & Compilers

Source Program: the source language text
Object Program: the target language text

[Figure: Source Program → Compiler → Object Program; boxes inside the compiler: Syntax Check, Context Constraints, Semantic Analysis, Generate Object Code]

• Object program is semantically equivalent to the source program, if the source program is well-formed

Page 17: CSC 415:  Translators and Compilers

Chart 17

Translators & Compilers

Why would you want to do:
– Java-into-C translator
– C-into-Java translator
– Assembly-language-into-Pascal decompiler

Page 18: CSC 415:  Translators and Compilers

Chart 18

Translators & Compilers

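[Tombstone diagrams: a machine M; a program P expressed in language L; program P, expressed in L, running on machine M]

P = Program name
L = Implementation language
M = Target machine

For this to work, L must equal M, that is, the implementation language must be the same as the machine language

[Tombstone diagram: an S-into-T translator expressed in language L]

S = Source language
T = Target language
L = Translator's implementation language

The S-into-T translator is itself a program that runs on machine L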

Page 19: CSC 415:  Translators and Compilers

Chart 19

Translators & Compilers

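• Translating a source program P
• Expressed in language S,
• Using an S-into-T translator
• Running on machine M

[Tombstone diagram: P in S is input to the S-into-T translator, which is expressed in M and runs on machine M; the output is P in T]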

Page 20: CSC 415:  Translators and Compilers

Chart 20

Translators & Compilers

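• Translating a source program sort
• Expressed in language Java,
• Using a Java-into-x86 translator
• Running on an x86 machine

[Tombstone diagram: sort in Java is input to the Java-into-x86 translator, which is expressed in x86 and runs on an x86 machine; the output, sort in x86, then runs on an x86 machine]

The object program is running on the same machine as the compiler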

Page 21: CSC 415:  Translators and Compilers

Chart 21

Translators & Compilers

[Tombstone diagram: sort in Java is input to the Java-into-PPC translator, which is expressed in x86 and runs on an x86 machine; the output, sort in PPC, is downloaded to a PPC machine and runs there]

Cross Compiler: The object program is running on a different machine than the compiler

• Translating a source program sort
• Expressed in language Java,
• Using a Java-into-PPC translator
• Running on an x86 machine
• Downloaded to a PPC machine

Page 22: CSC 415:  Translators and Compilers

Chart 22

Translators & Compilers

Two-stage Compiler: The source program is translated to another language before being translated into the object program

[Tombstone diagram: sort in Java → Java-into-C translator (expressed in x86, running on x86) → sort in C → C-into-x86 compiler (expressed in x86, running on x86) → sort in x86, which runs on an x86 machine]

• Translating a source program sort
• Expressed in language Java,
• Using a Java-into-C translator
• Running on an x86 machine

• Then translating the C program
• Using a C-into-x86 compiler
• Running on an x86 machine
• Into an x86 object program

Page 23: CSC 415:  Translators and Compilers

Chart 23

Translators & Compilers

Translator Rules
– Can run on machine M only if it is expressed in machine code M
– Source program must be expressed in translator's source language S
– Object program is expressed in the translator's target language T
– Object program is semantically equivalent to the source program
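The first three rules can be checked mechanically. The following is a minimal sketch, assuming hypothetical `Program` and `Translator` records and a `translate` helper that are not part of the course material; it replays the two-stage Java-into-C, C-into-x86 example from the earlier slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    name: str
    language: str  # language the program text is expressed in


@dataclass(frozen=True)
class Translator:
    source: str          # S, the source language
    target: str          # T, the target language
    implementation: str  # L, the language the translator itself is expressed in


def translate(program, translator, machine):
    """Apply the translator rules; return the object program or raise ValueError."""
    if translator.implementation != machine:
        raise ValueError("translator is not expressed in the machine code of this machine")
    if program.language != translator.source:
        raise ValueError("program is not expressed in the translator's source language")
    # The object program has the same name (same meaning) but is expressed in T.
    return Program(program.name, translator.target)


sort_java = Program("sort", "Java")
java_to_c = Translator("Java", "C", "x86")
c_to_x86 = Translator("C", "x86", "x86")

sort_c = translate(sort_java, java_to_c, "x86")
sort_x86 = translate(sort_c, c_to_x86, "x86")
```

Semantic equivalence, the fourth rule, is a property of the translator's output and cannot be checked by bookkeeping like this.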

Page 24: CSC 415:  Translators and Compilers

Chart 24

Interpreters

Accepts any program (source program) expressed in a particular language (source language) and runs that source program immediately
– Does not translate the source program into object code prior to execution

Page 25: CSC 415:  Translators and Compilers

Chart 25

Interpreters

[Figure: Source Program → Interpreter, which repeats Fetch Instruction → Analyze Instruction → Execute Instruction until Program Complete]

• Source program starts to run as soon as the first instruction is analyzed
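The fetch, analyze, execute cycle can be sketched as a short loop. This is an illustration only: the three-instruction stack language (`push`, `add`, `print`) is a hypothetical stand-in, not a language from the course.

```python
def interpret(program):
    """Run a list of (opcode, operand) instructions on a tiny stack machine."""
    stack, output, pc = [], [], 0
    while pc < len(program):           # stop when the program is complete
        instruction = program[pc]      # fetch the next instruction
        opcode, operand = instruction  # analyze it
        pc += 1
        if opcode == "push":           # execute it
            stack.append(operand)
        elif opcode == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "print":
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown instruction: {opcode}")
    return output


result = interpret([("push", 2), ("push", 3), ("add", None), ("print", None)])
```

Each instruction is executed as soon as it has been analyzed; no object program is produced.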

Page 26: CSC 415:  Translators and Compilers

Chart 26

Interpreters

When to Use Interpretation
– Interactive mode – want to see results of instruction before entering next instruction
– Only use program once
– Each instruction expected to be executed only once
– Instructions have simple formats

Disadvantages
– Slow: up to 100 times slower than in machine code

Page 27: CSC 415:  Translators and Compilers

Chart 27

Interpreters

Examples
– Basic
– Lisp
– Unix Command Language (shell)
– SQL

Page 28: CSC 415:  Translators and Compilers

Chart 28

Interpreters

[Tombstone diagram] S interpreter expressed in language L

[Tombstone diagram] Program P expressed in language S, using Interpreter S, running on machine M

[Tombstone diagram] Program graph written in Basic running on a Basic interpreter executed on an x86 machine

Page 29: CSC 415:  Translators and Compilers

Chart 29

Real and Abstract Machines

Hardware emulation: Using software to execute one set of machine code on another machine
– Can measure everything about the new machine except its speed
– Abstract machine: emulator
– Real machine: actual hardware

An abstract machine is functionally equivalent to a real machine if they both implement the same language L

Page 30: CSC 415:  Translators and Compilers

Chart 30

Real and Abstract Machines

[Tombstone diagram: the New Machine Instruction (nmi) interpreter written in C is input to a C-into-M compiler running on machine M; the output is the nmi interpreter expressed in machine code M]

The nmi interpreter is translated into machine code M using the C compiler

[Tombstone diagram: program P expressed in nmi code, running on the nmi interpreter on machine M, corresponds to P running on an nmi machine]

Page 31: CSC 415:  Translators and Compilers

Chart 31

Interpretive Compilers

Combination of compiler and interpreter
– Translate source program into an intermediate language
– It is intermediate in level between the source language and ordinary machine code
– Its instructions have simple formats, and therefore can be analyzed easily and quickly
– Translation from the source language into the intermediate language is easy and fast

An interpretive compiler combines fast compilation with tolerable running speed

Page 32: CSC 415:  Translators and Compilers

Chart 32

Interpretive Compilers

[Tombstone diagrams: a Java-into-JVM translator running on machine M and a JVM-code interpreter running on machine M; P in Java is translated into P in JVM-code, which then runs on the JVM-code interpreter on machine M]

A Java program P is first translated into JVM-code, and then the JVM-code object program is interpreted
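The two stages can be sketched for arithmetic expressions. This is a minimal illustration under stated assumptions: the nested-tuple source form and the instruction names `LOADL`, `ADD`, and `MULT` are hypothetical stand-ins for a simple intermediate language, not real JVM bytecode.

```python
def compile_expr(expr):
    """Translate a nested-tuple expression into simple stack-machine code."""
    if isinstance(expr, int):
        return [("LOADL", expr)]
    operator, left, right = expr
    opcode = {"+": "ADD", "*": "MULT"}[operator]
    # Operands first, operator last: each instruction has a simple format.
    return compile_expr(left) + compile_expr(right) + [(opcode, None)]


def run(code):
    """Interpret the intermediate code produced by compile_expr."""
    stack = []
    for opcode, operand in code:
        if opcode == "LOADL":
            stack.append(operand)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if opcode == "ADD" else left * right)
    return stack.pop()


code = compile_expr(("+", 2, ("*", 3, 4)))  # represents 2 + 3 * 4
value = run(code)
```

The translation is a single cheap tree walk, and the interpreter only has to decode flat instructions, which is the trade-off the slide describes.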

Page 33: CSC 415:  Translators and Compilers

Chart 33

Portable Compilers

A program is portable if it can be compiled and run on any machine, without change
– A portable program is more valuable than an unportable one, because its development cost can be spread over more copies
– Portability is measured by the proportion of code that remains unchanged when it is moved to a dissimilar machine

Language affects portability
– Assembly language: 0% portable
– High level language: approaches 100% portability

Page 34: CSC 415:  Translators and Compilers

Chart 34

Portable Compilers

Language Processors
– Valuable and widely used programs
– Typically written in high-level language
  Pascal, C, Java
– Part of language processor is machine dependent
  Code generation part
  Language processor is only about 50% portable
– Compiler that generates intermediate code is more portable than a compiler that generates machine code

Page 35: CSC 415:  Translators and Compilers

Chart 35

Portable Compilers

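[Tombstone diagrams: a Java-into-JVM translator expressed in Java, the same translator expressed in JVM-code, and a JVM interpreter expressed in Java. The interpreter is rewritten in C and compiled with the C-into-M compiler on machine M, giving a JVM interpreter expressed in M. The Java-into-JVM translator then runs on that interpreter to translate P from Java into JVM-code, and P in JVM-code runs on the interpreter on machine M]

Note: a C-into-M compiler exists; rewrite the JVM interpreter from Java to C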

Page 36: CSC 415:  Translators and Compilers

Chart 36

Bootstrapping

The language processor is used to process itself
– Implementation language is the source language

Bootstrapping a portable compiler
– A portable compiler can be bootstrapped to make a true compiler – one that generates machine code – by writing an intermediate-language-into-machine-code translator

Full bootstrap
– Writing the compiler in itself
– Using the latest version to upgrade the next version

Half bootstrap
– Compiler expressed in itself but targeted for another machine

Bootstrapping to improve efficiency
– Upgrade the compiler to optimize code generation as well as to improve compile efficiency

Page 37: CSC 415:  Translators and Compilers

Chart 37

Bootstrapping

Bootstrap an interpretive compiler to generate machine code

[Tombstone diagrams for each step; the steps they illustrate are:]
– First, write a JVM-code-into-M translator in Java
– Next, compile translator using existing interpreter
– Use translator to translate itself
– Translate Java-into-JVM-code translator into machine code
– Two stage Java-into-M compiler: P in Java → P in JVM-code → P in M

Page 38: CSC 415:  Translators and Compilers

Chart 38

Bootstrapping

Full bootstrap

[Tombstone diagrams for compiler versions v1, v2, and v3; the steps they illustrate are:]
– Write Ada-S compiler in C (v1: Ada-S-into-M, expressed in C, compiled with the C-into-M compiler)
– Convert the C version of Ada-S into Ada-S version of Ada-S (v2: Ada-S-into-M, expressed in Ada-S, compiled with v1)
– Extend Ada-S compiler to (full) Ada compiler (v3: Ada-into-M, expressed in Ada-S, compiled with v2)

Page 39: CSC 415:  Translators and Compilers

Chart 39

Bootstrapping

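Half bootstrap

[Tombstone diagrams: starting from an Ada-into-HM compiler expressed in Ada and in HM code, an Ada-into-TM compiler is written in Ada. Compiling it on HM gives an Ada-into-TM compiler that runs on HM; this translates P from Ada into TM code, which runs on TM, and also compiles the Ada-into-TM compiler itself into TM code]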

Page 40: CSC 415:  Translators and Compilers

Chart 40

Bootstrapping

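Bootstrap to improve efficiency

[Tombstone diagrams: v1 is an Ada-into-Ms compiler, expressed in Ada and in Ms; v2 is an Ada-into-Mf compiler expressed in Ada. Compiling v2 with v1 gives an Ada-into-Mf compiler expressed in Ms, which runs on machine M and is used to compile P from Ada]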

Page 41: CSC 415:  Translators and Compilers

Chart 41

Chapter 3: Compilation

Phases
– Syntactic Analysis
– Contextual Analysis
– Code Generation

Passes
– Multi-pass Compilation
– One-pass Compilation
– Compiler Design Issues

Case Study: The Triangle Compiler

Page 42: CSC 415:  Translators and Compilers

Chart 42

Phases

Syntactic Analysis
– The source program is parsed to check whether it conforms to the source language's syntax, and to determine its phrase structure

Contextual Analysis
– The parsed program is analyzed to check whether it conforms to the source language's contextual constraints

Code Generation
– The checked program is translated to an object program, in accordance with the semantics of the source and target languages

Page 43: CSC 415:  Translators and Compilers

Chart 43

Phases

[Figure: Source Program → Syntactic Analysis → AST → Contextual Analysis → Decorated AST → Code Generation → Object Program; Syntactic Analysis and Contextual Analysis each produce an Error Report]

Page 44: CSC 415:  Translators and Compilers

Chart 44

Syntactic Analysis

To determine the source program's phrase structure
– Parsing
– Contextual analysis and code generation must know how the program is composed
  Commands, expressions, declarations, …
– Check for conformance to the source language's syntax
– Construct suitable representation of its phrase structure (AST)

AST
– Terminal nodes corresponding to identifiers, literals, and operators
– Sub trees representing the phrases of the source program
– Blanks and comments not in AST (no meaning)
– Punctuation and brackets not in AST (only separate and enclose)

Page 45: CSC 415:  Translators and Compilers

Chart 45

Contextual Analysis

Analyzes the parsed program
– Scope rules
– Type rules

Produces decorated AST
– AST with information gathered during contextual analysis
– Each applied occurrence of an identifier is linked to the corresponding declaration
– Each expression is decorated by its type T

Page 46: CSC 415:  Translators and Compilers

Chart 46

Code Generation

The final translation of the checked program to an object program
– After syntactic and contextual analysis is completed

Treatment of identifiers
– Constants
  Binds identifier to value
  Replace each occurrence of identifier with value
– Variables
  Binds identifier to some memory address
  Replace each occurrence of identifier by address

Target language
– Assembly language
– Machine code
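The treatment of identifiers can be sketched as follows. This is a minimal illustration, assuming a hypothetical declaration format and hypothetical instruction spellings (`LOADL` for a literal, `LOAD` for an address); a real code generator would also handle sizes, scopes, and registers.

```python
def bind(declarations):
    """Bind each constant to its value and each variable to the next free address."""
    table, next_address = {}, 0
    for kind, name, value in declarations:
        if kind == "const":
            table[name] = ("value", value)
        else:  # kind == "var"
            table[name] = ("address", next_address)
            next_address += 1
    return table


def generate_load(name, table):
    """Replace an applied occurrence of an identifier by its value or its address."""
    kind, payload = table[name]
    return f"LOADL {payload}" if kind == "value" else f"LOAD {payload}"


table = bind([("const", "m", 7), ("var", "n", None), ("var", "k", None)])
code = [generate_load(name, table) for name in ("m", "n", "k")]
```

After this step the object code no longer mentions the identifiers `m`, `n`, or `k` at all.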

Page 47: CSC 415:  Translators and Compilers

Chart 47

Passes

Multi-pass compilation
– Traverses the program or AST several times

One-pass compilation
– Single traverse of program
– Contextual analysis and code generation are performed 'on the fly' during syntactic analysis

[Figures: two structure diagrams, one per approach, each showing a Compiler Driver, Syntactic Analyzer, Contextual Analyzer, and Code Generator]

Page 48: CSC 415:  Translators and Compilers

Chart 48

Compiler Design Issues

Speed
– Compiler run time

Space
– Storage: size of compiler + files generated

Modularity
– Multi-pass compiler more modular than one-pass compiler

Flexibility
– Multi-pass compiler is more flexible because it generates an AST that can be traversed in any order by the other phases

Semantics-preserving transformations
– To optimize code – must have multi-pass compiler

Source language properties
– May restrict compiler choice – some language constructs may require multi-pass compilers

Page 49: CSC 415:  Translators and Compilers

Chart 49

Chapter 4: Syntactic Analysis

Sub-phases of Syntactic Analysis
Grammars Revisited
Parsing
Abstract Syntax Trees
Scanning
Case Study: Syntactic Analysis in the Triangle Compiler

Page 50: CSC 415:  Translators and Compilers

Chart 50

Structure of a Compiler

[Figure: Source code → Lexical Analyzer → (tokens) → Parser & Semantic Analyzer → (parse tree) → Intermediate Code Generation → (intermediate representation) → Optimization → (intermediate representation) → Assembly Code Generation → Assembly code; a Symbol Table is shared by the phases]

Page 51: CSC 415:  Translators and Compilers

Chart 51

Syntactic Analysis

Main function
– Parse source program to discover its phrase structure
– Recursive-descent parsing
– Constructing an AST
– Scanning to group characters into tokens

Page 52: CSC 415:  Translators and Compilers

Chart 52

Sub-phases of Syntactic Analysis

Scanning (or lexical analysis)
– Source program transformed to a stream of tokens
  Identifiers
  Literals
  Operators
  Keywords
  Punctuation
– Comments and blank spaces discarded

Parsing
– To determine the source program's phrase structure
– Source program is input as a stream of tokens (from the Scanner)
– Treats each token as a terminal symbol

Representation of phrase structure
– AST

Page 53: CSC 415:  Translators and Compilers

Chart 53

Lexical Analysis – A Simple Example

Scan the file character by character and group characters into words and punctuation (tokens), remove white space and comments

Some tokens for this example: main ( ) { int a , b , c ;

main() {
  int a, b, c;
  char number[5];

  /* get user inputs */
  a = atoi(gets(number));
  b = atoi(gets(number));

  /* calculate value for c */
  c = 2*(a+b) + a*(a+b);

  /* print results */
  printf("%d", c);
}
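The grouping of characters into tokens can be sketched with a regular-expression scanner. This is a minimal illustration, not the course's scanner: the token classes and the `scan` helper are assumptions, and only the symbols that occur in the example are covered.

```python
import re

TOKEN_PATTERN = re.compile(r"""
    /\*.*?\*/                           # comment: matched, then discarded
  | \s+                                 # white space: matched, then discarded
  | (?P<word>[A-Za-z_][A-Za-z0-9_]*)    # identifiers and keywords
  | (?P<number>[0-9]+)                  # integer literals
  | (?P<symbol>[(){}\[\];,=+*])         # punctuation and operators in the example
""", re.VERBOSE | re.DOTALL)


def scan(text):
    """Return the spellings of the tokens in text, dropping comments and blanks."""
    tokens, position = [], 0
    while position < len(text):
        match = TOKEN_PATTERN.match(text, position)
        if match is None:
            raise ValueError(f"illegal character {text[position]!r}")
        if match.lastgroup is not None:  # only named groups are real tokens
            tokens.append(match.group())
        position = match.end()
    return tokens


tokens = scan("main() { int a, b, c; /* get user inputs */ a = 2*(a+b); }")
```

The first eleven tokens are the ones listed on the slide: `main ( ) { int a , b , c ;`.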

Page 54: CSC 415:  Translators and Compilers

Chart 54

Creating Tokens – Mini-Triangle Example

let var y: Integer
in !new year
  y := y+1

[Figure: the Input Converter places the program text in a Buffer as a character string (l e t S v a r S y : S I n t e g e r S i n …, where S = space); the Scanner turns it into the tokens below]

Spelling    Kind
let         let
var         var
y           Ident.
:           colon
Integer     Ident.
in          in
y           Ident.
:=          becomes
y           Ident.
+           op.
1           Intlit.
            eot

Page 55: CSC 415:  Translators and Compilers

Chart 55

Tokens in Triangle

// literals, identifiers, operators...
INTLITERAL = 0, "<int>",
CHARLITERAL = 1, "<char>",
IDENTIFIER = 2, "<identifier>",
OPERATOR = 3, "<operator>",

// reserved words - must be in alphabetical order...
ARRAY = 4, "array",
BEGIN = 5, "begin",
CONST = 6, "const",
DO = 7, "do",
ELSE = 8, "else",
END = 9, "end",
FUNC = 10, "func",
IF = 11, "if",
IN = 12, "in",
LET = 13, "let",
OF = 14, "of",
PROC = 15, "proc",
RECORD = 16, "record",
THEN = 17, "then",
TYPE = 18, "type",
VAR = 19, "var",
WHILE = 20, "while",

// punctuation...
DOT = 21, ".",
COLON = 22, ":",
SEMICOLON = 23, ";",
COMMA = 24, ",",
BECOMES = 25, ":=",
IS = 26, "~",

// brackets...
LPAREN = 27, "(",
RPAREN = 28, ")",
LBRACKET = 29, "[",
RBRACKET = 30, "]",
LCURLY = 31, "{",
RCURLY = 32, "}",

// special tokens...
EOT = 33, "",
ERROR = 34, "<error>"

Page 56: CSC 415:  Translators and Compilers

Chart 56

Grammars Revisited

Context free grammars
– Generates a set of sentences
– Each sentence is a string of terminal symbols
– An unambiguous sentence has a unique phrase structure embodied in its syntax tree

Develop parsers from context-free grammars

Page 57: CSC 415:  Translators and Compilers

Chart 57

Regular Expressions

A regular expression (RE) is a convenient notation for expressing a set of strings of terminal symbols

Main features
– '|' separates alternatives
– '*' indicates that the previous item may be repeated zero or more times
– '(' and ')' are grouping parentheses

Page 58: CSC 415:  Translators and Compilers

Chart 58

Regular Expression Basics

ε: the empty string, a special string of length 0

Regular expression operations
– | separates alternatives
– * indicates that the previous item may be repeated zero or more times (repetition)
– ( and ) are grouping parentheses

Page 59: CSC 415:  Translators and Compilers

Chart 59

Regular Expression Basics

Algebraic Properties
– | is commutative and associative
  r|s = s|r
  r|(s|t) = (r|s)|t
– Concatenation is associative
  (rs)t = r(st)
– Concatenation distributes over |
  r(s|t) = rs|rt
  (s|t)r = sr|tr
– ε is the identity for concatenation
  εr = r
  rε = r
– * is idempotent
  r** = r*
  r* = (r|ε)*

Page 60: CSC 415:  Translators and Compilers

Chart 60

Regular Expression Basics

Common Extensions
– r+ : one or more of expression r, same as rr*
– r^k : k repetitions of r
  r^3 = rrr
– ~r : the characters not in the expression r
  ~[\t\n]
– r-z : range of characters
  [0-9a-z]
– r? : zero or one copy of expression r (used for fields of an expression that are optional)

Page 61: CSC 415:  Translators and Compilers

Chart 61

Regular Expression Example

Regular Expression for Representing Months
– Examples of legal inputs
  January represented as 1 or 01
  October represented as 10
– First Try: [0|1|ε][0-9]
  Matches all legal inputs? Yes
    1, 2, 3, …, 10, 11, 12, 01, 02, …, 09
  Matches any illegal inputs? Yes
    0, 00, 18

Page 62: CSC 415:  Translators and Compilers

Chart 62

Regular Expression Example

Regular Expression for Representing Months
– Examples of legal inputs
  January represented as 1 or 01
  October represented as 10
– Second Try: [1-9]|(0[1-9])|(1[0-2])
  Matches all legal inputs? Yes
    1, 2, 3, …, 10, 11, 12, 01, 02, …, 09
  Matches any illegal inputs? No
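Both attempts can be tried out with Python's `re` module. This is a sketch under one assumption: the slide's `[0|1|ε]` is written here as the optional character class `[01]?`, which is how an optional leading 0 or 1 is expressed in Python's regex dialect. The `accepted` helper and the sample lists are illustrative.

```python
import re

first_try = re.compile(r"[01]?[0-9]")
second_try = re.compile(r"[1-9]|0[1-9]|1[0-2]")


def accepted(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [text for text in candidates if pattern.fullmatch(text)]


legal = ["1", "01", "9", "09", "10", "11", "12"]
illegal = ["0", "00", "13", "18"]

first_accepts_illegal = accepted(first_try, illegal)
second_accepts_illegal = accepted(second_try, illegal)
```

`fullmatch` is used so that the whole input must be a month; a plain `match` would accept the `1` at the start of `18` under the second pattern.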

Page 63: CSC 415:  Translators and Compilers

Chart 63

Regular Expression Example

Regular Expression for Floating Point Numbers
– Examples of legal inputs
  1.0, 0.2, 3.14159, -1.0, 2.7e8, 1.0E-6
  Assume that a 0 is required before numbers less than 1, and that extra leading zeros are not prevented, so numbers such as 0011 or 0003.14159 are legal
– Building the regular expression
  Assume digit → 0|1|2|3|4|5|6|7|8|9
  Handle simple decimals such as 1.0, 0.2, 3.14159
    digit+.digit+
  Add an optional sign (only minus, no plus)
    (-|ε)digit+.digit+ or -?digit+.digit+

Page 64: CSC 415:  Translators and Compilers

Chart 64

Regular Expression Example

Regular Expression for Floating Point Numbers (cont.)
– Building the regular expression (cont.)
  Format for the exponent
    (E|e)(+|-)?(digit+)
  Adding it as an optional expression to the decimal part
    (-|ε)digit+.digit+((E|e)(+|-)?(digit+))?
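The finished expression translates almost directly into Python's regex dialect. A minimal sketch, with two notational changes that are assumptions of this example: `digit` becomes the class `[0-9]`, and the decimal point is escaped as `\.` because an unescaped `.` matches any character in Python.

```python
import re

FLOAT = re.compile(r"-?[0-9]+\.[0-9]+([Ee][+-]?[0-9]+)?")


def is_float(text):
    """True if the whole text is a floating point number in the slide's format."""
    return FLOAT.fullmatch(text) is not None


legal = ["1.0", "0.2", "3.14159", "-1.0", "2.7e8", "1.0E-6", "0003.14159"]
illegal = [".5", "1.", "+1.0", "1.0e", "abc"]
```

Every string in `legal` is accepted and every string in `illegal` is rejected: a digit is required on both sides of the point, a plus sign is not allowed, and an exponent marker must be followed by digits.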

Page 65: CSC 415:  Translators and Compilers

Chart 65

Extended BNF

Extended BNF (EBNF)
– Combination of BNF and RE
– N ::= X, where N is a nonterminal symbol and X is an extended RE, i.e., an RE constructed from both terminal and nonterminal symbols
– EBNF
  Right hand side may use |, *, (, )
  Right hand side may contain both terminal and nonterminal symbols

Page 66: CSC 415:  Translators and Compilers

Chart 66

Example EBNF

Expression ::= primary-Expression (Operator primary-Expression)*

primary-Expression ::= Identifier | ( Expression )

Identifier ::= a | b | c | d | e

Operator ::= + | - | * | /

Generates
  e
  a + b
  a - b - c
  a + (b * c)
  a + (b + c) / d
  a - (b - (c - (d - e)))
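An EBNF grammar of this shape maps naturally onto a recursive-descent recognizer: one function per nonterminal, a loop for `( … )*`, and a branch for `|`. The sketch below is an illustration written for this grammar only; the function names and the one-character-per-token simplification are assumptions.

```python
def parse_expression(tokens, position=0):
    """Expression ::= primary-Expression (Operator primary-Expression)*"""
    position = parse_primary(tokens, position)
    while position < len(tokens) and tokens[position] in "+-*/":
        position = parse_primary(tokens, position + 1)
    return position  # index just past the recognized expression


def parse_primary(tokens, position):
    """primary-Expression ::= Identifier | ( Expression )"""
    if position < len(tokens) and tokens[position] in "abcde":
        return position + 1
    if position < len(tokens) and tokens[position] == "(":
        position = parse_expression(tokens, position + 1)
        if position < len(tokens) and tokens[position] == ")":
            return position + 1
        raise SyntaxError("expected ')'")
    raise SyntaxError("expected identifier or '('")


def is_sentence(text):
    """True if text, ignoring spaces, is generated by Expression."""
    tokens = text.replace(" ", "")  # every token here is a single character
    try:
        return parse_expression(tokens) == len(tokens)
    except SyntaxError:
        return False
```

For example, `is_sentence("a - (b - (c - (d - e)))")` holds, while `is_sentence("a +")` does not because the operator is not followed by a primary expression.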

Page 67: CSC 415:  Translators and Compilers

Chart 67

Grammar Transformations

Left Factorization
XY | XZ is equivalent to X(Y | Z)

single-Command ::= V-name := Expression
    | if Expression then single-Command
    | if Expression then single-Command else single-Command

single-Command ::= V-name := Expression
    | if Expression then single-Command (ε | else single-Command)

Page 68: CSC 415:  Translators and Compilers

Chart 68

Grammar Transformations

Elimination of left recursion
N ::= X | NY is equivalent to N ::= X(Y)*

Identifier ::= Letter | Identifier Letter | Identifier Digit

Identifier ::= Letter | Identifier (Letter | Digit)

Identifier ::= Letter (Letter | Digit)*
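The practical payoff of the final form is that the iteration `(Letter | Digit)*` becomes a plain loop, whereas a function written directly from the left-recursive rule would call itself without consuming any input. A minimal sketch, assuming ASCII letters and digits; the helper name is illustrative.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)


def is_identifier(text):
    """Recognize Identifier ::= Letter (Letter | Digit)* with a single loop."""
    if not text or text[0] not in LETTERS:   # the leading Letter
        return False
    for character in text[1:]:               # the (Letter | Digit)* iteration
        if character not in LETTERS and character not in DIGITS:
            return False
    return True
```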

Page 69: CSC 415:  Translators and Compilers

Chart 69

Grammar Transformations

Substitution of nonterminal symbols
Given N ::= X, we can substitute each occurrence of N with X iff N ::= X is nonrecursive and is the only production rule for N

single-Command ::= for Control-Variable := Expression To-or-Downto Expression do single-Command
    | …
Control-Variable ::= Identifier
To-or-Downto ::= to
    | downto

single-Command ::= for Identifier := Expression (to | downto) Expression do single-Command
    | …

Page 70: CSC 415:  Translators and Compilers

Chart 70

Scanning (Lexical Analysis)

The purpose of scanning is to recognize tokens in the source program. Or, to group input characters (the source program text) into tokens.

Difference between parsing and scanning:
– Parsing groups terminal symbols, which are tokens, into larger phrases such as expressions and commands and analyzes the tokens for correctness and structure
– Scanning groups individual characters into tokens

Page 71: CSC 415:  Translators and Compilers

Chart 71

Structure of a Compiler

[Figure: Source code → Lexical Analyzer → (tokens) → Parser & Semantic Analyzer → (parse tree) → Intermediate Code Generation → (intermediate representation) → Optimization → (intermediate representation) → Assembly Code Generation → Assembly code; a Symbol Table is shared by the phases]

Page 72: CSC 415:  Translators and Compilers

Chart 72

Creating Tokens – Mini-Triangle Example

let var y: Integer
in !new year
  y := y+1

[Figure: the Input Converter places the program text in a Buffer as a character string (l e t S v a r S y : S I n t e g e r S i n …, where S = space); the Scanner turns it into the tokens below]

Spelling    Kind
let         let
var         var
y           Ident.
:           colon
Integer     Ident.
in          in
y           Ident.
:=          becomes
y           Ident.
+           op.
1           Intlit.
            eot

Page 73: CSC 415:  Translators and Compilers

Chart 73

What Does a Scanner Do?

Handle keywords (reserved words)
– Recognizes identifiers and keywords
– Match explicitly
  Write regular expression for each keyword
  Identifier is any alphanumeric string which is not a keyword
– Match as an identifier, perform lookup
  No special regular expressions for keywords
  When an identifier is found, perform lookup into preloaded keyword table

How does Triangle handle keywords? Discuss in terms of efficiency and ease to code.
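The "match as an identifier, perform lookup" approach can be sketched in a few lines. This is an illustration, not the Triangle compiler's code: the `classify` helper is hypothetical, and the keyword set is simply the reserved-word list from the "Tokens in Triangle" slide.

```python
KEYWORDS = frozenset({
    "array", "begin", "const", "do", "else", "end", "func", "if", "in",
    "let", "of", "proc", "record", "then", "type", "var", "while",
})


def classify(spelling):
    """Return the token kind for a spelling already scanned as an identifier."""
    # One table lookup replaces a separate regular expression per keyword.
    return spelling if spelling in KEYWORDS else "<identifier>"


kinds = [classify(word) for word in ("let", "var", "y", "Integer", "in")]
```

Adding a keyword to the language then only means adding one entry to the table; the scanner's identifier rule stays the same.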

Page 74: CSC 415:  Translators and Compilers

Chart 74

What Does a Scanner Do?

Remove white space
– Tabs, spaces, new lines

Remove comments
– Single line
  -- Ada comment
– Multi-line, start and end delimiters
  { Pascal comment }
  /* C comment */
– Nested
– Runaway comments
  Nonterminated comments can't be detected till end of file

Page 75: CSC 415:  Translators and Compilers

Chart 75

What Does a Scanner Do?

Perform look ahead
– Multi-character tokens
  1..10 vs. 1.10
  &, &&
  <, <=
  etc.

Challenging input languages
– FORTRAN
  Keywords not reserved
  Blanks are not a delimiter
  Example (comma vs. decimal)
    DO10I=1,5   start of a do loop (equivalent to a C for loop)
    DO10I=1.5   an assignment statement, assignment to variable DO10I
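The `1..10` vs. `1.10` case shows why one character of look ahead is not always enough: after the digits and a `.`, the scanner must inspect the character after the point before deciding. A minimal sketch, assuming a hypothetical `scan_number` helper that returns the token kind and spelling.

```python
def scan_number(text, position=0):
    """Scan an integer or real literal starting at position."""
    end = position
    while end < len(text) and text[end].isdigit():
        end += 1
    # Look ahead two characters: '.' followed by a digit continues a real
    # literal, whereas '..' starts a range and is left for the next token.
    if end + 1 < len(text) and text[end] == "." and text[end + 1].isdigit():
        end += 1
        while end < len(text) and text[end].isdigit():
            end += 1
        return ("real", text[position:end])
    return ("integer", text[position:end])


range_start = scan_number("1..10")
real_literal = scan_number("1.10")
```

For `1..10` the scanner stops after `1`, leaving `..` and `10` as later tokens; for `1.10` it consumes the whole literal.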

Page 76: CSC 415:  Translators and Compilers

Chart 76

What Does a Scanner Do?

Challenging input languages (cont.)
– PL/I, keywords not reserved
  IF THEN THEN THEN = ELSE; ELSE ELSE = THEN;

Page 77: CSC 415:  Translators and Compilers

Chart 77

What Does a Scanner Do?

Error Handling
– Error token passed to parser which reports the error
– Recovery
  Delete characters from current token which have been read so far, restart scanning at next unread character
  Delete the first character of the current lexeme and resume scanning from next character
– Examples of lexical errors:
  3.25e    bad format for a constant
  Var#1    illegal character
– Some errors that are not lexical errors
  Mistyped keywords
  – Begim
  Mismatched parenthesis
  Undeclared variables

Page 78: CSC 415:  Translators and Compilers

Chart 78

Scanner Implementation

Issues
– Simpler design – parser doesn't have to worry about white space, etc.
– Improve compiler efficiency – allows the construction of a specialized and potentially more efficient processor
– Compiler portability is enhanced – input alphabet peculiarities and other device-specific anomalies can be restricted to the scanner

Page 79: CSC 415:  Translators and Compilers

Chart 79

Scanner Implementation

What are the keywords in Triangle?
How are keywords and identifiers implemented in Triangle?
Is look ahead implemented in Triangle?
– If so, how?

Page 80: CSC 415:  Translators and Compilers

Chart 80

Structure of a Compiler

[Figure: Source code → Lexical Analyzer → (tokens) → Parser → (parse tree) → Semantic Analyzer → Intermediate Code Generation → (intermediate representation) → Optimization → (intermediate representation) → Assembly Code Generation → Assembly code; a Symbol Table is shared by the phases]

Page 81: CSC 415:  Translators and Compilers

Chart 81

Parsing

Given an unambiguous, context free grammar, parsing is
– Recognition of an input string, i.e., deciding whether or not the input string is a sentence of the grammar
– Parsing of an input string, i.e., recognition of the input string plus determination of its phrase structure. The phrase structure can be represented by a syntax tree, or otherwise.

Unambiguous is necessary so that every sentence of the grammar will form exactly one syntax tree.

Page 82: CSC 415:  Translators and Compilers

Chart 82

Parsing

The syntax of programming language constructs is described by context-free grammars.

Advantages of unambiguous, context-free grammars
– A precise, yet easy-to-understand, syntactic specification of the programming language
– For certain classes of grammars we can automatically construct an efficient parser that determines if a source program is syntactically well formed.
– Imparts a structure to a programming language that is useful for the translation of source programs into correct object code and for the detection of errors.
– Easier to add new constructs to the language if the implementation is based on a grammatical description of the language

Page 83: CSC 415:  Translators and Compilers

Chart 83

Parsing

Check the syntax (structure) of a program and create a tree representation of the program

Programming languages have non-regular constructs
– Nesting
– Recursion

Context-free grammars are used to express the syntax for programming languages

sequence of tokens → parser → syntax tree

Page 84: CSC 415:  Translators and Compilers

Chart 84

Context-Free Grammars

Comprised of
– A set of tokens or terminal symbols
– A set of non-terminal symbols
– A set of rules or productions which express the legal relationships between symbols
– A start or goal symbol

Example:
1. expr → expr - digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9

Tokens: -, +, 0, 1, 2, …, 9
Non-terminals: expr, digit
Start symbol: expr

Page 85: CSC 415:  Translators and Compilers

Chart 85

Context-Free Grammars

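1. expr → expr - digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9

Example input: 3 + 8 - 2

[Figure: syntax tree for 3 + 8 - 2. The root expr has the subtrees expr, - and digit (2); that inner expr has the subtrees expr, + and digit (8); the innermost expr has the single subtree digit (3).]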

Page 86: CSC 415:  Translators and Compilers

Chart 86

Checking for Correct Syntax

Given a grammar for a language and a program, how do you know if the syntax of the program is legal?

A legal program can be derived from the start symbol of the grammar

Grammar must be unambiguous and context-free

Page 87: CSC 415:  Translators and Compilers

Chart 87

Deriving a String

The derivation begins with the start symbol
At each step of a derivation the right-hand side of a grammar rule is used to replace a non-terminal symbol
Continue replacing non-terminals until only terminal symbols remain

1. expr → expr - digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9

Example input: 3 + 8 - 2

expr ⇒ expr - digit (Rule 1)
     ⇒ expr - 2 (Rule 4)
     ⇒ expr + digit - 2 (Rule 2)
     ⇒ expr + 8 - 2 (Rule 4)
     ⇒ digit + 8 - 2 (Rule 3)
     ⇒ 3 + 8 - 2 (Rule 4)

Page 88: CSC 415:  Translators and Compilers

Chart 88

Rightmost Derivation

The rightmost non-terminal is replaced in each step

1. expr → expr - digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9

Example input: 3 + 8 - 2

expr ⇒ expr - digit (Rule 1)
expr - digit ⇒ expr - 2 (Rule 4)
expr - 2 ⇒ expr + digit - 2 (Rule 2)
expr + digit - 2 ⇒ expr + 8 - 2 (Rule 4)
expr + 8 - 2 ⇒ digit + 8 - 2 (Rule 3)
digit + 8 - 2 ⇒ 3 + 8 - 2 (Rule 4)

Page 89: CSC 415:  Translators and Compilers

Chart 89

Leftmost Derivation

The leftmost non-terminal is replaced in each step

1. expr → expr - digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9

Example input: 3 + 8 - 2

expr ⇒ expr - digit (Rule 1)
expr - digit ⇒ expr + digit - digit (Rule 2)
expr + digit - digit ⇒ digit + digit - digit (Rule 3)
digit + digit - digit ⇒ 3 + digit - digit (Rule 4)
3 + digit - digit ⇒ 3 + 8 - digit (Rule 4)
3 + 8 - digit ⇒ 3 + 8 - 2 (Rule 4)
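The leftmost and rightmost derivations of 3 + 8 - 2 can be replayed mechanically. The following is a minimal sketch in Python, not part of the course material: the names `RULES` and `derive` are hypothetical, and rule 4 is represented by simply giving the chosen digit as a string.

```python
# Rules 1-3 of the example grammar; rule 4 (digit -> 0|1|...|9) is
# selected by passing the chosen digit as a string step.
RULES = {
    1: ("expr", ["expr", "-", "digit"]),
    2: ("expr", ["expr", "+", "digit"]),
    3: ("expr", ["digit"]),
}
NONTERMINALS = {"expr", "digit"}


def derive(steps, leftmost=True):
    """Apply the steps to the leftmost (or rightmost) non-terminal each time."""
    form = ["expr"]                      # start symbol
    history = ["expr"]
    for step in steps:
        if isinstance(step, int):
            lhs, rhs = RULES[step]
        else:                            # a digit string means rule 4
            lhs, rhs = "digit", [step]
        positions = [i for i, s in enumerate(form) if s in NONTERMINALS]
        if not positions:
            raise ValueError("no non-terminal left to replace")
        i = positions[0] if leftmost else positions[-1]
        if form[i] != lhs:
            raise ValueError(f"rule for {lhs} cannot rewrite {form[i]}")
        form[i:i + 1] = rhs              # replace the non-terminal by the right-hand side
        history.append(" ".join(form))
    return history


leftmost = derive([1, 2, 3, "3", "8", "2"], leftmost=True)
rightmost = derive([1, "2", 2, "8", 3, "3"], leftmost=False)
```

Both calls end in the sentence `3 + 8 - 2`; only the order in which the non-terminals are replaced differs.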

Page 90: CSC 415:  Translators and Compilers

Chart 90

Leftmost Derivation

The leftmost non-terminal is replaced in each step

expr ⇒ expr - digit (Rule 1)
expr - digit ⇒ expr + digit - digit (Rule 2)
expr + digit - digit ⇒ digit + digit - digit (Rule 3)
digit + digit - digit ⇒ 3 + digit - digit (Rule 4)
3 + digit - digit ⇒ 3 + 8 - digit (Rule 4)
3 + 8 - digit ⇒ 3 + 8 - 2 (Rule 4)

[Figure: the syntax tree for 3 + 8 - 2, annotated with the numbers 1 to 6 to relate the six derivation steps to the tree.]

Page 91: CSC 415:  Translators and Compilers

Chart 91

Bottom-Up Parsing

Parser examines terminal symbols of the input string, in order from left to right

Reconstructs the syntax tree from the bottom (terminal nodes) up (toward the root node)

Bottom-up parsing reduces a string w to the start symbol of the grammar.
– At each reduction step a particular sub-string matching the right side of a production is replaced by the symbol on the left of that production, and if the sub-string is chosen correctly at each step, a rightmost derivation is traced out in reverse.

Page 92: CSC 415:  Translators and Compilers

Chart 92

Bottom-Up Parsing

Types of bottom-up parsing algorithms
– Shift-reduce parsing
  At each reduction step a particular sub-string matching the right side of a production is replaced by the symbol on the left of that production, and if the sub-string is chosen correctly at each step, a rightmost derivation is traced out in reverse.
– LR(k) parsing
  L is for left-to-right scanning of the input, R is for constructing a rightmost derivation in reverse, and k is for the number of input symbols of look-ahead that are used in making parsing decisions.
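As an illustration of shift-reduce parsing, here is a small hand-written sketch in Python for the expr/digit grammar used earlier (rules 1 to 4). It is not a general LR parser: the function name `shift_reduce` and the hard-coded reduction checks are assumptions made for this example only.

```python
def shift_reduce(tokens):
    """Recognize a token list with the grammar
       1. expr -> expr - digit   2. expr -> expr + digit
       3. expr -> digit          4. digit -> 0|1|...|9
    Returns (accepted, list of rule numbers in the order they were reduced)."""
    stack, reductions = [], []
    for tok in tokens:
        stack.append(tok)                               # shift
        reduced = True
        while reduced:                                  # reduce while a handle is on top
            reduced = True
            if stack[-1] in set("0123456789"):
                stack[-1] = "digit"                     # rule 4
                reductions.append(4)
            elif stack[-3:] == ["expr", "-", "digit"]:
                stack[-3:] = ["expr"]                   # rule 1
                reductions.append(1)
            elif stack[-3:] == ["expr", "+", "digit"]:
                stack[-3:] = ["expr"]                   # rule 2
                reductions.append(2)
            elif stack == ["digit"]:
                stack[:] = ["expr"]                     # rule 3, only for the first digit
                reductions.append(3)
            else:
                reduced = False
    return stack == ["expr"], reductions


accepted, reductions = shift_reduce(list("3+8-2"))
```

Reading `reductions` backwards gives rules 1, 4, 2, 4, 3, 4, which is the rightmost derivation of 3 + 8 - 2 shown on the Rightmost Derivation chart. The restriction on rule 3 is one instance of choosing the sub-string correctly: reducing a later digit to expr would leave the parser stuck.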

Page 93: CSC 415:  Translators and Compilers

Chart 93

Bottom-Up Parsing Example: 3+8-2

1. expr → expr - digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9

Example input: 3 + 8 - 2

[Figure sequence: starting from the bare input 3 + 8 - 2, the first reductions add digit nodes above input digits and the first expr node above a digit.]

Page 94: CSC 415:  Translators and Compilers

Chart 94

Bottom-Up Parsing Example: 3+8-2

[Figure sequence: the remaining reductions add the last digit node and the upper expr nodes, until a single expr node forms the root of the tree for 3 + 8 - 2.]

Page 95: CSC 415:  Translators and Compilers

Chart 95

Bottom-Up Parsing Example: abbcde

1. S → aABe
2. A → Abc | b
3. B → d

Example input: abbcde

abbcde reduces to aAbcde (using A → b)

[Figure: the input a b b c d e, then the same input with an A node above the first b.]

Page 96: CSC 415:  Translators and Compilers

Chart 96

Bottom-Up Parsing Example: abbcde

1. S → aABe
2. A → Abc | b
3. B → d

Example input: abbcde

aAbcde reduces to aAde (using A → Abc)

[Figure: a second A node is added above the sub-string A b c.]

Page 97: CSC 415:  Translators and Compilers

Chart 97

Bottom-Up Parsing Example: abbcde

1. S → aABe
2. A → Abc | b
3. B → d

Example input: abbcde

aAde reduces to aABe (using B → d)

[Figure: a B node is added above d, next to the two A nodes.]

Page 98: CSC 415:  Translators and Compilers

Chart 98

Bottom-Up Parsing Example: abbcde

1. S → aABe
2. A → Abc | b
3. B → d

Example input: abbcde

aABe reduces to S (using S → aABe)

[Figure: the completed tree, with S as the root above a, A, B and e.]

Page 99: CSC 415:  Translators and Compilers

Chart 99

Bottom-Up Parsing Example: the cat sees a rat.

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Example input: the cat sees a rat.

"the cat sees a rat." reduces to "the Noun sees a rat." (using Noun → cat)

[Figure: the input, then the same input with a Noun node above cat.]

Page 100: CSC 415:  Translators and Compilers

Chart 100

Bottom-Up Parsing Example: the cat sees a rat.

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Example input: the cat sees a rat.

"the Noun sees a rat." reduces to "Subject sees a rat." (using Subject → the Noun)

[Figure: a Subject node is added above the and Noun.]

Page 101: CSC 415:  Translators and Compilers

Chart 101

Bottom-Up Parsing Example: the cat sees a rat.

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Example input: the cat sees a rat.

"Subject sees a rat." reduces to "Subject Verb a rat." (using Verb → sees)

[Figure: a Verb node is added above sees.]

Page 102: CSC 415:  Translators and Compilers

Chart 102

Bottom-Up Parsing Example: the cat sees a rat.

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Example input: the cat sees a rat.

"Subject Verb a rat." reduces to "Subject Verb a Noun." (using Noun → rat)

[Figure: a second Noun node is added above rat.]

Page 103: CSC 415:  Translators and Compilers

Chart 103

Bottom-Up Parsing Example: the cat sees a rat.

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Example input: the cat sees a rat.

"Subject Verb a Noun." reduces to "Subject Verb Object." (using Object → a Noun)

[Figure: an Object node is added above a and Noun.]

What would happen if we chose 'Subject → a Noun' instead of 'Object → a Noun'?

Page 104: CSC 415:  Translators and Compilers

Chart 104

Bottom-Up Parsing Example: the cat sees a rat.

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Example input: the cat sees a rat.

"Subject Verb Object." reduces to Sentence (using rule 1)

[Figure: the completed tree, with Sentence as the root above Subject, Verb, Object and the period.]

Page 105: CSC 415:  Translators and Compilers

Chart 105

Top-Down Parsing

The parser examines the terminal symbols of the input string, in order from left to right.

The parser reconstructs its syntax tree from the top (root node) down (towards the terminal nodes).

An attempt to find the leftmost derivation for an input string

Page 106: CSC 415:  Translators and Compilers

Chart 106

Top-Down Parsers

General rules for top-down parsers
– Start with just a stub for the root node
– At each step the parser takes the leftmost stub
– If the stub is labeled by terminal symbol t, the parser connects it to the next input terminal symbol, which must be t. (If not, the parser has detected a syntactic error.)
– If the stub is labeled by nonterminal symbol N, the parser chooses one of the production rules N ::= X1…Xn, and grows branches from the node labeled by N to new stubs labeled X1,…, Xn (in order from left to right).
– Parsing succeeds when and if the whole input string is connected up to the syntax tree.

Page 107: CSC 415:  Translators and Compilers

Chart 107

Top-Down Parsing

Two forms
– Backtracking parsers
  Guess which rule to apply, then back up and change the choice if they cannot proceed
– Predictive parsers
  Predict which rule to apply by using look-ahead tokens

Backtracking parsers are not very efficient. We will cover predictive parsers.

Page 108: CSC 415:  Translators and Compilers

Chart 108

Predictive Parsers

Many types
– LL(1) parsing
  The first L is for scanning the input from left to right; the second L is for producing a leftmost derivation; 1 is for using one input symbol of look-ahead
  Table driven, with an explicit stack to maintain the parse tree
– Recursive-descent parsing
  Uses recursive subroutines to traverse the parse tree

Page 109: CSC 415:  Translators and Compilers

Chart 109

Predictive Parsers (Lookahead)

Lookahead in predictive parsing
– The lookahead token (next token in the input) is used to determine which rule should be used next
– For example:

1. term → num term'
2. term' → '+' num term' | '-' num term' | ε
3. num → '0'|'1'|'2'|…|'9'

Example input: 7 + 3 - 2

[Figure: term is expanded to num term'. With 7 matched by num and + as the lookahead token, term' is expanded to '+' num term'.]

Page 110: CSC 415:  Translators and Compilers

Chart 110

Predictive Parsers (Lookahead)

1. term → num term'
2. term' → '+' num term' | '-' num term' | ε
3. num → '0'|'1'|'2'|…|'9'

Example input: 7 + 3 - 2

[Figure: 3 is matched by num. With - as the lookahead token, the next term' is expanded to '-' num term'.]

Page 111: CSC 415:  Translators and Compilers

Chart 111

Predictive Parsers (Lookahead)

1. term → num term'
2. term' → '+' num term' | '-' num term' | ε
3. num → '0'|'1'|'2'|…|'9'

Example input: 7 + 3 - 2

[Figure: 2 is matched by num. No input remains, so the last term' is expanded to ε and the tree is complete.]
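The lookahead decisions in this example can be sketched as a small predictive parser in Python. This is an illustrative sketch only: the function name `predict` and the textual form of the recorded productions are assumptions, and each input character is treated as one token.

```python
DIGITS = set("0123456789")


def predict(text):
    """Parse text with  term -> num term'
                        term' -> '+' num term' | '-' num term' | epsilon
    and return the productions chosen, in order."""
    tokens = [c for c in text if not c.isspace()]
    pos = 0
    used = []

    def lookahead():
        # The next unconsumed token, or None at end of input.
        return tokens[pos] if pos < len(tokens) else None

    def parse_num():
        nonlocal pos
        if lookahead() not in DIGITS:
            raise SyntaxError(f"expected a digit, found {lookahead()!r}")
        pos += 1

    def parse_term_prime():
        nonlocal pos
        if lookahead() in ("+", "-"):        # lookahead selects the alternative
            used.append(f"term' -> '{lookahead()}' num term'")
            pos += 1
            parse_num()
            parse_term_prime()
        elif lookahead() is None:            # nothing left: choose epsilon
            used.append("term' -> epsilon")
        else:
            raise SyntaxError(f"unexpected token {lookahead()!r}")

    used.append("term -> num term'")
    parse_num()
    parse_term_prime()
    return used


productions = predict("7 + 3 - 2")
```

At no point does the parser back up: one token of lookahead is enough to pick the alternative of term'.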

Page 112: CSC 415:  Translators and Compilers

Chart 112

Recursive-Descent Parsing

Top-down parsing algorithm
– Consists of a group of methods (programs) parseN, one for each nonterminal symbol N of the grammar.
– The task of each method parseN is to parse a single N-phrase
– These parsing methods cooperate to parse complete sentences

Page 113: CSC 415:  Translators and Compilers

Chart 113

Recursive-Descent Parsing

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Example input: the cat sees a rat.

[Figure: Sentence at the root with four stubs (Subject, Verb, Object and the period) above the input.]

a. Decide which production rule to apply. There is only one candidate, rule 1. This step creates four stubs.

Page 114: CSC 415:  Translators and Compilers

Chart 114

Recursive-Descent Parsing

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Example input: the cat sees a rat.

[Figure: the partially built tree; Subject has been expanded to "the Noun".]

Page 115: CSC 415:  Translators and Compilers

Chart 115

Recursive-Descent Parsing

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Example input: the cat sees a rat.

[Figure: the next step of the same partially built tree, with the Noun stub under Subject.]

Page 116: CSC 415:  Translators and Compilers

Chart 116

Recursive-Descent Parsing

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Example input: the cat sees a rat.

[Figure: a further step of the same partially built tree, with the Noun stub under Subject.]

Page 117: CSC 415:  Translators and Compilers

Chart 117

Recursive-Descent Parsing

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Example input: the cat sees a rat.

[Figure: the partially built tree; Object has now been expanded to "a Noun", so the tree has two Noun stubs.]

Page 118: CSC 415:  Translators and Compilers

Chart 118

Recursive-Descent Parsing

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Example input: the cat sees a rat.

[Figure: the next step of the tree with both Noun stubs.]

Page 119: CSC 415:  Translators and Compilers

Chart 119

Recursive-Descent Parsing

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Example input: the cat sees a rat.

[Figure: the final step of the tree with both Noun stubs, connected to the whole input.]

Page 120: CSC 415:  Translators and Compilers

Chart 120

Recursive-Descent Parser for Micro-English

parseSentence
parseSubject
parseObject
parseVerb
parseNoun

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Page 121: CSC 415:  Translators and Compilers

Chart 121

Recursive-Descent Parser for Micro-English

parseSentence
    parseSubject
    parseVerb
    parseObject
    parseEnd

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Sentence → Subject Verb Object .

Page 122: CSC 415:  Translators and Compilers

Chart 122

Recursive-Descent Parser for Micro-English

parseSubject
    if input = "I"
        accept
    else if input = "a"
        accept
        parseNoun
    else if input = "the"
        accept
        parseNoun
    else error

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Subject → I | a Noun | the Noun

Page 123: CSC 415:  Translators and Compilers

Chart 123

Recursive-Descent Parser for Micro-English

parseNoun
    if input = "cat"
        accept
    else if input = "mat"
        accept
    else if input = "rat"
        accept
    else error

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Noun → cat | mat | rat

Page 124: CSC 415:  Translators and Compilers

Chart 124

Recursive-Descent Parser for Micro-English

parseObject
    if input = "me"
        accept
    else if input = "a"
        accept
        parseNoun
    else if input = "the"
        accept
        parseNoun
    else error

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Object → me | a Noun | the Noun

Page 125: CSC 415:  Translators and Compilers

Chart 125

Recursive-Descent Parser for Micro-English

parseVerb
    if input = "like"
        accept
    else if input = "is"
        accept
    else if input = "see"
        accept
    else if input = "sees"
        accept
    else error

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Verb → like | is | see | sees

Page 126: CSC 415:  Translators and Compilers

Chart 126

Recursive-Descent Parser for Micro-English

parseEnd
    if input = "."
        accept
    else error

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

The period that ends the sentence.
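Put together, the parseSentence, parseSubject, parseVerb, parseObject, parseNoun and parseEnd routines form a complete recognizer for Micro-English. The following Python sketch is one possible transcription, not the course's reference implementation: the class name `MicroEnglishParser`, the helper `is_sentence` and the whitespace-based tokenization are assumptions.

```python
NOUNS = {"cat", "mat", "rat"}
VERBS = {"like", "is", "see", "sees"}


class MicroEnglishParser:
    def __init__(self, text):
        # Make every period its own token, then split on whitespace.
        self.tokens = text.replace(".", " . ").split()
        self.pos = 0

    def current(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def accept(self):
        self.pos += 1

    def error(self, expected):
        raise SyntaxError(f"expected {expected}, found {self.current()!r}")

    def parse_sentence(self):            # Sentence -> Subject Verb Object .
        self.parse_subject()
        self.parse_verb()
        self.parse_object()
        self.parse_end()

    def parse_subject(self):             # Subject -> I | a Noun | the Noun
        if self.current() == "I":
            self.accept()
        elif self.current() in ("a", "the"):
            self.accept()
            self.parse_noun()
        else:
            self.error("I, a or the")

    def parse_object(self):              # Object -> me | a Noun | the Noun
        if self.current() == "me":
            self.accept()
        elif self.current() in ("a", "the"):
            self.accept()
            self.parse_noun()
        else:
            self.error("me, a or the")

    def parse_noun(self):                # Noun -> cat | mat | rat
        if self.current() in NOUNS:
            self.accept()
        else:
            self.error("a noun")

    def parse_verb(self):                # Verb -> like | is | see | sees
        if self.current() in VERBS:
            self.accept()
        else:
            self.error("a verb")

    def parse_end(self):
        if self.current() == ".":
            self.accept()
        else:
            self.error("'.'")

    def parse(self):
        self.parse_sentence()
        if self.current() is not None:
            self.error("end of input")


def is_sentence(text):
    try:
        MicroEnglishParser(text).parse()
        return True
    except SyntaxError:
        return False
```

For example, `is_sentence("the cat sees a rat.")` succeeds, while the same text without the final period is rejected by `parse_end`.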

Page 127: CSC 415:  Translators and Compilers

Chart 127

Systematic Development of a Recursive-Descent Parser

Given a (suitable) context-free grammar
– Express the grammar in EBNF, with a single production rule for each nonterminal symbol, and perform any necessary grammar transformations
  Always eliminate left recursion
  Always left-factorize whenever possible
– Transcribe each EBNF production rule N ::= X to a parsing method parseN, whose body is determined by X
– Make the parser consist of:
  A private variable currentToken
  Private parsing methods developed in the previous step
  Private auxiliary methods accept and acceptIt, both of which call the scanner
  A public parse method that calls parseS (where S is the start symbol of the grammar), having first called the scanner to store the first input token in currentToken

Page 128: CSC 415:  Translators and Compilers

Chart 128

Quote of the Week

“C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows away your whole leg.”– Bjarne Stroustrup

Page 129: CSC 415:  Translators and Compilers

Chart 129

Quote of the Week

Did you really say that?

Dr. Bjarne Stroustrup: Yes, I did say something along the lines of "C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows your whole leg off." What people tend to miss is that what I said about C++ is to a varying extent true for all powerful languages. As you protect people from simple dangers, they get themselves into new and less obvious problems. Someone who avoids the simple problems may simply be heading for a not-so-simple one. One problem with very supporting and protective environments is that the hard problems may be discovered too late or be too hard to remedy once discovered. Also, a rare problem is harder to find than a frequent one because you don't suspect it.

I also said, "Within C++, there is a much smaller and cleaner language struggling to get out." For example, that quote can be found on page 207 of The Design and Evolution of C++. And no, that smaller and cleaner language is not Java or C#. The quote occurs in a section entitled "Beyond Files and Syntax". I was pointing out that the C++ semantics is much cleaner than its syntax. I was thinking of programming styles, libraries and programming environments that emphasized the cleaner and more effective practices over archaic uses focused on the low-level aspects of C.

Page 130: CSC 415:  Translators and Compilers

Chart 130

Converting EBNF Production Rules to Parsing Methods

For production rule N ::= X
– Convert the production rule to a parsing method named parseN
    private void parseN () {
        parse X
    }
– Refine parse ε to a dummy statement
– Refine parse t (where t is a terminal symbol) to accept(t) or acceptIt()
– Refine parse N (where N is a non-terminal symbol) to a call of the corresponding parsing method
    parseN()
– Refine parse X Y to
    {
        parse X
        parse Y
    }
– Refine parse X | Y to
    switch (currentToken.kind) {
    cases in starters[[X]]:
        parse X
        break;
    cases in starters[[Y]]:
        parse Y
        break;
    default:
        report a syntax error
    }

Page 131: CSC 415:  Translators and Compilers

Chart 131

Converting EBNF Production Rules to Parsing Methods

For X | Y
– Choose parse X only if the current token is one that can start an X-phrase
– Choose parse Y only if the current token is one that can start a Y-phrase
  starters[[X]] and starters[[Y]] must be disjoint

For X*
– Choose
    while (currentToken.kind is in starters[[X]])
        parse X
  starters[[X]] must be disjoint from the set of tokens that can follow X* in this particular context
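The while-loop refinement for X* can be seen on the earlier expr grammar. Eliminating its left recursion gives the EBNF rule expr ::= digit (('+' | '-') digit)*, which is an assumption of this sketch rather than a rule stated on the charts. The Python below is illustrative only; `parse_expr` and `STARTERS_OP` are hypothetical names, and the function also evaluates the expression to show that left-to-right grouping is preserved.

```python
DIGITS = set("0123456789")
STARTERS_OP = {"+", "-"}     # starters of the repeated phrase ('+' | '-') digit


def parse_expr(text):
    """Parse  expr ::= digit (('+' | '-') digit)*  and return its value."""
    tokens = [c for c in text if not c.isspace()]
    pos = 0

    def current():
        return tokens[pos] if pos < len(tokens) else None

    def parse_digit():
        nonlocal pos
        tok = current()
        if tok not in DIGITS:
            raise SyntaxError(f"expected a digit, found {tok!r}")
        pos += 1
        return int(tok)

    value = parse_digit()
    # X* becomes a loop guarded by starters[[X]].
    while current() in STARTERS_OP:
        op = current()
        pos += 1
        operand = parse_digit()
        value = value + operand if op == "+" else value - operand
    if current() is not None:
        raise SyntaxError(f"unexpected token {current()!r}")
    return value


result = parse_expr("3 + 8 - 2")
```

The loop is safe here because the only thing that can follow the repetition is the end of input, which is not in `STARTERS_OP`.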

Page 132: CSC 415:  Translators and Compilers

Chart 132

Converting EBNF Production Rules to Parsing Methods

A grammar that satisfies both these conditions is called an LL(1) grammar

Recursive-descent parsing is suitable only for LL(1) grammars

Page 133: CSC 415:  Translators and Compilers

Chart 133

Error Repair

Good programming languages are designed with a relatively large “distance” between syntactically correct programs, to increase the likelihood that conceptual mistakes are caught as syntactic errors.

Error repair usually occurs at two levels:
– Local: repairs mistakes with little global import, such as missing semicolons and undeclared variables.
– Scope: repairs the program text so that scopes are correct. Errors of this kind include unbalanced parentheses and begin/end blocks.

Page 134: CSC 415:  Translators and Compilers

Chart 134

Error Repair

Repair actions can be divided into insertions and deletions. Typically the compiler will use some look-ahead and backtracking in attempting to make progress in the parse. There is great variation among compilers, though some languages (PL/C) carry a tradition of good error repair.

Goals of error repair are:
– No input should cause the compiler to collapse
– Illegal constructs are flagged
– Frequently occurring errors are repaired gracefully
– Minimal stuttering or cascading of errors

LL-style parsing lends itself well to error repair, since the compiler uses the grammar’s rules to predict what should occur next in the input

Page 135: CSC 415:  Translators and Compilers

Chart 135

Mini-Triangle Production Rules

Program ::= Command                                    Program (1.14)

Command ::= V-name := Expression                       AssignCommand (1.15a)
          | Identifier ( Expression )                  CallCommand (1.15b)
          | Command ; Command                          SequentialCommand (1.15c)
          | if Expression then Command else Command    IfCommand (1.15d)
          | while Expression do Command                WhileCommand (1.15e)
          | let Declaration in Command                 LetCommand (1.15f)

Expression ::= Integer-Literal                         IntegerExpression (1.16a)
             | V-name                                  VnameExpression (1.16b)
             | Operator Expression                     UnaryExpression (1.16c)
             | Expression Operator Expression          BinaryExpression (1.16d)

V-name ::= Identifier                                  SimpleVname (1.17)

Declaration ::= const Identifier ~ Expression          ConstDeclaration (1.18a)
              | var Identifier : Type-denoter          VarDeclaration (1.18b)
              | Declaration ; Declaration              SequentialDeclaration (1.18c)

Type-denoter ::= Identifier                            SimpleTypeDenoter (1.19)

Page 136: CSC 415:  Translators and Compilers

Chart 136

Abstract Syntax Trees

An explicit representation of the source program’s phrase structure

AST for Mini-Triangle

Page 137: CSC 415:  Translators and Compilers

Chart 137

Abstract Syntax Trees

Program ASTs (P):

Program ::= Command                          Program (1.14)

[Figure: a Program node with a single subtree C.]

Command ASTs (C):

Command ::= V-name := Expression             AssignCommand (1.15a)
          | Identifier ( Expression )        CallCommand (1.15b)
          | Command ; Command                SequentialCommand (1.15c)

[Figure: (1.15a) an AssignCommand node with subtrees V and E; (1.15b) a CallCommand node with subtrees Identifier (a leaf carrying its spelling) and E; (1.15c) a SequentialCommand node with subtrees C1 and C2.]

Page 138: CSC 415:  Translators and Compilers

Chart 138

Abstract Syntax Trees

Command ASTs (C):

Command ::=
          | if Expression then Command else Command    IfCommand (1.15d)
          | while Expression do Command                WhileCommand (1.15e)
          | let Declaration in Command                 LetCommand (1.15f)

[Figure: (1.15d) an IfCommand node with subtrees E, C1 and C2; (1.15e) a WhileCommand node with subtrees E and C; (1.15f) a LetCommand node with subtrees D and C.]
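One way to picture these ASTs is as one small record type per production rule, with one field per subtree. The Python sketch below covers only a handful of the Mini-Triangle rules; the class field names and the `render` helper are assumptions made for illustration and are not taken from the Triangle compiler's class diagrams.

```python
from dataclasses import dataclass


@dataclass
class IntegerExpression:        # (1.16a)
    value: int


@dataclass
class VnameExpression:          # (1.16b)
    name: str


@dataclass
class BinaryExpression:         # (1.16d): subtrees E1, Operator, E2
    left: object
    operator: str
    right: object


@dataclass
class AssignCommand:            # (1.15a): subtrees V and E
    vname: str
    expression: object


@dataclass
class SequentialCommand:        # (1.15c): subtrees C1 and C2
    first: object
    second: object


@dataclass
class WhileCommand:             # (1.15e): subtrees E and C
    expression: object
    command: object


def render(node):
    """Walk the tree and rebuild the source text it represents."""
    if isinstance(node, IntegerExpression):
        return str(node.value)
    if isinstance(node, VnameExpression):
        return node.name
    if isinstance(node, BinaryExpression):
        return f"{render(node.left)} {node.operator} {render(node.right)}"
    if isinstance(node, AssignCommand):
        return f"{node.vname} := {render(node.expression)}"
    if isinstance(node, SequentialCommand):
        return f"{render(node.first)}; {render(node.second)}"
    if isinstance(node, WhileCommand):
        return f"while {render(node.expression)} do {render(node.command)}"
    raise TypeError(f"unknown AST node: {node!r}")


ast = WhileCommand(
    BinaryExpression(VnameExpression("n"), "<", IntegerExpression(10)),
    AssignCommand("n", BinaryExpression(VnameExpression("n"), "+", IntegerExpression(1))),
)
text = render(ast)
```

Here `ast` has a WhileCommand at the root with an expression subtree and a command subtree, matching rule (1.15e).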

Page 139: CSC 415:  Translators and Compilers

Chart 139

Midterm Review: Chapter 1

Context-free grammar
– A finite set of terminal symbols
– A finite set of non-terminal symbols
– A start symbol
– A finite set of production rules

Aspects of a programming language that need to be specified
– Syntax: form of programs
– Contextual constraints: scope rules and type rules
– Semantics: meaning of programs

Page 140: CSC 415:  Translators and Compilers

Chart 140

Midterm Review: Chapter 1

Language specification
– Informal: written in English
– Formal: precise notation (BNF, EBNF)
  Unambiguous
  Consistent
  Complete

Context-free language
– Syntax tree
– Phrase
– Sentence

Page 141: CSC 415:  Translators and Compilers

Chart 141

Midterm Review: Chapter 1

Syntax tree
– Terminal nodes labeled by terminal symbols
– Non-terminal nodes labeled by non-terminal symbols

Abstract Syntax Tree (AST)
– Each non-terminal node is labeled by a production rule
– Each non-terminal node has exactly one subtree for each subphrase
– Does not generate sentences

Page 142: CSC 415:  Translators and Compilers

Chart 142

Midterm Review: Chapter 2

Translator
– Accepts any text expressed in one language (source language) and generates a semantically equivalent text expressed in another language (target language)

Compiler
– Translates from a high-level language into a low-level language

Interpreter
– A program that accepts any program (source program) expressed in a particular language (source language) and runs that source program immediately

Page 143: CSC 415:  Translators and Compilers

Chart 143

Midterm Review: Chapter 2

Interpretive compiler
– Combination of compiler and interpreter
  Some of the advantages of each

Portable compiler
– Compiled and run on any machine, without change
– Portability measured by the proportion of code that remains unchanged
– Portability is an economic issue

Bootstrapping
– Using the language processor to process itself

Tombstone diagrams

Tombstone diagrams

Page 144: CSC 415:  Translators and Compilers

Chart 144

Midterm Review: Chapter 3

Three phases of compilation
– Syntactic analysis
– Contextual analysis
– Code generation

Single-pass compilers
Multi-pass compilers
Compiler design issues
– Speed
– Space
– Modularity
– Flexibility
– Semantics-preserving transformations
– Source language properties

Page 145: CSC 415:  Translators and Compilers

Chart 145

Midterm Review: Chapter 4

Sub-phases of syntactic analysis
– Scanning (lexical analysis)
  Source program transformed to a stream of tokens
  Comments and blank spaces between tokens are discarded
– Parsing
  Source program, in the form of a stream of tokens, parsed to determine its phrase structure
  Parser treats each token as a terminal symbol
– Representation of the phrase structure
  A data structure representing the source program’s phrase structure
  Typically an abstract syntax tree (AST)

Page 146: CSC 415:  Translators and Compilers

Chart 146

Midterm Review: Chapter 4

Tokens
– An atomic symbol of the source program
– May consist of several characters
– Classified according to kind
  All tokens of the same kind can be freely interchanged without affecting the program’s phrase structure
– Each token is completely described by its kind and spelling
  Token represented by a tuple (kind, spelling)
– Only the kind of each token is examined by the parser
  Spelling is examined by the contextual analyzer and/or code generator
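The kind/spelling split can be made concrete with a tiny scanner. The Python sketch below is an assumption-laden illustration, not the Triangle scanner: the kind names, the keyword set (taken from the Mini-Triangle rules), the set of operator characters and the regular expression are all choices made for this example, and comments are not handled.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str
    spelling: str


KEYWORDS = {"if", "then", "else", "while", "do", "let", "in", "const", "var"}
OPERATORS = {"+", "-", "*", "/", "<", ">", "="}
# Skip leading white space, then match a literal, a word, or a symbol.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z]\w*)|(:=|[-+*/<>=~:;()]))")


def scan(source):
    source = source.rstrip()
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near position {pos}")
        literal, word, symbol = match.groups()
        if literal is not None:
            tokens.append(Token("INTLITERAL", literal))
        elif word is not None:
            # Keywords get a kind of their own; other words are identifiers.
            kind = word.upper() if word in KEYWORDS else "IDENTIFIER"
            tokens.append(Token(kind, word))
        elif symbol in OPERATORS:
            tokens.append(Token("OPERATOR", symbol))
        else:
            tokens.append(Token(symbol, symbol))
        pos = match.end()
    return tokens


first = scan("n := n + 1")
second = scan("x := y * 2")
```

The two inputs differ in spelling only: `[t.kind for t in first]` equals `[t.kind for t in second]`, which is all a parser would look at.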

Page 147: CSC 415:  Translators and Compilers

Chart 147

Midterm Review: Chapter 4

Grammars
– Regular expressions
  “|” separates alternatives
  “*” indicates that the previous item may be repeated zero or more times
  “(” and “)” are grouping parentheses
  ε is the empty string
  – a special string of length 0
  Algebraic properties
  Common extensions
– Grammar transformations
  Left factorization
  Elimination of left recursion
  Substitution of non-terminal symbols

Page 148: CSC 415:  Translators and Compilers

Chart 148

Midterm Review: Chapter 4

Structure of compiler
– Source code
– Lexical analyzer
– Parser & semantic analyzer
– Intermediate code generation
– Optimization
– Assembly code generation
– Assembly code

Page 149: CSC 415:  Translators and Compilers

Chart 149

Midterm Review: Chapter 4

Scanning (lexical analysis)
– What does it do?
  Handles keywords (reserved words)
  Removes white space (tabs, spaces, new lines)
  Removes comments
  Performs look-ahead
  Error handling
– Issues
  Simpler design
  Improve compiler efficiency
  Enhance compiler portability

Page 150: CSC 415:  Translators and Compilers

Chart 150

Midterm Review: Chapter 4

Parsing
– Given an unambiguous, context-free grammar
  Recognition of an input string: is it a sentence of the grammar?
  Parsing an input string: determines its phrase structure
– Why is unambiguous important?
– Advantages of unambiguous, context-free grammars (see chart 82)
– How do you know the syntax of a program is legal?
  A legal program can be derived from the start symbol of the grammar

Page 151: CSC 415:  Translators and Compilers

Chart 151

Midterm Review: Chapter 4

Parsing
– Rightmost (replace the rightmost non-terminal in each step) and leftmost (replace the leftmost non-terminal in each step) derivation
– Bottom-up (reconstructs the syntax tree from the terminal nodes up toward the root node) and top-down (reconstructs the syntax tree from the root node down toward the terminal nodes)
– Predictive parsers
  LL(1)
  Recursive descent

Page 152: CSC 415:  Translators and Compilers

Chart 152

Midterm Review: Chapter 4

Parsing
– Converting EBNF production rules to parsing methods
– Error repair

Page 153: CSC 415:  Translators and Compilers

Chart 153

Chapter 5: Contextual Analysis

Identification
– Monolithic Block Structure
– Flat Block Structure
– Nested Block Structure
– Attributes
– Standard Environment

Type Checking
A Contextual Analysis Algorithm
Case Study: Contextual Analysis in the Triangle Compiler

Page 154: CSC 415:  Translators and Compilers

Chart 154

Contextual Analysis

Given a parsed program, the purpose of contextual analysis is to check that the program conforms to the source language’s contextual constraints.
– Scope rules: rules governing declarations and applied occurrences of identifiers
– Type rules: rules that allow us to infer the types of expressions, and to decide whether each expression has a valid type

Analysis of the program to determine correctness with respect to the language definition (beyond structure)

Page 155: CSC 415:  Translators and Compilers

Chart 155

Contextual Analysis

Contextual analysis consists of two sub-phases:
– Identification: applying the source language’s scope rules to relate each applied occurrence of an identifier to its declaration (if any).
– Type checking: applying the source language's type rules to infer the type of each expression, and comparing that type with the expected type.

Page 156: CSC 415:  Translators and Compilers

Chart 156

Structure of a Compiler

[Diagram: Source code → Lexical Analyzer → (tokens) → Parser → (parse tree) → Semantic Analyzer → Intermediate Code Generation → (intermediate representation) → Optimization → (intermediate representation) → Assembly Code Generation → Assembly code. The Symbol Table is drawn alongside the phases. The Semantic Analyzer is expanded into its two parts: Identification and Type checking.]

Page 157: CSC 415:  Translators and Compilers

Chart 157

Identification

Relate each applied occurrence of an identifier in the source program to the corresponding declaration
– The program is ill-formed if there is no corresponding declaration: generate an error

Identification could cause compiler efficiency problems
– Inefficient to use the AST

Page 158: CSC 415:  Translators and Compilers

Chart 158

Identification Table

Also known as symbol table
Associates identifiers with their attributes
Basic operations
– Make the identification table empty
– Add an entry associating a given identifier with a given attribute
– Retrieve the attribute (if any) associated with a given identifier

Attribute
– Consists of information relevant to contextual analysis
– Obtained from the identifier’s declaration

Page 159: CSC 415:  Translators and Compilers

Chart 159

Identification Table

Each declaration in a program has a defined scope
– Portion of the program over which the declaration takes effect

Block: any program phrase that delimits the scope of declarations within it

Example: Triangle block command
– let D in C
  Scope of each declaration in D extends over the subcommand C

Page 160: CSC 415:  Translators and Compilers

Chart 160

Identification Table: Structure/Implementation

Maintain scope
– An identifier should be found in the table only when valid
– If an identifier is defined in multiple scopes, then a lookup in the table must provide the appropriate meaning for the use

Efficiency
– How fast is lookup?
– How fast to enter/exit a scope?
– What is the overall table size?

Page 161: CSC 415:  Translators and Compilers

Chart 161

Identification Table: Structure/Implementation

Different implementations
– Organized for efficient retrieval
– Binary search tree
– Hash table

Page 162: CSC 415:  Translators and Compilers

Chart 162

Identification Table: Functionality

A mapping of identifiers to their meanings

Information
– Name
– Type
– Location

Operations
– Create
– Insert
– Lookup
– Delete
– Update entry
– Entering a new scope
– Leaving a scope

Page 163: CSC 415:  Translators and Compilers

Chart 163

Block Structures

Monolithic block structure
– Basic and Cobol

Flat block structure
– Fortran

Nested block structure
– Pascal, Ada, C, and Java

Page 164: CSC 415:  Translators and Compilers

Chart 164

Monolithic Block Structure

The only block is the entire program
All declarations are global
Simple rules
– No identifier may be declared more than once
– For every applied occurrence of an identifier I, there must be a corresponding declaration of I
  No identifier may be used unless declared

The identification table should contain entries for all declarations in the source program
– At most, one entry for each identifier
– The table contains an identifier I and the associated attribute A

Page 165: CSC 415:  Translators and Compilers

Chart 165

Monolithic Block Structure

Program
(1) integer b = 10
(2) integer n
(3) char c
begin
  …
  n = n * b
  …
  Write c
  …
end

Identification   Attribute
b                (1)
n                (2)
c                (3)

• Create new table: create command
• At declaration for identifier I, make table entry: insert command
• At applied occurrence of identifier I, retrieve information from table: lookup command
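The three table commands for a monolithic block structure can be sketched as follows. This is a minimal illustration, not the Triangle compiler's own class: the names IdentificationTable, enter, and retrieve are assumptions, and attributes are stood in for by the declaration labels (1), (2), (3) from the example program.

```python
class IdentificationTable:
    """Identification table for a monolithic block structure (one global scope)."""

    def __init__(self):
        # "create" command: start with an empty table
        self.entries = {}

    def enter(self, identifier, attribute):
        # "insert" command: at most one entry per identifier
        if identifier in self.entries:
            raise KeyError(f"identifier {identifier!r} declared more than once")
        self.entries[identifier] = attribute

    def retrieve(self, identifier):
        # "lookup" command: None means there is no corresponding declaration
        return self.entries.get(identifier)


# Declarations (1)-(3) of the example program
table = IdentificationTable()
table.enter("b", "(1)")
table.enter("n", "(2)")
table.enter("c", "(3)")
```

An applied occurrence such as n in "n = n * b" is then resolved with table.retrieve("n"); a None result would be reported as an undeclared identifier.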

Page 166: CSC 415:  Translators and Compilers

Chart 166

Flat Block Structure

Program partitioned into several disjoint blocks
Two scope levels
– Some declarations are local in scope
  Identifiers restricted to particular block
– Other declarations are global in scope
  Identifiers allowed anywhere in the program (the program as a whole is a block)

Less simple rules
– No globally declared identifier may be redeclared globally
  But same identifier may also be declared locally
– No locally declared identifier may be redeclared in the same block
  Same identifier may be declared locally in several different blocks
– For every applied occurrence of an identifier I in a block B, there must be a corresponding declaration of I
  Either global declaration of I or a declaration of I local to B

Minor complication is to distinguish global and local declaration entries

Page 167: CSC 415:  Translators and Compilers

Chart 167

Flat Block Structure

• Create new table: create command
• At start of a block: enter new scope command
• At end of a block: leave scope command, delete command
• At declaration for identifier I, make table entry: insert command
• At applied occurrence of identifier I, retrieve information from table: lookup command

program
  (1) procedure Q
      (2) real r
      (3) real pi = 3.14
      begin
        …
      end
  (4) procedure R
      (5) integer c
      begin
        …
      end
  (6) integer i
  (7) boolean b
  (8) char c
  begin
    …
    call R
    …
  end

Table while analyzing the body of Q:
Identification   Attribute   Level
Q                (1)         global
r                (2)         local
pi               (3)         local

Table while analyzing the body of R:
Identification   Attribute   Level
Q                (1)         global
R                (4)         global
c                (5)         local

Table after leaving R:
Identification   Attribute   Level
Q                (1)         global
R                (4)         global

Table while analyzing the main program:
Identification   Attribute   Level
Q                (1)         global
R                (4)         global
i                (6)         local
b                (7)         local
c                (8)         local

Page 168: CSC 415:  Translators and Compilers

Chart 168

Nested Block Structure

Blocks may be nested one within another
Many scope levels
– Declarations in the outermost block are global in scope
  The outermost block is at scope level 1
– Declarations inside an inner block are local to that block
  Every inner block is completely enclosed by another block
  Next to outermost block is at scope level 2
  If enclosed by a level-n block, the block is at scope level n+1

Page 169: CSC 415:  Translators and Compilers

Chart 169

Nested Block Structure

More complex rules
– No identifier may be declared more than once in the same block
  Same identifier may be declared in different blocks, even if they are nested
– For every applied occurrence of an identifier I in a block B, there must be a corresponding declaration of I
  Must be in B itself
  Or in the block B’ immediately enclosing B
  Or in B’’ immediately enclosing B’
  Etc.
  In smallest enclosing block that contains any declaration of I

Page 170: CSC 415:  Translators and Compilers

Chart 170

Nested Block Structure

• Create new table: create command
• At start of a block: enter new scope command
• At end of a block: leave scope command, delete command
• At declaration for identifier I, make table entry: insert command
  Level number determined by number of calls to enter new scope
• At applied occurrence of identifier I, retrieve information from table using highest level for I: lookup command

let
  (1) var a: Integer;
  (2) var b: Boolean
in
  begin
  …;
  let
    (3) var b: Integer;
    (4) var c: Boolean
  in
    begin
    …;
    let
      (5) var d: Integer
    in …;
    …
    end;
  …;
  let
    (6) var d: Boolean;
    (7) var e: Integer
  in …;
  …
  end

Table after declarations (1) and (2):
Identification   Attribute   Level
a                (1)         1
b                (2)         1

Table inside the innermost block, after declaration (5):
Identification   Attribute   Level
a                (1)         1
b                (2)         1
b                (3)         2
c                (4)         2
d                (5)         3

Table after leaving the level-3 block:
Identification   Attribute   Level
a                (1)         1
b                (2)         1
b                (3)         2
c                (4)         2

Table inside the second level-2 block, after declarations (6) and (7):
Identification   Attribute   Level
a                (1)         1
b                (2)         1
d                (6)         2
e                (7)         2
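The table snapshots above can be reproduced with a small scoped table. This is a sketch under stated assumptions: the class name ScopedTable and the method names open_scope, close_scope, enter, and retrieve are illustrative choices, and a linear list is used for clarity rather than the binary search tree or hash table a production compiler would prefer.

```python
class ScopedTable:
    """Identification table for a nested block structure."""

    def __init__(self):
        self.level = 1      # the outermost block is at scope level 1
        self.entries = []   # (identifier, attribute, level), newest last

    def open_scope(self):
        # "enter new scope" command
        self.level += 1

    def close_scope(self):
        # "leave scope" command: delete every entry made at the current level
        self.entries = [e for e in self.entries if e[2] < self.level]
        self.level -= 1

    def enter(self, identifier, attribute):
        # "insert" command: no identifier may be declared twice in one block
        for ident, _, lvl in self.entries:
            if ident == identifier and lvl == self.level:
                raise KeyError(f"{identifier!r} already declared in this block")
        self.entries.append((identifier, attribute, self.level))

    def retrieve(self, identifier):
        # "lookup" command: search newest first, so the highest level wins
        for ident, attr, _ in reversed(self.entries):
            if ident == identifier:
                return attr
        return None


t = ScopedTable()
t.enter("a", "(1)")
t.enter("b", "(2)")
t.open_scope()                 # first level-2 block
t.enter("b", "(3)")
t.enter("c", "(4)")
inner_b = t.retrieve("b")      # resolves to the level-2 declaration
t.open_scope()                 # level-3 block
t.enter("d", "(5)")
t.close_scope()
t.close_scope()
outer_b = t.retrieve("b")      # back at level 1
t.open_scope()                 # second level-2 block
t.enter("d", "(6)")
t.enter("e", "(7)")
```

Deleting entries on close_scope keeps an identifier in the table only while it is valid, which is the "maintain scope" requirement stated earlier.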

Page 171: CSC 415:  Translators and Compilers

Chart 171

Attributes

Kind
– constant
– variable
– procedure
– function
– type

Type
– boolean
– character
– integer
– record
– array

Examples

Page 172: CSC 415:  Translators and Compilers

Chart 172

Attributes

Information to be extracted from declaration
– Constant, variable, procedure, function, type
– Procedure or function declaration includes a list of formal parameters, each of which may be a constant, variable, procedural, or functional parameter
– Language provides whole families of record and array types

How to manage attribute information
– Extract type information from declarations and store in identification table
  Could be complex for a realistic programming language
  Could require tedious programming
– Use the AST
  Pointers in identification table pointing to location in AST with that identifier

Page 173: CSC 415:  Translators and Compilers

Chart 173

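Attributes

[Figure: AST of a Program consisting of an outer LetCommand. Its SequentialDeclaration holds two VarDeclarations (identifiers a and b, one int and one bool); its SequentialCommand contains an inner LetCommand whose SequentialDeclaration holds two VarDeclarations (identifiers d and e). The attributes in the identification table are pointers to these VarDeclaration subtrees.]

Table after the outer declarations:
Identification   Attribute   Level
a                (1)         1
b                (2)         1

Table inside the inner LetCommand:
Identification   Attribute   Level
a                (1)         1
b                (2)         1
d                (6)         2
e                (7)         2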

Page 174: CSC 415:  Translators and Compilers

Chart 174

Standard Environment

Predefined constants, variables, types, procedures, and functions
These are loaded into the identification table
Scope rules for standard environment
– Scope enclosing the entire program
  Level 0
– Same scope level as global declarations
  Example is C

Page 175: CSC 415:  Translators and Compilers

Chart 175

Structure of a Compiler

[Figure: compiler pipeline]
Source code → Lexical Analyzer → tokens → Parser → parse tree → Semantic Analyzer → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → Assembly code

Symbol Table

Semantic Analyzer
– Identification
– Type checking

Page 176: CSC 415:  Translators and Compilers

Chart 176

Type Checking

Second task of the contextual analyzer is to ensure that the source program contains no type errors

Once an applied occurrence of an identifier has been identified, the contextual analyzer will check that the identifier is used in a way consistent with its declaration

Page 177: CSC 415:  Translators and Compilers

Chart 177

Type Checking

Statically-typed language: the compiler can detect any type errors without actually running the program
– For every expression E in the language, the compiler can infer either that E has some type T or that E is ill-typed
  If E does have type T, then E will always yield a value of type T
  If a value of type T’ is expected, then the compiler checks that T’ is equivalent to T

Page 178: CSC 415:  Translators and Compilers

Chart 178

Type Checking

Infers the type of each expression bottom-up
– Starting with literals and identifiers, and working up through larger and larger subexpressions
– Literal: the type of a literal is immediately known
– Identifier: the type of an applied occurrence of identifier I is obtained from the corresponding declaration of I
– Unary operator application: consider “O E” where O is a unary operator of type T1 → T2
  Type checker ensures that E’s type is equivalent to T1
  Infers that the type of “O E” is T2
  Otherwise a type error
– Binary operator application: consider “E1 O E2” where O is a binary operator of type T1 × T2 → T3
  Type checker ensures that E1’s type is equivalent to T1 and E2’s type is equivalent to T2
  Infers that the type of “E1 O E2” is T3
  Otherwise a type error
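The bottom-up rules above can be sketched as a recursive function. Everything named here is an assumption made for illustration: expressions are encoded as nested tuples, types are plain strings compared by equality (the simplest form of type equivalence), and the operator tables UNARY_OPS and BINARY_OPS list only a few sample operators.

```python
# Operator signatures: unary T1 -> T2, binary T1 x T2 -> T3 (sample entries only)
UNARY_OPS = {"not": ("bool", "bool"), "neg": ("int", "int")}
BINARY_OPS = {
    "+": ("int", "int", "int"),
    "<": ("int", "int", "bool"),
    "and": ("bool", "bool", "bool"),
}


class IllTyped(Exception):
    """Raised when an expression has no type under the rules."""


def infer(expr, env):
    """Infer the type of expr bottom-up.

    expr is ("lit", value), ("id", name), ("unary", op, e) or
    ("binary", op, e1, e2); env maps declared identifiers to their types.
    """
    kind = expr[0]
    if kind == "lit":
        # the type of a literal is immediately known
        return "bool" if isinstance(expr[1], bool) else "int"
    if kind == "id":
        # obtained from the corresponding declaration
        if expr[1] not in env:
            raise IllTyped(f"undeclared identifier {expr[1]}")
        return env[expr[1]]
    if kind == "unary":
        t1, t2 = UNARY_OPS[expr[1]]
        if infer(expr[2], env) != t1:
            raise IllTyped(f"operand of {expr[1]} must be {t1}")
        return t2
    if kind == "binary":
        t1, t2, t3 = BINARY_OPS[expr[1]]
        if infer(expr[2], env) != t1 or infer(expr[3], env) != t2:
            raise IllTyped(f"operands of {expr[1]} must be {t1} and {t2}")
        return t3
    raise ValueError(f"unknown expression form {kind}")


# n < 10, with n declared as an integer
comparison_type = infer(("binary", "<", ("id", "n"), ("lit", 10)), {"n": "int"})
```

The recursion visits subexpressions first, so the type of each larger expression is computed only from the already inferred types of its parts.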

Page 179: CSC 415:  Translators and Compilers

Chart 179

Type Checking

Type of a nontrivial expression is inferred from the types of its sub-expressions, using the appropriate type rules

Must be able to test if two given types T and T’ are equivalent

Page 180: CSC 415:  Translators and Compilers

Chart 180

Type Checking – Constant or Variable Identifier

[Figure: the AST before and after type checking. A ConstDeclaration (Ident. x, Expr. of type T) is linked to an applied occurrence of x in a SimpleVname; after checking, the SimpleVname is also decorated with type T.]

Page 181: CSC 415:  Translators and Compilers

Chart 181

Type Checking – Variable Declaration

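[Figure: the AST before and after type checking. A VarDeclaration (Ident. x, type T) is linked to an applied occurrence of x in a SimpleVname; after checking, the SimpleVname is decorated with type T.]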

Page 182: CSC 415:  Translators and Compilers

Chart 182

Type Checking – Binary Operator

[Figure: the AST before and after type checking. A BinaryExpression with operator < has two operands, each of type int; after checking, the BinaryExpression is decorated with type bool.]

< is of type int × int → bool

Page 183: CSC 415:  Translators and Compilers

Chart 183

Type Checking

Each applied occurrence of an identifier must be identified before type checking can proceed

Page 184: CSC 415:  Translators and Compilers

Chart 184

Chapter 6: Run-time Organization

Marshal the resources of the target machine (instructions, storage, and system software) in order to implement the source language

Page 185: CSC 415:  Translators and Compilers

Chart 185

Chapter 6: Run-time Organization

Data Representation
– How should we represent the values of each source-language type in the target machine?
Expression Evaluation
– How should we organize the evaluation of expressions, taking care of intermediate results?
Static Storage Allocation
– How should we organize storage for variables, taking into account the different lifetimes of global, local, and heap variables?
Stack Storage Allocation
Routines
– How should we implement procedures, functions, and parameters, in terms of low-level routines?
Heap Storage Allocation
Run-time Organization for Object-oriented Languages
– How should we represent objects and methods?
Case Study: The Abstract Machine TAM

Page 186: CSC 415:  Translators and Compilers

Chart 186

Data Representation

How should we represent the values of each source-language type in the target machine?

High-Level Data Types
• Truth values
• Integers
• Characters
• Records
• Arrays
• Operations over these types

Machine Data Types
• Bits
• Bytes
• Words
• Double-words
• Low-level arithmetic and logical operations

Need to bridge the semantic gap between high-level types and machine-level types

Page 187: CSC 415:  Translators and Compilers

Chart 187

Data Representation -- Fundamental Principles

Non-confusion
– Different values of a given type should have different representations
– If two different values are confused, i.e., have the same representation, then comparison of these values will incorrectly treat the values as equal
– Example: approximate representation of real numbers
  Real numbers that are slightly different mathematically might have the same approximate representation
  Difficult to avoid; need to take care during compiler design
– Must avoid confusion in the representations of discrete types such as truth values, characters, and integers
– For statically typed languages, need only be concerned with values of the same type
  00…00₂ may represent false, the integer 0, or the real number 0.0
  Compile-time type checks will distinguish the values of different types

Page 188: CSC 415:  Translators and Compilers

Chart 188

Data Representation -- Fundamental Principles

Uniqueness
– Each value should always have the same representation
– Example of non-uniqueness
  Ones-complement representation of integers, in which zero is represented both by 00…00₂ and 11…11₂ (+0 and –0)
  A simple bit-string comparison would incorrectly treat these values as unequal
  More specialized integer comparison must be used
  Alternative twos-complement representation gives us unique representations of integers

Page 189: CSC 415:  Translators and Compilers

Chart 189

Data Representation – Pragmatic Issues

Constant-size representation
– The representations of all values of a given type should occupy the same amount of space
– Makes it possible for the compiler to plan the allocation of storage
– Knowing the type of a variable but not the actual value, the compiler will know exactly how much storage space the variable will occupy

Page 190: CSC 415:  Translators and Compilers

Chart 190

Data Representation – Pragmatic Issues

Direct representation vs. indirect representation
– Should the values of a given type be represented directly, or indirectly through pointers?
– Direct representation
  Just the binary representation of the value, consisting of one or more bits, bytes, or words
– Indirect representation
  A handle that points to the storage area which has the binary representation of the value
  Essential for types whose values vary greatly in size, e.g., list or dynamic array

Page 191: CSC 415:  Translators and Compilers

Chart 191

Direct representation vs. indirect representation

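[Figure: direct representation, where x holds the value itself, versus indirect representation, where x and y each hold a handle pointing to a separate storage area. The value of y is of the same type as x but requires more space.]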

Page 192: CSC 415:  Translators and Compilers

Chart 192

Notation

#T: cardinality of type T
– Number of distinct values of type T
– #[[Boolean]] = 2

size T: amount of space (in bits, bytes, or words) occupied by each value of type T
– For indirect representation only the handle is counted

For direct representation of type T
– size T ≥ log2(#T), or equivalently 2^(size T) ≥ #T
– size T is expressed in bits
– In n bits we can represent at most 2^n distinct values if we are to avoid confusion (non-confusion requirement)
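As a worked check of the bound size T ≥ log2(#T), the smallest admissible size in bits can be computed directly from the cardinality. The function name min_bits is an assumption introduced for this sketch.

```python
def min_bits(cardinality):
    """Smallest n such that 2**n >= cardinality (for cardinality >= 1)."""
    # (k - 1).bit_length() equals ceil(log2(k)) without floating-point error
    return (cardinality - 1).bit_length()


boolean_bits = min_bits(2)        # #[[Boolean]] = 2
latin1_bits = min_bits(2 ** 8)    # an 8-bit character set
three_values = min_bits(3)        # 3 values do not fit in 1 bit
```

A type with 3 values needs 2 bits, which leaves one bit pattern unused; the bound only says how small the representation can be, not that every pattern is meaningful.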

Page 193: CSC 415:  Translators and Compilers

Chart 193

Primitive Types

Cannot be decomposed into simpler values
Most programming languages provide these primitive types
– Boolean, Char, Integer
– Also provide elementary logical and arithmetic operations

Machines typically support the above primitive types, so choice of representation is straightforward

Page 194: CSC 415:  Translators and Compilers

Chart 194

Primitive Types Representation

Boolean
– true and false
– Since #[[Boolean]] = 2, size[[Boolean]] ≥ 1 bit
– Can represent Boolean with one bit, one byte, or one word
  For single bit: 0 for false and 1 for true
  For byte or word: 00…00₂ for false and either 00…01₂ or 11…11₂ for true
– Negation, conjunction, disjunction: NOT, AND, OR

Page 195: CSC 415:  Translators and Compilers

Chart 195

Primitive Types Representation

Char
– Source language can specify character set
  Ada: ISO-Latin1 character set (2^8 distinct characters)
  Java: Unicode character set (2^16 distinct characters)
– Most do not
  Allows compiler writers to choose the machine’s native character set (2^7 or 2^8 distinct characters)
– ISO defines the character representation for “A” to be 01000001₂

Can represent a character by one byte or one word

Page 196: CSC 415:  Translators and Compilers

Chart 196

Primitive Types Representation

Integer
– Denotes an implementation-defined bounded range of integers
  Defined by the individual language processor
– Binary representation determined by target machine’s arithmetic unit and almost always occupies one word
  Can implement language’s integer operations with machine’s integer operations
– Pascal and Triangle
  -maxint, …, -1, 0, +1, …, +maxint
  maxint is implementation defined
  #[[Integer]] = 2 × maxint + 1
  2^(size[[Integer]]) ≥ 2 × maxint + 1
  For word size of w bits, size[[Integer]] = w, maxint = 2^(w-1) – 1
– Java
  int denotes –2^31, …, -1, 0, +1, …, +2^31 – 1
  #[[int]] = 2^32
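The integer formulas can be checked numerically. The helper name symmetric_integer_range is an assumption for this sketch; it covers the Pascal and Triangle convention of a range that is symmetric about zero.

```python
def symmetric_integer_range(w):
    """Range -maxint..+maxint for a w-bit word, with maxint = 2**(w-1) - 1."""
    maxint = 2 ** (w - 1) - 1
    cardinality = 2 * maxint + 1          # #[[Integer]]
    return -maxint, maxint, cardinality


low16, high16, count16 = symmetric_integer_range(16)

# Java's int uses every 32-bit pattern: -2**31 .. 2**31 - 1
java_int_count = (2 ** 31 - 1) - (-2 ** 31) + 1
```

For w = 16 the symmetric range has 2^16 – 1 values, one fewer than the 2^16 bit patterns available, whereas the Java range uses all 2^32 patterns.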

Page 197: CSC 415:  Translators and Compilers

Chart 197

Record Type

Consists of several fields, each of which has an identifier
– All records of a particular type have fields with the same identifiers and types

Fundamental operation on records is field selection
– Use one field identifier to access the corresponding field

Simple representation
– Juxtapose the fields to make them occupy consecutive positions in storage
– Allows us to predict the total size of each record and the position of each field relative to the base of the record

Page 198: CSC 415:  Translators and Compilers

Chart 198

Record Type

Consider the following
    type T = record I1: T1, …, In: Tn end;
    var r: T
– size T = size T1 + … + size Tn
– If size T1, …, and size Tn are all constant, then size T is also constant
– Implementation of field selection
  address[[r.Ii]] = address r + (size T1 + … + size Ti-1)

[Figure: record r laid out as consecutive fields r.I1 (value of type T1), r.I2 (value of type T2), …, r.In (value of type Tn)]

Some machines have alignment restrictions, which force unused space to be left between record fields; these equations cannot be used on such machines
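The field-selection equation amounts to a running sum of field sizes. A minimal sketch, assuming no alignment padding and sizes measured in words; the function name record_layout and the sample Date record (three one-word fields) are illustrative.

```python
def record_layout(fields):
    """fields: list of (identifier, size) in declaration order.

    Returns ({identifier: offset from the record base}, total size),
    assuming the fields are juxtaposed with no alignment padding.
    """
    offsets = {}
    offset = 0
    for identifier, size in fields:
        offsets[identifier] = offset   # size T1 + ... + size T(i-1)
        offset += size
    return offsets, offset


date_offsets, date_size = record_layout([("y", 1), ("m", 1), ("d", 1)])
address_r = 100                                   # hypothetical base address of r
address_r_m = address_r + date_offsets["m"]
```

Because every offset is a compile-time constant, field selection costs no run-time arithmetic beyond adding a known constant to the record's address.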

Page 199: CSC 415:  Translators and Compilers

Chart 199

Disjoint Unions

Tag and a variant part
Value of tag determines type of variant part
– T = T1 + … + Tn
  In each value of type T, the variant part is a value chosen from one of the types T1, …, or Tn; the tag indicates which one
– size T = size Ttag + max(size T1, …, size Tn)
– address[[u.Itag]] = address u + 0
– address[[u.Ii]] = address u + size Ttag

[Figure: alternative layouts of u. Each starts with u.Itag (value of type Ttag), followed by u.I1 (value of type T1), or u.I2 (value of type T2), …, or u.In (value of type Tn). The variant part always occupies max(size T1, …, size Tn), so smaller variants will have wasted space.]
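The size equation and the wasted space it implies can be computed as below. This is a sketch with assumed names (union_layout, and the variant labels in the example); sizes are in words.

```python
def union_layout(tag_size, variant_sizes):
    """variant_sizes: {variant name: size}.

    Returns (total size, offset of the variant part, wasted space per variant)
    for a disjoint union stored as a tag followed by the largest variant.
    """
    largest = max(variant_sizes.values())
    total = tag_size + largest                  # size Ttag + max(size T1..Tn)
    variant_offset = tag_size                   # every variant starts after the tag
    wasted = {name: largest - size for name, size in variant_sizes.items()}
    return total, variant_offset, wasted


total, variant_offset, wasted = union_layout(1, {"exact": 1, "approx": 2})
```

Here a value holding the one-word variant still occupies three words, which is the price of keeping the representation constant-size.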

Page 200: CSC 415:  Translators and Compilers

Chart 200

Static Arrays

Consists of several elements, all of the same type
– Bounded range of indices, usually integers
– Each index has exactly one element
– Fundamental operation on arrays is indexing
  Access an individual element by giving its index
  Index evaluated at run-time
– Static array
  Index bounds are known at compile-time
  Direct representation is to juxtapose the array elements, in order of increasing indices
  Indexing is implemented by run-time address computation

Page 201: CSC 415:  Translators and Compilers

Chart 201

Static Arrays (lower index bound is 0)

Consider the following example
    type T = array n of Telem;
    var a: T
– size T = n × size Telem
– The number of elements n is constant, so if size Telem is constant, then size T is also constant
– address[[a[i]]] = address a + (i × size Telem)
– Since i is known only at run-time, an array indexing implies a run-time address computation

[Figure: elements a[0], a[1], a[2], …, a[n-1] stored consecutively; all are values of type Telem]

Page 202: CSC 415:  Translators and Compilers

Chart 202

Static Arrays (programmer chooses lower and upper array bounds)

Consider the following example
    type T = array [l..u] of Telem;
    var a: T
– size T = (u – l + 1) × size Telem
– The number of elements (u – l + 1) is constant, so if size Telem is constant, then size T is also constant
– address[[a[i]]] = address a + ((i – l) × size Telem) = address a – (l × size Telem) + (i × size Telem)
– address[[a[0]]] = address a – (l × size Telem)
– address[[a[i]]] = address[[a[0]]] + (i × size Telem)
– Since i is known only at run-time, an array indexing implies a run-time address computation
– Index check must ensure that l ≤ i ≤ u

[Figure: elements a[l], a[l+1], a[l+2], …, a[u] stored consecutively; all are values of type Telem]
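The address computation with a programmer-chosen lower bound can be sketched as follows. The function name element_address and the numbers in the example are assumptions; the point is that the origin, address[[a[0]]], is a constant the compiler can precompute even when no element a[0] exists.

```python
def element_address(base, lower, upper, elem_size, i):
    """Address of a[i] for an array a[lower..upper] stored from `base`."""
    if not (lower <= i <= upper):
        raise IndexError(f"index {i} outside {lower}..{upper}")   # index check
    origin = base - lower * elem_size     # address[[a[0]]], known at compile time
    return origin + i * elem_size         # the only run-time arithmetic


# a: array [5..9] of a two-word element type, stored from address 100
first = element_address(100, 5, 9, 2, 5)
third = element_address(100, 5, 9, 2, 7)
```

With the origin folded into a constant, indexing costs one multiplication and one addition at run-time, the same as for an array whose lower bound is 0.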

Page 203: CSC 415:  Translators and Compilers

Chart 203

Dynamic Arrays

An array whose index bounds are not known until run-time
– Different dynamic arrays of the same type may have different index bounds, and therefore different numbers of elements
– Need to satisfy constant-size requirement
– Create array descriptor or handle
  Pointer to the array’s elements
  Index bounds
  Handle has constant size

Page 204: CSC 415:  Translators and Compilers

Chart 204

Dynamic Arrays

Ada example
    type T is array (Integer range <>) of Telem;
    a: T (E1 .. E2);
– size T = address:size + 2 × size[[Integer]]
  address:size is the amount of space required to store an address, usually one word
  Satisfies constant-size requirement
– Declaration of array variable a:
  E1 and E2 are evaluated to yield a’s index bounds (say l and u)
  Space is allocated for (u – l + 1) elements, juxtaposed and separate from a’s handle
  address[[a(0)]] = address[[a(l)]] – (l × size Telem)
  Values for address[[a(0)]], l, and u are stored in a’s handle
– The element with index i will be addressed as follows:
  address[[a(i)]] = address[[a(0)]] + (i × size Telem) = content(address[[a]]) + (i × size Telem)
  Index check is l ≤ i ≤ u, where l = content(address[[a]] + address:size) and u = content(address[[a]] + address:size + size[[Integer]])

Page 205: CSC 415:  Translators and Compilers

Chart 205

Dynamic Arrays

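[Figure: the handle of a holds three fields: origin (the address of a[0]), lower bound l, and upper bound u. The origin points into a separately allocated area holding the elements a[l], a[l+1], a[l+2], …, a[u], all of type Telem.]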
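A handle of this shape can be simulated with a list standing in for the data store. This is a sketch under assumptions: the names make_dynamic_array and dynamic_element_address are illustrative, the handle is modelled as a dictionary rather than three consecutive words, and elements are allocated at the end of the store.

```python
def make_dynamic_array(store, lower, upper, elem_size):
    """Allocate (upper - lower + 1) elements at the end of `store`.

    Returns a constant-size handle: origin (address of element 0) plus bounds.
    """
    base = len(store)                                  # address of a(lower)
    store.extend([0] * ((upper - lower + 1) * elem_size))
    return {"origin": base - lower * elem_size, "lower": lower, "upper": upper}


def dynamic_element_address(handle, elem_size, i):
    """Address of a(i), with the index check read from the handle."""
    if not (handle["lower"] <= i <= handle["upper"]):
        raise IndexError(f"index {i} outside the array bounds")
    return handle["origin"] + i * elem_size


store = [0] * 10                                       # 10 words already in use
handle = make_dynamic_array(store, 3, 6, 1)            # bounds known only at run-time
address_of_first = dynamic_element_address(handle, 1, 3)
address_of_last = dynamic_element_address(handle, 1, 6)
```

Two arrays of the same type but different bounds get handles of identical size, which is how the constant-size requirement is met.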

Page 206: CSC 415:  Translators and Compilers

Chart 206

Status

Chapter 6: Run-time Organization
– Data Representations
  Primitive types
  Record types
  Disjoint unions
  Static arrays
  Dynamic arrays
  Recursive types
– Expression Evaluation
  Register machine
  Stack machine
– Static Storage Allocation
  Global variables
– Stack Storage Allocation
  Local variables

Page 207: CSC 415:  Translators and Compilers

Chart 207

Recursive Types

Defined in terms of itself
– Values of recursive type T have components that are themselves of type T
– Examples
  List with tail being itself a list
  Tree with the sub-trees themselves being trees

Page 208: CSC 415:  Translators and Compilers

Chart 208

Recursive Types

Consider the Pascal declaration
    type IntList = ^IntNode;
         IntNode = record
                     head: Integer;
                     tail: IntList
                   end;
    var primes: IntList
– size[[IntList]] = address:size (usually 1 word)

[Figure: primes holds a handle that points to the first IntNode]

Always use pointers to represent values of the recursive type

Page 209: CSC 415:  Translators and Compilers

Chart 209

Expression Evaluation – Register Machine

How should we organize the evaluation of expressions?

The problem is the need to keep intermediate results somewhere

Consider the expression
    a * b + (1 – (c * 2))
– Will have intermediate results for a * b, c * 2, and 1 – (c * 2)
– For a register-based machine (non-stack machine)
  Use the registers to store intermediate results
  Problem arises when there are not enough registers for all intermediate results

Page 210: CSC 415:  Translators and Compilers

Chart 210

Expression Evaluation – Example a * b + (1 – (c * 2))

    LOAD R1 a
    MULT R1 b
    LOAD R2 #1
    LOAD R3 c
    MULT R3 #2
    SUB  R2 R3
    ADD  R1 R2

a, b, c are memory addresses for the values of a, b, c
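The register code above can be traced with a small simulator. This is an illustrative sketch, not a model of any real machine: instructions are encoded as (op, register, operand) triples, immediate operands (#n on the slide) are plain integers, and the sample values of a, b, c are arbitrary.

```python
def run_register_machine(program, memory):
    """Execute (op, register, operand) triples and return the register file."""
    registers = {}

    def value_of(operand):
        if isinstance(operand, int):      # immediate operand, written #n
            return operand
        if operand in registers:          # another register, e.g. R3
            return registers[operand]
        return memory[operand]            # otherwise a named memory location

    for op, reg, operand in program:
        if op == "LOAD":
            registers[reg] = value_of(operand)
        elif op == "MULT":
            registers[reg] *= value_of(operand)
        elif op == "SUB":
            registers[reg] -= value_of(operand)
        elif op == "ADD":
            registers[reg] += value_of(operand)
        else:
            raise ValueError(f"unknown instruction {op}")
    return registers


program = [
    ("LOAD", "R1", "a"), ("MULT", "R1", "b"),
    ("LOAD", "R2", 1),
    ("LOAD", "R3", "c"), ("MULT", "R3", 2),
    ("SUB", "R2", "R3"),
    ("ADD", "R1", "R2"),
]
registers = run_register_machine(program, {"a": 3, "b": 4, "c": 5})
```

Three registers are live at once for this expression; a more deeply nested expression could need more registers than the machine has, which is the problem noted on the previous chart.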

Page 211: CSC 415:  Translators and Compilers

Chart 211

Expression Evaluation – Stack Machine

The machine provides a stack for holding intermediate results

For the expression a * b + (1 – (c * 2))
    LOAD a
    LOAD b
    MULT
    LOADL 1
    LOAD c
    LOADL 2
    MULT
    SUB
    ADD

Page 212: CSC 415:  Translators and Compilers

Chart 212

Expression Evaluation – Stack Machine Example a * b + (1 – (c * 2))

Stack contents after each instruction (bottom of stack first; the rest is unused space):
(1) After LOAD a:    value of a
(2) After LOAD b:    value of a, value of b
(3) After MULT:      value of a*b
(4) After LOADL 1:   value of a*b, 1
(5) After LOAD c:    value of a*b, 1, value of c
(6) After LOADL 2:   value of a*b, 1, value of c, 2
(7) After MULT:      value of a*b, 1, value of c*2
(8) After SUB:       value of a*b, value of 1-(c*2)
(9) After ADD:       value of (a*b)+(1-(c*2))

Operands of different types (and therefore different sizes) can be evaluated in just the same way, e.g., AND, OR, function, etc. Each operation takes values from the top of the stack and places results on to the top of the stack.
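The nine snapshots can be reproduced by a few lines of simulation. This is a sketch with assumed encodings: instructions are tuples, LOAD takes a variable name looked up in a dictionary standing in for memory, and only the three arithmetic operations used by the example are supported.

```python
def run_stack_machine(program, memory):
    """Execute the instructions and return the final stack (bottom first)."""
    stack = []
    for op, *args in program:
        if op == "LOAD":                 # push the value of a variable
            stack.append(memory[args[0]])
        elif op == "LOADL":              # push a literal
            stack.append(args[0])
        else:                            # binary operation on the top two values
            right = stack.pop()
            left = stack.pop()
            if op == "MULT":
                stack.append(left * right)
            elif op == "SUB":
                stack.append(left - right)
            elif op == "ADD":
                stack.append(left + right)
            else:
                raise ValueError(f"unknown instruction {op}")
    return stack


program = [
    ("LOAD", "a"), ("LOAD", "b"), ("MULT",),
    ("LOADL", 1), ("LOAD", "c"), ("LOADL", 2), ("MULT",),
    ("SUB",), ("ADD",),
]
final_stack = run_stack_machine(program, {"a": 3, "b": 4, "c": 5})
```

No instruction names a register, so the code generator never has to decide where an intermediate result lives; the stack discipline decides it.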

Page 213: CSC 415:  Translators and Compilers

Chart 213

Static Storage Allocation – Global Variables

Each variable in the source program requires enough storage to contain any value that might be assigned to it

As a consequence of constant-size representation, the compiler knows how much storage needs to be allocated to a variable, based on the type of the variable (size T)

Global variables
– Variables that exist and take up storage throughout the program’s run-time
– Static storage allocation: the compiler locates these variables at some fixed positions in storage (decides each global variable’s address relative to the base of the storage region in which global variables are located)

Page 214: CSC 415:  Translators and Compilers

Chart 214

Static Storage Allocation – Global Variables: Example

let
  type Date = record
                y: Integer,
                m: Integer,
                d: Integer
              end;
  var a: array 3 of Integer;
  var b: Boolean;
  var c: Char;
  var t: Date
in
  . . .

Storage layout, from the base upwards: a(0), a(1), a(2) (together a); b; c; t.y, t.m, t.d (together t); then unused space
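The layout above follows mechanically from the sizes of the declared types. A minimal sketch, assuming every primitive value occupies one word and that types are encoded as strings or tuples; the names size_of and allocate_globals are hypothetical.

```python
PRIMITIVES = ("Integer", "Boolean", "Char")


def size_of(type_):
    """Size in words of a type, assuming one word per primitive value."""
    if type_ in PRIMITIVES:
        return 1
    kind = type_[0]
    if kind == "array":                      # ("array", n, element type)
        _, count, element = type_
        return count * size_of(element)
    if kind == "record":                     # ("record", [(field, type), ...])
        return sum(size_of(field_type) for _, field_type in type_[1])
    raise ValueError(f"unknown type {type_!r}")


def allocate_globals(declarations):
    """Assign each global an address relative to the base of the global region."""
    addresses = {}
    next_free = 0
    for name, type_ in declarations:
        addresses[name] = next_free
        next_free += size_of(type_)
    return addresses, next_free


date = ("record", [("y", "Integer"), ("m", "Integer"), ("d", "Integer")])
addresses, total_words = allocate_globals([
    ("a", ("array", 3, "Integer")),
    ("b", "Boolean"),
    ("c", "Char"),
    ("t", date),
])
```

All of this happens at compile time: the generated code simply refers to each global by its fixed offset from the base of the global region.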

Page 215: CSC 415:  Translators and Compilers

Chart 215

Stack Storage Allocation – Local Variables

A local variable v is one that is declared inside a procedure (or function).

Lifetime of v: the variable v exists (occupies storage) only during an activation of that procedure

If the same procedure is activated several times
– v will have several lifetimes
– Each activation creates a distinct variable

Page 216: CSC 415:  Translators and Compilers

Chart 216

Stack Storage Allocation – Local Variables: An Example

let
  var a: array 3 of Integer;
  var b: Boolean;
  var c: Char;
  proc Y () ~
    let
      var d: Integer;
      var e: record c: Char, n: Integer end
    in
      . . .
  proc Z () ~
    let
      var f: Integer
    in
      begin …; Y(); … end
in
  begin …; Y(); …; Z(); … end

Page 217: CSC 415:  Translators and Compilers

Chart 217

Stack Storage Allocation – Local Variables: An Example

Timeline of events: program calls Y; return from Y; program calls Z; Z calls Y; return from Y; return from Z; program stops
– Lifetime of variables local to Y: from the program's call of Y to the return from Y
– Lifetime of variables local to Z: from the program's call of Z to the return from Z
– Lifetime of variables local to Y (second activation): from Z's call of Y to the return from Y
– Lifetime of global variables: the whole run, until the program stops

Observations:
• Global variables are the only ones that exist throughout the program’s run-time
  • Use static allocation for global variables
• Lifetimes of local variables are properly nested
  • Use a stack for local variables

Page 218: CSC 415:  Translators and Compilers

Chart 218

Stack Storage Allocation – Stack Frames: An Example

Stack contents, listed from the base upwards, with the register pointing at each part:
(1) After program starts:    globals (SB); ST
(2) After program calls Y:   globals (SB); frame for Y (LB); ST
(3) After return from Y:     globals (SB); ST
(4) After program calls Z:   globals (SB); frame for Z (LB); ST
(5) After Z calls Y:         globals (SB); frame for Z; frame for Y (LB); ST
(6) After return from Y:     globals (SB); frame for Z (LB); ST
(7) After return from Z:     globals (SB); ST

Dynamic links connect each frame to the frame beneath it

Registers
SB: Stack Base – location of global variables
LB: Local Base – local variables of currently running procedure
ST: Stack Top – very top of stack

Page 219: CSC 415:  Translators and Compilers

Chart 219

Stack Storage Allocation

The stack varies in size
– For example, the frames for each of Y’s activations are at two different locations
– The position of a frame within a stack cannot be predicted in advance
– Need registers dedicated to point to the frames

Registers (find address of variables relative to these registers)
– SB: stack base. It is fixed, pointing to the base of the stack. This is where the global variables are located.
– LB: local base. It points to the base of the topmost frame in the stack. This frame always contains the variables of the currently running procedure.
– ST: stack top. It points to the very top of the stack. ST keeps track of the frame boundary as expressions are evaluated and the top of the stack expands and contracts.

Page 220: CSC 415:  Translators and Compilers

Chart 220

Stack Storage Allocation

Frame contents
– Space for local variables
– Link data
  Return address: code address to which control will be returned at the end of the procedure activation. It is the address of the instruction following the call instruction that activated the procedure in the first place.
  Dynamic link: the pointer to the base of the underlying frame in the stack. It is the old content of LB and will be restored at the end of the procedure activation.

Frame layout, from the frame base upwards: dynamic link, return address (together the link data); then local data

Since there are two words of link data, local variable addresses are offset by 2

This only considers access to local or global variables, not nested variables.
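The call and return protocol described above can be simulated with a list standing in for the data store. This is a simplified sketch, not TAM itself: the class and method names are assumptions, each frame carries only the two words of link data mentioned on this chart, LB is assumed to start equal to SB, and the return addresses are arbitrary numbers.

```python
class DataStore:
    LINK_WORDS = 2   # dynamic link + return address

    def __init__(self, global_words):
        self.words = [0] * global_words   # globals sit at the base of the stack
        self.SB = 0
        self.LB = 0                       # no frame yet (assumed to start at SB)
        self.ST = global_words            # first free word above the stack

    def call(self, return_address, local_words):
        new_base = self.ST
        # push link data (old LB, return address), then space for the locals
        self.words += [self.LB, return_address] + [0] * local_words
        self.LB = new_base
        self.ST = len(self.words)

    def ret(self):
        dynamic_link = self.words[self.LB]
        return_address = self.words[self.LB + 1]
        del self.words[self.LB:]          # pop the topmost frame
        self.ST = self.LB
        self.LB = dynamic_link            # restore the caller's frame base
        return return_address

    def local_address(self, offset):
        # locals are addressed relative to LB, past the link data
        return self.LB + self.LINK_WORDS + offset


store = DataStore(global_words=5)                   # a (3 words), b, c
store.call(return_address=100, local_words=3)       # program calls Y
first_y_frame = (store.LB, store.ST)
store.ret()
store.call(return_address=200, local_words=1)       # program calls Z
store.call(return_address=300, local_words=3)       # Z calls Y
second_y_frame = (store.LB, store.ST)
link_stored_in_y = store.words[store.LB]            # base of Z's frame
address_of_d = store.local_address(0)
```

The two activations of Y get frames at different addresses, so the compiler can only know the address of d relative to LB, never absolutely.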

Page 221: CSC 415:  Translators and Compilers

Chart 221

Chapter 7: Code Generation

Code Selection
A Code Generation Algorithm
Constants and Variables
Procedures and Functions
Case Study: Code Generation in the Triangle Compiler

Page 222: CSC 415:  Translators and Compilers

Chart 222

Code Generation

Translation of the source program to object code
– Dependent on source language and target machine

Target Machines
– Registers, or stack, or both for intermediate results
– Instructions with zero, one, two, or three operands, or a mixture
– Single addressing mode, or many

Page 223: CSC 415:  Translators and Compilers

Chart 223

Code Generation – Major Subproblems

Code selection: which sequence of target machine instructions will be the object code for each phrase
– Write code templates: a general rule specifying the object code of all phrases of a particular form (e.g., all assignment commands, etc.)
– But there are usually lots of special cases

Storage allocation: deciding the storage address of each variable in the source program
– Exact for global variables, but only relative for local variables

Register allocation: deciding which registers should be used to hold intermediate results during expression evaluation
– Complex expressions: not enough registers

Since code generation for a stack machine is much simpler than for a register machine, will only generate code for a stack machine

Page 224: CSC 415:  Translators and Compilers

Chart 224

Code Generation – Code Selection

Deciding which sequence of instructions to generate for each case

Code template: specifies the object code to which a phrase is translated, in terms of the object code to which its sub phrases are translated.

Object code: sequence of instructions to which the source-language phrase will be translated

Code specification: collection of code functions and code templates; must cover the entire source language

Page 225: CSC 415:  Translators and Compilers

Chart 225

Abstract Machine TAM

Suitable for executing programs compiled from a block-structured language such as Triangle

All evaluation takes place on a stack

Primitive arithmetic, logical, and other operations are treated uniformly with programmed functions and procedures

Two separate stores
– Code Store: 32-bit instruction words (read only)
– Data Store: 16-bit data words (read-write)

Page 226: CSC 415:  Translators and Compilers

Chart 226

Abstract Machine TAM – Code and Data Stores

Code Store
– Fixed while program is running
– Code segment: contains the program’s instructions
  CB points to base of code segment
  CT points to top of code segment
  CP points to next instruction to be executed
  – Initialized to CB (program’s first instruction is at base of code segment)
– Primitive segment: contains ‘microcode’ for elementary arithmetic, logical, input-output, heap, and general-purpose operations
  PB points to base of primitive segment
  PT points to top of primitive segment

Page 227: CSC 415:  Translators and Compilers

Chart 227

Abstract Machine TAM – Code and Data Stores

Data Store
– While program is running, segments of data store may vary
– Stack grows from low-address end of Data Store
  SB points to base of the stack
  ST points to top of the stack
  – Initialized to SB
– Heap grows from the high-address end of Data Store
  HB points to base of heap
  HT points to top of heap
  – Initialized to HB

Page 228: CSC 415:  Translators and Compilers

Chart 228

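Abstract Machine TAM – Code and Data Stores

Code Store, from the base upwards
CB   base of code segment
     code segment (CP points to the next instruction within it)
CT   top of code segment
     unused
PB   base of primitive segment
     primitive segment
PT   top of primitive segment

Data Store, from the low-address end upwards
SB   base of stack: global segment
     frames
LB   base of topmost frame
ST   top of stack
     unused
HT   top of heap
     heap segment
HB   base of heap

• Stack and heap can expand and contract
• Global segment is always at base of stack
• Stack can contain any number of other segments known as frames, containing data local to an activation of some routine
• LB points to base of topmost frame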

Page 229: CSC 415:  Translators and Compilers

Chart 229

Code Functions

run P          Run the program P and then halt, starting and finishing with an empty stack
execute C      Execute the command C, possibly updating variables, but neither expanding nor contracting the stack
evaluate E     Evaluate the expression E, pushing its result on to the stack top, but having no other effect
fetch V        Push the value of the constant or variable named V on to the stack top
assign V       Pop a value from the stack top, and store it in the variable named V
elaborate D    Elaborate the declaration D, expanding the stack to make space for any constants and variables declared therein

Abstract Machine TAM: Instructions

LOAD(n) d[r]      Fetch an n-word object from the data address (d + register r), and push it on to the stack
LOADA d[r]        Push the data address (d + register r) on to the stack
LOADI(n)          Pop a data address from the stack, fetch an n-word object from that address, and push it on to the stack
LOADL d           Push the 1-word literal value d on to the stack
STORE(n) d[r]     Pop an n-word object from the stack, and store it at the data address (d + register r)
STOREI(n)         Pop an address from the stack, then pop an n-word object from the stack and store it at that address
CALL(n) d[r]      Call the routine at code address (d + register r), using the address in register n as the static link
CALLI             Pop a closure (static link and code address) from the stack, then call the routine at that code address
RETURN(n) d       Return from the current routine: pop an n-word result from the stack, then pop the topmost frame, then pop d words of arguments, then push the result back on to the stack
PUSH d            Push d words (uninitialized) on to the stack
POP(n) d          Pop an n-word result from the stack, then pop d more words, then push the result back on to the stack
JUMP d[r]         Jump to code address (d + register r)
JUMPI             Pop a code address from the stack, then jump to that address
JUMPIF(n) d[r]    Pop a 1-word value from the stack, then jump to code address (d + register r) if and only if that value equals n
HALT              Stop execution of the program

While Command

execute [[while E do C]] =
        JUMP h
    g:  execute C
    h:  evaluate E
        JUMPIF(1) g

While Command

execute [[while i > 0 do i := i - 2]]

    30:     JUMP 35         // JUMP h
    31: g:  LOAD i          // execute [[i := i - 2]]
    32:     LOADL 2
    33:     CALL sub
    34:     STORE i
    35: h:  LOAD i          // evaluate [[i > 0]]
    36:     LOADL 0
    37:     CALL gt
    38:     JUMPIF(1) 31    // JUMPIF(1) g
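To check that this object code behaves like the source loop, it can be run on a toy interpreter. The sketch below is not the TAM interpreter: the class `TamWhileSketch`, the integer opcode encoding, and the treatment of `sub` and `gt` as single opcodes rather than calls into the primitive segment are all simplifying assumptions. Code addresses are renumbered from 0 instead of 30, the variable i is assumed to live at data address 0, and a HALT is appended.

```java
public class TamWhileSketch {
    // Hypothetical opcode numbering covering only the instructions the example needs.
    static final int LOADL = 0, LOAD = 1, STORE = 2, SUB = 3, GT = 4,
                     JUMP = 5, JUMPIF = 6, HALT = 7;

    /** Runs code from address 0 with i stored at data address 0; returns the final value of i. */
    static int run(int[][] code, int initialI) {
        int[] data = new int[64];
        data[0] = initialI;   // the global variable i
        int st = 1;           // stack top: first free word above the global
        int cp = 0;           // code pointer
        while (true) {
            int[] ins = code[cp++];
            int op = ins[0], n = ins[1], d = ins[2];
            switch (op) {
                case LOADL: data[st++] = d; break;
                case LOAD:  data[st++] = data[d]; break;
                case STORE: data[d] = data[--st]; break;
                case SUB: { int b = data[--st]; int a = data[--st]; data[st++] = a - b; break; }
                case GT:  { int b = data[--st]; int a = data[--st]; data[st++] = a > b ? 1 : 0; break; }
                case JUMP: cp = d; break;
                case JUMPIF: if (data[--st] == n) cp = d; break;  // jump iff popped value equals n
                case HALT: return data[0];
                default: throw new IllegalStateException("unknown opcode " + op);
            }
        }
    }

    /** The while-loop object code from the slide, with addresses renumbered from 0. */
    static int[][] whileProgram() {
        return new int[][] {
            {JUMP, 0, 5},    // 0:    JUMP h
            {LOAD, 1, 0},    // 1: g: LOAD i
            {LOADL, 0, 2},   // 2:    LOADL 2
            {SUB, 0, 0},     // 3:    CALL sub
            {STORE, 1, 0},   // 4:    STORE i
            {LOAD, 1, 0},    // 5: h: LOAD i
            {LOADL, 0, 0},   // 6:    LOADL 0
            {GT, 0, 0},      // 7:    CALL gt
            {JUMPIF, 1, 1},  // 8:    JUMPIF(1) g
            {HALT, 0, 0}     // 9:    HALT (added so the sketch terminates)
        };
    }
}
```

Because the code starts with JUMP h, the condition is tested before the body ever runs, so a non-positive initial i is left untouched.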

While Command

public Object visitWhileCommand(WhileCommand ast, Object o) {
    Frame frame = (Frame) o;
    int jumpAddr, loopAddr;

    jumpAddr = nextInstrAddr;                 // address of the JUMP h instruction, saved so it can be patched later
    emit(Machine.JUMPop, 0, Machine.CBr, 0);  // emit JUMP h with a placeholder target (h is not yet known)
    loopAddr = nextInstrAddr;                 // this is address g:
    ast.C.visit(this, frame);                 // generate code for C
    patch(jumpAddr, nextInstrAddr);           // address h: is now known, so fill it in as the target of JUMP h
    ast.E.visit(this, frame);                 // generate code for E
    emit(Machine.JUMPIFop, Machine.trueRep, Machine.CBr, loopAddr);  // if E is true, jump back to g:
    return null;
}
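The emit-then-patch pattern used here (often called backpatching) can be tried in isolation. The following is a minimal sketch under assumptions: `BackpatchSketch`, its three-field instruction arrays, and `emitWhile` are hypothetical stand-ins, and the bodies of C and E are replaced by a given number of dummy instructions.

```java
import java.util.ArrayList;
import java.util.List;

public class BackpatchSketch {
    static final int JUMP = 0, JUMPIF = 1, OTHER = 2;  // hypothetical opcodes
    final List<int[]> code = new ArrayList<>();        // each instruction is {op, n, d}

    int nextInstrAddr() { return code.size(); }
    void emit(int op, int n, int d) { code.add(new int[] {op, n, d}); }
    void patch(int addr, int d) { code.get(addr)[2] = d; }  // overwrite the operand d

    /** Emits the while-loop skeleton; bodyLen and condLen stand in for execute C and evaluate E. */
    void emitWhile(int bodyLen, int condLen) {
        int jumpAddr = nextInstrAddr();
        emit(JUMP, 0, 0);                      // JUMP h, target not yet known
        int loopAddr = nextInstrAddr();        // g:
        for (int k = 0; k < bodyLen; k++) emit(OTHER, 0, 0);
        patch(jumpAddr, nextInstrAddr());      // h: is known only now
        for (int k = 0; k < condLen; k++) emit(OTHER, 0, 0);
        emit(JUMPIF, 1, loopAddr);             // jump back to g if the condition is true
    }
}
```

With a 4-instruction body and a 3-instruction condition, as in the `i := i - 2` example, the forward jump is patched to offset 5 and the backward jump targets offset 1, matching addresses 35 and 31 relative to a start address of 30.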

While Command

execute [[while E do C]] =
    g:  execute C
        evaluate E
        JUMPIF(1) g

Repeat Command

execute [[repeat i := i - 2 until i < 0]]

    31: g:  LOAD i          // execute [[i := i - 2]]
    32:     LOADL 2
    33:     CALL sub
    34:     STORE i
    35:     LOAD i          // evaluate [[i < 0]]
    36:     LOADL 0
    37:     CALL lt
    38:     JUMPIF(0) 31    // JUMPIF(0) g

Repeat Command

public Object visitRepeatCommand(RepeatCommand ast, Object o) {
    Frame frame = (Frame) o;
    int loopAddr;

    // Unlike the while command, no initial JUMP (and so no patch) is needed:
    // C is always executed at least once before E is tested.
    loopAddr = nextInstrAddr;                 // this is address g:
    ast.C.visit(this, frame);                 // generate code for C
    ast.E.visit(this, frame);                 // generate code for E
    emit(Machine.JUMPIFop, Machine.falseRep, Machine.CBr, loopAddr);  // if E is false, jump back to g:
    return null;
}

Abstract Machine TAM: Routines

Abstract Machine TAM: Primitive Routines

Extend Mini-Triangle: V1 , V2 := E1 , E2

This is a simultaneous assignment: both E1 and E2 are to be evaluated, and then their values assigned to the variables V1 and V2, respectively

    evaluate E1    Result pushed on to top of stack
    evaluate E2    Result pushed on to top of stack
    assign V2      Top of stack stored in variable V2
    assign V1      Top of stack stored in variable V1

[Figure: stack snapshots after each step, showing the results of E1 and E2 pushed below ST, then the result of E2 popped into V2 and the result of E1 popped into V1]
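The reason both expressions are evaluated before either assignment is easiest to see when the right-hand sides mention the variables being assigned. A small Java sketch, assuming a hypothetical class `SimultaneousAssignSketch` and the example statement `x, y := y, x + y`:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SimultaneousAssignSketch {
    /** Returns {x, y} after the simultaneous assignment x, y := y, x + y. */
    static int[] step(int x, int y) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(y);       // evaluate E1 (uses the old y)
        stack.push(x + y);   // evaluate E2 (uses the old x and the old y)
        y = stack.pop();     // assign V2: top of stack is the result of E2
        x = stack.pop();     // assign V1: next is the result of E1
        return new int[] {x, y};
    }
}
```

Starting from x = 1, y = 2 this yields x = 2, y = 3. Two sequential assignments `x := y; y := x + y` would instead give y = 4, because the second expression would see the updated x.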

Extend Mini-Triangle: C1 , C2

This is a collateral command: the subcommands C1 and C2 are to be executed in any order chosen by the implementer

    execute C1    Top of stack unchanged
    execute C2    Top of stack unchanged

Extend Mini-Triangle: if E then C

This is a conditional command: if E evaluates to true, C is executed, otherwise nothing

        evaluate E     Result pushed on to top of stack
        JUMPIF(0) g    Jump to g if E evaluates to false
        execute C      Top of stack unchanged
    g:                 Jump location

Extend Mini-Triangle: repeat C until E

This is a loop command: E is evaluated at the end of each iteration (after executing C), and the loop terminates if its value is true

    g:  execute C      Top of stack unchanged
        evaluate E     Result pushed on to top of stack
        JUMPIF(0) g    Jump to g if E evaluates to false

Extend Mini-Triangle: repeat C1 while E do C2

This is a loop command: E is evaluated in the middle of each iteration (after executing C1 but before executing C2), and the loop terminates if its value is false

        JUMP h
    g:  execute C2     Top of stack unchanged
    h:  execute C1     Top of stack unchanged
        evaluate E     Result pushed on to top of stack
        JUMPIF(1) g    Jump to g if E evaluates to true

Extend Mini-Triangle: if E1 then E2 else E3

This is a conditional expression: if E1 evaluates to true, E2 is evaluated, otherwise E3 is evaluated (E2 and E3 must be of the same type)

        evaluate E1    Result pushed on to top of stack
        JUMPIF(0) g    Jump to g if E1 evaluates to false
        evaluate E2    Result pushed on to top of stack
        JUMP h         Jump to h
    g:  evaluate E3    Result pushed on to top of stack
    h:                 Jump location

Extend Mini-Triangle: let D in E

This is a block expression: the declaration D is elaborated, and the resultant bindings are used in the evaluation of E

    elaborate D    Expand stack for variables or constants
    evaluate E     Result pushed on to top of stack
    POP(n) s       If s > 0: pop an n-word result from the stack, pop s more words, then push the n-word result back on to the stack

where s = amount of storage allocated by D
      n = size (type of E)
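The effect of POP(n) s on the stack can be sketched as a slide of the top n words down over the s words beneath them. This is a minimal illustration, assuming a hypothetical helper `pop` that works on a plain `int[]` stack with `st` as the index of the first free word; it is not the TAM interpreter's code.

```java
public class PopSketch {
    /**
     * POP(n) d: keeps the top n words, discards the d words beneath them,
     * and returns the new stack top (index of the first free word).
     */
    static int pop(int[] data, int st, int n, int d) {
        int src = st - n;   // where the n-word result currently starts
        int dst = src - d;  // where it must end up once d words are discarded
        for (int k = 0; k < n; k++) {
            data[dst + k] = data[src + k];
        }
        return dst + n;
    }
}
```

For a block expression with s = 2 words of local storage and a 1-word result, a stack holding [10, 20, 30, 99] with the result 99 on top becomes [10, 99].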

Extend Mini-Triangle: begin C; yield E end

Here the command C is executed (making side effects), and then E is evaluated

    execute C     Top of stack unchanged
    evaluate E    Result pushed on to top of stack

Extend Mini-Triangle: for I from E1 to E2 do C

First the expressions E1 and E2 are evaluated, yielding the integers m and n, respectively. Then the subcommand C is executed repeatedly, with I bound to integers m, m+1, …, n in successive iterations. If m > n, C is not executed at all. The scope of I is C, which may fetch I but may not assign to it.

Extend Mini-Triangle: for I from E1 to E2 do C

        evaluate E2     Compute final value
        evaluate E1     Compute initial value of I
        JUMP h
    g:  execute C       Top of stack unchanged
        CALL succ       Increment current value of I
    h:  LOAD -1[ST]     Fetch current value of I
        LOAD -3[ST]     Fetch final value
        CALL le         Test current value <= final value
        JUMPIF(1) g     If so, repeat
        POP(0) 2        Discard current and final values

At g and at h, the current value of I is at the stack top (at address -1[ST]), and the final value is immediately underlying (at address -2[ST])
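The template can be traced by hand with a small simulation that keeps the final and current values on an explicit stack. The sketch below is an assumption-laden illustration: `ForLoopSketch` is a hypothetical class, the body C is replaced by recording the value of I, and the result of `le` is consumed directly rather than being pushed and popped by JUMPIF.

```java
import java.util.ArrayList;
import java.util.List;

public class ForLoopSketch {
    /** Simulates "for I from m to n do C", returning the values of I seen by C. */
    static List<Integer> run(int m, int n) {
        List<Integer> seen = new ArrayList<>();
        int[] stack = new int[4];
        int st = 0;
        stack[st++] = n;                      // evaluate E2: final value
        stack[st++] = m;                      // evaluate E1: initial value of I
        while (true) {                        // JUMP h: the test comes first
            stack[st] = stack[st - 1]; st++;  // h: LOAD -1[ST], current value of I
            stack[st] = stack[st - 3]; st++;  // LOAD -3[ST], final value (one deeper after the push)
            int fin = stack[--st];
            int cur = stack[--st];
            if (!(cur <= fin)) break;         // CALL le; JUMPIF(1) g
            seen.add(stack[st - 1]);          // g: execute C, which may fetch I
            stack[st - 1]++;                  // CALL succ: increment I in place
        }
        st -= 2;                              // POP(0) 2: discard current and final values
        return seen;
    }
}
```

The final value is fetched with offset -3 rather than -2 because the preceding LOAD has already pushed a copy of I, moving ST up by one word.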

Chapter 8: Interpretation

Iterative Interpretation
– Iterative Interpretation of Machine Code
– Iterative Interpretation of Command Languages
– Iterative Interpretation of Simple Programming Languages

Recursive Interpretation

Case Study: The TAM Interpreter

Chapter 9: Conclusion

The Programming Language Life Cycle
– Design
– Specification
– Prototype
– Compilers

Error Reporting
– Compile-time Error Reporting
– Run-time Error Reporting

Efficiency
– Compile-time Efficiency
– Run-time Efficiency

Structure of a Compiler

[Figure: compiler structure. Phases: Lexical Analyzer, Parser, Semantic Analyzer (identification, type checking), Intermediate Code Generation, Optimization, Assembly Code Generation, with a shared Symbol Table. Data passed between phases: source code, tokens, parse tree, intermediate representation, assembly code.]

Programming Language Lifecycle: Concepts

Concepts: Values, Types, Storage, Bindings, Abstractions

Advanced Concepts: Encapsulation, Polymorphism, Exceptions, Concurrency

Programming Language Lifecycle: Simplicity & Regularity

Strive for simplicity and regularity
– Simplicity: support only the concepts essential to the applications for which the language is intended
– Regularity: should combine those concepts in a systematic way, avoiding restrictions that might surprise programmers or make their task more difficult

Design Principles

Type completeness: no operation should be arbitrarily restricted in the types of its operands
– Operations like assignment and parameter passing should, ideally, be applicable to all types

Abstraction: for each phrase that specifies some kind of computation, there should be a way to abstract that phrase and parameterize it
– Should be possible to abstract any expression and make it a function

Correspondence: for each form of declaration there should be a corresponding parameter mechanism
– Take a block with a constant definition and transform it into a procedure (or function) with a constant parameter

Programming Language Lifecycle

[Figure: life-cycle diagram linking Design, Specification, Prototype, Compilers, and Manuals/textbooks]

Specification

A precise specification of the language's syntax and semantics must be written
– Informal or formal or hybrid

                 Informal           Formal
    Syntax       English phrases    BNF, EBNF
    Semantics    English phrases    Axiomatic method (based on mathematical logic)

Prototypes

Cheap, low quality implementation

Highlights features of language that are hard to implement

Try out language
– Interpreter might be a good prototype
– Interpretive compiler
    From source to abstract machine code

Compile-Time Error Reporting

Rejecting ill-formed programs

Report location of each error with some explanation

Distinguish between the major categories of compile-time errors:
– Syntactic error: missing or unexpected characters or tokens
    Indicate what characters or tokens were expected
– Scope error: a violation of the language's scope rules
    Indicate which identifier was declared twice, or used without declaration
– Type error: a violation of the language's type rules
    Indicate which type rule was violated and/or what type was expected

Run-Time Error Reporting

Common run-time errors
– Arithmetic overflow
– Division by zero
– Out-of-range array indexing

Can be detected only at run-time, because they depend on values computed at run-time
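Java itself illustrates the point, since each of these conditions is reported through an exception raised while the program runs. The sketch below assumes a hypothetical class `RuntimeErrorSketch`; the library call `Math.addExact` and the exception types are standard Java, but the wrapper methods and their messages are illustrative only.

```java
public class RuntimeErrorSketch {
    /** Adds two ints, reporting overflow instead of silently wrapping around. */
    static String tryAdd(int a, int b) {
        try {
            return String.valueOf(Math.addExact(a, b));
        } catch (ArithmeticException e) {
            return "arithmetic overflow";
        }
    }

    /** Integer division; a zero divisor is only known once b has a value. */
    static String tryDiv(int a, int b) {
        try {
            return String.valueOf(a / b);
        } catch (ArithmeticException e) {
            return "division by zero";
        }
    }

    /** Array indexing; the index is checked against the bounds at run-time. */
    static String tryIndex(int[] xs, int i) {
        try {
            return String.valueOf(xs[i]);
        } catch (ArrayIndexOutOfBoundsException e) {
            return "index out of range";
        }
    }
}
```

None of the three methods can be rejected by the compiler, because whether they fail depends entirely on the argument values supplied by the caller.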

Final Exam Review

Final Exam is comprehensive in that:
– Essay questions will cover Chapters 5, 6, 7, 9
– Problem oriented questions require knowledge from the entire semester

Exam Structure
– Four questions
    Two essay questions
    – Discuss
    – Describe
    – Compare & contrast
    – Evaluate
    Two problems
    – Develop code template for new language construct
    – Determine identification table for given program
    – Calculate size and address for given type(s)

Final Exam Review: Chapter 5 – Contextual Analysis

Contextual analysis checks that the program conforms to the source language's contextual constraints
– Scope rules
– Type rules

Block Structure
– Monolithic
– Flat
– Nested

Type Checking
– Literal
– Identifier
– Unary operator application
– Binary operator application

Standard Environment

Final Exam Review: Chapter 6 – Run-Time Organization

Key Issues
– Data representation
– Expression evaluation
– Storage allocation
– Routines

Fundamental Principles of Data Representation
– Non-confusion: different values of a given type should have different representations
– Uniqueness: each value should always have the same representation

Final Exam Review: Chapter 6 – Run-Time Organization

Types
– Primitive types: cannot be decomposed
    Boolean
    Character
    Integer
– Records
– Disjoint unions
– Static arrays
– Dynamic arrays
– Recursive types

For various types be able to determine size (storage required) and address (how to locate)
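As a rough aid for the size and address calculations, the usual rules for records and static arrays can be written down directly. The sketch assumes that sizes are counted in words, that fields and elements are laid out contiguously in declaration order, and that array indices start at 0; the class `TypeSizeSketch` and its method names are hypothetical.

```java
public class TypeSizeSketch {
    /** Size of a record: the sum of its field sizes. */
    static int recordSize(int... fieldSizes) {
        int total = 0;
        for (int s : fieldSizes) total += s;
        return total;
    }

    /** Offset of each field from the record's base address. */
    static int[] fieldOffsets(int... fieldSizes) {
        int[] offsets = new int[fieldSizes.length];
        int next = 0;
        for (int k = 0; k < fieldSizes.length; k++) {
            offsets[k] = next;
            next += fieldSizes[k];
        }
        return offsets;
    }

    /** Size of a static array: element count times element size. */
    static int arraySize(int count, int elemSize) {
        return count * elemSize;
    }

    /** Address of element index, assuming 0-based indexing. */
    static int elementAddress(int base, int index, int elemSize) {
        return base + index * elemSize;
    }
}
```

For example, a record of three one-word fields occupies 3 words with offsets 0, 1, 2, and an array of 10 such records occupies 30 words.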

Final Exam Review: Chapter 6 – Run-Time Organization

Expression Evaluation
– Stack machine
– Register machine
– Static storage allocation
    Global variables
– Stack storage allocation
    Local variables

Final Exam Review: Chapter 7 – Code Generation

Translation of the source program to object code
– Dependent on source language and target machine

Target Machines
– Registers, or stack, or both for intermediate results
– Instructions with zero, one, two, or three operands, or a mixture
– Single addressing mode, or many

Final Exam Review: Chapter 7 – Code Generation

Code selection: deciding which sequence of target machine instructions will be the object code for each phrase

Storage allocation: deciding the storage address of each variable in source program

Register allocation: deciding which registers should be used to hold intermediate results during expression evaluation

Final Exam Review: Chapter 9 – Programming Language Life-Cycle

[Figure: life-cycle diagram linking Design, Specification, Prototype, Compilers, and Manuals/textbooks]

Final Exam Review: Chapter 9 – Programming Language Life-Cycle

Strive for simplicity and regularity

Design Principles
– Type completeness: no operation should be arbitrarily restricted in the types of its operands
– Abstraction: for each phrase that specifies some kind of computation, there should be a way to abstract that phrase and parameterize it
– Correspondence: for each form of declaration there should be a corresponding parameter mechanism

Specifications

Prototype

Error Reporting
– Compile-time
– Run-time

Final Exam Review: Structure of a Compiler

[Figure: compiler structure. Phases: Lexical Analyzer, Parser, Semantic Analyzer (identification, type checking), Intermediate Code Generation, Optimization, Assembly Code Generation, with a shared Symbol Table. Data passed between phases: source code, tokens, parse tree, intermediate representation, assembly code.]