CSC 415: Translators and Compilers
Dr. Chuck Lillie
Chart 2
Course Outline
Translators and Compilers
– Language Processors
– Compilation
– Syntactic Analysis
– Contextual Analysis
– Run-Time Organization
– Code Generation
– Interpretation
Major Programming Project
– Project Definition and Planning
– Implementation
   Weekly Status Reports
– Project Presentation
Chart 3
Project
Implement a Compiler for the Programming Language Triangle
– Appendix B: Informal Specification of the Programming Language Triangle
– Appendix D: Class Diagrams for the Triangle Compiler
Present Project Plan
– What and How
Weekly Status Reports
– Work accomplished during the reporting period
– Deliverable progress, as a percentage of completion
– Problem areas
– Planned activities for the next reporting period
Chart 4
Chapter 1: Introduction to Programming Languages
Programming Language: A formal notation for expressing algorithms.
Programming Language Processors: Tools to enter, edit, translate, and interpret programs on machines.
Machine Code: Basic machine instructions
– Keep track of the exact address of each data item and each instruction
– Encode each instruction as a bit string
Assembly Language: Symbolic names for operations, registers, and addresses.
Chart 5
Programming Languages
High-Level Languages: Notation similar to familiar mathematical notation
– Expressions: +, -, *, /
– Data Types: truth values, characters, integers, records, arrays
– Control Structures: if, case, while, for
– Declarations: constant values, variables, procedures, functions, types
– Abstraction: separates what is to be performed from how it is to be performed
– Encapsulation (or data abstraction): group together related declarations and selectively hide some
Chart 6
Programming Languages
Language processor: any system that manipulates programs expressed in some particular programming language
– Editors: enter, modify, and save program text
– Translators and Compilers: translate text from one language to another. A compiler translates a program from a high-level language to a low-level language, preparing it to be run on a machine
   Checks the program for syntactic and contextual errors
– Interpreters: run a program without compilation
   Command languages
   Database query languages
Chart 7
Programming Languages Specifications
Syntax
– Form of the program
– Defines symbols
– How phrases are composed
Contextual constraints
– Scope: determine scope of each declaration
– Type:
Semantics
– Meaning of the program
Chart 8
Representation
Syntax
– Backus-Naur Form (BNF): context-free grammar
   Terminal symbols (>=, while, ;)
   Non-terminal symbols (Program, Command, Expression, Declaration)
   Start symbol (Program)
   Production rules (define how phrases are composed from terminals and sub-phrases)
   – N ::= a | b | …
– Syntax Tree
   Used to define language in terms of strings and terminal symbols
Chart 9
Representation
Semantics
– Abstract Syntax
   Concentrate on phrase structure alone
– Abstract Syntax Tree
Chart 10
Contextual Constraints
Scope
– Binding
   Static: determined by the language processor
   Dynamic: determined at run-time
Type
– Statically typed: the language processor can detect all type errors
– Dynamically typed: type errors cannot be detected until run-time
Will assume static binding and static typing
Chart 11
Semantics
Concerned with meaning of program
– Behavior when run
Usually specified informally
– Declarative sentences
– Could include side effects
– Correspond to production rules
Chart 12
Chapter 2: Language Processors
Translators and Compilers
Interpreters
Real and Abstract Machines
Interpretive Compilers
Portable Compilers
Bootstrapping
Case Study: The Triangle Language Processor
Chart 13
Translators & Compilers
Translator: a program that accepts any text expressed in one language (the translator’s source language) and generates a semantically equivalent text expressed in another language (its target language)
– Chinese-into-English
– Java-into-C
– Java-into-x86
– x86 assembler
Chart 14
Translators & Compilers
Assembler: translates from an assembly language into the corresponding machine code
– Generates one machine-code instruction per source instruction
Compiler: translates from a high-level language into a low-level language
– Generates several machine-code instructions per source command
Chart 15
Translators & Compilers
Disassembler: translates a machine code into the corresponding assembly language
Decompiler: translates a low-level language into a high-level language
Question: Why would you want a disassembler or decompiler?
Chart 16
Translators & Compilers
Source Program: the source language text
Object Program: the target language text
[Diagram: Source Program → Compiler (Syntax Check, Context Constraints, Semantic Analysis, Generate Object Code) → Object Program]
• The object program is semantically equivalent to the source program, provided the source program is well-formed
Chart 17
Translators & Compilers
Why would you want to do:
– Java-into-C translator
– C-into-Java translator
– Assembly-language-into-Pascal decompiler
Chart 18
Translators & Compilers
[Tombstone diagram: program P expressed in language L, and the same program running on machine M]
P = Program Name
L = Implementation Language
M = Target Machine
For this to work, L must equal M; that is, the implementation language must be the same as the machine language
[Tombstone diagram: S-into-T translator expressed in language L]
S = Source Language
T = Target Language
L = Translator’s Implementation Language
An S-into-T translator is itself a program that runs on machine L
Chart 19
Translators & Compilers
• Translating a source program P
• Expressed in language S,
• Using an S-into-T translator
• Running on machine M
[Tombstone diagram: P in S → (S-into-T translator, expressed in M, running on machine M) → P in T]
Chart 20
Translators & Compilers
• Translating a source program sort
• Expressed in language Java,
• Using a Java-into-x86 translator
• Running on an x86 machine
[Tombstone diagram: sort in Java → (Java-into-x86 translator, expressed in x86, running on an x86 machine) → sort in x86, which then runs on an x86 machine]
The object program runs on the same machine as the compiler
Chart 21
Translators & Compilers
[Tombstone diagram: sort in Java → (Java-into-PPC translator, expressed in x86, running on an x86 machine) → sort in PPC, which is downloaded to and run on a PPC machine]
Cross Compiler: the object program runs on a different machine than the compiler
• Translating a source program sort
• Expressed in language Java,
• Using a Java-into-PPC translator
• Running on an x86 machine
• Downloaded to a PPC machine
Chart 22
Translators & Compilers
[Tombstone diagram: sort in Java → (Java-into-C translator, expressed in x86, running on an x86 machine) → sort in C → (C-into-x86 compiler, expressed in x86, running on an x86 machine) → sort in x86, which runs on an x86 machine]
Two-stage Compiler: the source program is translated to another language before being translated into the object program
• Translating a source program sort
• Expressed in language Java,
• Using a Java-into-C translator
• Running on an x86 machine
• Then translating the C program
• Using a C-into-x86 compiler
• Running on an x86 machine
• Into an x86 object program
Chart 23
Translators & Compilers
Translator Rules
– Can run on machine M only if it is expressed in machine code M
– Source program must be expressed in the translator’s source language S
– Object program is expressed in the translator’s target language T
– Object program is semantically equivalent to the source program
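The translator rules can be read as a small consistency check. The sketch below is only an illustration under assumed names (`Program`, `Translator`, and `translate` are hypothetical and not part of the course material). It refuses to run a translator on the wrong machine or on a program in the wrong source language, and otherwise returns an object program labelled with the target language. Semantic equivalence is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    name: str
    language: str  # language the program text is expressed in


@dataclass(frozen=True)
class Translator:
    source: str          # S: source language
    target: str          # T: target language
    implemented_in: str  # L: language the translator itself is expressed in


def translate(program: Program, translator: Translator, machine: str) -> Program:
    """Apply the tombstone rules; raise ValueError if a rule is violated."""
    # Rule: a translator runs on machine M only if it is expressed in M's code.
    if translator.implemented_in != machine:
        raise ValueError(
            f"translator is expressed in {translator.implemented_in}, "
            f"so it cannot run on {machine}"
        )
    # Rule: the source program must be expressed in the source language S.
    if program.language != translator.source:
        raise ValueError(
            f"{program.name} is in {program.language}, "
            f"but the translator accepts {translator.source}"
        )
    # Rule: the object program is expressed in the target language T.
    return Program(program.name, translator.target)
```

For example, `translate(Program("sort", "Java"), Translator("Java", "PPC", "x86"), "x86")` models the cross-compiler case: the compiler runs on x86 while the object program is in PPC code.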
Chart 24
Interpreters
Accepts any program (source program) expressed in a particular language (source language) and runs that source program immediately
– Does not translate the source program into object code prior to execution
Chart 25
Interpreters
[Diagram: Source Program → Interpreter loop: Fetch Instruction → Analyze Instruction → Execute Instruction, repeated until Program Complete]
• Source program starts to run as soon as the first instruction is analyzed
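The fetch, analyze, execute cycle can be sketched in a few lines. The instruction set below (`set`, `add`, `print`) is invented purely for illustration and is not taken from the slides. Each instruction is analyzed only when control reaches it, so the program starts running at once.

```python
def run(program):
    """Interpret a list of instruction strings and return the final variable store."""
    store = {}
    pc = 0                                   # index of the next instruction
    while pc < len(program):                 # loop until the program is complete
        instruction = program[pc]            # fetch
        op, *args = instruction.split()      # analyze
        if op == "set":                      # execute
            store[args[0]] = int(args[1])
        elif op == "add":
            store[args[0]] = store[args[0]] + int(args[1])
        elif op == "print":
            print(store[args[0]])
        else:
            raise ValueError(f"unknown instruction: {instruction!r}")
        pc += 1
    return store
```

Note that an unknown instruction is reported only when it is reached; an interpreter of this kind does no checking ahead of execution.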
Chart 26
Interpreters
When to Use Interpretation
– Interactive mode: want to see the results of an instruction before entering the next instruction
– Program is used only once
– Each instruction is expected to be executed only once
– Instructions have simple formats
Disadvantages
– Slow: up to 100 times slower than running machine code
Chart 27
Interpreters
Examples
– Basic
– Lisp
– Unix Command Language (shell)
– SQL
Chart 28
Interpreters
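[Tombstone diagram: S interpreter expressed in language L]
[Tombstone diagram: program P in S, run by an S interpreter expressed in M, on machine M]
Program P expressed in language S, using an S interpreter, running on machine M
[Tombstone diagram: graph in Basic, run by a Basic interpreter expressed in x86, on an x86 machine]
Program graph written in Basic, running on a Basic interpreter executed on an x86 machine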
Chart 29
Real and Abstract Machines
Hardware emulation: using software to execute one set of machine code on another machine
– Can measure everything about the new machine except its speed
– Abstract machine: emulator
– Real machine: actual hardware
An abstract machine is functionally equivalent to a real machine if they both implement the same language L
Chart 30
Real and Abstract Machines
[Tombstone diagram: nmi interpreter written in C → (C-into-M compiler, expressed in M, running on machine M) → nmi interpreter expressed in machine code M]
New Machine Instruction (nmi) interpreter written in C
The nmi interpreter is translated into machine code M using the C compiler
[Tombstone diagram: program P in nmi code, run by the nmi interpreter on machine M, shown alongside P running directly on an nmi machine]
Chart 31
Interpretive Compilers
Combination of compiler and interpreter
– Translate source program into an intermediate language
– It is intermediate in level between the source language and ordinary machine code
– Its instructions have simple formats, and therefore can be analyzed easily and quickly
– Translation from the source language into the intermediate language is easy and fast
An interpretive compiler combines fast compilation with tolerable running speed
Chart 32
Interpretive Compilers
[Tombstone diagram: Java-into-JVM translator running on machine M]
[Tombstone diagram: JVM-code interpreter running on machine M]
[Tombstone diagram: P in Java → (Java-into-JVM translator on M) → P in JVM code, which is then run by the JVM interpreter on M]
A Java program P is first translated into JVM code, and then the JVM-code object program is interpreted
Chart 33
Portable Compilers
A program is portable if it can be compiled and run on any machine, without change
– A portable program is more valuable than an unportable one, because its development cost can be spread over more copies
– Portability is measured by the proportion of code that remains unchanged when it is moved to a dissimilar machine
Language affects portability
– Assembly language: 0% portable
– High-level language: approaches 100% portability
Chart 34
Portable Compilers
Language Processors
– Valuable and widely used programs
– Typically written in a high-level language
   Pascal, C, Java
– Part of a language processor is machine dependent
   Code generation part
   Language processor is only about 50% portable
– A compiler that generates intermediate code is more portable than a compiler that generates machine code
Chart 35
Portable Compilers
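[Tombstone diagrams: a Java-into-JVM translator expressed in Java and in JVM code, and a JVM interpreter expressed in Java; the interpreter rewritten in C is compiled by the C-into-M compiler into a JVM interpreter expressed in M; P in Java is then translated to P in JVM code and run by the JVM interpreter on M]
Rewrite the interpreter in C
Note: a C-into-M compiler exists; rewrite the JVM interpreter from Java to C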
Chart 36
Bootstrapping
The language processor is used to process itself
– Implementation language is the source language
Bootstrapping a portable compiler
– A portable compiler can be bootstrapped to make a true compiler (one that generates machine code) by writing an intermediate-language-into-machine-code translator
Full bootstrap
– Writing the compiler in itself
– Using the latest version to upgrade the next version
Half bootstrap
– Compiler expressed in itself but targeted for another machine
Bootstrapping to improve efficiency
– Upgrade the compiler to optimize code generation as well as to improve compile efficiency
Chart 37
Bootstrapping
Bootstrap an interpretive compiler to generate machine code
[Tombstone diagrams for the steps below, ending with P in Java → P in JVM code → P in M]
First, write a JVM-code-into-M translator in Java
Next, compile the translator using the existing interpreter
Use the translator to translate itself
Translate the Java-into-JVM-code translator into machine code
Result: a two-stage Java-into-M compiler
Chart 38
Bootstrapping
Full bootstrap
[Tombstone diagrams for the steps below; v1, v2, and v3 label successive versions of the compiler]
Write an Ada-S compiler in C (v1)
Convert the C version of the Ada-S compiler into an Ada-S version of the Ada-S compiler (v2)
Extend the Ada-S compiler to a (full) Ada compiler (v3)
Chart 39
Bootstrapping
Half bootstrap
[Tombstone diagrams: an Ada-into-HM compiler (expressed in Ada and in HM) is used to build an Ada-into-TM compiler, expressed first in HM and then in TM; a program P in Ada is compiled to P in TM and run on TM]
Chart 40
Bootstrapping
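Bootstrap to improve efficiency
[Tombstone diagrams: v1 is an Ada-into-Ms compiler, expressed in Ada and in Ms; v2 is an Ada-into-Mf compiler expressed in Ada; compiling v2 with v1 gives v2 expressed in Ms, which is then applied to a program P in Ada]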
Chart 41
Chapter 3: Compilation
Phases
– Syntactic Analysis
– Contextual Analysis
– Code Generation
Passes
– Multi-pass Compilation
– One-pass Compilation
– Compiler Design Issues
Case Study: The Triangle Compiler
Chart 42
Phases
Syntactic Analysis
– The source program is parsed to check whether it conforms to the source language’s syntax, and to determine its phrase structure
Contextual Analysis
– The parsed program is analyzed to check whether it conforms to the source language’s contextual constraints
Code Generation
– The checked program is translated to an object program, in accordance with the semantics of the source and target languages
Chart 43
Phases
[Diagram: Source Program → Syntactic Analysis → AST → Contextual Analysis → Decorated AST → Code Generation → Object Program; Syntactic Analysis and Contextual Analysis may each produce an Error Report]
Chart 44
Syntactic Analysis
To determine the source program’s phrase structure
– Parsing
– Contextual analysis and code generation must know how the program is composed
   Commands, expressions, declarations, …
– Check for conformance to the source language’s syntax
– Construct suitable representation of its phrase structure (AST)
AST
– Terminal nodes corresponding to identifiers, literals, and operators
– Subtrees representing the phrases of the source program
– Blanks and comments not in AST (no meaning)
– Punctuation and brackets not in AST (only separate and enclose)
Chart 45
Contextual Analysis
Analyzes the parsed program
– Scope rules
– Type rules
Produces decorated AST
– AST with information gathered during contextual analysis
– Each applied occurrence of an identifier is linked to the corresponding declaration
– Each expression is decorated by its type T
Chart 46
Code Generation
The final translation of the checked program to an object program
– After syntactic and contextual analysis is completed
Treatment of identifiers
– Constants
   Bind identifier to value
   Replace each occurrence of identifier with value
– Variables
   Bind identifier to some memory address
   Replace each occurrence of identifier by address
Target language
– Assembly language
– Machine code
Chart 47
Passes
Multi-pass compilation
– Traverses the program or AST several times
One-pass compilation
– Single traversal of the program
– Contextual analysis and code generation are performed ‘on the fly’ during syntactic analysis
[Diagram, one per scheme: Compiler Driver, Syntactic Analyzer, Contextual Analyzer, Code Generator]
Chart 48
Compiler Design Issues
Speed
– Compiler run time
Space
– Storage: size of compiler + files generated
Modularity
– Multi-pass compiler more modular than one-pass compiler
Flexibility
– Multi-pass compiler is more flexible because it generates an AST that can be traversed in any order by the other phases
Semantics-preserving transformations
– To optimize code, must have a multi-pass compiler
Source language properties
– May restrict compiler choice: some language constructs may require multi-pass compilers
Chart 49
Chapter 4: Syntactic Analysis
Sub-phases of Syntactic Analysis
Grammars Revisited
Parsing
Abstract Syntax Trees
Scanning
Case Study: Syntactic Analysis in the Triangle Compiler
Chart 50
Structure of a Compiler
[Diagram: Source code → Lexical Analyzer → tokens → Parser & Semantic Analyzer → parse tree → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → Assembly code; the stages share the Symbol Table]
Chart 51
Syntactic Analysis
Main function
– Parse source program to discover its phrase structure
– Recursive-descent parsing
– Constructing an AST
– Scanning to group characters into tokens
Chart 52
Sub-phases of Syntactic Analysis
Scanning (or lexical analysis)
– Source program transformed to a stream of tokens
   Identifiers
   Literals
   Operators
   Keywords
   Punctuation
– Comments and blank spaces discarded
Parsing
– To determine the source program’s phrase structure
– Source program is input as a stream of tokens (from the Scanner)
– Treats each token as a terminal symbol
Representation of phrase structure
– AST
Chart 53
Lexical Analysis – A Simple Example
Scan the file character by character and group characters into words and punctuation (tokens); remove white space and comments
Some tokens for this example: main ( ) { int a , b , c ;

main() {
    int a, b, c;
    char number[5];

    /* get user inputs */
    a = atoi(gets(number));
    b = atoi(gets(number));

    /* calculate value for c */
    c = 2*(a+b) + a*(a+b);

    /* print results */
    printf("%d", c);
}
Chart 54
Creating Tokens – Mini-Triangle Example
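let var y: Integer
in !new year
   y := y+1
[Diagram: source text → Input Converter → character string in Buffer → Scanner → tokens]
Character string (S = space): l e t S v a r S y : S I n t e g e r S i n . . .
Tokens (kind and spelling): let "let", var "var", Ident. "y", colon ":", Ident. "Integer", in "in", Ident. "y", becomes ":=", Ident. "y", op. "+", Intlit. "1", eot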
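The token stream in the diagram can be reproduced with a small scanner. This is a minimal sketch, not the Triangle scanner: the token kinds, the keyword subset (`let`, `var`, `in`), and the function name `scan` are assumptions chosen to match the example, and comments are assumed to run from `!` to the end of the line.

```python
import re

# Assumption: only the keywords that appear in the example are listed.
KEYWORDS = {"let", "var", "in"}

# Order matters: ":=" must be tried before ":".
TOKEN_SPEC = [
    ("comment", r"![^\n]*"),
    ("space", r"\s+"),
    ("intlit", r"[0-9]+"),
    ("ident", r"[A-Za-z][A-Za-z0-9]*"),
    ("becomes", r":="),
    ("colon", r":"),
    ("op", r"[+\-*/<>=]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Return a list of (kind, spelling) pairs, ending with an eot token."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"illegal character {text[pos]!r} at position {pos}")
        kind, spelling = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("comment", "space"):
            continue                     # discarded, never reach the parser
        if kind == "ident" and spelling in KEYWORDS:
            kind = spelling              # keywords are their own token kind
        tokens.append((kind, spelling))
    tokens.append(("eot", ""))
    return tokens
```

Calling `scan("let var y: Integer\nin !new year\n  y := y+1")` yields the twelve tokens shown in the diagram, with the comment and all white space dropped.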
Chart 55
Tokens in Triangle
// literals, identifiers, operators...
INTLITERAL = 0, "<int>", CHARLITERAL = 1, "<char>", IDENTIFIER = 2, "<identifier>", OPERATOR = 3, "<operator>",
// reserved words - must be in alphabetical order...
ARRAY = 4, "array", BEGIN = 5, "begin", CONST = 6, "const", DO = 7, "do", ELSE = 8, "else", END = 9, "end",
FUNC = 10, "func", IF = 11, "if", IN = 12, "in", LET = 13, "let", OF = 14, "of", PROC = 15, "proc",
RECORD = 16, "record", THEN = 17, "then", TYPE = 18, "type", VAR = 19, "var", WHILE = 20, "while",
// punctuation...
DOT = 21, ".", COLON = 22, ":", SEMICOLON = 23, ";", COMMA = 24, ",", BECOMES = 25, ":=", IS = 26, "~",
// brackets...
LPAREN = 27, "(", RPAREN = 28, ")", LBRACKET = 29, "[", RBRACKET = 30, "]", LCURLY = 31, "{", RCURLY = 32, "}",
// special tokens...
EOT = 33, "", ERROR = 34, "<error>"
Chart 56
Grammars Revisited
Context-free grammars
– Generate a set of sentences
– Each sentence is a string of terminal symbols
– An unambiguous sentence has a unique phrase structure embodied in its syntax tree
Develop parsers from context-free grammars
Chart 57
Regular Expressions
A regular expression (RE) is a convenient notation for expressing a set of strings of terminal symbols
Main features
– ‘|’ separates alternatives
– ‘*’ indicates that the previous item may be repeated zero or more times
– ‘(’ and ‘)’ are grouping parentheses
Chart 58
Regular Expression Basics
ε: the empty string, a special string of length 0
Regular expression operations
– | separates alternatives
– * indicates that the previous item may be repeated zero or more times (repetition)
– ( and ) are grouping parentheses
Chart 59
Regular Expression Basics
Algebraic Properties
– | is commutative and associative
   r|s = s|r
   r|(s|t) = (r|s)|t
– Concatenation is associative
   (rs)t = r(st)
– Concatenation distributes over |
   r(s|t) = rs|rt
   (s|t)r = sr|tr
– ε is the identity for concatenation
   εr = r
   rε = r
– * is idempotent
   r** = r*
   r* = (r|ε)*
Chart 60
Regular Expression Basics
Common Extensions
– r+    one or more of expression r, same as rr*
– r^k   k repetitions of r
   r^3 = rrr
– ~r    the characters not in the expression r
   ~[\t\n]
– r-z   range of characters
   [0-9a-z]
– r?    zero or one copy of expression r (used for fields of an expression that are optional)
Chart 61
Regular Expression Example
Regular Expression for Representing Months
– Examples of legal inputs
   January represented as 1 or 01
   October represented as 10
– First Try: [0|1|ε][0-9]
   Matches all legal inputs? Yes
      1, 2, 3, …, 10, 11, 12, 01, 02, …, 09
   Matches any illegal inputs? Yes
      0, 00, 18
Chart 62
Regular Expression Example
Regular Expression for Representing Months
– Examples of legal inputs
   January represented as 1 or 01
   October represented as 10
– Second Try: [1-9]|(0[1-9])|(1[0-2])
   Matches all legal inputs? Yes
      1, 2, 3, …, 10, 11, 12, 01, 02, …, 09
   Matches any illegal inputs? No
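Both attempts can be checked mechanically. The sketch below transcribes them into Python `re` syntax as an assumption about the intended meaning: the first try becomes `[01]?[0-9]` (an optional 0 or 1 followed by a digit) and the second is used as written. `accepts` is a hypothetical helper that requires the whole input to match.

```python
import re

FIRST_TRY = re.compile(r"[01]?[0-9]")
SECOND_TRY = re.compile(r"[1-9]|(0[1-9])|(1[0-2])")

# Legal month spellings: 1..12 plus the zero-padded forms 01..09.
LEGAL = [str(m) for m in range(1, 13)] + [f"0{m}" for m in range(1, 10)]
ILLEGAL = ["0", "00", "18"]


def accepts(pattern, text):
    """True if the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None
```

With these definitions, `FIRST_TRY` accepts every entry in both `LEGAL` and `ILLEGAL`, while `SECOND_TRY` accepts every entry in `LEGAL` and none in `ILLEGAL`.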
Chart 63
Regular Expression Example
Regular Expression for Floating Point Numbers
– Examples of legal inputs
   1.0, 0.2, 3.14159, -1.0, 2.7e8, 1.0E-6
   Assume that a 0 is required before numbers less than 1 and that extra leading zeros are not prevented, so numbers such as 0011 or 0003.14159 are legal
– Building the regular expression
   Assume digit → 0|1|2|3|4|5|6|7|8|9
   Handle simple decimals such as 1.0, 0.2, 3.14159
      digit+.digit+
   Add an optional sign (only minus, no plus)
      (-|ε)digit+.digit+  or  -?digit+.digit+
Chart 64
Regular Expression Example
Regular Expression for Floating Point Numbers (cont.)
– Building the regular expression (cont.)
   Format for the exponent
      (E|e)(+|-)?(digit+)
   Adding it as an optional expression to the decimal part
      (-|ε)digit+.digit+((E|e)(+|-)?(digit+))?
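The finished expression translates almost directly into Python `re` syntax. This is a sketch of that transcription, with `is_float` as a hypothetical helper name. The one change needed is that the decimal point must be escaped, since an unescaped `.` matches any character.

```python
import re

DIGITS = r"[0-9]+"

# -?digit+.digit+((E|e)(+|-)?(digit+))?  with the decimal point escaped
FLOAT = re.compile(rf"-?{DIGITS}\.{DIGITS}([Ee][+-]?{DIGITS})?")


def is_float(text):
    """True if the whole of text is a floating point literal in this format."""
    return FLOAT.fullmatch(text) is not None
```

The expression accepts all the legal examples listed above, including `0003.14159`, and rejects inputs such as `.5`, `1.`, `+1.0`, and `3.25e`.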
Chart 65
Extended BNF
Extended BNF (EBNF)
– Combination of BNF and RE
– N ::= X, where N is a nonterminal symbol and X is an extended RE, i.e., an RE constructed from both terminal and nonterminal symbols
– EBNF
   Right-hand side may use |, *, (, )
   Right-hand side may contain both terminal and nonterminal symbols
Chart 66
Example EBNF
Expression ::= primary-Expression (Operator primary-Expression)*
primary-Expression ::= Identifier
                     | ( Expression )
Identifier ::= a|b|c|d|e
Operator ::= +|-|*|/
Generates
   e
   a + b
   a – b – c
   a + (b * c)
   a + (b + c) / d
   a – (b – (c – (d – e)))
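An EBNF grammar of this shape maps naturally onto recursive-descent code: each nonterminal becomes a function, `|` becomes an `if`, and `*` becomes a `while` loop. The recognizer below is a sketch under that reading. The name `is_expression` is hypothetical, tokens are assumed to be single characters, and the ASCII `-` stands for the minus sign.

```python
IDENTIFIERS = set("abcde")
OPERATORS = set("+-*/")


def is_expression(text):
    """True if text (ignoring white space) is generated by Expression."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def primary():
        # primary-Expression ::= Identifier | ( Expression )
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in IDENTIFIERS:
            pos += 1
        elif pos < len(tokens) and tokens[pos] == "(":
            pos += 1
            expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("expected ')'")
            pos += 1
        else:
            raise ValueError("expected identifier or '('")

    def expression():
        # Expression ::= primary-Expression (Operator primary-Expression)*
        nonlocal pos
        primary()
        while pos < len(tokens) and tokens[pos] in OPERATORS:
            pos += 1
            primary()

    try:
        expression()
    except ValueError:
        return False
    return pos == len(tokens)  # all input must be consumed
```

This only recognizes sentences; a parser proper would also build an AST in `primary` and `expression`.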
Chart 67
Grammar Transformations
Left Factorization
XY | XZ is equivalent to X(Y | Z)

single-Command ::= V-name := Expression
                 | if Expression then single-Command
                 | if Expression then single-Command else single-Command

single-Command ::= V-name := Expression
                 | if Expression then single-Command (ε | else single-Command)
Chart 68
Grammar Transformations
Elimination of left recursion
N ::= X | NY is equivalent to N ::= X(Y)*

Identifier ::= Letter
             | Identifier Letter
             | Identifier Digit

Identifier ::= Letter
             | Identifier (Letter | Digit)

Identifier ::= Letter (Letter | Digit)*
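One reason the transformation matters for a hand-written scanner or parser: the final rule, Letter (Letter | Digit)*, reads directly as "one letter, then a loop", whereas a procedure written from the left-recursive form would call itself before consuming any input. The sketch below assumes ASCII letters and digits; `is_identifier` is a hypothetical name.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)


def is_identifier(text):
    """Recognize Identifier ::= Letter (Letter | Digit)*."""
    # Letter
    if not text or text[0] not in LETTERS:
        return False
    # (Letter | Digit)*  becomes a loop
    pos = 1
    while pos < len(text) and (text[pos] in LETTERS or text[pos] in DIGITS):
        pos += 1
    return pos == len(text)
```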
Chart 69
Grammar Transformations
Substitution of nonterminal symbols
Given N ::= X, we can substitute each occurrence of N with X iff N ::= X is nonrecursive and is the only production rule for N

single-Command ::= for Control-Variable := Expression To-or-Downto Expression do single-Command
                 | …
Control-Variable ::= Identifier
To-or-Downto ::= to
               | downto

single-Command ::= for Identifier := Expression (to|downto) Expression do single-Command
                 | …
Chart 70
Scanning (Lexical Analysis)
The purpose of scanning is to recognize tokens in the source program, that is, to group input characters (the source program text) into tokens.
Difference between parsing and scanning:
– Parsing groups terminal symbols, which are tokens, into larger phrases such as expressions and commands, and analyzes the tokens for correctness and structure
– Scanning groups individual characters into tokens
Chart 71
Structure of a Compiler
[Diagram: Source code → Lexical Analyzer → tokens → Parser & Semantic Analyzer → parse tree → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → Assembly code; the stages share the Symbol Table]
Chart 72
Creating Tokens – Mini-Triangle Example
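let var y: Integer
in !new year
   y := y+1
[Diagram: source text → Input Converter → character string in Buffer → Scanner → tokens]
Character string (S = space): l e t S v a r S y : S I n t e g e r S i n . . .
Tokens (kind and spelling): let "let", var "var", Ident. "y", colon ":", Ident. "Integer", in "in", Ident. "y", becomes ":=", Ident. "y", op. "+", Intlit. "1", eot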
Chart 73
What Does a Scanner Do?
Handle keywords (reserved words)
– Recognizes identifiers and keywords
– Match explicitly
   Write a regular expression for each keyword
   An identifier is any alphanumeric string that is not a keyword
– Match as an identifier, perform lookup
   No special regular expressions for keywords
   When an identifier is found, perform a lookup in a preloaded keyword table
How does Triangle handle keywords?
Discuss in terms of efficiency and ease of coding.
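The second strategy, match as an identifier and then look it up, can be sketched as follows. The keyword list is the set of reserved words from the Triangle token table; keeping it sorted allows a binary search. The function name `classify` and the use of `bisect` are illustrative assumptions, not a description of how the Triangle scanner is actually coded.

```python
from bisect import bisect_left

# Preloaded keyword table; must stay in alphabetical order for binary search.
KEYWORDS = [
    "array", "begin", "const", "do", "else", "end", "func", "if", "in",
    "let", "of", "proc", "record", "then", "type", "var", "while",
]


def classify(spelling):
    """Return the token kind for a spelling already scanned as an identifier."""
    index = bisect_left(KEYWORDS, spelling)
    if index < len(KEYWORDS) and KEYWORDS[index] == spelling:
        return spelling.upper()      # keyword, e.g. "while" -> "WHILE"
    return "IDENTIFIER"
```

The scanner itself then needs only one rule for identifiers; adding a keyword means adding one table entry rather than one regular expression.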
Chart 74
What Does a Scanner Do?
Remove white space
– Tabs, spaces, new lines
Remove comments
– Single line
   -- Ada comment
– Multi-line, start and end delimiters
   { Pascal comment }
   /* C comment */
– Nested
– Runaway comments
   Nonterminated comments can’t be detected until end of file
Chart 75
What Does a Scanner Do?
Perform look ahead
– Multi-character tokens
   1..10 vs. 1.10
   &, &&
   <, <=
   etc.
Challenging input languages
– FORTRAN
   Keywords not reserved
   Blanks are not a delimiter
   Example (comma vs. decimal)
      DO10I=1,5   start of a do loop (equivalent to a C for loop)
      DO10I=1.5   an assignment statement, assignment to variable DO10I
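The `1..10` versus `1.10` case shows why one character of look-ahead is not always enough: after reading `1` and seeing `.`, the scanner must inspect the character after the dot before deciding. The function below is a sketch of that decision. `scan_number` and the token kinds `intlit` and `reallit` are hypothetical names, and the input is assumed to start with a digit at `pos`.

```python
def scan_number(text, pos=0):
    """Scan a numeric literal at pos; return (kind, spelling, next_pos)."""
    start = pos
    while pos < len(text) and text[pos].isdigit():
        pos += 1
    # A '.' only belongs to the number if a digit follows it.
    # In "1..10" the character after the first '.' is another '.',
    # so the literal ends at "1" and ".." is left for the next token.
    if pos + 1 < len(text) and text[pos] == "." and text[pos + 1].isdigit():
        pos += 1
        while pos < len(text) and text[pos].isdigit():
            pos += 1
        return ("reallit", text[start:pos], pos)
    return ("intlit", text[start:pos], pos)
```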
Chart 76
What Does a Scanner Do?
Challenging input languages (cont.)
– PL/I, keywords not reserved
   IF THEN THEN THEN = ELSE; ELSE ELSE = THEN;
Chart 77
What Does a Scanner Do?
Error Handling
– Error token passed to the parser, which reports the error
– Recovery
   Delete the characters of the current token that have been read so far and restart scanning at the next unread character
   Delete the first character of the current lexeme and resume scanning from the next character
– Examples of lexical errors:
   3.25e    bad format for a constant
   Var#1    illegal character
– Some errors that are not lexical errors
   Mistyped keywords
   – Begim
   Mismatched parentheses
   Undeclared variables
Chart 78
Scanner Implementation
Issues
– Simpler design: the parser doesn’t have to worry about white space, etc.
– Improved compiler efficiency: allows the construction of a specialized and potentially more efficient processor
– Enhanced compiler portability: input alphabet peculiarities and other device-specific anomalies can be restricted to the scanner
Chart 79
Scanner Implementation
What are the keywords in Triangle?
How are keywords and identifiers implemented in Triangle?
Is look ahead implemented in Triangle?
– If so, how?
Chart 80
Structure of a Compiler
[Diagram: Source code → Lexical Analyzer → tokens → Parser → parse tree → Semantic Analyzer → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → Assembly code; the stages share the Symbol Table]
Chart 81
Parsing
Given an unambiguous, context-free grammar, parsing is
– Recognition of an input string, i.e., deciding whether or not the input string is a sentence of the grammar
– Parsing of an input string, i.e., recognition of the input string plus determination of its phrase structure. The phrase structure can be represented by a syntax tree, or otherwise.
Unambiguity is necessary so that every sentence of the grammar has exactly one syntax tree.
Chart 82
Parsing
The syntax of programming language constructs is described by context-free grammars.
Advantages of unambiguous, context-free grammars
– A precise, yet easy-to-understand, syntactic specification of the programming language
– For certain classes of grammars we can automatically construct an efficient parser that determines whether a source program is syntactically well formed
– Imparts a structure to a programming language that is useful for the translation of source programs into correct object code and for the detection of errors
– Easier to add new constructs to the language if the implementation is based on a grammatical description of the language
Chart 83
Parsing
Check the syntax (structure) of a program and create a tree representation of the program
Programming languages have non-regular constructs
– Nesting
– Recursion
Context-free grammars are used to express the syntax for programming languages
sequence of tokens → parser → syntax tree
Chart 84
Context-Free Grammars
Comprised of
– A set of tokens or terminal symbols
– A set of non-terminal symbols
– A set of rules or productions which express the legal relationships between symbols
– A start or goal symbol
Example:
1. expr → expr – digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9
Tokens: -, +, 0, 1, 2, …, 9
Non-terminals: expr, digit
Start symbol: expr
Chart 85
Context-Free Grammars
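1. expr → expr – digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9
Example input: 3 + 8 - 2
[Syntax tree: the root expr has children expr, -, digit (2); that inner expr has children expr, +, digit (8); the innermost expr has the single child digit (3)]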
Chart 86
Checking for Correct Syntax
Given a grammar for a language and a program, how do you know if the syntax of the program is legal?
A legal program can be derived from the start symbol of the grammar
Grammar must be unambiguous and context-free
Chart 87
Deriving a String
The derivation begins with the start symbol
At each step of a derivation the right-hand side of a grammar rule is used to replace a non-terminal symbol
Continue replacing non-terminals until only terminal symbols remain
1. expr → expr – digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9
Example input: 3 + 8 - 2
expr ⇒ expr – digit        (Rule 1)
     ⇒ expr – 2            (Rule 4)
     ⇒ expr + digit – 2    (Rule 2)
     ⇒ expr + 8 – 2        (Rule 4)
     ⇒ digit + 8 – 2       (Rule 3)
     ⇒ 3 + 8 – 2           (Rule 4)
Chart 88
Rightmost Derivation
The rightmost non-terminal is replaced in each step
1. expr → expr – digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9
Example input: 3 + 8 - 2
expr ⇒ expr – digit        (Rule 1)
     ⇒ expr – 2            (Rule 4)
     ⇒ expr + digit – 2    (Rule 2)
     ⇒ expr + 8 – 2        (Rule 4)
     ⇒ digit + 8 – 2       (Rule 3)
     ⇒ 3 + 8 – 2           (Rule 4)
Chart 89
Leftmost Derivation
The leftmost non-terminal is replaced in each step
1. expr → expr – digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9
Example input: 3 + 8 - 2
expr ⇒ expr – digit             (Rule 1)
     ⇒ expr + digit – digit     (Rule 2)
     ⇒ digit + digit – digit    (Rule 3)
     ⇒ 3 + digit – digit        (Rule 4)
     ⇒ 3 + 8 – digit            (Rule 4)
     ⇒ 3 + 8 – 2                (Rule 4)
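A leftmost derivation is mechanical once the sequence of rules is chosen: find the first non-terminal in the current sentential form and replace it with the rule's right-hand side. The sketch below replays such a sequence. The encoding is an assumption made for illustration: rules 1 to 3 are given by number, and an application of rule 4 is given as the digit it produces.

```python
RULES = {
    1: ("expr", ["expr", "-", "digit"]),
    2: ("expr", ["expr", "+", "digit"]),
    3: ("expr", ["digit"]),
}
NONTERMINALS = {"expr", "digit"}


def leftmost_derivation(steps):
    """Apply each step to the leftmost non-terminal; return all sentential forms."""
    form = ["expr"]                       # start symbol
    history = [" ".join(form)]
    for step in steps:
        # position of the leftmost non-terminal
        index = next(i for i, symbol in enumerate(form) if symbol in NONTERMINALS)
        if step in RULES:
            lhs, rhs = RULES[step]
        else:
            lhs, rhs = "digit", [step]    # rule 4: digit -> one chosen digit
        if form[index] != lhs:
            raise ValueError(f"cannot apply {step!r} to {form[index]!r}")
        form[index:index + 1] = rhs
        history.append(" ".join(form))
    return history
```

`leftmost_derivation([1, 2, 3, "3", "8", "2"])` reproduces the seven sentential forms listed above, from `expr` down to `3 + 8 - 2`.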
Chart 90
Leftmost Derivation
The leftmost non-terminal is replaced in each step
1. expr ⇒ expr – digit                        (Rule 1)
2. expr – digit ⇒ expr + digit – digit        (Rule 2)
3. expr + digit – digit ⇒ digit + digit – digit    (Rule 3)
4. digit + digit – digit ⇒ 3 + digit – digit  (Rule 4)
5. 3 + digit – digit ⇒ 3 + 8 – digit          (Rule 4)
6. 3 + 8 – digit ⇒ 3 + 8 – 2                  (Rule 4)
[Syntax tree for 3 + 8 - 2, with its nodes numbered 1 to 6 in the order in which the derivation steps expand them]
Chart 91
Bottom-Up Parsing
Parser examines terminal symbols of the input string, in order from left to right
Reconstructs the syntax tree from the bottom (terminal nodes) up (toward the root node)
Bottom-up parsing reduces a string w to the start symbol of the grammar.
– At each reduction step a particular substring matching the right side of a production is replaced by the symbol on the left of that production, and if the substring is chosen correctly at each step, a rightmost derivation is traced out in reverse.
Chart 92
Bottom-Up Parsing
Types of bottom-up parsing algorithms
– Shift-reduce parsing
   At each reduction step a particular substring matching the right side of a production is replaced by the symbol on the left of that production, and if the substring is chosen correctly at each step, a rightmost derivation is traced out in reverse.
– LR(k) parsing
   L is for left-to-right scanning of the input, R is for constructing a rightmost derivation in reverse, and k is for the number of input symbols of look-ahead that are used in making parsing decisions.
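For the small expr/digit grammar used in the derivation examples, shift-reduce parsing can be sketched with an explicit stack. This is a hand-coded illustration, not a table-driven LR(k) parser: the reduce conditions are written out for this one grammar, and `shift_reduce` is a hypothetical name. It returns the rule numbers in the order the reductions happen.

```python
def shift_reduce(text):
    """Parse text with the expr/digit grammar; return the rules used, in reduction order."""
    tokens = [ch for ch in text if not ch.isspace()]
    stack, reductions = [], []
    pos = 0
    while True:
        if stack and stack[-1].isdigit():
            stack[-1] = "digit"                      # rule 4: digit -> 0|1|...|9
            reductions.append(4)
        elif stack[-3:] == ["expr", "-", "digit"]:
            stack[-3:] = ["expr"]                    # rule 1: expr -> expr - digit
            reductions.append(1)
        elif stack[-3:] == ["expr", "+", "digit"]:
            stack[-3:] = ["expr"]                    # rule 2: expr -> expr + digit
            reductions.append(2)
        elif stack == ["digit"]:
            stack[:] = ["expr"]                      # rule 3: expr -> digit
            reductions.append(3)
        elif pos < len(tokens):
            stack.append(tokens[pos])                # shift the next input symbol
            pos += 1
        else:
            break                                    # nothing left to reduce or shift
    if stack != ["expr"]:
        raise ValueError(f"syntax error, stack is {stack}")
    return reductions
```

For `3 + 8 - 2` the reductions come out as 4, 3, 4, 2, 4, 1. Read backwards, that is 1, 4, 2, 4, 3, 4, which is the rightmost derivation of the same string.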
Chart 93
Bottom-Up Parsing Example: 3+8-2
1. expr → expr – digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9
Example input: 3 + 8 - 2
[Diagram sequence: partial syntax trees built bottom-up over 3 + 8 - 2, adding digit and expr nodes one reduction at a time]
Chart 94
Bottom-Up Parsing Example: 3+8-2
[Diagram sequence, continued: further digit and expr nodes are added over 3 + 8 - 2 until a single expr forms the root of the tree]
Chart 95
Bottom-Up Parsing Example: abbcde
1. S → aABe
2. A → Abc | b
3. B → d
Example input: abbcde
[Diagram: partial syntax tree over a b b c d e, with A above the first b]
abbcde is reduced to aAbcde
Chart 96
Bottom-Up Parsing Example: abbcde
1. S → aABe
2. A → Abc | b
3. B → d
Example input: abbcde
[Diagram: partial syntax tree over a b b c d e, with a second A above A b c]
aAbcde is reduced to aAde
Chart 97
Bottom-Up Parsing Example: abbcde
1. S → aABe
2. A → Abc | b
3. B → d
Example input: abbcde
[Diagram: partial syntax tree over a b b c d e, with B above d]
aAde is reduced to aABe
Chart 98
Bottom-Up Parsing Example: abbcde
1. S → aABe
2. A → Abc | b
3. B → d
Example input: abbcde
[Diagram: complete syntax tree over a b b c d e, with S at the root above a A B e]
aABe is reduced to S
Chart 99
Bottom-Up Parsing Example: the cat sees a rat.
1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees
Example input: the cat sees a rat .
[Diagram: partial syntax tree over the input, with Noun above cat]
the cat sees a rat. is reduced to the Noun sees a rat.
Chart 100
Bottom-Up Parsing Examplethe cat sees a rat.
1. Sentence Subject Verb Object.2. Subject I | a Noun | the Noun3. Object me | a Noun | the Noun 4. Noun cat | mat | rat5. Verb like | is | see | sees
Example input: the cat sees a rat
the cat sees a rat
Noun
the Noun sees a rat. Subject sees a rat.
Subject
the cat sees a rat
Noun
Subject sees a rat.
Subject
.
.
Chart 101
Bottom-Up Parsing Examplethe cat sees a rat.
1. Sentence Subject Verb Object.2. Subject I | a Noun | the Noun3. Object me | a Noun | the Noun 4. Noun cat | mat | rat5. Verb like | is | see | sees
Example input: the cat sees a rat
the cat sees a rat
Noun
Subject sees a rat. Subject Verb a rat.
Subject
Verb
.
the cat sees a rat
Noun
Subject Verb a rat.
Subject
Verb
.
Chart 102
Bottom-Up Parsing Examplethe cat sees a rat.
1. Sentence Subject Verb Object.2. Subject I | a Noun | the Noun3. Object me | a Noun | the Noun 4. Noun cat | mat | rat5. Verb like | is | see | sees
Example input: the cat sees a rat
the cat sees a rat
Noun
Subject Verb a rat. Subject Verb a Noun.
Subject
Verb
.
Noun
the cat sees a rat
Noun
Subject
Verb
.
Noun
Subject Verb a Noun.
Chart 103
Bottom-Up Parsing Examplethe cat sees a rat.
1. Sentence Subject Verb Object.2. Subject I | a Noun | the Noun3. Object me | a Noun | the Noun 4. Noun cat | mat | rat5. Verb like | is | see | sees
Example input: the cat sees a rat
the cat sees a rat
Noun
Subject
Verb
.
Noun
Subject Verb a Noun. Subject Verb Object.
Object
the cat sees a rat
Noun
Subject
Verb
.
Noun
Subject Verb Object.
ObjectWhat would happened if we
choose ‘Subject a Noun’ instead of
‘Object a Noun’?
Chart 104
Bottom-Up Parsing Examplethe cat sees a rat.
1. Sentence Subject Verb Object.2. Subject I | a Noun | the Noun3. Object me | a Noun | the Noun 4. Noun cat | mat | rat5. Verb like | is | see | sees
Example input: the cat sees a rat
the cat sees a rat
Noun
Subject
Verb
.
Noun
Subject Verb Object.
Object
Sentence
Chart 105
Top-Down Parsing
The parser examines the terminal symbols of the input string, in order from left to right.
The parser reconstructs its syntax tree from the top (root node) down (towards the terminal nodes).
An attempt to find the leftmost derivation for an input string
Chart 106
Top-Down Parsers
General rules for top-down parsers
– Start with just a stub for the root node.
– At each step the parser takes the leftmost stub.
– If the stub is labeled by terminal symbol t, the parser connects it to the next input terminal symbol, which must be t. (If not, the parser has detected a syntactic error.)
– If the stub is labeled by nonterminal symbol N, the parser chooses one of the production rules N ::= X1…Xn, and grows branches from the node labeled by N to new stubs labeled X1, …, Xn (in order from left to right).
– Parsing succeeds when and if the whole input string is connected up to the syntax tree.
Chart 107
Top-Down Parsing
Two forms
– Backtracking parsers: guess which rule to apply, then back up and change the choice if the parse cannot proceed.
– Predictive parsers: predict which rule to apply by using look-ahead tokens.
Backtracking parsers are not very efficient. We will cover predictive parsers.
Chart 108
Predictive Parsers
Many types
– LL(1) parsing: the first L is for scanning the input from left to right; the second L is for producing a leftmost derivation; 1 is for using one input symbol of look-ahead. Table driven, with an explicit stack to maintain the parse tree.
– Recursive-descent parsing: uses recursive subroutines to traverse the parse tree.
Chart 109
Predictive Parsers (Lookahead)
Lookahead in predictive parsing
– The lookahead token (next token in the input) is used to determine which rule should be used next.
– For example:
1. term ::= num term’
2. term’ ::= ‘+’ num term’ | ‘-’ num term’ | ε
3. num ::= ‘0’ | ‘1’ | ‘2’ | … | ‘9’
Example input: 7 + 3 - 2
[Figure: term is expanded to num term’; num matches 7, and the lookahead + selects the ‘+’ alternative for term’.]
Chart 110
Predictive Parsers (Lookahead)
[Figure: num matches 3, and the lookahead - selects the ‘-’ alternative for the next term’.]
Chart 111
Predictive Parsers (Lookahead)
[Figure: num matches 2; with no input left, the last term’ is expanded to ε, which completes the tree.]
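The way one lookahead token selects a rule can be sketched as a small recognizer for the term grammar above. This is an illustrative sketch only: the class name PredictiveTermParser, the '$' end-of-input marker, and the accepts helper are assumptions, not part of the course material.

```java
class PredictiveTermParser {
    private final String input;
    private int pos = 0;

    PredictiveTermParser(String text) {
        this.input = text.replace(" ", "");
    }

    // The lookahead token; '$' marks end of input (an assumption of this sketch).
    private char lookahead() {
        return pos < input.length() ? input.charAt(pos) : '$';
    }

    // num ::= '0' | '1' | ... | '9'
    private void parseNum() {
        if (!Character.isDigit(lookahead())) {
            throw new IllegalStateException("digit expected at position " + pos);
        }
        pos++;
    }

    // term' ::= '+' num term' | '-' num term' | empty
    private void parseTermPrime() {
        char la = lookahead();
        if (la == '+' || la == '-') {
            pos++;              // consume the operator chosen by the lookahead
            parseNum();
            parseTermPrime();
        }
        // Any other lookahead selects the empty alternative: consume nothing.
    }

    // term ::= num term'
    private void parseTerm() {
        parseNum();
        parseTermPrime();
    }

    // True if the whole text is a term.
    static boolean accepts(String text) {
        PredictiveTermParser p = new PredictiveTermParser(text);
        try {
            p.parseTerm();
        } catch (IllegalStateException e) {
            return false;
        }
        return p.pos == p.input.length();
    }
}
```

The parser never backs up: each decision in parseTermPrime is made by inspecting the single next character, which is what makes it predictive.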
Chart 112
Recursive-Descent Parsing
Top-down parsing algorithm
– Consists of a group of methods (programs) parseN, one for each nonterminal symbol N of the grammar.
– The task of each method parseN is to parse a single N-phrase.
– These parsing methods cooperate to parse complete sentences.
Chart 113
Recursive-Descent Parsing
1. Sentence ::= Subject Verb Object .
2. Subject ::= I | a Noun | the Noun
3. Object ::= me | a Noun | the Noun
4. Noun ::= cat | mat | rat
5. Verb ::= like | is | see | sees
Example input: the cat sees a rat.
[Figure: root node Sentence with four stubs: Subject, Verb, Object, and ".".]
a. Decide which production rule to apply. There is only one, rule 1. This step creates four stubs.
Chart 114
Recursive-Descent Parsing
[Figure: the Subject stub is expanded using "the Noun", adding a Noun node to the tree.]
Chart 115
Recursive-Descent Parsing
[Figure: the tree under Subject, with its Noun node, is connected to the input "the cat".]
Chart 116
Recursive-Descent Parsing
[Figure: the Verb stub is connected to the input "sees".]
Chart 117
Recursive-Descent Parsing
[Figure: the Object stub is expanded using "a Noun", adding a second Noun node to the tree.]
Chart 118
Recursive-Descent Parsing
[Figure: the second Noun node is connected to the input "rat".]
Chart 119
Recursive-Descent Parsing
[Figure: the final stub "." is connected to the input, and the whole sentence is attached to the syntax tree.]
Chart 120
Recursive-Descent Parser for Micro-English
Parsing methods: parseSentence, parseSubject, parseObject, parseVerb, parseNoun
1. Sentence ::= Subject Verb Object .
2. Subject ::= I | a Noun | the Noun
3. Object ::= me | a Noun | the Noun
4. Noun ::= cat | mat | rat
5. Verb ::= like | is | see | sees
Chart 121
Recursive-Descent Parser for Micro-English
Production rule: Sentence ::= Subject Verb Object .
parseSentence
    parseSubject
    parseVerb
    parseObject
    parseEnd
Chart 122
Recursive-Descent Parser for Micro-English
Production rule: Subject ::= I | a Noun | the Noun
parseSubject
    if input = “I”
        accept
    else if input = “a”
        accept
        parseNoun
    else if input = “the”
        accept
        parseNoun
    else error
Chart 123
Recursive-Descent Parser for Micro-English
Production rule: Noun ::= cat | mat | rat
parseNoun
    if input = “cat”
        accept
    else if input = “mat”
        accept
    else if input = “rat”
        accept
    else error
Chart 124
Recursive-Descent Parser for Micro-English
Production rule: Object ::= me | a Noun | the Noun
parseObject
    if input = “me”
        accept
    else if input = “a”
        accept
        parseNoun
    else if input = “the”
        accept
        parseNoun
    else error
Chart 125
Recursive-Descent Parser for Micro-English
Production rule: Verb ::= like | is | see | sees
parseVerb
    if input = “like”
        accept
    else if input = “is”
        accept
    else if input = “see”
        accept
    else if input = “sees”
        accept
    else error
Chart 126
Recursive-Descent Parser for Micro-English
The end of a sentence is the terminal “.”
parseEnd
    if input = “.”
        accept
    else error
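The pseudocode methods for Micro-English can be put together into one runnable class. The sketch below follows the slides' method names where possible; the class name MicroEnglishParser, the whitespace-based scanner, and the accepts helper are assumptions added for illustration, and parseEnd is folded into accept(".").

```java
class MicroEnglishParser {
    private final String[] tokens;
    private int pos = 0;

    MicroEnglishParser(String text) {
        // Trivial scanner (an assumption): make "." its own token, split on blanks.
        this.tokens = text.replace(".", " . ").trim().split("\\s+");
    }

    private String current() {
        return pos < tokens.length ? tokens[pos] : "<end of input>";
    }

    private void accept(String expected) {
        if (!current().equals(expected)) {
            throw new IllegalStateException("expected " + expected + ", found " + current());
        }
        pos++;
    }

    // Sentence ::= Subject Verb Object .
    private void parseSentence() {
        parseSubject();
        parseVerb();
        parseObject();
        accept(".");
    }

    // Subject ::= I | a Noun | the Noun
    private void parseSubject() {
        switch (current()) {
            case "I":   accept("I"); break;
            case "a":   accept("a"); parseNoun(); break;
            case "the": accept("the"); parseNoun(); break;
            default: throw new IllegalStateException("Subject expected, found " + current());
        }
    }

    // Object ::= me | a Noun | the Noun
    private void parseObject() {
        switch (current()) {
            case "me":  accept("me"); break;
            case "a":   accept("a"); parseNoun(); break;
            case "the": accept("the"); parseNoun(); break;
            default: throw new IllegalStateException("Object expected, found " + current());
        }
    }

    // Noun ::= cat | mat | rat
    private void parseNoun() {
        switch (current()) {
            case "cat": case "mat": case "rat": pos++; break;
            default: throw new IllegalStateException("Noun expected, found " + current());
        }
    }

    // Verb ::= like | is | see | sees
    private void parseVerb() {
        switch (current()) {
            case "like": case "is": case "see": case "sees": pos++; break;
            default: throw new IllegalStateException("Verb expected, found " + current());
        }
    }

    // True if the whole text is one Micro-English sentence.
    static boolean accepts(String text) {
        MicroEnglishParser p = new MicroEnglishParser(text);
        try {
            p.parseSentence();
        } catch (IllegalStateException e) {
            return false;
        }
        return p.pos == p.tokens.length;
    }
}
```

Each method corresponds to exactly one nonterminal, and the switch on the current token is the only place where an alternative is chosen.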
Chart 127
Systematic Development of a Recursive-Descent Parser
Given a (suitable) context-free grammar
– Express the grammar in EBNF, with a single production rule for each nonterminal symbol, and perform any necessary grammar transformations:
  Always eliminate left recursion
  Always left-factorize whenever possible
– Transcribe each EBNF production rule N ::= X to a parsing method parseN, whose body is determined by X
– Make the parser consist of:
  A private variable currentToken
  Private parsing methods developed in the previous step
  Private auxiliary methods accept and acceptIt, both of which call the scanner
  A public parse method that calls parseS (where S is the start symbol of the grammar), having first called the scanner to store the first input token in currentToken
Chart 128
Quote of the Week
“C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows away your whole leg.”– Bjarne Stroustrup
Chart 129
Quote of the Week
Did you really say that?
Dr. Bjarne Stroustrup: Yes, I did say something along the lines of C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows your whole leg off. What people tend to miss is that what I said about C++ is to a varying extent true for all powerful languages. As you protect people from simple dangers, they get themselves into new and less obvious problems. Someone who avoids the simple problems may simply be heading for a not-so-simple one. One problem with very supporting and protective environments is that the hard problems may be discovered too late or be too hard to remedy once discovered. Also, a rare problem is harder to find than a frequent one because you don't suspect it.
I also said, "Within C++, there is a much smaller and cleaner language struggling to get out." For example, that quote can be found on page 207 of The Design and Evolution of C++. And no, that smaller and cleaner language is not Java or C#. The quote occurs in a section entitled "Beyond Files and Syntax". I was pointing out that the C++ semantics is much cleaner than its syntax. I was thinking of programming styles, libraries and programming environments that emphasized the cleaner and more effective practices over archaic uses focused on the low-level aspects of C.
Chart 130
Converting EBNF Production Rules to Parsing Methods
For production rule N ::= X
– Convert the production rule to a parsing method named parseN:
    private void parseN () { parse X }
– Refine parse ε to a dummy statement
– Refine parse t (where t is a terminal symbol) to accept(t) or acceptIt()
– Refine parse N (where N is a nonterminal symbol) to a call of the corresponding parsing method: parseN()
– Refine parse X Y to:
    { parse X; parse Y }
– Refine parse X | Y to:
    switch (currentToken.kind) {
    cases in starters[[X]]:
        parse X
        break;
    cases in starters[[Y]]:
        parse Y
        break;
    default:
        report a syntax error
    }
Chart 131
Converting EBNF Production Rules to Parsing Methods
For X | Y
– Choose parse X only if the current token is one that can start an X-phrase
– Choose parse Y only if the current token is one that can start a Y-phrase
– starters[[X]] and starters[[Y]] must be disjoint
For X*
– Choose while (currentToken.kind is in starters[[X]])
– starters[[X]] must be disjoint from the set of tokens that can follow X* in this particular context
Chart 132
Converting EBNF Production Rules to Parsing Methods
A grammar that satisfies both these conditions is called an LL(1) grammar
Recursive-descent parsing is suitable only for LL(1) grammars
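The disjointness condition on starter sets is easy to check mechanically once the sets are known. The following sketch assumes the starter sets have already been computed by hand; the class name StartersCheck and the method pairwiseDisjoint are hypothetical names used only for this illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StartersCheck {
    // Returns true if no token appears in the starter sets of two different alternatives.
    static boolean pairwiseDisjoint(List<Set<String>> starterSets) {
        Set<String> seen = new HashSet<>();
        for (Set<String> starters : starterSets) {
            for (String token : starters) {
                if (!seen.add(token)) {
                    return false; // token can start two alternatives: not LL(1)
                }
            }
        }
        return true;
    }
}
```

For Subject ::= I | a Noun | the Noun the starter sets are {I}, {a}, and {the}, so the check passes. A rule with two alternatives that both begin with the token a would fail the check and would need left factorization first.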
Chart 133
Error Repair
Good programming languages are designed with a relatively large “distance” between syntactically correct programs, to increase the likelihood that conceptual mistakes are caught as syntactic errors.
Error repair usually occurs at two levels:
– Local: repairs mistakes with little global import, such as missing semicolons and undeclared variables.
– Scope: repairs the program text so that scopes are correct. Errors of this kind include unbalanced parentheses and begin/end blocks.
Chart 134
Error Repair
Repair actions can be divided into insertions and deletions. Typically the compiler will use some look-ahead and backtracking in attempting to make progress in the parse. There is great variation among compilers, though some languages (PL/C) carry a tradition of good error repair.
Goals of error repair are:
– No input should cause the compiler to collapse
– Illegal constructs are flagged
– Frequently occurring errors are repaired gracefully
– Minimal stuttering or cascading of errors
LL-style parsing lends itself well to error repair, since the compiler uses the grammar’s rules to predict what should occur next in the input.
Chart 135
Mini-Triangle Production Rules
Program ::= Command                                       Program (1.14)
Command ::= V-name := Expression                          AssignCommand (1.15a)
          | Identifier ( Expression )                     CallCommand (1.15b)
          | Command ; Command                             SequentialCommand (1.15c)
          | if Expression then Command else Command       IfCommand (1.15d)
          | while Expression do Command                   WhileCommand (1.15e)
          | let Declaration in Command                    LetCommand (1.15f)
Expression ::= Integer-Literal                            IntegerExpression (1.16a)
          | V-name                                        VnameExpression (1.16b)
          | Operator Expression                           UnaryExpression (1.16c)
          | Expression Operator Expression                BinaryExpression (1.16d)
V-name ::= Identifier                                     SimpleVname (1.17)
Declaration ::= const Identifier ~ Expression             ConstDeclaration (1.18a)
          | var Identifier : Type-denoter                 VarDeclaration (1.18b)
          | Declaration ; Declaration                     SequentialDeclaration (1.18c)
Type-denoter ::= Identifier                               SimpleTypeDenoter (1.19)
Chart 136
Abstract Syntax Trees
An explicit representation of the source program’s phrase structure
AST for Mini-Triangle
Chart 137
Abstract Syntax Trees
Program ASTs (P):
– Program with one subtree C, from Program ::= Command (1.14)
Command ASTs (C):
– AssignCommand with subtrees V and E, from V-name := Expression (1.15a)
– CallCommand with subtrees Identifier (carrying its spelling) and E, from Identifier ( Expression ) (1.15b)
– SequentialCommand with subtrees C1 and C2, from Command ; Command (1.15c)
Chart 138
Abstract Syntax Trees
Command ASTs (C), continued:
– IfCommand with subtrees E, C1, and C2, from if Expression then Command else Command (1.15d)
– WhileCommand with subtrees E and C, from while Expression do Command (1.15e)
– LetCommand with subtrees D and C, from let Declaration in Command (1.15f)
Chart 139
Midterm Review: Chapter 1
Context-free Grammar
– A finite set of terminal symbols
– A finite set of non-terminal symbols
– A start symbol
– A finite set of production rules
Aspects of a programming language that need to be specified
– Syntax: form of programs
– Contextual constraints: scope rules and type rules
– Semantics: meaning of programs
Chart 140
Midterm Review: Chapter 1
Language specification
– Informal: written in English
– Formal: precise notation (BNF, EBNF)
  Unambiguous
  Consistent
  Complete
Context-free language
– Syntax tree
– Phrase
– Sentence
Chart 141
Midterm Review: Chapter 1
Syntax tree
– Terminal nodes labeled by terminal symbols
– Non-terminal nodes labeled by non-terminal symbols
Abstract Syntax Tree (AST)
– Each non-terminal node is labeled by a production rule
– Each non-terminal node has exactly one subtree for each subphrase
– Does not generate sentences
Chart 142
Midterm Review: Chapter 2
Translator
– Accepts any text expressed in one language (source language) and generates a semantically equivalent text expressed in another language (target language)
Compiler
– Translates from a high-level language into a low-level language
Interpreter
– A program that accepts any program (source program) expressed in a particular language (source language) and runs that source program immediately
Chart 143
Midterm Review: Chapter 2
Interpretive compiler
– Combination of compiler and interpreter
– Some of the advantages of each
Portable compiler
– Compiled and run on any machine, without change
– Portability measured by proportion of code that remains unchanged
– Portability is an economic issue
Bootstrapping
– Using the language processor to process itself
Tombstone diagrams
Chart 144
Midterm Review: Chapter 3
Three phases of compilation
– Syntactic analysis
– Contextual analysis
– Code generation
Single-pass compilers
Multi-pass compilers
Compiler design issues
– Speed
– Space
– Modularity
– Flexibility
– Semantics-preserving transformations
– Source language properties
Chart 145
Midterm Review: Chapter 4
Sub-phases of syntactic analysis
– Scanning (lexical analysis)
  Source program transformed to a stream of tokens
  Comments and blank spaces between tokens are discarded
– Parsing
  Source program, in the form of a stream of tokens, is parsed to determine its phrase structure
  Parser treats each token as a terminal symbol
– Representation of the phrase structure
  A data structure representing the source program’s phrase structure
  Typically an abstract syntax tree (AST)
Chart 146
Midterm Review: Chapter 4
Tokens
– An atomic symbol of the source program
– May consist of several characters
– Classified according to kind
  All tokens of the same kind can be freely interchanged without affecting the program’s phrase structure
– Each token is completely described by its kind and spelling
  Token represented by a tuple
– Only the kind of each token is examined by the parser
  Spelling is examined by the contextual analyzer and/or code generator
Chart 147
Midterm Review: Chapter 4
Grammars
– Regular expressions
  “|” separates alternatives
  “*” indicates that the previous item may be repeated zero or more times
  “(” and “)” are grouping parentheses
  ε is the empty string, a special string of length 0
  Algebraic properties
  Common extensions
– Grammar transformations
  Left factorization
  Elimination of left recursion
  Substitution of non-terminal symbols
Chart 148
Midterm Review: Chapter 4
Structure of compiler
– Source code
– Lexical analyzer
– Parser & semantic analyzer
– Intermediate code generation
– Optimization
– Assembly code generation
– Assembly code
Chart 149
Midterm Review: Chapter 4
Scanning (lexical analysis)
– What does it do?
  Handles keywords (reserved words)
  Removes white space (tabs, spaces, new lines)
  Removes comments
  Performs look-ahead
  Error handling
– Issues
  Simpler design
  Improve compiler efficiency
  Enhance compiler portability
Chart 150
Midterm Review: Chapter 4
Parsing
– Given an unambiguous, context-free grammar
  Recognition of an input string: is it a sentence in the grammar?
  Parsing an input string: determines its phrase structure
– Why is unambiguous important?
– Advantages of unambiguous, context-free grammars (see chart 81)
– How do you know the syntax of a language is legal?
  A legal program can be derived from the start symbol of the grammar
Chart 151
Midterm Review: Chapter 4
Parsing
– Rightmost (replace the rightmost non-terminal in each step) and leftmost (replace the leftmost non-terminal in each step) derivation
– Bottom-up (reconstructs the syntax tree from the terminal nodes up toward the root node) and top-down (reconstructs the syntax tree from the root node down towards the terminal nodes)
– Predictive parsers
  LL(1)
  Recursive descent
Chart 152
Midterm Review: Chapter 4
Parsing– Converting EBNF production rules to parsing
methods– Error repair
Chart 153
Chapter 5: Contextual Analysis
Identification
– Monolithic Block Structure
– Flat Block Structure
– Nested Block Structure
– Attributes
– Standard Environment
Type Checking
A Contextual Analysis Algorithm
Case Study: Contextual Analysis in the Triangle Compiler
Chart 154
Contextual Analysis
Given a parsed program, the purpose of contextual analysis is to check that the program conforms to the source language’s contextual constraints.
– Scope rules: rules governing declarations and applied occurrences of identifiers
– Type rules: rules that allow us to infer the types of expressions, and to decide whether each expression has a valid type
Analysis of the program to determine correctness with respect to the language definition (beyond structure)
Chart 155
Contextual Analysis
Contextual analysis consists of two sub-phases:
– Identification: applying the source language’s scope rules to relate each applied occurrence of an identifier to its declaration (if any).
– Type checking: applying the source language's type rules to infer the type of each expression, and compare that type with the expected type.
Chart 156
Structure of a Compiler
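[Figure: compiler pipeline. Source code enters the Lexical Analyzer, which produces tokens for the Parser; the Parser produces a parse tree for the Semantic Analyzer; Intermediate Code Generation produces an intermediate representation, which Optimization transforms into another intermediate representation; Assembly Code Generation emits assembly code. A Symbol Table sits alongside the phases. The Semantic Analyzer is highlighted and consists of Identification and Type checking.]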
Chart 157
Identification
Relate each applied occurrence of an identifier in the source program to the corresponding declaration
– Ill-formed program if there is no corresponding declaration: generate an error
Identification could cause compiler efficiency problems
– Inefficient to use the AST
Chart 158
Identification Table
Also known as symbol table
Associates identifiers with their attributes
Basic operations
– Make the identification table empty
– Add an entry associating a given identifier with a given attribute
– Retrieve the attribute (if any) associated with a given identifier
Attribute
– Consists of information relevant to contextual analysis
– Obtained from the identifier’s declaration
Chart 159
Identification Table
Each declaration in a program has a defined scope
– Portion of the program over which the declaration takes effect
Block: any program phrase that delimits the scope of declarations within it
Example: Triangle block command
– let D in C
  The scope of each declaration in D extends over the subcommand C
Chart 160
Identification Table: Structure/Implementation
Maintain scope
– An identifier should be found in the table only when valid
– If an identifier is defined in multiple scopes, then a lookup in the table must provide the appropriate meaning for the use
Efficiency
– How fast is lookup?
– How fast to enter/exit a scope?
– What is the overall table size?
Chart 161
Identification Table: Structure/Implementation
Different implementations
– Organized for efficient retrieval
– Binary search tree
– Hash table
Chart 162
Identification Table: Functionality
A mapping of identifiers to their meanings
Information
– Name
– Type
– Location
Operations
– Create
– Insert
– Lookup
– Delete
– Update entry
– Entering a new scope
– Leaving a scope
Chart 163
Block Structures
Monolithic block structure
– Basic and Cobol
Flat block structure
– Fortran
Nested block structure
– Pascal, Ada, C, and Java
Chart 164
Monolithic Block Structure
The only block is the entire program
All declarations are global
Simple rules
– No identifier may be declared more than once
– For every applied occurrence of an identifier I, there must be a corresponding declaration of I
  No identifier may be used unless declared
The identification table should contain entries for all declarations in the source program
– At most one entry for each identifier
– The table contains an identifier I and the associated attribute A
Chart 165
Monolithic Block Structure
program
    (1) integer b = 10
    (2) integer n
    (3) char c
begin
    …
    n = n * b
    …
    write c
    …
end

Identification   Attribute
b                (1)
n                (2)
c                (3)

• Create new table: create command
• At declaration for identifier I, make table entry: insert command
• At applied occurrence of identifier I, retrieve information from table: lookup command
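For a monolithic block structure the identification table can be as simple as one map. The sketch below is a minimal illustration under that assumption; the class name MonolithicIdTable, the method names enter and retrieve, and the use of strings such as "(1)" as attributes are choices made here, not the Triangle compiler's actual classes.

```java
import java.util.HashMap;
import java.util.Map;

class MonolithicIdTable {
    // One global scope: each identifier maps to at most one attribute.
    private final Map<String, String> entries = new HashMap<>();

    // Insert: returns false if the identifier is already declared (a scope-rule violation).
    boolean enter(String identifier, String attribute) {
        return entries.putIfAbsent(identifier, attribute) == null;
    }

    // Lookup: returns the attribute, or null if the identifier was never declared.
    String retrieve(String identifier) {
        return entries.get(identifier);
    }
}
```

With the program above, entering b, n, and c succeeds, a second declaration of n is rejected, and an applied occurrence of an undeclared identifier retrieves null, which the contextual analyzer would report as an error.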
Chart 166
Flat Block Structure
Program partitioned into several disjoint blocks
Two scope levels
– Some declarations are local in scope
  Identifiers restricted to a particular block
– Other declarations are global in scope
  Identifiers allowed anywhere in the program; the program as a whole is a block
Less simple rules
– No globally declared identifier may be redeclared globally
  But the same identifier may also be declared locally
– No locally declared identifier may be redeclared in the same block
  The same identifier may be declared locally in several different blocks
– For every applied occurrence of an identifier I in a block B, there must be a corresponding declaration of I
  Either a global declaration of I or a declaration of I local to B
Minor complication is to distinguish global and local declaration entries
Chart 167
Flat Block Structure
• Create new table: create command
• At start of a block: enter new scope command
• At end of a block: leave scope command, delete command
• At declaration for identifier I, make table entry: insert command
• At applied occurrence of identifier I, retrieve information from table: lookup command

program
    (1) procedure Q
        (2) real r
        (3) real pi = 3.14
        begin … end
    (4) procedure R
        (5) integer c
        begin … end
    (6) integer i
    (7) boolean b
    (8) char c
    begin … call R … end

Table while analyzing procedure Q:
Identification   Attribute   Level
Q                (1)         global
r                (2)         local
pi               (3)         local

Table while analyzing procedure R:
Identification   Attribute   Level
Q                (1)         global
R                (4)         global
c                (5)         local

Table after leaving procedure R:
Identification   Attribute   Level
Q                (1)         global
R                (4)         global

Table while analyzing the main program:
Identification   Attribute   Level
Q                (1)         global
R                (4)         global
i                (6)         local
b                (7)         local
c                (8)         local
Chart 168
Nested Block Structure
Blocks may be nested one within another
Many scope levels
– Declarations in the outermost block are global in scope
  The outermost block is at scope level 1
– Declarations inside an inner block are local to that block
  Every inner block is completely enclosed by another block
  The next-to-outermost block is at scope level 2
  If enclosed by a level-n block, the block is at scope level n+1
Chart 169
Nested Block Structure
More complex rules
– No identifier may be declared more than once in the same block
  The same identifier may be declared in different blocks, even if they are nested
– For every applied occurrence of an identifier I in a block B, there must be a corresponding declaration of I
  Must be in B itself
  Or in the block B’ immediately enclosing B
  Or in B’’ immediately enclosing B’
  Etc.
  In short: in the smallest enclosing block that contains any declaration of I
Chart 170
Nested Block Structure
• Create new table: create command
• At start of a block: enter new scope command
• At end of a block: leave scope command, delete command
• At declaration for identifier I, make table entry: insert command
  Level number determined by number of calls to enter new scope
• At applied occurrence of identifier I, retrieve information from table using highest level for I: lookup command

let (1) var a: Integer; (2) var b: Boolean
in begin
    …;
    let (3) var b: Integer; (4) var c: Boolean
    in begin
        …;
        let (5) var d: Integer
        in …;
        …
    end;
    …;
    let (6) var d: Boolean; (7) var e: Integer
    in …;
    …
end

Table in the outermost block:
Identification   Attribute   Level
a                (1)         1
b                (2)         1

Table inside the innermost let (declaration 5):
Identification   Attribute   Level
a                (1)         1
b                (2)         1
b                (3)         2
c                (4)         2
d                (5)         3

Table after leaving the innermost let:
Identification   Attribute   Level
a                (1)         1
b                (2)         1
b                (3)         2
c                (4)         2

Table inside the let with declarations (6) and (7):
Identification   Attribute   Level
a                (1)         1
b                (2)         1
d                (6)         2
e                (7)         2
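The enter-scope, leave-scope, insert, and lookup commands for nested blocks can be sketched with a stack of maps, one map per open scope. This is one possible organization, chosen here for brevity; the class name ScopedIdTable and its method names are assumptions, and the Triangle compiler's own table is organized differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class ScopedIdTable {
    // Innermost scope is at the head of the deque.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    ScopedIdTable() {
        openScope(); // level 1: the outermost block
    }

    void openScope() {
        scopes.push(new HashMap<>());
    }

    // Leaving a scope discards every entry declared at that level.
    void closeScope() {
        scopes.pop();
    }

    int level() {
        return scopes.size();
    }

    // Insert at the current level; false means a duplicate in the same block.
    boolean enter(String identifier, String attribute) {
        return scopes.peek().putIfAbsent(identifier, attribute) == null;
    }

    // Lookup: search from the innermost scope outward, so the highest level wins.
    String retrieve(String identifier) {
        for (Map<String, String> scope : scopes) {
            String attribute = scope.get(identifier);
            if (attribute != null) {
                return attribute;
            }
        }
        return null;
    }
}
```

Replaying the example program: after entering a and b at level 1 and opening a scope that declares b (3) and c (4), a lookup of b returns (3); after that scope is closed, b resolves to (2) again and c is no longer found.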
Chart 171
Attributes
Kind
– constant
– variable
– procedure
– function
– type
Type
– boolean
– character
– integer
– record
– array
Examples
Chart 172
Attributes
Information to be extracted from declaration
– Constant, variable, procedure, function, type
– A procedure or function declaration includes a list of formal parameters, each of which may be a constant, variable, procedural, or functional parameter
– The language provides whole families of record and array types
How to manage attribute information
– Extract type information from declarations and store it in the identification table
  Could be complex for a realistic programming language
  Could require tedious programming
– Use the AST
  Pointers in the identification table point to the location in the AST of the declaration of that identifier
Chart 173
Attributes
[Figure: AST of a program with nested let commands. The outer LetCommand has a SequentialDeclaration of two VarDeclarations (a and b, one of type int and one of type bool) and a SequentialCommand that contains an inner LetCommand, whose SequentialDeclaration holds two more VarDeclarations (d and e). The identification table entries point to these declaration nodes: a (1) and b (2) at level 1; inside the inner let, d (6) and e (7) are added at level 2.]
Chart 174
Standard Environment
Predefined constants, variables, types, procedures, and functions
These are loaded into the identification table
Scope rules for standard environment
– Scope enclosing the entire program
  Level 0
– Same scope level as global declarations
  Example is C
Chart 175
Structure of a Compiler
[Figure: the compiler pipeline from Chart 156 again. Source code, Lexical Analyzer (tokens), Parser (parse tree), Semantic Analyzer, Intermediate Code Generation (intermediate representation), Optimization (intermediate representation), Assembly Code Generation, assembly code, with the Symbol Table alongside. The Semantic Analyzer is highlighted and consists of Identification and Type checking.]
Chart 176
Type Checking
The second task of the contextual analyzer is to ensure that the source program contains no type errors
Once an applied occurrence of an identifier has been identified, the contextual analyzer will check that the identifier is used in a way consistent with its declaration
Chart 177
Type Checking
In a statically typed language, the compiler can detect any type errors without actually running the program
– For every expression E in the language, the compiler can infer either that E has some type T or that E is ill-typed
  If E does have type T, then E will always yield a value of type T
  If a value of type T’ is expected, then the compiler checks that T’ is equivalent to T
Chart 178
Type Checking
Infers the type of each expression bottom-up
– Starting with literals and identifiers, and working up through larger and larger subexpressions
– Literal: the type of a literal is immediately known
– Identifier: the type of an applied occurrence of identifier I is obtained from the corresponding declaration of I
– Unary operator application:
  Consider “O E” where O is a unary operator of type T1 → T2
  The type checker ensures that E’s type is equivalent to T1
  It infers that the type of “O E” is T2; otherwise there is a type error
– Binary operator application:
  Consider “E1 O E2” where O is a binary operator of type T1 × T2 → T3
  E1’s type must be equivalent to T1
  E2’s type must be equivalent to T2
  “E1 O E2” is then of type T3; otherwise there is a type error
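The bottom-up inference rules above can be sketched as a tiny checker over expression trees. Everything named here is an assumption for illustration: the class TypeCheckSketch, the three-valued Type enum with an ERROR member, the operator spellings "-", "not", "+", "*", and "<", and the map used as a stand-in for the identification table.

```java
import java.util.Map;

class TypeCheckSketch {
    enum Type { INT, BOOL, ERROR }

    // An expression knows how to infer its own type, given the declared identifier types.
    interface Expr {
        Type check(Map<String, Type> env);
    }

    // Literal: the type is immediately known.
    static Expr literal(int value) {
        return env -> Type.INT;
    }

    // Identifier: the type comes from the declaration; undeclared means ill-typed.
    static Expr ident(String name) {
        return env -> env.getOrDefault(name, Type.ERROR);
    }

    // Unary operator O of type T1 -> T2 applied to E.
    static Expr unary(String op, Expr e) {
        return env -> {
            Type t = e.check(env);
            switch (op) {
                case "-":   return t == Type.INT ? Type.INT : Type.ERROR;    // int -> int
                case "not": return t == Type.BOOL ? Type.BOOL : Type.ERROR;  // bool -> bool
                default:    return Type.ERROR;
            }
        };
    }

    // Binary operator O of type T1 x T2 -> T3 applied to E1 and E2.
    static Expr binary(Expr e1, String op, Expr e2) {
        return env -> {
            Type t1 = e1.check(env);
            Type t2 = e2.check(env);
            boolean bothInt = t1 == Type.INT && t2 == Type.INT;
            switch (op) {
                case "+": case "-": case "*":
                    return bothInt ? Type.INT : Type.ERROR;   // int x int -> int
                case "<":
                    return bothInt ? Type.BOOL : Type.ERROR;  // int x int -> bool
                default:
                    return Type.ERROR;
            }
        };
    }
}
```

With n declared as int and b as bool, the expression n < 10 is inferred to be bool, while b + 1 is ill-typed because the left operand is not equivalent to int.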
Chart 179
Type Checking
Type of a nontrivial expression is inferred from the types of its sub-expressions, using the appropriate type rules
Must be able to test if two given types T and T’ are equivalent
Chart 180
Type Checking – Constant or Variable Identifier
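[Figure: before and after type checking an applied occurrence of a constant x. The ConstDeclaration node has an identifier x and an expression of type T; the SimpleVname node for the applied occurrence of x starts undecorated and is then decorated with :T.]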
Chart 181
Type Checking – Variable Declaration
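[Figure: before and after type checking an applied occurrence of a variable x. The VarDeclaration node has an identifier x and a type T; the SimpleVname node for the applied occurrence of x starts undecorated and is then decorated with :T.]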
Chart 182
Type Checking – Binary Operator
[Figure: before and after type checking a BinaryExpression whose operator is < and whose two operand expressions are both decorated with :int. After checking, the BinaryExpression node is decorated with :bool.]
< is of type int × int → bool
Chart 183
Type Checking
Each applied occurrence of an identifier must be identified before type checking can proceed
Chart 184
Chapter 6: Run-time Organization
Marshal the resources of the target machine (instructions, storage, and system software) in order to implement the source language
Chart 185
Chapter 6: Run-time Organization
Data Representation
– How should we represent the values of each source-language type in the target machine?
Expression Evaluation
– How should we organize the evaluation of expressions, taking care of intermediate results?
Static Storage Allocation
– How should we organize storage for variables, taking into account the different lifetimes of global, local, and heap variables?
Stack Storage Allocation
Routines
– How should we implement procedures, functions, and parameters, in terms of low-level routines?
Heap Storage Allocation
Run-time Organization for Object-oriented Languages
– How should we represent objects and methods?
Case Study: The Abstract Machine TAM
Chart 186
Data Representation
How should we represent the values of each source-language type in the target machine?
High-Level Data Types
• Truth values
• Integers
• Characters
• Records
• Arrays
• Operations over these types
Machine Data Types
• Bits
• Bytes
• Words
• Double-words
• Low-level arithmetic and logical operations
Need to bridge the semantic gap between high-level types and machine level types
Chart 187
Data Representation -- Fundamental Principles
Non-confusion
– Different values of a given type should have different representations
– If two different values are confused, i.e., have the same representation, then comparison of these values will incorrectly treat the values as equal
– Example: approximate representation of real numbers
  Real numbers that are slightly different mathematically might have the same approximate representation
  Difficult to avoid; need to take care during compiler design
– Must avoid confusion in the representations of discrete types such as truth values, characters, and integers
– For statically typed languages we need only be concerned with values of the same type
  00…00₂ may represent false, the integer 0, or the real number 0.0
  Compile-time type checks ensure that values of different types are never confused
Chart 188
Data Representation -- Fundamental Principles
Uniqueness
– Each value should always have the same representation
– Example of non-uniqueness
  Ones-complement representation of integers, in which zero is represented both by 00…00₂ and by 11…11₂ (+0 and –0)
  A simple bit-string comparison would incorrectly treat these values as unequal
  A more specialized integer comparison must be used
  The alternative twos-complement representation gives us unique representations of integers
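The non-uniqueness of ones-complement zero can be shown with a small Java sketch. It assumes an 8-bit ones-complement encoding held in the low byte of an int; the class and method names are hypothetical.

```java
// Hypothetical sketch: 8-bit ones-complement integers held in the low byte of an int.
class OnesComplement {
    // Encodes v (in -127..127); zero can be encoded as +0 (00000000) or -0 (11111111).
    static int encode(int v, boolean negativeZero) {
        if (v == 0) {
            return negativeZero ? 0xFF : 0x00;
        }
        return v > 0 ? v : (~(-v)) & 0xFF;   // negative: invert every bit of |v|
    }

    // Decodes a bit string; both zero patterns denote the value 0.
    static int decode(int bits) {
        return (bits & 0x80) == 0 ? bits : -((~bits) & 0xFF);
    }

    // Simple bit-string comparison.
    static boolean bitEqual(int a, int b) {
        return a == b;
    }

    // Specialized comparison of the values denoted by the bit strings.
    static boolean valueEqual(int a, int b) {
        return decode(a) == decode(b);
    }

    public static void main(String[] args) {
        int plusZero = encode(0, false);
        int minusZero = encode(0, true);
        System.out.println(bitEqual(plusZero, minusZero));    // false: the two zeros differ bitwise
        System.out.println(valueEqual(plusZero, minusZero));  // true: both denote 0
    }
}
```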
Chart 189
Data Representation – Pragmatic Issues
Constant-size representation
– The representations of all values of a given type should occupy the same amount of space
– Makes it possible for the compiler to plan the allocation of storage
– Knowing the type of a variable but not its actual value, the compiler will know exactly how much storage space the variable will occupy
Chart 190
Data Representation – Pragmatic Issues
Direct representation vs. indirect representation
– Should the values of a given type be represented directly, or indirectly through pointers?
– Direct representation
  Just the binary representation of the value, consisting of one or more bits, bytes, or words
– Indirect representation
  A handle that points to the storage area which holds the binary representation of the value
  Essential for types whose values vary greatly in size
  – List or dynamic array
Chart 191
Direct representation vs. indirect representation
[Figure: x represented directly, and x and y each represented indirectly through a handle; y has the same type as x but requires more space.]
Chart 192
Notation
#T: cardinality of type T
– Number of distinct values of type T
– #[[Boolean]] = 2
size T: amount of space (in bits, bytes, or words) occupied by each value of type T
– For indirect representation only the handle is counted
For direct representation of type T
– size T ≥ log2(#T), or equivalently 2^(size T) ≥ #T
– size T is expressed in bits
– In n bits we can represent at most 2^n distinct values if we are to avoid confusion (the non-confusion requirement)
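As a worked check of the bound size T ≥ log2(#T), the sketch below computes the smallest number of bits that can hold a type of a given cardinality. The class and method names are hypothetical.

```java
// Hypothetical sketch: smallest direct representation, in bits, for a type of given cardinality.
class TypeSize {
    // Returns the smallest n such that 2^n >= cardinality, i.e. the ceiling of log2(cardinality).
    static int minBits(long cardinality) {
        if (cardinality < 1) {
            throw new IllegalArgumentException("cardinality must be at least 1");
        }
        // The highest set bit of (cardinality - 1) determines how many bits are needed.
        return 64 - Long.numberOfLeadingZeros(cardinality - 1);
    }

    public static void main(String[] args) {
        System.out.println(minBits(2));             // Boolean: 2 values
        System.out.println(minBits(256));           // a character set with 2^8 characters
        System.out.println(minBits(2L * 32767 + 1)); // Integer with maxint = 32767
    }
}
```

A compiler would normally round this minimum up to a byte or word, as the following charts describe.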
Chart 193
Primitive Types
Cannot be decomposed into simpler values
Most programming languages provide these primitive types
– Boolean, Char, Integer
– Also provide elementary logical and arithmetic operations
Machines typically support the above primitive types, so choice of representation is straightforward
Chart 194
Primitive Types Representation
Boolean
– true and false
– Since #[[Boolean]] = 2, size[[Boolean]] ≥ 1 bit
– Can represent Boolean with one bit, one byte, or one word
  For a single bit: 0 for false and 1 for true
  For a byte or word: 00…00₂ for false and either 00…01₂ or 11…11₂ for true
  Negation, conjunction, and disjunction are implemented by NOT, AND, and OR
Chart 195
Primitive Types Representation
Char
– Source language can specify the character set
  Ada: ISO-Latin1 character set (2^8 distinct characters)
  Java: Unicode character set (2^16 distinct characters)
– Most do not
  Allows compiler writers to choose the machine’s native character set (2^7 or 2^8 distinct characters)
– ISO defines the character representation for “A” to be 01000001₂
Can represent a character by one byte or one word
Chart 196
Primitive Types Representation
Integer
– Denotes an implementation-defined bounded range of integers
  Defined by the individual language processor
– Binary representation is determined by the target machine’s arithmetic unit and almost always occupies one word
  Can implement the language’s integer operations with the machine's integer operations
– Pascal and Triangle
  -maxint, …, -1, 0, +1, …, +maxint
  – maxint is implementation defined
  #[[Integer]] = 2 × maxint + 1
  2^(size[[Integer]]) ≥ 2 × maxint + 1
  For a word size of w bits, size[[Integer]] = w and maxint = 2^(w-1) – 1
– Java
  int denotes –2^31, …, -1, 0, +1, …, +2^31 – 1
  #[[int]] = 2^32
Chart 197
Record Type
Consists of several fields, each of which has an identifier
– All records of a particular type have fields with the same identifiers and types
Fundamental operation on records is field selection
– Use one field identifier to access the corresponding field
Simple representation
– Juxtapose the fields to make them occupy consecutive positions in storage
– Allows us to predict the total size of each record and the position of each field relative to the base of the record
Chart 198
Record Type
Consider the following
  type T = record I1: T1, …, In: Tn end;
  var r: T
– size T = size T1 + … + size Tn
– If size T1, …, and size Tn are all constant, then size T is also constant
– Implementation of field selection
  address[[r.Ii]] = address r + (size T1 + … + size Ti-1)
[Figure: record r laid out as consecutive fields r.I1 (value of type T1), r.I2 (value of type T2), …, r.In (value of type Tn).]
Some machines have alignment restrictions, which force unused space to be left between record fields; these equations cannot then be used
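The field-selection equation amounts to a running sum of field sizes. A minimal Java sketch follows, assuming sizes measured in words and no alignment padding; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: record layout with juxtaposed fields and no alignment padding.
class RecordLayout {
    // Maps each field identifier to its offset from the base of the record.
    static Map<String, Integer> offsets(String[] fieldNames, int[] fieldSizes) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int offset = 0;
        for (int i = 0; i < fieldNames.length; i++) {
            result.put(fieldNames[i], offset);   // size T1 + ... + size T(i-1)
            offset += fieldSizes[i];
        }
        return result;
    }

    // size T = size T1 + ... + size Tn
    static int size(int[] fieldSizes) {
        int total = 0;
        for (int s : fieldSizes) {
            total += s;
        }
        return total;
    }

    // address[[r.Ii]] = address r + offset of Ii
    static int fieldAddress(int recordAddress, Map<String, Integer> offsets, String field) {
        return recordAddress + offsets.get(field);
    }

    public static void main(String[] args) {
        String[] names = {"y", "m", "d"};   // a Date-like record of three one-word fields
        int[] sizes = {1, 1, 1};
        Map<String, Integer> off = offsets(names, sizes);
        System.out.println(off + " size=" + size(sizes));
        System.out.println(fieldAddress(100, off, "d"));
    }
}
```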
Chart 199
Disjoint Unions
Tag and a variant part
Value of tag determines type of variant part
– T = T1 + … + Tn
  In each value of type T, the variant part is a value chosen from one of the types T1, …, or Tn; the tag indicates which one
– size T = size Ttag + max(size T1, …, size Tn)
– address[[u.Itag]] = address u + 0
– address[[u.Ii]] = address u + size Ttag
[Figure: the alternative layouts of u. Each starts with the tag u.Itag (value of type Ttag), followed by the variant part u.I1 (value of type T1), or u.I2 (value of type T2), or … u.In (value of type Tn). The variant part is allotted max(size T1, …, size Tn), so variants smaller than the maximum leave wasted space.]
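The size equation for disjoint unions, and the wasted space it implies, can be checked with a few lines of Java. Sizes are assumed to be in words, and the names are hypothetical.

```java
// Hypothetical sketch: size of a disjoint union and the space wasted by each variant.
class UnionLayout {
    // max(size T1, ..., size Tn)
    static int variantPartSize(int... variantSizes) {
        int max = 0;
        for (int s : variantSizes) {
            max = Math.max(max, s);
        }
        return max;
    }

    // size T = size Ttag + max(size T1, ..., size Tn)
    static int size(int tagSize, int... variantSizes) {
        return tagSize + variantPartSize(variantSizes);
    }

    // Unused words when the variant with index i (0-based) is the current one.
    static int wasted(int i, int... variantSizes) {
        return variantPartSize(variantSizes) - variantSizes[i];
    }

    public static void main(String[] args) {
        // One-word tag; variants of 1, 3, and 2 words.
        System.out.println(size(1, 1, 3, 2));
        System.out.println(wasted(0, 1, 3, 2));
    }
}
```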
Chart 200
Static Arrays
Consists of several elements, all of the same type
– Bounded range of indices, usually integers
– Each index has exactly one element
– Fundamental operation on arrays is indexing
  Access an individual element by giving its index
  Index evaluated at run-time
– Static Array
  Index bounds are known at compile-time
  Direct representation is to juxtapose the array elements, in order of increasing indices
  Indexing is implemented by a run-time address computation
Chart 201
Static Arrays (lower index bound is 0)
Consider the following example
  type T = array n of Telem;
  var a: T
– size T = n × size Telem
– The number of elements n is constant, so if size Telem is constant, then size T is also constant
– address[[a[i]]] = address a + (i × size Telem)
– Since i is known only at run-time, an array indexing implies a run-time address computation
[Figure: array a laid out as consecutive elements a[0], a[1], a[2], …, a[n-1], each a value of type Telem.]
Chart 202
Static Arrays (programmer chooses lower and upper array bounds)
Consider the following example
  type T = array [l..u] of Telem;
  var a: T
– size T = (u – l + 1) × size Telem
– The number of elements (u – l + 1) is constant, so if size Telem is constant, then size T is also constant
– address[[a[i]]] = address a + ((i – l) × size Telem) = address a – (l × size Telem) + (i × size Telem)
– address[[a[0]]] = address a – (l × size Telem)
– address[[a[i]]] = address[[a[0]]] + (i × size Telem)
– Since i is known only at run-time, an array indexing implies a run-time address computation
– Index check must ensure that l ≤ i ≤ u
[Figure: array a laid out as consecutive elements a[l], a[l+1], a[l+2], …, a[u], each a value of type Telem.]
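The address computation and index check for an array with programmer-chosen bounds can be sketched as follows. The class name and the example addresses are hypothetical; the origin corresponds to address[[a[0]]], which may lie outside the array itself.

```java
// Hypothetical sketch of address computation for: type T = array [l..u] of Telem
class StaticArrayAddressing {
    final int lower, upper, elemSize;
    final int origin;   // address[[a[0]]] = address a - (l x size Telem), known at compile time

    StaticArrayAddressing(int baseAddress, int lower, int upper, int elemSize) {
        this.lower = lower;
        this.upper = upper;
        this.elemSize = elemSize;
        this.origin = baseAddress - lower * elemSize;
    }

    // size T = (u - l + 1) x size Telem
    int size() {
        return (upper - lower + 1) * elemSize;
    }

    // address[[a[i]]] = address[[a[0]]] + (i x size Telem), after the index check l <= i <= u
    int addressOf(int i) {
        if (i < lower || i > upper) {
            throw new IndexOutOfBoundsException("index " + i + " outside " + lower + ".." + upper);
        }
        return origin + i * elemSize;
    }

    public static void main(String[] args) {
        // array [5..9] of two-word elements, located at address 100
        StaticArrayAddressing a = new StaticArrayAddressing(100, 5, 9, 2);
        System.out.println(a.size());
        System.out.println(a.addressOf(5) + " " + a.addressOf(9));
    }
}
```

Precomputing the origin leaves one multiplication and one addition per indexing operation at run-time, plus the index check.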
Chart 203
Dynamic Arrays
An array whose index bounds are not known until run-time
– Different dynamic arrays of the same type may have different index bounds, and therefore different numbers of elements
– Need to satisfy the constant-size requirement
– Create an array descriptor or handle
  Pointer to the array’s elements
  Index bounds
  Handle has constant size
Chart 204
Dynamic Arrays: Ada example
  type T is array (Integer range <>) of Telem;
  a: T (E1 .. E2);
– size T = address:size + 2 × size[[Integer]]
  address:size is the amount of space required to store an address, usually one word
  Satisfies the constant-size requirement
– Declaration of array variable a:
  E1 and E2 are evaluated to yield a’s index bounds (say l and u)
  Space is allocated for (u – l + 1) elements, juxtaposed and separate from a’s handle
  address[[a(0)]] = address[[a(l)]] – (l × size Telem)
  Values for address[[a(0)]], l, and u are stored in a’s handle
– The element with index i will be addressed as follows:
  address[[a(i)]] = address[[a(0)]] + (i × size Telem) = content(address[[a]]) + (i × size Telem)
  Index check is l ≤ i ≤ u, where l = content(address[[a]] + address:size) and u = content(address[[a]] + address:size + size[[Integer]])
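The handle layout can be simulated with an int array standing in for the data store. This sketch assumes that an address and an Integer each occupy one word; the class, method, and constant names are hypothetical.

```java
// Hypothetical sketch: a dynamic array handle (origin, l, u) held in a simulated data store.
class DynamicArrayHandle {
    static final int ADDRESS_SIZE = 1;   // assumed: an address occupies one word
    static final int INTEGER_SIZE = 1;   // assumed: an Integer occupies one word

    // Fills in the handle once the bounds l and u have been evaluated at run-time.
    static void createHandle(int[] store, int handleAddr, int elemsAddr, int l, int u, int elemSize) {
        store[handleAddr] = elemsAddr - l * elemSize;            // address[[a(0)]]
        store[handleAddr + ADDRESS_SIZE] = l;                    // lower bound
        store[handleAddr + ADDRESS_SIZE + INTEGER_SIZE] = u;     // upper bound
    }

    // address[[a(i)]] = content(address[[a]]) + (i x size Telem), after the index check.
    static int elementAddress(int[] store, int handleAddr, int i, int elemSize) {
        int l = store[handleAddr + ADDRESS_SIZE];
        int u = store[handleAddr + ADDRESS_SIZE + INTEGER_SIZE];
        if (i < l || i > u) {
            throw new IndexOutOfBoundsException("index " + i + " outside " + l + ".." + u);
        }
        return store[handleAddr] + i * elemSize;
    }

    public static void main(String[] args) {
        int[] store = new int[64];
        // Handle at address 0; elements of a(3 .. 7), two words each, start at address 10.
        createHandle(store, 0, 10, 3, 7, 2);
        System.out.println(elementAddress(store, 0, 3, 2));
        System.out.println(elementAddress(store, 0, 7, 2));
    }
}
```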
Chart 205
Dynamic Arrays
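[Figure: the handle of a holds the origin (the address of a[0]), the lower bound l, and the upper bound u; the elements a[l], a[l+1], a[l+2], …, a[u] of type Telem are stored separately from the handle.]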
Chart 206
Status
Chapter 6: Run-time Organization
– Data Representations
  Primitive types
  Record types
  Disjoint unions
  Static arrays
  Dynamic arrays
  Recursive types
– Expression Evaluation
  Register machine
  Stack machine
– Static Storage Allocation
  Global variables
– Stack Storage Allocation
  Local variables
Chart 207
Recursive Types
Defined in terms of itself
– Values of recursive type T have components that are themselves of type T
– Examples
  List with tail being itself a list
  Tree with the sub-trees themselves being trees
Chart 208
Recursive Types
Consider the Pascal declaration
  type IntList = ^IntNode;
       IntNode = record
                   head: Integer;
                   tail: IntList
                 end;
  var primes: IntList
– size[[IntList]] = address:size (usually 1 word)
[Figure: primes is a handle pointing to the first IntNode of the list.]
Always use pointers to represent values of the recursive type
Chart 209
Expression Evaluation: Register Machine
How should we organize the evaluation of expressions?
The problem is the need to keep intermediate results somewhere
Consider the expression
  a * b + (1 – (c * 2))
– Will have intermediate results for a * b, c * 2, and 1 – (c * 2)
– For a register-based machine (non-stack machine)
  Use the registers to store intermediate results
  Problem arises when there are not enough registers for all intermediate results
Chart 210
Expression Evaluation: Example a * b + (1 – (c * 2))
  LOAD R1 a
  MULT R1 b
  LOAD R2 #1
  LOAD R3 c
  MULT R3 #2
  SUB  R2 R3
  ADD  R1 R2
a, b, c are memory addresses for the values of a, b, c
Chart 211
Expression Evaluation: Stack Machine
The machine provides a stack for holding intermediate results
For the expression a * b + (1 – (c * 2))
  LOAD a
  LOAD b
  MULT
  LOADL 1
  LOAD c
  LOADL 2
  MULT
  SUB
  ADD
Chart 212
Expression Evaluation: Stack Machine Example a * b + (1 – (c * 2))
Stack contents (bottom to top) after each instruction; the rest of the stack is unused space:
(1) After LOAD a: value of a
(2) After LOAD b: value of a, value of b
(3) After MULT: value of a*b
(4) After LOADL 1: value of a*b, 1
(5) After LOAD c: value of a*b, 1, value of c
(6) After LOADL 2: value of a*b, 1, value of c, 2
(7) After MULT: value of a*b, 1, value of c*2
(8) After SUB: value of a*b, value of 1-(c*2)
(9) After ADD: value of (a*b)+(1-(c*2))
Operands of different types (and therefore different sizes) can be evaluated in just the same way. E.g., AND, OR, function, etc. Each operation takes values from top of stack and places results onto top of stack
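The nine-instruction sequence for a * b + (1 – (c * 2)) can be run on a tiny stack machine written in Java. This is a sketch under simplifying assumptions: only the five instructions used in the example are supported, variables live in a map rather than at memory addresses, and the class name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a stack machine supporting LOAD, LOADL, ADD, SUB, and MULT only.
class StackMachine {
    static int run(String[] program, Map<String, Integer> memory) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String instr : program) {
            String[] parts = instr.trim().split("\\s+");
            switch (parts[0]) {
                case "LOAD":    // push the value of a variable
                    stack.push(memory.get(parts[1]));
                    break;
                case "LOADL":   // push a literal
                    stack.push(Integer.parseInt(parts[1]));
                    break;
                case "ADD":
                case "SUB":
                case "MULT": {
                    int right = stack.pop();   // the right operand is on top
                    int left = stack.pop();
                    stack.push(parts[0].equals("ADD") ? left + right
                             : parts[0].equals("SUB") ? left - right
                             : left * right);
                    break;
                }
                default:
                    throw new IllegalArgumentException("unknown instruction: " + instr);
            }
        }
        return stack.pop();   // the result of the whole expression
    }

    public static void main(String[] args) {
        Map<String, Integer> memory = new HashMap<>();
        memory.put("a", 3);
        memory.put("b", 4);
        memory.put("c", 5);
        String[] program = {"LOAD a", "LOAD b", "MULT", "LOADL 1", "LOAD c", "LOADL 2", "MULT", "SUB", "ADD"};
        System.out.println(run(program, memory));   // 3*4 + (1 - 5*2) = 3
    }
}
```

No register allocation is needed: each operation finds its operands on the stack top and leaves its result there.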
Chart 213
Static Storage Allocation: Global Variables
Each variable in the source program requires enough storage to contain any value that might be assigned to it
As a consequence of constant-size representation, the compiler knows how much storage needs to be allocated to a variable, based on the type of the variable (size T)
Global variables
– Variables that exist and take up storage throughout the program’s run-time
– Static storage allocation: the compiler locates these variables at some fixed positions in storage (decides each global variable’s address relative to the base of the storage region in which global variables are located)
Chart 214
Static Storage Allocation: Global Variables Example
  let
    type Date = record
                  y: Integer,
                  m: Integer,
                  d: Integer
                end;
    var a: array 3 of Integer;
    var b: Boolean;
    var c: Char;
    var t: Date
  in
    . . .
[Figure: storage layout, in order: a(0), a(1), a(2) (the array a); b; c; t.y, t.m, t.d (the record t); then unused space.]
Chart 215
Stack Storage Allocation: Local Variables
A local variable v is one that is declared inside a procedure (or function).
Lifetime of v: the variable v exists (occupies storage) only during an activation of that procedure
If the same procedure is activated several times
– v will have several lifetimes
– Each activation creates a distinct variable
Chart 216
Stack Storage Allocation: Local Variables Example
  let
    var a: array 3 of Integer;
    var b: Boolean;
    var c: Char;
    proc Y () ~
      let
        var d: Integer;
        var e: record c: Char, n: Integer end
      in
        . . . ;
    proc Z () ~
      let
        var f: Integer
      in
        begin …; Y(); … end
  in
    begin …; Y(); …; Z(); … end
Chart 217
Stack Storage Allocation: Local Variables Example
[Figure: timeline of events: program calls Y, return from Y, program calls Z, Z calls Y, return from Y, return from Z, program stops. The lifetime of the global variables spans the whole run; the lifetime of the variables local to Y spans each activation of Y; the lifetime of the variables local to Z spans the activation of Z, enclosing the second activation of Y.]
Observations:
• Global variables are the only ones that exist throughout the program’s run-time
  • Use static allocation for global variables
• Lifetimes of local variables are properly nested
  • Use a stack for local variables
Chart 218
Stack Storage Allocation: Stack Frames Example
Stack contents (base to top) at each step; SB is at the base, ST at the top, and successive frames are connected by dynamic links:
(1) After program starts: globals
(2) After program calls Y: globals, frame for Y (LB at base of Y’s frame)
(3) After return from Y: globals
(4) After program calls Z: globals, frame for Z (LB at base of Z’s frame)
(5) After Z calls Y: globals, frame for Z, frame for Y (LB at base of Y’s frame)
(6) After return from Y: globals, frame for Z (LB at base of Z’s frame)
(7) After return from Z: globals
Registers
SB: Stack Base – location of global variables
LB: Local Base – local variables of the currently running procedure
ST: Stack Top – very top of stack
Chart 219
Stack Storage Allocation
The stack varies in size
– For example, the frames for Y’s two activations are at two different locations
– The position of a frame within the stack cannot be predicted in advance
– Need registers dedicated to pointing to the frames
Registers (find addresses of variables relative to these registers)
– SB: stack base – is fixed, pointing to the base of the stack. This is where the global variables are located.
– LB: local base – points to the base of the topmost frame in the stack. This frame always contains the variables of the currently running procedure.
– ST: stack top – points to the very top of the stack. ST keeps track of the frame boundary as expressions are evaluated and the top of the stack expands and contracts.
Chart 220
Stack Storage Allocation
Frame contents
– Space for local variables
– Link data
  Return address – code address to which control will be returned at the end of the procedure activation. It is the address of the instruction following the call instruction that activated the procedure in the first place.
  Dynamic link – the pointer to the base of the underlying frame in the stack. It is the old content of LB and will be restored at the end of the procedure activation.
[Figure: frame layout: link data (dynamic link, return address) followed by local data.]
Since there are two words of link data, local variable addresses are offset by 2
This only considers access to local or global variables, not nested variables.
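The frame discipline can be simulated in Java with an int array as the data store. The sketch follows this chart's simplified layout of two words of link data (dynamic link, then return address) and no static link; the class and method names and the return addresses used are hypothetical.

```java
// Hypothetical sketch: stack frames with two words of link data, as on this chart.
class FrameStack {
    final int[] store = new int[1024];   // simulated data store
    final int SB = 0;                    // stack base: globals start here
    int LB = 0;                          // local base: base of the topmost frame
    int ST;                              // stack top: first free word

    FrameStack(int globalsSize) {
        ST = SB + globalsSize;
    }

    // Pushes a frame: dynamic link, return address, then space for the locals.
    void call(int returnAddress, int localsSize) {
        store[ST] = LB;                  // dynamic link = old content of LB
        store[ST + 1] = returnAddress;
        LB = ST;
        ST = LB + 2 + localsSize;
    }

    // Pops the topmost frame, restores LB, and yields the saved return address.
    int ret() {
        int returnAddress = store[LB + 1];
        ST = LB;
        LB = store[LB];                  // follow the dynamic link
        return returnAddress;
    }

    // Local variable addresses are offset by the two words of link data.
    int localAddress(int offset) {
        return LB + 2 + offset;
    }

    public static void main(String[] args) {
        FrameStack s = new FrameStack(5);   // globals a (3 words), b, c
        s.call(100, 1);                     // program calls Z (local f)
        s.call(200, 3);                     // Z calls Y (locals d and e)
        System.out.println("LB=" + s.LB + " ST=" + s.ST + " d at " + s.localAddress(0));
        s.ret();                            // return from Y
        s.ret();                            // return from Z
        System.out.println("LB=" + s.LB + " ST=" + s.ST);
    }
}
```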
Chart 221
Chapter 7: Code Generation
Code Selection
A Code Generation Algorithm
Constants and Variables
Procedures and Functions
Case Study: Code Generation in the Triangle Compiler
Chart 222
Code Generation
Translation of the source program to object code
– Dependent on source language and target machine
Target Machines
– Registers, or stack, or both for intermediate results
– Instructions with zero, one, two, or three operands, or a mixture
– Single addressing mode, or many
Chart 223
Code Generation: Major Subproblems
Code selection: deciding which sequence of target machine instructions will be the object code for each phrase
– Write code templates: a code template is a general rule specifying the object code of all phrases of a particular form (e.g., all assignment commands, etc.)
– But there are usually lots of special cases
Storage allocation: deciding the storage address of each variable in the source program
– Exact for global variables, but only relative for local variables
Register allocation: deciding which registers should be used to hold intermediate results during expression evaluation
– Complex expressions may need more registers than are available
Since code generation for a stack machine is much simpler than for a register machine, we will only generate code for a stack machine
Chart 224
Code Generation: Code Selection
Deciding which sequence of instructions to generate for each case
Code template: specifies the object code to which a phrase is translated, in terms of the object code to which its subphrases are translated.
Object code: sequence of instructions to which the source-language phrase will be translated
Code specification: collection of code functions and code templates; must cover the entire source language
Chart 225
Abstract Machine TAM
Suitable for executing programs compiled from a block-structured language such as Triangle
All evaluation takes place on a stack
Primitive arithmetic, logical, and other operations are treated uniformly with programmed functions and procedures
Two separate stores
– Code Store: 32-bit instruction words (read only)
– Data Store: 16-bit data words (read-write)
Chart 226
Abstract Machine TAM: Code and Data Stores
Code Store
– Fixed while program is running
– Code segment: contains the program’s instructions
  CB points to base of code segment
  CT points to top of code segment
  CP points to next instruction to be executed
  – Initialized to CB (the program’s first instruction is at the base of the code segment)
– Primitive segment: contains ‘microcode’ for elementary arithmetic, logical, input-output, heap, and general-purpose operations
  PB points to base of primitive segment
  PT points to top of primitive segment
Chart 227
Abstract Machine TAM: Code and Data Stores
Data Store
– While the program is running, segments of the data store may vary
– Stack grows from the low-address end of the Data Store
  SB points to base of the stack
  ST points to top of the stack
  – Initialized to SB
– Heap grows from the high-address end of the Data Store
  HB points to base of heap
  HT points to top of heap
  – Initialized to HB
Chart 228
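Abstract Machine TAM: Code and Data Stores
[Figure: Code Store, from low to high addresses: code segment (from CB to CT, with CP pointing at the next instruction), unused space, primitive segment (from PB to PT). Data Store, from low to high addresses: stack (global segment at SB followed by frames, with LB at the base of the topmost frame and ST at the stack top), unused space, heap segment (between HT and HB).]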
• Stack and heap can expand and contract
• Global segment is always at base of stack
• Stack can contain any number of other segments known as frames containing data local to an activation of some routine
• LB points to base of topmost frame
Chart 229
Code Functions
run P: Run the program P and then halt, starting and finishing with an empty stack
execute C: Execute the command C, possibly updating variables, but neither expanding nor contracting the stack
evaluate E: Evaluate the expression E, pushing its result on to the stack top, but having no other effect
fetch V: Push the value of the constant or variable named V on to the stack top
assign V: Pop a value from the stack top, and store it in the variable named V
elaborate D: Elaborate the declaration D, expanding the stack to make space for any constants and variables declared therein
Chart 230
Abstract Machine TAM: Instructions
LOAD(n) d[r]: Fetch an n-word object from the data address (d + register r), and push it on to the stack
LOADA d[r]: Push the data address (d + register r) on to the stack
LOADI(n): Pop a data address from the stack, fetch an n-word object from that address, and push it on to the stack
LOADL d: Push the 1-word literal value d on to the stack
STORE(n) d[r]: Pop an n-word object from the stack, and store it at the data address (d + register r)
STOREI(n): Pop an address from the stack, then pop an n-word object from the stack and store it at that address
CALL(n) d[r]: Call the routine at code address (d + register r), using the address in register n as the static link
CALLI: Pop a closure (static link and code address) from the stack, then call the routine at that code address
RETURN(n) d: Return from the current routine: pop an n-word result from the stack, then pop the topmost frame, then pop d words of arguments, then push the result back on to the stack
PUSH d: Push d words (uninitialized) on to the stack
POP(n) d: Pop an n-word result from the stack, then pop d more words, then push the result back on to the stack
JUMP d[r]: Jump to code address (d + register r)
JUMPI: Pop a code address from the stack, then jump to that address
JUMPIF(n) d[r]: Pop a 1-word value from the stack, then jump to code address (d + register r) if and only if that value equals n
HALT: Stop execution of the program
Chart 231
While Command
execute [[while E do C]] =
     JUMP h
  g: execute C
  h: evaluate E
     JUMPIF(1) g
Chart 232
While Command
execute [[while i > 0 do i := i – 2]]
      30: JUMP 35         // JUMP h
  g:  31: LOAD i          // execute [[i := i – 2]]
      32: LOADL 2
      33: CALL sub
      34: STORE i
  h:  35: LOAD i          // evaluate [[i > 0]]
      36: LOADL 0
      37: CALL gt
      38: JUMPIF(1) 31    // JUMPIF(1) g
Chart 233
While Command
public Object visitWhileCommand(WhileCommand ast, Object o) {
  Frame frame = (Frame) o;
  int jumpAddr, loopAddr;

  jumpAddr = nextInstrAddr;                 // address of the JUMP h instruction, to be patched later
  emit(Machine.JUMPop, 0, Machine.CBr, 0);  // emits JUMP h with a placeholder target
  loopAddr = nextInstrAddr;                 // this is address g:
  ast.C.visit(this, frame);                 // generates code for C
  patch(jumpAddr, nextInstrAddr);           // establishes address h: needed by the JUMP h instruction
  ast.E.visit(this, frame);                 // generates code for E
  emit(Machine.JUMPIFop, Machine.trueRep, Machine.CBr, loopAddr);  // jumps back to g: if E is true
  return null;
}
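The forward jump in the while template is resolved by backpatching: the JUMP is emitted with a placeholder target and corrected once address h is known. The standalone sketch below imitates that mechanism outside the Triangle compiler. WhileEmitter, Instr, and emitWhile are hypothetical names, instructions are plain strings rather than TAM instruction words, and addresses start at 0 instead of 30.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of emit/patch backpatching; not the Triangle compiler's Encoder.
class WhileEmitter {
    // One instruction: opcode text plus a code-address operand (-1 if unused).
    static final class Instr {
        final String op;
        int addr;
        Instr(String op, int addr) {
            this.op = op;
            this.addr = addr;
        }
        @Override
        public String toString() {
            return addr < 0 ? op : op + " " + addr;
        }
    }

    final List<Instr> code = new ArrayList<>();

    int nextInstrAddr() { return code.size(); }
    void emit(String op, int addr) { code.add(new Instr(op, addr)); }
    void emit(String op) { emit(op, -1); }
    void patch(int at, int target) { code.get(at).addr = target; }

    // Emits the template: JUMP h; g: execute C; h: evaluate E; JUMPIF(1) g.
    void emitWhile(Runnable condition, Runnable body) {
        int jumpAddr = nextInstrAddr();
        emit("JUMP", 0);                    // placeholder target; h is not yet known
        int loopAddr = nextInstrAddr();     // address g
        body.run();                         // code for C
        patch(jumpAddr, nextInstrAddr());   // address h is now known
        condition.run();                    // code for E
        emit("JUMPIF(1)", loopAddr);
    }

    public static void main(String[] args) {
        WhileEmitter e = new WhileEmitter();
        e.emitWhile(
            () -> { e.emit("LOAD i"); e.emit("LOADL 0"); e.emit("CALL gt"); },
            () -> { e.emit("LOAD i"); e.emit("LOADL 2"); e.emit("CALL sub"); e.emit("STORE i"); });
        for (int i = 0; i < e.code.size(); i++) {
            System.out.println(i + ": " + e.code.get(i));
        }
    }
}
```

Run on the loop while i > 0 do i := i – 2, the listing has the same shape as the numbered example, shifted so that the JUMP sits at address 0.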
Chart 234
While Command
execute [[while E do C]] =
  g: execute C
     evaluate E
     JUMPIF(1) g
Without the initial JUMP h, C is executed once before E is first tested
Chart 235
Repeat Command
execute [[repeat i := i – 2 until i < 0]]
  g:  31: LOAD i          // execute [[i := i – 2]]
      32: LOADL 2
      33: CALL sub
      34: STORE i
      35: LOAD i          // evaluate [[i < 0]]
      36: LOADL 0
      37: CALL lt
      38: JUMPIF(0) 31    // JUMPIF(0) g
Chart 236
Repeat Command
public Object visitRepeatCommand(RepeatCommand ast, Object o) {
  Frame frame = (Frame) o;
  int loopAddr;

  // No initial JUMP (and so no patch) is needed: C always executes before E is tested
  loopAddr = nextInstrAddr;                 // this is address g:
  ast.C.visit(this, frame);                 // generates code for C
  ast.E.visit(this, frame);                 // generates code for E
  emit(Machine.JUMPIFop, Machine.falseRep, Machine.CBr, loopAddr);  // jumps back to g: if E is false
  return null;
}
Chart 237
Abstract Machine TAM: Routines
Chart 238
Abstract Machine TAM: Primitive Routines
Chart 239
Extend Mini-Triangle: V1, V2 := E1, E2
This is a simultaneous assignment: both E1 and E2 are to be evaluated, and then their values assigned to the variables V1 and V2, respectively
  evaluate E1    // result pushed to top of stack
  evaluate E2    // result pushed to top of stack
  assign V2      // top of stack stored in variable V2
  assign V1      // top of stack stored in variable V1
[Figure: the stack after each step: result of E1 at the stack top; results of E1 and E2; result of E2 popped into V2, leaving the result of E1; result of E1 popped into V1.]
Chart 240
Extend Mini-Triangle: C1, C2
This is a collateral command: the subcommands C1 and C2 are to be executed in any order chosen by the implementer
  execute C1    // top of stack unchanged
  execute C2    // top of stack unchanged
Chart 241
Extend Mini-Triangle: if E then C
This is a conditional command: if E evaluates to true, C is executed, otherwise nothing
     evaluate E     // result pushed to top of stack
     JUMPIF(0) g    // jump to g if E evaluates to false
     execute C      // top of stack unchanged
  g:                // jump location
Chart 242
Extend Mini-Triangle: repeat C until E
This is a loop command: E is evaluated at the end of each iteration (after executing C), and the loop terminates if its value is true
  g: execute C      // top of stack unchanged
     evaluate E     // result pushed to top of stack
     JUMPIF(0) g    // jump to g if E evaluates to false
Chart 243
Extend Mini-Triangle: repeat C1 while E do C2
This is a loop command: E is evaluated in the middle of each iteration (after executing C1 but before executing C2), and the loop terminates if its value is false
     JUMP h
  g: execute C2     // top of stack unchanged
  h: execute C1     // top of stack unchanged
     evaluate E     // result pushed to top of stack
     JUMPIF(1) g    // jump to g if E evaluates to true
Chart 244
Extend Mini-Triangle: if E1 then E2 else E3
This is a conditional expression: if E1 evaluates to true, E2 is evaluated, otherwise E3 is evaluated (E2 and E3 must be of the same type)
     evaluate E1    // result pushed to top of stack
     JUMPIF(0) g    // jump to g if E1 evaluates to false
     evaluate E2    // result pushed to top of stack
     JUMP h
  g: evaluate E3    // result pushed to top of stack
  h:                // jump location
Chart 245
Extend Mini-Triangle: let D in E
This is a block expression: the declaration D is elaborated, and the resultant bindings are used in the evaluation of E
  elaborate D    // expand stack for variables or constants
  evaluate E     // result pushed to top of stack
  POP(n) s       // if s > 0: pop the n-word result, pop s more words, then push the n-word result back on to the stack
where s = amount of storage allocated by D
      n = size (type of E)
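The effect of POP(n) s on the stack can be sketched in Java: the n-word result is moved down over the s words that D allocated, and the stack top drops by s. The stack is modelled as an int array with an explicit top index; the class and method names are hypothetical.

```java
// Hypothetical sketch of the stack effect of POP(n) d.
class PopDemo {
    // Pops an n-word result, pops d more words, and pushes the result back.
    // st is the index of the first free word; the new stack top is returned.
    static int pop(int[] stack, int st, int n, int d) {
        // Slide the n result words down over the d discarded words.
        System.arraycopy(stack, st - n, stack, st - n - d, n);
        return st - d;
    }

    public static void main(String[] args) {
        // One word below the block, two words allocated by D (20, 30), one-word result (99).
        int[] stack = {10, 20, 30, 99};
        int st = pop(stack, 4, 1, 2);
        System.out.println("ST=" + st + " top=" + stack[st - 1]);
    }
}
```

After the instruction only the value of E remains above whatever was on the stack before the block expression began.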
Chart 246
Extend Mini-Triangle: begin C; yield E end
Here the command C is executed (making side effects), and then E is evaluated
  execute C     // top of stack unchanged
  evaluate E    // result pushed to top of stack
Chart 247
Extend Mini-Triangle: for I from E1 to E2 do C
First the expressions E1 and E2 are evaluated, yielding the integers m and n, respectively. Then the subcommand C is executed repeatedly, with I bound to the integers m, m+1, …, n in successive iterations. If m > n, C is not executed at all. The scope of I is C, which may fetch I but may not assign to it.
Chart 248
Extend Mini-Triangle: for I from E1 to E2 do C
     evaluate E2     // compute final value
     evaluate E1     // compute initial value of I
     JUMP h
  g: execute C       // top of stack unchanged
     CALL succ       // increment current value of I
  h: LOAD –1[ST]     // fetch current value of I
     LOAD –3[ST]     // fetch final value
     CALL le         // test current value <= final value
     JUMPIF(1) g     // if so, repeat
     POP(0) 2        // discard current and final values
At g and at h, the current value of I is at the stack top (at address –1[ST]), and the final value is immediately underlying (at address –2[ST])
Chart 249
Chapter 8: Interpretation
Interactive Interpretation
– Interactive Interpretation of Machine Code
– Interactive Interpretation of Command Languages
– Interactive Interpretation of Simple Programming Languages
Recursive Interpretation
Case Study: The TAM Interpreter
Chart 250
Chapter 9: Conclusion
The Programming Language Life Cycle
– Design
– Specification
– Prototype
– Compilers
Error Reporting
– Compile-time Error Reporting
– Run-time Error Reporting
Efficiency
– Compile-time Efficiency
– Run-time Efficiency
Chart 251
Structure of a Compiler
[Figure: structure of a compiler. Source code passes through the Lexical Analyzer (producing tokens), the Parser (producing a parse tree), the Semantic Analyzer (identification and type checking), Intermediate Code Generation and Optimization (each producing an intermediate representation), and Assembly Code Generation (producing assembly code). A Symbol Table is also shown.]
Chart 252
Programming Language Lifecycle: Concepts
Concepts: Values, Types, Storage, Bindings, Abstractions
Advanced Concepts: Encapsulation, Polymorphism, Exceptions, Concurrency
Chart 253
Programming Language Lifecycle: Simplicity & Regularity
Strive for simplicity and regularity
– Simplicity: support only the concepts essential to the applications for which the language is intended
– Regularity: should combine those concepts in a systematic way, avoiding restrictions that might surprise programmers or make their task more difficult
Chart 254
Design Principles
Type completeness: no operation should be arbitrarily restricted in the types of its operands
– Operations like assignment and parameter passing should, ideally, be applicable to all types
Abstraction: for each phrase that specifies some kind of computation, there should be a way to abstract that phrase and parameterize it
– Should be possible to abstract any expression and make it a function
Correspondence: for each form of declaration there should be a corresponding parameter mechanism
– Take a block with a constant definition and transform it into a procedure (or function) with a constant parameter
Chart 255
Programming Language Lifecycle
[Figure: life-cycle diagram linking Design, Specification, Prototype, Compilers, and Manuals and textbooks.]
Chart 256
Specification
Precise specification for the language’s syntax and semantics must be written
– Informal or formal or hybrid
             Informal          Formal
Syntax       English phrases   BNF, EBNF
Semantics    English phrases   Axiomatic method (based on mathematical logic)
Chart 257
Prototypes
Cheap, low-quality implementation
Highlights features of the language that are hard to implement
Try out the language
– Interpreter might be a good prototype
– Interpretive compiler
  From source to abstract machine code
Chart 258
Compile-Time Error Reporting
Rejecting ill-formed programs
Report the location of each error with some explanation
Distinguish between the major categories of compile-time errors:
– Syntactic error: missing or unexpected characters or tokens
  Indicate what characters or tokens were expected
– Scope error: a violation of the language’s scope rules
  Indicate which identifier was declared twice, or used without declaration
– Type error: a violation of the language’s type rules
  Indicate which type rule was violated and/or what type was expected
Chart 259
Run-Time Error Reporting
Common run-time errors
– Arithmetic overflow
– Division by zero
– Out-of-range array indexing
Can be detected only at run-time, because they depend on values computed at run-time
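Java itself illustrates all three errors, since each depends on a value supplied at run-time. The sketch below is only a demonstration with hypothetical class and method names. Java's ordinary int addition wraps around silently, so Math.addExact is used to make the overflow detectable.

```java
// Hypothetical sketch: three run-time errors that no compile-time check can rule out.
class RunTimeErrors {
    // Runs the action and reports the simple name of the exception it raises, or "none".
    static String outcome(Runnable action) {
        try {
            action.run();
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Arithmetic overflow: Math.addExact throws where plain + would wrap around.
    static String overflow(int x) {
        return outcome(() -> Math.addExact(x, 1));
    }

    // Division by zero.
    static String divide(int divisor) {
        return outcome(() -> { int q = 10 / divisor; });
    }

    // Out-of-range array indexing.
    static String index(int i) {
        return outcome(() -> { int[] a = new int[3]; int v = a[i]; });
    }

    public static void main(String[] args) {
        System.out.println(overflow(Integer.MAX_VALUE));
        System.out.println(divide(0));
        System.out.println(index(3));
    }
}
```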
Chart 260
Final Exam Review
Final Exam is comprehensive in that:
– Essay questions will cover Chapters 5, 6, 7, 9
– Problem-oriented questions require knowledge from the entire semester
Exam Structure
– Four questions
  Two essay questions
  – Discuss
  – Describe
  – Compare & contrast
  – Evaluate
  Two problems
  – Develop code template for new language construct
  – Determine identification table for given program
  – Calculate size and address for given type(s)
Chart 261
Final Exam Review: Chapter 5 – Contextual Analysis
Contextual analysis checks that the program conforms to the source language’s contextual constraints
– Scope rules
– Type rules
Block Structure
– Monolithic
– Flat
– Nested
Type Checking
– Literal
– Identifier
– Unary operator application
– Binary operator application
Standard Environment
Chart 262
Final Exam Review: Chapter 6 – Run-Time Organization
Key Issues
– Data representation
– Expression evaluation
– Storage allocation
– Routines
Fundamental Principles of Data Representation
– Non-confusion: different values of a given type should have different representations
– Uniqueness: each value should always have the same representation
Chart 263
Final Exam Review: Chapter 6 – Run-Time Organization
Types
– Primitive types: cannot be decomposed
  Boolean
  Character
  Integer
– Records
– Disjoint unions
– Static arrays
– Dynamic arrays
– Recursive types
For the various types, be able to determine size (storage required) and address (how to locate)
Chart 264
Final Exam Review: Chapter 6 – Run-Time Organization
Expression Evaluation
– Stack machine
– Register machine
Static storage allocation
– Global variables
Stack storage allocation
– Local variables
Chart 265
Final Exam Review: Chapter 7 – Code Generation
Translation of the source program to object code
– Dependent on source language and target machine
Target Machines
– Registers, or stack, or both for intermediate results
– Instructions with zero, one, two, or three operands, or a mixture
– Single addressing mode, or many
Chart 266
Final Exam Review: Chapter 7 – Code Generation
Code selection: deciding which sequence of target machine instructions will be the object code for each phrase
Storage allocation: deciding the storage address of each variable in the source program
Register allocation: deciding which registers should be used to hold intermediate results during expression evaluation
Chart 267
Final Exam Review: Chapter 9 – Programming Language Life-Cycle
[Figure: life-cycle diagram linking Design, Specification, Prototype, Compilers, and Manuals and textbooks.]
Chart 268
Final Exam Review: Chapter 9 – Programming Language Life-Cycle
Strive for simplicity and regularity
Design Principles
– Type completeness: no operation should be arbitrarily restricted in the types of its operands
– Abstraction: for each phrase that specifies some kind of computation, there should be a way to abstract that phrase and parameterize it
– Correspondence: for each form of declaration there should be a corresponding parameter mechanism
Specifications
Prototype
Error Reporting
– Compile-time
– Run-time
Chart 269
Final Exam Review: Structure of a Compiler
[Figure: structure of a compiler. Source code passes through the Lexical Analyzer (producing tokens), the Parser (producing a parse tree), the Semantic Analyzer (identification and type checking), Intermediate Code Generation and Optimization (each producing an intermediate representation), and Assembly Code Generation (producing assembly code). A Symbol Table is also shown.]