Chapter 1
Introduction
Major Data Structures in Compiler
Gang S. Liu
College of Computer Science & Technology
Harbin Engineering University
Compiler [email protected] 2
Major Data Structures in Compiler
There is a strong interaction between the algorithms used by the phases of a compiler and the data structures that support these phases.
Algorithms need to be implemented in an efficient manner.
The choice of data structures is therefore important.
Tokens
When a scanner collects characters into a token, it represents the token symbolically as a value of an enumerated data type representing the set of tokens of the source language.
Sometimes it is necessary to preserve the character string itself or other information derived from it, for example:
The name associated with an identifier token
The value of a number token
In most languages the scanner needs to generate only one token at a time (single-symbol lookahead).
A single global variable can be used to hold the token information.
The Syntax Tree
The syntax tree is constructed as a standard pointer-based structure that is dynamically allocated as parsing proceeds.
The tree can be kept as a single variable pointing to the root node.
Each node is a record. Its fields represent the information collected by the parser and the semantic analyzer. Sometimes these fields are dynamically allocated.
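A minimal sketch of such a tree in C follows. The node kinds, the field names, and the `type` slot for the semantic analyzer are hypothetical choices for a small expression language. Nodes are allocated on demand, the whole tree hangs off one root pointer (`syntaxTree`), and the identifier name is an example of a dynamically allocated field.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node kinds for a small expression language. */
typedef enum { NODE_NUM, NODE_ID, NODE_ADD, NODE_MUL } NodeKind;

#define MAX_CHILDREN 2

typedef struct TreeNode {
    NodeKind kind;                        /* set by the parser */
    struct TreeNode *child[MAX_CHILDREN]; /* operands, NULL if absent */
    long value;                           /* constant value for NODE_NUM */
    char *name;                           /* dynamically allocated identifier name */
    int type;                             /* hypothetical slot filled in by the semantic analyzer */
} TreeNode;

/* The whole tree is reachable from this single root pointer. */
TreeNode *syntaxTree = NULL;

/* Allocate a zero-initialised node of the given kind; NULL on failure. */
TreeNode *newNode(NodeKind kind) {
    TreeNode *t = calloc(1, sizeof *t);
    if (t != NULL) t->kind = kind;
    return t;
}

/* Copy an identifier name into its own heap buffer. */
char *copyName(const char *s) {
    char *p = malloc(strlen(s) + 1);
    if (p != NULL) strcpy(p, s);
    return p;
}

/* Free a node, its subtrees, and its dynamically allocated fields. */
void freeTree(TreeNode *t) {
    if (t == NULL) return;
    for (int i = 0; i < MAX_CHILDREN; i++) freeTree(t->child[i]);
    free(t->name);
    free(t);
}
```

A parser would call `newNode` as it recognizes each construct, so for `x + 3` it would build an `NODE_ADD` node whose children are an `NODE_ID` node and a `NODE_NUM` node.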
The Symbol Table
This data structure keeps information associated with identifiers: functions, variables, constants, and data types.
The symbol table interacts with almost every phase of the compiler.
The insertion, deletion, and access operations need to be efficient.
A standard data structure for this purpose is the hash table.
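The three operations can be sketched with a chained hash table. This is one common design, not the only one; the bucket count, the hash function, and the attribute fields (`kind`, `line`) are hypothetical placeholders. New entries go at the head of their bucket, so a newer declaration of a name is found before an older one, and deleting it (for example when a scope closes) uncovers the older entry again.

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 211 /* prime bucket count, chosen only for illustration */

/* One symbol-table entry; the attribute fields are hypothetical. */
typedef struct Symbol {
    char *name;
    int kind;            /* e.g. variable, function, constant, type */
    int line;            /* source line of the declaration */
    struct Symbol *next; /* next entry in the same bucket */
} Symbol;

static Symbol *buckets[TABLE_SIZE];

/* Simple multiplicative string hash. */
static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31u + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Return the most recent entry for name, or NULL if there is none. */
Symbol *st_lookup(const char *name) {
    for (Symbol *p = buckets[hash(name)]; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0) return p;
    return NULL;
}

/* Insert at the head of the bucket so newer declarations shadow older ones. */
Symbol *st_insert(const char *name, int kind, int line) {
    Symbol *p = malloc(sizeof *p);
    if (p == NULL) return NULL;
    p->name = malloc(strlen(name) + 1);
    if (p->name == NULL) { free(p); return NULL; }
    strcpy(p->name, name);
    p->kind = kind;
    p->line = line;
    unsigned b = hash(name);
    p->next = buckets[b];
    buckets[b] = p;
    return p;
}

/* Remove the most recent entry for name. Returns 1 if one was removed, else 0. */
int st_delete(const char *name) {
    for (Symbol **link = &buckets[hash(name)]; *link != NULL; link = &(*link)->next) {
        if (strcmp((*link)->name, name) == 0) {
            Symbol *dead = *link;
            *link = dead->next;
            free(dead->name);
            free(dead);
            return 1;
        }
    }
    return 0;
}
```

With a reasonable hash function and load factor, each operation touches only one short chain, which is what makes the hash table a good fit here.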
The Literal Table
Stores constants and strings used in the program.
Quick insertion and lookup are essential.
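Need not allow deletions, since its data applies globally to the program and each constant or string appears only once in the table.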
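A sketch of an insert-or-lookup literal table, assuming a fixed hypothetical capacity and hypothetical function names `lit_intern` and `lit_get`. Each distinct literal is stored once and its slot index serves as its identifier. Because nothing is ever deleted, plain linear probing with no "deleted" markers is sufficient.

```c
#include <stdlib.h>
#include <string.h>

#define LIT_CAPACITY 256 /* hypothetical fixed capacity */

/* Each distinct literal is stored once; its slot index is its id. */
static char *literals[LIT_CAPACITY];

/* djb2-style string hash. */
static unsigned lit_hash(const char *s) {
    unsigned h = 5381u;
    while (*s) h = h * 33u + (unsigned char)*s++;
    return h % LIT_CAPACITY;
}

/* Return the index of text, inserting it first if it is new.
   Returns -1 if the table is full or memory runs out. No delete is provided. */
int lit_intern(const char *text) {
    unsigned i = lit_hash(text);
    for (unsigned probes = 0; probes < LIT_CAPACITY; probes++) {
        if (literals[i] == NULL) { /* empty slot: text is new */
            literals[i] = malloc(strlen(text) + 1);
            if (literals[i] == NULL) return -1;
            strcpy(literals[i], text);
            return (int)i;
        }
        if (strcmp(literals[i], text) == 0) return (int)i; /* already present */
        i = (i + 1) % LIT_CAPACITY; /* linear probing */
    }
    return -1;
}

/* Fetch a stored literal by index, or NULL for an invalid index. */
const char *lit_get(int index) {
    return (index >= 0 && index < LIT_CAPACITY) ? literals[index] : NULL;
}
```

A later phase such as code generation can then refer to a literal by its small integer index instead of repeating the string.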
Intermediate Code
Depending on the kind of intermediate code, it may be kept as:
An array of text strings
A temporary text file
A linked list of structures
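The linked-list option can be sketched with three-address instructions, used here as an assumed example of intermediate code. The `Quad` record and the `emit`, `printCode`, and `codeLength` helpers are hypothetical names. A tail pointer keeps appends constant-time, and an optimizer could splice instructions in or out without shifting the rest of the list.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical three-address instruction: result = arg1 op arg2.
   op '=' denotes a plain copy: result = arg1. */
typedef struct Quad {
    char op;
    char arg1[16], arg2[16], result[16];
    struct Quad *next;
} Quad;

static Quad *codeHead = NULL, *codeTail = NULL;

/* Append one instruction to the end of the list (names longer than 15 chars are truncated). */
Quad *emit(char op, const char *a1, const char *a2, const char *res) {
    Quad *q = malloc(sizeof *q);
    if (q == NULL) return NULL;
    q->op = op;
    snprintf(q->arg1, sizeof q->arg1, "%s", a1);
    snprintf(q->arg2, sizeof q->arg2, "%s", a2);
    snprintf(q->result, sizeof q->result, "%s", res);
    q->next = NULL;
    if (codeTail == NULL) codeHead = q; else codeTail->next = q;
    codeTail = q;
    return q;
}

/* Print the instruction list, one instruction per line. */
void printCode(FILE *out) {
    for (const Quad *q = codeHead; q != NULL; q = q->next) {
        if (q->op == '=') fprintf(out, "%s = %s\n", q->result, q->arg1);
        else fprintf(out, "%s = %s %c %s\n", q->result, q->arg1, q->op, q->arg2);
    }
}

/* Number of instructions currently in the list. */
int codeLength(void) {
    int n = 0;
    for (const Quad *q = codeHead; q != NULL; q = q->next) n++;
    return n;
}
```

For the statement `a = b + c * d`, calling `emit('*', "c", "d", "t1")`, `emit('+', "b", "t1", "t2")`, and `emit('=', "t2", "", "a")` and then `printCode(stdout)` prints `t1 = c * d`, `t2 = b + t1`, and `a = t2` on three lines.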
Temporary Files
Early computers did not possess enough memory for an entire program to be kept in memory during compilation.
This was solved by using temporary files to hold the products of intermediate steps.
Memory constraints are now a much smaller problem.
Occasionally, compilers still generate intermediate files during some of the steps.
Other Issues in Compiler Structure
Passes
Language Definition
Error Handling
Passes
A compiler often processes the entire source program several times before generating code. These repetitions are referred to as passes.
Passes may or may not correspond to phases.
Depending on the language, a compiler may need only one pass. Languages such as Pascal and C can be compiled in one pass; this gives efficient compilation, but typically less efficient target code.
Most compilers with optimizations use more than one pass:
1. Scanning and parsing
2. Semantic analysis and source-level optimization
3. Code generation and target code optimization
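The three-pass organization above can be illustrated with a toy compiler for expressions such as `x + 2 * 3`. This is a sketch under several assumptions: the language has only digits, one-letter variables, `+`, and `*`; pass 2 does only constant folding as its source-level optimization (semantic analysis is omitted); and pass 3 targets a hypothetical stack machine with `PUSH`, `LOAD`, `ADD`, and `MUL`. Pass 1 reads the source text once; passes 2 and 3 each make one complete traversal of the tree.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical AST: op is '+', '*', 'n' (number) or 'v' (one-letter variable). */
typedef struct Node {
    char op;
    long val;
    char var;
    struct Node *l, *r;
} Node;

static const char *p; /* scanner cursor, used only during pass 1 */

static Node *mk(char op, long val, char var, Node *l, Node *r) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) exit(EXIT_FAILURE);
    n->op = op; n->val = val; n->var = var; n->l = l; n->r = r;
    return n;
}

static void skipBlanks(void) { while (*p == ' ') p++; }

/* Pass 1: scanning and parsing by recursive descent ('*' binds tighter than '+'). */
static Node *factor(void) {
    skipBlanks();
    if (isdigit((unsigned char)*p)) {
        long v = 0;
        while (isdigit((unsigned char)*p)) v = v * 10 + (*p++ - '0');
        return mk('n', v, 0, NULL, NULL);
    }
    char c = *p; /* anything else is taken as a variable; no error recovery */
    if (c != '\0') p++;
    return mk('v', 0, c, NULL, NULL);
}

static Node *term(void) {
    Node *n = factor();
    for (skipBlanks(); *p == '*'; skipBlanks()) { p++; n = mk('*', 0, 0, n, factor()); }
    return n;
}

static Node *expr(void) {
    Node *n = term();
    for (skipBlanks(); *p == '+'; skipBlanks()) { p++; n = mk('+', 0, 0, n, term()); }
    return n;
}

Node *pass1_parse(const char *src) { p = src; return expr(); }

/* Pass 2: source-level optimization (constant folding) over the whole tree. */
Node *pass2_fold(Node *n) {
    if (n->op == '+' || n->op == '*') {
        n->l = pass2_fold(n->l);
        n->r = pass2_fold(n->r);
        if (n->l->op == 'n' && n->r->op == 'n') { /* both operands constant */
            long v = n->op == '+' ? n->l->val + n->r->val : n->l->val * n->r->val;
            free(n->l);
            free(n->r);
            n->op = 'n'; n->val = v; n->l = n->r = NULL;
        }
    }
    return n;
}

/* Pass 3: code generation for a hypothetical stack machine, appended to buf. */
void pass3_gen(const Node *n, char *buf, size_t size) {
    size_t len = strlen(buf);
    if (n->op == 'n') {
        snprintf(buf + len, size - len, "PUSH %ld\n", n->val);
    } else if (n->op == 'v') {
        snprintf(buf + len, size - len, "LOAD %c\n", n->var);
    } else {
        pass3_gen(n->l, buf, size); /* operands first, then the operator */
        pass3_gen(n->r, buf, size);
        len = strlen(buf);
        snprintf(buf + len, size - len, "%s\n", n->op == '+' ? "ADD" : "MUL");
    }
}

/* Release every node of a tree. */
void freeNodes(Node *n) {
    if (n == NULL) return;
    freeNodes(n->l);
    freeNodes(n->r);
    free(n);
}
```

For `x + 2 * 3`, pass 2 folds `2 * 3` into the constant `6`, so pass 3 emits `LOAD x`, `PUSH 6`, `ADD`. Keeping the passes separate is what allows the optimizer to see the whole tree before any code is generated.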
Language Definition
The description of the lexical structure, syntax, and semantics of a programming language is collected in a language reference manual, or language definition.
With a new language, a language definition and compiler are often developed together.
A more common situation is that a compiler is written for a well-known language that has an existing language definition.
Error Handling
Error handling is one of the most important functions of a compiler.
Errors can be detected during almost every phase of compilation.
Errors reported by a compiler are static (or compile-time) errors.
It is important to generate meaningful error messages.
The error handler contains different operations, each appropriate for a specific compiler phase and situation.
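A minimal sketch of a shared error-reporting routine, assuming hypothetical names `Phase`, `reportError`, and `errorCount`. Each message records the phase that detected the error and the source line, which is the minimum needed for a meaningful message; phase-specific recovery actions would sit in the callers.

```c
#include <stdio.h>

/* Hypothetical phase tags so each message says where the error was detected. */
typedef enum { PHASE_SCAN, PHASE_PARSE, PHASE_SEMANTIC } Phase;

static const char *phaseName[] = { "lexical", "syntax", "semantic" };

/* Number of static errors seen so far; a driver might skip code generation if nonzero. */
int errorCount = 0;

/* Report one static (compile-time) error with its phase and source line. */
void reportError(FILE *out, Phase phase, int line, const char *msg) {
    errorCount++;
    fprintf(out, "line %d: %s error: %s\n", line, phaseName[phase], msg);
}
```

For example, `reportError(stderr, PHASE_PARSE, 7, "expected ';' after expression")` writes `line 7: syntax error: expected ';' after expression` and increments `errorCount`.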