

1

Introduction to Compilation

Cheng-Chia Chen

2

What is a compiler?

• a program that translates an executable program in one language into an executable program in another language

• the compiler typically lowers the level of abstraction of the program

• for “optimizing” compilers, we also expect the program produced to be better, in some way, than the original

3

Abstract view of compiler

Implications:

» recognize legal (and illegal) programs

» generate correct code

» manage storage of all variables and code

» need format for object (or assembly) code

4

Traditional decomposition of a compiler

Implications:

• intermediate language (il)

• front end maps legal code into il

• back end maps il onto target machine

• simplify retargeting

• allows multiple front ends

• multiple phases => better code

Front end is O(n) or O(n log n)

Back end is NP-Complete (many of its problems, such as optimal register allocation, are NP-complete)

5

Advantage of the decomposition

6

Components of a Compiler

Analysis

» Lexical Analysis

» Syntax Analysis

» Semantic Analysis

Synthesis

» Intermediate Code Generation

» Code Optimization

» Code Generation

7

The Structure of a Compiler

Front-end

» Lexical Analysis

» Parsing

» Semantic Analysis

» Intermediate Code Generation

Back-end

» Optimization

» Code Generation

The first 3, at least, can be understood by analogy to how humans comprehend a natural language.

8

Responsibilities of Front End

• recognize legal programs

• report errors

• produce il

• preliminary storage map

• shape the code for the back end

Much of front end construction can be automated

9

Responsibilities of Back-end

code optimization: [middle-end]

» analyzes and changes il

» goal is to reduce runtime

» must preserve values

code generation:

» translate il into target machine code

» choose instructions for each il operation

» decide what to keep in registers at each point

» ensure conformance with system interfaces

10

Lexical Analysis

First step: recognize words.

» Smallest unit above letters

Compiler is an interesting course.

Note the

» Capital “C” (start of sentence symbol)

» Blank “ “ (word separator)

» Period “.” (end of sentence symbol)

11

More Lexical Analysis

Lexical analysis is not trivial. Consider:

編譯器是一門有趣的課程。 (Chinese for “Compiler is an interesting course.”, written with no separators between words)

Programming languages are typically more cryptic than English:

*h->j++ = -12.345e-5

12

And More Lexical Analysis

Lexical analyzer divides program text into “words” or “tokens”

if x == y then z = 1; else z = 2;

Units:

if, x, ==, y, then, z, =, 1, ;, else, z, =, 2, ;
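The split into units can be sketched with a single regular expression. This is only an illustration: the pattern `TOKEN_RE` and the function `tokenize` are names and choices made up here, not part of the slides, and the pattern covers just the symbols that appear in this one example.

```python
import re

# Assumed token pattern for this example only: names/keywords, integers,
# "==" (listed before "=" so the longer operator wins), then "=" and ";".
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|==|[=;]")

def tokenize(text):
    """Return the tokens of `text` in order; whitespace is skipped.

    Note: findall silently ignores characters the pattern does not cover,
    so this sketch does no error reporting.
    """
    return TOKEN_RE.findall(text)

units = tokenize("if x == y then z = 1; else z = 2;")
print(", ".join(units))
```

A real lexical analyzer would also classify each token (keyword, identifier, operator) and report illegal characters.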

13

Parsing (syntax analysis)

Once words are understood, the next step is to understand sentence structure

Parsing = Diagramming Sentences

» The diagram is a tree

14

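Diagramming a Sentence

[Figure: parse tree for the sentence “This line is a longer sentence”. Word classes: This (article), line (noun), is (verb), a (article), longer (adjective), sentence (noun). Interior nodes: NP, NP, VP. Root: sentence.]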

15

Parsing Programs

Parsing program expressions is the same

Consider:

if x == y then z = 1; else z = 2;

Diagrammed:

if-then-else
    predicate: relation (x == y)
    then-stmt: assign (z, 1)
    else-stmt: assign (z, 2)
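The diagram for this statement can be built by a small recursive-descent parser. The sketch below is a minimal illustration under stated assumptions: the mini-grammar (one relation, one assignment per branch), the function `parse_if`, and the tuple shapes are hypothetical choices made here, and the token list is written out by hand.

```python
def parse_if(tokens):
    """Parse: if <name> == <name> then <assign> else <assign>.

    Returns a nested tuple that mirrors the diagram:
    ("if-then-else", predicate, then_stmt, else_stmt).
    """
    pos = 0

    def take():
        # Consume and return the next token.
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def expect(wanted):
        # Consume the next token and insist that it is `wanted`.
        tok = take()
        if tok != wanted:
            raise SyntaxError(f"expected {wanted!r}, found {tok!r}")

    def relation():
        left = take()
        expect("==")
        right = take()
        return ("relation", "==", left, right)

    def assign():
        target = take()
        expect("=")
        value = take()
        expect(";")  # the ";" is checked but not kept in the tree
        return ("assign", target, value)

    expect("if")
    predicate = relation()
    expect("then")
    then_stmt = assign()
    expect("else")
    else_stmt = assign()
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after the statement")
    return ("if-then-else", predicate, then_stmt, else_stmt)

tokens = ["if", "x", "==", "y", "then", "z", "=", "1", ";", "else", "z", "=", "2", ";"]
tree = parse_if(tokens)
print(tree)
```

Each grammar rule becomes one function, which is why this style is easy to write by hand for small languages.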

16

Semantic Analysis

Once sentence structure is understood, we can try to understand “meaning”

» But meaning is too hard for compilers

Compilers perform limited analysis to catch inconsistencies

Some do more analysis to improve the performance of the program

17

Semantic Analysis in Natural Language

Example:

Zhang San thinks that Li Si took his textbook.

Whose textbook was taken? Zhang San's, Li Si's, or a third person's?

Even worse:

Jack said Jack left his assignment at home.

How many Jacks are there?

Which one left the assignment?

18

Semantic Analysis in Programming

Programming languages define strict rules to avoid such ambiguities

This C++ code prints “4”; the inner definition is used

Illegal in Java.

{
    int x = 3;
    {
        int x = 4;   // inner x hides the outer x
        cout << x;   // prints 4
    }
}
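One common way to implement this rule is a chain of scopes searched from the innermost block outward. The sketch below is an assumption about how a compiler might do it, not something stated in the slides; the class `Scope` and its methods are hypothetical names.

```python
class Scope:
    """Bindings of one block, linked to the enclosing block (if any)."""

    def __init__(self, parent=None):
        self.bindings = {}
        self.parent = parent

    def declare(self, name, value):
        # A declaration always goes into the current (innermost) block.
        self.bindings[name] = value

    def lookup(self, name):
        # Search this block first, then each enclosing block in turn.
        scope = self
        while scope is not None:
            if name in scope.bindings:
                return scope.bindings[name]
            scope = scope.parent
        raise NameError(f"undeclared identifier: {name}")

outer = Scope()
outer.declare("x", 3)
inner = Scope(parent=outer)
inner.declare("x", 4)

print(inner.lookup("x"))  # the inner definition is found first
print(outer.lookup("x"))  # the outer block still sees its own x
```

A language that forbids this kind of hiding, as Java does for local variables, would instead have `declare` search the enclosing blocks and report an error.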

19

More Semantic Analysis

Compilers perform many semantic checks besides variable bindings

Example:

John loves her sister.

A “type mismatch” between her and John; we know they are different people

» Presumably John is male

20

Optimization

No strong counterpart in English, but akin to editing

Automatically modify programs so that they

» Run faster

» Use less memory

» In general, conserve some resource

21

Optimization Example

X = Y * 0 is the same as X = 0

X = Y * 2 is the same as X = Y + Y

Assume X and Y are integers (with floating point, Y * 0 is not 0 when Y is NaN or infinite)
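The two rewrites can be sketched as a pass over a tiny expression tree. The tuple encoding `(operator, left, right)` and the function `simplify` are assumptions made for this illustration, and the pass is valid only under the integer assumption stated on the slide.

```python
def simplify(expr):
    """Apply the two rewrites bottom-up; integers only.

    Expressions are ints, variable names (str), or (op, left, right) tuples.
    """
    if not isinstance(expr, tuple):
        return expr  # a variable name or an integer constant
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    if op == "*":
        if left == 0 or right == 0:
            return 0  # Y * 0  ->  0
        # Y * 2  ->  Y + Y, only for plain variables so that no
        # larger subexpression is duplicated.
        if right == 2 and isinstance(left, str):
            return ("+", left, left)
        if left == 2 and isinstance(right, str):
            return ("+", right, right)
    return (op, left, right)

print(simplify(("=", "X", ("*", "Y", 0))))
print(simplify(("=", "X", ("*", "Y", 2))))
```

Whether Y + Y is actually cheaper than Y * 2 depends on the target machine; the sketch only shows that the rewrite preserves the value.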

22

Code Generation

Produces assembly code (usually)

A translation into another language

» Analogous to human translation

23

Intermediate Languages

Many compilers perform translations between successive intermediate forms

» All but first and last are intermediate languages internal to the compiler

» Typically there is 1 IL

ILs generally ordered in descending level of abstraction

» Highest is source

» Lowest is assembly

24

Intermediate Languages (Cont.)

ILs are useful because lower levels require exposure of many features hidden by higher levels

» registers

» memory layout

» etc.

It is hard to obtain all these hidden features directly from the source input.

25

Example

source line: a = bb+abs(c-7);

» a sequence of ASCII characters in a text file.

The scanner groups characters into tokens:

a = bb+abs(c-7);

After scanning, we have the token sequence:

Id(a) Asg Id(bb) Plus Id(abs) Lparen Id(c) Minus IntLiteral(7) Rparen Semi
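A scanner that also names each token kind can be sketched with one named group per kind. The token names are taken from the slide; the regular expressions, the `Skip` kind for whitespace, and the names `TOKEN_SPEC`, `MASTER_RE`, and `scan` are assumptions made for this illustration.

```python
import re

# (kind, pattern) pairs. Kind names follow the slide; patterns are assumed.
TOKEN_SPEC = [
    ("IntLiteral", r"\d+"),
    ("Id",         r"[A-Za-z_]\w*"),
    ("Asg",        r"="),
    ("Plus",       r"\+"),
    ("Minus",      r"-"),
    ("Lparen",     r"\("),
    ("Rparen",     r"\)"),
    ("Semi",       r";"),
    ("Skip",       r"\s+"),  # whitespace, dropped from the output
]
MASTER_RE = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))

def scan(text):
    """Return the token sequence of `text`, reporting illegal characters."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at offset {pos}")
        kind = match.lastgroup  # name of the group that matched
        if kind in ("Id", "IntLiteral"):
            tokens.append(f"{kind}({match.group()})")  # keep the spelling
        elif kind != "Skip":
            tokens.append(kind)
        pos = match.end()
    return tokens

print(" ".join(scan("a = bb+abs(c-7);")))
```

Identifiers and literals carry their text because later phases need it; punctuation tokens are fully described by their kind.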

26

Example

The parser groups these tokens into a parse tree:

note: (, ) and ; disappear in the tree.

27

The type checker resolves types and binds declarations within scopes:
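A minimal sketch of such a check for a = bb+abs(c-7) follows. Everything in it is an assumption made for illustration: the tuple encoding of the tree, the function `type_of`, and the declaration table, which takes a, bb, and c to be int and abs to map int to int (consistent with the int instructions in the JVM code for this example).

```python
# Assumed declarations: name -> type, or (parameter type, result type) for a function.
DECLS = {"a": "int", "bb": "int", "c": "int", "abs": ("int", "int")}

def type_of(node, decls):
    """Return the type of a tree node, raising TypeError on a mismatch."""
    kind = node[0]
    if kind == "int":                      # ("int", 7)
        return "int"
    if kind == "id":                       # ("id", "bb")
        if node[1] not in decls:
            raise NameError(f"undeclared identifier: {node[1]}")
        return decls[node[1]]
    if kind in ("+", "-"):                 # ("+", left, right)
        left, right = type_of(node[1], decls), type_of(node[2], decls)
        if left != "int" or right != "int":
            raise TypeError(f"operands of {kind} must be int, got {left} and {right}")
        return "int"
    if kind == "call":                     # ("call", "abs", argument)
        param, result = decls[node[1]]
        arg = type_of(node[2], decls)
        if arg != param:
            raise TypeError(f"{node[1]} expects {param}, got {arg}")
        return result
    if kind == "=":                        # ("=", ("id", "a"), value)
        target, value = type_of(node[1], decls), type_of(node[2], decls)
        if target != value:
            raise TypeError(f"cannot assign {value} to {target}")
        return target
    raise ValueError(f"unknown node kind: {kind}")

tree = ("=", ("id", "a"),
        ("+", ("id", "bb"),
              ("call", "abs", ("-", ("id", "c"), ("int", 7)))))
print(type_of(tree, DECLS))
```

Changing one declaration, for example making c a string, causes the same walk to raise TypeError at the subtraction.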

28

Finally, JVM code is generated for each node in the tree (leaves first, then their parents, i.e. a post-order traversal):

    iload 3                                // push local 3 (bb)
    iload 2                                // push local 2 (c)
    ldc 7                                  // push literal 7
    isub                                   // compute c-7
    invokestatic java/lang/Math/abs(I)I    // call abs on c-7
    iadd                                   // compute bb+abs(c-7)
    istore 1                               // store result into local 1 (a)
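The leaves-first order falls out of a recursive walk that emits code for a node's children before the node itself. The sketch below assumes a tuple encoding of the tree and a function `gen` invented for this illustration; the local-variable slots are the ones given in the listing's comments, and only the call to abs is supported.

```python
# Local variable slots, as given in the comments of the JVM listing.
SLOTS = {"a": 1, "c": 2, "bb": 3}

def gen(node, out):
    """Append stack-machine instructions for `node` to `out` (post-order)."""
    kind = node[0]
    if kind == "id":                       # ("id", "bb")
        out.append(f"iload {SLOTS[node[1]]}")
    elif kind == "int":                    # ("int", 7)
        out.append(f"ldc {node[1]}")
    elif kind in ("+", "-"):               # children first, then the operator
        gen(node[1], out)
        gen(node[2], out)
        out.append("iadd" if kind == "+" else "isub")
    elif kind == "call":                   # ("call", "abs", argument)
        if node[1] != "abs":
            raise ValueError("this sketch only knows abs")
        gen(node[2], out)
        out.append("invokestatic java/lang/Math/abs(I)I")
    elif kind == "=":                      # ("=", ("id", "a"), value)
        gen(node[2], out)
        out.append(f"istore {SLOTS[node[1][1]]}")
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return out

tree = ("=", ("id", "a"),
        ("+", ("id", "bb"),
              ("call", "abs", ("-", ("id", "c"), ("int", 7)))))
code = gen(tree, [])
print("\n".join(code))
```

Because the JVM is a stack machine, each operator finds its operands already on the stack, so no register decisions are needed in this sketch.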

29

Issues

Compiling is almost this simple, but there are many pitfalls.

Example: How are erroneous programs handled?

Language design has big impact on compiler

» Determines what is easy and hard to compile

» Course theme: many trade-offs in language design

30

Compilers Today

The overall structure of almost every compiler adheres to this outline

The proportions have changed since FORTRAN

» Early: lexing, parsing most complex, expensive

» Today: optimization dominates all other phases, lexing and parsing are cheap

31

Applications of Compilation Techniques

• Editor

• Interpreter

• Debugger

• Word Processing (TeX, Word)

• VLSI Design (VHDL, Verilog)

• Pattern Recognition

32

Trends in Compilation

Compilation for speed is less interesting. But it still matters for:

» scientific programs

» advanced processors (Digital Signal Processors, advanced speculative architectures)

Ideas from compilation used for improving code reliability:

» memory safety

» detecting data races

» ...
