Yu-Chen Kuo 1 Chapter 1 Introduction to Compiling


1.1 Compilers

• Source languages: Fortran, Pascal, C, etc.
• Target languages: another programming language or machine language
• Compilers:
  – Single-pass
  – Multi-pass
  – Load-and-go
  – Debugging
  – Optimizing

Analysis-Synthesis Model

• Compilation: analysis and synthesis
• Analysis:
  – Break the source program into its constituent pieces
  – Create an intermediate representation
  – Hierarchical structure: syntax tree
    • Node: an operation
    • Leaf: an argument of the operation
• Synthesis: construct the target program from the tree
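A minimal sketch of such a syntax tree, assuming the statement position := initial + rate * 60 used later in the chapter. The class names `Leaf` and `Node` and the function `postorder` are illustrative and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    name: str            # an argument: identifier or constant


@dataclass
class Node:
    op: str              # an operation such as ":=", "+", "*"
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def postorder(tree: Tree) -> list[str]:
    """Visit the arguments before their operation, as a synthesis pass might."""
    if isinstance(tree, Leaf):
        return [tree.name]
    return postorder(tree.left) + postorder(tree.right) + [tree.op]


# Tree for: position := initial + rate * 60
tree = Node(":=", Leaf("position"),
            Node("+", Leaf("initial"),
                 Node("*", Leaf("rate"), Leaf("60"))))
print(postorder(tree))
```

The post-order walk is only one possible traversal; it is shown because code for the arguments is usually produced before code for the operation that uses them.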


Context of a Compiler

• Several other programs may be needed to create an executable (.exe) file
  – Preprocessor: expands macros
  – Assembler: translates assembly code into machine code
  – Loader/link-editor: links in library routines

1.2 Analysis of the source program

• Three phases
  1. Linear analysis
     • Divide the source program into tokens
  2. Hierarchical analysis
     • Group the tokens hierarchically
  3. Semantic analysis
     • Ensure that the components fit together meaningfully

Lexical Analysis

• Linear analysis is also called lexical analysis or scanning
• E.g., position := initial + rate * 60 is grouped into seven tokens:
  1. Identifier position
  2. Assignment symbol ":="
  3. Identifier initial
  4. "+" sign
  5. Identifier rate
  6. "*" sign
  7. Number 60
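The grouping above can be sketched with a small regular-expression scanner. This is an illustration only: the token names in `TOKEN_SPEC` and the function `tokenize` are assumptions, not part of the slides.

```python
import re

# (token name, pattern) pairs; the names are illustrative
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("assign", r":="),
    ("op", r"[+*]"),
    ("skip", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source):
    """Scan left to right and return a list of (kind, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            # lexical error: the remaining characters form no token
            raise SyntaxError(f"cannot form a token at position {pos}")
        if match.lastgroup != "skip":       # blanks separate tokens
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("position := initial + rate * 60"):
    print(token)
```

The scan never looks back and keeps no nesting information, which is what makes this phase "linear".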

Syntax Analysis

• Hierarchical analysis is also called parsing or syntax analysis
  – Group tokens into grammatical phrases
  – The grammatical phrases are represented by a parse tree

• Hierarchical structure is expressed by recursive rules
• Recursive definition of an expression:
  1. Any identifier is an expression
  2. Any number is an expression
  3. If expression1 and expression2 are expressions, then so are
     expression1 + expression2, expression1 * expression2, and (expression1)
• By rule 1, initial and rate are expressions
• By rule 2, 60 is an expression
• By rule 3, rate * 60 is an expression, and therefore so is initial + rate * 60
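The three rules map naturally onto a recursive-descent parser. The sketch below is one possible reading: it assumes that * binds tighter than +, which the rules themselves do not state, and the function name `parse_expression` and the nested-tuple tree format are illustrative.

```python
import re


def parse_expression(text):
    """Parse text into a nested tuple (op, left, right); leaves are strings."""
    # Simplified scanner: characters outside these patterns are ignored
    tokens = re.findall(r"[A-Za-z]\w*|\d+|[+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":                    # rule 3: (expression1)
            advance()
            inner = expression()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return inner
        if token in ("+", "*", ")"):
            raise SyntaxError(f"unexpected {token!r}")
        return advance()                    # rules 1 and 2: identifier or number

    def term():
        node = factor()
        while peek() == "*":                # rule 3: expression1 * expression2
            advance()
            node = ("*", node, factor())
        return node

    def expression():
        node = term()
        while peek() == "+":                # rule 3: expression1 + expression2
            advance()
            node = ("+", node, term())
        return node

    tree = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree


print(parse_expression("initial + rate * 60"))
```

The recursion in `factor` (which calls `expression` again for a parenthesized phrase) is what a purely linear scan cannot provide.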

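• Recursive definition of a statement:
  1. If identifier1 is an identifier and expression2 is an expression, then
     identifier1 := expression2
     is a statement
  2. If expression1 is an expression and statement2 is a statement, then
     while (expression1) do statement2
     if (expression1) then statement2
     are statements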

Lexical vs. Syntax Analysis

• The division between the two is somewhat arbitrary
• One deciding factor: whether recursion is needed
  – An identifier is recognized by a linear scan that stops at the first character that is neither a letter nor a digit; no recursion is required
    • E.g., initial
  – A linear scan is not powerful enough to analyze expressions or statements unless some hierarchical structure is imposed on the input
    • E.g., ( ... ), begin ... end, nested statements
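The identifier case can be sketched as a plain loop. The function name `scan_identifier` is illustrative, and the sketch assumes that `start` already points at a letter.

```python
def scan_identifier(text, start=0):
    """Linear scan: stop at the first character that is neither letter nor digit."""
    end = start
    while end < len(text) and text[end].isalnum():
        end += 1
    return text[start:end]


print(scan_identifier("initial+rate*60"))       # scans from position 0
print(scan_identifier("initial+rate*60", 8))    # scans from position 8
```

No stack or recursion is involved, so the same loop cannot match a "(" with its ")" or a begin with its end.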


Semantic Analysis

• Check the source program for semantic errors
• Gather type information for the code-generation phase
• Use the hierarchical structure to identify the operators and operands
• Type checking
  – E.g., using a real number to index an array is an error
  – Type conversion
  – E.g., Fig. 1.5: inttoreal(60) is inserted when the identifiers in the expression are real numbers
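A minimal sketch of the type-conversion step, assuming expression trees written as nested tuples (op, left, right) and a dictionary of declared types. The function name `check` and the type names "int" and "real" are illustrative.

```python
def check(tree, types):
    """Return (typed_tree, type), inserting inttoreal where int meets real."""
    if isinstance(tree, str):               # leaf: number or identifier
        if tree.isdigit():
            return tree, "int"
        if tree not in types:
            raise TypeError(f"undeclared identifier {tree!r}")
        return tree, types[tree]
    op, left, right = tree
    left, left_type = check(left, types)
    right, right_type = check(right, types)
    if left_type == right_type:
        return (op, left, right), left_type
    # Mixed operands: convert the integer one to real
    if left_type == "int":
        left = ("inttoreal", left)
    else:
        right = ("inttoreal", right)
    return (op, left, right), "real"


declared = {"initial": "real", "rate": "real"}
typed, result_type = check(("+", "initial", ("*", "rate", "60")), declared)
print(typed, result_type)
```

Only the integer constant 60 is wrapped, because it is the only operand whose type differs from that of its sibling.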


Analysis in Text Formatters

• \hbox {<list of boxes>}

• \hbox {\vbox{! 1} \vbox{@ 2}}


1.3 The Phases of A Compiler

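• Phases
• First three phases (lexical, syntax, and semantic analysis): the analysis portion
• Last three phases (intermediate code generation, code optimization, code generation): the synthesis portion
• Symbol-table management
• Error handling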

Symbol-table Management

• Records the identifiers used in the source program
  – An identifier is detected by the lexical analyzer and then entered into the symbol table
• Collects the attributes of each identifier (these cannot be determined during lexical analysis)
  – Storage allocation: memory address
  – Type
  – Scope (where it is valid: local or global)
  – Arguments (in the case of procedure names)
    • Number and types of the arguments
    • Parameter-passing method (e.g., call by reference)
    • Return type

• Semantic analysis uses the type information to check that identifiers are used consistently with their types
• Code generation uses the storage-allocation information to generate properly relocatable address code
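A minimal sketch of a symbol table along these lines. The class and method names (`Symbol`, `SymbolTable`, `enter`, `lookup`, `allocate`) and the 4-byte size for a real are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Symbol:
    name: str
    type: Optional[str] = None       # filled in after lexical analysis
    address: Optional[int] = None    # filled in by storage allocation
    scope: str = "global"


class SymbolTable:
    def __init__(self):
        self._symbols = {}

    def enter(self, name):
        """Called when the scanner sees an identifier; reuses an existing entry."""
        return self._symbols.setdefault(name, Symbol(name))

    def lookup(self, name):
        return self._symbols[name]

    def allocate(self, sizes):
        """Assign consecutive addresses; sizes maps a type name to a byte count."""
        next_address = 0
        for symbol in self._symbols.values():
            symbol.address = next_address
            next_address += sizes[symbol.type]


table = SymbolTable()
for name in ("position", "initial", "rate"):
    table.enter(name).type = "real"      # type known only after declarations
table.allocate({"real": 4})              # assumed size of a real
print(table.lookup("rate"))
```

The entry is created early with empty attributes, and later phases fill in the type and address, which mirrors the split described on the slides.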

Error Detection and Reporting

• Syntax and semantic analysis handle a large fraction of the errors
• Lexical phase: the remaining input characters do not form any token
• Syntax phase: the token stream violates the structure rules of the language
• Semantic phase: the construct is syntactically correct, but the operation has no meaning
  – E.g., adding an array name and a procedure name


Translation of A Statement


The Analysis Phases

• Lexical analysis
  – Group characters into tokens
    • Identifiers
    • Keywords (if, while)
    • Punctuation ('(', ')')
    • Multi-character operators (':=')
  – Enter the lexical value (lexeme) of each identifier into the symbol table
    • position, rate, initial
• Syntax analysis
  – Fig. 1.11(a), 1.11(b)

• Syntax analysis
• Semantic analysis
  – Type checking and type conversion

Intermediate Code Generation

• Represent the source program as code for an abstract machine
• Should be easy to produce
• Should be easy to translate into the target program
• Three-address code (each instruction has at most three operands)
  – temp2 := id3 * temp1
  – Every memory location can act like a register
    • E.g., temp2 plays the role of a register such as BX
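A minimal sketch of how three-address code could be produced from a typed expression tree. It assumes the tree is a nested tuple in which identifiers have already been replaced by id1, id2, id3; the function name `generate` and the temp-naming scheme are illustrative.

```python
from itertools import count


def generate(target, tree):
    """Return a list of three-address instructions for target := tree."""
    code = []
    temps = count(1)

    def emit(expression):
        name = f"temp{next(temps)}"          # a fresh temporary per result
        code.append(f"{name} := {expression}")
        return name

    def walk(node):
        if isinstance(node, str):            # identifier or constant
            return node
        if node[0] == "inttoreal":           # unary conversion
            return emit(f"inttoreal({walk(node[1])})")
        op, left, right = node
        left_name = walk(left)
        right_name = walk(right)
        return emit(f"{left_name} {op} {right_name}")

    result = walk(tree)
    code.append(f"{target} := {result}")
    return code


# id1 := id2 + id3 * inttoreal(60)
tree = ("+", "id2", ("*", "id3", ("inttoreal", "60")))
for line in generate("id1", tree):
    print(line)
```

Each instruction names at most three locations (one result and two arguments), and the order of the instructions fixes the order of evaluation.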

Code Optimization

• Improve the intermediate code
• Goal: faster-running machine code
  – temp1 := id3 * 60.0
    id1 := id2 + temp1
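Two simple improvements lead to code of this shape: converting the integer constant at compile time, and writing the final result directly instead of copying a temporary. The sketch below assumes instructions are tuples (dest, op, arg1, arg2) and that each temporary is used only once after it is computed; `optimize` and the "copy" operation name are illustrative. It does not renumber temporaries, so the surviving temporary keeps its original name.

```python
def optimize(code):
    """Fold inttoreal of a constant and remove a trailing copy of a temporary."""
    constants = {}       # temporary -> constant it is known to hold
    result = []
    for dest, op, arg1, arg2 in code:
        arg1 = constants.get(arg1, arg1)
        arg2 = constants.get(arg2, arg2)
        if op == "inttoreal" and arg1.isdigit():
            constants[dest] = f"{arg1}.0"    # conversion done at compile time
            continue
        if op == "copy" and result and result[-1][0] == arg1:
            # dest := tempN directly after tempN was computed:
            # let the previous instruction write dest instead
            _, prev_op, prev1, prev2 = result.pop()
            result.append((dest, prev_op, prev1, prev2))
            continue
        result.append((dest, op, arg1, arg2))
    return result


before = [
    ("temp1", "inttoreal", "60", None),
    ("temp2", "*", "id3", "temp1"),
    ("temp3", "+", "id2", "temp2"),
    ("id1", "copy", "temp3", None),
]
for instruction in optimize(before):
    print(instruction)
```

Four instructions shrink to two, matching the structure of the optimized code on the slide.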

Code Generation

• Generate relocatable machine code or assembly code
  – MOVF id3, R2
    MULF #60.0, R2
    MOVF id2, R1
    ADDF R2, R1
    MOVF R1, id1
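A naive sketch of this translation, assuming instructions are tuples (dest, op, arg1, arg2) and that registers are handed out in increasing order and never reused. The function name `generate_assembly` is illustrative, and the register numbers it picks differ from those on the slide, although the instruction sequence has the same shape.

```python
def generate_assembly(code):
    """Translate three-address tuples into MOVF/ADDF/MULF instructions."""
    opcodes = {"+": "ADDF", "*": "MULF"}
    register_of = {}              # temporary -> register that holds it
    assembly = []

    def operand(name):
        if name in register_of:
            return register_of[name]
        if name[0].isdigit():
            return f"#{name}"     # immediate constant
        return name               # memory location of an identifier

    for dest, op, arg1, arg2 in code:
        register = f"R{len(register_of) + 1}"
        assembly.append(f"MOVF {operand(arg1)}, {register}")
        assembly.append(f"{opcodes[op]} {operand(arg2)}, {register}")
        if dest.startswith("temp"):
            register_of[dest] = register     # keep the temporary in a register
        else:
            assembly.append(f"MOVF {register}, {dest}")
    return assembly


optimized = [("temp1", "*", "id3", "60.0"), ("id1", "+", "id2", "temp1")]
for line in generate_assembly(optimized):
    print(line)
```

A real code generator would also decide which values stay in registers when registers run out; this sketch ignores that problem.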


1.4 Cousins of The Compiler

• Preprocessors

• Assemblers

• Two-Pass Assembler

• Loaders and Link-Editors

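Preprocessors

• Macro processing
• File inclusion
  – #include <global.h> is replaced by the contents of the file global.h
• Rational preprocessors
  – Augment older languages with modern flow-of-control and data-structuring facilities
• Language extensions
  – Statements beginning with ## belong to a query language embedded in C
  – They are translated into procedure calls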

• Example 1.2
  – Macro definition:
    \define\JACM #1; #2; #3
    {{\sl J. ACM} {\bf #1}: #2, pp. #3.}
  – Macro use:
    \JACM 17;4;715-728
  – Expected output:
    J. ACM 17:4, pp. 715-728.
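The substitution of actual arguments for #1, #2, #3 can be sketched as follows. This is not how TeX itself is implemented; the function name `expand` and the constant `JACM_BODY` are illustrative, and the font commands \sl and \bf are left out of the template.

```python
import re


def expand(template, arguments, separator=";"):
    """Replace #1, #2, ... in template with the separated actual arguments."""
    values = [value.strip() for value in arguments.split(separator)]
    return re.sub(r"#(\d)", lambda m: values[int(m.group(1)) - 1], template)


JACM_BODY = "J. ACM #1:#2, pp. #3."      # macro body without font commands
print(expand(JACM_BODY, "17;4;715-728"))
```

The formal parameters are matched to the actual arguments purely by position, using the semicolons of the macro template as separators.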

Assembler

• Produces relocatable machine code
  – DW a #10
    DW b #20
    MOV a, R1
    ADD #2, R1
    MOV R1, b
• MOV a, R1: load the contents of address a into R1
• ADD #2, R1: add the constant 2 to R1
• MOV R1, b: store R1 into address b

Two-Pass Assembly

• First pass
  – Find all identifiers that denote storage locations and store them in the symbol table

    Identifier   Address
    a            0
    b            4

• Second pass
  – Translate each operation code into the sequence of bits that represents it
  – Translate each identifier into the address recorded in the symbol table
  – The result is relocatable machine code

• Example 1.3

  Inst. code   Register   Mem/Const.      Content        (R)
  0001 (MOV)   01 (R1)    00 (Mem)        00000000 (a)   *
  0011 (ADD)   01 (R1)    10 (Constant)   00000010
  0010 (MOV)   01 (R1)    00 (Mem)        00000100 (b)   *
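The two passes can be sketched for this tiny instruction set. The sketch assumes 4-byte data words (to match the addresses 0 and 4), register names that start with R, identifiers that do not, and it ignores the initial values in the DW lines; `first_pass` and `second_pass` are illustrative names.

```python
WORD_SIZE = 4  # assumed bytes per data word, giving addresses 0 and 4


def first_pass(program):
    """Assign an address to every identifier declared with DW."""
    symbols = {}
    for statement in program:
        if statement[0] == "DW":
            symbols[statement[1]] = len(symbols) * WORD_SIZE
    return symbols


def second_pass(program, symbols):
    """Encode each instruction as (bit string, needs_relocation)."""
    encoded = []
    for statement in program:
        if statement[0] == "DW":
            continue
        mnemonic, source, target = statement
        if mnemonic == "MOV" and target.startswith("R"):    # load
            opcode, register, operand = "0001", target, source
        elif mnemonic == "MOV":                              # store
            opcode, register, operand = "0010", source, target
        elif mnemonic == "ADD":
            opcode, register, operand = "0011", target, source
        else:
            raise ValueError(f"unknown instruction {mnemonic}")
        register_bits = format(int(register[1:]), "02b")
        if operand.startswith("#"):                          # constant
            tag, value, relocatable = "10", int(operand[1:]), False
        else:                                                # memory address
            tag, value, relocatable = "00", symbols[operand], True
        encoded.append(
            (f"{opcode} {register_bits} {tag} {value:08b}", relocatable)
        )
    return encoded


program = [
    ("DW", "a"), ("DW", "b"),
    ("MOV", "a", "R1"), ("ADD", "#2", "R1"), ("MOV", "R1", "b"),
]
symbols = first_pass(program)
for bits, relocatable in second_pass(program, symbols):
    print(bits, "*" if relocatable else "")
```

The second pass needs the complete symbol table, which is why the addresses are collected in a separate pass first.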

• '*' denotes the relocation bit
  – If the data is loaded starting at address 00001111:
  – a should be at location 00001111 + 00000000 = 00001111
  – b should be at location 00001111 + 00000100 = 00010011

  Inst. code   Register   Mem/Const.      Content        (R)
  0001 (MOV)   01 (R1)    00 (Mem)        00001111 (a)   *
  0011 (ADD)   01 (R1)    10 (Constant)   00000010
  0010 (MOV)   01 (R1)    00 (Mem)        00010011 (b)   *
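The address arithmetic can be checked with a short sketch. It assumes each instruction is a pair (bit string, relocation flag) whose last field is the 8-bit address or constant; the function name `relocate` is illustrative.

```python
def relocate(encoded, load_address):
    """Add load_address to the address field of every relocatable instruction."""
    relocated = []
    for bits, relocatable in encoded:
        if relocatable:
            prefix, address = bits.rsplit(" ", 1)
            bits = f"{prefix} {int(address, 2) + load_address:08b}"
        relocated.append(bits)
    return relocated


encoded = [
    ("0001 01 00 00000000", True),    # MOV a, R1
    ("0011 01 10 00000010", False),   # ADD #2, R1 (constant, not relocated)
    ("0010 01 00 00000100", True),    # MOV R1, b
]
for bits in relocate(encoded, 0b00001111):
    print(bits)
```

Only the instructions marked with the relocation bit change; the constant 2 in the ADD instruction is left alone.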

Loaders and Link-Editors

• Loader
  – Takes relocatable machine code, alters the relocatable addresses, and places the result in memory
• Link-editor
  – Resolves external references
    • Library files, routines provided by the system, any other program