35
1 Structure of Compilers Lexica l Analyz er (scann er) Modified Source Program Parser Tokens Semanti c Analysi s Syntacti c Structur e Optimiz er Code Generat or Intermediat e Representat ion Target machine code Symbol Table

1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

Embed Size (px)

DESCRIPTION

3 Binding time Example: int count; count = count + 5; Type of count is bound at compile time. Value of count bound at execution time. Meaning of + bound at compile time. Possible types for count, internal representation of 5 and set of possible meanings for + bound at compiler design time.

Citation preview

Page 1: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

1

Structure of Compilers Lexical Analyzer (scanner)

Modified Source Program

ParserTokens Semantic

AnalysisSyntactic Structure

Optimizer

Code Generator

Intermediate Representation

Target machine code

Symbol Table

Page 2: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

2

Binding

• Binding - association of an operation and a symbol.

• Binding time - when does the association take place? – Design time – Compile time – Link time– Runtime

Page 3: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

3

Binding time

• Example: int count; count = count + 5;

• Type of count is bound at compile time. • Value of count bound at execution time.• Meaning of + bound at compile time.• Possible types for count, internal representation

of 5 and set of possible meanings for + bound at compiler design time.

Page 4: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

4

Binding

• Static– It occurs before runtime and remains

unchanged throughout program execution.• Dynamic

– It occurs at runtime and can change in the course of program execution.

Page 5: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

5

Type Binding

• Before a variable can be referenced in a program, it must be bound to a data type.

• Two important questions to ask: 1. How is the type specified?

• Explicit Declaration • Implicit Declaration

– All variable names that start with the letters ‘i’ - ‘r’ are integer, real otherwise

– @name is an array, %something is a hash structure

• Determined by context and value

Page 6: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

6

Type Binding

2. When does the binding take place– Explicit declaration (static)– Implicit declaration (static)– Determined by context and value (dynamic)

• Dynamic Type Binding– When a variable gets a value, the type of the

variable is determined right there and then.

Page 7: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

7

Dynamic Type Binding• Specified through an assignment statement (set x ‘(1 2 3)) <== x becomes a list

(set x ‘a) <== x becomes an atom• Advantage:

– flexibility (generic program units)• Disadvantages: 1. High cost (dynamic type checking and interpretation) 2. Type error detection by the compiler is difficult

Page 8: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

8

Type Inference• Rather than by assignment statement, types are

determined from the context of the reference.– Some languages don’t support assignment

statements• Example:

(define someFunction (m n) .... )

(someFunction ‘a ‘b) <== m becomes an atom(someFunction ‘(a b c) ‘(1 2 3)) <= m becomes a list

Page 9: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

9

Storage Bindings

• Allocation – getting a cell from some pool of available cells

• Deallocation – putting a cell back into the pool

• The lifetime of a variable is the time during which it is bound to a particular memory cell.

Page 10: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

10

Variable lifetime

• Static– bound to memory cells before execution begins and

remains bound to the same memory cell throughout execution.

– C’s static variables • Advantage:

– efficiency (direct addressing),– history-sensitive subprogram support

• Disadvantage: – lack of flexibility (no recursion)

Page 11: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

11

Variable lifetime• Stack-dynamic

– Storage bindings are created for variables when their declaration statements are elaborated.

– If scalar, all attributes except address are statically bound.– e.g. local variables in Pascal and C subprograms

• Advantage: – allows recursion; conserves storage

• Disadvantages: – Overhead of allocation and deallocation– Subprograms cannot be history sensitive

Page 12: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

12

Variable Lifetime• Explicit heap-dynamic

– Allocated and deallocated by explicit directives, specified by the programmer, which take effect during execution.

– Referenced only through pointers or references– e.g. dynamic objects in C++ (via new and delete)– all objects in Java

• Advantage: – provides for dynamic storage management

• Disadvantage: – inefficient and unreliable

Page 13: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

13

Variable lifetime

• Implicit heap-dynamic– Allocation and deallocation caused by

assignment statements– e.g. Lisp variables

• Advantage: – flexibility

• Disadvantages: – Inefficient, because all attributes are dynamic– Loss of error detection

Page 14: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

14

-

Type Checking

• Generalize the concept of operands and operators to include subprograms and assignments

• Type checking is the activity of ensuring that the operands of an operator are of compatible types

• A compatible type is one that is either legal for the operator, or is allowed under language rules to be implicitly converted, by compiler-generated code, to a legal type. This automatic conversion is called a coercion.

Page 15: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

15

Type Checking

• A type error is the application of an operator to an operand of an inappropriate type.

• If all type bindings are static, nearly all type checking can be static

• If type bindings are dynamic, type checking must be dynamic

• A programming language is strongly typed if type errors are always detected.

Page 16: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

16

Type Checking• Advantage of strong typing:

– Allows the detection of the misuses of variables that result in type errors.

• Languages:– 1. FORTRAN 77 is not: parameters, EQUIVALENCE– 2. Pascal is not: variant records– 3. Modula-2 is not: variant records, WORD type– 4. C and C++ are not: parameter type checking can be

avoided; unions are not type checked.– 5. Ada is, almost (UNCHECKED CONVERSION is

loophole) (Java is similar)• Coercion rules strongly affect strong typing

– they can weaken it considerably (C++ versus Ada)

Page 17: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

17

Name Type Compatibility

• Type compatibility by name means the two variables have compatible types if they are in either the same declaration or in declarations that use the same type name– Easy to implement but highly restrictive:

• Subranges of integer types are not compatible with integer types

• Formal parameters must be the same type as their corresponding actual parameters

Page 18: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

18

Name Type Compatibility

type indextype = 1..100; //subrangevar count : integer; index : indextype;

Is count and index name type compatible?Is count and index name type compatible?

Page 19: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

19

Structure Type Compatibility

• Type compatibility by structure means that two variables have compatible types if their types have identical structures– More flexible, but harder to implement

type myType1 = double;type myType2 = double;myType1 data1;myType2 data2;data1 = data2; //????

Page 20: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

20

Type Compatibility

type type1 = arary [1..10] of integer; type2 = array [1..10] of integer; type3 = type2;

type2 is compatible with type3 by name

type1 is compatible with type2 by structure

Page 21: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

21

Type Compatibility

• Anonymous types

A: array [1..10] of INTEGER;B: array [1..10] of INTEGER;C, D: array [1..10] of INTEGER;

Page 22: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

22

Scope

• The scope of a variable is the range of statements over which it is visible.

• The nonlocal variables of a program unit are those that are visible but not declared there.

• The scope rules of a language determine how references to names are associated with variables.

Page 23: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

23

Static scope• Based on program text

– To connect a name reference to a variable, you (or the compiler) must find the declaration.

• Search process: – search declarations, first locally, then in increasingly larger

enclosing scopes, until one is found for the given name.

• Enclosing static scopes (to a specific scope) are called its static ancestors; the nearest static ancestor is called a static parent.

Page 24: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

24

Static Scope

• Blocks - a method of creating static scopes inside program units--from ALGOL 60

C and C++: for (...) { int index; ... }

Ada: declare LCL : FLOAT; begin

... end

Page 25: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

25

Evaluation of Static Scoping (for nested procedures)

Consider the example:

MAIN Define Sub A Define Sub C

Define Sub D Call Sub C

Define Sub B Define Sub E Call Sub E Call Sub A

Static Scope: Procedures

MAIN

A

B

C

D

E

Page 26: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

26

Static Scope: Variablesprogram main; var x: integer; procedure sub1; begin print(x); end; {sub1} procedure sub2; var x: integer; begin x := 10; sub1; end;begin x := 5; sub2end.

Output

10 or 5?

Page 27: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

27

Static Scope• Suppose the spec is changed so that D must now

access some data in B

• Solutions:1. Put D in B (but then C can no longer call it and D cannot access

A's variables)2. Move the data from B that D needs to MAIN (but then all

procedures can access them)

• Same problem for procedure access!

• Overall: static scoping often encourages many globals

Page 28: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

28

Dynamic Scope• Based on calling sequences of program units, not

their textual layout (temporal versus spatial)

• References to variables are connected to declarations by searching back through the chain of subprogram calls that forced execution to this point.

• Evaluation of Dynamic Scoping:– Advantage: convenience

– Disadvantage: hard to debug, hard to understand the code

Page 29: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

29

Dynamic Scope: Variablesprogram main; var x: integer; procedure sub1; begin print(x); end; {sub1} procedure sub2; var x: integer; begin x := 10; sub1; end;begin x := 5; sub2end.

Output

10 or 5?

Page 30: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

30

Scope vs. Lifetime

• Scope and lifetime are sometimes closely related, but are different concepts!!

• Consider a static variable in a C or C++ functionvoid someFunction(){ static int x; ...}

Page 31: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

Intermediate Representation

• Almost no compiler produces code without first converting a program into some intermediate representation that is used just inside the compiler.

• This intermediate representation is called by various names:– Internal Representation– Intermediate representation– Intermediate language

Page 32: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

Intermediate Representation

• Intermediate Representations are also called by the form the intermediate language takes:– tuples– abstract syntax trees– Triples– Simplied language

Page 33: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

Intermediate Form

• In general, an intermediate form is kept around only until the compiler generates code; then it cane be discarded.

• Another difference in compilers is how much of the program is kept in intermediate form; this is related to the question of how much of the program the compiler looks at before it starts to generate code.

• There is a wide spectrum of choices.

Page 34: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

34

Abstract Syntax Treex = y + 3;

=

x +

y 3

Page 35: 1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator

35

Quadruples

y= a*(x+b)/(x-c);

T1= x+b; (+, 3, 4, 5)T2=a*T1; (*, 2, 5, 6)T3=x-c; (-, 3, 7, 8)T4=T2/T3; (/, 6, 8, 9)y=T4; (=, 0, 9, 1)

y

a

x

b

T1

T2

c

T3

T4