Transcript
Page 1: Compilers and Language Translation Gordon College

Compilers and Language Translation

Gordon College

What’s a compiler?

All computers understand only machine language.

Therefore, high-level language instructions must be translated into machine language prior to execution.

10000010010110100100101……

This is a program

What’s a compiler?

Compiler

A piece of system software that translates high-level languages into machine language

program.c (source program):

while (c != 'x') {
    if (c == 'a' || c == 'e' || c == 'i')
        printf("Congrats!");
    else if (c != 'x')
        printf("You Loser!");
}

Compiler: gcc -o prog program.c

prog (machine language): 10000010010110100100101……

Output: Congrats!

Assembler (a kind of compiler)

Assembly: LOAD X

Machine language: 0101 0000 0000 1001

(symbol table) (opcode table)

One-to-one translation
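The one-to-one translation above can be sketched in C as two table lookups. This is only an illustration: the 4-bit opcode plus 12-bit address layout is read off the LOAD X example, the opcode values are the ones shown on the slides, and the names `assemble`, `OPCODES`, and `SYMBOLS` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Opcode table: mnemonic -> 4-bit opcode (values as shown on the slides). */
struct opcode_entry { const char *mnemonic; uint16_t opcode; };
static const struct opcode_entry OPCODES[] = {
    { "STORE", 0x4 }, { "LOAD", 0x5 }, { "SUBTRACT", 0x6 }, { "ADD", 0x7 },
};

/* Symbol table: name -> 12-bit address. X at address 9 matches the slide. */
struct symbol_entry { const char *name; uint16_t address; };
static const struct symbol_entry SYMBOLS[] = {
    { "X", 0x009 },
};

/* Translate one assembly instruction into exactly one 16-bit machine word.
   Returns 0 on success, -1 if the mnemonic or the symbol is unknown. */
int assemble(const char *mnemonic, const char *symbol, uint16_t *word)
{
    size_t i, j;
    for (i = 0; i < sizeof OPCODES / sizeof OPCODES[0]; i++) {
        if (strcmp(OPCODES[i].mnemonic, mnemonic) != 0)
            continue;
        for (j = 0; j < sizeof SYMBOLS / sizeof SYMBOLS[0]; j++) {
            if (strcmp(SYMBOLS[j].name, symbol) == 0) {
                /* opcode in the top 4 bits, address in the low 12 bits */
                *word = (uint16_t)((OPCODES[i].opcode << 12) | SYMBOLS[j].address);
                return 0;
            }
        }
    }
    return -1;
}
```

Calling `assemble("LOAD", "X", &word)` stores 0x5009 in `word`, which is the bit pattern 0101 0000 0000 1001.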

Compiler (high-level language translator)

a = b + c - d;

One-to-many translation

0101 00001110001    LOAD B
0111 00001110010    ADD C
0110 00001110011    SUBTRACT D
0100 00001110100    STORE A

0101 00001110001 0111 00001110010…….

Goals of a compiler

Code produced must be correct

A = (B+C)-(D+E);

Possible translation:

LOAD B
ADD C
STORE B
LOAD D
ADD E
STORE D
LOAD B
SUBTRACT D
STORE A

Is this correct?

No - STORE B and STORE D change the values of variables B and D, which the high-level language statement does not intend
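The side effect of STORE B and STORE D can be checked by simulating both translations. The sketch below assumes a one-register (accumulator) machine with only the four instructions used on the slides. The memory layout, the temporary `ADDR_T`, and the second instruction sequence are hypothetical; it is one possible translation that leaves B and D untouched, not necessarily what a real compiler would emit.

```c
#include <stddef.h>

enum op { OP_LOAD, OP_ADD, OP_SUBTRACT, OP_STORE };
struct instr { enum op op; int addr; };   /* addr indexes the memory array */

/* Hypothetical memory layout: A..E plus one compiler temporary T. */
enum { ADDR_A, ADDR_B, ADDR_C, ADDR_D, ADDR_E, ADDR_T, MEM_SIZE };

/* Run a straight-line program on an accumulator machine. */
void run(const struct instr *prog, size_t n, int *mem)
{
    int acc = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        switch (prog[i].op) {
        case OP_LOAD:     acc = mem[prog[i].addr];  break;
        case OP_ADD:      acc += mem[prog[i].addr]; break;
        case OP_SUBTRACT: acc -= mem[prog[i].addr]; break;
        case OP_STORE:    mem[prog[i].addr] = acc;  break;
        }
    }
}

/* The translation from the slide: computes A but overwrites B and D. */
static const struct instr CLOBBERING[] = {
    {OP_LOAD, ADDR_B}, {OP_ADD, ADDR_C}, {OP_STORE, ADDR_B},
    {OP_LOAD, ADDR_D}, {OP_ADD, ADDR_E}, {OP_STORE, ADDR_D},
    {OP_LOAD, ADDR_B}, {OP_SUBTRACT, ADDR_D}, {OP_STORE, ADDR_A},
};

/* An alternative: only the temporary T and the target A are written. */
static const struct instr PRESERVING[] = {
    {OP_LOAD, ADDR_D}, {OP_ADD, ADDR_E}, {OP_STORE, ADDR_T},
    {OP_LOAD, ADDR_B}, {OP_ADD, ADDR_C}, {OP_SUBTRACT, ADDR_T},
    {OP_STORE, ADDR_A},
};
```

Running both programs on the same starting memory gives the same value in A, but only the second leaves B and D unchanged.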

Goals of a compiler

Code produced should be reasonably efficient and concise

Compute the sum 2x_1 + 2x_2 + 2x_3 + 2x_4 + … + 2x_50000

sum = 0.0;
for (i = 0; i < 50000; i++) {
    sum = sum + (2.0 * x[i]);
}

Optimizing compiler:

sum = 0.0;
for (i = 0; i < 50000; i++) {
    sum = sum + x[i];
}
sum = sum * 2.0;

49,999 fewer multiplication instructions

General Structure of a Compiler

The Compilation Process

Phase I: Lexical analysis

The compiler examines the individual characters in the source program and groups them into syntactical units called tokens

Source code -> Scanner -> Groups of tokens

Phase II: Parsing

The sequence of tokens formed by the scanner is checked to see whether it is syntactically correct

Groups of tokens -> Parser -> correct / not correct

The Compilation Process

Phase III: Semantic analysis and code generation

The compiler analyzes the meaning of the high-level language statement and generates the machine language instructions to carry out these actions

Groups of tokens -> Code Generator -> Machine language

The Compilation Process

Phase IV: Code optimization

The compiler takes the generated code and sees whether it can be made more efficient

Machine language -> Code Optimizer -> Machine language

Overall Execution Sequence on a High-Level Language Program

The Compilation Process

Source program

Original high-level language program

Object program

Machine language translation of the source program

Phase I: Lexical Analysis

Lexical analyzer

The program that performs lexical analysis

More commonly called a scanner

Job of lexical analyzer

Group input characters into tokens

• Tokens: Syntactical units that are treated as single, indivisible entities for the purposes of translation

Classify tokens according to their type

Phase I: Lexical Analysis

Program statement

sum = sum + a[i];

Digital perspective:

tab,s,u,m,blank,=,blank,s,u,m,blank,+,blank,a,[,i,],;

Tokenized:

sum,=,sum,+,a,[,i,],;

Phase I: Lexical Analysis

TOKEN TYPE    CLASSIFICATION NUMBER
Symbol        1
Number        2
=             3
+             4
-             5
;             6
==            7
If            8
Else          9
(             10
)             11
[             12
]             13
…

Typical Token Classifications

Phase I: Lexical Analysis

Lexical Analysis Process

1. Discard blanks, tabs, etc. - look for beginning of token.
2. Put characters together
3. Repeat step 2 until end of token
4. Classify and save token
5. Repeat steps 1-4 until end of statement
6. Repeat steps 1-5 until end of source code

sum=sum+a[i]; -> Scanner ->

sum 1
= 3
sum 1
+ 4
a 1
[ 12
i 1
] 13
; 6

Phase I: Lexical Analysis

Input to a scanner

- A high-level language statement from the source program

Scanner’s output

- A list of all the tokens in that statement
- The classification number of each token found

sum=sum+a[i]; -> Scanner -> sum 1, = 3, sum 1, + 4, a 1, [ 12, i 1, ] 13, ; 6
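A scanner along these lines can be sketched in a few dozen lines of C. This is a minimal illustration, not a production lexer: the classification numbers follow the "Typical Token Classifications" table, while the names `scan`, `struct token`, and `MAX_TEXT`, the lowercase spelling of the keywords, and the choice to treat underscores as letters are assumptions made for the example.

```c
#include <ctype.h>
#include <string.h>

#define MAX_TEXT 32

struct token { char text[MAX_TEXT]; int class_no; };

/* Words are keywords (if = 8, else = 9) or ordinary symbols (1). */
static int classify_word(const char *w)
{
    if (strcmp(w, "if") == 0)   return 8;
    if (strcmp(w, "else") == 0) return 9;
    return 1;
}

/* Scan one statement into out[0..max-1]. Returns the number of tokens,
   or -1 on an unrecognized character, an overlong token, or a full buffer. */
int scan(const char *src, struct token *out, int max)
{
    static const char SINGLE[] = "+-;()[]";
    static const int SINGLE_NO[] = { 4, 5, 6, 10, 11, 12, 13 };
    int n = 0;

    while (*src != '\0') {
        size_t len = 1;
        int class_no;
        const char *hit;
        unsigned char c = (unsigned char)*src;

        if (isspace(c)) { src++; continue; }     /* step 1: discard blanks, tabs */
        if (n == max) return -1;

        if (isalpha(c) || c == '_') {            /* steps 2-3: gather a word */
            while (isalnum((unsigned char)src[len]) || src[len] == '_') len++;
            class_no = 0;                        /* decided after copying */
        } else if (isdigit(c)) {                 /* steps 2-3: gather a number */
            while (isdigit((unsigned char)src[len])) len++;
            class_no = 2;
        } else if (c == '=') {                   /* "=" or "==" */
            if (src[1] == '=') { len = 2; class_no = 7; }
            else class_no = 3;
        } else if ((hit = strchr(SINGLE, c)) != NULL) {
            class_no = SINGLE_NO[hit - SINGLE];
        } else {
            return -1;
        }
        if (len >= MAX_TEXT) return -1;

        memcpy(out[n].text, src, len);           /* step 4: classify and save */
        out[n].text[len] = '\0';
        out[n].class_no = class_no ? class_no : classify_word(out[n].text);
        n++;
        src += len;                              /* step 5: on to the next token */
    }
    return n;
}
```

For the statement `sum = sum + a[i];` the function returns 9 tokens with the classification numbers 1, 3, 1, 4, 1, 12, 1, 13, 6.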

Phase II: Parsing

Parsing phase

A compiler determines whether the tokens recognized by the scanner are a syntactically legal statement

Performed by a parser

Phase II: Parsing

Output of a parser

A parse tree, if such a tree exists

An error message, if a parse tree cannot be constructed

Successful construction of a parse tree is proof that the statement is correctly formed

Example

High-level language statement: a = b + c

Grammars, Languages, and BNF

Syntax

The grammatical structure of the language

The parser must be given the syntax of the language

BNF (Backus-Naur Form)

Most widely used notation for representing the syntax of a programming language

literal_expression ::= integer_literal | float_literal | string | character

Grammars, Languages, and BNF

In BNF

The syntax of a language is specified as a set of rules (also called productions)

A grammar

• The entire collection of rules for a language

Structure of an individual BNF rule

left-hand side ::= “definition”

Grammars, Languages, and BNF

BNF rules use two types of objects on the right-hand side of a production

Terminals

• The actual tokens of the language

• Never appear on the left-hand side of a BNF rule

Nonterminals

• Intermediate grammatical categories used to help explain and organize the language

• Must appear on the left-hand side of one or more rules

Grammars, Languages, and BNF

Goal symbol

The highest-level nonterminal

The nonterminal object that the parser is trying to produce as it builds the parse tree

All nonterminals are written inside angle brackets

Java BNF

BNF Example

<postal-address> ::= <name-part> <street-address> <zip-part>
<name-part> ::= <personal-part> <last-name> <opt-jr-part> <EOL> | <personal-part> <name-part>
<personal-part> ::= <first-name> | <initial> "."
<street-address> ::= <opt-apt-num> <house-num> <street-name> <EOL>
<zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>
<opt-jr-part> ::= "Sr." | "Jr." | <roman-numeral> | ""

Identify the following: goal symbol, terminals, nonterminals, an individual rule

Is this a legal postal address?

Steve Moses Sr.
215 Rose Ave.
Everywhere, NC 43563

Parsing Concepts and Techniques

Fundamental rule of parsing:

By repeated applications of the rules of the grammar -

If the parser can convert the sequence of input tokens into the goal symbol
    the sequence of tokens is a syntactically valid statement of the language
else
    the sequence of tokens is not a syntactically valid statement of the language
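The accept-or-reject decision above can be illustrated with a small parser in C. Note the swap in technique: the slides describe reducing tokens to the goal symbol, whereas this sketch uses recursive descent, which starts from the goal symbol and works downward; the valid/invalid outcome is the same. The three-rule grammar in the comments is a made-up toy grammar, the input is a list of classification numbers as a scanner might produce them, and names such as `is_valid_statement` are hypothetical.

```c
/* Token classification numbers, following the slides' table. */
enum { SYMBOL = 1, NUMBER = 2, ASSIGN = 3, PLUS = 4, MINUS = 5, SEMI = 6 };

struct parser { const int *tok; int n; int pos; };

/* Consume the next token if it has the expected classification number. */
static int match_tok(struct parser *p, int class_no)
{
    if (p->pos < p->n && p->tok[p->pos] == class_no) { p->pos++; return 1; }
    return 0;
}

/* <operand> ::= symbol | number */
static int operand(struct parser *p)
{
    return match_tok(p, SYMBOL) || match_tok(p, NUMBER);
}

/* <expression> ::= <operand> | <operand> + <expression> | <operand> - <expression> */
static int expression(struct parser *p)
{
    if (!operand(p)) return 0;
    while (match_tok(p, PLUS) || match_tok(p, MINUS))
        if (!operand(p)) return 0;
    return 1;
}

/* Goal symbol: <assignment> ::= symbol = <expression> ;
   Returns 1 if the whole token sequence is a valid statement, else 0. */
int is_valid_statement(const int *tokens, int n)
{
    struct parser p = { tokens, n, 0 };
    return match_tok(&p, SYMBOL) && match_tok(&p, ASSIGN) && expression(&p)
        && match_tok(&p, SEMI) && p.pos == n;
}
```

The token sequence 1 3 1 4 1 5 1 6 (for `a = b + c - d;`) is accepted, while 1 3 4 1 6 (for `a = + b;`) is rejected because no rule allows an expression to start with `+`.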

Parsing Example

<httpaddress> ::= http:// <hostport> [ / <path> ] [ ? <search> ]
<hostport> ::= <host> [ : <port> ]
<host> ::= <hostname> | <hostnumber>
<hostname> ::= <ialpha> [ . <hostname> ]
<hostnumber> ::= <digits> . <digits> . <digits> . <digits>
<port> ::= <digits>
<path> ::= <void> | <xpalphas> [ / <path> ]
<search> ::= <xalphas> [ + <search> ]
<xalpha> ::= <alpha> | <digit> | <safe> | <extra> | <escape>
<xalphas> ::= <xalpha> [ <xalphas> ]
<xpalpha> ::= <xalpha> | +
<xpalphas> ::= <xpalpha> [ <xpalphas> ]
<ialpha> ::= <alpha> [ <xalphas> ]
<alpha> ::= a | b | … | z | A | B | … | Z
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<safe> ::= $ | - | _ | @ | . | & | ~
<extra> ::= ! | * | " | ' | ( | ) | : | ; | , | <space>
<escape> ::= % <hex> <hex>
<hex> ::= <digit> | a | b | c | d | e | f | A | B | C | D | E | F
<digits> ::= <digit> [ <digits> ]
<void> ::=

Is the following http address legal?

http://www.csm.astate.edu/~rossa/cs3543/bnf.html

Parsing Concepts and Techniques

Look-ahead parsing algorithms - intelligent parsers

One of the biggest problems in building a compiler is designing a grammar that:

Includes every valid statement that we want to be in the language

Excludes every invalid statement that we do not want to be in the language

Parsing Concepts and Techniques

Another problem in constructing a compiler: Designing a grammar that is not ambiguous

An ambiguous grammar allows the construction of two or more distinct parse trees for the same statement

NOT GOOD - multiple interpretations
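To see why multiple parse trees are a problem, consider a hypothetical ambiguous rule `<expression> ::= <expression> - <expression> | number` (not taken from the slides) and the statement 8 - 3 - 2. The rule permits two distinct trees, and the C sketch below, with made-up names such as `struct node` and `eval`, shows that they evaluate to different results.

```c
#include <stddef.h>

/* A parse tree node: a leaf holds a number, an inner node is a subtraction. */
struct node { const struct node *left, *right; int value; };

/* Evaluate a tree bottom-up. */
int eval(const struct node *t)
{
    if (t->left == NULL)                      /* leaf */
        return t->value;
    return eval(t->left) - eval(t->right);
}

static const struct node N8 = { NULL, NULL, 8 };
static const struct node N3 = { NULL, NULL, 3 };
static const struct node N2 = { NULL, NULL, 2 };

/* Parse tree 1 groups the statement as (8 - 3) - 2. */
static const struct node LEFT_INNER = { &N8, &N3, 0 };
static const struct node TREE_LEFT  = { &LEFT_INNER, &N2, 0 };

/* Parse tree 2 groups the same tokens as 8 - (3 - 2). */
static const struct node RIGHT_INNER = { &N3, &N2, 0 };
static const struct node TREE_RIGHT  = { &N8, &RIGHT_INNER, 0 };
```

`eval(&TREE_LEFT)` yields 3 and `eval(&TREE_RIGHT)` yields 7, so the same statement would have two meanings.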

Phase III: Semantics and Code Generation

Semantic analysis

The compiler makes a first pass over the parse tree to determine whether all branches of the tree are semantically valid

• If they are valid
    the compiler can generate machine language instructions
  else
    there is a semantic error; machine language instructions are not generated

Phase III: Semantics and Code Generation

Semantic analysis

Syntactically correct, but semantically incorrect

example:

sum = a + b;

int a;
double sum;
char b;

data type mismatch

Semantic records

a      integer
sum    double
b      char

Phase III: Semantics and Code Generation

Semantic analysis

Parse tree with semantic records

<expression> [semantic record: temp ?]
    <expression> [semantic record: a integer]
    +
    <expression> [semantic record: b char]

Phase III: Semantics and Code Generation

Semantic analysis

Parse tree with semantic records

<expression> [semantic record: temp integer]
    <expression> [semantic record: a integer]
    +
    <expression> [semantic record: b integer]
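The check performed on the two parse trees can be sketched as a lookup in the semantic records followed by a type comparison. The C code below is an illustration under stated assumptions: it applies the slides' strict rule that both operands of + must have the same type (real C would instead convert the operands implicitly), and the names `lookup`, `check_add`, and `struct semantic_record` are hypothetical.

```c
#include <string.h>

enum type { TYPE_INTEGER, TYPE_DOUBLE, TYPE_CHAR, TYPE_ERROR };

/* A semantic record: what the compiler knows about one declared name. */
struct semantic_record { const char *name; enum type type; };

/* Look up a name; returns TYPE_ERROR if it was never declared. */
enum type lookup(const struct semantic_record *recs, int n, const char *name)
{
    int i;
    for (i = 0; i < n; i++)
        if (strcmp(recs[i].name, name) == 0)
            return recs[i].type;
    return TYPE_ERROR;
}

/* Type of the temporary produced by <expression> + <expression>.
   Assumption: operands must match exactly; otherwise it is a mismatch. */
enum type check_add(enum type left, enum type right)
{
    if (left == TYPE_ERROR || right == TYPE_ERROR || left != right)
        return TYPE_ERROR;
    return left;
}
```

With records for `a` (integer) and `b` (char), `check_add` reports TYPE_ERROR, matching the "temp ?" case; with two integers it returns TYPE_INTEGER.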

Phase III: Semantics and Code Generation

Code generation

Compiler makes a second pass over the parse tree to produce the translated code
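As a rough illustration of code generation, the C sketch below emits accumulator-style instructions for a flat assignment such as a = b + c - d. It is a simplification: it walks a flat operand list rather than a parse tree, it handles only + and -, and the function name `generate` and its parameters are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Emit code for: target = operands[0] ops[0] operands[1] ops[1] ...
   ops is a string of '+' and '-' with exactly n_operands - 1 characters.
   Returns 0 on success, -1 on bad input or if out is too small. */
int generate(const char *target, const char **operands, const char *ops,
             int n_operands, char *out, size_t out_size)
{
    size_t used = 0;
    int i, w;

    if (n_operands < 1 || strlen(ops) != (size_t)(n_operands - 1))
        return -1;

    for (i = 0; i <= n_operands; i++) {
        const char *mnemonic, *name;

        if (i == 0)                 { mnemonic = "LOAD";     name = operands[0]; }
        else if (i == n_operands)   { mnemonic = "STORE";    name = target; }
        else if (ops[i - 1] == '+') { mnemonic = "ADD";      name = operands[i]; }
        else if (ops[i - 1] == '-') { mnemonic = "SUBTRACT"; name = operands[i]; }
        else return -1;

        /* One high-level statement becomes several instructions. */
        w = snprintf(out + used, out_size - used, "%s %s\n", mnemonic, name);
        if (w < 0 || (size_t)w >= out_size - used)
            return -1;
        used += (size_t)w;
    }
    return 0;
}
```

For target "A", operands B, C, D and ops "+-", the buffer receives LOAD B, ADD C, SUBTRACT D, STORE A, one instruction per line.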

Phase IV: Code Optimization

Two types of optimization

Local
Global

Local optimization

The compiler looks at a very small block of instructions and tries to determine how it can improve the efficiency of this local code block

Relatively easy; included as part of most compilers

Phase IV: Code Optimization

Examples of possible local optimizations

Constant evaluation x = 1 + 1 ---> x = 2

Strength reduction x = x * 2 ---> x = x + x

Eliminating unnecessary operations
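Two of these local optimizations can be sketched on a single instruction of the form dest = left op right. The representation below (`struct operand`, `struct instr`, and the function `optimize_local`) is invented for the example, variables are single letters, and integer overflow during constant evaluation is ignored.

```c
/* An operand is either a compile-time constant or a variable name. */
struct operand { int is_const; int value; char name; };

/* dest = left op right; op '=' means "dest = left" (right is unused). */
struct instr { char dest; char op; struct operand left, right; };

/* Rewrite the instruction in place. Returns 1 if it was changed, else 0. */
int optimize_local(struct instr *in)
{
    /* Constant evaluation: x = 1 + 1  --->  x = 2 */
    if (in->left.is_const && in->right.is_const &&
        (in->op == '+' || in->op == '-' || in->op == '*')) {
        int l = in->left.value, r = in->right.value;
        in->left.value = (in->op == '+') ? l + r
                       : (in->op == '-') ? l - r
                       : l * r;
        in->op = '=';
        return 1;
    }
    /* Strength reduction: x = x * 2  --->  x = x + x */
    if (in->op == '*' && !in->left.is_const &&
        in->right.is_const && in->right.value == 2) {
        in->op = '+';
        in->right = in->left;
        return 1;
    }
    return 0;
}
```

Applied to x = 1 + 1 the instruction becomes x = 2; applied to x = x * 2 it becomes x = x + x, and a second call on the result leaves it unchanged.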

Phase IV: Code Optimization

Global optimization

The compiler looks at large segments of the program to decide how to improve performance

Much more difficult; usually omitted from all but the most sophisticated and expensive production-level “optimizing compilers”

Optimization cannot make an inefficient algorithm efficient - “only makes an efficient algorithm more efficient”

Summary

A compiler is a piece of system software that translates high-level languages into machine language

Goals of a compiler: Correctness and the production of efficient and concise code

Source program: High-level language program

Summary

Object program: The machine language translation of the source program

Phases of the compilation process

Phase I: Lexical analysis

Phase II: Parsing

Phase III: Semantic analysis and code generation

Phase IV: Code optimization

