46
Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth Programming Languages 2nd edition Tucker and Noonan

Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Embed Size (px)

Citation preview

Page 1: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Chapter 2Syntax

A language that is simple to parse for the compiler is also simple to parse for the human programmer.

N. Wirth

Programming Languages2nd edition

Tucker and Noonan

Page 2: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

2.1 Grammars2.1.1 Backus-Naur Form2.1.2 Derivations2.1.3 Parse Trees

2.1.4 Associativity and Precedence2.1.5 Ambiguous Grammars

2.2 Extended BNF2.3 Syntax of a Small Language: Clite

2.3.1 Lexical Syntax2.3.2 Concrete Syntax

2.4 Compilers and Interpreters2.5 Linking Syntax and Semantics

2.5.1 Abstract Syntax2.5.2 Abstract Syntax Trees2.5.3 Abstract Syntax of Clite

Contents

Page 3: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Expr Expr + Term | Expr – Term | TermTerm Term * Factor | Term / Factor |

Term % Factor | FactorFactor Primary ** Factor | PrimaryPrimary 0 | ... | 9 | ( Expr )

Red indicates a terminal of G1, blue indicates meta-symbols (symbols that aren’t part of the language but are used to describe the language)

Example: 10 * ( 5 – 3)

Grammar G1: Review

Page 4: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Motivation for using a subset of C:

GrammarLanguage (pages) ReferencePascal 5 Jensen & WirthC 6 Kernighan &

RichieC++ 22 StroustrupJava 14 Gosling, et. al.

The Clite grammar fits on one page (next 3 slides), so it’s a far better tool for studying language design.

2.3 Syntax of a Small Language: Clite

Page 5: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Program int main ( ) { Declarations Statements }

Declarations { Declaration }

Declaration Type Identifier [ [Integer ]] { , Identifier

[[Integer ] ] };

Type int | bool | float | char

Statements { Statement }

Statement ; | Block | Assignment | IfStatement |

WhileStatement

Block { Statements }

Assignment Identifier [ [ Expression ] ] = Expression ;

IfStatement if ( Expression ) Statement [ else Statement ]

WhileStatement while ( Expression ) Statement

Clite EBNF Grammar: StatementsFig. 2.7

Page 6: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Expression Conjunction { || Conjunction }Conjunction Equality { && Equality } Equality Relation [ EquOp Relation ] EquOp == | != Relation Addition [ RelOp Addition ] RelOp < | <= | > | >= Addition Term { AddOp Term } AddOp + | - Term Factor { MulOp Factor } MulOp * | / | % Factor [ UnaryOp ] Primary UnaryOp - | ! Primary Identifier [ [ Expression ] ] | Literal | ( Expression ) |

Type ( Expression )

Fig. 2.7 Clite Grammar: Expressions

Page 7: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Identifier Letter { Letter | Digit } Letter a | b | ... | z | A | B | ... | Z Digit 0 | 1 | ... | 9 Literal Integer | Boolean | Float | Char Integer Digit { Digit } Boolean true | false Float Integer . Integer Char ‘ ASCII Char ‘

(ASCII Char is the set of ASCII characters)

Clite grammar: lexical level

Page 8: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

• 13 grammar rules – compare to 4 pages, for C++ • Metabraces { }(0 or more) is interpreted to mean

left associativity; e.g., ◦ Addition → Term { AddOp Term }

AddOp → + | - • Metabrackets [ ] (optional) means an addition can

only be followed by one or no relational operators plus another addition.

• Relation Addition [ RelOp Addition ]◦ RelOp < | <= | > | >=

◦ (no a > b > c for example)

Clite Grammar - Discussion

Page 9: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

• Comments• The significance of whitespace• Distinguishing one token <= from two

tokens < =• Distinguishing identifiers from keywords

like if

Issues Not Addressed by this Grammar

Page 10: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

The Clite grammar has two levels◦ lexical level (described by lexical syntax)◦ syntactic level (described by concrete

syntax) They correspond to two separate parts of

a compiler. The issues on the previous slide are

lexical issues.

Grammar Levels

Page 11: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Examples of lexical entities (tokens):

◦ Identifiers e.g., numbr1, X◦ Literals e.g., 123, 'x', 3.25, true◦ Keywords bool char else false float

if int main true while◦ Operators = || && == != < <= > >= + -

* / ! %◦ Punctuation ; , { } ( )◦ Char e.g.,‘?’

2.3.1 Lexical Syntax & Analysis

Page 12: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Whitespace is any space, tab, end-of-line character (or characters), or character sequence inside a comment

No token may contain embedded whitespace (unless it is a character or string literal) Example:

>= one token> = two tokens

Whitespace

Page 13: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

while ( a <= b) legal - spacing between tokens

while(a<=b) also legal - spacing not needed

while (a < = b) no lexical errors but illegal syntactically – lexer would identify tokens while, (, a, <, =, b, )

The Role of Whitespace

Page 14: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Clite uses // comment style of C++ Not defined in Clite grammar (but could

be) Instead, it’s defined outside the grammar The use of whitespace to differentiate

between one and two character operators is also defined outside the grammar.

Comments

Page 15: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

• Sequence of letters and digits, starting with a letter

“if” is an identifier which also is a keyword

• Keywords versus reserved words:◦ Keyword: predefined by the language

◦ Reserved word: can only be used as defined.

◦ In most languages all keywords are also reserved, but in a few; e.g., Pascal, a subset of the keyword identifiers are predefined but not reserved (and can be redefined by the programmer). Implications? Flexibility, confusion …

Clite Identifiers

Page 16: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Concrete syntax of a language is the set of rules for writing correct programs

The structure of a specific program can be represented by a parse tree, based on the concrete syntax of the language, using the stream of Tokens identified during lexical analysis

The root of the parse tree is the Start Symbol of the language (Program, in Clite).

2.3.2 Concrete Syntax

Page 17: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

• Clite’s expression rules are non-ambiguous with respect to precedence and associativity◦ Rule ordering defines precedence; rule

format defines associativity.• C/C++ expression grammar definition

is ambiguous – precedence and associativity are specified separately.

Clite Grammar - Discussion

Page 18: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Clite Operator AssociativityUnary - ! none* / left+ - left< <= > >= none (i.e., no a < b <= c)== != none&& left|| left

Associativity and Precedence

Page 19: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

… are non-associative.(an idea borrowed from Ada)

Why is this important?In C & C++, the expression:

if (10 < x < 20)is not equivalent to if (10 < x && x < 20)

But it is error-free! So, what does it mean?

Clite Relational Operators

Page 20: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Grammar rules don’t specify the operand types to be used with various operators; e.g., is

true + 13 a legal expression? What is the type of the expression 123.78 + 37 ?

These are type and semantic issues, not lexical or syntax.

Syntax versus Semantics

Page 21: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

2.4 Compilers and Interpreters

Figure 2.8: Major Stages in the Compilation Process

Lexical Analyzer(lexer)

SyntacticAnalyzer(parser)

Semantic Analyzer

CodeOptimizer

CodeGenerator

Toke

ns

Abs

trac

t Syn

tax

Machine Code

Inte

rmed

iate

Cod

e (I

C)

SourceProgram

Inte

rmed

iate

Cod

e (I

C)

Page 22: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Input: characters (the program) Output: tokens & token type Lexical grammars are simpler than syntax

grammars Often generated automatically by lexical

analyzer generating programs

Lexer

Page 23: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

• Often based on BNF/EBNF grammar• Input: tokens• Output: abstract syntax tree or some

other representation of the program• Abstract syntax: similar to a concrete

parse tree but with punctuation, many nonterminals discarded

Parser (Syntax Analyzer)

Page 24: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

• Typical tasks:◦ Check that all identifiers are declared◦ Perform type checking for expressions,

assignments, …◦ Insert implied conversion operators (i.e., make

them explicit)• Context free grammars can’t express the

semantic rules that are needed for this phase of translation.

• Output: Intermediate code (IC) tree, modified abstract syntax tree representation.

Semantic Analysis/Type Checker

Page 25: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Purpose: Improve the run-time performance of the object code◦ Usually, to make it run faster◦ Other possibilities: reduce amount of storage

required Drawback: optimization is time-

consuming; slows down debugging Output: Intermediate code, similar to

abstract syntax notation; closer to machine code

Code Optimization

Page 26: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

• Evaluate constant expressions at compile-time

• In-line expansion (of function calls)• Loop unrolling• Reorder code to improve cache

performance• Eliminate common sub-expressions• Eliminate unnecessary code• Store local variables/intermediate results

in registers rather than on the stack or elsewhere in memory

Code Optimization - Techniques

Page 27: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

• Output: machine code• Instruction selection, register

management• “Peephole” optimization: look at a small

segment of machine code, make it more efficient; e.g.,◦ x = y; →→ load y, R0

store R0, xz = x * 2; load x, R0 // redundant code

mul R0, #2

Code Generation

Page 28: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Replaces last 2 phases of a compiler with direct execution

Input: ◦ Mixed: generates & uses intermediate

code/abstract syntax ◦ Pure: start from stream of ASCII characters

each time a statement is executed Mixed interpreters

◦ Java, Perl, Python, Haskell, Scheme Pure interpreters:

◦ most Basics, shell commands

Interpreter

Page 29: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Source code: a = x + y; Compiler-generated object code:

load r0, x;add r0, y;store r0, a;

Will be executed later, with remainder of program

Interpreter: Call an interpretive routine to actually perform the operation.

e.g., add(x, y, a);

Interpreter versus Compiler

Page 30: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

It’s not the case that the lexical analyzer identifies all the tokens, and then the parser analyzes all the tokens, and then the type/semantic analysis is performed.

Instead, parser repeatedly contacts lexer to get another token

As tokens are received they either do or don’t match the expected syntax◦ If there’s a match, perform any type or semantic

testing, possibly generate int. code, call for another token.

Stages of Translation Are Interleaved

Page 31: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Output of parser: the concrete parse tree is large – probably more than needed for next phase◦ The compiler usually produces some more

compact representation of the program

Example: Fig. 2.9 (page 46)

2.5 Linking Syntax and Semantics

Page 32: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Parse Tree for z = x + 2*y;Fig. 2.9

Page 33: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

The shape of the parse tree reveals the meaning of the program.

So as output of syntax analysis we want a tree that removes its inefficiency and keeps its meaning. ◦ Remove separator/punctuation terminal

symbols◦ Remove all trivial nonterminals◦ Replace remaining nonterminals with leaf

terminals Example: Fig. 2.10

Finding a More Efficient Tree

Page 34: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Compare: concrete & abstract parse trees for z = x + 2*y

Page 35: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Abstract Syntax

Removes unnecessary details but keeps the essential language elements; e.g., consider the following two equivalent loops:

Pascal C++while j < n do begin while (j < n) { j := j + 1; j = j + 1;end; } Essential information: 1) it is a loop, 2) its terminating condition is j >= n, and 3) its body increments the current value of j.

Page 36: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

• Purpose: an intermediate form of the source code

• Generated by the parser during syntax analysis

• Used during type checking/semantic analysis• Abstract syntax rules are defined by the

compiler (or interpreter).• One concrete syntax can have several

abstract syntaxes associated with it, depending on the design of the translator.

Abstract Syntax Tree

Page 37: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

LHS = RHS LHS names an abstract syntax class RHS either (1) gives a list of one or more

alternatives or (2) lists the essential elements of

the syntax class Compare to production rules in concrete

grammar

General Form of Abstract Syntax Rules

Page 38: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Simplified Abstract Assignment Syntax Figure 2.11

Assignment = Variable target; Expression sourceExpression = Variable| Value | Binary | UnaryVariable = String idValue = Integer valueBinary = Operator op; Expression term1, term2Unary = UnaryOp op; Expression termOperator = +| - | * | / | !

Priority? Associativity? ….

Page 39: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

• Concrete:Assignment Identifier [ [ Expression ] ] = Expression ;

• Abstract:Assignment = Variable target; Expression source

Compare Concrete to Abstract

Page 40: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Abstract Syntax Tree for z = x + 2 * y

target source

Binary

BinaryOperator

Operator

Variable

VariableValue

+

2 y*

x

z

Page 41: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

op term1 term2Binary node

op termUnary node

Binaries and unaries represent information that can be used for later processing

Binaries and Unaries

Page 42: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Abstract Syntax of Clite Assignments – Fig. 2.14

Assignment = Variable target; Expression sourceExpression = VariableRef | Value | Binary | UnaryVariableRef = Variable | ArrayRefVariable = String idArrayRef = String id; Expression indexValue = IntValue | BoolValue | FloatValue | CharValueBinary = Operator op; Expression term1, term2Unary = UnaryOp op; Expression termOperator = ArithmeticOp | RelationalOp | BooleanOpIntValue = Integer intValue…

Page 43: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Abstract Syntax as Java Classes - Examples

abstract class Expression { } abstract class VariableRef extends Expression { }class Variable extends VariableRef { String id; }class ArrayRef extends VariableRef { String id; Expression index}class Value extends Expression { … }class Binary extends Expression {

Operator op;Expression term1, term2;

}class Unary extends Expression {

UnaryOp op;Expression term;

}

Page 44: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Remaining Abstract Syntax of Clite (Declarations and Statements)Fig 2.14

Page 45: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

Lexical syntax: small, simple, defines language tokens

Concrete syntax: detailed, specific, defines correct programs, used to direct parsing algorithms(language specific, not implementation specific)

Abstract syntax: simpler than concrete, used to describe the structure of the intermediate code (implementation specific, not language specific) NOT INTENDED TO BE USED FOR PARSING

Semantics: program “meaning”, or runtime behavior

Summary

Page 46: Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

A syntax is ambiguous if a portion of a program has two or more possible interpretations (parse trees)

Non-ambiguous grammars can be written but some ambiguity may be tolerated to reduce grammar size.

Operator associativity and precedence can be defined by the concrete syntax

Compilers generate code for later execution while interpreters execute program statements as they are analyzed.

Summary