ISBN 0-321-19362-8 Lecture 03 Describing Syntax and Semantics


Copyright © 2004 Pearson Addison-Wesley. All rights reserved.

Lecture 03 Topics

• Introduction
• Syntax Description
• Static Semantics - Attribute Grammars
• Dynamic Semantics
  – Operational Semantics
  – Axiomatic Semantics
  – Denotational Semantics


Introduction

• Who must use language definitions?
  – Other language designers
  – Implementors
  – Programmers (the users of the language)

• Syntax - the form or structure of the expressions, statements, and program units

• Semantics - the meaning of the expressions, statements, and program units


Syntax Description

• A sentence is a string of characters over some alphabet

• A language is a set of sentences

• A lexeme is the lowest-level syntactic unit of a language (e.g., *, sum, 17, ;)

• A token is a category of lexemes (e.g., mult_op, identifier, int_literal, semicolon)
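The lexeme/token distinction can be sketched with a small scanner. This is only an illustration: the token names follow the examples above, while the regular expressions and the `tokenize` function are assumptions made for this sketch.

```python
import re

# Hypothetical token categories, named after the examples on the slide.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("mult_op", r"\*"),
    ("semicolon", r";"),
    ("skip", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (token, lexeme) pairs; raise ValueError on an unknown character."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":  # whitespace separates lexemes but is not a token
            pairs.append((match.lastgroup, match.group()))
        pos = match.end()
    return pairs
```

For example, `tokenize("sum * 17;")` pairs each lexeme with its category: `sum` with `identifier`, `*` with `mult_op`, `17` with `int_literal`, and `;` with `semicolon`.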


Syntax Description

• Formal approaches to describing syntax
  – Recognizers - used in compilers (we will look at them in Chapter 4)
  – Generators - generate the sentences of a language (what we'll study in this chapter)


Syntax Description

• Context-Free Grammars
  – A class of languages developed by Noam Chomsky in the mid-1950s
  – Meant to describe the syntax of natural languages
  – Can be used to describe the syntax of sentences of programming languages

• Regular Grammars (Regular Expressions)
  – Another class of languages developed by Noam Chomsky
  – Can be used to describe the syntax of tokens of programming languages


Syntax Description

• Backus-Naur Form
  – Invented in 1959 by John Backus to describe Algol 58, modified by Peter Naur
  – BNF is a metalanguage, which is used to describe another language
  – BNF is equivalent to context-free grammars
  – In BNF, abstractions are used to represent classes of syntactic structures; they act like syntactic variables (also called nonterminal symbols)


Syntax Description

• A grammar is a finite nonempty set of rules

• A rule has a left-hand side (LHS) and a right-hand side (RHS), and consists of terminal and nonterminal symbols

• An abstraction (or nonterminal symbol) can have more than one RHS

<stmt> → <single_stmt>
       | begin <stmt_list> end


Syntax Description

• Syntactic lists are described using recursion

<ident_list> → ident
             | ident, <ident_list>

• A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)


Syntax Description

• An example grammar:

<program> → <stmts>
<stmts> → <stmt> | <stmt> ; <stmts>
<stmt> → <var> = <expr>
<var> → a | b | c | d
<expr> → <term> + <term> | <term> - <term>
<term> → <var> | const


Syntax Description

• An example derivation:

<program> => <stmts>
          => <stmt>
          => <var> = <expr>
          => a = <expr>
          => a = <term> + <term>
          => a = <var> + <term>
          => a = b + <term>
          => a = b + const
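The derivation above can be replayed mechanically. The following is a minimal sketch, assuming the example grammar is stored as a dictionary and each step is given as the index of the alternative to apply; the `GRAMMAR` layout, the `choices` encoding, and the function name are assumptions made for illustration.

```python
# The example grammar: nonterminal -> list of alternative right-hand sides.
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>": [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"], ["c"], ["d"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
}


def leftmost_derivation(choices, start="<program>"):
    """Apply one rule alternative per step to the leftmost nonterminal.

    `choices` lists, for each step, the index of the alternative to use.
    Returns every sentential form produced, starting with the start symbol.
    Assumes each step still has a nonterminal left to expand.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Find the leftmost nonterminal in the current sentential form.
        index = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        form = form[:index] + GRAMMAR[form[index]][choice] + form[index + 1:]
        steps.append(" ".join(form))
    return steps
```

Calling `leftmost_derivation([0, 0, 0, 0, 0, 0, 1, 1])` reproduces the nine sentential forms of the derivation, ending with the sentence `a = b + const`.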


Syntax Description - Derivation

• Every string of symbols in the derivation is a sentential form
• A sentence is a sentential form that has only terminal symbols
• A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded
• A derivation may be neither leftmost nor rightmost


Syntax Description - Parse Tree

• A hierarchical representation of a derivation

                <program>
                    |
                 <stmts>
                    |
                 <stmt>
               /    |    \
          <var>     =     <expr>
            |           /    |    \
            a      <term>    +    <term>
                     |              |
                   <var>          const
                     |
                     b


Syntax Description

• A grammar is ambiguous iff it generates a sentential form that has two or more distinct parse trees


Syntax Description - Ambiguous Expression Grammar

<expr> → <expr> <op> <expr> | const
<op> → / | -

Two distinct parse trees for const - const / const:

                <expr>
              /   |    \
        <expr>   <op>   <expr>
       /  |  \     |       |
 <expr> <op> <expr> /     const
    |     |     |
  const   -   const

          <expr>
        /   |    \
  <expr>   <op>   <expr>
     |       |    /  |  \
   const     - <expr> <op> <expr>
                  |     |     |
                const   /   const
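One way to see the ambiguity is to count the parse trees a sentence has under this grammar. The sketch below is an illustration only: the function name and the token-list input are assumptions, and it counts trees by trying each operator as the root of the tree.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees under <expr> -> <expr> <op> <expr> | const, <op> -> / | -."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from <expr>.
        if hi - lo == 1:
            return 1 if tokens[lo] == "const" else 0
        total = 0
        # Try every operator position as the root <op> of the tree.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("/", "-"):
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))
```

For `["const", "-", "const", "/", "const"]` the count is 2, which matches the definition of an ambiguous grammar: one sentence, more than one parse tree.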


Syntax Description - Unambiguous Expression Grammar

<expr> → <expr> - <term> | <term>
<term> → <term> / const | const
(Precedence of / over -)

• Derivation

<expr> => <expr> - <term>
       => <term> - <term>
       => const - <term>
       => const - <term> / const
       => const - const / const

             <expr>
           /   |    \
     <expr>    -    <term>
       |           /  |   \
     <term>   <term>  /   const
       |         |
     const     const


Syntax Description

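• Operator associativity can also be indicated by a grammar

1: <expr> → <expr> + <expr> | const (ambiguous)
2: <expr> → <expr> + const | const (unambiguous, left associativity)

Parse trees for const + const + const:

1 (right-associative grouping, possible only with grammar 1):

        <expr>
      /   |    \
 <expr>   +   <expr>
    |        /   |   \
  const <expr>   +   <expr>
           |            |
         const        const

1/2 (left-associative grouping; in grammar 2 each right operand is const directly):

              <expr>
            /   |    \
       <expr>   +   <expr>
      /  |   \         |
 <expr>  +  <expr>   const
    |          |
  const      const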


Syntax Description

• Extended BNF (just abbreviations):
  – Optional parts are placed in brackets ([ ])

    <proc_call> → ident [( <expr_list>)]

  – Put alternative parts of RHSs in parentheses and separate them with vertical bars

    <term> → <term> (+ | -) const

  – Put repetitions (0 or more) in braces ({ })

    <ident> → letter {letter | digit}


Syntax Description - BNF & EBNF

• BNF:
  <expr> → <expr> + <term> | <expr> - <term> | <term>
  <term> → <term> * <factor> | <term> / <factor> | <factor>

• EBNF:
  <expr> → <term> {(+ | -) <term>}
  <term> → <factor> {(* | /) <factor>}
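The EBNF form maps naturally onto code, because each repetition in braces becomes a loop. The following is a rough sketch rather than a full parser: the slides do not define <factor>, so it is assumed here to be an unsigned integer literal, and the function names are illustrative.

```python
import re


def evaluate(text):
    """Evaluate text using <expr> -> <term> {(+ | -) <term>} and
    <term> -> <factor> {(* | /) <factor>}.

    Assumption: <factor> is an unsigned integer literal. Characters other than
    digits and the four operators are ignored by this simple scanner.
    """
    tokens = re.findall(r"\d+|[-+*/]", text)
    pos = 0

    def factor():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError("integer literal expected")
        value = int(tokens[pos])
        pos += 1
        return value

    def term():
        nonlocal pos
        value = factor()
        # The braces {...} in the EBNF rule become a loop.
        while pos < len(tokens) and tokens[pos] in ("*", "/"):
            op = tokens[pos]
            pos += 1
            right = factor()
            value = value * right if op == "*" else value / right
        return value

    def expr():
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            right = term()
            value = value + right if op == "+" else value - right
        return value

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result
```

Because `expr` calls `term` for each operand, `*` and `/` bind tighter than `+` and `-`, and the loops accumulate left to right, which gives left associativity. Division here is Python's true division, so results may be floats.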


Attribute Grammars

• Developed by Knuth, 1968
• Context-free grammars cannot describe all of the syntax of programming languages
• Additions to context-free grammars to carry some semantic info along through parse trees
• Primary value of attribute grammars:
  – Static semantics specification
  – Static semantics checking (compiler design)


Attribute Grammars

• Definition: An attribute grammar is a context-free grammar with the following additions:
  – For each grammar symbol x there is a set of attributes A(x)
  – Each rule has a set of attribute computation (or semantic) functions that define how attribute values of the nonterminals in the rule are computed
  – Each rule has a (possibly empty) set of predicate functions to check for attribute consistency


Attribute Grammars

• Let X0 → X1 ... Xn be a rule

• Semantic function f: <A(X1), ..., A(Xn)> → S(X0) computes a synthesized attribute S(X0) for X0; S(X0) ⊂ A(X0) (synthesize up)

• Semantic function f: <A(X0), ..., A(Xj-1)> → I(Xj), 1 ≤ j ≤ n, computes an inherited attribute I(Xj) for Xj; I(Xj) ⊂ A(Xj) (inherit down)

• Initially, there are intrinsic attributes on the leaves; a false predicate function indicates a violation of the syntax or static semantics


Attribute Grammars

• Example: For an expression of the form id = id + id, we require
  – id's can be either int_type or real_type
  – types of the two id's on the RHS must be the same
  – type of the id + id must match its expected type, constrained by the id on the LHS

• BNF:
  <assign> → <var> = <expr>
  <expr> → <var> + <var>
  <var> → id

• Define two attributes A(X):
  – actual_type as a synthesized attribute for <var> and <expr>
  – expected_type as an inherited attribute for <expr>


Attribute Grammars

• Syntax rule: <assign> → <var> = <expr>
• Semantic function: <expr>.expected_type ← <var>.actual_type

• Syntax rule: <expr> → <var>[1] + <var>[2]
• Semantic function: <expr>.actual_type ← <var>[1].actual_type
• Predicate functions:
  <var>[1].actual_type == <var>[2].actual_type
  <expr>.expected_type == <expr>.actual_type

• Syntax rule: <var> → id
• Semantic function: <var>.actual_type ← lookup(<var>.string)


Attribute Grammars

• Parse tree

            <assign>
          /    |     \
     <var>     =      <expr>
       |             /   |   \
       A       <var>[1]  +  <var>[2]
                  |            |
                  A            B


Attribute Grammars

• Parse tree decorated with attribute semantics

            <assign>
          /    |     \
     <var>     =      <expr>
       |             /   |   \
       A       <var>[1]  +  <var>[2]
                  |            |
                  A            B

Attribute values shown on the tree:
  actual_type = real_type
  expected_type = real_type
  actual_type = real_type
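The attribute rules for id = id + id can be sketched as a small checker. This is an illustration under stated assumptions: `SYMBOL_TABLE` is a hypothetical stand-in for whatever `lookup` consults, the variable `C` is invented to show a failing case, and the function takes the three identifiers directly rather than walking a parse tree.

```python
# Hypothetical symbol table standing in for lookup(); names and types are assumptions.
SYMBOL_TABLE = {"A": "real_type", "B": "real_type", "C": "int_type"}


def lookup(name):
    return SYMBOL_TABLE[name]


def check_assign(lhs, var1, var2):
    """Decorate and check the tree for `lhs = var1 + var2`.

    Returns the attribute values of <expr> if both predicates hold,
    otherwise raises TypeError.
    """
    # <var> -> id: actual_type is synthesized from the symbol table.
    lhs_type, var1_type, var2_type = lookup(lhs), lookup(var1), lookup(var2)
    # <assign> -> <var> = <expr>: expected_type is inherited from the LHS <var>.
    expected_type = lhs_type
    # <expr> -> <var>[1] + <var>[2]: actual_type is synthesized from <var>[1].
    actual_type = var1_type
    # Predicate functions.
    if var1_type != var2_type:
        raise TypeError(f"operand types differ: {var1_type} and {var2_type}")
    if expected_type != actual_type:
        raise TypeError(f"expected {expected_type}, found {actual_type}")
    return {"expected_type": expected_type, "actual_type": actual_type}
```

With this table, `check_assign("A", "A", "B")` succeeds with both attributes equal to `real_type`, while `check_assign("A", "A", "C")` fails the first predicate because the operand types differ.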




Dynamic Semantics

• Describing the Meanings of Programs

• There is no single widely acceptable notation or formalism for describing dynamic semantics
  – Operational Semantics
  – Axiomatic Semantics
  – Denotational Semantics


Operational Semantics

• Describe the meaning of a program by executing its statements on a machine, either simulated or actual.
• The change in the state of the machine (memory, registers, etc.) defines the meaning of the statement
• To use operational semantics for a high-level language, a virtual machine is needed, because
  – A hardware pure interpreter would be too expensive
  – A software pure interpreter also has problems:
    • The detailed characteristics of the particular computer would make actions difficult to understand
    • Such a semantic definition would be machine-dependent


Operational Semantics

• Use a complete computer simulation
• The process:
  – Build a translator (translates source code to the machine code of an idealized computer)
  – Build a simulator for the idealized computer
• Evaluation of operational semantics:
  – The semantics of a high-level language depend on the low-level languages (machine code)
  – Good if low-level languages are informally defined (language manuals, etc.)
  – Extremely complex if it is defined formally (e.g., VDL)
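The translator-plus-simulator process can be sketched on a toy scale. Everything below is an assumption made for illustration: the idealized machine has one accumulator and the invented instructions LOAD, ADD, SUB, and STORE, and the translator handles only statements of the form target = left op right.

```python
def translate(target, left, op, right):
    """Translate `target = left op right` into code for a hypothetical idealized machine."""
    opcode = {"+": "ADD", "-": "SUB"}[op]
    return [("LOAD", left), (opcode, right), ("STORE", target)]


def simulate(program, memory):
    """Run instructions against a copy of `memory`; return the new machine state."""
    memory = dict(memory)
    accumulator = 0

    def value_of(operand):
        # An operand is either an integer constant or a variable name.
        return operand if isinstance(operand, int) else memory[operand]

    for opcode, operand in program:
        if opcode == "LOAD":
            accumulator = value_of(operand)
        elif opcode == "ADD":
            accumulator += value_of(operand)
        elif opcode == "SUB":
            accumulator -= value_of(operand)
        elif opcode == "STORE":
            memory[operand] = accumulator
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return memory
```

The meaning of `a = b + 1` is then the change of state it causes: `simulate(translate("a", "b", "+", 1), {"b": 4})` yields a memory in which `a` is 5 and `b` is unchanged.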


Axiomatic Semantics

• Based on formal logic (predicate calculus)

• Original purpose: formal program verification (or proof)

• Approach: Define axioms (inference rules or logical expressions) for each statement type in the language

• These logical expressions are called assertions


Axiomatic Semantics

• An assertion before a statement (a precondition) states the relationships and constraints among variables that are true at that point in execution

• An assertion following a statement is a postcondition

• A weakest precondition is the least restrictive precondition that will guarantee the postcondition

• Preconditions and postconditions together define the meaning of the statement


Axiomatic Semantics

• Axiomatic semantics representation:
  – {P} <stmt> {Q}
• Example:
  – {b > 0} a = b + 1 {a > 1}
  – <stmt>: a = b + 1
  – Postcondition: {a > 1}
  – Weakest precondition: {b > 0}
  – One possible precondition: {b > 10}


Axiomatic Semantics

• Program proof process
  – The postcondition for the whole program is the desired result.
  – Work back through the program to the first statement.
  – If the precondition on the first statement is the same as the program spec, the program is correct.


Axiomatic Semantics

• Derive axiomatic semantics for assignment statements (<assign> → <var> = <expr>):
  – Given <assign> {Q}
  – Find {P}, such that {P} <assign> {Q}
  – Key: P = wp(<assign>, Q), where wp is a function that substitutes <expr> for <var> in Q and returns the weakest precondition
• Example
  – Given x = 2 * y - 3 {x > 25}
  – wp(x = 2 * y - 3, x > 25) => 2 * y - 3 > 25 => y > 14 => P
• Axiomatic semantics
  – {y > 14} x = 2 * y - 3 {x > 25}
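The triple for x = 2 * y - 3 can be spot-checked by running the statement on sample states. This is a sketch, not a proof: the sample range is an arbitrary assumption, states are modelled as dictionaries, and all function names are illustrative.

```python
def holds(pre, stmt, post, states):
    """Check {pre} stmt {post} on every state in `states` that satisfies pre."""
    return all(post(stmt(state)) for state in states if pre(state))


def assign_x(state):
    # The statement x = 2 * y - 3, as a function from state to state.
    return {**state, "x": 2 * state["y"] - 3}


def post(state):
    return state["x"] > 25  # Q: x > 25


def wp(state):
    # Weakest precondition by substitution: Q with x replaced by 2 * y - 3.
    return post(assign_x(state))


# A finite sample of integer states; a check like this is evidence, not a proof.
STATES = [{"x": 0, "y": y} for y in range(-50, 51)]
```

On this sample, `wp` agrees with `y > 14` in every state, and the triple holds with precondition `y > 14` but fails with the weaker `y > 13`, since `y = 14` gives `x = 25`.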


Axiomatic Semantics

• Use axiomatic semantics for program proof:
  – Given a specification (theorem) {P' = y > 20} x = 2 * y - 3 {Q' = x > 25}
  – Is this theorem true?
• Define inference rule

  $$\frac{\{P\}\ stmt\ \{Q\},\quad P' \Rightarrow P,\quad Q \Rightarrow Q'}{\{P'\}\ stmt\ \{Q'\}}$$

  – Where {P} <stmt> {Q} is an axiom, "=>" means "implies", and {P'} <stmt> {Q'} is the derived theorem
  – P' = y > 20 => P = y > 14, and Q' is the same as Q, so the theorem is true


Axiomatic Semantics

• Derive axiomatic semantics for sequences (<stmt>[1]; <stmt>[2])
• {P1} <stmt>[1] {P2}
  {P2} <stmt>[2] {P3}
• Define inference rule

  $$\frac{\{P1\}\ stmt[1]\ \{P2\},\quad \{P2\}\ stmt[2]\ \{P3\}}{\{P1\}\ stmt[1];\ stmt[2]\ \{P3\}}$$

• Example

  {x < 2} S1: y = 3 * x + 1 {y < 7}
  {y < 7} S2: x = y + 3 {x < 10}
  =>
  {x < 2} y = 3 * x + 1; x = y + 3 {x < 10}


Axiomatic Semantics

• Derive axiomatic semantics for logical pretest loops (<loop> → while <test> do <loop_body> end):

  {P} <loop> {Q}

• Define inference rule:

  $$\frac{\{I \text{ and } test\}\ \text{loop\_body}\ \{I\}}{\{I\}\ \text{loop}\ \{I \text{ and } (\text{not } test)\}}$$

• Key: find I, the loop invariant (the inductive hypothesis), and then use I to find P, which guarantees Q at loop exit and guarantees that the loop terminates.
• I must be true at all times, unaffected by the evaluation of the loop-controlling Boolean expression and the loop body statements


Axiomatic Semantics

• I must be weak enough to be satisfied prior to the beginning of the loop, but when combined with the loop exit condition, it must be strong enough to force the truth of the postcondition
  – P => I (the loop invariant must be true initially)
  – {I} <test> {I} (evaluation of the test must not change the validity of I)
  – {I and <test>} <loop_body> {I} (I is not changed by executing the body of the loop)
  – (I and (not <test>)) => Q (if I is true and <test> is false, Q is implied)
  – The loop terminates (this can be difficult to prove)


Axiomatic Semantics

• Example of finding I:

  while y <> x do y = y + 1 end {y = x}

  Iteration 0: wp(no-op, {y = x}) => {y = x}
  Iteration 1: wp(y = y + 1, {y = x}) => {x = y + 1} => {y = x - 1}
  Iteration 2: wp(y = y + 1, {y = x - 1}) => {x - 1 = y + 1} => {y = x - 2}
  Iteration n: wp(y = y + 1, {y = x - (n - 1)}) => {x - (n - 1) = y + 1} => {y = x - n}

  {I} can be set to {y <= x}

• After verifying the characteristics of I, use it to find P, e.g., set P to I


Axiomatic Semantics

• Evaluation of axiomatic semantics

– Developing axioms and inference rules for all types of the statements in a language is difficult

– It is a good tool for correctness proofs, and an excellent framework for reasoning about programs, but it is not as useful for language users and compiler writers


Denotational Semantics

• Based on recursive function theory
• The most rigorous semantics description method
• Originally developed by Scott and Strachey (1970)


Denotational Semantics

• The process of building a denotational spec for a language (not necessarily easy):
  – Define a mathematical object for each language entity
  – Define a function that maps instances of the language entities onto instances of the corresponding mathematical objects


Denotational Semantics

• Example: Decimal Numbers
  – The following denotational semantics description maps decimal numbers as strings of symbols into numeric values

• <dec_num> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
            | <dec_num> (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)

• Define denotational semantics mapping function for decimals (Mdec):

  Mdec('0') = 0, Mdec('1') = 1, …, Mdec('9') = 9
  Mdec(<dec_num> '0') = 10 * Mdec(<dec_num>)
  Mdec(<dec_num> '1') = 10 * Mdec(<dec_num>) + 1
  …
  Mdec(<dec_num> '9') = 10 * Mdec(<dec_num>) + 9
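The equations for Mdec translate almost line for line into a recursive function. This is a minimal sketch: the name `m_dec` and the use of a Python string for the numeral are assumptions made for illustration.

```python
DIGITS = "0123456789"


def m_dec(numeral):
    """Map a decimal numeral (a string of digit symbols) to its integer value."""
    if not numeral or any(ch not in DIGITS for ch in numeral):
        raise ValueError(f"not a decimal numeral: {numeral!r}")
    last = DIGITS.index(numeral[-1])  # Mdec('0') = 0, ..., Mdec('9') = 9
    if len(numeral) == 1:
        return last
    # Mdec(<dec_num> 'd') = 10 * Mdec(<dec_num>) + d
    return 10 * m_dec(numeral[:-1]) + last
```

The recursion peels off the last digit, mirroring the left-recursive rule for <dec_num>: `m_dec("305")` is `10 * m_dec("30") + 5`.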


Denotational Semantics

• Define denotational semantics for programs (VARMAP):
• Semantic mapping functions for programs or constructs map states to states (or values)
• The state of a program is the mapped values of all its current variables (i)

  s = {<i1, v1>, <i2, v2>, …, <in, vn>}

• Define VARMAP to be a mapping retrieval function that, when given a variable name and a state, returns the current mapped value of the variable

  VARMAP(ij, s) = vj

• With state s, Mdec('0', s) = 0


Denotational Semantics

• Define denotational semantics mapping function for expressions (Me):

  <expr> → <dec_num> | <var> | <binary_expr>
  <binary_expr> → <left_expr> <operator> <right_expr>
  <left_expr> → <dec_num> | <var>
  <right_expr> → <dec_num> | <var>
  <operator> → + | *

• Mathematical objects: the set of integers Z ∪ {error}


Denotational Semantics

Me(<expr>, s) =
  case <expr> of
    <dec_num> => Mdec(<dec_num>, s)
    <var> =>
      if VARMAP(<var>, s) = undef
      then error
      else VARMAP(<var>, s)
    <binary_expr> =>
      if (Me(<binary_expr>.<left_expr>, s) = undef OR
          Me(<binary_expr>.<right_expr>, s) = undef)
      then error
      else if (<binary_expr>.<operator> = '+')
      then Me(<binary_expr>.<left_expr>, s) + Me(<binary_expr>.<right_expr>, s)
      else Me(<binary_expr>.<left_expr>, s) * Me(<binary_expr>.<right_expr>, s)
  ...
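The case analysis for Me can be sketched as a function over a small expression representation. The representation is an assumption made for this sketch: an `int` stands for a <dec_num> already mapped by Mdec, a `str` for a <var>, and a tuple `(left, operator, right)` for a <binary_expr>; `None` plays the role of undef and the string `"error"` the role of the error element.

```python
ERROR = "error"  # stands for the error element of Z ∪ {error}


def varmap(name, state):
    """VARMAP: return the value bound to `name`, or None (undef) if it has none."""
    return state.get(name)


def m_e(expr, state):
    """Me: map an expression and a state to an integer or ERROR."""
    if isinstance(expr, int):       # <dec_num>
        return expr
    if isinstance(expr, str):       # <var>
        value = varmap(expr, state)
        return ERROR if value is None else value
    left, operator, right = expr    # <binary_expr>
    left_value, right_value = m_e(left, state), m_e(right, state)
    if ERROR in (left_value, right_value):
        return ERROR
    # As in the definition, any operator other than '+' is treated as '*'.
    return left_value + right_value if operator == "+" else left_value * right_value
```

For instance, in the state `{"x": 4}`, the expression `("x", "+", 3)` maps to 7, while `("x", "+", "z")` maps to `ERROR` because `z` is undefined.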


Denotational Semantics

• Define denotational semantics mapping function for assignment statements (Ma):
  – <assign> → <var> = <expr>

Ma(<assign>, s) =
  if Me(<expr>, s) = error
  then error
  else s' = {<i1', v1'>, <i2', v2'>, ..., <in', vn'>},
    where for j = 1, 2, ..., n,
      vj' = VARMAP(ij, s)  if ij <> <var>
          = Me(<expr>, s)  if ij == <var>
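Ma can be sketched as a function that returns a new state instead of mutating the old one. In this sketch the state is a dictionary, and `expr_meaning` is an assumed stand-in for Me(<expr>, ·): any function from a state to an integer or the error value.

```python
ERROR = "error"


def m_a(var, expr_meaning, state):
    """Ma: map `var = <expr>` and a state to a new state, or ERROR.

    `expr_meaning` plays the role of Me(<expr>, .): a function from a state
    to an integer or ERROR.
    """
    value = expr_meaning(state)
    if value == ERROR:
        return ERROR
    # Every pair keeps its old value except the one for `var`.
    new_state = dict(state)
    new_state[var] = value
    return new_state
```

One difference from the definition is worth noting: if `var` is not yet in the state, this sketch adds it, whereas the definition only ranges over the existing pairs.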


Denotational Semantics

• Define denotational semantics mapping function for logical pretest loops (Ml):
  – <loop> → while <test> do <loop_body> end

Ml(<loop>, s) =
  if Mt(<test>, s) = undef
  then error
  else if Mt(<test>, s) = false
  then s
  else if Mlb(<loop_body>, s) = error
  then error
  else Ml(<loop>, Mlb(<loop_body>, s))
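The recursive definition of Ml can be written directly as a recursive function. In this sketch, `test_meaning` and `body_meaning` are assumed stand-ins for Mt and Mlb, states are dictionaries, `None` represents undef, and the string `"error"` represents the error value.

```python
ERROR = "error"


def m_l(test_meaning, body_meaning, state):
    """Ml: map a pretest loop and a state to the final state, or ERROR.

    `test_meaning` plays the role of Mt (state -> True, False, or None for undef);
    `body_meaning` plays the role of Mlb (state -> state or ERROR).
    """
    verdict = test_meaning(state)
    if verdict is None:
        return ERROR
    if verdict is False:
        return state
    next_state = body_meaning(state)
    if next_state == ERROR:
        return ERROR
    # The loop's meaning is defined recursively on the state after one body execution.
    return m_l(test_meaning, body_meaning, next_state)
```

Applied to `while y <> x do y = y + 1 end` starting from `{"x": 5, "y": 2}`, the function returns the state in which `y` is 5. Python limits recursion depth, so this sketch only suits loops with few iterations; a nonterminating loop has no meaning under the definition and here would exhaust the stack.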


Denotational Semantics

• The meaning of the loop is the value of the program variables after the statements in the loop have been executed the prescribed number of times, assuming there have been no errors

• In essence, the loop has been converted from iteration to recursion, where the recursive control is mathematically defined by other recursive state mapping functions

• Recursion, when compared to iteration, is easier to describe with mathematical rigor


Denotational Semantics

• Evaluation of denotational semantics:
  – Provides a rigorous way to think about programs
  – Can be an aid to language design
  – Little use for language users
