Principles of Software Engineering and Operating Systems Languages and Compilers SDAGE: Level I Dr Valery Adzhiev [email protected] Office: TA-121 2015-16 7. Semantic Analysis For some images: Copyright © 2009 Elsevier, Inc. All rights reserved


Principles of Software Engineering and Operating Systems

Languages and Compilers

SDAGE: Level I

Dr Valery Adzhiev

[email protected]

Office: TA-121

2015-16

7. Semantic Analysis

For some images: Copyright © 2009 Elsevier, Inc. All rights reserved

Contents

• Semantic Analysis Principles
• Attribute Grammars
• Decoration of Parse Trees
• Translation Schemes
• Construction of Syntax Tree
• Action Routines
• Tree Grammars and Decorating Syntax Tree
• Check Your Understanding

2015-16 NCCA SDAGE Level I: Languages and Compilers Dr Valery Adzhiev 2

Syntax and Semantics

• Syntax is concerned with the form (i.e., structure) of a valid program
• Semantics concerns the program’s meaning, which
  – Allows rules that go beyond mere form (e.g., type consistency) to be enforced
  – Provides the information necessary to generate an equivalent output program
• Semantics cannot be conveniently described by a CFG
  – In general, it is a matter of semantics to deal with any rule that requires the compiler to
    • Compare things separated by long distances (e.g., does the number of arguments in a call to a subroutine match the number of formal parameters in the subroutine’s definition?)
    • Count things that are not properly nested, etc.
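
The "long distance" comparison above can be sketched in a few lines. This is only an illustration, assuming the compiler has already collected subroutine definitions and call sites into simple Python structures; the function name `check_call_arity` and the data layout are hypothetical and do not come from the lecture.

```python
# Hypothetical data model: none of these names come from the slides.
def check_call_arity(definitions, calls):
    """Compare each call's argument count with the subroutine's definition.

    definitions: dict mapping subroutine name -> number of formal parameters
    calls: list of (name, number_of_arguments, source_line) tuples
    """
    errors = []
    for name, n_args, line in calls:
        if name not in definitions:
            errors.append(f"line {line}: call to undeclared subroutine {name}")
        elif definitions[name] != n_args:
            errors.append(
                f"line {line}: {name} expects {definitions[name]} "
                f"argument(s), got {n_args}"
            )
    return errors


definitions = {"gcd": 2}                   # gathered from a definition
calls = [("gcd", 2, 10), ("gcd", 3, 42)]   # gathered from two call sites
arity_errors = check_call_arity(definitions, calls)
```

The check needs a table that links a call to a definition that may be arbitrarily far away in the source, which is exactly the kind of information a CFG production cannot carry.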

Semantic Rules in Different Languages

• Programming languages vary dramatically in their choice of semantic rules for making sure that run-time behaviour will be proper
  – In Fortran and C, operands of many types are allowed to be intermixed in expressions; in Ada they are not
• Languages also vary in the extent to which they require their implementations to perform dynamic checks
  – At one extreme, C requires no checks at all, beyond those that come "free" with the hardware (e.g., division by zero, or attempted access to memory outside the bounds of the program)
  – At the other extreme, Java tries to check as many rules as possible at run time, in part to ensure that an untrusted program cannot do anything damaging
• Some languages (e.g., Euclid and Eiffel) also provide explicit support for logical assertions (invariants, preconditions, and postconditions)
  – An invariant is expected to be true at all "clean points" of a given body of code
  – Preconditions and postconditions are expected to be true at the beginning and end of subroutines, respectively
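
As a rough illustration of the three kinds of assertion, here is a subtraction-based GCD in Python. Python is used only for the sketch (Euclid and Eiffel have dedicated syntax for this); the choice of the loop head as the "clean point" and the use of `math.gcd` as the reference value are assumptions made for the example.

```python
import math


def gcd(a, b):
    # Precondition: must hold on entry to the subroutine.
    assert a > 0 and b > 0, "precondition: both arguments must be positive"
    expected = math.gcd(a, b)   # reference value, used only by the assertions
    while a != b:
        # Invariant: checked at a "clean point", the top of each iteration.
        assert math.gcd(a, b) == expected, "invariant: gcd(a, b) is unchanged"
        if a > b:
            a -= b
        else:
            b -= a
    # Postcondition: must hold on exit from the subroutine.
    assert a == expected, "postcondition: result is the gcd of the inputs"
    return a


result = gcd(12, 18)
```

Each `assert` is a dynamic check: it costs time on every execution, and Python drops these statements when run with the `-O` option.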

Static and Dynamic Semantics

• The compiler enforces static semantic rules at compile time
  – These deal with restrictions on the structure of a valid program that are not expressed in standard syntactic formalisms
  – The most important concern the types of data and expressions
    • Type checking is static and precise in languages like Ada and ML: the compiler ensures that no variable will ever be used at run time in a way that is inappropriate for its type
    • Lisp and Smalltalk offer greater flexibility, while remaining completely type-safe, by accepting the run-time overhead of dynamic type checks
• The compiler generates code to enforce dynamic semantic rules at run time, or to call library routines that do so
  – Certain errors, such as division by zero or an attempt to index into an array with an out-of-bounds subscript, may occur only for certain values or behaviours of complex code
  – Sometimes dynamic checks are enabled only during development

Semantic Analyser Functionality

• The role of the semantic analyser
  – To enforce all static semantic rules
  – To annotate the program with information needed by the intermediate code generator
    • Clarification (e.g., this is floating-point addition, not integer addition; or this is a reference to the global variable x)
    • Requirements for dynamic checks
• The interface between semantic analysis (figuring out what the program means) and intermediate code synthesis (expressing the meaning in some new form) defines the boundary between the front end and the back end
• This boundary can vary

Parse Tree and Syntax Tree for GCD

[Figure: parse tree and syntax tree for the GCD program]

Parsing and Semantic Analysis

• Compilers vary in the extent to which semantic analysis and intermediate code generation are interleaved with parsing
  – With fully separated phases, the parser passes a full parse tree on to the semantic analyser, which
    • Converts the parse tree to a syntax tree
    • Fills in the symbol table
    • Performs semantic checks (e.g., whether types in expressions are compatible, or whether all identifiers are declared)
    • Passes the syntax tree on to the code generator
  – With fully interleaved phases, there may be no need to build either the parse tree or the syntax tree in its entirety
    • The parser can call semantic check and code generation routines on the fly as it parses each expression, statement or subroutine of the source
    • Common approach: construction of a syntax tree is interleaved with parsing (no explicit parse tree is built), and then separate phases of semantic analysis and code generation follow in sequence

Attribute Grammars

• Both semantic analysis and intermediate code generation can be described in terms of annotation, or decoration, of a parse tree or a syntax tree
• A CFG needs some addition to carry semantic information along trees
• An Attribute Grammar (AG) provides a formal framework for the decoration of a tree
  – It is a useful conceptual tool even in compilers that do not build a parse tree or syntax tree as an explicit data structure
• Action Routines are ad-hoc cousins of AGs
• An AG can be defined in a machine-independent way, and then attributes can be
  – Sequences of computation steps that result from processing the program’s input (cf. operational semantics)
  – Logical formulas (cf. axiomatic semantics) that allow properties of programs to be proved
  – Domain theory denotations (cf. denotational semantics) that represent what programs do through mathematical objects
• Formal semantics of programming languages – Introduction: http://people.cis.ksu.edu/~schmidt/705s12/Lectures/chapter.pdf

SLR(1) Attribute Grammar for Constant Expressions

• AG: CFG + additional rules
  – A val attribute is associated with each E, T, F and const
    • S.val is the meaning (arithmetic value) of the token string derived from S
  – Each production has associated rules (they are definitions rather than assignments)
    • Copy rules (productions 3, 6, 8, 9) specify that one attribute should be a copy of another
    • Other rules invoke semantic functions (sum, quotient, additive_inverse, etc.) that can in general be arbitrarily complex functions specified by the language designer (each of their arguments must be an attribute of a symbol in the current production; no global variables are allowed)
    • Simple semantic functions can be written “in-line” in some well-defined notation
• Subscripts distinguish symbols with the same name in the LHS and RHS of a production
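
The grammar figure itself did not survive, so the following is a minimal sketch of how such an S-attributed grammar could be evaluated, assuming the usual nine-production expression grammar over E, T, F and const (an assumption consistent with the copy rules being productions 3, 6, 8 and 9). The `Node` class, the `RULES` table and the `decorate` function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    symbol: str                          # grammar symbol: "E", "T", "F", "const", "+", ...
    children: List["Node"] = field(default_factory=list)
    val: Optional[float] = None          # the synthesised attribute


# One semantic rule per production, keyed by (LHS, RHS symbols).
# Each rule reads only attributes of RHS symbols and defines LHS.val.
RULES = {
    ("E", ("E", "+", "T")): lambda k: k[0].val + k[2].val,   # 1: sum
    ("E", ("E", "-", "T")): lambda k: k[0].val - k[2].val,   # 2: difference
    ("E", ("T",)):          lambda k: k[0].val,              # 3: copy rule
    ("T", ("T", "*", "F")): lambda k: k[0].val * k[2].val,   # 4: product
    ("T", ("T", "/", "F")): lambda k: k[0].val / k[2].val,   # 5: quotient
    ("T", ("F",)):          lambda k: k[0].val,              # 6: copy rule
    ("F", ("-", "F")):      lambda k: -k[1].val,             # 7: additive_inverse
    ("F", ("(", "E", ")")): lambda k: k[1].val,              # 8: copy rule
    ("F", ("const",)):      lambda k: k[0].val,              # 9: copy rule
}


def decorate(node):
    """Post-order walk: children first, so attribute flow is strictly bottom-up."""
    if not node.children:                # token: val (if any) was set by the scanner
        return
    for child in node.children:
        decorate(child)
    rhs = tuple(child.symbol for child in node.children)
    node.val = RULES[(node.symbol, rhs)](node.children)


def n(symbol, *kids, val=None):
    return Node(symbol, list(kids), val)


# Parse tree for ( 1 + 3 ) * 2, written out by hand.
inner = n("E", n("E", n("T", n("F", n("const", val=1)))),
          n("+"),
          n("T", n("F", n("const", val=3))))
tree = n("E", n("T", n("T", n("F", n("("), inner, n(")"))),
               n("*"),
               n("F", n("const", val=2))))
decorate(tree)
```

After `decorate(tree)` the value of the whole expression is in `tree.val`, and every interior node holds the output of exactly one rule.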

Decoration of a Parse Tree: Example

• The process of evaluating attributes is called annotation or decoration of the parse tree
• Example: how to decorate a parse tree using the SLR(1) AG from the previous slide for the expression ( 1 + 3 ) * 2
  – val attributes of symbols are shown in boxes
  – Curving arrows show the attribute flow, which is strictly upward in this case
  – Each box holds the output of a single semantic rule
    • The arrow(s) entering the box indicate the input(s) to the rule
    • For example, at the 2nd level the two arrows pointing into the box with the ‘8’ represent application of the rule T1.val := product(T2.val, F.val)
  – Once decoration is complete, the value of the overall expression is in the val attribute of the root

Synthesised and Inherited Attributes

• Synthesised attributes: their values are calculated (synthesised) only in productions in which their symbol appears on the LHS
  – For annotated parse trees this means that the attribute flow (i.e. the pattern in which information moves from node to node) is entirely bottom-up
• S-attributed AG: an AG in which all attributes are synthesised (the SLR(1) AG for constant expressions is an example)
  – Arguments to semantic functions in such an AG are always attributes of symbols on the RHS of the current production
    • The value of a numeric constant is initialised by the scanner
  – The return value is placed into an attribute of the symbol on the LHS
• Inherited attributes: their values are calculated when their symbol is on the RHS of the current production
  – They allow contextual information to flow into a symbol from above or from the side
  – This means that the rules of that production can be enforced in different ways (or generate different values) depending on context
  – Symbol table information is commonly passed from symbol to symbol by means of inherited attributes
  – Inherited attributes of the root of the parse tree (i.e. of the start symbol) can be used to represent the external environment (run-time parameters such as the compiler’s command-line arguments)
• L-attributed AG: the LL(1) AG for constant expressions is an example

How to Decorate Parse Tree Using LL(1) CFG?

• If we want to create an AG that accumulates the value of the overall expression into the root of the tree, we have a problem:
  – Because subtraction is left associative, we cannot summarise the right sub-tree of the root with a single numeric value in a bottom-up attribute flow
    • Note that tokens have only synthesised attributes, initialised by the scanner (name of an identifier, value of a constant, etc.)
• LL(1) CFG: [figure]
• Parse tree for 9 – 4 – 3: construct it!

Decoration of Parse Tree Using LL(1) AG

• Attribute values are allowed to be passed not only bottom-up but also left-to-right in the tree
  – ‘9’ is passed into the top-most expr_tail node, where it is combined with ‘4’
  – The resulting ‘5’ is passed into the middle expr_tail node, combined with ‘3’ to make ‘2’, and then passed upward to the root
• So we build an LL(1) AG with inherited attributes
  – In each of the first two productions, the 1st rule copies the left context (the value of the expression so far) into a “subtotal” (st) attribute (shown in the left box of each node)
  – The 2nd rule copies the final value from the right-most leaf back up to the root through the val attribute
• LL(1) AG: [figure]
• Decorated parse tree for 9 – 4 – 3: [figure]
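
The flow of the st attribute can be mimicked by a recursive-descent parser in which the inherited attribute becomes a parameter and the synthesised attribute becomes the return value. This is a sketch only: the grammar `expr → const expr_tail`, `expr_tail → - const expr_tail | ε` is assumed (the slide's grammar figure is missing), tokens are assumed to be pre-split strings, and the function names are hypothetical.

```python
def parse_expr(tokens):
    """expr -> const expr_tail
    Rules: expr_tail.st := const.val ; expr.val := expr_tail.val"""
    if not tokens or not tokens[0].isdigit():
        raise SyntaxError("expected a constant at the start of the expression")
    return parse_expr_tail(tokens, 1, st=int(tokens[0]))


def parse_expr_tail(tokens, pos, st):
    """st is the inherited attribute (value so far); the return value is val."""
    if pos == len(tokens):
        # expr_tail -> epsilon.  Rule: expr_tail.val := expr_tail.st
        return st
    # expr_tail1 -> - const expr_tail2
    if tokens[pos] != "-":
        raise SyntaxError(f"expected '-' but found {tokens[pos]!r}")
    if pos + 1 >= len(tokens) or not tokens[pos + 1].isdigit():
        raise SyntaxError("expected a constant after '-'")
    const_val = int(tokens[pos + 1])
    # Rules: expr_tail2.st := expr_tail1.st - const.val
    #        expr_tail1.val := expr_tail2.val
    return parse_expr_tail(tokens, pos + 2, st=st - const_val)


value = parse_expr("9 - 4 - 3".split())
```

The subtotal 9 enters the first `parse_expr_tail` call, 5 enters the second, and 2 enters the last, which returns it unchanged all the way up; this gives the left-associative result rather than 9 - (4 - 3).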

LL(1) AG for Constant Expressions

• This is an L-attributed AG whose underlying CFG is LL(1) rather than SLR(1) as before (cf. the expr productions in 5-18)
  – The relative complexity of the attribute flow results from the fact that operators are left associative but the grammar cannot be left-recursive: the left and right operands of a given operator are thus found in separate productions
  – Attributes can be evaluated during LL parsing in a single left-to-right pass over the input
  – Each inherited attribute of an RHS symbol depends only on
    • Inherited attributes of the LHS symbol, or
    • Synthesised or inherited attributes of symbols to its left in the RHS
  – The value of the left operand of each operator is carried into the TT and FT productions by the st (subtotal) attribute

Decoration of Top-Down Tree using LL(1) AG

• Expression: ( 1 + 3 ) * 2
• Curving arrows indicate attribute flow
• The arrow(s) entering a given box represent application of a single semantic rule
  – Match boxes and rules!
• Flow is no longer strictly bottom-up, but it is still left-to-right
• Construct the parse tree before decorating it!

Translation Schemes

• AGs do not specify the order in which attribute rules should be invoked
  – They are declarative: they just define a set of valid trees, not how to build or decorate them
• Translation Scheme: an algorithm that decorates parse trees by invoking the rules of an AG in an order consistent with the tree’s attribute flow
  – Oblivious Scheme: makes repeated passes over the tree, invoking semantic functions whose arguments have all been defined and stopping after a pass in which no value changed; it needs no knowledge about the tree or the grammar
  – Dynamic Scheme: tailors the evaluation order to the structure of a given parse tree (e.g., by constructing a topological sort of the attribute flow graph and then invoking rules in an order consistent with the sort)
  – Static Scheme: is based on analysis of the AG structure itself, and is then applied mechanically to any tree arising from the AG (usually the fastest scheme)
    • S-attributed AGs are the most general class for which evaluation can be implemented on the fly during an LR parse
    • L-attributed AGs are the most general class for which evaluation can be implemented on the fly during an LL parse
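
A dynamic scheme can be sketched with Python's standard `graphlib` module (Python 3.9 or later). The attribute instances below are an assumed encoding of the decorated tree for 9 – 4 – 3; the names such as `tail1.st` and the `rules` table are hypothetical and deliberately listed out of order, so that only the topological sort determines the evaluation order.

```python
from graphlib import TopologicalSorter

# Each attribute instance maps to (attributes it depends on, semantic function).
rules = {
    "expr.val":  (["tail1.val"],               lambda v: v),
    "tail1.val": (["tail2.val"],               lambda v: v),
    "tail2.val": (["tail3.val"],               lambda v: v),
    "tail3.val": (["tail3.st"],                lambda st: st),
    "tail3.st":  (["tail2.st", "const3.val"],  lambda st, c: st - c),
    "tail2.st":  (["tail1.st", "const2.val"],  lambda st, c: st - c),
    "tail1.st":  (["const1.val"],              lambda c: c),
}

# Token attributes are initialised by the scanner.
values = {"const1.val": 9, "const2.val": 4, "const3.val": 3}

# Attribute flow graph: node -> set of predecessors it needs first.
flow_graph = {attr: set(deps) for attr, (deps, _) in rules.items()}
order = list(TopologicalSorter(flow_graph).static_order())

for attr in order:
    if attr in rules:                      # scanner-supplied values have no rule
        deps, function = rules[attr]
        values[attr] = function(*(values[d] for d in deps))
```

`TopologicalSorter` raises `graphlib.CycleError` if the attribute flow graph is circular, which corresponds to a tree that cannot be decorated in any order.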

Parsing and Semantic Analysis: Interleave them?

• A compiler that interleaves semantic analysis and code generation with parsing is sometimes called a one-pass compiler
  – If intermediate code generation is interleaved with parsing, one need not build a syntax tree (unless the syntax tree is the intermediate code)
  – It is even possible to write intermediate code to the output file on the fly, without accumulating it in the attributes of the parse tree root
• If we choose not to interleave parsing and semantic analysis:
  – Semantic analysis is easier to perform during a separate traversal of a syntax tree, because that tree reflects the program’s semantic structure better than a parse tree (especially with top-down parsing)
  – We still need to devise an AG, but it will serve only to create a syntax tree, not to enforce semantic rules or generate code

Building a Syntax Tree

• If we choose not to interleave parsing and semantic analysis, we need an AG (only!) to construct a syntax tree
  – Attributes hold neither numerical values nor target code fragments
  – Instead they point to nodes of a syntax tree
• Bottom-Up (S-attributed) AG (cf. the AG on slide 10!)
  – The function make_leaf returns a pointer to a newly allocated syntax tree node containing the value of a constant
  – The functions make_un_op and make_bin_op return pointers to newly allocated syntax tree nodes containing a unary or binary operator and pointers to the supplied operand(s)
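
The three constructor functions might look as follows in Python, where object references play the role of pointers. The node classes `Leaf`, `UnOp` and `BinOp` are hypothetical; only the function names make_leaf, make_un_op and make_bin_op come from the slide, and their exact signatures are an assumption.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    value: float                 # value of a constant


@dataclass
class UnOp:
    op: str                      # unary operator, e.g. "-"
    operand: "Tree"


@dataclass
class BinOp:
    op: str                      # binary operator, e.g. "+" or "*"
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, UnOp, BinOp]


def make_leaf(value):
    return Leaf(value)


def make_un_op(op, operand):
    return UnOp(op, operand)


def make_bin_op(op, left, right):
    return BinOp(op, left, right)


# Bottom-up construction for ( 1 + 3 ) * 2, in the order an LR parser
# would reduce the productions:
one = make_leaf(1)                        # F -> const
three = make_leaf(3)                      # F -> const
plus = make_bin_op("+", one, three)       # E1 -> E2 + T
two = make_leaf(2)                        # F -> const
syntax_tree = make_bin_op("*", plus, two) # T1 -> T2 * F
```

Copy rules such as E → T or F → ( E ) simply pass the same reference upward, which is why parentheses leave no trace in the finished syntax tree.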

Construction of Syntax Tree

• Construction of the syntax tree (cf. slide 11) for ( 1 + 3 ) * 2 is via decoration of a bottom-up parse tree using an S-attributed AG: [figure, panels (a) to (d)]
  – In (a), the values ‘1’ and ‘3’ have been placed in new syntax tree leaves, and pointers to them propagate up into the attributes of E and T
  – In (b), the pointers to these leaves become child pointers of a new internal ‘+’ node
  – In (c), the pointer to this node propagates up into the attributes of T, and a new leaf for ‘2’ is created
  – In (d), the pointers from T and F become child pointers of a new internal ‘×’ node, and a pointer to it propagates up into the attributes of E

Action Routines

• There are automatic tools (“Attribute Evaluator Generators”) that will construct a semantic analyser (in the simple case, an attribute evaluator) for a given AG
  – They are used mostly in syntax-based editors, incremental compilers, etc.
• Most production compilers use an ad-hoc, hand-written translation scheme
  – It interleaves parsing with at least the initial construction of a syntax tree, and possibly with all of semantic analysis and intermediate code generation, and does not need to build a full parse tree
  – Such a translation scheme is implemented using a set of Action Routines
  – An Action Routine is a semantic function that the compiler has to execute at a particular point in the parse
    • Most parser generators allow the programmer to specify action routines
• The implementation mechanism uses a stack: when the parser predicts a production, all of the RHS is pushed onto the stack; an action routine is called when the pointer to it is at the top of the stack
• Fragment of an LL(1) grammar with action routines in {…}: [figure] (for a complete grammar see Fig. 4.10 in Scott’s textbook)
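
The stack mechanism can be sketched with a tiny table-driven LL(1) parser. Everything here is an illustrative assumption rather than the grammar of Fig. 4.10: the grammar is `expr → const {start} expr_tail`, `expr_tail → - const {subtract} expr_tail | ε`, the action routines `start` and `subtract` are hypothetical, and syntax tree nodes are plain tuples.

```python
def parse(tokens):
    """Parse `const (- const)*` and return a syntax tree of nested tuples."""
    tokens = list(tokens) + ["$"]        # "$" marks the end of the input
    pos = 0
    last_const = None                    # value of the most recently matched const
    sem = []                             # semantic stack of syntax tree fragments

    def start():                         # action routine: first operand
        sem.append(last_const)

    def subtract():                      # action routine: new "-" node
        left = sem.pop()
        sem.append(("-", left, last_const))

    def predict(nonterminal, lookahead):
        if nonterminal == "expr" and lookahead.isdigit():
            return ["const", start, "expr_tail"]
        if nonterminal == "expr_tail" and lookahead == "-":
            return ["-", "const", subtract, "expr_tail"]
        if nonterminal == "expr_tail" and lookahead == "$":
            return []                    # epsilon production
        raise SyntaxError(f"unexpected token {lookahead!r}")

    stack = ["$", "expr"]                # parse stack, top at the end of the list
    while stack:
        top = stack.pop()
        if callable(top):                # pointer to an action routine on top
            top()
        elif top in ("expr", "expr_tail"):
            # Push the whole RHS, actions included, leftmost symbol on top.
            stack.extend(reversed(predict(top, tokens[pos])))
        else:                            # terminal: must match the input
            token = tokens[pos]
            if top == "const" and token.isdigit():
                last_const = int(token)
            elif top != token:
                raise SyntaxError(f"expected {top!r} but found {token!r}")
            pos += 1
    return sem.pop()


action_tree = parse("9 - 4 - 3".split())
```

Because each action sits on the parse stack between the symbols of its production, it runs exactly when everything to its left has been matched; no parse tree is ever built, only the left-associative syntax tree.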

Decorating a Syntax Tree: Tree Grammar

• If the compiler uses action routines simply to build a syntax tree, then the bulk of semantic analysis and intermediate code generation will use a syntax tree, not a parse tree, as its base
  – So we need to know how to decorate a syntax tree
• A Tree Grammar is used to represent syntax tree structure
  – A CFG is meant to define a language composed of strings of tokens; it describes the possible structure of a parse tree for that language, where every valid string is the tree’s yield (the sequence of its leaves)
    • Parsing is the process of finding a tree that has a given yield for the CFG
  – A tree grammar is meant to define (or generate) the trees themselves
    • It is not for parsing but for decoration of a syntax tree
    • Semantic rules attached to productions of a tree grammar can be used to define the attribute flow of a syntax tree, i.e. to decorate it
    • As in a CFG, each production represents a possible relationship between a parent (LHS) and its children (RHS) in the tree

Calculator Language with Types and Declarations

• Bottom-up LR CFG for the calculator language with types and declarations (compare with the grammar from 6/21): [figure]
  – Declarations are allowed to be intermixed with statements
  – Constants are of different types, integer and real (presumably the latter contain a decimal point)
  – Explicit conversions between integer and real operands are required
  – The intended semantics of the language requires that every identifier be declared before it is used, and that types not be mixed in computations

Tree Grammar and Syntax Tree for Calculator

• The notation ‘A : B’ on the LHS of a production means: A is one variant of B, and may appear anywhere a B is expected on the RHS
• Productions for the calculator tree grammar with types and declarations look like: [figure]
• Syntax tree for a simple program: [figure]
  – Represents a program as a linked list of “nodes” which are “items” (declarations and statements)
  – Represents expressions

Attribute Tree Grammar for Calculator

• This tree grammar will be used to perform static semantic type checking
• The complete tree attribute grammar for the calculator language with types is constructed using the node classes, variants, and attributes (see Fig. 4.14 in Scott’s textbook)
• Once decorated, the program node will contain a list, in a synthesised attribute, of all static semantic errors in the program
• Syntax tree attribute grammar, classes of nodes: [figure]
  – The attribute symtab contains a list, with types, of all declared identifiers
  – An inherited attribute errors_in lists all static semantic errors found to its left in the tree
  – A synthesised attribute errors_out propagates the final error list back to the root

Attribute Tree Grammar for Calculator

• Fragments of the attribute tree grammar: [figure]
• The macro check_types calculates the values of two different attributes, type and errors. Under a strict formulation of attribute grammars it would be replaced by two separate semantic functions, one per calculated attribute.

Decoration of Syntax Tree for Calculator

• Syntax tree allowing for type checking: [figure]
• Symbol table information flows along the chain of items (int_decl and real_decl nodes add new information) and down into expr trees
• Type information is synthesised at id : expr leaves by looking up an identifier’s name in the symbol table. It then propagates upward within an expression tree, and is used to type-check operators and assignments.
• Error messages are accumulated into a synthesised attribute of the syntax tree root
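
The flow described above can be approximated in Python. This is a loose sketch, not the attribute grammar of Fig. 4.14: syntax tree nodes are assumed to be tuples, the symbol table is a dict threaded along the item list, and one mutable error list stands in for the errors_in / errors_out attribute pair. The function names, node tags other than int_decl and real_decl, and the message texts are all hypothetical.

```python
def type_of(expr, symtab, errors):
    """Synthesise the type of an expression tree; record errors on the way up."""
    kind = expr[0]
    if kind == "int_const":
        return "int"
    if kind == "real_const":
        return "real"
    if kind == "id":                       # leaf: look the name up in symtab
        name = expr[1]
        if name not in symtab:
            errors.append(f"{name} undefined")
            return "error"
        return symtab[name]
    if kind in ("float", "trunc"):         # explicit conversions
        needed, result = ("int", "real") if kind == "float" else ("real", "int")
        operand = type_of(expr[1], symtab, errors)
        if operand not in (needed, "error"):
            errors.append(f"{kind} needs a {needed} operand")
        return result
    # Binary operator: both operands must have the same type.
    left = type_of(expr[1], symtab, errors)
    right = type_of(expr[2], symtab, errors)
    if "error" in (left, right):
        return "error"                     # avoid cascading messages
    if left != right:
        errors.append(f"type clash in {kind}")
        return "error"
    return left


def check_program(items):
    """Walk the chain of items left to right, as symtab and errors flow."""
    symtab, errors = {}, []
    for item in items:
        kind = item[0]
        if kind in ("int_decl", "real_decl"):
            name = item[1]
            if name in symtab:
                errors.append(f"{name} redefined")
            else:
                symtab[name] = "int" if kind == "int_decl" else "real"
        elif kind == "read":
            if item[1] not in symtab:
                errors.append(f"{item[1]} undefined")
        elif kind == "write":
            type_of(item[1], symtab, errors)
        elif kind == "assign":
            name, rhs = item[1], type_of(item[2], symtab, errors)
            if name not in symtab:
                errors.append(f"{name} undefined")
            elif rhs not in ("error", symtab[name]):
                errors.append(f"type clash in assignment to {name}")
    return errors


good = [("int_decl", "a"), ("read", "a"), ("real_decl", "b"), ("read", "b"),
        ("write", ("/", ("+", ("float", ("id", "a")), ("id", "b")),
                   ("real_const", 2.0)))]
bad = [("int_decl", "a"), ("real_decl", "b"),
       ("write", ("+", ("id", "a"), ("id", "b"))),
       ("write", ("id", "c"))]
good_errors = check_program(good)
bad_errors = check_program(bad)
```

Returning a special "error" type from a faulty sub-expression keeps one mistake from producing a chain of follow-on messages higher up the tree.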

Check Your Understanding

• What is the role of the semantic analyser?
• What are the specifics of attribute grammars, and how do they differ from context-free grammars?
• Describe the difference between synthesised and inherited attributes.
• Why is parse tree decoration necessary?
• What is attribute flow, and how does it work for top-down and bottom-up decoration processes?
• Describe the main translation schemes.
• What are action routines for?
• Describe the process of building a syntax tree and decorating it using a tree grammar.