Principles of Software Engineering
and Operating Systems
Languages and Compilers
SDAGE: Level I
Dr Valery Adzhiev
Office: TA-121
2015-16
7. Semantic Analysis
For some images: Copyright © 2009 Elsevier, Inc. All rights reserved
Contents
• Semantic Analysis Principles
• Attribute Grammars
• Decoration of Parse Trees
• Translation Schemes
• Construction of Syntax Tree
• Action Routines
• Tree Grammars and Decorating Syntax Tree
• Check Your Understanding
2015-16 NCCA SDAGE Level I: Languages and Compilers Dr Valery Adzhiev 2
Syntax and Semantics
• Syntax is concerned with the form (i.e., structure) of a
valid program
• Semantics concerns program’s meaning that
– Allows for enforcing rules (e.g., type consistency) that go beyond
mere form
– Provides information necessary to generate equivalent output
program
• Semantics cannot be conveniently described by CFG
– In general, it’s a matter of semantics to deal with any rule that
requires compiler to
• compare things separated by long distances (e.g., whether the number of
arguments in a call to a subroutine matches the number of formal parameters in
the subroutine’s definition)
• count things that are not properly nested, etc.
Semantics Rules
in Different Languages
• Programming languages vary dramatically in their choice of semantic
rules making sure that run-time behaviour will be proper.
– In Fortran and C, operands of many types are allowed to be intermixed in
expressions; in Ada they are not.
• Languages also vary in the extent to which they require their
implementations to perform dynamic checks
– At one extreme, C requires no checks at all, beyond those that come "free"
with the hardware (e.g., division by zero, or attempted access to memory
outside the bounds of the program).
– At the other extreme, Java tries to check as many rules as possible at run time,
in part to ensure that an untrusted program cannot do anything damaging.
• Some languages (e.g., Euclid and Eiffel) also provide explicit support
for logical assertions (invariants, preconditions, and post-conditions)
– Invariant is expected to be true at all "clean points" of a given body of code.
– Pre- and post-conditions are expected to be true at the beginning and end
of subroutines, respectively.
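The assertions described above can be imitated in a language without dedicated support. The following is a minimal sketch using Python's assert statement; the Account class and its method names are hypothetical and chosen only for illustration.

```python
class Account:
    """Toy bank account illustrating an invariant and pre/post-conditions."""

    def __init__(self, balance=0):
        self.balance = balance
        self._check_invariant()

    def _check_invariant(self):
        # Invariant: expected to hold at every "clean point" of the class.
        assert self.balance >= 0, "invariant violated: negative balance"

    def withdraw(self, amount):
        # Pre-condition: expected to hold at the beginning of the subroutine.
        assert 0 < amount <= self.balance, "pre-condition violated"
        old_balance = self.balance
        self.balance -= amount
        # Post-condition: expected to hold at the end of the subroutine.
        assert self.balance == old_balance - amount, "post-condition violated"
        self._check_invariant()
        return self.balance
```

Calling Account(10).withdraw(4) succeeds, while an attempt to withdraw more than the balance fails the pre-condition. Python skips assert statements when run with the -O option, so checks written this way are active only in non-optimised runs.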
Static and Dynamic
Semantics
• Compiler enforces static semantic rules at compile time
– Deals with restrictions on structure of valid program that are not
expressed in standard syntactic formalisms
– Most important are types of data and expressions
• Type checking is static and precise in languages like Ada and ML: the
compiler ensures that no variable will ever be used at run time in a
way that is inappropriate for its type.
• In Lisp and Smalltalk: greater flexibility, while remaining completely
type-safe, by accepting the run-time overhead of dynamic type checks.
• Compiler generates code to enforce dynamic semantic rules
at run time or to call library routines that do so
– Certain errors, such as division by zero or attempting to index into
array with out-of-bounds subscript may occur only for certain
values or behaviours of complex code
– Sometimes dynamic checks enabled only during development.
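A rough sketch of the kind of code a compiler might generate to enforce dynamic semantic rules is shown below. The names CHECKS_ENABLED, DynamicSemanticError, checked_index and checked_div are assumptions made for this illustration; they do not come from any particular compiler.

```python
CHECKS_ENABLED = True  # hypothetical switch, e.g. turned off for release builds


class DynamicSemanticError(Exception):
    """Raised when a dynamic semantic rule is violated at run time."""


def checked_index(array, i):
    # Check that a compiler might insert before an array access.
    if CHECKS_ENABLED and not (0 <= i < len(array)):
        raise DynamicSemanticError(
            f"subscript {i} out of bounds 0..{len(array) - 1}"
        )
    return array[i]


def checked_div(a, b):
    # Check that a compiler might insert before an integer division.
    if CHECKS_ENABLED and b == 0:
        raise DynamicSemanticError("division by zero")
    return a // b
```

Whether either check fails depends on the run-time values of i and b, which is why such rules cannot in general be enforced at compile time.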
Semantic Analyser
Functionality
• The role of semantic
analyser
– To enforce all static
semantic rules.
– To annotate the program
with information needed by
intermediate code
generator
• Clarification (e.g., “this is
floating-point addition, not
integer” or “this is a reference
to global variable x”)
• Requirements for dynamic
checks
• Interface between semantic
analysis (figuring out what the
program means) and
intermediate code synthesis
(expressing the meaning in
some new form) defines the
boundary between front end
and back end
• This boundary can vary
Parse Tree and
Syntax Tree for GCD
Parsing and
Semantic Analysis
• Compilers vary in extent to which semantic analysis and
intermediate code generation are interleaved with parsing
– With fully separated phases, the parser passes a full parse tree on to
the semantic analyser, which
• Converts a parse tree to a syntax tree
• Fills in symbol table
• Performs semantic checks (e.g., if types in expressions are compatible or all
identifiers are declared)
• Passes a syntax tree onto the code generator
– With fully interleaved phases, there may be no need to build
either parse tree or syntax tree in its entirety
• Parser can call semantic check and code generation routines on the fly as it
parses each expression, statement or subroutine of the source.
• Common approach: construction of a syntax tree is interleaved with parsing
(no explicit parse tree built) and then sequential phases of semantic analysis
and code generation follow.
Attribute Grammars
• Both semantic analysis and
intermediate code generation
can be described in terms of
annotation, or decoration of
a parse tree or a syntax tree
• A CFG needs some additions to
carry semantic info along trees
• Attribute Grammar (AG)
provides formal framework for
decoration of tree
– It is a useful conceptual tool even
in compilers that do not build
parse tree or syntax tree as an
explicit data structure.
• Action Routines are ad-hoc
cousins of AGs
• AG can be defined in machine-
independent way, and then
attributes can be
– Sequence of computation steps that
results from processing the
program’s input (re operational
semantics)
– Logical formulas (re axiomatic
semantics) allowing to prove
properties of programs
– Domain theory denotations (re
denotational semantics) represent
what programs do through
mathematical objects
• Formal semantics of programming
languages – Introduction:
http://people.cis.ksu.edu/~schmidt/705s12/Lectures/chapter.pdf
SLR(1) Attribute Grammar
for Constant Expressions
• AG: CFG + additional rules
– val attribute is associated with each E, T,
F and const
• S.val is meaning (arithmetic value) of the token
string derived from S
– Each production has associated rules (they
are definitions rather than assignments)
• Copy rules (productions 3, 6, 8, 9) that specify
that one attribute should be a copy of another
• Other rules invoke semantic functions (sum,
quotient, additive_inverse, etc.) that can in
general be arbitrarily complex functions
specified by language designer (each of their
arguments must be an attribute of a symbol in
the current production – no global variables
allowed).
• Simple semantic functions can be written “in-line”
in some well-defined notation.
Subscripts are to distinguish symbols with the
same name in LHS and RHS of production.
Decoration of
a Parse Tree: Example
• Process of evaluating attributes is called
annotation or decoration of the parse tree
• Example: how to decorate a parse tree using
the SLR(1) AG from previous slide for the expression:
( 1 + 3 ) * 2
– val attributes of symbols are in boxes
– Curving arrows show the attribute flow, which is
strictly upward in this case
– Each box holds output of a single semantic rule
• Arrow(s) entering the box indicate the input(s) to the
rule
• For example, at 2nd level two arrows pointing into
box with the ‘8’ represent application of the rule:
T1.val := product(T2.val, F.val)
– Once decoration is complete, the value of overall expression is in ‘val’ attribute of the root.
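The bottom-up decoration described above can be sketched in a few lines of Python. The grammar figure is not reproduced in these notes, so the productions named in the comments (E → E + T, T → T * F, F → ( E ), F → const and so on) are an assumed reconstruction consistent with the copy rules and semantic functions mentioned on the slides. Node, tok, nt and decorate are hypothetical helper names.

```python
import operator
from dataclasses import dataclass, field
from typing import Optional

# Stand-ins for the semantic functions sum, difference, product, quotient.
BINARY = {"+": operator.add, "-": operator.sub,
          "*": operator.mul, "/": operator.truediv}


@dataclass
class Node:
    symbol: str                        # "E", "T", "F", "const" or a punctuation token
    children: list = field(default_factory=list)
    val: Optional[float] = None        # val attribute; tokens get it from the scanner


def tok(symbol, val=None):
    return Node(symbol, [], val)


def nt(symbol, *children):
    return Node(symbol, list(children))


def decorate(node):
    """Post-order walk: children are decorated before their parent."""
    for child in node.children:
        decorate(child)
    kids = node.children
    if len(kids) == 1:                                # E -> T, T -> F, F -> const (copy rules)
        node.val = kids[0].val
    elif len(kids) == 2:                              # F1 -> - F2 (additive_inverse)
        node.val = -kids[1].val
    elif len(kids) == 3 and kids[0].symbol == "(":    # F -> ( E ) (copy rule)
        node.val = kids[1].val
    elif len(kids) == 3:                              # E1 -> E2 + T, T1 -> T2 * F, ...
        node.val = BINARY[kids[1].symbol](kids[0].val, kids[2].val)
    return node.val


# Parse tree for ( 1 + 3 ) * 2, built by hand for the example.
one = nt("E", nt("T", nt("F", tok("const", 1))))
three = nt("T", nt("F", tok("const", 3)))
paren = nt("F", tok("("), nt("E", one, tok("+"), three), tok(")"))
tree = nt("E", nt("T", nt("T", paren), tok("*"), nt("F", tok("const", 2))))
```

After decorate(tree) the val attribute of the root holds 8. Every rule reads only the attributes of the node's own children, which is what makes the attribute flow strictly upward.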
Synthesised and
Inherited Attributes
• Synthesised attributes: their
values are calculated
(synthesised) only in productions
in which their symbol appears on
LHS
– For annotated parse trees this means
that the attribute flow (i.e. the pattern
in which information moves from node
to node) is entirely bottom-up.
• S-attributed AG: an AG in which
all attributes are synthesised (e.g., the SLR(1) AG above)
– Arguments to semantic functions in
such AG are always attributes of
symbols on RHS of current production
• Value of numeric constant is initialised
by scanner
– Return value is placed into attribute
on LHS.
• Inherited attributes: their values
are calculated when their symbol
is on RHS of current production
– They allow contextual info to flow into
symbol from above or from side
– This means that rules of that
production can be enforced in
different ways (or generate different
values) depending on context.
– Symbol table info is commonly passed
from symbol to symbol by means of
inherited attributes.
– Inherited attributes of the root of parse
tree (i.e. of start symbol) can be used
to represent external environment
(run-time parameters like command-
line ones to compiler)
• L-attributed AG: its attributes can be evaluated in a single left-to-right,
depth-first pass over the parse tree (e.g., the LL(1) AG that follows)
How to Decorate Parse
Tree Using LL(1) CFG?
• If we want to create AG that
accumulates the value of
overall expression into the
root of tree, we have a
problem:
– Because subtraction is left
associative, a purely bottom-up
attribute flow cannot summarise
the right sub-tree of the root
with a single numeric
value
• Note that tokens have only
synthesized attributes, initialized by
the scanner (name of an identifier,
value of a constant, etc.).
• LL(1) CFG:
• Parse tree for 9 – 4 – 3
Construct it!
Decoration of Parse Tree
Using LL(1) AG
• Attribute values are allowed to be
passed not only bottom-up but also
left-to-right in the tree
– One passes ‘9’ into the top-most expr_tail node, where it is combined
with ‘4’.
– The resulting ‘5’ is passed into the middle expr_tail node, combined with
‘3’ to make ‘2’, and then passed upward
to root.
• So we build LL(1) AG with inherited
attributes
– In each of first two productions, 1st rule
serves to copy left context (value of expression so far) into “subtotal” (st)
attribute (in left box of nodes)
– 2nd rule copies final value from right-most leaf back up to root via the val attribute.
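The flow of the st and val attributes can be sketched as a small recursive-descent evaluator. This is an illustration only: it assumes the productions expr → const expr_tail and expr_tail → - const expr_tail | ε, and represents the input as a plain Python list. The inherited attribute st becomes a function parameter and the synthesised attribute val becomes the return value.

```python
def parse_expr(tokens):
    """expr -> const expr_tail"""
    const_val = tokens.pop(0)
    st = const_val                        # expr_tail.st := const.val (inherited)
    return parse_expr_tail(tokens, st)    # expr.val := expr_tail.val (synthesised)


def parse_expr_tail(tokens, st):
    """expr_tail -> - const expr_tail | epsilon"""
    if tokens and tokens[0] == "-":
        tokens.pop(0)                     # consume '-'
        const_val = tokens.pop(0)
        # expr_tail2.st := expr_tail1.st - const.val, passed left to right
        return parse_expr_tail(tokens, st - const_val)
    return st                             # epsilon: expr_tail.val := expr_tail.st
```

For the token list [9, "-", 4, "-", 3] the subtotal passed down is first 9, then 5, and the value returned to the root is 2. A purely bottom-up evaluation over this right-leaning tree would compute 9 - (4 - 3) = 8 instead.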
LL(1) AG:
Decorated Parse tree for 9 – 4 – 3
LL(1) AG for
Constant Expressions
• This is L-attributed AG with underlying
LL(1) CFG rather than SLR(1) as before (cf. the expr productions in 5-18)
– Relative complexity of attribute flow results from
fact that operators are left associative but the
grammar cannot be left-recursive: the left and
right operands of a given operator are thus
found in separate productions
– Attributes can be evaluated during LL parsing in
a single left-to-right pass over input
– Each inherited attribute of RHS symbol
depends only on
• Inherited attributes of LHS symbol, or
• Synthesised or inherited attributes of symbols to its
left in RHS
– Value of left operand of each operator is carried into TT and FT productions by the st (subtotal)
attribute.
Decoration of Top-Down
Tree using LL(1) AG
• Expression:
( 1 + 3 ) * 2
• Curving arrows
indicate attribute
flow
• Arrows entering a given box
represent
application of a
single semantic rule
– Match boxes & rules!
• Flow is no longer
strictly bottom-up,
but it is still left-to-right
Construct parse tree before decorating it!
Translation Schemes
• AGs do not specify the order in which attribute rules should be invoked
– they are declarative: just define a set of valid trees, not how to build or
decorate them.
• Translation Scheme: an algorithm that decorates parse trees by
invoking rules of AG in an order consistent with tree’s attribute flow
– Oblivious Scheme: makes repeated passes over tree, invoking semantic
functions whose arguments have all been defined and stopping after pass in
which no value changed; no knowledge about tree or grammar.
– Dynamic Scheme: tailors the evaluation order to the structure of a given
parse tree (e.g., by constructing a topological sort of the attribute flow graph
and then invoking rules in an order consistent with the sort).
– Static Scheme: is based on analysis of AG structure itself, and then applied
mechanically to any tree arising from the AG (usually the fastest scheme)
• S-attributed AGs are most general class for which evaluation can be implemented
on the fly during an LR parsing
• L-attributed AGs are most general class for which evaluation can be implemented
on the fly during an LL parsing.
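A dynamic scheme can be sketched with the standard-library graphlib module (Python 3.9 or later). The attribute instances below are those of a parse tree for 9 – 4 – 3 under an LL(1) AG with st and val attributes. The instance names (tail1.st and so on), the RULES table and the evaluate function are assumptions made for this illustration.

```python
import operator
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def copy(x):
    return x


# attribute instance -> (semantic function, argument attribute instances)
# Deliberately listed out of order: the sort decides the evaluation order.
RULES = {
    "expr.val":  (copy, ["tail1.val"]),
    "tail1.val": (copy, ["tail2.val"]),
    "tail2.val": (copy, ["tail3.val"]),
    "tail3.val": (copy, ["tail3.st"]),
    "tail3.st":  (operator.sub, ["tail2.st", "const3.val"]),
    "tail2.st":  (operator.sub, ["tail1.st", "const2.val"]),
    "tail1.st":  (copy, ["const1.val"]),
}
SCANNER_VALUES = {"const1.val": 9, "const2.val": 4, "const3.val": 3}


def evaluate(rules, initial):
    # Attribute flow graph: each attribute depends on its rule's arguments.
    graph = {attr: set(args) for attr, (_, args) in rules.items()}
    values = dict(initial)
    # static_order() yields every attribute after all of its predecessors.
    for attr in TopologicalSorter(graph).static_order():
        if attr in rules:
            fn, args = rules[attr]
            values[attr] = fn(*(values[a] for a in args))
    return values
```

evaluate(RULES, SCANNER_VALUES)["expr.val"] gives 2. An oblivious scheme would instead sweep over RULES repeatedly until no value changed, and a static scheme would fix the order once by analysing the grammar rather than each individual tree.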
Parsing and Semantic
Analysis: Interleave them?
• Compiler that interleaves semantic analysis and code
generation with parsing is sometimes called a one-pass
compiler
– If intermediate code generation is interleaved with parsing, one need
not build a syntax tree (unless the syntax tree is the intermediate code)
– It is even possible to write intermediate code to output file on the fly
without accumulating it in the attributes of parse tree root
• If we choose not to interleave parsing and semantic analysis:
– Semantic analysis is easier to perform during a separate traversal of
a syntax tree because that tree reflects program’s semantic structure
better than a parse tree (especially with top-down parsing)
– Still need to devise AG but it will serve only to create a syntax tree –
not to enforce semantic rules or generate code.
Building a Syntax Tree
• If we choose not to interleave parsing
and semantic analysis, we need AG
to (only!) construct Syntax Tree
– Attributes hold neither numerical
values nor target code fragments
– Instead they point to nodes of a
syntax tree
• Bottom-Up (S-attributed) AG
– Function make_leaf returns pointer
to newly allocated syntax tree node
containing value of a constant
– Functions make_un_op and
make_bin_op return pointers to
newly allocated syntax tree nodes
containing unary or binary operator
and pointers to supplied operand(s).
cf with AG on slide 10!
Construction of Syntax Tree
Construction of syntax tree (cf slide 11)
for ( 1 + 3 ) * 2 is via decoration of bottom-up
parse tree using S-attributed AG
• In (a), values ‘1’ and ‘3’ have been placed in new syntax tree leaves and then pointers to them propagate up into attributes of E and T.
• In (b), pointers to these leaves become child pointers of new internal ‘+’ node.
• In (c), pointer to this node propagates up into attributes of T, and new leaf for ‘2’ created.
• In (d), pointers from T and F become child pointers of new internal ‘x’ node, and pointer to it
propagates up into the attributes of E.
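Steps (a) to (d) can be replayed with a small sketch of the three constructor functions. The function names make_leaf, make_un_op and make_bin_op come from the slides; the SyntaxNode class, its field names and the to_prefix helper are assumptions for this illustration, and Python object references stand in for pointers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyntaxNode:
    label: object                          # operator symbol or constant value
    left: Optional["SyntaxNode"] = None
    right: Optional["SyntaxNode"] = None


def make_leaf(value):
    return SyntaxNode(value)


def make_un_op(op, operand):
    return SyntaxNode(op, operand)


def make_bin_op(op, left, right):
    return SyntaxNode(op, left, right)


def to_prefix(node):
    """Render a syntax tree in prefix form, for inspection only."""
    if node.left is None:
        return str(node.label)
    parts = [to_prefix(c) for c in (node.left, node.right) if c is not None]
    return f"({node.label} {' '.join(parts)})"


leaf1, leaf3 = make_leaf(1), make_leaf(3)   # (a) leaves for '1' and '3'
plus = make_bin_op("+", leaf1, leaf3)       # (b) new internal '+' node
leaf2 = make_leaf(2)                        # (c) new leaf for '2'
root = make_bin_op("*", plus, leaf2)        # (d) new internal '*' node
```

to_prefix(root) returns "(* (+ 1 3) 2)". The parentheses of the source expression leave no node of their own in the syntax tree; they only determine its shape.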
Action Routines
• There are automatic tools (“Attribute Evaluator Generators”) that will construct a semantic analyser
(in the simple case, an attribute evaluator) for a given AG
– They are used mostly in syntax-based editors, incremental compilers, etc.
• Most production compilers use ad-hoc, hand-written translation scheme
– It interleaves parsing with at least initial construction of syntax tree, and
possibly all of semantic analysis and intermediate code generation, and does not
need to build full parse tree.
– Such a translation scheme is implemented using a set of Action Routines
– Action Routine is a semantic function that compiler has to execute at a
particular point in the parse.
• Most parser generators allow the programmer to specify action routines.
• Implementation mechanism uses stack – when parser predicts a
production, all of RHS is pushed onto stack; action routine is called
when pointer to it is at the top of stack.
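The stack mechanism can be sketched with a tiny table-driven predictive parser. The grammar used here, E → const {push_const} ET and ET → + const {push_const} {add} ET | - const {push_const} {sub} ET | ε, is a hypothetical fragment invented for this example, as are all function names. Action routines are ordinary Python callables pushed onto the parse stack along with the grammar symbols.

```python
def parse(tokens):
    tokens = list(tokens) + ["$"]       # "$" marks end of input
    pos = 0
    values = []                         # attribute stack used by the action routines
    last_const = None                   # value of the most recently matched const

    def push_const():
        values.append(last_const)

    def add():
        right, left = values.pop(), values.pop()
        values.append(left + right)

    def sub():
        right, left = values.pop(), values.pop()
        values.append(left - right)

    stack = ["$", "E"]
    while stack:
        top = stack.pop()
        if callable(top):               # action routine reached the top: run it
            top()
        elif top == "E":                # predict E -> const {push_const} ET
            stack.extend(reversed(["const", push_const, "ET"]))
        elif top == "ET":
            look = tokens[pos]
            if look == "+":
                stack.extend(reversed(["+", "const", push_const, add, "ET"]))
            elif look == "-":
                stack.extend(reversed(["-", "const", push_const, sub, "ET"]))
            # otherwise predict ET -> epsilon and push nothing
        elif top == "const":
            if not isinstance(tokens[pos], (int, float)):
                raise SyntaxError(f"expected a constant, got {tokens[pos]!r}")
            last_const = tokens[pos]
            pos += 1
        else:                           # terminal: must match the input
            if tokens[pos] != top:
                raise SyntaxError(f"expected {top!r}, got {tokens[pos]!r}")
            pos += 1
    return values.pop()
```

parse([9, "-", 4, "-", 3]) returns 2. No parse tree is ever built: the whole RHS of a predicted production is pushed in reverse so that its first symbol ends up on top, and each action routine runs when it surfaces.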
Fragment of LL(1)
grammar with action
routines in {…}
(for a complete
grammar see Fig. 4.10
in Scott’s textbook)
Decorating a Syntax Tree:
Tree Grammar
• If compiler uses action routines simply to build a syntax tree, then the
bulk of semantic analysis and intermediate code generation will use
not parse tree but a syntax tree as base.
– So we need to know how to decorate syntax tree.
• Tree Grammar is used to represent syntax tree structure
– CFG is meant to define language composed of strings of tokens and
describes the possible structure of parse tree for that language where
every valid string is tree’s yield in the form of leaves’ sequence.
• Parsing is a process of finding a tree that has a given yield for the CFG.
– Tree grammar is meant to define (or generate) the trees themselves
• It is not for parsing but for decoration of syntax tree
• Semantic rules attached to productions of tree grammar can be used to define
the attribute flow of a syntax tree – i.e. for decorating syntax tree.
• As in CFG, each production represents a possible relationship between
parent (LHS) and children (RHS) in the tree
Calculator Language with
Types and Declarations
• Bottom-Up LR CFG for Calculator Language with types and
declarations (compare with the grammar from 6/21)
- Declarations allowed to be intermixed with statements
- Constants of different types (integer and real) (presumably the latter
contain a decimal point)
- Explicit conversions between integer and real operands required
- The intended semantics of the language requires that every identifier
be declared before it is used, and that types not be mixed in computations
Tree Grammar and Syntax
Tree for Calculator
• Notation ‘A : B’ on LHS of production
means: A is one variant of B, and may
appear anywhere a B is expected on RHS.
• Productions for calculator
tree grammar with types
and declarations look like:
• Syntax Tree for simple program:
- Represents a program as a linked list of
“nodes” which are “items” (declarations
and statements)
- Represents expressions
Attribute Tree Grammar
for Calculator
• This tree grammar will be used to
perform static semantic types checking
• Complete tree attribute grammar for
calculator language with types is
constructed using the node classes,
variants, and attributes (see Fig. 4.14 in
Scott’s textbook)
• Once decorated, the program node will
contain a list, in a synthesized attribute, of all static semantic errors in the program.
Syntax Tree
Attribute
Grammar
Classes of nodes:
• Attribute symtab contains a
list, with types, of all declared
identifiers.
• an inherited attribute errors_in lists all static
semantic errors found to its
left in the tree
• a synthesized attribute errors_out propagates the
final error list back to the root
Attribute Tree Grammar
for Calculator
• Macro check_types calculates the
values of two different attributes, type and errors. Under a strict
formulation of attribute grammars it
would be replaced by two separate
semantic functions, one per
calculated attribute.
Fragments of the attribute
tree grammar
Decoration of Syntax Tree
for Calculator
• Syntax Tree allowing
for type checking
• Symbol table information
flows along chain of items
(int_decl and real_decl
nodes add new info) and
down into expr trees.
• Type information is
synthesized at id : expr
leaves by looking up an
identifier's name in the
symbol table. It then
propagates upward within
an expression tree, and is
used to type-check.
• Error messages are
accumulated into
synthesised attribute of
syntax tree root.
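The attribute flow described above can be approximated by a short type checker. This is a sketch under several assumptions: syntax tree nodes are plain tuples, the node kinds (int_decl, real_decl, read, write, assign, id, int_const, real_const, bin_op, float, trunc) loosely follow the calculator language, and the error messages are invented. It is not the attribute grammar of Fig. 4.14. The symtab dictionary plays the role of the inherited symbol table attribute, and the returned error lists play the role of the synthesised errors attributes.

```python
def check_expr(expr, symtab):
    """Return (type, errors) for an expression; type is 'int', 'real' or 'error'."""
    kind = expr[0]
    if kind == "int_const":
        return "int", []
    if kind == "real_const":
        return "real", []
    if kind == "id":                          # type synthesised by symbol table lookup
        name = expr[1]
        if name in symtab:
            return symtab[name], []
        return "error", [f"{name} undefined"]
    if kind in ("float", "trunc"):            # explicit conversions
        wanted, result = ("int", "real") if kind == "float" else ("real", "int")
        operand_type, errors = check_expr(expr[1], symtab)
        if operand_type not in (wanted, "error"):
            errors = errors + [f"{kind} expects {wanted} operand"]
        return result, errors
    if kind == "bin_op":
        _, op, left, right = expr
        left_type, left_errors = check_expr(left, symtab)
        right_type, right_errors = check_expr(right, symtab)
        errors = left_errors + right_errors
        if "error" in (left_type, right_type):
            return "error", errors            # avoid cascading messages
        if left_type != right_type:
            return "error", errors + [f"type clash in {op}"]
        return left_type, errors
    raise ValueError(f"unknown node kind {kind!r}")


def check_program(items):
    """Walk the chain of items, threading symtab and the error list along it."""
    symtab, errors = {}, []
    for item in items:
        kind = item[0]
        if kind in ("int_decl", "real_decl"):
            name = item[1]
            if name in symtab:
                errors = errors + [f"{name} redefined"]
            else:
                symtab = {**symtab, name: "int" if kind == "int_decl" else "real"}
        elif kind == "read":
            if item[1] not in symtab:
                errors = errors + [f"{item[1]} undefined"]
        elif kind == "write":
            _, expr_errors = check_expr(item[1], symtab)
            errors = errors + expr_errors
        elif kind == "assign":
            _, name, expr = item
            expr_type, expr_errors = check_expr(expr, symtab)
            errors = errors + expr_errors
            if name not in symtab:
                errors = errors + [f"{name} undefined"]
            elif expr_type not in ("error", symtab[name]):
                errors = errors + [f"type clash in assignment to {name}"]
    return errors                             # final list, as held at the root


GOOD = [("int_decl", "a"), ("read", "a"), ("real_decl", "b"), ("read", "b"),
        ("write", ("bin_op", "/",
                   ("bin_op", "+", ("float", ("id", "a")), ("id", "b")),
                   ("real_const", 2.0)))]
BAD = [("int_decl", "a"), ("real_decl", "b"),
       ("write", ("bin_op", "+", ("id", "a"), ("id", "b"))),
       ("write", ("id", "c"))]
```

check_program(GOOD) returns an empty list, while check_program(BAD) reports a type clash in + followed by an undefined c. Returning 'error' as a type without adding a second message is one simple way to keep a single mistake from producing a cascade of reports.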
Check Your Understanding
• What is the role of semantic analyser?
• What are the specifics of attribute grammars and how do they
differ from context-free grammars?
• Describe the difference between synthesised and inherited
attributes.
• Why is parse tree decoration necessary?
• What is attribute flow and how does it work for top-down and
bottom-up decoration processes?
• Describe the main translation schemes.
• What are action routines for?
• Describe the process of building a syntax tree and its
decoration using a tree grammar.