Dr. Philip Cannata 1
Lexical and Syntactic Analysis
• Chomsky Grammar Hierarchy
• Lexical Analysis – Tokenizing
• Syntactic Analysis – Parsing
• Hmm Concrete Syntax
• Hmm Abstract Syntax
Programming Languages
Noam Chomsky
• Regular grammar – used for tokenizing
• Context-free grammar (BNF) – used for parsing
• Context-sensitive grammar – not really used for programming languages
Chomsky Hierarchy
• Simplest; least powerful
• Equivalent to:
– Regular expression (think of perl)
– Finite-state automaton
• Right regular grammar: ω ∈ Terminal*, A and B ∈ Nonterminal
A → ω B
A → ω
• Example:
Integer → 0 Integer | 1 Integer | ... | 9 Integer | 0 | 1 | ... | 9
Regular Grammar
• Less powerful than context-free grammars
• The following is not a regular language:
{ aⁿ bⁿ | n ≥ 1 }
i.e., cannot balance: ( ), { }, begin end
Regular Grammar
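A minimal sketch of the point above, assuming Python and its standard `re` module. The helper name `is_anbn` is hypothetical. The closest regular expression, `a+b+`, accepts strings whose counts differ, whereas recognizing { aⁿ bⁿ } needs some form of counting that a finite-state automaton does not have.

```python
import re

def is_anbn(s: str) -> bool:
    """Recognize { a^n b^n | n >= 1 } by counting (illustrative helper)."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n

# A regular expression can require "some a's, then some b's",
# but it cannot force the two counts to be equal.
loose = re.compile(r"a+b+")

for text in ("aaabbb", "aaabb"):
    print(text, is_anbn(text), loose.fullmatch(text) is not None)
```

The second string is accepted by the regular expression but rejected by the counting recognizer.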
Regular Expressions
x          a character x
\x         an escaped character, e.g., \n
{ name }   a reference to a name
M | N      M or N
M N        M followed by N
M*         zero or more occurrences of M
M+         one or more occurrences of M
M?         zero or one occurrence of M
[aeiou]    the set of vowels
[0-9]      the set of digits
.          any single character
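The notation above can be tried out with Python's `re` module, as a rough sketch. The helper `matches` and the sample strings are illustrative assumptions. The `{ name }` form comes from lexer-generator notations and is not shown here, since `re` has no direct equivalent.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """True if the whole of text matches pattern (illustrative helper)."""
    return re.fullmatch(pattern, text) is not None

examples = [
    (r"a|b",     "b",   True),   # M | N
    (r"ab",      "ab",  True),   # M N
    (r"a*",      "",    True),   # M*: zero or more
    (r"a+",      "",    False),  # M+: one or more
    (r"ab?",     "a",   True),   # M?: zero or one
    (r"[aeiou]", "e",   True),   # the set of vowels
    (r"[0-9]+",  "352", True),   # one or more digits
    (r".",       "x",   True),   # any single character
    (r"\n",      "\n",  True),   # an escaped character
]
for pattern, text, expected in examples:
    assert matches(pattern, text) == expected
```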
(S, a2i$) ├ (I, 2i$)
          ├ (I, i$)
          ├ (I, $)
          ├ (F, )
Thus: (S, a2i$) ├* (F, )
Finite State Automaton for Identifiers
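The trace above can be reproduced with a small table-driven simulation. This is a sketch under assumptions: the state names S, I and F and the end marker $ come from the slide, while the transition table, `classify` and `trace` are hypothetical reconstructions of the automaton the figure presumably showed.

```python
def classify(ch: str) -> str:
    """Map an input character to the symbol class used by the automaton."""
    if ch == "$":
        return "end"
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"

# Assumed transitions: a letter starts an identifier, letters and digits
# continue it, and the end marker $ moves to the final state F.
TRANSITIONS = {
    ("S", "letter"): "I",
    ("I", "letter"): "I",
    ("I", "digit"): "I",
    ("I", "end"): "F",
}

def trace(text: str):
    """Return the configurations (state, remaining input), or None on rejection."""
    state, configs = "S", [("S", text)]
    for i, ch in enumerate(text):
        state = TRANSITIONS.get((state, classify(ch)))
        if state is None:          # no transition: the automaton is stuck
            return None
        configs.append((state, text[i + 1:]))
    return configs if state == "F" else None

print(trace("a2i$"))
```

An input such as `2ai$` is rejected because no transition leaves S on a digit.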
Deterministic Finite State Automaton Examples
Production: α → β
α ∈ Nonterminal
β ∈ (Nonterminal ∪ Terminal)*
i.e., the left-hand side is a single nonterminal, and the right-hand side is a string of nonterminals and/or terminals (possibly empty).
Context-Free Grammar
Production: α → β, |α| ≤ |β|
α, β ∈ (Nonterminal ∪ Terminal)*
i.e., the left-hand side can be composed of strings of terminals and nonterminals.
Context-Sensitive Grammar
• The syntax of a programming language is a precise description of all its grammatically correct programs.
• Precise syntax was first used with Algol 60, and has been used ever since.
• Three levels:
– Lexical syntax - all the basic symbols of the language (names, values, operators, etc.)
– Concrete syntax - rules for writing expressions, statements and programs.
– Abstract syntax - internal representation of the program, favoring content over form.
Syntax
Grammars
Grammars: Metalanguages used to define the concrete syntax of a language.
Backus Normal Form – Backus Naur Form (BNF)
• Stylized version of a context-free grammar (cf. Chomsky hierarchy)
• First used to define syntax of Algol 60
• Now used to define syntax of most major languages
Production: α → β
α ∈ Nonterminal
β ∈ (Nonterminal ∪ Terminal)*
i.e., the left-hand side is a single nonterminal, and β is a string of nonterminals and/or terminals (possibly empty).
• Example
Integer → Digit | Integer Digit
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Extended BNF (EBNF)
Additional metacharacters
{ }   a series of zero or more
( )   must pick one from a list
[ ]   pick none or one from a list
Example
Expression -> Term { ( + | - ) Term }
IfStatement -> if ( Expression ) Statement [ else Statement ]
EBNF is no more powerful than BNF, but its production rules are often simpler and clearer.
Javacc EBNF
( … )*   a series of zero or more
( … )+   a series of one or more
[ … ]    optional
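One reason EBNF rules are convenient is that they map almost directly onto code: a `{ … }` repetition becomes a loop and a `( … | … )` choice becomes a test. The sketch below recognizes `Expression -> Term { ( + | - ) Term }` over a pre-split token list. The function name `parse_expression` is hypothetical, and Term is assumed to be a digit string because the slide does not define it.

```python
def parse_expression(tokens: list[str]) -> bool:
    """Recognize Expression -> Term { ( + | - ) Term }; Term is assumed to be digits."""
    pos = 0

    def term() -> bool:
        nonlocal pos
        if pos < len(tokens) and tokens[pos].isdigit():
            pos += 1
            return True
        return False

    if not term():
        return False
    # { ... } in EBNF: repeat zero or more times
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        pos += 1                  # ( + | - ): exactly one of the operators
        if not term():
            return False
    return pos == len(tokens)     # all input must be consumed

print(parse_expression(["5", "-", "4", "+", "3"]))
print(parse_expression(["5", "-"]))
```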
For more details, see Chapter 2 of “Programming Language Pragmatics, Third Edition (Paperback)”, Michael L. Scott (Author)
Internal Parse Tree
Abstract Syntax
int main ()
{
return 0 ;
}
Program (abstract syntax):
  Function = main; Return type = int
    params =
    Block:
      Return:
        Variable: return#main, LOCAL addr=0
        IntValue: 0
Instance of a Programming Language:
Now we’ll focus on the internal parse tree
Parse Trees
Integer → Digit | Integer Digit
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Parse Tree for 352 as an Integer
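Since the figure is not reproduced here, the shape of that parse tree can be sketched in code. The nested-tuple representation and the name `parse_integer` are assumptions made for illustration. Each additional digit wraps the tree built so far, which mirrors the left-recursive alternative of the Integer rule.

```python
def parse_integer(text: str):
    """Build a parse tree for the Integer grammar as nested tuples."""
    if not text or not all(c in "0123456789" for c in text):
        raise ValueError(f"not an Integer: {text!r}")
    tree = ("Integer", ("Digit", text[0]))          # first alternative: a single Digit
    for ch in text[1:]:
        tree = ("Integer", tree, ("Digit", ch))     # second alternative: Integer Digit
    return tree

print(parse_integer("352"))
```

The tree grows down its left side: the digit 2 hangs off the root, 5 off the next Integer below it, and 3 sits at the bottom.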
Arithmetic Expression Grammar
Expr → Expr + Term | Expr - Term | Term
Term → 0 | ... | 9 | ( Expr )
Parse of 5 - 4 + 3
• A grammar can be used to define associativity and precedence among the operators in an expression.
E.g., + and - are left-associative operators in mathematics; * and / have higher precedence than + and - .
• Consider the following grammar:
Expr -> Expr + Term | Expr - Term | Term
Term -> Term * Factor | Term / Factor | Term % Factor | Factor
Factor -> Primary ** Factor | Primary
Primary -> 0 | ... | 9 | ( Expr )
Associativity and Precedence
Associativity and Precedence
Parse of 4**2**3 + 5 * 6 + 7
Precedence   Associativity   Operators
3            right           **
2            left            * / %
1            left            + -
Note: These relationships are shown by the structure of the parse tree: highest precedence at the bottom, and left-associativity on the left at each level.
Associativity and Precedence
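To see how the Expr/Term/Factor/Primary grammar encodes these relationships, here is a minimal recursive-descent sketch. Everything in it (`tokenize`, `parse`, `evaluate`, the tuple-based tree, and treating `/` as integer division) is an assumption for illustration, not code from the course. Left-recursive rules are implemented as loops, which gives left-nested trees, while the `**` rule recurses on its right operand, which gives right-nested trees.

```python
import operator
import re

def tokenize(src: str) -> list[str]:
    # Single-digit operands only, as in the grammar; other characters are skipped.
    return re.findall(r"\*\*|[-+*/%()]|[0-9]", src)

def parse(src: str):
    tokens, pos = tokenize(src), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def expr():                          # Expr -> Expr (+|-) Term | Term
        node = term()
        while peek() in ("+", "-"):      # loop: left-associative
            node = (advance(), node, term())
        return node

    def term():                          # Term -> Term (*|/|%) Factor | Factor
        node = factor()
        while peek() in ("*", "/", "%"):
            node = (advance(), node, factor())
        return node

    def factor():                        # Factor -> Primary ** Factor | Primary
        node = primary()
        if peek() == "**":               # recursion on the right: right-associative
            node = (advance(), node, factor())
        return node

    def primary():                       # Primary -> 0 | ... | 9 | ( Expr )
        tok = advance()
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected )")
            return node
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected {tok!r}")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.floordiv,           # integer division is an assumption
       "%": operator.mod, "**": operator.pow}

def evaluate(node):
    if isinstance(node, int):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))

print(parse("5 - 4 + 3"))
print(parse("4**2**3 + 5 * 6 + 7"))
```

In the second tree, `**` nests to the right as 4 ** (2 ** 3), the product 5 * 6 sits below the additions, and the two additions nest to the left.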
• A grammar is ambiguous if one of its strings has two or more different parse trees.
• Example:
Expr -> Expr Op Expr | ( Expr ) | Integer
Op -> + | - | * | / | % | **
• Equivalent to previous grammar but ambiguous
Ambiguous Grammars
Ambiguous Parse of 5 – 4 + 3
Ambiguous Grammars
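The ambiguity can be made concrete by enumerating every parse tree the rule `Expr -> Expr Op Expr | Integer` allows for a token list. This is only a sketch: `all_trees` is a hypothetical helper, parentheses are left out, and trees are nested tuples of the form (operator, left, right).

```python
def all_trees(tokens):
    """Every parse of tokens under Expr -> Expr Op Expr | Integer."""
    if len(tokens) == 1:
        return [tokens[0]]                       # Expr -> Integer
    trees = []
    for i in range(1, len(tokens), 2):           # try each operator as the root
        for left in all_trees(tokens[:i]):
            for right in all_trees(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees

for tree in all_trees([5, "-", 4, "+", 3]):
    print(tree)
```

Two trees come out for 5 - 4 + 3. One groups as (5 - 4) + 3, which evaluates to 4; the other groups as 5 - (4 + 3), which evaluates to -2, so the choice of tree changes the meaning.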
Dangling Else Ambiguous Grammars
IfStatement -> if ( Expression ) Statement |
               if ( Expression ) Statement else Statement
Statement -> Assignment | IfStatement | Block
Block -> { Statements }
Statements -> Statements Statement | Statement
With which ‘if’ does the following ‘else’ associate?
if (x < 0)
if (y < 0) y = y - 1;
else y = 0;
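The two possible readings can be written out in a language where indentation settles the question. The sketch below uses Python for that purpose, and the function names are hypothetical. C-family languages resolve the ambiguity by attaching the else to the nearest unmatched if, which corresponds to the first function.

```python
def else_with_inner_if(x: int, y: int) -> int:
    """Reading 1: the else belongs to the nearest if, (y < 0)."""
    if x < 0:
        if y < 0:
            y = y - 1
        else:
            y = 0
    return y

def else_with_outer_if(x: int, y: int) -> int:
    """Reading 2: the else belongs to the outer if, (x < 0)."""
    if x < 0:
        if y < 0:
            y = y - 1
    else:
        y = 0
    return y

# The two readings disagree whenever x >= 0, or x < 0 and y >= 0.
print(else_with_inner_if(1, 5), else_with_outer_if(1, 5))
print(else_with_inner_if(-1, 5), else_with_outer_if(-1, 5))
```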
Program : { [ Declaration ] | retType Identifier Function | MyClass | MyObject }
Function : ( ) Block
MyClass : Class Identifier { { retType Identifier Function } Constructor { retType Identifier Function } }
MyObject : Identifier Identifier = create Identifier callArgs
Constructor : Identifier ( [ { Parameter } ] ) block
Declaration : Type Identifier [ [ Literal ] ] { , Identifier [ [ Literal ] ] }
Type : int | bool | float | list | tuple | object | string | void
Statements : { Statement }
Statement : ; | Declaration | Block | ForEach | Assignment | IfStatement | WhileStatement | CallStatement | ReturnStatement
Block : { Statements }
ForEach : for ( Expression <- Expression ) Block
Assignment : Identifier [ [ Expression ] ] = Expression ;
Parameter : Type Identifier
IfStatement : if ( Expression ) Block [ elseifStatement | Block ]
WhileStatement : while ( Expression ) Block
Hmm BNF (i.e., Concrete Syntax)
Expression : Conjunction { || Conjunction }
Conjunction : Equality { && Equality }
Equality : Relation [ EquOp Relation ]
EquOp : == | !=
Relation : Addition [ RelOp Addition ]
RelOp : < | <= | > | >=
Addition : Term { AddOp Term }
AddOp : + | -
Term : Factor { MulOp Factor }
MulOp : * | / | %
Factor : [ UnaryOp ] Primary
UnaryOp : - | !
Primary : callOrLambda | IdentifierOrArrayRef | Literal | subExpressionOrTuple | ListOrListComprehension | ObjFunction
callOrLambda : Identifier callArgs | LambdaDef
callArgs : ( [ Expression | passFunc { , Expression | passFunc } ] )
passFunc : Identifier ( Type Identifier { Type Identifier } )
LambdaDef : ( \\ Identifier { , Identifier } -> Expression )
IdentifierOrArrayRef : Identifier [ [ Expression ] ]
subExpressionOrTuple : ( [ Expression [ , [ Expression { , Expression } ] ] ] )
ListOrListComprehension : [ Expression { , Expression } ] | | Expression [ <- Expression ] { , Expression [ <- Expression ] } ]
ObjFunction : Identifier . Identifier . Identifier callArgs
Identifier : ( a | b | … | z | A | B | … | Z ) { ( a | b | … | z | A | B | … | Z ) | ( 0 | 1 | … | 9 ) }
Literal : Integer | True | False | ClFloat | ClString
Integer : Digit { Digit }
ClFloat : 0 | 1 | … | 9 { 0 | 1 | … | 9 } . { 0 | 1 | … | 9 }
ClString : " { ~["] } "
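The Identifier and Integer rules above are regular, so they can be handled by a tokenizer before parsing begins. Here is a rough sketch using Python's `re` module. The token class names, the operator and separator patterns, and the `tokenize` function are assumptions assembled from the terminals that appear in the grammar. This is not the actual Hmm lexer, which the slides suggest is generated with Javacc.

```python
import re

# Token classes, tried in order. The patterns are illustrative assumptions.
TOKEN_SPEC = [
    ("Integer",    r"[0-9]+"),                         # Digit { Digit }
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),           # letter { letter | digit }
    ("Operator",   r"==|!=|<=|>=|&&|\|\||[-+*/%<>=!]"),
    ("Separator",  r"[;,(){}\[\]]"),
    ("Skip",       r"\s+"),                            # whitespace is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(src: str):
    """Split src into (token class, text) pairs."""
    tokens, pos = [], 0
    while pos < len(src):
        m = MASTER.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        if m.lastgroup != "Skip":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("z = x + 2 * y;"))
```

Two-character operators are listed before the single-character class so that `<=` is not split into `<` and `=`.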
Clite Operator   Associativity
Unary - !        none
* /              left
+ -              left
< <= > >=        none
== !=            none
&&               left
||               left
Associativity and Precedence for Hmm
Hmm Parse Tree Example
z = x + 2 * y;
Now we’ll focus on the Abstract Syntax
Hmm Parse Tree
z = x + 2 * y;
Very Approximate Hmm Abstract Syntax
Assignment = Variable target; Expression source
Expression = VariableRef | Value | Binary | Unary
VariableRef = Variable | ArrayRef
Variable = String id
ArrayRef = String id; Expression index
Value = IntValue | BoolValue | FloatValue | CharValue
Binary = Operator op; Expression term1, term2
Unary = UnaryOp op; Expression term
Operator = ArithmeticOp | RelationalOp | BooleanOp
IntValue = Integer intValue
…
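Each abstract syntax rule above reads naturally as a record type with named fields. The following sketch models a small subset with Python dataclasses. It is an illustration under assumptions: only Variable, IntValue, Binary and Assignment are covered, Operator is simplified to a string, and `evaluate` is a hypothetical helper that handles just + and *.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Variable:
    id: str

@dataclass
class IntValue:
    intValue: int

@dataclass
class Binary:
    op: str                     # Operator simplified to a string (assumption)
    term1: "Expression"
    term2: "Expression"

Expression = Union[Variable, IntValue, Binary]

@dataclass
class Assignment:
    target: Variable
    source: Expression

# Abstract syntax for: z = x + 2 * y
tree = Assignment(
    Variable("z"),
    Binary("+", Variable("x"), Binary("*", IntValue(2), Variable("y"))),
)

def evaluate(e: Expression, env: dict) -> int:
    """Evaluate an expression tree; only + and * are handled in this sketch."""
    if isinstance(e, Variable):
        return env[e.id]
    if isinstance(e, IntValue):
        return e.intValue
    left, right = evaluate(e.term1, env), evaluate(e.term2, env)
    if e.op == "+":
        return left + right
    if e.op == "*":
        return left * right
    raise ValueError(f"unsupported operator {e.op!r}")

print(tree)
print(evaluate(tree.source, {"x": 1, "y": 3}))
```

Compared with a parse tree, this structure drops punctuation such as the semicolon and the intermediate Term and Factor levels and keeps only what later phases need.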
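Hmm Abstract Syntax – Binary Example
z = x + 2 * y
[Figure: abstract syntax tree rooted at =, whose source is a Binary node with Operator + over Variable x and a nested Binary node with Operator * over Value 2 and Variable y.]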